Skip to content
This repository
Fetching contributors…

Cannot retrieve contributors at this time

executable file 222 lines (170 sloc) 8.862 kb
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222
= DEVELOPMENT =

Developing with Biolib takes a few skills. To map a new library one
needs to master git, cmake, SWIG, C/C++ and a high level language (in
that order ;). Each of these 'environments' takes significant
investment.

In this section I'll try to document successful strategies I use for
developing functionality and mapping (new) libraries and modules.
Maybe explaining some of the ideas can help...

== Git ==

Git is used to store source code and keep track of versions. In the
past I have extensively used CVS, SVN, darcs, mercurial (hg). Git is a
big step forward as it allows managing a large number of branches and
repositories in the easiest way. Biolib uses git submodules to contain
'embedded' git repositories. That way, not all code has to be copied
from other source trees. For example ./tools/cmake-support is shared
between two projects (Biolib and ASciLib) and the ./src/clibs/rqtl
tree is pulled from its own repository. Git has changed the way I
think and work, with regard to source code management. One powerful
tool to use with git is 'gitk'.

To make sure all git submodules are loaded in the source tree do:

  cd biolib
  git --version (should show >1.6)
  git branch (are you on the correct branch?)
  git submodule init
  git submodule update

One important thing to note is that these sub repositories (like
./tools/cmake_support) are not part of the main tree. So if you change
a file there, it is not properly recorded for others to see. However,
since these are git submodules they are repositories in their own
right, allowing adding remote repositories, but it is tricky to work
them. The safe solution, when you change something, is to send the
upstream maintainer a diff. For example:

  cd ./tools/cmake_support
  (change something)
  git diff > ~/cmake_support.patch

And send that patch.

== CMake ==

CMake generates Makefiles. Like the often seen autoconf tools
('configure'), it assesses the environment and creates a build system.
Read the FAQ and Documentation of CMake. Just to get a feel for what
it can do. Biolib uses CMake big time. CMake itself is a small and
simple language. It's strength comes from included modules, nicely
separated. For example we use the SWIG module. It is usually in a
place like /usr/share/cmake-2.6/Modules/. Check:

  less /usr/share/cmake-2.6/Modules/FindSWIG.cmake

I have learned to like CMake - it has the simplicity a programmer
needs. The syntax wins no points for beauty, but it is good enough.
Use the 'Message' command to print out what variables contain. That
makes the build system very transparent. Also CMake can give verbose
output (see the manual).

Biolib's CMake modules are in a git submodule under
./tools/cmake-support/modules/. Most are unique to Biolib/AScilib, but
some override the CMake modules, like FindZLIB.cmake and
FindPerlLibs.cmake.

Every Clib and mapped module in Biolib contains its CMakeLists.txt
file. In principle these are minimalistic. See for example
the one in ./src/clibs/example-1.0/. It contains a few settings
describing the name and version of the shared library to be build. It
tells the build system where to find include files, which files to
compile and to build and install the C library (when 'make' and 'make
install' are invoked). Likewise the Ruby mapping for this library is
in ./src/mappings/swig/ruby/example/. The CMakeLists.txt file names
the Ruby module, tells where to find files and build them. It also
adds a default test script, which runs on 'make test'.

To build the system you can build from the root of the tree, but
importantly also at the C library and module level. Do not mix these
two strategies(!) To build a C library and module library, rather than
the whole thing, do:

  cd biolib
  sh scripts/cleanup.sh
  cd src/clibs/example-1.0/
  cmake .
  make
  cd ../../mappings/swig/ruby/example/
  cmake .
  make
  make test
  cat Testing/Temporary/LastTest.log

This allows very efficient compilation steps during development. No
need to build anything but the libraries that depend on the module.
Once the Cmake files are there you don't have to generate them every
time. A simple:

  make clean ; make

will suffice. Only when you are tweaking CMake itself you *need* to
remove the cache. For some reason CMake caches all variables. In that
case:

  rm CMakeCache.txt ; cmake . ; make clean ; make

will do the trick. Or use the cleanup.sh script.

In general, with CMake, study all the CMakeLists.txt files that come
with Biolib. At some point all features may be documented, but at this
stage the CMake modules and CMakeLists.txt files *are* the
documentation.

CMake can handle many different setups for code. For example the
staden_io_lib Clib has a non-standard source repository, and
non-standard location for include files. And it requires an external
Zlib.so lirary. The Affyio Clib has a typical R layout and I have
written special mapping functions in biolib_affyio.c|h, and requires
the external R library. The rqtl Clib uses an external repository for
fetching the source code (using a git submodule). I have a special
directory in it in ./src/clibs/rqtl/contrib/biolib, where the
CMakeLists.txt file resides. Note that rqtl overrides linkage for the
modules using the USE_C_LINKAGE directive, used in
./tools/cmake-support/modules/FindMapSWIG.cmake.

You can see CMake is very flexible handling all these different
layouts. See ./tools/cmake-support/doc/CMAKE.txt for more information.

== SWIG ==

SWIG is software done right (I think this is true for all mentioned
tools here). The syntax could have been a bit nicer, but I really
like the principles of pattern matching and generated code.

Hacking SWIG typedefs can be tricky. My advise is to really study the
*generated* code. And do things one step at a time. When I encounter a
data type that is not mapped automatically I create a new git branch,
for an example, and test that against my favourite high level
language. For example in the example clib I have tested double**
matrices against a Ruby module. By studying the generated code (in
ruby_exampleRUBY_wrap.cxx) I could see what SWIG was doing with
variables, scope etc. Playing with different solutions led to one
preferred solution. It is also possible to invoke gdb to see what is
happening under the hood.

== Using GDB ==

To compile with debug information do:

  cmake -DDEBUG=1 .
  make

in both the shared lib build and the module.

In principle an interpreter can be run inside gdb. So, from the test
log above (.Testing/Temporary/LastTest.log):

  cd ./src/mappings/swig/ruby/example/
  gdb /usr/bin/ruby
    run "./../test/test_example.rb"

On a memory exception a backtrace can be found with 'where'.

However, I find I hardly use gdb (never have done). I prefer to use
print statements, also in the SWIG typedefs.

== Testing ==

=== Integration testing ===

Every implemented mapping module (Perl, Ruby etc.) should have a test
script. This is located in

  ./src/mappings/swig/language/test/test_modulename.ext

these are automatically run on 'make test' and are effectively
integration tests - i.e. they should fail if something goes wrong in
the compilation/linking steps for different platforms. Keep these
tests short (no long runtime). They are, obviously, practical during
mapping development.

=== Unit tests ===

For unit testing (not automatically run during builds) there are two
options. Either use above scripts, making sure the tests do not run
automatically from the command line. Or create tests in
./src/extra/language/test with an accompanying runner script.

In the future we may add a 'make fulltest' command, to run all tests
in both places.

Note the distinction between two source trees. The language (Perl,
Ruby etc) code in ./src/mappings/swig/language/... is closely tied
to the mapping effort, and the code should reflect that. Think of it
this way: any code that mirrors functionality in the different
language domains should be there.

More remote functionality goes into ./src/extra/language/... I.e.
code specific to one language implementation.

Obviously the latter code is candidate for inclusion into BioPerl,
BioRuby etc. It is 'extra' functionality outside the mapping scope of
BioLib. Even though it may live in that location forever.

=== Test data ===

Test data files are stored in ./src/test/data (at least the smaller
files with size < 50K).

With regard to test *data* the policy is *not* to store large data
files in the source tree. With biological data this would become
really unwieldy in no time. The idea is to use stubs for data files,
like what I have done for Affyio. See:

  src/test/data/microarray/affy/test/test*.gz.stub

When a test requires one of these files they get downloaded from a
central webserver. The script that handles it for Ruby is in
./src/extra/ruby/biolib/biolibtest.rb.
Something went wrong with that request. Please try again.