526c167 Nov 29, 2016

Developing Shogun

This is a basic guide to getting started with hacking on Shogun. Your first steps should be to

  1. Compile from source, see INSTALL.md.
  2. Run the API examples, see INTERFACES.md, or create your own, see EXAMPLES.md.
  3. Run the tests.

As we would like to avoid spending a lot of our time explaining the same basic things many times, please make extensive use of the internet for any questions on the commands and tools needed. If you feel that this readme is missing something, please send a patch! :)

Shogun git development cycle

We use the git flow workflow. The steps are

  1. Read the guide.
  2. Register on GitHub.
  3. Fork the shogun repository.
  4. Clone your fork, add the original shogun develop repository as a remote, and check out locally

    git clone https://github.com/YOUR_USERNAME/shogun
    cd shogun
    git remote add upstream https://github.com/shogun-toolbox/shogun
    git branch develop
    git checkout develop
    git pull --rebase upstream develop
    

    The steps up to here only need to be executed once, with the exception of the last command: rebasing against the development branch. You will need to rebase every time the develop branch is updated.

  5. Create a feature branch (from develop)

    git checkout -b feature/BRANCH_NAME
    
  6. Write your code: fix a bug or add a feature. If you add or fix something, mention it in the NEWS file.

  7. Make sure (!) that, locally, your code compiles, is tested, and complies with the code style described on the wiki.

    make && make test
    

    If something does not work, try to find out whether your change caused it, and why. Read error messages and use the internet to find solutions. Compile errors are the easiest to fix! If all that does not help, ask us.

  8. Commit locally, using neat and informative commit messages and grouping related changes into commits. Iterate over further changes to the code if needed.

    git commit FILENAME(S) -m "Fix issue #1234"
    git commit FILENAME(S) -m "Add feature XYZ"
    

    The amend option is your friend if you are updating a single commit (so that the changes appear as one commit)

    git commit --amend FILENAME(S)
    

    If you want to group, say, the last three commits into one, squash them, for example

    git reset --soft HEAD~3
    git commit -m 'Clear commit message'
    
  9. Rebase against shogun's develop branch. This might cause rebase conflicts, which you need to resolve

    git pull --rebase upstream develop
    
  10. Push your commits to your fork

    git push origin feature/BRANCH_NAME
    

    If you squashed or amended commits after you had already pushed, you might need to force push using the git push -f option. Use it with care.

  11. Send a pull request (PR) via GitHub. As described above, you can always update a pull request using the git push -f option. Please do not close a PR and send a new one instead; always update the existing one.

  12. Once the PR is merged, keep an eye on the buildfarm to see whether your patch broke something.
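
The squash from step 8 can be tried out safely before using it on real work. The following is a minimal sketch that runs in a throwaway repository; the directory, file name, identity, and commit messages are made up for illustration and are not part of Shogun.

```shell
#!/bin/sh
# Sketch: squash the last three commits into one, in a throwaway repository.
set -e

# Work in a temporary directory so that no real repository is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# Identity used for this demo repository only (hypothetical values).
git config user.name "Demo User"
git config user.email "demo@example.com"

# One base commit, followed by three small work-in-progress commits.
echo "base" > notes.txt
git add notes.txt
git commit -q -m "Base commit"
for i in 1 2 3; do
    echo "change $i" >> notes.txt
    git commit -q -m "WIP $i" notes.txt
done

commits_before=$(git rev-list --count HEAD)

# Move HEAD back three commits, keeping their changes staged ...
git reset -q --soft HEAD~3
# ... and record those changes again as a single commit.
git commit -q -m "Clear commit message"

commits_after=$(git rev-list --count HEAD)
echo "commits before: $commits_before, after: $commits_after"
```

The file content is identical before and after the squash; only the history changes. This is why a branch that was already pushed needs a force push afterwards.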

Requirements for merging your PR

  • Read some tips on how to write good pull requests. Make sure you don't waste your (and our) time by not respecting these basic rules.
  • All tests pass (your pull request causes automatic checks). We will not look at the patch otherwise.
  • The PR is small in terms of lines changed.
  • The PR is clean and addresses one issue.
  • The number of commits is minimal (i.e. one), the message is neat and clear.
  • If C++ code: it is covered by tests, it doesn't leak memory, its API is documented.
  • If API example: it has a clear scope, it is minimal, it looks polished, it has a passing test.
  • If docs: clear, correct English, spell-checked.
  • If notebook: cell output is removed, template is respected, plots have axis labels.

Testing

There are three types of tests that can be executed locally: C++ unit tests, running the API examples, and integration testing the results of the API examples. To activate them, pass the -DENABLE_TESTING=ON switch when running cmake. Which tests are activated depends on your configuration. Adding a test in most cases requires re-running cmake. All activated tests can be executed with

make && make test

The first make is necessary as some tests need to be generated and/or compiled first.

Sometimes, it is useful to run a single test, which can be done via ctest, for example

ctest -R unit-LibSVR
ctest -R generated_cpp-binary_classifier-kernel_svm
ctest -R integration_meta_cpp-binary_classifier-kernel_svm -V

If a test name (or even the make test target) does not exist, this means that your configuration did not include it.

If you are interested in the details of how a test is executed (command, variables, directory), add the -V option. Further details can be extracted from the CMakeLists.txt configuration files in the tests folder.
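
To find out which tests your configuration actually includes, ctest can list them without running anything. The helper below is a sketch only: list_configured_tests is a hypothetical name, not part of Shogun, and it assumes that cmake writes a CTestTestfile.cmake into the build directory when testing is enabled.

```shell
#!/bin/sh
# Sketch: list the tests known to ctest without running them.
# list_configured_tests is a hypothetical helper, not part of Shogun.
list_configured_tests() {
    build_dir=${1:-.}   # build directory, defaults to the current one
    pattern=${2:-.}     # regular expression for test names, "." matches all

    # Assumption: cmake writes this file into the build directory
    # when testing is enabled.
    if [ ! -f "$build_dir/CTestTestfile.cmake" ]; then
        echo "no tests configured in '$build_dir'; re-run cmake with -DENABLE_TESTING=ON" >&2
        return 1
    fi

    # -N only lists the matching tests, -R selects tests by regular expression.
    (cd "$build_dir" && ctest -N -R "$pattern")
}
```

For example, `list_configured_tests build unit-LibSVR` lists the matching unit tests in a build directory named build, or prints a hint if that directory has no tests configured.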

C++ Unit tests

These are based on the googletest framework and are located in tests/unit/. You can compile them with

make shogun-unit-test

You can execute single tests via ctest, or by directly executing the unit test binary and passing it a filter, which gives more fine-grained control over which sub-tests are executed

./tests/unit/shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_*

Note that wildcards are allowed. Running single sub-tests is sometimes useful (e.g. for bug hunting)

./shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_apply_regression

Debugging and Memory leaks

All your C++ code and unit tests must be checked to not leak memory! You want to use a memory checker such as valgrind (or a debugger such as gdb). If you do that, you might want to compile with debugging symbols and without compiler optimizations, by using -DCMAKE_BUILD_TYPE=Debug

Then

valgrind ./shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_apply_regression
gdb --args ./shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_apply_regression

The option --leak-check=full for valgrind might be useful. In addition to manually running valgrind on your tests, you can use ctest to check multiple tests. This requires dashboard reports to be enabled via -DBUILD_DASHBOARD_REPORTS=ON. For example

ctest -D ExperimentalMemCheck -R unit-GaussianProcessRegression
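
When checking a single sub-test repeatedly, it can help to wrap the valgrind call so that a failing check is visible in the exit status. This is a sketch under assumptions: memcheck_unit_test is a hypothetical helper, not part of Shogun, and --error-exitcode is a valgrind option that the text above does not mention.

```shell
#!/bin/sh
# Sketch: run one unit test filter under valgrind and fail on reported errors.
# memcheck_unit_test is a hypothetical helper, not part of Shogun.
memcheck_unit_test() {
    binary=$1   # path to the unit test binary, e.g. ./shogun-unit-test
    filter=$2   # googletest filter, wildcards allowed

    if [ ! -x "$binary" ]; then
        echo "'$binary' not found or not executable; build it with make shogun-unit-test" >&2
        return 1
    fi

    # --leak-check=full reports each leak in detail.
    # --error-exitcode=1 makes valgrind return 1 if it reported any errors.
    valgrind --leak-check=full --error-exitcode=1 \
        "$binary" --gtest_filter="$filter"
}
```

A call such as `memcheck_unit_test ./shogun-unit-test 'GaussianProcessRegression.apply_*'` then returns a non-zero status if valgrind reports problems. Quoting the filter keeps the shell from expanding the wildcard.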

Adding tests

We aim to write clear, minimal, yet exhaustive tests of the basic building blocks in Shogun. Whenever you send us C++ code, we will ask you for a unit test for it. We test numerical results against reference implementations (e.g. in Python), as well as corner cases, consistency, etc. Read up on test-driven development, and search the web for tips on unit tests, e.g. googletest's tips.

Take inspiration from existing tests when writing new ones. Please structure them well.

API example tests

Make sure to read INTERFACES.md and EXAMPLES.md to understand how API examples are generated. You will need the cmake switch -DBUILD_META_EXAMPLES=ON. Every API example is used for two tests: simple execution and continuous integration testing of results. These two tests are executed for every enabled interface language.

Note that code for all interface examples needs to be generated as part of make, or using

make meta_examples

This needs to be done every time you add or modify an example. Examples for compiled interface languages (e.g. C++, Java) need to be compiled, either as part of make, or via more specific targets, e.g.

make build_cpp_meta_examples
make build_java_meta_examples

Check the CMakeLists.txt in examples/meta/* for all such make targets.

Simple execution

These tests are to make sure the code is executable, and to generate results for integration testing. These can be executed with ctest as described above, e.g.

ctest -R generated*
ctest -R generated_cpp-binary_classifier-kernel_svm -V

You can also execute the examples manually as described in INTERFACES.md. Note that the data git submodule is required to run the examples, see INSTALL.md.

Check the CMakeLists.txt in examples/meta/* for further details.

Adding tests

As every example is turned into a test when running cmake, all you need to do is to add an example as described in EXAMPLES.md.

Integration testing of results

You will note that each example produces an output file with the *.dat extension. This is a serialized version of all numerical results of the example. The purpose is to make sure all interface versions (say C++ and Python) of an example produce the same output, and that this output does not change over time.

The reference results are stored in the data git submodule, more precisely in data/testsuite/meta/*. There is a symbolic link for both generated and reference results in the build/tests/meta/ folder. Naturally, these tests depend on executing the corresponding example first. Therefore, running a test does not run the example again, but it simply compares the output to the reference file.

Again ctest can be used,

ctest -R integration_meta_*
ctest -R integration_meta_cpp-binary_classifier-kernel_svm
ctest -R integration_meta_python-binary_classifier-kernel_svm

See the CMakeLists.txt in tests/meta for details on the mechanics.

Adding tests

CMake automatically creates a test for every reference result file that it finds. Therefore, if you want to add a new test, for example after having added an example as described in EXAMPLES.md, you need to copy its generated output to the reference file folder, e.g.

cp build/tests/meta/generated_results/cpp/regression/kernel_ridge_regression.dat data/testsuite/meta/regression/

Note we usually use the output of the C++ example as reference.

Once that is done, it would be good if you sent us a patch with the new test. To do so, first send a PR against the shogun-data repository, just like in the standard development cycle, after doing (in the data directory)

git commit testsuite/meta/regression/kernel_ridge_regression.dat -m "Integration testing data for kernel ridge regression"
git push origin

After this PR is merged, you need to send a second PR against the main repository, after committing the updated version hash of the submodule (in the main shogun directory)

git commit data -m "Update data to include kernel ridge regression test data"
git push origin

If everything worked, then the travis build in the second PR will include your test in all interface languages. Please check the logs!

Build farm

We run two types of buildfarms that are automatically triggered

  1. Travis, executed in a third-party cloud when opening a PR
  2. Buildbot, executed in our own cloud after every merged PR or commit

In addition, we have a few hooks on PRs that are executed along with travis, such as a preview of API examples. You will see a list of checks in your PR.

Travis

This is to do basic sanity checks on every PR. All interfaces have a different build, see .travis.yml in the repository. The Docker image that runs the travis tests is based on configs/shogun/Dockerfile and can be found here.

If you obey the dev cycle, in particular if you run the tests before sending a PR, travis should never fail.

If travis fails

  1. Read the logs, find the error message
  2. Try to identify the problem
  3. Find out whether you caused it
  4. If so, reproduce locally
  5. Fix it and update your PR

Buildbot

This service builds and tests Shogun in a large number of different configurations, operating systems, interfaces, etc. It ensures that Shogun is portable and that the build is backward compatible. It analyzes Shogun's memory usage and performs static code analysis. It often catches very subtle errors.

After one of your PRs is merged, check the status of the buildbot for a while. The waterfall view is most useful. Again, check the logs if there are problems.

CMake tips

CMake is a beast. Make sure to read the docs and CMake_Useful_Variables. Make sure to understand the concept of out of source builds. Here are some tips on common options that are useful

Options for developers (debugging symbols on, optimization off, etc.):

cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_TESTING=ON -DBUILD_DASHBOARD_REPORTS=ON ..

Options for building the final binaries (debugging off, optimization on):

cmake -DCMAKE_BUILD_TYPE=Release ..

Getting a list of possible interfaces to enable:

grep -E "OPTION.*(Modular)" CMakeLists.txt

If eigen3 or json-c are missing use the following to download and compile these dependencies:

cmake -DBUNDLE_EIGEN=ON -DBUNDLE_JSON=ON ..

Specify a different swig executable:

cmake -DSWIG_EXECUTABLE=/usr/bin/swig2.0 ..

To specify a different compiler, see CMake FAQ, "How do I use a different compiler?". You might have to delete the build directory or clear the cmake cache for this to work.

CC=/path/to/gcc CXX=/path/to/g++ cmake ..

Under OS X, one often has several installations of the same Python major version, e.g. in /usr and in /usr/local via brew, so crashes can occur if the wrong Python version is linked against. To use a custom Python installation for the Python bindings under brew, use something like:

cmake -DPYTHON_INCLUDE_DIR=/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Headers -DPYTHON_LIBRARY=/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib -DPythonModular=ON ..

or, in general:

cmake -DPYTHON_INCLUDE_DIR=/path/to/python/include/dir -DPYTHON_LIBRARY=/path/to/python/libpythonVERSION.so ..

Under Linux, one may need to switch between different versions of Python, in which case the following options need to be included: -DPYTHON_EXECUTABLE:FILEPATH=/path/to/python/version, -DPYTHON_INCLUDE_DIR=/path/to/includes and -DPYTHON_PACKAGES_PATH=/path/to/dist/packages

For example:

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.3 -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 -DPYTHON_PACKAGES_PATH=/usr/local/lib/python3.3/dist-packages -DPythonModular=ON ..

In case header files or libraries are not at standard locations, one needs to manually adjust the library and include paths via -DCMAKE_INCLUDE_PATH=/my/include/path and -DCMAKE_LIBRARY_PATH=/my/library/path.

API documentation

Shogun uses doxygen for its API documentation. Every bit of C++ code that is added to Shogun needs doxygen compatible source-code comments.

  • Every class needs a description of what it implements. If possible, use LaTeX for math.
  • Every method needs a description, plus all parameters and return values documented.

Check existing code for inspiration. Documentation is important, so polish it as well as you can!

If you have doxygen installed, you can generate the documentation locally via running

make doxygen

and then opening build/doc/doxygen/html/index.html with the browser.