This is a basic guide to getting started hacking on Shogun. Your first steps should be to
- Compile from source, see INSTALL.md.
- Run the API examples (see INTERFACES.md), or create your own (see EXAMPLES.md).
- Run the tests.
As we would like to avoid spending a lot of our time explaining the same basics over and over, please make extensive use of the internet for any questions about the commands and tools involved. If you feel that this readme is missing something, please send a patch! :)
We use the git flow workflow. The steps are
- Read the guide.
- Register on GitHub.
- Fork the shogun repository.
Clone your fork, add the original shogun repository as a remote, and check out the develop branch locally:

```
git clone https://github.com/YOUR_USERNAME/shogun
cd shogun
git remote add upstream https://github.com/shogun-toolbox/shogun
git branch develop
git checkout develop
git pull --rebase upstream develop
```
The steps up to here only need to be executed once, with the exception of the last command: rebasing against the development branch. You will need to rebase every time the develop branch is updated.
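The clone/remote/rebase cycle can be tried out in miniature with throwaway local repositories. In this sketch, a local bare repository stands in for the upstream shogun repository; all paths, file names, and commit messages are made up for illustration:

```shell
# Toy reproduction of the fork/rebase cycle using local repositories.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare upstream.git

# Seed "upstream" with a develop branch.
git clone -q upstream.git seed
cd seed
git config user.email seed@example.com
git config user.name Seed
git checkout -qb develop
echo base > base.txt
git add base.txt
git commit -qm "upstream base"
git push -q origin develop
cd ..

# The "fork": clone, track develop, commit local work.
git clone -q upstream.git fork
cd fork
git config user.email dev@example.com
git config user.name Dev
git remote add upstream "$tmp/upstream.git"
git checkout -q develop
echo mine > mine.txt
git add mine.txt
git commit -qm "my work"

# Meanwhile, upstream develop moves on.
cd ../seed
echo more >> base.txt
git commit -qam "upstream change"
git push -q origin develop

# Rebasing replays "my work" on top of the new upstream commit.
cd ../fork
git pull -q --rebase upstream develop
git log --oneline
```

After the rebase, the local commit sits on top of the updated upstream history, which is exactly the state you want before opening a pull request.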
Create a feature branch (from develop) and check it out:

```
git branch feature/BRANCH_NAME
git checkout feature/BRANCH_NAME
```
Your code here: Fix bug or add feature. If you add something, or fix something, mention it in the
Make sure (!) that locally, your code compiles, it is tested, and it complies with the code style described on the wiki:

```
make && make test
```
If something does not work, try to find out whether your change caused it, and why. Read error messages and use the internet to find solutions. Compile errors are the easiest to fix! If all that does not help, ask us.
Commit locally, using neat and informative commit messages, grouping commits sensibly, and potentially iterate over further changes to the code:

```
git commit FILENAME(S) -m "Fix issue #1234"
git commit FILENAME(S) -m "Add feature XYZ"
```
The `--amend` option is your friend if you are updating a single commit (so that the updates appear as one commit):

```
git commit --amend FILENAME(S)
```
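For illustration, here is the amend workflow in a throwaway repository; the file name and messages are made up:

```shell
# Toy demonstration of folding a follow-up change into the previous commit.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo "fix" > patch.txt
git add patch.txt
git commit -qm "Fix issue #1234"

# A touch-up to the same change: amend instead of creating a second commit.
echo "better fix" > patch.txt
git commit -q --amend patch.txt -m "Fix issue #1234"

git log --oneline   # still a single commit
```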
If you want to group, say, the last three commits into one, squash them, for example:

```
git reset --soft HEAD~3
git commit -m 'Clear commit message'
```
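The squash via `reset --soft` can likewise be tried out safely in a throwaway repository; files and messages below are made up:

```shell
# Toy demonstration of squashing the last three commits into one.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name Dev

# Create a base commit plus three work-in-progress commits.
for i in 1 2 3 4; do
  echo "$i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
done

# Replace the last three commits with a single, cleanly-described one.
git reset --soft HEAD~3
git commit -qm 'Clear commit message'

git log --oneline   # two commits remain: the base and the squashed one
```

`reset --soft` moves the branch pointer back while keeping all changes staged, so the follow-up commit contains the combined work of the squashed commits.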
Rebase against the upstream develop branch again before pushing:

```
git pull --rebase upstream develop
```
Push your commits to your fork:

```
git push origin feature/BRANCH_NAME
```
If you squashed or amended commits after you had already pushed, you might need to force push via the `git push -f` option. Use it with care.
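To see why the force push is needed, here is a toy reproduction with a local bare repository standing in for your GitHub fork; all names are made up:

```shell
# Toy reproduction of the force-push situation.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare fork.git
git clone -q fork.git work
cd work
git config user.email dev@example.com
git config user.name Dev

echo "fix" > patch.txt
git add patch.txt
git commit -qm "Fix issue #1234"
git push -q origin HEAD:feature/BRANCH_NAME

# Amending rewrites history, so the pushed branch no longer fast-forwards.
git commit -q --amend -m "Fix issue #1234, with a better message"

# A plain push is now rejected; only a force push updates the branch.
git push -q origin HEAD:feature/BRANCH_NAME || echo "plain push rejected"
git push -q -f origin HEAD:feature/BRANCH_NAME
```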
Send a pull request (PR) via GitHub. As described above, you can always update a pull request using the `git push -f` option. Please do not close a PR and send a new one; always update the existing one.
Once the PR is merged, keep an eye on the buildfarm to see whether your patch broke something.
Requirements for merging your PR
- Read some tips on how to write good pull requests. Make sure you don't waste your (and our) time by not respecting these basic rules.
- All tests pass (your pull request causes automatic checks). We will not look at the patch otherwise.
- The PR is small in terms of lines changed.
- The PR is clean and addresses one issue.
- The number of commits is minimal (ideally one), and the commit message is neat and clear.
- If C++ code: it is covered by tests, it doesn't leak memory, its API is documented.
- If API example: it has a clear scope, it is minimal, it looks polished, and it has a passing test.
- If docs: clear, correct English language, spell-checked
- If notebook: cell output is removed, template is respected, plots have axis labels.
There are three types of tests that can be executed locally: C++ unit tests, running the API examples, and integration testing the results of the API examples. To activate them locally, enable the `-DENABLE_TESTING=ON` switch when running cmake. Which tests are activated depends on your configuration. Adding a test in most cases requires re-running `cmake`. All activated tests can be executed with
```
make && make test
```
`make` is necessary as some tests need to be generated and/or compiled first.
Sometimes it is useful to run a single test, which can be done via ctest, for example:

```
ctest -R unit-LibSVR
ctest -R generated_cpp-binary_classifier-kernel_svm
ctest -R integration_meta_cpp-binary_classifier-kernel_svm -V
```
If a test name (or even the `make test` target) does not exist, this means that your configuration did not include it.
If you are interested in the details of how a test is executed (command, variables, directory), add the `-V` option. Further details can be extracted from the `CMakeLists.txt` configuration files in the tests folder.
C++ Unit tests
These are based on the googletest framework and are located in `tests/unit/`. You can compile them with
You can execute single tests via `ctest`, or by directly executing the unit test binary and passing it a filter, e.g. `./shogun-unit-test --gtest_filter=GaussianProcessRegression.*`, which gives finer-grained control over which sub-tests are executed.
Note that wildcards are allowed. Running single sub-tests is sometimes useful (e.g. for bug hunting).
Debugging and Memory leaks
All your C++ code and unit tests must be checked for memory leaks!
You will want to use a memory checker such as valgrind (or a debugger such as gdb).
If you do, you might want to compile with debugging symbols and without compiler optimizations, e.g. via the `-DCMAKE_BUILD_TYPE=Debug` cmake option:

```
valgrind ./shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_apply_regression
gdb --args ./shogun-unit-test --gtest_filter=GaussianProcessRegression.apply_apply_regression
```
The `--leak-check=full` option for valgrind might be useful. In addition to manually running valgrind on your tests, you can use `ctest` to check multiple tests. This requires dashboard reports to be enabled via `-DBUILD_DASHBOARD_REPORTS=ON`. For example:
```
ctest -D ExperimentalMemCheck -R unit-GaussianProcessRegression
```
We aim to write clear, minimal, yet exhaustive tests of the basic building blocks in Shogun. Whenever you send us C++ code, we will ask you for a unit test for it. We test numerical results against reference implementations (e.g. in Python), as well as corner cases, consistency, etc. Read up on test-driven development, and search the web for tips on unit tests, e.g. googletest's tips.
Take inspiration from existing tests when writing new ones. Please structure them well.
API example tests
Make sure to read INTERFACES.md and EXAMPLES.md to understand how API examples are generated; you will need the corresponding cmake switch enabled.
Every API example is used for two tests: simple execution and continuous integration testing of results. These two tests are executed for every enabled interface language.
Note that code for all interface examples needs to be generated, either as part of `make`, or using a more specific target.
This needs to be done every time you add or modify an example. Examples for compiled interface languages (e.g. C++, Java) need to be compiled, either as part of `make`, or via more specific targets, e.g.

```
make build_cpp_meta_examples
make build_java_meta_examples
```

See `examples/meta/*` for all such make targets.
These tests make sure the code is executable, and generate results for integration testing. They can be executed with `ctest` as described above, e.g.

```
ctest -R generated*
ctest -R generated_cpp-binary_classifier-kernel_svm -V
```

See `examples/meta/*` for further details.
As every example is turned into a test when running `cmake`, all you need to do is add an example as described in EXAMPLES.md.
Integration testing of results
You will note that each example produces an output file with the `*.dat` extension. This is a serialized version of all numerical results of the example.
The purpose is to make sure all interface versions (say C++ and Python) of an example produce the same output, and that this output does not change over time.
The reference results are stored in the `data` git submodule, more precisely in `data/testsuite/meta/*`. There are symbolic links to both generated and reference results in the `build/tests/meta/` folder. Naturally, these tests depend on executing the corresponding example first: running a test does not run the example again, it simply compares the output to the reference file.
Again, `ctest` can be used:

```
ctest -R integration_meta_*
ctest -R integration_meta_cpp-binary_classifier-kernel_svm
ctest -R integration_meta_python-binary_classifier-kernel_svm
```

See `tests/meta` for details on the mechanics.
CMake automatically creates a test for every reference result file that it finds. Therefore, if you want to add a new test, for example after having added an example as described in EXAMPLES.md, you need to copy its generated output to the reference file folder, e.g.

```
cp build/tests/meta/generated_results/cpp/regression/kernel_ridge_regression.dat data/testsuite/meta/regression/
```
Note we usually use the output of the C++ example as reference.
Once that is done, it would be good if you sent us a patch with the new test. This is done by first sending a PR against the shogun-data repository, just like in the standard development cycle, after doing (in the `data` directory):

```
git commit testsuite/meta/regression/kernel_ridge_regression.dat -m "Integration testing data for kernel ridge regression"
git push origin
```
After this PR is merged, you need to send a second PR against the main repository, after committing the updated version hash of the submodule (in the main shogun directory):

```
git commit data -m "Update to include kernel ridge regression test data"
git push origin
```
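The submodule-pointer update can be tried out in miniature with throwaway local repositories. In this sketch, `data-src` and `main` are made-up stand-ins for shogun-data and shogun, and the file protocol is explicitly allowed because recent git restricts it for local submodules:

```shell
# Toy reproduction of committing an updated submodule hash.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the shogun-data repository.
git init -q data-src
cd data-src
git config user.email dev@example.com
git config user.name Dev
echo v1 > results.dat
git add results.dat
git commit -qm "initial data"
cd ..

# Stand-in for the main repository, embedding data-src as the "data" submodule.
git init -q main
cd main
git config user.email dev@example.com
git config user.name Dev
git -c protocol.file.allow=always submodule add "$tmp/data-src" data
git commit -qm "add data submodule"

# The data repository gains a new commit (the merged shogun-data PR)...
cd ../data-src
echo v2 > results.dat
git commit -qam "new test data"

# ...and the main repository records the new submodule hash.
cd ../main/data
git -c protocol.file.allow=always pull -q
cd ..
git commit -q data -m "Update data submodule"
git log --oneline -n 1
```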
If everything worked, then the travis build in the second PR will include your test in all interface languages. Please check the logs!
We run two types of buildfarms that are automatically triggered
- Travis, executed in a third-party cloud when opening a PR
- Buildbot, executed in our own cloud after every merged PR or commit
In addition, we have a few hooks on PRs that are executed along with travis, such as a preview of API examples. You will see a list of checks in your PR.
This is to do basic sanity checks on every PR. All interfaces have a different build; see `.travis.yml` in the repository. The Docker image that runs the travis tests is based on `configs/shogun/Dockerfile`.
If you obey the development cycle, in particular if you run tests before sending a PR, travis should never fail.
If travis fails
- Read the logs, find the error message
- Try to identify the problem
- Find out whether you caused it
- If so, reproduce locally
- Fix it and update your PR
This service builds and tests Shogun in a large number of different configurations (OS, interfaces, etc.). It ensures that Shogun is portable and that the build is backward compatible. It analyses Shogun's memory usage and performs static code analysis. It often catches very subtle errors.
After one of your PRs is merged, check the status of the buildbot for a while. The waterfall view is most useful. Again, check the logs if there are problems.
Options for developers (debugging symbols on, optimization off, etc.):

```
cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_TESTING=ON -DBUILD_DASHBOARD_REPORTS=ON ..
```
Options for building the final binaries (debugging off, optimization on):

```
cmake -DCMAKE_BUILD_TYPE=Release ..
```
Getting a list of possible interfaces to enable:

```
grep -E "OPTION.*(Modular)" CMakeLists.txt
```
If eigen3 or json-c are missing, use the following to download and compile these dependencies:

```
cmake -DBUNDLE_EIGEN=ON -DBUNDLE_JSON=ON
```
To specify a different swig executable, pass `-DSWIG_EXECUTABLE=/path/to/swig` to cmake.
To specify a different compiler, see the CMake FAQ, "How do I use a different compiler?". You might have to delete the build directory or clear the cmake cache for this to work:

```
CC=/path/to/gcc CXX=/path/to/g++ cmake ..
```
Under OS X one often has the same Python major version installed both system-wide and in `/usr/local` via brew etc., so one might observe crashes if the wrong Python version is linked against. To use a custom Python installation for the Python bindings under brew, use something like:

```
cmake -DPYTHON_INCLUDE_DIR=/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Headers \
      -DPYTHON_LIBRARY=/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib \
      -DPythonModular=ON ..
```
or, in general:
```
cmake -DPYTHON_INCLUDE_DIR=/path/to/python/include/dir -DPYTHON_LIBRARY=/path/to/python/libpythonVERSION.so ..
```
Under Linux, one may need to switch between different versions of Python, in which case the following options need to be included:

```
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.3 \
      -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 \
      -DPYTHON_PACKAGES_PATH=/usr/local/lib/python3.3/dist-packages \
      -DPythonModular=ON ..
```
In case header files or libraries are not at standard locations, one needs to manually adjust the library and include paths.
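One way to do this is via CMake's standard search-path variables, which are consulted by `find_path` and `find_library`; a sketch with made-up paths:

```shell
cmake -DCMAKE_INCLUDE_PATH=/opt/custom/include -DCMAKE_LIBRARY_PATH=/opt/custom/lib ..
```

This is a configuration fragment only; substitute the directories that actually contain your headers and libraries.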
- Every class needs a description of what it implements. If possible, use LaTeX for math.
- Every method needs a description, plus all parameters and return values documented.
Check existing code for inspiration. Documentation is important, so polish it as well as you can!
If you have doxygen installed, you can generate the documentation locally via running
and then opening `build/doc/doxygen/html/index.html` in a browser.