README testsuite
================

This document tries to explain how to use and extend the testsuite.
Written January 2009 by Sebastian Henschel <shogun@kodeaffe.de>


Usage
-----

There are two major functions:

- Generate testcase data.
   This is done rarely, usually when new functionality has been added or
   algorithmic bugs have been found. The generator is implemented in python
   and uses SHOGUN's python-modular or perl-modular interface. The resulting
   files are in matlab/octave format.

   Just issue

     ./generate_testdata.py

   and all data files will be created in data/, within subdirectories
   according to their category. This will not remove
   previously created testcases, so you might want to run

     ./generate_testdata.py clear

   to remove these files.

   If you want to (re)create only the testcases for a specific category, issue

     ./generate_testdata.py <category>

   where category is one of: classifier, clustering, distance, distribution,
   kernel, preproc, regression.
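
   As a rough idea of what happens behind the scenes, the sketch below shows
   how the dispatch from the optional category argument to each category
   module's run() function could look. It is a minimal sketch for
   illustration only; the argument handling and import mechanics are
   assumptions, not the actual generate_testdata.py code.

     #!/usr/bin/env python
     # Hypothetical sketch of the dispatch in generate_testdata.py; the real
     # script differs in its details (the 'clear' argument is omitted here).
     import sys
     from generator import CATEGORIES   # constant described under Extension

     def main(argv):
         if len(argv) > 1:
             categories = [argv[1]]     # e.g. 'kernel'
         else:
             categories = CATEGORIES    # all categories

         for name in categories:
             # import e.g. generator/kernel.py and call its run() function,
             # which writes testcase files into data/<name>/
             module = __import__('generator.' + name, fromlist=[name])
             module.run()

     if __name__ == '__main__':
         main(sys.argv)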


- Test your current iteration of SHOGUN against the previously created
  testcases.
   For that, change into the directory of the interface you want to test
   against, e.g.

     cd python-modular

   There you can test against all testcases, against all testcases of one
   category, or against individual testcases.
   To test against all, just issue

     ./test_all.sh

  To test against all of one category, issue

    ./test_all.sh <category>

  where category is one of the categories listed above.

  To test individually, issue

    ./test_one.py ../data/<category>/<testcase>.m (python)
    ./test_one.pl ../data/<category>/<testcase>.m (perl)
    ./test_one.sh ../data/<category>/<testcase>.m (matlab/octave/R)

  If you run tests individually, you get more detailed output than just
  ERROR in case of failure.
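
  For orientation, the sketch below shows in python what a test_one-style
  runner roughly has to do: convert the matlab/octave-style testcase file
  into a local data structure and hand it to the matching category module.
  The parser shown is deliberately naive and the module lookup is an
  assumption; the real test_one scripts are more involved.

    # Hypothetical test_one-style runner; not the actual test_one.py.
    import sys

    def parse_testcase(filename):
        # testcase files use matlab/octave syntax; this naive parser only
        # handles simple scalar and string assignments like: width = 1.5;
        params = {}
        for line in open(filename):
            line = line.strip().rstrip(';')
            if '=' not in line:
                continue
            key, value = [part.strip() for part in line.split('=', 1)]
            params[key] = value.strip("'")
        return params

    def main(filename):
        params = parse_testcase(filename)
        category = filename.split('/')[-2]  # e.g. ../data/kernel/<testcase>.m
        module = __import__(category)       # e.g. kernel.py with test()
        sys.exit(0 if module.test(params) else 1)

    if __name__ == '__main__':
        main(sys.argv[1])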




Extension
---------

Now that was easy; the difficult part is extending the beast. The overview
given below is hopefully sufficient to help the uninitiated succeed.

- Generator
   Change the working directory to generator/.

   There are 4 modules to help with recurring operations:
    1) category.py - defines the categories used, their names and their
        numeric representation (a sort of enum)
    2) dataop.py - helps generate various random data, like numbers, DNA
        sequences, random numbers structured in clouds, etc.
    3) featop.py - helps create feature objects out of the given data
    4) fileop.py - helps prepare and write the generated data to file.

   The first task now is to define the category in which you want your
   testcase to be generated. If you should ever add another category, don't
   forget to add it to the constant CATEGORIES in __init__.py and to update
   category.py accordingly.

   Each category module has a function run() which is called via __init__.py
   and itself runs other run_* functions which group items together, for
   instance kernels with Byte features, kernels with String features,
   kernels that can handle subkernels, linear SVM classifiers, kernel-based
   SVM classifiers and so on.

   Basically, each group function creates the necessary data and then
   establishes a hash called params which contains all the parameters that
   are necessary to run an item and must be written to file. This hash is
   often modified while the group function is executed, depending on which
   item is actually run. Sometimes it contains other hashes and tuples, like
   kernel parameters; sometimes it is just another integer. Each item's
   parameters are defined either at the beginning of the group's run
   function (if relevant to the whole group) or shortly before the item is
   about to be run.

   Usually, a compute_* function is then executed which actually feeds the
   gathered data and parameters to SHOGUN and handles its output. Some
   compute_* functions are relevant to only one group function, some are
   used by several group functions: kernel.compute() is used by
   kernel.run_feats_byte() and kernel.run_distance(), for example, while
   kernel.compute_pie() is only used by kernel.run_pie().

   Classifiers (and regressions) are a bit different, because some of them
   run through a loop function which feeds the compute function with
   different values for C, number of threads, epsilon, etc., modifying the
   hash params while doing so.

   The compute_* functions interface with the fileop module to gather output
   data and write it all to file. Afterwards the next group or category is
   run, or the generator has finished its job. A simplified sketch of this
   flow is given below.
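
   The following is a deliberately simplified, hypothetical group function
   and compute function for the kernel category. It follows the structure
   described above (dataop/featop/fileop helpers, a params hash, a compute_*
   function), but the helper names, their return values and the
   fileop.write() signature are assumptions made for illustration, not the
   actual generator code.

     # Hypothetical group/compute functions; the real generator/kernel.py
     # differs in names and details.
     import dataop, featop, fileop
     from shogun.Kernel import GaussianKernel

     def run_feats_real():
         # create random train/test data and wrap it in feature objects;
         # both helpers are assumed to return dicts with 'train'/'test' keys
         data = dataop.get_rand()
         feats = featop.get_simple('Real', data)

         # params holds everything that must end up in the testcase file
         params = {
             'name': 'Gaussian',
             'accuracy': 1e-8,
             'args': {'key': ('width',), 'val': (1.5,)},
         }
         _compute(feats, data, params)

     def _compute(feats, data, params):
         # feed the gathered data and parameters to SHOGUN, keep its output
         kernel = GaussianKernel(feats['train'], feats['train'],
                                 *params['args']['val'])
         params['km_train'] = kernel.get_kernel_matrix()
         kernel.init(feats['train'], feats['test'])
         params['km_test'] = kernel.get_kernel_matrix()

         # hand everything to fileop, which writes a matlab/octave file
         # into data/kernel/ (assumed signature)
         fileop.write('kernel', data, params)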



- Testers
   Each interface has its own tester, although libshogun and cmdline have no
   tester at all at the moment. Furthermore, there are two groups which run
   somewhat similar code: the static and the modular interfaces.


   Static interfaces (python/, perl/, matlab_and_octave/, r/):

   Of course, the details depend on the actual interface, but all of them
   have some utility functions (util.py, util/*) to check accuracy, set
   features, set distances/kernels/classifiers and fix some name
   inconsistencies between the static and modular interfaces. The generator
   uses the modular interface and hence some names in the testcase files
   don't match what the static interface understands. These utility
   functions are common to all categories.

   The aforementioned test_all scripts call a test_one script for each
   testcase which takes care of converting the matlab/octave format of the
   testcase file into a local format. It then calls a module from the
   appropriate category. As in the generator, each category is represented
   in one file, e.g. kernel.py/kernel.m/kernel.R.

   In python and perl, the function test() is the entry point from which
   various evaluate() functions are spawned, depending on the item to be
   tested. These evaluations are similar to the generator's compute()
   functions, but instead of writing results and parameters to file, they
   compare the computed results to those in the testcase file.

   In octave/matlab, the main function of a category is named after the
   category. Due to the nature of how global variables are handled in
   matlab/octave, there is a file called util/globals.m which contains all
   possible variable names that can appear in the testcase files. So be
   aware that you might need to update this file and also use
   'global <varname>;' in the function where you want to use the variable.
   It is kludgy indeed, but it did seem to be the only way to handle the
   problem with the limited capabilities of matlab and me.

   The structure of the R files resembles that of the octave/matlab files,
   but in its own syntax of course and without the global variable kludge.
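
   To illustrate the shape of a static tester, the sketch below shows a
   hypothetical test() function for the kernel category together with an
   accuracy check in the style of util.py. The sg() command strings and the
   util helper names are assumptions made for illustration; consult the code
   in python/ for the real details.

     # Hypothetical static-interface tester for the kernel category; the
     # sg() command names and util helpers are illustrative assumptions.
     import numpy
     from sg import sg
     import util                     # python/util.py helpers

     def _check_accuracy(accuracy, **diffs):
         # every difference handed in must stay below the testcase accuracy
         for name, diff in diffs.items():
             if numpy.max(numpy.abs(diff)) >= accuracy:
                 print('%s exceeds accuracy %e' % (name, accuracy))
                 return False
         return True

     def test(indata):
         # indata is the testcase file converted into a python dict
         util.set_features(indata)   # assumed util helper
         util.set_kernel(indata)     # assumed util helper

         sg('init_kernel', 'TRAIN')  # assumed static commands
         km_train = sg('get_kernel_matrix')
         sg('init_kernel', 'TEST')
         km_test = sg('get_kernel_matrix')

         return _check_accuracy(indata['accuracy'],
                                train=km_train - indata['km_train'],
                                test=km_test - indata['km_test'])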


   Modular interfaces (python-modular/, perl-modular/, octave-modular/, r-modular/):

   Essentially, the structure is the same as in the static interfaces. The
   main difference is the object handling in python, perl and R: get_*
   functions return the created objects, instead of set_* functions feeding
   the data bit by bit into one big state machine called SHOGUN.
   octave still uses set_* because it makes use of global variables; these
   might be convertible to get_* as well. The inclined reader is encouraged
   to try that.
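
   For comparison with the static sketch above, here is a hypothetical
   get_*-style helper and evaluate function as they might appear in
   python-modular. The indata keys and helper names are assumptions made
   for illustration; the imports follow the python-modular API, but check
   the code in python-modular/ for the real details.

     # Hypothetical modular-interface tester for the kernel category; the
     # indata keys and helper names are illustrative assumptions.
     import numpy
     from shogun.Features import RealFeatures
     from shogun.Kernel import GaussianKernel

     def get_features(indata):
         # modular style: build and *return* the objects instead of pushing
         # the data into one global SHOGUN state via set_* commands
         return {'train': RealFeatures(indata['data_train']),
                 'test': RealFeatures(indata['data_test'])}

     def evaluate(indata):
         feats = get_features(indata)
         kernel = GaussianKernel(feats['train'], feats['train'],
                                 indata['width'])
         km_train = kernel.get_kernel_matrix()
         kernel.init(feats['train'], feats['test'])
         km_test = kernel.get_kernel_matrix()

         # compare against the matrices stored in the testcase file
         accuracy = indata['accuracy']
         return (numpy.max(numpy.abs(km_train - indata['km_train'])) < accuracy
                 and
                 numpy.max(numpy.abs(km_test - indata['km_test'])) < accuracy)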