Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
README testsuite ================ This document tries to explain how to use and extend the testsuite. Written January 2009 by Sebastian Henschel <email@example.com> Usage ----- There are two major functions: - Generate testcase data. This is run seldomly, usually when new functionality has been added or algorithmic bugs have been found. The generator is implemented in python and uses SHOGUN's interface python-modular or perl-modular. The resulting files are in matlab/octave format. Just issue ./generate_testdata.py and all data files will be created in data/, within subdirectories according to the category of which they are part of. This will not remove previously created testcases, so you might want to run ./generate_testdata.py clear to remove these files. If you want to (re)create only the testcases for a specific category, issue ./generate_testdata.py <category> where category is one of: classifier, clustering, distance, distribution, kernel, preproc, regression. - Test your current iteration of SHOGUN against the previously created testcases. For that you have to change directory into the interface you want to test against, e.g. cd python-modular There you can test against all testcases or all of one category or individually. To test against all, just issue ./test_all.sh To test against all of one category, issue ./test_all.sh <category> where category is one of: see above To test individually, issue ./test_one.py ../data/<category>/<testcase>.m (python) ./test_one.pl ../data/<category>/<testcase>.m (perl) ./test_one.sh ../data/<category>/<testcase>.m (matlab/octave/R) If you run tests individually, you get to see more than ERROR in case of failure. Extension --------- Now that was easy, the difficult part is how to extend the beast. The overview given below is hopefully sufficient to help the uninitiated to succeed. - Generator Change the working directory to generator/. There are 4 modules to help with recurring operations: 1) category.py - defines the categories used, their names and representation as number (as a sort of enum) 2) dataop.py - helps generating various random data, like numbers, DNA sequences, random numbers structured in clouds, etc. 3) featop.py - helps creating feature objects out of the given data 4) fileop.py - helps preparing and writing the generated data to file. The first task now is to define the category where you want your testcase to be generated. If you should ever add another category, don't forget to add it to the constant CATEGORIES in __init__.py and to update category.py accordingly. Each category module has a function run() which is called via __init__.py and itself runs other run_* functions which group items together, for instance kernels with Byte features, kernels with String features, kernels that can handle subkernels, linear SVM classifiers, kernel-based SVM classifiers and so on. Basically each group function creates the necessary data and then establishes a hash called params which contains all the parameters that are necessary to run an item and must be written to file. This hash is often modified while the group function is executed, depending on which item is actually run. Sometimes it contains other hashes and tuples like kernel parameters, sometimes it is just another integer. Each item's parameters are defined either at the beginning of the group's run function (if relevant to the whole group) or shortly before the item is about to be run. Usually, a compute_* function is then executed which actually feeds the gathered data and parameters to SHOGUN and handles its output. Some compute_* functions are relevant to only one group function, some are used by several group functions, kernel.compute() is used by kernel.run_feats_byte() and kernel.run_distance(), for example while kernel.compute_pie() is only used by kernel.run_pie(). Classifiers (and regressions) are a bit different, because some of them run through a loop function which feeds the compute function with different values for C, number of threads, epsilon, etc., modifying the hash params while doing so. The compute_* functions interface with the fileop module to gather output data and write it all to file. Afterwards the next group or category is run or the generator has finished its job. - Testers Each interface has its own tester, although libshogun and cmdline have no tester at all at the moment. Furthermore, there are 2 groups which run somewhat similar code - the static and the dynamic interfaces. Static interfaces (python/, perl/, matlab_and_octave/, r/): Of course, the details are dependent on the actual interface, but all of them have some utility functions (util.py, util/*) to check accuracy, set features, set distances/kernels/classifiers and fixing some name inconsistencies between static and modular interface. the generator uses the modular interface and hence some names in the testcase files don't match what the static interface understands. These utility functions are common to all categories. The aforementioned test_all scripts call a test_one script for each testcase which takes care of converting the matlab/octave format of the testcase file into a local format. It then calls a module from the appropriate category. As in the generator, each category is represented in one file, e.g. kernel.py/kernel.m/kernel.R. For python or for perl, the function test() is the entry point from which various evaluate() functions are spawned, depending on the item to be tested. these evaluations are similar to the generator's compute() functions, but instead of writing the results and parameters too file, they are compared to the results in the testcase file. In octave/matlab, the main function of a category is named as the category. Due to the nature of how global variables are handled in matlab/octave, there is a file called util/globals.m which contains all possible variable names that can be in the testcase files. So be aware that you might need to update this file and also use 'global <varname>;' in the function where you want to use it. It is cludgy indeed, but did seem to be the only way to handle the problem with the limited capabilities of matlab and me. The structure of the R files resembles the structure octave/matlab, but in its own syntax of course and without the global variable cludge. Modular interfaces (python-modular/, perl-modular/, octave-modular/, r-modular/): Essentially, the structure is the same as in the static interfaces with the main difference being a different object handling in python/R, perl/R as in get_* functions to return the created objects instead of set_* where the data is fed bit by bit into one big state machine called SHOGUN. octave still uses set_* because it makes use of global variables which might be able to converted to get_* as well. The inclined reader is encouraged to try that.