TODO_Code

+ [62] B:10.07.2012 E:??.??.????
  Task: For the "CorrOrder" coding method, we usually employ a very low
    sparsity value (to the tune of 0.01~0.2 depending on the number of coding
    words). It's wasteful to sort all similarities for an observation, and
    then use only the top of them. Using a heap we can determine the top k and
    then we can sort just these (with heap sort). The time will go down from
    O(wlogw) per observation to O(wlogk+klogk) per observation, where k is
    "sparsity_control * w". For low sparsity values, the time is essentially linear.

+ [61] B:02.07.2012 E:09.07.2012
  Task: Add all pre-processing steps for dictionary learning but none of them
    for coding (except maybe DC offset). This seems to be a more natural approach.
  Resolved: Made ZCA transform optional for learning dictionaries. In rest, it
    is not used.
 
+ [60] B:21.06.2012 E:09.07.2012
  Task: Add "Learn:GradSt" as valid learning methods in "window_sparse_recoder".
    Also make "learn.grad" divide by the number of samples.
  Resolved: Did this in the new update as requested.

- [59] B:21.06.2012 E:??.??.????
  Task: Make "load_image_cifar" more robust. Add in-function tests for expected input
    format and add tests for these modes of failure.

+ [58] B:21.06.2012 E:09.07.2012
  Task: Make the "logger" object passed to "dataset.load_*" functions be non-optional.
  Resolved: Made them non-optional and simplified the interface quite a lot.

+ [57] B:19.06.2012 E:09.07.2012
  Task: See if it is possible to consider the node message of a "new_*" or
    "beg_*" as being part of the previous type of message. This will make some
    visualizations a little bit more verbose, but maybe a little bit more clear.
  Resolved: It was possible. The output is a little bit more verbose, but we
    get a clearer picture of what's happening.

+ [56] B:19.06.2012 E:09.07.2012
  Task: Remove all the cruft that has gathered in the "logging" module. Remove
    in general all references to architectures.
  Resolved: I've removed the extra stuff from "logging" (Dataset_IO,
    Architecture) and added Regressor for the future. I've also simplified a
    lot of things there, including having just Experiment, Transform,
    Classifier, Regressor and Results as logging levels.

+ [55] B:07.05.2012 E:09.07.2012
  Task: Check all occurances of "samples" in the source code (even in logging
    messages) and replace it with "sample". This also includes expressions
    like "each sample".
  Resolved: Too specific.

- [54] B:06.05.2012 E:??.??.????
  Task: See where "logger.new_*" is used in critical code and remove
    it. Experiments with "transforms.image.window_sparse_recoder" have shown
    that code without many calls to "logger.new_*" inside critical loops is
    faster. This comes as a surpise, as the loops were small (3x3x7x7 in the
    experiment) and most of the heavy work was done in vectorized
    MATLAB. Also, see why "logger.beg_*" is so expensive. It adds two seconds
    to the test runs.

+ [53] B:30.04.2012 E:09.07.2012
  Task: Add "confidence" surface test for classifiers. This should be a
    visualization of the per-class confidence returned for the domain points
    for each classifier.
  Resolved: Added this for all classifiers as well as "sane-ish" confidence
    score computation.

+ [52] B:19.04.2012 E:09.07.2012
  Task: Add "RBF" as a coding method. A RBF-like distance is taken to each
    dictionary element and the catenation of all distances is the feature
    vector of a sample. This is similar to what is done in SVMs.
  Resolved: I've decided on no longer using distances as features, as, in
    general, the activation distribution tended to be uniform, which didn't
    help a lot for classification (we need a tightly peaked around 0 distribution).

+ [51] B:18.04.2012 E:09.07.2012
  Task: Separate the "coding" algorithms: correlation, matching pursuit and
    orthogonal matching pursuit into a separate module etc. Separate the
    "dictionary learning" algorithms: k-means, gradient descent etc. into a
    separate module, which accept a coding algorithm as input.
  Resolved: I've made this move. The class "transforms.record.dictionary"
    defines what a dictionary coding method does, and implements several
    coding methods, while several subclasses of this, in the
    "+transforms/+record/+dictionary" module, define different ways to learn
    the dictionary words.

+ [50] B:11.04.2012 E:09.07.2012
  Task: Add progress reports for long experiments. Delivered by mail, ofcourse.
  Resolved: Too fancy.

+ [49] B:04.04.2012 E:09.07.2012
  Task: Add stricter constraints to a dataset. It must contain at least one
    point from each class. Or at least one point. See whichone makes more
    sense.
  Task: Neither made sense. Dataset is OK as it is.

+ [48] B:29.03.2012 E:12.04.2012
  Task: Formalize the types of exceptions we will see in an application. For
    the moment we have:
       master:NoLoad
       master:NoConvergence
    The current model of "master:modules...:class:function:error" is not good
    for bigger handling. This is linked to TODO item [1] as well.
  Resolved: I've defined several types of errors until now. They are simply:
      * NoLoad: something could not load a file.
      * NoOpen: something could not open a file for writing.
      * InvalidFormat: an loaded file did not have the expected format and there
          was nothing we could do to recuperate.
      * NoConvergence: a classifier could not converge on a solution. Usually,
          raised in response to a MatLAB exception.

+ [47] B:29.03.2012 E:12.04.2012
  Task: Formalize the "params" type structure and make it work with all kinds
    of datatypes, like it did in the "RecognizeThis" implementation. We'll
    also drop the usage of "allcomb" from the project.
  Resolved: I've added the "prams" "module" and it works with all kinds of values
    and even has dependencies in the form of "params.depend" and "params.condition"
    fields.

+ [46] B:28.03.2012 E:12.04.2012
  Task: Setup a guideline for the order of properties and methods in a
    class. We now have, first properties, then methods, which start with
    constructors, then "basic" functions like "eq", "ne" etc. Then we have
    static methods, and finally we have a static method "test".
  Resolved: I've added an order and made all classes follow it. The order is:
      * Immutable properties.
      * Mutable properties for handle classes.
      * Public methods:
          * Constructor.
          * Basic methods: eq,ne,gt etc.
          * Other methods.
      * Protected methods.
      * Private methods.
      * Public static methods.
      * Protected static methods.
      * Private static methods.
      * The "test" function.    

+ [45] B:28.03.2012 E:12.04.2012
  Task: Setup a guideline for placement of assert pre-conditions in the bodies
    of functions. For the moment we have:
      * first, assertions relating to individual parameters.
      * second, assertions relating to groups of parameters.
    For the first kind, in the assertion conjunction, we first have conditions
    on the shape of the object, then other shape related constraints, then on
    the type of the object, then any other extra type constraint. Move all
    classes to follow this guideline.
    Also, an assert should check just one condition, not conjunctions of
    conditions, except in special cases.
  Resolved: I've made all classes and functions follow these guidelines: first
    structure and structure properties, then type and domain properties then extra
    object properties. This is for all arguments, in the order in which they appear
    in the argument list. After these there are asserts for more than one object.
    These appear in argument list order of the first objects that appear in the
    assert body, then the second if first are equal etc. One exception is structure
    properties, like the length of a vector, which might depend on another object.
    These appear at structure properties and not in combined asserts.

+ [44] B:28.03.2012 E:12.04.2012
  Task: Move the "data/test" directory to "test". All IO stuff done by tests
    should be from this directory, and this no longer includes just reading
    some data, but also writing some log files.
  Resolved: Added the "test" directory where test data is stored and where logs
    can be written during tests.

+ [43] B:28.03.2012 E:12.04.2012
  Task: Make all logging paths used in tests be in "../data/test" instead of
    just in "../data".
  Resolved: See message for TODO item [44].

+ [42] B:28.03.2012 E:12.04.2012
  Task: There is a problem with logging and exception handling. All our
    exceptions should be fatal, so maybe it won't materialize. There is a way
    to solve this though. I need to visit each logging and exception throwing
    function (which are basically functions wich log and do IO) and take care
    that all "beg_*" are appropriately matched in exception handlers.
  Resolved: There are few places where logging is in effect and exceptions can
    be thrown ("classifiers.svm" and "classifiers.one_vs_one" are two examples).
    Others are either in IO functions, where logging is controlled and in the
    creation of handlers ("logging.handlers.file"), where logging doesn't yet
    exist in most applications (you create handlers before loggers). Therefore,
    the effects are easily controlled. Also, the fact that we use "new_*" mitigates
    some of these problems.

+ [41] B:26.03.2012 E:09.07.2012
  Task: In many places we test that a dataset has at least one element. A
    dataset cannot be empty, so this test is redundant. See if it is worth it
    to eliminate it or if it has some documentation purposes.
  Resolved: The new formulation for "dataset" has this implicit. A "dataset"
    cannot be empty.

+ [40] B:26.03.2012 E:12.04.2012
  Task: Add a function "utils.string_in" or "tc.string_in" which checks that a
    string is equal to one from a cell array of other strings.
  Resolved: Added "tc.one_of" which tests if an object belongs to a collection.
    This is not limited to strings though.

- [39] B:26.03.2012 E:??.??.????
  Task: Add tests after calls to abstract methods to make sure the results are
    compatible.

+ [38] B:22.03.2012 E:??.??.????
  Task: There are a lot of places in the code, where functions use optional
    parameters. We should define a function in "utils" which, by accessing the
    workspace of the calling function, can provide a default value for a
    variable. It should work like.

      aa = utils.get_with_default('sender','coman@inb.uni-luebeck.de'); 
  Resolved: There aren't that many functions with optional parameters now,
    after the big rewrite.

+ [37] B:21.03.2012 E:12.04.2012
  Task: Replace usage of "!" syntax to call system utilities with calls to the
    "system" function and do proper testing of return codes.
  Resolved: I've replaced usage of "!" with calls to "system" and proper checking
    of return codes.

+ [36] B:20.03.2012 E:12.04.2012
  Task: "gray_images_set" maybe should not keep a copy of the original images
    around. For larger datasets or when many copies need to be made (like for the
    "random_corr_transform") a lot of memory is wasted.
  Resolved: Not really a problem after all. MatLAB copying is cheap on the one hand
    and for most processing images are unaffected ("sparse" and co. discard image
    information and produce a simple "dataset".

+ [35] B:20.03.2012 E:12.04.2012
  Task: See if there is a way to make "one_vs_one_classifier" safer by using
    "meta.class" and similar mechanisms in MatLAB. More specifically, pass a
    metaclass object to the classifier constructor instead of an anonymous
    function. 
  Resolved: I've added "architectures" after this task was submitted. These have
    more complex object creation requirements than we can satisfy using just
    "meta.class". We're going to have to rely on constructor functions for this one.

+ [34] B:19.03.2012 E:12.04.2012
  Task: "svm_classifier" should signal when it couldn't converge to a
    solution. Ditto for "logistic_regression_classifier" if we keep the MatLAB
    version. The proper mechanism should be raising an exception. Calling code
    should propagate the exception (in "architecture", most often, or
    "one_vs_one_classifier") until someone can handle it, usually by ignoring
    it and printing a message to the user.
  Resolved: I've added exception handling code to "classifiers.svm" and 
    "classifiers.one_vs_one" to handle this problem.

+ [33] B:19.03.2012 E:12.04.2012
  Task: Add a "one_sample" mechanism to "architecture", similar to the one in
    "classifier".
  Resolved: I've added the required "one_sample" as well as chains of tests for
    all the transforms we're doing on the data.

+ [32] B:19.03.2012 E:12.04.2012
  Task: explore the option of keeping a concatenated version of the images in
    a "gray_images_set" in one large image. See what operations could be
    hastened by this approach.
  Resolved: I've tried this approach with "transforms.image.random_corr" with
    unsatisfactory results. There was no noticeable increase in performance. The
    transformation was done locally, in the "do_code" function, though, so this
    might be the cause, but, in any case, it's too much work for too little gain
    to do such a massive change now.

+ [31] B:16.03.2012 E:12.04.2012
  Task: See what can be done about relaxing the "unitreal" constraint on
    images. At least some kind of non-image 2d or 3d signals could be
    represented as images, but for the unitreal constraint.
  Resolved: Images are no longer forced to be "unitreal". Just after loading,
    they indeed are, but with further processing it becomes increasinly unsafe
    to rescale images. Thefore we let them be and do rescaling just for
    visualization.

+ [30] B:16.03.2012 E:09.07.2012
  Task: Add a subclass of reversible transform for all dictionary based
    transform methods (save PCA), if it seems that it makes sense to share
    code like "corr_code" or "mp_code" or "omp_code" etc. between transforms.
  Resolved: Related to [51]. This was solved by solving that.

+ [29] B:14.03.2012 E:15.03.2012
  Task: Add a "image_resize_transform" which does resizing of
    "gray_images_set".
  Resolved: I've added the transform and some tests for it.

+ [28] B:14.03.2012 E:12.04.2012
  Task: Reorganize the file/module structure a little. For now, all classes
    live in the top-level. As we continue adding stuff, the top-level will
    become more crowded.
  Resolved: I've solved this. We have the following directory structure:
      * experiments.
      * architectures.
      * classifiers.
      * transforms.
          * image.
          * sparse.

- [27] B:14.03.2012 E:??.??.????
  Task: Add a "normal_bayes_classifier" which does classification using Bayes
    rule and assuming data is distributed normally, given a class. 

- [26] B:14.03.2012 E:??.??.????
  Task: Add a "probs_classifier" which does classification based on the class
    with the highest probability. This is the simplest probabilistic
    classifier.

+ [25] B:14.03.2012 E:15.03.2012
  Task: Add a "means_classifier" which does classification based on the
    distances from a sample to the mean of a class. This is the simplest
    geometric classifier.
  Resolved: I've added the classifier and the usual tests for it.

+ [24] B:14.03.2012 E:09.07.2012
  Task: Add a whole sleuth of other transforms besides the "dct_transform".
    Think of "fourier_transform", "random_patches_transform" and
    "random_weights_transform".
  Resolved: Task was open-ended. Some things were added, some things were not.
   
- [23] B:13.03.2012 E:??.??.????
  Task: Add "dct_transform".

+ [22] B:13.03.2012 E:12.04.2012
  Task: "gray_images_set.load_csvfile" produces a "samples_set" object. This
    is not quite what a user of "gray_images_set" expects. In fact, a user of
    this class wouldn't use this methods to begin with. We must either mask it
    or make it do something useful for images.
  Resolved: This was resolved since the last massive commit, but I've missed it
    somehow. "datasets.images" is no longer a descendant of "datasets.record" (as
    it would have been, have we kept the system at the time of this task's
    introduction). This problem cannot occur anymore.

- [21] B:13.03.2012 E:??.??.????
  Task: Make "display" variable passed to all tests a little bit more
    meaningful. It should have values "none", "slow", "fast", "step" for
    different levels of control.

+ [20] B:13.03.2012 E:12.04.2012
  Task: Lots of small tasks for "sparse_sgdmp_transform".
    * Change normalization code to work vectorized. Implement norm by hand
      and do fast matrix math on whole dict.
    * Store an learning rate schedule as a vector of learning rates at
      different times, computed from "initial_learning_rate",
      "final_learning_rate" and "max_iter_count". This will make calling
      "dict_gradient_descent" easier and will make it easier somewhat to
      change the learning rate schedule.
    * Make "matching_pursuit" work on multiple samples at a time. This is
      required both for fast batch mode operations and for fast coding.
    * Optionally make it accept an initial directory (for cases where you
      have hints of what a good directory might be or when you want to
      continue the training from a previous session). It could also accept a
      schedule parameter directly, instead of several schedule defining
      parameters.
  Resolved: I've vectorized the normalization code and made "matching_pursuit"
    as well as "do_code" and "do_decode" work vectorized. I've also added the
    "correlation", "matching_pursuit_alpha" and "ortho_matching_pursuit" ways
    to do coding, and added an extra parameter to the constructor to allow
    user selection. The other things will be added if needed.

+ [19] B:13.03.2012 E:09.07.2012
  Task: Add batch mode options for learning the sparse dictionary
    "transforms.sparse.sgdmp". Currently it does only online learning.
    A "batch_size" parameter should control how many vectors are selected per
    update.
  Resolved: In the new setup this is solved in the "transforms.record.dictionary.learn.grad_st"
    learning method.

+ [18] B:13.03.2012 E:15.03.2012
  Task: Make the testing code for "gray_images_set" which load the MNIST
    training data, load a smaller set instead (either MNIST test data or a
    smaller version alltogether).
  Resolved: I've made the test use the MNIST test data.

+ [17] B:13.03.2012 E:12.04.2012
  Task: Transforms which require training should also store one sample of
    their training data. This should be used later in calls to "code" and
    "decode" for compatibility testing.
  Resolved: I've added two extra properties, "one_sample_plain" and "one_sample_coded",
    which each derived class must fill in (sadly the base class constructor can't
    do this, without adding a lot of complexity and even more contracts that
    must be implicitly satisfied). These are used for compatibility testing
    mostly, though some transforms also use them to retrieve extra information
    during coding or decoding.

+ [16] B:13.03.2012 E:09.07.2012
  Task: We currently have no support for data of an unknown class. These
    appear quite a lot as data for which we have no labels and that we want to
    classify or as data for unsupervised pre-training. The support for this is
    "sketchy" at best: the "gray_images_set.load_from_dir" function assigns a
    class of "none" to the images it loaded. I should fix this. There should
    indeed be a special class, name it "__none__" or something which is not a
    valid label now, like -1, and make it present in every dataset "classes"
    list. Also a label index of -1 could mean that we don't know the class of
    the data. We should add the required code and add and change the required
    tests for this to work well. Transforms won't be affected by this change
    but classifiers should. They should ignore data that has the "__none__"
    label. This would be useful for "nicer" implementations of "one-vs-one"
    and "one-vs-all" classification. To prevent excessive copying we could
    also introduce a special mask operation so certain samples are ignored for
    purposes of classification etc.
    In any case, at first, think about this, then implement it, because it
    seems like a big change for the framework.
  Resolved: This was solved earlier, but I forgot to update the TODO.
    Separating the sample data and classification/regression information
    achieves the request's problems.

+ [15] B:09.03.2012 E:12.04.2012
  Task: Introduce more succint assertions for the objects we work with to use
    in tests. Instead of writing 10-15 assert statements, a function could
    take as parameters the desired form of an object and return true or false
    and the problem localization.
  Resolved: This is partial actually. I've moved many tests to use "tc.same".
    Some more complex tests could be replaced by proper use of "tc.same" or
    something extra, but it's not worth it, for now.

+ [14] B:09.03.2012 E:12.04.2012
  Task: Make tests more to the point. We should test computed values to be of
    a certain value, instead of testing also for form, domain etc. There are
    some cases where, indeed, this cannot be done, but we should try to do it
    for the rest.
  Resolved: See message for TODO item [14].

+ [13] B:08.03.2012 E:08.03.2012
  Task: Add a fast mode for testing where no display of images is done. This
    is also useful for running tests on other machines which may not have X.
  Resolved: I've added to all "test" functions an argument named "display". If
    this is present and "true" then image display occurs. This helps "all_tests"
   speed by a lot.

+ [12] B:08.03.2012 E:08.03.2012
  Task: There are bugs in the part of the test where we display images. More
   precisely, some figures are not properly closed after a test. I must make
   sure they work as intended.
  Resolved: I've made the code much simpler. It depends on MatLAB keeping
    track of the required interal figure handle and axis handle state.

+ [11] B:08.03.2012 E:08.03.2012
  Task: Whenever we test for an "object" we should also test for it being a
    scalar.
  Resolved: I've added the required checks.

+ [10] B:08.03.2012 E:??.??.????
  Task: Add timing code to "all_tests" so we know how much time a full test
    suite takes.
  Resolved: I've added such timing code.

+ [9] B:08.03.2012 E:08.03.2012
  Task: Add tests which work on images to "pca_transform",  "pca_whitening_transform"
    and "zca_transform". Also display the original images in the tests for
    "utils.remap_images_to_unit" because it is helpful to see how MatLAB displays
    such images badly.
  Resolved: I've added the image viewing code for "pca_transform" and
    "pca_whitening_transform" as it made the most sense there. Other commits
     have fixed the "utils.remap_images_to_unit" tests.

+ [8] B:07.03.2012 E:08.03.2012
  Task: There is a degree of non-uniformity in the logging for tests. I should
    fix this.
  Resolved: I've made all test logging messages follow a similar format and
    I've also made them more succint.

+ [7] B:07.03.2012 E:08.03.2012
  Task: Move all datasets used  in test in "$PROJECT_ROOT/data/test" so
    they'll be accessible no matter what.
  Resolved: I've moved all datesets used in testing in the required directory.

+ [6] B:06.03.2012 E:07.03.2012
  Task: In test functions add "printf" statements for subtests as well as full
    tests.
  Resolved: Added "printf" statements appropriate to each subtest.

+ [5] B:06.03.2012 E:07.03.2012
  Task: Make code that tests transforms use an approximate test for equality
    of samples. Due to numeric precision errors we might have problems in our
    tests. It's not something we want.
  Resolved: Added the function "utils.approx" and used it in all places where
    transform results comparisons were made.

+ [4] B:06.03.2012 E:06.03.2012
  Task: Add tests in "tc" structure functions for empty values. In particular,
    "tc.scalar", "tc.vector", "tc.matrix" and "tc.tensor" should return false on
    empty objects.
  Resolved: I've redefined the "type-tree" for this project and rewritten
    tests so they are simpler and more robust. Also, the structural functions
    have been modified as required by this task.

+ [3] B:06.03.2012 E:06.03.2012
  Task: Make all dependent properties be immutable. Since our objects contain
    only immutable fields, it makes no sense to have dependent properties. We
    can compute them only once, at construction time, and be done with
    that. This should reduce some of the clutter in "samples_set" and "gray_images_set".
  Resolved: Changed the properties in "samples_set" and "gray_images_set" to
    be immutable and computed in the constructor.

+ [2] B:05.03.2012 E:05.03.2012
  Task: In all "test" functions, make the destruction of objects between tests
    in the order of object creation. Alternatively, use "clear all".
  Resolved: Added a clear all after each test. Also, I've renamed objects so
    they are no longer unique at the function level. For example, most sample
    sets are now called "s"  instead of "s1", "s2" etc. Derived versions of
    such sets are, of course, still called "s_p", "s_f11" etc.

+ [1] B:05.03.2012 E:12.04.2012
  Task: Make use of the "Causes" field in MatLAB exception handling in the
    code so far. This seems like an implementation of the "hierarchical"
    exception handling I've been thinking about.
  Resolved: This was not really needed. Two months of working on this have made
    this point clear. Exception handling is not used that much. It appears in
    dataset and logger IO operations and in "classifiers.svm" when it does not
    converge.