The algorithm as proposed in:
O. Pujol, P. Radeva, and J. Vitrià. "Discriminant ECOC: A heuristic method for application dependent design of error correcting output codes"
Thanks for the PR!
The paper you implemented has been cited 141 times, so it fits our rule that implemented papers should be sufficiently prominent.
Could you benchmark the method (accuracy and training time) on a few datasets? It doesn't need to be super fancy; it's just to get a rough idea.
Rather than creating an ecoc_utils file, I would create a multiclass/ folder and put ECOC-related functions and classes in an ecoc.py file.
I did some tests, and it turns out the current implementation is really slow (I tried to benchmark it on the MNIST dataset with 60,000 samples; I left it running overnight and it didn't even finish). The algorithm spends most of its time calculating quadratic mutual information. Do you know any faster alternative to the calculation proposed in the paper, or an efficient implementation of it? On the other hand, to calculate quadratic mutual information we compute three different terms, V_in, V_all and V_btw, which share a lot of calculations. I think we could take advantage of that and avoid repeating the same work.
I made some notes on how I mean to speed up calculating quadratic mutual information. Please, have a look and let me know what you think about this, thanks a lot!
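One way the shared work could be factored out (a rough sketch of the Parzen-window QMI estimator with Gaussian kernels, not the PR's actual notes; all function and variable names are illustrative):

```python
import numpy as np

def qmi_terms(X, y, sigma=1.0):
    """Sketch of quadratic mutual information between features X and a
    class partition y, estimated with Parzen windows and Gaussian kernels.

    V_in, V_all and V_btw all reduce to sums over the same pairwise
    kernel matrix K, so computing K once avoids the duplicated work
    mentioned above. Illustrative only, not a reference implementation.
    """
    n = X.shape[0]
    # Pairwise squared distances -> Gaussian kernel matrix, computed once.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-d2 / (2 * sigma ** 2))

    classes = np.unique(y)
    p = np.array([(y == c).mean() for c in classes])  # class priors

    # V_in: kernel sums restricted to within-class pairs.
    v_in = sum(K[np.ix_(y == c, y == c)].sum() for c in classes) / n ** 2
    # V_all: total kernel sum weighted by the sum of squared priors.
    v_all = (p ** 2).sum() * K.sum() / n ** 2
    # V_btw: cross term between class-conditional and marginal densities.
    v_btw = sum(p[i] * K[y == c, :].sum()
                for i, c in enumerate(classes)) / n ** 2

    # QMI estimate: V_in + V_all - 2 * V_btw.
    return v_in + v_all - 2 * v_btw
```

Since the kernel matrix dominates the cost, it could be built once per tree node and reused across all three terms (and across candidate bipartitions that share samples).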
Implemented Discriminant ECOC
Implemented dynamic calculation of quadratic mutual information
Calculating only lower triangle of the DECOC quadratic mutual informa…
Implemented parallel decoc
Passing indices of classes to create_ecoc books instead of classes th…
Sped up sigma calculation
Added support for multi-label classification
I just did a couple of experiments: on the protein UCI dataset (time: 3921 seconds, score: 0.669) and the USPS dataset (time: 361 seconds, score: 0.62). Are such scores acceptable? I also noticed that for most problems (especially those with a huge number of samples) the time spent creating the code book is not very significant compared to the time spent training the binary classifiers. However, the algorithm also speeds up that step, because we train each classifier only on the subset of classes picked for it.
Thanks for the follow-up work. What is the accuracy of random codebook on the same datasets? Could you give results for different codebook sizes? Thanks!
@mblondel Actually, I didn't get any results for the random codebook on those datasets, because it took too much time on my laptop. The reason is that Discriminant ECOC also partitions the data, so we don't need to use all of the classes for each classifier (we use only the subset of classes currently in our tree node). Would it be suitable to test on some smaller dataset instead? What do you think?
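To illustrate why each node's classifier only sees a subset of classes, here is a hedged sketch of the tree-style codebook construction (the QMI criterion is abstracted behind a `score_partition` callback; all names and the column encoding are illustrative, nothing here is from the actual PR):

```python
import numpy as np
from itertools import combinations

def decoc_codebook(classes, score_partition):
    """Sketch of DECOC-style tree construction: each internal node picks
    the bipartition of its *current* class subset that maximizes a
    criterion (quadratic mutual information in the paper), and
    contributes one codebook column in which classes outside the node's
    subset stay 0, so their samples are skipped when training that
    node's binary classifier."""
    columns = []

    def recurse(subset):
        if len(subset) < 2:
            return
        # Enumerate bipartitions of the current subset only -- nodes
        # deeper in the tree are cheaper to score and to train.
        best = None
        for r in range(1, len(subset) // 2 + 1):
            for left in combinations(subset, r):
                right = tuple(c for c in subset if c not in left)
                s = score_partition(left, right)
                if best is None or s > best[0]:
                    best = (s, left, right)
        _, left, right = best
        col = {c: 0 for c in classes}       # classes not in this node: 0
        col.update({c: +1 for c in left})   # positive side of the split
        col.update({c: -1 for c in right})  # negative side of the split
        columns.append(col)
        recurse(left)
        recurse(right)

    recurse(tuple(classes))
    # Rows = classes, columns = one binary problem per tree node
    # (K - 1 columns for K classes).
    return np.array([[col[c] for col in columns] for c in classes])
```

With this structure, a node near the leaves trains on only two or three classes' samples, which matches the speed-up described above.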
@mblondel The accuracy on those datasets is very similar for randomly generated code books. However, a random code book is twice as big as the one generated by Discriminant ECOC, and lowering the size of the random code book decreases its accuracy.
Currently the random code book uses only two symbols (0 and 1). We could try generating three symbols (-1, 0 and +1) and ignoring all samples for which the symbol is 0.
@mblondel Do you mean using (-1, 0 and +1) for all Error Correcting Output Codes, or only the random code for now?
Yes, it would be interesting to know if it works.
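The ternary-codebook idea discussed above could be sketched like this (a rough illustration, not scikit-learn code; `zero_prob`, the redraw loop, and both function names are assumptions):

```python
import numpy as np

def random_ternary_codebook(n_classes, n_columns, zero_prob=0.3, seed=0):
    """Sketch of a random code book over {-1, 0, +1}: when training the
    binary classifier for a column, samples whose class has symbol 0
    are simply left out, shrinking each binary problem."""
    rng = np.random.RandomState(seed)
    while True:
        book = rng.choice(
            [-1, 0, 1],
            size=(n_classes, n_columns),
            p=[(1 - zero_prob) / 2, zero_prob, (1 - zero_prob) / 2],
        )
        # Every column must contain both +1 and -1, otherwise its
        # binary problem is degenerate; redraw until that holds.
        if ((book == 1).any(axis=0) & (book == -1).any(axis=0)).all():
            return book

def column_training_mask(book, y, column):
    """Boolean mask selecting only the samples used for one column's
    binary classifier (those whose class symbol is nonzero)."""
    return book[y, column] != 0
```

This would make the random baseline comparable to DECOC in the sense that both skip part of the data per binary problem.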
In light of #3768, I'm not sure we want this. ECOC doesn't really seem to work better than OvR in most settings. Close?