[MGR] Implemented Discriminant ECOC #2391


@kpysniak
Contributor

The algorithm as proposed in:

O. Pujol, P. Radeva, and J. Vitrià. "Discriminant ECOC: A heuristic method for application dependent design of error correcting output codes"

@mblondel
Member

Thanks for the PR!

The paper you implemented was cited 141 times so it fits our rule that implemented papers should be prominent enough.

Could you benchmark the method (accuracy and training time) on a few datasets? It doesn't need to be super fancy, it's just to have a rough idea.

Rather than creating a ecoc_utils file, I would create a multiclass/ folder and put ECOC-related functions and classes in a ecoc.py file.

@kpysniak
Contributor

I did some tests, and it turns out the current implementation is really slow (I tried to benchmark it on the MNIST dataset with 60,000 samples; I left it running for about a night and it didn't finish). The algorithm spends most of its time computing quadratic mutual information. Do you know any faster alternatives to the computation proposed in the paper, or an efficient implementation? On the other hand, computing quadratic mutual information involves three terms, V_in, V_all and V_btw, which share a lot of computation. We could exploit that overlap and avoid repeating the same calculations.
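To make the shared-computation idea concrete, here is a minimal sketch of how the three terms can all be derived from one pairwise Gaussian kernel matrix, assuming a Torkkola-style Parzen estimator of quadratic mutual information. The function and variable names are mine, not from this PR, and the exact normalization may differ from the paper's:

```python
import numpy as np

def qmi_terms(X, y, sigma=1.0):
    """Sketch: compute V_in, V_all, V_btw of quadratic mutual
    information from ONE shared pairwise Gaussian kernel matrix,
    instead of recomputing kernel values for each term."""
    n = X.shape[0]
    # Pairwise squared distances -> Gaussian kernel, computed once.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * (2.0 * sigma) ** 2))

    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])

    # V_in: sum of within-class kernel blocks.
    v_in = sum(K[np.ix_(y == c, y == c)].sum() for c in classes) / n ** 2
    # V_all: total kernel sum weighted by the sum of squared priors.
    v_all = (priors ** 2).sum() * K.sum() / n ** 2
    # V_btw: cross term between each class's rows and all samples.
    v_btw = sum(p * K[y == c].sum()
                for p, c in zip(priors, classes)) / n ** 2
    # QMI estimate would combine these as v_in + v_all - 2 * v_btw.
    return v_in, v_all, v_btw
```

The point is that the O(n²) kernel matrix `K` is built once and each of the three terms is just a differently weighted sum over it.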

@kpysniak
Contributor

I made some notes on how I mean to speed up calculating quadratic mutual information. Please, have a look and let me know what you think about this, thanks a lot!

https://www.dropbox.com/sh/r5dmhuc82d9y797/PSV8OxYlwL/QuadraticMutualInformation.pdf

@kpysniak
Contributor
kpysniak commented Sep 1, 2013

I just ran a couple of experiments: on the protein UCI dataset (time: 3921 seconds, score: 0.669) and on the USPS dataset (time: 361 seconds, score: 0.62). Are such scores acceptable? I also noticed that for most problems (especially those with a huge number of samples) the time spent creating the codebook is not very significant compared to the time spent training the binary classifiers. However, the algorithm also speeds up that step, because we train each classifier only on the subset of classes picked for it.

@mblondel
Member

Thanks for the follow-up work. What is the accuracy of random codebook on the same datasets? Could you give results for different codebook sizes? Thanks!

@kpysniak
Contributor

@mblondel Actually, I didn't get any results for the random codebook on those datasets, because it took too much time on my laptop. The reason is that Discriminant ECOC also partitions the data, so each classifier doesn't need all of the classes, only the subset of classes currently in its tree node. Would it be suitable to test the random codebook on a smaller dataset instead? What do you think?

@kpysniak
Contributor

@mblondel The accuracy on those datasets is very similar with randomly generated codebooks. However, the random codebook is twice as big as the one generated by Discriminant ECOC, and lowering the size of the random codebook decreases its accuracy.

@mblondel
Member

Currently the random codebook uses only two symbols (0 and 1). We could try generating three symbols (-1, 0 and +1) and ignore all samples for which the symbol is 0.
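For concreteness, a minimal sketch of what such a ternary random codebook could look like, with 0 meaning "this class is not used when training that column's binary classifier". The function names are hypothetical, not from this PR:

```python
import numpy as np

def random_ternary_codebook(n_classes, code_size, random_state=0):
    """Sketch: random codebook over {-1, 0, +1}. A 0 entry means the
    class is excluded from that column's binary problem."""
    rng = np.random.RandomState(random_state)
    while True:
        book = rng.randint(-1, 2, size=(n_classes, code_size))
        # Each column needs at least one -1 and one +1, otherwise
        # the corresponding binary problem is degenerate.
        if ((book == -1).any(axis=0) & (book == 1).any(axis=0)).all():
            return book

def column_training_mask(book, y, col):
    """Boolean mask of samples used to train classifier `col`:
    those whose class has a nonzero symbol in that column."""
    return book[y, col] != 0
```

Training each column only on its masked samples is what would give the speed-up: columns with many zeros see far fewer samples.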

@kpysniak
Contributor
kpysniak commented Nov 8, 2013

@mblondel Do you mean using (-1, 0 and +1) for all Error Correcting Output Codes, or only for the random codebook for now?

@mblondel
Member
mblondel commented Nov 9, 2013

Yes, it would be interesting to know if it works.

@amueller
Member

In light of #3768 I'm not sure we want this. ECOC doesn't really seem to work better than OvR in most settings. Close?
