
Fix the issue #3745, the code book generation for OutputCodeClassifier #3768

Closed
queqichao wants to merge 11 commits into scikit-learn:master from queqichao:multiclass_code_book_fix

Conversation

queqichao
Contributor

  • Change the process of generating the output code for OutputCodeClassifier. The new process draws subsets of the exhaustive code book (see [1]) multiple times and picks the one that gives the largest Hamming distances between classes.
  • Change the default value of code_size from 1.5 to 1. The value 1.5 is problematic: for example, when n_classes = 3 the exhaustive code book has size 3, so a code_size of 1.5 is not achievable.
  • Add a test case.
  • Update the documentation.

[1] Thomas G. Dietterich, Ghulum Bakiri. Solving Multiclass Learning Problems via Error-Correcting Output Codes
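For concreteness, here is a minimal sketch of the proposed strategy, assuming the exhaustive code book construction from [1]; the function name, signature, and max_iter default are illustrative and may differ from the actual code in this PR.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.utils import check_random_state

def max_hamming_code_book(n_classes, n_columns_drawn, random_state=None,
                          max_iter=100):
    # The exhaustive code book for n_classes has 2**(n_classes - 1) - 1
    # distinct columns; any further column is the complement of an
    # existing one and defines the same binary problem.
    n_columns = 2 ** (n_classes - 1) - 1
    if not 0 < n_columns_drawn <= n_columns:
        raise ValueError("n_columns_drawn must be in (0, %d]" % n_columns)
    rng = check_random_state(random_state)
    bits = 1 << np.arange(n_classes - 1, -1, -1)
    best_book, best_dist = None, -np.inf
    for _ in range(max_iter):
        # Draw column indices without replacement so that no two
        # columns (binary problems) are identical.
        drawn = rng.permutation(n_columns)[:n_columns_drawn]
        # Column j is the n_classes-bit expansion of j + n_columns + 1;
        # its leading bit is always 1, which rules out complements.
        book = (((drawn[:, None] + n_columns + 1) & bits) > 0).astype(int).T
        # Keep the draw that maximizes the summed pairwise Hamming
        # distance between class codewords (rows).
        dist = pairwise_distances(book, metric="hamming").sum()
        if dist > best_dist:
            best_book, best_dist = book, dist
    return best_book
```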

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 9565a2a on queqichao:multiclass_code_book_fix into 031a3fc on scikit-learn:master.

@arjoly
Member

arjoly commented Oct 14, 2014

Can you preserve the previous strategy? We need to remain backward compatible.

@queqichao
Contributor Author

@arjoly I didn't actually change the interface. The only place that could potentially cause a backward-compatibility issue is that a ValueError is now raised when code_size is not in the valid range.

To resolve this, there are two things I can do:
(1) Keep the old method as the default and add an option to use the new one. However, the old method is suboptimal and inefficient, so I do not think keeping it as the default is good for OutputCodeClassifier.
(2) Fall back to the old method when code_size is not in the valid range for the new one. In that case, instead of raising an error, the program would still run and could emit a deprecation warning.

Please give me your thoughts. Thanks.

@arjoly
Member

arjoly commented Oct 14, 2014

Being backward compatible also means that you are still able to reproduce results from past experiments with the current version of scikit-learn.

What do you think of having a new constructor parameter called strategy, which would allow selecting between the previous strategy and the one you implemented via a string? The default could be an "auto" option that automatically makes a good choice.
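A hypothetical sketch of what such a constructor could look like (this parameter was never part of the released API; names and defaults are illustrative):

```python
from sklearn.base import BaseEstimator, ClassifierMixin

class OutputCodeClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, estimator, code_size=1.5, strategy="auto",
                 random_state=None, n_jobs=1):
        # strategy: "auto" would pick between "random" (the historical
        # behavior) and the new subset-sampling strategy, depending on
        # whether code_size is valid for the exhaustive code book.
        self.estimator = estimator
        self.code_size = code_size
        self.strategy = strategy
        self.random_state = random_state
        self.n_jobs = n_jobs
```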

@queqichao
Contributor Author

I can add an extra parameter strategy to the constructor. By "auto" do you mean setting the default value of strategy to the old strategy? Of course I can do this, but as I mentioned before, the old one is not a very good method. One example: for iris in the example code there are 3 classes, [1, 2, 3], and the exhaustive code book would be
[[1, 1, 1],
 [0, 0, 1],
 [0, 1, 0]]
where each row is the code for one class and each column corresponds to a binary classification problem. Any extra column would be the complement of an existing column and thus give the same binary classification problem. In this case code_size cannot be larger than 1, yet the old strategy allows it.

The new method only samples a subset of the exhaustive code book, which has 2^(n_class-1)-1 columns. I admit that the choice of code_size becomes a little trickier.
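To make the complement argument concrete, a small illustration based on the 3-class example above:

```python
import numpy as np

# Exhaustive code book for 3 classes: rows are classes, columns are
# binary problems; there are 2**(3 - 1) - 1 = 3 distinct columns.
code_book = np.array([[1, 1, 1],
                      [0, 0, 1],
                      [0, 1, 0]])

# Any fourth column would be the complement of an existing one: e.g.
# [0, 1, 1] is the complement of the first column [1, 0, 0] and splits
# the classes identically, so it defines the same binary problem.
```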

So what I would like to do is set the default to the new strategy but still keep the old strategy working for existing users who do not change their code. Do you have any idea how this could be achieved?

@arjoly
Member

arjoly commented Oct 14, 2014

By 'auto', I mean that it would always make a reasonably good choice for the user, i.e. select the old or the new strategy depending on the code size.

@queqichao
Contributor Author

@arjoly I just kept the old strategy and added an extra parameter to the constructor to let the user choose the coding strategy.

@@ -625,6 +631,13 @@ class OutputCodeClassifier(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
If 1 is given, no parallel computing code is used at all, which is
useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are
used. Thus for n_jobs = -2, all CPUs but one are used.
coding_strategy : str, optional, default: None
Member

strategy would be consistent with the dummy estimators.

What is the meaning of None?

Member

Instead of None, could it be auto for the automatic strategy?

Contributor Author

In the case of None it uses an auto strategy internally, so maybe "auto" is better in terms of readability.

@arjoly
Member

arjoly commented Oct 15, 2014

Can you add tests that ensure that each code book satisfies the properties stated in the documentation?

elif self.coding_strategy == "opt_column_selection":
    self._opt_column_selection_code_book(random_state, code_size_)
else:
    raise ValueError("Unknown coding strategy %r" % self.coding_strategy)
Member

Can you list all the possible strategies for the user?

@queqichao
Contributor Author

Hi @arjoly, thanks for your comments; I addressed them in the new version.

dist = 0
for k in range(max_iter):
    p = random_state.permutation(max_code_size)
    tmp_code_book = (p[:code_size, None] + max_code_size+1 & (1 << np.arange(n_classes-1, -1, -1)) > 0).astype(int).T
Member

Could you cut this into several lines? It's a bit hard to read as it is.

Contributor Author

Done.
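For readers of the thread, one possible multi-line form of the quoted expression (a reconstruction, not necessarily the code that was pushed; it assumes max_code_size is the exhaustive size 2^(n_classes-1) - 1, and relies on + binding tighter than &, which binds tighter than >):

```python
# Shift the drawn indices into [max_code_size + 1, 2 * max_code_size]
# so that the leading bit of every value is set.
offsets = p[:code_size, None] + max_code_size + 1
# One mask per bit, from the most significant bit down.
bit_masks = 1 << np.arange(n_classes - 1, -1, -1)
# Extract the bits and transpose so that rows are classes and
# columns are binary problems.
tmp_code_book = ((offsets & bit_masks) > 0).astype(int).T
```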

dist1 = np.sum(pairwise_distances(
    _max_hamming_code_book(5, random_state, 10, 2),
    metric='hamming'))
assert_true(dist0 >= dist1)
Member

Here you can use assert_greater_equal.

@coveralls

Coverage Status

Coverage increased (+0.03%) when pulling e5187ee on queqichao:multiclass_code_book_fix into 031a3fc on scikit-learn:master.

@queqichao
Contributor Author

@arjoly, please take a look at the new version. Thanks.

@arjoly
Member

arjoly commented Oct 22, 2014

Ok, I now have a better understanding of the whole algorithm. To summarize, the two main differences between your approach and the old one are:

  1. Ensure non-repeated codewords by sampling without replacement.
  2. Iteratively try to obtain a good code book.

What do you think of adding a parameter such as bootstrap_codes to select between sampling with and without replacement? And what do you think of adding the possibility to iteratively generate a good code book for both approaches?

Finally, what are the advantages of the "dense" code book vs the "sparse" code book presented in the paper Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers?

Could you come up with an example that illustrates the benefit of the new feature? Based on the example, we might have a better picture of the best default parameters for the estimator.

+ Update reference.
@queqichao
Contributor Author

@arjoly, I think the first feature is the most important one in the new algorithm. Suppose you sample from the code book with replacement; you could end up with two identical columns. E.g., for a 3-class problem you could sample the following code book:
1 1 1
0 0 1
1 1 0
Each column gives a binary problem, so here the first two columns correspond to the same binary problem (setting classes 0 and 2 to "1" and class 1 to "0"). As long as you do not change the training data, the result will be the same if you use a "deterministic" algorithm like an SVM, and you get a duplicated classifier. So sampling with replacement does not help in this case.

The second thing is why iterative optimization can improve the code book. This part is more subtle. As Solving Multiclass Learning Problems via Error-Correcting Output Codes suggests, there are two criteria: (1) row separation and (2) column separation. The reasoning is provided in the paper. What I do in the algorithm is basically to optimize these two criteria iteratively through random sampling.
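A sketch of how these two criteria could be measured (an illustrative helper, not the code in this PR): row separation is the minimum pairwise Hamming distance between codewords, and column separation also considers complements, so identical or complementary columns score zero.

```python
import numpy as np
from scipy.spatial.distance import pdist

def separation_scores(code_book):
    # Row separation: minimum pairwise Hamming distance between the
    # class codewords (rows).
    row_sep = pdist(code_book, metric="hamming").min()
    # Column separation: for each pair of columns, take the smaller of
    # the distance to the other column and to its complement, so that
    # duplicated or complementary columns (the same binary problem)
    # score zero.
    cols = code_book.T
    col_seps = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            d = np.mean(cols[i] != cols[j])
            d_comp = np.mean(cols[i] != 1 - cols[j])
            col_seps.append(min(d, d_comp))
    return row_sep, min(col_seps)

# For the code book with the duplicated column above, this returns a
# column separation of 0.0, flagging the redundant binary classifier.
```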

I can also run some experiments later to demonstrate the empirical effectiveness of the new algorithm on MNIST.

@arjoly
Member

arjoly commented Oct 23, 2014

Thanks @queqichao for the explanation. Now we have to be sure that we have the proper interface, one that rationalizes the current and the new features (e.g. benefiting everywhere from the iterative algorithm while staying DRY) without blocking future options and without building things we don't yet need (YAGNI).

Could you come up with an example that illustrates the benefit of the new feature? Based on the example, we might have a better picture of the best default parameters for the estimator.

I can also run some experiments later to demonstrate the empirical effectiveness of the new algorithm on MNIST.

What I suggested is a new example for the narrative documentation. It's very important to highlight your work and make it known to everybody that you have written a very useful piece of code. Without an example, it will be hard for users to discover your contribution.

Unfortunately MNIST is too big a dataset for the narrative documentation; instead we can use any dataset used in http://scikit-learn.org/stable/auto_examples/index.html.

+ Make too small code_size invalid for all strategies.
@queqichao
Contributor Author

@arjoly I think you make a good point. I am planning to add a simple example comparing the new algorithm with the old one and with the other multi-class coding methods. I ran a simple experiment on the digits dataset and plotted the classification error of the different coding algorithms.
[Figure: classification error vs. code_size for the different coding strategies]
Here the horizontal axis is code_size and the vertical axis is the error. Because 'iter_hamming' and 'random' are randomized algorithms, their errors are averaged over 50 repetitions.

The new algorithm 'iter_hamming' is better than the old 'random' one when code_size is relatively small. This is expected, because the 'random' strategy is more vulnerable to codeword collisions when code_size is small. But both are worse than the other algorithms, probably because neither randomized coding scheme is particularly optimized.

Finding a better coding algorithm for multi-class problems is still an open problem, I believe. But the new algorithm is at least better than the old one in certain situations. So what do you think? Where would be the right place for the example and the corresponding documentation?

@queqichao
Contributor Author

Hi @arjoly, please take a look at the new version, which includes an example for the new multiclass coding strategy.

@arjoly
Member

arjoly commented Nov 3, 2014

Recently, I haven't had much time to look at this pull request. I will try to dig some time this week. Thanks for your patience.

@@ -0,0 +1,139 @@
"""
===========================
Multi-class classification
Member

I would rename this "Multi-class encoding" or something since the example is about coding strategies, not multi-class classification.

@amueller
Member

amueller commented Nov 6, 2014

Basically your example shows that output coding is much worse in every way than just using OVO or OVR. With this graph, it is not really clear why we would want to add the algorithm.
The method you contributed is better than the random one, but I'm not super convinced about adding a substantial amount of code for something that will not be useful in practice. Do you have an example where output encoding fares better than OVR or OVO?

@amueller
Member

amueller commented Nov 6, 2014

I guess we could still add the algorithms for completeness, since we already have the error-correcting output code, but maybe add a note that this is more for illustration purposes? I'm not entirely sure what the purpose of adding it is...

@queqichao
Contributor Author

You're correct. I guess output coding does not necessarily outperform OVO or OVR in practice; that's probably why most people still prefer OVO or OVR. Before I initiated this pull request, I just thought the original algorithm was not perfect. But after doing the experiment, I found that output coding does not work so well, at least on the data sets I have tried.

@amueller
Member

amueller commented Nov 6, 2014

Do you think it would still be worth including this in scikit-learn? Or do you think it would be worth doing more experiments?

@queqichao
Contributor Author

If output codes are kept in scikit-learn, I think an improvement to the original algorithm might be worthwhile. But I admit that the justification for the improvement is not solid. Actually, the original motivation for adding output codes to scikit-learn is confusing to me, because their effectiveness was never fully tested.

I would like to do more experiments, but the data sets available for multi-class classification are quite limited.

@amueller amueller added the Needs Decision Requires decision label Aug 5, 2019
Base automatically changed from master to main January 22, 2021 10:48
@thomasjpfan thomasjpfan added Needs Decision - Close Requires decision for closing and removed Needs Decision Requires decision labels Feb 8, 2022
@glemaitre
Member

From the latest comment, closing this PR.

@glemaitre glemaitre closed this Jul 29, 2022