Fix the issue #3745, the code book generation for OutputCodeClassifier #3768
Conversation
+ Add test case.
+ Update the document.
Can you preserve the previous strategy? We need to remain backward compatible.
@arjoly I didn't actually change the interface. The only place that could potentially cause a backward compatibility issue is that when code_size is not in the correct range, it will raise a ValueError. To resolve this, there are two things I can do. Please give me your thoughts. Thanks.
Being backward compatible also means that you are still able to reproduce results from past experiments with the current version of scikit-learn. What do you think of having a new constructor parameter called …?
I can add an extra parameter. The new method only samples a subset from the exhaustive code book. So what I would like to do is set the default to the new strategy, but still keep the old strategy so that existing users' code keeps working. Do you have any idea how this could be achieved?
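For context, the "exhaustive code book" referred to here is presumably the construction from Dietterich & Bakiri [1]: for n classes, every column is a distinct non-trivial binary partition of the classes, giving 2^(n-1) − 1 columns in total. A minimal sketch of that construction (the function name is mine, not the PR's):

```python
import numpy as np

def exhaustive_code_book(n_classes):
    """One row (code word) per class, one column per non-trivial binary
    partition of the classes: 2**(n_classes - 1) - 1 columns in total."""
    n_cols = 2 ** (n_classes - 1) - 1
    # Fix the first class's bit to 1 in every column; the remaining rows
    # enumerate the bit patterns 0 .. 2**(n_classes - 1) - 2, which skips
    # the all-ones column (it would not separate any classes).
    cols = np.arange(n_cols)
    book = np.ones((n_classes, n_cols), dtype=int)
    book[1:, :] = (cols[None, :] >> np.arange(n_classes - 1)[:, None]) & 1
    return book
```

Sampling `code_size` columns (or rows, depending on orientation) without replacement from this matrix is then what distinguishes the new strategy from plain random generation.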
… allow user to choose the coding strategy.
@arjoly I just kept the old strategy and added an extra parameter in the constructor to allow the user to choose the coding strategy.
sklearn/multiclass.py
Outdated
@@ -625,6 +631,13 @@ class OutputCodeClassifier(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
        If 1 is given, no parallel computing code is used at all, which is
        useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are
        used. Thus for n_jobs = -2, all CPUs but one are used.
    coding_strategy : str, optional, default: None
`strategy` would be consistent with the dummy estimators. What is the meaning of None?
Instead of None, could it be `auto` for the automatic strategy?
In case of None it will use an auto strategy internally, so maybe `auto` is better in terms of readability.

Can you add tests that ensure that each code book satisfies the properties stated in the documentation?
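A sketch of the kind of property test being requested, written against the released `OutputCodeClassifier` (it checks the documented shape of `code_book_` and its value set; distance properties of the PR's new strategies would be asserted analogously):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
clf = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                           code_size=2, random_state=0).fit(X, y)
book = clf.code_book_
n_classes = len(clf.classes_)

# Documented shape: one code word per class, code_size * n_classes bits.
assert book.shape == (n_classes, int(2 * n_classes))
# Entries come from {-1, 0, 1} depending on the base estimator's API.
assert set(np.unique(book)) <= {-1.0, 0.0, 1.0}
```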
sklearn/multiclass.py
Outdated
    elif self.coding_strategy == "opt_column_selection":
        self._opt_column_selection_code_book(random_state, code_size_)
    else:
        raise ValueError("Unknown coding strategy %r" % self.coding_strategy)
Can you list all the possible strategies for the user in the error message?
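One common way to do that is to keep the valid names in a single tuple and interpolate them into the error (the exact strategy set here is illustrative, following the `coding_strategy` values that appear in this thread):

```python
VALID_CODING_STRATEGIES = ("auto", "random", "opt_column_selection")

def check_coding_strategy(coding_strategy):
    """Validate the strategy name, listing all valid choices on failure."""
    if coding_strategy not in VALID_CODING_STRATEGIES:
        raise ValueError(
            "Unknown coding strategy %r, expected one of: %s"
            % (coding_strategy, ", ".join(VALID_CODING_STRATEGIES)))
    return coding_strategy
```

Keeping the tuple as the single source of truth means the docstring, validation, and error message cannot drift apart.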
Hi @arjoly, thanks for your comments; I have addressed them in the new version.
sklearn/multiclass.py
Outdated
    dist = 0
    for k in range(max_iter):
        p = random_state.permutation(max_code_size)
        mask = 1 << np.arange(n_classes - 1, -1, -1)
        tmp_code_book = ((p[:code_size, None] + max_code_size + 1) & mask) > 0
        tmp_code_book = tmp_code_book.astype(int).T
Could you cut this into several lines? It's a bit hard to read as it is.
Done.
sklearn/tests/test_multiclass.py
Outdated
    dist1 = np.sum(pairwise_distances(
        _max_hamming_code_book(5, random_state, 10, 2),
        metric='hamming'))
    assert_true(dist0 >= dist1)
Here you can use assert_greater_equal.
@arjoly, please take a look at the new version. Thanks.
Ok, I now have a better understanding of the whole algorithm. To summarize, there are two main differences between your approach and the old one.

What do you think of adding a parameter such as bootstrap_codes to select between sampling with and without replacement? What do you think of adding the possibility to iteratively generate a good code book for both approaches? Finally, what are the advantages of the "dense" code book versus the "sparse" code book presented in the paper? Could you come up with an example that illustrates the benefit of the new feature? Based on the example, we might get a better picture of the best default parameters for the estimator.
+ Update reference.
@arjoly, I think the first feature is the most important one in the new algorithm. If you sample from the code book with replacement, you could end up with two identical code words; e.g. for a 3-class problem, two of the three sampled code words may coincide, leaving those classes indistinguishable. The second point is why iterative optimization can improve the code book; this part is more subtle. As argued in Solving Multiclass Learning Problems via Error-Correcting Output Codes [1], a good code book should keep the code words well separated in Hamming distance. I can also run some experiments later to demonstrate the empirical effectiveness of the new algorithm on MNIST.
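The collision point can be made concrete with a toy sketch (illustrative arrays, not the PR's code): with replacement, two classes can draw identical code words, i.e. pairwise Hamming distance 0, so no combination of binary classifier outputs can tell them apart; drawing distinct words without replacement guarantees a positive minimum distance.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.RandomState(0)

# With replacement: two classes may draw the same code word.
with_repl = np.array([[0, 1, 1],
                      [0, 1, 1],   # duplicate of the first word
                      [1, 0, 1]])
assert pdist(with_repl, metric="hamming").min() == 0.0

# Without replacement from a pool of distinct words: no collisions.
pool = np.array([[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]])
idx = rng.choice(len(pool), size=3, replace=False)
code_book = pool[idx]
assert pdist(code_book, metric="hamming").min() > 0.0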
Thanks @queqichao for the explanation. Now we have to make sure we have the proper interface, one that rationalizes the current and the new features (e.g. benefits everywhere from the iterative algorithm while staying DRY) without blocking later extensions, and without building for a YAGNI case.
What I suggested is a new example for the narrative documentation. It's very important to highlight your work and make it known to everybody that you have written a very useful piece of code. Without an example, it will be hard for users to discover your contribution. Unfortunately MNIST is too big a dataset for the narrative documentation; instead we can use any dataset used in http://scikit-learn.org/stable/auto_examples/index.html.
+ Make too small code_size invalid for all strategies.
@arjoly I think you make a good point. I am planning to add a simple example comparing the new algorithm with the old one and with the other multi-class coding methods. I ran a simple experiment on the digits dataset and plotted the classification error of the different coding algorithms. The new algorithm 'iter_hamming' is better than the old one 'random' when code_size is relatively small. This is expected, because the 'random' strategy is more vulnerable to code word collisions when code_size is small. But both are worse than the other algorithms, probably because neither randomized coding scheme is particularly optimized. Finding a better coding algorithm for multi-class problems is still an open problem, I believe. But the new algorithm is at least better than the old one in certain situations. So what do you think about this? Where would be the right place for the example and the corresponding documentation?
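A minimal version of the comparison described can be sketched with the released scikit-learn API (so only the existing random output-code strategy is shown, not the PR's 'iter_hamming'; the exact scores depend on the split and the base estimator):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import (OneVsOneClassifier, OneVsRestClassifier,
                                OutputCodeClassifier)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

base = LogisticRegression(max_iter=2000)
scores = {
    name: clf.fit(X_train, y_train).score(X_test, y_test)
    for name, clf in [
        ("output_code", OutputCodeClassifier(base, code_size=2,
                                             random_state=0)),
        ("one_vs_rest", OneVsRestClassifier(base)),
        ("one_vs_one", OneVsOneClassifier(base)),
    ]
}
print(scores)  # output coding typically trails OvR/OvO on digits
```

Plotting such scores over a range of code_size values would reproduce the comparison curve discussed in this thread.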
… distances, which seems to worsen the results.
Hi @arjoly, please take a look at the new version, which includes an example for the new multiclass strategy.
Recently I haven't had much time to look at this pull request. I will try to find some time this week. Thanks for your patience.
@@ -0,0 +1,139 @@
"""
===========================
Multi-class classification
I would rename this "Multi-class encoding" or something since the example is about coding strategies, not multi-class classification.
Basically your example shows that output coding is worse in every respect than just using OVO or OVR. With this graph, it is not really clear why we want to add the algorithm.
I guess we could still add the algorithms for completeness, as we already have the error-correcting output code, but maybe add a note that this is more for illustration purposes? I'm not entirely sure what the purpose of adding it is...
You're correct. I guess output coding does not necessarily outperform OVO or OVR in practice; that's probably why most people still prefer OVO or OVR. Before I opened this pull request, I just thought the original algorithm was not perfect. But after running the experiment, I found that output coding does not work so well, at least on the data sets I have tried.
Do you think it would still be worth including this in scikit-learn? Or do you think it would be worth doing more experiments?
If output coding is kept in scikit-learn, I think an improvement to the original algorithm might be worthwhile, although I admit the justification for the improvement is not solid. Actually, the original motivation for adding output codes to scikit-learn is confusing to me, because their effectiveness was never fully tested. I would like to do more experiments, but the data sets available for multi-class classification are quite limited.
Based on the latest comments, closing this PR.
[1] Thomas G. Dietterich, Ghulum Bakiri. "Solving Multiclass Learning Problems via Error-Correcting Output Codes." Journal of Artificial Intelligence Research 2 (1995): 263–286.