[BUG] Fix BOSS based classifiers truncating class names to single character length #4096

Merged
merged 7 commits into from
Jan 12, 2023


erjieyong
Contributor

Reference Issues/PRs

This fixes #4090, which prevents the list of predicted classes from being populated correctly.

What does this implement/fix? Explain your changes.

The array of predicted classes is allocated with np.zeros. This PR changes the dtype passed to np.zeros from string to object, so that class names written into the array are no longer truncated to their first character.
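
As a minimal illustration of the truncation (a sketch with made-up labels, not the sktime source): an array created by np.zeros with dtype=str has an item width of one character, so any longer string assigned to it later is silently cut to its first character, whereas an object array keeps the full Python string.

```python
import numpy as np

# Hypothetical class labels, chosen only for illustration.
labels = ["class_a", "class_b"]

# dtype=str yields a fixed-width unicode array of width 1 ("<U1"),
# so strings assigned into it are silently cut to one character.
truncated = np.zeros(len(labels), dtype=str)

# dtype=object stores references to the full Python strings.
full = np.zeros(len(labels), dtype=object)

for i, label in enumerate(labels):
    truncated[i] = label
    full[i] = label

print(truncated.dtype, list(truncated))
print(full.dtype, list(full))
```

Note that both labels collapse to the same single character in the string array, which is why distinct classes can become indistinguishable after prediction.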

I've also tested this locally, using the example provided in the issue and by running check_estimator(IndividualBOSS). All tests PASSED!

Does your contribution introduce a new dependency? If yes, which one?

No

@fkiraly fkiraly changed the title [BUG] Fix boss classes [BUG] Fix BOSS based classifiers truncating class names to single character length Jan 11, 2023
@fkiraly fkiraly added module:classification classification module: time series classification bugfix Fixes a known bug or removes unintended behavior labels Jan 11, 2023
Collaborator

@fkiraly fkiraly left a comment


Looks good to me - could you kindly add a test that certifies the fix?

I.e., your example (as simple as possible, with dummy data) that fails before the fix and runs after?

It should go into the folder sktime.classification.dictionary_based, in a new module test_boss.py

Let me know if you don't want to add that and/or need help with pytest, best starting point is perhaps following the pattern in test_tde (same folder)

@erjieyong
Contributor Author

erjieyong commented Jan 11, 2023 via email

@erjieyong
Contributor Author

@fkiraly, I need your help. How do I run the tests after writing them?

Using test_tde.py as an example, when I run the following code, the verbose output does not seem to show that the tests in test_tde (e.g. test_tde_train_estimate) have been executed:

from sktime.utils.estimator_checks import check_estimator
from sktime.classification.dictionary_based._tde import TemporalDictionaryEnsemble
check_estimator(TemporalDictionaryEnsemble)
The output is as follows:
All tests PASSED!
{'test_clone[TemporalDictionaryEnsemble]': 'PASSED',
 'test_constructor[TemporalDictionaryEnsemble]': 'PASSED',
 'test_create_test_instance[TemporalDictionaryEnsemble]': 'PASSED',
 'test_create_test_instances_and_names[TemporalDictionaryEnsemble]': 'PASSED',
 'test_estimator_tags[TemporalDictionaryEnsemble]': 'PASSED',
 'test_get_params[TemporalDictionaryEnsemble]': 'PASSED',
 'test_has_common_interface[TemporalDictionaryEnsemble]': 'PASSED',
 'test_inheritance[TemporalDictionaryEnsemble]': 'PASSED',
 'test_no_between_test_case_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredict-0]': 'PASSED',
 'test_no_between_test_case_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredict-1]': 'PASSED',
 'test_no_between_test_case_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-0]': 'PASSED',
 'test_no_between_test_case_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-1]': 'PASSED',
 'test_no_cross_test_side_effects_part1[TemporalDictionaryEnsemble]': 'PASSED',
 'test_no_cross_test_side_effects_part2[TemporalDictionaryEnsemble]': 'PASSED',
 'test_repr[TemporalDictionaryEnsemble]': 'PASSED',
 'test_set_params[TemporalDictionaryEnsemble]': 'PASSED',
 'test_set_params_sklearn[TemporalDictionaryEnsemble]': 'PASSED',
 'test_valid_estimator_class_tags[TemporalDictionaryEnsemble]': 'PASSED',
 'test_valid_estimator_tags[TemporalDictionaryEnsemble]': 'PASSED',
 'test_fit_does_not_overwrite_hyper_params[TemporalDictionaryEnsemble-ClassifierFitPredict]': 'PASSED',
 'test_fit_does_not_overwrite_hyper_params[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate]': 'PASSED',
 'test_fit_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_fit_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_fit_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_fit_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_fit_returns_self[TemporalDictionaryEnsemble-ClassifierFitPredict]': 'PASSED',
 'test_fit_returns_self[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate]': 'PASSED',
 'test_fit_updates_state[TemporalDictionaryEnsemble-ClassifierFitPredict]': 'PASSED',
 'test_fit_updates_state[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate]': 'PASSED',
 'test_methods_have_no_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_methods_have_no_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_methods_have_no_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredict-get_fitted_params]': 'PASSED',
 'test_methods_have_no_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_methods_have_no_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_methods_have_no_side_effects[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-get_fitted_params]': 'PASSED',
 'test_multiprocessing_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_multiprocessing_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_multiprocessing_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_multiprocessing_idempotent[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_non_state_changing_method_contract[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_non_state_changing_method_contract[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_non_state_changing_method_contract[TemporalDictionaryEnsemble-ClassifierFitPredict-get_fitted_params]': 'PASSED',
 'test_non_state_changing_method_contract[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_non_state_changing_method_contract[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_non_state_changing_method_contract[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-get_fitted_params]': 'PASSED',
 'test_persistence_via_pickle[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_persistence_via_pickle[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_persistence_via_pickle[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_persistence_via_pickle[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_raises_not_fitted_error[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_raises_not_fitted_error[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_raises_not_fitted_error[TemporalDictionaryEnsemble-ClassifierFitPredict-get_fitted_params]': 'PASSED',
 'test_raises_not_fitted_error[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_raises_not_fitted_error[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_raises_not_fitted_error[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-get_fitted_params]': 'PASSED',
 'test_save_estimators_to_file[TemporalDictionaryEnsemble-ClassifierFitPredict-predict]': 'PASSED',
 'test_save_estimators_to_file[TemporalDictionaryEnsemble-ClassifierFitPredict-predict_proba]': 'PASSED',
 'test_save_estimators_to_file[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict]': 'PASSED',
 'test_save_estimators_to_file[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate-predict_proba]': 'PASSED',
 'test_classifier_on_basic_motions[TemporalDictionaryEnsemble]': 'PASSED',
 'test_classifier_on_unit_test_data[TemporalDictionaryEnsemble]': 'PASSED',
 'test_classifier_output[TemporalDictionaryEnsemble-ClassifierFitPredict]': 'PASSED',
 'test_classifier_output[TemporalDictionaryEnsemble-ClassifierFitPredictMultivariate]': 'PASSED',
 'test_handles_single_class[TemporalDictionaryEnsemble]': 'PASSED',
 'test_multivariate_input_exception[TemporalDictionaryEnsemble]': 'PASSED'}

Appreciate your advice. Thanks.

@fkiraly
Collaborator

fkiraly commented Jan 12, 2023

check_estimator only runs the tests that come from TestAllEstimators and TestAllClassifiers etc.

To run the test in, say, test_tde or the new test_boss, you need to use pytest directly.

Two common options are:
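
The options referred to above are not reproduced here. As one hedged sketch of running a single test module with pytest directly: pytest.main accepts the same arguments as the pytest command line, and a node ID of the form path::test_name selects one test. The helper build_pytest_args is hypothetical, and the module path is an assumption about the repository layout.

```python
def build_pytest_args(test_path, test_name=None, verbose=True):
    """Build a pytest argument list (hypothetical helper, for illustration)."""
    # "path::test_name" is pytest's node ID syntax for selecting a single test.
    target = f"{test_path}::{test_name}" if test_name else test_path
    return [target, "-v"] if verbose else [target]


# Assumed location of the test module inside the sktime repository.
TDE_TESTS = "sktime/classification/dictionary_based/tests/test_tde.py"

args = build_pytest_args(TDE_TESTS, "test_tde_train_estimate")
print(args)

# To run the selection from the repository root:
#   import pytest
#   pytest.main(args)
```

Passing only the module path, without a test name, runs every test in that module.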

fkiraly previously approved these changes Jan 12, 2023
Collaborator

@fkiraly fkiraly left a comment


Thanks, looks great!
Will merge once tests pass.

One tip, but not a blocker - the tests are a bit repetitive, and they could be made much less copy-paste by using pytest.mark.parametrize (on pairs of the expected y_pred.dtype, and the new_class dict values).

Feel free to try changing that (you can do it in this PR or a separate one once this is merged), as said it's not necessary for this to go in imo.
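
A hedged sketch of what such a parametrization could look like, not the tests from this PR: each case pairs a new_class dict with the dtype kind expected of the predictions. The relabel function is a hypothetical stand-in; in a real test y_pred would come from the fitted classifier's predict.

```python
import numpy as np
import pytest


def relabel(y, new_class):
    """Map original labels to new ones (hypothetical stand-in for predict)."""
    return np.array([new_class[label] for label in y])


@pytest.mark.parametrize(
    "new_class, expected_kind",
    [
        ({"1": "Class1", "2": "Class2"}, "U"),  # string labels
        ({"1": 10, "2": 20}, "i"),  # integer labels
        ({"1": 1.5, "2": 2.5}, "f"),  # float labels
    ],
)
def test_labels_keep_dtype_and_value(new_class, expected_kind):
    """Relabelled values keep their dtype kind and are not truncated."""
    y = ["1", "2", "1"]
    y_pred = relabel(y, new_class)
    assert y_pred.dtype.kind == expected_kind
    assert set(y_pred.tolist()) == set(new_class.values())
```

With this pattern, adding a new label type is a one-line change to the parameter list rather than a new copy of the test.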

@fkiraly
Collaborator

fkiraly commented Jan 12, 2023

Ah, but linting is failing now :-)
Here's the guide: https://www.sktime.org/en/stable/developer_guide/coding_standards.html

@fkiraly
Collaborator

fkiraly commented Jan 12, 2023

just fixed the linting so we can potentially put this fix in the release

@erjieyong
Contributor Author

I see there's another issue. Let me clear it on my end first. I have actually installed Black locally, but I guess it didn't catch the particular portion which failed. Will try and install pre-commit

@fkiraly
Collaborator

fkiraly commented Jan 12, 2023

I see there's another issue. Let me clear it on my end first.

Yes, it wants you to write docstrings for the tests. Minimal ones have one line, better ones have an explanation of what precisely the test success and fail conditions are.

@erjieyong
Contributor Author

@fkiraly, I've managed to install pre-commit locally and it has passed all the checks. Please review again. Thank you.

@fkiraly
Collaborator

fkiraly commented Jan 12, 2023

linting passes now! 🎉

@fkiraly fkiraly merged commit 7fcce15 into sktime:main Jan 12, 2023
@erjieyong
Contributor Author

Thanks a lot for your guidance. I'll work on improving the tests soon!

@erjieyong erjieyong deleted the fix-boss-classes branch January 13, 2023 09:21
klam-data pushed a commit to CodeSmithDSMLProjects/sktime that referenced this pull request Jan 18, 2023
…racter length (sktime#4096)

Fixes sktime#4090 which prevents predicted classes list from being populated correctly

When calling `np.zeros`, change `string` datatype to `object` so that `np.zeros` will not truncate the string to only the first character.

I've also tested this locally using provided example in the issues as well as running `check_estimator(IndividualBOSS)`. All tests PASSED!
Labels
bugfix Fixes a known bug or removes unintended behavior module:classification classification module: time series classification
Development

Successfully merging this pull request may close these issues.

[BUG] BOSS classifier class names truncated during prediction
2 participants