Create check_estimator tests #84

Closed
luizgh opened this issue Sep 11, 2018 · 5 comments

@luizgh
Collaborator

luizgh commented Sep 11, 2018

Set up tests for all methods using sklearn.utils.estimator_checks.check_estimator. The tests should be set up in the test file for each class.
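A minimal sketch of such a per-class test, assuming pytest-style test modules. KNeighborsClassifier is only a stand-in for the class under test and the test name is hypothetical; recent scikit-learn versions expect check_estimator to receive an estimator instance rather than a class.

```python
# Hypothetical per-class test module (pytest collects functions named test_*)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.estimator_checks import check_estimator


def test_check_estimator():
    # Stand-in for the class under test (e.g. one of the DES/DCS methods).
    # check_estimator runs scikit-learn's API compliance checks and raises
    # an exception as soon as one of them fails.
    check_estimator(KNeighborsClassifier())
```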

@luizgh
Collaborator Author

luizgh commented Sep 18, 2018

@Menelau I implemented the check_estimator checks for all classes. Besides the issues with a_priori and a_posteriori, there were also problems with meta_des and probabilistic; I will take a look at them.

The test also failed for the static ensembles, since they cannot be constructed without passing a pool of classifiers. Do you think this should be changed to match the DS methods? (If no pool is passed, the classifiers are fitted inside the ensemble's fit method.)

@Menelau
Collaborator

Menelau commented Sep 18, 2018

@luizgh Great! Regarding the problems with META-DES, I believe they are caused by the number of examples used in the test routine (the training dataset is smaller than the number of neighbors). The check catches a very specific problem in this code.
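A minimal sketch of one way to guard against this, assuming a hypothetical helper fit_region_of_competence that fits the k-NN used for the region of competence. KNeighborsClassifier.fit itself does not complain when k exceeds the number of samples; the later neighbor query does, so the sketch reduces k and warns at fit time.

```python
import warnings

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def fit_region_of_competence(X_dsel, y_dsel, k=7):
    """Hypothetical helper: fit the k-NN that defines the region of competence.

    If DSEL has fewer samples than k, later kneighbors queries would raise,
    so k is reduced to the DSEL size and a warning is emitted.
    """
    n_samples = X_dsel.shape[0]
    if k > n_samples:
        warnings.warn(
            f"k={k} is larger than the number of DSEL samples ({n_samples}); "
            f"using k={n_samples} instead.",
            RuntimeWarning,
        )
        k = n_samples
    return KNeighborsClassifier(n_neighbors=k).fit(X_dsel, y_dsel)
```

Warning instead of raising keeps tiny test datasets usable while still telling the user that the requested k was not honored.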

About the static combination methods, we need to change them to the same format as the other DS methods (fit the pool inside fit when none is provided), so that all classes in the library are consistent and compatible with sklearn standards. We will then also need estimator tests for these classes.
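A minimal sketch of that pattern, assuming a hypothetical StaticVotingSketch class: when pool_classifiers is None, a pool of bootstrapped decision trees is trained inside fit; otherwise the given pre-trained pool is used. The n_estimators parameter and the choice of decision trees are assumptions for illustration, not DESlib's actual defaults.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import check_random_state
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class StaticVotingSketch(BaseEstimator, ClassifierMixin):
    """Hypothetical static ensemble combining a pool by majority voting."""

    def __init__(self, pool_classifiers=None, n_estimators=10, random_state=None):
        # sklearn convention: __init__ only stores the hyper-parameters
        self.pool_classifiers = pool_classifiers
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Encode labels as 0..n_classes-1 so votes can be counted with bincount
        self.enc_ = LabelEncoder().fit(y)
        self.classes_ = self.enc_.classes_
        y_ind = self.enc_.transform(y)

        if self.pool_classifiers is None:
            # No pool given: train bootstrapped decision trees inside fit
            rng = check_random_state(self.random_state)
            self.pool_classifiers_ = []
            for _ in range(self.n_estimators):
                idx = rng.randint(0, X.shape[0], X.shape[0])
                tree = DecisionTreeClassifier(random_state=rng.randint(2**31 - 1))
                self.pool_classifiers_.append(tree.fit(X[idx], y_ind[idx]))
        else:
            # Pre-trained pool, assumed to predict labels in the original space
            self.pool_classifiers_ = list(self.pool_classifiers)
        return self

    def predict(self, X):
        check_is_fitted(self, "pool_classifiers_")
        X = check_array(X)
        votes = np.array([clf.predict(X) for clf in self.pool_classifiers_])
        if self.pool_classifiers is not None:
            # Map the user pool's original labels to encoded indices
            votes = self.enc_.transform(votes.ravel()).reshape(votes.shape)
        # counts[c, j] = number of classifiers voting for class c on sample j
        counts = np.apply_along_axis(
            np.bincount, 0, votes.astype(int), minlength=len(self.classes_)
        )
        return self.classes_[counts.argmax(axis=0)]
```

Because the pool is created in fit rather than required in __init__, the class can be constructed with no arguments, which is what check_estimator does.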

@Menelau
Collaborator

Menelau commented Sep 18, 2018

@luizgh About Probabilistic: this is another abstract class and should not be instantiated (the class does not state this anywhere, which may have caused the confusion).

The estimator check should be run on the concrete subclasses (RRC, DESKL, MinimumDifference, ...).
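A minimal sketch of how an abstract base can make this explicit, assuming hypothetical ProbabilisticSketch and RRCSketch classes and a hypothetical source_competence method: with ABCMeta and an abstract method, instantiating the base raises TypeError, so only concrete subclasses can be passed to check_estimator.

```python
from abc import ABCMeta, abstractmethod

from sklearn.base import BaseEstimator, ClassifierMixin


class ProbabilisticSketch(BaseEstimator, ClassifierMixin, metaclass=ABCMeta):
    """Hypothetical abstract base holding logic shared by probabilistic methods."""

    def __init__(self, k=None):
        self.k = k

    @abstractmethod
    def source_competence(self):
        """Each concrete method defines how source competence is computed."""


class RRCSketch(ProbabilisticSketch):
    """Hypothetical concrete subclass; only classes like this are instantiable."""

    def source_competence(self):
        return "randomized reference classifier"
```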

@luizgh
Collaborator Author

luizgh commented Sep 18, 2018

I updated the check_estimator tests to run on the subclasses of Probabilistic. They pass for most of them, but fail for "MinimumDifference" in a corner case where all training points belong to a single class. I think this needs to be handled in the code, so I will create a new issue for it. I will also create a new issue to investigate the META-DES problem, and update issue #86 as a reminder to add the new "fit" behavior to the static classes.
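scikit-learn's one-label check fits the estimator on data containing a single class and accepts either a ValueError whose message mentions "class" or predictions equal to that single label. A minimal sketch of the second option, assuming a hypothetical OneClassSafeSketch class whose k-NN is only a stand-in for the real competence estimation:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class OneClassSafeSketch(BaseEstimator, ClassifierMixin):
    """Hypothetical DS-style classifier that survives single-class training data."""

    def __init__(self, k=7):
        self.k = k

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        if self.classes_.size == 1:
            # Degenerate training set: nothing to select between, so the
            # single label is remembered instead of estimating competences
            self.model_ = None
        else:
            # Stand-in for the real competence estimation (hypothetical)
            k = min(self.k, X.shape[0])
            self.model_ = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        return self

    def predict(self, X):
        check_is_fitted(self, "classes_")
        X = check_array(X)
        if self.model_ is None:
            return np.full(X.shape[0], self.classes_[0])
        return self.model_.predict(X)
```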

@luizgh luizgh closed this as completed Sep 18, 2018
@luizgh
Collaborator Author

luizgh commented Sep 18, 2018

Created issues #91 and #92.

Menelau pushed a commit that referenced this issue Sep 28, 2018
…all DES/DCS/static classifiers.

* - Moving code to validate the parameters from __init__ to the fit method (sklearn style)

* Refactoring DCS classes: Changing class attribute names to the sklearn style. Attributes estimated from the data now have an underscore after their name.

* Changes to make the class compatible with the sklearn standards:
- Moving code to validate the estimator parameters from the __init__ to the fit method;
- Refactoring: Changing class attribute names to the sklearn style. Attributes estimated from the data now have an underscore after their name;
- Adding BaseEstimator to the inherited classes for the get_params and set_params methods.

* Updating the test routines according to the new changes in attribute names and parameter validation scheme

* PEP8 formatting

* Refactoring according to sklearn guidelines: Changing names of class attributes that are estimated based on the data (on the fit method)

* Updating test routines according to the attributes name change

* Refactoring according to sklearn guidelines:

- Moving code to validate parameters from __init__ to fit
- Change in attribute names (using an underscore after the name of attributes estimated from the data)

* Updating test routines according to the refactoring on attribute name change and the new method for validating the estimator parameters

* Fixing problem with indentation

* Refactoring: Moving code that validates parameters to the fit method; changes in the attribute names (sklearn standard) and accepting a clustering method as input parameter.

* Updated test routines for the DESClustering class according to the new guidelines.

* Adding code to verify whether the object passed as the clustering method is part of the sklearn clustering classes.

* Updating the test routines that check if the base classifier implements the predict_proba function (Now the check happens inside the fit method)

* Moving the _check_predict_proba function to the fit method.

* Refactoring: remove old DFP masks

* Refactoring

* - Changing default value of pool_classifiers to None
- Modifying name of random state attribute from rng to random_state

* updating the n_classifiers_ attribute in the test routines

* Changing the name of the attribute rng to random_state in the integration tests.

* Fixing error in the docstring (return value of the method)

* Changing check for proba after the fit method; refactoring attribute names according to sklearn guidelines

* Adding random_state parameter

* Adding random_state parameter

* Adding the DFP and IH and random_state hyper-parameters to DESMI class.

* changing random_state default value

* Adding DESMI to the list of DES techniques

* Adding DES Logarithmic

* Making DS clustering compatible with sklearn estimators guidelines.

* Making DESKNN compatible with sklearn estimators guidelines.

* Making KNOP compatible with sklearn estimators guidelines.

* Making META-DES compatible with sklearn estimators guidelines.

* Making Probabilistic techniques compatible with sklearn estimators guidelines.

* Updating test routines according to the new changes in variable names; Removing unused test cases

* Updating test routines according to the new variable names; Removing obsolete test functions

* Updating name of variables estimated from the data according to the sklearn guidelines

* Adding random_state to the clustering definition

* Making DESMI class an sklearn estimator

* Merging with master branch

* Adding sklearn's "check_estimator" tests (#84)

* Adding sklearn's "check_estimator" for probabilistic DS methods

* Adding test to show #89 is indeed a problem

* Adding warning on base class (k bigger than DSEL) #93

* Adding known issue with GridSearch #89

* Fixes #91

* Marking the grid search test to skip (#89)

* Adding tests for python 3.7 (#98)

* Workaround for travis 3.7 support (#98)

* Fix #92

* adding pytest_cache to the list of ignored folders

* removing .idea from project

* Fixing problem with rng in DCS classes when using "random" or "diff" as selection method (rng during predict/predict_proba). Fixes #88

* Base class for static ensembles

* Making SingleBest an sklearn estimator

* Making StaticSelection a sklearn estimator

* Removing unused imports

* Making Oracle class compatible with sklearn

* Using sklearn check_array to assert a given array is 2d

* Removing commented code lines

* Fixing docstring on static ensemble classes; Solving a bug with label encoder for the single best class

* Adding license information

* automatically convert array to 2d

* Updating tests with Oracle technique (using fit to setup label encoder)

* updating oracle tests (setup label encoder in the fit method)

* updating test; removing check estimator from Oracle since it is not a real classifier

* Adding check array to predict.

* Enforcing that the predictions of the base classifiers are integers.

* Fixing random state bug

* removing commented lines of code

* adding kdn score method

* PEP8 formatting; Cleaning commented code.

* Adding license information

* Adding license information; moving kdn_score function to utils.instance_hardness.py; Adding Label encoder; Refactoring variable names according to sklearn standards

* removing unused code

* Adding check_estimator test for OLP method

* Solving problem with label encoder when no base classifier predicts the correct label

* Test routines for the SGH class

* Adding predict proba; Checking if the method was fitted before calling predict and predict_proba.

* Adding checks to raise an error in regression problems

* skipping test while the batch processing version is not implemented

* Adding parameter to indicate percentage of data used for DSEL in the training-DSEL split

* Updating variable names.

* Updating requirements version (sklearn 0.19) due to estimators check

* Updating requirements version (sklearn 0.19) due to estimators check

* Updating requirements; travis

* Print values of N_ and J_ on error

* Fixed checks for pct_accuracy

* Fix test name

* Fixing test
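Several commits above apply the same scikit-learn conventions: parameter validation moves from __init__ to fit, attributes estimated from the data get a trailing underscore, BaseEstimator supplies get_params/set_params (so clone works), and regression targets are rejected. A minimal sketch combining them, assuming a hypothetical ConventionsSketch class whose validation rules are illustrative only (DFP is the hyper-parameter name mentioned above; the checks on it are assumptions):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.utils.multiclass import check_classification_targets
from sklearn.utils.validation import check_X_y


class ConventionsSketch(BaseEstimator, ClassifierMixin):
    """Hypothetical estimator illustrating the sklearn conventions listed above."""

    def __init__(self, k=7, DFP=False, random_state=None):
        # No validation here: clone/set_params must be able to set any value
        self.k = k
        self.DFP = DFP
        self.random_state = random_state

    def _validate_parameters(self):
        # Hypothetical rules, checked only when fit is called
        if not isinstance(self.k, (int, np.integer)) or self.k < 1:
            raise ValueError(f"k must be a positive integer, got {self.k!r}")
        if not isinstance(self.DFP, bool):
            raise TypeError("DFP must be a boolean")

    def fit(self, X, y):
        self._validate_parameters()
        X, y = check_X_y(X, y)
        # Raises ValueError for continuous (regression) targets
        check_classification_targets(y)
        # Attributes estimated from the data end with an underscore
        self.classes_ = np.unique(y)
        self.n_samples_ = X.shape[0]
        return self
```

Deferring validation to fit means an invalid value is accepted at construction time and only reported when the estimator is actually trained, which is the behavior check_estimator expects.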

2 participants