Compatibility with scikit-learn (v0.19). Check_estimator passing for all DES/DCS/static classifiers.

* - Moving code to validate the parameters from __init__ to the fit method (sklearn style)

* Refactoring DCS classes: changing class attribute names to the sklearn style. Attributes estimated from the data now have a trailing underscore.

* Changes to make the class compatible with the sklearn standards:
- Moving code to validate the estimator parameters from __init__ to the fit method;
- Refactoring: changing class attribute names to the sklearn style. Attributes estimated from the data now have a trailing underscore;
- Adding BaseEstimator to the inherited classes to provide the get_params and set_params methods.
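The three changes above are the core of sklearn's estimator contract. A minimal sketch of the resulting pattern (the class name, `k`, and `n_classifiers_` value are illustrative, not actual DESlib code):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y

class ToyDS(BaseEstimator, ClassifierMixin):
    """Hypothetical estimator showing the three changes listed above."""

    def __init__(self, k=7):
        # sklearn style: __init__ only stores hyper-parameters, unvalidated
        self.k = k

    def fit(self, X, y):
        # Validation moved from __init__ to fit
        if not isinstance(self.k, int) or self.k <= 0:
            raise ValueError("k must be a positive integer")
        X, y = check_X_y(X, y)
        # Attributes estimated from the data get a trailing underscore
        self.classes_ = np.unique(y)
        self.n_classifiers_ = 10  # e.g. size of a generated pool
        return self

# BaseEstimator supplies get_params/set_params, used by clone and GridSearchCV
print(ToyDS(k=3).get_params())  # {'k': 3}
```

Storing parameters verbatim in `__init__` is what makes `clone` and grid search work, which is why `check_estimator` enforces it.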

* Updating the test routines according to the new attribute names and parameter validation scheme

* PEP8 formatting

* Refactoring according to sklearn guidelines: changing the names of class attributes that are estimated from the data (in the fit method)

* Updating test routines according to the attributes name change

* Refactoring according to sklearn guidelines:

- Moving code to validate parameters from __init__ to fit
- Changing attribute names (adding a trailing underscore to attributes estimated from the data)

* Updating test routines according to the attribute-name refactoring and the new method for validating the estimator parameters

* Fixing problem with indentation

* Refactoring: moving code that validates parameters to the fit method; changing the attribute names (sklearn standard); and accepting a clustering method as an input parameter.

* Updated test routines for the DESClustering class according to the new guidelines.

* Adding code to verify whether the object passed as the clustering method is part of the sklearn clustering classes.
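One way to implement such a verification (a sketch; the helper name is made up) relies on the fact that sklearn clustering classes all inherit from `ClusterMixin`:

```python
from sklearn.base import ClusterMixin
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def check_clustering_method(obj):
    """Hypothetical helper: accept only sklearn-style clustering estimators."""
    if not isinstance(obj, ClusterMixin):
        raise ValueError("clustering must be a sklearn clustering estimator, "
                         "got %s" % type(obj).__name__)
    return obj

check_clustering_method(KMeans(n_clusters=3))          # passes
try:
    check_clustering_method(DecisionTreeClassifier())  # not a clusterer
except ValueError as e:
    print(e)
```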

* Updating the test routines that check if the base classifier implements the predict_proba function (Now the check happens inside the fit method)

* Moving the _check_predict_proba function to the fit method.

* Refactoring: remove old DFP masks

* Refactoring

* - Changing default value of pool_classifiers to None
- Modifying name of random state attribute from rng to random_state

* Updating the n_classifiers_ attribute in the test routines

* Changing the name of the attribute rng to random_state in the integration tests.

* Fixing error in the docstring (return value of the method)

* Changing check for proba after the fit method; refactoring attribute names according to sklearn guidelines

* Adding random_state parameter

* Adding random_state parameter

* Adding the DFP and IH and random_state hyper-parameters to DESMI class.

* changing random_state default value
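Defaulting `random_state` to `None` and resolving it later with sklearn's `check_random_state` (instead of storing a `RandomState` instance in `__init__`, as the old `rng` parameter did) keeps estimators clonable and grid-searchable:

```python
import numpy as np
from sklearn.utils import check_random_state

rng = check_random_state(None)  # -> the global np.random RandomState
rng = check_random_state(42)    # -> a RandomState seeded with 42

# The same seed reproduces the same draws:
a = check_random_state(42).randint(100, size=3)
b = check_random_state(42).randint(100, size=3)
print(np.array_equal(a, b))  # True
```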

* Adding DESMI to the list of DES techniques

* Adding DES Logarithmic

* Making DS clustering compatible with sklearn estimators guidelines.

* Making DESKNN compatible with sklearn estimators guidelines.

* Making KNOP compatible with sklearn estimators guidelines.

* Making META-DES compatible with sklearn estimators guidelines.

* Making Probabilistic techniques compatible with sklearn estimators guidelines.

* Updating test routines according to the new changes in variable names; removing unused test cases

* Updating test routines according to the new variable names; Removing obsolete test functions

* Updating name of variables estimated from the data according to the sklearn guidelines

* Adding random_state to the clustering definition

* Making DESMI class an sklearn estimator

* Merging with master branch

* Adding sklearn's "check_estimator"  tests (#84)

* Adding sklearn's "check_estimator" for probabilistic DS methods
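The `check_estimator` test itself is a one-liner per class; `KNeighborsClassifier` stands in below for the DS classes, which are not reproduced here. Note that sklearn releases around 0.19 accepted the class itself, while recent ones expect an instance:

```python
from sklearn.utils.estimator_checks import check_estimator
from sklearn.neighbors import KNeighborsClassifier

# Runs the full battery of API-conformance checks; raises an exception
# on the first failed check, returns quietly otherwise.
check_estimator(KNeighborsClassifier())
print("all checks passed")
```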

* Adding test to show #89 is indeed a problem

* Adding warning on base class (k bigger than DSEL) #93

* Adding known issue with GridSearch #89

* Fixes #91

* Marking the grid search test to skip (#89)

* Adding tests for python 3.7 (#98)

* Workaround for travis 3.7 support (#98)

* Fix #92

* adding pytest_cache to the list of ignored folders

* removing .idea from project

* Fixing problem with rng in DCS classes when using "random" or "diff" as selection method (rng during predict/predict_proba). Fixes #88

* Base class for static ensembles

* Making SingleBest an sklearn estimator

* Making StaticSelection a sklearn estimator

* Removing unused imports

* Making Oracle class compatible with sklearn

* Using sklearn check_array to assert a given array is 2d
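`check_array` replaces hand-rolled shape assertions: with its default `ensure_2d=True` it rejects 1-d input with an informative error instead of letting it propagate. For example:

```python
import numpy as np
from sklearn.utils import check_array

X = check_array(np.array([[1.0, 2.0], [3.0, 4.0]]))  # 2-d: passes through
print(X.shape)  # (2, 2)

try:
    check_array(np.array([1.0, 2.0, 3.0]))  # 1-d: rejected
except ValueError as e:
    print("rejected:", e)
```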

* Removing commented code lines

* Fixing docstring on static ensemble classes; Solving a bug with label encoder for the single best class

* Adding license information

* automatically convert array to 2d

* Updating tests with Oracle technique (using fit to setup label encoder)

* updating oracle tests (setup label encoder in the fit method)

* updating test; removing check estimator from Oracle since it is not a real classifier

* Adding check array to predict.

* Enforcing that the predictions of the base classifiers are integers.
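Integer predictions are typically guaranteed by routing labels through a `LabelEncoder` at fit time, so that string or arbitrary labels become indices usable with the internal score arrays; a sketch (not the exact DESlib code):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
y = np.array(["cat", "dog", "cat", "bird"])
y_int = enc.fit_transform(y)   # classes_ sorted -> bird=0, cat=1, dog=2
print(y_int)                   # [1 2 1 0]
assert np.issubdtype(y_int.dtype, np.integer)
print(enc.inverse_transform(y_int))  # back to the original labels
```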

* Fixing random state bug

* removing commented lines of code

* adding kdn score method

* PEP8 formatting; Cleaning commented code.

* Adding license information

* Adding license information; moving kdn_score function to utils.instance_hardness.py; Adding Label encoder; Refactoring variable names according to sklearn standards

* removing unused code

* Adding check_estimator test for OLP method

* Solving problem with label encoder when no base classifier predicts the correct label

* Test routines for the SGH class

* Adding predict proba; Checking if the method was fitted before calling predict and predict_proba.

* Adding checks to raise an error in regression problems

* skipping test while the batch processing version is not implemented

* Adding parameter to indicate percentage of data used for DSEL in the training-DSEL split
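The split behind that parameter (called `DSEL_perc` in the commit messages) is an ordinary `train_test_split`: when the pool is None or unfitted, one part of the data trains the pool and the rest becomes DSEL. A sketch:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

DSEL_perc = 0.5  # fraction of the input data reserved for DSEL
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X, y, test_size=DSEL_perc, random_state=0)
print(X_train.shape, X_dsel.shape)  # (5, 2) (5, 2)
```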

* Updating variable names.

* Updating requirements version (sklearn 0.19) due to estimators check

* Updating requirements version (sklearn 0.19) due to estimators check

* Updating requirements; travis

* Print values of N_ and J_ on error

* Fixed checks for pct_accuracy

* Fix test name

* Fixing test
Luiz Gustavo Hafemann authored and Menelau committed Sep 28, 2018
1 parent de1d444 commit 6f83160
Showing 67 changed files with 2,461 additions and 1,436 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ build
dist
DESlib.egg-info
.idea
.pytest_cache
6 changes: 0 additions & 6 deletions .idea/vcs.xml

This file was deleted.

7 changes: 6 additions & 1 deletion .travis.yml
@@ -2,6 +2,7 @@ language: python
python:
- "3.5"
- "3.6"

install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- bash miniconda.sh -b -p $HOME/miniconda
@@ -15,7 +16,11 @@ install:
- travis_retry pip install codecov
- travis_retry python setup.py build
- travis_retry python setup.py install
- travis_retry conda install faiss-cpu -c pytorch
# Faiss requires anaconda and only works for python 3.5 and 3.6:
- if [[ "$TRAVIS_PYTHON_VERSION" != "3.7" ]]; then
travis_retry conda install faiss-cpu -c pytorch;
fi

script: coverage run -m py.test
after_success:
- codecov
2 changes: 1 addition & 1 deletion README.rst
@@ -57,7 +57,7 @@ Latest version (under development):
Dependencies:
-------------

DESlib is tested to work with Python 3.5, and 3.6. The dependency requirements are:
DESlib is tested to work with Python 3.5, 3.6 and 3.7. The dependency requirements are:

* scipy(>=0.13.3)
* numpy(>=1.10.4)
314 changes: 157 additions & 157 deletions deslib/base.py

Large diffs are not rendered by default.

48 changes: 32 additions & 16 deletions deslib/dcs/a_posteriori.py
@@ -28,9 +28,10 @@ class APosteriori(DCS):
Parameters
----------
pool_classifiers : list of classifiers
pool_classifiers : list of classifiers (Default = None)
The generated_pool of classifiers trained for the corresponding classification problem.
The classifiers should support methods "predict" and "predict_proba".
Each base classifier should support the methods "predict" and "predict_proba".
If None, then the pool of classifiers is a bagging classifier.
k : int (Default = 7)
Number of neighbors used to estimate the competence of the base classifiers.
@@ -58,8 +59,11 @@ class APosteriori(DCS):
classifiers for the random and diff selection schemes. If the difference is lower than the
threshold, their performance are considered equivalent.
rng : numpy.random.RandomState instance
Random number generator to assure reproducible results.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
knn_classifier : {'knn', 'faiss', None} (Default = 'knn')
The algorithm used to estimate the region of competence:
@@ -68,6 +72,10 @@ class APosteriori(DCS):
- 'faiss' will use Facebook's Faiss similarity search through the :class:`FaissKNNClassifier`
- None, will use sklearn :class:`KNeighborsClassifier`.
DSEL_perc : float (Default = 0.5)
Percentage of the input data used to fit DSEL.
Note: This parameter is only used if the pool of classifiers is None or unfitted.
References
----------
G. Giacinto and F. Roli, Methods for Dynamic Classifier Selection
@@ -83,14 +91,20 @@ class APosteriori(DCS):
Information Fusion, vol. 41, pp. 195 – 216, 2018.
"""
def __init__(self, pool_classifiers, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, rng=np.random.RandomState(), knn_classifier='knn'):

super(APosteriori, self).__init__(pool_classifiers, k, DFP=DFP, with_IH=with_IH, safe_k=safe_k, IH_rate=IH_rate,
def __init__(self, pool_classifiers=None, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, random_state=None, knn_classifier='knn', DSEL_perc=0.5):

super(APosteriori, self).__init__(pool_classifiers=pool_classifiers,
k=k,
DFP=DFP,
with_IH=with_IH,
safe_k=safe_k,
IH_rate=IH_rate,
selection_method=selection_method,
diff_thresh=diff_thresh,
rng=rng, knn_classifier=knn_classifier)
self._check_predict_proba()
knn_classifier=knn_classifier,
random_state=random_state,
DSEL_perc=DSEL_perc)

self.name = 'A Posteriori'

@@ -112,7 +126,9 @@ class labels of each example in X.
self
"""
super(APosteriori, self).fit(X, y)
self.dsel_scores = self._preprocess_dsel_scores()
self._check_predict_proba()

self.dsel_scores_ = self._preprocess_dsel_scores()
return self

def estimate_competence(self, query, predictions=None):
@@ -150,26 +166,26 @@ def estimate_competence(self, query, predictions=None):
predictions = np.atleast_2d(predictions)

# Normalize the distances
dists_normalized = 1.0/dists
dists_normalized = 1.0 / dists

# Expanding the dimensions of the predictions and target arrays in order to compare both.
predictions_3d = np.expand_dims(predictions, axis=1)
target_3d = np.expand_dims(self.DSEL_target[idx_neighbors], axis=2)
target_3d = np.expand_dims(self.DSEL_target_[idx_neighbors], axis=2)
# Create a mask to remove the neighbors belonging to a different class than the predicted by the base classifier
mask = (predictions_3d != target_3d)

# Broadcast the distance array to the same shape as the pre-processed information for future calculations
dists_normalized = np.repeat(np.expand_dims(dists_normalized, axis=2), self.n_classifiers, axis=2)
dists_normalized = np.repeat(np.expand_dims(dists_normalized, axis=2), self.n_classifiers_, axis=2)

# Multiply the pre-processed correct predictions by the base classifiers to the distance array
scores_target_norm = self.dsel_scores[idx_neighbors, :, self.DSEL_target[idx_neighbors]] * dists_normalized
scores_target_norm = self.dsel_scores_[idx_neighbors, :, self.DSEL_target_[idx_neighbors]] * dists_normalized

# Create masked arrays to remove samples with different label in the calculations
masked_preprocessed = np.ma.MaskedArray(scores_target_norm, mask=mask)
masked_dist = np.ma.MaskedArray(dists_normalized, mask=mask)

# Consider only the neighbor samples where the predicted label is equals to the neighbor label
competences_masked = np.ma.sum(masked_preprocessed, axis=1)/ np.ma.sum(masked_dist, axis=1)
competences_masked = np.ma.sum(masked_preprocessed, axis=1) / np.ma.sum(masked_dist, axis=1)

# Fill 0 to the masked values in the resulting array (when no neighbors belongs to the class predicted by
# the corresponding base classifier)
45 changes: 31 additions & 14 deletions deslib/dcs/a_priori.py
@@ -24,9 +24,10 @@ class APriori(DCS):
Parameters
----------
pool_classifiers : list of classifiers
pool_classifiers : list of classifiers (Default = None)
The generated_pool of classifiers trained for the corresponding classification problem.
The classifiers should support methods "predict" and "predict_proba".
Each base classifier should support the methods "predict" and "predict_proba".
If None, then the pool of classifiers is a bagging classifier.
k : int (Default = 7)
Number of neighbors used to estimate the competence of the base classifiers.
@@ -54,8 +55,11 @@ class APriori(DCS):
classifiers for the random and diff selection schemes. If the difference is lower than the
threshold, their performance are considered equivalent.
rng : numpy.random.RandomState instance
Random number generator to assure reproducible results.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
knn_classifier : {'knn', 'faiss', None} (Default = 'knn')
The algorithm used to estimate the region of competence:
@@ -64,6 +68,10 @@ class APriori(DCS):
- 'faiss' will use Facebook's Faiss similarity search through the :class:`FaissKNNClassifier`
- None, will use sklearn :class:`KNeighborsClassifier`.
DSEL_perc : float (Default = 0.5)
Percentage of the input data used to fit DSEL.
Note: This parameter is only used if the pool of classifiers is None or unfitted.
References
----------
G. Giacinto and F. Roli, Methods for Dynamic Classifier Selection
@@ -79,14 +87,21 @@ class APriori(DCS):
Information Fusion, vol. 41, pp. 195 – 216, 2018.
"""
def __init__(self, pool_classifiers, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, rng=np.random.RandomState(), knn_classifier='knn'):

super(APriori, self).__init__(pool_classifiers, k, DFP=DFP, with_IH=with_IH, safe_k=safe_k, IH_rate=IH_rate,
def __init__(self, pool_classifiers=None, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, random_state=None, knn_classifier='knn', DSEL_perc=0.33):

super(APriori, self).__init__(pool_classifiers=pool_classifiers,
k=k,
DFP=DFP,
with_IH=with_IH,
safe_k=safe_k,
IH_rate=IH_rate,
selection_method=selection_method,
diff_thresh=diff_thresh,
rng=rng, knn_classifier=knn_classifier)
self._check_predict_proba()
random_state=random_state,
knn_classifier=knn_classifier,
DSEL_perc=DSEL_perc)

self.name = 'A Priori'

@@ -108,7 +123,9 @@ class labels of each example in X.
self
"""
super(APriori, self).fit(X, y)
self.dsel_scores = self._preprocess_dsel_scores()
self._check_predict_proba()

self.dsel_scores_ = self._preprocess_dsel_scores()
return self

def estimate_competence(self, query, predictions=None):
@@ -121,7 +138,7 @@ def estimate_competence(self, query, predictions=None):
a higher influence in the computation of the competence level. The
competence level estimate is represented by the following equation:
.. math:: \\delta_{i,j} = \\frac{\\sum_{k = 1}^{K}P(\\omega_{l} \\mid
.. math:: \\delta_{i,j} = \\frac{\\sum_{k = 1}^{K}P(\\omega_{l} \\mid
\mathbf{x}_{k} \\in \\omega_{l}, c_{i} )W_{k}}{\\sum_{k = 1}^{K}W_{k}}
where :math:`\\delta_{i,j}` represents the competence level of :math:`c_{i}` for the classification of
@@ -141,15 +158,15 @@ def estimate_competence(self, query, predictions=None):
Competence level estimated for each base classifier and test example.
"""
dists, idx_neighbors = self._get_region_competence(query)
dists_normalized = 1.0/dists
dists_normalized = 1.0 / dists

# Get the ndarray containing the scores obtained for the correct class for each neighbor (and test sample)
scores_target_class = self.dsel_scores[idx_neighbors, :, self.DSEL_target[idx_neighbors]]
scores_target_class = self.dsel_scores_[idx_neighbors, :, self.DSEL_target_[idx_neighbors]]

# Multiply the scores obtained for the correct class to the distances of each corresponding neighbor
scores_target_class *= np.expand_dims(dists_normalized, axis=2)

# Sum the scores obtained for each neighbor and divide by the sum of all distances
competences = np.sum(scores_target_class, axis=1)/ np.sum(dists_normalized, axis=1, keepdims=True)
competences = np.sum(scores_target_class, axis=1) / np.sum(dists_normalized, axis=1, keepdims=True)

return competences