Compatibility with scikit-learn (v0.19). Check_estimator passing for all DES/DCS/static classifiers.

* - Moving code to validate the parameters from __init__ to the fit method (sklearn style)

* Refactoring DCS classes: changing class attribute names to the sklearn style. Attributes estimated from the data now have a trailing underscore.

* Changes to make the class compatible with the sklearn standards:
- Moving code to validate the estimator parameters from __init__ to the fit method;
- Refactoring: changing class attribute names to the sklearn style. Attributes estimated from the data now have a trailing underscore;
- Adding BaseEstimator to the inherited classes to provide the get_params and set_params methods.
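The three changes above are the core of sklearn's estimator contract. A minimal sketch of the resulting pattern (the class name, `k`, and `n_classifiers_` value are illustrative, not actual DESlib code):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y

class ToyDS(BaseEstimator, ClassifierMixin):
    """Hypothetical estimator showing the three changes listed above."""

    def __init__(self, k=7):
        # sklearn style: __init__ only stores hyper-parameters, unvalidated
        self.k = k

    def fit(self, X, y):
        # Validation moved from __init__ to fit
        if not isinstance(self.k, int) or self.k <= 0:
            raise ValueError("k must be a positive integer")
        X, y = check_X_y(X, y)
        # Attributes estimated from the data get a trailing underscore
        self.classes_ = np.unique(y)
        self.n_classifiers_ = 10  # e.g. size of a generated pool
        return self

# BaseEstimator supplies get_params/set_params, used by clone and GridSearchCV
print(ToyDS(k=3).get_params())  # {'k': 3}
```

Storing parameters verbatim in `__init__` is what makes `clone` and grid search work, which is why `check_estimator` enforces it.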

* Updating the test routines according to the new attribute names and parameter validation scheme

* PEP8 formatting

* Refactoring according to sklearn guidelines: changing the names of class attributes that are estimated from the data (in the fit method)

* Updating test routines according to the attributes name change

* Refactoring according to sklearn guidelines:

- Moving code to validate parameters from __init__ to fit
- Changing attribute names (adding a trailing underscore to attributes estimated from the data)

* Updating test routines according to the attribute-name refactoring and the new method for validating the estimator parameters

* Fixing problem with indentation

* Refactoring: moving code that validates parameters to the fit method; changing the attribute names (sklearn standard); and accepting a clustering method as an input parameter.

* Updated test routines for the DESClustering class according to the new guidelines.

* Adding code to verify whether the object passed as the clustering method is part of the sklearn clustering classes.
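One way to implement such a verification (a sketch; the helper name is made up) relies on the fact that sklearn clustering classes all inherit from `ClusterMixin`:

```python
from sklearn.base import ClusterMixin
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def check_clustering_method(obj):
    """Hypothetical helper: accept only sklearn-style clustering estimators."""
    if not isinstance(obj, ClusterMixin):
        raise ValueError("clustering must be a sklearn clustering estimator, "
                         "got %s" % type(obj).__name__)
    return obj

check_clustering_method(KMeans(n_clusters=3))          # passes
try:
    check_clustering_method(DecisionTreeClassifier())  # not a clusterer
except ValueError as e:
    print(e)
```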

* Updating the test routines that check if the base classifier implements the predict_proba function (Now the check happens inside the fit method)

* Moving the _check_predict_proba function to the fit method.

* Refactoring: remove old DFP masks

* Refactoring

* - Changing default value of pool_classifiers to None
- Modifying name of random state attribute from rng to random_state

* Updating the n_classifiers_ attribute in the test routines

* Changing the name of the attribute rng to random_state in the integration tests.

* Fixing error in the docstring (return value of the method)

* Changing check for proba after the fit method; refactoring attribute names according to sklearn guidelines

* Adding random_state parameter

* Adding random_state parameter

* Adding the DFP and IH and random_state hyper-parameters to DESMI class.

* changing random_state default value
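Defaulting `random_state` to `None` and resolving it later with sklearn's `check_random_state` (instead of storing a `RandomState` instance in `__init__`, as the old `rng` parameter did) keeps estimators clonable and grid-searchable:

```python
import numpy as np
from sklearn.utils import check_random_state

rng = check_random_state(None)  # -> the global np.random RandomState
rng = check_random_state(42)    # -> a RandomState seeded with 42

# The same seed reproduces the same draws:
a = check_random_state(42).randint(100, size=3)
b = check_random_state(42).randint(100, size=3)
print(np.array_equal(a, b))  # True
```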

* Adding DESMI to the list of DES techniques

* Adding DES Logarithmic

* Making DS clustering compatible with sklearn estimators guidelines.

* Making DESKNN compatible with sklearn estimators guidelines.

* Making KNOP compatible with sklearn estimators guidelines.

* Making META-DES compatible with sklearn estimators guidelines.

* Making Probabilistic techniques compatible with sklearn estimators guidelines.

* Updating test routines according to the new changes in variable names; removing unused test cases

* Updating test routines according to the new variable names; Removing obsolete test functions

* Updating name of variables estimated from the data according to the sklearn guidelines

* Adding random_state to the clustering definition

* Making DESMI class an sklearn estimator

* Merging with master branch

* Adding sklearn's "check_estimator"  tests (#84)

* Adding sklearn's "check_estimator" for probabilistic DS methods
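The `check_estimator` test itself is a one-liner per class; `KNeighborsClassifier` stands in below for the DS classes, which are not reproduced here. Note that sklearn releases around 0.19 accepted the class itself, while recent ones expect an instance:

```python
from sklearn.utils.estimator_checks import check_estimator
from sklearn.neighbors import KNeighborsClassifier

# Runs the full battery of API-conformance checks; raises an exception
# on the first failed check, returns quietly otherwise.
check_estimator(KNeighborsClassifier())
print("all checks passed")
```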

* Adding test to show #89 is indeed a problem

* Adding warning on base class (k bigger than DSEL) #93

* Adding known issue with GridSearch #89

* Fixes #91

* Marking the grid search test to skip (#89)

* Adding tests for python 3.7 (#98)

* Workaround for travis 3.7 support (#98)

* Fix #92

* adding pytest_cache to the list of ignored folders

* removing .idea from project

* Fixing problem with rng in DCS classes when using "random" or "diff" as selection method (rng during predict/predict_proba). Fixes #88

* Base class for static ensembles

* Making SingleBest an sklearn estimator

* Making StaticSelection a sklearn estimator

* Removing unused imports

* Making Oracle class compatible with sklearn

* Using sklearn check_array to assert a given array is 2d
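`check_array` replaces hand-rolled shape assertions: with its default `ensure_2d=True` it rejects 1-d input with an informative error instead of letting it propagate. For example:

```python
import numpy as np
from sklearn.utils import check_array

X = check_array(np.array([[1.0, 2.0], [3.0, 4.0]]))  # 2-d: passes through
print(X.shape)  # (2, 2)

try:
    check_array(np.array([1.0, 2.0, 3.0]))  # 1-d: rejected
except ValueError as e:
    print("rejected:", e)
```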

* Removing commented code lines

* Fixing docstring on static ensemble classes; Solving a bug with label encoder for the single best class

* Adding license information

* automatically convert array to 2d

* Updating tests with Oracle technique (using fit to setup label encoder)

* updating oracle tests (setup label encoder in the fit method)

* updating test; removing check estimator from Oracle since it is not a real classifier

* Adding check array to predict.

* Enforcing that the predictions of the base classifiers are integers.
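Integer predictions are typically guaranteed by routing labels through a `LabelEncoder` at fit time, so that string or arbitrary labels become indices usable with the internal score arrays; a sketch (not the exact DESlib code):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
y = np.array(["cat", "dog", "cat", "bird"])
y_int = enc.fit_transform(y)   # classes_ sorted -> bird=0, cat=1, dog=2
print(y_int)                   # [1 2 1 0]
assert np.issubdtype(y_int.dtype, np.integer)
print(enc.inverse_transform(y_int))  # back to the original labels
```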

* Fixing random state bug

* removing commented lines of code

* adding kdn score method

* PEP8 formatting; Cleaning commented code.

* Adding license information

* Adding license information; moving kdn_score function to utils.instance_hardness.py; Adding Label encoder; Refactoring variable names according to sklearn standards

* removing unused code

* Adding check_estimator test for OLP method

* Solving problem with label encoder when no base classifier predicts the correct label

* Test routines for the SGH class

* Adding predict proba; Checking if the method was fitted before calling predict and predict_proba.

* Adding checks to raise an error in regression problems

* skipping test while the batch processing version is not implemented

* Adding parameter to indicate percentage of data used for DSEL in the training-DSEL split
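The split behind that parameter (called `DSEL_perc` in the commit messages) is an ordinary `train_test_split`: when the pool is None or unfitted, one part of the data trains the pool and the rest becomes DSEL. A sketch:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

DSEL_perc = 0.5  # fraction of the input data reserved for DSEL
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X, y, test_size=DSEL_perc, random_state=0)
print(X_train.shape, X_dsel.shape)  # (5, 2) (5, 2)
```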

* Updating variable names.

* Updating requirements version (sklearn 0.19) due to estimators check

* Updating requirements version (sklearn 0.19) due to estimators check

* Updating requirements; travis

* Print values of N_ and J_ on error

* Fixed checks for pct_accuracy

* Fix test name

* Fixing test
Luiz Gustavo Hafemann authored and Menelau committed Sep 28, 2018
1 parent de1d444 commit 6f83160
Showing 67 changed files with 2,461 additions and 1,436 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ build
dist
DESlib.egg-info
.idea
.pytest_cache
6 changes: 0 additions & 6 deletions .idea/vcs.xml

This file was deleted.

7 changes: 6 additions & 1 deletion .travis.yml
@@ -2,6 +2,7 @@ language: python
python:
- "3.5"
- "3.6"

install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- bash miniconda.sh -b -p $HOME/miniconda
@@ -15,7 +16,11 @@ install:
- travis_retry pip install codecov
- travis_retry python setup.py build
- travis_retry python setup.py install
- travis_retry conda install faiss-cpu -c pytorch
# Faiss requires anaconda and only works for python 3.5 and 3.6:
- if [[ "$TRAVIS_PYTHON_VERSION" != "3.7" ]]; then
travis_retry conda install faiss-cpu -c pytorch;
fi

script: coverage run -m py.test
after_success:
- codecov
2 changes: 1 addition & 1 deletion README.rst
@@ -57,7 +57,7 @@ Latest version (under development):
Dependencies:
-------------

DESlib is tested to work with Python 3.5, and 3.6. The dependency requirements are:
DESlib is tested to work with Python 3.5, 3.6 and 3.7. The dependency requirements are:

* scipy(>=0.13.3)
* numpy(>=1.10.4)
314 changes: 157 additions & 157 deletions deslib/base.py

Large diffs are not rendered by default.

48 changes: 32 additions & 16 deletions deslib/dcs/a_posteriori.py
@@ -28,9 +28,10 @@ class APosteriori(DCS):
Parameters
----------
pool_classifiers : list of classifiers
pool_classifiers : list of classifiers (Default = None)
The generated_pool of classifiers trained for the corresponding classification problem.
The classifiers should support methods "predict" and "predict_proba".
Each base classifier should support the methods "predict" and "predict_proba".
If None, then the pool of classifiers is a bagging classifier.
k : int (Default = 7)
Number of neighbors used to estimate the competence of the base classifiers.
@@ -58,8 +59,11 @@ class APosteriori(DCS):
classifiers for the random and diff selection schemes. If the difference is lower than the
threshold, their performance are considered equivalent.
rng : numpy.random.RandomState instance
Random number generator to assure reproducible results.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
knn_classifier : {'knn', 'faiss', None} (Default = 'knn')
The algorithm used to estimate the region of competence:
@@ -68,6 +72,10 @@ class APosteriori(DCS):
- 'faiss' will use Facebook's Faiss similarity search through the :class:`FaissKNNClassifier`
- None, will use sklearn :class:`KNeighborsClassifier`.
DSEL_perc : float (Default = 0.5)
Percentage of the input data used to fit DSEL.
Note: This parameter is only used if the pool of classifiers is None or unfitted.
References
----------
G. Giacinto and F. Roli, Methods for Dynamic Classifier Selection
@@ -83,14 +91,20 @@ class APosteriori(DCS):
Information Fusion, vol. 41, pp. 195 – 216, 2018.
"""
def __init__(self, pool_classifiers, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, rng=np.random.RandomState(), knn_classifier='knn'):

super(APosteriori, self).__init__(pool_classifiers, k, DFP=DFP, with_IH=with_IH, safe_k=safe_k, IH_rate=IH_rate,
def __init__(self, pool_classifiers=None, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, random_state=None, knn_classifier='knn', DSEL_perc=0.5):

super(APosteriori, self).__init__(pool_classifiers=pool_classifiers,
k=k,
DFP=DFP,
with_IH=with_IH,
safe_k=safe_k,
IH_rate=IH_rate,
selection_method=selection_method,
diff_thresh=diff_thresh,
rng=rng, knn_classifier=knn_classifier)
self._check_predict_proba()
knn_classifier=knn_classifier,
random_state=random_state,
DSEL_perc=DSEL_perc)

self.name = 'A Posteriori'

@@ -112,7 +126,9 @@ class labels of each example in X.
self
"""
super(APosteriori, self).fit(X, y)
self.dsel_scores = self._preprocess_dsel_scores()
self._check_predict_proba()

self.dsel_scores_ = self._preprocess_dsel_scores()
return self

def estimate_competence(self, query, predictions=None):
@@ -150,26 +166,26 @@ def estimate_competence(self, query, predictions=None):
predictions = np.atleast_2d(predictions)

# Normalize the distances
dists_normalized = 1.0/dists
dists_normalized = 1.0 / dists

# Expanding the dimensions of the predictions and target arrays in order to compare both.
predictions_3d = np.expand_dims(predictions, axis=1)
target_3d = np.expand_dims(self.DSEL_target[idx_neighbors], axis=2)
target_3d = np.expand_dims(self.DSEL_target_[idx_neighbors], axis=2)
# Create a mask to remove the neighbors belonging to a different class than the predicted by the base classifier
mask = (predictions_3d != target_3d)

# Broadcast the distance array to the same shape as the pre-processed information for future calculations
dists_normalized = np.repeat(np.expand_dims(dists_normalized, axis=2), self.n_classifiers, axis=2)
dists_normalized = np.repeat(np.expand_dims(dists_normalized, axis=2), self.n_classifiers_, axis=2)

# Multiply the pre-processed correct predictions by the base classifiers to the distance array
scores_target_norm = self.dsel_scores[idx_neighbors, :, self.DSEL_target[idx_neighbors]] * dists_normalized
scores_target_norm = self.dsel_scores_[idx_neighbors, :, self.DSEL_target_[idx_neighbors]] * dists_normalized

# Create masked arrays to remove samples with different label in the calculations
masked_preprocessed = np.ma.MaskedArray(scores_target_norm, mask=mask)
masked_dist = np.ma.MaskedArray(dists_normalized, mask=mask)

# Consider only the neighbor samples where the predicted label is equals to the neighbor label
competences_masked = np.ma.sum(masked_preprocessed, axis=1)/ np.ma.sum(masked_dist, axis=1)
competences_masked = np.ma.sum(masked_preprocessed, axis=1) / np.ma.sum(masked_dist, axis=1)

# Fill 0 to the masked values in the resulting array (when no neighbors belongs to the class predicted by
# the corresponding base classifier)
45 changes: 31 additions & 14 deletions deslib/dcs/a_priori.py
@@ -24,9 +24,10 @@ class APriori(DCS):
Parameters
----------
pool_classifiers : list of classifiers
pool_classifiers : list of classifiers (Default = None)
The generated_pool of classifiers trained for the corresponding classification problem.
The classifiers should support methods "predict" and "predict_proba".
Each base classifier should support the methods "predict" and "predict_proba".
If None, then the pool of classifiers is a bagging classifier.
k : int (Default = 7)
Number of neighbors used to estimate the competence of the base classifiers.
@@ -54,8 +55,11 @@ class APriori(DCS):
classifiers for the random and diff selection schemes. If the difference is lower than the
threshold, their performance are considered equivalent.
rng : numpy.random.RandomState instance
Random number generator to assure reproducible results.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
knn_classifier : {'knn', 'faiss', None} (Default = 'knn')
The algorithm used to estimate the region of competence:
@@ -64,6 +68,10 @@ class APriori(DCS):
- 'faiss' will use Facebook's Faiss similarity search through the :class:`FaissKNNClassifier`
- None, will use sklearn :class:`KNeighborsClassifier`.
DSEL_perc : float (Default = 0.5)
Percentage of the input data used to fit DSEL.
Note: This parameter is only used if the pool of classifiers is None or unfitted.
References
----------
G. Giacinto and F. Roli, Methods for Dynamic Classifier Selection
@@ -79,14 +87,21 @@ class APriori(DCS):
Information Fusion, vol. 41, pp. 195 – 216, 2018.
"""
def __init__(self, pool_classifiers, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, rng=np.random.RandomState(), knn_classifier='knn'):

super(APriori, self).__init__(pool_classifiers, k, DFP=DFP, with_IH=with_IH, safe_k=safe_k, IH_rate=IH_rate,
def __init__(self, pool_classifiers=None, k=7, DFP=False, with_IH=False, safe_k=None, IH_rate=0.30,
selection_method='diff', diff_thresh=0.1, random_state=None, knn_classifier='knn', DSEL_perc=0.33):

super(APriori, self).__init__(pool_classifiers=pool_classifiers,
k=k,
DFP=DFP,
with_IH=with_IH,
safe_k=safe_k,
IH_rate=IH_rate,
selection_method=selection_method,
diff_thresh=diff_thresh,
rng=rng, knn_classifier=knn_classifier)
self._check_predict_proba()
random_state=random_state,
knn_classifier=knn_classifier,
DSEL_perc=DSEL_perc)

self.name = 'A Priori'

@@ -108,7 +123,9 @@ class labels of each example in X.
self
"""
super(APriori, self).fit(X, y)
self.dsel_scores = self._preprocess_dsel_scores()
self._check_predict_proba()

self.dsel_scores_ = self._preprocess_dsel_scores()
return self

def estimate_competence(self, query, predictions=None):
@@ -121,7 +138,7 @@ def estimate_competence(self, query, predictions=None):
a higher influence in the computation of the competence level. The
competence level estimate is represented by the following equation:
.. math:: \\delta_{i,j} = \\frac{\\sum_{k = 1}^{K}P(\\omega_{l} \\mid
.. math:: \\delta_{i,j} = \\frac{\\sum_{k = 1}^{K}P(\\omega_{l} \\mid
\mathbf{x}_{k} \\in \\omega_{l}, c_{i} )W_{k}}{\\sum_{k = 1}^{K}W_{k}}
where :math:`\\delta_{i,j}` represents the competence level of :math:`c_{i}` for the classification of
@@ -141,15 +158,15 @@ def estimate_competence(self, query, predictions=None):
Competence level estimated for each base classifier and test example.
"""
dists, idx_neighbors = self._get_region_competence(query)
dists_normalized = 1.0/dists
dists_normalized = 1.0 / dists

# Get the ndarray containing the scores obtained for the correct class for each neighbor (and test sample)
scores_target_class = self.dsel_scores[idx_neighbors, :, self.DSEL_target[idx_neighbors]]
scores_target_class = self.dsel_scores_[idx_neighbors, :, self.DSEL_target_[idx_neighbors]]

# Multiply the scores obtained for the correct class to the distances of each corresponding neighbor
scores_target_class *= np.expand_dims(dists_normalized, axis=2)

# Sum the scores obtained for each neighbor and divide by the sum of all distances
competences = np.sum(scores_target_class, axis=1)/ np.sum(dists_normalized, axis=1, keepdims=True)
competences = np.sum(scores_target_class, axis=1) / np.sum(dists_normalized, axis=1, keepdims=True)

return competences