Merge pull request #7 from rodrigo-arenas/0.4.0dev
plots update
rodrigo-arenas committed May 31, 2021
2 parents d699810 + 0047685 commit 05f6fb8
Showing 15 changed files with 94 additions and 41 deletions.
1 change: 0 additions & 1 deletion .travis.yml
@@ -1,6 +1,5 @@
language: python
python:
- 3.6
- 3.7
- 3.8
- 3.9
34 changes: 15 additions & 19 deletions README.rst
@@ -16,6 +17,7 @@
.. |Docs| image:: https://readthedocs.org/projects/sklearn-genetic-opt/badge/?version=latest
.. _Docs: https://sklearn-genetic-opt.readthedocs.io/en/latest/?badge=latest

.. image:: /docs/logo.png

Sklearn-genetic-opt
###################
@@ -24,7 +25,19 @@ scikit-learn models hyperparameters tuning, using evolutionary algorithms.

This is meant to be an alternative to popular methods inside scikit-learn, such as Grid Search and Randomized Grid Search.

Sklearn-genetic-opt uses evolutionary algorithms from the deap package to find the "best" set of hyperparameters that
optimizes (max or min) the cross-validation scores; it can be used for both regression and classification problems.

Documentation is available `here <https://sklearn-genetic-opt.readthedocs.io/>`_

Optimization progress in a regression problem:

.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/fitness.png?raw=True

Sampled distribution of hyperparameters:

.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/density.png?raw=True


Main Features:
##############
@@ -49,7 +62,6 @@ Example
.. code-block:: python
from sklearn_genetic import GASearchCV
from sklearn_genetic.plots import plot_fitness_evolution, plot_search_space
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
@@ -91,14 +103,6 @@ Example
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))
# Check the distribution of sampled hyperparameters
plot_search_space(evolved_estimator, features=['min_weight_fraction_leaf', 'max_depth', 'max_leaf_nodes', 'n_estimators'])
plt.show()
# See the evolution of the optimization per generation
plot_fitness_evolution(evolved_estimator)
plt.show()
# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
@@ -107,17 +111,9 @@ Example
Results
^^^^^^^^^

Sampled distribution

.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/0.4.x/demo/images/density.png?raw=True

Fitness evolution over generations

.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/0.4.x/demo/images/fitness.png?raw=True

Log controlled by verbosity

.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/0.4.x/demo/images/log.JPG?raw=True
.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/demo/images/log.JPG?raw=True


Contributing
2 changes: 1 addition & 1 deletion demo/Boson_Houses_decision_tree.py
@@ -30,7 +30,7 @@
clf,
cv=3,
scoring="r2",
population_size=12,
population_size=20,
generations=30,
tournament_size=3,
elitism=True,
Binary file modified demo/images/density.png
Binary file modified demo/images/fitness.png
3 changes: 2 additions & 1 deletion dev-requirements.txt
@@ -11,4 +11,5 @@ black==21.5b2
sphinx
sphinx_gallery
sphinx_rtd_theme
sphinx-copybutton
sphinx-copybutton
numpydoc
7 changes: 7 additions & 0 deletions docs/conf.py
@@ -64,6 +64,13 @@
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

html_theme_options = {
"navigation_depth": 3,
"logo_only": True,
}

html_logo = "logo.png"

master_doc = "index"

# generate autosummary even if no references
Binary file modified docs/images/basic_usage_plot_space_4.png
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -3,10 +3,10 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to sklean-genetic-opt's documentation!
==============================================

sklearn-genetic-opt
===================
scikit-learn models hyperparameters tuning, using evolutionary algorithms.
##########################################################################

This is meant to be an alternative to popular methods inside scikit-learn, such as Grid Search and Randomized Grid Search.

Binary file added docs/logo.png
2 changes: 1 addition & 1 deletion docs/tutorials/callbacks.rst
@@ -70,7 +70,7 @@ using the 'fitness_min' value:
ThresholdStopping
-----------------
IT stop the optimization if the current metric
It stops the optimization if the current metric
is greater than or equal to the defined threshold.

For example, if we want to stop the optimization
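For reference, here is a minimal sketch of how this callback could be used, continuing the ``GASearchCV`` example from the README above. The ``ThresholdStopping`` constructor matches the ``__init__`` shown later in ``sklearn_genetic/callbacks.py``; that callbacks are passed to ``fit`` this way is an assumption, so check the project docs for your version:

.. code-block:: python

    from sklearn_genetic.callbacks import ThresholdStopping

    # Stop as soon as the current fitness reaches 0.98
    callback = ThresholdStopping(threshold=0.98, metric="fitness")

    # Assumption: the callback is handed to fit()
    evolved_estimator.fit(X_train, y_train, callbacks=callback)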
10 changes: 5 additions & 5 deletions docs/tutorials/custom_callback.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ but you can make one of your own by defining a callable with
certain methods.

The callback must be a class that implements the ``__call__`` and
``_check`` methods, the result of them must be a bool, ``True`` means
``on_step`` methods; the result of each must be a bool, where ``True`` means
that the optimization must stop and ``False`` means it can continue.

In this example, we are going to define a dummy callback that
@@ -17,7 +17,7 @@ The callback must have two parameters: `record` and `logbook`.
Those are a dictionary and a deap ``Logbook`` object, respectively,
with the current iteration metrics and all the past iterations metrics.
You can choose which to use, but both must be parameters
on the ``_check`` and ``__call__`` methods.
on the ``on_step`` and ``__call__`` methods.

So to check inside the logbook, we could define a function like this:

@@ -27,7 +27,7 @@ So to check inside the logbook, we could define a function like this:
metric='fitness'
threshold=0.8
def _check(record, logbook, threshold):
def on_step(record, logbook, threshold):
# Not enough data points
if len(logbook) <= N:
return False
@@ -53,7 +53,7 @@ that will have all these parameters, so we can rewrite it like this:
self.N = N
self.metric = metric
def _check(self, record, logbook):
def on_step(self, record, logbook):
# Not enough data points
if len(logbook) <= self.N:
return False
@@ -68,7 +68,7 @@ that will have all these parameters, so we can rewrite it like this:
return False
def __call__(self, record, logbook):
return self._check(record, logbook)
return self.on_step(record, logbook)
So that is it; now you can initialize the DummyThreshold
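As a quick sanity check, the finished class can be driven by hand with a deap ``Logbook``. This is a toy sketch; the ``DummyThreshold(threshold, N, metric)`` argument order is an assumption, since the full ``__init__`` is elided above:

.. code-block:: python

    from deap import tools

    # Hypothetical argument order; adjust to your own __init__
    callback = DummyThreshold(threshold=0.8, N=4, metric="fitness")

    logbook = tools.Logbook()
    for fitness in [0.5, 0.6, 0.7, 0.75, 0.85]:
        logbook.record(fitness=fitness)
        # True means "stop the optimization", False means "keep going"
        print(callback(record={"fitness": fitness}, logbook=logbook))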
43 changes: 43 additions & 0 deletions ims.py
@@ -0,0 +1,43 @@
import matplotlib.pyplot as plt
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Categorical, Integer, Continuous
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
'bootstrap': Categorical([True, False]),
'max_depth': Integer(2, 30),
'max_leaf_nodes': Integer(2, 35),
'n_estimators': Integer(100, 300)}

# The base classifier to tune
clf = RandomForestClassifier()

# Our cross-validation strategy (optional)
cv = StratifiedKFold(n_splits=3, shuffle=True)

# The main class from sklearn-genetic-opt
evolved_estimator = GASearchCV(estimator=clf,
cv=cv,
generations=20,
population_size=6,
scoring='accuracy',
param_grid=param_grid,
algorithm='eaSimple',
n_jobs=-1,
verbose=True)

# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)

from sklearn_genetic.plots import plot_search_space
plot_search_space(evolved_estimator, features=['min_weight_fraction_leaf', 'max_depth', 'max_leaf_nodes', 'n_estimators'])
plt.show()
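The script ends at the pair plot. A natural follow-up, mirroring the README example above (note that ``accuracy_score`` is imported but never used), would be:

.. code-block:: python

    from sklearn_genetic.plots import plot_fitness_evolution

    # Evaluate the tuned model on the held-out set
    y_predict = evolved_estimator.predict(X_test)
    print("Test accuracy:", accuracy_score(y_test, y_predict))

    # Fitness evolution per generation, as shown in the README
    plot_fitness_evolution(evolved_estimator)
    plt.show()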
12 changes: 6 additions & 6 deletions sklearn_genetic/callbacks.py
@@ -77,7 +77,7 @@ def __init__(self, threshold, metric="fitness"):
self.threshold = threshold
self.metric = metric

def _check(self, record, logbook):
def on_step(self, record, logbook):
"""
Parameters
----------
@@ -103,7 +103,7 @@ def _check(self, record, logbook):
)

def __call__(self, record=None, logbook=None):
return self._check(record, logbook)
return self.on_step(record, logbook)


class ConsecutiveStopping:
@@ -126,7 +126,7 @@ def __init__(self, generations, metric="fitness"):
self.generations = generations
self.metric = metric

def _check(self, record=None, logbook=None):
def on_step(self, record=None, logbook=None):
"""
Parameters
----------
@@ -156,7 +156,7 @@ def _check(self, record=None, logbook=None):
raise ValueError("logbook parameter must be provided")

def __call__(self, record=None, logbook=None):
return self._check(record, logbook)
return self.on_step(record, logbook)


class DeltaThreshold:
@@ -179,7 +179,7 @@ def __init__(self, threshold, metric: str = "fitness"):
self.threshold = threshold
self.metric = metric

def _check(self, record=None, logbook=None):
def on_step(self, record=None, logbook=None):
"""
Parameters
----------
@@ -210,4 +210,4 @@ def _check(self, record=None, logbook=None):
raise ValueError("logbook parameter must be provided")

def __call__(self, record=None, logbook=None):
return self._check(record, logbook)
return self.on_step(record, logbook)
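The rename keeps ``__call__`` as a thin wrapper, so the optimizer can treat every callback as a plain callable. A toy sketch of that contract using ``DeltaThreshold``; the stopping rule itself is elided in this diff, so the assumption that it compares the last two logged values (and tolerates the first generation) comes from the name and the tutorials:

.. code-block:: python

    from deap import tools
    from sklearn_genetic.callbacks import DeltaThreshold

    # Assumed rule: stop when fitness changes by less than 0.01
    callback = DeltaThreshold(threshold=0.01, metric="fitness")

    logbook = tools.Logbook()
    for gen, fitness in enumerate([0.70, 0.85, 0.90, 0.905]):
        logbook.record(fitness=fitness)
        # __call__ forwards to on_step; True means "stop"
        if callback(logbook=logbook):
            print(f"Converged at generation {gen}")
            break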
15 changes: 11 additions & 4 deletions sklearn_genetic/plots.py
@@ -2,7 +2,6 @@

from .utils import logbook_to_pandas

sns.set_style("whitegrid")

"""
This module contains some useful functions to explore the results of the optimization routines
@@ -21,6 +20,8 @@ def plot_fitness_evolution(estimator):
Lines plot with the fitness value in each generation
"""
sns.set_style("white")

fitness_history = estimator.history["fitness"]

palette = sns.color_palette("rocket")
@@ -53,6 +54,8 @@ def plot_search_space(estimator, height=2, s=25, features: list = None):
Pair plot of the hyperparameters used during the search
"""
sns.set_style("white")

df = logbook_to_pandas(estimator.logbook)
if features:
stats = df[features]
@@ -61,7 +64,11 @@ def plot_search_space(estimator, height=2, s=25, features: list = None):
stats = df[variables]

g = sns.PairGrid(stats, diag_sharey=False, height=height)
g = g.map_upper(sns.scatterplot, s=s)
g = g.map_lower(sns.kdeplot, shade=True)
g = g.map_diag(sns.kdeplot, shade=True)
g = g.map_upper(sns.scatterplot, s=s, color="r", alpha=0.2)
g = g.map_lower(
sns.kdeplot,
shade=True,
cmap=sns.color_palette("ch:s=.25,rot=-.25", as_cmap=True),
)
g = g.map_diag(sns.kdeplot, shade=True, palette="crest", alpha=0.2, color="red")
return g
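For reference, the new pair-plot styling can be reproduced with plain seaborn. This toy sketch uses synthetic data standing in for the real logbook DataFrame (the ``shade=`` keyword matches the 2021-era seaborn API used in the diff):

.. code-block:: python

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Synthetic stand-in for the sampled hyperparameters
    rng = np.random.default_rng(0)
    stats = pd.DataFrame({"max_depth": rng.integers(2, 30, 120),
                          "n_estimators": rng.integers(100, 300, 120),
                          "fitness": rng.random(120)})

    g = sns.PairGrid(stats, diag_sharey=False, height=2)
    g = g.map_upper(sns.scatterplot, s=25, color="r", alpha=0.2)
    g = g.map_lower(sns.kdeplot, shade=True,
                    cmap=sns.color_palette("ch:s=.25,rot=-.25", as_cmap=True))
    g = g.map_diag(sns.kdeplot, shade=True, color="red")
    plt.show()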
