Merge pull request #22 from rodrigo-arenas/0.5.xdev
MLflow integration
Showing 22 changed files with 452 additions and 125 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
import warnings | ||
from sklearn_genetic import GASearchCV | ||
from sklearn_genetic.space import Categorical, Integer, Continuous | ||
from sklearn.model_selection import train_test_split, StratifiedKFold | ||
from sklearn.tree import DecisionTreeClassifier | ||
from sklearn.datasets import load_digits | ||
from sklearn.metrics import accuracy_score | ||
from sklearn_genetic.mlflow import MLflowConfig | ||
|
||
warnings.filterwarnings("ignore") | ||
|
||
data = load_digits() | ||
label_names = data["target_names"] | ||
y = data["target"] | ||
X = data["data"] | ||
|
||
X_train, X_test, y_train, y_test = train_test_split( | ||
X, y, test_size=0.33, random_state=42 | ||
) | ||
|
||
clf = DecisionTreeClassifier() | ||
|
||
params_grid = { | ||
"min_weight_fraction_leaf": Continuous(0, 0.5), | ||
"criterion": Categorical(["gini", "entropy"]), | ||
"max_depth": Integer(2, 20), | ||
"max_leaf_nodes": Integer(2, 30), | ||
} | ||
|
||
cv = StratifiedKFold(n_splits=3, shuffle=True) | ||
|
||
mlflow_config = MLflowConfig( | ||
tracking_uri="http://localhost:5000", | ||
experiment="Digits-sklearn-genetic-opt", | ||
run_name="Decision Tree", | ||
save_models=True, | ||
tags={"team": "sklearn-genetic-opt", "version": "0.5.0"}, | ||
) | ||
|
||
evolved_estimator = GASearchCV( | ||
clf, | ||
cv=cv, | ||
scoring="accuracy", | ||
population_size=4, | ||
generations=10, | ||
tournament_size=3, | ||
elitism=True, | ||
crossover_probability=0.9, | ||
mutation_probability=0.05, | ||
param_grid=params_grid, | ||
algorithm="eaMuPlusLambda", | ||
n_jobs=-1, | ||
verbose=True, | ||
log_config=mlflow_config, | ||
) | ||
|
||
evolved_estimator.fit(X_train, y_train) | ||
y_predict_ga = evolved_estimator.predict(X_test) | ||
accuracy = accuracy_score(y_test, y_predict_ga) | ||
|
||
print(evolved_estimator.best_params_) | ||
print("accuracy score: ", "{:.2f}".format(accuracy)) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
MLflow | ||
------ | ||
|
||
.. autoclass:: sklearn_genetic.MLflowConfig | ||
   :members: | ||
   :undoc-members: False |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
Integrating with MLflow | ||
======================= | ||
|
||
In this post, we explain how to set up the built-in integration | ||
of sklearn-genetic-opt with MLflow. | ||
To use this feature, we must set a few parameters, including | ||
the tracking server, experiment, run name and tags. | ||
The full implementation is here: :class:`~sklearn_genetic.mlflow.MLflowConfig` | ||
|
||
Configuration | ||
------------- | ||
|
||
The configuration is straightforward: we just need | ||
to import the main class and define some parameters. Their meanings are: | ||
|
||
* **tracking_uri:** Address of local or remote tracking server. | ||
* **experiment:** Case sensitive name of an experiment to be activated. | ||
* **run_name:** Name of new run (stored as a mlflow.runName tag). | ||
* **save_models:** If ``True``, it will log the estimator into the MLflow artifacts. | ||
* **registry_uri:** Address of local or remote model registry server. | ||
* **tags:** Dictionary of tags to apply. | ||
|
||
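The parameters above are plain keyword arguments, so they can be assembled outside the script, for example from environment variables. The following is a minimal sketch of that idea, not part of sklearn-genetic-opt: the helper name ``mlflow_settings_from_env`` and the ``GA_*`` variable names are hypothetical, while the fallback values are the ones used in this post.

```python
import os


def mlflow_settings_from_env(env=None):
    """Collect keyword arguments for MLflowConfig from environment variables.

    The helper and the GA_* variable names are hypothetical, chosen only
    for this illustration.
    """
    env = os.environ if env is None else env
    settings = {
        "tracking_uri": env.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        "experiment": env.get("GA_EXPERIMENT", "Digits-sklearn-genetic-opt"),
        "run_name": env.get("GA_RUN_NAME", "Decision Tree"),
        # Any value other than "true" (case-insensitive) disables model logging.
        "save_models": env.get("GA_SAVE_MODELS", "true").lower() == "true",
        "tags": {"team": "sklearn-genetic-opt", "version": "0.5.0"},
    }
    # registry_uri is only added when it is explicitly provided.
    registry_uri = env.get("MLFLOW_REGISTRY_URI")
    if registry_uri:
        settings["registry_uri"] = registry_uri
    return settings


# An empty mapping exercises the fallback values.
print(mlflow_settings_from_env({}))
```

With sklearn-genetic-opt installed, the resulting dictionary could then be unpacked as ``MLflowConfig(**mlflow_settings_from_env())``.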
Example | ||
-------- | ||
|
||
In this example, we log the information to an MLflow server | ||
running on localhost, port 5000, and we save each of the | ||
trained models. | ||
|
||
.. code:: python3 | ||
|
||
    from sklearn_genetic.mlflow import MLflowConfig | ||
|
||
    mlflow_config = MLflowConfig( | ||
        tracking_uri="http://localhost:5000", | ||
        experiment="Digits-sklearn-genetic-opt", | ||
        run_name="Decision Tree", | ||
        save_models=True, | ||
        tags={"team": "sklearn-genetic-opt", "version": "0.5.0"}) | ||
|
||
Now, this config is passed to the :class:`~sklearn_genetic.GASearchCV` class | ||
in the parameter named `log_config`, for example: | ||
|
||
.. code:: python3 | ||
|
||
    from sklearn_genetic import GASearchCV | ||
    from sklearn_genetic.space import Categorical, Integer, Continuous | ||
    from sklearn.model_selection import train_test_split, StratifiedKFold | ||
    from sklearn.tree import DecisionTreeClassifier | ||
    from sklearn.datasets import load_digits | ||
    from sklearn.metrics import accuracy_score | ||
    from sklearn_genetic.mlflow import MLflowConfig | ||
|
||
    data = load_digits() | ||
    label_names = data["target_names"] | ||
    y = data["target"] | ||
    X = data["data"] | ||
|
||
    X_train, X_test, y_train, y_test = train_test_split( | ||
        X, y, test_size=0.33, random_state=42) | ||
|
||
    clf = DecisionTreeClassifier() | ||
|
||
    params_grid = { | ||
        "min_weight_fraction_leaf": Continuous(0, 0.5), | ||
        "criterion": Categorical(["gini", "entropy"]), | ||
        "max_depth": Integer(2, 20), | ||
        "max_leaf_nodes": Integer(2, 30)} | ||
|
||
    cv = StratifiedKFold(n_splits=3, shuffle=True) | ||
|
||
    evolved_estimator = GASearchCV( | ||
        clf, | ||
        cv=cv, | ||
        scoring="accuracy", | ||
        population_size=3, | ||
        generations=5, | ||
        tournament_size=3, | ||
        elitism=True, | ||
        crossover_probability=0.9, | ||
        mutation_probability=0.05, | ||
        param_grid=params_grid, | ||
        algorithm="eaMuPlusLambda", | ||
        n_jobs=-1, | ||
        verbose=True, | ||
        log_config=mlflow_config) | ||
|
||
    evolved_estimator.fit(X_train, y_train) | ||
    y_predict_ga = evolved_estimator.predict(X_test) | ||
    accuracy = accuracy_score(y_test, y_predict_ga) | ||
|
||
    print(evolved_estimator.best_params_) | ||
|
||
Notice that we chose small values for ``generations`` and ``population_size``, just to be | ||
able to see the results without much verbosity. | ||
|
||
If you go to your MLflow UI and click the experiment named "Digits-sklearn-genetic-opt", | ||
you should see something like this (some columns are hidden for a cleaner view): | ||
|
||
.. image:: ../images/mlflow_experiment_0.png | ||
|
||
There we can see the user that ran the experiment, the name of the file | ||
that contains the source code, our tags and other metadata. Notice | ||
that there is a "plus" symbol that expands to show each of our iterations. | ||
This is because sklearn-genetic-opt logs each `GASearchCV.fit()` call | ||
in a nested way: think of it as a parent run, where each child is | ||
one of the hyperparameter sets that were tested. For example, if we run the | ||
same code again, we now see two parent runs: | ||
|
||
.. image:: ../images/mlflow_nested_run_1.png | ||
|
||
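The parent/child layout described above can be sketched with the calls that MLflow's fluent API exposes (``start_run`` with ``nested=True``, ``log_params``, ``log_metric``). This is an illustration only, not the library's actual implementation: ``DryRunTracker`` and ``log_search`` are hypothetical names, the child run naming is an assumption, and the scores are placeholder values rather than real results. The stand-in tracker lets the sketch run without a server.

```python
from contextlib import contextmanager


class DryRunTracker:
    """Hypothetical stand-in that records calls instead of contacting MLflow."""

    def __init__(self):
        self.events = []

    @contextmanager
    def start_run(self, run_name=None, nested=False):
        self.events.append(("start", run_name, nested))
        try:
            yield
        finally:
            self.events.append(("end", run_name))

    def log_params(self, params):
        self.events.append(("params", dict(params)))

    def log_metric(self, key, value):
        self.events.append(("metric", key, value))


def log_search(tracker, run_name, candidates):
    """Open one parent run, then one nested child run per candidate."""
    with tracker.start_run(run_name=run_name):
        for index, (params, score) in enumerate(candidates):
            # nested=True attaches the child to the currently active parent run.
            with tracker.start_run(run_name=f"{run_name} #{index}", nested=True):
                tracker.log_params(params)
                tracker.log_metric("score", score)


tracker = DryRunTracker()
log_search(
    tracker,
    "Decision Tree",
    [
        # Placeholder hyperparameters and scores, for illustration only.
        ({"criterion": "gini", "max_depth": 5}, 0.81),
        ({"criterion": "entropy", "max_depth": 9}, 0.84),
    ],
)
for event in tracker.events:
    print(event)
```

Because ``log_search`` only relies on those three calls, passing the imported ``mlflow`` module as ``tracker`` should log to a real tracking server, provided MLflow is installed and a tracking URI has been set.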
Now click on any of the "plus" symbols to see all the children. They | ||
look like this (again, the displayed columns were edited): | ||
|
||
.. image:: ../images/mlflow_children_2.png | ||
|
||
From there we can see the hyperparameters and the cross-validation score | ||
of each run. We can also use the regular MLflow features, | ||
such as comparing runs, downloading the CSV or registering a model. You can read more | ||
at https://mlflow.org/docs/latest/index.html | ||
|
||
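Once the runs are downloaded as CSV, they can be ranked offline with the standard library. The sketch below assumes a simplified export: the column names (``run_name``, ``criterion``, ``max_depth``, ``score``) and every value in it are placeholders, so they need to be adapted to the columns of the actual file.

```python
import csv
import io


def top_runs(csv_text, metric_column, n=3):
    """Return the n rows with the highest value in metric_column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Skip rows with an empty metric cell (for instance a run that logged no score).
    scored = [row for row in rows if row.get(metric_column)]
    scored.sort(key=lambda row: float(row[metric_column]), reverse=True)
    return scored[:n]


# Placeholder export with hypothetical columns and values.
EXPORT = """run_name,criterion,max_depth,score
parent,,,
child-0,gini,5,0.81
child-1,entropy,9,0.84
child-2,gini,12,0.79
"""

best = top_runs(EXPORT, "score", n=2)
print([row["run_name"] for row in best])
```

Sorting in plain Python keeps the sketch free of extra dependencies; for larger exports a DataFrame library would likely be more convenient.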
Now, as we set ``save_models=True``, you can see that the column "Model" | ||
has a file attached as an artifact. If we click on one of those, we see | ||
a summary of that particular execution and some utilities to use the | ||
model right away: | ||
|
||
.. image:: ../images/mlflow_artifacts_4.png | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
from .early_stoppers import ( | ||
DeltaThreshold, | ||
ThresholdStopping, | ||
ConsecutiveStopping, | ||
) | ||
from .loggers import LogbookSaver | ||
from ..mlflow import MLflowConfig | ||
|
||
__all__ = ["DeltaThreshold", "ThresholdStopping", "ConsecutiveStopping", "LogbookSaver"] |