Merge pull request #60 from rodrigo-arenas/0.6.X
[PR] Grammar improvements
rodrigo-arenas committed Jul 2, 2021
2 parents da53f41 + b0e8d8f commit 7c15bf4
Showing 9 changed files with 84 additions and 84 deletions.
18 changes: 9 additions & 9 deletions CONTRIBUTING.md
@@ -7,38 +7,38 @@ You can contribute with documentation, examples/tutorials, reviewing pull reques
helping answer questions in issues, creating visualizations, maintaining project
infrastructure, and creating new tests.

Code contributions are always welcome, from simple bug fixes, to new features.
Also consider contributing to the documentation,
Code contributions are always welcome, from simple bug fixes to new features.
Also, consider contributing to the documentation,
and reviewing open issues, it is the easiest way to get started.

When working on your local computer, make sure to install the development dependencies with:
```bash
pip install -r dev-requirements.txt
```

If you have questions, you can open an issue (tag it as question).
If you have questions, you can open an issue (tag it as a question).

We encourage you to follow these guidelines:

* Fork this project, make the changes you expect to merge and make a pull request
* If the work you are making is related to some issue, please mention in the comments
that you are working on it, so other people know and no duplicate you work.
* If you are working in a new feature, or have an idea, consider first opening an issue
so people know in what you are working on and possible give some guidelines
that you are working on it, so other people know and no duplicate your work.
* If you are working on a new feature, or have an idea, consider first opening an issue
so people know what you are working on and possibly give some guidelines
* Commit all changes by pull request (PR)
* A PR solves one problem (do not mix problems together in one PR) with the
* A PR solves one problem (do not mix problems in one PR) with the
minimal set of changes
* The changes should come with their respective tests and documentation
* Describe why you are proposing those changes
* Please run black on top of the package to keep the formatting style
```bash
black .
```
* Make sure all the test are passing, by running in the root of the project
* Make sure all the tests are passing, by running in the root of the project
```bash
pytest sklearn_genetic
```
* We can not merge if the tests fails.
* We can not merge if the tests fail.

# External References

10 changes: 5 additions & 5 deletions README.rst
@@ -37,8 +37,8 @@ Documentation is available `here <https://sklearn-genetic-opt.readthedocs.io/>`_
Main Features:
##############

* **GASearchCV**: Principal class of the package, holds the evolutionary cross validation optimization routine.
* **Algorithms**: Set of different evolutionary algorithms to use as optimization procedure.
* **GASearchCV**: Principal class of the package, holds the evolutionary cross-validation optimization routine.
* **Algorithms**: Set of different evolutionary algorithms to use as an optimization procedure.
* **Callbacks**: Custom evaluation strategies to generate early stopping rules,
logging (into TensorBoard, .pkl files, etc) or your custom logic.
* **Plots**: Generate pre-defined plots to understand the optimization process.
@@ -51,7 +51,7 @@ Visualize the progress of your training:

.. image:: docs/images/progress_bar.gif

Real time metrics visualization and comparison across runs:
Real-time metrics visualization and comparison across runs:

.. image:: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/docs/images/tensorboard_log.png?raw=true

@@ -165,10 +165,10 @@ Contributing
############

Contributions are more than welcome!
There are lots of opportunities on the on going project, so please get in touch if you would like to help out.
There are lots of opportunities on the ongoing project, so please get in touch if you would like to help out.
Also check the `Contribution guide <https://github.com/rodrigo-arenas/Sklearn-genetic-opt/blob/master/CONTRIBUTING.md>`_

Big thanks to the people who are helping this project!
Big thanks to the people who are helping with this project!

|Contributors|_

4 changes: 2 additions & 2 deletions docs/index.rst
@@ -10,8 +10,8 @@ scikit-learn models hyperparameters tuning, using evolutionary algorithms.

This is meant to be an alternative from popular methods inside scikit-learn such as Grid Search and Randomized Grid Search.

Sklearn-genetic-opt uses evolutionary algorithms from the deap package to choose set of hyperparameters
that optimizes (max or min) the cross validation scores, it can be used for both regression and classification problems.
Sklearn-genetic-opt uses evolutionary algorithms from the deap package to choose a set of hyperparameters
that optimizes (max or min) the cross-validation scores, it can be used for both regression and classification problems.
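A minimal sketch of what that looks like in practice, assuming the GASearchCV class and the Integer/Continuous space helpers documented in the tutorials (argument names are kept scikit-learn style and may vary slightly between releases):

.. code:: python3

   from sklearn.datasets import load_digits
   from sklearn.ensemble import RandomForestClassifier
   from sklearn_genetic import GASearchCV
   from sklearn_genetic.space import Integer, Continuous

   X, y = load_digits(return_X_y=True)

   # Only the boundaries of each hyperparameter are given, not a fixed grid of values
   param_grid = {
       "n_estimators": Integer(50, 200),
       "min_weight_fraction_leaf": Continuous(0.0, 0.5),
   }

   evolved_estimator = GASearchCV(
       estimator=RandomForestClassifier(),
       param_grid=param_grid,
       cv=3,
       scoring="accuracy",
       population_size=10,
       generations=20,
   )
   evolved_estimator.fit(X, y)
   print(evolved_estimator.predict(X[:5]))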

Installation:
#############
10 changes: 5 additions & 5 deletions docs/release_notes.rst
@@ -29,15 +29,15 @@ Features:

- **on_start**: When the evolutionary algorithm is called from the GASearchCV.fit method.

- **on_step:** When the evolutionary algorithm finish a generation (no change here).
- **on_step:** When the evolutionary algorithm finishes a generation (no change here).

- **on_end:** At the end of the last generation.

^^^^^^^^^^
Bug Fixes:
^^^^^^^^^^

* A missing statement was making that the callbacks starts to get evaluated from generation 1, ignoring generation 0.
* A missing statement was making that the callbacks start to get evaluated from generation 1, ignoring generation 0.
Now this is properly handled and callbacks work from generation 0.

^^^^^^^^^^^^
@@ -115,7 +115,7 @@ Docs:
* Added user guide on "Understanding the evaluation process"
* Several guides on contributing, code of conduct
* Added important links
* Docs requirement are now independent of package requirements
* Docs requirements are now independent of package requirements

^^^^^^^^^
Internal:
@@ -187,15 +187,15 @@ Features:
* Enabled deap's eaMuPlusLambda algorithm for the optimization process, now is the default routine
* Added a logbook and history properties to the fitted GASearchCV to make post-fit analysis
* ``Elitism=False`` now implements a roulette selection instead of ignoring the parameter
* Added the parameter keep_top_k to control the amount of solutions if the hall of fame (hof)
* Added the parameter keep_top_k to control the number of solutions if the hall of fame (hof)

^^^^^^^^^^^^
API Changes:
^^^^^^^^^^^^

* Refactored the optimization algorithm to use DEAP package instead
of a custom implementation, this causes the removal of several methods, properties and variables inside the GASearchCV class
* The parameter encoding_length has been removed, it's not longer required to the GASearchCV class
* The parameter encoding_length has been removed, it's no longer required to the GASearchCV class
* Renamed the property of the fitted estimator from `best_params_` to `best_params`
* The verbosity now prints the deap log of the fitness function,
it's standard deviation, max and min values from each generation
28 changes: 14 additions & 14 deletions docs/tutorials/basic_usage.rst
@@ -21,12 +21,12 @@ The optimization is made by evolutionary algorithms with the help of the
It works by defining the set of hyperparameters to tune, it starts with a randomly sampled set of options (population).
Then by using evolutionary operators as the mating, mutation, selection and evaluation,
it generates new candidates looking to improve the cross-validation score in each generation.
It'll continue with this process until a number of generations is reached or until a callback criteria is met.
It'll continue with this process until a number of generations is reached or until a callback criterion is met.

Example
-------

First lets import some dataset and others scikit-learn standard modules, we'll use
First let's import some dataset and other scikit-learn standard modules, we'll use
the `digits dataset <https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html>`__.
This is a classification problem, we'll fine-tune a Random Forest Classifier for this task.

@@ -40,7 +40,7 @@ This is a classification problem, we'll fine-tune a Random Forest Classifier for
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
Lets first read the data, split it in our training and test set and visualize some of the data points:
Let's first read the data, split it in our training and test set and visualize some of the data points:

.. code:: python3
@@ -60,7 +60,7 @@ We should see something like this:

.. image:: ../images/basic_usage_digits_0.png

Now, we must define our param_grid, similar to scikit-learn, is a dictionary with the models hyperparameters.
Now, we must define our param_grid, similar to scikit-learn, which is a dictionary with the model's hyperparameters.
The main difference with for example sckit-learn's GridSearchCv,
is that we don't pre-define the values to use in the search,
but rather, the boundaries of each parameter.
@@ -81,7 +81,7 @@ Notice that in the case of *'boostrap'*, as it is a categorical variable, we do
As well, in the 'min_weight_fraction_leaf', we used an additional parameter named distribution,
this is useful to tell the optimizer from which data distribution it can sample some random values during the optimization.
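As a rough sketch of such a param_grid (illustrative values, not the collapsed snippet from this diff), assuming the Categorical, Integer and Continuous classes from sklearn_genetic.space:

.. code:: python3

   from sklearn_genetic.space import Categorical, Integer, Continuous

   param_grid = {
       # Categorical variables list their possible choices, no boundaries needed
       "bootstrap": Categorical([True, False]),
       # Integer and Continuous variables only define their boundaries
       "max_depth": Integer(2, 30),
       "n_estimators": Integer(100, 500),
       # The optional distribution tells the optimizer how to sample random values
       "min_weight_fraction_leaf": Continuous(0.0, 0.5, distribution="uniform"),
   }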

Now, we are ready to set the GASearchCV, its the object that will allow us to run the fitting process using evolutionary algorihtms
Now, we are ready to set the GASearchCV, its the object that will allow us to run the fitting process using evolutionary algorithms
It has several options that we can use, for this first example, we'll keep it very simple:

.. code:: python3
@@ -100,9 +100,9 @@ It has several options that we can use, for this first example, we'll keep it ve
n_jobs=-1,
verbose=True)
So now the setup in ready, note that are others parameters that can be specified in GASearchCV,
So now the setup is ready, note that are other parameters that can be specified in GASearchCV,
the ones we used, are equivalents to the meaning in scikit-learn, besides the one already explained,
is worth to mention that the "metric" is going to be used as the optimization variable,
is worth mentioning that the "metric" is going to be used as the optimization variable,
so the algorithm will try to find the set of parameters that maximizes this metric.
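A sketch of how that setup might look with the options just described; the argument names follow the public scikit-learn-style API as far as we can tell, with scoring playing the role of the "metric" mentioned above:

.. code:: python3

   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import StratifiedKFold
   from sklearn_genetic import GASearchCV

   clf = RandomForestClassifier()
   cv = StratifiedKFold(n_splits=3, shuffle=True)

   evolved_estimator = GASearchCV(
       estimator=clf,
       param_grid=param_grid,   # the search space defined above
       cv=cv,
       scoring="accuracy",      # the optimization "metric"
       population_size=10,
       generations=35,
       n_jobs=-1,
       verbose=True,
   )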

We are ready to run the optimization routine:
@@ -127,8 +127,8 @@ This log, shows us the metrics obtained in each iteration (generation), this is
* **fitness_max:** The maximum individual score of all the models in this generation.
* **fitness_min:** The minimum individual score of all the models in this generation.

After fitting the model, we have some extra methos to use the model right away.
It will use by default the best set of hyperparameters it found, based in the cross-validation score:
After fitting the model, we have some extra methods to use the model right away.
It will use by default the best set of hyperparameters it found, based on the cross-validation score:

.. code:: python3
@@ -142,8 +142,8 @@ In this case, we got an accuracy score in the test set of 0.93

.. image:: ../images/basic_usage_accuracy_2.jpeg

Now lets use a couple more functions available in the package.
The first one, will help us to see the evolution of our metric over the generations
Now, let's use a couple more functions available in the package.
The first one will help us to see the evolution of our metric over the generations

.. code:: python3
@@ -165,10 +165,10 @@ sklearn-genetic-opt comes with a plot function to analyze this log:
.. image:: ../images/basic_usage_plot_space_4.png

What this plot shows us, is the distribution of the sampled values for each hyperparameter.
We can see for example in the *'min_weight_fraction_leaf'* that the algorithm mostly sampled values bellow 0.15.
What this plot shows us, is the distribution of the sampled values for each hyperparameter.
We can see for example in the *'min_weight_fraction_leaf'* that the algorithm mostly sampled values below 0.15.
You can also check every single combination of variables and the contour plot that represents the sampled values.
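A sketch of how these two helpers might be called on the fitted estimator, assuming both live in sklearn_genetic.plots:

.. code:: python3

   import matplotlib.pyplot as plt
   from sklearn_genetic.plots import plot_fitness_evolution, plot_search_space

   # Fitness (metric) value obtained in each generation
   plot_fitness_evolution(evolved_estimator)
   plt.show()

   # Distribution of the sampled values for each hyperparameter
   plot_search_space(evolved_estimator)
   plt.show()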

This concludes our introduction to the basic sklearn-genetic-opt usage.
Further tutorials will cover the GASearchCV parameters, callbacks,
different optimization algorithms and more advanced usage.
different optimization algorithms and more advanced use cases.
28 changes: 14 additions & 14 deletions docs/tutorials/callbacks.rst
@@ -6,11 +6,11 @@ Introduction

Callbacks can be defined to take actions or decisions over the optimization
process while it is still running.
Common callbacks includes different rules to stop the algorithm or log artifacts.
Common callbacks include different rules to stop the algorithm or log artifacts.
The callbacks are passed to the ``.fit`` method
of the :class:`~sklearn_genetic.GASearchCV` class.

The callbacks are evaluated at start of the training using the `on_start` method,
The callbacks are evaluated at the start of the training using the `on_start` method,
at the end of each generation fit using `on_step` method and at the
end of the training using `on_end`, so it looks like this:

@@ -22,7 +22,7 @@ until that training point.

.. image:: ../images/callbacks_log_0.png

Now lets see how to use them, we'll take
Now let's see how to use them, we'll take
the data set and model used in :ref:`basic-usage`. The available callbacks are:

* ProgressBar
@@ -69,14 +69,14 @@ ConsecutiveStopping
-------------------

This callback stops the optimization if the current metric value
is no greater that at least one metric from the last N generations.
is no greater than at least one metric from the last N generations.

It requires us to define the number of generations to compare
against the current generation and the name of the metric we want
to track.

For example, if we want to stop the optimization after 5 iterations
where the current iteration (sixth) fitness value is worst that all
where the current iteration (sixth) fitness value is worst than all
the previous ones (5), we define it like this:

.. code:: python3
@@ -81,7 +81,7 @@ Now we just have to pass it to the estimator during the fitting
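A sketch of that configuration and the fit call (the original snippet is collapsed in the hunk above; the constructor arguments are an assumption based on the description):

.. code:: python3

   from sklearn_genetic.callbacks import ConsecutiveStopping

   # Stop if the current 'fitness' is not better than at least one of the last 5 generations
   callback = ConsecutiveStopping(generations=5, metric='fitness')
   evolved_estimator.fit(X_train, y_train, callbacks=callback)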
DeltaThreshold
--------------
This callback stops the optimization if the absolute difference
between the current and last metric is less or equals than a threshold.
between the current and last metric is less or equals to a threshold.

It just requires the threshold and the metric name, for example
using the 'fitness_min' value:
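A sketch under the same assumptions, with DeltaThreshold taking the threshold and the metric name:

.. code:: python3

   from sklearn_genetic.callbacks import DeltaThreshold

   # Stop if the 'fitness_min' value changes by less than 0.001 between generations
   callback = DeltaThreshold(threshold=0.001, metric='fitness_min')
   evolved_estimator.fit(X_train, y_train, callbacks=callback)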
@@ -112,11 +112,11 @@ This callback stops the optimization if the difference in seconds between the st
first set of hyperparameters fit, and the current generation time is greater than a time threshold.

Remember that this is checked after each generation fit, so if the first (or any) generation fit takes
longer that the threshold, it won't stop the fitting process until is done with the current generation
longer than the threshold, it won't stop the fitting process until is done with the current generation
population.

It requires the total_seconds parameters, for example stopping if the time is greater
that one minute:
than one minute:

.. code:: python3
@@ -128,7 +128,7 @@ that one minute:
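A sketch of this timer-based rule; the class name and total_seconds argument are assumptions taken from the description above:

.. code:: python3

   from sklearn_genetic.callbacks import TimerStopping

   # Checked after each generation: stop once more than 60 seconds have passed
   callback = TimerStopping(total_seconds=60)
   evolved_estimator.fit(X_train, y_train, callbacks=callback)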
ThresholdStopping
-----------------
It stops the optimization if the current metric
is greater or equals than the define threshold.
is greater or equals to the defined threshold.

For example, if we want to stop the optimization
if the 'fitness_max' is above 0.98:
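A sketch of that threshold rule, assuming ThresholdStopping accepts the threshold and metric name:

.. code:: python3

   from sklearn_genetic.callbacks import ThresholdStopping

   # Stop as soon as the 'fitness_max' reaches 0.98 or more
   callback = ThresholdStopping(threshold=0.98, metric='fitness_max')
   evolved_estimator.fit(X_train, y_train, callbacks=callback)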
@@ -151,9 +151,9 @@ within this package due it's usually a sensitive and heavy dependency::

pip install tensorflow

It only requires to define the folder where you want to log your run, and optionally, a run_id, so
your consecutive runs doesn't mix up.
If the run_id is not provided, it will create a subfolder with the current datetime of your run.
It only requires defining the folder where you want to log your run, and optionally, a run_id, so
your consecutive runs don't mix up.
If the run_id is not provided, it will create a subfolder with the current date-time of your run.

.. code:: python3
@@ -162,8 +162,8 @@ If the run_id is not provided, it will create a subfolder with the current datet
evolved_estimator.fit(X, y, callbacks=callback)
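A sketch of how that callback might be constructed before the fit call shown above; the folder and run_id values are illustrative only:

.. code:: python3

   from sklearn_genetic.callbacks import TensorBoard

   # Logs go to ./logs/run_1; omit run_id to get a date-time subfolder instead
   callback = TensorBoard(log_dir="./logs", run_id="run_1")
   evolved_estimator.fit(X, y, callbacks=callback)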
While the model is being trained you can see in real time the metrics in Tensorboard.
If you have run more that 1 GASearchCV model and use the TensordBoard callback using
While the model is being trained you can see in real-time the metrics in Tensorboard.
If you have run more than one GASearchCV model and use the TensordBoard callback using
the same log_dir but different run_id, you can compare the metrics of each run, it looks
like this for the fitness in three different runs:

8 changes: 4 additions & 4 deletions docs/tutorials/custom_callback.rst
@@ -19,7 +19,7 @@ or ``False``. It expects the parameter `estimator`.
``True`` means that the optimization must stop, ``False``, means it can continue.
It expects the parameters `record`, `logbook` and `estimator`.

**on_end:** This method is called at the end of the las generation or after an stopping
**on_end:** This method is called at the end of the las generation or after a stopping
callback meets its criteria. It expects the parameters `logbook` and `estimator`,
it should return ``None`` or ``False``.

@@ -29,7 +29,7 @@ Example
-------

In this example, we are going to define a dummy callback that
stops the process if there have been more that `N` fitness values
stops the process if there have been more than `N` fitness values
bellow a threshold value.

The callback must have three parameters: `record`, `logbook` and `estimator`.
@@ -61,7 +61,7 @@ So to check inside the logbook, we could define a function like this:
return False
As sklearn-genetic-opt expects all this logic in a single object, we must define a class
that will have all this parameters, so we can rewrite it like this:
that will have all these parameters, so we can rewrite it like this:


.. code-block:: python
@@ -121,7 +121,7 @@ Now, let's expend it to add the others method, just to print a message:
print("I'm done with training!")
So that is it, now you can initialize the DummyThreshold
and pass it to a in the ``fit`` method of a
and pass it to in the ``fit`` method of a
:class:`~sklearn_genetic.GASearchCV` instance:

.. code-block:: python