Move accelerate and update docs (#875)
After introducing the skorch/hf.py module, it made sense to move the
AccelerateMixin there from helper.py. Of course, this required some code
adjustments, which have been made as well, but there are no functional
changes.

On top of that, I extended some parts of the documentation:

- Created dedicated HF section in docs to put all the HF stuff
- FAQ about gradient accumulation now also points to AccelerateMixin
- Mention gradient accumulation in AccelerateMixin docstring
BenjaminBossan committed Jul 27, 2022
1 parent 1dbf32b commit eecf94f
Showing 9 changed files with 609 additions and 587 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -60,6 +60,7 @@ User's Guide
user/parallelism
user/customization
user/performance
user/huggingface
user/FAQ


5 changes: 5 additions & 0 deletions docs/user/FAQ.rst
@@ -329,6 +329,11 @@ sure that there is an optimization step after the last batch of each
epoch. However, this example can serve as a starting point to
implement your own version of gradient accumulation.

Alternatively, make use of skorch's `accelerate
<https://github.com/huggingface/accelerate>`_ integration provided by
:class:`~skorch.hf.AccelerateMixin` and use the gradient accumulation feature
from that library.
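
For illustration, here is a minimal sketch (not part of the diff) of how this
could look, assuming an accelerate version whose ``Accelerator`` accepts a
``gradient_accumulation_steps`` argument; ``MyModule``, ``X``, and ``y`` are
placeholders:

.. code:: python

    from accelerate import Accelerator
    from skorch import NeuralNet
    from skorch.hf import AccelerateMixin

    class AcceleratedNet(AccelerateMixin, NeuralNet):
        """NeuralNet with accelerate support"""

    # accelerate takes care of skipping the optimizer step until
    # enough gradients have been accumulated
    accelerator = Accelerator(gradient_accumulation_steps=4)
    net = AcceleratedNet(MyModule, accelerator=accelerator)
    net.fit(X, y)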

How can I dynamically set the input size of the PyTorch module based on the data?
---------------------------------------------------------------------------------

54 changes: 0 additions & 54 deletions docs/user/helper.rst
@@ -48,58 +48,6 @@ argument ``idx=0``, the default) and one for y (with argument
gs.fit(X_sl, y_sl)
AccelerateMixin
---------------

This mixin class can be used to add support for huggingface accelerate_ to
skorch. E.g., this allows you to use mixed precision training (AMP), multi-GPU
training, or training with a TPU. For the time being, this feature should be
considered experimental.

To use this feature, create a new subclass of the neural net class you want to
use and inherit from the mixin class. E.g., if you want to use a
:class:`.NeuralNet`, it would look like this:

.. code:: python

    from skorch import NeuralNet
    from skorch.helper import AccelerateMixin

    class AcceleratedNet(AccelerateMixin, NeuralNet):
        """NeuralNet with accelerate support"""

The same would work for :class:`.NeuralNetClassifier`,
:class:`.NeuralNetRegressor`, etc. Then pass an instance of Accelerator_ with
the desired parameters and you're good to go:

.. code:: python

    from accelerate import Accelerator

    accelerator = Accelerator(...)
    net = AcceleratedNet(
        MyModule,
        accelerator=accelerator,
    )
    net.fit(X, y)

accelerate_ recommends leaving the device handling to the Accelerator_, which
is why ``device`` defaults to ``None`` (thus telling skorch not to change the
device).

To install accelerate_, run the following command inside your Python environment:

.. code:: bash

    python -m pip install accelerate

.. note::

    Under the hood, accelerate uses :class:`~torch.cuda.amp.GradScaler`,
    which does not support passing the training step as a closure.
    Therefore, if your optimizer requires that (e.g.
    :class:`torch.optim.LBFGS`), you cannot use accelerate.

Command line interface helpers
------------------------------

@@ -253,8 +201,6 @@ callbacks through the command line (but you can modify existing ones
as usual).
.. _accelerate: https://github.com/huggingface/accelerate
.. _Accelerator: https://huggingface.co/docs/accelerate/accelerator.html
.. _fire: https://github.com/google/python-fire
.. _numpydoc: https://github.com/numpy/numpydoc
.. _example: https://github.com/skorch-dev/skorch/tree/master/examples/cli
109 changes: 109 additions & 0 deletions docs/user/huggingface.rst
@@ -0,0 +1,109 @@
========================
Hugging Face Integration
========================

skorch integrates with some libraries from the `Hugging Face
<https://huggingface.co/>`_ ecosystem. Take a look at the sections below to
learn more.

Accelerate
----------

The :class:`.AccelerateMixin` class can be used to add support for huggingface
accelerate_ to skorch. E.g., this allows you to use mixed precision training
(AMP), multi-GPU training, training with a TPU, or gradient accumulation. For the
time being, this feature should be considered experimental.

To use this feature, create a new subclass of the neural net class you want to
use and inherit from the mixin class. E.g., if you want to use a
:class:`.NeuralNet`, it would look like this:

.. code:: python

    from skorch import NeuralNet
    from skorch.hf import AccelerateMixin

    class AcceleratedNet(AccelerateMixin, NeuralNet):
        """NeuralNet with accelerate support"""

The same would work for :class:`.NeuralNetClassifier`,
:class:`.NeuralNetRegressor`, etc. Then pass an instance of Accelerator_ with
the desired parameters and you're good to go:

.. code:: python

    from accelerate import Accelerator

    accelerator = Accelerator(...)
    net = AcceleratedNet(
        MyModule,
        accelerator=accelerator,
    )
    net.fit(X, y)

accelerate_ recommends leaving the device handling to the Accelerator_, which
is why ``device`` defaults to ``None`` (thus telling skorch not to change the
device).

To install accelerate_, run the following command inside your Python environment:

.. code:: bash

    python -m pip install accelerate

.. note::

    Under the hood, accelerate uses :class:`~torch.cuda.amp.GradScaler`,
    which does not support passing the training step as a closure.
    Therefore, if your optimizer requires that (e.g.
    :class:`torch.optim.LBFGS`), you cannot use accelerate.


Tokenizers
----------

skorch also provides sklearn-like transformers that work with Hugging Face
`tokenizers <https://huggingface.co/docs/tokenizers/index>`_. The ``transform``
methods of these transformers return data in a dict-like data structure, which
makes them easy to use in conjunction with skorch's :class:`.NeuralNet`. Below
is an example of how to use a pretrained tokenizer with the help of
:class:`skorch.hf.HuggingfacePretrainedTokenizer`:

.. code:: python

    from skorch.hf import HuggingfacePretrainedTokenizer

    # pass the model name to be downloaded
    hf_tokenizer = HuggingfacePretrainedTokenizer('bert-base-uncased')
    data = ['hello there', 'this is a text']
    hf_tokenizer.fit(data)  # only loads the model
    hf_tokenizer.transform(data)

    # use hyper params from pretrained tokenizer to fit on own data
    hf_tokenizer = HuggingfacePretrainedTokenizer(
        'bert-base-uncased', train=True, vocab_size=12345)
    data = ...
    hf_tokenizer.fit(data)  # fits new tokenizer on data
    hf_tokenizer.transform(data)

We also provide :class:`skorch.hf.HuggingfaceTokenizer` if you don't want to use a
pretrained tokenizer but instead want to train your own tokenizer with
fine-grained control over each component, like which tokenization method to use.

Of course, since both transformers are scikit-learn compatible, you can use them
in a grid search.
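
Below is a hypothetical sketch of such a grid search (not part of the diff);
``MyModule``, ``data``, and ``y`` are placeholders, and it is assumed that the
module accepts the tokenizer's output directly:

.. code:: python

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from skorch import NeuralNetClassifier
    from skorch.hf import HuggingfacePretrainedTokenizer

    pipe = Pipeline([
        ('tokenize', HuggingfacePretrainedTokenizer(
            'bert-base-uncased', train=True, vocab_size=1000)),
        ('net', NeuralNetClassifier(MyModule)),  # MyModule is a placeholder
    ])
    params = {
        # search over the size of the retrained vocabulary
        'tokenize__vocab_size': [500, 1000, 2000],
        'net__lr': [0.01, 0.001],
    }
    search = GridSearchCV(pipe, params, scoring='accuracy')
    search.fit(data, y)  # data: list of strings, y: labels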

Transformers
------------

The Hugging Face `transformers
<https://huggingface.co/docs/transformers/index>`_ library gives you access to
many pretrained deep learning models. There is no special skorch integration for
those, since they're just normal models and can thus be used without further
adjustments (as long as they're PyTorch models).

If you want to see what using ``transformers`` with skorch looks like in
practice, take a look at the `Hugging Face fine-tuning notebook
<https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb>`_.
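
As a rough sketch (not taken from the notebook), one way to do this is to wrap a
pretrained sequence classification model in a small ``nn.Module`` that returns
the logits and train it with :class:`.NeuralNetClassifier`; the model name,
hyper-parameters, and input format below are assumptions for illustration:

.. code:: python

    import torch
    from torch import nn
    from transformers import AutoModelForSequenceClassification
    from skorch import NeuralNetClassifier

    class SequenceClassifier(nn.Module):
        # thin wrapper so that skorch receives plain logit tensors
        def __init__(self, name='bert-base-uncased', num_labels=2):
            super().__init__()
            self.model = AutoModelForSequenceClassification.from_pretrained(
                name, num_labels=num_labels)

        def forward(self, input_ids, attention_mask):
            return self.model(
                input_ids=input_ids, attention_mask=attention_mask).logits

    net = NeuralNetClassifier(
        SequenceClassifier,
        criterion=nn.CrossEntropyLoss,  # the module returns raw logits
        lr=2e-5,
        batch_size=16,
        device='cuda' if torch.cuda.is_available() else 'cpu',
    )
    # X is assumed to be a dict with 'input_ids' and 'attention_mask',
    # e.g. as produced by HuggingfacePretrainedTokenizer above
    net.fit(X, y)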

.. _accelerate: https://github.com/huggingface/accelerate
.. _Accelerator: https://huggingface.co/docs/accelerate/accelerator.html
