Move AccelerateMixin to hf.py, update docs
After introducing the skorch/hf.py module, it made sense to move the
AccelerateMixin there from helper.py. Of course, this required some code
adjustments, which have been made as well, but there are no functional
changes.

On top of that, I extended some parts of the documentation:

- Created dedicated HF section in docs to put all the HF stuff
- FAQ about gradient accumulation now also points to AccelerateMixin
- Mention gradient accumulation in AccelerateMixin docstring
BenjaminBossan committed Jul 26, 2022
1 parent 1dbf32b commit b884b11
Showing 8 changed files with 500 additions and 587 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -60,6 +60,7 @@ User's Guide
user/parallelism
user/customization
user/performance
user/huggingface
user/FAQ


5 changes: 5 additions & 0 deletions docs/user/FAQ.rst
@@ -329,6 +329,11 @@ sure that there is an optimization step after the last batch of each
epoch. However, this example can serve as a starting point to
implement your own version of gradient accumulation.

Alternatively, make use of skorch's `accelerate
<https://github.com/huggingface/accelerate>`_ integration provided by
:class:`~skorch.hf.AccelerateMixin` and use the gradient accumulation feature
from that library.
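
For instance, a minimal sketch could look like the following (this assumes
versions of skorch and accelerate in which the ``Accelerator``'s
``gradient_accumulation_steps`` setting exists and is honored by
:class:`~skorch.hf.AccelerateMixin`; ``MyModule``, ``X``, and ``y`` are
placeholders):

.. code:: python

    from accelerate import Accelerator
    from skorch import NeuralNetClassifier
    from skorch.hf import AccelerateMixin

    class AcceleratedNet(AccelerateMixin, NeuralNetClassifier):
        """NeuralNetClassifier with accelerate support"""

    # accumulate gradients over 2 batches before each optimization step
    accelerator = Accelerator(gradient_accumulation_steps=2)
    net = AcceleratedNet(
        MyModule,
        accelerator=accelerator,
        device=None,  # let accelerate handle device placement
    )
    net.fit(X, y)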

How can I dynamically set the input size of the PyTorch module based on the data?
---------------------------------------------------------------------------------

54 changes: 0 additions & 54 deletions docs/user/helper.rst
@@ -48,58 +48,6 @@ argument ``idx=0``, the default) and one for y (with argument
gs.fit(X_sl, y_sl)

AccelerateMixin
---------------

This mixin class can be used to add support for huggingface accelerate_ to
skorch. E.g., this allows you to use mixed precision training (AMP), multi-GPU
training, or training with a TPU. For the time being, this feature should be
considered experimental.

To use this feature, create a new subclass of the neural net class you want to
use and inherit from the mixin class. E.g., if you want to use a
:class:`.NeuralNet`, it would look like this:

.. code:: python

    from skorch import NeuralNet
    from skorch.helper import AccelerateMixin

    class AcceleratedNet(AccelerateMixin, NeuralNet):
        """NeuralNet with accelerate support"""

The same would work for :class:`.NeuralNetClassifier`,
:class:`.NeuralNetRegressor`, etc. Then pass an instance of Accelerator_ with
the desired parameters and you're good to go:

.. code:: python

    from accelerate import Accelerator

    accelerator = Accelerator(...)
    net = AcceleratedNet(
        MyModule,
        accelerator=accelerator,
    )
    net.fit(X, y)

accelerate_ recommends leaving the device handling to the Accelerator_, which
is why ``device`` defaults to ``None`` (thus telling skorch not to change the
device).

To install accelerate_, run the following command inside your Python environment:

.. code:: bash

    python -m pip install accelerate

.. note::

    Under the hood, accelerate uses :class:`~torch.cuda.amp.GradScaler`,
    which does not support passing the training step as a closure.
    Therefore, if your optimizer requires that (e.g.
    :class:`torch.optim.LBFGS`), you cannot use accelerate.

Command line interface helpers
------------------------------

@@ -253,8 +201,6 @@ callbacks through the command line (but you can modify existing ones
as usual).
.. _accelerate: https://github.com/huggingface/accelerate
.. _Accelerator: https://huggingface.co/docs/accelerate/accelerator.html
.. _fire: https://github.com/google/python-fire
.. _numpydoc: https://github.com/numpy/numpydoc
.. _example: https://github.com/skorch-dev/skorch/tree/master/examples/cli
119 changes: 38 additions & 81 deletions notebooks/Hugging_Face_Finetuning.ipynb
@@ -445,55 +445,31 @@
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "28462f59552f4cb19c25ddac48fa47ef",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading: 0%| | 0.00/28.0 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "321b9786bc634e8ab6ba113bb7ec9a30",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading: 0%| | 0.00/226k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "66a02c063d594d18b733a7774260c146",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading: 0%| | 0.00/455k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
"name": "stdout",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.weight']\n",
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_transform.bias']\n",
"- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.bias', 'classifier.weight', 'pre_classifier.bias']\n",
"Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
@@ -517,7 +493,7 @@
"text": [
" epoch train_loss valid_acc valid_loss dur\n",
"------- ------------ ----------- ------------ --------\n",
" 1 \u001b[36m1.0325\u001b[0m \u001b[32m0.8444\u001b[0m \u001b[35m0.5309\u001b[0m 130.8168\n"
" 1 \u001b[36m1.0325\u001b[0m \u001b[32m0.8444\u001b[0m \u001b[35m0.5309\u001b[0m 129.3804\n"
]
},
{
@@ -538,7 +514,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
" 2 \u001b[36m0.3306\u001b[0m \u001b[32m0.8780\u001b[0m \u001b[35m0.4194\u001b[0m 129.1985\n"
" 2 \u001b[36m0.3306\u001b[0m \u001b[32m0.8780\u001b[0m \u001b[35m0.4194\u001b[0m 129.7384\n"
]
},
{
@@ -559,9 +535,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
" 3 \u001b[36m0.1346\u001b[0m \u001b[32m0.8798\u001b[0m \u001b[35m0.4100\u001b[0m 129.4979\n",
"CPU times: user 6min 7s, sys: 43.4 s, total: 6min 50s\n",
"Wall time: 6min 44s\n"
" 3 \u001b[36m0.1346\u001b[0m \u001b[32m0.8798\u001b[0m \u001b[35m0.4100\u001b[0m 129.8741\n",
"CPU times: user 6min 7s, sys: 42.8 s, total: 6min 50s\n",
"Wall time: 6min 39s\n"
]
},
{
@@ -621,8 +597,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 20 s, sys: 35.2 ms, total: 20 s\n",
"Wall time: 15.6 s\n"
"CPU times: user 19.4 s, sys: 23.6 ms, total: 19.5 s\n",
"Wall time: 15 s\n"
]
}
],
@@ -676,10 +652,10 @@
"source": [
"For this to work, you need:\n",
"- A GPU that is capable of mixed precision training\n",
"- The [accelerate library](https://huggingface.co/docs/accelerate/index), which you can install as: `python -m pip install accelerate`.\n",
"- The [accelerate library](https://huggingface.co/docs/accelerate/index), which you can install as: `python -m pip install 'accelerate>=0.11'`.\n",
"- skorch version 0.12 or installed from the current master branch (`python -m pip install git+https://github.com/skorch-dev/skorch.git`)\n",
"\n",
"Again, we assume that you're familiar with the general concept of mixed precision training. For more information on how skorch integrates with accelerate, please consult the [skorch docs](https://skorch.readthedocs.io/en/latest/user/helper.html#acceleratemixin)."
"Again, we assume that you're familiar with the general concept of mixed precision training. For more information on how skorch integrates with accelerate, please consult the [skorch docs](https://skorch.readthedocs.io/en/latest/user/huggingface.html#accelerate)."
]
},
{
@@ -700,37 +676,18 @@
}
],
"source": [
"! [ ! -z \"$COLAB_GPU\" ] && pip install accelerate"
"! [ ! -z \"$COLAB_GPU\" ] && pip install 'accelerate>=0.11'"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "c47aa1a6-f466-4a2c-84ab-034e4d6bdbcd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
}
],
"outputs": [],
"source": [
"from accelerate import Accelerator\n",
"from skorch.helper import AccelerateMixin"
"from skorch.hf import AccelerateMixin"
]
},
{
@@ -751,7 +708,7 @@
"metadata": {},
"outputs": [],
"source": [
"accelerator = Accelerator(fp16=True)"
"accelerator = Accelerator(mixed_precision='fp16')"
]
},
{
@@ -806,10 +763,10 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.weight']\n",
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_transform.bias']\n",
"- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.bias', 'classifier.weight', 'pre_classifier.bias']\n",
"Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
@@ -833,7 +790,7 @@
"text": [
" epoch train_loss valid_acc valid_loss dur\n",
"------- ------------ ----------- ------------ -------\n",
" 1 \u001b[36m1.0463\u001b[0m \u001b[32m0.8374\u001b[0m \u001b[35m0.5547\u001b[0m 71.6220\n"
" 1 \u001b[36m1.0463\u001b[0m \u001b[32m0.8374\u001b[0m \u001b[35m0.5547\u001b[0m 71.2980\n"
]
},
{
@@ -854,7 +811,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
" 2 \u001b[36m0.3264\u001b[0m \u001b[32m0.8786\u001b[0m \u001b[35m0.4251\u001b[0m 71.5409\n"
" 2 \u001b[36m0.3264\u001b[0m \u001b[32m0.8786\u001b[0m \u001b[35m0.4251\u001b[0m 73.2230\n"
]
},
{
@@ -875,7 +832,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
" 3 \u001b[36m0.1387\u001b[0m \u001b[32m0.8845\u001b[0m \u001b[35m0.4142\u001b[0m 71.6285\n"
" 3 \u001b[36m0.1387\u001b[0m \u001b[32m0.8845\u001b[0m \u001b[35m0.4142\u001b[0m 74.4516\n"
]
},
{
@@ -927,8 +884,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 11.5 s, sys: 32.9 ms, total: 11.5 s\n",
"Wall time: 7.02 s\n"
"CPU times: user 11.7 s, sys: 4.97 ms, total: 11.7 s\n",
"Wall time: 7.27 s\n"
]
}
],
