
Commit 9272a10

Merge pull request huggingface#10 from huggingface/main
si
jamesthesnake committed Mar 30, 2023
2 parents bffa5de + 228792a commit 9272a10
Showing 53 changed files with 826 additions and 315 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/update_tiny_models.yml
@@ -1,4 +1,4 @@
name: Self-hosted runner (push)
name: Update Tiny Models

on:
push:
@@ -9,7 +9,7 @@ on:
- cron: "0 2 * * *"

env:
TOKEN: ${{ secrets.SYLVAIN_HF_TOKEN }}
TOKEN: ${{ secrets.TRANSFORMERS_HUB_BOT_HF_TOKEN }}

jobs:
update_tiny_models:
23 changes: 23 additions & 0 deletions docs/source/en/generation_strategies.mdx
@@ -139,6 +139,29 @@ one for summarization with beam search). You must have the right Hub permissions
['Les fichiers de configuration sont faciles à utiliser !']
```

## Streaming

`generate()` supports streaming through its `streamer` input. The `streamer` input is compatible with any
instance of a class that implements the `put()` and `end()` methods. Internally, `put()` is used to push new
tokens and `end()` is used to flag the end of text generation.

In practice, you can craft your own streaming class for all sorts of purposes! We also have basic streaming classes
ready for you to use. For example, you can use the [`TextStreamer`] class to stream the output of `generate()` into
your screen, one word at a time:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

>>> tok = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
>>> streamer = TextStreamer(tok)

>>> # Despite returning the usual output, the streamer will also print the generated text to stdout.
>>> _ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven,
```
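
In addition to the built-in classes, you can write your own streamer by implementing the same two methods.
The snippet below is a minimal, illustrative sketch (the `TokenCollector` name is made up for this example
and is not part of the library); it collects the generated token ids instead of printing them:

```python
class TokenCollector:
    """Hypothetical streamer that stores generated token ids instead of printing them."""

    def __init__(self):
        self.tokens = []

    def put(self, value):
        # `value` is a tensor of token ids pushed by `generate()` (the prompt first, then new tokens).
        self.tokens.extend(value.tolist() if value.ndim == 1 else value[0].tolist())

    def end(self):
        # Called once generation is finished; nothing left to flush here.
        pass
```

An instance would then be passed through the same `streamer` argument, e.g.
`model.generate(**inputs, streamer=TokenCollector(), max_new_tokens=20)`.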

## Decoding strategies

Certain combinations of the `generate()` parameters, and ultimately `generation_config`, can be used to enable specific
4 changes: 4 additions & 0 deletions docs/source/en/internal/generation_utils.mdx
@@ -265,3 +265,7 @@ A [`Constraint`] can be used to force the generation to include specific tokens
[[autodoc]] top_k_top_p_filtering

[[autodoc]] tf_top_k_top_p_filtering

## Streamers

[[autodoc]] TextStreamer
3 changes: 2 additions & 1 deletion docs/source/en/main_classes/text_generation.mdx
@@ -24,7 +24,8 @@ of the generation method.

To learn how to inspect a model's generation configuration, what are the defaults, how to change the parameters ad hoc,
and how to create and save a customized generation configuration, refer to the
[text generation strategies guide](../generation_strategies).
[text generation strategies guide](../generation_strategies). The guide also explains how to use related features,
like token streaming.

## GenerationConfig

16 changes: 9 additions & 7 deletions examples/pytorch/README.md
@@ -262,13 +262,13 @@ First, install the Neptune client library. You can do it with either `pip` or `c
`pip`:

```bash
pip install neptune-client
pip install neptune
```

`conda`:

```bash
conda install -c conda-forge neptune-client
conda install -c conda-forge neptune
```

Next, in your model training script, import `NeptuneCallback`:
@@ -283,8 +283,8 @@ To enable Neptune logging, in your `TrainingArguments`, set the `report_to` argu
training_args = TrainingArguments(
"quick-training-distilbert-mrpc",
evaluation_strategy="steps",
eval_steps = 20,
report_to = "neptune",
eval_steps=20,
report_to="neptune",
)

trainer = Trainer(
@@ -294,6 +294,8 @@ trainer = Trainer(
)
```

**Note:** This method requires saving your Neptune credentials as environment variables (see the bottom of the section).

Alternatively, for more logging options, create a Neptune callback:

```python
@@ -318,7 +320,7 @@ neptune_callback = NeptuneCallback(
Pass the callback to the Trainer:

```python
training_args = TrainingArguments(..., report_to = None)
training_args = TrainingArguments(..., report_to=None)
trainer = Trainer(
model,
training_args,
@@ -336,7 +338,7 @@ Now, when you start the training with `trainer.train()`, your metadata will be l
| `NEPTUNE_API_TOKEN` | Your Neptune API token. To find and copy it, click your Neptune avatar and select **Get your API token**. |
| `NEPTUNE_PROJECT` | The full name of your Neptune project (`workspace-name/project-name`). To find and copy it, head to **project settings** → **Properties**. |
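
As a rough sketch (the values below are placeholders, not real credentials), these variables can also be
set from inside a training script before the `Trainer` is created:

```python
import os

# Illustrative placeholders -- substitute your own API token and project name.
os.environ["NEPTUNE_API_TOKEN"] = "<your-neptune-api-token>"
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
```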

For detailed instructions and examples, see the [Neptune docs](https://docs.neptune.ai/integrations-and-supported-tools/model-training/hugging-face).
For detailed instructions and examples, see the [Neptune docs](https://docs.neptune.ai/integrations/transformers/).

### ClearML

@@ -373,4 +375,4 @@ Advanced configuration is possible by setting environment variables:
| CLEARML_PROJECT | Name of the project in ClearML. (default: `"HuggingFace Transformers"`) |
| CLEARML_TASK | Name of the task in ClearML. (default: `"Trainer"`) |

Additional configuration options are available through generic [clearml environment variables](https://clear.ml/docs/latest/docs/configs/env_vars).
Additional configuration options are available through generic [clearml environment variables](https://clear.ml/docs/latest/docs/configs/env_vars).
12 changes: 9 additions & 3 deletions setup.py
@@ -38,6 +38,10 @@
7. Build both the sources and the wheel. Do not change anything in setup.py between
creating the wheel and the source distribution (obviously).
Clean up your build and dist folders (to avoid re-uploading oldies):
rm -rf dist
rm -rf build
For the wheel, run: "python setup.py bdist_wheel" in the top level directory.
(this will build a wheel for the python version you use to build it).
@@ -46,10 +50,10 @@
8. Check that everything looks correct by uploading the package to the pypi test server:
twine upload dist/* -r pypitest
twine upload dist/* -r testpypi
(PyPI suggests using twine, as other methods upload files via plaintext.)
You may have to specify the repository URL; in that case, use the following command:
twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
twine upload dist/* -r testpypi --repository-url=https://test.pypi.org/legacy/
Check that you can install it in a virtualenv by running:
pip install -i https://testpypi.python.org/pypi transformers
@@ -58,6 +62,8 @@
python -c "from transformers import pipeline; classifier = pipeline('text-classification'); print(classifier('What a nice release'))"
python -c "from transformers import *"
If making a patch release, double check the bug you are patching is indeed resolved.
9. Upload the final version to actual pypi:
twine upload dist/* -r pypi
@@ -153,7 +159,7 @@
"rhoknp>=1.1.0",
"rjieba",
"rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1",
"ruff>=0.0.241",
"ruff>=0.0.241,<=0.0.259",
"sacrebleu>=1.4.12,<2.0.0",
"sacremoses",
"safetensors>=0.2.1",
4 changes: 2 additions & 2 deletions src/transformers/__init__.py
@@ -96,7 +96,7 @@
"feature_extraction_sequence_utils": ["SequenceFeatureExtractor"],
"feature_extraction_utils": ["BatchFeature", "FeatureExtractionMixin"],
"file_utils": [],
"generation": ["GenerationConfig"],
"generation": ["GenerationConfig", "TextStreamer"],
"hf_argparser": ["HfArgumentParser"],
"image_transforms": [],
"integrations": [
@@ -3769,7 +3769,7 @@
from .feature_extraction_utils import BatchFeature, FeatureExtractionMixin

# Generation
from .generation import GenerationConfig
from .generation import GenerationConfig, TextStreamer
from .hf_argparser import HfArgumentParser

# Integrations
2 changes: 1 addition & 1 deletion src/transformers/dependency_versions_table.py
@@ -59,7 +59,7 @@
"rhoknp": "rhoknp>=1.1.0",
"rjieba": "rjieba",
"rouge-score": "rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1",
"ruff": "ruff>=0.0.241",
"ruff": "ruff>=0.0.241,<=0.0.259",
"sacrebleu": "sacrebleu>=1.4.12,<2.0.0",
"sacremoses": "sacremoses",
"safetensors": "safetensors>=0.2.1",
4 changes: 2 additions & 2 deletions src/transformers/generation/__init__.py
@@ -17,8 +17,7 @@
from ..utils import OptionalDependencyNotAvailable, _LazyModule, is_flax_available, is_tf_available, is_torch_available


_import_structure = {"configuration_utils": ["GenerationConfig"]}

_import_structure = {"configuration_utils": ["GenerationConfig"], "streamers": ["TextStreamer"]}

try:
if not is_torch_available():
@@ -150,6 +149,7 @@

if TYPE_CHECKING:
from .configuration_utils import GenerationConfig
from .streamers import TextStreamer

try:
if not is_torch_available():
104 changes: 104 additions & 0 deletions src/transformers/generation/streamers.py
@@ -0,0 +1,104 @@
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING


if TYPE_CHECKING:
from ..models.auto import AutoTokenizer


class BaseStreamer:
"""
Base class from which `.generate()` streamers should inherit.
"""

def put(self, value):
"""Function that is called by `.generate()` to push new tokens"""
raise NotImplementedError()

def end(self):
"""Function that is called by `.generate()` to signal the end of generation"""
raise NotImplementedError()


class TextStreamer(BaseStreamer):
"""
Simple text streamer that prints the token(s) to stdout as soon as entire words are formed.
Parameters:
tokenizer (`AutoTokenizer`):
            The tokenizer used to decode the tokens.
Examples:
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
>>> tok = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
>>> streamer = TextStreamer(tok)
>>> # Despite returning the usual output, the streamer will also print the generated text to stdout.
>>> _ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven,
```
"""

def __init__(self, tokenizer: "AutoTokenizer"):
self.tokenizer = tokenizer
self.token_cache = []
self.print_len = 0

def put(self, value):
"""
        Receives tokens, decodes them, and prints them to stdout as soon as they form entire words.
"""
if len(value.shape) > 1 and value.shape[0] > 1:
raise ValueError("TextStreamer only supports batch size 1")
elif len(value.shape) > 1:
value = value[0]

        # Add the new token to the cache and decode the entire thing.
self.token_cache.extend(value.tolist())
text = self.tokenizer.decode(self.token_cache)

        # After a newline character, flush the cache.
if text.endswith("\n"):
printable_text = text[self.print_len :]
self.token_cache = []
self.print_len = 0
        # Otherwise, print up to the last space char (simple heuristic to avoid printing incomplete words,
        # which may change with the subsequent token -- there are probably smarter ways to do this!)
else:
printable_text = text[self.print_len : text.rfind(" ") + 1]
self.print_len += len(printable_text)

print(printable_text, flush=True, end="")

def end(self):
"""Flushes any remaining cache and prints a newline to stdout."""
# Flush the cache, if it exists
if len(self.token_cache) > 0:
text = self.tokenizer.decode(self.token_cache)
printable_text = text[self.print_len :]
self.token_cache = []
self.print_len = 0
else:
printable_text = ""

# Print a newline (and the remaining text, if any)
print(printable_text, flush=True)
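
For reference, the contract `TextStreamer` relies on can be pictured with a small driver loop. This is only
an illustrative sketch of how a caller might feed a streamer token by token, not the actual `generate()`
implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
streamer = TextStreamer(tok)

input_ids = tok("An increasing sequence: one,", return_tensors="pt").input_ids
streamer.put(input_ids)  # push the prompt first, so it is decoded and printed as well

# Greedy decoding, one token at a time, pushing each new token to the streamer.
for _ in range(20):
    logits = model(input_ids).logits
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    streamer.put(next_token)
    input_ids = torch.cat([input_ids, next_token], dim=-1)

streamer.end()  # flush whatever is still in the token cache and print a newline
```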
