
Commit 9272a10

Merge pull request huggingface#10 from huggingface/main
si
jamesthesnake committed Mar 30, 2023
2 parents bffa5de + 228792a commit 9272a10
Showing 53 changed files with 826 additions and 315 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/update_tiny_models.yml
@@ -1,4 +1,4 @@
name: Self-hosted runner (push)
name: Update Tiny Models

on:
push:
@@ -9,7 +9,7 @@ on:
- cron: "0 2 * * *"

env:
TOKEN: ${{ secrets.SYLVAIN_HF_TOKEN }}
TOKEN: ${{ secrets.TRANSFORMERS_HUB_BOT_HF_TOKEN }}

jobs:
update_tiny_models:
23 changes: 23 additions & 0 deletions docs/source/en/generation_strategies.mdx
@@ -139,6 +139,29 @@ one for summarization with beam search). You must have the right Hub permissions
['Les fichiers de configuration sont faciles à utiliser !']
```

## Streaming

`generate()` supports streaming through its `streamer` input. The `streamer` input is compatible with any
instance of a class that implements the `put()` and `end()` methods. Internally, `put()` is used to push new
tokens and `end()` is used to flag the end of text generation.

In practice, you can craft your own streaming class for all sorts of purposes! We also have basic streaming classes
ready for you to use. For example, you can use the [`TextStreamer`] class to stream the output of `generate()` into
your screen, one word at a time:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

>>> tok = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
>>> streamer = TextStreamer(tok)

>>> # Despite returning the usual output, the streamer will also print the generated text to stdout.
>>> _ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven,
```
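
In addition to the built-in classes, you can write your own streamer by implementing the same two methods.
The snippet below is a minimal, illustrative sketch (the `TokenCollector` name is made up for this example
and is not part of the library); it collects the generated token ids instead of printing them:

```python
class TokenCollector:
    """Hypothetical streamer that stores generated token ids instead of printing them."""

    def __init__(self):
        self.tokens = []

    def put(self, value):
        # `value` is a tensor of token ids pushed by `generate()` (the prompt first, then new tokens).
        self.tokens.extend(value.tolist() if value.ndim == 1 else value[0].tolist())

    def end(self):
        # Called once generation is finished; nothing left to flush here.
        pass
```

An instance would then be passed through the same `streamer` argument, e.g.
`model.generate(**inputs, streamer=TokenCollector(), max_new_tokens=20)`.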

## Decoding strategies

Certain combinations of the `generate()` parameters, and ultimately `generation_config`, can be used to enable specific
4 changes: 4 additions & 0 deletions docs/source/en/internal/generation_utils.mdx
@@ -265,3 +265,7 @@ A [`Constraint`] can be used to force the generation to include specific tokens
[[autodoc]] top_k_top_p_filtering

[[autodoc]] tf_top_k_top_p_filtering

## Streamers

[[autodoc]] TextStreamer
3 changes: 2 additions & 1 deletion docs/source/en/main_classes/text_generation.mdx
@@ -24,7 +24,8 @@ of the generation method.

To learn how to inspect a model's generation configuration, what are the defaults, how to change the parameters ad hoc,
and how to create and save a customized generation configuration, refer to the
[text generation strategies guide](../generation_strategies).
[text generation strategies guide](../generation_strategies). The guide also explains how to use related features,
like token streaming.

## GenerationConfig

16 changes: 9 additions & 7 deletions examples/pytorch/README.md
@@ -262,13 +262,13 @@ First, install the Neptune client library. You can do it with either `pip` or `c
`pip`:

```bash
pip install neptune-client
pip install neptune
```

`conda`:

```bash
conda install -c conda-forge neptune-client
conda install -c conda-forge neptune
```

Next, in your model training script, import `NeptuneCallback`:
@@ -283,8 +283,8 @@ To enable Neptune logging, in your `TrainingArguments`, set the `report_to` argu
training_args = TrainingArguments(
"quick-training-distilbert-mrpc",
evaluation_strategy="steps",
eval_steps = 20,
report_to = "neptune",
eval_steps=20,
report_to="neptune",
)

trainer = Trainer(
@@ -294,6 +294,8 @@ trainer = Trainer(
)
```

**Note:** This method requires saving your Neptune credentials as environment variables (see the bottom of the section).

Alternatively, for more logging options, create a Neptune callback:

```python
@@ -318,7 +320,7 @@ neptune_callback = NeptuneCallback(
Pass the callback to the Trainer:

```python
training_args = TrainingArguments(..., report_to = None)
training_args = TrainingArguments(..., report_to=None)
trainer = Trainer(
model,
training_args,
@@ -336,7 +338,7 @@ Now, when you start the training with `trainer.train()`, your metadata will be l
| `NEPTUNE_API_TOKEN` | Your Neptune API token. To find and copy it, click your Neptune avatar and select **Get your API token**. |
| `NEPTUNE_PROJECT` | The full name of your Neptune project (`workspace-name/project-name`). To find and copy it, head to **project settings** → **Properties**. |
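
As a rough sketch (the values below are placeholders, not real credentials), these variables can also be
set from inside a training script before the `Trainer` is created:

```python
import os

# Illustrative placeholders -- substitute your own API token and project name.
os.environ["NEPTUNE_API_TOKEN"] = "<your-neptune-api-token>"
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
```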

For detailed instructions and examples, see the [Neptune docs](https://docs.neptune.ai/integrations-and-supported-tools/model-training/hugging-face).
For detailed instructions and examples, see the [Neptune docs](https://docs.neptune.ai/integrations/transformers/).

### ClearML

@@ -373,4 +375,4 @@ Advanced configuration is possible by setting environment variables:
| CLEARML_PROJECT | Name of the project in ClearML. (default: `"HuggingFace Transformers"`) |
| CLEARML_TASK | Name of the task in ClearML. (default: `"Trainer"`) |

Additional configuration options are available through generic [clearml environment variables](https://clear.ml/docs/latest/docs/configs/env_vars).
Additional configuration options are available through generic [clearml environment variables](https://clear.ml/docs/latest/docs/configs/env_vars).
12 changes: 9 additions & 3 deletions setup.py
@@ -38,6 +38,10 @@
7. Build both the sources and the wheel. Do not change anything in setup.py between
creating the wheel and the source distribution (obviously).
Clean up your build and dist folders (to avoid re-uploading oldies):
rm -rf dist
rm -rf build
For the wheel, run: "python setup.py bdist_wheel" in the top level directory.
(this will build a wheel for the python version you use to build it).
@@ -46,10 +50,10 @@
8. Check that everything looks correct by uploading the package to the pypi test server:
twine upload dist/* -r pypitest
twine upload dist/* -r testpypi
(PyPI suggests using twine, as other methods upload files via plaintext.)
You may have to specify the repository URL; in that case, use the following command:
twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
twine upload dist/* -r testpypi --repository-url=https://test.pypi.org/legacy/
Check that you can install it in a virtualenv by running:
pip install -i https://testpypi.python.org/pypi transformers
@@ -58,6 +62,8 @@
python -c "from transformers import pipeline; classifier = pipeline('text-classification'); print(classifier('What a nice release'))"
python -c "from transformers import *"
If making a patch release, double check the bug you are patching is indeed resolved.
9. Upload the final version to actual pypi:
twine upload dist/* -r pypi
@@ -153,7 +159,7 @@
"rhoknp>=1.1.0",
"rjieba",
"rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1",
"ruff>=0.0.241",
"ruff>=0.0.241,<=0.0.259",
"sacrebleu>=1.4.12,<2.0.0",
"sacremoses",
"safetensors>=0.2.1",
4 changes: 2 additions & 2 deletions src/transformers/__init__.py
@@ -96,7 +96,7 @@
"feature_extraction_sequence_utils": ["SequenceFeatureExtractor"],
"feature_extraction_utils": ["BatchFeature", "FeatureExtractionMixin"],
"file_utils": [],
"generation": ["GenerationConfig"],
"generation": ["GenerationConfig", "TextStreamer"],
"hf_argparser": ["HfArgumentParser"],
"image_transforms": [],
"integrations": [
@@ -3769,7 +3769,7 @@
from .feature_extraction_utils import BatchFeature, FeatureExtractionMixin

# Generation
from .generation import GenerationConfig
from .generation import GenerationConfig, TextStreamer
from .hf_argparser import HfArgumentParser

# Integrations
2 changes: 1 addition & 1 deletion src/transformers/dependency_versions_table.py
@@ -59,7 +59,7 @@
"rhoknp": "rhoknp>=1.1.0",
"rjieba": "rjieba",
"rouge-score": "rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1",
"ruff": "ruff>=0.0.241",
"ruff": "ruff>=0.0.241,<=0.0.259",
"sacrebleu": "sacrebleu>=1.4.12,<2.0.0",
"sacremoses": "sacremoses",
"safetensors": "safetensors>=0.2.1",
4 changes: 2 additions & 2 deletions src/transformers/generation/__init__.py
@@ -17,8 +17,7 @@
from ..utils import OptionalDependencyNotAvailable, _LazyModule, is_flax_available, is_tf_available, is_torch_available


_import_structure = {"configuration_utils": ["GenerationConfig"]}

_import_structure = {"configuration_utils": ["GenerationConfig"], "streamers": ["TextStreamer"]}

try:
if not is_torch_available():
@@ -150,6 +149,7 @@

if TYPE_CHECKING:
from .configuration_utils import GenerationConfig
from .streamers import TextStreamer

try:
if not is_torch_available():
104 changes: 104 additions & 0 deletions src/transformers/generation/streamers.py
@@ -0,0 +1,104 @@
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING


if TYPE_CHECKING:
from ..models.auto import AutoTokenizer


class BaseStreamer:
"""
Base class from which `.generate()` streamers should inherit.
"""

def put(self, value):
"""Function that is called by `.generate()` to push new tokens"""
raise NotImplementedError()

def end(self):
"""Function that is called by `.generate()` to signal the end of generation"""
raise NotImplementedError()


class TextStreamer(BaseStreamer):
"""
Simple text streamer that prints the token(s) to stdout as soon as entire words are formed.
Parameters:
tokenizer (`AutoTokenizer`):
            The tokenizer used to decode the tokens.
Examples:
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
>>> tok = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
>>> streamer = TextStreamer(tok)
>>> # Despite returning the usual output, the streamer will also print the generated text to stdout.
>>> _ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven,
```
"""

def __init__(self, tokenizer: "AutoTokenizer"):
self.tokenizer = tokenizer
self.token_cache = []
self.print_len = 0

def put(self, value):
"""
        Receives tokens, decodes them, and prints them to stdout as soon as they form entire words.
"""
if len(value.shape) > 1 and value.shape[0] > 1:
raise ValueError("TextStreamer only supports batch size 1")
elif len(value.shape) > 1:
value = value[0]

        # Add the new token to the cache and decode the entire thing.
self.token_cache.extend(value.tolist())
text = self.tokenizer.decode(self.token_cache)

        # After a newline character, flush the cache.
if text.endswith("\n"):
printable_text = text[self.print_len :]
self.token_cache = []
self.print_len = 0
        # Otherwise, print up to the last space char (simple heuristic to avoid printing incomplete words,
        # which may change with the subsequent token -- there are probably smarter ways to do this!)
else:
printable_text = text[self.print_len : text.rfind(" ") + 1]
self.print_len += len(printable_text)

print(printable_text, flush=True, end="")

def end(self):
"""Flushes any remaining cache and prints a newline to stdout."""
# Flush the cache, if it exists
if len(self.token_cache) > 0:
text = self.tokenizer.decode(self.token_cache)
printable_text = text[self.print_len :]
self.token_cache = []
self.print_len = 0
else:
printable_text = ""

# Print a newline (and the remaining text, if any)
print(printable_text, flush=True)
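
For reference, the contract `TextStreamer` relies on can be pictured with a small driver loop. This is only
an illustrative sketch of how a caller might feed a streamer token by token, not the actual `generate()`
implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
streamer = TextStreamer(tok)

input_ids = tok("An increasing sequence: one,", return_tensors="pt").input_ids
streamer.put(input_ids)  # push the prompt first, so it is decoded and printed as well

# Greedy decoding, one token at a time, pushing each new token to the streamer.
for _ in range(20):
    logits = model(input_ids).logits
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    streamer.put(next_token)
    input_ids = torch.cat([input_ids, next_token], dim=-1)

streamer.end()  # flush whatever is still in the token cache and print a newline
```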
