Merge pull request huggingface#61 from huggingface/main
rraze
jamesthesnake committed May 17, 2023
2 parents 2efbd38 + 46d2468 commit 2c31a7f
Showing 42 changed files with 734 additions and 117 deletions.
8 changes: 8 additions & 0 deletions .circleci/config.yml
@@ -43,6 +43,12 @@ jobs:
else
touch test_preparation/test_list.txt
fi
- run: |
if [ -f doctest_list.txt ]; then
cp doctest_list.txt test_preparation/doctest_list.txt
else
touch test_preparation/doctest_list.txt
fi
- run: |
if [ -f test_repo_utils.txt ]; then
mv test_repo_utils.txt test_preparation/test_repo_utils.txt
@@ -71,6 +77,8 @@ jobs:
fi
- store_artifacts:
path: test_preparation/test_list.txt
- store_artifacts:
path: test_preparation/doctest_list.txt
- store_artifacts:
path: ~/transformers/test_preparation/filtered_test_list.txt
- store_artifacts:
12 changes: 11 additions & 1 deletion .circleci/create_circleci_config.py
@@ -483,7 +483,6 @@ def job_name(self):
hub_job,
onnx_job,
exotic_models_job,
doc_test_job
]
EXAMPLES_TESTS = [
examples_torch_job,
@@ -495,6 +494,8 @@ def job_name(self):
pipelines_tf_job,
]
REPO_UTIL_TESTS = [repo_utils_job]
DOC_TESTS = [doc_test_job]


def create_circleci_config(folder=None):
if folder is None:
@@ -552,6 +553,15 @@ def create_circleci_config(folder=None):
if os.path.exists(example_file) and os.path.getsize(example_file) > 0:
jobs.extend(EXAMPLES_TESTS)

doctest_file = os.path.join(folder, "doctest_list.txt")
if os.path.exists(doctest_file):
with open(doctest_file) as f:
doctest_list = f.read()
else:
doctest_list = []
if len(doctest_list) > 0:
jobs.extend(DOC_TESTS)

repo_util_file = os.path.join(folder, "test_repo_utils.txt")
if os.path.exists(repo_util_file) and os.path.getsize(repo_util_file) > 0:
jobs.extend(REPO_UTIL_TESTS)
1 change: 0 additions & 1 deletion MANIFEST.in

This file was deleted.

7 changes: 7 additions & 0 deletions Makefile
@@ -111,3 +111,10 @@ post-release:

post-patch:
python utils/release.py --post_release --patch

build-release:
rm -rf dist
rm -rf build
python setup.py bdist_wheel
python setup.py sdist
python utils/check_build.py
7 changes: 2 additions & 5 deletions docs/source/en/generation_strategies.mdx
@@ -338,9 +338,8 @@ For the complete list of the available parameters, refer to the [API documentati
Assisted decoding is a modification of the decoding strategies above that uses an assistant model with the same
tokenizer (ideally a much smaller model) to greedily generate a few candidate tokens. The main model then validates
the candidate tokens in a single forward pass, which speeds up the decoding process. Currently, only greedy search
and sampling are supported with assisted decoding, and doesn't support batched inputs.

<!-- TODO: add link to the blog post about assisted decoding when it exists -->
and sampling are supported with assisted decoding, and it doesn't support batched inputs. To learn more about assisted
decoding, check [this blog post](https://huggingface.co/blog/assisted-generation).

To enable assisted decoding, set the `assistant_model` argument with a model.
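
As a rough sketch of that call pattern (the checkpoints and prompt below are illustrative and not taken from this diff), passing the smaller draft model as `assistant_model` is all that is needed:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> checkpoint = "EleutherAI/pythia-1.4b-deduped"  # main model (illustrative)
>>> assistant_checkpoint = "EleutherAI/pythia-160m-deduped"  # smaller assistant sharing the same tokenizer

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer("Alice and Bob", return_tensors="pt")

>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
>>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)

>>> # the assistant drafts candidate tokens greedily; the main model validates them in a single forward pass
>>> outputs = model.generate(**inputs, assistant_model=assistant_model)
>>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```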

@@ -364,8 +363,6 @@ To enable assisted decoding, set the `assistant_model` argument with a model.
When using assisted decoding with sampling methods, you can use the `temperature` argument to control the randomness
just like in multinomial sampling. However, in assisted decoding, reducing the temperature will help improve latency.

<!-- TODO: link the blog post again to explain why the tradeoff exists -->
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

2 changes: 2 additions & 0 deletions docs/source/en/model_doc/pix2struct.mdx
@@ -25,6 +25,8 @@ Tips:
Pix2Struct has been fine-tuned on a variety of tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components. The full list can be found in Table 1 of the paper.
We therefore advise you to use these models for the tasks they have been fine-tuned on. For instance, if you want to use Pix2Struct for UI captioning, you should use the model fine-tuned on the UI dataset; if you want to use Pix2Struct for image captioning, you should use the model fine-tuned on the natural image captioning dataset, and so on.

If you want to use the model to perform conditional text captioning, make sure to use the processor with `add_special_tokens=False`.
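
A minimal sketch of that processor call (the checkpoint, image URL, and prompt here are assumptions for illustration, not part of this diff):

```python
>>> import requests
>>> from PIL import Image
>>> from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

>>> checkpoint = "google/pix2struct-textcaps-base"  # illustrative captioning checkpoint
>>> processor = Pix2StructProcessor.from_pretrained(checkpoint)
>>> model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> # pass a text prompt for conditional captioning and disable special tokens on the text side
>>> inputs = processor(images=image, text="A picture of", return_tensors="pt", add_special_tokens=False)
>>> generated_ids = model.generate(**inputs, max_new_tokens=30)
>>> print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```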

This model was contributed by [ybelkada](https://huggingface.co/ybelkada).
The original code can be found [here](https://github.com/google-research/pix2struct).

4 changes: 2 additions & 2 deletions docs/source/ko/_toctree.yml
@@ -79,8 +79,8 @@
- sections:
- local: in_translation
title: (번역중) Audio classification
- local: in_translation
title: (번역중) Automatic speech recognition
- local: tasks/asr
title: 자동 음성 인식
title: (번역중) 오디오
isExpanded: false
- sections:
