Fix too many requests in TestMistralCommonTokenizer #40623
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
94f6cc4 to f8fce12
LGTM overall. Just to be sure, iiuc this is for local caching before starting the tests themselves?
    cls.repo_id,
    tokenizer_type="mistral",
    local_files_only=cls.local_files_only,
    # This is a hack as `list_local_hf_repo_files` from `mistral_common` has a bug
Maybe even a TODO? Imo not a good state to need this workaround 😓
Added:
TODO: Discuss with `mistral-common` maintainers: after a fix being done there, remove this `revision` hack
utils/fetch_hub_objects_for_ci.py
Outdated
if is_mistral_common_available():
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
We import below as well no?
yeah, I was tortured by some `mistral-common` issues and at the end my brain 😵‍💫😵‍💫😵‍💫
# For `tests/test_tokenization_mistral_common.py:TestMistralCommonTokenizer`, which eventually calls
# `mistral_common.tokens.tokenizers.utils.download_tokenizer_from_hf_hub` which (probably) doesn't have the cache.
if is_mistral_common_available():
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

    from transformers import AutoTokenizer
    from transformers.tokenization_mistral_common import MistralCommonTokenizer

    repo_id = "hf-internal-testing/namespace-mistralai-repo_name-Mistral-Small-3.1-24B-Instruct-2503"
    AutoTokenizer.from_pretrained(repo_id, tokenizer_type="mistral")
    MistralCommonTokenizer.from_pretrained(repo_id)
    MistralTokenizer.from_hf_hub(repo_id)

    repo_id = "mistralai/Voxtral-Mini-3B-2507"
    AutoTokenizer.from_pretrained(repo_id)
    MistralTokenizer.from_hf_hub(repo_id)
Not sure how bloated this file will become but might be nice to split into different functions already?
Let's do something later. For now, the most important thing is just to get rid of these annoying connection errorrrrrrrrrs!
What does this PR do?
This test fails intermittently (flaky) with a "too many requests" error.
This PR just tries to cache the tokenizers in a previous step, before the tests are run.
`mistral_common` has a bug that makes this hard to handle cleanly, so I had to use a hack with `revision=None`.
https://app.circleci.com/pipelines/github/huggingface/transformers/144453/workflows/8e42765f-71d4-4733-9574-feb160ff8eda/jobs/1910963/parallel-runs/6/steps/6-113
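To illustrate the general idea of relying on a warm cache, here is a minimal sketch of a "local first" loader. This is not what the PR implements. `load_local_first` is a hypothetical helper, and it assumes that the wrapped loader accepts a `local_files_only` keyword and reports a cache miss as an `OSError` (or a subclass).

```python
from typing import Any, Callable


def load_local_first(loader: Callable[..., Any], repo_id: str, **kwargs: Any) -> Any:
    """Try the local cache first and only hit the network on a cache miss.

    `loader` is any callable that accepts `local_files_only`, for example a
    `from_pretrained`-style method (hypothetical helper, not a library API).
    """
    try:
        return loader(repo_id, local_files_only=True, **kwargs)
    except OSError:
        # Assumption: a missing cache entry surfaces as OSError or a subclass.
        return loader(repo_id, local_files_only=False, **kwargs)
```

With a pre-populated cache, the first call succeeds and no request is made, which is the situation the caching step is meant to create before the tests start.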