
langchain-mistralai cannot pull tokenizer from huggingface 401 #20618

Open
5 tasks done
couardcourageux opened this issue Apr 18, 2024 · 9 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: embeddings Related to text embedding models module 🔌: huggingface Primarily related to HuggingFace integrations

Comments

@couardcourageux

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Code:

from langchain_mistralai import MistralAIEmbeddings
import assistant.settings as settings

def getMistralEmbeddings():
    return MistralAIEmbeddings(mistral_api_key=settings.MISTRAL_API_KEY)  # well-defined variable from env; works on my personal machine as of publishing this issue

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
File "/app/assistant_api.py", line 37, in <module>
retriever = obtain_full_qdrant_tmdb()
File "/app/assistant/rag/retrievers/qdrant_connector.py", line 30, in obtain_full_qdrant_tmdb
embeddings = getMistralEmbeddings()
File "/app/assistant/rag/embeddings/mistral_embeddings.py", line 5, in getMistralEmbeddings
return MistralAIEmbeddings(mistral_api_key=settings.MISTRAL_API_KEY)
File "/usr/local/lib/python3.10/site-packages/pydantic/v1/main.py", line 339, in __init__
values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
File "/usr/local/lib/python3.10/site-packages/pydantic/v1/main.py", line 1100, in validate_model
values = validator(cls_, values)
File "/usr/local/lib/python3.10/site-packages/langchain_mistralai/embeddings.py", line 86, in validate_environment
values["tokenizer"] = Tokenizer.from_pretrained(
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1403, in hf_hub_download
raise head_call_error
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1261, in hf_hub_download
metadata = get_hf_file_metadata(
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1674, in get_hf_file_metadata
r = _request_wrapper(
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 369, in _request_wrapper
response = _request_wrapper(
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 393, in _request_wrapper
hf_raise_for_status(response)
File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-662165b4-2224fae43a813b360dc7b222;20b14ba7-ef96-4d6a-8bef-1fa42c4f9291)

Cannot access gated repo for url https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json.
Repo model mistralai/Mixtral-8x7B-v0.1 is gated. You must be authenticated to access it.

Description

This error and stack trace have been occurring on a Kubernetes deployment since this afternoon.
It seems to me it's a bug because I cannot recreate the error on my personal machine, even after deleting the virtual environment and the __pycache__ folders and reinstalling everything from requirements.txt.

I know I should authenticate, but firstly, why, and secondly, how?
I came across some solutions where you put your Hugging Face token in the request header, but I don't really know where to inject a token like that when using langchain-mistralai.

System Info

aiohttp>=3.9.3
aiosignal>=1.3.1
annotated-types>=0.6.0
anyio>=4.3.0
async-timeout>=4.0.3
attrs>=23.2.0
certifi>=2024.2.2
charset-normalizer>=3.3.2
click>=8.1.7
dataclasses-json>=0.6.4
exceptiongroup>=1.2.0
faiss-cpu>=1.8.0
fastapi>=0.110.1
filelock>=3.13.4
frozenlist>=1.4.1
fsspec>=2024.3.1
greenlet>=3.0.3
grpcio>=1.62.1
grpcio-tools>=1.62.1
h11>=0.14.0
h2>=4.1.0
hpack>=4.0.0
httpcore>=1.0.5
httpx>=0.25.2
httpx-sse>=0.4.0
huggingface-hub>=0.22.2
hyperframe>=6.0.1
idna>=3.6
Jinja2>=3.1.3
joblib>=1.4.0
jsonpatch>=1.33
jsonpointer>=2.4
langchain>=0.1.15
langchain-community>=0.0.32
langchain-core>=0.1.41
langchain-mistralai>=0.1.1
langchain-text-splitters>=0.0.1
langsmith>=0.1.43
MarkupSafe>=2.1.5
marshmallow>=3.21.1
mistralai>=0.1.8
mpmath>=1.3.0
multidict>=6.0.5
mypy-extensions>=1.0.0
networkx>=3.3
numpy>=1.26.4
nvidia-cublas-cu12>=12.1.3.1
nvidia-cuda-cupti-cu12>=12.1.105
nvidia-cuda-nvrtc-cu12>=12.1.105
nvidia-cuda-runtime-cu12>=12.1.105
nvidia-cudnn-cu12>=8.9.2.26
nvidia-cufft-cu12>=11.0.2.54
nvidia-curand-cu12>=10.3.2.106
nvidia-cusolver-cu12>=11.4.5.107
nvidia-cusparse-cu12>=12.1.0.106
nvidia-nccl-cu12>=2.19.3
nvidia-nvjitlink-cu12>=12.4.127
nvidia-nvtx-cu12>=12.1.105
orjson>=3.10.0
packaging>=23.2
pandas>=2.2.1
pillow>=10.3.0
portalocker>=2.8.2
protobuf>=4.25.3
pyarrow>=15.0.2
pydantic>=2.6.4
pydantic_core>=2.16.3
python-dateutil>=2.9.0.post0
python-dotenv>=1.0.1
pytz>=2024.1
PyYAML>=6.0.1
qdrant-client>=1.8.2
redis>=5.0.3
regex>=2023.12.25
requests>=2.31.0
safetensors>=0.4.2
scikit-learn>=1.4.2
scipy>=1.13.0
sentence-transformers>=2.6.1
six>=1.16.0
sniffio>=1.3.1
SQLAlchemy>=2.0.29
starlette>=0.37.2
sympy>=1.12
tenacity>=8.2.3
threadpoolctl>=3.4.0
tokenizers>=0.15.2
torch>=2.2.2
tqdm>=4.66.2
transformers>=4.39.3
triton>=2.2.0
typing-inspect>=0.9.0
typing_extensions>=4.11.0
tzdata>=2024.1
urllib3>=2.2.1
uvicorn>=0.29.0
yarl>=1.9.4

@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🔌: huggingface Primarily related to HuggingFace integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Apr 18, 2024
@reflection
Contributor

MISTRAL_API_KEY here isn't related to the Hugging Face Hub gate on the mistralai/Mixtral-8x7B-v0.1 model. We also ran into this and expected that setting HUGGINGFACEHUB_API_TOKEN, as in the LangChain Hugging Face Endpoints doc, with an API token of a user with access would fix this issue. Unfortunately, it's still broken.

@couardcourageux
Author

yes, that's really strange

@eyurtsev
Collaborator

Closing this issue as langchain-mistralai is for the Mistral API, not for Hugging Face. If there's a separate issue with Hugging Face endpoints, feel free to open one or 👍 it

@reflection
Contributor

@eyurtsev Anyone using from langchain_mistralai.embeddings import MistralAIEmbeddings will run into this issue when it first initializes and tries to download https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json from Hugging Face.
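To illustrate the point above: based on the traceback, a minimal sketch of what the embeddings class does at init time (repo and file names are taken from the error URL; the gated call itself is shown commented out so this runs offline):

```python
# Sketch of the tokenizer fetch that langchain_mistralai performs during
# validate_environment, reconstructed from the traceback above.
repo_id = "mistralai/Mixtral-8x7B-v0.1"
filename = "tokenizer.json"

# This is the URL the GatedRepoError complains about.
url = f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
print(url)

# The actual call that raises GatedRepoError when no Hugging Face
# credentials are configured (requires network access and the `tokenizers`
# package, so it is left commented out here):
# from tokenizers import Tokenizer
# tokenizer = Tokenizer.from_pretrained(repo_id)
```

So any environment that constructs MistralAIEmbeddings without Hugging Face credentials hits the gate, regardless of whether the Mistral API key is valid.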

@hslee16
Contributor

hslee16 commented Apr 18, 2024

Just ran into this issue. Anyone have a workaround for this?

@hslee16
Contributor

hslee16 commented Apr 19, 2024

For those who are facing this issue, I fixed it by doing the following:

  1. Create an account at huggingface.co if you don't already have one
  2. Create a new access token at https://huggingface.co/settings/tokens
  3. Accept the "terms" at https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/ for using mixtral
  4. Add HF_TOKEN environment variable with the created token from step 2 as the value

Redeploy if required.
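Step 4 above can be sketched in code: set the token in the environment before the embeddings object is constructed (`"hf_xxx"` is a placeholder for your real token; the embeddings call is commented out since it needs network access and valid keys):

```python
import os

# Step 4: make the Hugging Face token visible to huggingface_hub before
# MistralAIEmbeddings is instantiated. "hf_xxx" is a placeholder.
os.environ.setdefault("HF_TOKEN", "hf_xxx")

# With HF_TOKEN set (and the Mixtral repo terms accepted in step 3),
# the tokenizer download at init time should succeed:
# from langchain_mistralai import MistralAIEmbeddings
# embeddings = MistralAIEmbeddings(mistral_api_key=os.environ["MISTRAL_API_KEY"])

print("HF_TOKEN" in os.environ)
```

In containerized deployments, the equivalent is passing HF_TOKEN as a secret/env var rather than setting it in code.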

@eyurtsev eyurtsev reopened this Apr 19, 2024
@efriis
Member

efriis commented Apr 19, 2024

This is definitely a bug! Working on a fallback for folks without huggingface set up

@bootstrapM

bootstrapM commented Apr 20, 2024

I just ran into this problem. hslee16's fix seems to work though!

efriis added a commit that referenced this issue Apr 23, 2024
#20618

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
hinthornw pushed a commit that referenced this issue Apr 26, 2024
#20618

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
@codebrain001

@efriis the issue still persists. Is there any fix without specifying the HF token? Thank you
