Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`

### Checked other resources

- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

### Example Code

```python
import os
from langchain.vectorstores.qdrant import Qdrant
from langchain.document_loaders.pdf import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings


def load_pdf(
    file: str,
    collection_name: str,
    chunk_size: int = 512,
    chunk_overlap: int = 32,
):
    loader = PyPDFLoader(file)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    texts = text_splitter.split_documents(documents)
    embeddings = HuggingFaceBgeEmbeddings(
        model_name=os.environ.get("MODEL_NAME", "microsoft/phi-2"),
        model_kwargs=dict(device="cpu"),
        encode_kwargs=dict(normalize_embeddings=False),
    )

    url = os.environ.get("VDB_URL", "http://localhost:6333")

    qdrant = Qdrant.from_documents(
        texts,
        embeddings,
        url=url,
        collection_name=collection_name,
        prefer_grpc=False,
    )

    return qdrant

load_pdf("/Users/alvynabranches/Downloads/bh1.pdf", "buget2425")
```

### Error Message and Stack Trace (if applicable)

```bash
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/Users/alvynabranches/rag/rag/ingest.py", line 39, in <module>
    load_pdf("/Users/alvynabranches/Downloads/bh1.pdf", "buget2425")
  File "/Users/alvynabranches/rag/rag/ingest.py", line 29, in load_pdf
    qdrant = Qdrant.from_documents(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_core/vectorstores.py", line 528, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_community/vectorstores/qdrant.py", line 1334, in from_texts
    qdrant = cls.construct_instance(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_community/vectorstores/qdrant.py", line 1591, in construct_instance
    partial_embeddings = embedding.embed_documents(texts[:1])
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/langchain_community/embeddings/huggingface.py", line 257, in embed_documents
    embeddings = self.client.encode(texts, **self.encode_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 345, in encode
    features = self.tokenize(sentences_batch)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 553, in tokenize
    return self._first_module().tokenize(texts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 146, in tokenize
    self.tokenizer(
  File "/opt/homebrew/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2829, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2915, in _call_one
    return self.batch_encode_plus(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3097, in batch_encode_plus
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2734, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
```

### Description

I am using the langchain library to store pdf on qdrant. The model which I am using is phi-2. I tried to solve it by changing the kwargs but it is still not working. 

### System Info

```bash
pip3 freeze | grep langchain
```

```
langchain==0.1.12
langchain-community==0.0.28
langchain-core==0.1.31
langchain-text-splitters==0.0.1
```

#### Platform
Apple M3 Max
macOS Sonoma
Version 14.1

```bash
python3 -V
```

```
Python 3.12.2
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})` #19061

Checked other resources

Example Code

Error Message and Stack Trace (if applicable)

Description

System Info

Platform

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}) #19061

Description

Checked other resources

Example Code

Error Message and Stack Trace (if applicable)

Description

System Info

Platform

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})` #19061