Thu May 23 15:30:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:0F:00.0 Off |                    0 |
| N/A   37C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|  No running processes found                                                            |
+---------------------------------------------------------------------------------------+
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Thanks for the open-source model. I initialized Llama 3 70B for local inference as described in the recipe. However, during inference I see the following warnings:
Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
I took max_length=None from the recipe. The Hugging Face docs advise calling tokenizer(batch_sentences, padding='max_length', truncation=True) (without an explicit max_length, which then defaults to None). However, the model does not come with a predefined maximum length, so how should max_length be set?
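For context, here is a minimal sketch of the two workarounds I am considering; the checkpoint id is a placeholder for the recipe's local path, and the 8192 limit is my assumption based on Llama 3's advertised context window:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint id; in my case this is the local Llama 3 70B path from the recipe.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

# Llama 3 ships without a pad token, so one must be set before padding
# (assumption: reuse the EOS token as pad token).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch_sentences = ["Hello, how are you?", "A second, slightly longer example sentence."]

# Option 1: set the tokenizer-wide limit once, then pad/truncate against it.
tokenizer.model_max_length = 8192
inputs = tokenizer(batch_sentences, padding="max_length", truncation=True, return_tensors="pt")

# Option 2: pass max_length explicitly on each call instead.
inputs = tokenizer(
    batch_sentences,
    padding="max_length",
    truncation=True,
    max_length=8192,
    return_tensors="pt",
)
```

Note that padding every batch to 8192 tokens is wasteful for short inputs; padding="longest" pads only to the longest sequence in the batch and avoids the warning as well.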
Error logs
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Expected behavior
No warnings are expected.
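For completeness, a minimal end-to-end sketch of the generation call I would expect to run warning-free; the checkpoint id, dtype, and generation length are placeholders, and passing pad_token_id explicitly is my assumption for silencing the last warning:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # placeholder for the recipe's local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; the recipe's dtype/device settings may differ
    device_map="auto",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,                   # placeholder generation length
    pad_token_id=tokenizer.eos_token_id,  # passing this explicitly should silence the warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```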
System Info
CUDA 12.1
PyTorch 2.3.0
Python 3.11