Warning: Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. #539

Open
artkpv opened this issue May 23, 2024 · 0 comments


System Info

Cuda 12.1
PyTorch 2.3.0
Python 3.11

Thu May 23 15:30:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:0F:00.0 Off |                    0 |
| N/A   37C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Thanks for the open-source model. I initialize Llama 3 70B following the recipe for local inference. However, when I run inference I see these warnings:

Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

I took max_length=None from the recipe. Also, the HF documentation (linked here) advises calling tokenizer(batch_sentences, padding='max_length', truncation=True) without passing max_length, which then defaults to None. However, the model does not provide a predefined maximum length, so how should max_length be set?
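
For context, below is a minimal sketch of what I would expect to avoid the warning. The model id and the 8192 limit are placeholders I chose for illustration, not values taken from the recipe:

    # Sketch only: the model id and the 8192 limit are placeholders.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")
    tokenizer.pad_token = tokenizer.eos_token
    prompt = "Hello, world"  # stand-in for the chat-templated prompt below

    # Option 1: provide an explicit limit so padding and truncation have a target
    batch = tokenizer(prompt, padding="max_length", truncation=True,
                      max_length=8192, return_tensors="pt")

    # Option 2: pad only to the longest sequence in the batch
    batch = tokenizer(prompt, padding=True, return_tensors="pt")

Neither option matches the recipe's max_length=None, which is why I am asking what the intended setting is.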

My code:

        self._model = AutoModelForCausalLM.from_pretrained(
            llm,
            return_dict=True,
            load_in_8bit=llm_kwargs["load_in_8bit"],
            load_in_4bit=llm_kwargs["load_in_4bit"],
            device_map="auto",
            low_cpu_mem_usage=True,
            attn_implementation="sdpa" if llm_kwargs.get("use_fast_kernels", False) else None,
            torch_dtype=torch.bfloat16
        )
        self._model.eval()

        tokenizer = AutoTokenizer.from_pretrained(self._llm)
        prompt = tokenizer.apply_chat_template(
            prompt, tokenize=False, add_generation_prompt=True
        )
        tokenizer.pad_token = tokenizer.eos_token
        batch = tokenizer(
            prompt,
            padding='max_length', 
            truncation=True, 
            max_length=None,
            return_tensors="pt"
        )
        batch = {k: v.to("cuda") for k, v in batch.items()}
        outputs = self._model.generate(
            **batch,
            **self._gen_kwargs,
        )
        # Take only response:
        outputs = outputs[0][batch['input_ids'][0].size(0):]
        response = tokenizer.decode(outputs, skip_special_tokens=True)

Error logs

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
> Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
> Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
> Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
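
For what it's worth, the last message above seems to go away if pad_token_id is passed explicitly to generate(); this is my own workaround for that one line, not something from the recipe, and it does not affect the padding/truncation warnings:

    # Variant of the generate() call above; passing pad_token_id explicitly
    # silences only the "Setting `pad_token_id` to `eos_token_id`" message.
    outputs = self._model.generate(
        **batch,
        pad_token_id=tokenizer.eos_token_id,
        **self._gen_kwargs,
    )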

Expected behavior

No warning expected.
