
[Bug]: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` #8603

Closed
sanbuphy opened this issue Jun 13, 2024 · 7 comments
Labels: bug (Something isn't working), stale

@sanbuphy
Software environment

- paddlepaddle: develop
- paddlepaddle-gpu: develop 11.8
- paddlenlp: latest (4609d07a54ab97974b962b536dde7164ab15db93)

Duplicate check

  • I have searched the existing issues

Error description

meta-llama/Meta-Llama-3-8B-Instruct inference fails with the padding error below.

(…)nstruct/model-00004-of-00004.safetensors: 100%|█| 1.17G/1.17G [00:14<00:00]
Downloading shards: 100%|█████████████████████████| 4/4 [03:24<00:00, 51.14s/it]
W0613 23:29:27.245162 141364 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0613 23:29:27.246907 141364 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████████████| 4/4 [03:39<00:00, 54.87s/it]
[2024-06-13 23:33:27,358] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2024-06-13 23:33:27,359] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
(…)ama-3-8B-Instruct/generation_config.json: 100%|█| 126/126 [00:00<00:00, 489kB]
[2024-06-13 23:33:27,486] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/meta-llama/Meta-Llama-3-8B-Instruct/generation_config.json
[2024-06-13 23:33:27,487] [    INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load 'meta-llama/Meta-Llama-3-8B-Instruct'.
[2024-06-13 23:33:27,487] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/meta-llama/Meta-Llama-3-8B-Instruct/config.json
[2024-06-13 23:33:27,488] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/meta-llama/Meta-Llama-3-8B-Instruct/generation_config.json
[2024-06-13 23:33:27,490] [    INFO] - Start predict
[2024-06-13 23:33:27,491] [   ERROR] - Using pad_token, but it is not set yet.
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1651, in <module>
    predict()
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1596, in predict
    outputs = predictor.predict(batch_source_text)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 251, in predict
    tokenized_source = self._preprocess(input_texts)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 226, in _preprocess
    tokenized_source = self.tokenizer(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2248, in __call__
    return self.batch_encode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2523, in batch_encode
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2004, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.


### Steps to reproduce & code

!pip install tiktoken
!python predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype=float16
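Until an upstream fix lands, a common workaround is to assign an existing special token as the padding token before any call that uses `padding=True`. This is a minimal sketch, assuming a Hugging Face / PaddleNLP-style tokenizer that exposes `pad_token` / `eos_token` attributes and `add_special_tokens()`; the helper name is illustrative, not PaddleNLP API:

```python
def ensure_pad_token(tokenizer):
    """Fall back to an existing special token when pad_token is unset.

    Assumes a Hugging Face / PaddleNLP-style tokenizer exposing
    pad_token / eos_token attributes and add_special_tokens().
    """
    if tokenizer.pad_token is None:
        if tokenizer.eos_token is not None:
            # Reuse eos as pad; padded positions are masked out during inference.
            tokenizer.pad_token = tokenizer.eos_token
        else:
            # No usable special token at all: register a fresh [PAD] token.
            tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    return tokenizer
```

Calling this right after loading the tokenizer in predictor.py would avoid the ValueError above.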

@sanbuphy sanbuphy added the bug Something isn't working label Jun 13, 2024
@sanbuphy (Author)

The same problem occurs with Qwen2 inference: "Using unk_token, but it is not set yet."

[2024-06-13 23:21:55,506] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2024-06-13 23:21:56,948] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.tokenizer.Qwen2Tokenizer'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:21:57,310] [   ERROR] - Using unk_token, but it is not set yet. (repeated 8 times)
[2024-06-13 23:21:57,311] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.configuration.Qwen2Config'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:21:57,311] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/config.json
[2024-06-13 23:21:57,312] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.modeling.Qwen2ForCausalLM'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:21:57,312] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/config.json
[2024-06-13 23:21:57,313] [    INFO] - Loading weights file from cache at /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/model.safetensors.index.json
Downloading shards: 100%|██████████████████████| 4/4 [00:00<00:00, 26255.42it/s]
W0613 23:21:57.318940 134458 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0613 23:21:57.320240 134458 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████████████| 4/4 [03:18<00:00, 49.74s/it]
[2024-06-13 23:25:34,697] [    INFO] - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[2024-06-13 23:25:34,697] [    INFO] - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[2024-06-13 23:25:34,700] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/generation_config.json
[2024-06-13 23:25:34,700] [    INFO] - Generation config file not found, using a generation config created from the model config.
[2024-06-13 23:25:34,701] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.configuration.Qwen2Config'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:25:34,701] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/config.json
[2024-06-13 23:25:34,701] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/generation_config.json
[2024-06-13 23:25:34,702] [ WARNING] - Can't find generation config, so it will not use generation_config field in the model config
[2024-06-13 23:25:34,703] [    INFO] - Start predict
[2024-06-13 23:25:48,343] [   ERROR] - Using unk_token, but it is not set yet. (repeated 198 times)
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1651, in <module>
    predict()
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1596, in predict
    outputs = predictor.predict(batch_source_text)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 253, in predict
    decoded_predictions = self._postprocess(predictions)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 245, in _postprocess
    decoded_predictions = self.tokenizer.batch_decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 3200, in batch_decode
    return [
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 3201, in <listcomp>
    self.decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 3239, in decode
    return self._decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/qwen2/tokenizer.py", line 294, in _decode
    return super()._decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils.py", line 1842, in _decode
    sub_texts.append(self.convert_tokens_to_string(current_sub_text))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/qwen2/tokenizer.py", line 280, in convert_tokens_to_string
    text = "".join(tokens)
TypeError: sequence item 196: expected str instance, NoneType found
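The Qwen2 decode failure is the same root cause surfacing later: an unset special token resolves to None, and `"".join(tokens)` inside convert_tokens_to_string then raises the TypeError. A defensive sketch of the failing join (illustrative only, not the actual PaddleNLP fix) would skip None entries:

```python
def join_tokens(tokens):
    """Concatenate decoded sub-tokens into text, skipping entries that
    resolved to None (e.g. an unset unk_token/pad_token)."""
    return "".join(t for t in tokens if t is not None)
```

Skipping None only masks the symptom, though; the proper fix is to make the special tokens resolvable in the first place.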

@DrownFish19 (Collaborator) commented Jun 20, 2024

Qwen fix PR: #8601
LLaMA fix PR: #8630

@yuanlehome (Collaborator) commented Jun 20, 2024

> Qwen fix PR: #8601 LLaMA fix PR: #8630

#8630 does not actually fix the LLaMA problem; I don't know how to fix it.

@DrownFish19 (Collaborator)

> #8630 does not actually fix the LLaMA problem; I don't know how to fix it.

The Llama 3 tokenizer is missing a pad_token. The fix is the code below, which I see has already been added:

if (isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.unk_token

@yuanlehome (Collaborator)

> The Llama 3 tokenizer is missing a pad_token. The fix is the code below, which I see has already been added:
>
> if (isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:
>     tokenizer.pad_token = tokenizer.unk_token

That doesn't help: for Llama 3 even unk_token cannot be set, so this code change can be ignored for now. Fundamentally we need to fix setting both pad_token and unk_token, and I suspect the other special tokens cannot be set either.
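Since the Llama 3 tokenizer has neither unk_token nor pad_token configured, a robust fix needs a fallback chain rather than the single `pad_token = unk_token` assignment above. A minimal sketch, assuming a tokenizer exposing unk_token / eos_token / pad_token attributes and `add_special_tokens()` (the helper name is hypothetical, not PaddleNLP API):

```python
def pick_pad_token(tokenizer):
    """Select a pad token from whichever special tokens actually exist.

    Tries unk_token first (the proposed fix above), then eos_token
    (Llama 3 defines an eos token but no unk token), and finally
    registers a new [PAD] token as a last resort.
    """
    for candidate in (tokenizer.unk_token, tokenizer.eos_token):
        if candidate is not None:
            tokenizer.pad_token = candidate
            return tokenizer.pad_token
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    return tokenizer.pad_token
```

With this chain, Llama 3 would pick up its eos token as padding instead of ending up with pad_token = None.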

github-actions bot

This issue is stale because it has been open for 60 days with no activity.

github-actions bot added the stale label Aug 20, 2024
github-actions bot commented Sep 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as not planned Sep 3, 2024
4 participants