Model pads response with newlines up to max_length #26

Closed
marco-ve opened this issue Aug 25, 2023 · 13 comments
Labels: model-usage (issues related to how models are used/loaded)

@marco-ve

I tried several of the models through Hugging Face, and the response is always padded with newlines up to the number of tokens specified by the max_length argument in model.generate().

I also set pad_token_id=tokenizer.eos_token_id, so I'm not sure why the model is generating these newline characters.
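
For reference, roughly what my setup looks like; this is a minimal sketch rather than my exact script, so the model name, prompt, and sampling settings are illustrative:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Illustrative model; I see the same behaviour with several of the checkpoints.
model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Write a Bash one-liner that counts the lines in every .txt file. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# pad_token_id is set to eos_token_id, yet the decoded output is still
# padded with newline characters up to max_length.
output = model.generate(
    **inputs,
    max_length=512,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))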

@borzunov commented Aug 25, 2023

I observe the same thing with codellama/CodeLlama-34b-Instruct-hf on Hugging Face Hub. Quite often, the model starts to generate \n indefinitely instead of generating </s> and stopping.

This is using the standard generation params (temperature=0.2, top_p=0.95) with the prompt format and the example prompt suggested in this repository:

<s>[INST] <<SYS>>
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<</SYS>>

In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST]
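
For completeness, a small sketch of how that prompt string is assembled before tokenization (the system prompt and question are the ones quoted above; the format is the one suggested in this repository):

system_prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
user_message = (
    "In Bash, how do I list all text files in the current directory "
    "(excluding subdirectories) that have been modified in the last month?"
)
# <s> is the BOS token; most tokenizers add it automatically, in which case it
# should be left out of the string to avoid a doubled BOS.
prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"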

@Regenhardt commented Aug 26, 2023

This one produced fewer newlines, but still far more than needed:

You're an intelligent, concise coding assistant. Wrap code in ``` for readability. Don't repeat yourself. Use best practice and good coding standards.

(Using temp 0.8)

@liruiw commented Aug 28, 2023

Same problem here. I am basically using the llama-recipes quickstart for training and inference. Inference using the same prompt as in this repo works just fine.

@ArthurZucker

Could you make sure you are using the latest release / main version of transformers?

@Regenhardt

I'm actually using LlamaSharp (https://github.com/SciSharp/LLamaSharp) with the GGML model downloaded from TheBloke.

@zaventh commented Aug 28, 2023

This issue still occurs with revision 4cb1403c377bb630ab92ec56272a6686c2bff315 of codellama/CodeLlama-13b-Instruct-hf running on TGI.

@liruiw commented Aug 28, 2023

Actually, this minimal example works fine on my computer now. A few references: 1, 2, 3.

from transformers import AutoTokenizer
import transformers
import torch

# The tokenizer is loaded from the Instruct checkpoint only to supply eos_token_id;
# the pipeline below loads the base 7B model (both share the same tokenizer).
model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipeline = transformers.pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    "Write the code for quicksort.",
    do_sample=True,
    temperature=0.1,
    top_p=0.9,
    top_k=50,
    num_return_sequences=1,
    max_length=1024,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

@marco-ve (Author)

After upgrading to the latest version of transformers (4.32.1) and huggingface_hub (0.16.4) with pip, this gives me:

ValueError: Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported.

@aliswel-mt

Rather than padding the response, in my case the entire output is newlines. Not every input behaves like this, but many of them just output a bunch of newlines.
P.S. I am using the latest version of transformers (4.32.1).

@ArthurZucker

cc @marco-ve: you should install main using pip install git+https://github.com/huggingface/transformers
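
A quick sanity check after installing, to make sure the source build is the one actually being imported (a minimal sketch):

import transformers

# The runaway-newline fix needs a 4.33 build; 4.32.1 will still show the problem.
print(transformers.__version__)
# Confirm the import resolves to the environment you just installed into.
print(transformers.__file__)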

@strokesegment

Has anyone solved the problem of generating a large number of line breaks?

@ArthurZucker

You are most probably not using main.

@hijkw added the model-usage label on Sep 6, 2023
@sootlasten commented Sep 12, 2023

@ArthurZucker is correct. The model repeating a token ad infinitum is the result of the rope_theta param not being read in correctly from the model params. This requires transformers version >= 4.33.0 (see also this thread: https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v1-GPTQ/discussions/2).
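
A quick way to verify this on your setup (a sketch; CodeLlama checkpoints set rope_theta to 1000000 in config.json, versus the Llama-2 default of 10000):

import transformers
from transformers import AutoConfig

# The model only applies rope_theta from the checkpoint's config on
# transformers >= 4.33.0; older versions build the rotary embeddings
# with the Llama-2 default and trigger the runaway newlines.
print(transformers.__version__)   # should be >= 4.33.0

config = AutoConfig.from_pretrained("codellama/CodeLlama-34b-Instruct-hf")
print(config.rope_theta)          # expected: 1000000.0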

Closing this issue.

@sootlasten self-assigned this on Sep 12, 2023