Model pads response with newlines up to max_length #26

Closed
marco-ve opened this issue Aug 25, 2023 · 13 comments
Labels: model-usage (issues related to how models are used/loaded)

@marco-ve

I tried several of the models through Hugging Face, and the response is always padded with newlines up to the number of tokens specified by the max_length argument in model.generate().

I also set pad_token_id=tokenizer.eos_token_id, so I'm not sure why the model is generating these newline characters.
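
For reference, roughly what my setup looks like; this is a minimal sketch rather than my exact script, so the model name, prompt, and sampling settings are illustrative:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Illustrative model; I see the same behaviour with several of the checkpoints.
model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Write a Bash one-liner that counts the lines in every .txt file. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# pad_token_id is set to eos_token_id, yet the decoded output is still
# padded with newline characters up to max_length.
output = model.generate(
    **inputs,
    max_length=512,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))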

@borzunov commented Aug 25, 2023

I observe the same thing with codellama/CodeLlama-34b-Instruct-hf on Hugging Face Hub. Quite often, the model starts to generate \n indefinitely instead of generating </s> and stopping.

This is using the standard generation params (temperature=0.2, top_p=0.95) with the prompt format and the example prompt suggested in this repository:

<s>[INST] <<SYS>>
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<</SYS>>

In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST]
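
For completeness, a small sketch of how that prompt string is assembled before tokenization (the system prompt and question are the ones quoted above; the format is the one suggested in this repository):

system_prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
user_message = (
    "In Bash, how do I list all text files in the current directory "
    "(excluding subdirectories) that have been modified in the last month?"
)
# <s> is the BOS token; most tokenizers add it automatically, in which case it
# should be left out of the string to avoid a doubled BOS.
prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"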

@Regenhardt commented Aug 26, 2023

This one produced fewer newlines, but still far more than needed:

You're an intelligent, concise coding assistant. Wrap code in ``` for readability. Don't repeat yourself. Use best practice and good coding standards.

(Using temp 0.8)

@liruiw commented Aug 28, 2023

Same problem here. I am basically using the llama-recipes quickstart for training and inference. Inference using the same prompt as in this repo works just fine.

@ArthurZucker

Could you make sure you are using the latest release / main version of transformers?

@Regenhardt

I'm actually using LlamaSharp (https://github.com/SciSharp/LLamaSharp) with the GGML model downloaded from TheBloke.

@zaventh commented Aug 28, 2023

This issue still occurs with revision 4cb1403c377bb630ab92ec56272a6686c2bff315 of codellama/CodeLlama-13b-Instruct-hf running on TGI.

@liruiw commented Aug 28, 2023

Actually, this minimal example works fine on my computer now. A few references: 1, 2, 3.

from transformers import AutoTokenizer
import transformers
import torch

# The tokenizer is loaded from the Instruct checkpoint only to supply eos_token_id;
# the pipeline below loads the base 7B model (both share the same tokenizer).
model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipeline = transformers.pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    "Write the code for quicksort.",
    do_sample=True,
    temperature=0.1,
    top_p=0.9,
    top_k=50,
    num_return_sequences=1,
    max_length=1024,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

@marco-ve (Author)

After upgrading to the latest version of transformers (4.32.1) and huggingface_hub (0.16.4) with pip, this gives me:

ValueError: Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported.

@aliswel-mt

Rather than padding the response, in my case the entire output is newlines. Not every input behaves like this, but many of them just output a bunch of newlines.
P.S. I am using the latest version of transformers (4.32.1).

@ArthurZucker

cc @marco-ve: you should install main using pip install git+https://github.com/huggingface/transformers
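
A quick sanity check after installing, to make sure the source build is the one actually being imported (a minimal sketch):

import transformers

# The runaway-newline fix needs a 4.33 build; 4.32.1 will still show the problem.
print(transformers.__version__)
# Confirm the import resolves to the environment you just installed into.
print(transformers.__file__)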

@strokesegment

Has anyone solved the problem of generating a large number of line breaks?

@ArthurZucker

You are most probably not using main.

@hijkw added the model-usage label on Sep 6, 2023
@sootlasten commented Sep 12, 2023

@ArthurZucker is correct. The model repeating a token ad infinitum is the result of the rope_theta param not being read in correctly from the model params. This requires transformers version >= 4.33.0 (see also this thread: https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v1-GPTQ/discussions/2).
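
A quick way to verify this on your setup (a sketch; CodeLlama checkpoints set rope_theta to 1000000 in config.json, versus the Llama-2 default of 10000):

import transformers
from transformers import AutoConfig

# The model only applies rope_theta from the checkpoint's config on
# transformers >= 4.33.0; older versions build the rotary embeddings
# with the Llama-2 default and trigger the runaway newlines.
print(transformers.__version__)   # should be >= 4.33.0

config = AutoConfig.from_pretrained("codellama/CodeLlama-34b-Instruct-hf")
print(config.rope_theta)          # expected: 1000000.0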

Closing this issue.

@sootlasten self-assigned this on Sep 12, 2023