
[Bug] Huggingface backend seems to be broken #42

Closed
ganler opened this issue May 13, 2024 · 3 comments

ganler commented May 13, 2024

#41


zyzzzz-123 commented May 15, 2024

I've identified an issue with the HuggingFace backend's generation function: the decoded replies still contain the prompt. model.generate returns the full sequence (prompt plus completion), and decoding with skip_special_tokens=True only removes special tokens, not the prompt, so each reply ends up being the prompt concatenated with the actual response.

To resolve this, truncating the generated token sequence at the length of the input prompt before decoding corrects the output, yielding results consistent with the vLLM-backed models.

In repoqa/repoqa/provider/hf.py, generate_reply() currently reads:

        prompt_tokens = self.tokenizer.apply_chat_template(
            construct_message_list(question, system_msg),
            return_tensors="pt",
            add_generation_prompt=True,
        ).cuda()

        model = self.hf_model
        if self.stop_seq:
            model = self.stop_sequencer.register_stop_texts(
                stop_texts=self.stop_seq,
                input_length=prompt_tokens.size(-1),
            )

        output_text = model.generate(
            input_ids=prompt_tokens,
            max_new_tokens=max_tokens,
            num_return_sequences=n,
            temperature=temperature,
            pad_token_id=self.tokenizer.eos_token_id,
            use_cache=True,
        )

        gen_strs = [
            self.tokenizer.decode(
                x, skip_special_tokens=True, clean_up_tokenization_spaces=False
            )
            for x in output_text
        ]

After the changes:

        prompt_tokens = self.tokenizer.apply_chat_template(
            construct_message_list(question, system_msg),
            add_generation_prompt=True,
            return_tensors="pt",
        ).cuda()
        # new: remember how many tokens the prompt occupies
        input_length = prompt_tokens.shape[1]
        output_text = self.hf_model.generate(
            input_ids=prompt_tokens,
            max_new_tokens=max_tokens,
            num_return_sequences=n,
            temperature=temperature,
            pad_token_id=self.tokenizer.eos_token_id,
            use_cache=True,
        )

        gen_strs = [
            self.tokenizer.decode(
                # changed: slice off the prompt tokens so only the reply is decoded
                x[input_length:],
                skip_special_tokens=True,
                clean_up_tokenization_spaces=False,
            )
            for x in output_text
        ]
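
For completeness, here is a minimal standalone sketch of the same prompt-stripping idea outside of RepoQA (the model id below is only a placeholder; any instruction-tuned model with a chat template should behave the same way):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # placeholder: any chat model with a chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What does model.generate return?"}]
prompt_tokens = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
input_length = prompt_tokens.shape[1]

with torch.no_grad():
    output_ids = model.generate(
        input_ids=prompt_tokens,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )

# generate() returns prompt + completion; drop the prompt tokens before decoding
replies = tokenizer.batch_decode(
    output_ids[:, input_length:], skip_special_tokens=True
)
print(replies[0])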

I can submit a PR for you guys if you wish.


ganler commented May 15, 2024

Excellent! A PR will be much appreciated!

ganler closed this as completed in b6578eb on May 19, 2024

ganler commented May 19, 2024

@zyzzzz-123 Thanks for the investigation; I have fixed it. :)
