
[Bug] Huggingface backend seems to be broken #42

Closed
ganler opened this issue May 13, 2024 · 3 comments

ganler commented May 13, 2024

#41


zyzzzz-123 commented May 15, 2024

I've identified an issue with the HuggingFace backend's generation function: the decoded replies still contain the prompt. model.generate returns the full sequence (prompt plus completion), and decoding with skip_special_tokens=True only removes special tokens, not the prompt, so each reply ends up being the prompt concatenated with the actual response.

To resolve this, truncating the generated token sequence at the length of the input prompt before decoding corrects the output, yielding results consistent with the vLLM-backed models.

In repoqa/repoqa/provider/hf.py, generate_reply() currently reads:

        prompt_tokens = self.tokenizer.apply_chat_template(
            construct_message_list(question, system_msg),
            return_tensors="pt",
            add_generation_prompt=True,
        ).cuda()

        model = self.hf_model
        if self.stop_seq:
            model = self.stop_sequencer.register_stop_texts(
                stop_texts=self.stop_seq,
                input_length=prompt_tokens.size(-1),
            )

        output_text = model.generate(
            input_ids=prompt_tokens,
            max_new_tokens=max_tokens,
            num_return_sequences=n,
            temperature=temperature,
            pad_token_id=self.tokenizer.eos_token_id,
            use_cache=True,
        )

        gen_strs = [
            self.tokenizer.decode(
                x, skip_special_tokens=True, clean_up_tokenization_spaces=False
            )
            for x in output_text
        ]

After the changes:

        prompt_tokens = self.tokenizer.apply_chat_template(
            construct_message_list(question, system_msg),
            add_generation_prompt=True,
            return_tensors="pt",
        ).cuda()
        # new: remember how many tokens the prompt occupies
        input_length = prompt_tokens.shape[1]
        output_text = self.hf_model.generate(
            input_ids=prompt_tokens,
            max_new_tokens=max_tokens,
            num_return_sequences=n,
            temperature=temperature,
            pad_token_id=self.tokenizer.eos_token_id,
            use_cache=True,
        )

        gen_strs = [
            self.tokenizer.decode(
                # changed: slice off the prompt tokens so only the reply is decoded
                x[input_length:],
                skip_special_tokens=True,
                clean_up_tokenization_spaces=False,
            )
            for x in output_text
        ]
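
For completeness, here is a minimal standalone sketch of the same prompt-stripping idea outside of RepoQA (the model id below is only a placeholder; any instruction-tuned model with a chat template should behave the same way):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # placeholder: any chat model with a chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What does model.generate return?"}]
prompt_tokens = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
input_length = prompt_tokens.shape[1]

with torch.no_grad():
    output_ids = model.generate(
        input_ids=prompt_tokens,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )

# generate() returns prompt + completion; drop the prompt tokens before decoding
replies = tokenizer.batch_decode(
    output_ids[:, input_length:], skip_special_tokens=True
)
print(replies[0])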

I can submit a PR for you guys if you wish.


ganler commented May 15, 2024

Excellent! A PR will be much appreciated!

ganler closed this as completed in b6578eb on May 19, 2024

ganler commented May 19, 2024

@zyzzzz-123 Thanks for the investigation; I have fixed it. :)
