
Question about --per-sequence-loss #182

Open
Sanster opened this issue Feb 18, 2024 · 1 comment

Comments

Sanster commented Feb 18, 2024

In generate_dataset.py, there is a --per-sequence-loss arg, which is used in conversation_template.py. This parameter further adjusts the loss weights based on the length of each response.

if seq_level_weight:

I would like to know: when training the OpenChat series models, did you enable this parameter? What impact does it have on the training results? Thanks.

imoneoi (Owner) commented Feb 23, 2024

When this parameter is enabled, losses are averaged on a per-sequence basis; otherwise they are averaged on a per-token basis (the same as the HF Trainer). It is disabled by default because it led to worse results in our experiments, making the model worse at longer responses.
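
For intuition, here is a minimal sketch of the two reduction modes (this is not the actual generate_dataset.py / conversation_template.py code; the function and variable names are made up for illustration):

```python
import torch

def reduce_loss(token_losses: list[torch.Tensor], per_sequence: bool) -> torch.Tensor:
    """Reduce per-token losses for a batch of responses.

    token_losses: one 1-D tensor of unreduced cross-entropy values per response.
    """
    if per_sequence:
        # Per-sequence averaging: mean within each response first, then across
        # responses, so every response contributes equally regardless of length.
        return torch.stack([t.mean() for t in token_losses]).mean()
    # Per-token averaging (HF Trainer default): pool all supervised tokens and
    # take one mean, so longer responses contribute more tokens to the loss.
    return torch.cat(token_losses).mean()

# A 2-token response and a 10-token response:
losses = [torch.full((2,), 1.0), torch.full((10,), 3.0)]
print(reduce_loss(losses, per_sequence=True))   # (1.0 + 3.0) / 2 = 2.0
print(reduce_loss(losses, per_sequence=False))  # (2*1.0 + 10*3.0) / 12 ≈ 2.67
```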
