
Question about --per-sequence-loss #182

Open
Sanster opened this issue Feb 18, 2024 · 1 comment

Comments

Sanster commented Feb 18, 2024

In generate_dataset.py, there is a --per-sequence-loss arg, which is used in conversation_template.py. This parameter further adjusts the loss weights based on the length of each response.

if seq_level_weight:

I would like to know: when training the OpenChat series models, did you enable this parameter? What impact does it have on the training results? Thanks.

imoneoi (Owner) commented Feb 23, 2024

When this parameter is enabled, losses are averaged on a per-sequence basis; otherwise they are averaged on a per-token basis (the same as the HF Trainer). It is disabled by default because it led to worse results in our experiments, making the model worse at longer responses.
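
For intuition, here is a minimal sketch of the two reduction modes (this is not the actual generate_dataset.py / conversation_template.py code; the function and variable names are made up for illustration):

```python
import torch

def reduce_loss(token_losses: list[torch.Tensor], per_sequence: bool) -> torch.Tensor:
    """Reduce per-token losses for a batch of responses.

    token_losses: one 1-D tensor of unreduced cross-entropy values per response.
    """
    if per_sequence:
        # Per-sequence averaging: mean within each response first, then across
        # responses, so every response contributes equally regardless of length.
        return torch.stack([t.mean() for t in token_losses]).mean()
    # Per-token averaging (HF Trainer default): pool all supervised tokens and
    # take one mean, so longer responses contribute more tokens to the loss.
    return torch.cat(token_losses).mean()

# A 2-token response and a 10-token response:
losses = [torch.full((2,), 1.0), torch.full((10,), 3.0)]
print(reduce_loss(losses, per_sequence=True))   # (1.0 + 3.0) / 2 = 2.0
print(reduce_loss(losses, per_sequence=False))  # (2*1.0 + 10*3.0) / 12 ≈ 2.67
```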
