
ORPOTrainer fails with flash-attention2 #1477

Closed
alvarobartt opened this issue Mar 24, 2024 · 0 comments · Fixed by #1478


Description

The ORPOTrainer fails when training with attn_implementation="flash_attention_2": the cache is left enabled during the forward pass, while the tokenizer falls back to its default configuration, i.e. padding_side="right", a combination that Flash Attention 2 does not support.
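A minimal sketch of a setup that hits this path. The model name and the tiny inline dataset are illustrative assumptions, not taken from the report; any Flash Attention 2-capable model and a preference dataset with "prompt"/"chosen"/"rejected" text columns should behave the same.

import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative choice
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # the failing configuration
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
# padding_side is left at its default, "right", which is the case the
# report describes.

# Toy preference data in the format ORPOTrainer expects.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What is the capital of France?"],
        "chosen": [" Paris."],
        "rejected": [" London."],
    }
)

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(output_dir="orpo-fa2", max_length=1024, max_prompt_length=512),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()  # fails inside the concatenated forward pass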

Bug in code

The forward call below is missing use_cache=False, which is needed to prevent the model from using the cache and thus avoid issues with Flash Attention 2.

outputs = model(
    concatenated_batch["concatenated_input_ids"],
    attention_mask=concatenated_batch["concatenated_attention_mask"],
    **model_kwargs,
)
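A sketch of the fix the description points to, passing the flag inline (the actual patch landed in #1478, which may wire it through model_kwargs instead):

outputs = model(
    concatenated_batch["concatenated_input_ids"],
    attention_mask=concatenated_batch["concatenated_attention_mask"],
    use_cache=False,  # disable the cache so Flash Attention 2 works with the default right padding
    **model_kwargs,
)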
