Skip to content

Conversation

kashif
Copy link
Contributor

@kashif kashif commented Sep 2, 2025

What does this PR do?

  • Ensured that the attention_mask is always checked for causality before being added to the buffer, and that this validation is performed only once for performance.
  • Modified the logic so that the attention_mask is appended to buffers only after successful validation, preventing non-causal masks from being used.

@kashif kashif requested a review from SunMarc September 2, 2025 11:00
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@SunMarc SunMarc enabled auto-merge (squash) September 2, 2025 16:29
@SunMarc
Copy link
Member

SunMarc commented Sep 2, 2025

cc @S1ro1 for visibility

@S1ro1
Copy link
Member

S1ro1 commented Sep 2, 2025

It's useless to add this to buffers no? CP doesn't work with explicitly passed attention mask so in accelerate we attach a hook that pops it out

@kashif
Copy link
Contributor Author

kashif commented Sep 2, 2025

i was getting errors in the SFT Trainer since we use the attention mask for metrics and its shape was not matching

@kashif
Copy link
Contributor Author

kashif commented Sep 2, 2025

@S1ro1 the entropy metric for example is erroring out:

https://github.com/huggingface/trl/blob/main/trl/trainer/sft_trainer.py#L1041-L1056

@S1ro1
Copy link
Member

S1ro1 commented Sep 2, 2025

Okay, ig makes sense. But in this case it's not entirely correct to shard mask across dim(1), as mask should be sharded across dim(1) and dim(2) which is not expressable in torch DTensor placements, so just to look after that.

@kashif
Copy link
Contributor Author

kashif commented Sep 3, 2025

@S1ro1 good point! How about we only split the 2d masks?

@SunMarc SunMarc merged commit acc968c into main Sep 3, 2025
25 checks passed
@SunMarc SunMarc deleted the kashif-patch-1 branch September 3, 2025 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants