@Cyrilvallez Cyrilvallez commented Nov 24, 2025

What does this PR do?

As per the title. Bart used to check which weight was present instead of simply using the default one, because some main checkpoints have the wrong one saved. See here for the behavior before we refactored the tied weights.

cc @vasqu as you noticed the issue first!
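To illustrate why checking which weight is present matters, here is a minimal hypothetical sketch in plain Python (not the actual transformers implementation; the parameter names `model.shared.weight` and `lm_head.weight` are assumptions for illustration):

```python
# Hypothetical sketch of the two tying strategies discussed above.
# This is NOT transformers code; a checkpoint is modeled as a plain dict.

DEFAULT_KEY = "model.shared.weight"  # assumed canonical tying source
ALT_KEY = "lm_head.weight"           # key some checkpoints (wrongly) saved instead


def tie_by_presence(state_dict: dict) -> str:
    """Bart-style: tie from whichever of the two weights the checkpoint contains."""
    if DEFAULT_KEY in state_dict:
        return DEFAULT_KEY
    if ALT_KEY in state_dict:
        return ALT_KEY
    return DEFAULT_KEY


def tie_to_default(state_dict: dict) -> str:
    """Naive tying: always use the default key, ignoring what was saved."""
    return DEFAULT_KEY
```

On a checkpoint that only saved `lm_head.weight`, the presence check ties from the weight that was actually loaded, while the naive version ties from an unloaded default, which matches the issue described above.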

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@vasqu vasqu left a comment


Thanks a lot! Checked locally (together with the other fixes) and the integration tests pass. Just 2 nits, feel free to ignore; not super important.

# Initialize weights and apply final processing
self.post_init()

def tie_weights(self, missing_keys: Optional[set[str]] = None, recompute_mapping: bool = True):
Contributor


Do we want to add a FIXME/TODO here to clean up after allowing the reverse direction in tied weights?

Member Author


We will need to remove all model-specific tie_weights anyway at that point, so this is fine as-is IMO. A TODO does not add much value, as we would need to search for it, and we don't always do that anyway.
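On the "reverse direction" point: a toy sketch (entirely hypothetical, not the transformers API or its semantics) of how a `tie_weights` override could use `missing_keys` to decide the tying direction:

```python
class Param:
    """Stand-in for a weight tensor; object identity is what matters for tying."""
    def __init__(self, data):
        self.data = data


class ToyModel:
    """Hypothetical model with a shared embedding and an LM head (made-up names)."""
    def __init__(self):
        self.shared = Param([0.0] * 4)   # default tying source
        self.lm_head = Param([1.0] * 4)  # weight usually tied to `shared`

    def tie_weights(self, missing_keys=None):
        # Hypothetical logic: if the default source was missing from the
        # checkpoint but the head was loaded, tie in the reverse direction
        # so the loaded weight wins; otherwise tie head -> shared as usual.
        missing_keys = missing_keys or set()
        if "shared" in missing_keys and "lm_head" not in missing_keys:
            self.shared = self.lm_head   # reverse: embedding follows head
        else:
            self.lm_head = self.shared   # default: head follows embedding
```

After calling `tie_weights`, both attributes point at the same `Param`, differing only in which loaded data survives.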

"""
)
-# Copied from transformers.models.bart.modeling_bart.BartForConditionalGeneration with Bart->BigBirdPegasus, BART->BIGBIRD_PEGASUS
+# Except `tie_weights()`: everything else Copied from transformers.models.bart.modeling_bart.BartForConditionalGeneration with Bart->BigBirdPegasus, BART->BIGBIRD_PEGASUS
Contributor


These other classes like QnA are always a mess in legacy models 😢

Tbh, I don't see much being copied in the first place: the forward uses an ignore-copy, and then only the init and resize_xxx remain, if I see it correctly. I'd propose to either remove the copies altogether or apply them directly on the ~3 functions.

Member Author


Oh indeed, I did not notice that forward was skipping the copy, good catch! Will update to use it on the separate functions then!

@Cyrilvallez Cyrilvallez merged commit 2f7747c into main Nov 24, 2025
7 of 11 checks passed
@Cyrilvallez Cyrilvallez deleted the fix--bart branch November 24, 2025 14:14
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: bart, bigbird_pegasus
