Port bart transformer checkpoint #1783
Conversation
Looks good! Just minor comments. Unlike Mistral, I think we can land with the test against the bart-base version (though a tiny checkpoint for tests would still be great).
Needs merge conflict resolution because of ALBERT, I think.
class TestTask(TestCase):
    @pytest.mark.large
    def test_convert_tiny_preset(self):
        # Load the HF checkpoint through the KerasNLP converter path.
        model = BartSeq2SeqLM.from_preset("hf://facebook/bart-base")
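(For reference, a fuller version of this check might look something like the sketch below; it assumes the standard KerasNLP imports and the Seq2SeqLM generate() API, not the exact test in this PR:)

import pytest

from keras_nlp.models import BartSeq2SeqLM


@pytest.mark.large
def test_convert_bart_base_preset():
    # Load the HF checkpoint through the KerasNLP converter path.
    model = BartSeq2SeqLM.from_preset("hf://facebook/bart-base")
    # Lightweight end-to-end check: generation should run without error.
    output = model.generate("The quick brown fox", max_length=16)
    assert isinstance(output, str)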
Less urgent than the 7B Mistral model, but we might want to consider making a tiny test checkpoint for this model too.
Added tiny-bart-test
Awesome work! Thank you!
(I'll merge as soon as testing finishes, if all is green.)
Oops, I forgot to run our large testing for this. It is causing test failures, see below. This is because we are not setting some weight (with shape ...). @cosmo3769, can you take a look?
@mattdangerw, I looked into it. I found that, on the keras side, it will cause a shape mismatch.
Well, slicing off the first two position embeddings works in this case:
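(A sketch of that slice for reference; keras_position_embedding is an illustrative stand-in for the corresponding KerasNLP position-embedding layer, not the exact converter code. For bart-base the HF position-embedding table has shape (1026, 768), while the Keras side expects (1024, 768):)

from transformers import BartModel

hf_model = BartModel.from_pretrained("facebook/bart-base")
hf_pos_weight = hf_model.encoder.embed_positions.weight  # torch tensor, shape (1026, 768)

# HF's BartLearnedPositionalEmbedding reserves an offset of 2 rows at the
# start of the table, so drop them before assigning to the Keras weight.
keras_position_embedding.position_embeddings.assign(
    hf_pos_weight[2:, :].detach().numpy()  # shape (1024, 768)
)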
@cosmo3769 thanks for the quick reply! Is there an extra two embedding positions being added in the HF implementation for position embeddings? Why? Might be worth looking into. To unbreak CI, it sounds like we can just do this?
Doing a small fix to turn CI green again would be helpful as we dig deeper here. I wouldn't hardcode the "1024" part if we can avoid it.
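(One way to avoid hardcoding the 1024 would be to take the slice length from the Keras variable itself; same illustrative names as in the sketch above:)

# Derive the number of positions from the Keras embedding rather than
# hardcoding 1024; taking the trailing rows drops HF's two offset rows.
num_positions = keras_position_embedding.position_embeddings.shape[0]
keras_position_embedding.position_embeddings.assign(
    hf_pos_weight[-num_positions:, :].detach().numpy()
)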
Yeah, I hardcoded the above value as a reference point. For now, let me raise a quick MR with the changes.
Hi @mattdangerw @ariG23498,
Ported the BART transformers checkpoint to KerasNLP. Please check. Thank you!