Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for Distilbert #21

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

codetreras
Copy link

Based on the Bert's code for language modeling and text encoding tasks, these changes add support for DistilBert architecture #7 .

@matteo-grella
Copy link
Member

Thank you! What differs DistilBERT from BERT?

@codetreras
Copy link
Author

You're welcome, the project is awesome. The main differences are the configuration and the layers' identifiers. Architecturally, DistilBert has no token type embeddings or pooler. Check this image, in blue the equivalent layers, in orange the dissimilar ones.

Screenshot 2023-06-27 at 11 11 03 AM

At the beginning I thought about including DistilBert as a "variation" of Bert, however it would increase considerably the complexity of the code, here redundancy is necessary to make maintenance easier, let me know your thoughts.

@matteo-grella
Copy link
Member

matteo-grella commented Jun 29, 2023

@marco-nicola what do you think friend? I’ll go for it but a bit worried about code duplication for just a few differences.

@mooijtech
Copy link

mooijtech commented Jun 29, 2023

Preferably just use the DistilBERT config (extend code in BERT) so there's no need for duplicate code.

@codetreras
Copy link
Author

Got it, in that case extending the converter/preprocessing.go and converter/mapper.go for BERT would be the proper way to manage the differences in layer identifiers, together with the configuration. Let me know what you think, I can modify the PR for you to check this approach.

@mooijtech
Copy link

I'm looking into supporting flan-t5-* but so far I'm stuck since there are differences in the positional encoder (different weight key) so it currently fails when prompting due to some input being nil (it seems the second time round).

@matteo-grella
Copy link
Member

@mooijtech I am in vacation with family so it is a bit difficult for me to follow up on this now. I'll back to you next week and we'll figure it out together how to proceed with flan-t5-*!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants