How to use SLED with custom base models that are not on huggingface? #1
Comments
Hi @mdrpanwar
Hi @Mivg, Thanks for replying. My question and request were the following: Please feel free to close this issue. I shall get back when I have a more concrete requirement for a specific model. Thanks.
Hi @mdrpanwar, Thanks for the details. Sure, that makes sense and there is no reason SLED could not support it. In any case, supporting the above should be rather straightforward. The other possible solution, assuming only the answer to the first question is yes, is to support something like SledForConditionalGeneration.wrap_model(backbone_model) and use it instead of the from_pretrained initialization.
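To make the wrap_model idea above more concrete, here is a minimal sketch of the general pattern: a wrapper that takes an already-constructed backbone object instead of a hub model name and runs it over windows of a long input. This is not SLED's code or API. ChunkedWrapper, its core and context parameters, and toy_encoder are hypothetical names invented for illustration, and the backbone is assumed to be a plain callable that returns one output per input token. A real version would presumably wrap a PyTorch module rather than a Python function.

```python
from typing import Callable, List, Sequence

# Hypothetical backbone interface: token ids in, one output value per token out.
Encoder = Callable[[Sequence[int]], List[float]]


class ChunkedWrapper:
    """Illustrative wrapper only; NOT the SLED implementation or its API."""

    def __init__(self, encoder: Encoder, core: int = 4, context: int = 1):
        if core <= 0 or context < 0:
            raise ValueError("core must be positive and context non-negative")
        self.encoder = encoder
        self.core = core        # tokens whose outputs are kept per window
        self.context = context  # extra tokens shown on each side of the core

    @classmethod
    def wrap_model(cls, backbone: Encoder, **kwargs) -> "ChunkedWrapper":
        # Accepts an existing backbone object rather than a hub model name.
        return cls(backbone, **kwargs)

    def encode(self, token_ids: Sequence[int]) -> List[float]:
        n = len(token_ids)
        outputs: List[float] = []
        for start in range(0, n, self.core):
            end = min(start + self.core, n)
            lo = max(0, start - self.context)  # window start, clipped to input
            hi = min(n, end + self.context)    # window end, clipped to input
            window_out = self.encoder(token_ids[lo:hi])
            if len(window_out) != hi - lo:
                raise ValueError("backbone must return one output per token")
            # Keep only the outputs that belong to the core span of this window.
            outputs.extend(window_out[start - lo:end - lo])
        return outputs


def toy_encoder(ids: Sequence[int]) -> List[float]:
    # Stand-in for a custom base model: scales each token id by 10.
    return [10.0 * t for t in ids]


wrapped = ChunkedWrapper.wrap_model(toy_encoder, core=4, context=1)
result = wrapped.encode(list(range(10)))
```

The point of the sketch is only that the wrapper needs a small, explicit contract from the backbone (here, one output per token), which is what would let a model outside the hub be plugged in.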
Hi @Mivg, Thank you for your detailed response. It is fine to assume that base models are written in PyTorch. Beyond that, there are two classes of base models: Ideally, we would like to support both 1 and 2 to be exhaustive; but 1 already covers a large number of possible base models. So, we can start with 1 and gradually support 2 over time if you think it to be a valid use case.
Hello @Mivg, is there any update on this issue? Can I use SLED with other HF models? @Mivg, can I do something like this:
Would this code enable SLED on Flan-T5?
@mdrpanwar, could you please help? Were you able to use SLED with other LMs in HF?
Hi,
Thanks for releasing the code for SLED.
The README suggests editing the config appropriately to use SLED with other base models from Hugging Face. However, this only works with Hugging Face models. Is there a way to interface SLED with other models that are not on Hugging Face?
A description of how to go about that and what code changes (in SLED and in the base model) might be needed would be really helpful.
Thanks!