
How to use SLED with custom base models that are not on huggingface? #1

Open

mdrpanwar opened this issue Aug 19, 2022 · 6 comments

@mdrpanwar

Hi,

Thanks for releasing the code for SLED.

The README suggests editing the config appropriately to use SLED with other base models from Hugging Face. However, this only works for models that are available through Hugging Face. Is there a way to interface SLED with models that are not on Hugging Face?
A description of how to go about that, and of what code changes (in SLED and in the base model) might be needed, would be really helpful.

Thanks!
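For reference, this is the usage I understand the README to document today, where the SLED config's `underlying_config` field points at a Hugging Face backbone (a sketch from my reading, not something I have verified):

```python
import sled  # registers the "tau/sled" model type with the Auto classes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# tau/bart-base-sled is a SLED config whose underlying_config is facebook/bart-base
tokenizer = AutoTokenizer.from_pretrained("tau/bart-base-sled")
model = AutoModelForSeq2SeqLM.from_pretrained("tau/bart-base-sled")
```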

@Mivg (Owner) commented Sep 12, 2022

Hi @mdrpanwar
Thanks for your question. Any model checkpoint that can be loaded with Hugging Face can be used, even if it is not pushed as a model card.
However, if you have a custom model with no AutoClass functionality, it will indeed not work in its current form.
Can you please add some details on what you have and what you are trying to achieve, and I'll try to add support for it?
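For example, from_pretrained also accepts a local path, so something like this should work for a checkpoint that was never pushed to the Hub (untested sketch; the path is a placeholder, and it assumes the directory contains the usual config.json and weight files):

```python
import sled  # registers the "tau/sled" model type with the Auto classes
from transformers import AutoModelForSeq2SeqLM

# placeholder path to your own checkpoint directory
model = AutoModelForSeq2SeqLM.from_pretrained("/path/to/local/checkpoint")
```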

@mdrpanwar (Author)

Hi @Mivg,

Thanks for replying.

My question and request were the following:
The official code of new transformer models is not always released in the form of Hugging Face models with AutoClass functionality. So the current implementation of SLED rules out the direct use of such base models. I was hoping for a more general implementation that can take any base model implemented in PyTorch, regardless of AutoClass support. Perhaps it will require more work. Is this something you are targeting in the near future?

Please feel free to close this issue. I shall get back when I have a more concrete requirement for a specific model.

Thanks.

@Mivg (Owner) commented Sep 14, 2022

Hi @mdrpanwar

Thanks for the details. Sure, that makes sense, and there is no reason SLED could not support it.
Before I think up a possible solution, I want to be precise about the goal. Is it correct to assume your model is implemented in PyTorch and inherits from PreTrainedModel (part of transformers) but is just not registered as an AutoClass? I.e., you are able to do model = MyCustomModel(...) and pass it to the trainer as if it were, e.g., BART?
If so, do you also have a custom config class that inherits from PretrainedConfig?
Finally, if the two above are true, does your model support MyCustomModel.from_pretrained('some local checkpoint')?

In any case, supporting the above should be rather straightforward. The other possible solution, assuming only the answer to the first question is yes, is to support something like SledForConditionalGeneration.wrap_model(backbone_model) and use it instead of the from_pretrained initialization, as in the sketch below.
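A rough sketch of that hypothetical API (wrap_model does not exist yet, and the import path is an assumption):

```python
from sled import SledForConditionalGeneration  # assumed import path

# MyCustomModel stands in for your own PreTrainedModel subclass
backbone_model = MyCustomModel.from_pretrained('some local checkpoint')
sled_model = SledForConditionalGeneration.wrap_model(backbone_model)  # proposed, not implemented
# sled_model would then be passed to the trainer like any other seq2seq model
```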

@mdrpanwar (Author)

Hi @Mivg,

Thank you for your detailed response. It is fine to assume that base models are written in PyTorch. Beyond that, there are two classes of base models:
1. The base model is written using Hugging Face's transformers library. In this case, it is fair to assume that it inherits from PreTrainedModel and that its custom config class inherits from PretrainedConfig. However, for wider applicability, we can only assume the former to be true.
2. The base model is not written using the transformers library (it is written in plain PyTorch or with some other library, e.g. fairseq). In this case, we need to come up with some minimal interface that is expected of the base model so that it can be used under the SLED framework (see the sketch after this list).

Ideally, we would like to support both 1 and 2 to be exhaustive, but 1 already covers a large number of possible base models. So we can start with 1 and gradually support 2 over time, if you think it is a valid use case.
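To make 2 concrete, the minimal interface might look something like this (purely an illustration of what I have in mind; all names are hypothetical):

```python
from typing import Any, Protocol


class Seq2SeqBackbone(Protocol):
    """Hypothetical minimal contract a non-transformers backbone could expose to SLED."""

    def forward(self, input_ids, attention_mask=None, labels=None, **kwargs) -> Any:
        """Encoder-decoder forward pass returning loss/logits, like a HF Seq2SeqLMOutput."""
        ...

    def generate(self, input_ids, **kwargs) -> Any:
        """Autoregressive decoding, needed for inference."""
        ...

    @classmethod
    def from_pretrained(cls, checkpoint_path: str) -> "Seq2SeqBackbone":
        """Load weights from a local checkpoint."""
        ...
```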

@leoribeiro

Hello @Mivg, is there any update on this issue? Can I use SLED with other HF models?

@Mivg, can I do something like this:

```python
import sled  # needed so the "tau/sled" model type is registered
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

config = AutoConfig.from_pretrained("google/flan-t5-small")
config.model_type = "tau/sled"
config.underlying_config = "facebook/bart-base"
config.context_size = 256
config.window_fraction = 0.5
config.prepend_prefix = True
config.encode_prefix = True
config.sliding_method = "dynamic"

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small", config=config)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
```

Would this code enable SLED on Flan-T5?
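Or, to avoid mutating a T5Config in place, would it need to start from the released SLED config and override the backbone, something like this (untested; I'm assuming the from_pretrained kwargs override the matching config attributes)?

```python
import sled  # registers the "tau/sled" model type with the Auto classes
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

# start from the released SLED config (model_type "tau/sled") and swap the backbone
config = AutoConfig.from_pretrained(
    "tau/bart-base-sled",
    underlying_config="google/flan-t5-small",  # assumption: this override is honored
)
model = AutoModelForSeq2SeqLM.from_pretrained("tau/bart-base-sled", config=config)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
```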

@leoribeiro

@mdrpanwar, could you please help? Were you able to use SLED with other LMs from HF?
