[ENH] design issue - mapping weight handling and fine-tuning for FM on forecaster interface #6580
Hello Franz! I have made good progress on LagLlama and should be done soon. Can I work on this next? Thank you!
Sure! This is a design issue, so the way to work on this would be to propose interface patterns, with speculative code snippets first, or pointing out design decisions.
I would like to clarify my understanding of the original diagram - I have created an edited version of the diagram above. Is this diagram more or less correct?
There are three main use cases that I can think of right now for deep-learning-interfaced global forecasters:
"correct" being a subjective term, it aligns with my understanding though. Minor comments:
Regarding your use cases:

Re 1: yes, "zero shot" predictions are indeed a key subcase. In this case, the question is, how do we map the context? For zero-shot models, the context could be mapped into
Re 2: indeed, I have been thinking about a design with "two fits" as well, or using
Re 3: yes, I would park this use case. We should keep it in mind though.

There is a use case no. 4, where a user fine-tunes their model and wants to make the weights available as a serialized model, or as weights on Hugging Face. That is something imo we should also try to support, and before use case 3.
I'm pasting two designs which differ in where the context is passed. They are both applicable to zero-shot learning, but only design 2 is applicable to fine-tuning. The unpleasant point is that the two designs are discrepant in the zero-shot learning case.

Design 1
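A speculative sketch of how design 1 could look - the class name `FoundationForecaster` and its arguments are placeholders, not existing sktime API: the pretrained weights are loaded in `fit`, and the context series enters only at `predict` time, so this only covers zero-shot use.

```python
# speculative illustration of design 1: context is passed at predict time only
forecaster = FoundationForecaster(checkpoint="some/pretrained-weights")  # hypothetical class
forecaster.fit(y=None, fh=[1, 2, 3])                     # no training data, weights stay frozen
y_pred = forecaster.predict(fh=[1, 2, 3], y=y_context)   # context is supplied here
```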
Design 2
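Again a speculative sketch rather than a definitive snippet - in design 2 the context goes through `fit`, which gives the two sub-variants 2A and 2B discussed below; names and the `fine_tune` flag are placeholders.

```python
# speculative illustration of design 2: context is passed to fit
# 2A - zero shot: fit only stores the context, weights remain frozen
forecaster = FoundationForecaster(checkpoint="some/pretrained-weights", fine_tune=False)
forecaster.fit(y=y_context, fh=[1, 2, 3])
y_pred = forecaster.predict(fh=[1, 2, 3])

# 2B - fine-tuning: fit updates the weights on the provided series
forecaster = FoundationForecaster(checkpoint="some/pretrained-weights", fine_tune=True)
forecaster.fit(y=y_train, fh=[1, 2, 3])
y_pred = forecaster.predict(fh=[1, 2, 3])
```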
It is especially vexing that the difference between 2A and 2B shows that it might not be possible to remain consistent with both the current main usage and the fine-tuning usage in design 2.
Since every deep learning model could be different, it may be useful to leverage specific deep learning tags. For example: consider
Having tags could be extremely helpful for new users who don't know what model capabilities are available. For example, for
A con for this idea is that having too many new tags
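As a hedged sketch of what such capability tags could look like - the tag strings below are invented for illustration; only the `_tags` mechanism and `all_estimators` filtering already exist in sktime:

```python
from sktime.registry import all_estimators

class ChronosLikeForecaster(BaseGlobalForecaster):  # hypothetical forecaster, base class as discussed above
    _tags = {
        "capability:global_forecasting": True,
        "capability:zero_shot": True,      # invented tag name
        "capability:fine_tuning": False,   # invented tag name
    }
    ...

# new users could then discover models by capability via the registry
zero_shot_forecasters = all_estimators(
    estimator_types="forecaster",
    filter_tags={"capability:zero_shot": True},  # invented tag name
)
```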
Mhm... Perhaps we can even do something like a compositor-based structure here. E.g., if you take a look at the PEFT library, it just wraps around the original model and behaves like a normal model. It is even possible to merge the weights into the original model to get the original model structure back. So I wonder if it would be possible that the foundation models are GlobalForecasters (perhaps a special type of global forecaster), and the fine-tuning module is of the same type as the foundation models. Interface-wise it would then probably look as follows:

```python
from peft import get_peft_model


class PEFTTunedModel(BaseGlobalForecaster):

    def __init__(self, foundation_model, peft_config):
        self.foundation_model = foundation_model
        self.peft_config = peft_config

    def fit(self, ....):
        self._model = get_peft_model(self.foundation_model, self.peft_config)
        # do global fit stuff

    def predict(self, ...):
        # do global predict stuff

    def merge_weights(self, ....):
        # PEFT allows the merging of weights.
        # The merged model would have the same nn structure as type(self.foundation_model).
        # Thus, the merge_weights method might even return a new instance of
        # type(self.foundation_model).

    def upload_weights(self, ....):
        # Upload or store the weights from the model, including the adapters
        # (or the merged weights).
```

Drawbacks:
Benefits:
@julian-fong, very valid points! I think we should write down speculative code for the use cases you have specified. I think most of your suggestions make sense, including for tags. However, we should:
@benHeid, I also think we need to think about the serialization pathways - we may not want to tie ourselves to one vendor (e.g., Hugging Face, here). For the PEFT compositor to work, do the deep learning based models need some special interface point?
Mhm, not sure what you are referring to - binding to a vendor would only be the case during upload, or? If we are using HF models, we can also store the weights locally. Regarding the usage of PEFT, true, this is from HF. However, I am not aware of any alternative. Furthermore, in my opinion, the compositor approach should also be applicable to other (perhaps existing) fine-tuning libraries.
No, no special interface point is needed in general. However, in specific cases there might be restrictions, e.g., that the underlying model needs to be a transformer nn. E.g., according to the PEFT library, their LoRA implementation is applicable to all neural networks (perhaps only to torch models, not sure about TensorFlow support). E.g., here is an example for using LoRA with a simple MLP: https://huggingface.co/docs/peft/developer_guides/custom_models Thus, I think it would be very cool if we are able to implement it as a compositor.
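For reference, a minimal sketch along the lines of that guide - wrapping a plain torch MLP with LoRA via peft, no transformer involved; layer sizes and module names are arbitrary:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.seq(x)

# target_modules addresses layers by name; "seq.0" and "seq.2" are the Linear layers above
config = LoraConfig(target_modules=["seq.0", "seq.2"], r=8, lora_alpha=16, lora_dropout=0.1)
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```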
I meant the format or location of serialized models. HF is a good place to share them, but we should also think about users who may want this entirely "off the grid", e.g., in a closed code base.
Are you sure? Does it not at least need to be a neural network?
Yes,
Oh, yes I misunderstood it. Yes, it needs to be a neural network. Perhaps even a torch model.
How about using
Makes sense. So the reversed version would look like:

```python
class PEFTTunedModel(BaseGlobalForecaster):

    def __init__(self, foundation_model, peft_config):
        self.foundation_model = foundation_model
        self.peft_config = peft_config

    def fit(self, ....):
        self._model = get_peft_model(self.foundation_model, self.peft_config)
        # do global fit stuff

    def predict(self, ...):
        # do global predict stuff

    def merge_weights(self, ....):
        # PEFT allows the merging of weights.
        # The merged model would have the same nn structure as type(self.foundation_model).
        # Thus, the merge_weights method might even return a new instance of
        # type(self.foundation_model).

    def save(self, ....):
        # Save the unmerged weights.
        # To save merged weights, the user needs to use: model.merge_weights().save()
        # TODO: how to control the serialization method - via an argument of save, or a class parameter?

    def load(self, ....):
        # Load unmerged weights.
```

A further open TODO is that we must define an interface for getting the neural network to which we are applying the PEFT methods. E.g., we could let all deep learning methods that should be fine-tunable inherit from a common base class that enforces that a method
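A rough sketch of that idea - the class and method names below are placeholders, nothing here is agreed API yet:

```python
from abc import abstractmethod

class _BaseDeepForecaster(BaseGlobalForecaster):
    """Hypothetical base class for DL forecasters usable with a PEFT compositor."""

    @abstractmethod
    def _get_network(self):
        """Return the underlying torch.nn.Module that PEFT should wrap."""

    @abstractmethod
    def _get_dataset(self, y, X=None):
        """Return y (and X) converted to the input format the network expects."""

# the PEFT compositor would then only rely on this contract, e.g. in fit:
#     network = self.foundation_model._get_network()
#     self._model = get_peft_model(network, self.peft_config)
```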
Yes, this would be a good idea to implement, as PEFT is general across transformers and most neural network architectures, providing a good level of abstraction for sktime users to fine-tune foundation models and get their own model.

The thing I'm concerned about is that every foundation model has a different type of data design on which it was trained, which directly corresponds to the architecture of the model, i.e., the number of input parameters of the forward method. So how does that work for PEFT models? I think the data needs to be preprocessed to make it compatible with the input layers of the model. The format of the data also needs to be Hugging Face
So the workflow will be something like,
Other functions would be easy to implement if we at least get this working. For load and save, the other formats used widely could be
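On the save/load question, a hedged sketch of a purely local pathway that does not require the hub, using calls that exist in peft/torch today (paths are arbitrary; `peft_model` is assumed to be the PEFT-wrapped model from the compositor):

```python
import torch

# adapter-only checkpoint, written to a local directory - no hub involved
peft_model.save_pretrained("artifacts/my_adapter")

# optional: merge the adapters back into the base weights and serialize with plain torch,
# which keeps the result usable in a fully closed code base
merged_model = peft_model.merge_and_unload()
torch.save(merged_model.state_dict(), "artifacts/merged_weights.pt")
```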
You would probably have to design conversion methods from sktime pandas DataFrames into Hugging Face's DatasetDicts if the design method involves wrapping around

edit: now that we have our own Hugging Face org, why not try to upload an open source dataset onto it and see how that works? 😆
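A minimal sketch of what such a conversion method could look like, assuming the usual sktime panel layout (MultiIndex of instance and time); `panel_to_dataset_dict` and the column names are hypothetical, not existing sktime code:

```python
import pandas as pd
from datasets import Dataset, DatasetDict

def panel_to_dataset_dict(y: pd.DataFrame) -> DatasetDict:
    """Convert a sktime panel DataFrame into a Hugging Face DatasetDict (illustrative only).

    Each instance (first index level) becomes one row holding its full target
    sequence, roughly the layout HF time series collators expect.
    """
    records = [
        {"item_id": str(instance), "target": series.iloc[:, 0].to_list()}
        for instance, series in y.groupby(level=0)
    ]
    return DatasetDict({"train": Dataset.from_list(records)})
```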
The underlying model has to implement it too, e.g., each of the foundation models. Thus, I would aim for building a proper interface in the DL-based forecasters to get the dataset. We have to adapt them anyway to get the underlying neural network. This evening, I will try to post an extended interface design, including also the interface for models that can be passed to the PEFTForecaster.
Yes, that's where I'm pointing to. Currently, what I see in sktime is that models do preprocessing in the fit method, maybe the
Design issue for consolidating thoughts on how to map weight handling and fine-tuning for FM on the forecaster interface.
I've summarized the conceptual model involving fitting and fine-tuning, and the various interface points that we need to match:
Key observations:
The above is for foundation models to which the vendors do not give the user access to the original training algorithm or corpus.