[ENH] Extend HFTransformersForecaster for PEFT methods #6457
base: main
Conversation
I'm a bit surprised that a lot of the configs are hard-coded, e.g., `r`, `lora_alpha`, etc. I would suggest allowing the user to pass the parameters either as a `peft_config` or similar, or directly in `fit_strategy`.
@@ -40,7 +43,8 @@ class HFTransformersForecaster(BaseForecaster):
        Path to the huggingface model to use for forecasting. Currently,
        Informer, Autoformer, and TimeSeriesTransformer are supported.
    fit_strategy : str, default="minimal"
        Strategy to use for fitting the model. Can be "minimal" or "full"
I think it needs to be much clearer to the user how the fine-tuning is done here - or that `fit`, in fact, is fine-tuning.
I think it should also be possible to fine-tune the model with time series that are later not used in the forecast - but that might require the global forecasting interface to be in place.
That will not be an issue, because once the model has been configured to use PEFT in train, it will remain the same in predict as well.
Thank you for your contribution. I have some requests regarding the configuration of the different testing strategies. Additionally, I would like to know why the new test file is required.
@@ -227,6 +231,31 @@ def _fit(self, y, X, fh):
        elif self.fit_strategy == "full":
            for param in self.model.parameters():
                param.requires_grad = True
        elif self.fit_strategy == "lora":
            peft_config = LoraConfig(
                r=8,
The user would probably like to have control over the peft configs, i.e., by providing a dict of parameters that is passed to the LoraConfig:
LoraConfig(**self.peft_config_dict)
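A minimal sketch of this suggestion, using plain dicts so it runs without `peft` installed; the variable names and default values here are illustrative, and in the actual forecaster the last step would be `LoraConfig(**config_kwargs)`:

```python
# Hard-coded defaults the forecaster could fall back to (illustrative values)
defaults = {"r": 8, "lora_alpha": 32, "target_modules": ["q_proj", "v_proj"]}

# Hypothetical user-supplied overrides, e.g. a peft_config_dict parameter
user_config = {"r": 16}

# User-provided values win over the defaults; the merged dict would then
# be unpacked into the peft config class:
config_kwargs = {**defaults, **user_config}
# peft_config = LoraConfig(**config_kwargs)  # requires `pip install peft`
```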
Sure, makes sense. Two things:
- no need to give any default value to the param `peft_config_dict`, right?
- should I also create an example with peft in the docstring?
            )
            self.model = get_peft_model(self.model, peft_config)
        elif self.fit_strategy == "loha":
            peft_config = LoHaConfig(
See comment above for lora.
            )
            self.model = get_peft_model(self.model, peft_config)
        elif self.fit_strategy == "adalora":
            peft_config = AdaLoraConfig(
See comment above for Lora
Why is this test file required?
I suppose that these tests are covered by the automated tests that are triggered with the params set via `get_test_params`. Thus, I propose to add tests for the different fit strategies in `get_test_params`.
I just thought `get_test_params` returning 5 sets of parameters was too much, because `check_estimator` was taking too long to execute - I'll put them in `get_test_params` now.
you can also try to find parameter sets with shorter fit or inference time, while maintaining coverage - is that possible?
    validation_split : float, default=0.2
        Fraction of the data to use for validation
    config : dict, default={}
        Configuration to use for the model. See the `transformers`
        documentation for details.
    peft_config_dict : dict, default={}
        Configuration dictionary specifying parameters and settings relevant to
please explain possible fields, or make them available directly
I created an example in the docstring for that. Instead, I think I should link the peft configuration documentation here.
By "available directly", did you mean to set the default parameters to something other than {}? But the default one will not work for all peft methods, as they take different params.
by "available directly" did you mean to set the default parameters to something other than {}? But the default one will not work for all peft methods, as they take different params.
Yes, I meant as args. How bad is the variability in arguments? One option you could take is to always use the union, and ignore arguments that are not applicable.
More quantitatively, how many config args are there in total, and how many are common?
Instead I think I should link here the peft configuration documentation
That may be a good idea. Also, are there any common config sets that we could hardcode by a string?
First, I don't think we can have params with a default value of a dictionary object - flake8 B006: "Do not use mutable data structures for argument defaults".
More quantitatively, how many config args are there in total, how many are common?
Mostly 4-5
are there any common config sets that we could hardcode by a string?
we can use these - https://github.com/sktime/sktime/blob/44580748a82c2c4139c6e79a05a5663b2949a9d0/sktime/forecasting/hf_transformers_forecaster.py#L234C1-L258C65
The only important param for `peft_config` is `target_modules`, which has to be a list of strings. The rest of the params can be set by default.
I suggest that we only make 2 changes here:
- add a link to the peft documentation
- set the default value of `peft_config_dict` to `{"target_modules": ["q_proj", "v_proj"]}` instead of `{}`, so the user can run the code without any peft_config_dict argument and can also mutate it through the argument
- I would suggest adding the link, but also documenting the most important params. You say there are only 4-5?
- Yes, default settings in estimators should always run, and that should be tested by `get_test_params`.

Just following up on my query above:
- how many parameters - inside the configs - are there in total?
- how many are shared for different choices of PEFT method?
First I don't think we can have params with a default value of dictionary object

Yes, one should never use mutable defaults, as they can lead to hard-to-diagnose errors. But that does not mean you cannot have defaults for mutable parameters at all.
The way you set a mutable default is: you set the default to `None`, and you conditionally write to a private parameter, e.g., `self._my_config`, depending on the public value of `self.my_config`.
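The pattern described above can be sketched as follows; the class and attribute names are illustrative, not sktime's actual code:

```python
class MyEstimator:
    """Illustrative estimator using the None-default pattern for a mutable param."""

    def __init__(self, my_config=None):
        # store the public parameter unmodified (sktime convention:
        # __init__ params are written to self as-is)
        self.my_config = my_config
        # write the effective value to a private attribute, so the mutable
        # default dict is created fresh for each instance
        if my_config is None:
            self._my_config = {"target_modules": ["q_proj", "v_proj"]}
        else:
            self._my_config = my_config

est = MyEstimator()
# est.my_config stays None; est._my_config holds the effective default dict
```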
how many parameters - inside the configs - are there in total?
how many are shared for different choices of PEFT method?
15+ in total, 4-5 used commonly, 2 of which are common to all configs
Hm, I would have a slight tendency towards making the common ones explicit, and passing the rest as a dict.
In the current implementation, the common parameter is provided as a default, so the code runs when parameters are set to default, and users can provide additional params specific to a peft strategy through `peft_config_dict`.
    validation_split : float, default=0.2
        Fraction of the data to use for validation
    config : dict, default={}
        Configuration to use for the model. See the `transformers`
        documentation for details.
    peft_config_dict : dict, default={"target_modules": ["q_proj", "v_proj"]}
Sorry for coming up with this so late. Would it be possible to just pass the LoraConfig/LoHaConfig object from HuggingFace directly? In the code, the only adaptation would be `get_peft_model(model, peft_config)`.
This should significantly reduce the maintenance workload for us, since all new adaptation methods would be directly available if HF releases a new peft library version.
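A runnable sketch of this design, with stub classes standing in for the HF model and for `peft.get_peft_model` so it runs without the peft dependency; only the final `get_peft_model(model, peft_config)` call shape reflects the real API, the rest is illustrative:

```python
def get_peft_model(model, peft_config):
    # stand-in for peft.get_peft_model: wraps the model with the adapter config
    model.active_peft_config = peft_config
    return model

class StubModel:
    """Stand-in for a HF time series transformer."""

class StubForecaster:
    """Illustrative forecaster that accepts any pre-built peft config object."""

    def __init__(self, fit_strategy="minimal", peft_config=None):
        self.fit_strategy = fit_strategy
        self.peft_config = peft_config
        self.model = StubModel()

    def fit(self):
        if self.fit_strategy == "peft":
            # the only peft-specific line the forecaster needs; any config
            # type (LoraConfig, LoHaConfig, ...) passes through unchanged,
            # so new peft methods require no forecaster changes
            self.model = get_peft_model(self.model, self.peft_config)
        elif self.fit_strategy not in ("minimal", "full"):
            raise ValueError(f"unknown fit_strategy: {self.fit_strategy}")
        return self

f = StubForecaster(fit_strategy="peft", peft_config={"peft_type": "LORA", "r": 8})
f.fit()
```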
I have made the changes.
- Just to confirm: we are backing out from this original suggestion - so we have `peft_config` as a param instead of `peft_config_dict` - but the user still has control, so it should not be an issue:
  "The user would probably like to have control over the peft configs, i.e., by providing a dict of parameters that is passed to the LoraConfig."
- `fit_strategy` can now be one of `["minimal", "full", "peft"]` instead of `["minimal", "full", "lora", "loha", "adalora"]`
Tests fail - please ensure there are no syntax errors in the code when requesting a review.
Btw, I am noticing that tests take very long; the estimator by itself seems to be adding 10 min? Is there a way to choose a parameter set that makes the tests faster? Think smaller data set etc.
@fkiraly we are already running a single epoch, but I'll see if it can be made faster with some other batch size or config.
(2) and (3) can be merged to make one test case - would you agree with that? And some test cases are failing without any log, and it seems to be happening only with the macOS environment (not all macOS environments).
In the latest workflow, test cases are under 10 minutes. Do we still need to work on them? I have tried runs with different parameters, noting duration, with these results:
- param 1: testing informer
- param 2: testing autoformer
- param 3: testing peft
- suggested: testing peft with smallest config

@fkiraly @benHeid Should I replace "param 3" with the "suggested param"?
Reference Issues/PRs
Fixes #6435.
What does this implement/fix? Explain your changes.
This PR extends the `fit_strategy` of `HFTransformersForecaster` for these PEFT methods.

Does your contribution introduce a new dependency? If yes, which one?

Yes, it introduces `peft`.

Did you add any tests for the change?

Yes, I have added the tests for the `fit_strategy` param in `test_hf_transformers_forecaster.py`.
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the `sktime` root directory (not the `CONTRIBUTORS.md`). Common badges:
- `code` - fixing a bug, or adding code logic.
- `doc` - writing or improving documentation or docstrings.
- `bug` - reporting or diagnosing a bug (get this plus `code` if you also fixed the bug in the PR).
- `maintenance` - CI, test framework, release.

See here for full badge reference.

`maintainers` tag - do this if you want to become the owner or maintainer of an estimator you added. See here for further details on the algorithm maintainer role.