[ENH] Extend HFTransformersForecaster for PEFT methods #6457

Open
geetu040 wants to merge 13 commits into base: main

Conversation

geetu040 (Contributor)

Reference Issues/PRs

Fixes #6435.

What does this implement/fix? Explain your changes.

This PR extends the fit_strategy of HFTransformersForecaster to support the following PEFT methods:

  1. LoRA
  2. LoHa
  3. AdaLoRA

Does your contribution introduce a new dependency? If yes, which one?

Yes, it introduces peft.

Did you add any tests for the change?

Yes, I have added tests for the fit_strategy param in test_hf_transformers_forecaster.py.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not CONTRIBUTORS.md). Common badges: code (fixing a bug or adding code logic), doc (writing or improving documentation or docstrings), bug (reporting or diagnosing a bug; get this plus code if you also fixed the bug in the PR), maintenance (CI, test framework, release).
    See here for full badge reference
  • Optionally, for added estimators: I've added myself to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

pyproject.toml (outdated review thread, resolved)
fkiraly added the module:forecasting and enhancement labels on May 27, 2024
fkiraly (Collaborator) left a comment:

I'm a bit surprised that a lot of the configs are hard-coded.

E.g., r, lora_alpha, etc. I would suggest allowing the user to pass the parameters either as peft_config or similar, or directly in fit_strategy.

@@ -40,7 +43,8 @@ class HFTransformersForecaster(BaseForecaster):
        Path to the huggingface model to use for forecasting. Currently,
        Informer, Autoformer, and TimeSeriesTransformer are supported.
    fit_strategy : str, default="minimal"
        Strategy to use for fitting the model. Can be "minimal" or "full"
Collaborator:

I think it needs to be much clearer to the user how the fine-tuning is done here - or that fit, in fact, is fine-tuning.

Collaborator:

I think it should also be possible to fine-tune the model with time series that are later not used in the forecast - but that might require the global forecasting interface to be in place.

Contributor (Author):

That will not be an issue, because once the model has been configured to use PEFT in train, it will remain the same in predict as well.

benHeid (Contributor) left a comment:

Thank you for your contribution. I have some requests regarding the configuration of the different testing strategies. Additionally, I would like to know why the new test file is required.

@@ -227,6 +231,31 @@ def _fit(self, y, X, fh):
        elif self.fit_strategy == "full":
            for param in self.model.parameters():
                param.requires_grad = True
        elif self.fit_strategy == "lora":
            peft_config = LoraConfig(
                r=8,
Contributor:

The user would probably like to have control over the peft_configs, i.e., by providing a dict of parameters that is passed to the LoraConfig.

LoraConfig(**self.peft_config_dict)
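
For illustration, a minimal sketch of this suggestion, assuming a user-supplied dict named peft_config_dict (the name used later in this thread); the helper name apply_lora is hypothetical and this is not the code merged in the PR:

    from peft import LoraConfig, get_peft_model

    def apply_lora(model, peft_config_dict):
        # hypothetical helper: unpack the user-supplied dict, e.g.
        # {"r": 8, "lora_alpha": 32, "target_modules": ["q_proj", "v_proj"]},
        # into the peft config object and wrap the model with it
        peft_config = LoraConfig(**peft_config_dict)
        return get_peft_model(model, peft_config)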

Contributor (Author):

Sure, makes sense. Two things:

  1. There is no need to give any default value to the peft_config_dict param, right?
  2. Should I also create an example with PEFT in the docstring?

            )
            self.model = get_peft_model(self.model, peft_config)
        elif self.fit_strategy == "loha":
            peft_config = LoHaConfig(
Contributor:

See comment above for LoRA.

            )
            self.model = get_peft_model(self.model, peft_config)
        elif self.fit_strategy == "adalora":
            peft_config = AdaLoraConfig(
Contributor:

See comment above for LoRA.

Contributor:

Why is this test file required?

I suppose that these tests are covered by the automated tests that are triggered with the params set via get_test_params. Thus, I propose to add tests for the different fit strategies in get_test_params.
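
For context, a hedged sketch of what moving the fit strategies into get_test_params could look like; the class name is a placeholder and the constructor arguments and values are illustrative assumptions, not the parameter sets finally used in this PR. The model paths are the ones mentioned later in this thread.

    class HFTransformersForecasterSketch:
        # illustration only; not the estimator class changed in this PR

        @classmethod
        def get_test_params(cls, parameter_set="default"):
            # each dict is one set of constructor kwargs exercised by the
            # automated estimator checks
            return [
                # small pretrained model with the default fine-tuning strategy
                {
                    "model_path": "huggingface/informer-tourism-monthly",
                    "fit_strategy": "minimal",
                },
                # second model, exercising the new PEFT code path
                {
                    "model_path": "huggingface/autoformer-tourism-monthly",
                    "fit_strategy": "lora",
                },
            ]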

Contributor (Author):

I just thought get_test_params returning 5 sets of parameters was too much, because check_estimator was taking too long to execute - I'll put them in get_test_params now.

Collaborator:

You can also try to find parameter sets with shorter fit or inference time while maintaining coverage - is that possible?


    validation_split : float, default=0.2
        Fraction of the data to use for validation
    config : dict, default={}
        Configuration to use for the model. See the `transformers`
        documentation for details.
    peft_config_dict : dict, default={}
        Configuration dictionary specifying parameters and settings relevant to
Collaborator:

Please explain the possible fields, or make them available directly.

Contributor (Author):

I created an example in the docstring for that. Instead, I think I should link the peft configuration documentation here.

By "available directly", did you mean to set the default parameters to something other than {}? But the default one will not work for all PEFT methods, as they take different params.

Collaborator:

by "available directly" did you mean to set the default parameters to something other that {}? but the default one will not work for all peft methods as they take different params.

Yes, I meant as args. How bad is the variability in arguments? One option that you can take is always to take the union, and ignore arguments that are not applicable.

More quantitatively, how many config args are there in total, how many are common?

Instead I think I should link here the peft configuration documentation

That may be a good idea. Also, are there any common config sets that we could hardcode by a string?

Contributor (Author):

First, I don't think we can have params with a default value of a dictionary object - flake8 B006: Do not use mutable data structures for argument defaults.

More quantitatively: how many config args are there in total, and how many are common?

Mostly 4-5.

Are there any common config sets that we could hardcode by a string?

We can use these - https://github.com/sktime/sktime/blob/44580748a82c2c4139c6e79a05a5663b2949a9d0/sktime/forecasting/hf_transformers_forecaster.py#L234C1-L258C65

Contributor (Author):

The only important param for peft_config is target_modules, which has to be a list of strings. The rest of the params can be set by default.
I suggest that we only make 2 changes here:

  1. add a link to the peft documentation
  2. set the default value of peft_config_dict to {"target_modules": ["q_proj", "v_proj"]} instead of {}, so the user can run the code without any peft_config_dict argument and can also override it through the argument

fkiraly (Collaborator), Jun 6, 2024:

  1. I would suggest adding the link, but also documenting the most important params. You say there are only 4-5?

  2. Yes, default settings in estimators should always run, and that should be tested by get_test_params.

Just following up on my query above:

  • how many parameters - inside the configs - are there in total?
  • how many are shared for different choices of PEFT method?

fkiraly (Collaborator), Jun 6, 2024:

First, I don't think we can have params with a default value of a dictionary object

Yes, one should never use mutable defaults, as they can lead to hard-to-diagnose errors. But that does not mean you cannot have defaults for mutable parameters at all.

The way you set a mutable default is: you set the default to None, and you conditionally write to a private attribute, e.g., self._my_config, depending on the public value of self.my_config.
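
A minimal sketch of that pattern, using hypothetical names (ExampleForecaster, peft_config_dict) and the target_modules default discussed above:

    class ExampleForecaster:
        # illustrates the None-default pattern; not the actual estimator

        def __init__(self, peft_config_dict=None):
            # keep the public parameter exactly as passed in
            self.peft_config_dict = peft_config_dict
            # resolve the effective default into a private attribute, so no
            # mutable object ever appears in the signature
            if peft_config_dict is None:
                self._peft_config_dict = {"target_modules": ["q_proj", "v_proj"]}
            else:
                self._peft_config_dict = peft_config_dict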

Contributor (Author):

how many parameters - inside the configs - are there in total?
how many are shared for different choices of PEFT method?

15+ in total, 4-5 used commonly, 2 of which are common to all configs

Collaborator:

Hm, I would have a slight tendency towards making the common ones explicit, and passing the rest as a dict.

Contributor (Author):

In the current implementation, the common parameter is provided as a default, so the code runs when parameters are left at their defaults, and users can provide additional params specific to a PEFT strategy through peft_config_dict.

geetu040 requested reviews from benHeid and fkiraly on June 10, 2024
    validation_split : float, default=0.2
        Fraction of the data to use for validation
    config : dict, default={}
        Configuration to use for the model. See the `transformers`
        documentation for details.
    peft_config_dict : dict, default={"target_modules": ["q_proj", "v_proj"]}
Contributor:

Sorry for coming up with this so late. Would it be possible to just pass the LoraConfig/LoHaConfig object from HuggingFace directly? In the code, the only adaptation would be get_peft_model(model, peft_config).

This should significantly reduce the maintenance workload for us, since all new adaptation methods would be directly available if HF releases a new peft library version.
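
A hedged sketch of what this could look like from the user side, assuming a peft_config parameter and a "peft" fit strategy as discussed below; LoraConfig and get_peft_model are the standard peft API, while the constructor arguments shown here are illustrative rather than the merged signature:

    from peft import LoraConfig
    from sktime.forecasting.hf_transformers_forecaster import HFTransformersForecaster

    # the user builds any peft config object directly ...
    peft_config = LoraConfig(r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"])

    # ... and hands it to the forecaster; inside _fit the only peft-specific
    # step is then get_peft_model(self.model, peft_config)
    forecaster = HFTransformersForecaster(
        model_path="huggingface/informer-tourism-monthly",
        fit_strategy="peft",
        peft_config=peft_config,
    )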

Contributor (Author):

I have made the changes.

  1. Just to confirm, we are backing out from this original suggestion - we now have peft_config as the param instead of peft_config_dict - but the user still has control, so it should not be an issue:

#6457 (comment)

The user would probably like to have control over the peft_configs, i.e., by providing a dict of parameters that is passed to the LoraConfig.

  2. fit_strategy can now be one of ["minimal", "full", "peft"] instead of ["minimal", "full", "lora", "loha", "adalora"]

fkiraly (Collaborator) commented on Jun 13, 2024:

Tests fail - please ensure there are no syntax errors in the code when requesting a review.

fkiraly (Collaborator) commented on Jun 14, 2024:

Btw, I am noticing that the tests take very long; the estimator by itself seems to be adding 10 minutes? Is there a way to choose a parameter set that makes the tests faster? Think smaller data set, etc.

geetu040 (Contributor, Author):

Btw, I am noticing that the tests take very long; the estimator by itself seems to be adding 10 minutes? Is there a way to choose a parameter set that makes the tests faster? Think smaller data set, etc.

@fkiraly we are already running a single epoch, but I'll see if it can be made faster with a different batch size or config.
There are 3 test params:

  1. for testing huggingface/informer-tourism-monthly model
  2. for testing huggingface/autoformer-tourism-monthly model
  3. for testing peft fit strategy

(2) and (3) can be merged into one test case. Would you agree with that?

Also, some test cases are failing without any log, and this seems to be happening only in the macOS environment (though not all macOS environments).
Do you have any suggestions on that?

geetu040 (Contributor, Author):

In the latest workflow, test cases are under 10 minutes. Do we still need to work on them?

I have also tried runs with different parameters, noting the durations, with these results:

param 1: testing informer

1.48 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

param 2: testing autoformer

1.54 s ± 151 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

param 3: testing peft {r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"]}

1.62 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

suggested: testing peft with smallest config {r=2, lora_alpha=8, target_modules=["q_proj"]}

1.49 s ± 135 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@fkiraly @benHeid Should I replace "param 3" with the "suggested" param?
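
The timings quoted above are in IPython %timeit format. A sketch of how such a comparison could be reproduced locally, where the dataset, forecasting horizon, and constructor arguments are assumptions for illustration only:

    from sktime.datasets import load_airline
    from sktime.forecasting.hf_transformers_forecaster import HFTransformersForecaster

    y = load_airline()

    # constructor arguments are illustrative; swap in the "param 3" or
    # "suggested" settings above to compare the peft configurations
    forecaster = HFTransformersForecaster(
        model_path="huggingface/informer-tourism-monthly",
        fit_strategy="minimal",
    )

    # in an IPython session, matching "mean ± std. dev. of 7 runs, 1 loop each":
    # %timeit -n 1 -r 7 forecaster.clone().fit(y, fh=[1, 2, 3])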

Labels
enhancement (Adding new functionality), module:forecasting (forecasting module: forecasting, incl. probabilistic and hierarchical forecasting)
Projects
Status: Under review
Development

Successfully merging this pull request may close these issues.

[ENH] Add fine tuning methods using PEFT for HFTransformersForecaster
4 participants