Integrate adapter for s3prl frontend #5609
Conversation
Cool!
For the LoRA case, I didn't modify any code except for the entry point to the create_lora_adapter function. Did I make it clear?
Cool. @ftshijt, following the convention, it would be better to have the result and model link.
Thanks for the updates! The current implementation seems cool to me, but I would like to hold more discussions on some design points.
espnet2/train/trainer.py (Outdated)
```python
if adapter == "lora" and lora is None:
    raise RuntimeError("Requiring loralib. Do 'pip install loralib'")

# TODO: houlsby adapter may need S3PRL?
```
Is it possible to make it general to other modules as well? If that's difficult, we may keep it s3prl-only for now.
In that case, you may consider adding a check to ensure that s3prl is installed when using the houlsby adapter.
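A minimal sketch of such a check, mirroring the existing loralib guard in `trainer.py`. The function name and error message here are illustrative, not the actual ESPnet code:

```python
# Guarded import, following the pattern used for loralib in this PR.
try:
    import s3prl  # noqa: F401
except ImportError:
    s3prl = None


def check_adapter_requirements(adapter: str) -> None:
    # Illustrative helper: fail early with an actionable message
    # when the requested adapter's dependency is not installed.
    if adapter == "houlsby" and s3prl is None:
        raise RuntimeError("Requiring s3prl. Do 'pip install s3prl'")
```

Raising at configuration time keeps the failure close to the cause, instead of a confusing `NoneType` error deep inside model construction.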
Is this update compatible with pretrained models whose configs use `use_lora` and `save_lora_only`?
Sorry for the late reply.
For @ftshijt's question: I think it is currently difficult to integrate adapters and LoRA together. The reason is that the adapter implementation needs to modify the forward function of the SSL models, like here, while LoRA does not. Therefore, it may require some effort to integrate them together.
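To illustrate the difference: a Houlsby adapter is a bottleneck module that must be called inside the transformer layer's forward pass, whereas LoRA only reparameterizes existing linear weights. A minimal sketch (module name and sizes are illustrative, not the PR's implementation):

```python
import torch
import torch.nn as nn


class HoulsbyAdapter(nn.Module):
    # Bottleneck adapter: down-project, nonlinearity, up-project,
    # added residually to the hidden states of a transformer layer.
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


# Unlike LoRA, this module has to be invoked from inside the SSL
# layer's forward(), which is why that forward must be overridden.
```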
Sorry for the late reply.
@simpleoier I think it's compatible, but it needs some revision.
I did it by renaming `use_lora` to `use_adapter` (more general), and the same for `save_lora_only`.
The reason is that in most cases, we add LoRA only on the pre-trained model. However, in SSL settings, most of the time we need to initialize a downstream model for each task. That is to say, when applying LoRA to SSL models as adapters, we need to save not only the LoRA parameters but also the downstream model and other tunable parameters. Therefore, I chose to save all parameters with `requires_grad = True`.
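As a concrete illustration of that saving strategy (the function name is mine, not the ESPnet API), the checkpoint can be filtered down to the trainable parameters:

```python
import torch


def trainable_state_dict(model: torch.nn.Module) -> dict:
    # Keep only parameters with requires_grad=True: the LoRA/adapter
    # weights, the downstream model, and other tunable parameters
    # (e.g. the weighted-sum weights over SSL layers).
    trainable = {n for n, p in model.named_parameters() if p.requires_grad}
    return {k: v for k, v in model.state_dict().items() if k in trainable}
```

With a frozen "pretrained" part and a trainable head, only the head's weights end up in the saved dict, which is exactly the behavior being discussed.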
I made some revisions here.
Please check d7f0b39
espnet2/train/trainer.py (Outdated)
```python
# TODO: This kind of design may not facilitate the integration of other kinds of adapters
# Maybe we can only save those with requires_grad = True ?
# If we use lora.lora_state_dict, we will not save the downstream model in SSL settings
```
I do not have a clear answer to this one, but I feel that only saving parameters with `requires_grad = True` is a bit risky.
Ping @sw005320 @wanchichen @simpleoier @pyf98 for some discussion here.
I think so. One case is that it can fail if the pretrained SSL models are updated.
For that purpose, I would suggest we simply follow the setting with the current design.
The reason why I chose to save all parameters with `requires_grad = True` is that in the adapter setting, we usually need to save the downstream model or other tunable parameters as well (e.g., the weighted sum). Therefore, we may not be able to follow what loralib does, which only saves the LoRA layers.
I understand your concern about the current design. However, in the case where the SSL models are updated, I think it would be better to save the SSL weights as well (I cannot see why we would update them without saving them).
If we have to follow loralib by offering an option that saves the adapter parameters only, then I would add another option for saving parameters with `requires_grad = True`. In this case, we can fulfill the requirement while also making sure adapters and LoRA work in s3prl settings.
Do you think it's a good idea?
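One hypothetical way to offer both behaviors side by side; the option names and the key-matching heuristic below are made up for illustration, not the PR's actual config:

```python
import torch


def state_dict_to_save(model: torch.nn.Module, strategy: str = "trainable") -> dict:
    sd = model.state_dict()
    if strategy == "adapter_only":
        # loralib-style: keep only the adapter/LoRA layers themselves
        # (key matching is a naive illustration)
        return {k: v for k, v in sd.items() if "lora_" in k or "adapter" in k}
    if strategy == "trainable":
        # keep everything with requires_grad=True, including the
        # downstream model that SSL settings need
        keep = {n for n, p in model.named_parameters() if p.requires_grad}
        return {k: v for k, v in sd.items() if k in keep}
    return sd  # "all": the full checkpoint
```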
I made some revisions here.
Please check d7f0b39
This pull request is now in conflict :(
Could you fix the conflicts and update the pre-trained model in the README section? Then we can move forward with merging, as this is a dependency for some other projects~
This pull request is now in conflict :(
Please fix the above conflict.
@@ -96,6 +96,66 @@ General steps to run tasks in the LID track are as follows:
```
./run_multi.sh --asr_config <your_training_config> --duration {10min, 1h} --lid true --only_lid false
```
## Adapter usage guidelines
Would it be better to note that the challenge does not allow the fine-tuning of SSL models?
@ftshijt, what do you think?
```python
    HoulsbyTransformerSentenceEncoderLayer = None
else:

    class HoulsbyTransformerSentenceEncoderLayer(TransformerSentenceEncoderLayer):
```
It seems that this class does not go through the test.
Can you check it?
Could you check this now?
for more information, see https://pre-commit.ci
It seems like the CI error happened for some odd reason. Is it caused by my PR?
It's not related to your PR.
@ftshijt, do you know why codecov complains about this? If this is due to some issue in codecov, we can ignore it and merge this PR.
It's not exactly from codecov. I think it is mainly due to unsuccessful runs of other CI tests (i.e., the timeout for VITS decoding), which automatically stopped some running CIs, resulting in no execution of the test functions from Stan.
Thanks, @Stanwang1210! |
What?
This PR adds support for adapters in the s3prl frontend SSL models. I integrated it with the original LoRA implementation.
Currently, it only supports the Houlsby adapter. However, we can easily integrate other types of adapters soon.
Two config YAML files give examples of the adapter configuration.
TODO: Haven't finished the adapter load/save functions. Will handle them soon.
Why?
The original LoRA implementation is not compatible with other kinds of adapters.
See also
#5034
@ftshijt