
Peft sentiment #1335

Merged 8 commits into dev on Jan 27, 2024
Conversation

AngledLuffa
Collaborator

Add a PEFT wrapper for the Sentiment training.

Works quite well on English, actually, even without splitting the optimizer or implementing any form of scheduling.
With no finetuning, adding electra-large to the 3-class English dataset (SST plus a few other pieces) gets 70 macro F1.
Base finetuning gets between 74 and 75 macro F1 on sstplus, but frequently fails to train successfully, landing somewhere around 60 F1.
Training with PEFT gets into the 74-75 F1 range each time, with no failures observed so far.

Adds a test to the sentiment training which starts a Pipeline with a PEFT-trained model.

Also included is a uses-charlm flag in the config, so that inadvertently passing a charlm (such as via Pipeline) to a sentiment model doesn't blow up if the model was trained without a charlm.
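The backward-compatibility idea behind the uses-charlm flag can be sketched as follows. This is a hypothetical illustration, not Stanza's actual code; the function name, field name, and checkpoint layout are assumptions. The key point is that a checkpoint saved before the flag existed should default to "no charlm", so a Pipeline that passes one in can safely skip it.

```python
# Hypothetical sketch: default a missing uses_charlm flag to False when
# loading an older checkpoint that predates the flag.

def load_classifier_config(checkpoint: dict) -> dict:
    config = dict(checkpoint.get("config", {}))
    # Older models were saved without this field; treat a missing value
    # as "trained without a charlm" so a charlm handed in at Pipeline
    # time is ignored instead of crashing the model.
    config.setdefault("uses_charlm", False)
    return config

old_checkpoint = {"config": {"model_type": "cnn"}}                      # pre-flag save
new_checkpoint = {"config": {"model_type": "cnn", "uses_charlm": True}}  # post-flag save

print(load_classifier_config(old_checkpoint)["uses_charlm"])  # False
print(load_classifier_config(new_checkpoint)["uses_charlm"])  # True
```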

…arlm. This is hard to diagnose for the models which were not previously saved with this information
Member

@Jemoka left a comment


Just some clarification questions. See comments below; thanks in advance!

target_modules=["query", "value", "output.dense", "intermediate.dense"], # self.config.lora_targets,
lora_alpha=128, #self.config.lora_alpha,
lora_dropout=0.1, #self.config.lora_dropout,
modules_to_save=["pooler"], # self.config.lora_fully_tune,
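For context on what the hyperparameters in the snippet above control, here is a small numpy sketch of the LoRA update itself. This is not Stanza or peft code, just the underlying arithmetic: a frozen weight W receives a low-rank delta B @ A scaled by lora_alpha / r, and only A and B are trained (the matrix sizes here are toy values).

```python
import numpy as np

# Toy illustration of the LoRA update: W' = W + (lora_alpha / r) * B @ A,
# where only the small A and B matrices are trained.
rng = np.random.default_rng(0)

d, r, lora_alpha = 8, 2, 128          # hidden size, rank, scaling factor
W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01    # trained down-projection
B = np.zeros((d, r))                  # trained up-projection, init to zero

delta = (lora_alpha / r) * (B @ A)
W_adapted = W + delta

# With B initialized to zero, the adapted layer starts out identical to
# the pretrained one, so training begins from the base model's behavior.
assert np.allclose(W_adapted, W)
```

The target_modules list picks which layers get this treatment, while modules_to_save marks layers (like a pooler) to train fully rather than through a low-rank delta.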
Member


This doesn't exist for all languages' BERTs, and the NER results I reported didn't use it. How much of an improvement/size tradeoff does this confer?

Collaborator Author


Good question. I can run that experiment.

Collaborator Author


The experiments were run with electra, which doesn't have a pooler, so listing it as fully trained makes no difference there. I can rerun the trial with roberta. I remember finetuning the pooler being important for coref, although I don't have those results in front of me.

Collaborator Author


Averaged over 4 runs with roberta-large, I got 0.7391 F1 w/ the pooler and 0.7422 w/o. Fully training the pooler increased the size from 163M to 167M. My conclusion is that we don't train the pooler for sentiment. I can always try pefting it instead of fully training it.

Collaborator Author


Never mind on that; can't peft a pooler.

stanza/models/classifiers/cnn_classifier.py (outdated review thread, resolved)
@AngledLuffa force-pushed the peft_sentiment branch 2 times, most recently from 8e68ffa to 9240f37 on January 26, 2024 09:11
Ran an experiment with 4x models where finetuning the pooler for
roberta-large on sstplus got 0.7391 average F1 whereas not finetuning
got 0.7422 average.  Considering the model size difference (163M ->
167M) it seems not worthwhile to finetune this layer.
@AngledLuffa merged commit 09da9ce into dev on Jan 27, 2024
1 check passed
@AngledLuffa deleted the peft_sentiment branch on January 27, 2024 03:06