Splitting out the optimizer utility + LR scheduler for PoS #1320

Merged
merged 2 commits into dev from split-optim on Dec 14, 2023

Conversation

Jemoka
Member

@Jemoka Jemoka commented Dec 13, 2023

This PR splits the utility optimizer into separate parts for Bert and non-Bert parameters. The change is backwards compatible: get_optimizer is retained for models that have not yet been migrated, while a new get_split_optimizer is available for models that want the optimizer split out.
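
For orientation, here is a minimal sketch of the split shape, assuming AdamW for both groups and the lr * bert_learning_rate scaling shown in the review diff below; the actual get_split_optimizer signature and defaults live in the utility module and may differ.

import torch

def build_split_optimizers(bert_parameters, other_parameters, lr, bert_learning_rate):
    # Hypothetical illustration: one optimizer per parameter group, returned
    # as a dict so the trainer can loop over .values() for zero_grad()/step().
    return {
        "bert_optimizer": torch.optim.AdamW(bert_parameters, lr=lr * bert_learning_rate),
        "general_optimizer": torch.optim.AdamW(other_parameters, lr=lr),
    }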

Further, the optimizer takes an is_peft option, which configures the optimizer to tune .parameters() of the Bert model instead of filtering .named_parameters() by name. This accommodates the way HF PEFT rewrites the parameter list: when the set of trainable weights is constrained by the PEFT library, we tune whatever it tells us to tune rather than selecting parameters by their names.

Taking advantage of the split optimizer, Part of Speech tagging with Bert finetuning now features a learning rate scheduler with a warmup and a linear decay when the user requests that Bert be tuned.
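
As a rough illustration only (not the PR's exact wiring; warmup_steps and total_steps are assumed hyperparameter names), a warmup plus linear decay schedule can be attached to the Bert optimizer like this:

import torch

def warmup_linear_decay(bert_optimizer, warmup_steps, total_steps):
    # Ramp the Bert learning rate linearly from 0 up to its configured value
    # over warmup_steps, then decay it linearly back to 0 by total_steps.
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return torch.optim.lr_scheduler.LambdaLR(bert_optimizer, lr_lambda)

The scheduler's step() would then be called once per training step, alongside the optimizer steps.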

bert_parameters = [p for n, p in model.named_parameters() if p.requires_grad and n.startswith("bert_model.")]
bert_parameters = [{'param_group_name': 'bert', 'params': bert_parameters, 'lr': lr * bert_learning_rate}]
else:
# because PEFT handles what to hand to an optimizer, we don't want to touch that
Collaborator


What's the difference between the blocks? I would expect that named_parameters would include the parameters under model.bert_model, with bert_model. as the prefix of the name.

Member Author

@Jemoka Jemoka Dec 13, 2023


This was a typing mistake; it should be .parameters(). It was an oversight when applying patches. PEFT does weird shenanigans with the parameter list (it works with .default.lora_a and similar names), which shadows the original parameters so they may not match a specific name filter. This has been corrected in c72a3b9, which sets the second case to use .parameters().
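
For context, a minimal sketch (a hypothetical helper, not the PR's code) of the two selection strategies being contrasted: the name filter used in the non-PEFT path versus handing .parameters() to the optimizer so PEFT controls what is trainable.

def select_bert_trainable(model, is_peft=False):
    # Hypothetical helper illustrating the two strategies discussed above.
    if is_peft:
        # PEFT decides which weights are trainable (e.g. adapter weights whose
        # names differ from the original bert_model.* names), so take whatever
        # the Bert submodule reports as requiring gradients.
        return [p for p in model.bert_model.parameters() if p.requires_grad]
    # Otherwise select Bert weights by their name prefix, as in the diff above.
    return [p for n, p in model.named_parameters()
            if p.requires_grad and n.startswith("bert_model.")]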

@@ -48,7 +58,7 @@ def update(self, batch, eval=False):
self.model.eval()
else:
self.model.train()
self.optimizer.zero_grad()
[i.zero_grad() for i in self.optimizers.values()]
Collaborator


Doesn't need to go in a list, I would think.

Member Author


Fixed in c72a3b9. Apologies.

@@ -59,7 +69,11 @@ def update(self, batch, eval=False):

loss.backward()
torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.args['max_grad_norm'])
self.optimizer.step()

[i.step() for i in self.optimizers.values()]
Collaborator


Again, maybe just: for optimizer in self.optimizers.values(): optimizer.step()

Member Author


Addressed in c72a3b9.
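
For reference, a toy standalone illustration of the pattern this review converged on (made-up parameters, not the trainer code): the split optimizers live in a dict and are driven with plain for loops rather than throwaway list comprehensions.

import torch

params = {"bert": [torch.nn.Parameter(torch.randn(4))],
          "general": [torch.nn.Parameter(torch.randn(4))]}
optimizers = {name: torch.optim.AdamW(group, lr=1e-3)
              for name, group in params.items()}

for optimizer in optimizers.values():
    optimizer.zero_grad()
# ... forward pass, loss.backward(), and gradient clipping would go here ...
for optimizer in optimizers.values():
    optimizer.step()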

@AngledLuffa
Collaborator

Looks pretty good to me, thanks. Let me try it out (hopefully tonight) before merging.

@Jemoka
Member Author

Jemoka commented Dec 13, 2023

No rush; I will move on to PEFTing NER or depparse after I survive the 109 final tonight; thanks as always!

Jemoka and others added 2 commits December 13, 2023 23:13
…inetuning is doing what we want yet, though)
@AngledLuffa
Collaborator

Thanks! Hopefully this helps PEFT be more effective for depparse or NER, even if it wasn't helping POS yet...

@AngledLuffa AngledLuffa merged commit a168a18 into dev Dec 14, 2023
1 check passed
@AngledLuffa AngledLuffa deleted the split-optim branch December 14, 2023 08:33