feat: automatically select LoRA modules when none are provided #166
mergify[bot] merged 1 commit into instructlab:main
Conversation
resolves #164
src/instructlab/training/utils.py
Outdated
| """ | ||
| Given a pretrained model, returns all of the projection layers (matching '_proj') | ||
| """ | ||
| proj_layers = set(name.split('.')[-1] for name, _ in model.named_modules() if name.endswith("_proj")) |
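As a self-contained illustration of what that comprehension computes, the same name-based selection can be run over a hypothetical list of dotted module names (the names below are illustrative, shaped like a llama-style model, not read from a real checkpoint):

```python
# Hypothetical dotted module names, shaped like what model.named_modules()
# yields for a llama-style model (illustrative only).
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.input_layernorm",
]

# Keep the last path component of every name ending in "_proj",
# deduplicated across layers via a set.
proj_layers = {n.split(".")[-1] for n in names if n.endswith("_proj")}
print(sorted(proj_layers))
# -> ['gate_proj', 'k_proj', 'o_proj', 'q_proj', 'v_proj']
```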
If this is for Llama only it's fine, but in general models do not always have the naming k_proj, v_proj, etc.
Another alternative, if you want to target all the linears, is to check isinstance(mod, torch.nn.Linear).
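A minimal sketch of that alternative, assuming a toy module built just for demonstration (TinyBlock and its layer names are hypothetical, not from the training library):

```python
import torch.nn as nn

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.k_proj = nn.Linear(8, 8)
        self.gate = nn.Linear(8, 8)  # a Linear without the "_proj" suffix
        self.norm = nn.LayerNorm(8)  # not a Linear, so it is skipped

# Collect every nn.Linear by type rather than by name, so layers like
# "gate" are caught even though matching on "_proj" would miss them.
linear_names = sorted(
    name for name, mod in TinyBlock().named_modules()
    if isinstance(mod, nn.Linear)
)
print(linear_names)  # -> ['gate', 'k_proj', 'q_proj']
```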
@fabianlim The models we actively support are listed here:
training/src/instructlab/training/main_ds.py
Line 158 in 9e2ac74
When I looked at their list of layers, all of them had k_proj, q_proj, v_proj, o_proj, so the assumption is true at least for supported models.
You're right though, we could go down the path of targeting all linear layers. I just have two questions about this approach:
- How would this affect the memory requirements?
- What would be the impact on training times? If we are targeting more modules for LoRA, we would potentially be dropping more pretrained weights in favor of our LoRA approximations - how would this impact the loss curve?
If it is as you said, that you are targeting all the proj layers, then it should be equivalent to putting a LoRA adapter on all linears.
@fabianlim Not necessarily. Some models will use Linear layers which are not explicitly labeled as projections. For example, in starcoder-3b, these account for roughly 1.2B parameters.
We should NOT use model-specific names to do LoRA, since model architectures are subject to change, and we might even start supporting too many of them.
e2ab9dd to dd1cb74
aldopareja left a comment
I think this looks alright, just a minor comment that can be ignored for the moment.
)
command.extend(train_args.lora.target_modules)
if train_args.lora.target_modules:
    command.extend(train_args.lora.target_modules)
Should we have this only for Granite models? How about non-Granite models?
Specifying the target modules? It should be fine for all models, since we may want to target different modules depending on what we want them to learn.
This pull request has merge conflicts that must be resolved before it can be merged.
@mergify rebase
☑️ Nothing to do
@RobotSail Is this good to go post-rebase?
@JamesKunstle Yeah it should be. Mergify didn't merge it automatically
In the current version of the training library, we have the default value of target_modules set to
a list of layer names which are implementation-specific and may not reflect what a given model actually
uses for the layer names. Furthermore, the default is also a subset of all projection layers in most models,
and the recommendation is generally to use all of these layers when injecting low-rank adapters.
This commit resolves that issue by introducing logic to automatically resolve the target modules
and default to using all of them when they are not provided. This commit also adds validation logic
which indicates when some of the provided modules do not exist in the model. To go a step further,
the training library will also now error out when none of the provided target modules exist in the model,
supplying the user with additional context on which modules exist and how they could resolve the error.
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
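The validation behavior the commit message describes could be sketched roughly as follows (validate_target_modules, its messages, and the example module set are all hypothetical, not the library's actual implementation):

```python
def validate_target_modules(requested, available):
    """Sketch: keep the requested LoRA target modules that exist in the
    model, warn about the ones that do not, and fail only when nothing
    matches at all."""
    matched = [m for m in requested if m in available]
    missing = [m for m in requested if m not in available]
    if not matched:
        raise ValueError(
            f"none of the requested target modules {requested} exist in the "
            f"model; available modules are: {sorted(available)}"
        )
    if missing:
        print(f"warning: ignoring modules not found in the model: {missing}")
    return matched

# Usage against a hypothetical set of discovered projection layers.
available = {"q_proj", "k_proj", "v_proj", "o_proj"}
print(validate_target_modules(["q_proj", "w_proj"], available))  # -> ['q_proj']
```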