feat: automatically select LoRA modules when none are provided #166
mergify[bot] merged 1 commit into instructlab:main
Conversation
resolves #164
src/instructlab/training/utils.py
Outdated
| """ | ||
| Given a pretrained model, returns all of the projection layers (matching '_proj') | ||
| """ | ||
| proj_layers = set(name.split('.')[-1] for name, _ in model.named_modules() if name.endswith("_proj")) |
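As a self-contained illustration of what that comprehension computes, the same name-based selection can be run over a hypothetical list of dotted module names (the names below are illustrative, shaped like a llama-style model, not read from a real checkpoint):

```python
# Hypothetical dotted module names, shaped like what model.named_modules()
# yields for a llama-style model (illustrative only).
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.input_layernorm",
]

# Keep the last path component of every name ending in "_proj",
# deduplicated across layers via a set.
proj_layers = {n.split(".")[-1] for n in names if n.endswith("_proj")}
print(sorted(proj_layers))
# -> ['gate_proj', 'k_proj', 'o_proj', 'q_proj', 'v_proj']
```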
If this is for Llama only it's fine, but in general models do not always have the naming k_proj, v_proj, etc.
Another alternative, if you want to target all the linears, is to check isinstance(mod, torch.nn.Linear).
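A minimal sketch of that alternative, assuming a toy module built just for demonstration (TinyBlock and its layer names are hypothetical, not from the training library):

```python
import torch.nn as nn

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.k_proj = nn.Linear(8, 8)
        self.gate = nn.Linear(8, 8)  # a Linear without the "_proj" suffix
        self.norm = nn.LayerNorm(8)  # not a Linear, so it is skipped

# Collect every nn.Linear by type rather than by name, so layers like
# "gate" are caught even though matching on "_proj" would miss them.
linear_names = sorted(
    name for name, mod in TinyBlock().named_modules()
    if isinstance(mod, nn.Linear)
)
print(linear_names)  # -> ['gate', 'k_proj', 'q_proj']
```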
@fabianlim The models we actively support are listed here:
training/src/instructlab/training/main_ds.py
Line 158 in 9e2ac74
When I looked at their list of layers, all of them had k_proj, q_proj, v_proj, o_proj, so the assumption is true at least for supported models.
You're right though, we could go down the path of targeting all linear layers. I just have two questions about this approach:
- How would this affect the memory requirements?
- What would be the impact on training times? If we are targeting more modules for LoRA, we would potentially be dropping more pretrained weights in favor of our LoRA approximations - how would this impact the loss curve?
If it is as you said, that you are targeting all the proj layers, then it should be equivalent to putting a LoRA adapter on all linears.
@fabianlim Not necessarily. Some models will use Linear layers which are not explicitly labeled as projections. For example, in starcoder-3b, these account for roughly 1.2B parameters.
We should NOT use model-specific names to do LoRA, since model architectures are subject to change, and we might even start supporting too many of them.
e2ab9dd to dd1cb74
aldopareja left a comment
I think this looks alright, just a minor comment that can be ignored for the moment.
)
command.extend(train_args.lora.target_modules)
if train_args.lora.target_modules:
    command.extend(train_args.lora.target_modules)
Should we have this only for Granite models? How about non-Granite models?
Specifying the target modules? It should be fine for all models, since we may want to target different modules depending on what we want them to learn.
This pull request has merge conflicts that must be resolved before it can be merged.
@mergify rebase
☑️ Nothing to do
@RobotSail Is this good to go post-rebase?
@JamesKunstle Yeah it should be. Mergify didn't merge it automatically
In the current version of the training library, we have the default value of target_modules set to
a list of layer names which are implementation-specific and may not reflect what a given model actually
uses for the layer names. Furthermore, the default is also a subset of all projection layers in most models,
and the recommendation is generally to use all of these layers when injecting low-rank adapters.
This commit resolves that issue by introducing logic to automatically resolve the target modules
and default to using all of them when they are not provided. This commit also adds validation logic
which indicates when some of the provided modules do not exist in the model. To go a step further,
the training library will also now error out when none of the provided target modules exist in the model,
supplying the user with additional context on which modules exist and how they could resolve the error.
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
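The validation behavior the commit message describes could be sketched roughly as follows (validate_target_modules, its messages, and the example module set are all hypothetical, not the library's actual implementation):

```python
def validate_target_modules(requested, available):
    """Sketch: keep the requested LoRA target modules that exist in the
    model, warn about the ones that do not, and fail only when nothing
    matches at all."""
    matched = [m for m in requested if m in available]
    missing = [m for m in requested if m not in available]
    if not matched:
        raise ValueError(
            f"none of the requested target modules {requested} exist in the "
            f"model; available modules are: {sorted(available)}"
        )
    if missing:
        print(f"warning: ignoring modules not found in the model: {missing}")
    return matched

# Usage against a hypothetical set of discovered projection layers.
available = {"q_proj", "k_proj", "v_proj", "o_proj"}
print(validate_target_modules(["q_proj", "w_proj"], available))  # -> ['q_proj']
```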