LoRA config taking target modules from base model #758
I have fine-tuned a BioGPT model using LoRA; the config can be found on the Hub (Lukee4/biogpt-2019).

The issue is that when I try to load the model, I get the error:

ValueError: Target modules ['c_attn'] not found in the base model. Please check the target modules and try again.

I investigated this a little, and the only place I could find c_attn is in the source code for transformers.models.gpt2.modeling_gpt2; there is no c_attn in the source code for biogpt/modeling_biogpt. When I load gpt2 instead of biogpt, everything works fine.
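A minimal sketch of the failing load, reconstructed from the description above (the base checkpoint name microsoft/biogpt is an assumption, since the issue only says biogpt; the adapter repo is the one named in the issue):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the BioGPT base model, then try to attach the LoRA adapter from the Hub.
base = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# This is where the error is raised: the saved adapter_config.json lists
# target_modules=['c_attn'], a module name that BioGPT does not have.
model = PeftModel.from_pretrained(base, "Lukee4/biogpt-2019")
```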
Comments

It seems that the config from the Hub repo (Lukee4/biogpt-2019) lists target_modules=['c_attn'], which is a GPT-2 module name and does not exist in BioGPT.
Hello @BenjaminBossan, in the config I didn't set any target modules; you can check the notebook I used to do the fine-tuning here. I also did not change the code after the LoRA weights were trained. I followed this example to do the fine-tuning; the only difference is the model used.
Hey, sorry, I completely missed that you're the author of the linked model :) I can't access the notebook, but I don't think it matters. What I assume happened here is the following: in PEFT, we try to recognize the architecture of the model and automatically set the adapter layers if the user doesn't set target_modules themselves. For GPT-2, that default is ['c_attn'], so the base model at training time was presumably recognized as GPT-2, and that default was saved into the adapter config.
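To illustrate the mechanism described above: PEFT keeps a per-architecture table of default LoRA targets, keyed by the transformers model_type. A small sketch, assuming a PEFT version where the mapping is importable from peft.utils (the exact import path has varied between releases):

```python
from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

# When LoraConfig has no explicit target_modules, PEFT falls back to this
# table, keyed by the base model's model_type.
print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING["gpt2"])
# ['c_attn']  -- an adapter_config.json containing this value implies the
# base model was treated as GPT-2 when the config was created.
```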
Hey @BenjaminBossan, here is the [link](https://colab.research.google.com/drive/1s7af-1u-LEtXx2iMAw-8Gf0jnFfgsvc0?usp=sharing) to the notebook; hopefully you can access it now. But I don't understand how this could happen: I tried the same code with BioMedLM, which is also a GPT2 model, and it worked fine. How do I decide which target modules I should include in the PeftConfig? I'm sorry, I know this question may not be appropriate here, but I am new to this and didn't expect to run into an issue this advanced.
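For reference, "including target modules in the config" looks like the sketch below. The module names and hyperparameters are illustrative assumptions, not values from the notebook; q_proj and v_proj are the attention projection names in the BioGPT modeling source:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # illustrative hyperparameters
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # explicit, so nothing is inferred
)

model = get_peft_model(base, config)
model.print_trainable_parameters()
```

With target_modules pinned explicitly, the saved adapter_config.json no longer depends on PEFT's automatic architecture detection.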
In the notebook, at one point I see:
but later:
Not sure what happened there.
Not a stupid question at all. In general, people refer to papers that ran the experiments to decide which layers to adapt; linear layers are the prime targets. Here are some hints on how to identify which layers could be potential targets.
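Building on that hint, a small sketch that enumerates the nn.Linear layer names of a base model (using microsoft/biogpt as an assumed checkpoint); any of the printed names are candidates for target_modules:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# Collect the deduplicated leaf names of all linear submodules; LoRA matches
# target_modules against these name suffixes.
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
}
print(sorted(linear_names))
```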