Lora config taking target modules from base model #758

Closed
Luke-4 opened this issue Jul 27, 2023 · 5 comments
Luke-4 commented Jul 27, 2023

I have fine-tuned a BioGPT model using LoRA; the config can be found on the Hub (Lukee4/biogpt-2019).

The issue is that when I try to load the model:

from transformers import AutoModel, AutoTokenizer
from peft import PeftConfig, PeftModel

peft_model_id = "Lukee4/biogpt-2019"
config = PeftConfig.from_pretrained(peft_model_id)
inference_model = AutoModel.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
inference_model = PeftModel.from_pretrained(inference_model, peft_model_id)

I get an error: ValueError: Target modules ['c_attn'] not found in the base model. Please check the target modules and try again.

I investigated this a little, and the only place I could find c_attn is in the source code for transformers.models.gpt2.modeling_gpt2.

In the source code for biogpt/modeling_biogpt there is no c_attn.
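
For reference, here is a quick way to confirm this (a minimal sketch; it assumes the base checkpoint is microsoft/biogpt, which may differ from the one referenced in the config):

from transformers import AutoModel

base = AutoModel.from_pretrained("microsoft/biogpt")
# Check whether any submodule name contains "c_attn" (expected: False for BioGPT)
print(any("c_attn" in name for name, _ in base.named_modules()))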

When I load gpt2 instead of biogpt, everything works fine:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftConfig, PeftModel

peft_model_id = "Lukee4/biogpt-2019"
config = PeftConfig.from_pretrained(peft_model_id)
inference_model = AutoModelForSequenceClassification.from_pretrained('gpt2')
tokenizer = AutoTokenizer.from_pretrained('gpt2')
inference_model = PeftModel.from_pretrained(inference_model, peft_model_id)
@BenjaminBossan
Member

It seems that the config from Lukee4/biogpt-2019 is either misconfigured by setting c_attn as target module, or there is some kind of mismatch (e.g. the code was changed after the lora weights were trained). I think there is nothing we can do on the PEFT side for this. Maybe you can ask the author by starting a discussion on Lukee4/biogpt-2019.

@Luke-4
Author

Luke-4 commented Jul 28, 2023

> It seems that the config from Lukee4/biogpt-2019 is either misconfigured by setting c_attn as target module, or there is some kind of mismatch (e.g. the code was changed after the lora weights were trained). I think there is nothing we can do on the PEFT side for this. Maybe you can ask the author by starting a discussion on Lukee4/biogpt-2019.

Hello @BenjaminBossan, I didn't set any target modules in the config; you can check the notebook I used to do the fine-tuning here. I also did not change the code after the LoRA weights were trained. I followed this example to do the fine-tuning; the only difference is the model used.

@BenjaminBossan
Member

Hey, sorry, I completely missed that you're the author of the linked model :)

I can't access the notebook, but I don't think it matters. What I assume happened here is the following: In PEFT, we try to recognize the architecture of the model and automatically set the adapter layers if the user doesn't set target_modules themselves. Probably here, it was recognized as a GPT2-like architecture and hence c_attn was set, even though it doesn't match with the model you used. What you would have to do is specify target_modules in the config, choosing the modules that make sense (probably the Linear modules, but it depends on the model).
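
For example, a minimal sketch of what that could look like for BioGPT (the module names q_proj, k_proj, v_proj, and out_proj are assumed here and should be verified against the actual model, e.g. by printing model.named_modules()):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained("microsoft/biogpt")

# Explicitly list the modules to wrap with LoRA instead of relying on auto-detection
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],  # assumed BioGPT attention projections
    task_type="SEQ_CLS",
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()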

@Luke-4
Author

Luke-4 commented Jul 29, 2023

Hey @BenjaminBossan

Here is the [link](https://colab.research.google.com/drive/1s7af-1u-LEtXx2iMAw-8Gf0jnFfgsvc0?usp=sharing) to the notebook; hopefully you can access it now.

But I don't understand how this could happen; I tried the same code with BioMedLM, which is also a GPT2-style model, and it worked fine.

How do I decide which target modules I should include in the PeftConfig? I'm sorry, I know this question may not be appropriate here, but I am new to this and didn't expect to run into an issue that is quite advanced for me.

@BenjaminBossan
Member

In the notebook, on the one hand, I see:

target_modules= ["K_proj", 'v_proj', 'q_proj', "out_proj"]

but later:

target_modules=['c_attn']

Not sure what happened there.

> How do I decide which target modules I should include in the PeftConfig?

Not a stupid question at all. In general, people refer to papers that ran the experiments to decide which layers to adapt; typically, linear layers are the prime target.

Here are some hints about how to identify what layers could be potential targets.
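
As a rough illustration (a minimal sketch, assuming the microsoft/biogpt checkpoint), you can print the names of the model's Linear layers to see which ones are candidates for target_modules:

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/biogpt")

# Collect the short names of all nn.Linear submodules; these are the usual LoRA targets
linear_names = sorted({
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
})
print(linear_names)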
