feat(server): Add native support for PEFT Lora models #762
Conversation
- Will detect `peft` models by finding `adapter_config.json`.
- This triggers a fully dedicated `download-weights` path.
- This path loads the adapter config and finds the base `model_id`.
- It loads the base model, then the PEFT model, then calls `merge_and_unload()`, then `save_pretrained(.., safe_serialization=True)`, and adds back the config + tokenizer (a minimal sketch of this flow is shown after this description).
- The chosen location is a **local folder with the name of the user-chosen model id**.

PROs:

- Easier than expecting the user to merge manually.
- Barely any change outside of the `download-weights` command, meaning everything will work in a single load.
- Should enable out-of-the-box SM + HFE.

CONs:

- Creates a local merged model in an unusual location, potentially not saved across docker reloads, or overwriting some files if the PEFT repo itself was local and contained other files in addition to the LoRA.

Alternatives considered:

- Add `local_files_only=True` everywhere (discarded because of a massive code change for not a good enough reason).
- Return something to the `launcher` about the new model-id (a cleaner location for this new model), but it would introduce new communication somewhere we didn't need it before.
- Use the HF cache folder and *stop* the flow after `download-weights`, asking the user to restart with the actual local model location.
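For reference, a minimal sketch of the merge flow described above, assuming a local adapter folder and a hypothetical helper name (`merge_peft_adapter`); the actual `download-weights` implementation resolves files from the Hub instead:

```python
import json
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def merge_peft_adapter(adapter_dir: str, output_dir: str):
    # The presence of adapter_config.json is what marks a checkpoint as a PEFT model.
    with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
        base_model_id = json.load(f)["base_model_name_or_path"]

    # Load the base model, then the PEFT model on top of it.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16, low_cpu_mem_usage=True
    )
    model = PeftModel.from_pretrained(base_model, adapter_dir)

    # Fold the LoRA deltas into the base weights.
    model = model.merge_and_unload()

    # Save the merged weights as safetensors, and add back the config + tokenizer.
    model.save_pretrained(output_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
```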
@philschmid Do you have any suggestions/contraindications for SM or HFE behavior? @younesbelkada Is my usage of PEFT OK?
PEFT added this with 0.4:

```python
model = AutoPeftModelForCausalLM.from_pretrained(
    args.peft_model_id,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)
```
So the idea here is to:
- load the adapters
- load the base model based on the adapter config
- merge the weights
- save safetensors
- start TGI?
Looks great on the PEFT side! Thanks for the ping.
Alternatively, you could have used `AutoPeftModelForCausalLM` or `AutoPeftModelForSeq2SeqLM` (https://huggingface.co/docs/peft/quicktour#easy-loading-with-auto-classes) and directly loaded the model with the correct class without having to declare a base model. But this also works great (it does the same thing under the hood).
@younesbelkada Does this method force the dtype to be fp32, or should I specify it with `torch_dtype`?
Yes, the behaviour is the same as transformers' `from_pretrained`, so it will load in fp32; you need to pass `torch_dtype` explicitly.
feat(server): Add native support for PEFT Lora models (huggingface#762)
@Narsil Should model loading work if the model is a local model and/or the base model specified in the `adapter_config.json` is a local path? For example:

```shell
text-generation-launcher --model-id /dev/models/gpt2-medium-peft
```

or in `adapter_config.json`:

```json
{
  "base_model_name_or_path": "/dev/models/gpt2-medium",
  ...
}
```

It doesn't seem to be detecting that the local model I'm pointing to is a PEFT model.
Hmm, I haven't tried, but I don't think there's any big reason why it shouldn't work. Could you open an issue with the full template filled out (and ideally links to the actual models and setup so we can reproduce as easily as possible)?
@Narsil At least just looking at the code, the PEFT detection only runs when the model is not local:

```python
if not is_local_model:
    try:
        adapter_config_filename = hf_hub_download(model_id, revision=revision, filename="adapter_config.json")
        utils.download_and_unload_peft(model_id, revision, trust_remote_code=trust_remote_code)
    except (utils.LocalEntryNotFoundError, utils.EntryNotFoundError):
        pass
```
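For what it's worth, a hypothetical sketch of the direction such a fix could take (an illustration only, not the patch that was eventually merged): also look for `adapter_config.json` directly inside the local directory before falling back to the Hub lookup:

```python
import os

# Hypothetical: also detect PEFT checkpoints in local directories by
# checking for adapter_config.json on disk.
if is_local_model:
    if os.path.isfile(os.path.join(model_id, "adapter_config.json")):
        utils.download_and_unload_peft(model_id, revision, trust_remote_code=trust_remote_code)
else:
    try:
        adapter_config_filename = hf_hub_download(model_id, revision=revision, filename="adapter_config.json")
        utils.download_and_unload_peft(model_id, revision, trust_remote_code=trust_remote_code)
    except (utils.LocalEntryNotFoundError, utils.EntryNotFoundError):
        pass
```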
Oh true, it should be relatively easy to update. Do you want to take a stab at it?
@Narsil Thanks, I'd be interested in taking a stab at it. However, I'm running into some challenges testing on my Mac without a GPU. I'll give it a shot, but if I'm unable to address the issue in a reasonable timeframe, I may ask someone else to take over.
I had a similar problem (I was trying to serve a PEFT LoRA from a local folder). My workaround was to create the merged model beforehand and serve that, not requiring the PEFT path in TGI. Something like the following, with 3 folders (as mounted in the docker commands below):

- `./utils`: the merge script
- `./peft/<checkpoint>`: the PEFT LoRA checkpoint
- `./model/data`: the merged model output
Script (the weight rename is specific to the Bloom model and should probably be removed or adapted for your model):

```python
import os
import sys
import peft
import torch
import transformers
peft_checkpoint_dir = sys.argv[1]
model_data_dir = sys.argv[2]
model = peft.AutoPeftModelForCausalLM.from_pretrained(
peft_checkpoint_dir,
torch_dtype=torch.float16,
trust_remote_code=True,
low_cpu_mem_usage=True,
)
base_model_id = model.peft_config["default"].base_model_name_or_path
model = model.merge_and_unload()
tokenizer_config_file = os.path.join(peft_checkpoint_dir, "tokenizer_config.json")
tokenizer_peft_or_base = (
peft_checkpoint_dir if os.path.isfile(tokenizer_config_file) else base_model_id
)
tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_peft_or_base)
generation_config_file = os.path.join(peft_checkpoint_dir, "generation_config.json")
generation_peft_or_base = (
peft_checkpoint_dir if os.path.isfile(generation_config_file) else base_model_id
)
generation = transformers.GenerationConfig.from_pretrained(generation_peft_or_base)
### rename weights transformer.<key> -> <key> as expected by TGI
assert model.base_model_prefix == "transformer"
state_dict = {
k[len(model.base_model_prefix) + 1 :]: v
for k, v in model.state_dict().items()
if k.startswith(model.base_model_prefix)
}
###
model.save_pretrained(model_data_dir, state_dict=state_dict, safe_serialization=True)
tokenizer.save_pretrained(model_data_dir)
generation.save_pretrained(model_data_dir)
```

Locally, I run the script inside the TGI docker image:

```shell
docker run --rm \
--volume=./utils:/opt/ml/utils \
--volume=./peft/<checkpoint>:/opt/ml/peft-checkpoint \
--volume=./model/data:/opt/ml/model \
--volume=$HOME/.cache/huggingface/hub:/data \
--entrypoint=python \
--env PYTHONUNBUFFERED=1 \
--env HF_HUB_ENABLE_HF_TRANSFER=0 \
--env HUGGINGFACE_HUB_CACHE=/data \
ghcr.io/huggingface/text-generation-inference:1.0.3 \
/opt/ml/utils/peft_merger.py \
/opt/ml/peft-checkpoint \
/opt/ml/model
```

After docker, I create a … I tested with … In your case, I think you can mount (…).

Example with Bloom:

Input: …
Output: …
And running it with Docker (this requires a GPU):

```shell
docker run --rm \
--gpus=all \
--shm-size=1g \
--publish=8080:80 \
--volume=./model/data:/opt/ml/model \
--env DTYPE=bfloat16 \
ghcr.io/huggingface/text-generation-inference:1.0.3 \
--model-id=/opt/ml/model
```

Output: …
Thanks @cirocavani for the detailed solution to work around this issue; this is very helpful.
@Narsil @shimizust @cirocavani I hit the same issue when trying to load local PEFT weights. Here's a PR that fixed it for me:
# What does this PR do?

Enables PEFT weights to be loaded from a local directory, as opposed to a hf hub repository. It is a continuation of the work in PR #762.

Fixes #1259

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section? **Yes, but I don't know how to run the tests for this repo, and it doesn't look like this code is covered anyway.**
- [x] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case. **Yes, @Narsil asked for a PR in [this comment](#762 (comment))**
- [x] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation). **I didn't see any documentation added to the [original PR](#762), and am not sure where this belongs. Let me know and I can add some.**
- [x] Did you write any new necessary tests? **I didn't see any existing test coverage for this python module.**

## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Narsil

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Fix #482