push_to_hub from local model #1873

Closed
mmg10 opened this issue Jul 25, 2024 · 4 comments
Labels
🐛 bug Something isn't working

Comments

mmg10 commented Jul 25, 2024

System Info

  • transformers version: 4.43.2
  • Platform: Linux-6.10.0-1-cachyos-x86_64-with-glibc2.40
  • Python version: 3.12.4
  • Huggingface_hub version: 0.24.2
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:

Who can help?

@muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py, I pass the following arguments.

    --model_name_or_path /home/ubuntu/work/Meta-Llama-3.1-8B \
    --dataset_name="HuggingFaceH4/no_robots" \
    --output_dir /home/ubuntu/work/Meta-Llama-3.1-8B-SFT \
    --report_to="wandb" \
    --push_to_hub true \
    --push_to_hub_model_id "llama3.1-8b-sft" \

I understand that I am using the SFTTrainer, but since it calls the super().push_to_hub method, I created the issue in this library.

The error is:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 3761, in create_commit
    hf_raise_for_status(response)
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: Root=1-66a1dfab-4f8aaadd66a3afc164a238e4;a70493b5-d4ba-4022-978f-9947dcf083c0)

Bad request:
"base_model" with value "/home/ubuntu/work/Meta-Llama-3.1-8B" is not valid. Use a model id from https://hf.co/models.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/work/sft.py", line 151, in <module>
    trainer.save_model(training_args.output_dir)
  File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 3458, in save_model
    self.push_to_hub(commit_message="Model save")
  File "/opt/conda/lib/python3.11/site-packages/trl/trainer/sft_trainer.py", line 475, in push_to_hub
    return super().push_to_hub(commit_message=commit_message, blocking=blocking, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 4349, in push_to_hub
    return upload_folder(
           ^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 1398, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 4857, in upload_folder
    commit_info = self.create_commit(
                  ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 1398, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 3765, in create_commit
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "base_model" with value "/home/ubuntu/work/Meta-Llama-3.1-8B" is not valid. Use a model id from https://hf.co/models.
wandb: | 0.037 MB of 0.037 MB uploaded

Expected behavior

The wandb logging and training complete successfully, but the push fails. This is because I am loading the model from the local directory /home/ubuntu/work/Meta-Llama-3.1-8B rather than from an HF Hub repo. However, the push should still work!
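As context for the error above, the Hub rejects base_model metadata that is not a model id. A minimal sketch of the distinction (the regex is my rough approximation, not the Hub's exact validation rule):

```python
import re

# Approximation (an assumption, not the Hub's exact rule) of a valid model id:
# "namespace/name" with exactly one separator and no leading slash,
# e.g. "meta-llama/Meta-Llama-3.1-8B".
HUB_ID_RE = re.compile(r"^[\w.-]+/[\w.-]+$")

def looks_like_hub_id(value: str) -> bool:
    """Return True if value resembles a Hub model id rather than a local path."""
    return bool(HUB_ID_RE.match(value))

# A local checkpoint path fails this check, which matches the Hub's complaint.
looks_like_hub_id("/home/ubuntu/work/Meta-Llama-3.1-8B")  # False
looks_like_hub_id("meta-llama/Meta-Llama-3.1-8B")         # True
```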

@mmg10 mmg10 added the 🐛 bug Something isn't working label Jul 25, 2024
@LysandreJik (Member)

I don't think there is support for push_to_hub_model_id in that sft.py script.

Moving your issue to TRL, cc @kashif maybe :)

@LysandreJik LysandreJik transferred this issue from huggingface/transformers Jul 25, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

qgallouedec (Member) commented Sep 9, 2024

Use hub_model_id instead; see the TrainingArguments documentation.

(max_steps 100 for demo)

python examples/scripts/sft.py \
    --model_name_or_path facebook/opt-350m \
    --dataset_name timdettmers/openassistant-guanaco \
    --output_dir opt-350m-sft \
    --dataset_text_field text \
    --push_to_hub \
    --max_steps 100 \
    --hub_model_id my_hub_model_id

https://huggingface.co/qgallouedec/my_hub_model_id

valayDave commented Oct 3, 2024

I am still facing the same error even when I pass hub_model_id to the TrainingArguments. I am not using the sft.py script; rather, I am using the SFTTrainer directly.

My model is loaded directly from the local machine, and then I run some fine-tuning on it. After trainer.train() finishes, I call trainer.push_to_hub() and the code crashes with the exact same error. What is a workaround here?

Versions of Libraries:

        "transformers": "4.44.2",
        "peft": "0.12.0",
        "trl": "0.10.1",
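One possible workaround, sketched under the assumption (consistent with the traceback) that the auto-generated model card copies the local checkpoint path into the base_model metadata field: repair that field before pushing the card. The hub id passed in is hypothetical and must be supplied by you; the commented huggingface_hub.metadata_update call is one way to apply the fix to an already-created repo.

```python
def fix_base_model(metadata: dict, hub_id: str) -> dict:
    """Return a copy of model-card metadata with a path-like base_model
    replaced by a proper Hub model id.

    hub_id: the Hub id of the checkpoint that was fine-tuned locally,
    e.g. "meta-llama/Meta-Llama-3.1-8B" (an assumption for this sketch).
    """
    fixed = dict(metadata)
    base = fixed.get("base_model", "")
    # Heuristic: a filesystem path starts with "/", "." or "~", or contains
    # more than one "/" separator; a Hub id is exactly "namespace/name".
    if base and (base.startswith(("/", ".", "~")) or base.count("/") != 1):
        fixed["base_model"] = hub_id
    return fixed

# Example: the metadata the failing push would have produced.
meta = {"base_model": "/home/ubuntu/work/Meta-Llama-3.1-8B"}
fixed = fix_base_model(meta, "meta-llama/Meta-Llama-3.1-8B")

# The fixed metadata could then be applied to the repo, e.g.:
# from huggingface_hub import metadata_update
# metadata_update("your-username/llama3.1-8b-sft", fixed, overwrite=True)
```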
