Wrong case of repo_id when using upload_folder or upload_file causes unpredictable error displayed #1371

Closed
ddPn08 opened this issue Mar 2, 2023 · 13 comments · Fixed by #1376
Labels
bug Something isn't working

Comments

ddPn08 commented Mar 2, 2023

Describe the bug

I get the following error when I run the upload_folder or upload_file function.
Reducing the size of the data made no difference.

Reproduction

# hfupload.py
import os
import argparse

from huggingface_hub import HfApi, login
login()
api = HfApi()

def upload(args: argparse.Namespace):
    src = args.src
    dest = args.dest
    repo = args.repo
    repo_type = args.repo_type
    ignore_patterns = args.ignore_patterns

    assert os.path.exists(src), "Source does not exist."
    
    if os.path.isfile(src):
        api.upload_file(
            path_or_fileobj=src,
            path_in_repo=dest,
            repo_id=repo,
            repo_type=repo_type
        )
    else:
        api.upload_folder(
            folder_path=src,
            path_in_repo=dest,
            repo_id=repo,
            repo_type=repo_type,
            ignore_patterns=ignore_patterns,
        )



if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("src", type=str)
    parser.add_argument("dest", type=str)
    parser.add_argument("--ignore_patterns", type=str)
    parser.add_argument("--repo", type=str, default="ddpn08/mydataset")
    parser.add_argument("--repo-type", type=str, default="dataset")

    args = parser.parse_args()
    upload(args)

Run the script as follows:

python hfupload.py ./data.png data

Logs

Traceback (most recent call last):
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 264, in hf_raise_for_status
    response.raise_for_status()
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/datasets/ddPn08/mydataset/commit/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ddpn08/bin/hfupload.py", line 45, in <module>
    upload(args)
  File "/home/ddpn08/bin/hfupload.py", line 18, in upload
    api.upload_file(
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2537, in upload_file
    commit_info = self.create_commit(
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2385, in create_commit
    hf_raise_for_status(commit_resp, endpoint_name="commit")
  File "/home/ddpn08/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 314, in hf_raise_for_status
    raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError:  (Request ID: Root=1-6400b74e-379e4675705d1bd00eb7ddde)

Bad request for commit endpoint:
Specify commit content in payload: Add a line with the key `lfsFile`, `file` or `deletedFile`


### System info

```shell
- huggingface_hub version: 0.12.1
- Platform: Linux-5.15.0-1033-azure-x86_64-with-glibc2.35
- Python version: 3.10.9
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/ddpn08/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: ddPn08
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: N/A
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/ddpn08/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/ddpn08/.cache/huggingface/assets
- HF_HUB_OFFLINE: False
- HF_TOKEN_PATH: /home/ddpn08/.cache/huggingface/token
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
```

ddPn08 added the bug label Mar 2, 2023
Contributor

Wauplin commented Mar 2, 2023

Hi @ddPn08 , thanks for reporting the issue.
I have not been able to reproduce it, so I don't really know what's happening here. I copy-pasted your code exactly (I only removed the login() call since I'm already logged in).

I ran

python hf_upload.py hf_upload.py hf_upload.py --repo=Wauplin/1371-GH-issue-upload-file --repo-type=model

and the script got uploaded to the Hub: https://huggingface.co/Wauplin/1371-GH-issue-upload-file/blob/main/hf_upload.py

What command line are you using?

Author

ddPn08 commented Mar 6, 2023

I tested with fish and PowerShell on different machines, but it didn't work.
Changing the repo type from dataset to model makes no difference.

Contributor

Wauplin commented Mar 6, 2023

Hi @ddPn08, just to be sure: when you say "it didn't work", do you mean you get the exact same error?

I don't really know why this is failing, so let's try a simple script. Can you save this script in a file named test_issue_1371.py and run it with python test_issue_1371.py?

import huggingface_hub

print(huggingface_hub.__version__)

# Create repo
api = huggingface_hub.HfApi()
repo_url = api.create_repo(repo_id="test_repo_1371", exist_ok=True)
repo_id = repo_url.repo_id
print(f"Created repo {repo_id}")

# Upload regular file from bytes
print("Upload file.txt")
api.upload_file(path_in_repo="file.txt", path_or_fileobj=b"content", repo_id=repo_id)

# Upload LFS file from bytes
print("Upload lfs.bin")
api.upload_file(path_in_repo="lfs.bin", path_or_fileobj=b"content", repo_id=repo_id)

# Upload from disk file
print("Upload test_issue_1371.py")
api.upload_file(path_in_repo="test_issue_1371.py", path_or_fileobj="test_issue_1371.py", repo_id=repo_id)

You should have an output like this. Can you confirm?

0.12.1
Created repo ddPn08/test_repo_1371
Upload file.txt
Upload lfs.bin
Upload test_issue_1371.py

Author

ddPn08 commented Mar 6, 2023

Succeeded. It's strange...

Contributor

Wauplin commented Mar 6, 2023

OK good, that's what it's supposed to do.

Now, can you change the last line to upload the file you were trying to upload with your initial script? That is, run something like this:

import huggingface_hub

# Create repo
api = huggingface_hub.HfApi()
repo_url = api.create_repo(repo_id="test_repo_1371", exist_ok=True)
repo_id = repo_url.repo_id
print(f"Created repo {repo_id}")

# Upload from disk file
api.upload_file(path_in_repo="example_file.bin", path_or_fileobj="example_file.bin", repo_id=repo_id)

and replace "example_file.bin" with the file you were previously trying to upload. What outcome do you get now?

Author

ddPn08 commented Mar 6, 2023

import os
import argparse

from huggingface_hub import HfApi
api = HfApi()

def upload(args: argparse.Namespace):
    src = args.src
    dest = args.dest
    repo_id = args.repo_id
    repo_type = args.repo_type
    ignore_patterns = args.ignore_patterns

    assert os.path.exists(src), "Source does not exist."

    repo_url = api.create_repo(repo_id=repo_id, exist_ok=True)
    repo = repo_url.repo_id

    if os.path.isfile(src):
        api.upload_file(
            path_or_fileobj=src,
            path_in_repo=dest,
            repo_id=repo,
            repo_type=repo_type
        )
    else:
        api.upload_folder(
            folder_path=src,
            path_in_repo=dest,
            repo_id=repo,
            repo_type=repo_type,
            ignore_patterns=ignore_patterns
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("src", type=str)
    parser.add_argument("dest", type=str)
    parser.add_argument("--ignore_patterns", type=str)
    parser.add_argument("--repo-id", type=str, default="mydataset")
    parser.add_argument("--repo-type", type=str, default="dataset")

    args = parser.parse_args()
    upload(args)

I don't know why, but I fixed it by rewriting the code. I get the same error when I put it back. It's strange.

Contributor

Wauplin commented Mar 6, 2023

Hmm... the code is really the same. I don't know what else has changed, but I'm glad that at least your issue has been solved. Is it OK with you to close it now?

One small detail in your latest script: be careful to take the repo_type into account when creating the repo:

    repo_url = api.create_repo(repo_id=repo_id, repo_type=repo_type, exist_ok=True) # repo_type was missing here
    repo = repo_url.repo_id
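
A minimal sketch of that advice, assuming an `api` object that behaves like `huggingface_hub.HfApi`. The helper names `resolve_repo_id` and `upload_one_file` are hypothetical and only illustrate the pattern from the script above: ask `create_repo` for the repo first, then upload using the `repo_id` it reports back. The thread only shows this working when the bare name ("mydataset") is passed; whether a wrongly cased namespace is corrected the same way is not established here.

```python
from typing import Any, Optional


def resolve_repo_id(api: Any, repo_id: str, repo_type: Optional[str] = None) -> str:
    """Return the repo_id reported back by create_repo (hypothetical helper).

    repo_type is forwarded on purpose: when it is left out, create_repo
    targets a model repo even if the upload is meant for a dataset.
    """
    repo_url = api.create_repo(repo_id=repo_id, repo_type=repo_type, exist_ok=True)
    return repo_url.repo_id


def upload_one_file(
    api: Any, src: str, dest: str, repo_id: str, repo_type: Optional[str] = None
) -> str:
    """Upload a single file using the repo_id returned by create_repo."""
    resolved = resolve_repo_id(api, repo_id, repo_type)
    api.upload_file(
        path_or_fileobj=src,
        path_in_repo=dest,
        repo_id=resolved,
        repo_type=repo_type,
    )
    return resolved
```

With a real client this would be called as `upload_one_file(HfApi(), "./data.png", "data", "mydataset", "dataset")`.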

Author

ddPn08 commented Mar 6, 2023

It was a very stupid mistake.
My user id was ddPn08. It was not ddpn08.
Thank you for everything.

ddPn08 closed this as completed Mar 6, 2023
Author

ddPn08 commented Mar 6, 2023

I thought this fixed it, but maybe I'm wrong? At least the problem is no longer reproducible.

Author

ddPn08 commented Mar 6, 2023

If the "p" is lowercase you get an error; if it is uppercase, the upload succeeds.
I feel like this is an unexpected error.

Contributor

Wauplin commented Mar 6, 2023

> My user id was ddPn08. It was not ddpn08.

Oooh, good to know. Thanks for spotting this.
The issue is solved in your case (at least you know a workaround), but I think we should fix it in huggingface_hub to at least return a more descriptive error. In theory repo_id is case-insensitive, but I'll need to investigate this more. Now that I have been able to reproduce it, I'll be able to debug and fix it.

So yes, let's keep it open, and if you don't mind I'll update the issue title.
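
Until a fix is released, a client-side check can turn the confusing "Specify commit content in payload" message into something readable. The sketch below is not the fix that went into huggingface_hub. `check_repo_id_case` is a hypothetical helper, and it assumes you already know the correctly cased account name (for example the "Who am I ?" value from the system info above, which `HfApi().whoami()["name"]` should also return).

```python
def check_repo_id_case(repo_id: str, username: str) -> None:
    """Raise a descriptive error if the namespace differs from `username` only by case.

    Hypothetical helper: it only compares strings and never contacts the Hub.
    """
    namespace, sep, name = repo_id.partition("/")
    if not sep:
        return  # no namespace in repo_id, nothing to compare
    if namespace != username and namespace.lower() == username.lower():
        suggestion = f"{username}/{name}"
        raise ValueError(
            f"repo_id {repo_id!r} uses namespace {namespace!r}, but the account "
            f"name is {username!r}. Did you mean {suggestion!r}?"
        )
```

Calling `check_repo_id_case("ddpn08/mydataset", "ddPn08")` before the upload raises a `ValueError` that names the expected `ddPn08/mydataset`, while a correctly cased id passes silently.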

Wauplin reopened this Mar 6, 2023
ddPn08 changed the title from "upload_folder and upload_file functions not working" to "Wrong case of repo_id causes unpredictable error displayed" Mar 6, 2023
ddPn08 changed the title from "Wrong case of repo_id causes unpredictable error displayed" to "Wrong case of repo_id when using upload_folder and upload_file causes unpredictable error displayed" Mar 6, 2023
ddPn08 changed the title from "Wrong case of repo_id when using upload_folder and upload_file causes unpredictable error displayed" to "Wrong case of repo_id when using upload_folder or upload_file causes unpredictable error displayed" Mar 6, 2023
Contributor

Wauplin commented Mar 6, 2023

@ddPn08 For your info, I opened a PR to fix this. It will be published in the next release.

Contributor

Wauplin commented Mar 7, 2023

@ddPn08 problem is fixed! Thanks for reporting :)
