413 Client Error: Payload Too Large when using upload_folder on a lot of files #918

Closed
nateraw opened this issue Jun 21, 2022 · 22 comments

@nateraw
Contributor

nateraw commented Jun 21, 2022

Describe the bug

When trying to commit a folder with many CSV files, I got the following error:

HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main

I assume there is a limit on total payload size when uploading a folder, and that I am exceeding it here. I confirmed that the failure depends on the total size of the files being uploaded rather than on the number of files. In the short term, it would be great to document this limit clearly in the upload_folder function's documentation.

Reproduction

The following fails on the last line. I wrote it so you can run it yourself without updating the repo ID or anything...so if you're logged in, the below should work (assuming you have torchvision installed).

import os

from torchvision.datasets.utils import download_and_extract_archive
from huggingface_hub import upload_folder, whoami, create_repo

user = whoami()['name']
repo_id = f'{user}/test-upload-folder-bug'
create_repo(repo_id, exist_ok=True, repo_type='dataset')

os.mkdir('./data')
download_and_extract_archive(
    url='https://zenodo.org/api/files/f7f7377b-8405-4d4f-b814-f021df5593b1/hyperbard_data.zip',
    download_root='./data',
    remove_finished=True
)
upload_folder(
    folder_path='./data',
    path_in_repo="",
    repo_id=repo_id,
    repo_type='dataset'
)

Logs

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-2-91516b1ea47f> in <module>()
     18     path_in_repo="",
     19     repo_id=repo_id,
---> 20     repo_type='dataset'
     21 )

3 frames
/usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in upload_folder(self, repo_id, folder_path, path_in_repo, commit_message, commit_description, token, repo_type, revision, create_pr)
   2115             token=token,
   2116             revision=revision,
-> 2117             create_pr=create_pr,
   2118         )
   2119 

/usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in create_commit(self, repo_id, operations, commit_message, commit_description, token, repo_type, revision, create_pr, num_threads)
   1813             token=token,
   1814             revision=revision,
-> 1815             endpoint=self.endpoint,
   1816         )
   1817         upload_lfs_files(

/usr/local/lib/python3.7/dist-packages/huggingface_hub/_commit_api.py in fetch_upload_modes(additions, repo_type, repo_id, token, revision, endpoint)
    380         headers=headers,
    381     )
--> 382     resp.raise_for_status()
    383 
    384     preupload_info = validate_preupload_info(resp.json())

/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
    939 
    940         if http_error_msg:
--> 941             raise HTTPError(http_error_msg, response=self)
    942 
    943     def close(self):

HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main


System Info

Colab

nateraw added the bug (Something isn't working) label Jun 21, 2022

@nateraw
Contributor Author

nateraw commented Jun 21, 2022

CC @SBrandeis

@julien-c
Member

Ah yes, we probably want to chunk client-side in this use case (you're probably hitting the POST size limit of 10MB), and also enforce a reasonable total max size regardless of chunking (maybe 100MB).

Note that this only applies to non-LFS files so 100MB is more than reasonable IMO.

Also cc @coyotte508 and @Pierrci for visibility

@coyotte508
Member

coyotte508 commented Jun 22, 2022

I think the limit is already 100MB on the hub side.

But since the Python library sends file content as base64 (to be able to send files with non-UTF-8 characters), the effective maximum is closer to 70-75MB.

What's the size of the files @nateraw ?
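
To make the base64 estimate above concrete, here is a minimal sketch of the arithmetic. Base64 turns every 3 raw bytes into 4 encoded characters, so a body limit of L encoded bytes leaves room for about 3L/4 raw bytes. The helper names and the decimal 100MB figure are assumptions for illustration, not values taken from the hub or the library.

```python
import base64


def base64_encoded_length(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(limit_bytes: int) -> int:
    """Largest raw size whose base64 encoding still fits in `limit_bytes`."""
    return (limit_bytes // 4) * 3


# Assumption: treat the 100MB limit mentioned in the thread as 100,000,000 bytes.
HYPOTHETICAL_LIMIT = 100_000_000
raw_budget = max_raw_bytes(HYPOTHETICAL_LIMIT)

# Sanity check of the formula against the standard library on a small input.
assert len(base64.b64encode(b"x" * 300)) == base64_encoded_length(300)
```

The JSON wrapper around the encoded content (paths, keys, quoting) takes additional space, which is one plausible reason the quoted range goes below 75MB.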

@SBrandeis
Contributor

@coyotte508 the 413 is thrown during the pre-upload call it seems:

HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main

@coyotte508
Member

Oh you should only send the first 512 bytes of data in the preupload call @SBrandeis

@coyotte508
Member

In the web code:

			const res = await fetch(apiUrl("preupload"), {
				method: "POST",
				headers: {
					"Content-Type": "application/json",
				},
				body: JSON.stringify({
					files: [
						{
							size: selectedFile.size,
							// Base64 conversion of the first 512 bytes
							sample: await blobToBase64(selectedFile.slice(0, 512)),
							path: [path, encodeURIComponent(selectedFile.name)]
								.filter(Boolean)
								.join("/"),
						},
					],
				} as PreuploadRequest),
			});

@SBrandeis
Contributor

Yes, we do that already:

    payload = {
        "files": [
            {
                "path": op.path_in_repo,
                "sample": base64.b64encode(op._upload_info().sample).decode("ascii"),
                "size": op._upload_info().size,
                "sha": op._upload_info().sha256.hex(),
            }
            for op in additions
        ]
    }

op._upload_info().sample is the first 512 bytes of the file to upload.
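
As a rough illustration of how much space each file takes in such a payload, the sketch below builds one entry with the same four keys from raw bytes and measures its JSON size. It does not use the library's internal `_upload_info()`; the function `preupload_entry`, the constant `SAMPLE_SIZE`, and the example path and content are hypothetical stand-ins.

```python
import base64
import hashlib
import json

SAMPLE_SIZE = 512  # number of leading bytes sent as the sample, per the thread


def preupload_entry(path_in_repo: str, content: bytes) -> dict:
    """Build one payload entry shaped like the snippet above (illustrative only)."""
    return {
        "path": path_in_repo,
        "sample": base64.b64encode(content[:SAMPLE_SIZE]).decode("ascii"),
        "size": len(content),
        "sha": hashlib.sha256(content).hexdigest(),
    }


# Hypothetical file: 6000 bytes of CSV-like content.
entry = preupload_entry("data/example.csv", b"a,b,c\n" * 1000)
entry_bytes = len(json.dumps(entry).encode("utf-8"))
print(f"one entry serializes to {entry_bytes} bytes")
```

A 512-byte sample alone becomes 684 base64 characters, so each entry costs several hundred bytes even though only a small sample of each file is sent. With enough files, the request body grows large regardless of how small the individual files are.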

@coyotte508
Member

Then there are likely so many files that the preupload call alone exceeds the 250kB body limit.

Either the hub library should batch the preupload calls (in chunks of 250 files for example) or we should allow a bigger body on the hub side
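
The first option, batching the preupload calls, could look roughly like the sketch below. This is not the fix that was merged; `chunk_iterable` and `preupload_in_batches` are hypothetical helpers, and `send_batch` stands in for whatever function performs one preupload request and returns one result per file.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk_iterable(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    chunk: List[T] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def preupload_in_batches(
    additions: Iterable[T],
    send_batch: Callable[[List[T]], List[R]],
    chunk_size: int = 250,  # the example figure suggested in the comment above
) -> List[R]:
    """Call `send_batch` once per chunk and concatenate the per-file results."""
    results: List[R] = []
    for batch in chunk_iterable(additions, chunk_size):
        results.extend(send_batch(batch))
    return results
```

For example, 600 additions with the default chunk size would produce three requests of 250, 250, and 100 files, each with a body well below what a single 600-file request would need.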

@SBrandeis
Contributor

@nateraw I opened #920 with a fix

Can you try it out and confirm it fixes your issue, please?

@fcakyon

fcakyon commented Nov 12, 2022

@SBrandeis I am getting the same error when trying to upload a folder of files with a total size of 350MB. What is the proposed approach for such a case?

@coyotte508
Member

@fcakyon the issue should be mostly fixed with recent versions of the hub library. Are you at the latest version?

@fcakyon

fcakyon commented Nov 12, 2022

I am at the latest release v0.10.1. Now going to try with the main branch.

@coyotte508
Member

ok 🤔

Feel free to share more details about the error, e.g. the number of files in the folder, the request ID if present, or the detailed error message.

@fcakyon

fcakyon commented Nov 12, 2022

I am trying to upload a folder of 959 .mp4 video files with a total size of 297MB using the upload_folder function of the hub package.

This is the error traceback:

2022-11-12 19:48:36.535 Uncaught app exception
Traceback (most recent call last):
  File "...\lib\site-packages\huggingface_hub\utils\_errors.py", line 213, in hf_raise_for_status
    response.raise_for_status()
  File "...\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/.../.../commit/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "...\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 559, in _run_script
    self._session_state.on_script_will_rerun(rerun_data.widget_states)
  File "...\lib\site-packages\streamlit\runtime\state\safe_session_state.py", line 72, in on_script_will_rerun
    self._state.on_script_will_rerun(latest_widget_states)
  File "...\lib\site-packages\streamlit\runtime\state\session_state.py", line 542, in on_script_will_rerun
    self._call_callbacks()
  File "...\lib\site-packages\streamlit\runtime\state\session_state.py", line 555, in _call_callbacks
    self._new_widget_state.call_callback(wid)
  File "...\lib\site-packages\streamlit\runtime\state\session_state.py", line 277, in call_callback
    callback(*args, **kwargs)
  File "...\st_utils.py", line 66, in st_upload_folder_to_repo 
    upload_url = upload_folder_to_repo(**kwargs)
  File "...\hf_utils.py", line 49, in upload_folder_to_repo    
    for folder_path in folder_paths:
  File "...\lib\site-packages\huggingface_hub\utils\_validators.py", line 94, in _inner_fn
    return fn(*args, **kwargs)
  File "...\lib\site-packages\huggingface_hub\hf_api.py", line 2384, in upload_folder
    commit_info = self.create_commit(
  File "...\lib\site-packages\huggingface_hub\utils\_validators.py", line 94, in _inner_fn
    return fn(*args, **kwargs)
  File "...\lib\site-packages\huggingface_hub\hf_api.py", line 2074, in create_commit
    hf_raise_for_status(commit_resp, endpoint_name="commit")
  File "...\lib\site-packages\huggingface_hub\utils\_errors.py", line 254, in hf_raise_for_status
    raise HfHubHTTPError(str(HTTPError), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: <class 'requests.exceptions.HTTPError'> (Request ID: IEd7hJkhk5rcdQ777Idq3)

request entity too large

@fcakyon

fcakyon commented Nov 12, 2022

@coyotte508 after updating to main, everything works fine :)

Do you have any ETA on releasing 0.11.0? It has been more than a month since the last huggingface-hub release.

@coyotte508
Member

It should be soon!! cc @Wauplin

@nateraw
Contributor Author

nateraw commented Nov 14, 2022

@fcakyon I've been using this snippet for a similar use case here. Feel free to play with it in the meantime if need be.
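
The linked snippet is not reproduced in this thread. As a generic illustration of one possible interim workaround (not necessarily what that snippet does), a large folder can be split into several smaller commits so that no single request grows too big. In the sketch below, `iter_files`, `batch_by_size`, `upload_in_batches`, the `commit_batch` callable, and the 50MB default budget are all hypothetical; `commit_batch` would wrap the actual upload call, for instance one `create_commit` per batch.

```python
import os
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_files(folder: str) -> Iterator[str]:
    """Yield every file path under `folder` in a deterministic order."""
    for root, dirs, files in os.walk(folder):
        dirs.sort()  # sorting in place makes os.walk visit subfolders in order
        for name in sorted(files):
            yield os.path.join(root, name)


def batch_by_size(
    paths_and_sizes: Sequence[Tuple[str, int]], max_bytes: int
) -> List[List[str]]:
    """Greedily group paths so each batch stays within `max_bytes`.

    A single file larger than `max_bytes` still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in paths_and_sizes:
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def upload_in_batches(
    folder: str,
    commit_batch: Callable[[List[str], str], None],
    max_bytes: int = 50 * 1024 * 1024,  # assumed budget, not a documented limit
) -> int:
    """Commit `folder` in several batches and return the number of commits."""
    sized = [(path, os.path.getsize(path)) for path in iter_files(folder)]
    batches = batch_by_size(sized, max_bytes)
    for index, batch in enumerate(batches, start=1):
        commit_batch(batch, f"Upload batch {index}/{len(batches)}")
    return len(batches)
```

The trade-off is that the upload is no longer a single atomic commit, so a failure midway leaves the repository partially updated.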

@fcakyon

fcakyon commented Nov 14, 2022

@nateraw thanks a lot, the snippet is very clear and simple!

@julien-c
Member

@fcakyon or use pip install huggingface_hub==0.11.0rc0 which is about to be publicly released and will be a more robust future-proof fix :)

@fcakyon

fcakyon commented Nov 15, 2022

@julien-c thank you, good to know there is a pre-release version available!

@Wauplin
Contributor

Wauplin commented Nov 18, 2022

@fcakyon Release is done :)

pip install huggingface_hub==0.11

I'm closing this issue. Feel free to reopen it/open a new one if you still encounter an issue.
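
For anyone checking whether their environment already has this release, a small sketch like the following may help. It relies only on the standard library; the helper names are hypothetical, and the `(0, 11)` threshold is taken from the release mentioned in this thread.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '0.11.0rc0' -> (0, 11, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fixed_release(version: str, minimum: Tuple[int, ...] = (0, 11)) -> bool:
    """True if `version` is at least `minimum` (pre-release suffixes are ignored)."""
    return release_tuple(version) >= minimum


def installed_version(package: str = "huggingface_hub") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling `has_fixed_release(installed_version() or "0")` returns False both for older releases and when the package is missing.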

Wauplin closed this as completed Nov 18, 2022

@fcakyon

fcakyon commented Nov 18, 2022

@Wauplin amazing news!
