Upload models using Git fails #8543

Closed
2 of 4 tasks
agemagician opened this issue Nov 14, 2020 · 20 comments

@agemagician
Contributor

Environment info

  • transformers version: 3.5.0
  • Platform: Linux-4.15.0-112-generic-x86_64-with-debian-buster-sid
  • Python version: 3.6.12
  • PyTorch version (GPU?): 1.7.0 (False)
  • Tensorflow version (GPU?): 2.3.1 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Model Cards: @julien-c
T5: @patrickvonplaten

Information

Model I am using (Bert, XLNet ...):
T5

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

git clone https://huggingface.co/Rostlab/prot_t5_xl_bfd
Cloning into 'prot_t5_xl_bfd'...                                                                                                  
remote: Enumerating objects: 31, done.                     
remote: Counting objects: 100% (31/31), done.                                       
remote: Compressing objects: 100% (29/29), done.                         
remote: Total 31 (delta 13), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (31/31), done.

cp config.json pytorch_model.bin prot_t5_xl_bf

git add --all

git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   config.json
        modified:   pytorch_model.bin

git commit -m "[T5] Fix load weights function #8528"

git push
Username for 'https://huggingface.co': xxxxxx
Password for 'https://agemagician@huggingface.co':
Counting objects: 4, done.
Delta compression using up to 80 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 4.92 GiB | 23.09 MiB/s, done.
Total 4 (delta 2), reused 1 (delta 0)
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
Everything up-to-date

OR

GIT_CURL_VERBOSE=1 git push
* Couldn't find host huggingface.co in the .netrc file; using defaults
*   Trying 192.99.39.165...                                 
* TCP_NODELAY set           
* Connected to huggingface.co (192.99.39.165) port 443 (#0)
* found 140 certificates in /etc/ssl/certs/ca-certificates.crt
* found 421 certificates in /etc/ssl/certs
* ALPN, offering http/1.1                         
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_GCM_SHA384
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: huggingface.co (matched)                        
*        server certificate expiration date OK                       
*        server certificate activation date OK               
*        certificate public key: RSA                       
*        certificate version: #3                 
*        subject: CN=huggingface.co                     
*        start date: Tue, 10 Nov 2020 08:05:46 GMT
*        expire date: Mon, 08 Feb 2021 08:05:46 GMT
*        issuer: C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3
*        compression: NULL                          
* ALPN, server accepted to use http/1.1      
> GET /Rostlab/prot_t5_xl_bfd/info/refs?service=git-receive-pack HTTP/1.1
Host: huggingface.co
User-Agent: git/2.17.1                        
Accept: */*      
Accept-Encoding: gzip 
Accept-Language: C, *;q=0.9          
Pragma: no-cache                                     
                   
< HTTP/1.1 401 Unauthorized
< Server: nginx/1.14.2          
< Date: Sat, 14 Nov 2020 23:43:52 GMT
< Content-Type: text/plain; charset=utf-8         
< Content-Length: 12                                                  
< Connection: keep-alive                                             
< X-Powered-By: huggingface-moon                             
< WWW-Authenticate: Basic realm="Authentication required", charset="UTF-8"
< ETag: W/"c-dAuDFQrdjS3hezqxDTNgW7AOlYk"        
<                                                       
* Connection #0 to host huggingface.co left intact
Username for 'https://huggingface.co': agemagician
Password for 'https://agemagician@huggingface.co':
* Couldn't find host huggingface.co in the .netrc file; using defaults
* Found bundle for host huggingface.co: 0x55acdab63f80 [can pipeline]
* Re-using existing connection! (#0) with host huggingface.co
* Connected to huggingface.co (192.99.39.165) port 443 (#0)
* Server auth using Basic with user 'agemagician'
> GET /Rostlab/prot_t5_xl_bfd/info/refs?service=git-receive-pack HTTP/1.1
Host: huggingface.co                 
Authorization: Basic <redacted>
User-Agent: git/2.17.1
Accept: */*
Accept-Encoding: gzip
Accept-Language: C, *;q=0.9
Pragma: no-cache

< HTTP/1.1 200 OK
< Server: nginx/1.14.2
< Date: Sat, 14 Nov 2020 23:43:59 GMT
< Content-Type: application/x-git-receive-pack-advertisement
< Transfer-Encoding: chunked
< Connection: keep-alive
< X-Powered-By: huggingface-moon
<
* Connection #0 to host huggingface.co left intact
Counting objects: 4, done.
Delta compression using up to 80 threads.
Compressing objects: 100% (3/3), done.
* Couldn't find host huggingface.co in the .netrc file; using defaults
* Found bundle for host huggingface.co: 0x55acdab63f80 [can pipeline]
* Re-using existing connection! (#0) with host huggingface.co
* Connected to huggingface.co (192.99.39.165) port 443 (#0)
* Server auth using Basic with user 'agemagician'
> POST /Rostlab/prot_t5_xl_bfd/git-receive-pack HTTP/1.1
Host: huggingface.co
Authorization: Basic <redacted>
User-Agent: git/2.17.1
Content-Type: application/x-git-receive-pack-request
Accept: application/x-git-receive-pack-result
Content-Length: 4

* upload completely sent off: 4 out of 4 bytes
< HTTP/1.1 200 OK
< Server: nginx/1.14.2
< Date: Sat, 14 Nov 2020 23:44:02 GMT
< Content-Type: application/x-git-receive-pack-result
< Content-Length: 0
< Connection: keep-alive
< X-Powered-By: huggingface-moon
<
* Connection #0 to host huggingface.co left intact
* Couldn't find host huggingface.co in the .netrc file; using defaults
* Found bundle for host huggingface.co: 0x55acdab63f80 [can pipeline]
* Re-using existing connection! (#0) with host huggingface.co
* Connected to huggingface.co (192.99.39.165) port 443 (#0)
* Server auth using Basic with user 'xxxxxx'
> POST /Rostlab/prot_t5_xl_bfd/git-receive-pack HTTP/1.1
Host: huggingface.co
Authorization: Basic <redacted>
User-Agent: git/2.17.1
Accept-Encoding: gzip
Content-Type: application/x-git-receive-pack-request
Accept: application/x-git-receive-pack-result
Transfer-Encoding: chunked

Writing objects: 100% (4/4), 4.92 GiB | 23.17 MiB/s, done.
Total 4 (delta 2), reused 1 (delta 0)
* Signaling end of chunked upload via terminating chunk.
* Signaling end of chunked upload via terminating chunk.
* The requested URL returned error: 504 Gateway Time-out
* stopped the pause stream!
* Closing connection 0
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
Everything up-to-date

Expected behavior

I previously had an issue uploading big models like T5 (#7480).
It was marked as solved after the move to model versioning:
#8324
However, @patrickvonplaten fixed some problems in the T5 weights:
#8528
I tried to update the model, but it still doesn't work, as shown above.

I tried several tricks, such as those described in:
https://stackoverflow.com/questions/54061758/error-rpc-failed-http-504-curl-22-the-requested-url-returned-error-504-gatewa
but could not solve it.

Any ideas?

@stefan-it
Collaborator

Hi @agemagician,

I think this is heavily related to #8480 and can only be fixed by the HF team at the moment.

@agemagician
Contributor Author

Yep, @stefan-it, it seems to be a similar issue.

In that case, @julien-c or @patrickvonplaten, could one of you update the ProtT5-XL-BFD model:
https://huggingface.co/Rostlab/prot_t5_xl_bfd

Configuration:
https://www.dropbox.com/s/pchr2vbvckanfu5/config.json?dl=1

PyTorch model:
https://www.dropbox.com/s/2brwq1cvxo116c7/pytorch_model.bin?dl=1

Thanks in advance for your help.

@julien-c
Member

Yes @agemagician, for clarity, I will re-open your initial issue, which is indeed not yet resolved (although we now know how to support it).

@agemagician
Contributor Author

I would be grateful if someone could update the model from the links above until this issue is fixed.

@patrickvonplaten
Contributor

Should be good, I updated tf and pt - lemme know if something didn't work :-)

@agemagician
Contributor Author

Perfect, thanks a lot Patrick. We will test it right away.

Hopefully, the issue will be fixed soon, so I don't have to waste your time again with the 11B model soon.

@patrickvonplaten
Contributor

> Perfect, thanks a lot Patrick. We will test it right away.
>
> Hopefully, the issue will be fixed soon, so I don't have to waste your time again with the 11B model soon.

No worries at all! Feel free to open a new issue -> doesn't take me long at all to upload it :-)

@stale

stale bot commented Jan 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 19, 2021
@julien-c
Member

This can now be closed. Thanks for reporting!

@tuhinjubcse

tuhinjubcse commented Aug 31, 2021

/content/russianpoetrymany# git push
Counting objects: 11, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (11/11), 6.26 GiB | 91.43 MiB/s, done.
Total 11 (delta 0), reused 2 (delta 0)
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
Everything up-to-date


@julien-c @patrickvonplaten I got this error while uploading an mBART model from Google Colab. Any pointers?

@tscholak

tscholak commented Oct 6, 2021

Counting objects: 100% (13/13), done.
Delta compression using up to 16 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 2.55 GiB | 1.03 MiB/s, done.
Total 12 (delta 1), reused 1 (delta 0), pack-reused 0
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504
send-pack: unexpected disconnect while reading sideband packet
fatal: the remote end hung up unexpectedly
Everything up-to-date

Me too

@yucongo

yucongo commented Jan 23, 2022

Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 1.80 GiB | 37.19 MiB/s, done.
Total 3 (delta 0), reused 1 (delta 0)
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
Everything up-to-date

Same here with !git push in Colab. Running !GIT_CURL_VERBOSE=1 git push does not help.

@julien-c
Member

Those are usually transient errors. Can you try again with the flags GIT_CURL_VERBOSE=1 GIT_TRACE=1 and paste the output to a gist if it occurs again?
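
A caution before pasting verbose output anywhere public: the trace can include an Authorization: Basic request header (one appears in the verbose output earlier in this thread), and its value is only the base64 encoding of username:password, not an encrypted secret. The following is a minimal sketch of masking that header before sharing. The helper name redact_auth and the file name push.log are illustrative assumptions, not part of Git.

```shell
# Mask the value of any HTTP Authorization header read from stdin.
# redact_auth is a hypothetical helper name, not a Git command.
redact_auth() {
  # Keep "Authorization: <scheme> " and replace the credential that follows.
  sed -E 's/(Authorization: [A-Za-z]+ ).*/\1<redacted>/'
}

# Intended use (not executed here): trace output goes to stderr, so merge it
# into stdout, mask the header, and write the result to a file for the gist.
#   GIT_CURL_VERBOSE=1 GIT_TRACE=1 git push 2>&1 | redact_auth > push.log
```

It is still worth reading through the saved file once before uploading it, since this sketch masks only the Authorization header and nothing else.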

@jxmorris12

This happened to me too:

❯ git push origin main
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects:  75% (3/4), 2.00 GiB | 2.75 MiB/s
Writing objects: 100% (4/4), 2.22 GiB | 2.86 MiB/s, done.
Total 4 (delta 0), reused 1 (delta 0), pack-reused 0
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504
send-pack: unexpected disconnect while reading sideband packet
fatal: the remote end hung up unexpectedly
Everything up-to-date

@fabianmax

fabianmax commented May 30, 2022

Having the same issue when uploading a ~6GB GPT-J model via command line:

git push
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 10 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 5.11 GiB | 624.00 KiB/s, done.
Total 3 (delta 1), reused 1 (delta 0), pack-reused 0
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504
send-pack: unexpected disconnect while reading sideband packet
fatal: the remote end hung up unexpectedly

It always seems to fail at around 5.11GB (the total file size is 6.32GB). When uploading the same file via the HF website interface, I get an error 400. I tried it from various locations (different Wi-Fi networks), always with the same behavior.

Any advice on how to proceed?

[Screenshot 2022-05-30 at 11 55 56]
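
One thing that may be worth ruling out before retrying, in line with the Git LFS explanation given later in this thread: the push logs above report multi-gigabyte packs under "Writing objects", which is where regular Git objects are counted, so the weights may be going through plain Git rather than LFS. Below is a minimal sketch that lists large working-tree files that .gitattributes does not route through LFS. The function name large_non_lfs_files and the 10 MiB default threshold are assumptions for illustration.

```shell
# List files larger than a threshold (in MiB, default 10) whose "filter"
# attribute is not "lfs". large_non_lfs_files is a hypothetical helper name.
large_non_lfs_files() {
  # Skip the .git directory, then print regular files above the size limit.
  find . -path ./.git -prune -o -type f -size "+${1:-10}M" -print |
  while IFS= read -r f; do
    # git check-attr prints "<path>: filter: <value>" based on .gitattributes.
    case "$(git check-attr filter -- "$f")" in
      *": filter: lfs") ;;         # routed through LFS, nothing to report
      *) printf '%s\n' "$f" ;;     # would be pushed as a regular Git object
    esac
  done
}
```

Run it from the root of the cloned repository, for example as large_non_lfs_files 100. It only inspects the attributes that apply in the working tree; it does not tell you whether a file that is already committed is stored as an LFS pointer.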

@BigSalmon2

I was able to upload it after encountering the error by restarting the runtime and using an earlier version of HuggingFace.

@tzvc

tzvc commented Mar 10, 2023

Running into the same issue here while trying to upload a 1.4 GB dataset:

Enumerating objects: 23985, done.
Counting objects: 100% (23985/23985), done.
Delta compression using up to 4 threads
Compressing objects: 100% (23948/23948), done.
error: RPC failed; HTTP 408 curl 22 The requested URL returned error: 408
send-pack: unexpected disconnect while reading sideband packet
Writing objects: 100% (23949/23949), 1.43 GiB | 4.74 MiB/s, done.
Total 23949 (delta 2), reused 23948 (delta 1), pack-reused 0
fatal: the remote end hung up unexpectedly
Everything up-to-date

@niizam

niizam commented May 6, 2023

How did you all solve this?

@jxmorris12

@sgugger @thomwolf can you reopen this issue?

@ngxson

ngxson commented Feb 8, 2024

For anyone who still couldn't figure it out: it is because the big file is not being tracked by Git LFS. Possible reasons:

  1. You don't have a .gitattributes file
  2. You have a .gitattributes file, but the file extension is not in its list

You can simply ask Git LFS to track your file, for example: git lfs track "*.gguf"
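
The suggestion above can be sketched as two small shell helpers: one that adds a tracking rule and one that confirms a given path is covered by it. The helper names track_with_lfs and is_lfs_tracked are hypothetical; the underlying commands (git lfs install, git lfs track, git check-attr) are standard, and the first helper assumes the git-lfs extension is installed.

```shell
# Hypothetical helper: start tracking a pattern with Git LFS and stage the rule.
# Requires the git-lfs extension to be installed.
track_with_lfs() {
  git lfs install            # one-time setup of the LFS filters and hooks
  git lfs track "$1"         # adds "<pattern> filter=lfs diff=lfs merge=lfs -text" to .gitattributes
  git add .gitattributes     # stage the rule so it is committed with the model files
}

# Hypothetical helper: succeed if .gitattributes routes the given path through LFS.
is_lfs_tracked() {
  # git check-attr prints "<path>: filter: <value>"; keep only the value.
  [ "$(git check-attr filter -- "$1" | sed 's/.*: filter: //')" = "lfs" ]
}
```

For example, track_with_lfs "*.gguf" followed by is_lfs_tracked model.gguf && echo "tracked by LFS" (model.gguf being a placeholder file name). A tracking rule applies to files added after it exists; a file that was already committed as a regular object would need to be re-added or migrated (for instance with git lfs migrate import), which is beyond this sketch.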
