
Upload models using transformers-cli fails #7480

Closed
2 of 4 tasks
agemagician opened this issue Sep 30, 2020 · 14 comments · Fixed by #8324
@agemagician
Contributor

Environment info

  • transformers version: 3.0.2
  • Platform: Linux-4.15.0-112-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.6.0 (False)
  • Tensorflow version (GPU?): 2.3.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Model Cards: @julien-c
T5: @patrickvonplaten

Information

Model I am using: T5

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Command:
transformers-cli upload ./prot_t5_xl_bfd/ --organization Rostlab

Error:

About to upload file /mnt/lsf-nas-1/lsf/job/repo/elnaggar/prot-transformers/models/transformers/prot_t5_xl_bfd/pytorch_model.bin to S3 under filename prot_t5_xl_bfd/pytorch_model.bin and namespace Rostlab
Proceed? [Y/n] y                                                                                                                                                                                          
Uploading... This might take a while if files are large                                                                                                                                                   
  0%|▌                                                                                                                                               | 48242688/11276091454 [00:02<14:55, 12534308.31it/s]
Traceback (most recent call last):                                                                                                                                                                        
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen                                               
    httplib_response = self._make_request(                                                                                                                                                                
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request                                         
    conn.request(method, url, **httplib_request_kw)                                                                                                                                                       
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1255, in request                                                                       
    self._send_request(method, url, body, headers, encode_chunked)                                                                                                                                        
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1301, in _send_request                                                                 
    self.endheaders(body, encode_chunked=encode_chunked)                                                                                                                                                  
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1250, in endheaders                                                                    
    self._send_output(message_body, encode_chunked=encode_chunked)                                                                                                                                        
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1049, in _send_output                                                                  
    self.send(chunk)                                                                                                                                                                                      
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 971, in send                                                                           
    self.sock.sendall(data)                                                                                                                                                                               
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/ssl.py", line 1204, in sendall                                                                               
    v = self.send(byte_view[count:])                                                                                                                                                                      
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/ssl.py", line 1173, in send                                                                                  
    return self._sslobj.write(data)                                                                                                                                                                       
BrokenPipeError: [Errno 32] Broken pipe        

Traceback (most recent call last):                                                                                                                                                                        
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/requests/adapters.py", line 439, in send                                                       
    resp = conn.urlopen(                                                                                                                                                                                  
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen                                               
    retries = retries.increment(                                                                                                                                                                          
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/util/retry.py", line 403, in increment                                                 
    raise six.reraise(type(error), error, _stacktrace)                                                                                                                                                    
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise                                                 
    raise value.with_traceback(tb)                                                                                                                                                                        
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen                                               
    httplib_response = self._make_request(                                                                                                                                                                
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request                                         
    conn.request(method, url, **httplib_request_kw)                                                                                                                                                       
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1255, in request                                                                       
    self._send_request(method, url, body, headers, encode_chunked)                                                                                                                                        
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1301, in _send_request                                                                 
    self.endheaders(body, encode_chunked=encode_chunked)                                                                                                                                                  
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 1049, in _send_output
    self.send(chunk)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/http/client.py", line 971, in send
    self.sock.sendall(data)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/ssl.py", line 1173, in send
    return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/bin/transformers-cli", line 8, in <module>
    sys.exit(main())
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/transformers/commands/transformers_cli.py", line 33, in main
    service.run()
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/transformers/commands/user.py", line 232, in run
    access_url = self._api.presign_and_upload(
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/transformers/hf_api.py", line 167, in presign_and_upload
    r = requests.put(urls.write, data=data, headers={"content-type": urls.type})
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/requests/api.py", line 134, in put
    return request('put', url, data=data, **kwargs)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/mnt/lsf-nas-1/lsf/job/repo/elnaggar/anaconda3/envs/transformers_covid/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))

Expected behavior

I am trying to upload our T5-3B model using transformers-cli, but it always fails with a "BrokenPipeError".
It successfully uploads small files such as configuration files, but fails on the large model files.
I have tried two different machines, and both give the same error.
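
For context, the traceback shows that presign_and_upload issues a single requests.put for the entire file, so any connection hiccup on an ~11 GB transfer aborts the whole upload with a broken pipe. A minimal retry sketch of how such a one-shot upload could be made more resilient (with_retries and the flaky_put simulation are my own illustrations, not part of transformers):

```python
import time

def with_retries(fn, max_retries=3, base_delay=0.0):
    """Call fn(); retry on connection-level failures with exponential backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except (BrokenPipeError, ConnectionError):
            if attempt == max_retries:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Simulate an upload that breaks twice before succeeding:
calls = {"n": 0}

def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise BrokenPipeError(32, "Broken pipe")
    return "200 OK"

result = with_retries(flaky_put)
print(result, calls["n"])  # → 200 OK 3
```

In a real upload the retried callable would have to reopen the file handle each attempt, since a stream that died mid-transfer cannot be resumed from the same position.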

@julien-c
Member

Yes, this is a known issue with our current system; it will be fixed in ~1 month.

In the meantime, if you can upload to a different S3 bucket I can cp the files to your account on ours. Would you be able to do this?

@agemagician
Contributor Author

agemagician commented Sep 30, 2020

I don't have access to S3. However, I uploaded the model to my Dropbox:
https://www.dropbox.com/sh/0e7weo5l6g1uvqi/AADBZN_vuawdR3YOUOzZRo8Pa?dl=0

Is it possible to download it from the Dropbox folder and upload it?

@patrickvonplaten
Contributor

Super, I'll take care of it!

@patrickvonplaten patrickvonplaten self-assigned this Sep 30, 2020
@patrickvonplaten
Contributor

The model is uploaded here: https://huggingface.co/Rostlab/prot_t5_xl_bfd

@agemagician
Contributor Author

Perfect, thanks a lot @patrickvonplaten for your help.
This solves my issue 😄

I will test the model to make sure everything is working as expected.

Should we close this issue as it solved my current problem, or should we leave it open until the "transformers-cli" uploading problem is solved?

I will leave it to you.

@patrickvonplaten
Contributor

Let's leave it open :-)

@chambliss

chambliss commented Oct 10, 2020

Hi! I'm having an issue uploading a model as well. I've tried several different iterations of the CLI command to get it to work. I'm following the instructions from the model sharing docs.

Here's the info about my setup:

  • transformers version: 3.3.1
  • Platform: Ubuntu (it's a Google Cloud Platform VM)
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.4.0 (True)
  • Tensorflow version (GPU?): 2.3.1 (True)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

First, I tried transformers-cli upload distilbert-for-food-extraction, as it says to do in the docs. This fails because for some reason the directory is not found, even though ls distilbert-for-food-extraction confirms that the directory and its files exist in this location.

(hf-nlp) charlenechambliss@charlene-gpu:~/.cache/food-ner/models$ transformers-cli upload chambliss/distilbert-for-food-extraction
2020-10-10 21:43:16.899194: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "/home/charlenechambliss/anaconda3/envs/hf-nlp/bin/transformers-cli", line 8, in <module>
    sys.exit(main())
  File "/home/charlenechambliss/anaconda3/envs/hf-nlp/lib/python3.8/site-packages/transformers/commands/transformers_cli.py", line 33, in main
    service.run()
  File "/home/charlenechambliss/anaconda3/envs/hf-nlp/lib/python3.8/site-packages/transformers/commands/user.py", line 197, in run
    files = self.walk_dir(rel_path)
  File "/home/charlenechambliss/anaconda3/envs/hf-nlp/lib/python3.8/site-packages/transformers/commands/user.py", line 180, in walk_dir
    entries: List[os.DirEntry] = list(os.scandir(rel_path))
FileNotFoundError: [Errno 2] No such file or directory: 'distilbert-for-food-extraction'
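
The FileNotFoundError here is consistent with os.scandir receiving a relative path, which resolves against the current working directory rather than any fixed location. A small sketch reproducing that behavior (the directory names are hypothetical, chosen to mirror the layout above):

```python
import os
import tempfile

# A relative path passed to os.scandir resolves against the current working
# directory, so the same command succeeds or fails depending on where it is
# run from.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "models", "my-model"))

old_cwd = os.getcwd()
try:
    os.chdir(os.path.join(root, "models"))
    visible = [e.name for e in os.scandir(".")]   # "my-model" is visible here
    os.chdir(root)
    try:
        list(os.scandir("my-model"))              # no "my-model" directly under root/
        missing = False
    except FileNotFoundError:
        missing = True
finally:
    os.chdir(old_cwd)

print(visible, missing)  # → ['my-model'] True
```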

Then I tried nesting it under a directory matching my HuggingFace username, so the path is now chambliss/distilbert-for-food-extraction. Attempting the upload again results in 3 out of 6 files being uploaded before the process aborts. Here is the full output I'm getting:

(hf-nlp) charlenechambliss@charlene-gpu:~/.cache/food-ner/models$ transformers-cli upload chambliss
2020-10-10 21:43:28.932647: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
About to upload file /home/charlenechambliss/.cache/food-ner/models/chambliss/distilbert-for-food-extraction/special_tokens_map.json to S3 under filename chambliss/distilbert-for-food-extraction/special_tokens_map.json and namespace chambliss
About to upload file /home/charlenechambliss/.cache/food-ner/models/chambliss/distilbert-for-food-extraction/vocab.txt to S3 under filename chambliss/distilbert-for-food-extraction/vocab.txt and namespace chambliss
About to upload file /home/charlenechambliss/.cache/food-ner/models/chambliss/distilbert-for-food-extraction/pytorch_model.bin to S3 under filename chambliss/distilbert-for-food-extraction/pytorch_model.bin and namespace chambliss
About to upload file /home/charlenechambliss/.cache/food-ner/models/chambliss/distilbert-for-food-extraction/config.json to S3 under filename chambliss/distilbert-for-food-extraction/config.json and namespace chambliss
About to upload file /home/charlenechambliss/.cache/food-ner/models/chambliss/distilbert-for-food-extraction/tokenizer_config.json to S3 under filename chambliss/distilbert-for-food-extraction/tokenizer_config.json and namespace chambliss
About to upload file /home/charlenechambliss/.cache/food-ner/models/chambliss/distilbert-for-food-extraction/tf_model.h5 to S3 under filename chambliss/distilbert-for-food-extraction/tf_model.h5 and namespace chambliss
Proceed? [Y/n] Y
Uploading... This might take a while if files are large
Your file now lives at:                                                                                       
https://s3.amazonaws.com/models.huggingface.co/bert/chambliss/chambliss/distilbert-for-food-extraction/special_tokens_map.json
Your file now lives at:                                                                                       
https://s3.amazonaws.com/models.huggingface.co/bert/chambliss/chambliss/distilbert-for-food-extraction/vocab.txt
Your file now lives at:                                                                                       
https://s3.amazonaws.com/models.huggingface.co/bert/chambliss/chambliss/distilbert-for-food-extraction/pytorch_model.bin
400 Client Error: Bad Request for url: https://huggingface.co/api/presign
Filename invalid, model must be at exactly one level of nesting, i.e. "user/model_name".
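
For what it's worth, the 400 error suggests the presign endpoint validates the nesting depth of the uploaded filename. A hedged reconstruction of that check (valid_upload_filename is my own name and my guess at the rule; the actual server code is not shown anywhere in this thread):

```python
def valid_upload_filename(filename: str) -> bool:
    """Guess at the server-side rule, inferred from the 400 error above:
    the namespace ("user") is added server-side, so the filename the CLI
    sends should be "model_name/<file>" -- exactly one slash deep."""
    return filename.count("/") == 1

# Running the upload from one directory too high prefixes an extra
# "chambliss/", producing a doubly nested path that the API rejects:
print(valid_upload_filename("distilbert-for-food-extraction/vocab.txt"))            # → True
print(valid_upload_filename("chambliss/distilbert-for-food-extraction/vocab.txt"))  # → False
```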

If there is not a fix available for this at the moment, would it be possible to have my model uploaded via Dropbox as well?

Thanks!
Charlene

@patrickvonplaten
Contributor

Hey @chambliss - it looks like you are uploading the wrong folder. Instead of running

~/.cache/food-ner/models$ transformers-cli upload chambliss

you should run

~/.cache/food-ner/models/chambliss$ transformers-cli upload distilbert-for-food-extraction

I think

@julien-c
Member

I'll second that. If ls distilbert-for-food-extraction works and shows the correct files, transformers-cli upload distilbert-for-food-extraction should work and would be able to find the correct directory.

@chambliss

@patrickvonplaten @julien-c Thanks for the response guys! I'm not sure why the directory wasn't found the first time, but I tried it again just now (from inside the /chambliss directory, so ~/.cache/food-ner/models/chambliss$ transformers-cli upload distilbert-for-food-extraction, as suggested) and it worked.

As a user, it is a little confusing that a valid reference to the correct directory does not work, and that you have to be exactly one level above the directory for the upload to succeed. The example given on the page (transformers-cli upload path/to/awesome-name-you-picked/) implies that you can run the upload from anywhere relative to the folder. If that is a constraint, it may be worth updating the docs to reflect it.

Thanks again for the help!

@julien-c
Member

No, it is indeed supposed to work as you describe: specifying the dir from any point in your filesystem.

Let us know if that's not the case.

@julien-c julien-c linked a pull request Nov 6, 2020 that will close this issue
@julien-c
Member

Will reopen this for clarity until the fix mentioned in #8480 (comment) is deployed.

@julien-c julien-c reopened this Nov 16, 2020
@stale

stale bot commented Jan 16, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 16, 2021
@julien-c
Member

Ok, closing this for real now! 😎
