Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transformers-cli: LFS multipart uploads (> 5GB) #8663

Merged
merged 26 commits into from
Dec 7, 2020
Merged

Conversation

julien-c
Copy link
Member

@julien-c julien-c commented Nov 19, 2020

Implementation of a custom transfer agent for the transfer type "multipart" for git-lfs. This lets users upload large files >5GB 🔥.

Spec for LFS custom transfer agent is: https://github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md

The PR introduces two commands to the CLI:

transformers-cli lfs-enable-largefiles ./path/to/repo

^ Do this once per model repo where you want to push >5GB files. It's documented in the error message you get if you just try to git push a 5GB file without having enabled it before.

transformers-cli lfs-multipart-upload

^ is the custom transfer agent itself. This is not meant to be called by the user, but by lfs directly.

Things to experiment with:

Experiment:

time git clone https://huggingface.co/t5-3b
cd t5-3b
git remote set-url origin https://huggingface.co/$USER/t5-3b-clone
# ^ After having created this model repo in your account
transformers-cli lfs-enable-largefiles .

git reset 5e0a32db352b33091ea9fb2f8d8782d47a505986 # go back to initial commit for lfs to reupload files
git add pytorch_model.bin
git commit -m "ok"
time git push

@julien-c julien-c linked an issue Nov 19, 2020 that may be closed by this pull request
@julien-c julien-c marked this pull request as ready for review November 20, 2020 15:06
@mymusise
Copy link
Contributor

👍 nice job!

def run(self):
local_path = os.path.abspath(self.args.path)
if not os.path.isdir(local_path):
print("This does not look like a valid git repo")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to raise an error?

return data

def __iter__(self):
yield self.read(n=4 * 1024 * 1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 4 * 1024 * 1024?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic value from @madlag

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!
On the nit side, since I'm a very annoying person, I'd very much like if we could consistently use f-strings which are prettier and faster.

@@ -0,0 +1,219 @@
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a copyright at the top?

Comment on lines +11 to +16
``` [lfs "customtransfer.multipart"]

path = /path/to/transformers/.env/bin/python

args = -m debugpy --listen 5678 --wait-for-client /path/to/transformers/src/transformers/commands/transformers_cli.py
lfs-multipart-upload ```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is weirdly formatted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep – is there any way to tell our formatting tooling to not change lines which represent code or structured stuff?



class LfsCommands(BaseTransformersCLICommand):
# Implementation of a custom transfer agent for the transfer type "multipart" for git-lfs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

src/transformers/commands/lfs.py Outdated Show resolved Hide resolved
src/transformers/hf_api.py Outdated Show resolved Hide resolved
src/transformers/hf_api.py Outdated Show resolved Hide resolved
Comment on lines 164 to 166
assert filetype in self.ALLOWED_S3_FILE_TYPES, "Please specify filetype from {}".format(
self.ALLOWED_S3_FILE_TYPES
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Comment on lines 188 to 191
assert filetype in self.ALLOWED_S3_FILE_TYPES, "Please specify filetype from {}".format(
self.ALLOWED_S3_FILE_TYPES
)
path = "{}/api/{}/listObjs".format(self.endpoint, filetype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Comment on lines 204 to 207
assert filetype in self.ALLOWED_S3_FILE_TYPES, "Please specify filetype from {}".format(
self.ALLOWED_S3_FILE_TYPES
)
path = "{}/api/{}/deleteObj".format(self.endpoint, filetype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

"""
HuggingFace S3-based system, used for datasets and metrics.

Get a presigned url, then upload file to S3.

Outputs: url: Read-only url for the stored file on S3.
"""
urls = self.presign(token, filename=filename, organization=organization)
assert filetype in self.ALLOWED_S3_FILE_TYPES, "Please specify filetype from {}".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) capital letter for "ALLOWED_S3_FILE_TYPES" for class attribute is a bit weird for me. I'd prefer allowed_s3_file_types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backported from code in datasets so staying consistent

"""
HuggingFace git-based system, used for models.

Call HF API to create a whole repo.

Params:
lfsmultipartthresh: Optional: internal param for testing purposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change lfsmultipartthresh to lfs_multi_part_thresh? => very hard to read :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd rather keep the consistency with the server side on that one

@@ -73,29 +82,35 @@ def test_whoami(self):

def test_presign_invalid_org(self):
with self.assertRaises(HTTPError):
_ = self._api.presign(token=self._token, filename="nested/fake_org.txt", organization="fake")
_ = self._api.presign(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) no need for _ = => just self._api.presign(...) i good I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of nits. Thanks a lot for the PR! Also cc @agemagician

@agemagician
Copy link
Contributor

Thanks a lot for the PR, and thanks for letting me know.

I will give my feedback after testing it with the 11B model soon.

@mymusise
Copy link
Contributor

Hi, thanks a lot for the PR.
I try to reinstall transformers with this new branch and follow with this comment:

git clone https://huggingface.co/mymusise/CPM-Third-Party
cd CPM-Third-Party
git lfs install 
transformers-cli lfs-enable-largefiles .

cp ../models/tf_model.h5 ./
git add . && git commit -m 'add model'
git push

Then, after half an hour later, it raised an error:

$ git push
Git LFS: (0 of 1 files) 0 B / 9.68 GB
Git LFS: (0 of 1 files)  4.66 GB / 9.68 GB
EOFoading LFS objects:   0% (0/1), 9.31 B / 9.68 GB
error: failed to push some refs to 'https://huggingface.co/mymusise/CPM-Third-Party'

Did I do something wrong?

@julien-c
Copy link
Member Author

@mymusise might have been an intermittent server error. Can you try again?

@mymusise
Copy link
Contributor

@mymusise might have been an intermittent server error. Can you try again?

Yes, I try again and again. But it always raises this error at 9.31 GB / 9.68 GB.

👀 But, now git push will return a 502 error :

fatal: unable to access 'https://huggingface.co/mymusise/CPM-Third-Party/': The requested URL returned error: 502

@julien-c
Copy link
Member Author

@mymusise Yes, looks like this is crashing/locking the server 😱

Do you mind trying again on Monday? As we'll have more bandwidth to fix then. Sorry about that :/

@mymusise
Copy link
Contributor

It's ok, guy, haha. Waiting for your good news.
Happy weekend!

@mymusise
Copy link
Contributor

Yes, I try again and again. But it always raises this error at 9.31 GB / 9.68 GB.

Hi, I try again today and push the big model file without any exception! 🎉
Thank guys!

@mymusise
Copy link
Contributor

mymusise commented Nov 24, 2020

Hi, here I push another model file(4.9GB) again, but this time it gives me a 504 Gateway Time-out error 😓

root@iZt4n9z4x3ph9oc3hhrdneZ:~/CPM-FP16-Third-Party# time git push 
Username for 'https://huggingface.co': mymusise
Password for 'https://mymusise@huggingface.co': 
Counting objects: 3, done.
Compressing objects: 100% (2/2), done.
Writing objects:  66% (2/3), 2.52 GiB | 3.62 MiB/s      
Writing objects: 100% (3/3), 4.46 GiB | 3.77 MiB/s, done.
Total 3 (delta 0), reused 1 (delta 0)
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
Everything up-to-date

real	22m8.653s
user	0m29.758s
sys	0m17.136s

Seems the big file is uploaded completely, it looks like there is some problem with the server configuration about the timeout. @julien-c

Tried three times with the same result.

@patrickvonplaten
Copy link
Contributor

@Pierrci - I ran some tests on ~10 GB files (t5-3b) and didn't encounter any problems! However when doing it for ~45GB files (t5-11b), I encounter some problems.

cd t5-11b-repo
transformers-cli lfs-enable-largefiles /path/to/repo
git add .
git commit -m "add"
git push # <= this command failse

In case it is useful, here is a link to the git trace error message: https://github.com/patrickvonplaten/files_to_link_to/blob/master/output.txt

I can always go back to the way of manually uploading to the git-lfs hash path. Maybe the error message is helpful though :-)

@Pierrci
Copy link
Member

Pierrci commented Nov 24, 2020

Thanks @patrickvonplaten, I see where it might be coming from, gonna look at it now!

@Pierrci
Copy link
Member

Pierrci commented Nov 25, 2020

I deployed a fix that should address your problem, can you try again @patrickvonplaten?

@mymusise Is it possible for you to try again with GIT_CURL_VERBOSE=1 git push so that we can try to get more information? From what you shared so far, your error seems to be different from Patrick's one.

@mymusise
Copy link
Contributor

Thanks, @Pierrci. Yes, I think my error is different from Patrick's one. Here is the information with GIT_CURL_VERBOSE=1 git push.
Hope it help.

@patrickvonplaten
Copy link
Contributor

I deployed a fix that should address your problem, can you try again @patrickvonplaten?

@mymusise Is it possible for you to try again with GIT_CURL_VERBOSE=1 git push so that we can try to get more information? From what you shared so far, your error seems to be different from Patrick's one.

Awesome it works now @Pierrci - thanks! :-)

@Pierrci
Copy link
Member

Pierrci commented Nov 30, 2020

Thanks, @Pierrci. Yes, I think my error is different from Patrick's one. Here is the information with GIT_CURL_VERBOSE=1 git push.
Hope it help.

@mymusise Are you sure LFS is properly installed and configured for the repo? From your logs it seems your git push command isn't doing any LFS work (like running a pre-push hook or calling our LFS endpoint), trying instead to push all the files through the classic git endpoint, which can't work.

julien-c and others added 2 commits December 7, 2020 22:11
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
@julien-c julien-c merged commit 28fa014 into master Dec 7, 2020
@julien-c julien-c deleted the lfs-multipart-uploads branch December 7, 2020 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't upload the larger model file(9GB)
7 participants