Add support for file write/upload operations with HfHubRepository
#354
transactions or otherwise)
🎉
Added some comments
I'm not familiar with the codebase yet, so my comments are somewhat superficial. Overall this looks reasonable to me. I'll do a second pass after the first batch of comments has been addressed.
🎉
Description
This PR introduces support for uploading files to Hugging Face Hub repositories through the current `RepositoryFile` API. In the case of `fsspec` repositories, we support implicit streaming of data using buffered IO. However, this is not possible with HF repositories due to their use of Git as a backing store. So, we need to allocate and write to files on the host's local storage device before uploading them as a Git commit operation to the remote repo.

The current implementation hides this detail by using a proxy to represent the remote file that we want to write to. The `HfHubFile` class wraps either a locally cached file from a HF repo or a temporary file on the local storage. The user can open and write to the latter like they would with any (`fsspec`) file; its contents will be uploaded to the repo as soon as the file handle is closed.

Furthermore, we also introduce the concept of transactional file writes using a context manager. This lets the user batch multiple file operations that get uploaded to the repo as a single commit. Transaction support for `fsspec` repositories will be added in a follow-up PR.

Other changes:
- `Repository.file` returns a lazily-loaded file, i.e., its existence is only checked when the file is opened using the `open` method (or when the `exists` method is called).
- Fixes a bug in `FsspecRepository.open` that called `unstrip_protocol` on a `None` object.

Checklist
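For reviewers, the upload-on-close proxy and the transactional batching from the description can be pictured with the minimal sketch below. It is not the code in this PR: `FakeRemoteRepo`, `BufferedUploadFile`, and `transaction` are hypothetical names, and the "remote" is an in-memory list of commits standing in for a Git-backed repo. It only illustrates the idea of writing to a local temporary file first and turning the result into one commit per file, or one commit per transaction.

```python
import os
import tempfile
from contextlib import contextmanager


class FakeRemoteRepo:
    """Hypothetical in-memory stand-in for a Git-backed remote repo."""

    def __init__(self):
        self.commits = []  # each commit is a dict of path -> bytes

    def commit(self, files):
        self.commits.append(dict(files))


class BufferedUploadFile:
    """Hypothetical proxy: writes go to a local temp file, upload happens on close."""

    def __init__(self, repo, path, pending=None):
        self._repo = repo
        self._path = path
        self._pending = pending  # shared dict when used inside a transaction
        self._tmp = tempfile.NamedTemporaryFile(delete=False)
        self._closed = False

    def write(self, data):
        return self._tmp.write(data)

    def close(self):
        if self._closed:
            return
        self._closed = True
        self._tmp.close()
        with open(self._tmp.name, "rb") as f:
            content = f.read()
        os.unlink(self._tmp.name)
        if self._pending is not None:
            # Inside a transaction: defer the upload to the final commit.
            self._pending[self._path] = content
        else:
            # Outside a transaction: one commit per closed file.
            self._repo.commit({self._path: content})

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


@contextmanager
def transaction(repo):
    """Batch all files closed inside the block into a single commit."""
    pending = {}
    yield lambda path: BufferedUploadFile(repo, path, pending)
    # Only reached if the block did not raise; then commit everything at once.
    if pending:
        repo.commit(pending)


repo = FakeRemoteRepo()

# A single write becomes its own commit when the handle is closed.
with BufferedUploadFile(repo, "a.txt") as f:
    f.write(b"hello")

# Two writes inside a transaction become one commit.
with transaction(repo) as open_file:
    with open_file("b.txt") as f:
        f.write(b"1")
    with open_file("c.txt") as f:
        f.write(b"2")
```

In this sketch an exception raised inside the `transaction` block skips the commit entirely, which is one plausible way to get all-or-nothing behaviour; whether the PR handles failures the same way is not stated in the description.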