Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements 'Repository.create_blob_fromiobase'. #490

Merged
merged 2 commits into from
Oct 25, 2015

Conversation

frutiger
Copy link

This commit allows blobs to be constructed from implementatons of the 'RawIOBase' ABC. This allows any conformant stream implementation to be added as a blob.

This is useful in the case where the contents of the blob would be too large to create from memory (as 'create_blob_fromstring' allows) and avoids having to write the data to disk (as 'create_blob_fromworkdir' allows). The latter operation is especially useful when reading data from a network socket, since it avoids having to first commit all the data to disk.

A failing test case was added which is fixed by this commit.

This commit allows blobs to be constructed from implementatons of the
'RawIOBase' ABC.  This allows any conformant stream implementation to be added
as a blob.

This is useful in the case where the contents of the blob would be too large to
create from memory (as 'create_blob_fromstring' allows) and avoids having to
write the data to disk (as 'create_blob_fromworkdir' allows).  The latter
operation is especially useful when reading data from a network socket, since
it avoids having to first commit all the data to disk.
@carlosmn
Copy link
Member

Note that this does not remove the need to have to write data to disc before writing it into the object database. That step is simply behind the _fromchunks() function call.

@frutiger
Copy link
Author

Indeed - but it only needs to write it to disk once. Using fromworkdir involves writing it to disk twice: first to the working directory, and then to the objects database.

@frutiger
Copy link
Author

I whipped this together as a proof of concept (I'm relatively new to Python/C bindings). But I had the chance to sleep on it, and perhaps it would be cleaner to expose Blob_create_from_callback and then expose a pure Python implementation of Repository.create_blob_fromiobase that uses Blob_create_from_callback in its implementation. What do you think?

@carlosmn
Copy link
Member

Using fromworkdir involves writing it to disk twice: first to the working directory, and then to the objects database.

Using _fromchunks() is the same. First we have to write to a temporary file and then to the object database (they both use the same function to perform the work).

It is not possible to write create an object without knowing the size ahead of time. Anything which does not ask for the size beforehand (or for a full buffer) must store the data somewhere else before writing to the odb.

Python already has ways of making arbitrary sources look like a buffer/stream, so I'd rather use a readable stream interface rather than callbacks.

@jdavid
Copy link
Member

jdavid commented Mar 5, 2015

From the discussion I understand you cannot save writing the data to disk.

I have made some comments about the code, but not exhaustively, you need to thoroughly check errors and possible return values.

When @carlosmn says I'd rather use a readable stream interface rather than callbacks I understand he prefers the original proposal in this PR, and not Blob_create_from_callback. And I agree. Just fix the programming errors and it is good to me for merging.

@frutiger
Copy link
Author

frutiger commented Mar 8, 2015

@jdavid I addressed the review comments.

However I looked at libgit2 after @carlosmn's comments, and it does indeed create a temporary file. As a result, I'm not so sure about the relevance of this pull request. I will leave it to you guys to decide whether or not it should be merged, and happily accept a refusal.

@jdavid
Copy link
Member

jdavid commented Mar 8, 2015

Even if this does not speed up anything, it is still a nice API addition to be able to create a blob from a file like object.

But the unit tests are not passing with Python 2, that needs to be fixed.

@jdavid
Copy link
Member

jdavid commented Mar 8, 2015

Before I said supporting IOBase is enough, and that's right. But you may want to make this PR even better by supporting Python 2 files. Thought these files do not inherit from IOBase, and they don't have the readable method. If you wish to make this change, then the method should be renamed to the more general create_from_file. Otherwise, you can leave the PR the way it is, supporting Python 2 files may be the object of a future pull request.

@frutiger
Copy link
Author

frutiger commented Mar 9, 2015

@jdavid thanks for the kind feedback.

Looking at the link failure in the tests (and based on my extremely limited experience with Python/C interaction), is it even possible to implement it with this strategy in a way that is compatible with Python 2 and Python 3? Reading through the documentation implies this should rather be implemented at the python layer.

In fact, it would be far simpler to implement a create_blob_fromfile that worked correctly with file-like objects on Python 2 and IOBase implementations on Python 3 by doing most of the work in python code.

@jdavid
Copy link
Member

jdavid commented Mar 13, 2015

Okey, let make it simple:

  • Rename the method to create_blob_fromfile
  • Drop the type check, since this does not work on Python 2 (and pypy), we will stick to duck-typing by calling readable()
  • Change METH_VARARGS to METH_O
  • Add a TODO comment to the source, something like TODO Support Python 2 files
  • Add a TODO comment to the source, something like TODO Type check PyIOBase_Type (only works on Python 3)

Thanks!

@jdavid jdavid merged commit 38b1975 into libgit2:master Oct 25, 2015
@frutiger frutiger deleted the blob_from_iobase branch October 26, 2015 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants