
lfs: only buffer first 1k when creating a CleanPointerError #1852

Merged
merged 2 commits into master from buffer-blob-size-only on Jan 12, 2017

Conversation

ttaylorr
Contributor

This pull request fixes a bug where the entire contents of a malformed pointer would be buffered in memory, as reported in #1851.

Some backstory: in v1.5.4 I merged #1796 (via #1805), which re-added support for streaming the entire contents of a malformed pointer through the filter protocol added in Git v2.11. This involved a change to the signature of DecodeFrom: instead of returning a []byte containing the data it buffered, it now returned an io.Reader that contained the entire contents of the pointer and could be read from by the caller.

To bring the calling code up to speed, I introduced the following line to grab a []byte of the data buffered from reading the pointer:

by, rerr := ioutil.ReadAll(buf)

This is fine for small pointers, but it will exhaust system memory when the malformed pointer is large. The original behavior was to buffer at most the first 1024 bytes (see: lfs.blobSizeCutoff) to return with the NewCleanPointerError, and this pull request brings that back.
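For illustration, here is a minimal, self-contained sketch of the bounded read this pull request restores. The constant mirrors lfs.blobSizeCutoff, and the helper name readPointerPrefix is hypothetical; the real fix lives in copyToTemp():

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// blobSizeCutoff mirrors lfs.blobSizeCutoff: 1024 bytes is the most we
// ever need in order to decide whether the input is already a pointer.
const blobSizeCutoff = 1024

// readPointerPrefix buffers at most blobSizeCutoff bytes from r, unlike
// ioutil.ReadAll, which would slurp an arbitrarily large malformed
// pointer into memory. (Hypothetical helper name; not the verbatim
// git-lfs code.)
func readPointerPrefix(r io.Reader) ([]byte, error) {
	by := make([]byte, blobSizeCutoff)
	n, err := io.ReadFull(r, by)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		// Fewer than blobSizeCutoff bytes were available; that is fine,
		// it just means the blob is small (and possibly a clean pointer).
		err = nil
	}
	return by[:n], err
}

func main() {
	// A 10 MiB malformed "pointer" no longer exhausts memory: only the
	// first 1k is buffered.
	huge := bytes.NewReader(make([]byte, 10<<20))
	by, err := readPointerPrefix(huge)
	fmt.Println(len(by), err) // prints: 1024 <nil>
}
```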

We only need to buffer the first 1k here: when writing the data back via NewCleanPointerError, we're dealing with the case where the file is already a pointer and should be written back wholesale. More information about this code path is in #1851 (comment).

The entire reader is still available if an error was not returned (i.e., if len(by) >= 512), so that it can be copied to the writer (writer) on L94.
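When the buffered prefix fills the whole 1k and the content turns out not to be a clean pointer, that prefix can be stitched back onto the unread remainder and streamed through. Here is a sketch of that pattern, assuming io.MultiReader stitching rather than the verbatim clean.go code on L94:

```go
package main

import (
	"bytes"
	"io"
	"os"
	"strings"
)

// forwardNonPointer writes the already-buffered prefix followed by the
// unread remainder of the stream to w, so no data is lost even though
// only the first blobSizeCutoff bytes were peeked at. (Sketch of the
// pattern; not the verbatim git-lfs code.)
func forwardNonPointer(w io.Writer, prefix []byte, rest io.Reader) (int64, error) {
	return io.Copy(w, io.MultiReader(bytes.NewReader(prefix), rest))
}

func main() {
	src := strings.NewReader("prefix-bytes|the rest of the blob")
	prefix := make([]byte, 13)
	io.ReadFull(src, prefix) // simulate the bounded peek
	forwardNonPointer(os.Stdout, prefix, src)
	// prints: prefix-bytes|the rest of the blob
}
```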

After some 👀 get a chance to take a look at this, I think we should release this patch as v1.5.5 (and hopefully the last patch in the 1.5.x series).


/cc @git-lfs/core #1851

@ttaylorr ttaylorr merged commit faa0f06 into master Jan 12, 2017
@ttaylorr ttaylorr deleted the buffer-blob-size-only branch January 12, 2017 17:19
ttaylorr added a commit that referenced this pull request Jan 12, 2017
Backport #1852 for v1.5.x: lfs: only buffer first 1k when creating a CleanPointerError
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 11, 2021
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 11, 2021
When the "clean" filter processes a pre-existing LFS pointer
blob, the copyToTemp() function returns a CleanPointerError;
it does this when DecodeFrom() successfully reads and parses
the pointer blob and when the entirety of the blob's contents
have been read (i.e., there is no additional blob data to be
read).

This latter check was originally introduced in commit
e09e5e1 in PR git-lfs#271, when
the relevant code was in the Clean() method in the
pointer/clean.go file; it used the same 512-byte maximum
to determine if all the blob content had been read.  This
was aligned with the size of the byte array used in
DecodeFrom() in the pointer/pointer.go file.  The 512-byte
buffer created in DecodeFrom() was adjusted and returned to
Clean(), which then checked its length to see if it had been
populated to the 512-byte maximum, meaning there might still
be additional data to be read.

In commit f58db7f in
PR git-lfs#684 the size of the read buffer in DecodeFrom() was
changed from 512 bytes to the value of blobSizeCutoff, which
was 1024 (and has remained so since).  However, the check in
Clean()'s copyToTemp() function was not changed at the same
time.  (Note that Clean() had been refactored and the check
was in copyToTemp(), where it remains.)

In commit 6f54232 in
PR git-lfs#1796 the DecodeFrom() method was changed to return an
io.Reader instead of a byte array, and copyToTemp() would
then read from that into a byte array using ioutil.ReadAll().
That introduced a bug when handling large malformed pointers,
fixed in commit dcc0581 in
PR git-lfs#1852 by reading into a byte array in copyToTemp() with
blobSizeCutoff length.  However, the check for a resultant
data length of less than 512 bytes still remained in place.

In order to keep this blob size check in sync with the
allocated byte array in copyToTemp(), we therefore change
copyToTemp() so it checks if a pointer was successfully
read and the blob size is below blobSizeCutoff.  This
implies there is no more data to be read and we can
return a CleanPointerError (to signal a clean pointer was
found) containing the byte array, which will then be written
out, leaving the blob's contents unchanged.
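In code, the adjustment described above amounts to something like the following sketch; Pointer, CleanPointerError, and checkCleanPointer are stand-ins modeled on the commit message, not the verbatim git-lfs source:

```go
package main

import (
	"errors"
	"fmt"
)

const blobSizeCutoff = 1024

// Pointer and CleanPointerError are stand-ins modeled on the commit
// message above, not the verbatim git-lfs types.
type Pointer struct{ Oid string }

type CleanPointerError struct {
	Pointer *Pointer // the successfully decoded pointer
	Bytes   []byte   // the buffered blob contents, written back unchanged
}

func (e *CleanPointerError) Error() string { return "clean pointer found" }

// checkCleanPointer sketches the adjusted condition: the pointer parsed
// successfully AND the blob fit below blobSizeCutoff (previously the
// stale 512-byte constant), implying there is no more data to read.
func checkCleanPointer(ptr *Pointer, by []byte) error {
	if ptr != nil && len(by) < blobSizeCutoff {
		return &CleanPointerError{Pointer: ptr, Bytes: by}
	}
	return nil
}

func main() {
	err := checkCleanPointer(&Pointer{Oid: "abc"}, make([]byte, 130))
	var cpe *CleanPointerError
	fmt.Println(errors.As(err, &cpe)) // true: blob is treated as a clean pointer
}
```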
pcal43 pushed a commit to pcal43/git-lfs-hack that referenced this pull request Jul 22, 2021