
lfs: only buffer first 1k when creating a CleanPointerError #1852

Merged
merged 2 commits into master from buffer-blob-size-only on Jan 12, 2017

Conversation

ttaylorr
Contributor

This pull request fixes a bug where the entire contents of a malformed pointer would be buffered in memory, as reported in #1851.

Some backstory: in v1.5.4 I merged #1796 (via #1805), which re-added support for streaming the entire contents of a malformed pointer through the filter protocol added in Git v2.11. This involved a change to the signature of DecodeFrom: instead of returning a []byte containing the data it buffered, it now returned an io.Reader that contained the entire contents of the pointer and could be read from by the caller.

To bring the calling code up to speed, I introduced the following line to grab a []byte of the data buffered from reading the pointer:

by, rerr := ioutil.ReadAll(buf)

This is fine for small pointers, but it will exhaust system memory when the malformed pointer is large. The original behavior was to buffer at most the first 1024 bytes (see: lfs.blobSizeCutoff) to return with the NewCleanPointerError, and this pull request brings that back.
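For illustration, here is a minimal, self-contained sketch of the bounded read this pull request restores. The constant mirrors lfs.blobSizeCutoff, and the helper name readPointerPrefix is hypothetical; the real fix lives in copyToTemp():

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// blobSizeCutoff mirrors lfs.blobSizeCutoff: 1024 bytes is the most we
// ever need in order to decide whether the input is already a pointer.
const blobSizeCutoff = 1024

// readPointerPrefix buffers at most blobSizeCutoff bytes from r, unlike
// ioutil.ReadAll, which would slurp an arbitrarily large malformed
// pointer into memory. (Hypothetical helper name; not the verbatim
// git-lfs code.)
func readPointerPrefix(r io.Reader) ([]byte, error) {
	by := make([]byte, blobSizeCutoff)
	n, err := io.ReadFull(r, by)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		// Fewer than blobSizeCutoff bytes were available; that is fine,
		// it just means the blob is small (and possibly a clean pointer).
		err = nil
	}
	return by[:n], err
}

func main() {
	// A 10 MiB malformed "pointer" no longer exhausts memory: only the
	// first 1k is buffered.
	huge := bytes.NewReader(make([]byte, 10<<20))
	by, err := readPointerPrefix(huge)
	fmt.Println(len(by), err) // prints: 1024 <nil>
}
```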

We only need to buffer the first 1k here: when writing the data back via NewCleanPointerError, we're dealing with the case where the file is already a pointer and should be written back wholesale. More information about this code path is in #1851 (comment).

The entire reader is still available if an error was not returned (i.e., if len(by) >= 512), so that it can be copied to the writer (writer) on L94.
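When the buffered prefix fills the whole 1k and the content turns out not to be a clean pointer, that prefix can be stitched back onto the unread remainder and streamed through. Here is a sketch of that pattern, assuming io.MultiReader stitching rather than the verbatim clean.go code on L94:

```go
package main

import (
	"bytes"
	"io"
	"os"
	"strings"
)

// forwardNonPointer writes the already-buffered prefix followed by the
// unread remainder of the stream to w, so no data is lost even though
// only the first blobSizeCutoff bytes were peeked at. (Sketch of the
// pattern; not the verbatim git-lfs code.)
func forwardNonPointer(w io.Writer, prefix []byte, rest io.Reader) (int64, error) {
	return io.Copy(w, io.MultiReader(bytes.NewReader(prefix), rest))
}

func main() {
	src := strings.NewReader("prefix-bytes|the rest of the blob")
	prefix := make([]byte, 13)
	io.ReadFull(src, prefix) // simulate the bounded peek
	forwardNonPointer(os.Stdout, prefix, src)
	// prints: prefix-bytes|the rest of the blob
}
```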

After some 👀 get a chance to take a look at this, I think we should release this patch as v1.5.5 (and hopefully the last patch in the 1.5.x series).


/cc @git-lfs/core #1851

@ttaylorr ttaylorr merged commit faa0f06 into master Jan 12, 2017
@ttaylorr ttaylorr deleted the buffer-blob-size-only branch January 12, 2017 17:19
ttaylorr added a commit that referenced this pull request Jan 12, 2017
Backport #1852 for v1.5.x: lfs: only buffer first 1k when creating a CleanPointerError
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 11, 2021
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 11, 2021
When the "clean" filter processes a pre-existing LFS pointer
blob, the copyToTemp() function returns a CleanPointerError;
it does this when DecodeFrom() successfully reads and parses
the pointer blob and when the entirety of the blob's contents
have been read (i.e., there is no additional blob data to be
read).

This latter check was originally introduced in commit
e09e5e1 in PR git-lfs#271, when
the relevant code was in the Clean() method in the
pointer/clean.go file; it used the same 512-byte maximum
to determine if all the blob content had been read.  This
was aligned with the size of the byte array used in
DecodeFrom() in the pointer/pointer.go file.  The 512-byte
buffer created in DecodeFrom() was adjusted and returned to
Clean(), which then checked its length to see if it had been
populated to the 512-byte maximum, meaning there might still
be additional data to be read.

In commit f58db7f in
PR git-lfs#684 the size of the read buffer in DecodeFrom() was
changed from 512 bytes to the value of blobSizeCutoff, which
was 1024 (and has remained so since).  However, the check in
Clean()'s copyToTemp() function was not changed at the same
time.  (Note that Clean() had been refactored and the check
was in copyToTemp(), where it remains.)

In commit 6f54232 in
PR git-lfs#1796 the DecodeFrom() method was changed to return an
io.Reader instead of a byte array, and copyToTemp() would
then read from that into a byte array using ioutil.ReadAll().
That introduced a bug when handling large malformed pointers,
fixed in commit dcc0581 in
PR git-lfs#1852 by reading into a byte array in copyToTemp() with
blobSizeCutoff length.  However, the check for a resultant
data length of less than 512 bytes still remained in place.

In order to keep this blob size check in sync with the
allocated byte array in copyToTemp(), we therefore change
copyToTemp() so it checks if a pointer was successfully
read and the blob size is below blobSizeCutoff.  This
implies there is no more data to be read and we can
return a CleanPointerError (to signal a clean pointer was
found) containing the byte array, which will then be written
out, leaving the blob's contents unchanged.
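In code, the adjustment described above amounts to something like the following sketch; Pointer, CleanPointerError, and checkCleanPointer are stand-ins modeled on the commit message, not the verbatim git-lfs source:

```go
package main

import (
	"errors"
	"fmt"
)

const blobSizeCutoff = 1024

// Pointer and CleanPointerError are stand-ins modeled on the commit
// message above, not the verbatim git-lfs types.
type Pointer struct{ Oid string }

type CleanPointerError struct {
	Pointer *Pointer // the successfully decoded pointer
	Bytes   []byte   // the buffered blob contents, written back unchanged
}

func (e *CleanPointerError) Error() string { return "clean pointer found" }

// checkCleanPointer sketches the adjusted condition: the pointer parsed
// successfully AND the blob fit below blobSizeCutoff (previously the
// stale 512-byte constant), implying there is no more data to read.
func checkCleanPointer(ptr *Pointer, by []byte) error {
	if ptr != nil && len(by) < blobSizeCutoff {
		return &CleanPointerError{Pointer: ptr, Bytes: by}
	}
	return nil
}

func main() {
	err := checkCleanPointer(&Pointer{Oid: "abc"}, make([]byte, 130))
	var cpe *CleanPointerError
	fmt.Println(errors.As(err, &cpe)) // true: blob is treated as a clean pointer
}
```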
pcal43 pushed a commit to pcal43/git-lfs-hack that referenced this pull request Jul 22, 2021