lfs: only buffer first 1k when creating a CleanPointerError #1852
Merged
Conversation
technoweenie approved these changes on Jan 12, 2017
ttaylorr added a commit that referenced this pull request on Jan 12, 2017
ttaylorr added a commit that referenced this pull request on Jan 12, 2017:
Backport #1852 for v1.5.x: lfs: only buffer first 1k when creating a CleanPointerError
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request on Mar 11, 2021:
When the "clean" filter processes a pre-existing LFS pointer blob, the copyToTemp() function returns a CleanPointerError; it does this when DecodeFrom() successfully reads and parses the pointer blob and when the entirety of the blob's contents have been read (i.e., there is no additional blob data to be read). This latter check was originally introduced in commit e09e5e1 in PR git-lfs#271, when the relevant code was in the Clean() method in the pointer/clean.go file; it used the same 512 byte maximum to determine if all the blob content had been read. This was aligned with the size of the byte array used in DecodeFrom() in the pointer/pointer.go file. The 512-byte buffer created in DecodeFrom() was returned to Clean(), which then checked its length to see if it had been fully read or if there might still be additional data. In commit f58db7f in PR git-lfs#684 the size of the read buffer in DecodeFrom() was changed from 512 bytes to the value of blobSizeCutoff, which was 1024 (and remains so since). However, the check in Clean()'s copyToTemp() function was not changed at the same time. (Note that Clean() had been refactored and the check was in copyToTemp(), where it remains.) In commit 6f54232 in PR git-lfs#1796 the DecodeFrom() method was changed to return an io.Reader instead of a byte array, and copyToTemp() would then read from that into a byte array using io.ReadAll(). That introduced a bug when handling large malformed pointers, fixed in commit dcc0581 in PR git-lfs#1852 by reading into a byte array in copyToTemp() with blobSizeCutoff length. However, the check for a resultant data length of less than 512 bytes still remained in place. In order to keep this blob size check in sync with the allocated byte array in copyToTemp(), we therefore change copyToTemp() so it checks if a pointer was successfully read and the blob size is below blobSizeCutoff. This implies there is no more data to be read and we can return a CleanPointerError (to signal a clean pointer was found) and the byte array, which will then be written out, leaving the blob's contents unchanged.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request on Mar 11, 2021 (with the same commit message as above)
When the "clean" filter processes a pre-existing LFS pointer blob, the copyToTemp() function returns a CleanPointerError; it does this when DecodeFrom() successfully reads and parses the pointer blob and when the entirety of the blob's contents have been read (i.e., there is no additional blob data to be read). This latter check was originally introduced in commit e09e5e1 in PR git-lfs#271, when the relevant code was in the Clean() method in the pointer/clean.go file; it used the same 512 byte maximum to determine if all the blob content had been read. This was aligned with the size of the byte array used in DecodeFrom() in the pointer/pointer.go file. The 512-byte buffer created in DecodeFrom() was adjusted and returned to Clean(), which then checked its length to see if it had been populated to the 512-byte maximum, meaning there might still be additional data to be read. In commit f58db7f in PR git-lfs#684 the size of the read buffer in DecodeFrom() was changed from 512 bytes to the value of blobSizeCutoff, which was 1024 (and remains so since). However, the check in Clean()'s copyToTemp() function was not changed at the same time. (Note that Clean() had been refactored and the check was in copyToTemp(), where it remains.) In commit 6f54232 in PR git-lfs#1796 the DecodeFrom() method was changed to return an io.Reader instead of a byte array, and copyToTemp() would then read from that into a byte array using ioutil.ReadAll(). That introduced a bug when handling large malformed pointers, fixed in commit dcc0581 in PR git-lfs#1852 by reading into a byte array in copyToTemp() with blobSizeCutoff length. However, the check for a resultant data length of less than 512 bytes still remained in place. In order to keep this blob size check in sync with the allocated byte array in copyToTemp(), we therefore change copyToTemp() so it checks if a pointer was successfully read and the blob size is below blobSizeCutoff. This implies there is no more data to be read and we can return a CleanPointerError (to signal a clean pointer was found) containing the byte array, which will then be written out, leaving the blob's contents unchanged.
pcal43 pushed a commit to pcal43/git-lfs-hack that referenced this pull request on Jul 22, 2021 (with the same commit message as above)
When the "clean" filter processes a pre-existing LFS pointer blob, the copyToTemp() function returns a CleanPointerError; it does this when DecodeFrom() successfully reads and parses the pointer blob and when the entirety of the blob's contents have been read (i.e., there is no additional blob data to be read). This latter check was originally introduced in commit e09e5e1 in PR git-lfs#271, when the relevant code was in the Clean() method in the pointer/clean.go file; it used the same 512 byte maximum to determine if all the blob content had been read. This was aligned with the size of the byte array used in DecodeFrom() in the pointer/pointer.go file. The 512-byte buffer created in DecodeFrom() was adjusted and returned to Clean(), which then checked its length to see if it had been populated to the 512-byte maximum, meaning there might still be additional data to be read. In commit f58db7f in PR git-lfs#684 the size of the read buffer in DecodeFrom() was changed from 512 bytes to the value of blobSizeCutoff, which was 1024 (and remains so since). However, the check in Clean()'s copyToTemp() function was not changed at the same time. (Note that Clean() had been refactored and the check was in copyToTemp(), where it remains.) In commit 6f54232 in PR git-lfs#1796 the DecodeFrom() method was changed to return an io.Reader instead of a byte array, and copyToTemp() would then read from that into a byte array using ioutil.ReadAll(). That introduced a bug when handling large malformed pointers, fixed in commit dcc0581 in PR git-lfs#1852 by reading into a byte array in copyToTemp() with blobSizeCutoff length. However, the check for a resultant data length of less than 512 bytes still remained in place. In order to keep this blob size check in sync with the allocated byte array in copyToTemp(), we therefore change copyToTemp() so it checks if a pointer was successfully read and the blob size is below blobSizeCutoff. This implies there is no more data to be read and we can return a CleanPointerError (to signal a clean pointer was found) containing the byte array, which will then be written out, leaving the blob's contents unchanged.
This pull-request fixes a bug where the entire contents of a malformed pointer would be buffered in memory, as reported in #1851.
Some backstory: in v1.5.4 I merged #1796 (via #1805), which re-added support for streaming the entire contents of a malformed pointer through the filter-protocol added in Git v2.11. This involved a change to the signature of DecodeFrom: instead of returning a `[]byte` containing the data it had buffered, it now returns an `io.Reader` containing the entire contents of the pointer, which the caller can read from. To bring the calling code up to speed, I introduced a line (an `ioutil.ReadAll` call) to grab a `[]byte` of the data buffered from reading the pointer.

This is fine for small pointers, but will exhaust system memory when the malformed pointer is large. The original behavior was to buffer at most the first 1024 (see: `lfs.blobSizeCutoff`) bytes to return with the `NewCleanPointerError`, and this pull-request brings that back.

We only need to buffer the first 1k here, since when writing the data back via `NewCleanPointerError` we're dealing with the case where the file is already a pointer and should be written back wholesale. More information about this code-path is in #1851 (comment).

The entire reader is still available if an error was not returned (i.e., if `len(by) >= 512`), so that it can be copied to the writer, `writer`, on L94.
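To illustrate the shape of this change, here is a minimal, self-contained Go sketch under assumed names (`cleanOne`, `cleanPointerLimit`); it is not the actual copyToTemp() code, but it shows buffering at most the first 1k and keeping the remainder of the reader available to be copied to the writer via io.MultiReader.

```go
package main

import (
	"bytes"
	"io"
	"os"
	"strings"
)

const (
	blobSizeCutoff    = 1024 // stands in for lfs.blobSizeCutoff
	cleanPointerLimit = 512  // the pre-existing "fully read" check this PR leaves in place
)

// cleanOne is a hypothetical stand-in for the code path discussed above: it
// buffers at most the first 1k of the blob. If the blob ends well within that
// prefix, the prefix alone is what a CleanPointerError would carry and write
// back; otherwise the prefix is stitched back onto the reader and the whole
// blob is streamed, so memory use never exceeds blobSizeCutoff.
func cleanOne(reader io.Reader, writer io.Writer) error {
	by := make([]byte, blobSizeCutoff)
	n, err := io.ReadFull(reader, by)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return err
	}
	by = by[:n]

	if n < cleanPointerLimit {
		// Small blob: in git-lfs this is where the decoded pointer and the
		// buffered bytes would be wrapped in a CleanPointerError; here we
		// simply write the prefix back out unchanged.
		_, werr := writer.Write(by)
		return werr
	}

	// Large (possibly malformed) blob: the rest of the reader is still
	// available, so copy prefix + remainder onward without buffering it all.
	_, cerr := io.Copy(writer, io.MultiReader(bytes.NewReader(by), reader))
	return cerr
}

func main() {
	pointer := "version https://git-lfs.github.com/spec/v1\n"
	_ = cleanOne(strings.NewReader(pointer), os.Stdout)
}
```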
, on L94.After some 馃憖 get a chance to take a look at this, I think we should release this patch as v1.5.5 (and hopefully the last patch in the 1.5.x series).
/cc @git-lfs/core #1851