fix: reimplement push parsing to prevent Z_DATA_ERROR #1187
Conversation
…and baseSha for ref_deltas
…ore it were processed correctly
Codecov Report: ❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1187      +/-   ##
==========================================
+ Coverage   83.88%   84.11%   +0.22%
==========================================
  Files          68       68
  Lines        2904     2958      +54
  Branches      364      373       +9
==========================================
+ Hits         2436     2488      +52
- Misses        409      410       +1
- Partials       59       60       +1
…rsing' into 1040-Z-DATA-ERROR-during-push-parsing
I'm not a low-level code expert, so I'll just leave some general suggestions!
I've experimented a bit to check whether the current code slows down the push process, and it seems to be slightly slower (10–20%) for very large pushes:
Basic timing method and timing scripts (collapsed details):
- Regular commit: main …, this PR …
- Massive commit (~12 MB of text changes): main …, this PR …
This might not be something we want to optimize right now, though.
I think there's not a lot more I can do on this PR without heading into a port to fast-zlib or zlib-sync, which I think is a separate challenge — we'd probably need to do something similar to fast-zlib (using the internals of zlib, such as …
One very minor nit but otherwise LGTM. I think @kriswest knows the packet format best at this point now that this refactoring has been done lol.
My only suggestion would be to add some real (binary) receive-pack files to use as input for the tests.
# Create some new content to generate a pack
echo "New content for pack testing $(date)" > new-file.txt
echo "# Updated README $(date)" >> README.md
git add .
git commit -m "Test commit for pack capture $(date)"
echo "Capturing receive-pack output to $output_file..."
# Method 1: Capture the actual pack data being sent
# This captures the raw pack file that would be sent to receive-pack
git format-patch --stdout HEAD~1..HEAD > "$output_file.patch"
# Method 2: Capture pack file from push operation with verbose output
GIT_TRACE_PACKET=1 git push origin main 2> "$output_file.trace" || true
# Method 3: Create a pack file directly
git pack-objects "$output_file" < <(git rev-list --objects HEAD)
Co-authored-by: Thomas Cooper <57812123+coopernetes@users.noreply.github.com> Signed-off-by: Kris West <kristopher.west@natwest.com>
…rsing' into 1040-Z-DATA-ERROR-during-push-parsing
@coopernetes test added that uses a captured push. I ended up popping a temporary change into parsePush to capture it, as I had trouble with the methods given (and alternatives that an AI came up with):
Method 1 produces a patch file rather than a pack file.
Method 2, and a number of variations I tried, produces a trace file that you can extract the pack data from in theory. The output didn't quite match what I was expecting and was missing the packet lines that precede the PACK data in a push.
Method 3 creates PACK and IDX files, but for the entire content of the repo (about 16 MB for git-proxy), and is again missing the packet lines that precede the PACK file data in a push. There is probably a variation on this one that would have come quite close, though. In the end, just capturing the request body as a buffer and writing it to a file worked and is easy to replicate. I've added details to a README file adjacent to the captured binary file.
resolves #1040
Reimplements parsing of PACK file contents to resolve a number of issues that caused a loss of sync between the headers and compressed data streams that make up the pack file. That loss of sync caused Z_DATA_ERROR to be thrown and the rest of the pack file to fail to decode, so only partial content was extracted from push pack files and commits were missed or not processed by later steps. The missing commits can also interact with the checkHiddenCommits task to block a subset of pushes, while others proceed with missing data.
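For reference, a pack file begins with a fixed 12-byte header ('PACK' signature, 4-byte version, 4-byte object count), followed by per-object variable-length headers each trailed by a zlib stream; losing track of where one stream ends and the next object header begins is exactly the sync loss described above. A minimal sketch of reading the fixed header (the helper name is illustrative, not git-proxy's actual API):

```javascript
// Parse the fixed 12-byte header at the start of a PACK stream.
function parsePackHeader(buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'PACK') {
    throw new Error('not a PACK stream');
  }
  return {
    version: buf.readUInt32BE(4),     // pack version, 2 or 3
    objectCount: buf.readUInt32BE(8), // number of entries that follow
  };
}

// Build a minimal header: version 2, 3 objects.
const hdr = Buffer.alloc(12);
hdr.write('PACK', 0, 'ascii');
hdr.writeUInt32BE(2, 4);
hdr.writeUInt32BE(3, 8);
console.log(parsePackHeader(hdr)); // { version: 2, objectCount: 3 }
```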
After this PR, Git Proxy will correctly process a handful of pushes I have queued up that either parse with missing data or are blocked by checkHiddenCommits.
This PR: