Fix artifact v4 upload above 8MB #31664
Conversation
Doesn't work yet. 40MB is fixed now, but ca. 800MB still causes checksum errors. Are there some asynchronous things going on? Parallel chunk updates have been seen: 6 of 9 missing chunks were found in the log as duplicates.
After some tests the chunks are still out of order. Investigating implementing a blockList and storing chunks under the name specified per query; during merging, read the block list to build the reader over every chunk.
I think the issue is fixed like that, but we need to look at the security of storing a blockid in blob storage, or find alternatives.
Uploaded bytes 746586112
Uploaded bytes 754974720
Uploaded bytes 763363328
Uploaded bytes 771751936
Uploaded bytes 780140544
Uploaded bytes 788529152
Uploaded bytes 796917760
Uploaded bytes 805306368
Uploaded bytes 807753807
Finished uploading artifact content to blob storage!
SHA256 hash of uploaded artifact zip is 6feeaf6b88049a4c6ced1240ed3911afaa819229cd082999be5c81eff09c33f1
Finalizing artifact upload
Artifact artifact.zip successfully finalized. Artifact ID 22
Artifact artifact has been successfully uploaded! Final size is 807753807 bytes. Artifact ID is 22
Artifact download URL: http://localhost:3000/test/artifact-upload-big/actions/runs/39/artifacts/22
routers/api/actions/artifactsv4.go
Outdated
_, err := r.fs.Save(fmt.Sprintf("tmp%d/block-%d-%d-%s", task.Job.RunID, task.Job.RunID, ctx.Req.ContentLength, blockid), ctx.Req.Body, -1)
if err != nil {
	log.Error("Error runner api getting task: task is not running")
	ctx.Error(http.StatusInternalServerError, "Error runner api getting task: task is not running")
	return
If we have a blockid, delay ordering the chunks to the end and use its blockid to form the name.
I notice that this might be a security issue, because an attacker could control the filesystem name, but what alternatives do we have?
Sanitizing is probably the way to go here.
I switched to base64url encoding of the blockid, which doesn't allow traversing the filesystem or doing other bad things.
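For illustration, a minimal sketch of that sanitization, assuming a hypothetical sanitizeBlockID helper (the path format is modeled on the snippets above, not the PR's exact code):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// sanitizeBlockID is a hypothetical helper: base64url output uses only
// [A-Za-z0-9-_=], so the encoded id can never contain "/" or "..".
func sanitizeBlockID(blockid string) string {
	return base64.URLEncoding.EncodeToString([]byte(blockid))
}

func main() {
	runID, contentLength := int64(267), int64(9)
	blockid := "../some/custom/path" // attacker-controlled input
	name := fmt.Sprintf("tmpv4%d/block-%d-%d-%s",
		runID, runID, contentLength, sanitizeBlockID(blockid))
	fmt.Println(name) // stays inside the tmpv4267/ prefix
}
```

Whatever the client sends, the stored object name then cannot escape the run's tmp prefix.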
routers/api/actions/artifactsv4.go
Outdated
case "blocklist": | ||
_, err := r.fs.Save(fmt.Sprintf("tmp%d/%d-blocklist", task.Job.RunID, task.Job.RunID), ctx.Req.Body, -1) |
Now we get the final block order by blockid in XML; save it for the merge step.
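For context, the actions client finalizes a block blob with an Azure-style Put Block List body that lists the block IDs in their final order. A hedged sketch of parsing it in Go (the struct shape is an assumption based on that wire format, not necessarily the PR's exact type):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// BlockList mirrors the Azure "Put Block List" body, e.g.:
//   <?xml version="1.0" encoding="utf-8"?>
//   <BlockList><Latest>id1</Latest><Latest>id2</Latest></BlockList>
type BlockList struct {
	Latest []string `xml:"Latest"` // block ids in final artifact order
}

func main() {
	body := `<?xml version="1.0" encoding="utf-8"?>
<BlockList><Latest>YmxvY2sx</Latest><Latest>YmxvY2sy</Latest></BlockList>`
	var blist BlockList
	if err := xml.NewDecoder(strings.NewReader(body)).Decode(&blist); err != nil {
		panic(err)
	}
	fmt.Println(blist.Latest) // [YmxvY2sx YmxvY2sy]
}
```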
routers/api/actions/artifactsv4.go
Outdated
log.Warn("Error merge chunks: %v", err)
chunkMap, err := listChunksByRunID(r.fs, runID)
if err != nil {
	log.Error("Error merge chunks: %v", err)
	ctx.Error(http.StatusInternalServerError, "Error merge chunks")
	return
}
chunks, ok = chunkMap[artifact.ID]
if !ok {
	log.Error("Error merge chunks")
	ctx.Error(http.StatusInternalServerError, "Error merge chunks")
	return
}
My tests don't upload a blocklist yet, so fall back and try the old, broken mode.
@@ -123,6 +123,49 @@ func listChunksByRunID(st storage.ObjectStorage, runID int64) (map[int64][]*chunkFileItem, error) {
	return chunksMap, nil
}

func listChunksByRunIDV4(st storage.ObjectStorage, runID int64, artifactID int64, blist *BlockList) ([]*chunkFileItem, error) { |
Synthesize a chunk list with the correct start/end offsets and artifact entries, based on the XML blockList.
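Roughly, such a synthesis could look like this (chunkFileItem fields, the sizeOf callback, and the path scheme are illustrative stand-ins, not the PR's exact code): walk the block IDs in blockList order, ask storage for each block's size, and emit chunk entries with running offsets.

```go
package main

import "fmt"

// chunkFileItem is a stand-in for Gitea's internal type; field names
// are assumptions for illustration.
type chunkFileItem struct {
	ArtifactID int64
	Start      int64
	End        int64
	Path       string
}

// synthesizeChunks walks the block ids in blockList order, asks storage for
// each block's size (sizeOf is a placeholder for a Stat-like call), and
// emits chunk entries with running start/end offsets.
func synthesizeChunks(sizeOf func(path string) (int64, error), runID, artifactID int64, blockIDs []string) ([]*chunkFileItem, error) {
	chunks := make([]*chunkFileItem, 0, len(blockIDs))
	offset := int64(0)
	for _, blockid := range blockIDs {
		path := fmt.Sprintf("tmpv4%d/block-%d-%s", runID, runID, blockid)
		size, err := sizeOf(path)
		if err != nil {
			return nil, err
		}
		chunks = append(chunks, &chunkFileItem{
			ArtifactID: artifactID,
			Start:      offset,
			End:        offset + size - 1,
			Path:       path,
		})
		offset += size
	}
	return chunks, nil
}

func main() {
	fixedSize := func(string) (int64, error) { return 8 << 20, nil } // 8MB blocks
	chunks, _ := synthesizeChunks(fixedSize, 39, 22, []string{"YmxvY2sx", "YmxvY2sy"})
	for _, c := range chunks {
		fmt.Println(c.Path, c.Start, c.End)
	}
}
```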
I also encountered the same problem.
Isn't the storage considered "private" anyway?
Idk if the Gitea object storage is protected against this pattern: tmpv4267/block-267-9-/../some/custom/path. All runs of (all users?) use the same artifact storage container. I double fixed that now, so it's no longer a potential problem.
Ah, I see what you are getting at. Yes, cross-repo contamination should be avoided.
Is a test easily possible here? Might be valuable.
A test that uploads a block with a blockid like … Yes, I think this should be possible for me to do tomorrow.
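A possible shape for that test, reusing the hypothetical sanitizeBlockID helper from the sketch above (not the PR's actual test):

```go
package main

import (
	"encoding/base64"
	"strings"
	"testing"
)

// sanitizeBlockID is the hypothetical helper from the earlier sketch.
func sanitizeBlockID(blockid string) string {
	return base64.URLEncoding.EncodeToString([]byte(blockid))
}

// TestBlockIDCannotTraverse checks that a hostile blockid cannot smuggle
// path separators or ".." into the stored object name.
func TestBlockIDCannotTraverse(t *testing.T) {
	hostile := "../../some/custom/path"
	name := sanitizeBlockID(hostile)
	if strings.Contains(name, "/") || strings.Contains(name, "..") {
		t.Fatalf("sanitized blockid %q still contains path characters", name)
	}
}
```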
My PR seems to have issues with minio and azurite in the backend tests... Could it be that a blob that has been written by storage.Actions.Save(...) is not immediately available via storage.Actions.Open? E.g. for minio I get an error like …, and similarly for azurite. The local backend has a 100% success rate. I'm pretty sure the file has been created a few cycles before the open call. Do you have any advice here?
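One common workaround for this kind of read-after-write lag, sketched here with a placeholder open callback rather than Gitea's real storage API, is to retry the Open with a short backoff before treating the miss as an error:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// openWithRetry is a sketch: if the object storage briefly reports
// "not found" right after a Save, retry the Open a few times with a
// growing backoff before giving up. The open parameter stands in for
// storage.Actions.Open.
func openWithRetry(open func(path string) (io.ReadCloser, error), path string) (io.ReadCloser, error) {
	var lastErr error
	for i := 0; i < 5; i++ {
		r, err := open(path)
		if err == nil {
			return r, nil
		}
		lastErr = err
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
	}
	return nil, fmt.Errorf("open %s: %w", path, lastErr)
}

func main() {
	calls := 0
	flaky := func(path string) (io.ReadCloser, error) {
		calls++
		if calls < 3 {
			return nil, errors.New("not found") // simulated lag
		}
		return io.NopCloser(nil), nil
	}
	if _, err := openWithRetry(flaky, "tmpv4267/block-267-9-abc"); err != nil {
		panic(err)
	}
	fmt.Println("opened after", calls, "attempts")
}
```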
goooooooooooooooooood !!! |
* giteaofficial/main:
  [skip ci] Updated licenses and gitignores
  Fix rename branch permission bug (go-gitea#32066)
  Fix artifact v4 upload above 8MB (go-gitea#31664)
  [skip ci] Updated translations via Crowdin
  Add bin to Composer Metadata (go-gitea#32099)
  Fix wrong last modify time (go-gitea#32102)
  Fix upload maven pacakge parallelly (go-gitea#31851)
  Repo Activity: count new issues that were closed (go-gitea#31776)
  Count typescript files as frontend for labeling (go-gitea#32088)
  Use camo.Always instead of camo.Allways (go-gitea#32097)
Multiple chunks are uploaded with type "block" without using "appendBlock", and they end up out of order for bigger uploads. 8MB seems to be the chunk size.
This change parses the blockList, which is uploaded after all blocks, to get the final artifact size and to order the blocks correctly before calculating the sha256 checksum over all of them.
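In effect the finalize step can then stream the blocks in blockList order through a single hash; a minimal sketch of that idea (not the PR's exact code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// checksumOrdered streams the chunks in blockList order through one hash,
// yielding the checksum and the final size of the merged artifact.
func checksumOrdered(chunks []io.Reader) (string, int64, error) {
	h := sha256.New()
	n, err := io.Copy(h, io.MultiReader(chunks...))
	if err != nil {
		return "", 0, err
	}
	return hex.EncodeToString(h.Sum(nil)), n, nil
}

func main() {
	chunks := []io.Reader{strings.NewReader("chunk-1"), strings.NewReader("chunk-2")}
	sum, size, _ := checksumOrdered(chunks)
	fmt.Println(sum, size)
}
```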
Fixes #31354