Fix artifact v4 upload above 8MB #31664

ChristopherHX · 2024-07-20T12:49:23Z

Multiple chunks are uploaded with type "block" without using "appendBlock" and eventually out of order for bigger uploads.
8MB seems to be the chunk size

This change parses the blockList uploaded after all blocks to get the final artifact size and order them correctly before calculating the sha256 checksum over all blocks

Fixes #31354

ChristopherHX · 2024-07-20T13:56:01Z

Doesn't work yet.

40MB are fixed now but ca. 800MB still cause checksum errors, are there some asynchronious things going on...

Parallel Chunk updates has been seen, 6 of 9 missing chunks found in log as duplicates

ChristopherHX · 2024-07-20T15:18:41Z

After some tests the chunks are still out of order, investigating to implement blockList and store chunks using their name specified per query

During merging read the block list to build the reader over every chunk

ChristopherHX

I think the issue is fixed like that, but we need to look at the security of storing a blockid in blob storage or find alternatives

Uploaded bytes 746586112
Uploaded bytes 754974720
Uploaded bytes 763363328
Uploaded bytes 771751936
Uploaded bytes 780140544
Uploaded bytes 788529152
Uploaded bytes 796917760
Uploaded bytes 805306368
Uploaded bytes 807753807
Finished uploading artifact content to blob storage!
SHA256 hash of uploaded artifact zip is 6feeaf6b88049a4c6ced1240ed3911afaa819229cd082999be5c81eff09c33f1
Finalizing artifact upload
Artifact artifact.zip successfully finalized. Artifact ID 22
Artifact artifact has been successfully uploaded! Final size is 807753807 bytes. Artifact ID is 22
Artifact download URL: http://localhost:3000/test/artifact-upload-big/actions/runs/39/artifacts/22

ChristopherHX · 2024-07-20T19:16:57Z

routers/api/actions/artifactsv4.go

+			_, err := r.fs.Save(fmt.Sprintf("tmp%d/block-%d-%d-%s", task.Job.RunID, task.Job.RunID, ctx.Req.ContentLength, blockid), ctx.Req.Body, -1)
+			if err != nil {
+				log.Error("Error runner api getting task: task is not running")
+				ctx.Error(http.StatusInternalServerError, "Error runner api getting task: task is not running")
+				return


If we have a blockid, delay ordering chunks to the end and use it's blockid to form the name

I notice that this might be a security issue, because an attacker could control the filesystem name, but what alternatives do we have?
Santizing is probably the way to go here

I switched to base64url encoding of the blockid that doesn't allow to traverse the filesystem or do other bad things

ChristopherHX · 2024-07-20T19:18:09Z

routers/api/actions/artifactsv4.go

+	case "blocklist":
+		_, err := r.fs.Save(fmt.Sprintf("tmp%d/%d-blocklist", task.Job.RunID, task.Job.RunID), ctx.Req.Body, -1)


Now we got the final block order by blockid in xml, save it for the merge step

ChristopherHX · 2024-07-20T19:19:08Z

routers/api/actions/artifactsv4.go

+		log.Warn("Error merge chunks: %v", err)
+		chunkMap, err := listChunksByRunID(r.fs, runID)
+		if err != nil {
+			log.Error("Error merge chunks: %v", err)
+			ctx.Error(http.StatusInternalServerError, "Error merge chunks")
+			return
+		}
+		chunks, ok = chunkMap[artifact.ID]
+		if !ok {
+			log.Error("Error merge chunks")
+			ctx.Error(http.StatusInternalServerError, "Error merge chunks")
+			return
+		}


My tests don't upload a blockmap yet, fallback try old broken mode

ChristopherHX · 2024-07-20T19:20:51Z

routers/api/actions/artifacts_chunks.go

@@ -123,6 +123,49 @@ func listChunksByRunID(st storage.ObjectStorage, runID int64) (map[int64][]*chun
 	return chunksMap, nil
 }

+func listChunksByRunIDV4(st storage.ObjectStorage, runID int64, artifactID int64, blist *BlockList) ([]*chunkFileItem, error) {


Syntetise a chunk list with the correct start/end and artifact entries, based on the xml Blocklist

routers/api/actions/artifacts_chunks.go

VAllens · 2024-07-23T11:24:54Z

I also encountered the same problem.
When can this PR be merged?
Has there been any progress?

silverwind · 2024-07-23T17:35:31Z

look at the security of storing a blockid in blob storage

Isn't the storage considered "private" anyways?

ChristopherHX · 2024-07-23T17:57:28Z

look at the security of storing a blockid in blob storage

Isn't the storage considered "private" anyways?

Idk if the gitea object storage is protected against this pattern tmpv4267/block-267-9-/../some/custom/path

All runs of (all users?) use the same artifact storage container

I double fixed that now so, it's no potential problem anymore

silverwind · 2024-07-23T18:01:07Z

Ah I see where you are getting at. Yes, cross-repo contamination should be avoided.

silverwind · 2024-07-23T18:29:55Z

Is a test easily possible here? Might be valuable.

ChristopherHX · 2024-07-23T21:01:56Z

Is a test easily possible here?

A test that uploads a block with a blockid like ../some/path and verify that it worked without failure?
Optionally check that all files created in object storage are in the correct folder

Yes I think this should be possible to do for me tomorrow

ChristopherHX · 2024-07-24T11:12:43Z

My PR seem to have issues with minio and azurite in backend tests...

Could it be that the blob storage that has been written by storage.Actions.Save(...) is not immediately available via storage.Actions.Open

e.g. for minio I get an error like The specified key does not exist, not shure if it comes from encoding/xml or minio

e.g. for azureite file does not exist

local backend has a 100% success rate

I'm pretty shure the file has been created a few cycles before the open call

Do you have any advice here?

EDIT
Is this something else? And the key is never stored on disk for minio/azureite, 5s retries didn't work

EDIT 2
The minio error is from encoding/xml...

EDIT 3
The minio error comes from minio, when reading the stream after the chunk has been deleted between open and read

EDIT 4
All known issues are gone and tests are working fine

…s via retries

VAllens · 2024-09-23T01:17:45Z

goooooooooooooooooood !!!

* giteaofficial/main: [skip ci] Updated licenses and gitignores Fix rename branch permission bug (go-gitea#32066) Fix artifact v4 upload above 8MB (go-gitea#31664) [skip ci] Updated translations via Crowdin Add bin to Composer Metadata (go-gitea#32099) Fix wrong last modify time (go-gitea#32102) Fix upload maven pacakge parallelly (go-gitea#31851) Repo Activity: count new issues that were closed (go-gitea#31776) Count typescript files as frontend for labeling (go-gitea#32088) Use camo.Always instead of camo.Allways (go-gitea#32097)

Fix artifact v4 upload above 8MB

895e8b2

pull-request-size bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 20, 2024

GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 20, 2024

github-actions bot added modifies/api This PR adds API routes or modifies them modifies/go Pull requests that update Go code labels Jul 20, 2024

Basic blockList Support for artifacts v4

a6fb8a5

pull-request-size bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 20, 2024

format code

1c641fe

ChristopherHX commented Jul 20, 2024

View reviewed changes

ChristopherHX added 2 commits July 20, 2024 21:40

encode blockid via base64 url encoding to avoid security problems

4cde6db

fixes and move new v4 file naming to custom tmpv4 folder

5a2a395

ChristopherHX marked this pull request as ready for review July 21, 2024 17:06

lunny reviewed Jul 22, 2024

View reviewed changes

routers/api/actions/artifacts_chunks.go Show resolved Hide resolved

ChristopherHX added 3 commits July 24, 2024 12:14

add test to verify exploit failure

d9f55bc

add TestActionsArtifactV4UploadSingleFileWithChunksOutOfOrder

02479bf

update var name

add8925

ChristopherHX and others added 4 commits July 24, 2024 13:21

attempt to workaround minio and azureite storage availability problem…

06f3e93

…s via retries

fix logic

a96be7c

fix minio

a7ef238

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

6a1383d

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

5ee6ddc

techknowlogick approved these changes Sep 7, 2024

View reviewed changes

GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Sep 7, 2024

lunny added this to the 1.23.0 milestone Sep 7, 2024

techknowlogick and others added 2 commits September 11, 2024 22:43

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

f036ed8

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

4bab1d4

lunny approved these changes Sep 21, 2024

View reviewed changes

GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Sep 21, 2024

lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Sep 21, 2024

GiteaBot added 3 commits September 22, 2024 06:23

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

69be0c8

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

010825a

Merge branch 'main' into fix-artifact-v4-upload-above-8MB

33b3e04

lafriks merged commit b594cec into go-gitea:main Sep 22, 2024
26 checks passed

GiteaBot removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Sep 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix artifact v4 upload above 8MB #31664

Fix artifact v4 upload above 8MB #31664

ChristopherHX commented Jul 20, 2024 •

edited

Loading

ChristopherHX commented Jul 20, 2024 •

edited

Loading

ChristopherHX commented Jul 20, 2024 •

edited

Loading

ChristopherHX left a comment •

edited

Loading

ChristopherHX Jul 20, 2024

ChristopherHX Jul 20, 2024

ChristopherHX Jul 20, 2024

ChristopherHX Jul 20, 2024

ChristopherHX Jul 20, 2024

VAllens commented Jul 23, 2024

silverwind commented Jul 23, 2024

ChristopherHX commented Jul 23, 2024

silverwind commented Jul 23, 2024

silverwind commented Jul 23, 2024

ChristopherHX commented Jul 23, 2024

ChristopherHX commented Jul 24, 2024 •

edited

Loading

VAllens commented Sep 23, 2024

		case "blocklist":
		_, err := r.fs.Save(fmt.Sprintf("tmp%d/%d-blocklist", task.Job.RunID, task.Job.RunID), ctx.Req.Body, -1)

Fix artifact v4 upload above 8MB #31664

Fix artifact v4 upload above 8MB #31664

Conversation

ChristopherHX commented Jul 20, 2024 • edited Loading

ChristopherHX commented Jul 20, 2024 • edited Loading

ChristopherHX commented Jul 20, 2024 • edited Loading

ChristopherHX left a comment • edited Loading

Choose a reason for hiding this comment

ChristopherHX Jul 20, 2024

Choose a reason for hiding this comment

ChristopherHX Jul 20, 2024

Choose a reason for hiding this comment

ChristopherHX Jul 20, 2024

Choose a reason for hiding this comment

ChristopherHX Jul 20, 2024

Choose a reason for hiding this comment

ChristopherHX Jul 20, 2024

Choose a reason for hiding this comment

VAllens commented Jul 23, 2024

silverwind commented Jul 23, 2024

ChristopherHX commented Jul 23, 2024

silverwind commented Jul 23, 2024

silverwind commented Jul 23, 2024

ChristopherHX commented Jul 23, 2024

ChristopherHX commented Jul 24, 2024 • edited Loading

VAllens commented Sep 23, 2024

ChristopherHX commented Jul 20, 2024 •

edited

Loading

ChristopherHX commented Jul 20, 2024 •

edited

Loading

ChristopherHX commented Jul 20, 2024 •

edited

Loading

ChristopherHX left a comment •

edited

Loading

ChristopherHX commented Jul 24, 2024 •

edited

Loading