
Conversation

@HazAT (Member) commented Jan 29, 2018

This adds an endpoint for chunk uploads.
https://paper.dropbox.com/doc/Exposed-Blob-Upload-Service-A7Qd5HBtLvNsVI5SvHjUV

This is only the first part of a series of PRs; next up will be the assemble endpoint.
tl;dr: we accept chunks and create FileBlobs, which will later be assembled by a different endpoint.

These requests require a valid auth token with the project:releases permission.

GET -> /api/0/chunk-upload/
Returns metadata describing how the client (mostly sentry-cli) should structure the upload.

Response:

{
    "url": "https://sentry.io",
    "chunksPerRequest": 16,
    "hashAlgorithm": "sha1",
    "chunkSize": 1048576,
    "concurrency": 4
}
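The client-side contract implied by this metadata can be sketched roughly like this (a hypothetical helper, not sentry-cli's actual code; the chunk size and SHA-1 naming follow the response above):

```python
import hashlib

def chunk_payload(data, chunk_size=1048576):
    """Split a byte payload into chunkSize pieces, naming each chunk
    by the SHA-1 checksum of its contents (hashAlgorithm above)."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha1(piece).hexdigest(), piece))
    return chunks
```

Each (checksum, piece) pair then maps to one `-F file=@<checksum>` part of the multipart POST below.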

POST -> /api/0/chunk-upload/

curl -X POST \
  http://localhost:8000/api/0/chunk-upload/ \
  -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
  -F file=@/Users/haza/Desktop/upload-test/08aec68f1f2240b718982cc0924b5684e8149174 \
  -F file=@/Users/haza/Desktop/upload-test/9073a9c28759b5e5a6d0d8afb5e1c5ca13dbdc45

This POST accepts multiple files, up to MAX_CHUNKS_PER_REQUEST, and creates a FileBlob in the database for each one.
The filename must be the checksum of the file's contents, because we validate against it.
If there is a mismatch, the request fails (this should never happen ¯\_(ツ)_/¯).
The request also validates chunk size and count.
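The validation described above can be sketched as follows (a hypothetical stand-in, not the actual Sentry handler; the constants mirror the GET response):

```python
import hashlib

MAX_CHUNKS_PER_REQUEST = 16  # chunksPerRequest from the GET response
DEFAULT_BLOB_SIZE = 1048576  # chunkSize from the GET response

def validate_chunks(files):
    """files: list of (filename, content) tuples.
    Returns an error string, or None if everything checks out."""
    if len(files) > MAX_CHUNKS_PER_REQUEST:
        return 'Too many chunks'
    for name, content in files:
        if len(content) > DEFAULT_BLOB_SIZE:
            return 'Chunk too large'
        # the filename must equal the SHA-1 of the chunk's contents
        if hashlib.sha1(content).hexdigest() != name:
            return 'Checksum mismatch'
    return None
```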

The tests should cover all scenarios.

@HazAT HazAT self-assigned this Jan 29, 2018
ghost commented Jan 29, 2018

1 Warning
⚠️ You should update CHANGES due to the size of this PR

Generated by 🚫 danger

from sentry.api.bases.project import ProjectReleasePermission


UPLOAD_ENDPOINT = 'https://sentry.io'
Contributor:

Why is this a hardcoded value? This isn't going to work for on-premise.

Member Author:

Right, is there a way to get the current URL or something?
We were thinking about not serving the URL of the installation, but instead serving upload.sentry.io, which handles things a bit differently.

Contributor:

Yeah, the system.url-prefix option. I think we should introduce a new option here so it can be overridden in config, but have it default back to system.url-prefix.
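The suggested fallback could look roughly like this (get_upload_url is a hypothetical helper, and a plain dict stands in for Sentry's options store):

```python
def get_upload_url(options):
    """Prefer a dedicated upload-url option; fall back to the
    installation's system.url-prefix when it is unset."""
    return (options.get('system.upload-url-prefix')
            or options.get('system.url-prefix'))
```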

def get(self, request):
    return Response(
        {
            'url': UPLOAD_ENDPOINT,
Contributor:

What's the reason for returning this value in the first place? I understand the value of this endpoint being able to report the server's supported parameters, but this in theory just points back to itself; it's the exact same endpoint that you hit for the GET. In what situation would it be different?

Member Author:

As I said before, the idea is to maybe handle the upload through a different endpoint, e.g. upload.sentry.io, which has some kind of keep-alive or whatever.
I am not really sure yet whether that endpoint would basically just be another instance of Sentry with different webserver settings.

{
    'url': UPLOAD_ENDPOINT,
    'chunkSize': DEFAULT_BLOB_SIZE,
    'chunksPerRequest': MAX_CHUNKS_PER_REQUEST,
Contributor:

Is there a reason to even accept multiple chunks per request? If we limited it to just one per request, you'd be able to achieve more throughput through parallelization. Anything within chunksPerRequest is going to be uploaded serially.

Member Author:

The idea behind this was that sending more data per request saves the overhead of opening new HTTP connections.

Member:

If we get sentry-cli to keep the connection alive, it should probably not matter too much. But we will probably not implement parallel uploads in the CLI...

Contributor:

I'd argue that it'd be outweighed by the parallelism. For example, this is what gsutil does for its own chunked uploads to Google Storage. I can't think of anything else that does this hybrid approach where you can send chunks, but in batches. I don't have a strong opinion, but I don't think it's that valuable.

See stuff like: https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload

I also like the idea of using the Content-Range header, but it might be less relevant for us here since the chunks are fixed sizes, etc.

But anyways, I'd really just vote for handling 1 chunk per upload to reduce some complexity.

Contributor:

@mattrobenolt 1 chunk per upload will be awful for Europeans. It means 1000 requests, each with a minimum latency of 60-150ms. We would mostly be waiting on the network here.
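A quick back-of-envelope calculation along these lines (illustrative numbers: 1000 chunks and a 100 ms round trip, transfer time ignored):

```python
import math

chunks, rtt_ms = 1000, 100

serial_ms = chunks * rtt_ms        # 1 chunk per request, serial: 100 s of latency
requests = math.ceil(chunks / 16)  # 63 requests at 16 chunks each
rounds = math.ceil(requests / 4)   # 16 rounds at concurrency 4
batched_ms = rounds * rtt_ms       # ~1.6 s of latency overhead
```

So batching plus modest concurrency cuts the pure round-trip overhead by roughly 60x under these assumptions.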


UPLOAD_ENDPOINT = 'https://sentry.io'
MAX_CHUNKS_PER_REQUEST = 16
MAX_CONCURRENCY = 4
Contributor:

Is this just a hint to the person that calls this? Because there's nothing that's actually enforcing this.

Member Author:

Yeah, it should be a hint. @mitsuhiko mentioned that if we introduce a different endpoint which handles connections differently, we can change it afterward.

return Response({'error': 'Checksum mismatch'},
                status=status.HTTP_400_BAD_REQUEST)

return Response(status=status.HTTP_200_OK)
Contributor:

nit: Shouldn't this be 201 Created?

Member Author:

We've discussed this and went with 200.
But I don't have a strong opinion; we can go with 201.

Member Author:

Ah, and also: if the endpoint already has the chunks, they will not get "created", because they are already there.
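That idempotency argument can be sketched like so (store_chunk is a hypothetical helper; a dict stands in for the FileBlob table):

```python
def store_chunk(store, checksum, content):
    """Store a chunk keyed by its checksum. Re-uploading an existing
    chunk is a no-op, so the handler behaves more like "ensure exists"
    than "create" - which is why 201 would not always be accurate."""
    created = checksum not in store
    store.setdefault(checksum, content)
    return created  # True only if the blob was actually new
```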

Member:

In case of a single chunk per request (see previous comment), we could even go with PUT and 200 OK.

Contributor:

That's fine.

Member Author:

So I will leave it like this, since a PUT would mean a new function, and 201 will not always be correct?! 🤷‍♂️

from sentry.api.base import Endpoint
from sentry.api.bases.project import ProjectReleasePermission

UPLOAD_ENDPOINT_SETTING = options.get('system.upload-url-prefix')
Contributor:

Don't do this at module load time. Query these inside of ChunkUploadEndpoint.get.
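A minimal sketch of the per-request lookup being suggested (ChunkUploadEndpoint here is a stripped-down stand-in, with a dict faking the options store):

```python
class ChunkUploadEndpoint:
    def __init__(self, options):
        self.options = options  # stand-in for sentry.options

    def get_url(self):
        # Looked up on every call rather than cached at module load,
        # so config changes take effect without a process restart.
        return (self.options.get('system.upload-url-prefix')
                or self.options.get('system.url-prefix'))
```

With the module-level version, changing the option after import would have no effect; here, it is picked up on the next request.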

@mattrobenolt (Contributor) left a comment:

🙆‍♂️

@HazAT HazAT merged commit be01269 into master Feb 1, 2018
@HazAT HazAT deleted the feature/chunk-upload branch February 1, 2018 07:41
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2020