feat: Chunk upload endpoint #7095
Conversation
    from sentry.api.bases.project import ProjectReleasePermission


    UPLOAD_ENDPOINT = 'https://sentry.io'
Why is this a hardcoded value? This isn't going to work for on-premise.
Right, is there a way to get the current URL or something?
We were thinking about not serving the URL of the installation and instead serving upload.sentry.io, which handles things a bit differently.
Yeah, the `system.url-prefix` option. I think we should introduce a new option here so it can be overridden in config, but default back to `system.url-prefix`.
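A rough sketch of that fallback (the `system.upload-url-prefix` name matches the option used later in this diff; the helper itself is illustrative):

```python
from sentry import options

def get_upload_url_prefix():
    # Prefer the dedicated upload prefix when it is configured;
    # otherwise fall back to the installation's own URL prefix.
    return (options.get('system.upload-url-prefix')
            or options.get('system.url-prefix'))
```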
    def get(self, request):
        return Response(
            {
                'url': UPLOAD_ENDPOINT,
What's the reason for returning this value in the first place? I understand the value of this endpoint exposing what the server supports, but this URL in theory just points back to itself? So it's the exact same endpoint that you hit for the GET. In what situation would it be different?
As I said before, the idea is that we may handle the upload through a different endpoint, e.g. upload.sentry.io, which has some kind of keep-alive or similar.
I'm not really sure yet whether that endpoint would basically just be another instance of Sentry with different webserver settings.
    {
        'url': UPLOAD_ENDPOINT,
        'chunkSize': DEFAULT_BLOB_SIZE,
        'chunksPerRequest': MAX_CHUNKS_PER_REQUEST,
Is there a reason to even accept multiple chunks per request? If we limited it to just 1 per request, you'd be able to achieve more throughput through parallelization. Anything in `chunksPerRequest` is going to be limited serially.
The idea behind this was that if we send more data per request, we save the overhead of new HTTP connections.
If we get sentry-cli to keep the connection alive, it should probably not even matter too much. But we will probably not implement parallel uploads in CLI...
I'd argue that it'd be outweighed by the parallelism. For example, this is what `gsutil` does for its own chunked uploads to Google Storage. I can't think of anything that does this hybrid thing where you can send chunks, but in batches. I don't have a strong opinion, but I don't think it's that valuable.
See stuff like: https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
I also like the idea of using the `Content-Range` header, though it might not be as relevant for us here since the chunks are fixed sizes.
But anyway, I'd really just vote for handling 1 chunk per upload to reduce some complexity.
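(For reference, a resumable upload in that style sends each piece with a header like `Content-Range: bytes 0-8388607/33554432`, so the server knows where the piece fits and how large the whole upload is; the byte values here are illustrative.)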
@mattrobenolt 1 chunk per upload will be awful for Europeans. It means 1000 requests with a minimum latency of 60-150ms each. We will mostly be waiting on the network here.
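To make that back-of-the-envelope math explicit (a sketch that counts only per-request latency and ignores transfer time and bandwidth):

```python
chunks = 1000
rtt = 0.1  # seconds; midpoint of the 60-150 ms range above

# 1 chunk per request, serial: every chunk pays a full round trip.
serial = chunks * rtt                # ~100 s

# 16 chunks per request (MAX_CHUNKS_PER_REQUEST), still serial.
batched = chunks / 16.0 * rtt        # ~6.3 s

# 1 chunk per request over 4 connections (MAX_CONCURRENCY).
parallel = chunks * rtt / 4.0        # ~25 s
```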
    UPLOAD_ENDPOINT = 'https://sentry.io'
    MAX_CHUNKS_PER_REQUEST = 16
    MAX_CONCURRENCY = 4
Is this just a hint to the person that calls this? Because there's nothing that's actually enforcing this.
Yeah, it should be a hint. @mitsuhiko mentioned that if we later have a different endpoint which handles connections differently, we can change it afterwards.
    return Response({'error': 'Checksum mismatch'},
                    status=status.HTTP_400_BAD_REQUEST)

    return Response(status=status.HTTP_200_OK)
nit: Shouldn't this be `201 Created`?
We've discussed this and we went with 200. But I don't have a strong opinion; we can go with 201.
Ah, and also: if the endpoint already has the chunks, they will not get "created" because they are already there.
In the case of a single chunk per request (see previous comment), we could even go with `PUT` and `200 OK`.
That's fine.
So I will leave it like this, since a `PUT` would mean a new function and 201 would not always be correct. 🤷♂️
    from sentry.api.base import Endpoint
    from sentry.api.bases.project import ProjectReleasePermission


    UPLOAD_ENDPOINT_SETTING = options.get('system.upload-url-prefix')
Don't do this at module load time. Query these inside of `ChunkUploadEndpoint.get`.
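Something like this rough sketch (everything besides the option lookup is illustrative):

```python
from sentry import options
from sentry.api.base import Endpoint

class ChunkUploadEndpoint(Endpoint):
    def get(self, request):
        # Read the option per request so config changes take effect
        # without a process restart.
        url = (options.get('system.upload-url-prefix')
               or options.get('system.url-prefix'))
        ...
```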
🙆♂️
This adds an endpoint for chunk uploads.
https://paper.dropbox.com/doc/Exposed-Blob-Upload-Service-A7Qd5HBtLvNsVI5SvHjUV
This is only the first part of a series of PRs; next up will be the assemble endpoint.
tl;dr: we accept chunks and create `FileBlob`s, which will later be assembled in a different endpoint.
These requests need a valid auth token with the `project:releases` permission.

`GET` -> `/api/0/chunk-upload/`

Returns metadata on how the client (mostly `sentry-cli`) should structure the upload. Response:
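A response would look roughly like this (keys taken from the constants in the diff above; the concrete chunk size and the exact key for `MAX_CONCURRENCY` are assumptions):

```python
{
    'url': 'https://sentry.io',  # UPLOAD_ENDPOINT
    'chunkSize': 8388608,        # DEFAULT_BLOB_SIZE (illustrative value)
    'chunksPerRequest': 16,      # MAX_CHUNKS_PER_REQUEST
    'concurrency': 4,            # MAX_CONCURRENCY (a hint, not enforced)
}
```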
`POST` -> `/api/0/chunk-upload/`

This `POST` takes multiple files, up to `MAX_CHUNKS_PER_REQUEST`, and creates `FileBlob`s in the database. The filename must be the checksum of the file because we validate against it. If there is a mismatch, the request fails (this should never happen ¯\_(ツ)_/¯). This request also validates chunk size and count.

The tests should cover all scenarios.
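For illustration, a client could upload one batch of chunks roughly like this (a hypothetical sketch, not code from this PR; the multipart field name and the use of SHA1 are assumptions):

```python
import hashlib
import requests

def upload_chunk_batch(base_url, token, chunks):
    # Name each multipart part by its checksum, since the server
    # validates the filename against the actual chunk content.
    files = [
        ('file', (hashlib.sha1(chunk).hexdigest(), chunk))
        for chunk in chunks  # at most MAX_CHUNKS_PER_REQUEST chunks
    ]
    response = requests.post(
        base_url + '/api/0/chunk-upload/',
        headers={'Authorization': 'Bearer ' + token},
        files=files,
    )
    response.raise_for_status()
```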