bug-1863007: change UPLOAD_TEMPDIR_ORPHANS_CUTOFF to 15 minutes #2839
This drops the cutoff from 60 minutes to 15 minutes. The 60-minute cutoff kept orphaned files around far too long, so an instance could have disk-full problems for over an hour. A 15-minute cutoff will allow instances to recover more quickly.
15 minutes far exceeds the 6-minute timeout for HTTP request handling, so files that are still around after 15 minutes are certainly orphaned.
The Tecken webapp handles upload API requests. These requests can take a very long time to handle. If a request exceeds 6 minutes, it's possible that the gunicorn worker serving the request will get killed off. If that happens, the symbols files that the upload API handler was processing get orphaned and remain on disk. The disk is finite, so after enough of these events, the instance runs out of disk space and starts throwing out-of-disk errors.
There is also a disk cache manager process running in the Docker container. The `UPLOAD_TEMPDIR_ORPHANS_CUTOFF` setting affects how old a file can be before the disk cache manager determines it's an orphaned file and deletes it.

This reduces the cutoff from 60 minutes to 15 minutes by changing the default value. We don't set this value in the infrastructure configuration, so changing the default changes it everywhere. There isn't a whole lot to review here.
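For reviewers who want a picture of what the cutoff controls, here is a minimal sketch of an age-based orphan sweep. It is not the disk cache manager's actual code: the function names `find_orphans` and `delete_orphans`, the use of file modification time as the age measure, and the recursive directory walk are all assumptions made for illustration. Only the setting name and the 15-minute value come from this change.

```python
import os
import time

# Cutoff in minutes, mirroring the new default described in this PR.
UPLOAD_TEMPDIR_ORPHANS_CUTOFF = 15


def find_orphans(tempdir, cutoff_minutes=UPLOAD_TEMPDIR_ORPHANS_CUTOFF, now=None):
    """Return sorted paths under tempdir whose mtime is older than the cutoff.

    Hypothetical helper: assumes file age is measured by modification time.
    """
    now = time.time() if now is None else now
    cutoff_seconds = cutoff_minutes * 60
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(tempdir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except FileNotFoundError:
                # The upload handler may have removed the file already.
                continue
            if now - mtime > cutoff_seconds:
                orphans.append(path)
    return sorted(orphans)


def delete_orphans(tempdir, cutoff_minutes=UPLOAD_TEMPDIR_ORPHANS_CUTOFF, now=None):
    """Delete orphaned files and return the paths that were removed."""
    deleted = []
    for path in find_orphans(tempdir, cutoff_minutes, now):
        try:
            os.remove(path)
        except FileNotFoundError:
            # Another process removed it between the scan and the delete.
            continue
        deleted.append(path)
    return deleted
```

Under these assumptions, a file left behind by a killed worker 20 minutes ago would be removed with the new default, while the previous 60-minute cutoff would have left it on disk for another 40 minutes.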