
Conversation

@sgrif
Contributor

@sgrif sgrif commented Mar 31, 2017

Note: This PR is purposely split into two commits, and they should be deployed separately. The first commit stops adding data for the update-downloads script to touch; the second commit removes the script entirely. The second commit shouldn't be deployed until `SELECT COUNT(*) FROM version_downloads WHERE downloads != counted` returns 0 (which should happen pretty much immediately after deploying the first commit).

The current process of updating the download counters does some fairly unnecessary punting to a background job. It's possible that this was a premature performance optimization, but based on the comment that was there before, I suspect it was due to a misunderstanding of how locks work in PG, done this way in an attempt to make it less "racy".

The comment mentioned that the update was non-atomic and racy. Neither of those statements is in any way true, which is why we use an RDBMS in the first place. Each transaction will lock the rows it updates until the transaction commits or rolls back, and any other transactions will block until then.

With this change, all hits to `/download` will block each other on the final query, when they try to update the `metadata` table. However, as this is the final query before the transaction is committed, this shouldn't be a bottleneck until we reach millions of downloads per minute (at which point a periodic background worker which does `SELECT sum(downloads) FROM crates` is probably simpler).
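For concreteness, a minimal sketch of what the inline counting could look like. This is illustrative, not the PR's actual code; it assumes a rust-postgres-style `Connection`, a unique index on `version_downloads (version_id, date)`, and a `total_downloads` column on `metadata`:

```rust
// Minimal sketch only, not the PR's actual code. Table and column
// names beyond those mentioned in this thread are assumptions.
use postgres::{Connection, Result};

fn record_download(conn: &Connection, version_id: i32, crate_id: i32) -> Result<()> {
    let tx = conn.transaction()?;
    // Upsert today's per-version counter (assumes a unique index on
    // (version_id, date); ON CONFLICT requires PostgreSQL 9.5+).
    tx.execute(
        "INSERT INTO version_downloads (version_id, date, downloads)
         VALUES ($1, CURRENT_DATE, 1)
         ON CONFLICT (version_id, date)
         DO UPDATE SET downloads = version_downloads.downloads + 1",
        &[&version_id],
    )?;
    tx.execute("UPDATE versions SET downloads = downloads + 1 WHERE id = $1", &[&version_id])?;
    tx.execute("UPDATE crates SET downloads = downloads + 1 WHERE id = $1", &[&crate_id])?;
    // The single-row metadata update stays last, so concurrent requests
    // only serialize on the final statement before COMMIT.
    tx.execute("UPDATE metadata SET total_downloads = total_downloads + 1", &[])?;
    tx.commit()
}
```

Each `/download` request would run this in its own transaction; the row locks taken by the three per-id updates are disjoint across different crates, so the single-row `metadata` update at the end is the only point where all concurrent requests serialize.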

sgrif added 2 commits March 31, 2017 06:36
Now that we've let the script catch up and are updating these counters when the download occurs, this binary (and the columns it relied on) is useless.
@carols10cents
Member

I'm not comfortable making this big of a change; I'm not convinced that this wouldn't be a bottleneck at our current traffic. I'm going to pull out some of this, namely the upsert to `version_downloads`, in order to get master deployable again.

I don't know what would make me comfortable aside from loading up staging with the production database and running some load tests on staging.

@sgrif
Contributor Author

sgrif commented Mar 31, 2017

We could always do a partial rollout (e.g. 1% of requests) and see what the 95th percentile on that endpoint looks like.

@carols10cents
Member

idk which of these requires less infrastructure that we don't have 😅

@sgrif
Contributor Author

sgrif commented Mar 31, 2017

Do we not have newrelic set up? I can definitely just add an `if rand() < 0.01` test to do the 1% of requests.
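A sketch of that gate (the function name is made up; only the `rand() < 0.01` idea comes from the comment above):

```rust
extern crate rand;

fn count_inline_for_this_request() -> bool {
    // rand::random::<f64>() yields a uniform float in [0, 1),
    // so this returns true for roughly 1% of requests.
    rand::random::<f64>() < 0.01
}
```

The 1% of requests where this returns true would take the new inline-counting path, the rest would keep the existing background-job path, and the 95th percentiles of the two populations could then be compared.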

@carols10cents
Member

Does newrelic support Rust?

@carols10cents
Member

What if, instead, we slowly took `update-downloads`' responsibilities away from it: start by taking out the `version_downloads` updating, see how that performs, then if it does ok, move `versions` updating out, then `crates`, then `crate_downloads`, then `metadata`?
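Step one of that plan could be as small as moving only the `version_downloads` upsert inline, with the same caveats and assumed schema as the earlier sketch, while the script keeps propagating to the other counters:

```rust
// Sketch only: just the per-version daily counter moves inline;
// versions, crates, crate_downloads, and metadata stay with the
// background script for now.
use postgres::{Connection, Result};

fn record_version_download(conn: &Connection, version_id: i32) -> Result<()> {
    conn.execute(
        "INSERT INTO version_downloads (version_id, date, downloads)
         VALUES ($1, CURRENT_DATE, 1)
         ON CONFLICT (version_id, date)
         DO UPDATE SET downloads = version_downloads.downloads + 1",
        &[&version_id],
    )?;
    Ok(())
}
```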

@sgrif
Contributor Author

sgrif commented Apr 1, 2017

Seems fine by me

@carols10cents
Member

Ok, I'm going to close this then. If you have time before I do, please resubmit as piecemeal changes in multiple PRs. Thanks!

