
Conversation

@sgrif
Contributor

@sgrif sgrif commented Mar 31, 2017

Note: This PR is purposely split into two commits, and they should be deployed separately. The first commit stops adding data for the update-downloads script to touch; the second commit removes the script entirely. The second commit shouldn't be deployed until `SELECT COUNT(*) FROM version_downloads WHERE downloads != counted` returns 0 (which should happen pretty much immediately after deploying the first commit).

The current process of updating the download counters does some fairly unnecessary punting to a background job. It's possible that this was a premature performance optimization, but based on the comment that was there before, I suspect it was due to a misunderstanding of how locks work in PG, done this way in an attempt to make it less "racy".

The comment mentioned that the update was non-atomic and racy. Neither of those statements is in any way true, which is why we use an RDBMS in the first place. Each transaction will lock the rows it updates until the transaction commits or rolls back, and any other transactions will block until then.

With this change, all hits to `/download` will block each other on the final query, when they try to update the `metadata` table. However, as this is the final query before the transaction is committed, this shouldn't be a bottleneck until we reach millions of downloads per minute (at which point a periodic background worker which does `SELECT sum(downloads) FROM crates` is probably simpler).
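For concreteness, a minimal sketch of what the inline counting could look like. This is illustrative, not the PR's actual code; it assumes a rust-postgres-style `Connection`, a unique index on `version_downloads (version_id, date)`, and a `total_downloads` column on `metadata`:

```rust
// Minimal sketch only, not the PR's actual code. Table and column
// names beyond those mentioned in this thread are assumptions.
use postgres::{Connection, Result};

fn record_download(conn: &Connection, version_id: i32, crate_id: i32) -> Result<()> {
    let tx = conn.transaction()?;
    // Upsert today's per-version counter (assumes a unique index on
    // (version_id, date); ON CONFLICT requires PostgreSQL 9.5+).
    tx.execute(
        "INSERT INTO version_downloads (version_id, date, downloads)
         VALUES ($1, CURRENT_DATE, 1)
         ON CONFLICT (version_id, date)
         DO UPDATE SET downloads = version_downloads.downloads + 1",
        &[&version_id],
    )?;
    tx.execute("UPDATE versions SET downloads = downloads + 1 WHERE id = $1", &[&version_id])?;
    tx.execute("UPDATE crates SET downloads = downloads + 1 WHERE id = $1", &[&crate_id])?;
    // The single-row metadata update stays last, so concurrent requests
    // only serialize on the final statement before COMMIT.
    tx.execute("UPDATE metadata SET total_downloads = total_downloads + 1", &[])?;
    tx.commit()
}
```

Each `/download` request would run this in its own transaction; the row locks taken by the three per-id updates are disjoint across different crates, so the single-row `metadata` update at the end is the only point where all concurrent requests serialize.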

sgrif added 2 commits March 31, 2017 06:36
Now that we've let the script catch up and are updating these counters when the download occurs, this binary (and the columns it relied on) is useless.
@carols10cents
Member

I'm not comfortable making this big of a change; I'm not convinced that this wouldn't be a bottleneck at our current traffic. I'm going to pull out some of this, namely the upsert to `version_downloads`, in order to get master deployable again.

I don't know what would make me comfortable aside from loading up staging with the production database and running some load tests on staging.

@sgrif
Contributor Author

sgrif commented Mar 31, 2017

We could always do a partial rollout (e.g. 1% of requests) and see what the 95th percentile on that endpoint looks like.

@carols10cents
Member

idk which of these requires less infrastructure that we don't have 😅

@sgrif
Contributor Author

sgrif commented Mar 31, 2017

Do we not have newrelic set up? I can definitely just add an `if rand() < 0.01` test to do the 1% of requests.
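A sketch of that gate (the function name is made up; only the `rand() < 0.01` idea comes from the comment above):

```rust
extern crate rand;

fn count_inline_for_this_request() -> bool {
    // rand::random::<f64>() yields a uniform float in [0, 1),
    // so this returns true for roughly 1% of requests.
    rand::random::<f64>() < 0.01
}
```

The 1% of requests where this returns true would take the new inline-counting path, the rest would keep the existing background-job path, and the 95th percentiles of the two populations could then be compared.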

@carols10cents
Member

Does newrelic support Rust?

@carols10cents
Member

What if, instead, we slowly took `update-downloads`' responsibilities away from it: start by taking out the `version_downloads` updating, see how that performs, then if it does ok, move `versions` updating out, then `crates`, then `crate_downloads`, then `metadata`?
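Step one of that plan could be as small as moving only the `version_downloads` upsert inline, with the same caveats and assumed schema as the earlier sketch, while the script keeps propagating to the other counters:

```rust
// Sketch only: just the per-version daily counter moves inline;
// versions, crates, crate_downloads, and metadata stay with the
// background script for now.
use postgres::{Connection, Result};

fn record_version_download(conn: &Connection, version_id: i32) -> Result<()> {
    conn.execute(
        "INSERT INTO version_downloads (version_id, date, downloads)
         VALUES ($1, CURRENT_DATE, 1)
         ON CONFLICT (version_id, date)
         DO UPDATE SET downloads = version_downloads.downloads + 1",
        &[&version_id],
    )?;
    Ok(())
}
```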

@sgrif
Contributor Author

sgrif commented Apr 1, 2017

Seems fine by me

@carols10cents
Member

Ok, I'm going to close this then. If you have time before I do, please resubmit as piecemeal changes in multiple PRs. Thanks!

