Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the default sorting on search to something less expensive #1809

Merged
merged 1 commit into from Sep 11, 2019

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Aug 19, 2019

This changes the behavior of /api/v1/crates if no sort option is
provided. This does not change the behavior of the website when accessed
through a browser, as the Ember frontend always specifies a sort field.

The query run by search when sorting by recent downloads currently
accounts for nearly 40% of our total DB load. It takes ~60ms on average,
which means it's extremely high load. Our logs for traffic to this
endpoint with an explicit sort=recent-downloads don't match up with
this. We can fix this by making the query faster, but this has drawbacks
that were determined not to be worth it (see #1755). So if we can't
decrease execution time, the only other option is to decrease execution
count.

A significant number of crawlers hit us without specifying any sorting,
which should mean they don't care about ordering. I'm sure there's
someone out there who is relying on this behavior, but most crawlers
I've seen don't specify ordering, and none of them care about the order
they get the results in since they're crawling the whole registry
anyway.

What's important is that this lowers the execution time from ~60ms to
~12ms. And sorting on recent_downloads will grow linearly with the table
size, while this can be done on an index alone (which should be
log(log(n)) IIRC). >90% of the query is spent on getting the count,
which will be going away soon as we change pagination strategies,
bringing us to sub-millisecond execution.

I do feel strongly that we either need to make this change, or accept
the drawbacks in #1755, as this will soon become a scaling issue for us.

r? @jtgeibel

@kzys
Copy link
Contributor

kzys commented Aug 27, 2019

Looks good to me.

@bors
Copy link
Contributor

bors commented Sep 5, 2019

☔ The latest upstream changes (presumably #1807) made this pull request unmergeable. Please resolve the merge conflicts.

@sgrif sgrif force-pushed the sg-change-default-search-ordering branch 2 times, most recently from f01fd19 to 0738c75 Compare September 5, 2019 17:07
This changes the behavior of `/api/v1/crates` if no sort option is
provided. This does not change the behavior of the website when accessed
through a browser, as the Ember frontend always specifies a sort field.

The query run by search when sorting by recent downloads currently
accounts for nearly 40% of our total DB load. It takes ~60ms on average,
which means it's extremely high load. Our logs for traffic to this
endpoint with an explicit `sort=recent-downloads` don't match up with
this. We can fix this by making the query faster, but this has drawbacks
that were determined not to be worth it (see rust-lang#1755). So if we can't
decrease execution time, the only other option is to decrease execution
count.

A significant number of crawlers hit us without specifying any sorting,
which should mean they don't care about ordering. I'm sure there's
someone out there who is relying on this behavior, but most crawlers
I've seen don't specify ordering, and none of them care about the order
they get the results in since they're crawling the whole registry
anyway.

What's important is that this lowers the execution time from ~60ms to
~12ms. And sorting on recent_downloads will grow linearly with the table
size, while this can be done on an index alone (which should be
log(log(n)) IIRC). >90% of the query is spent on getting the count,
which will be going away soon as we change pagination strategies,
bringing us to sub-millisecond execution.

I do feel strongly that we either need to make this change, or accept
the drawbacks in rust-lang#1755, as this will soon become a scaling issue for us.
@sgrif sgrif force-pushed the sg-change-default-search-ordering branch from 0738c75 to 75f1b01 Compare September 5, 2019 17:10
@jtgeibel
Copy link
Member

This change looks good to me. Since this doesn't affect the sort order in the frontend, I see no reason that this would need to go through a formal RFC (as originally done in 1824).

@bors r+

@bors
Copy link
Contributor

bors commented Sep 11, 2019

📌 Commit 75f1b01 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Sep 11, 2019

⌛ Testing commit 75f1b01 with merge 6f5e217...

bors added a commit that referenced this pull request Sep 11, 2019
…ibel

Change the default sorting on search to something less expensive

This changes the behavior of `/api/v1/crates` if no sort option is
provided. This does not change the behavior of the website when accessed
through a browser, as the Ember frontend always specifies a sort field.

The query run by search when sorting by recent downloads currently
accounts for nearly 40% of our total DB load. It takes ~60ms on average,
which means it's extremely high load. Our logs for traffic to this
endpoint with an explicit `sort=recent-downloads` don't match up with
this. We can fix this by making the query faster, but this has drawbacks
that were determined not to be worth it (see #1755). So if we can't
decrease execution time, the only other option is to decrease execution
count.

A significant number of crawlers hit us without specifying any sorting,
which should mean they don't care about ordering. I'm sure there's
someone out there who is relying on this behavior, but most crawlers
I've seen don't specify ordering, and none of them care about the order
they get the results in since they're crawling the whole registry
anyway.

What's important is that this lowers the execution time from ~60ms to
~12ms. And sorting on recent_downloads will grow linearly with the table
size, while this can be done on an index alone (which should be
log(log(n)) IIRC). >90% of the query is spent on getting the count,
which will be going away soon as we change pagination strategies,
bringing us to sub-millisecond execution.

I do feel strongly that we either need to make this change, or accept
the drawbacks in #1755, as this will soon become a scaling issue for us.

r? @jtgeibel
@bors
Copy link
Contributor

bors commented Sep 11, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 6f5e217 to master...

@bors bors merged commit 75f1b01 into rust-lang:master Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants