Order first by exact match in results #287

Merged
alexcrichton merged 1 commit into rust-lang:master from better-ranking-results on Apr 11, 2016

Conversation

jespino
Contributor

@jespino jespino commented Mar 14, 2016

No description provided.

@alexcrichton
Member

Hm I'm worried that there's a deeper problem here we may want to fix. If the normal postgres ranking isn't putting the exact crate match at the very top, then something about how we're ranking results seems quite wrong, so maybe postgres was accidentally misconfigured?

@jespino
Contributor Author

jespino commented Mar 15, 2016

I don't think this is a PostgreSQL configuration problem. The ts_rank function calculates the rank based on some weights. The cargo crate has no keywords and includes the word cargo only once in its description, so other packages that include the word cargo in their name, keywords, and description appear before it. For example, cargo-edit includes the word cargo in its name, twice in its keywords, and three times in its description.

Tokenization is a problem too: to PostgreSQL, cargo-edit is two words, cargo and edit. This means that every cargo-anything crate will have cargo in its name, and probably in its keywords and description as well. I'm not sure we can fix this without a small "hack" to force exact-match priority.
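To make the effect described above concrete, here is a toy model in Python. It is not PostgreSQL's ts_rank: the field weights are assumptions, and the keyword and description strings are placeholders that only reproduce the occurrence counts mentioned in this comment.

```python
import re

# Hypothetical per-field weights; the values crates.io actually uses are not shown in this thread.
WEIGHTS = {"name": 1.0, "keywords": 0.4, "description": 0.2}

def tokenize(text):
    """Split on anything that is not a letter or digit, so 'cargo-edit' becomes ['cargo', 'edit']."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def toy_rank(crate, query):
    """Weighted count of query occurrences per field (a crude stand-in for ts_rank)."""
    return sum(
        weight * tokenize(crate[field]).count(query)
        for field, weight in WEIGHTS.items()
    )

# Placeholder field contents: only the number of times "cargo" appears matters here.
crates = [
    {"name": "cargo", "keywords": "", "description": "placeholder text mentioning cargo once"},
    {
        "name": "cargo-edit",
        "keywords": "cargo cargo-subcommand",
        "description": "cargo one, cargo two, cargo three",
    },
]

ranked = sorted(crates, key=lambda c: toy_rank(c, "cargo"), reverse=True)
print([c["name"] for c in ranked])
```

Under these assumed weights the exact match loses simply because the other crate repeats the word more often, which matches the behavior reported here.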

@alexcrichton
Member

This seems like our weights might be incorrect though? Sounds like we should much more heavily weight the title than we currently are (so it trumps everything) and basically go along the lines of that.

@jespino
Contributor Author

jespino commented Mar 15, 2016

I'm not sure changing the weights will be enough, because the word cargo is included in a lot of names, so the exact match on cargo doesn't get a better rank. A combination of a very heavy weight on the title and a normalization that ensures shorter names come first might work (http://www.postgresql.org/docs/9.4/static/textsearch-controls.html#TEXTSEARCH-RANKING).

Anyway, I have to test it properly; I am trying to get the crates-io backend working.
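The linked PostgreSQL page describes a normalization argument for ts_rank, one option of which divides the rank by the document length. The sketch below illustrates that idea in plain Python, applied to crate names only. The helper names are hypothetical and the token-splitting rule is an assumption, not PostgreSQL's parser.

```python
def name_tokens(name):
    """Assumed tokenization: treat '-' and '_' as word separators."""
    return [t for t in name.lower().replace("_", "-").split("-") if t]

def normalized_name_rank(name, query):
    """Occurrences of the query in the name, divided by the number of name tokens."""
    tokens = name_tokens(name)
    if not tokens:
        return 0.0
    return tokens.count(query) / len(tokens)

names = ["cargo-edit", "cargo"]
by_rank = sorted(names, key=lambda n: normalized_name_rank(n, "cargo"), reverse=True)
print(by_rank)
```

With length normalization the one-word name scores higher than any multi-word name containing the same word, which is the "shorter names come first" effect described above.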

@alexcrichton
Member

Hm ok, let's try and use the normal ranking functions in postgres then? If you need help setting up the backend just let me know

@jespino
Contributor Author

jespino commented Mar 19, 2016

I have pushed another idea. If everything is in the same search block, I can't give more importance to "shorter" crate names. So calculating the rank for the crate name separately, with a normalization based on the number of "words", gives a better result.

The problem here is that any package with the word in its name will always appear before a package without the word in its name, regardless of the number of times the word appears in the rest of the fields.

I have to keep thinking about it, but I think the correct approach may be a variant of this approach.
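A small Python sketch of why a two-level ordering behaves this way. The rank pairs are invented for illustration, and "bigmath" is a made-up crate name; the point is only that the second key is consulted when the first one ties.

```python
# Hypothetical (name_rank, rest_rank) pairs; none of these numbers come from crates.io.
results = {
    "num": (1.0, 0.10),
    "num-traits": (0.5, 0.30),
    "bigmath": (0.0, 0.95),  # made-up crate: the query appears only outside the name
}

def two_level_order(results):
    """Sort by name rank first; the rank of the other fields only breaks ties."""
    return sorted(results, key=lambda name: results[name], reverse=True)

print(two_level_order(results))
```

However large the second number gets for a crate whose name does not match, it cannot move ahead of a crate whose name does match.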

@alexcrichton
Member

Hm I'm not quite following what the literal postgres query is doing here, but it seems legit? We can always play around with various tweaks to see what gives us the best results. So long as it's better than the previous it seems nice :)

Do you think that any tests could be added for this as well?

@cuviper
Member

cuviper commented Mar 26, 2016

For example, when searching "num" the exact crate is the fourth result. I'm not even sure how the first result try_into_opt came up at all. But I'm glad you're working on this. :)

(Seems like the high download count should factor in too, but that's a separate concern...)

@jespino jespino force-pushed the better-ranking-results branch 2 times, most recently from e90d046 to 56ca925 on April 6, 2016 19:14
@jespino
Contributor Author

jespino commented Apr 6, 2016

I reviewed it and went back to the first approach: the latest version simply puts the exact match at the top and then orders the rest of the results by their ts_rank result.

I revisited the idea of separate rankings for the name field and the rest of the fields, but it doesn't work well: to get the exact match first I have to normalize the matches by field size, so the rank of the name ends up based on the length of the field, and the rank of the rest of the fields is left as a secondary ordering, which gives worse results.

Another idea, building on the previous paragraph, is to calculate the rank as a combination of the rank of the name and the rank of the rest of the fields, but it is more complex and harder to understand.

I think the solution of exact match first, followed by the Postgres-ranked results, is good enough for all cases, has the expected behavior, and is really simple to understand.

I will add a test too, to ensure the exact match always goes first.
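The ordering described in this comment can be sketched in Python as a two-part sort key: exact name match first, then descending rank. This is an illustration only; the actual query in the merged commit is not shown in this thread, the rank values are invented, and "num-foo" and "num-bar" are hypothetical crate names.

```python
def order_results(results, query):
    """Put the exact name match first, then sort the rest by descending rank.

    `results` is a list of (crate_name, rank) pairs; the rank stands in for ts_rank.
    """
    # False sorts before True, so the exact match (name != query is False) comes first.
    return sorted(results, key=lambda item: (item[0] != query, -item[1]))

def check_exact_match_first():
    """A minimal check in the spirit of the test mentioned above."""
    results = [
        ("try_into_opt", 0.9),
        ("num-foo", 0.8),
        ("num-bar", 0.7),
        ("num", 0.6),  # exact match with the lowest invented rank
    ]
    ordered = order_results(results, "num")
    assert ordered[0][0] == "num"
    rest_ranks = [rank for _, rank in ordered[1:]]
    assert rest_ranks == sorted(rest_ranks, reverse=True)
    return ordered

print(check_exact_match_first())
```

The appeal of this design is that the ranking function is left untouched: the exact match is handled by one extra sort key, and everything else keeps its existing order.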

@jespino jespino changed the title [WIP] Order first by exact match in results Order first by exact match in results Apr 6, 2016
@alexcrichton
Member

Ok, seems like it's at least an improvement over the status quo, so let's do that. Thanks @jespino!

@alexcrichton alexcrichton merged commit 80c91bb into rust-lang:master Apr 11, 2016

3 participants