Autocomplete: Score not given as expected #203

Closed
borisdaeppen opened this Issue May 8, 2012 · 5 comments

4 participants
@borisdaeppen

Autocomplete has strange behavior here:

http://api.metacpan.org//v0/search/autocomplete?&q=Log++Log4perl

     [...]
     {
        "_score" : 1.169038,
        "fields" : {
           "documentation" : "Tie::Log4perl",
           "release" : "Tie-Log4perl-0.1",
           "author" : "FRODWITH",
           "distribution" : "Tie-Log4perl"
        },
        [...]
     },
     {
        "_score" : 1.169038,
        "fields" : {
           "documentation" : "Log::Log4perl",
           "release" : "Log-Log4perl-1.36",
           "author" : "MSCHILLI",
           "distribution" : "Log-Log4perl"
        },
        [...]
     },
     {
        "_score" : 1.1590381,
        "fields" : {
           "documentation" : "Test::Log4perl",
           "release" : "Test-Log4perl-0.1001",
           "author" : "FOTANGO",
           "distribution" : "Test-Log4perl"
        },
        [...]
     },
     [...]

Both Tie::Log4perl and Log::Log4perl have the same score, 1.169038.
Even worse, Tie::Log4perl is placed first. If I add the parameter &size=1 to the request URL, I get only Tie::Log4perl, even though I was asking for a match with Log::Log4perl.
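Because the two hits carry identical `_score` values, nothing in the response decides which of them should come first, so a client such as perlybook.org could break such ties itself. The sketch below is a hypothetical helper (`rerank_ties` is not part of any MetaCPAN API). It assumes hits shaped like the JSON above, with the module name in `fields.documentation`, and reorders only hits whose scores are equal, preferring an exact match of the requested name, then a case-insensitive match.

```python
from typing import Any


def rerank_ties(hits: list[dict[str, Any]], wanted: str) -> list[dict[str, Any]]:
    """Sort by descending _score; among equal scores, prefer the wanted name.

    Hypothetical helper: assumes each hit looks like the autocomplete JSON,
    i.e. {"_score": float, "fields": {"documentation": str, ...}}.
    """
    def key(hit: dict[str, Any]) -> tuple[float, bool, bool, str]:
        doc = hit["fields"]["documentation"]
        return (
            -hit["_score"],                 # higher score first
            doc != wanted,                  # exact match wins a tie (False sorts first)
            doc.lower() != wanted.lower(),  # then a case-insensitive match
            doc,                            # finally alphabetical, for a stable order
        )

    return sorted(hits, key=key)


hits = [
    {"_score": 1.169038, "fields": {"documentation": "Tie::Log4perl"}},
    {"_score": 1.169038, "fields": {"documentation": "Log::Log4perl"}},
    {"_score": 1.1590381, "fields": {"documentation": "Test::Log4perl"}},
]
print(rerank_ties(hits, "Log::Log4perl")[0]["fields"]["documentation"])  # Log::Log4perl
```

Keeping the score as the primary sort key means a hit the server scored higher is never demoted; promoting exact matches unconditionally would be a different, more aggressive choice.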

This looks like a bug to me...

I use the autocomplete functionality at http://perlybook.org/ to guess what the user is asking for. It works well because it can even handle typos in user input. But in this case, the user just gets a completely wrong result.

Any ideas on that?

@monken

monken May 10, 2012

Member

You will see that the results for http://api.metacpan.org//v0/search/autocomplete?&q=Log++Log4perl and http://api.metacpan.org//v0/search/autocomplete?&q=Log4perl are the same. This is because both queries are tokenized into the terms "log", "4", and "perl" (duplicates are removed, so the repeated "log" in the first query makes no difference). I agree that this is not the result one would expect, but it's not entirely random. We need to tweak the search algorithm a bit to address this issue.
So thanks for reporting this case!
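The effect described above can be reproduced with a rough stand-in for the analyzer. This sketch is an assumption, not MetaCPAN's actual Elasticsearch configuration: it lowercases the input, splits it into runs of letters and runs of digits (so separators such as spaces, `::`, or the `+` that encodes a space in a URL query all act as breaks), and drops duplicate terms. Under those assumptions both queries reduce to the same terms, and both `Tie::Log4perl` and `Log::Log4perl` contain every one of them, which is at least consistent with the two hits scoring alike.

```python
import re


def terms(text: str) -> list[str]:
    """Approximate tokenizer (assumption, not MetaCPAN's real analyzer).

    Lowercases, splits into runs of letters or runs of digits (so "Log4perl"
    becomes "log", "4", "perl"), and keeps only the first occurrence of each term.
    """
    tokens = re.findall(r"[a-z]+|[0-9]+", text.lower())
    return list(dict.fromkeys(tokens))  # de-duplicate, preserving order


print(terms("Log  Log4perl"))  # ['log', '4', 'perl']
print(terms("Log4perl"))       # ['log', '4', 'perl']

# Every query term also appears in both module names, so neither name is
# favoured on term coverage alone.
query = set(terms("Log  Log4perl"))
for name in ("Tie::Log4perl", "Log::Log4perl"):
    print(name, query <= set(terms(name)))  # True for both
```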

@dvergin

dvergin Jun 11, 2012

Similarly, a request at http://perlybook.org/ for "Moo" returns the docs for "ppt". A visit to api.metacpan.org shows that the two have the same score. In this case, the issue does not seem to hinge on the tokenizing and duplicate removal that monken describes for Log4perl.

Checking http://api.metacpan.org//v0/search/autocomplete?&q=Moo, I note that the ppt distribution is listed with "documentation" : "moo".

Knowing next to nothing about metacpan, I can't speculate about the source of that confusion or whether the "documentation" anomaly is cause or symptom.

@monken

monken Jun 17, 2012

Member

This (https://github.com/CPAN-API/metacpan-web/blob/master/lib/MetaCPAN/Web/Model/API/Module.pm#L52) is the query that metacpan.org sends to the /autocomplete endpoint. That endpoint autocompletes across all modules on CPAN, so you need to filter the results further to get useful results.
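One way to filter the results further on the client side is to keep only hits whose name actually begins with what the user typed. The helpers below are hypothetical (`normalize` and `prefix_filter` are illustrative names, not MetaCPAN code). They assume hits shaped like the autocomplete JSON in the original report and treat any run of non-alphanumeric characters, such as spaces or `::`, as a single namespace separator.

```python
import re
from typing import Any


def normalize(name: str) -> str:
    """Lowercase and collapse separator runs ("::", spaces, "+") into "::"."""
    # strip(":") removes separators left at either end, e.g. " Moo " -> "moo"
    return re.sub(r"[^0-9a-z]+", "::", name.lower()).strip(":")


def prefix_filter(hits: list[dict[str, Any]], typed: str) -> list[dict[str, Any]]:
    """Keep hits whose documentation name starts with the typed text (hypothetical helper)."""
    wanted = normalize(typed)
    return [h for h in hits if normalize(h["fields"]["documentation"]).startswith(wanted)]


hits = [
    {"_score": 1.169038, "fields": {"documentation": "Tie::Log4perl"}},
    {"_score": 1.169038, "fields": {"documentation": "Log::Log4perl"}},
    {"_score": 1.1590381, "fields": {"documentation": "Test::Log4perl"}},
]
print([h["fields"]["documentation"] for h in prefix_filter(hits, "Log  Log4perl")])
# ['Log::Log4perl']
```

This is a plain string prefix test, so typing "Moo" would also keep names such as "Moose"; that suits autocompletion but not an exact lookup.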

@monken

monken Jun 17, 2012

Member

Sorry, I was wrong: we send that query to /file/_search, not to /search/autocomplete. Let me check how to fix your query.

@ranguard

ranguard Nov 13, 2014

Member

Closing as part of Nov 2014 cleanup

ranguard closed this Nov 13, 2014
