Field length norm calculation is wrong #10667

Closed
loevenwong opened this issue Apr 20, 2015 · 2 comments

@loevenwong commented Apr 20, 2015

Hi,

The field-norm calculation appears to be wrong. Steps to reproduce:

# clear index
curl -XDELETE 'localhost:9200/test'

# add some data
curl 'http://localhost:9200/test/sample/five' -d '{ "name" : "one two three four five" }'
curl 'http://localhost:9200/test/sample/four' -d '{ "name" : "one two three four" }'
curl 'http://localhost:9200/test/sample/three' -d '{ "name" : "one two three" }'

# search for "one two three"; id:three is expected as the first result, with the best score
curl -s 'http://localhost:9200/test/sample/_search?q="one%20two%20three"&pretty=true'

# explain the result; the expected fieldNorm values were 0.577, 0.5 and 0.447 (1/sqrt(n) for 3, 4 and 5 terms)
curl -s 'http://localhost:9200/test/sample/_search?q="one%20two%20three"&pretty=true&explain' | grep -B 1 'fieldNorm'

The Lucene results are in correct order with different scores.

Thanks in advance,
Timo

@rjernst (Member) commented Apr 23, 2015

@loevenwong Lucene encodes norms using 8 bits. This means precision can be lost when encoding. You can see it explained here:
https://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/DefaultSimilarity

The important bit to see is this:

The rationale supporting such lossy compression of norm values is that given the difficulty (and inaccuracy) of users to express their true information need by a query, only big differences matter.
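
As a rough illustration of that precision loss, the following Python sketch ports the one-byte encoding as it appears in Lucene's `SmallFloat.floatToByte315` and `byte315ToFloat`. It assumes that DefaultSimilarity computes the length norm as 1/sqrt(numTerms) with no index-time boost; the Python function names are hypothetical and are not part of any Lucene or Elasticsearch API.

```python
import math
import struct

# Offset used by the "315" format (zero-exponent point 15), as in Lucene's SmallFloat.
_OFFSET = (63 - 15) << 3


def float_to_byte315(f: float) -> int:
    """Encode a float into one unsigned byte (sketch of SmallFloat.floatToByte315)."""
    # Reinterpret the 32-bit float as a signed 32-bit integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21  # keep the exponent and the top mantissa bits only
    if small <= _OFFSET:
        return 0 if bits <= 0 else 1  # underflow: zero/negative -> 0, tiny positive -> 1
    if small >= _OFFSET + 0x100:
        return 255  # overflow: clamp to the largest representable value
    return small - _OFFSET


def byte315_to_float(b: int) -> float:
    """Decode one unsigned byte back into a float (sketch of SmallFloat.byte315ToFloat)."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_field_norm(num_terms: int) -> float:
    """Length norm after a round trip through the one-byte encoding (boost assumed 1.0)."""
    exact = 1.0 / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(exact))


if __name__ == "__main__":
    for n in (3, 4, 5):
        exact = 1.0 / math.sqrt(n)
        print(f"{n} terms: exact={exact:.3f} stored={stored_field_norm(n)}")
    # 3 terms: exact=0.577 stored=0.5
    # 4 terms: exact=0.500 stored=0.5
    # 5 terms: exact=0.447 stored=0.4375
```

Under these assumptions the three-term and four-term fields decode to the same norm, 0.5, so field length alone cannot rank "one two three" above "one two three four".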

@rjernst closed this Apr 23, 2015

@loevenwong (Author) commented May 11, 2015

@rjernst Thank you for your reply.
FYI: I've found a matching Lucene issue: https://issues.apache.org/jira/browse/LUCENE-5005
