Field length norm calculation is wrong #10667

Closed
loevenwong opened this Issue Apr 20, 2015 · 2 comments

loevenwong commented Apr 20, 2015

Hi,

the field-norm calculation is wrong.

# clear index
curl -XDELETE 'localhost:9200/test'

# add some data
curl 'http://localhost:9200/test/sample/five' -d '{ "name" : "one two three four five" }'
curl 'http://localhost:9200/test/sample/four' -d '{ "name" : "one two three four" }'
curl 'http://localhost:9200/test/sample/three' -d '{ "name" : "one two three" }'

# search for "one two three"; expect id:three as the first result, with the best score
curl -s 'http://localhost:9200/test/sample/_search?q="one%20two%20three"&pretty=true'

# explain the result; the expected fieldNorm values were 0.577, 0.5, 0.447
curl -s 'http://localhost:9200/test/sample/_search?q="one%20two%20three"&pretty=true&explain' | grep -B 1 'fieldNorm'

With Lucene, the results are in the correct order, with different scores.

Thanks in advance,
Timo

Member

rjernst commented Apr 23, 2015

@loevenwong Lucene encodes norms using 8 bits. This means precision can be lost when encoding. You can see it explained here:
https://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/DefaultSimilarity

The important bit to see is this:

The rationale supporting such lossy compression of norm values is that given the difficulty (and inaccuracy) of users to express their true information need by a query, only big differences matter.
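
To make the precision loss concrete, here is a minimal Python sketch of an 8-bit norm encoding. It is modeled on Lucene's `SmallFloat.floatToByte315` / `byte315ToFloat` as best recalled, and it assumes the norm before encoding is 1/sqrt(number of terms) with no boost applied. The function names are illustrative only; they are not a Lucene or Elasticsearch API.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Squeeze a float32 into one byte by keeping only its top bits (lossy)."""
    # Reinterpret the float32 bit pattern as a signed 32-bit integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21                # drop the 21 low-order bits
    zero = (63 - 15) << 3             # offset of the smallest representable value
    if small <= zero:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, tiny positive -> 1
    if small >= zero + 0x100:
        return 255                    # too large: clamp to the biggest byte
    return small - zero


def byte315_to_float(b: int) -> float:
    """Expand the byte back into a float32; the dropped bits come back as zeros."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def decoded_norm(num_terms: int) -> float:
    """Assumed length norm 1/sqrt(num_terms), after an encode/decode round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


for n in (3, 4, 5):
    print(n, round(1.0 / math.sqrt(n), 3), decoded_norm(n))
# 3 0.577 0.5
# 4 0.5 0.5
# 5 0.447 0.4375
```

Under these assumptions the three-term and four-term documents end up with the same stored norm (0.5), which would explain why id:three does not score above id:four.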

rjernst closed this Apr 23, 2015

loevenwong commented May 11, 2015

@rjernst Thank you for your reply.
FYI: I've found a matching Lucene issue: https://issues.apache.org/jira/browse/LUCENE-5005
