synonym and bq duplicate bug for solr 3.x and 4.x #31

Closed
arcadius opened this issue Oct 8, 2013 · 4 comments

@arcadius

arcadius commented Oct 8, 2013

Hello

I have noticed that when using the edismax bq parameter, the bq gets applied twice: once to the synonym expansion and once as an addition to the overall score.

I am using a fresh download of Solr 4.5.

I have a synonym file containing only one line:

usa, united states of america

I have the following 2 documents, which I index into Solr using post.jar:

[
   {
      "id":"1",
      "name":"Enterprises in the USA.",
      "cat":"Y"
   },
   {
      "id":"2",
      "name":"Enterprises in the United States of America.",
      "cat":"Y"
   }
]

When I search for "USA" using the following query:

http://localhost:8983/solr/collection1/select?q="usa"&pf=name&qf=name&bq=cat:Y^1000&debugQuery=true&defType=synonym_edismax&synonyms=true

I get the following back in the debug output:

<str name="parsedquery">(+(DisjunctionMaxQuery((name:usa)) (((+DisjunctionMaxQuery((name:"united states of america")) () cat:Y^1000.0)/no_coord))) () cat:Y^1000.0)/no_coord</str>

<str name="parsedquery_toString">+((name:usa) ((+(name:"united states of america") () cat:Y^1000.0))) () cat:Y^1000.0</str>

This means that doc 2 ranks first, because the expansion "united states of america" gets one extra bq boost.

Note the duplicate cat:Y^1000.0 in parsedquery. Could the first one, inside the synonym subquery, be the actual bug?

The same happens when I search for "United States of America":

http://localhost:8983/solr/collection1/select?q="united states of america"&pf=name&qf=name&bq=cat:Y^1000&debugQuery=true&defType=synonym_edismax&synonyms=true

I get:

<str name="parsedquery">(+(DisjunctionMaxQuery((name:"united states of america")) (((+DisjunctionMaxQuery((name:usa)) () cat:Y^1000.0)/no_coord))) () cat:Y^1000.0)/no_coord</str>

<str name="parsedquery_toString">+((name:"united states of america") ((+(name:usa) () cat:Y^1000.0))) () cat:Y^1000.0</str>

This means that doc 1, which contains the expansion "USA", gets one additional bq boost.

In general, the bug is that the synonym expansion gets a duplicate bq boost.
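The effect on ranking can be reproduced with a toy additive model of the first parsed query above. This is a hypothetical sketch, not the plugin's code: every text match scores a made-up 1.0, bq adds its 1000.0, and Lucene's tf-idf, coord and query normalization are ignored. `buggy_score` mirrors the reported query shape; `fixed_score` is the shape one would expect, with bq applied only once at the top level.

```python
from dataclasses import dataclass
from typing import Optional

BQ_BOOST = 1000.0  # from bq=cat:Y^1000 in the request URL


@dataclass(frozen=True)
class Doc:
    id: str
    phrases: frozenset  # hypothetical: phrases the name field matches after analysis
    cat: str


def bq(doc: Doc) -> float:
    """Contribution of the optional clause cat:Y^1000."""
    return BQ_BOOST if doc.cat == "Y" else 0.0


def buggy_score(doc: Doc, original: str, expansion: str) -> Optional[float]:
    """Mirrors +((name:original) ((+(name:expansion) () cat:Y^1000))) () cat:Y^1000."""
    matched = False
    score = 0.0
    if original in doc.phrases:
        matched = True
        score += 1.0
    if expansion in doc.phrases:
        matched = True
        # The expansion subquery carries its own copy of bq.
        score += 1.0 + bq(doc)
    if not matched:
        return None  # the required clause failed, so the doc is not a hit
    return score + bq(doc)  # top-level bq


def fixed_score(doc: Doc, original: str, expansion: str) -> Optional[float]:
    """Expected shape: +((name:original) (name:expansion)) () cat:Y^1000."""
    hits = [p for p in (original, expansion) if p in doc.phrases]
    if not hits:
        return None
    return float(len(hits)) + bq(doc)  # bq applied exactly once


if __name__ == "__main__":
    docs = [
        Doc("1", frozenset({"usa"}), "Y"),
        Doc("2", frozenset({"united states of america"}), "Y"),
    ]
    for doc in docs:
        print(doc.id,
              buggy_score(doc, "usa", "united states of america"),
              fixed_score(doc, "usa", "united states of america"))
```

Running it prints `1 1001.0 1001.0` and `2 2001.0 1001.0`: with the duplicated bq, doc 2 outranks doc 1 by exactly one extra bq boost, as in the report, while with bq applied once the two documents tie in this toy model.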

Note that neither synonyms.originalBoost nor synonyms.synonymBoost is being used.

I have seen this issue in Solr 3.6 as well as 4.5.

@nolanlawson
Member

Bug confirmed.

Incidentally, this led me to discover #33.

@avlukanin
Collaborator

Yes, I came across this bug as well. Note that pf still works correctly even though it is applied twice, because it effectively applies only once: the original and the expansion are different phrases. But if I boost by some other field (e.g. boost=popularity), the synonyms are boosted twice, which is wrong.
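A second toy sketch shows how a multiplicative boost compounds. It assumes, as the comment describes, that the expansion subquery is boosted in addition to the whole query; the 1.0 text scores and the popularity of 3.0 are made-up values, and edismax's boost parameter is modeled simply as a multiplier on the score.

```python
def toy_score(original_match: bool, expansion_match: bool,
              popularity: float, boost_expansion_too: bool) -> float:
    """Toy score for one document under a multiplicative boost=popularity."""
    total = 0.0
    if original_match:
        total += 1.0  # hypothetical text score for the original term
    if expansion_match:
        inner = 1.0  # hypothetical text score for the synonym expansion
        if boost_expansion_too:
            inner *= popularity  # buggy: the expansion subquery is boosted on its own
        total += inner
    return total * popularity  # top-level boost, which should be the only one


if __name__ == "__main__":
    for buggy in (True, False):
        print("buggy" if buggy else "fixed",
              toy_score(True, False, 3.0, buggy),   # doc matching the original term
              toy_score(False, True, 3.0, buggy))   # doc matching only the synonym
```

It prints `buggy 3.0 9.0` and `fixed 3.0 3.0`: under the duplicated boost, a document matched only through the synonym is weighted by popularity squared rather than popularity.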

@arcadius
Author

arcadius commented Nov 1, 2013

Hello Nolan.
Thank you very much for looking into this.
I will test the latest version and let you know how it goes.

Thanks.

Arcadius.

@nolanlawson
Member

OK. This will be in the upcoming 1.3.3 release, FYI.
