Connective boosts support #959
It seems the problem comes from the JSON parser when several fulltext queries are specified in a single search request. This example fails:
search = Sunspot.search(Post) do
fulltext('Post Ipsum') do
boost_fields :body => 0.2
minimum_match 1
end
boost(0.9) do
with(:blog_id, 1)
end
boost(function() { div(field(:average_rating), 100) })
fulltext('Post') do
minimum_match 1
end
end
This one passes:
search = Sunspot.search(Post) do
fulltext('Post Ipsum') do
boost_fields :body => 0.2
minimum_match 1
end
boost(0.9) do
with(:blog_id, 1)
end
boost(function() { div(field(:average_rating), 100) })
end
This happens only with the JSON format. |
This is 👍, hope you'll sort out the JSON issue. |
Here are the raw docs that are being sent to Solr using the XML and JSON formats:
{
:data => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><add><doc boost=\"7.75\"><field name=\"id\">Post 1</field><field name=\"type\">Post</field><field name=\"type\">SuperClass</field><field name=\"type\">MockRecord</field><field name=\"class_name\">Post</field><field name=\"title_ss\">Post</field><field name=\"blog_id_i\">1</field><field name=\"category_ids_im\">3</field><field name=\"average_rating_ft\">30.0</field><field name=\"sort_title_s\">post</field><field name=\"primary_category_id_i\">3</field><field name=\"last_indexed_at_ds\">2019-10-01T13:07:36Z</field><field name=\"legacy_field_s\">legacy Post</field><field name=\"legacy_array_field_sm\">first string</field><field name=\"legacy_array_field_sm\">second string</field><field boost=\"2\" name=\"title_text\">Post</field><field boost=\"3\" name=\"text_array_text\">Post</field><field boost=\"3\" name=\"text_array_text\">Post</field><field name=\"body_textsv\">Lorem</field><field name=\"backwards_title_text\">tsoP</field><field name=\"custom_integer:3_i\">1</field></doc><doc boost=\"15.25\"><field name=\"id\">Post 2</field><field name=\"type\">Post</field><field name=\"type\">SuperClass</field><field name=\"type\">MockRecord</field><field name=\"class_name\">Post</field><field name=\"title_ss\">Post</field><field name=\"blog_id_i\">2</field><field name=\"category_ids_im\">2</field><field name=\"average_rating_ft\">60.0</field><field name=\"sort_title_s\">post</field><field name=\"primary_category_id_i\">2</field><field name=\"last_indexed_at_ds\">2019-10-01T13:07:36Z</field><field name=\"legacy_field_s\">legacy Post</field><field name=\"legacy_array_field_sm\">first string</field><field name=\"legacy_array_field_sm\">second string</field><field boost=\"2\" name=\"title_text\">Post</field><field boost=\"3\" name=\"text_array_text\">Post</field><field boost=\"3\" name=\"text_array_text\">Post</field><field name=\"body_textsv\">Ipsum</field><field name=\"backwards_title_text\">tsoP</field><field 
name=\"custom_integer:2_i\">1</field></doc><doc boost=\"22.75\"><field name=\"id\">Post 3</field><field name=\"type\">Post</field><field name=\"type\">SuperClass</field><field name=\"type\">MockRecord</field><field name=\"class_name\">Post</field><field name=\"title_ss\">Post</field><field name=\"blog_id_i\">3</field><field name=\"category_ids_im\">1</field><field name=\"average_rating_ft\">90.0</field><field name=\"sort_title_s\">post</field><field name=\"primary_category_id_i\">1</field><field name=\"last_indexed_at_ds\">2019-10-01T13:07:36Z</field><field name=\"legacy_field_s\">legacy Post</field><field name=\"legacy_array_field_sm\">first string</field><field name=\"legacy_array_field_sm\">second string</field><field boost=\"2\" name=\"title_text\">Post</field><field boost=\"3\" name=\"text_array_text\">Post</field><field boost=\"3\" name=\"text_array_text\">Post</field><field name=\"body_textsv\">Dolor</field><field name=\"backwards_title_text\">tsoP</field><field name=\"custom_integer:1_i\">1</field></doc></add>",
:headers => { "Content-Type" => "text/xml" }
}
{
:data => "{\"add\":{\"boost\":7.75,\"doc\":{\"id\":\"Post 1\",\"type\":[\"Post\",\"SuperClass\",\"MockRecord\"],\"class_name\":\"Post\",\"title_ss\":\"Post\",\"blog_id_i\":\"1\",\"category_ids_im\":\"3\",\"average_rating_ft\":\"30.0\",\"sort_title_s\":\"post\",\"primary_category_id_i\":\"3\",\"last_indexed_at_ds\":\"2019-10-01T13:24:02Z\",\"legacy_field_s\":\"legacy Post\",\"legacy_array_field_sm\":[\"first string\",\"second string\"],\"title_text\":{\"boost\":2,\"value\":\"Post\"},\"text_array_text\":{\"boost\":3,\"value\":[\"Post\",\"Post\"]},\"body_textsv\":\"Lorem\",\"backwards_title_text\":\"tsoP\",\"custom_integer:3_i\":\"1\"}},\"add\":{\"boost\":15.25,\"doc\":{\"id\":\"Post 2\",\"type\":[\"Post\",\"SuperClass\",\"MockRecord\"],\"class_name\":\"Post\",\"title_ss\":\"Post\",\"blog_id_i\":\"2\",\"category_ids_im\":\"2\",\"average_rating_ft\":\"60.0\",\"sort_title_s\":\"post\",\"primary_category_id_i\":\"2\",\"last_indexed_at_ds\":\"2019-10-01T13:24:02Z\",\"legacy_field_s\":\"legacy Post\",\"legacy_array_field_sm\":[\"first string\",\"second string\"],\"title_text\":{\"boost\":2,\"value\":\"Post\"},\"text_array_text\":{\"boost\":3,\"value\":[\"Post\",\"Post\"]},\"body_textsv\":\"Ipsum\",\"backwards_title_text\":\"tsoP\",\"custom_integer:2_i\":\"1\"}},\"add\":{\"boost\":22.75,\"doc\":{\"id\":\"Post 3\",\"type\":[\"Post\",\"SuperClass\",\"MockRecord\"],\"class_name\":\"Post\",\"title_ss\":\"Post\",\"blog_id_i\":\"3\",\"category_ids_im\":\"1\",\"average_rating_ft\":\"90.0\",\"sort_title_s\":\"post\",\"primary_category_id_i\":\"1\",\"last_indexed_at_ds\":\"2019-10-01T13:24:02Z\",\"legacy_field_s\":\"legacy Post\",\"legacy_array_field_sm\":[\"first string\",\"second string\"],\"title_text\":{\"boost\":2,\"value\":\"Post\"},\"text_array_text\":{\"boost\":3,\"value\":[\"Post\",\"Post\"]},\"body_textsv\":\"Dolor\",\"backwards_title_text\":\"tsoP\",\"custom_integer:1_i\":\"1\"}}}",
:headers => { "Content-Type" => "application/json" }
}
Here's the formatted XML:
<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<add>
<doc boost=\"7.75\">
<field name=\"id\">Post 1</field>
<field name=\"type\">Post</field>
<field name=\"type\">SuperClass</field>
<field name=\"type\">MockRecord</field>
<field name=\"class_name\">Post</field>
<field name=\"title_ss\">Post</field>
<field name=\"blog_id_i\">1</field>
<field name=\"category_ids_im\">3</field>
<field name=\"average_rating_ft\">30.0</field>
<field name=\"sort_title_s\">post</field>
<field name=\"primary_category_id_i\">3</field>
<field name=\"last_indexed_at_ds\">2019-10-01T13:07:36Z</field>
<field name=\"legacy_field_s\">legacy Post</field>
<field name=\"legacy_array_field_sm\">first string</field>
<field name=\"legacy_array_field_sm\">second string</field>
<field boost=\"2\" name=\"title_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field name=\"body_textsv\">Lorem</field>
<field name=\"backwards_title_text\">tsoP</field>
<field name=\"custom_integer:3_i\">1</field>
</doc>
<doc boost=\"15.25\">
<field name=\"id\">Post 2</field>
<field name=\"type\">Post</field>
<field name=\"type\">SuperClass</field>
<field name=\"type\">MockRecord</field>
<field name=\"class_name\">Post</field>
<field name=\"title_ss\">Post</field>
<field name=\"blog_id_i\">2</field>
<field name=\"category_ids_im\">2</field>
<field name=\"average_rating_ft\">60.0</field>
<field name=\"sort_title_s\">post</field>
<field name=\"primary_category_id_i\">2</field>
<field name=\"last_indexed_at_ds\">2019-10-01T13:07:36Z</field>
<field name=\"legacy_field_s\">legacy Post</field>
<field name=\"legacy_array_field_sm\">first string</field>
<field name=\"legacy_array_field_sm\">second string</field>
<field boost=\"2\" name=\"title_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field name=\"body_textsv\">Ipsum</field>
<field name=\"backwards_title_text\">tsoP</field>
<field name=\"custom_integer:2_i\">1</field>
</doc>
<doc boost=\"22.75\">
<field name=\"id\">Post 3</field>
<field name=\"type\">Post</field>
<field name=\"type\">SuperClass</field>
<field name=\"type\">MockRecord</field>
<field name=\"class_name\">Post</field>
<field name=\"title_ss\">Post</field>
<field name=\"blog_id_i\">3</field>
<field name=\"category_ids_im\">1</field>
<field name=\"average_rating_ft\">90.0</field>
<field name=\"sort_title_s\">post</field>
<field name=\"primary_category_id_i\">1</field>
<field name=\"last_indexed_at_ds\">2019-10-01T13:07:36Z</field>
<field name=\"legacy_field_s\">legacy Post</field>
<field name=\"legacy_array_field_sm\">first string</field>
<field name=\"legacy_array_field_sm\">second string</field>
<field boost=\"2\" name=\"title_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field name=\"body_textsv\">Dolor</field>
<field name=\"backwards_title_text\">tsoP</field>
<field name=\"custom_integer:1_i\">1</field>
</doc>
</add>
And formatted JSON (I had to manually convert it to an array with separated
[{"add"=>
{"boost"=>7.75,
"doc"=>
{"id"=>"Post 1",
"type"=>["Post", "SuperClass", "MockRecord"],
"class_name"=>"Post",
"title_ss"=>"Post",
"blog_id_i"=>"1",
"category_ids_im"=>"3",
"average_rating_ft"=>"30.0",
"sort_title_s"=>"post",
"primary_category_id_i"=>"3",
"last_indexed_at_ds"=>"2019-10-01T13:24:02Z",
"legacy_field_s"=>"legacy Post",
"legacy_array_field_sm"=>["first string", "second string"],
"title_text"=>{"boost"=>2, "value"=>"Post"},
"text_array_text"=>{"boost"=>3, "value"=>["Post", "Post"]},
"body_textsv"=>"Lorem",
"backwards_title_text"=>"tsoP",
"custom_integer:3_i"=>"1"}}},
{"add"=>
{"boost"=>15.25,
"doc"=>
{"id"=>"Post 2",
"type"=>["Post", "SuperClass", "MockRecord"],
"class_name"=>"Post",
"title_ss"=>"Post",
"blog_id_i"=>"2",
"category_ids_im"=>"2",
"average_rating_ft"=>"60.0",
"sort_title_s"=>"post",
"primary_category_id_i"=>"2",
"last_indexed_at_ds"=>"2019-10-01T13:24:02Z",
"legacy_field_s"=>"legacy Post",
"legacy_array_field_sm"=>["first string", "second string"],
"title_text"=>{"boost"=>2, "value"=>"Post"},
"text_array_text"=>{"boost"=>3, "value"=>["Post", "Post"]},
"body_textsv"=>"Ipsum",
"backwards_title_text"=>"tsoP",
"custom_integer:2_i"=>"1"}}},
{"add"=>
{"boost"=>22.75,
"doc"=>
{"id"=>"Post 3",
"type"=>["Post", "SuperClass", "MockRecord"],
"class_name"=>"Post",
"title_ss"=>"Post",
"blog_id_i"=>"3",
"category_ids_im"=>"1",
"average_rating_ft"=>"90.0",
"sort_title_s"=>"post",
"primary_category_id_i"=>"1",
"last_indexed_at_ds"=>"2019-10-01T13:24:02Z",
"legacy_field_s"=>"legacy Post",
"legacy_array_field_sm"=>["first string", "second string"],
"title_text"=>{"boost"=>2, "value"=>"Post"},
"text_array_text"=>{"boost"=>3, "value"=>["Post", "Post"]},
"body_textsv"=>"Dolor",
"backwards_title_text"=>"tsoP",
"custom_integer:1_i"=>"1"}}}] The {
:data => {
:fq => ["type:Post"],
:q => "(_query_:\"{!edismax qf='body_textsv^0.2 title_text text_array_text backwards_title_text tags_textv' mm='1' bq='blog_id_i:1^0.9' bf='div(field(average_rating_ft),100)'}Post Ipsum\" AND _query_:\"{!edismax qf='title_text text_array_text body_textsv backwards_title_text tags_textv' mm='2'}Post\")",
:fl => "* score",
:start => 0,
:rows => 30 }
}
And responses for XML and JSON requests respectively (the global
{ "responseHeader" => { "status" => 0, "QTime" => 5 },
"response" =>
{ "numFound" => 3,
"start" => 0,
"maxScore" => 36.18859,
"docs" =>
[{ "id" => "Post 2",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T13:48:42Z",
"body_textsv" => ["Ipsum"],
"_version_" => 1646199017083764736,
"score" => 36.18859 },
{ "id" => "Post 3",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T13:48:42Z",
"body_textsv" => ["Dolor"],
"_version_" => 1646199017085861888,
"score" => 34.87313 },
{ "id" => "Post 1",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T13:48:42Z",
"body_textsv" => ["Lorem"],
"_version_" => 1646199017071181824,
"score" => 13.51507 }] } }
{ "responseHeader" => { "status" => 0, "QTime" => 6 },
"response" =>
{ "numFound" => 3,
"start" => 0,
"maxScore" => 13.21888,
"docs" =>
[{ "id" => "Post 3",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T13:50:37Z",
"body_textsv" => ["Dolor"],
"_version_" => 1646199138248818688,
"score" => 13.21888 },
{ "id" => "Post 2",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T13:50:37Z",
"body_textsv" => ["Ipsum"],
"_version_" => 1646199138247770112,
"score" => 13.090725 },
{ "id" => "Post 1",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T13:50:37Z",
"body_textsv" => ["Lorem"],
"_version_" => 1646199138243575808,
"score" => 4.8533697 }] } } |
And even for the simplest search, the XML and JSON responses look different. For this example:
search = Sunspot.search(Post) do
fulltext('Post Ipsum') do
minimum_match 1
end
end
When using XML format (the
{ "responseHeader" => { "status" => 0, "QTime" => 3 },
"response" =>
{ "numFound" => 3,
"start" => 0,
"maxScore" => 32.528587,
"docs" =>
[{ "id" => "Post 2",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T14:37:09Z",
"body_textsv" => ["Ipsum"],
"_version_" => 1646202065285808128,
"score" => 32.528587 },
{ "id" => "Post 3",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T14:37:09Z",
"body_textsv" => ["Dolor"],
"_version_" => 1646202065288953856,
"score" => 15.473748 },
{ "id" => "Post 1",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T14:37:09Z",
"body_textsv" => ["Lorem"],
"_version_" => 1646202065279516672,
"score" => 5.8026557 }] } }
When using JSON format:
{ "responseHeader" => { "status" => 0, "QTime" => 4 },
"response" =>
{ "numFound" => 3,
"start" => 0,
"maxScore" => 17.054836,
"docs" =>
[{ "id" => "Post 2",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T14:37:43Z",
"body_textsv" => ["Ipsum"],
"_version_" => 1646202100724531200,
"score" => 17.054836 },
{ "id" => "Post 3",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T14:37:43Z",
"body_textsv" => ["Dolor"],
"_version_" => 1646202100726628352,
"score" => 5.8026557 },
{ "id" => "Post 1",
"title_ss" => "Post",
"last_indexed_at_ds" => "2019-10-01T14:37:43Z",
"body_textsv" => ["Lorem"],
"_version_" => 1646202100722434048,
"score" => 1.9342185 }] } }
Thus I'm assuming there's another discrepancy between these two protocols, not related to this feature. The protocol affects how documents are indexed, and index-time boosts are calculated differently. BTW, we're on Solr 8 now, and index-time boosts have been removed from Solr altogether. |
@heaven I'm not sure I follow now. Are you saying that Solr now returns different results depending on what protocol was used for the query? |
@serggl the opposite: Solr returns different results depending on which protocol was used while indexing the documents. The same docs indexed with the XML protocol get higher scores than when indexed with the JSON protocol. Even for the simple search that I described here #959 (comment), the same docs indexed using different transports get different scores when searching. I am wondering if Solr treats these differently, somehow multiplying the resulting boost. XML:
<field boost=\"3\" name=\"text_array_text\">Post</field>
<field boost=\"3\" name=\"text_array_text\">Post</field>
JSON:
"text_array_text"=>{"boost"=>3, "value"=>["Post", "Post"]},
I will try removing all index-time boosts from the Post model and see how that changes the situation. For now, it feels very much like boosts described in XML are applied differently from those in JSON docs. |
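To put numbers on the gap, here is a small sketch that compares the scores reported earlier in this thread for the simple 'Post Ipsum' search, document by document. The scores are copied from the two responses; the variable names are only illustrative.

```ruby
# Scores copied from the two responses earlier in the thread (simple 'Post Ipsum' search).
xml_scores  = { "Post 2" => 32.528587, "Post 3" => 15.473748, "Post 1" => 5.8026557 }
json_scores = { "Post 2" => 17.054836, "Post 3" => 5.8026557, "Post 1" => 1.9342185 }

# Ratio of the XML-indexed score to the JSON-indexed score for each document.
ratios = xml_scores.to_h { |id, score| [id, score / json_scores.fetch(id)] }

ratios.each { |id, ratio| puts format("%s: %.3f", id, ratio) }
```

The ratio is not the same for every document, so a single global multiplier would not explain the difference on its own.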
So my assumption above was correct, and this is what causes the problem: sunspot/sunspot/spec/mocks/post.rb Line 40 in bce171b
With the XML transport, this boost is applied as many times as there are elements in the array; with JSON, just once. As soon as I removed this boost, both XML and JSON started returning equal scores in the results. |
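A toy model of the behaviour described above, for illustration only. It assumes that boosts on repeated field entries compound by multiplication; the thread only states that the boost is applied once per array element, so that part is an assumption. `effective_boost` is a hypothetical helper, not part of Sunspot or Solr.

```ruby
# Toy model only; this is not Solr's or Sunspot's actual code.
# ASSUMPTION: boosts on repeated field entries compound by multiplication.
def effective_boost(boost, values, transport)
  case transport
  when :xml  then boost**values.size # one <field boost="..."> element per value
  when :json then boost              # one {"boost" => ..., "value" => [...]} entry
  else raise ArgumentError, "unknown transport: #{transport.inspect}"
  end
end

values = ["Post", "Post"]
xml_boost  = effective_boost(3, values, :xml)
json_boost = effective_boost(3, values, :json)
puts "XML: #{xml_boost}, JSON: #{json_boost}"
```

Under that assumption, the two-element text_array_text field with boost 3 would carry a boost of 9 when indexed over XML and 3 when indexed over JSON.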
…ncy in boost calculation between XML and JSON protocols * Restore test case for boosted queries combined with full-texts
Not sure how we should deal with this index-time boost issue you found... |
Not just deprecated, these were removed completely. Don’t know exactly from which version but in 8 these don’t work.
Best,
Alex
… On 3 Oct 2019, at 09:16, Sergey A. Glukhov ***@***.***> wrote:
Not sure how we should deal with this index-time boost issue you found...
But if you say that the current Solr version has them deprecated, that makes it not too significant.
So this looks good to merge for me.
|
They also don’t work in 7.
|
Hi, how about merging this one? |
I think this looks good.
Currently, only full-text searches support boosts, although boosts can be used with almost any search, including those that do not contain full-text queries.
This patch adds support for boosts in such searches by injecting an empty Dismax search into the scope with a no-op *:* query, which adds the bf, bq, and boost query parameters. An example: