ordering of search results is affected by Max Results #89

ijt · 2019-07-03T19:13:55Z

Increasing the max results can affect the ordering of the search results. Here is an example.

Having stable ordering of search results would be a useful property and less surprising for users.

ijt · 2019-07-03T19:16:39Z

I would be willing to work on this.

hanwen · 2019-07-03T19:19:52Z

How would you do it?

Basically, the search engine shows the best result on top. If you search over a larger corpus, you can find better matches, which displace other results and change the ordering.

ijt · 2019-07-04T00:19:00Z

One possibility would be to present the results in the order they occur within the posting lists. Items in posting lists could have an additional weight field corresponding to their estimated general relevance. The posting lists could be sorted according to the weights, and re-sorted as necessary.

That would degrade the ordering though. The question needs some more thought.

hanwen · 2019-07-04T08:13:03Z

@ijt - If you mean "index shard" when you say "posting list", this is exactly how it works already.

Within a shard, files are ordered by importance (important files first), so, e.g., all else being equal, you get matches from non-test files before matches from test files.

Then the shards themselves are ordered by a "quality" score, which is mainly driven by the GitHub star count. So matches in github.com/google/guava get preference over matches in android.googlesource.com/platform/external/guava, even though the content is the same.
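The two-level ordering described above could be sketched roughly as follows. This is only an illustration: the `Shard` and `File` types, their fields, and the function names are made up for this example and are not the actual data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// File is a hypothetical stand-in for an indexed file.
type File struct {
	Name       string
	Importance float64 // assumed: higher for non-test files
}

// Shard is a hypothetical stand-in for an index shard.
type Shard struct {
	Repo    string
	Quality float64 // assumed: mainly derived from the star count
	Files   []File
}

// orderForSearch sorts shards by descending quality and, within each
// shard, files by descending importance.
func orderForSearch(shards []Shard) {
	sort.SliceStable(shards, func(i, j int) bool {
		return shards[i].Quality > shards[j].Quality
	})
	for k := range shards {
		files := shards[k].Files
		sort.SliceStable(files, func(i, j int) bool {
			return files[i].Importance > files[j].Importance
		})
	}
}

// visitOrder lists files in the order a search would visit them.
func visitOrder(shards []Shard) []string {
	var out []string
	for _, s := range shards {
		for _, f := range s.Files {
			out = append(out, s.Repo+"/"+f.Name)
		}
	}
	return out
}

func main() {
	// The quality and importance numbers are invented for illustration.
	shards := []Shard{
		{Repo: "android.googlesource.com/platform/external/guava", Quality: 1,
			Files: []File{{"FooTest.java", 0.1}, {"Foo.java", 1}}},
		{Repo: "github.com/google/guava", Quality: 100,
			Files: []File{{"FooTest.java", 0.1}, {"Foo.java", 1}}},
	}
	orderForSearch(shards)
	for _, name := range visitOrder(shards) {
		fmt.Println(name)
	}
}
```

With these invented scores, the github.com/google/guava files are visited first, and within each repository `Foo.java` comes before `FooTest.java`.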

The problem is that matches have quality. If you are looking for "idiot", then the word "idiot" in an unimportant shard is a better match than the identifier "bidiOther" in an important shard.

If you increase the result count to include the unimportant shard, inevitably, this will upset the ordering.
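To make the effect concrete, here is a toy model of it. It assumes, purely for illustration, that shards are visited in priority order, that no further shards are visited once the result limit is reached, and that the collected matches are then sorted by a per-match score. The types, scores, and the `search` function are hypothetical and much simpler than the real search code.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical search hit.
type Match struct {
	Shard string
	Text  string
	Score float64 // assumed: a whole-word hit scores above a substring hit
}

// Shard is a hypothetical index shard holding precomputed matches.
type Shard struct {
	Name    string
	Matches []Match
}

// search visits shards in priority order, stops visiting new shards once
// maxResults matches are collected, then sorts the matches by score.
func search(shards []Shard, maxResults int) []Match {
	var found []Match
	for _, s := range shards {
		if len(found) >= maxResults {
			break
		}
		found = append(found, s.Matches...)
	}
	sort.SliceStable(found, func(i, j int) bool {
		return found[i].Score > found[j].Score
	})
	if len(found) > maxResults {
		found = found[:maxResults]
	}
	return found
}

// exampleShards returns shards already sorted by priority, with
// invented scores for a query for "idiot".
func exampleShards() []Shard {
	return []Shard{
		{Name: "important", Matches: []Match{{"important", "bidiOther", 1}}},
		{Name: "unimportant", Matches: []Match{{"unimportant", "idiot", 2}}},
	}
}

func main() {
	for _, limit := range []int{1, 2} {
		fmt.Printf("max results %d:", limit)
		for _, m := range search(exampleShards(), limit) {
			fmt.Printf(" %s", m.Text)
		}
		fmt.Println()
	}
	// Output:
	// max results 1: bidiOther
	// max results 2: idiot bidiOther
}
```

In this model the top result changes from "bidiOther" to "idiot" as soon as the limit is large enough for the unimportant shard to be searched at all.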

One way out of this is to have a cheaper way to find quality matches. For example, currently we have a section for file names and a section for file contents. If you added a separate corpus section for symbols (which would be smaller than file contents), you could search more shards with the same amount of CPU.
