
Add ranking evaluation API #27478

Merged
merged 222 commits into master on Dec 14, 2017

Conversation

5 participants
@cbuescher
Member

cbuescher commented Nov 21, 2017

This change adds a module containing a new _rank_eval API that evaluates the quality of ranked search results over a set of typical search queries.

Given this set of queries and a list of manually rated documents, the API calculates information retrieval metrics like mean reciprocal rank, precision, or discounted cumulative gain.

In its current state, the basic request structure looks like this:

GET /my_index/_rank_eval
{
    "requests": [ 
        {
            "id": "amsterdam_query", 
            "request": {
                "query": { "match": { "text": "amsterdam" }}
            },
            "ratings": [ <3> 
                 { "_index": "my_index", "_id": "doc1", "rating": 0 },
                 { "_index": "my_index", "_id": "doc2", "rating": 3},
                 { "_index": "my_index", "_id": "doc3", "rating": 1 }
            ]
        },
        {
            "id": "berlin_query",
            "request": {
                "query": { "match": { "text": "berlin" }}
            },
            "ratings": [
                { "_index": "my_index", "_id": "doc1", "rating": 1 }
            ]
        }, [...]
     ], 
    "metric": { 
       "reciprocal_rank": { ... parameters ... }
   }
}

but there are variants that use templated queries, described in more detail in the docs. Currently the metrics provided are Prec@ (Precision at k), Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG, including a normalized NDCG variant).
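For reference, the first two of these follow the usual textbook definitions (shown here as a sketch of the standard formulas, not code from this PR):

% Mean Reciprocal Rank over a set Q of rated requests, where rank_i is
% the position of the first relevant document returned for request i
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}

% Precision at k: the fraction of the top k returned documents that are rated relevant
\mathrm{P@k} = \frac{|\{\text{relevant documents among the top } k \text{ results}\}|}{k}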

The response returns an overall evaluation score and a detailed breakdown of how each query contributes to the total score. The result also contains the ranked search hits and a list of unrated documents for each query, enabling the client to explain the calculation to the user and ask for further user feedback (e.g. rating the currently unrated documents).
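For illustration, a response for the request above could look roughly like the sketch below; the field names are made up for this sketch and are not necessarily the exact format produced by this PR:

// sketch only: field names are illustrative, not the exact response format
{
    "quality_level": 0.5,
    "details": {
        "amsterdam_query": {
            "quality_level": 0.6,
            "unrated_docs": [
                { "_index": "my_index", "_id": "doc4" }
            ],
            "hits": [
                { "hit": { "_index": "my_index", "_id": "doc2", "_score": 1.2 }, "rating": 3 },
                { "hit": { "_index": "my_index", "_id": "doc4", "_score": 0.9 }, "rating": null }
            ]
        },
        "berlin_query": { ... }
    }
}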

Closes #19195

Isabel Drost-Fromm and others added some commits Jun 30, 2016

Initial commit for Module to compute metrics on queries
This is an initial squashed commit of the work on a new feature for query metrics
proposed in #18798.
Adding rest layer parsing and response rendering
Adding parsers for the rest request and the various components within, also
extending the existing rest test and adding rendering of the response.
Isabel Drost-Fromm
Merge pull request #19283 from cbuescher/rank-add-restparsing
Query evaluation: Adding rest layer parsing and response rendering
Add Reciprocal Rank query evaluation metric
This adds a second query evaluation metric alongside precision_at. Reciprocal
Rank is defined as 1/rank, where rank is the position of the first relevant
document in the search result. The results are averaged across all queries
across the sample of queries, according to
https://en.wikipedia.org/wiki/Mean_reciprocal_rank
Isabel Drost-Fromm
Use type information in request
Adds parsing of type and actually using it in TransportRankEvalAction.

Missing: A good idea how to actually test this in isolation...
Merge pull request #19300 from cbuescher/add-reciprocal-rank
Query evaluation: Add reciprocal rank
Isabel Drost-Fromm
Add index and type information to rated doc
Also add roundtrip testing of the xcontent serialisation of RatedDoc
Isabel Drost-Fromm
Moving averaging of partial evaluation results to RankedListQualityMetric

For the two current metrics Prec@ and reciprocal rank we currently average the
partial results in the transport action. If other metrics later need a different
behaviour or want to parametrize this, this operation should be part of the
metric itself, so this change moves it there. Also removing one of the two test
packages; main code is also in one package only.
Merge pull request #19686 from cbuescher/rankMetric-combine
Moving averaging of partial evaluation results to RankedListQualityMetric
@colings86

@cbuescher I left some comments on the documentation but most of these are just small typos

(6 outdated review comments on docs/reference/search/rank-eval.asciidoc)
@jimczi

I did a first round focused on docs and transport action. I left some minor comments and a question.
I'll do a second round later for the rank eval implementation.

(outdated review comments on docs/reference/search/rank-eval.asciidoc and .../main/java/org/elasticsearch/index/rankeval/TransportRankEvalAction.java)

cbuescher added some commits Nov 23, 2017

Use msearch instead of single search (#27520)
Change TransportRankEvalAction to use one MultiSearchRequest instead of issuing several parallel search requests to simplify the transport action.
The `precision` metric takes the following optional parameters


@mayya-sharipova

mayya-sharipova Nov 28, 2017

Contributor

Is there a parameter for k in the P@k, or k is always equal to 10?


@cbuescher

cbuescher Nov 28, 2017

Member

We had an explicit parameter at some point but removed it since the size of the response can be controlled easily using the size parameter in each search request body (the request section accepts a full search request body). Thinking about it now, it might be a good idea to change it back, because currently one would have to add the "size" to every query in the test suite (and could accidentally use different sizes), so maybe we should add this option back.
Thanks for the input, will think about this again.
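For illustration, controlling the result window this way means repeating the size value in every rated request of the suite, roughly like this (the value 3 is arbitrary):

GET /my_index/_rank_eval
{
    "requests": [
        {
            "id": "amsterdam_query",
            "request": {
                "size": 3,
                "query": { "match": { "text": "amsterdam" }}
            },
            "ratings": [ ... ]
        }
    ],
    "metric": { "reciprocal_rank": { ... parameters ... } }
}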


@cbuescher

cbuescher Nov 28, 2017

Member

I opened #27569 as an example of how adding k might look. We can pull it in now or use it for further discussion.
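For illustration, such a parameter would most naturally live on the metric itself, along these lines (the parameter name and value are hypothetical, see #27569 for the actual proposal):

// hypothetical sketch: a "k" parameter on the precision metric is only proposed in #27569
"metric": {
    "precision": { "k": 20 }
}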


cbuescher added some commits Nov 29, 2017

@cbuescher cbuescher removed the WIP label Dec 8, 2017

@jimczi

jimczi approved these changes Dec 11, 2017

I did a second round of review.
The code for the metrics looks good to me and the test coverage is good.
I think it's time to push this in, though as we discussed last week I wonder if this should be a module or just pushed inside core es. As a module it would not be possible to define new metrics in a plugin, so I'd vote for moving it to core. If you disagree then there is some work to do to make sure that the transport and rest client can use the query/response defined in the module. You can check the parent-join module integration to see what needs to be done (in the dependencies).

@cbuescher cbuescher changed the title from WIP: Add ranking evaluation API to Add ranking evaluation API Dec 11, 2017

@cbuescher

Member

cbuescher commented Dec 13, 2017

> I wonder if this should be a module or just pushed inside core es. As a module it would not be possible to define new metrics in a plugin so I'd vote for moving it to core. If you disagree then there is some work to do to make sure that the transport and rest client can use the query/response defined in the module.

@jimczi thanks, I went ahead and did the SPI work needed to make the rest client aware of the namedXContent objects introduced in this feature, in case we want to keep it as a module for now (see the last commit). Let me know if you think there is anything else to add to this.

I'm still undecided on the module vs. core question; my gut feeling is that this is something similar to percolate that is good to keep in a separate module. We could either

  • move just the EvaluationMetric interface to core (feels a bit weird to me)
  • decide to publish the module so people can use those interfaces in their plugins if needed (from reading #27527 I get the impression that we seem to do this for some modules)
  • or we simply decide that at this point we don't want to support defining custom metrics and maybe add that capability later

I'm also fine with moving all code to its own package in core.
I'd like to get some feedback from @colings86 and/or @clintongormley on what they think about this too.


@clintongormley

Member

clintongormley commented Dec 13, 2017

I would keep it as a module for now. The SPI work that @rjernst is doing should allow plugins to work with the module. Either way, it is early days. Nobody is likely to plug in their own custom metric just yet.


@cbuescher

Member

cbuescher commented Dec 13, 2017

> The SPI work that @rjernst is doing should allow plugins to work with the module

I would also like to keep it as a module for now; I just want to point out that the problem we are facing here is that the interfaces that a user wanting to write her own custom metric needs to implement are then contained in the module jar. I think the problem is more about whether to publish that module, and I'm not sure SPI is going to fix this.


@cbuescher cbuescher merged commit 5406a9f into master Dec 14, 2017

2 checks passed

CLA: Commit author is a member of Elasticsearch
elasticsearch-ci: Build finished.
@clintongormley

Member

clintongormley commented Dec 14, 2017

woohoo!


cbuescher added a commit that referenced this pull request Dec 19, 2017

Backport of ranking evaluation API (#27478) (#27844)
This is the backport of the ranking evaluation feature (#27478) to the 6.x branches.