Ranking Evaluation API: Add MAP and recall@k metrics #51676

Open
joshdevins opened this issue Jan 30, 2020 · 9 comments
Labels
>enhancement · :Search/Ranking (Scoring, rescoring, rank evaluation) · Team:Search (Meta label for search team)

Comments

joshdevins (Member) commented Jan 30, 2020

Several information retrieval "tasks" use a few common evaluation metrics, including mean average precision (MAP) [1] and recall@k, in addition to what is already supported (e.g. ERR, nDCG, MRR). Sometimes the geometric MAP (GMAP) variant is used; if it's an easy option to add (similar to how nDCG is an option on DCG), we should add it as well. These are standard measures in many TREC and related tasks (e.g. MSMARCO). In particular, reranking tasks use recall@k to tune the base query that is the input to a reranker (e.g. tuning BM25 or RM3 parameters).

[1] "GMAP is the geometric mean of per-topic average precision, in contrast with MAP which is the arithmetic mean"

joshdevins added the >feature, >enhancement, and :Search/Ranking labels Jan 30, 2020
elasticmachine (Collaborator) commented
Pinging @elastic/es-search (:Search/Ranking)

joshdevins (Member, Author) commented
Potentially a duplicate of #29653

sbourke commented Feb 19, 2020

@joshdevins

By mean average precision, did you mean the one described in the Stanford IR course (https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-1-per.pdf)? The book itself does not appear to discuss MAP. 🤷‍♂

I've added it here: sbourke@652ff11#diff-5fb623709353794e709a58f45104baec

I've added recall as well: sbourke@652ff11

If that's what you're generally looking for, please let me know and I'll clean up the code. The tests are based on the precision tests.

joshdevins (Member, Author) commented
Hang tight, I've also got a branch with some other cleanup in it. Let's sync up after I have a PR in.

joshdevins (Member, Author) commented
@sbourke I've got this draft PR going, but it doesn't have MAP in it yet: #52577

jtibshirani (Contributor) commented
@sbourke thanks for your interest in contributing! Perhaps we can first work to integrate @joshdevins's PR to add recall@k (#52577), then you could follow up with a PR to add MAP. Feel free to add any suggestions to the recall@k PR.

It's generally nice to separate out changes into small PRs, so I think it's fine to add each metric separately. It would also be great to get @cbuescher's thoughts on the proposed metrics to make sure we're happy to add them.

joshdevins (Member, Author) commented
@sbourke I think that MAP definition matches my understanding. I'm looking at the TREC definition and I believe it's the same.

sbourke commented Feb 24, 2020

@joshdevins Your explicit confusion matrix is much nicer than what I was doing. I'll look at the code changes more closely today. Do you have GMAP as well, or should I do that?

joshdevins (Member, Author) commented Feb 26, 2020

Have a look at the PR (#52577) again — it's ready to merge so you can use it as a basis for the next change set if you want.

Your explicit confusion matrix is much nicer than what I was doing.

From the ML perspective, it's a typical way to evaluate and calculate metrics. We decided to remove it for now, though, as it introduced a bit of unnecessary indirection in the code. We might put it back later, after the MAP implementation. See the related discussion in the PR.

After removing the confusion matrix, I normalized all the variables and the way of calculating metrics in PrecisionAtK and RecallAtK to be based on the logical meaning of the metric components. For example, instead of counting truePositives, we count relevantDocsRetrieved. This makes the codebase uniform and consistent with the MetricDetail we return.
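As a rough illustration of that naming convention (the names here are hypothetical, not the actual PrecisionAtK/RecallAtK code), both metrics can be written in terms of the same relevantDocsRetrieved count:

```java
// Sketch only: hypothetical helpers mirroring the naming convention described above.
final class MetricNamingSketch {

    static double precisionAtK(int relevantDocsRetrieved, int docsRetrieved) {
        // precision@k = relevant docs in the top k / docs retrieved in the top k
        return docsRetrieved == 0 ? 0.0 : (double) relevantDocsRetrieved / docsRetrieved;
    }

    static double recallAtK(int relevantDocsRetrieved, int relevantDocs) {
        // recall@k = relevant docs in the top k / all relevant docs for the query
        return relevantDocs == 0 ? 0.0 : (double) relevantDocsRetrieved / relevantDocs;
    }
}
```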

Do you have GMAP as well, or should I do that?

I haven't done anything for (G)MAP yet, so you are welcome to contribute if you want. Let me know if you are still interested in doing a feature branch for that work. If you haven't already, have a look at CONTRIBUTING.md for some details on how we take contributions through PRs.

We should be able to implement GMAP as an option on the MAP metric, much as the DCG metric provides the normalize option to get nDCG. The change for GMAP would just be how the per-query average-precision calculations are combined in the combine function, as the default is just the arithmetic mean (so it will work fine for MAP).
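To sketch what that combine step could look like (illustrative only, not the actual Elasticsearch rank-eval code), MAP and GMAP would differ only in how the per-query average-precision values are aggregated:

```java
// Sketch only: a hypothetical combine step, not the Elasticsearch rank-eval API.
final class CombineSketch {

    /** MAP: arithmetic mean of the per-query average-precision values. */
    static double combineArithmetic(double[] perQueryAveragePrecision) {
        double sum = 0.0;
        for (double ap : perQueryAveragePrecision) {
            sum += ap;
        }
        return sum / perQueryAveragePrecision.length;
    }

    /**
     * GMAP: geometric mean of the same values, computed in log space.
     * A small epsilon is added to each value to avoid log(0) for queries with zero AP.
     */
    static double combineGeometric(double[] perQueryAveragePrecision, double epsilon) {
        double logSum = 0.0;
        for (double ap : perQueryAveragePrecision) {
            logSum += Math.log(ap + epsilon);
        }
        return Math.exp(logSum / perQueryAveragePrecision.length);
    }
}
```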

joshdevins added a commit that referenced this issue Feb 27, 2020
This change adds the recall@k metric and refactors precision@k to match
the new metric.

Recall@k is an important metric for learning to rank (LTR) use cases.
Candidate generation or first-phase ranking functions are often
optimized for high recall, in order to generate as many relevant
candidates in the top-k as possible for a second phase of ranking.
Adding this metric allows tuning that base query for LTR.

See: #51676
joshdevins added a commit that referenced this issue Feb 27, 2020
(Same message as the commit above.)

See: #51676
Backports: #52577
rjernst added the Team:Search label May 4, 2020