Ranking Evaluation API: Add MAP and recall@k metrics #51676

Open
joshdevins opened this issue Jan 30, 2020 · 9 comments
Labels
>enhancement · :Search/Ranking (Scoring, rescoring, rank evaluation) · Team:Search (Meta label for search team)

Comments

joshdevins (Member) commented Jan 30, 2020

Several information retrieval "tasks" use a few common evaluation metrics, including mean average precision (MAP) [1] and recall@k, in addition to what is already supported (e.g. ERR, nDCG, MRR). Sometimes the geometric MAP (GMAP) variant is used; if it's an easy option to add (similar to how nDCG is an option on DCG), we should add it as well. These are standard measures in many TREC and related tasks (e.g. MSMARCO). In particular, reranking tasks use recall@k to tune the base query that is the input to a reranker (e.g. tuning BM25 or RM3 parameters).

[1] "GMAP is the geometric mean of per-topic average precision, in contrast with MAP which is the arithmetic mean"

joshdevins added the >feature, >enhancement, and :Search/Ranking labels Jan 30, 2020
elasticmachine (Collaborator) commented
Pinging @elastic/es-search (:Search/Ranking)

joshdevins (Member, Author) commented
Potentially a duplicate of #29653

sbourke commented Feb 19, 2020

@joshdevins

By mean average precision, did you mean the one described in the Stanford IR course (https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-1-per.pdf)? The book itself does not appear to discuss MAP. 🤷‍♂

I've added it here: sbourke@652ff11#diff-5fb623709353794e709a58f45104baec

I've added recall as well: sbourke@652ff11

If that's what you're generally looking for, please let me know and I'll clean up the code. The tests are based on the precision tests.

joshdevins (Member, Author) commented
Hang tight, I've also got a branch with some other cleanup in it. Let's sync up after I have a PR in.

joshdevins (Member, Author) commented
@sbourke I've got this draft PR going, but it doesn't have MAP in it yet: #52577

jtibshirani (Contributor) commented
@sbourke thanks for your interest in contributing! Perhaps we can first work to integrate @joshdevins's PR to add recall@k (#52577), then you could follow up with a PR to add MAP. Feel free to add any suggestions to the recall@k PR.

It's generally nice to separate out changes into small PRs, so I think it's fine to add each metric separately. It would also be great to get @cbuescher's thoughts on the proposed metrics to make sure we're happy to add them.

joshdevins (Member, Author) commented
@sbourke I think that MAP definition matches my understanding. I'm looking at the TREC definition and I believe it's the same.

sbourke commented Feb 24, 2020

@joshdevins Your explicit confusion matrix is much nicer than what I was doing. I'll look at the code changes more closely today. Do you have GMAP as well, or should I do that?

joshdevins (Member, Author) commented Feb 26, 2020

Have a look at the PR (#52577) again — it's ready to merge so you can use it as a basis for the next change set if you want.

Your explicit confusion matrix is much nicer than what I was doing.

From the ML perspective, it's a typical way to evaluate and calculate metrics. We decided to remove it for now, though, as it introduced a bit of unnecessary indirection in the code. We might put it back later, after the MAP implementation. See the related discussion in the PR.

After removing the confusion matrix, I normalized all the variables and the way of calculating metrics in PrecisionAtK and RecallAtK to be based on the logical meaning of the metric components. For example, instead of counting truePositives, we count relevantDocsRetrieved. This makes the codebase uniform and consistent with the MetricDetail we return.
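As a rough illustration of that naming convention (the names here are hypothetical, not the actual PrecisionAtK/RecallAtK code), both metrics can be written in terms of the same relevantDocsRetrieved count:

```java
// Sketch only: hypothetical helpers mirroring the naming convention described above.
final class MetricNamingSketch {

    static double precisionAtK(int relevantDocsRetrieved, int docsRetrieved) {
        // precision@k = relevant docs in the top k / docs retrieved in the top k
        return docsRetrieved == 0 ? 0.0 : (double) relevantDocsRetrieved / docsRetrieved;
    }

    static double recallAtK(int relevantDocsRetrieved, int relevantDocs) {
        // recall@k = relevant docs in the top k / all relevant docs for the query
        return relevantDocs == 0 ? 0.0 : (double) relevantDocsRetrieved / relevantDocs;
    }
}
```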

Do you have GMAP as well, or should I do that?

I haven't done anything for (G)MAP yet, so you are welcome to contribute if you want. Let me know if you are still interested in doing a feature branch for that work. If you haven't already, have a look at CONTRIBUTING.md for some details on how we take contributions through PRs.

We should be able to implement GMAP as an option on the MAP metric, much as the DCG metric provides the normalize option to get nDCG. The change for GMAP would just be how the per-query average-precision calculations are combined in the combine function, as the default is just the arithmetic mean (so it will work fine for MAP).
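To sketch what that combine step could look like (illustrative only, not the actual Elasticsearch rank-eval code), MAP and GMAP would differ only in how the per-query average-precision values are aggregated:

```java
// Sketch only: a hypothetical combine step, not the Elasticsearch rank-eval API.
final class CombineSketch {

    /** MAP: arithmetic mean of the per-query average-precision values. */
    static double combineArithmetic(double[] perQueryAveragePrecision) {
        double sum = 0.0;
        for (double ap : perQueryAveragePrecision) {
            sum += ap;
        }
        return sum / perQueryAveragePrecision.length;
    }

    /**
     * GMAP: geometric mean of the same values, computed in log space.
     * A small epsilon is added to each value to avoid log(0) for queries with zero AP.
     */
    static double combineGeometric(double[] perQueryAveragePrecision, double epsilon) {
        double logSum = 0.0;
        for (double ap : perQueryAveragePrecision) {
            logSum += Math.log(ap + epsilon);
        }
        return Math.exp(logSum / perQueryAveragePrecision.length);
    }
}
```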

joshdevins added a commit that referenced this issue Feb 27, 2020
This change adds the recall@k metric and refactors precision@k to match
the new metric.

Recall@k is an important metric for learning to rank (LTR) use cases.
Candidate generation or first-phase ranking functions are often
optimized for high recall, in order to generate as many relevant
candidates in the top-k as possible for a second phase of ranking.
Adding this metric allows tuning that base query for LTR.

See: #51676
joshdevins added a commit that referenced this issue Feb 27, 2020
(Same message as the commit above.)

See: #51676
Backports: #52577
rjernst added the Team:Search label May 4, 2020