Add MRR to quepid as a communal scorer. #525
Conversation
I took the ticket name out of the title, because it gets confusing that the ticket isn't the PR, if that makes sense...
Is this MRR or RR? It appears to say RR in the code?
RR for a single query, MRR for a set of queries. Quepid will display MRR; the JavaScript computes RR for each individual query, which Quepid averages together. In general, the aggregate name is used when referring to a metric that is being averaged across queries. Either name is fine in the issue.
I guess I may need to take this on faith. When we refer to DCG, we have a file named DCG.js. I would assume that if we are referring to MRR, we would have a file named MRR.js, and if there was a separate metric called RR, then it would have a scorer named RR.js. I don't mean to be obtuse here, but the goal of Quepid is to make metrics simple and something I can explain to everyday users. So I feel like if we are adding MRR to Quepid, then the file should be called mrr.js.
Okay, now I am super confused. Is this MRR or RR? Or does this NOT follow the pattern that we have of P, AP, DCG, NDCG etc., and is a new naming pattern?
There is only one metric, reciprocal rank. When we average the reciprocal rank scores for multiple queries, the result is called mean reciprocal rank. From the perspective of what the code is computing, rr@10.js computes the reciprocal rank for a single query. The Quepid app then averages that RR value across the set of queries in the collection to produce the MRR score for the full set. The same is true of the computation called AP@10: it computes a single value per query, which is then averaged to produce what should be called MAP@10 in the Quepid display. One metric, averaged across multiple queries.
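To make the per-query vs. aggregate distinction concrete, here is a minimal sketch of the two steps, not the actual rr@10.js scorer from this PR; the function names, the `relevantThreshold` parameter, and the rating arrays are illustrative assumptions:

```javascript
// Illustrative sketch only, not the scorer code in this PR.
// Reciprocal rank for ONE query: ratings are listed in ranked order,
// and a rating above relevantThreshold counts as relevant.
function reciprocalRank(ratings, k = 10, relevantThreshold = 0) {
  for (let i = 0; i < Math.min(ratings.length, k); i++) {
    if (ratings[i] > relevantThreshold) {
      return 1 / (i + 1); // first relevant doc found at rank i + 1
    }
  }
  return 0; // no relevant doc in the top k
}

// MRR for a SET of queries: average the per-query RR values,
// which is the aggregate number Quepid would display.
function meanReciprocalRank(ratingsPerQuery, k = 10) {
  const rrs = ratingsPerQuery.map((ratings) => reciprocalRank(ratings, k));
  return rrs.reduce((sum, rr) => sum + rr, 0) / rrs.length;
}

// Example: first relevant doc at rank 1, rank 3, and never.
// MRR = (1 + 1/3 + 0) / 3 ≈ 0.444
console.log(meanReciprocalRank([
  [3, 0, 0], // RR = 1
  [0, 0, 2], // RR = 1/3
  [0, 0, 0], // RR = 0
]));
```

The same split applies to AP@10 vs. MAP@10: the scorer file produces the per-query value, and the averaging across the collection is what earns the "M" in the displayed name.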
AP should be named MAP, mean average precision. All of the metric names have been established by the evaluation metrics community; the names used by trec_eval should be considered canonical. Note that nDCG does not get called MnDCG when averaged, and P@k does not get called MP@k when averaged. Only Mean Reciprocal Rank and Mean Average Precision have the M-named variant for the aggregate score across a set of queries.
Description
Add reciprocal rank as a communal Scorer
Motivation and Context
Closes #523. Adds a useful metric for known-item search evaluation.
How Has This Been Tested?
Local install of Quepid started with bin/setup_docker followed by bin/docker server
Screenshots or GIFs (if appropriate):
Types of changes
Checklist: