Adds recall@k metric to rank eval API #52577

Merged 2 commits on Feb 27, 2020
@@ -28,6 +28,7 @@
import org.elasticsearch.index.rankeval.ExpectedReciprocalRank;
import org.elasticsearch.index.rankeval.MeanReciprocalRank;
import org.elasticsearch.index.rankeval.PrecisionAtK;
+import org.elasticsearch.index.rankeval.RecallAtK;
import org.elasticsearch.index.rankeval.RankEvalRequest;
import org.elasticsearch.index.rankeval.RankEvalResponse;
import org.elasticsearch.index.rankeval.RankEvalSpec;
@@ -130,9 +131,9 @@ private static List<RatedRequest> createTestEvaluationSpec() {
*/
public void testMetrics() throws IOException {
List<RatedRequest> specifications = createTestEvaluationSpec();
-List<Supplier<EvaluationMetric>> metrics = Arrays.asList(PrecisionAtK::new, MeanReciprocalRank::new, DiscountedCumulativeGain::new,
-() -> new ExpectedReciprocalRank(1));
-double expectedScores[] = new double[] {0.4285714285714286, 0.75, 1.6408962261063627, 0.4407738095238095};
+List<Supplier<EvaluationMetric>> metrics = Arrays.asList(PrecisionAtK::new, RecallAtK::new,
+MeanReciprocalRank::new, DiscountedCumulativeGain::new, () -> new ExpectedReciprocalRank(1));
+double expectedScores[] = new double[] {0.4285714285714286, 1.0, 0.75, 1.6408962261063627, 0.4407738095238095};
int i = 0;
for (Supplier<EvaluationMetric> metricSupplier : metrics) {
RankEvalSpec spec = new RankEvalSpec(specifications, metricSupplier.get());
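For orientation, here is a hedged sketch of how the new metric could be exercised end to end through the high-level REST client, following the same pattern as this test. The index name, document ids, ratings and the `client` instance are made-up illustration values, and the snippet assumes the rank eval classes imported above plus the usual client, query-builder and collection imports.

[source,java]
--------------------------------
// Sketch only: evaluate recall@k for a single rated query.
// "twitter", the document ids and the ratings below are example values.
List<RatedDocument> ratedDocs = Arrays.asList(
        new RatedDocument("twitter", "doc1", 1),   // rated relevant
        new RatedDocument("twitter", "doc2", 0));  // rated irrelevant
SearchSourceBuilder searchSource = new SearchSourceBuilder()
        .query(QueryBuilders.matchQuery("text", "jfk"));
RatedRequest ratedRequest = new RatedRequest("jfk_query", ratedDocs, searchSource);

RankEvalSpec spec = new RankEvalSpec(Collections.singletonList(ratedRequest), new RecallAtK());
RankEvalRequest rankEvalRequest = new RankEvalRequest(spec, new String[] { "twitter" });

// "client" is assumed to be an existing RestHighLevelClient instance.
RankEvalResponse response = client.rankEval(rankEvalRequest, RequestOptions.DEFAULT);
double overallRecall = response.getMetricScore(); // combined recall@k over all rated requests
--------------------------------

The no-argument `RecallAtK()` used here (the same constructor the test references via `RecallAtK::new`) presumably falls back to the documented defaults of `k=10` and `relevant_rating_threshold=1`; the REST-level parameters shown later in this PR's documentation map onto the same metric.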
@@ -98,6 +98,7 @@
import org.elasticsearch.index.rankeval.MeanReciprocalRank;
import org.elasticsearch.index.rankeval.MetricDetail;
import org.elasticsearch.index.rankeval.PrecisionAtK;
+import org.elasticsearch.index.rankeval.RecallAtK;
import org.elasticsearch.join.aggregations.ChildrenAggregationBuilder;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHits;
@@ -696,7 +697,7 @@ public void testDefaultNamedXContents() {

public void testProvidedNamedXContents() {
List<NamedXContentRegistry.Entry> namedXContents = RestHighLevelClient.getProvidedNamedXContents();
-assertEquals(57, namedXContents.size());
+assertEquals(59, namedXContents.size());
Map<Class<?>, Integer> categories = new HashMap<>();
List<String> names = new ArrayList<>();
for (NamedXContentRegistry.Entry namedXContent : namedXContents) {
@@ -710,13 +711,15 @@ public void testProvidedNamedXContents() {
assertEquals(Integer.valueOf(3), categories.get(Aggregation.class));
assertTrue(names.contains(ChildrenAggregationBuilder.NAME));
assertTrue(names.contains(MatrixStatsAggregationBuilder.NAME));
-assertEquals(Integer.valueOf(4), categories.get(EvaluationMetric.class));
+assertEquals(Integer.valueOf(5), categories.get(EvaluationMetric.class));
assertTrue(names.contains(PrecisionAtK.NAME));
+assertTrue(names.contains(RecallAtK.NAME));
assertTrue(names.contains(DiscountedCumulativeGain.NAME));
assertTrue(names.contains(MeanReciprocalRank.NAME));
assertTrue(names.contains(ExpectedReciprocalRank.NAME));
-assertEquals(Integer.valueOf(4), categories.get(MetricDetail.class));
+assertEquals(Integer.valueOf(5), categories.get(MetricDetail.class));
assertTrue(names.contains(PrecisionAtK.NAME));
+assertTrue(names.contains(RecallAtK.NAME));
assertTrue(names.contains(MeanReciprocalRank.NAME));
assertTrue(names.contains(DiscountedCumulativeGain.NAME));
assertTrue(names.contains(ExpectedReciprocalRank.NAME));
docs/reference/search/rank-eval.asciidoc (81 changes: 67 additions & 14 deletions)
@@ -201,20 +201,21 @@ will be used. The following metrics are supported:
[[k-precision]]
===== Precision at K (P@k)

-This metric measures the number of relevant results in the top k search results.
-It's a form of the well-known
-https://en.wikipedia.org/wiki/Information_retrieval#Precision[Precision] metric
-that only looks at the top k documents. It is the fraction of relevant documents
-in those first k results. A precision at 10 (P@10) value of 0.6 then means six
-out of the 10 top hits are relevant with respect to the user's information need.

-P@k works well as a simple evaluation metric that has the benefit of being easy
-to understand and explain. Documents in the collection need to be rated as either
-relevant or irrelevant with respect to the current query. P@k does not take
-into account the position of the relevant documents within the top k results,
-so a ranking of ten results that contains one relevant result in position 10 is
-equally as good as a ranking of ten results that contains one relevant result
-in position 1.
+This metric measures the proportion of relevant results in the top k search results.
+It's a form of the well-known
+https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Precision[Precision]
+metric that only looks at the top k documents. It is the fraction of relevant
+documents in those first k results. A precision at 10 (P@10) value of 0.6 then
+means 6 out of the 10 top hits are relevant with respect to the user's
+information need.

+P@k works well as a simple evaluation metric that has the benefit of being easy
+to understand and explain. Documents in the collection need to be rated as either
+relevant or irrelevant with respect to the current query. P@k is a set-based
+metric and does not take into account the position of the relevant documents
+within the top k results, so a ranking of ten results that contains one
+relevant result in position 10 is equally as good as a ranking of ten results
+that contains one relevant result in position 1.
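To make the arithmetic concrete, here is a small self-contained sketch of the textbook P@k computation. It is illustrative only, not the Elasticsearch implementation, which additionally handles unlabeled documents via its `ignore_unlabeled` option.

[source,java]
--------------------------------
// Illustrative only: precision@k over a ranked list of relevance judgments,
// where relevant[i] == true means the hit at rank i+1 is relevant.
static double precisionAtK(boolean[] relevant, int k) {
    int considered = Math.min(k, relevant.length);
    if (considered == 0) {
        return 0.0;
    }
    int relevantInTopK = 0;
    for (int i = 0; i < considered; i++) {
        if (relevant[i]) {
            relevantInTopK++;
        }
    }
    // fraction of the top k results that are relevant
    return (double) relevantInTopK / considered;
}

// Six relevant hits in the top ten gives the P@10 value of 0.6 from the text:
// precisionAtK(new boolean[] {true, false, true, true, false,
//                             true, false, true, true, false}, 10) == 0.6
--------------------------------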

[source,console]
--------------------------------
@@ -251,6 +252,58 @@ If set to 'true', unlabeled documents are ignored and neither count as relevant
|=======================================================================


+[float]
+[[k-recall]]
+===== Recall at K (R@k)

+This metric measures the fraction of relevant results that are retrieved in the
+top k search results. It's a form of the well-known
+https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Recall[Recall]
+metric. It is the fraction of relevant documents in those first k results
+relative to all possible relevant results. A recall at 10 (R@10) value of 0.5 then
+means 4 out of 8 relevant documents, with respect to the user's information
+need, were retrieved in the 10 top hits.

+R@k works well as a simple evaluation metric that has the benefit of being easy
+to understand and explain. Documents in the collection need to be rated as either
+relevant or irrelevant with respect to the current query. R@k is a set-based
+metric and does not take into account the position of the relevant documents
+within the top k results, so a ranking of ten results that contains one
+relevant result in position 10 is equally as good as a ranking of ten results
+that contains one relevant result in position 1.
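Purely for illustration, and again not the Elasticsearch implementation, the same idea for R@k, including the `relevant_rating_threshold` parameter documented below. The sketch treats ratings at or above the threshold as relevant, which matches the default where a rating of `1` counts as relevant.

[source,java]
--------------------------------
// Illustrative only: recall@k from the ratings of the top k retrieved documents
// and the total number of relevant documents that exist for the query.
static double recallAtK(int[] topKRatings, int totalRelevant, int relevantRatingThreshold) {
    if (totalRelevant == 0) {
        return 0.0; // nothing relevant to recall
    }
    int relevantRetrieved = 0;
    for (int rating : topKRatings) {
        if (rating >= relevantRatingThreshold) {
            relevantRetrieved++;
        }
    }
    return (double) relevantRetrieved / totalRelevant;
}

// Four of the eight relevant documents retrieved in the top ten gives the
// R@10 value of 0.5 from the text:
// recallAtK(new int[] {1, 0, 1, 0, 1, 0, 1, 0, 0, 0}, 8, 1) == 0.5
--------------------------------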

+[source,console]
+--------------------------------
+GET /twitter/_rank_eval
+{
+    "requests": [
+    {
+        "id": "JFK query",
+        "request": { "query": { "match_all": {}}},
+        "ratings": []
+    }],
+    "metric": {
+        "recall": {
+            "k" : 20,
+            "relevant_rating_threshold": 1
+        }
+    }
+}
+--------------------------------
+// TEST[setup:twitter]

+The `recall` metric takes the following optional parameters

+[cols="<,<",options="header",]
+|=======================================================================
+|Parameter |Description
+|`k` |sets the maximum number of documents retrieved per query. This value will act in place of the usual `size` parameter
+in the query. Defaults to 10.
+|`relevant_rating_threshold` |sets the rating threshold above which documents are considered to be
+"relevant". Defaults to `1`.
+|=======================================================================


[float]
===== Mean reciprocal rank

@@ -26,7 +26,7 @@
import java.io.IOException;

/**
-* Details about a specific {@link EvaluationMetric} that should be included in the resonse.
+* Details about a specific {@link EvaluationMetric} that should be included in the response.
*/
public interface MetricDetail extends ToXContentObject, NamedWriteable {
