Query evaluation: Adding rest layer parsing and response rendering #19283

Merged
merged 1 commit into elastic:feature/rank-eval from cbuescher:rank-add-restparsing on Jul 21, 2016

Conversation

@cbuescher
cbuescher (Member) commented Jul 6, 2016

Adds parsing of the rest request for the first prototypical implementation of the rank_eval endpoint. Also extends the existing rest test and adds rendering of the response. This PR is against the WIP rank-eval feature branch.

Relates to #19195

@cbuescher
cbuescher (Member) commented Jul 6, 2016

@MaineC for you to take a look when you are back. I made the rest request and tests work, mostly sticking to the syntax proposed in the comments and the original issue, but changed it in some places where I thought it might be useful. I also changed a few of the classes representing the whole qa query request (e.g. deleted RatedQuery and introduced RatedDocument instead). This is all just a proposal; I will leave a few inline comments to motivate some of the changes.

import org.elasticsearch.common.io.stream.NamedWriteable;
import org.elasticsearch.search.SearchHit;
public interface Evaluator extends NamedWriteable {

@cbuescher
cbuescher (Member) Jul 6, 2016

I removed this in favour of making RankedListQualityMetric an abstract base class; we can re-introduce this if we need a common interface.
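For illustration, a minimal sketch of the abstract-base-class shape (the method name and signature are assumptions, not the PR's actual code):

// Sketch: metric as an abstract base class instead of the Evaluator interface
public abstract class RankedListQualityMetric implements NamedWriteable {
    // score one result list against the rated documents for that query
    public abstract double evaluate(SearchHit[] hits, Collection<RatedDocument> ratedDocs);
}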

@MaineC
MaineC (Contributor) Jul 18, 2016

+1

Collection<String> relevantDocIds = new ArrayList<>();
Collection<String> irrelevantDocIds = new ArrayList<>();
for (RatedDocument doc : ratedDocs) {
if (Rating.RELEVANT.equals(RatingMapping.mapTo(doc.getRating()))) {

@cbuescher
cbuescher (Member) Jul 6, 2016

I'm not sure I'm totally happy with the current mapping from the enum to the rating value. Relevant/Irrelevant or 0/1 works well for precision@ but probably not for something like Discounted Cumulative Gain, so maybe we need an abstraction for this.

@MaineC
MaineC (Contributor) Jul 18, 2016

makes sense - treat whatever I wrote as a draft that can be adjusted as more metrics are added
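As an aside on why a binary rating falls short for DCG: a graded integer rating would feed directly into the standard DCG formula. An illustration only, independent of this PR's classes:

// Standard DCG over hypothetical graded integer ratings of the top hits.
int[] ratings = {3, 2, 0, 1};  // rating of the hit at each rank, 0-based
double dcg = 0.0;
for (int i = 0; i < ratings.length; i++) {
    // gain (2^rating - 1), discounted by log2(rank + 2)
    dcg += (Math.pow(2, ratings[i]) - 1) / (Math.log(i + 2) / Math.log(2));
}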

public enum Rating {
- RELEVANT, IRRELEVANT;
+ IRRELEVANT, RELEVANT;

@cbuescher
cbuescher (Member) Jul 6, 2016

I switched this to IRRELEVANT(0), RELEVANT(1) because I found it more intuitive, and it actually confused me quite a bit when I accidentally got this wrong while working on the rest test.

@MaineC
MaineC (Contributor) Jul 18, 2016

makes total sense - initial ordering was arbitrary
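For reference, a minimal sketch of how the new ordering lines up with 0/1 values (RatingMapping's internals are not shown in this diff, so the helper below is an assumption):

public enum Rating {
    IRRELEVANT, RELEVANT;  // ordinal() yields 0 for IRRELEVANT, 1 for RELEVANT
}

// hypothetical mapping helper, as used via RatingMapping.mapTo(doc.getRating())
static Rating mapTo(int rating) {
    return Rating.values()[rating];
}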

* */
public class QuerySpec implements Writeable {
- private int specId = 0;
+ private String specId;

@cbuescher
cbuescher (Member) Jul 6, 2016

I changed this to a String because in the rest request examples I found things like "amsterdam_query" etc. I think it's good to have something human readable here.

@MaineC
MaineC (Contributor) Jul 18, 2016

+1
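For example, a human-readable id in the request body could look roughly like this (field names are assumptions based on the examples mentioned, not the final syntax):

{
    "spec_id": "amsterdam_query",
    "request": { "query": { "match": { "text": "amsterdam" } } }
}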

this.specId = specId;
this.testRequest = testRequest;
this.indices = indices;
this.types = types;
this.ratedDocs = ratedDocs;

@cbuescher
cbuescher (Member) Jul 6, 2016

Moved the rated docs here because, from the proposed structure of the rest request, I thought they belong better here.

Adding rest layer parsing and response rendering
Adding parsers for the rest request and the various components within, also
extending the existing rest test and adding rendering of the response.
@@ -31,43 +31,49 @@
* For each precision at n computation the id of the search request specification used to generate search requests is returned
* for reference. In addition the averaged precision and the ids of all documents returned but not found annotated is returned.
* */
// TODO do we need an extra class for this or is RankEvalResponse enough?

@cbuescher
cbuescher (Member) Jul 6, 2016

The RankEvalResponse currently is just a wrapper around the RankEvalResult; maybe we can simply get rid of this class and move the fields etc. to the response class.

@MaineC
MaineC (Contributor) Jul 20, 2016

Yeah, I think this is a good idea.

private double qualityLevel;
/** Mapping from intent id to all documents seen for this intent that were not annotated. */
- private Map<Integer, Collection<String>> unknownDocs;
+ private Map<String, Collection<String>> unknownDocs;

@cbuescher
cbuescher (Member) Jul 6, 2016

I was wondering whether it makes sense to not only return the unknown docs per query in the original request but also include partial quality metrics per query. Maybe it's interesting to see e.g. which of the N original queries in the whole qa request performed badly compared to the others. Not sure though, so this is maybe a topic for further discussion.

@MaineC
MaineC (Contributor) Jul 20, 2016

This question reminds me of an interesting larger topic that I believe could be related: at least for precision, IIRC one can compute a micro-averaged or a macro-averaged version. Maybe it makes sense to let users decide which of the two (or both) they want? No idea how this plays out with the other quality metrics you looked at.
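For concreteness, a small illustration of the difference (numbers invented, not code from this PR): macro-averaging averages the per-query precisions, micro-averaging pools the counts first.

// Query 1 returns 10 docs (8 relevant), query 2 returns 5 docs (1 relevant).
int[] relevantRetrieved = {8, 1};
int[] retrieved = {10, 5};
double macro = 0;
int relSum = 0, retSum = 0;
for (int i = 0; i < retrieved.length; i++) {
    macro += (double) relevantRetrieved[i] / retrieved[i];
    relSum += relevantRetrieved[i];
    retSum += retrieved[i];
}
macro /= retrieved.length;               // (0.8 + 0.2) / 2 = 0.5
double micro = (double) relSum / retSum; // 9 / 15 = 0.6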

}
String metricName = parser.currentName();
switch (metricName) {

@cbuescher
cbuescher (Member) Jul 6, 2016

I just use a switch statement here to select the actual metric implementation for parsing. We can think about using a registry or something later if we decide the evaluation metrics need to be pluggable.

@MaineC
MaineC (Contributor) Jul 20, 2016

I think this is fine for now. While I do think expert users will want to tweak exactly which metric they use for evaluation, I guess it's ok not to make this pluggable right from the start.
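A minimal sketch of such a dispatch (the metric key and the factory method are assumptions, not the PR's actual names):

// Hypothetical dispatch on the metric name found in the request:
RankedListQualityMetric metric;
switch (metricName) {
case "precision_at_n":
    metric = PrecisionAtN.fromXContent(parser, context);  // assumed parsing factory
    break;
default:
    throw new ParsingException(parser.getTokenLocation(),
            "unknown evaluation metric [" + metricName + "]");
}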

QueryParseContext parseContext = new QueryParseContext(queryRegistry, parser, parseFieldMatcher);
if (restContent != null) {
parseRankEvalRequest(rankEvalRequest, request,
new RankEvalContext(parseFieldMatcher, parseContext, aggregators, suggesters));

@cbuescher
cbuescher (Member) Jul 6, 2016

I don't really like the fact that we need to pass the aggregator parsers and suggester parsers here, but they are needed for SearchSourceBuilder.fromXContent() when parsing the request. Maybe something we can simplify down the road.

@MaineC
MaineC (Contributor) Jul 20, 2016

Thanks for this comment - I had been wondering why we have those in here. I agree that this would be nice to simplify.

}
RankEvalResponse response = new RankEvalResponse();
RankEvalResult result = new RankEvalResult(qualityTask.getTaskId(), qualitySum / specifications.size(), unknownDocs);

@cbuescher
cbuescher (Member) Jul 6, 2016

We do the summing and averaging of the single-query quality metrics all here in this class. However, I think this should probably move to the individual metrics (e.g. PrecisionAtN), unless we always want to average over the individual queries. Also probably a topic for a follow-up task.

@MaineC
MaineC (Contributor) Jul 20, 2016

+1 Also relates to my comment above about which average to provide. Come to think of it a bit more, it might even make sense to provide summarisation (or whatever the correct term is) statistics other than the average.
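One possible shape for moving that decision into the metric itself (a sketch; the method name is hypothetical):

// Hypothetical hook on RankedListQualityMetric: each metric combines its own
// per-query scores, so averaging becomes just one possible implementation.
public abstract double combine(Collection<Double> perQueryScores);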

* */
public class RatedDocument implements Writeable {
private final String docId;

@MaineC
MaineC (Contributor) Jul 20, 2016

I believe @clintongormley had a good point that this ID should be augmented with index and type information to identify documents uniquely.
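Roughly, the suggested extension would look like this (a sketch; field names assumed):

// Sketch: identify a rated document by index, type and id rather than id alone
public class RatedDocument implements Writeable {
    private final String index;
    private final String type;
    private final String docId;
    private final int rating;
    // constructor, getters and stream serialization omitted
}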

aggsParsers = null;
}
public void testParseFromXContent() throws IOException {

@MaineC
MaineC (Contributor) Jul 20, 2016

Should we have some sort of round-trip parsing testing like we added during the query refactoring? Is that even possible?
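Such a round-trip test could mirror the query refactoring pattern: render a randomized instance to XContent, parse it back, and compare. A sketch, assuming the spec classes gain matching toXContent/fromXContent methods and a random-instance factory (all helper names hypothetical):

public void testXContentRoundtrip() throws IOException {
    QuerySpec original = createRandomQuerySpec();            // hypothetical factory for random instances
    XContentBuilder builder = XContentFactory.jsonBuilder();
    original.toXContent(builder, ToXContent.EMPTY_PARAMS);   // assumes QuerySpec implements ToXContent
    try (XContentParser parser = createParser(builder)) {    // hypothetical test helper
        QuerySpec parsed = QuerySpec.fromXContent(parser);   // assumed parsing counterpart
        assertEquals(original, parsed);
    }
}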

@MaineC
MaineC (Contributor) commented Jul 20, 2016

I think this is awesome progress. There are a couple of things we need to iron out, most of which you already pointed to in your comments, but I think we can do so after merging, in separate PRs.

@MaineC MaineC merged commit 2b3abad into elastic:feature/rank-eval Jul 21, 2016

1 check passed

CLA: Commit author is a member of Elasticsearch

@cbuescher cbuescher deleted the cbuescher:rank-add-restparsing branch Nov 17, 2017
