Change logging so that it doesn't refer to an sltr query #69

softwaredoug · 2017-08-20T18:48:40Z

Consider "offline" logging use cases where a user batches a set of document identifiers and simply wants the score of each feature for those documents (this is what happens currently in the demo). The current logging API expects to find an sltr query in the body.

At first blush, I would prefer a logging interface closer to:

GET tmdb/_search
{
    "query": {
        "terms": {
            "id": ["1234", "5678"]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "featureset": "my_feature_set"
            }
        }
    }
}

This would log "my_feature_set" for the returned documents.

This seems more flexible than the current logging interface in the 1.0 branch, as it would support several logging use cases.


softwaredoug · 2017-08-20T18:52:14Z

I wonder if this was done for performance reasons, as most use cases would involve a query. But perhaps we could simply require one of feature_set, rescore_index, or named query. Or offer a caching hint of where to find the features.

softwaredoug · 2017-08-20T19:08:34Z

I see now that specifying an sltr query with a featureset, not a model, basically does what I'm describing.

ebernhardson · 2017-08-21T16:12:19Z

IIRC this was done because the search extension doesn't have access to the proper objects to read the index; it has to grab a pre-built query from the search context.

softwaredoug · 2017-08-22T12:32:51Z

That makes sense @ebernhardson. For posterity's sake, below is how I log features for this use case.

The sltr query in the should clause has no effect on scoring in the case below. It seems that referring to just a feature set, rather than a model, means no score is added(?). We should just be clear about documenting that.

logQuery = {
    "size": 100,
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "_id": ["7555"]
                    }
                }
            ],
            "should": [
                {
                    "sltr": {
                        "_name": "logged_featureset",
                        "featureset": "movie_features",
                        "params": {
                            "keywords": "rambo"
                        }
                    }
                }
            ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "main",
                "named_query": "logged_featureset",
                "missing_as_zero": true
            }
        }
    }
}

nomoa · 2017-08-31T13:19:44Z

Yes, initially I wanted to put everything in the log specs, but the QueryShardContext is frozen during the fetch phase and does not allow me to parse a query. That's why I relied on a named query plus inspecting the rescore context.
Note that the most performant query for offline logging wraps everything in a filter clause:

logQuery = {
    "size": 100,
    "query": {
        "bool": {
            "filter": [
                {
                    "terms": {
                        "_id": ["7555"]
                    }
                },
                {
                    "sltr": {
                        "_name": "logged_featureset",
                        "featureset": "movie_features",
                        "params": {
                            "keywords": "rambo"
                        }
                    }
                }
            ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "main",
                "named_query": "logged_featureset",
                "missing_as_zero": true
            }
        }
    }
}

When used inside a filter, the sltr query simply bypasses all feature queries during the search phase, which allows the feature queries to run only once, during the fetch phase.
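
For anyone batching many documents, the filter pattern above could be wrapped in a small helper. This is only a sketch under assumptions: the function name `build_offline_log_query` and its arguments are hypothetical and not part of the plugin; only the shape of the request body is taken from the example in this thread.

```python
import json


def build_offline_log_query(doc_ids, featureset, params, log_name="main"):
    """Build a search body that logs `featureset` for the given document ids.

    Hypothetical helper: it only reproduces the filter-based request body
    shown in this thread and does not talk to Elasticsearch itself.
    """
    named_query = "logged_featureset"
    return {
        # Ask for as many hits as there are ids so every document is logged.
        "size": len(doc_ids),
        "query": {
            "bool": {
                # Both clauses sit in filter context, so neither adds to the score.
                "filter": [
                    {"terms": {"_id": list(doc_ids)}},
                    {
                        "sltr": {
                            "_name": named_query,
                            "featureset": featureset,
                            "params": params,
                        }
                    },
                ]
            }
        },
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": log_name,
                    # Must match the "_name" given to the sltr query above.
                    "named_query": named_query,
                    "missing_as_zero": True,
                }
            }
        },
    }


body = build_offline_log_query(["7555"], "movie_features", {"keywords": "rambo"})
print(json.dumps(body, indent=4))
```

The resulting dict can be passed as the search body by whatever client is in use; keeping `named_query` in one variable avoids the `_name` and the log spec drifting apart.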

@softwaredoug is there something you'd like to do regarding this ticket?
Fixing the logging DSL to make it more intuitive will certainly require some very specific patches upstream to allow parsing a query during the fetch phase.

softwaredoug · 2017-09-04T12:40:10Z

No I'm fine with this the way it is, the explanation makes sense! I was able to get the offline case working fine with the pattern you described @nomoa

softwaredoug added this to the 1.0 milestone Aug 20, 2017

softwaredoug assigned nomoa Aug 20, 2017

softwaredoug mentioned this issue Aug 20, 2017

Cant log boosted sltr query #70

Closed

softwaredoug closed this as completed Sep 4, 2017
