Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change logging so that it doesn't refer to an sltr query #69

Closed
softwaredoug opened this issue Aug 20, 2017 · 6 comments
Closed

Change logging so that it doesn't refer to an sltr query #69

softwaredoug opened this issue Aug 20, 2017 · 6 comments
Assignees
Milestone

Comments

@softwaredoug
Copy link
Collaborator

Consider "offline" logging use cases where a user batches a set of identifiers and simply wants the scores for each feature for a set of document identifiers (this is what happens currently in the demo). The current logging API expects to find an sltr query in the body.

At first blush, I would prefer a logging interface closer too:

GET tmdb/_search
{
    "query": {
        "terms": {
             "id": ["1234", "5678"]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "featureset": "my_feature_set"
            }
        }
    }
}

This would log "my_feature_set" for the returned documents.

This seems more flexible than the current logging interface in the 1.0 branch, as it would support several logging use cases.

@softwaredoug softwaredoug added this to the 1.0 milestone Aug 20, 2017
@softwaredoug
Copy link
Collaborator Author

I wonder if this was done for performance reasons, as most use cases would involve a query. But perhaps we could simply require one of feature_set, rescore_index, or named query. Or offer a caching hint of where to find the features.

@softwaredoug
Copy link
Collaborator Author

I see now that specifying an sltr query with a featureset, not a model, basically does what I'm describing.

@ebernhardson
Copy link
Collaborator

IIRC this was done because the search extension doesn't have access to the proper objects to read the index, it has to grab a pre-built query from the search context.

@softwaredoug
Copy link
Collaborator Author

softwaredoug commented Aug 22, 2017

That makes sense @ebernhardson. For posterity sake, below is how I log features for this use case.

The should sltr query has no effect in the case below. It seems that referring to just a feature set means no score is added(?). We should just be clear about documenting that.

logQuery = {
    "size": 100,
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "_id": ["7555"]
                    }
                }
            ],
            "should": [
                {"sltr": {
                    "_name": "logged_featureset",
                    "featureset": "movie_features",
                    "params": {
                        "keywords": "rambo"
                    }
                }}
                ]
            }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "main",
                "named_query": "logged_featureset",
                "missing_as_zero": true
            }
        }
    }
}

@nomoa
Copy link
Collaborator

nomoa commented Aug 31, 2017

yes, initially I wanted to put everything in the log specs but the QueryShardContext is frozen during the fetch phase and does not allow me to parse a query. It's why I relied on on named query + inpecting the rescore context.
Note that the most performant query for offline logging is wrapping everything in a filter clause :

logQuery = {
    "size": 100,
    "query": {
        "bool": {
            "filter": [
                {
                    "terms": {
                        "_id": ["7555"]
                    }
                },
                {
                    "sltr": {
                         "_name": "logged_featureset",
                         "featureset": "movie_features",
                         "params": {
                              "keywords": "rambo"
                         }
                    }
                }
             ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "main",
                "named_query": "logged_featureset",
                "missing_as_zero": true
            }
        }
    }
}

When used inside a filter the sltr query will simply bypass all feature queries during the search phase, it permits to run feature queries only once during the fetch phase.

@softwaredoug is there something you'd like to do regarding this ticket?
Fixing the logging DSL to make it more intuitive will certainly require some very specific patches upstream to allow parsing a query during the fetch phase.

@softwaredoug
Copy link
Collaborator Author

No I'm fine with this the way it is, the explanation makes sense! I was able to get the offline case working fine with the pattern you described @nomoa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants