Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add search pagination token to python client #1444

Merged
merged 10 commits into from Jun 19, 2019
Merged

Conversation

sueann
Copy link
Contributor

@sueann sueann commented Jun 13, 2019

What changes are proposed in this pull request?

Adding support for search pagination tokens in the python client search_runs API. Implements the API changes and pagination in the RestStore, but defers the implementation in FileStore and SQLAlchemyStore to later work.

With this implementation, any existing backend store should not break - the existing search_runs implementation will overwrite AbstractStore's search_runs, and _search_runs will not be required (except by lint).

How is this patch tested?

Unit tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s) does this PR affect?

  • API
  • Tracking
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/feature - A new user-facing feature worth mentioning in the release notes

@sueann sueann requested review from mateiz and mparkhe June 13, 2019 18:14
@@ -4,6 +4,13 @@
from mlflow.store import SEARCH_MAX_RESULTS_DEFAULT


class ListWithToken(list):

def __init__(self, items, token):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

items -> raw_list or something like that. Items suggests a list of KV tuples like items() method on dictionaries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... but items here could actually be any iterable, not just a list. and items would refer to the items in any such iterable. not sure if there is a better term for it?

def _search_runs(self, experiment_ids, search_filter, run_view_type, max_results):
"""
Return runs that match the given list of search expressions within the experiments.
Given multiple search expressions, all these expressions are ANDed together for search.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can now remove this line, since anded_expression has been removed.

Given multiple search expressions, all these expressions are ANDed together for search.

Given multiple search expressions, all these expressions are ANDed together for search.

:param experiment_ids: List of experiment ids to scope the search
:param search_filter: :py:class`mlflow.utils.search_utils.SearchFilter` object to encode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarondav #1437 refactors this argument as

        :param filter_string: A search filter string.

Copy link
Contributor Author

@sueann sueann Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks for the pointer. i will need to do some conflict merging once #1437 is in.

:param experiment_ids: List of experiment ids to scope the search
:param search_filter: :py:class`mlflow.utils.search_utils.SearchFilter` object to encode
search expression or filter string
:param run_view_type: ACTIVE, DELETED, or ALL runs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh! Just seeing this. Options are ACTIVE_ONLY, DELETED_ONLY, and ALL

@@ -545,7 +544,9 @@ def search_runs(self, experiment_ids, search_filter, run_view_type,
run_infos = self._list_run_infos(experiment_id, run_view_type)
runs.extend(self.get_run(r.run_id) for r in run_infos)
filtered = [run for run in runs if not search_filter or search_filter.filter(run)]
return sorted(filtered, key=lambda r: (-r.info.start_time, r.info.run_id))[:max_results]
runs = sorted(filtered, key=lambda r: (-r.info.start_time, r.info.run_id))[:max_results]
token = "PAGINATION_TOKEN_NOT_IMPLEMENTED"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a constant for this, perhaps in AbstractStore and use that everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, though from the UX perspective, would it be better to return None so it is easier to check for an invalid token? or should we handle such a token inside search_runs - I assume we will have to validate the token anyway - so the user doesn't have to worry about it.

@sueann sueann changed the title [RFC] returning search pagination token in python client [WIP] returning search pagination token in python client Jun 14, 2019
@sueann
Copy link
Contributor Author

sueann commented Jun 14, 2019

Current status:

@sueann sueann changed the title [WIP] returning search pagination token in python client returning search pagination token in python client Jun 17, 2019
@sueann
Copy link
Contributor Author

sueann commented Jun 17, 2019

Will add the token parameter in this PR. Will implement FileStore & SQLAlchemyStore changes in a follow-up PR.

@sueann sueann requested a review from mparkhe June 17, 2019 22:11
def search_runs(self, experiment_ids, filter_string, run_view_type,
max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None):
max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None, page_token=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mparkhe is None an okay default value for the page token (based on how you plan to implement validation/handling of this argument in the server)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looks like proto won't send the field if it's None so should be fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sueann sueann requested a review from dbczumar June 18, 2019 23:06
@sueann sueann changed the title returning search pagination token in python client Add search pagination token to python client Jun 18, 2019
Copy link
Contributor

@mparkhe mparkhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nits but overall looks good.

@@ -4,6 +4,13 @@
from mlflow.store import SEARCH_MAX_RESULTS_DEFAULT


class PagedList(list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that name!

def search_runs(self, experiment_ids, filter_string, run_view_type,
max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None):
max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None, page_token=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


See ``search_runs`` for parameter descriptions.

:param page_token:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is missing here.

def _search_runs(self, experiment_ids, filter_string, run_view_type, max_results, order_by,
page_token):
if page_token:
raise MlflowException("SQLAlchemy-backed tracking stores do not yet support pagination"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLAlchemy-backed tracking stores do -> FileStore does

@mparkhe mparkhe added the LGTM label Jun 19, 2019
@sueann sueann merged commit fcb6fa8 into mlflow:master Jun 19, 2019
@andrewmchen andrewmchen added the rn/feature Mention under Features in Changelogs. label Jul 16, 2019
avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020
Adding support for search pagination tokens in the python client search_runs API. Implements the API changes and pagination in the RestStore, but defers the implementation in FileStore and SQLAlchemyStore to later work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants