Add search pagination token to python client #1444

sueann · 2019-06-13T18:14:32Z

What changes are proposed in this pull request?

Adds support for search pagination tokens to the Python client's search_runs API. This implements the API changes and pagination in RestStore, and defers the FileStore and SQLAlchemyStore implementations to later work.

With this implementation, existing backend stores should not break: a store's existing search_runs implementation will override AbstractStore's search_runs, so _search_runs will not be required (except by lint).
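
The compatibility argument can be sketched roughly as follows. This is only an illustration, not the code in this PR: `BaseStore`, `LegacyStore`, and `NewStore` are hypothetical stand-ins for AbstractStore and its subclasses, and the simplified signatures are assumptions.

```python
class BaseStore:
    """Hypothetical stand-in for AbstractStore."""

    def search_runs(self, experiment_ids, page_token=None):
        # Public entry point: delegates to the store-specific hook.
        return self._search_runs(experiment_ids, page_token)

    def _search_runs(self, experiment_ids, page_token):
        raise NotImplementedError


class LegacyStore(BaseStore):
    """Predates the hook and overrides the public method directly."""

    def search_runs(self, experiment_ids, page_token=None):
        # _search_runs is never reached, so it does not need to exist here.
        return ["run-a", "run-b"]


class NewStore(BaseStore):
    """Implements only the hook and inherits the public method."""

    def _search_runs(self, experiment_ids, page_token):
        return ["run-c"]


legacy_result = LegacyStore().search_runs(["0"])
new_result = NewStore().search_runs(["0"])
```

A store written before this change keeps working because its own search_runs shadows the base-class version entirely.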

How is this patch tested?

Unit tests

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s) does this PR affect?

API
Tracking
Python

How should the PR be classified in the release notes? Choose one:

rn/feature - A new user-facing feature worth mentioning in the release notes

mparkhe · 2019-06-13T18:49:32Z

mlflow/store/abstract_store.py

@@ -4,6 +4,13 @@
 from mlflow.store import SEARCH_MAX_RESULTS_DEFAULT


+class ListWithToken(list):
+
+    def __init__(self, items, token):


items -> raw_list or something like that. `items` suggests a list of key-value tuples, like the items() method on dictionaries.

hm... but items here could actually be any iterable, not just a list. and items would refer to the items in any such iterable. not sure if there is a better term for it?
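
For reference, a minimal sketch of a list subclass that carries a token and accepts any iterable. The diff above is truncated after the `__init__` signature, so the body and the `token` attribute name here are assumptions rather than the PR's actual code.

```python
class ListWithToken(list):
    """A list that also carries a pagination token (illustrative sketch)."""

    def __init__(self, items, token):
        # list.__init__ accepts any iterable, so tuples and generators work too.
        super().__init__(items)
        self.token = token


# Built from a generator rather than a list.
page = ListWithToken((n * n for n in range(3)), token="next-page")
```

Because the object is still a `list`, callers that ignore the token can keep treating the result as a plain list.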

mparkhe · 2019-06-13T18:52:15Z

mlflow/store/abstract_store.py

+    def _search_runs(self, experiment_ids, search_filter, run_view_type, max_results):
+        """
+        Return runs that match the given list of search expressions within the experiments.
+        Given multiple search expressions, all these expressions are ANDed together for search.


we can now remove this line, since anded_expression has been removed.

Given multiple search expressions, all these expressions are ANDed together for search.

mparkhe · 2019-06-13T18:53:52Z

mlflow/store/abstract_store.py

+        Given multiple search expressions, all these expressions are ANDed together for search.
+
+        :param experiment_ids: List of experiment ids to scope the search
+        :param search_filter: :py:class`mlflow.utils.search_utils.SearchFilter` object to encode


@aarondav #1437 refactors this argument as

:param filter_string: A search filter string.

ah thanks for the pointer. i will need to do some conflict merging once #1437 is in.

mparkhe · 2019-06-13T18:58:15Z

mlflow/store/abstract_store.py

+        :param experiment_ids: List of experiment ids to scope the search
+        :param search_filter: :py:class`mlflow.utils.search_utils.SearchFilter` object to encode
+            search expression or filter string
+        :param run_view_type: ACTIVE, DELETED, or ALL runs


oooh! Just seeing this. Options are ACTIVE_ONLY, DELETED_ONLY, and ALL

mparkhe · 2019-06-13T18:59:17Z

mlflow/store/file_store.py

@@ -545,7 +544,9 @@ def search_runs(self, experiment_ids, search_filter, run_view_type,
            run_infos = self._list_run_infos(experiment_id, run_view_type)
            runs.extend(self.get_run(r.run_id) for r in run_infos)
        filtered = [run for run in runs if not search_filter or search_filter.filter(run)]
-        return sorted(filtered, key=lambda r: (-r.info.start_time, r.info.run_id))[:max_results]
+        runs = sorted(filtered, key=lambda r: (-r.info.start_time, r.info.run_id))[:max_results]
+        token = "PAGINATION_TOKEN_NOT_IMPLEMENTED"


can we add a constant for this, perhaps in AbstractStore and use that everywhere?

makes sense. from the UX perspective, though, would it be better to return None so that an invalid token is easier to check for? or should we handle such a token inside search_runs, so the user doesn't have to worry about it? I assume we will have to validate the token anyway.
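
To illustrate the None option from the caller's side, here is a small sketch of a paging loop that stops when the returned token is None. `fetch_page` is a hypothetical in-memory backend, and the offset-based token format is an assumption made for the example; it is not how any MLflow store encodes tokens.

```python
def fetch_page(page_token=None, page_size=2):
    """Hypothetical paged backend: returns (items, next_token)."""
    data = ["run-1", "run-2", "run-3", "run-4", "run-5"]
    start = int(page_token) if page_token is not None else 0
    end = start + page_size
    # None signals that there are no further pages.
    next_token = str(end) if end < len(data) else None
    return data[start:end], next_token


def fetch_all():
    """Collect every page by following tokens until None is returned."""
    results, token = [], None
    while True:
        items, token = fetch_page(token)
        results.extend(items)
        if token is None:
            break
    return results
```

With a None terminator the loop condition is a plain identity check, whereas a sentinel string would have to be imported and compared by every caller.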

sueann · 2019-06-14T23:22:37Z

Current status:

Resolved merge conflicts with Add search_runs order_by support #1437
Need to add the token parameter to the search_runs method
Need to implement pagination for FileStore and SQLAlchemyStore

sueann · 2019-06-17T22:11:09Z

Will add the token parameter in this PR. Will implement FileStore & SQLAlchemyStore changes in a follow-up PR.

sueann · 2019-06-18T22:52:56Z

mlflow/store/abstract_store.py

    def search_runs(self, experiment_ids, filter_string, run_view_type,
-                    max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None):
+                    max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None, page_token=None):


@mparkhe is None an okay default value for the page token (based on how you plan to implement validation/handling of this argument in the server)?

Actually looks like proto won't send the field if it's None so should be fine

mparkhe

small nits but overall looks good.

mparkhe · 2019-06-19T17:40:30Z

mlflow/store/abstract_store.py

@@ -4,6 +4,13 @@
 from mlflow.store import SEARCH_MAX_RESULTS_DEFAULT


+class PagedList(list):


I like that name!

mparkhe · 2019-06-19T17:40:51Z

mlflow/store/abstract_store.py

    def search_runs(self, experiment_ids, filter_string, run_view_type,
-                    max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None):
+                    max_results=SEARCH_MAX_RESULTS_DEFAULT, order_by=None, page_token=None):


mparkhe · 2019-06-19T17:44:20Z

mlflow/store/abstract_store.py

+
+        See ``search_runs`` for parameter descriptions.
+
+        :param page_token:


something is missing here.

mparkhe · 2019-06-19T17:46:09Z

mlflow/store/file_store.py

+    def _search_runs(self, experiment_ids, filter_string, run_view_type, max_results, order_by,
+                     page_token):
+        if page_token:
+            raise MlflowException("SQLAlchemy-backed tracking stores do not yet support pagination"


SQLAlchemy-backed tracking stores do -> FileStore does
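
The guard under discussion can be sketched in isolation as below. `PaginationNotSupported` is a hypothetical stand-in for MlflowException, the function name and simplified signature are assumptions, and the message text is illustrative because the diff above is truncated.

```python
class PaginationNotSupported(Exception):
    """Hypothetical stand-in for MlflowException."""


def search_runs_without_pagination(runs, max_results, page_token=None):
    """Reject any page token, otherwise return one truncated page."""
    if page_token:
        # Name the store that is actually raising the error.
        raise PaginationNotSupported("FileStore does not yet support pagination")
    # No pagination support, so the returned token is always None.
    return runs[:max_results], None
```

Failing loudly on a non-empty token avoids silently returning the first page again to a caller who believes they are paging forward.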

sueann requested review from mateiz and mparkhe June 13, 2019 18:14

mparkhe reviewed Jun 13, 2019

sueann added 3 commits June 14, 2019 16:06

RFC version

b9fb74d

addressed some comments - will need to revisit once order_by is added

abfa2e3

will fix the function signature once the relevant PR is in

694f24c

sueann force-pushed the py_token branch from 57e22d7 to 694f24c Compare June 14, 2019 23:08

fix actual errors arising from merge conflicts

6809bc5

sueann changed the title ~~[RFC] returning search pagination token in python client~~ [WIP] returning search pagination token in python client Jun 14, 2019

sueann changed the title ~~[WIP] returning search pagination token in python client~~ returning search pagination token in python client Jun 17, 2019

sueann requested a review from mparkhe June 17, 2019 22:11

sueann added 2 commits June 18, 2019 15:44

added proto, arg to search_runs for token

e0abc2b

cleanup

676cb55

sueann commented Jun 18, 2019

sueann requested a review from dbczumar June 18, 2019 23:06

sueann changed the title ~~returning search pagination token in python client~~ Add search pagination token to python client Jun 18, 2019

sueann added 3 commits June 18, 2019 16:45

make default returned value for token None

e0f5f63

fix tests

39139d9

add tests on page_token input when pagination not implemented

a8b766e

mparkhe reviewed Jun 19, 2019

mparkhe approved these changes Jun 19, 2019

mparkhe added the LGTM label Jun 19, 2019

address comments

20603bf

sueann merged commit fcb6fa8 into mlflow:master Jun 19, 2019

andrewmchen added the rn/feature Mention under Features in Changelogs. label Jul 16, 2019

AveshCSingh mentioned this pull request Jul 24, 2020

[FR] Pagination support for list_run_infos #3157

Closed

20 tasks
