Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GetExperimentByName REST API & call it from the Python client #1775

Merged
merged 22 commits into from Sep 16, 2019

Conversation

@smurching
Copy link
Contributor

smurching commented Aug 24, 2019

What changes are proposed in this pull request?

Addresses #1772 - namely:

  • Adds a new GetExperimentByName REST API endpoint to improve efficiency of fetching an experiment by name
  • Updates the Python REST client (RestStore) to try to use the new GetExperimentByName API, falling back to the existing ListExperiments-based implementation if that fails for backwards compatibility with old servers.

In follow-up PRs, we should:

  • Update R & Java clients to use the new endpoint
  • Improve server-side performance of GetExperimentByName in the SQLAlchemyStore (right now it lists & filters all experiments server-side) and FileStore, which just uses the AbstractStore implementation of also listing & filtering all experiments

How is this patch tested?

  • Adds unit tests in test_rest_store.py verifying the updated Python REST client behavior
  • Adds e2e tests in test_rest_tracking.py verifying the REST API endpoint works in the OSS server using both FileStore & SqlAlchemyStore backends
  • Manually tested the client against an MLflow 1.2 server (without the GetExperimentByName endpoint).

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Adds a new GetExperimentByName REST API endpoint to enable searching for experiments by name, and updates the Python client to use this endpoint when possible, e.g. in mlflow.tracking.MlflowClient.get_experiment_by_name

What component(s) does this PR affect?

  • UI
  • CLI
  • API
  • REST-API
  • Examples
  • Docs
  • Tracking
  • Projects
  • Artifacts
  • Models
  • Scoring
  • Serving
  • R
  • Java
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes
@smurching smurching changed the title Add GetExperimentByName REST API & call it from the client Add GetExperimentByName REST API & call it from the Python client Aug 24, 2019
assert res.name == name


def test_get_experiment(mlflow_client, backend_store_uri):

This comment has been minimized.

Copy link
@smurching

smurching Aug 24, 2019

Author Contributor

Also added an e2e test for GetExperiment

smurching added 6 commits Aug 24, 2019
…rrorCode type (enum string rather than enum value)

to MlflowException superclass constructor, and add test
@smurching smurching removed the enhancement label Aug 26, 2019
@@ -12,6 +12,23 @@ option (scalapb.options) = {
};

service MlflowService {


// Get metadata for an experiment and a list of runs for the experiment.

This comment has been minimized.

Copy link
@dbczumar

dbczumar Aug 26, 2019

Collaborator

The "list of runs" field has been deprecated in GetExperiment. Imo, we should omit reference to the list of runs since, in most cases, the experiment objects returned by this API won't contain run data.

This comment has been minimized.

Copy link
@smurching

smurching Aug 27, 2019

Author Contributor

Oops, yeah good catch, the response message doesn't include any runs (my current docstring is strictly wrong since they're guaranteed not to be returned :P), will fix!

def _get_experiment_by_name():
request_message = _get_request_message(GetExperimentByName())
response_message = GetExperimentByName.Response()
store_exp = _get_store().get_experiment_by_name(request_message.experiment_name)

This comment has been minimized.

Copy link
@smurching

smurching Aug 30, 2019

Author Contributor

Kind of funny - in the OSS server, we call the AbstractStore implementation, which (at least based on existing precedent) is supposed to return None when no matching experiment is found. This is currently the case in the FileStore (see relevant test) & SqlAlchemyStore (this PR adds a relevant test in test_sqlalchemy_store.py), and we have an e2e test that the REST client handles the OSS server's RESOURCE_DOES_NOT_EXIST response here

@smurching

This comment has been minimized.

Copy link
Contributor Author

smurching commented Aug 30, 2019

Output from manual test of updated client against MLflow 1.2.0 server:

Client:

(mlflow-python3) ➜  mlflow git:(get-experiment-by-name) python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import mlflow
>>> print(mlflow.__version__)
1.2.1.dev0
>>> mlflow.set_tracking_uri("http://localhost:5000")
>>> c = mlflow.tracking.MlflowClient()
>>> c.get_experiment_by_name("/blasdf") is None
True
>>> c.get_experiment_by_name("Default")
<Experiment: artifact_location='./mlruns/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>
>>> 

Server:

(universe) ➜  /tmp mlflow --version
ls mlflow, version 1.2.0
(universe) ➜  /tmp ls mlruns 
0
(universe) ➜  /tmp cat mlruns/0/meta.yaml 
artifact_location: ./mlruns/0
experiment_id: '0'
lifecycle_stage: active
name: Default
(universe) ➜  /tmp mlflow server
[2019-08-30 13:51:04 -0700] [48028] [INFO] Starting gunicorn 19.9.0
[2019-08-30 13:51:04 -0700] [48028] [INFO] Listening at: http://127.0.0.1:5000 (48028)
[2019-08-30 13:51:04 -0700] [48028] [INFO] Using worker: sync
[2019-08-30 13:51:04 -0700] [48031] [INFO] Booting worker with pid: 48031
[2019-08-30 13:51:04 -0700] [48032] [INFO] Booting worker with pid: 48032
[2019-08-30 13:51:04 -0700] [48033] [INFO] Booting worker with pid: 48033
[2019-08-30 13:51:04 -0700] [48034] [INFO] Booting worker with pid: 48034
self._verify_requests(mock_http, creds,
"experiments/get-by-name", "GET",
message_to_json(expected_message1))
assert mock_http.call_count == 1

This comment has been minimized.

Copy link
@smurching

smurching Aug 30, 2019

Author Contributor

Here we confirm that the client doesn't try to fall back to ListExperiments if it gets a RESOURCE_DOES_NOT_EXIST from the server

smurching added 2 commits Aug 30, 2019
Copy link
Contributor

tomasatdatabricks left a comment

Looks good! 2 comments:
a) I think we should only fallback if the response has error code ENDPOINT_NOT_FOUND.
b) I don't think we have a test that verifies that res store will in fact use the new endpoint.

databricks_pb2.RESOURCE_DOES_NOT_EXIST):
return None
# Fall back to using ListExperiments-based implementation.
for experiment in self.list_experiments(ViewType.ALL):

This comment has been minimized.

Copy link
@tomasatdatabricks

tomasatdatabricks Sep 9, 2019

Contributor

should we check that the error was that the api path was not found?

tests/tracking/test_rest_tracking.py Show resolved Hide resolved
mlflow/store/rest_store.py Show resolved Hide resolved
smurching added 2 commits Sep 9, 2019


// Get metadata for an experiment.
// This endpoint will return deleted experiments.

This comment has been minimized.

Copy link
@akarloff

akarloff Sep 11, 2019

Contributor

I feel like this should be active only by default, much like list experiments

This comment has been minimized.

Copy link
@smurching

smurching Sep 11, 2019

Author Contributor

@akarloff thanks for the comment, the goal here was actually to have similar behavior to the GetExperiment REST API, which does return deleted experiments by default. It seems GetExperimentByName has more use cases in common with GetExperiment (fetching an experiment based on unique ID / name) than with ListExperiments, and it'd be good to avoid the confusion that would arise from having the two experiment-getting endpoints behave differently.

This comment has been minimized.

Copy link
@akarloff

akarloff Sep 12, 2019

Contributor

I don't agree that get experiment by name and get experiment by id should behave the same. Id is always unique. Name is not unique (although it is currently treated as such in create.)

Ideally there should only be one active experiment by name, allowing users to reuse names after deleting experiments, and get by name gets the active only.

How would get-by-name (include deleted) handle an active experiment and deleted experiment sharing the same name, or multiple deleted experiments sharing a name?

This comment has been minimized.

Copy link
@smurching

smurching Sep 12, 2019

Author Contributor

@akarloff it's a fair question, my response would be that an active & deleted experiment should not have the same name as per the API spec - we error out at creation-time if a user attempts this in the reference implementations, although admittedly the API doc's notion of an experiment "existing" is vague (a deleted experiment is considered to "exist" in the reference implementations, for example). For the same reason, it's not possible for two deleted experiments to have the same name.

Note that the current RestStore get_experiment_by_name implementation (we inherit the superclass implementation in AbstractStore, see here) actually does return deleted experiments. Thus to avoid a breaking change, we'd need it to fall back to ListExperiments if no active experiment is found with the specified name via the GetExperimentByName REST API. This case is actually fairly common (e.g. if you call mlflow.set_experiment("name-of-new-experiment")), and would place a decent amount of load on the server - it'd also be a little weird for the client API's behavior to differ from the underlying REST API.

Also agree that it would be nice to be able to create an experiment that reuses the name of a deleted experiment, although as mentioned above it's not currently possible in the reference implementations.

Happy to continue discussing on this thread or Slack, if that's faster - thanks!

This comment has been minimized.

Copy link
@smurching

smurching Sep 12, 2019

Author Contributor

One way around the issue btw could be to clarify the behavior of the REST API to first look for an active experiment with the specified name, then for a deleted experiment with the specified name, returning an empty response if neither exist. If multiple deleted experiments share the same name (if we interpret the REST API as allowing for this), we just return one of the deleted experiments (seems like an unlikely case, so we don't need to overfit to it).

This comment has been minimized.

Copy link
@akarloff

akarloff Sep 13, 2019

Contributor

@smurching Clarifying the behavior of the REST API as per your last comment is a decent work around. Supporting reuse of experiment names after deletion is something I would ultimately like to see, and could be supported with the documentation you proposed.

This comment has been minimized.

Copy link
@smurching

smurching Sep 13, 2019

Author Contributor

Sounds good, I'll add that clarification - thanks again for the input!

smurching added 2 commits Sep 11, 2019
Copy link
Contributor

tomasatdatabricks left a comment

Looks good and lgtm after the "only active experiments by default" issue is resolved.

@smurching smurching merged commit 237e305 into mlflow:master Sep 16, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.