Complete Python SDK #297

Closed · aarondav opened this issue Aug 13, 2018 · 6 comments

@aarondav
Contributor

Currently, MLflow provides three interfaces:

  1. REST API -- the fundamental, JSON-y truth.
  2. CLI -- a subset of the API (e.g., mlflow experiments list), as well as some more powerful integrated workflows like mlflow run.
  3. Python SDK -- currently, a subset of the REST API, focused on components needed for tracking a single experiment.

There is work in progress for adding R and Java/Scala SDK support as well.

We should complete the Python SDK. In particular, support is currently uneven across experiment CRUD, run CRUD, and metric/artifact CRUD.

@aarondav
Contributor Author

The main question here is how we should expose the resulting API. So far, we have APIs split between mlflow itself (e.g., log_param, log_metric) and mlflow.tracking (create_experiment, get_run).

Second, we have to decide whether the API should be Pythonic (e.g., log_metric(key, value)) or Proto-tastic (e.g., log_metric(Metric(key, value))).
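To make the contrast concrete, here is a minimal sketch of the two signatures; the Metric class below is an illustrative stand-in, not the actual mlflow.entities definition.

```python
# Hypothetical sketch of the two styles; Metric is a stand-in entity.
import time
from dataclasses import dataclass


@dataclass
class Metric:
    key: str
    value: float
    timestamp: int = 0


# "Proto-tastic": the caller constructs the entity explicitly.
def log_metric_proto(metric: Metric) -> None:
    print(f"logging {metric.key}={metric.value} at {metric.timestamp}")


# "Pythonic": plain arguments, the entity is built internally.
def log_metric(key: str, value: float) -> None:
    log_metric_proto(Metric(key, value, timestamp=int(time.time())))


log_metric("rmse", 0.87)                    # Pythonic call
log_metric_proto(Metric("rmse", 0.87))      # Proto-tastic call
```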

@aarondav
Contributor Author

As described in boto/boto3#112, boto has two APIs -- one that closely matches the lower-level interface (Proto-tastic) and a higher-level one with a more Pythonic interface. Since the former is "easy" and the latter "nice", this split seems reasonable.

I might propose we have an autogenerated client API somewhere like mlflow.api or mlflow.sdk, and then provide wrappers for the useful components in mlflow proper. This would mean adding things like list_experiments and create_experiment to mlflow, though, which might over-saturate the mlflow module with APIs that are not in common use.
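As a rough illustration of that layering (the module name mlflow.api, the endpoint base URI, and the wrapper below are hypothetical, shown only to illustrate the split):

```python
# Sketch of the proposed split: a thin autogenerated layer that mirrors
# the REST endpoints, plus a friendlier wrapper re-exported at the top level.
import requests

# Hypothetical base URI; the real tracking server path may differ.
BASE_URI = "http://localhost:5000/api/2.0/preview/mlflow"


# --- low-level layer (would live in something like mlflow.api) ---
def create_experiment_api(name: str) -> dict:
    resp = requests.post(f"{BASE_URI}/experiments/create", json={"name": name})
    resp.raise_for_status()
    return resp.json()          # raw JSON, exactly as the server returns it


# --- high-level wrapper (would be exposed from mlflow proper) ---
def create_experiment(name: str) -> str:
    """Create an experiment and return just its id."""
    return create_experiment_api(name)["experiment_id"]
```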

@mateiz
Contributor

mateiz commented Aug 14, 2018

I'd probably just start with the purely Pythonic one, unless there are a lot of APIs we don't want to include in it for some reason. Otherwise, people get confused about which API to use, and various users will come to depend on both, making both difficult to maintain. The other issue with Protobufs is that if we expose them in the public API, we might never be able to get rid of them without a lot of trouble.

Regarding mlflow vs mlflow.tracking, the original idea was to alias some very commonly used functions in mlflow but have everything else in mlflow.tracking. I'd say add everything to tracking first. Even exposing a few functions in mlflow might have been a bad idea because some people ask what the difference is, although I see a lot of Python packages that do the same thing (for instance Flask and Click).

BTW, one thing to consider is how to have some of the APIs return data in an easy-to-process format. For example, I'd love to get the result of SearchRuns as a Pandas DataFrame. Maybe this can be done using an alternate version of the call, or a parameter such as format=pandas.
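For illustration, such a call could look like the following; search_runs, the Run stand-in, and the format parameter are hypothetical here, not existing MLflow APIs.

```python
# Sketch of a search call that can hand back a pandas DataFrame.
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class Run:
    run_uuid: str
    metrics: dict = field(default_factory=dict)


def search_runs(experiment_ids, filter_string="", format="list"):
    # Stand-in for the real search; normally this would call the REST API.
    runs = [Run("abc123", {"rmse": 0.87}), Run("def456", {"rmse": 0.91})]
    if format == "pandas":
        return pd.DataFrame(
            [{"run_uuid": r.run_uuid, **r.metrics} for r in runs]
        )
    return runs


df = search_runs(experiment_ids=[0], format="pandas")
print(df)  # one row per run, one column per metric
```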

aarondav added a commit to aarondav/mlflow that referenced this issue Aug 14, 2018
aarondav added a commit to aarondav/mlflow that referenced this issue Aug 17, 2018
aarondav added a commit that referenced this issue Aug 17, 2018
@aarondav
Contributor Author

PR merged!

@andremesarovic

andremesarovic commented Aug 17, 2018

Coming from an API-centric background, I believe having a low-level client that faithfully mirrors the actual HTTP calls is crucial. This client is the building block on top of which richer, more domain-specific clients can be built. A low-level client and a language-oriented client are ends of a continuum, not mutually exclusive options, so the boto paradigm seems like a good one. Low-level API clients are easy to create. Furthermore, designing only an opinionated client can be problematic, since other users will have different, unanticipated needs.

I see the MLflow REST API as a foundational core feature on top of which different clients and applications can be built. A high-level API and a multi-lingual client strategy are important to building out MLflow as a powerful next-gen ML management platform. I could see someone wanting to create their own custom UI for hyperparameter optimization written on top of the API, or a custom rich client tailored to their specific use cases. This would be in line with today's ubiquitous rich API ecosystem, where businesses expose APIs and customers create value-add with client applications. A good read on the topic: APIs: A Strategy Guide.

Here's a sample low-level Python MLflow client, mlflow_client/mlflow_api_client.py, that is similar to the Java client API. It is slightly opinionated in that it flattens nested JSON responses and simplifies the search input signature.
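As a rough sketch of that style of client (this is not the contents of the linked file, just an illustration; the endpoint paths follow the current REST API and the base URI is assumed):

```python
# Minimal low-level client that mirrors the REST endpoints one-to-one
# and returns the parsed JSON unmodified.
import requests


class MlflowApiClient:
    def __init__(self, base_uri="http://localhost:5000/api/2.0/preview/mlflow"):
        self.base_uri = base_uri

    def _get(self, path, params=None):
        resp = requests.get(f"{self.base_uri}/{path}", params=params)
        resp.raise_for_status()
        return resp.json()

    def list_experiments(self):
        return self._get("experiments/list")

    def get_experiment(self, experiment_id):
        return self._get("experiments/get", params={"experiment_id": experiment_id})

    def get_run(self, run_uuid):
        return self._get("runs/get", params={"run_uuid": run_uuid})
```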

I would also certainly add CRUD capabilities to API resources so folks can more effectively manage (update, delete) their experiments and runs. A richer domain model would also help group experiments into "buckets". Say a company wants to group one set of experiments into "self-driving-cars" and another into "self-driving-trucks": a flat experiment namespace won't scale once you have many experiments.

An autogenerated API sounds good, but I would shy away from exposing Protobuf concepts. Protobuf data concepts have already leaked into the current REST payloads (e.g., unnecessary wrapper fields such as in the "create run" response, where the data is buried two levels down in "run/info"). Protobuf is an internal implementation detail; the API should only expose RESTful concepts.
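For example, the "create run" response has roughly this shape (abbreviated and illustrative), forcing the caller to dig through the wrapper objects:

```python
# Abbreviated, illustrative shape of a "runs/create" response payload.
response = {
    "run": {
        "info": {
            "run_uuid": "abc123",
            "experiment_id": "0",
            "status": "RUNNING",
        }
    }
}

# The useful fields sit two levels down:
run_uuid = response["run"]["info"]["run_uuid"]
```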

The current API also uses RPC-like endpoints such as “experiments/list”, “experiments/get”, etc. instead of standard HTTP/REST verbs and resource names. A more RESTful API would look like:

| Current method | Current endpoint | Proposed method | Proposed endpoint |
| --- | --- | --- | --- |
| GET | experiments/list | GET | experiments |
| GET | experiments/get?experiment_id=$EXP_ID | GET | experiments/$EXP_ID |
| POST | experiments/create | POST | experiments |
| GET | runs/get?run_uuid=$RUN_ID | GET | runs/$RUN_ID |
| POST | runs/create | POST | runs |
| POST | runs/update | PUT (PATCH?) | runs/$RUN_ID |

@aarondav
Contributor Author

You raise a few distinct points. Let me try to summarize them to make sure I understand each.

Python API proposal

The proposed API here looks pretty similar to the one introduced in #299. The key differences I see are:

  1. The proposed API returns raw JSON objects as opposed to the Python mlflow.entities. The entities provide some significant utility, I think -- mainly that users can call help() or use IPython's autocomplete on them, and that Python errors are thrown if you attempt to access a field that could not possibly be defined, which helps prevent typos in user code (see the sketch after this list).
  2. You introduce an easier-to-use search API. I think this makes sense, as the current one with experiment_ids and anded_expressions is difficult to reason about and construct. We may want to introduce this simpler API at the REST level too, though, because the REST APIs should be easy to construct by hand.
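A quick sketch of the difference in point 1 (field names here only loosely mirror mlflow.entities):

```python
from dataclasses import dataclass

# Raw JSON: a typo in a key name fails silently with .get(), or only at
# lookup time with [].
run_json = {"info": {"run_uuid": "abc123", "status": "RUNNING"}}
run_json["info"].get("run_uiid")   # typo -> silently returns None


# Entity objects: attribute access works with help() and autocomplete,
# and a typo raises AttributeError immediately.
@dataclass
class RunInfo:
    run_uuid: str
    status: str


info = RunInfo(run_uuid="abc123", status="RUNNING")
print(info.run_uuid)               # "abc123"
# info.run_uiid                    # AttributeError: no attribute 'run_uiid'
```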

Let me know if I am missing some other major differences between the proposed API and the one in #299.

CRUD-y REST API

You point out that our current REST API does not follow RESTful ideals as closely as it could, and you're absolutely right. The reason we originally designed the REST API this way is simply that we happened to have infrastructure and tooling to take protos and convert them into REST API constructs, and that infrastructure cannot support the RESTful ideal. So we were constrained to using GET with query parameters and POST with JSON bodies for input.

Although our API should not in principle be constrained by our existing technology and tooling, it would still take significant effort to convert the APIs on all available servers to a more RESTful design. For this reason, I am still in favor of keeping the straightforward-but-not-completely-RESTful API. We can definitely introduce the more RESTful style later in a v2 API and continue to support both.
