Complete Python SDK #297
The main question here is how we should expose the resulting API. So far, we have APIs split between … Second, we have to decide whether the API should be Pythonic (e.g., …).
As described in boto/boto3#112, boto has two APIs: a low-level one that closely mirrors the underlying (Proto-tastic) API, and a higher-level one with a more Pythonic interface. Since the former is "easy" to produce and the latter "nice" to use, this split seems reasonable. I might propose we have an autogenerated client API somewhere like …
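To make the two-tier idea concrete, here is a minimal sketch assuming a hypothetical `LowLevelClient` that forwards calls to REST endpoints unchanged and a hypothetical `PythonicClient` built on top of it. The transport is injected so the sketch runs without a server, and the `experiments/list` response shape (`experiments`, `experiment_id`, `name`) is an assumption for illustration, not a description of the actual MLflow wire format.

```python
from typing import Any, Callable, Dict, List

# A transport takes (method, endpoint, payload) and returns decoded JSON.
# Injecting it keeps this sketch runnable without a live server.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


class LowLevelClient:
    """Hypothetical thin layer that mirrors REST endpoints one-to-one."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def call(self, method: str, endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # No reshaping: requests and responses keep the wire format.
        return self._transport(method, endpoint, payload)


class Experiment:
    def __init__(self, experiment_id: str, name: str) -> None:
        self.experiment_id = experiment_id
        self.name = name

    def __repr__(self) -> str:
        return f"Experiment(id={self.experiment_id!r}, name={self.name!r})"


class PythonicClient:
    """Hypothetical friendlier layer that returns Python objects."""

    def __init__(self, low_level: LowLevelClient) -> None:
        self._low = low_level

    def list_experiments(self) -> List[Experiment]:
        resp = self._low.call("GET", "experiments/list", {})
        return [Experiment(e["experiment_id"], e["name"]) for e in resp.get("experiments", [])]


def fake_transport(method: str, endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for HTTP: answers only the one call this example makes.
    if (method, endpoint) == ("GET", "experiments/list"):
        return {"experiments": [{"experiment_id": "0", "name": "Default"}]}
    raise ValueError(f"unexpected call: {method} {endpoint}")


client = PythonicClient(LowLevelClient(fake_transport))
print(client.list_experiments())  # prints [Experiment(id='0', name='Default')]
```

Because the Pythonic layer only ever talks to `LowLevelClient.call`, the low-level layer could be regenerated from the service definitions without touching user-facing code.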
I'd probably just start with the purely Pythonic one, unless there are many APIs we don't want to include in it for some reason. Otherwise, people get confused about which API to use, and different users will come to depend on each of them, which makes both difficult to maintain. The other issue with Protobufs is that if we expose them in the public API, we might never be able to get rid of them without a lot of trouble. Regarding … BTW, one thing to consider is how to have some of the APIs return data in an easy-to-process format. For example, I'd love to get the result of SearchRuns as a pandas DataFrame. Maybe this could be done with an alternate version of the call, or with a parameter such as format=pandas.
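A sketch of the `format=pandas` idea, assuming a hypothetical `search_runs` helper that receives already-fetched run records as plain dicts (the field names below are made up). pandas is imported only when that format is requested, so callers who want plain Python objects do not need it installed.

```python
from typing import Any, Dict, List


def search_runs(runs: List[Dict[str, Any]], format: str = "list"):
    """Hypothetical helper: return run records as dicts or as a DataFrame.

    `runs` stands in for SearchRuns results that were already fetched;
    `format` mirrors the format=pandas suggestion.
    """
    if format == "list":
        return list(runs)
    if format == "pandas":
        import pandas as pd  # lazy import keeps pandas an optional dependency
        return pd.DataFrame(runs)
    raise ValueError(f"unsupported format: {format!r}")


rows = [{"run_id": "a1", "rmse": 0.8}, {"run_id": "b2", "rmse": 0.7}]
as_dicts = search_runs(rows)
# search_runs(rows, format="pandas") would return a two-row DataFrame.
```

A single parameter keeps one entry point per call, which avoids the "which API do I use?" confusion raised above.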
PR merged!
Coming from an API-centric background, I believe having a low-level client that faithfully mirrors the actual HTTP calls is crucial. This client is the building block on top of which richer, more domain-specific clients can be built. A low-level client and a language-idiomatic client are just two ends of a continuum, not mutually exclusive, so the boto paradigm seems like a good one. Low-level API clients are easy to create. Furthermore, designing only an opinionated client can be problematic, since other users will have different, unanticipated needs.

I see the MLflow REST API as a foundational core feature on top of which different clients and applications can be built. A high-level API and a multi-language client strategy are important for building out MLflow as a powerful next-generation ML management platform. I could see someone wanting to write a custom UI for hyperparameter optimization on top of the API, or a custom rich client tailored to their specific use cases. This is in line with today's ubiquitous API ecosystem, where businesses expose APIs and customers add value with client applications. A good read on the topic: APIs: A Strategy Guide.

Here's a sample low-level Python MLflow client, mlflow_client/mlflow_api_client.py, that is similar to the Java client API. It is slightly opinionated in that it unrolls nested JSON responses and simplifies the search input signature. I would also add CRUD capabilities to API resources so folks can more effectively manage (update, delete) their experiments and runs. A richer domain model would make it possible to group experiments into "buckets". Say a company wants to group one set of experiments into "self-driving-cars" and another into "self-driving-trucks": a flat experiment namespace won't scale when you have many experiments. An autogenerated API sounds good, but I would shy away from exposing Protobuf concepts.
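The "unrolling" of nested JSON responses mentioned above can be sketched with two small helpers. Both `unwrap` and `flatten` are hypothetical names, and the sample response (a payload wrapped in `run` and `info` keys, with `run_uuid` and `status` fields) is an assumed shape used only for illustration.

```python
from typing import Any, Dict


def unwrap(response: Dict[str, Any], *path: str) -> Any:
    """Follow wrapper keys such as ("run", "info") down to the payload."""
    node: Any = response
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"missing wrapper key {key!r}")
        node = node[key]
    return node


def flatten(data: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Roll nested dicts out to one level: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


# Assumed response shape, for illustration only.
response = {"run": {"info": {"run_uuid": "abc", "status": "RUNNING"}}}
info = unwrap(response, "run", "info")  # {"run_uuid": "abc", "status": "RUNNING"}
```

`unwrap` suits responses with a fixed wrapper path, while `flatten` suits tabular consumers that want one column per nested field.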
Protobuf data concepts have already leaked into the current REST payloads (e.g., unnecessary wrapper fields, such as in the “create run” response, where the data is buried two levels down in “run/info”). Protobuf is an internal implementation detail; the API should expose only RESTful concepts. The current API also uses RPC-like endpoints such as “experiments/list” and “experiments/get” instead of standard HTTP verbs and resource names. A more RESTful API would look like:
You raise a few distinct points. Let me try to summarize them to make sure I understand each.

Python API proposal
The proposed API here looks pretty similar to the one introduced in #299. The key differences I see are:
Let me know if I am missing other major differences between the proposed API and the one in #299.

CRUD-y REST API
You point out that our current REST API does not follow RESTful ideals as well as it could, and you're absolutely right. We originally designed the REST API this way simply because we already had infrastructure and tooling to convert protos into REST API constructs, and that infrastructure cannot support the RESTful ideal. So we were constrained to using GET with query parameters and POST with JSON bodies for input. Although our API should not in principle be constrained by our existing technology and tooling, converting the APIs on all available servers to a more RESTful design would still take significant effort. For this reason, I am still in favor of keeping the straightforward-but-not-completely-RESTful API. We can introduce more RESTful terminology later in a v2 API and continue to support both.
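The constraint described above (inputs as query parameters on GET, or as a JSON body on POST) can be sketched with the standard library. The requests are only built, not sent. `BASE_URL` and the `runs/create` endpoint are assumptions for illustration; `experiments/get` is one of the RPC-style endpoints named in the thread.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:5000/api/2.0/mlflow"  # hypothetical server prefix


def get_request(endpoint: str, params: Dict[str, str]) -> urllib.request.Request:
    """Read-style call: inputs travel as query parameters on a GET."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{BASE_URL}/{endpoint}?{query}", method="GET")


def post_request(endpoint: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Write-style call: inputs travel as a JSON body on a POST."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


read = get_request("experiments/get", {"experiment_id": "0"})
write = post_request("runs/create", {"experiment_id": "0"})
# Either request could be sent with urllib.request.urlopen(...) against a live server.
```

A v2 RESTful API could live under a different prefix alongside this one, which is what makes supporting both at once feasible.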
Currently, MLflow provides three interfaces:
mlflow experiments list, as well as some more powerful integrated workflows like mlflow run.
There is work in progress for adding R and Java/Scala SDK support as well.
We should complete the Python SDK. In particular, there is mixed support for experiment CRUD, runs CRUD, and metrics/artifact CRUD.
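As an illustration of what complete experiment CRUD could look like in the SDK, here is a minimal in-memory sketch. `InMemoryExperimentStore`, its method names, and the soft-delete behavior are all hypothetical design choices, not MLflow's API; a real implementation would sit on top of the REST client rather than a dict.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Experiment:
    experiment_id: str
    name: str
    deleted: bool = False


class InMemoryExperimentStore:
    """Hypothetical store showing a full create/read/update/delete cycle."""

    def __init__(self) -> None:
        self._experiments: Dict[str, Experiment] = {}
        self._ids = itertools.count()  # sequential ids: "0", "1", ...

    def create(self, name: str) -> Experiment:
        exp = Experiment(experiment_id=str(next(self._ids)), name=name)
        self._experiments[exp.experiment_id] = exp
        return exp

    def get(self, experiment_id: str) -> Experiment:
        return self._experiments[experiment_id]  # KeyError if unknown

    def rename(self, experiment_id: str, new_name: str) -> Experiment:
        exp = self.get(experiment_id)
        exp.name = new_name
        return exp

    def delete(self, experiment_id: str) -> None:
        # Soft delete: keep the record but mark it, so it could be restored.
        self.get(experiment_id).deleted = True

    def list_experiments(self, include_deleted: bool = False) -> List[Experiment]:
        return [e for e in self._experiments.values() if include_deleted or not e.deleted]
```

The same four operations would then be repeated for runs and for metrics/artifacts.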