
[WIP] Experiment tracking proposal #195

Closed
wants to merge 7 commits

Conversation


@inc0 inc0 commented Oct 3, 2018

This PR is supposed to start a proper design discussion regarding
experiment tracking. Please feel free to review, comment, or commit new
changes.

Relevant conversations:
kubeflow/kubeflow#264
kubeflow/kubeflow#136



@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: ewilderj

If they are not already assigned, you can assign the PR to them by writing /assign @ewilderj in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jlewi
Contributor

jlewi commented Oct 4, 2018

Thanks for doing this.

Can you provide more information about the alternative solutions?
Do they provide APIs?
What do they use for DBs?
What are the DB schemas?

What does Katib need?
Some background on how we are using ModelDB and what problems we are looking to address?

@inc0
Author

inc0 commented Oct 4, 2018

I'll slowly do research, but I could use help if anyone has experience with any of the alternatives. Also, let me know if you know of any alternatives that are missing. I'll try to dig more into each architecture, but at the end of the day first-hand experience is priceless.

/cc @holdenk - Holden, maybe you could help us with MLFlow? Or point us to someone who knows more about this project?

As for Katib, I'm not sure what Katib really needs aside from experiment tracking. @YujiOshima could you help us figure out what requirements Katib has from it and if ModelDB is enough or we need something more?

@YujiOshima

@inc0 @jlewi In Katib, the long-term DB for experiment tracking is completely separate from the Katib DB.
But experiment tracking is strongly needed, since users want to inspect and evaluate the hyperparameters Katib generated with their own eyes.
So Katib's requirements are not so unique:

  • Storing metrics, hyperparameters, dataset paths, model paths, etc.
  • Sorting and filtering models by metrics or hyperparameters.

Both of the above need to be available through a GUI and an API.
Katib uses ModelDB and it is OK for now, but since the project is not active, I want to look for alternatives.

There are many choices (MLFlow, StudioML, ...) and the best one depends on the user.
What I'm trying to do in Katib is abstract the API and make the backend pluggable.
Users or other projects only need to know one API (gRPC). They can use whichever model management tool they want and switch it easily.
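A minimal sketch of that pluggable-backend idea, purely illustrative (the class and method names below are hypothetical, not an existing Katib or Kubeflow API; a real implementation would expose this interface over gRPC):

```python
import abc


class TrackingBackend(abc.ABC):
    """Single tracking interface that Katib (or any client) talks to."""

    @abc.abstractmethod
    def log_model(self, study_id, hyperparams, metrics, model_path, dataset_path):
        """Record one trained model with its hyperparameters and metrics."""

    @abc.abstractmethod
    def list_models(self, study_id, sort_by=None, filters=None):
        """Return models for a study, optionally sorted/filtered by metric."""


class ModelDBBackend(TrackingBackend):
    """One possible backend; MLFlow or StudioML could be swapped in behind
    the same interface without clients noticing."""

    def log_model(self, study_id, hyperparams, metrics, model_path, dataset_path):
        raise NotImplementedError("translate to ModelDB client calls here")

    def list_models(self, study_id, sort_by=None, filters=None):
        raise NotImplementedError
```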

@YujiOshima

@inc0 In this proposal, do you plan to make a new tool for model management / experiment tracking?
If so, I'm very interested and happy to contribute.
But is a GUI included? The GUI is very important.

@johnugeorge
Member

@YujiOshima Why do we need to make a new tool? Instead, isn't it better to improve existing tools like ModelDB or others?

@YujiOshima

@johnugeorge If an existing tool is enough, I agree.
I understand the pain of developing a new tool.
In my opinion, we should define the API according to our requirements and use an existing tool as a backend, which is why I wrote the comment above.

@johnugeorge
Member

@YujiOshima I agree. I feel that we have to first list down missing features/requirements in the current tools and then take a call on whether to support the existing ones or implement a new tool.

@inc0
Author

inc0 commented Oct 5, 2018

> @YujiOshima I agree. I feel that we have to first list down missing features/requirements in the current tools and then take a call on whether to support the existing ones or implement a new tool.

That's what I tried to do in this PR. I have issues with ModelDB being based on Mongo, which isn't the easiest thing to maintain. Also, needing the same information (the list of experiments) in two separate databases (Katib and ModelDB) is very problematic. We should create something with an API and allow Katib to use it as the source of truth.


* Tensorboard - Wasn't meant for large number of models. It's better for very detailed examination of smaller number of models. Uses tf.Event files
* MLFlow - One of big cons for this is using files as storage for models. That would require something like dask or spark to query them efficiently. Can store files in multiple backends (S3 and GCS among other things)
* ModelDB - Requires mongodb which is problematic


Not only Mongo but also SQLite. And it resets the SQLite DB at the beginning of a process.
We can't persist data without modification.

Author

Right, so it's no good for persistent experiment tracking, which we're after

Member

How about first design a better modelDB equivalent and then use that for tracking experiments? I would recommend we keep each of these very independent for now. So that Kubeflow components/apps can integrate with a wide variety of tools e.g. TFX, katib, autoML.

Author

The reason I think our current model is flawed is that we have two sources of truth. Katib uses SQL; ModelDB uses MongoDB or SQLite. Every time you want it, ModelDB will sync data from Katib's DB. That means if you do a sync with tens of thousands of models, it's going to lock the whole system. I think we should build a single source of truth for where models are and how they performed, and Katib should use it. This would negate the need for Katib's database altogether and, therefore, make it much easier to handle. In another issue we've discussed Katib as a model management tool, but we've decided that Katib's scope is hyperparameter tuning, and model management is something different (although required).


Hi team, I'm the PM on the MLflow team at Databricks. Some of the engineers will chime in here too. Adding a database-backed tracking store to the tracking server is on our roadmap, and there is already a pluggable API!

## Alternatives

* Tensorboard - Wasn't meant for large number of models. It's better for very detailed examination of smaller number of models. Uses tf.Event files
* MLFlow - One of big cons for this is using files as storage for models. That would require something like dask or spark to query them efficiently. Can store files in multiple backends (S3 and GCS among other things)


Does it really need Dask or Spark? It has a REST API.

Author

Well, if you try to query 50,000 records from one file (and by query I mean "highest value of X"), it's going to require something more...


MLFlow does store files on disk, though. It would save some time if folks looked at forking it and then integrating a database to store the tracking information.

@YujiOshima

TensorBoard is very useful, but it is not suitable for general experiment tracking.
It is difficult to manage a huge number of models with it, and it is very specialized to TF (or ONNX). Experiment tracking is not only for DL models but for all ML experiments.
Ideally, TensorBoard would be linked from the experiment tracking UI (MLFlow or StudioML).

@inc0
Author

inc0 commented Oct 10, 2018

That's the idea @YujiOshima :) I was thinking of something like a "spawn Tensorboard from these 3 models" button.

@inc0
Author

inc0 commented Oct 10, 2018

Also Tensorboard will have support for PyTorch, which is super cool. We would still need something for scikit-learn but it's getting better!

Member

@ddutta ddutta left a comment


I think we should have clear requirements for an independent model tracking (modelDB equivalent) and experiment tracking that can be then leveraged by katib, autoML, pytorch, TFX integration. Then define the API. We are also very interested in contributing to model management but would like to take it slowly - get a straw man working (like @jlewi mentioned), validate the requirements to ensure it works well with different tools. Else we will have to do a lot more work down the road. Could we please form a small sub team to do this?


@zak-hassan

zak-hassan commented Oct 10, 2018

@jlewi MLflow does provide a Python API that can be used from a Jupyter notebook; later, when data scientists want to track a parameter or a metric, they can view it on a dashboard. Another thing in terms of design is that you can compare multiple runs side by side. It also has a REST API. And when it comes to model deployments, it integrates with SageMaker, Azure ML, and regular model serving.
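For reference, a minimal sketch of what that notebook usage looks like with the MLflow tracking API (the tracking server address, experiment name, parameter/metric names and the training function are placeholders for illustration):

```python
import mlflow

# Point the client at a tracking server if one is running;
# otherwise runs are written to the local ./mlruns directory.
# mlflow.set_tracking_uri("http://mlflow-server:5000")  # hypothetical address
mlflow.set_experiment("cat-detector")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 64)

    accuracy = train_and_evaluate()  # placeholder for the user's training code
    mlflow.log_metric("accuracy", accuracy)
```

The runs then show up in the MLflow UI, where multiple runs can be compared side by side, and the same data is reachable over the REST API.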

@inc0
Author

inc0 commented Oct 10, 2018

@zmhassan With MLFlow, let's hypothetically assume we have 50,000 models for detecting cats. Is there an easy way to select the model with the highest accuracy and spawn Seldon (or tf-serving) from it? A quick look at the API doesn't suggest MLFlow has any form of querying. I also don't see a whole lot of model provenance out there, but that probably could be implemented. Also, how easy would it be to integrate it with TFJob? As in: start a TFJob from this run, retrain model X, etc. Another thing is integration with Tensorboard. MLFlow seems to be an alternative to Tensorboard, and Tensorboard, for what it is (a UI for examining model performance), is excellent (imho). Any chance we could keep using it?

Zak Hassan and others added 2 commits October 10, 2018 16:41
Adding more detail around experiment tracking.
Proposal to experiment tracking feature
@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@jlewi
Contributor

jlewi commented Oct 11, 2018

The Katib page has a screencast illustrating ModelDB.
https://user-images.githubusercontent.com/10014831/38241910-64fb0646-376e-11e8-8b98-c26e577f3935.gif

As a strawman, this looks like it has most of the features we want:

  • API to report metrics
  • A UI to browse models
  • Some built in visualizations
  • Ability to launch TensorBoard

I realize there are concerns about MongoDB but for a strawman running it in a container with a PVC seems fine.

Have folks checked out the demo:
http://modeldb.csail.mit.edu:3000/projects/6/models

It's pretty slick and it looks like it provides most of what we want.

I'd be thrilled if we managed to get that working and part of the 0.4 release.

My conjecture is that if we get some more first hand experience with ModelDB we'll be in a better position to figure out where to go from here.

@ddysher
Member

ddysher commented Oct 11, 2018

Are we going to track all experiments? Last time I talked to data scientists, it wasn't very useful for them to track all experiments. It's much like software debugging, where we change code and experiment without using git commit; we check in code only when we feel comfortable with our change.

ModelDB wraps existing libraries (sklearn, spark.ml) to sync model data, i.e. users are required to use the sync versions instead of the stock methods, which I think can be fragile to library changes and requires extra work to support more frameworks. Also, IIRC, syncing model data takes considerable time.

This document is a design proposal for a new service within Kubeflow: experiment tracking. The need for a tool like this was
expressed in multiple issues and discussions.

## What is experiment tracking


I think we should focus on experiment tracking. This is different from monitoring your production models, like gathering metrics about model drift or accuracy in the production env.

* I'm a data scientist working on a problem. I'm looking for an easy way to compare multiple training jobs with multiple sets of hyperparameters. I would like to be able to select the top 5 jobs measured by P1 score and examine which model architecture, hyperparameters, dataset and initial state contributed to this score. I would want to compare these 5 together in a highly detailed way (for example via Tensorboard). I would like a rich UI to navigate models without needing to interact with the infrastructure.
* I'm part of a big ML team in a company. Our whole team works on a single problem (for example search) and every person builds their own models. I'd like to be able to compare my models with others'. I want to be sure that nobody will accidentally delete the model I'm working on.
* I'm a cloud operator in an ML team. I would like to take the current production model (architecture + hyperparams + training state) and retrain it with new data as it becomes available. I would want to run a suite of tests and determine whether the new model performs better. If it does, I'd like to spawn a tf-serving (or Seldon) cluster and perform a rolling upgrade to the new model.
* I'm part of a highly sophisticated ML team. I'd like to automate retraining->testing->rollout for models so they can be upgraded nightly without supervision.


This e.g. is not part of experiment tracking imho. It's about model management and model monitoring.
Is there a good/common term to describe this operational side of models?
Model management, model operations?

Author

Model management is an alternative term for experiment tracking, I think. At least I've understood it as such. As for functionality, because we'll make it k8s-native, the cost of adding this feature will be so low that I think we should do it just for the users' benefit. Ongoing monitoring of models isn't in scope, but as long as the monitoring agent saves observed metrics (say, average accuracy over the last X days) back to this service, you can still benefit from it.

@durandom durandom Oct 12, 2018

I think the notions of "model management" and "experiment tracking" are slightly different: "management" has a production connotation and "experiment" has a devel connotation. Did @jlewi in this comment thread get to a common definition? This mlflow issue also has a discussion around the use cases of the various tools. And a Google search for "experiment tracking" ai ml vs "model management" ai ml gives 500 vs 75k results.
Please dont get me wrong. I'm all for having a solution for this, because I think too this is a missing component of kubeflow.
I'd just limit the scope to the devel side of the house and let pachyderm and seldon focus on the production side.

Author

So one clarification - when I'm saying, for example, model rollout, what I mean is a single call to k8s to spawn a Seldon cluster. Actual serving, monitoring, etc. are beyond scope, I agree, but I think it'd be a nice touch to allow a one-click mechanism. For Pachyderm integration, look below: I actually wanted to keep the pipeline uuid in the database. If someone uses Pachyderm, we'll integrate with it and allow quick navigation, for example a one-click link to the relevant Pachyderm UI.

* Feature engineering pipeline used
* Katib study id
* Model architecture (code used)
* Hyperparameters


If the code to create the model is in a VCS, e.g. git, it should also track the version of the code used to create the model

Author

Agreed, that's what I meant by "model architecture". But a good idea would be to make it point to:

  • code (including commit id)
  • docker image

For selected models we should be able to set up model introspection tools, like Tensorboard.
Tensorboard provides good utility, allows comparison of a few models, and it was recently announced that it will integrate with PyTorch. I think it's reasonable to use Tensorboard
for this problem and allow easy spawning of a Tensorboard instance for selected models. We might need to find an alternative for scikit-learn. Perhaps we can try mlflow for scikit-learn.


There is also http://tensorboardx.readthedocs.io which can create Tensorboard event files from any Python code.
We've started working with mlflow because it has a nice web UI and is easy to use with its Python framework.
I don't know if Tensorboard with tensorboardX has some benefits though.

Author

I think it does! You could use it to add Tensorboard to scikit-learn (just log accuracy on every batch of training).
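A minimal sketch of that idea, assuming tensorboardX and a scikit-learn model trained incrementally (the dataset and log directory below are placeholders):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from tensorboardX import SummaryWriter

X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)  # placeholder data
classes = np.unique(y)

writer = SummaryWriter("runs/sklearn-demo")  # placeholder log directory
clf = SGDClassifier()

batch_size = 100
for step in range(0, len(X), batch_size):
    X_batch, y_batch = X[step:step + batch_size], y[step:step + batch_size]
    clf.partial_fit(X_batch, y_batch, classes=classes)
    # Log accuracy after each batch so Tensorboard can plot the training curve.
    writer.add_scalar("accuracy", clf.score(X, y), step)

writer.close()
# `tensorboard --logdir runs` then shows the scikit-learn run like any TF run.
```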


MLflow and Tensorboard and tensorboardX are complementary. I have worked with some MLflow users who use them together. Check out this example in the MLflow repository of using MLflow with PyTorch and tensorboardX: https://github.com/mlflow/mlflow/blob/master/examples/pytorch/mnist_tensorboard_artifact.py

From the doc at the top of that code example:

> Trains an MNIST digit recognizer using PyTorch, and uses tensorboardX to log training metrics and weights in TensorBoard event format to the MLflow run's artifact directory. This stores the TensorBoard events in MLflow for later access using the TensorBoard command line tool.
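Condensed to its essence, that pattern looks roughly like the sketch below (the metric name, event directory and training loop are placeholders, not the actual example code):

```python
import mlflow
from tensorboardX import SummaryWriter

event_dir = "/tmp/tb_events"          # placeholder local directory for event files
writer = SummaryWriter(event_dir)

with mlflow.start_run():
    for step in range(100):           # placeholder training loop
        loss = 1.0 / (step + 1)       # placeholder metric value
        writer.add_scalar("loss", loss, step)   # TensorBoard event file
        mlflow.log_metric("loss", loss)         # queryable MLflow metric
    writer.close()
    # Ship the TensorBoard event files to the run's artifact store (S3/GCS/...),
    # so `tensorboard --logdir <downloaded artifacts>` works later.
    mlflow.log_artifacts(event_dir, artifact_path="events")
```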

* I'm part of the ML team in a company. I would like to be able to track the parameters used to train an ML job and the metrics produced. I would like to have isolation so I can find the experiments I worked on. I would like the ability to compare multiple training runs side by side and pick the best model. Once I select the best model, I would like to deploy it to production. I may also want to test my model with historical data to see how well it performs, and maybe roll out my experiment to a subset of users before fully rolling it out to all users.


## Scale considerations


The straightforward approach would be to use Kubernetes jobs for this. Let Kubernetes handle the orchestration and GC. Each job would be configured with env variables.

Author

We need more than just a number of replicas. It's an important thing to consider when selecting the underlying database.


Oh, I'm thinking an experiment would be a Kubernetes primitive, like a job - no replicas involved. The job would be scheduled by Kubernetes. So if you run 100 or 1000 experiments, you just create them and let Kubernetes handle the scaling, i.e. the scheduling.

Author

I'm not sure if that's what you mean, but we've discussed using CRDs as experiments and decided against it. The sheer number of experiments involved and the lack of querying are a problem; we still need a database somewhere. As for running the actual experiments, yes, they will be TFJobs, so regular pods.

## Alternatives

* Tensorboard - Wasn't meant for large number of models. It's better for very detailed examination of smaller number of models. Uses tf.Event files
* MLFlow - One of the cons for this is that experiment metadata is stored on disk. In kubernetes may require persistent volumes. A better approach would be to store metadata in a database. Pros, can store models in multiple backends (S3, Azure Cloud Storage and GCS among other things)
@durandom durandom Oct 11, 2018

I would not care whether mlflow stores it in a db or in a file. If we use mlflow, we should use their REST API as the interface and let them handle persistence. And for a DB you'd also need a PV, so 🤷‍♂️

We have started a repo to make mlflow run on openshift: https://github.com/AICoE/experiment-tracking

Author

A managed database (like Bigtable) doesn't need a PV (or at least you don't care about it). The biggest issue with the MLFlow API (which is directly tied to file storage) is the lack of querying. Currently (unless I'm mistaken) there is no good way in MLFlow to ask for the model with the highest accuracy. It could be implemented, but then the comparisons would be done in Python, so not super scalable.


> A managed database (like Bigtable)

Wouldn't this introduce a dependency that Kubeflow wants to avoid?

> The biggest issue with the MLFlow API (which is directly tied to file storage) is the lack of querying

Actually there is a REST API for that. But I haven't used it and I'm not sure how well it scales.

Author

In #188 I've noted it briefly, with some consequences and how to make it manageable for operators (imho).

As for the search string, it's really not much. I still can't see an option for "get me the best model" without using Spark/Dask.

@inc0
Author

inc0 commented Oct 11, 2018

> Are we going to track all experiments? Last time I talked to data scientists, it wasn't very useful for them to track all experiments. It's much like software debugging, where we change code and experiment without using git commit; we check in code only when we feel comfortable with our change.

I think we can track all experiments; that's why we've been putting emphasis on scale. I expect the great majority of models to never see production. I'd say: commit to branch -> train -> save to exp tracking -> look it up in Tensorboard, compare with the current best model, etc. -> decide what to do: move to prod or ignore forever.

@zak-hassan

ML Hyperparameter Tuning:

I think we should make the proposal more focused on the use case here. The purpose of tracking experiments is that we want to perform hyperparameter tuning.

Unrelated items in proposal:

  • imho, we don't need to include model provenance in this proposal, as it is already handled by Pachyderm. Correct me if I'm wrong?
  • We should have a conversation about rolling this out, but I don't think this proposal is the right place. Anything like ML DevOps qa -> staging -> prod rollout is another conversation too.

Conclusion

I think if we keep it simple we can get something working (an MVP). The main reason I suggested MLFlow is that it supports multiple ML frameworks, not just TensorFlow. To get wider adoption I suggest we cast our net wider and allow more ML frameworks to be used.

@karlmutch

Some thoughts,

At Sentient, with StudioML, our focus has been on enabling our evolutionary NN experiments. As a previous contributor has mentioned, workflows concentrating on single experiments (or individuals) have scale friction. That, however, should not pose an issue if all metadata placed by experiments on traditional storage is machine readable. In some instances we are specifically interested in looking at individual experiments, for example during development or when beginning work in a specific domain. In other situations we can ingest experiment data into our own tooling and investigate at much higher levels of granularity.

Coupling this with idempotent data and a small set of rules about incremental results enables the storage to become the system of record, rather than a DB. It also helps with the who-owns-what issue by deferring that to the storage platform.

Defining an entity model, relationships and forcing an architecture might reduce deployment freedom. One alternative, for example, that we have chosen is to concentrate on a portable data format as the primary means of defining our interfaces. Each technology we then use for UI, reporting, project authoring, etc. aligns around the formats. For our own purposes we are using JSON and S3 artifacts to act as our system of record. Queries on the S3 platform, for example, can be done using DSLs from cloud providers, or tooling can be used to migrate metadata to other DBs, etc. This defers technology compromises to the consumer. For us that means we can avoid the Mongo loose-schema and lost-data problems, or going the Spark big-data route. This also avoids the issue of who is the responsible party for the cost of non-artifact storage and queries: the user pays rather than the experimenter.

Anyway, that's my 2 cents.

@jlewi
Contributor

jlewi commented Oct 30, 2018

Thanks @karlmutch

@zmhassan

> I think we should make the proposal more focused on the use case here. The purpose of tracking experiments is that we want to perform hyperparameter tuning.

I don't think we want to limit ourselves to hyperparameter tuning. I think a very common use case is just training a model in a notebook, saving that model somewhere, and wanting a simple way to track/browse all such models.

@zak-hassan

@jlewi Good point. Definitely good to capture all use cases. We experimented with mlflow: you can import the Python library into a Jupyter notebook and track parameters/metrics.

@karlmutch Definitely interested in reading up on studioml.

@inc0
Author

inc0 commented Oct 30, 2018

> For our own purposes we are using JSON and S3 artifacts to act as our system of record.

One issue with this approach (unless it's coupled with a database, which is what I propose) is that it would be extremely hard to find a model based on performance. I'm talking about queries like "show me the top 5 models with the highest P1 score for experiment cat-or-dog". Can you deal with this kind of query too?

@karlmutch

karlmutch commented Oct 30, 2018

> For our own purposes we are using JSON and S3 artifacts to act as our system of record.

> @inc0 One issue with this approach (unless it's coupled with a database, which is what I propose) is that it would be extremely hard to find a model based on performance. I'm talking about queries like "show me the top 5 models with the highest P1 score for experiment cat-or-dog". Can you deal with this kind of query too?

In cases where experimenters are using cloud-based storage, the cloud vendors typically offer query engines that support this use case. AWS Athena and, from memory, Google Cloud Datastore will both do this; I'm not sure about Azure, as we use our own tech on their stack.

In the case where a tool like Minio is being used, or our customers/experimenters wish to use their own query engine, the JSON data is ingested into a DB. Things remain coherent between the store and the DB because we follow idempotency and simple rules around experiments that are in progress when the ingest occurs. From the Studio perspective, however, the choices in this area are not mandated by the experimentation framework.
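To make the cloud-query-engine route concrete, here is a rough sketch of such a query over JSON experiment metadata in S3 using Athena via boto3 (the bucket, database, table and column names are all hypothetical, and assume an Athena table has already been defined over the JSON layout):

```python
import boto3

athena = boto3.client("athena")

# Athena runs SQL directly over the JSON metadata files in S3, so
# "top 5 models by P1 score" becomes an ordinary ORDER BY / LIMIT query.
query = """
    SELECT model_path, metrics.p1 AS p1_score
    FROM experiments_metadata          -- hypothetical table over s3://experiments/...
    WHERE experiment = 'cat-or-dog'
    ORDER BY metrics.p1 DESC
    LIMIT 5
"""

resp = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "experiments"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://experiments/athena-results/"},
)
# A real client would poll for query completion before fetching results.
results = athena.get_query_results(QueryExecutionId=resp["QueryExecutionId"])
```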

@andyk

andyk commented Nov 5, 2018

This is an awesome discussion. I'm a PM at Databricks, I work full time on MLflow, and I'm a huge fan of KubeFlow; it's been on our roadmap to engage about exactly this topic. In addition to what @zmhassan has been sharing, if there is any way we can be helpful, I know @mateiz @mparkhe @pogil @aarondav and others in the MLflow community would be excited to help answer any questions, discuss architecture, etc. In particular, I think @mparkhe -- an MLflow core dev -- has been thinking about this and has some more details.

@mparkhe

mparkhe commented Nov 5, 2018

As @andyk mentioned above, the MLflow team has been thinking about various engagement efforts and how we can support them within the MLflow architecture. We'd be excited to have MLflow be a component of the Kubeflow ecosystem. To that end, we would love to understand what would be required to make this happen.

Here are some details about specific projects on our roadmap that should help answer questions about MLflow.

Scalable backend store for Tracking Server data

The current storage layer for experiments, runs and other data is files. However, we are planning to add support for other, scalable storage. The query layer will be built using a generic layer like SQLAlchemy that supports most relational databases. There have been some requests for support of KV stores; however, the search query pattern supported by the MLflow APIs is best suited to relational databases out of the box, without the need for additional modeling of tables for each different type of key-value store.

As a side note, MLflow supports the ability to store large file artifacts like models, training and test data, images, plots, etc. in several implementations of artifact stores: AWS S3, Azure Blob Store, GCP, etc. The query pattern for these APIs is to access the actual artifact and is purely dependent on these object store implementations.

APIs and Interfaces

One of the design principles for all 3 MLflow components was API-first. We designed these interfaces first, and then various implementations to plug into them.

The current open source has implementations for FileStore and RestStore, which implement these interfaces over the storage layer. Even with the FileStore implementation, most of the APIs are efficient, since they index into a specific experiment/run folder. With many experiments and runs, the search API can get slow, since FileStore has to access the underlying data and the search functionality is then implemented in Python.

With the above-mentioned change to have a SQLAlchemy layer, these queries would be rewritten to be pushed down to the appropriate backend query engine. This would make it possible to plug in any database backend that can be queried through SQLAlchemy, and we expect this solution to scale. For production use cases, a relational DB like MySQL would work. We have stress-tested the MLflow API pointed at a MySQL backend (local machine and RDS) with realistic production workloads (thousands of experiments, ~100K runs, millions of metrics, etc.) and found it to be performant. For instance, an indexed MySQL table for metrics returned the desired results in single-digit milliseconds. We believe that such an implementation would be suitable for most production workloads.
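As an illustration of what "pushed down to the backend" means, here is a minimal SQLAlchemy sketch of an indexed metrics table and a pushed-down query (the table layout and connection string are assumptions for this example, not MLflow's actual schema):

```python
from sqlalchemy import Column, Float, Index, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Metric(Base):
    __tablename__ = "metrics"
    id = Column(Integer, primary_key=True)
    run_uuid = Column(String(32), nullable=False)
    key = Column(String(250), nullable=False)
    value = Column(Float, nullable=False)
    # Composite index so per-metric lookups stay in the millisecond range.
    __table_args__ = (Index("idx_metrics_key_value", "key", "value"),)


engine = create_engine("mysql+pymysql://user:pass@mysql-host/mlflow")  # hypothetical DSN
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# "Best run by accuracy" becomes an ORDER BY executed by MySQL, not a Python loop.
best = (
    session.query(Metric)
    .filter(Metric.key == "accuracy")
    .order_by(Metric.value.desc())
    .first()
)
```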

MLflow UI

The current UI implementation supports almost all APIs, including search and viewing artifacts. We are working on releasing the next version with more feature coverage, like CRUD operations on MLflow entities (e.g. deleting and renaming experiments) and easy visibility into multi-step workflows. Many UI components, like the graphical view for run metrics, have been contributed by non-Databricks community members. The MLflow team at Databricks will continue to support and add more functionality here.

We look forward to contributions to MLflow and also to collaborating with the Kubeflow ecosystem towards this common goal.

@zak-hassan

zak-hassan commented Nov 5, 2018

+1

I've been experimenting with porting mlflow to run on Kubernetes with CRDs and operators. Once that is done and set up to work with ksonnet, it will be a simple ksonnet installation.

@jlewi
Contributor

jlewi commented Nov 5, 2018

Thanks @andyk @mparkhe, that sounds great; it would be great to explore possible collaborations.

Perhaps someone could give an MLFlow demo at one of our community meetings.

@mparkhe

mparkhe commented Nov 6, 2018

> Thanks @andyk @mparkhe, that sounds great; it would be great to explore possible collaborations.
>
> Perhaps someone could give an MLFlow demo at one of our community meetings.

Hi @jlewi: That sounds fantastic. We would love to get on the schedule.

cc: @mateiz, @pogil

@mpvartak

mpvartak commented Nov 6, 2018

Hey all, this is Manasi from ModelDB. It's pretty clear that multiple groups are working towards the same goal of a general-purpose model management system. Can the KubeFlow community think about defining a generic interface for model management? (or requirements for a model management system to be compatible with KubeFlow?)

That way there can be multiple implementations of KubeFlow-compatible model management systems and users can pick the one that works best for them. I imagine this would be similar to having multiple model serving implementations for KubeFlow.

@jlewi
Contributor

jlewi commented Jul 19, 2019

Closing this because it's stale.

@jlewi jlewi closed this Jul 19, 2019
@BioGeek

BioGeek commented Aug 25, 2020

What is the current status of this proposal/of experiment tracking in Kubeflow? Is there another proposal which supersedes this one or is progress tracked somewhere else?

woop pushed a commit to woop/community that referenced this pull request Nov 16, 2020
Provide VERSION variable with default value for example script