
[WIP] Experiment tracking proposal #195

Closed
wants to merge 7 commits

Conversation


@inc0 inc0 commented Oct 3, 2018

This PR is supposed to start a proper design discussion regarding
experiment tracking. Please feel free to review, comment, or commit new
changes.

Relevant conversations:
kubeflow/kubeflow#264
kubeflow/kubeflow#136



@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: ewilderj

If they are not already assigned, you can assign the PR to them by writing /assign @ewilderj in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jlewi
Contributor

jlewi commented Oct 4, 2018

Thanks for doing this.

Can you provide more information about the alternative solutions?
Do they provide APIs?
What do they use for DBs?
What are the DB schemas?

What does Katib need?
Some background on how we are using ModelDB and what problems we are looking to address?

@inc0
Author

inc0 commented Oct 4, 2018

I'll slowly do research, but I could use help if anyone has experience with any of the alternatives. Also, let me know if you know of any alternatives that are missing. I'll try to dig more into each architecture, but at the end of the day first-hand experience is priceless.

/cc @holdenk - Holden, maybe you could help us with MLFlow? Or point us to someone who knows more about this project?

As for Katib, I'm not sure what Katib really needs aside from experiment tracking. @YujiOshima could you help us figure out what requirements Katib has from it and if ModelDB is enough or we need something more?

@YujiOshima

@inc0 @jlewi In Katib, the long-term DB for experiment tracking is completely separate from the Katib DB.
But experiment tracking is strongly needed, since users want to inspect and evaluate the hyperparameters Katib generated with their own eyes.
So Katib's requirements are not so unique:

  • Storing metrics, hyperparameters, dataset paths, model paths, etc.
  • Sorting and filtering models by metrics or hyperparameters.

Both of the above need to be available through a GUI and an API.
Katib uses ModelDB and it is OK for now, but since the project is not active, I want to look for alternatives.

There are many choices (MLFlow, StudioML, ...) and the best one depends on the user.
What I'm trying to do in Katib is abstract the API and make the backend pluggable.
Users or other projects only need to know one API (gRPC). They can use whichever model management tool they want and switch it easily.
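A minimal sketch of that pluggable-backend idea, purely illustrative (the class and method names below are hypothetical, not an existing Katib or Kubeflow API; a real implementation would expose this interface over gRPC):

```python
import abc


class TrackingBackend(abc.ABC):
    """Single tracking interface that Katib (or any client) talks to."""

    @abc.abstractmethod
    def log_model(self, study_id, hyperparams, metrics, model_path, dataset_path):
        """Record one trained model with its hyperparameters and metrics."""

    @abc.abstractmethod
    def list_models(self, study_id, sort_by=None, filters=None):
        """Return models for a study, optionally sorted/filtered by metric."""


class ModelDBBackend(TrackingBackend):
    """One possible backend; MLFlow or StudioML could be swapped in behind
    the same interface without clients noticing."""

    def log_model(self, study_id, hyperparams, metrics, model_path, dataset_path):
        raise NotImplementedError("translate to ModelDB client calls here")

    def list_models(self, study_id, sort_by=None, filters=None):
        raise NotImplementedError
```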

@YujiOshima

@inc0 In this proposal, do you plan to make a new tool for model management / experiment tracking?
If so, I'm very interested and happy to contribute.
But is a GUI included? The GUI is very important.

@johnugeorge
Member

@YujiOshima Why do we need to make a new tool? Instead, isn't it better to improve existing tools like ModelDB or others?

@YujiOshima

@johnugeorge If an existing tool is enough, I agree.
I understand the pain of developing a new tool.
In my opinion, we should define the API according to our requirements and use an existing tool as a backend, which is why I wrote the comment above.

@johnugeorge
Member

@YujiOshima I agree. I feel that we have to first list down missing features/requirements in the current tools and then take a call on whether to support the existing ones or implement a new tool.

@inc0
Author

inc0 commented Oct 5, 2018

> @YujiOshima I agree. I feel that we have to first list down missing features/requirements in the current tools and then take a call on whether to support the existing ones or implement a new tool.

That's what I tried to do in this PR. I have issues with ModelDB being based on Mongo, which isn't the easiest thing to maintain. Also, needing the same information (the list of experiments) in two separate databases (Katib and ModelDB) is very problematic. We should create something with an API and allow Katib to use it as the source of truth.


* Tensorboard - Wasn't meant for large number of models. It's better for very detailed examination of smaller number of models. Uses tf.Event files
* MLFlow - One of big cons for this is using files as storage for models. That would require something like dask or spark to query them efficiently. Can store files in multiple backends (S3 and GCS among other things)
* ModelDB - Requires mongodb which is problematic


Not only Mongo but also SQLite. And it resets the SQLite DB at the beginning of a process.
We can't persist data without modification.

Author

Right, so it's no good for persistent experiment tracking, which we're after

Member

How about first design a better modelDB equivalent and then use that for tracking experiments? I would recommend we keep each of these very independent for now. So that Kubeflow components/apps can integrate with a wide variety of tools e.g. TFX, katib, autoML.

Author

The reason I think our current model is flawed is that we have two sources of truth. Katib uses SQL; ModelDB uses MongoDB or SQLite. Every time you want it, ModelDB will sync data from Katib's DB. That means if you do a sync with tens of thousands of models, it's going to lock the whole system. I think we should build a single source of truth for where models are and how they performed, and Katib should use it. This would negate the need for Katib's database altogether and, therefore, make it much easier to handle. In another issue we've discussed Katib as a model management tool, but we've decided that Katib's scope is hyperparameter tuning, and model management is something different (although required).


Hi team, I'm the PM on the MLflow team at Databricks. Some of the engineers will chime in here too. Adding a database-backed tracking store to the tracking server is on our roadmap, and there is already a pluggable API!

## Alternatives

* Tensorboard - Wasn't meant for large number of models. It's better for very detailed examination of smaller number of models. Uses tf.Event files
* MLFlow - One of big cons for this is using files as storage for models. That would require something like dask or spark to query them efficiently. Can store files in multiple backends (S3 and GCS among other things)


Does it really need Dask or Spark? It has a REST API.

Author

Well, if you try to query 50,000 records from one file (and by query I mean "highest value of X"), it's going to require something more...


MLFlow does store files on disk, though. It would save some time if folks looked at forking it and then integrating a database to store the tracking information.

@YujiOshima

TensorBoard is very useful, but it is not suitable for general experiment tracking.
It is difficult to manage a huge number of models with it, and it is very specialized to TF (or ONNX). Experiment tracking is not only for DL models but for all ML experiments.
Ideally, TensorBoard would be linked from the experiment tracking UI (MLFlow or StudioML).

@inc0
Author

inc0 commented Oct 10, 2018

That's the idea @YujiOshima :) I was thinking of something like a "spawn Tensorboard from these 3 models" button.

@inc0
Author

inc0 commented Oct 10, 2018

Also Tensorboard will have support for PyTorch, which is super cool. We would still need something for scikit-learn but it's getting better!

Member

@ddutta ddutta left a comment


I think we should have clear requirements for an independent model tracking (modelDB equivalent) and experiment tracking that can be then leveraged by katib, autoML, pytorch, TFX integration. Then define the API. We are also very interested in contributing to model management but would like to take it slowly - get a straw man working (like @jlewi mentioned), validate the requirements to ensure it works well with different tools. Else we will have to do a lot more work down the road. Could we please form a small sub team to do this?


@zak-hassan

zak-hassan commented Oct 10, 2018

@jlewi MLflow does provide a Python API that can be used from a Jupyter notebook; later, when data scientists want to track a parameter or a metric, they can view it on a dashboard. Another thing in terms of design is that you can compare multiple runs side by side. It also has a REST API. And when it comes to model deployments, it integrates with SageMaker, Azure ML, and regular model serving.
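For reference, a minimal sketch of what that notebook usage looks like with the MLflow tracking API (the tracking server address, experiment name, parameter/metric names and the training function are placeholders for illustration):

```python
import mlflow

# Point the client at a tracking server if one is running;
# otherwise runs are written to the local ./mlruns directory.
# mlflow.set_tracking_uri("http://mlflow-server:5000")  # hypothetical address
mlflow.set_experiment("cat-detector")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 64)

    accuracy = train_and_evaluate()  # placeholder for the user's training code
    mlflow.log_metric("accuracy", accuracy)
```

The runs then show up in the MLflow UI, where multiple runs can be compared side by side, and the same data is reachable over the REST API.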

@inc0
Author

inc0 commented Oct 10, 2018

@zmhassan With MLFlow, let's hypothetically assume we have 50,000 models for detecting cats. Is there an easy way to select the model with the highest accuracy and spawn Seldon (or tf-serving) from it? A quick look at the API doesn't suggest MLFlow has any form of querying. I also don't see a whole lot of model provenance out there, but that probably could be implemented. Also, how easy would it be to integrate it with TFJob? As in: start a TFJob from this run, retrain model X, etc. Another thing is integration with Tensorboard. MLFlow seems to be an alternative to Tensorboard, and Tensorboard, for what it is (a UI for examining model performance), is excellent (imho). Any chance we could keep using it?

Zak Hassan and others added 2 commits October 10, 2018 16:41
Adding more detail around experiment tracking.
Proposal to experiment tracking feature
@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@jlewi
Contributor

jlewi commented Oct 11, 2018

The Katib page has a screencast illustrating ModelDB.
https://user-images.githubusercontent.com/10014831/38241910-64fb0646-376e-11e8-8b98-c26e577f3935.gif

As a strawman, this looks like it has most of the features we want:

  • API to report metrics
  • A UI to browse models
  • Some built in visualizations
  • Ability to launch TensorBoard

I realize there are concerns about MongoDB but for a strawman running it in a container with a PVC seems fine.

Have folks checked out the demo:
http://modeldb.csail.mit.edu:3000/projects/6/models

It's pretty slick and it looks like it provides most of what we want.

I'd be thrilled if we managed to get that working and part of the 0.4 release.

My conjecture is that if we get some more first hand experience with ModelDB we'll be in a better position to figure out where to go from here.

@ddysher
Member

ddysher commented Oct 11, 2018

Are we going to track all experiments? Last time I talked to data scientists, it wasn't very useful for them to track all experiments. It's much like software debugging, where we change code and experiment without using git commit; we check in code only when we feel comfortable with our change.

ModelDB wraps existing libraries (sklearn, spark.ml) to sync model data, i.e. users are required to use the sync versions instead of the stock methods, which I think can be fragile to library changes and requires extra work to support more frameworks. Also, IIRC, syncing model data takes considerable time.

This document is a design proposal for a new service within Kubeflow: experiment tracking. The need for a tool like this was
expressed in multiple issues and discussions.

## What is experiment tracking


I think we should focus on experiment tracking. This is different from monitoring your production models, like gathering metrics about model drift or accuracy in the production env.

* I'm a data scientist working on a problem. I'm looking for an easy way to compare multiple training jobs with multiple sets of hyperparameters. I would like to be able to select the top 5 jobs measured by P1 score and examine which model architecture, hyperparameters, dataset and initial state contributed to this score. I would want to compare these 5 together in a highly detailed way (for example via Tensorboard). I would like a rich UI to navigate models without needing to interact with the infrastructure.
* I'm part of a big ML team in a company. Our whole team works on a single problem (for example search) and every person builds their own models. I'd like to be able to compare my models with others'. I want to be sure that nobody will accidentally delete the model I'm working on.
* I'm a cloud operator in an ML team. I would like to take the current production model (architecture + hyperparams + training state) and retrain it with new data as it becomes available. I would want to run a suite of tests and determine whether the new model performs better. If it does, I'd like to spawn a tf-serving (or Seldon) cluster and perform a rolling upgrade to the new model.
* I'm part of a highly sophisticated ML team. I'd like to automate retraining->testing->rollout for models so they can be upgraded nightly without supervision.


This e.g. is not part of experiment tracking imho. It's about model management and model monitoring.
Is there a good/common term to describe this operational side of models?
Model management, model operations?

Author

Model management is an alternative term for experiment tracking, I think. At least I've understood it as such. As for functionality, because we'll make it k8s-native, the cost of adding this feature will be so low that I think we should do it just for the users' benefit. Ongoing monitoring of models isn't in scope, but as long as the monitoring agent saves observed metrics (say, average accuracy over the last X days) back to this service, you can still benefit from it.

@durandom durandom Oct 12, 2018

I think the notions of "model management" and "experiment tracking" are slightly different: "management" has a production connotation and "experiment" has a devel connotation. Did @jlewi in this comment thread get to a common definition? This mlflow issue also has a discussion around the use cases of the various tools. And a Google search for "experiment tracking" ai ml vs "model management" ai ml gives 500 vs 75k results.
Please dont get me wrong. I'm all for having a solution for this, because I think too this is a missing component of kubeflow.
I'd just limit the scope to the devel side of the house and let pachyderm and seldon focus on the production side.

Author

So one clarification - when I'm saying, for example, model rollout, what I mean is a single call to k8s to spawn a Seldon cluster. Actual serving, monitoring, etc. are beyond scope, I agree, but I think it'd be a nice touch to allow a one-click mechanism. For Pachyderm integration, look below: I actually wanted to keep the pipeline uuid in the database. If someone uses Pachyderm, we'll integrate with it and allow quick navigation, for example a one-click link to the relevant Pachyderm UI.

* Feature engineering pipeline used
* Katib study id
* Model architecture (code used)
* Hyperparameters


If the code to create the model is in a VCS, e.g. git, it should also track the version of the code used to create the model

Author

Agreed, that's what I meant by "model architecture". But a good idea would be to make it point to:

  • code (including commit id)
  • docker image

For selected models we should be able to set up model introspection tools, like Tensorboard.
Tensorboard provides good utility, allows comparison of a few models, and it was recently announced that it will integrate with PyTorch. I think it's reasonable to use Tensorboard
for this problem and allow easy spawning of a Tensorboard instance for selected models. We might need to find an alternative for scikit-learn. Perhaps we can try mlflow for scikit-learn.


There is also http://tensorboardx.readthedocs.io which can create Tensorboard event files from any Python code.
We've started working with mlflow because it has a nice web UI and is easy to use with its Python framework.
I don't know if Tensorboard with tensorboardX has some benefits though.

Author

I think it does! You could use it to add Tensorboard to scikit-learn (just log accuracy on every batch of training).
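A minimal sketch of that idea, assuming tensorboardX and a scikit-learn model trained incrementally (the dataset and log directory below are placeholders):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from tensorboardX import SummaryWriter

X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)  # placeholder data
classes = np.unique(y)

writer = SummaryWriter("runs/sklearn-demo")  # placeholder log directory
clf = SGDClassifier()

batch_size = 100
for step in range(0, len(X), batch_size):
    X_batch, y_batch = X[step:step + batch_size], y[step:step + batch_size]
    clf.partial_fit(X_batch, y_batch, classes=classes)
    # Log accuracy after each batch so Tensorboard can plot the training curve.
    writer.add_scalar("accuracy", clf.score(X, y), step)

writer.close()
# `tensorboard --logdir runs` then shows the scikit-learn run like any TF run.
```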


MLflow and Tensorboard and tensorboardX are complementary. I have worked with some MLflow users who use them together. Check out this example in the MLflow repository of using MLflow with PyTorch and tensorboardX: https://github.com/mlflow/mlflow/blob/master/examples/pytorch/mnist_tensorboard_artifact.py

From the doc at the top of that code example:

> Trains an MNIST digit recognizer using PyTorch, and uses tensorboardX to log training metrics and weights in TensorBoard event format to the MLflow run's artifact directory. This stores the TensorBoard events in MLflow for later access using the TensorBoard command line tool.
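Condensed to its essence, that pattern looks roughly like the sketch below (the metric name, event directory and training loop are placeholders, not the actual example code):

```python
import mlflow
from tensorboardX import SummaryWriter

event_dir = "/tmp/tb_events"          # placeholder local directory for event files
writer = SummaryWriter(event_dir)

with mlflow.start_run():
    for step in range(100):           # placeholder training loop
        loss = 1.0 / (step + 1)       # placeholder metric value
        writer.add_scalar("loss", loss, step)   # TensorBoard event file
        mlflow.log_metric("loss", loss)         # queryable MLflow metric
    writer.close()
    # Ship the TensorBoard event files to the run's artifact store (S3/GCS/...),
    # so `tensorboard --logdir <downloaded artifacts>` works later.
    mlflow.log_artifacts(event_dir, artifact_path="events")
```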

* I'm part of the ML team in a company. I would like to be able to track the parameters used to train an ML job and the metrics produced. I would like to have isolation so I can find the experiments I worked on. I would like the ability to compare multiple training runs side by side and pick the best model. Once I select the best model, I would like to deploy it to production. I may also want to test my model with historical data to see how well it performs, and maybe roll out my experiment to a subset of users before fully rolling it out to all users.


## Scale considerations


The straightforward approach would be to use Kubernetes jobs for this. Let Kubernetes handle the orchestration and GC. Each job would be configured with env variables.

Author

We need more than just a number of replicas. It's an important thing to consider when selecting the underlying database.


Oh, I'm thinking an experiment would be a Kubernetes primitive, like a job - no replicas involved. The job would be scheduled by Kubernetes. So if you run 100 or 1000 experiments, you just create them and let Kubernetes handle the scaling, i.e. the scheduling.

Author

I'm not sure if that's what you mean, but we've discussed using CRDs as experiments and decided against it. The sheer number of experiments involved and the lack of querying are a problem; we still need a database somewhere. As for running the actual experiments, yes, they will be TFJobs, so regular pods.

## Alternatives

* Tensorboard - Wasn't meant for large number of models. It's better for very detailed examination of smaller number of models. Uses tf.Event files
* MLFlow - One of the cons for this is that experiment metadata is stored on disk. In kubernetes may require persistent volumes. A better approach would be to store metadata in a database. Pros, can store models in multiple backends (S3, Azure Cloud Storage and GCS among other things)
@durandom durandom Oct 11, 2018

I would not care whether mlflow stores it in a db or in a file. If we use mlflow, we should use their REST API as the interface and let them handle persistence. And for a DB you'd also need a PV, so 🤷‍♂️

We have started a repo to make mlflow run on openshift: https://github.com/AICoE/experiment-tracking

Author

A managed database (like Bigtable) doesn't need a PV (or at least you don't care about it). The biggest issue with the MLFlow API (which is directly tied to file storage) is the lack of querying. Currently (unless I'm mistaken) there is no good way in MLFlow to ask for the model with the highest accuracy. It could be implemented, but then the comparisons would be done in Python, so not super scalable.


> A managed database (like Bigtable)

Wouldn't this introduce a dependency that Kubeflow wants to avoid?

> The biggest issue with the MLFlow API (which is directly tied to file storage) is the lack of querying

Actually there is a REST API for that. But I haven't used it and I'm not sure how well it scales.

Author

In #188 I've noted it briefly, with some consequences and how to make it manageable for operators (imho).

As for the search string, it's really not much. I still can't see an option for "get me the best model" without using Spark/Dask.

@inc0
Author

inc0 commented Oct 11, 2018

> Are we going to track all experiments? Last time I talked to data scientists, it wasn't very useful for them to track all experiments. It's much like software debugging, where we change code and experiment without using git commit; we check in code only when we feel comfortable with our change.

I think we can track all experiments; that's why we've been putting emphasis on scale. I expect the great majority of models to never see production. I'd say: commit to branch -> train -> save to exp tracking -> look it up in Tensorboard, compare with the current best model, etc. -> decide what to do: move to prod or ignore forever.

@zak-hassan

ML Hyperparameter Tuning:

I think we should make the proposal more focused on the use case here. The purpose of tracking experiments is that we want to perform hyperparameter tuning.

Unrelated items in proposal:

  • imho, we don't need to include model provenance in this proposal, as it is already handled by Pachyderm. Correct me if I'm wrong?
  • We should have a conversation about rolling this out, but I don't think this proposal is the right place. Anything like ML DevOps qa -> staging -> prod rollout is another conversation too.

Conclusion

I think if we keep it simple we can get something working (an MVP). The main reason I suggested MLFlow is that it supports multiple ML frameworks, not just TensorFlow. To get wider adoption I suggest we cast our net wider and allow more ML frameworks to be used.

@karlmutch

Some thoughts,

At Sentient, with StudioML, our focus has been on enabling our evolutionary NN experiments. As a previous contributor has mentioned, workflows concentrating on single experiments (or individuals) have scale friction. That, however, should not pose an issue if all metadata placed by experiments on traditional storage is machine readable. In some instances we are specifically interested in looking at individual experiments, for example during development or when beginning work in a specific domain. In other situations we can ingest experiment data into our own tooling and investigate at much higher levels of granularity.

Coupling this with idempotent data and a small set of rules about incremental results enables the storage to become the system of record, rather than a DB. It also helps with the who-owns-what issue by deferring that to the storage platform.

Defining an entity model, relationships and forcing an architecture might reduce deployment freedom. One alternative, for example, that we have chosen is to concentrate on a portable data format as the primary means of defining our interfaces. Each technology we then use for UI, reporting, project authoring, etc. aligns around the formats. For our own purposes we are using JSON and S3 artifacts to act as our system of record. Queries on the S3 platform, for example, can be done using DSLs from cloud providers, or tooling can be used to migrate metadata to other DBs, etc. This defers technology compromises to the consumer. For us that means we can avoid the Mongo loose-schema and lost-data problems, or going the Spark big-data route. This also avoids the issue of who is the responsible party for the cost of non-artifact storage and queries: the user pays rather than the experimenter.

Anyway, that's my 2 cents.

@jlewi
Contributor

jlewi commented Oct 30, 2018

Thanks @karlmutch

@zmhassan

> I think we should make the proposal more focused on the use case here. The purpose of tracking experiments is that we want to perform hyperparameter tuning.

I don't think we want to limit ourselves to hyperparameter tuning. I think a very common use case is just training a model in a notebook, saving that model somewhere, and wanting a simple way to track/browse all such models.

@zak-hassan

@jlewi Good point. Definitely good to capture all use cases. We experimented with mlflow: you can import the Python library into a Jupyter notebook and track parameters/metrics.

@karlmutch Definitely interested in reading up on studioml.

@inc0
Author

inc0 commented Oct 30, 2018

> For our own purposes we are using JSON and S3 artifacts to act as our system of record.

One issue with this approach (unless it's coupled with a database, which is what I propose) is that it would be extremely hard to find a model based on performance. I'm talking about queries like "show me the top 5 models with the highest P1 score for experiment cat-or-dog". Can you deal with this kind of query too?

@karlmutch

karlmutch commented Oct 30, 2018

> For our own purposes we are using JSON and S3 artifacts to act as our system of record.

> @inc0 One issue with this approach (unless it's coupled with a database, which is what I propose) is that it would be extremely hard to find a model based on performance. I'm talking about queries like "show me the top 5 models with the highest P1 score for experiment cat-or-dog". Can you deal with this kind of query too?

In cases where experimenters are using cloud-based storage, the cloud vendors typically offer query engines that support this use case. AWS Athena and, from memory, Google Cloud Datastore will both do this; I'm not sure about Azure, as we use our own tech on their stack.

In the case where a tool like Minio is being used, or our customers/experimenters wish to use their own query engine, the JSON data is ingested into a DB. Things remain coherent between the store and the DB because we follow idempotency and simple rules around experiments that are in progress when the ingest occurs. From the Studio perspective, however, the choices in this area are not mandated by the experimentation framework.
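To make the cloud-query-engine route concrete, here is a rough sketch of such a query over JSON experiment metadata in S3 using Athena via boto3 (the bucket, database, table and column names are all hypothetical, and assume an Athena table has already been defined over the JSON layout):

```python
import boto3

athena = boto3.client("athena")

# Athena runs SQL directly over the JSON metadata files in S3, so
# "top 5 models by P1 score" becomes an ordinary ORDER BY / LIMIT query.
query = """
    SELECT model_path, metrics.p1 AS p1_score
    FROM experiments_metadata          -- hypothetical table over s3://experiments/...
    WHERE experiment = 'cat-or-dog'
    ORDER BY metrics.p1 DESC
    LIMIT 5
"""

resp = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "experiments"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://experiments/athena-results/"},
)
# A real client would poll for query completion before fetching results.
results = athena.get_query_results(QueryExecutionId=resp["QueryExecutionId"])
```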

@andyk

andyk commented Nov 5, 2018

This is an awesome discussion. I'm a PM at Databricks, I work full time on MLflow, and I'm a huge fan of KubeFlow; it's been on our roadmap to engage about exactly this topic. In addition to what @zmhassan has been sharing, if there is any way we can be helpful, I know @mateiz @mparkhe @pogil @aarondav and others in the MLflow community would be excited to help answer any questions, discuss architecture, etc. In particular, I think @mparkhe -- an MLflow core dev -- has been thinking about this and has some more details.

@mparkhe

mparkhe commented Nov 5, 2018

As @andyk mentioned above, the MLflow team has been thinking about various engagement efforts and how we can support them within the MLflow architecture. We'd be excited to have MLflow be a component of the Kubeflow ecosystem. To that end, we would love to understand what would be required to make this happen.

Here are some details about specific projects on our roadmap that should help answer questions about MLflow.

Scalable backend store for Tracking Server data

The current storage layer for experiments, runs and other data is files. However, we are planning to add support for other, scalable storage. The query layer will be built using a generic layer like SQLAlchemy that supports most relational databases. There have been some requests for support of KV stores; however, the search query pattern supported by the MLflow APIs is best suited to relational databases out of the box, without the need for additional modeling of tables for each different type of key-value store.

As a side note, MLflow supports the ability to store large file artifacts like models, training and test data, images, plots, etc. in several implementations of artifact stores: AWS S3, Azure Blob Store, GCP, etc. The query pattern for these APIs is to access the actual artifact and is purely dependent on these object store implementations.

APIs and Interfaces

One of the design principles for all 3 MLflow components was API-first. We designed these interfaces first, and then various implementations to plug into them.

The current open source has implementations for FileStore and RestStore, which implement these interfaces over the storage layer. Even with the FileStore implementation, most of the APIs are efficient, since they index into a specific experiment/run folder. With many experiments and runs, the search API can get slow, since FileStore has to access the underlying data and the search functionality is then implemented in Python.

With the above-mentioned change to have a SQLAlchemy layer, these queries would be rewritten to be pushed down to the appropriate backend query engine. This would make it possible to plug in any database backend that can be queried through SQLAlchemy, and we expect this solution to scale. For production use cases, a relational DB like MySQL would work. We have stress-tested the MLflow API pointed at a MySQL backend (local machine and RDS) with realistic production workloads (thousands of experiments, ~100K runs, millions of metrics, etc.) and found it to be performant. For instance, an indexed MySQL table for metrics returned the desired results in single-digit milliseconds. We believe that such an implementation would be suitable for most production workloads.
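As an illustration of what "pushed down to the backend" means, here is a minimal SQLAlchemy sketch of an indexed metrics table and a pushed-down query (the table layout and connection string are assumptions for this example, not MLflow's actual schema):

```python
from sqlalchemy import Column, Float, Index, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Metric(Base):
    __tablename__ = "metrics"
    id = Column(Integer, primary_key=True)
    run_uuid = Column(String(32), nullable=False)
    key = Column(String(250), nullable=False)
    value = Column(Float, nullable=False)
    # Composite index so per-metric lookups stay in the millisecond range.
    __table_args__ = (Index("idx_metrics_key_value", "key", "value"),)


engine = create_engine("mysql+pymysql://user:pass@mysql-host/mlflow")  # hypothetical DSN
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# "Best run by accuracy" becomes an ORDER BY executed by MySQL, not a Python loop.
best = (
    session.query(Metric)
    .filter(Metric.key == "accuracy")
    .order_by(Metric.value.desc())
    .first()
)
```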

MLflow UI

The current UI implementation supports almost all APIs, including search and viewing artifacts. We are working on releasing the next version with more feature coverage, like CRUD operations on MLflow entities (e.g. deleting and renaming experiments) and easy visibility into multi-step workflows. Many UI components, like the graphical view for run metrics, have been contributed by non-Databricks community members. The MLflow team at Databricks will continue to support and add more functionality here.

We look forward to contributions to MLflow and also to collaborating with the Kubeflow ecosystem towards this common goal.

@zak-hassan

zak-hassan commented Nov 5, 2018

+1

I've been experimenting with porting mlflow to run on Kubernetes with CRDs and operators. Once that is done and set up to work with ksonnet, it will be a simple ksonnet installation.

@jlewi
Contributor

jlewi commented Nov 5, 2018

Thanks @andyk @mparkhe, that sounds great; it would be great to explore possible collaborations.

Perhaps someone could give an MLFlow demo at one of our community meetings.

@mparkhe

mparkhe commented Nov 6, 2018

> Thanks @andyk @mparkhe, that sounds great; it would be great to explore possible collaborations.
>
> Perhaps someone could give an MLFlow demo at one of our community meetings.

Hi @jlewi: That sounds fantastic. We would love to get on the schedule.

cc: @mateiz, @pogil

@mpvartak

mpvartak commented Nov 6, 2018

Hey all, this is Manasi from ModelDB. It's pretty clear that multiple groups are working towards the same goal of a general-purpose model management system. Can the KubeFlow community think about defining a generic interface for model management? (or requirements for a model management system to be compatible with KubeFlow?)

That way there can be multiple implementations of KubeFlow-compatible model management systems and users can pick the one that works best for them. I imagine this would be similar to having multiple model serving implementations for KubeFlow.

@jlewi
Contributor

jlewi commented Jul 19, 2019

Closing this because it's stale.

@jlewi jlewi closed this Jul 19, 2019
@BioGeek

BioGeek commented Aug 25, 2020

What is the current status of this proposal/of experiment tracking in Kubeflow? Is there another proposal which supersedes this one or is progress tracked somewhere else?

woop pushed a commit to woop/community that referenced this pull request Nov 16, 2020
Provide VERSION variable with default value for example script