MLEM Release blog post #3575

Merged 13 commits on Jun 1, 2022

aguschin
Contributor

No description provided.

gatsby-cloud bot commented May 19, 2022

Gatsby Cloud Build Report

dvc.org

🎉 Your build was successful! See the Deploy preview here.

Build Details

View the build logs here.

🕐 Build time: 1m

Performance

Lighthouse report

| Metric | Score |
| --- | --- |
| Performance | 🔶 60 |
| Accessibility | 💚 98 |
| Best Practices | 🔶 83 |
| SEO | 💚 93 |

🔗 View full report

Comment on lines 217 to 240
ML model registries give your team key capabilities:

- Collect and organize model [versions] from different sources effectively,
preserving their data provenance and lineage information.
- Share metadata including [metrics and plots][mp] to help use and evaluate
models.
- A standard interface to access all your ML artifacts, from early-stage
[experiments] to production-ready models.
- Deploy specific models on different environments (dev, shadow, prod, etc.)
without touching the applications that consume them.
- For security, control who can manage models, and audit their usage trails.

Many of these benefits are built into DVC: Your [modeling process] and
[performance data][mp] become **codified** in Git-based <abbr>DVC
repositories</abbr>, making it possible to reproduce and manage models with
standard Git workflows (along with code). Large model files are stored
separately and efficiently, and can be pushed to [remote storage] -- a scalable
access point for [sharing].

To make a Git-native registry (on top of DVC or not), one option is to use [GTO]
(Git Tag Ops). It tags ML model releases and promotions, and links them to
artifacts in the repo using versioned annotations. This creates abstractions for
your models, which lets you **manage their lifecycle** freely and directly from
Git.
Contributor Author

This part I took from @jorgeorpinel's PR: #3333

jorgeorpinel (Contributor) commented May 19, 2022

If you want to reuse explanations from other places that's fine but rephrase them in your own words (the way you understand it). Blog posts should have a consistent author's voice IMO.

OK to have very small sections (an admonition, a sentence or 2) copy/pasted between blog and docs.

@aguschin
Contributor Author

@jendefig @jorgeorpinel @jurv11 I'd be glad to get some comments. I added my part of the text very quickly and this is WIP, so I'm not sure you need to provide very detailed feedback for this iteration. Does the structure work? Do some examples seem irrelevant? Did I fail to demonstrate any big ideas along the way? Thanks!

Comment on lines 8 to 43
We’re excited to announce the launch of our latest open source offering,
[MLEM](https://mlem.ai)! MLEM is a tool that automatically extracts meta
information like environment and frameworks from models and standardizes that
information into a human-readable format within Git. ML teams can then use the
model information for deployment into downstream production apps and services.
MLEM easily connects to solutions like Heroku to dramatically decrease model
deployment time.
picture: 2022-05-24/mlem-rocket.png
author: aguschin
# commentsUrl: TODO
tags:
- Machine Learning
- Deployment
- Model Registry
- MLOps
---

We built MLEM to address issues that MLOps teams have around managing model
information as they move them from training and development to production and,
ultimately, retirement. MLEM is meant to help teams automate the collection of
information around how the model was trained, what the model is for, and
operational requirements around deployment.

Just like all our [other](https://dvc.org) [tools](https://cml.dev), MLEM uses
your Git service to store model information and connects with CI/CD solutions
for deployment (like Heroku). This Git-based model
([one of our core philosophies](https://iterative.ai/why-iterative/)) aligns
model operations and deployment with software development teams – information
and automation is all based on familiar DevOps tools – so that deploying any
model into production is that much faster.

With MLEM, ML teams get:

- Human-readable information about a model for search and documentation
- One-step automated deployment across any cloud
- Fast model registry setup based on Git
Contributor Author

This part I took from @jurv11's doc

jorgeorpinel (Contributor) commented May 19, 2022

What's the concept behind the image? Dog in rocket "to the moon" looks a bit like some cryptocurrency meme. Maybe it's just my bias but that could be misleading.

jendefig (Contributor) commented May 19, 2022

> What's the concept behind the image?

MLEM takes the different models (on the MLEM rocket ship with the MLEM dog) and delivers them for deployment to the different stars in space. It's too late to change the image.

jendefig (Contributor) commented May 20, 2022

Why has this not been deployed? @julieg18
Never mind. I see it. Why is it at the top and not the bottom?

jorgeorpinel (Contributor) commented May 20, 2022

@aguschin the structure makes sense

  1. Intro: feel free to provide even more background and motivation if you want. You can add a link to jump to TL;DR ("With MLEM, ML teams get") if it gets long.
    + State that MLEM is a Python-specific tool (currently not explicit anywhere).

  2. Model metadata codification: Not sure we should emphasize "human-readable" (really that depends on whether you're familiar with YAML). I think that the key aspect here is the special "magic" (ML framework integrations) to automatically capture all the relevant modeling context.

  3. Run models anywhere: Good catchy phrase. Should we also use buzz word "productionize" though? That would include packaging & distributing, running in batch (ETL), containerize/cloud deploy, or serve directly -- all MLEM features. I don't think "to deploy" captures all of that.

  4. Git-native model registry: This section doesn't really talk about MLEM.
    The core registry features would be provided by GTO right? So should this be about making Git-based model catalogs (link to GTO+DVC use case) deployable? Although that could be redundant with 2....

    Maybe this can be reduced to a single paragraph somewhere and wait for the GTO release post (is there one planned?) to go into details.

@jorgeorpinel
Contributor

> Git-native model registry: This section doesn't really talk about MLEM.

p.s. I think I know what the issue is: we mention Git in the abstract and intro but never explain (in the codification section) that you can version .mlem files with Git, bringing you to GitOps. That context is a missing piece of the puzzle rn.
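
To make the versioning point concrete, here is a minimal sketch of what capturing model context into a Git-friendly text file could look like. It is not MLEM's implementation or API: `describe_model` and `write_metadata` are hypothetical helpers, the JSON output stands in for the `.mlem` file format, and the recorded fields (model class, Python version, package versions) are assumptions chosen for illustration.

```python
import json
import platform
from importlib import metadata


def describe_model(model, packages=()):
    """Collect a small, human-readable description of a model object.

    Hypothetical helper for illustration; not part of MLEM.
    """
    requirements = {}
    for name in packages:
        try:
            requirements[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            requirements[name] = None
    return {
        "model_type": f"{type(model).__module__}.{type(model).__qualname__}",
        "python": platform.python_version(),
        "requirements": requirements,
    }


def write_metadata(model, path, packages=()):
    """Write the description as sorted, indented JSON to keep Git diffs small."""
    info = describe_model(model, packages)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(info, fh, indent=2, sort_keys=True)
        fh.write("\n")
    return info
```

Because the output is plain text with stable key ordering, it can be committed next to the code and reviewed in a pull request like any other change, which is the GitOps angle described above.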

jendefig (Contributor) left a comment

This is looking really good! Added some thoughts/changes/questions/comments

content/blog/2022-05-24-MLEM-release.md (outdated, resolved)
information around how the model was trained, what the model is for, and
operational requirements around deployment.

Just like all our [other](https://dvc.org) [tools](https://cml.dev), MLEM uses
Contributor

Instead of doing these two links, maybe we should send to... OK, never mind. I thought we had a product page at iterative.ai, but it's just a drop-down. cc: @jurv11 @julieg18, we should add this to the website list if we don't have it on there yet. There's the pricing page which shows all the tools, but that's not where we would want to send people in this case.

content/blog/2022-05-24-MLEM-release.md (outdated, resolved)
[gitops]: https://www.gitops.tech/

MLEM is a core building block for a Git-based ML model registry, together with
other Iterative tools, like GTO and DVC.
Contributor

GTO: other than at ODSC East and to those who have found the repo, we haven't really exposed GTO. We probably need more links/explanation/docs/repo pointers here.

Contributor

Also, I'm realizing we need to address that in the image for Twitter....

Contributor Author

Yes, I think we need something. I'm also thinking about a technical page that explains how to set up MLEM + GTO + DVC together.

@aguschin aguschin self-assigned this May 20, 2022

shcheklein (Member) left a comment

Nice work @aguschin !

Some comments:

  1. Intro is very long - the whole screen of text that goes into explanations about Git, etc, etc without first giving me even an idea of what the tool is about. My 2 cents - start simpler with "With MLEM, ML teams get:", then some before / after side by side, then some deployment magic. Explanations can go in the middle. A bit extreme, probably the best format is something in between :)
  2. ... codification - not sure this is the best, codification is still niche, probably better to avoid it, be more explicit or use that + explanation
  3. DVC pipelines - I think if we want to include it - let's do a separate section at the end. Describe storage and pipelines. Otherwise it makes text too complicated, we can't expect people to know DVC, etc, etc
  4. "The main goal of MLEM is to provide you a single tool that enables any kind of model productionization scenarios." Why don't we mention this at the very beginning of the blog post?
  5. Git-native - on the fence here on using it in the title 🤔
  6. What's next - need to put an image, make it more actionable? Star - can be an emoji, etc ... can we make some competition or some viral thingy on Twitter here cc @jendefig ?


With MLEM, ML teams get:

- **Model metadata codification**: Human-readable information about a model for
Member

It looks like codification is only for search and docs, but this undersells it. This meta-information is needed to deploy things in the first place, to build clients faster, etc.? This is the main purpose.

Ideally we can converge this into a single value prop - packaging models to deploy, everything else comes as a benefit on top?

Otherwise we start with some philosophy, then we go into codification ... and only after that into deployment ... and only after that into the model registry. It feels like it should present things the other way around: high-level solution / value prop first, then implementation details. Or at least they should come really close to each other.

I hope it makes sense :) happy to brainstorm on this more if needed ...

Contributor

If "codification" is too niche or technical, maybe speak of the user benefits like "reliable, standard metadata".

Contributor Author

Thank you for your review, @shcheklein!

  1. It seems I can't distribute the first section paragraphs ("We built MLEM to address..." and "Just like all our..." and "Capturing model-specific...") anywhere except for the second section "Model metadata codification". At least in the current form. So I can try to rewrite those and move them to the rest of the document. But after addressing your other comments that may not be needed anymore. Please let me know WDYT.
  2. I think we need to use "codify". It sounds great and explains what MLEM does with metainformation in a single word - that's good for the quicker explanations later. I've provided some description about codification right after the first occurrence of the word. Do you think it's enough?
  3. Removed DVC code examples.
  4. I think this is addressed now.
  5. "Productionize your models with MLEM in a Git-native way" maybe?
  6. I put a picture with a dog asking for the stars for now :)

@jendefig jendefig added the C: blog TEMPORARY Content of /blog label May 20, 2022
@jendefig
Contributor

Is this ready for release @aguschin ?

@aguschin
Contributor Author

Yes, unless @shcheklein or @dmpetrov wants to provide some feedback. If you need this ASAP, I think it's ok to take it as is.

@shcheklein shcheklein added A: docs Area: user documentation (gatsby-theme-iterative) p1-important Active priorities to deal within next sprints labels May 25, 2022

jendefig (Contributor) left a comment

Found some grammar/typos

content/blog/2022-05-24-MLEM-release.md (outdated, resolved)
Docker Image, or export it as some special format (like `.onnx` which is coming
soon).

```shell
Member

@iterative/websites do we have syntax highlighters ready for MLEM?

Contributor

We have a Gatsby Cloud issue that is preventing us from merging #3396. It's already available on other websites.

Contributor Author

@yathomasi, we can't use the CLI highlighter here yet either, right?

@aguschin aguschin merged commit 1579a65 into master Jun 1, 2022
@aguschin aguschin deleted the blog-mlem-release branch June 1, 2022 13:03
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: blog TEMPORARY Content of /blog p1-important Active priorities to deal within next sprints
Projects
No open projects
Status: Done
5 participants