`example-dvc-experiments`: Include CML configuration #83

iesahin · 2021-09-03T04:21:49Z

@casperdcl what do we need to make this repo useful for the CML happy path?

@iesahin @casperdcl as we discussed, can we make it example-experiments, covering basic scenarios with a predefined language (Python) and a predefined framework (let's say TensorFlow for now)? It would have a CML action from the first (?) commit that could be run if needed (and maybe even runs automatically).

Then it'll be a good repo that we can even meaningfully present in Studio?

What do we need to make it substantially useful for CML?

Originally posted by @shcheklein in #79 (comment)


casperdcl · 2021-09-03T12:29:27Z

@shcheklein:

- the CML use case (specifically auto-push checkpoints) is significantly different from any of the DVC use cases (CC @DavidGOrtega)
  - it also requires significant additional config (workflow.yaml to both run and also auto-create more (!) branches & open PRs, creds for cloud compute, creds for remote storage, additional env vars, code written for max num epochs)
- placing the CML example in the same repo as DVC examples is very confusing to users. Switching branches is an unnecessary complexity on top of an already complex example.

I'd strongly suggest the CML case is pushed to a different repo (example-cml-experiments, with separate folders/branches for with & without dvc) rather than more branches in example-dvc-experiments/example-experiments.
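
To make the "additional config" point concrete, here is a minimal sketch of a preflight check that reports which settings a CI job is missing before it starts. The grouping and the variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `REPO_TOKEN`, `MAX_EPOCHS`) are illustrative assumptions, not a list taken from this thread or from the CML docs.

```python
import os
from typing import Mapping

# Hypothetical grouping of settings a CML + DVC workflow might need.
# The names are assumptions chosen for illustration only.
REQUIRED_ENV = {
    "cloud compute and remote storage": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "repository access": ["REPO_TOKEN"],
    "training": ["MAX_EPOCHS"],
}


def missing_config(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return, per group, the variables that are unset or empty."""
    missing = {}
    for group, names in REQUIRED_ENV.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[group] = absent
    return missing
```

Calling `missing_config()` with no arguments inspects the current environment; passing a plain dict makes the check easy to exercise without touching real credentials.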

shcheklein · 2021-09-03T18:20:19Z

> the CML use case (specifically auto-push checkpoints) is significantly different from any of the DVC use cases

how is it different in terms of the project? let's try to scope it here

also, let's scope the "happy-path", get started experience, etc ... what is the purpose of the repo for CML - tutorial, use case, get started? what are things that we'd like to show?

> it also requires significant additional config

this should not be a problem to my mind, additional GH action config is totally fine to have (people won't see it unless you point to it)

> placing the CML example in the same repo as DVC examples is very confusing to users.

agreed, if we talk about CML in general (when DVC is not being used at all). If we talk about DVC+CML - I'm not sure why that would be confusing?

And to clarify, name should be generic here in that case - example-experiments.

> with separate branches

example per branch is bad for a lot of reasons - branches are first class citizens in the DVC workflow and mixing them this way is bad to my mind (think about connecting such a repo to DVC Studio, or running a command like `metrics -a`, etc.)

> rather than more branches in example-dvc-experiments/example-experiments

yep, agreed - I would not do branches. See above. It should be a simple repo like the existing get-started one that covers the happy path across DVC, CML, DVCLive ... to clarify, I also don't think that it will cover everything ... but we should all be optimizing for simplicity and trying hard to find common ground where all tools integrate nicely

casperdcl · 2021-09-03T18:26:40Z

the scope of the CML "config" stuff: I'd put this in brackets (workflow.yaml to both run and also auto-create more (!) branches & open PRs, creds for cloud compute, creds for remote storage, additional env vars, code written for max num epochs)
one that covers happy path across DVC, CML, DVCLive ...: ah, I agree this is a nice thing to have; one example repo that uses best-practice-of-everything. However I think that is a separate issue. I thought we were talking about just CML example repos here (for use in https://cml.dev/doc/X where X is use-cases, user-guide, how-to, tutorial, example, blog, etc.)

shcheklein · 2021-09-03T22:22:24Z

> the scope of the CML "config" stuff:

this scope sounds good to me, that's what we do for the get-started-example, and there is no contradiction so far. I see only benefits in this.

> I thought we were talking about just CML example repos

yes, but this discussion started when we were trying to use the mnist repo (and codify it) for CML, as far as I understand?

Ideally I would then plan a bit - what kind of repositories will you need for CML, which of them will you need to codify, etc.? No doubt there will be a lot of smaller repos (considering that we have GitLab/GitHub/Bitbucket + different clouds + different scenarios like TensorBoard). It's a separate question how you want to build them, which of them to codify, etc. Same with DVCLive - if we want to cover all possible integrations we'll need separate repo(s) to do that.

Here we are talking more about get started experience I think.

Back to my initial question - would it be useful/possible to create an example-experiments repo that will be used in all the docs related to experiments, at least for the happy path and get-started-like material? (Maybe we'll have to do GitLab/Bitbucket versions, and learn how to push to three platforms.)

iesahin · 2021-09-06T06:35:42Z

I'd propose to determine the most common cases (i.e. happy path?) for the related technologies and bundle them in a common repository, and additionally have smaller repositories that may be used as a showcase.

In the CML case, the most common setup seems to be a GitHub configuration with AWS. This can be the default in example-experiments, and we can use other repositories such as example-experiments-gitlab for the rest. These custom repositories can be used for testing and as templates for new user projects.

Codification for the configuration is straightforward. We just need to determine at which stage it's most relevant to configure.

iesahin · 2021-11-09T13:07:45Z

After reviewing this again, I think providing a repository generator (a la example-repos-dev) is more appropriate. We can have a "get-started" script that initializes a repository per the user's needs, after prompting for them.

...
$ Do you want to include CML configuration? (y/N)
y
$ Which Git provider do you want to set up CML for?
1: GitHub
2: GitLab
3: Bitbucket
3
...

Otherwise, it will be difficult to keep up with creating a separate repository for every possible setup. Also, I'm not sure we know the happy path for all kinds of users; some may want a simple repository, others may want bells and whistles.
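
A rough sketch of what such a prompt-driven generator could look like, assuming a plain Python script rather than any existing tool. The menu choices mirror the mock prompt above; `RepoOptions`, `ask_options`, `planned_files`, and the file paths are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Menu choices taken from the mock prompt; everything else is hypothetical.
PROVIDERS = {"1": "github", "2": "gitlab", "3": "bitbucket"}


@dataclass
class RepoOptions:
    include_cml: bool = False
    provider: Optional[str] = None


def ask_options(ask: Callable[[str], str] = input) -> RepoOptions:
    """Collect options interactively; `ask` is injectable so it can be scripted."""
    answer = ask("Do you want to include CML configuration? (y/N) ").strip().lower()
    if answer != "y":
        return RepoOptions()
    menu = "\n".join(f"{key}: {name}" for key, name in PROVIDERS.items())
    choice = ask(f"Which Git provider do you want to set up CML for?\n{menu}\n").strip()
    # Keep asking until the answer is one of the menu keys.
    while choice not in PROVIDERS:
        choice = ask("Please enter 1, 2 or 3: ").strip()
    return RepoOptions(include_cml=True, provider=PROVIDERS[choice])


def planned_files(options: RepoOptions) -> list[str]:
    """List the files a generator might write (placeholder paths, not a real layout)."""
    files = ["README.md", "dvc.yaml", "params.yaml"]
    if options.include_cml:
        files.append(f"ci/{options.provider}/cml.yaml")
    return files
```

Running `planned_files(ask_options())` in a terminal would walk through the prompts and print the resulting file plan. Passing a callable instead of relying on `input` keeps the prompting logic testable without a terminal.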

dberenbaum · 2021-11-10T14:52:47Z

Sounds a lot like creating our own cookiecutter. Having a fork of https://github.com/drivendata/cookiecutter-data-science could be a way to get users started quickly.

DavidGOrtega · 2021-11-10T22:01:56Z

> In the CML case, it seems Github configuration with AWS. This can be default in example-experiments, and we can use other repositories for example-experiments-gitlab, etc. These custom repositories can be used for testing and templating for the new user projects.

I have a repo that is a full example (and also an integration tester) of DVC-CML for GL, BB and GH.
It mirrors every change to the other vendors.
I'm giving it the final touches and will then hand it over to Iterative.

iesahin · 2021-11-15T16:36:54Z

> Sounds a lot like creating our own cookiecutter. Having a fork of https://github.com/drivendata/cookiecutter-data-science could be a way to get users started quickly.

That's a better idea. @dberenbaum

casperdcl · 2022-05-17T12:45:16Z

srry haven't followed this since Sept 2021 🙈 😅

See the list at the top of #100 for the current CML example repo layout:

https://github.com/iterative/example_cml (/doc/start/github/)

https://github.com/iterative/cml_base_case (/doc/start/github/, /doc/usage?tab=GitHub#example-projects)

https://github.com/iterative/cml_dvc_case (/doc/cml-with-dvc, /doc/usage?tab=GitHub#example-projects)

https://github.com/iterative/cml_tensorboard_case (/doc/usage?tab=GitHub#example-projects)

https://github.com/iterative/cml_cloud_case (/doc/usage?tab=GitHub#example-projects)

equivalents of above for GitLab & (soon) Bitbucket

So it's a lot of potential complexity. In terms of "single example happy path showcase of all products" I'd suggest 2 options:

1. With extra credentials required
   - DVC (data CRUD, pipelines, plots + metrics, DVC_EXP_AUTO_PUSH for spot recovery)
   - CML (runners, spot recovery, reports, tensorboard-dev)
   - DVCLive (live reports?, saving epoch statefile for spot recovery)
   - GHActions (CI)
   - AWS (storage CRUD, runners)
   - badges
   - Studio
   - Codespaces + VSCode extension
2. No extra creds required
   - DVC (data CRUD, pipelines, plots + metrics, ~~DVC_EXP_AUTO_PUSH for spot recovery~~)
   - CML (~~runners~~, ~~spot recovery~~, reports, ~~tensorboard-dev~~)
   - DVCLive (live reports?, ~~saving epoch statefile for spot recovery~~)
   - GHActions (CI)
   - AWS (storage CRUD, ~~runners~~)
   - badges
   - Studio
   - Codespaces + VSCode extension

I don't know whether this is within the scope of example-dvc-experiments from dvc exp getting started.
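
For readers unfamiliar with the "saving epoch statefile for spot recovery" item in the lists above, the idea can be sketched as a training loop that records its progress after every epoch and picks up from that record when restarted. This is a minimal, framework-free illustration; the JSON state format, the function names, and the fake loss value are assumptions, not how DVCLive or CML implement it.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved training state, or a fresh one if no file exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"epoch": 0, "history": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temporary file first, then swap it in, to limit half-written files."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def train(state_file: Path, max_epochs: int) -> dict:
    """Run (or resume) training up to max_epochs, saving state after each epoch."""
    state = load_state(state_file)
    for epoch in range(state["epoch"], max_epochs):
        loss = 1.0 / (epoch + 1)  # stand-in for a real training step
        state["history"].append(loss)
        state["epoch"] = epoch + 1
        save_state(state_file, state)
    return state
```

If a spot instance is reclaimed after epoch 2 of 5, rerunning `train(state_file, 5)` on a new instance continues at epoch 3, provided the state file was pushed somewhere the new instance can read it. This is also why the training code has to be written with a maximum number of epochs in mind.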

iesahin added the enhancement and priority-p1 labels Sep 3, 2021

iesahin assigned iesahin and casperdcl Sep 3, 2021

casperdcl mentioned this issue Sep 3, 2021

rename get-started-experiments to dvc-example-experiments. #79

Closed

iesahin changed the title ~~example-dvc-experiments: Improve to use for CML~~ example-dvc-experiments: Include CML configuration Sep 6, 2021

iesahin mentioned this issue Sep 6, 2021

example-dvc-experiments: Improvements #85

Closed

2 tasks

shcheklein added the A: example-get-started-experiments label May 11, 2022

This was referenced Jun 10, 2022

integrations: Load model_file for resuming iterative/dvclive#140

Closed

Feature exp run: Dryer resume within the CI iterative/dvc#6823

Closed

iesahin assigned iesahin and unassigned iesahin and casperdcl Jun 21, 2022

shcheklein closed this as completed Jan 30, 2023
