dvc-cml container working with Gitlab and Github #12

Merged
merged 4 commits into from
Mar 20, 2020

DavidGOrtega
Contributor

No description provided.

- name: Publish to dockerhub
  uses: elgohr/Publish-Docker-Github-Action@master
  with:
    name: davidgortega/dvc-cml
Member

should we use iterative? where does this name go?

Contributor Author

Good catch! This is actually an error: I had it right in the original branch, but I had to redo it and overlooked this.

[your own runners](https://help.github.com/en/actions/hosting-your-own-runners)
with special capabilities like GPUs.
tool for ML experimentation. This repo offers the possibility of using DVC to
establish your ML pipeline to be run by Github Actions runners or Gitlab
Member

Github Action runners

Contributor Author

The Github product is named Github Actions; maybe it has to be double quoted?

Member

It was about the typo Actions -> Action. In English you don't put two plurals one after another.

README.md Outdated
or [your own Gitlab runners](https://docs.gitlab.com/runner/) with special
capabilities like GPUs...

Major beneficts of using DVC-CML in your ML projects includes:
Member

not clear so far what CML stands for to be honest

Contributor Author

Continuous Machine Learning. I did not choose the name, but I really like it.

README.md Outdated

- Reproducibility: DVC is always in charge of maintain your experiment tracking
all the dependencies, so you don't have to. Additionally your experiment is
always running under the same software constrains so you dont have to worry
about replicating the same enviroment again.
always running under the same constrains so you dont have to worry about
Member

don't

do you run some editor with spell checking, by chance? )

Contributor Author

I hope this is not going to be the real README! This is going to be redone by someone else, as far as I understand.

README.md Outdated
- Releases: DVC-action tags every experiment that runs with repro. Aside of that
DVC-action is just a job inside your workflow that could generate your model
releases or deployment according to your bussiness requirements.
experiments run through the DVC Report offeered as checks in Github or
Member

same here ... please use some tools to fix simple language typos, stylistic mistakes, etc

README.md Outdated
experiments run through the DVC Report offeered as checks in Github or
Releases in Gitlab.
- Releases: DVC-action tags every experiment that runs with repro generating the
report. Aside of that DVC-cml is just a step in your
Member

DVC-cml or DVC-CML - use one style

README.md Outdated
or [your own Gitlab runners](https://docs.gitlab.com/runner/) with special
capabilities like GPUs...

Major beneficts of using DVC-CML in your ML projects includes:
Member

we mention benefits, and none of them are about the simple things that are essential to CI (running tests/training independently to make sure that the build is "green"), and we don't mention another big one: running the infra for you to train something

Contributor Author

Not sure about the last benefit. Actually, I would expect ML users to be running their own runners with GPUs, either locally or in a cloud like AWS, Azure or any other GPU vendor.
IMHO the benefits of CI for ML are Releases and Reproducibility, and having everything containerised helps a lot with that since the environments are going to be consistent.

Member

I mean, it still manages the workflow for you. As a data scientist I don't care if there is AWS machine or something - I just push and wait. I don't provision, dockerize, SSH, copy data, etc ... It's one of the major benefits of this whole thing unless I'm missing something cc @dmpetrov

Contributor Author

Ok, I see. Yep, that's a benefit. Actually, it's shared by teaming and reproducibility.
It's very easy to get a model and results just by branching and pushing new changes, without having to set up the environment.
And it's reproducible since everyone is working with the same software/hardware constraints...

README.md Outdated

Example of a simple DVC-cml workflow in Gitlab:

> :eyes: Some needed variables like remote credentials and GITLAB_TOKEN are
Member

needed -> required
variables -> environment variables
are setted -> are set (or even come up with a better term)
as CI/CD ... -> as Gitlab Runners ... in Gitlab settings ... or what is the right term here?

Contributor Author

CI/CD environment variables are the way to go in Gitlab

README.md Outdated

dvc:
  stage: dvc_action_run
  image: davidgortega/dvc-cml:dev
Member

image name?

Contributor Author

dvcorg/dvc-cml:latest
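
For reference, the Gitlab job under discussion might look roughly like the sketch below once the image name from the reply above is used. Only the job name, the stage name and the image come from this thread. The `stages` list and the `script` line are assumptions added to make the fragment a valid `.gitlab-ci.yml`, since the thread does not show what the job actually runs. GITLAB_TOKEN and the remote credentials are expected to be defined as CI/CD environment variables in the Gitlab project settings, not in the file.

```yaml
# Sketch of a .gitlab-ci.yml using the published container (not the final README example).
stages:
  - dvc_action_run

dvc:
  stage: dvc_action_run
  # Image name as given in the reply above (Docker Hub organization: dvcorg).
  image: dvcorg/dvc-cml:latest
  script:
    # Assumption: placeholder command; the real entry point is not shown in this thread.
    - dvc repro
```

Keeping the token out of the file means the same job definition can be reused across forks without leaking credentials.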

README.md Outdated
</details>

This workflow will run everytime that you push code or do a Pull/Merge Request.
When triggered DVC-cml will setup the runner and DVC will run the pipelines
specified by repro_targets. Two scenarios may happen:
Member

repro_targets -> `repro_targets`

README.md Outdated
| metrics_diff_targets | string | no | | Comma delimited array of metrics files. If not specified will use all the metric files |
| rev | string | no | origin/master | Revision to be compared with current experiment. I.E. HEAD~1. |

> :warning: In Gitlab is needed that you generate the GITLAB_TOKEN that is
Member

it is required

README.md Outdated

> :warning: In Gitlab is needed that you generate the GITLAB_TOKEN that is
> analogous to GITHUB_TOKEN. See
> [Tensorflow Mnist in Gitlab](#tensorflow-mnist-in-gitlab) example For a
Member

For -> for

- actions/setup-python

Example of a simple DVC-action workflow:
> :eyes: Note the use of the container
Member

explain why I need to note this

Contributor Author

So that it is not forgotten in the job definition. It might be a pitfall: people may add DVC-CML inside an existing job and then not add the container section.

Member

so, put that explanation in the note itself?
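
To make the pitfall concrete, here is a minimal sketch of a Github Actions workflow where the job-level `container` section is present. The workflow name, the job id and the `run` command are hypothetical placeholders, since the thread does not show the real steps. The image name is the Docker Hub name mentioned elsewhere in this review, and the triggers follow the README's statement that the workflow runs on pushes and pull requests.

```yaml
# Sketch of a GitHub Actions workflow; the key point is the job-level `container`.
name: dvc-cml-example   # hypothetical workflow name
on: [push, pull_request]

jobs:
  run:
    runs-on: ubuntu-latest
    # Without this section the steps run directly on the runner, not inside the image.
    container: dvcorg/dvc-cml:latest
    steps:
      - uses: actions/checkout@v2
      - name: Run the pipeline
        # Assumption: placeholder command; the real step is not shown in this thread.
        run: dvc repro
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

If the steps are pasted into an existing job that lacks `container`, they would execute on the bare runner, where the tools shipped in the image may not be installed.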

action.yml Outdated
@@ -1,29 +1,6 @@
name: 'DVC-action'
Member

is name, description different now?

Contributor Author

Good catch! It's not actually doing anything, and it's only useful if we publish the repo as a Github Action

@@ -0,0 +1,27 @@
FROM ubuntu:18.04

LABEL Iterative Inc
Member

minor: Iterative, Inc

src/gitlab.js Outdated
CI_PROJECT_URL,
CI_COMMIT_REF_NAME,
CI_COMMIT_SHA,
// CI_COMMIT_BEFORE_SHA,
Member

if we don't need it - remove it

Member

dmpetrov left a comment

👍 some comments are inline

.github/workflows/deploy.yaml
- name: Publish to dockerhub
  uses: elgohr/Publish-Docker-Github-Action@master
  with:
    name: davidgortega/dvc-cml
Member

should we use iterative/dvc-cml?

Contributor Author

dvcorg/dvc-cml, that's the Docker Hub name 🙂

dmpetrov merged commit 37d8e9a into master on Mar 20, 2020
dmpetrov deleted the dvc-cml-container branch on March 20, 2020 19:15
or [your own Gitlab runners](https://docs.gitlab.com/runner/) with special
capabilities like GPUs...

Major benefits of using DVC-CML in your ML projects includes:
Member

still don't understand this list of benefits ... can you summarize them w/o this official language - like A,B,C - the way you understand them?

Contributor Author

OK

3 participants