start: Pipelines Trail #2857

Closed
iesahin opened this issue Sep 27, 2021 · 7 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: start Content of /doc/start status: stale You've been groomed!

Comments

@iesahin
Contributor

iesahin commented Sep 27, 2021

This is the third step in GS restructuring as we discussed in #2496 (may be closed by addressing this one).

See #2496 (comment), #2496 (comment), #2496 (comment)

It will introduce creating pipelines, adding stages and running them with dvc repro.

Create a pipeline

  • Why do we use pipelines in DVC?
  • What are dependencies?

Add a stage

  • Introduce dvc stage add

Edit a stage

  • Introduce editing dvc.yaml
  • Mention dvc stage add --force?

Run the pipeline

  • Add another stage
  • Introduce dvc repro
  • Update an intermediate stage's dependency
  • Rerun the pipeline
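
The run, update, rerun loop listed above could be illustrated with a toy model. The sketch below is not DVC's implementation; it only mimics the idea that a stage reruns when the content hash of one of its dependencies changes. The `Stage` class, the `repro` function, the in-memory `lock` dictionary, and all file names are hypothetical.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Stage:
    """Hypothetical stand-in for one pipeline stage."""
    name: str
    deps: List[str]                      # input files, relative to the work dir
    out: str                             # single output file
    func: Callable[[List[str]], str]     # dep contents in, output content out


def file_hash(path: Path) -> str:
    """Content hash of a file (SHA-256 chosen here for illustration)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def repro(stages: List[Stage], workdir: Path, lock: Dict[str, dict]) -> List[str]:
    """Run stages in the given order, skipping those with unchanged deps.

    `lock` plays the role of a lock file: stage name -> {dep: hash}.
    Returns the names of the stages that actually ran.
    """
    ran = []
    for stage in stages:  # assumed to be listed in dependency order
        current = {dep: file_hash(workdir / dep) for dep in stage.deps}
        out_path = workdir / stage.out
        if lock.get(stage.name) == current and out_path.exists():
            continue  # nothing changed, keep the existing output
        texts = [(workdir / dep).read_text() for dep in stage.deps]
        out_path.write_text(stage.func(texts))
        lock[stage.name] = current
        ran.append(stage.name)
    return ran


def demo() -> List[List[str]]:
    """Run, rerun, change a dependency of the second stage, rerun again."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "raw.txt").write_text("b\na\nc\n")
        (work / "stop.txt").write_text("c\n")
        stages = [
            Stage("prepare", ["raw.txt"], "clean.txt",
                  lambda t: "\n".join(sorted(t[0].split()))),
            Stage("filter", ["clean.txt", "stop.txt"], "kept.txt",
                  lambda t: "\n".join(w for w in t[0].split()
                                      if w not in t[1].split())),
        ]
        lock: Dict[str, dict] = {}
        first = repro(stages, work, lock)    # both stages run
        second = repro(stages, work, lock)   # nothing changed
        (work / "stop.txt").write_text("a\n")
        third = repro(stages, work, lock)    # only the second stage reruns
        return [first, second, third]
```

Calling `demo()` returns one list of executed stage names per run, which makes it easy to see that editing `stop.txt` only triggers the stage that depends on it.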

Visualize the pipeline

  • List the stages
  • Show the DAG
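
The relationship between stages and the DAG can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how DVC builds its graph: the `PIPELINE` mapping only imitates the `stages` section of a dvc.yaml file, and the stage names, paths, and helper functions are hypothetical. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline, shaped like the `stages` mapping of a dvc.yaml file.
PIPELINE = {
    "prepare": {"deps": ["data/raw.csv"], "outs": ["data/prepared"]},
    "featurize": {"deps": ["data/prepared"], "outs": ["data/features"]},
    "train": {"deps": ["data/features"], "outs": ["model.pkl"]},
}


def stage_graph(stages: dict) -> dict:
    """Map each stage to the set of stages that produce its dependencies."""
    producer = {
        out: name for name, spec in stages.items() for out in spec["outs"]
    }
    return {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }


def run_order(stages: dict) -> list:
    """Return stage names so that every stage follows its upstream stages."""
    return list(TopologicalSorter(stage_graph(stages)).static_order())


if __name__ == "__main__":
    # List the stages, then show each edge of the graph.
    print("stages:", ", ".join(PIPELINE))
    for name, upstream in stage_graph(PIPELINE).items():
        for parent in sorted(upstream):
            print(f"{parent} -> {name}")
    print("order:", " -> ".join(run_order(PIPELINE)))
```

An edge exists whenever one stage's output is another stage's dependency, so listing stages and showing the graph are two views of the same mapping.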

Remove stages

  • Introduce dvc remove

@shcheklein @jorgeorpinel @dberenbaum

@iesahin iesahin self-assigned this Sep 27, 2021
@dberenbaum
Contributor

I think we could probably skip "Removing stages," especially if we introduce editing dvc.yaml.

@jorgeorpinel jorgeorpinel added the A: docs Area: user documentation (gatsby-theme-iterative) label Sep 28, 2021
This was referenced Oct 5, 2021
@shcheklein
Member

Agreed with Dave. Overall, Get Started should not be a comprehensive overview. It should be a quick happy path that presents the most important functionality and the value as fast as possible. Everything else is secondary to that.

In this case it would be nice to start with dvc stage add, explain dvc.yaml, and then almost immediately (I would not even add subtitles for now) introduce dvc repro or dvc exp run (exp run is probably even better). Then mention that pipelines can be more advanced (templates), and show the pipeline.

That's pretty much it, to be honest. I don't know whether we need two subsections for this.

Ideally we would rely on one of the existing projects. Maybe the example-get-started one since it makes at least some sense to use pipelines there.

@iesahin
Contributor Author

iesahin commented Oct 13, 2021

Ideally we would rely on one of the existing projects. Maybe the example-get-started one since it makes at least some sense to use pipelines there.

I can use example-get-started for this, but example-dvc-experiments also has a two-stage pipeline, which starts with extract (un-tar) and then trains with train.py. That one is simpler; example-get-started is a bit more complex.

@shcheklein
Member

also has a 2 stage pipeline, starting from extract (un-tar)

This is an ugly, unfortunate hack that we need to remove eventually :) It's very sad that we have it in the project now. It's not sustainable and not how DVC should be used.

@iesahin
Contributor Author

iesahin commented Oct 18, 2021

The fact that we had to resort to a hack may be a bit ugly, but explaining pipelines without resorting to Python or other code seems like a viable alternative to me. The user may have some difficulty bridging the gap between ordinary commands and an ML project, but the basic mechanism could be explained in a simpler way.

Anyway, no strong opinions here, I'll proceed with example-get-started.

@iesahin iesahin added the C: start Content of /doc/start label Oct 20, 2021
@jorgeorpinel

This comment was marked as resolved.

@jorgeorpinel jorgeorpinel removed the C: start Content of /doc/start label Mar 30, 2022
@jorgeorpinel jorgeorpinel added C: start Content of /doc/start and removed A: docs Area: user documentation (gatsby-theme-iterative) labels Jun 20, 2022
@jorgeorpinel
Contributor

Guys, do we still want a separate pipelines trail? Pipelining info is inside https://dvc.org/doc/start/data-management right now. I would personally like to see a separate one, but I remember there were opinions against that. I would put Experiments first, then Data Management, then Pipelines. WDYT @iesahin @dberenbaum @shcheklein? Thanks

@jorgeorpinel jorgeorpinel added type: enhancement Something is not clear, small updates, improvement suggestions status: stale You've been groomed! labels Jun 20, 2022
This was referenced Sep 22, 2022
@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) and removed type: enhancement Something is not clear, small updates, improvement suggestions labels Sep 22, 2022
4 participants