User Guide / Manage Experiments #828

dashohoxha · 2019-11-29T13:37:30Z

Close #816, Close #159

ghost · 2019-12-02T18:25:13Z

@dashohoxha, the "manage experiments by directories" image could improve a lot by using more meaningful names.

What I've seen is that users create directories to try different hypotheses, not exactly different versions 🤔

dashohoxha · 2019-12-03T08:52:57Z

> the "manage experiments by directories" image could improve a lot by using more meaningful names
>
> What I've seen is that users create directories to try different hypotheses, not exactly different versions

@MrOutis I have used dummy names just to give an idea of how it should look (and also to show that sub-experiments can be used as well, that is, experiments that are based on other experiments).
I have done the same thing (using dummy names) for tags and branches as well, just to give an idea.

However I completely agree with you. If you (or someone else) could provide some more realistic example names for directories and subdirectories, I can fix that image.

dashohoxha · 2019-12-04T10:28:41Z

I am aware that it is not complete or perfect, especially the pages of directory management and combined management. However this is the best that I could do. If someone has any ideas for improving it further this would be great.

shcheklein · 2019-12-04T21:23:06Z

It looks great, @dashohoxha !

A few comments from the first pass I've done:

- Can we make the directories image the same style as tags and commits? They look better, maybe since they have at least some color to them?
- Manage -> Managing; otherwise we have to rename all other UG sections: Updating ..., Managing ..., Versioning, etc.
- I would say commits are another way to do experiments. Maybe we can mention this somewhere?

I'll read it more carefully today.

dashohoxha · 2019-12-05T05:25:03Z

> can we make an image with directories the same style as tags and commits? they look better, may be since they have at least some color to them?

Unfortunately this is not easy with the tool that I am using (umlet).

> Manage -> Managing - otherwise we have to rename all other UG sections: Updating ...., Managing ..., Versioning, etc

I used "Manage Experiments" as a shorter version of "How To Manage Experiments". It seems smoother to read than "Managing Experiments". However, I agree with your consistency argument.

> I would say commits - is another way to do experiments. May be we can mention this somewhere?

This seems like the case of tags. If you do a single commit for each experiment, then why not put a tag on each commit? It is true that you don't need the tags to switch back to a previous commit (you can look at the log and find the commit ID), but it is easier with tags. Besides, the command `dvc metrics show` works with tags and branches, but does not work with commits, does it?

pared

Nice read. I also really like the last points for Branches and Tags. Moving the best experiment to master is definitely not a beginner's use case.

shcheklein · 2019-12-05T18:31:18Z

static/docs/user-guide/experiments/branches.md

+
+```dvc
+$ git checkout bigrams
+$ git branch -c master old-master


add branch, diff and other commands that are not highlighted to the list here:

dvc.org/src/Documentation/Markdown/lang/dvc.js

Line 59 in bd774bf

keyword:

what's the -c and -C for?

I couldn't follow all the branching going on 😅 It would be great if you could add some comments.
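For context on the `-c`/`-C` question: in Git, `git branch -c <old> <new>` copies a branch, and `-C` is shorthand for `--copy --force`, which overwrites the target if it already exists. A minimal sketch in a throwaway repository follows. The branch names are taken from the snippet above, the empty commits are placeholders, and the second command is only an assumption about what the docs page intends, not a quote from it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "Baseline"
git branch -M master                 # make sure the first branch is called master
git checkout -q -b bigrams
git commit -q --allow-empty -m "Bigrams experiment"

git branch -c master old-master      # copy: old-master now points where master points
git branch -C bigrams master         # forced copy: overwrite master with bigrams
```

After this, `old-master` keeps the previous state of `master`, which is why the overwrite in the last line is recoverable locally.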

shcheklein · 2019-12-05T18:33:17Z

static/docs/user-guide/experiments/tags.md

+$ git add .
+$ git commit -m 'Baseline experiment'
+$ dvc commit
+$ git tag baseline-experiment


it is a lightweight tag. Why do we use it here as opposed to the regular one below? Are you sure that metrics show works well with it?

This one is a "normal" tag, the other one is an annotated tag. It is just an example.
As far as I know metrics show works with all the tags.

it's not even about what can be called normal. It's more about - why do we use a lightweight here and an annotated below? If there are some reasons then explanation is required when should I use what type.

I don't think there is any reason. They are both tags and both of them can be used.

So, why use both? Let's simplify. Though, there are some guidelines behind those. I think people tend to use lightweight tags for things that they won't be sharing/saving. Maybe we can use that difference as a guideline? I'm not sure that we need to put in so many details. I would probably just keep the simple version in both cases if it works fine.
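To make the distinction in this thread concrete: a lightweight tag is just a ref that points at a commit, while an annotated tag is a separate Git object with its own message, tagger and date. The sketch below shows this in a throwaway repository. The tag name `baseline-annotated` is made up for illustration, and nothing here says how `dvc metrics show` treats either kind.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "Baseline experiment"

git tag baseline-experiment                               # lightweight tag
git tag -a baseline-annotated -m "Baseline experiment"    # annotated tag (hypothetical name)

git cat-file -t baseline-experiment    # prints "commit"
git cat-file -t baseline-annotated     # prints "tag"
```

One practical difference that may matter for a guideline: `git push --follow-tags` only pushes annotated tags, and `git describe` ignores lightweight tags unless `--tags` is given.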

shcheklein · 2019-12-05T18:34:49Z

static/docs/user-guide/experiments/tags.md

+```dvc
+$ git checkout baseline-experiment
+$ dvc checkout
+$ dvc repro evaluate.dvc


why do we need to run repro here?

Maybe to make sure that dvc checkout has retrieved all the correct data? I don't know.

`dvc status` would be better then, and would make more sense?

You are right, `dvc status` would make more sense.

shcheklein · 2019-12-05T18:36:15Z

static/docs/user-guide/experiments/tags.md

+$ git diff --cached
+$ git revert --continue
+
+# delete the old tag and add it to the current version


please split it into two blocks

move comments out and put them as some human readable explanation to the blocks

shcheklein · 2019-12-05T18:37:39Z

static/docs/user-guide/experiments/tags.md

+
+### Move the best experiment on top
+
+Let's say that `baseline-experiment` has the best performance and we want to


Those are destructive, right? Let's put a note with ! emoji.

What do you mean by "destructive"?

Yep, right. I probably wanted to put this comment to the branches way of doing experiments.

So, how about branches part? In that place where we do force push and stuff - let's put a comment there?

shcheklein · 2019-12-05T18:37:57Z

static/docs/user-guide/experiments/tags.md

+
+```dvc
+$ git tag
+$ git log --oneline


show output here?

I don't think it will help to improve the explanation in this case.

It would improve it for me, reading it for the first time. Reading it a second time, I don't even understand why we need `git log --oneline` everywhere. You are not saying a word about it. By the way, I meant the output of `git tag`:

    $ git tag
    baseline
    bigrams

It would be easier to read, and it is what the previous paragraph is about.

Both `git tag` and `git log --oneline` are ways to check which tags are available. Unlike `git tag`, `git log --oneline` also shows the position of the tags in the commit history and their order: which one is first and which one is last.

This explanation makes it 10x better! You see, you can do it :) And I'm not kidding - I didn't know the difference between those related to the tags.

I still have some questions though. Would the log include all commits? If you have a lot of commits (a very common thing in a real project), the tags would get lost.

Is there an option for the git tag command to sort output?

I would still prefer to simplify it, tbh. And put in some output, since it will make it way easier to understand and read. At least you have one reader who is saying that it would be easier to read. Don't rely only on your own opinion here.
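On the two open questions above: `git tag` does accept a `--sort` option, and `git log` can be limited to commits that a tag or branch points at with `--simplify-by-decoration`. The sketch below tries both in a throwaway repository. The file `params.txt`, the commit messages and the dates are invented for illustration; the tag names follow the example output in this thread.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Three commits with distinct dates; only the first and the last get a tag.
echo "baseline" > params.txt
git add params.txt
GIT_COMMITTER_DATE="2019-12-01T10:00:00" git commit -q -m "Baseline experiment"
git tag baseline

echo "tuned" > params.txt
GIT_COMMITTER_DATE="2019-12-02T10:00:00" git commit -q -am "Tune parameters"

echo "bigrams" > params.txt
GIT_COMMITTER_DATE="2019-12-03T10:00:00" git commit -q -am "Bigrams experiment"
git tag bigrams

git tag                         # tag names only, sorted by name by default
git tag --sort=-creatordate     # newest tag first
git log --oneline --decorate --simplify-by-decoration   # skips commits without a tag or branch
```

With many untagged commits in between, the last command keeps the tags visible in history order, which may address the concern about tags getting lost in a long log.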

src/Documentation/sidebar.json

shcheklein · 2019-12-05T18:43:21Z

static/docs/user-guide/experiments/dirs.md

+which is based on the first one, often it is as easy as:
+
+```dvc
+$ cp --reflink -R experiment1/ experiment2/


reflinks are not supported on the majority of systems. So it's not that easy to copy the directory without hitting performance problems from copying the data. We should think a bit more about what to do here.

> reflinks are not supported on majority of systems

"reflink" is a filesystem feature, and all the systems support filesystems with reflink. For example in Linux there are XFS, Btrfs, ZFS etc. The option "--reflink" here is a hint or a reminder that they need to use a filesystem that supports reflinks (like XFS), which they should be doing anyway since the start of the project.

see my bigger comment in the review. We can't rely on XFS, etc - for now 80-90% of systems are not configured to use them by default. And it won't change anytime soon. So, if we want to provide a meaningful solution to experiments management we should at least mention something about data, code and other things.

> 80-90% of systems are not configured to use them by default

That's right, but they are not configured to use DVC by default either. Installing and using XFS is actually much easier than installing and using DVC.
What I am trying to say is that if someone is working on a data project, he can and should install and use XFS (or some other deduplicating filesystem). This is a basic requirement.

It's very very far from the reality. There are a lot (majority?) of cases when you don't have a choice. Your argument applies only to freelancers and students.
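One detail that bears on this disagreement: with GNU `cp`, a bare `--reflink` means `--reflink=always`, so the copy fails on filesystems without reflink support, whereas `--reflink=auto` falls back to a regular copy. A minimal sketch, assuming GNU coreutils; the directory layout and the file `train.txt` are made up for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p experiment1/data
echo "sample" > experiment1/data/train.txt

# Clone blocks where the filesystem supports it (XFS, Btrfs, ...),
# otherwise silently fall back to a full copy of the data.
cp --reflink=auto -R experiment1/ experiment2/
```

The fallback keeps the command working everywhere, but on a filesystem without reflinks it still duplicates the data, which is the performance concern raised above.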

shcheklein · 2019-12-05T18:44:32Z

static/docs/user-guide/experiments/index.md

+- [How to Manage Experiments by Tags](/doc/user-guide/experiments/tags)
+- [How to Manage Experiments by Branches](/doc/user-guide/experiments/branches)
+- [How to Manage Experiments by Directories](/doc/user-guide/experiments/dirs)
+- [How to Manage Experiments by Several Methods](/doc/user-guide/experiments/mixed)


by a Mix of Methods? "Several methods" sounds a bit strange

I don't actually mind, choose the wording that seems right to you.

@jorgeorpinel what's your take on this and on the one #828 (comment) here? what the best way to word it?

Maybe **How to Manage Experiments with a Mix of Methods**
or use the word "Combination", as in that doc's intro.

shcheklein · 2019-12-05T18:45:13Z

static/docs/user-guide/experiments/mixed.md

@@ -0,0 +1,26 @@
+# How to Manage Experiments by Several Methods


Mix Methods to Manage Experiments?

That's fine too.

Maybe **How to Manage Experiments with a Mix of Methods**
or use the word "Combination", as in the intro paragraph.

shcheklein · 2019-12-05T18:47:55Z

static/docs/user-guide/experiments/dirs.md

+
+## How it works
+
+If we have a directory named `experiment1/` which contains the pipeline of the


except the pipeline - what else does this directory contain? what about code? do we copy the whole project?

I think this is too project specific and it is impossible to explain such details on a section like this.
The best that can be done is to have a concrete example somewhere else and to link it from the section of the examples.

there are a few recommendations we can make - clean data before copying? use a simple JSON conf? use DVC-files that are setup properly to be relative.

Those ^^ are general problems. Without giving at least some hints, it's not very actionable. People who can come up with a solution to those problems probably don't even need a section like this (and I can send you a few names).

> there are a few recommendations we can make - clean data before copying? use a simple JSON conf? use DVC-files that are setup properly to be relative.

All these seem common sense to me. Anyway I don't see how they can be explained without having an example as a reference.
Maybe someone can give it a try and let's see whether they make sense.

Since there is no understanding on our end of how to give guidance, let's remove this page for now, wait until the ticket on DVC core is resolved, and wait for someone to take #159.

> All these seem common sense to me.

If they are common sense why would it be so hard to explain/mention them?

shcheklein

It looks good. The directory image is not a blocker, so disregard that comment. Commits: agreed, we can update it later when we have some interface ready.

Did a full review pass and made some minor modifications. Mostly minor comments that should be easy to address. The one major concern is the logic behind dirs experiments: it makes sense at a high level but can be tricky to achieve in reality, so at least some recommendations should be put in that section:

- reflinks are not supported on a majority of systems. We should do something with data on those systems. Clean before copying the dir?
- code: how do we copy and/or modify it? Or do we copy only some JSON config? (which is a good solution that should be mentioned!)
- pipeline: what are the considerations/requirements to make it copyable?

dashohoxha · 2019-12-05T20:01:51Z

Regarding the "dirs" page, I agree that in practice there are certain "tricks" that are needed to make it work, but I don't see how they can be explained in a meaningful way without a concrete example.

dashohoxha · 2019-12-05T20:08:15Z

By the way, the new command dvc experiment that is being discussed might be a good way to encapsulate or hide most of the tricks and details that are needed for the case of experiment-by-directory to work. This way the user will not have to worry about them and we will not have to explain them :)

ghost

partially reviewed it, left some comments.

ghost · 2019-12-17T20:10:06Z

static/docs/user-guide/experiments/branches.md

+```dvc
+$ git commit -am 'Evaluate'
+$ dvc commit   # just to make sure all the data is committed
+$ git checkout -b unigrams


why not git branch unigrams?

Actually, git branch ~~is pretty new and~~ we don't use it in the docs. But we do use git checkout a lot. I would open a separate terminology-focused issue to decide whether to change all instances or not.
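For readers weighing the two commands: `git branch <name>` only creates the branch and leaves `HEAD` where it is, while `git checkout -b <name>` creates it and switches to it in one step. A small sketch in a throwaway repository; the branch names come from this thread and the empty commit is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "Evaluate"
git branch -M master                 # make sure the first branch is called master

git branch unigrams                  # creates unigrams, stays on master
git rev-parse --abbrev-ref HEAD      # prints "master"

git checkout -q -b bigrams           # creates bigrams and switches to it
git rev-parse --abbrev-ref HEAD      # prints "bigrams"
```

So replacing `git checkout -b unigrams` with `git branch unigrams` in the docs would also require an explicit checkout before the next experiment step.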

ghost · 2019-12-17T20:17:51Z

static/docs/user-guide/experiments/branches.md

+
+```dvc
+$ git checkout unigrams
+$ dvc checkout


why not `git checkout -b bigrams unigrams` (go to the bigrams experiment, based on unigrams)?

This would depend on your previous suggestion, #828 (comment) ? Perhaps you can replace these 2 comments for a single one for the full thing? 🙂
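The suggested one-liner uses the optional start point of `git checkout -b`: the new branch begins at the tip of the named branch instead of the current `HEAD`. A sketch in a throwaway repository, with placeholder commits; in a DVC project a `dvc checkout` would presumably still be needed afterwards to sync the data, which is not shown here.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "Baseline"
git branch -M master
git checkout -q -b unigrams
git commit -q --allow-empty -m "Unigrams experiment"
git checkout -q master

# Create bigrams from the tip of unigrams and switch to it, regardless of
# which branch is currently checked out.
git checkout -q -b bigrams unigrams
```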

static/docs/user-guide/experiments/branches.md

ghost · 2019-12-17T20:26:09Z

static/docs/user-guide/experiments/branches.md

+
+```dvc
+$ git checkout bigrams
+$ git branch -c master old-master


what's the -c and -C for?

I couldn't follow all the branching going on 😅 It would be great if you could add some comments.

per #828 (comment) Co-Authored-By: Mr. Outis <mroutis@protonmail.com>

shcheklein · 2020-03-14T22:50:46Z

closing this as stale

Manage experiments

75f4c8e

shcheklein temporarily deployed to dvc-org-pr-828 November 29, 2019 13:37 Inactive

Add images

69cf0b4

shcheklein temporarily deployed to dvc-org-pr-828 November 29, 2019 13:45 Inactive

This was referenced Nov 29, 2019

user-guide: add "folders" way of experimentation #159

Closed

guide: Managing Experiments section(s) #816

Closed

weekly-digest bot mentioned this pull request Dec 1, 2019

Weekly Digest (24 November, 2019 - 1 December, 2019) #829

Closed

Index page and Tags

2492e6e

shcheklein temporarily deployed to dvc-org-pr-828 December 3, 2019 08:45 Inactive

Manage by branches

6b734fa

shcheklein temporarily deployed to dvc-org-pr-828 December 3, 2019 11:49 Inactive

Manage experiments by directories

dff0f05

shcheklein temporarily deployed to dvc-org-pr-828 December 3, 2019 21:44 Inactive

Manage experiments by different methods

ad6d737

shcheklein temporarily deployed to dvc-org-pr-828 December 4, 2019 10:21 Inactive

dashohoxha changed the title ~~[WIP] User Guide / Manage Experiments~~ User Guide / Manage Experiments Dec 4, 2019

dashohoxha requested review from efiop, a user, shcheklein, dmpetrov, pared, Suor and jorgeorpinel December 4, 2019 10:25

Improve wording

e7dfcbd

shcheklein temporarily deployed to dvc-org-pr-828 December 5, 2019 05:54 Inactive

pared approved these changes Dec 5, 2019


Update dirs.md

f891f8a

shcheklein temporarily deployed to dvc-org-pr-828 December 5, 2019 18:19 Inactive

Update tags.md

ebdf3a8

shcheklein temporarily deployed to dvc-org-pr-828 December 5, 2019 18:25 Inactive

Update branches.md

5ac4591

shcheklein temporarily deployed to dvc-org-pr-828 December 5, 2019 18:27 Inactive

shcheklein reviewed Dec 5, 2019

src/Documentation/sidebar.json (resolved)

shcheklein reviewed Dec 5, 2019


shcheklein requested changes Dec 5, 2019


weekly-digest bot mentioned this pull request Dec 8, 2019

Weekly Digest (1 December, 2019 - 8 December, 2019) #842

Closed

ghost reviewed Dec 17, 2019


Update static/docs/user-guide/experiments/branches.md

c165839

per #828 (comment) Co-Authored-By: Mr. Outis <mroutis@protonmail.com>

weekly-digest bot mentioned this pull request Dec 22, 2019

Weekly Digest (15 December, 2019 - 22 December, 2019) #878

Closed

shcheklein closed this Mar 14, 2020

efiop deleted the user-guide/experiments branch March 15, 2020 20:18

efiop restored the user-guide/experiments branch March 15, 2020 20:18

jorgeorpinel deleted the user-guide/experiments branch May 5, 2020 22:39

