iterative · dashohoxha · Nov 29, 2019 · Nov 29, 2019 · Dec 3, 2019 · Dec 3, 2019
diff --git a/src/Documentation/sidebar.json b/src/Documentation/sidebar.json
@@ -120,6 +120,29 @@
         "label": "Managing External Data",
         "slug": "managing-external-data"
       },
+      {
+        "label": "Managing Experiments",
+        "slug": "experiments",
+        "source": "experiments/index.md",
+        "children": [
+          {
+            "label": "Tags",
+            "slug": "tags"
+          },
+          {
+            "label": "Branches",
+            "slug": "branches"
+          },
+          {
+            "label": "Directories",
+            "slug": "dirs"
+          },
+          {
+            "label": "Mixed",
+            "slug": "mixed"
+          }
+        ]
+      },
       {
         "label": "Contributing",
         "slug": "contributing",

diff --git a/static/docs/user-guide/experiments/branches.md b/static/docs/user-guide/experiments/branches.md
@@ -0,0 +1,119 @@
+# How to Manage Experiments by Branches
+
+You can use a different Git branch for each experiment.
+
+<p align="center">
+<img src="/static/img/user-guide/experiments/branches.png" />
+</p>
+
+This is usually more flexible than managing experiments by tags, since you can
+easily base a new experiment on any of the previous experiments.
+
+## Examples
+
+An example of managing experiments by branches can be seen in the
+[Deep Dive Tutorial](https://dvc.org/doc/tutorials/deep/reproducibility).
+
+These interactive tutorials also manage experiments by branches:
+
+- [Pipelines](https://katacoda.com/dvc/courses/tutorials/pipelines) - Using DVC
+  commands to build a simple ML pipeline.
+- [MNIST](https://katacoda.com/dvc/courses/tutorials/mnist) - Classify images of
+  hand-written digits using the MNIST dataset.
+
+## How it works
+
+### Commit and branch
+
+Let's say that we are working on the branch `master` and at the end of the
+experiment we want to save it on a branch named `unigrams`. We can do it like
+this:
+
+```dvc
+$ git commit -am 'Evaluate'
+$ dvc commit   # just to make sure all the data are committed
+$ git checkout -b unigrams
+$ git checkout master
+```
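The Git side of this step can be tried out in a throwaway repository. The following is a minimal sketch, assuming only that `git` is installed; the temporary directory, the file `params.txt`, and the identity settings are hypothetical, and the DVC commands are left out. It illustrates that `git checkout -b unigrams` followed by `git checkout master` leaves both branches pointing at the same commit.

```shell
# Throwaway repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.email "you@example.com"   # hypothetical identity, needed to commit
git config user.name "Example User"

echo "ngrams=1" > params.txt              # hypothetical experiment file
git add params.txt
git commit -q -m 'Evaluate'

git checkout -q -b unigrams               # save the experiment on its own branch
git checkout -q master                    # go back and continue on master

# Both branch names now refer to the same commit.
master_commit=$(git rev-parse master)
unigrams_commit=$(git rev-parse unigrams)
echo "master:   $master_commit"
echo "unigrams: $unigrams_commit"
```

Any later commit on `master` moves only `master`, so the `unigrams` branch keeps pointing at the saved experiment.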
+
+Now we can continue working on `master` for another experiment. When we are
+done, we can create another branch for it in the same way.
+
+### New experiment based on another one
+
+Suppose that we want to start a new experiment based on another one, instead of
+starting from `master`. We can switch first to that branch and then start a new
+experiment on top of it:
+
+```dvc
+$ git checkout unigrams
+$ dvc checkout
+$ git checkout -b bigrams
+```
+
+Now we can continue to make the necessary changes for the bigrams experiment.
+
+### Compare the metrics
+
+To find out which experiment has the best performance (the best metrics) we use
+the command `dvc metrics show` with the option `-a, --all-branches`:
+
+```dvc
+$ dvc metrics show -a
+
+bigrams:
+	data/eval.txt: AUC: 0.624727
+
+unigrams:
+	data/eval.txt: AUC: 0.624652
+```
+
+### Check out an experiment
+
+Let's first list all the branches:
+
+```dvc
+$ git branch -a
+bigrams
+unigrams
+...
+```
+
+To switch to the experiment `unigrams` we can do:
+
+```dvc
+$ git checkout unigrams
+$ dvc checkout
+```
+
+Switching back to `master`:
+
+```dvc
+$ git checkout master
+$ dvc checkout
+```
+
+In either case, the command `dvc repro` should not have to re-run anything and
+should finish quickly, provided that all the data of the experiments has been
+committed properly.
+
+### Move the best experiment to master
+
+> Usually it is not necessary to move the best experiment to master, since we
+> can easily switch to any of the branches.
+
+What we usually want is to completely replace the master branch with the
+experiment branch. Using `git merge` is not the best option in such a situation,
+since it will usually result in a mixture of the two branches (the master
+branch and the experiment branch). Instead we should copy the branches, like
+this:
+
+```dvc
+$ git checkout bigrams
+$ git branch -c master old-master
+$ git branch -C bigrams master
+$ git push -f origin master
+$ git branch -D old-master
+$ git checkout master
+$ git diff bigrams
+```
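To see what the branch copy does without risking a real project, here is a minimal sketch in a throwaway repository. It assumes Git 2.15 or newer (where `git branch -c` and `-C` are available); the file `model.txt` and the identity settings are hypothetical, and the push to a remote is omitted. After the copy, `master` points at the same commit as `bigrams`, so the final diff is empty.

```shell
# Throwaway repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.email "you@example.com"   # hypothetical identity, needed to commit
git config user.name "Example User"

echo "baseline" > model.txt               # hypothetical tracked file
git add model.txt
git commit -q -m 'Baseline on master'

git checkout -q -b bigrams
echo "bigrams" > model.txt
git commit -q -am 'Bigrams experiment'

git branch -c master old-master           # keep a temporary copy of the old master
git branch -C bigrams master              # overwrite master with a copy of bigrams
git branch -q -D old-master               # drop the temporary copy
git checkout -q master

# master and bigrams now have identical content.
if git diff --quiet bigrams; then
  echo "master and bigrams are identical"
fi
replaced_content=$(cat model.txt)
```

In a DVC project, a `dvc checkout` after the final `git checkout master` would bring the data files in line with the new `master`.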
diff --git a/static/docs/user-guide/experiments/dirs.md b/static/docs/user-guide/experiments/dirs.md
@@ -0,0 +1,97 @@
+# How to Manage Experiments by Directories
+
+Using a separate directory for each experiment is the most intuitive solution
+for managing experiments and is the first thing that comes to mind. Most DS/ML
+practitioners are already familiar with this approach.
+
+<p align="center">
+<img src="/static/img/user-guide/experiments/dirs.png" />
+</p>
+
+This approach is most suitable when the experiments being managed do not differ
+significantly in their code or pipeline, but differ in their input datasets,
+processing parameters, configuration settings, etc.
+
+Often it is possible to generate these experiment directories automatically (or
+almost automatically) from the code of the main project (using the parameters or
+configuration settings), so keeping them in Git is not interesting or useful.
+What we would like to track instead are just the parameters that were used to
+generate the experiment directory and the results of the evaluation (metrics),
+so that we can figure out which parameters give the best results.
+
+## Examples
+
+There is a very basic example of using directories for each experiment at the
+end of
+[this interactive tutorial](https://katacoda.com/dvc/courses/basics/pipelines).
+
+## How it works
+
+If we have a directory named `experiment1/` which contains the pipeline of the
+first experiment, and we want to create another experiment on `experiment2/`,
+which is based on the first one, often it is as easy as:
+
+```dvc
+$ cp --reflink=auto -R experiment1/ experiment2/
+```
+
+Then we can continue with modifying `experiment2/`, and finally we can produce
+its results with:
+
+```dvc
+$ dvc repro -R experiment2/
+```
+
+The most important DVC commands, like `dvc commit`, `dvc checkout`, `dvc repro`,
+`dvc pull`, `dvc push`, etc. can take the option `-R, --recursive` which is very
+convenient for experiment directories.
+
+The command `dvc metrics show` can take this option as well:
+
+```dvc
+$ dvc metrics show -R experiment2/
+```
+
+However, if we use just `dvc metrics show`, without any options or targets, it
+will show the metrics of all the experiments, so that we can compare them.
+
+Deleting an experiment is as easy as:
+
+```dvc
+$ rm -rf experiment2/
+```
+
+However, we should first make sure to save the parameters that we used for this
+experiment and its metrics (results).
+
+<details>
+
+### Tip: Use a script to create experiments
+
+When we build a pipeline we have to use some long `dvc run` commands, with lots
+of options, to define stages. Doing all this manually is long, tedious, and
+error-prone. The recommended Linux practice in such cases is to record all the
+commands in a bash script, which can then be used to build the whole pipeline at
+once.
+
+Some of the benefits of this approach are these:
+
+- Typing mistakes while building the pipeline are avoided.
+- Modification of the pipeline becomes easier and consistent (for example using
+  find/replace).
+- Building pipelines becomes flexible (for example bash variables can be used).
+- Pipelines become reusable (other projects can copy/paste and customize them).
+
+Using a script to create a pipeline is also very convenient when we want to
+manage experiments with directories, because it allows us to customize the
+experiment based on some options and parameters that we pass to the script.
+
+This can further automate the process of creating a new experiment, producing
+its results, saving them, and finally deleting the experiment directory. This
+way we can, for example, automatically iterate over a large number of
+hyper-parameter values and save the corresponding results.
+
+The implementation details actually depend on the specifics of each project.
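As a rough illustration of the idea, and not a DVC feature, the bookkeeping part of such a script might look like the sketch below. The directory layout, the `params.txt` file, and the `learning_rate` values are all hypothetical; the `dvc repro` step is only indicated by a comment, so the script runs without DVC installed.

```shell
# Work in a throwaway directory so nothing real is modified.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical template that every experiment directory starts from.
mkdir template
echo "pipeline files would live here" > template/README

# Create one experiment directory per hyper-parameter value.
for lr in 0.1 0.01 0.001; do
  exp_dir="experiment-lr-$lr"
  cp -R template "$exp_dir"
  # Record the parameter used, so it survives deleting the directory later.
  echo "learning_rate=$lr" > "$exp_dir/params.txt"
  # A real script would now produce the results, for example with:
  #   dvc repro -R "$exp_dir"
  # and then save the metrics somewhere outside the experiment directory.
done

created=$(ls -d experiment-lr-* | wc -l | tr -d ' ')
echo "created $created experiment directories"
```

Keeping the parameter file next to the saved metrics makes it possible to tell later which values produced which results.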
+
+</details>
diff --git a/static/docs/user-guide/experiments/index.md b/static/docs/user-guide/experiments/index.md
@@ -0,0 +1,22 @@
+# Managing Experiments
+
+The data science process is inherently iterative and R&D-like. Data scientists
+may try many different approaches and hyper-parameter values, and "fail" many
+times before the required level of a metric is achieved. Even failed experiments
+can be a useful source of information in ML.
+
+DVC makes it easy to iterate on your project, providing ways to try different
+ideas, keep track of them, switch back and forth, compare their performance
+through metrics, and find the best experiment. It stores all the context
+necessary to reproduce an experiment easily and efficiently: data, pipeline
+stages, parameters, models, etc. That way, someone else (or you yourself 3
+months from now) can check out and inspect all the details of an experiment.
+
+There are several ways to manage experiments, which are described in this
+section. Which one is most suitable for you depends on your preferences and also
+on the kind and complexity of your project.
+
+- [How to Manage Experiments by Tags](/doc/user-guide/experiments/tags)
+- [How to Manage Experiments by Branches](/doc/user-guide/experiments/branches)
+- [How to Manage Experiments by Directories](/doc/user-guide/experiments/dirs)
+- [How to Manage Experiments by Several Methods](/doc/user-guide/experiments/mixed)
diff --git a/static/docs/user-guide/experiments/mixed.md b/static/docs/user-guide/experiments/mixed.md
@@ -0,0 +1,26 @@
+# How to Manage Experiments by Several Methods
+
+On complex projects you can manage experiments with a combination of the
+methods that we have seen so far.
+
+<p align="center">
+<img src="/static/img/user-guide/experiments/mixed.png" />
+</p>
+
+If you want to change different aspects of your ML pipeline, like input
+datasets, featurization, learning algorithm, hyper-parameters, etc., you can
+manage these changes with different methods. For example, let's say that you
+create a different branch for each learning algorithm, and a tag for each input
+dataset or featurization. Then you can create different experiment directories
+for different hyper-parameters.
+
+There is no standard solution that fits all the cases. The way that you might
+combine the different experiment management methods depends on the concrete
+problem that you are trying to solve and the details of the project.
+
+In order to compare all the experiments, you can use the options
+`-a, --all-branches` and `-T, --all-tags`, like this:
+
+```dvc
+$ dvc metrics show -aT
+```