# Install and init DVC

Prerequisites:
- DVC and the packages from `requirements.txt` are installed (if not, see `README.md` for instructions)
- The project directory is a Git repository



## Install with pip

In [1]:
!pip install "dvc==1.0.2"



## Create and check out branch `dvc-tutorial`

In [1]:
!git checkout -b dvc-tutorial

M	params.yaml
M	src/evaluate.py
Switched to a new branch 'dvc-tutorial'


## Initialize DVC

References: 
- https://dvc.org/doc/get-started/initialize 

In [31]:
!dvc init


You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|              https://dvc.org/doc/user-guide/analytics               |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc

## Commit changes

In [32]:
%%bash

git add .
git commit -m "Initialize DVC"

[dvc-tutorial 553878c] Initialize DVC
 6 files changed, 128 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvc/plots/confusion.json
 create mode 100644 .dvc/plots/default.json
 create mode 100644 .dvc/plots/scatter.json
 create mode 100644 .dvc/plots/smooth.json


# Build automated pipelines

## Create `data_load` stage


In [33]:
!mkdir -p data

In [34]:
!dvc run -n data_load \
    -d src/data_load.py \
    -o data/iris.csv \
    -o data/classes.json \
    -p data_load \
    python src/data_load.py \
        --config=params.yaml

Running stage 'data_load' with command:                                         
	python src/data_load.py --config=params.yaml
Creating 'dvc.yaml'                                                             
Adding stage 'data_load' in 'dvc.yaml'
Generating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.yaml dvc.lock

In [35]:
%%bash

du -sh data/*

4,0K	data/classes.json
4,0K	data/iris.csv


In [7]:
!tree -I venv-dvc-3-automate-experiments

.
├── data
│   ├── classes.json
│   └── iris.csv
├── dvc-3-automate-experiments.ipynb
├── dvc.lock
├── dvc.yaml
├── params.yaml
├── README.md
├── requirements.txt
└── src
    ├── data_load.py
    ├── evaluate.py
    ├── featurization.py
    ├── __init__.py
    ├── split_dataset.py
    └── train.py

2 directories, 14 files


## dvc.yaml

In [8]:
!cat dvc.yaml

stages:
  data_load:
    cmd: python src/data_load.py --config=params.yaml
    deps:
    - src/data_load.py
    params:
    - data_load
    outs:
    - data/classes.json
    - data/iris.csv
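
The fields of `dvc.yaml` correspond one-to-one to the flags passed to `dvc run` above (`-n`, `-d`, `-o`, `-p`). As a rough illustration of that mapping, the sketch below assembles the same command line from a stage description. The helper `build_dvc_run_args` is hypothetical and written only for this notebook; it is not part of DVC.

```python
def build_dvc_run_args(name, deps, outs, params, command):
    """Assemble the argument list for `dvc run` from a stage description."""
    args = ["dvc", "run", "-n", name]
    for dep in deps:
        args += ["-d", dep]
    for out in outs:
        args += ["-o", out]
    if params:
        # Several parameter sections are passed as one comma-separated value.
        args += ["-p", ",".join(params)]
    # The stage command itself comes last, split into separate arguments.
    args += command.split()
    return args


args = build_dvc_run_args(
    name="data_load",
    deps=["src/data_load.py"],
    outs=["data/iris.csv", "data/classes.json"],
    params=["data_load"],
    command="python src/data_load.py --config=params.yaml",
)
print(" ".join(args))
```

The printed line should match the `dvc run` call used to create the `data_load` stage, written on a single line.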


## params.yaml

In [10]:
!cat params.yaml


data_load:
  raw_data_path: data/iris.csv
  classes_names_path: data/classes.json

featurize:
  features_path: data/iris_featurized.csv
  target_column: target


data_split:
  test_size: 0.2
  train_path: data/train.csv
  test_path: data/test.csv


train:
  model_path: data/model.joblib


evaluate:
  metrics_file: data/metrics.json
  confusion_matrix: data/cm.json


## Reproduce a pipeline

In [11]:
!dvc repro

Stage 'data_load' didn't change, skipping                                       
Data and pipelines are up to date.
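
DVC skips a stage when the content hashes recorded in `dvc.lock` still match the stage's dependencies and parameters. The sketch below shows that idea for file dependencies only. It is a simplified illustration and not DVC's actual implementation; the helpers `file_md5` and `stage_changed` and the in-memory `lock` dictionary are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def stage_changed(deps, recorded):
    """Return True if any dependency's hash differs from the recorded one."""
    return any(file_md5(dep) != recorded.get(str(dep)) for dep in deps)


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "data_load.py"
    script.write_text("print('load data')\n")

    # Pretend this is what the lock file recorded after the first run.
    lock = {str(script): file_md5(script)}
    unchanged_result = stage_changed([script], lock)

    # Editing the dependency changes its hash, so the stage must re-run.
    script.write_text("print('load data, v2')\n")
    changed_result = stage_changed([script], lock)

print(unchanged_result, changed_result)
```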

## Change params.yaml and reproduce 

Add a new line to the `data_load` section of `params.yaml`:
    `dummy_param: dummy_value`

In [12]:
!dvc repro

Running stage 'data_load' with command:                                         
	python src/data_load.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

To track the changes with git, run:

	git add dvc.lock

## Check params diff

In [13]:
!dvc params diff

Path         Param                  Old    New                                  
params.yaml  data_load.dummy_param  None   dummy_value
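
The `Param` column uses dotted names such as `data_load.dummy_param`. A minimal sketch of how such a diff could be computed, assuming the two versions of `params.yaml` are already loaded as nested dictionaries; `flatten` and `params_diff` are hypothetical helpers, not DVC functions:

```python
def flatten(params, prefix=""):
    """Flatten nested dicts into {'section.key': value} form."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def params_diff(old, new):
    """Return {name: (old_value, new_value)} for every parameter that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    diff = {}
    for name in sorted(set(old_flat) | set(new_flat)):
        before, after = old_flat.get(name), new_flat.get(name)
        if before != after:
            diff[name] = (before, after)
    return diff


old = {"data_load": {"raw_data_path": "data/iris.csv"}}
new = {"data_load": {"raw_data_path": "data/iris.csv", "dummy_param": "dummy_value"}}
print(params_diff(old, new))
```

A parameter that exists only in the new version gets `None` as its old value, which corresponds to the `None` shown in the `Old` column.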

# Build end-to-end Machine Learning pipeline

Stages:
- extract features 
- split dataset 
- train 
- evaluate 


## Add feature extraction stage

In [36]:
!dvc run -n feature_extraction \
    -d src/featurization.py \
    -d data/iris.csv \
    -o data/iris_featurized.csv \
    -p data_load,featurize \
    python src/featurization.py \
        --config=params.yaml

Running stage 'feature_extraction' with command:                                
	python src/featurization.py --config=params.yaml
Adding stage 'feature_extraction' in 'dvc.yaml'                                 
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock dvc.yaml

In [37]:
!ls 

data				  dvc.yaml     requirements.txt
dvc-3-automate-experiments.ipynb  params.yaml  src
dvc.lock			  README.md    venv-dvc-3-automate-experiments


In [38]:
!cat dvc.yaml

stages:
  data_load:
    cmd: python src/data_load.py --config=params.yaml
    deps:
    - src/data_load.py
    params:
    - data_load
    outs:
    - data/classes.json
    - data/iris.csv
  feature_extraction:
    cmd: python src/featurization.py --config=params.yaml
    deps:
    - data/iris.csv
    - src/featurization.py
    params:
    - data_load
    - featurize
    outs:
    - data/iris_featurized.csv


In [39]:
import pandas as pd

features = pd.read_csv('data/iris_featurized.csv')
features.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [40]:
!git status -s

?? dvc.lock
?? dvc.yaml


In [41]:
%%bash
git add .
git commit -m "Add stage features_extraction"

[dvc-tutorial 0f3109c] Add stage features_extraction
 2 files changed, 53 insertions(+)
 create mode 100644 dvc.lock
 create mode 100644 dvc.yaml


## Add split train/test stage

In [42]:
!dvc run -n split_dataset \
    -d src/split_dataset.py \
    -d data/iris_featurized.csv \
    -o data/train.csv \
    -o data/test.csv \
    -p featurize,data_split \
        python src/split_dataset.py \
            --config=params.yaml

Running stage 'split_dataset' with command:                                     
	python src/split_dataset.py --config=params.yaml
Adding stage 'split_dataset' in 'dvc.yaml'                                      
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.yaml dvc.lock

In [43]:
!cat dvc.yaml

stages:
  data_load:
    cmd: python src/data_load.py --config=params.yaml
    deps:
    - src/data_load.py
    params:
    - data_load
    outs:
    - data/classes.json
    - data/iris.csv
  feature_extraction:
    cmd: python src/featurization.py --config=params.yaml
    deps:
    - data/iris.csv
    - src/featurization.py
    params:
    - data_load
    - featurize
    outs:
    - data/iris_featurized.csv
  split_dataset:
    cmd: python src/split_dataset.py --config=params.yaml
    deps:
    - data/iris_featurized.csv
    - src/split_dataset.py
    params:
    - data_split
    - featurize
    outs:
    - data/test.csv
    - data/train.csv


In [44]:
%%bash
git add .
git commit -m "Add stage split_dataset"

[dvc-tutorial 408c149] Add stage split_dataset
 2 files changed, 32 insertions(+)


## Add train stage

In [45]:
!dvc run -n train \
    -d src/train.py \
    -d data/train.csv \
    -o data/model.joblib \
    -p data_split,train \
        python src/train.py \
            --config=params.yaml

Running stage 'train' with command:                                             
	python src/train.py --config=params.yaml
Adding stage 'train' in 'dvc.yaml'                                              
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.yaml dvc.lock

In [46]:
!cat dvc.yaml

stages:
  data_load:
    cmd: python src/data_load.py --config=params.yaml
    deps:
    - src/data_load.py
    params:
    - data_load
    outs:
    - data/classes.json
    - data/iris.csv
  feature_extraction:
    cmd: python src/featurization.py --config=params.yaml
    deps:
    - data/iris.csv
    - src/featurization.py
    params:
    - data_load
    - featurize
    outs:
    - data/iris_featurized.csv
  split_dataset:
    cmd: python src/split_dataset.py --config=params.yaml
    deps:
    - data/iris_featurized.csv
    - src/split_dataset.py
    params:
    - data_split
    - featurize
    outs:
    - data/test.csv
    - data/train.csv
  train:
    cmd: python src/train.py --config=params.yaml
    deps:
    - data/train.csv
    - src/train.py
    params:
    - data_split
    - train
    outs:
    - data/model.joblib


In [47]:
%%bash
git add .
git commit -m "Add stage train"

[dvc-tutorial f186366] Add stage train
 2 files changed, 28 insertions(+)


## Add evaluate stage

In [48]:
!dvc run -n evaluate \
    -d src/evaluate.py \
    -d data/test.csv \
    -d data/model.joblib \
    -d data/classes.json \
    -m data/metrics.json \
    --plots data/cm.csv \
    -p data_load,data_split,train,evaluate \
        python src/evaluate.py \
            --config=params.yaml

Running stage 'evaluate' with command:                                          
	python src/evaluate.py --config=params.yaml
Adding stage 'evaluate' in 'dvc.yaml'                                           
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock dvc.yaml

In [49]:
!cat dvc.yaml

stages:
  data_load:
    cmd: python src/data_load.py --config=params.yaml
    deps:
    - src/data_load.py
    params:
    - data_load
    outs:
    - data/classes.json
    - data/iris.csv
  feature_extraction:
    cmd: python src/featurization.py --config=params.yaml
    deps:
    - data/iris.csv
    - src/featurization.py
    params:
    - data_load
    - featurize
    outs:
    - data/iris_featurized.csv
  split_dataset:
    cmd: python src/split_dataset.py --config=params.yaml
    deps:
    - data/iris_featurized.csv
    - src/split_dataset.py
    params:
    - data_split
    - featurize
    outs:
    - data/test.csv
    - data/train.csv
  train:
    cmd: python src/train.py --config=params.yaml
    deps:
    - data/train.csv
    - src/train.py
    params:
    - data_split
    - train
    outs:
    - data/model.joblib
  evaluate:
    cmd: python src/evaluate.py --config=params.yaml
    deps:
    - data/classes.json
    - data/model.jobl

In [50]:
%%bash
git add .
git commit -m "Add stage evaluate"

[dvc-tutorial e72effe] Add stage evaluate
 2 files changed, 46 insertions(+)


# Experimenting with reproducible pipelines

## How to reproduce experiments?

> The most exciting part of DVC is reproducibility.
>> Reproducibility is the time you are getting benefits out of DVC instead of spending time defining the ML pipelines.

> DVC tracks all the dependencies, which helps you iterate on ML models faster without thinking what was affected by your last change.
>> In order to track all the dependencies, DVC finds and reads ALL the DVC-files in a repository and builds a dependency graph (DAG) based on these files.

> This is one of the differences between DVC reproducibility and traditional Makefile-like build automation tools (Make, Maven, Ant, Rakefile etc). It was designed in such a way to localize specification of DAG nodes.
If you run repro on any created DVC-file from our repository, nothing happens because nothing was changed in the defined pipeline.

(c) dvc.org https://dvc.org/doc/tutorial/reproducibility
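
The dependency graph mentioned in the quote can be sketched directly from the stage definitions: a stage depends on another stage whenever one of its `deps` is one of the other stage's outputs. The example below is an illustration under that assumption, not DVC's own code. The stage data is copied from the `dvc run` commands in this notebook, and the helpers `build_graph` and `affected_by` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies and outputs taken from the `dvc run` commands in this tutorial.
stages = {
    "data_load": {"deps": ["src/data_load.py"],
                  "outs": ["data/classes.json", "data/iris.csv"]},
    "feature_extraction": {"deps": ["data/iris.csv", "src/featurization.py"],
                           "outs": ["data/iris_featurized.csv"]},
    "split_dataset": {"deps": ["data/iris_featurized.csv", "src/split_dataset.py"],
                      "outs": ["data/test.csv", "data/train.csv"]},
    "train": {"deps": ["data/train.csv", "src/train.py"],
              "outs": ["data/model.joblib"]},
    "evaluate": {"deps": ["data/classes.json", "data/model.joblib",
                          "data/test.csv", "src/evaluate.py"],
                 "outs": ["data/metrics.json", "data/cm.csv"]},
}


def build_graph(stages):
    """Map each stage to the set of stages whose outputs it depends on."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    return {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }


def affected_by(graph, changed):
    """Return the stages that must re-run when stage `changed` is modified."""
    affected = {changed}
    # static_order() yields upstream stages before the stages that use them.
    for name in TopologicalSorter(graph).static_order():
        if graph[name] & affected:
            affected.add(name)
    return affected


graph = build_graph(stages)
print(list(TopologicalSorter(graph).static_order()))
print(sorted(affected_by(graph, "feature_extraction")))
```

Changing only the featurization step leaves `data_load` untouched and marks every downstream stage for re-execution, which is the behaviour `dvc repro` shows in Experiment 1.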

In [51]:
# Nothing to reproduce
!dvc repro

Stage 'data_load' didn't change, skipping                                       
Stage 'feature_extraction' didn't change, skipping
Stage 'split_dataset' didn't change, skipping
Stage 'train' didn't change, skipping
Stage 'evaluate' didn't change, skipping
Data and pipelines are up to date.

## Experiment 1: Add features



### Create new experiment branch

Before editing the `src/featurization.py` file, create and check out a new branch __exp1-ratio-features__.

In [52]:
# create new branch

!git checkout -b exp1-ratio-features
!git branch

Switched to a new branch 'exp1-ratio-features'
  dev
  dev-update-pipelines
  dvc-tutorial
* exp1-ratio-features
  master
  update-software


### Update featurization.py

In the file __featurization.py__, in the function `get_features()`, after the line

```python
    features = dataset.copy()
```

add the lines:

```python
    features['sepal_length_to_sepal_width'] = features['sepal_length'] / features['sepal_width']
    features['petal_length_to_petal_width'] = features['petal_length'] / features['petal_width']
```
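
To check what the two new columns contain, the same ratios can be computed by hand for a single row. The sketch below uses a plain dictionary instead of a pandas DataFrame so that it runs without extra packages; `add_ratio_features` is a hypothetical helper, not the tutorial's `get_features()`.

```python
def add_ratio_features(row):
    """Return a copy of `row` with the two ratio features added."""
    features = dict(row)
    features["sepal_length_to_sepal_width"] = row["sepal_length"] / row["sepal_width"]
    features["petal_length_to_petal_width"] = row["petal_length"] / row["petal_width"]
    return features


# First row of the featurized iris data shown earlier in this notebook.
sample = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
result = add_ratio_features(sample)
print(round(result["sepal_length_to_sepal_width"], 6))
print(round(result["petal_length_to_petal_width"], 6))
```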

### Reproduce pipeline 

In [53]:
!dvc repro

Stage 'data_load' didn't change, skipping                                       
Running stage 'feature_extraction' with command:
	python src/featurization.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

Running stage 'split_dataset' with command:
	python src/split_dataset.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

Running stage 'train' with command:
	python src/train.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

Running stage 'evaluate' with command:
	python src/evaluate.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

To track the changes with git, run:

	git add dvc.lock

In [54]:
# Check features used in this pipeline

import pandas as pd

features = pd.read_csv('data/iris_featurized.csv')
features.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,target,sepal_length_to_sepal_width,petal_length_to_petal_width
0,5.1,3.5,1.4,0.2,0,1.457143,7.0
1,4.9,3.0,1.4,0.2,0,1.633333,7.0
2,4.7,3.2,1.3,0.2,0,1.46875,6.5
3,4.6,3.1,1.5,0.2,0,1.483871,7.5
4,5.0,3.6,1.4,0.2,0,1.388889,7.0


In [55]:
!git status

On branch exp1-ratio-features
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   dvc-3-automate-experiments.ipynb
	modified:   dvc.lock
	modified:   src/featurization.py

no changes added to commit (use "git add" and/or "git commit -a")


In [56]:
# Compare the workspace metrics with the last commit (--all also prints unchanged metrics)
!dvc metrics diff --all

Path               Metric    Value    Change
data/metrics.json  f1_score  0.15385  0.0

In [57]:
!git add .
!git commit -m "Experiment with new features"
!git tag -a "exp1_ratio_features" -m "Experiment with new features"

[exp1-ratio-features ae27d9e] Experiment with new features
 3 files changed, 46 insertions(+), 62 deletions(-)
fatal: tag 'exp1_ratio_features' already exists


## Experiment 2: Use SVM

### Create new experiment branch

In [58]:
!git checkout -b exp2-svm
!git branch

Switched to a new branch 'exp2-svm'
  dev
  dev-update-pipelines
  dvc-tutorial
  exp1-ratio-features
* exp2-svm
  master
  update-software


### Update train.py

In the file __train.py__, replace the line

```python
    clf = LogisticRegression(C=0.00001, solver='lbfgs', multi_class='multinomial', max_iter=100)
```

with the line

```python
    clf = SVC(C=0.01, kernel='linear', gamma='scale', degree=5)
```


### Reproduce pipeline 

In [59]:
!dvc repro

Stage 'data_load' didn't change, skipping                                       
Running stage 'feature_extraction' with command:
	python src/featurization.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

Stage 'split_dataset' didn't change, skipping
Running stage 'train' with command:
	python src/train.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

Running stage 'evaluate' with command:
	python src/evaluate.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

To track the changes with git, run:

	git add dvc.lock

In [60]:
!git status

On branch exp2-svm
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   dvc.lock
	modified:   src/featurization.py
	modified:   src/train.py

no changes added to commit (use "git add" and/or "git commit -a")


In [61]:
# Compare the workspace metrics with the last commit (--all also prints unchanged metrics)
!dvc metrics diff --all

Path               Metric    Value    Change
data/metrics.json  f1_score  1.0      0.84615

In [63]:
!git add .
!git commit -m "Experiment 2 with SVM estimator"
!git tag -a "exp2_svm" -m "Experiment 2 with SVM estimator"

On branch exp2-svm
nothing to commit, working tree clean


## Params diffs 

In [64]:
# Get params diffs (no output: no parameter changed since the last commit)

!dvc params diff

In [65]:
# Show all parameters, including unchanged ones

!dvc params diff --all

Path         Param                         Old                       New        
params.yaml  data_load.classes_names_path  data/classes.json         data/classes.json
params.yaml  data_load.raw_data_path       data/iris.csv             data/iris.csv
params.yaml  data_split.test_path          data/test.csv             data/test.csv
params.yaml  data_split.test_size          0.2                       0.2
params.yaml  data_split.train_path         data/train.csv            data/train.csv
params.yaml  evaluate.confusion_matrix     data/cm.csv               data/cm.csv
params.yaml  evaluate.metrics_file         data/metrics.json         data/metrics.json
params.yaml  featurize.features_path       data/iris_featurized.csv  data/iris_featurized.csv
params.yaml  featurize.target_column       target                    target
params.yaml  train.model_path              data/model.joblib         data/model.joblib

In [44]:
# To see the difference between two specific commits, specify both:

!dvc params diff e12b167 HEAD^

ERROR: failed to show params diff - unknown Git revision 'e12b167'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

## Experiment 3: Tune Logistic Regression

### Create a new experiment branch

In [66]:
# create new branch for experiment

!git checkout -b exp3-tuning-logreg
!git branch

Switched to a new branch 'exp3-tuning-logreg'
  dev
  dev-update-pipelines
  dvc-tutorial
  exp1-ratio-features
  exp2-svm
* exp3-tuning-logreg
  master
  update-software


In [67]:
!dvc metrics show

	data/metrics.json:                                                             
		f1_score: 1.0

In [68]:
# Nothing to reproduce since code was checked out by `git checkout`
# and data files were checked out by `dvc checkout`
!dvc repro

Stage 'data_load' didn't change, skipping                                       
Stage 'feature_extraction' didn't change, skipping
Stage 'split_dataset' didn't change, skipping
Stage 'train' didn't change, skipping
Stage 'evaluate' didn't change, skipping
Data and pipelines are up to date.

### Tuning parameters

In the file __train.py__, replace the line:

```python
    clf = SVC(C=0.01, kernel='linear', gamma='scale', degree=5)
```
with the line:

```python
    clf = LogisticRegression(C=0.1, solver='newton-cg', multi_class='multinomial', max_iter=100)
```
__Note__: compared with the original logistic regression, two hyperparameters change here: `C` is set to 0.1 and `solver` to `newton-cg`.


https://dvc.org/doc/tutorials/get-started/experiments#tuning-parameters

### Reproduce pipelines

In [69]:
# re-run pipeline 

!dvc repro

Stage 'data_load' didn't change, skipping                                       
Stage 'feature_extraction' didn't change, skipping
Stage 'split_dataset' didn't change, skipping
Running stage 'train' with command:
	python src/train.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

Running stage 'evaluate' with command:
	python src/evaluate.py --config=params.yaml
Updating lock file 'dvc.lock'                                                   

To track the changes with git, run:

	git add dvc.lock

In [70]:
# Show the metrics file written by this run
!cat data/metrics.json

{"f1_score": 1.0}

In [71]:
!dvc metrics show -a

workspace:                                                                      
	data/metrics.json:
		f1_score: 1.0
dvc-tutorial:
	data/metrics.json:
		f1_score: 0.15384615384615383
exp1-ratio-features:
	data/metrics.json:
		f1_score: 0.15384615384615383
exp2-svm, exp3-tuning-logreg:
	data/metrics.json:
		f1_score: 1.0

In [72]:
!dvc metrics diff --all

Path               Metric    Value    Change                                    
data/metrics.json  f1_score  1.0      0.0

### Commit

In [73]:
%%bash

git add .
git commit -m "Tune model. LogisticRegression. C=0.1, solver=newton-cg"
git tag -a "exp3_tuning_logreg" -m "Tune model. LogisticRegression. C=0.1, solver=newton-cg"

[exp3-tuning-logreg 32c96af] Tune model. LogisticRegression. C=0.1, solver=newton-cg
 3 files changed, 54 insertions(+), 38 deletions(-)


### Merge the model to dvc-tutorial

In [74]:
%%bash

git checkout dvc-tutorial
git merge exp3-tuning-logreg

Updating e72effe..32c96af
Fast-forward
 dvc-3-automate-experiments.ipynb | 129 +++++++++++++++++++--------------------
 dvc.lock                         |  24 ++++----
 src/featurization.py             |   3 +
 src/train.py                     |   2 +-
 4 files changed, 79 insertions(+), 79 deletions(-)


Switched to branch 'dvc-tutorial'


# Compare experiment results

## List metrics for all runs (experiments)

In [75]:
# Metrics of the current pipeline

!dvc metrics show

	data/metrics.json:                                                             
		f1_score: 1.0

In [76]:
# Show metrics of all committed pipelines (all branches and tags)

!dvc metrics show -a -T

workspace:                                                                      
	data/metrics.json:
		f1_score: 1.0
dvc-tutorial, exp3-tuning-logreg:
	data/metrics.json:
		f1_score: 1.0
exp1-ratio-features:
	data/metrics.json:
		f1_score: 0.15384615384615383
exp2-svm:
	data/metrics.json:
		f1_score: 1.0
exp2_svm:
	data/metrics.json:
		f1_score: 1.0
exp3_tuning_logreg:
	data/metrics.json:
		f1_score: 1.0

## Compare metrics (get differences)

`Command`
```bash
dvc metrics diff
```

Notes:

* plain `dvc metrics diff` shows the difference between the current metrics and the metrics in the last commit; if the two are identical, there is no difference and nothing is printed:

In [77]:
!dvc metrics diff

* to print zero differences as well, add the `--all` option, which includes unchanged metrics in the comparison:

In [78]:
!dvc metrics diff --all

Path               Metric    Value    Change                                    
data/metrics.json  f1_score  1.0      0.0

* to compare the current metrics with the metrics from another commit, specify that other (old) commit:

In [79]:
# Equivalent to `!dvc metrics diff exp1-ratio-features dvc-tutorial`, because dvc-tutorial is the current branch
!dvc metrics diff exp1-ratio-features

Path               Metric    Value    Change                                    
data/metrics.json  f1_score  1.0      0.84615

* to compare any two commits, specify both of them in the order old, new:

In [80]:
# Compare old (exp1) with new (exp2)
!dvc metrics diff exp1-ratio-features exp2-svm

Path               Metric    Value    Change                                    
data/metrics.json  f1_score  1.0      0.84615

* to print the old metric value as well as the new one, add the `--old` option:

In [81]:
# Here the Old metric comes from exp1 and the New metric from exp2
!dvc metrics diff --old exp1-ratio-features exp2-svm

Path               Metric    Old      New    Change                             
data/metrics.json  f1_score  0.15385  1.0    0.84615
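
The options described above can be summarized with a small sketch of the comparison logic. This is an illustration only, assuming each revision's `data/metrics.json` is already loaded as a dictionary; `metrics_diff` and its `show_all` and `precision` arguments are hypothetical and only mimic the behaviour of `--all` and the rounded output.

```python
def metrics_diff(old, new, show_all=False, precision=5):
    """Compare two metric dicts; return rows of (metric, old, new, change)."""
    rows = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before is None or after is None:
            change = None  # metric exists on one side only
        else:
            change = round(after - before, precision)
        # Unchanged metrics are hidden unless show_all is set (like --all).
        if show_all or change != 0:
            rows.append((name, before, after, change))
    return rows


exp1 = {"f1_score": 0.15384615384615383}
exp2 = {"f1_score": 1.0}
print(metrics_diff(exp1, exp2))
print(metrics_diff(exp2, exp2))
print(metrics_diff(exp2, exp2, show_all=True))
```

The first call reports a change of 0.84615, the second returns an empty list because nothing changed, and the third keeps the unchanged row with a change of 0.0.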

### Compare all experiments with each other

In [82]:
# All experiments
experiments = ['exp1-ratio-features', 'exp2-svm', 'exp3-tuning-logreg', 'dvc-tutorial']

# Make all unordered pairs (each pair once, no repeats) of experiments
for i in range(len(experiments)):
    for j in range(i + 1, len(experiments)):
        
        old_exp = experiments[i]
        new_exp = experiments[j]
        
        print(f'\n{old_exp} vs {new_exp}')
        
        # Get the difference between metrics from the old and new experiments
        !dvc metrics diff --old --all {old_exp} {new_exp} 


exp1-ratio-features vs exp2-svm
Path               Metric    Old      New    Change
data/metrics.json  f1_score  0.15385  1.0    0.84615

exp1-ratio-features vs exp3-tuning-logreg
Path               Metric    Old      New    Change
data/metrics.json  f1_score  0.15385  1.0    0.84615

exp1-ratio-features vs dvc-tutorial
Path               Metric    Old      New    Change
data/metrics.json  f1_score  0.15385  1.0    0.84615

exp2-svm vs exp3-tuning-logreg
Path               Metric    Old    New    Change
data/metrics.json  f1_score  1.0    1.0    0.0

exp2-svm vs dvc-tutorial
Path               Metric    Old    New    Change
data/metrics.json  f1_score  1.0    1.0    0.0

exp3-tuning-logreg vs dvc-tutorial
Path               Metric    Old    New    Change                               
data/metrics.json  f1_score  1.0  
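
The nested index loops above can also be written with `itertools.combinations`, which yields each unordered pair exactly once. The sketch below only builds the command lines instead of running them, so it works outside a DVC repository; `build_diff_commands` is a hypothetical helper.

```python
from itertools import combinations

experiments = ["exp1-ratio-features", "exp2-svm", "exp3-tuning-logreg", "dvc-tutorial"]


def build_diff_commands(revisions):
    """Return one `dvc metrics diff` argument list per unordered pair."""
    return [
        ["dvc", "metrics", "diff", "--old", "--all", old_rev, new_rev]
        for old_rev, new_rev in combinations(revisions, 2)
    ]


commands = build_diff_commands(experiments)
for cmd in commands:
    print(" ".join(cmd))
```

Inside the project repository, each argument list could be passed to `subprocess.run(cmd, check=True)` to produce the same tables as the cell above.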

## Plots

`Command`
```bash
dvc plots [show|diff]
```

The generated plot is saved to an HTML file.

In [83]:
from IPython.display import IFrame

### Show

Builds a plot from the given metrics file.
Here we use the standard `confusion` template to render the confusion matrix.

In [84]:
!dvc plots show  --template confusion "data/cm.csv" -x actual -y predicted -o data/plots-show.html

file:///home/alex/Dev/Projects/tutorials-dvc/dvc-3-automate-experiments/data/plots-show.html

In [86]:
IFrame(src='data/plots-show.html', width=300, height=300)
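
The `-x actual -y predicted` options name two columns of `data/cm.csv`. Assuming the file holds one row per test sample with exactly these two columns, the cell counts that the `confusion` template visualizes can be reproduced as follows. The CSV contents and class labels below are made up for illustration, and `confusion_counts` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Hypothetical contents of data/cm.csv: one row per test sample.
cm_csv = """actual,predicted
setosa,setosa
setosa,setosa
versicolor,versicolor
versicolor,virginica
virginica,virginica
"""


def confusion_counts(csv_text):
    """Count how often each (actual, predicted) pair occurs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["actual"], row["predicted"]) for row in reader)


counts = confusion_counts(cm_csv)
for (actual, predicted), n in sorted(counts.items()):
    print(f"{actual:<12}{predicted:<12}{n}")
```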

### Diff

Plots the metrics from different commits on the same axes.
This is only possible when at least one stage declares the file(s) as plots.

Here that stage is `evaluate`:

```
!dvc run -n evaluate \
    -d src/evaluate.py \
    -d data/test.csv \
    -d data/model.joblib \
    -d data/classes.json \
    -m data/metrics.json \
    --plots data/cm.csv \
    -p data_load,data_split,train,evaluate \
        python src/evaluate.py \
            --config=params.yaml
```

The file `data/cm.csv` is declared as `plots`.

In [87]:
# Build metrics plots for all 3 experiments
!dvc plots diff -o data/plots-diff.html exp1-ratio-features exp2-svm exp3-tuning-logreg

file:///home/alex/Dev/Projects/tutorials-dvc/dvc-3-automate-experiments/data/plots-diff.html

In [88]:
IFrame(src='data/plots-diff.html', width=500, height=600)