In [1]:
%cd /home/dvc-2-iris-demo

/home/dvc-2-iris-demo


# About

Here we create experiments with different configurations and save each one as a Git tag.  
We also add new features as a separate experiment.

Afterwards, the metrics of the different experiments can be shown and compared using _DVC_.

NOTES:
- Make sure you have completed step 3 and created the DVC pipelines

# Experiments

### Overview pipeline_config.yml

In [2]:
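# Print the pipeline config; the context manager closes the file afterwards
with open('config/pipeline_config.yml') as config_file:
    print(config_file.read())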

project: 7labs/dvc-2-iris-demo
name: vision
tags: [solution-0-prototype, dev]



dataset:

  # random state for train/test split
  random_state: 42
  # source dataset
  dataset_csv: data/raw/iris.csv
  featured_dataset_csv: data/interim/featured_iris.csv
  train_csv: data/processed/train_iris.csv
  test_csv: data/processed/test_iris.csv
  test_size: 0.2
  features_columns_range: ['sepal_length', 'petal_length_to_petal_width']
  target_column: species


train:
  # available estimators:
  #     logreg (sklearn.linear_model.LogisticRegression),
  #     smv(sklearn.svm.SVC),
  #     knn(sklearn.neighbors.KNeighborsClassifier)
  estimator_name: 'knn'
  # params of GridSearchCV constructor
  grid_search_cv_config:
    # grid of estimator parameters (see in https://scikit-learn.org/ for specific estimator
    param_grid:
      n_neighbors: [5,10,15]
      leaf_size: [30,60,90]
      p: [1,2]
    cv: 10


evaluate:
  metrics_file: eval.txt


model:

  model_name: model.joblib
  models_folder: 
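
The `param_grid` and `cv` values above determine how much work the grid search does. As a rough sketch (the dictionary below simply copies the values printed above; nothing here is taken from the project's training code), the number of candidates and cross-validation fits can be counted like this:

```python
from itertools import product

# Values copied from the train section of config/pipeline_config.yml shown above.
param_grid = {
    "n_neighbors": [5, 10, 15],
    "leaf_size": [30, 60, 90],
    "p": [1, 2],
}
cv = 10

# Every combination of one value per hyperparameter is one grid-search candidate.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# Each candidate is fitted once per cross-validation fold.
n_fits = len(candidates) * cv

print(f"{len(candidates)} candidates x {cv} folds = {n_fits} fits")
# prints "18 candidates x 10 folds = 180 fits"
```

With scikit-learn's default `refit=True`, `GridSearchCV` adds one final fit of the best candidate on the full training set.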

# Experiment 1 - Tune LogisticRegression

#### 1) create branch for experiment

In [41]:
!git checkout -b exp1-tune-logreg
!git branch

fatal: A branch named 'exp1-tune-logreg' already exists.
  dev
  exp1-logreg
  exp1-tune-logreg
  master
* new-branch


#### 2) update the config/pipeline_config.yml file: add options for the __C__ hyperparameter in the __logreg__:__param_grid__ section

```yaml
...
        param_grid:
              C: [0.1,1.0,10]
...
```

As a result, you should have the following LogisticRegression config:

```yaml
...
train:
  cv: 3
  estimator_name: logreg

  estimators:

    logreg: # sklearn.linear_model.LogisticRegression
      param_grid: # params of GridSearchCV constructor
        C: [0.001, 0.01]
        max_iter: [100]
        solver: ['lbfgs']
        multi_class: ['multinomial']
...
```
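
Before reproducing the pipeline, it can help to check that `estimator_name` points at an entry that actually exists under `estimators`. The helper below is a hypothetical sketch (`selected_param_grid` is not part of the project; the dictionary mirrors the YAML above):

```python
def selected_param_grid(train_cfg):
    """Return the param_grid of the estimator selected by estimator_name."""
    name = train_cfg["estimator_name"]
    estimators = train_cfg.get("estimators", {})
    if name not in estimators:
        raise KeyError(f"estimator_name {name!r} has no entry under train.estimators")
    grid = estimators[name].get("param_grid") or {}
    if not grid:
        raise ValueError(f"param_grid for {name!r} is empty")
    return grid


# Mirrors the train section shown above (assumed to be loaded from YAML elsewhere).
train_cfg = {
    "cv": 3,
    "estimator_name": "logreg",
    "estimators": {
        "logreg": {
            "param_grid": {
                "C": [0.001, 0.01],
                "max_iter": [100],
                "solver": ["lbfgs"],
                "multi_class": ["multinomial"],
            }
        }
    },
}

print(selected_param_grid(train_cfg))
```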


#### 3) Run experiment and save results 

In [42]:
# Reproduce pipeline with new params

!dvc repro stage_evaluate.dvc

Stage 'stage_prepare_configs.dvc' didn't change.
Stage 'stage_featurize.dvc' didn't change.
Stage 'stage_split_train_test.dvc' didn't change.
Stage 'stage_train.dvc' didn't change.
Stage 'stage_evaluate.dvc' didn't change.
Pipeline is up to date. Nothing to reproduce.

In [43]:
# Commit experiment results

!git add .
!git commit -m "Experiment 1 with LogisticRegression hyperparameters"
!git tag -a "exp1" -m "Experiment 1 with LogisticRegression hyperparameters"

[new-branch e42959c] Experiment 1 with LogisticRegression hyperparameters
 3 files changed, 189 insertions(+), 53 deletions(-)
fatal: tag 'exp1-logreg-.93' already exists


In [44]:
# Show metrics 

!dvc metrics show

	experiments/eval.txt:
		{
		  "f1_score": 0.9305555555555555,
		  "confusion_matrix": [
		    [
		      10,
		      0,
		      0
		    ],
		    [
		      0,
		      7,
		      0
		    ],
		    [
		      0,
		      2,
		      11
		    ]
		  ]
		}

In [45]:
# Merge results 

!git checkout dev
!git merge exp1-tune-logreg && git branch -d exp1-tune-logreg
!git branch

Switched to branch 'dev'
Your branch is ahead of 'origin/dev' by 3 commits.
  (use "git push" to publish your local commits)
Auto-merging notebooks/step4_experiments_management.ipynb
CONFLICT (content): Merge conflict in notebooks/step4_experiments_management.ipynb
Automatic merge failed; fix conflicts and then commit the result.
* dev
  exp1-logreg
  exp1-tune-logreg
  master
  new-branch


In [46]:
!git tag --list

exp1-logreg
exp1-logreg-.93


In [48]:
# checkout the specific tag
!git checkout tags/exp1 -b new-branch
!git branch

fatal: A branch named 'new-branch' already exists.
* dev
  exp1-tune-logreg
  master
  new-branch


# Experiment 2 - Use SVM

#### 1) create branch for experiment

In [52]:
!git checkout -b exp2-use-svm
!git branch

notebooks/step4_experiments_management.ipynb: needs merge
error: you need to resolve your current index first
* dev
  exp1-tune-logreg
  master
  new-branch


#### 2) add the SVC config to the __train__:__estimators__ section of the config/pipeline_config.yml file

```yaml
...
        param_grid:
              C: [0.1,1.0,10]
...
```

As a result, you should have the following SVC config:

```yaml
...
train:
  cv: 3
  estimator_name: svm
  estimators:
        
    svm: # sklearn.svm.SVC
      param_grid:
        C: [0.1, 1.0]
        kernel: ["rbf", "linear"]
        gamma: ["scale"]
        degree: [3, 5]
...
```
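
One detail of this grid is worth noting: in scikit-learn's `SVC`, `degree` is only used by the `poly` kernel and is ignored by `rbf` and `linear`. The sketch below (plain Python, with the grid copied from the YAML above) counts how many of the candidates are actually distinct models:

```python
from itertools import product

# Grid copied from the svm section above.
svm_grid = {
    "C": [0.1, 1.0],
    "kernel": ["rbf", "linear"],
    "gamma": ["scale"],
    "degree": [3, 5],
}

candidates = [dict(zip(svm_grid, values)) for values in product(*svm_grid.values())]


def effective_key(params):
    """Collapse parameters that do not influence the fitted model."""
    # degree only matters when kernel == "poly"
    degree = params["degree"] if params["kernel"] == "poly" else None
    return (params["C"], params["kernel"], params["gamma"], degree)


distinct = {effective_key(params) for params in candidates}

print(len(candidates), len(distinct))  # prints "8 4"
```

With `cv: 3`, that is 24 cross-validation fits for 4 distinct models. Dropping `degree`, or adding `poly` to `kernel`, would remove the redundancy.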


#### 3) Run experiment and save results 

In [54]:
# Reproduce pipeline with new params

!dvc repro stage_evaluate.dvc -f

Stage 'stage_prepare_configs.dvc' didn't change.
Reproducing 'stage_prepare_configs.dvc'
Running command:
	python src/pipelines/prepare_configs.py --config=config/pipeline_config.yml
Save config: experiments/base_config.yml
Save config: experiments/split_train_test_config.yml
Save config: experiments/featurize_config.yml
Save config: experiments/train_config.yml
Save config: experiments/evaluate_config.yml
Output 'experiments/split_train_test_config.yml' didn't change. Skipping saving.
Output 'experiments/featurize_config.yml' didn't change. Skipping saving.
Output 'experiments/train_config.yml' didn't change. Skipping saving.
Output 'experiments/evaluate_config.yml' didn't change. Skipping saving.
Saving information to 'stage_prepare_configs.dvc'.
Stage 'stage_featurize.dvc' didn't change.
Reproducing 'stage_featurize.dvc'
Running command:
	python src/pipelines/featurize.py --config=experiments/featurize_config.yml
Output 'data/interim/featured_iris

In [55]:
# Commit experiment results

!git add .
!git commit -m "Experiment 2 with SVM estimator"
!git tag -a "exp2" -m "Experiment 2 with SVM estimator"

[dev 2ae58d2] Experiment 2 with SVM estimator


In [56]:
# Show metrics 

!dvc metrics show

	experiments/eval.txt:
		{
		  "f1_score": 0.9305555555555555,
		  "confusion_matrix": [
		    [
		      10,
		      0,
		      0
		    ],
		    [
		      0,
		      7,
		      0
		    ],
		    [
		      0,
		      2,
		      11
		    ]
		  ]
		}

In [57]:
# Merge results 

!git checkout dev
!git merge exp2-svm && git branch -d exp2-svm
!git branch

Already on 'dev'
Your branch is ahead of 'origin/dev' by 5 commits.
  (use "git push" to publish your local commits)
Already up-to-date.
error: branch 'exp2-svm' not found.
* dev
  exp1-tune-logreg
  master
  new-branch


# Experiment 3 - Add new features

#### 1) create branch for experiment

In [58]:
!git checkout -b exp3-add-features
!git branch

M	notebooks/step4_experiments_management.ipynb
Switched to a new branch 'exp3-squared-features'
  dev
  exp1-tune-logreg
* exp3-squared-features
  master
  new-branch


#### 2) Uncomment features in src/features/featurize.py 
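
The contents of `src/features/featurize.py` are not shown in this notebook. As a purely illustrative sketch of the kind of features involved, the function below adds ratio and squared features to a single iris row. Only `petal_length_to_petal_width` appears in the config above; the raw column names other than `sepal_length`, the second ratio and the `*_squared` names are assumptions.

```python
def add_features(row):
    """Return a copy of one iris row with extra ratio and squared features."""
    out = dict(row)
    # Ratio features; petal_length_to_petal_width is the name used in the config.
    out["sepal_length_to_sepal_width"] = row["sepal_length"] / row["sepal_width"]
    out["petal_length_to_petal_width"] = row["petal_length"] / row["petal_width"]
    # Hypothetical squared features (names are assumptions, not taken from the project).
    for column in ("sepal_length", "sepal_width", "petal_length", "petal_width"):
        out[f"{column}_squared"] = row[column] ** 2
    return out


row = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
    "species": "setosa",
}

featured = add_features(row)
print(sorted(featured))
```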

In [59]:
# Reproduce pipeline with new params

!dvc repro stage_evaluate.dvc -f

Stage 'stage_prepare_configs.dvc' didn't change.
Reproducing 'stage_prepare_configs.dvc'
Running command:
	python src/pipelines/prepare_configs.py --config=config/pipeline_config.yml
Save config: experiments/base_config.yml
Save config: experiments/split_train_test_config.yml
Save config: experiments/featurize_config.yml
Save config: experiments/train_config.yml
Save config: experiments/evaluate_config.yml
Output 'experiments/split_train_test_config.yml' didn't change. Skipping saving.
Output 'experiments/featurize_config.yml' didn't change. Skipping saving.
Output 'experiments/train_config.yml' didn't change. Skipping saving.
Output 'experiments/evaluate_config.yml' didn't change. Skipping saving.
Saving information to 'stage_prepare_configs.dvc'.
Stage 'stage_featurize.dvc' didn't change.
Reproducing 'stage_featurize.dvc'
Running command:
	python src/pipelines/featurize.py --config=experiments/featurize_config.yml
Output 'data/interim/featured_iris

In [64]:
# Commit experiment results

!git add .
!git commit -m "Experiment 3 with new features"
!git tag -a "exp3" -m "Experiment 3 with squared features"

[dev 3f49f37] Experiment 3 with new features
 3 files changed, 327 insertions(+), 58 deletions(-)


In [65]:
# Merge results 

!git checkout dev
!git merge exp3-add-features && git branch -d exp3-add-features
!git branch

Already on 'dev'
Your branch is ahead of 'origin/dev' by 6 commits.
  (use "git push" to publish your local commits)
merge: exp3-squared-features - not something we can merge
* dev
  exp1-tune-logreg
  master
  new-branch


In [61]:
!dvc push

Preparing to upload data to '/tmp/dvc-storage'
Preparing to collect status from /tmp/dvc-storage
[##############################] 100% Collecting information
[##############################] 100% Analysing status.
(1/9): [##############################] 100% experiments/split_train_test_config.yml
(2/9): [##############################] 100% experiments/train_config.yml
(3/9): [##############################] 100% experiments/featurize_config.yml
(4/9): [##############################] 100% experiments/eval.txt
(5/9): [##############################] 100% models/model.joblib
(6/9): [##############################] 100% experiments/evaluate_config.yml
(7/9): [##############################] 100% data/processed/train_iris.csv
(8/9): [##############################] 100% data/processed/test_iris.csv
(9/9): [##############################] 100% data/interim/featured_iris.csv

# Compare experiments

#### List experiments

In [67]:
!git tag --list

exp1-logreg
exp1-logreg-.93
exp2-svm
exp3-features


#### Select experiment

In [68]:
!git checkout exp2

Note: checking out 'exp2-svm'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 2ae58d2... Experiment 2 with SVM estimator


#### Reproduce

In [71]:
!dvc repro stage_evaluate.dvc -f

Stage 'stage_prepare_configs.dvc' didn't change.
Reproducing 'stage_prepare_configs.dvc'
Running command:
	python src/pipelines/prepare_configs.py --config=config/pipeline_config.yml
Save config: experiments/base_config.yml
Save config: experiments/split_train_test_config.yml
Save config: experiments/featurize_config.yml
Save config: experiments/train_config.yml
Save config: experiments/evaluate_config.yml
Output 'experiments/split_train_test_config.yml' didn't change. Skipping saving.
Output 'experiments/featurize_config.yml' didn't change. Skipping saving.
Output 'experiments/train_config.yml' didn't change. Skipping saving.
Output 'experiments/evaluate_config.yml' didn't change. Skipping saving.
Saving information to 'stage_prepare_configs.dvc'.
Stage 'stage_featurize.dvc' didn't change.
Reproducing 'stage_featurize.dvc'
Running command:
	python src/pipelines/featurize.py --config=experiments/featurize_config.yml
Output 'data/interim/featured_iris

#### View and compare metrics
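
The `f1_score` and `confusion_matrix` values that `dvc metrics show` prints in this notebook can be cross-checked by hand. The sketch below computes a macro-averaged F1 from the reported confusion matrix. That the project uses macro averaging is an inference from the numbers agreeing, not something stated in the evaluation code, which is not shown here.

```python
import math


def macro_f1(confusion_matrix):
    """Unweighted mean of per-class F1 scores for a square confusion matrix."""
    n_classes = len(confusion_matrix)
    per_class = []
    for k in range(n_classes):
        true_positives = confusion_matrix[k][k]
        row_total = sum(confusion_matrix[k])                 # TP plus errors in row k
        col_total = sum(row[k] for row in confusion_matrix)  # TP plus errors in column k
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row_total + col_total)
        denominator = row_total + col_total
        per_class.append(2 * true_positives / denominator if denominator else 0.0)
    return sum(per_class) / n_classes


# Values copied from the experiments/eval.txt output in this notebook.
confusion_matrix = [[10, 0, 0], [0, 7, 0], [0, 2, 11]]
reported_f1 = 0.9305555555555555

computed_f1 = macro_f1(confusion_matrix)
print(computed_f1, math.isclose(computed_f1, reported_f1, rel_tol=1e-9))
```

Because the formula is symmetric in rows and columns, the result does not depend on whether rows hold the true or the predicted labels.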

In [72]:
# Last experiment metrics:

!dvc metrics show

	experiments/eval.txt:
		{
		  "f1_score": 0.9305555555555555,
		  "confusion_matrix": [
		    [
		      10,
		      0,
		      0
		    ],
		    [
		      0,
		      7,
		      0
		    ],
		    [
		      0,
		      2,
		      11
		    ]
		  ]
		}

In [79]:
# View and compare metrics for all experiments:

!dvc metrics show -a

Working Tree:
	experiments/eval.txt:
		{
		  "f1_score": 0.9305555555555555,
		  "confusion_matrix": [
		    [
		      10,
		      0,
		      0
		    ],
		    [
		      0,
		      7,
		      0
		    ],
		    [
		      0,
		      2,
		      11
		    ]
		  ]
		}
dev:
	experiments/eval.txt:
		{
		  "f1_score": 0.9305555555555555,
		  "confusion_matrix": [
		    [
		      10,
		      0,
		      0
		    ],
		    [
		      0,
		      7,
		      0
		    ],
		    [
		      0,
		      2,
		      11
		    ]
		  ]
		}
exp1-tune-logreg:
	experiments/eval.txt:
		{
		  "f1_score": 0.9305555555555555,
		  "confusion_matrix": [
		    [
		      10,
		      0,
		      0
		    ],
		    [
		      0,
		      7,
		      0
		    ],
		    [
		      0,
		      2,
		      11
		    ]
		  ]
		}
new-branch:
	exper

In [92]:
# Show only the f1_score field of the JSON metrics file, for all branches

!dvc metrics show -t json -x f1_score -a

Working Tree:
	experiments/eval.txt: [0.9305555555555555]
dev:
	experiments/eval.txt: [0.9305555555555555]
exp1-tune-logreg:
	experiments/eval.txt: [0.9305555555555555]
new-branch:
	experiments/eval.txt: [0.9305555555555555]
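
To compare the scores programmatically, the text printed by the command above can be parsed into a dictionary. This is a minimal sketch that assumes the output layout shown here (a revision label followed by an indented `file: [value]` line); `parse_metrics_output` is a hypothetical helper, not a DVC API.

```python
import re

# Matches terminal escape sequences such as ESC[K or ESC[0m in raw cell output.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def parse_metrics_output(text):
    """Map each revision label to the single metric value printed under it."""
    results = {}
    revision = None
    for line in ANSI_ESCAPE.sub("", text).splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines are revision labels such as "dev:".
            revision = line.strip().rstrip(":")
        else:
            match = re.search(r"\[([^\]]+)\]", line)
            if match and revision is not None:
                results[revision] = float(match.group(1))
    return results


sample = (
    "Working Tree:\n"
    "\texperiments/eval.txt: [0.9305555555555555]\n"
    "dev:\n"
    "\texperiments/eval.txt: [0.9305555555555555]\n"
    "exp1-tune-logreg:\n"
    "\texperiments/eval.txt: [0.9305555555555555]\n"
    "new-branch:\n"
    "\texperiments/eval.txt: [0.9305555555555555]\n"
)

scores = parse_metrics_output(sample)
all_equal = len(set(scores.values())) == 1
print(scores, all_equal)
```

Here every revision reports the same `f1_score`, which suggests these revisions hold the same evaluation results.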

In [93]:
# View and compare metrics for all tags:

!dvc metrics show -T

ERROR: unexpected error - HEAD is a detached symbolic reference as it points to '2ae58d216bb07b28ec4800566b306c5c3cefa679'

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!

### Try it yourself: use the KNN estimator

#### TODO 
- Run an experiment with the kNN estimator, following the same steps as for SVM;
- use your own version of param_grid
