In [2]:
%cd /home/dvc-2-iris-demo

/home/dvc-2-iris-demo


# About

Here we will create experiments with different configurations and save each of them as a Git tag.   
We will also add new features as a separate experiment.

It will then be possible to show and compare metrics across experiments using _DVC_.

# Prerequisites

Make sure that you have completed step 5 (_notebook step5_execution_dag.ipynb_) and that the DVC pipelines exist.

# Experiments

## Overview of pipeline_config.yml as is

In [3]:
# Read the pipeline config and print it unchanged
with open('config/pipeline_config.yml') as config_file:
    config = config_file.read()
print(config)

base:
  project: 7labs/dvc-2-iris-demo
  name: iris
  tags: [solution-0-prototype, dev]

  model:
    model_name: model.joblib
    models_folder: models

  experiments:
    experiments_folder: experiments

  random_state: 42 # random state for train/test split


split_train_test:
  folder: experiments
  train_csv: data/processed/train_iris.csv
  test_csv: data/processed/test_iris.csv
  test_size: 0.2


featurize:
  dataset_csv: data/raw/iris.csv
  featured_dataset_csv: data/interim/featured_iris.csv
  features_columns_range: ['sepal_length', 'petal_length_to_petal_width']
  target_column: species


train:
  cv: 5
  estimator_name: logreg

  estimators:

    logreg: # sklearn.linear_model.LogisticRegression
      param_grid: # params of GridSearchCV constructor
        C: [0.001, 0.01]
        max_iter: [100]
        solver: ['lbfgs']
        multi_class: ['multinomial']

    knn: # sklearn.neighbors.KNeighborsClassifier
      param_grid:
        n_neighbors: [5,15]
        p: [1,2]

  

## 1. LogisticRegression

#### As you can see, the current estimator in the pipeline config is _logistic regression_:
```yaml
...
    train:
      ...
      estimator_name: logreg
      ...
...
```
#### Let's update parameter C for _logreg_ by adding the value 0.1 to its grid:

```yaml
...
       logreg: # sklearn.linear_model.LogisticRegression
           param_grid: # params of GridSearchCV constructor
              C: [0.001, 0.01, 0.1]
...
```

##### As a result, you should have the following parameters:

```yaml
...
train:
  cv: 5
  estimator_name: logreg

  estimators:

    logreg: # sklearn.linear_model.LogisticRegression
      param_grid: # params of GridSearchCV constructor
        C: [0.001, 0.01, 0.1]
   ...
...
```
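To see what the extra value of C costs, the grid can be expanded by hand. The sketch below is a minimal illustration, assuming the train stage passes `param_grid` and `cv` to scikit-learn's `GridSearchCV`, as the comments in the config suggest. The helper name `expand_grid` is hypothetical and is not part of the project.

```python
from itertools import product

# Grid copied from the logreg section of config/pipeline_config.yml after the edit.
param_grid = {
    "C": [0.001, 0.01, 0.1],
    "max_iter": [100],
    "solver": ["lbfgs"],
    "multi_class": ["multinomial"],
}
cv = 5  # value of train.cv in the config


def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
for params in candidates:
    print(params)

# An exhaustive grid search fits each candidate once per cross-validation fold.
print(f"{len(candidates)} candidates x {cv} folds = {len(candidates) * cv} fits")
```

With its default `refit=True`, `GridSearchCV` additionally refits the best candidate once on the whole training set.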

#### Then reproduce the pipelines

```bash
dvc repro pipeline_evaluate.dvc
```


#### Commit the new experiment

##### Stage the main config and the DVC pipeline files
```bash
git add config/pipeline_config.yml *.dvc
```
##### Commit and tag
```bash
git commit -m "create experiment with estimator LogisticRegression"
git tag -a "exp1-logreg" -m "create experiment with estimator LogisticRegression"
```
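Since every experiment repeats the same three git steps, they can be scripted from the notebook. The following is a minimal sketch; the function names `experiment_commands` and `record_experiment` are hypothetical helpers, not part of the project. By default it only prints the commands (dry run), so nothing is committed by accident.

```python
import subprocess
from glob import glob


def experiment_commands(tag, message, paths):
    """Build the git commands that record one experiment as a commit plus an annotated tag."""
    return [
        ["git", "add", *paths],
        ["git", "commit", "-m", message],
        ["git", "tag", "-a", tag, "-m", message],
    ]


def record_experiment(tag, message, paths, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in experiment_commands(tag, message, paths):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command


# The shell expands *.dvc for us on the command line; here glob does the same job.
record_experiment(
    "exp1-logreg",
    "create experiment with estimator LogisticRegression",
    ["config/pipeline_config.yml", *glob("*.dvc")],
)
```

Pass `dry_run=False` to run the commands for real from the repository root.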



## 2. SVM

#### Open the pipeline config again.

#### Change the estimator name from _logreg_ to _svm_:

```yaml
...
train:
    ...
    estimator_name: 'svm'
    ...
...
```

#### Reproduce the pipelines

```bash
dvc repro pipeline_evaluate.dvc
```

#### Commit the new experiment

##### Stage the main config and the DVC pipeline files
```bash
git add config/pipeline_config.yml *.dvc
```
##### Commit and tag
```bash
git commit -m "create experiment with estimator SVM"
git tag -a "exp2-svm" -m "create experiment with estimator SVM"
```

## 3. KNN

#### Run an experiment with the kNN estimator, following the same steps as for SVM;
#### use, for example, the following config:


```yaml
...
train:
    ...
    estimator_name: 'knn'
    ...
...
```

## 4. Add new features

#### Here we won't change the config; instead we will create an experiment based on new features
#### Open the module _src/features/features.py_ and find the line

```python
dataset['petal_width_in_square'] = dataset['petal_width'] ** 2
```

#### After it, uncomment (or add) the lines that create the new features, for example:

    # uncomment for exp 2
    # features['sepal_length_squared'] = features['sepal_length'] ** 2
    # features['sepal_width_squared'] = features['sepal_width'] ** 2

    # uncomment for exp 3
    # features['petal_length_squared'] = features['petal_length'] ** 2
    # features['petal_width_squared'] = features['petal_width'] ** 2

#### Now we can reproduce the pipelines:

```bash
dvc repro -f pipeline_evaluate.dvc
```
#### __Note__: here we have to force the reproduction, because the module _src/features/features.py_ is not a dependency of any DVC pipeline; only the pipeline modules (located in _src/pipelines_) are declared as dependencies of the DVC pipelines
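
The reason for the `-f` flag can be illustrated with a simplified model of change detection. This is only a sketch of the idea (a stage re-runs when the recorded hash of a declared dependency no longer matches), not DVC's actual implementation. The file names are temporary stand-ins, and `md5_of` and `stage_changed` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path):
    """Hash the contents of a file."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def stage_changed(deps, recorded):
    """A stage counts as changed only if a declared dependency has a new hash."""
    return any(md5_of(dep) != recorded[dep] for dep in deps)


with tempfile.TemporaryDirectory() as tmp:
    pipeline_module = Path(tmp) / "featurize.py"  # stand-in for a module in src/pipelines
    helper_module = Path(tmp) / "features.py"     # stand-in for src/features/features.py
    pipeline_module.write_text("from features import extract_features\n")
    helper_module.write_text("# feature code, version 1\n")

    # Hashes as they were saved on the previous run.
    recorded = {str(p): md5_of(p) for p in (pipeline_module, helper_module)}

    # Edit only the helper module, as in this experiment.
    helper_module.write_text("# feature code, version 2\n")

    # Only the pipeline module is declared: the edit goes unnoticed.
    undeclared = stage_changed([str(pipeline_module)], recorded)
    # If the helper were declared as well, the change would be detected.
    declared = stage_changed([str(pipeline_module), str(helper_module)], recorded)

print(undeclared, declared)
```

Under this model, declaring _src/features/features.py_ as a dependency of the featurize pipeline would be an alternative to forcing the run.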

#### Don't forget to commit and create a new tag:

```bash
git add src/features/features.py *.dvc
git commit -m "create new features - cubes of sizes"
git tag -a "exp4-cubes-of-sizes" -m "create new features - cubes of sizes"
```

# Reproduce experiment

#### Any tagged experiment can be reproduced later

#### List experiments
```bash
git tag --list
```
##### output
```bash
exp1-logreg
exp2-svm
exp3-knn
exp4-cubes-of-sizes
```
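
When the repository also contains other tags, the experiment tags can be filtered out programmatically. The sketch below assumes the `exp<N>-<name>` naming convention used in this notebook. `experiment_tags` is a hypothetical helper, and the sample string simply repeats the output shown above.

```python
import re


def experiment_tags(tag_output, pattern=r"^exp(\d+)-"):
    """Return experiment tags from `git tag --list` output, ordered by experiment number."""
    found = []
    for line in tag_output.splitlines():
        name = line.strip()
        match = re.match(pattern, name)
        if match:
            found.append((int(match.group(1)), name))
    # Sort numerically so that exp10 comes after exp2, unlike a plain string sort.
    return [name for _, name in sorted(found)]


sample_output = """exp1-logreg
exp2-svm
exp3-knn
exp4-cubes-of-sizes
"""
print(experiment_tags(sample_output))
```

In a real repository, the input could come from `subprocess.run(["git", "tag", "--list"], capture_output=True, text=True).stdout`.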

#### Select experiment
```bash
git checkout exp2-svm
```

#### Reproduce
```bash
dvc repro pipeline_evaluate.dvc
```

##### As you'll see, the pipelines will not be re-run; instead, DVC will take all dependencies and outputs from the cache

# View and compare metrics

#### Now that we have several experiments, we can view and compare their metrics

#### View the metrics of the last experiment:
```bash
dvc metrics show
```

#### View and compare metrics for all tags:
```bash
dvc metrics show -T
```