# Install and init DVC

Prerequisites:
- DVC and the packages from `requirements.txt` are installed (if not, see README.md for instructions)
- The project repository is a Git repo

## Initialize DVC

References: 
- https://dvc.org/doc/get-started/initialize 

In [None]:
!dvc init

## Commit changes

In [None]:
%%bash

git add .
git commit -m "Initialize DVC"

## Review Files and Directories created by DVC

In [None]:
!ls -a .dvc 

In [None]:
!cat .dvc/.gitignore

# Quick Tour of DVC features

## Data Versioning

In [1]:
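# Get data

import os

import pandas as pd
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)
print(list(data.target_names))  # class names of the target column

# Make sure the target directory exists before writing the CSV
os.makedirs('data', exist_ok=True)
data.frame.to_csv('data/iris.csv', index=False)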

In [2]:
# Look at the data

data.frame.head()

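|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|-------------------|------------------|-------------------|------------------|--------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |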


In [None]:
%%bash

du -sh data/*

## Add file under DVC control

In [None]:
%%bash

dvc add data/iris.csv

In [None]:
!du -sh data/*

In [None]:
!git status -s data/
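
After `dvc add`, Git sees a small pointer file (`data/iris.csv.dvc`) and a `data/.gitignore` entry instead of the dataset itself. The pointer file identifies the data by a content hash. The sketch below shows the idea with a plain MD5 over the file bytes. The helper name `file_md5` is hypothetical and not part of DVC, and the digest is only expected to match the value in the `.dvc` file on DVC versions that hash the raw bytes; treat a mismatch on other versions as normal.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large datasets do not need to fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: hash the dataset only if it exists in this checkout
target = Path("data/iris.csv")
if target.exists():
    print(file_md5(target))
```

Comparing this value with the hash stored in `data/iris.csv.dvc` is a quick way to see that the pointer file describes the file content, not its name or location.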

In [None]:
%%bash

git add .
git commit -m "Add a source dataset"

## Create and Reproduce ML pipelines 

Stages:
- extract features 
- split dataset 
- train 
- evaluate 


### Add pipeline stages to `dvc.yaml` 

```yaml
stages:
  
  featurize:
    cmd: python src/featurize.py
    deps:
    - data/iris.csv
    - src/featurize.py
    outs:
    - data/features_iris.csv
  
  split_dataset: 
    cmd: python src/split_dataset.py
    deps:
    - data/features_iris.csv
    outs:
    - data/train.csv
    - data/test.csv
    
  train: 
    cmd: python src/train.py
    deps:
    - data/train.csv
    outs:
    - data/model.joblib

  evaluate: 
    cmd: python src/evaluate.py
    deps:
    - data/model.joblib
    - data/test.csv
    outs:
    - data/eval.txt
```
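
DVC connects stages through their files: a stage that lists another stage's `outs` among its own `deps` must run after it. The sketch below reproduces that ordering for the four stages above using the standard-library `graphlib` module (Python 3.9+). The `STAGES` dictionary is a hand-written mirror of the `dvc.yaml` content, with the feature-extraction stage assumed to be named `featurize`, and `execution_order` is a hypothetical helper, not a DVC API.

```python
from graphlib import TopologicalSorter

# Hand-written mirror of the stages in dvc.yaml (an assumption for illustration)
STAGES = {
    "featurize": {
        "deps": ["data/iris.csv", "src/featurize.py"],
        "outs": ["data/features_iris.csv"],
    },
    "split_dataset": {
        "deps": ["data/features_iris.csv"],
        "outs": ["data/train.csv", "data/test.csv"],
    },
    "train": {
        "deps": ["data/train.csv"],
        "outs": ["data/model.joblib"],
    },
    "evaluate": {
        "deps": ["data/model.joblib", "data/test.csv"],
        "outs": ["data/eval.txt"],
    },
}


def execution_order(stages):
    """Order stages so that every stage runs after the producers of its deps."""
    # Map each output file to the stage that produces it
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # For each stage, collect the stages whose outputs it depends on
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))
# → ['featurize', 'split_dataset', 'train', 'evaluate']
```

Dependencies that no stage produces, such as `data/iris.csv` and `src/featurize.py`, are treated as plain inputs and do not add edges to the graph.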

### Run DVC pipeline (all stages)

In [None]:
!dvc exp run

In [None]:
!ls 

In [None]:
%%bash
git add .
git commit -m "New experiment"

## Collaborate on ML Experiments 

### Specify remote storage (a local directory: /tmp/dvc)


In [None]:
!dvc remote add -d local /tmp/dvc

### Push tracked data to remote storage

In [None]:
!dvc push
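
To check that the push reached the remote, it can help to list what now sits under `/tmp/dvc`. The sketch below walks that directory with the standard library only. The helper `list_remote_files` is hypothetical, and the file names it prints are content-addressed entries whose exact layout is a DVC internal that may differ between versions.

```python
import os


def list_remote_files(root):
    """Return the sorted relative paths of all files under root."""
    found = []
    # os.walk yields nothing if root does not exist, so the result is then empty
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


REMOTE_DIR = "/tmp/dvc"  # the local remote configured above
for rel_path in list_remote_files(REMOTE_DIR):
    print(rel_path)
```

An empty listing after `dvc push` suggests that nothing was uploaded, for example because no outputs are tracked yet or the remote path differs from the one configured.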