# Lightweight pipelines with DVC

To get started, we will build a small pipeline with DVC. The task is to classify images as either *lemons* or *bananas*.

The pipeline consists of two Python functions:

1. `preprocess(inputpath, outputpath)`, which processes an image (converts it to grayscale and resizes it to (100, 100))
1. `classify(inputpath, outputpath)`, which classifies an image and writes the result to a JSON file

Write the two functions and wrap a DVC pipeline around them.

The simplest approach is to implement both functions in a single Python file. Google's Python Fire library (`fire`) makes it easy to call `preprocess` and `classify` from the command line.

Install Fire by running `!pip install fire` in a notebook cell.

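Expose the functions on the command line with Fire:

```python
import fire

# ... implement preprocess() and classify() here ...

if __name__ == '__main__':
    # Called without arguments, Fire exposes every top-level object in the
    # module (including preprocess and classify) as a command
    fire.Fire()
```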
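One way to fill in the placeholder, as a minimal sketch: it assumes Pillow (`PIL`) for image I/O, adds optional `width`/`height` parameters (so the extra `100 100` arguments passed later in this notebook have somewhere to go), and uses a hypothetical `predict` stand-in because the actual lemon/banana model is not specified here.

```python
import json
import os

from PIL import Image, ImageStat


def preprocess(inputpath, outputpath, width=100, height=100):
    """Convert an image to grayscale and resize it to (width, height)."""
    # Create the output directory if it does not exist yet
    os.makedirs(os.path.dirname(outputpath) or ".", exist_ok=True)
    with Image.open(inputpath) as img:
        gray = img.convert("L")  # "L" = 8-bit grayscale
        gray.resize((width, height)).save(outputpath)


def predict(img):
    """Hypothetical placeholder for a real model; NOT an actual classifier.

    It thresholds the mean brightness only so the pipeline produces output.
    """
    brightness = ImageStat.Stat(img.convert("L")).mean[0] / 255.0
    label = "lemon" if brightness >= 0.5 else "banana"
    return label, brightness


def classify(inputpath, outputpath):
    """Classify a preprocessed image and write the result to a JSON file."""
    os.makedirs(os.path.dirname(outputpath) or ".", exist_ok=True)
    with Image.open(inputpath) as img:
        label, score = predict(img)
    with open(outputpath, "w") as f:
        json.dump({"image": inputpath, "label": label, "score": score}, f, indent=2)
```

Placed above the `fire.Fire()` entry point, these functions become the `preprocess` and `classify` commands; replace `predict` with a trained model before trusting the labels.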

Invoke a function through Fire:

```bash
python <your file>.py preprocess exercise-dataset-dvc/image.jpg output/preprocessed.jpg
```

In a notebook, the `%%sh` cell magic runs the whole cell as a shell script; a single shell command can also be run by prefixing it with `!`.

In [2]:
!pip install fire

Collecting fire
  Downloading https://files.pythonhosted.org/packages/5a/b7/205702f348aab198baecd1d8344a90748cb68f53bdcd1cc30cbc08e47d3e/fire-0.1.3.tar.gz
Building wheels for collected packages: fire
  Building wheel for fire (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/2a/1a/4d/6b30377c3051e76559d1185c1dbbfff15aed31f87acdd14c22
Successfully built fire
Installing collected packages: fire
Successfully installed fire-0.1.3
You are using pip version 19.0.2, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


## Testing the functions

In [23]:
%%sh 
python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg

In [24]:
%%sh 
python DVC_exercise.py classify output-exercise/test-processed.jpg output-exercise/test-result.json

## Initialize DVC

In [17]:
!pip install dvc

Collecting dvc
  Downloading https://files.pythonhosted.org/packages/70/9e/c04fd8ce62bf8dae5730864980e1cb3797188082c99f548592f63686b97c/dvc-0.29.0-py2.py3-none-any.whl (127kB)
    100% |████████████████████████████████| 133kB 2.5MB/s
Collecting asciimatics>=1.10.0 (from dvc)
  Downloading https://files.pythonhosted.org/packages/60/6a/dababee230e5220159a3518617facba78c697f4478fe30d77a370ba9dedf/asciimatics-1.10.0-py2.py3-none-any.whl (92kB)
    100% |████████████████████████████████| 102kB 23.1MB/s
Collecting distro>=1.3.0 (from dvc)
  Downloading https://files.pythonhosted.org/packages/ea/35/82f79b92fa4d937146c660a6482cee4f3dfa1f97ff3d2a6f3ecba33e712e/distro-1.4.0-py2.py3-none-any.whl
Collecting future>=0.16.0 (from dvc)
  Downloading https://files.pythonhosted.org/packages/90/52/e20466b85000a181e1e144fd8305caf2cf475e2f9674e797b222f8105f5f/future-0.17.1.tar.gz (829kB)
    100% |████████████████████████████████| 829kB 10.4MB/s
Collecting pyasn1>

In [20]:
!dvc init -f --no-scm

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|              https://dvc.org/doc/user-guide/analytics               |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc

## Define the pipeline stages

In [32]:
%%sh 
dvc run -d DVC_exercise.py \
        -d exercise-dataset-dvc/image.jpg \
        -o output-exercise/test-processed.jpg -overwrite-dvcfile \
        python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg 100 100

[31mError[39m: failed to run command - stage 'test-processed.jpg.dvc' already exists

[33mHaving any troubles?[39m Hit us up at [34mhttps://dvc.org/support[39m, we are always happy to help!


CalledProcessError: Command 'b'dvc run -d DVC_exercise.py \\\n        -d exercise-dataset-dvc/image.jpg \\\n        -o output-exercise/test-processed.jpg -overwrite-dvcfile \\\n        python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg 100 100\n'' returned non-zero exit status 1.
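The misparse of the single-dash flag can be reproduced with Python's `argparse`, assuming (as the `verwrite-dvcfile` name in the DVC errors suggests) that `dvc run` parses its flags the same way. The parser below is a toy stand-in, not DVC's actual CLI definition: short options such as `-o` accept an attached value, so `-overwrite-dvcfile` becomes `-o verwrite-dvcfile`.

```python
import argparse

# Toy stand-in for the `dvc run` CLI: option names mirror the flags used in
# this notebook, but this is NOT DVC's real parser definition.
parser = argparse.ArgumentParser(prog="toy-dvc-run")
parser.add_argument("-d", action="append", default=[], dest="deps")
parser.add_argument("-o", action="append", default=[], dest="outs")
parser.add_argument("--overwrite-dvcfile", action="store_true")


def parse(argv):
    """Parse a list of CLI tokens and return (outputs, overwrite flag)."""
    ns = parser.parse_args(argv)
    return ns.outs, ns.overwrite_dvcfile


# Single dash: "-o" swallows the rest of the token as its value
print(parse(["-o", "out.jpg", "-overwrite-dvcfile"]))   # (['out.jpg', 'verwrite-dvcfile'], False)
# Double dash: recognized as the long flag
print(parse(["-o", "out.jpg", "--overwrite-dvcfile"]))  # (['out.jpg'], True)
```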

In [33]:
%%sh
dvc run -d DVC_exercise.py \
        -d output-exercise/test-processed.jpg -overwrite-dvcfile \
        -o output-exercise/result.json
        python DVC_exercise.py classify output-exercise/test-processed.jpg output-exercise/test-result.json

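[##############################] 100% result.json

Error: failed to run command - file/directory '/keras2production/notebooks/3-dvc/output-exercise/result.json' is specified as an output in more than one stage:
    verwrite-dvcfile.dvc
    result.json.dvc

This cell fails for three reasons:

1. `-overwrite-dvcfile` is again read as `-o verwrite-dvcfile`, so DVC tries to create a new stage file, `verwrite-dvcfile.dvc`, that also declares `result.json` as an output, although `result.json.dvc` already does.
1. The `-o output-exercise/result.json` line has no trailing backslash, so the `python DVC_exercise.py classify ...` line is not passed to `dvc run`; the shell treats it as a separate command.
1. The declared output, `output-exercise/result.json`, does not match the file the command writes, `output-exercise/test-result.json`.

Use `--overwrite-dvcfile`, end every continued line with a backslash, and make the `-o` path match the output path passed to `classify`.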
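Building the `dvc run` argument list in Python avoids shell line continuations altogether. A minimal sketch, where `dvc_run_args` and `run_stage` are hypothetical helpers and the flags (`-d`, `-o`, `--overwrite-dvcfile`) are the DVC 0.x ones used in this notebook; later DVC releases reworked this command (current versions define stages with `dvc stage add`). The classify stage here writes `output-exercise/result.json`, so its stage file keeps the name `result.json.dvc` that `dvc pipeline show` refers to.

```python
import shlex
import subprocess


def dvc_run_args(deps, outs, cmd, overwrite=True):
    """Build the argv list for a DVC 0.x `dvc run` call (hypothetical helper)."""
    args = ["dvc", "run"]
    for dep in deps:
        args += ["-d", dep]
    for out in outs:
        args += ["-o", out]
    if overwrite:
        # Two dashes: a single dash would be parsed as "-o verwrite-dvcfile"
        args.append("--overwrite-dvcfile")
    # Everything after the options is the command DVC runs and records
    return args + list(cmd)


def run_stage(args):
    """Echo and run one `dvc run` call; raises CalledProcessError on failure."""
    print(" ".join(shlex.quote(a) for a in args))
    subprocess.run(args, check=True)


classify_stage = dvc_run_args(
    deps=["DVC_exercise.py", "output-exercise/test-processed.jpg"],
    outs=["output-exercise/result.json"],
    cmd=["python", "DVC_exercise.py", "classify",
         "output-exercise/test-processed.jpg", "output-exercise/result.json"],
)
```

In the notebook, `run_stage(classify_stage)` would execute the stage; the preprocess stage can be built the same way, with `exercise-dataset-dvc/image.jpg` as a dependency and `output-exercise/test-processed.jpg` as its output.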


## Look at the pipeline

In [27]:
%%sh
dvc pipeline show result.json.dvc --ascii

+------------------------+ 
| test-processed.jpg.dvc | 
+------------------------+ 
             *             
             *             
             *             
    +-----------------+    
    | result.json.dvc |    
    +-----------------+    
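`dvc pipeline show` draws an edge from `test-processed.jpg.dvc` to `result.json.dvc` because an output of the first stage is a dependency of the second. The toy model below sketches that rule, with plain dicts standing in for the `.dvc` files (an assumption: real stage files also record the command and checksums). It also rejects an output claimed by two stages, the same condition as the "specified as an output in more than one stage" error above. `build_edges` and `run_order` are hypothetical names, not DVC functions.

```python
from collections import deque


def build_edges(stages):
    """Return (upstream, downstream) pairs: upstream produces a file downstream uses."""
    producer = {}
    for name, stage in stages.items():
        for out in stage["outs"]:
            if out in producer:
                raise ValueError(f"{out} is an output of both {producer[out]} and {name}")
            producer[out] = name
    edges = []
    for name, stage in stages.items():
        for dep in stage["deps"]:
            if dep in producer and producer[dep] != name:
                edges.append((producer[dep], name))
    return edges


def run_order(stages):
    """Topologically sort the stages (Kahn's algorithm) so producers run first."""
    indegree = {name: 0 for name in stages}
    children = {name: [] for name in stages}
    for up, down in build_edges(stages):
        children[up].append(down)
        indegree[down] += 1
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(stages):
        raise ValueError("pipeline contains a cycle")
    return order


stages = {
    "test-processed.jpg.dvc": {
        "deps": ["DVC_exercise.py", "exercise-dataset-dvc/image.jpg"],
        "outs": ["output-exercise/test-processed.jpg"],
    },
    "result.json.dvc": {
        "deps": ["DVC_exercise.py", "output-exercise/test-processed.jpg"],
        "outs": ["output-exercise/result.json"],
    },
}
print(run_order(stages))  # ['test-processed.jpg.dvc', 'result.json.dvc']
```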


In [29]:
%%sh
dvc pipeline -h

usage: dvc pipeline [-h] [-q | -v] {show,list} ...

Manage pipeline.

positional arguments:
  {show,list}    Use dvc pipeline CMD --help for command-specific help.
    show         Show pipeline.
    list         List pipelines.

optional arguments:
  -h, --help     show this help message and exit
  -q, --quiet    Be quiet.
  -v, --verbose  Be verbose.
