# Lightweight pipelines with DVC

To get started with DVC, we will build a small pipeline. The task is to classify images as either *lemons* or *bananas*.

The pipeline consists of two Python functions:

1. `preprocess(inputpath, outputpath)`, which processes an image (converts it to grayscale and resizes it to (100, 100))
1. `classify(inputpath, outputpath)`, which classifies an image and writes the result to a JSON file
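A minimal sketch of the two functions, assuming Pillow for image I/O (the exercise does not prescribe a library). The classifier is a hypothetical placeholder (`predict_label`, a mean-brightness threshold) meant to be replaced by a real model, and the JSON keys are our own choice.

```python
import json
import os

from PIL import Image, ImageStat  # assumption: Pillow is installed (pip install pillow)

TARGET_SIZE = (100, 100)


def _ensure_parent_dir(path):
    """Create the directory that will contain `path`, if there is one."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)


def preprocess(inputpath, outputpath):
    """Convert an image to grayscale, resize it to 100x100 and save it."""
    _ensure_parent_dir(outputpath)
    with Image.open(inputpath) as img:
        img.convert("L").resize(TARGET_SIZE).save(outputpath)
    print("Finished")


def predict_label(gray_img):
    """Hypothetical placeholder classifier: bright images -> 'lemon', darker -> 'banana'.

    The threshold of 128 is arbitrary; replace this with a trained model.
    """
    mean = ImageStat.Stat(gray_img).mean[0]  # mean of the single grayscale band
    return ("lemon" if mean >= 128 else "banana"), mean


def classify(inputpath, outputpath):
    """Classify one image and write the result to a JSON file."""
    _ensure_parent_dir(outputpath)
    with Image.open(inputpath) as img:
        label, mean = predict_label(img.convert("L"))
    result = {"image": inputpath, "label": label, "mean_brightness": mean}
    with open(outputpath, "w") as f:
        json.dump(result, f, indent=2)
```

Saving the preprocessed image as `.jpg` works because JPEG supports the single-channel `L` mode produced by `convert("L")`.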

Write the two functions and wrap a pipeline around them using DVC.

The simplest approach is to implement the functions in a Python file. Google's Python Fire library (`fire`) is an easy way to invoke `preprocess` and `classify` from the command line.

Install fire:

```
!pip install fire
```

Use fire:

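```python
import fire

# Define preprocess(inputpath, outputpath) and classify(inputpath, outputpath) here.
...
...
...

if __name__ == '__main__':
    # Called without arguments, Fire exposes the functions defined in this
    # module as subcommands, e.g. `python <your file>.py preprocess IN OUT`.
    fire.Fire()
```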

Invoke a function through fire:

```bash
python <your file>.py preprocess exercise-dataset-dvc/image.jpg output/preprocessed.jpg
```

You can use the `%%sh` cell magic to run shell commands in a notebook cell.
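Outside a notebook, or if you prefer plain Python to cell magics, `subprocess.run` can run the same commands. The helper name `sh` is hypothetical; unlike `%%sh` it does not start a shell, so pipes, redirects and globbing are not available.

```python
import shlex
import subprocess


def sh(command):
    """Run a shell-style command string and return its stdout.

    The string is split with shlex (no shell is involved), and
    CalledProcessError is raised if the command exits with a non-zero status.
    """
    completed = subprocess.run(
        shlex.split(command), check=True, capture_output=True, text=True
    )
    return completed.stdout
```

For example: `print(sh("python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg"))`.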

## Testing the functions

```bash
%%sh
python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg
```

```
Finished
```


```bash
%%sh
python DVC_exercise.py classify output-exercise/test-processed.jpg output-exercise/test-result.json
#python DVC_exercise.py classify exercise-dataset-dvc/image.jpg output-exercise/test-result.json
```

## Initialize DVC

```
!dvc init -f --no-scm
```

```
+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|              https://dvc.org/doc/user-guide/analytics               |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc
```

## Invoke the functions as DVC stages

```bash
%%sh
dvc run -d DVC_exercise.py \
    -d exercise-dataset-dvc/image.jpg \
    -o output-exercise/test-processed.jpg \
    python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg
```

```
Running command:
	python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg
Finished
Saving 'output-exercise/test-processed.jpg' to cache '.dvc/cache'.
Saving information to 'test-processed.jpg.dvc'.

To track the changes with git run:

	git add test-processed.jpg.dvc
```
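The log above mentions saving the output to the cache `.dvc/cache`. To our understanding, DVC releases of this era (before 3.0) store a file in the cache by content: its MD5 checksum names the entry, split into a two-character directory and the remaining 30 characters. The sketch below reproduces that layout for a binary file such as this JPEG; it illustrates the idea and is not DVC's own code (`file_md5` and `cache_path` are hypothetical names).

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(path, cache_dir=".dvc/cache"):
    """Content-addressed location of `path`: <cache_dir>/<md5[:2]>/<md5[2:]>."""
    md5 = file_md5(path)
    return os.path.join(cache_dir, md5[:2], md5[2:])
```

Because the stage file records such checksums for its dependencies and outputs, DVC can compare them with the files on disk to decide whether a stage is up to date.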


```bash
%%sh
dvc run -h
```

```
usage: dvc run [-h] [-q | -v] [-d DEPS] [-o OUTS] [-O OUTS_NO_CACHE]
               [-m METRICS] [-M METRICS_NO_CACHE] [-f FILE] [-c CWD]
               [--no-exec] [-y] [--overwrite-dvcfile] [--ignore-build-cache]
               [--remove-outs] [--no-commit]
               ...

Generate a stage file from a given command and execute the command.

positional arguments:
  command               Command to execute.

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.
  -d DEPS, --deps DEPS  Declare dependencies for reproducible cmd.
  -o OUTS, --outs OUTS  Declare output file or directory.
  -O OUTS_NO_CACHE, --outs-no-cache OUTS_NO_CACHE
                        Declare output file or directory (do not put into DVC
                        cache).
  -m METRICS, --metrics METRICS
                        Declare output metric file or directory.
  -M METRICS_NO_CACHE, --metrics-no-cache METRICS_NO_CAC
```
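The exercise also needs a second stage for `classify`. Using only the `-d` and `-o` flags listed in the help, the call could be built as in the sketch below, written in Python so it can be passed to `subprocess`; `classify_stage_command` and the constants are hypothetical names. The essential part is declaring `output-exercise/test-processed.jpg`, the output of the first stage, as a dependency: DVC connects stages whose dependencies match another stage's outputs.

```python
SCRIPT = "DVC_exercise.py"
PROCESSED = "output-exercise/test-processed.jpg"
RESULT = "output-exercise/test-result.json"


def classify_stage_command():
    """argv for a `dvc run` call that registers the classify step as a stage.

    Listing the preprocess output as a dependency (-d) is what links this
    stage to the preprocess stage in the pipeline.
    """
    return [
        "dvc", "run",
        "-d", SCRIPT,
        "-d", PROCESSED,
        "-o", RESULT,
        "python", SCRIPT, "classify", PROCESSED, RESULT,
    ]
```

Run it with `subprocess.run(classify_stage_command(), check=True)`, or type the equivalent `dvc run` line in a `%%sh` cell.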

## Look at the pipeline

```bash
%%sh
dvc pipeline -h
```

```
usage: dvc pipeline [-h] [-q | -v] {show,list} ...

Manage pipeline.

positional arguments:
  {show,list}    Use dvc pipeline CMD --help for command-specific help.
    show         Show pipeline.
    list         List pipelines.

optional arguments:
  -h, --help     show this help message and exit
  -q, --quiet    Be quiet.
  -v, --verbose  Be verbose.
```
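`dvc pipeline show` displays a pipeline whose stages are connected through matching outputs and dependencies. The sketch below illustrates that idea in plain Python with `graphlib` (Python 3.9+); it is not DVC's implementation, and the `STAGES` dict is a hypothetical stand-in for the information stored in the `.dvc` stage files.

```python
from graphlib import TopologicalSorter

# Hypothetical description of the two stages: their dependencies and outputs.
STAGES = {
    "preprocess": {
        "deps": ["DVC_exercise.py", "exercise-dataset-dvc/image.jpg"],
        "outs": ["output-exercise/test-processed.jpg"],
    },
    "classify": {
        "deps": ["DVC_exercise.py", "output-exercise/test-processed.jpg"],
        "outs": ["output-exercise/test-result.json"],
    },
}


def execution_order(stages):
    """Order stages so that each runs after the stages producing its dependencies."""
    # Map every declared output to the stage that produces it.
    producer = {out: name for name, st in stages.items() for out in st["outs"]}
    # For each stage, its predecessors are the producers of its dependencies.
    graph = {
        name: {producer[dep] for dep in st["deps"] if dep in producer}
        for name, st in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Here `execution_order(STAGES)` returns `['preprocess', 'classify']`: `DVC_exercise.py` is produced by no stage, so only the shared image file creates an edge.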
