# Luigi Classification Pipeline

To get started, we will build a small Luigi pipeline. The task is to classify images as either *lemons* or *bananas*.

Write three tasks:

1. Check for daily data
1. Preprocess images (convert to grayscale, resize to (100, 100))
1. Classify the image and write the result to a JSON file

## Hints and Tricks for OpenCV

Read an image from disk:
```python
img = cv2.imread("path", cv2.IMREAD_COLOR)
```

Resize an image:
```python
img = cv2.resize(img, (X_SIZE, Y_SIZE))  # target size is given as (width, height)
```

Convert image to grayscale:
```python
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
```

Write an image to disk:
```python
cv2.imwrite("path", img)
```

Find circles to identify lemons:
```python
# img must be an 8-bit, single-channel (grayscale) image
circles = cv2.HoughCircles(img,
                           cv2.HOUGH_GRADIENT,
                           dp=2,
                           minDist=15,
                           param1=100,
                           param2=70)
# circles is None when no circle is found
```

## Imports

In [12]:
import json
from datetime import date
import luigi
from luigi.parameter import DateParameter, Parameter
from luigi import LocalTarget, Task, WrapperTask
from luigi.tools.range import RangeDailyBase
import cv2

## Task 1: Check for daily data

In [85]:
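class CheckDailyData(Task):
    """Represents the daily input image; the task is complete once the file exists."""
    date = DateParameter(default=date.today())
    path = Parameter(default='/keras2production/notebooks/2-luigi/exercise-dataset/daily')

    def output(self):
        # One folder per day, named MM-DD-YYYY, e.g. 02-19-2018/image.jpg
        return LocalTarget("%s/%s/image.jpg" % (self.path, self.date.strftime('%m-%d-%Y')))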
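
The dated folder layout used in `output()` can be checked without Luigi. The following is a minimal sketch, assuming a hypothetical helper `daily_image_path` that is not part of the pipeline and only mirrors the same `MM-DD-YYYY` naming:

```python
from datetime import date


def daily_image_path(base_dir, day, filename="image.jpg"):
    """Build <base_dir>/<MM-DD-YYYY>/<filename>, mirroring the pattern used in output()."""
    return "%s/%s/%s" % (base_dir, day.strftime("%m-%d-%Y"), filename)


# base_dir is the default value of the `path` parameter of the task above
base_dir = "/keras2production/notebooks/2-luigi/exercise-dataset/daily"
print(daily_image_path(base_dir, date(2018, 2, 19)))
# prints: /keras2production/notebooks/2-luigi/exercise-dataset/daily/02-19-2018/image.jpg
```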

## Task 2: Preprocess input image

In [56]:
class Preprocess(Task):
    date = DateParameter(default=date.today())
    path = Parameter(default='/keras2production/notebooks/2-luigi/exercise-dataset/daily')

    def requires(self):
        # Pass the path along so that both tasks look in the same folder
        return CheckDailyData(date=self.date, path=self.path)

    def output(self):
        return LocalTarget("%s/%s/preprocessed.jpg" % (self.path, self.date.strftime('%m-%d-%Y')))

    def run(self):
        # self.input() is the output target of the required task
        im = cv2.imread(self.input().path, cv2.IMREAD_COLOR)
        im = cv2.resize(im, (100, 100))
        im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(self.output().path, im)

## Task 3: Classify image

In [78]:
# Try the preprocessing steps by hand on a single image
im_path = '/keras2production/notebooks/2-luigi/exercise-dataset/daily/02-19-2018/image.jpg'
im = cv2.imread(im_path, cv2.IMREAD_COLOR)
im = cv2.resize(im, (100, 100))
im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)


In [114]:
class Classify(Task):
    date = DateParameter(default=date.today())
    path = Parameter(default='/keras2production/notebooks/2-luigi/exercise-dataset/daily')

    def requires(self):
        # Pass the path along so that all tasks look in the same folder
        return Preprocess(date=self.date, path=self.path)

    def output(self):
        out_path = "%s/%s/result.json" % (self.path, self.date.strftime('%m-%d-%Y'))
        return LocalTarget(out_path)

    def run(self):
        # The preprocessed image is already grayscale
        im = cv2.imread(self.input().path, cv2.IMREAD_GRAYSCALE)
        circles = cv2.HoughCircles(im,
                                   cv2.HOUGH_GRADIENT,
                                   dp=2,
                                   minDist=15,
                                   param1=100,
                                   param2=70)

        # Circles identify lemons, so no circles (None) means banana
        classes = dict()
        if circles is None:
            classes['class'] = 'banana'
        else:
            classes['class'] = 'lemon'
        # Writing through the target only makes the file visible once it is fully written
        with self.output().open('w') as fp:
            json.dump(classes, fp)

In [115]:
Classify(date(2018,2,19)).run()
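
Once the task has run, the result file can be inspected directly. The following is a minimal sketch, assuming a hypothetical helper `read_result` and the `result.json` layout written by `Classify` (a single `class` key). The demo writes a stand-in file into a temporary folder so that it runs without the exercise dataset:

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def read_result(base_dir, day):
    """Return the predicted class stored in <base_dir>/<MM-DD-YYYY>/result.json."""
    result_file = Path(base_dir) / day.strftime("%m-%d-%Y") / "result.json"
    with open(result_file) as fp:
        return json.load(fp)["class"]


# Stand-in data: a temporary folder with a hand-written result file
with tempfile.TemporaryDirectory() as tmp:
    day_dir = Path(tmp) / "02-19-2018"
    day_dir.mkdir()
    (day_dir / "result.json").write_text(json.dumps({"class": "lemon"}))
    print(read_result(tmp, date(2018, 2, 19)))  # prints: lemon
```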

In [116]:
luigi.build([Classify(date(2018,2,19))], local_scheduler=True, no_lock=True)

DEBUG: Checking if Classify(date=2018-02-19, path=/keras2production/notebooks/2-luigi/exercise-dataset/daily) is complete
INFO: Informed scheduler that task   Classify_2018_02_19__keras2productio_943a9dd3ff   has status   DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=744741479, workers=1, host=dff90e958f89, username=root, pid=12) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 complete ones were encountered:
    - 1 Classify(date=2018-02-19, path=/keras2production/notebooks/2-luigi/exercise-dataset/daily)

Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====



True

## Daily jobs and backfilling

Now we can classify a single image that is identified by its save date. Luigi becomes even more useful for backfilling: with the *RangeDailyBase* wrapper task, we can run the pipeline we already built on all three images.

```python
RangeDailyBase(of=TASK, start=START_DATE, stop=END_DATE, days_back=ALLOWED_DAYS_INTO_PAST)
```
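
The idea behind backfilling can be illustrated in plain Python: walk over every day in a date range and collect the days whose output is still missing. This is only a sketch of the concept, not Luigi's implementation. The helper `missing_days`, the example dates, and the choice to treat `stop` as exclusive are assumptions made for this illustration:

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def missing_days(base_dir, start, stop):
    """List the days in [start, stop) that have no result.json yet."""
    days = []
    day = start
    while day < stop:
        result_file = Path(base_dir) / day.strftime("%m-%d-%Y") / "result.json"
        if not result_file.exists():
            days.append(day)
        day += timedelta(days=1)
    return days


# Stand-in data: three hypothetical days, one of which is already classified
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "02-18-2018"
    done.mkdir()
    (done / "result.json").write_text('{"class": "banana"}')
    todo = missing_days(tmp, date(2018, 2, 17), date(2018, 2, 20))
    print([d.isoformat() for d in todo])  # prints: ['2018-02-17', '2018-02-19']
```

A range task does the equivalent bookkeeping for you and then schedules the pipeline for each missing day.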

In [None]:
# Put the RangeDailyBase task into the list to run the backfill
luigi.build([], local_scheduler=True, no_lock=True)