# Object Detection: Moving towards a production pipeline

This example illustrates a common Object Detection use case using [Pachyderm](https://www.pachyderm.com/), [Lightning Flash](https://lightning-flash.readthedocs.io/en/latest/), and [Label Studio](https://labelstud.io/). 

<p align="center">
	<img src='images/diagram.png' width='500' title='Pachyderm'>
</p>

This demo mimics the object detection [example from Lightning Flash](https://lightning-flash.readthedocs.io/en/stable/reference/object_detection.html#example). We extend the example to run predictions on new data. Those predictions can then be loaded through the [Pachyderm Label Studio integration](https://github.com/pachyderm/label-studio) to refine and improve your training data.

## Step 1: Upload Dataset

In [None]:
!wget https://github.com/zhiqwang/yolov5-rt-stack/releases/download/v0.3.0/coco128.zip
!unzip coco128.zip

In [None]:
!pachctl create repo coco128

In [None]:
!pachctl put file -r coco128@master:/coco128/ -f coco128

## Step 2: Train Model

In [None]:
!pachctl create pipeline -f pachyderm/model.json

## Step 3: Predict on Inference Data
Our `predictions` pipeline combines our `inference_images` data repo with the output of our `model` pipeline.

In this example, we cross these two inputs via the spec: 
```yaml
input:
  cross:
  - pfs:
      repo: inference_images
      glob: "/*"
  - pfs:
      repo: model
      glob: "/"
```

This means that each time an image is added to or changed in `inference_images`, it is processed independently. Whenever our model is retrained, however, all of the images in `inference_images` are reprocessed.
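
As a rough mental model (not Pachyderm's actual implementation), a cross input pairs every datum from one input with every datum from the other. The sketch below illustrates this with plain Python; the file names are hypothetical and `cross_datums` is an illustrative helper, not part of any Pachyderm API.

```python
from itertools import product


def cross_datums(image_datums, model_datums):
    """Pair every image datum with every model datum, like a cross input."""
    return list(product(image_datums, model_datums))


# Hypothetical repo contents, for illustration only.
images = ["/dog1.jpeg", "/dog2.jpeg"]  # glob "/*": one datum per top-level file
model = ["/"]                          # glob "/": the whole repo is a single datum

datums = cross_datums(images, model)
for image_path, model_path in datums:
    print(f"datum: inference_images{image_path} x model{model_path}")
```

Because the `model` input contributes a single datum, the number of datums equals the number of images. A new image adds one new pair, while a change to the model changes every pair, which is why all images are reprocessed after retraining.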

In [None]:
# Create a repo for the inference data
!pachctl create repo inference_images

In [None]:
# Deploy prediction pipeline
!pachctl create pipeline -f pachyderm/predictions.json

In [None]:
# Add data to be predicted on (pipelines run automatically)
!pachctl put file -r inference_images@master:/dog1.jpeg -f images/dog1.jpeg

We'll now deploy one more pipeline that lets us visualize the bounding boxes our model predicted. Everything is versioned, so if our model changes, we'll be able to see how the bounding boxes differ over time.

In [None]:
!pachctl create pipeline -f pachyderm/bbox.json
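
Beyond comparing the rendered images by eye, one common way to quantify how much a predicted box moved between two model versions is intersection over union (IoU). The sketch below is a minimal, standalone illustration. It assumes boxes are given as `(xmin, ymin, xmax, ymax)` in pixels, which may not match the format this example's pipelines write, and the coordinates are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes for the same object from two model versions.
old_box = (10, 10, 110, 110)
new_box = (60, 10, 160, 110)
print(f"IoU between versions: {iou(old_box, new_box):.3f}")
```

An IoU of 1.0 means the two versions agree exactly on that box, while values near 0 indicate the prediction shifted substantially.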

## Step 4: Edit Predictions in Label Studio
In order to load the predictions into Label Studio, follow the [Label Studio integration](https://github.com/pachyderm/examples/tree/master/label-studio) to run the server locally using our pre-built Docker container, passing it your Pachyderm config. 

Once it is running, continue with the steps below. 

In [None]:
# Create labels repo to push our annotations to
!pachctl create repo labels

1. Create an Object Detection with Bounding Boxes project in Label Studio. 
2. Paste in the class list from `./classes_raw.txt`
3. Configure Label Studio's Cloud Storage to:
**Source Storage**: `inference_images@master`, `predictions@master` (make sure to sync inference_images first so the image files exist for the predictions when they're imported)

<p align="center">
	<img src='images/inference_images_config.png' width='600' title='Pachyderm'>
</p>


<p align="center">
	<img src='images/predictions_config.png' width='600' title='Pachyderm'>
</p>

Note: We need to treat every object as a source file for `inference_images`, but not for `predictions`.

**Target Storage**: `labels`

After configuring and syncing everything, your Cloud Storage settings should look like this: 

<p align="center">
	<img src='images/ls_cloud_storage_config.png' width='600' title='Pachyderm'>
</p>

In [None]:
# View the classes available
#!cat classes_raw.txt

Now you can edit the labels for your data, and once you're satisfied with your progress, sync your labels to Pachyderm with the `Sync Storage` option on your `labels` data repository in the Cloud Storage settings. 
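
For reference, Label Studio displays pre-annotations when the imported tasks carry a `predictions` list in its JSON format, with rectangle coordinates expressed as percentages of the image size. The sketch below shows what one such task could look like. It assumes the default `from_name="label"` and `to_name="image"` of the bounding-box template, pixel boxes as `(xmin, ymin, xmax, ymax)`, and a placeholder image URL; the exact output written by this example's `predictions` pipeline is not shown here and may differ.

```python
import json


def to_label_studio_result(box, label, img_w, img_h,
                           from_name="label", to_name="image"):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a Label Studio result."""
    xmin, ymin, xmax, ymax = box
    return {
        "from_name": from_name,  # assumed name of the RectangleLabels tag
        "to_name": to_name,      # assumed name of the Image tag
        "type": "rectanglelabels",
        "original_width": img_w,
        "original_height": img_h,
        "value": {
            # Label Studio expects percentages of the image dimensions.
            "x": 100.0 * xmin / img_w,
            "y": 100.0 * ymin / img_h,
            "width": 100.0 * (xmax - xmin) / img_w,
            "height": 100.0 * (ymax - ymin) / img_h,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }


# Hypothetical prediction for a 640x480 image; the URL is a placeholder.
result = to_label_studio_result((64, 48, 320, 240), "dog", 640, 480)
task = {
    "data": {"image": "<image-url-as-seen-by-label-studio>"},
    "predictions": [{"result": [result]}],
}
print(json.dumps(task, indent=2))
```

The label passed in should be one of the classes pasted into the project from `./classes_raw.txt`, otherwise Label Studio cannot match it to a configured label.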

## Clean up

In [None]:
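# Delete downstream pipelines first (assumes pachyderm/bbox.json names its pipeline `bbox`)
!pachctl delete pipeline bbox
!pachctl delete pipeline predictions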
!pachctl delete pipeline model
!pachctl delete repo labels
!pachctl delete repo inference_images
!pachctl delete repo coco128