# Deploy DFG IR on FPGA

This tutorial continues with how to finally deploy the algorithm, in this case our face detection SSD model, onto an FPGA. We have talked about the network itself and TensorFlow as a *DSL*, and we are going to finish off by using the platform introduced in previous tutorials to take this network and actually deploy it onto the FPGA.

Given that you have installed `plumber` in Tutorial 1, we are now going to use it to generate a *D*ata-*F*low *G*raph (DFG) that can be passed down the platform to reach the final execution on FPGA and CPU.

As a quick summary, the platform consists of multiple parts. *plumber* is a web-based application that takes a templated description of a machine learning algorithm, optimises it and creates a DFG that is then passed into *raintime*. *raintime* then instantiates computation nodes, which are either processed on the CPU or offloaded to an FPGA accelerator. *rainman* then takes the FPGA templates and synthesises them on the device itself, while interconnecting with the nodes instantiated on the CPU. All of this can be visualised in a simple diagram:

![Flowchart.png](../data/figs/platform_flowchart.png)

To get started with the SSD example, make sure that you have the checkpoint files of your model and that `plumber_cli` is installed in your virtual environment.

Then we can use `plumber_cli` step by step to create a DFG that can be loaded on the FPGA.

## 1. Step: Freezing a model
Make sure that the checkpoint directory generated by your training/retraining session contains these files:

`checkpoint`: a file that contains meta information about the checkpoint directory, including its data files and index file.

`*.meta`: the meta information about your model

`*.data`: weights data

`*.index`: the index file
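
Before freezing, it can be worth checking that none of these files is missing. The following is a minimal sketch of such a check; the helper `missing_checkpoint_files` is hypothetical and not part of `plumber`. It matches `*.data*` rather than `*.data` on the assumption that TensorFlow appends a shard suffix such as `-00000-of-00001` to the data file name.

```python
import glob
import os
import tempfile


def missing_checkpoint_files(ckpt_dir):
    """Return the patterns from the list above that match no file in ckpt_dir."""
    # "*.data*" also matches sharded names such as model.ckpt.data-00000-of-00001.
    patterns = ["checkpoint", "*.meta", "*.data*", "*.index"]
    return [p for p in patterns if not glob.glob(os.path.join(ckpt_dir, p))]


# Demonstration on a throw-away directory with hypothetical file names.
with tempfile.TemporaryDirectory() as ckpt_dir:
    for name in ["checkpoint", "model.ckpt.meta", "model.ckpt.index"]:
        open(os.path.join(ckpt_dir, name), "w").close()
    # The weights file was left out on purpose, so it is reported as missing.
    missing = missing_checkpoint_files(ckpt_dir)
    print(missing)
```

An empty list means that all four kinds of file were found.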

These files are now going to be imported into `plumber` and converted into a representation that the platform *understands* and can optimise.

```bash
$ plumber_cli freeze /tmp/ssd_ckpt -d /tmp/ssd
```

`/tmp/ssd_ckpt`: is the checkpoint directory

`/tmp/ssd`: is the output directory


## 2. Step: Creating a DFG

From the files that you have just created you can build a raw Data-Flow Graph, again by using `plumber_cli`:

```bash
$ plumber_cli dfg /tmp/ssd.pb /tmp/ssd_dfg.pb --dfg-text-file=/tmp/ssd_dfg.pbtxt --dfg-data-file=/tmp/ssd_dfg.h5 --input-image-shape=1,256,256,3
```

`/tmp/ssd.pb`: is the plumber file describing the network as a plumber binary file

`/tmp/ssd_dfg.pb`: is the plumber template for a DFG

`/tmp/ssd_dfg.pbtxt`: is the description of the DFG in a text format

`/tmp/ssd_dfg.h5`: is a data-file describing input/output sizes, important for random data generation or weights extraction

`1,256,256,3`: is the input image shape, in our case one 256x256 image with three channels per batch. N.B.: the format is batch size, height, width, number of channels.
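
To make the ordering of the shape string concrete, here is a small sketch that parses it into named fields. The helper `parse_image_shape` and the `ImageShape` type are hypothetical illustrations and say nothing about how `plumber_cli` parses the flag internally.

```python
from collections import namedtuple

# Field order follows the tutorial: batch size, height, width, channels.
ImageShape = namedtuple("ImageShape", ["batch", "height", "width", "channels"])


def parse_image_shape(text):
    """Parse a comma-separated shape string such as "1,256,256,3"."""
    values = [int(v) for v in text.split(",")]
    if len(values) != 4 or any(v <= 0 for v in values):
        raise ValueError("expected four positive integers, got %r" % text)
    return ImageShape(*values)


shape = parse_image_shape("1,256,256,3")
# Total number of input values per batch: 1 * 256 * 256 * 3.
n_values = shape.batch * shape.height * shape.width * shape.channels
print(shape, n_values)
```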

## 3. Step: Optimizing DFG
`plumber` can computationally optimise the DFG with respect to the hardware that the platform targets, which can result in improved accuracy and speed.

You can do it simply with `plumber_cli`:

```bash
$ plumber_cli dfg_opt --dfg-file=/tmp/ssd_dfg.pb --dfg-data-file=/tmp/ssd_dfg.h5 --opt-dfg-file=/tmp/ssd_opt_dfg.pbtxt --logdir=/tmp/logs
```

This will now take the original DFG described in `/tmp/ssd_dfg.pb` and optimize it to maximise the gain from our platform:

`/tmp/ssd_dfg.pb`: is the DFG file that we have created in the previous step

`/tmp/ssd_dfg.h5`: is the data file that we have created in the previous step

`/tmp/ssd_opt_dfg.pbtxt`: this is the new, optimised, pbtxt

`/tmp/logs`: this is the logging directory

Now let's move to the next step, which actually results in execution on an embedded system with an FPGA.

In case you want to skip these two steps you can also find the data generated at this link: [Link](https://s3-eu-west-1.amazonaws.com/coreraincifardata/ssd_6b_data.zip).

## 4. Step: Importing DFG into `raintime`

Just as a quick recap: `raintime` is a software runtime library for processing CNNs on embedded FPGA systems. Computation nodes in a CNN can either be processed on the CPU or offloaded to the FPGA accelerator design built by `rainman`. It also consists of several parts, which can be summarised in a diagram without going into detail:

![raintime.png](../data/figs/raintime.png)

The nodes themselves are already implemented in `raintime`, but we need to streamline the execution and say which data we want to extract. In later versions this step is going to be completely automatic, but at the moment we have to write a short demo.

A demo written with `raintime` to execute the network on the FPGA would have several important steps:

```C++
  // Fragment from the body of the demo's main(). It assumes that the OpenCV and
  // raintime headers are included and that dfg_file_name, data_dir, dfg,
  // data_map and runner are set up by the surrounding code.
  int batch_size = 1;
  int n_channels = 3;
  int img_size = 256;  // matches --input-image-shape=1,256,256,3

  // Load image
  cv::Mat image = cv::imread(argv[1], cv::IMREAD_COLOR);

  // Reorder the pixel values from the default interleaved ordering of OpenCV
  // into one plane per channel
  std::vector<uint8_t> converted(n_channels * img_size * img_size);
  auto image_pointer = image.ptr();
  for (size_t i = 0; i < n_channels; i++) {
    for (size_t j = 0; j < img_size * img_size; j++) {
      converted[j + img_size * img_size * i] = image_pointer[n_channels * j + i];
    }
  }

  // Load DFGDef
  auto dfg_def = LoadDFGDefFromFile(dfg_file_name);

  // Use the integrated builder to build the graph and make abstractions to connect the CPU and FPGA
  *dfg = DFGBuilder(dfg_def).Build();

  // Load constant data map, including weights and biases
  *data_map = new DFGDataMap;
  (*data_map)->LoadFromDir(data_dir);

  // Dimensions of the input tensor
  std::vector<int> dims;
  dims.push_back(n_channels);
  dims.push_back(img_size);
  dims.push_back(img_size);
  dims.push_back(batch_size);

  // Load input data-map into the DFG, without any particular pre-processing optimisations
  DFGDataMap *input_data_map = new DFGDataMap;
  input_data_map->LoadImage(converted, img_size, n_channels,
                            "input_tensor", dims, "no");

  // Run the graph and extract the output data map, then read its "Predictions" entry
  auto output_data_map = runner->Run(dfg, data_map, input_data_map, true);
  auto output_data = output_data_map->get("Predictions").second;

  std::cout << "The number of detected faces: " << output_data_map->size() << std::endl;

  // House-keeping
  delete input_data_map;
  delete output_data_map;
```
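
The pixel reordering loop in the demo is easiest to follow on a tiny example. The sketch below mirrors its index arithmetic in plain Python; the function name `hwc_to_chw` is made up for illustration and is not a `raintime` call. It takes a flat buffer in which the channel values of each pixel sit next to each other and returns a buffer with one contiguous plane per channel, leaving the order of the channels themselves unchanged.

```python
def hwc_to_chw(pixels, height, width, channels):
    """Reorder a flat interleaved (HWC) buffer into a planar (CHW) one."""
    plane = height * width
    if len(pixels) != plane * channels:
        raise ValueError("buffer length does not match the given shape")
    out = [0] * len(pixels)
    for c in range(channels):
        for j in range(plane):
            # Same index arithmetic as the C++ loop in the demo.
            out[c * plane + j] = pixels[channels * j + c]
    return out


# A 1x2 image with three channels: pixel 0 = (10, 11, 12), pixel 1 = (20, 21, 22).
interleaved = [10, 11, 12, 20, 21, 22]
planar = hwc_to_chw(interleaved, height=1, width=2, channels=3)
print(planar)  # [10, 20, 11, 21, 12, 22]
```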

Once you have finished writing the demo, you have to compile your design on the board.

## 5. Step: Compilation

This is fairly easy. `raintime` has several settings for how to compile a project, but we will try to avoid the details. Once connected to the board with a preinstalled OS and a correct `BOOT.bin`, you would clone the raintime project with your demo and compile it.

```bash
$ git clone https://github.com/corerain/raintime.git
$ cd raintime
$ mkdir build && cd build
# Create the compiling structure through CMake and specify the number of fraction bits (FB) for a 32 bit representation
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DDEF_FIXED_NUM_FB_32=20 -DBUILD_TESTS=ON
$ make
$ ./ssd_6b_demo
```
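
To give a feeling for what 20 fraction bits in a 32-bit word mean numerically, here is a small sketch of fixed-point quantisation. Only the bit counts come from the `cmake` flag above. The rest is an assumption made for illustration: a signed two's-complement word, rounding to nearest and saturation on overflow, which may differ from what `raintime` does internally. The helpers `to_fixed` and `to_float` are hypothetical.

```python
FRACTION_BITS = 20  # matches -DDEF_FIXED_NUM_FB_32=20 above
WORD_BITS = 32


def to_fixed(x, fb=FRACTION_BITS, bits=WORD_BITS):
    """Quantise a float to a signed fixed-point integer, saturating on overflow."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, int(round(x * (1 << fb)))))


def to_float(q, fb=FRACTION_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / float(1 << fb)


resolution = to_float(1)           # smallest representable step, 2**-20
largest = to_float((1 << 31) - 1)  # just under 2**11 = 2048
error = abs(to_float(to_fixed(0.1)) - 0.1)  # rounding error for 0.1
print(resolution, largest, error)
```

Under these assumptions, more fraction bits give a finer resolution but a smaller representable range, which is the trade-off behind choosing this value.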

... and voilà! You have described an algorithm in Python/TensorFlow and now you are executing it on an FPGA. Great, isn't it?

The process up to raintime is also available as a web application at: [Link](http://corerain1.corerain.com:5005/), where you can view not only the SSD demo but also others. Below is the team behind the platform.

![team.jpeg](../data/figs/team.jpeg)

Or you can see a video of the demonstration:

<video width="640" height="480" controls>
  <source src="../data/figs/face_detecion.mp4" type="video/mp4">
</video>