# AFSK Demodulator
## Step 1: Basic AXI Streaming

-----

This notebook will outline the steps necessary to create a basic AXI streaming solution in Vivado HLS and use that solution in Python code.

This code is part of the [AFSK Demodulator on Pynq](afsk-demodulator-fpga.ipynb) project.

The purpose of this code is to serve as the foundation for migrating the Python demodulator code to FPGA.  We will be streaming audio data into the FPGA and streaming processed data out from the FPGA.  In the beginning we will be streaming in audio data from WAV files, and streaming out processed audio data.  When we get to the point of implementing the digital PLL and HDLC, we will be streaming out PLL data, or just the decoded HDLC packets.

Before we do any of this, however, we need to ensure we have a framework for getting data into and out of the FPGA.  The Zynq SoC uses AXI interfaces to communicate between the processing system (PS) and the programmable logic (PL).
From this point on we will use *PL* and *PS* to refer to the two distinct halves of the Zynq SoC.

## Prerequisites

At this point you are expected to have:

 * A configured PYNQ environment.
 * Vivado installed on your computer and configured for your board.
 * Experience working through the tutorials at https://pynq.readthedocs.io/.
 * Familiarized yourself with the AFSK demodulator implementation in Python.

If you do not have a PYNQ Z2, you may have to adjust some of the code in later sections to match your board.

I am running Vivado on Linux.  It should work on Windows as well.

## Housekeeping

Before we begin, we need to address some basic configuration items that will make our lives easier going forward.

### Mounting the PYNQ Filesystem

We are going to need to copy files between our workstation running Vivado and the PYNQ board.  To do that, we are going to mount the board on our computer.  On Linux:

```bash
sudo mkdir -p /run/media/${USER}/PYNQ && sudo mount -t cifs -o "uid=1000,username=xilinx,password=xilinx" //pynq/xilinx /run/media/${USER}/PYNQ
```

In Linux, you should now see a PYNQ mount point in your file browser.

### Vivado Board Support

You should have added a board description file for your development board to Vivado while going through the setup process and tutorials mentioned as prerequisites. For the Pynq-Z2, that means downloading the [board file](https://d2m32eurp10079.cloudfront.net/Download/pynq-z2.zip) and installing the files in `Xilinx/Vivado/2018.3/data/boards/board_files`.

### Vivado HLS Board Support

Adding support for your board to Vivado HLS is not as straightforward as adding support for it to Vivado, but if you use Vivado HLS repeatedly, it is worthwhile.

 1. Edit `Xilinx/Vivado/2018.3/common/config/VivadoHls_boards.xml`.
 1. Add `<board name="PYNQ_Z2" display_name="PYNQ Z2 Development Kit" family="zynq" part="xc7z020clg400-1"  device="xc7z020" package="clg400" speedgrade="-1" vendor="http://www.tul.com.tw" />` just inside the "platform" section.

### Vivado Environment

Source the Vivado shell configuration files to ensure your shell environment is configured so that you and Vivado can find the Xilinx components.

```bash
. Xilinx/Vivado/2018.3/settings64.sh
```

Note: if you move the Xilinx installation, like I tried to do, the configuration scripts will point to the original installation directory and your shell environment will be wrong.  It may be easier to re-install than to try to fix that.

## Outline

We are going to create a very simple IP that performs the following function on a block of data and returns the result:

$y[d] = \frac{5}{8} x[d]$

Yes, it is trivial.  It does not do much at all, but allows us to verify that the PL is actually modifying the data.
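This notebook does not reproduce the HLS source file. For reference, the NumPy sketch below shows one plausible integer formulation of the operation. It assumes the PL multiplies by 5 in a wider type and then divides by 8 with an arithmetic right shift. `scale_5_8` is a hypothetical helper and is not part of the project.

```python
import numpy as np

def scale_5_8(x):
    """Hypothetical fixed-point model of y[d] = 5/8 * x[d] for int16 samples."""
    x = np.asarray(x, dtype=np.int16)
    # Widen to int32 so 5 * x cannot overflow, then divide by 8 with an arithmetic shift.
    y = (x.astype(np.int32) * 5) >> 3
    # The result always fits back into int16: |5/8 * x| < 2**15.
    return y.astype(np.int16)
```

A right shift rounds toward negative infinity. For negative inputs, this model may therefore differ by one from a design that rounds toward zero.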

We will perform the following steps in this section:

 1. Create a C++ file that accepts a block of 16-bit data, performs the operation, and sends the result back.
 1. Create a C++ test case for the above file (because good tests improve development speed).
 1. Generate an IP package from the code that can be used in Vivado.
 1. Create a Zynq project in Vivado that uses the IP.
 1. Export the bitstream for our project from Vivado.
 1. Use Python running on the PS to load the bitstream to the PL, and verify that it works.

## Vivado HLS

 1. Start Vivado HLS.
    ```bash
    vivado_hls
    ```
 1. Create a new project.
 1. Create a top-level function called demodulate.
 1. Create 2 new files:
    * [demodulate.hpp](HLS/demodulate.hpp)
    * [demodulate.cpp](HLS/demodulate.cpp)
 1. Create a new test bench:
    * [demodulate_test.cpp](HLS/demodulate_test.cpp)
 
The important part of this module is the AXI streaming interface, implemented in C++, that we will use for the remainder of the project.  There are two important steps:

 1. Defining the C++ data types required for an AXI streaming interface.
 1. Adding the HLS `pragma` entries to the code.

-----

```c++
#include <ap_axi_sdata.h>
#include <hls_stream.h>

#define BLOCK_SIZE 264

typedef ap_axis<16,1,1,1> stream_type;

void demodulate(stream_type input[BLOCK_SIZE], stream_type output[BLOCK_SIZE]);
```

-----

The `ap_axis` template is instantiated here for a 16-bit data transfer.  The three 1's set the widths of sideband fields we are not using (User, ID and Dest).  They are still required because they are part of the data type.
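The Python sketch below makes the field list concrete by modeling one beat of `ap_axis<16,1,1,1>` as a dataclass. The field names follow `ap_axi_sdata.h` (`data`, `keep`, `strb`, `user`, `last`, `id`, `dest`). `AxisBeat` and `to_beats` are hypothetical names used only for illustration.

The field that matters most here is `last`. As we understand the AXI DMA, its receive (S2MM) channel completes a transfer only when it sees TLAST. The HLS code should therefore assert `last` on the final sample of each block, or `recvchannel.wait()` may never return.

```python
from dataclasses import dataclass

BLOCK_SIZE = 264

@dataclass
class AxisBeat:
    """Illustrative model of one beat of a 16-bit AXI4-Stream (ap_axis<16,1,1,1>)."""
    data: int            # TDATA: 16-bit sample, stored as its raw bit pattern
    keep: int = 0b11     # TKEEP: one bit per data byte, all bytes valid
    strb: int = 0b11     # TSTRB: one bit per data byte
    user: int = 0        # TUSER: 1 bit, unused here
    last: bool = False   # TLAST: marks the final beat of a packet
    id: int = 0          # TID: 1 bit, unused here
    dest: int = 0        # TDEST: 1 bit, unused here

def to_beats(samples):
    """Wrap one block of signed samples as stream beats, with TLAST on the final beat."""
    n = len(samples)
    # s & 0xFFFF turns a negative int16 value into its two's-complement bit pattern.
    return [AxisBeat(data=s & 0xFFFF, last=(i == n - 1)) for i, s in enumerate(samples)]
```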

We also define the block size used for our transfers.  We are going to transfer 264 16-bit samples at a time.  Our audio data is 16-bit, sampled at 26,400 samples per second, so a 264-sample block holds 10ms of audio (264 / 26,400 = 0.01 s).  We have two competing goals: maximize throughput efficiency and reduce latency.  Larger blocks amortize the per-transfer overhead, while smaller blocks let us detect a carrier signal sooner if one exists.  10ms is a reasonable compromise.  Given the performance of the FPGA we could go lower, but for now we will use the 264-sample block size.
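The arithmetic above can be checked in a few lines of Python. The sketch also shows one way a longer recording, such as samples read from a WAV file, could be cut into DMA-sized blocks. Zero-padding the final partial block is our assumption, not necessarily what later sections of the project do, and `to_blocks` is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE = 26_400   # audio samples per second
BLOCK_SIZE = 264       # samples per DMA transfer

block_ms = 1000 * BLOCK_SIZE / SAMPLE_RATE               # audio duration per block, in ms
block_bytes = BLOCK_SIZE * np.dtype(np.int16).itemsize   # bytes moved per transfer

def to_blocks(samples, block_size=BLOCK_SIZE):
    """Split a 1-D signal into int16 rows of block_size, zero-padding the last block."""
    samples = np.asarray(samples, dtype=np.int16)
    n_blocks = -(-len(samples) // block_size)   # ceiling division
    padded = np.zeros(n_blocks * block_size, dtype=np.int16)
    padded[:len(samples)] = samples
    return padded.reshape(n_blocks, block_size)
```

Each row of the result has the same shape as the 264-element buffers used for the DMA transfers.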

Once the code and test bench are written, we need to run C simulation, C synthesis and C/RTL co-simulation, and then package the IP.  Both simulation steps run our test bench.  C simulation checks that the C++ code behaves correctly, and C/RTL co-simulation checks that the synthesized RTL produces the same results.  For a software engineer, this is the same as compiling and running unit tests.

A word of caution regarding HLS: Vivado HLS error messages can be rather opaque.  Vivado HLS also appears to be changing rapidly, so fixes for problems posted 2-3 years ago may be out of date.

Once the IP is packaged, we are done in HLS.

## Vivado

We will now switch over to Vivado and create a block design.

 1. Start Vivado and create a new project.
 1. Give it a path -- in our case `afsk-demodulator-pynq/project_01` and the name `Vivado`.
 1. Select the `RTL Project` project type.
 1. In the "Default Part" screen, switch to the "Boards" tab and select your board from the list.
 1. Click "Finish".
 
With the new project open in Vivado, we need to create a block design.

 1. In the Flow Navigator on the left side, select *Create Block Design*.
 1. Use the default name, design_1.
 1. Go into Tools|Settings.
    1. In the settings dialog, choose IP|Repository.
    1. Select "+" to add a repository.
    1. Add `project_01/HLS` as a repository.  You should see that it contains one IP called `demodulate`.
    1. When done, click "OK".
 1. In the Diagram view (main window) select "+" to add IP.
 1. Add the Zynq processing system and run block automation.
 1. When done, double-click the Zynq block and find the *High-performance AXI Slave Ports*.
 1. Click on the High-performance AXI Slave Ports.
 1. Enable the *S AXI HP0 interface*, then click OK.
 1. Add an AXI Stream Interconnect, AXI Direct Memory Access and the demodulator IP.
 1. Open the AXI Direct Memory Access, disable scatter/gather, and set the stream widths to 16 bits.
 1. Wire up the demodulator to the AXI Direct Memory Access and run connection automation.
    * A few additional modules are added: AXI SmartConnect, AXI Interconnect, and Processor System Reset
![BlockDiagram](BlockDiagram.png)
 1. Rename the demodulator block to "afsk_demod" and the DMA block to "afsk_dma".
 1. Combine the afsk_demod and afsk_dma blocks into a hierarchy called "demodulator".
 1. Generate the HDL wrapper by right-clicking the design in the Sources pane and selecting "Generate HDL Wrapper".
 1. Generate the bitstream. This will take a long time.
 1. Export the block design (File|Export|Export Block Design...)
 1. Collect the following files and rename them to `project_01.{ext}`, so that you have `project_01.bit`, `project_01.tcl` and `project_01.hwh`:
    - `Vivado.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh`
    - `Vivado.runs/impl_1/design_1_wrapper.bit`
    - `design_1.tcl`
 1. On the mounted PYNQ share, create a directory called `pynq/overlays/afsk_demodulator/` and copy these three files there.
    ```bash
    mkdir -p /run/media/${USER}/PYNQ/pynq/overlays/afsk_demodulator
    cp project_01.{tcl,bit,hwh} /run/media/${USER}/PYNQ/pynq/overlays/afsk_demodulator/
    ```
 1. Copy the 

```python
from pynq import Overlay, Xlnk
import pynq.lib.dma
import numpy as np
```

Copy over the following files:

 - project_01.bit
 - project_01.tcl
 - project_01.hwh (hardware hand-off)

These belong in an overlay directory.
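As we understand PYNQ 2.x, `Overlay` locates the hardware hand-off file by replacing the `.bit` extension with `.hwh`, so the two files must share a base name. A small pre-flight check, such as the hypothetical `check_overlay_dir` below, can catch a mismatch before loading. It only inspects file names and does not validate the file contents.

```python
from pathlib import Path

def check_overlay_dir(directory):
    """Return a list of problems with .bit/.hwh pairing in an overlay directory (empty if OK)."""
    directory = Path(directory)
    problems = []
    bits = sorted(directory.glob("*.bit"))
    if not bits:
        problems.append("no .bit file found")
    for bit in bits:
        # Each bitstream needs a hardware hand-off file with the same base name.
        if not bit.with_suffix(".hwh").exists():
            problems.append(f"{bit.name} has no matching {bit.stem}.hwh")
    return problems
```

On the board, `print(check_overlay_dir('/home/xilinx/pynq/overlays/afsk_demodulator'))` should print an empty list if every bitstream has a matching `.hwh`.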

We then load the bitstream and refer to the overlay's embedded dma module by a more convenient local variable.

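```python
# PYNQ looks for the .hwh file next to the bitstream, with the same base name.
overlay = Overlay('/home/xilinx/pynq/overlays/afsk_demodulator/project_01.bit')
# Convenient handle to the DMA engine inside the "demodulator" hierarchy.
dma = overlay.demodulator.afsk_dma
```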

Now we need to construct the arrays used to transfer the data via DMA and initialize the output data.

We then send the data, initiate the read, then wait for both to complete.

Finally we print the output.

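```python
xlnk = Xlnk()
# Physically contiguous (CMA) buffers that the DMA engine can access.
# out_buffer carries data from the PS to the PL; in_buffer receives the PL's result.
out_buffer = xlnk.cma_array(shape=(264,), dtype=np.int16)
in_buffer = xlnk.cma_array(shape=(264,), dtype=np.int16)
out_buffer[:] = 16384  # fill the block with a constant test value

# Start both DMA channels, then block until each transfer has finished.
dma.sendchannel.transfer(out_buffer)
dma.recvchannel.transfer(in_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()

print(in_buffer)
```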

[10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240 10240
 10240 10240 10240 10240 10240 10240 10240 10240 10

Note that our FPGA design did exactly what the unit tests expect: every input sample of 16384 came back as 10240, because

$y[d] = \frac{5}{8} x[d] = \frac{5}{8} \cdot 16384 = 10240$
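The comparison can be automated instead of read off the printed array. The sketch below is minimal and assumes the same multiply-by-5, shift-right-by-3 formulation. `verify_block` is a hypothetical helper, not part of the project. Because 16384 * 5 is divisible by 8, this particular test vector is exact under any rounding scheme. Test vectors with other values would also exercise the design's rounding behavior.

```python
import numpy as np

def verify_block(sent, received):
    """Return the indices where the PL output differs from 5/8 of the input block."""
    sent = np.asarray(sent, dtype=np.int16)
    # Reference model: widen, multiply by 5, arithmetic shift right by 3.
    expected = ((sent.astype(np.int32) * 5) >> 3).astype(np.int16)
    return np.flatnonzero(np.asarray(received, dtype=np.int16) != expected)
```

On the board, after the transfer above, `assert verify_block(out_buffer, in_buffer).size == 0` should pass if the PL matches the model.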