# AFSK Demodulator
## Step 2: Band Pass FIR Filter

-----

This notebook will outline the steps necessary to move the band-pass FIR filter to FPGA.

This code is part of the [AFSK Demodulator on Pynq](afsk-demodulator-fpga.ipynb) project.

The purpose of this code is to serve as the foundation for migrating the Python demodulator code to FPGA.  We will be streaming audio data into the FPGA and streaming processed data out from the FPGA.

This is the first step of moving a demodulator processing step into the FPGA.

## Prerequisites

At this point you are expected to have:

 * A configured PYNQ environment.
 * Vivado installed on your computer and configured for your board.
 * Experience working through the tutorials at https://pynq.readthedocs.io/.
 * Familiarized yourself with the AFSK demodulator implementation in Python.
 * Completed the first step of the tutorial to familiarize yourself with the process of creating a streaming interface.

## Outline

We are going to modify the IP we created in the first tutorial to do FIR processing.  We are going to use Python to generate the FIR coefficients (the same coefficients used in Python).

We will perform the following steps in this section:

 1. Create a C++ file that accepts a block of 16-bit data, performs the FIR operation, and sends the result back.
 1. Create a C++ test case for the above file (because good tests improve development speed).
 1. Generate an IP package from the code that can be used in Vivado.
 1. Create a Zynq project in Vivado that uses the IP.
 1. Export the bitstream for our project from Vivado.
 1. Use Python running on the PS to load the bitstream to the PL, and verify that it works.
 1. Integrate the FIR filter with the existing demodulator code, replacing the existing Python BPF.

First we are going to generate the FIR filter coefficients.  Then we are going to generate some sample data for our test bench. 

## Filter Coefficients

With the code below we will generate and output the coefficients for the band pass filter.

```python
import numpy as np
from scipy.signal import lfiltic, lfilter, firwin
from scipy.io.wavfile import read

audio_file = read('../base/TNC_Test_Ver-1.102-26400-1sec.wav')
sample_rate = audio_file[0]
audio_data = audio_file[1]

bpf_coeffs = np.array(firwin(141, [1100.0/(sample_rate/2), 2300.0/(sample_rate/2)], width = None,
        pass_zero = False, scale = True, window='hann') * 32768, dtype=int)

print(bpf_coeffs)
```

[    0     0     0     0     0     0     1     3     5     8     8     5
    -2   -13   -27   -40   -46   -44   -32   -12    11    32    44    44
    32    14     0    -2    13    49    97   143   170   160   104     6
  -118  -244  -340  -381  -352  -258  -120    24   138   192   173    97
     0   -67   -56    62   287   575   850  1021  1001   737   228  -462
 -1216 -1879 -2293 -2336 -1956 -1182  -133  1008  2030  2736  2988  2736
  2030  1008  -133 -1182 -1956 -2336 -2293 -1879 -1216  -462   228   737
  1001  1021   850   575   287    62   -56   -67     0    97   173   192
   138    24  -120  -258  -352  -381  -340  -244  -118     6   104   160
   170   143    97    49    13    -2     0    14    32    44    44    32
    11   -12   -32   -44   -46   -40   -27   -13    -2     5     8     8
     5     3     1     0     0     0     0     0     0]


The output above lists the 141 filter coefficients we will use for our band-pass filter.  They are scaled by 32768 so that they can be stored as 16-bit integers.
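The coefficients printed above end up as a `const int16_t` array in the C++ source later in this notebook. As a minimal sketch of one way to do that conversion without retyping the values, the snippet below renders a list of integers as a C array definition. The `format_c_array` helper and its parameters are hypothetical and not part of the original project.

```python
def format_c_array(values, name, per_line=12, width=6):
    """Render integers as a C `const int16_t` array definition.

    Hypothetical helper for illustration; not part of the original project.
    """
    values = [int(v) for v in values]
    # Refuse anything that would not fit in the int16_t array element type.
    if any(v < -32768 or v > 32767 for v in values):
        raise ValueError("coefficient does not fit in a signed 16-bit integer")
    rows = []
    for start in range(0, len(values), per_line):
        row = values[start:start + per_line]
        # Right-align each value so the columns line up like the printed array.
        rows.append(" ".join(f"{v:>{width}}," for v in row))
    return f"const int16_t {name}[] =\n{{\n" + "\n".join(rows) + "\n};"


# Small stand-in list; in the notebook you would pass bpf_coeffs instead.
print(format_c_array([0, 1, -2, 2988], "bpf_coeffs", per_line=2))
```

The range check is there because `firwin` output scaled by 32768 is only guaranteed to fit in 16 bits if no single coefficient reaches full scale, which is worth verifying rather than assuming.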

## Test Bench Data

We will now generate the input and output data for our test bench.  Because we have a working Python model to work from, we will use its data as a baseline.

```python
class fir_filter(object):
    def __init__(self, coeffs):
        self.coeffs = coeffs
        self.zl = lfiltic(self.coeffs, 32768, [], [])
    def __call__(self, data):
        result, self.zl = lfilter(self.coeffs, 32768, data, -1, self.zl)
        return result

bpf = fir_filter(bpf_coeffs)

delay = 12

f = bpf(audio_data[:264])
c = np.array([int(x >= 0) for x in f])
# Delay the data
d = np.append(np.zeros(delay, dtype=int), np.array(c[:0-delay], dtype=int))
# XOR the digitized data with the delayed version
x = np.logical_xor(c, d)

print(audio_data[:264])
print(np.array(x, dtype=int), len(x))
```

[  719   748   468   487   533   880  1187  1717  2124  2262  2417  2371
  2106  1794  1275   690     3  -721 -1382 -1855 -2227 -2378 -2383 -2243
 -1953 -1510  -958  -291   214   497   833   909   818   620   290  -207
  -787 -1396 -2019 -2434 -2756 -2914 -2901 -2762 -2424 -1954 -1371  -667
   -66   270   638   762   762   682   490   235   100   161   280   583
   913  1391  1576  1634  1685  1398  1093   658   255    94     2   105
   349   761  1288  1898  2303  2564  2793  2744  2612  2264  1851  1280
   586  -143  -830 -1336 -1795 -1993 -2038 -1917 -1622 -1209  -646    28
   598   929  1265  1382  1330  1190   843   387  -157  -776 -1420 -1866
 -2227 -2379 -2346 -2193 -1868 -1409  -796  -111   557   949  1380  1636
  1604  1550  1310   946   449  -113  -744 -1260 -1629 -1888 -1907 -1800
 -1579 -1171  -623    23   707  1176  1579  1826  1836  1802  1550  1144
   641    30  -639 -1236 -1742 -2039 -2141 -2132 -1915 -1584 -1074  -460
   237   790  1137  1509  1588  1497  1286   937   



The data above represents 10ms of audio data, which is enough to get us started.
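To make the digitize, delay, and XOR steps from the cell above concrete, here is a small pure-Python sketch on a toy signal. The function names `digitize` and `delay_xor` are illustrative assumptions and do not appear in the original demodulator code.

```python
def digitize(samples):
    """Map each sample to 1 if it is non-negative, otherwise 0."""
    return [int(s >= 0) for s in samples]


def delay_xor(bits, delay):
    """XOR a bit sequence with a copy of itself delayed by `delay` samples.

    The delayed copy is padded with zeros at the front, matching the
    np.append(np.zeros(delay), ...) construction used in the notebook cell.
    """
    delayed = [0] * delay + list(bits[:len(bits) - delay])
    return [b ^ d for b, d in zip(bits, delayed)]


# Toy signal that changes sign every two samples.
samples = [5, 3, -2, -4, 1, 6, -1, -3]
bits = digitize(samples)
print(bits)
print(delay_xor(bits, 1))
```

With a one-sample delay, the XOR output is 1 wherever the signal just crossed zero (and at the very first sample, because the padding is zero).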

## Vivado HLS

A good resource for this section is [Vivado Design Suite User Guide - High Level Synthesis (UG902)](https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_4/ug902-vivado-high-level-synthesis.pdf).  It documents the FIR function we will be using.

 1. Start Vivado HLS.
    ```bash
    vivado_hls
    ```
 1. Create a new project under the project_02 directory called HLS.
 1. Create a top-level function called demodulate2.
 1. Create 2 new files:
    * [demodulate.hpp](HLS/demodulate.hpp)
    * [demodulate.cpp](HLS/demodulate.cpp)
 1. Create a new test bench:
    * [demodulate_test.cpp](HLS/demodulate_test.cpp)
 
The important part of this module is to implement the AXI streaming interface in C++ that we will be using for the remainder of the projects.  There are two important steps:

 1. Defining the C++ data types required for an AXI streaming interface.
 1. Adding the HLS `pragma` entries to the code.

-----

This is the header:

```c++
#include <ap_axi_sdata.h>
#include <hls_stream.h>
#include <stdint.h>

#define BPF_COEFF_LEN 141
#define BLOCK_SIZE 264

typedef ap_axis<16,1,1,1> stream_type;

void demodulate2(stream_type input[BLOCK_SIZE], stream_type output[BLOCK_SIZE]);

```

And this is the source:

```c++
#include "demodulate.hpp"

const int16_t bpf_coeffs[] =
{    0,     0,     0,     0,     0,     0,     1,     3,     5,     8,     8,     5,
    -2,   -13,   -27,   -40,   -46,   -44,   -32,   -12,    11,    32,    44,    44,
    32,    14,     0,    -2,    13,    49,    97,   143,   170,   160,   104,     6,
  -118,  -244,  -340,  -381,  -352,  -258,  -120,    24,   138,   192,   173,    97,
     0,   -67,   -56,    62,   287,   575,   850,  1021,  1001,   737,   228,  -462,
 -1216, -1879, -2293, -2336, -1956, -1182,  -133,  1008,  2030,  2736,  2988,  2736,
  2030,  1008,  -133, -1182, -1956, -2336, -2293, -1879, -1216,  -462,   228,   737,
  1001,  1021,   850,   575,   287,    62,   -56,   -67,     0,    97,   173,   192,
   138,    24,  -120,  -258,  -352,  -381,  -340,  -244,  -118,     6,   104,   160,
   170,   143,    97,    49,    13,    -2,     0,    14,    32,    44,    44,    32,
    11,   -12,   -32,   -44,   -46,   -40,   -27,   -13,    -2,     5,     8,     8,
     5,     3,     1,     0,     0,     0,     0,     0,     0,
};

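// Shift-register FIR filter.  Each call consumes one input sample and
// returns one output sample.  The delay line is static, so its contents
// persist between calls.
template <typename InOut, typename Filter, size_t N>
InOut fir_filter(InOut x, Filter (&coeff)[N])
{
	static InOut shift_reg[N];

	int32_t accum = 0;
	// Age every stored sample by one position, accumulating its product
	// with the matching coefficient along the way.
	filter_loop: for (size_t i = N-1 ; i != 0; i--)
	{
		shift_reg[i] = shift_reg[i-1];
		accum += shift_reg[i] * coeff[i];
	}

	// Store the newest sample at the head of the delay line.
	shift_reg[0] = x;
	accum += shift_reg[0] * coeff[0];

	// The coefficients were scaled by 32768, so shift right by 15 to undo it.
	return static_cast<InOut>(accum >> 15);
}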

void demodulate2(stream_type input[BLOCK_SIZE], stream_type output[BLOCK_SIZE])
{
#pragma HLS INTERFACE axis port=input
#pragma HLS INTERFACE axis port=output
#pragma HLS INTERFACE ap_ctrl_none port=return

	demod_loop: for (size_t i = 0; i != BLOCK_SIZE; ++i) {
		stream_type tmp = input[i];
		tmp.data = fir_filter(input[i].data, bpf_coeffs);
		output[i] = tmp;
	}
}
```


-----

The `ap_axis` template type is specialized for a 16-bit transfer.  The three 1's are for fields we are not using, but which are required as they are part of the data type (User, ID and Dest).

We also define the block size used for our transfers.  We are going to transfer 264 16-bit entries at a time.  Our audio data is 16-bit, sampled at 26,400 samples per second, so the chosen block size holds 10ms of audio.  We have two competing goals: maximize throughput efficiency and reduce latency.  We need low latency in order to quickly detect a carrier signal if one exists.  10ms is a reasonable compromise.  We can go lower given the performance of the FPGA, but for now we will use the 264 sample block size.
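The relationship between block size and latency is simple arithmetic. The sketch below works it out for a few candidate block sizes; the helper names and the alternative sizes are illustrative assumptions, and only the 264-sample, 26,400 samples-per-second case comes from this project.

```python
SAMPLE_RATE = 26400     # samples per second, from the audio file
BYTES_PER_SAMPLE = 2    # 16-bit samples


def block_duration_ms(block_size, sample_rate=SAMPLE_RATE):
    """Duration of one block of audio in milliseconds."""
    return 1000.0 * block_size / sample_rate


def block_size_for(latency_ms, sample_rate=SAMPLE_RATE):
    """Number of samples needed to cover the requested block duration."""
    return round(sample_rate * latency_ms / 1000)


# Compare a few block sizes: smaller blocks lower the latency but mean
# more transfers per second for the same amount of audio.
for size in (66, 132, 264, 528):
    print(size, "samples:", block_duration_ms(size), "ms,",
          size * BYTES_PER_SAMPLE, "bytes per transfer")
```

If the block size is changed, `BLOCK_SIZE` in the header and the buffer sizes used on the Python side would have to change with it.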

Once the code and test bench are written, we need to run the C simulation, C synthesis, C/RTL co-simulation, then package the IP.  The two simulation steps run our test bench.  Together, these steps verify that the code will synthesize and that it functions properly.  For a software engineer, this is the same as compiling and running unit tests.
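When writing the test bench, it helps to have expected values that match the integer arithmetic of the C++ filter exactly, because the floating-point `lfilter` result does not round the same way. The following is a minimal sketch of an integer model of the shift-register FIR, not the HLS code itself. The names `FixedPointFir` and `to_int16` are hypothetical, and the model assumes the 32-bit accumulator never overflows and that `>>` on a negative value is an arithmetic shift in the C++ build.

```python
def to_int16(value):
    """Wrap an integer to the signed 16-bit range, like a cast to a 16-bit type."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class FixedPointFir:
    """Integer model of a shift-register FIR with Q15 coefficients (a sketch)."""

    def __init__(self, coeffs):
        self.coeffs = [int(c) for c in coeffs]
        self.shift_reg = [0] * len(self.coeffs)

    def __call__(self, sample):
        # Newest sample goes to index 0; the oldest one falls off the end.
        self.shift_reg = [int(sample)] + self.shift_reg[:-1]
        accum = sum(s * c for s, c in zip(self.shift_reg, self.coeffs))
        # Python's >> floors negative values, like an arithmetic right shift.
        return to_int16(accum >> 15)


# Two-tap toy filter: coefficients 0.5 and 0.25 in Q15.
fir = FixedPointFir([16384, 8192])
print([fir(x) for x in [1000, 0, -1000, 0]])
```

Note that the shift floors toward negative infinity, so a true result of -0.5 becomes -1 rather than 0. Expected values derived from the floating-point filter need the same flooring before they can be compared sample for sample.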

A note regarding HLS: Vivado HLS error messages can be rather opaque, and the tool appears to be changing rapidly, so fixes posted 2-3 years ago may be out of date.

Once the IP is packaged, we are done in HLS.

## Vivado

We will now switch over to Vivado and create a block design.

 1. Start Vivado and create a new project.
 1. Give it a path -- in our case `afsk-demodulator-pynq/project_02` and the name `Vivado`.
 1. Select the `RTL Project` project type.
 1. In the "Default Part" screen, switch to the "Boards" tab. Select your board from the list.
 1. Click "Finish".
 
With the new project open in Vivado, we need to create a block design.

 1. On the left side, in the Flow Navigator, select *Create Block Design*.
 1. Use the default name, design_1.
 1. Go into Tools|Settings.
    1. In the settings dialog, choose IP|Repository.
    1. Select "+" to add a repository.
    1. Add `project_02/HLS` as a repository.  You should see that it contains 1 IP called `demodulate2`.
    1. When done, click "OK".
 1. In the Diagram view (main window) select "+" to add IP.
 1. Add the Zynq processing system and run block automation.
 1. When done, double-click the Zynq block and find the *High-performance AXI Slave Ports*.
 1. Click on the High-performance AXI Slave Ports.
 1. Enable the *S AXI HP0 interface*, then click OK.
 1. Add an AXI Stream Interconnect, AXI Direct Memory Access and the demodulator IP.
 1. Open the AXI Direct Memory Access, disable scatter/gather, and set the stream widths to 16 bits.
 1. Wire up the demodulator to the AXI Direct Memory Access and run connection automation.
    * A few additional modules are added: AXI SmartConnect, AXI Interconnect, and Processor System Reset
![BlockDiagram](BlockDiagram.png)
 1. Rename the demodulator block to "bpf" and the DMA block to "bpf_dma".
 1. Combine the bpf and bpf_dma blocks into a hierarchy called "filter".
 1. Generate the HDL wrapper by clicking on the design in the Sources box, right clicking, and selecting "Generate HDL Wrapper".
 1. Generate the bitstream. This will take a long time.  On my desktop it takes about 5 minutes.
 1. Export the block design (File|Export|Export Block Design...)
 1. Collect the following files:
    - Vivado.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh
    - Vivado.runs/impl_1/design_1_wrapper.bit
    - design_1.tcl
    * Rename these files to "project_02.{ext}" so that you have project_02.bit, project_02.tcl and project_02.hwh.
 1. On the mounted PYNQ volume, create a directory called `pynq/overlays/afsk_demodulator/` and copy these three files there.
    ```bash
    mkdir -p /var/run/media/${USER}/PYNQ/pynq/overlays/afsk_demodulator
    cp project_02.{tcl,bit,hwh} /var/run/media/${USER}/PYNQ/pynq/overlays/afsk_demodulator/
    ```
 1. You can now jump to the Jupyter notebook on the Pynq device.
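
The collect, rename, and copy steps above have to be repeated every time the bitstream is rebuilt, so they are worth scripting. The sketch below is one possible way to do it. The `collect_overlay_files` helper is hypothetical, and it assumes that all three paths, including the exported `design_1.tcl`, are relative to the Vivado project directory.

```python
import shutil
from pathlib import Path

# Source files, relative to the Vivado project directory (assumed layout),
# keyed by the extension they keep after renaming.
SOURCES = {
    ".hwh": "Vivado.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh",
    ".bit": "Vivado.runs/impl_1/design_1_wrapper.bit",
    ".tcl": "design_1.tcl",
}


def collect_overlay_files(project_dir, dest_dir, name="project_02"):
    """Copy the three overlay files to dest_dir as <name>.hwh/.bit/.tcl."""
    project_dir, dest_dir = Path(project_dir), Path(dest_dir)
    # Fail early, before copying anything, if a build product is missing.
    missing = [rel for rel in SOURCES.values()
               if not (project_dir / rel).is_file()]
    if missing:
        raise FileNotFoundError("missing build outputs: " + ", ".join(missing))
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ext, rel in SOURCES.items():
        target = dest_dir / (name + ext)
        shutil.copyfile(project_dir / rel, target)
        copied.append(target)
    return copied
```

A call such as `collect_overlay_files("project_02/Vivado", "/var/run/media/<user>/PYNQ/pynq/overlays/afsk_demodulator")` would then replace the manual steps, with the destination adjusted to wherever the PYNQ volume is mounted on your machine.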