# Data Input Processing Requirements
On-chip processing requirements, grouped by category:
## DSP
- FIR / CIC filters (anti-aliasing, bandpass, decimation)
- Polyphase filterbanks / decimation
- Hilbert transforms / envelope detection
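To make the CIC decimation item more concrete, here is a minimal sketch of an N-stage CIC decimator working on integer samples. The function name `cic_decimate`, the decimation rate and the stage count are assumptions chosen for illustration; none of them come from these notes.

```python
def cic_decimate(samples, rate, stages=2):
    """N-stage CIC decimator (differential delay 1) on integer samples.

    Python integers never overflow; a hardware version would rely on
    wrap-around arithmetic with a sufficiently wide register instead.
    """
    integrators = [0] * stages
    comb_delays = [0] * stages
    out = []
    for n, x in enumerate(samples):
        # Integrator chain runs at the full input rate.
        acc = x
        for i in range(stages):
            integrators[i] += acc
            acc = integrators[i]
        # Keep every `rate`-th sample, then run the comb chain at the low rate.
        if (n + 1) % rate == 0:
            y = acc
            for i in range(stages):
                y, comb_delays[i] = y - comb_delays[i], y
            out.append(y)
    return out


# A constant input settles to a DC gain of rate ** stages.
print(cic_decimate([1] * 16, rate=4, stages=2))
```

The filter needs only adders and registers, which is why it is attractive as a first decimation stage when multipliers are scarce.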
## Real-Time correlation, peak-picking
- Correlate incoming waveform with transmitted waveform (sliding window multiply / accumulate)
	- So for this you'd need multiplication units
- Sub-sample interpolation for time-stamp (fitting a parabola around the correlation peak)
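The two steps above can be sketched in software as follows: a sliding-window multiply-accumulate against the transmitted waveform, followed by a three-point parabola fit around the strongest correlation sample. The function names `correlate` and `parabolic_peak` and the toy waveforms are hypothetical; this is a reference model under those assumptions, not the on-chip implementation.

```python
def correlate(rx, tx):
    """Sliding-window multiply-accumulate of rx against the reference tx."""
    n = len(tx)
    return [sum(rx[k + i] * tx[i] for i in range(n))
            for k in range(len(rx) - n + 1)]


def parabolic_peak(corr):
    """Return the peak position refined by a 3-point parabola fit."""
    k = max(range(len(corr)), key=lambda i: corr[i])
    if k == 0 or k == len(corr) - 1:
        return float(k)  # no neighbour on one side, skip the refinement
    y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(k)  # flat top, nothing to refine
    return k + 0.5 * (y0 - y2) / denom


tx = [1, -1, 1]                  # toy transmitted waveform
rx = [0, 0, 1, -1, 1, 0, 0]      # toy received waveform, delayed by 2 samples
corr = correlate(rx, tx)
print(corr, parabolic_peak(corr))
```

Each output sample of `correlate` costs `len(tx)` multiplications, which is where the multiplier units come in.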
## Deterministic Triggering / Time-Stamping
- Hardware time-stamping (latch a counter as soon as the signal is sent - No OS jitter here)
- Multi-channel synchronization (not valid here since only a single channel is present)
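As a small illustration of the time-stamping idea, the sketch below converts two latched counter values into an elapsed time and handles counter wrap-around. The 32-bit counter width and the 35 MHz counter clock are assumptions made for the example, as are the function names.

```python
def elapsed_ticks(start, stop, width=32):
    """Ticks between two latched counter values, safe across one wrap-around."""
    return (stop - start) % (1 << width)


def ticks_to_seconds(ticks, f_clk_hz):
    """Convert counter ticks to seconds for a counter clocked at f_clk_hz."""
    return ticks / f_clk_hz


# Counter latched at transmit and again at the detected echo (assumed values).
t_send, t_echo = 0xFFFFFFF0, 0x00000010
ticks = elapsed_ticks(t_send, t_echo)
print(ticks, ticks_to_seconds(ticks, 35e6))
```

The time resolution of such a time-stamp is one counter clock period, which is why a sub-sample interpolation step can still be useful afterwards.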
## On-the-fly Spectral / Distortion Analysis
- Streaming FFT: continuously transferring data into spectral bins in parallel with everything else
- Spectral windowing / harmonic tracking (cascade windowing FIRs, FFT and FIR / IIR trackers to follow distortion in parallel on spectral bins)
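A software model of the streaming FFT idea might look like the sketch below: samples are collected into frames, each frame is Hann-windowed and transformed, and the magnitude bins are emitted while the next frame fills. The frame length, the non-overlapping framing and the names `fft` and `stream_spectra` are assumptions for illustration only.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def stream_spectra(samples, frame_len=64):
    """Yield one magnitude spectrum (positive bins) per completed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frame = []
    for s in samples:
        frame.append(s)
        if len(frame) == frame_len:
            bins = fft([w * v for w, v in zip(window, frame)])
            yield [abs(c) for c in bins[: frame_len // 2]]
            frame = []


# Toy input: a tone that falls exactly on bin 8 of a 64-point frame.
tone = [math.cos(2 * math.pi * 8 * n / 64) for n in range(128)]
for spectrum in stream_spectra(tone):
    print(max(range(len(spectrum)), key=lambda k: spectrum[k]))
```

Harmonic tracking would then amount to following the bins at integer multiples of the excitation bin from frame to frame.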

## Stage 1: raw data incoming

### Input data buffering

```python
# Input: 10-bit samples (10 data pins) clocked at up to 35 MHz.
# Buffer enough samples to cover about n_cycles excitation cycles.
n_cycles = 100
freq_sample_max = 35e6  # maximum sample clock [Hz]
freq_exc_min = 1e6      # lowest excitation frequency [Hz]
bytes_per_cycle = (10 / 8) * (freq_sample_max / freq_exc_min)
n_bytes = bytes_per_cycle * n_cycles
print(f"total bytes: {n_bytes}, bytes per cycle {bytes_per_cycle}")
```

```
total bytes: 4375.0, bytes per cycle 43.75
```


### Input
- 10-bit (10 data-pins) clocked at 35 MHz
- Shifted in phase and gain according to the analog input stage
- This needs to be buffered
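The 10/8 bytes-per-sample figure assumes the 10-bit samples are stored tightly packed. A minimal sketch of such packing is shown below; the MSB-first bit order and the function name `pack_10bit` are assumptions, and a real buffer could just as well store each sample in a 16-bit word at the cost of more memory.

```python
def pack_10bit(samples):
    """Pack 10-bit samples MSB-first into bytes (4 samples -> 5 bytes)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for s in samples:
        acc = (acc << 10) | (s & 0x3FF)
        nbits += 10
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        # Left-align the remaining bits in a final, zero-padded byte.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


# 35 samples (one 1 MHz cycle at 35 MHz) occupy 43.75 bytes, rounded up to 44.
print(len(pack_10bit([0] * 35)))
```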


### Processing




# MCU vs FPGA

## STM32
- Needs an external ADC anyway: the internal ADCs do not reach sample rates above 10 MHz
- Needs a high core clock to read the 20 MHz parallel data stream on the I/O pins
	- This requires core clocks > 100 MHz, which increases the cost
	- Even at > 100 MHz it cannot run complex calculations in parallel with the acquisition

## RP2040
- Clock speeds up to 200 MHz
- Fast enough for data acquisition and pass-through
	- Not fast enough for on-chip data processing
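A quick way to see why both MCU options run out of headroom is to count the core clock cycles available per incoming sample. The sketch below uses the clock and data rates mentioned in these notes (100 MHz and 200 MHz cores, 20 MHz and 35 MHz sample rates); the function name is hypothetical, and the estimate ignores DMA, caches and the second core.

```python
def cycles_per_sample(f_core_hz, f_sample_hz):
    """Core clock cycles available between two consecutive samples."""
    return f_core_hz / f_sample_hz


for f_core in (100e6, 200e6):
    for f_sample in (20e6, 35e6):
        budget = cycles_per_sample(f_core, f_sample)
        print(f"{f_core / 1e6:.0f} MHz core, {f_sample / 1e6:.0f} MHz samples: "
              f"{budget:.1f} cycles per sample")
```

With only a handful of cycles per sample, reading the pins and storing the value already consumes most of the budget, leaving little for filtering or correlation.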

## FPGA
Base your choice on:
- Max clock-speed required
- Number of LUTs required
- Number of multiplier elements required
- Number of pins required
	- Peripherals required (external + SRAM + flash communication)


### LUTs



### Lattice iCE40 UP5K (used in lit3rick)
- 5280 LUTs
- 120 kbit of EBR RAM
- 1024 kbit of SPRAM
- I2C core x 2
- SPI core x 2
- 10 kHz and 48 MHz oscillators
#### Multiply-accumulate (MAC) blocks: 8
Multiplier:
- Input: 2 16-bit numbers
- Produce 32-bit output in a single cycle

Accumulator:
- Input: A, B, ACC_old
- Output: ACC_new = ACC_old + (A x B)
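The accumulator behaviour described above can be modelled in a few lines. The sketch below assumes signed 16-bit operands and a 32-bit accumulator that wraps on overflow; the signedness, the wrap-around behaviour and the names `mac` and `fir_sample` are assumptions of this model, not statements about the actual DSP block configuration.

```python
def mac(acc, a, b):
    """One MAC step: acc_new = acc_old + a * b, wrapped to signed 32 bits."""
    for operand in (a, b):
        if not -(1 << 15) <= operand < (1 << 15):
            raise ValueError("operands must fit in signed 16 bits")
    total = (acc + a * b) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value.
    return total - (1 << 32) if total & 0x80000000 else total


def fir_sample(window, coeffs):
    """One FIR output sample computed by reusing a single MAC block per tap."""
    acc = 0
    for x, h in zip(window, coeffs):
        acc = mac(acc, x, h)
    return acc


print(fir_sample([1, 2, 3], [4, 5, 6]))
```

Reusing one block sequentially like this trades throughput for area, whereas a fully parallel filter would instantiate one block per tap.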

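**Useful for**
- FIR / IIR filters
	- Note: a fully parallel N-tap filter needs N multiply-accumulate blocks; when the FPGA clock runs faster than the sample rate, one block can be time-shared across several taps.
	- Without a hard multiplier block, a single multiplier built from fabric would normally cost about
		- 320 LUTs (16x16 multiplier)
		- 88 LUTs (8x8 multiplier)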
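To put the LUT figures in perspective, the sketch below estimates how many fabric multipliers would fit if the entire 5280-LUT device were spent on them. The per-multiplier costs are the rough numbers quoted above; the helper name is hypothetical, and the estimate ignores routing, adders and all other logic.

```python
LUTS_PER_MULTIPLIER = {16: 320, 8: 88}  # rough figures from the list above
TOTAL_LUTS = 5280                       # iCE40 UP5K


def max_fabric_multipliers(total_luts, width):
    """Upper bound on fabric multipliers of the given operand width."""
    return total_luts // LUTS_PER_MULTIPLIER[width]


for width in (16, 8):
    count = max_fabric_multipliers(TOTAL_LUTS, width)
    print(f"{width}x{width}: at most {count} multipliers in {TOTAL_LUTS} LUTs")
```

Even this optimistic bound shows that the hard MAC blocks are worth a large share of the fabric.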

### Lattice ICE40HX4K-TQ144

### Lattice ECP5-25 (LFE5U-25F-6BG256C6)
- 24k LUTs
- 1008 kbit embedded memory
- 194 kbit distributed RAM
- 28 18x18 multipliers
- 2 PLLs / 2 DLLs
- BGA



### XC6SLX25-2FTG256C
- 24k logic cells (about 15k 6-input LUTs)
- 38 DSP slices
- 936 kbit block RAM
- 266 max user I/Os

### GOWIN GW1N-LV9LQ144C6/I5
- Good option
- Very good price-to-performance ratio
- 144-pin package (LQ144)
### Internal clock generation
- Up to 600 MHz (does this make sense?)

### GW1NR vs GW1N
The main difference between the GW1N series and the GW1NR series is that the GW1NR series integrates additional memory in the package. Like the GW1N, it offers low power consumption, instant-on operation, low cost, non-volatile configuration, high security, a range of packages, and flexible usage.


## Lattice ECP5-25 (LFE5U-25F-6BG256C6)
Preferred choice due to its large number of LUTs and DSP blocks, its clock capability, and its price.

### Clock
- Check for fanout / drive-strength of FPGA
- Check for appropriate input clock
#### Clock output to MS9280
- Make sure you can generate a 35 MHz clock either through some external oscillator, or internally.
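When checking whether a given oscillator can produce the 35 MHz ADC clock, a useful first step is to reduce the required frequency ratio to its smallest multiply/divide pair. The sketch below does only that; the candidate oscillator frequencies are hypothetical, and it does not model VCO range or any other limit of a specific PLL.

```python
from fractions import Fraction

F_TARGET_HZ = 35_000_000  # sample clock needed by the ADC


def clock_ratio(f_in_hz, f_out_hz=F_TARGET_HZ):
    """Smallest (mult, div) such that f_in * mult / div == f_out."""
    ratio = Fraction(f_out_hz, f_in_hz)
    return ratio.numerator, ratio.denominator


for f_in in (12_000_000, 25_000_000, 50_000_000):  # hypothetical oscillators
    mult, div = clock_ratio(f_in)
    print(f"{f_in / 1e6:g} MHz in: multiply by {mult}, divide by {div}")
```

Small multiply and divide values are generally easier to realise, so this can help shortlist oscillators before consulting the PLL documentation.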

#### Clock input from external oscillator
- 

### LED Drivers
- Check the LED driver pin locations and try to place the LEDs close to them.


### Flash

### RAM / DDR


# Memory
## RAM

### Asynchronous SRAM Memory
SRAM: slower (100 MHz read/write) and lower cost
- Can only hold a few ms of data (capacities from kB up to a few Mb)
- Easier to lay out
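The "few ms" estimate follows directly from the input data rate of 10 bits at 35 MHz. The sketch below computes the capture duration for a few memory sizes; the sizes listed are hypothetical examples, and tightly packed 10-bit storage is assumed.

```python
def capture_ms(mem_bits, f_sample_hz=35e6, bits_per_sample=10):
    """Milliseconds of continuous capture that fit in mem_bits of memory."""
    return mem_bits / (bits_per_sample * f_sample_hz) * 1e3


MBIT = 2 ** 20
for size_mbit in (1, 4, 16):  # hypothetical SRAM sizes
    print(f"{size_mbit} Mbit: {capture_ms(size_mbit * MBIT):.2f} ms")
```

At 350 Mbit/s of raw data, each Mbit of memory holds roughly 3 ms of samples.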

NOTE: my FPGA will probably max out at a clock speed of about 200-300 MHz (for example when using Lattice FPGAs).

### DDR RAM
- Faster, smaller FIFOs

## FLASH
### SPI-NOR Flash
