# Mixed-Signal mmWave Radar Motion Compensation Accelerator

***
**Single-Person Team: Nikhil Poole, Stanford University Department of Electrical Engineering**

***
## Introduction

This notebook documents the design and silicon verification of a custom mixed-signal mmWave radar motion compensation edge accelerator engine implemented using the open-source SkyWater 130-nm process technology. The chip is designed to carry out edge-based, real-time inertial (IMU) sensor fusion for vibratory motion correction on a mmWave radar platform via a combination of custom analog front-end vibration parameter detection circuitry and a digital back-end motion deconvolution kernel generator.

Architecture-wise, the IC contains an analog front-end that processes the frequency and amplitude of the IMU-detected platform vibration. It consists of amplification/filtering circuits, a type-II frequency-detection PLL, peak detector and sampling circuits, and two 8-bit SAR ADCs. The digitized information is then processed by a custom back-end DSP engine responsible for formulating the motion deconvolution kernel frequency response and storing its phase/magnitude data in a series of on-chip SRAM banks for subsequent serial readout by a radar interface controller. The chip is thus highly modular, consisting of several adjustable/programmable analog/digital circuit building blocks (e.g., current-starved ring oscillator, PLL, amplifiers, filters, ADCs, SRAMs). Many of the blocks are tunable via off-chip inputs, allowing for integration and reuse in other custom designs for analog/mixed-signal and digital processing applications.

In summary, through a completely open-source design and pre-tapeout validation process, the presented IC demonstrates how open-source tools and technology can be used in the modular design of chips for high-resolution and/or real-time sensing applications.

***
## Motivating Application
Compared with alternative sensing modalities, mmWave radar is an attractive vehicle for edge-based perception, given its suitability for 3D imaging and relative immunity to adverse environmental conditions. Although the proposed mixed-signal edge accelerator IC is designed specifically for integration into multi-sensor mmWave RF systems, the circuit building blocks and design methodology may be applied to any high-resolution and/or real-time sensing application.

Any reliable system design must address one issue that is inherent in any active intelligent mmWave edge device and detrimental to signal reception: the motion of the sensor itself. This issue arises, for instance, in automotive vehicles, UAVs, and wearable devices, as well as in other motion tracking applications. To account for such arbitrary or deterministic parasitic motion, the sensor must be capable of filtering and deconvolving the RX returns to generate a noise-free signal. Here, we consider the effect of vibration on the radar platform; sensors mounted on an automotive engine, for instance, would be highly susceptible to such motion. The designed chip interfaces directly with an auxiliary analog accelerometer/IMU, which serves to detect the frequency and amplitude of the platform vibration. Real-time on-chip sensor fusion, on time intervals of less than 100 ms (corresponding to the mmWave radar frame-level time scale), is implemented to generate a frequency-domain deconvolution kernel/filter that may subsequently be applied off-chip to the simultaneously collected radar data, yielding the final corrected signal returns.

***
## Open-Source Design Repository

The open-source GitHub design repository for the chip can be found at: https://github.com/nhpoole/mixed_signal_mmwave_edge_accelerator. This repository consists of the following directory structure:

***
1. **designs/** : All analog schematics for the chip.
2. **digital_synth_pnr/** : The synthesis, place and route, and layout generation flows for the digital portions of the chip.
3. **images/** : Screenshots of the final chip, including the analog subsystem.
4. **layouts/** : Layouts for all the analog blocks.
5. **matlab_scripts/** : Post-processing and simulation MATLAB scripts for the various analog and digital designs.
6. **saved_gds_files/** : The final GDS files for the chip.
7. **simulations/** : SPICE design files and SPICE testbench files for the analog designs.
8. **verilog/** : RTL for the digital designs.
***

***
## Chip Architecture

The top-level architecture for the designed chip is shown below:

![chip_architecture](High_Level_Architecture.png)

***
### Analog Front-End

As depicted in the figure, the designed IC takes as its input the highpass-filtered, single-channel, x-axis acceleration signal (in the voltage domain) from an analog accelerometer. An input amplifier chain, consisting of a single-ended-to-differential converter and a programmable-gain amplifier, converts the input into a differential signal and provides four gain settings (0.5x, 1x, 2x, and 4x), programmable via two digital input bits that switch capacitors in or out of the feedback network of the differential OTA-based amplifier. The OTA itself is implemented as a folded-cascode architecture with common-mode feedback.

Filtering of the raw accelerometer input signal is achieved via an active lowpass $G_m$-$C$ biquad filter, with the individual transconductor stages implemented as simple single-stage differential OTAs with common-mode feedback. Off-chip filter capacitors on the order of nF yield a corner frequency of approximately 500 Hz, thereby removing noise components above the low-frequency vibration band. The filtered acceleration signal is then distributed to the independent frequency and amplitude processing chains to assess the nature of the vibration.

Frequency detection is performed by first converting the vibration signal to a square wave via a continuous-time comparator. The buffered single-ended output is then passed to a charge-pump-based low-frequency type-II PLL, which locks onto the vibration frequency and generates an analog voltage at the charge pump output proportional to the detected frequency. The PLL, with its off-chip $RC$ loop filter, is designed for a worst-case 1-percent settling time of less than 500 ms, corresponding to the maximum-magnitude input frequency step from 0 Hz to 64 Hz. In a typical platform motion scenario, in which input frequency deviations are not nearly as large or abrupt, the PLL should therefore settle sufficiently fast.

Meanwhile, the amplitude processing chain consists of a simple differential-to-single-ended converter followed by a peak detector circuit, whose output voltage corresponds to the maximum signal level reached during the given sampling interval. The analog output voltages of the PLL charge pump and the peak detector are then sampled and digitized by separate 8-bit SAR ADCs, each built from a capacitive DAC and a latched comparator, with the separate synchronous ADC controllers synthesized in the digital domain.

The detailed analog circuitry for these blocks is shown in the figures below.

![analog_circuitry](Analog_Circuitry.png)
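
The successive-approximation search performed by each 8-bit SAR ADC can be illustrated with a short behavioral model. This is only an idealized sketch, not the chip's synthesized controller: the function name `sar_adc_convert` and the 1.8 V reference voltage are assumptions made for illustration, and comparator offset, noise, and capacitor mismatch are ignored.

```python
def sar_adc_convert(v_in, v_ref=1.8, n_bits=8):
    """Return the n_bits-wide code found by an ideal successive-approximation search.

    v_ref = 1.8 V is an assumed full-scale value, not taken from the design.
    """
    code = 0
    # Test bits from MSB to LSB, keeping each bit whose DAC level
    # does not exceed the sampled input voltage.
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)
        v_dac = v_ref * trial / (1 << n_bits)  # ideal capacitive-DAC output
        if v_in >= v_dac:                      # latched-comparator decision
            code = trial
    return code


if __name__ == "__main__":
    for v in (0.0, 0.45, 1.0, 1.8):
        print(f"{v:.2f} V -> code {sar_adc_convert(v)}")
```

Each conversion takes one comparator decision per bit, which is why an 8-bit result needs only eight DAC/comparator cycles after sampling.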

***
### Digital Back-End

Meanwhile, the custom digital deconvolution generator engine takes as inputs the ADC-digitized frequency and amplitude values and generates a frequency-domain transfer function vector of length $N$ representing the deconvolution filter needed to correct the vibration-impaired radar data, where $N$ corresponds to the processing interval, with $N_{max}=1275$ corresponding to a maximum real-time temporal window of 95 ms. The top-level architecture of the deconvolution kernel generator engine, written in Verilog, is shown below.

![dsp_engine](Top_Level_Digital_Processing_Unit.png)

The estimated deconvolution kernel takes the form of a second-order IIR notch filter, which contains notches at the phase bins corresponding to the detected vibration frequency, with quality factor dependent upon the detected vibration amplitude. Estimation of the appropriate transfer function is implemented by a lookup-based filter coefficient determination procedure, while evaluation of the transfer function at each stored complex frequency/phase is executed by the three-stage pipeline shown below:

![evaluation_pipeline](IIR_Notch_Filter_Estimator.png)

This three-stage pipeline, in combination with the lookup-based coefficient determination algorithm, greatly accelerates kernel generation, allowing for real-time operation at short time scales.
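
As a floating-point reference for what the kernel generator computes, the sketch below evaluates a textbook second-order IIR notch filter at a set of phase bins and returns magnitude/phase pairs. It does not reproduce the chip's lookup-based coefficient selection or its fixed-point pipeline. The function names, the use of the pole radius `r` as a stand-in for the amplitude-dependent quality factor, and the evenly spaced phase bins are all assumptions made for illustration.

```python
import cmath
import math


def notch_response(w, w0, r):
    """Evaluate a textbook second-order IIR notch at digital frequency w (rad/sample).

    H(z) = (1 - 2cos(w0) z^-1 + z^-2) / (1 - 2 r cos(w0) z^-1 + r^2 z^-2)
    Returns (magnitude, phase in radians). Requires 0 <= r < 1.
    """
    z_inv = cmath.exp(-1j * w)
    num = 1 - 2 * math.cos(w0) * z_inv + z_inv ** 2
    den = 1 - 2 * r * math.cos(w0) * z_inv + (r * z_inv) ** 2
    h = num / den
    return abs(h), cmath.phase(h)


def kernel_vector(n_bins, w0, r):
    """Sample the response at n_bins phases evenly spaced over [0, 2*pi)."""
    return [notch_response(2 * math.pi * k / n_bins, w0, r) for k in range(n_bins)]


if __name__ == "__main__":
    # Hypothetical values: notch at pi/4, fairly narrow (r close to 1).
    for k, (mag, phase) in enumerate(kernel_vector(8, math.pi / 4, 0.95)):
        print(f"bin {k}: |H| = {mag:.4f}, phase = {phase:+.4f} rad")
```

A pole radius closer to 1 narrows the notch (higher quality factor), while a smaller radius widens it.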

***
## Design Procedure

The design of the chip was implemented using the following primary open-source tools:

***
1. **Xschem:** Used to draw all analog circuit schematics (individual and top-level) and to generate the corresponding SPICE netlists.
2. **Ngspice:** Used to simulate and verify correct operation of all analog circuits (individual and top-level). Post-processing of SPICE output files was implemented in MATLAB.
3. **Magic:** Used to implement the manual layout of all analog circuit blocks and for all DRC checks. Also used to integrate the final analog layout with the synthesized digital back-end LEF file for overall evaluation.
4. **Netgen:** Used for implementing LVS for all analog circuit blocks (individual and top-level) and for the top-level mixed-signal design.
5. **Mflowgen + Cadence Innovus**: Used to create a modular procedure for digital back-end logic synthesis, floorplanning, power planning, clock tree synthesis, place-and-route, and eventual signoff timing checks. Also used for merging the final analog and digital GDS files to obtain the final mixed-signal design.
***

To begin with, all necessary tools, PDKs, and dependencies must be installed. Additional needed tools such as *Xschem* and *Mflowgen* may be cloned and installed using the respective open-source GitHub repositories.

In [None]:
# Install necessary tools and dependencies.

import os
import pathlib
import sys

!pip install matplotlib pandas pyinstaller
!apt-get install -y ruby-full time build-essential
!apt install -f libqt4-designer libqt4-xml libqt4-sql libqt4-network libqtcore4 libqtgui4
!curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
conda_prefix_path = pathlib.Path('conda-env')
site_package_path = conda_prefix_path / 'lib/python3.7/site-packages'
sys.path.append(str(site_package_path.resolve()))
CONDA_PREFIX = str(conda_prefix_path.resolve())
PATH = os.environ['PATH']
LD_LIBRARY_PATH = os.environ.get('LD_LIBRARY_PATH', '')
%env CONDA_PREFIX={CONDA_PREFIX}
%env PATH={CONDA_PREFIX}/bin:{PATH}
%env LD_LIBRARY_PATH={CONDA_PREFIX}/lib:{LD_LIBRARY_PATH}
!bin/micromamba create --yes --prefix $CONDA_PREFIX
!echo 'python ==3.7*' >> {CONDA_PREFIX}/conda-meta/pinned
!bin/micromamba install --yes --prefix $CONDA_PREFIX \
                        --channel litex-hub \
                        --channel main \
                        open_pdks.sky130a \
                        magic \
                        netgen \
                        openroad \
                        yosys
!bin/micromamba install --yes --prefix $CONDA_PREFIX \
                        --channel conda-forge \
                        tcllib gdstk pyyaml click svgutils ngspice

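After the installation cell completes, it can be useful to confirm that the tools are actually visible on `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `find_tools` is hypothetical, and the executable names are assumed to match the packages installed above.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Executable names assumed to match the packages installed above.
    tools = ["magic", "netgen", "ngspice", "yosys", "openroad"]
    for tool, path in find_tools(tools).items():
        print(f"{tool:10s} {path or 'NOT FOUND'}")
```
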
***
### Analog Schematic Capture and SPICE Netlist Generation Via *Xschem*

As mentioned above, all custom circuit schematics were created using the open-source *Xschem* platform, which generates the SPICE netlists corresponding to the drawn circuits. Screenshots of several key analog circuit blocks are shown in the figure below; not all circuits or testbenches are shown, but the remainder may be found in the open-source design repository for the chip.

![xschem_schematics](Xschem_Schematics.png)

Full integration of the analog circuitry yields the following top-level analog front-end, drawn in *Xschem*:

![analog_top_level_schematic](Analog_Top_Level_Schematic.png)

***
### Analog Front-End Layout and DRC Via *Magic* + LVS and Parasitics Extraction Via *Netgen*

Layouts of all analog front-end circuits were implemented manually using *Magic*, with several aspects of the layout scripted for convenient reuse across cells (e.g., guard ring generation, $n$-well placement, power rail generation, etc.). *Magic* also provides built-in DRC checks using the SkyWater 130-nm PDK design rules. Layouts of the key circuit blocks are shown below; again, not all layouts are included for conciseness but may be found in the open-source repository for the chip. Note that all power rails are routed on the top two metal layers (M4 and M5), thereby maintaining consistency with the digital back-end power rail generation for convenient eventual mixed-signal integration. Furthermore, all individual circuit blocks are implemented using standard techniques such as common-centroid layout and interdigitation to yield robustness to local wafer variation.

![magic_layouts](Magic_Layouts.jpg)

As described previously, script-automated layout was implemented for guard rings, power rails, and (deep) $n$-wells. Furthermore, scripts were used to perform LVS using *Netgen* and for parasitics extraction. A specific set of example scripts used for the SAR ADC latched comparator layout, LVS, and parasitic extraction is shown below, but similar duplicated (and appropriately adjusted) script sets were used for all other analog blocks as well.

#### Power rail, guard ring, and well generation example TCL script for SAR ADC latched comparator:

In [None]:
# Set up power rails, guard rings, wells, etc.

proc shift_to_center {} {
	set res1 [box size]
	move [expr {-[lindex $res1 0] / 2}]i [expr {-[lindex $res1 1] / 2}]i
}

proc place_nmos {x_center y_center width length nf index} {
	puts $x_center
	select clear
	box [expr $x_center]um [expr $y_center]um [expr $x_center]um [expr $y_center]um  
	magic::gencell sky130::sky130_fd_pr__nfet_01v8 [format "xm%d" $index] w $width l $length nf $nf m 1 diffcov 100 polycov 60 poverlap 0 doverlap 1 topc 1 botc 1 guard 0 full_metal 0 viagate 50
	shift_to_center
}

proc place_pmos {x_center y_center width length nf index} {
	select clear
	box [expr $x_center]um [expr $y_center]um [expr $x_center]um [expr $y_center]um  
	magic::gencell sky130::sky130_fd_pr__pfet_01v8_lvt [format "xm%d" $index] w $width l $length nf $nf m 1 diffcov 100 polycov 60 poverlap 0 doverlap 1 topc 1 botc 1 guard 0 full_metal 0 viagate 50
	shift_to_center
}

# Draw guard ring for 01V8_lvt PMOS.
proc draw_nguard {lx ly ux uy} {

	set center_x [expr ($lx + $ux)/2]
	set center_y [expr ($ly + $uy)/2]

	box [expr $center_x]um [expr $center_y]um [expr $center_x]um [expr $center_y]um
	pushbox
	# Load dict.

	set parameters [sky130::sky130_fd_pr__pfet_01v8_lvt_defaults]
	# Param dict copied from sky130::sky130_fd_pr__pfet_01v8_lvt_draw.
	set newdict [dict create \
	    gate_type		pfetlvt \
	    diff_type 		pdiff \
	    diff_contact_type	pdc \
	    plus_diff_type	nsd \
	    plus_contact_type	nsc \
	    poly_type		poly \
	    poly_contact_type	pc \
	    sub_type		nwell \
	    dev_sub_type	nwell \
	    gate_to_polycont	0.32 \
	    min_effl		0.185 \
	    min_allc		0.26 \
	]
	set drawdict [dict merge $sky130::ruleset $newdict $parameters]
	dict set drawdict viagb 100
	dict set drawdict viagt 100
	dict set drawdict viagr 90
	dict set drawdict viagl 90
	dict set drawdict contact_size 0.5
	dict set drawdict via_size 0.5
	dict set drawdict full_metal 1

	set contact_size [dict get $drawdict contact_size]
	set diff_surround [dict get $drawdict diff_surround]
	set sub_surround [dict get $drawdict sub_surround]

	# Calculate guard ring width (gw) and height (gh).

	set gw [expr ($ux-$lx - ($contact_size + $diff_surround + $diff_surround + $sub_surround + $sub_surround) - 0.3)]
	set gh [expr ($uy-$ly - ($contact_size + $diff_surround + $diff_surround + $sub_surround + $sub_surround) - 0.3)]


	# sky130::guard_ring $gw $gh $drawdict
    
	# Finish painting metal.
	# box [expr $center_x - $gw/2 - $contact_size/2 - 0.03]um [expr $center_y - $gh/2 - $contact_size/2 - 0.03]um [expr $center_x - $gw/2 + $contact_size/2 + 0.03]um [expr $center_y + $gh/2 + $contact_size/2 + 0.03]um
	# paint m1
	# box [expr $center_x + $gw/2 - $contact_size/2 - 0.03]um [expr $center_y - $gh/2 - $contact_size/2 - 0.03]um [expr $center_x + $gw/2 + $contact_size/2 + 0.03]um [expr $center_y + $gh/2 + $contact_size/2 + 0.03]um
	# paint m1

	# Connection to m4 power rail.
	# 3x1.5 via in the corners
	# Top-right via (the box sits at the upper edge of the ring); the top-left via follows.
	box [expr $center_x + $gw/2 - $contact_size/2 - 0.03 -3]um [expr $center_y + $gh/2 - $contact_size/2 - 0.03 - 1.5]um [expr $center_x + $gw/2 - $contact_size/2 - 0.03]um [expr $center_y + $gh/2 - $contact_size/2 - 0.03]um
	sky130::via1_draw
	sky130::via2_draw
	sky130::via3_draw

	box [expr $center_x - $gw/2 + $contact_size/2 + 0.03]um [expr $center_y + $gh/2 - $contact_size/2 - 0.03 - 1.5]um [expr $center_x - $gw/2 + $contact_size/2 + 0.03 + 3]um [expr $center_y + $gh/2 - $contact_size/2 - 0.03]um
	sky130::via1_draw
	sky130::via2_draw
	sky130::via3_draw
}

# Draw guard ring for 01V8 NMOS.
# If you place pguard and nguard too close, things get oddly shifted.
proc draw_pguard {lx ly ux uy} {

	set center_x [expr ($lx + $ux)/2]
	set center_y [expr ($ly + $uy)/2]

	box [expr $center_x]um [expr $center_y]um [expr $center_x]um [expr $center_y]um
	pushbox
	# Load dict.

	set parameters [sky130::sky130_fd_pr__nfet_01v8_defaults]
	# Param dict copied from sky130::sky130_fd_pr__nfet_01v8_draw.
	set newdict [dict create \
	    gate_type		nfet \
	    diff_type 		ndiff \
	    diff_contact_type	ndc \
	    plus_diff_type	psd \
	    plus_contact_type	psc \
	    poly_type		poly \
	    poly_contact_type	pc \
	    sub_type		psub \
	    min_effl		0.185 \
	    min_allc		0.26 \
    ]
	set drawdict [dict merge $sky130::ruleset $newdict $parameters]
	dict set drawdict viagb 100
	dict set drawdict viagt 100
	dict set drawdict viagr 90 
	dict set drawdict viagl 90
	dict set drawdict contact_size 0.5
	dict set drawdict via_size 0.5
	dict set drawdict full_metal 1

	set contact_size [dict get $drawdict contact_size]
	set diff_surround [dict get $drawdict diff_surround]
	set sub_surround [dict get $drawdict sub_surround]

	# Calculate guard ring width (gw) and height (gh).

	set gw [expr ($ux-$lx - ($contact_size + $diff_surround + $diff_surround + $sub_surround + $sub_surround) - 0.3)]
	set gh [expr ($uy-$ly - ($contact_size + $diff_surround + $diff_surround + $sub_surround + $sub_surround) - 0.3)]

	# sky130::guard_ring $gw $gh $drawdict

	# Finish painting metal.
	# box [expr $center_x - $gw/2 - $contact_size/2 - 0.03]um [expr $center_y - $gh/2 - $contact_size/2 - 0.03]um [expr $center_x - $gw/2 + $contact_size/2 + 0.03]um [expr $center_y + $gh/2 + $contact_size/2 + 0.03]um
	# paint m1
	# box [expr $center_x + $gw/2 - $contact_size/2 - 0.03]um [expr $center_y - $gh/2 - $contact_size/2 - 0.03]um [expr $center_x + $gw/2 + $contact_size/2 + 0.03]um [expr $center_y + $gh/2 + $contact_size/2 + 0.03]um
	# paint m1

	# Connection to m4 power rail.
	# 3x1.5 via in the corners
	# Bottom right via.
	box [expr $center_x + $gw/2 - $contact_size/2 - 0.03 -3]um [expr $center_y - $gh/2 + $contact_size/2 + 0.03]um [expr $center_x + $gw/2 - $contact_size/2 - 0.03]um [expr $center_y - $gh/2 + $contact_size/2 + 0.03 + 1.5]um
	sky130::via1_draw
	sky130::via2_draw
	sky130::via3_draw
    
	box [expr $center_x - $gw/2 + $contact_size/2 + 0.03]um [expr $center_y - $gh/2 + $contact_size/2 + 0.03]um [expr $center_x - $gw/2 + $contact_size/2 + 0.03 + 3]um [expr $center_y - $gh/2 + $contact_size/2 + 0.03 + 1.5]um
	sky130::via1_draw
	sky130::via2_draw
	sky130::via3_draw
}

# Define cell boundaries.
# Variable cell width and height.

set cell_lx -17
set cell_ux 16.2

set cell_ly -16
set cell_uy 12

set nguard_lx [expr $cell_lx]
set nguard_ux [expr $cell_ux]
set nguard_uy [expr $cell_uy]
set pguard_lx [expr $cell_lx]
set pguard_ly [expr $cell_ly]
set pguard_ux [expr $cell_ux]
set np_boundary_y -0.4

set power_rail_width 4
set power_rail_layers "m4"

set vdd_rail_name "VDD"
set vss_rail_name "VSS"

# Draw guard ring for PFETs.
draw_nguard [expr $nguard_lx] [expr $np_boundary_y + 0.2] [expr $nguard_ux] [expr $nguard_uy]

# Draw guard ring for NFETs.
draw_pguard [expr $pguard_lx] [expr $pguard_ly] [expr $pguard_ux] [expr $np_boundary_y - 0.2]


# Place power rails. This should not need to be modified.

# Top power rail.
# lx ly ux uy
box [expr $cell_lx]um [expr $cell_uy - $power_rail_width]um [expr $cell_ux]um [expr $cell_uy]um
paint $power_rail_layers
# Label rail.
# box [expr $cell_lx]um [expr $cell_uy - $power_rail_width]um [expr $cell_lx]um [expr $cell_uy]um
# label $vdd_rail_name FreeSans 200

# Bottom power rail.
box [expr $cell_lx]um [expr $cell_ly]um [expr $cell_ux]um [expr $cell_ly + $power_rail_width]um
paint $power_rail_layers
# Label rail.
# box [expr $cell_lx]um [expr $cell_ly]um [expr $cell_lx]um [expr $cell_ly + $power_rail_width]um
# label $vss_rail_name FreeSans 200

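The rectangle arithmetic at the end of the script above can be checked outside of *Magic* with a few lines of Python. This sketch simply mirrors the `box` expressions for the two power rails and the regions passed to `draw_nguard`/`draw_pguard`. The function names are hypothetical, and the 0.2 um gap and cell dimensions are copied from the comparator script.

```python
def power_rail_boxes(cell_lx, cell_ly, cell_ux, cell_uy, rail_width):
    """Return (top, bottom) rail rectangles as (lx, ly, ux, uy) in microns."""
    top = (cell_lx, cell_uy - rail_width, cell_ux, cell_uy)
    bottom = (cell_lx, cell_ly, cell_ux, cell_ly + rail_width)
    return top, bottom


def guard_ring_regions(cell_lx, cell_ly, cell_ux, cell_uy, np_boundary_y, gap=0.2):
    """Return (n_guard, p_guard) bounding boxes on either side of the N/P boundary."""
    n_guard = (cell_lx, np_boundary_y + gap, cell_ux, cell_uy)  # around the PFETs
    p_guard = (cell_lx, cell_ly, cell_ux, np_boundary_y - gap)  # around the NFETs
    return n_guard, p_guard


if __name__ == "__main__":
    # Values copied from the comparator script above.
    top, bottom = power_rail_boxes(-17, -16, 16.2, 12, 4)
    n_guard, p_guard = guard_ring_regions(-17, -16, 16.2, 12, -0.4)
    print("top rail    :", top)
    print("bottom rail :", bottom)
    print("n-guard box :", n_guard)
    print("p-guard box :", p_guard)
```
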
#### LVS execution example TCL script for SAR ADC latched comparator:

In [None]:
# Run netgen LVS.

exec netgen -batch lvs "../../simulations/latched_comparator_folded_lvs.spice latched_comparator_folded_lvs" "latched_comparator_folded_lvs_layout.spice latched_comparator_folded_flat" $PDKPATH/libs.tech/netgen/sky130A_setup.tcl > netgen.log 2> netgen_err.log

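When the same LVS check must be repeated for many cells, the command line above can be assembled programmatically. The sketch below builds the same argument list in Python using only the arguments that appear in the Tcl script. The helper names `netgen_lvs_command` and `run_lvs` are hypothetical, and the `"."` fallback for `PDKPATH` is a placeholder rather than a real PDK location.

```python
import os
import subprocess


def netgen_lvs_command(schematic, schematic_cell, layout, layout_cell, setup_file):
    """Build the argv list for a batch-mode netgen LVS comparison."""
    return [
        "netgen", "-batch", "lvs",
        f"{schematic} {schematic_cell}",   # "<netlist> <cell>" passed as one argument
        f"{layout} {layout_cell}",
        setup_file,
    ]


def run_lvs(cmd, log="netgen.log", err_log="netgen_err.log"):
    """Run the command, redirecting stdout/stderr to log files as the Tcl script does."""
    with open(log, "w") as out, open(err_log, "w") as err:
        return subprocess.run(cmd, stdout=out, stderr=err).returncode


# PDKPATH is read from the environment; the "." fallback is only a placeholder.
setup = os.path.join(os.environ.get("PDKPATH", "."),
                     "libs.tech", "netgen", "sky130A_setup.tcl")
cmd = netgen_lvs_command(
    "../../simulations/latched_comparator_folded_lvs.spice",
    "latched_comparator_folded_lvs",
    "latched_comparator_folded_lvs_layout.spice",
    "latched_comparator_folded_flat",
    setup,
)
print(" ".join(cmd))
```

Calling `run_lvs(cmd)` would execute the comparison; it is left uncalled here so that the sketch runs without *Netgen* installed.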
#### Parasitic extraction example TCL script for SAR ADC latched comparator:

In [None]:
# LVS and parasitic extraction.

set vdd_rail_name "VDD"
set vss_rail_name "VSS"

load latched_comparator_folded.mag
box 0 0 0 0
flatten latched_comparator_folded_flat
load latched_comparator_folded_flat
box 0 0 0 0

# Power ports.
findlabel $vdd_rail_name
port make 
port use power
port class inout
findlabel $vss_rail_name
port make
port use ground
port class inout

# Cell-specific ports.
findlabel vip
port make
findlabel vim
port make
findlabel vop
port make
findlabel vom
port make
findlabel clk
port make
findlabel ibiasp
port make

port vip index 1
port vim index 2
port vop index 3
port vom index 4
port clk index 5
port ibiasp index 6
port $vdd_rail_name index 7
port $vss_rail_name index 8
save latched_comparator_folded_flat.mag

extract all
ext2spice lvs
ext2spice -o latched_comparator_folded_lvs_layout.spice

extract all
ext2sim labels on
ext2sim
extresist tolerance 10
extresist
ext2spice lvs
ext2spice cthresh 0.01
# ext2spice extresist on
ext2spice -o latched_comparator_folded_pex.spice

Full integration of the analog circuit layouts yields the following complete top-level analog front-end layout, both DRC- and LVS-verified. The top figure below represents the overall layout, while the bottom figure labels the key circuit blocks in their respective locations.
***
*Top-level analog front-end layout (unlabeled):*
![analog_top_level_layout_unlabeled](Analog_Top_Level_Layout_Unlabeled.png)

***
*Top-level analog front-end layout (labeled):*
![analog_top_level_layout_labeled](Analog_Top_Level_Layout_Labeled.jpg)

***
### Analog Front-End Simulations and Verification Via *Ngspice*

Verification of all analog circuits (individual and top-level chains) was implemented using the open-source *Ngspice* simulator, with post-processing visualization done via MATLAB. Example simulation plots are shown below for the various sub-circuits and may be regenerated via the "*spice_post_process.m*" MATLAB script in the GitHub design repository ("*/matlab_scripts*" directory), though the appropriate SPICE simulation data would first need to be generated using the simulation SPICE files in the "*/simulations*" directory.

![ngspice_simulations_1](Ngspice_Simulations_1.png)
![ngspice_simulations_2](Ngspice_Simulations_2.png)

***
### Digital Back-End Deconvolution Kernel Estimator: Generation Via *Mflowgen* + *Cadence Innovus*

Design of the custom DSP back-end deconvolution kernel generator engine was implemented fully in Verilog, with all source code files, including testbenches for the individual sub-modules, located in the "*/verilog*" directory of the GitHub design repository. Testbench files are indicated by the "*_tb*" suffix. Any testbench may be executed using the Makefile located in the same directory (which invokes Synopsys VCS and DVE), reproduced below:

In [None]:
# Makefile for digital blocks verification.

debug_iir: run_iir
	dve -full64 -vpd dump_iir.vcd &

run_iir: compile_iir
	./simv

compile_iir: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp iir_notch_filter_tb.v iir_notch_filter.v cordic_polar_to_rect.v cordic_rect_to_polar.v fixed_pt_div.v

debug_iir_debug: run_iir_debug
	dve -full64 -vpd dump_iir_debug.vcd &

run_iir_debug: compile_iir_debug
	./simv

compile_iir_debug: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp iir_notch_filter_debug_tb.v iir_notch_filter_debug.v cordic_polar_to_rect.v cordic_rect_to_polar.v fixed_pt_div.v

debug_adc: run_adc
	dve -full64 -vpd dump_adc.vcd &

run_adc: compile_adc
	./simv

compile_adc: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp sar_adc_controller_tb.v sar_adc_controller.v

debug_sram: run_sram
	dve -full64 -vpd dump_sram.vcd &

run_sram: compile_sram
	./simv

compile_sram: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp ram_sync_1rw1r_tb.v ram_sync_1rw1r.v sky130_sram_2kbyte_1rw1r_32x512_8.v sky130_sram_4kbyte_1rw1r_32x1024_8.v sky130_sram_8kbyte_1rw1r_32x2048_8.v

debug_input_sram_interface: run_input_sram_interface
	dve -full64 -vpd dump_input_sram_interface.vcd &

run_input_sram_interface: compile_input_sram_interface
	./simv

compile_input_sram_interface: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp input_sram_interface_tb.v phase_vec_sram_interface.v tf_coeff_sram_interface.v deserializer.v serializer.v ram_sync_1rw1r.v sky130_sram_2kbyte_1rw1r_32x512_8.v sky130_sram_4kbyte_1rw1r_32x1024_8.v sky130_sram_8kbyte_1rw1r_32x2048_8.v

debug_output_sram_interface: run_output_sram_interface
	dve -full64 -vpd dump_output_sram_interface.vcd &

run_output_sram_interface: compile_output_sram_interface
	./simv

compile_output_sram_interface: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp output_sram_interface_tb.v deconv_kernel_magnitude_sram_interface.v deconv_kernel_phase_sram_interface.v serializer.v ram_sync_1rw1r.v sky130_sram_2kbyte_1rw1r_32x512_8.v sky130_sram_4kbyte_1rw1r_32x1024_8.v sky130_sram_8kbyte_1rw1r_32x2048_8.v

debug_top_level: run_top_level
	dve -full64 -vpd dump_deconv_kernel_estimator_top_level.vcd &

run_top_level: compile_top_level
	./simv

compile_top_level: 
	vcs -full64 -sverilog -timescale=1ns/1ps -debug_access+pp deconv_kernel_estimator_top_level_tb.v deconv_kernel_estimator_top_level.v iir_notch_filter.v cordic_polar_to_rect.v cordic_rect_to_polar.v fixed_pt_div.v deconv_kernel_magnitude_sram_interface.v deconv_kernel_phase_sram_interface.v phase_vec_sram_interface.v tf_coeff_sram_interface.v deserializer.v serializer.v ram_sync_1rw1r.v sky130_sram_2kbyte_1rw1r_32x512_8.v sky130_sram_4kbyte_1rw1r_32x1024_8.v sky130_sram_8kbyte_1rw1r_32x2048_8.v

clean:
	rm -rf ./simv
	rm -rf simv.daidir/ 
	rm -rf *.vcd
	rm -rf csrc
	rm -rf ucli.key
	rm -rf vc_hdrs.h
	rm -rf DVEfiles

As described briefly above, the open-source *Mflowgen* ASIC design flow generator (https://github.com/mflowgen/mflowgen) was used to create a modular procedure for digital back-end logic synthesis, floorplanning, power planning, clock tree synthesis, place-and-route, and eventual signoff timing checks. This also includes the fully digital SAR ADC controllers.
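
Conceptually, such a flow is a directed acyclic graph in which each step runs only after the steps it depends on. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9+). It is not *Mflowgen*'s API, and the step names and edges in `FLOW` are a simplified, hypothetical subset chosen for illustration.

```python
from graphlib import TopologicalSorter

# Each key maps a step to the set of steps it depends on (illustrative edges only).
FLOW = {
    "synthesis": {"rtl", "constraints"},
    "floorplan": {"synthesis"},
    "power": {"floorplan"},
    "place": {"power"},
    "cts": {"place"},
    "route": {"cts"},
    "signoff": {"route"},
}


def execution_order(flow):
    """Return one valid run order in which every step follows its dependencies."""
    return list(TopologicalSorter(flow).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(FLOW)))
```

Keeping the dependencies explicit is what allows an individual step (for example, power planning) to be swapped for a custom version without touching the rest of the flow.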

To set up the design generation flow, a "*construct.py*" Python file is required, containing details on the individual design steps, input files needed for various steps, and the overall implementation procedural map. An example "*construct.py*" file is shown below for the primary DSP deconvolution kernel generator engine, and is located in the "*/digital_synth_pnr/deconvolution_kernel_generator_2ksram/design*" directory.

In [None]:
#! /usr/bin/env python
#=========================================================================
# construct.py
#=========================================================================
# Top-Level Deconvolution Kernel Estimator
#
# Author : Nikhil Poole
# Date   : July 9, 2021
#

import os
import sys

from mflowgen.components import Graph, Step

def construct():

  g = Graph()

  #-----------------------------------------------------------------------
  # Parameters
  #-----------------------------------------------------------------------

  adk_name = 'skywater-130nm-adk'
  adk_view = 'view-standard'

  parameters = {
    'construct_path' : __file__,
    'design_name'    : 'deconv_kernel_estimator_top_level',
    'clock_period'   : 20.0,
    'adk'            : adk_name,
    'adk_view'       : adk_view,
    'topographical'  : True,
    'testbench_name' : 'deconv_kernel_estimator_top_level_tb',
    'strip_path'     : 'deconv_kernel_estimator_top_level_tb/deconv_kernel_estimator_top_level_inst',
    'saif_instance'  : 'deconv_kernel_estimator_top_level_tb/deconv_kernel_estimator_top_level_inst'
  }

  #-----------------------------------------------------------------------
  # Create nodes
  #-----------------------------------------------------------------------

  this_dir = os.path.dirname( os.path.abspath( __file__ ) )

  # ADK step

  g.set_adk( adk_name )
  adk = g.get_adk_step()

  # Custom steps
  #-----------------------------------------------------------------------
  sram            = Step( this_dir + '/sram'                )
  rtl             = Step( this_dir + '/rtl'                 )
  testbench       = Step( this_dir + '/testbench'           )
  constraints     = Step( this_dir + '/constraints'         )
  rtl_sim         = Step( this_dir + '/cadence-xcelium-sim' )
  rtl_sim_vcs     = Step( this_dir + '/synopsys-vcs-sim'    )
  pin_placement   = Step( this_dir + '/pin-placement'       )
  floorplan       = Step( this_dir + '/floorplan'           )
  syn_compile     = Step( this_dir + '/synopsys-dc-compile' )

  # Power node is custom because power and gnd pins are named differently in
  # the standard cells compared to the default node, and the layer numbering is
  # different because of li layer, the default assumes metal 1 is the lowest
  # layer.
  power           = Step( this_dir + '/cadence-innovus-power'           ) 

  # Signoff is custom because it has to output def that the default step does
  # not do. This is because we use the def instead of gds for generating spice
  # from layout for LVS.
  signoff         = Step( this_dir + '/cadence-innovus-signoff'         ) 
  
  pt_power_rtl    = Step( this_dir + '/synopsys-ptpx-rtl'               )

  magic_drc       = Step( this_dir + '/open-magic-drc'                  )
  magic_def2spice = Step( this_dir + '/open-magic-def2spice'            )
  magic_gds2spice = Step( this_dir + '/open-magic-gds2spice'            )
  magic_gds2spice_nobbox = Step( this_dir + '/open-magic-gds2spice-nobbox' )
  netgen_lvs_def  = Step( this_dir + '/open-netgen-lvs-def-spice'       )
  netgen_lvs_def.set_name('netgen-lvs-def')
  
  netgen_lvs_gds  = Step( this_dir + '/open-netgen-lvs'                 )
  netgen_lvs_gds.set_name('netgen-lvs-gds')

  calibre_lvs     = Step( this_dir + '/mentor-calibre-comparison'       )
  calibre_lvs_nobbox = Step( this_dir + '/mentor-calibre-comparison-nobbox' )
  dont_use_cells  = Step( this_dir + '/cadence-innovus-dont-use-cells'  )

#  magic_antenna   = Step( this_dir + '/open-magic-antenna'              )

  # Default steps
  #-----------------------------------------------------------------------
  info            = Step( 'info',                          default=True )
  dc              = Step( 'synopsys-dc-synthesis',         default=True )
  iflow           = Step( 'cadence-innovus-flowsetup',     default=True )
  init            = Step( 'cadence-innovus-init',          default=True )
  place           = Step( 'cadence-innovus-place',         default=True )
  cts             = Step( 'cadence-innovus-cts',           default=True )
  postcts_hold    = Step( 'cadence-innovus-postcts_hold',  default=True )
  route           = Step( 'cadence-innovus-route',         default=True )
  postroute       = Step( 'cadence-innovus-postroute',     default=True )
  gdsmerge        = Step( 'mentor-calibre-gdsmerge',       default=True )
  pt_timing       = Step( 'synopsys-pt-timing-signoff',    default=True )

  icarus_sim      = Step( this_dir + '/open-icarus-simulation' )
  gl_sim          = rtl_sim_vcs.clone()
  gl_sim.set_name( 'gl-sim' )
  gen_saif_gl     = Step( 'synopsys-vcd2saif-convert',     default=True )
  gen_saif_gl.set_name( 'gen-saif-gl' )

  pt_power_gl     = Step( 'synopsys-ptpx-gl',              default=True )

  #-----------------------------------------------------------------------
  # Graph -- Add nodes
  #-----------------------------------------------------------------------

  g.add_step( info            )
  g.add_step( sram            )
  g.add_step( rtl             )
  g.add_step( testbench       )
  g.add_step( constraints     )
  g.add_step( syn_compile     )
  g.add_step( dc              )
  g.add_step( rtl_sim         )
  g.add_step( rtl_sim_vcs     )
  g.add_step( iflow           )
  g.add_step( pin_placement   )
  g.add_step( floorplan       )
  g.add_step( init            )
  g.add_step( power           )
  g.add_step( place           )
  g.add_step( cts             )
  g.add_step( postcts_hold    )
  g.add_step( route           )
  g.add_step( postroute       )
  g.add_step( signoff         )
  g.add_step( gdsmerge        )
  g.add_step( pt_timing       )
  g.add_step( pt_power_rtl    )
  g.add_step( gl_sim          )
  g.add_step( gen_saif_gl     )
  g.add_step( pt_power_gl     )
  g.add_step( magic_drc       )
  g.add_step( magic_def2spice )
  g.add_step( netgen_lvs_def  )
  g.add_step( magic_gds2spice )
  g.add_step( magic_gds2spice_nobbox )
  g.add_step( netgen_lvs_gds  )
  g.add_step( calibre_lvs     )
  g.add_step( calibre_lvs_nobbox )
  g.add_step( dont_use_cells  )

  #-----------------------------------------------------------------------
  # Graph -- Add edges
  #-----------------------------------------------------------------------

  # Dynamically add edges
 
  #dc.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8_TT_1p8V_25C.db'])
  #dc.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8.lef'])
  #dc.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8_TT_1p8V_25C.db'])
  #dc.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8.lef'])
  dc.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8_TT_1p8V_25C.db'])
  dc.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.lef'])
  #dc.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8_TT_1p8V_25C.db'])
  #dc.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8.lef'])

  rtl_sim.extend_inputs(['freq_vec_data.txt'])
  rtl_sim.extend_inputs(['tf_coeff_data.txt'])
  #rtl_sim.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8.v'])
  #rtl_sim.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8.v'])
  rtl_sim.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.v'])
  #rtl_sim.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8.v'])

  rtl_sim_vcs.extend_inputs(['freq_vec_data.txt'])
  rtl_sim_vcs.extend_inputs(['tf_coeff_data.txt'])
  #rtl_sim_vcs.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8.v'])
  #rtl_sim_vcs.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8.v'])
  rtl_sim_vcs.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.v'])
  #rtl_sim_vcs.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8.v'])

  # extend saif out of synopsys simulation
  rtl_sim_vcs.extend_outputs(['run.saif'])

  #gl_sim.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8.v'])
  #gl_sim.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8.v'])
  gl_sim.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.v'])
  #gl_sim.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8.v'])

  #pt_timing.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8_TT_1p8V_25C.db'])
  #pt_timing.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8_TT_1p8V_25C.db'])
  pt_timing.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8_TT_1p8V_25C.db'])
  #pt_timing.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8_TT_1p8V_25C.db'])

  #pt_power_rtl.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8_TT_1p8V_25C.db'])
  #pt_power_rtl.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8_TT_1p8V_25C.db'])
  pt_power_rtl.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8_TT_1p8V_25C.db'])
  #pt_power_rtl.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8_TT_1p8V_25C.db'])

  #pt_power_gl.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8_TT_1p8V_25C.db'])
  #pt_power_gl.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8_TT_1p8V_25C.db'])
  pt_power_gl.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8_TT_1p8V_25C.db'])
  #pt_power_gl.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8_TT_1p8V_25C.db'])

  gdsmerge.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.gds'])
  netgen_lvs_def.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.sp'])
  netgen_lvs_gds.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.sp'])
  calibre_lvs.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.sp'])
  calibre_lvs_nobbox.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.sp'])
  magic_drc.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8.lef'])

  for step in [iflow, init, power, place, cts, postcts_hold, route, postroute, signoff]:
  #  step.extend_inputs(['sky130_sram_8kbyte_1rw1r_32x2048_8_TT_1p8V_25C.lib', 'sky130_sram_8kbyte_1rw1r_32x2048_8.lef'])
  #  step.extend_inputs(['sky130_sram_4kbyte_1rw1r_32x1024_8_TT_1p8V_25C.lib', 'sky130_sram_4kbyte_1rw1r_32x1024_8.lef'])
    step.extend_inputs(['sky130_sram_2kbyte_1rw1r_32x512_8_TT_1p8V_25C.lib', 'sky130_sram_2kbyte_1rw1r_32x512_8.lef'])
  #  step.extend_inputs(['sky130_sram_1kbyte_1rw1r_32x256_8_TT_1p8V_25C.lib', 'sky130_sram_1kbyte_1rw1r_32x256_8.lef'])

  init.extend_inputs(['floorplan.tcl', 'pin-assignments.tcl'])
  dc.extend_inputs(['compile.tcl'])

  # Connect by name
  g.connect_by_name( rtl,          rtl_sim      ) # design.v
  g.connect_by_name( sram,         rtl_sim      ) # design.v
  g.connect_by_name( testbench,    rtl_sim      ) # testbench.v
  g.connect_by_name( rtl,          rtl_sim_vcs  ) # design.v
  g.connect_by_name( sram,         rtl_sim_vcs  ) # design.v
  g.connect_by_name( testbench,    rtl_sim_vcs  ) # testbench.sv
  
  g.connect_by_name( rtl_sim_vcs,  dc           ) # run.saif
  g.connect_by_name( rtl,          dc           )
  g.connect_by_name( adk,          dc           )
  g.connect_by_name( constraints,  dc           )
  g.connect_by_name( sram,         dc           )
  g.connect_by_name( syn_compile,  dc           )

  g.connect_by_name( adk,             testbench       )
  g.connect_by_name( adk,             iflow           )
  g.connect_by_name( adk,             init            )
  g.connect_by_name( adk,             power           )
  g.connect_by_name( adk,             place           )
  g.connect_by_name( adk,             cts             )
  g.connect_by_name( adk,             postcts_hold    )
  g.connect_by_name( adk,             route           )
  g.connect_by_name( adk,             postroute       )
  g.connect_by_name( adk,             signoff         )
  g.connect_by_name( adk,             gdsmerge        )
  g.connect_by_name( adk,             magic_drc       )
  g.connect_by_name( adk,             magic_def2spice )
  g.connect_by_name( adk,             magic_gds2spice )
  g.connect_by_name( adk,             magic_gds2spice_nobbox )
  g.connect_by_name( adk,             netgen_lvs_def  )
  g.connect_by_name( adk,             netgen_lvs_gds  )
  g.connect_by_name( adk,             calibre_lvs     )
  g.connect_by_name( adk,             calibre_lvs_nobbox )
  g.connect_by_name( adk,             pt_timing       )
  g.connect_by_name( adk,             pt_power_rtl    )
  g.connect_by_name( adk,             pt_power_gl     )

  g.connect_by_name( sram,            gl_sim         )
  g.connect_by_name( sram,            iflow           )
  g.connect_by_name( sram,            init            )
  g.connect_by_name( sram,            power           )
  g.connect_by_name( sram,            place           )
  g.connect_by_name( sram,            cts             )
  g.connect_by_name( sram,            postcts_hold    )
  g.connect_by_name( sram,            route           )
  g.connect_by_name( sram,            postroute       )
  g.connect_by_name( sram,            signoff         )
  g.connect_by_name( sram,            gdsmerge        )
  g.connect_by_name( sram,            pt_timing       )
  g.connect_by_name( sram,            pt_power_rtl    )
  g.connect_by_name( sram,            pt_power_gl     )
  g.connect_by_name( sram,            magic_def2spice )
  g.connect_by_name( sram,            magic_gds2spice )
  g.connect_by_name( sram,            magic_gds2spice_nobbox )
  g.connect_by_name( sram,            netgen_lvs_def  )
  g.connect_by_name( sram,            netgen_lvs_gds  )
  g.connect_by_name( sram,            calibre_lvs     )
  g.connect_by_name( sram,            calibre_lvs_nobbox )
  g.connect_by_name( sram,            magic_drc       )

  g.connect_by_name( dc,              iflow           )
  g.connect_by_name( dc,              init            )
  g.connect_by_name( dc,              power           )
  g.connect_by_name( dc,              place           )
  g.connect_by_name( dc,              cts             )
  g.connect_by_name( dc,              pt_power_rtl    ) # design.namemap

  g.connect_by_name( iflow,           init            )
  g.connect_by_name( iflow,           power           )
  g.connect_by_name( iflow,           place           )
  g.connect_by_name( iflow,           cts             )
  g.connect_by_name( iflow,           postcts_hold    )
  g.connect_by_name( iflow,           route           )
  g.connect_by_name( iflow,           postroute       )
  g.connect_by_name( iflow,           signoff         )
  
  # Core place and route flow
  g.connect_by_name( floorplan,       init            )
  g.connect_by_name( pin_placement,   init            )
  g.connect_by_name( init,            power           )
  g.connect_by_name( power,           place           )
  g.connect_by_name( place,           cts             )
  g.connect_by_name( cts,             postcts_hold    )
  g.connect_by_name( postcts_hold,    route           )
  g.connect_by_name( route,           postroute       )
  g.connect_by_name( postroute,       signoff         )
  g.connect_by_name( signoff,         gdsmerge        )
  
  # DRC, LVS, timing signoff and power signoff
  g.connect_by_name( gdsmerge,        magic_drc       )
#  g.connect_by_name( gdsmerge,        magic_antenna   )

  # LVS using DEF
  g.connect_by_name( signoff,         magic_def2spice )
  g.connect_by_name( signoff,         netgen_lvs_def  )
  g.connect_by_name( magic_def2spice, netgen_lvs_def  )

  # LVS using GDS
  g.connect_by_name( gdsmerge,        magic_gds2spice )
  g.connect_by_name( gdsmerge,        magic_gds2spice_nobbox )
  g.connect_by_name( signoff,         netgen_lvs_gds  )
#  g.connect_by_name( magic_gds2spice, netgen_lvs_gds  )
  g.connect_by_name( magic_gds2spice_nobbox, netgen_lvs_gds  )

  # LVS comparison using Calibre with standard cells blackboxed
  g.connect_by_name( signoff,         calibre_lvs     )
  g.connect_by_name( signoff,         calibre_lvs_nobbox )
#  g.connect_by_name( magic_gds2spice, calibre_lvs     )
  g.connect_by_name( magic_gds2spice_nobbox, calibre_lvs     )
#  g.connect_by_name( magic_gds2spice, calibre_lvs_nobbox )
  g.connect_by_name( magic_gds2spice_nobbox, calibre_lvs_nobbox )
#  g.connect_by_name( magic_def2spice, calibre_lvs     )
#  g.connect_by_name( magic_def2spice, calibre_lvs_nobbox )

  g.connect_by_name( signoff,         pt_timing       )
  g.connect_by_name( signoff,         pt_power_rtl    )
  g.connect_by_name( rtl_sim_vcs,     pt_power_rtl    ) # run.saif
  g.connect_by_name( signoff,         pt_power_gl     )
  g.connect_by_name( gen_saif_gl,     pt_power_gl     ) # run.saif

  # Gate level simulation
  g.connect_by_name( adk,             gl_sim          )
  g.connect( signoff.o(   'design.vcs.pg.v'  ), gl_sim.i( 'design.vcs.v'     ) )
  g.connect( pt_timing.o( 'design.sdf'       ), gl_sim.i( 'design.sdf'       ) )
  g.connect( testbench.o( 'testbench.sv'     ), gl_sim.i( 'testbench.sv'     ) )
  g.connect( testbench.o( 'design.args.gls'  ), gl_sim.i( 'design.args'      ) )
  g.connect( gl_sim.o( 'design.vpd' ), gen_saif_gl.i( 'run.vcd' ) ) 

  # Add don't-use cells.
  init.extend_inputs(['dont-use-cells.tcl'])
  order = init.get_param( 'order' )
  order.append( 'dont-use-cells.tcl' )
  init.update_params( { 'order': order } )
  g.connect_by_name( dont_use_cells, init )

  #-----------------------------------------------------------------------
  # Parameterize
  #-----------------------------------------------------------------------

  g.update_params( parameters )

  return g

if __name__ == '__main__':
  g = construct()
# g.plot()

Additional step customizations, including the specific input and output files/directories, are defined within individual sub-directories, all located in the "*/digital_synth_pnr/deconvolution_kernel_generator_2ksram/design*" directory. A summary of the design steps used in the deconvolution kernel generator design is shown in the terminal screenshot below (obtained via the *Mflowgen* command "make list"). Note that DRC and LVS are run at the end of this design flow. A visual graph of the steps, including the input/output file connection network, can be generated via the *Mflowgen* command "make graph"; the result for this particular design is included as a PDF file in this notebook's repository.

***
*Deconvolution kernel generator Mflowgen design steps:*

![dsp_back_end_mflowgen_steps](DSP_Back_End_Mflowgen_Steps.png)
***
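
The flow-graph script above selects the SRAM macro by commenting and uncommenting near-identical file names for each step. As a minimal sketch of how that selection could be centralized, the helper below derives the file names from the macro size. The function `sram_files` and the table `SRAM_WORDS` are hypothetical names introduced here for illustration; they are not part of *Mflowgen* or of the original script, and the naming pattern is inferred only from the file names that appear in the script.

```python
# Hypothetical helper: NOT part of Mflowgen or the original construct script.
# Maps SRAM size in kbytes to the number of 32-bit words, following the
# macro names that appear in the flow-graph script.
SRAM_WORDS = {1: 256, 2: 512, 4: 1024, 8: 2048}


def sram_files(kbytes, corner='TT_1p8V_25C'):
    """Return the per-view file names for one SRAM macro size."""
    words = SRAM_WORDS[kbytes]
    base = f'sky130_sram_{kbytes}kbyte_1rw1r_32x{words}_8'
    return {
        'db':  f'{base}_{corner}.db',   # synthesis / PrimeTime timing view
        'lib': f'{base}_{corner}.lib',  # place-and-route timing view
        'lef': f'{base}.lef',           # abstract layout view
        'gds': f'{base}.gds',           # full layout for GDS merge
        'sp':  f'{base}.sp',            # SPICE netlist for LVS
        'v':   f'{base}.v',             # Verilog model for simulation
    }


files = sram_files(2)
for view, name in files.items():
    print(f'{view:>3}: {name}')

# In the script, a call such as dc.extend_inputs([files['db'], files['lef']])
# could then replace the hand-edited lines (an assumption about usage).
```

With such a helper, switching the design from the 2 KB to the 4 KB macro would amount to changing a single argument rather than editing every step's input list.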

During the actual design process, all intermediate files, outputs, and results are placed in a dedicated "*/build*" directory; for the deconvolution kernel generator design, this is the "*/digital_synth_pnr/deconvolution_kernel_generator_2ksram/build*" folder. Specific outputs (e.g., the final GDS file) for this design are situated in the "*digital_synth_pnr/deconvolution_kernel_generator_2ksram/results*" directory, while final report files (e.g., timing, area, power) are located in the "*digital_synth_pnr/deconvolution_kernel_generator_2ksram/reports*" directory. A similar directory structure exists for the SAR ADC controller design, located in the "*digital_synth_pnr/sar_adc_controller*" directory, as well as for the final mixed-signal merged design, located in the "*digital_synth_pnr/chip_merge*" directory.

The final generated layouts for the SAR ADC controller and primary deconvolution kernel generator engine are shown below:

![digital_back_end_layouts](Digital_Back_End_Layouts.png)

***
### Merged Mixed-Signal Design Generation Via *Magic* and *Mflowgen* + *Cadence Innovus*

The final mixed-signal design for the chip was generated using a combination of methods. To begin with, the LEF file outputs of the deconvolution kernel generator and SAR ADC controllers were instantiated in the analog top-level *Magic* layout, thereby allowing for manual routing of the digital ports at the periphery of these blocks, as well as connection of the digital power rails to the corresponding analog M4 and M5 power rails. All top-level routing to the chip I/O pads was thus implemented manually in *Magic*, with custom ESD protection circuits placed next to each pad. At this stage, DRC and *Netgen* LVS were run to ensure correct connection to all chip boundary pins and digital I/O ports.

From here, the output GDS file for the top-level routed design (including all analog circuitry, I/O pads, ESD cells, and routing connections to the locations corresponding to the digital signal ports) was generated. A separate *Mflowgen* design was created to merge the top-level routed GDS with the generated GDS files for the individual digital SAR ADC controllers and the primary deconvolution kernel generator. Both the design for this final mixed-signal merging step and the resulting merged GDS for the chip may be found in the "*digital_synth_pnr/chip_merge*" directory of the GitHub design repository for the chip.

The final layout for the complete mixed-signal chip, with and without the labeled sub-sections, is shown below.

***
*Full chip layout (unlabeled):*
![full_chip_layout_unlabeled](Full_Chip_Layout_Unlabeled.png)

***
*Full chip layout (labeled):*
![full_chip_layout_labeled](Full_Chip_Layout_Labeled.jpg)

***
## Post-Silicon Verification

The fabricated chip was then measured and evaluated to assess its functionality. The detailed testing setup, including all chip I/O connections, is shown below.

![chip_testing_setup](Chip_Testing_Setup.png)

A Digilent AD2 arbitrary waveform generator was used to synthesize a waveform from recorded raw accelerometer data spanning the time window of the radar data, while simultaneously providing oscilloscope functionality for analog front-end signal debugging. This single-channel data synthesized by the AD2 module acted as the input signal to the analog front-end, with analog control and bias voltages generated via external bias circuitry. A Zynq-7000 Zybo Z7 AP SoC/FPGA served as the primary output data interface and testbench control engine, monitoring the serial output data, output-valid signal, and evaluation-done signals generated by the on-chip digital deconvolution kernel generator back-end. Furthermore, the FPGA provided the clock input, ADC trigger signal, reset signal, load/debug/ADC-bypass enable signals used for SRAM preload and back-end debugging, serial input data, and SRAM select bits necessary for operating the custom DSP engine. The integrated ARM Cortex-A9 processor recorded all generated kernel magnitude and phase data, in addition to the ADC, transfer function coefficient SRAM, and phase/frequency vector SRAM readout data used for debugging and verification purposes. Correct timing for data collection was ensured via an interrupt-based synchronization mechanism implemented on the FPGA, which indicated when a new cycle of data was available.

It is pertinent at this point to document an unexpected failure of the chip's analog front-end frequency detection PLL. Likely the result of layout-induced parasitics, which could not be fully evaluated given the absence of full $RC$-based extraction in the still-developing open-source tools, the charge pump output voltage was measured to be below the compatible voltage level for the SAR ADC comparator, resulting in a ring oscillator frequency well below the minimum frequency needed to achieve the PLL lock state. Unfortunately, the absence of debug tap points within the PLL itself, due to the limited number of available I/Os, precluded any possibility of bypassing individual circuits within the PLL. Hence, the PLL as a whole was bypassed by feeding its input, i.e., the rail-to-rail output from the continuous-time comparator, directly into the FPGA, which employs a simple counter to determine the vibration frequency. A programmed $adc\_bypass$ signal had been integrated into the digital circuitry for debugging purposes to allow for manual setting of the ADC data via the serial input port, which proved useful here. Note that the functionality of the SAR ADC in the frequency detection chain was verified separately by directly driving the tap point at the charge pump output, and that the amplitude processing chain achieves full functionality without any bypass in the digital back-end.
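
As a rough illustration of the counter-based frequency measurement used in the PLL bypass, the Python sketch below counts clock cycles between rising edges of a comparator-like 0/1 stream. The source does not specify the FPGA counter implementation or its clock rate, so the period-counting approach, the function names `estimate_frequency` and `square_wave`, the 500 kHz counter clock, and the 100 Hz test tone are all assumptions made for illustration.

```python
def estimate_frequency(samples, f_clk_hz):
    """Estimate the frequency of a 0/1 stream sampled once per clock cycle.

    Averages the number of clock cycles between consecutive rising edges
    and converts that period count to a frequency. Returns None if fewer
    than two rising edges are observed.
    """
    rising = [i for i in range(1, len(samples))
              if samples[i - 1] == 0 and samples[i] == 1]
    if len(rising) < 2:
        return None
    cycles_per_period = (rising[-1] - rising[0]) / (len(rising) - 1)
    return f_clk_hz / cycles_per_period


def square_wave(freq_hz, f_clk_hz, n_samples):
    """Ideal rail-to-rail comparator output (50% duty), integer arithmetic."""
    return [1 if (2 * ((i * freq_hz) % f_clk_hz)) < f_clk_hz else 0
            for i in range(n_samples)]


# Assumed values for illustration only.
F_CLK_HZ = 500_000          # hypothetical counter clock
WINDOW_SAMPLES = 47_500     # 95 ms window at the assumed clock
comparator_out = square_wave(100, F_CLK_HZ, WINDOW_SAMPLES)
print(estimate_frequency(comparator_out, F_CLK_HZ))
```

Averaging over all edges in the window, rather than timing a single period, reduces the effect of the one-cycle quantization of each edge position.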

The input clock frequency generated by the FPGA was minimized in order to reduce chip power consumption. Given a sampling interval of 95 ms, corresponding to the maximum on-chip resource consumption, the back-end DSP engine only needed to produce a new kernel at a rate of approximately 10-11 Hz (1/95 ms $\approx$ 10.5 Hz). Approximately 42212 cycles were required to run the deconvolution kernel frequency evaluation engine, while 20400 cycles each were required to output the serial magnitude and phase data, for a readout total of 40800 cycles. These two steps were therefore pipelined, using the total evaluation time (i.e., the longer of the two stages) as the pipeline stage time. Fitting 42212 cycles into the 95 ms interval yields a required minimum clock rate of 42212 / 95 ms $\approx$ 444.3 kHz. For simplicity, the FPGA ran the chip at $f_{clk}\!=\!500$ kHz.
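
The clock-rate budget above can be checked with a few lines of Python. All numeric values are taken from the text; the variable names are introduced here purely for illustration.

```python
# Values quoted in the text.
SAMPLING_INTERVAL_S = 95e-3    # real-time processing window
EVAL_CYCLES = 42212            # kernel frequency evaluation engine
READOUT_CYCLES_EACH = 20400    # serial readout, magnitude or phase
F_CLK_HZ = 500e3               # clock actually supplied by the FPGA

# Magnitude and phase are read out back to back.
readout_cycles = 2 * READOUT_CYCLES_EACH

# With evaluation and readout pipelined, the longer stage sets the pace.
stage_cycles = max(EVAL_CYCLES, readout_cycles)

# The longest stage must fit within one sampling interval.
min_clk_hz = stage_cycles / SAMPLING_INTERVAL_S

# Stage durations at the chosen clock.
eval_time_s = EVAL_CYCLES / F_CLK_HZ
readout_time_s = readout_cycles / F_CLK_HZ

print(f'update rate     : {1 / SAMPLING_INTERVAL_S:.1f} Hz')
print(f'readout cycles  : {readout_cycles}')
print(f'minimum clock   : {min_clk_hz / 1e3:.1f} kHz')
print(f'evaluation time : {eval_time_s * 1e3:.1f} ms at 500 kHz')
print(f'readout time    : {readout_time_s * 1e3:.1f} ms at 500 kHz')
```

At 500 kHz, evaluation takes about 84.4 ms and readout 81.6 ms, so both pipeline stages complete within the 95 ms window.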

The full state machine implemented on the FPGA for testing the chip is shown below, including the corresponding number of clock cycles required for each step.

![chip_testing_state_machine](Chip_Testing_State_Machine.jpg)

Photos of the test setup are depicted below:

![chip_testing_photos](Chip_Testing_Photos.png)

In terms of general silicon-tested metrics, the measured chip operates at an average power consumption of 2.43 mW for a 500 kHz input clock frequency and a real-time temporal processing interval of 95 ms, and occupies an active area of 10 mm$^2$. It should be noted that the majority of this area (approximately 70 percent) is dedicated to the SRAM banks, which total 32 KB.

Chip functionality was evaluated by generating input vibration signals of varying amplitude and frequency and assessing the accuracy of the generated frequency-domain deconvolution kernel in comparison with the ideal, expected magnitude and phase response profiles. Qualitative examples of the generated vs. ideal transfer functions are shown below for two different vibration frequencies and notch quality factors (corresponding to vibration amplitude). In both cases, the generated responses closely track the ideal ones.

***
*Qualitative kernel generation accuracy evaluation - ideal vs. generated transfer function phase and magnitude responses:*
![ideal_generated_spectra_comparison](Ideal_Generated_Spectra_Comparison.png)

Quantitatively, kernel generator precision was assessed via computation of the normalized mean absolute error (MAE) between the ideal and generated frequency response transfer functions for various vibration frequencies and amplitudes. Results of this analysis are shown in the plots below; the small MAE confirms the expected functionality of the chip across a range of input vibration signals.

***
*Quantitative kernel generation accuracy evaluation - mean absolute error between ideal and generated frequency response across input vibration frequency and notch quality factor (vibration amplitude):*
![spectra_mean_absolute_error](Spectra_Mean_Absolute_Error.png)
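
For reference, a normalized MAE of the kind used in the quantitative evaluation can be sketched as follows. The source does not state how the error was normalized; this sketch assumes normalization by the range (maximum minus minimum) of the ideal response, and the function name `normalized_mae` and the sample data are hypothetical.

```python
def normalized_mae(ideal, generated):
    """Mean absolute error between two responses, divided by the range
    of the ideal response (the normalization is an assumption)."""
    if len(ideal) != len(generated) or not ideal:
        raise ValueError('responses must be non-empty and of equal length')
    span = max(ideal) - min(ideal)
    if span == 0:
        raise ValueError('ideal response has zero range; cannot normalize')
    mae = sum(abs(a - b) for a, b in zip(ideal, generated)) / len(ideal)
    return mae / span


# Made-up notch-like magnitude response, for illustration only.
ideal_mag = [1.0, 0.5, 0.0, 0.5, 1.0]
generated_mag = [1.0, 0.5, 0.25, 0.5, 1.0]
print(normalized_mae(ideal_mag, generated_mag))
```

The same function can be applied separately to the magnitude and phase responses, with one value computed per tested vibration frequency and quality factor.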

***
## Conclusion

In summary, the presented chip demonstrates the successful use of open-source technology in the design of a fusion-based edge CMOS device for high-resolution, real-time sensing, implemented via a fully modular and reusable design methodology.