Fall 2015

Due: Thursday, Dec 3 at 4pm

# **ADC Non-linearity Correction Engine**

#### **Quick Overview**

The project aims to test your understanding of finite-state machines, memory control, and their hardware implementation using chip synthesis. You have 6 weeks (~3 weeks to develop RTL, 2 weeks for energy/sample optimization, and 1 week for physical mapping). Submit the following files by email (ee216a@gmail.com) with "Project submission: SID #" in the subject line:

- NLC.v: Your Verilog design
- NLC.gds: Layout of your design
- Timing-SID.txt: Post-layout timing report
- Power-SID.txt: Post-layout power report
- Summary-SID.pdf: 1-page summary report (template provided)

#### Introduction

Assume you designed an analog-to-digital converter (ADC) with a resolution of 21 bits, but due to implementation-inherent non-linearity you observe an effective number of bits (ENOB) of only 6-8. You were initially targeting an ENOB of 15 bits. Instead of redesigning the ADC, you can take an easier approach and correct the non-linear mapping so that you reach your target ENOB, as follows:

- First invert the ADC<sub>count</sub> vs V<sub>in</sub> curve such that the ADC<sub>count</sub> is on the X-axis and V<sub>in</sub> is on the Y-axis (Figure 1).
- You now observe that you have a function that maps the non-linear ADC<sub>count</sub> output back to the supplied input voltage. If you design a digital block that accepts ADC<sub>count</sub> as input and generates V<sub>in</sub> at the output, you will accomplish your task.
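The inversion step can be illustrated numerically. The sketch below is a minimal example under loudly stated assumptions: `toy_adc_count` is a made-up monotonic cubic transfer curve (not the real ADC data), and the inverse is tabulated by swapping the axes and linearly interpolating between points.

```python
import bisect

# Hypothetical, monotonic toy transfer curve (NOT the real ADC characteristic):
# maps an input voltage in volts to a raw ADC count.
def toy_adc_count(vin: float) -> float:
    x = vin / 0.1                     # normalize +-100 mV to +-1
    return 9.5e5 * (x + 0.1 * x**3)   # cubic bow; derivative > 0, so still monotonic

def build_inverse(vin_grid):
    """Tabulate (count, vin) pairs, i.e. swap the X and Y axes of the curve."""
    counts = [toy_adc_count(v) for v in vin_grid]
    return counts, list(vin_grid)

def correct(count, counts, vins):
    """Estimate Vin for an ADC count by linear interpolation on the inverted curve."""
    i = bisect.bisect_left(counts, count)
    i = min(max(i, 1), len(counts) - 1)  # clamp to a valid segment
    c0, c1 = counts[i - 1], counts[i]
    v0, v1 = vins[i - 1], vins[i]
    return v0 + (v1 - v0) * (count - c0) / (c1 - c0)

vin_grid = [-0.1 + k * 0.2 / 1000 for k in range(1001)]
counts, vins = build_inverse(vin_grid)
print(correct(toy_adc_count(0.0123), counts, vins))  # close to 0.0123
```

Because the toy curve is strictly increasing, the swapped table is sorted by count, which is what makes the `bisect` lookup valid.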

The simplest implementation is a lookup table. You feed ADC<sub>count</sub> in as the memory address, and the corrected value (previously stored in memory) appears at the output. This method requires 2<sup>21</sup> memory locations with 15 bits per location. Assuming 0.124 µm<sup>2</sup> per 6T SRAM cell [1], we would need 3.9 mm<sup>2</sup> for each ADC channel. For a 32-channel system the required area would be 124.8 mm<sup>2</sup>, which is prohibitively expensive.
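The area figures above follow from a short calculation. This sketch counts bit-cell area only and ignores decoders, sense amplifiers, and other array overhead (an assumption), so the real macro would be larger still.

```python
# Back-of-the-envelope lookup-table area from the numbers in the text.
ADDRESS_BITS = 21        # one entry per raw ADC code
WORD_BITS = 15           # corrected value stored per entry
CELL_AREA_UM2 = 0.124    # 6T SRAM cell area from [1]; peripheral overhead ignored
CHANNELS = 32

bits_per_channel = (2 ** ADDRESS_BITS) * WORD_BITS
area_per_channel_mm2 = bits_per_channel * CELL_AREA_UM2 / 1e6  # 1 mm^2 = 1e6 um^2
total_area_mm2 = area_per_channel_mm2 * CHANNELS

print(f"{bits_per_channel} bits, {area_per_channel_mm2:.2f} mm^2/channel, "
      f"{total_area_mm2:.1f} mm^2 total")
# prints: 31457280 bits, 3.90 mm^2/channel, 124.8 mm^2 total
```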



Fig. 1. Inverted Non-linearity Curve.

Another approach is to fit a non-linear function (e.g., a polynomial) to the above curve and calculate V<sub>in</sub> every time we receive an ADC<sub>count</sub>. With this approach only the coefficients need to be stored, so much less memory is required. This reduces leakage power (due to the area savings), but since we perform a computation for every ADC sample, dynamic power increases.

$$p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$

Looking at the general form of the polynomial above, we notice that the input x has to be raised to the power n, which is the polynomial order. The order dictates how well the non-linearity curve is matched. For example, a 30<sup>th</sup>-order polynomial may give a good enough fit to meet the specification; however, raising a 21-bit input to the 30<sup>th</sup> power is computationally expensive to implement in direct form. Since the non-linearity is more pronounced at both extremes of the ADC range, a slight reduction in the ADC range (±80 mV instead of ±100 mV) can improve the fit.
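To make "computationally expensive" concrete, the sketch below counts the operations of a direct-form evaluation that builds the powers x<sup>2</sup>..x<sup>n</sup> incrementally, and the bit width a full-precision fixed-point x<sup>n</sup> could need for a 21-bit input. It is an illustrative cost model under those assumptions, not a description of any particular hardware.

```python
def direct_form_cost(order: int, input_bits: int = 21) -> dict:
    """Operation count and word growth for direct-form p(x) = a0 + a1*x + ... + an*x^n."""
    # Build x^2 .. x^order incrementally: (order - 1) multiplies,
    # then scale x^1 .. x^order by their coefficients: order multiplies.
    mults = (order - 1) + order
    adds = order  # summing order + 1 terms
    # A full-precision unsigned product of k words of B bits can need up to k*B bits,
    # so the largest term x^order can need order * input_bits bits.
    widest_term_bits = order * input_bits
    return {"mults": mults, "adds": adds, "widest_term_bits": widest_term_bits}

print(direct_form_cost(5))   # {'mults': 9, 'adds': 5, 'widest_term_bits': 105}
print(direct_form_cost(30))  # {'mults': 59, 'adds': 30, 'widest_term_bits': 630}
```

The 630-bit width of x<sup>30</sup> for a 21-bit input is one reason the text moves to lower-order, piecewise fits and to floating-point arithmetic.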

If we "chop" the non-linearity curve into smaller chunks and perform piecewise polynomial fitting, we can reduce the polynomial order (see the supplied polynomial\_order\_and\_coeffs.m MATLAB script for an example). We can then implement a polynomial computation engine with the worst-case polynomial order among the sections and use it to evaluate the fit for all sections. Note that in this case each section has its own set of coefficients, which is supplied to the engine based on the input range. If a section's order is lower than the engine order, 0 can be supplied for the higher-order coefficients.
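The section-selection and zero-padding ideas can be sketched as follows. Several details are assumptions for illustration only: the sections are equal-width slices of an unsigned 21-bit count, the raw count is used directly as x, and the coefficient values are made up. The supplied MATLAB script may use different breakpoints and normalization.

```python
INPUT_BITS = 21     # raw ADC word width (Table II)
ENGINE_ORDER = 5    # worst-case section order, from the 4-section example in the text
NUM_SECTIONS = 4
COEFF_BITS = 32     # each coefficient stored as a 32-bit SMC float

def pad_coeffs(coeffs, order=ENGINE_ORDER):
    """Zero-fill the higher-order terms so every section matches the engine order.
    `coeffs` is [a0, a1, ..., ak] with k <= order."""
    if len(coeffs) > order + 1:
        raise ValueError("section order exceeds the engine order")
    return list(coeffs) + [0.0] * (order + 1 - len(coeffs))

def section_index(count, sections=NUM_SECTIONS):
    """Hypothetical equal-width sectioning of an unsigned 21-bit count."""
    return (count * sections) >> INPUT_BITS

def evaluate(count, table):
    """Evaluate the selected section's polynomial, using the raw count as x (assumption)."""
    coeffs = table[section_index(count, len(table))]
    return sum(a * count**k for k, a in enumerate(coeffs))

# Made-up coefficients for illustration only; lower-order sections are zero-padded.
table = [pad_coeffs(c) for c in (
    [0.0, 1.0],          # section 0: 1st order
    [1.0, 0.5, 0.25],    # section 1: 2nd order
    [2.0, -1.0],         # section 2: 1st order
    [3.0],               # section 3: constant
)]

coeff_memory_bits = NUM_SECTIONS * (ENGINE_ORDER + 1) * COEFF_BITS
print(coeff_memory_bits)  # 768, versus 2**21 * 15 bits for the full lookup table
```

Because `section_index` multiplies before shifting, the same helper also works for the 3-, 5-, and 6-section variants mentioned in the grading criteria.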



Fig. 1. Sectioning the non-linearity curve for polynomial order reduction.

Algorithmic transformations can also ease the computational burden. With Horner's method for iterative polynomial evaluation, a single multiply-accumulate unit can compute the same polynomial over several clock cycles. Suppose we chopped the non-linearity curve into 4 sections and the worst-case section order is 5. Figure 2 illustrates the iterative implementation for this architecture.
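Horner's rule rewrites p(x) as a<sub>0</sub> + x(a<sub>1</sub> + x(a<sub>2</sub> + ...)), so each iteration is one multiply followed by one add on a running accumulator. The behavioral sketch below shows that recurrence and its accumulator trace; it uses ordinary Python numbers rather than the SMC floating-point blocks.

```python
def horner_mac(coeffs, x):
    """Evaluate p(x) = a0 + a1*x + ... + an*x^n with Horner's rule.

    One multiply-accumulate (acc = acc * x + a_k) is reused for every step.
    Returns the final value and the accumulator after each step.
    """
    acc = coeffs[-1]                  # start with a_n
    trace = [acc]
    for a in reversed(coeffs[:-1]):   # a_{n-1}, ..., a_0
        acc = acc * x + a             # multiplier pass, then adder pass
        trace.append(acc)
    return acc, trace

# 5th-order example: p(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5 at x = 2
value, trace = horner_mac([1, 2, 3, 4, 5, 6], 2)
print(value, len(trace) - 1)  # 321 5
```

An order-n section therefore needs exactly n multiply-add iterations, which is what allows one multiplier and one adder to be time-shared.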



Fig. 2. Iterative implementation of polynomial non-linearity correction for one section.

Note that the system clock frequency is higher than the ADC sampling rate, so there are idle system clock cycles between ADC samples that can be used for the iterative calculation. To manage the sample flow, the engine supports forward flow control using the srdyi (input enable) and srdyo (output valid) signals.
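The size of that idle-cycle budget follows from Table II. The sketch below computes it for one channel and, as a hypothetical alternative, for a single engine time-shared across all 32 channels; which arrangement you build is a design choice.

```python
SYSTEM_CLOCK_HZ = 6_144_000   # Table II
SAMPLE_RATE_HZ = 6_000        # per-channel ADC sampling rate, Table II
CHANNELS = 32

assert SYSTEM_CLOCK_HZ % SAMPLE_RATE_HZ == 0
cycles_per_sample = SYSTEM_CLOCK_HZ // SAMPLE_RATE_HZ  # clock edges between samples of one channel
cycles_if_shared = cycles_per_sample // CHANNELS       # if one engine served all 32 channels

print(cycles_per_sample, cycles_if_shared)  # 1024 32
```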

## Will any design blocks be provided?

You will be provided with Verilog implementations of a single-precision floating-point adder and a multiplier to use in your design (red blocks in Figure 2). You can treat these as black boxes and instantiate them in your top module.

Note that these blocks have synchronous, active-high resets (GlobalReset) and are triggered on the positive edge of the clock (clk). The red labels Z<sup>-6</sup> and Z<sup>-9</sup> indicate that the multiplier and the adder are pipelined, with latencies of 6 and 9 clock cycles, respectively.
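When planning the control FSM, it can help to model a pipelined block as a shift register of results. The class below is a hypothetical, timing-only model (plain Python arithmetic, no SMC rounding) showing that a product launched at one clock edge is available 6 edges later.

```python
from collections import deque

class PipelinedUnit:
    """Timing-only stand-in for a pipelined operator with a fixed latency.

    A result whose operands are presented at one positive clock edge leaves
    the last stage `latency` edges later. This does not model SMC arithmetic.
    """

    def __init__(self, op, latency: int):
        if latency < 1:
            raise ValueError("latency must be at least 1 cycle")
        self.op = op
        self.stages = deque([None] * latency, maxlen=latency)

    def reset(self):
        """Mimic a synchronous reset: flush every pipeline stage."""
        for i in range(len(self.stages)):
            self.stages[i] = None

    def clock(self, a=None, b=None):
        """Advance one positive edge; return the value leaving the last stage."""
        out = self.stages[-1]
        # appendleft on a full deque with maxlen discards the rightmost (oldest) entry
        self.stages.appendleft(None if a is None else self.op(a, b))
        return out

mult = PipelinedUnit(lambda a, b: a * b, latency=6)
outputs = [mult.clock(2.0, 3.0)] + [mult.clock() for _ in range(6)]
print(outputs.index(6.0))  # 6: the product emerges 6 edges after the operands
```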



These floating-point blocks use a representation that differs from the IEEE floating-point representation, whereas MATLAB uses the IEEE representation. To feed coefficients computed in MATLAB into your design, or to convert the output of your simulation into a MATLAB-readable format, you will need to convert between the two formats.

Two floating-point conversion functions are provided:

• syn\_ieeefp2smcfp - Converts IEEE-format floating-point numbers to SMC floating-point numbers. For example, to convert the IEEE representation of 0.25 to the SMC (Synphony Model Compiler) representation, use the following command in MATLAB:

```matlab
smcfp = uint32(syn_ieeefp2smcfp(0.25, 8, 23))
```

Here, the 8 and 23 are the exponent and mantissa widths. You should always use these values for the conversions.

• syn\_smcfp2ieeefp - Converts SMC floating-point numbers to IEEE floating-point numbers. For example, to convert the SMC representation of 0.25 to IEEE, we will use the following command in MATLAB.

In addition to the floating-point adder and multiplier, you will need to convert the ADC output to SMC floating-point format so that you can perform addition and multiplication operations. The "fp\_to\_smc\_float" block performs this conversion. The coefficients don't need to be converted in hardware; they can be stored in the configuration memory directly in SMC single-precision floating-point format (32-bit). The result of the computation will be in SMC floating-point format and must be converted to a fixed-point representation at the output; the "smc\_float\_to\_fp" block accomplishes this. Like the adder and multiplier blocks, the smc\_float\_to\_fp and fp\_to\_smc\_float modules have synchronous, active-high resets (GlobalReset) and are triggered on the positive edge of the clock (clk). These modules are also pipelined, with latencies of 3 and 2 clock cycles, respectively.
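Combining the listed latencies gives a rough end-to-end figure for one sample. This sketch assumes a simple schedule in which each Horner step waits for the multiplier and then the adder, with no overlap between steps, and it ignores coefficient-memory reads and FSM overhead. Under those assumptions a 5th-order section takes 80 cycles, well within the 1024 system clock cycles between samples of one channel (6.144 MHz / 6 kHz).

```python
# Latencies in clock cycles, as listed in the text.
FP_TO_SMC = 2    # fp_to_smc_float
MULT = 6         # smc_float_multiplier
ADD = 9          # smc_float_adder
SMC_TO_FP = 3    # smc_float_to_fp

def horner_latency(order: int) -> int:
    """Cycles from fixed-point input to fixed-point output when every Horner step
    runs multiplier-then-adder with no overlap (assumed schedule; memory reads
    and control overhead are ignored)."""
    return FP_TO_SMC + order * (MULT + ADD) + SMC_TO_FP

print(horner_latency(5), horner_latency(6))  # 80 95
```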



**Table I.** Provided Verilog Files.

| File Name              | Description                                                         |
|------------------------|---------------------------------------------------------------------|
| define.h               | Contains definitions used by floating point blocks.                 |
| SynLib.v               | Contains structures used by floating point blocks.                  |
| smc_float_adder.v      | SMC single precision floating point adder                           |
| smc_float_multiplier.v | SMC single precision floating point multiplier.                     |
| smc_float_to_fp.v      | SMC single precision floating point to fixed point converter block. |
| fp_to_smc_float.v      | Fixed point to SMC single precision floating point converter block. |

## 1. Design Specifications

Table II lists the system design specifications. Note that the given non-linearity curve exercises only 17 bits of the raw ADC resolution, not 21 bits. This is due to the given sampling frequency (6 kHz); at lower sampling frequencies the raw ADC resolution will be higher, so we design the NLC engine for a 21-bit input even though your analysis is for a ~17-bit input. Note also that although the required ENOB is 14 bits, you are asked to design the NLC engine with a 16-bit output resolution. Besides the argument about reduced sampling rates, simulation results may not match actual hardware performance, so we keep a margin: if the hardware performs better than the simulation results, we can attain a higher ENOB than designed for.
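The 14-bit ENOB target and the 16-bit output width can be related through the standard ideal-quantizer relation for a full-scale sine input, SINAD = 6.02 N + 1.76 dB. The helpers below apply that textbook formula; how ENOB is actually measured for grading is not specified here, so treat this as a rough aid.

```python
def ideal_sqnr_db(bits: float) -> float:
    """SNR of an ideal N-bit quantizer driven by a full-scale sine: 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

def enob_from_sinad(sinad_db: float) -> float:
    """Effective number of bits from a measured SINAD (inverse of the relation above)."""
    return (sinad_db - 1.76) / 6.02

print(round(ideal_sqnr_db(14), 2), round(ideal_sqnr_db(16), 2))  # 86.04 98.08
```

The 2-bit gap between the required ENOB and the output word corresponds to roughly 12 dB of quantization headroom.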

**Table II.** System Design Specifications.

| Design Parameter                                   | Value     |
|----------------------------------------------------|-----------|
| Number of ADC Channels                             | 32        |
| ADC Raw Resolution                                 | 21 bits   |
| Effective Resolution (ENOB, after your correction) | 14 bits   |
| Output Resolution                                  | 16 bits   |
| ADC Sampling Rate                                  | 6 kHz     |
| System Clock Frequency                             | 6.144 MHz |

## 2. Design Metrics

The design objective is to minimize the energy per ADC sample (in pJ/sample) while maintaining the system throughput. Performance is evaluated with the metric **Efficiency = Chip Power / ADC Sampling Rate**. You will need to form a group with another classmate to complete this project. Consultation with others is allowed, but the work has to be distinctly yours.
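The metric converts power into energy per sample, since W / (samples/s) = J/sample. The sketch below assumes the per-channel 6 kHz rate from Table II as the denominator, and the 10 µW chip power is a hypothetical value, not a target.

```python
def energy_per_sample_pj(chip_power_w: float, sample_rate_hz: float) -> float:
    """Efficiency metric from the text: chip power / ADC sampling rate, in pJ/sample."""
    return chip_power_w / sample_rate_hz * 1e12

# Hypothetical post-layout power of 10 uW at the 6 kHz sampling rate:
print(round(energy_per_sample_pj(10e-6, 6e3), 1))  # 1666.7
```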

## 3. Suggested Timeline

The project will span five weeks. You will need roughly 3 weeks to develop RTL, one week for speed-area optimization, and one week for physical mapping.

## 4. Project Submission

Submit the following files by email (<u>ee216a@gmail.com</u>) with "**Project submission: SID number**" in the subject line:

- NLC.v: Your Verilog design
- NLC.gds: Layout of your design
- Timing-SID.txt: Post-layout timing report
- Power-SID.txt: Post-layout power report
- Summary-SID.pdf: 1-page summary report (PPT template provided)

Be sure to include your SID as part of each file name.

## 5. Grading

Your project will be graded based on the following criteria:

- Functional 4-section, 6<sup>th</sup>-order design: 50%
- Functional 3-, 5-, or 6-section design: 20%
- Comparing the power of SRAM, flip-flop, and register file as storage: 5%
- **Efficiency metric:** 25% (automated grading)

Efficiency points will be awarded only if your 4-section, 5<sup>th</sup>-order design is fully functional. You can also apply the following algorithmic transformations, architectural transformations, and circuit optimization techniques for extra credit:

- Channel Interleaving (levels 2, 4, 8, 16, and 32): 5%
- Voltage Scaling: 5%
- Power Gating: 5%
- Configuration Memory Blocking: 5%
- Fixed-point implementation, wordlength optimization: 10%
- Try a different algorithm for non-linearity correction: 10%
- Increase the ENOB from 14 to 15 bits: 10%
- Increase the ADC range from ±80 mV to ±100 mV: 10%

# **HAVE FUN!**

# References

[1] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch, "Stable SRAM cell design for the 32 nm node and beyond," in *2005 Symposium on VLSI Technology, 2005. Digest of Technical Papers*, 2005, pp. 128–129.