# Architecture Synthesis for Linear Time-Invariant Filters

#### Antoine Martinet

CITI lab. INRIA's SOCRATE Team.

Under the supervision of Florent de Dinechin

2 February - 31 July, 2015



### Table of Contents

Signal processing and filters

- Signal processing and filters
  - Fundamentals, Purpose
  - FIR, IIR
  - State-Space representation
  - The SIF: a unified realization representation
- Overview of the implementation
  - The FloPoCo Framework: computing just right
  - Basic block: SOPC
  - Architecture generation
- Oetails of the implementation
  - Computing the MSB
  - Computing the LSB
- Future Work

### Table of Contents

- Signal processing and filters
  - Fundamentals, Purpose
  - FIR, IIR
  - State-Space representation
  - The SIF: a unified realization representation
- - The FloPoCo Framework: computing just right
  - Basic block: SOPC
  - Architecture generation
- - Computing the MSB
  - Computing the LSB



•oooooo

### Introduction: LTI Filters

Signal: temporal variable x(k) with  $\{x(k)\}_{k\geq 0}\in\mathbb{R}$ 



LTI (Linear Time Invariant) filters are particular filters that are:

- Linear: outputs are linear combinations of inputs (allows to use linear algebra definitions)
- Time-Invariant: all coefficients are constant



0000000

# Purpose of this work

What is the purpose of this work?

- Lopez's PhD thesis states how to compute LTI filters in software:
  - scheduling issues
  - fixed size
- The goal here is to do such a work in hardware, where we have more flexibility:
  - full parallelism
  - arbitrary size

In this context, constraints become degrees of freedom.



#### FIR definition:

$$y(k) = \sum_{i=0}^{n} b_i u(k-i)$$
 (1)

Details of the implementation



Abstract architecture for the direct form realization of an FIR filter



Signal processing and filters

IIR definition:

$$y(k) = \sum_{i=0}^{n} b_i u(k-i) - \sum_{i=0}^{n} a_i y(k-i)$$
 (1)

Details of the implementation



Abstract architecture for the direct form realization of an IIR filter



#### IIR definition:

$$y(k) = \sum_{i=0}^{n} b_i u(k-i) - \sum_{i=0}^{n} a_i y(k-i)$$
 (1)



Abstract architecture for the direct form realization of an IIR filter



IIR definition:

$$y(k) = \sum_{i=0}^{n} b_i u(k-i) - \sum_{i=0}^{n} a_i y(k-i)$$
 (1)



Abstract architecture for the direct form realization of an IIR filter



# State-Space representation The "ABCD" form

Let's define x(k) a state vector (hardware register)

$$\begin{cases} x(k+1) = \mathbf{A}x(k) + \mathbf{B}\mathbf{u}(k) \\ y(k+1) = \mathbf{C}x(k) + \mathbf{D}\mathbf{u}(k) \end{cases}$$
(2)

Details of the implementation

With:

$$m{A} \in \mathbb{R}^{n_{ extit{x}} imes n_{ extit{x}}}$$
 ,  $m{B} \in \mathbb{R}^{n_{ extit{x}} imes n_{u}}$  ,  $m{C} \in \mathbb{R}^{n_{ extit{y}} imes n_{ extit{x}}}$  ,  $m{D} \in \mathbb{R}^{n_{ extit{y}} imes n_{u}}$ 

Equivalent matrix formulation:

$$\begin{pmatrix} \mathbf{x}(k+1) \\ \mathbf{y}(k+1) \end{pmatrix} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix} \begin{pmatrix} \mathbf{x}(k) \\ \mathbf{u}(k) \end{pmatrix}$$
 (3)

# The SIF: a unified realization representation

#### Problems with ABCD:

- Isb and msb computations have to be rebuilt for each new filter in this form.
- this doesn't gives an explicit order in the operations

The SIF generalizes the state-space.

Addition: t(k) describes the operations order:

$$\begin{pmatrix} \mathbf{J} & \mathbf{0} & \mathbf{0} \\ -\mathbf{K} & \mathbf{I}_{n_{x}} & \mathbf{0} \\ -\mathbf{L} & \mathbf{0} & \mathbf{I}_{n_{y}} \end{pmatrix} \begin{pmatrix} \mathbf{t}(k+1) \\ \mathbf{x}(k+1) \\ \mathbf{y}(k) \end{pmatrix} = \begin{pmatrix} \mathbf{0} & \mathbf{M} & \mathbf{N} \\ \mathbf{0} & \mathbf{P} & \mathbf{Q} \\ \mathbf{0} & \mathbf{R} & \mathbf{S} \end{pmatrix} \begin{pmatrix} \mathbf{t}(k) \\ \mathbf{x}(k) \\ \mathbf{u}(k) \end{pmatrix}$$
(4)

# The SIF as an algorithm

$$\begin{array}{l} \textbf{for int } i = 0 \text{ ; } i \leq n_t \text{; } i + + \textbf{ do} \\ & \quad \textbf{t}_i(k+1) \leftarrow -\sum\limits_{j < i} \textbf{\textit{J}}_{ij} \textbf{\textit{t}}_j(k+1) + \sum\limits_{j = 1}^{n_x} \textbf{\textit{M}}_{ij} \textbf{\textit{x}}_j(k) + \sum\limits_{j = 1}^{n_u} \textbf{\textit{N}}_{ij} \textbf{\textit{u}}_j(k) \end{array}$$

end

for int 
$$i = 0$$
;  $i \le n_x$ ;  $i++$  do
$$x_i(k+1) \leftarrow \sum_{j=1}^{n_t} \mathbf{K}_{ij} \mathbf{t}_j(k+1) + \sum_{j=1}^{n_x} \mathbf{P}_{ij} \mathbf{x}_j(k) + \sum_{j=1}^{n_u} \mathbf{Q}_{ij} \mathbf{u}_j(k)$$

end

end

**Algorithm 1:** Computation of SIF outputs from inputs



### A more common notation

An easiest way to communicate SIFs: The 7 matrix:

$$Z = \begin{pmatrix} -J & M & N \\ K & P & Q \\ L & R & S \end{pmatrix}$$
 (5)

#### Workflow



Figure: Workflow overview of tools usage



### Table of Contents

Signal processing and filters

- - Fundamentals, Purpose
  - FIR, IIR
  - State-Space representation
  - The SIF: a unified realization representation
- Overview of the implementation
  - The FloPoCo Framework: computing just right
  - Basic block: SOPC
  - Architecture generation
- - Computing the MSB
  - Computing the LSB

# The FloPoCo Framework: computing just right

#### FloPoCo:

- C++ framework
- Target: FPGAs
- Work: generating arithmetical cores in VHDL computing just right
- Reference: http://flopoco.gforge.inria.fr/

# Basic block: SOPC



SOPC architecture: based on the KCM architecture, split in 3 chunks



### Architecture generation algorithm

```
computeMSBSLSBS([msbs, lsbs][][]) //get the matrix of msbs lsbs.
for i=1; i=Z.size(); i++ do
   row[] = Z[i][] //pick first row of Z
   for j=1; j=1; j=Z.size() j++ do
       assign(SOPC[i], row[i], "T","X","U",[msbs,lsbs][i][j])
   end
   Second pass for wiring.
end
```

**Algorithm 2:** Architecture Generation Algorithm

# Example



# Example

#### Each line will be a SOPC



# Example



# Example



### Table of Contents

Signal processing and filters

- - Fundamentals, Purpose
  - FIR, IIR
  - State-Space representation
  - The SIF: a unified realization representation
- - The FloPoCo Framework: computing just right
  - Basic block: SOPC
  - Architecture generation
- Oetails of the implementation
  - Computing the MSB
  - Computing the LSB

# Computing the MSB



0000

This computation is done by colleagues in LIP6.

# Computing the LSB: Error definition

Let's define:

$$\mathbf{v'} = \begin{pmatrix} \mathbf{t}(k+1) \\ \mathbf{x}(k+1) \\ \mathbf{y}(k) \end{pmatrix} \tag{7}$$

Errors introduced by SOPCs:

$$\boldsymbol{\varepsilon}_{v'}(k) = \begin{pmatrix} \boldsymbol{\varepsilon}_{t}(k) \\ \boldsymbol{\varepsilon}_{t}(k) \\ \boldsymbol{\varepsilon}_{t}(k) \\ \boldsymbol{\varepsilon}_{t}(k) \end{pmatrix} = \begin{pmatrix} \boldsymbol{\varepsilon}_{t}(k) \\ \boldsymbol{\varepsilon}_{t$$

(8)

00000

Signal processing and filters

### Computing the LSB: Point of view about error



A signal view of the error propagation with respect to the ideal filter

00000

Signal processing and filters

### Computing the LSB: Point of view about error



A signal view of the error propagation with respect to the ideal filter

for int 
$$i = 0$$
;  $i \le n_t$ ;  $i++$  do
$$\begin{vmatrix} \mathbf{t}_i(k+1) \leftarrow \\ -\sum_{j < i} \mathbf{J}_{ij} \mathbf{t}_j(k+1) + \sum_{j=1}^{n_x} \mathbf{M}_{ij} \mathbf{x}_j(k) + \sum_{j=1}^{n_u} \mathbf{N}_{ij} \mathbf{u}_j(k) + \boldsymbol{\varepsilon}_{t_i}(k) \end{vmatrix}$$

end

for int i = 0;  $i \le n_x$ ; i++ do

$$m{x}_i(k+1) \leftarrow \sum\limits_{j=1}^{n_t} m{K}_{ij} m{t}_j(k+1) + \sum\limits_{j=1}^{n_x} m{P}_{ij} m{x}_j(k) + \sum\limits_{j=1}^{n_u} m{Q}_{ij} m{u}_j(k) + m{arepsilon}_{\mathbf{x}_i}(m{k})$$

end

for int i = 0; i < ny; i++ do

$$m{y}_i(k) \leftarrow \sum\limits_{j=1}^{n_t} m{L}_{ij} m{t}_j(k+1) + \sum\limits_{j=1}^{n_{\mathsf{x}}} m{R}_{ij} m{x}_j(k) + \sum\limits_{j=1}^{n_u} m{S}_{ij} m{u}_j(k) + m{arepsilon}_{y_i}(k)$$

end

**Algorithm 3:** Computation of SIF outputs from inputs

# Computing the LSB: current solution

Main constraint:

$$\varepsilon_{v} < \mathbf{2}^{-lsb_{v}}$$
 (9)

Transformed into the following constraint:

$$|\langle\langle\mathcal{H}_{\varepsilon}\rangle\rangle|\cdot\mathbf{2}^{lsb_{v'}+1}<\mathbf{2}^{-lsb_{y_i}}$$
 (10)

Lopez gives a simple solution that matches the fixed size constraint.

In the present case, a better solution is achievable at the end of this work.

#### Table of Contents

- - Fundamentals, Purpose
  - FIR, IIR
  - State-Space representation
  - The SIF: a unified realization representation
- - The FloPoCo Framework: computing just right
  - Basic block: SOPC
  - Architecture generation
- - Computing the MSB
  - Computing the LSB
- Future Work



#### Future Work

Signal processing and filters

- Removal of power of two: adapt KCM and SOPC FloPoCo cores
- Sub-filter detection: open question for either Front-End and Back-End
- Precision calculations improvement
- File format re-specification

#### To conclude

- Adapting LTI Filters generation from software to hardware
- Reuse of Lopez's calculations in our context
- Implementation for FPGAs in a parametric view

$$\begin{pmatrix} \mathbf{J} & \mathbf{0} & \mathbf{0} \\ -\mathbf{K} & \mathbf{I}_{n_{x}} & \mathbf{0} \\ -\mathbf{L} & \mathbf{0} & \mathbf{I}_{n_{y}} \end{pmatrix} \begin{pmatrix} \mathbf{t}(k+1) \\ \mathbf{x}(k+1) \\ \mathbf{y}(k) \end{pmatrix} = \begin{pmatrix} \mathbf{0} & \mathbf{M} & \mathbf{N} \\ \mathbf{0} & \mathbf{P} & \mathbf{Q} \\ \mathbf{0} & \mathbf{R} & \mathbf{S} \end{pmatrix} \begin{pmatrix} \mathbf{t}(k) \\ \mathbf{x}(k) \\ \mathbf{u}(k) \end{pmatrix}$$
(11)

With:

$$\mathbf{J} \in \mathbb{R}^{n_t \times n_t}, \mathbf{M} \in \mathbb{R}^{n_t \times n_x}, \mathbf{N} \in \mathbb{R}^{n_t \times n_u}, 
\mathbf{K} \in \mathbb{R}^{n_x \times n_t}, \mathbf{P} \in \mathbb{R}^{n_x \times n_x}, \mathbf{Q} \in \mathbb{R}^{n_x \times n_u}, 
\mathbf{L} \in \mathbb{R}^{n_y \times n_t}, \mathbf{R} \in \mathbb{R}^{n_y \times n_x}, \mathbf{S} \in \mathbb{R}^{n_y \times n_u},$$
(12)



The FixRealKCM method when  $x_i$  is split in 3 chunks

#### Mathematical definition of a filter $\mathcal{H}$

Definition of a filter:  $\mathbf{y} = \mathcal{H}(\mathbf{u})$  With  $dim(\mathbf{y}) = n_{v}$  and,  $dim(\mathbf{u}) = n_{ii}$ Linearity:

$$\mathcal{H}(\alpha \cdot \mathbf{u}_1 + \beta \cdot \mathbf{u}_2) = \alpha \cdot \mathcal{H}(\mathbf{u}_1) + \beta \cdot \mathcal{H}(\mathbf{u}_2)$$

Time invariance:

$$\{\mathcal{H}(\mathbf{u})(k-k_0)\}_{k\geq 0} = \mathcal{H}(\{\mathbf{u}(k-k_0)\}_{k\geq 0})$$

$$u=\sum_{i\geq 0}u(I)\delta_I$$

Where  $\delta_I$  is a Dirac impulsion centered in I:

$$\delta_I(k) = \begin{cases} 1 & \text{when } k = I \\ 0 & \text{else} \end{cases}$$
 (14)

Computation of the outputs:

#### Small bibliography



K.D. Chapman.

Fast integer multipliers fit in FPGAs (EDN 1993 design idea winner). EDN magazine, 39(10):80, May 1993.



S. Chevillard, M. Joldes, and C. Lauter.

Sollya: An environment for the development of numerical codes.

In K. Fukuda, J. van der Hoeven, M. Joswig, and N. Takayama, editors, Mathematical Software - ICMS 2010, volume 6327 of Lecture Notes in Computer Science, pages 28-31, Heidelberg, Germany, September 2010. Springer.



Florent de Dinechin, Matei Istoan, and Abdelbassat Massouri.

Sum-of-product architectures computing just right.

In Application-Specific Systems, Architectures and Processors (ASAP), IEEE, 2014.



Florent de Dinechin and Bogdan Pasca.

Designing custom arithmetic data paths with FloPoCo.

IEEE Design & Test of Computers, 28(4):18-27, July 2011.



Thibaut Hilaire

Analyse et synthèse de l'implémentation de lois de contrôle-commande en précision finie.

PhD thesis. Université de Nantes. 2006.



### References II



P.Chawdhry I.Niabeleke, R.Pannett and C.Burrows,

Design of h-infiny loop-shaping controllers for fluid power systems.

In IEEE Colloquium Robust Control - Theory, Software and Applications, 1997.



J.F.Whidborne and R.H.Istepanian.

Reduction of controller fragility by pole sensivity minimization.

In Int. J. Control. 2000.



G.Li J.Wu. S.Chen and J.Chu.

Constructing sparse realizations of finite precision digital controllers based on a closed-loop stability related measure.

In IEEE Proc. Control Theory and Applications, 2003.



Benoit Lopez.

Implémentation optimale de filtres linéaires en arithmétique virgule fixe.

PhD thesis. Université Paris VI. 2014.



P.Chevel T.Hilaire and J.F.Whidbornz.

A unifying framework for finite wordlength realizations.

In IEEE, 2007.



P.Chevel T.Hilaire and Y.Tringuet.

Implicit state-space representation: a unifying framework for FWL implementation of LTI systems.

In Proc. of the 16th IFAC World Congress., 2005.

# References III



A. Volkova, T. Hilaire, and C. Lauter.

Reliable evaluation of the Worst-Case Peak Gain matrix in multiple precision.

In IEEE Symposium on Computer Arithmetic, 2015.