

### CZECH TECHNICAL UNIVERSITY IN PRAGUE

Faculty of Electrical Engineering
Department of Electric Drives and Traction

### Name of the report

Technical report

### **TABLE OF CONTENTS**

| Section    | Title                                                                                                      |
|------------|------------------------------------------------------------------------------------------------------------|
| 1          | Introduction                                                                                               |
| 2          | Notes on all of the circuit designs in Verilog                                                             |
| 3          | Calculating the division of fixed point numbers                                                            |
| 3.1        | Newton-Raphson algorithm for calculating the division                                                      |
| 3.2        | IP Block Design                                                                                            |
| 3.2.1      | Top module design                                                                                          |
| 3.2.2      | Allocation and Timing                                                                                      |
| 3.2.3      | Data Path Module                                                                                           |
| 3.2.4      | Control Unit                                                                                               |
| 3.3        | Calculating number of bits to shift the denominator                                                        |
| 3.4        | Simulation results                                                                                         |
| 4          | Using CORDIC to calculate trigonometric functions                                                          |
| 4.1        | Theory                                                                                                     |
| 4.1.1      | Example of calculation                                                                                     |
| 4.2        | Python Implementation                                                                                      |
| 4.3        | IP Block Design                                                                                            |
| 4.3.1      | Top module design                                                                                          |
| 4.3.2      | Allocation and Timing                                                                                      |
| 4.3.3      | Data Path Module                                                                                           |
| 4.3.4      | Control Unit                                                                                               |
| 4.4        | Simulation results                                                                                         |
| 5          | Simple set of nonlinear equations solved by a Newton-Raphson algorithm using custom circuit implementation |
| 5.1        | Theory                                                                                                     |
| 5.2        | IP Block Design                                                                                            |
| 5.2.1      | Top module design                                                                                          |
| 5.2.2      | Allocation and Timing                                                                                      |
| 5.2.3      | Data Path Unit                                                                                             |
| 5.2.4      | Control Unit                                                                                               |
| 5.3        | Simulation results                                                                                         |
| 6          | Selective Harmonic Elimination                                                                             |
| 6.0.1      | Control Unit                                                                                               |
|            | Conclusion                                                                                                 |
|            | References                                                                                                 |
| Appendix A | List of symbols and abbreviations                                                                          |
| A.1        | List of abbreviations                                                                                      |
| A.2        | List of symbols                                                                                            |

### **LIST OF FIGURES**

| Figure | Caption |
|--------|---------|
| 3 - 1 | Top module design for the division unit IP block design. |
| 3 - 2 | Allocation and timing diagram for the Data Path Unit part of the division IP. |
| 3 - 3 | Register transfer level RTL scheme of the IP Data Path Unit part of the division IP. |
| 3 - 4 | Selected signals of simulation of division $N/D = 10 / 7$. The correct result in R0 is obtained after two iterations (reg numberOfIterations). |
| 3 - 5 | Selected signals of simulation of division $N/D = 1 / 0.25$. The correct result in R0 is obtained after five iterations (reg numberOfIterations). |
| 3 - 6 | Selected signals of simulation of division $N/D = 1 / (-0.25)$. The correct result in R0 is obtained after five iterations (reg numberOfIterations). |
| 3 - 7 | Selected signals of simulation of division $N/D = 304.03215 / (-0.25)$. The correct result in R0 is obtained after five iterations (reg numberOfIterations). |
| 3 - 8 | Selected signals of simulation of division $N/D = 10 / (519)$. The correct result in R0 is obtained after two iterations (reg numberOfIterations). |
| 4 - 1 | Top module design for the CORDIC IP block design. |
| 4 - 2 | Allocation and timing diagram for the Data Path Unit part of the CORDIC IP. |
| 4 - 3 | Register transfer level RTL scheme of the CORDIC IP Data Path Unit. |
| 4 - 4 | The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle $\theta = -1.2479$ rad. The value of sine and cosine based on the current iteration is also calculated in this algorithm approach. The result is passed to the registers R9 and R10. |
| 4 - 5 | The detail of the last iteration of the Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle $\theta = -1.2479$ rad. The result is passed to the registers R9 and R10. |
| 4 - 6 | The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle $\theta = 10.7195129$ rad. The value of sine and cosine based on the current iteration is also calculated in this algorithm approach. The result is passed to the registers R9 and R10. |
| 4 - 7 | The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle $\theta = 6.7195129$ rad. The value of sine and cosine based on the current iteration is also calculated in this algorithm approach. The result is passed to the registers R9 and R10. |
| 5 - 1 | Top module design for the simple NR calculation unit IP block design. |
| 5 - 2 | Allocation and timing diagram for the Data Path Unit part of the simple NR IP. |
| 5 - 3 | Register transfer level RTL scheme of the IP Data Path Unit part of the simple NR calculation IP. |
| 5 - 4 | The whole Verilog simulation of a simple NR algorithm. The result may be seen in registers R1 and R2 after the fifth iteration of the algorithm. |

### LIST OF TABLES

| Table | Caption |
|-------|---------|
| 3 - 1 | Control signal encoding table for instructions to be processed by the Division Module |
| 4 - 1 | Control signal encoding table for instructions to be processed by the CORDIC Module |
| 5 - 1 | Control signal encoding table for instructions to be processed by the simple NR algorithm solve Module |
| 6 - 1 | Control signal encoding table for instructions to be processed by the simple NR algorithm solve Module |

### 1 Introduction

This is the introduction.

### 2 Notes on all of the circuit designs in Verilog

All of the designs are written in pure Verilog and tested with Free and Open-Source Software (FOSS). The decision to use FOSS was deliberate and aims to prevent vendor lock-in to specific hardware or predefined IPs. Predefined IPs are often optimized by a specific hardware vendor and intended for use with that vendor's hardware. However, that hardware may not always be available or suitable for a specific application. Academics and numerous companies choose open-source and open-hardware approaches for the same reason. Once the design and algorithm are thoroughly understood, they can first be implemented without any specific platform in mind. Later, when the device vendor is selected, the design can be modified to suit the specific hardware requirements.

That is why Verilog, together with Cocotb [1] (a test bench creation tool) and Verilator [2] (a simulator), has been used for designing the circuits presented in this paper.

### 3 Calculating the division of fixed point numbers

Typically, when employing numerical methods to solve transcendental equations, the calculation of the division of two input numbers becomes necessary. This requirement persists even when applying the Newton-Raphson (NR) method to solve a set of two equations, as it entails computing the reciprocal value of the Jacobian determinant.

Some IP blocks capable of dividing two numbers are available, but they are usually either vendor-specific intellectual property (IP) [3] or offer low performance [4].

The drawback of vendor-specific IPs lies in their limited compatibility, which often prevents their use with FPGA chips from different vendors. On the other hand, vendor-specific IPs are usually optimized and able to use the specific resources available on the vendor's chip, which results in better performance.

To preserve the compatibility of the design with chips from multiple vendors, a custom division design based on the well-known Newton-Raphson (NR) algorithm was developed [4].

#### 3.1 Newton-Raphson algorithm for calculating the division

The general Newton-Raphson (NR) algorithm is a well-known approach to solving equations numerically, which is why it is used in many algorithms. Its drawback is that its convergence depends strongly on the initial values of the unknown variables. When the initial values are chosen poorly, the number of iterations needed to reach convergence can be high.

To reach the fastest possible convergence (measured in the number of iterations), the denominator should be scaled into the interval [0.5, 1] and the initial value should be calculated from eq. 3 - 1 [4]. The formula is applied after the denominator has been scaled. The algorithm developed for the appropriate scaling is explained in the section *Calculating number of bits to shift the denominator*.

$$x_0 = \frac{48}{17} - \frac{32}{17}D,\tag{3-1}$$

where $x_0$ is the initial value for the NR algorithm and $D$ is the denominator of the expression $N/D$.

After the initial value $x_0$ is calculated, the NR algorithm is performed. The idea behind using the NR algorithm to calculate $N/D$ is to trade the division for multiplications, which can be synthesized in the FPGA fabric. The NR algorithm requires a function whose root is $1/D$. There are many such functions, but the simplest one is eq. 3 - 2.

$$F(x) = \frac{1}{x} - D. \tag{3-2}$$

For the derivative at the point $x_i$, eq. 3 - 3 applies.

$$\frac{\mathrm{d}F(x_i)}{\mathrm{d}x} = F'(x_i) = \frac{F(x_{i+1}) - F(x_i)}{x_{i+1} - x_i}. \tag{3-3}$$

Because the root of equation 3 - 2 is sought, the value of $F(x_{i+1})$ is set to zero. After isolating $x_{i+1}$ in eq. 3 - 3 and differentiating the function $F(x_i)$, the update formula for $x_{i+1}$ is obtained as eq. 3 - 4.

$$x_{i+1} = -\frac{F(x_i)}{F'(x_i)} + x_i = -\frac{F(x_i)}{-\frac{1}{x_i^2}} + x_i = \left(\frac{1}{x_i} - D\right)x_i^2 + x_i = x_i - Dx_i^2 + x_i = 2x_i - Dx_i^2. \tag{3-4}$$

Usually, an iterative algorithm is stopped when the value $F(x_{i+1}) - F(x_i)$ (called the defect) reaches a value set by the stop condition. In this algorithm, however, such a stop condition is not yet implemented. Based on observation of the NR algorithm, the result is sufficiently accurate after 5 iterations.

The mathematically expressed algorithm is then transformed into an algorithm suitable for FPGA implementation. The top module design for this algorithm is presented in the section *Top module design*; the control and data units for calculating the value $x_{i+1}$ are presented in the section *Allocation and Timing*.
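The iteration of eq. 3 - 4 with the initial value of eq. 3 - 1 can be tried out in floating point before moving to hardware. The following Python sketch is only an illustration under stated assumptions: the function names `nr_reciprocal` and `nr_divide` are hypothetical, the scaling is done by repeated halving or doubling rather than by the scaling module described later, and floating-point numbers stand in for the fixed-point format.

```python
def nr_reciprocal(d_scaled, iterations=5):
    """Approximate 1/d_scaled for d_scaled in [0.5, 1] (hypothetical helper)."""
    # Initial value from eq. 3-1
    x = 48.0 / 17.0 - 32.0 / 17.0 * d_scaled
    for _ in range(iterations):
        # Update from eq. 3-4: only multiplications and one subtraction
        x = 2.0 * x - d_scaled * x * x
    return x


def nr_divide(n, d, iterations=5):
    """Approximate n/d using the NR reciprocal (illustrative sketch)."""
    if d == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    # Work with magnitudes and restore the sign at the end
    sign = -1.0 if (n < 0) != (d < 0) else 1.0
    n_abs, d_abs = abs(n), abs(d)
    # Scale the denominator into [0.5, 1]; scale the numerator accordingly
    while d_abs > 1.0:
        d_abs /= 2.0
        n_abs /= 2.0
    while d_abs < 0.5:
        d_abs *= 2.0
        n_abs *= 2.0
    return sign * n_abs * nr_reciprocal(d_abs, iterations)


print(nr_divide(10, 7))      # close to 10/7
print(nr_divide(1, -0.25))   # close to -4
```

Because the relative error is squared in every iteration and the initial value keeps it below 1/17 on the scaled interval, a handful of iterations is enough in this floating-point model.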

#### 3.2 IP Block Design

The design of this unit consists of the following main modules:

- the data unit module, used for manipulating data and performing the calculations,
- the control unit module, used for controlling the data unit module and the scaling unit module,
- the scaling unit module, used for calculating the number of bits needed to shift the denominator value into the interval [0.5, 1].

#### 3.2.1 Top module design

The top module wraps all of the presented modules (**data unit module**, **control unit module**, **scaling unit module**). The basic structure of connected modules of this top design is depicted in the fig. 3 - 1. Thanks to this wrapper it is possible to test the created modules with Verilog Testbench, Verilator [2] or Cocotb [1].



Figure 3 - 1 Top module design for the division unit IP block design.

#### 3.2.2 Allocation and Timing

The diagram of the data flow and timing of the algorithm is displayed in the Figure 3 - 2.

The whole algorithm consists of nine steps (S0 to S8). The first four steps (S0 to S3) calculate the initial value $x_0$ as described in equation 3 - 1. The steps S4 to S8 calculate the next value $x_{i+1}$, which approaches the root of equation 3 - 2, i.e. the searched value 1/D. The following iteration begins again at the step labeled S4. The iterative process continues until a predefined stop condition is met, such as reaching a specified number of iterations.



Figure 3 - 2 Allocation and timing diagram for the Data Path Unit part of the division IP.

#### 3.2.3 Data Path Module

The structure of the Data Path Module is depicted in the Figure 3 - 3. The module was specifically designed to serve the needs of the division algorithm. It comprises five registers labeled R0 through R4, two multipliers M1, M2 and one bit shifter.

The module is controlled by the Control Unit through the control signal labeled CV. The encoding table with the labels that correspond to the Data Path Module is presented in the section *Control Unit*.

The result of each iteration of the division algorithm is passed to the register R0.

The Data Path Module also handles a negative denominator and numerator. The values are stored in a custom Q32.15 fixed-point format (32 bits in total: 17 bits for the integer part and 15 bits for the fractional part). The algorithm checks whether the D or N values are higher than 0h8000, determines their actual signs and sets the sign of the result. If a number is determined to be negative, it is converted to a positive value and then used in the presented division algorithm. This conversion is required by the algorithm that calculates the number of bits to shift the denominator into the interval.
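The Q32.15 format and the sign handling can be illustrated with a short Python model. This is a sketch under assumptions that the text does not confirm: negative numbers are taken to be stored in two's complement, the sign is read from the most significant bit of the 32-bit word, and the helper names `to_q`, `from_q` and `split_sign` are hypothetical.

```python
FRACTION_BITS = 15           # fractional part of the Q32.15 format
WORD_BITS = 32               # total word length
MASK = (1 << WORD_BITS) - 1  # keeps values inside 32 bits


def to_q(value):
    """Encode a real number as a 32-bit word (two's complement assumed)."""
    return int(round(value * (1 << FRACTION_BITS))) & MASK


def from_q(word):
    """Decode a 32-bit word back to a real number."""
    if word & (1 << (WORD_BITS - 1)):  # sign bit set, so the value is negative
        word -= 1 << WORD_BITS
    return word / (1 << FRACTION_BITS)


def split_sign(word):
    """Return (is_negative, magnitude_word) for a Q32.15 word."""
    negative = bool(word & (1 << (WORD_BITS - 1)))
    magnitude = (-word) & MASK if negative else word
    return negative, magnitude


n_neg, n_mag = split_sign(to_q(304.03125))
d_neg, d_mag = split_sign(to_q(-0.25))
result_is_negative = n_neg != d_neg  # sign of the quotient
print(hex(to_q(1.0)), from_q(d_mag), result_is_negative)
```

In this model the division itself only ever sees the magnitudes, and the sign of the quotient is applied afterwards.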



Figure 3 - 3 Register transfer level RTL scheme of the IP Data Path Unit part of the division IP.

#### 3.2.4 Control Unit

The signals from Control Unit to Data Path Module are encoded in the CV signal. The CV signal with the corresponding instructions for the steps S0-S8 of the FSM is presented in the table 3 - 1. For cleaner code, the signal is passed to the Control Unit in the hexadecimal format.

The number of the iteration is also set in the Control Unit. The value is used in this module to determine the stop condition of the calculation.

As stated in the *Allocation and Timing* section, after the step S8, the FSM restarts at the state S4 with new  $x_i$  values to be used in the current iteration. This jump is not depicted in the table for CV signal.

|       |                                                                         | 14  | 13  | 12  | 11  | 10  | 9     | 8        | 7        | 6     | 5     | 4      | 3        | 2        | 1     | 0     |          |
|-------|-------------------------------------------------------------------------|-----|-----|-----|-----|-----|-------|----------|----------|-------|-------|--------|----------|----------|-------|-------|----------|
| State | RTL Code                                                                | ld0 | ld1 | ld2 | ld3 | ld4 | SelR1 | SelR2[1] | SelR2[0] | SelR3 | SelR4 | SelSh1 | SelM1[1] | SelM1[0] | SelM2 | SelS1 | CV       |
| S0    | $R1 \leftarrow D$;                                                      | 0   | 1   | 0   | 0   | 0   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 0        | 0     | 0     | 15'h2000 |
| S1    | $R1 \leftarrow R1 \ll 32$; (Sh1)                                        | 0   | 1   | 0   | 0   | 0   | 1     | 0        | 0        | 0     | 0     | 1      | 0        | 0        | 0     | 0     | 15'h2210 |
| S2    | $R2 \leftarrow 1.882 \times R1$; (M1)<br>$R3 \leftarrow N$;             | 0   | 0   | 1   | 1   | 0   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 1        | 0     | 0     | 15'h1804 |
| S3    | $R2 \leftarrow 2.82 - R2$; (Sub1)<br>$R3 \leftarrow R3 \ll 32$; (Sh1)   | 0   | 0   | 1   | 1   | 0   | 0     | 0        | 1        | 1     | 0     | 0      | 0        | 0        | 0     | 0     | 15'h18C0 |
| S4    | $R4 \leftarrow R2 \times R1$; (M1)                                      | 0   | 0   | 0   | 0   | 1   | 0     | 0        | 0        | 0     | 1     | 0      | 0        | 0        | 0     | 0     | 15'h0420 |
| S5    | $R4 \leftarrow R2 \times R4$; (M1)<br>$R2 \leftarrow 2 \times R2$; (M2) | 0   | 0   | 1   | 0   | 1   | 0     | 1        | 0        | 0     | 1     | 0      | 1        | 0        | 0     | 0     | 15'h1528 |
| S6    | $R2 \leftarrow R2 - R4$; (S1)                                           | 0   | 0   | 1   | 0   | 0   | 0     | 0        | 1        | 0     | 0     | 0      | 0        | 0        | 0     | 1     | 15'h1081 |
| S7    | $R4 \leftarrow R2 \times R3$; (M2)                                      | 0   | 0   | 0   | 0   | 1   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 0        | 1     | 0     | 15'h0402 |
| S8    | $R0 \leftarrow R4$;                                                     | 1   | 0   | 0   | 0   | 0   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 0        | 0     | 0     | 15'h4000 |

*Table 3 - 1 Control signal encoding table for instructions to be processed by the Division Module.* 
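The hexadecimal CV values in Table 3 - 1 follow directly from the bit positions of the asserted signals, which can be cross-checked with a few lines of Python. The sketch is only an illustration: the names `CV_SIGNALS` and `encode_cv` are hypothetical, while the signal names and their order (bit 14 down to bit 0) are taken from the table.

```python
# Control signals in table order, from bit 14 (ld0) down to bit 0 (SelS1)
CV_SIGNALS = [
    "ld0", "ld1", "ld2", "ld3", "ld4",
    "SelR1", "SelR2[1]", "SelR2[0]", "SelR3", "SelR4",
    "SelSh1", "SelM1[1]", "SelM1[0]", "SelM2", "SelS1",
]


def encode_cv(asserted):
    """Build the 15-bit control word from the names of the asserted signals."""
    word = 0
    for name in asserted:
        bit = len(CV_SIGNALS) - 1 - CV_SIGNALS.index(name)
        word |= 1 << bit
    return word


# State S5 of the table: load R2 and R4, select the S5 multiplexer inputs
s5 = encode_cv(["ld2", "ld4", "SelR2[1]", "SelR4", "SelM1[1]"])
print(f"15'h{s5:04X}")
```

Keeping the control words as named signal sets in a script like this makes it easier to regenerate the hexadecimal constants when a signal is added or moved.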

#### 3.3 Calculating number of bits to shift the denominator

As presented in the section *Newton-Raphson algorithm for calculating the division*, the denominator must be appropriately scaled for the division algorithm to work. This section presents an algorithm for scaling a denominator specified in the fixed-point number format *Q32.15*. After the scaling value is determined, the numerator is scaled accordingly.

The presented algorithm shifts the value of the denominator at every positive edge of the clock signal and saves the shifted value in the compare register. A combinational circuit then compares the shifted value in the compare register with the number 1 specified in the Q32.15 format. If the compared value is equal to or lower than 1, the shifting algorithm is done and the value scaleToShift has been found. If not, the internal count of shifted bits is incremented and the algorithm proceeds to the next iteration.

The presented algorithm is realized in the *denominatorSizeScaleUnit* module and its pseudocode is shown in the code 3 - 1.

```
at every negative edge of clock or positive edge of reset
   if (rst)
      scaleToShift = 0;
      scaleToShiftInternal = 1;
      started = 0;
   end if
   else if (start)
      started = 1;
   end else if

at every positive edge of clock
   // condition reconstructed from the description above: the shifted
   // denominator in the compare register is equal to or lower than 1 (Q32.15)
   if (shifted value in compare register <= 1)
      done = 1;
      started = 0;
      scaleToShift = scaleToShiftInternal;
   end if
   else
      done = 0;
      scaleToShiftInternal = scaleToShiftInternal + 1;
   end else
```

Code 3 - 1 Pseudocode for the denominatorSizeScaleUnit module algorithm.
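The search for the shift count can be modelled in software as a simple loop. The Python sketch below is an assumption-laden illustration and not the module itself: it assumes a right shift of a positive denominator greater than 1, starts the count at 1 as the pseudocode does, and uses the hypothetical name `scale_to_shift`. Denominators that are already below 0.5 would need a left shift, which this sketch does not cover.

```python
FRACTION_BITS = 15
ONE_Q = 1 << FRACTION_BITS  # the number 1 in the Q32.15 format


def scale_to_shift(d_word):
    """Count the right shifts needed until the Q32.15 word is <= 1."""
    shift = 1  # mirrors the reset value of scaleToShiftInternal
    while (d_word >> shift) > ONE_Q:  # compare the shifted value with 1
        shift += 1                    # one more bit per clock cycle
    return shift


d_word = 7 * ONE_Q             # D = 7 in Q32.15
shift = scale_to_shift(d_word)
print(shift, (d_word >> shift) / ONE_Q)
```

For D = 7 the loop stops at a shift of three bits, which leaves 0.875 as the scaled denominator inside the interval [0.5, 1].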

#### 3.4 Simulation results

A simulation with a Verilog test bench was carried out to verify the correctness of the presented division module. The Icarus Verilog simulator was used to simulate the module and GTKWave was used to display the VCD simulation output file.

The simulation output shows that the module works correctly for positive and negative numbers in the fixed-point format Q32.15.

The algorithm used in this module is able to calculate the proper result in far fewer clock cycles than the full division algorithm used in the division module of the package [4].

Thus the presented module may be used as a submodule in more complex modules.

The VCD simulation output waveforms are depicted in the following figures. The simulations were conducted for arbitrarily selected N and D. The clock frequency was set to 250 MHz. A pseudocode Verilog snippet of the test bench is presented in the listing 3 - 2. In the test bench, one unit of time corresponds to 1 ns (based on the timescale setting). The division unit algorithm starts at the next positive edge of the clock signal after the value *bitsToShift* has been determined and the *start* signal is set low.

```
`timescale 1ns/1ns
     #10; // wait for 10 units of time
     #0 rstScale = 1; startScale = 0; // reset the unit that determines the number of bits to shift in the denominator and do not start it yet
     // numerator N = 304.03125, denominator D = -0.25
     N = 32'b0000000100110000_00001000000000; D=32'
     #10 rstScale = 0; // wait for 10 units of time and release the reset of the scaling unit
     #10 startScale = 1; // start the algorithm for scaling unit
     #20 rst = 1; start = 0; // reset the division unit
     #30 rst = 0; // release the reset of the division unit
     #20 start = 1; // start the division unit
     #20 start = 0;

     #1000; // wait 1000 units of time
     $finish; // finish the simulation
```

Code 3 - 2 Pseudocode snippet for the Verilog simulation test bench.



Figure 3 - 4 Selected signals of simulation of division N/D = 10 / 7. The correct result in R0 is obtained after two iterations (reg numberOfIterations).



Figure 3 - 5 Selected signals of simulation of division N/D = 1 / 0.25. The correct result in R0 is obtained after five iterations (reg numberOfIterations).



Figure 3 - 6 Selected signals of simulation of division N/D = 1 / (-0.25). The correct result in R0 is obtained after five iterations (reg numberOfIterations).



Figure 3 - 7 Selected signals of simulation of division N/D = 304.03215 / (-0.25). The correct result in R0 is obtained after five iterations (reg numberOfIterations).



Figure 3 - 8 Selected signals of simulation of division N/D = 10 / (519). The correct result in R0 is obtained after two iterations (reg numberOfIterations).

### 4 Using CORDIC to calculate trigonometric functions

There are numerous ways to calculate trigonometric functions. To gain more flexibility, the Coordinate Rotation Digital Computer (CORDIC) algorithm was chosen over a Look-Up Table (LUT) implementation.

The LUT method may be fast, but its accuracy depends on the size of the table. With CORDIC, the precision depends on the number of performed iterations of the algorithm. A modified algorithm may also be used to calculate other functions, such as hyperbolic functions, square roots, multiplications, divisions, exponentials and logarithms [5]. In this work only the calculation of the *sine* and *cosine* functions is used.

#### 4.1 Theory

The first CORDIC algorithm was proposed by Volder in [6]. It computes a coordinate conversion between rectangular (x, y) and polar $(R, \theta)$ coordinates. The algorithm was then generalized by Walther in [7] to include circular, linear and hyperbolic transforms. This paper uses only circular transforms to calculate the sine and cosine functions. Only the most basic form of the algorithm is presented.

The rotation of a vector in the rectangular coordinate system (x, y) may be described by matrix-vector multiplication depicted in the eq. 4 - 1.

$$\begin{pmatrix} x_{\rm R} \\ y_{\rm R} \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x_{\rm in} \\ y_{\rm in} \end{pmatrix}, \tag{4-1}$$

where $x_{\rm R}$ and $y_{\rm R}$ are the coordinates of the rotated vector and $\theta$ is the angle by which the vector with coordinates $x_{\rm in}$ and $y_{\rm in}$ was rotated.

When $\cos(\theta)$ is factored out of the matrix,

$$\begin{pmatrix} x_{\rm R} \\ y_{\rm R} \end{pmatrix} = \cos(\theta) \begin{pmatrix} 1 & -\tan(\theta) \\ \tan(\theta) & 1 \end{pmatrix} \begin{pmatrix} x_{\rm in} \\ y_{\rm in} \end{pmatrix} \tag{4-2}$$

it can be seen that only a multiplication by the scaling factor $\cos(\theta)$, which can be precalculated, multiplications by $\tan(\theta)$, subtraction and addition operations are needed. Moreover, the multiplication by $\tan(\theta)$ can be replaced for angles $\theta$ that satisfy equation 4 - 3, where $i$ is a non-negative integer. When the algorithm is implemented in the FPGA, the multiplication may then be swapped for a signed right bit shift by $i$ bits.

$$\tan(\theta) = 2^{-i}. \tag{4-3}$$

When the values $x_{\rm in} = 1$ and $y_{\rm in} = 0$ are used, the sine and cosine are obtained directly from $x_{\rm R}$ and $y_{\rm R}$, as expressed in equation 4 - 4.

$$\begin{aligned} x_{\rm R} &= x_{\rm in}\cos(\theta) - y_{\rm in}\sin(\theta) = |x_{\rm in} = 1,\ y_{\rm in} = 0| = \cos(\theta), \\ y_{\rm R} &= x_{\rm in}\sin(\theta) + y_{\rm in}\cos(\theta) = |x_{\rm in} = 1,\ y_{\rm in} = 0| = \sin(\theta). \end{aligned} \tag{4-4}$$

The algorithm may be further simplified by assuming that it is designed to use more than 6 iterations, so that the scaling constant, given by the product of the cosines of the individual $\theta$ values, converges to 0.60725. There is then no need to precalculate all the scaling values; only the converged value may be used. In this paper the precalculated values are passed from the custom LUT module to the main algorithm.
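The convergence of the scaling constant can be checked numerically. The short Python sketch below multiplies the cosines of the base angles that satisfy equation 4 - 3; the function name `cordic_gain` is a hypothetical helper introduced only for this illustration.

```python
import math


def cordic_gain(iterations):
    """Product of cos(atan(2**-i)) for i = 0 .. iterations - 1."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.cos(math.atan(2.0 ** (-i)))
    return gain


for n in (1, 3, 6, 12):
    print(n, round(cordic_gain(n), 6))
```

Each additional base angle changes the product less than the previous one, which is why a single constant is sufficient once enough iterations are used.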

As can be seen from the section *Example of calculation* or from the theory itself, it must be determined whether the vector is rotated in the next iteration in the positive (counter-clockwise) or negative (clockwise) direction. For that purpose, the set of equations is extended by a new variable $z_i$. The complete set of equations used in the implementation is as follows.

$$\begin{aligned} x[i+1] &= x[i] - \sigma_i 2^{-i} y[i], \\ y[i+1] &= y[i] + \sigma_i 2^{-i} x[i], \\ z[i+1] &= z[i] - \sigma_i \operatorname{atan}(2^{-i}). \end{aligned} \tag{4-5}$$

The $\sigma_{i+1}$ is determined based on the sign of the $z_{i+1}$ variable

$$\sigma_{i+1} = \begin{cases} -1, & \text{if } z_{i+1} < 0, \\ 1, & \text{if } z_{i+1} > 0, \\ 0, & \text{if } z_{i+1} = 0. \end{cases} \tag{4-6}$$
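To show how equations 4 - 5 map to shifts and additions, the following Python sketch runs the iteration on integers with 15 fractional bits. It is an illustrative model under assumptions, not the Verilog design: the name `cordic_sin_cos` is hypothetical, the iteration count of 15 is chosen only for this sketch, the input angle must already lie in the convergence range of roughly ±1.74 rad, and $z = 0$ is treated as positive (a departure from the zero case of eq. 4 - 6) so that the constant scaling factor stays valid.

```python
import math

FRACTION_BITS = 15
ONE = 1 << FRACTION_BITS  # the number 1 with 15 fractional bits


def cordic_sin_cos(theta, iterations=15):
    """Return (sin, cos) of theta using shift-and-add CORDIC iterations."""
    # Precalculated atan(2**-i) values, as they would be stored in a LUT
    atan_table = [round(math.atan(2.0 ** (-i)) * ONE) for i in range(iterations)]
    # Scaling constant (product of cosines), applied once to the start vector
    gain = 1.0
    for i in range(iterations):
        gain *= math.cos(math.atan(2.0 ** (-i)))
    x, y, z = round(gain * ONE), 0, round(theta * ONE)
    for i in range(iterations):
        sigma = 1 if z >= 0 else -1  # rotation direction from the sign of z
        # Multiplication by 2**-i is an arithmetic right shift by i bits
        x, y = x - sigma * (y >> i), y + sigma * (x >> i)
        z -= sigma * atan_table[i]
    return y / ONE, x / ONE


sin_value, cos_value = cordic_sin_cos(-1.2479)
print(sin_value, cos_value)
```

The results can be compared with `math.sin` and `math.cos`; with 15 fractional bits only about three decimal places should be expected to agree.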

The algorithm as presented calculates the correct values of the sine and cosine functions only in the first and fourth quadrants ($3\pi/2$ to $\pi/2$ counter-clockwise). For use over the whole $2\pi$ range, corresponding actions must be taken before iteration 0.

The algorithm must determine the quadrant in which the desired angle $\theta$, for which the sine and cosine functions are to be calculated, lies. This is done by if statements at the initialization of the algorithm values and at the final calculation of the function values. If the desired argument is not in the first or fourth quadrant, the angle is transferred from its actual quadrant to the first or fourth quadrant. Based on the quadrant to which the angle is transformed, the $\sigma_i$ value is set. The corresponding if statements at the algorithm initialization are presented in the pseudocode 4 - 1.

Similar if statements are used at the final calculation of the sine and cosine values. They are presented in the pseudocode 4 - 2.

The pseudocodes use initialZValue as the desired angle $\theta$ for which the function values are calculated, zValue as a temporary value for the iterations of the $z_i$ variable, and sigmaValue as a temporary value holding the current $\sigma_i$. The resultCos and resultSin variables store the temporary and final values of $\cos(\theta)$ and $\sin(\theta)$ respectively.

```
if((initialZValue > 1.5707)&(initialZValue < 3.141592))
    sigmaValue = -1
    zValue = initialZValue - 3.141592

else if((initialZValue > 3.141592)&(initialZValue < 4.7123))
    sigmaValue = 1
    zValue = initialZValue - 3.141592

else
    zValue = initialZValue
    sigmaValue = 1
end
```

Code 4 - 1 Pseudocode for if statements used at the value initialization of the CORDIC algorithm.

```
if((initialZValue > 1.5707)&(initialZValue < 3.141592))
    resultCos = - resultCos
    resultSin = resultSin

else if((initialZValue > 3.141592)&(initialZValue < 4.7123))
    resultCos = - resultCos
    resultSin = - resultSin

end
```

Code 4 - 2 Pseudocode for if statements used at the final sine and cosine value calculation.

#### 4.1.1 Example of calculation

The general approach of CORDIC algorithm may be explained on the example for calculating the sinus and cosinus values for the angle  $\theta=57,535$ °. Firstly, the angle may be destructurized in the base angles, for which the equation 4 - 3 is true. In this example the is destructurized as 57,535=45+25,565-14,03.

The index i of the variables  $x_i$  and  $y_i$  in the following equations means the number of iteration of the algorithm.

0. iteration 
$$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \cos(45^\circ) \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_{\text{in}} \\ y_{\text{in}} \end{pmatrix}, \tag{4-7}$$

1. iteration 
$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \cos(26.565^\circ) \begin{pmatrix} 1 & -2^{-1} \\ 2^{-1} & 1 \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}, \tag{4-8}$$

2. iteration 
$$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \cos(-14.03^\circ) \begin{pmatrix} 1 & 2^{-2} \\ -2^{-2} & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}. \tag{4-9}$$

In the last step the signs of the off-diagonal terms are swapped because the rotation angle is negative ( $\sigma_2 = -1$ ). After substitution, the values of  $x_2$  and  $y_2$  may be obtained.

$$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \cos(45^\circ) \cos(26.565^\circ) \cos(-14.03^\circ) \begin{pmatrix} 1 & 2^{-2} \\ -2^{-2} & 1 \end{pmatrix} \begin{pmatrix} 1 & -2^{-1} \\ 2^{-1} & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_{\text{in}} \\ y_{\text{in}} \end{pmatrix}. \tag{4-10}$$

In equation 4 - 10, for the input vector  $(x_{\text{in}}, y_{\text{in}}) = (1, 0)$ , the values  $x_2$  and  $y_2$  approximate  $\cos(57.535^\circ)$  and  $\sin(57.535^\circ)$ , respectively.
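The three rotations of this example can be checked numerically. The following Python sketch applies the rotations of equations 4 - 7 to 4 - 9 to the input vector (1, 0). The helper name `rotate` is hypothetical, and the sketch assumes the exact arctangent angles instead of the rounded values used in the text, so the reached angle differs slightly from 57.535°.

```python
import math

def rotate(vec, sigma, i):
    """Apply one CORDIC micro-rotation by sigma * atan(2**-i), including its cosine factor."""
    x, y = vec
    t = sigma * 2.0 ** (-i)                  # signed tangent of the rotation angle
    k = math.cos(math.atan(2.0 ** (-i)))     # scaling factor of this step
    return (k * (x - t * y), k * (t * x + y))

# Rotation directions for +45 deg, +26.565 deg and -14.03 deg
sigmas = (1, 1, -1)

vec = (1.0, 0.0)                             # (x_in, y_in)
for i, sigma in enumerate(sigmas):
    vec = rotate(vec, sigma, i)

# Total angle reached by the three rotations, in degrees
total_deg = math.degrees(sum(s * math.atan(2.0 ** (-i)) for i, s in enumerate(sigmas)))

print("angle:", total_deg)
print("x2:", vec[0], "cos:", math.cos(math.radians(total_deg)))
print("y2:", vec[1], "sin:", math.sin(math.radians(total_deg)))
```

With only three iterations the angle resolution is coarse; more iterations are needed to reach an arbitrary angle with useful accuracy.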

#### 4.2 Python Implementation

The CORDIC algorithm was first prototyped in Python for simplicity. This turned out to be very beneficial, as debugging the code is much faster. The simple and explicit Python code may help with understanding and designing the algorithm more than Mathematica, which uses higher abstraction layers to optimize and simplify calculations for more complex problems. When designing low level mathematical algorithms, the simpler and more explicit the prototype language is, the easier it is to transfer the design to Verilog or any other hardware description language.

The Python code was also used to precalculate the LUTs for the scaling factor and for the arctangent values used in the  $z_i$  calculations.

For clarity, the Python implementation is presented in code 4 - 3. The code also calculates the error of the CORDIC result with respect to the Python math library functions.

```
import math

# Defining starting values and empty arrays
totalNumberOfIterations = 12  # 12 - best tradeoff between accuracy and iterations
atanValues = []
scalingValues = [1]
initialXValueCordic = 1
initialYValueCordic = 0
initialZValueCordic = -1.248  # angle for which to calculate CORDIC
initialSigmaValueCordic = 1

for x in range(totalNumberOfIterations):
    # Generating arctangent values of precalculated angles based on
    # the number of iterations
    atanValues.append(math.atan(2**(-x)))

    # Generating precalculated scaling values based on the number of
    # iterations
    scalingValues.append(scalingValues[x] * math.cos(atanValues[x]))

print("atanValues: ", atanValues)
print("scalingValues: ", scalingValues)

# Checking the initial value and moving it into the interval
if (initialZValueCordic > 1.5707) and (initialZValueCordic < 3.141592):
    zValue = initialZValueCordic - 3.141592
    sigmaValue = -1
    print("value in second q")
elif (initialZValueCordic > 3.141592) and (initialZValueCordic < 4.7123):
    zValue = initialZValueCordic - 3.141592
    sigmaValue = 1
    print("value in third q")
elif (initialZValueCordic < 0):
    sigmaValue = -1
    zValue = initialZValueCordic
    print("value in fourth q")
else:
    zValue = initialZValueCordic  # For angle
    sigmaValue = initialSigmaValueCordic  # For +- next angle
    print("value in first")

# Passing starting values to the calculation values
xValue = initialXValueCordic  # For cos
yValue = initialYValueCordic  # For sin

# CORDIC ALGORITHM
for x in range(totalNumberOfIterations):
    # Calculating next values of the current iteration x
    xNextValue = xValue - (sigmaValue * yValue) * 2**(-x)
    yNextValue = yValue + (sigmaValue * xValue) * 2**(-x)
    zNextValue = zValue - sigmaValue * atanValues[x]

    # Determining the signum of next angle (addition or subtraction)
    if zNextValue >= 0:
        sigmaNextValue = 1
    else:
        sigmaNextValue = -1

    # Values for new iteration
    xValue = xNextValue
    yValue = yNextValue
    zValue = zNextValue
    sigmaValue = sigmaNextValue

    print("iteration:", x, "xValue:", xValue, "yValue:", yValue, "zValue:",
          zValue, "sigmaValue:", sigmaValue, "\n")

# Calculating results by scaling the result values from CORDIC by the
# scalingValue which depends on the number of iterations which were made.
# scalingValues[n] holds the product of the cosines of the first n angles.
resultCos = scalingValues[totalNumberOfIterations] * xValue
resultSin = scalingValues[totalNumberOfIterations] * yValue

# Changing results sign based on the rotation of the initialZValueCordic
if (initialZValueCordic > 1.5707) and (initialZValueCordic < 3.141592):
    resultCos = - resultCos
elif (initialZValueCordic > 3.141592) and (initialZValueCordic < 4.7123):
    resultCos = - resultCos
    resultSin = - resultSin

# Calculating values based on the math library
mathResultCos = math.cos(initialZValueCordic)
mathResultSin = math.sin(initialZValueCordic)

# Calculating the error of CORDIC calculated values from the python math
# functions
errorCos = abs(resultCos) - abs(mathResultCos)
errorSin = abs(resultSin) - abs(mathResultSin)
```

Code 4 - 3 Python code of CORDIC implementation.

After the Python implementation had been finalized and debugged, the Verilog implementation of the algorithm could begin. As for the Division Unit IP, presented in the section *Calculating the division of fixed point numbers*, the Data Path, Control Unit and Top Module were designed. This approach, based on application specific circuit design, should by its nature be faster and safer than creating a custom CPU with a reduced and customized ISA.
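Before moving to Verilog, it can help to repeat the prototype with integer arithmetic only, since the circuit replaces the multiplications by  $2^{-i}$  with shifts. The sketch below is a hypothetical fixed point variant, not the code used for the design. It assumes 15 fractional bits, truncated LUT entries and an input angle already reduced to the interval  $[-\pi/2, \pi/2]$ . The names `FRAC_BITS`, `ATAN_LUT`, `SCALE` and `cordic_fixed` are illustrative, and the rotation direction is derived directly from the sign of the current angle instead of being carried in a separate variable.

```python
import math

FRAC_BITS = 15                      # assumed number of fractional bits
ONE = 1 << FRAC_BITS                # fixed point representation of 1.0
ITERATIONS = 12

# Arctangent LUT and final scaling factor, truncated to the fixed point grid
ATAN_LUT = [int(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]
SCALE = int(math.prod(math.cos(math.atan(2.0 ** -i)) for i in range(ITERATIONS)) * ONE)

def cordic_fixed(theta):
    """Return (cos, sin) of theta as fixed point integers; theta in [-pi/2, pi/2]."""
    x, y, z = ONE, 0, int(theta * ONE)
    for i in range(ITERATIONS):
        sigma = 1 if z >= 0 else -1
        # Python's >> on negative integers is an arithmetic shift
        x, y = x - sigma * (y >> i), y + sigma * (x >> i)
        z -= sigma * ATAN_LUT[i]
    # Scale once at the end and drop the extra fractional bits of the product
    return (x * SCALE) >> FRAC_BITS, (y * SCALE) >> FRAC_BITS

cos_fx, sin_fx = cordic_fixed(-1.248)
print(cos_fx / ONE, math.cos(-1.248))
print(sin_fx / ONE, math.sin(-1.248))
```

The printed pairs should agree to roughly three decimal places, which is the order of accuracy expected from 12 iterations and 15 fractional bits.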

### 4.3 IP Block Design

#### 4.3.1 Top module design

The top module design of the CORDIC IP is shown in figure 4 - 1. As can be seen, the structure is very similar to the Division Unit top module. When a customized circuit is created for an algorithm, the top modules are likely to be similar, with minor differences in signals, inputs and variables.

The Data Path Module in the top design incorporates the precalculated LUTs for *atanValues* and *scalingValues*. The structure of the LUT memory module is very simple and therefore the Verilog implementation is shown only for the *atanValues* variable. The value of *totalNumberOfIterations* is set to 12 in this implementation, thus the LUT is 12x32 bits in size. The values are stored in the custom fixed point *Q32.15* format presented earlier.



Figure 4 - 1 Top module design for the CORDIC IP block design.

#### 4.3.2 Allocation and Timing

Figure 4 - 2 depicts the allocation and timing diagram. The if statements implemented in the control unit are documented here as well. The reason why the if statements are needed is given in the CORDIC *Theory* section. As stated in the CORDIC *Control Unit* section, there are two approaches to the iteration cycles. The designer may choose to jump from *S4* to *S2* for a faster algorithm or from *S6* to *S2* for a demonstrative approach. The jumps are not shown in the allocation and timing diagram.



Figure 4 - 2 Allocation and timing diagram for the Data Path Unit part of the CORDIC IP.

#### 4.3.3 Data Path Module

Figure 4 - 3 visualizes the Data Path part of the Top Module design, including the calculation and storage units. The memory LUTs for atanValues and scalingValues are not depicted as separate registers but as inputs to the calculation units. The results of the cosine and sine functions, named resultCos and resultSin in the Python implementation, are saved to registers R9 and R10, respectively. The **NEG** blocks are not in fact implemented as standalone blocks for negating numbers. The negation is activated in the corresponding target register when the appropriate **SelR** $_x$  signal is active, where x is the number of the corresponding register R9 or R10.

As stated before, the implementation of the LUT memory module for atanValues is shown in this section in code 4 - 4.



Figure 4 - 3 Register transfer level (RTL) scheme of the CORDIC IP Data Path Unit.

```
module atanValuesCordicLUT(index, returnValue);

input [3:0] index;
output reg signed [31:0] returnValue;

always @(index)
begin
    case (index)
        4'b0000: returnValue = 32'sb00000000000000000_110010010000111; // 0.7853981633974483
        4'b0001: returnValue = 32'sb00000000000000000_011101101011010; // 0.4636476090008061
        4'b0010: returnValue = 32'sb00000000000000000_001111101011011; // 0.24497866312686414
        4'b0011: returnValue = 32'sb00000000000000000_000111111101010; // 0.12435499454676144
        4'b0100: returnValue = 32'sb00000000000000000_000011111111101; // 0.06241880999595735
        4'b0101: returnValue = 32'sb00000000000000000_000001111111111; // 0.031239833430268277
        4'b0110: returnValue = 32'sb00000000000000000_000000111111111; // 0.015623728620476831
        4'b0111: returnValue = 32'sb00000000000000000_000000011111111; // 0.007812341060101111
        4'b1000: returnValue = 32'sb00000000000000000_000000001111111; // 0.00390623013196697
        4'b1001: returnValue = 32'sb00000000000000000_000000000111111; // 0.0019531225164788188
        4'b1010: returnValue = 32'sb00000000000000000_000000000011111; // 0.0009765621895593195
        4'b1011: returnValue = 32'sb00000000000000000_000000000001111; // 0.0004882812111948983
    endcase
end

endmodule
```

Code 4 - 4 Verilog code of the atanValues LUT implementation. The LUT structure for scalingValues is very similar and is therefore not shown here.
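The table entries can be generated rather than typed by hand. The following Python sketch prints case lines in the style of code 4 - 4. It assumes 15 fractional bits in a 32 bit word and truncation of the scaled value; the helper name `atan_lut_entry` is hypothetical, and a different rounding rule would change the last bit of some entries.

```python
import math

FRAC_BITS = 15      # assumed number of fractional bits
WORD_BITS = 32      # assumed word width

def atan_lut_entry(i):
    """Return atan(2**-i) truncated to a fixed point integer."""
    return int(math.atan(2.0 ** -i) * (1 << FRAC_BITS))

lut = [atan_lut_entry(i) for i in range(12)]

for i, value in enumerate(lut):
    bits = format(value, "0{}b".format(WORD_BITS))
    integer_part, fraction_part = bits[:-FRAC_BITS], bits[-FRAC_BITS:]
    print("4'b{:04b}: returnValue = 32'sb{}_{}; // {}".format(
        i, integer_part, fraction_part, math.atan(2.0 ** -i)))
```

The same loop with `math.cos(math.atan(2.0 ** -i))` accumulated as a running product could be used for the scalingValues table.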

#### 4.3.4 Control Unit

In the same way as for the Division Module Control Unit, presented in the *Control Unit* section, the control signal encoding table 4 - 1 for the CORDIC Data Path unit is created. The control signal in the Verilog design is named CS.

The branches of the if statements used in the design have been color coded in the table for improved clarity. The iteration jumps are not depicted in the control signal table. The jumps may be performed from the step *S4*, when the speed of the calculation is the main concern, or from *S6*, when the function of the algorithm is demonstrated. The steps *S5* and *S6* mainly multiply the result of the iteration by the appropriate scaling value and transform the results based on the quadrant of the originally requested angle.

|       |   | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |    |
|-------|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|----|
| State |   | ld1 | ld2 | ld3 | ld4 | ld5 | ld6 | ld7 | ld8 | ld9 | ld10 |  |  | SelR3[1] | SelR3[0] | SelR4[1] | SelR4[0] |  |  | SelR7 | SelR9 | SelR10 | SelM1 |  |  | SelSub1 | SelSub2 | SelAdd1 | CV |
| S0    | R0 ← totalNumberOfIterations;<br>R1 ← initialXValue;<br>R2 ← initialYValue;<br>R3 ← initialZValue; | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h7000000 |
| S1    | if (R3 > 6.283184)<br>R3 ← R3 - 6.283184; (Sub1) | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 27'h1004008 |
|       | if (R3 < -6.283184)<br>R3 ← R3 + 6.283184; (Add1)<br>if [(R3 < 6.283184)&(R3 > -6.283184)] → nextState ⇐ S2; | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 27'h1002001 |
| S2    | if [(R3 < 6.283184)&(R3 > -6.283184)] → nextState ⇐ S3; CS = 0;<br>else → nextState ⇐ S1; | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h0 |
| S3    | if [(R3 > 1.5707)&(R3 < 3.141592)]<br>R4 ← R3 - 3.141592; (Sub1)<br>R5 ← -1; | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'hC01400 |
|       | if [(R3 > 3.141592)&(R3 < 4.7123)]<br>R4 ← R3 - 3.141592; (Sub1)<br>R5 ← 1; | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'hC01000 |
|       | if (R3 < 0)<br>R5 ← -1;<br>R4 ← R3; | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'hC00C00 |
|       | else<br>R4 ← R3;<br>R5 ← 1; | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'hC00800 |
| S4    | R6 ← R1 × R5; (Mul1)<br>R7 ← R2 × R5; (Mul2)<br>R8 ← atanValues[numberOfIteration] × R5; (Mul3) | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 27'h380330 |
| S5    | R6 ← R6 >> numberOfIteration; (Sh1)<br>R7 ← R7 >> numberOfIteration; (Sh2)<br>R4 ← R4 - R8; (Sub2) | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 27'hB00006 |
| S6    | R4 > 0;<br>R1 ← R1 - R7; (Sub2)<br>R2 ← R2 + R6; (Add1)<br>R5 ← 1; | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h6418000 |
|       | R4 < 0;<br>R1 ← R1 - R7; (Sub2)<br>R2 ← R2 + R6; (Add1)<br>R5 ← -1; | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h6418400 |
| S7    | R9 ← R1 × scalingValues[numberOfIteration]; (Mul1)<br>R10 ← R2 × scalingValues[numberOfIteration]; (Mul2) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h600C0 |
| S8    | if [(R3 > 3.141592)&(R3 < 4.7123)]<br>R9 ← R9 × (-1); (Neg1)<br>R10 ← R10 × (-1); (Neg2) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h60000 |
|       | if [(R3 > 1.5707)&(R3 < 3.141592)]<br>R9 ← R9 × (-1); (Neg1)<br>R10 ← R10; | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h40000 |
|       | else<br>R9 ← R9;<br>R10 ← R10; | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27'h0 |

Table 4 - 1 Control signal encoding table for instructions to be processed by the CORDIC Module.

#### 4.4 Simulation results

The testbench for testing the design is created with cocotb and simulated with Verilator.

When the algorithm is implemented so that the current *sine* and *cosine* values are calculated in every iteration, the number of cycles needed for the final result is

$$NoCyc_{\text{result every iteration}} = 2 + (5NoIt), \tag{4-11}$$

where NoCyc (-) is the number of cycles and NoIt is the number of iterations of the CORDIC algorithm. The value 2 accounts for S0 and S1, and the multiplication by 5 accounts for the states S2–S6. When the result of the CORDIC algorithm is calculated only once at the end of the algorithm, the number of cycles is

$$NoCyc_{\text{result at the end}} = 2 + (3NoIt) + 2, \tag{4-12}$$

where the multiplication by 3 is caused by the states S2–S4 and the addition of the second 2 is caused by the states S5–S6.

The frequency of the clock signal in this design is currently set to 50 MHz.
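Equations 4 - 11 and 4 - 12 can be turned into a quick latency estimate. The short Python sketch below evaluates both for 12 iterations at the 50 MHz clock; the function names are hypothetical. It gives 62 and 40 cycles, which corresponds to about 1.24 µs and 0.8 µs.

```python
CLOCK_HZ = 50e6     # clock frequency stated in the text

def cycles_result_every_iteration(iterations):
    """Equation 4 - 11: the result is scaled in every iteration."""
    return 2 + 5 * iterations

def cycles_result_at_end(iterations):
    """Equation 4 - 12: the result is scaled once after the last iteration."""
    return 2 + 3 * iterations + 2

for name, cycles in (("result every iteration", cycles_result_every_iteration(12)),
                     ("result at the end", cycles_result_at_end(12))):
    print(name, cycles, "cycles,", cycles / CLOCK_HZ * 1e6, "us")
```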



Figure 4 - 4 The whole Verilog simulation of the CORDIC algorithm for determining the sine and cosine values of the angle  $\theta = -1.2479$  rad. The sine and cosine values of the current iteration are also calculated in this variant of the algorithm. The result is passed to the registers R9 and R10.



Figure 4 - 5 Detail of the last iteration of the Verilog simulation of the CORDIC algorithm for determining the sine and cosine values of the angle  $\theta = -1.2479$  rad. The result is passed to the registers R9 and R10.



Figure 4 - 6 The whole Verilog simulation of the CORDIC algorithm for determining the sine and cosine values of the angle  $\theta=10.7195129$  rad. The sine and cosine values of the current iteration are also calculated in this variant of the algorithm. The result is passed to the registers R9 and R10.



Figure 4 - 7 The whole Verilog simulation of the CORDIC algorithm for determining the sine and cosine values of the angle  $\theta = 6.7195129$  rad. The sine and cosine values of the current iteration are also calculated in this variant of the algorithm. The result is passed to the registers R9 and R10.

# 5 Simple set of nonlinear equations solved by a Newton-Raphson algorithm using custom circuit implementation

All the parts presented in the previous sections may be utilized to solve a system of nonlinear equations. This work leads towards solving the transcendental equations for Selective Harmonic Elimination. However, the best approach is to first solve an easier set of equations to determine whether the NR approach is viable.

#### 5.1 Theory

The objective of the NR algorithm is to solve the set of nonlinear equations

$$F_1(x_1, x_2) = x_1^3 - x_2 - 1, \tag{5-1}$$

$$F_2(x_1, x_2) = x_1 - 2x_2 - 2, \tag{5-2}$$

where one possible set of solutions  $x_1$  and  $x_2$  yields

$$F_1 = 0, \tag{5-3}$$

$$F_2 = 0. \tag{5-4}$$

The algorithm could be implemented in a custom CPU with a reduced instruction set, but for obvious reasons, e.g. the speed and the complexity of developing a custom RISC-V core, the approach of creating an application specific circuit design was used.

To implement the algorithm in the custom design, the general NR algorithm had to be broken down into the lowest level operations. Every part that could be precalculated was set as a static value at design time.

To check that the algorithm and its implementation were designed correctly, the system was also solved in Wolfram Mathematica, both with the Solve function and with a customized NR procedure. Before the start of the algorithm, the starting values  $x_1^0$  and  $x_2^0$  are set as inputs to the module. Based on these inputs, the function values at the selected starting point are calculated.

As the next step, the so called defect is calculated using the newly found values  $F_1(x_1^0, x_2^0)$  and  $F_2(x_1^0, x_2^0)$ 

$$\Delta \mathbf{F}^{i} = \begin{pmatrix} \Delta F_{1}^{i} \\ \Delta F_{2}^{i} \end{pmatrix} = \begin{pmatrix} F_{1}^{\text{known solution}} - F_{1}^{i} \\ F_{2}^{\text{known solution}} - F_{2}^{i} \end{pmatrix}, \tag{5-5}$$

where the superscript i is the number of the iteration for which the defect is calculated. When the algorithm starts, i = 0, so for example the inputs of  $F_1^0$  are  $x_1^0$  and  $x_2^0$ . The known solution values are zero according to equations 5 - 3 and 5 - 4. The defect is taken as the known solution value minus the current function value, so that the correction in equation 5 - 11 is added to the current estimate.

Next, the Jacobian matrix **J** of the vector of functions  $\mathbf{F}(x_1, x_2) = (F_1, F_2)$  is calculated as follows.

$$\mathbf{J}^{i} = \begin{pmatrix} \frac{\partial F_{1}}{\partial x_{1}^{i}} & \frac{\partial F_{1}}{\partial x_{2}^{i}} \\ \frac{\partial F_{2}}{\partial x_{1}^{i}} & \frac{\partial F_{2}}{\partial x_{2}^{i}} \end{pmatrix} = \begin{pmatrix} 3(x_{1}^{i})^{2} & -1 \\ 1 & -2 \end{pmatrix}. \tag{5-6}$$

As in the general NR algorithm, the inverse of the Jacobian matrix needs to be calculated. In general mathematical software, such as Wolfram Mathematica, the inverse is obtained simply by calling an inversion function. When designing the circuit, the inversion must be calculated manually. In this paper, the inverse is obtained by calculating the determinant of the Jacobian matrix, its reciprocal value and the adjugate matrix, and then multiplying the elements of the adjugate matrix by the reciprocal value of the determinant.

Because the Jacobian matrix is only 2x2, the determinant may be easily calculated as the difference of the products of the main diagonal and the antidiagonal (the 2x2 analogue of the Sarrus rule). For larger matrices, the Laplace expansion may be utilized.

$$\det(\mathbf{J}) = 3(x_1^i)^2(-2) - (-1) = 3(x_1^i)^2(-2) + 1. \tag{5-7}$$

The reciprocal value of the determinant is then calculated by the Division Unit, created for calculating the division of arbitrary real numbers. This Division Unit is presented in the section *Calculating the division of fixed point numbers*.

The adjugate matrix is calculated as follows

$$\operatorname{adj}(\mathbf{J}) = \begin{pmatrix} \mathbf{J}_{11}(-1)^{1+1} & \mathbf{J}_{01}(-1)^{1+2} \\ \mathbf{J}_{10}(-1)^{2+1} & \mathbf{J}_{00}(-1)^{2+2} \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ -1 & 3(x_1^i)^2 \end{pmatrix}. \tag{5-8}$$

After the calculation of the reciprocal value of the determinant of the Jacobian matrix and of the adjugate matrix, the inverted Jacobian matrix may finally be calculated

$$\mathbf{J}^{-1i} = \frac{1}{\det(\mathbf{J}^i)} \begin{pmatrix} \operatorname{adj}(\mathbf{J}^i)_{00} & \operatorname{adj}(\mathbf{J}^i)_{01} \\ \operatorname{adj}(\mathbf{J}^i)_{10} & \operatorname{adj}(\mathbf{J}^i)_{11} \end{pmatrix} = \frac{1}{\det(\mathbf{J}^i)} \begin{pmatrix} -2 & 1 \\ -1 & 3(x_1^i)^2 \end{pmatrix}. \tag{5-9}$$

Next, the correction  $(\Delta x_1^i, \Delta x_2^i)$  is calculated using the inverted Jacobian matrix and the defect.

$$\begin{pmatrix} \Delta x_1^i \\ \Delta x_2^i \end{pmatrix} = \begin{pmatrix} \mathbf{J}_{00}^{-1i} \ \Delta F_1^i + \mathbf{J}_{01}^{-1i} \ \Delta F_2^i \\ \mathbf{J}_{10}^{-1i} \ \Delta F_1^i + \mathbf{J}_{11}^{-1i} \ \Delta F_2^i \end{pmatrix}. \tag{5-10}$$

Now the next iteration value denoted as i + 1 of  $x_1$  and  $x_2$  may be calculated

$$\begin{pmatrix} x_1^{i+1} \\ x_2^{i+1} \end{pmatrix} = \begin{pmatrix} x_1^i + \Delta x_1^i \\ x_2^i + \Delta x_2^i \end{pmatrix}. \tag{5-11}$$

With the new iteration values  $x_1^{i+1}$  and  $x_2^{i+1}$  the calculation loop starts again with the evaluation of the new values  $F_1^{i+1}$  and  $F_2^{i+1}$ , as described at the start of this section.
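The whole loop of this section can be sketched in a few lines of floating point Python before it is mapped to registers. The sketch below is only an illustration under stated assumptions: the defect is taken as the target value minus the current function value, the adjugate is the transposed cofactor matrix, and the starting point (1, -1) and the function name `newton_raphson` are hypothetical.

```python
def newton_raphson(x1, x2, iterations=5, f1_target=0.0, f2_target=0.0):
    """Solve F1 = x1^3 - x2 - 1 and F2 = x1 - 2*x2 - 2 for the given targets."""
    for _ in range(iterations):
        # Function values at the current point
        f1 = x1 ** 3 - x2 - 1
        f2 = x1 - 2 * x2 - 2
        # Defect: target value minus current value
        d1 = f1_target - f1
        d2 = f2_target - f2
        # Jacobian [[a, -1], [1, -2]] and the reciprocal of its determinant
        a = 3 * x1 ** 2
        inv_det = 1.0 / (a * (-2) + 1)
        # Correction: (1 / det) * adjugate [[-2, 1], [-1, a]] times the defect
        dx1 = inv_det * (-2 * d1 + 1 * d2)
        dx2 = inv_det * (-1 * d1 + a * d2)
        x1, x2 = x1 + dx1, x2 + dx2
    return x1, x2

x1, x2 = newton_raphson(1.0, -1.0)   # hypothetical starting point
print(x1, x2)
```

From this starting point the iteration approaches the solution with  $x_1 = 1/\sqrt{2}$ . Starting points close to  $x_1 = \pm 1/\sqrt{6}$  should be avoided, because the determinant vanishes there and the reciprocal cannot be formed.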

#### 5.2 IP Block Design

#### 5.2.1 Top module design

The picture 5 - 1 depicts the top module design of the circuit. The Control Unit sends control signals to the Data Path unit to make the desired calculations. As in all designs in this paper, the numbers are formatted in the Q32.15 fixed point format.



Figure 5 - 1 Top module design for the simple NR calculation unit IP block design.
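The fixed point format used for all numbers in the design can be mimicked in Python when reference values for the simulation are prepared. The helpers below are a minimal sketch that assumes a signed 32 bit word with 15 fractional bits, as suggested by the LUT in code 4 - 4; the names `to_fixed`, `to_float` and `fixed_mul` are hypothetical.

```python
FRAC_BITS = 15      # assumed number of fractional bits
WORD_BITS = 32      # assumed word width

def to_fixed(value):
    """Convert a float to a signed fixed point integer (truncating)."""
    raw = int(value * (1 << FRAC_BITS))
    limit = 1 << (WORD_BITS - 1)
    if not -limit <= raw < limit:
        raise OverflowError("value does not fit into the word")
    return raw

def to_float(raw):
    """Convert a fixed point integer back to a float."""
    return raw / (1 << FRAC_BITS)

def fixed_mul(a, b):
    """Multiply two fixed point numbers and rescale the double-width product."""
    return (a * b) >> FRAC_BITS

a = to_fixed(1.5)
b = to_fixed(-2.25)
print(to_float(fixed_mul(a, b)))
```

The rescaling shift after the multiplication is the step that a hardware multiplier has to perform as well, because the raw product carries twice the number of fractional bits.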

### 5.2.2 Allocation and Timing

The algorithm structure for the Verilog implementation is depicted in the data flow diagram in figure 5 - 2. The iteration jumps of the algorithm (explained in the *Control Unit* section of the simple NR algorithm) are not displayed in this diagram.



Figure 5 - 2 Allocation and timing diagram for the Data Path Unit part of the simple NR IP.

#### 5.2.3 Data Path Unit

The Data Path unit for this simple NR algorithm consists of four multipliers, two adders, two subtractors and one divider. The divider is implemented using the Division Unit, presented in the section *Calculating the division of fixed point numbers*. When the algorithm has finished, the results for  $x_1$  and  $x_2$  are saved in R1 and R2, the state S11 is set and the done signal is set to 1. The results can then be driven to another module or unit for further use. The done signal is in fact driven by the Control Unit and can be used to control a larger design of which the NR module is only one part.



Figure 5 - 3 Register transfer level RTL scheme of the IP Data Path Unit part of the simple NR calculation IP.

#### 5.2.4 Control Unit

The encoding table 5 - 1 shows the steps of the algorithm with a corresponding control signal for the Data Path Unit of the simple NR algorithm Verilog implementation.

The NR algorithm iteration jumps are carried out from the state S10 to the state S1 while the iteration count is lower than the total number of iterations, which is hardcoded in the Control Unit. In this implementation, the total number of iterations is set to 5. Strictly speaking, the end of the NR algorithm should be determined from the defect value. In this simple example, the check of the defect value is not implemented. The implementation would be simple though. The registers R3 and R4, which hold the defect values, would be wired to the Control Unit in the corresponding steps S4 and S5, and compared with the desired defect value. If the defect was smaller than the desired value, the next state of the algorithm would be S11 and the iteration would end. If the defect was larger than the desired value, the next state would be S6 and the iteration would complete normally and loop from the state S10 to S1.

Table 5 - 1 Control signal encoding table for instructions to be processed by the simple NR algorithm solve Module.
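The defect based termination described above can be sketched as a small decision function. This is a hypothetical illustration, not part of the implemented Control Unit: the function name, the tolerance value and the use of absolute values are assumptions, and S11 is assumed to be the finished state named in the Data Path Unit section.

```python
def next_state_after_defect(defect_f1, defect_f2, tolerance):
    """Choose the next state once both defect values are available."""
    if abs(defect_f1) < tolerance and abs(defect_f2) < tolerance:
        return "S11"    # finished: the results stay in R1 and R2
    return "S6"         # continue with the rest of the current iteration

print(next_state_after_defect(1e-6, -2e-6, 1e-4))
print(next_state_after_defect(0.112, 0.0, 1e-4))
```

Keeping the hardcoded iteration limit as well would still bound the run time when the defect never falls below the tolerance.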

|          |                                                                                                                                                                 | 35 | 34 | 33 32   | 31 | 30 | 29 | 28 | 27 | 26    | 25       | 24       | 23 | 22    | 21    | 20    | 19       | 18       | 17       | 16       | 15       | 14       | 13    | 12       | 11       | 10    | 9     | 8        | 7        | 6        | 5        | 4 | 3        | 2        | 1        | 0        |                |
|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|----|----|---------|----|----|----|----|----|-------|----------|----------|----|-------|-------|-------|----------|----------|----------|----------|----------|----------|-------|----------|----------|-------|-------|----------|----------|----------|----------|---|----------|----------|----------|----------|----------------|
| State    |                                                                                                                                                                 | ld1 | ld2 | ld3 ld4 | ld5 | ld6 | ld7 | ld8 |    | SelR2 | SelR3[1] | SelR3[0] |    | SelR5 | SelR6 | SelR7 | SelR8[1] | SelR8[0] | SelM1[1] | SelM1[0] | SelM2[1] | SelM2[0] | SelM3 | SelM4[1] | SelM4[0] | SelM5 | SelM6 | SelA1[1] | SelA1[0] | SelA2[1] | SelA2[0] |   | SelS1[1] | SelS1[0] | SelS2[1] | SelS2[0] | CV             |
| S0       | R1 ← x1;<br>R2 ← x2;                                                                                                                                            | 1  | 1  | 0 0     | 0  | 0  | 0  | 0  | 0  | 0     | 0        | 0        | 0  | 0     | 0     | 0     | 0        | 0        | 0        | 0        | 0        | 0        | 0     | 0        | 0        | 0     | 0     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 0        | 0        | 36'hC00000000  |
| S1       | R3 ← R1 × R1; (1)<br>R4 ← R2 × 2; (2)                                                                                                                           | 0  | 0  | 1 1     | 0  | 0  | 0  | 0  | 0  | 0     | 1        | 0        | 1  | 0     | 0     | 0     | 0        | 0        | 1        | 1        | 1        | 1        | 1     | 1        | 0        | 0     | 0     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 0        | 0        | 36'h30283F000  |
| S2       | R3 ← R3 × R1; (1)<br>R4 ← R1 - R4; (2)<br>R5 ← R3 × 3; (3)                                                                                                      | 0  | 0  | 1 1     | 1  | 0  | 0  | 0  | 0  | 0     | 1        | 0        | 0  | 1     | 0     | 0     | 0        | 0        | 1        | 0        | 1        | 1        | 0     | 0        | 0        | 1     | 1     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 1        | 0        | 36'h38242C602  |
| S3       | R3 ← R3 - R2; (1)<br>R4 ← R4 - 2; (2)<br>R8 ← R5 × (-2); (1)<br>R7 ← F20;                                                                                       | 0  | 0  | 1 1     | 0  | 0  | 1  | 1  | 0  | 0     | 0        | 1        | 0  | 0     | 0     | 1     | 1        | 0        | 0        | 1        | 1        | 0        | 0     | 0        | 0        | 0     | 0     | 0        | 0        | 0        | 0        | 0 | 1        | 0        | 0        | 1        | 36'h331198009  |
| S4       | R3 ← R3 - 1; (1)<br>R4 ← R7 - R4; (2)<br>R8 ← R8 + 1; (1)<br>R6 ← F10;                                                                                          | 0  | 0  | 1 1     | 0  | 1  | 0  | 1  | 0  | 0     | 0        | 1        | 0  | 0     | 1     | 0     | 0        | 1        | 0        | 0        | 0        | 0        | 0     | 0        | 0        | 0     | 0     | 1        | 0        | 1        | 0        | 0 | 0        | 1        | 0        | 0        | 36'h351240144  |
| S5       | R3 ← R6 - R3; (1)<br>R8 ← 1 / R8; (1)                                                                                                                           | 0  | 0  | 1 0     | 0  | 0  | 0  | 1  | 0  | 0     | 0        | 1        | 0  | 0     | 0     | 0     | 0        | 0        | 0        | 0        | 0        | 0        | 0     | 0        | 0        | 0     | 0     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 0        | 0        | 36'h211000000  |
| S6       | R8 load from division when data is available                                                                                                                    | 0  | 0  | 0 0     | 0  | 0  | 0  | 1  | 0  | 0     | 0        | 0        | 0  | 0     | 0     | 0     | 0        | 0        | 0        | 0        | 0        | 0        | 0     | 0        | 0        | 0     | 0     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 0        | 0        | 36'h10000000   |
| S7       | R6 ← R8 × (-2); (1)<br>R8 ← R8 × (-1); (2)<br>R5 ← R5 × R8; (3)                                                                                                 | 0  | 0  | 0 0     | 1  | 1  | 0  | 1  | 0  | 0     | 0        | 0        | 0  | 1     | 0     | 0     | 1        | 1        | 0        | 0        | 0        | 1        | 0     | 0        | 1        | 0     | 0     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 0        | 0        | 36'hD04C4800   |
| S8       | R6 ← R3 × R6; (1)<br>R4 ← R4 × R8; (2)<br>R5 ← R3 × R8; (3)<br>R7 ← R5 × R4; (4)                                                                                | 0  | 0  | 0 1     | 1  | 1  | 1  | 0  | 0  | 0     | 0        | 0        | 1  | 1     | 0     | 0     | 0        | 0        | 1        | 0        | 0        | 0        | 0     | 0        | 0        | 1     | 0     | 0        | 0        | 0        | 0        | 0 | 0        | 0        | 0        | 0        | 36'h1E0C20400  |
| 59       | $R3 \leftarrow R6 + R4; (1)$<br>$R5 \leftarrow R5 + R7; (2)$                                                                                                    | 0  | 0  | 1 0    | 1  | 0  | 0  | 0  | 0  | 0     | 0   | 0        | 0  | 0     | 0     | 0     | 0        | 0        | 0        | 0        | 0        | 0        | 0     | 0        | 0        | 0     | 0     | 0        | - 1      | 0        | 1       | - 1 | 0        | 0        | 0        | 0        | 36°h2800000B0                |
| S10      | $R1 \leftarrow R1 + R3; (1)$<br>$R2 \leftarrow R2 + R5; (2)$                                                                                                    | 1  | 1  | 0 0    | 0  | 0  | 0  | 0  | 1  | 1     | 0   | 0        | 0  | 0     | 0     | 0     | 0        | 0        | 0        | 0        | 0        | 0        | 0     | 0        | 0        | 0     | 0     | 0        | 0        | 0        | 0       | 0   | 0        | 0        | 0        | 0        | 36°hC0C000000                |
| S11      |                                                                                                                                                                 | х  | х  | x x    | х  | X  | x  | х  | x  | x     | X   | X        | X  | x     | x     | x     | x        | X        | x        | x        | X        | X        | x     | х        | X        | X     | X     | x        | X        | x        | X       | X   | x        | X        | X        | x        | 36°hxxxxxxxx                 |
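
The relation between the individual control-signal columns and the control vector (CV) column can be checked with a short script. The following is a minimal sketch, assuming that the leftmost control-signal column is the most significant bit of the 36-bit word; the helper names `pack_control_word` and `verilog_literal` are hypothetical and are not part of the design. The bit lists are read from the rows of states S1 and S10.

```python
def pack_control_word(bits):
    """Pack 0/1 control bits into an integer, first list element = MSB (assumed ordering)."""
    word = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("control bits must be 0 or 1")
        word = (word << 1) | bit
    return word


def verilog_literal(word, width=36):
    """Format an integer as a sized Verilog hexadecimal literal."""
    digits = (width + 3) // 4  # number of hex digits needed for `width` bits
    return f"{width}'h{word:0{digits}X}"


# Control bits of state S1, read left to right from the table row.
S1_BITS = [0, 0, 1, 1,  0, 0, 0, 0,  0, 0, 1, 0,  1, 0, 0, 0,
           0, 0, 1, 1,  1, 1, 1, 1] + [0] * 12

# Control bits of state S10 (write-back of R1 and R2).
S10_BITS = [1, 1, 0, 0,  0, 0, 0, 0,  1, 1] + [0] * 26

print(verilog_literal(pack_control_word(S1_BITS)))   # 36'h30283F000
print(verilog_literal(pack_control_word(S10_BITS)))  # 36'hC0C000000
```

Both printed literals agree with the CV column of the corresponding rows, which supports the assumed bit ordering.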

#### 5.3 Simulation results

The simulation test bench was written using Cocotb [1] with Verilator [2] as the simulator. The result of the calculation is held in registers R1 and R2: $x_1 = -0.707489$ and $x_2 = -1.353759$.
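
As a cross-check of these values, the iteration can be reproduced in floating point. The sketch below is an illustration under stated assumptions, not the implemented data path: the system $x_1^3 - x_2 - 1 = F_{10}$, $x_1 - 2x_2 - 2 = F_{20}$ is inferred from the RTL column of the control table, the targets $F_{10} = F_{20} = 0$ and the initial guess $(-1, -1)$ are assumed, and the function name `newton_raphson_2x2` is hypothetical.

```python
import math


def newton_raphson_2x2(x1, x2, f10=0.0, f20=0.0, iterations=5):
    """Floating-point Newton-Raphson for the 2x2 system inferred from the RTL table.

    f1(x1, x2) = x1**3 - x2 - 1   (target f10, assumed 0)
    f2(x1, x2) = x1 - 2*x2 - 2    (target f20, assumed 0)
    """
    for _ in range(iterations):
        # Residuals: target value minus current function value.
        r1 = f10 - (x1 ** 3 - x2 - 1.0)
        r2 = f20 - (x1 - 2.0 * x2 - 2.0)
        # Jacobian J = [[a, -1], [1, -2]] with a = 3*x1^2, det(J) = 1 - 2*a.
        a = 3.0 * x1 * x1
        inv_det = 1.0 / (1.0 - 2.0 * a)
        # Update step: J^-1 * r, with J^-1 = (1/det) * [[-2, 1], [-1, a]].
        dx1 = (-2.0 * r1 + r2) * inv_det
        dx2 = (-r1 + a * r2) * inv_det
        x1 += dx1
        x2 += dx2
    return x1, x2


x1, x2 = newton_raphson_2x2(-1.0, -1.0)  # initial guess is an assumption
print(f"x1 = {x1:.6f}, x2 = {x2:.6f}")
```

Under these assumptions the iteration converges to $x_1 = -1/\sqrt{2}$ and $x_2 = (x_1 - 2)/2$. The fixed-point simulation result differs from this floating-point reference by less than $10^{-3}$ in both variables.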

The clock signal frequency for this design is currently 20 MHz.



Figure 5 - 4 The whole Verilog simulation of a simple NR algorithm. The result may be seen in registers R1 and R2 after the fifth iteration of the algorithm.

## **6** Selective Harmonic Elimination

### 6.0.1 Control Unit

Table 6 - 1 Control signal encoding table for the instructions processed by the simple NR algorithm solve module.

| _     |                                                                                                                                                                                                                             |   | S1 58  |       | P 46    |      |       |        | _      |        |      | r 14   | -    | _       | - 11 | -      |        | -     |     |           | -,-      |          |        |        | "      | ,,        |          |        | -       |         |           |         |           |             |           |        |           | _         |          | _        | _       |         |            |            |        |          |           |         |       |                         |
|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|--------|-------|---------|------|-------|--------|--------|--------|------|--------|------|---------|------|--------|--------|-------|-----|-----------|----------|----------|--------|--------|--------|-----------|----------|--------|---------|---------|-----------|---------|-----------|-------------|-----------|--------|-----------|-----------|----------|----------|---------|---------|------------|------------|--------|----------|-----------|---------|-------|-------------------------|
| State | RTL Cude                                                                                                                                                                                                                    |   | 41 142 | NO NO | 165 166 | M2 M | 107 1 | MER ME | 1 1012 | MID Se | DD 5 | G1 5-G | a sa | O Selle | SHIS | g sass | n sang | Silke | N/R | Svilki(1) | SARCKING | Seller S | 801 50 | RII Se | RIZE S | etkuppj s | SIRD(II) | SHRIDE | SelCort | SelCord | SelCir2 : | Selford | Settlette | Settlettitt | SetMattig | SHMIZE | Settlette | Settletop | E SADEAD | (I) Sets | 40(B) S | Diete S | etisakt[1] | Salsabajaj | 545662 | Septent. | Sellert : | SHARE I | SHEAR | cv                      |
| 50    | 80 ← x10;<br>R1 ← x20;                                                                                                                                                                                                      | 1 | 1 0    | 0 0   | 0 0     | 0 0  | 0     | 0 0    | 0      | 0 .    | 0    | 0 0    | 0    | - 0     | - 0  | - 0    | 0      | 0     | 0   | 0         | 0        | 0        | 0 (    | >      | 0      | 0         | 0        | 0      | 0       | 0       | 4         | 0       | 0         | 0           | 0         | 0      | 0         | 0         |          | -        | b       | 0       | 0          | 0          | 0      | 0        | 0         | 0       | 0     | 53/32/90000000000000    |
| 81    | R1 ← R0 x 5; (Mall)<br>R3 ← R1 x 5; (Mal2)                                                                                                                                                                                  |   | 0 1    | 1 0   | 0 0     | 0 0  | 0     | 0 0    | 0      | 0      | 0    | 0 1    |      | 0       | 0    |        | 0      | 0     | 0   | 0         | 0        | 0        | 0 1    | >      | 0      | 0         | 0        | 0      | 0       | 0       | 0         | 0       | 1         | 0           | 0         | 1      | 0         | 0         | 0        |          | •       | 0       | 0          | 0          | 0      | 0        | 0         | 0       | 0     | 53/360018000000000      |
| 82    | R4 COS(R0); (Coslict)<br>R5 COS(R1); (Coslict)<br>R6 COS(R2); (Coslict)<br>R7 COS(R1); (Coslict)<br>R8 SD(R0); (Coslict)<br>R9 SD(R1); (Coslict)<br>R10 SD(R2); (Coslict)<br>R11 SD(R2); (Coslict)<br>R11 SD(R2); (Coslict) |   |        |       |         |      | 1     |        |        |        | 0    |        | 0    |         |      |        | -      |       | -   |           |          | 0        |        |        | 0      |           | 0        |        |         |         |           | -       | 0         | 0           | 0         |        |           | 0         |          |          | ,       |         | 0          |            |        |          | 0         | 0       | 0     | 53°3.5°56°78006°9000    |
| 89    | R2 R4 R5 (Sid4)<br>R3 R6 R7 (Sid2)<br>R12 R8 x 5 (Mid1)<br>R4 R10 x 5 (Mid2)<br>R5 R11 x 5 (Mid2)                                                                                                                           | 0 | 0 1    |       | 1 0     | 0 0  | 0     | 0 0    | 1      | 0      | a    | 0 0    | 0    | 0       |      | 0      |        | 0     | 0   |           | 0        | 0        | 0 1    |        |        |           | 0        |        | 0       |         |           | 0       | 0         | 1           | 1         | 0      |           | 1         | 1        |          | ,       | 0       | 1          | 0          | 1      | 0        |           |         | a     | 57 1676 11220 000 66 50 |
| 84    | $R2 \leftarrow F10 - R2 \cdot (Sab1)$<br>$R3 \leftarrow F20 - R3 \cdot (Sab2)$<br>$R13 \leftarrow R9 \times R4 \cdot (Mal1)$<br>$R12 \leftarrow R11 \times R12 \cdot (Mal2)$                                                | 0 | 0 1    |       | 0 0     | 0 0  | 0     |        | -      | 1 -    | 0    |        | 0    |         |      | 0      | 0      | 0     | 0   | 0         | 0        | 0        |        |        | 0      | 0         | 1        | 0      | 0       |         |           | 0       | 0         | 1           | 0         | 0      | 1         |           | 0        |          | ,       | 0       | 0          | 1          |        | 0        | 0         | 0       | 4     | 573601800020020         |
|       | R13+-R13-R12; (Subt)                                                                                                                                                                                                        |   |        |       |         |      |       |        |        | -      |      |        |      | - 0     |      | - 0    |        | 0     |     |           |          | 0        |        |        |        | 0         | 0        |        |         | 0       |           | 9       | 0         | 0           | 0         | 0      |           |           |          |          | 2       | 0       | 0          | 0          |        | 0        | 0         | 0       | - 9   | 537240000100000         |
| 56    |                                                                                                                                                                                                                             | 0 | 2 0    | 0     | 0 0     | 9 0  | 6     | 9 0    | _      |        | 4    | 0 0    | 0    | •       | •    | •      | 0      | 0     | 0   | 0         | 0        | 0        |        | ,      |        | _         | 0        |        | 6       |         |           | 9       |           |             | 0         |        | 9         |           |          |          | ,       |         |            | 0          |        | 0        | 0         |         | 9     | 23.710000000000         |
| S2    | $RS \leftarrow RS \times R12, (Mal1)$<br>$RS \leftarrow RV \times R12, (Mal2)$<br>$RT \leftarrow RS \times R12, (Mal3)$<br>$RS \leftarrow RS \times R12, (Mal3)$                                                            | 0 | 0 0    | 0 0   | 1 1     | 1 1  |       | 0 0    | 0      |        | 0    | 0 0    | 0    | 0       | 0    | 1      | 0      | - 1   | 0   | 0         | 1        | 0        |        | •      | 0      |           | 0        |        | 0       |         |           | 4       | 0         | 0           | 1         | 0      |           | 1         | 0        |          |         | 1       |            | 0          |        | 0        | 0         |         | 0     | 5716/80148002580        |
| 58    | R6+1 x R6;(lev1)<br>R8+1 x R8;(lev1)                                                                                                                                                                                        | 0 | 0 0    | 0 0   | 0 1     | 0 1  | 0     | 0 0    | 0      | 0 1    | 0    | 0 0    | 0    |         |      |        | 0      |       | 0   |           | 0        | 0        | 0 1    |        | 0      | 0         | 0        |        | 0       | 0       | 0         | 0       |           | 0           | 0         | 0      |           |           |          | -        |         | 0       |            |            | 0      | 0        | -         |         | 0     | 5) #2000000000001       |
| 50    | $RS \leftarrow RS \times R2 \text{ (Mal1)}$<br>$RS \leftarrow RS \times R2 \text{ (Mal2)}$<br>$R7 \leftarrow R7 \times R2 \text{ (Mal3)}$<br>$RR \leftarrow RS \times R3 \text{ (Mal4)}$                                    | 0 |        |       | 1 1     | -    | 0     |        | 0      |        |      |        | 0    |         | 0    | -      |        | 1     | 0   |           | 1        | 0        |        |        |        |           |          |        |         |         |           | 0       |           |             |           | 0      | 0         |           |          |          | · [     |         |            |            |        |          |           |         | 4     | 5)*16*000-880000000     |
| \$10  | R5 +- R5 + R6; (AΔ0)<br>R6 +- R7 + R8; (AΔ0)                                                                                                                                                                                | 0 | 0 0    | 0 0   | 1 1     | 0 0  | 0     | 0 0    | 0      | 0 1    | 0    | 0 0    | 0    |         |      |        | - 1    | - 1   | 0   |           | 0        | 0        | 0 1    |        | 0      | 0         | 0        |        | 0       | 0       | 0         | 0       |           | 0           | 0         | 0      |           |           |          | -        |         | 0       |            |            | 0      | 0        |           | 1       | -     | 573/2000/20000000       |
| 811   | 80 RS = R0; (AΔ01)<br>R1 R6 = R1; (AΔ02)                                                                                                                                                                                    | 1 | 1 0    | 0 0   | 0 0     | 0 0  | 0     | 0 0    | 0      | 0      |      | 1 0    | 0    |         | 0    | 0      | 1      | -     | 0   | 0         | 0        | 0        |        | • Г    | 0      | 0         | 0        |        | 0       | 0       | 0         | 0       |           | 0           | 0         | 0      | 0         | 0         | 0        | _        | • Т     | 0       | 0          | 0          | 0      | 0        | 0         | 0       | 0     | 23,9180090C0000000      |

### **Conclusion**

And this is the conclusion of my report.  $P_n$ .

#### References

- [1] LTD, Potential Ventures; INC, SolarFlare Communications. Cocotb. In: *Cocotb website* [online]. [n.d.] [visited on 2023-10-08]. Available from: https://www.cocotb.org/.
- [2] SNYDER, Wilson. Verilator. In: *Verilator website* [online]. [n.d.] [visited on 2023-10-08]. Available from: https://www.veripool.org/verilator/.
- [3] ADVANCED MICRO DEVICES, Inc. Divider Generator LogiCORE™ IP. In: *Intellectual Property* [online]. [n.d.] [visited on 2023-10-01]. Available from: https://www.xilinx.com/products/intellectual-property/divider.html.
- [4] BURKE, Tom. Verilog Fixed point math library. In: *GitHub* [online]. [n.d.] [visited on 2023-10-01]. Available from: https://github.com/freecores/verilog\_fixed\_point\_math\_library.
- [5] MEYER-BÄSE, Uwe. *Digital signal processing with field programmable gate arrays*. 4th ed. Berlin: Springer, 2014. ISBN 978-3-642-45308-3.
- [6] VOLDER, Jack E. The CORDIC Trigonometric Computing Technique. *IRE Transactions on Electronic Computers*. 1959, vol. EC-8, no. 3, pp. 330–334. Available from DOI: 10.1109/TEC.1959.5222693.
- [7] WALTHER, J. S. A Unified Algorithm for Elementary Functions. In: *Proceedings of the May 18-20, 1971, Spring Joint Computer Conference*. Atlantic City, New Jersey: Association for Computing Machinery, 1971, pp. 379–385. AFIPS '71 (Spring). ISBN 9781450379076. Available from DOI: 10.1145/1478786.1478840.
- [8] BURENEVA, Olga I.; KAIDANOVICH, Olga U. FPGA-based Hardware Implementation of Fixed-point Division using Newton-Raphson Method. In: *2023 IV International Conference on Neural Networks and Neurotechnologies (NeuroNT)*. 2023, pp. 45–47. Available from DOI: 10.1109/NeuroNT58640.2023.10175844.
- [9] DIGILENT, Inc. Zybo. In: *Digilent Documentation* [online]. [n.d.] [visited on 2022-11-11]. Available from: https://digilent.com/reference/programmable-logic/zybo/start.
- [10] XILINX, Inc. SoCs with Hardware and Software Programmability. In: *Xilinx Website* [online]. [n.d.] [visited on 2022-11-11]. Available from: https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html.
- [11] XILINX, Inc. Zynq-7000 SoC Technical Reference Manual. In: *Xilinx Documentation* [online]. 02. 04. 2021 [visited on 2022-11-11]. Available from: https://docs.xilinx.com/v/u/en-US/ug585-Zynq-7000-TRM.
- [12] DIGILENT, Inc. Zybo Reference Manual. In: *Digilent Documentation* [online]. [n.d.] [visited on 2022-11-11]. Available from: https://digilent.com/reference/programmable-logic/zybo/reference-manual.
- [13] XILINX, Inc. Vivado Design Suite User Guide: Release Notes, Installation, and Licensing (UG973). In: *AMD Xilinx Documentation Portal* [online]. [n.d.] [visited on 2022-11-18]. Available from: https://docs.xilinx.com/r/en-US/ug973-vivado-release-notes-install-license/.
- [14] XILINX, Inc. PetaLinux Tools Documentation: Reference Guide (UG1144). In: *AMD Xilinx Documentation Portal* [online]. [n.d.] [visited on 2022-11-18]. Available from: https://docs.xilinx.com/r/en-US/ug1144-petalinux-tools-reference-guide.

- [15] XILINX, Inc. Downloads. In: *AMD Xilinx PetaLinux Tools* [online]. [n.d.] [visited on 2022-11-19]. Available from: https://www.xilinx.com/products/design-tools/embedded-software/petalinux-sdk.html.
- [16] XILINX, Inc. Zynq-7000 SoC Technical Reference Manual (UG585). In: *Xilinx Documentation Portal* [online]. 02. 04. 2021 [visited on 2023-02-28]. Available from: https://docs.xilinx.com/v/u/en-US/ug585-Zynq-7000-TRM.
- [17] XILINX, Inc. Kria KR260 Robotics Starter Kit. In: *Xilinx Website* [online]. [n.d.] [visited on 2023-03-10]. Available from: https://www.xilinx.com/products/som/kria/kr260-robotics-starter-kit.html.
- [18] XILINX, Inc. Kria SOM Carrier Card Design Guide (UG1091). In: *AMD Xilinx Documentation Portal* [online]. 27. 07. 2022 [visited on 2023-03-18]. Available from: https://docs.xilinx.com/r/en-US/ug1091-carrier-card-design/Introduction.
- [19] XILINX, Inc. Kria K26 SOM Data Sheet (DS987). In: *AMD Xilinx Documentation Portal* [online]. 26. 07. 2022 [visited on 2023-03-18]. Available from: https://docs.xilinx.com/r/en-US/ds987-k26-som.
- [20] XILINX, Inc. Kria KR260 Robotics Starter Kit User Guide (UG1092). In: *AMD Xilinx Documentation Portal* [online]. 17. 05. 2022 [visited on 2023-04-05]. Available from: https://docs.xilinx.com/r/en-US/ug1092-kr260-starter-kit/Interfaces.
- [21] XILINX, Inc. XTP743 Kria KR260 Starter Kit Carrier Card Schematics (v1.0). In: *AMD Xilinx Board Files* [online]. 09. 06. 2022 [visited on 2023-04-06]. Available from: https://www.xilinx.com/member/forms/download/design-license.html?cid=bad0ada6-9a32-427e-a793-c68fed567427&filename=xtp743-kr260-schematic.zip.
- [22] XILINX, Inc. XTP685 Kria K26 SOM XDC File (v1.0). In: *AMD Xilinx Board Files* [online]. 14. 05. 2021 [visited on 2023-04-06]. Available from: https://www.xilinx.com/member/forms/download/design-license.html?cid=29e0261a-9532-4a47-bb06-38c83bbbb8c0&filename=xtp685-kria-k26-som-xdc.zip.
- [23] LINUX FOUNDATION. Real-Time Linux. In: *Linux Foundation DokuWiki* [online]. [n.d.] [visited on 2023-04-06]. Available from: https://wiki.linuxfoundation.org/realtime/start.
- [24] XILINX, Inc. Zynq UltraScale+ MPSoC Processing System Product Guide (PG201). In: *Xilinx Documentation* [online]. 11. 05. 2021 [visited on 2023-04-13]. Available from: https://docs.xilinx.com/r/en-US/pg201-zynq-ultrascale-plus-processing-system/Fabric-Reset-Enable.
- [25] ZAKOPAL, Petr et al. [Kria SOM KR260 Starter Kit] Schematic (pdf) vs constrains (xdc) pin confusion. Possible explanation on fan pinout. In: *Xilinx Support Community Forum*. 18. 03. 2023. Available also from: https://support.xilinx.com/s/question/0D54U00006alUwcSAE/kria-som-kr260-starter-kit-schematic-pdf-vs-constrains-xdc-pin-confusion-possible-explanation-on-fan-pinout?language=en\_US.

### **Appendix A:** List of symbols and abbreviations

### A.1 List of abbreviations

**CORDIC** Coordinate Rotation Digital Computer

**CPU** Central Processing Unit

**FOSS** Free and Open-Source Software

**FPGA** Field Programmable Gate Array

**FSM** Finite State Machine

**IP** Intellectual Property

**ISA** Instruction Set Architecture

**LUT** Look-Up Table

**NR** Newton-Raphson

**RTL** Register Transfer Level

### A.2 List of symbols

$P_n$ (W) rated power of the machine