

# CZECH TECHNICAL UNIVERSITY IN PRAGUE

Faculty of Electrical Engineering
Department of Electric Drives and Traction

Low Abstraction Real-Time FPGA Implementation of Selective Harmonic Elimination Algorithm for Voltage Source Inverters Designed Using State of The Art Free and Open Source Software

Technical report

# TABLE OF CONTENTS

| 1     | Introduction                                                           | 1  |
|-------|------------------------------------------------------------------------|----|
| 2     | Notes on all of the circuit designs in Verilog                         | 2  |
| 3     | Calculating the division of fixed point numbers                        | 3  |
| 3.1   | Newton Rapshon algorithm for calculating the division                  | 3  |
| 3.2   | IP Block Design                                                        | 4  |
| 3.2.1 | Top module design                                                      | 4  |
| 3.2.2 | Allocation and Timing                                                  | 5  |
| 3.2.3 | Data Path Module                                                       | 6  |
| 3.2.4 | Control Unit                                                           | 7  |
| 3.3   | Calculating number of bits to shift the denominator                    | 8  |
| 3.4   | Simulation results                                                     | 8  |
| 4     | Using CORDIC to calculate trigonometric functions                      | 13 |
| 4.1   | Theory                                                                 | 13 |
| 4.1.1 | Example of calculation                                                 | 15 |
| 4.2   | Python Implementation                                                  | 15 |
| 4.3   | IP Block Design                                                        | 18 |
| 4.3.1 | Top module design                                                      | 18 |
| 4.3.2 | Allocation and Timing                                                  | 19 |
| 4.3.3 | Data Path Module                                                       | 20 |
| 4.3.4 | Control Unit                                                           | 23 |
| 4.4   | Simulation results                                                     | 23 |
| 5     | Simple set of nonlinear equations solved by a Newton-Raphson algorithm |    |
|       | using custom circuit implementation                                    | 27 |
| 5.1   | Theory                                                                 | 27 |
| 5.2   | IP Block Design                                                        | 28 |
| 5.2.1 | Top module design                                                      | 28 |
| 5.2.2 | Allocation and Timing                                                  | 29 |
| 5.2.3 | Data Path Unit                                                         | 30 |
| 5.2.4 | Control Unit                                                           | 32 |
| 5.3   | Simulation results                                                     | 32 |
| 6     | Selective Harmonic Elimination                                         | 34 |
| 6.1   | Theory                                                                 | 34 |
| 6.2   | Simplification for Verilog and High level implementation               | 36 |
| 6.3   | High level implementation                                              | 37 |
| 6.4   | IP Block Design                                                        | 39 |
| 6.4.1 | Algorithm Block Diagram                                                | 39 |
| 6.4.2 | Top module design                                                      | 41 |

| 6.4.3 | Allocation and Timing                                       | 41 |
|-------|-------------------------------------------------------------|----|
| 6.4.4 | Data Path Unit                                              | 43 |
| 6.4.5 | Control Unit                                                | 43 |
| 6.4.6 | Inverter output voltage analysis for Verilog implementation | 44 |
| 6.5   | Simulation results                                          | 45 |
|       | Conclusion                                                  | 48 |
|       | References                                                  | 50 |
| Appen | ix A List of and Abbreviations                              | 51 |
| A.1   | List of abbreviations.                                      | 51 |

# **LIST OF FIGURES**

| 3 - 1 | Top module design for the division unit module block design.                                       | 5  |
|-------|----------------------------------------------------------------------------------------------------|----|
| 3 - 2 | Alloccation and timing diagram for the Data Path Unit part of the division module                  | 6  |
| 3 - 3 | Register Transfer Level (RTL) scheme of the Data Path Unit part of the division module.            | 7  |
| 3 - 4 | Selected signals from simulation of division $N/D = 10 / 7$ . The correct result in $R0$ is        |    |
|       | obtained after two iterations (reg numberOfIterations).                                            | 10 |
| 3 - 5 | Selected signals from simulation of division N/D = $1 / 0.25$ . The correct result in $R0$ is      |    |
|       | obtained after five iterations (reg numberOfIterations).                                           | 10 |
| 3 - 6 | Selected signals from simulation of division N/D = $1 / (-0.25)$ . The correct result in $R\theta$ |    |
|       | is obtained after five iterations (reg numberOfIterations).                                        | 11 |
| 3 - 7 | Selected signals from simulation of division $N/D = 304.03215 / (-0.25)$ . The correct             |    |
|       | result in R0 is obtained after five iterations (reg numberOfIterations).                           | 11 |
| 3 - 8 | Selected signals from simulation of division N/D = $10 / (519)$ . The correct result in $R0$       |    |
|       | is obtained after two iterations (reg numberOfIterations).                                         | 12 |
| 4 - 1 | Top module design for the CORDIC module block design.                                              | 19 |
| 4 - 2 | Alloccation and timing diagram for the Data Path Unit part of the CORDIC IP                        | 20 |
| 4 - 3 | Register transfer level (RTL) scheme of the CORDIC IP Data Path Unit IP                            | 21 |
| 4 - 4 | The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine               |    |
|       | values of angle $\theta=-1.2479$ rad. The value of sine and cosine based on the current            |    |
|       | iteration is also calculated in this algorithm approach. The result is passed to the registers     |    |
|       | R9 and R10.                                                                                        | 24 |
| 4 - 5 | The detail of the last iteration of the Verilog simulation of CORDIC algorithm for deter-          |    |
|       | mining the sine and cosine values of angle $\theta = -1.2479$ rad. The result is passed to the     |    |
|       | registers R9 and R10.                                                                              | 25 |
| 4 - 6 | The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine               |    |
|       | values of angle $\theta = 10.7195129$ rad. The value of sinus and cosinus based on the current     |    |
|       | iteration is also calculated in this algorithm approach. The result is passed to the registers     |    |
|       | R9 and R10.                                                                                        | 25 |
| 4 - 7 | The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine               |    |
|       | values of angle $\theta = -6.7195129$ rad. The value of sinus and cosinus based on the current     |    |
|       | iteration is also calculated in this algorithm approach. The result is passed to the registers     |    |
|       | R9 and R10.                                                                                        | 26 |
| 5 - 1 |                                                                                                    | 29 |
| 5 - 2 | Allocation and timing diagram for the Data Path Unit part of the simple (NR) module                | 30 |
| 5 - 3 | Register Transfer Level (RTL) scheme of the Data Path Unit part of the simple Newton-              |    |
|       | Raphson (NR) calculation IP.                                                                       | 31 |
| 5 - 4 | The whole Verilog simulation of a simple Newton-Raphson (NR) algorithm. The result                 |    |
|       | may be seen in registers R1 and R2 after the fifth iteration of the algorithm                      | 33 |
| 6 - 1 |                                                                                                    | 34 |
| 6 - 2 | Block Diagram of the Selective Harmonic Elimination (SHE) using Newton-Raphson                     | _  |
|       | algorithm. Design suitable for hardware implementation.                                            | 40 |
| 6 - 3 | Top module design for the Selective Harmonic Elimination (SHE).                                    | 41 |

| 6 - 4 | Allocation and Timing diagram for the Data Path Unit part of Selective Harmonic Elim- |    |
|-------|---------------------------------------------------------------------------------------|----|
|       | ination (SHE) module.                                                                 | 42 |
| 6 - 5 | Register transfer level (RTL) scheme of the Selective Harmonic Elimination Data Path  |    |
|       | Unit                                                                                  | 43 |
| 6 - 6 |                                                                                       | 44 |
| 6 - 7 |                                                                                       | 44 |
| 6 - 8 | The ending part of a Verilog simulation of Selective Harmonic Elimination (SHE) algo- |    |
|       | rithm. The result are in registers R0 and R1.                                         | 46 |
| 6 - 9 | The whole Verilog simulation of Selective Harmonic Elimination (SHE) algorithm. The   |    |
|       | result are in registers R0 and R1.                                                    | 47 |

# LIST OF TABLES

| 3 - 1 | Control signal encoding table for instructions to be processed by the Division Module    | 7  |
|-------|------------------------------------------------------------------------------------------|----|
| 4 - 1 | Control signal encoding table for instructions to be processed by the CORDIC Module      | 23 |
| 5 - 1 | Control signal encoding table for instructions to be processed by the simple Newton-     |    |
|       | Raphson (NR) alogrithm solve Module.                                                     | 32 |
| 6 - 1 | Control signal encoding table for instructions to be processed by the Selective Harmonic |    |
|       | Elimination (SHE) alogrithm solve Module                                                 | 44 |

#### 1 Introduction

This paper presents the design of multiple FPGA units, which are designed to suit near real-time constraints of controlling the electric drives or for Hardware In Loop systems.

The goal of this paper also was to investigate how to design the speed optimized units using open source toolchain. The final designed unit is capable of solving the Selective Harmonic Elimination (SHE) algorithm. Many researches opt for proprietary design software, which very often offers premade Intelectual Property (IP) blocks, which can be used to design the specified circuit. However in this paper the design was created, tested and analyzed solely using the State of The Art Open Source software without any IP catalogs. This platformless solution ensures, that the designed units may possibly be synthetized for various FPGA chips without any major barriers.

The structure of the paper is as follows: Section 3 presents a unit for division of two arbitrary values by utilizing the Newton-Raphson (NR) algorithm. Section 4 presents design of the Coordinate Rotation Digital Computer (SHE) optimized for speed, rather than lesser complexity. Section 5 introduces design which solves two non-linear equations with a Newton-Raphson (NR) algorithm, presenting suitability of FPGA designs for iterative algorithms. Section 6 presents unit for solving the Selective Harmonic Elimination problem using previously developed modules.

# 2 Notes on all of the circuit designs in Verilog

All of the designs presented in this paper are created using pure Verilog code and tested through Free and Open-Source Software (FOSS). The decision to opt for FOSS was deliberate, aiming to prevent any vendor-locking to specific hardware or predefined IPs. Predefined IPs are often optimized by a specific hardware vendor and intended for use with that vendor's hardware. However, the hardware may not always be available or suitable for a specific application. Academics and numerous companies opt for open-source and open-hardware approaches to prevent vendor lock-in. Once the design and algorithm are thoroughly understood, they can be initially implemented without any specific platform in mind. Later, when selecting the device vendor, the design can be modified to suit the specific hardware requirements.

That is why Verilog, with Cocotb [1] (Test Bench creation tool) and Verilator [2] (simulator) have been used for designing the circuits presented in this paper.

# 3 Calculating the division of fixed point numbers

Typically, when employing numerical methods to solve transcendental equations, the calculation of the division of two input numbers becomes necessary. This requirement persists even when applying the Newton-Raphson (NR) method to solve a set of two equations, because computing the reciprocal value of the Jacobian determinant.

There are some IP blocks available, which are capable of calculating the division of two numbers, but the blocks are usually either vendor specific intellectual property IP [3] or feature low performance [4].

The drawback of vendor-specific IPs lies in their limited compatibility, often preventing their use with FPGA chips from different vendors. On the other hand the vendor specific IPs are usually optimized and able to use the specific type of resources available at the vendor's chip which resolve in better performance.

To preserve the compatibility of the design with chips from multiple vendors, the custom solution for division design based on the very known Newton Raphson (NR) algorithm was developed. [4]

### 3.1 Newton Rapshon algorithm for calculating the division

General Newton Raphson (NR) algorithm is a well known approach to numerically solve equations. It is the reason why it is utilized in many algorithms. However, the negative aspect of NR is that it's convergency strongly depends on initial values of variables. When the initial values are chosen poorly, the performed number of iterations before the convergency is reached can be high.

To reach the fastest convergency possible (determined in number of iterations) apart from the scaling the dominator into the interval [0.5,1] the initial value calculation formula should be utilized. [4]

The Equation 3 - 1 for calculating the initial value is applied after the scaling of denominator is performed. The algorithm developed for the appropriate scaling is explained in the *Calculating number of bits to shift the denominator*.

$$x_0 = \frac{48}{17} - \frac{32}{17}D,\tag{3-1}$$

where the  $x_0$  is the initial value for NR algorithm, D is the denominator value for calculating the expression N/D.

Because in the module design implemented via Verilog the fixed point number format Q32.15 is used, the fractional numbers from Equation 3 - 1 are rounded to

2.8229 (32'sb00000000000000010\_110100101011000 in binary)

and 1.8819 (32'sb000000000000000001 111000011100101 in binary) respectively.

After the initial value  $x_0$  is calculated, the NR algorithm is performed. The idea of using NR algorithm to calculate the division of N/D is to trade the division for a multiplication which can be synthetized in the FPGA fabric. When employing the NR algorithm for finding the values of N/D the function with root is 1/D is essential. After the root of the function is found, it is then multiplied by the numerator value, and te solution N/D is obtained. There may be many functions, which root is the searched value 1/D but the most trivial is Equation 3 - 2.

$$F(x_i) = \frac{1}{x_i} - D. {(3-2)}$$

For the derivative at the point of  $x_i$  then applies Equation 3 - 3.

$$\frac{\mathrm{d}F(x_i)}{\mathrm{d}x} = F'(x_i) = \frac{F(x_{i+1}) - F(x_i)}{x_{i+1} - x_i}.$$
 (3 - 3)

Because finding root of the equation 3 - 2, the value of  $F(x_{i+1})$  is set to be zero. After separating the  $x_{i+1}$  value of the eq. 3 - 3 and derivating the function  $F(x_i)$  the obtained algorithm for a value  $x_{i+1}$  is obtained from eq. 3 - 4.

$$x_{i+1} = -\frac{F(x_i)}{F'(x_i)} + x_i = -\frac{F(x_i)}{-\frac{1}{x_i^2}} + x_i = (\frac{1}{x_i} - D)x_i^2 + x_i = x_i - Dx_i^2 + x_i = 2x_i - Dx_i^2.$$
 (3 - 4)

Usually, the iterative algorithm is stopped, when the value  $F(x_{i+1}) - F(x_i)$  (called defect) reaches certain value set by the stop condition. However, in this algorithm, the stop condition is not yet implemented. Based on the observation carried on the N-R algorithm the obtained result is sufficient after 5 iterations.

The mathematically expressed algorithm is then transformed into programmable algorithm suitable for FPGA implementation. The top module design for this algorithm is presented in the section *Top module design*, the control and data unit for calculating the value  $x_{i+1}$  is presented in the *Allocation and Timing* 

# 3.2 IP Block Design

The design of this unit is consists of 4 main modules:

- the data unit module, used for manipulating data and making calculation operations,
- the control unit module, used for controlling the data unit module and scaling unit module,
- **scaling unit module**, used for calculating the number of bits needed for shifting the denominator value to the interval [0.5,1].

#### 3.2.1 Top module design

The top module wraps all of the presented modules (**data unit module**, **control unit module**, **scaling unit module**). The basic structure of connected modules of this top design is depicted in the Figure 3 - 1. Thanks to this wrapper it is possible to test the created modules with Verilog Testbench, Verilator [2] or Cocotb [1].



Figure 3 - 1 Top module design for the division unit module block design.

### 3.2.2 Allocation and Timing

The diagram of the data flow and timing of the algorithm is displayed in the Figure 3 - 2.

The whole algorithm comprises nine steps. The initial four steps are used for calculating the initial value of  $x_0$  as presented in the Equation 3 - 1. The steps S4 to S8 are for calculating the next search value of  $x_{i+1}$ , thus the root of the Equation 3 - 2 which in fact is the searched value of 1/D. The following iteration begins at the step labeled as S5. The iterative process continues until a predefined stop condition is satisfied, such as reaching a specified number of iterations.



Figure 3 - 2 Alloccation and timing diagram for the Data Path Unit part of the division module.

#### 3.2.3 Data Path Module

The structure of the Data Path Module is depicted in the Figure 3 - 3. The module was specifically designed to serve the needs of the division algorithm. It comprises five registers labeled R0 through R4, two multipliers M1, M2 and one bit shifter.

The module is controlled by the control unit using the control signal labeled as CS. The encoding table with the labels corresponding to the Data Path Unit module is presented in the section *Control Unit*.

The result of each iteration from the division algorithm is passed to a register R0.

The Data Path Module unit also covers the possibility of using negative denominator and numerator. Because the values are stored in a custom Q32.15 fixed point format (whole number comprises of 32 bits, 15 bits fractional part, 17 bits integer part), the algorithm checks if the D or N values are higher than 0h8000 and determine it's actual sign and the sets sign of the result. If the analyzed number is determined negative, it is transformed to value positive and then used in the presented division algorithm. This transformation is needed because of the algorithm calculating the bits to shift the denominator in the interval.



Figure 3 - 3 Register Transfer Level (RTL) scheme of the Data Path Unit part of the division module.

#### 3.2.4 **Control Unit**

Signals from Control Unit to Data Path Module are encoded in the CS signal. Table 3 - 1 displays the CS signal along with the corresponding instructions for steps S0–S8 of the FSM. To enhance code clarity the signal is passed to the Control Unit in the hexadecimal format.

The number of the iteration of the Finite State Machine (FSM) is also set in the Control Unit. This iteration number is subsequently used in the module to check for the stop condition of the calculation loop.

As stated in the Allocation and Timing section, after the step S8, the FSM restarts at the state S4 with new  $x_i$  values as inputs. This state change is not depicted in the Table 3 - 1 for CS signal.

Table 3 - 1 Control signal encoding table for instructions to be processed by the Division Module.

| State | RTL Code                                                                | 14  | 13  | 12  | 11  | 10  | 9     | 8        | 7        | 6     | 5     | 4      | 3        | 2        | 1     | 0     | CS       |  |
|-------|-------------------------------------------------------------------------|-----|-----|-----|-----|-----|-------|----------|----------|-------|-------|--------|----------|----------|-------|-------|----------|--|
| State | KIL Code                                                                | ld0 | ld1 | ld2 | ld3 | ld4 | SelR1 | SelR2[1] | SelR2[0] | SelR3 | SelR4 | SelSh1 | SelM1[1] | SelM1[0] | SelM2 | SelS1 | CS       |  |
| S0    | $R1 \leftarrow D$ ;                                                     | 0   | 1   | 0   | 0   | 0   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 0        | 0     | 0     | 2000h    |  |
| S1    | $R1 \leftarrow R1 \ll 32$ ; (Sh1)                                       | 0   | 1   | 0   | 0   | 0   | 1     | 0        | 0        | 0     | 0     | 1      | 0        | 0        | 0     | 0     | 15'h2210 |  |
| S2    | $R2 \leftarrow 1.882 \times R1; (M1)$ $R3 \leftarrow N;$                | 0   | 0   | 1   | 1   | 0   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 1        | 0     | 0     | 15'h1804 |  |
| S3    | $R2 \leftarrow 2.82 - R2; (Sub1)$<br>$R3 \leftarrow R3 \ll 32; (Sh1)$   | 0   | 0   | 1   | 1   | 0   | 0     | 0        | 1        | 1     | 0     | 0      | 0        | 0        | 0     | 0     | 15'h18C0 |  |
| S4    | $R4 \leftarrow R2 \times R1; (M1)$                                      | 0   | 0   | 0   | 0   | 1   | 0     | 0        | 0        | 0     | 1     | 0      | 0        | 0        | 0     | 0     | 420h     |  |
| S5    | $R4 \leftarrow R2 \times R4; (M1)$<br>$R2 \leftarrow 2 \times R2; (M2)$ | 0   | 0   | 1   | 0   | 1   | 0     | 1        | 0        | 0     | 1     | 0      | 1        | 0        | 0     | 0     | 15'h1528 |  |
| S6    | $R2 \leftarrow R2 - R4$ ; (S1)                                          | 0   | 0   | 1   | 0   | 0   | 0     | 0        | 1        | 0     | 0     | 0      | 0        | 0        | 0     | 1     | 15'h1081 |  |
| S7    | $R4 \leftarrow R2 \times R3; (M2)$                                      | 0   | 0   | 0   | 0   | 1   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 0        | 1     | 0     | 15'h402  |  |
| S8    | $R0 \leftarrow R4$ ;                                                    | 1   | 0   | 0   | 0   | 0   | 0     | 0        | 0        | 0     | 0     | 0      | 0        | 0        | 0     | 0     | 4000h    |  |

# 3.3 Calculating number of bits to shift the denominator

As presented in the section *Newton Rapshon algorithm for calculating the division* the denominator must be appropriately scaled for the division algorithm to work. This section presents algorithm for scaling the denominator specified in the fixed point number format *Q32.15*. After the scaling value is successfully determined, the numerator is scaled accordingly.

The presented algorithm shifts the value of denominator at every positive edge of the clock signal and saves the shifted value in the compare register. Then the combinational circuit is utilized to compare the shifted value in compare register with the number 1 specified in Q32.15 format. If the compared value is the same or lower than 1 the shifting algorithm is done and the value scaleToShift is successfully found. If not, the inner value of shifting bits is incremented and the algorithm proceeds to the next iteration.

The presented algorithm is realized in the *denominatorSizeScaleUnit* module and it's pseudocode is depicted in the code 3 - 1.

```
at every negative edge of clock or positive edge of reset
     if(rst)
         scaleToShift = 0;
         scaleToShiftInternal = 1;
         started = 0;
     end if
     else if (start)
         started = 1;
     end else if
 at every positive edge of clock
     12
         done = 1;
         started = 0;
14
         scaleToShift = scaleToShiftInternal;
     end if
     else
17
         done = 0;
18
         scaleToShiftInternal = scaleToShiftInternal + 1;
19
     end if
20
21
 at every positive edge of clock
     if(start)
23
     compare <= DInternal >> scaleToShiftInternal;
24
     end if
```

Code 3 - 1 Pseudocode for the denominatorSizeScaleUnit module algorithm.

#### 3.4 Simulation results

The simulation via Verilog testbench was made to determine the correctness of presented division module. The Icarus Verilog simulator was used to simulate the module and GTKWave was used to display the VCD simulation output file.

The simulation output confirms that the module operates correctly for positive and negative numbers

in the fixed-point format Q32.15. The algorithm used in this module can compute the correct result in significantly fewer clock cycles compared to the full division algorithm utilized in the division module within the package [4]. As a result, the module can be freely used as a submodule in more complex modules.

VCD simulation output waveforms are depicted on the following Figures. The simulations were conducted for arbitrarily selected values of N and D, with clock frequency set to 250 MHz. Pseudocode Verilog snippet for the test bench is provided in the Listing 3 - 2. In the test bench, one unit of time corresponds to 1 ns. (based on the set timescale settings) The division unit algorithm starts at the next positive edge of clock signal after successful determination of the value bitsToShift when the start signal is set on low.

```
timescale 1ns/1ns
     #10; // wait for 10 units of time
     #0 rstScale = 1; startScale = 0; // reset unit for determining the
    number of bits to shift in the denominator and do not start the unit yet
     N = 32'b0000000100110000_00001000000000; D=32'
    304.03125, denominator to D = -0.25
     #10 rstScale = 0; // wait for 10 units of time and stop the reset of
    scaling unit
     #10 startScale = 1; // start the algorithm for scaling unit
     #20 rst = 1; start = 0; // reset the division unit
     #30 rst = 0; // stop reseting of the division unit
     #20 start = 1; // start the division unit
     #20 start = 0;
10
     #1000; // wait 1000 units of time
     $finish; // finish the simulation
```

Code 3 - 2 Pseudocode snippet for the Verilog simulation test bench.



Figure 3 - 4 Selected signals from simulation of division N/D = 10 / 7. The correct result in R0 is obtained after two iterations (reg number Of Iterations).



Figure 3 - 5 Selected signals from simulation of division N/D = 1 / 0.25. The correct result in R0 is obtained after five iterations (reg number Of Iterations).



Figure 3 - 6 Selected signals from simulation of division N/D = 1 / (-0.25). The correct result in R0 is obtained after five iterations (reg number Of Iterations).



Figure 3 - 7 Selected signals from simulation of division N/D = 304.03215 / (-0.25). The correct result in R0 is obtained after five iterations (reg number Of Iterations).



Figure 3 - 8 Selected signals from simulation of division N/D = 10 / (519). The correct result in R0 is obtained after two iterations (reg numberOfIterations).

# 4 Using CORDIC to calculate trigonometric functions

There are numerous methods calculating trigonometric functions. To enhance flexibility, the Coordinate Rotation Digital Computer (CORDIC) was selected over the Look-Up Table (LUT) implementation.

While the LUT method may be fast, its accuracy depends on the size of the table. In contrast, when using the CORDIC the precision depends on number of performed iterations of the algorithm. The modified algorithm is versatile and may be used to calculate non-trivial functions, including hyperbolic functions, square roots, multiplications, divisions, exponentials and logarithms. [5] In this work only the calculation of *sinus* and *cosinus* functions is used.

# 4.1 Theory

The theory of the first CORDIC was introduced by Volder in [6]. This algorithm computes a coordinate conversion between rectangular (x, y) and polar  $(R, \theta)$  coordinates. The algorithm was then extended by Walther in [7] to include circular, linear and hyperbolic transforms. In this paper, only circular transforms are employed to calculate sine and cosine functions. The presentation will focus on the fundamental aspects of the algorithm.

The rotation of a vector in the rectangular coordinate system (x, y) may be described by matrix-vector multiplication depicted in the Equation 4 - 1.

$$\begin{pmatrix} x_{\rm R} \\ y_{\rm R} \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x_{\rm in} \\ y_{\rm in} \end{pmatrix},$$
 (4 - 1)

where  $x_R$  and  $y_R$  are coordinates of a rotated vector,  $\theta$  is the angle for which the vector with coordinates  $x_{in}$  and  $y_{in}$  was rotated.

Then when simplifying the Equation 4 - 1

$$\begin{pmatrix} x_{\rm R} \\ y_{\rm R} \end{pmatrix} = \cos(\theta) \begin{pmatrix} 1 & -\tan(\theta) \\ \tan(\theta) & 1 \end{pmatrix} \begin{pmatrix} x_{\rm in} \\ y_{\rm in} \end{pmatrix}$$
 (4 - 2)

it can be seen, that only multiplication by scaling factor of precalculated values of  $\cos(\theta)$ , multiplication by  $\tan(\theta)$ , subtraction and addiction operations are needed to perform the rotation. However, the multiplication by  $\tan(\theta)$  can be replaced. The replacement may be done for angles  $\theta$  for which the equation 4 - 3 is true. When implementing the algorithm to the FPGA the multiplication may be swapped for signed right bit shift, which is faster operation than multiplication.

$$\tan(\theta) = 2^{-1}. (4-3)$$

When the values  $x_{in} = 1$  and  $y_{in} = 0$  are used, the result for *sine* and *cosine* may be easily obtained from  $x_R$  and  $y_R$  as expressed in the Equation 4 - 4.

$$x_{R} = x_{\text{in}}\cos(\theta) - y_{\text{in}}\sin(\theta) = |\theta = 0| = \cos(\theta),$$
  

$$y_{R} = x_{\text{in}}\sin(\theta) + y_{\text{in}}\cos(\theta) = |\theta = 0| = \sin(\theta).$$
(4 - 4)

The algorithm can be further simplified by assuming that it is designed to undergo more than 6 iterations and thus the scaling constant, represented by multipliying cosine of different  $\theta$  values, converges to 0, 60725. If this condition is true, there is no necessity to precalculate all the scaling values and only the convergenent value may be used for the multiplication. In this paper the precalculated values are passed

from the custom LUT module to the main algorithm.

As evident from the *Example of calculation* section or the algorithm theory itself, it is essential to estabilish whether the angle for which the vector is rotated in the next iteration should be in a positive direction (counter-clockwise) or negative direction (clockwise). To address this, the set of the equations is expanded, and new variable  $z_i$  is introduced. The complete set of equations utilized in the implementation is as follows.

$$x[i+1] = x[i] - \sigma_i 2^{-i} y[i],$$

$$y[i+1] = y[i] + \sigma_i 2^{-i} x[i],$$

$$z[i+1] = z[i] - \sigma_i \arctan(2^{-i}).$$
(4-5)

The  $\sigma_{i+1}$  is determined based on the sign of the  $z_{i+1}$  variable

$$\sigma_{i+1} = \left\{ \begin{array}{l} -1, \text{ if } z_{i+1} < 0\\ 1, \text{ if } z_{i+1} > 0\\ 0, \text{ if } z_{i+1} = 0 \end{array} \right\}$$

$$(4-6)$$

The algorithm, as presented, accurately computes values for sine and cosine functions only in the first and fourth quadrants  $(3\pi/2 \text{ to } \pi/2 \text{ counter-clockwise})$ . To expand its applicability across the entire  $2\pi$  range, specific actions must be taken before the actual looped aglorithm.

The algorithm must determine the quadrant, where the desired angle  $\theta$  for which the sine and cosine functions are to be calculated is. This determination is made through if statements during the initialization of the algorithm values and at the final value calculation. If the reference angle  $\theta$  falls outside the first or fourth quadrant, then the angle is rotated from its original quadrant to either the first or fourth quadrant. Depending on the quadrant, to which the angle is rotated, the  $\sigma_i$  value is set accordingly. The corresponding if statements during the algorithm initialization are provided in Pseudocode 4 - 1. Similar statements used at the final values calculation are presented in Pseudocode 4 - 2.

The pseudocodes use initialZValue as a reference angle  $\theta$ , for which to calculate the sine and cosine function values, zValue as a temporary value for calculating the iterations for  $z_i$  variables, sigmaValue for temporary value holding the current iteration value of  $\sigma_i$ , the resultCos and resultSin variables are used for storing the temporary and final values of the  $cos(\theta)$  and  $sin(\theta)$  values respectively.

```
if((initialZValue > 1.5707)&(initialZValue < 3.141592))
    sigmaValue = -1
    zValue = initialZValue - 3.141592

else if((initialZValue > 3.141592)&(initialZValue < 4.7123))
    sigmaValue = 1
    zValue = initialZValue - 3.141592

else
    zValue = initialZValue
    sigmaValue = 1
end</pre>
```

Code 4 - 1 Pseudocode for if statements used at the value initialization of the CORDIC algorithm.

```
if((initialZValue > 1.5707)&(initialZValue < 3.141592))
```

```
resultCos = - resultCos
resultSin = resultSin

less if((initialZValue > 3.141592)&(initialZValue < 4.7123))
resultCos = - resultCos
resultSin = - resultSin

end</pre>
```

Code 4 - 2 Pseudocode for if statements used at the final sinus and cosinus value calculation.

#### 4.1.1 Example of calculation

The CORDIC algorithm's general approach can be illustrated by calculating the sine and cosine values for the reference angle  $\theta=57,535$ °. Initially, the angle is deconstructed into its base angles, satisfying the Equation 4 - 3. In this example the deconstruction is 57,535=45+25,565-14,03.

The index i of the variables  $x_i$  and  $y_i$  in the following equations means the number of iteration of the algorithm.

0. iteration 
$$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \cos(45^\circ) \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_{\text{in}} \\ y_{\text{in}} \end{pmatrix}$$
, (4 - 7)

1. iteration 
$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \cos(26, 565 \, ^{\circ}) \begin{pmatrix} 1 & -2^{-1} \\ 2^{-1} & 1 \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$$
, (4 - 8)

2. iteration 
$$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \cos(-14, 03^{\circ}) \begin{pmatrix} 1 & -2^{-2} \\ 2^{-2} & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$$
. (4 - 9)

Then values  $x_2$  and  $y_2$  may be obtained.

$$\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \cos(45 \, ^{\circ}) \cos(25, 565 \, ^{\circ}) \cos(-14, 03 \, ^{\circ}) \begin{pmatrix} 1 & -2^{-2} \\ 2^{-2} & 1 \end{pmatrix} \begin{pmatrix} 1 & -2^{-1} \\ 2^{-1} & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_{\rm in} \\ y_{\rm in} \end{pmatrix}.$$
 (4 - 10)

The values  $x_2$  and  $y_2$  in the Equation 4 - 10 correspond to  $\cos(57, 535\,^\circ)$  and  $\sin(57, 535\,^\circ)$  respectively.

# 4.2 Python Implementation

For simplicity, the CORDIC algorithm was prototyped in Python. This proved highly beneficial, as the debugging of the Python code is much more straightforward compared to debugging Verilog design without prepared and debugged algoritm in a higher level language.

The Python code was used to precalculate the LUT for scaling factor and arcus tangens values for  $z_i$  calculations.

For clarity, the Python implementation is provided in Code 4 - 3. The presented Code also calculates the error between the CORDIC-calculated value and the Python math library functions.

```
import math

propert math
```

```
5 atanValues = []
6 scalingValues = [1]
7 initialXValueCordic = 1
8 initialYValueCordic = 0
9 # initialZValueCordic = 1.248 # angle for which to calculate cordic
10 # initialZValueCordic = - 1.248 # angle for which to calculate cordic
# initialZValueCordic = - 6.7194 # angle for which to calculate cordic
12 initialZValueCordic = 10.7194824 # angle for which to calculate cordic
initialSigmaValueCordic = 1
15 for x in range(totalNumberOfIterations):
     # Generating arcus tanges values of precalculated angles based on
    number of iterations
     atanValues.append(math.atan(1*2**(-x)))
     # Generating precalculated scaling values based on a number of
    iterations
     scalingValues.append(scalingValues[x]*math.cos(atanValues[x]))
21 print("atanValues: ", atanValues)
22 print("scalingValues: ", scalingValues)
25 print("\n")
26 print("initialZValue original: ", initialZValueCordic)
28 # Moving angle to interval [0,2Pi]
if initialZValueCordic > 0:
     while initialZValueCordic > (2*3.141592):
         initialZValueCordic = initialZValueCordic - 2*3.141592
32 else:
     while initialZValueCordic < (-2*3.141592):</pre>
33
         initialZValueCordic = initialZValueCordic + 2*3.141592
print("initialZValue after moving to [0,2Pi] interval: ",
     initialZValueCordic)
38 print("\n")
41 # Checking the initial value and moving it in the interval
42 if (initialZValueCordic > 1.5707) and (initialZValueCordic < 3.141592):
     zValue = initialZValueCordic - 3.141592
     sigmaValue = -1
     print("value in second q")
46 elif (initialZValueCordic > 3.141592) and (initialZValueCordic < 4.7123):
     zValue = initialZValueCordic - 3.141592
     sigmaValue = 1
48
    print("value in third q")
```

```
50 elif (initialZValueCordic < 0):</pre>
      sigmaValue = -1
51
      zValue = initialZValueCordic
      print("value in fourth q")
54 else:
      zValue = initialZValueCordic # For angle
      sigmaValue = initialSigmaValueCordic # For +- next angle
      print("value in first")
59 # Passing starting values to the calculation values
60 xValue = initialXValueCordic # For cos
61 yValue = initialYValueCordic # For sin
62
64 # CORDIC ALGORITHM
65 for x in range(totalNumberOfIterations):
      # Calculating next values of the current iteration x
      xNextValue = xValue - (sigmaValue*yValue)*2**(-x)
      yNextValue = yValue + (sigmaValue*xValue)*2**(-x)
      zNextValue = zValue - sigmaValue * atanValues[x]
71
      # Determining the signum of next angle (addition or subtraction)
      if zNextValue >= 0:
          sigmaNextValue = 1
      else:
75
          sigmaNextValue = -1
      # Values for new iteration
      xValue = xNextValue
     yValue = yNextValue
      zValue = zNextValue
81
      sigmaValue = sigmaNextValue
      print("iteration:", x, "xValue:", xValue, "yValue:", yValue, "zValue:",
      zValue, "sigmaValue:", sigmaValue, "\n")
86 # Calculating results by scaling the result values from CORDIC by the
     scalingValue which depends on number of iterations which were made
87 resultCos = scalingValues[x-1] * xValue
88 resultSin = scalingValues[x-1] * yValue
# Changing results sign based on the rotation of the initialZValueCordic
91 if (initialZValueCordic > 1.5707) and (initialZValueCordic < 3.141592):
      resultCos = - resultCos
93 elif (initialZValueCordic > 3.141592) and (initialZValueCordic < 4.7123):</pre>
      resultCos = - resultCos
     resultSin = - resultSin
```

```
96
  # Calculating values based on the math library
98 mathResultCos = math.cos(initialZValueCordic)
  mathResultSin = math.sin(initialZValueCordic)
  # Calculating the error of CORDIC calculated values from the python math
     functions
  errorCos = abs(resultCos) - abs(mathResultCos)
  errorSin = abs(resultSin) - abs(mathResultSin)
104
  # Results printing
  print("CORDIC results:")
  print("cos: ", resultCos)
  print("sin: ", resultSin)
print("scaleFactor: ", scalingValues[totalNumberOfIterations-1])
print("\n")
print("MATH results:")
print("cos: ", mathResultCos)
print("sin: ", mathResultSin)
print("\n")
print("error CORDIC-MATH:")
print("cos: ", errorCos)
print("sin: ", errorSin)
```

Code 4 - 3 Python code of CORDIC implementation.

Once the Python implementation and debugging are completed, the Verilog implementation of the algorithm can initiated. Similar to the Division Unit module, as presented in *Calculating the division of fixed point numbers* section, the Data Path, Control Unit and Top Module were designed. This application-specific circuit design approach should be faster and safer than creating a custom CPU with reduced and customized ISA.

# 4.3 IP Block Design

# 4.3.1 Top module design

The top module design of the CORDIC IP is illustrated in Figure 4 - 1. As evident, the structure closely resembles that of the Division Unit top module. When using an approach to create a customized circuit for an algorithm, the process of developing the top modules is likely to be similar, with minor differences in signals, inputs and variables.

The Data Path module incorporates precalculated values in LUTs for *atanValues* and *scalingValues*. In this implementation, the value of *totalNumberOfIterations* is set to 12, making the LUT 12x32 bits in size. It is worth noting that the previously introduced custom fixed-point format *Q32.15* is utilized.



Figure 4 - 1 Top module design for the CORDIC module block design.

### 4.3.2 Allocation and Timing

In Figure 4 - 2, the allocation and timing diagram is depicted. Notably, the if statements, implemented in the control unit, are documented within the diagram. The explanation, why the if statements are needed, is presented in the CORDIC *Theory* section.

As mentioned in the CORDIC *Control Unit* sections, there are two primary approaches to iteration cycles. The one is to proceed from *S4* to *S2* for a faster algorithm, while the other involves progressing from *S6* to *S2*. The latter approach is employed for demonstrative purposes, as it ensures that the final numerical values are always calculated.



Figure 4 - 2 Alloccation and timing diagram for the Data Path Unit part of the CORDIC IP.

#### 4.3.3 Data Path Module

The Figure 4 - 3 presents the Data Path module of the design, calculation and storing units included. The memory LUTs for *atanValues* and *scalingValues* are presented not as separate registers but as inputs to the calculation unit. The results of *sine* and *cosine* functions, referred to as *resultSin* and *resultCos* in the Python implementation, are stored to registers R9 and R10, respectively. It is important to note that the **NEG** blocks are not implemented as calculation unit blocks for generating the negative numbers. Instead negation is activated in the corresponding target register when the appropriate  $SelR_x$  is activated. (where x represents the number of a corresponding register, either R9 or R10)

The implementation of the LUT memory module for atanValues is depicted in Code 4 - 4, memory module for scalingValues is depicted in Code 4 - 5.



Figure 4 - 3 Register transfer level (RTL) scheme of the CORDIC IP Data Path Unit IP.

```
module atanValuesCordicLUT(index, returnValue);
input [3:0] index;
4 output reg signed [31:0] returnValue;
7 always@(index)
8 begin
     case(index)
         4'b0000: returnValue = 32'sb000000000000000_110010010000111; //
10
    0.7853981633974483
         4'b0001: returnValue = 32'sb00000000000000 011101101011010; //
    0.4636476090008061
         4'b0010: returnValue = 32'sb0000000000000000_001111101011011; //
12
    0.24497866312686414
         0.12435499454676144
         4'b0100: returnValue = 32'sb0000000000000000_00001111111111101; //
    0.06241880999595735
         4'b0101: returnValue = 32'sb0000000000000000_0000011111111111; //
15
    0.031239833430268277
         4'b0110: returnValue = 32'sb0000000000000000000000111111111; //
    0.015623728620476831
```

Code 4 - 4 Verilog code of the atanValuesCordicLUT lookup table (LUT) implementation.

```
module scalingValuesCordicLUT(index, returnValue);
 input [3:0] index;
4 output reg signed [31:0] returnValue;
always@(index)
7 begin
     case(index)
         4'b0000: returnValue <= 32'sb000000000000000 0000000000000; //
         4'b0001: returnValue <= 32'sb000000000000000_101101010000010; //
     0.7071067811865476
         4'b0010: returnValue <= 32'sb0000000000000000_101000011110100; //
    0.6324555320336759
         4'b0011: returnValue <= 32'sb00000000000000 100111010001001; //
     0.6135719910778964
         4'b0100: returnValue <= 32'sb00000000000000 100110111101110; //
     0.6088339125177524
         4'b0101: returnValue <= 32'sb0000000000000000_100110111000111; //
    0.6088339125177524
         4'b0110: returnValue <= 32'sb00000000000000 100110110111101; //
     0.607351770141296
         4'b0111: returnValue <= 32'sb0000000000000000_100110110111011; //
    0.6072776440935261
         4'b1000: returnValue <= 32'sb000000000000000_100110110111010; //
    0.6072591122988928
         4'b1001: returnValue <= 32'sb00000000000000 10011011011010; //
18
     0.6072544793325625
         4'b1010: returnValue <= 32'sb00000000000000 100110110111010; //
     0.6072533210898753
         4'b1011: returnValue <= 32'sb0000000000000000_100110110111010; //
20
     0.6072530315291345
```

Code 4 - 5 Verilog code of the scaling Values Cordic LUT lookup table (LUT) implementation.

#### 4.3.4 Control Unit

Similarly to the Division *Control Unit* section, the encoding of the control signal is presented in Table 4 - 1.

The branches of if statements used in the design have been color-coded to enhance clarity. Steps *S5* and *S6* are mainly focused on multiplying the result of iteration by the appropriate scaling value and on multiplying the calculated values based on the quadrant of the original reference angle value.

| State | RTL Code                                                                                                                                                                                                                                                                                                                                                                         |   | 25 24<br>ld2 ld3 | 23<br>ld4 | 22<br>ld5 | 21 20<br>ld6 ld |   | 18<br>Id9 | 17<br>ld10 : | 16<br>SelR1 | 15<br>SelR2 | 14<br>SelR3[1] | 13<br>SelR3[0] | 12<br>SelR4[1] | 11<br>SelR4[0] | 10<br>SelR5 | 9<br>SelR6 | 8<br>SelR7 | 7<br>SelR9 | 6<br>SelR10 | 5<br>SelM1 | 4<br>SelM2 | 3<br>SelSub0 | 2<br>SelSub1 | 1<br>SelSub2 | 0<br>SelAdd1 | cs                   |
|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|------------------|-----------|-----------|-----------------|---|-----------|--------------|-------------|-------------|----------------|----------------|----------------|----------------|-------------|------------|------------|------------|-------------|------------|------------|--------------|--------------|--------------|--------------|----------------------|
| S0    | R0 — totalNumberOfiterations;<br>R1 — initialNValue;<br>R2 — initialVValue;<br>R3 — initialZValue;                                                                                                                                                                                                                                                                               | 1 | 1 1              | 0         | 0         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 27'h7000000          |
|       | if(R3>6283184)<br>R3 ← R3 − 6283184; (Sub1)                                                                                                                                                                                                                                                                                                                                      | 0 | 0 1              | 0         | 0         | 0 0             | 0 | 0         | 0            | 0           | 0           | - 1            | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | - 1          | 0            | 0            | 0            | 27'h1004008          |
| SI    | $\begin{array}{l} if(R3 < 6.283184) \\ R3 \leftarrow R3 + 6.283184; (Add1) \\ if\{[(R3 < 6.283184)\&(R3 < 0)](R3 > 6.283184)\&(R3 < 0))] \rightarrow nextState <= S2; \end{array}$                                                                                                                                                                                               | 0 | 0 1              | 0         | 0         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 1              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 1            | 27'h1002001<br>27'h0 |
| S2    | $d([R3-0)\phi(R3-6.281184)) \rightarrow nextState \Leftrightarrow S_1. CS = 0;$<br>$ete = -nextState \Leftrightarrow S_1;$<br>$d([R3-0)\phi(R3-6.281184)) \rightarrow nextState \Leftrightarrow S_2. CS = 0;$<br>$ete = -nextState \Leftrightarrow S_1;$<br>$d([R3-0.281184]) \Rightarrow nextState \Leftrightarrow S_3. CS = 0. ete \Rightarrow nextState \Leftrightarrow S_3.$ | 0 | 0 0              | 0         | 0         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            |                      |
|       | $\begin{array}{l} \text{if}[(R3 > 1.5707)\&(R3 < 3.141592)] \\ \text{R4} \leftarrow R3 = 3.141592; (Sub1) \\ \text{R5} \leftarrow 4; \end{array}$                                                                                                                                                                                                                                | 0 | 0 0              | 1         | 1         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 1              | 0              | 1           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 27'hC01400           |
| S3    | if[(R3 > 3.141592)&(R3 < 4.7123)]<br>R4 ← R3 − 3.141592; (Sub1)<br>R5 ← 1;                                                                                                                                                                                                                                                                                                       | 0 | 0 0              | 1         | 1         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 1              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 27'hC01000           |
|       | if(R3 <0)<br>R5 ← -1;<br>R4 ← R3;                                                                                                                                                                                                                                                                                                                                                | 0 | 0 0              | 1         | 1         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 1              | - 1         | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 27'bC00C00           |
|       | else $\begin{aligned} R4 \leftarrow R3; \\ R5 \leftarrow 1; \end{aligned}$                                                                                                                                                                                                                                                                                                       | 0 | 0 0              | 1         | 1         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 1              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 26°hC00800           |
| S4    | $R6 \leftarrow R1 \times R5$ ; (Mul1)<br>$R7 \leftarrow R2 \times R5$ ; (Mul2)<br>$R8 \leftarrow atanValues[numberOflieration] \times R5$ ; (Mul3)                                                                                                                                                                                                                               | 0 | 0 0              | 0         | 0         | 1 1             | 1 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 1          | 1          | 0          | 0           | 1          | 1          | 0            | 0            | 0            | 0            | 26°h380330           |
| S5    | R6 — R6 »numberOflteration; (Sh1)<br>R7 — R7 »numberOflteration; (Sh2)<br>R4 — R4 — R8; (Sub2)                                                                                                                                                                                                                                                                                   | 0 | 0 0              | 1         | 0         | 1 1             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 1            | 1            | 0            | 26'hB00006           |
| S6    | $R4 \rightarrow 0$ ;<br>$R1 \leftarrow R1 - R7; (Sub2)$<br>$R2 \leftarrow R2 + R6; (Add1)$<br>$R5 \leftarrow 1;$                                                                                                                                                                                                                                                                 | 1 | 1 0              | 0         | 1         | 0 0             | 0 | 0         | 0            | 1           | 1           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 26'h6418000          |
| 56    | $ \begin{array}{lll} R4 & \!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!$                                                                                                                                                                                                                                                                                  | 1 | 1 0              | 0         | 1         | 0 0             | 0 | 0         | 0            | 1           | 1           | 0              | 0              | 0              | 0              | 1           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 26'h6418400          |
| S7    | $R9 \leftarrow R1 \times scalingValues[numberOflteration]; (Mul1)$<br>$R10 \leftarrow R2 \times scalingValues[numberOflteration]; (Mul2)$                                                                                                                                                                                                                                        | 0 | 0 0              | 0         | 0         | 0 0             | 0 | 1         | 1            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 1          | 1           | 0          | 0          | 0            | 0            | 0            | 0            | 26'h600C0            |
|       | if[(R3>3.141592)&(R3<4.7123)]<br>$R9 \leftarrow R9 \times (-1); (Neg1)$<br>$R10 \leftarrow R10 \times (-1); (Neg2)$                                                                                                                                                                                                                                                              | 0 | 0 0              | 0         | 0         | 0 0             | 0 | 1         | 1            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 26°h60000            |
| S8    | if[(R3 > 1.5707)&(R3 < 3.141592)]<br>$R9 \leftarrow R9 \times (-1); (Neg1)$<br>$R10 \leftarrow R10;$                                                                                                                                                                                                                                                                             | 0 | 0 0              | 0         | 0         | 0 0             | 0 | 1         | 0            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 26'h40000            |
|       | clsc<br>R9 ← R9;<br>R10 ← R10:                                                                                                                                                                                                                                                                                                                                                   | 0 | 0 0              | 0         | 0         | 0 0             | 0 | 0         | 0            | 0           | 0           | 0              | 0              | 0              | 0              | 0           | 0          | 0          | 0          | 0           | 0          | 0          | 0            | 0            | 0            | 0            | 24'h0                |

Table 4 - 1 Control signal encoding table for instructions to be processed by the CORDIC Module.

#### 4.4 Simulation results

The testbench for testing the design was developed using Cocotb [1] with the Verilator [2] as a simulator. It becomes evident during the algorithm implementation, where the actual iteration values for sine and cosine are calculated, that the number of cycles required for the final calculation can be determined as

$$NoCyc_{\text{result every iteration}} = \left\{ \begin{array}{l} 3, \text{ if } initialZValue \in [-2\pi, 2\pi] \\ 4, \text{ if } initialZValue \notin [-2\pi, 2\pi] \end{array} \right\} + 5NoIt, \tag{4-11}$$

where NoCyc (-) is the number of cycles and NoIt is the number of iterations for the CORDIC algorithm. The 4 value is caused by states S0-S4 and the multiplication by 5 is caused by states S4-S8. When the result of the CORDIC algorithm is calculated only once at the end of the algorithm, the number

of iterations can be determined by

$$NoCyc_{\text{result at the end}} = \left\{ \begin{array}{l} 3, \text{ if } initialZValue \in [-2\pi, 2\pi] \\ 4, \text{ if } initialZValue \notin [-2\pi, 2\pi] \end{array} \right\} + 3NoIt + 2, \tag{4-12}$$

where the multiplication by value 3 is caused by states *S4*–*S6*, the addition of 4 is caused by states *S0-S4* and the addition of the 2 is caused by states *S7*–*S8*.

In the simulation the number Of Cycles displayed is an index of the cycle, so for angle  $\theta = -1.247985$  rad is the number of iterations depicted on Figure 4 - 5 is 63.

The frequency of the clock signal in the simulation is currently set to 50 MHz.



Figure 4 - 4 The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle  $\theta = -1.2479$  rad. The value of sine and cosine based on the current iteration is also calculated in this algorithm approach. The result is passed to the registers R9 and R10.



Figure 4 - 5 The detail of the last iteration of the Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle  $\theta = -1.2479$  rad. The result is passed to the registers R9 and R10.



Figure 4 - 6 The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle  $\theta = 10.7195129$  rad. The value of sinus and cosinus based on the current iteration is also calculated in this algorithm approach. The result is passed to the registers R9 and R10.



Figure 4 - 7 The whole Verilog simulation of CORDIC algorithm for determining the sine and cosine values of angle  $\theta = -6.7195129$  rad. The value of sinus and cosinus based on the current iteration is also calculated in this algorithm approach. The result is passed to the registers R9 and R10.

# 5 Simple set of nonlinear equations solved by a Newton-Raphson algorithm using custom circuit implementation

Most of the modules presented in the preceding sections can be utilized as submodules to solve the system of nonlinear equations. Because this work aims to solve the transcendetal equations for Selective Harmonic Elimination (SHE), the most effective approach is to initially solve a simpler set of equations to determine, the difficulty and viability of the NR.

# 5.1 Theory

The objective of the NR algorithm is to solve the set of nonlienar equations

$$F_1(x_1, x_2) = x_1^3 - x_2 - 1, (5-1)$$

$$F_2(x_1, x_2) = x_1 - 2x_2 - 2, (5-2)$$

where one possible set of solutions  $x_1$  and  $x_2$  yields

$$F_1 = 0,$$
 (5 - 3)

$$F_2 = 0.$$
 (5 - 4)

The algorithm could have been implemented in a custom CPU with reduced instruction set. However, due to apparent reasons such as speed and complexity associated with developing own processor, chosen approach involved creating an application specific circuit design.

In order to integrate the algorithm into the custom design, the general NR algorithm approach had to be simplified to the its most fundamental implementation. Every component that could be precalculated was set as a static value during the design phase.

To check if the implementation and algorithm was well designed, the solution by *Solve* function and a customized NR was made in Wolfram Mathematica.

Before initiating the algorithm, the starting values of  $x_1^0$  and  $x_2^0$  were set as inputs to the module. Based on that input the function values at selected starting points were calculated.

As a next step, the so called defect could be calculated using the newly found values of  $F_1(x_1^0, x_2^0)$  and  $F_2(x_1^0, x_2^0)$ 

$$\Delta \mathbf{F}^{i} = \begin{pmatrix} \Delta F_{1}^{i} \\ \Delta F_{2}^{i} \end{pmatrix} = \begin{pmatrix} F_{1}^{i} - F_{1}^{\text{known solution}} \\ F_{2}^{i} - F_{2}^{\text{known solution}} \end{pmatrix}, \tag{5-5}$$

where the superscript i is the number of iteration for which the defect is calculated. When the algorithm starts, the i = 0. So for example the input value for  $F_1^0$  is  $x_1^0$  and  $x_2^0$ .

Next, the Jacobian matrix **J** from vector of functions  $(F)(x_1, x_2) = (F_1, F_2)$  is calculated as follows.

$$\mathbf{J}^{i} = \begin{pmatrix} \frac{d\mathbf{F}_{1}}{dx_{1}^{i}} & \frac{d\mathbf{F}_{1}}{dx_{2}^{i}} \\ \frac{d\mathbf{F}_{2}}{dx_{1}^{i}} & \frac{d\mathbf{F}_{2}}{dx_{2}^{i}} \end{pmatrix} = \begin{pmatrix} 3(x_{1}^{i})^{2} & -1 \\ 1 & -2 \end{pmatrix}. \tag{5-6}$$

As for the general NR algorithm, the inverted value Jacobian matrix needs to be calculated. The problem is, that when using general mathematical software, such as Wolfram Mathematica, the calculation of the inversion is as easy as using function of inversion. When designing the circuit, the approach of manual calculation of inversion must be used. In this paper, the calculation is made possible by calculating the determinant of the Jacobian Matrix, its reciprocal value, its adjugate matrix and multiplication of the adjugate matrix elements by the calculated determinant reciprocal value.

Because the size of the Jacobian matrix is 2x2 the determinant may be easily calculated using the Sarrus Rule. When the matrix is more complicated, the expansion method may be utilized.

$$\det(\mathbf{J}) = 3(x_1^i)^2(-2) - (-1) = 3(x_1^i)^2(-2) + 1. \tag{5-7}$$

The reciprocal value of the determinant is then calculated by the Division Unit, created for calculating division of arbitrary real numbers. This Division Unit is presented in the section *Calculating the division of fixed point numbers*.

The adjugate matrix is calculated as follows

$$\operatorname{adj}(\mathbf{J}) = \begin{pmatrix} \mathbf{J}_{11}(-1)^{1+1} & \mathbf{J}_{01}(-1)^{1+2} \\ \mathbf{J}_{10}(-1)^{1+2} & \mathbf{J}_{00}(-1)^{2+2} \end{pmatrix} = \begin{pmatrix} -2 & -1 \\ 1 & 3(x_1^i)^2 \end{pmatrix}.$$
 (5 - 8)

After the calculation of the reciprocal value of the determinant of the Jacobi matrix and the adjugate matrix, the inverted Jacobi matrix may be finally calculated

$$\mathbf{J}^{-1i} = \frac{1}{\det(\mathbf{J}^i)} \begin{pmatrix} \operatorname{adj}(\mathbf{J}_{00}^i) & \operatorname{adj}(\mathbf{J}_{01}^i) \\ \operatorname{adj}(\mathbf{J}_{10}^i) & \operatorname{adj}(\mathbf{J}_{10}^i) \end{pmatrix} = \frac{1}{\det(\mathbf{J}^i)} \begin{pmatrix} -2 & -1 \\ 1 & 3(x_1^i)^2 \end{pmatrix}. \tag{5-9}$$

Next the  $(\Delta x_1^i, \Delta x_2^i)$  can be calculated using the inverted Jacobi matrix and the defect.

$$\begin{pmatrix} \Delta x_1^i \\ \Delta x_2^i \end{pmatrix} = \begin{pmatrix} \mathbf{J}_{00}^{-1,i} \ \Delta \mathbf{F}_1^i + \mathbf{J}_{01}^{-1,i} \ \Delta \mathbf{F}_2^i \\ \mathbf{J}_{10}^{-1,i} \ \Delta \mathbf{F}_1^i + \mathbf{J}_{11}^{-1,i} \ \Delta \mathbf{F}_2^i \end{pmatrix}. \tag{5-10}$$

Now the next iteration value denoted as i + 1 of  $x_1$  and  $x_2$  may be calculated

$$\begin{pmatrix} x_1^{i+1} \\ x_2^{i+1} \end{pmatrix} = \begin{pmatrix} x_1^i + \Delta x_1^i \\ x_2^i + \Delta x_2^i \end{pmatrix}.$$
 (5 - 11)

With these new iteration values  $x_1^{i+1}$   $x_2^{i+1}$  the loop for calculation starts again at the calculation of the new value  $F_1^{i+1}$   $F_2^{i+1}$  which is presented at the start of this section.

# 5.2 IP Block Design

# 5.2.1 Top module design

Figure 5 - 1 depicts the top module design of the circuit. The Control Unit sends control signals to the Data Path unit to make the calculations. As in all designs in this paper, the numbers are formatted in the *Q32.15* fixed point format.



Figure 5 - 1 Top module design for the simple Newton-Raphson (NR) calculation module block design.

# 5.2.2 Allocation and Timing

The algorithm structure for the Verilog implementation is depicted in the data flow diagram in the picture 5 - 2.



Figure 5 - 2 Allocation and timing diagram for the Data Path Unit part of the simple (NR) module.

### 5.2.3 Data Path Unit

The Data path unit for this simple NR algorithm consists of four multipliers, two adders, two subtractors and one divider. The divider is implemented using the Division Unit, presented in the section *Calculating* the division of fixed point numbers. Upon completion of the algorithm the results for  $x_1$  and  $x_2$  are saved in the R1 and R2, the state transitions to S11 and signal *done* is set to 1. The results then can be driven to another module or unit for further usage.



Figure 5 - 3 Register Transfer Level (RTL) scheme of the Data Path Unit part of the simple Newton-Raphson (NR) calculation IP.

#### **5.2.4** Control Unit

The Table 5 - 1 shows encoding of a control signal for the Data Path unit.

The NR algorithm iteration transitions from the state S10 to state S1 when the iteration count is lower than the predetermined total number of iterations, value which is set in the Control Unit during the design phase. In this particular implementation, the total number of iterations is set to 5. It is worth noting that sometimes the termination of the NR algorithm is determined by the value of a defect. However, in this implementation the defect-check is not implemented.

Implementation of a defect-controlled algorithm would be straightforward. The values from registers holding the defect values, R3 and R4, would be connected to the control unit in the steps S4 and S5 respectively, and a comparison with the reference defect value would be executed. If the defect value was smaller than the reference value, the algorithm would transition to the state S11 and therefore the calculation would end. Conversely, ff the defect was larger than the reference value, the next state would be S6 and the iteratioun would proceed normally, transitioning from state S10 to S1.

Table 5 - 1 Control signal encoding table for instructions to be processed by the simple Newton-Raphson (NR) alogrithm solve Module.

| State |                                                                                                                                                                 | 35<br>M1 |   | 33 32<br>Id3 Id4 | 31<br>ld5 | 30<br>Id6 | 29 :<br>Id7 I | 28<br>ld8 5 | 27<br>leIR1 S | 26<br>ielR2 : | 25<br>idR3[1] | 24<br>SelR3[0] | 23<br>SelR4 | 22<br>SelR5 | 21<br>SelR6 | 20<br>SelR7 | 19<br>SelR8[1] | 18<br>ScIR8[0] | 17<br>SelMI[1] | 16<br>SelM1[0] | 15<br>SelM2[1] | 14<br>SelM2[0] | 13<br>SelM3 | 12<br>SelM4[1] | 11<br>SelM4[0] | 10<br>SelM5 | 9<br>SelM6 | 8<br>SelAI[I] | 7<br>SelA1[0] | 6<br>SelA2[1] | 5<br>SelA2[0] | 4<br>SdA3 | 3<br>SdS1[1] | 2<br>SelSI[0] | 1<br>SelS2[1] | 0<br>SelS2[0] | cs             |
|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---|------------------|-----------|-----------|---------------|-------------|---------------|---------------|---------------|----------------|-------------|-------------|-------------|-------------|----------------|----------------|----------------|----------------|----------------|----------------|-------------|----------------|----------------|-------------|------------|---------------|---------------|---------------|---------------|-----------|--------------|---------------|---------------|---------------|----------------|
| S0    | $R1 \leftarrow x1$ ;<br>$R2 \leftarrow x2$ ;                                                                                                                    | 1        | 1 | 0 0              | 0         | 0         | 0             | 0           | 0             | 0             | 0             | 0              | 0           | 0           | 0           | 0           | 0              | 0              | 0              | 0              | 0              | 0              | 0           | 0              | 0              | 0           | 0          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 0             | 0             | 36°hC00000000  |
| SI    | $R3 \leftarrow R1 \times R1; (1)$<br>$R4 \leftarrow R2 \times 2; (2)$                                                                                           | 0        | 0 | 1 1              | 0         | 0         | 0             | 0           | 0             | 0             | 1             | 0              | 1           | 0           | 0           | 0           | 0              | 0              | 1              | 1              | - 1            | 1              | 1           | 1              | 0              | 0           | 0          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 0             | 0             | 36'h30283F000h |
|       | $R3 \leftarrow R3 \times R1$ ; (1)<br>$R4 \leftarrow R1 - R4$ ; (2)<br>$R5 \leftarrow R3 \times 3$ ; (3)                                                        | 0        | 0 | 1 1              | 1         | 0         | 0             | 0           | 0             | 0             | 1             | 0              | 0           | 1           | 0           | 0           | 0              | 0              | 1              | 0              | 1              | 1              | 0           | 0              | 0              | 1           | 1          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 1             | 0             | 36'h38242C602  |
| 62    | $R3 \leftarrow R3 - R2; (1)$<br>$R4 \leftarrow R4 - 2; (2)$<br>$R8 \leftarrow R5 \times (-2); (1)$<br>$R7 \leftarrow F20;$                                      | 0        | 0 | 1 1              | 0         | 0         | 1             | 1           | 0             | 0             | 0             | 1              | 0           | 0           | 0           | 1           | 1              | 0              | 0              | 1              | 1              | 0              | 0           | 0              | 0              | 0           | 0          | 0             | 0             | 0             | 0             | 0         | 1            | 0             | 0             | 1             | 36'h331198009  |
| S4    | $R3 \leftarrow R3 - 1$ ; (1)<br>$R4 \leftarrow R7 - R4$ ; (2)<br>$R8 \leftarrow R8 + 1$ ; (1)<br>$R6 \leftarrow F10$ ;                                          | 0        | 0 | 1 1              | 0         | 1         | 0             | 1           | 0             | 0             | 0             | 1              | 0           | 0           | 1           | 0           | 0              | 1              | 0              | 0              | 0              | 0              | 0           | 0              | 0              | 0           | 0          | 1             | 0             | 1             | 0             | 0         | 0            | 1             | 0             | 0             | 36'h351240144  |
| S5    | $R3 \leftarrow R6 - R3$ ; (1)<br>$R8 \leftarrow 1 / R8$ ; (1)                                                                                                   | 0        | 0 | 1 0              | 0         | 0         | 0             | 1           | 0             | 0             | 0             | 1              | 0           | 0           | 0           | 0           | 0              | 0              | 0              | 0              | 0              | 0              | 0           | 0              | 0              | 0           | 0          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 0             | 0             | 36°h211000000  |
|       | R8 load from division when data is available<br>$R6 \leftarrow R8 \times (-2); (1)$<br>$R8 \leftarrow R8 \times (-1); (2)$<br>$R5 \leftarrow R5 \times R8; (3)$ |          | 0 |                  |           | 0         | 0             | 1           | 0             | 0             | 0             | 0              | 0           | 0           | 0           | 0           | 0              | 0              | 0              | 0              | 0              | 1              | 0           | 0              | 0              | 0           | 0          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 0             | 0             | 36"hD04C4800   |
| S8    | $R6 \leftarrow R3 \times R6; (1)$<br>$R4 \leftarrow R4 \times R8; (2)$<br>$R5 \leftarrow R3 \times R8; (3)$<br>$R7 \leftarrow R5 \times R4; (4)$                | 0        | 0 | 0 1              | 1         | 1         | 1             | 0           | 0             | 0             | 0             | 0              | 1           | 1           | 0           | 0           | 0              | 0              | 1              | 0              | 0              | 0              | 0           | 0              | 0              | 1           | 0          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 0             | 0             | 36"h1E0C20400  |
| S9    | $R3 \leftarrow R6 + R4; (1)$<br>$R5 \leftarrow R5 + R7; (2)$                                                                                                    | 0        | 0 | 1 0              | 1         | 0         | 0             | 0           | 0             | 0             | 0             | 0              | 0           | 0           | 0           | 0           | 0              | 0              | 0              | 0              | 0              | 0              | 0           | 0              | 0              | 0           | 0          | 0             | 1             | 0             | 1             | 1         | 0            | 0             | 0             | 0             | 36°h2800000B0  |
| S10   | $R1 \leftarrow R1 + R3; (1)$<br>$R2 \leftarrow R2 + R5; (2)$                                                                                                    | 1        | 1 | 0 0              | 0         | 0         | 0             | 0           | 1             | 1             | 0             | 0              | 0           | 0           | 0           | 0           | 0              | 0              | 0              | 0              | 0              | 0              | 0           | 0              | 0              | 0           | 0          | 0             | 0             | 0             | 0             | 0         | 0            | 0             | 0             | 0             | 36'hC0C000000  |
| S11   |                                                                                                                                                                 | х        | х | x x              | х         | х         | X             | X           | X             | х             | x             | X              | x           | X           | X           | х           | x              | x              | X              | X              | X              | X              | х           | X              | х              | X           | x          | х             | X             | x             | х             | x         | X            | x             | X             | x             | 36'hxxxxxxxx   |

#### 5.3 Simulation results

The test bench for simulation was made using Cocotb [1] with the Verilator [2] as a simulator. The results of the calculation may be seen in the registers R1 and R2. The results are  $x_1 = -0.707489$  and  $x_2 = -1.353759$ .

The clock signal frequency in simulation was set to 20 MHz.



Figure 5 - 4 The whole Verilog simulation of a simple Newton-Raphson (NR) algorithm. The result may be seen in registers R1 and R2 after the fifth iteration of the algorithm.

# **6** Selective Harmonic Elimination

## 6.1 Theory

The original theory for Selective Harmonic Elimination was initially developed in [8, 9] and later adopted by numerous researchers for various voltage inverter topologies. Currently, the strategy is primarly employed in traction applications after start up state ends and the reference voltage for the drive is high enough so the six step output voltage is utilized. However, the general six step output signal produces high-order harmonics. When the motor is powered by these high-order voltage harmonics, the current with high-order harmonics (excluding triplen harmonics, considering the symmetric 3 phase motor) is observed. These current harmonics result in undesirable current ripple, torque ripple and losses [10], thereby decreasing the efficiency of the drive.

To control the output voltage and reduce unwanted harmonics, the Selective Harmonic Elimination (SHE) technique can be employed. The elimination is based on generating the output voltage by switching components at certain phase angles, thereby generating waveform with a number of pulses, to corresponding the number of elliminated harmonics. The calculation which angles to use is based on the calculation of fourier coefficients. These equations, derived from the original principle, have been adapted for different types of converters, including multilevel, H-bridge converters or generic Voltage Source Inverters (VSI). In this paper, the regular two level VSI is considered.

The considered inverter phase voltage six-step waveform is depicted in Figure 6 - 1a, while the harmonic analysis of the generic waveform is depicted in Figure 6 - 1b. It's worth noting that in a three-phase symmetrical system, the triplen harmonics are also eliminated.



(a) Generic Six-Step Waveform output of a two level Voltage Source Inverter. The Voltage value is normalized to a DC link voltage.



(b) Generic Six-Step Waveform harmonics analysis. The Voltage value is normalized to a DC link voltage.

Figure 6 - 1

As previously mentioned, the SHE method is based on a Fourier coefficient analysis. When the odd quarter-wave symmetry of the waveform is assumend, the  $a_n$  Fourier coefficient is zero (as mentioned in the Equation 6 - 1), whereas the  $b_n$  coefficient may be written as Equation 6 - 2.

$$a_n = 0, (6-1)$$

$$b_n = \frac{2}{T} \int_0^T x(n\omega t) \sin(\omega t) d\omega t, \qquad (6-2)$$

where the T is signal periode,  $x(\omega t)$  description of the VSI output voltage waveform and n is the order of the harmonics.

When assuming quarter-wave symmetry the Equation 6 - 2 may be rewritten as

$$b_{n} = \frac{8}{T} \int_{0}^{T/4} x(\omega t) \sin(n\omega t) d\omega t = \frac{8}{2\pi} \int_{0}^{2\pi/4} x(\omega t) \sin(n\omega t) d\omega t = \frac{4}{\pi} \int_{0}^{\frac{\pi}{2}} x(\omega t) \sin(n\omega t) d\omega t.$$
(6 - 3)

The function  $x(\omega t)$  represents the normalized output voltage pulse in relation to a DC link voltage. The Equation 6 - 2 can be reformulated by substituting  $\omega t$  with the angle  $\alpha$ , which also characterizes the output waveform in terms of radians. The function  $x(\alpha)$  yields 1 when the output voltage pulse is positive and -1 when negative. The reformulated Equation 6 - 2, assuming quarter-wave symmetry, is then as follows:

$$b_n = \sum_{k=1}^M \frac{8}{T} \int_{\alpha_k}^{\alpha_{k+1}} x(\alpha) \sin(n\alpha) d\alpha.$$
 (6 - 4)

Here M represents number of pulses in half period of the output signal. Assuming that the integral is calculated for angles where  $x(\alpha_k)$  is either 1 or -1, the function may be replaced by a constant. As a result, the integral calculation becomes straightforward.

$$b_n = \frac{4}{\pi} \sum_{k=1}^{M} \frac{1}{n} \left[ -\cos(n\alpha) \right]_{\alpha_k}^{n\alpha_{k+1}} = \frac{4}{\pi n} \sum_{k=1}^{M} \left[ \cos(n\alpha_k) - \cos(n\alpha_{k-1}) \right]. \tag{6-5}$$

The Equation 6 - 5 can be further simplified by observing the results of the summation for M=2.

$$b_{n} = \frac{4}{\pi n} \sum_{k=1}^{2} \left[ \cos(n\alpha_{k}) - \cos(n\alpha_{k-1}) \right] = \frac{4}{\pi n} \left[ \left( \cos(n\alpha_{1}) - \cos(n\alpha_{2}) \right) + \left( \cos(n\alpha_{2}) - \cos(n\alpha_{3}) \right) \right] =$$

$$= \frac{4}{\pi n} \left( \cos(n\alpha_{1}) - \cos(n\alpha_{3}) \right). \tag{6-6}$$

According to [8] and the example calculation for M=2, the further simplification of the Equation 6 - 5 is Equation 6 - 7.

$$b_n = \frac{4}{\pi n} \sum_{k=1}^{M} (-1)^{k+1} \cos(n\alpha_k). \tag{6-7}$$

It can be said, that the number of eliminated odd harmonics is N = M - 1.

To maintain clarity of this paper only the 5th harmonics is being eliminated by the designed unit. The set of equations required to eliminate this harmonic is as follows.

$$V_{1} = b_{1} = \frac{4}{\pi} \left[ \cos(\alpha_{1}) - \cos(\alpha_{2}) \right],$$

$$V_{5} = b_{5} = \frac{4}{5\pi} \left[ \cos(5\alpha_{1}) - \cos(5\alpha_{2}) \right].$$
(6 - 8)

The amplitudes of the 1st and 5th harmonics are denoted as  $V_1 = b_1$  and  $V_5 = b_5$ , respectively. For the elimination of the 5th harmonic, it is required that  $b_5 = 0$ . Consequentely, the set of Equations 6 - 8 can be simplified as set of Equations 6 - 9.

$$\frac{4V_1}{\pi} = \cos(\alpha_1) - \cos(\alpha_2),$$

$$0 = \cos(5\alpha_1) - \cos(5\alpha_2).$$
(6 - 9)

Solving the nonlinear Equations 6 - 9 is not straightforward. Barious methods can be employed for solving the problem, such as Genetic Algorithms [11, 12, 13] or algebraic methods [14, 15]. One commonly used algebraic method is Newton-Raphson (NR) algorithm [16]. In this paper, the solution is obtained solely using NR algorithm. However, it's worth noting that the success of this method depends on setting the initial conditions correctly; otherwise, a solution may not be found. In contrast, Genetic Algorithms also require setting initial values, but they often use random numbers from predefined intervals.

In real-time systems, the approach for solving the SHE equations may often be to precalculate the required switching angles offline and the utilize the LUT in a microprocessor to determine which set of angles use for the set reference voltage. Nowadays the FPGAs are more frequently utilized to calculate the solution. The calculation can be highly paralelized and optimized, enabling the solution to be obtained in near real-time. In the following sections the prototype implementation in Python and final implementation in Verilog are presented.

## 6.2 Simplification for Verilog and High level implementation

When implementing the solution in computational software like Wolfram Mathematica, optimizing the algorithm is unnecessary. However, when implementing the algorithm to an FPGA, higher-level constructs are not automatically available, so the simplification is necessary. Before creating the Verilog design, it is suitable, for clarity and prototyping purposes, to implement the algorithm in Python. In this section, the simplified algorithm of a NR aglorithm is presented.

The set of equations for eliminating the 5th harmonics may be formulated as

$$\begin{aligned} & \mathbf{F}_{1}^{i} = \cos(\alpha_{1}) - \cos(\alpha_{2}), \\ & \mathbf{F}_{2}^{i} = \cos(5\alpha_{1}) - \cos(5\alpha_{2}), \\ & \text{where } \mathbf{F}_{1}^{0} = m \; \frac{\pi}{4}, \; \mathbf{F}_{2}^{0} = 0. \end{aligned} \tag{6 - 10}$$

Where  $m = V_1/V_{\rm DC}$  is modulation index.

Thus the Jacobian matrix is

$$\mathbf{J}^{i} = \begin{pmatrix} -\sin(\alpha_{1}^{i}) & \sin(\alpha_{2}^{i}) \\ -5\sin(5\alpha_{1}^{i}) & 5\sin(5\alpha_{2}^{i}) \end{pmatrix}. \tag{6-11}$$

Where i is the index of the iteration of the algorithm. The inverted Jacobian matrix is needed for further calculations.

$$\mathbf{J}^{-1,i} = \begin{pmatrix} \frac{5\sin(5\alpha_2^i)}{5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i)} & -\frac{\sin(\alpha_2^i)}{5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i)} \\ \frac{5\sin(\alpha_1^i)}{5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i)} & -\frac{\sin(\alpha_2^i)}{5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i)} \\ \frac{5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i)}{5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i)} \end{pmatrix}. \tag{6 - 12}$$

From the inverted Jacobian matrix in Equation 6 - 12, it is evident that it can be easily calculated by dividing corresponding components of Jacobian matrix by the determinant, expressed as

$$\det(\mathbf{J}) = 5\sin(5\alpha_1^i)\sin(\alpha_2^i) - 5\sin(\alpha_1^i)\sin(\alpha_2^i). \tag{6-13}$$

Next, the defect  $\Delta F^i$  can be calculated

$$\Delta F_1^i = F_1^0 - F_1^i,$$
  

$$\Delta F_2^i = F_2^0 - F_2^i.$$
(6 - 14)

After the successfully calculated defect of a current iteration, the  $\Delta \alpha^i$  may be calculated.

$$\Delta \alpha^i = \mathbf{J}^{-1,i} \Delta \mathbf{F}^i, \tag{6-15}$$

thus rewritten in components notation which is more suitable for the Verilog implementation

$$\Delta \alpha_1^i = \mathbf{J}_{00}^{-1,i} \Delta F_1^i + \mathbf{J}_{01}^{-1,i} \Delta F_2^i,$$

$$\Delta \alpha_1^2 = \mathbf{J}_{10}^{-1,i} \Delta F_1^i + \mathbf{J}^{-1,i} \Delta F_2^i.$$
(6 - 16)

Finally the next iteration values of  $\alpha_1^i$  and  $\alpha_2^i$  may be calculated

$$\alpha_1^{i+1} = \alpha_1^i + \Delta \alpha_1^i, \alpha_2^{i+1} = \alpha_2^i + \Delta \alpha_2^i.$$
 (6 - 17)

With the newly calculated values of  $\alpha_1^i$ ,  $\alpha_2^i$  the algorithm may proceed with a new iteration (i+1) for calculating the  $F_1^{i+1}$  and  $F_2^{i+1}$  values.

It is important to note, that for the NR algorithm to function correctly and yield viable results, suitable initial values  $F_1^0$  and  $F_2^0$  must be carefully chosen before the algorithm starts.

When elliminating the 5th harmonic with m=1, the initial values of  $F_2^0=0.08726$  rad and  $F_2^0=1.3439$  rad yield satisfactory results.

The presented mathematical algorithm can then be transformed into an FPGA designed Verilog algorithm, visually represented as a block diagram in the section *Algorithm Block Design*.

## 6.3 High level implementation

The script allows changing the modulation index m at the beginning of the Python simulation. This feature enables generation of values that can be compared with results obtained from Verilog/cocotb and Verilator simulation of the hardware-implemented algorithm.

The script may be run with command "python3 she.py -mi <number>", where <number> is the requested modulation index.

```
import math
import argparse # for parsing command line arguments

# colorama for colors, easier than init class, maybe later
# source: https://github.com/tartley/colorama
from colorama import init as colorama_init
from colorama import Fore
from colorama import Style

colorama_init(autoreset=True) # autoreset color on new line

class with additional styles
```

```
13 class style:
      BOLD = ' \setminus 033[1m']
14
      UNDERLINE = ' \033 [4m']
      END = ' \033[0m']
argParser = argparse.ArgumentParser() # new object
argParser.add_argument("-mi", "--modulationIndex", help="set the modulation
      index 0-1") # adding argument
20 args = argParser.parse_args() # parsing args
21 modulationIndex = args.modulationIndex
# Set the desired modulation index
24 if not modulationIndex:
      print()
      print(style.BOLD+Fore.RED + "You did not specify the modulation index
     with mi command, specify it now:\n" + style.END)
      modulationIndex = input()
print("You have specified the modulation index: " + modulationIndex + ".\n"
modulationIndex = float(modulationIndex)
32 totalNumberOfIterations = 10
33 f10 = modulationIndex * 0.7853981 # modulationIndex * pi/4
34 f20 = 0
x10 = 0.0872664 # 5 degree
36 x20 = 1.3439035 # 77 degree
38 \times 1 = \times 10
39 \times 2 = \times 20
41 # main NR-LOOP
42 for numberOfIteration in range(totalNumberOfIterations):
      prepDeltaF1 = math.cos(x1) - math.cos(x2)
      deltaF1 = f10 - prepDeltaF1
      prepDeltaF2 = math.cos(5*x1) - math.cos(5*x2)
46
      deltaF2 = f20 - prepDeltaF2
47
48
      prepJ11 = math.sin(x1)
      prepJ01 = math.sin(x2)
      prepJ10 = 5 * math.sin(5*x1)
51
      prepJ00 = 5 * math.sin(5*x2)
52
53
54
      prepDet1 = prepJ10 * prepJ01
55
      prepDet2 = 5 * prepJ11 * math.sin(5*x2)
56
57
```

```
prepDet = prepDet1 - prepDet2
58
59
      divDet = 1 / prepDet
60
      jInv00 = divDet * prepJ00
      jInv01 = divDet * - prepJ01
      jInv10 = divDet * prepJ10
      jInv11 = divDet * - prepJ11
65
      deltaX1 = (jInv00 * deltaF1) + (jInv01 * deltaF2)
      deltaX2 = (jInv10 * deltaF1) + (jInv11 * deltaF2)
69
70
      x1 = x1 + deltaX1
71
      x2 = x2 + deltaX2
72
      print(Fore.CYAN + "numberOfIteration: " + str(numberOfIteration) +
     style.END)
75 # End of the main NR-LOOP
print(Fore.GREEN + "x1: " + str(x1) + style.END)
78 print(Fore.GREEN + "x2: " + str(x2) + style.END)
```

Code 6 - 1 Python implementation of the Selective Harmonic Elimination Algorithm with adjustable modulation index.

# 6.4 IP Block Design

# 6.4.1 Algorithm Block Diagram

The Figure 6 - 2 presents the hardware-implementation for SHE algorithm, mathematically expressed in the section *Simplification for Verilog and High level implementation*.



 $\label{lem:continuous} Figure~6-2~Block~Diagram~of~the~Selective~Harmonic~Elimination~(SHE)~using~Newton-Raphson~algorithm.~Design~suitable~for~hardware~implementation.$ 

## 6.4.2 Top module design

The top module of this IP closely resembles other developed modules in this paper. The design consists of a Control Unit which sends control signals to the Data Unit. The Data Unit, which includes registers and computational units, incorporates few external sub-modules for additional calculations, such as CORDIC and division.

Consistent with every design presented, the units utilize the Q32.15 fixed point format for the computational units and registers. The exception is the multiplier computational units, which, by the principle of multiplication, use the Q64.30 format for results. When the multiplication results are transferred to registers, the values are rounded back to the globally used format.

The design is depicted in Figure 6 - 3.



Figure 6 - 3 Top module design for the Selective Harmonic Elimination (SHE).

#### 6.4.3 Allocation and Timing

The Allocation and Timing diagram, depicted in Figure 6 - 4 outlines the algorithm presented in the *Theory* section. As evident from previous sections, this algorithm has been thoroughly tested before Verilog implementation.

The Verilog implementation comprises a total of 13 states, labeled S0-S12. Through states S1-S11, the NR algorithm iterates to calculate the final results. The state S0 is a starting state after resetting the unit, and state S12 is the ending state reached after the successful calculation of the last algorithm iteration.

As previously stated, the SHE calculation module consists of various submodules, which may use other iterative algorithms. Iterations of these submodule algorithms are not in focus of this section and are implicitly accepted as a part of the SHE module algorithm.



Figure 6 - 4 Allocation and Timing diagram for the Data Path Unit part of Selective Harmonic Elimination (SHE) module.

#### 6.4.4 Data Path Unit

As can be observed from the Figure 6 - 5 the Data Path unit for solving the transcendetal equations is more complex than previously presented units. Obviously the design could be further simplified, i.e., reduce the number of registers and calculation units. This simplification would result in a trade of speed for less complexity. The less complex the design, the less FPGA resources, i.e., LUTs, is needed for the realization of the design. This paper mainly focuses on speed and clarity, so the design consists of thirteen data registers, four CORDIC units, four multiplication units, two adders, two subtractors, one division unit and one invertor unit, which is implemeted directly in the registers logic.



Figure 6 - 5 Register transfer level (RTL) scheme of the Selective Harmonic Elimination Data Path Unit.

#### 6.4.5 Control Unit

Control unit signal specification can be observed in the Table 6 - 1. If the unit design was less complex, i.e., with smaller amount of registers, the control signal length would be smaller, but the number of states would be higher.

Table 6 - 1 Control signal encoding table for instructions to be processed by the Selective Harmonic Elimination (SHE) alogrithm solve Module.

| State RTL                           | Code                                                                                                                                                                                                                 | 51 S<br>140 M | 0 49 6<br>H M2 6 | B 47<br>D 164 | 46 45<br>145 146 | 47 MS | 42 · | 61 48<br>199 1411 | 39<br>MH2 | JA<br>MD S | 37 .<br>400 Se | iki si | 5 34<br>R2 Sells | J)<br>SiRe | 32<br>SARRI | S SARE | 1 :<br>190 Sell | 10<br>20[1] Se | 29<br>Bisjoj S | 28<br>idR7 S | 27<br>488(1) 3 | 26<br>ividenjej | 25<br>Sellito Se | 26<br>dkm % | 23<br>(RH % | 22<br>#R12[1] | 21<br>SaR12(4) | 20<br>SeRD | 19<br>SelCircl | III<br>SelCor2 | 17<br>SelCus | 16<br>SelCort | 15<br>Schidt | 16<br>2] SeMei | ijij sas | 13<br>Fallijej Se | 12<br>04400) | SIDMO[1] | SIMARI<br>SIMARI | 9<br>SiDMsD | g<br>IJ SelMa | acial Sa | 2<br>Dist S | 6<br>0546([1] | 5<br>Sassange | 4<br>Selfebil | Silleri | 2<br>Sellect | SHAME S | 0 | CS                   |
|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|------------------|---------------|------------------|-------|------|-------------------|-----------|------------|----------------|--------|------------------|------------|-------------|--------|-----------------|----------------|----------------|--------------|----------------|-----------------|------------------|-------------|-------------|---------------|----------------|------------|----------------|----------------|--------------|---------------|--------------|----------------|----------|-------------------|--------------|----------|------------------|-------------|---------------|----------|-------------|---------------|---------------|---------------|---------|--------------|---------|---|----------------------|
| 50 RI                               | ×20;                                                                                                                                                                                                                 | 1 .           |                  | 0 0           | 0 0              | 0 0   | 0    | 0 0               | 0         | 0          | 0              | 0 1    |                  | 0          | 0           | 0      |                 | 0              | 0              | 0            | 0              | 0               | 0                | 0           | 0           | 0             | 4              | 4          | 0              | 0              |              | 0             | - 0          | 0              |          | 0                 | 0            | - 0      | 0                | 0           | 0             |          | 0           | 0             | 0             | 0             | 0       | 0            | 4       | 0 | 52%/2000000000000    |
| S1 R1←<br>R1←                       | - R0 x 5; (Mall)<br>- R1 x 5; (Mal2)                                                                                                                                                                                 | 0 1           | 1                | 1 0           | 0 0              | 0 0   | 0    | 0 0               | 0         | 0          | 4              | 0 :    | -                | - 4        | 0           | - 0    |                 | 0              | 4              | 0            | 0              | 0               | 0                | 0           | 0           | 4             |                | 4          | 0              | 0              | - 4          | 0             | - 1          | - 0            |          | 0                 | -            | - 0      | 0                |             | 0             | į        | 0           | 0             | 4             | 0             | 0       | 0            | 4       | 0 | 52%300000000000000   |
| R5<br>R6<br>R7<br>R8<br>R9<br>R10 - | - COS(RE); (Confact)<br>- COS(RE); (Confact)<br>- COS(RE); (Confact)<br>- COS(RE); (Confact)<br>- SIN(RE); (Confact)<br>- SIN(RE); (Confact)<br>- SIN(RE); (Confact)<br>- SIN(RE); (Confact)<br>- SIN(RE); (Confact) |               |                  | 0 1           | 1 1              |       | 1    |                   | 0         | 0          | 4              |        |                  | 1          | 1           |        |                 | 1              | 4              |              | 1              | 4               |                  | 0           | 0           | 4             | 4              | 4          | -              |                |              | 1             |              |                |          | 0                 | 0            | 4        |                  | 0           |               |          | 4           | 0             | 4             | 0             |         | 0            | 4       | 0 | SZNEFESÍ DROFOROS    |
| 83 ÷<br>812 ÷<br>84 ÷               | - R4 - R5; (Sub-1)<br>- R6 - R7; (Sub-2)<br>- R5 x 5; (Mul1)<br>- R10 x 5; (Mul2)<br>- R11 x 5; (Mul3)                                                                                                               | 0 1           | 1                | 1 1           | 1 0              | o a   | 4    | 0 4               | 1         | 0          | 4              | 0 1    |                  | 4          | 1           | a      |                 | 0              | 4              | 0            | 0              | 4               | 0                | 0           | 0           | 1             |                | 4          | 0              | 0              | 4            | 0             | 4            |                |          | 1                 | 0            |          | 1                |             | 0             |          | 4           | 1             | 4             | 1             |         | 0            | 4       | 0 | 523-3000 100006250   |
| Si R13 ↔                            | - F99 - R2; (Sub1)<br>- F20 - R3; (Sub2)<br>R9 x R4; (Mull)<br>R11 x R12; (Mul2)                                                                                                                                     |               | 1                | 1 0           |                  | 0 0   | 4    | 0 0               | 1         | 1          |                | 0 1    |                  |            | 0           | 4      |                 | 0              | 4              | 0            | 0              |                 | 0                | 0           | 0           | 4             | 4              | 1          | 0              | 0              | 4            | 0             | - 0          | -              |          | 0                 | 0            | 1        | 0                | 0           | 0             |          | 4           | 0             | 1             | 0             | 0       | 0            | 4       | 0 | 52%30000000000000000 |
| SS R13+                             | - R13 - R12 (Sub1)                                                                                                                                                                                                   | 0 1           | 0 0              | 0 0           | 0 0              | 0 0   | - 0  | 0 0               | 0         | 1          | 4              | 0 (    | - 4              | - 0        | 0           | - 0    |                 | 0              | 0              | 0            | 0              | 4               | 0                | 0           | 0           | 4             | - 0            | - 0        | . 0            | 0              | - 0          | 0             | - 0          | - 0            |          | 0                 | 0            | 4        | 0                | 0           | - 6           |          | 0           | 0             | - 0           | - 0           | 0       | 0            | - 0     | 0 | 52%4000000000        |
| S6 R12+                             | 1813 (Did)                                                                                                                                                                                                           | 0 1           | 0 0              | 0 0           | 0 0              | 0 0   | 4    | 0 0               |           | 0          | 4              | 0 1    |                  | - 4        | 0           | - 0    |                 | 0              | 4              | 0            | 0              | -               | 0                | 0           | 0           | 4             |                | - 0        | - 0            | 0              | -            | 0             | - 4          | -              |          | 0                 | 0            |          | 0                | 0           |               |          | 4           | 0             | -             |               | 0       | 0            | -       | 0 | \$276,00004000000    |
| \$7 R6 ←<br>R7 ←<br>R8 ←            | -R5 x R12 (Mall)<br>-R9 x R12 (Mal2)<br>-R6 x R12 (Mal3)<br>-R8 x R12 (Mal8)                                                                                                                                         | 0 1           |                  | 0 0           | 1 1              | 1 1   | 1    | 0 0               | 0         | 0          |                | 0 1    |                  |            | 0           |        |                 | 0              |                | 0            | 0              | 1               | 0                | 0           | 0           | 4             | 4              |            |                |                | 4            |               | 0            |                |          | ı                 |              | 4        | 1                |             | 1             |          | 1           | 0             |               | 0             | 0       | 0            | 4       | 0 | 523/C00A4002590      |
| NX RX ←                             | 1 x 88; (lev1)<br>1 x 88; (lev1)                                                                                                                                                                                     | 0 1           | 0 0              | 0 0           | 0 1              | 0 1   | 0    | 0 0               | 0         | 0          | 4              | 0 1    |                  | - 4        | 0           | - 0    |                 | 0              | 4              | 0            | 0              | 0               | 0                | 0           | 0           | 4             |                | 4          | 0              | 0              | - 4          | 0             | 0            | - 0            |          | 0                 |              | - 0      | 0                |             | 0             | į        | 0           | 0             | 4             | 0             | 0       | 1            | 4       | 0 | 523250000000000      |
| so 86<br>87                         | R5 x R2; (Mall )<br>R6 x R3; (Mal2)<br>R7 x R2; (Mal3)<br>R8 x R3; (Mal4)                                                                                                                                            | 0 1           |                  | 0 0           | 1 1              | 1 1   | 4    | 0 0               | 0         | 0          |                | 0 1    |                  |            | 0           | -      |                 | 0              |                | 0            | 0              | 1               | 0                | 0           | 0           | 4             | 4              | 4          | 0              | 0              | 4            |               | 4            |                |          | 0                 | 0            | 4        |                  | 0           | 0             |          | 4           | 0             |               | 0             | 0       | 0            |         | 0 | 52%7800A4000000      |
| N6 ←                                | -R5+R6; (AΔII)<br>-R7+R8; (AΔII)                                                                                                                                                                                     | 0 1           | 0 0              | 0 0           | 1 1              | 0 0   | 0    | 0 0               | 0         | 0          | 0              | 0 1    |                  | - 0        | 0           | 0      |                 | 1              | 1              | 0            | 0              | 0               | 0                | 0           | 0           | 0             | 4              | 4          | 0              | 0              | - 4          | 0             | - 0          | 0              |          | 0                 | 0            | 0        | 0                | 0           | 0             |          | 0           | 0             | 0             | 0             | 0       | 0            | 1       | 1 | 52360006000000       |
| S11 R1                              | - R5 + R0; (Add))<br>- R6 + R1; (Add)                                                                                                                                                                                | 1 1           | 0                | 0 0           | 0 0              | 0 0   | 0    | 0 0               | 0         | 0          | 1              | 1 1    |                  | - 0        | 0           | - 0    |                 | ı              | 1              | 0            | 0              | 0               | 0                | 0           | 0           | 0             | - 0            | - 0        | 0              | 0              | - 0          | 0             | - 0          | - 0            |          | 0                 | 0            | - 0      | 0                | 0           | 6             |          | 0           | 0             | - 0           | 0             | 0       | 0            | 0       | 0 | 52°14C003060000000   |

## 6.4.6 Inverter output voltage analysis for Verilog implementation

The simulated VSI output phase voltage, when the SHE algorithm is employed for the modulation index m=1 in Verilog, is depicted in the Figure 6 - 6a. The harmonic analysis for the presented waveform can be observed in the Figure 6 - 6b. As mentioned in previous sections, please note that in the 3-phase symmetrical system, the triplen harmonics would be eliminated as well. It is evident that the unwanted 5th harmonics has been successfully eliminated. The calculated angles after ten iterations of the NR algorithm are  $\alpha_1=0.06341$  rad and  $\alpha_2=1.320098$  rad.



(a) Waveform output of a two level Voltage Source Inverter when the Selective Harmonic Elimination method is applied with Verilog calculated angles. The Voltage value is normalized to a DC link voltage.



(b) Waveform harmonics analysis of a output voltage of a two level Voltage Source Inverter utilising switching angles for the Selective Harmonic Elimination calculated in Verilog unit. The Voltage value is normalized to a DC link voltage.

Figure 6 - 6



(a) Comparison of a Waveform output of a two level Voltage Source Inverter when the Selective Harmonic Elimination method is or is not applied with Verilog calculated angles. The Voltage value is normalized to a DC link voltage.



(b) Comparison of a Waveform harmonics analysis of a output voltage of a two level Voltage Source Inverter with and without the Selective Harmonic Elimination. The Voltage value is normalized to a DC link voltage.

*Figure 6 - 7* 

# 6.5 Simulation results

The end of simulation with result of SHE alogrithm after the 10th NR algorithm can be seen in Figure 6 - 8. The whole simulation run is depicted in the Figure 6 - 9.

The clock signal frequency in simulation was set to 25 MHz to emulate low cost FPGA capabilities.



Figure 6 - 8 The ending part of a Verilog simulation of Selective Harmonic Elimination (SHE) algorithm. The result are in registers R0 and R1.



Figure 6 - 9 The whole Verilog simulation of Selective Harmonic Elimination (SHE) algorithm. The result are in registers R0 and R1.

# **Conclusion**

This paper introduces FPGA module designed for solving the SHE algorithm in near real-time. The module comprises two additional submodules, both discussed in this paper. These submodules include units for calculating the division of two arbitrary values and a CORDIC unit suitable for calculating sine and cosine functions.

The primary objective of this paper was to design speed-optimized modules capable of near real-time calculations. The outcomes of this paper could serve as a starting point for future research in designing modules for controlling electric drives or creating Hardware-in-Loop Systems.

## References

- [1] LTD, Potential Ventures; INC, SolarFlare Communications. Cocotb. In: *Cocotb website* [online]. [B.r.] [visited on 2023-10-08]. Available from: https://www.cocotb.org/.
- [2] SNYDER, Wilson. Verilator. In: *Verilator website* [online]. [B.r.] [visited on 2023-10-08]. Available from: https://www.veripool.org/verilator/.
- [3] ADVANCED MICRO DEVICES, Inc. Divider Generator LogiCORE™ IP. In: *Intellectual Property* [online]. [B.r.] [visited on 2023-10-01]. Available from: https://www.xilinx.com/products/intellectual-property/divider.html.
- [4] BURKE, Tom. Verilog Fixed point math library. In: *GitHub* [online]. [B.r.] [visited on 2023-10-01]. Available from: https://github.com/freecores/verilog\_fixed\_point\_math\_library.
- [5] MEYER-BÄSE, Uwe. *Digital signal processing with field programmable gate arrays*. 4th ed. Berlin: Springer, 2014. ISBN 978-3-642-45308-3.
- [6] VOLDER, Jack E. The CORDIC Trigonometric Computing Technique. *IRE Transactions on Electronic Computers*. 1959, roč. EC-8, č. 3, pp. 330–334. Available from DOI: 10.1109/TEC.1959.5222693.
- [7] WALTHER, J. S. A Unified Algorithm for Elementary Functions. In: *Proceedings of the May 18-20, 1971, Spring Joint Computer Conference*. Atlantic City, New Jersey: Association for Computing Machinery, 1971, pp. 379–385. AFIPS '71 (Spring). ISBN 9781450379076. Available from DOI: 10.1145/1478786.1478840.
- [8] PATEL, Hasmukh S.; HOFT, Richard G. Generalized Techniques of Harmonic Elimination and Voltage Control in Thyristor Inverters: Part I--Harmonic Elimination. *IEEE Transactions on Industry Applications*. 05/1973, roč. IA-9, č. 3, pp. 310–317. ISSN 0093-9994. Available from DOI: 10.1109/TIA.1973.349908.
- [9] PATEL, Hasmukh S.; HOFT, Richard G. Generalized Techniques of Harmonic Elimination and Voltage Control in Thyristor Inverters: Part II --- Voltage Control Techniques. *IEEE Transactions on Industry Applications*. 09/1974, roč. IA-10, č. 5, pp. 666–673. ISSN 0093-9994. Available from DOI: 10.1109/TIA.1974.349239.
- [10] MÜLLNER, F.; NEUDORFER, H.; SCHMIDT, E. Modelling and precalculation of additional losses of inverter fed asynchronous induction machines for traction applications. In: *International Aegean Conference on Electrical Machines and Power Electronics and Electromotion, Joint Conference*. 2011, pp. 415–420. Available from DOI: 10.1109/ACEMP.2011.6490634.
- [11] TAGHIZADEH, H.; TARAFDAR HAGH, M. Harmonic elimination of multilevel inverters using particle swarm optimization. In: *2008 IEEE International Symposium on Industrial Electronics*. Cambridge, UK: IEEE, 06/2008, pp. 393–396. ISBN 978-1-4244-1665-3. Available from DOI: 10. 1109/ISIE.2008.4677093.
- [12] ORTIZ-ESPINOZA, Alexandro; MENDEZ-FLORES, Efrain; PONCE-CRUZ, Pedro; MACIAS-HIDALGO, Israel; MOLINA-GUTIERREZ, Arturo. PWM with Selective Harmonic Elimination Using Optimization Inspired on Earthquakes for AC Electric Drives. In: 2020 5th International Conference on Control and Robotics Engineering (ICCRE). Osaka, Japan: IEEE, 04/2020, pp. 135–139. ISBN 978-1-72816-791-6. Available from DOI: 10.1109/ICCRE49379.2020.9096462.

- [13] ABDELQAWEE, I. M.; ABDEL-RAHIM, Naser M. B.; MANSOUR, Hajji. SELECTIVE HARMONIC ELIMINATION PWM VOLTAGE SOURCE INVERTER BASED ON GENETIC ALGORITHM. 2015. Available also from: https://api.semanticscholar.org/CorpusID:38827373.
- [14] WANG, Chenxu; ZHANG, Qi; YU, Wensheng; YANG, Kehu. A Comprehensive Review of Solving Selective Harmonic Elimination Problem with Algebraic Algorithms. *IEEE Transactions on Power Electronics*. 2023, pp. 1–20. ISSN 0885-8993, ISSN 1941-0107. Available from DOI: 10.1109/TPEL. 2023.3327280.
- [15] CHIASSON, J.N.; TOLBERT, L.M.; MCKENZIE, K.J.; DU, Z. A Complete Solution to the Harmonic Elimination Problem. *IEEE Transactions on Power Electronics*. 03/2004, roč. 19, č. 2, pp. 491–499. ISSN 0885-8993. Available from DOI: 10.1109/TPEL.2003.823207.
- [16] BALOW, Writwik; HALDER, T. A Selective Harmonic Elimination (SHE) Technique for the Multi-Leveled Inverters. In: 2018 IEEE Electron Devices Kolkata Conference (EDKCON). Kolkata, India: IEEE, 11/2018, pp. 625–630. ISBN 978-1-5386-6415-5. Available from DOI: 10.1109/EDKCON. 2018.8770415.
- [17] BURENEVA, Olga I.; KAIDANOVICH, Olga U. FPGA-based Hardware Implementation of Fixed-point Division using Newton-Raphson Method. In: 2023 IV International Conference on Neural Networks and Neurotechnologies (NeuroNT). 2023, pp. 45–47. Available from DOI: 10.1109/NeuroNT58640. 2023.10175844.

# **Appendix A:** List of and Abbreviations

### A.1 List of abbreviations

**CORDIC** Coordinate Rotation Digital Computer

**CPU** Central Processing Unit

**DC** Direct Current

FOSS Free and open-source software
FPGA Field Programmable Gate Array

FSM Finite State Machine IP Intellectual property

**ISA** Instruction Set Architecture

LUT Look Up Table
NR Newton Raphson

**RTL** Register Transfer Level

**SHE** Selective Harmonic Elimination

VSI Voltage Source Converter