# CMSC838L - Final report

Arjun Vedantham Yusuf Bham

May 2024

### 1 Introduction and Motivation

Digital signal processing techniques are extremely important in telecommunications, computer vision, and a number of other related fields. In particular, digital signal processing techniques form a key component behind the idea of software defined radio (SDR), which refers to analyzing digital samples that represent radio signals with software, instead of discrete hardware components that operate over signals in analog formats. Software defined radio presents a notable improvement in flexibility for radio engineers, and removes the need for specialized hardware components - instead, new signal demodulation techniques or formats can be deployed just through a simple software update.

SDR users typically define signal processing pipelines using platforms like GNURadio. GNURadio presents a graphical format for creating these pipelines, with discrete blocks representing a signal source, signal sink, or an intermediate step in the processing pipeline. As an example, see the "flowgraph" (as called in GNURadio's documentation) below. This starts by instantiating a signal source from an RTL-SDR (a type of hobbyist SDR that can be used over USB with consumer PCs), and sets both a sampling rate (32000 Hz) and a listening frequency (signals at 433 MHz). From here, these samples are fed into a low pass filter block (which smooths out high frequency noise in the signal), and the resulting signal is transformed into a stream that is passed to a GUI block that graphs the signal, using a Fast Fourier Transform (FFT) to move the signal into the frequency domain.

This simple flowgraph is stored in an XML format, and is used by GNURadio's backend to generate a Python script that actually runs the defined processing pipeline. In addition, there are also C/C++ modules loaded into the runtime system for low level tasks - for instance, a USB driver for the SDR.



#### 1.1 Problems Identified with State of the Art

There are number of problems with this current approach. First, GNURadio's practice of emitting Python scripts means that parallelism is limited on two fronts - first, because of language design choices (Python's infamous "global interpreter lock", which essentially forbids runtime concurrency), and also because we are ultimately running this script on standard PC hardware, which uses an inherently sequential von Neumann architecture. Additionally, as previously mentioned, there is also an extensive library of C/C++ modules, and the implementation of these modules are opaque to the Python-level code generated by GNURadio's flowgraph compiler.

As such, we identified two research questions that we aimed to answer in this project:

- Could we use hardware acceleration to get better performance and greater parallelism for DSP applications?
- Could we design a language that is more conducive to defining correct DSP pipelines?

### 2 Literature Review

We started by conducting a literature review of existing languages designed for DSP problems.

#### 2.1 Ziria

One of the first papers we considered as Ziria [8], a domain specific language that was designed to aid development in implementations of the physical layer of wireless protocols. Ziria presented a functional language design syntax, and was specifically intended for wireless protocol implementations on IOT hardware. As such, it contained many primitives that we thought would be important to add to our language - for instance, an "FFT" primitive function. However Ziria had a key limitation - it was designed for standard CPUs, which meant that there were still parallelism limitations.

### 2.2 Calyx

One technique generally used for hardware acceleration for specialized applications like this is to deploy them to FPGAs. One example of this was the Catapult paper that we read in class, where Microsoft deployed FPGAs to accelerate running PageRank as part of their Bing server infrastructure. However, FPGAs are notoriously difficult to program, and generally require extremely fine-grained circuit configurations written in a hardware description language like Verilog.

Calyx [7] is an intermediate representation for compilers developed by the CAPRA research group at Cornell. It defines circuits in three distinct parts - a collection of memories (consisting of combinational memories/ flip-flops and registers), wire groups (which denote assignments between different memory components in the circuit), and a statically defined control schedule that orders wire assignments. Calyx has already been used in the Filament HDL, another domain specific language project from the CAPRA group that incorporated signal timing into the language's type system.

### 3 Technical Contribution

We decided to use the Calyx IR and design a more general, hardware acceleratable language for DSP tasks, called Zinnia. <sup>1</sup> We also considered emitting circuits using CIRCT, the LLVM framework that allows compilers to generate MLIR that is subsequently lowered to a Verilog hardware description, however we quickly found that CIRCT lacked the project maturity needed to develop even a basic language around it. This included no binaries to link against, thus requiring us to compile a large subset of the LLVM project, which was not practical. Calyx also had better documentation for its IR, and a small standard library of memory cells that could be easily integrated into circuits synthesized when the compiler lowers from the IR to a Verilog hardware description.

## 3.1 Parsing and Typing

Zinnia is backed by a bidirectional type system, heavily inspired by Complete and Easy Bidirectional Typing for Higher-Rank Polymorphism and Purescript [2]. Also similarly to Purescript, while the system is based on CAEBDT, our system doesn't use the "main" takeaway of the paper – an ordered context, and stays with a more simple one. This was largely due to familiarity, and in the future we would like to rewrite it to use ordered contexts. The choice of a bidirectional type system stems from its ease of extensibility compared to HM, which is the more common choice for a language like this. This extensibility allows us to more easily add features like linear typing in the future, which is important in this space as it means we can keep the *pure* and *functional* aspect, while still preserving performance and memory constraints. Currently types are restricted to functions and primitives, just due to time constraints. General universal quantification is supported, but access to it is

<sup>&</sup>lt;sup>1</sup>Named "Zinnia" because "calyx" refers to the petals of a flower and zinnias are a type of flower. Plus it sounds like Ziria.

currently limited to the scan primitive. While generics over types are generally supported, being generic over a *value* is currently restricted to the vector primitive.

Full semantics and syntax for our language can be found at the end of our report.

#### 3.2 Code Generation

The compiler that we wrote for Zinnia traverses the AST recursively. Since there is no concept of dynamic memory allocation for circuits, we focused on generating the required memories for the circuit first.

At the top level, "let" bindings are used to generate register memories (for integers) and combinational memories (for vectors). For constant values, we use Calyx's "structure" macro to instantiate a "constant" register with the required value at compile time, which is subsequently removed in the process of optimization and lowering to Verilog. Variable bindings are stored in a hashmap and are accessed using either the component name (which is guaranteed to be unique through the Calyx frontend API) or the user-defined variable name.

For binary operations or if expressions, we need to use comparator primitives and wire them correctly - to do this, we call the memory generation function on each of the parameters of the operation, and then use a "wire" generation function which generates assignments between the inputs and outputs of the memories and logical primitives. After this is complete, we generate a small schedule of operations depending on the subexpression that we are compiling, using the Calyx "seq" block to sequentially order operations (e.g. ensuring that a register memory gets its constant value before the addition actually takes place). Finally, the top most register memory's value is copied into a specially instantiated combinational memory. This is because register memories in Calyx do not have latching/persistence guarantees, however, values in combinational/vector memories can be examined as part of the circuit simulation.

## 3.3 Supporting Parallelization in Compilation

One of the main goals of our DSL was to improve parallelism. While Calyx has a "par" primitive for parallelizing parts of the circuit, Zinnia currently only generates circuits using the "seq" ("sequential") control flow keyword for easier debugging and testing. Even though parallelism at the user level is limited, we still hoped to expose some of the potential parallel speedups through the use of a function primitive that could be used to get implicit parallelism.

#### 3.3.1 Prefix Sums Scan

To expose some parallelism without explicitly supporting compiling using the "par" control flow block, we implemented a version of parallel scan using the prefix sums algorithm [1]. This primitive takes a reference to a vector with 8 integers and performs the addition operation, returning a modified array where the latter half of the array contains the scan result.

This function was supposed to be invoked as a primitive within the language, with the compiler inlining a separate Calyx component whose input memories would be populated





Figure 1: Simplified compilation/memory instantiation flow for a Zinnia program.

with a reference to the vector supplied by the caller. However, while we were able to test the scan component as a separate circuit generated from our hand-written IR code, we were not able to fully resolve linking issues that would allow us to pass memories by reference into subcomponents.

Figure 2: Screenshot of the final state of the prefix sums array - a separate vector memory is instantiated in the IR implementation to track the intermediate values as they flow up and down the prefix-sums tree.

### 4 Evaluation

We decided to evaluate our language in three different ways: correctness testing, scalability testing, and hardware testing.

## 4.1 Correctness Testing

To verify the correctness of the language, we attmepted to use the Cocotb RTL testing framework and the Icarus Verilog simulator. This allows us to run a simulated version of the circuit's clock and write unit tests around elements of the circuits that we generate. Although we did not have time to implement extensive tests, we were able to informally verify correct execution using unit tests for the binary operations supported by the language, as well as correct outputs values for the scan primitive (which was tested separately since we could not link this into the language easily) through Icarus Verilog. We were also able to generate timing diagrams that we could view with GTKWave, and this showed that values used during the circuit's evaluation were being loaded into registers correctly. However, we would have wanted to use randomized testing with cocotb to be more confident in our compiler's correctness.

## 4.2 Scalability Testing

Second, we wanted to test the scalability of our language. For this, we focused on evaluating the "scan" primitive, since it is the main way end-users would be able to obtain parallelism



Figure 3: Example timing diagram for the program shown in Figure 1. The operands (represented in hex form here) are loaded into reg0 and reg1, before being loaded into the "left" and "right" inputs of the combinational adder. The adder register's result (5th line down) is then moved into the "output" combinational memory using its "write\_data" line. Finally, the last line displays the clock signal for the whole circuit.

through the language (at least, as it is currently implemented). We found that in simulation, our parallel scan implementation always took 71 cycles to generate outputs, given a set of 8 integers to iterate over. While parallel scan theoretically provides work bounds of O(n) and span bounds of  $O(\lg(n))$ , in practice the work and span bounds of our implementation are worse. This is because we were not sure if it was possible to instantiate an arbitrarily high number of registers, so we focused on using the combinational memory primitives to hold intermediate values (particularly between the "sweep-up" and "sweep-down" phases of parallel scan). Combinational memory primitives in the Calyx IR are exclusive read/write memory cells, and take at least one cycle to set the data address and read the stored value out of the memory into a register. This restriction essentially limited the parallelism that we could achieve to only a few steps of the algorithm - specifically, summing values between nodes at each level of the prefix sums tree, since the additions were purely combinational and depended only on values in registers that were ready to go at the last cycle.

### 4.3 Physical Testing

We hoped to eventually deploy circuits that were synthesized using Zinnia to a Lattice Icestick FPGA. This FPGA can be used with consumer devices over a USB connection, and has a large library of open source tooling. This includes the Yosys synthesis toolkit (which handles transforming high level Verilog hardware descriptions into lower level RTL circuits), the nextpnr place and route tool (which handles connections between logic units on the FPGA), and iceprog, an open source programming tool that could deploy circuits with complete routing to the flash memory on-board the Icestick.

Circuits synthesized by Calyx include three signals by default - a "go" line, which when activated, activates the circuit, a "clk" (clock) line, which handles cycle-level signalling for the sequential logic elements of the circuit, and a "reset" line, which can be used to reset the circuit while active.

In order to actually use the synthesized circuits, the "go" line must be pulled to logical high, and while the reset lines needed to be set to logical low, and the clock line must be tied to the 12 MHz oscillator built into the Icestick. Values for the "scan" primitive could then be loaded in from the Icestick's block RAM module.

To support dynamically loading values into block RAM, we wrote a small serial communication driver for the Icestick. Specifically, this was a form of one way (half-duplex) SPI, with an Arduino serving as the primary device on the SPI bus, and the Icestick serving as the

secondary device. We chose SPI because it is relatively easy to implement on the receiving side (in fact, a shift register is enough to receive data transmitted over the data line). While testing the SPI driver, we wanted to use an external clock source to control the speed of data transmission, and ensure that the Icestick was sampling the data line in phase with the transmitting device's clock (in this case an Arduino). Unfortunately, setting the clock line for the circuit loaded onto the FPGA also sets the clock line for the USB programmer chip, which requires a 12MHz signal in order to function correctly. This misconfiguration prevented us from loading new circuits onto the FPGA, and required us to use simulation as our primary evaluation method instead.



Figure 4: Half duplex SPI waveform, where an Arduino would have controlled the SCK and SDO lines and the FPGA would bit shift the received values in block RAM.

[6]

### 5 Conclusion

In the end, we were able to implement a version of the scan primitive in the Calyx IR, implement a bidirectional typing system, add some basic language features into our language, and use external circuit verification and simulation tools to show that our circuits performed as the high level program specified.

However, there is still room for improvement, both in terms of feature support and performance - we do not have support for looping structures, even though the Calyx IR supports it. Additionally, our parallelism benefits are limited by the fact that we can only really exploit the parallelism present in the hardware through the scan primitive, even though Calyx supports parallelized control schedules. While part of this is due to limitations in the Calyx standard library, future libraries could be used to enable concurrent read/exclusive write accesses to these memories instead of the current exclusive read/write access pattern.

We were also not able to show our circuits running on real hardware, which was the ultimate goal of the project.

One obstacle that we did not foresee was the difficulty of working with the Calyx IR from Rust. While LLVM's CIRCT project has a robust Rust API, it seems that the authors of the Calyx IR primarily want language authors to use their IR from a Python builder module or the CIRCT toolchain, even though the Filament HDL (which was authored by the same research group and uses Calyx as its IR) is also written in Rust. For some issues, like linking against the combinational memory primitives or inlining other components, the Rust library documentation was so poor that we ended up looking through Filament's implementation and using that as a guide for building our compiler.

Altogether, this was a very interesting exploration into building a domain specific language that interacted so closely with hardware, and was a great way to get experience with compilers, parallel algorithms, type systems, and using FPGAs. We hope to continue working on Zinnia as a hobby research project over the summer.

The codebase for Zinnia is available on GitHub: https://github.com/javathunderman/zinnia

## 6 Syntax

```
::= let \langle ident \rangle : \langle ty \rangle = \langle expr \rangle;
\langle decl \rangle
                                                         ::= (\langle letter \rangle \mid ) \{\langle letter \rangle \mid 0..9 \mid \}
\langle ident \rangle
\langle ty \rangle
                                                         ::= ()
                                                                   bool
                                                                \langle num\text{-}ty \rangle Vec<\langle num\text{-}ty \rangle, \{0..9\}^+>
                                                         ::= (u \mid i) \{0...9\}^+
\langle num-ty\rangle
\langle expr \rangle
                                                         ::= ()
                                                                    \langle bool\text{-}lit \rangle
                                                                    \langle int\text{-}lit \rangle
                                                                    \langle vec\text{-}lit \rangle
                                                                    (\{\langle expr \rangle\}^+)
                                                                    (\langle expr \rangle \langle bop \rangle \langle expr \rangle)
                                                                    (let (\langle binders \rangle) \langle expr \rangle)
                                                                    (if \langle expr \rangle \langle expr \rangle \langle expr \rangle)
\langle binders \rangle
                                                         ::= \langle binder \rangle
                                                                    \langle binder \rangle, \langle binders \rangle
```

$$\langle binder \rangle & ::= (\langle ident \rangle, \langle expr \rangle) \\ | (\langle ident \rangle : \langle ty \rangle, \langle expr \rangle) \\ | (\langle ident \rangle : \langle ty \rangle, \langle expr \rangle) \\ | ::= > \\ | >= \\ | < \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <= \\ | <=$$

# 7 Typing

Types 
$$A, B, C ::= () | \alpha | \widehat{\alpha} | \forall \alpha. A | \forall \alpha. A | A_1, \dots, A_n \to B$$

Monotypes  $\tau, \sigma, \pi ::= () | \alpha | \widehat{\alpha} | \tau_1, \dots, \tau_n \to \sigma | \vartheta | \vartheta_{i,s} | \widehat{\vartheta} | \widehat{\vartheta}_{i,s} | \forall ec(\tau, c) | \widehat{\forall} ec | Bool$ 

Contexts  $\Gamma, \Delta, \Theta ::= \cdot | \Gamma, \alpha | \Gamma, (x : A)$ 

Figure 5: Syntax of types, monotypes, and contexts.

Figure 6: Well-formedness of types

$$\frac{\Gamma \vdash A}{\Gamma \vdash A \cap A \Longrightarrow A \dashv \Gamma} \ \mathsf{EqUni}$$
 
$$\frac{\Gamma \vdash \mathsf{Vec}(\tau,c) \qquad \Gamma \vdash \mathsf{Vec}(\sigma,c) \qquad \Gamma \vdash \tau \cap \sigma \Longrightarrow \pi \dashv \Delta}{\Gamma \vdash \mathsf{Vec}(\tau,c) \cap \mathsf{Vec}(\sigma,c) \implies \mathsf{Vec}(\pi,c) \dashv \Delta} \ \mathsf{VecUni}$$
 
$$\frac{\Gamma \vdash A_1, \dots, A_n \to R_A \qquad \Gamma \vdash B_1, \dots, B_N \to R_B}{\Gamma \vdash A_i \cap B_i \overset{\mathfrak{n}}{\underset{i=1}{\longrightarrow}} C_i \dashv \Delta \qquad \Gamma \vdash R_A \cap R_B \Longrightarrow R_C \dashv \Delta} \ \mathsf{ArrowUni}$$
 
$$\frac{\Gamma \vdash \widehat{\alpha} \qquad \Gamma \vdash \widehat{\beta}}{\Gamma \vdash \widehat{\alpha} \cap \widehat{\beta} \Longrightarrow \widehat{\alpha} \dashv \Delta} \ \mathsf{ExAnyUni}$$
 
$$\frac{\Gamma \vdash \widehat{\vartheta}_1 \qquad \Gamma \vdash \widehat{\vartheta}_2}{\Gamma \vdash \widehat{\vartheta}_1 \cap \widehat{\vartheta}_2 \Longrightarrow \widehat{\vartheta}_1 \dashv \Delta} \ \mathsf{ExUNumUni}$$
 
$$\frac{\Gamma \vdash \widehat{\vartheta}_{s_1,c_1} \qquad \Gamma \vdash \widehat{\vartheta}_{s_2,c_2}}{\Gamma \vdash \widehat{\vartheta}_{s_1,c_1} \qquad \Gamma \vdash \widehat{\vartheta}_{s_2,c_2}} \ \mathsf{ExSSNumUni}$$
 
$$\frac{\mathsf{S}_3 = \mathsf{S}_1 \land \mathsf{S}_2 \qquad \mathsf{C}_3 = \mathsf{max}(\mathsf{C}_1,\mathsf{C}_2)}{\Gamma \vdash \widehat{\vartheta}_{s_1,c_1} \qquad \widehat{\vartheta}_{s_2,c_2} \Longrightarrow \widehat{\vartheta}_{s_3,c_3} \dashv \Delta} \ \mathsf{ExSSNumUni}$$

$$\begin{array}{c} \Gamma \vdash \widehat{\vartheta}_{s_1,c_1} & \Gamma \vdash \widehat{\vartheta} \\ \hline \Gamma \vdash \widehat{\vartheta}_{s_1,c_1} \cap \widehat{\vartheta} \Rightarrow \widehat{\vartheta}_{s_1,c_1} \dashv \Delta \end{array} \text{ExSUNumUni} \\ \hline \frac{\Gamma \vdash \widehat{\psi}ec_1}{\Gamma \vdash \widehat{\psi}ec_1} & \Gamma \vdash \widehat{\psi}ec_2}{\Gamma \vdash \widehat{\psi}ec_1} & \Gamma \vdash \widehat{\psi}ec_2 \Rightarrow \widehat{\psi}ec_1 \dashv \Delta} & \text{ExVecUni} \\ \hline \frac{\Gamma \vdash \widehat{\psi}ec_{\alpha}}{\Gamma \vdash \widehat{\psi}ec_{\alpha}} & \Gamma \vdash \text{Vec}(\tau,c)}{\Gamma \vdash \widehat{\psi}ec_{\alpha}} & \nabla \text{Vec}(\tau,c) \Rightarrow \text{Vec}(\tau,c) \dashv \Delta} & \text{ExVecCUni} \\ \hline \frac{\Gamma \vdash \widehat{\vartheta}}{\Gamma \vdash \widehat{\vartheta} \cap \vartheta_{s,c}} & \rightarrow \vartheta_{s,c} \dashv \Delta}{\Gamma \vdash \widehat{\vartheta} \cap \vartheta_{s,c}} & \rightarrow \vartheta_{s,c} \dashv \Delta} & \text{ExUNumCUni} \\ \hline \frac{\Gamma \vdash \widehat{\vartheta}_{s_1,c_1}}{\Gamma \vdash \widehat{\vartheta}_{s_1,c_1}} & \Gamma \vdash \vartheta_{s_2,c_2} \\ \hline -s_1 \lor s_2 & c_1 \leq c_2 \\ \hline \Gamma \vdash \widehat{\vartheta}_{s_1,c_1} \cap \vartheta_{s_2,c_2} \Rightarrow \vartheta_{s_2,c_2} \dashv \Delta} & \text{ExSNumCUni} \\ \hline \frac{\Gamma \vdash \widehat{\alpha} \quad \Gamma \vdash \alpha}{\Gamma \vdash \widehat{\alpha} \cap \alpha \Rightarrow \alpha \dashv \Delta} & \text{ExAnyCUni} \\ \hline \frac{\Gamma \vdash \forall \alpha. A \quad \Gamma \vdash B \quad \Gamma, \widehat{\alpha} \vdash A[\alpha := \widehat{\alpha}] \cap B \Rightarrow B \dashv \Delta}{\Gamma \vdash \forall \alpha. A \cap B \Rightarrow B \dashv \Delta} & \text{ForAllUni} \\ \hline \frac{\Gamma \vdash A \cap B \Rightarrow C \dashv \Delta}{\Gamma \vdash B \cap A \Rightarrow C \dashv \Delta} & \text{CommutUni} \\ \hline \Gamma \vdash A_1 \cap B_1 \Rightarrow C_1 \dashv \Theta_1 \\ \Theta_1 \vdash A_2 \cap B_2 \Rightarrow C_2 \dashv \Theta_2 \\ \vdots \\ \frac{\Theta_{n-1} \vdash A_n \cap B_n \Rightarrow C_n \dashv \Theta_n}{\Gamma \vdash A_i \cap B_i} & \text{Unis} \\ \hline \Gamma \vdash A_i \cap B_i \xrightarrow{\widehat{\mu}} & C_i \dashv \Theta_n \\ \hline \Gamma \vdash A_i \cap B_i \xrightarrow{\widehat{\mu}} & C_i \dashv \Theta_n \\ \hline \end{array} \text{Unis}$$

Figure 7: Unification of types

 $\Gamma \vdash e \Leftarrow A \dashv \Delta$  Under input context  $\Gamma$ , e checks against input type A, with output context  $\Delta$ 

 $\Gamma \vdash e \Rightarrow A \dashv \Delta$  Under input context  $\Gamma$ , e synthesizes output type A, with output context  $\Delta$ 

$$\boxed{\Gamma \vdash e_i \overset{\mathfrak{n}}{\underset{i=1}{\Longrightarrow}} A_i \dashv \Delta} \text{ Under input context } \Gamma, \{e_i\}_1^{\mathfrak{n}} \text{ synthesizes types } \{A_i\}_1^{\mathfrak{n}}, \text{ with output context } \Delta$$

$$\frac{\Gamma \vdash e \Rightarrow A \dashv \Theta \qquad \Theta \vdash A \cap B \Longrightarrow C \dashv \Delta}{\Gamma \vdash e \Leftarrow B \dashv \Delta} \quad \text{UniToCheck}$$
 
$$\frac{\Gamma \vdash \Gamma[x] \Rightarrow A \dashv \Delta}{\Gamma \vdash x \Rightarrow A \dashv \Delta} \quad \text{IdSyn}$$
 
$$\frac{x = \text{true} \lor x = \text{false}}{\Gamma \vdash x \Rightarrow \text{Bool} \dashv \Gamma} \quad \text{BoolSyn}$$
 
$$\frac{x = \text{NuC} \qquad C > 0 \qquad 0 \leq N < 2^{C}}{\Gamma \vdash x \Rightarrow \vartheta_{u,C} \dashv \Gamma} \quad \text{SNumLitAnnSyn}$$
 
$$\frac{x = \text{NiC} \qquad C > 0 \qquad -2^{C-1} \leq N \leq 2^{C-1} - 1}{\Gamma \vdash x \Rightarrow \vartheta_{i,C} \dashv \Gamma} \quad \text{UNumLitAnnSyn}$$
 
$$\frac{x = N \qquad N \in \mathbb{Z} \qquad \text{sz} = \text{min-width}(N) \qquad \text{sgn} = \text{sign}(N)}{\Gamma \vdash x \Rightarrow \widehat{\vartheta}_{\text{sgn,sz}} \dashv \Delta} \quad \text{NLitSyn}$$
 
$$\frac{\Gamma \vdash e \Rightarrow \tau \dashv \Delta \qquad \Delta \vdash \text{Vec}(\tau, 1)}{\Gamma \vdash [e] \Rightarrow \text{Vec}(\tau, 1) \dashv \Delta} \quad \text{Vec1LitSyn}$$
 
$$\frac{\Gamma \vdash [e_1, \dots, e_n] \Rightarrow \text{Vec}(\tau, i) \dashv \Theta_1}{\Theta_1 \vdash e_{n+1} \Rightarrow \sigma \dashv \Theta_2}$$
 
$$\frac{\Theta_2 \vdash \sigma \cap \tau \Longrightarrow \pi \dashv \Delta}{\Gamma \vdash [e_1, \dots, e_{n+1}] \Rightarrow \text{Vec}(\pi, i + 1) \dashv \Delta} \quad \text{VecLitSyn}$$
 
$$\frac{\text{op} \in [+, -, /, \cdot]}{\Gamma \vdash 1 \Rightarrow \tau \dashv \Theta} \quad \Gamma \vdash r \Rightarrow \sigma \dashv \Theta$$
 
$$\frac{\text{num}(\tau) \quad \text{num}(\sigma)}{\text{num}(\sigma)} \quad \text{num}(\sigma)$$
 
$$\frac{\Theta \vdash \tau \cap \sigma \Longrightarrow \pi \dashv \Delta}{\Gamma \vdash 1 \text{ op} \ r \Rightarrow \pi \dashv \Delta} \quad \text{NumFieldOpSyn}$$

$$\begin{split} & \text{op} \in [<, \le, >, \ge] \\ & \Gamma \vdash l \Rightarrow \tau \dashv \Theta \qquad \Gamma \vdash r \Rightarrow \sigma \dashv \Theta \\ & \text{num}(\tau) \qquad \text{num}(\sigma) \\ & \Theta \vdash \tau \Leftarrow \sigma \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \text{op} \in [=, ! =] \\ & \Gamma \vdash l \Rightarrow \tau \dashv \Theta \qquad \Gamma \vdash r \Rightarrow \sigma \dashv \Theta \\ & \Theta \vdash \tau \Leftarrow \sigma \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Bool} \dashv \Delta \\ \hline & \Gamma \vdash l \text{ op } r \Rightarrow \mathsf{Pool$$

Figure 8: Inference and checking of types

### References

- [1] Guy E. Blelloch. *Prefix Sums and Their Applications*. CMU-CS-90-190. School of Computer Science, Carnegie Mellon University, Nov. 1990.
- [2] Jana Dunfield and Neelakantan R. Krishnaswami. "Complete and easy bidirectional typechecking for higher-rank polymorphism". In: *Proceedings of the 18th ACM SIG-PLAN international conference on Functional programming* (2013). URL: https://api.semanticscholar.org/CorpusID:7586176.
- [3] David Durst et al. "Type-directed scheduling of streaming accelerators". In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '20: 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation. London UK: ACM, June 11, 2020, pp. 408–422. ISBN: 978-1-4503-7613-6. DOI: 10.1145/3385412.3385983. URL: https://dl.acm.org/doi/10.1145/3385412.3385983 (visited on 05/14/2024).
- [4] Matthew Gordon. µ: A Functional Programming Language for Digital Signal Processing. Apr. 9, 2003. URL: https://www.cs.unb.ca/tech-reports/honours-theses/Matthew.Gordon-4997.pdf.
- [5] Andrew Lenharth and Chris Lattner. CIRCT: Lifting hardware development out of the 20th century. Nov. 17, 2021. URL: https://llvm.org/devmtg/2021-11/slides/2021-CIRCT-LiftingHardwareDevOutOfThe20thCentury.pdf.
- [6] Mathworks. Support SPI Communication. URL: https://www.mathworks.com/help/simulink/supportpkg/raspberrypi\_ug/support-spi-communication.html.
- [7] Rachit Nigam et al. "A compiler infrastructure for accelerator generators". In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. Virtual USA: ACM, Apr. 19, 2021, pp. 804–817. ISBN: 978-1-4503-8317-2. DOI: 10.1145/3445814.3446712. URL: https://dl.acm.org/doi/10.1145/3445814.3446712 (visited on 05/11/2024).
- [8] Gordon Stewart et al. "Ziria: A DSL for Wireless Systems Programming". In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS '15: Architectural Support for Programming Languages and Operating Systems. Istanbul Turkey: ACM, Mar. 14, 2015, pp. 415–428. ISBN: 978-1-4503-2835-7. DOI: 10.1145/2694344.2694368. URL: https://dl.acm.org/doi/10.1145/2694344.2694368 (visited on 05/11/2024).