<a href="https://colab.research.google.com/github/SiliconJackets/sscs-ose-code-a-chip.github.io/blob/main/VLSI24/submitted_notebooks/SJSystolicArray/SystolicArray.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Row Stationary Systolic Array With OpenLane

```
Copyright 2023 SiliconJackets @ Georgia Institute of Technology
SPDX-License-Identifier: GPL-3.0-or-later
```

Running a 3x3 systolic array design inspired by the [EYERISS](https://courses.cs.washington.edu/courses/cse550/21au/papers/CSE550.Eyeriss.pdf) design through the [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane/) RTL to GDS flow targeting the [open source SKY130 PDK](https://github.com/google/skywater-pdk/).

|Name|Affiliation| Email |IEEE Member|SSCS Member|
|:--:|:----------:|:----------:|:----------:|:----------:|
|Zachary Ellis|Georgia Institute of Technology|zellis7@gatech.edu|Yes|Yes|
|Nealson Li|Georgia Institute of Technology|nealson@gatech.edu|Yes|Yes|
|Addison Elliott|Georgia Institute of Technology|addisonelliott@gatech.edu|Yes|Yes|
|Zeyan Wu|Georgia Institute of Technology|zwu477@gatech.edu|Yes|Yes|

This notebook walks through the design specification, simulation, and implementation of a systolic array with open-source tools and PDKs.
The parallel computation and data reuse of a systolic array are crucial for accelerating neural networks, and this notebook, together with its reusable design, aims to help the open-source hardware community enable more efficient ML applications.
This project explains the principles behind how a systolic array performs 2D convolution, demonstrates the performance of our implementation with image results, and shows the final GDS generated with the open-source flow.
Additionally, to further demonstrate the feasibility of the open-source flow and our design, we are also submitting this systolic array design to the open source silicon initiative, [Tiny Tapeout](https://tinytapeout.com/).



<!-- In this notebook we will be going through the process of design specification, simulation, and implementation for a resource-constrained design intended to be submitted to another open source silicon initiative [Tiny Tapeout](https://tinytapeout.com/). Tiny Tapeout allows individuals to purchase tiny 160um x 100um blocks on a silicon die for an acceptable price in order to gain exposure to the semiconductor design process. Our plan is to create a hardware accelerator for Convolutional Neural Networks \(CNNs\), loosely based on the design from [EYERISS](https://courses.cs.washington.edu/courses/cse550/21au/papers/CSE550.Eyeriss.pdf) for the purpose of recognizing handwritten numbers. This project will demonstrate the principles behind how a systolic array operates for doing 2D convolution operations, demonstrate the performance of our implementation, and show how the final design can fit within *a few* Tiny Tapeout blocks. -->


<!-- From the github page

1. Promote reproducible chip design using open-source tools and notebook-driven design flows and
2. Enable up-and-coming talents as well as seasoned open-source enthusiasts to travel to IEEE SSCS conferences and interact with the leading-edge chip design community.

Applicants must submit an open-source Jupyter notebook detailing an innovative circuit design using open-source tools. The objective is to disseminate the main ideas and design choices using open-source tools and PDKs in a reproducible manner. Generating a final layout of your circuit is encouraged but not required. -->


## Introduction
---


In this notebook, we first explain what a systolic array is and how it is applied, referencing the row stationary dataflow introduced in [EYERISS](https://courses.cs.washington.edu/courses/cse550/21au/papers/CSE550.Eyeriss.pdf), on which our design is loosely based. Then we describe the hardware specification and the design of the high-level architecture and processing unit. Next, we demonstrate the performance by simulating the hardware design as it performs convolution for an edge detection task, and verify it against the software golden reference. Lastly, the systolic array is pushed through the [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane/) RTL to GDS flow with the open-source [SKY130 PDK](https://github.com/google/skywater-pdk/).

## Systolic Array
---

### What is a Systolic Array

1. General intro to systolic arrays and the different schemes (output stationary...)
2. The focus of this work is row stationary

#### Row Stationary Dataflow

In a row stationary dataflow, each processing element (PE) in the systolic array has a small scratchpad memory that is devoted to keeping row data in place while it is operated on. In this mode, each PE computes a single output of a 1D convolution locally, and those partial sums are then added down the columns to produce the final outputs. During the initial loading of the filter weights and row data, the scratchpads must be fully populated before any computation can occur. As the convolution moves across the rows, however, only one new byte of data needs to be read per PE, which makes this form of 2D convolution very memory efficient.

<div>
<img src="img/systolic_array_flow.gif" width="1000"/>
</div>
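
The dataflow described above can be sketched in a few lines of plain Python. This is only a behavioral model under stated assumptions, not the RTL in this repository: the function names `pe_row_conv` and `row_stationary_conv2d` are hypothetical, the kernel is applied without flipping (the cross-correlation convention common in CNN frameworks), and each call to `pe_row_conv` stands in for the sequence of single outputs one PE would produce over time.

```python
def pe_row_conv(image_row, filter_row):
    """One PE (hypothetical model): slide a filter row across an image row."""
    k = len(filter_row)
    return [
        sum(image_row[x + t] * filter_row[t] for t in range(k))
        for x in range(len(image_row) - k + 1)
    ]


def row_stationary_conv2d(image, kernel):
    """Array row i holds filter row i; array column j builds output row j."""
    kernel_rows = len(kernel)
    out_rows = len(image) - kernel_rows + 1
    output = []
    for j in range(out_rows):  # one PE column per output row
        # PE (i, j) works on filter row i and image row i + j
        partial_sums = [
            pe_row_conv(image[i + j], kernel[i]) for i in range(kernel_rows)
        ]
        # Partial sums are added down the column to form the final output row
        output.append([sum(column) for column in zip(*partial_sums)])
    return output


# 5x5 ramp image: each pixel is one larger than its left neighbour
image = [[row * 5 + col for col in range(5)] for row in range(5)]
sobel_x = [[-0.5, 0, 0.5], [-1.0, 0, 1.0], [-0.5, 0, 0.5]]
result = row_stationary_conv2d(image, sobel_x)
print(result)
```

In this model a 5x5 input tile with a 3x3 kernel yields three output rows, so it maps onto three PE columns of three PEs each. Every value in `result` is 4.0, because the horizontal ramp has a constant slope of 1.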

<!-- ![Flow](https://github.com/SiliconJackets/sscs-ose-code-a-chip.github.io/blob/main/VLSI24/submitted_notebooks/SJSystolicArray/img/systolic_array_flow.gif?raw=true){: width=250} -->

#### Applications

Explain how convolution can be accelerated through parallel computing and data reuse

### How is the hardware designed?

Introduce the top-level design (1 control unit, 9 PEs) and its ports

#### Top Level Design

Detailed top level design

#### PE Design

Detailed PE design

<div>
<img src="img/PE.png" width="1000"/>
</div>

<!-- ![Flow](https://github.com/SiliconJackets/sscs-ose-code-a-chip.github.io/blob/main/VLSI24/submitted_notebooks/SJSystolicArray/img/PE.png?raw=true){width=250} -->

## Simulation
---

### Edge Detection with 2D Convolution Accelerated by Systolic Array
<!-- Explain convolving the input image with Sobel filter enhances the edge of the objects, and the convolution can be accelerated by the Systolic array. -->

To demonstrate our systolic array's ability to accelerate the convolution operation, we perform Canny edge detection, which requires convolving an image. The edges of an image are enhanced after it is convolved with Sobel filters in the x and y directions separately. The filters are the 3x3 kernels shown below:

$$\text{Sobel Filter x} = 
\begin{bmatrix} 
-0.5 & 0 & 0.5 \\
-1.0 & 0 & 1.0 \\
-0.5 & 0 & 0.5
\end{bmatrix}$$

$$\text{Sobel Filter y} = 
\begin{bmatrix} 
-0.5 & -1.0 & -0.5 \\
0 & 0 & 0 \\
0.5 & 1.0 & 0.5
\end{bmatrix}$$

The results are the first derivatives in the x and y directions, $grad_x$ and $grad_y$. We can then iterate through all the pixels and calculate the intensity gradient of the image, which represents the edges, with:

$$\text{Grad Intensity} = \sqrt{grad_x ^ 2 + grad_y ^ 2}$$
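
As a rough illustration of the two filters and the intensity formula above, the following dependency-free sketch applies both kernels to a tiny synthetic image. The helper names `filter2d` and `gradient_intensity` are hypothetical and are not taken from `canny.py`; the kernels are applied without padding and without flipping, which is an assumption about the convention used.

```python
import math

SOBEL_X = [[-0.5, 0.0, 0.5], [-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5]]
SOBEL_Y = [[-0.5, -1.0, -0.5], [0.0, 0.0, 0.0], [0.5, 1.0, 0.5]]


def filter2d(image, kernel):
    """Slide a square kernel over the image (no padding, no kernel flip)."""
    size = len(kernel)
    rows = len(image) - size + 1
    cols = len(image[0]) - size + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def gradient_intensity(image):
    """Combine the x and y derivatives into sqrt(grad_x^2 + grad_y^2)."""
    grad_x = filter2d(image, SOBEL_X)
    grad_y = filter2d(image, SOBEL_Y)
    return [
        [math.hypot(gx, gy) for gx, gy in zip(row_x, row_y)]
        for row_x, row_y in zip(grad_x, grad_y)
    ]


# 5x5 test image with a vertical edge: dark on the left, bright on the right
step_image = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = gradient_intensity(step_image)
print(edges)
```

For this step image every output row is `[2.0, 2.0, 0.0]`: the response is strong where the window straddles the edge and zero in the flat region, and $grad_y$ is zero everywhere because all rows are identical.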

We use a Python implementation of the Canny edge detection algorithm as our golden reference to verify our systolic array design. A dedicated data sequence generator, developed to match the data flow of the hardware architecture, processes the image and kernels and generates the input sequence for the systolic array. The example image that undergoes the convolution operation in both the hardware simulation and the software is:

<div>
<img src="src/python/rubiks_cube.jpg"/>
</div>

The demonstration has the following steps:
1. Install the software dependencies
2. Download the Python and Verilog files of our design
3. Run convolution in both software and hardware:

    a. Grayscale and resize the input image to 256 by 256
    
    b. In software, perform convolution and generate the golden image
    
    c. In hardware, perform convolution

    d. In software, process the hardware result and generate the output image

4. Compare the golden image and the output image
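
For step 4, one possible way to compare two images numerically is sketched below. It is not the check performed by `full_flow.py`; the function `compare_images`, its `tolerance` parameter, and the sample pixel values are all hypothetical and only illustrate the idea of reporting how far a hardware result deviates from the golden image.

```python
def compare_images(golden, candidate, tolerance=0):
    """Return (max absolute difference, fraction of pixels within tolerance)."""
    diffs = [
        abs(g - c)
        for golden_row, candidate_row in zip(golden, candidate)
        for g, c in zip(golden_row, candidate_row)
    ]
    within = sum(1 for d in diffs if d <= tolerance)
    return max(diffs), within / len(diffs)


# Made-up 2x3 grayscale images, used only to exercise the function
golden = [[0, 100, 200], [50, 150, 250]]
hardware = [[0, 98, 200], [50, 150, 240]]
max_diff, match_ratio = compare_images(golden, hardware, tolerance=2)
print(max_diff, match_ratio)
```

A tolerance-based comparison like this can be more informative than a strict equality test when the two paths use different arithmetic precision.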

We first demonstrate with the Rubik's cube image. After this example, you can upload any image to try it out and see how well the systolic array accelerated edge detection performs.


<!-- 
Explain image size, show original image.

Explain the following experiment steps -->

In [None]:
#@title Install Dependencies {display-mode: "form"}
#@markdown Click the ▷ button to set up the simulation environment.

#@markdown Main components we will install

#@markdown *   verilator : a free and open-source software tool which converts Verilog (a hardware description language) to a cycle-accurate behavioral model in C++ or SystemC.
#@markdown *   pytorch :
#@markdown *   opencv :
#@markdown *   fxpmath : This module helps emulate the fixed-point math behavior of our systolic array


!apt-get install -y verilator
!pip install torch
!pip install torchvision
!pip install opencv-python
!pip install fxpmath
!pip install numpy


In [1]:
%%capture

#@title Download Systolic Array Files

#@markdown Click the ▷ button to download the RTL files.
#@markdown The files will be downloaded to the SystolicArray directory.
#@markdown The file structure is described below:

#@markdown *   SystolicArray/src
#@markdown    *  python/
#@markdown       *   `canny.py` : python implementation of the Canny Edge Detection algorithm
#@markdown       *   `full_flow.py` : performs the edge detection on a given image with either software or hardware
#@markdown       *   `rubiks_cube.jpg` : the default example image
#@markdown       *   `seq_generator.py` : generates the Systolic Array input sequence from the image and kernel
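#@markdown    *   `PE.sv` : processing element (PE) module
#@markdown    *   `tb_top.cpp` : C++ testbench compiled with Verilator
#@markdown    *   `top.sv` : top-level module of the systolic array
#@markdown    *   `topLevelControl.sv` : top-level control module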

%cd /content/
!rm -rf SystolicArray
!git clone https://github.com/SiliconJackets/sscs-ose-code-a-chip.github.io.git SystolicArray
!mv SystolicArray/VLSI24/submitted_notebooks/SJSystolicArray/src SystolicArray/
!mv SystolicArray/VLSI24/submitted_notebooks/SJSystolicArray/img SystolicArray/
!rm -rf SystolicArray/ISSCC23/
!rm -rf SystolicArray/ISSCC24/
!rm -rf SystolicArray/VLSI23/
!rm -rf SystolicArray/VLSI24/
!rm SystolicArray/*.md
!rm SystolicArray/LICENSE

### Compile Verilator Testbench

In [None]:
%cd /content/
!rm -rf obj_dir
!verilator --trace --cc SystolicArray/src/top.sv SystolicArray/src/topLevelControl.sv SystolicArray/src/PE.sv --exe SystolicArray/src/tb_top.cpp
!make -C obj_dir -f Vtop.mk Vtop



### Run 2D Convolution in both Software and Hardware

In [None]:
%cd /content/SystolicArray/src/python/
!python3 full_flow.py rubikscube

/content/SystolicArray/src/python
Parameter containing:
tensor([[[[-0.5000,  0.0000,  0.5000],
          [-1.0000,  0.0000,  1.0000],
          [-0.5000,  0.0000,  0.5000]]]], requires_grad=True)
Parameter containing:
tensor([[[[-0.5000, -1.0000, -0.5000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.5000,  1.0000,  0.5000]]]], requires_grad=True)
Systolic Array Result Correct: False


In [None]:
#@title Compare Results

#@markdown Because the hardware is limited to 8-bit integer math, the output is not as bright as the software version, but it still achieves a similar-looking result.


# Display the original image and both edge-detection results in one figure
import cv2
from matplotlib import pyplot as plt

base = '/content/SystolicArray/src/python/'
panels = [
    ("Original", base + 'rubiks_cube.jpg'),
    ("Software Edge Detection", base + 'edge_rubiks_cube.jpg'),
    ("Systolic Array Edge Detection", base + 'edge_rubiks_cube_sa.jpg'),
]

# One row with one subplot per image
fig = plt.figure(figsize=(10, 7))
for position, (title, path) in enumerate(panels, start=1):
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
    image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    fig.add_subplot(1, len(panels), position)
    plt.imshow(image)
    plt.axis('off')
    plt.title(title)


### Try it yourself

In [None]:
#@markdown Click the ▷ button to upload your own image for edge detection

from google.colab import files
import ipywidgets as widgets
from IPython.display import display, clear_output

UPLOADED = False

def upload_image(_):
    clear_output()
    upload_widget = widgets.FileUpload(accept='.jpg', multiple=False)
    display(upload_widget)
    upload_widget.observe(save_image, names='value')

def save_image(change):
    global UPLOADED
    if change.new:
        uploaded_filename = next(iter(change.new))
        content = change.new[uploaded_filename]['content']
        with open('/content/SystolicArray/src/python/uploadedimage.jpg', 'wb') as f:
            f.write(content)
            UPLOADED = True
        print('Image successfully uploaded!')
    else:
        print('Please select a file.')

upload_button = widgets.Button(description="Upload Image")
upload_button.on_click(upload_image)
display(upload_button)

In [None]:
#@markdown Click the ▷ button to start the demonstration with your image

%cd /content/SystolicArray/src/python/
import cv2
from matplotlib import pyplot as plt
if not UPLOADED:
    print("First, upload a jpg in the cell above")
else:
    !python3 full_flow.py userinput
    #@markdown Because the hardware is limited to 8-bit integer math, the output is not as bright as the software version, but it still achieves a similar-looking result.

    # Display the uploaded image and both edge-detection results in one figure
    base = '/content/SystolicArray/src/python/'
    panels = [
        ("Original", base + 'uploadedimage.jpg'),
        ("Software Edge Detection", base + 'edge_uploadedimage.jpg'),
        ("Systolic Array Edge Detection", base + 'edge_uploadedimage_sa.jpg'),
    ]

    # One row with one subplot per image
    fig = plt.figure(figsize=(10, 7))
    for position, (title, path) in enumerate(panels, start=1):
        # OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
        image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        fig.add_subplot(1, len(panels), position)
        plt.imshow(image)
        plt.axis('off')
        plt.title(title)

### RTL2GDS Flow

In [4]:
#@title Install Dependencies {display-mode: "form"}
#@markdown Click the ▷ button to set up the digital design environment based on [conda-eda](https://github.com/hdl/conda-eda).

#@markdown Main components we will install

#@markdown *   Open_pdks.sky130a : a PDK installer for open-source EDA tools.
#@markdown *   Openlane : an automated RTL to GDSII flow based on several components including OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor, KLayout and a number of custom scripts for design exploration and optimization.
#@markdown *   GDSTK : a C++ library for creation and manipulation of GDSII and OASIS files.

openlane_version = 'custom_set' #@param {type:"string"}
open_pdks_version = 'custom_set' #@param {type:"string"}

if openlane_version == 'latest':
  openlane_version = ''
if open_pdks_version == 'latest':
  open_pdks_version = ''

import os
import pathlib

!curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xj bin/micromamba
conda_prefix_path = pathlib.Path('conda-env')
CONDA_PREFIX = str(conda_prefix_path.resolve())
!bin/micromamba create --yes --prefix $CONDA_PREFIX
!echo 'python ==3.7*' >> {CONDA_PREFIX}/conda-meta/pinned
!CI=0 bin/micromamba install --yes --prefix $CONDA_PREFIX \
                     --channel litex-hub \
                     --channel main \
                     openlane={"2023.11.03_0_gf4f8dad8"} \
                     open_pdks.sky130a={"1.0.458_0_g8c68aca"} \
                     openroad={"2.0_10927_g0922eecb9"} \
                     verilator={"5.018_57_ga022b672a"}
!bin/micromamba install --quiet \
                        --yes \
                        --prefix $CONDA_PREFIX \
                        --channel conda-forge \
                        --channel main \
                        gdstk
!pip install libparse
PATH = os.environ['PATH']
%env CONDA_PREFIX={CONDA_PREFIX}
%env PATH={CONDA_PREFIX}/bin:{PATH}

Empty environment created at prefix: /content/conda-env

Pinned packages:
  - python 3.7*


Transaction

  Prefix: /content/conda-env

  Updating specs:

   - openlane=2023.11.03_0_gf4f8dad8
   - open_pdks.sky130a=1.0.458_0_g8c68aca
   - openroad=2.0_10927_g0922eecb9
   - verilator=5.018_57_ga022b672a


  Package                                                Version  Build                 Channel         Size
──────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────────────────────────

  + open_pdks.sky130a                         1.0.458_0_g8c68aca  20231104_052339       litex-hub     Cached
  + _libgcc_mutex                                            0.1  main                  main          Cached
  + libstdcxx-ng                                          11.2.0  h1234567_1            m

In [6]:
%%writefile config.json
{
    "DESIGN_NAME": "top",
    "VERILOG_FILES": "dir::SystolicArray/src/*.sv",
    "CLOCK_PERIOD": 40,
    "CLOCK_NET": "clk",
    "CLOCK_PORT": "clk",

    "FP_SIZING": "absolute",
    "DIE_AREA": "0 0 480 200",
    "PL_TARGET_DENSITY": 0.8
}

Writing config.json
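
To make the numbers in this configuration more concrete, the short sketch below derives the target clock frequency and the die dimensions from a subset of the same values. It assumes the usual OpenLane units (`CLOCK_PERIOD` in nanoseconds, `DIE_AREA` as `x0 y0 x1 y1` in microns); the variable names are illustrative only.

```python
import json

# Subset of the configuration above, inlined so the sketch is self-contained
config_text = """
{
    "DESIGN_NAME": "top",
    "CLOCK_PERIOD": 40,
    "FP_SIZING": "absolute",
    "DIE_AREA": "0 0 480 200"
}
"""
config = json.loads(config_text)

# A period in nanoseconds converts to a frequency in MHz as 1000 / period
clock_mhz = 1000 / config["CLOCK_PERIOD"]

# DIE_AREA lists the lower-left and upper-right corners of the die
x0, y0, x1, y1 = (float(value) for value in config["DIE_AREA"].split())
die_width_um = x1 - x0
die_height_um = y1 - y0
die_area_um2 = die_width_um * die_height_um

print(f"Target clock: {clock_mhz} MHz")
print(f"Die: {die_width_um} um x {die_height_um} um = {die_area_um2} um^2")
```

Under these assumptions the design targets a 25 MHz clock on a 480 um by 200 um die.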


In [5]:
from libparse import LibertyParser

In [7]:
%env PDK=sky130A
!flow.tcl -design .

env: PDK=sky130A
OpenLane 2023.11.03_0_gf4f8dad8-conda
All rights reserved. (c) 2020-2022 Efabless Corporation and contributors.
Available under the Apache License, version 2.0. See the LICENSE file for more details.

[INFO]: Using configuration in 'config.json'...
[INFO]: PDK Root: /content/conda-env/share/pdk
[INFO]: Process Design Kit: sky130A
[INFO]: Standard Cell Library: sky130_fd_sc_hd
[INFO]: Optimization Standard Cell Library: sky130_fd_sc_hd
[INFO]: Run Directory: /content/runs/RUN_2024.04.13_04.18.31
[INFO]: Saving runtime environment...
[INFO]: Preparing LEF files for the nom corner...
[INFO]: Preparing LEF files for the min corner...
[INFO]: Preparing LEF files for the max corner...
[INFO]: Running linter (Verilator) (log: runs/RUN_2024.04.13_04.18.31/logs/synthesis/linter.log)...
[INFO]: 0 errors found by linter
[STEP 1]
[INFO]: Running Synthesis (l

In [None]:
!pip install CairoSVG
import pathlib
import gdstk
from cairosvg import svg2png

# Pick the GDS from the most recent OpenLane run (run directories sort by timestamp)
gdss = sorted(pathlib.Path('runs').glob('*/results/final/gds/*.gds'))
library = gdstk.read_gds(gdss[-1])
top_cells = library.top_level()

# Render the top-level cell to SVG, then convert the SVG to PNG
top_cells[0].write_svg('systolicarray.svg')
with open('systolicarray.svg') as f:
    svg2png(bytestring=f.read().encode('utf-8'), write_to='systolicarray.png')

