<div align='center'>
<font size = 7><font face="Product-Sans"><b><font color= "4285F4">G</font><font color= "DB4437">o</font><font color = "F4B400">o</font><font color= "4285F4">g</font><font color= "0F9D58">l</font><font color= "DB4437">e</font></b></font> - <font color = "C99700">Notre Dame</font> <font color = "4285F4">XLS Playground</font></font>
</div>

<div align='center'>
<img src='https://google.github.io/xls/images/xls_logo.svg' alt='XLS Logo' width=400><img src='https://raw.githubusercontent.com/mmorri22/cse30342/main/ND%20Chip%20Logo.png' alt='ND Chip Logo' width=180>
<img src="https://opensource.google/static/images/os-anim-main.gif" width=180>
</div>

<div align='center'>
<font size = 6><font color = "00843D">Reading 14 - The Synchronous Model of Computation</font></font>
</div>

## XLS Setup Reminder

For each new Colab notebook, you will need to run the XLS setup again. If your computer switches networks, or you restart, you will need to run those commands again. This consists of the same two setup steps from previous notebooks. You must run both in order to properly run the XLS flow.

> Note: Here is the common error message that occurs if you run a DSLX cell when the setup needs to be re-run. If you encounter this message, simply re-run these two steps and the error will be resolved when you go back to that cell:
>
> <code>UsageError: Cell magic `%%dslx` not found.</code>


In [None]:
#@title Start-up Step 1: XLS and OpenRoad scripts {run:"auto"}

!rm -rf *

# Import required Python libraries
import os
import pathlib
import sys
import jinja2
import IPython.display
import PIL.Image
import graphviz
import pathlib

from IPython.display import display, display_png

# Set Stable XLS Version for classroom environment
xls_version = 'v0.0.0-4699-gfb023174' #@param {type:"string"}

!echo '📦 downloading xls-{xls_version}'
!curl --show-error -L https://github.com/proppy/xls/releases/download/{xls_version}/xls-{xls_version}-linux-x64.tar.gz | tar xzf - --strip-components=1
!echo '🧪 setting up colab integration'
!python -m pip install --quiet --no-cache-dir --ignore-installed https://github.com/proppy/xls/releases/download/{xls_version}/xls_colab-0.0.0-py3-none-any.whl
!python -m pip install logger
!python -m pip install colabtools
import logger
import xls.contrib.colab
_ = xls.contrib.colab.register_dslx_magic()

# Must verify xls_work_dir is created
!if test -d xls_work_dir; then echo "xls_work_dir exists"; else mkdir xls_work_dir;  fi

#@title  First Run Only #4 - OpenRoad Setup {run:"auto"}

yosys_version = '0.38_93_g84116c9a3' #@param {type:"string"}
openroad_version = '2.0_12381_g01bba3695' #@param {type:"string"}
rules_hdl_version = '2eb050e80a5c42ac3ffdb7e70392d86a6896dfc7' #@param {type:"string"}

# Install stable OpenROAD Version
!echo '🛣️ installing openroad and friends'
!curl -L -O https://repo.anaconda.com/miniconda/Miniconda3-py310_24.1.2-0-Linux-x86_64.sh
!bash Miniconda3-py310_24.1.2-0-Linux-x86_64.sh -b -p conda-env/
import pathlib
conda_prefix_path = pathlib.Path('conda-env')
CONDA_PREFIX = str(conda_prefix_path.resolve())
%env CONDA_PREFIX={CONDA_PREFIX}
!conda-env/bin/conda install -yq -c "litex-hub" openroad={openroad_version} yosys={yosys_version}

!python -m pip install gdstk tqdm

!gsutil cp gs://proppy-eda/pdk_info_asap7.zip .
!gsutil cp gs://proppy-eda/pdk_info_sky130.zip .

!unzip -q -o pdk_info_asap7.zip
!unzip -q -o pdk_info_sky130.zip

!echo '🧰 generating PDK metadata'
!curl --show-error -L  https://github.com/hdl/bazel_rules_hdl/archive/{rules_hdl_version}.tar.gz | tar xzf - --strip-components=1
!curl -L -O https://github.com/protocolbuffers/protobuf/releases/download/v24.3/protoc-24.3-linux-x86_64.zip
!unzip -q -o protoc-24.3-linux-x86_64.zip
!{sys.executable} -m pip install protobuf

!echo '📁 organizing PDK for XLS and OpenROAD Flows'
!wget https://raw.githubusercontent.com/mmorri22/cse30321/main/xls/xls_setup.py
!wget https://raw.githubusercontent.com/mmorri22/cse30321/main/xls/sky130_data_pdk_info.textproto
!python xls_setup.py
!mv /content/sky130_data_pdk_info.textproto /content/com_google_skywater_pdk_sky130_fd_sc_hd/sky130_data_pdk_info.textproto
!echo '🖼️ Setup for viewing 3D GDSII File'
!python -m pip install numpy
!python -m pip install gdspy
!python -m pip install numpy-stl
!python -m pip install triangle
!python -m pip install k3d

# gdspy is used to open the gds file
import gdspy

# Used to write the output stl file (Why we installed numpy-stl)
from stl import mesh

# Using numpy will permit fast calculations on lots of points
import numpy as np
import matplotlib

# Required to triangulate polygons
import triangle

# To render in 3d
import k3d

📦 downloading xls-v0.0.0-4699-gfb023174
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 40.8M  100 40.8M    0     0  15.1M      0  0:00:02  0:00:02 --:--:-- 17.0M
🧪 setting up colab integration
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m182.1/182.1 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting logger
  Downloading logger-1.4.tar.gz (1.2 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: logger
  Building wheel for logger (setup.py) ... [?25l[?25hdone
  Created wheel for logger: filename=logger-1.4-py3-none-any.whl size=1759 sha256=e67b145d225292b29a25fa2c24bb43f7699cc7ae0e3b8640638a0823cc63c51d
  Stored in directory: /root/.cache/pip/wheels/fb/19/7b/09fc73f7503166eaf7f31b4aa0095b7f78af2ec0898e1f8312
Successfully

In [None]:
#@title Start-up Step 2: Select your PDK {run:"auto"}

pdk = 'sky130' #@param ["asap7", "sky130"] {allow-input: false}

xls.contrib.colab.pdk = pdk


#@title Select your PDK {run:"auto"}

!bin/protoc --python_out=. pdk/proto/pdk_info.proto
!ln -sf pdk/proto/pdk_info_pb2.py
import pdk_info_pb2

import enum
import dataclasses
import json
import pathlib
import subprocess
from typing import Any, Callable, Dict, Optional, Union

from google.colab import widgets
from google.protobuf import text_format
import pandas as pd

yosys = conda_prefix_path / 'bin/yosys'
openroad = conda_prefix_path / 'bin/openroad'
yosys_tcl = 'synthesis/synth.tcl'

default_work_dir = xls.contrib.colab.default_work_dir

def pdk_info_proto(
    path: pathlib.Path, optional: bool = False
) -> Optional[pdk_info_pb2.PdkInfoProto]:
  """Load PDK info from prototext.

  Args:
    path: path to prototext file.
    optional: if True, failure to access the pdk info will not produce an error.

  Returns:
    Decoded pdk info proto or None if optional.
  """
  if optional and not path.exists():
    return None
  with path.open('r') as f:
    proto = pdk_info_pb2.PdkInfoProto()
    text_format.Parse(f.read(), proto)
    return proto

pdks = {

    'asap7': {
        'delay_model': 'asap7',
        'pdk_info': pdk_info_proto(
            pathlib.Path('asap7/asap7_data_pdk_info.textproto'),
        ),
    },

    'sky130': {
        'delay_model': 'sky130',
        'pdk_info': pdk_info_proto(
            pathlib.Path('com_google_skywater_pdk_sky130_fd_sc_hd/sky130_data_pdk_info.textproto'),
        ),
    },
}

@dataclasses.dataclass(frozen=True)
class RelativeCoreArea:
  utilization_percent: float


@dataclasses.dataclass(frozen=True)
class AbsoluteCoreArea:
  core_width_microns: int
  core_padding_microns: int


@enum.unique
class ImplementationStep(enum.Enum):
  """Steps in the implementation flow."""

  XLS = 'xls'
  SYNTHESIS = 'synthesis'
  PLACEMENT = 'placement'


class PdkRuntimeError(RuntimeError):
  pass


class OpenroadRuntimeError(RuntimeError):
  pass


class OpenstaRuntimeError(RuntimeError):
  pass


class YosysRuntimeError(RuntimeError):
  pass


@dataclasses.dataclass(frozen=True)
class SynthesisResults:
  synth_v: pathlib.Path
  design_stats: pd.DataFrame
  cell_stats: pd.DataFrame


def run_synthesis(
    *,
    selected_pdk: Optional[str] = None,
    work_dir: pathlib.Path = default_work_dir,
    silent: bool = False,
) -> SynthesisResults:
  """Run synthesis with Yosys.

  Args:
    selected_pdk: The pdk to use.
    work_dir: Directory that contains verilog and will be where outputs are put.
    silent: Suppress output.

  Returns:
    Metrics from running synthesis.

  Raises:
    PdkRuntimeError: on PDK error.
    YosysRuntimeError: on yosys error.
  """
  if selected_pdk is None:
    selected_pdk = pdk
  pdk_info = pdks[selected_pdk]['pdk_info']
  if pdk_info is None:
    raise PdkRuntimeError(f'PDK "{selected_pdk}" is restricted')

  liberty = (pathlib.Path(pdk) / pathlib.Path(pdk_info.liberty_path).name).resolve()
  synth_v = (work_dir / 'user_module_synth.v').resolve()
  synth_v_flist = (work_dir / 'user_module_synth_v.flist').resolve()
  synth_uhdm_flist = (work_dir / 'user_module_synth_uhdm.flist').resolve()
  synth_uhdm_flist.touch()
  synth_stats_json = (work_dir / 'user_module_synth_stats.json').resolve()
  dont_use_args = ' '.join(
      f'-dont_use {pat}'
      for pat in pdk_info.do_not_use_cell_list
  )
  # run yosys synthesis
  with synth_v_flist.open('w') as f:
    top_v = work_dir / 'user_module.sv'
    f.write(str(top_v.resolve()))
  !FLIST='{synth_v_flist}' ABC_SCRIPT='' CONSTR='' TOP='user_module' OUTPUT='{synth_v}' UHDM_FLIST='{synth_uhdm_flist}' LIBERTY='{liberty}' STATS_JSON='{synth_stats_json}' DONT_USE_ARGS='{dont_use_args}' {yosys} -c '{yosys_tcl}'
  with synth_stats_json.open('r') as f:
    synth_stats = json.load(f)
  design_stats = synth_stats['design']
  cells_stats = design_stats.pop('num_cells_by_type')
  design_stats = pd.DataFrame.from_dict(
      design_stats, orient='index', columns=['cells']
  )
  cells_stats = pd.DataFrame.from_dict(
      cells_stats, orient='index', columns=['stats']
  )

  return SynthesisResults(
      synth_v=synth_v, design_stats=design_stats, cell_stats=cells_stats
  )


def run_opensta(
    *,
    selected_pdk: Optional[str] = None,
    work_dir: pathlib.Path = default_work_dir,
    silent: bool = False,
) -> pd.DataFrame:
  """Run OpenSta and collect timing metrics.

  Args:
    selected_pdk: The pdk to use.
    work_dir: Directory that contains verilog.
    silent: Suppress output.

  Returns:
    Dataframe containing timing report.

  Raises:
    OpenstaRuntimeError: on OpenSTA error.
    PdkRuntimeError: on PDK error.
  """
  if selected_pdk is None:
    selected_pdk = pdk
  pdk_info = pdks[selected_pdk]['pdk_info']
  if pdk_info is None:
    raise PdkRuntimeError(f'PDK "{selected_pdk}" is restricted')

  liberty = pathlib.Path(pdk) / pdk_info.liberty_path
  tech_lef = pathlib.Path(pdk) / pdk_info.tech_lef_path
  read_cell_lefs = '\n'.join(
      f'read_lef {pathlib.Path(pdk) / cell_lef_path}'
      for cell_lef_path in pdk_info.cell_lef_paths
  )
  synth_v = work_dir / 'user_module_synth.v'
  top = 'user_module'
  opensta_log = work_dir / 'user_module_sta.log'

  openroad_script = f"""
  sta::redirect_file_begin {opensta_log}
  read_lef {tech_lef}
  {read_cell_lefs}
  read_liberty {liberty}
  read_verilog {synth_v}
  link_design  {top}
  report_checks -unconstrained
  sta::redirect_file_end
  """
  openroad_tcl = work_dir / 'openroad_sta.tcl'
  with openroad_tcl.open('w') as f:
    f.write(openroad_script)

  # run opensta static timing analysis
  !{openroad} {openroad_tcl} -exit

  columns = ['delay', 'time', 'edge', 'net', 'gate']

  import re
  def sta_report_paths(opensta_log):
    with open(opensta_log) as f:
      sta_report = f.read()
    m = re.search(r'---+(.*)---+', sta_report, flags=re.M | re.S)
    for path in m.group(1).split('\n')[1:-2]:
      parts = path.split(None, maxsplit=len(columns) - 1)
      yield float(parts[0]), float(parts[1]), parts[2], parts[3], parts[4]

  df = pd.DataFrame.from_records(sta_report_paths(opensta_log), columns=columns)
  df['gate'] = df['gate'].str.replace('[()]', '', regex=True)

  return df


@dataclasses.dataclass(frozen=True)
class PlacementResults:
  openroad_global_placement_layout: pathlib.Path
  area: pd.DataFrame
  metrics: pd.DataFrame
  power: pd.DataFrame


def run_placement(
    *,
    clock_period_ps: int,
    placement_density: float,
    core_area: Union[RelativeCoreArea, AbsoluteCoreArea],
    selected_pdk: Optional[str] = None,
    work_dir: pathlib.Path = default_work_dir,
    silent: bool = False,
) -> PlacementResults:
  """Run OpenRoad placement.

  Args:
    clock_period_ps: Clock period in picoseconds.
    placement_density: Placement density in [0.0, 1.0].
    core_area: Relative or absolute core area specification.
    selected_pdk: The pdk to use.
    work_dir: Directory that contains verilog and will be where outputs are put.
    silent: Suppress output.

  Returns:
    Outputs from running placement.

  Raises:
    OpenroadRuntimeError: on OpenRoad error.
    OpenstaRuntimeError: on OpenSTA error.
    PdkRuntimeError: on PDK error.
    ValueError: on invalid inputs.
    YosysRuntimeError: on yosys error.
  """
  clock_period_ns = clock_period_ps / 1000.0
  if selected_pdk is None:
    selected_pdk = pdk
  pdk_info = pdks[selected_pdk]['pdk_info']
  if pdk_info is None:
    raise PdkRuntimeError(f'PDK "{selected_pdk}" is restricted')

  liberty = pathlib.Path(pdk) / pdk_info.liberty_path
  tech_lef = pathlib.Path(pdk) / pdk_info.tech_lef_path
  read_cell_lefs = '\n'.join(
      f'read_lef {pathlib.Path(pdk) / cell_lef_path}'
      for cell_lef_path in pdk_info.cell_lef_paths
  )

  if isinstance(core_area, AbsoluteCoreArea):
    die_side_microns = (
        core_area.core_width_microns + core_area.core_padding_microns * 2
    )
    core_side_microns = (
        core_area.core_width_microns + core_area.core_padding_microns
    )
    initialize_floorplan_args = (
        f' -die_area "0 0 {die_side_microns} {die_side_microns}" -core_area'
        f' "{core_area.core_padding_microns} {core_area.core_padding_microns} {core_side_microns} {core_side_microns}"'
    )
  elif isinstance(core_area, RelativeCoreArea):
    initialize_floorplan_args = (
        f' -utilization {core_area.utilization_percent} -aspect_ratio 1.0'
    )
  else:
    raise ValueError(
        'Expected core_area to be AbsoluteCoreArea or RelativeCoreArea, got'
        f' {core_area!r}'
    )

  initialize_floorplan_command = (
      f'initialize_floorplan -site "{pdk_info.cell_site}"'
      f' {initialize_floorplan_args}'
  )

  def source_pdk_info_tcl(path):
    return f'source {pathlib.Path(pdk) / path}' if path else ''

  source_tracks_file = source_pdk_info_tcl(pdk_info.tracks_file_path)
  source_rc_script_configuration = source_pdk_info_tcl(
      pdk_info.rc_script_configuration_path
  )
  source_pdn_config = source_pdk_info_tcl(pdk_info.pdn_config_path)
  if pdk_info.tapcell_tcl_path:
    tapcell_command = source_pdk_info_tcl(pdk_info.tapcell_tcl_path)
  else:
    tapcell_command = (
        f'tapcell -distance {pdk_info.tapcell_distance} -tapcell_master'
        f' {pdk_info.tap_cell}'
    )

  synth_v = work_dir / 'user_module_synth.v'
  openroad_metrics = work_dir / 'openroad_metrics.json'
  openroad_global_placement_layout = work_dir / 'openroad_global_placement.png'

  openroad_script = f"""
  read_lef {tech_lef}
  {read_cell_lefs}
  read_liberty {liberty}
  read_verilog {synth_v}
  link_design user_module
  {initialize_floorplan_command}
  {source_tracks_file}
  insert_tiecells {pdk_info.tie_high_port} -prefix "TIE_ONE_"
  insert_tiecells {pdk_info.tie_low_port} -prefix "TIE_ZERO_"
  create_clock [get_ports clk] -period {clock_period_ns}
  {source_rc_script_configuration}
  set_wire_rc -signal -layer "{pdk_info.wire_rc_signal_metal_layer}"
  set_wire_rc -clock  -layer "{pdk_info.wire_rc_clock_metal_layer}"
  place_pins -hor_layers {pdk_info.pin_horizontal_metal_layer} -ver_layers {pdk_info.pin_vertical_metal_layer}
  {tapcell_command}
  {source_pdn_config}
  pdngen -verbose
  global_placement -timing_driven -routability_driven -density {placement_density} -pad_left {pdk_info.global_placement_cell_pad} -pad_right {pdk_info.global_placement_cell_pad}
  remove_buffers
  estimate_parasitics -placement
  repair_design
  repair_timing
  utl::metric "utilization_percent" [rsz::utilization]
  utl::metric "design_area" [rsz::design_area]
  utl::metric "power" [sta::design_power [sta::parse_corner {{}}]]
  utl::metric "wns" [sta::worst_slack -max]
  report_power
  report_design_area
  if {{[info procs save_image] == "save_image"}} {{
    save_image -resolution 0.005 "{openroad_global_placement_layout}"
  }}
  """
  openroad_tcl = work_dir / 'place.tcl'
  with openroad_tcl.open('w') as f:
    f.write(openroad_script)
  !QT_QPA_PLATFORM=minimal {openroad} -metrics {openroad_metrics} -exit {openroad_tcl}

  with open(work_dir / 'openroad_metrics.json', 'r') as f:
    metrics = json.loads(f.read())
  df_area = pd.DataFrame.from_dict(
      {
          'global placement': [
              float(metrics['design_area']) * 1e12,
              float(metrics['utilization_percent']) * 100,
          ]
      },
      columns=['area', 'utilization'],
      orient='index',
  )
  metrics_power = [float(m) * 1e6 for m in metrics['power'].split(' ')]
  df_power = pd.DataFrame().from_dict(
      {
          'sequential': metrics_power[4:8],
          'combinational': metrics_power[8:12],
          'clock': metrics_power[12:16],
          'macro': metrics_power[16:20],
          'pad': metrics_power[20:],
          'total': metrics_power[0:4],
      },
      orient='index',
      columns=['internal', 'switching', 'leakage', 'total'],
  )
  df_metrics = (
      pd.DataFrame.from_records([metrics])
      .transpose()
      .set_axis(['metrics'], axis=1)
  )
  return PlacementResults(
      openroad_global_placement_layout=openroad_global_placement_layout,
      area=df_area,
      metrics=df_metrics,
      power=df_power,
  )

## Synchronous Computer Architecture Design

For computers to produce correct results that programmers can rely on, we must separate functional specification from performance analysis.
<ul>
<li>In other words, the modular design thinking you learned by implementing functions in C and Python will also strengthen your sequential hardware design</li>
<li>Compose each portion of the system, and then determine the shortest time in which a correct output is guaranteed, known as <b>latency</b></li>
<li>Once all modules are composed, the overall system works correctly as long as it runs with a sufficiently long clock period</li>
</ul>

The design shown in the image below has four latencies:
<ul>
  <li>Cycle 1: The input registers take the values and wait until the next cycle</li>
  <li>Cycle 2: Registers 2 and 3 are multiplied and the product is stored in a register</li>
  <li>Cycle 3: The output of the multiplication is added with the result of the previous cycle. Register 4 has the input to the MUX, and the </li>
  <li>Cycle 4: The output of the register in Cycle 3 is concatenated with reg1 to </li>
</ul>
<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/lec24/Synchronous%20RTL%20Design.png?raw=true" width=400>

### Revisiting Latency vs. Throughput

As we learned in our study of pipelining, the clock period is set by the slowest stage: T<sub>clk</sub> = max{T<sub>1</sub>, T<sub>2</sub>, T<sub>3</sub>, T<sub>4</sub>}. When elements are in parallel, we must consider all possible paths in synthesis. Consider the following circuit:

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/lec24/Synchronous%20RTL%20Design%20Problem.png?raw=true" width=300>

The top case is non-pipelined, so the latency is set by the critical path. In this case, that is 200ps + 300ps = 500ps. The throughput is 1 task / (500ps) = 2 GHz.

On the bottom, we pipeline the circuit so that each stage takes the worst-case stage time. Here, the latency = 300ps * 2 cycles = 600ps, and the throughput = 2 tasks / (600ps) = 3.33 GHz. In other words, once the pipeline is full, one task completes every 300ps.
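
The arithmetic above can be checked with a short Python sketch. The stage delays of 200ps and 300ps are assumed from the numbers quoted in the text, and the helper name `pipeline_metrics` is hypothetical; it is not part of the XLS flow.

```python
def pipeline_metrics(stage_delays_ps, pipelined):
    """Return (clock_period_ps, latency_ps, throughput_ghz) for a chain of stages."""
    if pipelined:
        # Each stage gets its own cycle; the slowest stage sets the clock period.
        clock_period_ps = max(stage_delays_ps)
        latency_ps = clock_period_ps * len(stage_delays_ps)
    else:
        # One long combinational path: the clock must cover every stage in series.
        clock_period_ps = sum(stage_delays_ps)
        latency_ps = clock_period_ps
    # One result completes per clock period; 1 / (1 ps) = 1000 GHz.
    throughput_ghz = 1000.0 / clock_period_ps
    return clock_period_ps, latency_ps, throughput_ghz


stages = [200, 300]  # assumed stage delays in picoseconds
for label, pipelined in [("non-pipelined", False), ("pipelined", True)]:
    period, latency, throughput = pipeline_metrics(stages, pipelined)
    print(f"{label}: T_clk = {period} ps, latency = {latency} ps, "
          f"throughput = {throughput:.2f} GHz")
```

Pipelining makes each individual task take longer (600ps instead of 500ps), but tasks finish more often, which is why throughput improves.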



## Buffers and Pipelining: A First Step into the Sequential Devices

Recall the RISC-V Pipelined Datapath. In lecture, we discussed the five stages of the datapath, and briefly mentioned that registers were placed between the stages and were driven by a clock signal. We didn't say much more about them after that.

But the registers between the stages are critical for preventing race conditions and producing the output expected by the programmer.

## Sequential Circuit Tokens

Sequential circuits are usually designed with flip-flops or latches, which are sometimes called memory elements.

A <b>token</b> is the value currently held by a memory element. If tokens moved through the pipeline at a constant speed, no sequencing elements would be necessary. <b>Dispersion</b> is the variance in delay.

Consider the example of a fiber-optic cable:
<ul>
  <li>Light pulses (tokens) are sent down cable</li>
  <li>Next pulse sent before first reaches end of cable</li>
  <li>No need for hardware to separate pulses</li>
  <li>But dispersion sets min time between pulses</li>
</ul>

But computing circuits don't work this way. In most circuits, dispersion is high, meaning the variance in delay is significant. Our solution is to delay fast tokens so they don’t catch slow ones.

In Computer Architecture, we performed this using <b>pipelining</b>.

### Flip Flops

Flip-flops are built as a pair of back-to-back latches. The first latch samples the token and locks it in on the rising edge of the clock. The second latch then passes that stored value to the output and holds it for the rest of the clock cycle, which protects the token from input changes during the cycle. If a token arrives too early, it waits at the flip-flop until the next cycle.
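
To make the two-latch structure concrete, here is a minimal behavioral sketch in Python. It assumes one common arrangement: the first latch is transparent while the clock is low and the second is transparent while the clock is high, which gives a positive-edge-triggered flip-flop. The class names `Latch` and `FlipFlop` are hypothetical, and the model ignores setup and hold times.

```python
class Latch:
    """Level-sensitive latch: follows d while enabled, holds its value otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class FlipFlop:
    """Positive-edge-triggered flip-flop modeled as two back-to-back latches."""

    def __init__(self):
        self.first = Latch()
        self.second = Latch()

    def update(self, d, clk):
        # The first latch tracks d while clk is low and freezes when clk rises.
        captured = self.first.update(d, enable=(clk == 0))
        # The second latch exposes the frozen value while clk is high.
        return self.second.update(captured, enable=(clk == 1))


ff = FlipFlop()
# Each pair is (clk, d). Note that d changes while clk is still high.
for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    print(f"clk={clk} d={d} -> q={ff.update(d, clk)}")
```

In the third step the input drops to 0 while the clock is still high, yet the output stays at 1: the new token has to wait for the next rising edge.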

### DSLX Inserts Flip Flops at the beginning of each stage

Consider the simple DSLX module below, <code>and_with_pipelining</code>. In this module, we read in two 32-bit values - <code>input_a</code> and <code>input_b</code> - and return the <b>and</b> result.

    fn and_with_pipelining( input_a: u32, input_b: u32 ) -> u32 {

        // Output the and
        input_a & input_b
    }

In order to get the registers, we must:
<ol>
  <li>Add <code>import std;</code> to the code after the <code>%%dslx</code> magic.</li>
  <li>Set both flags, <code>--flop_inputs=true --flop_outputs=true</code>, to <b>true</b></li>
  <li>Add a clock period, which the XLS tool flow will use to generate a clock signal. For now, we will set this value as <code>--clock_period_ps=2000</code></li>
</ol>

> Later, we will observe that we will need to extend the clock period to meet the design constraints, but for now we are running this circuit with a clock period of 2ns (500 MHz).
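
As a quick check of the period-to-frequency conversion in the note above, here is a small Python sketch. The helper name `clock_frequency_mhz` is hypothetical and is not part of the XLS tooling.

```python
def clock_frequency_mhz(clock_period_ps):
    """Convert a clock period in picoseconds to a frequency in megahertz."""
    # 1 ps = 1e-12 s, so f[Hz] = 1e12 / period[ps]; dividing by 1e6 gives MHz.
    return 1e6 / clock_period_ps


for period_ps in (2000, 3000):
    print(f"{period_ps} ps -> {clock_frequency_mhz(period_ps):.1f} MHz")
```

A longer clock period always means a lower clock frequency, which is the trade-off we accept when a stage needs more time to settle.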

Run the cell below, and then select the <b>verilog</b> tab. While we will not study Verilog in the Computer Architecture course, observing the output of the synthesis flow will help connect concepts between Logic Design and Computer Architecture. We will discuss what is going on after the code cell.

In [None]:
%%dslx --top=and_with_pipelining --pipeline_stages=1 --flop_inputs=true --flop_outputs=true --clock_period_ps=2000

// Used for pipelining and clock signals
import std;

fn and_with_pipelining( input_a: u32, input_b: u32 ) -> u32 {

    // Output the and
    input_a & input_b
}

## A Brief Peek Under the Hood - What is the Verilog doing?

The module <code>user_module</code> now has a generated clock signal <code>clk</code>, and its input/output wires correspond to the input/output signals we created in the DSLX module. The <code>input wire [31:0] input_a</code> and <code>input wire [31:0] input_b</code> correspond to our <code>input_a: u32, input_b: u32</code> parameters, and the <code>output wire [31:0] out</code> corresponds to the <code>-> u32</code> return type.

    module user_module(
      input wire clk,
      input wire [31:0] input_a,
      input wire [31:0] input_b,
      output wire [31:0] out
    );

The <code>reg [31:0] p0_input_a</code> and <code>reg [31:0] p0_input_b</code> declarations are the registers at the beginning of the stage, and we load the input signals into those registers using <code><=</code> on every positive edge of the clock signal.

    // ===== Pipe stage 0:

    // Registers for pipe stage 0:
    reg [31:0] p0_input_a;
    reg [31:0] p0_input_b;
    always_ff @ (posedge clk) begin
      p0_input_a <= input_a;
      p0_input_b <= input_b;
    end

Once the signals are read in, we perform the <b>and</b> operation and place the result on the 32-bit wire <code>p1_and_10_comb</code>.

    // ===== Pipe stage 1:
    wire [31:0] p1_and_10_comb;
    assign p1_and_10_comb = p0_input_a & p0_input_b;

On the next positive edge, we move the signal along using the <code>p1_and_10 <= p1_and_10_comb</code> line, and then assign it to the <code>out</code> signal.

      // Registers for pipe stage 1:
      reg [31:0] p1_and_10;
      always_ff @ (posedge clk) begin
        p1_and_10 <= p1_and_10_comb;
      end
      assign out = p1_and_10;
    endmodule
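
To see how a value moves through these registers, the following Python sketch models the module one clock edge at a time. It is only an illustration of the Verilog shown above: the class name `AndPipeline` is hypothetical, and the registers are assumed to start at 0 (the generated module has no reset, so real hardware would start with unknown values).

```python
class AndPipeline:
    """Cycle-level model of user_module: input registers, AND, output register."""

    MASK = 0xFFFFFFFF  # keep values within 32 bits, like u32

    def __init__(self):
        # Assumed initial register contents (see the note above).
        self.p0_input_a = 0
        self.p0_input_b = 0
        self.p1_and_10 = 0

    @property
    def out(self):
        # assign out = p1_and_10;
        return self.p1_and_10

    def posedge(self, input_a, input_b):
        # Non-blocking (<=) semantics: compute every next value from the
        # current register contents before updating any register.
        p1_and_10_comb = self.p0_input_a & self.p0_input_b
        next_a = input_a & self.MASK
        next_b = input_b & self.MASK
        self.p0_input_a, self.p0_input_b = next_a, next_b
        self.p1_and_10 = p1_and_10_comb


dut = AndPipeline()
for a, b in [(0xF0F0, 0xFF00), (0x1234, 0x00FF), (0, 0)]:
    dut.posedge(a, b)
    print(f"inputs=({a:#06x}, {b:#06x}) -> out={dut.out:#06x}")
```

The result of the first pair of inputs appears on `out` only after the second clock edge, and from then on a new result appears on every edge.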

In [None]:
#@title Generate the 32-bit flip-flop registers with AND gates {display-mode: "form"}
#@markdown - Click the ▷ button to run synthesis, static timing analysis and global placement

placement_density = 1 #@param {type:"slider", min:0, max:1.0, step:0.01}
clock_period_ps = 10000 #@param {type:"slider", min:0, max:100000, step:1}
clock_period_ns = clock_period_ps / 1000.0
core_area = 'absolute' # @param ["relative", "absolute"]

# @markdown ### core_area_relative
# @markdown compute core area from the design size
utilization_percent = 100 #@param {type:"slider", min:0, max:100, step:1}
# @markdown ### core_area_absolute
# @markdown set core area explicitly
core_width_microns = 70 #@param {type:"slider", min:0, max:1000, step:1}
core_padding_microns = 20 #@param {type:"slider", min:0, max:100, step:1}

from IPython.display import display, display_png
import IPython.display
import PIL.Image

if core_area == 'relative':
  core_area_value = RelativeCoreArea(utilization_percent)
else:
  core_area_value = AbsoluteCoreArea(core_width_microns, core_padding_microns)

tb = widgets.TabBar(['synthesis', 'netlist', 'timing', 'placement', 'area', 'power'])

# run yosys synthesis
with tb.output_to('synthesis', select=True):
  synth_results = run_synthesis()
  tb.clear_tab()

with tb.output_to('synthesis', select=False):
  grid = widgets.Grid(1, 2, header_row=False, header_column=False)
  with grid.output_to(0, 0):
    display(synth_results.cell_stats)
  with grid.output_to(0, 1):
    display(synth_results.design_stats)

# display gate level netlist
with tb.output_to('netlist', select=False):
  with synth_results.synth_v.open('r') as f:
    print(f.read())


# run opensta static timing analysis
with tb.output_to('timing', select=True):
  opensta_results = run_opensta()
  tb.clear_tab()

# display opensta report
with tb.output_to('timing', select=False):
  display(
      opensta_results.style.hide(axis='index')
      .background_gradient(subset=['delay'], cmap='Oranges')
      .bar(subset=['time'], color='lightblue')
  )

# run openroad placement
with tb.output_to('placement', select=True):
  placement_results = run_placement(
      clock_period_ps=clock_period_ps,
      placement_density=placement_density,
      core_area=core_area_value,
  )
  tb.clear_tab()

# display global placement layout
with tb.output_to('placement', select=False):
  if placement_results.openroad_global_placement_layout.exists():
    img = PIL.Image.open(placement_results.openroad_global_placement_layout)
    img = img.resize((500, 500))
    display_png(img)

# display area estimate
with tb.output_to('area', select=False):
  display(
      placement_results.area.style.format('{:.3f} μm²', subset=['area'])
      .format('{:.2f} %', subset=['utilization'])
      .bar(subset=['utilization'], color='lightblue', vmin=0, vmax=100)
  )

# display power metrics
with tb.output_to('power', select=False):
  display(
      placement_results.power.style.format('{:.3f} uW')
      .background_gradient(
          subset=pd.IndexSlice[
              placement_results.power.index[:-1], ['internal', 'switching', 'leakage']
          ],
          cmap='Oranges',
          axis=None,
      )
      .bar(subset=['total'], color='lightcoral')
      .bar(
          subset=pd.IndexSlice[placement_results.power.index[-1:], :],
          color='lightcoral',
          axis='columns',
      )
  )

## Sequential Power Consumption

In the result, we see that the sequential power consumption is substantially higher than that of the combinational logic. In this example, the registers consume <b>487.966 uW</b> whereas the combinational logic consumes only <b>5.619 uW</b>, meaning the sequential circuits account for about 98.9% of the device's power in this case.
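
The share of power taken by the registers can be recomputed with a few lines of Python. This sketch assumes that only the sequential and combinational rows contribute to the total, which matches the two numbers quoted above; the function name `sequential_share_percent` is hypothetical.

```python
def sequential_share_percent(sequential_uw, combinational_uw):
    """Percentage of total power drawn by the sequential elements."""
    total_uw = sequential_uw + combinational_uw
    return 100.0 * sequential_uw / total_uw


share = sequential_share_percent(487.966, 5.619)
print(f"sequential share: {share:.2f} %")  # prints "sequential share: 98.86 %"
```

If your run reports different power numbers, substitute them into the call to see how the split changes.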

Registers and SRAM hold a value with feedback loops, and DRAM must be refreshed periodically; in addition, every register is driven by the clock on each cycle. As a result, these storage elements consume considerably more power than combinational circuits.

## Try It On Your Own: Write a Pipelined 32-bit adder/subtractor device.

In this example, the pipeline stages are added to the top module only, so you may use the code that we wrote in earlier readings/lectures to pipeline the <code>add_sub_32_with_carry</code>.

> Hint: This circuit cannot be pipelined with a clock period of 2000ps. If you run the code with <code>--clock_period_ps=2000</code>, you will get an error message that states: <code>INVALID_ARGUMENT: cannot achieve the specified pipeline length</code>. The reason is that the add/sub takes longer than 2000ps to calculate, so the result would not be ready by the next clock edge and would cause a race condition. The message recommends trying two stages. What we will do instead is set <code>--clock_period_ps=3000</code>

Put your solution in the code cell below:

In [None]:
# Replace this Python comment with your DSLX magic and XLS modules for add_sub_32_with_carry

In [None]:
#@title Generate the 32-bit pipelined adder/subtractor {display-mode: "form"}
#@markdown - Click the ▷ button to run synthesis, static timing analysis and global placement
#@markdown - The correct answer will synthesize with a core_width of 85 microns


placement_density = 1 #@param {type:"slider", min:0, max:1.0, step:0.01}
clock_period_ps = 10000 #@param {type:"slider", min:0, max:100000, step:1}
clock_period_ns = clock_period_ps / 1000.0
core_area = 'absolute' # @param ["relative", "absolute"]

# @markdown ### core_area_relative
# @markdown compute core area from the design size
utilization_percent = 100 #@param {type:"slider", min:0, max:100, step:1}
# @markdown ### core_area_absolute
# @markdown set core area explicitly
core_width_microns = 85 #@param {type:"slider", min:0, max:1000, step:1}
core_padding_microns = 0 #@param {type:"slider", min:0, max:100, step:1}

from IPython.display import display, display_png
import IPython.display
import PIL.Image

if core_area == 'relative':
  core_area_value = RelativeCoreArea(utilization_percent)
else:
  core_area_value = AbsoluteCoreArea(core_width_microns, core_padding_microns)

tb = widgets.TabBar(['synthesis', 'netlist', 'timing', 'placement', 'area', 'power'])

# run yosys synthesis
with tb.output_to('synthesis', select=True):
  synth_results = run_synthesis()
  tb.clear_tab()

with tb.output_to('synthesis', select=False):
  grid = widgets.Grid(1, 2, header_row=False, header_column=False)
  with grid.output_to(0, 0):
    display(synth_results.cell_stats)
  with grid.output_to(0, 1):
    display(synth_results.design_stats)

# display gate level netlist
with tb.output_to('netlist', select=False):
  with synth_results.synth_v.open('r') as f:
    print(f.read())


# run opensta static timing analysis
with tb.output_to('timing', select=True):
  opensta_results = run_opensta()
  tb.clear_tab()

# display opensta report
with tb.output_to('timing', select=False):
  display(
      opensta_results.style.hide(axis='index')
      .background_gradient(subset=['delay'], cmap='Oranges')
      .bar(subset=['time'], color='lightblue')
  )

# run openroad placement
with tb.output_to('placement', select=True):
  placement_results = run_placement(
      clock_period_ps=clock_period_ps,
      placement_density=placement_density,
      core_area=core_area_value,
  )
  tb.clear_tab()

# display global placement layout
with tb.output_to('placement', select=False):
  if placement_results.openroad_global_placement_layout.exists():
    img = PIL.Image.open(placement_results.openroad_global_placement_layout)
    img = img.resize((500, 500))
    display_png(img)

# display area estimate
with tb.output_to('area', select=False):
  display(
      placement_results.area.style.format('{:.3f} μm²', subset=['area'])
      .format('{:.2f} %', subset=['utilization'])
      .bar(subset=['utilization'], color='lightblue', vmin=0, vmax=100)
  )

# display power metrics
with tb.output_to('power', select=False):
  display(
      placement_results.power.style.format('{:.3f} uW')
      .background_gradient(
          subset=pd.IndexSlice[
              placement_results.power.index[:-1], ['internal', 'switching', 'leakage']
          ],
          cmap='Oranges',
          axis=None,
      )
      .bar(subset=['total'], color='lightcoral')
      .bar(
          subset=pd.IndexSlice[placement_results.power.index[-1:], :],
          color='lightcoral',
          axis='columns',
      )
  )

## The First Step into Synchronous Programming: A Review of Finite State Machines

> The FSM description is provided here as a brief review of the topic you covered in Logic Design.

Sometimes, using a direct set of <i>input to output</i> signals is insufficient to solve the problem. Consider the case of a vending machine. Vending machines use small <a href = "https://en.wikipedia.org/wiki/Embedded_system">embedded systems</a> to keep track of how much money you have deposited, and when to vend specific items. For example, if we have paid <b>1.00</b> and we enter a quarter, the result differs depending on whether we are purchasing an item that costs <b>1.75</b>, <b>1.25</b>, or <b>1.00</b>. Coding the result for each individual item takes up a lot of space on the limited embedded machine. We can do better.
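
To make the vending machine idea concrete, here is a minimal Python sketch. It assumes hypothetical item names and stores prices in cents; the point is only that one stored value (the amount paid) serves every item, so we do not need separate logic per item.

```python
# Hypothetical items and prices in cents (illustrative values only).
PRICES = {"chips": 100, "candy": 125, "soda": 175}

def insert_coin(paid_cents, coin_cents):
    """Next-state function: the only stored state is the running total."""
    return paid_cents + coin_cents

def can_vend(paid_cents, item):
    """Output function: depends only on the current state and the selection."""
    return paid_cents >= PRICES[item]

# We have paid 1.00 and then enter a quarter.
paid = insert_coin(100, 25)
for item in PRICES:
    print(item, can_vend(paid, item))
```

The same stored total gives a different answer for each price, which is the behavior a Finite State Machine captures with states and transitions.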

> We can now define an <b>Embedded System</b> as a combination of a computer processor, memory, and input/output peripheral devices that has a dedicated function within a larger mechanical or electronic system.

We can write digital logic that acts like a graph called a <b>Finite State Machine</b>. A <b>Finite State Machine</b> consists of:<br />
<ol>
    <li>A set of states (represented as nodes)</li>
    <li>An initial state</li>
    <li>A set of transitions between states (represented by edges)</li>
    <li>A set of control input signals</li>
</ol>

Consider the Finite State Machine below. The format is <code>state_name / output</code>. For example, any time you reach the state <code>q0</code>, the output will be a <code>0</code>. Conversely, any time you reach the state <code>q3</code>, the output will be a <code>1</code>.

<b>Start State</b>: <code>q0</code><br />
<b>Control Signals</b>: <code>{1, 1, 0, 1, 0, 0, 0, 1, 1}</code><br />

Here is how you would derive the results:
<ol>
    <li>The start state for this step is <code>q0</code>, so the output is <code>0</code>, and the control input is <code>1</code>. The edge leads to <font color="red"><code>q2</code></font></li>
    <li>The start state for this step is <code>q2</code>, so the output is <code>0</code>, and the control input is <code>1</code>. The edge leads to <code>q2</code></li>
    <li>The start state for this step is <code>q2</code>, so the output is <code>0</code>, and the control input is <code>0</code>. The edge leads to <code>q4</code></li>
    <li>The start state for this step is <code>q4</code>, so the output is <code>1</code>, and the control input is <code>1</code>. The edge leads to <code>q3</code></li>
    <li>The start state for this step is <code>q3</code>, so the output is <code>1</code>, and the control input is <code>0</code>. The edge leads to <code>q4</code></li>
    <li>The start state for this step is <code>q4</code>, so the output is <code>1</code>, and the control input is <code>0</code>. The edge leads to <code>q1</code></li>
    <li>The start state for this step is <code>q1</code>, so the output is <code>0</code>, and the control input is <code>0</code>. The edge leads to <code>q1</code></li>
    <li>The start state for this step is <code>q1</code>, so the output is <code>0</code>, and the control input is <code>1</code>. The edge leads to <code>q3</code></li>
    <li>The start state for this step is <code>q3</code>, so the output is <code>1</code>, and the control input is <code>1</code>. The edge leads to <code>q2</code></li>
</ol>


<img src = "https://media.geeksforgeeks.org/wp-content/uploads/1-43.jpg" height=400 width=500>

The solution is <code>{0, 0, 0, 1, 1, 1, 0, 0, 1}</code>
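
The walk-through above can be checked with a short Python sketch. The transition table here is transcribed from the <code>fsm_logic</code> function developed later in this reading; the names <code>FSM</code> and <code>run_fsm</code> are illustrative and not part of XLS.

```python
# state -> (output, next state if control == 0, next state if control == 1)
FSM = {
    "q0": (0, "q1", "q2"),
    "q1": (0, "q1", "q3"),
    "q2": (0, "q4", "q2"),
    "q3": (1, "q4", "q2"),
    "q4": (1, "q1", "q3"),
}

def run_fsm(controls, start="q0"):
    """Return the output list and a (state, output, control, next) trace."""
    state, outputs, trace = start, [], []
    for control in controls:
        output, next0, next1 = FSM[state]
        next_state = next1 if control else next0
        outputs.append(output)
        trace.append((state, output, control, next_state))
        state = next_state
    return outputs, trace

outputs, trace = run_fsm([1, 1, 0, 1, 0, 0, 0, 1, 1])
print(outputs)
for row in trace:
    print(row)
```

Each row of the trace corresponds to one step of the derivation: the output belongs to the state we start the step in, and the control input selects the edge.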

Solution Table:

| Start State | Output | Control | Next State |
|---|---|---|---|
|q0|<font color="red">0</font>|1|q2|
|q2|<font color="red">0</font>|1|q2|
|q2|<font color="red">0</font>|0|q4|
|q4|<font color="red">1</font>|1|q3|
|q3|<font color="red">1</font>|0|q4|
|q4|<font color="red">1</font>|0|q1|
|q1|<font color="red">0</font>|0|q1|
|q1|<font color="red">0</font>|1|q3|
|q3|<font color="red">1</font>|1|q2|

## The Synchronous Programming Model

Synchronous languages have been established as a technology of choice for modeling, specifying, validating, and implementing real-time embedded applications. However, dealing with concurrency, time and causality has become increasingly difficult as design complexity increases.

### Synchronous Model Considerations
To build a safe synchronous device for a computing system, we must consider the system’s:
<ul>
  <li><b>Concurrency</b> - Multiple computations are happening at the same time</li>
  <ul>
    <li>Synchronous languages should offer a notation such as block diagrams (also called dataflow diagrams), hierarchical automata, or some imperative type of syntax</li>
  </ul>
  <li><b>Synchronicity</b> - Hardware Description Languages support synchronous processes through inter-process communication</li>
</ul>

### Combining Concurrency and Synchronicity

The <b>synchronous programming model</b> accounts for concurrency by breaking the tasks down into non-overlapping computation and communication phases triggered by a global clock. This model is beneficial because it provides a simpler way to access the power of concurrency in functional specification.
<ul>
<li>The key advantage of the synchronous programming model is that its solid mathematical foundation gives us the ability to reason formally about the operation of the system.</li>
</ul>

In a sequential system, the result of the nth atomic reaction depends on the state accumulated over the previous n-1 reactions.

Consider two concurrent functions, <b>f</b> and <b>g</b>:
<ul>
  <li>They have the same input vector U<sub>n</sub> and state X<sub>n-1</sub></li>
  <li>They are operating on the same block i</li>
</ul>

They will produce the following equations, where 𝑋<sub>𝑛</sub><sup>𝑖</sup> is the next state of the block i and 𝑌<sub>𝑛</sub><sup>𝑖</sup> is the output of block i:
<ul>
  <li>𝑋<sub>𝑛</sub><sup>𝑖</sup> = f(𝑋<sub>𝑛-1</sub><sup>𝑖</sup>, U<sub>n</sub><sup>𝑖</sup>)</li>
  <li>𝑌<sub>𝑛</sub><sup>𝑖</sup> = g(𝑋<sub>𝑛-1</sub><sup>𝑖</sup>, U<sub>n</sub><sup>𝑖</sup>)</li>
</ul>

In the image below, we see the three basic stages of synchronous computation. At stage 1, 𝑋<sub>𝑛-1</sub><sup>𝑖</sup> has been fed back to the functional block from the previous cycle. At stage 2, both U<sub>n</sub><sup>𝑖</sup> and 𝑋<sub>𝑛-1</sub><sup>𝑖</sup> are used to calculate <b>g</b>, the next logical output, and <b>f</b>, the next state. At stage 3, those outputs are produced to be used in future calculations.
<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/lec24/Synchronous%20Model%201.png?raw=true">
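
The three stages can be sketched in Python. This is only an illustration: the particular <code>f</code> (a 4-bit running sum) and <code>g</code> (an overflow flag) are assumptions chosen for the example, not functions from the reading. What matters is that both functions read the same previous state and the same input in each reaction.

```python
def f(x_prev, u):
    """Next-state function: a running sum kept to 4 bits (illustrative choice)."""
    return (x_prev + u) % 16

def g(x_prev, u):
    """Output function: flags when the 4-bit sum would overflow."""
    return 1 if x_prev + u >= 16 else 0

def react(x_prev, u):
    """One synchronous reaction: f and g both see (X_{n-1}, U_n)."""
    return f(x_prev, u), g(x_prev, u)

x = 0      # stage 1: the state fed back from the previous cycle
ys = []
for u in [5, 7, 9, 3]:
    x, y = react(x, u)   # stage 2: compute next state and output together
    ys.append(y)         # stage 3: outputs are produced for later use
print(x, ys)
```

Because <code>react</code> evaluates both functions before the state is updated, neither function can observe a half-updated state, which mirrors the separation of computation and communication phases.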

## Transaction Level Modeling

DSLX promotes testing of sequential elements through <b>inter-process communication</b>. Think of inter-process communication as treating logic blocks and architectures the way we treat I/O devices. We learned that the computer architecture and I/O devices communicate through a series of transactions based on bus width and timing.

In High-Level Synthesis, transaction-level modeling and verification follow this process in order to increase architectural design and verification productivity.
<ul>
  <li>Less detail means higher performance and easier debug</li>
  <li>Earlier testing (before RTL) means less expensive bug fixes</li>
</ul>

How is <b>Transaction-Level Modeling</b> structured? Consider the diagram below.
<ul>
  <li><b>Modules</b> are the <code>fn</code> functions we have designed so far. Modules are a container which serves as a hierarchical entity that may contain other modules or processes.</li>
  <li>A <b>Process</b> is the task performed by the module</li>
  <li>The <b>p_in/p_out</b> are the input and output signals of the module</li>
  <li>A <b>port</b> is the connector between the current module and another module.</li>
  <ul><li>In combinational logic, we used the <code>let</code> keyword to define a connecting wire</li>
  <li>In sequential logic, the port lets the synthesis tool design the transactions between modules for us, akin to how I/O polling defines how the processor and an I/O device communicate.</li></ul>
  <li>Channels connect between ports, and define the communication protocols</li>
</ul>

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/TLM%20Diagram.png?raw=true">

## Transaction-Level Modeling and Finite State Machines - DSLX procs

We use flip flops to hold intermediate stages, but they are not long-term memory elements that store the value after the clock cycle. To perform this task, we will leverage <b>DSLX Procs</b>, short for "communicating sequential processes", which are the means by which DSLX models sequential and stateful modules.

DSLX procs express stateful sequential logic similar to the <code>always_ff</code> block that we observed in the synthesized Verilog.

A proc requires:
<ul>
<li><code>--reset=reset</code> to be added to the top-level dslx magic</li>
<li><b>channels</b> - entities into which data can be sent and from which data can be received</li>
<ul><li>Channels are unidirectional data FIFOs that allow communication between procs</li>
<li>Channels materialize as a data bus with ready/valid signals in synthesis. (For those of you who are computer engineers, this concept will become important in Signals and Systems.)</li>
<li>Each channel has a send and a receive endpoint: data inserted into a channel by a <code>send</code> op can be pulled out by a <code>recv</code> op.</li></ul>
<li>A <code>config</code> function that initializes constant proc state and spawns any other dependent/child procs needed for execution.</li>
<li>A recurrent (i.e., infinitely looping) <code>next</code> function that contains the actual logic to be executed by the proc.</li>
</ul>
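
As a rough software analogy for the channel behavior described above, the sketch below models a unidirectional FIFO in Python. The <code>Channel</code> class and its method names are hypothetical stand-ins for illustration; they are not the XLS or DSLX API, and the hardware ready/valid handshake is only loosely mirrored by the <code>valid</code> check.

```python
from collections import deque

class Channel:
    """Software stand-in for a unidirectional FIFO channel (not the XLS API)."""

    def __init__(self):
        self._fifo = deque()

    def send(self, value):
        """Insert data at the send endpoint."""
        self._fifo.append(value)

    def valid(self):
        """True when data is waiting, loosely mirroring a 'valid' signal."""
        return bool(self._fifo)

    def recv(self):
        """Pull data out at the receive endpoint, in the order it was sent."""
        if not self._fifo:
            raise RuntimeError("recv on an empty channel would block")
        return self._fifo.popleft()

ch = Channel()
for bit in (1, 0, 1):
    ch.send(bit)
received = [ch.recv() for _ in range(3)]
print(received, ch.valid())
```

Data leaves in the same order it entered, and it only ever flows from the send endpoint to the receive endpoint.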

Here is our DSLX magic for the Finite State Machine we are going to build:

<code>%%dslx --top=fsm_example_proc --pipeline_stages=1 --flop_inputs=false --flop_outputs=false --clock_period_ps=4650 --reset=reset</code><br>
<code>import std;</code>


## Building our FSM example with a DSLX proc

In this section, we will develop an algorithm for designing and testing procs.

### Step 1: Develop the combinational elements of the proc.

In this case, we will use <code>match</code> statements and <code>struct STATE_RES</code> to represent the logic shown in the FSM. Note that this does not contain any memory element, just the logic between the states.

In the context of Transaction Level Modeling, this is the <b>module</b> (<code>fsm_logic</code>) that defines the process (calculating the next output and next state).

Here is our Finite State Machine that we presented earlier:<br>
<img src = "https://media.geeksforgeeks.org/wp-content/uploads/1-43.jpg" height=400 width=500>

And here is the corresponding combinational logic used to traverse the states, given an input <code>curr_state:u3</code> and a <code>control:u1</code> signal:

    struct STATE_RES{
        output:u1,
        state:u3
    }

    fn fsm_logic( curr_state:u3, control:u1 ) -> STATE_RES {

        match( curr_state ){

            // Start State 0 - Output is 0
            u3:0 => {
                // Next State for control 0 is 1
                if(control == u1:0){

                    // Output, Next State
                    STATE_RES{ output:u1:0, state:u3:1 }
                }
                // Next State is 2
                else{
                    STATE_RES{ output:u1:0, state:u3:2 }
                }
            },

            // State 1, output is 0
            u3:1 =>{
                // Next State for control 0 is 1
                if(control == u1:0){

                    // Output, Next State
                    STATE_RES{ output:u1:0, state:u3:1 }
                }
                // Next State is 3
                else{
                    STATE_RES{ output:u1:0, state:u3:3 }
                }
            },

            // State 2, output is 0
            u3:2 =>{
                // Next State for control 0 is 4
                if(control == u1:0){

                    // Output, Next State
                    STATE_RES{ output:u1:0, state:u3:4 }
                }
                // Next State is 2
                else{
                    STATE_RES{ output:u1:0, state:u3:2 }
                }
            },

            // State 3, output is 1
            u3:3 =>{
                // Next State for control 0 is 4
                if(control == u1:0){

                    // Output, Next State
                    STATE_RES{ output:u1:1, state:u3:4 }
                }
                // Next State is 2
                else{
                    STATE_RES{ output:u1:1, state:u3:2 }
                }
            },

            // State 4, output is 1 - Final Case
            _ =>{
                // Next State for control 0 is 1
                if(control == u1:0){

                    // Output, Next State
                    STATE_RES{ output:u1:1, state:u3:1 }
                }
                // Next State is 3
                else{
                    STATE_RES{ output:u1:1, state:u3:3 }
                }
            }
        }
    }
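
One detail worth checking is the wildcard arm: a <code>u3</code> can hold eight values, but the FSM has only five states, so values 5 through 7 fall into the same <code>_</code> arm as state 4. The Python sketch below, a hand transcription of the DSLX logic with illustrative names, explores which states can actually be reached from state 0.

```python
def fsm_logic(curr_state, control):
    """Python transcription of the DSLX match; returns (output, next_state)."""
    table = {
        0: (0, 1, 2),
        1: (0, 1, 3),
        2: (0, 4, 2),
        3: (1, 4, 2),
    }
    # Any other value (4 through 7) is handled like the `_` arm in DSLX.
    output, next0, next1 = table.get(curr_state, (1, 1, 3))
    return output, (next1 if control else next0)

def reachable(start=0):
    """Collect every state reachable from `start` under any control sequence."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for control in (0, 1):
            _, nxt = fsm_logic(state, control)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)

print(reachable())
```

Under this transcription, the unused encodings are never entered from the initial state, so treating them like state 4 does not change the behavior shown in the diagram.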


### Step 2: Develop the proc name and input/output channels

Next, we will define our <b>ports</b> and <b>channels</b>. The <code>proc</code> defines the module for the procedure.

The way a channel is defined in DSLX is to use the <code>chan</code> keyword, the bitwidth of the channel, and define its direction, as channels are unidirectional.

        // define control as input channel
        control: chan<u1> in;

In our example, the Finite State Machine has an input (the control signal) and an output (the generated result at each state). Note that we have not yet defined the "next" state, since we will later define the next state as a memory element internal to the proc:

    proc fsm_example_proc {

        // define control as input channel
        control: chan<u1> in;

        // define output as an output channel
        output: chan<u1> out;

### Step 3: Define the memory element

The internal memory element is defined using a procedure called <code>init</code>. In this procedure, you can define an internal memory element that will be preserved throughout the entire operation.

In this case, we have five states in our finite state machine with an initial state of 0, so we define the state as <code>u3:0</code>:

    // `init` returns the initial state (`u3:0`) for the sequential logic.
    // Initialize the state, and we will save this as the intermediate register
    init {
        u3:0
    }

### Step 4: Matching the ports to the channels

We will configure the ports and channels using the <code>config</code> procedure. The <code>config</code> procedure takes external channels as parameters and returns the channels used by the sequential logic.

The approach to defining a config in DSLX is:
<ul>
  <li>Use the <code>config</code> keyword</li>
  <li>Put the input and output channels as inputs to the config procedure. Be sure they are in the identical order as you define them at the beginning of the proc.</li>
  <li>Return them as a tuple, in that same order, so they map onto the proc's channels</li>
</ul>

    // `config` takes external channels as parameters and returns channel used by the sequential logic.
    // similar to the module input and output definitions in Verilog.
    config(control: chan<u1> in, output: chan<u1> out) {
        (control, output)
    }

At first glance, this may seem redundant, but let's consider the example problem from I/O polling where the processor is a 32-bit RISC-V CPU and the floppy drive has a 25KB/sec transfer rate over a 16-bit bus. We may use channel/port configurations to set up this communication protocol so that we write the result from the RISC-V processor to the floppy drive at the appropriate rate.

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/FSM%202.png?raw=true" height=400>

### Step 5 - Defining the internal loop with the next procedure

After the <code>config</code> procedure, we will use the <code>next</code> procedure to perform the sequential tasks. The inputs to the procedure are a token and the accumulator (the value we are storing):

<ul>
  <li><code>token</code> - Keyword used to define the next token for the clock signal</li>
  <li><code>state</code> - The state that corresponds to the <code>u3:0</code> we defined in the <code>init</code> procedure.</li>
</ul>

    // Step 5 - Defining the internal procedure
    next(tok: token, state: u3) {

Inside the procedure, we use <code>recv</code> to read a value from each input channel, passing <code>tok</code> to each operation. We will receive from each input channel separately, and then we will <code>join</code> the resulting tokens so that the receives are performed in parallel.

Let's say we had two control inputs (<code>control_1</code> and <code>control_2</code>). We would receive them into <code>control_in1</code> and <code>control_in2</code>, and then join the resulting tokens into <code>tok_c</code>.

        // receive one bit value control from the input channel.
        let (tok_a, control_in1) = recv(tok, control_1);

        // receive one bit value control from the input channel.
        let (tok_b, control_in2) = recv(tok, control_2);

        // Join all recv operation so they are performed in parallel.
        let tok_c = join(tok_a, tok_b);

In the case of our FSM example, we only need one <code>recv</code> and <code>join</code>, so our example code will appear as follows:

        // receive one bit value control from the input channel.
        let (tok_a, control_in) = recv(tok, control);

        // Join all recv operation so they are performed in parallel.
        let tok_b = join(tok_a);

Next, we perform the process itself:

        // compute the output and next state from the current state and control input.
        let result:STATE_RES = fsm_logic(state, control_in);

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/FSM%204.png?raw=true" height=400>

To produce the output, we will use the <code>send</code> operation. The format is:
<ul>
  <li><code>tok</code> - Indicates the output will occur at the next clock</li>
  <li><code>output</code> - Indicates we will connect this signal to the output channel we defined in Step 2</li>
  <li><code>result.output</code> - Indicates we will put the <code>result.output</code> value on the output channel</li>
</ul>

        // send the output bit to the output channel.
        send(tok, output, result.output);

Finally, we will place the value of <code>result.state</code> as our return, which will save it in the <code>u3</code> memory element we created in <code>init</code>:

        // return the next state as the new state.
        result.state
    }
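
To see how <code>init</code>, <code>config</code>, and <code>next</code> cooperate, here is a minimal Python model of the proc. It is a sketch under stated assumptions: <code>FsmProcModel</code> is a hypothetical class, plain <code>deque</code> objects stand in for channels, and the clock is replaced by one call to <code>next</code> per step. It is not how XLS schedules or synthesizes the proc.

```python
from collections import deque

def fsm_logic(state, control):
    """(output, next_state) for the five-state FSM; values above 3 act as state 4."""
    table = {0: (0, 1, 2), 1: (0, 1, 3), 2: (0, 4, 2), 3: (1, 4, 2)}
    output, next0, next1 = table.get(state, (1, 1, 3))
    return output, (next1 if control else next0)

class FsmProcModel:
    """Illustrative software model of the proc; not part of XLS."""

    def __init__(self, control, output):
        self.control, self.output = control, output  # role of `config`
        self.state = 0                                # role of `init`

    def next(self):
        control_in = self.control.popleft()           # role of `recv`
        out, new_state = fsm_logic(self.state, control_in)
        self.output.append(out)                       # role of `send`
        self.state = new_state                        # returned value becomes the state

control, output = deque([1, 1, 0, 1]), deque()
proc = FsmProcModel(control, output)
while control:
    proc.next()
print(list(output), proc.state)
```

The value assigned to <code>self.state</code> at the end of each call plays the part of the value returned from the DSLX <code>next</code> procedure.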

Here is a diagram connecting everything together, and the complete proc code may be found in the cell below:

> Note: we have not included a test proc yet.

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/fsm_example_proc.png?raw=true">

In [None]:
%%dslx --top=fsm_example_proc --pipeline_stages=1 --flop_inputs=false --flop_outputs=false --clock_period_ps=4650 --reset=reset

import std;

// Step 1 - Develop the combinational elements of the proc.
struct STATE_RES{
    output:u1,
    state:u3
}


fn fsm_logic( curr_state:u3, control:u1 ) -> STATE_RES {

    match( curr_state ){

        // Start State 0 - Output is 0
        u3:0 => {
            // Next State for control 0 is 1
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:0, state:u3:1 }
            }
            // Next State is 2
            else{
                STATE_RES{ output:u1:0, state:u3:2 }
            }
        },

        // State 1, output is 0
        u3:1 =>{
            // Next State for control 0 is 1
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:0, state:u3:1 }
            }
            // Next State is 3
            else{
                STATE_RES{ output:u1:0, state:u3:3 }
            }
        },

        // State 2, output is 0
        u3:2 =>{
             // Next State for control 0 is 4
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:0, state:u3:4 }
            }
            // Next State is 2
            else{
                STATE_RES{ output:u1:0, state:u3:2 }
            }
        },

        // State 3, output is 1
        u3:3 =>{
             // Next State for control 0 is 4
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:1, state:u3:4 }
            }
            // Next State is 2
            else{
                STATE_RES{ output:u1:1, state:u3:2 }
            }
        },

        // State 4, output is 1 - Final Case
        _ =>{
             // Next State for control 0 is 1
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:1, state:u3:1 }
            }
            // Next State is 3
            else{
                STATE_RES{ output:u1:1, state:u3:3 }
            }
        }
    }
}

// Step 2 - Define the proc name and input and output channels
proc fsm_example_proc {

    // define control as input channel
    control: chan<u1> in;

    // define output as an output channel
    output: chan<u1> out;

    // `init` returns the initial state (`u3:0`) for the sequential logic.
    // Initialize the state, and we will save this as the intermediate register
    // Since there are five states, we use u3 to represent all the possible state values
    init {
        u3:0
    }

    // `config` takes external channels as parameters and returns channel used by the sequential logic.
    // similar to the module input and output definitions in Verilog.
    config(control: chan<u1> in, output: chan<u1> out) {
        (control, output)
    }

    // Step 5 - Defining the internal procedure
    next(tok: token, state: u3) {

        // receive one bit value control from the input channel.
        let (tok_a, control_in) = recv(tok, control);

        // Join all recv operation so they are performed in parallel.
        let tok_b = join(tok_a);

        // compute the output and next state from the current state and control input.
        let result:STATE_RES = fsm_logic(state, control_in);

        // send the output bit to the output channel.
        send(tok, output, result.output);

        // return the next state as the new state.
        result.state
    }

}

## DUTs - Testing procs... with procs

We will create a DSLX proc in order to test the circuit we just designed. It turns out that this is our first step into the world of <b>verification engineering</b>.

A verification engineer is tasked with developing software that can simulate, anticipate and detect errors in the design that has been developed by the design engineers. They work with the design team to identify design flaws, help resolve problems and verify that the product, service or system will work as expected.

According to <a href = "https://www.talent.com/salary?job=asic+verification+engineer">Talent.com</a>, the average hardware verification engineer salary in the USA is $155,776. Almost 50% of hardware development (or, for that matter, software development) is spent in testing the designs, and yet universities spend comparatively little time teaching hardware or software verification.

> In fact, the <a href = "https://www.jtag.com/">Joint Test Action Group</a> (JTAG) standard is an industry standard for verifying designs and testing circuits after manufacture.

There are several advanced topics in verification engineering if you want to pursue independent projects that can help differentiate your resume and grow your potential career paths, including (links optional for reading assignment):
<ul>
  <li><a href = "https://en.wikipedia.org/wiki/Automatic_test_pattern_generation">Automatic Test Pattern Generation</a></li>
  <li><a href = "https://en.wikipedia.org/wiki/Scan_chain">Scan Chaining</a></li>
  <li><a href = "https://en.wikipedia.org/wiki/Boundary_scan">Boundary Scanning</a></li>
  <li><a href = "https://link.springer.com/chapter/10.1007/0-306-47504-9_7">Built-In Logic Block Observers</a></li>
</ul>

### Step 1: Develop the proc name and input/output channels in <i>reverse</i> of the proc under test.

We want to have the channels of the test match the proc we are testing. But why do we want them <i>backwards</i>? This is because our test proc (<code>proc_fsm_test</code>) is going to output the test vectors we are inputting into the proc under test, and our test proc will receive the outputs of the DUT as inputs to verify the results.

    // Step 1 - Flip the ordering of the channels of the Design Under Test
    control_test: chan<u1> out;
    output_test: chan<u1> in;


### Step 2 - Add a terminator bool as a signal to eventually stop the FSM

We will output a terminator boolean that will be true when we want to conclude the test.

    // Step 2 - Add a terminator bool
    terminator: chan<bool> out;

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/Test%201.png?raw=true" height=400>

### Step 3 - Initialize with an empty init procedure

The test proc does not need to remember anything between tests, so its <code>init</code> procedure returns an empty tuple as the state.

    // Empty init
    init { () }


### Step 4 - Develop the config procedure, but only have the terminator as the input.

We will develop a config procedure, except this time we only want the terminator as the input to config (compared to the proc under test, where we have all channels as inputs).

Next, we will develop sender, receiver pairs. Consider this example:

      // 4.2 - define a channel pair (sender, receiver) to communicate
      let (control_test, control_receiver) = chan<u1>;

We want the <code>control_test</code> to represent the signals we will send to the proc under test, and the <code>control_receiver</code> that will be connected to the proc under test itself.

Now, let's consider the opposite direction, where the <code>output_sender</code> is sending the result:

      // 4.2 - define a channel pair (sender, receiver) to communicate
      let (output_sender, output_test) = chan<u1>;

In this case, the <code>output_sender</code> is the endpoint the proc under test uses to send its output, and the <code>output_test</code> is the endpoint the testing proc receives from.

Now, we will <b>spawn</b> a proc under test using the <code>spawn</code> keyword:

    // 4-3 - Spawn a Proc that you will test
    spawn fsm_example_proc(control_receiver, output_sender);

Finally, we will configure the channels as we did previously.

      // 4-4 - Config the outputs as before
      (control_test, output_test, terminator)
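
The pairing of endpoints can be pictured with a small Python sketch. Here <code>chan</code> is a hypothetical helper that returns a send end and a receive end over one shared FIFO; it only illustrates which end stays with the test proc and which end is handed to the proc under test, and the response sent back is a placeholder rather than the real FSM.

```python
from collections import deque

def chan():
    """Return (send_end, recv_end) over one shared FIFO (illustrative only)."""
    fifo = deque()
    return fifo.append, fifo.popleft

# The test proc keeps one end of each pair; the proc under test gets the other.
control_test, control_receiver = chan()
output_sender, output_test = chan()

control_test(1)                 # test proc drives a control bit
bit = control_receiver()        # proc under test receives it
output_sender(0 if bit else 1)  # placeholder response, not the real FSM
print(output_test())            # test proc reads the response
```

Notice that the test proc sends on the control pair and receives on the output pair, which is the reverse of the directions seen by the proc under test.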

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/Test%202.png?raw=true" height=400>

### Step 5 - Write the <code>next</code> section header

The <code>next</code> section will allow us to test cases, where we will perform each test on a new clock signal, or <code>token</code>. The tuple format is <code>token, state</code>. In this case, since the test proc keeps no state of its own, the state is <code>state: ()</code>.

    // Step 5 - Use the next procedure and state to keep the test running.
    next(tok: token, state: ()) {

### Step 6- Developing the tests

There are four steps to developing a test.

#### 6.1 - Connect each input to the <code>tok</code> using the <code>send</code> function.

The <code>send</code> function returns a token as well. We use this approach in order to connect all inputs in parallel, and then join them, as we will see in step 6.2.

The format is: <code>let tok_x = send(tok, input_channel_name, value );</code>

Here is an example from our FSM:

      /////// Test 1 - Control = 1, Output - 0
      // send `1` as the control_test input.
      let tok_a = send(tok, control_test, u1:1);

#### 6.2 - Use the <code>join</code> function to combine all the <code>tok</code> return values from 6.1.

This approach allows us to bring in all the channels in parallel and time them so they get to the Design Under Test at the same time. Use the original <code>tok</code> variable to ensure they are all combined. The format is:

    let tok = join(tok_1, tok_2, tok_3, ..., tok_N);

In the case of our Finite State Machine, we only have one input channel, so we join here to ensure timing is maintained.

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/Test%203.png?raw=true" height=400>

#### 6.3 - Use the <code>recv</code> function to connect the output of the test to the input of the channel.

In the code below, <code>output_test</code> is the channel carrying the output from the Design Under Test, and <code>output_sender</code> is the value that the testing module receives from it.

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

In this case, we use <code>assert_eq</code> to compare the <code>output_sender</code> to the expected result, which for the first test in our solution table was <code>u1:0</code>:

      // Test that the output is 0
      assert_eq(output_sender, u1:0);

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/Test%204.png?raw=true" height=400>

### Step 7 - Terminate the test with the terminate signal

Since we are performing a sequential test, we must inform the test when to terminate. This command will be sent after the final test.

      /////////// Complete the test by sending tok, terminator, true
      let tok = send(tok, terminator, true);
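
The overall test flow (send a control value, receive the output, assert, and finally signal termination) can be modeled in Python as a sanity check of the expected values. This is a sketch only: <code>run_test</code> is a hypothetical helper, <code>fsm_logic</code> is a hand transcription of the DSLX function, and the returned <code>True</code> merely plays the role of the terminator signal.

```python
def fsm_logic(state, control):
    """(output, next_state) for the five-state FSM; values above 3 act as state 4."""
    table = {0: (0, 1, 2), 1: (0, 1, 3), 2: (0, 4, 2), 3: (1, 4, 2)}
    output, next0, next1 = table.get(state, (1, 1, 3))
    return output, (next1 if control else next0)

def run_test(vectors):
    """vectors is a list of (control, expected_output) pairs."""
    state = 0
    for step, (control, expected) in enumerate(vectors, start=1):
        # The design under test reacts to the control value we send.
        output, state = fsm_logic(state, control)
        # Plays the role of recv followed by assert_eq.
        assert output == expected, f"test {step}: got {output}, expected {expected}"
    return True  # plays the role of sending `true` on the terminator

controls = [1, 1, 0, 1, 0, 0, 0, 1, 1]
expected = [0, 0, 0, 1, 1, 1, 0, 0, 1]
print(run_test(list(zip(controls, expected))))
```

The control and expected values come from the solution table earlier in this reading, so the same nine vectors can be reused when writing the DSLX test proc.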

## Diagram of the testing module and the overall testing flow.

In the cell below, we have the previously presented <code>fsm_example_proc</code> as well as a new <code>proc proc_fsm_test</code>.

In the first diagram, we will study a diagram for the <code>proc proc_fsm_test</code> procedure. Notice how the output port generates the value to be sent to the Design Under Test, and the input port receives the output values from the DUT.

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/fsm_test_proc.png?raw=true" height=400>

In the second diagram, all the pieces are put together. Note the connections between the inputs and outputs of the FSM Design Under Test and the testing proc:

<img src = "https://github.com/mmorri22/cse30321/blob/main/xls/Reading%2014/fsm_proc_final.png?raw=true">

As you review the final code block below, compare the code with the diagrams above to see how the design / test flow works together.

In [None]:
%%dslx --top=fsm_example_proc --pipeline_stages=1 --flop_inputs=false --flop_outputs=false --clock_period_ps=4650 --reset=reset

import std;

// Step 1 - Develop the combinational elements of the proc.
struct STATE_RES{
    output:u1,
    state:u3
}


fn fsm_logic( curr_state:u3, control:u1 ) -> STATE_RES {

    match( curr_state ){

        // Start State 0 - Output is 0
        u3:0 => {
            // Next State for control 0 is 1
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:0, state:u3:1 }
            }
            // Next State is 2
            else{
                STATE_RES{ output:u1:0, state:u3:2 }
            }
        },

        // State 1, output is 0
        u3:1 =>{
            // Next State for control 0 is 1
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:0, state:u3:1 }
            }
            // Next State is 3
            else{
                STATE_RES{ output:u1:0, state:u3:3 }
            }
        },

        // State 2, output is 0
        u3:2 =>{
             // Next State for control 0 is 4
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:0, state:u3:4 }
            }
            // Next State is 2
            else{
                STATE_RES{ output:u1:0, state:u3:2 }
            }
        },

        // State 3, output is 1
        u3:3 =>{
             // Next State for control 0 is 4
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:1, state:u3:4 }
            }
            // Next State is 2
            else{
                STATE_RES{ output:u1:1, state:u3:2 }
            }
        },

        // State 4, output is 1 - Final Case
        _ =>{
             // Next State for control 0 is 1
            if(control == u1:0){

                // Output, Next State
                STATE_RES{ output:u1:1, state:u3:1 }
            }
            // Next State is 3
            else{
                STATE_RES{ output:u1:1, state:u3:3 }
            }
        }
    }
}

// Step 2 - Define the proc name and the input and output channels
proc fsm_example_proc {

    // define control as input channel
    control: chan<u1> in;

    // define output as an output channel
    output: chan<u1> out;

    // `init` returns the initial state, `u3:0` (State 0), for the sequential logic.
    // The state is saved in the intermediate register between activations.
    init {
        u3:0
    }

    // `config` takes external channels as parameters and returns the channels used by the sequential logic.
    // This is similar to the module input and output definitions in Verilog.
    config(control: chan<u1> in, output: chan<u1> out) {
        (control, output)
    }

    // Step 5 - Defining the internal procedure
    next(tok: token, state: u3) {

        // receive one bit value control from the input channel.
        let (tok_a, control_in) = recv(tok, control);

        // Join the token from the recv operation before continuing.
        let tok_b = join(tok_a);

        // Compute the output and the next state from the current state and the control input.
        let result:STATE_RES = fsm_logic(state, control_in);

        // send the one-bit output to the output channel.
        send(tok, output, result.output);

        // return the next state, which becomes `state` on the next activation.
        result.state
    }

}


#[test_proc]
proc proc_fsm_test {

  // define channels to communicate with the proc under test
  // Step 1 - Flip the ordering of the channels of the Design Under Test
  control_test: chan<u1> out;
  output_test: chan<u1> in;

  // Step 2 - Add a terminator bool
  terminator: chan<bool> out;

  // Empty init
  init { () }

  // 4.1 - `config` takes the `terminator` output channel as an argument.
  config(terminator: chan<bool> out) {

    // 4.2 - Outputs: define a channel pair (sender, receiver) to carry control_test values from the test proc to the DUT
    let (control_test, control_receiver) = chan<u1>;

    // 4.2 - Inputs: define a channel pair (sender, receiver) to carry the DUT's output back to the test proc as output_test
    let (output_sender, output_test) = chan<u1>;

    // 4-3 - Spawn a Proc that you will test
    spawn fsm_example_proc(control_receiver, output_sender);

    // 4-4 - Return the channels in the same order as the member declarations above
    (control_test, output_test, terminator)
  }

  // Step 5 - Use the `next` function to run the test sequence; the test proc carries no state.
  next(tok: token, state: ()) {

      /////// Test 1 - Control = 1, Output - 0
      // send `1` as the control_test input.
      let tok_a = send(tok, control_test, u1:1);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 0
      assert_eq(output_sender, u1:0);


      /////// Test 2 - Control = 1, Output - 0
      // send `1` as the control_test input.
      let tok_a = send(tok, control_test, u1:1);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 0
      assert_eq(output_sender, u1:0);


      /////// Test 3 - Control = 0, Output - 0
      // send `0` as the control_test input.
      let tok_a = send(tok, control_test, u1:0);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 0
      assert_eq(output_sender, u1:0);


      /////// Test 4 - Control = 1, Output - 1
      // send `1` as the control_test input.
      let tok_a = send(tok, control_test, u1:1);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 1
      assert_eq(output_sender, u1:1);


      /////// Test 5 - Control = 0, Output - 1
      // send `0` as the control_test input.
      let tok_a = send(tok, control_test, u1:0);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 1
      assert_eq(output_sender, u1:1);


      /////// Test 6 - Control = 0, Output - 1
      // send `0` as the control_test input.
      let tok_a = send(tok, control_test, u1:0);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 1
      assert_eq(output_sender, u1:1);


      /////// Test 7 - Control = 0, Output - 0
      // send `0` as the control_test input.
      let tok_a = send(tok, control_test, u1:0);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 0
      assert_eq(output_sender, u1:0);


      /////// Test 8 - Control = 1, Output - 0
      // send `1` as the control_test input.
      let tok_a = send(tok, control_test, u1:1);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 0
      assert_eq(output_sender, u1:0);



      /////// Test 9 - Control = 1, Output - 1
      // send `1` as the control_test input.
      let tok_a = send(tok, control_test, u1:1);

      // wait for the send to complete (and overwrite the existing `tok` binding).
      let tok = join(tok_a);

      // receive and assert the result value.
      let (tok, output_sender) = recv(tok, output_test);

      // Test that the output is 1
      assert_eq(output_sender, u1:1);


      /////////// Complete the test by sending tok, terminator, true
      let tok = send(tok, terminator, true);
  }

}



In [None]:
#@title Now let's synthesize our Finite State Machine {display-mode: "form"}
#@markdown - Click the ▷ button to run synthesis, static timing analysis and global placement
#@markdown - This 1-bit FSM synthesizes successfully with a 30 micron core width

placement_density = 1 #@param {type:"slider", min:0, max:1.0, step:0.01}
clock_period_ps = 10000 #@param {type:"slider", min:0, max:100000, step:1}
clock_period_ns = clock_period_ps / 1000.0
core_area = 'absolute' # @param ["relative", "absolute"]

# @markdown ### core_area_relative
# @markdown compute core area from the design size
utilization_percent = 100 #@param {type:"slider", min:0, max:100, step:1}
# @markdown ### core_area_absolute
# @markdown set core area explicitly
core_width_microns = 30 #@param {type:"slider", min:0, max:1000, step:1}
core_padding_microns = 0 #@param {type:"slider", min:0, max:100, step:1}

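# Imports used by this cell. RelativeCoreArea, AbsoluteCoreArea, run_synthesis,
# run_opensta and run_placement are expected to come from the start-up steps.
from IPython.display import display, display_png
import IPython.display
import PIL.Image
import pandas as pd
from google.colab import widgets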

if core_area == 'relative':
  core_area_value = RelativeCoreArea(utilization_percent)
else:
  core_area_value = AbsoluteCoreArea(core_width_microns, core_padding_microns)

tb = widgets.TabBar(['synthesis', 'netlist', 'timing', 'placement', 'area', 'power'])

# run yosys synthesis
with tb.output_to('synthesis', select=True):
  synth_results = run_synthesis()
  tb.clear_tab()

with tb.output_to('synthesis', select=False):
  grid = widgets.Grid(1, 2, header_row=False, header_column=False)
  with grid.output_to(0, 0):
    display(synth_results.cell_stats)
  with grid.output_to(0, 1):
    display(synth_results.design_stats)

# display gate level netlist
with tb.output_to('netlist', select=False):
  with synth_results.synth_v.open('r') as f:
    print(f.read())


# run opensta static timing analysis
with tb.output_to('timing', select=True):
  opensta_results = run_opensta()
  tb.clear_tab()

# display opensta report
with tb.output_to('timing', select=False):
  display(
      opensta_results.style.hide(axis='index')
      .background_gradient(subset=['delay'], cmap='Oranges')
      .bar(subset=['time'], color='lightblue')
  )

# run openroad placement
with tb.output_to('placement', select=True):
  placement_results = run_placement(
      clock_period_ps=clock_period_ps,
      placement_density=placement_density,
      core_area=core_area_value,
  )
  tb.clear_tab()

# display global placement layout
with tb.output_to('placement', select=False):
  if placement_results.openroad_global_placement_layout.exists():
    img = PIL.Image.open(placement_results.openroad_global_placement_layout)
    img = img.resize((500, 500))
    display_png(img)

# display area estimate
with tb.output_to('area', select=False):
  display(
      placement_results.area.style.format('{:.3f} μm²', subset=['area'])
      .format('{:.2f} %', subset=['utilization'])
      .bar(subset=['utilization'], color='lightblue', vmin=0, vmax=100)
  )

# display power metrics
with tb.output_to('power', select=False):
  display(
      placement_results.power.style.format('{:.3f} uW')
      .background_gradient(
          subset=pd.IndexSlice[
              placement_results.power.index[:-1], ['internal', 'switching', 'leakage']
          ],
          cmap='Oranges',
          axis=None,
      )
      .bar(subset=['total'], color='lightcoral')
      .bar(
          subset=pd.IndexSlice[placement_results.power.index[-1:], :],
          color='lightcoral',
          axis='columns',
      )
  )

# 📄 README

Like what you see? 🤝 [Contact us](https://docs.google.com/forms/d/e/1FAIpQLSd1DNMoOxxr73mkIrZXhDWd1gn-jSsL7SMQry6y_JK0caDKlg/viewform?resourcekey=0-1YtZY34PHo-vug_UmFrMQg) 💬 [Join the chat](https://chat.google.com/room/AAAA8aUpxQc?cls=4)

# 🔒 Privacy

 `%%dslx` cell execution count is tracked using [Google Analytics](https://developers.google.com/analytics).