## Agile Hardware Design
***
# Network Design Case Study

<img src="./images/chisel_logo.svg" alt="agile hardware design logo" style="width: 20%; float:right"/>

Peter Hanping Chen, based on

1. UCB Bootcamp (configuration file `load-ivy.sc`): https://github.com/freechipsproject/chisel-bootcamp/tree/master/source
2. Prof. Scott Beamer, sbeamer@ucsc.edu, CSE 228A: https://classes.soe.ucsc.edu/cse228a/Winter24/

## Plan for Today

* Sketch of progressive development plan
* Starting from a crossbar
* Ending with a parameterized network generator

## Loading The Chisel Library Into a Notebook

In [26]:
// interp.load.module(os.Path(s"${System.getProperty("user.dir")}/../resource/chisel_deps.sc"))
val path = System.getProperty("user.dir") + "/source/load-ivy.sc"
//val path = System.getProperty("user.dir") + "/source/chisel_deps.sc"
println("path: "+path)
interp.load.module(ammonite.ops.Path(java.nio.file.FileSystems.getDefault().getPath(path)))

path: /home/peter/AIU/AIU_CS800_Chisel/500_UCSC_HWD/015_Network/001_Code/source/load-ivy.sc

path: String = "/home/peter/AIU/AIU_CS800_Chisel/500_UCSC_HWD/015_Network/001_Code/source/load-ivy.sc"

In [27]:
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

## Goals for Today

* Demonstrate progressive/iterative development of a generator for an _on-chip network_
  * Focus on process over polished end result
* Design abstractions and apply _inheritance_ to reuse code
* Caveats - today's design is a network generator in spirit, but lacks:
  * support for many messages in flight
  * reasonable test infrastructure
  * comprehensive flow control, multi-beat transfers
  * deadlock avoidance, quality-of-service (QoS) guarantees

## Our Crossbar (`XBar`) Revised from Prior Lectures (1/3)

<p>
<img src="./images/xbar.svg" alt="crossbar" style="width:50%;float:left" />

## Our Crossbar (`XBar`) Revised from Prior Lectures (2/3)

In [28]:
class Message(numDests: Int, width: Int) extends Bundle {
    val addr = UInt(log2Ceil(numDests).W)
    val data = UInt(width.W)
}

class XBarIO(numIns: Int, numOuts: Int, width: Int) extends Bundle {
    val in  = Vec(numIns, Flipped(Decoupled(new Message(numOuts, width))))
    val out = Vec(numOuts, Decoupled(new Message(numOuts, width)))
}

defined class Message
defined class XBarIO

## Our Crossbar (`XBar`) Revised from Prior Lectures (3/3)

In [29]:
class XBar(numIns: Int, numOuts: Int, width: Int) extends Module {
    val io = IO(new XBarIO(numIns, numOuts, width))
    val arbs = Seq.fill(numOuts)(Module(new RRArbiter(new Message(numOuts, width), numIns)))
    for (ip <- 0 until numIns) {
        // https://www.youtube.com/watch?v=RNwI5zN3qAs
        // io.in(ip).ready := arbs.map{ _.io.in(ip).ready }.reduce { _ || _ }
        io.in(ip).ready := VecInit(arbs.map{ _.io.in(ip).ready })(io.in(ip).bits.addr)
    }
    for (op <- 0 until numOuts) {
        arbs(op).io.in.zip(io.in).foreach { case (arbIn, ioIn) =>
            arbIn.bits <> ioIn.bits
            arbIn.valid := ioIn.valid && (ioIn.bits.addr === op.U)
        }
        io.out(op) <> arbs(op).io.out
    }
}

// declaration example: new XBar(4,4,64)

defined class XBar

## Refactor Parameters with Case Classes (1/2)

In [30]:
case class XBarParams(numHosts: Int, payloadSize: Int) {
    def addrBitW: Int = log2Ceil(numHosts) // address width derived from the host count
}

class Message(p: XBarParams) extends Bundle {
    val addr = UInt(p.addrBitW.W)
    val data = UInt(p.payloadSize.W)
}

class PortIO(p: XBarParams) extends Bundle {
    val in = Flipped(Decoupled(new Message(p)))
    val out = Decoupled(new Message(p))
}

defined class XBarParams
defined class Message
defined class PortIO
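
The address width above is derived from the host count rather than passed in as a separate parameter. As a rough plain-Scala illustration (no Chisel required), the sketch below assumes `log2Ceil(n)` returns the number of bits needed to encode the values `0` to `n - 1`. `AddrWidthSketch` and `addrBits` are hypothetical names used only for this example.

```scala
// Plain-Scala sketch; AddrWidthSketch and addrBits are illustrative names,
// not part of the lecture code.
object AddrWidthSketch {
  // Assumed behaviour of log2Ceil: bits needed to encode the values 0 until numHosts.
  def addrBits(numHosts: Int): Int = {
    require(numHosts > 0, "need at least one host")
    BigInt(numHosts - 1).bitLength
  }
}
```

Host counts that are not a power of two round up, so `AddrWidthSketch.addrBits(5)` is 3. A single-host network yields a zero-width address field, which is worth keeping in mind when generating very small networks.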

## Refactor Parameters with Case Classes (2/2)

In [31]:
class XBar(p: XBarParams) extends Module {
    val io = IO(new Bundle {
        val ports = Vec(p.numHosts, new PortIO(p))
    })
    val arbs = Seq.fill(p.numHosts)(Module(new RRArbiter(new Message(p), p.numHosts)))
    for (ip <- 0 until p.numHosts) {
        io.ports(ip).in.ready := VecInit(arbs.map{ _.io.in(ip).ready })(io.ports(ip).in.bits.addr)
    }
    for (op <- 0 until p.numHosts) {
        arbs(op).io.in.zip(io.ports).foreach { case (arbIn, port) =>
            arbIn.bits <> port.in.bits
            arbIn.valid := port.in.valid && (port.in.bits.addr === op.U)
        }
        io.ports(op).out <> arbs(op).io.out
    }
}

// declaration example: new XBar(XBarParams(4,64))

defined class XBar

## Template Payload Data Type (1/2)

In [32]:
case class XBarParams[T <: chisel3.Data](numHosts: Int, payloadT: T) {
    def addrBitW: Int = log2Ceil(numHosts) // address width derived from the host count
}

class Message[T <: chisel3.Data](p: XBarParams[T]) extends Bundle {
    val addr = UInt(p.addrBitW.W)
    val data = p.payloadT
}

class PortIO[T <: chisel3.Data](p: XBarParams[T]) extends Bundle {
    val in = Flipped(Decoupled(new Message(p)))
    val out = Decoupled(new Message(p))
}

defined class XBarParams
defined class Message
defined class PortIO

## Template Payload Data Type (2/2)

In [33]:
class XBar[T <: chisel3.Data](p: XBarParams[T]) extends Module {
    val io = IO(new Bundle {
        val ports = Vec(p.numHosts, new PortIO(p))
    })
    val arbs = Seq.fill(p.numHosts)(Module(new RRArbiter(new Message(p), p.numHosts)))
    for (ip <- 0 until p.numHosts) {
        io.ports(ip).in.ready := VecInit(arbs.map{ _.io.in(ip).ready })(io.ports(ip).in.bits.addr)
    }
    for (op <- 0 until p.numHosts) {
        arbs(op).io.in.zip(io.ports).foreach { case (arbIn, port) =>
            arbIn.bits <> port.in.bits
            arbIn.valid := port.in.valid && (port.in.bits.addr === op.U)
        }
        io.ports(op).out <> arbs(op).io.out
    }
}

// declaration example: new XBar(XBarParams(4,UInt(64.W)))

defined class XBar

## Need for Multi-hop Networks

* A crossbar only scales so far (here it needs `numHosts` arbiters with `numHosts` inputs each), so larger systems need a _multi-hop_ interconnect
* Sending messages over multiple hops requires _routing_ each message to the right next hop

### Moving to a Ring Network
* A _ring network_ is a simple one-dimensional topology
* _Routing:_ (for now) if not at destination, send to next hop
* _Plan:_ will develop independently first, then will look for commonality with `XBar`

<img src="./images/ring1.svg" alt="1-way ring network" style="width:50%;float:left" />
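
The routing rule above ("if not at destination, send to next hop") can be tried out in software before writing any hardware. The following is a minimal plain-Scala sketch, assuming routers are numbered `0` to `numHosts - 1` and that messages always travel towards the next higher id, wrapping around at the end. `OneWayRing`, `hops`, and `path` are hypothetical names, and treating a self-addressed message as zero hops is an assumption of this model.

```scala
// Software-only model of a one-way ring; names are illustrative, not from the lecture code.
object OneWayRing {
  // Router-to-router hops from src to dest when traffic only moves towards higher ids.
  def hops(src: Int, dest: Int, numHosts: Int): Int = {
    require(numHosts > 0 && src >= 0 && src < numHosts && dest >= 0 && dest < numHosts)
    (dest - src + numHosts) % numHosts
  }

  // Routers visited, applying "if not at destination, send to next hop" at each step.
  def path(src: Int, dest: Int, numHosts: Int): List[Int] =
    if (src == dest) List(src)
    else src :: path((src + 1) % numHosts, dest, numHosts)
}
```

In this model a message from router 1 to router 0 in an 8-host ring needs 7 hops, even though the two routers are adjacent.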

## First Implementation of a Ring Network

In [34]:
class RingRouter[T <: chisel3.Data](p: XBarParams[T], id: Int) extends Module {
    val io = IO(new Bundle{
        val in = Flipped(Decoupled(new Message(p)))
        val out = Decoupled(new Message(p))
        val host = new PortIO(p)
    })
    val forMe = io.in.bits.addr === id.U
    // INCOMPLETE, but gives spirit
    io.host.in.ready := io.out.ready
    io.host.out.valid := forMe && io.in.valid
    io.host.out.bits := io.in.bits
    io.in.ready := forMe && io.host.out.ready || io.out.ready
    io.out.valid := (io.in.fire && !forMe) || io.host.in.fire
    io.out.bits := Mux(io.host.in.fire, io.host.in.bits, io.in.bits)
}

class RingNetwork[T <: chisel3.Data](p: XBarParams[T]) extends Module {
    val io = IO(new Bundle {
        val ports = Vec(p.numHosts, new PortIO(p))
    })
    val routers = Seq.tabulate(p.numHosts){ id => Module(new RingRouter(p, id)) } // submodules must be wrapped in Module(...)
    routers.foldLeft(routers.last){ (prev, curr) => prev.io.out <> curr.io.in; curr}
    routers.zip(io.ports).foreach { case (router, port) => router.io.host <> port}
}

defined class RingRouter
defined class RingNetwork

## Looking for Commonality between `XBar` & `RingNetwork`

* For users, choosing one or the other requires some code changes
* _Commonality:_ both provide abstraction of network with decoupled bidirectional ports (interface)

<img src="./images/ring1.svg" alt="1-way ring network" style="width:50%;float:left" />
<img src="./images/xbar.svg" alt="crossbar" style="width:50%;float:right" />

## Generic Port for Network

In [35]:
case class NetworkParams[T <: chisel3.Data](numHosts: Int, payloadT: T) {
    def addrBitW: Int = log2Ceil(numHosts) // address width derived from the host count
}

class Message[T <: chisel3.Data](p: NetworkParams[T]) extends Bundle {
    val addr = UInt(p.addrBitW.W)
    val data = p.payloadT
}

class PortIO[T <: chisel3.Data](p: NetworkParams[T]) extends Bundle {
    val in = Flipped(Decoupled(new Message(p)))
    val out = Decoupled(new Message(p))
}

abstract class Network[T <: chisel3.Data](p: NetworkParams[T]) extends Module {
    val io = IO(new Bundle {
        val ports = Vec(p.numHosts, new PortIO(p))
    })
}

defined class NetworkParams
defined class Message
defined class PortIO
defined class Network

## Redo `XBar` with Inherited Interface

In [36]:
class XBar[T <: chisel3.Data](p: NetworkParams[T]) extends Network[T](p) {
    val arbs = Seq.fill(p.numHosts)(Module(new RRArbiter(new Message(p), p.numHosts)))
    for (ip <- 0 until p.numHosts) {
        io.ports(ip).in.ready := VecInit(arbs.map{ _.io.in(ip).ready })(io.ports(ip).in.bits.addr)
    }
    for (op <- 0 until p.numHosts) {
        arbs(op).io.in.zip(io.ports).foreach { case (arbIn, port) =>
            arbIn.bits <> port.in.bits
            arbIn.valid := port.in.valid && (port.in.bits.addr === op.U)
        }
        io.ports(op).out <> arbs(op).io.out
    }
}

// declaration example: new XBar(NetworkParams(4,UInt(64.W)))

defined class XBar

## Redo `RingNetwork` with Inherited Interface

In [37]:
class RingRouter[T <: chisel3.Data](p: NetworkParams[T], id: Int) extends Module {
    val io = IO(new Bundle{
        val in = Flipped(Decoupled(new Message(p)))
        val out = Decoupled(new Message(p))
        val host = new PortIO(p)
    })
    val forMe = io.in.bits.addr === id.U
    // INCOMPLETE, but gives spirit
    io.host.in.ready := io.out.ready
    io.host.out.valid := forMe && io.in.valid
    io.host.out.bits := io.in.bits
    io.in.ready := forMe && io.host.out.ready || io.out.ready
    io.out.valid := (io.in.fire && !forMe) || io.host.in.fire
    io.out.bits := Mux(io.host.in.fire, io.host.in.bits, io.in.bits)
}

class RingNetwork[T <: chisel3.Data](p: NetworkParams[T]) extends Network[T](p) {
    val routers = Seq.tabulate(p.numHosts){ id => Module(new RingRouter(p, id)) } // submodules must be wrapped in Module(...)
    routers.foldLeft(routers.last){ (prev, curr) => prev.io.out <> curr.io.in; curr}
    routers.zip(io.ports).foreach { case (router, port) => router.io.host <> port}
}

defined class RingRouter
defined class RingNetwork

## Assessing Revised `RingNetwork`

* Parameterized number of hosts √
* Parameterized data type √
* _Poor performance:_ messages travel in one direction only, so they may take long routes
* _Missing:_ graceful interchangeability with `XBar`

## Improve Ring by Sending Message in Shorter Direction

* Make links between routers _bidirectional_ and send each message in the direction with fewer hops
  * Reduces the number of hops
  * Complicates deadlock avoidance (among other things), but we will overlook that today

<img src="./images/ring2.svg" alt="2-way ring network" style="width:50%;align:left" />
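
To get a feel for how much the bidirectional links help, hop counts can be compared in software. This is a small plain-Scala sketch under the assumption that routers are numbered `0` to `n - 1` around the ring. `RingHopsSketch` and its methods are hypothetical names introduced only for this comparison.

```scala
// Plain-Scala comparison of hop counts; all names here are illustrative.
object RingHopsSketch {
  // Hops when travelling towards higher ids (with wrap-around).
  def upHops(src: Int, dest: Int, n: Int): Int = (dest - src + n) % n
  // Hops when travelling towards lower ids (with wrap-around).
  def downHops(src: Int, dest: Int, n: Int): Int = (src - dest + n) % n

  // One-way ring: only the "up" direction is available.
  def oneWay(src: Int, dest: Int, n: Int): Int = upHops(src, dest, n)
  // Two-way ring: take whichever direction is shorter.
  def twoWay(src: Int, dest: Int, n: Int): Int =
    math.min(upHops(src, dest, n), downHops(src, dest, n))

  // Largest hop count over every source/destination pair.
  def worstCase(n: Int, hops: (Int, Int, Int) => Int): Int =
    (for (s <- 0 until n; d <- 0 until n) yield hops(s, d, n)).max
}
```

For an 8-host ring, `RingHopsSketch.worstCase(8, RingHopsSketch.oneWay)` is 7, while `RingHopsSketch.worstCase(8, RingHopsSketch.twoWay)` is 4.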

## Improve Ring by Refactoring Router

* Recognize opportunity for _reuse_
  * Router (internally) is basically a crossbar (switch) with routing logic
  * _Routing logic:_ current router & destination address -> next port

<img src="./images/ringrouter.svg" alt="ring router" style="width:70%;align:left" />

## `RingRouter` Revised for Bidirectional & Reuse of `XBar`

In [38]:
class RingRouter[T <: chisel3.Data](p: NetworkParams[T], id: Int) extends Module {
    val io = IO(new Bundle{
        val ports = Vec(3, new PortIO(p)) // port(0) for left, port(1) for right, port(2) for host
    })

    def nextHop(destAddr: UInt): UInt = { // routing logic
        val distTowards0 = Mux(destAddr < id.U, id.U - destAddr, id.U + (p.numHosts.U - destAddr))
        val distTowards1 = Mux(destAddr > id.U, destAddr - id.U, (p.numHosts.U - id.U) + destAddr)
        Mux(destAddr === id.U, 2.U, Mux(distTowards0 < distTowards1, 0.U, 1.U))
    }

    val xbarParams = NetworkParams(3, new Message(p))
    val xbar = Module(new XBar(xbarParams)) // submodules must be wrapped in Module(...)
    val portsRouted = io.ports map { port =>  
        val routed = Wire(new PortIO(xbarParams))
        // INCOMPLETE, need to connect ready & valids
        routed.in.bits.addr := nextHop(port.in.bits.addr)
        routed.in.bits.data := port.in.bits
        port.out.bits := routed.out.bits.data
        routed
    }

    portsRouted.zip(xbar.io.ports).foreach{ case (extPort, xbarPort) => extPort <> xbarPort }
}

class RingNetwork[T <: chisel3.Data](p: NetworkParams[T]) extends Network[T](p) {
    val routers = Seq.tabulate(p.numHosts){ id => Module(new RingRouter(p, id)) } // submodules must be wrapped in Module(...)
    routers.foldLeft(routers.last){ (prev, curr) => prev.io.ports(1) <> curr.io.ports(0); curr }
    routers.zip(io.ports).foreach { case (router, port) => router.io.ports(2) <> port}
}

defined class RingRouter
defined class RingNetwork
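
Because `nextHop` is pure combinational logic, it can be checked with a software model before any simulation. The sketch below mirrors the `Mux` structure of `nextHop` in plain Scala and follows a message from router to router. It assumes, as in the `RingNetwork` wiring above, that port 1 leads to the next higher router id and port 0 to the next lower one. `RingRoutingModel`, `route`, and the `Port*` constants are hypothetical names.

```scala
// Software model of the ring routing logic; names are illustrative.
object RingRoutingModel {
  val PortLeft = 0  // towards the next lower router id
  val PortRight = 1 // towards the next higher router id
  val PortHost = 2  // deliver to the attached host

  // Same decision structure as the hardware nextHop, using Ints instead of UInts.
  def nextHop(id: Int, dest: Int, n: Int): Int = {
    val distTowards0 = if (dest < id) id - dest else id + (n - dest)
    val distTowards1 = if (dest > id) dest - id else (n - id) + dest
    if (dest == id) PortHost
    else if (distTowards0 < distTowards1) PortLeft
    else PortRight
  }

  // Routers visited from src to dest when every router applies nextHop.
  def route(src: Int, dest: Int, n: Int): List[Int] = {
    val port = nextHop(src, dest, n)
    if (port == PortHost) List(src)
    else if (port == PortLeft) src :: route((src - 1 + n) % n, dest, n)
    else src :: route((src + 1) % n, dest, n)
  }
}
```

In an 8-host ring, `RingRoutingModel.route(0, 6, 8)` visits routers 0, 7, 6, taking the shorter way around. When both directions are equally long, this logic sends the message out of port 1.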

## Assessing Revised `RingNetwork`

* Parameterized number of hosts √
* Parameterized data type √
* Sends messages in shorter direction √
* _Missing:_ graceful interchangeability with `XBar`

## Making a `Network` Factory (1/3)

* Can make specific sub-types for each topology

In [39]:
abstract class NetworkParams[T <: chisel3.Data] {
    def numHosts: Int
    def payloadT: T
    val addrBitW = log2Ceil(numHosts)
}

case class XBarParams[T <: chisel3.Data](numHosts: Int, payloadT: T) extends NetworkParams[T]

case class RingParams[T <: chisel3.Data](numHosts: Int, payloadT: T) extends NetworkParams[T]

class Message[T <: chisel3.Data](p: NetworkParams[T]) extends Bundle {
    val addr = UInt(p.addrBitW.W)
    val data = p.payloadT
}

class PortIO[T <: chisel3.Data](p: NetworkParams[T]) extends Bundle {
    val in = Flipped(Decoupled(new Message(p)))
    val out = Decoupled(new Message(p))
}

abstract class Network[T <: chisel3.Data](p: NetworkParams[T]) extends Module {
    val io = IO(new Bundle {
        val ports = Vec(p.numHosts, new PortIO(p))
    })
}

defined class NetworkParams
defined class XBarParams
defined class RingParams
defined class Message
defined class PortIO
defined class Network

## Making a `Network` Factory (2/3)

* Revise generators to use params specific to their type

In [40]:
class XBar[T <: chisel3.Data](p: XBarParams[T]) extends Network[T](p) {
    val arbs = Seq.fill(p.numHosts)(Module(new RRArbiter(new Message(p), p.numHosts)))
    for (ip <- 0 until p.numHosts) {
        io.ports(ip).in.ready := VecInit(arbs.map{ _.io.in(ip).ready })(io.ports(ip).in.bits.addr)
    }
    for (op <- 0 until p.numHosts) {
        arbs(op).io.in.zip(io.ports).foreach { case (arbIn, port) =>
            arbIn.bits <> port.in.bits
            arbIn.valid := port.in.valid && (port.in.bits.addr === op.U)
        }
        io.ports(op).out <> arbs(op).io.out
    }
}

class RingRouter[T <: chisel3.Data](p: RingParams[T], id: Int) extends Module {
    val io = IO(new Bundle{
        val ports = Vec(3, new PortIO(p)) // port(2) for host
    })

    val xbarParams = XBarParams(3, new Message(p))
    val xbar = Module(new XBar(xbarParams)) // submodules must be wrapped in Module(...)

    def nextHop(destAddr: UInt): UInt = {
        val distTowards0 = Mux(destAddr < id.U, id.U - destAddr, id.U + (p.numHosts.U - destAddr))
        val distTowards1 = Mux(destAddr > id.U, destAddr - id.U, (p.numHosts.U - id.U) + destAddr)
        Mux(destAddr === id.U, 2.U, Mux(distTowards0 < distTowards1, 0.U, 1.U))
    }
    val portsRouted = io.ports map { port =>  
        val routed = Wire(new PortIO(xbarParams))
        // INCOMPLETE, need to connect ready & valids
        routed.in.bits.addr := nextHop(port.in.bits.addr)
        routed.in.bits.data := port.in.bits
        port.out.bits := routed.out.bits.data
        routed
    }

    portsRouted.zip(xbar.io.ports).foreach{ case (extPort, xbarPort) => extPort <> xbarPort }
}

class RingNetwork[T <: chisel3.Data](p: RingParams[T]) extends Network[T](p) {
    val routers = Seq.tabulate(p.numHosts){ id => Module(new RingRouter(p, id)) } // submodules must be wrapped in Module(...)
    routers.foldLeft(routers.last){ (prev, curr) => prev.io.ports(1) <> curr.io.ports(0); curr }
    routers.zip(io.ports).foreach { case (router, port) => router.io.ports(2) <> port}
}

defined class XBar
defined class RingRouter
defined class RingNetwork

## Making a `Network` Factory (3/3)

* Can pattern match on params for type

In [41]:
object Network {
    def apply[T <: chisel3.Data](p: NetworkParams[T]): Network[T] = p match {
        case xp: XBarParams[T] => new XBar(xp)
        case rp: RingParams[T] => new RingNetwork(rp)
    }
}

// usage inside a parent module: Module(Network(XBarParams(...)))

defined object Network

## Assessing Revised `RingNetwork`

* Parameterized number of hosts √
* Parameterized data type √
* Sends messages in shorter direction √
* Graceful interchangeability with `XBar` √

## Let's Add More Network Topologies

* What about a _mesh_ or a _torus_ instead of just a ring?
* Can we share components between these networks?
* Common abstractions:
  * Router (including routing logic)
  * Router interconnections

## Torus Topology

<img src="./images/torus.svg" alt="torus network" style="width:60%; align:left" />

## Generalizing Router

In [42]:
abstract class Router[T <: chisel3.Data] (p: NetworkParams[T], numPorts: Int, id: Int) extends Module {
    val io = IO(new Bundle{
        val ports = Vec(numPorts, new PortIO(p))
        // convention: last port is for attached host
    })

    def nextHop(destAddr: UInt): UInt
    
    val xbarParams = XBarParams(numPorts, new Message(p))
    val xbar = Module(new XBar(xbarParams)) // submodules must be wrapped in Module(...)
    val portsRouted = io.ports map { port =>  
        val routed = Wire(new PortIO(xbarParams))
        // INCOMPLETE, need to connect ready & valids
        routed.in.bits.addr := nextHop(port.in.bits.addr)
        routed.in.bits.data := port.in.bits
        port.out.bits := routed.out.bits.data
        routed
    }
    portsRouted.zip(xbar.io.ports).foreach{ case (extPort, xbarPort) => extPort <> xbarPort }
}

abstract class MultiHopNetwork[T <: chisel3.Data](p: NetworkParams[T]) extends Network[T](p) {
    val routers: Seq[Router[T]] // subclasses should make this a lazy val: it is used below, during this constructor
    def connectRouters(): Unit
    connectRouters()
    routers.zip(io.ports).foreach { case (router, port) => router.io.ports.last <> port}
}

defined class Router
defined class MultiHopNetwork

## `RingNetwork` Revised with `MultiHopNetwork`

In [43]:
class RingRouter[T <: chisel3.Data](p: RingParams[T], id: Int) extends Router[T](p,3,id) {
    def nextHop(destAddr: UInt): UInt = {
        val distTowards0 = Mux(destAddr < id.U, id.U - destAddr, id.U + (p.numHosts.U - destAddr))
        val distTowards1 = Mux(destAddr > id.U, destAddr - id.U, (p.numHosts.U - id.U) + destAddr)
        Mux(destAddr === id.U, 2.U, Mux(distTowards0 < distTowards1, 0.U, 1.U))
    }
}

class RingNetwork[T <: chisel3.Data](p: RingParams[T]) extends MultiHopNetwork[T](p) {
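    // lazy: the MultiHopNetwork constructor calls connectRouters() before this class's own fields are initialized
    lazy val routers = Seq.tabulate(p.numHosts){ id => Module(new RingRouter(p, id)) }
    def connectRouters(): Unit = {
        routers.foldLeft(routers.last){ (prev, curr) => prev.io.ports(1) <> curr.io.ports(0); curr }
    }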
}

defined class RingRouter
defined class RingNetwork

## What About a 2D Torus?

In [44]:
case class TorusParams[T <: chisel3.Data](numHosts: Int, payloadT: T, numRows: Int) extends NetworkParams[T] {
    require(numHosts % numRows == 0)
    val numCols = numHosts / numRows
}

class TorusRouter[T <: chisel3.Data](p: TorusParams[T], id: Int) extends Router[T](p,5,id) {
    def nextHop(destAddr: UInt): UInt = {
        // FILL IN routing logic, e.g. dimension-ordered routing
        destAddr // INCORRECT, but will allow to compile
    }
}

class TorusNetwork[T <: chisel3.Data](p: TorusParams[T]) extends MultiHopNetwork[T](p) {
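    // lazy: the MultiHopNetwork constructor calls connectRouters() before this class's own fields are initialized
    lazy val routers = Seq.tabulate(p.numHosts){ id => Module(new TorusRouter(p, id)) }
    def connectRouters(): Unit = {
        // FILL IN 2D connectivity
    }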
}

defined class TorusParams
defined class TorusRouter
defined class TorusNetwork
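
The `nextHop` placeholder above mentions dimension-ordered routing. One way to explore that idea before writing the hardware is a plain-Scala model, sketched below under several assumptions that are not in the source: routers are numbered in row-major order (`id = row * numCols + col`), ports are numbered west 0, east 1, north 2, south 3 with the host last at 4, and the column is corrected before the row. `TorusRoutingModel` and its members are hypothetical names.

```scala
// Software model of dimension-ordered routing on a 2D torus; names and port
// numbering are assumptions for illustration only.
object TorusRoutingModel {
  val West = 0
  val East = 1
  val North = 2
  val South = 3
  val Host = 4 // the source's convention: last port is the attached host

  // Shorter way around a ring of `size` positions: +1, -1, or 0 if already there.
  def step(cur: Int, dest: Int, size: Int): Int = {
    val up = (dest - cur + size) % size
    val down = (cur - dest + size) % size
    if (up == 0) 0 else if (up <= down) 1 else -1
  }

  // Dimension-ordered routing: fix the column first, then the row.
  def nextHop(id: Int, dest: Int, numRows: Int, numCols: Int): Int = {
    val colStep = step(id % numCols, dest % numCols, numCols)
    val rowStep = step(id / numCols, dest / numCols, numRows)
    if (colStep == 1) East
    else if (colStep == -1) West
    else if (rowStep == 1) South
    else if (rowStep == -1) North
    else Host
  }

  // Routers visited from src to dest when every router applies nextHop.
  def route(src: Int, dest: Int, numRows: Int, numCols: Int): List[Int] = {
    val (row, col) = (src / numCols, src % numCols)
    // Row-major id of a (possibly wrapped-around) grid position.
    def at(r: Int, c: Int): Int =
      Math.floorMod(r, numRows) * numCols + Math.floorMod(c, numCols)
    val port = nextHop(src, dest, numRows, numCols)
    if (port == Host) List(src)
    else {
      val neighbor =
        if (port == East) at(row, col + 1)
        else if (port == West) at(row, col - 1)
        else if (port == South) at(row + 1, col)
        else at(row - 1, col)
      src :: route(neighbor, dest, numRows, numCols)
    }
  }
}
```

On a 4x4 torus, `TorusRoutingModel.route(0, 15, 4, 4)` visits routers 0, 3, 15: one wrap-around hop west, then one wrap-around hop north. A hardware `nextHop` would need the same decisions expressed with `Mux` on `UInt` values, plus a `connectRouters()` wiring that agrees with whatever port numbering is chosen.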

## We Did It!

* Reused common components between network types via _inheritance_
  * Inherited interfaces as well as standard connections
  * Each network focuses on what makes it unique
  * Used case classes to pass around parameters
* Can even integrate behind a factory

In [45]:
object Network {
    def apply[T <: chisel3.Data](p: NetworkParams[T]): Network[T] = p match {
        case xp: XBarParams[T] => new XBar(xp)
        case rp: RingParams[T] => new RingNetwork(rp)
//         case TorusParams(numHosts, payloadT, 1) => new RingNetwork(RingParams(numHosts, payloadT))
        case tp: TorusParams[T] => new TorusNetwork(tp)
    }
}

// usage inside a parent module: Module(Network(TorusParams(16, UInt(128.W), 4)))

defined object Network
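
The factory's dispatch logic can be tried out without elaborating any hardware. The plain-Scala sketch below uses stand-in parameter classes (`XBarP`, `RingP`, `TorusP`, all hypothetical) and returns a description of the generator that would be chosen instead of a module. It also includes the special case hinted at by the commented-out line above, where a one-row torus is built as a ring. Making the base trait `sealed` is a design choice of this sketch, not something the lecture code does; it lets the compiler warn when a match forgets a topology.

```scala
// Plain-Scala stand-in for the Network factory; all names are illustrative.
object FactorySketch {
  // sealed: every subtype lives here, so the compiler can check matches for completeness.
  sealed trait Params { def numHosts: Int }
  final case class XBarP(numHosts: Int) extends Params
  final case class RingP(numHosts: Int) extends Params
  final case class TorusP(numHosts: Int, numRows: Int) extends Params {
    require(numHosts % numRows == 0, "hosts must fill every row")
    val numCols: Int = numHosts / numRows
  }

  // Stand-in for constructing hardware: report which generator would be picked.
  def choose(p: Params): String = p match {
    case XBarP(n)     => s"XBar($n)"
    case RingP(n)     => s"RingNetwork($n)"
    case TorusP(n, 1) => s"RingNetwork($n)" // a one-row torus is built as a ring
    case t: TorusP    => s"TorusNetwork(${t.numRows}x${t.numCols})"
  }
}
```

Case order matters here: the one-row pattern must come before the general `TorusP` case, otherwise it would never be reached.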

## Takeaways

* With progressive design, don't be afraid to start with something specific and concrete
  * Generalize when there is more than one instance
* Keep an eye out for _reuse_ opportunities
  * Copying & pasting (to start a module) is a sign there may be significant overlap
* _Inheritance_ is a powerful tool to reuse implementations and interfaces
* Can apply generics (templating) to increase flexibility