![Chisel](https://chisel.eecs.berkeley.edu/assets/img/chisel_64.png)

# Module 4: Advanced Chisel

#### Written by Paul Rigge (rigge@berkeley.edu)

### Introduction

Chisel is a framework that helps users write hardware generators.
The idea is to encode a designer's methodology into a program that can be used to create many variants of a circuit.
Some generators are very narrow in scope and can be used to generate a small set of designs, for example an adder parameterized by the width of its operands.
Other generators are very broad in scope and can generate circuits with a wide range of architectures, for example a rocket core that can either be in-order or out-of-order.

Most of the popular HDLs have some mechanisms for writing generators, but they are often difficult to use to write sophisticated generators because of limitations of the language.
Chisel can make writing sophisticated generators much easier because it is hosted in Scala.
This allows generator writers to use the powerful language features of Scala and the software development practices they enable that are not possible in HDLs.

This module will cover:

  - Writing parameterized modules
  - Writing parameterized IOs, and advanced IO for generators
  - Miscellaneous useful things like multiple clock domains, Verilog black boxes, and more

In [1]:
import $ivy.`edu.berkeley.cs::chisel3:3.0-SNAPSHOT_2017-07-19` 
import $ivy.`edu.berkeley.cs::chisel-iotesters:1.1-SNAPSHOT_2017-07-19`
import $ivy.`org.scalanlp::breeze:0.13.2`
import chisel3._
import chisel3.iotesters.{ChiselFlatSpec, Driver, PeekPokeTester, TesterOptionsManager}
import chisel3.util._

// Don't worry about understanding the code below. This is a pure Scala (no Chisel) implementation
// for finding primitive polynomials
// Based on Saxena & McCluskey, "Primitive Polynomial Generation Algorithms: Implementation and Performance Analysis" (2004)
// http://crc.stanford.edu/crc_papers/CRC-TR-04-03.pdf
object Galois {
    def maxForDegree(n: Int): Long = {
        var max: Long = 1
        for (i <- 1 to n) {
            max *= 2
        }
        max - 1
    }
    def gp(degree: Int, l: Option[Int] = None, d: Option[Seq[Int]] = None): Seq[Int] = {
        val myL = l.getOrElse(degree - 1)
        val myD = d.getOrElse(scala.collection.mutable.ArrayBuffer.fill(degree + 1)(1))
        
        if (myL == 0) visit(myD) match {
            case Some(d) => d
            case _ => Seq()
        } else {
            val d0 = myD.updated(myL, 0)
            val d1 = myD.updated(myL, 1)
            val try0 = gp(degree, Some(myL - 1), Some(d0))
            if (try0.length > 0) return try0
            val try1 = gp(degree, Some(myL - 1), Some(d1))
            return try1
        }
        
    }
    def visit(d: Seq[Int]): Option[Seq[Int]] = {
        // println(s"visit() called on ${d.toString}")
        val n = d.length
        val max = maxForDegree(n)
        var f: Boolean = true
        var c: Long = 0
        var t: Int = 0
        val s = scala.collection.mutable.ArrayBuffer.fill(n)(1)
        do {
            c += 1
            t = 0
            for (i <- 0 until n) {
                t = (t ^ (s(i) & d(i)))
            }
            for (i <- 0 until n - 1) {
                s.update(i, s(i+1))
            }
            s.update(n-1, t)
            f = s.exists(_ == 0)
        } while (f)
        if (c == max) {
            Some(d)
        } else {
            None
        }
    }
}

defined object Galois

## Module Parameterization

### Simple Parameterization

An important building block to writing hardware generators is writing a parameterized module.
Chisel `Module`s are implemented as Scala classes, and any Scala objects can be used as parameters to a `Module`.

Providing widths and vector sizes is the simplest style of parameterization and is commonly done in Verilog.
The following code block gives examples of this style of parameterization in Chisel.

Notice the use of `require()`.
Some values of a parameter may be nonsensical or unsupported by the generator.
`require()` lets the generator author state a precondition that is checked at elaboration time (when the Scala program that builds the circuit runs), with a message explaining what was wrong.
Note what happens when you change the parameter values in the two `Driver(...)` calls at the end of the cell.
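
Before the Chisel cell, the behaviour of `require()` on its own can be sketched in plain Scala. The class `WidthParams` and the object `RequireDemo` below are hypothetical stand-ins that only model the parameter checks; no hardware is built.

```scala
import scala.util.Try

// Hypothetical stand-in for a parameterized generator: only the parameter
// checks are modeled here, no hardware is built.
class WidthParams(val inWidth: Int, val outWidth: Int) {
  require(inWidth > 0 && outWidth > 0, s"Widths should be positive, got $inWidth and $outWidth")
  require(outWidth >= inWidth, s"Output width should not be smaller than input width ($outWidth < $inWidth)")
}

object RequireDemo {
  // Legal parameters: construction succeeds.
  val ok = Try(new WidthParams(3, 4))
  // Illegal parameters: require() throws an IllegalArgumentException
  // as soon as the constructor runs.
  val bad = Try(new WidthParams(4, 3))
  val badMessage: String = bad.failed.map(_.getMessage).getOrElse("")
}
```

The failure happens while the Scala object is being constructed, which for a Chisel `Module` means during elaboration, before any FIRRTL or Verilog is produced.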

In [2]:
class Adder(inWidth: Int, outWidth: Int) extends Module {
    require(inWidth > 0 && outWidth > 0, s"Widths should be positive, got $inWidth and $outWidth")
    require (outWidth >= inWidth, s"Output width should not be smaller than input width ($outWidth < $inWidth)")
    
    val io = IO(new Bundle {
        val in0 = Input(UInt(inWidth.W))
        val in1 = Input(UInt(inWidth.W))
        val out = Output(UInt(outWidth.W))
    })
    
    io.out := io.in0 + io.in1
}

class VecAdder(inWidth: Int, outWidth: Int, vecSize: Int) extends Module {
    require (vecSize > 0, "Vector length should be positive")
    require(inWidth > 0 && outWidth > 0, "Widths should be positive")
    require (outWidth >= inWidth, "Output width should not be smaller than input width")

    
    val io = IO(new Bundle {
        val in0 = Input(Vec(vecSize, UInt(inWidth.W)))
        val in1 = Input(Vec(vecSize, UInt(inWidth.W)))
        val out = Output(Vec(vecSize, UInt(outWidth.W)))
    })
    
    for (i <- 0 until vecSize) {
        io.out(i) := io.in0(i) + io.in1(i)
    }
}

class AdderTester(c: Adder) extends PeekPokeTester(c) {
    poke(c.io.in0, 3)
    poke(c.io.in1, 4)
    step(1)
    expect(c.io.out, 7)
}

class VecAdderTester(c: VecAdder) extends PeekPokeTester(c) {
    for (i <- 0 until c.io.in0.length) {
        poke(c.io.in0(i), 3 + i)
        poke(c.io.in1(i), 4 + i)
        expect(c.io.out(i), 7 + 2 * i)
    }
}

Driver(() => new Adder(3, 4)) { c => new AdderTester(c) }
Driver(() => new VecAdder(7, 8, 5)) { c => new VecAdderTester(c) }

[info] [0.001] Elaborating design...
[info] [0.069] Done elaborating.
Total FIRRTL Compile Time: 175.8 ms
Total FIRRTL Compile Time: 12.0 ms
End of dependency graph
Circuit state created
[info] [0.002] SEED 1503283921517
test cmd1WrapperHelperAdder Success: 1 tests passed in 6 cycles taking 0.013844 seconds
[info] [0.004] RAN 1 CYCLES PASSED
[info] [0.000] Elaborating design...
[info] [0.012] Done elaborating.
Total FIRRTL Compile Time: 53.3 ms
Total FIRRTL Compile Time: 31.2 ms
End of dependency graph
Circuit state created
[info] [0.000] SEED 1503283922202
test cmd1WrapperHelperVecAdder Success: 5 tests passed in 5 cycles taking 0.038889 seconds
[info] [0.019] RAN 0 CYCLES PASSED

defined class Adder
defined class VecAdder
defined class AdderTester
defined class VecAdderTester
res1_4: Boolean = true
res1_5: Boolean = true

### More Advanced Parameterization
The kind of parameterization shown in `Adder` and `VecAdder` is very basic.
Chisel `Module`s are Scala classes, so anything that can be used as an argument to a Scala class constructor can be a parameter for a Chisel `Module`.

This section will show a few LFSR implementations that are parameterized differently.
The code block below starts off by defining `LFSRIO` (a class that extends `Bundle`).
Each LFSR implementation will use the same `LFSRIO` class and reuse the same tester.

The `LFSRTester` is abstract because it doesn't define the `feedback` function; each concrete class that extends it should define `feedback` to match whatever LFSR it is testing.

In [3]:
type HasLFSRIO = { def io: LFSRIO; def n: Int }

class LFSRIO extends Bundle {
    val en  = Input(Bool())
    val out = Output(Bool())
    val state = Output(UInt())
}

abstract class LFSRTester[T <: Module](c: T) extends PeekPokeTester(c) {
    val n = c match {
        case c: HasLFSRIO => c.n
    }
    def feedback(state: BigInt): BigInt
    def nextState(state: BigInt): BigInt = {
        (state << 1) & BigInt("1"*n, 2) | feedback(state)
    }
    
    val numStates = BigInt("1" * n, 2)
    c match {
    case c: HasLFSRIO =>
        poke(c.io.en, 1)

        for (i <- BigInt(0) until numStates) {
            val next = nextState(peek(c.io.state))
            step(1)
            expect(c.io.state, next)
        }
    }
}

defined type HasLFSRIO
defined class LFSRIO
defined class LFSRTester

This first example is perhaps somewhat similar to how you would write this generator in a language like Verilog.
The module has two parameters: number of state bits and an integer representing the feedback polynomial.
If the `i`th LSB of `feedback` is high, the `i`th bit of state is included in the feedback.
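
The tap-mask encoding can be checked with a small software model in plain Scala. This is only a sketch: the object `TapMask` and its helpers are hypothetical names, and the state is held in an `Int`, so it assumes `n` is well below 32.

```scala
// Software model of the integer tap-mask encoding (plain Scala, no Chisel).
object TapMask {
  // Positions of the state bits that take part in the feedback.
  def taps(feedback: Int, n: Int): Seq[Int] =
    (0 until n).filter(i => ((feedback >> i) & 1) == 1)

  // XOR together every tapped state bit.
  def feedbackBit(state: Int, feedback: Int, n: Int): Int =
    taps(feedback, n).map(i => (state >> i) & 1).foldLeft(0)(_ ^ _)

  // Shift left by one, drop the bit that falls off, shift the feedback bit in.
  def step(state: Int, feedback: Int, n: Int): Int =
    ((state << 1) & ((1 << n) - 1)) | feedbackBit(state, feedback, n)
}
```

For example, `TapMask.taps(0xC, 4)` selects bits 2 and 3, which matches the `feedback` function used by the tester for the 4-bit LFSR in this section.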


In [4]:
class LFSRwithIntParam(val n: Int, feedback: Int) extends Module {
    require(n > 1, "State must be at least 2 bits")
    
    val io = IO(new LFSRIO)
    
    val allOnes = (BigInt(1) << n) - 1 // n may be larger than the word size
    val state = RegInit(allOnes.U(n.W))
    val xors = Wire(Vec(n + 1, Bool()))
    
    xors(0) := false.B
    for (i <- 0 until n) {
        val sel = (feedback >> i) & 1
        if (sel != 0) {
            // this is a tap!
            xors(i + 1) := state(i) ^ xors(i)
        } else {
            // not a tap, just pass through
            xors(i + 1) := xors(i)
        }
    }
    
    io.out := state(0)
    when (io.en) {
        state := (state << 1) | xors(n)
    }
    io.state := state
}

Driver( () => new LFSRwithIntParam(4, 0xC) ) {
    c: LFSRwithIntParam =>
    new LFSRTester(c) {
        def feedback(state: BigInt) = ((state >> 3) ^ (state >> 2)) & 0x1
    }
}

[info] [0.000] Elaborating design...
[info] [0.019] Done elaborating.
Total FIRRTL Compile Time: 94.4 ms
Total FIRRTL Compile Time: 27.5 ms
End of dependency graph
Circuit state created
[info] [0.000] SEED 1503283925212
test cmd3WrapperHelperLFSRwithIntParam Success: 15 tests passed in 20 cycles taking 0.039508 seconds
[info] [0.035] RAN 15 CYCLES PASSED

defined class LFSRwithIntParam
res3_1: Boolean = true

In this second example below, instead of representing `feedback` as an integer, we represent it as a function.
`feedback` takes the state (of type `UInt`) as an argument and produces the new bit to shift in (of type `Bool`).
This is possible because Scala is a functional programming language that treats functions as first class objects (you can pass them around as arguments and treat them like any other object).

Is the second example better than the first?
In this case, it made the code shorter (although defining `feedback` as a function may take more lines of code than defining it as an integer).
Using a function in this case eliminates some bit manipulation code which can be hard to read or debug.
One potential downside to having feedback defined as a function is that you could pass a function that has state or isn't linear, which would mean this is no longer an LFSR.
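
The trade-off can be seen in a plain-Scala sketch where the feedback is an ordinary `Int => Int` function. The object `FuncParamModel` and its members are hypothetical and model only the state update, not the Chisel module.

```scala
// Plain-Scala model of a shift register whose feedback is passed in as a function.
object FuncParamModel {
  // Any Int => Int function is accepted; nothing forces it to be linear (XOR-only).
  def step(n: Int, feedback: Int => Int)(state: Int): Int =
    ((state << 1) & ((1 << n) - 1)) | (feedback(state) & 1)

  // XOR of bits 3 and 2: a genuine LFSR feedback.
  val linear: Int => Int = s => (s >> 3) ^ (s >> 2)
  // AND of bits 3 and 2: type-checks just as well, but the result is not an LFSR.
  val nonLinear: Int => Int = s => (s >> 3) & (s >> 2)
}
```

Both functions have the same type, so the compiler cannot tell them apart; keeping the feedback linear is left to the caller.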


In [5]:
// Functions in Scala are first class objects
// UInt => Bool is the type signature for a function that takes a UInt as an argument
// and returns a Bool.
// The input will be the state of the lfsr and the return value will be the new
// bit to shift in.
class LFSRwithFuncParam(val n: Int, feedback: UInt => Bool) extends Module {
    require(n > 1, "State must be at least 2 bits")
    
    val io = IO(new LFSRIO)
    
    val allOnes = (BigInt(1) << n) - 1 // n may be larger than the word size
    val state = RegInit(allOnes.U(n.W))
    val nextState = (state << 1) | feedback(state)
    
    io.out := state(0)
    when (io.en) {
        state := nextState
    }
    io.state := state
}

Driver( () => new LFSRwithFuncParam(4, {u: UInt => u(3) ^ u(2)}) ) {
    c => new LFSRTester(c) {
        def feedback(state: BigInt) = ((state >> 3) ^ (state >> 2)) & 0x1
    }
}

[info] [0.000] Elaborating design...
[info] [0.014] Done elaborating.
Total FIRRTL Compile Time: 22.4 ms
Total FIRRTL Compile Time: 14.0 ms
End of dependency graph
Circuit state created
[info] [0.000] SEED 1503283926248
test cmd4WrapperHelperLFSRwithFuncParam Success: 15 tests passed in 20 cycles taking 0.011204 seconds
[info] [0.010] RAN 15 CYCLES PASSED

defined class LFSRwithFuncParam
res4_1: Boolean = true

The third example has one parameter: a sequence of `Boolean`s, each indicating whether the bit in the corresponding position is included in the feedback polynomial.
This avoids the bit manipulation code of the first example while still enforcing that you are actually building an LFSR.

One thing to notice about the third example is that `n` is no longer a parameter.
The number of bits of state is set by the length of the list being passed in.
Also note that it is written using some functional programming constructs.
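
The functional constructs involved (`zipWithIndex`, `collect`, `reduce`) can be tried out on plain Scala values first. The object `PolyTaps` below is a hypothetical software model that uses `Int` bit operations in place of Chisel hardware.

```scala
// Plain-Scala model: turn a Seq[Boolean] polynomial into tap positions.
object PolyTaps {
  val polynomial: Seq[Boolean] = Seq(false, false, true, true)

  // Pair each entry with its index and keep only the indices marked true.
  val taps: Seq[Int] = polynomial.zipWithIndex.collect { case (true, idx) => idx }

  // XOR the tapped bits of a software state word.
  // Note: reduce throws on an empty sequence, so at least one tap is assumed.
  def feedback(state: Int): Int = taps.map(idx => (state >> idx) & 1).reduce(_ ^ _)
}
```

Here `taps` holds the positions 2 and 3, and the number of state bits is simply `polynomial.length`.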

Which style of parameterization presented here is best?


In [6]:
class LFSRwithPolynomialParam(polynomial: Seq[Boolean]) extends Module {
    require (polynomial.length > 1, "State must be at least 2 bits")
    
    val io = IO(new LFSRIO)
    
    val n = polynomial.length
    val allOnes = (BigInt(1) << n) - 1 // n may be larger than the word size
    val state = RegInit(allOnes.U(n.W))
    // e.g. Seq(true, false, true) -> Seq( (true,0), (false,1), (true,2) )
    val polyWithIdxs = polynomial.zipWithIndex
    // keep only the taps, e.g. -> Seq( (true,0), (true,2) )
    val polyWithIdxsFiltered = polyWithIdxs.filter( x => x._1 )
    // select the tapped state bits and XOR them together, e.g. state(0) ^ state(2)
    val feedback = polyWithIdxsFiltered.map ( x => state(x._2) ).reduce( _ ^ _ )
    // the last three steps could be combined into one with
    //val feedback = polynomial.zipWithIndex.collect {
    //  case (sel, idx) if sel => state(idx)
    //}.reduce(_ ^ _)
    val nextState = (state << 1) | feedback
    
    io.out := state(0)
    when (io.en) {
        state := nextState
    }
    io.state := state
}

Driver( () => new LFSRwithPolynomialParam(Seq(false, false, true, true)) ) {
    c => new LFSRTester(c) {
        def feedback(state: BigInt) = ((state >> 3) ^ (state >> 2)) & 0x1
    }
}

[info] [0.000] Elaborating design...
[info] [0.007] Done elaborating.
Total FIRRTL Compile Time: 24.0 ms
Total FIRRTL Compile Time: 18.1 ms
End of dependency graph
Circuit state created
[info] [0.000] SEED 1503283927752
test cmd5WrapperHelperLFSRwithPolynomialParam Success: 15 tests passed in 20 cycles taking 0.012743 seconds
[info] [0.012] RAN 15 CYCLES PASSED

defined class LFSRwithPolynomialParam
res5_1: Boolean = true

The ability to have more sophisticated objects as parameters to our `Module`s is very powerful.
Combined with the fact that we can write arbitrary Scala code with our Chisel code, this means we can write programs that generate low level parameters based on high level requirements.

In the following example, we write an `MSequence` `Module` that generates its own polynomial parameter.
It uses pure Scala to find a generator polynomial that will give a maximal-length LFSR and then passes the polynomial to the `LFSRwithPolynomialParam` generator.
Don't worry too much about the details of how it finds the generator polynomial (this is done inside `object Galois { ... }`, defined alongside the imports in the first cell).

Having hardware generators that can do this sort of thing is extremely useful.
It reduces the number of hardcoded constants: even with good comments, you will probably have trouble remembering where a constant came from, and changing it later is error-prone. It is also much more robust than flows like Matlab -> text file -> HDL.
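
To make the idea of deriving a low-level parameter from a high-level requirement concrete, here is a brute-force sketch in plain Scala. It is not the algorithm used by `Galois`: the object `TapSearch` is hypothetical, it simply tries every tap mask and simulates the sequence, and it is only practical for small `n`.

```scala
// Brute-force search for a tap mask that yields a maximal-length sequence.
object TapSearch {
  // One shift: the new bit is the parity (XOR) of the tapped state bits.
  def step(state: Int, mask: Int, n: Int): Int = {
    val fb = Integer.bitCount(state & mask) & 1
    ((state << 1) & ((1 << n) - 1)) | fb
  }

  // Maximal means the all-ones state recurs after exactly 2^n - 1 steps.
  def isMaximal(mask: Int, n: Int): Boolean = {
    val start = (1 << n) - 1
    val full = (1 << n) - 1 // number of non-zero states
    var s = step(start, mask, n)
    var count = 1
    while (s != start && count < full) {
      s = step(s, mask, n)
      count += 1
    }
    s == start && count == full
  }

  // High-level requirement in, low-level parameter out.
  def findMask(n: Int): Option[Int] = (1 until (1 << n)).find(isMaximal(_, n))
}
```

A caller only states the number of state bits; the mask is computed rather than copied in as a constant.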

In [7]:
class MSequence(val n: Int) extends Module {
    val io = IO(new LFSRIO)
    
    // find polynomial corresponding to m-sequence with n bits of state
    val poly = Galois.gp(n - 1).map(_ != 0)
    
    val lfsr = Module(new LFSRwithPolynomialParam(poly))
    io <> lfsr.io
}

Driver( () => new MSequence(4) ) {
    c => new LFSRTester(c) {
        def feedback(state: BigInt): BigInt = {
            c.poly.zipWithIndex.collect { case (true, idx) => 
                (state >> idx) & 0x1
            }.reduce (_ ^ _)
        }
    }
}

[info] [0.000] Elaborating design...
[info] [0.026] Done elaborating.
Total FIRRTL Compile Time: 44.9 ms
Total FIRRTL Compile Time: 46.3 ms
End of dependency graph
Circuit state created
[info] [0.004] SEED 1503283928692
test cmd6WrapperHelperMSequence Success: 15 tests passed in 20 cycles taking 0.040513 seconds
[info] [0.036] RAN 15 CYCLES PASSED

defined class MSequence
res6_1: Boolean = true

### Type Parameterization

In the previous tutorial, we wrote a shift register. Unfortunately, it wasn't very flexible in what kind of inputs it could handle. If instead of a `Bool` we wanted a shift register for `SInt`, we would have to rewrite the shift register module.

In Scala, objects and functions aren't the only things we can treat as parameters. We can also treat types as parameters.

We usually need to provide a type constraint.
In this case, we want to be able to put objects in a bundle, connect (`:=`) them, and create registers with them (`RegNext`).
These operations cannot be done on arbitrary objects; for example `wire := 3` is illegal because Scala is statically typed and `3` is a Scala `Int`, not a Chisel `UInt`.
If we use a type constraint to say that type `T` is a subclass of `Data`, then we can use `:=` on any objects of type `T` because `:=` is defined for all `Data`.
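
The effect of an upper type bound can be seen without Chisel. In the sketch below, `Connectable`, `Word`, `Holder`, and `BoundDemo` are hypothetical stand-ins; `Connectable` plays the role that `Data` plays in Chisel.

```scala
// Hypothetical stand-ins for illustration; these are not Chisel classes.
trait Connectable { def describe: String }
final case class Word(bits: Int) extends Connectable {
  def describe: String = s"Word($bits)"
}

// T must be a subtype of Connectable, so `describe` is known to exist on gen.
class Holder[T <: Connectable](val gen: T) {
  def show: String = s"Holder of ${gen.describe}"
}

object BoundDemo {
  val inferred = new Holder(Word(4))       // T is inferred as Word
  val explicit = new Holder[Word](Word(4)) // T is spelled out
  // new Holder(3)                         // does not compile: Int is not a Connectable
}
```

The bound is what lets the body of `Holder` call `describe` on a value of an otherwise unknown type, in the same way that `T <: Data` lets a module use `:=` on values of type `T`.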

Here is an implementation of a simple shift register that takes a type as a parameter.
`gen` is an argument of type `T` that tells what width to use; for example, `new ShiftRegister(UInt(4.W), 5)` is a shift register for 4-bit `UInt`s with a shift of 5.
`gen` also allows the Scala compiler to infer the type `T`; you can write `new ShiftRegister[UInt](UInt(4.W), 5)` if you want to be more specific, but the Scala compiler is smart enough to figure it out if you leave out the `[UInt]`.

In [8]:
class ShiftRegisterIO[T <: Data](gen: T, n: Int) extends Bundle {
    require (n >= 0, "Shift register must have non-negative shift")
    
    val in = Input(gen.cloneType)
    val out = Output(Vec(n + 1, gen.cloneType)) // + 1 because in is included in out
}

class ShiftRegister[T <: Data](gen: T, n: Int) extends Module {
    val io = IO(new ShiftRegisterIO(gen, n))
    
    io.out.foldLeft(io.in) { case (in, out) =>
        out := in
        RegNext(in)
    }
}

class ShiftRegisterTester[T <: Bits](c: ShiftRegister[T]) extends PeekPokeTester(c) {
    println(s"Testing ShiftRegister of type ${c.io.in} and depth ${c.io.out.length}")
    for (i <- 0 until 10) {
        poke(c.io.in, i)
        println(s"$i: ${peek(c.io.out)}")
        step(1)
    }
}

Driver(() => new ShiftRegister(UInt(4.W), 5)) { c => new ShiftRegisterTester(c) }
Driver(() => new ShiftRegister(SInt(6.W), 3)) { c => new ShiftRegisterTester(c) }

[info] [0.000] Elaborating design...
[info] [0.007] Done elaborating.
Total FIRRTL Compile Time: 20.8 ms
Total FIRRTL Compile Time: 17.2 ms
End of dependency graph
Circuit state created
[info] [0.000] SEED 1503283929733
[info] [0.001] Testing ShiftRegister of type chisel3.core.UInt@8 and depth 6
[info] [0.015] 0: Vector(0, 13, 13, 13, 13, 13)
[info] [0.020] 1: Vector(1, 0, 13, 13, 13, 13)
[info] [0.026] 2: Vector(2, 1, 0, 13, 13, 13)
[info] [0.027] 3: Vector(3, 2, 1, 0, 13, 13)
[info] [0.027] 4: Vector(4, 3, 2, 1, 0, 13)
[info] [0.027] 5: Vector(5, 4, 3, 2, 1, 0)
[info] [0.028] 6: Vector(6, 5, 4, 3, 2, 1)
[info] [0.034] 7: Vector(7, 6, 5, 4, 3, 2)
[info] [0.037] 8: Vector(8, 7, 6, 5, 4, 3)
[info] [0.038] 9: Vector(9, 8, 7, 6, 5, 4)
test cmd7WrapperHelperShiftRegister Success: 0 tests passed in 15 cycles taking 0.039284 seconds
[info] [0.039] RAN 10 CYCL

defined class ShiftRegisterIO
defined class ShiftRegister
defined class ShiftRegisterTester
res7_3: Boolean = true
res7_4: Boolean = true

## Advanced Bundles
So far we've talked about writing code that can generate the contents of a module.
Generators also need to be able to programmatically generate IOs.
The next few sections will talk about some more sophisticated things you can do with `Bundle`s in Chisel.

### DecoupledIO
Ready/valid handshakes are very commonly used.
Rather than making new ready and valid signals in an ad-hoc way for every module, Chisel provides some helpers to make dealing with them easier.
`Decoupled` is one such helper.
Wrapping an IO with a call to `Decoupled(gen)` returns a bundle of type `DecoupledIO` with three fields:
  - `ready` (Input)
  - `valid` (Output)
  - `bits`  (Output of the type of `gen`)
  
The outputs and inputs can be reversed with a call to `Flipped()` if needed.
`DecoupledIO` also defines `fire()`, which returns a `Bool` indicating when a valid transaction is occurring (i.e. `valid && ready`).

Chisel provides some other helpers, like `Valid()` (similar to `Decoupled` but with no `ready` signal, only `valid`) and `Irrevocable()` (same fields as `Decoupled`, but `valid` cannot go from 1 -> 0 unless `ready` is asserted).
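
The handshake rule itself is easy to model in plain Scala. The sketch below is a cycle-by-cycle software model with hypothetical names (`HandshakeModel`, `transferred`); it does not use Chisel's `DecoupledIO`, it only mirrors the `valid && ready` rule.

```scala
import scala.collection.mutable.ArrayBuffer

// Cycle-by-cycle software model of a ready/valid handshake (no Chisel).
object HandshakeModel {
  // A transfer happens only in cycles where both sides agree.
  def fire(valid: Boolean, ready: Boolean): Boolean = valid && ready

  // A producer that is always valid and offers 0, 1, 2, ...
  // It advances to the next value only in cycles where the transfer fires.
  def transferred(readyPerCycle: Seq[Boolean]): Seq[Int] = {
    val out = ArrayBuffer.empty[Int]
    var next = 0
    for (ready <- readyPerCycle) {
      if (fire(valid = true, ready = ready)) {
        out += next
        next += 1
      }
    }
    out.toList
  }
}
```

In cycles where the consumer is not ready, the producer holds its value, so no data is lost or repeated.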

The following code is an example of how to replace the somewhat ad-hoc `en` signal in `LFSRIO` with a `Decoupled` interface on `out`.

In [9]:
class SimpleLFSRIO extends Bundle {
    val out   = Decoupled(Bool())
    val state = Output(UInt())
}

class DecoupledLFSR(n: Int, feedback: UInt => Bool) extends Module {
    val io = IO(new SimpleLFSRIO)
    
    val allOnes    = (BigInt(1) << n) - 1 // BigInt so that n may exceed the word size
    val state      = RegInit(allOnes.U(n.W))
    val nextState  = (state << 1) | feedback(state)
    io.out.valid  := true.B // LFSR can always output valid data
    io.out.bits   := state(n-1)
    io.state      := state

    when (io.out.fire()) { // io.out.fire() = io.out.ready && io.out.valid for Decoupled
        state := nextState
    }
}

println(chisel3.Driver.emit( () => new DecoupledLFSR(4, {u: UInt => u(3) ^ u(0)}) ))

[info] [0.000] Elaborating design...
[info] [0.007] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd8WrapperHelperDecoupledLFSR : 
  module cmd8WrapperHelperDecoupledLFSR : 
    input clock : Clock
    input reset : UInt<1>
    output io : {out : {flip ready : UInt<1>, valid : UInt<1>, bits : UInt<1>}, state : UInt}
    
    clock is invalid
    reset is invalid
    io is invalid
    reg state : UInt<4>, clock with : (reset => (reset, UInt<4>("h0f"))) @[cmd8.sc 10:29]
    node _T_9 = shl(state, 1) @[cmd8.sc 11:29]
    node _T_10 = bits(state, 3, 3) @[cmd8.sc 21:83]
    node _T_11 = bits(state, 0, 0) @[cmd8.sc 21:90]
    node _T_12 = xor(_T_10, _T_11) @[cmd8.sc 21:87]
    node nextState = or(_T_9, _T_12) @[cmd8.sc 11:35]
    io.out.valid <= UInt<1>("h01") @[cmd8.sc 12:19]
    node _T_14 = bits(state, 3, 3) @[cmd8.sc

defined class SimpleLFSRIO
defined class DecoupledLFSR

### Parameterized Bundles

Parameterized bundles have appeared in previous sections, but they are worth discussing in a dedicated section.
Like `Module`s, Chisel `Bundle`s are classes that can take any valid Scala object as a constructor argument.
These parameterized bundles can cause problems in some instances, usually with `cloneType`.
The following code will give a somewhat strange error unless you uncomment the commented `cloneType` implementation.

In [10]:
class ParamBundle(a: Int) extends Bundle {
    val in1 = Output(SInt(a.W))
    val in2 = Output(SInt(a.W))
    // override def cloneType = new ParamBundle(a).asInstanceOf[this.type]
}

println(chisel3.Driver.emit( () => new ShiftRegister(new ParamBundle(3), 4) ))

[info] [0.002] Elaborating design...

The error says a `cloneType` method is needed.
What is going on?
Every Chisel object is either a bound "hardware" object or an unbound "type" object.
Bound hardware objects actually exist in the circuit, like a register or a wire.
Unbound type objects are things like `UInt(4.W)`; they don't exist in the circuit, they just describe a type.
`cloneType` is a method that Chisel uses heavily internally to get an unbound type object from any object, including a bound hardware object.
Normally, Chisel can figure out how to do this automatically, but parameterized bundles sometimes confuse this process because Chisel has trouble figuring out where the constructor parameters come from.
Overriding `cloneType` and filling in the parameters manually will solve the problem, as shown above.
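
As a loose analogy in plain Scala (this is not how Chisel implements `cloneType`), a class with constructor parameters can build a fresh copy of itself only if it passes those parameters along. `TypeLike`, `ParamType`, and `CloneDemo` are hypothetical names used for illustration.

```scala
// Hypothetical stand-ins; this is an analogy, not Chisel's implementation.
abstract class TypeLike { def cloneType: TypeLike }

class ParamType(val a: Int) extends TypeLike {
  // The class itself knows which constructor argument produced it,
  // so it can build a fresh, independent copy with the same parameter.
  override def cloneType: ParamType = new ParamType(a)
}

object CloneDemo {
  val original = new ParamType(3)
  val copy     = original.cloneType
}
```

The copy is a distinct object that carries the same parameter, which is the property the manual override in `ParamBundle` provides.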

### Optional Bundle Fields

Sometimes we want IOs to be optionally included or excluded.
Maybe there's some internal state that's nice to be able to look at for debugging, but you want to hide it when the generator is being used in a system.
Maybe some of your generator's inputs don't need to be connected in every situation because there is a sensible default.

Optional bundle fields are one way to get this functionality.
An `Option` in Scala either contains an object or is empty.
If it is a `Some`, calling `get` on it returns the object it contains.
If it is `None`, it contains no object and calling `get` on it throws a `NoSuchElementException`.

In the following example, we show an LFSR where the state output is optional.
If you are debugging the LFSR, it could be nice to look at the state and see what's going on.
If you're using the LFSR as a PRBS generator, you don't have to see the state, just the output.
If the state output exists, the generator assigns to it, but if it doesn't it does nothing.

If the optional field is an input rather than an output, `getOrElse(...)` is useful.
If the option is a `Some`, calling `getOrElse(default)` on it returns the value inside the `Some`.
If the option is `None`, calling `getOrElse(default)` returns `default`.
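
The `Option` behaviour described above can be tried in plain Scala, independent of any hardware. The object `OptionDemo` and its field names are hypothetical.

```scala
import scala.util.Try

// Plain-Scala tour of the Option operations used for optional fields.
object OptionDemo {
  val present: Option[Int] = Some(4)
  val absent: Option[Int]  = None

  // getOrElse returns the contained value, or the default for None.
  val fromPresent: Int = present.getOrElse(0)
  val fromAbsent: Int  = absent.getOrElse(0)

  // get on None throws, so it is wrapped in Try here.
  val getOnAbsentFails: Boolean = Try(absent.get).isFailure

  // foreach (like map) runs its body only when a value is present.
  var visited: List[Int] = Nil
  present.foreach(v => visited = v :: visited)
  absent.foreach(v => visited = v :: visited)
}
```

The last two calls mirror the conditional connection of an optional output: the body runs once for `Some` and not at all for `None`.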

The following code block shows the different FIRRTL emitted when the state output is included or excluded.
Look for how the line beginning `output io :` differs between the two circuits.
This will be reflected in the generated Verilog when the FIRRTL is compiled.
The last line of the code block emits Verilog to the folder `verilog_output`.
The Verilog file will be named `cmd{i}WrapperHelperOptionalStateLFSR.v`, where `i` is the number of the command being run (look at the number next to `In` in the prompt).

In [11]:
class OptionalLFSRIO(includeState: Boolean = true) extends Bundle {
    val out   = Output(Bool())
    val state = if (includeState) Some(Output(UInt())) else None
}

class OptionalStateLFSR(includeState: Boolean = true) extends Module {
    val io = IO(new OptionalLFSRIO(includeState))
    
    // simple 4-bit LFSR
    val state = RegInit(15.U(4.W))
    val nextState = (state << 1) | (state(3) ^ state(0))
    state := nextState
    io.out := state(0)
    // map can be used to conditionally connect state
    // an equivalent way would be
    // if (!io.state.isEmpty) io.state.get := state
    io.state.map { case s => s := state }
}

println(chisel3.Driver.emit( () => new OptionalStateLFSR(true) ))
println(chisel3.Driver.emit( () => new OptionalStateLFSR(false) ))

// Emit verilog
// try false->true to see the difference
chisel3.Driver.execute(Array("-X", "verilog", "-td", "verilog_output"), () => new OptionalStateLFSR(false))

[info] [0.000] Elaborating design...
[info] [0.004] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd10WrapperHelperOptionalStateLFSR : 
  module cmd10WrapperHelperOptionalStateLFSR : 
    input clock : Clock
    input reset : UInt<1>
    output io : {out : UInt<1>, state : UInt}
    
    clock is invalid
    reset is invalid
    io is invalid
    reg state : UInt<4>, clock with : (reset => (reset, UInt<4>("h0f"))) @[cmd10.sc 10:24]
    node _T_6 = shl(state, 1) @[cmd10.sc 11:28]
    node _T_7 = bits(state, 3, 3) @[cmd10.sc 11:42]
    node _T_8 = bits(state, 0, 0) @[cmd10.sc 11:53]
    node _T_9 = xor(_T_7, _T_8) @[cmd10.sc 11:46]
    node nextState = or(_T_6, _T_9) @[cmd10.sc 11:34]
    state <= nextState @[cmd10.sc 12:11]
    node _T_10 = bits(state, 0, 0) @[cmd10.sc 13:20]
    io.out <= _T_10 @[cmd10.sc 13:12]
  

defined class OptionalLFSRIO
defined class OptionalStateLFSR

### Zero-Width Wires

Types with width 0 are legal in Chisel.
This is frequently useful in generators.
They are more or less equivalent to a literal 0 when they are used in operations, and they are not emitted in IOs.

Why would you want to use a zero-width wire?
Widths are often derived from other widths.
One very common case is that the width of one field is the base-2 logarithm of the size of another field, as shown in the following example.
Rather than special-casing these situations, zero-width wires allow your generator to stay clean while still emitting the right Verilog.
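
To see where a zero width comes from, the width calculation can be reproduced in plain Scala. The helper `selWidth` below is a hypothetical re-implementation of a ceiling log2, written on the assumption that `log2Ceil(n)` computes the same quantity; it is not Chisel's own code.

```scala
// Plain-Scala sketch of the select-width calculation.
object SelWidth {
  // Bits needed to index n entries: ceil(log2(n)), which is 0 when n == 1.
  def selWidth(n: Int): Int = {
    require(n > 0, s"Need at least one entry, got $n")
    32 - Integer.numberOfLeadingZeros(n - 1)
  }

  // Pairs of (number of entries, select width).
  val widths: Seq[(Int, Int)] = Seq(1, 2, 4, 5, 8).map(n => n -> selWidth(n))
}
```

With a single entry there is nothing to choose between, so the select signal naturally has width 0, and the generator needs no special case for it.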

The following block prints the FIRRTL and emits Verilog to the `verilog_output` folder.
See what changes when you try `n=1` and `n>1`.

In [12]:
class VectorSelectIO(n: Int) extends Bundle {
    val vecIn = Input(Vec(n, UInt(4.W)))
    val sel   = Input(UInt(log2Ceil(n).W))
    val out   = Output(UInt(4.W))
}

class VectorSelect(n: Int) extends Module {
    val io = IO(new VectorSelectIO(n))
    io.out := io.vecIn(io.sel)
}

println(chisel3.Driver.emit( () => new VectorSelect(4) ))
println(chisel3.Driver.emit( () => new VectorSelect(1) ))
chisel3.Driver.execute(Array("-X", "verilog", "-td", "verilog_output"), () => new VectorSelect(1))

[info] [0.000] Elaborating design...
[info] [0.011] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd11WrapperHelperVectorSelect : 
  module cmd11WrapperHelperVectorSelect : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip vecIn : UInt<4>[4], flip sel : UInt<2>, out : UInt<4>}
    
    clock is invalid
    reset is invalid
    io is invalid
    io.out <= io.vecIn[io.sel] @[cmd11.sc 9:12]
    

[info] [0.000] Elaborating design...
[info] [0.002] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd11WrapperHelperVectorSelect : 
  module cmd11WrapperHelperVectorSelect : 
    input clock : Clock
    input reset : UInt<1>
    outp

defined class VectorSelectIO
defined class VectorSelect

### Multiple Clocks

So far, all of our modules have used Chisel's implicit clock and reset.
You can override either or both of them.
A clock has its own type, `Clock`, whereas a reset is synchronous and can be any `Bool`.
Here is an example of how to add new clocks and resets and what the resulting FIRRTL looks like.
The Verilog is also saved to the `verilog_output` folder.

In [13]:
import chisel3.experimental.{withClockAndReset, withClock, withReset}

class MultiClockExample extends Module {
    val io = IO(new Bundle {
        val clk1 = Input(Clock())
        val clk2 = Output(Clock())
        val rst = Input(Bool())
        val data = Input(UInt(4.W))
        val out = Output(UInt())
    })
    
    // use the implicit clock and reset
    val reg1 = RegNext(io.data)
    // use the clock in the bundle and the implicit reset
    val reg2 = withClock(io.clk1) { RegNext(io.data) }
    // use the clock and reset (inverted) in the bundle
    withClockAndReset(io.clk1, !io.rst) {
        val regInside = RegInit(0.U)
        regInside := io.data
    }
    // use the reset in the bundle
    val reg3 = withReset(io.rst) { RegInit(0.U) }
    reg3 := reg2
    when (io.data === 0.U) {
        io.out := reg1
    } .elsewhen (io.data === 1.U) {
        io.out := reg2
    } .otherwise {
        io.out := reg3
    }
}

println(chisel3.Driver.emit( () => new MultiClockExample ))
chisel3.Driver.execute(Array("-X", "verilog", "-td", "verilog_output"), () => new MultiClockExample)

[info] [0.000] Elaborating design...
[info] [0.016] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd12WrapperHelperMultiClockExample : 
  module cmd12WrapperHelperMultiClockExample : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip clk1 : Clock, clk2 : Clock, flip rst : UInt<1>, flip data : UInt<4>, out : UInt}
    
    clock is invalid
    reset is invalid
    io is invalid
    reg reg1 : UInt, clock @[cmd12.sc 13:23]
    reg1 <= io.data @[cmd12.sc 13:23]
    reg reg2 : UInt, io.clk1 @[cmd12.sc 15:44]
    reg2 <= io.data @[cmd12.sc 15:44]
    node _T_10 = eq(io.rst, UInt<1>("h00")) @[cmd12.sc 17:32]
    reg _T_13 : UInt, io.clk1 with : (reset => (_T_10, UInt<1>("h00"))) @[cmd12.sc 18:32]
    _T_13 <= io.data @[cmd12.sc 19:19]
    reg reg3 : UInt, clock with : (reset => (io.rst, UInt<1>("h00

import chisel3.experimental.{withClockAndReset, withClock, withReset}
defined class MultiClockExample

## Verilog blackboxes

Even though Chisel is great, there are some situations where you'll want to use other HDLs.
Maybe you have some IP you want to integrate with Chisel.
Maybe you need to hand-craft some HDL that Chisel has trouble emitting (e.g. a pragma in a comment, or a clock-crossing FIFO).
Chisel provides the `BlackBox` mechanism for integrating external HDL sources.

Instead of extending `Module`, extend `BlackBox`.
Define the IO in the same way you would for a `Module` (making sure everything matches your external HDL), but don't fill in an implementation for the rest of the circuit.
The Chisel and FIRRTL compilers will instantiate the module, but they won't declare or define it.
It's up to you to get your downstream tools to include both the Chisel-generated RTL and the modules you blackboxed.

Sometimes you'll have the HDL for your blackbox in advance.
Chisel provides another mechanism to include this source code from your Chisel generator.
When you emit Verilog, the Chisel+FIRRTL compiler will also write out your blackboxed source code, and your testers will automatically include your black box.


To include an implementation of a black box, mix in `HasBlackBoxInline` (or `HasBlackBoxResource`, but that isn't useful in a Jupyter notebook) as shown below.
Call `setInline()` with a file name and a string containing your Verilog source.

_One minor detail we've been sweeping under the rug is that thus far we've been using the FIRRTL interpreter to test our circuits.
The FIRRTL interpreter directly executes FIRRTL without ever emitting Verilog.
We can't simulate Verilog black boxes with the FIRRTL interpreter, so in these tests we switch to using Verilator (an open source Verilog simulator).
VCS is also supported by the Chisel testers._

In [14]:
class NegedgeReg extends BlackBox with HasBlackBoxInline {
    // only necessary because jupyter does weird things to your scope
    override def desiredName = "NegedgeReg"
    
    val io = IO(new Bundle {
        val clock = Input(Clock())
        val d     = Input(UInt(1.W))
        val en    = Input(Bool())
        val reset = Input(Bool())
        val q     = Output(UInt(1.W))
    })
    
    setInline("NegedgeReg.v",
"""module NegedgeReg(
input clock,
input d,
input en,
input reset,
output reg q
);
    always @(negedge clock) begin
        if (reset) begin
            q <= 1'b0;
        end else if (en) begin
            q <= d;
        end
    end
endmodule
""")
}

class Inverter extends BlackBox with HasBlackBoxInline {
    // only necessary because jupyter does weird things to your scope
    override def desiredName = "Inverter"
    val io = IO(new Bundle {
        val in  = Input(Bool())
        val out = Output(Bool())
    })
    
    setInline("Inverter.v", 
"""module Inverter(
in,
out
);
    input in;
    output out;
    assign out = ~in;
endmodule
""")
}


class Negate extends Module {
    // only necessary because jupyter does weird things to your scope
    override def desiredName = "Negate"
    val io = IO(new Bundle {
        val in = Input(SInt(4.W))
        val out = Output(SInt(4.W))
    })
    val bools = io.in.toBools
    val negated = Vec(bools.map { case b =>
        val inverter = Module(new Inverter)
        val delayed  = Module(new NegedgeReg)
        inverter.io.in   := b
        delayed.io.clock := clock
        delayed.io.d     := inverter.io.out
        delayed.io.en    := true.B
        delayed.io.reset := reset
        delayed.io.q
    })
    io.out := negated.asTypeOf(SInt())
}

class NegateTester(c: Negate) extends PeekPokeTester(c) {
    poke(c.io.in, 1)
    step(1)
    expect(c.io.out, -2)
    println(s"Out is ${peek(c.io.out)}")
    poke(c.io.in, 0)
    step(1)
    expect(c.io.out, -1)
    println(s"Out is ${peek(c.io.out)}")
}
import chisel3.iotesters._

val manager = new TesterOptionsManager {
  testerOptions = TesterOptions(backendName = "verilator")
  commonOptions = commonOptions.copy(targetDirName = "verilatortests", topName = "Negate")
}

chisel3.Driver.execute(manager, () => new Negate)
Driver.execute(() => new Negate, manager) { c => new NegateTester(c) }


[info] [0.000] Elaborating design...
[info] [0.034] Done elaborating.
Total FIRRTL Compile Time: 89.7 ms
[info] [0.000] Elaborating design...
[info] [0.010] Done elaborating.
Total FIRRTL Compile Time: 97.6 ms
verilator --cc Negate.v -f /Users/rigge/src/generator-bootcamp/verilatortests/black_box_verilog_files.f --assert -Wno-fatal -Wno-WIDTH -Wno-STMTDLY --trace -O1 --top-module Negate +define+TOP_TYPE=VNegate +define+PRINTF_COND=!Negate.reset +define+STOP_COND=!Negate.reset -CFLAGS -Wno-undefined-bool-conversion -O1 -DTOP_TYPE=VNegate -DVL_USER_FINISH -include VNegate.h -Mdir /Users/rigge/src/generator-bootcamp/verilatortests --exe /Users/rigge/src/generator-bootcamp/verilatortests/Negate-harness.cpp
make: Entering directory '/Users/rigge/src/generator-bootcamp/verilatortests'
clang++  -I.  -MMD -I/usr/local/Cellar/verilator/3.900/share/verilator/include -I/usr/local/Cellar/verilator/3.900/share/verilator/include/vltstd -DVL_PRINTF=printf -DVM_COVE

defined class NegedgeReg
defined class Inverter
defined class Negate
defined class NegateTester
import chisel3.iotesters._
manager: TesterOptionsManager = $sess.cmd13Wrapper$Helper$$anon$1@698a6f19

## Type Casting

The code below will give an error if you run it as written, with the cast commented out.
What's the problem?
It is trying to assign a `UInt` to an `SInt`, which is illegal.

Chisel has a set of type casting functions.
The most general is `asTypeOf()`, which is shown below.
Some Chisel types also define `asUInt()` and `asSInt()`, among others.

If you remove the `//` from the code block below, the example should work for you.

In [15]:
class TypeConvertDemo extends Module {
    val io = IO(new Bundle {
        val in  = Input(UInt(4.W))
        val out = Output(SInt(4.W))
    })
    io.out := io.in//.asTypeOf(io.out)
}

Driver(() => new TypeConvertDemo) { c =>
  new PeekPokeTester(c) {
      poke(c.io.in, 3)
      expect(c.io.out, 3)
      poke(c.io.in, 15)
      expect(c.io.out, -1)
  }}

[info] [0.000] Elaborating design...
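
To see why the tester above expects `-1` after poking `15`, it can help to model the cast in plain Scala. The sketch below assumes that casting a `UInt` to an `SInt` of the same width leaves the bits unchanged and only reinterprets them as two's complement. `reinterpretAsSigned` is a hypothetical helper written for this note, not part of Chisel.

```scala
// Hypothetical model: reinterpret the low `width` bits of `value` as a
// two's-complement signed number, without changing any bits.
def reinterpretAsSigned(value: Int, width: Int): Int = {
  require(width > 0 && width < 32, "width must fit in an Int")
  val masked = value & ((1 << width) - 1)
  // If the top bit is set, the signed reading is negative.
  if ((masked >> (width - 1)) == 1) masked - (1 << width) else masked
}

println(reinterpretAsSigned(3, 4))   // top bit clear: value is unchanged
println(reinterpretAsSigned(15, 4))  // top bit set: 0b1111 reads as -1
```

The two calls correspond to the two `poke`/`expect` pairs in the tester.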



### Fixed Point

Chisel includes a fixed point type.
Fixed point numbers can do most of the same things that `UInt`s and `SInt`s can.
You can make wires and registers with fixed point numbers and put them in IOs or `Vec`s.

Fixed point numbers have a width just like `UInt`s and `SInt`s, but they also have a binary point that specifies how many bits correspond to the fractional part of the number.
Both width and binary point can be specified or inferred separately.
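
One way to picture the binary point is to treat a fixed point value as an ordinary integer that is implicitly divided by a power of two. The plain Scala sketch below models that view. `toRaw` and `fromRaw` are hypothetical helpers for illustration, and rounding to the nearest representable value is an assumption of this sketch rather than a statement about Chisel's internals.

```scala
// Hypothetical model: a fixed point number is a raw integer scaled by 2^binaryPoint.
def toRaw(x: Double, binaryPoint: Int): BigInt =
  BigInt(math.round(x * math.pow(2, binaryPoint)))

def fromRaw(raw: BigInt, binaryPoint: Int): Double =
  raw.toDouble / math.pow(2, binaryPoint)

// 1.0 with 8 fractional bits is stored as 2^8.
println(toRaw(1.0, 8))
// 0.625 = 5 / 2^3, so it fits exactly with 3 fractional bits.
println(toRaw(0.625, 3))
println(fromRaw(BigInt(5), 3))
```

The first case lines up with the `UInt<10>("h0100")` literal that appears in the FIRRTL printed for `FixedDemo` in this section.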

In the same way that widths are written like `integer.W`, binary points are written like `integer.BP`.
In the same way that `UInt` or `SInt` literals can be specified like `3.U` or `(-1).S`, fixed point literals can be specified with `number.F`.
The number can be an integer or a floating point number.
You can include one or both of a width and binary point as arguments to `.F()` as shown below.

In [16]:
import chisel3.experimental.FixedPoint

class FixedDemo extends Module {
    val io = IO(new Bundle {
        val a = Input(FixedPoint(4.W, 3.BP))
        val b = Output(FixedPoint(6.W, 5.BP))
    })
    
    val addOne = io.a + 1.0.F(8.BP)
    val times2 = addOne * 1.F(2.W, 0.BP)
    io.b := RegNext(times2)
}

class MAC(width: Int, binaryPoint: Int) extends Module {
    val io = IO(new Bundle {
        val a = Input(FixedPoint(width.W, binaryPoint.BP))
        val b = Input(FixedPoint(width.W, binaryPoint.BP))
        val c = Input(FixedPoint(width.W, binaryPoint.BP))
        val out = Output(FixedPoint())
    })
    
    io.out := io.a * io.b + io.c
}
println(chisel3.Driver.emit( () => new FixedDemo))
println(chisel3.Driver.emit( () => new MAC(6, 3)))

[info] [0.000] Elaborating design...
[info] [0.012] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd15WrapperHelperFixedDemo : 
  module cmd15WrapperHelperFixedDemo : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip a : Fixed<4><<3>>, b : Fixed<6><<5>>}
    
    clock is invalid
    reset is invalid
    io is invalid
    node _T_5 = add(io.a, asFixedPoint(UInt<10>("h0100"), 8)) @[cmd15.sc 9:23]
    node _T_6 = tail(_T_5, 1) @[cmd15.sc 9:23]
    node addOne = asFixedPoint(_T_6, 8) @[cmd15.sc 9:23]
    node times2 = mul(addOne, asFixedPoint(UInt<2>("h01"), 0)) @[cmd15.sc 10:25]
    reg _T_9 : Fixed<<8>>, clock @[cmd15.sc 11:20]
    _T_9 <= times2 @[cmd15.sc 11:20]
    io.b <= _T_9 @[cmd15.sc 11:10]
    

[info] [0.000] Elaborating design...
[info] [0.004] Done elaborating.
;b

import chisel3.experimental.FixedPoint
defined class FixedDemo
defined class MAC

### Floating Point

Chisel is written in Scala, so writing libraries for Chisel is a lot like writing any other Scala library.
One useful Scala library is `dsptools`, located on [GitHub](https://github.com/ucb-bar/dsptools).
This library provides a number of useful constructs for writing DSP circuits.

One useful feature is the ability to prototype circuits with floating point.
The generated code uses non-synthesizable constructs with Verilog `real`s.
`DspReal`s are fixed width, so no width needs to be specified.

In [17]:
import $ivy.`edu.berkeley.cs::dsptools:1.0` 

import $ivy.$

In [18]:
import dsptools.numbers._

class FloatDemo extends Module {
    val io = IO(new Bundle {
        val a = Input(DspReal())
        val b = Output(DspReal())
    })
    
    val addOne = io.a + DspReal(1.0)
    val times2 = addOne * DspReal(1.0)
    io.b := RegNext(times2)
}

class FloatMAC extends Module {
    val io = IO(new Bundle {
        val a = Input(DspReal())
        val b = Input(DspReal())
        val c = Input(DspReal())
        val out = Output(DspReal())
    })
    io.out := io.a * io.b + io.c
}

println(chisel3.Driver.emit( () => new FloatDemo))
println(chisel3.Driver.emit( () => new FloatMAC ))

[info] [0.000] Elaborating design...
[info] [0.021] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd17WrapperHelperFloatDemo : 
  extmodule BBFAdd : 
    output out : UInt<64>
    input in2 : UInt<64>
    input in1 : UInt<64>
    
    defname = BBFAdd
    
    
  extmodule BBFMultiply : 
    output out : UInt<64>
    input in2 : UInt<64>
    input in1 : UInt<64>
    
    defname = BBFMultiply
    
    
  module cmd17WrapperHelperFloatDemo : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip a : {node : UInt<64>}, b : {node : UInt<64>}}
    
    clock is invalid
    reset is invalid
    io is invalid
    inst BBFAdd of BBFAdd @[DspReal.scala 43:36]
    BBFAdd.out is invalid
    BBFAdd.in2 is invalid
    BBFAdd.in1 is invalid
    BBFAdd.in1 <= io.a.node @[DspReal.scala 26:21]
    BBFAdd.in2 <= U

import dsptools.numbers._
defined class FloatDemo
defined class FloatMAC

### Type Parameterized DSP Modules

Earlier sections showed how useful type parameterized code can be.
However, we were limited to simple operations that can be performed on any instance of `Data`, such as `:=` or `RegNext()`.
When generating DSP circuits, we would like to do mathematical operations like addition and multiplication.
The `dsptools` library provides tools for writing type parameterized DSP generators.

Here is an example of writing a multiply-accumulate module.
It can be used to generate a multiply-accumulate (MAC) for `FixedPoint`, `DspReal`, or even `DspComplex[T]` (the complex number type provided by `dsptools`).
The syntax of the type bound is a little different because `dsptools` uses typeclasses.
They are beyond the scope of this notebook.
Read the `dsptools` readme and documentation for more information on using typeclasses.

`T <: Data : Ring` means that `T` is a subtype of `Data` and is also a `Ring`.
`Ring` is defined in `dsptools` as a number with `+` and `*` (see <a href="https://en.wikipedia.org/wiki/Ring_(mathematics)#Definition">here</a> for the mathematical definition).
An alternative to `Ring` would be `Real`, but then we couldn't use `DspComplex()` because complex numbers are not `Real`.
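
For readers who have not met context bounds before, here is a minimal plain Scala sketch of the same pattern with no Chisel or `dsptools` involved. `SimpleRing`, the two implicit instances, and `mac` are all hypothetical names invented for this illustration; the real `Ring` in `dsptools` is richer than this.

```scala
// Hypothetical stand-in for a typeclass like Ring: it says how to add and multiply a T.
trait SimpleRing[T] {
  def plus(a: T, b: T): T
  def times(a: T, b: T): T
}

// Instance for plain integers.
implicit val intRing: SimpleRing[Int] = new SimpleRing[Int] {
  def plus(a: Int, b: Int): Int = a + b
  def times(a: Int, b: Int): Int = a * b
}

// Instance for complex numbers modelled as (real, imag) pairs.
implicit val complexRing: SimpleRing[(Double, Double)] = new SimpleRing[(Double, Double)] {
  def plus(a: (Double, Double), b: (Double, Double)): (Double, Double) =
    (a._1 + b._1, a._2 + b._2)
  def times(a: (Double, Double), b: (Double, Double)): (Double, Double) =
    (a._1 * b._1 - a._2 * b._2, a._1 * b._2 + a._2 * b._1)
}

// `T : SimpleRing` is a context bound: the compiler must find an implicit SimpleRing[T].
def mac[T : SimpleRing](a: T, b: T, c: T): T = {
  val ring = implicitly[SimpleRing[T]]
  ring.plus(ring.times(a, b), c)
}

println(mac(2, 3, 4))                              // integer multiply-accumulate
println(mac((0.0, 1.0), (0.0, 1.0), (1.0, 0.0)))   // i * i + 1
```

The same `mac` body works for both types because the arithmetic is supplied by the typeclass instance rather than by a common superclass, which is why complex numbers can take part without being `Real`.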

In [19]:
import dsptools.numbers.implicits._

class Mac[T <: Data : Ring](genIn : T, genOut: T) extends Module {
    val io = IO(new Bundle {
        val a = Input(genIn.cloneType)
        val b = Input(genIn.cloneType)
        val c = Input(genIn.cloneType)
        val out = Output(genOut.cloneType)
    })
    io.out := io.a * io.b + io.c
}

println(chisel3.Driver.emit( () => new Mac(FixedPoint(4.W, 3.BP), FixedPoint(6.W, 4.BP))))
println(chisel3.Driver.emit( () => new Mac(DspReal(), DspReal())))
println(chisel3.Driver.emit( () => new Mac(DspComplex(FixedPoint(4.W, 3.BP), FixedPoint(4.W, 3.BP)), DspComplex(FixedPoint(6.W, 4.BP), FixedPoint(6.W, 4.BP)))))

[info] [0.000] Elaborating design...
[info] [0.082] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd18WrapperHelperMac : 
  module cmd18WrapperHelperMac : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip a : Fixed<4><<3>>, flip b : Fixed<4><<3>>, flip c : Fixed<4><<3>>, out : Fixed<6><<4>>}
    
    clock is invalid
    reset is invalid
    io is invalid
    node _T_6 = mul(io.a, io.b) @[FixedPointTypeClass.scala 43:59]
    node _T_7 = add(_T_6, io.c) @[FixedPointTypeClass.scala 21:58]
    node _T_8 = tail(_T_7, 1) @[FixedPointTypeClass.scala 21:58]
    node _T_9 = asFixedPoint(_T_8, 6) @[FixedPointTypeClass.scala 21:58]
    io.out <= _T_9 @[cmd18.sc 10:12]
    

[info] [0.000] Elaborating design...
[info] [0.018] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-S

import dsptools.numbers.implicits._
defined class Mac

## Exercises

### 1. Shift Register Test with Bundles

The shift register implementation given earlier is templated for all `[T <: Data]`.
`Bundle`s are subtypes of `Data`.
However, the given tester was templated for `Bits` (which includes things like `UInt` and `SInt`, but not `Bundle`).
Also, the test only printed out the values; it didn't actually check that they were correct.

The following code defines a bundle type for complex numbers.
Write a tester to check that the shift register works correctly for complex numbers.
Test that it works for a variety of depths!
To begin with, test that it works on `depth=4` and `width=3`, but then uncomment the `depths` and `widths` to test that it works for more values.

In [19]:
class ComplexBundle(w: Int) extends Bundle {
    val real = Output(SInt(w.W))
    val imag = Output(SInt(w.W))
    override def cloneType = new ComplexBundle(w).asInstanceOf[this.type]
}

// Show the emitted firrtl for an instance of ShiftRegister with Complex
println(chisel3.Driver.emit( () => new ShiftRegister(new ComplexBundle(4), 0) ))

class ComplexShiftRegisterTester(c: ShiftRegister[ComplexBundle]) extends PeekPokeTester(c) {
    // TODO fill me in and remove fail
    fail
}

// See what happens when you try to compile this
// Why won't it compile?
Driver( () => new ShiftRegister(new ComplexBundle(4), 5)) { c=>
        new ShiftRegisterTester(c) }

val depths = List(4) // List(0, 1, 2, 5, 10, 100)
val widths = List(3) // List(3, 16)

for (w <- widths) {
    for (d <- depths) {
        Driver( () => new ShiftRegister(new ComplexBundle(w), d)) { c=>
        new ComplexShiftRegisterTester(c) }
    }
}

cmd19.sc:18: inferred type arguments [Helper.this.ComplexBundle] do not conform to class ShiftRegisterTester's type parameter bounds [T <: chisel3.Bits]
        new ShiftRegisterTester(c) }
        ^cmd19.sc:18: type mismatch;
 found   : cmd19Wrapper.this.cmd7.wrapper.ShiftRegister[Helper.this.ComplexBundle]
 required: cmd19Wrapper.this.cmd7.wrapper.ShiftRegister[T]
        new ShiftRegisterTester(c) }
                                ^


### 2. Decoupled Shift Register

Write an implementation of a shift register that has decoupled inputs and outputs.
The initial values in the shift register should be zero, and they should count as valid outputs (i.e. the first `n` valid outputs are 0 before it starts shifting out an input).

In [30]:
class DecoupledShiftRegisterIO[T <: Data](gen: T, n: Int) extends Bundle {
    require (n >= 0, "Shift register must have non-negative shift")
    
    val in = Flipped(Decoupled(gen))
    val out = Decoupled(Vec(n + 1, gen.cloneType)) // + 1 because in is included in out
}

class DecoupledShiftRegister[T <: Data](val gen: T, val n: Int) extends Module {
    val io = IO(new DecoupledShiftRegisterIO(gen, n))
    io.out.valid := true.B

    io.out.bits(n-1) := io.in.bits
    io.in.ready := true.B
}

class DecoupledShiftRegisterTester[T <: DecoupledShiftRegister[UInt]](c: T) extends PeekPokeTester(c) {
    val n = c.n
    val genWidth = c.gen.getWidth
    val maxCycles = 4 * n * 100
    var currentCycles = 0
    
    // make 4 * n random inputs that will fit in gen
    val savedInputs = Seq.fill(4 * n){BigInt(genWidth, scala.util.Random)}
    var inputs = Seq(savedInputs:_*)
    
    var outputs = Seq[BigInt]()
    
    while (inputs.length > 0 && currentCycles < maxCycles) {
        // don't run forever if the DUT is broken
        currentCycles += 1
        println(inputs.toString)
        println(outputs.toString)

        
        val outValid = peek(c.io.out.valid) != 0
        val outReady = scala.util.Random.nextBoolean

        val inValid = scala.util.Random.nextBoolean
        val inReady = peek(c.io.in.ready) != 0
        
        poke(c.io.in.valid, inValid)
        poke(c.io.out.ready, outReady)

        if (inValid) {
            poke(c.io.in.bits, inputs.head)
        } else {
            // not valid, poke some other random thing
            poke(c.io.in.bits, BigInt(genWidth, scala.util.Random))
        }
        
        if (inValid && inReady) {
            inputs = inputs.tail
        }
        
        if (outReady && outValid) {
            outputs = outputs :+ peek(c.io.out.bits(n-1))
        }
        step(1)
    }
    require(currentCycles < maxCycles, "Tester didn't see enough transactions")
    (Seq.fill(n){BigInt(0)} ++ savedInputs).zip(outputs).foreach {
        case (in, out) => require(in == out)
    }
    
}

println(chisel3.Driver.emit( () => new DecoupledShiftRegister(UInt(4.W), 5)))
Driver( () => new DecoupledShiftRegister(UInt(4.W), 5)) { c => new DecoupledShiftRegisterTester(c)}

[info] [0.000] Elaborating design...
[info] [0.004] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd29WrapperHelperDecoupledShiftRegister : 
  module cmd29WrapperHelperDecoupledShiftRegister : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip in : {flip ready : UInt<1>, valid : UInt<1>, bits : UInt<4>}, out : {flip ready : UInt<1>, valid : UInt<1>, bits : UInt<4>[6]}}
    
    clock is invalid
    reset is invalid
    io is invalid
    io.out.valid <= UInt<1>("h01") @[cmd29.sc 10:18]
    io.out.bits[4] <= io.in.bits @[cmd29.sc 12:22]
    io.in.ready <= UInt<1>("h01") @[cmd29.sc 13:17]
    

[info] [0.000] Elaborating design...
[info] [0.003] Done elaborating.
Total FIRRTL Compile Time: 10.2 ms
Total FIRRTL Compile Time: 6.1 ms
End of dependency graph
Circuit state created
[

java.lang.IllegalArgumentException: requirement failed


[info] [0.497] List(9, 8, 6, 14, 4, 4, 0, 2, 13, 5, 3, 4, 7, 4, 3, 6, 4)
[info] [0.497] List(12)
[info] [0.497] List(9, 8, 6, 14, 4, 4, 0, 2, 13, 5, 3, 4, 7, 4, 3, 6, 4)



	at scala.Predef$.require(Predef.scala:212)
	at $sess.cmd29Wrapper$Helper$DecoupledShiftRegisterTester$$anonfun$5.apply(cmd29.sc:62)
	at $sess.cmd29Wrapper$Helper$DecoupledShiftRegisterTester$$anonfun$5.apply(cmd29.sc:61)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at $sess.cmd29Wrapper$Helper$DecoupledShiftRegisterTester.<init>(cmd29.sc:61)
	at $sess.cmd29Wrapper$Helper$$anonfun$8.apply(cmd29.sc:68)
	at $sess.cmd29Wrapper$Helper$$anonfun$8.apply(cmd29.sc:68)
	at chisel3.iotesters.Driver$$anonfun$execute$1$$anonfun$apply$mcZ$sp$1$$anonfun$apply$mcZ$sp$2.apply$mcZ$sp(Driver.scala:62)
	at chisel3.iotesters.Driver$$anonfun$execute$1$$anonfun$apply$mcZ$sp$1$$anonfun$apply$mcZ$sp$2.apply(Driver.scala:61)
	at chisel3.iotesters.Driver$$anonfun$execute$1$$anonfun$apply$mcZ$sp$1$$anonfun$apply$mcZ$sp$2.apply(Driver.scala:61)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at chisel3.iotesters.Driver$$anonfun$execute$1$$anonfun$apply$mcZ$sp$1.apply$mcZ$sp(Dr


### 3. Single-path delay-feedback FFT

TODO text + picture

### 3.a Butterfly

TODO text + picture

In [20]:
class Butterfly[T <: Data : Real](genIn: T, genOut: T) extends Module {
    val io = IO(new Bundle {
        // all IOs are complex
        val in0  = Input(DspComplex(genIn, genIn))
        val in1  = Input(DspComplex(genIn, genIn))
        val out0 = Output(DspComplex(genOut, genOut))
        val out1 = Output(DspComplex(genOut, genOut))
    })
    
    io.out0 := io.in0 + io.in1
    io.out1 := io.in0 - io.in1
}

defined class Butterfly

### 3.b Butterfly + Delay Element + Twiddle

TODO text + picture

In [21]:
// import breeze.math.Complex
class PE[T <: Data : Real](genIn: T, genButterflyOut: T, genMultOut: T, twiddlesFunc: () => Vec[DspComplex[T]]) extends Module {
    val io = IO(new Bundle {
        val in  = Input(DspComplex(genIn, genIn))
        val out = Output(DspComplex(genMultOut, genMultOut))
    })
    
    val butterfly = Module(new Butterfly(genIn, genButterflyOut))
    val twiddles  = twiddlesFunc()
    val delay     = twiddles.length
    val delayReg  = Module(new ShiftRegister(DspComplex(genButterflyOut, genButterflyOut), delay))
    
    delayReg.io.in := butterfly.io.out0
    butterfly.io.in0 := delayReg.io.out(delay-1)
    butterfly.io.in1 := io.in
    
    val twiddleCount = RegInit(0.U(log2Ceil(delay).W)) // count up to delay
    twiddleCount := twiddleCount + 1.U
    
    val tw = twiddles(twiddleCount)
    
    io.out := butterfly.io.out1 * tw
}

val twiddles = () => Vec( DspComplex.wire(1.S(2.W), 0.S(2.W)), DspComplex.wire(0.S(2.W), 1.S(2.W)), DspComplex.wire(-1.S(2.W), 0.S(2.W)), DspComplex.wire(0.S(2.W), -1.S(2.W)) )
println(chisel3.Driver.emit( () => new PE(SInt(4.W), SInt(4.W), SInt(4.W), twiddles) ))

[info] [0.000] Elaborating design...
[info] [0.047] Done elaborating.
;buildInfoPackage: chisel3, version: 3.0-SNAPSHOT_2017-07-19, scalaVersion: 2.11.11, sbtVersion: 0.13.15, builtAtString: 2017-07-19 18:56:34.453, builtAtMillis: 1500490594453
circuit cmd20WrapperHelperPE : 
  module cmd19WrapperHelperButterfly : 
    input clock : Clock
    input reset : UInt<1>
    output io : {flip in0 : {real : SInt<4>, imag : SInt<4>}, flip in1 : {real : SInt<4>, imag : SInt<4>}, out0 : {real : SInt<4>, imag : SInt<4>}, out1 : {real : SInt<4>, imag : SInt<4>}}
    
    clock is invalid
    reset is invalid
    io is invalid
    node _T_22 = add(io.in0.real, io.in1.real) @[SIntTypeClass.scala 18:40]
    node _T_23 = tail(_T_22, 1) @[SIntTypeClass.scala 18:40]
    node _T_24 = asSInt(_T_23) @[SIntTypeClass.scala 18:40]
    node _T_25 = add(io.in0.imag, io.in1.imag) @[SIntTypeClass.scala 18:40]
    node _T_26 = tail(_T_25, 1) @[SIntTypeClass.scala 18:40]
    node _T_27 = asSInt(_T_

defined class PE
twiddles: () => Vec[DspComplex[SInt]] = <function0>

### 3.c Putting together into an FFT