
CS 150 - Digital Design

August 23, 2012

30 lab stations. Initially individual, but later you will partner up. Can admit up to 60 people -- that's the limit; waitlist plus enrolled is a little over that. There's a lab lecture that many people miss: Friday 2-3. You're also assigned a specific lab section; go to your assigned section at least for the first several weeks.

This is a lab-intensive course. You will get to know 125 Cory very well. Food and drink on round tables only. Very reasonable policy.

As mentioned: first few are individual labs, after that you'll pair up for the projects. The right strategy is to work really hard on the first few labs so you look good and get a good partner.

The book is Harris & Harris, Digital Design and Computer Architecture.

Reading: H&H -- skim chapter 1, read section 5.6.2 (on FPGAs); also start to look at Ch. 5 of the Virtex User's Guide.

H&H Ch. 2 is on combinational circuits. Assuming you took 61C, not doing proofs of equivalence, etc.

Ch. 3 is sequential logic. Combinational is history-agnostic; sequential allows us to store state (dynamical time-variant system).

With memory and a NAND gate, you can make everything.

Chapter 4 is HDLs. Probably good to flip through for now. We're going to use Verilog this semester. Book gives comparisons between Verilog and VHDL.

First lab next week: you will be writing simple Verilog code to implement simple boolean functions. Chapter 5 is building blocks like ALUs; 6 is architecture; 7 is microarchitecture: why it works, and how you make pipelined processors. You may find there's actually code in there useful for the final project. Chapter 8 is on memory.

Would suggest that you read the book sooner rather than later. Can sit down in first couple of weeks and read entire thing through.

Lecture notes: will be using the whiteboard. If you want lecture notes, go to the web -- tons of resources out there. If there's something particular about what Kris says, use Piazza. You've probably all used it several times by now, so that's not an issue.

Cheating vs. collaboration: link on website that points to Kris's version of a cheating policy.

Grading! There will be homeworks, and there will probably be homework quizzes (a handful, so probably 10 + 5%). There will be a midterm at least (possibly two), so that's like 15%. Labs and project are like 10 and 30%, and the final is 30%.

Couple things to note: lab is very important, because this is a lab course. If you take the final and the midterm and the quizzes into account, that's 50% of your grade.

Lab lecture in this room (306 Soda, F 2-3p). Will probably have five weeks of lab lecture. Section's 3-4. Starting tomorrow.

Office hours to be posted -- as soon as website is up. Hopefully by tomorrow morning.

King Silicon

FINFETs (from Berkeley, in use by Intel). What can you do with 22nm tech? Logic? You get something more than $10^6$ digital gates per $mm^2$. SRAM, you get something like $10\ Mb/mm^2$; Flash and DRAM, something like $10\ MB/mm^2$. You want to put your MIPS processor on there, or a 32-bit ARM Cortex? A small but efficient machine? On the order of $10^5$ gates, so about $0.1\ mm^2$. You don't need a whole lot of RAM and flash for your program: maybe a megabit of RAM, a megabyte of flash, and that adds up to $0.3\ mm^2$. Even taking into account the cost of packaging and testing, you're making a chip for a few pennies.

Think of the cell phone: a processor surrounded by a whole bunch of peripherals. I/O devices (speakers, screens, LEDs, buzzer; microphone, keypad, buttons, N-megapixel camera, 3-axis accelerometer, 3-axis gyroscope, 3-axis magnetometer, touchscreen, etc), networking devices (cell, Wifi, bluetooth, etc). The cool thing here is that you can get all of these sensors in a little chip package. One thing to note: the microprocessor in general will not want to interface with these directly. There's a whole cloud of "glue logic" that frees the processor from having to deal with the idiosyncrasies of these things -- lots of different interfaces you have to deal with. Another way of looking at this: microprocessor at the core, glue logic around the outside of that, which talks to the analog circuitry, which talks to the actual transducers (something has to do conversions between different energy domains).

Another way of looking at this is that you have this narrow waist of microprocessors, which connects all of this stuff. The real reason we do this is to get up to software. One goal of this class is to make you understand the tradeoffs between HW and SW. HW is faster, lower power, cheaper, more accurate (along several axes, e.g. timing); SW is more flexible. If we knew everything people wanted to do, we'd put it in HW. Everything is better in HW, except that when you put it in HW, it's fixed. In general, you've got a bunch more people working in SW than in HW. This class is nice in that it connects these two worlds.

If you can cross that bridge and understand how to see a software problem during the hardware design stages, or solve a HW bug through software, you can be the magician.

What we're going to do this semester in the project is similar to previous projects in that we'll have a MIPS processor. It looks like a 3-stage pipeline design, and we'll have you do a bunch of hardware interfaces (going across the HW/SW boundary, not into analog, obviously). We have, for example, video; we might do audio out; we'll do a keyboard for sure, and a timer. All of this will end up being memory-mapped I/O so that software on the processor can get to it, and it'll also have interrupts (exceptions). There are not that many people who understand interrupts and can design that interface well so that SW people are happy with it.

You will not be wiring up chips on breadboards (or protoboards) the way we used to in this class. You'll basically be using a text editor to write Verilog, which is an HDL. There are a couple of forms: one is structural, where you actually specify the nodes. In the first lab you'll do that, but afterwards, you'll be working at a behavioral level. You'll let the synthesis engine figure out how to take that high-level description and turn it into the right stuff in the target technology. It has to do a mapping function, and eventually place & route.

Lot of logic. There's a whole bunch of underlying technologies you might map to; in the end, you might go to an IC where there's some cell library that the particular foundry gives you. Very different from mapping to an FPGA, which is what you'll be doing this semester. It's the job of the synthesis tool to turn text into the right set of circuits, and that goes into simulation engine(s), which lets you go around one of these loops in minutes for small designs and hours for larger designs, so you can iterate on your design and make sure it's right.

Some of you have used LTSpice and are used to using drawing to get a schematic, and that's another way to get into this kind of system as well, but that's structural. A big part of this course, unfortunately, is learning and understanding how these tools work: how I go through the simulation and synthesis process.

The better you get at navigating the software, the better a digital designer you'll be. Painful truth of it all. Reality is that this is exactly the way it works in industry. Nature of the IC CAD world. Something like a $10B/yr industry. Whole lot better than plugging into a board.

FPGA board: fast and cheap upfront, but expensive per part. Other part of spectrum is to go with an IC or application-specific integrated circuit (ASIC), which is slow and costly upfront, but cheap per unit. Something in between: use an FPGA + commercial off-the-shelf chips, and a custom PCB. Still expensive per part (less so), but it's pretty fast.

FPGA? Field-programmable gate array. The core of an FPGA is the configurable logic block. The whole idea behind a CLB is that you have the dataplane, where you have a bunch of digital inputs to the box, and some number of outputs in the data plane, and there's a separate control plane (configuration) that's often loaded from a ROM or flash chip externally when the chip boots up. Depending on what you put in, it can look different. Fast, since it's implemented at the HW level.

If you take a bunch of CLBs and put them in an array, and you put a bunch of wiring through that array, that is configurable wiring.

If we made this chip and went through the process, when we turned it into a single chip, we'd take everything we put into this, it'd be less than a square millimeter of 22nm silicon.

Talk about how FPGAs make it easy to connect external devices to a microprocessor.

Course material: we can start from systems perspective: systems are composed of datapath and control (FSM), or we can start from the very bottom: transistors compose gates, which get turned into registers and combinational logic, which gets turned into state and next state logic, which make up the control. Also, storage and math/ALU (from registers and combinational logic) makes up the datapath.

CS 150 Lab Lecture 0

August 24, 2012

Note: please make sure to finish labs 2 and 4, since those will be going into your final project. Labs will run first 6 weeks, after which we will be starting the final project.

Large design checkpoint, group sizes < 3.

stuff.

CS 150: Digital Design & Computer Architecture

August 28, 2012

Admin

Lab: cardkey access -- working on that; not a problem yet. Labs: T 5:30-8:30, W 5-8, θ 5-8. Discussion section: looking for a room. Office hours online: θ 11-12, F 10:30-11:30, in 512 Cory.

Reading: Ch. 4 (HDL). This week: through 4.3 (section that talks about structural Verilog). For next week: the rest. It is Verilog only; we're not going to need VHDL. If you get an interview question in VHDL, answer it in Verilog.

Taxonomy

You've got HDLs, and there are really two flavors: VHDL and Verilog. Inside of Verilog you've got structural (built from gates) and behavioral (say the output is equal to a & b).
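A minimal sketch of the two flavors, assuming a 2:1 mux as the example (module and gate instance names here are made up for illustration):

```verilog
// Structural: explicit gate instances; we specify the actual nodes.
module mux2_structural (input a, b, sel, output y);
    wire nsel, t0, t1;
    not g0 (nsel, sel);
    and g1 (t0, a, nsel);
    and g2 (t1, b, sel);
    or  g3 (y, t0, t1);
endmodule

// Behavioral: just say what the output equals; synthesis picks gates.
module mux2_behavioral (input a, b, sel, output y);
    assign y = sel ? b : a;
endmodule
```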

Abstraction

Real signals are continuous in time and magnitude. Before you do anything with signals these days, you'll discretize in time and magnitude -- generally close enough to continuous for practical purposes. If it's a serial line, there's some dividing line in the middle, and the HW has to make a decision. Discretizing at a regular time interval gives you a clock (CLK); two values of magnitude is called binary.

Hierarchy

Compose bigger blocks from smaller blocks. Principle of reuse -- modularity based on abstraction is how we get things done (Liskov). Reuse tested modules. Very important design habit to get into. Both partners work on and define interface specification. Layering. Expose inputs and outputs and behavior. Define spec, then divide labor.

One partner implements module, one partner implements test harness.

Regularity: because transistors are cheap, and design time is expensive, sometimes you build smaller (simpler) blocks out of tested bigger blocks. Key pieces of what we want to do with our digital abstraction.

Abstraction is not reality. Simulation: Intel FDIV bug in the original Pentium. Voltage sag because of relatively high wire resistance.

Lab 0: our abstraction is structural Verilog. There are tons of online tutorials on Verilog; Ch 4.3 in H&H is a good reference on that; your TAs are a good reference. Pister's not a good reference on the syntax. You're allowed to drop a small number of different components on your circuit board and wire them up. If you want to make some circuit, you can.

Powers of two!

FDIV bug: from EE dept at UCLA: outraged that they had not done exhaustive testing.

note: $1 yr \approx \pi \cdot 10^7 s$. With this approximation, $\pi$ and $2$ are about the same. Combinatorial problem.

Combinational logic vs. sequential logic. Combinational logic: outputs are a function of current inputs, potentially after some delay (memoryless), versus sequential, where output can be a function of previous inputs.

Combinational circuits have no loops (no feedback), whereas circuits with memory have feedback. Classic: SR latch (two NOR gates cross-coupled to each other).

So let's look at the high-level top-down big picture that we drew before: system design comes from a combination of datapath and control (FSM). On the midterm (or every midterm Pister's given for this course), there's going to be a problem about SRAM, and you're going to have to design a simple system with that SRAM.

e.g. Given 64k x 16 SRAM design, design a HW solution to find the min and max two's complement numbers in that SRAM.
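A hedged sketch of one such solution -- the SRAM port names and the control style are my assumptions, and a real synchronous SRAM's one-cycle read latency is glossed over here; the point is the datapath (address counter, two registers, signed compares) plus a tiny FSM:

```verilog
module minmax (
    input                    clk, start,
    input  signed [15:0]     dout,   // SRAM data out (assumed port)
    output reg        [15:0] addr,   // SRAM address
    output reg signed [15:0] min, max,
    output reg               done
);
    always @(posedge clk) begin
        if (start) begin
            addr <= 0;
            min  <= 16'sh7FFF;           // most positive 16-bit value
            max  <= 16'sh8000;           // most negative 16-bit value
            done <= 0;
        end else if (!done) begin
            if (dout < min) min <= dout; // signed comparisons
            if (dout > max) max <= dout;
            if (addr == 16'hFFFF) done <= 1;
            else addr <= addr + 1;
        end
    end
endmodule
```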

Things you need to know about transistors for this class: you already know them.

Wired OR (could be a wired AND, depending on how you look at it). Open drain or open collector: this sort of thing.

Zero static power: the CMOS inverter. No longer true; power per gate is going down, but the number of gates goes up. Leakage current ends up being on the order of an amp. Also, increasingly, gates leak.

switching current: charging and discharging capacitors. $P_{\text{switching}} = \alpha C V^2 f$

crowbar current: $I_{CC}$. While the voltage is swinging from min to max or vice versa, this current exists. All of these things come together to limit the performance of a microprocessor.

a minterm is a product containing every input variable or its complement, and a maxterm is a sum containing every input variable or its complement.

CS 150: Digital Design & Computer Architecture

August 30, 2012

Introduction

Finite state machines, namely in Verilog. If we have time, canonical forms and more going from transistor to inverter to flip-flop.

So. The idea with lab1 is that you're going to be making a digital lock. The real idea is that you're going to be learning behavioral Verilog.

Finite State Machines

Finite state machines are sequential circuits (as opposed to combinational circuits), so they may depend on previous inputs. What we're interested in are synchronous (clocked) sequential circuits. In a synchronous circuit, the additional restriction is that you only care about in/out values on the (almost always) positive-going edge of the clock.

Drawing with a caret on it refers to a circuit sensitive on a positive clock edge. A bubble corresponds to the negative edge.

If we have a clock, some input D, and output Q, we have our standard positive edge-triggered D flip-flop. The way we draw an unknown value, we draw both values.

A register is one or more D flip-flops with a shared clock.

Blocking vs. non-blocking assignments.
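A hedged sketch of the classic swap illustration (module and signal names are mine):

```verilog
module swap_demo (input clk, load, input [3:0] a_in, b_in);
    reg [3:0] a, b;
    // Non-blocking (<=): all right-hand sides are sampled before any
    // update happens, so a and b genuinely swap each clock edge.
    always @(posedge clk)
        if (load) begin a <= a_in; b <= b_in; end
        else begin
            a <= b;
            b <= a;
        end
    // With blocking (=) instead, "a = b; b = a;" executes in order:
    // a is overwritten first, so both registers end up with old b.
endmodule
```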

So. We have three parts to a Moore machine: state, next-state logic, and output logic. A Mealy machine is not very different (its outputs may also depend directly on the inputs).

Canonical forms

Minterms and maxterms. The truth table is the most obvious way of writing down a canonical form. And then there's the minterm expansion and the maxterm expansion. Both are popular and useful for different reasons. A minterm is a product term containing every input variable, while a maxterm is a sum term containing every input variable. Think of a minterm as a way of specifying the ones in the truth table. The construction looks like disjunctive normal form.

Maxterms are just the opposite: you're trying to knock out rows of the truth table. If you've got some function that's mostly ones, you have to write a bunch of minterms to get it, as opposed to a handful of maxterms. Construction looks like conjunctive normal form.

Both the minterm and maxterm expansions are unique.
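A quick worked example (mine, not from lecture): take $f(A,B)$ with ones in truth-table rows 1, 2, and 3. The minterm expansion needs three product terms, but only row 0 is a zero, so the maxterm expansion is a single sum term:

$$f = \sum m(1,2,3) = \bar{A}B + A\bar{B} + AB = \prod M(0) = A + B$$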

de Morgan's law: "bubble-pushing".

CS 150: Digital Design & Computer Architecture

September 4, 2012

FSM design: Problem statement, block diagram (inputs and outputs), state transition diagram (bubbles and arcs), state assignment, state transition function, output function.

Classic example: string recognizer. Output a 1 every time you see a one followed by two zeroes on the input.

When talking about systems, there's typically datapath and the FSM controller, and you've got stuff going between the two (and outside world interacts with the control).

Just go through the steps.
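Working the string-recognizer example through those steps in behavioral Verilog -- a hedged sketch; the state names and encoding are my own choices:

```verilog
// Moore machine: output depends only on the current state.
module rec100 (input clk, rst, in, output out);
    localparam S0 = 2'd0,  // nothing useful seen
               S1 = 2'd1,  // seen "1"
               S2 = 2'd2,  // seen "10"
               S3 = 2'd3;  // seen "100" -> output a 1
    reg [1:0] state, next;

    always @(posedge clk)              // state register
        state <= rst ? S0 : next;

    always @(*)                        // next-state logic
        case (state)
            S0: next = in ? S1 : S0;
            S1: next = in ? S1 : S2;
            S2: next = in ? S1 : S3;
            S3: next = in ? S1 : S0;
            default: next = S0;
        endcase

    assign out = (state == S3);        // output logic (Moore)
endmodule
```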

Low-level stuff

Transistor turns into inverter, which turns into inverter with enable, which turns into D flip-flop.

Last time: standard CMOS inverter. If you want to put an enable on it, there are several ways to do that: stick an NMOS transistor in series with the output, e.g. When enable is low, the output is Z (high impedance) -- it's not trying to drag the output anywhere.

It turns out (beyond the scope of this class) that NMOS transistors are good at pulling things down, but not so much at pulling things up. It turns out you really want to add a PMOS transistor to pull up. We want this transistor to be on when enable is 1, but it turns on when its gate is low, so we stick an inverter on enable. This is common; it's called a pass gate (butterfly gate). Pass gates are useful, but they're not actually driving anything; they just allow current to flow through. If you put too many in series, though, things slow down.

Pass-gates as controlled inverters; can be used to create a mux.

SR (set/reset) latch. Built from NOR gates. The useful thing about NOR and NAND is that with the right constant input, they act as inverters. That is why they are useful in making latches (if we cross-couple two of them).

If S = R = 0, then NOR gates turn into inverters, and this thing effectively turns into a bistable storage element. If I feed in a 1, it'll force the output to be 0, which forces the original gate's input to be a 1.
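A minimal structural sketch of that cross-coupled NOR pair (instance names are mine):

```verilog
module sr_latch (input s, r, output q, qbar);
    nor n1 (q,    r, qbar);  // with R = 0, this is an inverter of qbar
    nor n2 (qbar, s, q);     // with S = 0, this is an inverter of q
endmodule
```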

Clock systems. Suppose we take our SR latch and put an AND gate in front of S and R with an enable line on it, we can now turn off this functionality, and when enable is low, S and R can do whatever they want; they're not going to affect the outputs of this thing. You can design synchronous digital systems using simple level sensitive latches.

Contrast with ring oscillator (3-stage; simplest). That is unstable -- if I put an odd number of inverters in series, there is no stable configuration. Very useful for generating a clock. Standard crystal oscillator: Pierce configuration.

An odd number of stages is unstable; two stages are stable; with more stages you have to worry about other things.

Can be clocked, but you have to be careful. For example: if I wanted to design a 1-bit counter with a clocked system, we can consider a level-sensitive D latch. (This is what happens when you get a latch in Verilog: when not enabled, the synthesis tool will have it keep its previous value.) If you do that, it turns out the loop probably gives you enough delay that while the clock is high, the output oscillates. So that's bad; maybe we'll make the enable line (the clock line) really narrow. Not surprisingly, that's called narrow clocking. For simple systems, you can get away with that: make the clock-high time similar to one or a few gate delays. However, it's ugly; don't do that. Back in the day, people did this, and systems were simple enough that they could get away with it.

What I really want is my state register and its output going through some combinational logic block with some input to set my next state, and only once per clock period does this thing happen. The problem here is in a single clock period, I get a couple of iterations through the loop. So how do I take my level-sensitive latch (I've turned it into a D latch with enable, and that's my clock), and when clock is low, there's no problem. I don't worry that my input's going to cruise through this thing; and when it's high, I want my input (the D input) to remain constant.

As long as clock is high, I don't care; it'll maintain its state, since I'm not looking at those inputs. There are a whole bunch of ways you can do it (all of which get used somewhere), but the safest (and probably most common) is to stick another latch (another clocked level-sensitive latch) in front of it with an inverter. That's now my input.

So when the clock is low, the first one is enabled, and it's transparent (it's a buffer). This is called an edge-triggered master/slave D flip-flop.

The modern way of implementing the basic D latch is by using feedback for the storage element, and an input (both the feedback and the input are driven by out-of-phase enable). My front end (the master) is driving the signal line when the clock is low, and, conversely, when the clock is high, the feedback inverter will be driving the line. Bistable storage element maintaining its state, and input is disconnected.

Now, with the slave, same picture. Sensitive when clock is high, as opposed to master, which is sensitive when clock is low. The idea is that the slave prevents anything from getting into the storage element until it stabilizes.

At the end of the day, the rising edge of the clock latched D to Q. Variation that happened after, doesn't propagate to master; variation that happened before, slave wasn't listening. So now we have flip-flops and can make FSMs.
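In Verilog terms, a hedged structural sketch of that master/slave arrangement (module names are mine; in practice you'd just write the behavioral `always @(posedge clk) q <= d;` form):

```verilog
// Level-sensitive D latch: transparent while en is high, holds otherwise.
module d_latch (input en, d, output reg q);
    always @(*)
        if (en) q = d;   // no else: state held (inferred latch)
endmodule

// Positive edge-triggered D flip-flop from two latches.
module dff_ms (input clk, d, output q);
    wire m;
    d_latch master (.en(~clk), .d(d), .q(m)); // transparent while clk low
    d_latch slave  (.en(clk),  .d(m), .q(q)); // transparent while clk high
endmodule
```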

CS 150: Digital Design & Computer Architecture

September 6, 2012

Verilog synthesis vs. testbenches (H&H 4.8)

There's the subset of the language that's actually synthesizable, and then there's more stuff just for the purpose of simulation and testing. Way easier to debug via simulation.

Constructs that don't synthesize:

  • #t: used for adding a simulation delay.
  • ===, !==: 4-state comparisons (0, 1, X, Z).
  • System tasks (e.g. $display, prints to console in a C printf-style format that's pretty easy to figure out; $monitor, prints to console whenever its arguments change)

In industry, it's not at all uncommon to write the spec, write the testbench, then implement the module. Once the testbench is written, it becomes the real spec. "You no longer have bugs in your code; you only have bugs in your specification."

How do we build a clock in Verilog?

```verilog
// free-running clock for simulation (not synthesizable: uses #delay)
parameter halfT = 5;
reg CLK;
initial CLK = 0;
always begin
    #(halfT) CLK = ~CLK;   // toggle every half period
end
```

H&H example 4.39 shows you how to read from a file.

```verilog
// instantiate the device under test
silly_function dut(.a(a), .b(b), .c(c), .s(s));

// test vectors: {a, b, c, expected output}
// (a, b, c, out_exp are assumed declared as regs, s as a wire)
reg [3:0] testvect [10000:0];
reg [13:0] num = 0;
initial $readmemb("test.tv", testvect); // done when input is 4 bits of X (don't care)

// apply the next vector just after the rising edge
always @(posedge CLK)
    #1 {a, b, c, out_exp} = testvect[num];

// check the result on the falling edge, away from the transition
always @(negedge CLK) begin
    if (s !== out_exp) $display("error ... ");
    num <= num + 1;
end
```

How big can you make shift registers? At some point, IBM decreed that every register on every IBM chip would be part of one gigantic shift register. So you've got your register file feeding your ALU; it's a 32 x 32 register file. There's a test signal; when it's high, the entire thing becomes one shift register. Why? Testing. This became the basis of JTAG. Another thing: dynamic fault imaging. Take a chip and run it inside a scanning electron microscope, which detects backscatter from electrons. It turns out that a metal layer absorbs electrons differently depending on what voltage it's at, and oxides absorb depending on the voltage of the metal beneath them. So you get a different intensity depending on the voltage.

We can also take these passgates and make variable interconnects. So if I've got two wires that don't touch, I can put a passgate on there and call that the connect input.

Last time we talked about MUXes; we did a two-to-one MUX. With pass gates I can make a configurable MUX: given some inputs, I select one according to my select input.

Next time: more MIPS, memory.

CS 150: Digital Design & Computer Architecture

September 11, 2012

Changed office hours this week. CLBs, SAR, K-maps.

Last time: we went from transistors to inverters with enable to D-flipflops to a shift register with some inputs and outputs, and, from there to the idea that once you have that shift register, then you can hook that up with an n-input mux and make an arbitrary function of n variables.

This gives me configurable logic and configurable interconnects, and naturally I take the shift out of one and into another, and I've got an FPGA.

The LUT is the basic building block: I get four of those per slice, plus some other stuff: fast ripple-carry logic for adders, and the ability to take two 6-LUTs and put them together to form a 7-LUT. So: pretty flexible, a fair amount of logic, and that's a slice. One CLB is two slices.

And I've got, what is that, 8640 CLBs per chip. Also: DSP48 blocks -- 64 of these, and each one is a little 48-bit configurable ALU. So that gives you something like 70,000 6-LUTs + D-ffs. So that's what you've got; that's what we work with.

Now, let's talk about a successive approximation register (SAR) analog-to-digital converter. A very popular device. Link to a chip that implemented the digital piece of this thirty years ago. Why are we looking at this? It's nice; it's an example of a "mixed-signal" system (i.e. analog mixed with digital in the same block, where you have a good amount of both).

It turns out that analog designers need to be good digital designers these days. I was doing some consulting for a company recently. They had brilliant analog designers, but they had to do some digital blocks on their chips.

"Real world" interfaces. Has some number of output bits that go into the DAC; the DAC's output is simply "linear". You trust the analog designer to give you this piece, and the digital comparator sample and hold circuit, with the sample input, and here's your analog input voltage. So real quick we'll look at what's in those blocks (even though this isn't 150 material). S/H: simplest example is just a transistor. Maybe it's a butterfly gate; typically, there's some storage capacitor on the outside so that if you've got your input voltage; when it goes low, that is held on there.

Maybe there's some buffer amplifier (little CMOS opamp so it can drive nice loads); capacitive input, so signal will stay there for a long time. Not 150 material.

The DAC: a simple way of making this is to generate a reference voltage (diode-connected PMOS with voltage division, say), which you mirror, each output tied in through a switch, with all of these sharing the same gate. The comparator's a little more subtle; maybe when we talk about SRAMs and DRAMs.

Anyway. So. Now we have the ability to generate an analog voltage under digital control. We sample that input and are going to use that signal. This tells us whether the output of the DAC is too big. That together is called a SAR.

So what does that thing do? There's a very simple (dumb) SAR: a counter. From reset, its digital output increases linearly in time; at some point it crosses the analog $V_{in}$, and at that point, you stop. But that's not such a great approach: it takes between 1 and 1024 cycles to get the result (for 10 bits). The better way is to do a binary search. Fun to do with dictionaries and kids; also works here. FSM: go bit-by-bit, starting with the most significant bit.
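A hedged Verilog sketch of that bit-by-bit search (port names, width, and the comparator polarity are my assumptions, not from lecture):

```verilog
module sar #(parameter N = 10) (
    input              clk, start,
    input              too_big,    // comparator: DAC output > Vin
    output reg [N-1:0] dac,        // code driving the DAC
    output reg         done
);
    reg [N-1:0] trial;             // one-hot marker: the bit under test
    always @(posedge clk)
        if (start) begin
            trial <= 1 << (N-1);   // begin with the MSB
            dac   <= 1 << (N-1);
            done  <= 0;
        end else if (!done) begin
            if (trial[0]) begin    // just tested the LSB: finish up
                done <= 1;
                if (too_big) dac <= dac & ~trial;
            end else begin         // keep or clear tested bit, set next one
                dac   <= (too_big ? dac & ~trial : dac) | (trial >> 1);
                trial <= trial >> 1;
            end
        end
endmodule
```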

Better solution (instead of using an oversized tree -- better in the sense of less logic required): use a shift register (and compute all bits sequentially), or a counter going into a decoder (sixteen outputs, of which I only need 10).

Next piece: another common challenge and where a lot of mistakes get made: analog stuff does not simulate as well. While you're developing and debugging, you have to come up with some way of simulating.

Good news: you can often go in and fix things like that. Sort of an aside (although it sometimes shows up in my exams), once you put these transistors down, and then you've got all these layers of metal on top. Turns out that you can actually put this thing in a scanning electron microscope and use undedicated logic and go in with a FIB (focused ion beam) and fix problems. "Metal spin".

Back to chapter two: basic gates again. de Morgan's law: $\overline{AB} = \bar{A} + \bar{B}$, i.e. $\overline{\prod A_i} = \sum \bar{A_i}$. Similarly, $\overline{\sum A_i} = \prod \bar{A_i}$. Suppose you have a two-level NAND/NAND circuit: that is a sum of products (SoP). Similarly, NOR/NOR is equivalent to a product of sums (PoS).

Now, if I do NOR/NOR/INV, this is a sum of products, but the inputs are inverted. This is an important one. This particular one is useful because of the way you can design logic. The way we used to design logic a few decades ago (and the way we might go back in the future) was with big long strings of NOR gates.

So if I go back to our picture of a common source amplifier (erm, inverter), and we stick a bunch of other transistors in parallel, then we have a NOR gate. Remember: MOS devices have parasitic capacitance.

Consider another configuration. Suppose we invert our initial input and connect both the true and inverted versions through a "circled" programmable connection, which can be any of the following: fuse / anti-fuse; mask-programmable (when you make the mask, decide whether to add a contact); a transistor driven by a flip-flop (part of a shift register, e.g.); or an extra gate (double-gate transistors).

So now if I chain a bunch more of these together (all NOR'd together), then I can program many functions. In particular, it could just be an inverter.

I can put a bunch of these together, and I can combine the function outputs with another set of NORs, invert it all at the end, and I end up with NOR/NOR/INV.

These guys are called PLAs (programmable logic arrays), and you can still buy them, and they're still useful. Not uncommon to have a couple of flipflops on them. Will have a homework assignment where you use a 30 cent PLA and design something. Quick and dirty way of getting something for cheap.

Not done anymore because it's slow (huge capacitances), but it may come back because of carbon nanotubes. Javey managed to make a nanotube transistor with a source, drain, and gate; he got transport, the highest current density per square micron of cross-section ever, and showed the physics all worked -- and this thing is 1nm around. What Prof. Ali Javey's doing now is working with nanowires and showing that you can grow these things on a roller and roll them onto a surface (like a plastic surface), putting down layers of nanowires in alternating directions.

You can imagine (we're a ways away from this) where you get a transistor at each of these locations, and you've got some CMOS on this side generating signals, and CMOS on the output taking the output (made with big fat gigantic 14nm transistors), and you can put $10^5$ transistors per square micron (not pushing it, since density can get up to $10^6$). End of road for CMOS doesn't mean you ditch CMOS. Imagine making this into a jungle gym; then you're talking about $10^8$ carbon nanotubes per cubic micron, etc. The fact that we can make these long thin transistors on their lengths means that this might come back into fashion.

CS 150: Digital Design & Computer Architecture

September 13, 2012

Questions of the form: given a 16x16 SRAM, design a circuit that will find the smallest positive integer, or the biggest even number, or count the number of times 17 appears in memory, etc. Kris loves these questions where you figure out the design (remember: separate datapath and control, and come up with it on your own) -- will probably show up on both midterm and final.

Office hours moved.

So... last time, we were talking about PLAs (programmable logic arrays) and stuff (NOR/NOR equivalent to AND/OR). You'll hear people talking about the AND plane and the OR plane, even though they're both NORs. If you look at Fig 2.2.3, they'll show the same regular and inverted signals, and they just draw this as a line with an AND gate at the end. Pretty common way to draw this; likewise lines with OR gates.

Variant of PLA called a PAL -- subsets of "product" terms going to "OR" gates.

Beginning of complex programmable logic devices (CPLDs, FPGAs). You can still buy these registered PALs.

Why would you use this over a microprocessor? Faster. Niche. The "oh crap" moment when you finish your board and you find that you left something out.

I want to say a little about memory, because you'll be using block RAM in your lab next week. There's a ton of different variations of memory, but they all have a couple of things in common: a decoder (address decoder), where you take $n$ input bits and turn them into $2^n$ word lines in a memory that has $2^n$ words; and a cell array. Going through the cell array you have some number of bit lines -- we'll call it either $k$ or $2k$, depending on the memory. Those go into some amps / drivers, and then out the other side you have $k$ inputs and/or outputs. Sometimes shared (depends whether or not there's output-enable). Write-enable, output-enable, sometimes clock, sometimes d-in as well as d-out, sometimes multiple d-outs (multiple address/data pairs); there's a whole bunch of variation in how this happens. Conceptually, though, it all comes down to something that looks like this.

So what's that decoder look like? Decoders are very popular circuits: they generate all minterms of their input (gigantic products). Note that if you invert all of the outputs, we get the maxterms (sums).
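A minimal sketch of a decoder at small scale (2-to-4; names are mine) -- each output is one minterm of the inputs:

```verilog
module dec2to4 (input [1:0] a, output [3:0] y);
    assign y[0] = ~a[1] & ~a[0];  // minterm 0
    assign y[1] = ~a[1] &  a[0];  // minterm 1
    assign y[2] =  a[1] & ~a[0];  // minterm 2
    assign y[3] =  a[1] &  a[0];  // minterm 3
endmodule
```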

That was DRAM. Now, SRAM:

Still have the word line going across; now I have a bit line and a negated bit line. Inside, I have two cross-coupled inverters (a bistable storage element). That's four transistors already (vs. one for DRAM), and I still have to access it: an access transistor going to each side, hooked up to the word line. When I read this thing, I put in an n-bit address, and the access transistors pull the bit lines. We want these cells as small as possible for bit density: 6T, sense amp needed. What you usually do is pre-charge $BL$ and $\bar{BL}$. As soon as you raise the word line for a particular row, you find that one of them starts discharging, and the other stays constant. Analog sensing is present so you can make a decision much, much faster.

That's how reads work; writes are interesting. Suppose I have some $D_{in}$: what do I do? I could put an output-enable in each cell so that when writing, it doesn't drive the output, but that would increase size significantly. So what do I do? I just make big burly inverters and drive the bit lines. The big transistors down there overcome the small transistors up in the cell, and they flip the bit. (PMOS is also generally weaker than NMOS; the notion of "bigger" here is $W/L$.) Just overpower it. One of the rare times that you have PMOS pulling up and NMOS pulling down fighting each other.

Transistors leak. They can leak a substantial amount. By lowering voltage, I reduce power. It turns out there's a nonlinear relationship here, and so the transistors leak a lot less.

So that's SRAM. The other question? What about a register? What's the difference between this and a register file? Comes back to what's in the cell array. We talked that a register is a bunch of flipflops with a shared clock and maybe a shared enable. Think of a register as having the common word line, and you've got a D flipflop in there. There's some clock shared across the entire array, and there's an enable on it and possibly an output, depending on what kind of system you've got set up. We've got D-in, D-out, and if I'm selecting this thing, presumably I want output-enable; if I'm writing, I need to enable write-enable.

So. You clearly have the ability to make registers on chips, so you can clearly do this on the FPGA. Turns out there's some SRAMs on there, too. There's an external SRAM that we may end up using for the class project, and there's a whole bunch of DDR DRAM on there as well.

Canonical forms

Truth tables, minterm / maxterm expansions. These we've seen.

If you have a function equal to the sum of minterms 1, 3, 5, 6, 7, we could implement it with fewer gates by using the maxterm expansion (the zeros are only at rows 0, 2, 4, so three maxterms instead of five minterms).

"Minimum sum of products", "minimum product of sums".

Karnaugh Maps

Easy way to reduce to minimum sum of products or minimum product of sums (Section 2.7). Based on the combining theorem, which says that $XA + X\bar{A} = X$. Ideally, moving between adjacent rows or columns should change just a single variable, so we use Gray codes (e.g. 00, 01, 11, 10). Graphical representation!

CS 150: Digital Design & Computer Architecture

September 18, 2012

Lab this week you are learning about Chipscope. Chipscope is kind of what it sounds like: it allows you to monitor things happening in the FPGA. One of the interesting things about Chipscope is that since it's an FSM monitoring stuff in your FPGA, it also gets compiled down, and it changes the location of everything that goes into your chip. It can actually make your bug go away (e.g. timing bugs).

So. Counters. How do counters work? If I've got a 4-bit counter and I'm counting from 0, what's going on here?

D-ff with an inverter and enable line? This is a T-ff (toggle flipflop). That'll get me my first bit, but my second bit is slower: $Q_1$ wants to toggle only when $Q_0$ is 1. Subsequent bits want to toggle when all lower bits are 1.

Counter with en: enable is tied to the toggle of the first bit. Counter with ld: four input bits, four output bits. Clock. Load. Then we're going to want to do a counter with ld, en, rst. Put in logic, etc.

Quite common: ripple carry out (RCO), where we AND $Q[3:0]$ and feed this into the enable of $T_4$.
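Putting those pieces together, a hedged sketch of the 4-bit counter with ld, en, rst, and RCO (priority order among the controls is my assumption):

```verilog
module counter4 (
    input            clk, rst, ld, en,
    input      [3:0] d,
    output reg [3:0] q,
    output           rco
);
    always @(posedge clk)
        if (rst)     q <= 4'b0;
        else if (ld) q <= d;        // parallel load
        else if (en) q <= q + 1;    // count
    assign rco = &q;                // AND of Q[3:0]
endmodule
```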

Ring counter (shift register with one-hot output). If reset is low I just shift this thing around and make a circular shift register; if high, I clear the out bit.

Mobius counter (also called a Johnson counter): just a ring counter with a feedback inverter in it. It takes whatever state is in there, and after $n$ clock ticks, it inverts itself. So you have $n$ flipflops, and you get $2n$ states.

And then you've got LFSRs (linear feedback shift registers). Given N flipflops, we know that a straight up or down counter will give us $2^N$ states. It turns out that an LFSR gives you almost that ($2^N - 1$; the all-zeros state is excluded). So why do that instead of an up-counter? This can give you a PRNG. Fun times with Galois fields.
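A minimal sketch at N = 4, assuming the maximal-length taps for that width (the polynomial $x^4 + x^3 + 1$, i.e. feedback from bits 3 and 2), which cycles through all $2^4 - 1 = 15$ nonzero states:

```verilog
module lfsr4 (input clk, rst, output reg [3:0] q);
    always @(posedge clk)
        if (rst) q <= 4'b0001;               // any nonzero seed works
        else     q <= {q[2:0], q[3] ^ q[2]}; // shift in the feedback bit
endmodule
```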

Various uses, seeds, high enough periods (Mersenne twisters are higher).

RAM

Remember, decoder, cell array, $2^n$ rows, $2^n$ word lines, some number of bit lines coming out of that cell array for I/O with output-enable and write-enable.

When output-enable is low, D goes to high-Z. At some point, some external device starts driving some Din (not from memory). Then I can apply a write pulse (write strobe), which causes our data to be written into the memory at this address location. Whatever was driving it releases, so it goes back to high-impedance, and if we turn output-enable on again, we'll see "Din" from the cell array.

During the write pulse, we need Din stable and address stable. We have a pulse because we don't want to break things. Bad things happen.

Notice: no clock anywhere. Your FPGA (in particular, the block RAM on the ML505) is a little different in that it has registered inputs (addr & data). First off, it's very configurable; there are all sorts of ways you can set it up. The address in particular goes into a register, and from there into a decoder before it goes into the cell array. What comes out of that cell array is a little different also: there's a data-in line that goes into a register, and a separate data-out that can be configured in a whole bunch of different ways so that you can do a bunch of different things.

The important thing is that you can apply your address to those inputs, and it doesn't show up until the rising edge of the clock. There's the option of having either registered or non-registered output (non-registered for this lab).
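A hedged sketch of that registered-address timing (widths and names are assumptions, and whether a tool maps this to actual block RAM depends on the coding pattern; this just models the behavior described):

```verilog
module sync_ram (
    input         clk, we,
    input  [9:0]  addr,
    input  [15:0] din,
    output [15:0] dout
);
    reg [15:0] mem [0:1023];
    reg [9:0]  addr_r;
    always @(posedge clk) begin
        addr_r <= addr;             // address captured on the rising edge
        if (we) mem[addr] <= din;   // synchronous write
    end
    assign dout = mem[addr_r];      // non-registered output, as in the lab
endmodule
```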

So now we've got an ALU and RAM. And so we can build some simple datapaths. For sure you're going to see on the final (and most likely the midterm) problems like "given a 16-bit ALU and a 1024x16 sync SRAM, design a system to find the largest unsigned int in the SRAM."

Demonstration of clock cycles, etc. So what's our FSM look like? Either LOAD or HOLD.

On homework, did not say sync SRAM. Will probably change.

CS 150: Digital Design & Computer Architecture

September 20, 2012

Non-overlapping clocks. n-phase means that you've got n different outputs, and at most one high at any time. Guaranteed dead time between when one goes low and next goes high.

K-maps

Finding minimal sum-of-products and product-of-sums expressions for functions.

  • On-set: all the ones of a function.
  • Implicant: one or more circled ones in the on-set; implicants go up by powers of two in the number of ones they cover.
  • A minterm is the smallest implicant you can have.
  • A prime implicant is one that can't be combined with another (by circling a larger group).
  • An essential prime implicant is a prime implicant that contains at least one one not in any other prime implicant.
  • A cover is any collection of implicants that contains all of the ones in the on-set; a minimal cover is one made up of essential prime implicants and the minimum number of implicants.

Hazards vs. glitches. Glitches are when timing issues result in dips (or spikes) in the output; hazards are the potential for glitches. Completely irrelevant in synchronous logic.

Project

3-stage pipeline MIPS150 processor. Serial port, graphics accelerator. If we look at the datapath elements, the storage elements, you've got your program counter, your instruction memory, register file, and data memory: Figure 7.1 from the book. Mix that in with Figure 8.28, which talks about MMIO: that data memory is hooked up to an address and data bus, and if you want to talk to a serial port on a MIPS processor (or an ARM processor, or something like that), you don't address a particular port (it's not like x86); most ports are memory-mapped. So there's actually an MMIO module that is also hooked up to the address and data bus, and for some range of addresses, it's the one that handles reads and writes.

You've got a handful of different modules down here such as a UART receive module and a UART transmit module. In your project, you'll have your personal computer that has a serial port on it, and that will be hooked up to your project, which contains the MIPS150 processor. Somehow, you've got to be able to handle characters transmitted in each direction.

UART

Common ground, TX on one side connected to the RX port on the other side, and vice versa. A whole bunch more pins in different connectors. The basic protocol is called RS232, and it's common (people often refer to it by connector name: DB9, rarely DB25); fortunately, we've moved away from this world and use USB. We'll talk about other protocols later, some synchronous, some asynchronous. Workhorse for a long time, still all over the place.

You're going to build the UART receiver/transmitter and MMIO module that interfaces them. See when something's coming in from software / hardware. Going to start out with polling; we will implement interrupts later on in the project (for timing and serial IO on the MIPS processor). That's really the hardcore place where software and hardware meet. People who understand how each interface works and how to use those optimally together are valuable and rare people.

What you're doing in Lab 4, there's really two concepts of (1) how does serial / UART work and (2) ready / valid handshake.

On the MIPS side, you've got some addresses. Anything that starts with FFFF is part of the memory-mapped region. In particular, the first four are mapped to the UART: they are RX control, RX data, TX control, and TX data.

When you want to send something out the UART, you write the byte -- there's just one bit for the control and one byte for data.

Data goes into some FSM system, and you've got an RX shift register and a TX shift register.

There's one other piece of this, which is that inside of here, the thing interfacing to this IO-mapped module uses this ready bit. If you have two modules, a source and a sink (diagram from the document), the source has some data that it is sending out: it tells the sink when the data is valid, and the sink tells the source when it is ready. And there's a shared "clock" (baud rate), and this is a synchronous interface.

  • source presents data
  • source raises valid
  • when ready & valid on posedge clock, both sides know the transaction was successful.

Whatever order this happens in, source is responsible for making sure data is valid.
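A hedged sketch of the source side of that handshake (signal names and the data pattern are mine; the transaction completes on any rising edge where ready & valid are both high):

```verilog
module rv_source (
    input            clk, rst,
    output reg [7:0] data,
    output reg       valid,
    input            ready
);
    always @(posedge clk)
        if (rst) begin
            data  <= 0;
            valid <= 0;
        end else if (!valid) begin
            data  <= data + 1;   // present some (arbitrary) new data
            valid <= 1;          // then raise valid and hold it
        end else if (ready) begin
            valid <= 0;          // ready & valid: transaction done
        end
endmodule
```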

HDLC? Takes bytes and puts into packets, ACKs, etc.

Talk about quartz crystals, resonators. $\pi \cdot 10^7$.

So: before I let you go, parallel load, n bits in, serial out, etc.
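A hedged sketch of that parallel-in, serial-out shift register -- names are mine, and the idle-high fill bit is an assumption matching a UART line's idle state:

```verilog
module piso #(parameter N = 8) (
    input          clk, load, shift,
    input  [N-1:0] d,
    output         so
);
    reg [N-1:0] sr;
    always @(posedge clk)
        if (load)       sr <= d;                  // parallel load, n bits in
        else if (shift) sr <= {1'b1, sr[N-1:1]};  // shift toward the LSB
    assign so = sr[0];                            // serial out, LSB first
endmodule
```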

CS 150: Digital Design & Computer Architecture

September 25, 2012