6.823 PS 1

commit a09ea13978799fd1a1754c8745145de524af65b7 0 parents
Victor Costan authored
2  .gitignore
@@ -0,0 +1,2 @@
+bin
+tmp
17  .project
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+	<name>pset_writeups</name>
+	<comment></comment>
+	<projects>
+	</projects>
+	<buildSpec>
+		<buildCommand>
+			<name>net.sourceforge.texlipse.builder.TexlipseBuilder</name>
+			<arguments>
+			</arguments>
+		</buildCommand>
+	</buildSpec>
+	<natures>
+		<nature>net.sourceforge.texlipse.builder.TexlipseNature</nature>
+	</natures>
+</projectDescription>
13  .texlipse
@@ -0,0 +1,13 @@
+#TeXlipse project settings
+#Mon Sep 14 00:05:53 EDT 2009
+builderNum=2
+outputDir=bin
+makeIndSty=
+bibrefDir=
+outputFormat=pdf
+tempDir=tmp
+mainTexFile=master.tex
+outputFile=pset.pdf
+langSpell=en
+markDer=true
+srcDir=src
21  README.textile
@@ -0,0 +1,21 @@
+h1. Problem Set Write-Ups by Victor Costan
+
+This site contains write-ups for the problem sets that I (Victor Costan) was
+assigned in my MIT classes.
+
+h2. Installation
+
+The repository is one big Eclipse project, using the TeXlipse plugin that you
+can download at http://texlipse.sourceforge.net/
+
+The write-ups are built using some derivative of the TeX Live distribution
+that can be found at http://www.tug.org/texlive/acquire.html. More
+specifically, the following derivatives are used:
+* MacTeX, on Mac OS X Snow Leopard: http://www.tug.org/mactex/downloading.html
+* the following packages on Ubuntu 9.10: TBD
+
+h2. Academic Honesty
+
+If you're a student in one of my classes, I expect that you won't look at my
+solutions before the deadlines for the problem sets. If you use my work for
+other problem sets, I expect that you will acknowledge my contribution.
2  src/6.823/metadata.tex
@@ -0,0 +1,2 @@
+\newcommand{\PsetClassNumber}{6.823}
+\newcommand{\PsetClassTerm}{Fall 2009}
183  src/6.823/ps1/all.tex
@@ -0,0 +1,183 @@
+\section{Problem 1}
+Please see the attached piece of paper. I don't know how to make circuit
+diagrams in \LaTeX.
+
+\section{Problem 2}
+The value of the second flip-flop, FF1, is inverted each clock cycle.
+Assuming the flip-flop starts out reset, the FF1 bit will be 1 on cycles
+$2k + 1$ (odd cycles), and 0 on cycles $2k$ (even cycles).
+
+The value of the first flip-flop, FF0, only changes when FF1 is 1. So, on
+each odd cycle, the bit in FF0 is inverted. Assuming FF0 also starts out
+reset, the FF0 bit will be 0 on cycles $4k$ and $4k + 3$, and 1 on cycles
+$4k + 1$ and $4k + 2$.
+
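+A quick C simulation double-checks this pattern. The listing below is a
+behavioral sketch, using the convention above that a toggle is visible in the
+cycle in which it occurs.
+
+\begin{lstlisting}[language=C, caption=Sanity-check simulation of the two
+flip-flops, label=problem2:sim]
+#include <stdio.h>
+
+int main(void) {
+	int ff0 = 0;  /* both flip-flops start out reset */
+	for (int cycle = 0; cycle < 8; cycle++) {
+		int ff1 = cycle % 2;  /* FF1 is inverted every cycle */
+		if (ff1)
+			ff0 = !ff0;       /* FF0 is inverted on each odd cycle */
+		/* prints FF1/FF0 = 0/0, 1/1, 0/1, 1/0, 0/0, ... */
+		printf("cycle %d: FF1=%d FF0=%d\n", cycle, ff1, ff0);
+	}
+	return 0;
+}
+\end{lstlisting}
+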
+\section{Problem 3}
+
+\subsection{Part A}
+The task seems to be computing the sum and difference of two 10-element
+vectors. The pseudocode is presented in listing \ref{problem3a:code}.
+
+\lstinputlisting[float=bph, language=C, caption=The task performed by the
+MIPS64 code, label=problem3a:code]{6.823/ps1/problem3a.c}
+
+\subsection{Part B}
+Segment A should perform better, because it has fewer memory loads. Memory
+loads cause pipeline stalls, even if the requested data is in the L1 cache.
+
+\subsection{Part C}
+The two segments can produce different results if there is another process
+writing to the memory holding the two input vectors pointed to by {\tt r1}
+and {\tt r2}.
+
+First, segment A should run faster, so, given a fixed pattern of writes
+$(A_i, V_i, T_i)$ of value $V_i$ to address $A_i$ at time $T_i$, the pattern
+might interfere with segment B's operation, but not with segment A's
+operation.
+
+Second, segment A has the invariant that a sum and a difference are
+guaranteed to be computed from the same two numbers, regardless of how the
+memory contents change. Segment B does not have that invariant, because it
+uses separate loads to compute the sum and the difference.
+
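+The distinction can be illustrated in C. The listing below is a hypothetical
+rendering of the two load strategies discussed above (the function names and
+the {\tt volatile} qualifier are mine, not part of the MIPS64 code).
+
+\begin{lstlisting}[language=C, caption=One load per element versus separate
+loads, label=problem3c:code]
+#include <stdint.h>
+
+/* Reusing loaded values: the sum and difference always come from the
+ * same snapshot of a[i] and b[i]. */
+void one_load_per_element(volatile uint64_t* a, volatile uint64_t* b,
+		uint64_t* sums, uint64_t* differences, uint64_t i) {
+	uint64_t x = a[i], y = b[i];
+	sums[i] = x + y;
+	differences[i] = x - y;  /* invariant: sums[i] + differences[i] == 2*x */
+}
+
+/* Separate loads: another process can overwrite a[i] or b[i] between
+ * the two statements, so the pair may be inconsistent. */
+void separate_loads(volatile uint64_t* a, volatile uint64_t* b,
+		uint64_t* sums, uint64_t* differences, uint64_t i) {
+	sums[i] = a[i] + b[i];
+	differences[i] = a[i] - b[i];
+}
+\end{lstlisting}
+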
+\section{Problem 4}
+
+\subsection{Part A}
+The problem here is that BEQ changes the control flow. More specifically, the
+{\it Instruction Fetch} stage depends on the output of the ALU stage.
+Actually, in a straightforward implementation, the PC (program counter
+register; input for the IF stage) would be updated in the {\it Write Back}
+stage.
+
+The easiest way to ensure correct behavior would be to stall the pipeline
+for 4 cycles after a conditional branching instruction. Stalling would be
+achieved by inserting NOP instructions in the {\it IF} stage.
+
+Better performance can be achieved by introducing control logic that modifies
+the PC register right after the {\it ALU} stage, so the {\it IF} stage can
+use the result immediately. This approach stalls the pipeline for 2 clock
+cycles instead of 4.
+
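+The two options can be compared on a pipeline diagram. The sketch below
+assumes the five-stage {\it IF}, {\it RF}, {\it ALU}, {\it MW} (Memory Wait),
+{\it WB} pipeline discussed in this problem; dots mark cycles in which the
+next instruction has not yet been fetched.
+
+\begin{verbatim}
+cycle:            1    2    3    4    5    6
+BEQ:              IF   RF   ALU  MW   WB
+next (4 stalls):  .    .    .    .    .    IF   (PC updated in WB)
+next (2 stalls):  .    .    .    IF   RF   ALU  (PC updated after ALU)
+\end{verbatim}
+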
+\subsection{Part B}
+The problem is that the DSUB operation uses the result of the DADD operation.
+So the {\it Register File} stage of DSUB must wait for the completion of the
+{\it Write Back} stage of DADD.
+
+The easiest way to ensure correct behavior would be to modify the {\it IF}
+stage to add 3 NOPs after any instruction that writes to a register.
+
+A better-performing solution would do the following:
+\begin{enumerate}
+\item Add logic to the {\it Memory Wait} stage to allow writing to the
+{\it Register File} in this stage, if the write does not involve memory data.
+This is a valid solution for Part A, if the PC is contained in the {\it
+Register File}.
+\item Add logic in the {\it IF} and {\it RF} stages that determines if an
+instruction's output register is the same as the next instruction's input
+register, and stalls the pipeline in that case (a sketch of this check
+follows the list).
+\item Stall the pipeline for 2 cycles (not 3, as required by the easy
+solution), if an instruction's input depends on the previous instruction's
+output.
+\end{enumerate}
+
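+The interlock check in item 2 can be sketched in C; the structure layout
+below is hypothetical (a real implementation would compare register-specifier
+wires in hardware).
+
+\begin{lstlisting}[language=C, caption=Sketch of the register-hazard check,
+label=problem4b:code]
+#include <stdbool.h>
+#include <stdint.h>
+
+typedef struct {
+	uint8_t dest;        /* destination register number */
+	uint8_t src1, src2;  /* source register numbers */
+	bool writes_reg;     /* does the instruction write a register? */
+} decoded_insn;
+
+/* Stall when the previous instruction writes a register that the next
+ * instruction reads. */
+bool must_stall(const decoded_insn* prev, const decoded_insn* next) {
+	return prev->writes_reg &&
+			(prev->dest == next->src1 || prev->dest == next->src2);
+}
+\end{lstlisting}
+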
+\subsection{Part C}
+The problem is that the DADD instruction uses the output of the LD
+instruction. More specifically, the input of the {\it Register File} stage of
+DADD depends on the output of the {\it Write Back} stage of LD.
+
+The easiest way to ensure correct behavior would be to stall the pipeline for
+3 cycles after any LD instruction. The cycle computation is valid if each
+memory access requires exactly 2 cycles, as suggested by the pipelining
+diagram, which would imply that there is no caching.
+
+A better-performing solution would work along the same lines as in part B,
+with the major difference that the 3-cycle stall cannot be reduced to 2
+cycles, since the memory output must propagate to the register file. The
+solution becomes more complicated if memory accesses are cached, because the
+pipeline has to be stalled for an unknown number of cycles.
+
+\section{Problem 5}
+I know the ideas behind caching, especially at a software level, but I forgot
+the hardware implementation details. I referred to the caching lecture
+(Lecture 19) on the OCW site for 6.004, {\tt http://ocw.mit.edu/}.
+
+My source has great figures on cache implementation mechanisms. For that
+reason, I will not attempt to make my own crappy pictures. Instead, I
+summarize the way the implementations work, so that I can show I understand
+the topic, and I refer the reader to the 6.004 slides for the very intuitive
+figures.
+
+\subsection{Part A}
+A cache is a special memory that is orders of magnitude smaller than the main
+memory, but also orders of magnitude faster. Caches are built on the
+assumption of spatial and temporal locality of memory accesses -- if an
+instruction accesses a location in memory, it's likely that some instructions
+following soon afterwards will access the same location, or neighboring
+locations.
+
+A cache stores the addresses and values of recently accessed memory
+locations. Caches are made up of {\it lines}, and each line contains an
+association between an address and the value of that address. The size of a
+cache line is the size of the value associated with an address, and is
+measured in bytes or machine words.
+
+Cache line addresses are always aligned to cache line boundaries, so the
+least significant bits of a cache line address are always 0, and they don't
+have to be stored. The part of the address that does have to be stored is
+called the line's tag, and the memory taken up by the tags is called tag RAM.
+In particular, doubling the cache line size reduces each line's tag by 1 bit.
+
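+The address split can be made concrete with a small C sketch; the line size
+and line count below are hypothetical, chosen only for illustration.
+
+\begin{lstlisting}[language=C, caption={Splitting an address into offset,
+index, and tag bits}, label=problem5a:code]
+#include <stdint.h>
+
+#define LINE_BITS   6   /* 64-byte lines: 6 offset bits, never stored */
+#define INDEX_BITS 10   /* 1024 lines: 10 index bits select the line */
+
+uint64_t line_offset(uint64_t addr) {
+	return addr & ((1 << LINE_BITS) - 1);
+}
+uint64_t line_index(uint64_t addr) {
+	return (addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1);
+}
+/* Only these bits go into tag RAM; doubling the line size (LINE_BITS + 1)
+ * removes one bit from every tag. */
+uint64_t line_tag(uint64_t addr) {
+	return addr >> (LINE_BITS + INDEX_BITS);
+}
+\end{lstlisting}
+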
+\subsection{Part B}
+In higher-level software caches (e.g. memory paging, where RAM is a cache for
+a larger disk-based virtual memory), each line has a valid bit, which
+determines if the address-value association at that line is valid. If the
+valid bit is 0, the line is ``empty'', and needs to be re-initialized.
+
+For a RAM cache, it should be possible to initialize the cache on processor
+power-up in such a way that each line is valid.
+
+\subsection{Part C}
+There are two main cache implementations that balance cost and flexibility.
+Fully associative caches are more expensive, but each line in the cache can
+store any address. Fully associative caches are implemented as follows: each
+line has an equality comparator between the significant address bits and the
+line's tag bits; the comparison's outcome (1 for equality, 0 for inequality)
+is ANDed with the NOT of each bit in the line's value, and the results are
+connected to the corresponding cache output bits by pull-up transistors. This
+works because for any address lookup, at most one comparator will produce a
+1.
+
+Conversely, direct-mapped caches are much cheaper but, for a given address,
+there is exactly one line in the cache that could store it. This greatly
+reduces cache performance for access patterns involving different addresses
+that map to the same line. Direct-mapped caches are implemented using fast
+SRAM: the tag and value bits for each line are stored in the SRAM, and an
+address lookup is transformed into a lookup in the SRAM. The SRAM's output is
+compared to the address using the same mechanism as fully associative caches
+(comparator, AND, inverter, pull-up transistor). However, since there is a
+single SRAM output, only one instance of the comparing logic is required for
+the entire cache.
+
+n-way associative caches are a compromise between flexibility and cost. In an
+n-way associative cache, each address can be stored in n different lines.
+This greatly reduces the number of memory access patterns that would cause
+contention for a cache line. n-way associative caches are implemented as n
+direct-mapped caches whose results are combined by n instances of the
+comparator logic used in fully-associative caches.
+
+\subsection{Part D}
+As stated in part A, storing multiple words in each line reduces the number
+of tag bits needed per line, which in turn reduces the ratio of tag bits to
+value bits in the cache. Therefore, most caches store multiple words per
+line. This is implemented by starting with the mechanism in part C to look up
+cache lines, then feeding the words in a line to a multiplexer that is
+controlled by the appropriate bits of the address (the most significant bits
+out of the bits that are discarded for tag lookup).
+
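+The lookup path of parts C and D can be summarized in a software model. The
+listing below is a sketch of a direct-mapped lookup, reusing the hypothetical
+parameters and helper functions from the part A listing; part D's multiplexer
+appears as the word-select array index.
+
+\begin{lstlisting}[language=C, caption=Software model of a direct-mapped
+lookup with multi-word lines, label=problem5cd:code]
+#include <stdint.h>
+
+typedef struct {
+	int valid;
+	uint64_t tag;
+	uint64_t words[8];  /* 64-byte line = 8 eight-byte words */
+} cache_line;
+
+cache_line lines[1 << INDEX_BITS];
+
+/* Returns 1 on a hit and stores the word; returns 0 on a miss. */
+int lookup(uint64_t addr, uint64_t* word) {
+	cache_line* line = &lines[line_index(addr)];
+	if (!line->valid || line->tag != line_tag(addr))
+		return 0;
+	/* Part D's multiplexer: the upper offset bits select the word. */
+	*word = line->words[line_offset(addr) >> 3];
+	return 1;
+}
+\end{lstlisting}
+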
+\subsection{Part E}
+A replacement policy is used for fully associative and n-way associative
+caches, when the address looked up in the cache is not found. In that case,
+an association that is in the cache at the time of the lookup must be
+discarded, to make room for the address that is currently looked up. (In a
+direct-mapped cache there is no choice to make, because each address maps to
+exactly one line.)
+
+LRU selects the least recently accessed address out of all candidate
+addresses (the entire cache for fully associative caches, the n lines mapped
+to the new address in an n-way cache), while the random replacement policy
+selects a random address. LRU is deterministic, which makes it easier to
+test, and it works well in practice. Random replacement ensures that there is
+no known worst-case access pattern, so the cache won't look bad in
+adversarial benchmarks (usually put together by a competing supplier).
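+
+A minimal sketch of LRU bookkeeping for one set of an n-way cache follows;
+the age-counter scheme is just one of several possible implementations,
+chosen here for clarity (valid bits are omitted for brevity).
+
+\begin{lstlisting}[language=C, caption=Sketch of LRU bookkeeping for one set
+of an n-way cache, label=problem5e:code]
+#include <stdint.h>
+
+#define WAYS 4
+
+typedef struct {
+	uint64_t tag[WAYS];
+	unsigned age[WAYS];  /* 0 = most recently used */
+} cache_set;
+
+/* Returns the way holding the tag (hit) or the way to refill (miss),
+ * updating the LRU ages either way. */
+int lru_access(cache_set* set, uint64_t tag) {
+	int hit = -1, victim = 0;
+	for (int w = 0; w < WAYS; w++) {
+		if (set->tag[w] == tag)
+			hit = w;
+		if (set->age[w] > set->age[victim])
+			victim = w;  /* least recently used line so far */
+	}
+	int line = (hit >= 0) ? hit : victim;
+	if (hit < 0)
+		set->tag[line] = tag;  /* miss: evict the LRU line */
+	for (int w = 0; w < WAYS; w++)
+		set->age[w]++;   /* age every line... */
+	set->age[line] = 0;  /* ...except the one just accessed */
+	return line;
+}
+\end{lstlisting}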
7  src/6.823/ps1/problem3a.c
@@ -0,0 +1,7 @@
+#include <stdint.h>
+
+void task(uint64_t* a, uint64_t* b, uint64_t* sums,
+		  uint64_t* differences) {
+	for (uint64_t i = 0; i < 10; i++) {
+		sums[i] = a[i] + b[i];
+		differences[i] = a[i] - b[i];
+	}
+}
41  src/master.tex
@@ -0,0 +1,41 @@
+\documentclass{article}
+
+%% Packages
+\usepackage{epsfig}
+\usepackage{endnotes}
+\usepackage{listings}
+\lstset{numbers=left, frame=lines, tabsize=4, captionpos=b, numberstyle=\tiny}
+\usepackage{url}
+\usepackage{graphicx}
+\usepackage{amssymb}
+\usepackage{latexsym}
+\usepackage{amsmath}
+\usepackage{boxedminipage}
+\usepackage{appendix}
+\usepackage{clrscode}
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+
+%% Pset author.
+\newcommand{\PsetAuthorName}{Victor Costan}
+\newcommand{\PsetAuthorEmail}{costan@mit.edu}
+
+%% Pset class and instance information.
+\newcommand{\PsetTitle}{Problem Set 1}
+\newcommand{\PsetDueDate}{September 14, 2009}
+\newcommand{\PsetMainFile}{6.823/ps1/all.tex}
+\newcommand{\PsetClassMetadata}{6.823/metadata.tex}
+
+\input{\PsetClassMetadata}
+\renewcommand{\leftmark}{\PsetAuthorName\space$<$\PsetAuthorEmail$>$}
+\renewcommand{\rightmark}{\PsetClassNumber\space\PsetClassTerm\space\PsetTitle}
+
+\begin{document}
+
+\title{\PsetClassNumber\space\PsetClassTerm\space\PsetTitle}
+\author{\PsetAuthorName}
+\maketitle
+
+\input{\PsetMainFile}
+\end{document}
