In [None]:
%run -i ../python/common.py

# The von Neumann Architecture

As we saw in the [introduction](./intro.ipynb), assembly code maps directly to the native machine code of a computer.  As such, assembly instructions let a programmer directly use and control the basic functions of the computer to get it to do what they want, whether that is searching or sorting an array of numbers, playing a music file, or some other task.   

But to understand assembly programming we have to learn what the basic parts of any computer are and how they work, so that we can understand what the assembly instructions allow us to do.  This is why understanding how computers and software work is one and the same as learning assembly programming. 

In [None]:
display(Markdown(htmlFig("../images/edvac.png", 
              align="center", 
              margin="auto 0 0 auto", 
              width="100%", id="fig:edvac", 
             caption="Figure: The archetype of the general-purpose digital computer")))

In [None]:
display(Markdown(htmlFig("../images/SLS_TheMachine.png", 
              width="80%", id="fig:vnm", 
             caption="<center>Figure: Our illustration of a von Neumann computer.  Our view is slightly updated to put the model in terms of today's computers.</center>")))

Despite the fact that there are many different manufacturers of computers, they all largely 
share a basic common structure.  We call the generic components, their organization 
and the way they interact the **architecture** of a machine.  In our 
case the common architecture around which most programmable computers are built 
is the von Neumann Architecture, named after 
[John von Neumann](https://en.wikipedia.org/wiki/John_von_Neumann).

> <img style="margin: auto 1em auto auto;" align="left" width="60" src="../images/commentary.png"> <p style="background-color:powderblue;"> In our journey to understand computers we will be exploring a fascinating story of human innovation and ingenuity.  This story is full of characters, such as John von Neumann, Alan Turing, Admiral Grace Hopper, Ada Lovelace and many more, who dared to be and think differently.  We owe a great deal of gratitude to these courageous people, who often risked a lot in suggesting new ways of thinking and doing things and in challenging the orthodoxy of their day.  We can also draw inspiration from their diversity and bravery.  Remember who you are and that your voice matters: tomorrow's innovation rests on your actions! </p> 

## The Central Processing Unit (CPU)

There are many words today that we might hear used to refer to the Central Processing Unit (CPU) including processor, micro-processor and core.  Our goal, at this point, is to build our knowledge of how all computers work and then dive into more details by looking at how a particular computer works.  From this more generic perspective there are two ways of considering what a CPU is.  

1. Physically: A complex electrical device composed of [transistors](https://en.wikipedia.org/wiki/Transistor) and wires.
2. Logically: A core building block for programmable information processing.


### Physically



While we won't dwell too much on the physical nature of a CPU, it is worth "looking" at a few examples and noting a few of their characteristics.  It is also worth noting the challenges we face in building more advanced versions.  These challenges arise from physical limitations in the construction of the CPU, in its connection to other devices and in powering the CPU itself.  These challenges, and the current approaches to mitigating them, also affect how one needs to write software that will perform well on modern hardware.  We don't really need to worry about that right now, but it is a theme we will revisit later once we have a core understanding of the basics of how software and hardware interact.

In [None]:
display(Markdown(htmlFig("../images/physcpus.png", 
              align="left", 
              margin="auto 1em 0 auto", 
              width="100%", id="fig:physcpus", 
             caption="Figure: Examples of physical CPUs.  For each CPU we note the product name, the number of 'pins' that connect it to the rest of the computer and the count of transistors it is composed of.")))

In the <a href="#fig:physcpus">Examples of physical CPUs figure</a>, above, we see photos of some physical CPUs along with a "pin out" diagram that describes how to physically connect each one to the rest of the computer.  Over the years the complexity of CPUs has clearly grown.  In the 1970s CPUs were composed of thousands of transistors and only required tens of pins to connect to the rest of the computer.  By 2021 CPUs contained billions of transistors and required thousands of physical connections to the rest of the system.  We have, in fact, pushed the physical boundaries to the point that it is unclear exactly how we can make more powerful CPUs that still preserve a simple programming model.  The way software is written and structured therefore becomes more and more important in order to get the most out of a computer.  But the first step to understanding how to construct advanced software is understanding the basics of how the parts of a computer interact to execute software. 

<div style="background-color:powderblue;">  
<img style="margin: auto 1em auto auto;" align="left" width="60" src="../images/fyi.png">
While an over-simplification, one of the uses of transistors in a CPU is to create "switches" that form the logic circuits used to implement basic operations such as adding numbers.  To make the CPU do these operations faster we need to be able to operate the transistors (turn them on and off) faster and faster.  Unfortunately doing so requires more and more energy and creates more and more heat.  As a result, over the last decade it has become harder and harder to speed up CPUs, because of the extra energy required and the extra heat it produces.  On the other hand, we have managed to continually shrink the size of transistors.  This has allowed us to pack more and more transistors into CPUs.  However, some believe that we are getting to the end of this ability as we approach the physical limits of how small we can make a transistor.</div> 

Regardless, we have reached a point where making CPUs operate faster is very hard to do.  And while we might be able to fit more transistors on the chip, we are reaching the limits of how many we can have powered on at the same time (due to energy and heat constraints).  Finally, it is also very hard to imagine that we can fit many more "pins" to connect the CPU to the rest of the computer. 

Over the years these effects have created a situation in which the performance of the software we write depends more and more on how it interacts with the way the extra transistors are used internally.  For example, in a modern processor a large number of transistors are used to form what is called [cache](https://en.wikipedia.org/wiki/CPU_cache) memory.  Software generally benefits from caches because its performance increases.  However, the same software can be written in different ways that get greater or lesser benefit, and you can only choose well if you understand how the caches work and how your software interacts with them.  Similarly, CPUs today often use the extra transistors to create multiple internal sub-CPUs, called [cores](https://en.wikipedia.org/wiki/Multi-core_processor).  A program, however, does not automatically benefit from multiple cores unless it is written to explicitly exploit them via "parallel threads" of execution.  But to understand how to do this one first needs to understand the classic model of how a single-core CPU works and interacts with memory.   

The bottom line is that CPUs are complex organizations of transistors and we are reaching the physical limits of their construction.

### Logically

In [None]:
display(Markdown('''
> <img style="margin: auto 1em auto auto;" align="left" width="60" src="../images/fyi.png"> <p style="background-color:powderblue;">  It is a common logical model that makes it possible to learn the basics of how a generic CPU works and to understand how software executes on any computer.  As a matter of fact, consider the line of CPUs from INTEL.  Software written for their 1970s products can still be run on the CPUs they produced in 2021, because at heart the 2021 versions remain consistent with the basic model of a computer that the 1970s CPUs were built around.  Similarly, the basic model of programming, and the tools used, are consistent across computers regardless of whether they are based on an old 6502 or a modern x86 or ARM CPU. 
'''))    

There is a common "logical" model of operation for von Neumann computers that defines how software execution works, and therefore what exactly software is.  Our goal for this chapter is to understand this logical model and not the details of a particular computer.  The CPU's operation is at the heart of this logical model, because it is central to understanding and programming a computer.

Logically a CPU has:
1. A set of internal components
2. Connections to the computer's other two main parts -- Memory and Input/Output (I/O) devices 
3. A core "Loop" that combines and coordinates its internal components with Memory and I/O devices to execute programs. 

> <img style="margin: auto 1em auto auto;" align="left" width="60" src="../images/commentary.png"> <p style="background-color:powderblue;"> **First Principles**  We are comfortable with the idea of using a first principles understanding of a subject, like mathematics, to reason about new problems in that subject without limiting ourselves to blind memorization of facts and recipes.  Computing is no different.  Understanding the core ideas of how computers work, including a foundation in the theory of computer science, lets you go beyond regurgitating snippets of code you have memorized and truly unleash your ability to creatively build new things.  It also gives you an intuitive understanding of how things work, which enables you to debug and solve problems that others don't know where to begin with.  Learning the first principles of how computers work is your first step in becoming a master of the digital universe.

## Visualizing a generic von Neumann computer

In order to understand a generic model of a von Neumann computer we will use a series of diagrams that progressively visualize the components and show how they interact to execute a program.    

There are many ways that one can "build" a computer around the von Neumann model.  The one we present is designed to make it easy for us to concretely understand the model and make the connections, at least logically, to modern systems.  Each part we introduce corresponds to something reflected in "real" computers.  Perhaps more importantly, each is necessary to understand in order to program any modern computer.  That being said, at this stage we also avoid details specific to how modern systems have evolved.  Once one has a good understanding of the basic generic model, one can then understand how things have varied over the years.

At the end of this chapter we use the SOL6502 simulated computer to explore the parts and model in action.

### The CPU

In [None]:
display(Markdown(htmlFig("../images/VNA_0.png",
                     align="left", 
                      margin="auto 1em 0 auto",
                     width="60%", id="fig:vna_0",
                     caption="Figure: The CPU") + '''
The CPU is a distinct physical region of the computer. Within it are housed several important parts that we need to understand.
'''                     ))

### Operations

In [None]:
display(Markdown(htmlFig("../images/VNA_1.png",
                     align="right", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_1",
                     caption="Figure: CPU's contain a set of <em>M</em> operations that we will use in our programs. We label each operation with a unique number so that we can identify one from another.") + '''
Within the CPU there are circuits/components that provide a set of operations.  The exact set of 
operations, and their number, can differ widely between particular CPU brands and models (e.g., INTEL vs ARM). 
However, we can commonly expect operations that 1) perform arithmetic and logic (e.g., adding and 
comparing numbers), 2) move values in and out of the CPU, and 3) control what happens next. Collectively, 
the circuits that perform arithmetic and logic are often referred to as the Arithmetic Logic Unit (ALU).  

In the diagram we can think of each blue labeled operation box as a unique operation that the CPU supports.
As an example, the box labeled `OPERATION 2` might implement the addition of two input values to produce a single result value.
The designers of a particular CPU, like INTEL, choose and define the set of operations for their CPUs and must tell us what they are and what they do.

From our perspective the most important thing is that the set of operations forms the 'built-in' functions 
that we can use in our programs for the particular CPU we are working with. Given that the 
operations are CPU specific, our programs will inherently be CPU specific.   
'''                     ))
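
As a rough sketch, here is a made-up CPU with only three numbered arithmetic and logic operations.  The numbering and names are hypothetical and do not correspond to any real CPU.  Under that assumption, the set of operations can be modeled as a table that maps each operation number to the function it performs:

```python
# Hypothetical operation table for a toy CPU: each operation is identified by a
# unique number, as in the figure. Only arithmetic/logic operations are sketched;
# a real CPU defines its own (much larger) set in its manual.
import operator

OPERATIONS = {
    0: ("AND", operator.and_),   # logic: bitwise AND of two values
    1: ("SUB", operator.sub),    # arithmetic: subtraction
    2: ("ADD", operator.add),    # arithmetic: addition (our assumed OPERATION 2)
}

def perform(op_number, a, b):
    """Look up an operation by its number and apply it to two input values."""
    name, fn = OPERATIONS[op_number]
    return fn(a, b)

print(perform(2, 3, 4))   # OPERATION 2 adds its two inputs: 7
```

Identifying operations by number, rather than by name, is the point of the sketch.  A number is something that can later be stored in memory as part of a program.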

### Registers

> Registers are memory locations within the CPU that are connected to the operation circuits.  Each register has a unique fixed name and a value that can be changed.  Their values can be fed as input to an operation or set as the output from an operation.  There are typically three categories of registers: 1) General Purpose Registers (GPRS), 2) Special Purpose Registers (SPRS) and 3) Hidden Registers.  

With respect to "normal" programming we largely only need to concern ourselves with the GPRS and a couple of standard SPRS.  As we will see soon, it will also be useful to explicitly introduce at least one of the registers that is usually hidden from programmers, to make execution easier to understand.  

> <img style="display: inline; margin: 1em 1em auto auto;" align="left" width="40px" src="../images/fyi.png"> <p style="background-color:powderblue;"> 
  What do we mean by "normal" programming?  As we saw in Part I: The UNIX Development Environment, software is largely broken down into two parts: 1) application software that runs within processes and 2) a single operating system kernel composed of software that provides all processes with a wide range of special functions.  This includes the ability to create processes from an executable file and to share the computer among the processes.  Most CPUs provide a special mode of execution for the OS kernel, in which an expanded set of operations and SPRS are available for use.  OS kernel software uses the GPRS, SPRS and the expanded set of operations to implement its routines.  We typically call this mode of operation "privileged" because it has expanded access to the hardware resources of the computer.  As such we often call the kernel software privileged code, and the application or user software unprivileged code.  Many CPUs expand on this idea to provide multiple levels of privilege, which enables a further degree of layering of the systems software.  For example, many CPUs use extra levels of privilege to support a virtual machine monitor (or hypervisor) layer of software.  This layer can sit below standard operating system kernels and allow the hardware to be shared by multiple OSs, each thinking it is running on the computer by itself.
    
We will visualize registers as boxes within the CPU.  Each box will be broken down so that the name of the register is on the left and the space for the value it contains is on the right.

#### General Purpose Registers (GPRS)

In [None]:
display(Markdown(htmlFig("../images/VNA_2.png",
                     align="left", 
                      margin="auto 1em auto auto",
                     width="60%", id="fig:vna_2",
                     caption="Figure: General purpose registers (GPRS)") + '''
 The general purpose registers provide us with a place to store input and output values for operations we want 
 to conduct.  In general the number of GPRS is small, often fewer than 50.  In later chapters we will go into 
 more detail on how values are represented in registers.  For the moment it is sufficient to think of them as 
 generic places to hold the values that we want to work with immediately in our code, using the built-in operations 
 of the CPU.   
 
 A typical thing we might want to do is use an `add` operation to add the current values in two of the GPRS, replacing 
 the existing value of one of them with the result.  For example, given our diagram so far, one would expect there to 
 be a way to have `OPERATION 2`, which we will assume implements addition, take its input values from `R0` and `R2` and 
 place the output result back in `R2`.
 
 Given that there are only a small number of GPRS, it is our responsibility to use them wisely and to organize which 
 values of our program are stored in which GPRS at any given time.   Furthermore, it is our job to move values in and out of them to 
 deal with the fact that we have so few.  CPUs have operations dedicated to transferring values between the GPRS.  
 For example, given our illustration, one expects there to be an operation that would allow us to transfer the 
 current value of `R2` to `R4`, replacing `R4`'s existing value.  Such register-to-register transfer operations permit 
 all combinations of GPR transfers.   Additionally, there will be operations for moving values between 
 locations outside of the CPU and the GPRS.  But more on this when we add connections from the CPU to Memory.
                     '''                     ))
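
The register operations described above can be sketched with a dictionary standing in for the register file.  Eight registers (`R0` to `R7`) is an assumption made only for illustration.  The `add` and `transfer` helpers are hypothetical names that model the two kinds of operations in the text:

```python
# A toy register file: general purpose registers R0..R7 (the count of eight is
# an assumption for illustration), each holding an integer value.
registers = {f"R{i}": 0 for i in range(8)}

def add(src, dst):
    """dst <- src + dst, like OPERATION 2 reading R0 and R2 and writing R2."""
    registers[dst] = registers[src] + registers[dst]

def transfer(src, dst):
    """Register-to-register transfer: dst <- src (dst's old value is lost)."""
    registers[dst] = registers[src]

registers["R0"] = 10
registers["R2"] = 32
add("R0", "R2")        # R2 now holds 42; R0 is unchanged
transfer("R2", "R4")   # R4 now holds a copy of R2's value
print(registers["R0"], registers["R2"], registers["R4"])
```

Note that both operations overwrite a register's old value.  This is exactly why we must keep track of which values live in which registers.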

#### Instruction Register (IR)

In [None]:
display(Markdown(htmlFig("../images/VNA_3.png",
                     align="right", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_3",
                     caption="Figure: Instruction Register: An internal hidden register that we introduce to help us understand how a CPU works internally.") + '''
Built into the CPU is an encoding that allows us to use numeric values to indicate what
operation we want done, which registers its inputs should come from and where any
outputs it might produce should go.  We call such values **Opcodes**.  Given how the CPU Loop works, as we will see soon,
as programmers we do not actually need to know how and where the opcode we want the CPU to execute is stored
within the CPU.  As such, this detail is generally hidden from us by the manufacturer.  However, in order to gain 
a concrete working understanding it is worth explicitly introducing a special purpose register which we will call 
the Instruction Register (IR).  Our model of the CPU will use the IR to store the opcode that we want the CPU to execute.
Again, while modern CPUs do not necessarily document the IR in their programming manuals, also called their 
Instruction Set Architecture (ISA) manuals, they will internally have something that serves this purpose.   

> **Instruction Set Architecture (ISA)**: This is the term used by CPU manufacturers to refer to the details of their 
specific CPUs that a programmer needs to know about to write code and implement tools for computers that use their CPUs.
                     '''                     ))

##### OPCODE and OPERANDS
> An **Opcode** is an encoded value that identifies a particular operation, where its inputs come from and where any output it creates should go.  Inputs to an operation are typically called **Operands**.  You will often find that the terms **instruction** and opcode are used interchangeably; for historical reasons one is used in some contexts and the other in others. 
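
To make "an encoded value that identifies an operation and its operands" concrete, here is a sketch of one possible encoding.  It uses a hypothetical 8-bit layout invented purely for illustration.  The layout is not taken from any real CPU, and it is unrelated to the specific values used in this chapter's later example program.  With only two bits per register field, it can name just four registers (`R0` to `R3`):

```python
# Hypothetical 8-bit opcode layout (invented for illustration):
#   bits 7-4: operation number (which operation box to use)
#   bits 3-2: register number of the first operand
#   bits 1-0: register number of the second operand, which also receives the result

def encode(op, src, dst):
    """Pack an operation number and two register numbers into one opcode value."""
    if not (0 <= op < 16 and 0 <= src < 4 and 0 <= dst < 4):
        raise ValueError("field out of range for this toy layout")
    return (op << 4) | (src << 2) | dst

def decode(opcode):
    """Unpack an opcode value into (operation, first operand, second operand)."""
    op = (opcode >> 4) & 0b1111
    src = (opcode >> 2) & 0b11
    dst = opcode & 0b11
    return op, src, dst

opcode = encode(2, 0, 2)        # "OPERATION 2 on R0 and R2, result in R2"
print(opcode, decode(opcode))
```

Decoding is just the reverse of encoding.  The CPU's circuits pull the fields back out of the number so they know which operation to run and which registers to connect to it.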

#### Program Counter (PC)

In [None]:
display(Markdown(htmlFig("../images/VNA_4.png",
                     align="left", 
                      margin="auto 1em auto auto",
                     width="60%", id="fig:vna_4",
                     caption="Figure: Program Counter (PC) Register: A Special Purpose Register (also called the Instruction Pointer (IP) Register)") + 
'''
CPUs have a special purpose register that indicates where in memory the opcode that we want executed is located.  This register is typically called either the Program Counter (PC) or Instruction Pointer (IP).
This register is only used for this purpose and, unlike the GPRS, cannot be used to store arbitrary inputs or outputs of operations.
Rather, its value is only treated as a Memory Address, where an Address is a number that indicates a position in the Memory array that is external to the CPU. 
As we will see, the value in the PC is used to load the IR with the opcodes of our program.
'''                     ))

## Memory

In [None]:
display(Markdown(htmlFig("../images/edvacextract.png", 
              width="100%", id="fig:vnm", 
             caption="<center>Figure: Extract from EDVAC Report discussing Memory and its role.</center>")))

In [None]:
display(Markdown(htmlFig("../images/VNA_5.png",
                     align="left", 
                      margin="auto 1em auto auto",
                     width="60%", id="fig:vna_5",
                     caption="Figure: CPU Connected to a large array of external memory") + 
'''
While the CPU is the central part of the computer that has the ability to do operations, it
is limited in the number of registers it has.  Furthermore, we have no direct way of placing values
into the CPU's registers.  This is the core role that the memory of a computer serves.  

**Memory** is a large collection of devices, external to the CPU, that can store values.  The devices are organized into an array in which each device
has a unique numeric index called its address.  The memory 
of a modern computer is on the order of $10^9$ locations.  The CPU has sets of wires, called the Memory Bus,
that physically allow values to be electrically transferred from the memory devices to the registers of the CPU
and vice versa.  The Memory Bus is broken down into an Address Bus and a Data Bus.  The CPU uses its connection
to the address bus to place a value that the Memory controlling hardware interprets as the address where in memory 
the transfer should happen.  For example, by placing $5$ on the address bus the CPU activates the memory devices at address $5$.
The Data Bus carries either the value that the CPU wants to send (write) to the address it has activated, 
or the value that the memory at the activated address sends to the CPU (read).  

The act of transferring values on the Memory Bus is called a **Bus Transaction**.  As we have seen, there are two types:
a Write transaction and a Read transaction.  Writes are also sometimes referred to as Stores, and Reads as Loads.

The set of memory devices connected to the address bus is also sometimes called "main" memory, to distinguish it from 
other types of devices on the computer that can store values.  Typically there are three defining features of memory, in addition 
to there being a large number of them:
1. The CPU has direct access to them via the Memory Bus.
2. The majority are **Volatile**.  They consume electricity to maintain their values, and if the computer is turned off
or its battery dies, their values will be lost.  Typically, when first turned on, such devices will have random values in them.
3. Compared to other types of memory they are fast with respect to the time it takes the CPU to communicate values with them.

As with the GPRS, it is our responsibility as programmers to organize and manage where and what values are placed in memory.
Memory is like a big blank sheet of paper on which we organize our work in terms of values that encode the various aspects 
of our program: Opcodes and Data.  This last aspect will make more sense when we look at how program execution works next.
Many of the tools, like programming languages and operating systems, are there precisely to help us with the tasks of
expressing, organizing and loading our program as values into memory.

> It is important to remember that to make use of a value in memory it must be transferred into the CPU; 
similarly, updated values must be transferred back from CPU registers to memory.
'''                     ))
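
The memory model above can be sketched as an array with two bus transactions.  The `Memory` class and its `read`/`write` methods are hypothetical names for illustration.  The size of 256 locations is also an arbitrary assumption chosen to keep the example small.  Real volatile memory powers up holding arbitrary values, so zero-filling here is only a simplification:

```python
class Memory:
    """A toy main memory: an array of values indexed by address."""

    def __init__(self, size=256):
        # Real volatile memory starts with arbitrary values; zeros keep the sketch deterministic.
        self.cells = [0] * size

    def read(self, address):
        """Read (load) transaction: the value at `address` is sent to the CPU."""
        return self.cells[address]

    def write(self, address, value):
        """Write (store) transaction: the CPU sends `value` to be stored at `address`."""
        self.cells[address] = value


mem = Memory()
mem.write(5, 146)     # CPU puts 5 on the address bus and 146 on the data bus
print(mem.read(5))    # CPU puts 5 on the address bus; memory answers with 146
```

The address plays the role of the Address Bus and the value plays the role of the Data Bus.  Every use of a memory value goes through one of these two transactions.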

In [None]:
display(Markdown('''
> <img style="display: inline; margin: 1em 1em auto auto;" align="left" width="40px" src="../images/fyi.png"> <p style="background-color:powderblue;">  When we buy personal computing devices such as laptops and phones, a key feature is how much main memory they have.  But sometimes you really have to pay attention to figure it out.  Main memory will often be referred to as RAM (Random Access Memory), while the other types of memory that are not directly connected to the CPU will often be referred to as Storage.  While storage devices let you keep data on your phone, the CPU cannot use them directly to run your programs. 
'''))

## Program Execution

Now that we have introduced a set of components, and their organization, we can move on to use them to illustrate and understand what it means to execute a program.  

### Setup: 

There are two steps that we must perform before we can have the computer execute our program: 1) we must load the values that make up our program into memory and 2) we must minimally initialize the CPU's PC register.

#### Program: Values in Memory

In [None]:
display(Markdown(htmlFig("../images/VNA_6.png",
                     align="right", 
                      margin="auto auto auto 1em",
                     width="60%", id="fig:vna_6",
                     caption="Figure: A Program is a collection of values in Memory") + 
'''
To program, we must first study the opcodes of our computer's CPU.  We then must pick an address in memory where the opcode values
of our program will be loaded, and then write our program as opcodes at that location (we will discuss data values later).
The figure to the right illustrates a simplified example fragment of a program made up of three values that have been 
loaded into memory starting at address $5$.  In this simple example we will assume that the first value, $146$, 
encodes the addition of the current values in `R0` and `R2`, with the result being placed back in `R2`.  We will also assume that
the values `36` and `1` together encode storing the value in `R2` to memory at address $1$.

Of course a real program would be composed of many more memory values, and to be sensible we would have
first transferred useful values into `R0` and `R2`.  But once we understand the basics of 
how a single instruction executes and how we move on to the next one, there is very little more to understand. 
At that point, extending this program is no harder than loading more values into memory. 
'''                     ))
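
"Loading" the example program amounts to copying its values into consecutive memory locations, as this minimal sketch shows.  It assumes a tiny 16-location memory, a size chosen only for illustration.  The meanings of `146` and `36, 1` are the assumptions stated in the text, not a real CPU's encodings:

```python
memory = [0] * 16          # a tiny 16-location memory; the size is arbitrary
program = [146, 36, 1]     # the example's values: 146 = add; 36, 1 = store R2 to address 1
load_address = 5           # where we chose to place the program

for offset, value in enumerate(program):
    memory[load_address + offset] = value   # one write per program value

print(memory)
```

Nothing about the stored numbers marks them as a program.  They only become one when the CPU is told to start executing at address 5.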

#### Initialize the PC to location of first instruction

In [None]:
display(Markdown(htmlFig("../images/VNA_7.png",
                     align="left", 
                      margin="auto 1em auto auto",
                     width="60%", id="fig:vna_7",
                     caption="Figure: PC Initialized to location of first instruction of Program") + 
'''
Before we can start the process of execution we must initialize the PC to the address in memory of where
we want it to start executing.  There are several ways to do this. When 
we are using an operating system this is taken care of for us by the OS when it loads our program (we will examine this process
in later chapters).  For the moment we will assume we have the magical ability to reach into the CPU and set the PC to 
a value.  In our example we have set the PC to address 5, which is where we loaded our code.
At the end of this chapter, when we use the SOL6502 to explore program execution, we will use the ability it gives us, 
as a teaching computer, to do precisely this: directly set the PC value.  It is worth noting that one can also have this 
magical ability on a real computer when using a <a href="./Debuggers">machine level debugger</a>.  Finally, when 
working with the hardware directly, before an OS is started or if you are not using one, the CPU provides a <a href="#POR">Reset</a>
sequence to help with initializing the PC (and other CPU registers).
'''                     ))
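
As a sketch of what a Reset sequence can look like, here is one real example, modeled loosely.  On reset, the 6502 reads a 16-bit start address from memory locations `0xFFFC` (low byte) and `0xFFFD` (high byte) and places it in the PC.  Everything else a real reset does, such as initializing other registers and flags, is omitted.  The `reset` function name and the choice of start address 5 are our own, for illustration only:

```python
# Sketch of a 6502-style reset: the initial PC comes from the "reset vector",
# a 16-bit little-endian address stored at 0xFFFC (low byte) and 0xFFFD (high byte).
RESET_VECTOR = 0xFFFC

memory = bytearray(0x10000)   # the 6502's full 64 KiB address space, zero-filled

# Whoever prepares memory (e.g. a ROM) stores where execution should begin.
start_address = 0x0005
memory[RESET_VECTOR] = start_address & 0xFF              # low byte
memory[RESET_VECTOR + 1] = (start_address >> 8) & 0xFF   # high byte

def reset(mem):
    """Return the initial PC by reading the little-endian reset vector."""
    low = mem[RESET_VECTOR]
    high = mem[RESET_VECTOR + 1]
    return (high << 8) | low

pc = reset(memory)
print(pc)
```

The design choice is that the hardware only needs to know one fixed address.  Whatever is stored there decides where execution begins.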

#### Categorizing values and their Interpretation
Staying organized is very important when programming at this level.  To help we have
some standard conventions for referring to the two main categories of values we load into memory to form our programs.

##### Text 
Sets of values to be loaded into memory that encode opcodes and their operands for the cpu to execute are called **text**.  We call areas of memory in which we load text values **executable**.  

##### Data
We unsurprisingly call sets of values to be loaded into memory that purely encode data we want our program to operate on, such as numbers of various types or ASCII strings, **data**.  The main distinction is that, as programmers, we don't intend for these values to be treated as opcodes and thus executed.  By making this distinction, the tools we use can try to help us catch bugs that lead our program to accidentally attempt to execute values from areas of memory that contain data.  See the <a href="#INTERP_sec">Challenges of Interpretation</a> for a more general discussion.
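
The idea of tools catching attempts to execute data can be sketched by tagging memory regions as text (executable) or data.  The fetch step then refuses to execute from data.  This is loosely in the spirit of the no-execute memory permissions that many modern systems support.  The region bounds and the helper names `is_executable` and `checked_fetch` are hypothetical:

```python
# Hypothetical sketch: memory regions tagged as text (executable) or data.
REGIONS = [
    # (first address, last address, kind, executable?)
    (0, 4, "data", False),
    (5, 7, "text", True),
    (8, 15, "data", False),
]

def is_executable(address):
    """Return True if `address` lies in a region marked executable (text)."""
    for first, last, _kind, executable in REGIONS:
        if first <= address <= last:
            return executable
    return False   # addresses outside every region are treated as not executable

def checked_fetch(memory, pc):
    """Fetch the value at pc, but only if pc points into a text region."""
    if not is_executable(pc):
        raise RuntimeError(f"attempt to execute data at address {pc}")
    return memory[pc]

memory = [0] * 16
memory[5:8] = [146, 36, 1]          # the example program lives in the text region
print(checked_fetch(memory, 5))     # allowed: address 5 is text
try:
    checked_fetch(memory, 2)        # refused: address 2 is data
except RuntimeError as err:
    print(err)
```

Without such tags the CPU would happily fetch from address 2.  The check exists only because we chose to record our intent.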

><img style="margin: 10px 0px 0px 0px;" align="left" width="60" src="../images/warning.svg">
<img style="margin: 10px 0px 0px 0px;" align="left" width="60" src="../images/concept.svg"><img style="margin: 10px 0px 0px 0px;" align="left" width="60" src="../images/donot.svg">
<img style="margin: 10px 0px 0px 0px;" align="left" width="60" src="../images/do.svg"><p id='INTERP_sec' style="background-color:tomato;"><b>Challenges of Interpretation</b> It is important to note that there is no reason for our example CPU not to interpret any location in memory containing the value `36` as the start of a store instruction, if the `PC` gets set to that address.  There are many reasons why we might have the value `36` in memory at various locations.  At some location we might be using it to encode the dollar sign character in <a href="../unix/terminal.ipynb#ASCII_sec">ASCII</a>.  At another location we might use the value `36` to represent the age of a player in our game.  Or it might be in the link field of a linked list node, telling us the address of the next node.  This points to a more general challenge: when programming a computer we must be very careful not to accidentally interpret a value at a location in a way that we did not intend.  Most of our higher level programming tools and languages work very hard to help us avoid this.  They force a very strict organization on how things are put in memory and add checks to the code to make sure that our program is not interpreting a memory location in a way we did not intend.  When programming in lower-level languages, like assembly and C, we have the power to direct the CPU to do anything it can.  However, it is then our job to interpret all values correctly, by staying extremely organized and writing careful code.  There is nothing that marks a particular address as containing a particular type of data item.  The only thing that determines what the value at a location "means" is the code that uses it!  So you must organize memory and write code very carefully to prevent the computer from doing something you do not intend.  
Remember that a modern computer can execute hundreds of billions of instructions per second.  So once things go off the rails because of a misinterpretation of a value, many more instructions can be executed afterwards, and you may or may not ever know that an error occurred, or where or why!
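
The point that a value's meaning comes only from the code that uses it can be shown directly.  This sketch reads the same stored value, `36`, and applies four different interpretations to it.  The memory contents are made up, and the opcode table only repeats this chapter's assumed example encodings:

```python
# The chapter's assumed example encodings (not a real CPU's).
EXAMPLE_OPCODES = {
    146: "add R0 and R2, result in R2",
    36: "store R2 to the address held in the next value",
}

memory = [0] * 64
memory[10] = 36      # a single stored value
memory[36] = 7       # whatever happens to live at address 36

value = memory[10]

as_character = chr(value)                 # as an ASCII code: the dollar sign
as_number = value + 1                     # as a number, e.g. a player's age next year
as_pointer_target = memory[value]         # as an address: follow it to address 36
as_opcode = EXAMPLE_OPCODES.get(value)    # as an opcode: a store instruction

print(as_character, as_number, as_pointer_target, as_opcode)
```

All four lines are equally "valid" as far as the machine is concerned.  Nothing in `memory[10]` says which one we meant.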

### The Loop

At the heart of the CPU is a built-in infinite loop that coordinates the parts of the CPU and memory to enable program execution: the successive execution of instructions.  The following figure, <a href="#fig:theloop">The Loop</a>, shows extracts from early processor manuals that describe this loop at a high level.  Our generic CPU breaks the loop down into three distinct internal phases:
1. Fetch 
2. Decode
3. Execute

We do this so that we have a clear understanding of what a von Neumann CPU must do to execute a program.  The internal operation of modern CPUs has evolved <a href="#MODERNVNAEXT_sec">extensions</a> to improve performance, but modern CPUs generally maintain this same basic view of execution.  This allows programmers to start with this basic view, both to understand how software works and to understand the purpose of the extensions and how to write code that exploits them.
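
As a rough illustration of the three phases, the following Python sketch simulates a toy von Neumann machine. A single `memory` list holds both the program and its data. The opcode values `146` and `36` follow the example used in this chapter. Several details are assumptions made so the sketch can run: `R2` as the destination of the addition, the register values, and `0` as a halt opcode. The halt opcode exists only so the Python loop can stop.

```python
def run(memory, regs, pc):
    """Run the toy fetch-decode-execute loop until the hypothetical halt opcode."""
    while True:
        # 1. Fetch: read the value the PC points to into the IR.
        ir = memory[pc]
        # 2. Decode: choose the operation (and operand locations) from the opcode.
        if ir == 146:
            # 3. Execute: R2 = R0 + R2, then step the PC past this 1-value instruction.
            regs["R2"] = regs["R0"] + regs["R2"]
            pc += 1
        elif ir == 36:
            # 3. Execute: store R2 at the absolute address held in the next value.
            address = memory[pc + 1]
            memory[address] = regs["R2"]
            pc += 2
        elif ir == 0:
            return pc  # hypothetical halt so the Python loop can stop
        else:
            raise ValueError(f"unknown opcode {ir} at address {pc}")


memory = [0] * 16
memory[5:9] = [146, 36, 1, 0]        # program: ADD, STORE to address 1, HALT
regs = {"R0": 3, "R1": 0, "R2": 4, "R3": 0}
halted_at = run(memory, regs, pc=5)
print(regs["R2"], memory[1], halted_at)  # 7 7 8
```

The program and the data it writes (`memory[1]`) live in the same list. This shared memory is the defining trait of the von Neumann design.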

In [None]:
display(Markdown(htmlFig("../images/ASSEMBLY-VNA-THECPU/ASSEMBLY-VNA-THECPU.018.png", 
              align="center", 
              margin="auto auto auto auto", 
              width="80%", id="fig:theloop", 
             caption="Figure: The Loop: descriptions of the execution loop from the 6502 and 8086 manuals.")))

### Step 1: Fetch Opcode

In [None]:
display(Markdown(htmlFig("../images/VNA_8.png",
                     align="right", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_8",
                     caption="Figure: The first step of the execution loop is to fetch the value at the location pointed to by the PC") + 
'''
**Fetch**: The first step of the CPU's program execution loop is a read bus transaction to the address "pointed" to by the PC, which loads the IR with
the value stored there so that it can be decoded and executed.  When we interpret a number as a memory address we often say that the value "points" to the value in 
memory at that address.  This is why some manufacturers call the PC register the Instruction Pointer (IP) register.

In our example this means that the CPU places the address value contained in the PC ($5$) onto the address bus, and the memory 
responds by placing the value at that address ($146$) onto the data bus.  When the CPU receives the value it internally routes the 
value into the IR.   
'''                     ))
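
The fetch step can be modelled as a single read bus transaction. In the Python sketch below, the `Memory` and `CPU` classes are hypothetical. Passing an address to `read` stands in for driving the address bus, and the returned value stands in for what the memory places on the data bus. In this sketch fetch does not change the PC; the PC update is left to the execute step, matching the way this chapter describes the loop.

```python
class Memory:
    """Hypothetical memory that answers read bus transactions."""

    def __init__(self, size):
        self.cells = [0] * size

    def read(self, address):
        # The CPU drives `address` onto the address bus; the value returned
        # models what the memory places on the data bus.
        return self.cells[address]


class CPU:
    def __init__(self, memory, pc):
        self.memory = memory
        self.pc = pc     # Program Counter (some vendors: Instruction Pointer)
        self.ir = None   # Instruction Register

    def fetch(self):
        # Read bus transaction at the address the PC points to; route the
        # data-bus value into the IR. The PC itself is not changed here.
        self.ir = self.memory.read(self.pc)
        return self.ir


mem = Memory(16)
mem.cells[5] = 146        # the opcode from the chapter's example, at address 5
cpu = CPU(mem, pc=5)
print(cpu.fetch())        # 146
```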

In [None]:
display(Markdown('''
><img style="display: inline; margin: 1em 1em auto auto;" align="left" width="40px" src="../images/fyi.png"> <p style="background-color:powderblue;"> Generally speaking, the number of values,
measured in <a href="../assembly/InfoRepl#THEBYTE_sec">bytes</a>, that encode 
an instruction (opcode) varies widely depending on the design of the CPU. In fact, this is one of the things that most clearly distinguishes the INTEL instruction set from the ARM instruction set.  INTEL instructions are variable in length: some are encoded in as few as a single byte, while others can require up to fifteen. In contrast, the ARM AArch64 instruction set uses a fixed-length encoding of four bytes for all instructions.  The 6502 instructions require one, two or three bytes.
Instruction sets like INTEL's are called Complex Instruction Set Computers (CISC), both because of the complexity of
the instruction encodings and because of the rich, high-level nature of the operations they provide.  ARM-like instruction sets, by contrast, are called Reduced Instruction Set Computers (RISC),
both because of the fixed and simple nature of the encoding and because of the relatively simple nature of the operations they provide.</p>
'''))
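
One practical consequence of variable-length encodings is that the CPU must decode the first byte of an instruction before it knows where the next instruction starts. The Python sketch below contrasts this with fixed-length slicing. The three 6502 opcodes and their lengths are real (`NOP` = `0xEA`, `LDA #immediate` = `0xA9`, `STA absolute` = `0x8D`), but the table is only a tiny subset. The helper names are hypothetical, and the fixed-width example uses placeholder zero bytes rather than real AArch64 instructions.

```python
# Lengths of three real 6502 opcodes (a tiny subset of the full table).
OPCODE_LENGTH_6502 = {
    0xEA: 1,  # NOP
    0xA9: 2,  # LDA #immediate
    0x8D: 3,  # STA absolute (2-byte little-endian address follows)
}


def split_variable(code, lengths):
    """Split a byte stream when each instruction's length depends on its opcode."""
    instructions, i = [], 0
    while i < len(code):
        n = lengths[code[i]]              # must look at the opcode to know the length
        instructions.append(code[i:i + n])
        i += n
    return instructions


def split_fixed(code, width=4):
    """Split a byte stream into fixed-width instructions (4 bytes, as on AArch64)."""
    return [code[i:i + width] for i in range(0, len(code), width)]


program_6502 = bytes([0xA9, 0x24, 0x8D, 0x00, 0x02, 0xEA])  # LDA #$24; STA $0200; NOP
print([ins.hex() for ins in split_variable(program_6502, OPCODE_LENGTH_6502)])
print(len(split_fixed(bytes(12))))  # 12 placeholder bytes -> 3 instructions
```

With fixed-width encodings, instruction boundaries are known without decoding anything. This is one reason RISC designs are considered simpler to decode.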

### Step 2: Decode

Three things happen in the decode phase:
1. Identify which operation is to be executed.
2. Determine where the inputs are coming from (getting them from memory if needed).
3. Set up where the output will go.

The parts of the CPU that do this setup are often called the control plane/path of the CPU.  We are not particularly concerned with the details of how this is implemented in the CPU; what we care about is having a concrete understanding of how an opcode is interpreted, that is, decoded.  As such, we artificially break decoding down into three distinct steps for clarity.
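
One way to picture the result of decoding is as a small record holding the three answers: which operation, which inputs, and which output. In the Python sketch below, the `Decoded` record, the `DECODE_TABLE`, and the `decode` helper are hypothetical. Sending the sum to `R2` is also an assumption, since the example does not say where the result goes. A real CPU derives these answers from the bits of the opcode in hardware rather than from a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoded:
    operation: str   # which operation unit should do the work
    inputs: tuple    # where the input operands come from
    output: str      # where the result should go
    length: int      # how many memory values the instruction occupies


# Hypothetical decode table for the chapter's toy CPU. Sending the sum to R2
# is an assumption; the chapter does not say where the result goes.
DECODE_TABLE = {
    146: Decoded("OPERATION 2 (add)", ("R0", "R2"), "R2", 1),
    36: Decoded("store", ("R2",), "memory[absolute address]", 2),
}


def decode(ir):
    """Split an opcode into operation, inputs and output for the control path."""
    try:
        return DECODE_TABLE[ir]
    except KeyError:
        raise ValueError(f"illegal opcode {ir}") from None


d = decode(146)
print(d.operation, d.inputs, d.output)
```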

In [None]:
display(Markdown(htmlFig("../images/VNA_9.png",
                     align="left", 
                      margin="auto 1em auto auto",
                     width="60%", id="fig:vna_9",
                     caption="Figure: Decode: identifying the operation encoded by the opcode") + 
'''
Perhaps the most obvious thing that needs to be done is to identify which internal operation unit 
is required to do the work encoded in the opcode.  In our example, our CPU is built to interpret the opcode
value $146$ as targeting `OPERATION 2`.  To be concrete and consistent with our example, we define `OPERATION 2` to
implement addition.  That is to say, it takes in two values as inputs and produces their sum as output. In a real CPU the operations
and what are called functional units (the components that do the work) may not have such a simple one-to-one relationship, and setting 
up which functional units need to be used to conduct the operation could be much more subtle and complex. 
'''                     ))
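
The idea that an opcode selects an operation unit can be sketched in Python with a table of functions. The sketch also shows that this mapping need not be one-to-one. Only `OPERATION 2` performing addition comes from the example. `OPERATION 1`, the opcodes `147` and `150`, and the helper `select_unit` are hypothetical.

```python
import operator

# Operation units as plain functions. Only OPERATION 2 = addition comes from
# the chapter's example; OPERATION 1 is a hypothetical second unit.
OPERATION_UNITS = {
    "OPERATION 1": operator.sub,
    "OPERATION 2": operator.add,
}

# Hypothetical opcode-to-unit routing. Opcodes 146 and 147 both use the adder
# (imagine they differ only in where the operands come from), so the mapping
# from opcodes to units is many-to-one rather than one-to-one.
OPCODE_TO_UNIT = {146: "OPERATION 2", 147: "OPERATION 2", 150: "OPERATION 1"}


def select_unit(opcode):
    """Return the operation unit that the control path routes this opcode to."""
    return OPERATION_UNITS[OPCODE_TO_UNIT[opcode]]


adder = select_unit(146)
print(adder(3, 4))  # 7
```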

In [None]:
display(Markdown(
'''
<div style="background-color:powderblue;">
<img style="display: inline; margin: 1em 1em auto auto;" align="left" width="40px" src="../images/fyi.png"><b>Memory Address Modes</b>
It is the *Memory* case of addressing modes, where an operand is located in memory, that adds much of the power and complexity to assembly programming.  We will explore memory addressing modes
in more detail in subsequent chapters.  For the moment, the two things to note are:
1. If an operand value is in memory, we need a way of figuring out its address (often called the operand's **effective address**). As such, there can be several address modes 
that support various ways of calculating the effective address.  Some examples include:
    - *Immediate*: the effective address of the operand is the location(s) immediately following the opcode's first value.  In other words, the operand value is encoded in the instruction itself.
    - *Absolute*: the effective address is encoded in the bytes of the instruction.  This is the case for the second instruction of our example.  
    Addresses $6$ and $7$ encode the store of the value of `R2` into memory location $1$: the opcode `36` encodes a store of the value 
    in GPR `R2` to the address whose value is stored in the second value of the instruction, at address $7$.
    Given that the value at address $7$ is $1$, we know that the effective address of the store is location $1$.  
    - *Register Indirect*: the effective address is calculated using values in one or more registers, e.g. the effective address is the sum of the values in `R3` and `R4`.
2. Additional memory bus transactions will be required, either during decode or execute, to read values from or write values to memory.
</div>
'''))
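
The three effective-address calculations above can be written as tiny Python functions. The instruction layout in this sketch is an assumption that matches the toy example rather than any particular real CPU: the opcode sits at `pc` and any extra values follow it. The function names, and the register values in `R3` and `R4`, are hypothetical.

```python
def ea_immediate(pc):
    # The operand itself is the value right after the opcode.
    return pc + 1


def ea_absolute(memory, pc):
    # The value right after the opcode *is* the address of the operand.
    return memory[pc + 1]


def ea_register_indirect(regs, base="R3", index="R4"):
    # The address is computed from register contents, here R3 + R4.
    return regs[base] + regs[index]


memory = [0] * 16
memory[6:8] = [36, 1]       # the example's store instruction at addresses 6-7
regs = {"R3": 8, "R4": 2}   # hypothetical register contents

print(ea_absolute(memory, pc=6))   # 1  -> the store writes to memory[1]
print(ea_immediate(pc=6))          # 7  -> read as immediate, the operand would be memory[7]
print(ea_register_indirect(regs))  # 10
```

For the example's store (opcode `36`), the absolute calculation is the one that applies. The immediate line only shows where the same bytes would point if they were decoded differently.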

In [None]:
display(Markdown(htmlFig("../images/VNA_10.png",
                     align="right", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_10",
                     caption="Figure: Decode: identifying where the input operands come from") + 
'''
In addition to identifying which operation is to be executed, the opcode also encodes where the input operands,
if any are required for the particular operation, must come from.  In our example, the opcode $146$ identifies that the two values for the addition 
should explicitly come from `R0` and `R2`.  We visualize this as the control path of the CPU configuring the operation unit, `OPERATION 2`, to
receive its inputs from the two specified GPRs. 

#### Addressing Modes
More generally, most CPUs support a fixed set of ways, called addressing modes, of specifying where the operands for an operation 
come from.  In the above example, since all the operands are in GPRs, this might be called the *Register* addressing mode.  Depending on the 
CPU, opcodes can encode many other variants, including:
1. *Implied*: no operands are required, or the operands come from a fixed set of registers (e.g. an instruction might implicitly use the value in the PC as an operand)
2. *Memory*: an operand is located in memory, and additional memory bus transactions will be necessary before the operation can be executed
'''                     ))
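
Operand fetch can be pictured as a function that looks in a different place depending on the addressing mode. In the Python sketch below, the mode names come from the text. The string-based mode selector, the register and memory contents, and `fetch_operand` itself are illustrative assumptions.

```python
def fetch_operand(mode, regs, memory, where=None):
    """Hypothetical operand fetch for three addressing modes."""
    if mode == "register":  # operand is in the named GPR, e.g. "R0"
        return regs[where]
    if mode == "implied":   # operand comes from a fixed register, here the PC
        return regs["PC"]
    if mode == "memory":    # needs an extra read bus transaction at address `where`
        return memory[where]
    raise ValueError(f"unsupported addressing mode {mode!r}")


regs = {"R0": 3, "R2": 4, "PC": 5}  # hypothetical register contents
memory = [0] * 16
memory[1] = 42                      # hypothetical data at address 1

# Opcode 146 in the example reads both of its inputs in register mode.
a = fetch_operand("register", regs, memory, "R0")
b = fetch_operand("register", regs, memory, "R2")
print(a, b)                                      # 3 4
print(fetch_operand("implied", regs, memory))    # 5
print(fetch_operand("memory", regs, memory, 1))  # 42
```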

### Step 3: Execute (Conduct the Operation and Update the PC)

In [None]:
display(Markdown(htmlFig("../images/VNA_11.png",
                     align="left", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_11",
                     caption="Figure: Execute: conducting the operation") + 
'''
'''                     ))

In [None]:
display(Markdown(htmlFig("../images/VNA_12.png",
                     align="right", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_12",
                     caption="Figure: Execute: updating the PC") + 
'''
'''                     ))
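
The execute step can be sketched in Python as a single function. It applies the selected operation to the operand values, then writes the result to its destination. Finally, it advances the PC by the instruction's length so that the next fetch reads the following opcode. As in the other sketches, the register contents, the choice of `R2` as the destination, and the `execute` helper are assumptions for illustration.

```python
import operator


def execute(regs, operation, inputs, output, length):
    """Hypothetical execute step: compute, write back, then update the PC."""
    values = [regs[name] for name in inputs]  # operands selected during decode
    regs[output] = operation(*values)         # conduct the operation
    regs["PC"] += length                      # step past this instruction
    return regs


regs = {"R0": 3, "R2": 4, "PC": 5}
execute(regs, operator.add, ("R0", "R2"), "R2", length=1)
print(regs)  # {'R0': 3, 'R2': 7, 'PC': 6}
```

Here the PC simply moves to the next instruction. Jump and branch instructions would instead load the PC with a new address, which is how loops and conditionals are built.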

### Repeat: Back to Step 1 of the Loop

### Stack Pointer Register (SP)

### Stored Program Computer

### Power on Reset

### ROM and Firmware 

### The Software Stack


### Caveats

<a id='MODERNVNAEXT_sec'></a>
#### Modern Extensions
- CPU caches
- pipelined and superscalar designs
- hyper-threading
- multi-core processors
- non-volatile memory

## I/O

In [None]:
display(Markdown(htmlFig("../images/VNA_13.png",
                     align="center", 
                      margin="auto auto auto auto",
                     width="60%", id="fig:vna_13",
                     caption="Figure: I/O") + 
'''
'''                     ))

## SOL6502