# Introduction: Assembly Programming
    
Assembly programming means telling the computer what to do by writing code in the language that its hardware understands, or at least in something very close to it.

![pgmview](../images/210-3.1-1.pdf)
    
Most programmers are unaware of how the computer really works, because the languages that they program in are not natively supported by the hardware.  Instead, most languages, such as JavaScript, Java, Python, Ruby, Rust, C#, C++, and even C, must be translated into the native code that the computer does understand -- machine code.
    
## Machine Code
   
The hardware of the computer is built out of electrical components that can store and interpret patterns of bits (binary digits).  A single bit can be thought of as a switch that can be in one of two positions: "ON" (1) or "OFF" (0).  A group of eight bits forms a byte.  We will say a lot more about bytes a little later.  The main thing to observe is that a byte is easily expressed as an 8-digit binary (base 2) number.  Because it has 8 bits, a byte can take on $2^8=256$ unique values, ranging from $00000000$ to $11111111$.
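
The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `byte_to_bits` is made up for this example and is not part of any library.

```python
# Illustrative sketch: a byte is 8 bits, so it can hold 2**8 distinct values.
BITS_PER_BYTE = 8

def byte_to_bits(value):
    """Return the 8-digit binary (base 2) string for a value in 0..255."""
    if not 0 <= value < 2 ** BITS_PER_BYTE:
        raise ValueError("a byte can only hold values from 0 to 255")
    return format(value, "08b")

print(2 ** BITS_PER_BYTE)   # 256, the number of unique values a byte can take
print(byte_to_bits(0))      # 00000000, the smallest value
print(byte_to_bits(255))    # 11111111, the largest value
print(int("00011000", 2))   # 24, reading a bit pattern back as a number
```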
    
Machine code is a binary code, meaning that binary values are used to encode the operations and values that make up a program.  The following table lists twenty of the machine codes of the [MOS 6502 Central Processing Unit (CPU)](https://en.wikipedia.org/wiki/MOS_Technology_6502), which has a very simple machine code.

|Binary Value| 6502 Operation|
|------------|---------------|
| $00000000$ | interrupt - impl: Implied i |
| $00000001$ | or with accumulator - X,ind: Zero Page Indexed Indirect (zp,x) |
| $00000101$ | or with accumulator - zpg: Zero Page zp |
| $00000110$ | arithmetic shift left - zpg: Zero Page zp |
| $00001000$ | push processor status (SR) - impl: Implied i |
| $00001001$ | or with accumulator - #: Immediate # |
| $00001010$ | arithmetic shift left - A: Accumulator A |
| $00001101$ | or with accumulator - abs: Absolute a |
| $00001110$ | arithmetic shift left - abs: Absolute a |
| $00010000$ | branch on plus (negative clear) - rel: Program Counter Relative r |
| $00010001$ | or with accumulator - ind,Y: Zero Page Indirect Indexed with Y (zp),y |
| $00010101$ | or with accumulator - zpg,X: Zero Page Indexed with X zp,x |
| $00010110$ | arithmetic shift left - zpg,X: Zero Page Indexed with X zp,x |
| $00011000$ | clear carry - impl: Implied i |
| $00011001$ | or with accumulator - abs,Y: Absolute Indexed with Y a,y |
| $00011101$ | or with accumulator - abs,X: Absolute Indexed with X a,x |
| $00011110$ | arithmetic shift left - abs,X: Absolute Indexed with X a,x |
| $00100000$ | jump subroutine - abs: Absolute a |
| $00100001$ | and (with accumulator) - X,ind: Zero Page Indexed Indirect (zp,x) |
| $00100100$ | bit test - zpg: Zero Page zp |
    
> The 6502 was designed and first built in 1975 but continues to be used today. It has 151 simple operations, versus the thousands of complex operations supported by a modern [Intel X86-64 processor](https://en.wikipedia.org/wiki/X86-64), which is widely used in computers ranging from laptops to supercomputers.  The following article discusses how to calculate the number of operations an Intel x86-64 processor supports and why it is hard to do so: ["Enumerating x86-64 – It’s Not as Easy as Counting"](https://www.unomaha.edu/college-of-information-science-and-technology/research-labs/_files/enumerating-x86-64-instructions.pdf).   Given the simplicity of the 6502, we will often use it to first get our heads around an idea or mechanism before looking at the same thing on an Intel X86-64 based computer.
    
A program written directly in machine code is a sequence of binary values.  The following is a small 6502 machine code program that calculates $1+1$:
```
00011000, 10101001, 00000001, 01101001, 00000001
```
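
To make the byte sequence above less opaque, here is a minimal Python sketch that steps through it roughly the way a 6502 would. It models only the three operations this program uses: clear carry (`00011000`), load the accumulator with an immediate value (`10101001`), and add an immediate value with carry (`01101001`). A real 6502 has many more operations, flags, and a decimal mode, all of which are ignored here. The function name `run` and the constant names are assumptions of this sketch, not part of any 6502 tool.

```python
# Minimal sketch of a machine that understands just three 6502 opcodes.
# Memory, the other processor flags, and all other operations are left out.
PROGRAM = "00011000, 10101001, 00000001, 01101001, 00000001"

CLC = 0b00011000      # clear carry (implied)
LDA_IMM = 0b10101001  # load accumulator with the next byte (immediate)
ADC_IMM = 0b01101001  # add next byte plus carry to accumulator (immediate)

def run(program_text):
    """Execute the program and return the final (accumulator, carry)."""
    code = [int(bits.strip(), 2) for bits in program_text.split(",")]
    accumulator, carry, pc = 0, 0, 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == CLC:
            carry = 0
            pc += 1
        elif opcode == LDA_IMM:
            accumulator = code[pc + 1]
            pc += 2
        elif opcode == ADC_IMM:
            total = accumulator + code[pc + 1] + carry
            accumulator = total & 0xFF        # keep only the low 8 bits
            carry = 1 if total > 0xFF else 0  # overflow goes to the carry
            pc += 2
        else:
            raise ValueError(f"opcode {opcode:08b} is not modeled in this sketch")
    return accumulator, carry

print(run(PROGRAM))  # (2, 0)
```

The accumulator ends up holding 2, the result of $1+1$, and the carry stays clear because the sum fits in one byte.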
       
## Assembly Code
Assembly code is a slightly more generic, human-friendly code that we can use to program a computer.  Machine code must be expressed purely in numbers, but assembly code uses structured text that can be easily translated by a set of tools into the equivalent machine code.   Each group of machine operations of the CPU that performs the same function is assigned a human-readable text **mnemonic**, often referred to as an **instruction**.  For example, the following are the 8 different 6502 machine code operations that add two numbers:

|Mnemonic|Binary Value|6502 Operation|
|-------|------------|--------------|
| ADC | $01100001$ | add with carry - X,ind: Zero Page Indexed Indirect (zp,x) |
| ADC | $01100101$ | add with carry - zpg: Zero Page zp |
| ADC | $01101001$ | add with carry - #: Immediate # |
| ADC | $01101101$ | add with carry - abs: Absolute a |
| ADC | $01110001$ | add with carry - ind,Y: Zero Page Indirect Indexed with Y (zp),y |
| ADC | $01110101$ | add with carry - zpg,X: Zero Page Indexed with X zp,x |
| ADC | $01111001$ | add with carry - abs,Y: Absolute Indexed with Y a,y |
| ADC | $01111101$ | add with carry - abs,X: Absolute Indexed with X a,x |

The mnemonic for all of them is `ADC`.  What distinguishes the different forms of a related operation, such as adding two numbers, is where the values to add come from and where the result goes. These parameters are often called the **operands** of an instruction.  In assembly code we write the program as a sequence of instructions along with syntax that specifies the operands.  A tool called the assembler is then used to translate our assembly program into the corresponding machine code.   The following is the same 6502 program to add $1+1$ written in assembly code:

```assembly
    CLC         ; Clear the Carry Flag
    LDA #1      ; Load the accumulator with the value 1
    ADC #1      ; Add the value 1 to the accumulator 
```
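
The translation step that an assembler performs can be sketched in Python. This toy version handles only the three instruction forms used in the program above (implied `CLC`, and immediate `LDA #` and `ADC #`) and only decimal operands. The function name `assemble` and the `OPCODES` table layout are hypothetical; a real assembler does far more, such as handling labels and the other addressing modes.

```python
# Toy assembler sketch: supports only CLC, LDA #n, and ADC #n.
OPCODES = {
    ("CLC", "implied"): 0b00011000,
    ("LDA", "immediate"): 0b10101001,
    ("ADC", "immediate"): 0b01101001,
}

def assemble(source):
    """Translate assembly text into a list of machine code byte values."""
    machine_code = []
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if len(parts) == 1:
            machine_code.append(OPCODES[(mnemonic, "implied")])
        elif len(parts) == 2 and parts[1].startswith("#"):
            value = int(parts[1][1:])         # decimal operand only
            if not 0 <= value <= 255:
                raise ValueError("immediate operand must fit in one byte")
            machine_code.append(OPCODES[(mnemonic, "immediate")])
            machine_code.append(value)
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return machine_code

SOURCE = """
    CLC         ; Clear the Carry Flag
    LDA #1      ; Load the accumulator with the value 1
    ADC #1      ; Add the value 1 to the accumulator
"""

print(", ".join(format(byte, "08b") for byte in assemble(SOURCE)))
# 00011000, 10101001, 00000001, 01101001, 00000001
```

The output matches the hand-written machine code version of the program: each mnemonic plus its operand form selects one opcode byte, and each immediate operand becomes the byte that follows it.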


>Using an editor, we could write the above into a file (e.g. `6502add.s`) and then use a 6502 assembler to translate it into 6502 machine code, e.g.
>```shell
>ca65 6502add.s -l 6502add.lst
>```
>This would produce a new file that has the machine code version, which could be loaded and run on a 6502-based computer.

Unlike our previous machine code version, the assembly language version is written using mnemonics and also has comments, making it at least "readable" by a human.  However, unlike other programming languages you might be familiar with, you likely still cannot REALLY read it.  For example, without the comments you probably could not tell that it is a simple program to add two numbers.

To understand assembly language we need to understand what the operations of the CPU are and what they do.  And to achieve this we must understand the basic functioning of the CPU and the rest of the hardware that makes up the computer.

> The jump from machine code to assembly code illustrates an important pattern that we will see applied over and over again -- Program Translation. Rather than needing to understand and remember all the details of machine code, a programmer can simply learn the assembly language and then rely on the assembler to translate it correctly into machine code.  The assembler is a program whose job is to do this translation -- in some sense it acts like a machine code programmer to whom we give an assembly code version of our program.   In a similar manner, we can design another programming language and write assembly programs that translate that language into assembly language, and then use the assembler to translate the result into machine code.   A programmer who learns our new language need never know about assembly or machine language.
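
The layering described above can be sketched in Python. The "language" here is hypothetical and consists only of expressions of the form `a + b` with values that fit in one byte; the function name `translate_sum` is likewise made up for this example. It produces 6502 assembly text, which an assembler would then turn into machine code.

```python
# Sketch of a translator for a made-up, one-construct language: "a + b".
def translate_sum(expression):
    """Translate 'a + b' into 6502 assembly lines that compute the sum."""
    left, right = (int(term) for term in expression.split("+"))
    for value in (left, right):
        if not 0 <= value <= 255:
            raise ValueError("operands must fit in one byte")
    return [
        "CLC",            # clear the carry so it does not affect the sum
        f"LDA #{left}",   # load the first operand into the accumulator
        f"ADC #{right}",  # add the second operand to the accumulator
    ]

for line in translate_sum("1 + 1"):
    print(line)
```

Someone writing `1 + 1` in this toy language never has to see `CLC`, `LDA`, or `ADC`, which is the point of stacking one translation on top of another.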