## Reading 26-1 - Assembly Language

The only place where a computer can store information is memory, which is just a long sequence of bytes. It's up to the systems software and the programs that the computer runs to decide what these bytes actually mean. They could be program code, data (integers, strings, images, etc.), or "meta-data" used to build more complex data structures from simple memory boxes. Understanding how complex programs boil down to bytes will help you debug your programs and appreciate why they behave the way they do.

## Motivation

We've now arrived at the end of our introduction to C programming. The rest of the course will mostly look at higher-level concepts built atop the understanding we have developed. But before we look at those higher levels, we will briefly pull back the covers and see what happens at the level below C to make your programs run.

    --- | Web sites, Google, Facebook, ChatGPT, etc.
     C  |-------------------------
     S  | Parallel programming    <-- block 4
     E  |-------------------------
     2  | C++ | Operating systems <-- block 3
     0  |-------------------------
     1  | C programming language  <-- block 1 - we discussed this so far
     3  |-------------------------
     3  | Assembly language       <-- block 2 - we will briefly cover this
    --- |-------------------------------------------
        | Hardware (chips)        <-- Prof. Morrison's Digital Integrated Circuits course

Now that you understand the C language and memory representations of data, you may wonder about the "magic" hexadecimal bytes that the compiler outputs to make your computer's processor do things like adding numbers. How does the compiler choose these bytes, and what bytes are valid?

Each computer architecture (such as x86-64, which many modern computers use and which we're considering in this course) has an <b>instruction set</b>, specified by the processor's designer, that lists the operations the processor can perform. Manufacturers design their instruction sets around their own needs and intellectual property, so different architectures have different instruction sets.

The instruction set, first and foremost, defines what sequences of bytes <b>trigger specific behavior in the processor</b> (e.g., adding numbers, comparing them for equality, or loading data from memory). 

But hexadecimal bytes are hard for humans to read, so the instruction set also comes with a human-readable <b>assembly language</b> that consists of short, mnemonic instructions that correspond directly to a byte encoding (i.e., each of these instructions corresponds to a specific, unique set of hexadecimal bytes).

## Interpreting Bytes In Memory

We will build this up from first principles. Let's start with <a href = "https://github.com/mmorri22/cse20133/blob/main/readings/lec26/add.c">add.c</a>, which is a C program that serves a simple purpose: it reads numbers from the command line and adds them. 

Let's dissect the code. You'll get immersed in the basic structure of a C program and see the crucial <code>sum()</code> function that we'll use to explore how programs are just bytes in memory.

    #include <stdio.h>
    #include <stdlib.h>

    int sum(int a, int b);

    int main( const int argc, const char* argv[] ){

        if( argc != 3 ){
            return EXIT_FAILURE;
        }

        int first_val = atoi( argv[1] );
        int second_val = atoi( argv[2] );

        fprintf( stdout, "%d + %d = %d\n", first_val, second_val, sum(first_val, second_val) );

        return EXIT_SUCCESS;
    }

    int sum(int a, int b){
        return a+b;
    }

## Programs are just bytes!

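We can look at the contents of <code>add.o</code> using a tool called <b>objdump</b>. First, download <code>add.c</code> and compile it into the object file <code>add.o</code> (the <code>-c</code> flag tells gcc to compile without linking):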
    
    mkdir reading26
    cd reading26
    wget https://raw.githubusercontent.com/mmorri22/cse20133/main/readings/lec26/add.c
    gcc -c add.c
    
## Object Dump

If you are using a Windows machine or Linux, run the following command:

    objdump -d add.o 

If you are using a Mac, run the following command:

    x86_64-linux-gnu-objdump -d add.o 
    
The <code>objdump</code> output shows two things below the <code>&lt;sum&gt;</code> line: on the left, after each offset, the bytes in the file in hexadecimal notation (for example, <code>01 d0</code>), and on the right, a human-readable version of what these bytes mean in computer machine language (specifically, in a language called "x86-64 assembly", which is the language my laptop's Intel processor understands).
    
> The full output is available at <a href = "https://github.com/mmorri22/cse20133/blob/main/readings/lec26/add_objdump.txt">add_objdump.txt</a>

When executed in the VS Code Ubuntu environment, we get the following for the <b>sum function</b>:
    
    0000000000000088 <sum>:
    88:   f3 0f 1e fa             endbr64 
    8c:   55                      push   %rbp
    8d:   48 89 e5                mov    %rsp,%rbp
    90:   89 7d fc                mov    %edi,-0x4(%rbp)
    93:   89 75 f8                mov    %esi,-0x8(%rbp)
    96:   8b 55 fc                mov    -0x4(%rbp),%edx
    99:   8b 45 f8                mov    -0x8(%rbp),%eax
    9c:   01 d0                   add    %edx,%eax
    9e:   5d                      pop    %rbp
    9f:   c3                      retq   
    
          ^                       ^
    | bytes in file         | their human-readable meaning in x86-64 machine language
    | in hexadecimal        | (not stored in the file; objdump generated this)
    | notation
    
> <b>On the difference between Windows and Mac</b> - If you are using a Mac and you try running plain <code>objdump</code>, you will get an error like the following (this sample is from a file named <code>trap1</code>):<p>
> <code>trap1:     file format elf64-little</code><br>
> <code>objdump: can't disassemble for architecture UNKNOWN!</code><p>
> This is because the file is compiled for the x86-64 architecture and contains machine instructions that only x86-64 machines understand, but a Mac with an Apple Silicon chip only understands ARM64 instructions. This is why Mac users run <code>x86_64-linux-gnu-objdump</code> instead.<br>

## Generating the Assembly Code

> <b>What does the machine language mean?</b>
> We don't know machine language yet. But to give you an intuition, <code>add</code> means to add integers, and <code>retq</code> tells the processor to return to the calling function.
    
We can view the assembly language itself by running the following command, where the <code>-S</code> flag tells gcc to stop after generating assembly:
    
    gcc add.c -S

This writes the assembly code to <a href = "https://github.com/mmorri22/cse20133/blob/main/readings/lec26/add.s"><code>add.s</code></a>, which we will break down over the next several lectures. For now, it is sufficient to know how to generate the assembly code in the Linux terminal.

### <font color = "red">Class Introduction Question #1 - What is an instruction set, and how do instruction sets differ between computers made by different manufacturers?</font>

### <font color = "red">Class Introduction Question #2 - What is Assembly Language and why do we describe an assembly language when the instruction set precisely describes what happens in the machine?</font>

### <font color = "red">Class Introduction Question #3 - What is the purpose of the objdump command? And why do Mac users need to run x86_64-linux-gnu-objdump instead?</font>