## 📘 Google Colab Intro (Complete Guide for Students)

Before diving into Assembly, here's a quick guide to **Google Colab**, the platform where we'll write and run our code and keep structured notes.

---

### 🌐 What is Google Colab?

* **Google Colaboratory (Colab)** is a **free, cloud-based Jupyter notebook** platform provided by Google.
* It lets you write and execute code in the browser — **no installation required**.
* Colab supports **code cells** (for running commands or code) and **text/Markdown cells** (for notes, explanations, and formatting).
* It's particularly useful for working collaboratively and saving your notebooks directly in **Google Drive**.

---

### 🔍 How to Access Google Colab

1. Go to [https://colab.research.google.com](https://colab.research.google.com)
2. You can:

   * Create a **new notebook** (`File > New Notebook`)
   * **Open an existing notebook** from your Drive or GitHub
   * **Upload a `.ipynb` file** from your device

---

### 💾 How Files and Code Work in Colab

* Files you create or download in a session (like `.asm`, `.out`, `.txt`) are **temporary** — they disappear after the session ends unless:

  * You **download them manually**, or
  * **Mount Google Drive** to save files persistently
* You can run **Linux shell commands** in Colab by prefixing them with an exclamation mark `!`
  → For example: `!nasm -f elf64 hello.asm`

---

### ▶️ How to Run a Code Cell

* Click the **▶️ play button** at the left of a code cell to run it
* Or use keyboard shortcut: `Shift + Enter`
  → This runs the current cell and moves to the next one

---

### 📥 Uploading & Downloading Notebooks

* **To download your notebook**:

  1. Click on `File > Download`
  2. Choose `.ipynb` (default) or `.py` format

* **To upload a notebook**:

  1. Click `File > Upload Notebook`
  2. Select a `.ipynb` file from your device



## **Topics Covered**
### **III: Fundamentals of Assembly Language**
- A. How to declare literals and constants
  1. What are reserved words
  2. When to use identifiers
  3. What are directives

- B. The 4 parts of an instruction
  1. Label
  2. Instruction mnemonic
  3. Operands
  4. Comment

- C. What are intrinsic data type definitions

- D. Little-Endian vs. Big-Endian order (x86 uses Little-Endian)

- E. Using symbolic constants

## III. Fundamentals of Assembly Language

---

### A. **How to Declare Literals and Constants**

---

#### 1. What Are Reserved Words?

* **Reserved words** are keywords predefined by the assembler (like NASM).
* They cannot be redefined or used as variable names.
* Examples include:

  * `mov`, `add`, `int`, `db`, `dw`, `section`, `global`, `org`

> 🧠 Think of them like "grammar rules" of assembly that the assembler understands.

---

#### 2. When to Use Identifiers

* **Identifiers** are names we assign to:

  * **Labels**
  * **Constants**
  * **Variables**
* These help **reference memory locations** or data clearly.
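* Start each identifier with a **letter or underscore** and use only letters, digits, or underscores. (NASM also accepts a few other characters, such as `.`, `?`, and `$`, but this subset is the safest.)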

**Example:**

```asm
count db 10     ; 'count' is an identifier
```
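A minimal sketch using all three kinds of identifiers in one program. The names `MAX_COUNT`, `count`, and `next_step` are illustrative, and the program assumes a 64-bit Linux build with `nasm -f elf64` and `ld`, like the Colab cells later in this section.

```nasm
MAX_COUNT equ 3                 ; constant: a name for the value 3

section .data
    count db 0                  ; variable: names one byte of memory

section .text
global _start
_start:                         ; label: the program's entry point
next_step:                      ; label: a jump target inside the program
    inc byte [count]            ; count = count + 1
    cmp byte [count], MAX_COUNT ; has count reached MAX_COUNT?
    jne next_step               ; if not, jump back to the label
    movzx edi, byte [count]     ; exit code = count (3)
    mov eax, 60                 ; syscall number for exit
    syscall
```

Run it and check the status with `echo $?`; it should print `3`.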

---

#### 3. What Are Directives?

* **Directives** tell the assembler how to process data or allocate memory.
* They do **not generate machine code**.
* Common NASM directives:

  * `section .data` → declares data section
  * `section .text` → declares code section
  * `global _start` → exports `_start` so the linker can use it as the entry point
  * `equ` → defines a constant

**Example:**

```asm
section .data
    maxVal equ 255    ; constant definition
```
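The topics list names the four parts of an instruction: label, mnemonic, operands, and comment. A minimal sketch laying them out in columns (the label name `finish` is illustrative). An instruction needs only its mnemonic; the label, operands, and comment are optional, as the last two lines show.

```nasm
section .text
global _start
_start:
; label:      mnemonic  operands   ; comment
  finish:     mov       eax, 60    ; syscall number for exit
              xor       edi, edi   ; exit code 0 (no label on this line)
              syscall              ; a mnemonic with no operands
```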

---

### 💻 How to Run NASM in Google Colab

First, we need to install the `nasm` assembler.

```
!apt-get update && apt-get install -y nasm
```

```
%%writefile program.asm
; Step 1: Write your code (%%writefile must be the first line of the cell)
section .data
    msg db 'Hello, World!', 0xA
    len equ $ - msg

section .text
    global _start
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, msg
    mov rdx, len
    syscall

    mov rax, 60
    xor rdi, rdi
    syscall
```


```
# Step 2: Assemble and link
!nasm -f elf64 program.asm
!ld -o program program.o
```

```
# Step 3: Run the executable
!./program
```

Output: `Hello, World!`


## 🧠 Step-by-Step Explanation

---

### 🔹 `section .data`

This section is for **declaring initialized data or constants**.

```nasm
msg db 'Hello, World!', 0xA
```

* `msg`: an identifier (label) for the data.
* `db`: “define byte” — we're storing bytes of text.
* `'Hello, World!'`: ASCII string.
* `0xA`: newline character (`\n`) in hexadecimal — it makes the output jump to the next line.

```nasm
len equ $ - msg
```

* `len`: a constant using `equ` (equate).
* `$`: current location counter (current memory address).
* `$ - msg`: calculates the **length** of the message.

  * This gives the number of bytes from the start of `msg` to the current position (end of string).
* So `len` equals **the number of bytes** in the message: 13 characters plus the newline, 14 in total.

---

### 🔹 `section .text`

This is the **code section** where instructions go.

```nasm
global _start
```

* Exports the `_start` label so the linker can see it. `ld` uses `_start` as the program's default **entry point**, where execution begins.

```nasm
_start:
```

* A label marking where the program starts.

---

### 🔹 Print the message

```nasm
mov rax, 1        ; syscall number for sys_write
mov rdi, 1        ; file descriptor 1 = stdout
mov rsi, msg      ; pointer to the message
mov rdx, len      ; length of the message
syscall           ; perform the write system call
```

Here we use **Linux system calls** directly:

| Register | Purpose                               |
| -------- | ------------------------------------- |
| `rax`    | System call number (1 = write)        |
| `rdi`    | File descriptor (1 = standard output) |
| `rsi`    | Memory address of what to write       |
| `rdx`    | How many bytes to write               |

This part prints: `Hello, World!`

---

### 🔹 Exit the program

```nasm
mov rax, 60       ; syscall number for exit
xor rdi, rdi      ; set rdi to 0 (exit code 0)
syscall
```

| Register | Purpose                        |
| -------- | ------------------------------ |
| `rax`    | System call number (60 = exit) |
| `rdi`    | Exit code (0 = success)        |

* `xor rdi, rdi`: sets `rdi` to 0 (same as `mov rdi, 0`, but shorter and faster).
* This tells Linux: "I'm done, exit the program with code 0."

---

## ✅ Summary

This Assembly program:

1. **Stores** a message in memory.
2. **Uses system call 1** to print it to the screen.
3. **Uses system call 60** to exit the program.

---

# 🧩 III.B: Registers and the Role of the CPU

> *Understanding the central role of the CPU, and how registers are used to store and manipulate data.*

---

## 👩‍🏫 **Concept Explanation**

### 🧠 What are Registers?

Registers are **small, super-fast memory locations** inside the CPU. They:

* Hold temporary data
* Store operands for arithmetic
* Contain addresses to access memory
* Manage control flow

Think of registers as the CPU's **scratchpad**.

---

### 📦 Common Registers in x86\_64

| Register | Description                                        |
| -------- | -------------------------------------------------- |
| `rax`    | Accumulator (used in arithmetic and return values) |
| `rbx`    | Base register                                      |
| `rcx`    | Counter register (used in loops)                   |
| `rdx`    | Data register                                      |
| `rsi`    | Source index (string/memory operations)            |
| `rdi`    | Destination index (also used in syscall args)      |
| `rsp`    | Stack pointer                                      |
| `rbp`    | Base pointer (used in stack frames)                |
| `rip`    | Instruction pointer (tracks current instruction)   |

Each register is **64 bits** in x86\_64. You can also access parts of the registers:

* `rax` (64-bit), `eax` (32-bit), `ax` (16-bit), `al` (8-bit)
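To see how the partial registers overlap, here is a minimal sketch, assuming the same `nasm -f elf64` + `ld` build as the Colab cells earlier. Writing `ax` changes only the low 16 bits of `rax`, while writing a 32-bit register such as `eax` also zeroes the upper 32 bits. The exit code is computed from `ah` and `al`.

```nasm
section .text
global _start
_start:
    mov rax, 0x1122334455667788 ; rax: all 64 bits
    mov ax, 0x0102              ; writes bits 0-15 only: rax = 0x1122334455660102
    mov cl, ah                  ; ah = bits 8-15 = 0x01
    add cl, al                  ; al = bits 0-7 = 0x02, so cl = 3
    mov eax, eax                ; a 32-bit write zeroes bits 32-63: rax = 0x0000000055660102
    movzx edi, cl               ; exit code = 3
    mov eax, 60                 ; syscall number for exit
    syscall
```

After running it, `echo $?` should print `3`.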

---

### 🔁 CPU Workflow with Registers

The CPU executes instructions like this:

1. Fetch instruction (`rip`)
2. Decode instruction
3. Load data from memory to registers
4. Perform operations using registers (`rax`, `rbx`, etc.)
5. Store results
6. Update `rip` to next instruction

---

## 🧪 Example: Simple Move and Add

```nasm
section .text
global _start
_start:
    mov rax, 5        ; Load 5 into rax
    mov rbx, 10       ; Load 10 into rbx
    add rax, rbx      ; rax = rax + rbx (5 + 10)

    mov rdi, rax      ; exit code = result
    mov rax, 60       ; syscall: exit
    syscall
```

---

## 🧾 Explanation

1. `mov rax, 5`
   → Store 5 in register `rax`

2. `mov rbx, 10`
   → Store 10 in `rbx`

3. `add rax, rbx`
   → Add `rbx` to `rax`, so `rax = 15`

4. `mov rdi, rax`
   → Exit code = 15 (result of addition)

5. Exit using syscall 60

✅ When run: **the program exits with status code 15** (you won't see it unless checked via shell).

---

## 🧠 Analogy

Think of registers as a **chef's hands**: instead of walking back to the fridge (RAM) every time, the chef keeps **key ingredients** (flour, sugar, etc.) within reach to cook faster.

---

### ✅ Extra Tip

To view the exit code of a program in Linux:

```bash
echo $?
```

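If your program exits with 15, this command will print `15`. In Colab, each `!` line runs in its own shell, so run the program and print its status on the same line: `!./program; echo $?`.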

----

# 🧩 III.C: Memory Addressing and Data Movement

> *Learn how data is stored in memory and how we move it between memory and registers.*

---

## 🧠 What is Memory Addressing?

Memory is organized into bytes, each with a **unique address** (like a house on a street). When a program wants to read or write data, it needs to know the **exact address**.

---

## 🏠 Types of Addressing in Assembly

| Addressing Mode    | Example            | Description                              |
| ------------------ | ------------------ | ---------------------------------------- |
| Immediate          | `mov rax, 5`       | Load a constant value                    |
| Register           | `mov rbx, rax`     | Copy data between registers              |
| Direct (absolute)  | `mov rax, [var]`   | Load from a memory variable              |
| Indirect (pointer) | `mov rax, [rbx]`   | Load from memory address stored in `rbx` |
| Offset             | `mov rax, [rbx+4]` | Load from `rbx + 4`                      |

---

## 🗃️ Example: Using Memory

```nasm
section .data
    number dq 42          ; Allocate 8 bytes and store 42

section .text
global _start
_start:
    mov rax, [number]     ; Load value at 'number' into rax

    mov rdi, rax          ; Set exit code to 42
    mov rax, 60           ; syscall: exit
    syscall
```

---

### 📖 Explanation

1. `.data` section: We define a label `number` and store 42
2. `mov rax, [number]`: Access the memory location labeled `number`
3. `rax` now contains 42
4. Program exits with code 42

---

## 💡 Note on Square Brackets `[ ]`

* Brackets mean: **“go to memory”**
* `mov rax, rbx` = copy value
* `mov rax, [rbx]` = go to address inside `rbx` and load value
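A short sketch of the indirect and offset modes from the addressing table, assuming three 8-byte values stored back to back (the label `nums` is an illustrative name). `mov rbx, nums` has no brackets, so it loads the address itself; the bracketed forms read memory at and beyond that address.

```nasm
section .data
    nums dq 10, 20, 30        ; three 8-byte values in a row

section .text
global _start
_start:
    mov rbx, nums             ; rbx = address of nums (no brackets)
    mov rax, [rbx]            ; indirect: value at rbx, rax = 10
    add rax, [rbx + 8]        ; offset: next 8-byte value (20), rax = 30
    add rax, [rbx + 16]       ; offset: third value (30), rax = 60
    mov rdi, rax              ; exit code = 60
    mov rax, 60               ; syscall number for exit
    syscall
```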

---

## 📦 Common Data Sizes

| Directive | Size    | Example     |
| --------- | ------- | ----------- |
| `db`      | 1 byte  | `db 0xFF`   |
| `dw`      | 2 bytes | `dw 1234`   |
| `dd`      | 4 bytes | `dd 100000` |
| `dq`      | 8 bytes | `dq 42`     |

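The register (or an explicit size such as `byte`, `word`, `dword`, `qword`) decides how many bytes are read, not the directive you declared the data with. NASM doesn't check this for you: `mov rax, [var]` reads 8 bytes even if `var` was defined with `db`, silently picking up whatever bytes follow it.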
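A sketch of a size mismatch, using hypothetical labels `small` and `other`. The 16-bit register makes the CPU read two bytes, so the loaded value also contains the byte declared right after `small`.

```nasm
section .data
    small db 5                  ; 1 byte
    other db 7                  ; the next byte in memory

section .text
global _start
_start:
    movzx eax, byte [small]     ; explicit size: reads exactly 1 byte, eax = 5
    mov cx, [small]             ; cx is 16 bits, so 2 bytes are read: cx = 0x0705
    movzx edi, ch               ; high byte of cx = 7, which came from 'other'
    mov eax, 60                 ; syscall number for exit
    syscall                     ; exit code = 7
```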

---


# 🪜 III.D: The Stack and Stack Operations

> *The stack is a special area in memory used for temporary storage, especially during function calls.*

---

## 🎒 What is the Stack?

The **stack** is a *Last In, First Out* (LIFO) data structure. Think of it like a pile of plates — you add and remove from the top.

* It grows **downward** in memory on x86 systems.
* It's used for:

  * Saving function return addresses
  * Local variables
  * Temporary storage

---

## 🧱 Stack Structure

```
High Address
 ┌────────────┐
 │   Old data │ ← Top before push
 ├────────────┤
 │    rbx     │ ← push rbx
 ├────────────┤
 │    rax     │ ← push rax
 ├────────────┤
 │ Return addr│ ← call function
 └────────────┘
Low Address
```

---

## 🔧 Instructions to Work with the Stack

| Instruction | Description                                     |
| ----------- | ----------------------------------------------- |
| `push reg`  | Decrease `rsp` by 8, then store the register value at `[rsp]`          |
| `pop reg`   | Load the value at `[rsp]` into the register, then increase `rsp` by 8 |
| `call`      | Push return address, then jump to function      |
| `ret`       | Pop return address and jump back                |

---

## 🧪 Example: Using `push` and `pop`

```nasm
section .text
global _start
_start:
    mov rax, 5
    push rax         ; Push 5 to stack

    mov rax, 10
    push rax         ; Push 10

    pop rbx          ; rbx = 10
    pop rcx          ; rcx = 5

    ; Exit program
    mov rdi, 0
    mov rax, 60
    syscall
```

---

### 🔍 Explanation

1. `rax ← 5` → pushed to stack
2. `rax ← 10` → pushed too
3. `pop rbx` → rbx gets the last pushed value (10)
4. `pop rcx` → rcx gets the earlier value (5)

This shows how the stack works in **reverse order**.

---

## ⚠️ Important Notes

* The **stack pointer** is stored in the `rsp` register (64-bit systems)
* Avoid manually modifying `rsp` unless you know exactly what you're doing
* Every `push` should eventually be matched with a `pop` to maintain stack balance
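A minimal sketch combining `call`, `ret`, `push`, and `pop`; the helper `add_two` is a hypothetical example function. It saves `rbx` on entry and restores it before returning, so the caller's value survives the call and every `push` is matched by a `pop`.

```nasm
section .text
global _start

; add_two: hypothetical helper for this example, returns rdi + 2 in rax
add_two:
    push rbx                    ; save the caller's rbx on the stack
    mov rbx, rdi                ; use rbx as scratch space
    add rbx, 2
    mov rax, rbx                ; return value goes in rax
    pop rbx                     ; restore rbx (matches the push above)
    ret                         ; pop the return address and jump back

_start:
    mov rbx, 100                ; a value that must survive the call
    mov rdi, 5                  ; argument
    call add_two                ; push return address, jump to add_two; rax = 7
    sub rbx, 100                ; 0 if rbx was preserved
    add rax, rbx                ; still 7 if rbx was preserved
    mov rdi, rax                ; exit code = 7
    mov rax, 60                 ; syscall number for exit
    syscall
```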



# 🧬 III.E: What are Intrinsic Data Type Definitions?

> *Intrinsic data types define how data is interpreted and how many bytes it takes in memory.*

---

## 🧠 What are Intrinsic Data Types?

In Assembly (especially with NASM and x86), **intrinsic data types** refer to the *basic, built-in types* that describe how much memory a value takes and how to treat it (as signed/unsigned, integer/floating-point, etc.).

They help the assembler understand how much space to reserve and how to interpret binary data.

---

## 📏 Common Intrinsic Data Types in x86 Assembly

| Data Type | Bytes | Description                          | Range (Signed)                                          |
| --------- | ----- | ------------------------------------ | ------------------------------------------------------- |
| `byte`    | 1     | 8-bit data                           | -128 to 127                                             |
| `word`    | 2     | 16-bit data                          | -32,768 to 32,767                                       |
| `dword`   | 4     | 32-bit data                          | -2,147,483,648 to 2,147,483,647                         |
| `qword`   | 8     | 64-bit data                          | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| `tword`   | 10    | 80-bit x87 extended-precision float  | n/a (floating-point; rarely used)                       |

---

## 📦 Defining Data with Intrinsic Types

Here’s how to **declare** data in a NASM program:

```nasm
section .data
val1 db  100       ; 1 byte
val2 dw  12345     ; 2 bytes
val3 dd  12345678  ; 4 bytes
val4 dq  999999999 ; 8 bytes
```

### Mnemonics:

| Assembler Code | Meaning      |
| -------------- | ------------ |
| `db`           | Define byte  |
| `dw`           | Define word  |
| `dd`           | Define dword |
| `dq`           | Define qword |

---

## 🔍 Example Explained

```nasm
section .data
name db 'J', 'o', 'h', 'n', 0   ; Null-terminated string (C-style)
age  db 25                     ; 1 byte for age
salary dd 50000               ; 4 bytes
```

* `name` → Stored as characters in consecutive bytes
* `age` → A single byte for numeric value
* `salary` → A 4-byte (32-bit) number

---

## 🧭 Why It Matters

Knowing how much space each data type takes lets you:

* Align your memory properly
* Optimize performance
* Avoid unexpected behavior (overflow, truncation)
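A quick sketch of truncation, using an assumed 1-byte variable `level`. The 8-bit register `al` can only hold 0 to 255 (unsigned), so adding 10 to 250 wraps around.

```nasm
section .data
    level db 250                ; fits in 1 byte (0-255 unsigned)

section .text
global _start
_start:
    mov al, [level]             ; al is 8 bits wide, matching 'level'
    add al, 10                  ; 260 does not fit in 8 bits: al wraps to 4, carry flag set
    mov [level], al             ; store the truncated result back
    movzx edi, byte [level]     ; exit code = 4
    mov eax, 60                 ; syscall number for exit
    syscall
```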

---


# 🧭 III.F: Little-Endian vs. Big-Endian Order (x86 Is Little-Endian)

---

## 🧠 What is Endianness?

**Endianness** is the order in which bytes are stored in memory for multi-byte data types (like `word`, `dword`, `qword`).

There are two main types:

| Type              | Description                                                 |
| ----------------- | ----------------------------------------------------------- |
| **Little-Endian** | Least significant byte stored first (lowest memory address) |
| **Big-Endian**    | Most significant byte stored first (lowest memory address)  |

---

## 💡 x86 Architecture = Little-Endian

This means that when the `dword` value `0x12345678` is stored, memory looks like this (addresses are relative to the start of the value):

| Address | Value |
| ------- | ----- |
| 0x00    | `78`  |
| 0x01    | `56`  |
| 0x02    | `34`  |
| 0x03    | `12`  |

So although the number is `0x12345678`, it's stored **backwards** byte-wise.

---

## 🔬 Example in Assembly

```nasm
section .data
val dd 0x12345678
```

In memory (in Little-Endian):

```
Memory:
[78] [56] [34] [12]
```

If this were Big-Endian, it would be:

```
Memory:
[12] [34] [56] [78]
```

---

## 🔍 Why It Matters

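* When you build a multi-byte value **byte by byte** (for example with `db`), list the bytes least significant first: `db 0x78, 0x56, 0x34, 0x12` produces the same memory as `dd 0x12345678`. Directives like `dd` and instructions like `mov` handle the order for you.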
* It's especially important when dealing with **file I/O**, **network protocols**, or **hardware registers** (which may use Big-Endian).
* Data from different architectures may **not be directly portable** due to endianness mismatch.
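To check the byte order directly, here is a sketch (the labels `val` and `by_hand` are illustrative) that stores the same number once with `dd` and once byte by byte, least significant byte first, then compares them as 32-bit values.

```nasm
section .data
    val     dd 0x12345678
    by_hand db 0x78, 0x56, 0x34, 0x12   ; same value, built LSB first

section .text
global _start
_start:
    mov eax, [val]              ; eax = 0x12345678
    cmp eax, [by_hand]          ; read the hand-built bytes as one dword
    jne .mismatch               ; would only happen if the byte order were wrong
    movzx edi, byte [val]       ; lowest-addressed byte = 0x78 = 120 (the LSB)
    jmp .exit
.mismatch:
    mov edi, 1
.exit:
    mov eax, 60                 ; syscall number for exit
    syscall
```

An exit code of `120` confirms that the first byte in memory is the least significant one.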

---

## 📦 Real Use Case

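You never reorder bytes yourself when the CPU stores a whole value: instructions such as `mov` and `push` write the bytes in little-endian order automatically. Note that in 64-bit mode `push` always stores 8 bytes (NASM rejects `push dword` in 64-bit code), and an immediate operand is sign-extended:

```nasm
push qword 0x12345678   ; rsp -= 8; memory at [rsp]: 78 56 34 12 00 00 00 00
```

The byte order is handled by the CPU's architecture.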

# 📘 III.G: ASCII Character Set - Text and Character Storage

---

## 📚 What is ASCII?

**ASCII (American Standard Code for Information Interchange)** is a 7-bit character encoding standard used to represent text in computers. Each character is assigned a number from **0 to 127**.

| Character | Decimal | Hex  |
| --------- | ------- | ---- |
| `A`       | 65      | 0x41 |
| `a`       | 97      | 0x61 |
| `0`       | 48      | 0x30 |
| SPACE     | 32      | 0x20 |
| `!`       | 33      | 0x21 |

ASCII is how we store and process human-readable text like `"Hello"` in memory.
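Because characters are just numbers, you can do arithmetic on them. A minimal sketch, assuming a hypothetical 3-byte buffer `buf`: adding `'0'` (48) turns a digit value into its character, and adding `'a' - 'A'` (32, per the table) turns an uppercase letter into lowercase.

```nasm
section .data
    buf db 0, 0, 10             ; two character slots, then a newline

section .text
global _start
_start:
    mov al, 7                   ; a digit value, 0-9
    add al, '0'                 ; 7 + 48 = 55, the character '7'
    mov [buf], al
    mov al, 'A'                 ; 65
    add al, 'a' - 'A'           ; 65 + 32 = 97, the character 'a'
    mov [buf + 1], al

    mov rax, 1                  ; syscall number for write
    mov rdi, 1                  ; stdout
    mov rsi, buf
    mov rdx, 3                  ; two characters plus the newline
    syscall

    mov rax, 60                 ; syscall number for exit
    xor rdi, rdi
    syscall
```

Running it prints `7a`.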

---

## 💾 Storing Characters in Assembly

Characters are just **bytes** in memory. For example:

```nasm
section .data
msg db 'A'       ; single character
```

Here, `'A'` = 65 = `0x41`. Stored in memory as one byte: `41`.

To store a string:

```nasm
section .data
greeting db 'Hello, world!', 0
```

* Each character is one byte.
* Ends with `0` (null terminator) to mark end of string for C-compatible functions like `printf`.

---

## 🔍 Examining Memory

Let's say we have:

```nasm
msg db 'ABC'
```

This stores the following in memory:

| Address | Value | ASCII |
| ------- | ----- | ----- |
| 0x00    | `41`  | A     |
| 0x01    | `42`  | B     |
| 0x02    | `43`  | C     |

You can print or read individual characters using system calls or string functions.

---

## 🧠 Why It's Important

* Essential for **outputting messages**, user interaction, file writing.
* Many tools expect **null-terminated** strings (`0` at the end).
* Useful in encoding/decoding tasks, low-level text formatting, debugging memory.

---

## 🛠 Example: String with Null Terminator

```nasm
section .data
msg db 'ChatGPT rocks!', 0
```

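The trailing `0` lets C library functions such as `printf` or `puts` find the end of the string. Linux's `write` system call ignores it: `write` outputs exactly the number of bytes given in `rdx`, so pass the length without the terminator.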
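Since `write` needs an explicit length, a program can compute the length of a null-terminated string itself by scanning for the `0` byte, a hand-written version of C's `strlen`. A minimal sketch, assuming the 64-bit build used throughout these notes:

```nasm
section .data
    msg db 'Hello, world!', 10, 0   ; text, newline, null terminator

section .text
global _start
_start:
    mov rsi, msg                ; rsi = pointer to the first character
    xor rdx, rdx                ; rdx = length counter
.count:
    cmp byte [rsi + rdx], 0     ; reached the terminator?
    je .print
    inc rdx                     ; no: count this byte and move on
    jmp .count
.print:
    mov rax, 1                  ; syscall number for write
    mov rdi, 1                  ; stdout
    syscall                     ; rsi = msg, rdx = 14; the 0 is not written

    mov rax, 60                 ; syscall number for exit
    xor rdi, rdi
    syscall
```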

----


# 📘 III.H: Using Symbolic Constants in Assembly

---

## 🧩 What Are Symbolic Constants?

Symbolic constants are **named values** that don't change during program execution. Think of them like labels for fixed numbers or characters, which improves readability and makes your code easier to maintain.

---

## 🔧 Declaring Symbolic Constants

In NASM, one way to define symbolic constants is the `%define` preprocessor directive (the `equ` directive from Section A is another):

```nasm
%define BUFFER_SIZE 256
%define EXIT_SUCCESS 0
```

These work like simple **find-and-replace macros**. When the assembler sees `BUFFER_SIZE`, it replaces it with `256`.
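Because `%define` pastes text, operator precedence can surprise you. A sketch with two hypothetical constants, one made with `%define` and one with `equ` (which evaluates its expression once, when it is defined):

```nasm
%define SUM_DEFINE 2 + 3        ; text substitution: pasted as "2 + 3"
SUM_EQU equ 2 + 3               ; evaluated once: SUM_EQU is the number 5

section .text
global _start
_start:
    mov rdi, SUM_DEFINE * 2     ; expands to 2 + 3 * 2 = 8
    mov rsi, SUM_EQU * 2        ; 5 * 2 = 10 (for comparison)
    mov eax, 60                 ; syscall number for exit
    syscall                     ; exit code = 8
```

Wrapping the `%define` body in parentheses, as in `%define SUM_DEFINE (2 + 3)`, avoids the problem.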

---

## 📝 Example Usage


```nasm
%define MSG_LEN 13

section .data
msg db 'Hello, world!'

section .text
global _start

_start:
    mov rax, 1          ; syscall number for write (Linux x86-64)
    mov rdi, 1          ; file descriptor 1 (stdout)
    mov rsi, msg        ; message to write
    mov rdx, MSG_LEN    ; number of bytes to write
    syscall             ; call kernel

    mov rax, 60         ; syscall number for exit (Linux x86-64)
    xor rdi, rdi        ; return 0
    syscall
```

Here:

* `MSG_LEN` is defined once and reused — if the message changes length, you only update it in one place.
* This improves **code readability and flexibility**.

---

## 📚 Benefits of Symbolic Constants

* ✅ **Readability**: `BUFFER_SIZE` is easier to understand than `256`.
* ✅ **Maintainability**: Change in one place affects the whole code.
* ✅ **Consistency**: Reduces the chance of typos in repeated values.

---

## ⚠️ Things to Remember

* `%define` is **preprocessor-based** — it's not a runtime variable.
* Symbolic constants are not stored in memory — they just replace text during assembly.

---

## 🛠 Best Practice

Group your constants at the top of the file for easy access:

```nasm
%define STDOUT 1
%define SYS_WRITE 1     ; Linux x86-64 syscall number for write
%define SYS_EXIT 60     ; Linux x86-64 syscall number for exit
%define EXIT_SUCCESS 0
```
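Putting grouped constants to work, here is a sketch assuming the Linux x86-64 syscall numbers used elsewhere in these notes (1 for `write`, 60 for `exit`). The message length is computed with `equ $ - msg`, so it can't drift out of sync with the text the way a hand-typed `MSG_LEN` can.

```nasm
%define SYS_WRITE    1          ; Linux x86-64 syscall numbers
%define SYS_EXIT     60
%define STDOUT       1
%define EXIT_SUCCESS 0

section .data
    msg     db 'Hello, world!', 10
    msg_len equ $ - msg         ; computed by the assembler: 14

section .text
global _start
_start:
    mov rax, SYS_WRITE
    mov rdi, STDOUT
    mov rsi, msg
    mov rdx, msg_len
    syscall

    mov rax, SYS_EXIT
    mov rdi, EXIT_SUCCESS
    syscall
```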

---