### **VIII. Working with Strings and Arrays**
Learn how to define and work with strings and arrays in assembly. This section teaches you to use string instructions, write procedures for string handling, index two-dimensional arrays, and build simple searching and sorting routines.

#### **Topics Covered**
A. Defining strings and arrays

B. Using primitive string instructions

C. Building procedures that use strings

D. Operations on 2-dimensional arrays

E. How to build searching and sorting routines

### **A. Defining Strings and Arrays**

In Assembly language, **strings** and **arrays** are both just sequences of bytes (or words) stored in memory. Unlike higher-level languages, **there is no separate "string" data type** - it's up to you to define and handle them manually.

---

### 🔹 Defining a String

A string in Assembly is typically defined using `db` (define byte) with ASCII characters. Because nothing in memory marks where the string stops, you either append a **null terminator (`0x00`)** or store the length alongside the data (a **length-prefixed** string).

#### ✅ Example 1: Null-Terminated String

```nasm
my_string db "Hello, world!", 0
```

* `"Hello, world!"` is stored as ASCII bytes.
* `0` at the end signals the end of the string.
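
The length-prefixed alternative can be sketched as follows. This is a minimal illustration, assuming 64-bit Linux and NASM; the names `pstring`, `msg`, and `msg_len` are made up for this example. It also shows a related idiom, letting the assembler compute a length with `equ $ - label`.

```nasm
; Sketch: carrying a string length instead of a terminator.
; pstring, msg and msg_len are illustrative names.
section .data
    pstring db 6, "Hello!"         ; first byte holds the length (6)
    msg     db "Hello, world!"     ; no terminator stored
    msg_len equ $ - msg            ; assemble-time length: 13

section .text
    global _start

_start:
    movzx edi, byte [pstring]      ; EDI = length prefix = 6
    add   edi, msg_len             ; EDI = 6 + 13 = 19
    mov   eax, 60                  ; sys_exit
    syscall                        ; exit status = 19
```

After assembling and linking, running the program and then `echo $?` in a shell should report `19`. A one-byte prefix limits the string to 255 characters, which is the usual trade-off against a terminator.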

---

### 🔹 Defining an Array

An array is just a sequence of elements (bytes, words, etc.) in memory.

#### ✅ Example 2: Byte Array

```nasm
my_array db 10, 20, 30, 40, 50
```

* Defines 5 bytes in memory.
* Indexed by position (0-based).
* `[my_array + 2]` reads the 3rd item: 30 (`my_array + 2` on its own is only that item's address).
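
For elements wider than a byte, the index has to be scaled by the element size. A minimal sketch, assuming 64-bit Linux and a hypothetical `word_array` of 16-bit values:

```nasm
; Sketch: indexing an array of words (2 bytes per element).
; word_array is an illustrative name.
section .data
    word_array dw 100, 200, 300, 400       ; four 16-bit elements

section .text
    global _start

_start:
    mov   rcx, 1                           ; index 1 = second element
    movzx edi, word [word_array + rcx*2]   ; byte offset = index * 2 -> 200
    mov   eax, 60                          ; sys_exit
    syscall                                ; exit status = 200
```

Writing `[word_array + 1]` instead would read a word starting in the middle of the first element, which is a common mistake when moving from byte arrays to wider types.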

---

### 🛠 Practical Example: Defining and Accessing a String & Array

We'll:

* Define a string: `"Hello!"`
* Define an array of 5 numbers.
* Move each into registers.

---

### ✅ Example

In [None]:
# Save as: define_string_array.asm
!echo '
section .data
    greeting db "Hello!", 0       ; Null-terminated string
    numbers  db 10, 20, 30, 40, 50 ; Array of 5 bytes

section .bss
    letter resb 1
    third_number resb 1

section .text
    global _start

_start:
    ; Load first character of string into AL
    mov al, [greeting]        ; AL = "H"
    mov [letter], al          ; Store in variable

    ; Load 3rd element of array (index 2)
    mov al, [numbers + 2]     ; AL = 30
    mov [third_number], al

    ; Exit
    mov eax, 60
    xor edi, edi
    syscall
' > define_string_array.asm

# Assemble and link
!nasm -f elf64 define_string_array.asm -o define_string_array.o
!ld define_string_array.o -o define_string_array
!./define_string_array

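# View stored results: they exist only in the running process's memory,
# so stop at the exit syscall in gdb (assumed to be installed) and read them there
!gdb -q -batch -ex "catch syscall 60" -ex run -ex "x/1cb &letter" -ex "x/1db &third_number" ./define_string_array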

### 🔍 Code Explanation

* `greeting db "Hello!", 0`: A null-terminated string.
* `numbers db 10, 20, 30, 40, 50`: A 5-element byte array.
* `mov al, [greeting]`: Loads the ASCII value of `'H'` (72) into AL.
* `mov al, [numbers+2]`: Loads the 3rd item in array → 30.

---

### 🧠 Notes

* There's **no bounds checking** — you must manage it manually.
* Strings don't end automatically; you need to add a terminator (`0`) yourself.
* Arrays and strings are accessed by **offsets** (e.g., `+1`, `+2`, etc.)
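
A manual bounds check might look like the sketch below. It assumes 64-bit Linux; the index value `7` and the "bad index" status `255` are arbitrary choices for illustration.

```nasm
; Sketch: check an index against the array length before using it.
section .data
    numbers     db 10, 20, 30, 40, 50
    numbers_len equ $ - numbers        ; 5, computed by the assembler

section .text
    global _start

_start:
    mov   rcx, 7                       ; hypothetical index (out of range)
    mov   edi, 255                     ; default exit status: bad index
    cmp   rcx, numbers_len
    jae   .exit                        ; unsigned compare also rejects negative indices
    movzx edi, byte [numbers + rcx]    ; in range: EDI = numbers[index]
.exit:
    mov   eax, 60                      ; sys_exit
    syscall
```

With the index set to `7` the program exits with status `255`; changing it to `2` makes it exit with `30`.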

---

### 📌 Summary

| Concept | Defined With | Terminator?   | Access By          |
| ------- | ------------ | ------------- | ------------------ |
| String  | `db "..."`   | Usually `0`   | `[label + offset]` |
| Array   | `db 1, 2, 3` | No terminator | `[label + index]`  |


⬇️

### **B. Using Primitive String Instructions**

Assembly provides **primitive string instructions** that simplify operations on blocks of memory - such as copying a string, scanning for a character, or initializing memory.

Each one takes its operands from **fixed registers and advances the pointers for you** after every step, so with a `REP` prefix an entire copy or scan loop fits in one instruction.

---

### 🔹 Common String Instructions

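| Instruction | Purpose                                     | Uses Registers |
| ----------- | ------------------------------------------- | -------------- |
| `MOVSB`     | Copy byte from `[RSI]` to `[RDI]`           | `RSI`, `RDI`   |
| `MOVSW`     | Copy word (2 bytes) from `[RSI]` to `[RDI]` | `RSI`, `RDI`   |
| `LODSB`     | Load byte from `[RSI]` into `AL`            | `RSI`, `AL`    |
| `STOSB`     | Store `AL` into `[RDI]`                     | `RDI`, `AL`    |
| `SCASB`     | Compare `AL` with byte at `[RDI]`           | `AL`, `RDI`    |
| `CMPSB`     | Compare bytes at `[RSI]` and `[RDI]`        | `RSI`, `RDI`   |

> 💡 Prefix with `REP` (for `MOVS`, `STOS`) or `REPE`/`REPNE` (for `CMPS`, `SCAS`) to repeat the operation automatically, up to `RCX` times.
> In 16-bit code the same instructions use `SI`, `DI`, and `CX` with the `DS` and `ES` segments; the 64-bit examples in this section use `RSI`, `RDI`, and `RCX`.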
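
As an illustration of a conditional repeat prefix, the sketch below uses `REPNE SCASB` to find the terminator of a string and derive its length. It assumes 64-bit Linux and that the string really contains a `0` byte; `text` is an illustrative name.

```nasm
; Sketch: string length via REPNE SCASB.
section .data
    text db "Assembly!", 0

section .text
    global _start

_start:
    mov   rdi, text        ; SCASB reads from [RDI]
    xor   eax, eax         ; AL = 0, the byte to search for
    mov   rcx, -1          ; effectively unlimited count (assumes a terminator exists)
    cld                    ; scan forward
    repne scasb            ; stops once the 0 byte has been compared
    not   rcx              ; RCX = bytes scanned, including the 0 (10)
    dec   rcx              ; RCX = 9, the length without the terminator
    mov   edi, ecx
    mov   eax, 60          ; sys_exit
    syscall                ; exit status = 9
```

`RCX` starts at -1 and is decremented once per scanned byte, so after 10 scans it holds -11; `NOT` turns that into 10, and the final `DEC` drops the terminator from the count.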

---

### 🔹 Registers Used in String Ops

| Register / Flag | Description                                                                  |
| --------------- | ---------------------------------------------------------------------------- |
| `RSI`           | Source index (points to the source string)                                   |
| `RDI`           | Destination index (points to where results are stored)                       |
| `AL`            | Holds the byte for `LODSB`, `STOSB`, and `SCASB`                             |
| `RCX`           | Count register (used with the `REP` prefixes)                                |
| `DF`            | Direction flag: `CLD` makes the pointers increment, `STD` makes them decrement |

---

### 🛠 Example: Copying a String using `MOVSB`

Let's copy the string `"Hi!"` into another buffer using `MOVSB` with `REP`.

---

### ✅ Example

In [None]:
# Save as: string_movsb.asm
!echo '
section .data
    source db "Hi!", 0        ; Null-terminated source
    length equ 4              ; Length including null terminator

section .bss
    destination resb 4        ; Allocate 4 bytes for copied string

section .text
    global _start

_start:
    ; Set up registers for string copy
    mov rsi, source           ; RSI = address of source
    mov rdi, destination      ; RDI = address of destination
    mov rcx, length           ; RCX = number of bytes to copy
    cld                       ; Clear direction flag (forward copy)

    rep movsb                 ; Repeat MOVSB until RCX = 0

    ; Exit
    mov eax, 60
    xor edi, edi
    syscall
' > string_movsb.asm

# Assemble and run
!nasm -f elf64 string_movsb.asm -o string_movsb.o
!ld string_movsb.o -o string_movsb
!./string_movsb

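# View result in destination buffer: it exists only in process memory,
# so stop at the exit syscall in gdb (assumed to be installed) and dump it
!gdb -q -batch -ex "catch syscall 60" -ex run -ex "x/4cb &destination" ./string_movsb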

### 🔍 Code Explanation

* `rsi = source`: Points to start of the source string.
* `rdi = destination`: Points to destination buffer.
* `rcx = 4`: 3 characters + null terminator.
* `rep movsb`: Repeats `movsb` 4 times — copying bytes from `[rsi]` to `[rdi]`.
* `cld`: Ensures copy moves **forward** (increment).

---

### 🧠 Optional: LODSB and STOSB Example

Want to manually load and store each character?

```nasm
; Manually load and store 1 byte (not shown in main code)
mov rsi, source
mov rdi, destination
lodsb             ; AL = [RSI], RSI++
stosb             ; [RDI] = AL, RDI++
```
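
Wrapped in a loop, the same pair can transform a string while copying it. The sketch below is one possible variation, assuming 64-bit Linux: it copies a lowercase `"hi!"` and converts ASCII letters to uppercase on the way.

```nasm
; Sketch: LODSB/STOSB loop that uppercases while copying.
section .data
    source db "hi!", 0

section .bss
    destination resb 4

section .text
    global _start

_start:
    mov   rsi, source
    mov   rdi, destination
    cld                             ; move forward through both buffers
.next:
    lodsb                           ; AL = [RSI], RSI++
    cmp   al, 'a'
    jb    .store                    ; below 'a': leave unchanged
    cmp   al, 'z'
    ja    .store                    ; above 'z': leave unchanged
    sub   al, 32                    ; ASCII lowercase -> uppercase
.store:
    stosb                           ; [RDI] = AL, RDI++
    test  al, al
    jnz   .next                     ; stop after the terminator is copied
    movzx edi, byte [destination]   ; first copied byte: 'H' = 72
    mov   eax, 60                   ; sys_exit
    syscall                         ; exit status = 72
```

The exit status `72` is the ASCII code of `H`, which confirms that the first byte was converted and stored.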

---

### 📌 Summary

| Instruction | Purpose       | Use With     | Often Paired With |
| ----------- | ------------- | ------------ | ----------------- |
| `MOVSB`     | Copy byte     | `RSI`, `RDI` | `REP`, `CLD`      |
| `LODSB`     | Load byte     | `RSI`, `AL`  |                   |
| `STOSB`     | Store byte    | `RDI`, `AL`  | `REP`             |
| `SCASB`     | Scan for byte | `AL`, `RDI`  | `REPNE`, `REPE`   |

---


⬇️

### **C. Building Procedures that Use Strings**

In Assembly language, a **procedure** (or subroutine/function) is a reusable block of code. Unlike high-level languages, you need to manage **parameters, return values, and registers** manually - often through the stack or registers.

When working with **strings**, procedures can be written to:

* Count characters
* Copy strings
* Compare strings
* Reverse strings, etc.

We'll demonstrate **how to define a procedure to count the length of a null-terminated string**, which is a very common task.

---

### 🔹 Strategy

* The string is passed via a register (e.g., `RSI`).
* The procedure loops through the string until it finds a **null terminator (`0`)**.
* It counts how many characters it sees.

---

### 🧠 Why Use a Procedure?

* Helps **reuse** string logic.
* Keeps code organized and clean.
* Mimics behavior of C functions like `strlen`.

---

### ✅ Full Code: Procedure to Count String Length

We'll count the length of `"Assembly!"` (9 characters) using a custom procedure called `string_length`.

---

### 🛠 Example

In [None]:
# Save as: string_length_proc.asm
!echo '
section .data
    my_string db "Assembly!", 0

section .bss
    length resb 1

section .text
    global _start

; -------------------------------
; Procedure: string_length
; Input : RSI = address of string
; Output: AL  = length
; -------------------------------
string_length:
    xor rcx, rcx           ; Clear counter

.loop:
    mov al, [rsi]          ; Load current byte
    cmp al, 0              ; Check if null terminator
    je .done               ; If yes, exit loop
    inc rcx                ; Count one character
    inc rsi                ; Move to next byte
    jmp .loop

.done:
    mov al, cl             ; Move count to AL
    ret

; -------------------------------
_start:
    mov rsi, my_string     ; RSI = pointer to string
    call string_length     ; Call procedure
    mov [length], al       ; Save result

    ; Exit
    mov eax, 60
    xor edi, edi
    syscall
' > string_length_proc.asm

# Assemble and link
!nasm -f elf64 string_length_proc.asm -o string_length_proc.o
!ld string_length_proc.o -o string_length_proc
!./string_length_proc

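# View result length: it exists only in process memory,
# so stop at the exit syscall in gdb (assumed to be installed) and read it
!gdb -q -batch -ex "catch syscall 60" -ex run -ex "x/1db &length" ./string_length_proc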

### 🔍 Code Explanation

* `string_length:` defines the procedure.
* It reads from `RSI` and counts characters until it finds `0`.
* `RCX` keeps the count; result is returned in `AL`.
* In `_start`, we move the string address to `RSI` and `call string_length`.

---

### 🧠 Notes

* We returned the result in `AL`, which only holds lengths up to 255. The System V AMD64 calling convention used by C on Linux returns integers in `RAX` instead.
* Procedures don't have a formal `return type` — it's your job to define where the result goes.
* You could also pass the string via the **stack**, though using registers is simpler for short routines.
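
Passing the argument on the stack could look like the sketch below. It assumes 64-bit Linux; `string_length_stack` is a hypothetical name, and the convention here (caller pushes, caller cleans up, result in `RAX`) is one choice among several.

```nasm
; Sketch: string length with the argument passed on the stack.
section .data
    my_string db "Assembly!", 0

section .text
    global _start

; Input : [rsp + 8] = address of string (just above the return address)
; Output: RAX = length
string_length_stack:
    mov   rsi, [rsp + 8]        ; fetch the pushed argument
    xor   eax, eax              ; RAX = 0 (counter)
.loop:
    cmp   byte [rsi + rax], 0   ; terminator reached?
    je    .done
    inc   rax
    jmp   .loop
.done:
    ret

_start:
    mov   rax, my_string
    push  rax                   ; argument goes on the stack
    call  string_length_stack
    add   rsp, 8                ; caller removes the argument
    mov   edi, eax              ; length = 9
    mov   eax, 60               ; sys_exit
    syscall                     ; exit status = 9
```

Inside the procedure the return address sits at `[rsp]`, so the argument is found 8 bytes above it. Returning the full count in `RAX` also removes the 255-character limit of an `AL` result.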

---

### 📌 Summary

| Concept      | Approach           |
| ------------ | ------------------ |
| Parameters   | Passed via `RSI`   |
| Return Value | Sent back in `AL`  |
| Result Usage | Stored in `length` |

⬇️

### **D. Operations on 2-Dimensional Arrays**

In Assembly, there is no built-in support for multidimensional arrays like in C or Python - but we can **simulate** them using **row-major memory layout**.

---

### 🧠 What is Row-Major Order?

A 2D array like:

```
A = [ [1, 2, 3],
      [4, 5, 6] ]
```

Is stored in memory as:

```
A = [1, 2, 3, 4, 5, 6]
```

To access `A[i][j]`, we compute:

```
Offset = (i * number_of_columns + j)
```

---

### ✅ Goal

We will define a `2x3` array and:

* Access a specific element (e.g., row 1, col 2 = `6`, counting rows and columns from 0)
* Store its value in memory so it can be inspected

---

### 🛠 Example

In [None]:
# Save as: array2d_access.asm
!echo '
section .data
    rows    equ 2
    cols    equ 3
    array2D db 1, 2, 3, 4, 5, 6    ; 2 rows x 3 columns

section .bss
    result resb 1                 ; To store accessed element

section .text
    global _start

_start:
    ; Access element at row 1, col 2 (which is 6)
    ; Offset = (1 * 3 + 2) = 5

    mov rbx, 1        ; row = 1
    mov rcx, 2        ; col = 2
    mov rdx, cols     ; total cols = 3

    imul rbx, rdx     ; row * cols → rbx = 3
    add rbx, rcx      ; rbx = 3 + 2 = 5

    mov al, [array2D + rbx]  ; get array[1][2] (6)
    mov [result], al         ; store in result

    ; Exit
    mov eax, 60
    xor edi, edi
    syscall
' > array2d_access.asm

# Assemble and link
!nasm -f elf64 array2d_access.asm -o array2d_access.o
!ld array2d_access.o -o array2d_access
!./array2d_access

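# View result: it exists only in process memory,
# so stop at the exit syscall in gdb (assumed to be installed) and read it
!gdb -q -batch -ex "catch syscall 60" -ex run -ex "x/1db &result" ./array2d_access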

### 🔍 Code Explanation

* `array2D` is laid out in row-major order.
* We calculate the index with: `offset = row * cols + col`.
* `mov al, [array2D + rbx]` reads the 6th byte (element at row 1, col 2).
* It is stored in `result` to view via hex dump.

---

### 💡 Tip

You can easily loop over 2D arrays using nested loops by computing each element's address using this method.
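
A nested-loop traversal might be sketched like this, assuming 64-bit Linux and the same `2x3` byte array; it sums every element and returns the total as the exit status.

```nasm
; Sketch: nested loops over a 2x3 byte array, summing all elements.
section .data
    rows    equ 2
    cols    equ 3
    array2D db 1, 2, 3, 4, 5, 6

section .text
    global _start

_start:
    xor   edi, edi                  ; running sum
    xor   rbx, rbx                  ; i = 0 (row)
.row_loop:
    xor   rcx, rcx                  ; j = 0 (column)
.col_loop:
    mov   rax, rbx
    imul  rax, rax, cols            ; RAX = i * cols
    add   rax, rcx                  ; RAX = i * cols + j
    movzx edx, byte [array2D + rax] ; load element A[i][j]
    add   edi, edx
    inc   rcx
    cmp   rcx, cols
    jb    .col_loop                 ; next column
    inc   rbx
    cmp   rbx, rows
    jb    .row_loop                 ; next row
    mov   eax, 60                   ; sys_exit
    syscall                         ; exit status = 21
```

The sum 1 + 2 + 3 + 4 + 5 + 6 = 21 comes back as the exit status. Recomputing `i * cols + j` for each element mirrors the formula directly; a tighter version could keep a single running offset instead.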

---

### 📌 Summary

| Concept       | Explanation                 |
| ------------- | --------------------------- |
| Storage Order | Row-major (rows one after another) |
| Index Formula | `(row × cols) + col`        |
| Access Method | `mov al, [array + offset]`  |



⬇️

### **E. How to Build Searching and Sorting Routines**

In Assembly, we can implement basic searching and sorting algorithms using loops and comparisons. Though more verbose to write than in high-level languages, they are a good way to practise **memory addressing** and **control flow**.

---

### 🔍 Part 1: Linear Search

We'll implement **linear search** to find a number inside a 1D array.

---

### ✅ Code: Linear Search in Assembly

In [None]:
# Save as: linear_search.asm
!echo '
section .data
    array db 4, 7, 1, 9, 2, 5       ; 6 elements
    target db 9                    ; number to search
    len equ 6                      ; length of array

section .bss
    found_index resb 1            ; index if found, 0xFF (-1) otherwise

section .text
    global _start

_start:
    xor rcx, rcx                  ; index = 0

search_loop:
    cmp rcx, len
    jge not_found                 ; reached the end without a match

    mov al, [array + rcx]         ; AL = array[index]
    cmp al, [target]
    je found

    inc rcx
    jmp search_loop

found:
    mov [found_index], cl         ; store index
    jmp exit

not_found:
    mov byte [found_index], 0xFF  ; store -1 (mov cannot copy memory to memory)

exit:
    ; Exit syscall
    mov eax, 60
    xor edi, edi
    syscall
' > linear_search.asm

!nasm -f elf64 linear_search.asm -o linear_search.o
!ld linear_search.o -o linear_search
!./linear_search

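# View the result: it exists only in process memory,
# so stop at the exit syscall in gdb (assumed to be installed) and read it
!gdb -q -batch -ex "catch syscall 60" -ex run -ex "x/1db &found_index" ./linear_search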

### 💬 Code Explanation

* The program searches for the value `9` in the array.
* If found, it stores the index in `found_index`.
* If not found, it stores `-1` (`0xFF`) in `found_index`.
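
The same search can also be packaged as a reusable procedure. The sketch below swaps the explicit loop for `REPNE SCASB`; it assumes 64-bit Linux, and the name `find_byte` and its register interface are illustrative choices.

```nasm
; Sketch: linear search as a procedure built on REPNE SCASB.
section .data
    array  db 4, 7, 1, 9, 2, 5
    len    equ $ - array
    target db 9

section .text
    global _start

; Input : RDI = array address, RCX = element count, AL = value to find
; Output: RAX = index of first match, or -1 if absent
find_byte:
    mov   rdx, rcx          ; remember the original count
    test  rcx, rcx
    jz    .missing          ; empty array: nothing to scan
    cld
    repne scasb             ; stops on a match (ZF = 1) or when RCX hits 0
    jne   .missing
    sub   rdx, rcx          ; RDX = number of bytes scanned
    lea   rax, [rdx - 1]    ; index of the matching byte
    ret
.missing:
    mov   rax, -1
    ret

_start:
    mov   rdi, array
    mov   rcx, len
    mov   al, [target]
    call  find_byte
    mov   edi, eax          ; index 3 for the value 9
    mov   eax, 60           ; sys_exit
    syscall                 ; exit status = 3
```

The value `9` sits at index 3, so the program exits with status `3`. If the target is missing, the -1 result shows up as exit status `255`, because only the low byte of the status is reported.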

---

### 🔄 Part 2: Bubble Sort (Ascending)

We'll now sort an array in **ascending order** using **Bubble Sort**.

---

### ✅ Code: Bubble Sort in Assembly

In [None]:
# Save as: bubble_sort.asm
!echo '
section .data
    array db 5, 1, 4, 2, 8
    len equ 5

section .text
    global _start

_start:
    mov rsi, 0            ; outer loop index

outer_loop:
    cmp rsi, len
    jge end_sort

    mov rdi, 0            ; inner loop index

inner_loop:
    mov al, [array + rdi]
    cmp rdi, len - 1
    jge next_outer

    mov bl, [array + rdi + 1]
    cmp al, bl
    jle skip_swap

    ; swap
    mov [array + rdi], bl
    mov [array + rdi + 1], al

skip_swap:
    inc rdi
    cmp rdi, len - 1
    jl inner_loop

next_outer:
    inc rsi
    jmp outer_loop

end_sort:
    ; Exit syscall
    mov eax, 60
    xor edi, edi
    syscall
' > bubble_sort.asm

!nasm -f elf64 bubble_sort.asm -o bubble_sort.o
!ld bubble_sort.o -o bubble_sort
!./bubble_sort

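# View result: the sorted bytes exist only in process memory,
# so stop at the exit syscall in gdb (assumed to be installed) and dump them
!gdb -q -batch -ex "catch syscall 60" -ex run -ex "x/5db &array" ./bubble_sort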

### 🧠 Explanation

* Classic Bubble Sort: compares adjacent elements and swaps them if out of order.
* Each pass of the outer loop "bubbles up" the largest remaining value to the end of the unsorted part.
* Sorted array: `[1, 2, 4, 5, 8]`
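
A common refinement is to shrink the compared range after each pass and stop as soon as a pass makes no swap. The sketch below is one way to write that, assuming 64-bit Linux; it treats the bytes as unsigned and uses `R8D` as a hypothetical "swapped" flag.

```nasm
; Sketch: bubble sort with a shrinking range and early exit.
section .data
    array db 5, 1, 4, 2, 8
    len   equ $ - array

section .text
    global _start

_start:
    mov   rdx, len - 1              ; number of comparisons in the next pass
.pass:
    test  rdx, rdx
    jz    .sorted                   ; nothing left to compare
    xor   r8d, r8d                  ; swapped = 0
    xor   rcx, rcx                  ; i = 0
.compare:
    mov   al, [array + rcx]
    mov   bl, [array + rcx + 1]
    cmp   al, bl
    jbe   .no_swap                  ; already in order (unsigned compare)
    mov   [array + rcx], bl         ; swap the pair
    mov   [array + rcx + 1], al
    mov   r8d, 1                    ; remember that something moved
.no_swap:
    inc   rcx
    cmp   rcx, rdx
    jb    .compare
    dec   rdx                       ; largest remaining value is now in place
    test  r8d, r8d
    jnz   .pass                     ; another pass only if a swap happened
.sorted:
    movzx edi, byte [array]         ; smallest element = 1
    mov   eax, 60                   ; sys_exit
    syscall                         ; exit status = 1
```

For this input the third pass makes no swap, so the loop ends early. The exit status `1` is the first element of the sorted array.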

---

### 🧾 Summary Table

| Routine       | Logic Used                  | Notes              |
| ------------- | --------------------------- | ------------------ |
| Linear Search | Compare target to each item | Stores index or -1 |
| Bubble Sort   | Nested loop, compare, swap  | Sorts in-place     |