## **1.1 What Operating Systems Do**

This section introduces the fundamental role of an operating system (OS) within a computer system. Think of the OS as the core manager that makes everything else possible.

#### **The Four Components of a Computer System**

A computer system can be broken down into four main parts:
1.  **Hardware:** The physical components you can touch. This includes the **CPU (Central Processing Unit)** for computation, **memory (RAM)** for temporary data storage, and **I/O (Input/Output) devices** like keyboards, mice, disks, and monitors. These are the basic computing resources.
2.  **Operating System:** The crucial software that controls the hardware and acts as a coordinator.
3.  **Application Programs:** The software you use to get things done, like word processors (Microsoft Word), web browsers (Google Chrome), compilers (GCC), and games. These define how the hardware's resources are used to solve users' computing problems.
4.  **Users:** The people who interact with the application programs.

**(Refer to Figure 1.1: Abstract view of the components of a computer system)**

The figure shows a layered view: the user interacts with the application programs, which rely on the operating system, which directly controls the computer hardware.

#### **The Operating System as a Government**

A helpful analogy is to think of the operating system as a **government**. A government itself doesn't build houses or grow food. Instead, it provides a structure, rules, and services (like roads and laws) that allow its citizens (the application programs) to work productively and coexist peacefully. Similarly, the OS doesn't do useful work directly for the user; it creates an environment where other programs can do useful work efficiently and without interfering with each other.

To fully understand the OS, we can look at it from two different perspectives.

---

### **1.1.1 User View**

How you see the operating system depends heavily on the type of computer you're using. The OS is designed differently for different user experiences.

*   **Desktop and Laptop Users:** If you're using a PC or Mac, you are the sole user of the machine. The operating system's primary goal here is **ease of use**. It wants to help you maximize your work or play. Performance (speed) and security (protecting your data) are also important, but **resource utilization**—how efficiently the hardware resources are shared—is not a top priority because you aren't sharing them with anyone else.

*   **Mobile Device Users:** For users of smartphones and tablets, the view is similar but the interface is different. Interaction is through touchscreens (taps, swipes) and often voice commands (like Siri or Google Assistant). These devices are almost always connected to networks.

*   **No Direct User View:** Many computers are designed to run without a user directly interacting with them. These are **embedded computers** found in devices like smart appliances, car engines, or industrial machines. Their operating systems are built to run a specific set of programs reliably and without user intervention. The "user view" might just be a few status lights.

---

### **1.1.2 System View**

From the computer's own perspective, the operating system is the program that has the most direct control over the hardware. We can define the OS through two key roles:

1.  **Resource Allocator:**
    A computer system is a collection of expensive resources: **CPU time, memory space, file storage space, and I/O devices**. Imagine many different application programs and users all needing these resources at the same time, sometimes in conflicting ways (e.g., two programs wanting to print at once). The operating system is the **manager** that decides how to allocate these resources to each program. Its goal is to make sure the entire system operates **efficiently** (no resources are wasted) and **fairly** (no single program hogs all the resources).

2.  **Control Program:**
    This view emphasizes the OS's role in **managing and controlling** the execution of programs. The operating system acts as a supervisor to prevent errors and stop programs from using the computer improperly. It is especially concerned with controlling **I/O devices**, which are complex and can easily be used incorrectly, potentially crashing the whole system. This control ensures stability and security.
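The resource-allocator role can be made concrete with a toy example. The sketch below shares CPU time among programs in fixed slices so that no single program hogs the processor. It is only an illustration: the function name `round_robin`, the program names, and the slice size are assumptions, and the text does not claim that any particular OS allocates resources this way.

```python
from collections import deque


def round_robin(demands, time_slice):
    """Hand out CPU time in fixed slices.

    demands: dict mapping a (hypothetical) program name to the CPU time it needs.
    Returns the list of (program, time_granted) slices in the order granted.
    """
    queue = deque(demands.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        granted = min(time_slice, remaining)
        schedule.append((name, granted))
        if remaining > granted:
            # Not finished yet: rejoin at the back so every program gets a turn.
            queue.append((name, remaining - granted))
    return schedule
```

Calling `round_robin({"editor": 3, "compiler": 5}, 2)` alternates between the two programs instead of letting the longer one run to completion first.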

### **1.1.3 Defining Operating Systems**

We've seen that an operating system can be a resource allocator, a control program, and something that looks different to various users. So, what exactly *is* an operating system? This section explains why it's hard to pin down a single definition and provides the definitions we'll use in this book.

#### **The Challenge of a Single Definition**

There is no single, perfect definition for an operating system. The main reason is the incredible **diversity of computers** themselves. OSes exist in everything from simple toasters and cars to powerful servers and spacecraft. This diversity stems from the rapid evolution of computers.

*   **Historical Context:** Computers evolved quickly from single-purpose machines (e.g., for code-breaking or census calculations) to general-purpose **mainframes**. It was this shift to multifunction systems that made operating systems necessary.
*   **Moore's Law:** The prediction that the number of transistors on an integrated circuit would double roughly every 18 months meant that computers became both more powerful and smaller. This explosion in capability and form factors resulted in a huge variety of operating systems, each designed for specific needs.

Since computers are used for so many different things, the OS designed to manage them must also vary. The fundamental goal, however, remains the same: to make the computer system **usable**. Bare hardware is difficult to program. The OS simplifies this by providing common, reusable functions (like handling I/O devices) so that application developers don't have to reinvent the wheel for every program.

#### **What's Included in an Operating System?**

There's also no universal agreement on what software components are officially part of the "operating system." Two common views are:

1.  **The "Everything Shipped" View:** The operating system is everything the software vendor includes in the box when you buy "the operating system." The problem with this definition is that it varies wildly. One OS might be tiny (less than 1 MB) and text-based, while another (like Windows or macOS) is huge (gigabytes) and based on a graphical interface.

2.  **The Kernel View (The Core Definition):** This is the definition we will usually follow. Here, the **operating system** is defined as the **kernel**—the one program that is always running on the computer. The kernel is the core that manages hardware resources and is essential for the system's operation.

However, when we talk about an OS in a broader sense, we often include other programs that come with it. We can categorize all software on a computer into three types:
*   **Kernel:** The core, always-running part of the OS.
*   **System Programs:** Programs that are associated with and support the operating system but are not part of the kernel. Examples include disk formatters, system monitors, and even command-line shells.
*   **Application Programs:** All programs not needed for the system's operation, like word processors, web browsers, and games.

#### **The Modern Complexity: The Case of Mobile OSes**

The question of "what is part of the OS" became a major legal issue in 1998, when the U.S. Department of Justice sued Microsoft (*United States v. Microsoft*), arguing that the company had bundled too much functionality (like a web browser) into its operating system and thereby prevented application vendors from competing.

Today, this bundling is common, especially in mobile operating systems like **Apple's iOS** and **Google's Android**. These systems include:
*   The core **kernel**.
*   **Middleware:** A set of software frameworks that provide standard services to application developers, such as for databases, multimedia, and graphics. This makes it easier to write apps.

#### **Summary Definition for This Book**

For our purposes, we will consider the **operating system** to include:
1.  The **kernel** (the essential core).
2.  **Middleware** frameworks (common in modern systems).
3.  **System programs** (tools for managing the system).

Most of this textbook will focus on the concepts and techniques involved in building the **kernel** of a general-purpose operating system, as that is where the fundamental challenges of resource management and control are solved.

---

### **Why Study Operating Systems?**

You might wonder why you need to study OSes if you don't plan to build one. The reason is crucial: **almost all code runs on top of an operating system.**

Understanding how the OS works is essential for:
*   **Efficient Programming:** Knowing how the OS manages memory, the CPU, and I/O devices helps you write programs that use these resources efficiently.
*   **Effective Problem-Solving:** When something goes wrong (e.g., a program runs slowly or crashes), understanding the OS helps you diagnose the problem.
*   **Secure Programming:** Many security vulnerabilities stem from misunderstandings about how the OS manages processes and memory. Understanding the OS is key to writing secure code.

In short, understanding the fundamentals of operating systems is not just for OS developers; it is **highly useful for any programmer or computer scientist** who writes applications that run on them.

## **1.2 Computer-System Organization**

This section dives into the hardware organization of a standard computer system. Understanding this hardware setup is crucial because the operating system is designed directly to manage and interact with these components. Think of this as learning the "playing field" on which the OS operates.

#### **The Basic Computer System Layout**

A modern general-purpose computer is built around several key components connected by a central highway:

1.  **CPUs (Central Processing Units):** These are the brains of the computer, responsible for executing program instructions. Modern systems often have multiple CPUs (or cores).
2.  **Device Controllers:** These are specialized processors that manage specific types of I/O devices. Each controller is in charge of a particular device type, like a disk drive, graphics adapter, or USB port. A controller can often manage more than one device (e.g., a USB controller can manage a keyboard, mouse, and printer through a hub).
3.  **Shared Memory (RAM):** This is the main memory that both the CPUs and the device controllers can access.
4.  **System Bus:** This is the communication pathway that connects the CPUs, memory, and device controllers, allowing them to exchange data and signals.

**(Refer to Figure 1.2: A typical PC computer system)**

The figure shows how the CPU and various device controllers (for disks, graphics, USB, etc.) are all connected to the shared memory via the common system bus.

**How Device Controllers Work:**
Each device controller has its own small, fast memory called a **local buffer** and a set of **special-purpose registers**. The controller's job is to handle the low-level details of communicating with its specific device. For example, a disk controller moves data between the physical disk drive and its local buffer.

**The Role of the Operating System: Device Drivers**
The operating system doesn't talk directly to the complex hardware of each device controller. Instead, for each controller type, the OS has a software component called a **device driver**. The driver understands the specific details and commands for its controller and provides a simple, standard interface to the rest of the OS. This means the OS core can just say "read a block of data" to any disk driver, without needing to know whether it's a SATA SSD or an NVMe drive.
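The idea of a uniform driver interface can be sketched in a few lines. The class and function names below (`BlockDriver`, `SataDriver`, `NvmeDriver`, `os_read`) are hypothetical, and the drivers return placeholder bytes rather than talking to hardware. The point is only that the OS-side code stays the same whichever driver is plugged in.

```python
from abc import ABC, abstractmethod


class BlockDriver(ABC):
    """Hypothetical uniform interface that the OS core programs against."""

    @abstractmethod
    def read_block(self, block_number: int) -> bytes:
        ...


class SataDriver(BlockDriver):
    def read_block(self, block_number: int) -> bytes:
        # A real driver would issue SATA-specific commands to its controller.
        return f"sata:{block_number}".encode()


class NvmeDriver(BlockDriver):
    def read_block(self, block_number: int) -> bytes:
        # A real driver would issue NVMe-specific commands to its controller.
        return f"nvme:{block_number}".encode()


def os_read(driver: BlockDriver, block_number: int) -> bytes:
    """OS-core code: identical no matter which driver it is given."""
    return driver.read_block(block_number)
```

`os_read` never inspects the driver's type, which is what lets new device types be supported by adding a driver rather than changing the OS core.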

**Parallel Execution and Memory Access:**
A key point is that the **CPU and the device controllers can all run at the same time (in parallel)**. They are independent units. They often need to access the main memory (RAM) simultaneously—the CPU to fetch instructions and data, and a controller to place data it has just read from a device. To prevent chaos, a **memory controller** is used to synchronize access to the shared memory, ensuring orderly reads and writes.

We will now explore three fundamental aspects of how this system operates, starting with the crucial concept of interrupts.

---

### **1.2.1 Interrupts**

Interrupts are the fundamental mechanism that allows the CPU to be notified when an event requires its attention. They are essential for efficient I/O handling and prevent the CPU from wasting time waiting for slow devices.

#### **A Typical I/O Operation with Interrupts**

Let's walk through the example of a program reading a character from the keyboard. This illustrates why interrupts are needed.

1.  **Initiating the I/O:** The program requests a read operation. The OS's **device driver** for the keyboard takes over. It communicates with the **keyboard controller** by loading commands and parameters into the controller's registers (e.g., "read a character").
2.  **Controller Handles the Details:** The keyboard controller then performs the actual work. It waits for a keypress, receives the electrical signal, determines which character was pressed, and transfers that character's data into its **local buffer**.
3.  **The Waiting Problem:** Once the driver has started the controller, what should the CPU do? Without interrupts, the CPU would have to sit in a loop constantly checking (or **polling**) the controller's status register to see if the data is ready. This is incredibly inefficient and wastes CPU cycles that could be used for other work.
4.  **The Interrupt Solution:** Instead of polling, the system uses an interrupt. When the keyboard controller has finished transferring the character to its buffer, it **signals the CPU that it has finished** by triggering an **interrupt**.
5.  **CPU Response:** The interrupt signal causes the CPU to immediately pause its current work, save its state (so it can resume later), and jump to a special function called an **interrupt handler** (or interrupt service routine). This handler is part of the device driver.
6.  **Completion:** The interrupt handler in the keyboard driver reads the data from the controller's buffer, delivers it to the requesting program, and then informs the OS scheduler that the program waiting for input can now continue. Finally, the CPU restores its saved state and resumes what it was doing before the interrupt occurred.

In summary, the interrupt is the controller's way of saying, "I'm done with the task you gave me." This mechanism allows the CPU to work on other tasks while slow I/O operations are in progress, leading to much higher system efficiency.
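The difference between polling and interrupts can be seen in a small simulation. This is a rough sketch under simplifying assumptions: time is counted in abstract "ticks", the device becomes ready after a fixed number of them, and both function names are made up for illustration.

```python
def run_with_polling(ticks_until_ready):
    """CPU busy-waits: every tick is spent reading the status register."""
    useful_work = 0
    status_checks = 0
    for tick in range(1, ticks_until_ready + 1):
        status_checks += 1                       # poll the controller
        if tick == ticks_until_ready:            # data finally ready
            break
    return useful_work, status_checks


def run_with_interrupt(ticks_until_ready):
    """CPU does other work each tick; the controller interrupts when done."""
    useful_work = 0
    received = []

    def interrupt_handler(char):
        received.append(char)                    # read the controller's buffer

    for tick in range(1, ticks_until_ready + 1):
        useful_work += 1                         # CPU runs some other task
        if tick == ticks_until_ready:
            interrupt_handler("a")               # controller raises the interrupt
    return useful_work, received
```

With polling, all of the waiting time turns into status checks and none into useful work. With the interrupt version, the same waiting time is spent on other tasks and the handler runs once.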

#### **1.2.1.1 Interrupts: A Detailed Overview**

This section provides a deeper look into the hardware mechanism of interrupts, which is a critical concept in computer architecture that enables efficient multitasking and I/O handling.

##### **The Basic Interrupt Mechanism**

Interrupts are signals sent by hardware devices to the CPU via the **system bus** (the main communication highway connecting the CPU, memory, and controllers). These signals can arrive at any time, and they are the primary way hardware communicates with the CPU.

The standard sequence of events when an interrupt occurs is as follows:

1.  **CPU Is Interrupted:** While the CPU is executing instructions from a program, a device sends it an interrupt signal.
2.  **Transfer to Fixed Location:** The interrupt signal causes the CPU to stop its current work immediately. It then transfers execution to a predetermined, fixed memory location.
3.  **Execute Interrupt Service Routine (ISR):** This fixed location contains the starting address of an **Interrupt Service Routine (ISR)**, which is a special function designed to handle the specific interrupt. The ISR, which is part of the device driver, executes.
4.  **Resume Computation:** Once the ISR has finished its task, the CPU resumes executing the original program from the exact point where it was interrupted, as if nothing had happened.

**(Refer to Figure 1.3: Interrupt timeline for a single program doing output)**

This figure shows a visual timeline: the program executes, issues an I/O request, and continues executing. Meanwhile, the I/O device is working. When the device finishes, it issues an interrupt. The CPU then switches to the I/O interrupt service routine to handle the completion before returning control to the user program.

##### **The Need for Speed: The Interrupt Vector**

Interrupts happen very frequently in a running system, so they must be handled as quickly as possible. Each type of device (keyboard, disk, timer) has its own unique ISR. The CPU needs a fast way to find the correct ISR for a given interrupt.

The solution is an **interrupt vector**. This is a table (an array) stored in a fixed location in **low memory** (the first hundred or so memory addresses). Each entry in this table is a pointer to the starting address of an ISR for a specific interrupt.

Here's how it works:
1.  When a device triggers an interrupt, it also sends a unique number, called an **interrupt request number (IRQ)**, along with the signal.
2.  The CPU uses this IRQ number as an **index** into the interrupt vector table.
3.  It looks up the address stored at that index in the table.
4.  It immediately jumps to that address to execute the correct ISR.

This method is extremely fast because the ISR is called indirectly through the table: a single indexed lookup followed by a jump, with no intermediate routine needed to work out which handler to run. This design is common across different operating systems like Windows and UNIX/Linux.
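The table lookup described above can be sketched as a list of functions indexed by interrupt number. The handler names and the numbering are invented for this example. A real interrupt vector holds memory addresses and is consulted by the hardware, not by Python code.

```python
def timer_isr():
    return "timer serviced"


def keyboard_isr():
    return "keyboard serviced"


def disk_isr():
    return "disk serviced"


# The "interrupt vector": position in the list = interrupt number.
# This numbering is made up for illustration.
INTERRUPT_VECTOR = [timer_isr, keyboard_isr, disk_isr]


def dispatch(interrupt_number):
    handler = INTERRUPT_VECTOR[interrupt_number]   # one indexed lookup
    return handler()                               # "jump" to the ISR
```

Dispatch cost does not grow with the number of devices, because finding the handler is a single index operation rather than a search.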

##### **Preserving State: The Importance of Saving and Restoring**

A crucial requirement for interrupts to work transparently is that the interrupted program must not know it was interrupted. From the program's perspective, its instructions execute sequentially without any breaks. To achieve this, the system must save the **state** of the interrupted computation.

The **state** includes all the information needed to resume execution exactly where it left off. This primarily means:
*   The **program counter (PC)**, which holds the address of the next instruction to execute.
*   The contents of all **CPU registers**.

This saving of state can happen in two ways:
1.  **Hardware-Automated Saving:** The CPU hardware itself may automatically save the program counter and possibly some key registers onto a special stack (the kernel stack) before jumping to the ISR.
2.  **Software Saving within the ISR:** The ISR code itself is responsible for saving the state of any registers it plans to use. Before doing any work, the ISR must save the current values of these registers (typically by pushing them onto the stack). Before returning, it must restore these saved values.

After the ISR finishes, the saved return address (the old program counter) is loaded back into the PC, and the restored register values are used. This allows the interrupted computation to continue seamlessly, completely unaware of the interrupt. This careful saving and restoring of state is what makes concurrent execution of multiple processes possible.
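Saving and restoring state can be illustrated with a toy CPU model. The `CPU` class, its two registers, and `noisy_isr` are all assumptions made for this sketch. For simplicity the model saves every register, whereas a real ISR usually saves only the ones it will modify.

```python
class CPU:
    def __init__(self):
        self.pc = 0
        self.registers = {"r0": 0, "r1": 0}
        self.stack = []

    def interrupt(self, isr):
        # Hardware-style step: save the return address.
        self.stack.append(self.pc)
        # Software-style step (normally done inside the ISR): save registers.
        self.stack.append(dict(self.registers))
        isr(self)                               # the ISR may overwrite anything
        self.registers = self.stack.pop()       # restore registers
        self.pc = self.stack.pop()              # "return from interrupt"


def noisy_isr(cpu):
    """A handler that deliberately clobbers the CPU state."""
    cpu.pc = 9000
    cpu.registers["r0"] = -1
```

Even though `noisy_isr` overwrites the program counter and a register, the interrupted computation sees the same values after `interrupt` returns as it did before.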

#### **1.2.1.2 Interrupt Implementation**

This section explains the precise hardware and software steps of an interrupt and then discusses the advanced features needed in a modern OS.

##### **The Step-by-Step Interrupt Mechanism**

The process is a precise sequence of cooperation between hardware and software:

1.  **Hardware Detection:** The CPU hardware has a special wire called the **interrupt-request line**. The CPU checks this wire for a signal **after executing every single instruction**.

2.  **Interrupt Signal:** When a device controller needs service, it **asserts a signal** (puts a voltage) on this line. We say the controller **raises** an interrupt.

3.  **CPU Response:** The CPU **catches** the interrupt. It immediately saves the address of the next instruction (the program counter) and then reads an **interrupt number** provided by the interrupting device.

4.  **Handler Dispatch:** The CPU uses this interrupt number as an index into the **interrupt vector** table stored in low memory. It retrieves the address of the corresponding **interrupt handler** (a routine within the device driver) and jumps to it.

5.  **Software Handling (The ISR):** The interrupt handler now executes. Its tasks are:
    *   **Save State:** It saves the current state (CPU register values) that it will modify, typically by pushing them onto a stack.
    *   **Process Interrupt:** It determines the cause (e.g., which device raised the interrupt) and performs the necessary processing (e.g., reading data from the device controller's buffer).
    *   **Restore State:** It restores the saved register values by popping them off the stack.
    *   **Return:** It executes a special **return from interrupt** instruction. This instruction restores the CPU to its pre-interrupt state, resuming the interrupted program.

6.  **Cycle Completion:** The interrupt is considered **cleared** once the handler has serviced the device.

**(Refer to Figure 1.4: Interrupt-driven I/O cycle)**

This figure provides a numbered flowchart of the entire process, showing the interaction between the CPU and the I/O controller, from initiating I/O to resuming the interrupted task.
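The numbered steps can be tied together in a short simulation of a CPU that checks for an interrupt after every instruction. Everything here is a simplification chosen for illustration: instructions are Python callables, and `pending_interrupts` is a stand-in for the interrupt-request line that says which interrupt number is raised during which instruction.

```python
def run(program, pending_interrupts, interrupt_vector):
    """Execute `program`, checking for an interrupt after each instruction.

    program: list of callables, one per instruction.
    pending_interrupts: dict mapping an instruction index to the interrupt
        number raised while that instruction was executing.
    interrupt_vector: list of handler callables indexed by interrupt number.
    Returns a trace of everything that ran, in order.
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc]())              # execute one instruction
        pc += 1
        raised = pending_interrupts.get(pc - 1)  # sense the request line
        if raised is not None:
            saved_pc = pc                        # save the return address
            trace.append(interrupt_vector[raised]())  # dispatch to the handler
            pc = saved_pc                        # return from interrupt
    return trace
```

For a three-instruction program with an interrupt raised during the second instruction, the handler's output appears in the trace between the second and third instructions, and the program then carries on where it left off.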

##### **Advanced Interrupt-Handling Features**

A basic interrupt system is sufficient for simple computers, but modern operating systems require more sophisticated control. Three key needs are:

1.  **Deferring Interrupts:** The OS must be able to postpone handling interrupts during critical tasks, such as when it is already processing a crucial OS data structure.
2.  **Efficient Dispatching:** The system needs a fast way to find the right handler for a device.
3.  **Prioritization:** Not all interrupts are equally urgent. A network packet arriving is time-sensitive, while a printer finishing a job is not. The OS needs a way to prioritize.

These features are provided by the CPU and an additional chip called an **interrupt controller**.

##### **Interrupt Lines: Maskable vs. Nonmaskable**

Most CPUs have two distinct interrupt-request lines to address the need for deferment:

*   **Nonmaskable Interrupt (NMI):** This interrupt **cannot be disabled** (or "masked") by the CPU. It is reserved for critical, unrecoverable hardware errors like a memory parity error or a power failure warning. The OS *must* respond to it immediately.
*   **Maskable Interrupt:** This is the standard interrupt line used by device controllers (disks, network cards, etc.). The CPU can temporarily **disable** (mask) these interrupts before executing a critical sequence of instructions that must complete without being interrupted. This ensures the OS can perform essential tasks atomically.
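The difference between the two lines can be modeled with a single enable flag. The class `InterruptLines` and its method names are hypothetical. The sketch assumes that a masked interrupt is held until interrupts are re-enabled, which is a simplification of how real hardware keeps a request pending.

```python
class InterruptLines:
    def __init__(self):
        self.maskable_enabled = True
        self.deferred = []      # maskable interrupts waiting for re-enable
        self.handled = []       # interrupts serviced so far, in order

    def raise_maskable(self, source):
        if self.maskable_enabled:
            self.handled.append(source)
        else:
            self.deferred.append(source)     # postponed until enable()

    def raise_nmi(self, source):
        self.handled.append(source)          # cannot be masked

    def disable(self):
        self.maskable_enabled = False        # entering a critical section

    def enable(self):
        self.maskable_enabled = True
        # Deliver anything that arrived during the critical section.
        self.handled.extend(self.deferred)
        self.deferred.clear()
```

While interrupts are disabled, a disk interrupt waits in `deferred`, but a nonmaskable parity error is still handled at once.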

##### **Interrupt Chaining: Handling Many Devices**

The interrupt vector table has a limited number of entries. But a modern PC has more devices than available vector numbers. A common solution is **interrupt chaining**.

In this scheme, each entry in the interrupt vector table does not point to a single handler. Instead, it points to the **head of a linked list** of interrupt handlers. When an interrupt occurs, the handlers on the corresponding list are called one by one, and each checks whether its device was the source of the interrupt. The search stops at the first handler that recognizes its device and services the request.

This is a compromise: it avoids the need for a massive interrupt table while also avoiding the inefficiency of having one giant handler check every possible device.
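Interrupt chaining can be sketched as follows. A plain Python list stands in for the linked list of handlers, and a dictionary stands in for the controllers' status registers. The names `make_handler` and `dispatch_chain` are assumptions for this example.

```python
def make_handler(device_name, device_status):
    """Build a handler that services `device_name` only if it raised the interrupt.

    device_status: dict standing in for each controller's status register.
    """
    def handler():
        if device_status.get(device_name):
            device_status[device_name] = False   # service the device, clear it
            return True
        return False

    handler.device = device_name
    return handler


def dispatch_chain(chain):
    """Call the handlers in order until one services the request."""
    for handler in chain:
        if handler():
            return handler.device
    return None    # no device on this chain claimed the interrupt
```

Each vector entry would hold one such chain, so several devices can share a single interrupt number at the cost of a short search.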

**(Refer to Figure 1.5: Intel processor event-vector table)**

This table shows a real-world example. Entries 0-31 are for nonmaskable events like divide-by-zero errors and page faults. Entries 32-255 are for maskable, device-generated interrupts.

##### **Interrupt Priority Levels**

The interrupt mechanism also supports **priority levels**. This allows the CPU to defer low-priority interrupts without disabling all interrupts. More importantly, it enables **interrupt preemption**: a high-priority interrupt can itself interrupt the execution of a low-priority interrupt handler. This ensures the most urgent work is done first.
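Priority levels and preemption can be modeled with a small class. This is a sketch under stated assumptions rather than a description of any real interrupt controller: higher numbers mean higher priority, level 0 means ordinary program code is running, and the `during` argument is a simulation device listing interrupts that arrive while a handler is executing.

```python
class PriorityInterrupts:
    def __init__(self):
        self.current_level = 0     # 0 = ordinary program code is running
        self.pending = []          # deferred (name, level) interrupts
        self.log = []

    def raise_interrupt(self, name, level, during=()):
        if level <= self.current_level:
            # Not urgent enough to preempt what is running: defer it.
            self.pending.append((name, level))
            self.log.append(f"defer {name}")
            return
        previous = self.current_level
        self.current_level = level
        self.log.append(f"start {name}")
        for other_name, other_level in during:
            self.raise_interrupt(other_name, other_level)   # may preempt
        self.log.append(f"end {name}")
        self.current_level = previous
        # Deliver deferred interrupts that now outrank the current level,
        # most urgent first.
        ready = sorted(
            (p for p in self.pending if p[1] > previous),
            key=lambda p: -p[1],
        )
        self.pending = [p for p in self.pending if p[1] <= previous]
        for ready_name, ready_level in ready:
            self.raise_interrupt(ready_name, ready_level)
```

If a level-2 disk handler is running when a level-5 network interrupt and a level-1 printer interrupt arrive, the network handler preempts the disk handler, while the printer interrupt waits until the disk handler has finished.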

##### **Summary**

Interrupts are the fundamental mechanism for handling asynchronous events in a computer system, from I/O completion to hardware errors. Modern systems use a sophisticated architecture involving maskable interrupts, an interrupt vector for fast dispatching, chaining for flexibility, and priority levels to ensure that time-critical tasks receive immediate attention. Efficient interrupt handling is absolutely essential for good system performance.

### **1.2.2 Storage Structure**

#### **The Central Role of Memory**
Think of the CPU as the brain of the computer. This brain can only think about things that are right in front of it. In computer terms, the CPU can only execute instructions that are already loaded into the **main memory** (also called **RAM - Random Access Memory**). This is a fast, rewritable, but **volatile** type of memory. "Volatile" means it's like short-term memory; it forgets everything as soon as the power turns off. Main memory is typically built using **DRAM (Dynamic Random-Access Memory)** technology.

But if the computer just turned on and RAM is empty, how does it know what to do first? This is where other, more permanent types of memory come in.

#### **Nonvolatile Memory: The Bootstrap and Firmware**
The very first program that runs when you turn on a computer is the **bootstrap program**. Since the RAM is empty and volatile, this program can't be stored there. Instead, it's stored in **nonvolatile memory**, which retains its contents even without power.

One common type is **EEPROM (Electrically Erasable Programmable Read-Only Memory)**, a form of **firmware**, that is, storage that is nonvolatile and written to only infrequently. Think of EEPROM as a permanent notepad. You can write on it and change it if you really need to, but you don't do it often because writing is slow. It's well suited to essential, rarely changed information like the bootstrap program, a device's serial number, or hardware information (the iPhone, for example, stores serial numbers and hardware information in EEPROM).



#### **Storage Definitions and Notation**

Before we go further, let's define the basic building blocks of storage:

*   **Bit:** The smallest unit, a single binary digit (0 or 1).
*   **Byte:** A group of 8 bits. This is the smallest unit that a computer typically moves around.
*   **Word:** The natural unit of data for a specific computer architecture. It's the size of the processor's registers. A word is made up of one or more bytes. For example, a 64-bit computer has a word size of 64 bits, or 8 bytes. The CPU likes to work with full words whenever possible.

We measure storage in bytes. Because computers use binary math, the units are based on powers of 2 (2^10 = 1,024), not powers of 10 (1,000). However, manufacturers often round these numbers off and advertise a megabyte as 1 million bytes and a gigabyte as 1 billion bytes.

*   **Kilobyte (KB):** 1,024 bytes
*   **Megabyte (MB):** 1,024^2 bytes (1,048,576 bytes)
*   **Gigabyte (GB):** 1,024^3 bytes
*   **Terabyte (TB):** 1,024^4 bytes
*   **Petabyte (PB):** 1,024^5 bytes

**Important Exception:** Networking speeds are measured in **bits per second** (e.g., Mbps), because data is sent one bit at a time over a network.
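A little arithmetic shows why the binary and decimal conventions matter, and why bits and bytes must not be mixed up. The helper names below are made up for this sketch, and it assumes that 1 Mbps means 10^6 bits per second.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def advertised_to_binary_gb(decimal_gb):
    """Convert a capacity quoted in decimal gigabytes (10**9 bytes each)
    into gigabytes as defined above (1,024**3 bytes each)."""
    return decimal_gb * 10**9 / GB


def transfer_seconds(size_bytes, megabits_per_second):
    """Time to send `size_bytes` over a link rated in megabits per second.

    Assumes 1 Mbps = 10**6 bits per second; note the factor of 8
    needed to convert bytes to bits.
    """
    return size_bytes * 8 / (megabits_per_second * 10**6)
```

A drive advertised as 500 GB in decimal units holds about 465.7 GB in binary units, and a 100 MB file takes roughly 8.4 seconds on a 100 Mbps link, not 1 second.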

#### **How the CPU Interacts with Memory**
All memory can be thought of as a massive array of bytes, where each byte has a unique **address**. The CPU interacts with memory using two fundamental instructions:

1.  **Load:** Moves a byte or word from main memory *into* a CPU register.
2.  **Store:** Moves the content of a CPU register *to* a byte or word in main memory.

This happens in the **instruction-execution cycle** (von Neumann architecture):
1.  **Fetch:** The CPU fetches the next instruction from the memory address held in the **program counter** and loads it into the **instruction register**.
2.  **Decode:** The CPU decodes the instruction.
3.  **Execute:** If the instruction needs data from memory, it performs a **load** to get the operands into registers. The CPU executes the instruction.
4.  **Store Result:** The result may be **stored** back into memory.

From the memory's perspective, it just sees a stream of addresses coming from the CPU. It doesn't know or care if an address is for an instruction or a piece of data.
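The cycle can be seen end to end in a toy machine. The instruction format (tuples such as `("LOAD", "r0", 5)`), the three registers, and the `HALT` instruction are all invented for this sketch. As in the von Neumann model, instructions and data live in the same `memory` list.

```python
def run_machine(memory):
    """Run a toy von Neumann machine until it reaches HALT.

    memory: list holding both instructions (tuples) and data (ints).
    Returns the memory after execution.
    """
    registers = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0
    while True:
        instruction = memory[pc]      # fetch into the "instruction register"
        pc += 1
        op = instruction[0]           # decode
        if op == "LOAD":              # execute: memory -> register
            _, reg, addr = instruction
            registers[reg] = memory[addr]
        elif op == "ADD":
            _, dst, a, b = instruction
            registers[dst] = registers[a] + registers[b]
        elif op == "STORE":           # store result: register -> memory
            _, reg, addr = instruction
            memory[addr] = registers[reg]
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
```

A program that loads the values at addresses 5 and 6, adds them, and stores the sum at address 7 touches memory only through `LOAD` and `STORE`. The arithmetic itself happens entirely in registers.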

---

#### **The Need for Secondary Storage**
Ideally, we'd keep everything in fast main memory (RAM). But this is impossible for two reasons:
1.  **Size:** Main memory is too small to hold all of our programs and data permanently.
2.  **Volatility:** RAM loses all data when the power is lost.

To solve this, computers use **secondary storage**. This is nonvolatile storage that can hold massive amounts of data permanently. The most common types are **Hard-Disk Drives (HDDs)** and **Nonvolatile Memory (NVM)** devices (like SSDs). Programs live on secondary storage until they are loaded into RAM to run. The trade-off? Secondary storage is **much slower** than main memory. Managing this speed difference is a critical job of the operating system (discussed in Chapter 11).

---

#### **The Memory Hierarchy**

We don't just have two types of storage. We have a whole pyramid of storage types, called the **memory hierarchy**. (Refer to **Figure 1.6** in your text).

This hierarchy is organized based on a trade-off between **speed, size, and cost**:
*   **Rule of Thumb:** The **smaller** and **faster** the memory, the more **expensive** it is per byte and the **closer** it needs to be to the CPU.
*   **Volatility:** The hierarchy is also split between volatile and nonvolatile storage.

**Let's walk through the hierarchy from top (fastest, smallest) to bottom (slowest, largest):**

| Level | Example | Volatile? | Typical Use |
| :--- | :--- | :--- | :--- |
| **Primary Storage** | **Registers** (inside CPU) | Volatile | Holding the data the CPU is working on right now. |
| | **Cache** (L1, L2, L3) | Volatile | A buffer between the super-fast CPU and slower RAM. |
| | **Main Memory** (RAM) | Volatile | Holding programs and data currently in use. |
| **Secondary Storage** | **NVM Devices** (e.g., SSDs, Flash) | Nonvolatile | Permanent storage for programs and data; faster than HDDs. |
| | **HDDs** (Hard Drives) | Nonvolatile | Permanent storage for programs and data; high capacity, low cost. |
| **Tertiary Storage** | **Magnetic Tapes, Optical Discs** | Nonvolatile | Archival and backup; very slow, very high capacity. |

**Key Technology Notes:**
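*   The top levels of the hierarchy (registers, cache, main memory, and NVM devices) are built with **semiconductor memory**, that is, electronic circuits with no moving parts. Main memory uses DRAM, while caches are typically built from faster SRAM.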
*   **NVM devices** (like the flash memory in your phone or an SSD) are becoming extremely common and are faster than traditional hard drives (HDDs).

---

#### **Operating System Terminology for Storage**

To keep things clear throughout the book, the text will use specific terms:

*   **"Memory":** This will always refer to **volatile storage** (RAM). If it means something else (like a register), it will be specified.
*   **"NVS" (NonVolatile Storage):** This is the general term for storage that persists without power. It's divided into two types:
    1.  **Mechanical Storage:** Devices with moving parts. Examples: **HDDs, optical disks, magnetic tapes**. Generally, these are larger, slower, and cheaper per byte.
    2.  **Electrical Storage / NVM (NonVolatile Memory):** Solid-state devices with no moving parts. Examples: **Flash memory, SSDs (Solid-State Drives)**. Generally, these are faster, smaller, and more expensive per byte.

The goal of a good storage system design is to balance all these factors: use fast memory where you need speed, and cheap, spacious storage where you need capacity. **Caches** are a crucial tool for this, acting as high-speed buffers to smooth over the large speed differences between levels of the hierarchy.

### **1.2.3 I/O Structure**

#### **The Importance of I/O Management**
A huge part of an operating system's job is managing Input/Output (I/O). This is critical for both system reliability and performance. I/O is complex because there are so many different types of devices (keyboards, disks, network cards, etc.), each with their own speeds and ways of communicating.

#### **The Problem with Simple Interrupt-Driven I/O**
Recall from Section 1.2.1 the basic concept of interrupt-driven I/O: a device sends an interrupt signal to the CPU when it needs attention. This works well for slow devices that handle small amounts of data, like a keyboard where you type one character at a time.

However, this method creates a lot of **overhead** for bulk data transfers, like reading a large file from a hard drive. Imagine if for every single byte of that file, the hard drive had to interrupt the CPU. The CPU would spend almost all its time just handling interrupts instead of doing useful work, slowing the entire system to a crawl.

#### **The Solution: Direct Memory Access (DMA)**
To solve this problem, computers use a smarter component called a **DMA (Direct Memory Access) controller**.

Here’s how DMA works for a large data transfer, like reading a file from a storage device:

1.  **Setup:** The CPU does some initial work. It tells the DMA controller the following:
    *   The memory address where the data should be written (or read from).
    *   The number of bytes to transfer.
    *   The direction of the transfer (read from device or write to device).

2.  **Transfer:** Once set up, the **DMA controller takes over**. It manages the entire data transfer directly between the I/O device and the main memory.
    *   **The CPU is free** during this time to execute other tasks.

3.  **Completion:** After the entire block of data has been transferred, the DMA controller sends a **single interrupt** to the CPU to say, "The operation is complete."

**The key advantage:** Instead of one interrupt per byte (which is inefficient), we have **one interrupt per large block of data**. This dramatically reduces the CPU's overhead and allows for much faster I/O operations.
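The three DMA steps can be sketched with a toy model. `ToyDMAController` and its methods are hypothetical names invented for this illustration; a real controller is programmed through hardware registers, and only the read-from-device direction is modeled here.

```python
class ToyDMAController:
    """Hypothetical model of a DMA controller; not a real driver interface."""

    def __init__(self):
        self.address = 0
        self.count = 0
        self.interrupts_raised = 0

    def setup(self, address, count):
        # Step 1 (Setup): the CPU programs the target address and byte count.
        self.address = address
        self.count = count

    def read_from_device(self, device, memory):
        # Step 2 (Transfer): the controller copies the whole block into memory.
        end = self.address + self.count
        memory[self.address:end] = device[:self.count]
        # Step 3 (Completion): one interrupt for the entire block.
        self.interrupts_raised += 1


device_block = bytes(range(256)) * 16        # 4096 bytes waiting on the device
memory = bytearray(8192)                     # simulated main memory

dma = ToyDMAController()
dma.setup(address=1024, count=len(device_block))
dma.read_from_device(device_block, memory)

per_byte_interrupts = len(device_block)      # one interrupt per byte without DMA
print(per_byte_interrupts, "interrupts without DMA,", dma.interrupts_raised, "with DMA")
```

In this sketch a 4096-byte block costs a single completion interrupt, where a byte-at-a-time scheme would cost one interrupt for each of the 4096 bytes.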

#### **System Architecture: Buses vs. Switches**
Most standard computers use a **bus architecture**, where a single shared communication pathway (the bus) connects the CPU, memory, and I/O devices. Devices take turns using the bus. This can create a bottleneck if multiple devices need to communicate at once.

**High-end systems** (like powerful servers) often use a **switch architecture**. Think of it like a network switch: the switch allows multiple components to have direct, concurrent conversations with each other. For example, a disk drive can transfer data to memory at the same time as a network card is receiving data. In this kind of system, **DMA becomes even more effective** because data paths don't have to wait for a shared bus.

#### **Putting It All Together: How a Modern Computer System Works**
(Refer to **Figure 1.7** in your text for a visual representation of these interactions.)

The figure shows the interplay between all components:
*   The **CPU** follows its **thread of execution**, going through the **instruction execution cycle** (fetch, decode, execute).
*   It accesses **instructions and data** from **memory** (often via a **cache** for speed).
*   When an **I/O request** is made, the **DMA controller** handles the bulk **data movement** between the **device** and **memory**.
*   Once the transfer is complete, an **interrupt** is sent to the CPU, which then resumes its work.

This coordinated effort, managed by the operating system, allows the computer to perform efficiently, keeping the CPU busy while I/O operations happen in the background.

## **1.3 Computer-System Architecture**

#### **Introduction: Categorizing Systems by Processors**
In the previous section, we looked at the general components of a computer system (CPU, memory, I/O). Now, we'll see how these components can be organized in different ways. The primary way to categorize computer systems is by the number of general-purpose processors they use.

---

### **1.3.1 Single-Processor Systems**

#### **The Traditional Single-Core CPU**
Traditionally, most computers were **single-processor systems**. This means they had one **general-purpose processor** (one CPU) containing a single **processing core**. The **core** is the part of the processor that actually executes instructions and has its own set of local registers for storing data. This single main CPU core is what runs the operating system and application processes by executing a general-purpose instruction set.

#### **The Role of Special-Purpose Processors**
Even in these so-called "single-processor" systems, there are often many other processors! These are **special-purpose processors** designed for specific tasks. They are not general-purpose CPUs. Examples include:
*   **Disk controller microprocessors**
*   **Keyboard controller microprocessors**
*   **Graphics controller (GPU) processors**

These special-purpose processors have two key characteristics:
1.  They run a very **limited, specialized instruction set**.
2.  They **do not run user processes** like a web browser or a word processor.

#### **How the OS Manages Special-Purpose Processors**
The operating system's relationship with these processors varies:

1.  **Managed by the OS:** In many cases, the OS directly manages them. The OS sends them commands and monitors their status.
    *   **Example - Disk Controller:** The main CPU tells the disk controller microprocessor to read a certain block of data. The disk controller then handles the complex, low-level task of moving the read head, reading the data, and managing its own queue of requests. This **offloads work** from the main CPU, relieving it of the overhead of disk scheduling and freeing it for other tasks.

2.  **Autonomous Hardware Components:** In other cases, these processors are low-level components that work completely independently. The operating system does not communicate with them directly.
    *   **Example - Keyboard Controller:** A microprocessor in your keyboard constantly scans the key matrix. When you press a key, it autonomously converts the physical keypress into a scan code and sends it to the main CPU. The OS doesn't tell it how to do this; it just receives the result.

#### **The Key Definition of a Single-Processor System**
The critical point is this: **The presence of these special-purpose microprocessors does NOT make a system a multiprocessor system.**

**The definition is strict:** If a system has only **one general-purpose CPU** with a **single processing core**, it is a **single-processor system**, regardless of how many other special-purpose chips it has.

Because of this definition, **very few modern computers are truly single-processor systems.** Almost all contemporary devices, from smartphones to laptops, use CPUs with multiple cores, which we will discuss next. This section describes an older, but foundational, architectural model.

### **1.3.2 Multiprocessor Systems**

#### **Introduction: The Dominance of Multiprocessing**
On virtually all modern computers, from smartphones to servers, **multiprocessor systems** are the standard. These systems have multiple processing units that work together. The main goal is to increase **throughput**—getting more work done in less time. While adding a second processor doesn't double the speed (due to coordination overhead), it significantly boosts performance.

#### **Symmetric Multiprocessing (SMP)**
The most common type of multiprocessor system uses **Symmetric Multiprocessing (SMP)**. In an SMP system:
*   There are two or more identical, **peer CPUs**.
*   Each CPU is independent and has its own set of **registers** and a **private cache** (often called an L1 cache).
*   All CPUs **share the same physical memory** and I/O devices, connected via a common **system bus**.

(Refer to **Figure 1.8** in your text for a visual of this architecture).

**How it works:** In SMP, every processor can perform any task, whether it's running the operating system kernel or a user application. This allows **N processes to run truly in parallel if there are N CPUs**.

**The Challenge:** Because the CPUs are separate, the system can become unbalanced. One CPU might be idle while another is overloaded. To prevent this, the operating system must use shared data structures to distribute the workload dynamically among all processors. This requires careful programming to avoid conflicts, a topic covered in Chapters 5 and 6.
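One way to picture dynamic load balancing is a single shared run queue that every CPU pulls from. The sketch below is an assumption-laden illustration: threads stand in for CPUs, `run_on_shared_queue` is a made-up name, and real kernels use more elaborate schemes.

```python
import queue
import threading


def run_on_shared_queue(tasks, num_cpus):
    """Each worker thread stands in for one CPU pulling from a shared run queue."""
    run_queue = queue.Queue()
    for task in tasks:
        run_queue.put(task)

    # Each simulated CPU records what it ran in its own list.
    completed = {cpu: [] for cpu in range(num_cpus)}

    def cpu_loop(cpu_id):
        while True:
            try:
                # queue.Queue locks internally, so two CPUs never take the same task.
                task = run_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to run: this CPU goes idle
            completed[cpu_id].append(task)

    workers = [threading.Thread(target=cpu_loop, args=(cpu,)) for cpu in range(num_cpus)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return completed


result = run_on_shared_queue(tasks=list(range(20)), num_cpus=4)
for cpu, ran in result.items():
    print(f"CPU {cpu} ran {len(ran)} task(s)")
```

Because the queue is shared, no CPU sits idle while tasks remain; the internal lock is the kind of coordination that the later chapters on synchronization cover.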

#### **The Evolution: Multicore Systems**
The definition of a multiprocessor has evolved. Instead of having multiple separate processor chips, we now mostly have **multicore** systems, where multiple computing cores reside on a single physical chip.

*   **Core:** The basic computation unit of a CPU.
*   **Multicore:** A single processor chip that contains multiple cores.

(Refer to **Figure 1.9** in your text for a dual-core design).

**Advantages of Multicore:**
1.  **Efficiency:** Communication between cores on the same chip is much faster than communication between separate chips.
2.  **Power Savings:** One chip with multiple cores uses significantly less power than multiple single-core chips. This is crucial for mobile devices and laptops.

**Typical Multicore Design:** Each core has its own private **L1 cache**. They often share a larger **L2 cache** on the same chip. This combines the speed of private caches with the capacity of a shared cache.

From the operating system's perspective, **a multicore processor with N cores looks exactly like N standard CPUs**. This places a major responsibility on the OS (and application programmers) to efficiently schedule tasks across all available cores. All modern operating systems (Windows, macOS, Linux, Android, iOS) support SMP on multicore systems.
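As a quick check on a real machine, Python's standard library can report how many CPUs the operating system exposes. Note that this counts logical CPUs, which on some hardware includes multiple hardware threads per core, so the number may exceed the physical core count.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it cannot be determined.
logical_cpus = os.cpu_count()
print(f"The OS reports {logical_cpus} logical CPU(s)")
```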

---

### **Definitions of Computer System Components**

To avoid confusion, here are the precise definitions the text will use:

*   **CPU (Central Processing Unit):** The hardware that executes instructions. We use this as a general term for a single computational unit.
*   **Processor:** A physical chip that contains one or more CPUs.
*   **Core:** The basic computation unit of the CPU. A single computing engine.
*   **Multicore:** Including multiple computing cores on the same CPU chip.
*   **Multiprocessor:** A system that includes multiple processors.

**Important Note:** Since almost all systems are now multicore, we will use "CPU" loosely to mean a computational unit. We will use "core" and "multicore" when specifically referring to the architecture of a single chip.

---

#### **Scaling Beyond the Bus: Non-Uniform Memory Access (NUMA)**
There's a limit to how well SMP systems can scale. As you add more and more CPUs, they all compete for access to the shared memory over the single system bus, which becomes a **bottleneck**.

**NUMA (Non-Uniform Memory Access)** is an advanced architecture designed to solve this scaling problem.

*   **How it works:** In a NUMA system, each CPU (or group of CPUs) has its own **local memory**. The CPUs are connected by a high-speed **system interconnect**.
*   **The "Non-Uniform" Part:** The key characteristic is that access time to memory is **not uniform**.
    *   Accessing **local memory** (the memory attached to your own CPU) is very fast.
    *   Accessing **remote memory** (memory attached to another CPU) is slower because it has to travel across the interconnect.

(Refer to **Figure 1.10** in your text for a visual of the NUMA architecture).

**Advantage:** NUMA systems can scale to a much larger number of processors because there is less contention for a single memory bus.
**Disadvantage:** Performance can suffer if a process running on one CPU needs to frequently access data in another CPU's memory. The OS must be smart about **CPU scheduling** and **memory management** (discussed in Section 5.5.2 and Section 10.5.4) to keep a process and its memory on the same node as much as possible. NUMA is very popular in high-end servers.
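A NUMA-aware placement decision can be sketched as follows. This is a simplified illustration, not how any particular kernel works: `pick_cpu`, the CPU numbering, and the two-node layout are all assumptions made for the example.

```python
def pick_cpu(memory_node, idle_cpus, cpu_to_node):
    """Prefer an idle CPU on the node that holds the process's memory."""
    for cpu in idle_cpus:
        if cpu_to_node[cpu] == memory_node:
            return cpu  # local memory access: fast
    # No local CPU is idle: fall back to a remote one (slower memory access).
    return idle_cpus[0] if idle_cpus else None


# Hypothetical machine: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}

print(pick_cpu(memory_node=1, idle_cpus=[0, 3], cpu_to_node=cpu_to_node))
print(pick_cpu(memory_node=1, idle_cpus=[0, 1], cpu_to_node=cpu_to_node))
```

The first call finds an idle CPU on the process's own node; the second has to settle for a remote CPU, which is the case the OS tries to keep rare.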

#### **Blade Servers**
Finally, **blade servers** represent another multiprocessor design. In a blade server chassis, multiple independent **processor boards** (blades) are stacked together. The key difference is that:
*   Each blade **boots independently** and runs its **own instance of an operating system**.
*   Some blades can themselves be multiprocessor systems.

This blurs the line between a single computer and a cluster of computers. Essentially, a blade server is a collection of multiple independent multiprocessor systems sharing a single chassis for power and networking.

### **1.3.3 Clustered Systems**

#### **What is a Clustered System?**
A **clustered system** is another form of a multiprocessor system, but it's structured differently from the tightly-coupled SMP and NUMA systems we just discussed. Instead of multiple CPUs sharing memory inside one computer, a cluster is made up of **two or more individual systems** (called **nodes**) that are linked together. Each node is typically a complete, independent computer, often a multicore system itself. These systems are considered **loosely coupled**.

The exact definition of a cluster can be fuzzy, but the generally accepted one is that clustered computers **share storage** and are **closely linked via a network** like a **LAN (Local-Area Network)** or a very fast interconnect like **InfiniBand**.

(Refer to **Figure 1.11** in your text for the general structure of a clustered system).

#### **Primary Goal: High-Availability**
The most common reason for building a cluster is to provide **high-availability** service. This means the service provided by the cluster will continue to operate even if one or more of its nodes fails.

**How does it achieve this? Through redundancy.**
*   Special **cluster software** runs on each node.
*   The nodes constantly **monitor each other** over the network (a "heartbeat").
*   If a node fails, a monitoring node can take over its work: it takes ownership of the failed node's storage and restarts its applications.
*   From a user's perspective, this results in only a brief interruption of service.

The ability to continue providing service in proportion to the surviving hardware is called **graceful degradation**. Some highly robust clusters go further and are **fault tolerant**, meaning they can survive the failure of any single component without any interruption in service. Fault tolerance requires sophisticated mechanisms to detect, diagnose, and correct failures automatically.
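The heartbeat-and-takeover idea can be sketched in a few lines. `Node` and `monitor_and_failover` are hypothetical names for this illustration; real cluster software exchanges heartbeats over the network and must also take ownership of the failed node's storage.

```python
class Node:
    """Hypothetical cluster node that runs a list of applications."""

    def __init__(self, name, apps):
        self.name = name
        self.apps = list(apps)
        self.alive = True

    def heartbeat(self):
        # A live node answers its heartbeat; a failed one does not.
        return self.alive


def monitor_and_failover(monitor, peer):
    """If the peer misses its heartbeat, the monitor restarts the peer's apps."""
    if peer.heartbeat():
        return False
    monitor.apps.extend(peer.apps)  # take over the failed node's work
    peer.apps.clear()
    return True


primary = Node("primary", ["database"])
standby = Node("standby", [])

primary.alive = False                      # simulate a node failure
took_over = monitor_and_failover(standby, primary)
print(took_over, standby.apps)
```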

#### **Types of Clustering: Asymmetric vs. Symmetric**

1.  **Asymmetric Clustering (Active-Passive):**
    *   One machine (the **active server**) runs the applications.
    *   The other machine is in **hot-standby mode**. It does nothing but monitor the active server.
    *   If the active server fails, the hot-standby host becomes active.
    *   This is simple but inefficient because the standby hardware is idle until a failure occurs.

2.  **Symmetric Clustering (Active-Active):**
    *   Two or more hosts are **both running applications** and monitoring each other.
    *   This is more efficient because it uses all of the available hardware.
    *   It requires that there are multiple applications to run, so if one node fails, its workload can be distributed across the remaining active nodes.
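For the symmetric case, here is a rough sketch of spreading a failed node's workload across the survivors. The round-robin policy, the function name `redistribute`, and the node and application names are assumptions for illustration only.

```python
def redistribute(workloads, failed):
    """Return new per-node workloads after `failed` drops out of the cluster."""
    survivors = sorted(node for node in workloads if node != failed)
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the workload")
    result = {node: list(workloads[node]) for node in survivors}
    # Hand out the failed node's applications round-robin.
    for index, app in enumerate(workloads[failed]):
        result[survivors[index % len(survivors)]].append(app)
    return result


cluster = {"n1": ["web"], "n2": ["db"], "n3": ["mail", "cache"]}
print(redistribute(cluster, failed="n3"))
```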

#### **Secondary Goal: High-Performance Computing**
Because a cluster is a group of computers connected by a network, it can also be used to tackle massive computational problems. The combined power of all the nodes can far exceed that of a single-processor or even a large SMP system.

To use a cluster this way, the application must be specially designed using a technique called **parallelization**. This means the program is split into separate components that can run simultaneously on different nodes in the cluster. Each node works on its part of the problem, and the results are combined at the end for a final solution.
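The split, compute, and combine pattern looks roughly like this. In the sketch, worker threads stand in for cluster nodes (an assumption made to keep the example self-contained); on a real cluster each chunk would be shipped to a separate machine. The names `split` and `parallel_sum` are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into at most `parts` contiguous chunks of near-equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, nodes=4):
    """Sum `data` by giving each simulated node one chunk, then combining."""
    if not data:
        return 0
    chunks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_sums = list(pool.map(sum, chunks))  # each node sums its part
    return sum(partial_sums)                        # combine the partial results


print(parallel_sum(list(range(1, 101))))
```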

#### **Parallel Clusters and Shared Data**
A common type of high-performance cluster is a **parallel cluster**, where multiple hosts need to access the **same data** on shared storage. This is complex because most standard operating systems aren't designed for multiple computers to simultaneously read from and write to the same disk.

To make this work, we need:
*   **Special Software:** Special versions of applications and operating systems are required. For example, **Oracle Real Application Clusters (RAC)** is a version of Oracle's database designed to run on parallel clusters.
*   **Distributed Lock Manager (DLM):** This is a crucial software component that controls access to the shared data. It provides **access control and locking** to prevent conflicts when multiple nodes try to modify the same data at the same time. The DLM ensures data remains consistent.
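The core idea of a lock manager, granting a resource to one node at a time, can be sketched as below. This is a single-process stand-in with hypothetical names (`ToyLockManager`, `acquire`, `release`); a real DLM coordinates over the network and supports richer lock modes.

```python
import threading


class ToyLockManager:
    """Hypothetical, single-process stand-in for a distributed lock manager."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # resource name -> node currently holding the lock

    def acquire(self, resource, node):
        """Grant the lock if the resource is free; otherwise refuse."""
        with self._guard:
            if resource in self._owners:
                return False
            self._owners[resource] = node
            return True

    def release(self, resource, node):
        """Only the current owner may release a lock."""
        with self._guard:
            if self._owners.get(resource) == node:
                del self._owners[resource]
                return True
            return False


dlm = ToyLockManager()
print(dlm.acquire("table:orders", "node-a"))  # node-a gets the lock
print(dlm.acquire("table:orders", "node-b"))  # node-b must wait
print(dlm.release("table:orders", "node-a"))
print(dlm.acquire("table:orders", "node-b"))  # now node-b can proceed
```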

#### **The Role of Storage-Area Networks (SANs)**
Cluster technology is rapidly evolving, enabled in large part by **Storage-Area Networks (SANs)** (covered in Section 11.7.4). A SAN is a dedicated, high-speed network that provides multiple servers with access to a shared pool of storage devices.

**How SANs help clustering:**
*   The applications and data reside on the SAN, not on any individual node.
*   Any node in the cluster can run an application because they all have equal access to the data on the SAN.
*   This makes failover seamless. If a host fails, any other host can immediately take over, as it already has access to the necessary data and applications.
*   This allows for very large-scale database clusters where dozens of hosts can work on the same database, boosting both performance and reliability.

---

### **PC Motherboard (Sidebar)**

The text includes a sidebar on a PC motherboard to connect these abstract concepts to physical hardware.

*   A desktop PC motherboard with a processor socket, DRAM slots, and I/O connectors is a fully functioning computer once assembled.
*   Even low-cost CPUs today contain **multiple cores**.
*   Some motherboards have **multiple processor sockets**, creating an SMP system.
*   More advanced systems with multiple system boards can create **NUMA systems**.

This highlights that the architectures discussed (single-core, multicore, SMP, NUMA) are all built upon the same fundamental physical components.

## **1.4 Operating-System Operations**

#### **Introduction: The OS as an Execution Environment**
We've covered the computer's hardware; now let's talk about the software that brings it to life: the operating system. The OS creates the environment where programs can run. While different OSes are built in different ways, they share common fundamental operations.

#### **Booting the System: The Bootstrap Program**
For a computer to start running, for instance when it is powered on or rebooted, it needs an initial program to run. This is the **bootstrap program** (on PCs, this role is played by the BIOS or UEFI firmware).

*   **Where is it stored?** It's stored in **firmware** (like EEPROM) on the computer's hardware, so it's available immediately when power is applied.
*   **What does it do?** It's a simple program that performs the initial "wake-up" sequence:
    1.  It initializes all hardware components: CPU registers, device controllers, and memory.
    2.  Its most important job is to **locate the operating system kernel** on a storage device (like a hard drive or SSD) and **load it into memory**.
*   Once the kernel is loaded into memory and the CPU starts executing it, the OS takes over.

#### **System Startup: The Kernel and Daemons**
After the bootstrap program loads the OS kernel, the kernel begins providing services. However, not all system services run inside the kernel itself.

*   **System Daemons:** Many services are started by **system programs** that are loaded at boot time. These programs run in the background as long as the system is on and are called **daemons** (in UNIX/Linux) or services (in Windows).
*   **Example - systemd:** On modern Linux systems, the first program started after the kernel is typically `systemd`. Its job is to start all the other necessary daemons, like a network manager, a job scheduler, and a logging service.
*   Once all these daemons are running, the system is considered **fully booted** and waits for events.

#### **Event-Driven Execution: The Role of Interrupts**
When the system is fully booted, what does the OS do? If there's nothing to run, no I/O to handle, and no user input, the OS simply waits. It is an **event-driven system**. Almost all events are signaled by an **interrupt**.

We learned about **hardware interrupts** in Section 1.2.1 (e.g., a disk controller signaling that a data transfer is complete). Now we introduce a second, crucial type:

**Trap (or Exception): A software-generated interrupt.**
There are two main causes of a trap:
1.  **An Error:** For example, if a program tries to divide by zero or access memory it doesn't have permission for, the CPU generates a trap. The OS then handles this error, often by terminating the offending program.
2.  **A Service Request:** This is the deliberate way a user program asks the OS to do something on its behalf. A program makes this request by executing a special operation called a **system call** (e.g., to read a file, send data over a network, or create a new process). Executing a system call triggers a trap, which switches the CPU from user mode to kernel mode, allowing the OS to safely execute the requested service.
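From a program's point of view, a system call usually looks like an ordinary function call. The short example below uses Python's `os` module, whose functions are thin wrappers around operating-system services; each call here ends up requesting work from the kernel.

```python
import os

# Ask the kernel to create a pipe: a pair of connected file descriptors.
read_fd, write_fd = os.pipe()

# Ask the kernel to write bytes into the pipe; it reports how many it accepted.
written = os.write(write_fd, b"ping")

# Ask the kernel to read those bytes back out.
data = os.read(read_fd, 4)

# Ask the kernel to release both descriptors.
os.close(read_fd)
os.close(write_fd)

print(written, data)
```

The program never touches the underlying kernel data structures itself; every step is a request that the OS carries out on its behalf.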

---

### **Hadoop (Sidebar)**

**Hadoop** is a practical, real-world example of software designed for the clustered systems discussed in Section 1.3.3. It's an open-source framework for processing massive data sets ("big data") across a cluster of inexpensive computers.

**Key Characteristics:**
*   **Designed for Clusters:** It scales from one machine to thousands.
*   **Manages Parallelism:** It assigns tasks to nodes and manages communication between them to process data in parallel.
*   **Provides Reliability:** It automatically detects and handles node failures, making the entire cluster highly reliable.

**Hadoop is organized into three core components:**
1.  **Distributed File System (HDFS):** Manages files and data spread across all the nodes in the cluster.
2.  **YARN ("Yet Another Resource Negotiator"):** Acts as the cluster's operating system. It manages resources (CPU, memory) and schedules tasks on the nodes.
3.  **MapReduce:** A programming model that allows problems to be broken down into parts that can be processed in parallel on different nodes. The "Map" step processes the data, and the "Reduce" step combines the results into a final answer.

Hadoop typically runs on Linux, and applications can be written in various languages, with Java being a popular choice because Hadoop provides Java libraries that support MapReduce.
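To make the MapReduce model concrete, here is a word-count sketch in plain Python. It does not use Hadoop or its API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for the example, and everything runs in one process, whereas Hadoop would run the map and reduce steps on different nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    for chunk in chunks:            # on a cluster, each chunk goes to its own node
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


print(word_count(["the cat sat", "the dog sat down"]))
```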

### **1.4.1 Multiprogramming and Multitasking**

#### **The Need for Running Multiple Programs**
A fundamental goal of operating systems is to maximize the use of the CPU. A single program can rarely keep the CPU or I/O devices busy 100% of the time. Furthermore, users want the ability to run more than one program at once. **Multiprogramming** solves this by ensuring the CPU always has a program to execute, thereby increasing CPU utilization and keeping the user productive.

In a multiprogrammed system, a program that is loaded into memory and executing is called a **process**.

#### **How Multiprogramming Works**
The core idea is straightforward:

1.  The operating system keeps **several processes in memory** at the same time. (Refer to **Figure 1.12** for a visual of the memory layout).
2.  The OS begins executing one process.
3.  Eventually, that process will have to wait for something, like an I/O operation (e.g., reading a file from a slow disk).
4.  Instead of letting the CPU sit idle, the OS **switches to and executes another process** that is ready to run.
5.  When *that* process needs to wait, the CPU switches to yet another process.
6.  This continues, and when the first process finishes waiting, it gets the CPU back.

As long as there is at least one process that can execute, the CPU is never idle.

**Analogy:** Think of a lawyer working on multiple cases. While one case is waiting for a court date or for documents to be prepared, the lawyer works on another case. If the lawyer has enough clients, she is never idle.
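A small simulation shows how multiprogramming raises CPU utilization. The model is deliberately simplified and its parameters are assumptions: every process alternates a 2-tick CPU burst with an 8-tick I/O wait, I/O for different processes overlaps freely, and switching is free. `cpu_utilization` is a made-up name.

```python
from collections import deque


def cpu_utilization(num_processes, cpu_burst=2, io_wait=8, rounds=5):
    """Fraction of ticks the CPU is busy while processes alternate CPU and I/O."""
    ready = deque(range(num_processes))
    rounds_left = [rounds] * num_processes
    waiting = {}                       # pid -> tick at which its I/O completes
    running, burst_left = None, 0
    tick = busy = 0
    while ready or waiting or running is not None:
        # Processes whose I/O has finished become ready again.
        for pid in [p for p, done in waiting.items() if done <= tick]:
            del waiting[pid]
            ready.append(pid)
        # If the CPU is free, switch to a ready process instead of idling.
        if running is None and ready:
            running, burst_left = ready.popleft(), cpu_burst
        if running is not None:
            busy += 1
            burst_left -= 1
            if burst_left == 0:
                rounds_left[running] -= 1
                if rounds_left[running] > 0:
                    waiting[running] = tick + 1 + io_wait  # start the I/O wait
                running = None
        tick += 1
    return busy / tick


for n in (1, 2, 5):
    print(f"{n} process(es): CPU busy {cpu_utilization(n):.0%} of the time")
```

With one process the CPU spends most of its time waiting on I/O; with enough processes in memory there is always one ready to run, and the CPU stays busy.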

#### **From Multiprogramming to Multitasking**
**Multitasking** (or **time-sharing**) is a direct extension of multiprogramming. The key difference is the **frequency of switching** and the **primary goal**:

*   In multiprogramming, the goal is to maximize **CPU utilization** (a system-oriented goal).
*   In multitasking, the goal is to provide a **fast response time** to the user (a user-oriented goal).

**Why is frequent switching needed?** Interactive processes (like a text editor or a web browser) spend a lot of time waiting for user input. User input is incredibly slow from a computer's perspective (e.g., typing at 7 characters per second). Instead of letting the CPU idle during this wait, a multitasking OS **rapidly switches** the CPU to another process. This happens so quickly that it gives the illusion that all programs are running simultaneously, providing a responsive user experience.
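Rapid switching can be sketched as a round-robin loop that gives each process a short slice in turn. This is an illustrative sketch, not a description of any real scheduler: the process names, the amounts of work, and the 2-tick slice are assumptions.

```python
from collections import deque


def round_robin(work, quantum):
    """Return the sequence of (process, ticks_run) slices under round-robin."""
    ready = deque(work.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            ready.append((name, remaining - ran))  # back of the queue
    return timeline


# Hypothetical workload: ticks of CPU time each process still needs.
timeline = round_robin({"editor": 3, "browser": 5, "compiler": 2}, quantum=2)
for name, ran in timeline:
    print(f"{name} runs for {ran} tick(s)")
```

Because every process gets the CPU again after only a few short slices, each one appears to make steady progress, which is what produces the illusion of simultaneous execution.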

#### **The OS Mechanisms Required for Multiprogramming/Multitasking**
Running multiple processes concurrently is complex and requires the OS to provide several key features:

1.  **Memory Management (Chapters 9 & 10):** Having multiple processes in memory at once requires the OS to allocate memory to each process, protect each process's memory from others, and manage the movement of processes between memory and disk. This leads to the concept of **virtual memory**, which allows a program to run even if it's not entirely in physical RAM, making the system more flexible.

2.  **CPU Scheduling (Chapter 5):** When more than one process is ready to run, the OS must decide which one to run next. Making this decision is **CPU scheduling**, and the kernel component that does it is the **scheduler**.

3.  **Protection (Chapter 17):** The OS must ensure that processes cannot interfere with each other or with the OS itself. This involves protecting resources like memory, the CPU, and I/O devices.

4.  **Process Synchronization and Communication (Chapters 6 & 7):** When processes need to interact (e.g., to share data), the OS must provide mechanisms to allow them to coordinate their activities safely to avoid corrupting data.

5.  **Deadlock Handling (Chapter 8):** The OS must manage the system to prevent or resolve **deadlocks**, a situation where two or more processes are stuck forever, each waiting for a resource held by the other.

6.  **File Systems and Storage Management (Chapters 11, 13, 14, 15):** Programs need to store and retrieve data permanently. The OS provides a **file system** on secondary storage (like hard drives) to manage this data in a structured way.

In summary, the simple goal of "running multiple programs at once" forces the operating system to become a sophisticated manager of all the system's resources. The rest of the textbook essentially details how the OS implements each of these required mechanisms.

### **1.4.2 Dual-Mode and Multimode Operation**

#### **The Need for Protection**
The operating system and all user applications share the same hardware. A critical job of the OS is to ensure that a faulty or malicious user program cannot disrupt the operation of other programs or the OS itself. To achieve this, the system must have a clear way to distinguish between code that is part of the trusted operating system and code that belongs to a regular user application. This is accomplished through hardware-supported **modes of execution**.

#### **Dual-Mode Operation**
The fundamental design is **dual-mode operation**, which provides two separate modes:
1.  **Kernel Mode (Privileged Mode):** Also known as supervisor mode or system mode. The OS runs in this mode. Code executing in kernel mode has complete, unrestricted access to all hardware and can execute every instruction in the CPU's instruction set.
2.  **User Mode:** User applications run in this mode. Code executing in user mode has restricted access. It cannot directly perform operations that affect the system's overall state.

A hardware bit, called the **mode bit**, is used to indicate the current mode. It is typically set to **0 for kernel mode** and **1 for user mode**.

#### **How the Transition Works**
(Refer to **Figure 1.13** for a visual diagram of this process).

1.  **Boot Time:** The hardware starts in kernel mode. The OS loads and then starts the first user application, switching the mode bit to user mode.
2.  **User Process Execution:** The user process runs in user mode.
3.  **System Call Request:** When a user process needs an OS service (like reading a file), it makes a **system call**. The call executes a special instruction (a generic trap instruction or, on some CPUs, a dedicated `syscall` instruction) that triggers a **trap** (a software interrupt).
4.  **Switch to Kernel Mode:** The hardware handles the trap by:
    *   Switching the mode bit from 1 (user) to 0 (kernel).
    *   Saving the current state of the user process.
    *   Transferring control to a predefined **interrupt vector** location, which points to the appropriate OS service routine (the **system-call handler**).
5.  **Execute in Kernel Mode:** The OS code now executes in kernel mode, with full privileges, to perform the requested service (e.g., accessing the disk controller).
6.  **Return to User Mode:** Once the service is complete, the OS executes a special instruction that:
    *   Switches the mode bit back to 1 (user).
    *   Restores the saved state of the user process.
    *   Returns control to the instruction immediately after the system call in the user program.

This transition also happens on **hardware interrupts** (like a timer interrupt or I/O completion) and other **traps** (like division by zero errors).

#### **Privileged Instructions: The Enforcer of Protection**
The mechanism that enforces this protection is the concept of **privileged instructions**. These are machine instructions that have the potential to cause harm (e.g., directly controlling I/O devices, managing memory, halting the CPU) and are designed to **execute only in kernel mode**.

*   If a user program attempts to execute a privileged instruction, the hardware does not execute it. Instead, it generates a trap, handing control to the OS. The OS then typically terminates the offending program for violating the rules.
*   The instruction to switch to kernel mode is itself a privileged instruction.
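The mode bit and privileged instructions can be modeled with a toy CPU. Everything here is a hypothetical simulation (`ToyCPU`, `PrivilegeViolation`, and the instruction names are invented); real enforcement happens in hardware, not in software checks like these.

```python
KERNEL, USER = 0, 1   # mode-bit values, as described in the text


class PrivilegeViolation(Exception):
    """Stands in for the trap raised by a privileged instruction in user mode."""


class ToyCPU:
    def __init__(self):
        self.mode_bit = KERNEL   # the hardware starts in kernel mode
        self.log = []

    def return_to_user(self):
        self.mode_bit = USER

    def privileged(self, name):
        # The "hardware" refuses privileged instructions outside kernel mode.
        if self.mode_bit != KERNEL:
            raise PrivilegeViolation(name)
        self.log.append(name)

    def system_call(self, service):
        saved_mode = self.mode_bit
        self.mode_bit = KERNEL            # trap: switch to kernel mode
        try:
            self.privileged(service)      # the kernel performs the service
        finally:
            self.mode_bit = saved_mode    # switch back before returning


cpu = ToyCPU()
cpu.return_to_user()                      # the OS hands control to a user program

try:
    cpu.privileged("halt")                # direct attempt from user mode
except PrivilegeViolation as trap:
    print("trap: privileged instruction", trap, "attempted in user mode")

cpu.system_call("read_disk")              # the sanctioned route
print(cpu.log, "mode bit is now", cpu.mode_bit)
```

The user program cannot run the privileged operation directly, but it can get the same work done by asking the kernel through a system call, after which the mode bit is back at 1.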

#### **Beyond Two Modes: Multimode Operation**
While dual-mode is the foundation, some systems use more than two modes for finer-grained control:
*   **Intel x86 CPUs** have four privilege **rings** (0 to 3). Ring 0 is kernel mode, and Ring 3 is user mode. Rings 1 and 2 are rarely used in practice.
*   **ARMv8 CPUs** have seven different modes.
*   **Virtualization Support:** CPUs that support hardware virtualization (Section 18.1) often have a separate mode for the **Virtual Machine Manager (VMM)**. The VMM runs with more privilege than a user process but less than the full kernel, allowing it to create and manage virtual machines safely.

#### **The Complete Life Cycle of Instruction Execution**
We can now describe the complete cycle:
1.  Control starts with the OS in **kernel mode**.
2.  Control is passed to a user application, and the mode is set to **user mode**.
3.  Control is returned to the OS via an **interrupt**, **trap**, or **system call**, switching the mode back to **kernel mode**.
4.  The OS handles the event and may return control to the same user process or a different one, switching back to **user mode**.

#### **Handling Errors**
This hardware protection also catches program errors. If a user program tries to execute an illegal instruction or access memory outside its allocated space, the hardware traps to the OS. The OS then handles this error, which usually means terminating the program abnormally. It may produce an error message and a **memory dump** (a file containing the program's memory state at the time of the crash) to aid in debugging.

In summary, dual-mode operation is the fundamental hardware mechanism that allows the operating system to maintain ultimate control over the computer, protecting itself and user programs from each other.

### **1.4.3 Timer**

#### **The Problem: Maintaining CPU Control**

Dual-mode operation protects the OS from a program that tries to execute a bad instruction, but what protects the system from a program that simply never gives up the CPU? A user program could accidentally get stuck in an infinite loop or deliberately refuse to call system services, effectively freezing the machine and preventing the OS or any other program from running.

To prevent a single program from monopolizing the CPU, the operating system uses a **timer**.

#### **How the Timer Works**

A timer is a hardware device that interrupts the CPU after a specified period of time. The operating system uses it like an alarm clock to regain control.

There are two general types:
*   **Fixed-rate timer:** Interrupts the CPU at a constant frequency (e.g., every 1/60th of a second).
*   **Variable timer:** Can be set to interrupt after a specific, variable time interval.

Most modern systems use a variable timer, which is typically implemented with two components:
1.  A **fixed-rate clock** that ticks at a constant frequency (e.g., every 1 millisecond).
2.  A **counter** register that the operating system can set.

Here's the process:
1.  The OS decides how much time to give a program (its **time quantum** or **time slice**), say 100 milliseconds.
2.  Before switching to user mode and starting the program, the OS loads the value `100` into the **counter**.
3.  The program runs in user mode.
4.  With every tick of the clock (every 1 ms), the hardware automatically **decrements the counter** by one. The program is unaware this is happening.
5.  When the counter reaches `0`, the timer hardware triggers an **interrupt**.
6.  This interrupt forces the CPU to switch from user mode to kernel mode, just like a system call, and transfers control back to the operating system's scheduler.

The OS can then decide what to do next: it might give the same program another time slice, or it might switch to a different program that is waiting to run. This mechanism is the foundation of **CPU scheduling** and **multitasking**.
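
The six steps above can be sketched as a simulation. It assumes a 1 ms tick and a simple round-robin policy; the function names and the job lengths are illustrative, and a real scheduler is considerably more involved.

```python
from collections import deque


def run_with_timer(work_remaining_ms, quantum_ms):
    """Simulate one time slice: the counter is decremented on every 1 ms tick.

    Returns (ms of work left, True if the timer interrupt fired).
    """
    counter = quantum_ms          # step 2: the OS loads the counter
    while work_remaining_ms > 0:
        work_remaining_ms -= 1    # step 3: the program runs for one tick
        counter -= 1              # step 4: the hardware decrements the counter
        if counter == 0:          # step 5: the timer interrupt fires
            return work_remaining_ms, True
    return work_remaining_ms, False   # the program finished before the interrupt


def round_robin(jobs, quantum_ms=100):
    """jobs maps a name to the CPU time (ms) it needs; returns the order in which slices run."""
    ready = deque(jobs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        remaining, _ = run_with_timer(remaining, quantum_ms)
        trace.append(name)
        if remaining > 0:
            ready.append((name, remaining))   # step 6: the scheduler picks the next program
    return trace


print(round_robin({"A": 250, "B": 100, "C": 120}))
# ['A', 'B', 'C', 'A', 'C', 'A']
```

No job runs for more than 100 ms at a stretch, however much work it has left.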

**Example:** If the system has a 10-bit counter and a 1-millisecond clock, the counter can take 1,024 distinct values. This lets the OS set the timer to interrupt after any interval from 1 ms to 1,024 ms, in steps of 1 ms.

#### **Timer Implementation in Linux**

The text provides a real-world example with the Linux kernel:
*   **`HZ`:** This is a kernel configuration value that defines the frequency of the timer interrupts. For example, if `HZ = 250`, the timer interrupts the CPU 250 times per second, meaning an interrupt occurs every 4 milliseconds. This value can vary based on the system.
*   **`jiffies`:** This is a kernel variable that counts the total number of timer interrupts that have occurred since the system was booted. If `HZ` is 250, then `jiffies` increases by 250 every second. It's like a system-wide tick counter.
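
The relationship between `HZ` and `jiffies` is simple arithmetic. The sketch below is plain Python, not kernel code, and the helper names are made up for illustration.

```python
HZ = 250  # example value from the text; real kernels may be configured differently


def tick_period_ms(hz):
    """Milliseconds between timer interrupts."""
    return 1000 / hz


def jiffies_to_seconds(jiffies, hz):
    """Elapsed time since boot represented by a jiffies count."""
    return jiffies / hz


print(tick_period_ms(HZ))              # 4.0
print(jiffies_to_seconds(15000, HZ))   # 60.0
```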

#### **Privileged Instruction**

Crucially, the instructions that load a new value into the timer counter are **privileged instructions**. A user program cannot set or modify the timer. If it could, a malicious program could set the timer to a huge value (like 100 years) and effectively disable the OS's ability to regain control, taking over the machine. Therefore, only the OS, running in kernel mode, is allowed to set the timer.

## **1.5 Resource Management**

An operating system is fundamentally a **resource manager**. Its primary job is to manage the various hardware and software resources of a computer system efficiently and fairly. The key resources it manages are:
*   The **CPU** (or CPUs)
*   **Memory** (RAM)
*   **File-storage space** (on disks)
*   **I/O devices** (like keyboards, mice, and network cards)

This section introduces how the OS manages these resources, starting with the most central concept: the process.

---

### **1.5.1 Process Management**

#### **What is a Process?**

A program sitting on your disk (like `chrome.exe` or `gcc`) is just a file—a passive set of instructions. A **process** is a **program in execution**. It is an active entity.

*   **Analogy:** A program is a recipe (a set of instructions). A process is the activity of a chef actually following that recipe, using ingredients (resources) like a bowl, a mixer, and an oven.
*   **Examples:** When you double-click a web browser icon, you start a process. Every running application on your PC or phone is one or more processes.

A process is more than just the program code (which is called the **text section**). It also includes the current state of the activity: the values in the CPU registers, the program counter (which points to the next instruction to execute), and the contents of the stack and memory.

#### **Process vs. Program**

This is a critical distinction:

| Program | Process |
| :--- | :--- |
| Passive entity (a file on disk) | Active entity (an executing instance) |
| Static | Dynamic - its state changes as it runs |
| Stored once (a single file) | Can have multiple instances (e.g., three separate processes for three separate browser windows) |

Even if two processes are running the exact same program (like two users both using the same text editor), they are considered **separate execution sequences**. Each has its own memory space, its own program counter, and its own set of resources.
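
One way to see this on a real system is to start the same program twice and compare the process IDs. The sketch assumes a Python interpreter is available at `sys.executable`; the one-line `PROGRAM` is only a stand-in for any executable.

```python
import subprocess
import sys

# The "program" is the same passive text for both runs.
PROGRAM = "import os; print(os.getpid())"


def start():
    """Create a new process that executes PROGRAM."""
    return subprocess.Popen(
        [sys.executable, "-c", PROGRAM],
        stdout=subprocess.PIPE,
        text=True,
    )


first, second = start(), start()      # both processes exist at the same time
first_out, _ = first.communicate()    # wait for each one and collect its output
second_out, _ = second.communicate()
reported = (int(first_out), int(second_out))

print(reported)  # two different process IDs for one program
```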

#### **Process Resources and Lifecycle**

A process needs resources to accomplish its task:
*   **CPU time:** To execute its instructions.
*   **Memory:** To hold its code and data.
*   **Files:** To read or write data.
*   **I/O devices:** To interact with the world.

These resources are either allocated to the process by the OS when it is created or granted to it while it runs. The process may also be given input data. For example, a process running a web browser is given a URL as input. When the process finishes its task and terminates, the OS reclaims all the resources so they can be used by other processes.

#### **Single-Threaded vs. Multithreaded Processes**

*   A **single-threaded process** has a **single program counter**. This counter keeps track of the next instruction to execute. The execution is strictly sequential—one instruction after another.
*   A **multithreaded process** has **multiple program counters**, one for each "thread of execution" within the process. Threads allow a single process to perform multiple tasks concurrently (e.g., a web browser downloading a file in one thread while displaying a page in another). We will cover threads in detail in Chapter 4.
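
A minimal sketch of a multithreaded process, using Python's standard `threading` module. The thread names `download` and `render` echo the browser example above and are purely illustrative. Each thread follows its own path through `count_to`, while the `results` dictionary is shared by every thread in the process.

```python
import threading

results = {}


def count_to(name, limit):
    """Each thread runs this function with its own stack and its own position in the code."""
    total = 0
    for i in range(1, limit + 1):
        total += i
    results[name] = total   # the dict lives in memory shared by all threads of the process


threads = [
    threading.Thread(target=count_to, args=("download", 1000)),
    threading.Thread(target=count_to, args=("render", 10)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()                # wait for both threads to finish

print(results["download"], results["render"])  # 500500 55
```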

#### **The Operating System's Responsibilities in Process Management**

The OS is responsible for all activities related to process management. Its key duties are:

1.  **Process Creation and Deletion:** The OS must be able to create new processes (for both users and the OS itself) and clean up after them when they terminate.
2.  **Process Scheduling:** With many processes wanting to run but only one or a few CPUs, the OS must decide which process runs on which CPU and for how long. This is the job of the **CPU scheduler**.
3.  **Process Suspension and Resumption:** The OS must be able to pause (suspend) a process's execution and later continue (resume) it from the same point. This happens constantly due to timer interrupts and I/O requests.
4.  **Process Synchronization:** When processes need to communicate or share resources, the OS must provide mechanisms to ensure they do so in an orderly and correct way, preventing chaos (like two processes trying to print to the same printer at the same time).
5.  **Process Communication:** The OS provides methods for processes to exchange information with each other.

These concepts form the core of modern operating systems and are explored in depth in Chapters 3 through 7.

### **1.5.2 Memory Management**

#### **The Central Role of Main Memory**

As you learned in computer architecture, the **main memory** (RAM) is the central storage unit that the CPU interacts with directly. Think of it as the CPU's immediate workspace.

*   It's a large array of bytes, where each byte has a unique **address**.
*   It is a volatile, fast-access repository for data shared by the CPU and I/O devices.
*   Following the **von Neumann architecture**, the CPU must fetch both instructions and data from main memory to execute a program. During the **instruction-fetch cycle**, it reads the next instruction from memory. During the **data-fetch cycle**, it reads or writes the data the instruction needs.

A crucial point: the CPU can only directly access data that is in main memory. If data is on a disk, it **must** first be transferred to RAM before the CPU can process it. Similarly, for a program to run, its instructions must be loaded into memory.

#### **The Basic Memory Management Lifecycle**

For a single program to run, the process is straightforward:
1.  **Mapping and Loading:** The program's instructions and data must be mapped from their relative locations in the program file to specific, absolute addresses in physical memory and then loaded into those locations.
2.  **Execution:** The CPU fetches and executes instructions, accessing memory using these absolute addresses.
3.  **Termination:** When the program ends, the memory space it occupied is marked as available for the next program.

#### **The Need for Advanced Memory Management**

The simple "one program at a time" model is inefficient. To improve CPU utilization and allow for multitasking (running several programs concurrently), the operating system must keep **multiple programs in memory at the same time**.

This creates several challenges that the OS must solve through **memory management**:
*   How do we keep track of which parts of memory are free and which are in use?
*   How do we allocate memory to a new process without interfering with existing processes?
*   When memory is full, how do we decide which program (or part of a program) to remove to make space for a new one?
*   How do we protect one process's memory from being accessed or overwritten by another process?

There are many different memory-management schemes (e.g., paging, segmentation), and their effectiveness depends on the situation and the hardware support available. The choice of algorithm is a major design decision for an OS.

#### **The Operating System's Responsibilities in Memory Management**

The OS is responsible for all activities related to managing the computer's memory. Its key duties are:

1.  **Tracking Memory Usage:** The OS must maintain a record of every memory location, knowing whether it is allocated or free, and if allocated, which process is using it. This is often done with data structures like **bitmaps** or **linked lists**.
2.  **Allocating and Deallocating Memory:** When a process requests memory (e.g., when it starts or needs to store more data), the OS must find a suitable block of free memory, allocate it to the process, and update its records. When a process releases memory (e.g., when it terminates), the OS must mark that memory as free again.
3.  **Deciding What to Swap:** When the system needs more memory than is physically available, the OS must decide which processes (or parts of processes) should be temporarily moved **out** of memory to a secondary storage area (like a **swap space** on disk) to free up space. Later, it must decide when to move them back **into** memory. This process is crucial for running large programs or many programs simultaneously.

These techniques, which ensure efficient, fair, and safe use of memory, are discussed in detail in Chapters 9 and 10.
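
The tracking and allocation duties can be sketched with one flag per frame, in the spirit of the bitmap approach mentioned above. This is a simplified model that assumes fixed-size frames and contiguous first-fit allocation; the class and method names are hypothetical, and real kernels use more elaborate schemes.

```python
class BitmapAllocator:
    """Tracks fixed-size memory frames with one flag per frame (True = allocated)."""

    def __init__(self, frame_count):
        self.used = [False] * frame_count
        self.owner = [None] * frame_count

    def allocate(self, pid, frames_needed):
        """First-fit search for a contiguous run of free frames; returns the start index or None."""
        run_start, run_length = 0, 0
        for index, in_use in enumerate(self.used):
            if in_use:
                run_start, run_length = index + 1, 0   # the run is broken; restart after this frame
                continue
            run_length += 1
            if run_length == frames_needed:
                for frame in range(run_start, run_start + frames_needed):
                    self.used[frame] = True
                    self.owner[frame] = pid
                return run_start
        return None

    def release(self, pid):
        """Free every frame owned by pid, for example when the process terminates."""
        for frame, owner in enumerate(self.owner):
            if owner == pid:
                self.used[frame] = False
                self.owner[frame] = None
```

With eight frames, allocating 3 frames to process `"A"` and 2 to `"B"` leaves only three contiguous free frames, so a request for 4 returns `None` until enough adjacent frames are released.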

### **1.5.3 File-System Management**

#### **The Goal: A Logical View of Storage**

Dealing directly with the physical properties of storage devices (like magnetic platters on a hard drive or memory cells in an SSD) would be incredibly complex for users and programmers. The operating system solves this by providing a uniform, logical view of storage. It hides the messy hardware details behind a simple, abstract concept: the **file**.

A **file** is a logical storage unit—a collection of related information created by a user or program. The operating system is responsible for mapping these abstract files onto the actual physical media.

#### **What is a File?**

A file is an extremely general and powerful concept. It can contain anything:
*   **Programs:** Source code (e.g., `my_program.c`) or executable object code (e.g., `my_program.exe`).
*   **Data:** Numeric data, text documents, spreadsheets, images, music files (like MP3s), videos, etc.

Files can be:
*   **Free-form:** Like a simple text file (.txt) where the content has little inherent structure.
*   **Rigidly formatted:** Like a database file or an MP3 file, which must follow a specific internal structure for the data to be meaningful.

#### **The Role of the Operating System**

The OS provides a consistent way to work with files, regardless of the underlying physical media. These media can vary greatly in their characteristics:
*   **Type:** Magnetic hard disk, solid-state drive (SSD), CD/DVD, USB flash drive, network storage.
*   **Properties:** Access speed, capacity, data-transfer rate, and access method (sequential, like a tape; or random-access, like a disk).

The OS abstracts these differences away. The same system call (`open`, `read`, `write`) works for a file on an SSD, a hard drive, or a USB stick.

To help users and programs manage thousands of files, the OS organizes them into **directories** (or folders). Directories create a hierarchical structure, making it easier to locate and group related files.

Finally, when multiple users share a system, the OS must provide **protection**. It controls which users are allowed to access a file and what they are allowed to do with it (e.g., read, write, or execute).

#### **The Operating System's Responsibilities in File Management**

The OS is responsible for all activities related to managing files and storage. Its key duties are:

1.  **Creating and Deleting Files:** The OS provides the mechanism to create new files and delete them when they are no longer needed, managing the space on the storage device.
2.  **Creating and Deleting Directories:** The OS allows for the creation of directory structures to organize files and the removal of these directories when they are empty.
3.  **Providing File and Directory Manipulation Primitives:** The OS supplies fundamental operations (system calls) for working with files and directories. This includes:
    *   Opening and closing a file.
    *   Reading data from a file or writing data to a file.
    *   Repositioning within a file (seeking).
    *   Renaming files, moving files between directories, and listing directory contents.
4.  **Mapping Files to Storage:** This is a core function. The OS must keep track of which specific blocks of data on a physical storage device belong to which file. It manages the translation from a file's logical structure (a sequence of bytes) to the physical blocks on the disk.
5.  **Backing Up Files:** The OS often provides utilities to back up files to stable, non-volatile storage (like another disk or tape) to prevent data loss in case of hardware failure, accidental deletion, or other disasters. This ensures data persistence and integrity.
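
Most of the primitives in item 3 are visible from user space. The sketch below exercises them through Python's standard `os` and `tempfile` modules, which wrap the underlying system calls; the directory and file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    subdir = os.path.join(workdir, "notes")
    os.mkdir(subdir)                          # create a directory

    path = os.path.join(subdir, "draft.txt")
    with open(path, "wb") as f:               # create, open, and write
        f.write(b"hello operating systems")

    with open(path, "rb") as f:               # open and read
        f.seek(6)                             # reposition within the file
        tail = f.read()

    renamed = os.path.join(subdir, "final.txt")
    os.rename(path, renamed)                  # rename the file
    listing = os.listdir(subdir)              # list directory contents

    os.remove(renamed)                        # delete the file
    os.rmdir(subdir)                          # delete the now-empty directory
    remaining = os.listdir(workdir)

print(tail, listing, remaining)
# b'operating systems' ['final.txt'] []
```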

### **1.5.4 Mass-Storage Management**

#### **The Need for Secondary and Tertiary Storage**

As we know from memory management, **main memory (RAM)** is volatile and too small to hold all the data and programs a system needs permanently. Therefore, we rely on non-volatile **mass-storage** devices to serve as the permanent, high-capacity repository for the system.

*   **Secondary Storage (e.g., HDDs, SSDs):** This is the primary online storage used for active data and programs. Programs like your web browser or word processor are stored here until they are loaded into memory for execution. These devices are the source of data for processing and the destination for saving results. Because the system uses secondary storage so frequently and extensively, its efficient management is **crucial for overall system performance**.
*   **Tertiary Storage (e.g., Magnetic Tapes, Optical Discs):** This is used for offline storage that is slower, lower cost, and often higher capacity. Its uses include:
    *   Creating **backups** of important data from secondary storage.
    *   Storing **seldom-used data** (archival storage).
    *   Long-term storage where immediate access is not required.

Tertiary storage is not directly involved in the day-to-day speed of the system, but it is still important for data integrity and archival purposes.

#### **The Operating System's Responsibilities in Secondary Storage Management**

The proper management of the secondary storage subsystem is a major task for the OS. Its responsibilities include:

1.  **Mounting and Unmounting:** This is the process of preparing a storage device (like a USB drive or a new hard disk) for use by the system (**mounting**) and properly disconnecting it when it's no longer needed (**unmounting**). This ensures data is written correctly and the filesystem is kept intact.
2.  **Free-Space Management:** The OS must keep track of all the free blocks on the storage device so it can quickly allocate space when a new file is created or an existing file grows.
3.  **Storage Allocation:** When a file is saved, the OS must decide which specific free blocks on the disk to assign to it. Different allocation methods (contiguous, linked, indexed) have different performance trade-offs.
4.  **Disk Scheduling:** This is a critical performance activity. When multiple requests to read from or write to the disk are pending, the OS must decide the order in which to service them. On a hard disk, the goal of the **disk scheduling algorithm** is typically to reduce the total seek time of the disk arm, thereby increasing the throughput of the storage subsystem. Because so many operations wait on storage, the scheduling policy can noticeably affect overall system responsiveness.
5.  **Partitioning:** The OS allows a physical disk drive to be divided into logical sections called **partitions**. Each partition can be managed as a separate storage device, often with its own filesystem. This is useful for organizing data or installing multiple operating systems.
6.  **Protection:** The OS must enforce access-control mechanisms to ensure that unauthorized users cannot access files on secondary storage.
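
To make the disk-scheduling trade-off concrete, the sketch below compares two simple policies by total head movement: servicing requests in arrival order (FCFS) and always servicing the nearest pending request (SSTF). The request queue and starting head position are illustrative values, and these two policies are only examples of the many that exist.

```python
def fcfs_movement(start, requests):
    """Total cylinders travelled when requests are serviced in arrival order."""
    total, head = 0, start
    for cylinder in requests:
        total += abs(cylinder - head)
        head = cylinder
    return total


def sstf_movement(start, requests):
    """Total cylinders travelled when the nearest pending request is always serviced next."""
    pending, total, head = list(requests), 0, start
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # pending cylinder requests (illustrative)
print(fcfs_movement(53, queue))   # 640
print(sstf_movement(53, queue))   # 236
```

For this queue, reordering the requests cuts the arm movement by more than half, which is why the choice of policy matters on devices with mechanical seeks.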

#### **Tertiary Storage Management**

While sometimes managed by dedicated applications, the operating system can also handle tertiary storage. Its tasks in this area include:
*   Managing the insertion and removal (**mounting/unmounting**) of tapes or discs.
*   Controlling which process gets exclusive access to a tertiary storage device.
*   Automating the **migration** of data from secondary storage (e.g., a hard drive) to tertiary storage (e.g., a tape archive) based on policies.

### **1.5.5 Cache Management**

#### **The Caching Principle**

**Caching** is a fundamental concept in computer systems designed to overcome speed mismatches between different components. The core idea is simple:

1.  Data is kept in a primary, larger, but slower storage system (e.g., main memory).
2.  When that data is used, a copy is placed into a smaller, but much faster, storage system called the **cache**.
3.  The next time the data is needed, the system first checks the cache.
    *   If the data is found there (a **cache hit**), it is used directly from the fast cache, saving significant time.
    *   If the data is not in the cache (a **cache miss**), it must be retrieved from the slower primary storage, and a copy is placed into the cache for future use.

This principle is based on the **locality of reference**, which means that programs tend to access the same data or instructions repeatedly over short periods of time.

#### **The Storage Hierarchy**

Caching creates a **storage hierarchy**, a pyramid of storage types where each level is smaller, faster, and more expensive per byte than the level below it. **Figure 1.14** provides a detailed comparison of these levels.

Let's break down the hierarchy from top to bottom:

| Level | Name | Typical Size | Managed By | Role & Purpose |
| :--- | :--- | :--- | :--- | :--- |
| 1 | **Registers** | < 1 KB | Compiler | These are the CPU's own internal, ultra-fast storage locations. They hold working copies of values that are backed by the **hardware cache**. The compiler decides what data to keep in registers. |
| 2 | **Cache** (L1, L2, L3) | < 16 MB | Hardware | A high-speed memory (SRAM) that caches frequently used instructions and data from main memory. Its management (what to store, what to replace) is handled entirely by hardware logic. |
| 3 | **Main Memory** (RAM) | < 64 GB | Operating System | The primary working memory for all running programs. The OS manages what parts of programs and data are loaded into RAM from disk. It is a cache for **secondary storage**. |
| 4 | **Solid-State Disk** (SSD) | < 1 TB | Operating System | Fast, non-volatile secondary storage. The OS manages it either as a cache for the even slower magnetic disk or as the main secondary-storage device in its own right. |
| 5 | **Magnetic Disk** (HDD) | < 10 TB | Operating System | The main repository for all programs and data. It is cached by main memory and SSDs. |

**Key Takeaway from Figure 1.14:** As you move down the hierarchy, the **access time increases dramatically** (from 0.25 nanoseconds for registers to 5,000,000 ns for a disk), while the **capacity increases**.

#### **Operating System's Role in Caching**

The OS is primarily concerned with managing the software-controlled caches in the hierarchy, specifically:
*   **Main Memory as a Cache for Disk:** The OS decides which parts of a file or program to keep in RAM, anticipating that they will be needed soon.
*   **Disk Caches:** The OS may use a portion of main memory to cache frequently accessed disk blocks, speeding up disk I/O operations.

The main challenge in cache management is that **cache size is limited**. When the cache is full and new data needs to be brought in, the OS must decide which old data to **replace**. The choice of this **replacement policy** (e.g., Least Recently Used, or LRU) is critical for performance.
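
A minimal sketch of hits, misses, and LRU replacement, built on Python's standard `collections.OrderedDict`. The `backing_store` dictionary stands in for the slower storage level, and the class name is hypothetical. Real caches approximate LRU in cheaper ways, so treat this as an illustration of the policy rather than an implementation.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store   # stands in for the slower level
        self.entries = OrderedDict()         # least recently used entry first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.hits += 1                   # cache hit
            self.entries.move_to_end(key)    # mark as most recently used
            return self.entries[key]
        self.misses += 1                     # cache miss
        value = self.backing_store[key]      # slow path: fetch from the lower level
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False) # evict the least recently used entry
        self.entries[key] = value
        return value
```

With a capacity of 2, reading keys `0, 1, 0, 2, 1` gives one hit (the second read of `0`) and four misses: reading `2` evicts `1`, so the final read of `1` misses again.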

#### **The Problem of Coherency**

Caching introduces a major complication: **multiple copies of the same data can exist simultaneously at different levels of the hierarchy.**

**Example (Follow Figure 1.15):**
Imagine an integer `A` stored in a file on a **magnetic disk**.
1.  A program needs to increment `A`. The OS loads the disk block containing `A` into **main memory**.
2.  The CPU then copies `A` into the hardware **cache**.
3.  Finally, `A` is loaded into a **register** where the increment operation (`A+1`) happens.

Now, there are four copies of `A`: on the disk, in RAM, in the cache, and in the register. After the increment, only the value in the **register** is correct. The others are now out-of-date (stale).
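
The same sequence can be traced in a few lines. The dictionary below models one copy of `A` per level; the starting value 7 is arbitrary, and the explicit write-back loop is a simplification of what hardware and the OS do at different levels.

```python
# Each level holds its own copy of A; the names and starting value are illustrative.
levels = {"disk": 7, "main_memory": None, "cache": None, "register": None}

# Steps 1-3: the value is copied upward through the hierarchy.
levels["main_memory"] = levels["disk"]
levels["cache"] = levels["main_memory"]
levels["register"] = levels["cache"]

# The increment happens only in the register.
levels["register"] += 1
stale = [name for name, value in levels.items() if value != levels["register"]]
print(stale)   # ['disk', 'main_memory', 'cache']

# Writing the new value back down makes every copy consistent again.
for name in ("cache", "main_memory", "disk"):
    levels[name] = levels["register"]
print(levels)
```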

In a single-process system, this is manageable because all accesses will go to the highest-level (most recent) copy. However, it creates serious problems in more complex environments:

1.  **Multitasking:** If the OS switches the CPU to another process after `A` is incremented in the register but before it's written back to memory, the second process might read the old, stale value of `A` from main memory.
2.  **Multiprocessor Systems:** If multiple CPUs have their own caches, `A` could be copied into several caches. If one CPU updates `A` in its local cache, the other CPUs will have stale copies. This is the **cache coherency** problem, which is typically solved by hardware protocols that invalidate or update all other copies when one is changed.
3.  **Distributed Systems:** The problem is magnified when copies (replicas) of a file are stored on different computers across a network. Keeping all these geographically separated replicas consistent when updates occur is a complex challenge.

The OS must implement mechanisms, especially in memory and file management, to ensure that processes always see the most recent version of data, despite this complex caching hierarchy.

### **1.5.6 I/O System Management**

#### **The Goal: Hiding Hardware Peculiarities**

I/O devices are incredibly diverse—keyboards, mice, disk drives, network cards, and graphics cards all function in very different ways. A key purpose of the operating system is to hide these specific hardware details, or "peculiarities," from users and applications. It provides a simple, uniform interface to perform I/O operations, regardless of the underlying device.

#### **The I/O Subsystem**

To achieve this, the OS contains a dedicated **I/O subsystem**. Think of it as a specialized department that handles all communication with the outside world. Its main components are:

1.  **A Memory-Management Component:** This part deals with transferring data between devices and main memory efficiently. It uses several techniques:
    *   **Buffering:** Storing data temporarily in an area of memory (a **buffer**) to smooth out the speed differences between a fast CPU and a slow device. For example, data being sent to a printer is first stored in a buffer so the CPU doesn't have to wait for the slow printing process.
    *   **Caching:** Keeping a copy of frequently accessed data in a faster memory (as discussed in 1.5.5) to speed up I/O operations.
    *   **Spooling:** This is a high-level form of buffering used for devices like printers that cannot be multiplexed (you can't print lines from two different documents simultaneously). The **spooler** intercepts output for a device, stores each task as a separate file on disk, and then feeds them to the device one at a time. This allows multiple processes to "finish" their print jobs quickly, even though the physical printing happens sequentially.

2.  **A General Device-Driver Interface:** This is a standard set of commands (an API) that the rest of the OS uses to talk to any device. It provides a common language for functions like `read`, `write`, and `open`.

3.  **Device Drivers:** For each specific type of hardware device, there is a **device driver**. The driver is a software module that understands the exact details and command set of its assigned device. It translates the generic requests from the general device-driver interface into the specific, low-level instructions that the hardware controller expects. **Only the device driver needs to know the peculiarities of the device.**

As discussed earlier in the chapter, this subsystem relies heavily on **interrupts** and **DMA** to handle I/O efficiently, freeing the CPU from being tied up.
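
The split between a general interface and device-specific drivers can be sketched with an abstract base class. Both drivers here are hypothetical toys (no real hardware is involved), and the interface is reduced to a single `write` operation for brevity.

```python
from abc import ABC, abstractmethod


class DeviceDriver(ABC):
    """The general interface the rest of the OS programs against."""

    @abstractmethod
    def write(self, data: bytes) -> int:
        """Send data to the device and return the number of bytes accepted."""


class ConsoleDriver(DeviceDriver):
    """Hypothetical character device: accepts text one line at a time."""

    def __init__(self):
        self.lines = []

    def write(self, data):
        self.lines.extend(data.decode().splitlines())
        return len(data)


class BlockDiskDriver(DeviceDriver):
    """Hypothetical block device: stores data in fixed-size blocks."""

    BLOCK_SIZE = 4

    def __init__(self):
        self.blocks = []

    def write(self, data):
        for offset in range(0, len(data), self.BLOCK_SIZE):
            self.blocks.append(data[offset:offset + self.BLOCK_SIZE])
        return len(data)


def os_write(driver: DeviceDriver, data: bytes) -> int:
    """Device-independent code path: it never inspects which driver it was given."""
    return driver.write(data)
```

`os_write` works unchanged for either device. Only `ConsoleDriver` knows about lines, and only `BlockDiskDriver` knows about the block size.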

---

## **1.6 Security and Protection**

#### **The Difference Between Protection and Security**

In a multi-user, multitasking system, it is essential to control which processes can access which resources. The OS provides mechanisms for this, which fall into two related categories:

*   **Protection:** This is the **internal** mechanism for controlling the access of processes or users to the resources defined by the computer system. It is about ensuring that each component of a system uses only the resources it is authorized to use.
    *   **Examples:** Memory protection hardware ensures a process can only access its own memory space. The timer protects the CPU from being monopolized by a single process. Privileged instructions protect device controllers.

*   **Security:** This is the **external and internal** defense of the system from malicious attacks. Protection mechanisms are the tools used to build security. A system can have perfect protection mechanisms but still be insecure if, for example, a user's password is stolen.
    *   **Examples of attacks:** Viruses, worms, denial-of-service attacks, identity theft, theft of service.

In short, **protection is about ensuring controlled access; security is about defending against attackers.**

#### **User and Group Identities**

For protection and security to work, the system must be able to identify who is who. The OS maintains a database of users:

*   **User ID (UID):** A unique numerical identifier assigned to each user. In Windows, this is called a **Security ID (SID)**. When a user logs in, all processes created by that user are "tagged" with this ID.
*   **Group ID (GID):** Users can be organized into groups. This allows the OS to manage permissions for collections of users efficiently (e.g., granting read access to a file to everyone in the "students" group). A user can belong to one or more groups.

#### **Privilege Escalation**

Sometimes a user needs to temporarily perform an action that requires higher privileges than they normally have (e.g., changing their password, which requires writing to the system password file).

Operating systems provide mechanisms for **privilege escalation**. A common example in UNIX-like systems is the **setuid (set user ID)** attribute. When a program with the setuid bit enabled is executed, it does not run with the user's ID, but with the ID of the program's owner (often the root/administrator). This allows a regular user to execute specific, privileged operations in a controlled way. The process runs with this **effective UID** until it relinquishes the privilege or ends.
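
The rule can be expressed as a small function. This is a model of the decision only, not a call into the OS: `stat.S_ISUID` is the standard constant for the setuid bit, while `effective_uid` and the UID value 1000 are assumptions made for illustration.

```python
import stat


def effective_uid(caller_uid, file_owner_uid, file_mode):
    """Return the UID a new process runs with, following the setuid rule described above."""
    if file_mode & stat.S_ISUID:
        return file_owner_uid      # setuid bit set: run as the file's owner
    return caller_uid              # otherwise: run as the invoking user


ROOT_UID = 0
ALICE_UID = 1000   # hypothetical regular user

print(effective_uid(ALICE_UID, ROOT_UID, 0o4755))  # 0    (setuid program owned by root)
print(effective_uid(ALICE_UID, ROOT_UID, 0o0755))  # 1000 (ordinary program)
```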

These concepts of protection and security are explored in depth in Chapters 16 and 17.

## **1.7 Virtualization**

#### **What is Virtualization?**

**Virtualization** is a technology that takes the physical hardware of a single computer—the CPU, memory, disks, and network cards—and abstracts it to create multiple, isolated, virtual execution environments. Each of these environments, called a **virtual machine (VM)**, behaves like a separate private computer, complete with its own operating system.

A user can run multiple, different operating systems (like Windows, Linux, and macOS) simultaneously on the same physical machine and switch between them just like switching between application windows.

#### **Virtualization vs. Emulation**

It's important to distinguish virtualization from a related concept: **emulation**.

*   **Emulation:** This involves **simulating** the hardware of one type of CPU on a different type of CPU. The emulator translates every instruction from the "guest" system into instructions the "host" system's CPU can understand.
    *   **Example:** When Apple switched from PowerPC to Intel processors, they provided "Rosetta," which emulated a PowerPC CPU on an Intel CPU, allowing old applications to run.
    *   **Drawback:** Emulation is computationally expensive because of the translation process, leading to significant performance loss.

*   **Virtualization:** This requires the guest operating system to be compiled for the **same CPU architecture** as the host machine. The virtualization software does not need to simulate a different CPU; it simply manages access to the real, physical CPU.
    *   **Benefit:** Because the guest OS is running on native hardware, performance is much higher than with emulation.
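
To illustrate why emulation is costly, here is a sketch of an interpreter for a made-up guest instruction set. The instruction names (`LOAD`, `ADD`, `JNZ`, `HALT`) and the two-register machine are invented for this example, and real emulators often use faster techniques such as dynamic binary translation. The point is that every guest instruction is decoded and dispatched in software before any useful work happens.

```python
def emulate(program):
    """Interpret a toy guest instruction set one instruction at a time.

    Returns the register file and the number of guest instructions decoded.
    """
    registers = {"r0": 0, "r1": 0}
    pc = 0
    steps = 0
    while True:
        op, *args = program[pc]   # fetch and decode in software
        steps += 1
        pc += 1
        if op == "LOAD":          # LOAD reg, constant
            registers[args[0]] = args[1]
        elif op == "ADD":         # ADD reg, constant
            registers[args[0]] += args[1]
        elif op == "JNZ":         # JNZ reg, target: jump if reg != 0
            if registers[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            return registers, steps
        else:
            raise ValueError(f"unknown instruction {op!r}")


guest_program = [
    ("LOAD", "r0", 3),     # loop counter
    ("ADD", "r1", 5),      # r1 += 5
    ("ADD", "r0", -1),     # r0 -= 1
    ("JNZ", "r0", 1),      # repeat while r0 != 0
    ("HALT",),
]

print(emulate(guest_program))   # ({'r0': 0, 'r1': 15}, 11)
```

Under virtualization, by contrast, most guest instructions run directly on the physical CPU with no per-instruction decode loop.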

#### **How Virtualization Works: The Virtual Machine Manager**

The core software that enables virtualization is called the **Virtual Machine Manager (VMM)**, also known as a **hypervisor**.

Refer to **Figure 1.16** for a visual comparison:
*   **(a) Traditional System:** A single operating system kernel manages hardware and runs multiple processes.
*   **(b) Virtualized System:** A **Virtual Machine Manager** runs directly on the hardware. The VMM then creates and manages multiple virtual machines (VM1, VM2, VM3). Each VM runs its own, full operating system kernel and its own set of processes.

The VMM is responsible for:
1.  **Resource Allocation:** It allocates shares of the physical CPU, memory, and I/O devices to each virtual machine.
2.  **Isolation and Protection:** It ensures that each virtual machine is isolated from the others. A crash or problem in one VM does not affect the others.

#### **Why is Virtualization Useful?**

Even though modern OSs are great at multitasking, virtualization is extremely important for several reasons:

1.  **Consolidation:** In data centers, instead of running one application per physical server (which wastes resources), many virtual machines can run on a single, powerful physical server. This reduces hardware costs, power consumption, and physical space needs.
2.  **Development and Testing:** Software developers can test their applications on different operating systems (Windows, Linux, etc.) all on a single laptop or server.
3.  **Legacy Application Support:** A business can run an old application that only works on Windows XP inside a Windows XP virtual machine on a modern computer.
4.  **Cloud Computing:** Cloud infrastructure is built largely on virtualization. When you rent a server from a cloud provider, you are almost always getting a virtual machine.

#### **Types of Virtualization**

The text mentions an evolution:
*   **Hosted VMMs (Type 2):** The VMM runs as an application on top of a host operating system (e.g., VMware Workstation, Oracle VirtualBox). This is common for desktop use.
*   **Bare-Metal VMMs (Type 1):** The VMM is installed directly on the physical hardware and acts as the host operating system itself (e.g., VMware ESXi, Citrix XenServer). This is common in data centers for maximum performance and efficiency.

Virtualization is a deep topic, and its full implementation details are covered in Chapter 18.

## **1.8 Distributed Systems**

#### **What is a Distributed System?**

A **distributed system** is a group of independent, physically separate computers that are connected by a network. These computers work together to appear as a single, coherent system to the user. The key idea is that multiple machines, which might be different from each other (heterogeneous), collaborate to provide services.

The goals of creating a distributed system are to improve:
*   **Computation Speed:** Tasks can be split up and run on multiple computers in parallel.
*   **Functionality:** Users can access services and resources that aren't available on their local machine.
*   **Data Availability:** Data can be replicated across multiple machines, so it's still accessible even if one machine fails.
*   **Reliability:** If one computer fails, the system as a whole can continue to operate using the remaining computers.

#### **The Role of Networking**

A **network** is the fundamental communication path that makes distributed systems possible. It is simply a connection between two or more computers. The operating system handles network access in different ways:
*   Some OSs make networking look like file access (e.g., you can access a remote file as if it were on your local disk using a network file system like **NFS**).
*   Other times, users or applications must explicitly use network functions (e.g., using an **FTP** client to transfer a file).

Most systems support a mix of these approaches. The most common network protocol suite is **TCP/IP**, which forms the backbone of the internet. From the OS's perspective, a network connection needs an interface device, such as a **network interface card (NIC)**, and a corresponding **device driver** to manage it.

#### **Types of Networks**

Networks are categorized based on the geographical distance they cover:

| Type | Name | Typical Range | Example |
| :--- | :--- | :--- | :--- |
| **PAN** | Personal-Area Network | Several feet | Connecting a wireless headset to a phone via Bluetooth. |
| **LAN** | Local-Area Network | Room, Building, Campus | A network connecting all computers in a university department using Ethernet or Wi-Fi. |
| **MAN** | Metropolitan-Area Network | A city | A network connecting libraries across a city. |
| **WAN** | Wide-Area Network | Country, Global | The internet, or a private network connecting a company's offices worldwide. |

Networks can use various media to transmit data, including copper wires, fiber optic cables, and wireless transmissions (radio waves, microwaves, satellites).

#### **Network Operating Systems vs. Distributed Operating Systems**

There's a spectrum of how closely integrated the computers in a network are:

1.  **Network Operating System:**
    *   In this model, each computer is **autonomous** and runs its own independent operating system.
    *   The OS is "network-aware"—it provides features that allow it to share files and exchange messages with other computers on the network.
    *   However, users are typically aware that they are accessing remote resources. For example, they might have to log into a remote machine explicitly to use its files.
    *   This is a **loosely coupled** system.

2.  **Distributed Operating System:**
    *   This is a more advanced, **tightly coupled** model.
    *   The different computers communicate so closely that they create the **illusion of a single, unified operating system** controlling the entire network.
    *   A user doesn't need to know where a file is stored or which CPU is executing a program; the system handles all of this transparently.
    *   This is much more complex to implement but provides a simpler experience for the user.

The concepts of networking and distributed systems are explored in detail in Chapter 19.

## **1.9 Kernel Data Structures**

The efficiency of an operating system depends not just on its algorithms but also on the data structures it uses to organize information. This section covers fundamental data structures that are ubiquitous in kernel code.

---

### **1.9.1 Lists, Stacks, and Queues**

#### **The Limitation of Arrays**

An **array** is a simple data structure where any element can be accessed directly by its index, much like how main memory is addressed. However, arrays have limitations for OS tasks:
*   They are inefficient for storing data items of varying sizes.
*   Inserting or deleting an item in the middle of an array requires shifting all subsequent elements, which is computationally expensive.

#### **Linked Lists**

To overcome these limitations, operating systems heavily use **lists**. In a list, items are accessed sequentially. The most common implementation is a **linked list**, where each item (or **node**) contains data and a pointer to the next node.

There are several types of linked lists, as shown in the figures:

*   **Singly Linked List (Figure 1.17):** Each node points to the next node in the list. The last node points to `NULL`, indicating the end of the list.
*   **Doubly Linked List (Figure 1.18):** Each node has pointers to both its **predecessor** (previous) and **successor** (next) node. This allows for traversal in both directions but requires more memory per node.
*   **Circularly Linked List (Figure 1.19):** The last node points back to the first node, creating a circle. This is useful for applications that cycle through data continuously.

**Advantages of Linked Lists:**
*   They can easily handle items of different sizes.
*   Insertion and deletion are very efficient (O(1)) once the position is found, as they only require updating a few pointers.

**Disadvantage of Linked Lists:**
*   Finding a specific item requires traversing the list from the beginning, which has **linear time complexity - O(n)**.
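
The trade-off described above (cheap insertion, linear search) can be sketched with a minimal singly linked list. The names `Node`, `insert_after`, `find`, and `to_list` are illustrative choices for this sketch, not taken from any kernel.

```python
class Node:
    """One node of a singly linked list: data plus a pointer to the next node."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def insert_after(node, data):
    """O(1): splice a new node in after `node` by updating two pointers."""
    node.next = Node(data, node.next)
    return node.next


def find(head, target):
    """O(n): walk from the head until the target is found or the list ends."""
    current = head
    while current is not None:   # None plays the role of NULL
        if current.data == target:
            return current
        current = current.next
    return None


def to_list(head):
    """Collect the node values in order, for display."""
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values


head = Node("A")
b = insert_after(head, "B")
insert_after(b, "D")
insert_after(b, "C")          # insert in the middle without shifting anything
print(to_list(head))          # ['A', 'B', 'C', 'D']
```

Inserting `"C"` touches only the pointers around node `b`, whereas the same insertion into an array would shift every later element.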

#### **Stacks**

A **stack** is a data structure that follows the **Last-In, First-Out (LIFO)** principle. Think of a stack of plates: you add a plate to the top, and you take a plate from the top.

The two fundamental operations are:
*   **Push:** Add an item to the top of the stack.
*   **Pop:** Remove the top item from the stack.

**Operating System Use Case:**
Stacks are crucial for managing **function calls**. When a function is called, the **return address** (where to go back to), parameters, and local variables are pushed onto a region of memory called the **call stack**. When the function returns, these items are popped off the stack, and the CPU resumes from the return address.
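
As a rough sketch of the LIFO behaviour behind function calls, the snippet below models a call stack with a Python list. The function names and return addresses (`main+12` and so on) are made up for illustration; a real call stack holds raw memory addresses and more per-frame data.

```python
call_stack = []  # a Python list used as a stack: append = push, pop = pop


def call(function_name, return_address):
    """Push a frame holding the return address for this call."""
    call_stack.append({"function": function_name, "return_to": return_address})


def ret():
    """Pop the most recent frame and report where execution resumes."""
    frame = call_stack.pop()
    return frame["return_to"]


call("main", "os_exit")        # main is called first
call("read_file", "main+12")   # main calls read_file
call("parse", "read_file+30")  # read_file calls parse

first = ret()    # parse returns first (last in, first out)
second = ret()   # then read_file
print(first, second)  # read_file+30 main+12
```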

#### **Queues**

A **queue** is a data structure that follows the **First-In, First-Out (FIFO)** principle. Think of a line of people waiting for a service: the first person to join the line is the first one to be served.

**Operating System Use Cases:**
Queues are everywhere in operating systems:
*   **Printer Queue:** Print jobs are sent to a queue and printed in the order they were received.
*   **CPU Scheduling:** As you will see in Chapter 5, processes that are ready to run are placed in a **ready queue**. The scheduler selects the next process from the front of this queue to run on the CPU.
*   **I/O Device Waiting:** Processes waiting for an I/O device (like a disk) are placed in a device queue.
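
A FIFO ready queue can be sketched with `collections.deque` from the Python standard library. The process names are hypothetical, and a real scheduler may use more elaborate policies than plain FIFO.

```python
from collections import deque

ready_queue = deque()  # hypothetical ready queue of process names

# Processes become ready in this order (enqueue at the back).
for process in ["editor", "compiler", "browser"]:
    ready_queue.append(process)

# The scheduler takes the next process from the front (dequeue).
next_to_run = ready_queue.popleft()
print(next_to_run)          # editor
print(list(ready_queue))    # ['compiler', 'browser']
```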

These simple data structures form the building blocks for more complex kernel components, allowing the OS to manage resources and control flow efficiently.

### **1.9.2 Trees**

#### **What is a Tree?**

A **tree** is a hierarchical data structure. Data is organized in nodes connected by parent-child relationships, much like a family tree or a company's organizational chart.

*   **General Tree:** A parent node can have any number of child nodes.
*   **Binary Tree:** A more restricted and common form where a parent node can have at most **two children**, typically called the **left child** and the **right child**.

#### **Binary Search Trees (BST)**

A **Binary Search Tree (BST)** is a binary tree with an important ordering property: for any given node, the values in its **left subtree** are all **less than or equal** to the node's value, and the values in its **right subtree** are all **greater than** the node's value.

Refer to **Figure 1.20** for an example. The root node holds the value `17`. All values in its left subtree (`6`, `12`, `14`) are less than 17. All values in its right subtree (`35`, `38`, `40`) are greater than 17. This property holds for every node in the tree.

This structure allows for efficient searching. To find a value, you start at the root and compare the target value to the current node. If it's smaller, you go to the left child; if it's larger, you go to the right child. You repeat this process until you find the value or reach a `NULL` pointer.

*   **Worst-Case Performance:** If items are inserted in sorted order (e.g., 1, 2, 3, 4), the tree effectively becomes a linked list. Searching for an item in this degenerate case has a worst-case performance of **O(n)**.
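
The search procedure and the degenerate case can both be seen in a small sketch. The code below is an unbalanced BST written for illustration; the helper names are assumptions, and the insertion order used for `bushy` is simply one order that places `17` at the root.

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert `value`, keeping smaller-or-equal values left and larger right."""
    if root is None:
        return TreeNode(value)
    current = root
    while True:
        if value <= current.value:
            if current.left is None:
                current.left = TreeNode(value)
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = TreeNode(value)
                return root
            current = current.right


def search(root, target):
    """Return True if `target` is in the tree, following one path from the root."""
    current = root
    while current is not None:
        if target == current.value:
            return True
        current = current.left if target < current.value else current.right
    return False


def height(root):
    """Number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root


bushy = build([17, 12, 35, 6, 14, 40, 38])   # the seven values used in the example
chain = build([6, 12, 14, 17, 35, 38, 40])   # same values, inserted in sorted order

print(height(bushy))  # 4
print(height(chain))  # 7
```

The same seven values produce a four-level tree in one insertion order and a seven-level chain in sorted order, which is why the worst-case search cost is O(n).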

#### **Balanced Binary Search Trees**

To avoid the worst-case scenario, we use **balanced binary search trees**. These trees use algorithms to ensure that the tree remains "bushy" and does not become a long chain. In a balanced tree with `n` items, the maximum number of levels from the root to a leaf is proportional to **log₂ n** (written as **O(log n)**).

This guarantees that the worst-case search, insertion, and deletion times are **O(log n)**, which is very efficient even for large values of `n`.
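
To get a feel for how slowly log₂ n grows, the sketch below computes the fewest levels a binary tree needs to hold `n` items. The function name `min_levels` is an illustrative choice; practical balanced trees may be somewhat taller than this ideal, but only by a constant factor.

```python
def min_levels(n):
    """Fewest levels a binary tree needs to hold n items.

    A tree with k levels holds at most 2**k - 1 items, so the answer is
    ceil(log2(n + 1)), which equals n.bit_length() for positive integers.
    """
    return n.bit_length()


for n in (7, 1_000, 1_000_000):
    print(n, min_levels(n))
# 7 3
# 1000 10
# 1000000 20
```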

**Operating System Use Case:**
The text mentions that the Linux kernel uses a specific type of balanced BST called a **red-black tree** in its CPU-scheduling algorithm (specifically, the Completely Fair Scheduler). This allows it to efficiently manage and select the next process to run.

---

### **1.9.3 Hash Functions and Maps**

#### **The Goal: Constant-Time Access**

Searching through a list or even a balanced tree requires multiple steps (O(n) or O(log n)). **Hashing** is a technique that aims to achieve **constant-time O(1)** data retrieval, meaning the time to find an item is (ideally) the same regardless of how many items are stored.

#### **How Hashing Works**

A **hash function** takes a piece of data (like a string or a number) as input, performs a calculation, and outputs a numeric value called a **hash value** or **hash code**.

This hash value is then used as an **index** into a table (usually an array) to directly locate the data. Instead of searching through all items, you compute the hash and go straight to the corresponding array slot.

#### **Hash Collisions**

A potential problem is that two different inputs might produce the same hash value. This is called a **hash collision**.

*   **Solution:** The common solution is to have each slot in the hash table hold a **linked list**. All items that hash to the same index are stored in a list at that location. When retrieving an item, the hash function points you to the correct list, and then you perform a (hopefully short) linear search within that list.
*   **Efficiency:** The quality of a hash function is measured by how well it distributes items evenly across the table, minimizing collisions. A good hash function with few collisions provides performance close to O(1). A bad one can degrade to O(n).
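
Collision handling with per-slot lists (often called chaining) can be sketched as follows. The class name and the deliberately weak hash function are assumptions made for this example; the weak hash is chosen so that a collision is easy to trigger.

```python
class ChainedHashTable:
    """Hash table that resolves collisions with a per-slot list (chaining)."""

    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]

    def _index(self, key):
        # Toy hash function (an assumption for this sketch): sum of character
        # codes, reduced to a slot index. Anagrams therefore always collide.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)   # overwrite an existing key
                return
        chain.append((key, value))        # new key: add to this slot's chain

    def get(self, key):
        chain = self.slots[self._index(key)]
        for existing_key, value in chain:  # short linear search within the chain
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
table.put("listen", 1)
table.put("silent", 2)   # anagram of "listen": same hash, same slot
print(table._index("listen") == table._index("silent"))  # True
print(table.get("listen"), table.get("silent"))          # 1 2
```

If every key landed in the same slot, `get` would degrade to a linear scan of one long chain, which is the O(n) case mentioned above.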

#### **Hash Maps**

A **hash map** (or hash table) is a data structure that uses hashing to store and retrieve `[key: value]` pairs.

**Operating System Use Case:**
The text provides a classic example: user authentication.
1.  The system stores a table of `[username: password]` pairs.
2.  When a user enters their username and password, the system applies the **hash function to the username**.
3.  The resulting hash value is used as an index to instantly retrieve the stored password associated with that username from the table.
4.  The system then compares the retrieved password with the one the user entered.

This is much faster than searching through a list of all users every time someone logs in. Hash maps are used throughout operating systems for tasks that require very fast lookups, such as file system directory lookups and managing kernel objects.
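
The four authentication steps can be sketched with Python's built-in `dict`, which is itself a hash map. The usernames and passwords are made up, and the table stores plain passwords only to mirror the simplified example in the text.

```python
# Python's built-in dict is a hash map: it hashes the key to locate the value.
# The usernames and passwords below are made up for illustration.
credentials = {
    "alice": "correct-horse",
    "bob": "hunter2",
}


def authenticate(username, password):
    """Look up the stored password by username and compare it to the input."""
    stored = credentials.get(username)   # average-case O(1) lookup
    return stored is not None and stored == password


print(authenticate("alice", "correct-horse"))  # True
print(authenticate("alice", "wrong"))          # False
print(authenticate("carol", "anything"))       # False
```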

### **1.9.4 Bitmaps**

#### **What is a Bitmap?**

A **bitmap** (or bit array) is a simple but powerful data structure consisting of a sequence of `n` binary digits (bits). Each bit, which can be either 0 or 1, is used to represent the status of a corresponding item.

For example, a bitmap can be used to track the availability of `n` resources:
*   **Bit value 0:** Could mean the resource is **available**.
*   **Bit value 1:** Could mean the resource is **unavailable** (or vice versa; the convention is arbitrary).

The position of the bit in the string corresponds to the resource ID. The value of the bit at the `i`-th position tells you the status of the `i`-th resource.

**Example from the text:**
Consider the bitmap: `001011101`

*   Bit 0: 0 -> Resource 0 is **available**
*   Bit 1: 0 -> Resource 1 is **available**
*   Bit 2: 1 -> Resource 2 is **unavailable**
*   Bit 3: 0 -> Resource 3 is **available**
*   Bit 4: 1 -> Resource 4 is **unavailable**
*   Bit 5: 1 -> Resource 5 is **unavailable**
*   Bit 6: 1 -> Resource 6 is **unavailable**
*   Bit 7: 0 -> Resource 7 is **available**
*   Bit 8: 1 -> Resource 8 is **unavailable**
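
The same decoding can be done mechanically. This short sketch treats the bitmap as a string for readability, which is an assumption of the example; a real bitmap would be packed into machine words.

```python
bitmap = "001011101"  # bitmap from the example above; position i = resource i

available = [i for i, bit in enumerate(bitmap) if bit == "0"]
unavailable = [i for i, bit in enumerate(bitmap) if bit == "1"]

print(available)    # [0, 1, 3, 7]
print(unavailable)  # [2, 4, 5, 6, 8]
```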

#### **The Power of Space Efficiency**

The primary advantage of a bitmap is its extreme **space efficiency**. A single bit is the smallest unit of information a computer can store, even though memory is usually addressed in bytes rather than individual bits.

*   **Comparison:** If you were to use a Boolean variable (which typically occupies one byte, or 8 bits, in languages like C) to track each resource's status, your data structure would be **8 times larger** than a bitmap.
*   **Significance:** This efficiency is critical when you need to track the status of thousands or millions of items. The memory savings are enormous.

#### **Operating System Use Case: Disk Block Management**

A classic use of bitmaps in operating systems is for **free-space management** on a disk.

*   A disk is divided into many small units called **disk blocks**.
*   A large disk can have millions of these blocks.
*   The file system uses a bitmap where each bit corresponds to one disk block.
    *   **Bit value 0:** Block is **free** and available for allocation.
    *   **Bit value 1:** Block is **allocated** to a file and is in use.

When a file needs to be created or extended, the OS can quickly scan the bitmap to find a free block (a '0' bit). When a file is deleted, the OS simply sets the bits corresponding to its blocks back to '0'. This makes allocation and deallocation very fast.
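
A free-space bitmap along these lines might look like the sketch below. The class `BlockBitmap` and its first-fit scan are illustrative assumptions, not the layout of any particular file system; the bits are packed eight to a byte to show the space saving.

```python
class BlockBitmap:
    """Free-space bitmap: bit i is 0 if block i is free, 1 if allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # 8 blocks per byte

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Scan for the first 0 bit, set it to 1, and return its block number."""
        for block in range(self.num_blocks):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        """Clear the bit so the block can be reused."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


disk = BlockBitmap(num_blocks=20)
first, second = disk.allocate(), disk.allocate()
disk.free(first)
third = disk.allocate()          # reuses the block that was just freed
print(first, second, third)      # 0 1 0
print(len(disk.bits))            # 3
```

Twenty blocks are tracked in three bytes here, compared with twenty bytes if each block had its own one-byte Boolean.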

---

### **Linux Kernel Data Structures**

The text provides a helpful note on where to find these data structures in a real-world OS, the Linux kernel. This demonstrates that these are not just theoretical concepts but are used extensively in practice.

*   **Linked Lists:** The implementation is found in the include file `<linux/list.h>`.
*   **Queues (kfifo):** The implementation for a queue (called a `kfifo` in Linux) is in the source file `kfifo.c`.
*   **Balanced Binary Search Trees:** Linux uses red-black trees, and their implementation details are in `<linux/rbtree.h>`.

#### **Summary**

In summary, fundamental data structures like **lists, stacks, queues, trees, hash maps, and bitmaps** are the building blocks of operating system kernels. They are used to manage processes, memory, files, and all other system resources efficiently. Understanding these structures is key to understanding how the OS itself is implemented.

## **1.10 Computing Environments**

This section discusses how operating systems are used in different settings, from offices to homes, and how these environments have evolved over time.

### **1.10.1 Traditional Computing**

The concept of "traditional computing" has changed significantly. The clear boundaries that once existed between different types of computer systems have become blurred due to advancements in networking and web technologies.

**The Evolution of the Office Environment:**
*   **Past:** A typical office used to consist of individual personal computers (PCs) connected to a local network. Specialized computers called **servers** provided central services like file storage and printing. Remote access to the office network was difficult, and portability was limited to laptop computers that had to be physically carried and connected.
*   **Present:** Modern offices are defined by web technologies and high-speed Wide Area Networks (WANs).
    *   **Portals:** Companies now create internal websites (portals) that employees can access securely from anywhere, reducing the reliance on direct connections to internal servers.
    *   **Network Computers / Thin Clients:** These are simplified computers that act more like terminals. They rely heavily on a central server to do most of the processing. They are used when easier maintenance or stronger security is needed, as the software and data are centrally managed.
    *   **Mobility:** Mobile devices like smartphones and tablets can synchronize with desktop computers and connect directly to company networks via Wi-Fi or cellular data to access the web portal.

**The Evolution of the Home Environment:**
*   **Past:** Homes typically had one computer with a slow dial-up modem connection to the internet or a remote office.
*   **Present:** High-speed internet is now common and affordable. This has transformed home networks.
    *   Home computers can now act as **servers** themselves (e.g., serving web pages or media).
    *   Homes often have complex networks that include multiple devices like printers, client PCs, and servers.
    *   **Firewalls** are essential for home network security. A firewall is a system (often part of a router) that controls the incoming and outgoing network traffic based on security rules, protecting the devices on the network from unauthorized access.

**A Note on Historical System Types:**
In the past, when computing resources were scarce and expensive, operating systems were designed to maximize resource utilization. There were two main types:
1.  **Batch Systems:** These processed jobs (like running a program on a large dataset) one after another in a batch. Input was predetermined from files, not interactive users.
2.  **Interactive Systems:** These waited for direct input from a user.

To make the most of these expensive machines, **time-sharing** was developed. Time-sharing systems allow multiple users to interact with the same computer simultaneously. The operating system uses a timer and scheduling algorithms to rapidly switch the CPU between each user's processes. This switching happens so fast that it gives each user the illusion that they have their own dedicated machine.

**Time-Sharing Today:**
While traditional multi-user time-sharing systems are rare, the fundamental technique is still used everywhere. On your personal laptop, the CPU is time-shared among all the processes you have running: your web browser, your music player, and system background tasks. Each individual browser tab might even be its own process. The operating system gives a small slice of CPU time to each process, creating the experience of multitasking. The core idea of time-sharing is now applied to the processes of a single user.
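
The idea of handing out small CPU time slices in turn can be sketched with a simple simulation. Round-robin is used here as one straightforward policy; it is an assumption of this sketch rather than something the text specifies, and the process names and burst times are invented.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """Return the order in which processes receive CPU time slices.

    burst_times: dict mapping a process name to the CPU time it still needs.
    quantum: length of one time slice (same units as the burst times).
    """
    ready = deque(burst_times.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        timeline.append(name)                # this process runs for one slice
        remaining -= quantum
        if remaining > 0:
            ready.append((name, remaining))  # not finished: back of the queue
    return timeline


schedule = round_robin({"browser": 3, "music": 1, "backup": 2}, quantum=1)
print(schedule)
# ['browser', 'music', 'backup', 'browser', 'backup', 'browser']
```

Because the slices interleave, every process makes progress early on, which is what produces the impression that they all run at once.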

### **1.10.2 Mobile Computing**

**Mobile computing** refers to the use of handheld devices like smartphones and tablets. Their key features are portability and light weight.

**Evolution of Mobile Devices:**
*   **Past:** Initially, mobile devices sacrificed screen size, memory capacity, and overall power compared to desktops and laptops. This trade-off was made to gain mobile access to basic services like email and web browsing.
*   **Present:** The functionality gap has narrowed significantly. Modern mobile devices are so powerful that it's often hard to tell the difference between a high-end tablet and a laptop. In fact, mobile devices now provide unique functionalities that are either impossible or impractical on traditional computers.

**Unique Features and Applications:**
Mobile devices are used for a vast range of tasks beyond communication, including media consumption (music, video, books) and content creation (photos, HD video recording and editing). Their unique hardware has enabled entirely new application categories:
*   **GPS (Global Positioning System):** An embedded chip that uses satellites to determine the device's exact location on Earth. This is crucial for navigation apps (like Google Maps) and for finding nearby services.
*   **Accelerometer:** Detects the device's orientation relative to the ground and senses motion like tilting and shaking. This allows for intuitive controls in games and is essential for...
*   **Gyroscope:** Works with the accelerometer to provide more precise orientation data. Together, these sensors enable **Augmented-Reality (AR)** applications, which overlay digital information onto a live view of the real world through the device's camera. It's difficult to imagine such applications on a traditional, non-mobile computer.

**Technical Constraints and Connectivity:**
*   **Networking:** Mobile devices connect to online services using **IEEE 802.11 (Wi-Fi)** wireless networks or cellular data networks (4G/5G).
*   **Hardware Limitations:** Despite their power, mobile devices still have more limited storage and processing speed compared to desktop PCs. For example, a smartphone might have 256 GB of storage, while a desktop could have 8 TB. To conserve battery life, mobile processors are often smaller and slower and have fewer cores than their desktop counterparts.

**Dominant Mobile Operating Systems:**
Two operating systems dominate this space:
1.  **Apple iOS:** Designed to run exclusively on Apple's devices like the iPhone and iPad.
2.  **Google Android:** An open-source OS that powers smartphones and tablets from many different manufacturers (like Samsung, Google, etc.).

We will examine these OSes in more detail in Chapter 2.

---

### **1.10.3 Client-Server Computing**

This is a fundamental model for organizing networked systems. In a **client-server system**, the workload is divided between two types of computers:
*   **Servers:** Powerful systems that provide services or resources.
*   **Clients:** Devices (like desktops, laptops, or smartphones) that use those services.

The general structure of this system is shown in **Figure 1.22**. Clients send requests over a network, and servers respond to those requests.
<br>![image.png](attachment:image.png)<br>
**Categories of Servers:**
Server systems can be broadly classified into two types based on the service they provide:

1.  **Compute Servers:**
    *   **What they do:** They provide an interface for clients to request that a specific action or computation be performed.
    *   **How it works:** The client sends a request (e.g., "process this data"). The server executes the action and sends the result back to the client.
    *   **Example:** A database server. When you search for a product on a website, your browser (the client) sends a query to the website's database server. The server processes the query, finds the matching products, and sends the results back to your browser.

2.  **File Servers:**
    *   **What they do:** They provide a file-system interface, allowing clients to create, read, update, and delete files.
    *   **How it works:** The client requests a specific file, and the server sends the entire file over the network.
    *   **Example:** A **web server** is a classic file server. When you visit a webpage, your browser requests the HTML, CSS, and image files from the web server. The server then sends those files to your browser to be displayed. The files can range from simple text to complex multimedia like high-definition video.
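
The difference between the two server categories can be sketched in a few lines. In this simplified model, plain method calls stand in for requests sent over a network, and the class names, product list, and file contents are all hypothetical.

```python
class ComputeServer:
    """Performs a requested action and returns only the result."""

    def __init__(self, products):
        self.products = products

    def handle(self, query):
        # The server does the work (searching); the client gets just the matches.
        return [name for name in self.products if query.lower() in name.lower()]


class FileServer:
    """Returns the contents of a requested file."""

    def __init__(self, files):
        self.files = files

    def handle(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]


database = ComputeServer(["Red Mug", "Blue Mug", "Green Plate"])
web = FileServer({"/index.html": "<h1>Shop</h1>"})

print(database.handle("mug"))      # ['Red Mug', 'Blue Mug']
print(web.handle("/index.html"))   # <h1>Shop</h1>
```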

### **1.10.4 Peer-to-Peer Computing**

**Peer-to-Peer (P2P)** computing is a different model for building distributed systems. Unlike the client-server model, there is no permanent distinction between clients and servers. Instead, every computer (or "node") in the network is considered a **peer**. Each peer can act as both a **client** (requesting a service) and a **server** (providing a service), depending on the situation.

**Advantage over Client-Server:**
The main advantage of a P2P system is that it eliminates the **bottleneck** of a central server. In a client-server system, if the server fails or gets overloaded with requests, the entire service can go down or become slow. In a P2P system, services can be provided by many different nodes distributed across the entire network, making the system more robust and scalable.

**How Peer-to-Peer Works: Discovering Services**
For a P2P system to function, a node must first join the network. Once it's part of the network, a key challenge is figuring out which peer offers the service or resource it needs. There are two primary ways this is accomplished:

1.  **Centralized Lookup Service:**
    *   **How it works:** When a node joins the network, it tells a **centralized server** what services or resources it can provide. This server acts as a directory or index. When a node needs a service, it first contacts this central lookup server to ask, "Who has what I need?" The lookup server responds with the address of the peer that can provide the service. After that, the two peers communicate directly with each other.
    *   **Analogy:** This is like a library's central card catalog. You go to the catalog to find which shelf has the book you want, then you go directly to that shelf to get it.

2.  **Decentralized Discovery (No Central Server):**
    *   **How it works:** This method uses no central directory. Instead, a peer that needs a service **broadcasts** a request to all the other peers it is connected to. The request essentially asks, "Does anyone have this file or service?" If a peer receives the request and can fulfill it, that peer responds directly to the requester. To make this work, the system requires a **discovery protocol**—a set of rules that allows peers to find each other and advertise their services.
    *   This scenario is illustrated in **Figure 1.23**, which shows a peer-to-peer system with no centralized service.
    *   **Analogy:** This is like shouting a question in a crowded room. Anyone who knows the answer can shout it back to you.
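
Both discovery methods can be compared in a small simulation. The `Peer` class, the three-node topology, and the file names are invented for this sketch. Forwarding a broadcast beyond a peer's direct neighbors is also an assumption made here so that the request can reach the whole network.

```python
class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []   # peers this node is directly connected to


def centralized_lookup(index, filename):
    """Ask a central directory which peers hold the file."""
    return sorted(index.get(filename, []))


def broadcast_lookup(start, filename):
    """Flood the request to neighbors (and their neighbors) with no directory."""
    seen, frontier, holders = {start.name}, [start], []
    while frontier:
        peer = frontier.pop(0)
        if filename in peer.files and peer is not start:
            holders.append(peer.name)
        for neighbor in peer.neighbors:
            if neighbor.name not in seen:   # do not ask the same peer twice
                seen.add(neighbor.name)
                frontier.append(neighbor)
    return sorted(holders)


a, b, c = Peer("A", []), Peer("B", ["song.mp3"]), Peer("C", ["song.mp3", "notes.txt"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]

# Centralized: each peer registered its files with the directory on joining.
index = {}
for peer in (a, b, c):
    for f in peer.files:
        index.setdefault(f, []).append(peer.name)

print(centralized_lookup(index, "song.mp3"))   # ['B', 'C']
print(broadcast_lookup(a, "song.mp3"))         # ['B', 'C']
print(broadcast_lookup(a, "missing.iso"))      # []
```

Both methods find the same holders; the difference is that the centralized version needs one query to a directory, while the broadcast version contacts peers across the network.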

**Historical and Modern Examples:**
P2P networks became famous in the late 1990s with file-sharing applications:
*   **Napster:** Used a **centralized lookup service**. A central server maintained an index of all the music files available on users' computers. When you searched for a song, you queried Napster's central server, which told you which user had the file. The actual file transfer then happened directly between your computer and the other user's computer. Napster was shut down due to copyright infringement lawsuits.
*   **Gnutella:** Used a **decentralized discovery** approach. When you searched for a file, your request was broadcast to other Gnutella users. Those who had the file would respond to you directly. This lack of a central server made it harder to shut down.

**A Hybrid Example: Skype**
Skype (especially its earlier versions) is a good example of a **hybrid peer-to-peer** system.
*   It uses a **centralized login server** to authenticate users when they first sign in.
*   However, once users are logged in, the system tries to establish direct **peer-to-peer** connections for voice and video calls (using **VoIP**, Voice over IP, technology) and for text messaging.
*   This hybrid approach combines the convenience of central management (for login) with the efficiency and scalability of direct peer-to-peer communication.

### **1.10.5 Cloud Computing**

**Cloud Computing** is a model for delivering computing resources—like processing power, storage, databases, and even full software applications—as a service over a network (almost always the Internet). Instead of owning and maintaining their own physical computing infrastructure, users can access these resources on-demand, paying only for what they use.

It can be seen as a large-scale extension of **virtualization**. Cloud providers have massive data centers filled with thousands of physical servers. They use virtualization to create countless **Virtual Machines (VMs)** on this hardware, which are then allocated to customers as needed.

**Types of Cloud Computing:**
Cloud computing is categorized in several ways. The categories often overlap, and a single cloud environment can provide a combination of them.

**1. By Deployment Model (Who can use it?):**
*   **Public Cloud:** The cloud infrastructure is owned and operated by a commercial provider (like Amazon Web Services (AWS), Microsoft Azure, or Google Cloud). It is made available to the general public over the Internet. Anyone can sign up and pay for services.
*   **Private Cloud:** The cloud infrastructure is operated solely for a single organization. It may be managed by the organization itself or by a third party, but it exists within the organization's firewall, offering more control and security.
*   **Hybrid Cloud:** A combination of public and private clouds that remain distinct but are connected by technology, allowing data and applications to be shared between them. This gives a business flexibility—for example, running its normal workload on a private cloud but "bursting" out to a public cloud for peak demand periods.

**2. By Service Model (What is provided?):**
*   **Software as a Service (SaaS):** Delivers full, ready-to-use applications over the Internet. The user doesn't manage the underlying infrastructure or platform; they just use the software. Examples: Gmail, Microsoft Office 365, Salesforce.
*   **Platform as a Service (PaaS):** Provides a platform or environment (including programming languages, databases, web servers, and development tools) that allows customers to develop, run, and manage their own applications without the complexity of building and maintaining the underlying infrastructure. Examples: Google App Engine, Microsoft Azure App Services.
*   **Infrastructure as a Service (IaaS):** Provides the fundamental computing resources: virtual machines, storage, networks, and operating systems. Users have control over the OS and deployed applications but do not manage the underlying cloud infrastructure. Examples: Amazon EC2 (for compute) and Amazon S3 (for storage).

**Cloud Management and Architecture:**
Inside a cloud data center, you will find traditional operating systems running on the physical servers. However, the key software layers that make cloud computing work are:
1.  **Virtual Machine Monitors (VMMs) / Hypervisors:** These manage the virtual machines on each physical server.
2.  **Cloud Management Tools:** These operate at a higher level, managing the entire pool of resources across all servers. Tools like VMware vCloud Director or open-source options like OpenStack and Eucalyptus orchestrate the VMMs, allocate resources to users, and provide the customer interface. Because these tools manage the fundamental resources of the entire data center, they can be considered a new type of large-scale, distributed operating system.
<br> ![image.png](attachment:image.png) <br>
**Figure 1.24** illustrates the architecture of a public cloud offering IaaS.
*   **Customer Requests:** Users send requests over the Internet.
*   **Firewall:** Both the cloud services and the management interface are protected by a **firewall** to ensure security.
*   **Customer Interface & Cloud Management:** The request goes through a customer interface, which is managed by the cloud management services. A **load balancer** distributes incoming requests across multiple servers to avoid overloading any single one.
*   **Infrastructure:** The management system then provisions the required resources from the pools of **servers**, **storage**, and **virtual machines**.
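
The load balancer's role in this architecture can be illustrated with a minimal sketch. The round-robin policy, the `RoundRobinBalancer` class, and the server names below are assumptions chosen for illustration; the figure does not say which policy a real cloud provider uses.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: hands each request to the next server in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._next_server = cycle(servers)

    def route(self, request):
        """Return (server, request) for the server chosen to handle it."""
        return next(self._next_server), request


# Hypothetical server pool and request stream.
balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.route(f"request-{i}")[0] for i in range(6)]

# Count how many requests each server received.
load = Counter(assignments)
print(load)
```

With six requests and three servers, each server receives two requests, so no single server is overloaded. Real load balancers typically also weigh current load and server health, which this sketch leaves out.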

### **1.10.6 Real-Time Embedded Systems**

**Embedded computers** are the most common type of computer in the world. They are specialized devices designed to perform specific, dedicated tasks and are built into larger systems. You find them in car engines, medical devices, industrial robots, microwave ovens, and TV remotes.

**Key Characteristics of Embedded Systems:**
*   **Specific Task:** They are designed for a single purpose or a limited set of functions.
*   **Primitive OS:** The operating systems they run are often very simple and lightweight, providing only the essential features needed for the task.
*   **Limited or No User Interface (UI):** They typically don't have a complex UI like a desktop computer. Instead, they spend most of their time monitoring sensors and controlling hardware directly.

**Variations in Embedded Systems:**
Not all embedded systems are the same. They exist on a spectrum of complexity:
1.  **General-Purpose Computer with Special Software:** Some are essentially standard computers running a general-purpose OS (like a stripped-down version of **Linux**) with a custom application that performs the specific task (e.g., a kiosk or a smart TV).
2.  **Dedicated Hardware with an Embedded OS:** Others are custom hardware devices that run a special-purpose **embedded operating system** built solely for that device's function.
3.  **Application-Specific Integrated Circuit (ASIC):** The simplest forms are hardware chips (ASICs) designed to perform a specific function without needing any operating system at all.

**Expanding Role: The "Smart" World**
The use of embedded systems is growing rapidly, especially with the rise of the "Internet of Things" (IoT). Entire homes can be automated, with a central computer (which could be an embedded system itself) controlling heating, lighting, and appliances. Web connectivity allows for remote control, like telling your house to turn up the heat before you arrive home. Future possibilities include a refrigerator that can automatically order milk when it senses you are running low.

---

#### **The Critical Link: Real-Time Operating Systems**

Most embedded systems run a **Real-Time Operating System (RTOS)**. A real-time system is used when there are **rigid, well-defined time constraints** on processing data and responding to events. Such systems are primarily used as control devices.

**How it Works:**
Sensors (like a thermometer or a motion detector) send data to the computer. The computer must analyze this data and, if necessary, send a command to an actuator (like a valve or a motor) within a strict deadline. Examples include:
*   Medical imaging systems (e.g., a CT scanner)
*   Industrial control systems (e.g., controlling a robotic arm on an assembly line)
*   Automotive systems (e.g., engine fuel injection, anti-lock brakes)
*   Avionics and weapon systems

**The Definition of "Failure" in Real-Time Systems:**
In a real-time system, **correctness depends not only on the right answer but also on the time taken to produce it.**
*   A result that is correct but delivered too late is a **system failure**.
*   **Example:** If a robot arm is building a car and receives a "halt" signal because something is wrong, the system fails if the arm does not stop *before* it crashes into the car. A delay of a few milliseconds could be catastrophic.

This is a fundamental difference from a general-purpose system like your laptop. On your laptop, it is *desirable* for the system to respond quickly, but a slight delay is usually acceptable. In a hard real-time system, a missed deadline is **not acceptable**.
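
The timing rule can be made concrete with a small sketch. The `Response` type, the two checking functions, the 5 ms deadline, and the sample response times are all hypothetical values chosen for illustration, not figures from any real controller.

```python
from dataclasses import dataclass


@dataclass
class Response:
    value_correct: bool  # did the computation produce the right answer?
    elapsed_ms: float    # time from event to response, in milliseconds


def hard_real_time_ok(response, deadline_ms):
    """A hard real-time result counts only if it is correct AND on time."""
    return response.value_correct and response.elapsed_ms <= deadline_ms


def general_purpose_ok(response):
    """A general-purpose system only needs a correct answer; lateness is tolerated."""
    return response.value_correct


DEADLINE_MS = 5.0  # hypothetical deadline for the robot arm's "halt" signal

on_time = Response(value_correct=True, elapsed_ms=3.0)
too_late = Response(value_correct=True, elapsed_ms=8.0)

print(hard_real_time_ok(on_time, DEADLINE_MS))   # True
print(hard_real_time_ok(too_late, DEADLINE_MS))  # False: correct but late is a failure
print(general_purpose_ok(too_late))              # True: a laptop would accept the delay
```

The same late response is a failure under the hard real-time check and a success under the general-purpose check, which is exactly the distinction drawn above.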

We will explore the scheduling algorithms that make real-time operation possible in **Chapter 5**, and look at the real-time features of the Linux kernel in **Chapter 20**.

## **1.11 Free and Open-Source Operating Systems**

The ability to study operating systems is greatly enhanced by the availability of **free** and **open-source** software. Both types provide the **source code** of the operating system: the human-readable instructions written by programmers. This contrasts with the **compiled binary code**, the machine-readable format that the computer actually executes.

It is important to understand that "free software" and "open-source software" are distinct concepts, championed by different communities.

*   **Free Software (or Free/Libre Software):** The "free" here refers to **freedom**, not just price. Free software is defined by its licensing, which guarantees users four essential freedoms:
    1.  The freedom to run the program for any purpose.
    2.  The freedom to study how the program works and change it (which requires access to the source code).
    3.  The freedom to redistribute copies.
    4.  The freedom to distribute copies of your modified versions to others.
*   **Open-Source Software:** This term focuses more on the practical benefits of having access to the source code, such as improved collaboration and security. While it requires the source code to be available, its licenses may not grant all the freedoms associated with "free software."

**The Key Difference:** **All free software is open source**, but **not all open-source software is "free"** in the libre sense. Some open-source licenses carry restrictions that conflict with the four freedoms.

**Examples of Operating System Models:**
*   **GNU/Linux:** The most famous example. It is an open-source operating system, and many of its distributions (like Ubuntu) are free software. However, some distributions may include proprietary components.
*   **Microsoft Windows:** A classic example of **closed-source** or **proprietary software**. Microsoft owns the code, restricts its use, and keeps the source code secret.
*   **Apple macOS:** A **hybrid** approach. Its core, **Darwin**, which includes the kernel, is open source. However, the user interface and many key components are proprietary and closed-source.

#### **Why Source Code Matters for Learning**

Having the source code is a powerful learning tool for several reasons:
*   **Transparency:** You can see exactly how the system works, from low-level scheduling to high-level system calls.
*   **Modification and Experimentation:** A student can modify the source code, recompile it, and run the modified OS to see the effects of their changes. This is an excellent way to understand complex algorithms.
*   **No Reverse Engineering Needed:** Reverse engineering binary code to understand functionality is extremely difficult and time-consuming. Source code provides the complete picture, including programmer comments.

This textbook will use high-level descriptions of algorithms but will also include pointers to open-source code for deeper study and projects that involve modifying OS source code.

#### **Benefits of the Open-Source Model**

The open-source development model offers significant advantages:
*   **Community Development:** A global community of interested programmers can contribute by writing, debugging, analyzing, and improving the code. Many of these contributors are volunteers.
*   **Security ("Linus's Law"):** The principle that "given enough eyeballs, all bugs are shallow" suggests that open-source code can be more secure because more people are examining it for vulnerabilities. While bugs exist, they are often found and fixed more quickly than in closed-source systems.
*   **Commercial Viability:** Companies like **Red Hat** have built successful business models around open-source software by selling support, customization, and integration services, rather than just the software license itself.

---

### **1.11.1 History**

The relationship between software and its source code has evolved significantly.

*   **1950s-1970s: The Era of Sharing**
    In the early days, software was commonly distributed with its source code. Computer enthusiasts and user groups freely shared and modified code. For example, Digital Equipment Corporation (DEC) distributed its operating systems as source code without restrictive copyrights.

*   **1980s: The Shift to Proprietary Software**
    As the software industry grew, companies began to see software as a primary product. To protect their intellectual property and generate revenue, they started distributing only the **compiled binary files**, keeping the source code secret. This created **proprietary software**. By the 1980s, this closed-source model had become the norm, even for operating systems on hobbyist computers.

### **1.11.2 Free Operating Systems**

In response to the growing trend of proprietary software, **Richard Stallman** launched a movement in 1984 to create a free, UNIX-compatible operating system called **GNU** (a recursive acronym for "GNU's Not Unix!").

**The Philosophy of "Free Software":**
For Stallman, "free" refers to **freedom**, not price. The movement does not oppose selling software, but insists that users must have four essential freedoms:
1.  The freedom to run the program for any purpose.
2.  The freedom to study how the program works and adapt it to your needs (which requires access to the **source code**).
3.  The freedom to redistribute copies so you can help your neighbor.
4.  The freedom to improve the program and release your improvements to the public, so that the whole community benefits.

In 1985, Stallman published the **GNU Manifesto**, outlining this philosophy, and founded the **Free Software Foundation (FSF)** to promote the development and use of free software.

**Copyleft and the GNU General Public License (GPL):**
To legally protect these freedoms, the FSF uses a concept called **copyleft**, which uses copyright law to achieve the opposite of its usual purpose. Instead of restricting use, copyleft ensures the software remains free.
*   The **GNU General Public License (GPL)** is the best-known copyleft license.
*   It grants the four freedoms but with a crucial condition: if you redistribute the program, or a modified version of it, you must do so under the **same license (the GPL)**. This "share-alike" clause prevents anyone from taking free code, modifying it, and turning it into a proprietary product. The source code must always be available.
*   This concept is similar to the Creative Commons "Attribution-ShareAlike" license.

---

### **1.11.3 GNU/Linux**

**GNU/Linux** is the prime example of a successful free and open-source operating system. Its development is a story of two projects merging.

**The Genesis: GNU and the Linux Kernel**
*   By 1991, the **GNU Project** had developed almost all the components of a complete operating system (compilers, editors, utilities, libraries) except for one critical part: a working kernel (the core of the OS that manages hardware).
*   In 1991, **Linus Torvalds**, a Finnish student, wrote a rudimentary UNIX-like kernel and released it on the Internet. He used the GNU development tools to build it.
*   Leveraging the Internet, thousands of programmers worldwide began contributing to Torvalds' kernel, which became known as **Linux**.
*   Initially, Linux had a non-commercial license. In 1992, Torvalds re-released it under the **GPL**, making it free software and allowing it to be combined with the GNU system.

**The Result: Distributions**
The combination of the Linux kernel and the GNU utilities created the complete **GNU/Linux** operating system. This has led to the creation of hundreds of different **distributions** (or "distros"), which are custom-built versions of the system. They vary in their target audience, pre-installed software, user interface, and support. Major distributions include:
*   **Red Hat Enterprise Linux:** For commercial, enterprise use.
*   **Ubuntu:** A popular, user-friendly distribution for desktops and servers.
*   **Debian:** A community-driven distribution known for stability.
*   **Specialized Distros:** Some are designed for specific purposes. For example, **PCLinuxOS** is a **live CD/DVD**—an OS that can be booted directly from a disc or USB drive without installing it on the computer's hard drive. A variant like **PCLinuxOS Supergamer DVD** comes pre-loaded with games and drivers.

#### **How to Run Linux for Study**

The text recommends an easy way to run Linux alongside your current operating system using **virtualization**:

1.  **Download a Virtual Machine Monitor (VMM):** Install a free tool like **VirtualBox** from https://www.virtualbox.org/. This software allows you to run an entire operating system as a "guest" within a window on your "host" OS.
2.  **Get a Linux Image:** You can either:
    *   Install an OS from scratch using an installation CD image.
    *   Download a pre-built virtual machine image from a site like http://virtualboxes.org/images/, which comes with an OS and applications already installed.
3.  **Boot the Virtual Machine:** Start the virtual machine within VirtualBox, and you will have a full Linux system running on your computer.

An alternative to VirtualBox is **QEMU**, whose `qemu-img` tool can convert VirtualBox disk images into formats that QEMU uses.

This textbook provides a virtual machine image of GNU/Linux running **Ubuntu**. This image contains the Linux source code and development tools. We will use this environment for examples and a detailed case study in **Chapter 20**.

### **1.11.4 BSD UNIX**

**BSD UNIX** has a longer and more complex history than Linux. It originated in 1978 as a set of modifications and enhancements to AT&T's original UNIX operating system, developed at the University of California at Berkeley (UCB).

**Key Historical Points:**
*   **Not Initially Open Source:** Early BSD releases included source code, but they were not considered "open source" in the modern sense because they required a license from AT&T, the original owner of UNIX.
*   **Legal Hurdles:** The development of BSD was significantly delayed by a lawsuit from AT&T over intellectual property. This legal battle was a major catalyst for the creation of completely free, AT&T-independent UNIX-like systems.
*   **Resolution:** The lawsuit was eventually settled, leading to the release of a fully functional, truly open-source version called **4.4BSD-Lite** in 1994. This release is the foundational ancestor of modern BSD systems.

**Modern BSD Distributions:**
Similar to Linux, there are several distributions (or "flavors") of BSD, each with a slightly different focus:
*   **FreeBSD:** Focuses on performance and ease of use on standard PC hardware.
*   **NetBSD:** Emphasizes portability, running on a vast array of hardware platforms.
*   **OpenBSD:** Prioritizes security and code correctness.
*   **DragonFly BSD:** Explores novel approaches to multiprocessing.

**How to Study BSD Source Code:**
The process is very similar to studying Linux:
1.  **Download a Virtual Machine Image:** You can download a pre-configured FreeBSD virtual machine image and run it using a virtual machine manager like VirtualBox (as described previously for Linux).
2.  **Locate the Source Code:** The entire operating system source code is included with the distribution and is stored in the directory `/usr/src/`.
3.  **Find the Kernel Code:** The kernel source code is located in `/usr/src/sys/`. For example, to study the virtual memory implementation, you would examine the files in the `/usr/src/sys/vm/` directory.
4.  **Online Browsing:** Alternatively, you can browse the source code online via the FreeBSD project's website: **https://svnweb.freebsd.org**.
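
Once the source tree is located, a short script can give a first overview of its layout. The sketch below counts C source and header files under each top-level subdirectory. The function name `count_source_files` is hypothetical, and the only path taken from the text is `/usr/src/sys`; on a machine without the FreeBSD sources installed, that directory does not exist and the script prints nothing.

```python
import os
from collections import Counter


def count_source_files(root, extensions=(".c", ".h")):
    """Count files with the given extensions under each top-level subdirectory of root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Group by the first path component, e.g. "vm" for "vm/..." ("." is root itself).
        top = rel.split(os.sep)[0]
        for name in filenames:
            if name.endswith(extensions):
                counts[top] += 1
    return counts


if __name__ == "__main__":
    # /usr/src/sys is where the text says the FreeBSD kernel source lives.
    for subdir, n in sorted(count_source_files("/usr/src/sys").items()):
        print(f"{subdir:20} {n}")
```

Subdirectories with large counts are a reasonable place to start reading; the `vm` entry, for instance, corresponds to the virtual memory code mentioned above.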

**Version Control Systems:**
The FreeBSD project, like most large open-source projects, uses a **Version Control System (VCS)** to manage changes to the source code. FreeBSD has historically used **Subversion (SVN)**.
*   **Purpose of a VCS:** These systems allow developers to "pull" the latest code to their computer, make changes, and "push" those changes back to a central repository. They also keep a complete history of every file, manage contributions from multiple developers, and help resolve conflicts.
*   **Other VCS:** Another extremely popular version control system is **git**, which is used to manage the Linux kernel source code and many other projects.
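
The pull/push cycle and the need for conflict handling can be sketched with a toy in-memory model. This is not how Subversion or git are implemented; `CentralRepo`, `pull`, `push`, and `PushRejected` are hypothetical names used only to illustrate the idea of a shared history.

```python
class PushRejected(Exception):
    """Raised when a push is based on an out-of-date revision."""


class CentralRepo:
    """Toy central repository that keeps every revision of a single file."""

    def __init__(self, initial_text=""):
        self.history = [initial_text]  # history[n] is the content at revision n

    def pull(self):
        """Return (revision number, content) of the latest revision."""
        return len(self.history) - 1, self.history[-1]

    def push(self, base_revision, new_text):
        """Record new_text as a new revision if it was based on the latest one."""
        latest = len(self.history) - 1
        if base_revision != latest:
            # Someone else pushed first; the developer must pull and merge.
            raise PushRejected(f"based on r{base_revision}, latest is r{latest}")
        self.history.append(new_text)
        return latest + 1


repo = CentralRepo("int x = 0;")

# Two developers pull the same revision.
rev_alice, text_alice = repo.pull()
rev_bob, text_bob = repo.pull()

# Alice pushes first and creates revision 1.
new_rev = repo.push(rev_alice, text_alice + " /* alice */")

# Bob's push is rejected because his copy is now out of date.
try:
    repo.push(rev_bob, text_bob + " /* bob */")
    bob_rejected = False
except PushRejected:
    bob_rejected = True

print(new_rev, bob_rejected, len(repo.history))
```

The rejected push is the point at which a real VCS would ask Bob to fetch Alice's change and merge it with his own before trying again. The full list in `history` stands in for the complete record of changes that a VCS keeps.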

**macOS and Darwin:**
The core of Apple's macOS, called **Darwin**, is based on BSD UNIX. Darwin itself is open-source. Its source code is available from **http://www.opensource.apple.com/**, with each macOS release having its corresponding open-source components posted. The package that contains the kernel has a name beginning with "xnu". Apple also provides extensive developer resources at **http://developer.apple.com**.

---

### **THE STUDY OF OPERATING SYSTEMS**

We are in a golden age for studying operating systems. The barriers to entry have never been lower, thanks to two major developments:

**1. The Open-Source Movement:**
*   **Access to Code:** Major operating systems like **Linux, BSD UNIX, Solaris, and parts of macOS** are available in both source and binary form. This allows us to move beyond just reading descriptions and to see how things actually work by examining the code itself.
*   **Historical Systems:** Even older, commercially obsolete operating systems have been open-sourced, allowing students to study the design constraints and solutions from eras with limited CPU, memory, and storage. A large list of open-source OS projects is available online.

**2. The Rise of Virtualization:**
*   **Easy Experimentation:** Free and widely available virtualization software like **VMware Player** and **VirtualBox** allows you to run hundreds of different operating systems as "virtual appliances" on a single physical machine. You can test, experiment, and even break an OS inside a virtual machine without affecting your main system or needing dedicated hardware.
*   **Hardware Simulation:** For truly historical study, simulators exist for old hardware (like the DECSYSTEM-20). This allows you to run an original operating system like TOPS-20, complete with its original source code, on a modern machine.

**From Student to Developer:**
This open environment makes the transition from student to contributor or even creator possible. With dedication and an internet connection, a student can download source code, modify it, and create their own operating system distribution. Access to knowledge and tools is now limited only by a student's interest and effort, not by proprietary restrictions.

### **1.11.5 Solaris**

**Solaris** is the UNIX-based operating system developed by Sun Microsystems (now owned by Oracle). Its history reflects the evolution of the UNIX family:

*   **Origins:** Sun's original operating system, **SunOS**, was based on **BSD UNIX**.
*   **Transition:** In 1991, Sun shifted its base from BSD to AT&T's **System V UNIX**, which led to the renaming of the operating system to Solaris.
*   **Open-Sourcing:** In 2005, Sun open-sourced most of the Solaris code under the name **OpenSolaris**. This move was significant as it provided access to the source code of a mature, enterprise-level commercial UNIX system.
*   **Oracle Acquisition:** Oracle's purchase of Sun, announced in 2009 and completed in 2010, created uncertainty about the future of the OpenSolaris project. Oracle ultimately discontinued the open-source model for the main Solaris product.

**The Illumos Project:**
The community that had grown around OpenSolaris continued its development independently. This effort coalesced into **Project Illumos**.
*   **Purpose:** Illumos is a community-driven, open-source fork of the OpenSolaris codebase. It has expanded beyond the original OpenSolaris base to include new features and improvements.
*   **Role:** Illumos now serves as the core for several derivative operating system distributions, keeping the OpenSolaris lineage alive. You can find more information at **http://wiki.illumos.org**.

---

### **1.11.6 Open-Source Systems as Learning Tools**

The open-source movement has created an unprecedented opportunity for students to learn about operating systems deeply and practically.

**Direct Hands-On Learning:**
*   **Examination and Modification:** Students can read the source code of mature, full-featured operating systems to understand how algorithms are implemented in real-world scenarios. They can then modify this code, compile it, and test their changes to see the direct effects.
*   **Community Participation:** Students can contribute to real projects by helping to find and fix bugs (**debugging**), which is an invaluable skill. This provides practical experience far beyond theoretical study.

**Access to Historical Context:**
The availability of source code for historic systems, like **Multics**, allows students to understand the design decisions and constraints of earlier eras of computing. This historical knowledge provides a stronger foundation for understanding modern systems and implementing new projects.

**Diversity of Systems:**
A major advantage is the ability to compare different systems. For example:
*   **GNU/Linux** and **BSD UNIX** are both open-source, but they have different histories, goals, licensing terms, and design philosophies.
*   This diversity allows students to see multiple solutions to the same fundamental problems (e.g., process scheduling, memory management).

**Cross-Pollination and Innovation:**
Open-source licenses often allow code to be shared between projects. This leads to **cross-pollination**, where the best features from one system are incorporated into another. For example, major components from **OpenSolaris**, such as its advanced filesystem (ZFS) and debugging tools (DTrace), have been ported to BSD-based systems and Linux. This sharing accelerates innovation and improvement across all open-source projects.

The benefits of open-source software are likely to continue driving an increase in the number, quality, and adoption of these projects by both individuals and companies.

## **1.12 Summary**

This chapter introduced the fundamental concepts of operating systems. Here is a summary of the key points:

*   **Operating System Definition:** An operating system is the software that acts as an intermediary between the computer hardware and application programs. It manages the hardware and provides an environment for programs to run.

*   **Interrupts:** These are crucial signals from hardware devices to the CPU, alerting it that an event requires attention (e.g., a key is pressed, a disk read is complete). The operating system uses an **interrupt handler** to manage these events.

*   **Main Memory (RAM):** This is the primary, volatile storage that the CPU can access directly. Programs must be loaded into main memory to be executed. Its contents are lost when power is turned off.

*   **Storage Hierarchy:** Computer storage is organized in a hierarchy based on speed and cost.
    *   **Top (Fast/Expensive):** CPU registers, cache.
    *   **Middle:** Main Memory (RAM).
    *   **Bottom (Slow/Inexpensive):** Nonvolatile storage like **hard disks**, which provide permanent, high-capacity storage for programs and data.

*   **Multiprocessor Systems:** Modern computers contain multiple processors (CPUs), and each CPU often contains multiple computing **cores**, allowing true parallel execution.

*   **CPU Management:**
    *   **Multiprogramming:** This technique keeps several jobs (processes) in memory at once. If one job waits for I/O, the CPU can switch to another job, so the CPU always has a job to execute as long as one is ready.
    *   **Multitasking (Time-Sharing):** An extension of multiprogramming that rapidly switches the CPU between processes, providing users with a fast, interactive response time.

*   **Dual-Mode Operation:** To protect the system, hardware supports two modes:
    *   **User Mode:** Where user applications run. Access to certain instructions and memory is restricted.
    *   **Kernel Mode:** Where the operating system runs. It has unrestricted access to all hardware instructions, including privileged instructions for I/O control, timer management, and interrupt handling.

*   **Process Management:** A **process** is the fundamental unit of work. The OS is responsible for creating, deleting, and managing processes, including enabling them to communicate and synchronize with each other.

*   **Memory Management:** The OS keeps track of memory usage, allocates memory to processes when they need it, and frees it when they are done.

*   **Storage Management:** The OS manages disk space and provides a **file system**—a way to store, organize, and retrieve files and directories on storage devices.

*   **Protection and Security:** The OS provides mechanisms to control access to resources (**protection**) and to defend the system from external and internal threats (**security**).

*   **Virtualization:** This technology involves abstracting physical hardware to create multiple, isolated execution environments (virtual machines) on a single physical machine.

*   **Data Structures:** Operating systems rely on fundamental data structures like lists, stacks, queues, trees, and maps to manage information efficiently.

*   **Computing Environments:** Operating systems are used in various settings:
    *   **Traditional Computing** (evolving office and home PCs)
    *   **Mobile Computing** (smartphones, tablets)
    *   **Client-Server Computing**
    *   **Peer-to-Peer (P2P) Computing**
    *   **Cloud Computing** (IaaS, PaaS, SaaS)
    *   **Real-Time Embedded Systems**

*   **Free and Open-Source Operating Systems:**
    *   These systems provide their **source code**, which is a powerful learning tool.
    *   **Free Software** emphasizes user freedoms (use, study, modify, redistribute).
    *   **Open-Source Software** focuses on the practical benefits of collaborative development.
    *   Examples include **GNU/Linux**, **FreeBSD**, and **OpenSolaris**.