

### **Chapter 4: Threads & Concurrency - Core Concepts Handbook**

This section covers the fundamental distinctions between processes and threads, the benefits of multithreading, and the various models for thread implementation.

---

#### **1. What is the main difference between a Process and a Thread?**

**Answer:**
The main difference lies in **resource ownership and execution context**.
*   A **Process** is an instance of a running program and operates as an *independent entity*. Each process has its own separate address space in memory (code, data, heap, stack) and its own system resources (such as open files). Communication between processes requires explicit inter-process communication (IPC) mechanisms such as pipes, sockets, or shared memory segments, which is more complex and more costly than sharing memory directly.
*   A **Thread** is a "lightweight process" and is the basic unit of CPU utilization within a process. Threads belonging to the *same process* share the process's address space and resources (e.g., global variables, open files). Each thread has its own independent stack and register set (including Program Counter) to track its execution state. Communication between threads of the same process is simple due to shared memory.

**In short: Processes are units of resource isolation, while Threads are units of execution and scheduling within a process.**

---

#### **2. Name and explain four main advantages of using Multithreading.**

**Answer:**

1.  **Responsiveness:** In interactive applications (like a desktop GUI or a web browser), multithreading can allow a program to remain responsive even if part of it is blocked or performing a long-running operation. For example, a user can interact with the interface (handled by one thread) while a file is loading in the background (handled by another thread).

2.  **Resource Sharing:** Threads by default share the memory and resources of their parent process. This sharing allows for efficient data sharing between multiple tasks without needing complex and slow Inter-Process Communication (IPC) mechanisms like shared memory segments or pipes.

3.  **Economy:** Creating and managing threads is significantly faster and consumes far fewer system resources than creating processes. Thread creation involves allocating a much smaller memory footprint (just a stack and registers) compared to duplicating an entire address space for a process. Context switching between threads is also cheaper than switching between processes.

4.  **Scalability and Performance:** On multi-core or multi-processor systems, multithreading allows true parallel execution. A single-threaded process can only run on one CPU core, while a multithreaded process can distribute its threads across multiple cores, leading to a substantial performance increase for computationally intensive tasks.

---

#### **3. Why is creating a Thread for each web server request better than creating a new Process?**

**Answer:**
Creating a thread is vastly more **economical** in terms of both time and resource consumption.
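*   **Process Creation Overhead:** Creating a new process requires setting up a new address space, page tables, and kernel bookkeeping structures. Even with copy-on-write, which avoids copying the parent's memory up front, this costs considerably more time and memory than creating a thread.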
*   **Thread Creation Overhead:** Creating a new thread within the existing server process is much faster, as it only requires allocating a small stack and register set. The code, data, and open files are already shared.
*   **Context Switching Overhead:** When a process blocks (e.g., on I/O), a full context switch (swapping page tables, registers, etc.) is required. A thread context switch within the same process is much faster because the memory address space remains the same.

For a web server that may handle thousands of concurrent requests, the performance difference between process-per-request and thread-per-request is substantial, which is why thread-based designs are generally preferred over creating a process for every request.

---

#### **4. What is the difference between Concurrency and Parallelism?**

**Answer:**
This is a critical distinction in computer science.
*   **Concurrency:** This means that **multiple tasks can make progress** without necessarily running at the same instant. It is about managing access to shared resources and dealing with multiple tasks that have overlapping lifetimes. On a single-core system, the CPU achieves concurrency by rapidly switching (time-slicing) between threads.
*   **Parallelism:** This means that **multiple tasks are executing simultaneously**. This requires hardware with multiple execution units (e.g., multiple CPU cores). Parallelism is a special case of concurrency: tasks that run in parallel are also concurrent, but concurrent tasks run in parallel only when the hardware provides more than one execution unit.

**Analogy:** One person who alternates between cooking and answering the phone is working concurrently but not in parallel; this corresponds to a single core time-slicing between threads. Two people, one cooking while the other takes the call, are working in parallel; this corresponds to two cores, each running a different thread at the same instant.

---

#### **5. What are the models for mapping User Threads to Kernel Threads?**

**Answer:**
These are the **Multithreading Models** that define the relationship between User Threads (UT - managed by a user-level thread library) and Kernel Threads (KT - supported and managed by the OS kernel).

1.  **Many-to-One Model:** Maps many user threads to a single kernel thread. (e.g., Classic Green Threads).
2.  **One-to-One Model:** Maps each user thread to a distinct kernel thread. (e.g., Windows, Linux).
3.  **Many-to-Many Model:** Multiplexes many user threads to a smaller or equal number of kernel threads.
4.  **Two-Level Model:** A variation of the Many-to-Many model that also allows a user thread to be bound to a specific kernel thread.

---

#### **6. What are the advantages and disadvantages of the Many-to-One model?**

**Answer:**

*   **Advantages:**
    *   **Efficiency:** Thread management (creation, scheduling, synchronization) is done entirely in user space, which is very fast and does not require a system call.
    *   **Portability:** It can be implemented on operating systems that do not natively support threads at the kernel level.

*   **Disadvantages:**
    *   **Lack of True Concurrency:** The entire process is blocked if one thread makes a blocking system call (e.g., I/O), because the single kernel thread that underpins them all is blocked.
    *   **No Parallelism:** It cannot take advantage of multiple CPU cores, as all user threads are ultimately scheduled on a single kernel thread, which is assigned to a single core.

---

#### **7. How is the Two-Level model different from the Many-to-Many model?**

**Answer:**
The Two-Level model is a specific, flexible implementation of the Many-to-Many concept.
*   **Many-to-Many Model:** This model strictly multiplexes an arbitrary number of user threads to a pool of kernel threads. The programmer has no direct control over which kernel thread a user thread runs on.
*   **Two-Level Model:** This model also multiplexes user threads to a pool of kernel threads, but it adds a crucial feature: it allows the programmer to **bind** a specific, important user thread directly to a specific kernel thread. This provides more control. The bound thread behaves like a thread in the One-to-One model (it has a dedicated kernel thread, so it can run in parallel with other threads and never has to wait for the thread library to assign it one), while the other, unbound threads behave as in the Many-to-Many model.

---

#### **8. What is the difference between Asynchronous and Synchronous threading?**

**Answer:**
This difference concerns how the parent thread manages its child threads.

*   **Synchronous (Blocking):** The parent thread creates one or more child threads and then must **wait for all of them to terminate** before it can itself continue its execution. The parent is blocked until all children finish their work.

*   **Asynchronous (Non-blocking):** The parent thread creates one or more child threads and then **immediately continues its own execution** while the children run. The parent and child threads run concurrently and independently.
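
The two strategies can be sketched with Python's `threading` module. This is a minimal illustration, not a prescribed implementation: `worker`, `run_synchronous`, and `run_asynchronous` are hypothetical names, and the short sleep only stands in for real work.

```python
import threading
import time

def worker(results, index):
    """Hypothetical child task: simulate some work, then record a result."""
    time.sleep(0.05)
    results[index] = index * index

def run_synchronous(n):
    """Fork-join style: the parent blocks in join() until every child is done."""
    results = [None] * n
    children = [threading.Thread(target=worker, args=(results, i)) for i in range(n)]
    for t in children:
        t.start()
    for t in children:
        t.join()              # the parent waits here
    return results            # safe to read: all children have finished

def run_asynchronous(n):
    """The parent starts the children and returns at once, without joining."""
    results = [None] * n
    children = [threading.Thread(target=worker, args=(results, i)) for i in range(n)]
    for t in children:
        t.start()
    return results, children  # results may still be incomplete at this point
```

In the asynchronous variant the caller decides later whether and when to join the children, so any result it reads before then may not have been written yet.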

---

#### **9. What is the advantage of using a Thread Pool?**

**Answer:**
The primary advantage is **performance optimization and resource control**.
Creating a new thread for every single task (e.g., for every web request) has significant overhead and can potentially lead to system thrashing if the number of concurrent tasks becomes too high. A thread pool creates a limited number of threads at process startup and keeps them alive for the duration of the program.

**Advantages:**
1.  **Faster Service:** Using an existing thread from a pool is faster than creating a brand new thread for each task.
2.  **Resource Limiting:** The pool limits the number of threads that can exist simultaneously, preventing a flood of tasks from overloading the system.
3.  **Flexible Scheduling:** The pool can manage how tasks are scheduled and executed (e.g., in a queue) without the application developer having to manage the thread lifecycle.

---

#### **10. On a 4-core system, what is the difference between running 8 CPU-bound threads and 8 I/O-bound threads?**

**Answer:**
The difference lies in how the threads utilize the system resources and how the OS scheduler behaves.

*   **8 CPU-bound Threads:** A CPU-bound thread spends most of its life performing computations, using the CPU intensively with very few I/O operations.
    *   **Behavior:** On a 4-core system, only 4 threads can run in parallel at any given instant. The other 4 wait in the ready queue for a time slice. The scheduler preempts running threads when their time slices expire so that all 8 get a fair share of CPU time. These involuntary **context switches** add overhead without raising throughput beyond what 4 threads would achieve. Overall CPU utilization will be near 100%.

*   **8 I/O-bound Threads:** An I/O-bound thread spends most of its time waiting for the completion of I/O operations (disk, network) and very little time using the CPU.
    *   **Behavior:** These threads frequently give up the CPU voluntarily by blocking on I/O. While some threads are blocked, the others can run on the available cores, so 8 I/O-bound threads on 4 cores achieve a high degree of **concurrency** with little contention for the cores themselves. Throughput is limited by the I/O devices rather than by the CPU, so the cores are often idle and CPU utilization is low. Context switches are mostly voluntary (one each time a thread blocks) and can be frequent, but because each thread runs only briefly, having more threads than cores is usually beneficial: it overlaps the I/O waits.
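
The I/O-bound case can be observed with a small timing sketch. It is illustrative only: `io_bound_task` and `timed_run` are hypothetical names, and `time.sleep` stands in for a blocking I/O call. CPython's global interpreter lock prevents CPU-bound threads from running in parallel, so this sketch deliberately covers only the I/O-bound half of the comparison.

```python
import threading
import time

def io_bound_task(delay):
    """Stand-in for a blocking I/O call; a sleeping thread does not use the CPU."""
    time.sleep(delay)

def timed_run(num_threads, delay):
    """Start num_threads I/O-bound threads and return the elapsed wall-clock time."""
    threads = [threading.Thread(target=io_bound_task, args=(delay,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# The eight waits overlap, so the total stays close to one delay
# instead of eight delays back to back.
elapsed = timed_run(8, 0.2)
print(f"8 I/O-bound threads took {elapsed:.2f} s")
```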

---

#### **11. When does adding more threads decrease system performance?**

**Answer:**
Adding more threads improves performance only up to a certain point, known as the "knee" of the performance curve. Beyond this point, performance degrades due to several overheads:

1.  **Increased Context Switching:** The operating system spends more time saving and restoring the state of threads (register values, stack pointers) and less time actually executing the application code. This is pure overhead.
2.  **Cache Thrashing:** When a thread is context-switched out, the data it was using (cached in the CPU's L1/L2 cache) is likely to be evicted by the next thread. When it switches back in, it experiences "cache misses," forcing slow accesses to main memory. With too many threads, the CPU caches are constantly being flushed and refilled.
3.  **Synchronization Overhead:** As the number of threads contending for shared resources increases, the time spent waiting for locks (mutexes) also increases. Threads block each other, leading to reduced parallelism and increased scheduler load.
4.  **Memory Consumption:** Each thread requires memory for its stack (often 1MB+ by default). Creating thousands of threads can exhaust the virtual address space of the process (especially on 32-bit systems) or the memory available to back the stacks.

---

#### **12. In a program with 100 threads, if one thread blocks on a system call, what happens in the Many-to-One model?**

**Answer:**
**The entire process and all 100 of its user-level threads are blocked.**
In the Many-to-One model, all user threads are mapped onto a **single kernel thread**. The kernel scheduler only sees this one kernel thread. If any user thread makes a blocking system call (e.g., reading from a file), it causes the one underlying kernel thread to block. Since the kernel is unaware of the user-level threads, it has no ability to schedule another one. The entire process is de-scheduled, and all other user threads, even if they are ready to run, are forced to wait until the system call completes and the kernel thread becomes unblocked. This is the major drawback of the Many-to-One model.

---

#### **13. What is the difference in resource consumption between Thread and Process context switching?**

**Answer:**
**Thread context switching is significantly cheaper and faster than process context switching.**

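*   **Process Context Switch:** Requires switching the virtual memory address space, in addition to saving and restoring registers. This involves:
    *   Switching the CPU's page table register to point to the new process's page table.
    *   Flushing the TLB (Translation Lookaside Buffer), unless the hardware tags TLB entries with an address-space identifier. After a flush, address translations must be reloaded from the page tables, which slows memory accesses after the switch.
    *   This makes it a comparatively expensive operation.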

*   **Thread Context Switch (within the same process):** Does **not** require switching the memory address space. The page table and TLB entries remain valid because the threads share the same memory map. The OS only needs to:
    *   Save and restore the CPU registers (Program Counter, Stack Pointer, general-purpose registers).
    *   Switch to a new kernel stack for the thread.

    This is a much lighter-weight operation.

---

#### **14. If threads in a program modify shared data without synchronization, what type of error occurs?**

**Answer:**
A **Race Condition** occurs.
A race condition is a situation where the behavior of the program depends on the relative timing or interleaving of multiple threads' execution. The final outcome is **non-deterministic** and can vary from one run to the next. This often leads to corrupted data, incorrect results, or program crashes. The specific error is that the final state of the shared data depends on which thread won the "race" to access and modify it last.
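
A minimal sketch of the problem and its usual fix, using a shared counter. The `Counter` class and `run` helper are hypothetical names introduced for illustration. Whether the unsynchronized version actually loses updates on a given run depends on the interpreter and on timing, which is exactly what makes race conditions hard to reproduce.

```python
import threading

class Counter:
    """Shared counter; each increment is a read-modify-write on shared state."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, add, write back. Another thread may run between the read and
        # the write, in which case one of the two updates is lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock turns the read-modify-write into a critical section.
        with self._lock:
            self.value += 1

def run(method_name, num_threads=4, iterations=10_000):
    """Call the chosen increment method from several threads; return the total."""
    counter = Counter()

    def work():
        step = getattr(counter, method_name)
        for _ in range(iterations):
            step()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

`run("safe_increment")` always returns `num_threads * iterations`. `run("unsafe_increment")` can return less, and the shortfall may differ from run to run.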

---

#### **15. If multiple threads are reading from shared memory and none are writing, is a lock needed?**

**Answer:**
**No, a lock is not needed in this specific scenario.**
This is known as a **"Read-Only"** shared state. As long as you can guarantee that no thread writes to the data, and that the data was fully initialized before the reader threads started, concurrent reads are perfectly safe. Multiple threads can read the same memory location simultaneously without causing data corruption or race conditions. Introducing a lock in this case would only add unnecessary overhead and limit concurrency.

---

#### **16. Why do some programs perform worse on multi-core systems than on single-core systems?**

**Answer:**
This performance degradation is usually caused by **coordination costs between cores** that appear only under true parallelism.

1.  **False Sharing:** Two threads on different cores modify *different* variables that happen to reside on the same CPU cache line. When one thread updates its variable, the cache coherency protocol (e.g., MESI) invalidates the entire cache line in the other core's cache, forcing that core to reload the line before its next access. The line bounces back and forth between the cores, and the resulting coherency traffic severely degrades performance, even though the threads aren't logically sharing data.
2.  **Highly Contended Locks:** On a single-core system, a lock might be acquired and released quickly with little contention because only one thread runs at a time. On a multi-core system, multiple threads can simultaneously contend for the same lock. The lock becomes a serialization point, forcing threads to wait idle, wasting CPU cycles and reducing parallel efficiency.
3.  **Inefficient Algorithms:** A program may use a parallelization strategy that has high communication overhead between threads, which outweighs the benefits of parallel computation.

---

#### **17. What is the difference between User-Level Scheduling and Kernel-Level Scheduling?**

**Answer:**
This distinction refers to *which entity* performs the scheduling of threads.

*   **User-Level Scheduling (User Threads):**
    *   **Scheduler:** The thread library (e.g., in the programming language runtime) acts as the scheduler at the user level.
    *   **Entities Scheduled:** User-level threads.
    *   **Advantage:** Very fast, as thread switching does not require a privileged mode switch to the kernel.
    *   **Disadvantage:** If one user-level thread blocks on a system call, the kernel blocks the entire process, and the user-level scheduler cannot run to switch to another ready user thread (the problem of the Many-to-One model).

*   **Kernel-Level Scheduling (Kernel Threads):**
    *   **Scheduler:** The operating system kernel itself is the scheduler.
    *   **Entities Scheduled:** Kernel-level threads.
    *   **Advantage:** The kernel can schedule another thread from the same process if one thread blocks, maintaining concurrency. It can also leverage multiple CPU cores effectively.
    *   **Disadvantage:** Slower, because thread operations (create, schedule, destroy) require a system call (a trap into the kernel).

Most modern operating systems, including Linux and Windows, use the One-to-One model, so their threads are scheduled by the kernel, which provides both concurrency and parallelism.

---

#### **18. Explain the difference between Deadlock, Livelock, and Starvation.**

**Answer:**
All three are related to resource contention but represent different states.

*   **Deadlock:** A situation where a set of threads are **blocked permanently**, each holding a resource and waiting for another resource held by a different thread in the set. None of them can make progress. (e.g., Thread A has Lock 1 and waits for Lock 2; Thread B has Lock 2 and waits for Lock 1). The state is static.
*   **Livelock:** Threads are **not blocked** but are unable to make progress because they are constantly reacting to each other's state changes. They are like two people trying to pass each other in a hallway who keep stepping side-to-side into each other's way. The threads are consuming CPU but doing no useful work. The state is active but unproductive.
*   **Starvation:** A thread is **unable to gain access to a necessary resource** because other threads (often with higher priority or different scheduling) are constantly being chosen over it. The thread is "starved" of CPU time or a lock. While the starved thread waits indefinitely, other threads may be making progress.
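
One common way to rule out the Thread A / Thread B deadlock described above is to make every thread acquire locks in the same global order. The sketch below assumes that ordering locks by `id()` is acceptable for illustration; `ordered`, `transfer`, and `run_both` are hypothetical names.

```python
import threading

lock_1 = threading.Lock()
lock_2 = threading.Lock()

def ordered(*locks):
    """Return the locks sorted by a fixed global key (here: id())."""
    return sorted(locks, key=id)

def transfer(first, second, log, name):
    # Whatever order the caller names the locks in, they are acquired in the
    # same global order, so a circular wait between two threads cannot form.
    a, b = ordered(first, second)
    with a:
        with b:
            log.append(name)

def run_both():
    log = []
    # Thread A asks for (lock_1, lock_2); thread B asks for (lock_2, lock_1).
    t_a = threading.Thread(target=transfer, args=(lock_1, lock_2, log, "A"))
    t_b = threading.Thread(target=transfer, args=(lock_2, lock_1, log, "B"))
    t_a.start()
    t_b.start()
    t_a.join(timeout=5)
    t_b.join(timeout=5)
    return sorted(log)
```

Without the `ordered` step, the two threads could each take their first lock and then wait forever for the other's.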

---

#### **19. Why is Asynchronous Thread Cancellation dangerous?**

**Answer:**
Asynchronous cancellation is dangerous because it immediately terminates a thread **at an arbitrary point in its execution**, which can leave the program's shared data in an inconsistent state.

**Specific Risks:**
1.  **Resource Leaks:** The thread might be holding a lock (like a mutex) when it is canceled. The lock is never released, so every other thread that needs that lock blocks permanently. Other resources the thread owns, such as open files or allocated memory, may also never be freed.
2.  **Data Structure Corruption:** The thread might be in the middle of updating a complex data structure (e.g., a linked list). If canceled mid-update, the data structure can be left with broken pointers or invalid invariants, leading to crashes or incorrect behavior later.

The safer alternative is **Deferred Cancellation**, where the target thread periodically checks if it should terminate and then cleans up its resources properly before exiting.
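
Deferred cancellation can be sketched with a shared flag that the target thread polls at safe points. This is an illustrative pattern, not the API of any particular thread library: `worker`, `run_and_cancel`, and the `state` dictionary are hypothetical.

```python
import threading
import time

def worker(stop_requested, lock, state):
    """Loop until cancellation is requested, checking only at a safe point."""
    while not stop_requested.is_set():   # cancellation point: no lock is held here
        with lock:                        # the lock is always released on exit
            state["items"] += 1
        time.sleep(0.001)
    state["cleaned_up"] = True            # cleanup runs before the thread exits

def run_and_cancel():
    stop_requested = threading.Event()
    lock = threading.Lock()
    state = {"items": 0, "cleaned_up": False}
    t = threading.Thread(target=worker, args=(stop_requested, lock, state))
    t.start()
    time.sleep(0.05)
    stop_requested.set()                  # request termination, do not force it
    t.join()
    return state, lock.locked()
```

Because the thread stops only between iterations, it never exits while holding the lock or halfway through an update.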

---

#### **20. Under what conditions is using Atomic Operations better than a Mutex?**

**Answer:**
Atomic operations are better when the critical section is **extremely small and simple**, involving a single memory location.

**Conditions Favoring Atomic Operations:**
*   The operation is a simple read-modify-write on a single variable (e.g., incrementing a counter: `count++`).
*   The operation is supported by the hardware via atomic CPU instructions (e.g., `compare-and-swap`, `fetch-and-add`).

**Reason:** Acquiring and releasing a mutex requires updating the lock's state with atomic instructions even when the lock is uncontended, and under contention it may require a system call that puts the waiting thread to sleep. An atomic operation is typically a single hardware instruction: it never blocks, avoids context switches, and cannot cause deadlock because no lock is held. For a simple counter update, an atomic increment is therefore usually cheaper than locking a mutex, updating, and then unlocking, especially under contention.
---

#### **21. Why do a large number of threads cause slowness or even crashes in some systems?**

**Answer:**
This is due to exhausting fundamental system resources and increased overhead:

1.  **Memory Exhaustion:** Each thread requires memory for its stack (default size can be 1-8 MB). Creating 1000 threads can consume 1-8 GB of virtual memory just for stacks, potentially exhausting the process's address space or physical RAM, leading to crashes or excessive paging.
2.  **High Context Switch Overhead:** With thousands of runnable threads, the OS spends a growing share of CPU time on scheduling and context switching instead of running application code. The effect resembles **thrashing**: the system is busy managing threads rather than doing useful work.
3.  **Kernel Resource Limits:** The OS kernel has internal data structures (e.g., process/thread control blocks) to manage threads. There is a system-wide or per-process limit on the number of threads. Exceeding this limit causes thread creation to fail, which crashes the application if the failure is not handled.

---

#### **22. When is it better to use Asynchronous I/O instead of Multithreading?**

**Answer:**
It is better to use Asynchronous (Async) I/O when you have a **very large number of I/O-bound operations** and the primary goal is to maximize I/O throughput with minimal resource usage.

*   **Multithreading (One-thread-per-request):** Requires a dedicated thread for each concurrent I/O operation. Managing 10,000 concurrent network connections with 10,000 threads is inefficient due to the memory and context-switching overhead.
*   **Asynchronous I/O:** A **single thread** can initiate thousands of I/O operations and be notified when they complete. The thread is not blocked waiting for I/O; it can use the waiting time to handle completed operations or submit new ones. This model scales to handle an enormous number of concurrent I/O operations with a very small, fixed number of threads (often called an "event loop").

**Use Case:** High-performance web servers, network proxies, and database systems.
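
A minimal sketch of the event-loop model using Python's `asyncio`. The handler is hypothetical, and `asyncio.sleep` stands in for a non-blocking network or disk wait.

```python
import asyncio
import time

async def handle_request(request_id, delay):
    """Hypothetical handler: the await stands in for a non-blocking I/O wait."""
    await asyncio.sleep(delay)
    return request_id

async def serve(num_requests, delay):
    # One thread, one event loop: all the waits are in flight at the same time.
    tasks = [handle_request(i, delay) for i in range(num_requests)]
    return await asyncio.gather(*tasks)

def run(num_requests=1000, delay=0.1):
    start = time.perf_counter()
    results = asyncio.run(serve(num_requests, delay))
    return results, time.perf_counter() - start
```

A call such as `run(1000, 0.1)` handles a thousand simulated requests on a single thread, in roughly the time of one wait rather than a thousand.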

---

#### **23. In a multithreaded program, if one thread enters an infinite loop, do the other threads continue?**

**Answer:**
**It depends on the thread type and the nature of the loop.**

*   **If the program uses Kernel-Level Threads (One-to-One model):** **Yes, the other threads will continue.** The OS scheduler sees them as independent entities. If the looping thread is on a CPU core, it will use its entire time slice, but the OS will preempt it and schedule other ready threads (both from the same process and others) on the available cores.
*   **If the program uses purely User-Level Threads (Many-to-One model):** **No, the other threads will typically hang.** The kernel sees only one thread of execution, and user-level thread libraries are usually scheduled cooperatively. If the looping user thread never yields or calls into the thread library, the user-level scheduler never gets a chance to run another thread, so the rest of the process makes no progress.

---

#### **24. What is the performance difference between a Thread-safe queue and a Non-thread-safe queue?**

**Answer:**
A **Thread-safe queue** uses synchronization mechanisms (like mutexes) to ensure that concurrent enqueue and dequeue operations do not corrupt the internal data structure. A **Non-thread-safe queue** has no such protection.

**Performance Difference:**
The thread-safe queue is **slower** due to synchronization overhead.
*   **Locking/Unlocking:** Every operation requires acquiring and releasing a lock, which involves atomic instructions and potential system calls.
*   **Contention:** Under high concurrency, threads will block each other, waiting for the lock, leading to serialized execution and reduced parallelism.

The non-thread-safe queue is faster, but it is safe only when accessed by a single thread or when the caller provides its own synchronization. The trade-off is speed against correctness under concurrent access.
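
As an illustration, Python's standard `queue.Queue` is a thread-safe queue that does its locking internally. The producer/consumer arrangement below is a sketch with hypothetical function names; `None` is used as a sentinel to mark the end of the work.

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)           # put() synchronizes internally

def consumer(q, out):
    while True:
        item = q.get()        # get() blocks until an item is available
        if item is None:      # sentinel: no more work
            break
        out.append(item)

def run(num_producers=4, per_producer=1000):
    q = queue.Queue()
    out = []
    cons = threading.Thread(target=consumer, args=(q, out))
    cons.start()
    prods = [
        threading.Thread(
            target=producer,
            args=(q, range(i * per_producer, (i + 1) * per_producer)),
        )
        for i in range(num_producers)
    ]
    for p in prods:
        p.start()
    for p in prods:
        p.join()
    q.put(None)               # all producers are done: tell the consumer to stop
    cons.join()
    return out
```

Every item arrives exactly once, although the order in which items from different producers interleave can vary between runs.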

---

#### **25. When is it better to use a Spinlock?**

**Answer:**
A spinlock is better when the **expected wait time for a lock is very short** and you are running on a **multi-core system**.

*   **How it works:** Instead of blocking (which involves a context switch), a thread waiting for a spinlock repeatedly checks (spins) in a tight loop until the lock becomes available.
*   **Advantage:** If the lock is held for a very short time, spinning avoids the significant overhead of a context switch.
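*   **Disadvantage:** If the lock is held for a long time, spinning wastes CPU cycles that could be used by another thread. On a single-core system, spinning in user code is counterproductive: the thread holding the lock cannot run while the waiter spins, so the waiter burns its time slice for nothing.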

**Use Case:** Protecting very short critical sections inside an OS kernel or low-level synchronization.
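
The spinning behavior can be sketched as follows. This is for illustration only: real spinlocks are built on hardware atomic instructions, whereas this hypothetical `SpinLock` class borrows a non-blocking `threading.Lock.acquire` as its atomic test-and-set step. In CPython the interpreter lock also limits how much real parallelism such a loop sees.

```python
import threading

class SpinLock:
    """Illustrative spinlock: busy-waits instead of putting the thread to sleep."""

    def __init__(self):
        # Used only as an atomic test-and-set flag, never acquired in blocking mode.
        self._flag = threading.Lock()

    def acquire(self):
        # Retry in a tight loop until the non-blocking attempt succeeds.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

def run(num_threads=4, iterations=2000):
    spin = SpinLock()
    total = [0]

    def work():
        for _ in range(iterations):
            spin.acquire()
            total[0] += 1     # very short critical section
            spin.release()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```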

---

#### **26. Suppose you have a program with 5000 threads, each doing only a few milliseconds of work. What alternative do you suggest?**

**Answer:**
The best alternative is to use a **Thread Pool**.

Instead of creating and destroying 5000 threads (which is very expensive), a fixed-size pool of worker threads (e.g., 50 threads) is created at startup. Tasks (the few milliseconds of work) are submitted to a queue. Worker threads in the pool take tasks from the queue, execute them, and then go back to get the next task.

**Benefits:**
*   Eliminates the overhead of frequent thread creation and destruction.
*   Controls resource consumption by limiting the total number of threads.
*   Allows the number of threads to be tuned to the number of available CPU cores.
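
A minimal sketch of this design using Python's `concurrent.futures.ThreadPoolExecutor`. The task body and the helper names are hypothetical; the pool size of 50 and the 5000 tasks follow the example above.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def short_task(n):
    """Hypothetical short job; also reports which worker thread ran it."""
    return n * 2, threading.current_thread().name

def run(num_tasks=5000, pool_size=50):
    # The executor queues the tasks and reuses at most pool_size worker threads.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(short_task, range(num_tasks)))
    values = [value for value, _ in results]
    workers = {name for _, name in results}
    return values, workers
```

All 5000 tasks complete, yet the set of worker names never has more than 50 entries, which shows that threads are reused rather than created per task.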

---

#### **27. Suppose a program consists of 30% serial code and 70% parallel code. According to Amdahl's Law, what is the maximum speedup on an 8-core system?**

**Answer:**
Amdahl's Law defines the theoretical speedup as: `Speedup = 1 / (S + (P / N))`
Where:
*   `S` is the fraction of the program that is serial (0.30).
*   `P` is the fraction that is parallel (0.70).
*   `N` is the number of processors (8).

`Speedup = 1 / (0.30 + (0.70 / 8)) = 1 / (0.30 + 0.0875) = 1 / 0.3875 ≈ 2.58`

**Conclusion:** Even with an infinite number of cores, the maximum speedup is limited to `1 / S = 1 / 0.30 ≈ 3.33`. The serial portion becomes the bottleneck.
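
The calculation is easy to reproduce for other core counts. The helper below is a direct transcription of the formula above; the function name is an arbitrary choice.

```python
def amdahl_speedup(serial_fraction, cores):
    """Theoretical speedup from Amdahl's Law: 1 / (S + P / N)."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Speedup for a program that is 30% serial, at several core counts.
for n in (1, 2, 4, 8, 16, 64, 1024):
    print(f"{n:>5} cores: {amdahl_speedup(0.30, n):.2f}x")
```

The printed values rise quickly at first and then flatten as they approach the `1 / S` limit.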

---

#### **28. If the OS does not use preemptive scheduling for threads, what problem might occur?**

**Answer:**
A single thread could **monopolize the CPU**, causing all other threads in the system to **starve**.

In a **cooperative (non-preemptive)** scheduling model, a thread keeps the CPU until it voluntarily yields control (e.g., by blocking on I/O or explicitly calling `thread_yield()`). If a thread has a bug and enters a long-running computation loop without yielding, it will never give up the CPU. This prevents other threads, even high-priority ones, from running, making the entire system unresponsive.

Preemptive scheduling solves this by allowing the OS to interrupt (preempt) a running thread after its time slice expires, ensuring fairness and responsiveness.
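
Cooperative scheduling can be imitated with Python generators, where each `yield` plays the role of a voluntary `thread_yield()`. This is a toy model with hypothetical names, not how an OS scheduler is implemented.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields the CPU."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                      # voluntary yield back to the scheduler

def run_round_robin(tasks):
    """Toy cooperative scheduler: it regains control only when a task yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task until its next yield
            ready.append(current)  # move it to the back of the ready queue
        except StopIteration:
            pass                   # the task has finished

log = []
run_round_robin([task("A", 2, log), task("B", 2, log)])
print(log)
```

A task whose body loops forever without reaching a `yield` would never return from `next(current)`, so the scheduler could never run any other task. That is the monopolization problem described above.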

---

#### **29. What is the difference between Data-Level Parallelism and Instruction-Level Parallelism (ILP)?**

**Answer:**
This is the difference between parallelizing *across data* and parallelizing *within a single instruction stream*.

*   **Data-Level Parallelism (DLP):** Executing the **same operation** on multiple **data elements** simultaneously. This is the paradigm behind **SIMD (Single Instruction, Multiple Data)** CPU instructions (e.g., SSE, AVX). A single instruction, like an addition, can add eight pairs of numbers at once (for example, eight 32-bit floats in a 256-bit AVX register). Compilers exploit it through auto-vectorization, and programmers exploit it explicitly in scientific computing and graphics code.

*   **Instruction-Level Parallelism (ILP):** Executing multiple **different instructions** from a **single thread** simultaneously. The CPU hardware does this dynamically at runtime by using techniques like pipelining, superscalar execution, and out-of-order execution. For example, while one instruction is fetching data from memory, the CPU can simultaneously execute an arithmetic instruction that doesn't depend on it. This is transparent to the programmer.

---

#### **30. Why does increasing the number of threads in a multithreaded program not necessarily increase speed?**

**Answer:**
Increasing threads does not increase speed when the program hits a bottleneck. Reasons include:

1.  **Amdahl's Law:** The **serial portion** of the code does not speed up with more threads and eventually dominates the runtime.
2.  **Resource Contention:** Threads start competing for shared resources like:
    *   **Locks:** Highly contended locks serialize execution, making threads wait.
    *   **Memory Bandwidth:** The memory bus becomes saturated, and cores wait for data.
    *   **I/O Bandwidth:** The disk or network becomes the bottleneck.
3.  **Synchronization Overhead:** The time spent on locking, signaling, and waiting increases with the number of threads.
4.  **Hardware Limits:** You cannot get a speedup beyond the number of available CPU cores for CPU-bound tasks. Adding more threads than cores only increases context switching overhead.