# What are Processes and Threads?

Understanding processes and threads is fundamental to writing concurrent and parallel programs in Python. This notebook explains the differences between processes and threads, and when to use each.

## What We'll Learn

1. What is a Process?
2. What is a Thread?
3. Differences Between Processes and Threads
4. Concurrency vs Parallelism
5. The Global Interpreter Lock (GIL)
6. When to Use Threads vs Processes
7. Simple Demonstration

---

## 1. What is a Process?

A **process** is an independent instance of a program running in memory. Each process has its own:
- Memory space (isolated from other processes)
- System resources (file handles, network connections, etc.)
- CPU state and registers
- Execution context

**Key Characteristics:**
- **Isolated**: Processes don't share memory with each other
- **Independent**: A crash in one process doesn't affect the others
- **Heavyweight**: Creating a process is resource-intensive
- **True Parallelism**: Can run on multiple CPU cores simultaneously
- **Inter-Process Communication (IPC)**: Requires special mechanisms to share data
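To see memory isolation in practice, here is a minimal sketch. A child process increments a module-level `counter`, and the parent then checks its own copy. The names `counter`, `increment`, and `demo_isolation` are illustrative. The child sends its value back through a `multiprocessing.Queue`, one of the IPC mechanisms mentioned above.

```python
import multiprocessing
import os

counter = 0  # module-level global; every process gets its own independent copy


def increment(queue):
    """Runs in the child process: change the global and report it back via IPC."""
    global counter
    counter += 1
    queue.put((os.getpid(), counter))


def demo_isolation():
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=increment, args=(queue,))
    p.start()
    child_pid, child_value = queue.get()  # data must be sent explicitly between processes
    p.join()
    return child_pid, child_value, counter  # `counter` here is the parent's copy


# The __main__ guard is needed when child processes re-import this module ("spawn" start method)
if __name__ == "__main__":
    child_pid, child_value, parent_value = demo_isolation()
    print(f"Child  (PID {child_pid}): counter = {child_value}")
    print(f"Parent (PID {os.getpid()}): counter = {parent_value}")  # unchanged by the child
```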

**Example:** When you open Chrome, each tab might be a separate process.

---

## 2. What is a Thread?

A **thread** is a lightweight unit of execution within a process. Multiple threads can exist within one process and share:
- Memory space (same heap, global variables)
- File descriptors and resources
- Process state

**Key Characteristics:**
- **Shared Memory**: Threads share the same memory space
- **Lightweight**: Creating a thread is faster than creating a process
- **Dependent**: A fatal error anywhere in the process (e.g., a segfault) terminates all of its threads
- **Synchronization Required**: Must coordinate access to shared data
- **GIL Limited**: In CPython, the Global Interpreter Lock lets only one thread execute Python bytecode at a time

**Example:** A web browser process might have multiple threads: one for the UI, one for downloads, and one for rendering.
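Because threads share memory, a plain dictionary created at module level is visible to all of them. In this sketch (the names are illustrative), four threads update one shared counter. The `threading.Lock` supplies the synchronization listed above: `shared["counter"] += 1` is a read-modify-write that is not guaranteed to be atomic.

```python
import threading

# One dict lives in the process's memory; every thread sees the same object.
shared = {"counter": 0}
lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    for _ in range(times):
        with lock:  # only one thread may perform the read-modify-write at once
            shared["counter"] += 1


def run_threads(num_threads=4, times=10_000):
    shared["counter"] = 0
    threads = [threading.Thread(target=increment, args=(times,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished its updates
    return shared["counter"]


if __name__ == "__main__":
    print(run_threads())  # 40000: all four threads updated the same dict
```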

---

## 3. Processes vs Threads: Key Differences

| Feature | Process | Thread |
|---------|---------|--------|
| **Memory** | Separate memory space | Shared memory space |
| **Creation Time** | Slow (heavyweight) | Fast (lightweight) |
| **Resource Usage** | High | Low |
| **Communication** | Complex (IPC required) | Simple (shared memory) |
| **Isolation** | High (isolated) | Low (shared) |
| **Parallelism** | True parallelism | Limited by the GIL (in CPython) |
| **Failure Impact** | Independent | A crash affects all threads |
| **Typical Use Case (Python)** | CPU-bound tasks | I/O-bound tasks |
| **Example** | Separate applications | Components of one application |
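To get a feel for the **Creation Time** row on your own machine, the sketch below times starting and joining a do-nothing thread versus a do-nothing process. The helper names are hypothetical. Absolute numbers depend heavily on the OS and the multiprocessing start method (`fork` vs `spawn`), so treat the output as a rough comparison only.

```python
import multiprocessing
import threading
import time


def noop():
    """Does nothing; only start-up and shutdown cost is measured."""


def time_startup(factory, count=10):
    """Start and join `count` workers built by `factory`; return mean seconds per worker."""
    start = time.perf_counter()
    for _ in range(count):
        worker = factory(target=noop)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    t = time_startup(threading.Thread)
    p = time_startup(multiprocessing.Process)
    print(f"Thread:  {t * 1e6:10.1f} µs per start/join")
    print(f"Process: {p * 1e6:10.1f} µs per start/join")
```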

---

## 4. Concurrency vs Parallelism

These terms are often confused but have distinct meanings:

### Concurrency
**Definition**: Multiple tasks making progress during overlapping time periods, typically by switching between them quickly (interleaving).

- Tasks start, run, and complete in overlapping time periods
- May or may not run simultaneously
- Can be achieved with single CPU core
- **Example**: Single chef cooking multiple dishes by switching between them
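The single-chef analogy can be sketched with generators. One worker (a round-robin loop on a single thread) advances each task by one step at a time, so the tasks overlap in time without ever running simultaneously. `dish` and `single_chef` are illustrative names, not a library API.

```python
from collections import deque


def dish(name, steps):
    """A task split into steps; each `yield` hands control back to the scheduler."""
    for step in range(1, steps + 1):
        yield f"{name} step {step}"


def single_chef(*dishes):
    """Round-robin scheduler: one worker interleaves several tasks."""
    queue = deque(dishes)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))  # do one step of this dish
        except StopIteration:
            continue  # this dish is finished; drop it from the queue
        queue.append(current)  # otherwise, come back to it later
    return log


if __name__ == "__main__":
    for entry in single_chef(dish("soup", 2), dish("salad", 3)):
        print(entry)
    # soup step 1, salad step 1, soup step 2, salad step 2, salad step 3
```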

### Parallelism
**Definition**: Multiple tasks executing at the same instant.

- Tasks actually run at the exact same moment
- Requires multiple CPU cores
- Subset of concurrency
- **Example**: Multiple chefs each cooking a different dish
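Since parallelism is capped by the number of cores, it can help to check how many a process may actually use. A small sketch, assuming `os.sched_getaffinity` (available only on some platforms, such as Linux) is preferred when present because it respects CPU pinning, with `os.cpu_count()` as the fallback:

```python
import os


def usable_cores():
    """Upper bound on how many tasks this process can run truly in parallel."""
    if hasattr(os, "sched_getaffinity"):
        # Counts only the CPUs this process is allowed to run on (e.g., in a container)
        return len(os.sched_getaffinity(0))
    # cpu_count() may return None if the count cannot be determined
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Up to {usable_cores()} tasks can run in parallel here")
```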

**In Python:**
- **Threads**: Provide concurrency (in CPython, the GIL prevents Python bytecode from running in parallel)
- **Processes**: Provide true parallelism

---

## 5. The Global Interpreter Lock (GIL)

The **GIL** is a mutex (lock) in CPython that allows only one thread to execute Python bytecode at a time.
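CPython does not let one thread hold the GIL indefinitely. Roughly every *switch interval*, the running thread is asked to release it so another runnable thread can take a turn. The interval is exposed through `sys.getswitchinterval()` (and can be changed with `sys.setswitchinterval()`):

```python
import sys

# Target time (in seconds) after which the thread holding the GIL is asked to release it
interval = sys.getswitchinterval()
print(f"GIL switch interval: {interval * 1000:.1f} ms")  # 5.0 ms unless it has been changed
```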

### Why Does the GIL Exist?

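- **Memory Management**: CPython tracks object lifetimes with reference counts, and updating them is not thread-safe on its own; the GIL protects them
- **C Extensions**: Extension modules can assume only one thread touches interpreter state at a time, which simplifies writing and integrating C code
- **Historical**: A single global lock was simpler to implement than fine-grained locking in the original interpreter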
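The memory-management point is about reference counting: every CPython object carries a count of references to it, updated on almost every operation. The sketch below uses `sys.getrefcount` to watch that count change. If two threads performed such updates at the same moment without a lock, increments could be lost, which is the kind of corruption the GIL prevents. Absolute counts vary between Python versions, so only the difference is printed.

```python
import sys

data = object()
before = sys.getrefcount(data)  # includes a temporary reference held by the call itself

holders = [data, data]  # the list now holds two more references to the same object
after = sys.getrefcount(data)

print(after - before)  # 2: each new reference incremented the object's count
```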

### Impact of the GIL

**Limitations:**
- Threads cannot execute Python bytecode in parallel; only one thread runs Python code at any moment
- CPU-bound multithreaded programs generally don't run faster than their sequential versions

**Not Limited:**
- Blocking I/O operations (file, network, `time.sleep`) release the GIL while they wait
- C extensions can release the GIL
- NumPy operations often release the GIL
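A quick way to observe these limitations is to run the same pure-Python, CPU-bound function with a thread pool and with a process pool from `concurrent.futures`. This is a sketch: `count_primes` (naive trial division) and `timed` are illustrative helpers, and the speedup you see depends on your core count and process start-up overhead.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Pure-Python CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(executor_cls, jobs):
    """Run count_primes over `jobs` with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=len(jobs)) as pool:
        results = list(pool.map(count_primes, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [100_000] * 4
    _, t_threads = timed(ThreadPoolExecutor, jobs)
    _, t_procs = timed(ProcessPoolExecutor, jobs)
    print(f"Threads:   {t_threads:.2f} s")  # the GIL serializes the work, so expect little speedup
    print(f"Processes: {t_procs:.2f} s")    # one GIL per process, so the jobs can use several cores
```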

### Solutions to GIL Limitations

1. **Use Multiprocessing**: Each process has its own GIL (true parallelism)
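2. **Use C Extensions**: NumPy (and libraries built on it, such as pandas) can release the GIL during heavy computations
3. **Use Alternative Interpreters**: Jython and IronPython have no GIL, though both lag behind CPython's language version
4. **Asyncio**: Runs many I/O-bound tasks concurrently in a single thread using an event loop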
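As a sketch of the asyncio option, three simulated requests run concurrently on one thread. `fake_request` is a hypothetical stand-in that sleeps instead of doing real network I/O. Inside a Jupyter notebook an event loop is already running, so there you would `await main()` directly instead of calling `asyncio.run(main())`.

```python
import asyncio
import time


async def fake_request(name, delay):
    """Hypothetical I/O call: awaiting a sleep stands in for waiting on a socket."""
    await asyncio.sleep(delay)  # yields control to the event loop while "waiting"
    return f"{name} done"


async def main():
    # gather() runs all three coroutines concurrently on a single thread
    return await asyncio.gather(
        fake_request("A", 0.1),
        fake_request("B", 0.1),
        fake_request("C", 0.1),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))  # ['A done', 'B done', 'C done']
    print(f"Time taken: {time.perf_counter() - start:.2f} seconds")  # about 0.1 s, not 0.3 s
```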

---

## 6. When to Use Threads vs Processes

### Use Threads (Threading) When:

✅ **I/O-Bound Tasks**
- Network requests (web scraping, API calls)
- File operations (reading/writing files)
- Database queries
- User input/output

✅ **Shared State Needed**
- Need to share data between concurrent tasks
- Lower memory overhead is important

✅ **Examples:**
- Web scraping multiple URLs
- Downloading multiple files
- Handling multiple client connections
- GUI applications with background tasks
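A typical I/O-bound pattern is a thread pool that fans out requests. In this sketch, `download` is hypothetical and only sleeps to imitate network latency, and the URLs are placeholders on the reserved `example.com` domain; real code would make an actual HTTP call in its place.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download(url):
    """Hypothetical download: sleeping mimics waiting on the network."""
    time.sleep(0.2)  # blocking waits release the GIL, so the other threads keep running
    return f"contents of {url}"


urls = [f"https://example.com/page{i}" for i in range(5)]  # placeholder URLs

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(download, urls))  # results come back in input order
elapsed = time.perf_counter() - start

print(f"{len(pages)} pages in {elapsed:.2f} s")  # about 0.2 s instead of ~1.0 s sequentially
```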

### Use Processes (Multiprocessing) When:

✅ **CPU-Bound Tasks**
- Mathematical computations
- Data processing and transformation
- Image/video processing
- Machine learning training

✅ **True Parallelism Needed**
- Want to utilize multiple CPU cores
- CPU-heavy work is the performance bottleneck

✅ **Isolation Required**
- Tasks should not interfere with each other
- Stability is important (crash isolation)

✅ **Examples:**
- Data analysis on large datasets
- Image processing batch jobs
- Scientific computations
- Encoding/decoding multimedia
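To illustrate crash isolation, this sketch starts a child process that terminates abruptly via `os._exit(3)`, standing in for a hypothetical hard crash. The parent keeps running and can read the child's `exitcode`:

```python
import multiprocessing
import os


def unstable_job():
    """Hypothetical job that dies abruptly (e.g., a crash inside a C extension)."""
    os._exit(3)  # terminate this child process immediately with exit code 3


def run_isolated():
    p = multiprocessing.Process(target=unstable_job)
    p.start()
    p.join()
    return p.exitcode  # the parent survives and can inspect how the child ended


# The __main__ guard is needed when child processes re-import this module ("spawn" start method)
if __name__ == "__main__":
    print(f"Child exit code: {run_isolated()}")  # 3
    print("Parent is still running")
```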

---

## 7. Simple Demonstration

Let's see a simple example comparing sequential, threaded, and multiprocessing approaches.

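```python
import time
import threading
import multiprocessing
import os


def task(name, duration):
    """Simulate a task that takes some time (sleeping stands in for waiting on I/O)."""
    print(f"Task {name} starting (PID: {os.getpid()}, Thread: {threading.current_thread().name})")
    time.sleep(duration)
    print(f"Task {name} completed")


# The __main__ guard is required for multiprocessing with the "spawn" start method
# (the default on Windows and macOS), because child processes re-import this module.
# Under "spawn", functions defined in a notebook cell cannot be loaded by the child,
# so run this code as a .py script if the multiprocessing part fails.
if __name__ == "__main__":
    names = ["A", "B", "C"]

    # Sequential execution: each task waits for the previous one (~3 s total)
    print("=== Sequential Execution ===")
    start = time.perf_counter()
    for name in names:
        task(name, 1)
    print(f"Time taken: {time.perf_counter() - start:.2f} seconds\n")

    # Threading (concurrent): the sleeps overlap because time.sleep releases the GIL
    print("=== Threading (Concurrent) ===")
    start = time.perf_counter()
    threads = []
    for name in names:
        t = threading.Thread(target=task, args=(name, 1))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the clock
    print(f"Time taken: {time.perf_counter() - start:.2f} seconds\n")

    # Multiprocessing (parallel): each task runs in its own process with its own PID
    print("=== Multiprocessing (Parallel) ===")
    start = time.perf_counter()
    processes = []
    for name in names:
        p = multiprocessing.Process(target=task, args=(name, 1))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    # Threads and processes both take ~1 s here because the tasks only sleep;
    # processes pull ahead only for CPU-bound work, where the GIL serializes threads.
    print(f"Time taken: {time.perf_counter() - start:.2f} seconds")
```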

---

## Summary

**Key Takeaways:**

1. **Process**:
   - Independent program instance with separate memory
   - True parallelism on multiple cores
   - Higher resource usage
   - Use for CPU-bound tasks

2. **Thread**:
   - Lightweight execution unit within a process
   - Shares memory with other threads
   - Limited by GIL in Python
   - Use for I/O-bound tasks

3. **Concurrency vs Parallelism**:
   - Concurrency: Tasks make progress by switching
   - Parallelism: Tasks run simultaneously

4. **GIL (Global Interpreter Lock)**:
   - Allows only one thread to execute Python bytecode at a time
   - Has little impact on I/O-bound work, since blocking I/O releases it
   - Multiprocessing bypasses GIL

5. **Decision Guide**:
   - **I/O-bound** → Threading
   - **CPU-bound** → Multiprocessing
   - **Shared state** → Threading
   - **Isolation needed** → Multiprocessing

**Next Steps:**
- Learn threading implementation
- Learn multiprocessing implementation
- Explore `concurrent.futures` for a simpler, higher-level API