# 🧠 Python Multiprocessing & Multithreading: A Beginner's Guide

Welcome to this beginner-friendly guide on Python parallel programming!

---

## 🎯 Purpose of This Notebook

This notebook is designed to help you understand **how to make Python code faster and more efficient** using multithreading (the `threading` module) and multiprocessing (the `multiprocessing` module). These techniques allow you to **run multiple tasks at the same time**, which is incredibly useful in data science, automation, analytics, and beyond.

---

## 👤 Who This Is For

Whether you're a:
- 🧪 Data Scientist running complex computations
- 📊 Data Analyst processing large Excel/CSV files
- 🤝 Consultant automating reporting pipelines

…this notebook will help you **get started with concurrency and parallelism in Python.**

And guess what? I'm learning it too, and sharing everything I understand so that we all learn together.

---

## ⚡ Why This Matters

By default, a Python script runs one statement at a time in a single thread, even on powerful computers with multiple CPU cores. This means your scripts can be **slow** when handling:
- Large datasets
- Time-consuming calculations
- Slow APIs
- Disk or network I/O

By learning **parallel programming**, you can:
- Speed up your code dramatically 🚀
- Make your applications more responsive
- Get better at designing efficient workflows

Let's dive in!


```python
import time

def slow_task(name, duration):
    print(f"Starting task {name}...")
    time.sleep(duration)
    print(f"Finished task {name} in {duration} seconds.")

# Start measuring time
start_time = time.time()

# Run 3 slow tasks one after another
slow_task("A", 2)
slow_task("B", 2)
slow_task("C", 2)

end_time = time.time()
print(f"\nTotal time taken: {round(end_time - start_time, 2)} seconds")
```

```text
Starting task A...
Finished task A in 2 seconds.
Starting task B...
Finished task B in 2 seconds.
Starting task C...
Finished task C in 2 seconds.

Total time taken: 6.0 seconds
```


## 🐢 Sequential Execution in Python

Let's begin by understanding how Python runs tasks **one after the other**. This is called **sequential execution**.

The cell above simulates this by running three slow tasks, each taking 2 seconds, with this function:

```python
import time

def slow_task(name, duration):
    print(f"Starting task {name}...")
    time.sleep(duration)  # Pause to simulate slow work
    print(f"Finished task {name} in {duration} seconds.")
```

Because each task must finish before the next one starts, the total is 2 + 2 + 2 = 6 seconds:

```text
Time →   0s       2s       4s       6s
         |--------|--------|--------|
Task A:  [--------]
Task B:           [--------]
Task C:                    [--------]
```


# ❓ Why Should You Care About Parallel Programming?

---

## ⚠️ The Problem

A Python program is **single-threaded by default**: it executes **one statement at a time** on one core, even on machines with multiple CPU cores.

This leads to performance issues when dealing with:

- ⏳ **Slow operations** like reading large files, downloading from the internet, or calling APIs
- 🧠 **Heavy CPU tasks** like training machine learning models or looping over large data
- 🔁 **Repetitive operations** that could run independently, but don't

---

## 💡 Why care about this?

Most data folks run loops for:
- File loads
- Model training
- Metric calculations

These loops are often **independent** and can be run in **parallel**, but we don't do it by default.

This results in **wasted time** and **under-utilized CPUs**.

---

## 🔁 Real-World Examples

| Use Case                   | Without Parallelism (Slow) | With Parallelism (Fast)     |
|----------------------------|----------------------------|-----------------------------|
| Downloading 100 files      | One-by-one ⏳              | Many at once 🚀             |
| Reading large CSV files    | Serially in a loop         | Split across processes      |
| Processing user sessions   | One-by-one in a for loop   | Distribute across CPU cores |
| Scoring multiple ML models | Sequential scoring         | Parallel scoring            |

---

## 💥 The Opportunity

By using multithreading and multiprocessing, you can:

✅ Use all CPU cores (with `multiprocessing`)  
✅ Speed up workflows by 2x, 4x, or more  
✅ Write faster, scalable, and more responsive code  

---

### 🧠 Rule of Thumb

> If your task **waits a lot** (e.g., downloading, reading), use **multithreading**.  
> If your task **works the CPU a lot** (e.g., math, transformations), use **multiprocessing**.

We'll explore both with simple, visual examples next!

---

## 🧠 Bonus Analogy

Imagine a restaurant with 4 chefs. But only one chef is cooking at a time because the manager gives instructions to one person at a time.

Now imagine the manager gives instructions to all 4 chefs at once: **tasks get done faster**.

That's the difference between sequential and parallel execution in Python.



# 🔄 What Are Multiprocessing and Multithreading?

Let's start from the very basics.

---

## 🔹 What Is a Process?

A **process** is simply a program that is currently running.

Examples:
- When you open Google Chrome → a process starts.
- When you run a Python script → a process starts.

Each process has:
- Its **own memory space**
- Its **own Python interpreter**
- Its **own resources**

Processes are **independent** and do not share memory. That's why they're safe for heavy computation.

---

## 🧩 What Is Multiprocessing?

**Multiprocessing** means running **multiple Python processes** at the same time.

- True parallelism using multiple CPU cores
- Each process runs independently with its own memory
- Safe from memory conflicts, but uses more RAM

### ✅ Best For:
- CPU-heavy tasks
- Mathematical operations
- Data transformations
- Model training
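
As a minimal sketch of the idea, the block below spreads a CPU-heavy job across worker processes with `multiprocessing.Pool`. The `count_primes` function and the workload sizes are made up for illustration and are not part of the original notebook. It assumes the code is saved and run as a `.py` script: on Windows and macOS, worker processes re-import the main module, so the `if __name__ == "__main__":` guard is needed, and functions defined directly in a notebook cell may not be visible to the workers.

```python
import math
from multiprocessing import Pool

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it evenly
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]  # illustrative workloads

    # The pool hands each limit to one of 4 worker processes.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)

    for limit, primes in zip(limits, results):
        print(f"Primes below {limit}: {primes}")
```

`pool.map` returns results in the same order as the inputs, so `zip(limits, results)` pairs them up correctly even if the workers finish out of order.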


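## 🧵 What Is a Thread?

A **thread** is the smallest unit of execution within a process.

- A single Python process can contain **multiple threads**
- All threads inside a process **share the same memory space**
- Threads are **lightweight** and quick to create
- They run concurrently: in standard CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python code at a time, so threads take turns with very fast switching, and a thread that is waiting on I/O lets the others run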

---

## 🤖 What Is Multithreading?

**Multithreading** is the ability of a program (process) to run **multiple threads** at once.

- Threads make progress on different tasks **concurrently** (overlapping in time, not necessarily in true parallel)
- Useful when tasks spend time **waiting** (e.g., downloading data, reading files)
- All threads share the **same memory**, which makes communication easier but also riskier, because two threads can change the same data at the same time
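
To make the shared-memory risk concrete, here is a small sketch (not from the original notebook) in which four threads update one shared counter. The names `counter`, `counter_lock`, and `add_many` are illustrative. The `threading.Lock` ensures only one thread at a time performs the read-modify-write step; without it, updates from different threads could in principle overwrite each other.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # guards updates to `counter`

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread may be inside this block
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

The lock costs a little speed, which is the usual trade-off: shared memory makes it easy for threads to cooperate, but any data that several threads modify needs this kind of protection.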

---

### ✅ Best For:

- ⬇️ Downloading files from the internet
- 📡 Calling APIs
- 📂 Reading or writing many files
- 🖥️ Monitoring or logging systems
- 💤 Tasks that mostly wait (I/O-bound)


---
# 🔲 Visualization
## Multithreading

```text
[Main Program]
     │
     └── [One Process with Shared Memory]
           ├── Thread 1 ── 📥 API Call
           ├── Thread 2 ── 📤 File Write
           └── Thread 3 ── 📁 Read CSV
```

✔ Threads run concurrently  
✔ All threads share the same memory  
✔ Great for I/O-bound tasks

---
## Multiprocessing

```text
[Main Program]
     │
     ├── [Process 1] ── 🔢 Task A (Own memory)
     ├── [Process 2] ── 🧮 Task B (Own memory)
     └── [Process 3] ── 📊 Task C (Own memory)
```

✔ Processes run in parallel  
✔ Each process has separate memory  
✔ Great for CPU-bound tasks

# 🧵 Multithreading in Action – Simulated CSV Downloads

Let's simulate a common data science task:
> Downloading multiple datasets or reports (e.g., from APIs, internal dashboards)

We'll simulate each download with `time.sleep()`: 2 seconds per file in the sequential version and 4 seconds per file in the threaded version.

---




## Example 1: Simulated file download

### 🐢 Version 1 – Sequential Downloads

We loop through one file at a time.

```python
import time

# This function simulates downloading a CSV file.
# It waits for 2 seconds to mimic download time.
def download_csv(name):
    print(f"Starting download: {name}")
    time.sleep(2)  # Simulate time delay (e.g., waiting for an API)
    print(f"Finished download: {name}")

# Record the start time
start_time = time.time()

# Simulate downloading 5 files, one after the other (sequentially)
for i in range(5):
    download_csv(f"file_{i+1}.csv")

# Record the end time and print how long it took
end_time = time.time()
print(f"\nTotal time taken: {round(end_time - start_time, 2)} seconds")
```

```text
Starting download: file_1.csv
Finished download: file_1.csv
Starting download: file_2.csv
Finished download: file_2.csv
Starting download: file_3.csv
Finished download: file_3.csv
Starting download: file_4.csv
Finished download: file_4.csv
Starting download: file_5.csv
Finished download: file_5.csv

Total time taken: 10.0 seconds
```


---

### ⚡ Version 2 – Parallel Downloads with Threads

Let's now use the `threading` module to run the downloads concurrently.


```python
import threading
import time

# Same download simulation as before, except that each download
# now waits 4 seconds instead of 2
def download_csv(name):
    print(f"Starting download: {name}")
    time.sleep(4)
    print(f"Finished download: {name}")

# Record start time
start_time = time.time()

# Store thread objects
threads = []

# Launch 5 threads that run concurrently
for i in range(5):
    # Create a thread that runs the download_csv function
    t = threading.Thread(target=download_csv, args=(f"file_{i+1}.csv",))
    threads.append(t)  # Keep a reference to the thread
    t.start()          # Start the thread immediately

# Wait for all threads to complete before moving on
for t in threads:
    t.join()

# Record end time
end_time = time.time()
print(f"\nTotal time taken with threads: {round(end_time - start_time, 2)} seconds")
```

```text
Starting download: file_1.csv
Starting download: file_2.csv
Starting download: file_3.csv
Starting download: file_4.csv
Starting download: file_5.csv
Finished download: file_1.csv
Finished download: file_2.csv
Finished download: file_3.csv
Finished download: file_5.csv
Finished download: file_4.csv

Total time taken with threads: 4.01 seconds
```
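
The same start-then-join pattern can also be written with `ThreadPoolExecutor` from the standard library's `concurrent.futures` module, which manages the threads for you. This is an alternative sketch, not the notebook's own method: the `delay` parameter and the returned status string are additions for illustration, and the delay is shortened to 0.5 seconds so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download_csv(name, delay=0.5):
    """Simulate a download by sleeping, then return a status string."""
    time.sleep(delay)
    return f"{name} done"

file_names = [f"file_{i+1}.csv" for i in range(5)]

start_time = time.time()

# The pool starts up to 5 worker threads and waits for them on exit.
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in the same order as file_names
    statuses = list(executor.map(download_csv, file_names))

elapsed = time.time() - start_time
print(statuses)
print(f"Total time taken with a thread pool: {round(elapsed, 2)} seconds")
```

Compared with creating `threading.Thread` objects by hand, the pool gives you return values directly and caps how many threads run at once, which matters when the list of files is long.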


## Example 2: Actual file download

### 🐢 Version 1 – Sequential Downloads

We fetch one JSON post at a time in a loop.

```python
import requests
import time
import json

# Function to fetch a single post by ID from the API
def fetch_post(post_id):
    print(f"Fetching post {post_id}")
    # timeout is an added safeguard so a stalled request cannot hang forever
    response = requests.get(f"https://jsonplaceholder.typicode.com/posts/{post_id}", timeout=10)
    print(f"Done fetching post {post_id}")
    return response.json()

# Start timing
start = time.time()

results = {}

# Loop through post IDs 1 to 5
# These calls happen one after the other (sequentially)
for i in range(1, 6):
    results[i] = fetch_post(i)

# End timing
end = time.time()
print(f"\n⏱️ Total time (sequential): {round(end - start, 2)} seconds\n")

# Print all post responses in a clean JSON format
for post_id in results:
    print(f"📄 Post ID {post_id}:\n")
    print(json.dumps(results[post_id], indent=2))
    print("\n" + "-"*60 + "\n")
```

```text
Fetching post 1
Done fetching post 1
Fetching post 2
Done fetching post 2
Fetching post 3
Done fetching post 3
Fetching post 4
Done fetching post 4
Fetching post 5
Done fetching post 5

⏱️ Total time (sequential): 1.99 seconds

📄 Post ID 1:

{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}

------------------------------------------------------------

📄 Post ID 2:

{
  "userId": 1,
  "id": 2,
  "title": "qui est esse",
  "body": "est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla"
}

------------------------------------------------------------

📄 Post ID 3:

{
  "userId": 1,
```

---

### ⚡ Version 2 – Parallel Downloads with Threads

Let's now use the `threading` module to run the downloads concurrently.


```python
import threading
import requests
import time
import json

# Shared dictionary to store thread results (keyed by post_id).
# Each thread writes only to its own key, so no lock is used here.
results = {}

# Function to run inside each thread
def fetch_post_thread(post_id):
    print(f"Fetching post {post_id}")
    # timeout is an added safeguard so a stalled request cannot hang forever
    response = requests.get(f"https://jsonplaceholder.typicode.com/posts/{post_id}", timeout=10)
    results[post_id] = response.json()
    print(f"Done fetching post {post_id}")

# Start timing
start = time.time()

threads = []  # List to hold all thread objects

# Launch one thread per post_id (1 to 5)
for i in range(1, 6):
    # Create a new thread to fetch a post
    t = threading.Thread(target=fetch_post_thread, args=(i,))
    threads.append(t)  # Add to the thread list
    t.start()          # Start the thread (runs in background)

# Wait for all threads to finish before moving on
for t in threads:
    t.join()

# End timing
end = time.time()
print(f"\n⚡ Total time (multithreaded): {round(end - start, 2)} seconds\n")

# Display all results in readable format
for post_id in sorted(results):
    print(f"📄 Post ID {post_id}:\n")
    print(json.dumps(results[post_id], indent=2))
    print("\n" + "-"*60 + "\n")
```

```text
Fetching post 1
Fetching post 2
Fetching post 3
Fetching post 4
Fetching post 5
Done fetching post 3
Done fetching post 1
Done fetching post 5
Done fetching post 2
Done fetching post 4

⚡ Total time (multithreaded): 0.44 seconds

📄 Post ID 1:

{
  "userId": 1,
  "id": 1,
  "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
  "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}

------------------------------------------------------------

📄 Post ID 2:

{
  "userId": 1,
  "id": 2,
  "title": "qui est esse",
  "body": "est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla"
}

------------------------------------------------------------

📄 Post ID 3:

{
  "userId": 1,
```

## Key Takeaways
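- Threads are ideal when tasks are **I/O-bound**, like waiting for downloads, file reads, or web requests.
- In the simulated example, five 4-second downloads finished in **~4s with threads**; run one after another they would take ~20s (the sequential demo used 2-second downloads and took ~10s). With the real API calls, total time dropped from **1.99s to 0.44s**.
- Threads share memory, so they're lightweight and fast to spin up.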
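
A rough way to reason about these timings, assuming the tasks only wait and every task gets its own thread: sequential time is about the sum of the waits, while threaded time is about the longest single wait. The helper name `estimate_times` is hypothetical, and the estimate ignores thread start-up and scheduling overhead.

```python
def estimate_times(durations):
    """Return (sequential, threaded) time estimates for pure-wait tasks."""
    sequential = sum(durations)  # tasks run back to back
    threaded = max(durations)    # tasks wait at the same time
    return sequential, threaded

print(estimate_times([2] * 5))  # (10, 2)
print(estimate_times([4] * 5))  # (20, 4)
```

This matches the runs above: 10.0 seconds for five sequential 2-second downloads, and 4.01 seconds for five threaded 4-second downloads.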

In the next part, we'll use multithreading with **real API calls** or **file processing tasks**.
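
One gap in Example 2 is worth a sketch before moving on: if a request raises an exception inside a plain thread, the main program never sees it and that post is simply missing from `results`. A possible approach, shown here under assumptions, is to submit tasks to a `ThreadPoolExecutor` and collect each outcome with `as_completed`. The function `fake_fetch_post` is a hypothetical stand-in for the real HTTP call (it fails for post 3 on purpose) so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_fetch_post(post_id):
    """Stand-in for a real HTTP call; post 3 fails on purpose."""
    if post_id == 3:
        raise ConnectionError(f"could not fetch post {post_id}")
    return {"id": post_id, "title": f"post {post_id}"}

results = {}  # successful responses, keyed by post_id
errors = {}   # error messages, keyed by post_id

with ThreadPoolExecutor(max_workers=5) as executor:
    # Map each future back to the post_id it was submitted for
    future_to_id = {executor.submit(fake_fetch_post, i): i for i in range(1, 6)}

    # as_completed yields futures as they finish, in any order
    for future in as_completed(future_to_id):
        post_id = future_to_id[future]
        try:
            results[post_id] = future.result()  # re-raises the task's exception
        except ConnectionError as exc:
            errors[post_id] = str(exc)

print(sorted(results))  # [1, 2, 4, 5]
print(errors)           # {3: 'could not fetch post 3'}
```

Because `future.result()` re-raises whatever the task raised, failures surface in the main thread where they can be logged or retried, and the other downloads are unaffected.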
