## Multithreading
Good for I/O-bound tasks, not for CPU-bound ones.

In [5]:
import threading
import time


def crawl(link, delay=3):
    print(f"crawl started for {link}")
    time.sleep(delay)  # Blocking I/O (simulating a network request)
    print(f"Crawl for link {link} took {delay} seconds")
    print(f"crawl ended for {link}")


links = [
    "https://python.org",
    "https://docs.python.org",
    "https://peps.python.org",
]

In [9]:
# Run the crawl tasks sequentially, one after another
print("Starting I/O bound tasks sequentially")
start_time = time.time()

for link in links:
    crawl(link)
    print("-----------")

end_time = time.time()

print(f"Total time taken to complete : {end_time-start_time}")

Starting I/O bound tasks sequentially
crawl started for https://python.org
Crawl for link https://python.org took 3 seconds
crawl ended for https://python.org
-----------
crawl started for https://docs.python.org
Crawl for link https://docs.python.org took 3 seconds
crawl ended for https://docs.python.org
-----------
crawl started for https://peps.python.org
Crawl for link https://peps.python.org took 3 seconds
crawl ended for https://peps.python.org
-----------
Total time taken to complete : 9.012431144714355


In sequential execution, each task must finish before the next one starts.
Here `crawl` performs blocking I/O (simulated with `time.sleep`), so for each link the program spends about 3 seconds simply waiting before it can move on to the next task. Three links therefore take about 9 seconds in total.

We can run these tasks concurrently in separate threads so that their waiting periods overlap instead of adding up.

In [10]:
# Create one thread per link
print("Starting I/O bound tasks in threads")
start_time = time.time()

threads = []
for link in links:
    # Using `args` to pass positional arguments and `kwargs` for keyword arguments
    t = threading.Thread(target=crawl, args=(link,), kwargs={"delay": 3})
    threads.append(t)

# Start each thread
for t in threads:
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()

end_time = time.time()

print(f"Total time taken to complete : {end_time-start_time}")

Starting I/O bound tasks in threads
crawl started for https://python.org
crawl started for https://docs.python.org
crawl started for https://peps.python.org
Crawl for link https://peps.python.org took 3 seconds
crawl ended for https://peps.python.org
Crawl for link https://docs.python.org took 3 seconds
crawl ended for https://docs.python.org
Crawl for link https://python.org took 3 seconds
crawl ended for https://python.org
Total time taken to complete : 3.0067381858825684


With multithreading, the three I/O-bound tasks finish in about 3 seconds instead of 9, roughly one third of the sequential time. Because of CPython's Global Interpreter Lock (GIL), only one thread executes Python bytecode at any moment, so this is concurrency rather than true parallelism. It still helps here because a thread releases the GIL while it is blocked on I/O (here, `time.sleep`), so the other threads can start their own waits in the meantime and the waiting periods overlap.
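The standard library's `concurrent.futures.ThreadPoolExecutor` offers a higher-level way to get the same overlap without creating and joining `Thread` objects by hand. This is a minimal sketch: `fake_fetch` is a hypothetical stand-in that, like `crawl`, uses `time.sleep` in place of a real network request, but returns a value so the results can be collected. The 0.5-second delay is an illustrative choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(link, delay=0.5):
    """Hypothetical stand-in for a network request: blocks for `delay` seconds."""
    time.sleep(delay)  # time.sleep releases the GIL, so other threads can run meanwhile
    return f"fetched {link}"


links = [
    "https://python.org",
    "https://docs.python.org",
    "https://peps.python.org",
]

start = time.perf_counter()
# One worker per link; leaving the `with` block waits for all submitted work.
with ThreadPoolExecutor(max_workers=len(links)) as executor:
    # executor.map runs fake_fetch in the pool and yields results in input order
    results = list(executor.map(fake_fetch, links))
elapsed = time.perf_counter() - start

print(results)
print(f"Total time: {elapsed:.2f} seconds")  # roughly one delay, not three
```

`executor.map` returns results in the order of the inputs, even if the underlying calls finish in a different order.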

## AsyncIO
Good for high volumes of I/O-bound tasks.
asyncio achieves the same kind of concurrency as multithreading but with less overhead: all tasks run on a single thread, scheduled by an event loop, and each task hands control back to the loop explicitly at an `await`. Since there is no switching between OS threads, it is better suited to large numbers of concurrent I/O-bound tasks.
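One way to check the single-thread claim is to record which thread each coroutine runs on. In this sketch, `worker` is a hypothetical coroutine that awaits `asyncio.sleep` (standing in for real I/O) and returns `threading.get_ident()`. It is written as a plain script with `asyncio.run`; inside Jupyter, where an event loop is already running, you would `await main()` instead.

```python
import asyncio
import threading


async def worker(name, delay):
    """Hypothetical I/O-style task: waits, then reports which thread it ran on."""
    await asyncio.sleep(delay)  # yields control to the event loop while "waiting"
    return name, threading.get_ident()


async def main():
    # gather runs the coroutines concurrently and returns results in argument order
    return await asyncio.gather(worker("a", 0.2), worker("b", 0.1), worker("c", 0.3))


results = asyncio.run(main())
thread_ids = {tid for _, tid in results}
print(results)
print(f"Distinct threads used: {len(thread_ids)}")  # 1: all tasks ran on the same thread
```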

In [15]:
import asyncio
import time


async def async_io_task(name):
    print(f"Task {name}: Starting I/O wait...")
    # 'await' tells the event loop: "I'm waiting, switch to another task!"
    await asyncio.sleep(1)
    print(f"Task {name}: I/O finished.")


async def main_async():
    await asyncio.gather(
        async_io_task(0),
        async_io_task(1),
        async_io_task(2)
    )

start_time = time.time()
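# Jupyter already runs an event loop, so top-level `await` works in a notebook.
# In a plain Python script, call asyncio.run(main_async()) instead.
await main_async()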

print(f"\n--- Asyncio Total Time: {time.time() - start_time:.2f} seconds ---")

Task 0: Starting I/O wait...
Task 1: Starting I/O wait...
Task 2: Starting I/O wait...
Task 0: I/O finished.
Task 1: I/O finished.
Task 2: I/O finished.

--- Asyncio Total Time: 1.00 seconds ---
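Because everything runs on one thread, tasks only switch at `await` points. A blocking call such as `time.sleep` inside a coroutine never yields, so it stalls the whole event loop. The sketch below compares that with `asyncio.to_thread` (Python 3.9+), which runs a blocking function in a worker thread and lets the coroutine await its result. The task names and the 0.3-second delay are illustrative.

```python
import asyncio
import time


async def blocking_task():
    time.sleep(0.3)  # blocks the event loop: no other coroutine can run meanwhile


async def offloaded_task():
    # Runs time.sleep in a worker thread; the event loop stays free while awaiting
    await asyncio.to_thread(time.sleep, 0.3)


async def timed(task_factory, n=3):
    """Run n copies of a coroutine concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(task_factory() for _ in range(n)))
    return time.perf_counter() - start


async def main():
    blocked = await timed(blocking_task)
    offloaded = await timed(offloaded_task)
    return blocked, offloaded


blocked, offloaded = asyncio.run(main())
print(f"time.sleep inside coroutines: {blocked:.2f} s")     # about 3 x 0.3 s
print(f"asyncio.to_thread(time.sleep): {offloaded:.2f} s")  # about 0.3 s
```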


## Multiprocessing
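For CPU-bound tasks that need to run in true parallel: each process has its own Python interpreter and its own GIL, so several processes can use multiple CPU cores at the same time.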
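To see why threads are not enough for CPU-bound work, the sketch below times a pure-Python busy loop run three times sequentially and then in three threads. On a standard CPython build, the GIL lets only one thread execute Python bytecode at a time, so the threaded run is typically no faster than the sequential one. Exact timings depend on the machine, and free-threaded builds (Python 3.13+) can behave differently. `count_up` and the iteration count are illustrative choices.

```python
import threading
import time


def count_up(n, results, index):
    """CPU-bound busy loop; stores its result so each thread can report back."""
    total = 0
    for i in range(n):
        total += i
    results[index] = total


N = 2_000_000
results = [None] * 3

# Sequential: three calls, one after another
start = time.perf_counter()
for idx in range(3):
    count_up(N, results, idx)
sequential = time.perf_counter() - start

# Threaded: the same three calls, each in its own thread
start = time.perf_counter()
threads = [threading.Thread(target=count_up, args=(N, results, idx)) for idx in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f} s, threaded: {threaded:.2f} s")
```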

In [None]:
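import multiprocessing
import time


def cpu_task(name):
    print(f"Process {name}: Starting computation...")
    # Simulate heavy computation: a pure-Python loop that keeps one core busy
    for _ in range(20_000_000):
        _ = 1 + 1
    print(f"Process {name}: Computation finished.")


# The __main__ guard is required with the "spawn" start method (the default on
# Windows and macOS): each child process re-imports the main module and, without
# the guard, would try to start new processes of its own and fail.
# In a Jupyter notebook on those platforms, `cpu_task` may need to be defined in a
# separate .py module and imported, so that the child processes can find it.
if __name__ == "__main__":
    start_time = time.time()
    processes = []
    for i in range(3):
        p = multiprocessing.Process(target=cpu_task, args=(i,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()  # Wait for all processes to complete

    print(f"\n--- Multiprocessing Total Time: {time.time() - start_time:.2f} seconds ---")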
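When CPU-bound work needs to return results, `concurrent.futures.ProcessPoolExecutor` is a higher-level alternative to managing `Process` objects directly: it sends arguments and return values between processes by pickling them. In this sketch, `sum_of_squares` and the input sizes are hypothetical stand-ins for real computation. The pool is started inside an `if __name__ == "__main__":` guard because worker processes may re-import the main module.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """Hypothetical CPU-bound task: pure-Python arithmetic with no I/O."""
    return sum(i * i for i in range(n))


def run_in_processes(sizes):
    # Each call runs in a separate worker process with its own interpreter and GIL
    with ProcessPoolExecutor(max_workers=len(sizes)) as executor:
        # map returns results in the same order as `sizes`
        return list(executor.map(sum_of_squares, sizes))


if __name__ == "__main__":
    sizes = [1_000_000, 2_000_000, 3_000_000]
    start = time.perf_counter()
    results = run_in_processes(sizes)
    print(results)
    print(f"ProcessPoolExecutor total time: {time.perf_counter() - start:.2f} seconds")
```

Compared with `threading`, the trade-off is higher start-up and communication cost (new interpreters, pickling), which pays off only when each task does enough computation.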