<img src="./images/threading.png" width="800"/>

# Introduction to Threading in Python

Threading is a form of concurrency that allows a program to run multiple operations in the same process space, with their execution overlapping in time. It is a key technique in programming that can significantly improve the efficiency and responsiveness of applications, especially those that perform I/O-bound tasks or need to maintain a responsive UI while performing long-running operations.


A thread is the smallest unit of execution that an operating system can schedule. A Python application starts with a single thread (the main thread), but can spawn additional threads to execute code concurrently.
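
To make the idea of a main thread and a spawned thread concrete, here is a minimal sketch using the standard `threading` module. The function names `report` and `run_demo` and the thread name `worker-1` are made up for this illustration.

```python
import threading

def report(names):
    # Record the name of whichever thread is running this function.
    names.append(threading.current_thread().name)

def run_demo():
    names = []
    report(names)  # runs in the calling thread (the main thread in a plain script)

    worker = threading.Thread(target=report, args=(names,), name="worker-1")
    worker.start()
    worker.join()  # wait until the worker thread has finished
    return names

if __name__ == "__main__":
    print(run_demo())
```

The first entry in the returned list is the name of the thread that called `run_demo`, and the second is the spawned thread.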


Threading is particularly useful in scenarios where an application needs to perform tasks that are waiting on external resources or have some idle time. Common use cases include:

- **Web applications**: Handling multiple requests simultaneously.
- **I/O-bound applications**: Performing network requests, file I/O, or database transactions concurrently.
- **UI applications**: Keeping the interface responsive while performing background tasks.


The Global Interpreter Lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. This means that even in a multi-threaded script, only one thread executes Python bytecode at any point in time. In CPython (the standard Python implementation), threads therefore do not run Python code truly in parallel; they take turns executing. The GIL is released while a thread waits on blocking I/O, which is why threads still help with I/O-bound work.


**Implications of the GIL**:
- **CPU-bound tasks**: For tasks that require heavy CPU computation, threading may not provide a significant performance improvement in Python due to the GIL. Such tasks won't run in true parallelism because of the GIL's restrictions.
- **I/O-bound tasks**: Threading can improve performance for I/O-bound tasks since while one thread waits for I/O operations to complete, other threads can continue executing.
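
The I/O-bound case can be sketched without touching the network. In the example below, `time.sleep` stands in for a blocking I/O call, and the helper names `wait_for_io` and `run_in_threads` are assumptions made for this illustration.

```python
import threading
import time

def wait_for_io(seconds):
    # time.sleep stands in for a blocking I/O call; the GIL is released while sleeping.
    time.sleep(seconds)

def run_in_threads(func, args_list):
    """Run func once per argument tuple, each in its own thread; return elapsed seconds."""
    threads = [threading.Thread(target=func, args=args) for args in args_list]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_in_threads(wait_for_io, [(0.2,)] * 4)
    print(f"4 simulated I/O waits of 0.2 s took {elapsed:.2f} s in threads")
```

Because the waits overlap, the total should be close to a single wait, not four times as long. On a standard CPython build with the GIL, passing a pure-Python CPU-bound function to the same helper would typically show little or no speedup.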


Both threading and multiprocessing let a program work on several tasks at once. However, they do so in fundamentally different ways:

- **Threading**: Shares memory space and is more lightweight. Threads within the same process share the same data space with the main thread and can therefore easily share information among themselves. However, they are limited by the GIL in the context of CPU-bound tasks.

- **Multiprocessing**: Involves running processes in completely separate memory spaces. This approach circumvents the GIL, allowing true parallelism for CPU-bound tasks by leveraging multiple CPUs or cores. Each process has its own instance of the Python interpreter and memory space, which means they don't share global variables. Communication between processes is possible but more complex and involves using inter-process communication (IPC) mechanisms.
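
The shared-memory side of this comparison can be illustrated with a short sketch. The function name `squares_via_threads` is hypothetical; the point is only that every thread appends to the same list object.

```python
import threading

def squares_via_threads(numbers):
    results = []  # one list object, visible to every thread in this process

    def record(value):
        # Each thread appends to the same shared list; no per-thread copy is made.
        results.append(value * value)

    threads = [threading.Thread(target=record, args=(n,)) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Completion order may vary between runs, so sort before returning.
    return sorted(results)

if __name__ == "__main__":
    print(squares_via_threads(range(5)))
```

With separate processes, each child would work on its own copy of `results`, and the parent would see nothing unless the values were sent back through an IPC mechanism such as a queue or pipe.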


In summary, the choice between threading and multiprocessing depends on the nature of the task. For I/O-bound tasks, threading is often sufficient and more efficient due to its lower overhead. For CPU-bound tasks, especially those that need to scale across multiple cores, multiprocessing is a better choice.

**Table of contents**<a id='toc0_'></a>    
- [Installing Required Packages](#toc1_)    
- [The Basics of Making HTTP Requests in Python](#toc2_)    
  - [Making a GET Request to Check a Website's Availability](#toc2_1_)    
- [Implementing Site Connectivity Checker Using For Loops](#toc3_)    
  - [Using a For Loop to Check Each Site's Connectivity](#toc3_1_)    
  - [Limitations of Using a For Loop for This Task](#toc3_2_)    
- [Introduction to Threading in Python](#toc4_)    
  - [How Threading Can Overcome Limitations](#toc4_1_)    
  - [Basic Syntax and Usage of the Threading Module](#toc4_2_)    
  - [Creating and Starting Threads](#toc4_3_)    
  - [Joining Threads to Wait for Their Completion](#toc4_4_)    
- [Implementing Site Connectivity Checker Using Threading](#toc5_)    
  - [Step-by-Step Implementation](#toc5_1_)    
  - [Performance Comparison](#toc5_2_)    
- [Conclusion](#toc6_)    
  - [Potential Issues or Pitfalls with Threading](#toc6_1_)    
  - [Further Reading and Projects](#toc6_2_)    


## <a id='toc1_'></a>[Installing Required Packages](#toc0_)

For the Site Connectivity Checker project, you'll primarily need the `requests` library for making HTTP requests to websites. The `threading` module, which is part of Python's standard library, will be used for implementing threading.


- **Install the requests package**:
  If you are using a virtual environment, activate it first, then install the `requests` library using `pip`:
  ```bash
  pip install requests
  ```


After installation, you can verify that `requests` is installed by running:
```bash
pip list
```
This command lists every package installed in the current environment, along with its version. Confirm that `requests` appears in the list.
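
The same check can be done from inside Python. The sketch below uses the standard library's `importlib.metadata` (available in Python 3.8 and later); the helper name `installed_version` is an assumption for this example.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    version = installed_version("requests")
    if version is None:
        print("requests is not installed in this environment")
    else:
        print(f"requests {version} is installed")
```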



Because `threading` ships with Python, there is nothing else to install. It is still worth understanding its limitations (e.g., the Global Interpreter Lock, or GIL, in CPython) before relying on it in your projects.


You now have all the necessary packages installed for the Site Connectivity Checker project. Installing them inside a virtual environment keeps the project's dependencies isolated and helps prevent version conflicts with other projects.

## <a id='toc2_'></a>[The Basics of Making HTTP Requests in Python](#toc0_)

The `requests` library in Python is a powerful HTTP client that makes it incredibly simple to send HTTP/1.1 requests. It allows you to send all kinds of HTTP requests (GET, POST, DELETE, PUT, PATCH), without the need to manually add query strings to your URLs or form-encode your POST data. The `requests` library is renowned for its simplicity and ease of use, making it the de facto choice for making HTTP requests in Python.


Features of the `requests` Library:
- Elegant and simple syntax for making requests and accessing response data.
- Automatic handling of URL encoding.
- Support for form data, multipart file uploads, and parameters.
- Automatic decoding of response content based on its encoding.
- Support for sessions with cookie persistence.
- SSL Certificate verification.


Before you can start making HTTP requests, the `requests` library must be installed in your Python environment. If you skipped the installation step above, run the following command in your terminal or command prompt (with your virtual environment activated, if you're using one):


```bash
pip install requests
```


### <a id='toc2_1_'></a>[Making a GET Request to Check a Website's Availability](#toc0_)


To check if a website is available, you can send a GET request to the website's URL and then check the HTTP status code of the response. A status code of `200 OK` generally indicates that the website is available and accessible. Here's a simple example:


In [1]:
import requests

def check_site_availability(url):
    try:
        # A timeout keeps the call from hanging indefinitely on an unresponsive server.
        response = requests.get(url, timeout=5)
        # A status code of 200 means the request was successful,
        # and the site is up/available.
        if response.status_code == 200:
            print(f"{url} is up!")
        else:
            print(f"{url} is down or not reachable. Status Code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        # This catches exceptions like connection errors
        print(f"Failed to reach {url}. Error: {e}")


In [3]:
check_site_availability("https://www.google.com")

https://www.google.com is up!


In [4]:
check_site_availability("https://www.unknownurlthatdoesntexist.com")

Failed to reach https://www.unknownurlthatdoesntexist.com. Error: HTTPSConnectionPool(host='www.unknownurlthatdoesntexist.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x10b881d20>: Failed to resolve 'www.unknownurlthatdoesntexist.com' ([Errno 8] nodename nor servname provided, or not known)"))


- The `requests.get()` function sends a GET request to the specified URL.
- The `status_code` property of the `response` object contains the HTTP status code returned by the server.
- We use a try-except block to catch any exceptions that may occur during the request (e.g., network problems, invalid URLs).
- By checking the `status_code`, we determine if the site is accessible or not and print an appropriate message.
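
Status codes other than 200 carry useful information too. As a rough sketch, the helper below (the name `describe_status` is made up for this example) groups a code by its standard range and looks up its reason phrase with the standard library's `http.HTTPStatus`.

```python
from http import HTTPStatus

def describe_status(status_code):
    """Return a short human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        phrase = "Unknown"  # not a code the standard library knows about

    if 200 <= status_code < 300:
        category = "success"
    elif 300 <= status_code < 400:
        category = "redirect"
    elif 400 <= status_code < 500:
        category = "client error"
    elif 500 <= status_code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{status_code} {phrase} ({category})"

if __name__ == "__main__":
    for code in (200, 404, 503):
        print(describe_status(code))
```

A value such as `response.status_code` could be passed to this helper to print a more specific message than "down or not reachable".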


The `requests` library simplifies the process of making HTTP requests in Python, providing an intuitive interface for interacting with the web. By leveraging the `requests` library to perform a simple GET request, we can easily check the availability of websites, which is the foundation of our Site Connectivity Checker project.

## <a id='toc3_'></a>[Implementing Site Connectivity Checker Using For Loops](#toc0_)

In this section, we'll develop a basic version of the Site Connectivity Checker using a for loop in Python. This initial implementation will serve as a foundation, allowing us to later explore the advantages of introducing threading for this task.


First, we define a list of website URLs that we want to check for availability. For demonstration purposes, we'll include a few example websites:


In [5]:
websites_to_check = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.stackoverflow.com",
    "https://www.python.org",
    "https://nonexistentwebsite.example"
]

### <a id='toc3_1_'></a>[Using a For Loop to Check Each Site's Connectivity](#toc0_)


Next, we'll write a function that iterates over this list with a for loop, using the `requests` library to check each website's availability:


In [6]:
import requests

In [7]:
def check_sites_availability(websites_list):
    for url in websites_list:
        try:
            response = requests.get(url, timeout=5)
            if response.status_code == 200:
                print(f"{url} is up!")
            else:
                print(f"{url} is down or not reachable. Status Code: {response.status_code}")
        except requests.exceptions.RequestException as e:
            # This catches exceptions like connection errors
            print(f"Failed to reach {url}. Error: {e}")


In [8]:
# Checking the websites' availability
check_sites_availability(websites_to_check)

https://www.google.com is up!
https://www.github.com is up!
https://www.stackoverflow.com is up!
https://www.python.org is up!
Failed to reach https://nonexistentwebsite.example. Error: HTTPSConnectionPool(host='nonexistentwebsite.example', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x10b883700>: Failed to resolve 'nonexistentwebsite.example' ([Errno 8] nodename nor servname provided, or not known)"))


### <a id='toc3_2_'></a>[Limitations of Using a For Loop for This Task](#toc0_)


While the for loop method is straightforward and easy to implement, it has significant limitations when used for tasks like checking the connectivity of multiple websites:

- **Sequential Execution**: The for loop checks each website one after the other, waiting for each request to complete before moving on to the next. This sequential approach is inefficient, especially when dealing with a large list of websites or websites that are slow to respond.

- **Time Consumption**: Because each request is made synchronously, the total time taken to check all websites is the sum of the time taken for each individual request. This can lead to long waiting times, particularly if any of the websites are slow or unresponsive.

- **Blocking**: The script is blocked until the for loop has completed checking all websites. This means that if the script is part of a larger application, other tasks cannot be performed concurrently.
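
The cost of sequential execution is easy to see with some arithmetic. The latencies below are invented for illustration (in milliseconds, with the last one representing a request that runs into a 5-second timeout), and the function names are hypothetical.

```python
def sequential_total(latencies_ms):
    # One request after another: the total is the sum of all latencies.
    return sum(latencies_ms)

def concurrent_estimate(latencies_ms):
    # Idealized concurrent case: all requests overlap, so the slowest one dominates.
    return max(latencies_ms)

latencies_ms = [300, 500, 1200, 400, 5000]  # made-up example values

print(sequential_total(latencies_ms))     # 7400
print(concurrent_estimate(latencies_ms))  # 5000
```

The concurrent figure is an idealized lower bound; it ignores thread start-up cost and any limits on bandwidth or connections.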


Using a for loop to implement the Site Connectivity Checker demonstrates the basic functionality but also highlights the inefficiency of handling I/O-bound tasks synchronously in Python. The sequential checking of websites results in unnecessary waiting and poor utilization of resources. In the next sections, we'll explore how threading can address these limitations by allowing multiple checks to run concurrently, significantly improving the performance of our Site Connectivity Checker.

## <a id='toc4_'></a>[Introduction to Threading in Python](#toc0_)

In the previous section, we discussed the limitations of using a for loop for checking the connectivity of multiple websites, primarily focusing on its sequential and blocking nature. This section introduces threading as a solution to overcome these limitations, allowing for concurrent execution and more efficient use of time, especially for I/O-bound tasks.


### <a id='toc4_1_'></a>[How Threading Can Overcome Limitations](#toc0_)


Threading allows a program to run multiple operations concurrently in the same process space. It is particularly beneficial for I/O-bound tasks, like network requests, where the program spends a significant amount of time waiting for external responses. By using threads, we can initiate multiple website checks simultaneously, significantly reducing the total execution time compared to a sequential for loop approach.


### <a id='toc4_2_'></a>[Basic Syntax and Usage of the Threading Module](#toc0_)


Python's `threading` module provides a way to create and work with threads. Here is a basic outline of using the `threading` module:

1. **Import the threading module**: First, import the module into your Python script.
   ```python
   import threading
   ```

2. **Define a target function**: This is the function that will be executed by each thread.
   ```python
   def check_site_availability(url):
       ...  # Function logic here
   ```

3. **Create a Thread**: Instantiate a thread object by specifying the target function and its arguments.
   ```python
   thread = threading.Thread(target=check_site_availability, args=("https://www.example.com",))
   ```

4. **Start the Thread**: Begin the thread's activity.
   ```python
   thread.start()
   ```

5. **Join the Thread**: Wait for the thread to complete its task.
   ```python
   thread.join()
   ```


### <a id='toc4_3_'></a>[Creating and Starting Threads](#toc0_)


To check multiple websites concurrently, we can create a thread for each website check. Here's how you might adjust our previous website checking function to use threading:


```python
import requests
import threading

def check_site_availability(url):
    try:
        response = requests.get(url, timeout=5)
        if response.status_code == 200:
            print(f"{url} is up!")
        else:
            print(f"{url} is down or not reachable. Status Code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"Failed to reach {url}. Error: {e}")

websites_to_check = [
    "https://www.google.com",
    "https://www.github.com",
    # Add more sites as needed
]

threads = []

for url in websites_to_check:
    thread = threading.Thread(target=check_site_availability, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
```


### <a id='toc4_4_'></a>[Joining Threads to Wait for Their Completion](#toc0_)


The `join()` method blocks the calling thread until the thread it is called on has finished. Calling it on every thread in the list therefore makes the main program wait for all of them before moving on. This is particularly important when the outcome of a thread's execution is needed later in the program or if you need to ensure that all resources are cleaned up properly.
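
One case where `join()` matters is when the main program needs the threads' results. The sketch below assumes a hypothetical `fake_check` function in place of a real HTTP request so that it runs offline; each thread stores its result in a shared dictionary that is only read after every thread has been joined.

```python
import threading

def fake_check(url):
    # Hypothetical stand-in for a real HTTP check so the example runs offline.
    return "nonexistent" not in url

def check_all(urls):
    results = {}

    def worker(url):
        results[url] = fake_check(url)  # each thread writes to its own key

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # once this loop ends, every result has been written
    return results

if __name__ == "__main__":
    print(check_all(["https://www.python.org", "https://nonexistentwebsite.example"]))
```

Reading `results` before the `join()` loop could return an incomplete dictionary, since some threads might still be running.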


Threading in Python offers a powerful solution for improving the efficiency of I/O-bound tasks by enabling concurrent execution. By leveraging threads for tasks like checking website connectivity, we can significantly reduce execution time and avoid the blocking nature of sequential execution. However, it's essential to be mindful of the Global Interpreter Lock (GIL) in CPython, which can limit the effectiveness of threading for CPU-bound tasks.

## <a id='toc5_'></a>[Implementing Site Connectivity Checker Using Threading](#toc0_)

To address the limitations of the sequential site connectivity checker implemented with a for loop, we'll now modify the implementation to use Python's threading module. This approach allows us to check the connectivity of multiple websites concurrently, significantly improving the overall execution time.


### <a id='toc5_1_'></a>[Step-by-Step Implementation](#toc0_)


1. **Import Necessary Modules**:
   First, ensure you import the `threading` and `requests` modules.


In [9]:
import threading
import requests

2. **Define the Target Function**:
   The target function, which performs the website connectivity check, remains largely the same but will now be executed by individual threads.


In [10]:
def check_site_availability(url):
    try:
        response = requests.get(url, timeout=5)
        if response.status_code == 200:
            print(f"{url} is up!")
        else:
            print(f"{url} is down or not reachable. Status Code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"Failed to reach {url}. Error: {e}")

3. **Create and Start Threads**:
   For each website in the list, create a new thread object targeting the `check_site_availability` function. Start each thread immediately after its creation.


In [11]:
websites_to_check = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.stackoverflow.com",
    "https://www.python.org",
    "https://nonexistentwebsite.example"
]

threads = []
for url in websites_to_check:
    thread = threading.Thread(target=check_site_availability, args=(url,))
    threads.append(thread)
    thread.start()

Failed to reach https://nonexistentwebsite.example. Error: HTTPSConnectionPool(host='nonexistentwebsite.example', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x10b883940>: Failed to resolve 'nonexistentwebsite.example' ([Errno 8] nodename nor servname provided, or not known)"))
https://www.python.org is up!
https://www.google.com is up!
https://www.github.com is up!
https://www.stackoverflow.com is up!


4. **Wait for All Threads to Complete**:
   Use the `join()` method on each thread to ensure the main program waits for all threads to complete their execution.


In [12]:
for thread in threads:
    thread.join()

### <a id='toc5_2_'></a>[Performance Comparison](#toc0_)


To compare the performance of the threading implementation against the for-loop implementation, you can measure the time it takes for each approach to check the connectivity of all websites in the list.

- **For-loop Implementation**:
  The for-loop approach checks each website sequentially, meaning the total execution time is the sum of the time taken to check each site. This can be considerably slow, especially with a large list of websites or websites that are slow to respond.

- **Threading Implementation**:
  The threading approach allows for concurrent checks, significantly reducing the total execution time. The execution time is closer to the duration of the slowest single request than to the sum of all request times.


You can use Python's `time` module to measure and compare the execution times of both approaches:


In [13]:
import time

start_time = time.time()
# Place function call here (either for-loop or threading implementation).
# For the threading version, include the join() loop so the timer covers all threads.
end_time = time.time()

print(f"Execution time: {end_time - start_time} seconds")

Execution time: 1.0967254638671875e-05 seconds
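
The near-zero time above is what the template reports when nothing is placed between the two timestamps. For a filled-in comparison that runs offline, here is a minimal sketch in which `time.sleep` simulates each request. The names `simulated_check`, `run_sequential`, `run_threaded`, and `timed` are assumptions for this example, and it uses `time.perf_counter` instead of `time.time` because it is designed for measuring short durations.

```python
import threading
import time

def simulated_check(url, delay=0.1):
    # Stand-in for requests.get: sleeps instead of touching the network.
    time.sleep(delay)

def run_sequential(urls):
    for url in urls:
        simulated_check(url)

def run_threaded(urls):
    threads = [threading.Thread(target=simulated_check, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the timer must include the wait for all threads

def timed(func, urls):
    """Return how many seconds func(urls) takes."""
    start = time.perf_counter()
    func(urls)
    return time.perf_counter() - start

if __name__ == "__main__":
    urls = [f"https://site{i}.example" for i in range(5)]
    print(f"Sequential: {timed(run_sequential, urls):.2f} seconds")
    print(f"Threaded:   {timed(run_threaded, urls):.2f} seconds")
```

With real requests the numbers depend on the network, but the pattern should be similar: the sequential time grows with the number of sites, while the threaded time stays near the slowest single check.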


By leveraging threading for the Site Connectivity Checker, we can overcome the limitations of sequential execution found in the for-loop implementation. This results in a more efficient and faster way to check the connectivity of multiple websites, demonstrating the power of concurrency for I/O-bound tasks. While threading offers significant improvements for tasks like this, it's essential to consider thread management and potential complexity as the number of concurrent threads grows.
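
One way to keep the number of threads under control is a thread pool. This swaps the hand-written `threading.Thread` loop for `concurrent.futures.ThreadPoolExecutor` from the standard library, which is a different technique from the one used above. The sketch assumes a hypothetical `fake_status` function in place of a real HTTP check so that it runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_status(url):
    # Hypothetical stand-in for a real check; returns (url, is_up).
    return url, "nonexistent" not in url

def check_with_pool(urls, check=fake_status, max_workers=3):
    # At most max_workers checks run at the same time, however long the URL list is.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input URLs.
        return list(executor.map(check, urls))

if __name__ == "__main__":
    sites = ["https://www.python.org", "https://nonexistentwebsite.example"]
    for url, is_up in check_with_pool(sites):
        print(f"{url}: {'up' if is_up else 'unreachable'}")
```

Leaving the `with` block waits for all submitted work to finish, so no explicit `join()` loop is needed.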

## <a id='toc6_'></a>[Conclusion](#toc0_)

The implementation of threading in the Site Connectivity Checker project brings to light several key advantages, particularly for I/O-bound tasks such as checking the availability of websites. These benefits include:

- **Concurrent Execution**: Threading allows multiple website checks to occur simultaneously, significantly reducing the total execution time compared to a sequential approach.
- **Improved Responsiveness**: For applications that require maintaining responsiveness while performing background tasks (such as a GUI application), threading can perform long-running checks without freezing the user interface.
- **Efficient Use of Waiting Time**: In I/O-bound operations where the task spends a significant amount of time waiting for external responses, threading helps utilize this waiting time to initiate other operations, thereby optimizing the application's overall efficiency.


### <a id='toc6_1_'></a>[Potential Issues or Pitfalls with Threading](#toc0_)


While threading offers considerable benefits, it also introduces complexity and potential issues that developers need to be aware of:

- **Race Conditions**: When two or more threads access shared data and try to change it at the same time, it can lead to inconsistent or unexpected outcomes.
- **Thread Safety**: Not all code is thread-safe, meaning it can produce errors or unexpected results when accessed by multiple threads concurrently. Special care must be taken when accessing shared resources or mutable data structures.
- **Deadlocks**: This occurs when two or more threads are waiting on each other to release resources, causing all of them to remain blocked indefinitely.
- **Overhead**: Creating and managing a large number of threads can introduce significant overhead, both in terms of memory and CPU usage, potentially negating the benefits of threading if not managed correctly.
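
A common way to avoid race conditions on shared data is to guard it with a `threading.Lock`. The sketch below is one possible arrangement; the class name `SafeCounter` and the function `count_with_threads` are made up for this example.

```python
import threading

class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can hold the lock, so the
        # read-modify-write below cannot interleave with another thread's.
        with self._lock:
            self.value += 1

def count_with_threads(num_threads=4, increments_per_thread=10_000):
    counter = SafeCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_threads())
```

With the lock in place, the final count always equals `num_threads * increments_per_thread`. Without it, updates from different threads could overwrite each other.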


### <a id='toc6_2_'></a>[Further Reading and Projects](#toc0_)


To explore threading in more depth and become acquainted with best practices for its use, consider the following project ideas:
- **Web Crawler**: Implement a multi-threaded web crawler that can fetch pages from multiple websites concurrently.
- **File Downloader**: Create a tool that downloads files from multiple sources simultaneously using threading.
- **Data Processing Pipeline**: Build a data processing application that uses threads to perform different stages of data processing (e.g., fetching, transforming, saving) in parallel.
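
As a starting point for the pipeline idea, here is a minimal sketch that connects three stages with `queue.Queue`, a thread-safe queue from the standard library. The stage functions, the `SENTINEL` marker, and the upper-casing "transform" are all placeholders invented for this example.

```python
import queue
import threading

SENTINEL = None  # placed on a queue to signal that no more items will follow

def fetch(items, out_q):
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)

def transform(in_q, out_q):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # pass the end-of-stream signal downstream
            break
        out_q.put(item.upper())

def save(in_q, sink):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        sink.append(item)

def run_pipeline(items):
    raw_q, clean_q = queue.Queue(), queue.Queue()
    saved = []
    stages = [
        threading.Thread(target=fetch, args=(items, raw_q)),
        threading.Thread(target=transform, args=(raw_q, clean_q)),
        threading.Thread(target=save, args=(clean_q, saved)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return saved

if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"]))
```

Each stage runs in its own thread and blocks on `get()` until the previous stage produces something, so the stages overlap without any explicit locking in user code.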


Threading is a powerful tool in a Python developer's arsenal, allowing for more efficient and responsive applications. By understanding its benefits and potential pitfalls, developers can leverage threading to optimize their projects, particularly those involving I/O-bound tasks. With practice and continued learning, threading can be effectively utilized to tackle a wide range of programming challenges.