### **Introduction to Python Logging**

Python's `logging` module provides a flexible framework for emitting log messages from your code. Logs are essential for understanding and debugging your program, especially in production environments or in long-running workflows such as web scraping.

---

#### **Why Use Logging?**
1. **Debugging:** Helps in tracking program execution without cluttering the code with `print()` statements.
2. **Persistence:** Logs can be saved to a file, enabling analysis after the program finishes.
3. **Control:** You can set logging levels to filter messages based on their importance.
4. **Structured Output:** With proper configuration, logs can include timestamps, severity levels, and more.

---

#### **Basic Concepts in Logging**
1. **Loggers:** The main entry point for logging. You can think of them as entities that emit log messages.
2. **Handlers:** Define where the log messages go (console, file, etc.).
3. **Levels:** Determine the severity of a log message. Common levels are:
   - `DEBUG`: Detailed information for diagnosing problems.
   - `INFO`: Confirmation that things are working as expected.
   - `WARNING`: An indication of something unexpected or an issue that isn’t critical yet.
   - `ERROR`: A more serious problem; the program could not perform some function.
   - `CRITICAL`: A very serious error; the program itself may be unable to continue running.
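
Each level name corresponds to an integer, and a logger ignores any message whose level is below its threshold. The following is a minimal sketch of that ordering; the logger name `levels_demo` is a hypothetical name chosen only for this illustration.

```python
import logging

# Level names are module constants backed by integers; higher means more severe.
levels = {name: getattr(logging, name)
          for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")}
print(levels)

# "levels_demo" is a hypothetical logger name used only for this illustration.
demo_logger = logging.getLogger("levels_demo")
demo_logger.setLevel(logging.WARNING)

# isEnabledFor() reports whether a message at that level would be processed.
print(demo_logger.isEnabledFor(logging.INFO))   # INFO is below the WARNING threshold
print(demo_logger.isEnabledFor(logging.ERROR))  # ERROR is above the WARNING threshold
```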

---

#### **Basic Logging Example**

Here’s how to get started with Python's logging module:

```python
import logging

# Set up a basic logger
logging.basicConfig(
    level=logging.DEBUG,  # Set the minimum logging level
    format='%(asctime)s - %(levelname)s - %(message)s'  # Define the log message format
)

# Example log messages
logging.debug("This is a debug message. Used for detailed diagnostic output.")
logging.info("This is an info message. Indicates the program is running as expected.")
logging.warning("This is a warning message. Something unexpected happened.")
logging.error("This is an error message. A problem occurred.")
logging.critical("This is a critical message. A serious error happened.")
```

---

#### **Output Explanation**
When you run the code, you'll see output like this:

```
2024-12-23 14:23:01,123 - DEBUG - This is a debug message. Used for detailed diagnostic output.
2024-12-23 14:23:01,124 - INFO - This is an info message. Indicates the program is running as expected.
2024-12-23 14:23:01,125 - WARNING - This is a warning message. Something unexpected happened.
2024-12-23 14:23:01,126 - ERROR - This is an error message. A problem occurred.
2024-12-23 14:23:01,127 - CRITICAL - This is a critical message. A serious error happened.
```

- **Timestamp:** Indicates when the log was recorded.
- **Log Level:** Shows the severity of the log message.
- **Message:** The custom message provided.
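
The format string can include other record attributes besides these three. As a rough sketch, the example below adds the logger name and the source line number, and writes to an in-memory stream so the result can be inspected; the logger name `format_demo` and the `|` separator are assumptions made for this illustration.

```python
import io
import logging

# An in-memory stream stands in for the console so the line can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s | %(name)s | %(levelname)s | line %(lineno)d | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
))

# "format_demo" is a hypothetical logger name used only for this illustration.
format_logger = logging.getLogger("format_demo")
format_logger.setLevel(logging.INFO)
format_logger.propagate = False  # keep the demo output out of the root logger
format_logger.addHandler(handler)

format_logger.info("Scraping page %s", 1)
line = stream.getvalue().strip()
print(line)
```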

---

#### **Key Functions**
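1. **`logging.basicConfig()`**: Configures the root logger (level, format, and destination). Once the root logger has handlers, later calls have no effect unless `force=True` is passed.
2. **Logging functions:** These module-level functions emit messages on the root logger at the given severity level: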
   - `logging.debug()`
   - `logging.info()`
   - `logging.warning()`
   - `logging.error()`
   - `logging.critical()`

In [3]:
import time
import random
import logging

In [4]:
""" 
Example of scraping process
"""
def start_scraping():
    # Scraping page 1 to 10
    for i in range(1, 11):
        print(f"Scraping page {i}")
        time.sleep(1)
        # Getting page response
        print("Scraping completed successfully.")

start_scraping()

Scraping page 1
Scraping completed successfully.
Scraping page 2
Scraping completed successfully.
Scraping page 3
Scraping completed successfully.
Scraping page 4
Scraping completed successfully.
Scraping page 5
Scraping completed successfully.
Scraping page 6
Scraping completed successfully.
Scraping page 7
Scraping completed successfully.
Scraping page 8
Scraping completed successfully.
Scraping page 9
Scraping completed successfully.
Scraping page 10
Scraping completed successfully.


In [15]:
"""
Objective: Understand the basics of Python's logging module and why it's important. 
Logging helps you monitor your program's behavior and debug issues without relying on print statements.
"""

# TODO:
# 1. Set up a basic logger that logs messages at the INFO level.
# 2. Replace the print statements in start_scraping with logging calls.
# 3. Log the following messages at the INFO level:
#    - "Scraping page " followed by the page number
#    - "Scraping completed successfully."
import logging
import time 

logging.basicConfig(level=logging.INFO, force=True)

def start_scraping():
    # Scraping page 1 to 10
    for i in range(1, 11):
        logging.info(f"Scraping page {i}")
        time.sleep(1)
        # Getting page response
        logging.info("Scraping completed successfully.")

start_scraping()

INFO:root:Scraping page 1
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 2
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 3
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 4
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 5
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 6
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 7
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 8
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 9
INFO:root:Scraping completed successfully.
INFO:root:Scraping page 10
INFO:root:Scraping completed successfully.
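
The cell above builds each message with an f-string, which is formatted even when the message ends up being filtered out. The logging functions also accept `%`-style arguments and only format them if the record is actually emitted. The sketch below illustrates the difference under a WARNING threshold; `CountingPage` and the logger name `lazy_demo` are hypothetical helpers introduced only for this example.

```python
import logging


class CountingPage:
    """Hypothetical helper that counts how often it is converted to a string."""

    def __init__(self, number):
        self.number = number
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return str(self.number)


# "lazy_demo" is a hypothetical logger name used only for this illustration.
lazy_logger = logging.getLogger("lazy_demo")
lazy_logger.setLevel(logging.WARNING)  # INFO messages are filtered out

page = CountingPage(1)

lazy_logger.info(f"Scraping page {page}")   # the f-string is built before the call
calls_after_fstring = page.str_calls

lazy_logger.info("Scraping page %s", page)  # formatting is skipped: level too low
calls_after_lazy = page.str_calls - calls_after_fstring

print(calls_after_fstring, calls_after_lazy)
```

For short messages the cost is negligible, so this is a matter of habit rather than a required change.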


In [17]:
""" 
Objective: Set up different log levels
"""

# TODO: Set up a logger that only logs error messages

logging.basicConfig(level=logging.ERROR, force=True)

def scraping_with_error_response():
    response_code = [200, 200, 200, 200, 200, 404, 503]

    # Scraping page 1 to 10
    for i in range(1,11):
        # TODO: Add log message for tracking page number
        logging.info(f"Scraping page {i}")

        time.sleep(1)

        # Getting page response
        response = random.choice(response_code)
        
        if response == 200:   
            # TODO: Add log message for valid response
            logging.info("valid response")

        else:
            # TODO: Add log message for invalid response
            logging.error("invalid response")


scraping_with_error_response()

ERROR:root:invalid response
ERROR:root:invalid response
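
The cells in this notebook pass `force=True` because `basicConfig()` does nothing when the root logger already has a handler, which is the usual situation after the first cell has run. A small sketch of that behaviour, assuming Python 3.8 or later (where `force` is available):

```python
import logging

root = logging.getLogger()

# force=True removes any existing root handlers before configuring.
logging.basicConfig(level=logging.INFO, force=True)
level_first = root.level

# Ignored: the root logger already has a handler, so nothing changes.
logging.basicConfig(level=logging.ERROR)
level_ignored = root.level

# With force=True the old handler is removed and the new level applies.
logging.basicConfig(level=logging.ERROR, force=True)
level_forced = root.level

print(level_first, level_ignored, level_forced)
```

Note that this sketch leaves the root logger at the ERROR level, the same state as the exercise cell above.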


In [23]:
"""
Objective: Learn to configure logging to log messages to a file for persistent records. 
This is useful for analyzing scraping sessions or debugging after the program runs.
"""

# TODO:
# 1. Configure logging to log messages at the DEBUG level to a file named `scraper.log`.
# 2. Add timestamps to the log messages.
# 3. Reuse the previous function for this task.

logging.basicConfig(
    level=logging.DEBUG,
    filename="scraper.log",
    force=True, 
    format="[%(asctime)s] %(levelname)s \t- %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

def scraping_with_error_response():
    response_code = [200, 200, 200, 200, 200, 404, 503]

    # Scraping page 1 to 10
    for i in range(1,11):
        # TODO: Add log message for tracking page number
        logging.info(f"Scraping page {i}")

        time.sleep(1)

        # Getting page response
        response = random.choice(response_code)
        
        if response == 200:   
            # TODO: Add log message for valid response
            logging.info(f"valid response, response code: {response}")

        else:
            # TODO: Add log message for invalid response
            logging.error(f"invalid response, response code: {response}")


scraping_with_error_response()
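
Because this cell writes to a file, nothing appears in the notebook output. The sketch below shows one way to check what was written. It logs to a temporary path (an assumption made so the example does not touch `scraper.log`) and uses `filemode="w"` to overwrite the file on each run instead of appending, which is the default.

```python
import logging
import os
import tempfile

# Temporary path used only for this illustration (the exercise writes to scraper.log).
log_path = os.path.join(tempfile.mkdtemp(), "scraper_demo.log")

logging.basicConfig(
    level=logging.DEBUG,
    filename=log_path,
    filemode="w",  # overwrite on each run; the default "a" appends
    force=True,
    format="[%(asctime)s] %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

logging.info("valid response, response code: %s", 200)
logging.error("invalid response, response code: %s", 404)

# The file handler flushes after every record, so the file can be read right away.
with open(log_path, encoding="utf-8") as log_file:
    contents = log_file.read()
print(contents)
```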


In [None]:
"""
Objective: Apply logging to a full scraping workflow and use different logging levels for various stages.
This will help you monitor and troubleshoot scraping operations more effectively.
"""

# TODO:
# 1. Write a script that:
#    - Logs INFO when scraping starts.
#    - Logs DEBUG for each URL being processed.
#    - Logs ERROR if a request fails.
#    - Logs INFO when scraping ends.
# 2. Scrape data from multiple URLs, including one invalid URL to test the error logging.

import logging
import requests

logging.basicConfig(
    level=logging.DEBUG, 
    force=True, 
    filename="scrape_link.log",
    datefmt="%Y-%m-%d %H:%M:%S",
    format="[%(asctime)s] %(levelname)s \t- %(message)s"
)

def scrape_links(links):
    logging.info("Scraping started.")
    
    for link in links:
        logging.debug(f"Processing link: {link}")
        try:
            response = requests.get(link, timeout=10)  # Time out instead of waiting forever on an unresponsive server
            response.raise_for_status()
            logging.info(f"Successfully scraped link: {link}")
        except requests.exceptions.RequestException as e:
            logging.error(f"Failed to scrape link: {link} - {e}")

    logging.info("Scraping completed.")

links = [
    "https://www.google.com",  # Valid
    "https://www.github.com",  # Valid
    "https://www.python.org",  # Valid
    "https://www.openai.com",  # Valid
    "https://example.com",  # Valid
    "htp://invalid-url.com",  # Invalid (typo in "http")
    "https://invalid domain.com",  # Invalid (space in domain)
    "ftp://files.example.com",  # Valid but not an HTTP link
    "http://256.256.256.256",  # Invalid (IP out of range)
    "https://thiswebsitedoesnotexist.xyz"  # Likely Invalid (random domain)
]

scrape_links(links)
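
The `except` block above logs only the exception message. When the full traceback is useful, `Logger.exception()` logs at the ERROR level and appends it automatically. The sketch below uses a stand-in `fetch` function that always fails, so it runs without network access; `fetch` and the logger name `traceback_demo` are hypothetical and not part of the exercise.

```python
import io
import logging

# An in-memory stream stands in for the log file so the result can be inspected.
stream = io.StringIO()
trace_logger = logging.getLogger("traceback_demo")  # hypothetical logger name
trace_logger.setLevel(logging.DEBUG)
trace_logger.propagate = False
trace_logger.addHandler(logging.StreamHandler(stream))


def fetch(link):
    """Stand-in for a real request; always fails so the except branch runs."""
    raise ConnectionError(f"cannot reach {link}")


try:
    fetch("https://example.com")
except ConnectionError:
    # exception() logs at ERROR level and appends the current traceback.
    trace_logger.exception("Failed to scrape link: %s", "https://example.com")

output = stream.getvalue()
print(output)
```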

In [37]:
"""
Objective: Explore advanced logging by using custom handlers to log messages to multiple destinations. 
This technique improves flexibility in handling log output.
"""
# Create handlers
console_handler = logging.StreamHandler()  # Shows log messages in the console
console_handler.setLevel(logging.DEBUG)

# TODO: 
# 1. Create another handler for storing log in a file using logging.FileHandler('error.log')
# 2. Set the level to DEBUG
file_handler = logging.FileHandler('error.log')
file_handler.setLevel(logging.DEBUG)

# Create formatter
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

# Attach formatter to handlers
console_handler.setFormatter(formatter)
# TODO: Add formatter to the file handler
file_handler.setFormatter(formatter)


# Create logger and attach handlers
logger = logging.getLogger('ScraperLogger')
logger.setLevel(logging.DEBUG)
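logger.handlers.clear()  # Remove handlers left over from earlier runs of this cell so messages are not duplicated
logger.addHandler(console_handler)  # Attach the stream handler to the logger object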
# TODO: Attach the file handler into the logger object
logger.addHandler(file_handler)
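
Each handler has its own level, so one logger can send everything to the console while a file receives only errors. The following sketch illustrates that split. It uses an in-memory stream in place of the console and a temporary file in place of `error.log` so the result can be inspected; those stand-ins and the logger name `split_demo` are assumptions made for this example.

```python
import io
import logging
import os
import tempfile

# Stand-ins so the result can be inspected: a string buffer for the console
# and a temporary file instead of error.log.
console_stream = io.StringIO()
error_path = os.path.join(tempfile.mkdtemp(), "error_demo.log")

console = logging.StreamHandler(console_stream)
console.setLevel(logging.DEBUG)      # console receives everything
errors_only = logging.FileHandler(error_path)
errors_only.setLevel(logging.ERROR)  # file receives ERROR and CRITICAL only

split_logger = logging.getLogger("split_demo")  # hypothetical logger name
split_logger.setLevel(logging.DEBUG)
split_logger.propagate = False
split_logger.handlers.clear()  # avoid stacking handlers if the cell is re-run
split_logger.addHandler(console)
split_logger.addHandler(errors_only)

split_logger.info("valid response")
split_logger.error("invalid response")

errors_only.close()
with open(error_path, encoding="utf-8") as error_file:
    file_lines = error_file.read().splitlines()
console_lines = console_stream.getvalue().splitlines()
print(console_lines, file_lines)
```

The logger's own level is checked first, so it has to be at least as permissive as the most verbose handler.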


In [38]:
""" 
Objective: Handling failed requests using logging
"""
# TODO:
# 1. Create a function that loops through page numbers and gets a random response,
# just like the previous code, but modify it as you like
# 2. Send the log stream to the console and the error log to a file
# 3. Write all failed URLs to a file so you can retry them later
# 4. Automate the process (optional)

def scrape_links(links):
    logger.info("Scraping started.")
    
    for link in links:
        logger.debug(f"Processing link: {link}")
        try:
            response = requests.get(link, timeout=10)  # Time out instead of waiting forever on an unresponsive server
            response.raise_for_status()
            logger.info(f"Successfully scraped link: {link}")
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to scrape link: {link} - {e}")

    logger.info("Scraping completed.")

links = [
    "https://www.google.com",  # Valid
    "https://www.github.com",  # Valid
    "https://www.python.org",  # Valid
    "https://www.openai.com",  # Valid
    "https://example.com",  # Valid
    "htp://invalid-url.com",  # Invalid (typo in "http")
    "https://invalid domain.com",  # Invalid (space in domain)
    "ftp://files.example.com",  # Valid but not an HTTP link
    "http://256.256.256.256",  # Invalid (IP out of range)
    "https://thiswebsitedoesnotexist.xyz"  # Likely Invalid (random domain)
]

scrape_links(links)

2025-03-15 13:15:46,952 - INFO - Scraping started.
2025-03-15 13:15:46,967 - DEBUG - Processing link: https://www.google.com
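
The third TODO item, a file of failed URLs to retry, could be approached roughly as follows. This is only a sketch: `scrape_and_record_failures`, `fake_fetch`, the logger name `retry_demo`, and the output file name are all hypothetical, and the fetch step is passed in as a function so the example runs without network access. In the exercise, the fetch step would be the `requests.get(...)` and `raise_for_status()` calls.

```python
import logging
import os
import tempfile

retry_logger = logging.getLogger("retry_demo")  # hypothetical logger name


def scrape_and_record_failures(links, fetch, failed_path):
    """Try every link; write the ones that fail to failed_path, one per line."""
    failed = []
    for link in links:
        try:
            fetch(link)
            retry_logger.info("Successfully scraped link: %s", link)
        except Exception as error:  # broad on purpose: the fetcher is pluggable
            retry_logger.error("Failed to scrape link: %s - %s", link, error)
            failed.append(link)
    with open(failed_path, "w", encoding="utf-8") as failed_file:
        failed_file.writelines(f"{link}\n" for link in failed)
    return failed


def fake_fetch(link):
    """Stand-in for a real HTTP request; rejects anything that is not https."""
    if not link.startswith("https://"):
        raise ValueError("unsupported scheme")


# Temporary location used only for this illustration.
failed_path = os.path.join(tempfile.mkdtemp(), "failed_urls.txt")
failed = scrape_and_record_failures(
    ["https://example.com", "htp://invalid-url.com"], fake_fetch, failed_path
)
print(failed)
```

Reading the file back and passing its lines to the same function would give a simple retry loop.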

### **Reflection**
In what situations will logging help you the most?

Logging helps most when debugging, troubleshooting, and monitoring application performance.

### **Exploration**
Explore advanced logging and monitoring tools like:
- Loguru
- Loggly
- Datadog