### **Introduction to Python Logging**

The `logging` module in Python provides a flexible framework for emitting log messages from your code. Logs are essential for understanding and debugging a program, especially in production environments or in long-running jobs such as web scrapers.

---

#### **Why Use Logging?**
1. **Debugging:** Helps in tracking program execution without cluttering the code with `print()` statements.
2. **Persistence:** Logs can be saved to a file, enabling analysis after the program finishes.
3. **Control:** You can set logging levels to filter messages based on their importance.
4. **Structured Output:** With proper configuration, logs can include timestamps, severity levels, and more.

---

#### **Basic Concepts in Logging**
1. **Loggers:** The main entry point for logging. You can think of them as entities that emit log messages.
2. **Handlers:** Define where the log messages go (console, file, etc.).
3. **Levels:** Determine the severity of a log message. Common levels are:
   - `DEBUG`: Detailed information for diagnosing problems.
   - `INFO`: Confirmation that things are working as expected.
   - `WARNING`: An indication of something unexpected or an issue that isn’t critical yet.
   - `ERROR`: A more serious problem; the program could not perform some function.
   - `CRITICAL`: A very serious error, often indicating a program crash.
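
The three concepts above can be sketched together in a few lines. This is a minimal illustration, not part of the original exercises: the logger name `concepts_demo` is hypothetical, and an in-memory stream stands in for the console so the result can be inspected.

```python
import io
import logging

# In-memory stream so the example does not depend on the console.
stream = io.StringIO()

# Handler: decides where records go (here, the in-memory stream).
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

# Logger: the object your code calls. "concepts_demo" is a hypothetical name.
demo_logger = logging.getLogger("concepts_demo")
demo_logger.setLevel(logging.WARNING)  # Level: drop anything below WARNING
demo_logger.addHandler(handler)
demo_logger.propagate = False          # keep records away from the root logger

demo_logger.info("not emitted")        # below WARNING, so it is filtered out
demo_logger.warning("disk almost full")
demo_logger.error("write failed")

captured = stream.getvalue().splitlines()
print(captured)  # ['WARNING:disk almost full', 'ERROR:write failed']
```

The `INFO` call produces nothing because the logger's level is `WARNING`; the handler only ever sees the two records that pass that threshold.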

---

#### **Basic Logging Example**

Here’s how to get started with Python's logging module:

```python
import logging

# Set up a basic logger
logging.basicConfig(
    level=logging.DEBUG,  # Set the minimum logging level
    format='%(asctime)s - %(levelname)s - %(message)s'  # Define the log message format
)

# Example log messages
logging.debug("This is a debug message. Used for detailed diagnostic output.")
logging.info("This is an info message. Indicates the program is running as expected.")
logging.warning("This is a warning message. Something unexpected happened.")
logging.error("This is an error message. A problem occurred.")
logging.critical("This is a critical message. A serious error happened.")
```

---

#### **Output Explanation**
When you run the code, you'll see output like this:

```
2024-12-23 14:23:01,123 - DEBUG - This is a debug message. Used for detailed diagnostic output.
2024-12-23 14:23:01,124 - INFO - This is an info message. Indicates the program is running as expected.
2024-12-23 14:23:01,125 - WARNING - This is a warning message. Something unexpected happened.
2024-12-23 14:23:01,126 - ERROR - This is an error message. A problem occurred.
2024-12-23 14:23:01,127 - CRITICAL - This is a critical message. A serious error happened.
```

- **Timestamp:** Indicates when the log was recorded.
- **Log Level:** Shows the severity of the log message.
- **Message:** The custom message provided.
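
To see how the format string maps onto those three fields, the sketch below formats a single hand-built record and splits the result back apart. The `datefmt` argument and the record's name and path are assumptions added for illustration; the example above relies on the default timestamp, which also includes milliseconds.

```python
import logging

# Same field layout as the example above, with an explicit date format.
formatter = logging.Formatter(
    "%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Build a record by hand instead of going through a logger.
# "format_demo" and "demo.py" are placeholder values.
record = logging.LogRecord(
    name="format_demo",
    level=logging.WARNING,
    pathname="demo.py",
    lineno=1,
    msg="Something unexpected happened.",
    args=None,
    exc_info=None,
)

line = formatter.format(record)

# Split on the first two separators: timestamp, level, message.
timestamp, level, message = line.split(" - ", 2)
print(level)    # WARNING
print(message)  # Something unexpected happened.
```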

---

#### **Key Functions**
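1. **`logging.basicConfig()`**: Configures the root logger (level, format, and destination). If the root logger already has handlers, the call does nothing unless you pass `force=True` (Python 3.8+).
2. **Logging functions:** These module-level functions emit a message on the root logger at the given severity level: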
   - `logging.debug()`
   - `logging.info()`
   - `logging.warning()`
   - `logging.error()`
   - `logging.critical()`
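
One behaviour of `logging.basicConfig()` matters a lot in notebooks, where it is often called in several cells: a second call is ignored once the root logger has a handler. The sketch below shows this with a hypothetical helper, `root_level_after_reconfig`; note that it resets and reconfigures the root logger as a side effect.

```python
import logging


def root_level_after_reconfig(force: bool) -> int:
    """Call basicConfig() twice and return the root logger's final level."""
    root = logging.getLogger()
    # Start from an unconfigured root logger so the demo is repeatable.
    for existing in list(root.handlers):
        root.removeHandler(existing)

    logging.basicConfig(level=logging.INFO)                # first call configures
    logging.basicConfig(level=logging.ERROR, force=force)  # second call
    return root.level


print(logging.getLevelName(root_level_after_reconfig(force=False)))  # INFO
print(logging.getLevelName(root_level_after_reconfig(force=True)))   # ERROR
```

Without `force=True` the second call changes nothing, so the level stays at `INFO`. With `force=True` (Python 3.8+) the existing handlers are removed first and the new settings apply.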

In [2]:
import time
import random
import logging

In [3]:
""" 
Example of scraping process
"""
def start_scraping():
    # Scraping page 1 to 10
    for i in range(1, 11):
        print(f"Scraping page {i}")
        time.sleep(1)
        # Getting page response
        print("Scraping completed successfully.")

start_scraping()

Scraping page 1
Scraping completed successfully.
Scraping page 2
Scraping completed successfully.
Scraping page 3
Scraping completed successfully.
Scraping page 4
Scraping completed successfully.
Scraping page 5
Scraping completed successfully.
Scraping page 6
Scraping completed successfully.
Scraping page 7
Scraping completed successfully.
Scraping page 8
Scraping completed successfully.
Scraping page 9
Scraping completed successfully.
Scraping page 10
Scraping completed successfully.


In [4]:
"""
Objective: Understand the basics of Python's logging module and why it's important. 
Logging helps you monitor your program's behavior and debug issues without relying on print statements.
"""

# TODO:
# 1. Set up a basic logger that logs messages at the INFO level.
# 2. Replace the print statements in start_scraping with logging calls.
# 3. Log the following messages at the INFO level:
#    - "Scraping page " followed by the page number
#    - "Scraping completed successfully."

# Set up basic logger
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

def start_scraping():
    # Scraping page 1 to 10
    for i in range(1, 11):
        logging.info(f"Scraping page {i}")
        time.sleep(1)
        # Getting page response
        logging.info("Scraping completed successfully.")

start_scraping()


2025-03-09 15:57:02,297 - INFO - Scraping page 1
2025-03-09 15:57:03,300 - INFO - Scraping completed successfully.
2025-03-09 15:57:03,301 - INFO - Scraping page 2
2025-03-09 15:57:04,302 - INFO - Scraping completed successfully.
2025-03-09 15:57:04,305 - INFO - Scraping page 3
2025-03-09 15:57:05,308 - INFO - Scraping completed successfully.
2025-03-09 15:57:05,310 - INFO - Scraping page 4
2025-03-09 15:57:06,314 - INFO - Scraping completed successfully.
2025-03-09 15:57:06,315 - INFO - Scraping page 5
2025-03-09 15:57:07,317 - INFO - Scraping completed successfully.
2025-03-09 15:57:07,323 - INFO - Scraping page 6
2025-03-09 15:57:08,327 - INFO - Scraping completed successfully.
2025-03-09 15:57:08,328 - INFO - Scraping page 7
2025-03-09 15:57:09,330 - INFO - Scraping completed successfully.
2025-03-09 15:57:09,334 - INFO - Scraping page 8
2025-03-09 15:57:10,342 - INFO - Scraping completed successfully.
2025-03-09 15:57:10,346 - INFO - Scraping page 9
2025-03-09 15:57:11,349 - INFO 

In [6]:
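""" 
Objective: Set up different log levels
"""

# TODO: Set up a logger that only logs error messages
# basicConfig() does nothing if the root logger already has handlers, and an
# earlier cell configured it at INFO, so force=True (Python 3.8+) is needed
# for the ERROR level to take effect. The output recorded below still contains
# INFO lines, which is what a run without force=True produces.
logging.basicConfig(
    level=logging.ERROR,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True
)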


def scraping_with_error_response():
    response_code = [200, 200, 200, 200, 200, 404, 503]

    # Scraping page 1 to 10
    for i in range(1,11):
        # TODO: Add log message for tracking page number
        logging.debug(f"Scraping page {i}")

        time.sleep(1)

        # Getting page response
        response = random.choice(response_code)
        
        if response == 200:   
            # TODO: Add log message for valid response
            logging.info(f"Page {i} scraped successfully. Status code: {response}")


        else:
            # TODO: Add log message for invalid response
            logging.error(f"Failed to scrape page {i}. Status code: {response}")



scraping_with_error_response()

2025-03-09 16:04:05,462 - INFO - Page 1 scraped successfully. Status code: 200
2025-03-09 16:04:06,465 - INFO - Page 2 scraped successfully. Status code: 200
2025-03-09 16:04:07,470 - INFO - Page 3 scraped successfully. Status code: 200
2025-03-09 16:04:08,472 - ERROR - Failed to scrape page 4. Status code: 404
2025-03-09 16:04:09,477 - INFO - Page 5 scraped successfully. Status code: 200
2025-03-09 16:04:10,481 - INFO - Page 6 scraped successfully. Status code: 200
2025-03-09 16:04:11,483 - INFO - Page 7 scraped successfully. Status code: 200
2025-03-09 16:04:12,488 - INFO - Page 8 scraped successfully. Status code: 200
2025-03-09 16:04:13,492 - ERROR - Failed to scrape page 9. Status code: 503
2025-03-09 16:04:14,497 - ERROR - Failed to scrape page 10. Status code: 404
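
Level filtering is a numeric comparison: each level name maps to an integer, and a logger emits a record only if the record's level is at least the logger's threshold. A short sketch, using a hypothetical logger named `level_demo`:

```python
import logging

levels = [logging.DEBUG, logging.INFO, logging.WARNING,
          logging.ERROR, logging.CRITICAL]

# Map each level name to its numeric value.
numeric = {logging.getLevelName(lvl): lvl for lvl in levels}
print(numeric)
# {'DEBUG': 10, 'INFO': 20, 'WARNING': 30, 'ERROR': 40, 'CRITICAL': 50}

# A logger with an ERROR threshold ("level_demo" is a hypothetical name).
level_demo = logging.getLogger("level_demo")
level_demo.setLevel(logging.ERROR)

# isEnabledFor() reports whether a record at that level would be processed.
enabled = {name: level_demo.isEnabledFor(lvl) for name, lvl in numeric.items()}
print(enabled)
# {'DEBUG': False, 'INFO': False, 'WARNING': False, 'ERROR': True, 'CRITICAL': True}
```

With the threshold at `ERROR`, the `logging.debug()` and `logging.info()` calls in a scraping loop are dropped and only failures are recorded.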


In [7]:
"""
Objective: Learn to configure logging to log messages to a file for persistent records. 
This is useful for analyzing scraping sessions or debugging after the program runs.
"""

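# TODO:
# 1. Configure logging to log messages at the DEBUG level to a file named `scraper.log`.
# 2. Add timestamps to the log messages.
# 3. Reuse the previous function for this task.

# Configure logging to write to a file with DEBUG level and timestamps.
# force=True (Python 3.8+) replaces the configuration from the earlier cells;
# without it this call is ignored and records keep going to the console, as in
# the output recorded below (note that it has no DEBUG lines).
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename='scraper.log',
    filemode='w',
    force=True
)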

def scraping_with_error_response():
    response_code = [200, 200, 200, 200, 200, 404, 503]

    # Scraping page 1 to 10
    for i in range(1,11):
        logging.debug(f"Scraping page {i}")
        time.sleep(1)

        # Getting page response
        response = random.choice(response_code)
        
        if response == 200:   
            logging.info(f"Page {i} scraped successfully")
        else:
            logging.error(f"Failed to scrape page {i}. Status code: {response}")

scraping_with_error_response()


2025-03-09 16:06:02,414 - INFO - Page 1 scraped successfully
2025-03-09 16:06:03,416 - INFO - Page 2 scraped successfully
2025-03-09 16:06:04,418 - ERROR - Failed to scrape page 3. Status code: 404
2025-03-09 16:06:05,423 - ERROR - Failed to scrape page 4. Status code: 503
2025-03-09 16:06:06,427 - INFO - Page 5 scraped successfully
2025-03-09 16:06:07,430 - ERROR - Failed to scrape page 6. Status code: 503
2025-03-09 16:06:08,433 - INFO - Page 7 scraped successfully
2025-03-09 16:06:09,434 - INFO - Page 8 scraped successfully
2025-03-09 16:06:10,436 - INFO - Page 9 scraped successfully
2025-03-09 16:06:11,439 - INFO - Page 10 scraped successfully
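
An alternative to reconfiguring the root logger is to attach a `logging.FileHandler` to a named logger, which works no matter what earlier cells did. The sketch below is a minimal illustration under assumed names (`file_demo`, `scraper_demo.log`) and writes to a temporary directory so it can be run safely anywhere.

```python
import logging
import os
import tempfile

# Write to a throwaway directory; the file name is a placeholder.
log_path = os.path.join(tempfile.mkdtemp(), "scraper_demo.log")

file_logger = logging.getLogger("file_demo")  # hypothetical logger name
file_logger.setLevel(logging.DEBUG)
file_logger.propagate = False                 # do not echo to the root logger

file_handler_demo = logging.FileHandler(log_path, mode="w", encoding="utf-8")
file_handler_demo.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
file_logger.addHandler(file_handler_demo)

file_logger.debug("Scraping page 1")
file_logger.error("Failed to scrape page 1. Status code: 503")

# Flush and release the file before reading it back.
file_handler_demo.close()
file_logger.removeHandler(file_handler_demo)

with open(log_path, encoding="utf-8") as f:
    file_lines = f.read().splitlines()

print(file_lines)
# ['DEBUG - Scraping page 1', 'ERROR - Failed to scrape page 1. Status code: 503']
```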


In [8]:
"""
Objective: Apply logging to a full scraping workflow and use different logging levels for various stages.
This will help you monitor and troubleshoot scraping operations more effectively.
"""

# TODO:
# 1. Write a script that:
#    - Logs INFO when scraping starts.
#    - Logs DEBUG for each URL being processed.
#    - Logs ERROR if a request fails.
#    - Logs INFO when scraping ends.
# 2. Scrape data from multiple URLs, including one invalid URL to test the error logging.

import logging
import requests
from typing import List

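# Configure logging
# force=True (Python 3.8+) is needed because the root logger was already
# configured in an earlier cell; without it this call is ignored, which is
# why the output recorded below has no DEBUG lines.
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True
)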

def scrape_urls(urls: List[str]) -> None:
    logging.info("Starting web scraping process")
    
    for url in urls:
        logging.debug(f"Processing URL: {url}")
        
        try:
            # A timeout keeps one unresponsive host from stalling the whole run
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            logging.info(f"Successfully scraped {url}")
            
        except requests.RequestException as e:
            logging.error(f"Failed to scrape {url}: {e}")
            
    logging.info("Web scraping process completed")

# Test URLs (including one invalid)
urls_to_scrape = [
    "https://www.python.org",
    "https://www.github.com",
    "https://this-is-invalid-url.com",  # Invalid URL
    "https://www.wikipedia.org"
]

# Run the scraper
scrape_urls(urls_to_scrape)


2025-03-09 16:07:49,950 - INFO - Starting web scraping process
2025-03-09 16:07:50,034 - INFO - Successfully scraped https://www.python.org
2025-03-09 16:07:50,285 - INFO - Successfully scraped https://www.github.com
2025-03-09 16:07:51,031 - ERROR - Failed to scrape https://this-is-invalid-url.com: HTTPSConnectionPool(host='this-is-invalid-url.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001F941921370>: Failed to resolve 'this-is-invalid-url.com' ([Errno 11001] getaddrinfo failed)"))
2025-03-09 16:07:51,138 - INFO - Successfully scraped https://www.wikipedia.org
2025-03-09 16:07:51,140 - INFO - Web scraping process completed
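
When the error message alone is not enough to diagnose a failure, `Logger.exception()` logs at the `ERROR` level and appends the full traceback. This is an optional variation on the workflow above, sketched with a hypothetical `parse_price` helper and an in-memory stream; it also uses `%`-style arguments, which the logging module formats only if the record is actually emitted.

```python
import io
import logging

stream = io.StringIO()

exc_logger = logging.getLogger("exception_demo")  # hypothetical logger name
exc_logger.setLevel(logging.DEBUG)
exc_logger.propagate = False
exc_logger.addHandler(logging.StreamHandler(stream))


def parse_price(text: str) -> float:
    """Hypothetical parsing step that fails on non-numeric input."""
    return float(text)


try:
    parse_price("N/A")
except ValueError:
    # Must be called from an except block; the traceback is added automatically.
    exc_logger.exception("Could not parse price %r", "N/A")

output = stream.getvalue()
print(output)
```

The captured text starts with the message and is followed by the `Traceback (most recent call last):` block ending in the `ValueError`.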


In [11]:
"""
Objective: Explore advanced logging by using custom handlers to log messages to multiple destinations. 
This technique improves flexibility in handling log output.
"""
# Create handlers
console_handler = logging.StreamHandler() # Shows log messages in the console
console_handler.setLevel(logging.DEBUG)

# TODO: 
# 1. Create another handler for storing log in a file using logging.FileHandler('error.log')
# 2. Set the level to DEBUG
file_handler = logging.FileHandler('error.log')
file_handler.setLevel(logging.DEBUG)

# Create formatter
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

# Attach formatter to handlers
console_handler.setFormatter(formatter)
# TODO: Add formatter to the file handler
file_handler.setFormatter(formatter)



# Create logger and attach handlers
logger = logging.getLogger('ScraperLogger')
logger.setLevel(logging.DEBUG)
logger.addHandler(console_handler) # Attach the stream handler to the logger object
# TODO: Attach the file handler to the logger object
logger.addHandler(file_handler)
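
Each handler can apply its own level on top of the logger's level, which is what makes "everything to the console, only errors to a file" possible. The sketch below illustrates the idea with two in-memory streams standing in for the console and the file; all names in it are hypothetical.

```python
import io
import logging

console_stream = io.StringIO()  # stands in for the console
error_stream = io.StringIO()    # stands in for error.log

all_handler = logging.StreamHandler(console_stream)
all_handler.setLevel(logging.DEBUG)   # accepts every record

errors_only = logging.StreamHandler(error_stream)
errors_only.setLevel(logging.ERROR)   # accepts ERROR and CRITICAL only

fmt = logging.Formatter("%(levelname)s - %(message)s")
all_handler.setFormatter(fmt)
errors_only.setFormatter(fmt)

multi = logging.getLogger("multi_handler_demo")
multi.setLevel(logging.DEBUG)  # the logger itself lets everything through
multi.propagate = False
multi.addHandler(all_handler)
multi.addHandler(errors_only)

multi.debug("Scraping page 1")
multi.info("Page 1 scraped successfully")
multi.error("Failed to scrape page 2. Status code: 404")

console_lines = console_stream.getvalue().splitlines()
error_lines = error_stream.getvalue().splitlines()
print(len(console_lines), len(error_lines))  # 3 1
```

A record must pass the logger's level first and then each handler's level, so the logger should be set to the lowest level any handler needs.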


In [12]:
""" 
Objective: Handling failed requests using logging
"""
# TODO:
# 1. Create a function that loops through a sequence (page numbers or URLs) and gets a random response,
# just like the previous code, but modify it as you like
# 2. Send the stream log to the console and the error log to a file
# 3. Write all failed URLs to a file so you can retry them later
# 4. Automate the process (optional)

import logging
import random
import time
from typing import List

# Configure logging handlers
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)

file_handler = logging.FileHandler('error.log')
file_handler.setLevel(logging.ERROR)

# Create formatter
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
console_handler.setFormatter(formatter)
file_handler.setFormatter(formatter)

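# Setup logger
logger = logging.getLogger('ScrapeLogger')
logger.setLevel(logging.DEBUG)
# Stop records from also reaching the root logger's handler (added by the
# earlier basicConfig() calls). Without this, each message is printed twice,
# which is a likely cause of the doubled lines in the output recorded below.
logger.propagate = False
logger.handlers.clear()  # avoid stacking handlers if this cell is re-run
logger.addHandler(console_handler)
logger.addHandler(file_handler)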

def scrape_with_retry(urls: List[str], max_retries: int = 3) -> None:
    failed_urls = []
    
    for url in urls:
        success = False
        retries = 0
        
        while not success and retries < max_retries:
            logger.info(f"Attempting to scrape {url}")
            response = random.choice([200, 404, 503])
            
            if response == 200:
                logger.info(f"Successfully scraped {url}")
                success = True
            else:
                retries += 1
                logger.error(f"Failed to scrape {url} (Status: {response}, Attempt: {retries})")
                time.sleep(1)  # Wait before retry
        
        if not success:
            failed_urls.append(url)
            
    # Save failed URLs to file
    if failed_urls:
        with open('failed_urls.txt', 'w') as f:
            for url in failed_urls:
                f.write(f"{url}\n")
        logger.info(f"Saved {len(failed_urls)} failed URLs to failed_urls.txt")

# Test URLs
test_urls = [
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
    "https://example4.com",
    "https://example5.com"
]

# Run scraper
scrape_with_retry(test_urls)

2025-03-09 16:12:52,213 - INFO - Attempting to scrape https://example1.com
2025-03-09 16:12:52,213 - INFO - Attempting to scrape https://example1.com
2025-03-09 16:12:52,217 - ERROR - Failed to scrape https://example1.com (Status: 404, Attempt: 1)
2025-03-09 16:12:52,217 - ERROR - Failed to scrape https://example1.com (Status: 404, Attempt: 1)
2025-03-09 16:12:53,224 - INFO - Attempting to scrape https://example1.com
2025-03-09 16:12:53,224 - INFO - Attempting to scrape https://example1.com
2025-03-09 16:12:53,228 - INFO - Successfully scraped https://example1.com
2025-03-09 16:12:53,228 - INFO - Successfully scraped https://example1.com
2025-03-09 16:12:53,233 - INFO - Attempting to scrape https://example2.com
2025-03-09 16:12:53,233 - INFO - Attempting to scrape https://example2.com
2025-03-09 16:12:53,237 - ERROR - Failed to scrape https://example2.com (Status: 503, Attempt: 1)
2025-03-09 16:12:53,237 - ERROR - Failed to scrape https://example2.com (Status: 503, Attempt: 1)
2025-03-
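
Doubled lines like the ones above typically come from propagation: a record handled by a named logger is also passed up to its ancestors' handlers. The sketch below reproduces the effect with a parent and child logger and a hypothetical `CountingHandler` that simply counts the records it receives.

```python
import logging


class CountingHandler(logging.Handler):
    """Hypothetical handler that counts records instead of printing them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


parent = logging.getLogger("dupdemo")
child = logging.getLogger("dupdemo.scraper")
parent.setLevel(logging.INFO)
parent.propagate = False  # keep the demo isolated from the root logger

parent_handler = CountingHandler()
child_handler = CountingHandler()
parent.addHandler(parent_handler)
child.addHandler(child_handler)

# With propagation on (the default), both handlers receive the record.
child.info("Attempting to scrape")
handled_with_propagation = parent_handler.count + child_handler.count

# With propagation off, only the child's own handler receives it.
child.propagate = False
child.info("Attempting to scrape")
handled_without_propagation = (
    parent_handler.count + child_handler.count - handled_with_propagation
)

print(handled_with_propagation, handled_without_propagation)  # 2 1
```

In a notebook where `logging.basicConfig()` has already given the root logger a console handler, adding a second console handler to a named logger has the same effect unless `propagate` is set to `False`.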

### **Reflection**
In which situations does logging help the most?

Logging is particularly helpful in several key situations:

1. **Debugging Complex Applications**
   - Tracking program flow in multi-threaded applications
   - Understanding the sequence of events leading to errors
   - Monitoring state changes in long-running processes
2. **Production Environment Monitoring**
   - Tracking application performance
   - Identifying bottlenecks
   - Monitoring system health
   - Early detection of potential issues
3. **Web Scraping Projects**
   - Tracking successful/failed requests
   - Monitoring rate limits
   - Recording data extraction progress
   - Managing retry mechanisms
4. **Distributed Systems**
   - Tracing requests across multiple services
   - Debugging communication between components
   - Monitoring system synchronization
5. **Data Processing Pipelines**
   - Tracking progress of long-running jobs
   - Monitoring data transformation steps
   - Recording processing errors
   - Validating data quality
6. **User Behavior Analysis**
   - Recording user interactions
   - Tracking feature usage
   - Monitoring error patterns
   - Analyzing performance impact
7. **Automated Systems**
   - Recording scheduled task execution
   - Monitoring background processes
   - Tracking automated decisions
   - Validating system responses
8. **Security Monitoring**
   - Recording access attempts
   - Tracking unauthorized activities
   - Monitoring system changes
   - Logging security events

The structured nature of logging, combined with different severity levels and persistent storage, makes it invaluable for maintaining and troubleshooting complex systems.

### **Exploration**
Explore advanced logging and monitoring tools like:
- Loguru
- Loggly
- Datadog

Here's an exploration of these advanced logging and monitoring tools:

1. Loguru

In [14]:
# Install Loguru first:
# pip install loguru

from loguru import logger

# Basic setup
logger.add("app.log", rotation="500 MB", retention="10 days")

# @logger.catch logs any exception that escapes the decorated function
@logger.catch
def process_data():
    logger.info("Starting data processing")
    try:
        result = 1 / 0  # Intentional error
    except Exception as e:
        logger.error(f"Processing failed: {e}")
    logger.success("Processing completed")  # still runs: the error above was handled

process_data()

2025-03-09 16:19:59.392 | INFO     | __main__:process_data:12 - Starting data processing
2025-03-09 16:19:59.395 | ERROR    | __main__:process_data:16 - Processing failed: division by zero
2025-03-09 16:19:59.397 | SUCCESS  | __main__:process_data:17 - Processing completed


Key Features:

- Better formatting out of the box
- Automatic exception catching
- Log rotation and retention
- Asynchronous logging
- Structured logging support
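
For comparison, size-based rotation is also available in the standard library through `logging.handlers.RotatingFileHandler`. The sketch below is not Loguru code; it is a stdlib counterpart to the `rotation=` argument, with a deliberately tiny `maxBytes` and placeholder names so that rotation happens within a few dozen records.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
rot_path = os.path.join(log_dir, "app.log")

# Roll over at roughly 200 bytes and keep at most two old files.
rotating = RotatingFileHandler(
    rot_path, maxBytes=200, backupCount=2, encoding="utf-8"
)
rotating.setFormatter(logging.Formatter("%(message)s"))

rot_logger = logging.getLogger("rotation_demo")  # hypothetical logger name
rot_logger.setLevel(logging.INFO)
rot_logger.propagate = False
rot_logger.addHandler(rotating)

for i in range(50):
    rot_logger.info("record %03d padded to a fixed width", i)

rotating.close()
rot_logger.removeHandler(rotating)

rotated_files = sorted(os.listdir(log_dir))
print(rotated_files)  # ['app.log', 'app.log.1', 'app.log.2']
```

Older backups beyond `backupCount` are deleted automatically. For time-based rotation, the standard library offers `logging.handlers.TimedRotatingFileHandler`.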

2. Loggly (Cloud-based logging service)

In [17]:
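import logging
import loggly.handlers  # provided by the loggly-python-handler package

# pip install loggly-python-handler
# (without it, the import above raises ModuleNotFoundError)

# Configure Loggly handler
logger = logging.getLogger('loggly_logger')
logger.setLevel(logging.INFO)

# Replace YOUR_CUSTOMER_TOKEN with your Loggly customer token.
# Endpoint format as given in the loggly-python-handler README; check it
# against your own Loggly account settings.
handler = loggly.handlers.HTTPSHandler(
    'https://logs-01.loggly.com/inputs/YOUR_CUSTOMER_TOKEN/tag/python',
    'POST'
)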

logger.addHandler(handler)

# Log events
logger.info('Application started')
logger.error('An error occurred', extra={'custom_field': 'value'})

ModuleNotFoundError: No module named 'loggly'

Key Features:

- Centralized log management
- Real-time log aggregation
- Advanced search capabilities
- Custom dashboards
- Alert management
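
Log aggregation services generally work best with structured records, for example one JSON object per line. The sketch below shows one way to produce such lines with the standard library only; `JsonFormatter` is a hypothetical class written for this illustration, not part of Loggly's client, and the stream is in memory.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed through extra= become attributes on the record.
        if hasattr(record, "custom_field"):
            payload["custom_field"] = record.custom_field
        return json.dumps(payload)


stream = io.StringIO()
json_handler = logging.StreamHandler(stream)
json_handler.setFormatter(JsonFormatter())

json_logger = logging.getLogger("json_demo")  # hypothetical logger name
json_logger.setLevel(logging.INFO)
json_logger.propagate = False
json_logger.addHandler(json_handler)

json_logger.error("An error occurred", extra={"custom_field": "value"})

parsed = json.loads(stream.getvalue())
print(parsed["level"], parsed["message"], parsed["custom_field"])
# ERROR An error occurred value
```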

3. Datadog (Full-scale monitoring platform)

In [None]:
from datadog import initialize, statsd
from ddtrace import patch_all

# Initialize the Datadog client library (statsd metrics are sent through a locally running Datadog Agent)
initialize(
    api_key='YOUR_API_KEY',
    app_key='YOUR_APP_KEY'
)

# Enable automatic instrumentation
patch_all()

# Monitor metrics
def process_request():
    # Track timing
    with statsd.timed('app.request_duration'):
        # Your code here
        pass

    # Track custom metrics
    statsd.increment('app.request_count')
    statsd.gauge('app.memory_usage', 100)

Key Features:

- Infrastructure monitoring
- Application performance monitoring (APM)
- Log management
- Real-time metrics
- Custom dashboards and alerts
- Distributed tracing
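
The idea behind a timing metric can be tried without any monitoring platform. The sketch below is a plain standard-library stand-in, not Datadog's API: `log_duration` is a hypothetical context manager that measures a block and writes the elapsed time to a logger instead of sending it to a metrics backend.

```python
import contextlib
import io
import logging
import time


@contextlib.contextmanager
def log_duration(logger, name):
    """Hypothetical helper: log how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1f ms", name, elapsed_ms)


stream = io.StringIO()
timing_logger = logging.getLogger("timing_demo")  # hypothetical logger name
timing_logger.setLevel(logging.INFO)
timing_logger.propagate = False
timing_logger.addHandler(logging.StreamHandler(stream))

with log_duration(timing_logger, "app.request_duration"):
    time.sleep(0.01)  # stands in for real work

duration_line = stream.getvalue().strip()
print(duration_line)
```

Because the measurement sits in a `finally` block, the duration is logged even if the wrapped code raises.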

Comparison:

1. Loguru: Best for local development and simple applications
2. Loggly: Ideal for cloud-based log aggregation and analysis
3. Datadog: Complete solution for large-scale applications needing comprehensive monitoring

Choose based on:

- Application scale
- Budget constraints
- Monitoring requirements
- Infrastructure complexity
- Team size and expertise