
# Lab 9: Build a Log Aggregator

In this lab, you will create your own log generator and build a command-line utility that scans log files, summarizes their contents, and provides insight into system behavior. You will use data structures to track log message levels such as `INFO`, `WARNING`, `ERROR`, and `CRITICAL`.

This lab reinforces:
- File I/O
- Pattern recognition (regex)
- Dictionaries and counters
- Functions and modularity
- CLI arguments, logging



## Part 1: Create Log Files (20%)
Using the example log format below, create a **Python file** that writes log entries in a structured tree format.

You will find examples in the folder called Logs that you can use to build your program.

Remember that each set of logs should contain varied levels of log entries (`INFO`, `WARNING`, `ERROR`, `CRITICAL`) and message types tailored to different service components.
You must create 5 structured logs. Here are some example component names:

    sqldb
    ui
    frontend.js
    backend.js
    frontend.flask
    backend.flask

You may use ChatGPT to create sample outputs, but NOT THE LOGS. For example:

    System failure
    Database corruption
    Disk failure detected


In [9]:
# Paste your python file here 
# don't forget to upload it with your submission

import logging
import os
import random
from datetime import datetime, timedelta

# Create a folder for logs if not exists
os.makedirs("Logs", exist_ok=True)

# Components to simulate logs for
components = ["sqldb", "ui", "frontend.js", "backend.js", "frontend.flask", "backend.flask"]

# Possible messages
messages = [
    "System failure", "Database corruption", "Disk failure detected",
    "Unhandled exception", "Service timeout", "User login success",
    "Configuration loaded", "Cache miss", "API response delayed",
    "Memory leak detected"
]

# Log levels
levels = [logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]

# Create separate loggers for each component
for component in components:
    logger = logging.getLogger(component)
    logger.setLevel(logging.DEBUG)

    handler = logging.FileHandler(f"Logs/{component}.log")
    formatter = logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

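    # Simulate writing 50 log entries per component
    current_time = datetime.now()
    for _ in range(50):
        # Pick a random message and level
        message = random.choice(messages)
        level = random.choice(levels)

        # Build the record by hand so the simulated time can be applied;
        # logger.log() would always stamp the entry with the real clock.
        record = logger.makeRecord(logger.name, level, "", 0, message, (), None)
        record.created = current_time.timestamp()
        record.msecs = int((record.created % 1) * 1000)
        logger.handle(record)

        # Artificially advance time a bit for realism
        current_time += timedelta(seconds=random.randint(1, 120))

    # Close and detach the handler so reruns neither duplicate entries nor leak file handles
    handler.close()
    logger.removeHandler(handler)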

print("Log files successfully created in 'Logs' folder!")


Log files successfully created in 'Logs' folder!



### Example Log Format

You will work with logs that follow this simplified structure:

```
2025-04-11 23:20:36,913 | my_app | INFO | Request completed
2025-04-11 23:20:36,914 | my_app.utils | ERROR | Unhandled exception
2025-04-11 23:20:36,914 | my_app.utils.db | CRITICAL | Disk failure detected
```


## Part 2: Logging the Log File (40%)
Create a new file for this part.

### Part 2a: Read the Log File (see Lab 7) (10%)


Write a function to read the contents of a log file into a list of lines. Handle file errors gracefully.

### Part 2b: Parse Log Lines (see code below if you get stuck) (10%)

Use a regular expression to extract:
- Timestamp
- Log name
- Log level
- Message
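
As a minimal sketch of the extraction, assuming every line uses the ` | ` separator from the example format above, a pattern with named groups could look like the following. The names `LOG_PATTERN` and `parse_fields` are illustrative and not part of the lab's starter code.

```python
import re

# Hypothetical pattern for the "timestamp | name | LEVEL | message" layout.
# [\w.]+ lets the logger name contain dots (e.g. my_app.utils.db).
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r" \| (?P<name>[\w.]+)"
    r" \| (?P<level>[A-Z]+)"
    r" \| (?P<message>.*)$"
)


def parse_fields(line):
    """Return a dict of the four fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


sample = "2025-04-11 23:20:36,914 | my_app.utils.db | CRITICAL | Disk failure detected"
print(parse_fields(sample))
```

Named groups make the result self-describing, and returning `None` for malformed lines lets the caller decide whether to skip or report them.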

### Part 2c: Count Log Levels (20%)

Create a function to count how many times each log level appears. Store the results in a dictionary, then output it as a JSON file.
You may pick your own format, but here is an example:
```python
{
    "INFO": 
    {
        "Request completed": 42, 
        "Heartbeat OK": 7
    },

    "WARNING":
    {
        ...
    }
}

```
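
One possible way to build a nested structure like the example above is with `collections.Counter`. This is only a sketch: the `summarize` helper and the sample entries are made up for illustration, and it assumes the lines have already been parsed into `(level, message)` pairs.

```python
import json
from collections import Counter, defaultdict


def summarize(entries):
    """Count messages per level from (level, message) pairs."""
    summary = defaultdict(Counter)
    for level, message in entries:
        summary[level][message] += 1
    return summary


# Made-up entries for illustration only
entries = [
    ("INFO", "Request completed"),
    ("INFO", "Request completed"),
    ("INFO", "Heartbeat OK"),
    ("ERROR", "Unhandled exception"),
]

# Counter and defaultdict are dict subclasses, so json can serialize them directly
print(json.dumps(summarize(entries), indent=4))
```

Using `defaultdict(Counter)` removes the need to check whether a level or message key already exists before incrementing it.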


In [10]:
# Paste your python file here 
# don't forget to upload it with your submission

def read_log_file(file_path):
    """Reads a log file and returns a list of lines."""
    try:
        with open(file_path, 'r') as file:
            return file.readlines()
    except FileNotFoundError:
        print(f"Error: The file {file_path} was not found.")
        return []
    except Exception as e:
        print(f"An error occurred: {e}")
        return []



In [11]:
import json

def count_log_levels(log_lines):
    """
    Counts how many times each message appears for each log level.
    """
    log_summary = {}

    for line in log_lines:
        timestamp, logger_name, log_level, message = parse_log_line(line)
        if log_level and message:
            if log_level not in log_summary:
                log_summary[log_level] = {}
            if message not in log_summary[log_level]:
                log_summary[log_level][message] = 0
            log_summary[log_level][message] += 1

    return log_summary

def save_summary_to_json(summary, output_file):
    """Saves the log summary dictionary into a JSON file."""
    with open(output_file, 'w') as file:
        json.dump(summary, file, indent=4)


# Set the path to the log file
log_file_path = "Logs/your_log_file.log"  # Update the filename if needed

# 2a: Read the file
log_lines = read_log_file(log_file_path)

# 2b + 2c: Parse lines and count
summary = count_log_levels(log_lines)

# Save to JSON
save_summary_to_json(summary, "log_summary.json")

print("Log summary saved to 'log_summary.json'!")


Error: The file Logs/your_log_file.log was not found.
Log summary saved to 'log_summary.json'!



## Part 3: Generate Summary Report (40%)
Create a new file for this part.

### Part 3a (20%):
Develop a function that continuously monitors your JSON file(s) and prints a real-time summary of log activity. It should keep a count of the messages grouped by log level (`INFO`, `WARNING`, `ERROR`, `CRITICAL`) and display only the critical messages (i.e., if new data comes in, the summary will change and any new critical message will be printed).
- Note: do not reprocess the entire file on each update.
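
One way to avoid reprocessing is to remember how far into a file you have already read. The sketch below assumes an append-only, line-oriented file (such as one of the `.log` files). A single JSON document generally has to be parsed as a whole, so this technique fits the log side of the pipeline better. The `read_new_lines` helper is hypothetical.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for data appended after byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Only consume complete lines; a partial last line is left for the next call
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Demonstration with a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.log")
    with open(path, "w") as f:
        f.write("first\n")
    lines, offset = read_new_lines(path, 0)
    print(lines)  # ['first']
    with open(path, "a") as f:
        f.write("second\n")
    lines, offset = read_new_lines(path, offset)
    print(lines)  # ['second']
```

Each call touches only the bytes written since the previous call, so the cost of an update depends on how much was appended rather than on the total file size.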

### Part 3b: Use Matplotlib (Lecture 10) (20%)
Develop a function that continuously monitors your JSON file(s) and graphs, in real time, a bar or pie plot of the logged messages (a graph for each log level).
- The graph should show the distribution of log messages by level (`INFO`, `WARNING`, `ERROR`, `CRITICAL`).
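
A rough sketch of the plotting step, assuming the nested `{level: {message: count}}` summary format shown in Part 2c. The helper names and the `levels.png` output path are made up for illustration. The plot function imports Matplotlib lazily, so the counting helper still works where Matplotlib is not installed.

```python
def level_totals(summary):
    """Collapse {level: {message: count}} into {level: total}."""
    return {level: sum(messages.values()) for level, messages in summary.items()}


def plot_level_totals(totals, output_path="levels.png"):
    """Save a bar chart of message totals per level (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(totals.keys()), list(totals.values()))
    ax.set_xlabel("Log level")
    ax.set_ylabel("Message count")
    ax.set_title("Log messages by level")
    fig.savefig(output_path)
    plt.close(fig)


# Made-up summary for illustration
summary = {
    "INFO": {"Request completed": 42, "Heartbeat OK": 7},
    "CRITICAL": {"Disk failure detected": 3},
}
totals = level_totals(summary)
print(totals)  # {'INFO': 49, 'CRITICAL': 3}
```

Calling `plot_level_totals(totals)` writes the chart to disk. For a live view, one option is to recompute the totals and redraw whenever the JSON file changes. GUI backends generally expect drawing to happen on the main thread, so a background thread may be better suited to collecting data than to plotting.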


### Critical notes:
- Your code must use daemon threads (Lecture 14).
- 3a and 3b do not need to run at the same time.
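
A minimal sketch of the daemon-thread pattern, with a hypothetical `poll` worker and a `threading.Event` for a clean shutdown. A daemon thread is killed as soon as the main thread exits, so the main thread has to stay alive for as long as monitoring should continue.

```python
import threading
import time


def poll(stop_event, interval, on_tick):
    """Call on_tick() every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        on_tick()
        # wait() doubles as a sleep that ends early when the event is set
        stop_event.wait(interval)


ticks = []
stop = threading.Event()
worker = threading.Thread(
    target=poll,
    args=(stop, 0.05, lambda: ticks.append(time.time())),
    daemon=True,
)
worker.start()

time.sleep(0.2)  # keep the main thread alive while the worker runs
stop.set()       # ask the worker to finish
worker.join(timeout=1)
print(f"daemon={worker.daemon}, ticks recorded: {len(ticks) > 0}")
```

In a real monitor, `on_tick` would be replaced by the function that checks the file for changes.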


In [12]:
# Paste your python file here 
# don't forget to upload it with your submission

import threading
import json
import time
import os

def monitor_json_file(file_path):
    """Continuously monitors the JSON file and prints critical messages in real time."""
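    last_mtime = None
    processed_critical = set()

    while True:
        try:
            # Compare modification times rather than sizes: a count going
            # from 41 to 42 changes the file without changing its size.
            current_mtime = os.path.getmtime(file_path)
            if current_mtime != last_mtime:
                with open(file_path, 'r') as file:
                    data = json.load(file)
                # Record the change only after a successful load, so a
                # half-written file is retried on the next pass.
                last_mtime = current_mtime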

                print("\nUpdated Log Summary:")
                for level, messages in data.items():
                    total = sum(messages.values())
                    print(f"{level}: {total} messages")

                    if level == "CRITICAL":
                        for message, count in messages.items():
                            if message not in processed_critical:
                                print(f"New CRITICAL message detected: {message} (Count: {count})")
                                processed_critical.add(message)
            time.sleep(2)  # check every 2 seconds
        except Exception as e:
            print(f"Error monitoring JSON file: {e}")
            time.sleep(2)

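def start_monitoring(file_path):
    """Starts the monitor in a daemon thread and returns the thread.

    A daemon thread stops when the main program exits, so the caller
    must keep the main thread alive for monitoring to continue.
    """
    monitor_thread = threading.Thread(target=monitor_json_file, args=(file_path,), daemon=True)
    monitor_thread.start()
    return monitor_thread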


In [13]:
# Here is a sample regex that parses a log file and extracts relevant information. 
# you will need to modify it. Review Lecture 11
import re


def parse_log_line(line):
    """
    Parses a log line into timestamp, logger name, log level, and message.
    Example format expected: '2025-04-11 23:20:36,913 | my_app.utils | INFO | Request completed'
    """
    # [\w.]+ allows dotted logger names such as 'frontend.js' or 'my_app.utils.db'
    log_pattern = r"^(.*?)\s\|\s([\w.]+)\s\|\s(\w+)\s\|\s(.*)$"
    match = re.match(log_pattern, line)
    if match:
        timestamp, logger_name, log_level, message = match.groups()
        return timestamp, logger_name, log_level, message
    else:
        return None, None, None, None  # In case the line is not formatted correctly

