# 🚀 Mini-Project: Server Log Analyzer

Welcome to your first mini-project! The goal is to build a tool that reads a server log file, analyzes its contents, and prints a summary report. This project will test your skills with file I/O, functions, dictionaries, and error handling.

We will build this project in a **modular** way, creating a specific function for each task.

--- 
### Step 0: Create a Sample Log File

First, we need some data to analyze. Run the cell below to create a sample `server.log` file. This file simulates a web server's access log, where each line contains an IP address, a timestamp, the request made, a status code, and the size of the response.

**Log Format:** `IP_ADDRESS - [TIMESTAMP] "REQUEST" STATUS_CODE SIZE`

In [None]:
%%writefile server.log
192.168.1.1 - [02/Oct/2025:15:50:11] "GET /index.html" 200 1500
192.168.1.2 - [02/Oct/2025:15:50:12] "GET /about.html" 200 1200
192.168.1.1 - [02/Oct/2025:15:50:15] "GET /contact.html" 200 1350
192.168.1.3 - [02/Oct/2025:15:51:20] "GET /products/" 404 250
192.168.1.1 - [02/Oct/2025:15:51:22] "POST /login" 200 1800
INVALID LOG ENTRY
192.168.1.2 - [02/Oct/2025:15:52:01] "GET /images/logo.png" 200 3500
192.168.1.4 - [02/Oct/2025:15:52:05] "GET /secret-data/" 403 200
192.168.1.2 - [02/Oct/2025:15:53:10] "GET /about.html" 200 1200

--- 
### Step 1: Create a Function to Read the File

Our first module will be a function that handles opening and reading the log file. This keeps our file I/O logic separate from our analysis logic.

**Your Task:**
Create a function called `read_log_file(filepath)` that:
1.  Takes one argument: `filepath` (the path to the log file).
2.  Uses a **`try-except`** block to handle a `FileNotFoundError`.
3.  If the file is found, it should read all the lines and return them as a **list of strings**.
4.  If the file is not found, it should print an error message and return an empty list `[]`.
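
If you get stuck, the requirements above can be met with something like the following. This is a minimal sketch rather than the only correct answer; the `encoding="utf-8"` argument and the exact wording of the error message are assumptions, not part of the task.

```python
def read_log_file(filepath):
    """Return the lines of the file at filepath, or [] if it does not exist."""
    try:
        # The with-statement closes the file even if reading fails partway.
        with open(filepath, "r", encoding="utf-8") as log_file:
            # readlines() keeps the trailing newline on each line.
            return log_file.readlines()
    except FileNotFoundError:
        print(f"Error: the file '{filepath}' was not found.")
        return []
```

Returning an empty list on failure lets the caller use a plain `if log_lines:` check instead of handling the exception itself.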

In [None]:
# Write the read_log_file function here


# --- Test your function ---
log_lines = read_log_file('server.log')
if log_lines:
    print(f"Successfully read {len(log_lines)} lines.")

# Test the error handling
non_existent_lines = read_log_file('non_existent_log.txt')
print(f"Reading a non-existent file returned: {non_existent_lines}")

--- 
### Step 2: Create a Function to Parse a Log Line

Now we need a function that can take a single log line (a string) and extract the important information from it. We want to get the IP address and the status code.

**Your Task:**
Create a function called `parse_log_line(line)` that:
1.  Takes one argument: `line` (a string from the log file).
2.  Uses a **`try-except`** block to handle lines that don't fit the expected format (like the `"INVALID LOG ENTRY"` line).
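3.  If the line is valid, it should split the string on whitespace to extract the **IP address** (the first part) and the **status code** (the second-to-last part).
4.  The function should return a **tuple** containing the IP address and the status code, like `('192.168.1.1', '200')`.
5.  If the line is invalid, the `except` block should catch the error and the function should return `None`. Indexing alone will not detect every bad line: `'INVALID LOG ENTRY'` still splits into three parts, so no `IndexError` is raised. Check that the status code is numeric, for example with `int()`, which raises a `ValueError` for text like `'LOG'`. An `IndexError` occurs only for blank or one-word lines.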
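
One way to approach this is sketched below. It assumes fields are separated by whitespace and treats a non-numeric status code as the sign of a malformed line; the `int()` check is an added validation step, since `'INVALID LOG ENTRY'` splits into three parts and indexing it would otherwise succeed.

```python
def parse_log_line(line):
    """Return (ip_address, status_code) as strings, or None for a malformed line."""
    try:
        parts = line.split()
        ip_address = parts[0]    # IndexError if the line is blank
        status_code = parts[-2]  # IndexError if the line has only one part
        int(status_code)         # ValueError if the status code is not numeric
        return (ip_address, status_code)
    except (IndexError, ValueError):
        return None
```

This sketch does not check that the first field looks like an IP address; that would be a reasonable extension.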

In [None]:
# Write the parse_log_line function here


# --- Test your function ---
valid_line = '192.168.1.1 - [02/Oct/2025:15:50:11] "GET /index.html" 200 1500'
invalid_line = 'INVALID LOG ENTRY'

print(f"Parsed valid line: {parse_log_line(valid_line)}")
print(f"Parsed invalid line: {parse_log_line(invalid_line)}")

--- 
### Step 3: Create a Function to Analyze the Data

This is the core of our project. This function will take the list of log lines and use our parsing function to count the number of hits from each IP address and the occurrences of each status code.

**Your Task:**
Create a function called `analyze_log_data(log_lines)` that:
1.  Takes one argument: `log_lines` (the list of strings from our first function).
2.  Initializes two empty **dictionaries**: `ip_counts` and `status_counts`.
3.  **Loops** through each `line` in `log_lines`.
4.  Inside the loop, calls your `parse_log_line()` function for each line.
5.  If the parsed result is not `None`, it should update the dictionaries. For `ip_counts`, if the IP is already a key, increment its value; otherwise, add it with a value of 1. Do the same for `status_counts`.
6.  The function should return both dictionaries.
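
The counting loop described above might look like the sketch below. The `parse_log_line` defined here is only a compact stand-in so the example runs on its own; in the notebook you would use your Step 2 function instead. `dict.get(key, 0) + 1` is shorthand for the "increment if present, otherwise start at 1" rule.

```python
def parse_log_line(line):
    """Stand-in for the Step 2 function so this sketch is self-contained."""
    parts = line.split()
    if len(parts) < 2 or not parts[-2].isdigit():
        return None
    return (parts[0], parts[-2])


def analyze_log_data(log_lines):
    """Count hits per IP address and occurrences of each status code."""
    ip_counts = {}
    status_counts = {}
    for line in log_lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue  # skip lines that could not be parsed
        ip_address, status_code = parsed
        # get() returns 0 for a key that has not been seen yet.
        ip_counts[ip_address] = ip_counts.get(ip_address, 0) + 1
        status_counts[status_code] = status_counts.get(status_code, 0) + 1
    return ip_counts, status_counts
```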

In [None]:
# Write the analyze_log_data function here


# --- Test your function ---
test_lines = [
    '192.168.1.1 - [TIMESTAMP] "REQUEST" 200 1500',
    '192.168.1.2 - [TIMESTAMP] "REQUEST" 404 250',
    '192.168.1.1 - [TIMESTAMP] "REQUEST" 200 1800',
    'INVALID LINE'
]
ip_hits, status_codes = analyze_log_data(test_lines)
print(f"IP Counts: {ip_hits}")
print(f"Status Codes: {status_codes}")

--- 
### Step 4: Create a Function to Generate the Report

Finally, we need a function to present our analysis in a clean, readable format.

**Your Task:**
Create a function called `generate_report(ip_counts, status_counts)` that:
1.  Takes two arguments: the `ip_counts` and `status_counts` dictionaries.
2.  Prints a formatted report title, like `"--- Log Analysis Report ---"`.
3.  Prints each IP address with its hit count by looping through the `ip_counts` dictionary, ideally sorted so the most active IPs come first.
4.  Prints the count for each status code by looping through the `status_counts` dictionary.
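
A possible shape for the report is sketched below. The section labels and indentation are assumptions, as is the choice to sort IP addresses by hit count in descending order with `sorted()`; any clear layout satisfies the task.

```python
def generate_report(ip_counts, status_counts):
    """Print a summary of hits per IP address and counts per status code."""
    print("--- Log Analysis Report ---")

    print("\nHits per IP address:")
    # Sort by the count (the second item of each pair), highest first.
    for ip_address, hits in sorted(ip_counts.items(), key=lambda item: item[1], reverse=True):
        print(f"  {ip_address}: {hits}")

    print("\nStatus code counts:")
    # Sort by status code so the output order is predictable.
    for status_code, count in sorted(status_counts.items()):
        print(f"  {status_code}: {count}")
```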

In [None]:
# Write the generate_report function here


# --- Test your function ---
test_ips = {'192.168.1.1': 3, '192.168.1.2': 2}
test_statuses = {'200': 4, '404': 1}
generate_report(test_ips, test_statuses)

--- 
### Step 5: Putting It All Together 🧩

Now, let's create a main script that calls all our modular functions in the correct order to run the complete analysis.

**Your Task:**
1.  Define a variable `log_file_path` with the value `'server.log'`.
2.  Call `read_log_file()` to get the lines from the file.
3.  If the returned list is not empty, pass it to `analyze_log_data()` to get the two count dictionaries.
4.  Finally, pass those two dictionaries to `generate_report()` to print the final summary.

In [None]:
# Main script execution

def main():
    log_file_path = 'server.log'
    
    # Step 1: Read the file
    lines = read_log_file(log_file_path)
    
    # Steps 3 & 4: Analyze the data and generate the report if the file was read successfully
    if lines:
        ip_counts, status_counts = analyze_log_data(lines)
        generate_report(ip_counts, status_counts)

# Run the main function
main()