## Assignment: Log Analysis Script

### **Objective**

The goal of this assignment is to assess your ability to write a Python script that processes log files to extract and analyze key information. This assignment evaluates your proficiency in **file handling**, **string manipulation**, and **data analysis**, which are essential skills for cybersecurity-related programming tasks.

# Step 1: Import Required Libraries

In [3]:
import re
import csv
from collections import defaultdict

# Step 2: Define Functions
## 1. Read Log File
This function reads the log file and returns its contents as a list of lines.

In [6]:
def read_log_file(file_path):
    with open(file_path, 'r') as file:
        return file.readlines()
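
For larger logs, `readlines()` holds every line in memory at once. A minimal sketch of a lazy alternative follows, assuming the log is UTF-8 text; the name `iter_log_lines` is hypothetical and not part of the assignment.

```python
from typing import Iterator


def iter_log_lines(file_path: str) -> Iterator[str]:
    """Yield log lines one at a time instead of loading the whole file.

    Assumption: the log is UTF-8 text; undecodable bytes are replaced
    rather than raising an error.
    """
    with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
        for line in file:
            # Strip the trailing newline so callers get clean lines.
            yield line.rstrip('\n')
```

A generator can be consumed only once, so wrap the call in `list(...)` if several analysis functions need the same lines.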


In [8]:
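def count_requests_per_ip(log_lines):
    """Return (ip, count) pairs sorted by request count, highest first."""
    ip_counts = defaultdict(int)
    # The client IPv4 address is expected at the start of each line.
    ip_pattern = r'^(\d+\.\d+\.\d+\.\d+)'
    for line in log_lines:
        match = re.match(ip_pattern, line)
        if match:
            ip_counts[match.group(1)] += 1
    return sorted(ip_counts.items(), key=lambda x: x[1], reverse=True)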
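
The same count can also be expressed with `collections.Counter`, whose `most_common()` method returns pairs sorted by count. The sketch below is one possible variant, not the assignment's required form; the function name `count_requests_per_ip_counter` and the sample lines are hypothetical and are not taken from `sample.log`.

```python
import re
from collections import Counter

# Hypothetical sample lines in a common access-log layout (not from sample.log).
sample_lines = [
    '192.168.1.1 - - [03/Dec/2024:10:12:34 +0000] "GET /home HTTP/1.1" 200 512',
    '203.0.113.5 - - [03/Dec/2024:10:12:35 +0000] "POST /login HTTP/1.1" 401 128',
    '192.168.1.1 - - [03/Dec/2024:10:12:36 +0000] "GET /about HTTP/1.1" 200 256',
    'malformed line without an address',
]

IP_PATTERN = re.compile(r'^(\d+\.\d+\.\d+\.\d+)')


def count_requests_per_ip_counter(log_lines):
    """Count requests per IP; lines without a leading IPv4 address are skipped."""
    matches = (IP_PATTERN.match(line) for line in log_lines)
    counts = Counter(m.group(1) for m in matches if m)
    # most_common() sorts by count, highest first.
    return counts.most_common()


print(count_requests_per_ip_counter(sample_lines))
# [('192.168.1.1', 2), ('203.0.113.5', 1)]
```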


In [10]:
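def most_accessed_endpoint(log_lines):
    """Return the (endpoint, count) pair for the most requested path."""
    endpoint_counts = defaultdict(int)
    # Capture the path between the HTTP method and the protocol version.
    # \S* (rather than \S+) also counts requests for the root path "/".
    endpoint_pattern = r'"[A-Z]+\s(/\S*)\sHTTP'
    for line in log_lines:
        match = re.search(endpoint_pattern, line)
        if match:
            endpoint_counts[match.group(1)] += 1
    # default avoids a ValueError when no line contains a request.
    return max(endpoint_counts.items(), key=lambda x: x[1], default=(None, 0))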


In [25]:
def detect_suspicious_activity(log_lines, threshold=10):
    """
    Identifies IP addresses whose failed login attempts reach or exceed a given threshold.
    """
    failed_attempts = defaultdict(int)
    # A failed login is a POST to /login answered with HTTP status 401
    failed_login_pattern = r'^(\d+\.\d+\.\d+\.\d+).*"POST\s+/login.*"\s401'

    for line in log_lines:
        match = re.match(failed_login_pattern, line)
        if match:
            failed_attempts[match.group(1)] += 1

    # Keep IPs whose failed-attempt count is at or above the threshold
    return {ip: count for ip, count in failed_attempts.items() if count >= threshold}
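
The pattern above also matches any path that merely starts with `/login` (for example `/login-help`). A stricter sketch parses each line into named fields and compares them exactly. It assumes lines follow the Common Log Format; the names `LOG_LINE` and `failed_logins` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: lines follow the Common Log Format, e.g.
# IP - - [timestamp] "METHOD /path HTTP/x.y" STATUS SIZE
LOG_LINE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+)\s.*?'
    r'"(?P<method>[A-Z]+)\s(?P<path>\S+)\sHTTP/[\d.]+"\s'
    r'(?P<status>\d{3})\b'
)


def failed_logins(log_lines, threshold=10):
    """Return {ip: count} for IPs with at least `threshold` 401 replies to POST /login."""
    counts = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        # Compare the parsed fields exactly instead of by prefix.
        if m and m['method'] == 'POST' and m['path'] == '/login' and m['status'] == '401':
            counts[m['ip']] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Hypothetical sample lines (not from sample.log).
sample_lines = [
    '203.0.113.5 - - [03/Dec/2024:10:12:35 +0000] "POST /login HTTP/1.1" 401 128',
    '203.0.113.5 - - [03/Dec/2024:10:12:36 +0000] "POST /login HTTP/1.1" 401 128',
    '203.0.113.5 - - [03/Dec/2024:10:12:37 +0000] "POST /login HTTP/1.1" 200 128',
    '10.0.0.2 - - [03/Dec/2024:10:12:38 +0000] "POST /login HTTP/1.1" 401 128',
    '10.0.0.2 - - [03/Dec/2024:10:12:39 +0000] "GET /login HTTP/1.1" 401 128',
]

print(failed_logins(sample_lines, threshold=2))
# {'203.0.113.5': 2}
```

Exact comparison also excludes paths with a query string such as `/login?next=/home`; whether those should count as login attempts depends on the application being logged.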



In [27]:
def save_results_to_csv(ip_requests, endpoint, suspicious_ips, output_file='log_analysis_results.csv'):
    with open(output_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        
        # Write Requests per IP
        writer.writerow(['Requests per IP'])
        writer.writerow(['IP Address', 'Request Count'])
        for ip, count in ip_requests:
            writer.writerow([ip, count])
        
        # Write Most Accessed Endpoint
        writer.writerow([])
        writer.writerow(['Most Accessed Endpoint'])
        writer.writerow(['Endpoint', 'Access Count'])
        writer.writerow([endpoint[0], endpoint[1]])
        
        # Write Suspicious Activity
        writer.writerow([])
        writer.writerow(['Suspicious Activity Detected'])
        writer.writerow(['IP Address', 'Failed Login Attempts'])
        for ip, count in suspicious_ips.items():
            writer.writerow([ip, count])


# Step 3: Main Script
The main script ties everything together.

In [30]:
def main():
    log_file_path = 'sample.log'
    
    # Read log file
    log_lines = read_log_file(log_file_path)
    
    # Perform analysis
    ip_requests = count_requests_per_ip(log_lines)
    endpoint = most_accessed_endpoint(log_lines)
    suspicious_ips = detect_suspicious_activity(log_lines, threshold=5)  # Set threshold to 5 for testing
    
    # Display results
    print("Requests per IP:")
    for ip, count in ip_requests:
        print(f"{ip:<20}{count}")
    
    print("\nMost Frequently Accessed Endpoint:")
    print(f"{endpoint[0]} (Accessed {endpoint[1]} times)")
    
    print("\nSuspicious Activity Detected:")
    if suspicious_ips:
        for ip, count in suspicious_ips.items():
            print(f"{ip:<20}{count}")
    else:
        print("No suspicious activity detected.")
    
    # Save to CSV
    save_results_to_csv(ip_requests, endpoint, suspicious_ips)
    print("\nResults saved to 'log_analysis_results.csv'.")

if __name__ == "__main__":
    main()

Requests per IP:
203.0.113.5         8
198.51.100.23       8
192.168.1.1         7
10.0.0.2            6
192.168.1.100       5

Most Frequently Accessed Endpoint:
/login (Accessed 13 times)

Suspicious Activity Detected:
203.0.113.5         8
192.168.1.100       5

Results saved to 'log_analysis_results.csv'.
