# Lab 04: LLM-Powered Security Log Analysis

Use Large Language Models to analyze security logs and extract actionable threat intelligence.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/depalmar/ai_for_the_win/blob/main/notebooks/lab04_llm_log_analysis.ipynb)

## Learning Objectives
- Multi-source log parsing (Windows Events, Sysmon, Firewall, Proxy, EDR)
- High-volume log processing with noise filtering
- LLM-powered threat interpretation and summarization
- IOC extraction and MITRE ATT&CK mapping
- Attack chain reconstruction
- Automated report generation

## Log Sources Covered

This lab uses simulated logs modeled on sources common in enterprise environments:
- **Windows Security** (4624, 4625, 4648, 4672, 4688, 4697, 4698, 4699)
- **Sysmon** (event IDs 1-26)
- **Firewall/IDS** (blocked connections, alerts)
- **Proxy/Web** (HTTP requests, DNS queries)
- **EDR** (process telemetry, file events)
- **Cloud** (AWS CloudTrail, Azure Activity)
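
For reference, the Windows Security event IDs listed above can be kept in a small lookup table so that parsed events carry a readable label. This is only a sketch: the names `WINDOWS_EVENT_DESCRIPTIONS` and `describe_event` are hypothetical and do not appear in the lab code, and the descriptions paraphrase Microsoft's event titles.

```python
# Hypothetical lookup table for the Windows Security event IDs used in this lab.
# Descriptions are paraphrased, not the exact Microsoft event titles.
WINDOWS_EVENT_DESCRIPTIONS = {
    4624: "Successful logon",
    4625: "Failed logon",
    4648: "Logon attempted with explicit credentials",
    4672: "Special privileges assigned to new logon",
    4688: "New process created",
    4697: "Service installed on the system",
    4698: "Scheduled task created",
    4699: "Scheduled task deleted",
}


def describe_event(event_id: int) -> str:
    """Return a short label for a Windows Security event ID."""
    return WINDOWS_EVENT_DESCRIPTIONS.get(event_id, f"Unknown event {event_id}")


print(describe_event(4625))
```

A table like this could be used to enrich `LogEntry.message` before the logs are sent to the LLM.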

In [None]:
# Install dependencies (uncomment for Colab)
# !pip install anthropic pandas matplotlib

In [None]:
import os
import re
import json
from collections import Counter  # used for the severity breakdown below
from datetime import datetime
from typing import List, Dict
from dataclasses import dataclass

# Set your API key
# os.environ['ANTHROPIC_API_KEY'] = 'your-api-key-here'

## 1. Log Data Structures

In [None]:
@dataclass
class LogEntry:
    """Parsed log entry with enrichment."""
    timestamp: str
    source: str
    event_id: int
    severity: str
    message: str
    raw: str
    host: str = ""
    user: str = ""
    process: str = ""
    parent_process: str = ""
    command_line: str = ""
    src_ip: str = ""
    dst_ip: str = ""
    dst_port: int = 0

# Comprehensive sample security logs simulating a real attack
SAMPLE_LOGS = """
# Initial Access - Phishing Email Delivered
2024-01-15 09:12:15 MAIL messageId=ABC123 from=invoice@malicious.com to=jsmith@company.com subject="Urgent Invoice" attachment=Invoice_Jan2024.docm status=delivered
2024-01-15 09:12:45 PROXY user=jsmith action=download url=https://mail.company.com/attachments/Invoice_Jan2024.docm size=156KB

# Execution - Macro Enabled
2024-01-15 09:15:23 SYSMON EventID=1 Image=C:\\Program Files\\Microsoft Office\\WINWORD.EXE CommandLine="WINWORD.EXE /n Invoice_Jan2024.docm" User=COMPANY\\jsmith ParentImage=explorer.exe
2024-01-15 09:15:45 SYSMON EventID=1 Image=C:\\Windows\\System32\\cmd.exe CommandLine="cmd.exe /c powershell -ep bypass -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA..." User=COMPANY\\jsmith ParentImage=WINWORD.EXE

# Discovery - Network Enumeration
2024-01-15 09:18:00 SYSMON EventID=1 Image=C:\\Windows\\System32\\net.exe CommandLine="net view /domain" User=COMPANY\\jsmith ParentImage=powershell.exe
2024-01-15 09:18:05 SYSMON EventID=1 Image=C:\\Windows\\System32\\net.exe CommandLine="net group \"Domain Admins\" /domain" User=COMPANY\\jsmith ParentImage=powershell.exe
2024-01-15 09:18:10 SYSMON EventID=1 Image=C:\\Windows\\System32\\nltest.exe CommandLine="nltest /dclist:company.local" User=COMPANY\\jsmith ParentImage=powershell.exe
2024-01-15 09:18:15 SYSMON EventID=1 Image=C:\\Windows\\System32\\net.exe CommandLine="net share" User=COMPANY\\jsmith ParentImage=powershell.exe

# Credential Access - Mimikatz
2024-01-15 09:20:00 SYSMON EventID=1 Image=C:\\Users\\jsmith\\AppData\\Local\\Temp\\m.exe CommandLine="m.exe sekurlsa::logonpasswords" User=COMPANY\\jsmith ParentImage=powershell.exe
2024-01-15 09:20:01 SYSMON EventID=10 SourceImage=m.exe TargetImage=lsass.exe GrantedAccess=0x1010

# Lateral Movement - PSExec
2024-01-15 09:25:00 WINSEC EventID=4624 LogonType=3 TargetUserName=admin TargetDomainName=COMPANY IpAddress=192.168.1.100 WorkstationName=WKS001
2024-01-15 09:25:05 SYSMON EventID=1 Image=C:\\Windows\\PSEXESVC.exe CommandLine="PSEXESVC.exe" User=NT AUTHORITY\\SYSTEM ParentImage=services.exe Host=DC01
2024-01-15 09:25:10 WINSEC EventID=4672 SubjectUserName=admin PrivilegeList=SeDebugPrivilege,SeTakeOwnershipPrivilege Host=DC01

# Defense Evasion - Disable Security
2024-01-15 09:28:00 SYSMON EventID=1 Image=C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe CommandLine="powershell.exe Set-MpPreference -DisableRealtimeMonitoring $true" User=COMPANY\\admin ParentImage=cmd.exe Host=DC01
2024-01-15 09:28:05 EDR action=tampering product=WindowsDefender operation=DisableRealTimeProtection host=DC01 user=admin

# Collection - Data Staging
2024-01-15 09:30:00 SYSMON EventID=1 Image=C:\\Windows\\System32\\7z.exe CommandLine="7z.exe a -p archive.7z C:\\Shares\\Finance\\*" User=COMPANY\\admin ParentImage=cmd.exe Host=DC01
2024-01-15 09:32:00 SYSMON EventID=11 TargetFilename=C:\\Users\\admin\\Desktop\\archive.7z Size=45MB Host=DC01

# Exfiltration - C2 Communication
2024-01-15 09:35:00 FIREWALL action=allow src=192.168.1.50 dst=185.234.72.99 port=443 bytes=47185920 protocol=TLS
2024-01-15 09:35:01 DNS query=cdn-update.evil.com type=A response=185.234.72.99 host=DC01
2024-01-15 09:35:05 PROXY user=admin url=https://cdn-update.evil.com/upload method=POST size=45MB status=200

# Normal Activity (noise)
2024-01-15 09:00:00 WINSEC EventID=4624 LogonType=10 TargetUserName=helpdesk IpAddress=192.168.1.5 WorkstationName=HELPDESK01
2024-01-15 09:05:00 WINSEC EventID=4624 LogonType=2 TargetUserName=receptionist WorkstationName=FRONT01
2024-01-15 09:10:00 PROXY user=marketing url=https://linkedin.com method=GET status=200
2024-01-15 09:11:00 DNS query=www.google.com type=A response=142.250.185.46
2024-01-15 09:12:00 FIREWALL action=allow src=192.168.1.10 dst=13.107.42.14 port=443 protocol=TLS
2024-01-15 09:14:00 SYSMON EventID=1 Image=C:\\Windows\\System32\\svchost.exe CommandLine="svchost.exe -k netsvcs" User=SYSTEM
2024-01-15 09:16:00 WINSEC EventID=4634 LogonType=3 TargetUserName=service_account
2024-01-15 09:17:00 MAIL messageId=DEF456 from=newsletter@company.com to=all@company.com subject="Weekly Update" status=delivered
2024-01-15 09:19:00 PROXY user=sales url=https://salesforce.com method=GET status=200
2024-01-15 09:21:00 DNS query=teams.microsoft.com type=A response=52.113.194.132
2024-01-15 09:22:00 SYSMON EventID=3 Image=outlook.exe DestinationIp=40.97.153.146 DestinationPort=443
2024-01-15 09:23:00 WINSEC EventID=4624 LogonType=3 TargetUserName=backup_svc IpAddress=192.168.1.200
2024-01-15 09:24:00 EDR action=scan result=clean file=C:\\Users\\jdoe\\Downloads\\report.pdf
2024-01-15 09:26:00 FIREWALL action=allow src=192.168.1.15 dst=151.101.1.69 port=443 protocol=TLS
""".strip()

class EnhancedLogParser:
    """Enhanced log parser with multi-format support."""
    
    PATTERNS = {
        'sysmon': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) SYSMON EventID=(\d+) (.+)',
        'winsec': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) WINSEC EventID=(\d+) (.+)',
        'firewall': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) FIREWALL (.+)',
        'proxy': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) PROXY (.+)',
        'dns': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DNS (.+)',
        'mail': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) MAIL (.+)',
        'edr': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) EDR (.+)',
    }
    
    SEVERITY_RULES = {
        'critical': ['mimikatz', 'm.exe sekurlsa', 'lsass.exe', 'psexesvc', 'disablerealtimemonitoring'],
        'high': ['powershell.*-enc', 'cmd.exe /c', 'net view', 'nltest', 'domain admins', 'exfil'],
        'medium': ['net share', 'net group', '7z.exe', 'archive', 'blocked', 'denied'],
        'low': ['eventid=4624', 'eventid=4634', 'action=allow', 'action=scan'],
    }
    
    SUSPICIOUS_PROCESSES = {
        'mimikatz.exe', 'm.exe', 'procdump.exe', 'psexec.exe', 
        'cobaltstrike.exe', 'beacon.exe', 'rubeus.exe'
    }
    
    def parse(self, raw_logs: str) -> List[LogEntry]:
        """Parse multi-format logs."""
        entries = []
        for line in raw_logs.strip().split('\n'):
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            
            entry = self._parse_line(line)
            if entry:
                entries.append(entry)
        
        return entries
    
    def _parse_line(self, line: str) -> LogEntry:
        """Parse individual log line."""
        for source, pattern in self.PATTERNS.items():
            match = re.match(pattern, line)
            if match:
                groups = match.groups()
                timestamp = groups[0]
                
                if source in ['sysmon', 'winsec']:
                    event_id = int(groups[1])
                    message = groups[2]
                else:
                    event_id = 0
                    message = groups[1] if len(groups) > 1 else ""
                
                severity = self._classify_severity(line)
                
                # Extract additional fields
                user = self._extract_field(message, ['User=', 'TargetUserName=', 'user='])
                process = self._extract_field(message, ['Image=', 'process='])
                command = self._extract_field(message, ['CommandLine='])
                
                return LogEntry(
                    timestamp=timestamp,
                    source=source.upper(),
                    event_id=event_id,
                    severity=severity,
                    message=message,
                    raw=line,
                    user=user,
                    process=process,
                    command_line=command
                )
        
        return None
    
    def _classify_severity(self, message: str) -> str:
        """Classify log severity based on content."""
        message_lower = message.lower()
        for severity, keywords in self.SEVERITY_RULES.items():
            # Rules such as 'powershell.*-enc' are regular expressions,
            # so use re.search rather than a plain substring test.
            if any(re.search(kw, message_lower) for kw in keywords):
                return severity
        return 'info'
    
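    def _extract_field(self, message: str, patterns: List[str]) -> str:
        """Extract field value from message; quoted values may contain spaces."""
        for pattern in patterns:
            if pattern in message:
                start = message.find(pattern) + len(pattern)
                if message[start:start + 1] == '"':
                    # Quoted value: read up to the next double quote
                    start += 1
                    end = message.find('"', start)
                else:
                    # Unquoted value: read up to the next space
                    end = message.find(' ', start)
                if end == -1:
                    end = len(message)
                return message[start:end].strip()
        return ""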

# Parse logs
parser = EnhancedLogParser()
entries = parser.parse(SAMPLE_LOGS)

print(f"Parsed {len(entries)} log entries from {len(set(e.source for e in entries))} sources")
print(f"\nSeverity breakdown:")
severity_counts = Counter(e.severity for e in entries)
for sev, count in sorted(severity_counts.items(), key=lambda x: ['critical', 'high', 'medium', 'low', 'info'].index(x[0])):
    print(f"  {sev.upper()}: {count}")

print(f"\nCritical/High severity events:")
for entry in entries:
    if entry.severity in ['critical', 'high']:
        print(f"  [{entry.severity.upper()}] {entry.timestamp} | {entry.source} | {entry.message[:60]}...")
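
The phase comments in `SAMPLE_LOGS` (Initial Access, Execution, Discovery, and so on) correspond to MITRE ATT&CK tactics. As a rough illustration of the ATT&CK mapping objective, the sketch below tags a raw log line with technique IDs using keyword rules. The rule table `ATTACK_RULES` and the function `map_to_attack` are hypothetical, and keyword matching is a deliberate simplification: a real mapping would need analyst review.

```python
from typing import List, Tuple

# Hypothetical keyword rules: (lowercase substring, technique ID, technique name).
ATTACK_RULES: List[Tuple[str, str, str]] = [
    (".docm", "T1566.001", "Phishing: Spearphishing Attachment"),
    ("powershell", "T1059.001", "Command and Scripting Interpreter: PowerShell"),
    ("net view", "T1018", "Remote System Discovery"),
    ("net group", "T1069.002", "Permission Groups Discovery: Domain Groups"),
    ("net share", "T1135", "Network Share Discovery"),
    ("sekurlsa", "T1003.001", "OS Credential Dumping: LSASS Memory"),
    ("psexesvc", "T1569.002", "System Services: Service Execution"),
    ("disablerealtimemonitoring", "T1562.001", "Impair Defenses: Disable or Modify Tools"),
    ("7z.exe", "T1560.001", "Archive Collected Data: Archive via Utility"),
]


def map_to_attack(raw_line: str) -> List[str]:
    """Return 'ID name' strings for every rule whose keyword occurs in the line."""
    lowered = raw_line.lower()
    return [f"{tid} {name}" for keyword, tid, name in ATTACK_RULES if keyword in lowered]


# Abbreviated line taken from SAMPLE_LOGS
print(map_to_attack('SYSMON EventID=1 CommandLine="net share" ParentImage=powershell.exe'))
```

Note that a single line can match several rules; here the parent process also triggers the PowerShell rule.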

## 2. Log Parser

In [None]:
class LogParser:
    """Parse various log formats."""
    
    PATTERNS = {
        'standard': r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.+)',
        'syslog': r'(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) (\S+) (\S+): (.+)'
    }
    
    SEVERITY_KEYWORDS = {
        'critical': ['injection', 'ransomware', 'exfiltration'],
        'high': ['failed', 'blocked', 'suspicious', 'malware'],
        'medium': ['unusual', 'warning', 'locked'],
        'low': ['info', 'success', 'allowed']
    }
    
    def parse(self, raw_logs: str) -> List[LogEntry]:
        entries = []
        for line in raw_logs.strip().split('\n'):
            if not line.strip():
                continue
            entry = self._parse_line(line)
            if entry:
                entries.append(entry)
        return entries
    
    def _parse_line(self, line: str) -> LogEntry:
        """Parse one line; returns None when the line does not match."""
        match = re.match(self.PATTERNS['standard'], line)
        if match:
            timestamp, source, message = match.groups()
            return LogEntry(
                timestamp=timestamp,
                source=source,
                event_id=0,  # LogEntry has no event_type field; the generic format carries no event ID
                severity=self._classify_severity(message),
                message=message,
                raw=line
            )
        return None
    
    def _classify_severity(self, message: str) -> str:
        message_lower = message.lower()
        for severity, keywords in self.SEVERITY_KEYWORDS.items():
            if any(kw in message_lower for kw in keywords):
                return severity
        return 'info'

# Parse logs
parser = LogParser()
entries = parser.parse(SAMPLE_LOGS)

print(f"Parsed {len(entries)} log entries")
for entry in entries[:3]:
    print(f"  [{entry.severity.upper()}] {entry.source}: {entry.message[:50]}...")
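
IOC extraction, one of the lab's objectives, can start from plain regular expressions over the raw log text. The sketch below is illustrative only: `extract_iocs`, `IPV4_RE`, and `DOMAIN_RE` are hypothetical names, the patterns assume the `query=` and `url=` field formats used in `SAMPLE_LOGS`, and the result is a set of candidate observables rather than confirmed indicators.

```python
import re
from typing import Dict, Set

# Simplified patterns; they do not validate octet ranges or TLDs.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"(?:query=|https?://)([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def extract_iocs(raw_logs: str) -> Dict[str, Set[str]]:
    """Collect candidate IP addresses and domains from raw log text."""
    return {
        "ips": set(IPV4_RE.findall(raw_logs)),
        "domains": {d.lower() for d in DOMAIN_RE.findall(raw_logs)},
    }


# Two lines copied from SAMPLE_LOGS
sample = (
    "2024-01-15 09:35:01 DNS query=cdn-update.evil.com type=A response=185.234.72.99 host=DC01\n"
    "2024-01-15 09:35:05 PROXY user=admin url=https://cdn-update.evil.com/upload method=POST size=45MB status=200"
)
iocs = extract_iocs(sample)
print(sorted(iocs["ips"]), sorted(iocs["domains"]))
```

Because benign traffic produces the same kinds of observables, the output would normally be filtered against allowlists or threat intelligence before being reported.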

## 3. LLM Log Analyzer

In [None]:
class LLMLogAnalyzer:
    """Use LLM to analyze security logs."""
    
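    def __init__(self):
        self.available = False
        try:
            from anthropic import Anthropic
            # Only use the live client when an API key is configured;
            # otherwise fall back to the mock response.
            if os.environ.get('ANTHROPIC_API_KEY'):
                self.client = Anthropic()
                self.available = True
        except ImportError:
            pass
        if not self.available:
            print("Note: Anthropic client not available. Using mock responses.")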
    
    def analyze_logs(self, entries: List[LogEntry]) -> Dict:
        """Analyze log entries and generate insights."""
        logs_text = "\n".join([e.raw for e in entries])
        
        prompt = f"""Analyze these security logs and provide:
1. Summary of events
2. Potential security incidents detected
3. Attack timeline if applicable
4. Recommended actions

LOGS:
{logs_text}

Provide a structured analysis."""
        
        if self.available:
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
            return {"analysis": response.content[0].text}
        else:
            return self._mock_analysis(entries)
    
    def _mock_analysis(self, entries: List[LogEntry]) -> Dict:
        """Mock analysis for demo purposes, written to match SAMPLE_LOGS."""
        return {
            "analysis": """
## Security Log Analysis

### Summary
The logs show a multi-stage intrusion spanning roughly 23 minutes (09:12 - 09:35),
from a phishing email to data exfiltration.

### Incidents Detected
1. **Phishing and Macro Execution** (09:12:15 - 09:15:45)
   - Invoice_Jan2024.docm delivered to jsmith and opened in Word
   - WINWORD.EXE spawned cmd.exe running encoded PowerShell

2. **Discovery and Credential Dumping** (09:18:00 - 09:20:01)
   - Domain enumeration with net.exe and nltest.exe
   - m.exe ran sekurlsa::logonpasswords and accessed lsass.exe

3. **Lateral Movement and Defense Evasion** (09:25:00 - 09:28:05)
   - Network logon as admin from 192.168.1.100, PSEXESVC.exe on DC01
   - Windows Defender real-time monitoring disabled on DC01

4. **Data Staging and Exfiltration** (09:30:00 - 09:35:05)
   - Finance share archived to archive.7z (45MB)
   - 45MB POST to cdn-update.evil.com (185.234.72.99)

### Recommended Actions
1. Isolate DC01 and jsmith's workstation
2. Block 185.234.72.99 and cdn-update.evil.com at the perimeter
3. Reset credentials for jsmith and admin
4. Re-enable Windows Defender real-time monitoring on DC01
5. Assess exposure of the Finance share data
"""
        }

# Analyze logs
analyzer = LLMLogAnalyzer()
result = analyzer.analyze_logs(entries)
print(result['analysis'])
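
The prompt above asks for free-form text. Requesting JSON instead makes the response easier to consume programmatically. The following is a minimal sketch of that idea, not the lab's implementation: `build_structured_prompt`, `parse_structured_response`, and the key names in `REQUIRED_KEYS` are all hypothetical, and the reply string in the usage example is hand-written rather than real model output.

```python
import json
from typing import Dict, List

# Hypothetical schema for the structured reply.
REQUIRED_KEYS = ("summary", "incidents", "recommended_actions")


def build_structured_prompt(raw_lines: List[str]) -> str:
    """Build a prompt that asks the model for a single JSON object."""
    logs_text = "\n".join(raw_lines)
    return (
        "Analyze these security logs. Respond with a single JSON object "
        "with exactly these keys: summary (string), incidents (list of strings), "
        "recommended_actions (list of strings).\n\n"
        f"LOGS:\n{logs_text}"
    )


def parse_structured_response(text: str) -> Dict:
    """Parse the reply, tolerating extra text around the JSON object."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    data = json.loads(text[start:end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"response missing keys: {missing}")
    return data


# Hand-written stand-in for a model reply (not real output)
reply = 'Here is the analysis:\n{"summary": "s", "incidents": ["a"], "recommended_actions": ["b"]}'
parsed = parse_structured_response(reply)
print(parsed["incidents"])
```

The string returned by `build_structured_prompt` could be passed as the `content` of the user message in `analyze_logs`, with `parse_structured_response` applied to `response.content[0].text`.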

## 4. Pattern Detection

In [None]:
class PatternDetector:
    """Detect common attack patterns in logs."""
    
    ATTACK_PATTERNS = {
        'brute_force': {
            'keywords': ['failed login', 'authentication failed'],
            'threshold': 3,
            'window_seconds': 60
        },
        'c2_communication': {
            'keywords': ['blocked connection', 'suspicious dns', 'malware'],
            'ports': [4444, 8888, 31337, 6667]
        },
        'lateral_movement': {
            # 'psexesvc' matches the PsExec service binary (PSEXESVC.exe)
            'keywords': ['psexec', 'psexesvc', 'wmi', 'remote', 'injection']
        },
        'data_exfiltration': {
            'keywords': ['large transfer', 'upload', 'exfil']
        }
    }
    
    def detect(self, entries: List[LogEntry]) -> List[Dict]:
        detections = []
        
        for pattern_name, config in self.ATTACK_PATTERNS.items():
            matches = self._find_matches(entries, config)
            if matches:
                detections.append({
                    'pattern': pattern_name,
                    'count': len(matches),
                    'entries': matches[:5]  # First 5 matches
                })
        
        return detections
    
    def _find_matches(self, entries: List[LogEntry], config: Dict) -> List[LogEntry]:
        """Keyword match only; 'threshold', 'window_seconds', and 'ports' are not yet enforced."""
        matches = []
        for entry in entries:
            message_lower = entry.message.lower()
            if any(kw in message_lower for kw in config.get('keywords', [])):
                matches.append(entry)
        return matches

# Detect patterns
detector = PatternDetector()
patterns = detector.detect(entries)

print("\nDetected Attack Patterns:")
print("=" * 40)
for p in patterns:
    print(f"\n{p['pattern'].upper()} ({p['count']} events)")
    for entry in p['entries']:
        print(f"  - [{entry.timestamp}] {entry.message[:60]}...")
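
The `brute_force` pattern declares a `threshold` and `window_seconds`, but the detector only checks keywords. One way those two settings could be applied is a sliding window over event timestamps, sketched below. The function name `exceeds_threshold` is hypothetical, and the timestamps in the usage example are made up for illustration (the sample logs contain no failed logins).

```python
from datetime import datetime, timedelta
from typing import List


def exceeds_threshold(timestamps: List[str], threshold: int, window_seconds: int) -> bool:
    """Return True if `threshold` or more events fall within any time window."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    window = timedelta(seconds=window_seconds)
    start = 0
    for end, current in enumerate(times):
        # Advance the left edge until the window spans at most `window_seconds`.
        while current - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Hypothetical failed-login timestamps, three within a few seconds
burst = ["2024-01-15 09:15:23", "2024-01-15 09:15:24", "2024-01-15 09:15:26"]
print(exceeds_threshold(burst, threshold=3, window_seconds=60))
```

In `PatternDetector`, the timestamps of the keyword matches for a pattern could be passed to this check before a detection is reported.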

## 5. Log Statistics

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Convert to DataFrame for analysis
df = pd.DataFrame([{
    'timestamp': e.timestamp,
    'source': e.source,
    'severity': e.severity,
    'message': e.message
} for e in entries])

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Events by source
df['source'].value_counts().plot(kind='bar', ax=axes[0], color='steelblue')
axes[0].set_title('Events by Source')
axes[0].set_ylabel('Count')

# Events by severity, in a fixed order from most to least severe
severity_order = ['critical', 'high', 'medium', 'low', 'info']
severity_colors = {'critical': 'red', 'high': 'orange', 'medium': 'yellow', 'low': 'green', 'info': 'blue'}
severity_counts = df['severity'].value_counts().reindex(severity_order, fill_value=0)
colors = [severity_colors.get(s, 'gray') for s in severity_counts.index]
severity_counts.plot(kind='bar', ax=axes[1], color=colors)
axes[1].set_title('Events by Severity')
axes[1].set_ylabel('Count')

plt.tight_layout()
plt.show()
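
Beyond aggregate counts, attack chain reconstruction depends on ordering events in time. `SAMPLE_LOGS` lists the noise events after the attack events, so the raw order is not chronological. The sketch below sorts events and labels each with its offset from the first one. `build_timeline` is a hypothetical helper, and the phase descriptions are taken from the comments in the sample logs.

```python
from datetime import datetime
from typing import List, Tuple


def build_timeline(events: List[Tuple[str, str]]) -> List[str]:
    """Sort (timestamp, description) pairs and prefix each with a minute offset."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), desc) for ts, desc in events
    )
    if not parsed:
        return []
    first = parsed[0][0]
    lines = []
    for when, desc in parsed:
        minutes = int((when - first).total_seconds() // 60)
        lines.append(f"T+{minutes:02d}m {when:%H:%M:%S} {desc}")
    return lines


# Timestamps from SAMPLE_LOGS, deliberately out of order
timeline = build_timeline([
    ("2024-01-15 09:35:05", "Exfiltration: POST to cdn-update.evil.com"),
    ("2024-01-15 09:12:15", "Initial access: phishing email delivered"),
    ("2024-01-15 09:20:00", "Credential access: m.exe sekurlsa::logonpasswords"),
])
for line in timeline:
    print(line)
```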

## Summary

In this lab, we built an LLM-powered log analysis system:

1. **Log Parsing** - Extracted structured data from raw logs
2. **LLM Analysis** - Used Claude to generate security insights
3. **Pattern Detection** - Identified common attack patterns
4. **Visualization** - Plotted event counts by source and severity

### Key Takeaways:
- LLMs excel at summarizing and correlating log events
- Combine rule-based detection with LLM analysis
- Structured prompts yield better results
- Pre-filter logs to reduce token usage
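
The last takeaway can be sketched as a severity filter applied before the prompt is built. This is one possible approach under stated assumptions: `prefilter` and `SEVERITY_RANK` are hypothetical names, events are plain dictionaries rather than `LogEntry` objects, and the severity labels in the usage example are assigned by hand for illustration.

```python
from typing import Dict, List

# Lower rank means more severe.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def prefilter(
    events: List[Dict[str, str]],
    min_severity: str = "medium",
    max_events: int = 50,
) -> List[Dict[str, str]]:
    """Keep the most severe events, capped at max_events, in chronological order."""
    cutoff = SEVERITY_RANK[min_severity]
    kept = [e for e in events if SEVERITY_RANK.get(e["severity"], 4) <= cutoff]
    # Stable sort: the most severe events survive the cap.
    kept.sort(key=lambda e: SEVERITY_RANK.get(e["severity"], 4))
    top = kept[:max_events]
    # Restore time order so the LLM sees events in sequence.
    return sorted(top, key=lambda e: e["timestamp"])


# Abbreviated events based on SAMPLE_LOGS; severities assigned for illustration
events = [
    {"timestamp": "2024-01-15 09:35:05", "severity": "high", "raw": "PROXY user=admin method=POST size=45MB"},
    {"timestamp": "2024-01-15 09:10:00", "severity": "low", "raw": "PROXY user=marketing method=GET status=200"},
    {"timestamp": "2024-01-15 09:20:00", "severity": "critical", "raw": "SYSMON EventID=1 m.exe sekurlsa::logonpasswords"},
]
filtered = prefilter(events)
print([e["severity"] for e in filtered])
```

Only the `raw` text of the filtered events would then be joined into the prompt, which keeps the token count bounded by `max_events`.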

### Next Steps:
1. Add real-time log streaming
2. Build query interface for log search
3. Create automated alerting pipeline