Universal log parser with pattern detection. Feed it any log file and get structured output, level filtering, and automatic anomaly detection � no format configuration needed.
Debugging production issues means digging through logs in different formats across different services. log-parser auto-detects the format (structured logs, syslog, nginx access logs, Python logging) and gives you a unified view with built-in pattern detection for error bursts, repeated failures, time gaps, and traffic anomalies.
pip install -e .Or with dev dependencies:
pip install -e ".[dev]"# Quick summary of a log file
log-parser app.log
# Filter by level
log-parser app.log --level ERROR
# JSON output for piping
log-parser app.log --format json
# Plain text output
log-parser app.log --format plain
# Grep through logs
log-parser app.log --grep "timeout|refused"
# Run pattern detection
log-parser app.log --detect-patterns
# Only process first 100 lines
log-parser app.log --head 100
# Strict mode (fail on unparseable lines)
log-parser app.log --strictfrom log_parser.parser import LogParser
from log_parser.patterns import PatternDetector
parser = LogParser()
entries = parser.parse_file("app.log")
print(f"Detected format: {parser.detected_format}")
print(f"Parsed {parser.stats['parsed']}/{parser.stats['total']} lines")
# Filter errors
errors = [e for e in entries if e.is_error]
# Detect patterns
detector = PatternDetector(entries)
patterns = detector.detect_all()
for p in patterns:
print(f"[{p.severity}] {p.name}: {p.description}")The parser auto-detects these log formats from the first matching line:
- Common structured �
2024-01-15T10:30:00Z INFO [source] message - Syslog �
Jan 15 10:30:00 hostname process[pid]: message - Nginx access � Combined log format
- Python logging �
LEVEL:logger:message - JSON-like � Lines containing
"level"and"message"fields
Custom formats can be added by passing LogFormat objects to the parser.
Built-in detectors scan for:
- Error bursts � Clusters of errors within a time window
- Repeated errors � Same error message appearing multiple times (with number normalization)
- High error rate � When errors exceed a configurable threshold of total entries
- Time gaps � Suspicious gaps in the log timeline (possible downtime)
- IP anomalies � IPs with unusually high request counts (access logs)
- HTTP errors � High counts of 4xx/5xx status codes
Core parsing engine with auto-detection.
parse_line(line, line_number)� Parse a single line into aLogEntryparse_lines(lines)� Parse multiple linesparse_file(path)� Parse a filedetected_format� Name of the auto-detected formatstats� Parse statistics (total, parsed, failed)
Anomaly detection on parsed entries.
detect_all()� Run all detectors, returnslist[PatternMatch]detect_error_bursts(window_seconds, threshold)detect_repeated_errors(min_count)detect_time_gaps(gap_seconds)- Individual detectors can be called separately
Enum with standard levels: TRACE, DEBUGD INFO, WARN, ERROR, FATAL, UNKNOWN. Handles aliases (WARNING, CRITICAL, SEVERE).
log_parser/
parser.py # Core parsing engine, format definitions, timestamp handling
patterns.py # Pattern detection and anomaly analysis
formatters.py # Output formatting (plain, JSON, summary, pattern reports)
cli.py # CLI entry point (click)
The flow is: Log file � Parser auto-detects format � LogEntry objects � PatternDetector finds anomalies � Formatters produce output.
MIT