Skip to content

mconcas/log-analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Log Analyzer

A streaming plain-text log analyzer for very large files (GB-scale), focused on metrics that help tune Fluent Bit parsing.

What it extracts

  • Throughput: elapsed wall-clock time, MB/s, lines/s
  • Line shape: min/avg/max length, percentiles (p50/p90/p95/p99), very-long-line counts
  • Timestamp detection: presence, format breakdown (ISO 8601, Apache, epoch seconds/millis, syslog), earliest/latest timestamp range
  • Severity tokens: trace / debug / info / warn / error / fatal
  • JSON analysis: JSON-like line ratio and parse failures
  • Parse-failure hints: lines matching parse/malformed/invalid/unmarshal patterns, with top recurring signatures
  • Error hints: top recurring error signatures
  • Multiline block analysis: consecutive non-timestamp lines — frequency, line/byte length stats, heuristic format detection (json, yaml_like, xml_like, stacktrace_like, plain_text), and one example block per format
  • Duplicate line detection: top-K most repeated lines
  • Sample lines: lines with parse-failure or orphaned error hints
  • Custom regex counters via --pattern
  • Colored terminal output (auto-detected, --no-color to disable)
  • Fluent Bit configuration recommendations: optimal Buffer_Size, Mem_Buf_Limit, Time_Format, multiline parser settings, and advisory notes — all derived automatically from the analysis

Build

cargo build --release

Binary: target/release/log-analyzer

Usage

Analyze one or more files:

./target/release/log-analyzer /path/to/app.log /path/to/worker.log

Gzipped logs (auto-detected by .gz extension):

./target/release/log-analyzer /path/to/app.log.gz

From stdin:

cat /path/to/app.log | ./target/release/log-analyzer -

Track specific patterns:

./target/release/log-analyzer app.log \
  --pattern 'cannot parse' \
  --pattern 'invalid time format' \
  --pattern 'multiline'

Export machine-readable JSON:

./target/release/log-analyzer app.log --json-out report.json

Control report limits:

./target/release/log-analyzer app.log \
  --sample-limit 50 \
  --top-k 30 \
  --max-signatures 50000 \
  --top-duplicates 20

Disable color or progress:

./target/release/log-analyzer app.log --no-color
./target/release/log-analyzer app.log --no-progress
./target/release/log-analyzer app.log --progress-interval 0.5

CLI flags

Flag Default Description
--json-out <FILE> Write JSON report to file
--pattern <REGEX> Custom regex to count (repeatable)
--sample-limit <N> 20 Max sample lines to collect
--top-k <N> 20 Top-K signatures to show
--max-signatures <N> 10000 Max unique signatures to track
--top-duplicates <N> 10 Top-N duplicate lines to show
--progress-interval <S> 1.0 Progress update interval (seconds)
--no-progress false Suppress progress output on stderr
--no-color false Disable colored output

Notes

  • Processing is streaming and single-pass; memory is bounded via signature caps.
  • Text decoding uses UTF-8 with replacement for invalid bytes.
  • The tool does not modify input files.
  • Progress is printed to stderr and includes ETA for regular files with known size.
  • For stdin and .gz inputs, ETA is shown as n/a because total uncompressed size is not known up front.
  • Local benchmark: ~1.98 GiB log file processed in ~5.5 s (~369 MiB/s, ~953 k lines/s).

Fluent Bit recommendations

Every report ends with a Fluent Bit Configuration Recommendations section that suggests concrete settings derived from the log analysis:

Setting How it is derived
Buffer_Size Next power-of-two above the longest line + 1 KB framing overhead
Mem_Buf_Limit Based on estimated real-world ingestion rate × 60 s headroom, clamped to [8 MB, 512 MB]
storage.max_chunks_up Mem_Buf_Limit ÷ 2 MB chunk size, rounded to a power of two (≥ 128)
Format json if >50% of lines are JSON-like, otherwise regex
Time_Format Fluent Bit strftime string matching the dominant timestamp format
flush_timeout Scaled from multiline block depth (1–5 s)
parser_firstline Regex anchored to the dominant timestamp pattern

Advisory notes flag issues such as mixed timestamp formats, very long lines, high multiline ratios, invalid UTF-8, and parse-failure patterns.

The same data is available in the fluent_bit key of the --json-out output.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages