ping2A/IronSift

IronSift 🔍🛑

"Where's Waldo?" for Cybersecurity: fleet-wide anomaly detection powered by unsupervised machine learning.

Created with Claude.ai but supervised by a human (me apparently).

IronSift is a high-performance Rust-based cybersecurity tool that analyzes massive process logs to identify compromised machines in server fleets. Using DBSCAN clustering and TF-IDF feature engineering, it detects threats without requiring known attack signatures.

🎯 Quick Start (3 Ways)

Option 1: Super Simple API (Recommended for Getting Started)

use ironsift::{build_profiles_simple, analyze_fleet, DetectionConfig};

fn main() {
    let config = DetectionConfig::default();
    
    // Just provide (machine_id, process_name, parent_name) - PIDs handled automatically!
    let processes = vec![
        ("server1".to_string(), "nginx".to_string(), "systemd".to_string()),
        ("server1".to_string(), "worker".to_string(), "nginx".to_string()),
        ("server2".to_string(), "miner".to_string(), "systemd".to_string()),  // ⚠️ Anomaly
    ];
    
    let profiles = build_profiles_simple(processes, &config);
    let report = analyze_fleet(&profiles, &config).unwrap();
    report.print();
}

Option 2: ProcessBuilder API (More Control)

use ironsift::{ProcessBuilder, ProcessEntry, build_profiles, analyze_fleet, DetectionConfig};

fn main() {
    let config = DetectionConfig::default();
    let mut builder = ProcessBuilder::new();
    
    // Simple method
    builder.add_process("server1", "nginx", "systemd");
    
    // Or fluent API with full control
    builder.add(
        ProcessEntry::new("server1".to_string(), "worker".to_string())
            .parent("nginx")
            .uid(33)
            .path("/usr/sbin/nginx")
            .args("worker process")
    );
    
    // NEW: Automatic command line parsing!
    builder.add_command("server2", "/usr/bin/postgres -D /var/lib/postgresql/data", Some("systemd"));
    
    // NEW: Bare commands (no full path) work too!
    builder.add_command("server3", "ls /etc/", Some("bash"));
    
    // NEW: JSON log parsing!
    builder.add_json(r#"{"host": "server4", "cmd": "nginx", "uid": 33}"#);
    
    let profiles = build_profiles(builder.build(), &config);
    let report = analyze_fleet(&profiles, &config).unwrap();
    report.print();
}

Option 3: With Real PIDs (From System Logs)

use ironsift::{RawLogEntry, build_profiles, analyze_fleet, DetectionConfig};

fn main() {
    let config = DetectionConfig::default();
    
    let entries = vec![
        RawLogEntry {
            machine_id: "server1".to_string(),
            pid: 1, ppid: 0,
            name: "systemd".to_string(),
            uid: 0,
            path: "/usr/lib/systemd/systemd".to_string(),
            args: "--system".to_string(),
            timestamp: None,
        },
        // ... more entries
    ];
    
    let profiles = build_profiles(entries, &config);
    let report = analyze_fleet(&profiles, &config).unwrap();
    report.print();
}
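Option 3 depends on turning each PPID into a parent process name. A minimal sketch of that lookup, assuming PIDs are only unique within one machine and that a PPID with no matching entry becomes "unknown" (the exact placeholder IronSift uses is an assumption). `LogEntry` and `resolve_parents` are hypothetical names, not IronSift's API:

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down log record (not IronSift's RawLogEntry).
pub struct LogEntry {
    pub machine_id: String,
    pub pid: u32,
    pub ppid: u32,
    pub name: String,
}

/// Resolve each entry's PPID to a parent name on the same machine.
/// Returns (machine_id, name, parent_name); unresolved parents become "unknown".
pub fn resolve_parents(entries: &[LogEntry]) -> Vec<(String, String, String)> {
    // Index (machine_id, pid) -> process name; PIDs repeat across machines.
    let mut by_pid: HashMap<(&str, u32), &str> = HashMap::new();
    for e in entries {
        by_pid.insert((e.machine_id.as_str(), e.pid), e.name.as_str());
    }
    entries
        .iter()
        .map(|e| {
            let parent = by_pid
                .get(&(e.machine_id.as_str(), e.ppid))
                .copied()
                .unwrap_or("unknown");
            (e.machine_id.clone(), e.name.clone(), parent.to_string())
        })
        .collect()
}
```

If a PID is reused during the capture window, the later entry wins in this sketch; a real implementation would probably need timestamps to tell the two processes apart.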

See EXAMPLES.md for complete usage examples.


📜 Version History

v0.3.0 (Current) - Enhanced Analysis & Input Flexibility

  • ✨ Enhanced Detailed Console Output - Rich reporting with attack categorization
  • ✨ Automatic Command Line Parsing - Handles bare commands (ls /etc/) and full paths
  • ✨ Native JSON Log Parsing - Docker, Kubernetes, CloudWatch, Elasticsearch support
  • 📚 Comprehensive documentation (15+ guides)
  • 🧪 50+ tests covering all features

v0.2.0 - Flexible APIs & Automation

  • 🎯 Three flexible APIs (Simple, Builder, Direct)
  • 🔄 Automatic PID/PPID resolution
  • 📁 Reorganized project structure (CLI separated)
  • 📖 Extensive documentation

v0.1.0 - Initial Release

  • 🔍 Core DBSCAN clustering
  • 📊 TF-IDF feature engineering
  • 🚨 Anomaly detection
  • 📈 Basic reporting

📥 Multiple Input Methods

IronSift accepts data in several formats; choose whichever matches your logs:

Full Command Lines (with paths)

builder.add_command("server1", "/usr/bin/nginx -c /etc/nginx.conf", Some("systemd"));
// → Automatically extracts: name="nginx", path="/usr/bin/nginx", args="-c /etc/nginx.conf"

Bare Commands (no paths)

// Common in ps output, shell commands
builder.add_command("server1", "ls /etc/", Some("bash"));
builder.add_command("server1", "grep error app.log", Some("bash"));
// → Works perfectly! name="ls", path="ls", args="/etc/"
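Both examples above fit a simple rule: the first whitespace-separated token is the executable (full path or bare name), its last path component is the process name, and the rest is the argument string. A sketch of that rule with a hypothetical `parse_command` helper; it does not handle shell quoting, so a path containing spaces would be split incorrectly:

```rust
/// Split a command line into (name, path, args).
/// Hypothetical helper: whitespace splitting only, no quote handling.
pub fn parse_command(cmdline: &str) -> Option<(String, String, String)> {
    let trimmed = cmdline.trim();
    // First token is the executable; everything after it is the argument string.
    let (path, args) = match trimmed.split_once(char::is_whitespace) {
        Some((p, rest)) => (p, rest.trim()),
        None => (trimmed, ""),
    };
    if path.is_empty() {
        return None;
    }
    // Process name is the final path component: "/usr/bin/nginx" -> "nginx".
    let name = path.rsplit('/').next().unwrap_or(path);
    Some((name.to_string(), path.to_string(), args.to_string()))
}
```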

JSON Logs (Docker, Kubernetes, CloudWatch)

// Single JSON entry
builder.add_json(r#"{"host": "server1", "cmd": "/usr/bin/nginx", "uid": 33}"#);

// Batch (JSON array or NDJSON)
builder.add_json_batch(r#"[
    {"container": "web-1", "command": "nginx", "userid": 33},
    {"node": "worker-1", "cmd": "python3 app.py", "uid": 1000}
]"#);

Supported JSON key names:

  • Machine: machine_id, hostname, host, server, node, container, pod
  • Command: command, cmd, cmdline, commandline
  • User: uid, user_id, userid
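The alias lists above amount to "try each accepted key until one is present". A minimal sketch, assuming the JSON object has already been parsed into a flat map of string values (for example with serde_json) and that aliases are tried in the order listed; IronSift's real precedence when several aliases appear in one record is not documented here. `first_alias` and `normalize` are hypothetical helpers:

```rust
use std::collections::HashMap;

// Accepted key names from the list above, tried in order (order is an assumption).
const MACHINE_KEYS: &[&str] = &["machine_id", "hostname", "host", "server", "node", "container", "pod"];
const COMMAND_KEYS: &[&str] = &["command", "cmd", "cmdline", "commandline"];
const USER_KEYS: &[&str] = &["uid", "user_id", "userid"];

/// Value of the first alias present in an already-parsed record.
fn first_alias<'a>(record: &'a HashMap<String, String>, aliases: &[&str]) -> Option<&'a str> {
    aliases.iter().find_map(|k| record.get(*k).map(String::as_str))
}

/// Normalize a record to (machine, command, uid).
/// Machine and command are required; uid is None if missing or non-numeric.
pub fn normalize(record: &HashMap<String, String>) -> Option<(String, String, Option<u32>)> {
    let machine = first_alias(record, MACHINE_KEYS)?;
    let command = first_alias(record, COMMAND_KEYS)?;
    let uid = first_alias(record, USER_KEYS).and_then(|u| u.parse().ok());
    Some((machine.to_string(), command.to_string(), uid))
}
```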

See JSON_PARSING.md and COMMAND_PARSING.md for complete documentation.


🎯 Features

Core Detection Capabilities

| Feature | Description |
|---|---|
| Multivariate Analysis | Analyzes 6 dimensions: Process Name, Parent (auto-resolved), UID, Path, Entropy, Path Risk |
| PID Awareness | Automatically resolves parent processes from PID/PPID relationships |
| Unsupervised Learning | Zero-config detection; no signature database required |
| Scale Invariant | Works on 10 logs or 10 million logs |
| Minority Cluster Detection | Identifies coordinated attacks (botnets, APTs) |
| High Entropy Detection | Flags obfuscated commands and encoded payloads |
| Suspicious Path Analysis | Detects execution from /tmp, /dev/shm, hidden directories |

Detection Scenarios

IronSift can identify:

  • Cryptominers: Rare processes running from suspicious paths such as /tmp or /dev/shm
  • Web Shells: PHP/Python processes with high-entropy eval() payloads
  • Privilege Escalation: Processes running as root (UID 0) that run unprivileged on the rest of the fleet
  • Lateral Movement: Unusual SSH/SCP activity with anomalous targets
  • Rootkits: Processes masquerading as system services
  • APT Campaigns: Small clusters of compromised machines with identical malware

📦 Installation

Prerequisites

  • Rust 1.70+ (rustup recommended)
  • 4GB+ RAM for large datasets

Build from Source

cd ironsift
cargo build --release

🔧 Quick Start (CLI)

1. Generate Test Data

Create a realistic dataset with 100 machines and embedded attack scenarios:

cargo run --release --bin generator

Output: large_dataset.csv (100,000 logs with 10 compromised machines)

The generated data includes:

  • Realistic PID/PPID relationships
  • systemd as PID 1 on each machine
  • Normal processes as children of systemd
  • Attack processes with proper parent relationships

2. Run Analysis

Analyze the fleet and display results:

cargo run --release --bin ironsift

Sample Output:

================================================================================
                         IRONSIFT ANALYSIS REPORT                              
================================================================================
Fleet Size: 100 machines
Detection Sensitivity: High

--- Configuration ---
  DBSCAN Tolerance: 0.05
  Entropy Threshold: 4.5
  Minority Cluster Ratio: 10%

--- Cluster Distribution ---
  Cluster 0: 90 machines (90.0%)
  Noise (Outliers): 10 machines (10.0%)

================================================================================
Status: 🚨 ANOMALIES DETECTED
================================================================================
Suspicious Machines: 10

💀 CRITICAL (3):
   These machines are isolated outliers - likely compromised

  💀 machine_013 (Distance: 1.500)
     ├─ Cluster: Noise (isolated outlier)
     ├─ Total processes: 150
     ├─ Suspicious processes: 50 ⚠️
     ├─ Rare processes (< 5% of fleet):
     │  • kworker (path: /tmp/.X11-unix/kworker)
     │  • systemd (path: /var/tmp/.cache/systemd)
     ├─ Suspicious processes detected:
     │
     │  📛 kworker (count: 30)
     │     Parent: systemd
     │     Path: /tmp/.X11-unix/kworker
     │     UID: 0 (root) ⚠️
     │     Risk factors:
     │       🚨 High entropy arguments (possible obfuscation)
     │       🚨 Suspicious execution path: /tmp/.X11-unix/kworker
     │       🚨 Running as root (UID 0)
     │       🚨 Executing from temporary directory
     └─ Activity period: 2024-01-01 10:00:00 to 2024-01-07 15:30:00

🔴 HIGH (4):
   Strong deviation from baseline - investigate immediately

  🔴 machine_042 (Distance: 0.823)
     ├─ Suspicious processes: 15 ⚠️
     └─ Unusual: php-fpm (high entropy eval payloads)
  ...

--- Detected Attack Patterns ---
  ⛏️  Cryptomining (3 machines): machine_013, machine_027, machine_065
  🕸️  Web Shells (2 machines): machine_042, machine_088
  ⬆️  Privilege Escalation (4 machines): machine_019, machine_051, ...
  📂 Suspicious Execution Paths (5 machines): machine_013, machine_027, ...

================================================================================
Recommended Actions:
  1. Review flagged machines and investigate anomalous processes
  2. Check process execution paths and command arguments
  3. Verify parent-child process relationships
  4. Cross-reference with network logs and file access logs
  5. Export detailed report: cargo run --bin ironsift -- --export-json
================================================================================

See OUTPUT_EXAMPLES.md for complete output examples.

3. Export Forensic Report

Generate a detailed JSON report for incident response:

cargo run --release --bin ironsift -- --export-json

Output: forensic_report.json


⚙️ Configuration

Command Line Options

ironsift [OPTIONS]

Options:
  --config <file>       Load configuration from JSON file
  --export-json         Export detailed forensic report
  --tolerance <value>   Override DBSCAN tolerance (default: 0.05)
  --help                Show help message

Custom Configuration

On first run, IronSift creates ironsift_config.json:

{
  "entropy_threshold": 4.5,
  "minority_cluster_ratio": 0.10,
  "dbscan_tolerance": 0.05,
  "dbscan_min_samples": 2,
  "normalize_features": true,
  "suspicious_path_patterns": [
    "/tmp/",
    "/dev/shm/",
    "/var/tmp/",
    "/home/[^/]+/\\.[^/]+"
  ]
}
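A std-only sketch of how the default `suspicious_path_patterns` could be applied, assuming each pattern is searched anywhere in the path (as an unanchored regex search would be). The last pattern, the regex `/home/[^/]+/\.[^/]+` (a dot-prefixed entry directly inside a home directory), is hand-translated here instead of using a regex engine. `is_suspicious_path` is a hypothetical name:

```rust
/// Literal fragments from the default config above, matched anywhere in the path.
const SUSPICIOUS_FRAGMENTS: &[&str] = &["/tmp/", "/dev/shm/", "/var/tmp/"];

/// Hand-written equivalent of the regex `/home/[^/]+/\.[^/]+`.
fn is_hidden_home_entry(path: &str) -> bool {
    let Some(idx) = path.find("/home/") else { return false };
    let mut parts = path[idx + "/home/".len()..].split('/');
    match (parts.next(), parts.next()) {
        // Needs a non-empty user dir, then "." followed by at least one more character.
        (Some(user), Some(entry)) => !user.is_empty() && entry.len() > 1 && entry.starts_with('.'),
        _ => false,
    }
}

/// True if the path matches any of the default suspicious patterns.
pub fn is_suspicious_path(path: &str) -> bool {
    SUSPICIOUS_FRAGMENTS.iter().any(|p| path.contains(*p)) || is_hidden_home_entry(path)
}
```

Whether IronSift anchors these patterns at the start of the path is not stated; if it does, `starts_with` would replace `contains`.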

Tuning Guide

| Parameter | Effect | Recommended Range |
|---|---|---|
| dbscan_tolerance | Detection sensitivity | 0.03 (most sensitive) to 0.10 (least sensitive) |
| minority_cluster_ratio | Botnet detection threshold | 0.05 to 0.15 |
| entropy_threshold | Obfuscation detection | 3.5 (most sensitive) to 5.5 (least sensitive) |

Example: Increase sensitivity for high-security environments:

cargo run --bin ironsift -- --tolerance 0.03

📊 Understanding Results

Anomaly Severity Levels

| Level | Distance Score | Meaning | Action |
|---|---|---|---|
| 💀 Critical | > 1.0 | Isolated outlier, likely compromised | Immediate isolation |
| 🔴 High | 0.6 to 1.0 | Strong deviation, investigate ASAP | Priority investigation |
| 🟠 Medium | 0.3 to 0.6 | Moderate anomaly, worth reviewing | Schedule review |
| 🟡 Low | 0.0 to 0.3 | Minor deviation, may be benign | Monitor |

Forensic Report Structure

The JSON export includes:

{
  "report_timestamp": "2024-12-10T15:30:00Z",
  "fleet_size": 100,
  "anomalies_detected": 10,
  "config": { ... },
  "investigation_targets": [
    {
      "machine_id": "machine_013",
      "severity": "Critical",
      "distance_score": 1.5,
      "suspicious_processes": [
        {
          "name": "kworker",
          "path": "/tmp/.X11-unix/kworker",
          "parent": "systemd",
          "risk_factors": [
            "High entropy arguments (possible obfuscation)",
            "Suspicious execution path: /tmp/.X11-unix/kworker",
            "Running as root (UID 0)"
          ]
        }
      ]
    }
  ]
}

🧪 Testing

Run the comprehensive test suite:

cargo test

Test Coverage

  • Shannon entropy calculation
  • Suspicious path detection
  • Clean fleet (no false positives)
  • Single outlier detection
  • Minority cluster detection (botnet scenario)
  • Process risk factor analysis
  • PID/PPID parent resolution
  • Unknown parent handling

🏗️ Architecture

Data Flow

┌───────────────────────────────────────────────────────────────────────────────────┐
│                           IRONSIFT PIPELINE                                       │
└───────────────────────────────────────────────────────────────────────────────────┘

  Raw Input                    Profile Building              Analysis
  ─────────                    ────────────────              ───────

  ┌──────────────┐             ┌─────────────────┐           ┌─────────────────┐
  │ CSV / JSON   │             │ Group by        │           │ TF-IDF          │
  │ Process Logs │────────────►│ machine_id      │──────────►│ Vectorization   │
  │ or File      │   parse     │                 │  build    │ (rare = signal) │
  │ Access Logs  │             │ Resolve PPID →  │  profiles │                 │
  └──────────────┘             │ parent names    │           └────────┬────────┘
         │                     │                 │                    │
         │                     │ Whitelist /     │                    ▼
         │                     │ filter paths    │           ┌─────────────────┐
         └────────────────────►│                 │           │ L2 Normalize    │
                               └─────────────────┘           │ DBSCAN Cluster  │
                                                             └────────┬────────┘
                                                                      │
                                                                      ▼
  Output                     ┌─────────────────┐             ┌─────────────────┐
  ──────                     │ Anomaly Scoring │◄────────────│ Noise = outlier │
                             │ & Severity      │  cluster    │ Small cluster   │
  ┌──────────────┐           │ (Critical→Low)  │   ids       │ = minority      │
  │ Console      │◄──────────│                 │             │ Large cluster   │
  │ Report       │  print    └────────┬────────┘             │ = baseline      │
  └──────────────┘                    │                      └─────────────────┘
         ▲                            │
         │                            ▼
  ┌──────────────┐             ┌─────────────────┐
  │ forensic_    │◄────────────│ Risk factors    │
  │ report.json  │  export     │ (entropy, path, │
  └──────────────┘             │  root, mtime)   │
                               └─────────────────┘

Process vs File Analysis

  PROCESS MODE (default)              FILE MODE (--files)
  ─────────────────────              ───────────────────

  RawLogEntry                         RawFileEntry
  • machine_id, pid, ppid             • machine_id, path, uid
  • name, path, args, uid             • timestamp, mtime
  • timestamp
         │                                    │
         ▼                                    ▼
  ProcessSignature                    FileSignature
  • name + parent + uid + path        • path + uid
  • is_suspicious_path, entropy       • is_suspicious_path
         │                            • has_mtime_anomaly
         ▼                                    │
  MachineProfile                      MachineFileProfile
  (counts per process)                (counts per file + mtimes)
         │                                    │
         └──────────────┬─────────────────────┘
                        ▼
              analyze_fleet / analyze_files_fleet
                        │
                        ▼
              AnalysisReport (anomalies, severity)
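Both modes reduce raw entries to per-machine counts of signatures before analysis. A sketch of that grouping step for process mode, assuming parents are already resolved and using a simplified hypothetical `Signature` (name + parent + uid + path, as in the diagram); `count_signatures` is not IronSift's function:

```rust
use std::collections::{BTreeMap, HashMap};

/// Simplified, hypothetical process signature: name + parent + uid + path.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Signature {
    pub name: String,
    pub parent: String,
    pub uid: u32,
    pub path: String,
}

/// Group (machine_id, signature) pairs by machine and count each signature.
/// BTreeMap keeps machines in sorted order, which makes reports stable.
pub fn count_signatures(
    entries: &[(String, Signature)],
) -> BTreeMap<String, HashMap<Signature, usize>> {
    let mut profiles: BTreeMap<String, HashMap<Signature, usize>> = BTreeMap::new();
    for (machine, sig) in entries {
        *profiles
            .entry(machine.clone())
            .or_default()
            .entry(sig.clone())
            .or_insert(0) += 1;
    }
    profiles
}
```

Because the parent and UID are part of the key, the same binary launched by a different parent or as a different user counts as a distinct signature.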

Key Algorithms

  1. PID Resolution: Automatically maps PPID to parent process names
  2. TF-IDF Weighting: Boosts rare processes, reduces noise from common ones
  3. L2 Normalization: Scales each machine's vector to unit length, so distances compare the mix of processes rather than how many log lines each machine produced
  4. DBSCAN: Density-based clustering that naturally identifies outliers
  5. Shannon Entropy: Measures randomness in command arguments (detects obfuscation)
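Shannon entropy of an argument string is H = -Σ p(c) log2 p(c), where p(c) is the frequency of character c. A sketch, assuming entropy is measured in bits per character (IronSift's exact unit isn't stated, though the 3.5 to 5.5 threshold range is consistent with it); `shannon_entropy` is a hypothetical name:

```rust
use std::collections::HashMap;

/// Shannon entropy in bits per character: H = -sum(p(c) * log2 p(c)).
pub fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0; // empty arguments carry no information
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}
```

A string of n characters has at most log2(n) bits of per-character entropy, so under this measure an argument shorter than 23 characters can never exceed a 4.5 threshold (log2 22 ≈ 4.46). Short encoded payloads may therefore stay under the default.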

🎓 How It Works

The "Iron Consensus" Principle

IronSift treats each machine as a vector in N-dimensional feature space:

  • Normal machines cluster tightly (distance ≈ 0)
  • Compromised machines drift away due to:
    • Rare processes not seen elsewhere
    • Unusual execution paths
    • High-entropy obfuscated commands
    • Privilege escalation patterns
    • Abnormal parent-child relationships

Clustering (Conceptual)

    Feature space (simplified 2D view)
    ─────────────────────────────────

         • • •  • •
       •   • • •   •          ← Normal machines (tight cluster)
        • •   • • •
          • • • •
               ★                 ← Isolated outlier (NOISE)
                                 → 💀 CRITICAL: likely compromised

                    ◄ ─ ─ ─ ─ ►
                 small cluster
                 (minority)        ← 🔴 HIGH: botnet / APT pattern
                    △ △
                     △

    DBSCAN: density-based clustering
    • Points in dense regions → same cluster (baseline).
    • Points in sparse regions → "noise" = anomaly.
    • Small clusters → minority = coordinated deviance.
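The same picture in code: a textbook DBSCAN over feature vectors, plus the minority-cluster check. This is a sketch, not IronSift's implementation; `eps` stands in for `dbscan_tolerance`, `min_samples` for `dbscan_min_samples` (assumed to count the point itself, as scikit-learn does), and `ratio` for `minority_cluster_ratio`:

```rust
/// Euclidean distance between two equal-length feature vectors.
fn dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Textbook DBSCAN: one label per point, Some(cluster_id) or None for noise.
pub fn dbscan(points: &[Vec<f64>], eps: f64, min_samples: usize) -> Vec<Option<usize>> {
    let n = points.len();
    // Neighbourhood of i, including i itself.
    let neighbours = |i: usize| -> Vec<usize> {
        (0..n).filter(|&j| dist(&points[i], &points[j]) <= eps).collect()
    };
    let mut labels: Vec<Option<usize>> = vec![None; n];
    let mut visited = vec![false; n];
    let mut next_cluster = 0;
    for i in 0..n {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let seeds = neighbours(i);
        if seeds.len() < min_samples {
            continue; // noise for now; a later cluster may claim it as a border point
        }
        let c = next_cluster;
        next_cluster += 1;
        labels[i] = Some(c);
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            if labels[j].is_none() {
                labels[j] = Some(c); // core or border point joins cluster c
            }
            if visited[j] {
                continue;
            }
            visited[j] = true;
            let nb = neighbours(j);
            if nb.len() >= min_samples {
                queue.extend(nb); // j is a core point: keep expanding through it
            }
        }
    }
    labels
}

/// Ids of clusters whose share of the fleet is at most `ratio`.
pub fn minority_clusters(labels: &[Option<usize>], ratio: f64) -> Vec<usize> {
    let mut sizes = std::collections::BTreeMap::new();
    for &c in labels.iter().flatten() {
        *sizes.entry(c).or_insert(0usize) += 1;
    }
    let n = labels.len() as f64;
    sizes
        .into_iter()
        .filter(|&(_, size)| size as f64 / n <= ratio)
        .map(|(c, _)| c)
        .collect()
}
```

The neighbour search here is O(n²), which is fine for a sketch; large fleets would normally call for a spatial index.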

Example Detection

Fleet: 100 web servers running nginx, postgres, node

Anomaly: Machine #42 suddenly has:

php-fpm (PID 5432, PPID 108 [apache2]) β†’ eval(base64_decode('aGVsbG8gd29ybGQ='))

IronSift Analysis:

  Raw log                    Resolution              TF-IDF              DBSCAN
  ───────                    ──────────              ──────              ──────

  machine_42                 PPID 108    rare        Machine #42         Main cluster
  pid 5432, ppid 108   ───►  → apache2   process  ──► vector differs  ──► • • • • •
  name php-fpm               parent      (1/100)     from baseline         •
  args eval(base64…)         resolved    ▼            ▼                    ★  ← #42
                                │        IDF boost   distance ≈ 1.2        (outlier)
                                │        100×        ▼
                                │                    🔴 HIGH severity
                                └─────────────────── anomaly

  1. Resolves parent: PPID 108 β†’ apache2
  2. Computes TF-IDF: This exact process appears on 1/100 machines
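  3. IDF boost: This process is 100× rarer than one seen fleet-wide (N/df = 100/1), so it carries far more weight; under the usual logarithmic IDF its weight is ln(100) ≈ 4.6, while a process present on every machine gets ln(1) = 0
  4. DBSCAN: Machine #42 is 1.2 units away from the main cluster
  5. Result: 🔴 HIGH severity anomaly detected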
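Steps 2 and 3 can be made concrete with a small TF-IDF plus L2-normalization sketch. It assumes the plain logarithmic IDF, idf(t) = ln(N / df(t)); IronSift's exact weighting isn't documented here. `tfidf_l2` is a hypothetical helper, and the test fleet mirrors the example (100 machines, php-fpm on one of them):

```rust
use std::collections::HashMap;

/// Per-machine TF-IDF vectors, each L2-normalized to unit length.
/// `profiles[i]` maps process signature -> term frequency on machine i.
/// Assumes plain idf(t) = ln(N / df(t)).
pub fn tfidf_l2(profiles: &[HashMap<String, f64>]) -> Vec<HashMap<String, f64>> {
    let n = profiles.len() as f64;
    // Document frequency: number of machines on which each process appears.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for profile in profiles {
        for term in profile.keys() {
            *df.entry(term.as_str()).or_insert(0.0) += 1.0;
        }
    }
    profiles
        .iter()
        .map(|profile| {
            // Weight each term by how rare it is across the fleet.
            let mut v: HashMap<String, f64> = profile
                .iter()
                .map(|(t, &tf)| (t.clone(), tf * (n / df[t.as_str()]).ln()))
                .collect();
            // L2 normalization: divide by the vector's Euclidean length.
            let norm = v.values().map(|x| x * x).sum::<f64>().sqrt();
            if norm > 0.0 {
                for x in v.values_mut() {
                    *x /= norm;
                }
            }
            v
        })
        .collect()
}
```

With this formula nginx, present on all 100 machines, gets weight ln(1) = 0, and php-fpm on machine #42 gets ln(100) ≈ 4.6 before normalization, so after normalization #42's vector is dominated by php-fpm. Smoothed variants such as scikit-learn's ln((1 + N) / (1 + df)) + 1 keep fleet-wide processes from vanishing entirely.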

📈 Performance

Benchmarks on a 4-core CPU with parallel processing (Rayon) enabled:

| Fleet Size | Logs | Processing Time | Memory |
|---|---|---|---|
| 100 machines | 100K | 0.8 s | 45 MB |
| 1,000 machines | 1M | 6.2 s | 320 MB |
| 10,000 machines | 10M | 58 s | 2.8 GB |


🛠️ Use Cases

Production Monitoring

# Daily cron job (crontab has no backslash line continuation, so the entry must stay on one line)
0 2 * * * cd /opt/ironsift && ./ingest_logs.sh && cargo run --release --bin ironsift -- --export-json && ./alert_soc.sh forensic_report.json

Incident Response

# Quick triage after breach detection
cargo run --release --bin ironsift -- --tolerance 0.03 --export-json

Research & Red Team

# Test detection against custom malware
./inject_attack.sh && cargo run --bin ironsift

Stay secure. Sift the iron from the ore. 🔒
