Battle-tested file forensics platform for security professionals
Filo transforms unknown binary blobs into classified, repairable, and explainable artifacts, with machine learning that runs fully offline.
- Deep File Analysis: Multi-layered signature, structural, and ZIP container analysis
- Smart Format Detection: Distinguishes DOCX/XLSX/PPTX, ODT/ODP/ODS, ZIP, JAR, APK, EPUB
- Enhanced ML Learning: Discriminative pattern extraction, rich statistical features, n-gram profiling
- Intelligent Repair: Reconstruct corrupted headers automatically with 21 repair strategies
- Flexible Output: Concise evidence display (top 3 by default), full details with -a/--all-evidence
- Confidence Breakdown: Auditable detection with --explain flag (court-ready transparency)
- Contradiction Detection: Identifies malware, polyglots, structural anomalies (malware triage)
- Embedded Detection: Find files hidden inside files, such as ZIP in EXE or PNG after EOF (malware hunter candy)
- Tool Fingerprinting: Identify how/when/with what tools a file was created (forensic attribution)
- Polyglot Detection (NEW v0.2.5): Detect dual-format files (GIFAR, PNG+ZIP, PDF+JS) with risk assessment
- CPU Architecture Detection (NEW v0.2.8): Automatic detection of CPU architecture for executables (90+ architectures: x86, ARM, RISC-V, Xtensa, MIPS, etc.)
- zsteg-Compatible Steganography (v0.2.7): 60+ bit plane LSB/MSB extraction (PNG/BMP), auto base64 decoding, file type detection, CTF-optimized
- PCAP Analysis (v0.2.6): Network capture file analysis with protocol detection, string extraction, base64 decoding, flag hunting
- Batch Processing: Parallel directory analysis with configurable workers
- Hash Lineage Tracking: Cryptographic chain-of-custody for court evidence
- Container Detection: Deep ZIP-based format inspection for Office and archive formats
- Performance Profiling: Identify bottlenecks in large-scale analysis
- Enhanced CLI: Color-coded output, hex dumps, repair suggestions
- Easy Maintenance: Reset ML model and lineage database with simple commands
Option 1: Easy Install (.deb package)
# Clone and build
git clone https://github.com/supunhg/Filo
cd Filo
./build-deb.sh
# Install
sudo dpkg -i filo-forensics_0.2.8_all.deb
Option 2: From Source
git clone https://github.com/supunhg/Filo
cd Filo
pip install -e .
Usage:
# Analyze unknown file
filo analyze suspicious.bin
# Identify CPU architecture (ELF/PE/Mach-O executables)
filo analyze binary # Shows: x86-64, ARM64, Xtensa, etc.
# Detect steganography (zsteg-compatible with auto base64 decoding)
filo stego challenge.png # CTF flag hunting
filo stego image.png --all # Show all 60+ bit plane results
filo stego image.png --extract="b1,rgba,lsb,xy" -o flag.txt
# Analyze PCAP network capture files
filo pcap capture.pcap
# Show detailed confidence breakdown (forensic-grade)
filo analyze --explain file.bin
# Show all detection evidence and embedded artifacts
filo analyze -a -e file.bin
# Analyze with JSON output
filo analyze --json file.bin > report.json
# Teach ML about a file format
filo teach correct_file.zip -f zip
# Batch process directory
filo batch ./directory
# Repair corrupted file
filo repair --format=png broken_image.bin
# Reset ML model or lineage database
filo reset-ml -y
filo reset-lineage -y
The easiest way to install Filo is to build and install the .deb package:
# Clone repository
git clone https://github.com/supunhg/Filo
cd Filo
# Build .deb package
./build-deb.sh
# Install
sudo dpkg -i filo-forensics_0.2.8_all.deb
# Start using immediately
filo --version
filo analyze file.bin
Features:
- Isolated installation at /opt/filo/ (no system conflicts)
- Automatic dependency management
- Global filo command (works from anywhere)
- No manual virtual environment activation
- Clean uninstall: sudo dpkg -r filo-forensics
Supported: Ubuntu 20.04+, Debian 11+, and compatible distributions
Note: All user data is stored in the /home/user/.filo/ directory:
- ML model: /home/user/.filo/learned_patterns.pkl
- Lineage database: /home/user/.filo/lineage.db
git clone https://github.com/supunhg/Filo
cd Filo
pip install -e .
# Clone and install with dev dependencies
git clone https://github.com/supunhg/Filo
cd Filo
pip install -e ".[dev]"
# Run tests
pytest
from filo import Analyzer, RepairEngine
from filo.batch import analyze_directory
from filo.export import export_to_file
from filo.container import analyze_archive
# Analyze file with ML enabled
analyzer = Analyzer(use_ml=True)
result = analyzer.analyze_file("unknown.bin")
print(f"Detected: {result.primary_format} ({result.confidence:.0%})")
print(f"Alternatives: {result.alternative_formats[:3]}")
# View detection evidence
for evidence in result.evidence_chain[:3]:
    print(f" {evidence['module']}: {evidence['confidence']:.0%}")
# Teach ML about correct format
with open("sample.zip", "rb") as f:
    analyzer.teach(f.read(), "zip")
# Batch process directory
batch_result = analyze_directory("./data", recursive=True)
print(f"Analyzed {batch_result.analyzed_count} files")
# Export to JSON/SARIF
export_to_file(result, "report.json", format="json")
# Analyze container (DOCX, ZIP, etc.)
container = analyze_archive("document.docx")
for entry in container.entries:
    print(f"{entry.path}: {entry.format}")
# Repair file
repair = RepairEngine()
repaired_data, report = repair.repair_file("corrupt.png")
# Analysis with limited evidence (default: top 3)
filo analyze suspicious.bin
# Show all evidence and embedded artifacts
filo analyze -a -e suspicious.bin
# Show detailed confidence breakdown (auditable, court-ready)
filo analyze --explain file.bin
# Combine for full transparency
filo analyze --explain -a -e file.bin
# Disable ML for pure signature detection
filo analyze --no-ml file.bin
# Analysis with JSON output
filo analyze --json suspicious.bin
# Detect embedded files (ZIP in EXE, PNG after EOF)
filo analyze malware.exe -e
# Identify tool/creator fingerprints
filo analyze document.pdf # Automatically fingerprints
# Batch processing with export
filo batch ./directory --export=sarif --output=scan.sarif
# Teach ML about file formats
filo teach correct_file.zip -f zip
filo teach image.png -f png
# Reset ML model or lineage database
filo reset-ml -y
filo reset-lineage -y
# Export to JSON for scripting
filo analyze --json file.bin | jq '.primary_format'
# Security: Detect embedded malware in documents
filo analyze suspicious.docx # Automatically checks for contradictions
# Automation: Filter files with critical contradictions
filo analyze *.docx --json | \
jq 'select(.contradictions[]? | .severity == "critical")'
# Check for hidden files
filo analyze *.png --json | \
jq 'select(.embedded_objects | length > 0)'
# Chain-of-custody: Query file transformation lineage
filo lineage $(sha256sum repaired.png | cut -d' ' -f1)
# View lineage history
filo lineage-history --operation repair
# Export lineage for court
filo lineage $FILE_HASH --format json --output chain-of-custody.json
Filo now accurately distinguishes between ZIP-based formats by inspecting container contents:
- Office Open XML: DOCX, PPTX, XLSX (via [Content_Types].xml)
- OpenDocument: ODT, ODP, ODS (via mimetype file)
- Archives: JAR, APK, EPUB, plain ZIP
- Large files: Efficient handling of files >10MB using file path access
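The container checks listed above can be sketched with nothing but the standard library. This is a minimal illustration, not Filo's actual implementation: the function name `classify_zip_container` is hypothetical, and telling DOCX/XLSX/PPTX apart by the `word/`, `xl/`, and `ppt/` directory prefixes is an assumed heuristic layered on top of the `[Content_Types].xml` check the README mentions.

```python
import io
import zipfile

# MIME strings stored in the 'mimetype' entry of OpenDocument and EPUB files.
_MIMETYPES = {
    "application/vnd.oasis.opendocument.text": "odt",
    "application/vnd.oasis.opendocument.presentation": "odp",
    "application/vnd.oasis.opendocument.spreadsheet": "ods",
    "application/epub+zip": "epub",
}


def classify_zip_container(data: bytes) -> str:
    """Return a coarse label for a ZIP-based file (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())

        # OpenDocument and EPUB declare their type in a 'mimetype' entry.
        if "mimetype" in names:
            mime = zf.read("mimetype").decode("ascii", "replace").strip()
            if mime in _MIMETYPES:
                return _MIMETYPES[mime]

        # Office Open XML packages carry [Content_Types].xml; the directory
        # holding the main part is used here to separate the three kinds
        # (an assumption made for this sketch).
        if "[Content_Types].xml" in names:
            for prefix, label in (("word/", "docx"), ("xl/", "xlsx"), ("ppt/", "pptx")):
                if any(name.startswith(prefix) for name in names):
                    return label

        # APK and JAR are recognised by well-known entries.
        if "AndroidManifest.xml" in names:
            return "apk"
        if "META-INF/MANIFEST.MF" in names:
            return "jar"
    return "zip"
```

Reading from `io.BytesIO` keeps the sketch simple; for the large files mentioned above, passing a file path to `zipfile.ZipFile` avoids loading the whole archive into memory.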
Three major improvements to machine learning detection:
- Discriminative Pattern Extraction: Automatically discovers format-specific byte sequences
- Rich Feature Analysis: 8 statistical features including compression ratio, entropy, byte distribution
- N-gram Profiling: Fuzzy matching using top 100 byte trigrams for similarity detection
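To make two of these features concrete, here is a small sketch of byte entropy and trigram profiling. The function names are hypothetical, and the Jaccard overlap used for similarity is an assumption chosen for illustration; the README does not say which similarity measure Filo uses.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 at most."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def trigram_profile(data: bytes, top_n: int = 100) -> set:
    """Return the set of the top_n most frequent byte trigrams in data."""
    grams = Counter(data[i:i + 3] for i in range(len(data) - 2))
    return {gram for gram, _ in grams.most_common(top_n)}


def profile_similarity(a: bytes, b: bytes, top_n: int = 100) -> float:
    """Jaccard overlap of two trigram profiles, in the range 0.0 to 1.0."""
    pa, pb = trigram_profile(a, top_n), trigram_profile(b, top_n)
    if not pa and not pb:
        return 0.0
    return len(pa & pb) / len(pa | pb)
```

High entropy (close to 8 bits per byte) usually points at compressed or encrypted content, while a high profile overlap with a known sample suggests the same format even when the header is damaged.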
Evidence display now shows only the top 3 most relevant items by default:
# Concise output (default)
filo analyze file.zip
# Full evidence when needed
filo analyze --all-evidence file.zip
- Quick Start Guide - Get started in 5 minutes
- Steganography Detection - Hidden data extraction (LSB/MSB, metadata, trailing data) (NEW)
- Embedded Detection - Find files hidden inside files
- Tool Fingerprinting - Forensic attribution (who/when/how)
- Confidence Breakdown - Auditable detection explanations
- Hash Lineage - Chain-of-custody tracking
- Polyglot Detection - Dual-format file detection
- Contradiction Detection - Malware & anomaly detection
- Architecture - Detailed system design
- Examples - Code examples and demos
Steganography Detection
Detect hidden data in image files and documents:
filo stego image.png
# Output:
# Steganography Analysis: image.png
#
# Potential Hidden Data Found (3 methods)
#
# Method: b1,rgb,lsb,xy
# Confidence: 95% (FLAG PATTERN DETECTED)
# Data: picoCTF{h1dd3n_1n_LSB}
Features:
- LSB/MSB Detection: Extract data from least/most significant bits (PNG, BMP)
- Multiple Channels: Test RGB, RGBA, individual channels (r, g, b, a), BGR
- Bit Orders: Both LSB and MSB with row/column-major ordering
- PDF Metadata: Extract hidden flags from Author, Title, Subject, Keywords
- Trailing Data: Detect data after JPEG EOI, PNG IEND, PDF EOF markers
- Flag Recognition: Automatic CTF flag pattern detection (picoCTF{}, flag{}, HTB{})
- Auto-Decode: Automatic base64 and zlib decompression
- Extraction: Save specific channels/methods to files
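The idea behind a one-bit RGB extraction in row-major order can be sketched as follows. This is an illustration under stated assumptions, not Filo's or zsteg's code: pixels are plain `(r, g, b)` tuples rather than a decoded PNG, the function names are hypothetical, and bits are packed most-significant-first into each output byte, which is a common CTF convention.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def extract_lsb_rgb(rows: List[List[Pixel]], n_bytes: int) -> bytes:
    """Collect the lowest bit of R, G, B for each pixel, row by row."""
    if n_bytes <= 0:
        return b""
    out = bytearray()
    acc = 0     # bits gathered so far for the current output byte
    nbits = 0   # how many bits acc currently holds
    for row in rows:
        for pixel in row:
            for channel in pixel:
                acc = (acc << 1) | (channel & 1)
                nbits += 1
                if nbits == 8:
                    out.append(acc)
                    acc, nbits = 0, 0
                    if len(out) == n_bytes:
                        return bytes(out)
    return bytes(out)  # image ran out before n_bytes were recovered


def embed_lsb_rgb(rows: List[List[Pixel]], payload: bytes) -> List[List[Pixel]]:
    """Inverse helper, used only to build a test image for the demo."""
    bits = iter(
        (byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)
    )
    result = []
    for row in rows:
        # Channels keep their original low bit once the payload is exhausted.
        result.append(
            [tuple((c & ~1) | next(bits, c & 1) for c in pixel) for pixel in row]
        )
    return result
```

A round trip on a synthetic image shows the technique: `extract_lsb_rgb(embed_lsb_rgb(rows, b"flag{lsb}"), 9)` returns the payload as long as the image holds at least 72 channel values.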
Full Guide: Steganography Detection Documentation
PCAP Network Analysis
Quick triage for network capture files:
filo pcap dump.pcap
# Output:
# Statistics
# Packets: 1,234
# Protocols: TCP (800), UDP (400), ICMP (34)
#
# FLAGS FOUND (2)
# picoCTF{n3tw0rk_f0r3n51c5}
# flag{hidden_in_packets}
#
# Base64 Data
# cGljb0NURnsuLi59 → picoCTF{...}
Features:
- Protocol Detection: IPv4, IPv6, TCP, UDP, ICMP, ARP
- String Extraction: ASCII strings from packet payloads
- Base64 Decoding: Automatic detection and decoding
- Flag Hunting: CTF flag pattern search across all packets
- HTTP Extraction: GET/POST requests and headers
- Lightweight: No Wireshark/tshark dependency for quick triage
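Flag hunting with automatic base64 decoding can be approximated in a few lines. The sketch below is an assumption about how such a search might work, not Filo's implementation: the regular expressions, the 16-character minimum for base64 runs, and the name `find_flags` are all chosen for illustration.

```python
import base64
import binascii
import re

# Flag prefixes taken from the README; the length limit is an assumption.
FLAG_RE = re.compile(rb"(?:picoCTF|flag|HTB)\{[^}\r\n]{1,200}\}")
# Runs of base64 alphabet characters, optionally padded.
B64_RE = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def find_flags(payload: bytes) -> list:
    """Return flags found in plain text first, then inside base64 runs."""
    found = [m.group().decode("ascii", "replace") for m in FLAG_RE.finditer(payload)]
    for match in B64_RE.finditer(payload):
        candidate = match.group()
        if len(candidate) % 4:
            continue  # not a whole number of 4-character base64 groups
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # looked like base64 but did not decode cleanly
        found += [m.group().decode("ascii", "replace") for m in FLAG_RE.finditer(decoded)]
    return found
```

Applied to the base64 string from the sample output, `find_flags(b"cGljb0NURnsuLi59")` returns `['picoCTF{...}']`.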
New Format Support:
- PCAP/PCAPNG: Network capture files (little/big-endian)
- Shell Archives (shar): Self-extracting shell script archives
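Byte order in a classic PCAP file is determined from the first four bytes of the 24-byte global header. The following sketch shows that check using the standard libpcap magic numbers; the function name and the returned dictionary layout are hypothetical, and PCAPNG is only recognised here, not parsed.

```python
import struct

# First four bytes of a classic pcap file -> (struct byte order, timestamp resolution)
_PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", "microsecond"),
    b"\xa1\xb2\xc3\xd4": (">", "microsecond"),
    b"\x4d\x3c\xb2\xa1": ("<", "nanosecond"),
    b"\xa1\xb2\x3c\x4d": (">", "nanosecond"),
}
# Block type of the pcapng Section Header Block.
_PCAPNG_MAGIC = b"\x0a\x0d\x0d\x0a"


def parse_capture_header(data: bytes) -> dict:
    """Identify a capture file and decode the classic pcap global header."""
    if data[:4] == _PCAPNG_MAGIC:
        return {"format": "pcapng"}
    if len(data) < 24 or data[:4] not in _PCAP_MAGICS:
        raise ValueError("not a recognised capture file")
    endian, resolution = _PCAP_MAGICS[data[:4]]
    # version_major, version_minor, thiszone, sigfigs, snaplen, network
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24]
    )
    return {
        "format": "pcap",
        "byte_order": "little" if endian == "<" else "big",
        "resolution": resolution,
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,  # 1 means Ethernet
    }
```

Every multi-byte field in the packet records that follow uses the same byte order, so this single check decides how the rest of the file is read.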
v0.2.8 - CPU Architecture Detection (Latest)
Major Enhancement: CPU Architecture Detection
Filo now automatically detects and reports CPU architecture for executable files:
filo analyze astronaut
# Output:
# CPU Architecture:
# • Tensilica Xtensa Architecture (32-bit, Little-endian)
# Format: ELF | Machine Code: 0x005E
Key Features:
- 90+ architectures supported: x86, x86-64, ARM, ARM64, RISC-V, MIPS, PowerPC, Xtensa, SPARC, AVR, Alpha, IA-64, and many more
- Three executable formats: ELF (Linux/Unix), PE/COFF (Windows), Mach-O (macOS/iOS)
- Complete information: Architecture name, address width (32/64-bit), endianness, machine code
- CTF-optimized: Instantly solve architecture identification challenges
- Comprehensive testing: 24 tests covering all major architectures
Supported Architectures Include:
- Common: x86, x86-64, ARM (32/64-bit), RISC-V, MIPS, PowerPC
- Embedded: Xtensa (IoT/WiFi), AVR (Atmel), SuperH, M68k
- Specialized: SPARC, Alpha AXP, IA-64 (Itanium), S390 (mainframe)
- Exotic: VAX, PDP-10/11, TMS320C6000, Elbrus e2k, BPF
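For ELF files, the reported fields come straight from the file header: byte 4 gives the address width, byte 5 the endianness, and the 16-bit `e_machine` field at offset 18 the architecture. The sketch below reads them with the standard library; the function name and the shortened machine table are illustrative assumptions, and PE and Mach-O are not covered.

```python
import struct

# A small subset of e_machine values from the ELF specification.
ELF_MACHINES = {
    0x03: "x86",
    0x08: "MIPS",
    0x14: "PowerPC",
    0x28: "ARM",
    0x3E: "x86-64",
    0x5E: "Tensilica Xtensa",
    0xB7: "ARM64 (AArch64)",
    0xF3: "RISC-V",
}


def describe_elf(data: bytes) -> dict:
    """Report architecture, width, and endianness from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid ELF identification bytes")
    # e_machine is stored in the file's own byte order.
    endian = "<" if ei_data == 1 else ">"
    (machine,) = struct.unpack_from(endian + "H", data, 18)
    return {
        "architecture": ELF_MACHINES.get(machine, "unknown"),
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "machine_code": f"0x{machine:04X}",
    }
```

Only the first 20 bytes are needed, so `describe_elf(open(path, "rb").read(20))` is enough to classify a binary without loading it fully.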
Documentation: See docs/ARCHITECTURE_DETECTION.md for complete guide
Test Coverage: 24 new tests (100% passing)
CTF Ready: Solves architecture challenges in one command
v0.2.7 - zsteg-Compatible Steganography
Major Enhancement: zsteg Algorithm Compatibility
Filo's steganography detection now matches the industry-standard zsteg tool exactly:
Key Features:
- 60+ bit plane configurations tested per image
- Byte-for-byte identical extraction compared to zsteg
- Multi-bit extraction (b1, b2, b4) with correct nibble/byte packing
- Auto base64 decoding - shows decoded flags directly (improvement over zsteg!)
- File type detection - OpenPGP keys, Targa, Applesoft BASIC, Alliant
- Smart result filtering - hides metadata noise by default
- zsteg-style output - familiar format for CTF players
Also in v0.2.7:
- Reduced embedded object false positives (confidence threshold 0.70 → 0.80)
- Added format exclusion rules (skip WASM/ICO patterns in ELF/PE binaries)
- Parent format awareness in embedded detection
Testing:
- Validated on CTF challenge images (picoCTF)
- Algorithm verification against zsteg reference output
- Multi-bit extraction tested (b2, b4 bit planes)
Test Coverage: 85%+ (all tests passing)
Full Details: RELEASE_v0.2.7.md
v0.2.6 - Steganography & PCAP Analysis
New Features:
- Steganography detection (LSB/MSB analysis, PDF metadata, trailing data)
- PCAP network capture analysis with flag hunting
- Enhanced output filtering
v0.2.5 - Polyglot & Dual-Format Detection
Filo can now detect files that are simultaneously valid in multiple formats:
filo analyze suspicious_image.gif
# Output:
# Polyglot Detected:
# • GIF + JAR - GIF + JAR hybrid (GIFAR attack) (91%)
# Risk: HIGH | Pattern: gifar
Supported Polyglot Patterns:
- GIFAR (GIF+JAR) - HIGH RISK: Classic attack vector for bypassing image filters
- PDF + JavaScript - HIGH RISK: Malicious PDFs with embedded JS payloads
- PE + ZIP - HIGH RISK: Windows executables that are also ZIP archives
- PNG + ZIP - MEDIUM RISK: Images with hidden ZIP archives
- JPEG + ZIP - MEDIUM RISK: JPEG files with embedded archives
Key Features:
- Multi-format validation (PNG, GIF, JPEG, ZIP, JAR, RAR, PDF, PE, ELF)
- Security risk assessment (HIGH, MEDIUM, LOW)
- Confidence scoring (70-98%)
- JavaScript payload detection in PDFs
- Demo polyglot files for testing
- Comprehensive test suite (26 new tests)
Documentation: See docs/POLYGLOT_DETECTION.md for complete guide
Test Coverage: 67% overall (173/173 tests passing, +26 polyglot tests)
Supported Formats: 60+ file formats
Detection Accuracy: 95%+ on clean files, 70%+ on corrupted files
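Image-plus-archive polyglots such as GIFAR work because image parsers read from the start of the file while ZIP readers locate the central directory from the end. A rough sketch of that check follows. It is a simplification for illustration, not Filo's detector: the function name is hypothetical, the image side is judged by magic bytes alone rather than full validation, and no risk level or confidence score is computed.

```python
import io
import zipfile

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
GIF_MAGICS = (b"GIF87a", b"GIF89a")
JPEG_MAGIC = b"\xff\xd8\xff"


def detect_zip_polyglot(data: bytes):
    """Return e.g. 'PNG + ZIP' if an image also opens as a ZIP, else None."""
    if data.startswith(PNG_MAGIC):
        carrier = "PNG"
    elif data.startswith(GIF_MAGICS):
        carrier = "GIF"
    elif data.startswith(JPEG_MAGIC):
        carrier = "JPEG"
    else:
        return None

    # is_zipfile searches for the end-of-central-directory record near the
    # end of the data, so leading image bytes do not hide the archive.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return None

    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        is_jar = "META-INF/MANIFEST.MF" in zf.namelist()
    return f"{carrier} + {'JAR' if is_jar else 'ZIP'}"
```

A file built by appending a JAR to a GIF header yields `'GIF + JAR'`, the pattern shown in the sample output above.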
v0.2.4 - Embedded Detection & Tool Fingerprinting (Previous)
Enhancements:
- Embedded Object Detection - Find files hidden inside files (ZIP in EXE, PNG after EOF, polyglots)
- Tool Fingerprinting - Identify creation tools, versions, OS, timestamps (forensic attribution)
- Short Flags - -a for all evidence, -e for all embedded artifacts
- Reset Commands - filo reset-ml and filo reset-lineage for easy maintenance
- Demo Files - Sophisticated test files in demo/ directory
- Hash Lineage Tracking - Cryptographic chain-of-custody for all transformations
- Format Contradiction Detection - Identifies malware, polyglots, embedded executables
- Confidence Decomposition - Auditable detection with --explain flag
- ZIP Container Analysis - Accurate DOCX/XLSX/PPTX/ODT/ODP/ODS detection
- Enhanced ML Learning - Pattern extraction, rich features, n-gram profiling
147/147 tests passing
We welcome contributions! Priority areas:
- Format specifications (YAML)
- Analysis plugins
- Test corpus samples
- Performance optimizations
Filo is designed with security in mind:
- Non-destructive analysis (unless explicitly requested with repair commands)
- Resource-limited processing
- Input-validated at all layers
- No external network calls (fully offline ML)
Supun Hewagamage (@supunhg)
When you need to know not just what something is, but why it's that, and how to fix it.