file-scanner

A comprehensive npm package for scanning files and detecting potential security threats, malware, and malicious content across multiple file types.

Features

Core High-Risk File Type Support

PDF Documents

  • ✅ Magic bytes verification (%PDF- header)
  • ✅ EOF marker validation (%%EOF)
  • ✅ Cross-reference table integrity
  • ✅ Embedded JavaScript detection
  • ✅ Malicious form actions detection
  • ✅ Launch actions that execute system commands
  • ✅ Object stream manipulation detection
  • ✅ Digital signature validation readiness
  • ✅ Version consistency checks
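
As an illustration of the first two checks, here is a minimal sketch of a header and EOF-marker test. It is not the package's implementation: the names `matchesAt` and `checkPdfEnvelope` and the 1024-byte tail window are assumptions made for this example.

```typescript
// Hypothetical helper types and names; not part of @flutterde/file-scanner.
interface PdfEnvelopeCheck {
  hasHeader: boolean;    // buffer starts with "%PDF-"
  hasEofMarker: boolean; // "%%EOF" appears within the tail window
}

// True when the ASCII string `ascii` occurs in `bytes` exactly at `offset`.
function matchesAt(bytes: Uint8Array, ascii: string, offset: number): boolean {
  if (offset < 0 || offset + ascii.length > bytes.length) return false;
  for (let j = 0; j < ascii.length; j++) {
    if (bytes[offset + j] !== ascii.charCodeAt(j)) return false;
  }
  return true;
}

// Check the PDF "envelope": header at offset 0, EOF marker near the end.
// tailWindow (assumed default: 1024 bytes) bounds how far back we search.
function checkPdfEnvelope(bytes: Uint8Array, tailWindow = 1024): PdfEnvelopeCheck {
  const hasHeader = matchesAt(bytes, "%PDF-", 0);
  let hasEofMarker = false;
  const start = Math.max(0, bytes.length - tailWindow);
  for (let i = bytes.length - 5; i >= start && !hasEofMarker; i--) {
    hasEofMarker = matchesAt(bytes, "%%EOF", i);
  }
  return { hasHeader, hasEofMarker };
}
```

A file that passes both tests is only structurally plausible; the remaining checks in the list (JavaScript, launch actions, and so on) need real parsing of the PDF objects.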

Image Files

PNG:

  • ✅ PNG signature verification (89 50 4E 47 0D 0A 1A 0A)
  • ✅ IHDR chunk validation
  • ✅ CRC checksum verification
  • ✅ Dimension sanity limits (memory exhaustion protection)
  • ✅ Chunk ordering enforcement
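
To make the signature, IHDR, and dimension checks concrete, the sketch below reads the fixed-layout start of a PNG file (8-byte signature, then a 13-byte IHDR chunk whose first two fields are width and height). The function name `checkPngHeader` and the `maxPixels` limit are assumptions for illustration, not the package's actual API or thresholds.

```typescript
// Hypothetical sketch; the package's own validator may differ.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface PngHeaderCheck {
  signatureValid: boolean;
  ihdrFirst: boolean;      // first chunk is a 13-byte IHDR
  width: number;
  height: number;
  dimensionsSane: boolean; // non-zero and within maxPixels
}

function checkPngHeader(bytes: Uint8Array, maxPixels = 100 * 1000 * 1000): PngHeaderCheck {
  const result: PngHeaderCheck = {
    signatureValid: false, ihdrFirst: false, width: 0, height: 0, dimensionsSane: false,
  };
  if (bytes.length < 24) return result; // too short to hold signature + IHDR size fields
  result.signatureValid = PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  // DataView reads big-endian by default, which matches the PNG format.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Chunk layout: length (4 bytes), type (4 bytes), data. 0x49484452 is "IHDR".
  result.ihdrFirst = view.getUint32(8) === 13 && view.getUint32(12) === 0x49484452;
  if (!result.signatureValid || !result.ihdrFirst) return result;
  result.width = view.getUint32(16);
  result.height = view.getUint32(20);
  result.dimensionsSane =
    result.width > 0 && result.height > 0 && result.width * result.height <= maxPixels;
  return result;
}
```

Rejecting oversized dimensions before decoding is what protects against memory exhaustion: the declared size is checked without allocating a pixel buffer.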

JPEG:

  • ✅ SOI/EOI marker validation (FF D8 / FF D9)
  • ✅ EXIF metadata injection detection
  • ✅ Segment length validation
  • ✅ Buffer overflow protection

WebP:

  • ✅ RIFF container validation
  • ✅ Chunk size verification
  • ✅ VP8/VP8L format validation

GIF:

  • ✅ Signature validation (GIF87a/GIF89a)
  • ✅ Logical screen descriptor checks
  • ✅ Frame count limits (DoS protection)

SVG:

  • ✅ XML well-formedness
  • ✅ Script tag detection (XSS protection)
  • ✅ Event handler detection
  • ✅ External entity (XXE) detection
  • ✅ Foreign object inspection
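
The SVG checks can be approximated with pattern matching over the markup. The following is a rough sketch under that assumption; the rule names and regular expressions are invented for this example, and regex heuristics like these can be evaded, so a real validator would likely combine them with XML parsing.

```typescript
// Hypothetical rule set; not the package's actual patterns.
interface SvgFinding {
  rule: string;  // which heuristic fired
  index: number; // character offset of the first match
}

const SVG_RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "script-tag", pattern: /<script[\s>]/i },
  { rule: "event-handler", pattern: /\son[a-z]+\s*=/i },
  { rule: "external-entity", pattern: /<!ENTITY\s+[^>]*\b(SYSTEM|PUBLIC)\b/i },
  { rule: "foreign-object", pattern: /<foreignObject[\s>]/i },
];

// Report the first match of each rule in the SVG source text.
function findSvgActiveContent(svg: string): SvgFinding[] {
  const findings: SvgFinding[] = [];
  for (const { rule, pattern } of SVG_RULES) {
    const match = pattern.exec(svg); // patterns have no /g flag, so exec is stateless
    if (match) findings.push({ rule, index: match.index });
  }
  return findings;
}
```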

Archive Files

ZIP:

  • ✅ Magic bytes verification
  • ✅ Zip slip vulnerability detection (path traversal)
  • ✅ Compression bomb detection
  • ✅ Entry count limits
  • ✅ Nested archive detection
  • ✅ Null byte in filename detection
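
Zip slip and null-byte detection both come down to inspecting entry names before anything is written to disk. A minimal sketch of such a check follows; `isUnsafeEntryName` is a hypothetical helper, not a function exported by the package.

```typescript
// Hypothetical helper: flags archive entry names that could escape the
// extraction directory or that hide a null byte.
function isUnsafeEntryName(entryName: string): boolean {
  if (entryName.indexOf("\0") !== -1) return true;  // null byte in filename
  const normalized = entryName.replace(/\\/g, "/"); // treat backslashes as separators
  if (normalized.charAt(0) === "/") return true;    // absolute path
  if (/^[a-zA-Z]:/.test(normalized)) return true;   // Windows drive letter
  // Track directory depth; going below zero means the path leaves the root.
  let depth = 0;
  const parts = normalized.split("/");
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
    if (part === "" || part === ".") continue;
    if (part === "..") {
      depth--;
      if (depth < 0) return true;
    } else {
      depth++;
    }
  }
  return false;
}
```

Counting depth rather than searching for the substring `..` avoids flagging harmless names such as `notes..txt` while still catching `a/../../b`.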

RAR, GZIP, 7Z, TAR:

  • ✅ Format-specific signature validation
  • ✅ Header integrity checks
  • ✅ Path traversal protection

Executable Files

Windows PE (EXE/DLL):

  • ✅ MZ signature validation
  • ✅ PE header verification
  • ✅ Entropy analysis (packed/encrypted detection)
  • ✅ Digital signature checking

Linux ELF:

  • ✅ ELF magic number validation
  • ✅ Class and architecture validation
  • ✅ Entropy analysis
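
Entropy analysis usually means computing the Shannon entropy of the byte distribution: packed or encrypted data approaches 8 bits per byte, while plain code and text score lower. The sketch below shows the standard formula; the function name `shannonEntropy` is an assumption, and the default sample size follows the "first 100KB" figure given under Performance Considerations.

```typescript
// Hypothetical sketch of byte-level Shannon entropy (result in bits per byte, 0 to 8).
function shannonEntropy(bytes: Uint8Array, sampleSize = 100 * 1024): number {
  const n = Math.min(bytes.length, sampleSize); // only the first sampleSize bytes
  if (n === 0) return 0;
  const counts = new Uint32Array(256);          // zero-initialized histogram
  for (let i = 0; i < n; i++) counts[bytes[i]]++;
  let entropy = 0;
  for (let v = 0; v < 256; v++) {
    if (counts[v] === 0) continue;
    const p = counts[v] / n;                    // probability of byte value v
    entropy -= p * (Math.log(p) / Math.LN2);    // -sum p * log2(p)
  }
  return entropy;
}
```

What counts as "suspicious" is a policy choice; any cut-off applied to this value would be a tunable threshold rather than a fixed rule.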

Script Files

JavaScript, Python, Bash, PowerShell, VBScript:

  • ✅ Dangerous function detection (eval, exec, etc.)
  • ✅ Command injection pattern matching
  • ✅ Base64/hex encoding detection
  • ✅ Obfuscation indicators
  • ✅ URL extraction
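
A line-by-line pattern scan is one plausible way to implement checks like these. The rules below are a small, invented subset chosen for illustration; they are not the package's pattern list, and plain regex matching will produce false positives on legitimate code.

```typescript
// Hypothetical rule set for illustration only.
interface ScriptFinding {
  rule: string;
  line: number; // 1-based line number of the match
}

const SCRIPT_RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "dynamic-eval", pattern: /\beval\s*\(/ },
  { rule: "process-exec", pattern: /\b(exec|execSync|spawn|system)\s*\(/ },
  { rule: "long-base64-literal", pattern: /["'][A-Za-z0-9+\/]{80,}={0,2}["']/ },
];

// Test every rule against every line and record where each one fires.
function scanScriptSource(source: string): ScriptFinding[] {
  const findings: ScriptFinding[] = [];
  const lines = source.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    for (const { rule, pattern } of SCRIPT_RULES) {
      if (pattern.test(lines[i])) findings.push({ rule, line: i + 1 });
    }
  }
  return findings;
}
```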

Configuration Files

JSON, YAML, XML:

  • ✅ Syntax validation
  • ✅ Prototype pollution detection (JSON)
  • ✅ YAML tag injection detection
  • ✅ XXE attack prevention (XML)
  • ✅ Billion laughs attack detection
  • ✅ Nesting depth limits
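
For JSON, prototype pollution detection and nesting depth measurement can share a single walk over the parsed value. The sketch below assumes that approach; `inspectJson` and the list of flagged keys are illustrative choices, and flagging `constructor` in particular may be too strict for some data.

```typescript
// Hypothetical sketch; not the package's implementation.
interface JsonInspection {
  maxDepth: number;       // deepest level of nested objects/arrays (root = 1)
  pollutedKeys: string[]; // dotted paths of suspicious object keys
}

const POLLUTION_KEYS = ["__proto__", "constructor", "prototype"];

function inspectJson(text: string): JsonInspection {
  const result: JsonInspection = { maxDepth: 0, pollutedKeys: [] };
  // Recursive walk; very deep input could exhaust the call stack, so a
  // production version would likely cap depth or iterate with an explicit stack.
  const walk = (value: unknown, depth: number, path: string): void => {
    if (value === null || typeof value !== "object") return;
    if (depth > result.maxDepth) result.maxDepth = depth;
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      const childPath = path ? path + "." + key : key;
      if (!Array.isArray(value) && POLLUTION_KEYS.indexOf(key) !== -1) {
        result.pollutedKeys.push(childPath);
      }
      walk(obj[key], depth + 1, childPath);
    }
  };
  // JSON.parse stores "__proto__" as an ordinary own property, so it shows up in Object.keys.
  walk(JSON.parse(text), 1, "");
  return result;
}
```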

Cross-Cutting Security Features

  • ✅ File size validation
  • ✅ MIME type consistency checks
  • ✅ Magic byte verification
  • ✅ Entropy analysis
  • ✅ Character encoding validation
  • ✅ Memory allocation boundaries
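
Magic byte verification and MIME consistency checks fit together naturally: detect the type from the leading bytes, then compare it with what the filename claims. The sketch below uses a small, assumed signature table and hypothetical function names; the package's own table is presumably much larger.

```typescript
// Illustrative subset of well-known file signatures (not the package's list).
interface Signature {
  ext: string;
  mime: string;
  magic: number[];
}

const SIGNATURES: Signature[] = [
  { ext: "png", mime: "image/png", magic: [0x89, 0x50, 0x4e, 0x47] },
  { ext: "jpg", mime: "image/jpeg", magic: [0xff, 0xd8, 0xff] },
  { ext: "gif", mime: "image/gif", magic: [0x47, 0x49, 0x46, 0x38] },
  { ext: "pdf", mime: "application/pdf", magic: [0x25, 0x50, 0x44, 0x46, 0x2d] },
  { ext: "zip", mime: "application/zip", magic: [0x50, 0x4b, 0x03, 0x04] },
];

// Return the first signature whose magic bytes match the start of the buffer.
function detectBySignature(bytes: Uint8Array): Signature | undefined {
  for (const sig of SIGNATURES) {
    if (bytes.length >= sig.magic.length && sig.magic.every((b, i) => bytes[i] === b)) {
      return sig;
    }
  }
  return undefined;
}

// True when the declared extension disagrees with the detected content type.
function isExtensionMismatch(fileName: string, bytes: Uint8Array): boolean {
  const detected = detectBySignature(bytes);
  if (!detected) return false;  // unknown content: nothing to compare against
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return false; // no extension declared
  let ext = fileName.slice(dot + 1).toLowerCase();
  if (ext === "jpeg") ext = "jpg"; // treat the common alias as equivalent
  return ext !== detected.ext;
}
```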

Installation

npm install @flutterde/file-scanner

Environment Variables

PRODUCTION

Control logging behavior:

  • PRODUCTION=true: Disables all debug and info logs (for production environments)
  • PRODUCTION=false or unset: Enables detailed logs with emojis (for development)

# Development mode (with logs)
node your-app.js

# Production mode (no logs)
PRODUCTION=true node your-app.js

Log Examples (Development Mode):

🔎 Starting scan: image.jpg (245.67 KB)
🔍 Detecting file type from magic bytes...
🖼️ File type: jpg (image/jpeg) (from magic bytes)
ℹ️ Running deep scan on JPG file...
✨ File is clean: image.jpg

Note: Error logs are always shown, even in production mode.
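
The behavior described above (debug and info logs suppressed in production, errors always shown) can be sketched as a small logger factory. This is an illustration of the rule, not the package's internal logger; `createLogger`, `LogLevel`, and the injected `sink` callback are hypothetical names.

```typescript
// Hypothetical sketch of PRODUCTION-gated logging.
type LogLevel = "debug" | "info" | "error";
type Env = { [key: string]: string | undefined };

// `env` is passed in (e.g. process.env) so the behavior is easy to test;
// `sink` receives each formatted line (e.g. console.log).
function createLogger(env: Env, sink: (line: string) => void) {
  const production = env["PRODUCTION"] === "true";
  return (level: LogLevel, message: string): void => {
    if (production && level !== "error") return; // errors are never suppressed
    sink("[" + level + "] " + message);
  };
}
```

In a Node.js application this could be wired up as `createLogger(process.env, console.log)`.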

Usage

Basic Example

import { scanFile } from '@flutterde/file-scanner';
import { readFileSync } from 'fs';

// Read file as buffer
const buffer = readFileSync('/path/to/suspicious-file.pdf');
const result = await scanFile(buffer, { fileName: 'suspicious-file.pdf' });

console.log('Is Clean:', result.isClean);
console.log('Threats Found:', result.threats.length);
console.log('File Type:', result.fileType?.mime);

if (!result.isClean) {
  result.threats.forEach(threat => {
    console.log(`[${threat.severity}] ${threat.type}: ${threat.description}`);
  });
}

Base64 File Upload (No Extension)

The scanner detects file types from magic bytes, so no filename is required. This suits base64 uploads from browsers, where the file extension may not be available:

import { scanFile } from '@flutterde/file-scanner';

// Simulate base64 upload from browser (no file extension)
const base64String = 'iVBORw0KGgoAAAANSUhEUgAAAAUA...';
const buffer = Buffer.from(base64String, 'base64');

// No fileName needed - type detected from buffer content
const result = await scanFile(buffer, { deepScan: true });

console.log('Detected Type:', result.fileType?.ext); // 'png'
console.log('MIME Type:', result.fileType?.mime);     // 'image/png'
console.log('Is Clean:', result.isClean);

Advanced Usage with Options

import { scanFile, ScanOptions } from '@flutterde/file-scanner';
import { readFileSync } from 'fs';

const buffer = readFileSync('/path/to/file.zip');

const options: ScanOptions = {
  fileName: 'archive.zip',
  maxFileSize: 100 * 1024 * 1024, // 100MB
  maxNestingDepth: 5,
  maxCompressionRatio: 50,
  deepScan: true,
};

const result = await scanFile(buffer, options);

// Access detailed metadata
console.log('Signature Valid:', result.metadata?.signatureValid);
console.log('Checks Performed:', result.metadata?.checksPerformed);
console.log('Warnings:', result.metadata?.warnings);
console.log('Compression Ratio:', result.metadata?.compressionRatio);

Handling Results

import { scanFile, ThreatType, FileCategory } from '@flutterde/file-scanner';
import { readFileSync } from 'fs';

const buffer = readFileSync('/path/to/document.pdf');
const result = await scanFile(buffer, { fileName: 'document.pdf' });

// Check for specific threat types
const hasJavaScript = result.threats.some(
  t => t.type === ThreatType.EMBEDDED_JAVASCRIPT
);

// Filter by severity
const criticalThreats = result.threats.filter(
  t => t.severity === 'critical'
);

// Check file category
if (result.fileType?.category === FileCategory.EXECUTABLE) {
  console.warn('Executable file detected!');
}

API Reference

scanFile(buffer: Buffer, options?: ScanOptions): Promise<ScanResult>

Scans a file buffer for security threats.

Parameters:

  • buffer (Buffer): File content as a Buffer
  • options (ScanOptions, optional): Scanning configuration
    • fileName (string, optional): Original filename for reference
    • maxFileSize (number): Maximum allowed size in bytes
    • deepScan (boolean): Enable thorough scanning
    • maxCompressionRatio (number): For archives
    • Other options...

Returns: Promise<ScanResult>

ScanResult Interface

interface ScanResult {
  fileName: string;          // File name or 'buffer'
  isClean: boolean;          // Overall safety status
  threats: Threat[];         // Detected threats
  scannedAt: Date;           // Scan timestamp
  fileType?: FileTypeInfo;   // Detected file type
  fileSize: number;          // File size in bytes
  metadata?: FileMetadata;   // Additional information
}

Threat Interface

interface Threat {
  type: ThreatType;                    // Threat category
  severity: 'low' | 'medium' | 'high' | 'critical';
  description: string;                 // Human-readable description
  location?: string;                   // Location in file
  details?: Record<string, unknown>;   // Additional context
}

ScanOptions Interface

interface ScanOptions {
  fileName?: string;            // Original filename (optional)
  maxFileSize?: number;         // Max bytes to scan (default: 500MB)
  maxNestingDepth?: number;     // Max archive nesting (default: unlimited)
  maxCompressionRatio?: number; // Max compression ratio (default: unlimited)
  deepScan?: boolean;           // Enable deep scanning (default: true)
  customPatterns?: RegExp[];    // Custom threat patterns
  skipChecks?: string[];        // Skip specific checks
}
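
To show how the documented defaults might be applied, here is a sketch that fills in missing options. The interface is a local copy of part of ScanOptions, and `resolveOptions` is a hypothetical helper; the 500MB figure is taken from the comment above and assumed to mean 500 * 1024 * 1024 bytes, matching the convention used in the Advanced Usage example.

```typescript
// Local stand-in for a subset of ScanOptions (illustrative only).
interface ScanOptionsLike {
  fileName?: string;
  maxFileSize?: number;
  maxNestingDepth?: number;
  maxCompressionRatio?: number;
  deepScan?: boolean;
}

interface ResolvedOptions {
  fileName: string;
  maxFileSize: number;
  maxNestingDepth: number;
  maxCompressionRatio: number;
  deepScan: boolean;
}

// Apply the defaults listed in the interface comments to any missing fields.
function resolveOptions(options: ScanOptionsLike = {}): ResolvedOptions {
  return {
    fileName: options.fileName ?? "buffer",                 // ScanResult uses 'buffer' when unnamed
    maxFileSize: options.maxFileSize ?? 500 * 1024 * 1024,  // 500MB
    maxNestingDepth: options.maxNestingDepth ?? Infinity,   // unlimited
    maxCompressionRatio: options.maxCompressionRatio ?? Infinity,
    deepScan: options.deepScan ?? true,
  };
}
```

Using `??` rather than `||` matters here: an explicit `deepScan: false` or a limit of `0` is kept instead of being replaced by the default.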

Threat Types

The package detects the following threat categories:

PDF Threats

  • EMBEDDED_JAVASCRIPT - JavaScript in PDFs
  • MALICIOUS_FORM_ACTION - Suspicious form actions
  • LAUNCH_ACTION - System command execution
  • OBJECT_STREAM_MANIPULATION - PDF structure manipulation

Image Threats

  • CHUNK_MANIPULATION - PNG chunk tampering
  • BUFFER_OVERFLOW - Malformed dimensions
  • POLYGLOT_FILE - Dual-format files
  • METADATA_INJECTION - EXIF payload injection

Archive Threats

  • ZIP_SLIP - Path traversal vulnerability
  • COMPRESSION_BOMB - Decompression DoS
  • NESTED_ARCHIVE - Excessive nesting

Executable Threats

  • CODE_EXECUTION - Executable detection
  • SUSPICIOUS_ENTROPY - Packed/encrypted code
  • INVALID_SIGNATURE - Missing/invalid signatures

Script Threats

  • DANGEROUS_FUNCTION - eval, exec, etc.
  • CODE_INJECTION - Injection patterns
  • DESERIALIZATION_ATTACK - Unsafe deserialization

Web Content Threats

  • XSS_PAYLOAD - Cross-site scripting
  • IFRAME_INJECTION - Hidden iframes
  • FORM_HIJACKING - Form manipulation

Universal Threats

  • INVALID_MAGIC_BYTES - Wrong file signature
  • MIME_MISMATCH - Type/extension mismatch
  • EXCESSIVE_SIZE - Unreasonably large files
  • MALFORMED_STRUCTURE - Corrupted structure

File Categories

Files are categorized into:

  • PDF - PDF documents
  • IMAGE - Image files (PNG, JPEG, GIF, SVG, WebP)
  • DOCUMENT - Office documents (DOCX, XLSX, etc.)
  • ARCHIVE - Compressed archives (ZIP, RAR, TAR, etc.)
  • EXECUTABLE - Binary executables (EXE, DLL, ELF, etc.)
  • SCRIPT - Script files (JS, Python, Bash, etc.)
  • WEB_CONTENT - Web files (HTML, CSS, WASM, etc.)
  • DATABASE - Database files (SQL, CSV, etc.)
  • MEDIA - Media files (MP4, MP3, etc.)
  • UNKNOWN - Unrecognized types

Security Best Practices

  1. Always scan files before processing - Especially user uploads
  2. Set appropriate size limits - Prevent resource exhaustion
  3. Use deep scanning in production - More thorough analysis
  4. Monitor critical threats - Act on critical severity findings
  5. Validate MIME types - Don't trust file extensions
  6. Log scan results - Maintain audit trail
  7. Quarantine suspicious files - Isolate threats immediately
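
Practices 4 and 7 can be combined into a simple upload policy driven by the scan result. The sketch below is one possible policy, not a recommendation from the package: the types are local stand-ins for ScanResult and Threat, and the choice to reject on high or critical severity and quarantine everything else is an assumption.

```typescript
// Local stand-ins for the documented Threat and ScanResult shapes.
type Severity = "low" | "medium" | "high" | "critical";

interface ThreatLike {
  type: string;
  severity: Severity;
  description: string;
}

interface ScanResultLike {
  isClean: boolean;
  threats: ThreatLike[];
}

type UploadDecision = "accept" | "quarantine" | "reject";

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Hypothetical policy: clean files pass, high/critical findings are rejected,
// anything else is isolated for review.
function decideUpload(result: ScanResultLike): UploadDecision {
  if (result.isClean && result.threats.length === 0) return "accept";
  let worst = 0;
  for (const threat of result.threats) {
    worst = Math.max(worst, SEVERITY_RANK[threat.severity]);
  }
  return worst >= SEVERITY_RANK.high ? "reject" : "quarantine";
}
```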

Performance Considerations

  • Files are read into memory for analysis
  • Large files (>100MB) may require increased memory
  • Deep scanning adds ~20-30% overhead
  • Archive scanning doesn't extract contents (metadata only)
  • Entropy calculation samples first 100KB

Limitations

  • Document macros: Limited detection for Office macros (basic patterns only)
  • Encrypted archives: Cannot inspect encrypted content
  • Polymorphic malware: Signature-based detection only
  • Zero-day exploits: No behavioral analysis
  • Binary analysis: Basic PE/ELF validation, not full disassembly

Contributing

Contributions are welcome! Please ensure:

  • All tests pass
  • New validators include comprehensive checks
  • Documentation is updated
  • Code follows TypeScript best practices

License

ISC

Author

otman

Repository

https://github.com/flutterde/file-scanner

Support

For issues and feature requests, please use the GitHub issue tracker.

scanFile parameters:

  • buffer - File content as a Buffer
  • options - Optional scanning configuration (see ScanOptions)

Returns: A promise that resolves to a ScanResult object containing, among other fields:

  • fileName - File name, or 'buffer' when none is given
  • isClean - Whether the file is clean
  • threats - Array of detected threats
  • scannedAt - Timestamp of when the scan was performed

Development

# Install dependencies
npm install

# Build the project
npm run build

# Watch mode for development
npm run dev

# Clean build artifacts
npm run clean
