Runtime behavioral analysis for Python packages, npm modules, and DMG and EXE files, catching supply chain attacks that install-time scanners miss.
TraceTree executes suspicious packages inside an isolated Docker sandbox. Right after the initial download starts, it drops the container's network interface. This safely triggers and logs malicious outbound connection attempts without actually letting traffic escape.
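The sandbox flow described above can be sketched as a sequence of Docker CLI calls. This is a minimal illustration, not TraceTree's actual implementation: the function names, the image argument (assumed to be an image with `pip` and `strace` installed), the container paths, and the simplification of cutting the network only after the download has finished are all assumptions.

```python
import subprocess
from typing import List


def build_sandbox_commands(image: str, name: str, package: str) -> List[List[str]]:
    """Return the docker CLI calls for one sandboxed run (hypothetical layout)."""
    return [
        # Start an idle container; SYS_PTRACE lets strace attach inside it.
        ["docker", "run", "-d", "--cap-add", "SYS_PTRACE",
         "--name", name, image, "sleep", "infinity"],
        # Fetch the package while the network is still attached.
        ["docker", "exec", name, "pip", "download", package, "-d", "/tmp/pkgs"],
        # Cut the container off from the default bridge network.
        ["docker", "network", "disconnect", "bridge", name],
        # Install offline under strace; -f follows child processes,
        # -o writes the syscall log to a file inside the container.
        ["docker", "exec", name, "strace", "-f", "-o", "/tmp/trace.log",
         "pip", "install", "--no-index", "--find-links", "/tmp/pkgs", package],
        # Copy the log out before the container is destroyed.
        ["docker", "cp", f"{name}:/tmp/trace.log", "trace.log"],
        # Always-last cleanup step.
        ["docker", "rm", "-f", name],
    ]


def run_sandbox(image: str, name: str, package: str) -> None:
    """Execute the steps in order, removing the container even on failure."""
    *steps, cleanup = build_sandbox_commands(image, name, package)
    try:
        for cmd in steps:
            subprocess.run(cmd, check=True)
    finally:
        subprocess.run(cleanup, check=False)
```

With the network detached before the traced install, any outbound connection the package attempts fails inside the container but still appears in the strace log.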
A regex engine parses the strace output, tracks system calls (like clone, execve, socket, and openat), and builds a directed graph using NetworkX. Finally, a RandomForestClassifier trained on known malware evaluates the graph's topology to detect anomalous behavior.
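As a rough illustration of the parsing step, the sketch below turns `strace -f -o` style lines into labeled edges. The regex, the node naming scheme (`proc:`, `exec:`, `file:`, `socket:`), and the function names are assumptions made for this example; TraceTree's own parser may differ. Only the standard library is used here, so a plain edge list stands in for the NetworkX graph.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as: 101 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
SYSCALL_RE = re.compile(
    r'^(?P<pid>\d+)\s+(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)'
)
QUOTED_RE = re.compile(r'"([^"]*)"')
TRACKED = {"clone", "execve", "socket", "openat"}

Edge = Tuple[str, str, str]  # (source node, target node, syscall name)


def parse_line(line: str) -> Optional[Edge]:
    """Convert one strace line into an edge, or None if it is not tracked."""
    m = SYSCALL_RE.match(line)
    if m is None or m["name"] not in TRACKED:
        return None
    src = f"proc:{m['pid']}"
    name, args, ret = m["name"], m["args"], int(m["ret"])
    if name == "clone":
        # A positive return value is the child's PID.
        return (src, f"proc:{ret}", name) if ret > 0 else None
    if name == "socket":
        # The first argument is the address family, e.g. AF_INET.
        return (src, f"socket:{args.split(',')[0].strip()}", name)
    # execve and openat: the first quoted string is the path.
    # Failed calls are kept too, since an attempt is itself a signal.
    quoted = QUOTED_RE.search(args)
    if quoted is None:
        return None
    kind = "exec" if name == "execve" else "file"
    return (src, f"{kind}:{quoted.group(1)}", name)


def parse_trace(text: str) -> List[Edge]:
    """Parse a whole log, skipping exit notices and split (unfinished) calls."""
    return [e for e in map(parse_line, text.splitlines()) if e is not None]


SAMPLE = '''\
100 execve("/usr/bin/python3", ["python3", "setup.py"], 0x7ffc0 /* 12 vars */) = 0
100 clone(child_stack=NULL, flags=CLONE_CHILD_SETTID|SIGCHLD) = 101
101 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
101 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
101 write(1, "hi", 2) = 2
'''

edges = parse_trace(SAMPLE)
```

Each tuple could then be loaded into a NetworkX `DiGraph` with `G.add_edge(src, dst, syscall=name)`.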
You need Python 3.9+ and Docker running on your machine.
git clone https://github.com/tejasprasad2008-afk/TraceTree.git
cd TraceTree
# Install the CLI tool
pip install .

The pipeline is controlled via a Typer CLI.
# Analyze a PyPI package
cascade-analyze requests
# Evaluate standard dependency files
cascade-analyze requirements.txt
cascade-analyze package.json
# Analyze compiled installers
cascade-analyze malicious_app.dmg
cascade-analyze payload.exe

TraceTree uses a supervised RandomForestClassifier to map features of the execution graph to an anomaly score. On the first run, cascade-analyze automatically downloads the latest trained model from a public Google Cloud Storage bucket.
To refresh the downloaded model, or to train a new one locally using the datasets in data/:
# Force download the latest model from GCS
cascade-update
# Run the 60-package dataset through the sandbox to train a new model
cascade-train

- Security Researchers: Hunting undocumented supply chain behavior.
- DevOps / DevSecOps: Validating the runtime safety of injected dependencies.
- Software Engineers: Profiling the exact syscall requirements of applications.
The pipeline is split into 5 core modules:
- /sandbox: Manages the Docker container lifecycle and actively restricts networking during testing.
- /monitor: Parses the strace log to track execution paths and network attempts.
- /graph: Uses networkx to translate parent/child process relationships into an edge graph.
- /ml: Feeds the extracted graph features into a RandomForestClassifier for anomaly detection.
- /cli: The Typer entrypoint that orchestrates the pipeline and renders the terminal UI.
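To make the hand-off from /graph to /ml more concrete, here is a hedged sketch of turning an execution graph into a fixed-length feature vector. The README does not say which features TraceTree extracts, so the feature names below are hypothetical, and a plain edge list of `(source, target, syscall)` tuples stands in for the NetworkX graph to keep the example dependency-free.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, syscall name)

FEATURE_ORDER = ["nodes", "edges", "processes", "max_fanout",
                 "max_depth", "exec_edges", "file_edges", "socket_edges"]


def graph_features(edges: Iterable[Edge]) -> Dict[str, int]:
    """Summarize graph topology as simple integer features (illustrative)."""
    children: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    by_syscall: Dict[str, int] = defaultdict(int)
    n_edges = 0
    for src, dst, syscall in edges:
        children[src].add(dst)
        nodes.update((src, dst))
        by_syscall[syscall] += 1
        n_edges += 1

    has_parent = {dst for dsts in children.values() for dst in dsts}
    roots = [n for n in nodes if n not in has_parent]

    def longest_path(node: str, seen: frozenset) -> int:
        """Longest path in edges below node; 'seen' guards against cycles."""
        seen = seen | {node}
        kids = [c for c in children.get(node, ()) if c not in seen]
        if not kids:
            return 0
        return 1 + max(longest_path(c, seen) for c in kids)

    return {
        "nodes": len(nodes),
        "edges": n_edges,
        "processes": sum(n.startswith("proc:") for n in nodes),
        "max_fanout": max((len(d) for d in children.values()), default=0),
        "max_depth": max((longest_path(r, frozenset()) for r in roots), default=0),
        "exec_edges": by_syscall["execve"],
        "file_edges": by_syscall["openat"],
        "socket_edges": by_syscall["socket"],
    }


def to_vector(features: Dict[str, int]) -> List[int]:
    """Fix the column order so every sample has the same layout."""
    return [features[key] for key in FEATURE_ORDER]


example_edges = [
    ("proc:100", "exec:/usr/bin/python3", "execve"),
    ("proc:100", "proc:101", "clone"),
    ("proc:101", "file:/etc/passwd", "openat"),
    ("proc:101", "socket:AF_INET", "socket"),
]
vector = to_vector(graph_features(example_edges))
```

A list of such vectors, paired with benign/malicious labels, is the shape of input that scikit-learn's `RandomForestClassifier.fit` expects; `predict_proba` on a new vector would then yield a score.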
In early 2024, the heavily obfuscated XZ Utils backdoor slipped past standard static scanning. Advanced supply chain malware often hides malicious operations deep within legitimate-looking test code or delayed payload fetches. By analyzing the runtime execution graph, TraceTree sidesteps code obfuscation and observes which external files, commands, and sockets a package actually tries to open.
Pull requests are welcome. Please ensure new features remain decoupled across the existing architecture.
MIT