This project implements a comprehensive static code analysis system for detecting security vulnerabilities, code quality issues, and potential bugs in Python codebases. It combines traditional rule-based analysis with machine learning models for vulnerability detection and prioritization.
DISCLAIMER: This is a defensive research and educational demonstration tool. It is not intended for production security operations and may contain inaccuracies. Use only for legitimate security research and code quality improvement purposes.
- AST-based Analysis: Deep code structure analysis using Abstract Syntax Trees
- Rule-based Detection: Comprehensive security vulnerability patterns
- ML-powered Prioritization: Machine learning models for bug triage and priority ranking
- Interactive Demo: Streamlit-based web interface for code analysis
- Explainable Results: SHAP-based explanations for vulnerability detections
- Privacy Protection: Automatic PII detection and redaction
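As a rough illustration of how AST-based, rule-based detection can work, the sketch below walks a parsed module with Python's standard `ast` module. It flags comparisons or assignments that pair a credential-like name with a string literal, such as `if password == 'admin123':`. It is not the project's `CodeScanner`. `Finding`, `find_hardcoded_secrets`, and the `SECRET_NAMES` heuristic are hypothetical names introduced for this example.

```python
import ast
from dataclasses import dataclass

# Hypothetical heuristic: identifiers that suggest the value is a credential.
SECRET_NAMES = {"password", "passwd", "pwd", "secret", "token", "api_key"}


@dataclass
class Finding:
    rule: str
    line: int
    message: str


def _is_secret_name(node: ast.AST) -> bool:
    """True if node is a variable or attribute whose name looks like a credential."""
    if isinstance(node, ast.Name):
        name = node.id
    elif isinstance(node, ast.Attribute):
        name = node.attr
    else:
        return False
    return name.lower() in SECRET_NAMES


def _is_str_literal(node: ast.AST) -> bool:
    """True if node is a string constant such as 'admin123'."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def find_hardcoded_secrets(source: str) -> list[Finding]:
    """Flag comparisons or assignments that pair a credential-like name with a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            operands = [node.left, *node.comparators]
            if any(map(_is_secret_name, operands)) and any(map(_is_str_literal, operands)):
                findings.append(Finding("hardcoded-secret", node.lineno,
                                        "credential compared to a string literal"))
        elif isinstance(node, ast.Assign):
            if _is_str_literal(node.value) and any(map(_is_secret_name, node.targets)):
                findings.append(Finding("hardcoded-secret", node.lineno,
                                        "credential assigned a string literal"))
    return findings


if __name__ == "__main__":
    sample = """
def login(username, password):
    if password == 'admin123':  # Hardcoded password
        return True
    return False
"""
    for finding in find_hardcoded_secrets(sample):
        print(finding)
```

A name-based heuristic like this is cheap but noisy. It misses secrets stored under unrelated names and can flag test fixtures, which is one reason ranking and manual review of findings remain necessary.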
Install the package in editable mode:

```bash
pip install -e .
```

Basic usage:

```python
from src.analysis.scanner import CodeScanner
from src.models.vulnerability_detector import VulnerabilityDetector

# Initialize scanner
scanner = CodeScanner()
detector = VulnerabilityDetector()

# Analyze code
code = """
def login(username, password):
    if password == 'admin123':  # Hardcoded password
        return True
    return False
"""

results = scanner.scan_code(code)
vulnerabilities = detector.predict_vulnerabilities(results)
```

Launch the interactive demo:

```bash
streamlit run demo/app.py
```

Project structure:

```text
src/
├── analysis/        # Core analysis modules
├── data/            # Data processing and feature engineering
├── models/          # ML models and vulnerability detection
├── evaluation/      # Metrics and evaluation tools
├── visualization/   # Plotting and visualization utilities
└── utils/           # Common utilities and helpers
configs/             # Configuration files
data/                # Sample datasets and generated data
scripts/             # Training and evaluation scripts
tests/               # Test suite
assets/              # Generated plots and results
demo/                # Streamlit demo application
```
Train the vulnerability detector:

```bash
python scripts/train_vulnerability_detector.py --config configs/default.yaml
```

Evaluate trained models and generate plots:

```bash
python scripts/evaluate_models.py --model-path models/vulnerability_detector.pkl --generate-plots
```

The system uses YAML configuration files for all settings:
```yaml
analysis:
  max_file_size: 1000000
  supported_extensions: [".py"]

models:
  vulnerability_detector:
    model_type: "gradient_boosting"
    features: ["ast_features", "text_features", "complexity_metrics"]

evaluation:
  test_split: 0.2
  cv_folds: 5
  metrics: ["precision@k", "recall", "f1_score"]
```

- Input: Python source code files or code snippets
- Features: AST nodes, control flow graphs, text embeddings
- Labels: Vulnerability types, severity levels, priority scores
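The sketch below shows one way AST nodes could be turned into numeric features. It computes node-type counts plus an approximation of McCabe cyclomatic complexity (1 plus the number of decision points). The function name `ast_features` and the feature keys are hypothetical, and control-flow graphs and text embeddings are not covered here.

```python
import ast
from collections import Counter

# Nodes that each add one decision point in this approximation of McCabe complexity.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def ast_features(source: str) -> dict[str, int]:
    """Return AST node-type counts plus a rough cyclomatic-complexity estimate."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    # One feature per node type, e.g. "node_If", "node_Call".
    counts = Counter(type(node).__name__ for node in nodes)
    features = {f"node_{name}": n for name, n in sorted(counts.items())}

    # Complexity = 1 + decision points; each extra operand of `and`/`or` adds one.
    decisions = 0
    for node in nodes:
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1
    features["approx_complexity"] = 1 + decisions
    features["num_lines"] = len(source.splitlines())
    return features
```

Each snippet yields a sparse dict with different keys. A vectorizer with a fixed key order, such as scikit-learn's `DictVectorizer`, could turn a list of these dicts into a feature matrix for a model.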
- Security: SQL injection, XSS, hardcoded secrets, unsafe deserialization
- Quality: Code smells, complexity issues, maintainability problems
- Bugs: Logic errors, exception handling, resource leaks
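A few of the security categories above can be approximated with simple AST rules. This sketch flags calls to `pickle.load`, `pickle.loads`, `marshal.loads`, `eval`, and `exec`. It also treats an `.execute()` call as possible SQL injection when its first argument is an f-string, a `%` or `+` expression, or a `.format()` call. The rule table and function names are hypothetical, not the project's actual rule set.

```python
import ast

# Hypothetical rule table: dotted call name -> finding category.
DANGEROUS_CALLS = {
    "pickle.load": "unsafe-deserialization",
    "pickle.loads": "unsafe-deserialization",
    "marshal.loads": "unsafe-deserialization",
    "eval": "code-injection",
    "exec": "code-injection",
}


def _call_name(func: ast.AST) -> str:
    """Render a call target such as `eval` or `pickle.loads` as a dotted string."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return f"{_call_name(func.value)}.{func.attr}"
    return ""


def _is_dynamic_string(node: ast.AST) -> bool:
    """Coarse check for query text built at runtime: f-strings, % or +, or str.format()."""
    if isinstance(node, ast.JoinedStr):
        return True
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Mod, ast.Add)):
        return True
    return (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
            and node.func.attr == "format")


def scan_security(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, category) pairs for a few illustrative security rules."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node.func)
        if name in DANGEROUS_CALLS:
            findings.append((node.lineno, DANGEROUS_CALLS[name]))
        # Possible SQL injection: the query passed to *.execute() is built dynamically.
        elif name.endswith(".execute") and node.args and _is_dynamic_string(node.args[0]):
            findings.append((node.lineno, "sql-injection"))
    return sorted(findings)
```

Call names are matched textually, so aliased imports such as `from pickle import loads` slip through. A fuller implementation would resolve imports first. XSS detection, which depends on framework-specific output sinks, is not attempted here.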
Run the test suite:
```bash
pytest tests/ -v
```

- Precision@K: Fraction of the top-K ranked predictions that are true vulnerabilities
- Vulnerability Recall: Detection rate for different vulnerability types
- False Positive Rate: Fraction of non-vulnerable code incorrectly flagged as vulnerable
- Processing Speed: Analysis time per line of code
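Precision@K, recall, and false positive rate can be computed from ranked scores and binary labels (1 = true vulnerability) roughly as follows. This is a minimal sketch with hypothetical function names. Per-type recall is the same recall computation restricted to findings of one vulnerability type.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scored items whose label is 1 (a true vulnerability)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Dividing by k (not len(top_k)) counts missing slots as misses when k > len(scores).
    return sum(labels[i] for i in top_k) / k


def recall_and_fpr(predicted: Sequence[int], labels: Sequence[int]) -> tuple[float, float]:
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.7, 0.1]  # model's vulnerability scores
    labels = [1, 0, 1, 1, 0]            # ground truth
    print(precision_at_k(scores, labels, k=3))      # top 3 are items 0, 1, 3 -> 2/3
    print(recall_and_fpr([1, 1, 0, 1, 0], labels))  # recall 2/3, FPR 1/2
```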
- Analysis is static only (no runtime behavior)
- Limited to Python codebases
- May miss context-dependent vulnerabilities
- Requires manual verification of critical findings
This tool is designed for defensive security research and educational purposes only. It should not be used for:
- Unauthorized security testing
- Malicious code analysis
- Production security operations without proper validation
Always ensure you have proper authorization before analyzing any codebase.
- Fork the repository
- Create a feature branch
- Make changes with proper tests
- Run linting and tests:

  ```bash
  pre-commit run --all-files
  ```

- Submit a pull request
MIT License. See the LICENSE file for details.

# Static-Code-Analysis-Security-Project