This repository implements a lightweight, explainable SQL Injection (SQLi) detection and prevention prototype. A Flask proxy intercepts SQL queries, normalizes them into structural fingerprints, and enforces a whitelist of approved fingerprints. Queries that do not match the whitelist are blocked and logged.
This design prioritizes interpretability, ease of deployment, and low operational overhead, which makes it suitable for research, demonstrations, and lightweight prototypes.
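The core idea of structural fingerprinting can be shown in a few lines of Python. This is a minimal, regex-based approximation of the normalization described here, not the repository's `fingerprint.py`. The keyword set, the regular expressions, and the rule that spaces out operators are all assumptions made for illustration.

```python
import re

# Illustrative keyword list (an assumption); a real normalizer would cover more of SQL.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "INSERT", "INTO", "VALUES",
    "UPDATE", "SET", "DELETE", "UNION", "ORDER", "BY", "LIMIT", "JOIN", "ON",
}


def fingerprint(query: str) -> str:
    """Reduce a SQL string to its structural shape (regex-based sketch)."""
    q = re.sub(r"'(?:[^']|'')*'", "?", query)          # quoted string literals -> ?
    q = re.sub(r"/\*.*?\*/", " ", q, flags=re.DOTALL)  # /* block */ comments
    q = re.sub(r"--[^\n]*", " ", q)                     # -- line comments
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)            # numeric literals -> ?
    q = re.sub(r"([=<>!,()*;])", r" \1 ", q)            # space out operators/punctuation
    q = re.sub(
        r"[A-Za-z_]+",                                  # uppercase keywords only
        lambda m: m.group(0).upper() if m.group(0).upper() in SQL_KEYWORDS else m.group(0),
        q,
    )
    return " ".join(q.split())                          # collapse whitespace
```

Because numeric and quoted literals both collapse to `?`, `WHERE id=1` and `where id = 42` share one fingerprint. The tautology injection `id='1' OR '1'='1'` adds `OR ? = ?`, so it produces a fingerprint that a whitelist built from normal queries will not contain.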
- Grammar-based fingerprinting: replace literals with placeholders and normalize keywords.
- Whitelist enforcement of safe query fingerprints.
- Flask proxy that intercepts queries before they reach the database.
- SQLite-backed execution for allowed queries.
- Evaluation scripts and automated test harness.
- Human-readable whitelist (`whitelist.json`) for audit and review.
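A minimal sketch of how whitelist load/check/update logic might look, assuming `whitelist.json` is a JSON object with a single `fingerprints` list. The schema and every function name here are hypothetical; the repository's `whitelist.py` may differ.

```python
import json
from pathlib import Path

# Assumed schema for illustration: {"fingerprints": ["SELECT * FROM users WHERE id = ?", ...]}.
# The repository's whitelist.json may be laid out differently.


def parse_whitelist(text: str) -> set[str]:
    """Parse whitelist JSON text into a set of approved fingerprints."""
    return set(json.loads(text).get("fingerprints", []))


def dump_whitelist(whitelist: set[str]) -> str:
    """Serialize sorted and indented so reviewers get small, readable diffs."""
    return json.dumps({"fingerprints": sorted(whitelist)}, indent=2) + "\n"


def load_whitelist(path: Path) -> set[str]:
    """Load the whitelist; a missing file means nothing is approved yet (deny all)."""
    if not path.exists():
        return set()
    return parse_whitelist(path.read_text(encoding="utf-8"))


def save_whitelist(path: Path, whitelist: set[str]) -> None:
    """Overwrite the whitelist file with the sorted, human-readable form."""
    path.write_text(dump_whitelist(whitelist), encoding="utf-8")


def is_allowed(fp: str, whitelist: set[str]) -> bool:
    """Exact-match lookup: any fingerprint not explicitly approved is blocked."""
    return fp in whitelist
```

Sorting on write keeps the file stable under version control, so each approved fingerprint shows up as a one-line diff, and treating a missing file as an empty set makes the proxy fail closed.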
```text
.
├── dataset/
│ └── queries.csv # Labeled queries: query,label
├── src/
│ ├── fingerprint.py # Query normalization
│ ├── whitelist.py # Whitelist load/check/update
│ ├── train.py # Build whitelist from dataset
│ ├── proxy.py # Flask proxy server
│ ├── evaluate.py # Evaluation script (metrics)
│ └── test_proxy.py # Integration tests
├── whitelist.json # Generated whitelist (can be regenerated)
├── requirements.txt
└── README.md
```
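`dataset/queries.csv` holds `query,label` rows, and `train.py` turns the normal rows into `whitelist.json`. The sketch below shows one way to do that step. It uses the standard `csv` module rather than pandas, assumes benign rows carry the label `normal` (the real dataset may use another value, such as `0`), reuses a hypothetical `{"fingerprints": [...]}` schema, and takes the normalizer as a parameter so any fingerprint function can be plugged in.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Iterable


def build_whitelist(
    lines: Iterable[str],
    fingerprint: Callable[[str], str],
    normal_label: str = "normal",  # assumed label value for benign rows
) -> set[str]:
    """Fingerprint every benign row of a `query,label` CSV and keep the unique shapes."""
    whitelist: set[str] = set()
    for row in csv.DictReader(lines):
        if row["label"].strip().lower() == normal_label.lower():
            whitelist.add(fingerprint(row["query"]))
    return whitelist


def train(csv_path: Path, out_path: Path, fingerprint: Callable[[str], str]) -> int:
    """Build the whitelist from the dataset, write it as sorted JSON, return its size."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        whitelist = build_whitelist(f, fingerprint)
    out_path.write_text(
        json.dumps({"fingerprints": sorted(whitelist)}, indent=2) + "\n", encoding="utf-8"
    )
    return len(whitelist)
```

Rows labeled as attacks never contribute a fingerprint, so the whitelist only ever describes the shapes of known-good traffic.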
- Python 3.10 or newer
- `pip`
- Recommended: a virtual environment (`venv`)
- Python packages (see `requirements.txt`):
  - Flask
  - pandas
  - scikit-learn
  - requests
Install dependencies:

```
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
pip install -r requirements.txt
```

- Train the whitelist (generates `whitelist.json` from `dataset/queries.csv`): `python src/train.py`
- Start the Flask proxy: `python src/proxy.py` (default address: `http://127.0.0.1:5001`)
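Once the proxy is running, every query goes through the same allow-or-block decision. The sketch below isolates that decision from Flask so it can be exercised against an in-memory SQLite database. The `handle_query` name, the status codes (400, 403, 200, 500) and the response fields are assumptions, not the repository's `proxy.py`; a Flask view could call it and return the `(dict, status)` tuple directly, since Flask serializes dict return values to JSON.

```python
import sqlite3
from typing import Callable


def handle_query(
    payload: object,
    whitelist: set[str],
    conn: sqlite3.Connection,
    fingerprint: Callable[[str], str],
    blocked_log: list[dict],
) -> tuple[dict, int]:
    """Hypothetical core of POST /query: validate, fingerprint, then allow or block."""
    query = payload.get("query") if isinstance(payload, dict) else None
    if not isinstance(query, str) or not query.strip():
        return {"error": 'expected JSON body {"query": "<SQL>"}'}, 400

    fp = fingerprint(query)
    if fp not in whitelist:
        # Record the attempt for a /blocked-style log; nothing reaches the database
        blocked_log.append({"query": query, "fingerprint": fp})
        return {"status": "blocked", "fingerprint": fp}, 403

    # Only whitelisted query shapes are executed against SQLite
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        return {"status": "error", "detail": str(exc)}, 500
    return {"status": "allowed", "fingerprint": fp, "rows": [list(r) for r in rows]}, 200
```

Keeping the decision in a plain function makes it easy to unit-test without starting a server; the Flask route then only parses the request body and forwards it.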
Send queries (examples below) or use the provided PowerShell helper for Windows.
Endpoints:
- `POST /query`: submit a SQL string as JSON, `{ "query": "<SQL>" }`
- `GET /health`: health check
- `GET /status`: server status
- `GET /whitelist`: summary of the current whitelist
- `GET /blocked`: log of blocked queries
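The same requests can also be sent from Python using only the standard library. This sketch assumes, as the PowerShell helper's error branch suggests, that a blocked query comes back with a non-2xx status and a JSON body; `build_query_request` and `send_query` are illustrative names, not part of the repository.

```python
import json
import urllib.error
import urllib.request

PROXY_URL = "http://127.0.0.1:5001/query"  # default address from this README


def build_query_request(sql: str, url: str = PROXY_URL) -> urllib.request.Request:
    """Build a POST /query request carrying {"query": "<SQL>"} as JSON."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_query(sql: str, url: str = PROXY_URL) -> tuple[int, dict]:
    """Send a query and return (HTTP status, parsed JSON), treating error statuses as data."""
    try:
        with urllib.request.urlopen(build_query_request(sql, url), timeout=5) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumed: a blocked query arrives as an HTTP error whose body is still JSON
        return err.code, json.loads(err.read().decode("utf-8") or "{}")
```

With the proxy running, `send_query("SELECT * FROM users WHERE id=1")` should return the allowed response, while the tautology example returns the blocked status and body instead of raising.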
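Paste this function into a PowerShell session once, then call it to send test queries. It catches HTTP errors, so a blocked query prints the proxy's response instead of raising an exception: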
```powershell
function Invoke-ProxyQuery {
    param(
        [Parameter(Mandatory=$true)][string] $Query
    )
    $Url = 'http://127.0.0.1:5001/query'
    $RequestBody = @{ query = $Query } | ConvertTo-Json
    try {
        # A 2xx response means the proxy allowed and executed the query
        $Result = Invoke-RestMethod -Uri $Url -Method Post -ContentType 'application/json' -Body $RequestBody -ErrorAction Stop
        Write-Host "`nALLOWED QUERY" -ForegroundColor Green
        $Result | Format-List
    } catch {
        if ($_.Exception.Response) {
            # The server answered with an error status (e.g. a blocked query).
            # ErrorDetails.Message carries the response body in Windows PowerShell 5.1
            # and PowerShell 7+; Response.GetResponseStream() does not exist in 7+.
            $ErrBody = $_.ErrorDetails.Message
            try {
                $Json = $ErrBody | ConvertFrom-Json -ErrorAction Stop
                Write-Host "`nBLOCKED QUERY" -ForegroundColor Red
                $Json | Format-List
            } catch {
                Write-Host "`nBLOCKED QUERY (non-JSON body)" -ForegroundColor Red
                Write-Host $ErrBody
            }
        } else {
            # No HTTP response at all: proxy not running, wrong port, etc.
            Write-Host "REQUEST FAILED: $($_.Exception.Message)" -ForegroundColor Red
        }
    }
}
```

Usage:
```powershell
# Allowed (normal) query
Invoke-ProxyQuery -Query "SELECT * FROM users WHERE id=1"
# Blocked (SQLi) example
Invoke-ProxyQuery -Query "SELECT * FROM users WHERE id='1' OR '1'='1'"
```

Open `cmd.exe` (not PowerShell) and run:
```bat
REM Allowed
curl -X POST http://127.0.0.1:5001/query -H "Content-Type: application/json" -d "{\"query\":\"SELECT * FROM users WHERE id=1\"}"
REM Blocked
curl -X POST http://127.0.0.1:5001/query -H "Content-Type: application/json" -d "{\"query\":\"SELECT * FROM users WHERE id='1' OR '1'='1'\"}"
```

In a POSIX shell (macOS / Linux):

```bash
# Allowed
curl -X POST http://127.0.0.1:5001/query -H "Content-Type: application/json" -d '{ "query": "SELECT * FROM users WHERE id=1" }'
# Blocked
curl -X POST http://127.0.0.1:5001/query -H "Content-Type: application/json" -d '{ "query": "SELECT * FROM users WHERE id='\''1'\'' OR '\''1'\''='\''1'\''" }'
```

Run the evaluation script to compute accuracy, precision, recall, and F1 and save the results:
```
python src/evaluate.py
```

Example output (controlled dataset):

```
Accuracy: 1.0000 (100.00%)
Precision: 1.0000 (100.00%)
Recall: 1.0000 (100.00%)
F1 Score: 1.0000 (100.00%)
```

`evaluation_results.json` is created or updated with the run summary.
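For reference, the four reported metrics follow directly from the confusion counts. This sketch treats SQLi as the positive class (a correctly blocked attack is a true positive), which is an assumption about how `evaluate.py` labels outcomes; `requirements.txt` lists scikit-learn, whose `accuracy_score`, `precision_score`, `recall_score` and `f1_score` compute the same quantities.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1, with 1 = SQLi (should be blocked), 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks blocked
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal queries allowed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal queries wrongly blocked
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks that got through

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])` gives 0.5 for all four metrics (one attack blocked, one missed, one normal query allowed, one wrongly blocked). A perfect score therefore means zero false positives and zero false negatives on that particular dataset.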
- `fingerprint.py` normalizes queries by replacing string and numeric literals with `?`, removing comments, and uppercasing keywords.
- `train.py` builds `whitelist.json` from the labeled normal queries in `dataset/queries.csv`.
- `proxy.py` enforces the whitelist and executes allowed queries on a local SQLite database (`proxy_db.sqlite`).
- To expand the whitelist safely, use a staged learning mode: gather candidate fingerprints, review them manually, then add the approved ones.
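The staged learning mode can be pictured as a review queue that sits beside the whitelist. In this sketch (the class, its methods, and the audit-record fields are all hypothetical), unknown fingerprints are counted as candidates but stay blocked until a named reviewer approves them, and every decision is appended to an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StagedWhitelist:
    """Hypothetical staged-learning helper: new fingerprints wait for a human decision."""
    approved: set[str] = field(default_factory=set)
    candidates: dict[str, int] = field(default_factory=dict)  # fingerprint -> times seen
    audit_log: list[dict] = field(default_factory=list)

    def observe(self, fp: str) -> bool:
        """Return True if fp is approved; otherwise count it as a candidate (still blocked)."""
        if fp in self.approved:
            return True
        self.candidates[fp] = self.candidates.get(fp, 0) + 1
        return False

    def _record(self, action: str, fp: str, reviewer: str) -> None:
        """Append an audit entry with a UTC timestamp."""
        self.audit_log.append({
            "action": action,
            "fingerprint": fp,
            "reviewer": reviewer,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def approve(self, fp: str, reviewer: str) -> None:
        """Promote a reviewed candidate into the whitelist and log who approved it."""
        if fp not in self.candidates:
            raise KeyError(f"not a pending candidate: {fp!r}")
        del self.candidates[fp]
        self.approved.add(fp)
        self._record("approve", fp, reviewer)

    def reject(self, fp: str, reviewer: str) -> None:
        """Discard a candidate (e.g. an injection attempt) and log the decision."""
        if self.candidates.pop(fp, None) is None:
            raise KeyError(f"not a pending candidate: {fp!r}")
        self._record("reject", fp, reviewer)
```

Counting how often each candidate appears helps reviewers prioritize; nothing is added to `approved` without an explicit, logged decision.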
Suggested demo walkthrough:
- Activate your virtual environment.
- Run `python src/train.py` and show the whitelist being generated.
- Start the proxy: `python src/proxy.py`.
- In PowerShell, paste the `Invoke-ProxyQuery` function, then run sample queries.
- Show the results from the `/blocked` and `/whitelist` endpoints.
- Fork the repo, create a feature branch, run tests locally, then open a pull request.
- Keep whitelist updates auditable: automatic additions must be subject to manual review.
This project is provided for academic and research use. It is not hardened for production use. Review and harden before any live deployment.
Authors: Rakshit Bansal, Sarthak Ray