Python 3.10+, Kali Linux, nmap, whois, dnsutils, curl.
sudo apt update && sudo apt install -y nmap whois dnsutils curl
pip3 install requests python-dotenv
Concurrent TCP port scanner using asyncio with a Semaphore rate limit.
# Scan localhost ports 1-1024 with default settings
python3 scanner.py 127.0.0.1
# Scan specific ports
python3 scanner.py 192.168.1.10 --ports 22,80,443,8080
# Scan a range with custom rate and save JSON output
python3 scanner.py 192.168.1.10 --ports 1-10000 --rate 500 --timeout 0.5 --output results.json
- asyncio over threading: a single event loop with cooperative scheduling is more efficient than OS threads at high concurrency (hundreds of simultaneous connections), because threads carry per-thread stack overhead.
- Semaphore for rate limiting: asyncio.Semaphore(rate) caps concurrent connections without restructuring the code. This prevents both resource exhaustion on the scanner and IDS/rate-limit triggering on the target.
- argparse from the start: hardcoded targets make scripts single-use. Parameterising everything makes the tool reusable across engagements.
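The semaphore pattern described in the list above can be sketched as follows. This is a minimal illustration, not the code in scanner.py: the function names `check_port` and `scan` and their parameters are assumptions made for this example.

```python
import asyncio
import contextlib


async def check_port(host: str, port: int, sem: asyncio.Semaphore,
                     timeout: float) -> tuple[int, bool]:
    """Return (port, is_open) for one TCP connect attempt."""
    async with sem:  # at most `rate` attempts hold a socket at the same time
        try:
            _, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout=timeout
            )
        except (OSError, asyncio.TimeoutError):
            return port, False
        writer.close()
        # The peer may reset the connection while we close it; ignore that.
        with contextlib.suppress(OSError):
            await writer.wait_closed()
        return port, True


async def scan(host: str, ports: list[int], rate: int = 100,
               timeout: float = 1.0) -> list[int]:
    """Scan `ports` on `host` and return the open ones, sorted."""
    sem = asyncio.Semaphore(rate)  # hypothetical default; tune per engagement
    results = await asyncio.gather(
        *(check_port(host, p, sem, timeout) for p in ports)
    )
    return sorted(port for port, is_open in results if is_open)
```

A call such as `asyncio.run(scan("127.0.0.1", list(range(1, 1025))))` would return the open ports in that range. Note that the semaphore bounds concurrency, not packets per second, so the effective rate also depends on the timeout.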
At very high concurrency, the scanner process can exhaust its file descriptor limit or the kernel's ephemeral port range. When the OS cannot allocate a new socket, the connection attempt raises an OSError instead of timing out cleanly, and a scanner that treats every OSError as a failed connection records the port as closed. The port may in fact be open; the scanner simply could not reach it because of local resource exhaustion.
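One way to keep local failures from being reported as closed ports is to inspect the exception before recording a result. The sketch below assumes Linux errno values; the helper name `classify_connect_error` and the result labels are hypothetical and not taken from scanner.py.

```python
import asyncio
import errno

# Errors that point at the scanning host, not at the target (Linux errnos).
LOCAL_ERRNOS = {errno.EMFILE, errno.ENFILE, errno.ENOBUFS, errno.EADDRNOTAVAIL}


def classify_connect_error(exc: BaseException) -> str:
    """Map a failed connect attempt to a hypothetical result label."""
    if isinstance(exc, ConnectionRefusedError):
        return "closed"       # the target actively refused the connection
    if isinstance(exc, (TimeoutError, asyncio.TimeoutError)):
        return "filtered"     # no answer: dropped, rate-limited, or host down
    if isinstance(exc, OSError) and exc.errno in LOCAL_ERRNOS:
        return "local-error"  # out of descriptors or ports; retry later
    return "unknown"
```

Ports labelled `local-error` are candidates for a second pass at a lower concurrency setting rather than entries in the "closed" column.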
This matters operationally: "the scanner did not detect it" is never the same claim as "the port is closed." Any tool — including nmap — can produce false negatives if configured too aggressively, if a firewall silently drops packets, or if the target is rate-limiting incoming connections. Scan results are evidence of what was observable at the time of the scan, not proof of absence.
Parses nmap XML output and enriches SSH hosts with their host key type.
# Generate the XML first
nmap -sV --open -oX scan.xml 192.168.1.0/24
# Parse and enrich
python3 parse_scan.py --input scan.xml --output hosts.json
- xml.etree.ElementTree (stdlib) instead of a third-party nmap library: no extra dependency, and reading the XML directly teaches the underlying structure.
- Independent SSH enrichment: ssh-keyscan runs only for hosts with port 22 open. Each call is wrapped in a try/except with a timeout=6 so a non-responsive host doesn't block the entire script.
- None for missing SSH key type: explicit null in JSON is unambiguous; it distinguishes "we checked and found nothing" from "we didn't check."
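The parsing approach in the list above could look roughly like this. It is a sketch under assumptions: the function names and the output dictionary layout are invented for illustration, and only the `host`, `address`, `port`, `state`, and `service` elements of nmap's XML output are read.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_nmap_xml(xml_text: str) -> list[dict]:
    """Extract the address and open ports of each host from nmap -oX output."""
    hosts = []
    for host in ET.fromstring(xml_text).iter("host"):
        addr_el = host.find("address")
        ports = []
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            svc = port.find("service")
            ports.append({
                "port": int(port.get("portid")),
                "service": svc.get("name") if svc is not None else None,
                "version": svc.get("version") if svc is not None else None,
            })
        hosts.append({
            "address": addr_el.get("addr") if addr_el is not None else None,
            "ports": ports,
        })
    return hosts


def key_type_from_keyscan(output: str) -> Optional[str]:
    """Return the key type from the first 'host type key' line, else None."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and not line.startswith("#"):
            return parts[1]
    return None
```

An enrichment step would call ssh-keyscan only for hosts whose `ports` list contains 22 and store the result of `key_type_from_keyscan` on that host, leaving other hosts without the field.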
The version string is direct intelligence: it lets an attacker query CVE databases (NVD, Exploit-DB) for known vulnerabilities in that exact release without sending a single additional packet. A server returning Server: Apache forces the attacker to run version-fingerprinting probes, which are slower, noisier, and more likely to trigger IDS alerts. Hiding the version string does not fix underlying vulnerabilities, but it increases the attacker's cost and reduces the signal available to automated scanners.
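A quick check for this kind of disclosure is to look for a version number in the header value. The helper below is a hypothetical sketch; it assumes the common `Product/1.2.3` banner shape and will miss servers that format their banner differently.

```python
import re

# Assumes the "Product/1.2.3" convention used by most Server banners.
VERSION_RE = re.compile(r"/\d+(?:\.\d+)*")


def discloses_version(server_header: str) -> bool:
    """Return True if the Server header value appears to include a version."""
    return VERSION_RE.search(server_header) is not None
```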
# auth.log (SSH brute-force simulation)
python3 - <<'EOF'
import random, datetime
ips = ["10.0.0.1", "10.0.0.2", "185.220.101.5", "192.168.1.50", "45.33.32.156"]
users = ["root", "admin", "ubuntu", "daniel"]
now = datetime.datetime.now()
with open("auth.log", "w") as f:
    for _ in range(500):
        ip = random.choices(ips, weights=[2, 2, 40, 1, 30])[0]
        user = random.choice(users)
        ts = (now - datetime.timedelta(seconds=random.randint(0, 86400))).strftime("%b %d %H:%M:%S")
        f.write(f"{ts} kali sshd[1234]: Failed password for {user} from {ip} port {random.randint(40000,60000)} ssh2\n")
    for _ in range(20):
        ts = (now - datetime.timedelta(seconds=random.randint(0, 86400))).strftime("%b %d %H:%M:%S")
        f.write(f"{ts} kali sshd[1234]: Accepted publickey for daniel from 192.168.1.1 port {random.randint(40000,60000)} ssh2\n")
EOF
# access.log (web server with injected attack traffic)
python3 - <<'EOF'
import random, datetime
ips = ["10.0.0.1", "185.220.101.5", "45.33.32.156", "66.249.66.1", "192.168.1.50"]
normal_paths = ["/", "/index.html", "/about", "/contact", "/static/main.css"]
attack_paths = [
    "/?id=1' UNION SELECT 1,2,3--",
    "/admin/../../../etc/passwd",
    "/search?q=<script>alert(1)</script>",
    "/wp-admin/",
    "/cgi-bin/test.cgi?cmd=id",
]
now = datetime.datetime.now()
with open("access.log", "w") as f:
    for hour in range(24):
        count = random.randint(80, 120)
        if hour == 3:
            count = 950
        for _ in range(count):
            ip = random.choices(ips, weights=[30, 5, 5, 10, 3])[0]
            path = random.choices(normal_paths + attack_paths, weights=[20]*5 + [1]*5)[0]
            status = 200 if path in normal_paths else random.choice([200, 403, 500])
            ts = (now.replace(hour=hour, minute=random.randint(0,59))).strftime("%d/%b/%Y:%H:%M:%S +0000")
            f.write(f'{ip} - - [{ts}] "GET {path} HTTP/1.1" {status} {random.randint(200,5000)}\n')
EOF
python3 auth_analysis.py --log auth.log --threshold 10
python3 log_analysis.py --log access.log --report report.md
- defaultdict and Counter: accumulate counts without initialisation boilerplate.
- Streaming line-by-line (for line in Path(...).open()): processes multi-gigabyte logs without loading them into memory.
- 3-sigma threshold computed from the actual data distribution, not a hardcoded number: the baseline adapts to the server's real traffic level.
- Combined report.md written at the end of log_analysis.py so all findings are in one place.
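The streaming count and the 3-sigma threshold from the list above can be sketched in a few lines. The names `hourly_counts` and `spikes` are assumptions for this example, and the use of the population standard deviation is a choice made here, not something confirmed by log_analysis.py.

```python
import re
import statistics
from collections import Counter

# Matches the hour in a timestamp such as [12/Mar/2024:03:15:00 +0000]
HOUR_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):")


def hourly_counts(lines) -> Counter:
    """Count requests per hour of day, consuming one line at a time."""
    counts = Counter()
    for line in lines:
        match = HOUR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def spikes(counts: Counter, sigmas: float = 3.0) -> list[int]:
    """Return the hours whose count exceeds mean + sigmas * stdev."""
    values = list(counts.values())
    threshold = statistics.mean(values) + sigmas * statistics.pstdev(values)
    return sorted(hour for hour, n in counts.items() if n > threshold)
```

Because `hourly_counts` accepts any iterable of lines, an open file object can be passed directly, so the log is never held in memory as a whole.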
Web traffic has strong daily periodicity: a server serving business users sees 2,000 req/hr at 14:00 and 50 req/hr at 04:00. A single global baseline mixes these together, producing a high standard deviation and therefore a high threshold. A night-time spike from 50 to 1,000 req/hr, a 20x increase, can stay below that threshold and go unflagged, while ordinary business-hour peaks sit closest to it and are the most likely source of false alarms.
A more robust approach segments the baseline by time bucket: compare 03:00 traffic only against other historical 03:00 readings (same hour, different days). This normalises the daily cycle before computing the deviation, producing a time-aware baseline that flags genuine spikes within each typical traffic stratum instead of comparing apples to oranges.
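A per-hour baseline of this kind might be sketched as below. The `history` structure (hour of day mapped to request counts from previous days) and the function name are assumptions for illustration; the sample standard deviation is used because each bucket holds only a handful of days.

```python
import statistics


def hour_is_anomalous(history: dict[int, list[int]], hour: int, count: int,
                      sigmas: float = 3.0) -> bool:
    """Compare `count` only against past counts for the same hour of day."""
    past = history.get(hour, [])
    if len(past) < 2:
        return False  # not enough history to estimate a spread
    threshold = statistics.mean(past) + sigmas * statistics.stdev(past)
    return count > threshold
```

With a history of roughly 50 req/hr at 03:00 and 2,000 req/hr at 14:00, a reading of 950 is flagged at 03:00 but not at 14:00, which is the behaviour a single global threshold cannot provide.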
Integrated multi-stage reconnaissance tool. Supports both domain and IP modes, writes structured JSON, a Markdown report, and a full audit log.
# Domain mode (auto-detected)
python3 recon.py scanme.nmap.org --verbose
# IP mode (explicit)
python3 recon.py 45.33.32.156 --mode ip --output ./my_recon/ --verbose
# Auto-detect IP
python3 recon.py 45.33.32.156 --verbose
recon_<target>_<timestamp>/
├── results.json — all findings as structured data
├── report.md — human-readable summary
└── audit.log — timestamped record of every action
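The mode auto-detection used in the commands above could be as simple as asking the standard library whether the target parses as an IP address. This is a guess at the approach, not the logic in recon.py, and `detect_mode` is a hypothetical name.

```python
import ipaddress


def detect_mode(target: str) -> str:
    """Return 'ip' if target parses as an IPv4/IPv6 address, else 'domain'."""
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return "domain"
    return "ip"
```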
- Each step fails independently: every tool call goes through run(), which catches TimeoutExpired, FileNotFoundError, and general exceptions individually. A missing whois binary does not crash the DNS enumeration.
- Structured data, not raw text: every parser extracts fields into a dict. results.json is machine-readable and can be fed into downstream scripts without regex post-processing.
- Audit log is non-negotiable: penetration tests require proof of what the tool did and when. The log records every command, its return code, and the timestamp. It also captures tool unavailability (NOT_FOUND) so results can be interpreted correctly.
- Missing security headers flagged in report: CSP, HSTS, X-Frame-Options, and X-Content-Type-Options are the minimum baseline. Their absence is a finding, not just neutral information.
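The failure isolation, audit trail, and header check described in the list above can be illustrated with the sketch below. The signature of `run`, the in-memory `audit` list (the real tool writes audit.log), the log line format, and `missing_security_headers` are all assumptions made for this example.

```python
import datetime
import subprocess
from typing import Optional

REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
)


def run(cmd: list[str], audit: list[str], timeout: float = 30.0) -> Optional[str]:
    """Run one external tool; return its stdout, or None if the step failed."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    prefix = f"{stamp} {' '.join(cmd)}"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        audit.append(f"{prefix} TIMEOUT")
        return None
    except FileNotFoundError:
        audit.append(f"{prefix} NOT_FOUND")  # binary missing on this host
        return None
    except Exception as exc:  # any other failure stays inside this step
        audit.append(f"{prefix} ERROR {type(exc).__name__}")
        return None
    audit.append(f"{prefix} rc={proc.returncode}")
    return proc.stdout


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return the baseline headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]
```

Because every failure path returns None and appends exactly one audit line, a caller can skip the dependent parser and continue with the next stage.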
Your tool (active recon) sends packets directly to the target: nmap SYNs and HTTP HEAD requests carry your IP as the source, and DNS queries reach the target's authoritative name servers, either directly or through your resolver. Each of these can leave a trace in the target's logs and in network monitoring infrastructure. whois lookups go to the registry or registrar rather than to the target, but they may still be logged there. A defender with IDS, firewall logs, or a SIEM can see your IP in nmap's TCP SYNs within seconds.
Shodan (passive recon) has already scanned the internet; you query Shodan's database, not the target. Your IP never touches the target: no packet, no log entry. From a network-monitoring defender's perspective, passive recon is undetectable because there is nothing to detect. The scan that produced Shodan's data ran earlier, from Shodan's own infrastructure, which is also why its records can be days or months out of date.
When each is appropriate:
- Use passive (Shodan) for pre-engagement intelligence gathering when stealth is required, for targets in scope that are particularly sensitive (ICS/SCADA), or when you need historical service data without triggering alarms.
- Use active (your tool / nmap) when you need current, authoritative data — Shodan's records may be stale — or when the engagement explicitly authorises active scanning and you need confirmation of what is reachable right now.
- In a real engagement, both are used: passive first to build a target map without alerting defenders, then active to confirm and fill gaps.