Would your code catch http://2130706433/ as localhost?
This SSRF mitigation didn't. And it passed code review.
Here's a subset of a real SSRF mitigation that Cursor + Claude implemented:
```python
import re

LOCALHOST_PATTERN = re.compile(
    r'^https?://(localhost|127\.0\.0\.1|0\.0\.0\.0|\[::1\])',
    re.IGNORECASE
)
INTERNAL_IP_PATTERN = re.compile(
    r'^https?://(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)',
    re.IGNORECASE
)

def validate_url(url):
    if LOCALHOST_PATTERN.match(url):
        raise ValueError("Localhost not allowed")
    if INTERNAL_IP_PATTERN.match(url):
        raise ValueError("Internal IP not allowed")
```

Looks reasonable. But I tested these with Python's requests.get(). Every single one bypasses the regex AND successfully connects:
```
http://2130706433/           # Decimal IP for 127.0.0.1
http://0x7f000001/           # Hex
http://[::ffff:127.0.0.1]/   # IPv4-mapped IPv6
http://google.com@127.0.0.1/ # Hostname is actually 127.0.0.1
[space]http://127.0.0.1/     # Leading space
[tab]http://127.0.0.1/       # Leading tab
```
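You can verify the regex side of this yourself. The sketch below reuses the two patterns from above and confirms that none of the bypass URLs match either one (the `bypasses` list is just for this demonstration):

```python
import re

LOCALHOST_PATTERN = re.compile(
    r'^https?://(localhost|127\.0\.0\.1|0\.0\.0\.0|\[::1\])',
    re.IGNORECASE,
)
INTERNAL_IP_PATTERN = re.compile(
    r'^https?://(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)',
    re.IGNORECASE,
)

bypasses = [
    "http://2130706433/",             # decimal form of 127.0.0.1
    "http://0x7f000001/",             # hex form
    "http://[::ffff:127.0.0.1]/",     # IPv4-mapped IPv6
    "http://google.com@127.0.0.1/",   # userinfo trick: real host is 127.0.0.1
    " http://127.0.0.1/",             # leading space defeats the ^ anchor
    "\thttp://127.0.0.1/",            # leading tab
]

for url in bypasses:
    # Neither denylist pattern matches any of these strings.
    assert not LOCALHOST_PATTERN.match(url)
    assert not INTERNAL_IP_PATTERN.match(url)
```

The straightforward cases still get blocked, which is exactly what makes the gap easy to miss in review.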
These aren't theoretical risks. SSRF bypasses can let attackers:
- Steal cloud credentials - access http://169.254.169.254/ to retrieve AWS/GCP/Azure IAM tokens, then pivot to your entire infrastructure
- Reach internal services - hit admin panels, databases, and APIs that assume "internal = trusted"
- Exfiltrate data - access internal documentation, secrets vaults, or customer data
This is how major breaches happen. In 2019, Capital One lost 100 million customer records when an attacker used SSRF to access AWS metadata credentials. Shopify, GitLab, and others have paid significant bug bounties for similar SSRF bypasses to internal services.
The business impact: regulatory fines, breach notification costs, incident response, legal exposure, and reputation damage. Capital One paid over $300 million in settlements and remediation. And it started with one bypassable validation function.
Regex pattern-matches the string. But requests.get() interprets the URL - and those aren't the same thing. Leading whitespace? Stripped. Decimal IP? Converted. The attacker isn't trying to match your regex; they're trying to reach your internal network.
The core issue: regex denylists fail by allowing. Every bypass you don't enumerate gets through.
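Why do the decimal and hex forms work at all? Because the numeric parsing that HTTP clients ultimately inherit from the OS accepts legacy notations. On Linux and macOS, socket.inet_aton (a thin wrapper over the C inet_aton) shows the collapse directly (behavior on other platforms may differ):

```python
import socket

# inet_aton accepts the same legacy numeric forms many resolvers do,
# so these distinct strings all parse to the same four bytes.
for form in ("127.0.0.1", "2130706433", "0x7f000001"):
    packed = socket.inet_aton(form)
    print(form, "->", socket.inet_ntoa(packed))  # every line ends in 127.0.0.1
```

Three different strings, one address. A string-level denylist never sees the equivalence.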
Wouldn't SAST catch this? No. SAST tools detect the absence of validation, not inadequate validation. They see a regex check before a network request and move on - "SSRF mitigation present, no finding."
SAST doesn't understand bypass semantics. It doesn't know that http://2130706433/ resolves to localhost, or that requests.get() strips leading whitespace. It pattern-matches code, not attacker behavior.
Instead of pattern-matching strings, extract the hostname and validate its properties:

```python
from urllib.parse import urlparse
import ipaddress

def validate_url(url):
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError("Invalid URL")
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        return  # a hostname, not an IP literal; re-validate after DNS resolution
    if ip.is_loopback or ip.is_private or ip.is_reserved or ip.is_link_local:
        raise ValueError("Restricted IP")
```

Note that the restriction check sits outside the try block - if it raised inside, the `except ValueError` meant for parse failures would silently swallow it. This is still a denylist - but a far more robust one. Because it checks properties of the parsed hostname rather than string patterns, it catches http://google.com@127.0.0.1/ (the real hostname is 127.0.0.1) and the IPv4-mapped form [::ffff:127.0.0.1] (which falls in a private IPv6 range). One caveat: pure decimal and hex forms like 2130706433 are not valid dotted-quad strings, so ipaddress treats them as hostnames - catching those requires the request-time resolution check covered below.
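The ipaddress properties this relies on can be exercised directly - note that the cloud metadata address is link-local, and IPv4-mapped IPv6 addresses fall in a range Python classifies as private:

```python
import ipaddress

assert ipaddress.ip_address("127.0.0.1").is_loopback
assert ipaddress.ip_address("10.0.0.1").is_private
assert ipaddress.ip_address("169.254.169.254").is_link_local  # cloud metadata endpoint
assert ipaddress.ip_address("::ffff:10.0.0.1").is_private     # IPv4-mapped IPv6
assert not ipaddress.ip_address("8.8.8.8").is_private         # genuinely public address
```

One subtlety worth knowing: Python's is_private also covers documentation and benchmarking ranges, so it is broader than RFC 1918 alone - usually what you want for SSRF defense.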
| Approach | Behavior | Failure Mode |
|---|---|---|
| Regex denylist | Blocks string patterns you've enumerated | Fails open - encoding tricks bypass it |
| Parsed denylist | Blocks IP properties after parsing | More robust - handles encoding variations |
| Allowlist | Only permits explicitly approved destinations | Fails closed - safest when feasible |
For maximum security, consider an allowlist approach if your use case permits - only allow URLs to specific, known-good destinations rather than trying to block all bad ones.
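When the set of destinations is known up front, the allowlist itself is short. The sketch below assumes hypothetical hosts in ALLOWED_HOSTS - substitute your real integrations:

```python
from urllib.parse import urlparse

# Hypothetical allowlist for this sketch: the only hosts
# this service should ever fetch from.
ALLOWED_HOSTS = {"api.partner.example", "hooks.example.com"}

def validate_url_allowlist(url):
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Unsupported scheme")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"Host not allowed: {parsed.hostname!r}")

validate_url_allowlist("https://hooks.example.com/notify")  # passes silently
```

Every encoding trick above fails closed here: an attacker's URL is rejected not because it matched a bad pattern, but because it didn't match a good one.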
Validating the URL string - even with proper parsing - only protects you at the input layer. You also need protection at the network layer.
Why? Because:
- DNS rebinding - a public domain can resolve to 127.0.0.1. The hostname attacker.com passes your URL validation, but when requests.get() resolves it, the DNS answer points to localhost.
- Redirects - a legitimate-looking URL can return a 302 redirect to http://169.254.169.254/. Your validation checked the original URL, not the redirect target.
To fully mitigate SSRF, you need to validate at request time - after DNS resolution and before the connection is made. This typically means:
- Resolving the hostname yourself and validating the IP before passing it to your HTTP client
- Disabling redirects or validating each redirect target
- Using network-level controls (egress filtering, firewall rules) as a backstop
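The first step can be sketched as follows - a minimal illustration, not a complete defense, since true rebinding protection also requires pinning the connection to the IP you validated:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def resolve_and_check(url):
    """Resolve the URL's hostname and reject restricted addresses.

    Sketch only: DNS can change between this check and the actual
    request, so a full defense pins the connection to the IP
    validated here (or re-checks inside the HTTP client).
    """
    hostname = urlparse(url).hostname
    if not hostname:
        raise ValueError("Invalid URL")
    for info in socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_loopback or ip.is_private or ip.is_reserved or ip.is_link_local:
            raise ValueError(f"Restricted IP: {ip}")
```

Because getaddrinfo applies the same legacy numeric parsing as the OS, on typical resolvers this also rejects http://2130706433/. Pair it with allow_redirects=False in requests (or per-hop validation) so a 302 can't route around the check.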
Input validation catches the obvious cases. Network-layer validation catches the clever ones.
And this isn't about blaming the AI - it generated a pattern it has seen thousands of times. A human could easily have made the exact same mistake.
The real question is: how do we systematically improve and drive scalable, robust security practices across an entire codebase?
Use cases change overnight. New integrations get added. Business logic evolves. That URL validator that was "good enough" last quarter might now be handling user-supplied webhooks or third-party callbacks with completely different trust assumptions. Security mitigations must evolve in lockstep - or they silently become vulnerabilities.
You shouldn't need to wait for a late and expensive pentest to catch issues like this. These bypasses aren't novel - they're well-documented techniques that should be caught continuously, as code is written.
We help teams close the gap between code that looks secure and code that actually is:
- Understand business logic and intended use cases - security that doesn't account for what the code needs to do ends up too permissive or breaks in production. As your use cases evolve, so should your threat model.
- Autonomous threat modeling to pinpoint security hot spots - ongoing analysis that identifies where SSRF, injection, and other risks live in your codebase as it evolves. Not a one-time diagram that's outdated by the next sprint.
- Security recommendations your coding agent can follow - give AI assistants the context to generate secure code in the first place, built on proven mitigations rather than pattern-matching from training data.
- Verify that the coding agent followed secure practices - close the loop by checking that generated code actually implements mitigations correctly, catching gaps like the regex denylist above before they ship.
AI is accelerating how fast we write code. Oplane helps make sure security keeps pace - continuously, not just at checkpoints.
Want to see how Oplane works for your codebase? Get in touch or visit oplane.io