Python query-stripping uses naive ? find
Severity: Medium
Affected repos: middleware-python
Component boundary: middleware-python interceptor privacy
Symptom
_strip_query() in middleware-python/recost/_interceptor.py finds the first ? in the URL and truncates. This fails on:
- URLs with a # fragment before the query position: https://api.example.com/path#anchor?x=1 → keeps everything before ?, which now includes the fragment. Not actually a privacy leak, but inconsistent with Node.
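- Encoded %3F in the path: https://api.example.com/path%3F/real?key=secret → reported to truncate at the encoded ?, leaving the real query string in place, which would be a privacy leak. Note that str.find("?") matches only a literal ?, not %3F, so this case needs confirming against the actual code path (for example, whether the URL is percent-decoded before it reaches _strip_query()).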
- Unparseable URLs: fall through with the raw string.
The Node SDK uses URL parsing (new URL(...)) and extracts pathname cleanly. Parity is broken.
Evidence
middleware-python/recost/_interceptor.py — _strip_query() uses url.find("?") and url[:idx].
middleware-node/src/core/interceptor.ts — uses new URL(url).pathname (correct).
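The difference between the two approaches can be reproduced in isolation. The sketch below is a hypothetical reconstruction, not the actual source of _interceptor.py: naive_strip_query is an assumed stand-in built only from the url.find("?") and url[:idx] description above, and parsed_strip_query is an assumed Python counterpart to the parser-based approach. Running it prints both results side by side for each sample URL.

```python
from urllib.parse import urlparse


def naive_strip_query(url: str) -> str:
    """Hypothetical reconstruction: cut at the first literal '?'."""
    idx = url.find("?")
    return url if idx == -1 else url[:idx]


def parsed_strip_query(url: str) -> str:
    """Parser-based stripping: keep scheme, host, and path only."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"


if __name__ == "__main__":
    samples = [
        "https://api.example.com/foo?secret=abc",
        "https://api.example.com/path#anchor?x=1",
        "https://api.example.com/path%3F/real?key=secret",
    ]
    for url in samples:
        print(f"input : {url}")
        print(f"naive : {naive_strip_query(url)}")
        print(f"parsed: {parsed_strip_query(url)}")
```

Comparing the two output columns shows which inputs the approaches actually disagree on, which is a useful check before writing regression tests.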
Impact
- For URLs with a malformed or encoded path, the Python SDK can ship query parameters, potentially including secrets such as API tokens, to the telemetry endpoint. This contradicts the SDK's own privacy promise.
Fix recommendation
from urllib.parse import urlparse

def _strip_query(url: str) -> str:
    try:
        parsed = urlparse(url)
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    except Exception:
        return url  # Best effort; never raise from a hot path
Verification
- Test cases:
  - https://api.example.com/foo?secret=abc → https://api.example.com/foo
  - https://api.example.com/p%3Fath?real=q → https://api.example.com/p%3Fath
  - https://api.example.com/foo#x?y=z → https://api.example.com/foo