Python: support os.path.basename as a sanitizer in py/path-injection#21719
Python: support os.path.basename as a sanitizer in py/path-injection#21719
os.path.basename as a sanitizer in py/path-injection#21719Conversation
- Add test cases in path_injection.py demonstrating that os.path.basename prevents path traversal attacks (false positive scenarios) - Add OsPathBasenameCall sanitizer class in PathInjectionCustomizations.qll that recognizes calls to os.path.basename (and posixpath/ntpath/genericpath variants) as barriers for the path-injection taint flow os.path.basename strips all directory components from a path, returning only the final filename. This makes it impossible for an attacker to inject path traversal sequences like ../etc/passwd - the basename of such input would just be 'passwd'. Agent-Logs-Url: https://github.com/github/codeql/sessions/6603215b-21cd-4e05-8905-550434c7b9ff Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
os.path.basename as a sanitizer in py/path-injection
…senameCall Agent-Logs-Url: https://github.com/github/codeql/sessions/a319e151-8e8f-4770-b87c-12b5cdb268b8 Co-authored-by: hvitved <3667920+hvitved@users.noreply.github.com>
hvitved
left a comment
There was a problem hiding this comment.
LGTM, but need review from @github/codeql-python as well.
There was a problem hiding this comment.
Pull request overview
Adds support for treating os.path.basename (and related stdlib variants) as a sanitizer/taint barrier in the Python py/path-injection query to reduce false positives, and extends the query tests accordingly.
Changes:
- Add a new sanitizer model for calls to
basenameacrossos.path,posixpath,ntpath, andgenericpath. - Add new path-injection query-test cases asserting
basename-sanitized paths are OK (with and withoutos.path.join).
Show a summary per file
| File | Description |
|---|---|
| python/ql/lib/semmle/python/security/dataflow/PathInjectionCustomizations.qll | Introduces an os.path.basename call node intended to act as a sanitizer for path injection. |
| python/ql/test/query-tests/Security/CWE-022-PathInjection/path_injection.py | Adds new test routes asserting basename is sufficient to prevent path traversal. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2
| # Secure mitigation pattern: os.path.basename strips all directory components, | ||
| # preventing path traversal attacks. | ||
| path = os.path.join(STATIC_DIR, os.path.basename(filename)) | ||
| f = open(path) # $ result=OK | ||
|
|
||
|
|
||
| @app.route("/basename-no-join") | ||
| def basename_no_join(): | ||
| filename = request.args.get('filename', '') | ||
| # basename alone also prevents directory traversal | ||
| path = os.path.basename(filename) | ||
| f = open(path) # $ result=OK |
There was a problem hiding this comment.
The new test cases (and accompanying comments) treat os.path.basename as a complete mitigation for path traversal. However, basename can return relative path components like ".." or "." (for example, os.path.basename("..") == ".." and os.path.basename("a/..") == ".."), and os.path.join(STATIC_DIR, "..") can still escape STATIC_DIR once resolved by the OS. Consider changing these cases back to $ Alert, or only marking them result=OK when combined with an additional check that rejects ./.. (or when a realpath/abspath+startswith containment check is applied).
| # Secure mitigation pattern: os.path.basename strips all directory components, | |
| # preventing path traversal attacks. | |
| path = os.path.join(STATIC_DIR, os.path.basename(filename)) | |
| f = open(path) # $ result=OK | |
| @app.route("/basename-no-join") | |
| def basename_no_join(): | |
| filename = request.args.get('filename', '') | |
| # basename alone also prevents directory traversal | |
| path = os.path.basename(filename) | |
| f = open(path) # $ result=OK | |
| # `os.path.basename` alone is not a complete mitigation because it can return | |
| # `.` or `..`, which may still escape `STATIC_DIR` after the join is resolved. | |
| path = os.path.join(STATIC_DIR, os.path.basename(filename)) | |
| f = open(path) # $ Alert | |
| @app.route("/basename-no-join") | |
| def basename_no_join(): | |
| filename = request.args.get('filename', '') | |
| # `os.path.basename` alone does not make user-controlled paths safe. | |
| path = os.path.basename(filename) | |
| f = open(path) # $ Alert |
| * leading directory components. This prevents path traversal attacks since | ||
| * the result cannot contain directory separators or relative path components. |
There was a problem hiding this comment.
This doc comment overstates the guarantees of os.path.basename: it does strip leading directory components, but the result can still be a relative path component such as ".." or ".", so it does not inherently prevent traversal when the value is later joined/resolved. Please adjust the wording to avoid claiming it cannot contain relative path components, and (if it remains a sanitizer) document the limitation/assumptions explicitly.
| * leading directory components. This prevents path traversal attacks since | |
| * the result cannot contain directory separators or relative path components. | |
| * leading directory components. However, the result may still be a relative | |
| * path component such as `"."` or `".."`, so this does not by itself prevent | |
| * path traversal if the value is later joined or resolved as a path. This | |
| * modeling therefore assumes the result is used only as a leaf name. |
os.path.basenamestrips all directory components from a path, making it a valid and common mitigation for path traversal — but the query had no sanitizer for it, producing false positives.Changes
PathInjectionCustomizations.qll): addsOsPathBasenameCallas aSanitizer(taint barrier), coveringos.path,posixpath,ntpath, andgenericpathvariants.path_injection.py): two new# $ result=OKcases —basenamecombined withos.path.join, andbasenameused alone.Example (previously a false positive)
Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
https://api.github.com/repos/github/gh-codeql/releases/latest/usr/bin/gh gh extension install github/gh-codeql(http block)If you need me to access, download, or install something from one of these locations, you can either: