Skip to content

Conversation

Copy link

Copilot AI commented Oct 27, 2025

The script accepted unvalidated user input from YAML config files (language, extractor_options) and passed it to subprocess calls, creating a command injection vulnerability.

Changes

Input validation

  • Added VALID_LANGUAGES whitelist for 9 supported CodeQL languages (cpp, csharp, go, java, javascript, python, ruby, rust, swift)
  • Created validate_language() to reject languages outside the whitelist
  • Created validate_extractor_options() to reject shell metacharacters: ;&|<>\$(){}[]'"\n*`
  • Validation runs early in main() and build_database() to fail fast

Subprocess hardening

  • Added explicit shell=False to both subprocess.check_call invocations
  • Separated command arguments (--language, language) instead of string concatenation (--language={language})
  • Added security comments referencing Python subprocess docs and OWASP

Documentation & testing

  • Updated module docstring with security requirements for maintainers
  • Added 6 test cases covering valid/invalid languages and extractor options

Example

# Before: No validation, implicit shell=False
subprocess.check_call(["codeql", "database", "create", f"--language={language}", ...])

# After: Validated input, explicit shell=False
validate_language(language)
validate_extractor_options(extractor_options)
subprocess.check_call(["codeql", "database", "create", "--language", language, ...], shell=False)
Original prompt

Refactor misc/scripts/models-as-data/bulk_generate_mad.py to ensure secure usage of subprocess.check_call:

  1. Explicitly set shell=False in all subprocess.check_call invocations, even if arguments are already passed as a list.
  2. Validate or sanitize any variable/user-supplied arguments passed to subprocess, such as 'language' and 'extractor_options'. This can include:
    • Asserting that 'language' is from a known safe list.
    • Validating 'extractor_options' are well-formed (e.g., not accepting shell metacharacters or complex strings).
  3. Add a clear inline comment above the subprocess invocation, referencing the security rationale, and linking to the Python subprocess documentation and OWASP for command injection prevention.
  4. (Optional but recommended) Add a regression test or code comment noting the requirement to keep shell=False and validate inputs for future maintainers.

This addresses the security concern described in the attached issue: #5.

Reference sources:

This pull request was created as a result of the following prompt from Copilot chat.

Refactor misc/scripts/models-as-data/bulk_generate_mad.py to ensure secure usage of subprocess.check_call:

  1. Explicitly set shell=False in all subprocess.check_call invocations, even if arguments are already passed as a list.
  2. Validate or sanitize any variable/user-supplied arguments passed to subprocess, such as 'language' and 'extractor_options'. This can include:
    • Asserting that 'language' is from a known safe list.
    • Validating 'extractor_options' are well-formed (e.g., not accepting shell metacharacters or complex strings).
  3. Add a clear inline comment above the subprocess invocation, referencing the security rationale, and linking to the Python subprocess documentation and OWASP for command injection prevention.
  4. (Optional but recommended) Add a regression test or code comment noting the requirement to keep shell=False and validate inputs for future maintainers.

This addresses the security concern described in the attached issue: #5.

Reference sources:


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits October 27, 2025 20:53
Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>
Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>
Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor bulk_generate_mad.py for subprocess security Secure subprocess usage in bulk_generate_mad.py against command injection Oct 27, 2025
Copilot AI requested a review from dmsmi October 27, 2025 21:01
Copilot finished work on behalf of dmsmi October 27, 2025 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants