Secure subprocess usage in bulk_generate_mad.py against command injection #6

Copilot · 2025-10-27T20:45:46Z

The script accepted unvalidated user input from YAML config files (language, extractor_options) and passed it to subprocess calls, creating a command injection vulnerability.

Changes

Input validation

Added VALID_LANGUAGES whitelist for 9 supported CodeQL languages (cpp, csharp, go, java, javascript, python, ruby, rust, swift)
Created validate_language() to reject languages outside the whitelist
Created validate_extractor_options() to reject shell metacharacters: ;&|<>\$(){}[]'"\n*`
Validation runs early in main() and build_database() to fail fast
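
The two validators described above might look roughly like the following. This is a minimal sketch, not the code from the PR: the function and constant names come from the description, but the bodies, the error type (ValueError), and the assumption that extractor_options is a list of strings are illustrative guesses. The forbidden-character set reads the listed `\` and `\n` as a backslash and a newline, which is also an assumption.

```python
# Languages named in the PR description; the real constant may be defined differently.
VALID_LANGUAGES = frozenset({
    "cpp", "csharp", "go", "java", "javascript",
    "python", "ruby", "rust", "swift",
})

# Assumed reading of the rejected characters:
# ; & | < > \ $ ( ) { } [ ] ' " newline * and backtick
FORBIDDEN_CHARS = frozenset(";&|<>\\$(){}[]'\"\n*`")


def validate_language(language):
    """Return `language` unchanged, or raise ValueError if it is not allow-listed."""
    if not isinstance(language, str) or language not in VALID_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    return language


def validate_extractor_options(extractor_options):
    """Return the options as a list, or raise ValueError on a forbidden character."""
    options = list(extractor_options or [])
    for option in options:
        if not isinstance(option, str):
            raise ValueError(f"Extractor option must be a string: {option!r}")
        bad = FORBIDDEN_CHARS.intersection(option)
        if bad:
            raise ValueError(
                f"Extractor option {option!r} contains forbidden characters: {sorted(bad)}"
            )
    return options
```

An allow-list is used for the language because the set of valid values is small and fixed; extractor options are open-ended, so the sketch falls back to rejecting known-dangerous characters.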

Subprocess hardening

Added explicit shell=False to both subprocess.check_call invocations
Separated command arguments (--language, language) instead of string concatenation (--language={language})
Added security comments referencing Python subprocess docs and OWASP
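
To illustrate why a list of arguments with shell=False matters, here is a small sketch. build_create_argv is a hypothetical helper (the PR does not name one), and the trailing database_dir argument is an assumption about how the script calls the CLI. echo_first_argument uses a harmless Python child process instead of codeql, and subprocess.check_output instead of check_call so the result can be inspected.

```python
import subprocess
import sys


def build_create_argv(language, database_dir):
    """Hypothetical helper: build the argument list with the flag and its value
    as separate elements, rather than one interpolated string."""
    return ["codeql", "database", "create", "--language", language, database_dir]


def echo_first_argument(value):
    """Run a child process that prints its first argument back.

    With a list and shell=False no shell is involved, so characters such as
    ';' reach the child as literal text instead of being interpreted.
    """
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        shell=False,
        text=True,
    )
    return output.strip()


if __name__ == "__main__":
    print(build_create_argv("java", "db"))
    print(echo_first_argument("java; echo injected"))
```

Even without a shell, the receiving program still parses its own arguments, so the input validation remains useful alongside shell=False.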

Documentation & testing

Updated module docstring with security requirements for maintainers
Added 6 test cases covering valid/invalid languages and extractor options
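
A regression test in this spirit could look like the sketch below. It is not one of the six tests from the PR: build_database here is a simplified, hypothetical stand-in for the script's function, and the test names are invented. subprocess.check_call is mocked so that no real codeql binary is needed.

```python
import subprocess
import unittest
from unittest import mock

VALID_LANGUAGES = frozenset({
    "cpp", "csharp", "go", "java", "javascript",
    "python", "ruby", "rust", "swift",
})


def build_database(language, database_dir):
    """Hypothetical stand-in for the script's build_database()."""
    if language not in VALID_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    subprocess.check_call(
        ["codeql", "database", "create", "--language", language, database_dir],
        shell=False,
    )


class BuildDatabaseSecurityTest(unittest.TestCase):
    def test_valid_language_runs_without_shell(self):
        with mock.patch("subprocess.check_call") as check_call:
            build_database("java", "db")
        args, kwargs = check_call.call_args
        # The command must be a list and shell must be explicitly False.
        self.assertIsInstance(args[0], list)
        self.assertIs(kwargs.get("shell"), False)
        self.assertIn("java", args[0])

    def test_invalid_language_is_rejected_before_subprocess(self):
        with mock.patch("subprocess.check_call") as check_call:
            with self.assertRaises(ValueError):
                build_database("java; rm -rf /", "db")
        # Validation should fail fast, before any process is started.
        check_call.assert_not_called()
```

Saved as a test module, this can be run with python -m unittest. If a later change drops shell=False or moves validation after the call, one of the two tests should fail.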

Example

# Before: No validation, implicit shell=False
subprocess.check_call(["codeql", "database", "create", f"--language={language}", ...])

# After: Validated input, explicit shell=False
validate_language(language)
validate_extractor_options(extractor_options)
subprocess.check_call(["codeql", "database", "create", "--language", language, ...], shell=False)

Original prompt

Refactor misc/scripts/models-as-data/bulk_generate_mad.py to ensure secure usage of subprocess.check_call:

Explicitly set shell=False in all subprocess.check_call invocations, even if arguments are already passed as a list.

Validate or sanitize any variable/user-supplied arguments passed to subprocess, such as 'language' and 'extractor_options'. This can include:

Asserting that 'language' is from a known safe list.

Validating 'extractor_options' are well-formed (e.g., not accepting shell metacharacters or complex strings).

Add a clear inline comment above the subprocess invocation, referencing the security rationale, and linking to the Python subprocess documentation and OWASP for command injection prevention.

(Optional but recommended) Add a regression test or code comment noting the requirement to keep shell=False and validate inputs for future maintainers.

This addresses the security concern described in the attached issue: #5.

Reference sources:

Python subprocess documentation: https://docs.python.org/3/library/subprocess.html#security-considerations

OWASP Command Injection: https://owasp.org/www-community/attacks/Command_Injection

This pull request was created as a result of the prompt above, issued from Copilot chat.

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Initial plan

8c4a7b7

Copilot AI assigned Copilot and dmsmi Oct 27, 2025

Copilot started work on behalf of dmsmi October 27, 2025 20:46

Copilot AI and others added 3 commits October 27, 2025 20:53

Add subprocess security improvements with validation and tests

cb79816

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Address code review feedback - improve argument handling

44a7877

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Add security requirements to module docstring

ca8e255

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Refactor bulk_generate_mad.py for subprocess security~~ Secure subprocess usage in bulk_generate_mad.py against command injection Oct 27, 2025

Copilot AI requested a review from dmsmi October 27, 2025 21:01

Copilot finished work on behalf of dmsmi October 27, 2025 21:01
