Mitigate command injection risks in bulk_generate_mad.py #8

Copilot · 2025-10-28T20:20:30Z

The build_database function passes user-controlled language and extractor_options parameters to subprocess.check_call without validation, creating command injection risk.

Changes

Input Validation

validate_language(): Whitelist check against known CodeQL languages (cpp, csharp, go, java, javascript, python, ruby, rust, swift)
validate_extractor_options(): Regex validation allowing only [a-zA-Z0-9_\-./=:] to reject shell metacharacters
Both raise ValueError with clear messages on invalid input

Subprocess Hardening

Explicit shell=False in all subprocess.check_call invocations
Validation called before subprocess execution in build_database()

Documentation

Security comments referencing Python subprocess docs and OWASP guidance
Docstrings explain security posture and allowed character sets

Example

Before:

def build_database(language: str, extractor_options, ...):
    subprocess.check_call([
        "codeql", "database", "create",
        f"--language={language}",  # Unvalidated
        *extractor_options,        # Unvalidated
        ...
    ])

After:

def build_database(language: str, extractor_options, ...):
    validate_language(language)              # Raises ValueError if not in whitelist
    validate_extractor_options(extractor_options)  # Raises ValueError if contains shell metacharacters
    
    subprocess.check_call([
        "codeql", "database", "create",
        f"--language={language}",
        *extractor_options,
        ...
    ], shell=False)  # Explicit

Testing

Added test_bulk_generate_mad.py with 10 test cases covering valid/invalid inputs for both validation functions.

Original prompt

Update misc/scripts/models-as-data/bulk_generate_mad.py to mitigate potential command injection risks in the build_database function, as outlined in issue #5. Specifically:\n\n- Explicitly set shell=False in all subprocess.check_call invocations.\n- Validate any user-controlled or variable input in the command list (such as language and extractor_options). Allow only known safe values for language and ensure extractor_options are simple key-value strings (alphanumeric, underscores, dashes).\n- Add comments referencing the relevant Python subprocess security documentation and OWASP guidance, documenting why the usage is safe.\n- If any input is rejected as unsafe, raise a ValueError with a clear message.\n\nReference:\n- https://docs.python.org/3/library/subprocess.html#security-considerations\n- https://owasp.org/www-community/attacks/Command_Injection\n\nEnsure the code is clean, maintainable, and explains its security posture for future maintainers.

This pull request was created as a result of the following prompt from Copilot chat.

Update misc/scripts/models-as-data/bulk_generate_mad.py to mitigate potential command injection risks in the build_database function, as outlined in issue #5. Specifically:\n\n- Explicitly set shell=False in all subprocess.check_call invocations.\n- Validate any user-controlled or variable input in the command list (such as language and extractor_options). Allow only known safe values for language and ensure extractor_options are simple key-value strings (alphanumeric, underscores, dashes).\n- Add comments referencing the relevant Python subprocess security documentation and OWASP guidance, documenting why the usage is safe.\n- If any input is rejected as unsafe, raise a ValueError with a clear message.\n\nReference:\n- https://docs.python.org/3/library/subprocess.html#security-considerations\n- https://owasp.org/www-community/attacks/Command_Injection\n\nEnsure the code is clean, maintainable, and explains its security posture for future maintainers.

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

…_generate_mad.py Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Initial plan

ea14953

Copilot AI assigned Copilot and dmsmi Oct 28, 2025

Copilot started work on behalf of dmsmi October 28, 2025 20:21 View session

Copilot AI and others added 2 commits October 28, 2025 20:30

Add security enhancements to mitigate command injection risks in bulk…

7bd0770

…_generate_mad.py Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Fix validation to reject empty strings in extractor options

28d88a0

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Update bulk_generate_mad.py to mitigate command injection risks~~ Mitigate command injection risks in bulk_generate_mad.py Oct 28, 2025

Copilot AI requested a review from dmsmi October 28, 2025 20:39

Copilot finished work on behalf of dmsmi October 28, 2025 20:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mitigate command injection risks in bulk_generate_mad.py #8

Mitigate command injection risks in bulk_generate_mad.py #8

Uh oh!

Copilot AI commented Oct 28, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Mitigate command injection risks in bulk_generate_mad.py #8

Are you sure you want to change the base?

Mitigate command injection risks in bulk_generate_mad.py #8

Uh oh!

Conversation

Copilot AI commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Example

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Oct 28, 2025 •

edited

Loading