Skip to content

Conversation

Copy link

Copilot AI commented Oct 28, 2025

The build_database function passes user-controlled language and extractor_options parameters to subprocess.check_call without validation, creating command injection risk.

Changes

Input Validation

  • validate_language(): Whitelist check against known CodeQL languages (cpp, csharp, go, java, javascript, python, ruby, rust, swift)
  • validate_extractor_options(): Regex validation allowing only [a-zA-Z0-9_\-./=:] to reject shell metacharacters
  • Both raise ValueError with clear messages on invalid input

Subprocess Hardening

  • Explicit shell=False in all subprocess.check_call invocations
  • Validation called before subprocess execution in build_database()

Documentation

  • Security comments referencing Python subprocess docs and OWASP guidance
  • Docstrings explain security posture and allowed character sets

Example

Before:

def build_database(language: str, extractor_options, ...):
    subprocess.check_call([
        "codeql", "database", "create",
        f"--language={language}",  # Unvalidated
        *extractor_options,        # Unvalidated
        ...
    ])

After:

def build_database(language: str, extractor_options, ...):
    validate_language(language)              # Raises ValueError if not in whitelist
    validate_extractor_options(extractor_options)  # Raises ValueError if contains shell metacharacters
    
    subprocess.check_call([
        "codeql", "database", "create",
        f"--language={language}",
        *extractor_options,
        ...
    ], shell=False)  # Explicit

Testing

Added test_bulk_generate_mad.py with 10 test cases covering valid/invalid inputs for both validation functions.

Original prompt

Update misc/scripts/models-as-data/bulk_generate_mad.py to mitigate potential command injection risks in the build_database function, as outlined in issue #5. Specifically:\n\n- Explicitly set shell=False in all subprocess.check_call invocations.\n- Validate any user-controlled or variable input in the command list (such as language and extractor_options). Allow only known safe values for language and ensure extractor_options are simple key-value strings (alphanumeric, underscores, dashes).\n- Add comments referencing the relevant Python subprocess security documentation and OWASP guidance, documenting why the usage is safe.\n- If any input is rejected as unsafe, raise a ValueError with a clear message.\n\nReference:\n- https://docs.python.org/3/library/subprocess.html#security-considerations\n- https://owasp.org/www-community/attacks/Command_Injection\n\nEnsure the code is clean, maintainable, and explains its security posture for future maintainers.

This pull request was created as a result of the following prompt from Copilot chat.

Update misc/scripts/models-as-data/bulk_generate_mad.py to mitigate potential command injection risks in the build_database function, as outlined in issue #5. Specifically:\n\n- Explicitly set shell=False in all subprocess.check_call invocations.\n- Validate any user-controlled or variable input in the command list (such as language and extractor_options). Allow only known safe values for language and ensure extractor_options are simple key-value strings (alphanumeric, underscores, dashes).\n- Add comments referencing the relevant Python subprocess security documentation and OWASP guidance, documenting why the usage is safe.\n- If any input is rejected as unsafe, raise a ValueError with a clear message.\n\nReference:\n- https://docs.python.org/3/library/subprocess.html#security-considerations\n- https://owasp.org/www-community/attacks/Command_Injection\n\nEnsure the code is clean, maintainable, and explains its security posture for future maintainers.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 2 commits October 28, 2025 20:30
…_generate_mad.py

Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>
Co-authored-by: dmsmi <6402178+dmsmi@users.noreply.github.com>
Copilot AI changed the title [WIP] Update bulk_generate_mad.py to mitigate command injection risks Mitigate command injection risks in bulk_generate_mad.py Oct 28, 2025
Copilot AI requested a review from dmsmi October 28, 2025 20:39
Copilot finished work on behalf of dmsmi October 28, 2025 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants