
Enhance ghidra backend with existing project feature #3066

Open
saniyafatima07 wants to merge 3 commits into mandiant:master from saniyafatima07:ghidra-feature
Conversation

@saniyafatima07 saniyafatima07 commented Apr 30, 2026

This PR adds support for analyzing existing Ghidra projects directly using the existing input_file argument.

Users can now provide input in the format:

/path/to/project.gpr:folder/program

Motivation & Context

Currently, the Ghidra backend always creates a new temporary project and re-analyzes the binary. This:

  • increases analysis time
  • discards existing annotations and prior work

This change enables reuse of already-analyzed projects, improving performance and usability.

Implementation Details

  • Added parse_ghidra_project_path() to detect and parse .gpr:program syntax.
  • Integrated parsing early in CLI flow (in main(), immediately after handle_common_args()).
  • When .gpr:program is detected:
    • args.input_file is rewritten to the .gpr project path.
    • extracted program path is stored as args.ghidra_program.
    • backend is set to BACKEND_GHIDRA.
    • if the user explicitly sets a different backend, capa exits with an invalid input format error.
  • Passed ghidra_program_path through get_extractor_from_cli() into capa.loader.get_extractor().
  • Updated Ghidra loader logic:
    • open existing project with create=False,
    • load program using consume_program,
    • skip temporary project creation/import/analyze path for existing-project input.
  • Default behavior remains unchanged when .gpr:program syntax is not used.
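
The parsing step described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: only the function name parse_ghidra_project_path() and the case-insensitive detection of .gpr: come from this PR. The return type and the choice to raise ValueError for an empty program path are assumptions made for illustration.

```python
from typing import Optional, Tuple

GPR_MARKER = ".gpr:"


def parse_ghidra_project_path(input_file: str) -> Optional[Tuple[str, str]]:
    """Split "/path/to/project.gpr:folder/program" into (project, program).

    Returns None when the input does not use the .gpr:program syntax,
    so the caller can fall back to the default behavior.
    """
    # Case-insensitive search, so "PROJECT.GPR:prog" is recognized too.
    idx = input_file.lower().find(GPR_MARKER)
    if idx == -1:
        return None

    # Keep the ".gpr" extension on the project path, drop the ":" separator.
    project_path = input_file[: idx + len(".gpr")]
    program_path = input_file[idx + len(GPR_MARKER):]
    if not program_path:
        # Assumption: an empty program path is treated as invalid syntax.
        raise ValueError("expected /path/to/project.gpr:folder/program")
    return project_path, program_path
```

Returning None for ordinary inputs keeps the existing code path untouched, which matches the stated goal that default behavior does not change.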

Tests

Added tests for:

  1. valid .gpr:program parsing (case-insensitive)
  2. invalid syntax handling
  3. backend validation when incorrect backend is used

Closes #3004

Checklist

  • CHANGELOG updated
  • Added a few tests
  • No documentation update needed
    We can include a small section about this feature though. Feel free to share your opinions.
  • This submission includes AI-generated code and I have provided details in the description.

AI Usage Disclosure

Parts of this implementation were assisted using AI tools (GitHub Copilot, ChatGPT).
AI was used for:

  • refining implementation approach
  • improving edge case handling

All code was reviewed, modified, and tested manually before submission.

@github-actions github-actions Bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@github-actions github-actions Bot dismissed their stale review April 30, 2026 16:41

CHANGELOG updated or no update needed, thanks! 😄

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for analyzing programs directly from existing Ghidra projects using the syntax /path/to/project.gpr:folder/program. The changes include a new path parser, updates to the Ghidra loader to consume existing programs via pyghidra, and CLI adjustments to automatically configure the Ghidra backend when this syntax is detected. Review feedback highlights opportunities to harden the path parsing logic, improve user-facing error messages for backend mismatches, and address the omission of file limitation checks (such as packed binary detection) when bypassing standard file extraction.

I am having trouble creating individual review comments. Click here to see my feedback.

capa/main.py (416)

medium

The current implementation uses .lower().find(".gpr:") to detect the Ghidra project syntax. While this works for standard cases, it might incorrectly trigger if a path contains .gpr: as part of a directory name or filename that isn't the project extension. A more robust approach would be to verify that the substring .gpr: is preceded by a valid filename and followed by a non-empty program path, or to use a regular expression to ensure it matches the expected pattern at the end of the project file component.
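
One way to act on this suggestion is a regular expression that requires a non-empty project filename before .gpr and a non-empty program path after the colon. The sketch below is illustrative only: the function name parse_ghidra_project_path_strict and the must_exist parameter are hypothetical and do not appear in the PR.

```python
import os
import re
from typing import Optional, Tuple

# project: shortest prefix ending in "<name>.gpr", where <name> is non-empty
#          (the character before ".gpr" must not be a path separator)
# program: everything after the colon, must be non-empty
GPR_PATTERN = re.compile(
    r"^(?P<project>.*?[^/\\]\.gpr):(?P<program>.+)$",
    re.IGNORECASE,
)


def parse_ghidra_project_path_strict(
    input_file: str, must_exist: bool = False
) -> Optional[Tuple[str, str]]:
    """Hypothetical stricter parser for the .gpr:program syntax."""
    match = GPR_PATTERN.match(input_file)
    if match is None:
        return None

    project_path = match.group("project")
    # Optional extra guard: only accept the syntax when the .gpr file
    # really exists, which rules out directories named like "foo.gpr:bar".
    if must_exist and not os.path.isfile(project_path):
        return None
    return project_path, match.group("program")
```

A pattern alone cannot tell a real project file from a directory whose name happens to contain .gpr:, which is why the sketch offers the optional on-disk check.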

capa/main.py (1045-1046)

medium

When an existing Ghidra project is detected but an incompatible backend is explicitly specified (e.g., -b vivisect), the code raises ShouldExitError(E_INVALID_INPUT_FORMAT). It would be more user-friendly to log a specific error message explaining that the .gpr:program syntax is only compatible with the Ghidra backend before exiting.

References
  1. Provide clear explanations for each concern and offer specific, improved code suggestions when applicable. (link)
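
For the backend-mismatch concern on capa/main.py (1045-1046), a sketch of the suggested behavior follows. BACKEND_GHIDRA, ShouldExitError, and E_INVALID_INPUT_FORMAT are names mentioned in this PR, but the definitions below are local stand-ins with placeholder values so the example runs on its own. BACKEND_AUTO, the function name, and the message wording are assumptions.

```python
import logging

logger = logging.getLogger("capa")

# Stand-ins for names that live in capa; the values are placeholders.
BACKEND_AUTO = "auto"
BACKEND_GHIDRA = "ghidra"
E_INVALID_INPUT_FORMAT = -1


class ShouldExitError(Exception):
    """Stand-in for capa's exception that carries an exit status code."""

    def __init__(self, status_code: int):
        super().__init__(status_code)
        self.status_code = status_code


def resolve_backend_for_ghidra_project(requested_backend: str) -> str:
    """Pick the backend once .gpr:program input has been detected.

    Hypothetical helper: logs a specific message before exiting when the
    user explicitly asked for a backend other than Ghidra.
    """
    if requested_backend in (BACKEND_AUTO, BACKEND_GHIDRA):
        return BACKEND_GHIDRA

    logger.error(
        "the .gpr:program input syntax is only supported by the Ghidra "
        "backend, but backend '%s' was requested",
        requested_backend,
    )
    raise ShouldExitError(E_INVALID_INPUT_FORMAT)
```

The exit path is unchanged; the only difference is that the user sees why the input was rejected.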

capa/main.py (1080)

medium

Setting file_extractors = [] when ghidra_info is present effectively skips the static and dynamic limitation checks (e.g., packed binary detection). While it's likely that a user providing an existing Ghidra project has already handled such issues, skipping these checks means capa won't provide its usual warnings if the program inside the project still appears packed or limited. Consider if it's possible to still run these checks against the program bytes extracted from Ghidra.

@saniyafatima07 saniyafatima07 marked this pull request as ready for review April 30, 2026 16:49
@saniyafatima07
Author

@mike-hunhoff Could you please review this PR?
Thank you for your time!

Development

Successfully merging this pull request may close these issues.

ghidra: enable feature extraction from existing Ghidra project binary
