Enhance ghidra backend with existing project feature #3066
saniyafatima07 wants to merge 3 commits into mandiant:master
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
CHANGELOG updated or no update needed, thanks! 😄
Code Review
This pull request introduces support for analyzing programs directly from existing Ghidra projects using the syntax /path/to/project.gpr:folder/program. The changes include a new path parser, updates to the Ghidra loader to consume existing programs via pyghidra, and CLI adjustments to automatically configure the Ghidra backend when this syntax is detected. Review feedback highlights opportunities to harden the path parsing logic, improve user-facing error messages for backend mismatches, and address the omission of file limitation checks (such as packed binary detection) when bypassing standard file extraction.
I am having trouble creating individual review comments. Click here to see my feedback.
capa/main.py (416)
The current implementation uses .lower().find(".gpr:") to detect the Ghidra project syntax. While this works for standard cases, it might incorrectly trigger if a path contains .gpr: as part of a directory name or filename that isn't the project extension. A more robust approach would be to verify that the substring .gpr: is preceded by a valid filename and followed by a non-empty program path, or to use a regular expression to ensure it matches the expected pattern at the end of the project file component.
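One way to harden the detection is to match the whole argument against an anchored regular expression. This is a minimal sketch, not the PR's code: `GhidraProjectRef`, `parse_ghidra_project_spec`, and the pattern are hypothetical names, and it assumes that program paths inside a project never contain a colon.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class GhidraProjectRef(NamedTuple):
    """Hypothetical container for a parsed `project.gpr:folder/program` argument."""

    project_file: Path
    program_path: str


# Assumptions: the project component ends in a non-empty file name followed by
# ".gpr", and the program component is non-empty and contains no ":".
_GPR_SPEC = re.compile(r"^(?P<project>.*[^/\\]\.gpr):(?P<program>[^:]+)$", re.IGNORECASE)


def parse_ghidra_project_spec(value: str) -> Optional[GhidraProjectRef]:
    """Return the parsed reference, or None if `value` is not in project syntax."""
    match = _GPR_SPEC.match(value)
    if match is None:
        return None
    program = match.group("program").strip()
    if not program:
        # reject inputs such as "project.gpr: " that name no program
        return None
    return GhidraProjectRef(Path(match.group("project")), program)
```

Because the program component cannot contain a colon, the last `.gpr:` in the argument is treated as the separator, so a directory name that happens to contain `.gpr:` earlier in the path does not split the input. A check that the project file exists on disk could be layered on top of this.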
capa/main.py (1045-1046)
When an existing Ghidra project is detected but an incompatible backend is explicitly specified (e.g., -b vivisect), the code raises ShouldExitError(E_INVALID_INPUT_FORMAT). It would be more user-friendly to log a specific error message explaining that the .gpr:program syntax is only compatible with the Ghidra backend before exiting.
References
- Provide clear explanations for each concern and offer specific, improved code suggestions when applicable. (link)
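The backend-mismatch suggestion could look roughly like the sketch below. `ShouldExitError` and `E_INVALID_INPUT_FORMAT` are named in the review, but the definitions here are local stand-ins with a placeholder value; the function name, the logger name, and the backend strings `"auto"` and `"ghidra"` are assumptions for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("capa")

E_INVALID_INPUT_FORMAT = 1  # placeholder value, not capa's actual exit code


class ShouldExitError(Exception):
    """Stand-in for capa's exception that carries an exit status."""

    def __init__(self, status_code: int):
        super().__init__(status_code)
        self.status_code = status_code


def resolve_backend_for_ghidra_project(requested: Optional[str]) -> str:
    """Pick the backend for a `.gpr:program` input, or explain why it cannot work."""
    if requested in (None, "auto", "ghidra"):
        # no explicit choice, or a compatible one: use Ghidra
        return "ghidra"
    logger.error(
        "the project.gpr:program syntax is only supported by the Ghidra backend, "
        "but backend '%s' was requested; drop -b or pass -b ghidra",
        requested,
    )
    raise ShouldExitError(E_INVALID_INPUT_FORMAT)
```

The exit status is unchanged; the only difference is that the user sees why the input was rejected.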
capa/main.py (1080)
Setting file_extractors = [] when ghidra_info is present effectively skips the static and dynamic limitation checks (e.g., packed binary detection). While it's likely that a user providing an existing Ghidra project has already handled such issues, skipping these checks means capa won't provide its usual warnings if the program inside the project still appears packed or limited. Consider if it's possible to still run these checks against the program bytes extracted from Ghidra.
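A rough sketch of the control flow this comment suggests: instead of always using an empty list, fall back to it only when no program bytes can be obtained. Every name below is hypothetical; in particular, the three callables stand in for whatever capa and the Ghidra loader would actually provide, and nothing here is an existing capa or Ghidra API.

```python
from typing import Callable, List, Optional


def select_file_extractors(
    ghidra_info: Optional[object],
    extractors_from_path: Callable[[], List[object]],
    extractors_from_bytes: Callable[[bytes], List[object]],
    read_program_bytes: Callable[[object], Optional[bytes]],
) -> List[object]:
    """Choose the extractors that feed the limitation checks."""
    if ghidra_info is None:
        # regular input file: keep the existing behaviour
        return extractors_from_path()
    data = read_program_bytes(ghidra_info)
    if not data:
        # bytes unavailable: same outcome as the PR, the checks are skipped
        return []
    # bytes recovered from the project: the checks can still run
    return extractors_from_bytes(data)
```

Whether the original bytes can be recovered from a Ghidra program depends on how it was imported, so the empty-list fallback would likely still be needed.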
@mike-hunhoff Could you please review this PR?
This PR adds support for analyzing existing Ghidra projects directly using the existing input_file argument.
Users can now provide input in the format:
/path/to/project.gpr:folder/program
Motivation & Context
Currently, the Ghidra backend always creates a new temporary project and re-analyzes the binary.
This change enables reuse of already-analyzed projects, improving performance and usability.
Implementation Details
Tests
Added tests for:
Closes #3004
Checklist
We can include a small section about this feature though. Feel free to share your opinions.
AI Usage Disclosure
Parts of this implementation were written with the help of AI tools (GitHub Copilot, ChatGPT).
AI was used for:
All code was reviewed, modified and tested manually before submission.