dynamic: Time Travel Debugging (TTD) integration #1649

Open
atxr opened this issue Jul 18, 2023 · 10 comments
Labels
enhancement New feature or request


@atxr

atxr commented Jul 18, 2023

Summary

Develop a TTD extractor, and add keywords to the rules, so that capa can use trace files generated by TTD to improve its dynamic analysis and defeat packers.

Motivation

Since capa is developing dynamic analysis features, I would like to suggest using Microsoft TTD. With TTD, you can generate trace files that record the state of the program at every instruction.
I could develop a TTD extractor that would add new features to capa from the trace file.

In the end, one could scan a binary sample together with its TTD trace and use new rules to select the positions in the trace at which capa should run its analysis.
The TTD extractor would need time indicators to know when to scan along the timeline. These indicators can be TTD cursors (time positions) or functions that would be hooked in the trace.
The extractor would also require a memory range to scan. This enables several optimizations, such as scanning only the heap, the module memory, or the stack.
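To make the memory-range idea concrete, here is a minimal sketch of how an extractor might restrict scanning to the region kinds named in a rule (for example `["heap", "module"]`). `RegionKind`, `MemoryRegion`, and `select_regions` are hypothetical names for illustration; they are not part of capa or of any TTD binding.

```python
from dataclasses import dataclass
from enum import Enum


class RegionKind(Enum):
    # Hypothetical classification of memory regions at a given trace position.
    HEAP = "heap"
    MODULE = "module"
    STACK = "stack"
    OTHER = "other"


@dataclass(frozen=True)
class MemoryRegion:
    # Hypothetical region descriptor: start address, size in bytes, and kind.
    base: int
    size: int
    kind: RegionKind


def select_regions(regions, wanted):
    """Keep only regions whose kind name appears in `wanted`, e.g. ["heap", "module"].

    An unknown name raises ValueError, so typos in a rule fail loudly.
    """
    wanted_kinds = {RegionKind(name) for name in wanted}
    return [r for r in regions if r.kind in wanted_kinds]
```

Filtering before scanning keeps the amount of memory handed to the feature extractor small, which matters when the same ranges are re-read at many time positions.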

Here is a quick look at what a rule could look like:

...
- ttd:
  - time:
    - cursor: ["100:0", "200:a"]                                       # provide hardcoded time position
    - hook: ["ntdll!NtCreateThreadEx", "ntdll!NtCreateUserProcess"]    # provide functions to hook
  - memory: ["heap", "module"]                                         # select memory ranges to scan
...

This rule tries to detect thread and process creation, and scans the heap at these time positions to search for shellcode that a packer could have loaded, or for useful strings loaded dynamically.

Alternative projects

I'm currently working on https://github.com/airbus-cert/yara-ttd, which aims to apply yara rules to TTD trace files using these TTD bindings.
The tool already works and has many use cases when applying yara rules to packed binaries.

I also looked at the dynamic-feature-extraction branch you are working on to integrate CAPE.
The TTD integration could work alongside this project to provide a more precise analysis and an in-depth dynamic memory scan.


I haven't started implementing the extractor yet because I wanted some validation from the capa team first. Of course, any kind of advice is welcome!

@williballenthin
Collaborator

hey @atxr!

I am very keen on integrating TTD with capa. Like you said, the technology might make it easier to analyze samples that are packed.

It's an interesting idea to specify a collection of hooks/events at which point to analyze the state of the process and find capabilities. This makes me think of using capa against a memory dump (a good idea, but something we don't have today). So, I think this is feasible, and would enable some complementary enhancements.

I'm not yet convinced by the proposal to extend the rule format to specify the hook locations. This sort of thing seems orthogonal to the description of a capability. That is, the author describing how to find browser cookie stealing behavior shouldn't have to be an expert in VMProtect and decide where to hook. Instead, I'd recommend that this either be provided as a CLI argument (in the case of the TTD cursor ID) or fall back to some reasonable defaults (like ExitProcess, WriteFile, etc.).

For awareness, I had previously been thinking that we'd use TTD as a sort of sandbox that we can use to capture the API trace and feed into the dynamic analyzer that @yelhamer is working on. I think this idea can be independent of what you propose here and we should explore both.

@mr-tz
Collaborator

mr-tz commented Jul 18, 2023

I think capa + TTD could be amazing! We've also discussed this before with @xusheng6, so tagging him here.

@atxr
Author

atxr commented Jul 19, 2023

Thank you for your feedback! I totally agree with your proposal!

To summarize this TTD integration:

  • It should extend capa to analyze memory dumps based on default hooks
  • These hooks/positions should be configurable via extra command-line arguments in capa

In the end, it should look like:

capa sample.exe sample.run                                       # analyze TTD trace sample.run with the default hooks
capa sample.exe sample.run --ttd-hook ntdll!NtCreateUserProcess  # specify a hook
capa sample.exe sample.run --ttd-cursor 100:1A                   # specify a cursor position
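The proposed command lines could be parsed roughly as in the sketch below. The `--ttd-hook` and `--ttd-cursor` flags are only the proposal from this thread, not existing capa options, and `DEFAULT_HOOKS`, `parse_cursor`, `parse_hook`, and `resolve_time_points` are hypothetical helpers. It assumes TTD positions use WinDbg's `sequence:steps` syntax with both halves in hexadecimal, and that hooks are written as `module!function`.

```python
import argparse

# Hypothetical defaults, loosely following the suggestion above of hooking
# functions like ExitProcess and WriteFile. Not part of capa.
DEFAULT_HOOKS = ["kernel32!ExitProcess", "kernel32!WriteFile"]


def parse_cursor(text):
    """Parse a TTD position like "100:1A" into (sequence, steps).

    Assumes both halves are hexadecimal, as in WinDbg's position syntax.
    """
    sequence, sep, steps = text.partition(":")
    if not sep or not sequence or not steps:
        raise argparse.ArgumentTypeError(f"invalid TTD position: {text!r}")
    try:
        return int(sequence, 16), int(steps, 16)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid TTD position: {text!r}") from None


def parse_hook(text):
    """Split a "module!function" hook spec into (module, function)."""
    module, sep, function = text.partition("!")
    if not sep or not module or not function:
        raise argparse.ArgumentTypeError(f"invalid hook spec: {text!r}")
    return module.lower(), function


def build_parser():
    parser = argparse.ArgumentParser(prog="capa")
    parser.add_argument("sample")
    parser.add_argument("trace", help="TTD trace file (.run)")
    # action="append" lets each flag be repeated to collect several hooks/cursors.
    parser.add_argument("--ttd-hook", action="append", type=parse_hook, default=[])
    parser.add_argument("--ttd-cursor", action="append", type=parse_cursor, default=[])
    return parser


def resolve_time_points(args):
    """Use the default hooks only when neither hooks nor cursors were given."""
    hooks = args.ttd_hook
    if not hooks and not args.ttd_cursor:
        hooks = [parse_hook(h) for h in DEFAULT_HOOKS]
    return hooks, args.ttd_cursor
```

Validating the hook and cursor syntax at argument-parsing time means a malformed position is reported before the (potentially large) trace is opened. In an interactive bash shell, an argument like `ntdll!NtCreateUserProcess` may need single quotes to avoid history expansion.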

I have a few questions about the implementation, though:

  • Should I create a new TTD feature extractor like @yelhamer did for CAPE?
  • If so, should I base my PR on master or on the dynamic-feature-extraction branch? Even though CAPE and TTD aren't linked, I saw you've discussed and worked a lot on how to integrate these dynamic features in this branch, and I was wondering whether some code in this branch could be necessary for me.

I'm still familiarizing myself with the project to figure out how I'll integrate TTD, so I might come back with other questions soon 🙂

@williballenthin
Collaborator

williballenthin commented Jul 19, 2023

To summarize this TTD integration:

  • It should extend capa to analyze memory dumps based on default hooks
  • These hooks/positions should be configurable via extra command-line arguments in capa

Both of these sound great, and so do the proposed command lines.

Should I create a new TTD feature extractor like @yelhamer did for CAPE?

I think it should look more like the Binary Ninja feature extractor that @xusheng6 added in #1343. That's because I'd recommend that you focus on static analysis of memory snapshots, not dynamic analysis of API traces. Conceptually, capa static analysis is about things like functions/basic blocks/instructions, while its (proposed) dynamic analysis is about things like API calls found in threads and processes. In this issue, let's focus on static analysis of memory snapshots derived from TTD traces. I'll open another issue (#1655) to track the use of TTD for dynamic analysis. If you'd prefer to work on that feature, no problem! (Though I'd suggest we wait until the CAPE implementation is done and lessons are learned.)

Given that the idea is to have capa analyze snapshots of memory at specific points of time in TTD traces, I wonder if we can start by building:

  1. TTD memory snapshot exporter: given a TTD trace and cursor position (or later, hook specification), write a memory snapshot(s) and metadata to a file(s). Ideally we could use a common format, like minidump or similar, but not required.
  2. A memory snapshot feature extractor for capa (see #1654: static analysis of memory dumps to find capabilities).
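As a rough illustration of the hand-off between the exporter (1) and the feature extractor (2), the sketch below defines a hypothetical snapshot container and a simple ad hoc file layout: a little-endian 32-bit header length, a JSON header with the TTD position, metadata, and region table, then the raw region bytes. This is not minidump, which the thread names as the preferred common format; `Snapshot`, `dump`, and `load` are illustrative names only.

```python
import json
import struct
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    # Hypothetical container for one memory snapshot taken at a TTD position.
    position: str                                  # e.g. "100:1A"
    regions: dict = field(default_factory=dict)    # base address -> region bytes
    metadata: dict = field(default_factory=dict)   # e.g. which hook triggered it


def dump(snapshot: Snapshot) -> bytes:
    """Serialize as: u32 header length, JSON header, then region bytes in header order."""
    bases = sorted(snapshot.regions)
    header = {
        "position": snapshot.position,
        "metadata": snapshot.metadata,
        "regions": [{"base": b, "size": len(snapshot.regions[b])} for b in bases],
    }
    header_bytes = json.dumps(header).encode("utf-8")
    body = b"".join(snapshot.regions[b] for b in bases)
    return struct.pack("<I", len(header_bytes)) + header_bytes + body


def load(data: bytes) -> Snapshot:
    """Inverse of dump(): read the header, then slice out each region in order."""
    (header_len,) = struct.unpack_from("<I", data, 0)
    header = json.loads(data[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    regions = {}
    for entry in header["regions"]:
        regions[entry["base"]] = data[offset:offset + entry["size"]]
        offset += entry["size"]
    return Snapshot(header["position"], regions, header["metadata"])
```

Keeping the metadata (position, triggering hook) next to the bytes lets a capa extractor report where in the trace a capability was found, and the same container could in principle be filled from sources other than TTD, such as sandboxes.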

These could be built in parallel as temporarily separate utilities. Then we can wire 1 and 2 within capa and add the CLI arguments, etc. The benefit is that we might also be able to provide memory snapshots from other systems, like sandboxes, which would be neat. I also suspect the TTD memory snapshot exporter might be generally useful for other things like dumping unpacked executables.

I'm just brainstorming here. What do you think?

@williballenthin
Collaborator

added #1654 to track static analysis of memory snapshots

@williballenthin
Collaborator

added #1655 to track dynamic analysis via TTD traces.

@atxr
Author

atxr commented Jul 19, 2023

First of all, I think your ideas are really great!

  1. TTD memory snapshot exporter: given a TTD trace and cursor position (or later, hook specification), write a memory snapshot(s) and metadata to a file(s). Ideally we could use a common format, like minidump or similar, but not required.

Just to be sure, should this snapshot exporter be part of capa or should it be a dependency that I could develop in another repo?
For the second point, I'll start by looking at the BN feature extractor to understand it better.

@williballenthin
Collaborator

should this snapshot exporter be part of capa or should it be a dependency that I could develop in another repo?

I think this can be up to you. If you can find other consumers for the library, then maybe it makes sense to be external. Or maybe capa is just a good central place to store and distribute this. shrug. It's also fine to start in your own repo and then merge into capa when you're happy.

For the second point, I'll start by looking at the BN feature extractor to understand it better.

Great!

I'm also going to do a bit of background research on what we'd need to implement a memory snapshot feature extractor. At least so I can talk intelligently with you about it, and/or to write code alongside you :-)

@williballenthin williballenthin added the enhancement New feature or request label Jul 19, 2023
@atxr
Author

atxr commented Jul 19, 2023

Awesome! Then I'll start with an external repo and decide later whether it makes sense to merge it into capa!
I saw your links in #1654; I'll take a look!
Thanks again for your interest in this feature!

@N3mes1s

N3mes1s commented Feb 2, 2024

FYI, I think you could use the programmatic API to instrument and run capa.

TTD live recorder API sample
This is a sample demonstrating how a program can use TTD's live recording API to record portions of itself.

https://github.com/microsoft/WinDbg-Samples/tree/master/TTD/LiveRecorderApiSample


4 participants