performance issues with some samples #1989

Open
5 tasks done
williballenthin opened this issue Feb 15, 2024 · 6 comments
Labels
bug (Something isn't working), gsoc (Work related to Google Summer of Code project.), performance (Related to capa's performance)

Comments

@williballenthin
Collaborator

williballenthin commented Feb 15, 2024

Investigate CPU and memory usage for the following samples. If it's something we're doing wrong, let's optimize that behavior. If it's an issue with viv or another dependency, perhaps we can introduce heuristics to detect difficult samples and bail early (opt-in).

Tasks

consolidated takeaways:

  • don't disassemble huge code sections (few MB)
  • don't try to load samples with huge data sections (dozens of MBs)
  • don't analyze packed samples (sections with entropy ~8.0)
  • disable vivisect.analysis.generic.symswitchcase function analysis module

in #1499 and #1500 we discuss adding a section scope and associated features. these could be used to match the first three points above. or, we could hardcode the logic into the viv workspace loader and have it raise an exception.
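
The "raise an exception" option for the first three takeaways could look roughly like the sketch below. It assumes the section metadata has already been parsed into a simple record. `SectionInfo`, `DifficultSampleError`, `check_sections`, and all three thresholds are hypothetical names and values chosen for illustration; none of them exist in capa or vivisect, and the thread only gives the rough magnitudes ("few MB", "dozens of MBs", "entropy ~8.0").

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical thresholds, loosely based on the magnitudes mentioned above.
MAX_CODE_SIZE = 4 * 1024 * 1024    # "few MB" of code
MAX_DATA_SIZE = 32 * 1024 * 1024   # "dozens of MBs" of data
PACKED_ENTROPY = 7.9               # "entropy ~8.0"


class DifficultSampleError(ValueError):
    """Hypothetical exception: the sample looks too expensive to analyze."""


@dataclass
class SectionInfo:
    name: str
    virtual_size: int
    is_executable: bool
    entropy: float  # bits per byte, 0.0 to 8.0


def check_sections(sections: Iterable[SectionInfo]) -> None:
    """Raise DifficultSampleError if any section matches one of the takeaways."""
    for s in sections:
        if s.is_executable and s.virtual_size > MAX_CODE_SIZE:
            raise DifficultSampleError(f"{s.name}: huge code section ({s.virtual_size:#x} bytes)")
        if not s.is_executable and s.virtual_size > MAX_DATA_SIZE:
            raise DifficultSampleError(f"{s.name}: huge data section ({s.virtual_size:#x} bytes)")
        # Only executable sections are checked here; compressed resources in
        # otherwise normal files can also have high entropy.
        if s.is_executable and s.entropy >= PACKED_ENTROPY:
            raise DifficultSampleError(f"{s.name}: likely packed (entropy {s.entropy:.2f})")
```

Since the behavior is meant to be opt-in, a caller would run this check only when the user asks for it and report the exception message instead of starting analysis.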

@williballenthin williballenthin added the bug Something isn't working label Feb 15, 2024
@mr-tz
Collaborator

mr-tz commented Feb 15, 2024

on first brief glance...

39a91796fafe9d2efc2cea0de239179a3a2d406ea482af310710e6f5fed00083 hangs early:

...
DEBUG:viv_utils.flirt:found library function: 0x10481ff0: ?
DEBUG:viv_utils.flirt:found library function: 0x10482000: ?
DEBUG:viv_utils.flirt:found library function: 0x1049ae50: ?

and it's similar for 359f1f07a9d037c5d4ab95e56285d46c0c106a970235bbbcacdf06851626fabd

@williballenthin
Collaborator Author

williballenthin commented Feb 15, 2024

39a91796fafe9d2efc2cea0de239179a3a2d406ea482af310710e6f5fed00083 avfilter-7.dll

Size
6.77 MB

like @mr-tz mentioned, loading the workspace is taking a long time:

image
stuck here for seconds/minutes.

note that this is not a dedicated FLIRT matching phase; FLIRT matching happens while the workspace is loaded, and the stack trace below shows it's not an issue with python-flirt.

CPU is pegged and RAM is growing:
image

stack trace at time of kill:

^CTraceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "...capa/capa/main.py", line 965, in <module>
    sys.exit(main())
             ^^^^^^
  File "...capa/capa/main.py", line 852, in main
    extractor = get_extractor_from_cli(args, input_format, backend)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...capa/capa/main.py", line 755, in get_extractor_from_cli
    return capa.loader.get_extractor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...capa/capa/loader.py", line 254, in get_extractor
    vw = get_workspace(input_path, input_format, sigpaths)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...capa/capa/loader.py", line 160, in get_workspace
    vw.analyze()
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/__init__.py", line 819, in analyze
    mod.analyze(self)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/analysis/generic/relocations.py", line 18, in analyze
    vw.makePointer(va, follow=True)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/__init__.py", line 2107, in makePointer
    self.followPointer(tova)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/__init__.py", line 780, in followPointer
    self.makeFunction(va, arch=arch)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/__init__.py", line 1552, in makeFunction
    realfva = self.cfctx.addEntryPoint(va, arch=arch)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/envi/codeflow.py", line 294, in addEntryPoint
    self._cb_function(va, {'CallsFrom': calls_from})
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/base.py", line 819, in _cb_function
    vw.analyzeFunction(fva)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/__init__.py", line 832, in analyzeFunction
    fmod.analyzeFunction(self, fva)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/analysis/i386/calling.py", line 137, in analyzeFunction
    emu.runFunction(fva, maxhit=1)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/impemu/emulator.py", line 491, in runFunction
    self.executeOpcode(op)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/envi/archs/i386/emu.py", line 255, in executeOpcode
    newpc = meth(op)
            ^^^^^^^^
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/envi/archs/i386/emu.py", line 722, in i_call
    self.doPush(saved)
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/envi/archs/i386/emu.py", line 407, in doPush
    esp = self.getRegister(REG_ESP)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/impemu/platarch/i386.py", line 28, in getRegister
    rval = value = e_i386.IntelEmulator.getRegister(self, index)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...capa/.direnv/python-3.11/lib/python3.11/site-packages/envi/registers.py", line 295, in getRegister
    def getRegister(self, index):

It looks to me like viv is taking a really long time to analyze this sample. If there are MBs of code, then this is a reasonable outcome.

Binary Ninja takes 208 seconds to find 12,344 functions over 0x4C0E00 bytes of code (about 5 MB, a lot).


takeaways:

  • don't disassemble huge code sections

@williballenthin
Collaborator Author

williballenthin commented Feb 15, 2024

a0ca23f56230fc857f1246a5f8e9cb4742e90ce78122f7393de00a017028cbbd

https://www.virustotal.com/gui/file/a0ca23f56230fc857f1246a5f8e9cb4742e90ce78122f7393de00a017028cbbd
DaVinci_Deluxe.exe
Size 15.74 MB (!!!)

loads pretty quickly in Binary Ninja, but there are only two local functions.

size of code is 0x583000 bytes (about 5.8 MB), which is very large.

the two huge sections have entropy 8, so this seems mostly encrypted:
image

all sections are RWX:
image

so in summary, there's almost nothing usable here, but viv probably thinks it needs to disassemble 10MB or more.
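
For reference, the entropy figure above is the Shannon entropy of a section's bytes, measured in bits per byte. A minimal sketch of that computation (`shannon_entropy` is a name chosen here for illustration, not a capa or viv function):

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum -p * log2(p) over the byte values that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A section filled with a single byte value scores 0.0, and one in which all 256 byte values are equally frequent scores 8.0, so a value near 8 is what encrypted or compressed content looks like.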


takeaways:

  • don't analyze packed samples
  • don't disassemble huge sections

@williballenthin
Collaborator Author

williballenthin commented Feb 16, 2024

a4f906f671f02b2cec47a8706e8b042f3cea0739dad15f24b92449a932203972

https://www.virustotal.com/gui/file/a4f906f671f02b2cec47a8706e8b042f3cea0739dad15f24b92449a932203972
amd64 ELF for Android
Size 729.45 KB

Binary Ninja loads in about 5 seconds.
408 functions, although I think a lot of analysis is missing.
image

viv is taking a long time to load the workspace:
image

...and mem:
image

initially spends a lot of time (many seconds) running cxxfilt to demangle names, but during this time, CPU/mem usage is low:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "capa/capa/main.py", line 965, in <module>
    sys.exit(main())
             ^^^^^^
  File "capa/capa/main.py", line 852, in main
    extractor = get_extractor_from_cli(args, input_format, backend)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/capa/main.py", line 755, in get_extractor_from_cli
    return capa.loader.get_extractor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/capa/loader.py", line 254, in get_extractor
    vw = get_workspace(input_path, input_format, sigpaths)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/capa/loader.py", line 149, in get_workspace
    vw = viv_utils.getWorkspace(str(path), analyze=False, should_save=False)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/viv_utils/__init__.py", line 117, in getWorkspace
    vw.loadFromFile(fp)
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/__init__.py", line 2824, in loadFromFile
    fname = mod.parseFile(self, filename=filename, baseaddr=baseaddr)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/parsers/elf.py", line 32, in parseFile
    return loadElfIntoWorkspace(vw, elf, filename=filename, baseaddr=baseaddr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/parsers/elf.py", line 494, in loadElfIntoWorkspace
    postfix = applyRelocs(elf, vw, addbase, baseoff)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/parsers/elf.py", line 728, in applyRelocs
    dmglname = demangle(name)
               ^^^^^^^^^^^^^^
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/vivisect/parsers/elf.py", line 973, in demangle
    import cxxfilt
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/cxxfilt/__init__.py", line 39, in <module>
    libc = ctypes.CDLL(find_any_library('c'))
                       ^^^^^^^^^^^^^^^^^^^^^
  File "capa/.direnv/python-3.11/lib/python3.11/site-packages/cxxfilt/__init__.py", line 33, in find_any_library
    lib = ctypes.util.find_library(choice)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/s31jwk4jsiqczzkrd8rcnjrhiyk2z4kf-devshell-dir/lib/python3.11/ctypes/util.py", line 257, in find_library
    _get_soname(_findLib_gcc(name)) or _get_soname(_findLib_ld(name))
                                                   ^^^^^^^^^^^^^^^^^
  File "/nix/store/s31jwk4jsiqczzkrd8rcnjrhiyk2z4kf-devshell-dir/lib/python3.11/ctypes/util.py", line 241, in _findLib_ld
    out, _ = p.communicate()
             ^^^^^^^^^^^^^^^
  File "/nix/store/s31jwk4jsiqczzkrd8rcnjrhiyk2z4kf-devshell-dir/lib/python3.11/subprocess.py", line 1207, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/s31jwk4jsiqczzkrd8rcnjrhiyk2z4kf-devshell-dir/lib/python3.11/subprocess.py", line 2075, in _communicate
    ready = selector.select(timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/s31jwk4jsiqczzkrd8rcnjrhiyk2z4kf-devshell-dir/lib/python3.11/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

when viv is allocating all that memory (which spikes up and down, reaching at least around 100GB), the program doesn't respond to ctrl-c, so I don't have a stack trace yet.

can use py-spy to show the stack trace at this point:

image

so it seems symboliks is taking a lot of memory?

after following the stacktrace a bit, it seems that there are either very complex or very many symbolic expressions being tracked, and this eats time and CPU.

if this is a prevalent bug, then we can look into disabling symboliks. or, we can rely on the user/system to kill capa when it takes too many resources. I don't immediately see any trick for predicting when this will happen.

looks like it's: analyzeFunction (vivisect/analysis/generic/symswitchcase.py:1251)
which is here: https://github.com/vivisect/vivisect/blob/9534f164954bd417767b6a5ac0a6185fd16ed942/vivisect/analysis/generic/symswitchcase.py#L374

looks like this is enabled for ELF, but not PE:
https://github.com/vivisect/vivisect/blob/9534f164954bd417767b6a5ac0a6185fd16ed942/vivisect/analysis/__init__.py#L136

which we could disable with delFuncAnalysisModule: https://github.com/vivisect/vivisect/blob/9534f164954bd417767b6a5ac0a6185fd16ed942/vivisect/__init__.py#L581

image

when this is disabled, analysis completes in a reasonable amount of time.


takeaways:

  • disable vivisect.analysis.generic.symswitchcase function analysis module
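
A rough sketch of how that could be done between loading the workspace and calling `vw.analyze()`. The helper `disable_func_analysis_module` is a hypothetical name. The sketch assumes that the function analysis modules are already registered once the file is loaded, and that `delFuncAnalysisModule` raises when the module is not registered (as for PE, where it is not enabled); both should be checked against the viv source linked above.

```python
SYMSWITCHCASE = "vivisect.analysis.generic.symswitchcase"


def disable_func_analysis_module(vw, modname: str = SYMSWITCHCASE) -> bool:
    """Unregister a function analysis module from a viv workspace.

    Call this after the file is loaded and before vw.analyze().
    Returns True if the module was removed, False if it was not registered.
    """
    try:
        vw.delFuncAnalysisModule(modname)
    except Exception:
        # Assumption: viv raises a plain Exception for an unknown module,
        # e.g. for formats where symswitchcase is not enabled.
        return False
    return True
```

In capa this would presumably sit in `get_workspace`, after `viv_utils.getWorkspace(..., analyze=False, should_save=False)` and before `vw.analyze()`, which are the two calls visible in the stack traces above.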

@williballenthin
Collaborator Author

williballenthin commented Feb 16, 2024

a1c3dcb87b243005ed3bb2b88998adfb54b2cba01d92b401afd99f2027b7ef1e

https://www.virustotal.com/gui/file/a1c3dcb87b243005ed3bb2b88998adfb54b2cba01d92b401afd99f2027b7ef1e
64-bit DLL
Size 447.62 KB

Binary Ninja takes only a few seconds to load.

image

image

no imports or exports.
section names seem weird (after .reloc).
I'm guessing this is a corrupt PE.

oh look at this section:
image

image

that's about 900 MB. and note that the subsequent sections overlap, so it's definitely corrupt. and, if a naive PE loader tries to map this, it will create that 900MB section.

sure enough capa tries to allocate a large amount of memory:
image


takeaways:

  • don't try to load samples with huge sections
  • maybe try to detect corruption and skip that
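
One possible corruption check is to look for sections whose address ranges overlap, as they do in this sample. A minimal sketch, assuming sections are given as `(name, start_address, size)` tuples; `find_overlapping_sections` is a hypothetical helper, not an existing capa or viv API:

```python
from typing import List, Tuple

# (name, start_address, size_in_bytes)
Section = Tuple[str, int, int]


def find_overlapping_sections(sections: List[Section]) -> List[Tuple[str, str]]:
    """Return pairs of section names whose [start, start + size) ranges overlap."""
    # Ignore empty sections and walk the rest in address order.
    ordered = sorted((s for s in sections if s[2] > 0), key=lambda s: s[1])
    overlaps = []
    for i, (name, start, size) in enumerate(ordered):
        end = start + size
        for other_name, other_start, _ in ordered[i + 1:]:
            if other_start >= end:
                break  # sorted by start, so nothing later can overlap either
            overlaps.append((name, other_name))
    return overlaps
```

A loader could treat a non-empty result as a sign of a malformed file and skip it, or at least refuse to map the oversized section.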

@williballenthin
Collaborator Author

williballenthin commented Feb 16, 2024

359f1f07a9d037c5d4ab95e56285d46c0c106a970235bbbcacdf06851626fabd

https://www.virustotal.com/gui/file/359f1f07a9d037c5d4ab95e56285d46c0c106a970235bbbcacdf06851626fabd
Size 92.00 KB
32-bit EXE

there's a weird initial section that (1) overlaps and is therefore invalid, and (2) is huge (1.4GB).
image


takeaways:

  • don't try to load samples with huge sections
  • maybe try to detect corruption and skip that

@mr-tz mr-tz added gsoc Work related to Google Summer of Code project. performance Related to capa's performance labels May 22, 2024