swap out yara pattern scanning for binary regexps #30

williballenthin · 2023-08-03T20:42:22Z

closes #29
closes #23

closes #29

williballenthin · 2023-08-03T21:31:45Z

tests pass when i run them locally.

williballenthin · 2023-08-04T10:41:13Z

unfortunately the performance of the binaryregexp is significantly worse than yara. in the same environment, the test cases on master run in 67s while this PR takes 346s (5x).

only now am i seeing that Go's regular expression implementation is really slow: https://github.com/mariomka/regex-benchmark#performance

there are certainly some optimizations we can make to avoid doing so much regexp scanning; i will think on it and try to propose which strategies we might try first.

stevemk14ebr · 2023-08-04T10:42:56Z

Awe that's too bad to hear. Unfortunately I switched to yara originally for performance reasons so this makes some sense.

try to improve performance by doing a single pass over the data, looking for any of the pattern candidates, and fetching these via the SubexpIndex functionality.

williballenthin · 2023-08-04T14:26:34Z

initially in this PR i was lazy with the regular expressions: i ran five expressions over the data separately, one for each pattern: 6439aec#diff-5c52596a150c1cf395c3609f471a82dbdeca26bccb840730182b301007cf2699R87

i assumed this was an obvious place to optimize, so i put all five patterns into a single pattern like (a|b|c|...) and used named groups to fetch the results. i figured this would be faster because it scans the data only once:

GoReSym/objfile/scanner.go

Line 96 in d09fb19

m := g.FindAllIndex(data, -1)

surprisingly, the performance doesnt really change much. perhaps the regex engine does 5x as much work during the merged pattern scan? ugh.

the profile data looks like this, currently:


go test -v -run "^TestAllVersions/117"
@williballenthin ➜ /workspaces/GoReSym (master-perf) $ go test -v -run "^TestAllVersions/117"

--- PASS: TestAllVersions (4.46s)

--- PASS: TestAllVersions/117/testproject_lin (0.91s)

--- PASS: TestAllVersions/117/testproject_lin_32 (0.62s)

--- PASS: TestAllVersions/117/testproject_lin_stripped (0.43s)

--- PASS: TestAllVersions/117/testproject_lin_stripped_32 (0.34s)

--- PASS: TestAllVersions/117/testproject_mac (0.87s)

--- PASS: TestAllVersions/117/testproject_mac_stripped (0.48s)

--- PASS: TestAllVersions/117/testproject_win_32.exe (0.23s)

--- PASS: TestAllVersions/117/testproject_win_stripped_32.exe (0.10s)

--- PASS: TestAllVersions/117/testproject_win_stripped.exe (0.12s)

--- PASS: TestAllVersions/117/testproject_win.exe (0.20s)

PASS

ok      github.com/mandiant/GoReSym     4.705s
@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ pprof -cum -top cpu-master-perf.pprof

File: GoReSym.test

Build ID: bd44f557032899b425c68f2041eee53bf543b181

Type: cpu

Time: Aug 4, 2023 at 2:11pm (UTC)

Duration: 4.45s, Total samples = 4.71s (105.93%)

Showing nodes accounting for 4.30s, 91.30% of 4.71s total

Dropped 129 nodes (cum <= 0.02s)

flat  flat%   sum%        cum   cum%

0     0%     0%      3.91s 83.01%  github.com/mandiant/GoReSym.TestAllVersions.func1

0     0%     0%      3.91s 83.01%  github.com/mandiant/GoReSym.main_impl

0     0%     0%      3.91s 83.01%  testing.tRunner

0     0%     0%         2s 42.46%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable

0     0%     0%         2s 42.46%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)

0     0%     0%      1.42s 30.15%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable

0     0%     0%      1.42s 30.15%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable

1.25s 26.54% 26.54%      1.26s 26.75%  runtime.cgocall

0     0% 26.54%      1.17s 24.84%  github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader

0     0% 26.54%      1.11s 23.57%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection

1.10s 23.35% 49.89%      1.10s 23.35%  runtime.memmove

0     0% 49.89%      1.05s 22.29%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln

0     0% 49.89%      1.05s 22.29%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan

0     0% 49.89%      1.01s 21.44%  runtime.growslice

0     0% 49.89%      0.81s 17.20%  runtime.systemstack

0     0% 49.89%      0.80s 16.99%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan

0     0% 49.89%      0.76s 16.14%  github.com/mandiant/GoReSym/debug/macho.(*File).DataAfterSection

0.01s  0.21% 50.11%      0.68s 14.44%  runtime.mallocgc

0     0% 50.11%      0.62s 13.16%  github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan

0     0% 50.11%      0.58s 12.31%  internal/poll.(*FD).Pread

0     0% 50.11%      0.58s 12.31%  io.(*SectionReader).ReadAt

0     0% 50.11%      0.58s 12.31%  os.(*File).ReadAt

0     0% 50.11%      0.58s 12.31%  os.(*File).pread (inline)

0     0% 50.11%      0.56s 11.89%  syscall.Pread (inline)

0.01s  0.21% 50.32%      0.56s 11.89%  syscall.Syscall6

0     0% 50.32%      0.56s 11.89%  syscall.pread

0.54s 11.46% 61.78%      0.54s 11.46%  runtime/internal/syscall.Syscall6

0     0% 61.78%      0.53s 11.25%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln

0     0% 61.78%      0.53s 11.25%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan

0     0% 61.78%      0.53s 11.25%  syscall.RawSyscall6

0.49s 10.40% 72.19%      0.49s 10.40%  runtime.memclrNoHeapPointers

0     0% 72.19%      0.48s 10.19%  github.com/hillu/go-yara/v4.(*Compiler).AddString

0     0% 72.19%      0.48s 10.19%  runtime.gcBgMarkWorker

0     0% 72.19%      0.47s  9.98%  github.com/hillu/go-yara/v4.(*Compiler).AddString.func4

0     0% 72.19%      0.47s  9.98%  github.com/hillu/go-yara/v4._Cfunc_yr_compiler_add_string

0     0% 72.19%      0.46s  9.77%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType

0.02s  0.42% 72.61%      0.46s  9.77%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl

0     0% 72.61%      0.45s  9.55%  runtime.gcBgMarkWorker.func2

0     0% 72.61%      0.45s  9.55%  runtime.gcDrain

0     0% 72.61%      0.44s  9.34%  runtime.makeslice

0     0% 72.61%      0.43s  9.13%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks

0     0% 72.61%      0.43s  9.13%  github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)

0     0% 72.61%      0.38s  8.07%  github.com/hillu/go-yara/v4.(*Compiler).GetRules

0     0% 72.61%      0.38s  8.07%  github.com/hillu/go-yara/v4.(*Compiler).GetRules.func1

0     0% 72.61%      0.38s  8.07%  github.com/hillu/go-yara/v4._Cfunc_yr_compiler_get_rules

0     0% 72.61%      0.38s  8.07%  github.com/mandiant/GoReSym/debug/elf.(*Section).Data

0     0% 72.61%      0.34s  7.22%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln

0     0% 72.61%      0.34s  7.22%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan

0     0% 72.61%      0.31s  6.58%  runtime.markroot

0     0% 72.61%      0.30s  6.37%  github.com/hillu/go-yara/v4.(*Scanner).ScanMem

0     0% 72.61%      0.30s  6.37%  github.com/hillu/go-yara/v4.(*Scanner).ScanMem.func2

0     0% 72.61%      0.30s  6.37%  github.com/hillu/go-yara/v4._Cfunc_yr_scanner_scan_mem

...
@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ go test -v -run "^TestAllVersions/117"

--- PASS: TestAllVersions (32.13s)

--- PASS: TestAllVersions/117/testproject_lin (8.75s)

--- PASS: TestAllVersions/117/testproject_lin_32 (6.06s)

--- PASS: TestAllVersions/117/testproject_lin_stripped (2.32s)

--- PASS: TestAllVersions/117/testproject_lin_stripped_32 (1.64s)

--- PASS: TestAllVersions/117/testproject_mac (7.23s)

--- PASS: TestAllVersions/117/testproject_mac_stripped (3.40s)

--- PASS: TestAllVersions/117/testproject_win_32.exe (0.76s)

--- PASS: TestAllVersions/117/testproject_win_stripped_32.exe (0.53s)

--- PASS: TestAllVersions/117/testproject_win_stripped.exe (0.58s)

--- PASS: TestAllVersions/117/testproject_win.exe (0.84s)

PASS

ok      github.com/mandiant/GoReSym     32.133s
@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ pprof -cum -top cpu-issue-29.pprof

File: GoReSym.test

Type: cpu

Time: Aug 4, 2023 at 2:14pm (UTC)

Duration: 30.44s, Total samples = 30.30s (99.55%)

Showing nodes accounting for 29.04s, 95.84% of 30.30s total

Dropped 226 nodes (cum <= 0.15s)

flat  flat%   sum%        cum   cum%

0     0%     0%     29.37s 96.93%  github.com/mandiant/GoReSym.TestAllVersions.func1

0     0%     0%     29.37s 96.93%  github.com/mandiant/GoReSym.main_impl

0     0%     0%     29.37s 96.93%  testing.tRunner

0     0%     0%     27.58s 91.02%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable

0     0%     0%     27.58s 91.02%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)

0     0%     0%     27.14s 89.57%  github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader

0     0%     0%     27.12s 89.50%  github.com/mandiant/GoReSym/objfile.(*BinaryRegexpGroup).FindAllIndex (inline)

0     0%     0%     27.12s 89.50%  rsc.io/binaryregexp.(*Regexp).FindAllIndex

0     0%     0%     27.12s 89.50%  rsc.io/binaryregexp.(*Regexp).allMatches

0     0%     0%     27.12s 89.50%  rsc.io/binaryregexp.(*Regexp).doExecute

1.02s  3.37%  3.37%     27.12s 89.50%  rsc.io/binaryregexp.(*machine).match

17.70s 58.42% 61.78%     21.56s 71.16%  rsc.io/binaryregexp.(*machine).add

0     0% 61.78%     15.55s 51.32%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln

0     0% 61.78%     15.55s 51.32%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan

0     0% 61.78%      9.44s 31.16%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln

0     0% 61.78%      9.44s 31.16%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan

4.30s 14.19% 75.97%      4.39s 14.49%  rsc.io/binaryregexp.(*machine).step

3.87s 12.77% 88.75%      3.87s 12.77%  runtime.memmove

0     0% 88.75%      2.52s  8.32%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln

0     0% 88.75%      2.51s  8.28%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan

0     0% 88.75%      1.33s  4.39%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable

0     0% 88.75%      1.33s  4.39%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable

0     0% 88.75%      1.03s  3.40%  runtime.systemstack

0     0% 88.75%      0.91s  3.00%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection

0     0% 88.75%      0.83s  2.74%  runtime.growslice

0     0% 88.75%      0.82s  2.71%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan

0.80s  2.64% 91.39%      0.81s  2.67%  rsc.io/binaryregexp.(*machine).alloc (inline)

0.03s 0.099% 91.49%      0.66s  2.18%  runtime.mallocgc

...

williballenthin · 2023-08-04T14:34:32Z

one thing thats suspicious is how often the scanner routine is called:

@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ go test -v -run "^TestAllVersions/117/testproject_lin$"
findModuleInitPCHeader: len(data)= 1977516
findModuleInitPCHeader: len(data)= 1463668
findModuleInitPCHeader: len(data)= 1245252
findModuleInitPCHeader: len(data)= 1244874
findModuleInitPCHeader: len(data)= 1243622
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 880038
findModuleInitPCHeader: len(data)= 880006
findModuleInitPCHeader: len(data)= 812998
findModuleInitPCHeader: len(data)= 782262
findModuleInitPCHeader: len(data)= 589998
findModuleInitPCHeader: len(data)= 568678
findModuleInitPCHeader: len(data)= 568397
findModuleInitPCHeader: len(data)= 457639
findModuleInitPCHeader: len(data)= 436120
findModuleInitPCHeader: len(data)= 436078
findModuleInitPCHeader: len(data)= 234232
findModuleInitPCHeader: len(data)= 131110
findModuleInitPCHeader: len(data)= 95205
findModuleInitPCHeader: len(data)= 95105
findModuleInitPCHeader: len(data)= 45065
findModuleInitPCHeader: len(data)= 1977516
findModuleInitPCHeader: len(data)= 1463668
findModuleInitPCHeader: len(data)= 1245252
findModuleInitPCHeader: len(data)= 1244874
findModuleInitPCHeader: len(data)= 1243622
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 880038
findModuleInitPCHeader: len(data)= 880006
findModuleInitPCHeader: len(data)= 812998
findModuleInitPCHeader: len(data)= 782262
findModuleInitPCHeader: len(data)= 589998
findModuleInitPCHeader: len(data)= 568678
findModuleInitPCHeader: len(data)= 568397
findModuleInitPCHeader: len(data)= 457639
findModuleInitPCHeader: len(data)= 436120
findModuleInitPCHeader: len(data)= 436078
findModuleInitPCHeader: len(data)= 234232
findModuleInitPCHeader: len(data)= 131110
findModuleInitPCHeader: len(data)= 95205
findModuleInitPCHeader: len(data)= 95105
findModuleInitPCHeader: len(data)= 45065
--- PASS: TestAllVersions (7.64s)
    --- PASS: TestAllVersions/117/testproject_lin (7.48s)
PASS
ok      github.com/mandiant/GoReSym     7.658s

for the single ELF file the scanner is invoked 44 times. perhaps we can invoke the scanner once across the entire file and re-use the results, since they shouldn't change (GoReSym doesn't change the input file).

findModuleInitPCHeader is called once in (f *elfFile) pcln_scan() once for each section as:

for _, sec := range f.elf.Sections {
    ...
    data := f.elf.DataAfterSection(sec)
    ...
    sigResults := findModuleInitPCHeader(data, sec.Addr)
}

pcln_scan is called once by (f *elfFile) pcln()
is called once by (e *Entry) PCLineTable(versionOverride string, knownGoTextBase uint64)
is called once by (f *File) PCLineTable(versionOverride string, knownGoTextBase uint64)
is called in main_impl potentially many times as the real text base is recovered.

so there are two things to do:

call the scan once for all the file data, and then sub-select the results per section, as necessary, and
scan for pcln candidates once, since they don't depend on the text base address

scan file data once

this will take a little work, since some of the pointers that are recovered are relative to the section being scanned (... + sectionBase), so if we scan all data at once, we'll have to fix up this pointers afterwards.

scan pcln candidates once

the pcln candidate scan doesn't depend on the Go text base address, so when the scan is restarted with a valid Go text base address, we should re-use the results from before. we can cache these in the Entry, perhaps.

GoReSym/objfile/objfile.go

Line 218 in ad76d74

candidates, err := e.raw.pcln()

in ebc220a we implement this caching and find that the runtime improves by 50%, as expected!

stevemk14ebr · 2023-08-10T12:03:58Z

build_all.sh

Please restore this, we don't need the CGO linking lines anymore but this exists so that I can automatically cross-compile all versions of GoReSym for all platforms in one command when it's time to cut releases.

This is the only change required, looks great otherwise. Thanks for exploring the removal of yara, I'm happy with the result here :)

ah, ok, sure no problem!

build_test_files.sh

objfile/patterns.go

objfile/patterns_test.go

objfile/scanner.go

williballenthin · 2023-08-10T12:59:17Z

this PR is still too slow to merge. i'm going to work on performance optimizations before i recommend that this be merged.

i expect to be able to optimize the outer scanning loops such that the net performance change here is within 50% of today's running time. the PR is around 7x slower right now. regex scanning will remain slow, nothing we can do about that, but we'll just scan much less data. the result is a GoReSym of similar performance but no linking difficulties.

i'll do most of the perf work in other branches so as not to confuse the diff here.

stevemk14ebr · 2023-08-10T16:37:01Z

Sounds good. Let me know if you'd like to discuss this first before you put too much work in. It's possible we may be able to re-visit the core algorithm itself. I've focused on correctness so far, optimizations have been mostly ignored till now except for when necessary to keep performance acceptable.

Not to mention it's spaghettified a little as samples exercising new edge cases have been found and fixed.

williballenthin · 2023-08-10T17:36:43Z

great, will do. i had thought i would have time immediately but there's a lot of GSoC stuff to close out. will do this as soon as i can since i think this will help VT.

stevemk14ebr · 2023-08-13T13:24:50Z

I had an idea for a possible optimization we could try. I read a blog by burntsushi a while ago on techniques he utilized for efficient regex searching in his library. One of the simple techniques used was to not start a regex scan until after first performing a fast prefix scan to create a candidate match set, we should try this. We could adapt the yara to regex routine to extract the 'longest fixed sequence' of bytes per pattern and scan over all file bytes once per pattern matching this sequence, and return a candidate regex match of length [-len(regex), len(regex)]. We then perform binary regex over these candidates. We use a window of negative length to support when the 'longest fixed sequence' is in the middle of a pattern rather than the start.

This makes an assumption that the binary regex crate does not already do this, or even if it does then the second assumption that go's memcmp would be faster than the compare the regex crate does over enough data.

A second issue we have is that we have multiple patterns. A hopefully easy solution would be to somehow reliably determine which arch (independent of os) a binary is using and only scan for that pattern. If we cannot do that reliably then we will need to experiment on if we should do multiple candidate regex scans over the data per signature or scan once with multiple comparisons at each step.

Thoughts?

williballenthin · 2023-08-13T18:28:23Z

use only candidate arch patterns - yup, agree.
use pre scan to identify candidate matches - yup, also agree.

For (2) we know from one of the other PRs that Go's string.Index() routine is very fast, so I do expect this to be a lot faster than the regex logic.

When making changes in this part of the code I don't feel very confident because I'm not sure if any test cases cover these scenarios. Can you confirm and/or can we find a few more sample files to exercise the pattern scanning functionality?

stevemk14ebr · 2023-08-13T18:32:46Z

This feature exists mostly to combat garble. I generate test binaries with garble build -ldflags="-w -s"

stevemk14ebr · 2023-08-14T19:17:31Z

I've implemented the needle optimization, it seems to have large increases to performance based on limited empirical testing 😄

Need to do more testing and benchmarking still but seems like a win!

File: GoReSym.test
Type: cpu
Time: Aug 14, 2023 at 5:04pm (EDT)
Duration: 33.84s, Total samples = 36.99s (109.31%)
Showing nodes accounting for 30.44s, 82.29% of 36.99s total
Dropped 421 nodes (cum <= 0.18s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     22.80s 61.64%  github.com/mandiant/GoReSym.TestAllVersions.func1
         0     0%     0%     22.80s 61.64%  github.com/mandiant/GoReSym.main_impl
         0     0%     0%     22.80s 61.64%  testing.tRunner
         0     0%     0%     12.80s 34.60%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0%     0%     12.80s 34.60%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
     0.06s  0.16%  0.16%     12.38s 33.47%  runtime.systemstack
     0.04s  0.11%  0.27%      8.86s 23.95%  runtime.gcBgMarkWorker
         0     0%  0.27%      8.15s 22.03%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
         0     0%  0.27%      7.84s 21.19%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
         0     0%  0.27%      7.63s 20.63%  runtime.gcBgMarkWorker.func2
     0.07s  0.19%  0.46%      7.63s 20.63%  runtime.gcDrain
     0.02s 0.054%  0.51%      6.66s 18.00%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks
         0     0%  0.51%      6.66s 18.00%  github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)
         0     0%  0.51%      6.64s 17.95%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType
     0.03s 0.081%  0.59%      6.57s 17.76%  runtime.growslice
     0.15s  0.41%  1.00%      6.51s 17.60%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl
     2.42s  6.54%  7.54%      6.19s 16.73%  runtime.scanobject
     6.12s 16.55% 24.09%      6.12s 16.55%  runtime.memmove
     0.50s  1.35% 25.44%      5.93s 16.03%  runtime.mallocgc
     0.04s  0.11% 25.55%      4.67s 12.63%  io.(*SectionReader).ReadAt
     0.03s 0.081% 25.63%      4.65s 12.57%  os.(*File).ReadAt
     0.01s 0.027% 25.66%      4.62s 12.49%  os.(*File).pread (inline)
     0.02s 0.054% 25.71%      4.61s 12.46%  internal/poll.(*FD).Pread
         0     0% 25.71%      4.45s 12.03%  syscall.Pread (inline)
     0.04s  0.11% 25.82%      4.45s 12.03%  syscall.pread
     0.09s  0.24% 26.06%      4.41s 11.92%  syscall.Syscall6
     0.01s 0.027% 26.09%         4s 10.81%  github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan
     3.83s 10.35% 36.44%      3.83s 10.35%  runtime/internal/syscall.Syscall6
     0.01s 0.027% 36.47%      3.83s 10.35%  syscall.RawSyscall6
         0     0% 36.47%      3.68s  9.95%  github.com/mandiant/GoReSym/debug/macho.(*File).DataAfterSection
     0.03s 0.081% 36.55%      3.09s  8.35%  runtime.makeslice
         0     0% 36.55%      2.90s  7.84%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0% 36.55%      2.90s  7.84%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0% 36.55%      2.70s  7.30%  github.com/mandiant/GoReSym/debug/elf.(*Section).Data
     2.51s  6.79% 43.34%      2.51s  6.79%  runtime.memclrNoHeapPointers
     0.02s 0.054% 43.39%      2.49s  6.73%  runtime.schedule
     0.03s 0.081% 43.47%      2.41s  6.52%  github.com/mandiant/GoReSym/objfile.(*Entry).readRTypeName

objfile/patterns.go

stevemk14ebr · 2023-08-15T17:34:56Z

After fixing up the dataAfterSection cache to work for all OS's we get another 3x speedup:

File: GoReSym.test
Type: cpu
Time: Aug 15, 2023 at 1:32pm (EDT)
Duration: 11.05s, Total samples = 10.89s (98.55%)
Showing nodes accounting for 8.78s, 80.62% of 10.89s total
Dropped 267 nodes (cum <= 0.05s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      9.62s 88.34%  testing.tRunner
         0     0%     0%      9.60s 88.15%  github.com/mandiant/GoReSym.TestAllVersions.func1
         0     0%     0%      9.60s 88.15%  github.com/mandiant/GoReSym.main_impl
         0     0%     0%      4.88s 44.81%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks
         0     0%     0%      4.88s 44.81%  github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)
         0     0%     0%      4.79s 43.99%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType
     0.08s  0.73%  0.73%      4.72s 43.34%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl
     0.01s 0.092%  0.83%      2.55s 23.42%  os.(*File).ReadAt
     0.01s 0.092%  0.92%      2.54s 23.32%  io.(*SectionReader).ReadAt
     0.01s 0.092%  1.01%      2.53s 23.23%  os.(*File).pread (inline)
     0.05s  0.46%  1.47%      2.52s 23.14%  internal/poll.(*FD).Pread
         0     0%  1.47%      2.37s 21.76%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0%  1.47%      2.37s 21.76%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0%  1.47%      2.36s 21.67%  syscall.Pread (inline)
     0.01s 0.092%  1.56%      2.36s 21.67%  syscall.pread
     0.01s 0.092%  1.65%      2.35s 21.58%  syscall.Syscall6
     2.10s 19.28% 20.94%      2.10s 19.28%  runtime/internal/syscall.Syscall6
         0     0% 20.94%      2.10s 19.28%  syscall.RawSyscall6
         0     0% 20.94%      2.01s 18.46%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0% 20.94%      2.01s 18.46%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
     0.43s  3.95% 24.89%      1.97s 18.09%  runtime.mallocgc
     0.01s 0.092% 24.98%      1.72s 15.79%  github.com/mandiant/GoReSym/objfile.(*Entry).readRTypeName
     0.20s  1.84% 26.81%      1.70s 15.61%  bytes.Index
     0.02s  0.18% 27.00%      1.19s 10.93%  github.com/mandiant/GoReSym/objfile.(*elfFile).read_memory
         0     0% 27.00%      1.09s 10.01%  runtime.systemstack
     0.02s  0.18% 27.18%      1.04s  9.55%  github.com/mandiant/GoReSym/objfile.(*peFile).read_memory
         0     0% 27.18%      1.02s  9.37%  github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader
         0     0% 27.18%      0.95s  8.72%  github.com/mandiant/GoReSym/objfile.FindRegex
         0     0% 27.18%      0.82s  7.53%  bytes.IndexByte (inline)
         0     0% 27.18%      0.82s  7.53%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln
         0     0% 27.18%      0.82s  7.53%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan
         0     0% 27.18%      0.80s  7.35%  github.com/mandiant/GoReSym/objfile.findAllOccurrences
         0     0% 27.18%      0.79s  7.25%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
     0.77s  7.07% 34.25%      0.77s  7.07%  indexbytebody
     0.02s  0.18% 34.44%      0.75s  6.89%  runtime.makeslice
     0.01s 0.092% 34.53%      0.70s  6.43%  runtime.growslice
     0.69s  6.34% 40.86%      0.69s  6.34%  runtime.memmove
         0     0% 40.86%      0.67s  6.15%  github.com/mandiant/GoReSym/objfile.(*peFile).moduledata_scan
     0.67s  6.15% 47.02%      0.67s  6.15%  runtime.memclrNoHeapPointers
     0.03s  0.28% 47.29%      0.62s  5.69%  encoding/binary.Read
         0     0% 47.29%      0.59s  5.42%  runtime.gcBgMarkWorker
         0     0% 47.29%      0.56s  5.14%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln
         0     0% 47.29%      0.56s  5.14%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan
     0.01s 0.092% 47.38%      0.55s  5.05%  github.com/mandiant/GoReSym/objfile.(*elfFile).symbols
         0     0% 47.38%      0.53s  4.87%  github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan
     0.05s  0.46% 47.84%      0.53s  4.87%  github.com/mandiant/GoReSym/objfile.(*machoFile).symbols
         0     0% 47.84%      0.50s  4.59%  github.com/mandiant/GoReSym/debug/gosym.NewTable
         0     0% 47.84%      0.49s  4.50%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
         0     0% 47.84%      0.49s  4.50%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
     0.02s  0.18% 48.03%      0.47s  4.32%  github.com/mandiant/GoReSym/objfile.(*machoFile).read_memory
         0     0% 48.03%      0.47s  4.32%  runtime.gcBgMarkWorker.func2
     0.01s 0.092% 48.12%      0.47s  4.32%  runtime.gcDrain
         0     0% 48.12%      0.44s  4.04%  github.com/mandiant/GoReSym/debug/elf.(*File).Symbols (inline)
         0     0% 48.12%      0.44s  4.04%  github.com/mandiant/GoReSym/debug/elf.(*File).getSymbols
         0     0% 48.12%      0.40s  3.67%  fmt.Sprintf
     0.37s  3.40% 51.52%      0.38s  3.49%  internal/bytealg.IndexRabinKarpBytes
     0.03s  0.28% 51.79%      0.33s  3.03%  runtime.concatstrings
     0.07s  0.64% 52.43%      0.32s  2.94%  encoding/binary.(*decoder).value
     0.10s  0.92% 53.35%      0.32s  2.94%  runtime.scanobject
         0     0% 53.35%      0.31s  2.85%  runtime.(*pageAlloc).scavenge
         0     0% 53.35%      0.31s  2.85%  runtime.(*pageAlloc).scavenge.func1
         0     0% 53.35%      0.31s  2.85%  runtime.(*pageAlloc).scavengeOne
         0     0% 53.35%      0.31s  2.85%  runtime.(*scavengerState).init.func2
         0     0% 53.35%      0.31s  2.85%  runtime.(*scavengerState).run
         0     0% 53.35%      0.31s  2.85%  runtime.bgscavenge
     0.06s  0.55% 53.90%      0.29s  2.66%  runtime.heapBitsSetType
         0     0% 53.90%      0.29s  2.66%  runtime.sysUnused
     0.01s 0.092% 53.99%      0.29s  2.66%  runtime.sysUnusedOS
     0.01s 0.092% 54.09%      0.28s  2.57%  runtime.concatstring2
     0.28s  2.57% 56.66%      0.28s  2.57%  runtime.madvise
         0     0% 56.66%      0.27s  2.48%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
     0.02s  0.18% 56.84%      0.27s  2.48%  runtime.newobject (partial-inline)
     0.02s  0.18% 57.02%      0.26s  2.39%  github.com/mandiant/GoReSym/debug/gosym.(*LineTable).go12Funcs
         0     0% 57.02%      0.26s  2.39%  github.com/mandiant/GoReSym/objfile.(*Entry).ReadPointerSizeMem
     0.03s  0.28% 57.30%      0.25s  2.30%  runtime.slicebytetostring
     0.04s  0.37% 57.67%      0.24s  2.20%  fmt.(*pp).doPrintf
         0     0% 57.67%      0.24s  2.20%  github.com/mandiant/GoReSym/debug/elf.(*File).getSymbols64
         0     0% 57.67%      0.23s  2.11%  github.com/mandiant/GoReSym/debug/gosym.(*LineTable).go12MapFiles
     0.02s  0.18% 57.85%      0.23s  2.11%  runtime.memclrNoHeapPointersChunked
         0     0% 57.85%      0.22s  2.02%  runtime.schedule
     0.21s  1.93% 59.78%      0.21s  1.93%  indexbody

williballenthin · 2023-08-15T17:45:37Z

good catch on the dataAfterSection issue. that was obviously an oversight and I apologize for introducing the bug.

stevemk14ebr · 2023-08-15T17:49:13Z

no worries, i've introduced plenty of bugs xD

…iling when valid moduledata symbols present, fix functab parsing on huuge binaries

stevemk14ebr · 2023-08-15T19:02:14Z

At this point I think this is ready to merge 😄 I've implemented an additional optimization that's been needed for a while now to avoid re-parsing all the pclntab candidates the second time through. Then I found the underlying bug which you had reported when parsing kubectl and fixed that. That issue was two parts:

When we scan the pclntab candidates the first time through, we expect that finding a moduledata with the pclntab VA means that we have the correct candidate. This assumption was wrong, because we were using the optionally present binary symbols to locate the moduledata. When these symbols were valid, the pclntab VA was in fact not being verified leaving us with the wrong candidate in some situations. I've simply removed this branch so that we always scan for the moduledata using the pclntab VA so that this assumption is now true. This is slightly slower but ultimately more correct and simpler code.
With very large binaries >= uint16_max function counts we'd hit an OOM guard that I'd added prior. Wild enough there does exist Go binaries in the wild with 100k plus functions, I've 3x-ed this limit to fix. The kubectl binary has ~100k functions 🤯

stevemk14ebr · 2023-08-15T19:08:09Z

#36 forgot to push these

EDIT: released https://github.com/mandiant/GoReSym/releases/tag/v2.5 🥳

swap out yara pattern scanning for binary regexps

6439aec

closes #29

williballenthin added the enhancement New feature or request label Aug 3, 2023

williballenthin requested a review from stevemk14ebr August 3, 2023 20:42

williballenthin marked this pull request as ready for review August 3, 2023 20:48

patterns_test: add additional test cases demonstrating binary regexp

e41ddfa

williballenthin mentioned this pull request Aug 3, 2023

profile runtime and optimize performance #31

Closed

rename test generation file

7fd3d68

williballenthin marked this pull request as draft August 4, 2023 10:47

williballenthin added 3 commits August 4, 2023 13:33

patterns: add some demonstrations of regexp group handling

91bbd8b

patterns: introduce BinaryRegexpGroup

d09fb19

try to improve performance by doing a single pass over the data, looking for any of the pattern candidates, and fetching these via the SubexpIndex functionality.

add pprof hook to main_test

8da26ac

objfile: cache results of pcln candidate scan

ebc220a

stevemk14ebr requested changes Aug 10, 2023

View reviewed changes

Implement Regex Needle Optimization

11d153d

github-advanced-security bot found potential problems Aug 14, 2023

View reviewed changes

objfile/patterns.go Dismissed Show dismissed Hide dismissed

stevemk14ebr added 3 commits August 14, 2023 17:07

Fix out of range bug

1add66f

Add tests

244799b

fixup dir permissions

9c69452

Apply data after section cache to all os's

0188000

Implement optization mentioned in #13, fix pclntab VA verification fa…

0410333

…iling when valid moduledata symbols present, fix functab parsing on huuge binaries

stevemk14ebr marked this pull request as ready for review August 15, 2023 19:02

stevemk14ebr merged commit 7e83100 into master Aug 15, 2023
3 checks passed

stevemk14ebr deleted the fix/issue-29 branch August 15, 2023 19:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

swap out yara pattern scanning for binary regexps #30

swap out yara pattern scanning for binary regexps #30

williballenthin commented Aug 3, 2023 •

edited

Loading

williballenthin commented Aug 3, 2023

williballenthin commented Aug 4, 2023 •

edited

Loading

stevemk14ebr commented Aug 4, 2023 •

edited

Loading

williballenthin commented Aug 4, 2023

williballenthin commented Aug 4, 2023 •

edited

Loading

stevemk14ebr Aug 10, 2023

stevemk14ebr Aug 10, 2023

williballenthin Aug 10, 2023

stevemk14ebr Aug 10, 2023

williballenthin commented Aug 10, 2023

stevemk14ebr commented Aug 10, 2023 •

edited

Loading

williballenthin commented Aug 10, 2023

stevemk14ebr commented Aug 13, 2023 •

edited

Loading

williballenthin commented Aug 13, 2023

stevemk14ebr commented Aug 13, 2023

stevemk14ebr commented Aug 14, 2023 •

edited

Loading

stevemk14ebr commented Aug 15, 2023

williballenthin commented Aug 15, 2023

stevemk14ebr commented Aug 15, 2023

stevemk14ebr commented Aug 15, 2023 •

edited

Loading

stevemk14ebr commented Aug 15, 2023 •

edited

Loading

swap out yara pattern scanning for binary regexps #30

swap out yara pattern scanning for binary regexps #30

Conversation

williballenthin commented Aug 3, 2023 • edited Loading

williballenthin commented Aug 3, 2023

williballenthin commented Aug 4, 2023 • edited Loading

stevemk14ebr commented Aug 4, 2023 • edited Loading

williballenthin commented Aug 4, 2023

williballenthin commented Aug 4, 2023 • edited Loading

scan file data once

scan pcln candidates once

stevemk14ebr Aug 10, 2023

Choose a reason for hiding this comment

stevemk14ebr Aug 10, 2023

Choose a reason for hiding this comment

williballenthin Aug 10, 2023

Choose a reason for hiding this comment

stevemk14ebr Aug 10, 2023

Choose a reason for hiding this comment

williballenthin commented Aug 10, 2023

stevemk14ebr commented Aug 10, 2023 • edited Loading

williballenthin commented Aug 10, 2023

stevemk14ebr commented Aug 13, 2023 • edited Loading

williballenthin commented Aug 13, 2023

stevemk14ebr commented Aug 13, 2023

stevemk14ebr commented Aug 14, 2023 • edited Loading

stevemk14ebr commented Aug 15, 2023

williballenthin commented Aug 15, 2023

stevemk14ebr commented Aug 15, 2023

stevemk14ebr commented Aug 15, 2023 • edited Loading

stevemk14ebr commented Aug 15, 2023 • edited Loading

williballenthin commented Aug 3, 2023 •

edited

Loading

williballenthin commented Aug 4, 2023 •

edited

Loading

stevemk14ebr commented Aug 4, 2023 •

edited

Loading

williballenthin commented Aug 4, 2023 •

edited

Loading

stevemk14ebr commented Aug 10, 2023 •

edited

Loading

stevemk14ebr commented Aug 13, 2023 •

edited

Loading

stevemk14ebr commented Aug 14, 2023 •

edited

Loading

stevemk14ebr commented Aug 15, 2023 •

edited

Loading

stevemk14ebr commented Aug 15, 2023 •

edited

Loading