Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swap out yara pattern scanning for binary regexps #30

Merged
merged 13 commits into from
Aug 15, 2023
Merged

Conversation

williballenthin
Copy link
Contributor

@williballenthin williballenthin commented Aug 3, 2023

closes #29
closes #23

@williballenthin williballenthin added the enhancement New feature or request label Aug 3, 2023
@williballenthin williballenthin marked this pull request as ready for review August 3, 2023 20:48
@williballenthin
Copy link
Contributor Author

tests pass when i run them locally.

@williballenthin
Copy link
Contributor Author

williballenthin commented Aug 4, 2023

unfortunately the performance of the binaryregexp is significantly worse than yara. in the same environment, the test cases on master run in 67s while this PR takes 346s (5x).

only now am i seeing that Go's regular expression implementation is really slow: https://github.com/mariomka/regex-benchmark#performance

there are certainly some optimizations we can make to avoid doing so much regexp scanning; i will think on it and try to propose which strategies we might try first.

@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Aug 4, 2023

Awe that's too bad to hear. Unfortunately I switched to yara originally for performance reasons so this makes some sense.

@williballenthin williballenthin marked this pull request as draft August 4, 2023 10:47
try to improve performance by doing a single pass over the data,
looking for any of the pattern candidates, and fetching these via the
SubexpIndex functionality.
@williballenthin
Copy link
Contributor Author

initially in this PR i was lazy with the regular expressions: i ran five expressions over the data separately, one for each pattern: 6439aec#diff-5c52596a150c1cf395c3609f471a82dbdeca26bccb840730182b301007cf2699R87

i assumed this was an obvious place to optimize, so i put all five patterns into a single pattern like (a|b|c|...) and used named groups to fetch the results. i figured this would be faster because it scans the data only once:

m := g.FindAllIndex(data, -1)

surprisingly, the performance doesnt really change much. perhaps the regex engine does 5x as much work during the merged pattern scan? ugh.

the profile data looks like this, currently:


go test -v -run "^TestAllVersions/117"

@williballenthin ➜ /workspaces/GoReSym (master-perf) $ go test -v -run "^TestAllVersions/117"
--- PASS: TestAllVersions (4.46s)
--- PASS: TestAllVersions/117/testproject_lin (0.91s)
--- PASS: TestAllVersions/117/testproject_lin_32 (0.62s)
--- PASS: TestAllVersions/117/testproject_lin_stripped (0.43s)
--- PASS: TestAllVersions/117/testproject_lin_stripped_32 (0.34s)
--- PASS: TestAllVersions/117/testproject_mac (0.87s)
--- PASS: TestAllVersions/117/testproject_mac_stripped (0.48s)
--- PASS: TestAllVersions/117/testproject_win_32.exe (0.23s)
--- PASS: TestAllVersions/117/testproject_win_stripped_32.exe (0.10s)
--- PASS: TestAllVersions/117/testproject_win_stripped.exe (0.12s)
--- PASS: TestAllVersions/117/testproject_win.exe (0.20s)
PASS
ok github.com/mandiant/GoReSym 4.705s

@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ pprof -cum -top cpu-master-perf.pprof
File: GoReSym.test
Build ID: bd44f557032899b425c68f2041eee53bf543b181
Type: cpu
Time: Aug 4, 2023 at 2:11pm (UTC)
Duration: 4.45s, Total samples = 4.71s (105.93%)
Showing nodes accounting for 4.30s, 91.30% of 4.71s total
Dropped 129 nodes (cum <= 0.02s)
flat flat% sum% cum cum%
0 0% 0% 3.91s 83.01% github.com/mandiant/GoReSym.TestAllVersions.func1
0 0% 0% 3.91s 83.01% github.com/mandiant/GoReSym.main_impl
0 0% 0% 3.91s 83.01% testing.tRunner
0 0% 0% 2s 42.46% github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
0 0% 0% 2s 42.46% github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
0 0% 0% 1.42s 30.15% github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
0 0% 0% 1.42s 30.15% github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
1.25s 26.54% 26.54% 1.26s 26.75% runtime.cgocall
0 0% 26.54% 1.17s 24.84% github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader
0 0% 26.54% 1.11s 23.57% github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
1.10s 23.35% 49.89% 1.10s 23.35% runtime.memmove
0 0% 49.89% 1.05s 22.29% github.com/mandiant/GoReSym/objfile.(*elfFile).pcln
0 0% 49.89% 1.05s 22.29% github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan
0 0% 49.89% 1.01s 21.44% runtime.growslice
0 0% 49.89% 0.81s 17.20% runtime.systemstack
0 0% 49.89% 0.80s 16.99% github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
0 0% 49.89% 0.76s 16.14% github.com/mandiant/GoReSym/debug/macho.(*File).DataAfterSection
0.01s 0.21% 50.11% 0.68s 14.44% runtime.mallocgc
0 0% 50.11% 0.62s 13.16% github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan
0 0% 50.11% 0.58s 12.31% internal/poll.(*FD).Pread
0 0% 50.11% 0.58s 12.31% io.(*SectionReader).ReadAt
0 0% 50.11% 0.58s 12.31% os.(*File).ReadAt
0 0% 50.11% 0.58s 12.31% os.(*File).pread (inline)
0 0% 50.11% 0.56s 11.89% syscall.Pread (inline)
0.01s 0.21% 50.32% 0.56s 11.89% syscall.Syscall6
0 0% 50.32% 0.56s 11.89% syscall.pread
0.54s 11.46% 61.78% 0.54s 11.46% runtime/internal/syscall.Syscall6
0 0% 61.78% 0.53s 11.25% github.com/mandiant/GoReSym/objfile.(*machoFile).pcln
0 0% 61.78% 0.53s 11.25% github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan
0 0% 61.78% 0.53s 11.25% syscall.RawSyscall6
0.49s 10.40% 72.19% 0.49s 10.40% runtime.memclrNoHeapPointers
0 0% 72.19% 0.48s 10.19% github.com/hillu/go-yara/v4.(*Compiler).AddString
0 0% 72.19% 0.48s 10.19% runtime.gcBgMarkWorker
0 0% 72.19% 0.47s 9.98% github.com/hillu/go-yara/v4.(*Compiler).AddString.func4
0 0% 72.19% 0.47s 9.98% github.com/hillu/go-yara/v4._Cfunc_yr_compiler_add_string
0 0% 72.19% 0.46s 9.77% github.com/mandiant/GoReSym/objfile.(*Entry).ParseType
0.02s 0.42% 72.61% 0.46s 9.77% github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl
0 0% 72.61% 0.45s 9.55% runtime.gcBgMarkWorker.func2
0 0% 72.61% 0.45s 9.55% runtime.gcDrain
0 0% 72.61% 0.44s 9.34% runtime.makeslice
0 0% 72.61% 0.43s 9.13% github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks
0 0% 72.61% 0.43s 9.13% github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)
0 0% 72.61% 0.38s 8.07% github.com/hillu/go-yara/v4.(*Compiler).GetRules
0 0% 72.61% 0.38s 8.07% github.com/hillu/go-yara/v4.(*Compiler).GetRules.func1
0 0% 72.61% 0.38s 8.07% github.com/hillu/go-yara/v4._Cfunc_yr_compiler_get_rules
0 0% 72.61% 0.38s 8.07% github.com/mandiant/GoReSym/debug/elf.(*Section).Data
0 0% 72.61% 0.34s 7.22% github.com/mandiant/GoReSym/objfile.(*peFile).pcln
0 0% 72.61% 0.34s 7.22% github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
0 0% 72.61% 0.31s 6.58% runtime.markroot
0 0% 72.61% 0.30s 6.37% github.com/hillu/go-yara/v4.(*Scanner).ScanMem
0 0% 72.61% 0.30s 6.37% github.com/hillu/go-yara/v4.(*Scanner).ScanMem.func2
0 0% 72.61% 0.30s 6.37% github.com/hillu/go-yara/v4._Cfunc_yr_scanner_scan_mem
...

@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ go test -v -run "^TestAllVersions/117"
--- PASS: TestAllVersions (32.13s)
--- PASS: TestAllVersions/117/testproject_lin (8.75s)
--- PASS: TestAllVersions/117/testproject_lin_32 (6.06s)
--- PASS: TestAllVersions/117/testproject_lin_stripped (2.32s)
--- PASS: TestAllVersions/117/testproject_lin_stripped_32 (1.64s)
--- PASS: TestAllVersions/117/testproject_mac (7.23s)
--- PASS: TestAllVersions/117/testproject_mac_stripped (3.40s)
--- PASS: TestAllVersions/117/testproject_win_32.exe (0.76s)
--- PASS: TestAllVersions/117/testproject_win_stripped_32.exe (0.53s)
--- PASS: TestAllVersions/117/testproject_win_stripped.exe (0.58s)
--- PASS: TestAllVersions/117/testproject_win.exe (0.84s)
PASS
ok github.com/mandiant/GoReSym 32.133s

@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ pprof -cum -top cpu-issue-29.pprof
File: GoReSym.test
Type: cpu
Time: Aug 4, 2023 at 2:14pm (UTC)
Duration: 30.44s, Total samples = 30.30s (99.55%)
Showing nodes accounting for 29.04s, 95.84% of 30.30s total
Dropped 226 nodes (cum <= 0.15s)
flat flat% sum% cum cum%
0 0% 0% 29.37s 96.93% github.com/mandiant/GoReSym.TestAllVersions.func1
0 0% 0% 29.37s 96.93% github.com/mandiant/GoReSym.main_impl
0 0% 0% 29.37s 96.93% testing.tRunner
0 0% 0% 27.58s 91.02% github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
0 0% 0% 27.58s 91.02% github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
0 0% 0% 27.14s 89.57% github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader
0 0% 0% 27.12s 89.50% github.com/mandiant/GoReSym/objfile.(*BinaryRegexpGroup).FindAllIndex (inline)
0 0% 0% 27.12s 89.50% rsc.io/binaryregexp.(*Regexp).FindAllIndex
0 0% 0% 27.12s 89.50% rsc.io/binaryregexp.(*Regexp).allMatches
0 0% 0% 27.12s 89.50% rsc.io/binaryregexp.(*Regexp).doExecute
1.02s 3.37% 3.37% 27.12s 89.50% rsc.io/binaryregexp.(*machine).match
17.70s 58.42% 61.78% 21.56s 71.16% rsc.io/binaryregexp.(*machine).add
0 0% 61.78% 15.55s 51.32% github.com/mandiant/GoReSym/objfile.(*elfFile).pcln
0 0% 61.78% 15.55s 51.32% github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan
0 0% 61.78% 9.44s 31.16% github.com/mandiant/GoReSym/objfile.(*machoFile).pcln
0 0% 61.78% 9.44s 31.16% github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan
4.30s 14.19% 75.97% 4.39s 14.49% rsc.io/binaryregexp.(*machine).step
3.87s 12.77% 88.75% 3.87s 12.77% runtime.memmove
0 0% 88.75% 2.52s 8.32% github.com/mandiant/GoReSym/objfile.(*peFile).pcln
0 0% 88.75% 2.51s 8.28% github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
0 0% 88.75% 1.33s 4.39% github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
0 0% 88.75% 1.33s 4.39% github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
0 0% 88.75% 1.03s 3.40% runtime.systemstack
0 0% 88.75% 0.91s 3.00% github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
0 0% 88.75% 0.83s 2.74% runtime.growslice
0 0% 88.75% 0.82s 2.71% github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
0.80s 2.64% 91.39% 0.81s 2.67% rsc.io/binaryregexp.(*machine).alloc (inline)
0.03s 0.099% 91.49% 0.66s 2.18% runtime.mallocgc
...

@williballenthin
Copy link
Contributor Author

williballenthin commented Aug 4, 2023

one thing thats suspicious is how often the scanner routine is called:

@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ go test -v -run "^TestAllVersions/117/testproject_lin$"
findModuleInitPCHeader: len(data)= 1977516
findModuleInitPCHeader: len(data)= 1463668
findModuleInitPCHeader: len(data)= 1245252
findModuleInitPCHeader: len(data)= 1244874
findModuleInitPCHeader: len(data)= 1243622
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 880038
findModuleInitPCHeader: len(data)= 880006
findModuleInitPCHeader: len(data)= 812998
findModuleInitPCHeader: len(data)= 782262
findModuleInitPCHeader: len(data)= 589998
findModuleInitPCHeader: len(data)= 568678
findModuleInitPCHeader: len(data)= 568397
findModuleInitPCHeader: len(data)= 457639
findModuleInitPCHeader: len(data)= 436120
findModuleInitPCHeader: len(data)= 436078
findModuleInitPCHeader: len(data)= 234232
findModuleInitPCHeader: len(data)= 131110
findModuleInitPCHeader: len(data)= 95205
findModuleInitPCHeader: len(data)= 95105
findModuleInitPCHeader: len(data)= 45065
findModuleInitPCHeader: len(data)= 1977516
findModuleInitPCHeader: len(data)= 1463668
findModuleInitPCHeader: len(data)= 1245252
findModuleInitPCHeader: len(data)= 1244874
findModuleInitPCHeader: len(data)= 1243622
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 880038
findModuleInitPCHeader: len(data)= 880006
findModuleInitPCHeader: len(data)= 812998
findModuleInitPCHeader: len(data)= 782262
findModuleInitPCHeader: len(data)= 589998
findModuleInitPCHeader: len(data)= 568678
findModuleInitPCHeader: len(data)= 568397
findModuleInitPCHeader: len(data)= 457639
findModuleInitPCHeader: len(data)= 436120
findModuleInitPCHeader: len(data)= 436078
findModuleInitPCHeader: len(data)= 234232
findModuleInitPCHeader: len(data)= 131110
findModuleInitPCHeader: len(data)= 95205
findModuleInitPCHeader: len(data)= 95105
findModuleInitPCHeader: len(data)= 45065
--- PASS: TestAllVersions (7.64s)
    --- PASS: TestAllVersions/117/testproject_lin (7.48s)
PASS
ok      github.com/mandiant/GoReSym     7.658s

for the single ELF file the scanner is invoked 44 times. perhaps we can invoke the scanner once across the entire file and re-use the results, since they shouldn't change (GoReSym doesn't change the input file).

findModuleInitPCHeader is called once in (f *elfFile) pcln_scan() once for each section as:

for _, sec := range f.elf.Sections {
    ...
    data := f.elf.DataAfterSection(sec)
    ...
    sigResults := findModuleInitPCHeader(data, sec.Addr)
}

pcln_scan is called once by (f *elfFile) pcln()
is called once by (e *Entry) PCLineTable(versionOverride string, knownGoTextBase uint64)
is called once by (f *File) PCLineTable(versionOverride string, knownGoTextBase uint64)
is called in main_impl potentially many times as the real text base is recovered.

so there are two things to do:

  1. call the scan once for all the file data, and then sub-select the results per section, as necessary, and
  2. scan for pcln candidates once, since they don't depend on the text base address

scan file data once

this will take a little work, since some of the pointers that are recovered are relative to the section being scanned (... + sectionBase), so if we scan all data at once, we'll have to fix up this pointers afterwards.

scan pcln candidates once

the pcln candidate scan doesn't depend on the Go text base address, so when the scan is restarted with a valid Go text base address, we should re-use the results from before. we can cache these in the Entry, perhaps.

candidates, err := e.raw.pcln()

in ebc220a we implement this caching and find that the runtime improves by 50%, as expected!

build_all.sh Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore this, we don't need the CGO linking lines anymore but this exists so that I can automatically cross-compile all versions of GoReSym for all platforms in one command when it's time to cut releases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only change required, looks great otherwise. Thanks for exploring the removal of yara, I'm happy with the result here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok, sure no problem!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

build_test_files.sh Show resolved Hide resolved
objfile/patterns.go Outdated Show resolved Hide resolved
objfile/patterns_test.go Outdated Show resolved Hide resolved
objfile/scanner.go Outdated Show resolved Hide resolved
@williballenthin
Copy link
Contributor Author

this PR is still too slow to merge. i'm going to work on performance optimizations before i recommend that this be merged.

i expect to be able to optimize the outer scanning loops such that the net performance change here is within 50% of today's running time. the PR is around 7x slower right now. regex scanning will remain slow, nothing we can do about that, but we'll just scan much less data. the result is a GoReSym of similar performance but no linking difficulties.

i'll do most of the perf work in other branches so as not to confuse the diff here.

@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Aug 10, 2023

Sounds good. Let me know if you'd like to discuss this first before you put too much work in. It's possible we may be able to re-visit the core algorithm itself. I've focused on correctness so far, optimizations have been mostly ignored till now except for when necessary to keep performance acceptable.

Not to mention it's spaghettified a little as samples exercising new edge cases have been found and fixed.

@williballenthin
Copy link
Contributor Author

great, will do. i had thought i would have time immediately but there's a lot of GSoC stuff to close out. will do this as soon as i can since i think this will help VT.

@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Aug 13, 2023

I had an idea for a possible optimization we could try. I read a blog by burntsushi a while ago on techniques he utilized for efficient regex searching in his library. One of the simple techniques used was to not start a regex scan until after first performing a fast prefix scan to create a candidate match set, we should try this. We could adapt the yara to regex routine to extract the 'longest fixed sequence' of bytes per pattern and scan over all file bytes once per pattern matching this sequence, and return a candidate regex match of length [-len(regex), len(regex)]. We then perform binary regex over these candidates. We use a window of negative length to support when the 'longest fixed sequence' is in the middle of a pattern rather than the start.

This makes an assumption that the binary regex crate does not already do this, or even if it does then the second assumption that go's memcmp would be faster than the compare the regex crate does over enough data.

A second issue we have is that we have multiple patterns. A hopefully easy solution would be to somehow reliably determine which arch (independent of os) a binary is using and only scan for that pattern. If we cannot do that reliably then we will need to experiment on if we should do multiple candidate regex scans over the data per signature or scan once with multiple comparisons at each step.

Thoughts?

@williballenthin
Copy link
Contributor Author

  1. use only candidate arch patterns - yup, agree.
  2. use pre scan to identify candidate matches - yup, also agree.

For (2) we know from one of the other PRs that Go's string.Index() routine is very fast, so I do expect this to be a lot faster than the regex logic.

When making changes in this part of the code I don't feel very confident because I'm not sure if any test cases cover these scenarios. Can you confirm and/or can we find a few more sample files to exercise the pattern scanning functionality?

@stevemk14ebr
Copy link
Collaborator

This feature exists mostly to combat garble. I generate test binaries with garble build -ldflags="-w -s"

@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Aug 14, 2023

I've implemented the needle optimization, it seems to have large increases to performance based on limited empirical testing 😄

Need to do more testing and benchmarking still but seems like a win!

File: GoReSym.test
Type: cpu
Time: Aug 14, 2023 at 5:04pm (EDT)
Duration: 33.84s, Total samples = 36.99s (109.31%)
Showing nodes accounting for 30.44s, 82.29% of 36.99s total
Dropped 421 nodes (cum <= 0.18s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     22.80s 61.64%  github.com/mandiant/GoReSym.TestAllVersions.func1
         0     0%     0%     22.80s 61.64%  github.com/mandiant/GoReSym.main_impl
         0     0%     0%     22.80s 61.64%  testing.tRunner
         0     0%     0%     12.80s 34.60%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0%     0%     12.80s 34.60%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
     0.06s  0.16%  0.16%     12.38s 33.47%  runtime.systemstack
     0.04s  0.11%  0.27%      8.86s 23.95%  runtime.gcBgMarkWorker
         0     0%  0.27%      8.15s 22.03%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
         0     0%  0.27%      7.84s 21.19%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
         0     0%  0.27%      7.63s 20.63%  runtime.gcBgMarkWorker.func2
     0.07s  0.19%  0.46%      7.63s 20.63%  runtime.gcDrain
     0.02s 0.054%  0.51%      6.66s 18.00%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks
         0     0%  0.51%      6.66s 18.00%  github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)
         0     0%  0.51%      6.64s 17.95%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType
     0.03s 0.081%  0.59%      6.57s 17.76%  runtime.growslice
     0.15s  0.41%  1.00%      6.51s 17.60%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl
     2.42s  6.54%  7.54%      6.19s 16.73%  runtime.scanobject
     6.12s 16.55% 24.09%      6.12s 16.55%  runtime.memmove
     0.50s  1.35% 25.44%      5.93s 16.03%  runtime.mallocgc
     0.04s  0.11% 25.55%      4.67s 12.63%  io.(*SectionReader).ReadAt
     0.03s 0.081% 25.63%      4.65s 12.57%  os.(*File).ReadAt
     0.01s 0.027% 25.66%      4.62s 12.49%  os.(*File).pread (inline)
     0.02s 0.054% 25.71%      4.61s 12.46%  internal/poll.(*FD).Pread
         0     0% 25.71%      4.45s 12.03%  syscall.Pread (inline)
     0.04s  0.11% 25.82%      4.45s 12.03%  syscall.pread
     0.09s  0.24% 26.06%      4.41s 11.92%  syscall.Syscall6
     0.01s 0.027% 26.09%         4s 10.81%  github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan
     3.83s 10.35% 36.44%      3.83s 10.35%  runtime/internal/syscall.Syscall6
     0.01s 0.027% 36.47%      3.83s 10.35%  syscall.RawSyscall6
         0     0% 36.47%      3.68s  9.95%  github.com/mandiant/GoReSym/debug/macho.(*File).DataAfterSection
     0.03s 0.081% 36.55%      3.09s  8.35%  runtime.makeslice
         0     0% 36.55%      2.90s  7.84%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0% 36.55%      2.90s  7.84%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0% 36.55%      2.70s  7.30%  github.com/mandiant/GoReSym/debug/elf.(*Section).Data
     2.51s  6.79% 43.34%      2.51s  6.79%  runtime.memclrNoHeapPointers
     0.02s 0.054% 43.39%      2.49s  6.73%  runtime.schedule
     0.03s 0.081% 43.47%      2.41s  6.52%  github.com/mandiant/GoReSym/objfile.(*Entry).readRTypeName

objfile/patterns.go Dismissed Show dismissed Hide dismissed
@stevemk14ebr
Copy link
Collaborator

After fixing up the dataAfterSection cache to work for all OS's we get another 3x speedup:

File: GoReSym.test
Type: cpu
Time: Aug 15, 2023 at 1:32pm (EDT)
Duration: 11.05s, Total samples = 10.89s (98.55%)
Showing nodes accounting for 8.78s, 80.62% of 10.89s total
Dropped 267 nodes (cum <= 0.05s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      9.62s 88.34%  testing.tRunner
         0     0%     0%      9.60s 88.15%  github.com/mandiant/GoReSym.TestAllVersions.func1
         0     0%     0%      9.60s 88.15%  github.com/mandiant/GoReSym.main_impl
         0     0%     0%      4.88s 44.81%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks
         0     0%     0%      4.88s 44.81%  github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)
         0     0%     0%      4.79s 43.99%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType
     0.08s  0.73%  0.73%      4.72s 43.34%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl
     0.01s 0.092%  0.83%      2.55s 23.42%  os.(*File).ReadAt
     0.01s 0.092%  0.92%      2.54s 23.32%  io.(*SectionReader).ReadAt
     0.01s 0.092%  1.01%      2.53s 23.23%  os.(*File).pread (inline)
     0.05s  0.46%  1.47%      2.52s 23.14%  internal/poll.(*FD).Pread
         0     0%  1.47%      2.37s 21.76%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0%  1.47%      2.37s 21.76%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0%  1.47%      2.36s 21.67%  syscall.Pread (inline)
     0.01s 0.092%  1.56%      2.36s 21.67%  syscall.pread
     0.01s 0.092%  1.65%      2.35s 21.58%  syscall.Syscall6
     2.10s 19.28% 20.94%      2.10s 19.28%  runtime/internal/syscall.Syscall6
         0     0% 20.94%      2.10s 19.28%  syscall.RawSyscall6
         0     0% 20.94%      2.01s 18.46%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0% 20.94%      2.01s 18.46%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
     0.43s  3.95% 24.89%      1.97s 18.09%  runtime.mallocgc
     0.01s 0.092% 24.98%      1.72s 15.79%  github.com/mandiant/GoReSym/objfile.(*Entry).readRTypeName
     0.20s  1.84% 26.81%      1.70s 15.61%  bytes.Index
     0.02s  0.18% 27.00%      1.19s 10.93%  github.com/mandiant/GoReSym/objfile.(*elfFile).read_memory
         0     0% 27.00%      1.09s 10.01%  runtime.systemstack
     0.02s  0.18% 27.18%      1.04s  9.55%  github.com/mandiant/GoReSym/objfile.(*peFile).read_memory
         0     0% 27.18%      1.02s  9.37%  github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader
         0     0% 27.18%      0.95s  8.72%  github.com/mandiant/GoReSym/objfile.FindRegex
         0     0% 27.18%      0.82s  7.53%  bytes.IndexByte (inline)
         0     0% 27.18%      0.82s  7.53%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln
         0     0% 27.18%      0.82s  7.53%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan
         0     0% 27.18%      0.80s  7.35%  github.com/mandiant/GoReSym/objfile.findAllOccurrences
         0     0% 27.18%      0.79s  7.25%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
     0.77s  7.07% 34.25%      0.77s  7.07%  indexbytebody
     0.02s  0.18% 34.44%      0.75s  6.89%  runtime.makeslice
     0.01s 0.092% 34.53%      0.70s  6.43%  runtime.growslice
     0.69s  6.34% 40.86%      0.69s  6.34%  runtime.memmove
         0     0% 40.86%      0.67s  6.15%  github.com/mandiant/GoReSym/objfile.(*peFile).moduledata_scan
     0.67s  6.15% 47.02%      0.67s  6.15%  runtime.memclrNoHeapPointers
     0.03s  0.28% 47.29%      0.62s  5.69%  encoding/binary.Read
         0     0% 47.29%      0.59s  5.42%  runtime.gcBgMarkWorker
         0     0% 47.29%      0.56s  5.14%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln
         0     0% 47.29%      0.56s  5.14%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan
     0.01s 0.092% 47.38%      0.55s  5.05%  github.com/mandiant/GoReSym/objfile.(*elfFile).symbols
         0     0% 47.38%      0.53s  4.87%  github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan
     0.05s  0.46% 47.84%      0.53s  4.87%  github.com/mandiant/GoReSym/objfile.(*machoFile).symbols
         0     0% 47.84%      0.50s  4.59%  github.com/mandiant/GoReSym/debug/gosym.NewTable
         0     0% 47.84%      0.49s  4.50%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
         0     0% 47.84%      0.49s  4.50%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
     0.02s  0.18% 48.03%      0.47s  4.32%  github.com/mandiant/GoReSym/objfile.(*machoFile).read_memory
         0     0% 48.03%      0.47s  4.32%  runtime.gcBgMarkWorker.func2
     0.01s 0.092% 48.12%      0.47s  4.32%  runtime.gcDrain
         0     0% 48.12%      0.44s  4.04%  github.com/mandiant/GoReSym/debug/elf.(*File).Symbols (inline)
         0     0% 48.12%      0.44s  4.04%  github.com/mandiant/GoReSym/debug/elf.(*File).getSymbols
         0     0% 48.12%      0.40s  3.67%  fmt.Sprintf
     0.37s  3.40% 51.52%      0.38s  3.49%  internal/bytealg.IndexRabinKarpBytes
     0.03s  0.28% 51.79%      0.33s  3.03%  runtime.concatstrings
     0.07s  0.64% 52.43%      0.32s  2.94%  encoding/binary.(*decoder).value
     0.10s  0.92% 53.35%      0.32s  2.94%  runtime.scanobject
         0     0% 53.35%      0.31s  2.85%  runtime.(*pageAlloc).scavenge
         0     0% 53.35%      0.31s  2.85%  runtime.(*pageAlloc).scavenge.func1
         0     0% 53.35%      0.31s  2.85%  runtime.(*pageAlloc).scavengeOne
         0     0% 53.35%      0.31s  2.85%  runtime.(*scavengerState).init.func2
         0     0% 53.35%      0.31s  2.85%  runtime.(*scavengerState).run
         0     0% 53.35%      0.31s  2.85%  runtime.bgscavenge
     0.06s  0.55% 53.90%      0.29s  2.66%  runtime.heapBitsSetType
         0     0% 53.90%      0.29s  2.66%  runtime.sysUnused
     0.01s 0.092% 53.99%      0.29s  2.66%  runtime.sysUnusedOS
     0.01s 0.092% 54.09%      0.28s  2.57%  runtime.concatstring2
     0.28s  2.57% 56.66%      0.28s  2.57%  runtime.madvise
         0     0% 56.66%      0.27s  2.48%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
     0.02s  0.18% 56.84%      0.27s  2.48%  runtime.newobject (partial-inline)
     0.02s  0.18% 57.02%      0.26s  2.39%  github.com/mandiant/GoReSym/debug/gosym.(*LineTable).go12Funcs
         0     0% 57.02%      0.26s  2.39%  github.com/mandiant/GoReSym/objfile.(*Entry).ReadPointerSizeMem
     0.03s  0.28% 57.30%      0.25s  2.30%  runtime.slicebytetostring
     0.04s  0.37% 57.67%      0.24s  2.20%  fmt.(*pp).doPrintf
         0     0% 57.67%      0.24s  2.20%  github.com/mandiant/GoReSym/debug/elf.(*File).getSymbols64
         0     0% 57.67%      0.23s  2.11%  github.com/mandiant/GoReSym/debug/gosym.(*LineTable).go12MapFiles
     0.02s  0.18% 57.85%      0.23s  2.11%  runtime.memclrNoHeapPointersChunked
         0     0% 57.85%      0.22s  2.02%  runtime.schedule
     0.21s  1.93% 59.78%      0.21s  1.93%  indexbody

@williballenthin
Copy link
Contributor Author

good catch on the dataAfterSection issue. that was obviously an oversight and I apologize for introducing the bug.

@stevemk14ebr
Copy link
Collaborator

no worries, i've introduced plenty of bugs xD

…iling when valid moduledata symbols present, fix functab parsing on huuge binaries
@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Aug 15, 2023

At this point I think this is ready to merge 😄 I've implemented an additional optimization that's been needed for a while now to avoid re-parsing all the pclntab candidates the second time through. Then I found the underlying bug which you had reported when parsing kubectl and fixed that. That issue was two parts:

  1. When we scan the pclntab candidates the first time through, we expect that finding a moduledata with the pclntab VA means that we have the correct candidate. This assumption was wrong, because we were using the optionally present binary symbols to locate the moduledata. When these symbols were valid, the pclntab VA was in fact not being verified leaving us with the wrong candidate in some situations. I've simply removed this branch so that we always scan for the moduledata using the pclntab VA so that this assumption is now true. This is slightly slower but ultimately more correct and simpler code.

  2. With very large binaries >= uint16_max function counts we'd hit an OOM guard that I'd added prior. Wild enough there does exist Go binaries in the wild with 100k plus functions, I've 3x-ed this limit to fix. The kubectl binary has ~100k functions 🤯

@stevemk14ebr stevemk14ebr marked this pull request as ready for review August 15, 2023 19:02
@stevemk14ebr stevemk14ebr merged commit 7e83100 into master Aug 15, 2023
3 checks passed
@stevemk14ebr stevemk14ebr deleted the fix/issue-29 branch August 15, 2023 19:02
@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Aug 15, 2023

#36 forgot to push these

EDIT: released https://github.com/mandiant/GoReSym/releases/tag/v2.5 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove dependency on yara GoReSym binary is linked with libyara.so.10 on Linux
2 participants