-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
swap out yara pattern scanning for binary regexps #30
Conversation
tests pass when i run them locally. |
unfortunately the performance of the binaryregexp is significantly worse than yara. in the same environment, the test cases on master run in 67s while this PR takes 346s (5x). only now am i seeing that Go's regular expression implementation is really slow: https://github.com/mariomka/regex-benchmark#performance there are certainly some optimizations we can make to avoid doing so much regexp scanning; i will think on it and try to propose which strategies we might try first. |
Awe that's too bad to hear. Unfortunately I switched to yara originally for performance reasons so this makes some sense. |
try to improve performance by doing a single pass over the data, looking for any of the pattern candidates, and fetching these via the SubexpIndex functionality.
initially in this PR i was lazy with the regular expressions: i ran five expressions over the data separately, one for each pattern: 6439aec#diff-5c52596a150c1cf395c3609f471a82dbdeca26bccb840730182b301007cf2699R87 i assumed this was an obvious place to optimize, so i put all five patterns into a single pattern like Line 96 in d09fb19
surprisingly, the performance doesnt really change much. perhaps the regex engine does 5x as much work during the merged pattern scan? ugh. the profile data looks like this, currently:
|
one thing thats suspicious is how often the scanner routine is called: @williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ go test -v -run "^TestAllVersions/117/testproject_lin$"
findModuleInitPCHeader: len(data)= 1977516
findModuleInitPCHeader: len(data)= 1463668
findModuleInitPCHeader: len(data)= 1245252
findModuleInitPCHeader: len(data)= 1244874
findModuleInitPCHeader: len(data)= 1243622
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 880038
findModuleInitPCHeader: len(data)= 880006
findModuleInitPCHeader: len(data)= 812998
findModuleInitPCHeader: len(data)= 782262
findModuleInitPCHeader: len(data)= 589998
findModuleInitPCHeader: len(data)= 568678
findModuleInitPCHeader: len(data)= 568397
findModuleInitPCHeader: len(data)= 457639
findModuleInitPCHeader: len(data)= 436120
findModuleInitPCHeader: len(data)= 436078
findModuleInitPCHeader: len(data)= 234232
findModuleInitPCHeader: len(data)= 131110
findModuleInitPCHeader: len(data)= 95205
findModuleInitPCHeader: len(data)= 95105
findModuleInitPCHeader: len(data)= 45065
findModuleInitPCHeader: len(data)= 1977516
findModuleInitPCHeader: len(data)= 1463668
findModuleInitPCHeader: len(data)= 1245252
findModuleInitPCHeader: len(data)= 1244874
findModuleInitPCHeader: len(data)= 1243622
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 1243534
findModuleInitPCHeader: len(data)= 880038
findModuleInitPCHeader: len(data)= 880006
findModuleInitPCHeader: len(data)= 812998
findModuleInitPCHeader: len(data)= 782262
findModuleInitPCHeader: len(data)= 589998
findModuleInitPCHeader: len(data)= 568678
findModuleInitPCHeader: len(data)= 568397
findModuleInitPCHeader: len(data)= 457639
findModuleInitPCHeader: len(data)= 436120
findModuleInitPCHeader: len(data)= 436078
findModuleInitPCHeader: len(data)= 234232
findModuleInitPCHeader: len(data)= 131110
findModuleInitPCHeader: len(data)= 95205
findModuleInitPCHeader: len(data)= 95105
findModuleInitPCHeader: len(data)= 45065
--- PASS: TestAllVersions (7.64s)
--- PASS: TestAllVersions/117/testproject_lin (7.48s)
PASS
ok github.com/mandiant/GoReSym 7.658s for the single ELF file the scanner is invoked 44 times. perhaps we can invoke the scanner once across the entire file and re-use the results, since they shouldn't change (GoReSym doesn't change the input file).
for _, sec := range f.elf.Sections {
...
data := f.elf.DataAfterSection(sec)
...
sigResults := findModuleInitPCHeader(data, sec.Addr)
}
so there are two things to do:
scan file data oncethis will take a little work, since some of the pointers that are recovered are relative to the section being scanned ( scan pcln candidates oncethe pcln candidate scan doesn't depend on the Go text base address, so when the scan is restarted with a valid Go text base address, we should re-use the results from before. we can cache these in the Entry, perhaps. Line 218 in ad76d74
in ebc220a we implement this caching and find that the runtime improves by 50%, as expected! |
build_all.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please restore this, we don't need the CGO linking lines anymore but this exists so that I can automatically cross-compile all versions of GoReSym for all platforms in one command when it's time to cut releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only change required, looks great otherwise. Thanks for exploring the removal of yara, I'm happy with the result here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok, sure no problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
this PR is still too slow to merge. i'm going to work on performance optimizations before i recommend that this be merged. i expect to be able to optimize the outer scanning loops such that the net performance change here is within 50% of today's running time. the PR is around 7x slower right now. regex scanning will remain slow, nothing we can do about that, but we'll just scan much less data. the result is a GoReSym of similar performance but no linking difficulties. i'll do most of the perf work in other branches so as not to confuse the diff here. |
Sounds good. Let me know if you'd like to discuss this first before you put too much work in. It's possible we may be able to re-visit the core algorithm itself. I've focused on correctness so far, optimizations have been mostly ignored till now except for when necessary to keep performance acceptable. Not to mention it's spaghettified a little as samples exercising new edge cases have been found and fixed. |
great, will do. i had thought i would have time immediately but there's a lot of GSoC stuff to close out. will do this as soon as i can since i think this will help VT. |
I had an idea for a possible optimization we could try. I read a blog by burntsushi a while ago on techniques he utilized for efficient regex searching in his library. One of the simple techniques used was to not start a regex scan until after first performing a fast prefix scan to create a candidate match set, we should try this. We could adapt the yara to regex routine to extract the 'longest fixed sequence' of bytes per pattern and scan over all file bytes once per pattern matching this sequence, and return a candidate regex match of length [-len(regex), len(regex)]. We then perform binary regex over these candidates. We use a window of negative length to support when the 'longest fixed sequence' is in the middle of a pattern rather than the start. This makes an assumption that the binary regex crate does not already do this, or even if it does then the second assumption that go's memcmp would be faster than the compare the regex crate does over enough data. A second issue we have is that we have multiple patterns. A hopefully easy solution would be to somehow reliably determine which arch (independent of os) a binary is using and only scan for that pattern. If we cannot do that reliably then we will need to experiment on if we should do multiple candidate regex scans over the data per signature or scan once with multiple comparisons at each step. Thoughts? |
For (2) we know from one of the other PRs that Go's string.Index() routine is very fast, so I do expect this to be a lot faster than the regex logic. When making changes in this part of the code I don't feel very confident because I'm not sure if any test cases cover these scenarios. Can you confirm and/or can we find a few more sample files to exercise the pattern scanning functionality? |
This feature exists mostly to combat |
I've implemented the needle optimization, it seems to have large increases to performance based on limited empirical testing 😄 Need to do more testing and benchmarking still but seems like a win!
|
After fixing up the dataAfterSection cache to work for all OS's we get another 3x speedup:
|
good catch on the dataAfterSection issue. that was obviously an oversight and I apologize for introducing the bug. |
no worries, i've introduced plenty of bugs xD |
…iling when valid moduledata symbols present, fix functab parsing on huuge binaries
At this point I think this is ready to merge 😄 I've implemented an additional optimization that's been needed for a while now to avoid re-parsing all the pclntab candidates the second time through. Then I found the underlying bug which you had reported when parsing
|
#36 forgot to push these EDIT: released https://github.com/mandiant/GoReSym/releases/tag/v2.5 🥳 |
closes #29
closes #23