Address 2.0 performance issue #321
Comments
Two things:
- first, be sure you’re apples-to-apples with compiler optimization flags between the two builds (e.g., -O2). This is obvious, but obvious things make good checklist items, and the number of times I’ve had devs waste hours/days on performance degradation issues only to smack their heads and exclaim “oh I wasn’t building in release mode”…
- second, use source-level profiling (gprof? my attempt with Google's libprofiler didn't work well, though it's finicky) between the two builds so that you know where b_e 2.0 is spending its time. Source-level profiling would immediately confirm your hypothesis about sbuf, and may also reveal unexpected details (e.g., an inordinate amount of time spent waiting on mutexes).
Desperately hoping for schools to reopen soon so I can assist…
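If both gprof and libprofiler end up being too finicky, a crude fallback that still answers "where is BE2.0 spending its time?" is a scoped timer dropped into the suspect functions (the sbuf accessors, scan_json, and so on). This is just a generic std::chrono sketch, not anything that already exists in the bulk_extractor tree:

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <mutex>
#include <string>

// Accumulates wall-clock time per label; report() prints the totals at shutdown.
class scoped_timer {
public:
    explicit scoped_timer(std::string label)
        : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}
    ~scoped_timer() {
        const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_).count();
        std::lock_guard<std::mutex> lock(mtx());
        totals()[label_] += static_cast<double>(ns);
    }
    static void report() {
        std::lock_guard<std::mutex> lock(mtx());
        for (const auto &kv : totals())
            std::printf("%-24s %10.3f s\n", kv.first.c_str(), kv.second / 1e9);
    }
private:
    static std::map<std::string, double> &totals() { static std::map<std::string, double> t; return t; }
    static std::mutex &mtx() { static std::mutex m; return m; }
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Usage: put `scoped_timer t("scan_json");` at the top of a suspect function
// and call scoped_timer::report() once after the run finishes.
```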
On Jan 5, 2022, at 8:29 AM, Simson L. Garfinkel wrote:
BE2.0 is now highly reliable and compiles on AMD. Now it is time to address performance issues.
Something about the handling of data is much slower on 2.0. It may be all of the moderated sbuf access, and if so, we will need a way to disable all of the runtime checks.
Reading just the ubnist1 disk image with -x all, BE1.6 reads in 4.2 seconds on my macMini and BE2.0 reads in 4.0 seconds.
Running with -E json with 6 cores produces speeds of BE16: 5.5, 5.5, 5.4 and BE2.0: 9.0, 9.0, 9.0
Running with -E json -j1 produces BE16: 18.9, 18.7 and BE2.0: 42, 42
So we have an easy test case with a scanner that's easy to work with. This should be relatively straightforward to debug. @jonstewart if you have any suggestions, please let me know. Otherwise I'll probably follow this approach:
Add a DEBUG flag for scan_json to have it just return, to have it compute the histogram with the sbuf machinery and return, and to have it access every byte with the sbuf machinery and return.
Create a performance benchmark script that can reliably benchmark two executables and produce summary results.
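A rough sketch of what that DEBUG flag could look like. Everything here is illustrative rather than the real BE2.0 code: the SCAN_JSON_DEBUG environment variable is a made-up knob, and the tiny struct stands in for the actual sbuf_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Stand-in for the real sbuf_t: just enough to show the idea.
struct fake_sbuf {
    std::vector<uint8_t> buf;
    size_t pagesize;                                   // the page proper (margin excluded)
    uint8_t at(size_t i) const { return buf.at(i); }   // "moderated" (bounds-checked) access
};

// 0 = return immediately, 1 = touch every byte through the checked accessor,
// anything else = do the real scanner work.
static int debug_mode() {
    const char *v = std::getenv("SCAN_JSON_DEBUG");    // hypothetical debug knob
    return v ? std::atoi(v) : 2;
}

void scan_json_sketch(const fake_sbuf &sbuf) {
    const int mode = debug_mode();
    if (mode == 0) return;                  // isolates scanner-dispatch overhead
    if (mode == 1) {                        // isolates the cost of moderated sbuf access
        uint64_t sum = 0;
        for (size_t i = 0; i < sbuf.pagesize; ++i) sum += sbuf.at(i);
        volatile uint64_t sink = sum;       // keep the loop from being optimized away
        (void)sink;
        return;
    }
    // mode >= 2: the real JSON detection and histogram work would run here.
}
```

Comparing wall-clock time at mode 0, mode 1, and the full scanner then separates dispatch overhead, sbuf-access overhead, and the scanner's own work.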
Gprof is the #2 approach; #1 is simple code inspection, and what I really suspect is the super-safe sbuf system I created. Too many runtime checks… But now I can work on efficiency, given that I have correctness.
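If the runtime checks do turn out to be the cost, the usual way to keep them without paying for them in release builds is to funnel every check through one macro that a build flag compiles down to nothing. A generic sketch, with class and flag names that are mine rather than sbuf_t's:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Build with -DSBUF_UNCHECKED (or fold into NDEBUG) to remove the checks entirely.
#ifdef SBUF_UNCHECKED
#  define SBUF_RANGE_CHECK(i, n) ((void)0)
#else
#  define SBUF_RANGE_CHECK(i, n) \
      do { if ((i) >= (n)) { std::fprintf(stderr, "buffer overrun: %zu >= %zu\n", \
                                          (size_t)(i), (size_t)(n)); std::abort(); } } while (0)
#endif

// Stand-in for a "super-safe" buffer class with moderated access.
class checked_buf {
public:
    explicit checked_buf(std::vector<uint8_t> data) : data_(std::move(data)) {}
    uint8_t operator[](size_t i) const {
        SBUF_RANGE_CHECK(i, data_.size());   // zero cost when SBUF_UNCHECKED is defined
        return data_[i];
    }
    size_t size() const { return data_.size(); }
private:
    std::vector<uint8_t> data_;
};
```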
Well look at this:
configure: *************************************
configure: *************************************
configure: PACKAGE_NAME: BULK_EXTRACTOR
configure: PACKAGE_VERSION: 2.0.0-beta3
configure: CC: gcc
configure: CXX: g++ -std=c++17
configure: CPPFLAGS: -I/opt/local/include -I/usr/local/include -DUTC_OFFSET=+0000
configure: CFLAGS: -g -g -O3 -fPIC
configure: CXXFLAGS: -g -pthread -Wall -MD -Wpointer-arith -Wshadow -Wwrite-strings -Wcast-align -Wredundant-decls -Wdisabled-optimization -Wfloat-equal -Wmultichar -Wmissing-noreturn -Woverloaded-virtual -Wsign-promo -funit-at-a-time -Weffc++ -fPIC
configure: LIBS: -lewf -lstdc++ -lexpat -lz -ldl -ltermcap -lsqlite3 -lexpat
configure: LDFLAGS: -L/opt/local/lib -L/usr/local/lib
Turns out that the rewritten autoconf file was not putting any optimization flag into CXXFLAGS, so the C++ code was being built without the optimizer. CXXFLAGS fixed.
I've now gone through scan_aes and discovered it was going to the end of the buffer rather than the end of the page. I changed that and it's going much faster now too.
🎉
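The scan_aes fix above is the classic page-versus-margin loop bound: the buffer a scanner sees is the page plus a margin, and the margin exists only so that a match which starts inside the page can finish. New matches shouldn't start past pagesize. In stand-in terms (the member names here mirror the idea, not necessarily the exact sbuf fields):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in: a scanner's buffer is the page proper plus a trailing margin.
struct page_buf {
    const uint8_t *buf;
    size_t bufsize;    // page + margin
    size_t pagesize;   // the page proper; new matches should not start beyond this
};

size_t count_candidates(const page_buf &sb, size_t window = 16) {
    size_t hits = 0;
    // Wrong: `i + window <= sb.bufsize` scans the whole margin for new starts,
    // so the same bytes get re-examined when the next page comes around.
    for (size_t i = 0; i < sb.pagesize && i + window <= sb.bufsize; ++i) {
        // A real test (e.g. an AES key-schedule check) on buf[i .. i+window-1]
        // goes here; it may read into the margin, it just can't *start* there.
        if (sb.buf[i] == 0x00) ++hits;     // placeholder predicate
    }
    return hits;
}
```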
Discovery #2: margin size got inadvertently increased from 4MiB to 16MiB:
(base) ***@***.*** debug % ~/gits/bulk_extractor-1.6/src/bulk_extractor -h | grep margin
-g NN - specify margin (default 4194304)
(base) ***@***.*** debug %
vs.
~/gits/bulk_extractor/src/bulk_extractor -j1 -1 -E email -o 23.51s user 0.44s system 104% cpu 22.889 total
(base) ***@***.*** ~ % ~/gits/bulk_extractor/src/bulk_extractor -h | grep margin
-g, --marginsize arg margin size in bytes (default: 16777216)
(base) ***@***.*** ~ %
And, get this --- the flex scanners don't respect the pagesize, and appear to scan to the end of the margin.
- Modify the flex scanners so that if a match starts inside the margin, it throws an exception that ends the flex scanner.
- Update the BE2 codebase so that the default margin is 4MiB and not 16MiB.
Hasn't that always been a problem with the flex scanners? They have to keep reading past the pagesize to account for any boundary hits, but there's not a great way of telling them to stop looking for new hits. That's a bit of cleverness in lightgrep of which I'm unduly proud: you can provide more input to lg_closeout_search() to resolve any pending hits from previous input, but it won't start new hits, and once everything pending has been resolved it exits. Typically that just needs a couple of bytes of input to get back to the zero state.
Is it possible to recognize the transition to the zero state with flex, so you could throw an exception early and not have to wait/hope for a full matching hit in the margin?
I just found a way to tell them to stop when the regular expression scanner crosses into the margin. You'll like it.
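I don't know what the actual change looks like, but one common way to get this behavior out of a flex-generated scanner is to check, at the start of every match (flex's YY_USER_ACTION hook runs before each rule's action), whether the match begins past pagesize, and bail out if so. A conceptual sketch with the lexer itself abstracted away:

```cpp
#include <cstddef>
#include <stdexcept>

// Thrown when a match would begin inside the margin; the wrapper that drives
// the generated lexer catches it and treats the page as finished.
struct margin_stop : std::runtime_error {
    margin_stop() : std::runtime_error("match starts in margin") {}
};

// Call at the start of every match (e.g. from YY_USER_ACTION) with the offset
// of the match start relative to the beginning of the buffer.
inline void check_margin(size_t match_start, size_t pagesize) {
    if (match_start >= pagesize) throw margin_stop();
}

// Driver sketch:
//   try {
//       while (yylex(scanner) != 0) { /* record the feature ... */ }
//   } catch (const margin_stop &) {
//       // expected: the next match began past pagesize; stop scanning this page
//   }
```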
Before implementing the new margin hack (and before shortening the margin from 16MiB to 4MiB): 138.56s of user time.
After: 80.65s.
That's still not as good as BE1.6, probably because of the bigger margin size... Check out 4007d30
Now with the margin lowered from 16MiB to 4MiB:
LOOK AT THAT -- it's even faster than version 1.6. About time. That's the exception logic. Now to add it to scan_accts.
Ick. The exception causes allocated values not to be freed by the lexer (see https://github.com/simsong/bulk_extractor/runs/4723066087?check_suite_focus=true). Looks like I'll need to just move the point to the end.
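If "move the point to the end" means repositioning the lexer's input so it runs off the end and returns normally, instead of unwinding through generated code that never releases what its actions allocated, the margin check turns from a throw into something like this sketch (names again are placeholders):

```cpp
#include <cstddef>

// Rather than throwing out of yylex(), starve it: push the read position to the
// end of the buffer so the next refill sees EOF, yylex() returns 0 on its own,
// and the lexer's normal cleanup path still runs.
struct scan_state {
    size_t pos;        // current position handed to the lexer's input routine
    size_t pagesize;   // end of the page proper
    size_t bufsize;    // end of page + margin
};

// Called at the start of every match; returns false once the page is done.
inline bool check_margin(scan_state &st, size_t match_start) {
    if (match_start >= st.pagesize) {
        st.pos = st.bufsize;   // nothing left to read; lexer terminates normally
        return false;
    }
    return true;
}
```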
I've run many performance tests now with BE1.6 vs. BE2.0beta4 (which includes the flex improvements noted above), all done with the domexusers disk.
With the recursive analysis, the extra time spent in the big scanners (accts, net, email) gets multiplied several times over. So I'm nearly there: the new infrastructure is amazing, and by throwing a lot of cores at the thing, BE2 is faster than BE1. Once I get the net, email, and accts scanners faster, BE2 will be much faster than BE1. So I'm going to close this thread and open a new one for net, accts, and email.