Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow compilation for big bytes pattern #657

Closed
Alphare opened this issue Mar 24, 2020 · 4 comments · Fixed by #658
Closed

Slow compilation for big bytes pattern #657

Alphare opened this issue Mar 24, 2020 · 4 comments · Fixed by #658

Comments

@Alphare
Copy link

Alphare commented Mar 24, 2020

As promised in #524, I'm opening this issue because I found a difference in performance between Re2 (with custom bindings) and regex when compiling the regex itself.

Here is the example:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eb83255739973d40d16997574ba9dd35

These are the exact bytes that are constructed after parsing Netbeans' old .hgignore and this is the exact way I am using them.

regex takes about 5ms to build the regex while Re2 takes about 1ms on a very stable, 4 core machine.

I cannot offer you an easy reproduction for Re2, however should you want to try to see the difference within Mercurial I can help.

I am using rustc version 1.34.2 and regex version 1.3.5 on a Linux amd64 machine.

Note: I've also seen a small performance difference when using is_match in Mercurial's working directory traversal code, but that is a separate issue probably.

@BurntSushi
Copy link
Member

Thanks. It would be useful to just get the actual regex next time. I had to spend a bit of time making it look nicer:

(?x)
.*~$
|.*/build$
|.*/dist$
|.*/nbproject/private$
|.*/nbproject/sqe.properties$
|.*/nbproject/findbugs.settings$
|.*/nbproject/.*\.mk$
|.*/nbproject/.*\.bash$
|.*/test/(coverage
|lib
|results
|work)$
|.*derby\.log
|.*\.class$
|.*\.DS_Store$
|.*\.orig$
|.*\.rej$
|.*\.tmp\.$
|.*\.swp$
|^tags$
|.*/gensrc$
|.*\.ant-targets-build\.xml$
|^\.classpath$
|^\.project$
|^\.externalToolBuilders/.*$
|^hs_err_pid[0-9]*\.log
|^[^/]+/external/[^/]+\.(zip
|jar
|gz
|bz2
|gem
|dll
|xpi
|dylib
|so)$
|^apisupport.osgidemo/osgidemo/netbinox/.*$
|^o.apache.tools.ant.module/external/lib$
|^classfile/\.nbintdb$
|^classfile/classfile\.jar$
|^cnd.apt/src/org/netbeans/modules/cnd/apt/impl/support/generated$
|^cnd.apt/src/org/netbeans/modules/cnd/apt/support/APTTokenTypes\.java$
|^asm/src/org/netbeans/modules/asm/base/generated$
|^cnd.asm/src/org/netbeans/modules/cnd/asm/base/generated$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/generated$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/impl/services/evaluator/parser/generated$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranLexicalPrepass.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranParser.g$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranParser2.g$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranParserActionFactory.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranStream.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranToken.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranTokenStream.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/IActionEnums.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/IFortranParserAction.java$
|^db/javahelp/org/netbeans/modules/usersguide$
|^dbschema/src/org/netbeans/modules/dbschema/jdbcimpl/resources/templates\.jar$
|^db.sample/test/unit/.*\.properties$
|^db.metadata.model/test/unit/data/mysql-connector-java-5.1.5-bin.jar$
|^db.sql.visualeditor/src/org/netbeans/modules/db/sql/visualeditor/parser/.*\.java$
|^editor.lib2/test/\.ant-targets-build\.xml$
|^editor/test/unit/src/org/netbeans/modules/editor/java/data/.*\.jar$
|^performance/enterprise/test/.*\.sh$
|^performance/enterprise/test/qa-functional/data/PerformanceTestData$
|^performance/enterprise/test/qa-functional/data/PerformanceTestData/nbproject/build-impl\.xml$
|^performance/enterprise/test/qa-functional/data/PerformanceTestData/nbproject/genfiles\.properties$
|^performance/enterprise/test/qa-functional/data/PerformanceTestWebApplication/nbproject/build-impl\.xml$
|^performance/enterprise/test/qa-functional/data/PerformanceTestWebApplication/nbproject/genfiles\.properties$
|^performance/enterprise/test/qa-functional/src/gui/TempTest\.java$
|^performance/enterprise/test/unit/data/PerformanceTestData$
|^swingapp/src/org/netbeans/modules/swingapp/resources/BasicShellApp\.zip$
|^swingapp/src/org/netbeans/modules/swingapp/resources/CRUDShellApp\.zip$
|^ide/launcher/windows/netbeans\.res$
|^ide.kit/test/qa-functional/data/tomcat6/
|^ide.kit/test/whitelist/temp/
|^identity.ant/nbproject/suite\.properties$
|^identity.profile.api/src/org/netbeans/modules/identity/profile/api/configurator/impl/file/jaxb$
|^identity.samples/src/org/netbeans/modules/identity/samples/resources/.*\.zip$
|^installer/default$
|^installer/.*-subst\.xml$
|^installer/asbundle/customcode/lib$
|^installer/cdc/customcode/lib$
|^installer/cnd/customcode/lib$
|^installer/coreide/customcode/lib$
|^installer/infra/build/nbi_all$
|^installer/j2ee/customcode/lib$
|^installer/jbossbundle/customcode/lib$
|^installer/jdkbundle/customcode/lib$
|^installer/jnlp/release$
|^installer/jnlp/default\.keystore$
|^installer/jnlp/jars$
|^installer/lib/packjars/classes$
|^installer/mobility/customcode/lib$
|^installer/packages/solaris/i386$
|^installer/packages/solaris/sparc$
|^installer/packages/ubuntu/netbeans-5.5/netbeans-5_5_1\.tar\.gz$
|^installer/packages/ubuntu/netbeans-5.5/netbeans-5_5_1-platform\.tar\.gz$
|^installer/packages/ubuntu/netbeans-5.5/nb55-jumbodocpack\.zip$
|^installer/packages/ubuntu/netbeans-5.5/netbeans-5_5_1-javadoc\.tar\.gz$
|^installer/profiler/customcode/lib$
|^j2ee\.dd/src/.+/impl/.+/model(_[0-9]+)*(_frag)?$
|^j2ee.dd/test/unit/data/web\.xml$
|^j2ee.dd/test/unit/data/web\.diff$
|^j2ee.dd/test/unit/data/invalid/web\.xml$
|^j2ee.toplinklib/external/toplink$
|^java.editor/test/unit/src/org/netbeans/modules/editor/java/data/.*\.jar$
|^java.helpset/javahelp/org/netbeans/modules/java/helpset/docs/gui/gui_template\.html$
|^java.navigation/test/run\.sh$
|^jellytools/test/qa-functional/data/SampleProject/build\.xml$
|^jellytools/test/qa-functional/data/SampleProject/nbproject/build-impl\.xml$
|^jellytools/test/qa-functional/data/SampleProject/nbproject/genfiles\.properties$
|^jemmy/run$
|^lib.terminalemulator/demosrc/telnet/de$
|^lexer/gen/build$
|^mvd/lib$
|^mvd/graphlib/classes$
|^vmd.game/\.classpath$
|^vmd.game/\.project$
|^vmd.components.midp/netbeans_midp_components_basic/dist/doc$
|^vmd.components.midp/netbeans_midp_components_basic/dist/lib$
|^vmd.components.midp/netbeans_midp_components_basic/dist/netbeans_midp_components_basic\.jad$
|^vmd.components.midp.pda/netbeans_midp_components_pda/dist/doc$
|^vmd.components.midp.pda/netbeans_midp_components_pda/dist/lib$
|^vmd.components.midp.pda/netbeans_midp_components_pda/dist/netbeans_midp_components_pda\.jad$
|^vmd.components.midp.wma/netbeans_midp_components_wma/dist/doc$
|^vmd.components.midp.wma/netbeans_midp_components_wma/dist/lib$
|^vmd.components.midp.wma/netbeans_midp_components_wma/dist/netbeans_midp_components_wma\.jad$
|^performance/mobility/test/.*\.sh$
|^mobility.project/test/x$
|^nbbuild/netbeans$
|^nbbuild/NetBeans-.*\.zip$
|^nbbuild/testuserdir$
|^nbbuild/NetBeans-.*\.log$
|^nbbuild/l10ndist$
|^nbbuild/nbms$
|^nbbuild/user\.build\.properties$
|^nbbuild/moduledefs-tmp\.properties$
|^nbbuild/site\.build\.properties$
|^nbbuild/external/findbugs/lib$
|^nbbuild/external/findbugs/plugin$
|^nbi/engine/native/jnilib/solaris-sparc/\.make\.state\.Debug$
|^nbi/engine/native/jnilib/solaris-x86/\.make\.state\.Debug$
|^nbi/engine/native/jnilib/windows/vc80\.idb$
|^nbi/engine/native/jnilib/windows/dist/windows-x86\.exp$
|^nbi/engine/native/jnilib/windows/dist/windows-x86\.lib$
|^nbi/engine/native/jnilib/windows/dist/windows-x64\.exp$
|^nbi/engine/native/jnilib/windows/dist/windows-x64\.lib$
|^nbi/infra/build/engine/\.ant-lib$
|^nbi/infra/build/jvm/cache$
|^nbi/infra/build/jvm/build-private\.sh$
|^nbi/infra/sandbox/src$
|^performance/lib$
|^performance/jtrace/win32/jtrace\.dll$
|^performance/jtrace/win32/jtrace\.exp$
|^performance/jtrace/win32/jtrace\.lib$
|^performance/performancetestutilities/nbproject/ide-file-targets\.xml$
|^performance/test/.*\.sh$
|^performance/test/perf/lib$
|^performance/test/perf/report$
|^performance/test/perf/baseline$
|^performance/test/qa-functional/src/gui/.*Temp.*$
|^performance/test/qa-functional/src/gui/memory_usage/.*Temp.*$
|^performance/test/reports/classes$
|^performance/test/reports/output\.xml$
|^performance/test/reports/perfresults\.jar$
|^performance/threaddemo/user\.build\.properties$
|^php.doc/libdoc/doc\.gzip$
|^php.doc/libdoc/doc\.html$
|^j2ee.sun.appsrv81/\.keystore$
|^tomcat5/\.swo$
|^performance/uml/test/.*\.sh$
|^performance/uml/test/qa-functional/src/gui/TempTest\.java$
|^versioning/clearcase/JavaApplication21/\.hgignore$
|^versioning/clearcase/JavaApplication21/\.hg$
|^versioning/test/qa-functional/data/tck\.properties$
|^web\.client\.tools\.internetexplorer/nativesrc/(activescript
|Release
|Debug
|VSIP8.0)$
|^web\.client\.tools\.internetexplorer/nativesrc/NetBeansExtension\.(h
|ncb
|suo
|vcproj\..+)$
|^web\.client\.tools\.internetexplorer/nativesrc/NetBeansExtension_(i
|p)\.c$
|^web\.client\.tools\.internetexplorer/nativesrc/dlldata\.c$
|^web.jsf.navigation/\.externalToolBuilders$
|^web.jsf.navigation/\.project$
|^web.debug/test/run\.sh$
|^websvc.jaxrpc/src/org/netbeans/modules/websvc/jaxrpc/dev/dd/gen$
|^websvc.jaxrpc/src/org/netbeans/modules/websvc/wsdl/config/impl$
|^websvc.restkit/nbms$
|^websvc.saas.api/src/org/netbeans/modules/websvc/saas/model/jaxb$
|^websvc.saas.api/src/org/netbeans/modules/websvc/saas/model/oauth$
|^websvc.saas.api/src/org/netbeans/modules/websvc/saas/model/wadl$
|^websvc.wsitmodelext/coverage$
|^xml.tax/lib/test/qa-functional/src/org/netbeans/tax/data/Delme\.xml$
|^xml/test/qa-functional/src$
|^xml/test/qa-lib/xml-test-lib\.jar$
|^xml.tools/test/qa-functional/src/org/netbeans/modules/xsl/action/data/out/document\.html$
|^hibernate/test/unit/data/db-derby-10\.2\.2\.0-bin$
|.*\.orig\..*$
|.*\.chg\..*$
|.*\.conflict\~$
|^nbbuild/jna\.keystore$
|^nbi/engine/testData$
|^o\.n\.bootstrap/launcher/windows/nbexec_exe\.res$
|^installer/mac/newbuild/dist_dmg$
|^cordova\.platforms\.ios/external/ios-sim$
|^cnd\.discovery/tools/BuildTrace/\.dep\.inc$

BurntSushi added a commit that referenced this issue Mar 24, 2020
This fixes a pretty bad performance bug in the NFA compiler. In
particular, c_char was implemented by diverting to c_class, which is
correct, but rather costly to do for every single character in a regex.
This causes way more things than necessary to go through the class
compilation infrastructure, which includes the suffix caching.

We fix this by just special casing c_char. This speeds up regex
compilation in #657 by around 30%.

Fixes #657
@BurntSushi
Copy link
Member

Once I profiled it, the problem was pretty clear: the compiler was using the class compilation infrastructure to handle even plain literal characters. In big regexes like this, that adds up quite a bit. I opened #658 to fix that. In my test, it got about a 30% improvement, so I think that should narrow the gap considerably for this particular case.

Once that was fixed, I didn't see any other obvious hotspots that were easy to fix. Interestingly, a good chunk of the time was actually spend in the parsing and translation steps. I haven't spent much (if any) time optimizing that, so that may be the next place to go.

I ported your benchmark to regex-automata's NFA compiler, and it does seem to be a bit faster there. Although I don't think I did a completely fair comparison.

BurntSushi added a commit that referenced this issue Mar 24, 2020
This fixes a pretty bad performance bug in the NFA compiler. In
particular, c_char was implemented by diverting to c_class, which is
correct, but rather costly to do for every single character in a regex.
This causes way more things than necessary to go through the class
compilation infrastructure, which includes the suffix caching.

We fix this by just special casing c_char. This speeds up regex
compilation in #657 by around 30%.

Fixes #657
@BurntSushi
Copy link
Member

The fix for this is on crates.io in regex 1.3.6.

@Alphare
Copy link
Author

Alphare commented Mar 25, 2020

It would be useful to just get the actual regex next time. I had to spend a bit of time making it look nicer
Sorry, I was trying to make sure that I didn't accidentally escape something. Net time I'll give you the pattern directly.

Once I profiled it, the problem was pretty clear: the compiler was using the class compilation infrastructure to handle even plain literal characters. In big regexes like this, that adds up quite a bit. I opened #658 to fix that. In my test, it got about a 30% improvement, so I think that should narrow the gap considerably for this particular case.

I am seeing the same performance improvement on my side.

Once that was fixed, I didn't see any other obvious hotspots that were easy to fix. Interestingly, a good chunk of the time was actually spend in the parsing and translation steps. I haven't spent much (if any) time optimizing that, so that may be the next place to go.

Ok, thank you for the quick fix. I will be keeping my eye on new releases of regex. :)

The fix for this is on crates.io in regex 1.3.6.

regex was introduced as the default option in the experimental Rust extensions of Mercurial for ignore mechanism in the current default branch. It should in the next (5.4) release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants