Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop recompiling pattern at runtime #1642

Closed
rmannibucau opened this issue Jan 16, 2024 · 4 comments
Closed

Stop recompiling pattern at runtime #1642

rmannibucau opened this issue Jan 16, 2024 · 4 comments
Labels

Comments

@rmannibucau
Copy link

Is your feature request related to a problem? Please describe.
https://github.com/plantuml/plantuml/blob/master/src/net/sourceforge/plantuml/text/TLineType.java#L56 uses pattern matching but not from a precompiled pattern, from a new pattern for each call which makes the overall processing slow when having a lot of of "big" graphs.

Describe the solution you'd like
Move all regexes to static Pattern instance and just use the Matcher#matches instead of String.matches method.

Describe alternatives you've considered
Fork?

Additional context
Getting that from asciidoctorj integration - not important but gives the context.
Here is the interesting thread dump stack:

"main" #1 prio=5 os_prio=0 cpu=144024.33ms elapsed=308.49s tid=0x00007f3fe4023cd0 nid=0x27 runnable  [0x00007f3feba21000]
   java.lang.Thread.State: RUNNABLE
	at java.util.regex.Pattern.compile(java.base@17.0.3/Pattern.java:1758)
	at java.util.regex.Pattern.<init>(java.base@17.0.3/Pattern.java:1430)
	at java.util.regex.Pattern.compile(java.base@17.0.3/Pattern.java:1069)
	at java.util.regex.Pattern.matches(java.base@17.0.3/Pattern.java:1174)
	at java.lang.String.matches(java.base@17.0.3/String.java:2842)
	at net.sourceforge.plantuml.text.TLineType.getFromLineInternal(TLineType.java:62)
@arnaudroques
Copy link
Contributor

Thanks for the feedback: this should be fixed with last snapshot (1.2024.0beta2).

We would appreciate it if you could confirm whether you notice any changes. While we suspect that regexes may not have been recompiled on every call due to internal caching mechanisms for frequently used regex patterns, we cannot confirm this with certainty. Therefore, implementing this change was a prudent decision.

@rmannibucau
Copy link
Author

Hi @arnaudroques , I'm not sure of which cache you are talking about (there is none in the JVM) but it is fixed now, plantuml is no more the slowest part in the site rendering, thanks a lot.

@arnaudroques
Copy link
Contributor

arnaudroques commented Jan 18, 2024

Many thanks for the feedback, I was wrong about my caching statement. Glad it works better now :-)

Since we are talking about perf, we could easily add another optimisation here: most of the regex starts with ^\s*! so we could first do a match test on ^\s*! and only check the full list of regexes in case of positive match.
Since most of the time most, lines do not start with ! it probably makes sense.

Any though on this one?

@rmannibucau
Copy link
Author

Looks not bad but it is mainly compilation cost which was huge so guess you fixed the 80%, up to you to play with the remaining 20 ;).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants