

Contributor

@basil commented Nov 14, 2024

Parsing https://updates.jenkins.io/update-center.json is extremely slow (hundreds of times slower than jq, for example). It consistently takes about 8 seconds and allocates about 170 GiB of memory over the course of parsing. Profiling showed frequent regular expression compilation, for example:

  java.lang.Thread.State: RUNNABLE
	at java.util.regex.Pattern.compile(java.base@11.0.5/Pattern.java:1757)
	at java.util.regex.Pattern.<init>(java.base@11.0.5/Pattern.java:1428)
	at java.util.regex.Pattern.compile(java.base@11.0.5/Pattern.java:1068)
	at net.sf.json.regexp.JdkRegexpMatcher.<init>(JdkRegexpMatcher.java:38)
	at net.sf.json.regexp.JdkRegexpMatcher.<init>(JdkRegexpMatcher.java:31)
	at net.sf.json.regexp.RegexpUtils.getMatcher(RegexpUtils.java:39)
	at net.sf.json.util.JSONTokener.matches(JSONTokener.java:111)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:912)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:348)
	at net.sf.json.JSONArray._fromJSONTokener(JSONArray.java:1131)
	at net.sf.json.JSONArray.fromObject(JSONArray.java:125)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:351)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:955)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:348)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:955)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:348)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:955)
	at net.sf.json.JSONObject._fromString(JSONObject.java:1145)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:162)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:132)

and frequent string allocation, for example:

   java.lang.Thread.State: RUNNABLE
	at java.lang.String.<init>(String.java:207)
	at java.lang.String.substring(String.java:1933)
	at net.sf.json.util.JSONTokener.matches(JSONTokener.java:110)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:912)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:348)
	at net.sf.json.JSONArray._fromJSONTokener(JSONArray.java:1131)
	at net.sf.json.JSONArray.fromObject(JSONArray.java:125)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:351)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:955)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:348)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:955)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
	at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:348)
	at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:955)
	at net.sf.json.JSONObject._fromString(JSONObject.java:1145)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:162)
	at net.sf.json.JSONObject.fromObject(JSONObject.java:132)

There are two issues here: the tokener repeatedly compiles a regular expression where a simple .startsWith("null") would have sufficed, and it repeatedly copies a massive string just to inspect a few characters at its start. See the flame graphs before and after the change.

[Flame graph: before]

[Flame graph: after]
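The two issues described above can be contrasted in plain Java. This is a minimal sketch, not json-lib's actual code: the class and method names are hypothetical, and the regex "null.*" (compiled with DOTALL so it is equivalent to a prefix check) is an assumed stand-in for the tokener's real pattern. It is modeled only on what the stack traces show, namely a substring followed by a Pattern.compile on every call.

```java
import java.util.regex.Pattern;

/**
 * Illustrative sketch only: NOT json-lib's JSONTokener. All names are hypothetical.
 * Contrasts "substring + Pattern.compile per call" with an in-place prefix check.
 */
public final class NullPrefixCheck {

    /** Slow: copies the remaining input and compiles a fresh regex on every call. */
    static boolean startsWithNullSlow(String source, int index) {
        String rest = source.substring(index);                  // copies source.length() - index chars
        Pattern p = Pattern.compile("null.*", Pattern.DOTALL);   // recompiled on every call
        return p.matcher(rest).matches();                        // DOTALL: '.' also matches newlines
    }

    /** Fast: compares at most four characters in place; no copy, no regex. */
    static boolean startsWithNullFast(String source, int index) {
        return source.startsWith("null", index);
    }

    public static void main(String[] args) {
        // Synthetic input (not update-center.json): an array of many small objects.
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 1000; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"a\":null}");
        }
        String json = sb.append(']').toString();

        long copied = 0;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c != '{' && c != 'n') continue;   // probe where a value may start
            copied += json.length() - i;          // chars that substring(i) would copy
            if (startsWithNullSlow(json, i) != startsWithNullFast(json, i)) {
                throw new AssertionError("results differ at index " + i);
            }
        }
        System.out.println("input chars: " + json.length()
                + ", chars copied by the substring approach: " + copied);
    }
}
```

Because the slow path copies the entire remainder of the input at each probe, its total allocation grows roughly with the number of probes times the input length, i.e. quadratically for large documents, whereas String.startsWith(prefix, offset) compares at most four characters in place and allocates nothing.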

I added a new unit test. This change has also been shipping to Jenkins users in production since Jenkins 2.456 (May) via our fork of json-lib, without any reported issues.

@aalmiray added this to the 3.2.0 milestone Nov 15, 2024
@aalmiray merged commit db76e69 into kordamp:master Nov 15, 2024
1 check passed
@aalmiray
Collaborator

Thank you 😄

@basil deleted the starts-with branch November 15, 2024 20:49
