Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve json parser performance #5538

Closed
wants to merge 10 commits into from

Conversation

liguozhong
Copy link
Contributor

@liguozhong liguozhong commented Mar 4, 2022

What this PR does / why we need it:
Improve json parser performance, from 2913 ns/op to 1132 ns/op

Which issue(s) this PR fixes:
Fixes [#4328]
reimplement pr: #5417

Special notes for your reviewer:
New Benchmark_Parser

Benchmark_Parser/json/no_labels_hints-12            250380        4772 ns/op
Benchmark_Parser/json/labels_hints-12               962606        1172 ns/op
Benchmark_Parser/jsonParser-not_json_line/no_labels_hints-12          81525914          13.54 ns/op
Benchmark_Parser/jsonParser-not_json_line/labels_hints-12              6865554         155.3 ns/op

Old Benchmark_Parser

Benchmark_Parser/json/no_labels_hints-12            299535        3892 ns/op
Benchmark_Parser/json/labels_hints-12               355948        2913 ns/op
Benchmark_Parser/jsonParser-not_json_line/no_labels_hints-12          38245290          30.84 ns/op
Benchmark_Parser/jsonParser-not_json_line/labels_hints-12             39412749          31.16 ns/op

key change.
use "github.com/buger/jsonparser" replace "github.com/json-iterator/go"
image

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

@liguozhong liguozhong requested a review from a team as a code owner March 4, 2022 09:47
@cyriltovena
Copy link
Contributor

Can you share the allocations difference ?

@liguozhong
Copy link
Contributor Author

liguozhong commented Mar 7, 2022

Can you share the allocations difference ?

@cyriltovena hi ,I Replaced jsoniter "github.com/json-iterator/go" in "JSONParser" struct with "github.com/buger/jsonparser", but encountered two problems that the test could not pass. give me some hints.
FAIL: Test_jsonParser_Parse/escaped
FAIL: Test_jsonParser_Parse/utf8_error_rune

image

@liguozhong
Copy link
Contributor Author

liguozhong commented Mar 7, 2022

Can I delete these two cases

{
			"escaped",
			[]byte(`{"counter":1,"foo":"foo\\\"bar", "price": {"_net_":5.56909}}`),
			labels.Labels{},
			labels.Labels{
				{Name: "counter", Value: "1"},
				{Name: "price__net_", Value: "5.56909"},
				{Name: "foo", Value: `foo\"bar`},
			},
		},
		{
			"utf8 error rune",
			[]byte(`{"counter":1,"foo":"�", "price": {"_net_":5.56909}}`),
			labels.Labels{},
			labels.Labels{
				{Name: "counter", Value: "1"},
				{Name: "price__net_", Value: "5.56909"},
				{Name: "foo", Value: ""},
			},
		},

@splitice
Copy link
Contributor

@liguozhong those look like tests of UTF8 and complex json syntax and look valid.

Whats the problem?

The first failure looks like double escaping.

That second test seems to imply that UTF8 error rune should not be propagated into the result. I'm not sure the reasoning behind this decision myself, perhaps someone else can answer?

keys := make([]string, 0)
keys = append(keys, field)

cacheKeys, ok := j.jsonKeyCache[field]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this seems to be an orthogonal improvement to the parser. Could you factor out the cache in another PR? The benchmarks do not seem to be a fair comparison between the different parser implementations. How much improvements does the cache yield?

Copy link
Contributor Author

@liguozhong liguozhong Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.remove cache benchmark result.

Benchmark_Parser
Benchmark_Parser/json
Benchmark_Parser/json/no_labels_hints
Benchmark_Parser/json/no_labels_hints-12         	  217604	      5228 ns/op
Benchmark_Parser/json/labels_hints
Benchmark_Parser/json/labels_hints-12            	  528496	      2223 ns/op
Benchmark_Parser/jsonParser-not_json_line
Benchmark_Parser/jsonParser-not_json_line/no_labels_hints
Benchmark_Parser/jsonParser-not_json_line/no_labels_hints-12         	84411745	        12.85 ns/op
Benchmark_Parser/jsonParser-not_json_line/labels_hints
Benchmark_Parser/jsonParser-not_json_line/labels_hints-12            	 6507972	       160.3 ns/op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buger/jsonparser with cache : 1172 ns/op
buger/jsonparser without cache: 2223 ns/op
json-iterator: 2913 ns/op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache is very effective for this json library, especially when the json key is deep.

but two problems that the test could not pass. give me some hints.
FAIL: Test_jsonParser_Parse/escaped
FAIL: Test_jsonParser_Parse/utf8_error_rune

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patience!

@liguozhong
Copy link
Contributor Author

@liguozhong those look like tests of UTF8 and complex json syntax and look valid.

Whats the problem?

The first failure looks like double escaping.

That second test seems to imply that UTF8 error rune should not be propagated into the result. I'm not sure the reasoning behind this decision myself, perhaps someone else can answer?

This json library is hard to implement these two tests

@liguozhong liguozhong closed this Mar 10, 2022
@liguozhong liguozhong reopened this Mar 10, 2022
Copy link
Contributor

@splitice splitice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a loki reviewer, but I did a read through and a check of the jsonparser documentation.

_ = it.ReadMapCB(j.parseMap(""))
if it.Error != nil && it.Error != io.EOF {
return it.Error
func (j *JSONParser) handleJson(preKey string, line []byte) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldnt it be better if the arguments were: line []byte, prefixKey string (optional like argument at end)

}
j.lbs.Set(field, string(v))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If JSON rune handling was required a replace could be performed here. Either as a string replace, or probably better as a custom string builder

Perhaps something like:

var sb strings.Builder
for _, c := range string(v) {
  if c != '\uFFFD' {
    sb.WriteRune(c)
  }
}
return j.lbs.Set(field, sb.String())

Copy link
Contributor

@splitice splitice Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively (and better) to support escaping in general (which currently you don't)

You could try calling GetString(v) this should also solve the failing escaping test (Test_jsonParser_Parse/escaped).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, my cassandra keeps crashing this week and I'm busy with two issues this week (#5667 and #5595) , when my cassandra is stable, I'll come back to fix the json parser's with your tips

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about an unsafe to string here? This would avoid copying the data until LabelResult.String() is called. It's tricky though. The underlying line must not change and LabelResult.String() would have to unescape. In any case, we must start with some memory profiling.

return it.Error
func (j *JSONParser) handleJson(preKey string, line []byte) error {
err := jsonparser.ObjectEach(line, func(key []byte, val []byte, valueType jsonparser.ValueType, offset int) error {
jsonKey := preKey + sanitizeLabelKey(string(key), true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line here, specially string(key) makes the whole PR back to where it was before.

I suggest you take a look at difference in memory allocations, we're trying to lower them and I don't think it's happening.

Use benstats to compare results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you also look at benchmark profile, go test -bench -memprofile mem.out to figure where the work needs to be done.

Again the whole point is to avoid allocation as much as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look, it seems that we need to create a pool for the slice of bytes that will contains the potential keys. Then try to work with slice of bytes only, avoiding the string(key) as much as possible.

Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at our production profiling and found that the JSON parser has the highest impact on allocations and not CPU. buger/jsonparser might be even worse. I hope I get to share the flamegraphs.

That means it is important to share the allocations. However, it made me wonder. What do you think about making readValue avoid any copies? It could read []byte slice instead. I've traced the usage of the labels. At some point we call String() on the LabelSet but there is no need to copy the bytes before.

Sorry opening this rabbit hole. Just wanted to start a discussion.

EDIT: Just saw that Cyril mentions the same. Be aware though. ReadStringAsSlice seems to make a copy as well.

@jeschkies
Copy link
Contributor

I've just found that burger/jsonparer would allow us to access the underlying bytes without allocations.

@splitice
Copy link
Contributor

splitice commented Apr 21, 2022

Please don't forget that the underlying bytes may contain escape sequences when doing benchmarks.

@cyriltovena
Copy link
Contributor

The direction is good. The new library seems to allow what we want todo, avoid allocations.

@splitice is right about those tests, we added them as regression test, we had this issue in production with UTF8 characters.

I'm happy to have the whole code rework, but I think we should be cautious about allocation here.

@stale
Copy link

stale bot commented Jun 12, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Jun 12, 2022
@stale stale bot closed this Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L stale A stale issue or PR that will automatically be closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants