Improve json parser performance #5538
Conversation
Can you share the allocation differences?
@cyriltovena hi, I replaced jsoniter (`github.com/json-iterator/go`) in the `JSONParser` struct with `github.com/buger/jsonparser`, but two tests now fail. Could you give me some hints?
Can I delete these two cases?

```go
{
	"escaped",
	[]byte(`{"counter":1,"foo":"foo\\\"bar", "price": {"_net_":5.56909}}`),
	labels.Labels{},
	labels.Labels{
		{Name: "counter", Value: "1"},
		{Name: "price__net_", Value: "5.56909"},
		{Name: "foo", Value: `foo\"bar`},
	},
},
{
	"utf8 error rune",
	[]byte(`{"counter":1,"foo":"�", "price": {"_net_":5.56909}}`),
	labels.Labels{},
	labels.Labels{
		{Name: "counter", Value: "1"},
		{Name: "price__net_", Value: "5.56909"},
		{Name: "foo", Value: ""},
	},
},
```
@liguozhong those look like tests of UTF-8 handling and complex JSON syntax, and they look valid. What's the problem? The first failure looks like double escaping. The second test seems to imply that the UTF-8 error rune should not be propagated into the result. I'm not sure of the reasoning behind this decision myself; perhaps someone else can answer?
pkg/logql/log/parser.go (outdated)
```go
keys := make([]string, 0)
keys = append(keys, field)

cacheKeys, ok := j.jsonKeyCache[field]
```
Hm, this seems to be an orthogonal improvement to the parser. Could you factor out the cache into another PR? The benchmarks do not seem to be a fair comparison between the different parser implementations. How much improvement does the cache yield?
Done. Benchmark results with the cache removed:

```
Benchmark_Parser/json/no_labels_hints-12                        217604      5228 ns/op
Benchmark_Parser/json/labels_hints-12                           528496      2223 ns/op
Benchmark_Parser/jsonParser-not_json_line/no_labels_hints-12    84411745    12.85 ns/op
Benchmark_Parser/jsonParser-not_json_line/labels_hints-12       6507972     160.3 ns/op
```
```
buger/jsonparser with cache:    1172 ns/op
buger/jsonparser without cache: 2223 ns/op
json-iterator:                  2913 ns/op
```
The cache is very effective for this JSON library, especially when the JSON key is deeply nested.
But there are two tests that do not pass; could you give me some hints?
FAIL: Test_jsonParser_Parse/escaped
FAIL: Test_jsonParser_Parse/utf8_error_rune
Thanks for the patience!
This JSON library makes it hard to satisfy these two tests.
I'm not a Loki reviewer, but I did a read-through and a check of the jsonparser documentation.
```diff
-	_ = it.ReadMapCB(j.parseMap(""))
-	if it.Error != nil && it.Error != io.EOF {
-		return it.Error
+func (j *JSONParser) handleJson(preKey string, line []byte) error {
```
Wouldn't it be better if the arguments were `line []byte, prefixKey string` (the optional-like argument at the end)?
```go
}
j.lbs.Set(field, string(v))
```
If JSON rune handling is required, a replacement could be performed here, either as a string replace or (probably better) with a custom string builder. Perhaps something like:

```go
var sb strings.Builder
for _, c := range string(v) {
	if c != '\uFFFD' {
		sb.WriteRune(c)
	}
}
return j.lbs.Set(field, sb.String())
```
Alternatively (and better), to support escaping in general (which currently you don't), you could try calling `GetString(v)`; this should also solve the failing escaping test (`Test_jsonParser_Parse/escaped`).
What do you think about an unsafe conversion to string here? This would avoid copying the data until `LabelResult.String()` is called. It's tricky though: the underlying line must not change, and `LabelResult.String()` would have to unescape. In any case, we must start with some memory profiling.
```go
		return it.Error
func (j *JSONParser) handleJson(preKey string, line []byte) error {
	err := jsonparser.ObjectEach(line, func(key []byte, val []byte, valueType jsonparser.ValueType, offset int) error {
		jsonKey := preKey + sanitizeLabelKey(string(key), true)
```
This line here, especially `string(key)`, brings the whole PR back to where it was before.
I suggest you take a look at the difference in memory allocations; we're trying to lower them and I don't think that's happening.
Use benchstat to compare results.
I suggest you also look at the benchmark profile (`go test -bench=. -memprofile mem.out`) to figure out where the work needs to be done.
Again, the whole point is to avoid allocations as much as possible.
From a quick look, it seems we need to create a pool for the byte slices that will contain the potential keys, and then work with byte slices only, avoiding `string(key)` as much as possible.
I took a look at our production profiling and found that the JSON parser has the highest impact on allocations, not CPU. I hope I get to share the flamegraphs. `buger/jsonparser` might be even worse.
That means it is important to share the allocations. However, it made me wonder: what do you think about making `readValue` avoid any copies? It could read a `[]byte` slice instead. I've traced the usage of the labels: at some point we call `String()` on the `LabelSet`, but there is no need to copy the bytes before that.
Sorry for opening this rabbit hole; I just wanted to start a discussion.
EDIT: Just saw that Cyril mentions the same. Be aware though: `ReadStringAsSlice` seems to make a copy as well.
I've just found that
Please don't forget that the underlying bytes may contain escape sequences when doing benchmarks.
The direction is good. The new library seems to allow what we want to do: avoid allocations. @splitice is right about those tests; we added them as regression tests after we had this issue in production with UTF-8 characters. I'm happy to have the whole code reworked, but I think we should be cautious about allocations here.
Hi! This issue has been automatically marked as stale because it has not had any recent activity. We use a stalebot among other tools to help manage the state of issues in this project. Stalebots are also emotionless and cruel and can close issues which are still very relevant. If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task.
What this PR does / why we need it:
Improve JSON parser performance, from 2913 ns/op to 1132 ns/op.
Which issue(s) this PR fixes:
Fixes [#4328]
Reimplements PR #5417
Special notes for your reviewer:
New Benchmark_Parser
Old Benchmark_Parser
Key change: replace `github.com/json-iterator/go` with `github.com/buger/jsonparser`.
Checklist
- `CHANGELOG.md` updated about the changes.