Decrease Clones #35945
@jeniawhite is this one ready to be reviewed, or do you still have things to do on it?
This pull request is now in conflicts. Could you fix it? 🙏
Just comment suggestions for better clarity of what's going on.
* Decrease Clones * Adding changelog * CodeReview changes (cherry picked from commit faf88b7) # Conflicts: # libbeat/beat/event.go # libbeat/beat/event_test.go
* Decrease Clones (#35945) * Decrease Clones * Adding changelog * CodeReview changes (cherry picked from commit faf88b7) # Conflicts: # libbeat/beat/event.go # libbeat/beat/event_test.go --------- Co-authored-by: Evgeniy Belyi <jeniawhite92@gmail.com> Co-authored-by: Denis Rechkunov <denis.rechkunov@elastic.co>
* Decrease Clones * Adding changelog * CodeReview changes
What does this PR do?
While investigating the code, I noticed that we perform unnecessary clones.
These consume significant memory and CPU time (garbage collection cycles).
In `deepUpdate` of events we start off with a clone in order to manipulate the `timestamp` and the `metadata` fields; from my understanding, the processors also rely on this clone to avoid altering data.
There is no reason for `deepUpdate` to handle the cloning of data; this should be handled by the consumer (the processors).
To avoid mutating the input data, I remove these keys from the input for the duration of the update and restore them (via `defer`) when the function completes.
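The remove-and-restore idea can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Event` type, the `update` method, the `demo` helper, and the key constants are simplified, hypothetical stand-ins, and the merge is shallow rather than deep.

```go
package main

import (
	"fmt"
	"time"
)

// Key names are assumptions made for this sketch.
const (
	timestampKey = "@timestamp"
	metadataKey  = "@metadata"
)

// Event is a simplified, hypothetical stand-in for a libbeat event.
type Event struct {
	Timestamp time.Time
	Meta      map[string]any
	Fields    map[string]any
}

// update merges d into the event without cloning d. The special keys are
// removed from d while the merge runs and put back by deferred functions,
// so the caller sees d unchanged once update returns.
func (e *Event) update(d map[string]any) {
	if len(d) == 0 {
		return
	}
	if ts, ok := d[timestampKey]; ok {
		if t, isTime := ts.(time.Time); isTime {
			e.Timestamp = t
		}
		delete(d, timestampKey)
		defer func() { d[timestampKey] = ts }() // roll back on return
	}
	if md, ok := d[metadataKey]; ok {
		if m, isMap := md.(map[string]any); isMap {
			if e.Meta == nil {
				e.Meta = map[string]any{}
			}
			for k, v := range m {
				e.Meta[k] = v
			}
		}
		delete(d, metadataKey)
		defer func() { d[metadataKey] = md }() // roll back on return
	}
	if e.Fields == nil {
		e.Fields = map[string]any{}
	}
	// Shallow merge: nested values are attached by reference, so a caller
	// that shares d across events is expected to clone it first.
	for k, v := range d {
		e.Fields[k] = v
	}
}

// demo applies an update and returns the event and the input map.
func demo() (*Event, map[string]any) {
	d := map[string]any{
		timestampKey: time.Date(2023, 7, 1, 0, 0, 0, 0, time.UTC),
		metadataKey:  map[string]any{"pipeline": "p1"},
		"host":       "a",
	}
	e := &Event{}
	e.update(d)
	return e, d
}

func main() {
	e, d := demo()
	fmt.Println(len(d), len(e.Fields)) // prints "3 1"
}
```

The input map still holds all three keys after the call, while only the regular field lands in `Fields`. The trade-off is that the input is briefly modified during the call, so this approach assumes no other goroutine reads the same map concurrently.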
The only occurrence that uses this code is located in `add_fields.go`, which is the action for the processors; if you look at the code, it already clones when the data is shared.
This means that we are basically double cloning, which is redundant.
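A consumer-side clone of this kind might look like the sketch below. The `addFields` type, its `run` method, and the `cloneMap` and `demo` helpers are hypothetical simplifications for illustration and are not copied from `add_fields.go`.

```go
package main

import "fmt"

// cloneMap deep-copies a map, recursing into nested map[string]any values.
func cloneMap(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		if nested, ok := v.(map[string]any); ok {
			out[k] = cloneMap(nested)
		} else {
			out[k] = v
		}
	}
	return out
}

// addFields is a hypothetical processor that adds fixed fields to events.
type addFields struct {
	fields map[string]any
	shared bool // true when the same fields map is reused across events
}

// run clones the fields only when they are shared, then merges them into
// the event. The merge itself performs no further clone.
func (af *addFields) run(event map[string]any) {
	fields := af.fields
	if af.shared {
		fields = cloneMap(fields)
	}
	for k, v := range fields {
		event[k] = v
	}
}

// demo mutates the event after processing and reports what the processor's
// own configuration looks like afterwards.
func demo(shared bool) string {
	af := &addFields{
		fields: map[string]any{"labels": map[string]any{"env": "prod"}},
		shared: shared,
	}
	event := map[string]any{}
	af.run(event)
	event["labels"].(map[string]any)["env"] = "changed"
	return af.fields["labels"].(map[string]any)["env"].(string)
}

func main() {
	fmt.Println(demo(true))  // prints "prod"
	fmt.Println(demo(false)) // prints "changed"
}
```

With `shared` set, the processor's configuration is protected by the single clone in `run`, so a second clone inside the update function would only add allocations.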
The next step would be to investigate the `shared` variable, because it seems that it is always `true`, and my assumption is that we do not necessarily always run in a shared use case.
After the changes, I ran a benchmark that generates 20 million small events and saw a huge improvement in the stability of GC cycles and memory allocation.
[Benchmark results for the old code and the new code: images not preserved]
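Independently of the benchmark described above, the allocation cost of an extra clone can be observed on a small scale with `testing.AllocsPerRun` from the Go standard library. The sketch below is not the benchmark used for this PR; `cloneMap`, `merge`, and `measure` are hypothetical helpers, and the field values are made up.

```go
package main

import (
	"fmt"
	"testing"
)

// cloneMap deep-copies a map, recursing into nested map[string]any values.
func cloneMap(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		if nested, ok := v.(map[string]any); ok {
			out[k] = cloneMap(nested)
		} else {
			out[k] = v
		}
	}
	return out
}

// merge copies the top-level keys of src into dst without cloning.
func merge(dst, src map[string]any) {
	for k, v := range src {
		dst[k] = v
	}
}

// measure returns the average number of allocations per update, first with
// an extra clone of the input and then without it.
func measure() (withClone, withoutClone float64) {
	fields := map[string]any{
		"host":   map[string]any{"name": "a"},
		"labels": map[string]any{"env": "prod"},
	}
	event := make(map[string]any, len(fields))
	withClone = testing.AllocsPerRun(1000, func() {
		merge(event, cloneMap(fields))
	})
	withoutClone = testing.AllocsPerRun(1000, func() {
		merge(event, fields)
	})
	return withClone, withoutClone
}

func main() {
	withClone, withoutClone := measure()
	fmt.Printf("allocs per update with clone: %.0f, without clone: %.0f\n",
		withClone, withoutClone)
}
```

The absolute numbers depend on the Go version and the shape of the data; the point is only that every clone adds allocations that the garbage collector later has to reclaim.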
Why is it important?
This component sits at the heart of libbeat: it manages events and is used by multiple products.
It is on the hottest path, since every event is processed by this code.
Checklist
`CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.