Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decrease Clones #35945

Merged
merged 6 commits into from
Jul 13, 2023
Merged

Decrease Clones #35945

merged 6 commits into from
Jul 13, 2023

Conversation

jeniawhite
Copy link
Contributor

@jeniawhite jeniawhite commented Jun 28, 2023

What does this PR do?

Investigating the code, I've noticed that we are doing unnecessary clones.
Screen Shot 2023-06-28 at 15 35 59
This takes up significant resources from the memory and cpu time (Garbage Collection cycles).
In deepUpdates of events we are starting off with a clone in order to manipulate the timestamp and the metadata fields, also the processors rely on this clone to not alter data from my understanding.
There is no reason for deepUpdate to handle the cloning of data, this should be handled by the consumer (processors).
In order to avoid mutation of input data, I've deferred the keys and rolled them back upon completion of the function.

The only occurrence that uses this code is located in add_fields.go:

func (af *addFields) Run(event *beat.Event) (*beat.Event, error) {
	if event == nil || len(af.fields) == 0 {
		return event, nil
	}

	fields := af.fields
	if af.shared {
		fields = fields.Clone()
	}

	if af.overwrite {
		event.DeepUpdate(fields)
	} else {
		event.DeepUpdateNoOverwrite(fields)
	}

	return event, nil
}

Which is the action for the processors, and if you look at the code it is already cloning when the data is shared.
This means that we are basically double cloning and it is redundant.

The next step would be to investigate the shared variable because it seems like that it is always true and my assumption is that we do not necessary always run in a shared usecase.

After the changes I've ran a benchmark that generates 20Million small events and I saw a huge improvement on the stability of GC cycles and memory allocation.

Old code:
Screen Shot 2023-06-29 at 15 39 48

New code:
Screen Shot 2023-06-29 at 15 40 07

Why is it important?

This is a component in the heart of libbeat that manages events and is being utilized by multiple products.
This component is being utilized in the hottest paths (every event processed by this code).

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@jeniawhite jeniawhite added enhancement backport-skip Skip notification from the automated backport with mergify labels Jun 28, 2023
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 28, 2023
@amitkanfer
Copy link
Collaborator

🚀

@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 28, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-07-13T15:11:15.534+0000

  • Duration: 74 min 47 sec

Test stats 🧪

Test Results
Failed 0
Passed 27469
Skipped 2029
Total 29498

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@jlind23
Copy link
Collaborator

jlind23 commented Jul 10, 2023

@jeniawhite is this one ready to be reviewed or do you still have things to do on it?

@jeniawhite jeniawhite marked this pull request as ready for review July 12, 2023 20:04
@jeniawhite jeniawhite requested a review from a team as a code owner July 12, 2023 20:04
@jeniawhite jeniawhite requested review from rdner and faec July 12, 2023 20:04
@mergify
Copy link
Contributor

mergify bot commented Jul 12, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b evgb-DecreaseClone upstream/evgb-DecreaseClone
git merge upstream/main
git push upstream evgb-DecreaseClone

@jeniawhite
Copy link
Contributor Author

@jlind23 Removed the draft tag, ready for review.
@rdner Would be great if you'd take a look.
Thank you!

@mergify
Copy link
Contributor

mergify bot commented Jul 12, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b evgb-DecreaseClone upstream/evgb-DecreaseClone
git merge upstream/main
git push upstream evgb-DecreaseClone

Copy link
Member

@rdner rdner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just comment suggestions for better clarity of what's going on.

libbeat/beat/event_test.go Outdated Show resolved Hide resolved
libbeat/beat/event.go Show resolved Hide resolved
libbeat/beat/event.go Outdated Show resolved Hide resolved
libbeat/beat/event.go Show resolved Hide resolved
libbeat/beat/event.go Show resolved Hide resolved
libbeat/beat/event.go Show resolved Hide resolved
libbeat/beat/event.go Show resolved Hide resolved
@jeniawhite jeniawhite requested a review from rdner July 13, 2023 15:11
@jeniawhite jeniawhite enabled auto-merge (squash) July 13, 2023 15:41
@jeniawhite jeniawhite merged commit faf88b7 into elastic:main Jul 13, 2023
1 check passed
@rdner rdner added backport-7.17 Automated backport to the 7.17 branch with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team and removed backport-skip Skip notification from the automated backport with mergify labels Sep 14, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 14, 2023
mergify bot pushed a commit that referenced this pull request Sep 14, 2023
* Decrease Clones

* Adding changelog

* CodeReview changes

(cherry picked from commit faf88b7)

# Conflicts:
#	libbeat/beat/event.go
#	libbeat/beat/event_test.go
rdner added a commit that referenced this pull request Sep 18, 2023
* Decrease Clones (#35945)

* Decrease Clones

* Adding changelog

* CodeReview changes

(cherry picked from commit faf88b7)

# Conflicts:
#	libbeat/beat/event.go
#	libbeat/beat/event_test.go

---------

Co-authored-by: Evgeniy Belyi <jeniawhite92@gmail.com>
Co-authored-by: Denis Rechkunov <denis.rechkunov@elastic.co>
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* Decrease Clones

* Adding changelog

* CodeReview changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-7.17 Automated backport to the 7.17 branch with mergify enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants