Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

audit: Reduce the number of memory allocations in the hotpath #16056

Merged
merged 8 commits into from
Jan 18, 2024

Conversation

graphcareful
Copy link
Contributor

@graphcareful graphcareful commented Jan 10, 2024

With auditing enabled, high traffic events and low batch intervals it was observed that auditing was using more of the CPU then expected. This was traced back to many string conversions / string interpolation events when constructing an audit event.

An audit event is constructed at every possible attempt to enqueue the event so a hash code can be calculated. In the event the message has been determined it is a duplicate, the event can be deallocated, but this is wasteful. The event should only be constructed if it unique, to avoid the expensive serialization performed.

This pull request modifies the control flow in the audit log manager to calculate a hash code for the ocsf event without constructing the event itself. Since event construction depends on these parameters anyway, an appropriately unique hash code can be calculated with a free function using these parameters as input. In the event the hash code is unique, then the event will be allocated.

Linking to issue: #15898

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Improvements

  • Reduces the number of allocations performed by the auditing subsystem

ss::httpd::const_req req,
const ss::sstring& user,
const ss::sstring& svc_name) {
const auto activity_id = http_method_to_activity_id(req._method);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-redpanda do we need the activity id to calculate the base hash? All that would matter is hashing the method I think. Then we could make this method private to utils.h again

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I believe is for the API activity, yes? API activity includes both HTTP and Kafka. Kafka doesn't have a 'method' field. So yes, I believe we do need to use activity ID in the hash. You should double check, but does "type_id" of the base type get hashed? If so then "activity_id" is already being hashed as Type ID is a combination of class ID and activity ID (https://schema.ocsf.io/1.0.0/classes/api_activity?extensions= look towards the bottom).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah for sure, i was just wondering if it would be good enough to has something like 0 - 5, one value for each of the potential HTTP method types, instead of having to get one of 5 stringified versions of the enum to later just hash

@graphcareful
Copy link
Contributor Author

I think it might also be worth it adding an additional find/replace commit to change the name of the equality_fields() family of methods. Currently they are now only used in the estimated_ocsf_size() method, I propose either changing the name of the method, or a small refactor to just have all ocsf types implement their own size() method (credit to Noah's original idea)

@graphcareful
Copy link
Contributor Author

Changes in force-push

  • Fixed use after move
  • Fixed bug with not accounting for size sempahore units

Copy link
Member

@StephanDollberg StephanDollberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this.

Had a look and approach seems good. It seems like we are hashing quite a bit more now but maybe this is just an impression. Hashing was already quite visible before so want to make sure we are as smart as possible there.

Will leave to enterprise for a more thorough review.

@graphcareful
Copy link
Contributor Author

Thanks for looking into this.

Had a look and approach seems good. It seems like we are hashing quite a bit more now but maybe this is just an impression. Hashing was already quite visible before so want to make sure we are as smart as possible there.

Will leave to enterprise for a more thorough review.

Thanks for taking a look! We should be hashing the same amount as in the current impl since the hash function is called at the top of do_enqueue_event just like it stands as of now in dev.

src/v/security/acl.h Outdated Show resolved Hide resolved
src/v/security/audit/schemas/hashing_utils.h Show resolved Hide resolved
@graphcareful
Copy link
Contributor Author

Changes in force-push

  • Added concept to constrain type T passed to thedo_enqueue method
  • Implemented Bens feedback making the hashing code more graceful
  • Fixed a bug in a hash method that returned a value of 0 that attributed to a few ducktape test failures

src/v/hashing/utils.h Outdated Show resolved Hide resolved
src/v/hashing/utils.h Outdated Show resolved Hide resolved
src/v/hashing/utils.h Outdated Show resolved Hide resolved
src/v/security/audit/schemas/hashing_utils.cc Outdated Show resolved Hide resolved
src/v/security/audit/schemas/hashing_utils.cc Outdated Show resolved Hide resolved
src/v/hashing/utils.h Show resolved Hide resolved
@graphcareful
Copy link
Contributor Author

Changes in force-push

  • Fix for bug that included product in the application lifecycle msg unconditionally, making a dt test fail

@graphcareful
Copy link
Contributor Author

Change in force-push

  • Implemented Bens updated feedback on namespace around the new hashing utils and modifications to calls to hash_combine

@graphcareful
Copy link
Contributor Author

Changes in force-push

  • Rebased against dev

@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/43691#018cfec8-1b3f-4b54-a5d6-8864b63b3835:

"rptest.tests.transactions_test.TransactionsAuthorizationTest.simple_authz_test"

src/v/hashing/utils.h Outdated Show resolved Hide resolved
src/v/security/acl.h Outdated Show resolved Hide resolved
src/v/hashing/utils.h Outdated Show resolved Hide resolved
BenPope
BenPope previously approved these changes Jan 17, 2024
src/v/security/acl.h Outdated Show resolved Hide resolved
src/v/hashing/utils.h Outdated Show resolved Hide resolved
src/v/security/acl.h Outdated Show resolved Hide resolved
- That way auditing can hash these types to quickly get a hash code of a
larger ocsf type that encapsulates these acl options.
- This will be useful in future commits where a type can be passed to a
templated method and static polymorphic dispatch can be used to
determine which builder method to choose to construct the type
@graphcareful graphcareful merged commit 9be522f into redpanda-data:dev Jan 18, 2024
19 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants