Fix for #5890: Event Format Results and the Kafka Logger #6449

Merged 8 commits on Jun 7, 2020

Conversation

analyzeDFIR
Contributor

Tl;dr: Fixes #5890 using RapidJSON, adds tests for event and batch formatted results.

As the title suggests, this fixes the bug reported in #5890 affecting event format results from the processes table. The bug is in the getMsgName function: it uses rfind to locate the "name" key that holds the query name, but rfind matches the last occurrence, so event format results that also contain a "name" column (see the processes table) make the function return that column's value instead of the query name. The existing tests for the Kafka logger plugin don't catch this case because they only cover snapshot formatted results, not the batch or event formats.
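
A simplified sketch of the failure mode (illustrative only; getMsgNameByRfind is a stand-in, not the exact osquery source):

```cpp
// Why an rfind()-based lookup breaks: rfind() matches the LAST "name" key,
// so an event-format payload with a "name" column yields the column value
// ("bash") instead of the query name ("process_query").
#include <iostream>
#include <string>

std::string getMsgNameByRfind(const std::string& payload) {
  const std::string key = "\"name\":\"";
  auto start = payload.rfind(key);  // last occurrence, not the top-level key
  if (start == std::string::npos) {
    return "";
  }
  start += key.size();
  auto end = payload.find('"', start);
  return payload.substr(start, end - start);
}

int main() {
  const std::string event =
      R"({"name":"process_query","columns":{"name":"bash","pid":"42"},"action":"added"})";
  std::cout << getMsgNameByRfind(event) << "\n";  // prints: bash
  return 0;
}
```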

I added test cases for all formats, both with and without a "name" column in the query result set, and updated getMsgName to parse the payload as a JSON object and extract the value of the top-level "name" key. I believe the invariant holds for all result formats: if a top-level "name" key exists, it contains the name of the query that was run.
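
A minimal sketch of the JSON-based lookup using RapidJSON directly (the PR goes through osquery's JSON wrapper in osquery/utils/json/json.h, but the idea is the same): parse the payload and read only the top-level "name" key.

```cpp
#include <iostream>
#include <string>

#include <rapidjson/document.h>

std::string getMsgNameByJson(const std::string& payload,
                             const std::string& fallback) {
  rapidjson::Document doc;
  doc.Parse(payload.c_str());
  if (doc.HasParseError() || !doc.IsObject() || !doc.HasMember("name") ||
      !doc["name"].IsString()) {
    return fallback;  // fall back to the base topic when no query name exists
  }
  return doc["name"].GetString();
}

int main() {
  const std::string event =
      R"({"name":"process_query","columns":{"name":"bash"},"action":"added"})";
  std::cout << getMsgNameByJson(event, "base_topic") << "\n";  // process_query
  return 0;
}
```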

If the osquery core team is concerned about degraded performance from parsing each payload as JSON, I wrote an alternative implementation relying on string parsing here.

@linux-foundation-easycla

linux-foundation-easycla bot commented May 18, 2020

CLA Check
The committers are authorized under a signed CLA.

@theopolis
Member

Oh wow, nice work. This makes me rethink the LoggerPlugin API; perhaps we should add overloads to make this data available to plugins without having to re-parse the string data.

Do you have any guesstimates or basic profiling of the impact of re-parsing the JSON? I think your implementation https://github.com/analyzeDFIR/osquery/blob/osquery-5890/plugins/logger/kafka_producer.cpp#L135 is my preference in lieu of a more appropriate API being exposed.

@analyzeDFIR
Contributor Author

RE benchmarks: I have not benchmarked these implementations, though I am happy to do so. Any thoughts on the design, and where the benchmark code should go? Looks like there are some logger benchmarks in osquery/logger/benchmarks/logger_benchmarks.cpp, but I don't see any benchmarks for plugins. I could write something simple that parses N random JSON blobs with each and times them, but let me know what you think.

RE the LoggerPlugin API: I was thinking the same thing when approaching this problem. Not sure how much it would help other plugins, but I'd imagine pre-processing the payload in some way is not unique to Kafka. Whichever implementation you choose now, I think we can agree that the RapidJSON implementation is cleaner and easier to maintain in the long run.

@theopolis
Member

> I could write something simple that parses N random JSON blobs with each and times them, but let me know what you think.

@analyzeDFIR, right I think if we could compare and contrast your two implementations, JSON vs "string peeking", we could make an informed decision on maintainability vs. performance. If you could create a minified benchmark and just comment here with the results or delta I think that's enough for us to move forward.

@analyzeDFIR
Contributor Author

> @analyzeDFIR, right I think if we could compare and contrast your two implementations, JSON vs "string peeking", we could make an informed decision on maintainability vs. performance. If you could create a minified benchmark and just comment here with the results or delta I think that's enough for us to move forward.

I do not have much experience with CMake and am having some trouble figuring out how to use the project's CMake configuration to build my test code. Do I have to add another build target? Any help you could provide would be much appreciated.

@theopolis
Member

Ah, to be clear, I was suggesting building a small benchmark outside of osquery and osquery's build system. Something quick and easy just to compare the implementations.
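
Such a comparison could be as small as the following sketch (illustrative only, and not the benchmark that was actually run; it assumes the getMsgNameByRfind and getMsgNameByJson functions from the earlier sketches are compiled alongside it, with their demo main functions removed):

```cpp
// Minimal standalone timing harness: run each implementation over N copies
// of an event-format payload and report elapsed wall-clock time.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Assumed to be provided by the earlier sketches when compiled together.
std::string getMsgNameByRfind(const std::string& payload);
std::string getMsgNameByJson(const std::string& payload,
                             const std::string& fallback);

template <typename F>
long long elapsedMs(const std::vector<std::string>& payloads, F&& fn) {
  const auto start = std::chrono::steady_clock::now();
  for (const auto& p : payloads) {
    fn(p);  // callee lives in another translation unit, so the call is not elided
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
      .count();
}

int main() {
  const std::vector<std::string> payloads(
      10000,
      R"({"name":"process_query","columns":{"name":"bash"},"action":"added"})");

  std::cout << "string peeking: "
            << elapsedMs(payloads,
                         [](const std::string& p) { return getMsgNameByRfind(p); })
            << " ms\n";
  std::cout << "json parse:     "
            << elapsedMs(payloads,
                         [](const std::string& p) {
                           return getMsgNameByJson(p, "base_topic");
                         })
            << " ms\n";
  return 0;
}
```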

@analyzeDFIR
Contributor Author

analyzeDFIR commented May 23, 2020

The JSON implementation relies on osquery/utils/json/json.h. I tried just using clang++ to build a test that includes that file, but it eventually pulls in osquery/utils/system/system.h, and I think that header is put in place by CMake based on the OS.

Edit: I'm running these tests on a 2015 MacBook Pro (2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3). I created two test files: an 11MB osqueryd.snapshots.log with 184 records and a 63KB osquery.results.log with 152 records, and I'm using the time command's user process time statistic.

The "string peeking" implementation takes, on average, ~3-3.15ms per snapshot record and ~0.03ms per event record on my machine. This averages to ~1.6ms per record.

Edit 2: Copied the osquery/utils/system/posix/system.{h,cpp} files to osquery/utils/system/system.h and that solved my initial issue. Now getting a list of other errors, including:

test_getMsgName_json.cpp:28:18: error: no matching member function for call to 'HasMember'
      !doc.doc().HasMember(fieldName)) {
       ~~~~~~~~~~^~~~~~~~~
/usr/local/include/rapidjson/document.h:1094:10: note: candidate function not viable: no known conversion from 'const std::string' (aka 'const basic_string<char, char_traits<char>, allocator<char> >') to
      'const rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::Ch *' (aka 'const char *') for 1st argument
    bool HasMember(const Ch* name) const { return FindMember(name) != MemberEnd(); }
         ^
/usr/local/include/rapidjson/document.h:1118:10: note: candidate template ignored: could not match 'GenericValue' against 'basic_string'
    bool HasMember(const GenericValue<Encoding, SourceAllocator>& name) const { return FindMember(name) != MemberEnd(); }
         ^
test_getMsgName_json.cpp:31:25: error: no viable overloaded operator[] for type 'rapidjson::Document' (aka 'GenericDocument<UTF8<> >')
  auto& name = doc.doc()[fieldName];
               ~~~~~~~~~^~~~~~~~~~

Officially out of my depth.
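
The errors above come from RapidJSON's member-lookup overloads: HasMember() and operator[] only accept const char* (or GenericValue) keys unless RAPIDJSON_HAS_STDSTRING is defined before the headers are included. A minimal standalone illustration of both workarounds (hypothetical test code, not the osquery source):

```cpp
#define RAPIDJSON_HAS_STDSTRING 1  // enables the std::string overloads
#include <iostream>
#include <string>

#include <rapidjson/document.h>

int main() {
  rapidjson::Document doc;
  doc.Parse(R"({"name":"process_query"})");

  const std::string fieldName = "name";

  // Option 1: with RAPIDJSON_HAS_STDSTRING, std::string keys compile directly.
  if (doc.HasMember(fieldName)) {
    std::cout << doc[fieldName].GetString() << "\n";  // process_query
  }

  // Option 2: without the define, pass the underlying buffer explicitly.
  if (doc.HasMember(fieldName.c_str())) {
    std::cout << doc[fieldName.c_str()].GetString() << "\n";  // process_query
  }
  return 0;
}
```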

@analyzeDFIR
Contributor Author

Tested both the String and JSON implementations with 10,000 snapshot and event format records; the results were:

| Implementation | Number of Records | Result Format | Runtime Range |
|----------------|-------------------|---------------|---------------|
| JSON           | 10,000            | event         | 61-71 ms      |
| String         | 10,000            | event         | 35-40 ms      |
| JSON           | 10,000            | snapshot      | 83-93 ms      |
| String         | 10,000            | snapshot      | 49-60 ms      |

In short, the String implementation is faster (as expected), but the JSON implementation isn't as slow as expected. Let me know which you'd like to go with and I'll make the necessary changes.

@theopolis
Member

Thank you so much for that analysis! I am happy to merge this as-is, but I am not a Kafka user, so the impact will not affect my infra (thus it is easy for me to accept the performance impact).

We now have the analysis here for posterity, if we need to revisit this in an effort to address performance issues we can adopt the "string peeking" method or brainstorm alternatives.

@theopolis merged commit c197af9 into osquery:master on Jun 7, 2020
aikuchin pushed a commit to aikuchin/osquery that referenced this pull request Jul 11, 2023
…0 to master

* commit 'eeee0fb0957f5af983f817c2e6f19c53108d9e09': (83 commits)
  Add additional changelog items (osquery#6523)
  Changelog for 4.4.0 (osquery#6492)
  build: Add Azure tables to specs CMakeLists (osquery#6507)
  CMake: Correct macOS framework linking (osquery#6522)
  tables: Only populate table cache with star-like selects (osquery#6513)
  CMake: Fix and cleanup compile flags (osquery#6521)
  docs: Add note to bump the Homebrew cask (osquery#6519)
  tests: Fix atom_packages, processes, rpm_packages flakiness (osquery#6518)
  bug: Do not use system proxy for AWS local authority (osquery#6512)
  packaging: updating docs on cpack usage to include Chocolatey (osquery#6022)
  bug: Fix typed_row table caching (osquery#6508)
  Implement event batching support for Windows tables (osquery#6280)
  http: Use sync resolve (osquery#6490)
  Add support for basic chassis information (osquery#5282)
  Only emit 'denylist' warning once (osquery#6493)
  docs: Remove references to brew in macOS install (osquery#6494)
  Fix for osquery#5890: Event Format Results and the Kafka Logger (osquery#6449)
  make apt_sources table parsing much more resilient (osquery#6482)
  Make file and hash container columns hidden (osquery#6486)
  Update documentation to use 'allow list' and 'deny list' diction (osquery#6489)
  ...
Successfully merging this pull request may close these issues.

differential osquery query output to base_topic