Fix for #5890: Event Format Results and the Kafka Logger #6449

Merged 8 commits on Jun 7, 2020

Conversation

analyzeDFIR
Contributor

Tl;dr: Fixes #5890 using RapidJSON, adds tests for event and batch formatted results.

As the title suggests, this fixes the bug reported in #5890 affecting event format results from the processes table. The bug is in the getMsgName function: it uses rfind to locate the "name" key that holds the query name, but rfind matches the last occurrence, so event format results that also contain a "name" column (see the processes table) make the function return that column's value instead of the query name. The existing tests for the Kafka logger plugin don't catch this case because they only cover snapshot formatted results, not the batch or event formats.
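
A simplified sketch of the failure mode (illustrative only; getMsgNameByRfind is a stand-in, not the exact osquery source):

```cpp
// Why an rfind()-based lookup breaks: rfind() matches the LAST "name" key,
// so an event-format payload with a "name" column yields the column value
// ("bash") instead of the query name ("process_query").
#include <iostream>
#include <string>

std::string getMsgNameByRfind(const std::string& payload) {
  const std::string key = "\"name\":\"";
  auto start = payload.rfind(key);  // last occurrence, not the top-level key
  if (start == std::string::npos) {
    return "";
  }
  start += key.size();
  auto end = payload.find('"', start);
  return payload.substr(start, end - start);
}

int main() {
  const std::string event =
      R"({"name":"process_query","columns":{"name":"bash","pid":"42"},"action":"added"})";
  std::cout << getMsgNameByRfind(event) << "\n";  // prints: bash
  return 0;
}
```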

I added test cases for all formats, both with and without a "name" column in the query result set, and updated getMsgName to parse the payload as a JSON object and extract the value of the top-level "name" key. I believe the invariant holds for all result formats: if a top-level "name" key exists, it contains the name of the query that was run.
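
A minimal sketch of the JSON-based lookup using RapidJSON directly (the PR goes through osquery's JSON wrapper in osquery/utils/json/json.h, but the idea is the same): parse the payload and read only the top-level "name" key.

```cpp
#include <iostream>
#include <string>

#include <rapidjson/document.h>

std::string getMsgNameByJson(const std::string& payload,
                             const std::string& fallback) {
  rapidjson::Document doc;
  doc.Parse(payload.c_str());
  if (doc.HasParseError() || !doc.IsObject() || !doc.HasMember("name") ||
      !doc["name"].IsString()) {
    return fallback;  // fall back to the base topic when no query name exists
  }
  return doc["name"].GetString();
}

int main() {
  const std::string event =
      R"({"name":"process_query","columns":{"name":"bash"},"action":"added"})";
  std::cout << getMsgNameByJson(event, "base_topic") << "\n";  // process_query
  return 0;
}
```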

If the osquery core team is concerned about degraded performance from parsing each payload as JSON, I wrote an alternative implementation relying on string parsing here.

@linux-foundation-easycla

linux-foundation-easycla bot commented May 18, 2020

CLA Check
The committers are authorized under a signed CLA.

@theopolis
Member

Oh wow, nice work. This makes me rethink the LoggerPlugin API; perhaps we should add overloads to make this data available to plugins without having to re-parse the string data.

Do you have any guesstimates or basic profiling of the impact of re-parsing the JSON? I think your implementation https://github.com/analyzeDFIR/osquery/blob/osquery-5890/plugins/logger/kafka_producer.cpp#L135 is my preference in lieu of a more appropriate API being exposed.

@analyzeDFIR
Contributor Author

RE benchmarks: I have not benchmarked these implementations, though I am happy to do so. Any thoughts on the design, and where the benchmark code should go? Looks like there are some logger benchmarks in osquery/logger/benchmarks/logger_benchmarks.cpp, but I don't see any benchmarks for plugins. I could write something simple that parses N random JSON blobs with each and times them, but let me know what you think.

RE the LoggerPlugin API: I was thinking the same thing when approaching this problem. Not sure how much it would help other plugins, but I'd imagine pre-processing the payload in some way is not unique to Kafka. Whichever implementation you choose now, I think we can agree that the RapidJSON implementation is cleaner and easier to maintain in the long run.

@theopolis
Member

> I could write something simple that parses N random JSON blobs with each and times them, but let me know what you think.

@analyzeDFIR, right I think if we could compare and contrast your two implementations, JSON vs "string peeking", we could make an informed decision on maintainability vs. performance. If you could create a minified benchmark and just comment here with the results or delta I think that's enough for us to move forward.

@analyzeDFIR
Contributor Author

> @analyzeDFIR, right I think if we could compare and contrast your two implementations, JSON vs "string peeking", we could make an informed decision on maintainability vs. performance. If you could create a minified benchmark and just comment here with the results or delta I think that's enough for us to move forward.

I do not have much experience with CMake and am having some trouble figuring out how to use the project's CMake configuration to build my test code. Do I have to add another build target? Any help you could provide would be much appreciated.

@theopolis
Member

Ah, to be clear, I was suggesting building a small benchmark outside of osquery and osquery's build system. Something quick and easy just to compare the implementations.
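
Such a comparison could be as small as the following sketch (illustrative only, and not the benchmark that was actually run; it assumes the getMsgNameByRfind and getMsgNameByJson functions from the earlier sketches are compiled alongside it, with their demo main functions removed):

```cpp
// Minimal standalone timing harness: run each implementation over N copies
// of an event-format payload and report elapsed wall-clock time.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Assumed to be provided by the earlier sketches when compiled together.
std::string getMsgNameByRfind(const std::string& payload);
std::string getMsgNameByJson(const std::string& payload,
                             const std::string& fallback);

template <typename F>
long long elapsedMs(const std::vector<std::string>& payloads, F&& fn) {
  const auto start = std::chrono::steady_clock::now();
  for (const auto& p : payloads) {
    fn(p);  // callee lives in another translation unit, so the call is not elided
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
      .count();
}

int main() {
  const std::vector<std::string> payloads(
      10000,
      R"({"name":"process_query","columns":{"name":"bash"},"action":"added"})");

  std::cout << "string peeking: "
            << elapsedMs(payloads,
                         [](const std::string& p) { return getMsgNameByRfind(p); })
            << " ms\n";
  std::cout << "json parse:     "
            << elapsedMs(payloads,
                         [](const std::string& p) {
                           return getMsgNameByJson(p, "base_topic");
                         })
            << " ms\n";
  return 0;
}
```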

@analyzeDFIR
Contributor Author

analyzeDFIR commented May 23, 2020

The JSON implementation relies on osquery/utils/json/json.h. I tried just using clang++ to build a test that includes that file, but it eventually pulls in osquery/utils/system/system.h, and I think that header is put in place by CMake based on the OS.

Edit: I'm running these tests on a 2015 MacBook Pro (2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3). I created two test files: an 11MB osqueryd.snapshots.log with 184 records and a 63KB osquery.results.log with 152 records, and I'm using the time command's user process time statistic.

The "string peeking" implementation takes, on average, ~3-3.15ms per snapshot record and ~0.03ms per event record on my machine. This averages to ~1.6ms per record.

Edit 2: Copied the osquery/utils/system/posix/system.{h,cpp} files to osquery/utils/system/system.h and that solved my initial issue. Now getting a list of other errors, including:

test_getMsgName_json.cpp:28:18: error: no matching member function for call to 'HasMember'
      !doc.doc().HasMember(fieldName)) {
       ~~~~~~~~~~^~~~~~~~~
/usr/local/include/rapidjson/document.h:1094:10: note: candidate function not viable: no known conversion from 'const std::string' (aka 'const basic_string<char, char_traits<char>, allocator<char> >') to
      'const rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::Ch *' (aka 'const char *') for 1st argument
    bool HasMember(const Ch* name) const { return FindMember(name) != MemberEnd(); }
         ^
/usr/local/include/rapidjson/document.h:1118:10: note: candidate template ignored: could not match 'GenericValue' against 'basic_string'
    bool HasMember(const GenericValue<Encoding, SourceAllocator>& name) const { return FindMember(name) != MemberEnd(); }
         ^
test_getMsgName_json.cpp:31:25: error: no viable overloaded operator[] for type 'rapidjson::Document' (aka 'GenericDocument<UTF8<> >')
  auto& name = doc.doc()[fieldName];
               ~~~~~~~~~^~~~~~~~~~

Officially out of my depth.
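
The errors above come from RapidJSON's member-lookup overloads: HasMember() and operator[] only accept const char* (or GenericValue) keys unless RAPIDJSON_HAS_STDSTRING is defined before the headers are included. A minimal standalone illustration of both workarounds (hypothetical test code, not the osquery source):

```cpp
#define RAPIDJSON_HAS_STDSTRING 1  // enables the std::string overloads
#include <iostream>
#include <string>

#include <rapidjson/document.h>

int main() {
  rapidjson::Document doc;
  doc.Parse(R"({"name":"process_query"})");

  const std::string fieldName = "name";

  // Option 1: with RAPIDJSON_HAS_STDSTRING, std::string keys compile directly.
  if (doc.HasMember(fieldName)) {
    std::cout << doc[fieldName].GetString() << "\n";  // process_query
  }

  // Option 2: without the define, pass the underlying buffer explicitly.
  if (doc.HasMember(fieldName.c_str())) {
    std::cout << doc[fieldName.c_str()].GetString() << "\n";  // process_query
  }
  return 0;
}
```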

@analyzeDFIR
Contributor Author

Tested both the String and JSON implementations with 10,000 snapshot and event format records; the results were:

| Implementation | Number of Records | Result Format | Runtime Range |
|----------------|-------------------|---------------|---------------|
| JSON           | 10,000            | event         | 61-71 ms      |
| String         | 10,000            | event         | 35-40 ms      |
| JSON           | 10,000            | snapshot      | 83-93 ms      |
| String         | 10,000            | snapshot      | 49-60 ms      |

In short, the String implementation is faster (as expected), but the JSON implementation isn't as slow as expected. Let me know which you'd like to go with and I'll make the necessary changes.

@theopolis
Member

Thank you so much for that analysis! I am happy to merge this as-is, but I am not a Kafka user, so the impact will not affect my infra (thus it is easy for me to accept the performance impact).

We now have the analysis here for posterity, if we need to revisit this in an effort to address performance issues we can adopt the "string peeking" method or brainstorm alternatives.

@theopolis merged commit c197af9 into osquery:master on Jun 7, 2020
aikuchin pushed a commit to aikuchin/osquery that referenced this pull request Jul 11, 2023
…0 to master

* commit 'eeee0fb0957f5af983f817c2e6f19c53108d9e09': (83 commits)
  Add additional changelog items (osquery#6523)
  Changelog for 4.4.0 (osquery#6492)
  build: Add Azure tables to specs CMakeLists (osquery#6507)
  CMake: Correct macOS framework linking (osquery#6522)
  tables: Only populate table cache with star-like selects (osquery#6513)
  CMake: Fix and cleanup compile flags (osquery#6521)
  docs: Add note to bump the Homebrew cask (osquery#6519)
  tests: Fix atom_packages, processes, rpm_packages flakiness (osquery#6518)
  bug: Do not use system proxy for AWS local authority (osquery#6512)
  packaging: updating docs on cpack usage to include Chocolatey (osquery#6022)
  bug: Fix typed_row table caching (osquery#6508)
  Implement event batching support for Windows tables (osquery#6280)
  http: Use sync resolve (osquery#6490)
  Add support for basic chassis information (osquery#5282)
  Only emit 'denylist' warning once (osquery#6493)
  docs: Remove references to brew in macOS install (osquery#6494)
  Fix for osquery#5890: Event Format Results and the Kafka Logger (osquery#6449)
  make apt_sources table parsing much more resilient (osquery#6482)
  Make file and hash container columns hidden (osquery#6486)
  Update documentation to use 'allow list' and 'deny list' diction (osquery#6489)
  ...
Successfully merging this pull request may close these issues.

differential osquery query output to base_topic