
Integrate mux parser into stirling #367

Conversation

@ddelnano (Member) commented Nov 29, 2021

This is my second attempt at integrating the mux parser into Stirling. I'm opening this preemptively to show that the testing for #366 was successful, but that PR needs to be merged first, and this one rebased, to pull out the Docker-image-related changes.

Todo

  • Fix the issue that is preventing a subset of mux data from being collected
  • Debug why a bug in Tping/Rping parsing causes the mux stitcher to lose data
    • Incorrectly stripping the buffer caused the Tdispatch/Rdispatch frames to go missing. Fixing that bug makes all of the frames appear except Rping/Tping (since Rping isn't parsed correctly). After looking at this again with more debugging experience, I was able to verify that the stitcher behaves as expected even with the Rping parser bug.
  • Decrease the false-positive rate of mux protocol inference. I attempted to do this by parsing more of the mux message when the entire frame is available.
  • Avoid mutating mux stitcher deques by using iterators
  • Understand the T/Rinit message structure better (6 unknown bytes) and update the mux parser accordingly (see the frame-layout sketch after this list)
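
For reference, here's a minimal sketch of the frame layout these items assume, as I understand the Finagle mux framing (4-byte big-endian length, 1-byte signed type, 24-bit tag). The struct and helper below are only illustrative and are not the actual parser code in this PR.

#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative only: every mux frame is a 4-byte big-endian length (covering the
// rest of the frame), a 1-byte signed type, a 3-byte (24-bit) tag, then a payload.
struct MuxHeader {
  uint32_t length;  // Bytes remaining after the length field (type + tag + payload).
  int8_t type;      // e.g. Treq = 1 / Rreq = -1, Tdispatch = 2 / Rdispatch = -2,
                    // Tping = 65 / Rping = -65, Tinit = 68 / Rinit = -68,
                    // Rerr = -128 (old-style Rerr = 127).
  uint32_t tag;     // 24-bit tag matching a response to its request.
};

// Reads the 8-byte header from a raw buffer; returns std::nullopt if the buffer is
// too short to contain a full header.
inline std::optional<MuxHeader> ParseMuxHeader(const uint8_t* buf, size_t count) {
  if (count < 8) return std::nullopt;
  MuxHeader hdr;
  hdr.length = (uint32_t{buf[0]} << 24) | (uint32_t{buf[1]} << 16) |
               (uint32_t{buf[2]} << 8) | uint32_t{buf[3]};
  hdr.type = static_cast<int8_t>(buf[4]);
  hdr.tag = (uint32_t{buf[5]} << 16) | (uint32_t{buf[6]} << 8) | uint32_t{buf[7]};
  return hdr;
}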

@pixie-io-buildbot (Member):

Can one of the admins verify this patch?

@chengruizhe (Contributor) left a comment:

Mostly looks good. Let's talk on Slack about how to test the mux protocol inference rules.

@@ -487,6 +487,66 @@ static __inline bool is_redis_message(const char* buf, size_t count) {
return true;
}

static __inline enum message_type_t infer_mux_message(const char* buf, size_t count) {
static const int8_t kTreq = 1;

Contributor:

Checking for a mux_type of 1 or -1 is probably not robust enough, because it's likely that another protocol can have a 1 at that location. We do have a network traffic dataset that we use internally to test how protocol inference rules affect the accuracy of classifying different protocols. I suggest we get some Mux data from a real application, add it to the dataset, and test its effect on the other protocols, before merging the protocol inference code. See experimental/stirling/protocol_inference.

@ddelnano (Member Author):

I attempted to enhance the mux protocol inference by reducing the number of types (R/Tinit, R/Tdispatch, Rerr(Old)) and by parsing 'special' strings that are present in those frame types. Ideally we would support all mux types, but there are some that aren't as high value.

If I remember correctly, the protocol inference runs on a connection until it's classified and is only given the data from the syscall, correct? If that's the case, I was hoping that stronger verification on Rerr, Tinit, and Rinit would help reduce false positives, since those frames appear earlier in the connection lifecycle. Let me know what you think of my latest update.

I am definitely interested in trying the protocol inference against the network traffic dataset. I shared the mux pcap file that I generated myself with you on Slack. Unfortunately I don't know if I'll be able to provide real application traffic, but let me know how I can help with this testing.
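
To make this concrete, here's a rough userspace-style sketch of the kind of stricter check I'm describing for the Tdispatch/Rdispatch, Tinit/Rinit, and Rerr types. It's only an illustration (the printable-ASCII heuristic for Rerr is an assumption of this sketch), not the actual eBPF code, which also has to stay within the instruction limit.

#include <cctype>
#include <cstddef>
#include <cstdint>

constexpr int8_t kTdispatch = 2;
constexpr int8_t kRdispatch = -2;
constexpr int8_t kTinit = 68;
constexpr int8_t kRinit = -68;
constexpr int8_t kRerr = -128;
constexpr int8_t kRerrOld = 127;

inline bool LooksLikeMux(const uint8_t* buf, size_t count) {
  if (count < 8) return false;

  const uint32_t length = (uint32_t{buf[0]} << 24) | (uint32_t{buf[1]} << 16) |
                          (uint32_t{buf[2]} << 8) | uint32_t{buf[3]};
  const int8_t type = static_cast<int8_t>(buf[4]);

  // Restrict inference to the handful of frame types that can be verified further.
  if (type != kTdispatch && type != kRdispatch && type != kTinit &&
      type != kRinit && type != kRerr && type != kRerrOld) {
    return false;
  }

  // The length prefix covers type + tag + payload, so it can never be less than 4.
  if (length < 4) return false;

  // Rerr frames carry a human-readable message; requiring printable ASCII in the
  // first few captured payload bytes is one way to cut false positives from binary
  // protocols (illustrative heuristic only).
  if (type == kRerr || type == kRerrOld) {
    for (size_t i = 8; i < count && i < 32; ++i) {
      if (std::isprint(buf[i]) == 0) return false;
    }
  }
  return true;
}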

namespace stirling {

// clang-format off
static constexpr DataElement kMuxElements[] = {

Contributor:

Do you plan to add a body field in the future? If so, let's add a TODO.

@ddelnano (Member Author):

I think it depends on how we handle the nested thrift protocol data in thriftmux. I was picturing that the "body" field would be stored in a different table populated by a thrift protocol parser. That would allow non-thriftmux users to get thrift protocol parsing.

However, it's not clear to me how we would join the mux and thrift protocol data in that situation. In an ideal world, the mux and thrift data would be easily joined. Has there been any discussion on how to handle protocols that are nested like this?

Contributor:

We haven't had nested protocols like this before, so we'll be treading new ground. The problem is that we have only one ConnTracker per connection, and there's an assumption that the ConnTracker will output to one table.

I suppose we could detect the inner protocol and choose the more specific one when possible. This needs more discussion.

@ddelnano (Member Author):

I expected this would require some discussion :) Good point regarding the ConnTracker 1:1 mapping. For now I've added a TODO comment explaining the future work.
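
For illustration, the gist of that TODO is roughly the following (a sketch of the idea next to the kMuxElements definition, not the exact comment text):

// Sketch only, not the PR's exact wording.
// TODO: Decide how the thrift payload nested inside thriftmux frames should be
// exposed: either a body column in this table, or a separate table populated by
// the thrift parser, which would then need a way to join the mux and thrift
// records (today each ConnTracker outputs to a single table).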

@ddelnano force-pushed the integrate-mux-parser-into-stirling-take-2 branch from e3bceb9 to 79b228a on December 15, 2021 at 02:09

Commits added to the branch (all signed off by Dom Del Nano <ddelnano@gmail.com>):

  • …in a bug fix), remove unused client image and add class to use image
  • …er image
  • …es removes the buffer data correctly
  • …ng for mux's 24 bit tag field
  • …ire frame is provided
  • …_test
  • …d inside the mux protocol)
  • …g postgres traffic

@ddelnano force-pushed the integrate-mux-parser-into-stirling-take-2 branch from 5e2c268 to 84767f8 on December 19, 2021 at 15:04

@ddelnano (Member Author):

@chengruizhe I removed the part of the mux protocol inference that was misclassifying postgres traffic.

@oazizi000 (Contributor) commented Jan 11, 2022

@ddelnano 95199df pulled in most of these changes.

The protocol inference code was omitted because it caused us to breach the eBPF instruction limit. This also means that, for now, mux protocol support is not yet active.

@codecov-commenter:

Codecov Report

Merging #367 (95199df) into main (2fb2d7c) will decrease coverage by 0.01%.
The diff coverage is 55.43%.

❗ Current head 95199df differs from the pull request's most recent head 84767f8. Consider uploading reports for commit 84767f8 to get more accurate results.

@@            Coverage Diff             @@
##             main     #367      +/-   ##
==========================================
- Coverage   63.44%   63.42%   -0.02%     
==========================================
  Files        1044     1048       +4     
  Lines       65051    65254     +203     
==========================================
+ Hits        41271    41390     +119     
- Misses      22553    22631      +78     
- Partials     1227     1233       +6     
Impacted Files Coverage Δ
src/cloud/api/controllers/vizier_cluster_grpc.go 64.60% <ø> (-0.20%) ⬇️
src/stirling/bpf_tools/bcc_symbolizer.cc 0.00% <0.00%> (ø)
...ling/source_connectors/perf_profiler/symbolizer.cc 0.00% <ø> (ø)
...rce_connectors/socket_tracer/bcc_bpf_intf/common.h 100.00% <ø> (ø)
...nnectors/socket_tracer/bcc_bpf_intf/socket_trace.h 0.00% <ø> (ø)
...ing/source_connectors/socket_tracer/data_stream.cc 98.21% <ø> (ø)
...et_tracer/protocols/kafka/decoder/packet_decoder.h 93.87% <ø> (ø)
...socket_tracer/protocols/kafka/opcodes/sync_group.h 0.00% <0.00%> (ø)
...nnectors/socket_tracer/protocols/kafka/stitcher.cc 53.00% <0.00%> (-7.23%) ⬇️
..._connectors/socket_tracer/socket_trace_connector.h 70.00% <ø> (ø)
... and 53 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@oazizi000 (Contributor) commented:

Marking this as closed since the majority of it has been pulled in. The last piece (the eBPF mux protocol inference) will be submitted as a separate PR.

@oazizi000 closed this on Jan 11, 2022