Skip to content

Instrument OpenSSL tracing to detect validity of assumptions for new tls tracing method#1161

Merged
JamesMBartlett merged 5 commits intopixie-io:mainfrom
ddelnano:ddelnano/verify-socket-fd-syscall-access-tls-tracing-rebased
Apr 6, 2023
Merged

Instrument OpenSSL tracing to detect validity of assumptions for new tls tracing method#1161
JamesMBartlett merged 5 commits intopixie-io:mainfrom
ddelnano:ddelnano/verify-socket-fd-syscall-access-tls-tracing-rebased

Conversation

@ddelnano
Copy link
Copy Markdown
Member

@ddelnano ddelnano commented Apr 4, 2023

Summary: Instrument OpenSSL tracing to detect validity of assumptions for new tls tracing method

After discussing #1123 with @oazizi000 and @etep, we decided that invest in stronger validation that the assumptions for that PR and our future tls tracing are valid. Instead of relying on struct offsets of user space data structures, the new tracing method will access a connection's socket fd from the underlying socket syscalls during SSL_write / SSL_read calls.

This technique should only be compatible with "BIO native" OpenSSL use cases, which are the OpenSSL use cases Pixie supports today. BIO native means that a compatible application uses a BIO provided by OpenSSL (via SSL_set_fd) and results in OpenSSL issuing read/write syscalls on your behalf for the underlying socket.

There are two situations we wanted to understand prior to proceeding with #1123: how custom BIO implementations behave (netty, nodejs, etc) and detecting unrelated (non socket) syscalls while SSL_write and SSL_read are on the stack. The former was verified with an experiment based on this change and is described in the Test Plan section below. We believe the latter should not occur, but the changes in this PR instrument this situation in order to detect if doest occur. Assuming this condition isn't encountered, we will proceed with #1132 (with the minor changes learned from this experiment).

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following:

ddelnano added 2 commits April 4, 2023 20:00
new tls tracing method (accessing socket fd via underlying syscall)

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
@ddelnano ddelnano requested a review from a team April 4, 2023 21:39
… initialized

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Comment thread src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c Outdated
Comment thread src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c Outdated
Comment thread src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
@ddelnano ddelnano force-pushed the ddelnano/verify-socket-fd-syscall-access-tls-tracing-rebased branch from fadec6c to b38eee7 Compare April 5, 2023 22:03
…ent for more clarity

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
@ddelnano ddelnano force-pushed the ddelnano/verify-socket-fd-syscall-access-tls-tracing-rebased branch from b38eee7 to 2af5d01 Compare April 5, 2023 22:11
ddelnano added a commit to ddelnano/pixie that referenced this pull request Apr 6, 2023
Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
…back

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
vihangm pushed a commit that referenced this pull request Apr 6, 2023
…ECK` and metrics for error conditions (#1168)

Summary: Fix perf profiler bug causing higher BPF map utilization and
add `DCHECK` and metrics for error conditions

The perf profiler connector (and soon the socket trace connector -
#1161) record error conditions in a BPF map. While the perf profiler has
instrumented this value for a while, we haven't leveraged using this
data in anyway. This change `DCHECK`s the error condition in addition to
adding prometheus metrics.

This `DCHECK` caught a bug in our map switching logic. The perf profiler
has an "A" and "B" map and alternates between them. Prior to this change
we were using the A map exclusively since our variable for B was
accidentally pointing to A
([source](https://github.com/pixie-io/pixie/blob/cf071f3608de5500213cef05ddb1bba88544ea66/src/stirling/source_connectors/perf_profiler/bcc_bpf/profiler.c#L64-L65)).

Relevant Issues: N/A

Type of change: /kind bug

Test Plan: Verified the following:
- [x] `DCHECK` triggers on the first commit of this PR
([P351](https://phab.corp.pixielabs.ai/P351))
- [x] Verified that the overflow prometheus metric value is incremented
when the overflow condition occurs
([P350](https://phab.corp.pixielabs.ai/P350))
- [x] `DCHECK` no longer occurs with `profiler.c` fix

Changelog Message:
- Fixed a bug that may lead to missing stack frames for the continuous
profiler. We believe this situation is unlikely to occur since our
buffers are sized with sufficient margin.

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
@ddelnano
Copy link
Copy Markdown
Member Author

ddelnano commented Apr 6, 2023

@pixie-io/maintainers can you review this?

Edit: removed my comment about asking for help to trigger a new build. I didn't realize non pixie-io/maintainers could trigger pixie-io-buildbot.

@ddelnano
Copy link
Copy Markdown
Member Author

ddelnano commented Apr 6, 2023

@pixie-io-buildbot test this please

@JamesMBartlett
Copy link
Copy Markdown
Member

/perf nightly

Copy link
Copy Markdown
Member

@JamesMBartlett JamesMBartlett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, going to wait on the perf eval

openssl_trace_state_->get_value(kOpenSSLTraceStatusIdx, error_code);

if (error_code == kOpenSSLMismatchedFDsDetected) {
openssl_trace_mismatched_fds_counter_.Increment();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it will undercount mismatched fds, if they occur more than once in between calls to this func. I think you said we don't expect this to ever happen so maybe its fine if we undercount, but just want to double check.

Copy link
Copy Markdown
Member Author

@ddelnano ddelnano Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undercounting is fine. We just need to be able to detect if it happens despite our expectation that it should never occur (assuming our understanding of the approach taken in #1123 is correct). Appreciate the detailed review 👍

Copy link
Copy Markdown
Member

@JamesMBartlett JamesMBartlett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perf eval died, I think because cloud went down, but one of the 3 experiments worked: https://lookerstudio.google.com/u/0/reporting/9701de3b-f906-4dd2-a1e9-48ca0b1e07e6/page/p_g7fj6pf4yc?params=%7B%22experiment_ids%22:%22e717cae1-d763-4389-baec-3928c4eacba6%22%7D

It looks the same as what I would expect, so its probably good to go. Also, the other 2 experiments that didn't finish don't test TLS traffix so should really matter here.

@JamesMBartlett JamesMBartlett merged commit 40900f1 into pixie-io:main Apr 6, 2023
ddelnano added a commit to ddelnano/pixie that referenced this pull request May 1, 2023
Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to ddelnano/pixie that referenced this pull request May 1, 2023
…ECK` and metrics for error conditions (pixie-io#1168)

Summary: Fix perf profiler bug causing higher BPF map utilization and
add `DCHECK` and metrics for error conditions

The perf profiler connector (and soon the socket trace connector -
pixie-io#1161) record error conditions in a BPF map. While the perf profiler has
instrumented this value for a while, we haven't leveraged using this
data in anyway. This change `DCHECK`s the error condition in addition to
adding prometheus metrics.

This `DCHECK` caught a bug in our map switching logic. The perf profiler
has an "A" and "B" map and alternates between them. Prior to this change
we were using the A map exclusively since our variable for B was
accidentally pointing to A
([source](https://github.com/pixie-io/pixie/blob/cf071f3608de5500213cef05ddb1bba88544ea66/src/stirling/source_connectors/perf_profiler/bcc_bpf/profiler.c#L64-L65)).

Relevant Issues: N/A

Type of change: /kind bug

Test Plan: Verified the following:
- [x] `DCHECK` triggers on the first commit of this PR
([P351](https://phab.corp.pixielabs.ai/P351))
- [x] Verified that the overflow prometheus metric value is incremented
when the overflow condition occurs
([P350](https://phab.corp.pixielabs.ai/P350))
- [x] `DCHECK` no longer occurs with `profiler.c` fix

Changelog Message:
- Fixed a bug that may lead to missing stack frames for the continuous
profiler. We believe this situation is unlikely to occur since our
buffers are sized with sufficient margin.

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to ddelnano/pixie that referenced this pull request May 1, 2023
…tls tracing method (pixie-io#1161)

Summary: Instrument OpenSSL tracing to detect validity of assumptions
for new tls tracing method

After discussing pixie-io#1123 with @oazizi000 and @etep, we decided that invest
in stronger validation that the assumptions for that PR and our future
tls tracing are valid. Instead of relying on struct offsets of user
space data structures, the new tracing method will access a connection's
socket fd from the underlying socket syscalls during `SSL_write` /
`SSL_read` calls.

This technique should only be compatible with "BIO native" OpenSSL use
cases, which are the OpenSSL use cases Pixie supports today. BIO native
means that a compatible application uses a
[BIO](https://wiki.openssl.org/index.php/BIO) provided by OpenSSL (via
[SSL_set_fd](https://www.openssl.org/docs/manmaster/man3/SSL_set_fd.html))
and results in OpenSSL issuing read/write syscalls on your behalf for
the underlying socket.

There are two situations we wanted to understand prior to proceeding
with pixie-io#1123: how custom BIO implementations behave (netty, nodejs, etc)
and detecting unrelated (non socket) syscalls while `SSL_write` and
`SSL_read` are on the stack. The former was verified with an experiment
based on this change and is described in the Test Plan section below. We
believe the latter should not occur, but the changes in this PR
instrument this situation in order to detect if doest occur. Assuming
this condition isn't encountered, we will proceed with pixie-io#1132 (with the
minor changes learned from this experiment).

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] `DCHECK` added to `socket_trace_connector` does not detect
mismatched file descriptors for `openssl_trace_bpf_test` and
`netty_tls_trace_bpf_test` ([P345](https://phab.corp.pixielabs.ai/P345)
-- openssl_trace_bpf_test is still disabled - pixie-io#699)
- [x] Verified with a BPF histogram that the number of active syscalls
within SSL_write/SSL_read are non zero for BIO native use cases and zero
for non BIO native cases (nodejs, netty). See more details in pixie-io#1160

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to ddelnano/pixie that referenced this pull request May 1, 2023
Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
JamesMBartlett pushed a commit that referenced this pull request May 1, 2023
…ggle) (#1123)

Summary: Implement new tls tracing method behind stirling cli flag
(feature toggle)

The tls tracing method added in this PR determines a connection's
identity (socket fd) through a different mechanism from our existing
tracing. Instead of relying on struct offsets of user space data
structures, it accesses the socket fd via the underlying socket syscalls
while `SSL_write` / `SSL_read` calls occur. This is a prerequisite to
support BoringSSL because its rolling release style makes the previous
method of user space offsets untenable. This has the added benefit of
reducing our maintenance cost for our existing OpenSSL tracing. Assuming
future versions of OpenSSL maintain the same contract, we will not
require any code changes to support them -- Pixie will gain OpenSSL v3
support once this new tracing is the default (as mentioned in the later
testing).

Our assumption is that the applications that explicitly set the socket
fd on the SSL struct (the applications supported by our previous tracing
technique) --
[nginx](https://github.com/nginx/nginx/blob/dfe70f74a3558f05142fb552cea239add123d414/src/event/ngx_event_openssl.c#L1696),
[python](https://github.com/python/cpython/blob/e375bff03736f809fbc234010c087ef9d7e0d384/Modules/_ssl.c#L836)
all use Openssl with its native BIO interface. In order to verify that
assumption, this change will be feature flagged and monitored carefully
as its enabled on internal clusters. The existing [conn_stats_bytes
metric](https://github.com/pixie-io/pixie/blob/f45ced1803e6e44406f20f1171c15a24f4d5a17a/src/stirling/source_connectors/socket_tracer/metrics.cc#L62)
will be monitored for volume of tls traffic traced to verify there is no
loss of instrumentation coverage between the new and old method.

Relevant Issues: #692

Type of change: /kind feature

Test Plan: Verified the following
- [x] New test passes with `enable_openssl_v3_testing` bool flag enabled
([P336](https://phab.corp.pixielabs.ai/P336)). This verifies that the
new tracing technique and the feature toggle are functional since our
previous tracing does not support OpenSSL v3
- [x] Existing `openssl_trace_bpf_test` and `netty_tls_trace_bpf_test`
passes ([P335](https://phab.corp.pixielabs.ai/P335)). This is explicitly
mentioned since `openssl_trace_bpf_test` is still disabled #699
- [x] Verified the `bool_flag` added from PR feedback conditionally
enables the OpenSSL v3 testing
([P337](https://phab.corp.pixielabs.ai/P337))
- [x] Validate metrics from #1161 to verify our assumptions are correct

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
vihangm pushed a commit that referenced this pull request May 8, 2023
Summary: Remove update to BPF map not used in status quo tls tracing

In order to vet the new style of tls tracing, we introduced a mechanism
for detecting mismatched fds (#1161). This instrumented all of our tls
tracing when it was first developed. When the new method of tls tracing
was introduced, we removed the mismatched fd detection from the status
quo tls tracing (#1123), however, this BPF map update was missed in that
refactor (#1123).

Relevant Issues: #692

Type of change: /kind bug

Test Plan: Existing tests pass

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…ECK` and metrics for error conditions (pixie-io#1168)

Summary: Fix perf profiler bug causing higher BPF map utilization and
add `DCHECK` and metrics for error conditions

The perf profiler connector (and soon the socket trace connector -
pixie-io#1161) record error conditions in a BPF map. While the perf profiler has
instrumented this value for a while, we haven't leveraged using this
data in anyway. This change `DCHECK`s the error condition in addition to
adding prometheus metrics.

This `DCHECK` caught a bug in our map switching logic. The perf profiler
has an "A" and "B" map and alternates between them. Prior to this change
we were using the A map exclusively since our variable for B was
accidentally pointing to A
([source](https://github.com/pixie-io/pixie/blob/cf071f3608de5500213cef05ddb1bba88544ea66/src/stirling/source_connectors/perf_profiler/bcc_bpf/profiler.c#L64-L65)).

Relevant Issues: N/A

Type of change: /kind bug

Test Plan: Verified the following:
- [x] `DCHECK` triggers on the first commit of this PR
([P351](https://phab.corp.pixielabs.ai/P351))
- [x] Verified that the overflow prometheus metric value is incremented
when the overflow condition occurs
([P350](https://phab.corp.pixielabs.ai/P350))
- [x] `DCHECK` no longer occurs with `profiler.c` fix

Changelog Message:
- Fixed a bug that may lead to missing stack frames for the continuous
profiler. We believe this situation is unlikely to occur since our
buffers are sized with sufficient margin.

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…tls tracing method (pixie-io#1161)

Summary: Instrument OpenSSL tracing to detect validity of assumptions
for new tls tracing method

After discussing pixie-io#1123 with @oazizi000 and @etep, we decided that invest
in stronger validation that the assumptions for that PR and our future
tls tracing are valid. Instead of relying on struct offsets of user
space data structures, the new tracing method will access a connection's
socket fd from the underlying socket syscalls during `SSL_write` /
`SSL_read` calls.

This technique should only be compatible with "BIO native" OpenSSL use
cases, which are the OpenSSL use cases Pixie supports today. BIO native
means that a compatible application uses a
[BIO](https://wiki.openssl.org/index.php/BIO) provided by OpenSSL (via
[SSL_set_fd](https://www.openssl.org/docs/manmaster/man3/SSL_set_fd.html))
and results in OpenSSL issuing read/write syscalls on your behalf for
the underlying socket.

There are two situations we wanted to understand prior to proceeding
with pixie-io#1123: how custom BIO implementations behave (netty, nodejs, etc)
and detecting unrelated (non socket) syscalls while `SSL_write` and
`SSL_read` are on the stack. The former was verified with an experiment
based on this change and is described in the Test Plan section below. We
believe the latter should not occur, but the changes in this PR
instrument this situation in order to detect if doest occur. Assuming
this condition isn't encountered, we will proceed with pixie-io#1132 (with the
minor changes learned from this experiment).

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] `DCHECK` added to `socket_trace_connector` does not detect
mismatched file descriptors for `openssl_trace_bpf_test` and
`netty_tls_trace_bpf_test` ([P345](https://phab.corp.pixielabs.ai/P345)
-- openssl_trace_bpf_test is still disabled - pixie-io#699)
- [x] Verified with a BPF histogram that the number of active syscalls
within SSL_write/SSL_read are non zero for BIO native use cases and zero
for non BIO native cases (nodejs, netty). See more details in pixie-io#1160

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…ggle) (pixie-io#1123)

Summary: Implement new tls tracing method behind stirling cli flag
(feature toggle)

The tls tracing method added in this PR determines a connection's
identity (socket fd) through a different mechanism from our existing
tracing. Instead of relying on struct offsets of user space data
structures, it accesses the socket fd via the underlying socket syscalls
while `SSL_write` / `SSL_read` calls occur. This is a prerequisite to
support BoringSSL because its rolling release style makes the previous
method of user space offsets untenable. This has the added benefit of
reducing our maintenance cost for our existing OpenSSL tracing. Assuming
future versions of OpenSSL maintain the same contract, we will not
require any code changes to support them -- Pixie will gain OpenSSL v3
support once this new tracing is the default (as mentioned in the later
testing).

Our assumption is that the applications that explicitly set the socket
fd on the SSL struct (the applications supported by our previous tracing
technique) --
[nginx](https://github.com/nginx/nginx/blob/dfe70f74a3558f05142fb552cea239add123d414/src/event/ngx_event_openssl.c#L1696),
[python](https://github.com/python/cpython/blob/e375bff03736f809fbc234010c087ef9d7e0d384/Modules/_ssl.c#L836)
all use Openssl with its native BIO interface. In order to verify that
assumption, this change will be feature flagged and monitored carefully
as its enabled on internal clusters. The existing [conn_stats_bytes
metric](https://github.com/pixie-io/pixie/blob/f45ced1803e6e44406f20f1171c15a24f4d5a17a/src/stirling/source_connectors/socket_tracer/metrics.cc#L62)
will be monitored for volume of tls traffic traced to verify there is no
loss of instrumentation coverage between the new and old method.

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: Verified the following
- [x] New test passes with `enable_openssl_v3_testing` bool flag enabled
([P336](https://phab.corp.pixielabs.ai/P336)). This verifies that the
new tracing technique and the feature toggle are functional since our
previous tracing does not support OpenSSL v3
- [x] Existing `openssl_trace_bpf_test` and `netty_tls_trace_bpf_test`
passes ([P335](https://phab.corp.pixielabs.ai/P335)). This is explicitly
mentioned since `openssl_trace_bpf_test` is still disabled pixie-io#699
- [x] Verified the `bool_flag` added from PR feedback conditionally
enables the OpenSSL v3 testing
([P337](https://phab.corp.pixielabs.ai/P337))
- [x] Validate metrics from pixie-io#1161 to verify our assumptions are correct

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…#1298)

Summary: Remove update to BPF map not used in status quo tls tracing

In order to vet the new style of tls tracing, we introduced a mechanism
for detecting mismatched fds (pixie-io#1161). This instrumented all of our tls
tracing when it was first developed. When the new method of tls tracing
was introduced, we removed the mismatched fd detection from the status
quo tls tracing (pixie-io#1123), however, this BPF map update was missed in that
refactor (pixie-io#1123).

Relevant Issues: pixie-io#692

Type of change: /kind bug

Test Plan: Existing tests pass

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…ECK` and metrics for error conditions (pixie-io#1168)

Summary: Fix perf profiler bug causing higher BPF map utilization and
add `DCHECK` and metrics for error conditions

The perf profiler connector (and soon the socket trace connector -
pixie-io#1161) record error conditions in a BPF map. While the perf profiler has
instrumented this value for a while, we haven't leveraged using this
data in anyway. This change `DCHECK`s the error condition in addition to
adding prometheus metrics.

This `DCHECK` caught a bug in our map switching logic. The perf profiler
has an "A" and "B" map and alternates between them. Prior to this change
we were using the A map exclusively since our variable for B was
accidentally pointing to A
([source](https://github.com/pixie-io/pixie/blob/cf071f3608de5500213cef05ddb1bba88544ea66/src/stirling/source_connectors/perf_profiler/bcc_bpf/profiler.c#L64-L65)).

Relevant Issues: N/A

Type of change: /kind bug

Test Plan: Verified the following:
- [x] `DCHECK` triggers on the first commit of this PR
([P351](https://phab.corp.pixielabs.ai/P351))
- [x] Verified that the overflow prometheus metric value is incremented
when the overflow condition occurs
([P350](https://phab.corp.pixielabs.ai/P350))
- [x] `DCHECK` no longer occurs with `profiler.c` fix

Changelog Message:
- Fixed a bug that may lead to missing stack frames for the continuous
profiler. We believe this situation is unlikely to occur since our
buffers are sized with sufficient margin.

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…tls tracing method (pixie-io#1161)

Summary: Instrument OpenSSL tracing to detect validity of assumptions
for new tls tracing method

After discussing pixie-io#1123 with @oazizi000 and @etep, we decided that invest
in stronger validation that the assumptions for that PR and our future
tls tracing are valid. Instead of relying on struct offsets of user
space data structures, the new tracing method will access a connection's
socket fd from the underlying socket syscalls during `SSL_write` /
`SSL_read` calls.

This technique should only be compatible with "BIO native" OpenSSL use
cases, which are the OpenSSL use cases Pixie supports today. BIO native
means that a compatible application uses a
[BIO](https://wiki.openssl.org/index.php/BIO) provided by OpenSSL (via
[SSL_set_fd](https://www.openssl.org/docs/manmaster/man3/SSL_set_fd.html))
and results in OpenSSL issuing read/write syscalls on your behalf for
the underlying socket.

There are two situations we wanted to understand prior to proceeding
with pixie-io#1123: how custom BIO implementations behave (netty, nodejs, etc)
and detecting unrelated (non socket) syscalls while `SSL_write` and
`SSL_read` are on the stack. The former was verified with an experiment
based on this change and is described in the Test Plan section below. We
believe the latter should not occur, but the changes in this PR
instrument this situation in order to detect if doest occur. Assuming
this condition isn't encountered, we will proceed with pixie-io#1132 (with the
minor changes learned from this experiment).

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: Verified the following:
- [x] `DCHECK` added to `socket_trace_connector` does not detect
mismatched file descriptors for `openssl_trace_bpf_test` and
`netty_tls_trace_bpf_test` ([P345](https://phab.corp.pixielabs.ai/P345)
-- openssl_trace_bpf_test is still disabled - pixie-io#699)
- [x] Verified with a BPF histogram that the number of active syscalls
within SSL_write/SSL_read are non zero for BIO native use cases and zero
for non BIO native cases (nodejs, netty). See more details in pixie-io#1160

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…ggle) (pixie-io#1123)

Summary: Implement new tls tracing method behind stirling cli flag
(feature toggle)

The tls tracing method added in this PR determines a connection's
identity (socket fd) through a different mechanism from our existing
tracing. Instead of relying on struct offsets of user space data
structures, it accesses the socket fd via the underlying socket syscalls
while `SSL_write` / `SSL_read` calls occur. This is a prerequisite to
support BoringSSL because its rolling release style makes the previous
method of user space offsets untenable. This has the added benefit of
reducing our maintenance cost for our existing OpenSSL tracing. Assuming
future versions of OpenSSL maintain the same contract, we will not
require any code changes to support them -- Pixie will gain OpenSSL v3
support once this new tracing is the default (as mentioned in the later
testing).

Our assumption is that the applications that explicitly set the socket
fd on the SSL struct (the applications supported by our previous tracing
technique) --
[nginx](https://github.com/nginx/nginx/blob/dfe70f74a3558f05142fb552cea239add123d414/src/event/ngx_event_openssl.c#L1696),
[python](https://github.com/python/cpython/blob/e375bff03736f809fbc234010c087ef9d7e0d384/Modules/_ssl.c#L836)
all use Openssl with its native BIO interface. In order to verify that
assumption, this change will be feature flagged and monitored carefully
as its enabled on internal clusters. The existing [conn_stats_bytes
metric](https://github.com/pixie-io/pixie/blob/f45ced1803e6e44406f20f1171c15a24f4d5a17a/src/stirling/source_connectors/socket_tracer/metrics.cc#L62)
will be monitored for volume of tls traffic traced to verify there is no
loss of instrumentation coverage between the new and old method.

Relevant Issues: pixie-io#692

Type of change: /kind feature

Test Plan: Verified the following
- [x] New test passes with `enable_openssl_v3_testing` bool flag enabled
([P336](https://phab.corp.pixielabs.ai/P336)). This verifies that the
new tracing technique and the feature toggle are functional since our
previous tracing does not support OpenSSL v3
- [x] Existing `openssl_trace_bpf_test` and `netty_tls_trace_bpf_test`
passes ([P335](https://phab.corp.pixielabs.ai/P335)). This is explicitly
mentioned since `openssl_trace_bpf_test` is still disabled pixie-io#699
- [x] Verified the `bool_flag` added from PR feedback conditionally
enables the OpenSSL v3 testing
([P337](https://phab.corp.pixielabs.ai/P337))
- [x] Validate metrics from pixie-io#1161 to verify our assumptions are correct

---------

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano added a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…#1298)

Summary: Remove update to BPF map not used in status quo tls tracing

In order to vet the new style of tls tracing, we introduced a mechanism
for detecting mismatched fds (pixie-io#1161). This instrumented all of our tls
tracing when it was first developed. When the new method of tls tracing
was introduced, we removed the mismatched fd detection from the status
quo tls tracing (pixie-io#1123), however, this BPF map update was missed in that
refactor (pixie-io#1123).

Relevant Issues: pixie-io#692

Type of change: /kind bug

Test Plan: Existing tests pass

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants