Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TraceableTcpTransportChannel classcast exception #3514

Conversation

Gaganjuneja
Copy link
Contributor

@Gaganjuneja Gaganjuneja commented Oct 10, 2023

Description

Fix the TraceableTcpTransportChannel class cast exception when enabling the feature opensearch.experimental.feature.telemetry.enabled. Issue reported here - #3464 (comment)

Issues Resolved

#3515

Reference Chnage - https://github.com/opensearch-project/OpenSearch/blob/8e5e54e87f6f86a2425a5c767e208767ba3df14c/server/src/main/java/org/opensearch/transport/RequestHandlerRegistry.java#L101

Is this a backport? If so, please add backport PR # and/or commits #

Testing

[Please provide details of testing done: unit testing, integration testing and manual testing]

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Oct 10, 2023

Codecov Report

Merging #3514 (f7cae91) into main (c285e87) will increase coverage by 0.15%.
Report is 21 commits behind head on main.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #3514      +/-   ##
============================================
+ Coverage     64.77%   64.92%   +0.15%     
- Complexity     3623     3634      +11     
============================================
  Files           281      282       +1     
  Lines         20586    20598      +12     
  Branches       3400     3385      -15     
============================================
+ Hits          13334    13374      +40     
+ Misses         5552     5542      -10     
+ Partials       1700     1682      -18     
Files Coverage Δ
...urity/ssl/transport/SecuritySSLRequestHandler.java 61.72% <100.00%> (-0.50%) ⬇️
.../opensearch/security/filter/OpenSearchRequest.java 76.92% <50.00%> (-2.03%) ⬇️

... and 39 files with indirect coverage changes

@Gaganjuneja
Copy link
Contributor Author

Gaganjuneja commented Oct 10, 2023

@reta, @peternied, @cwperks Please take a look at it. We need to merge this fix in 2.11

@cwperks
Copy link
Member

cwperks commented Oct 10, 2023

@Gaganjuneja There is another issue with the Telemetry experimental feature and the security plugin. The way that the TraceableHttpChannel wraps the underlying channel here is problematic. Security needed to recently refactor where auth was performed from the last step of the netty pipeline to further up the pipeline and needs access to the netty channel.

You can see an example here of how the security plugin would access the channel: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/filter/SecurityRestFilter.java#L135

Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gaganjuneja Thanks for creating this PR - if this is breaking it seems like we are missing some tests for these scenarios.

final TcpChannel inner = ((TcpTransportChannel) channel).getChannel();
nettyChannel = (Netty4TcpChannel) ((BaseTcpTransportChannel) inner).getChannel();
} else if (channel instanceof BaseTcpTransportChannel) {
final TcpChannel inner = ((BaseTcpTransportChannel) channel).getChannel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a test(s) that confirms this fix is resolved.

If you are looking for an example of how these tests could be written, take a look at https://github.com/opensearch-project/security/blob/main/src/integrationTest/java/org/opensearch/security/http/OnBehalfOfJwtAuthenticationTest.java that might provide some guidance on cluster configuration and a basic setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peternied, I have written a Integ test case and its failing for 2 more issues. Where RestRequest::getHttpChannel is being type casted to Netty4HttpChannel. At one place this is already been reported by @cwperks and in addition to that one check is here

|| !(underlyingRequest.getHttpChannel() instanceof Netty4HttpChannel)) {

Copy link
Collaborator

@reta reta Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gaganjuneja @peternied I see at least 2 issues with this code:

  • it very coupled with Netty, that should not be the case
  • the issue will come back when we would instrument TaskTransportChannel, that should not be the case

Would I would like to suggest: add get (and possibly put, have to look into it) methods to HttpChannel and TransportChannel:

public T get(String attribute, Class<T> clazz);

With that, we could remove || !(underlyingRequest.getHttpChannel() instanceof Netty4HttpChannel)) { from OpenSearchRequest since it will be null value anyway. @Gaganjuneja do you want me to push this change to core?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reta, Yes it would be more cleaner approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one more change where we are trying to get the SSLHandler from the inbound and outbound pipeline

SslHandler sslhandler = (SslHandler) httpChannel.getNettyChannel().pipeline().get("ssl_http");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@Gaganjuneja Gaganjuneja Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One clarification, Please pardon me for the naive question. How will we access the channel attributes (NettyAttribute::popFrom) for authentication? Or the old implementation should be suffice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be yet another case (attr vs handler, looking into that), thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One clarification, Please pardon me for the naive question. How will we access the channel attributes (NettyAttribute::popFrom) for authentication? Or the old implementation should be suffice.

@Gaganjuneja chatted with Craig (@cwperks) offline, it seems like we may try to pursue a different path for early termination of the the request, I would suggest we do nothing for channel attributes (NettyAttribute::*) now since it is likely to go away (sadly no chances to fix that 2.11). It should not be an issue, tracing is an experimental feature that is expected to have hiccups.

Copy link
Collaborator

@reta reta Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gaganjuneja I've updated the pull request with respect to core changes

@Gaganjuneja
Copy link
Contributor Author

@Gaganjuneja There is another issue with the Telemetry experimental feature and the security plugin. The way that the TraceableHttpChannel wraps the underlying channel here is problematic. Security needed to recently refactor where auth was performed from the last step of the netty pipeline to further up the pipeline and needs access to the netty channel.

You can see an example here of how the security plugin would access the channel: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/filter/SecurityRestFilter.java#L135

@reta, This seems to be new changes which will be bypassed for TraceableHttpChannel always. I am thinking, if we can add a new method to the HttpChannel::getType and let all the implementations implement it.

default String getType() {
    return ChannelType.DEFAULT;
}

For Netty4

public String getType() {
    return ChannelType.NETTY4;
}

@reta
Copy link
Collaborator

reta commented Oct 10, 2023

@reta, This seems to be new changes which will be bypassed for TraceableHttpChannel always. I am thinking, if we can add a new method to the HttpChannel::getType and let all the implementations implement it.

@Gaganjuneja I sadly cannot look at the issue today, travelling back home, but it does not look right - the HTTP transport should have no dependency on the engine behind it, if any, I will take a look tomorrow morning

@Gaganjuneja
Copy link
Contributor Author

@reta, This seems to be new changes which will be bypassed for TraceableHttpChannel always. I am thinking, if we can add a new method to the HttpChannel::getType and let all the implementations implement it.

@Gaganjuneja I sadly cannot look at the issue today, travelling back home, but it does not look right - the HTTP transport should have no dependency on the engine behind it, if any, I will take a look tomorrow morning

Sure, sorry for bothering you!

@Gaganjuneja
Copy link
Contributor Author

@reta, This seems to be new changes which will be bypassed for TraceableHttpChannel always. I am thinking, if we can add a new method to the HttpChannel::getType and let all the implementations implement it.

@Gaganjuneja I sadly cannot look at the issue today, travelling back home, but it does not look right - the HTTP transport should have no dependency on the engine behind it, if any, I will take a look tomorrow morning

Other simplest option could be is to have TraceableHttpChannel::getDelegateChannel method and inside the security plugin we can put an additional check

HttpChannel channel = restRequest.getChannel();
if(channel instanceof TraceableHttpChannel){
   channel = ((TraceableHttpChannel)channel).getDelegateChannel();
}

//Now the Netty4 check could proceed.

Signed-off-by: Gagan Juneja <gagandeepjuneja@gmail.com>
@reta reta force-pushed the fix-TraceableTcpTransportChannel-classcast branch from 6eb05ba to d8451c7 Compare October 12, 2023 12:30
…PIs instead of typecasting

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@reta reta force-pushed the fix-TraceableTcpTransportChannel-classcast branch from d8451c7 to 49eda17 Compare October 12, 2023 12:36
cwperks
cwperks previously approved these changes Oct 12, 2023
willyborankin
willyborankin previously approved these changes Oct 12, 2023
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this change looks much cleaner. Happy to re-review after an integration test has been added.

@reta
Copy link
Collaborator

reta commented Oct 12, 2023

Thanks - this change looks much cleaner. Happy to re-review after an integration test has been added.

@Gaganjuneja are you working on that?

@Gaganjuneja
Copy link
Contributor Author

Thanks - this change looks much cleaner. Happy to re-review after an integration test has been added.

@Gaganjuneja are you working on that?

@reta, I will pick this up very first thing next week.

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
@Gaganjuneja
Copy link
Contributor Author

Thanks - this change looks much cleaner. Happy to re-review after an integration test has been added.

@Gaganjuneja are you working on that?

@reta, I will pick this up very first thing next week.

There is nothing we can test about telemetry. Only thing is to enable it and see if cluster is coming up fine as far as the integration with security plugin is concerned. I have enable it for SearchOperationTest.java and now it should be able to catch if anything is breaking due to change in the Telemetry side.

Gagan Juneja added 2 commits October 19, 2023 17:03
Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
@@ -371,6 +373,16 @@ public class SearchOperationTest {
USER_ALLOWED_TO_PERFORM_INDEX_OPERATIONS_ON_SELECTED_INDICES,
USER_ALLOWED_TO_CREATE_INDEX
)
.nodeSettings(
Copy link
Collaborator

@reta reta Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag is noop if plugin is not installed, right? I think what we could do is to enhance (or add new) GA which does full plugin install test to have security and opentelemetry plugin installed together, @peternied does it sound right to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I would expect that behavior. @Gaganjuneja can you help us understand what this setting controls?

Copy link
Contributor Author

@Gaganjuneja Gaganjuneja Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @reta you are right. but this would at least create the Traceable wrappers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @reta you are right. but this would at least create the Traceable wrappers.

how? it will use NOOP tracer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes. After this recent change tracer:isTracingEnabled, it wont.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved this flag to its own class - but I was not able to reproduce a failure with the other changes in this CR reverted.

Please pull the latest changes from your fork and test locally until the test can fail if you'd reverted the other changes - its possible I've got the feature flag configured incorrectly.

@@ -371,6 +373,16 @@ public class SearchOperationTest {
USER_ALLOWED_TO_PERFORM_INDEX_OPERATIONS_ON_SELECTED_INDICES,
USER_ALLOWED_TO_CREATE_INDEX
)
.nodeSettings(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I would expect that behavior. @Gaganjuneja can you help us understand what this setting controls?

@@ -371,6 +373,9 @@ public class SearchOperationTest {
USER_ALLOWED_TO_PERFORM_INDEX_OPERATIONS_ON_SELECTED_INDICES,
USER_ALLOWED_TO_CREATE_INDEX
)
.nodeSettings(
Map.of(FeatureFlags.TELEMETRY_SETTING.getKey(), true, TelemetrySettings.TRACER_FEATURE_ENABLED_SETTING.getKey(), true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should these tests use tracing vs not use tracing?

As I see it the tracing platform feature certainly could have tests that are in the security plugin, but I would expect all tests to behave the same if the tracing plugin is operational vs noop. If that was the case, then I'd expect just a couple of tests that verify specific security plugin + tracing behavior works as opposed to flagging different behavior for whole test suites.

If we believe that we need whole suites of duplicate testing to ensure confidence I might not understand how tracing is interacting with the security plugin and would love more detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peternied, As mentioned by @reta these settings are not going to work and tracing would be noop in this scenario. In case of NoopTracer we tried to implement in a way that its shouldn't alter the execution pattern by unnecessarily introducing the tracer related core changes (Tracing being an experimental feature eg. is TraceableHttpChannel here. If tracer is noop we are not wrapping the HttpChannel hence its working with FeatureFlag disabled).

Now to test the telemetry plugin and security plugin we need to install them together and execute the tests.

To validate the telemetry working we need to have some search/indexing requests and validate if traces are being generated and which is not very straight forward.

Copy link
Member

@peternied peternied Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To validate the telemetry working we need to have some search/indexing requests and validate if traces are being generated and which is not very straight forward.

While it might not be, I think that is where we will need to land before - think of it as a requirement to release the tracing feature. Could you create a new issue for end to end validation and include what sanity tests might be needed alongside tests that are security specific, then link it to where all the tracing deliverables are being tracked?

I don't want to bog down this fix if its more appropriate to step back and confirm the approach to then execute on it. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we testing security with other plugins today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, Even if we write the integration test, it's still going to fail bcauase the issue is partially fixed and the check inside NettyAttribute.java is still there -

public static <T> Optional<T> popFrom(final RestRequest request, final AttributeKey<T> attribute) {
        if (request.getHttpChannel() instanceof Netty4HttpChannel) {
            Channel nettyChannel = ((Netty4HttpChannel) request.getHttpChannel()).getNettyChannel();
            return Optional.ofNullable(nettyChannel.attr(attribute).getAndSet(null));
        }
        return Optional.empty();
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, correct, this part is on me, thanks for reminding @Gaganjuneja

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gaganjuneja You can see an example of how anomaly-detection tests with security here: https://github.com/opensearch-project/anomaly-detection/blob/main/.github/workflows/test_security.yml

They run their full integ test suite in a cluster w/ the security plugin installed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, Even if we write the integration test, it's still going to fail bcauase the issue is partially fixed and the check inside NettyAttribute.java is still there -

public static <T> Optional<T> popFrom(final RestRequest request, final AttributeKey<T> attribute) {
        if (request.getHttpChannel() instanceof Netty4HttpChannel) {
            Channel nettyChannel = ((Netty4HttpChannel) request.getHttpChannel()).getNettyChannel();
            return Optional.ofNullable(nettyChannel.attr(attribute).getAndSet(null));
        }
        return Optional.empty();
    }

Ah, correct, this part is on me, thanks for reminding @Gaganjuneja

@reta, What is your opinion, this should be addressed here on in another PR.

@DarshitChanpura, Once the above issue is addressed, I will write the integration test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta, What is your opinion, this should be addressed here on in another PR.

Up to you @Gaganjuneja

Signed-off-by: Peter Nied <petern@amazon.com>
@DarshitChanpura
Copy link
Member

@Gaganjuneja would you need any helping getting this across?

@DarshitChanpura
Copy link
Member

@Gaganjuneja are you still working on this?

@peternied
Copy link
Member

Doesn't look like this is being worked on anymore, going to close this out. Please feel free to reopen if you re-engage on this @Gaganjuneja

@peternied peternied closed this Dec 8, 2023
willyborankin pushed a commit that referenced this pull request Jan 5, 2024
…PIs instead of typecasting (#3917)

### Description
This is cherry-pick from
#3514 to use the
channel properties instead of type-casting

### Issues Resolved
Closes  #3911

Is this a backport? If so, please add backport PR # and/or commits #

### Testing
The change is covered by existing test suites

### Check List
- [X] New functionality includes testing
- [X] New functionality has been documented
- [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
reta added a commit to reta/security that referenced this pull request Jan 5, 2024
…PIs instead of typecasting (opensearch-project#3917)

This is cherry-pick from
opensearch-project#3514 to use the
channel properties instead of type-casting

Closes  opensearch-project#3911

Is this a backport? If so, please add backport PR # and/or commits #

The change is covered by existing test suites

- [X] New functionality includes testing
- [X] New functionality has been documented
- [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit b038f93)
dlin2028 pushed a commit to dlin2028/security that referenced this pull request May 1, 2024
…PIs instead of typecasting (opensearch-project#3917)

### Description
This is cherry-pick from
opensearch-project#3514 to use the
channel properties instead of type-casting

### Issues Resolved
Closes  opensearch-project#3911

Is this a backport? If so, please add backport PR # and/or commits #

### Testing
The change is covered by existing test suites

### Check List
- [X] New functionality includes testing
- [X] New functionality has been documented
- [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants