
[Transform] 8.2 transform can not read from < 8.2.0 CCS source #86716

Closed
pheyos opened this issue May 12, 2022 · 4 comments · Fixed by #86741
Labels
>bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Security Meta label for security team v8.2.0


@pheyos
Member

pheyos commented May 12, 2022

Affected version: 8.2.0
Fixed with: 8.2.1

After upgrading the ML QA long-running cluster from 8.1.3 to 8.2.0, transforms whose CCS source cluster runs a version older than 8.2.0 (8.1.3 and 8.1.1 in our case) fail with:

Failed to retrieve checkpointing info for transform [ccs_monitoring_cluster_data_stream_with_ilm_on] org.elasticsearch.xpack.transform.checkpoint.CheckpointException: Failed to create checkpoint
[...]
Caused by: org.elasticsearch.ElasticsearchSecurityException: failed to verify signed authentication information

The remote cluster log shows related entries:

caught exception while trying to read authentication from request [transport request action [indices:monitor/stats]] java.io.EOFException: null
[...]
Authentication using apikey failed - unable to find apikey with id <REDACTED>

Running an _update request on the transforms to refresh headers._xpack_security_authentication did not fix it.

Mitigation

  • upgrade the CCS remote to >= 8.2.0

or

  • upgrade the cluster to >= 8.2.1
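To check whether a given pair of clusters falls into the affected combination described above, the version rule can be written as a small predicate. This is a minimal sketch under stated assumptions: AffectedCheck, compare and isAffected are hypothetical names, versions are plain major.minor.patch strings, and only the combination reported in this issue (local 8.2.0, CCS remote older than 8.2.0) is encoded.

```java
// Hypothetical helper: encodes the version combination reported in this issue.
// Assumes versions are plain "major.minor.patch" strings (no qualifiers like -SNAPSHOT).
public final class AffectedCheck {

    // Compare two dotted versions numerically, component by component.
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < 3; i++) {
            int diff = Integer.compare(Integer.parseInt(pa[i]), Integer.parseInt(pb[i]));
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // Affected: the local cluster runs exactly 8.2.0 and the CCS remote is older than 8.2.0.
    // Either mitigation (remote >= 8.2.0, or local >= 8.2.1) makes this return false.
    static boolean isAffected(String localVersion, String remoteVersion) {
        return compare(localVersion, "8.2.0") == 0 && compare(remoteVersion, "8.2.0") < 0;
    }

    public static void main(String[] args) {
        System.out.println(isAffected("8.2.0", "8.1.3")); // true
        System.out.println(isAffected("8.2.1", "8.1.3")); // false
        System.out.println(isAffected("8.2.0", "8.2.0")); // false
    }
}
```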
@pheyos pheyos added >bug :ml Machine learning :ml/Transform Transform labels May 12, 2022
@hendrikmuhs
Contributor

hendrikmuhs commented May 12, 2022

Thanks @pheyos

I am able to reproduce this issue. It looks like a regression introduced by #84473. The 8.1 remote doesn't understand the request and fails.

As CCS only supports remotes up to one minor version back, the problem will disappear once the local cluster runs 8.3 or higher, because the remote must then run at least 8.2.
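The reasoning above can be sketched as a simplified compatibility rule. This is a hypothetical illustration, not Elasticsearch's actual check: it assumes the "one minor back" rule exactly as stated in this comment, compares only major and minor versions, and ignores cross-major cases.

```java
// Hypothetical model of the "one minor back" CCS rule as stated in this comment.
// Simplification: only same-major remotes are considered; cross-major pairs are ignored.
public final class CcsMinorRule {

    record Version(int major, int minor) {}

    // A remote is supported if it is on the same major and at most one minor behind.
    static boolean isSupportedRemote(Version local, Version remote) {
        return remote.major() == local.major() && remote.minor() >= local.minor() - 1;
    }

    // Lowest remote minor the local cluster can talk to under this rule.
    static int minimumRemoteMinor(Version local) {
        return local.minor() - 1;
    }

    public static void main(String[] args) {
        Version local83 = new Version(8, 3);
        System.out.println(isSupportedRemote(local83, new Version(8, 1))); // false
        System.out.println(isSupportedRemote(local83, new Version(8, 2))); // true
        System.out.println(minimumRemoteMinor(local83)); // 2
    }
}
```

Under this rule an 8.3 cluster can only reach remotes on 8.2 or later, which already understand the 8.2 Authentication format, so the bug cannot be triggered from 8.3 onwards.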

For 8.2 I suggest a workaround as part of 8.2.1 where we go into compatibility mode if we catch the exception above. Transform will automatically retry in this case. As there are no further releases planned for 8.1 this is the best option IMO.

Unfortunately this workaround does not work.

@ywangd Please have a look. Can you think of any other workaround?

Lesson learned: we have CCS compatibility system tests, but they only test clusters running the same version. I will try to extend them to also test against a remote running the previous minor version. -> #86727

@ywangd ywangd self-assigned this May 12, 2022
@ywangd
Member

ywangd commented May 12, 2022

This turns out to be a long-standing bug that has only now surfaced. It is a rare code path, and the Authentication object had no version-dependent wire format differences until 8.2.

The following branch of SecurityServerTransportInterceptor does not take the connection version into consideration. As a result, a v8.2 Authentication object is sent to a v8.1.3 node, which does not know how to decode it.

} else if (AuthorizationUtils.shouldSetUserBasedOnActionOrigin(threadPool.getThreadContext())) {
    AuthorizationUtils.switchUserBasedOnActionOriginAndExecute(
        threadPool.getThreadContext(),
        securityContext,
        (original) -> sendWithUser(
            connection,
            action,
            request,
            options,
            new ContextRestoreResponseHandler<>(threadPool.getThreadContext().wrapRestorable(original), handler),
            sender
        )
    );
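To illustrate what taking the connection version into consideration means, here is a toy model in which the header is serialized for the version of the receiving node rather than the local one. Everything here is hypothetical: the two-field format, the 8.2 cut-over, and the names WireCompatSketch, writeFor, sendIgnoringConnectionVersion and sendWithConnectionVersion are assumptions for illustration and do not reflect the real Authentication wire format. In particular, the toy does not reproduce the EOFException seen on the remote.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy model only: NOT the real Elasticsearch Authentication serialization.
public final class WireCompatSketch {

    record Version(int major, int minor) implements Comparable<Version> {
        @Override
        public int compareTo(Version o) {
            return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
        }
    }

    // Hypothetical: pretend the header gained an extra field in 8.2.
    static final Version NEW_FORMAT_SINCE = new Version(8, 2);

    record Authentication(String user, String extraField) {

        // Write only what a node on `target` can decode.
        byte[] writeFor(Version target) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(user);
                if (target.compareTo(NEW_FORMAT_SINCE) >= 0) {
                    out.writeUTF(extraField); // only receivers on >= 8.2 get the new field
                }
            } catch (IOException e) {
                // Not expected with an in-memory stream; rethrow unchecked.
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }
    }

    // Buggy path (toy): serializes for the local node's version, ignoring the connection.
    static byte[] sendIgnoringConnectionVersion(Authentication auth, Version local) {
        return auth.writeFor(local);
    }

    // Fixed path (toy): serializes for the version of the node on the other end.
    static byte[] sendWithConnectionVersion(Authentication auth, Version connection) {
        return auth.writeFor(connection);
    }

    public static void main(String[] args) {
        Authentication auth = new Authentication("transform_user", "new-in-8.2");
        Version local = new Version(8, 2);
        Version remote = new Version(8, 1);
        int buggy = sendIgnoringConnectionVersion(auth, local).length;
        int fixed = sendWithConnectionVersion(auth, remote).length;
        // The buggy path ships the 8.2-only bytes to the 8.1 remote.
        System.out.println(buggy > fixed); // true
    }
}
```

The general pattern is that any code path writing the header must gate the format on the peer's version; in this thread the fix (#86741) applies that inside SecurityServerTransportInterceptor for the action-origin branch.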

I adjusted the labels and self-assigned.

@ywangd ywangd added the :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) label May 12, 2022
@elasticmachine elasticmachine added the Team:Security Meta label for security team label May 12, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@ywangd ywangd added v8.2.0 and removed :ml Machine learning :ml/Transform Transform labels May 12, 2022
ywangd added a commit to ywangd/elasticsearch that referenced this issue May 12, 2022
The SecurityServerTransportInterceptor class is responsible for writing
the authentication header in a wire-compatible format before the request
leaves the local node. However, a bug made it ignore the wire version
when setting the user based on the action origin. This PR fixes it and
adds relevant tests.

It is an old bug but it never manifested itself previously because (1)
the code path is rare enough and (2) authentication had no version
differences until 8.2.

Resolves: elastic#86716
ywangd added a commit that referenced this issue May 17, 2022
Resolves: #86716
ywangd added a commit to ywangd/elasticsearch that referenced this issue May 17, 2022
…6741)

Resolves: elastic#86716
elasticsearchmachine pushed a commit that referenced this issue May 17, 2022
…86828)

* Ensure authentication is wire compatible when setting user (#86741)

Resolves: #86716

* fix compilation
hendrikmuhs pushed a commit that referenced this issue May 18, 2022
…86727 (#86747)

add testing against the previous minor version

backport #86727
relates #86741
relates #86716
5 participants
@pheyos @ywangd @hendrikmuhs @elasticmachine and others