[Transform] improve placement of transform task targeting a remote index #50033

Closed
hendrikmuhs opened this issue Dec 10, 2019 · 8 comments · Fixed by #52712

Comments

@hendrikmuhs
Contributor

hendrikmuhs commented Dec 10, 2019

Transform supports CCS indexes starting with 7.6.

The problem

A transform using a remote source requires cluster.remote.connect:true (the default) to execute search queries. If cluster.remote.connect:false is set on a node, the transform will fail.

Transform validation disallows preview/create if cluster.remote.connect is set to false on the executing node, assuming the setting is the same on all nodes.

However, cluster.remote.connect is a node setting. In the worst case, a user has a mixed environment with remote connections enabled on some nodes and disabled on others. In this case, persistent task placement cannot detect this and cannot choose a data node that has CCS support, and validation may or may not fail.

The problem applies not only to transform but also to ML, see #46025.
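
The mixed-environment case described above can be sketched as follows. This is only an illustration, not Elasticsearch code: `Node`, `validate_on`, and `would_fail_on` are hypothetical names, and `remote_connect` merely stands in for the per-node cluster.remote.connect setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    remote_connect: bool  # stands in for the node-level cluster.remote.connect setting


def validate_on(executing_node: Node, uses_remote_source: bool) -> bool:
    """Validation sees only the executing node and assumes all nodes match it."""
    return executing_node.remote_connect or not uses_remote_source


def would_fail_on(node: Node, uses_remote_source: bool) -> bool:
    """A transform with a remote source fails on a node that cannot connect."""
    return uses_remote_source and not node.remote_connect


# Mixed environment: remote connections enabled on one data node only.
data_nodes = [Node("data-1", True), Node("data-2", False)]

# Preview/create handled by data-1 passes validation ...
validated = validate_on(data_nodes[0], uses_remote_source=True)

# ... yet placement that ignores the setting may still pick data-2.
bad_placements = [n.name for n in data_nodes if would_fail_on(n, True)]
```

Whether validation passes depends on which node happens to handle the request, which is why the check is unreliable in a mixed setup.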

The workaround: Known Limitation

This issue is out of scope for #43201. In order to use transform with CCS, you must have remote support enabled on all data nodes and the master node.

If remote is disabled on the executing node, the transform will fail at start with a descriptive error.

A possible Solution: node attributes

Whether cluster.remote.connect is true or false cannot be retrieved from the cluster state. A nodeinfo call would be possible, but it calls all nodes in the cluster and is therefore not considered an option.

Similar to the ML plugin, we could use node attributes which are available in cluster state. A special attribute could be set if cluster.remote.connect:false at the start of the node, e.g. as part of x-pack core (all features that require remote aware placement are part of x-pack, ignoring that CCS is a core feature (OSS)).
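
A rough sketch of the node-attribute idea, assuming the cluster state can be modelled as a mapping from node name to its attributes. The attribute name `remote_connect_disabled` and both helper functions are hypothetical and chosen only for illustration; they are not existing Elasticsearch identifiers.

```python
from typing import Dict, List

# Simplified stand-in for the node attributes visible in cluster state.
ClusterState = Dict[str, Dict[str, str]]

# Hypothetical attribute name, set only when remote connections are disabled.
REMOTE_DISABLED_ATTR = "remote_connect_disabled"


def node_attributes_at_startup(remote_connect: bool) -> Dict[str, str]:
    """Attributes a node would publish when it starts."""
    return {} if remote_connect else {REMOTE_DISABLED_ATTR: "true"}


def eligible_nodes(cluster_state: ClusterState, needs_remote: bool) -> List[str]:
    """Pick nodes from cluster state alone, without calling every node."""
    return sorted(
        name
        for name, attrs in cluster_state.items()
        if not needs_remote or attrs.get(REMOTE_DISABLED_ATTR) != "true"
    )


state: ClusterState = {
    "data-1": node_attributes_at_startup(remote_connect=True),
    "data-2": node_attributes_at_startup(remote_connect=False),
}
remote_capable = eligible_nodes(state, needs_remote=True)
```

Because the attribute travels with the cluster state, both placement and validation could consult it on any node.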

Other solutions

@elastic/es-search any ideas? comments?

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@jimczi
Contributor

jimczi commented Dec 16, 2019

> Similar to the ML plugin, we could use node attributes which are available in cluster state. A special attribute could be set if cluster.remote.connect:false at the start of the node, e.g. as part of x-pack core (all features that require remote aware placement are part of x-pack, ignoring that CCS is a core feature (OSS)).

From my understanding we require the remote cluster to be accessible from any ML node. That's explicit because all _search should run within the ML node. We could apply the same logic to transform (and maybe rollup) and introduce a new attribute like node.ml ? Today any node is eligible for a transform task so it would be helpful imo to leave more room for users to configure where they want to run these background tasks rather than picking nodes based on their connectivity with a remote cluster.

@hendrikmuhs
Contributor Author

> > Similar to the ML plugin, we could use node attributes which are available in cluster state. A special attribute could be set if cluster.remote.connect:false at the start of the node, e.g. as part of x-pack core (all features that require remote aware placement are part of x-pack, ignoring that CCS is a core feature (OSS)).

> From my understanding we require the remote cluster to be accessible from any ML node. That's explicit because all _search should run within the ML node. We could apply the same logic to transform (and maybe rollup) and introduce a new attribute like node.ml ?

The problem described in #46025 is, as far as I understand, about having more than one ML node while some are not allowed to connect to a remote cluster. This is certainly an exotic case and probably even a misconfiguration; however, that is what we allow today, and we saw such a setup at least once in the wild.

Transform can run into the same problem. Because it potentially runs on more nodes, the problem is more likely to happen (but still exotic). The second concern is usability: we cannot reliably check this when a user previews, updates, or creates a transform.

> Today any node is eligible for a transform task so it would be helpful imo to leave more room for users to configure where they want to run these background tasks rather than picking nodes based on their connectivity with a remote cluster.

Transform runs only on data nodes. A dedicated node type for background tasks sounds like a good idea, but it does not solve the problem, because you would still be able to configure some of those nodes with remote access and some without. Again, usability is involved: it is not possible to check whether all "background task nodes" have remote access.

TL;DR

Making the ml/transform/background node setting mutually exclusive with disabling remote connection (meaning the node refuses to start if you enable ml but disable remote) sounds wrong to me. I guess nobody wants that; I am just mentioning the possibility.

Maybe the discussion should be whether cluster.remote.connect should be a node setting at all. Why would a user disable remote connection on single data nodes? I think they would rather disable remote completely or allow it on certain node types (probably not easy to implement and requiring a lot of work).

@jimczi
Contributor

jimczi commented Dec 17, 2019

> Transform runs only on data nodes. A dedicated node type for background tasks sounds like a good idea, but is not solving the problem, because you would still be able to configure some of those with remote access and some not. Again usability is involved: it's not possible to check whether all "background task nodes" have remote access or not.

I think it would be reasonable to require users to configure nodes eligible for transform with remote access. Today it is required on all data nodes, since that's where Transform can run, but it would be more flexible to tag the nodes eligible to run these tasks, like we do for ml. I don't really like the automatic assignment based on the node setting: this opens the door to more specialization and makes it difficult to achieve a good distribution even when you have a lot of nodes.

> Maybe the discussion should be whether cluster.remote.connect should be a node setting. Why would a user disable remote connection on single data nodes. I think he would rather like to disable remote completely or allow it on certain node types (probably not easily to implement and requires a lot of work).

I think it's ok to allow specialization here, but the requirement should be propagated to all features that use remote search. Today clients target specific machines to coordinate the search; they sometimes exclude data nodes and master-eligible nodes. We should have the same for transform: it runs anywhere by default, but you can restrict it to specific nodes if you need to.
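
The opt-out model suggested here might look roughly like the sketch below. It assumes a per-node flag that defaults to "eligible" plus a separate remote-access check; `NodeInfo`, `transform_enabled`, `NoEligibleNodeError`, and `assign_transform` are hypothetical names, and picking the least-loaded node is just one possible placement policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeInfo:
    name: str
    transform_enabled: bool = True  # hypothetical opt-out flag; eligible by default
    remote_connect: bool = True     # stands in for cluster.remote.connect
    running_tasks: int = 0


class NoEligibleNodeError(Exception):
    """Raised when no node can run the transform; the message says why."""


def assign_transform(nodes: List[NodeInfo], needs_remote: bool) -> str:
    # Step 1: respect the user's restriction on where transforms may run.
    candidates = [n for n in nodes if n.transform_enabled]
    if not candidates:
        raise NoEligibleNodeError("no node is configured to run transforms")

    # Step 2: a remote source additionally requires remote connectivity.
    if needs_remote:
        candidates = [n for n in candidates if n.remote_connect]
        if not candidates:
            raise NoEligibleNodeError(
                "transform reads from a remote index, but no transform "
                "node allows remote connections"
            )

    # Step 3: assumed policy, pick the least-loaded node (name breaks ties).
    return min(candidates, key=lambda n: (n.running_tasks, n.name)).name


nodes = [
    NodeInfo("a", running_tasks=3),
    NodeInfo("b", remote_connect=False),
    NodeInfo("c", transform_enabled=False),
]
chosen_remote = assign_transform(nodes, needs_remote=True)
chosen_local = assign_transform(nodes, needs_remote=False)
```

Keeping the two checks separate lets the error message state which requirement could not be met.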

@nestermn

@hendrikmuhs Hi Hendrik, are there any plans to allow CCS indices in transforms in the near future? We are looking for this kind of functionality to support KPI processing for Cloud ECE and ESS environments.

cc: @AlexP-Elastic @suyograo

@benwtrent
Member

@nestermn test out the latest 7.x. @hendrikmuhs recently made changes that should be in 7.6 and that allow transforms to support CCS (reading from remote clusters).

@hendrikmuhs
Contributor Author

@nestermn as @benwtrent already said: 7.6; the tracking issue is #43201

@nestermn

@benwtrent @hendrikmuhs Great to hear. Looking forward to trying it.

hendrikmuhs pushed a commit that referenced this issue Mar 2, 2020
…orm (#52712)

implement transform node attributes to disable transform on certain nodes and test which nodes are allowed to do remote connections

closes #52200
closes #50033
closes #48734
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Mar 2, 2020
…odes and test which nodes are allowed to do remote connections

closes elastic#52200
closes elastic#50033
closes elastic#48734
hendrikmuhs pushed a commit that referenced this issue Mar 2, 2020
implement transform node attributes to disable transform on certain nodes and
test which nodes are allowed to do remote connections

closes #52200
closes #50033
closes #48734

backport #52712