Fix/30904 cluster formation part2 #32877

alpar-t · 2018-08-15T16:19:26Z

This is a continuation of #32028 which I accidentally merged, then reverted right away due to lack of caffeine.

- This allows to move all all .java files from .groovy. - Will prevent eclipse from tangling up in this setup - make it possible to use Version from Java

…ormation-part1

``` > Task :plugins:ingest-user-agent:listElasticSearchClusters Starting cluster: myTestCluster * myTestCluster: /home/alpar/work/elastic/elasticsearch/plugins/ingest-user-agent/foo Asked to unClaimAndStop myTestCluster, since cluster still has 1 claim it will not be stopped > Task :plugins:ingest-user-agent:testme UP-TO-DATE Stopping myTestCluster, since no of claims is 0 ``` - Meant to auto manage the clusters lifecycle - Add integration test for cluster formation

…ormation-part2

alpar-t · 2018-08-15T16:22:11Z

@rjernst I changed the DSL but kept the task extension. The static method to get it is now moved to it's proper class. There's a lot of boilerplate in java to access it in all, but it I think the task extension mechanism is a natural way to achieve this in Gradle rather than store this in a separate map. The extensions points we use work with tasks, so we just get the extensions from the task to know about all the clusters it claims.

rjernst

Thanks for the changes thus far! One more minor naming change, that I believe we discussed, is to name the plugin and dsl such as “clusters” or “testClusters” (eg TestClustersPlugin, elasticsearch.test-clusters, etc). Cluster formation is sort of a legacy name that I don’t think we should continue with. Additionally, I don’t see why we need an interface to abstract cluster configuration vs node configuration? If we need it in the future, that is fine, but let’s wait until it is needed rather than abstracting prematurely.

My larger concern is still the extension. I think it makes this more complicated (ie harder to follow what the code is doing) than it needs to be. I believe all of the logic can be within the single plugin class, which makes it much easier to see the entire picture. Let me try to describe the solution I’ve been advocating for in more concrete terms.

To start, we need a way to track the number of current uses of each cluster. This can be a simple map by cluster name:

Map<String, Integer> refCounts = new HashMap<>();

Next we need to keep track of which clusters are associated with tasks, and which tasks are associated with clusters. We can do this as local variables within the plugin’s apply method:

Map<String, List<Task>> clusterToTasks = new HashMap<>();
Map<Task, List<ElasticsearchCluster>> taskToCluster = new HashMap<>();

Then the useCluster method on tasks can be a simple extension property that is a closure, just like you have, but the doCall method would be:

clusterToTasks.computeIfAbsent(conf.getName(), k -> new ArrayList()).add(task);
taskToCluster.computIfAbsent(task, k -> new ArrayList()).add(conf);

Now, instead of needing to iterate over all tasks in the task graph, we only need to iterate the uses map, to setup the ref counts:

project.getGradle().getTaskGraph().whenReady(taskExecutionGraph ->
    clusterToTasks.forEach((name, tasks) -> {
        refCounts.put(name, tasks.stream().filter(t -> taskExecutionGraph.hasTask(t)).count());
        for (Task task : tasks) {
            // use a doFirst so that if the task is skipped, it will not trigger the cluster to start
            task.doFirst(t -> taskToCluster.get(task).forEach(cluster -> cluster.start()));
        }
    });
    taskExecutionGraph.afterTask(task -> {
        ElasticsearchCluster cluster = taskToCluster.get(task);
        int uses = refCounts.get(cluster.getName())
        refCounts.put(cluster.getName(), uses - 1);
        if (uses <= 1) {
            // last use, so we can stop the cluster
            cluster.stop();
        }
    });
);

Having all of these hooks registered next to each other makes the logic much easier to follow.

alpar-t · 2018-08-16T13:31:47Z

@rjernst I added a task to #30904 to make sure I don't forget about it. I will do it in the next PR if that's fine with you. I have a large part of the implementation, I would need to do the refactoring there and backport.

alpar-t · 2018-08-16T14:15:16Z

I removed the Task extensions, but kept the listener.
I prefer it it to doFirst because we can make sure we can start the cluster before the first task action, but it also seems more straight forward than hooking on to the task graph.

rjernst

Thanks for the update. I don't see the advantage, though, to using a listener vs attaching to the task graph. They should be functionally equivalent? And if a task is disabled, we should not be starting the cluster for it; this is why I like using doFirst. Additionally, I think the cluster configuration should not contain mutable state, which is why the design I proposed had all tracking logic about use inside the plugin itself.

rjernst · 2018-08-16T15:18:38Z

buildSrc/src/main/java/org/elasticsearch/gradle/clusterformation/ClusterformationPlugin.java

+        project.getGradle().getTaskGraph().whenReady(taskExecutionGraph ->
+            taskExecutionGraph.getAllTasks()
+                .forEach(task ->
+                    taskToCluster.getOrDefault(task, new ArrayList<>()).forEach(ElasticsearchConfiguration::claim)


This would cause taskToCluster to fill itself with an empty list for every other task in the execution graph.

It will create an empty list, but not add it to the map should have used emptyList() i'm fixing that.

rjernst · 2018-08-16T15:20:03Z

buildSrc/src/main/java/org/elasticsearch/gradle/clusterformation/ClusterformationPlugin.java

+
+        // create the listener to start the clusters on-demand and terminate when no longer claimed.
+        project.getGradle().addListener(
+            new ClusterFormationTaskExecutionListener(Collections.unmodifiableMap(taskToCluster))


This suffers from the same concern I had in my previous review: by using a separate class, it increases the complexity of what must be understood by someone reading this code.

I don't agree that the listener class is more complex than hooking on the task graph and doFirst, I would even argue that the listener interface is simpler because it follows a well understood design pattern. To me this fits better than a plugin implemented in java.

It does have all the advantages, if a task is skipped, no actions will be run for that task, so beforeActions is never called and thus the cluster is not started.

The real tie breaker for me is that it also allows for tasks to have their own doFirst that interacts with the cluster, e.x. a testing task that does some setup before running the test, like indexing ( I think docs tests do this ). Keep in mind that there won't be a separate start cluster task. If we start the cluster in doFrist there will be no way for build scripts to add any actions that run after the cluster is started but before the regular task actions.

W.r.t immutability, the listener gets an immutable map and the configurations themselves lock-down once the cluster is running. This will guarantee that we don't change configuration in the execution phase.

The real tie breaker for me is that it also allows for tasks to have their own doFirst

My proposed solution still allows this. The doFirst is not added until the task graph is resolved, after configuration is complete. So the doFirst added is actually the very first doFirst, before any configured doFirst blocks.

the configurations themselves lock-down once the cluster is running

But they don't actually, because the "claimed" tracking is done there, which is always mutable.

I would even argue that the listener interface is simpler because it follows a well understood design pattern

All of these are hooks that allow executing a block of code before/after a task is run. They have different nuances, but they are all the same design pattern. Additionally, you could still use the listener, but I am suggesting it can be created as either an inner class, or an anonymous class, inside the plugin, so that the logic and state are all together.

rjernst · 2018-08-16T15:20:58Z

buildSrc/src/main/java/org/elasticsearch/gradle/clusterformation/UseClusterAction.java

+package org.elasticsearch.gradle.clusterformation;
+
+@FunctionalInterface
+public interface UseClusterAction {


This seems to have been created but not used?

alpar-t · 2018-08-17T07:35:34Z

@rjernst thanks for the quick thoughts!
I haven't addressed the abstraction introduced by the config interface from your previous comment, I'we been going back and forth on that one, I'll remove it in a subsequent PR to avoid conflicts with the second part of the implementation.

rjernst · 2018-08-17T18:53:48Z

I'll remove it in a subsequent PR to avoid conflicts with the second part of the implementation.

Ok thanks. I'm fine with that being collapsed in a followup.

…ormation-part2

alpar-t · 2018-08-27T11:16:18Z

@rjernst does this look good now ?

rjernst

@atorok I don't think all my concerns were addressed? This still has the configuration object as mutable with state inside it.

alpar-t · 2018-08-28T05:47:09Z

I think I misunderstood that part. Do you mean NamedDomainObjectContainer<? extends ElasticsearchConfiguration> container ? This does hold the state of the cluster, and allows mutation up until the cluster is started, I'm not sure how we would avoid this, as we'll want users to customize this at configuration time. taskToCluster is also mutable so we can add the mappings to it. We could probably do a better job at making sure that we don't add mappings after the task graph is ready. Is this what you meant ?

rjernst · 2018-08-29T03:45:59Z

I mean the use of numberOfClaims inside the configuration object instead of in the plugin. Additionally, I'm not sure the started flag is necessary. All of this can be tracked inside the plugin, and the elasticsearch configuration can be immutable once the gradle configuration phase is complete.

alpar-t · 2018-08-29T12:26:05Z

@rjernst I do see your point now. I propose to discuss this in the next PR which adds the implementation to start the cluster. I'm waiting to merge this before I open that PR to avoid this one getting too big. The implementation does not separate the configuration objects from the logic of what it takes to configure and start and adds additional logic to deal with failing tasks, adding more context to this discussion.

rjernst

Alright, let's discuss in the next part.

Gradle integration for the Cluster formation plugin with ref counting

* 6.x: Mute test watcher usage stats output [Rollup] Fix FullClusterRestart test TEST: Disable soft-deletes in ParentChildTestCase TEST: Disable randomized soft-deletes settings Integrates soft-deletes into Elasticsearch (#33222) drop `index.shard.check_on_startup: fix` (#32279) Fix AwaitsFix issue number Mute SmokeTestWatcherWithSecurityIT testsi [DOCS] Moves ml folder from x-pack/docs to docs (#33248) TEST: mute more SmokeTestWatcherWithSecurityIT tests [DOCS] Move rollup APIs to docs (#31450) [DOCS] Rename X-Pack Commands section (#33005) Fixes SecurityIntegTestCase so it always adds at least one alias (#33296) TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307) MINOR: Remove Dead Code from PathTrie (#33280) (#33306) Fix pom for build-tools (#33300) Lazy evaluate java9home (#33301) SQL: test coverage for JdbcResultSet (#32813) Work around to be able to generate eclipse projects (#33295) Different handling for security specific errors in the CLI. Fix for #33230 (#33255) [ML] Refactor delimited file structure detection (#33233) SQL: Support multi-index format as table identifier (#33278) Enable forbiddenapis server java9 (#33245) [MUTE] SmokeTestWatcherWithSecurityIT flaky tests Add region ISO code to GeoIP Ingest plugin (#31669) (#33276) Don't be strict for 6.x Update serialization versions for custom IndexMetaData backport Replace IndexMetaData.Custom with Map-based custom metadata (#32749) Painless: Fix Bindings Bug (#33274) SQL: prevent duplicate generation for repeated aggs (#33252) TEST: Mute testMonitorClusterHealth Fix serialization of empty field capabilities response (#33263) Fix nested _source retrieval with includes/excludes (#33180) [DOCS] TLS file resources are reloadable (#33258) Watcher: Ensure TriggerEngine start replaces existing watches (#33157) Ignore module-info in jar hell checks (#33011) Fix docs build after #33241 [DOC] Repository GCS ADC not supported (#33238) Upgrade to latest Gradle 4.10 (#32801) Fix/30904 cluster formation part2 (#32877) Move file-based discovery to core (#33241) HLRC: add client side RefreshPolicy (#33209) [Kerberos] Add unsupported languages for tests (#33253) Watcher: Reload properly on remote shard change (#33167) Fix classpath security checks for external tests. (#33066) [Rollup] Only allow aggregating on multiples of configured interval (#32052) Added deprecation warning for rescore in scroll queries (#33070) Apply settings filter to get cluster settings API (#33247) [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability (#32743) HLRC: create base timed request class (#33216) HLRC: Use Optional in validation logic (#33104) Painless: Add Bindings (#33042)

alpar-t added 8 commits July 12, 2018 17:23

Implement Version in java

4b532c0

- This allows to move all all .java files from .groovy. - Will prevent eclipse from tangling up in this setup - make it possible to use Version from Java

Merge remote-tracking branch 'origin/master' into fix/30904-cluster-f…

506dfc8

…ormation-part1

PR review comments

952af44

Fix rebase

dfc38a1

Merge remote-tracking branch 'origin/master' into fix/30904-cluster-f…

b01ed0e

…ormation-part2

Merge branch 'master' into fix/30904-cluster-formation-part2

7ac5b7c

Change to useCluster method on task

f2fb450

alpar-t requested a review from rjernst August 15, 2018 16:19

rjernst reviewed Aug 15, 2018

View reviewed changes

colings86 added the :Delivery/Build Build or test infrastructure label Aug 16, 2018

alpar-t added >non-issue v7.0.0 v6.5.0 labels Aug 16, 2018

remove task extension

2d0e478

rjernst reviewed Aug 16, 2018

View reviewed changes

address PR

938ef53

alpar-t added 2 commits August 21, 2018 11:47

Listener in anonymous class

a611747

Merge remote-tracking branch 'origin/master' into fix/30904-cluster-f…

5483186

…ormation-part2

rjernst reviewed Aug 27, 2018

View reviewed changes

rjernst approved these changes Aug 29, 2018

View reviewed changes

alpar-t merged commit f097446 into elastic:master Aug 30, 2018

alpar-t deleted the fix/30904-cluster-formation-part2 branch August 30, 2018 07:17

alpar-t added a commit that referenced this pull request Aug 30, 2018

Fix/30904 cluster formation part2 (#32877)

a3a168b

Gradle integration for the Cluster formation plugin with ref counting

alpar-t restored the fix/30904-cluster-formation-part2 branch August 30, 2018 07:33

alpar-t deleted the fix/30904-cluster-formation-part2 branch August 30, 2018 07:36

alpar-t restored the fix/30904-cluster-formation-part2 branch August 30, 2018 07:59

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/30904 cluster formation part2 #32877

Fix/30904 cluster formation part2 #32877

alpar-t commented Aug 15, 2018

alpar-t commented Aug 15, 2018

rjernst left a comment

alpar-t commented Aug 16, 2018

alpar-t commented Aug 16, 2018

rjernst left a comment

rjernst Aug 16, 2018

alpar-t Aug 17, 2018

rjernst Aug 16, 2018

alpar-t Aug 17, 2018

rjernst Aug 17, 2018

rjernst Aug 16, 2018

alpar-t Aug 17, 2018

alpar-t commented Aug 17, 2018

rjernst commented Aug 17, 2018

alpar-t commented Aug 27, 2018

rjernst left a comment

alpar-t commented Aug 28, 2018

rjernst commented Aug 29, 2018

alpar-t commented Aug 29, 2018

rjernst left a comment

Fix/30904 cluster formation part2 #32877

Fix/30904 cluster formation part2 #32877

Conversation

alpar-t commented Aug 15, 2018

alpar-t commented Aug 15, 2018

rjernst left a comment

Choose a reason for hiding this comment

alpar-t commented Aug 16, 2018

alpar-t commented Aug 16, 2018

rjernst left a comment

Choose a reason for hiding this comment

rjernst Aug 16, 2018

Choose a reason for hiding this comment

alpar-t Aug 17, 2018

Choose a reason for hiding this comment

rjernst Aug 16, 2018

Choose a reason for hiding this comment

alpar-t Aug 17, 2018

Choose a reason for hiding this comment

rjernst Aug 17, 2018

Choose a reason for hiding this comment

rjernst Aug 16, 2018

Choose a reason for hiding this comment

alpar-t Aug 17, 2018

Choose a reason for hiding this comment

alpar-t commented Aug 17, 2018

rjernst commented Aug 17, 2018

alpar-t commented Aug 27, 2018

rjernst left a comment

Choose a reason for hiding this comment

alpar-t commented Aug 28, 2018

rjernst commented Aug 29, 2018

alpar-t commented Aug 29, 2018

rjernst left a comment

Choose a reason for hiding this comment