Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/30904 cluster formation part2 #32877

Merged
merged 12 commits into from
Aug 30, 2018

Conversation

alpar-t
Copy link
Contributor

@alpar-t alpar-t commented Aug 15, 2018

This is a continuation of #32028 which I accidentally merged, then reverted right away due to lack of caffeine.

- This allows to move all  all .java files from .groovy.
- Will prevent eclipse from tangling up in this setup
- make it possible to use Version from Java
```
> Task :plugins:ingest-user-agent:listElasticSearchClusters
Starting cluster: myTestCluster
   * myTestCluster: /home/alpar/work/elastic/elasticsearch/plugins/ingest-user-agent/foo
Asked to unClaimAndStop myTestCluster, since cluster still has 1 claim it will not be stopped

> Task :plugins:ingest-user-agent:testme UP-TO-DATE
Stopping myTestCluster, since no of claims is 0
```

- Meant to auto manage the clusters lifecycle
- Add integration test for cluster formation
@alpar-t alpar-t requested a review from rjernst August 15, 2018 16:19
@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 15, 2018

@rjernst I changed the DSL but kept the task extension. The static method to get it is now moved to it's proper class. There's a lot of boilerplate in java to access it in all, but it I think the task extension mechanism is a natural way to achieve this in Gradle rather than store this in a separate map. The extensions points we use work with tasks, so we just get the extensions from the task to know about all the clusters it claims.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes thus far! One more minor naming change, that I believe we discussed, is to name the plugin and dsl such as “clusters” or “testClusters” (eg TestClustersPlugin, elasticsearch.test-clusters, etc). Cluster formation is sort of a legacy name that I don’t think we should continue with. Additionally, I don’t see why we need an interface to abstract cluster configuration vs node configuration? If we need it in the future, that is fine, but let’s wait until it is needed rather than abstracting prematurely.

My larger concern is still the extension. I think it makes this more complicated (ie harder to follow what the code is doing) than it needs to be. I believe all of the logic can be within the single plugin class, which makes it much easier to see the entire picture. Let me try to describe the solution I’ve been advocating for in more concrete terms.

To start, we need a way to track the number of current uses of each cluster. This can be a simple map by cluster name:

Map<String, Integer> refCounts = new HashMap<>();

Next we need to keep track of which clusters are associated with tasks, and which tasks are associated with clusters. We can do this as local variables within the plugin’s apply method:

Map<String, List<Task>> clusterToTasks = new HashMap<>();
Map<Task, List<ElasticsearchCluster>> taskToCluster = new HashMap<>();

Then the useCluster method on tasks can be a simple extension property that is a closure, just like you have, but the doCall method would be:

clusterToTasks.computeIfAbsent(conf.getName(), k -> new ArrayList()).add(task);
taskToCluster.computIfAbsent(task, k -> new ArrayList()).add(conf);

Now, instead of needing to iterate over all tasks in the task graph, we only need to iterate the uses map, to setup the ref counts:

project.getGradle().getTaskGraph().whenReady(taskExecutionGraph ->
    clusterToTasks.forEach((name, tasks) -> {
        refCounts.put(name, tasks.stream().filter(t -> taskExecutionGraph.hasTask(t)).count());
        for (Task task : tasks) {
            // use a doFirst so that if the task is skipped, it will not trigger the cluster to start
            task.doFirst(t -> taskToCluster.get(task).forEach(cluster -> cluster.start()));
        }
    });
    taskExecutionGraph.afterTask(task -> {
        ElasticsearchCluster cluster = taskToCluster.get(task);
        int uses = refCounts.get(cluster.getName())
        refCounts.put(cluster.getName(), uses - 1);
        if (uses <= 1) {
            // last use, so we can stop the cluster
            cluster.stop();
        }
    });
);

Having all of these hooks registered next to each other makes the logic much easier to follow.

@colings86 colings86 added the :Delivery/Build Build or test infrastructure label Aug 16, 2018
@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 16, 2018

@rjernst I added a task to #30904 to make sure I don't forget about it. I will do it in the next PR if that's fine with you. I have a large part of the implementation, I would need to do the refactoring there and backport.

@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 16, 2018

I removed the Task extensions, but kept the listener.
I prefer it it to doFirst because we can make sure we can start the cluster before the first task action, but it also seems more straight forward than hooking on to the task graph.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I don't see the advantage, though, to using a listener vs attaching to the task graph. They should be functionally equivalent? And if a task is disabled, we should not be starting the cluster for it; this is why I like using doFirst. Additionally, I think the cluster configuration should not contain mutable state, which is why the design I proposed had all tracking logic about use inside the plugin itself.

project.getGradle().getTaskGraph().whenReady(taskExecutionGraph ->
taskExecutionGraph.getAllTasks()
.forEach(task ->
taskToCluster.getOrDefault(task, new ArrayList<>()).forEach(ElasticsearchConfiguration::claim)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would cause taskToCluster to fill itself with an empty list for every other task in the execution graph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will create an empty list, but not add it to the map should have used emptyList() i'm fixing that.


// create the listener to start the clusters on-demand and terminate when no longer claimed.
project.getGradle().addListener(
new ClusterFormationTaskExecutionListener(Collections.unmodifiableMap(taskToCluster))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suffers from the same concern I had in my previous review: by using a separate class, it increases the complexity of what must be understood by someone reading this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that the listener class is more complex than hooking on the task graph and doFirst, I would even argue that the listener interface is simpler because it follows a well understood design pattern. To me this fits better than a plugin implemented in java.

It does have all the advantages, if a task is skipped, no actions will be run for that task, so beforeActions is never called and thus the cluster is not started.

The real tie breaker for me is that it also allows for tasks to have their own doFirst that interacts with the cluster, e.x. a testing task that does some setup before running the test, like indexing ( I think docs tests do this ). Keep in mind that there won't be a separate start cluster task. If we start the cluster in doFrist there will be no way for build scripts to add any actions that run after the cluster is started but before the regular task actions.

W.r.t immutability, the listener gets an immutable map and the configurations themselves lock-down once the cluster is running. This will guarantee that we don't change configuration in the execution phase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real tie breaker for me is that it also allows for tasks to have their own doFirst

My proposed solution still allows this. The doFirst is not added until the task graph is resolved, after configuration is complete. So the doFirst added is actually the very first doFirst, before any configured doFirst blocks.

the configurations themselves lock-down once the cluster is running

But they don't actually, because the "claimed" tracking is done there, which is always mutable.

I would even argue that the listener interface is simpler because it follows a well understood design pattern

All of these are hooks that allow executing a block of code before/after a task is run. They have different nuances, but they are all the same design pattern. Additionally, you could still use the listener, but I am suggesting it can be created as either an inner class, or an anonymous class, inside the plugin, so that the logic and state are all together.

package org.elasticsearch.gradle.clusterformation;

@FunctionalInterface
public interface UseClusterAction {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have been created but not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 17, 2018

@rjernst thanks for the quick thoughts!
I haven't addressed the abstraction introduced by the config interface from your previous comment, I'we been going back and forth on that one, I'll remove it in a subsequent PR to avoid conflicts with the second part of the implementation.

@rjernst
Copy link
Member

rjernst commented Aug 17, 2018

I'll remove it in a subsequent PR to avoid conflicts with the second part of the implementation.

Ok thanks. I'm fine with that being collapsed in a followup.

@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 27, 2018

@rjernst does this look good now ?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atorok I don't think all my concerns were addressed? This still has the configuration object as mutable with state inside it.

@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 28, 2018

I think I misunderstood that part. Do you mean NamedDomainObjectContainer<? extends ElasticsearchConfiguration> container ? This does hold the state of the cluster, and allows mutation up until the cluster is started, I'm not sure how we would avoid this, as we'll want users to customize this at configuration time. taskToCluster is also mutable so we can add the mappings to it. We could probably do a better job at making sure that we don't add mappings after the task graph is ready. Is this what you meant ?

@rjernst
Copy link
Member

rjernst commented Aug 29, 2018

I mean the use of numberOfClaims inside the configuration object instead of in the plugin. Additionally, I'm not sure the started flag is necessary. All of this can be tracked inside the plugin, and the elasticsearch configuration can be immutable once the gradle configuration phase is complete.

@alpar-t
Copy link
Contributor Author

alpar-t commented Aug 29, 2018

@rjernst I do see your point now. I propose to discuss this in the next PR which adds the implementation to start the cluster. I'm waiting to merge this before I open that PR to avoid this one getting too big. The implementation does not separate the configuration objects from the logic of what it takes to configure and start and adds additional logic to deal with failing tasks, adding more context to this discussion.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's discuss in the next part.

@alpar-t alpar-t merged commit f097446 into elastic:master Aug 30, 2018
@alpar-t alpar-t deleted the fix/30904-cluster-formation-part2 branch August 30, 2018 07:17
alpar-t added a commit that referenced this pull request Aug 30, 2018
Gradle integration for the Cluster formation plugin with ref counting
@alpar-t alpar-t restored the fix/30904-cluster-formation-part2 branch August 30, 2018 07:33
@alpar-t alpar-t deleted the fix/30904-cluster-formation-part2 branch August 30, 2018 07:36
@alpar-t alpar-t restored the fix/30904-cluster-formation-part2 branch August 30, 2018 07:59
dnhatn added a commit that referenced this pull request Sep 1, 2018
* 6.x:
  Mute test watcher usage stats output
  [Rollup] Fix FullClusterRestart test
  TEST: Disable soft-deletes in ParentChildTestCase
  TEST: Disable randomized soft-deletes settings
  Integrates soft-deletes into Elasticsearch (#33222)
  drop `index.shard.check_on_startup: fix` (#32279)
  Fix AwaitsFix issue number
  Mute SmokeTestWatcherWithSecurityIT testsi
  [DOCS] Moves ml folder from x-pack/docs to docs (#33248)
  TEST: mute more SmokeTestWatcherWithSecurityIT tests
  [DOCS] Move rollup APIs to docs (#31450)
  [DOCS] Rename X-Pack Commands section (#33005)
  Fixes SecurityIntegTestCase so it always adds at least one alias (#33296)
  TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307)
  MINOR: Remove Dead Code from PathTrie (#33280) (#33306)
  Fix pom for build-tools (#33300)
  Lazy evaluate java9home (#33301)
  SQL: test coverage for JdbcResultSet (#32813)
  Work around to be able to generate eclipse projects (#33295)
  Different handling for security specific errors in the CLI. Fix for #33230 (#33255)
  [ML] Refactor delimited file structure detection (#33233)
  SQL: Support multi-index format as table identifier (#33278)
  Enable forbiddenapis server java9 (#33245)
  [MUTE] SmokeTestWatcherWithSecurityIT flaky tests
  Add region ISO code to GeoIP Ingest plugin (#31669) (#33276)
  Don't be strict for 6.x
  Update serialization versions for custom IndexMetaData backport
  Replace IndexMetaData.Custom with Map-based custom metadata (#32749)
  Painless: Fix Bindings Bug (#33274)
  SQL: prevent duplicate generation for repeated aggs (#33252)
  TEST: Mute testMonitorClusterHealth
  Fix serialization of empty field capabilities response (#33263)
  Fix nested _source retrieval with includes/excludes (#33180)
  [DOCS] TLS file resources are reloadable (#33258)
  Watcher: Ensure TriggerEngine start replaces existing watches (#33157)
  Ignore module-info in jar hell checks (#33011)
  Fix docs build after #33241
  [DOC] Repository GCS ADC not supported (#33238)
  Upgrade to latest Gradle 4.10  (#32801)
  Fix/30904 cluster formation part2 (#32877)
  Move file-based discovery to core (#33241)
  HLRC: add client side RefreshPolicy (#33209)
  [Kerberos] Add unsupported languages for tests (#33253)
  Watcher: Reload properly on remote shard change (#33167)
  Fix classpath security checks for external tests. (#33066)
  [Rollup] Only allow aggregating on multiples of configured interval (#32052)
  Added deprecation warning for rescore in scroll queries (#33070)
  Apply settings filter to get cluster settings API (#33247)
  [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability   (#32743)
  HLRC: create base timed request class (#33216)
  HLRC: Use Optional in validation logic (#33104)
  Painless: Add Bindings (#33042)
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Build Build or test infrastructure >non-issue Team:Delivery Meta label for Delivery team v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants