Add QA module to test task cancellation #59828

Closed
dnhatn wants to merge 13 commits from qa-task-cancellation

Conversation

dnhatn
Member

@dnhatn dnhatn commented Jul 18, 2020

This commit introduces a QA module that runs tests with two mixed clusters with a test plugin installed. The test plugin blocks some requests until they are canceled, so we can verify backward compatibility (BWC) of task cancellation across clusters.

Relates #55779 (review)

@dnhatn dnhatn added the >test, :Distributed/Task Management, v8.0.0, v6.8.12, v7.10.0, and v7.9.1 labels Jul 18, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Task Management)

@elasticmachine elasticmachine added the Team:Distributed label Jul 18, 2020
@dnhatn
Member Author

dnhatn commented Jul 18, 2020

@mark-vieira Can you please take a look at the gradle things? Thank you!

@dnhatn
Member Author

dnhatn commented Jul 20, 2020

run elasticsearch-ci/1

Contributor

@ywelsch ywelsch left a comment

I did not check the Gradle changes, but just looked at the other bits.

testImplementation project(':client:rest-high-level')
}

for (Version bwcVersion : BuildParams.bwcVersions.unreleasedWireCompatible) {
Contributor

Why are we only testing unreleased versions here? Don't we want to test compatibility against released versions as well?

Member Author

We are using a test plugin, which is not published and not available in the released versions.

Contributor

So we simply don't intend to do BWC testing against released versions? How do we intend to ensure we don't break compatibility there?

distribution/bwc/build.gradle (outdated, resolved)
qa/task-cancellation/build.gradle (resolved)
@dnhatn
Member Author

dnhatn commented Jul 21, 2020

@ywelsch @mark-vieira Thank you for your reviews. I've addressed your comments. Can you take another look?

Contributor

@ywelsch ywelsch left a comment

I've left one more question, o.w. looking good

@dnhatn dnhatn requested a review from ywelsch July 21, 2020 14:39
Contributor

@ywelsch ywelsch left a comment

LGTM

Contributor

@mark-vieira mark-vieira left a comment

Added a few more comments.

@@ -292,7 +292,12 @@ BuildParams.bwcVersions.forPreviousUnreleased { BwcVersions.UnreleasedVersionInf

createBuildBwcTask(projectName, "${baseDir}/${projectName}", projectArtifact)
}

// Create build tasks for a task-cancellation test plugin used for compatibility testing
if (bwcVersion.onOrAfter(Version.fromString("8.0.0"))) {
Contributor

This is temporary, yes? We shouldn't need to restrict versions here, as this should only exist in branches for which the current snapshot branches also have this project.

} else {
UnreleasedVersionInfo unreleasedVersion = BuildParams.bwcVersions.unreleasedInfo(bwcVersion)
bundleOldPluginTask = project(unreleasedVersion.gradleProjectPath).tasks.findByName('buildBwcTaskCancellationTestPlugin')
if (bundleOldPluginTask == null) {
Contributor

This is why we should do the version filtering here, and not in the bwc distribution project. That would allow us to ditch this check.

qa/task-cancellation/build.gradle (resolved)
nonInputProperties.systemProperty('tests.old_cluster', "${-> testClusters."old-${baseName}".allTransportPortURI.join(",")}")
nonInputProperties.systemProperty('tests.rest.old_cluster', "${-> testClusters."old-${baseName}".allHttpSocketURI.join(",")}")
nonInputProperties.systemProperty('tests.rest.cluster', "${-> testClusters."new-${baseName}".allHttpSocketURI.join(",")}")
dependsOn bundleOldPluginTask, tasks.bundlePlugin
Contributor

nit: can we make this the first line in the configuration closure? It just makes it easier to grok the task graph.

@mark-vieira
Contributor

Since there is some plugin stuff involved here I think it might be worth @rjernst taking a look here as well.

@rjernst
Member

rjernst commented Jul 21, 2020

This pattern is quite a bit different from anything we have done before. In particular, we have yet to have a "test only plugin" that manipulates the behavior of the cluster. The version logic doesn't really make sense here: it seems to operate only on in-flight versions, without actually testing all compatible versions, as Mark pointed out.

Taking a step back, the related PR/comment asked about BWC tests, but could we not test that logic in a unit test? I'm not saying we shouldn't necessarily have high-level tests like this, but I would rather start with unit tests if possible, and only expand the scope of our full integration tests if necessary. I also don't like manipulating the system like this, as I think those kinds of mocks/manipulations are better left for tightly controlled unit tests rather than a full-fledged running cluster, where failures may be very difficult to diagnose.

@dnhatn
Member Author

dnhatn commented Jul 29, 2020

@rjernst Thanks for taking a look.

I took a step back to explore unit tests, as you suggested. It's too complex to construct unit tests that cover what we want and give us the same confidence as these BWC tests.

This PR is quite similar to the idea proposed in #54159 (comment). However, we avoid publishing this plugin by making it test-only. If we published it, we could test against released versions.

I haven't found a better alternative to this PR. WDYT?

Member

@rjernst rjernst left a comment

Perhaps we can take a step back and evaluate why we need BWC tests for this specific module/feature. What about it has special bwc behavior? I view BWC tests as not testing all features, but testing things we know to have bwc behavior that we want to guarantee. Normally this means we have some conditionals based on Version in the code; am I missing that here somewhere?

@dnhatn
Member Author

dnhatn commented Aug 11, 2020

> Perhaps we can take a step back and evaluate why we need BWC tests for this specific module/feature. What about it has special bwc behavior?

Thanks @rjernst. This QA module helps us to make sure that we won't send cancellation requests to a remote cluster that is on an old version because it won't be able to handle those requests. We know how to dispatch those requests conditionally. However, we do not feel confident without having some tests with a random configuration. As I said, we can do manual testing for this, but random tests can uncover something that we miss in BWC + remote connections.
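
For readers following the thread, the conditional dispatch described in the comment above might look roughly like the sketch below. It is illustrative only: `CancellationGate`, `SimpleVersion`, `clustersToCancel`, the cluster aliases, and the `7.10.0` threshold are assumptions made for this example and are not taken from the Elasticsearch code base or from this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Elasticsearch code base.
final class CancellationGate {

    // Minimal stand-in for a node version, ordered by major, then minor, then patch.
    static final class SimpleVersion {
        final int major;
        final int minor;
        final int patch;

        SimpleVersion(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        // Parses strings of the form "major.minor.patch".
        static SimpleVersion parse(String text) {
            String[] parts = text.split("\\.");
            return new SimpleVersion(
                Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        }

        boolean onOrAfter(SimpleVersion other) {
            if (major != other.major) {
                return major > other.major;
            }
            if (minor != other.minor) {
                return minor > other.minor;
            }
            return patch >= other.patch;
        }
    }

    // Assumed placeholder threshold, chosen only for illustration.
    static final SimpleVersion ASSUMED_MIN_VERSION = SimpleVersion.parse("7.10.0");

    // True when a remote on this version is assumed to understand cancellation requests.
    static boolean shouldSendCancellation(SimpleVersion remoteVersion) {
        return remoteVersion.onOrAfter(ASSUMED_MIN_VERSION);
    }

    // Returns the remote cluster aliases that would receive a cancellation request.
    static List<String> clustersToCancel(Map<String, String> remoteVersions) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, String> entry : remoteVersions.entrySet()) {
            if (shouldSendCancellation(SimpleVersion.parse(entry.getValue()))) {
                targets.add(entry.getKey());
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, String> remotes = new LinkedHashMap<>();
        remotes.put("old-cluster", "7.9.1");
        remotes.put("new-cluster", "8.0.0");
        System.out.println(clustersToCancel(remotes)); // prints [new-cluster]
    }
}
```

The gate itself is a single comparison. The disagreement in this thread is about how much surrounding machinery is needed to trust that every code path actually goes through it.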

@mark-vieira
Contributor

> This QA module helps us to make sure that we won't send cancellation requests to a remote cluster that is on an old version because it won't be able to handle those requests.

This statement seems like it could be a single assertion in a unit test, to be honest, but I'm speaking from a naive perspective here. Do we really need a full-on BWC integration test, with all the associated overhead, to ensure the logic that handles cancellation works in this scenario?
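
As a rough illustration of the unit-test route raised here, the sketch below uses a recording test double and randomly chosen version ids to check that nothing is ever sent to an old remote. Everything in it is hypothetical: `CancelDispatchCheck`, `RecordingConnection`, the `"cancel"` action name, and the numeric threshold are assumptions for this example, not Elasticsearch APIs. It also makes no claim that such a test would give the same confidence as the proposed QA module, which is the point the thread did not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: the names and the version threshold are illustrative assumptions.
final class CancelDispatchCheck {

    // Assumed placeholder: smallest version id that understands the cancellation request.
    static final int ASSUMED_MIN_VERSION_ID = 71000;

    // Test double that records every action sent over a connection instead of using the network.
    static final class RecordingConnection {
        final int remoteVersionId;
        final List<String> sentActions = new ArrayList<>();

        RecordingConnection(int remoteVersionId) {
            this.remoteVersionId = remoteVersionId;
        }

        void send(String action) {
            sentActions.add(action);
        }
    }

    // Logic under test: dispatch the cancellation only when the remote side is new enough.
    static void cancelRemoteTask(RecordingConnection connection) {
        if (connection.remoteVersionId >= ASSUMED_MIN_VERSION_ID) {
            connection.send("cancel");
        }
    }

    // Tries random version ids and counts old remotes that received any action.
    static int countViolations(long seed, int iterations) {
        Random random = new Random(seed);
        int violations = 0;
        for (int i = 0; i < iterations; i++) {
            // Ids from 60000 to 89999 fall on both sides of the assumed threshold.
            int versionId = 60000 + random.nextInt(30000);
            RecordingConnection connection = new RecordingConnection(versionId);
            cancelRemoteTask(connection);
            boolean isOldRemote = versionId < ASSUMED_MIN_VERSION_ID;
            if (isOldRemote && !connection.sentActions.isEmpty()) {
                violations++;
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(countViolations(42L, 1000)); // prints 0
    }
}
```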

@dliappis dliappis added v6.8.13 and removed v6.8.12 labels Aug 12, 2020
@dnhatn
Member Author

dnhatn commented Aug 24, 2020

I am closing this PR as we could not reach a consensus on the approach. I will try to tackle the issue without this QA module. Thanks everyone for the reviews!

@dnhatn dnhatn closed this Aug 24, 2020
@dnhatn dnhatn deleted the qa-task-cancellation branch August 24, 2020 02:55
Labels
:Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. Team:Distributed Meta label for distributed team >test Issues or PRs that are addressing/adding tests

6 participants