Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JBTM-3212 Respect LRA deadlines during recovery. #1530

Merged
merged 2 commits into from
Dec 2, 2019

Conversation

mmusgrov
Copy link
Contributor

@mmusgrov mmusgrov commented Nov 26, 2019

https://issues.jboss.org/browse/JBTM-3212
https://issues.jboss.org/browse/JBTM-3222

!QA_JTA !QA_JTS_JDKORB !QA_JTS_OPENJDKORB !QA_JTS_JACORB !BLACKTIE !XTS !PERF NO_WIN !AS_TESTS !TOMCAT !JACOCO !mysql !postgres !db2 !oracle !RTS !MAIN !LRA

This PR adds recovery testing for timed out LRAs

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@mmusgrov mmusgrov force-pushed the JBTM-3112 branch 2 times, most recently from a066a56 to c09e28f Compare November 27, 2019 07:57
@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

@jbosstm-bot
Copy link

Copy link
Collaborator

@xstefank xstefank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, but shouldn't we focus all LRA work on the LRA branch? In this case this a pretty big PR and potentially a lot of conflicts with the work done in the LRA branch.

rts/lra/lra-coordinator-jar/pom.xml Show resolved Hide resolved
rts/lra/lra-coordinator-jar/pom.xml Show resolved Hide resolved
<dependency>
<groupId>org.jboss.narayana.rts</groupId>
<artifactId>narayana-lra</artifactId>
<version>5.10.1.Final-SNAPSHOT</version>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${project.version}?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

long ttl = ChronoUnit.MILLIS.between(LocalDateTime.now(ZoneOffset.UTC), finishTime);

if (ttl <= 0) {
System.out.printf("TIMER has expired since last reload%n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be better to log these statements instead of system.out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this should be using the logger framework

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (and to the other occurrences of printf in this PR)

/**
* Test that an LRA which times out while there is no running coordinator is cancelled
* when a coordinator is restarted
* @throws URISyntaxException if the LRA or revovery URIs are invalid (should never happen)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @throws URISyntaxException if the LRA or revovery URIs are invalid (should never happen)
* @throws URISyntaxException if the LRA or recovery URIs are invalid (should never happen)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

// start an LRA with a long timeout to validate that timed LRAs do not finish early during recovery
URI longLRA = lraClient.startLRA(null, "Long Timeout Recovery Test", LONG_TIMEOUT, ChronoUnit.MILLIS);
// start an LRA with a short timeout to validate that timed LRAs that time out when the coordinator is unavailable are cancelled
URI shortLRA = lraClient.startLRA(null, "Long Timeout Recovery Test", SHORT_TIMEOUT, ChronoUnit.MILLIS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

different clientId can help with debugging in future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


void stopContainer() {
if (containerController.isStarted(COORDINATOR_CONTAINER)) {
LRALogger.logger.info("STOPPING CONTAINER%n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

infof?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to mention that printf are probably OK in the tests. But I think it is a bit confusing that there is a mix of printf and the logger. I think that using the LRA internal logger could be confusing in the output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I removed all system printf's and fixed this format string on the logger

}

int recover() throws URISyntaxException {
try (Response response = ClientBuilder.newClient().target(new URI(recoveryUrl))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the created client also closed in try-with-resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not even sure if Client is AutoCloseable. If it was I would guess it could need something like the following (assuming it would compile) to work:
try (Client client = ClientBuilder.newClient(); Response resource = client.target(...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I still close it usually when I type rest client invocations. But you are right it's not AutoClosable. Not sure if this it is necessary to close it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I was not suggesting not to close it - just trying to develop the thread. If it's not autocloseable then I think we just need the try-with-resources wrapped with a try/finally block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call that closes the underlying connections/resources is either when the response is closed or when the response entity is fully read (which is what the above code does in its try with resources code block).

The spec has this to say about the JAX-RS client API close method:

Calling this method effectively invalidates all resource targets produced by the client instance. Invoking any method on such targets once the client is closed would result in an IllegalStateException being thrown.

So since I am not using any of those targets after the call finishes (because the client is implicitly created) I don't need to call close on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this statement "Client instances must be properly closed before being disposed to avoid leaking resources." from https://docs.oracle.com/javaee/7/api/javax/ws/rs/client/Client.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe later on in the project we can do some resource utilisation testing and performance optimisations but let's do it as part of a more comprehensive analysis of the non functional behaviour of the system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it is just a test but fair enough. If that is the only pending issue then I will push a fix for this comment?

@tomjenkinson
Copy link
Contributor

@xstefank I thought the strategy was that the LRA branch was just for the features that rely on LRA GA or RC2? We can still make changes on the main branch when the change would not rely on the temporary artifacts.

That said I think we could request an RC2 if there is divergence going on as we don't want to spend a lot of time reconciling the changes.

@@ -275,6 +275,7 @@ public Response startLRA(
}

String coordinatorUrl = String.format("%s%s", context.getBaseUri(), COORDINATOR_PATH_NAME);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -10,7 +10,7 @@
// mark the war as a JAX-RS archive
@ApplicationPath("/")
@OpenAPIDefinition(
info = @Info(title = "LRA Coordinator", version = "1.0"),
info = @Info(title = "LRA Coordinator", version = "1.0-RC1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like something we will have to watch out for. I wonder if how we can make this more prominent that it needs to be updated with LRA coordinator versions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate this concern is not introduced by this PR but is something to follow up on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a final constant (in the same file) to give some degree of prominence to it. Now that you have mentioned it I thing I will remember to update it as we produce new LRA releases.

@@ -126,6 +110,5 @@
<lra.tck.suite.thorntail.trace.params>-Dthorntail.logging=TRACE</lra.tck.suite.thorntail.trace.params>
</properties>
</profile>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this one and adding it back in again will trigger a full rebuild.
But I would also say that since this is pre release code removing the empty line does not obfuscate the intent of the PR and does not impact anyone wanting to diff the code.

@tomjenkinson
Copy link
Contributor

@mmusgrov I have added some comments

@xstefank
Copy link
Collaborator

@xstefank I thought the strategy was that the LRA branch was just for the features that rely on LRA GA or RC2? We can still make changes on the main branch when the change would not rely on the temporary artifacts.

That said I think we could request an RC2 if there is divergence going on as we don't want to spend a lot of time reconciling the changes.

Yes, but if we are changing LRA subsystem in any way there will be conflicts. Changes are made for different reasons but over the same codebase.

@tomjenkinson
Copy link
Contributor

@xstefank I thought the strategy was that the LRA branch was just for the features that rely on LRA GA or RC2? We can still make changes on the main branch when the change would not rely on the temporary artifacts.
That said I think we could request an RC2 if there is divergence going on as we don't want to spend a lot of time reconciling the changes.

Yes, but if we are changing LRA subsystem in any way there will be conflicts. Changes are made for different reasons but over the same codebase.

As this is a bug we should try to get it in the master branch where the feature is already present. But I do agree that we should expand the use of the LRA branch to any new features to save on this reconciliation.

@mmusgrov
Copy link
Contributor Author

Just a note, but shouldn't we focus all LRA work on the LRA branch? In this case this a pretty big PR and potentially a lot of conflicts with the work done in the LRA branch.

No I don't think so. The LRA branch was created in order to avoid dependencies on snapshot builds and for tracking changes to the MP-LRA spec and TCK. My change affects the LRA implementation only which is more than just an MP-LRA implementation. Also since this change works fine with the spec version RC1 it is fine to apply the fix to the master branch (in fact that is the best place for it since users will use our master branch (not the LRA branch which is a moving target) so they will get the fix.

@mmusgrov
Copy link
Contributor Author

@tomjenkinson and @xstefank can you both take another look please.

@jbosstm-bot
Copy link

<artifactId>jconsole</artifactId>
</exclusion>
</exclusions>
<exclusion>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is pre release code this change does not obfuscate the intent of the PR and does not impact anyone wanting to diff the code. After release one it would make sense to minimise the occasion reformatting of blocks.

}

int recover() throws URISyntaxException {
try (Response response = ClientBuilder.newClient().target(new URI(recoveryUrl))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this statement "Client instances must be properly closed before being disposed to avoid leaking resources." from https://docs.oracle.com/javaee/7/api/javax/ws/rs/client/Client.html

rts/lra/lra-coordinator-jar/pom.xml Show resolved Hide resolved
rts/lra/lra-coordinator-jar/pom.xml Show resolved Hide resolved
@mmusgrov
Copy link
Contributor Author

Just a note, but shouldn't we focus all LRA work on the LRA branch? In this case this a pretty big PR and potentially a lot of conflicts with the work done in the LRA branch.

No I don't think so. The LRA branch was created in order to avoid dependencies on snapshot builds and for tracking changes to the MP-LRA spec and TCK. My change affects the LRA implementation only which is more than just an MP-LRA implementation. Also since this change works fine with the spec version RC1 it is fine to apply the fix to the master branch (in fact that is the best place for it since users will use our master branch (not the LRA branch which is a moving target) so they will get the fix.

I just checked and the LRA branch only has two small commits on it which I doubt will cause any conflicts - I don't mind doing the merge when we have an RC2. But I still think we are at least a couple of weeks away from RC2 so if we can get more (all even) of the MP-LRA issues resolved then we could do an RC2.

@jbosstm-bot
Copy link

Copy link
Collaborator

@xstefank xstefank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a few JAX-RS Client not closed in LRACoordinatorTestCase but if that's a design decision than ignore this message.

rts/lra/lra-coordinator-jar/pom.xml Show resolved Hide resolved
rts/lra/lra-coordinator-jar/pom.xml Show resolved Hide resolved
LocalDateTime ft = LocalDateTime.now().plusNanos(timeLimit * 1000000);
if (finishTime != null) {
// check whether the new time limit is less than the current one
LocalDateTime ft = LocalDateTime.now().plusNanos(timeLimit * 1000000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I missed this previously. This is missing ZoneOffset.UTC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


@After
public void after() {
LRALogger.logger.debugf("Finised test %s", testName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LRALogger.logger.debugf("Finised test %s", testName);
LRALogger.logger.debugf("Finished test %s", testName);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


Assert.assertEquals("Unexpected status from reovery call:", 200, response.getStatus());

// the result wil be a List<LRAStatusHolder> of recovering LRAs but we just need the count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the result wil be a List<LRAStatusHolder> of recovering LRAs but we just need the count
// the result will be a List<LRAStatusHolder> of recovering LRAs but we just need the count

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -1,5 +1,7 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?><project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

<modelVersion>4.0.0</modelVersion>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@mmusgrov
Copy link
Contributor Author

mmusgrov commented Dec 2, 2019

There are still a few JAX-RS Client not closed in LRACoordinatorTestCase but if that's a design decision than ignore this message.

It was deliberate (though not a "design decision"). This is a surefire test case and the JVM is torn down after each test so even if there were consequences to not closing the client it would have no effect on the test. Test cases are different from production code and the goal is coverage and simplicity (I do agree that closing responses is important since network resources live in the kernel).

But that said, I have made the changes anyway.

Copy link
Contributor

@tomjenkinson tomjenkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some formatting changes but as LRA is still in active development it does not block the merge and the specific formatting changes do not inhibit debugging as they are in poms.

@mmusgrov mmusgrov merged commit cc46def into jbosstm:master Dec 2, 2019
@mmusgrov mmusgrov deleted the JBTM-3112 branch December 2, 2019 12:35
@mmusgrov mmusgrov changed the title JBTM-3112 Respect LRA deadlines during recovery. JBTM-3212 Respect LRA deadlines during recovery. Dec 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants