Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent handling of dependent lifecycle beans in DefaultLifecycleProcessor [SPR-16901] #21440

Closed
spring-issuemaster opened this issue Jun 4, 2018 · 9 comments
Assignees
Milestone

Comments

@spring-issuemaster
Copy link
Collaborator

@spring-issuemaster spring-issuemaster commented Jun 4, 2018

Markus Heiden opened SPR-16901 and commented

DefaultLifecycleProcessor.LifecycleGroup#stop() fails to wait for all members, if there are dependencies between the members.

IMO the problem lies in this loop of LifecycleGroup#stop():

for (LifecycleGroupMember member : this.members) {
  if (this.lifecycleBeans.containsKey(member.name)) {
    doStop(this.lifecycleBeans, member.name, latch, countDownBeanNames);
  }
  else if (member.bean instanceof SmartLifecycle) {
    // already removed, must have been a dependent
    latch.countDown();
  }
}

The loop iterates all members. doStop() removes the current member and its dependent members from this.lifecycleBeans and passes a callback to all of them which counts the latch down. This way the latch gets counted down for the current member and its dependent members.

Later on when the iteration reaches one of the dependent members from above, the else block gets executed, because the dependent member has already been removed from this.lifecycleBeans. This counts the latch down again for the dependent member.

This way the latch reaches 0 before all beans called their stop callback and the application context gets destroyed even though some of the members are still running.

I don't understand why the else block is needed at all. IMO it should be simply removed.


Affects: 5.0.6

Issue Links:

  • #11020 ApplicationContext very slow to shutdown with bean of type org.springframework.scheduling.quartz.SchedulerFactoryBean
  • INT-4486 The SmartLifecycle.stop(Runnable callback) implementation must always call callback, in the end

Referenced from: commits 6f41d4e, b3ca2d5, b71795b, 6cf1978

0 votes, 5 watchers

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 7, 2018

Artem Bilan commented

Hi Juergen Hoeller!

Not sure how, but looks like this has introduced some regression.
In Spring Integration in some test (if not all) we are obesrving a delay (30 secs) between some phases during stopping application context:

2018-06-07 15:32:27,546 DEBUG [main] [org.springframework.beans.factory.support.DefaultListableBeanFactory] - Returning cached instance of singleton bean 'lifecycleProcessor'
2018-06-07 15:32:27,548 INFO [main] [org.springframework.context.support.DefaultLifecycleProcessor] - Stopping beans in phase 2147483647
2018-06-07 15:32:27,548 INFO [main] [org.springframework.context.support.DefaultLifecycleProcessor] - Stopping beans in phase 1073741823
2018-06-07 15:32:57,550 INFO [main] [org.springframework.context.support.DefaultLifecycleProcessor] - Stopping beans in phase 0
2018-06-07 15:32:57,550 INFO [main] [org.springframework.context.support.DefaultLifecycleProcessor] - Stopping beans in phase -2147483648
2018-06-07 15:32:57,550 DEBUG [main] [org.springframework.context.support.DefaultLifecycleProcessor] - Asking bean '_org.springframework.integration.errorLogger' of type [class org.springframework.integration.config.ConsumerEndpointFactoryBean] to stop

Any ideas what may we miss otherwise?

Thanks

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 8, 2018

Artem Bilan commented

Juergen,

Let me know that this pattern is wrong:

public final void stop(Runnable callback) {
     this.lifecycleLock.lock();
     try {
          if (this.running) {
               doStop(callback);
               this.running = false;
               if (logger.isInfoEnabled()) {
                        logger.info("stopped " + this);
               }
          }
     }
     finally {
          this.lifecycleLock.unlock();
     }
}

and we have to call callback anyway independently of the component state.
Then it is definitely bug in Spring Integration :(
Thanks

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 8, 2018

Artem Bilan commented

Yeah... that doesn't help even after fix, but this ias what I get from the debug:


this = {DefaultLifecycleProcessor$LifecycleGroup@3810}
latch = {CountDownLatch@3811} "java.util.concurrent.CountDownLatch@3bdb2c78[Count = 4]"
this.members = {ArrayList@3812} size = 4
0 = {DefaultLifecycleProcessor$LifecycleGroupMember@3815}
name = "outboundWithImplicitChannel.adapter"
bean = {ConsumerEndpointFactoryBean@3746}
this$0 = {DefaultLifecycleProcessor@3661}
1 = {DefaultLifecycleProcessor$LifecycleGroupMember@3816}
name = "methodInvokingConsumer.adapter"
bean = {ConsumerEndpointFactoryBean@3748}
this$0 = {DefaultLifecycleProcessor@3661}
2 = {DefaultLifecycleProcessor$LifecycleGroupMember@3817}
name = "expressionConsumer.adapter"
bean = {ConsumerEndpointFactoryBean@3750}
this$0 = {DefaultLifecycleProcessor@3661}
3 = {DefaultLifecycleProcessor$LifecycleGroupMember@3818}
name = "_org.springframework.integration.errorLogger"
bean = {ConsumerEndpointFactoryBean@3752}
this$0 = {DefaultLifecycleProcessor@3661}
member.bean = Cannot find local variable 'member'
this.lifecycleBeans = {LinkedHashMap@3770} size = 2
0 = {LinkedHashMap$Entry@3788} "expressionConsumer.adapter" ->
key = "expressionConsumer.adapter"
value = {ConsumerEndpointFactoryBean@3750}
1 = {LinkedHashMap$Entry@3800} "_org.springframework.integration.errorLogger" ->
key = "_org.springframework.integration.errorLogger"
value = {ConsumerEndpointFactoryBean@3752}
this.smartMemberCount = 4
member.name = Cannot find local variable 'member'

Pay attention to a difference in numbers. In previous version we had this piece of code though:

else if (member.bean instanceof SmartLifecycle) {
	// already removed, must have been a dependent
	latch.countDown();
}

Maybe we are still something missing in Spring Integration?

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 8, 2018

Artem Bilan commented

Not sure if that is going to be helpful, but this is what I see so far.

We have this code:

private Object resolveReference(Object argName, RuntimeBeanReference ref) {
...
			else {
				bean = this.beanFactory.getBean(refName);
				this.beanFactory.registerDependentBean(refName, this.beanName);
			}
...
}

And this happens exactly when we have the logic in the AbstractBeanDefinitionParser like:

builder.addPropertyReference("handler", handlerBeanComponentDefinition.getBeanName());

In our case that handlerBeanComponentDefinition is for the public class MethodInvokingMessageHandler extends AbstractMessageHandler implements Lifecycle. Pay attention how it implements Lifecycle.
In this case the DefaultLifecycleProcessor calculates the phase like:

protected int getPhase(Lifecycle bean) {
	return (bean instanceof Phased ? ((Phased) bean).getPhase() : 0);
}

So, when time to stop application comes to phase == 0, it stops that our MethodInvokingMessageHandler. However at the same time it does that for all the dependent beans. And that is wrong a bit, just because when time comes to stop those dependent beans in a bit of different phase - -2147483648, there is nothing to stop already and CountDownLatch is not counted down: the Lifecycle bean = lifecycleBeans.remove(beanName); returns null because this bean has bean removed in phase 0.

So, independently of inconsistency that we don't clean up other groups after stopping dependent bean, I think the main issue that we would expected our SmartLifecycles to be stopped in their required phase, not in that 0 for the plain Lifecycle. To be honest I always thought that the relationship is opposite: we start Lifecycle from the particular SmartLifecycle and only this SmartLifecycle stops that Lifecycle in dependencies.

Does it make sense?

Thank you!

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 9, 2018

Juergen Hoeller commented

Artem Bilan, that helps a lot indeed... I didn't consider dependencies between several shutdown phases, and we don't have tests for that either. I'll see what I can do here... for 5.1 in particular, being defensive about backports.

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 10, 2018

Artem Bilan commented

Glad to be helpful, Juergen!

 

May we have a previous behavior back meanwhile?

This is kinda blocker for us and we can't commit anything into master because we have very long response time from test cases.

 

Thanks!

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jun 11, 2018

Juergen Hoeller commented

Restored to its original behavior (with slightly different code) in the meantime. I'll revisit this over the next few weeks.

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Aug 9, 2018

Juergen Hoeller commented

I've revised that stop loop to check against the initially available lifecycle beans, only triggering that fallback count-down if the affected bean wasn't even available at the beginning of the phase. This should cover both scenarios that we're encountering here; Artem Bilan, let me know whether this works for you. It'll be available in the upcoming 5.1.0.BUILD-SNAPSHOT for next week's RC2; I also intend to backport it for September's 5.0.9.

@spring-issuemaster
Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Aug 10, 2018

Artem Bilan commented

Have just tested against the latest BUILD-SNAPSHOT - works well (y)

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.