CTR stalls when thread core pool size is less than nested splits #4279

Closed
hpoettker opened this issue Dec 6, 2020 · 1 comment

@hpoettker
Contributor

Description:
When a composed task contains nested splits, the split thread core pool size must be at least as large as the maximum number of nested splits. This is mentioned in spring-attic/spring-cloud-task-app-starters-composed-task-runner#110, but the constraint is currently not checked properly.

Release versions:
The problem can be reproduced with the master branch. I first stumbled upon it with release 2.5.2.

Steps to reproduce:
Probably the easiest way to reproduce the issue is to enable and tweak the following test in ComposedRunnerVisitorTests:

@Test 
public void nestedSplitThreadPoolSize() {
  Throwable exception = assertThrows(BeanCreationException.class, () ->
      setupContextForGraph("<<AAA || BBB > && CCC || <DDD || EEE> && FFF>", "--splitThreadCorePoolSize=2"));
  assertThat(exception.getCause().getCause().getMessage()).isEqualTo("Split thread core pool size 1 should be equal or greater than the " +
      "depth of split flows 3. Try setting the composed task property " +
      "`splitThreadCorePoolSize`");
}

When the test is simply enabled, it passes. However, if splitThreadCorePoolSize is set to 2, the expected exception is not thrown and the test stalls in a deadlock. The full log is:

  _____                                         _   _______        _
 / ____|                                       | | |__   __|      | |
| |     ___  _ __ ___  _ __   ___  ___  ___  __| |    | | __ _ ___| | __
| |    / _ \| '_ ` _ \| '_ \ / _ \/ __|/ _ \/ _` |    | |/ _` / __| |/ /
| |___| (_) | | | | | | |_) | (_) \__ \  __/ (_| |    | | (_| \__ \   <
 \_____\___/|_| |_| |_| .__/ \___/|___/\___|\__,_|    |_|\__,_|___/_|\_\
                      | |
                      |_|
 _____
|  __ \
| |__) |   _ _ __  _ __   ___ _ __
|  _  / | | | '_ \| '_ \ / _ \ '__|
| | \ \ |_| | | | | | | |  __/ |
|_|  \_\__,_|_| |_|_| |_|\___|_|

2020-12-06 01:07:15.604  INFO 24370 --- [           main] c.c.c.ConfigServicePropertySourceLocator : Fetching config from server at : http://localhost:8888
2020-12-06 01:07:16.024  INFO 24370 --- [           main] c.c.c.ConfigServicePropertySourceLocator : Connect Timeout Exception on Url - http://localhost:8888. Will be trying the next url if available
2020-12-06 01:07:16.025  WARN 24370 --- [           main] c.c.c.ConfigServicePropertySourceLocator : Could not locate PropertySource: I/O error on GET request for "http://localhost:8888/application/default": Connection refused (Connection refused); nested exception is java.net.ConnectException: Connection refused (Connection refused)
2020-12-06 01:07:16.041  INFO 24370 --- [           main] o.e.j.i.junit.runner.RemoteTestRunner    : No active profile set, falling back to default profiles: default
2020-12-06 01:07:17.836  INFO 24370 --- [           main] o.s.j.d.e.EmbeddedDatabaseFactory        : Starting embedded database: url='jdbc:h2:mem:e51aff93-03c2-44a3-8f13-8bb21e0a41df;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=false', username='sa'
2020-12-06 01:07:18.868  INFO 24370 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'taskExecutor'
2020-12-06 01:07:19.269  INFO 24370 --- [           main] o.s.b.c.r.s.JobRepositoryFactoryBean     : No database type set, using meta data indicating: H2
2020-12-06 01:07:19.307  INFO 24370 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : No TaskExecutor has been set, defaulting to synchronous executor.
2020-12-06 01:07:19.545  INFO 24370 --- [           main] o.e.j.i.junit.runner.RemoteTestRunner    : Started RemoteTestRunner in 8.229 seconds (JVM running for 11.266)
2020-12-06 01:07:19.871  INFO 24370 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=job]] launched with the following parameters: [{run.id=1}]
2020-12-06 01:07:19.968  WARN 24370 --- [           main] o.s.c.t.b.l.TaskBatchExecutionListener   : This job was executed outside the scope of a task but still used the task listener.

The issue can also be reproduced with any composed task that has the same structure as the one in the test.
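
For readers unfamiliar with this kind of stall, the behaviour is consistent with ordinary thread pool starvation: a task holds a pool thread while it waits for a child task that was submitted to the same pool. The following is a minimal, generic sketch of that effect and is not the composed task runner's actual code. The class `PoolStarvationDemo`, its methods, and the timeout values are hypothetical names chosen for illustration, and modelling a nested split as a chain of blocking tasks is an assumption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class: illustrates pool starvation, not the real runner.
class PoolStarvationDemo {

    // Each level submits one child to the SAME pool and blocks until it ends,
    // so a chain of N levels keeps N pool threads occupied at once.
    static void runLevel(ExecutorService pool, int remainingDepth) throws Exception {
        if (remainingDepth == 0) {
            return; // innermost level: nothing left to wait for
        }
        Future<?> child = pool.submit(() -> {
            runLevel(pool, remainingDepth - 1);
            return null;
        });
        child.get(); // holds the current thread while the child runs
    }

    // Returns true if a chain of `depth` nested tasks finishes within the
    // timeout, false if it is still stuck when the timeout expires.
    static boolean completes(int poolSize, int depth, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<?> root = pool.submit(() -> {
                runLevel(pool, depth - 1);
                return null;
            });
            root.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // every thread is waiting on a task that is only queued
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupt blocked workers so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(completes(3, 3, 2000)); // enough threads for depth 3
        System.out.println(completes(2, 3, 500));  // one thread short
    }
}
```

In this sketch, a pool of three threads finishes a chain of depth three, while a pool of two leaves the innermost task in the queue forever. That is why a check that rejects an undersized pool up front is preferable to letting the job start.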

@cppwfs
Contributor

cppwfs commented Dec 8, 2020

Thank you @hpoettker for raising this issue and resolving it!
