This repository has been archived by the owner on Jul 9, 2022. It is now read-only.

v2.0.2 CTR process runs indefinitely #60

Closed
sabbyanandan opened this issue Dec 20, 2018 · 10 comments

@sabbyanandan

As a user, while orchestrating a deeply nested graph on CTR v2.0.2, I'm noticing that the CTR process continues to run even after all the steps have executed successfully with exit-code=0. This behavior is not observed with the v2.0.0 release.

See spring-cloud/spring-cloud-dataflow#2667 for more details.

@mminella
Contributor

I believe this is a symptom of #58. If @cppwfs agrees, we can close this one as a duplicate.

@sabbyanandan
Author

This issue is on the CTR v2.0 line (which builds on SCT v2.0), though, whereas #58 relates to SCT v2.1.

@cppwfs
Contributor

cppwfs commented Dec 20, 2018

These are separate issues.

@cppwfs
Contributor

cppwfs commented Jan 3, 2019

Hello @Rostish,
I'm still unable to reproduce the problem.
I ran the following graph (each task was a timestamp task) in two setups, both using MySQL as the database:

  1. CTR 2.0.2 on SCDF 1.7.2, 80 times
  2. CTR 2.0.2 on SCDF 2.0, 127 times

[Screenshot: composed task graph, 2019-01-03]

This graph was constructed after reviewing your log and deriving the basic flow of what you were trying to run.
The command I executed looked like this:
java -jar composedtaskrunner-task-2.0.2.RELEASE.jar --spring.cloud.task.closecontextEnabled=true --increment-instance-enabled=true --split-thread-core-pool-size=4 --interval-time-between-checks=1000 --graph="logrunme-1&&logrunme-2&&<logrunme-3||logrunme-4||logrunme-5||logrunme-6>&&<logrunme-7||logrunme-8||logrunme-9>&&logrunme-10&&<logrunme-11||logrunme-12>&&<logrunme-13||logrunme-14>&&logrunme-15&&<logrunme-16||logrunme-17||logrunme-18>&&logrunme-19&&<logrunme-20||logrunme-21||logrunme-22>"

Can you see a difference between my test case above and what you are executing?

@Rostish

Rostish commented Jan 4, 2019

@cppwfs Good day to you!

I pass the following arguments via the REST client launch command:
--dataflow-server-uri: http://10.101.48.150:9494 (could this be connected to the problem?)
--split-thread-core-pool-size: 5 (I see you use 4)
--increment-instance-enabled: true (the same)

I also pass the following arguments via the DSL (your example didn't use any arguments in the DSL):
--runner.localDate=2018-12-08
--spring.cloud.consul.config.datakey=calculate-vm-click-statistic
--runner.mode=EXEC

A short example:

calculate-vm-click-statistic: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=calculate-vm-click-statistic --runner.mode=EXEC
&& <average-genre-statistic-calculation-online-vm: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=average-genre-statistic-calculation-online-vm --runner.mode=EXEC
|| average-click-statistic-calculation-online-web: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=average-click-statistic-calculation-online-web --runner.mode=EXEC
|| average-click-statistic-calculation-online-vm: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=average-click-statistic-calculation-online-vm --runner.mode=EXEC
|| average-genre-statistic-calculation-online-web: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=average-genre-statistic-calculation-online-web --runner.mode=EXEC>
&& average-genre-statistic-calculation-off: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=average-genre-statistic-calculation-off --runner.mode=EXEC
&& fusion-v2: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=fusion-v2 --runner.mode=EXEC
&& aggregation-transformation: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=aggregation-transformation --runner.mode=EXEC
&& export-infosys: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=export-infosys --runner.mode=EXEC
&& combine-infosys: multirating-baseoperation --runner.localDate=2018-12-08 --spring.cloud.consul.config.datakey=combine-infosys --runner.mode=EXEC

The main difference is in the executed tasks: I use my own custom task for all executions. It has the following bootstrap.yaml (I use Consul):

runner:
  localDate: **pass this argument via dsl**
  mode: **pass this argument via dsl**
spring:
  application:
    name: multi-rating-operations
  cloud:
    consul:
      config:
        watch:
          enabled: false
        enabled: true
        prefix: ""
        datakey: **pass this argument via dsl**
        format: yaml
      host: 10.101.48.150
      port: 8500
      discovery:
        prefer-ip-address: true
        enabled: false
  jpa:
    properties:
      hibernate:
        jdbc:
          lob:
            non_contextual_creation: true
  datasource:
    url: jdbc:postgresql://192.168.21.70:5432/data_flow
    username: xxxxxxxx
    password: xxxxxxxx
    driver-class-name: org.postgresql.Driver
logging:
  level:
    org:
      springframework:
        cloud:
          task: debug
dataBusRest:
  dataSourceUrl: 10.101.48.150
  user: xxxxxxxx
  password: xxxxxxxx
  port: 10888

I could try to debug CTR myself. Could you share your methodology with me? Or do I just need to download the CTR sources and start it the way you do, using the java -jar command?

@cppwfs
Contributor

cppwfs commented Jan 4, 2019

Are you including --spring.cloud.task.closecontextEnabled=true in your parameters? That is required.

@Rostish

Rostish commented Jan 4, 2019

I will try this after the holidays in my country.
I can't do it right now, because my code is available only from my workplace.

@cppwfs
Contributor

cppwfs commented Jan 4, 2019

I was able to reproduce it, somewhat.
I used the same graph and tooling, except that in this case I used SCDF-Local to launch docker images, as you discussed previously.
After running the CTR instance 50 times, one of the CTR executions appeared to stop. In this case one of the child apps failed to start because of the following error: docker: Error response from daemon: driver failed programming external connectivity on endpoint stupefied_hypatia (fc8f22b557ad6dd9ea4c692792dab9e9259c0ae872cf02d1397409c99171f4d0): Error starting userland proxy: listen tcp 0.0.0.0:58386: bind: address already in use. This error appeared in the stderr log of the child task.
So CTR was waiting for the child application to start, which it never did, and thus CTR was effectively blocked.
The solution to this is to set the max-wait-time, as discussed here: https://github.com/spring-cloud-task-app-starters/composed-task-runner/blob/master/spring-cloud-starter-task-composedtaskrunner/README.adoc
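
To illustrate why an upper bound on the wait matters, here is a minimal sketch in plain Java. It is not CTR's implementation: the class BoundedWait, the method waitFor, and its parameters are hypothetical names chosen for this example. It polls a completion check at a fixed interval (loosely analogous to interval-time-between-checks) and gives up once a maximum wait has elapsed (loosely analogous to max-wait-time), so a child that never starts cannot block the caller forever.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {

    /**
     * Polls isDone every intervalMillis until it returns true or
     * maxWaitMillis has elapsed. Hypothetical helper, not a CTR API.
     *
     * @return true if isDone reported completion in time, false on timeout
     *         or if the waiting thread was interrupted
     */
    public static boolean waitFor(BooleanSupplier isDone, long intervalMillis, long maxWaitMillis) {
        long start = System.nanoTime();
        long maxWaitNanos = maxWaitMillis * 1_000_000L;
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - start >= maxWaitNanos) {
                return false; // give up instead of blocking indefinitely
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulates a child task that never starts: the check never succeeds.
        boolean finished = waitFor(() -> false, 50, 300);
        System.out.println("finished in time: " + finished); // prints "finished in time: false"
    }
}
```

Without the timeout branch, the loop above would spin for as long as the check keeps returning false, which matches the blocked behaviour described in this comment.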

@Rostish

Rostish commented Jan 5, 2019

I had to go to work to check this :)
It seems the --spring.cloud.task.closecontextEnabled=true parameter helped me. I did about 60 launches and CTR never got stuck. Could you explain the meaning of this parameter?

About docker: it looks like a different bug, because I use docker only for the SCDF-Local deployment, and then use the volume command to move custom tasks into a container folder.

@cppwfs
Contributor

cppwfs commented Jan 7, 2019

I'm glad that this resolved the issue for you. A brief discussion of the parameter can be found here: https://docs.spring.io/spring-cloud-task/docs/current-SNAPSHOT/reference/htmlsingle/#features-lifecycle
CTR uses a ThreadPoolTaskExecutor to manage splits in the graph, so the application context remains open beyond the scope of the task. This setting makes CTR close the context upon completion. As of the CTR 2.1 release, closeContextEnabled will be set by default.
The other issue is not really a bug with SCDF or CTR.
I will go ahead and close this issue.
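
The effect described in this comment can be sketched with plain java.util.concurrent, used here as a stand-in for Spring's ThreadPoolTaskExecutor (an assumption made to keep the example free of Spring dependencies; PoolLifecycleDemo and its methods are hypothetical names). Worker threads of a fixed pool are non-daemon by default and stay alive after their tasks finish, so the JVM does not exit until the pool is shut down. Closing the application context presumably plays that role in CTR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolLifecycleDemo {

    /** Runs a no-op task to completion and reports whether the pool is still alive afterwards. */
    public static boolean poolAliveAfterWork(ExecutorService pool) {
        try {
            Future<?> result = pool.submit(() -> { });
            result.get(); // wait until the submitted work has finished
        } catch (Exception e) {
            throw new IllegalStateException("task did not complete", e);
        }
        // Finished work alone does not terminate the pool; its threads keep waiting for more.
        return !pool.isTerminated();
    }

    /** Shuts the pool down and reports whether it terminated within five seconds. */
    public static boolean shutDown(ExecutorService pool) {
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println("alive after work: " + poolAliveAfterWork(pool));  // prints true
        System.out.println("terminated after shutdown: " + shutDown(pool));   // prints true
        // Without the shutDown call, the idle worker threads would keep this JVM running.
    }
}
```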

@cppwfs cppwfs closed this as completed Jan 7, 2019
@cppwfs cppwfs removed the ready label Jan 7, 2019
@cppwfs cppwfs self-assigned this Jan 7, 2019