Does CTR ends his work correct? insufficient memory for the Java Runtime Environment appears. #2667

Closed
Rostish opened this issue Dec 11, 2018 · 7 comments
Assignees
Labels
status/need-investigation Oh need to look under a hood

Comments

@Rostish

Rostish commented Dec 11, 2018

Description:
Today I tried to launch my graph, which contains about 30 tasks, for 90 days.
I noticed that the server gradually ran out of memory, and then the following error appeared in one of the tasks:

There is insufficient memory for the Java Runtime Environment to continue.

Native memory allocation (mmap) failed to map 238551040 bytes for committing reserved memory.

An error report file with more information is saved as:

/tmp/MRF_20181201_EXEC_END_18537052664308845656/17457070341406264/MRF_20180908_EXEC_END_1-average-genre-statistic-calculation-online-web-0b144d13-7aee-41ba-aab1-774948f384d4/hs_err_pid6755.log

I checked the memory state with the "htop" command and saw the following:

image

The screenshot shows only about 20% of the entries; there are more of them. At the beginning there was about 25 GB of free memory; after about 30 executions only 3-4 GB remained. After a dataflow-server reboot the memory is back.

I checked the CTR logs; the last line says:

image

There is nothing about shutdown. I cannot be sure that CTR reports its shutdown the way Spring Boot does, but given the memory shortage it looks like something is going wrong.

Release versions:
Local Server - 1.7.2 Snapshot
CTR-2.0.2

Custom apps:
My apps finish their work correctly; the dataflow server says all jobs have COMPLETED status. In the logs I can see the shutdown messages.

Steps to reproduce:

  1. Construct a big graph that contains a lot of fast tasks.
  2. Launch them one by one.
  3. Watch the free memory shrink.

Additional context
My server has 48 GB of RAM. Docker runs on it with about 10 containers. The dataflow server runs locally, not in a container; I start it from my SSH session.
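
The figures in the report allow a quick back-of-the-envelope check. The sketch below is a rough estimate, not a measurement. It uses only the numbers quoted above (about 25 GB free at the start, 3-4 GB free after roughly 30 executions, and the failed 238551040-byte mapping); the variable and function names are made up for illustration.

```python
# Rough estimate of the memory retained per execution, from the reported figures.
free_before_gb = 25.0        # free memory at the start (as reported)
free_after_gb = (3.0, 4.0)   # free memory after the runs (reported as 3-4 GB)
executions = 30              # approximate number of executions (as reported)
failed_mmap_bytes = 238551040  # size of the failed native allocation (from the error)


def retained_per_execution(before_gb: float, after_gb: float, runs: int) -> float:
    """Average amount of memory (GB) not given back per execution."""
    return (before_gb - after_gb) / runs


low = retained_per_execution(free_before_gb, free_after_gb[1], executions)
high = retained_per_execution(free_before_gb, free_after_gb[0], executions)
failed_mmap_mib = failed_mmap_bytes / (1024 * 1024)

print(f"retained per execution: {low:.2f}-{high:.2f} GB")
print(f"failed mapping: {failed_mmap_mib:.1f} MiB")
```

On these numbers each execution leaves roughly 0.7 GB behind. That would be consistent with one process per execution staying alive, although the report alone does not prove it.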

@Rostish Rostish changed the title from "Does CTR end his work correct? insufficient memory for the Java Runtime Environment appears." to "Does CTR ends his work correct? insufficient memory for the Java Runtime Environment appears." Dec 11, 2018
@cppwfs
Contributor

cppwfs commented Dec 13, 2018

If you run jps on the server, what java apps are running?
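
For anyone repeating this check over many runs, a small script can tally the `jps` output instead of reading it by eye. This is only a sketch: it assumes the default `jps` output of one `<pid> <name>` pair per line, the helper names are invented here, and the sample lines are hypothetical, not taken from the reporter's server.

```python
import subprocess
from collections import Counter


def count_java_apps(jps_output: str) -> Counter:
    """Count JVMs by name from `jps` output (one "<pid> <name>" pair per line)."""
    counts = Counter()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if not parts or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        name = parts[1].strip() if len(parts) > 1 else "<unknown>"
        counts[name] += 1
    return counts


def run_jps() -> str:
    """Run `jps` (needs a JDK on PATH) and return its standard output."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical sample for illustration only; real names depend on how each app was started.
SAMPLE = """\
1201 composed-task-runner.jar
1377 composed-task-runner.jar
2044 dataflow-server.jar
2310 Jps
"""

counts = count_java_apps(SAMPLE)
for name, n in counts.most_common():
    print(n, name)
```

On a real server, `count_java_apps(run_jps())` would give the live tally; a count for one name that keeps growing after the jobs have completed points at processes that never exit.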

@Rostish
Author

Rostish commented Dec 17, 2018

@cppwfs
Good day to you.
I need some time to reproduce this behavior. I am waiting for a new batch of days to be collected so that I can start a calculation over a long period. I will provide the output of the jps command in a few days.

@Rostish
Author

Rostish commented Dec 17, 2018

I ran one day several times in a row today. The jps command shows the following:
image

All jobs have COMPLETED status, and all tasks have ended.

UPDATE:
After 17 hours the picture is the same. It looks like the CTRs are stuck.

@sabbyanandan
Contributor

Hi, @Rostish. Thank you for your update. It is a bit odd to see that you've seven instances of CTR running. Do you have seven different Composed Task pipelines running simultaneously?

It'd help if you could open up the logs for those seven instances, and share the last bit of logs/exception messages here for review.

We will attempt to reproduce it on our side as well.

@Rostish
Author

Rostish commented Dec 19, 2018

Good day to you, @sabbyanandan.
As I said, I start the Composed Task pipelines one by one. I start the next one only when the previous one has ended with status "COMPLETED". My first post included the CTR logs; the logs of the other seven look the same:
image
There are no exceptions or anything else, only a line saying "Job ... completed with the following parameters" and nothing about shutdown.
A new log file from today's CTR run:
image
P.S. After two days of working with CTR:
image
None of the CTR instances shut down.

UPDATE:
I checked our cluster project, which uses CTR 2.0.0.RELEASE; it does not have this problem.
Tomorrow I am going to redeploy this project with CTR 2.0.0.RELEASE and try to reproduce the problem.

@sabbyanandan
Contributor

That seems like a bug (in 2.0.2). We are in the middle of 2.0 M1 release of SCDF, so please bear with us, and we will get back to you.

We will also use spring-attic/spring-cloud-task-app-starters-composed-task-runner#60 to troubleshoot it.

@cppwfs
Contributor

cppwfs commented Jan 7, 2019

Was able to resolve this issue with the closeContextEnabled flag. For more information please review: spring-attic/spring-cloud-task-app-starters-composed-task-runner#60

@cppwfs cppwfs closed this as completed Jan 7, 2019
@cppwfs cppwfs removed the ready label Jan 7, 2019
@jvalkeal jvalkeal added status/need-investigation Oh need to look under a hood and removed attempt-to-reproduce labels Dec 30, 2020