
critical refactoring: attempt to solve graal-js callonce / callSingle issues and excessive memory use for call #1685

Closed

ptrthomas opened this issue Jul 16, 2021 · 21 comments

@ptrthomas (Member)

cc @workwithprashant who raised this: https://stackoverflow.com/q/68411732/143475

not really looking forward to this fix. but maybe we shouldn't "return" any variables that were in the calling context

and I'd also like to attempt a refactor to work around the horrible graal-js limitation of not being able to use JS objects created in one context on another thread, e.g. #1633 and #1558

the rough idea is to move callonce and callSingle onto a separate "special" context and make any use of that context thread-safe. wish me luck :|
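
roughly the shape I have in mind for the compute-once part - a minimal sketch only, all class and method names here are hypothetical, not Karate's actual internals:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // hypothetical sketch: callonce / callSingle results live in one shared cache,
    // and ConcurrentHashMap guarantees the mapping function runs at most once per
    // key, so only the "winning" thread ever enters the underlying JS context
    public class SharedCallCache {

        private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

        public static Object callOnce(String key, Supplier<Object> call) {
            return CACHE.computeIfAbsent(key, k -> call.get());
        }
    }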

@ptrthomas ptrthomas self-assigned this Jul 16, 2021
ptrthomas added a commit that referenced this issue Jul 17, 2021
glad could get rid of scenario-listener - that was really clunky
and a new approach seems promising - clone the js engine if needed for another thread etc
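to illustrate what "clone the js engine if needed for another thread" could mean in raw GraalVM polyglot terms - a sketch under assumptions, not the actual Karate code:

    import java.util.Map;
    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.HostAccess;
    import org.graalvm.polyglot.Value;

    // a graal-js Context is not thread-safe, so instead of sharing one across
    // threads, build a fresh Context and copy variable values across
    public class JsEngineCloner {

        public static Context cloneForThread(Map<String, Object> plainJavaVars) {
            Context copy = Context.newBuilder("js")
                    .allowHostAccess(HostAccess.ALL)
                    .build();
            Value bindings = copy.getBindings("js");
            // values must already be detached from the source context,
            // i.e. converted to plain Java maps / lists / primitives beforehand
            plainJavaVars.forEach(bindings::putMember);
            return copy;
        }
    }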
@ptrthomas (Member Author)

so @workwithprashant I think you already figured out that this solves your problem: set variables to null to avoid them being cloned. unless you have better ideas

    * def out = null
    * def response = null
    * def response = call read('classpath:examples/library.feature@getLibraryData')
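
the point of the nulls is that a null variable carries nothing for the call machinery to clone, so the large responses from earlier calls never get copied into the new context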

ptrthomas added a commit that referenced this issue Jul 17, 2021
now we have learnt that java native host-object like functions can be passed around freely in graal
exploring if we convert to runnable / consumer / function as in java lambdas
seems to work well - this commit tries this with the active-mq example
one possible limitation is only single argument functions are supported, sure this can be overcome
but it is a reasonable limit - e.g. use maps to pass data in / out
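by way of illustration, the kind of single-argument, map-in / map-out host function the commit message describes could look like this (names hypothetical):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    // a Java lambda is a plain host object, so graal-js can pass it across threads
    // freely, unlike a JS function that stays pinned to the context that created it
    public class MessageHandlers {

        // single argument only, as the commit notes - so data goes in and out via Maps
        public static Function<Map<String, Object>, Map<String, Object>> echo() {
            return input -> {
                Map<String, Object> out = new HashMap<>();
                out.put("received", input.get("message"));
                return out;
            };
        }
    }

in a feature this would surface as something like * def handler = Java.type('MessageHandlers').echo() followed by * def result = handler({ message: 'hello' })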
@workwithprashant commented Jul 17, 2021

Well, it's going to be painful to set null throughout all the scripts. I was looking at the code in ScenarioRuntime.java - we could save the keys from vars, remove those keys from the nested vars in ScenarioEngine.java, and follow up with runtime.engine.setVariables(allVars).
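
Something like this, going by the class and method names mentioned above - a rough sketch, with all signatures assumed:

    import java.util.HashMap;
    import java.util.Map;

    // rough sketch of the idea (names and signatures assumed, not actual Karate
    // internals): strip the keys that already existed in the calling context from
    // the call result, so only genuinely new variables flow back
    public class VarFilter {

        public static Map<String, Object> newVarsOnly(Map<String, Object> callingVars,
                                                      Map<String, Object> calledVars) {
            Map<String, Object> result = new HashMap<>(calledVars);
            result.keySet().removeAll(callingVars.keySet());
            return result; // the caller would then do runtime.engine.setVariables(result)
        }
    }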

Not sure about the approach, but I'm exploring different options.

@ptrthomas (Member Author)

@workwithprashant well, please suggest a different design then. let me first say that I personally discourage over-use of call - you should never have more than one level of call for test automation, at least that is what I think. I have written a bit about this here: https://stackoverflow.com/a/54126724/143475

so in this case, if you have some complicated re-use, please use Java code and not feature files or JavaScript

a suggestion given in #1675 was to use a Java singleton and write to a temp file if needed. if your tests are complicated, there will be pain.
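
for example, the singleton from that suggestion could be as simple as this - a sketch, class name hypothetical:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.function.Supplier;

    // hypothetical sketch of the #1675 suggestion: hold one-time setup data in a
    // plain Java singleton, persisted to a file so that even parallel runs share it
    public class OnceCache {

        private static final Path FILE = Paths.get("target/once-cache.txt");

        public static synchronized String get(Supplier<String> firstTime) throws Exception {
            if (Files.exists(FILE)) {
                return Files.readString(FILE); // already computed by an earlier run
            }
            String value = firstTime.get(); // runs only on the very first call
            Files.writeString(FILE, value);
            return value;
        }
    }

a feature or karate-config.js could then reach it via Java.type('OnceCache'), keeping the data entirely out of the JS context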

a core principle is that any variable defined using def has to be visible to called features. but I'm eager to hear any proposals

@ptrthomas (Member Author) commented Jul 17, 2021

@workwithprashant I found a reasonable solution: use a "wrapper" variable, because "inner" variables will not be cloned. try this change:

@unexpected
Scenario: Not over-writing nested variable
    * def data = {}
    * data.response = karate.call('classpath:examples/library.feature@getLibraryData')
    * string out = data.response
    * karate.log('FIRST RESPONSE = ', data.response)
    * karate.log('FIRST RESPONSE SIZE = ', out.length)
    * def out = null

    * data.response = karate.call('classpath:examples/library.feature@getLibraryData')
    * string out = data.response
    * karate.log('SECOND RESPONSE = ', data.response)
    * karate.log('SECOND RESPONSE SIZE = ', out.length)
    * def out = null
    
    * data.response = karate.call('classpath:examples/library.feature@getLibraryData')
    * string out = data.response
    * karate.log('THIRD RESPONSE SIZE = ', out.length)
    * karate.log('UNEXPECTED RESPONSE = ', data.response)

i'm setting out to null just to focus on the response

@ptrthomas (Member Author)

@workwithprashant actually, you were right - we were doing an unnecessary variable definition. you can ignore my comment above. made a fix, so it will be great if you can test with the latest version in develop!

@workwithprashant

@ptrthomas Your fix worked beautifully on the sample project.

PREVIOUS EXAMPLE: the called feature returning variables of the calling context...

10:54:28.417 [main] INFO  com.intuit.karate - FIRST RESPONSE SIZE =  331
10:54:28.992 [main] INFO  com.intuit.karate - SECOND RESPONSE SIZE =  2125 
10:54:29.008 [main] INFO  com.intuit.karate - THIRD RESPONSE SIZE =  11919

WITH THE FIX NOW:

11:01:16.150 [main] INFO  com.intuit.karate - FIRST RESPONSE SIZE =  155
11:01:16.165 [main] INFO  com.intuit.karate - SECOND RESPONSE SIZE =  155
11:01:16.178 [main] INFO  com.intuit.karate - THIRD RESPONSE SIZE =  155

I am going to try the fix with performance testing on the actual project and compare JVM heap usage. I will update this thread with my findings (probably within 24 hours).

Thank you @ptrthomas for reconsidering the approach :)

@ptrthomas (Member Author)

@workwithprashant great to hear. apologies for all the grumbling - in the end I'm very glad we caught this. thanks for raising it; this possibly improves the other callSingle() / callonce situation too, because earlier we were passing everything around and now we don't - so cc @aleruz

@ptrthomas ptrthomas added this to the 1.1.0 milestone Jul 18, 2021
@workwithprashant commented Jul 19, 2021

@ptrthomas I have some performance-testing results to share from before and after the fix.

Environment

  • Gatling : 3.1.2
  • JDK : 14
  • Maven : 3.6.3

Setup

  • 100 users ramped up over 100 seconds and sustained for 3600 seconds

Before the FIX

System stats (system where the tests are executed):

  • CPU usage average = 90%
  • JVM usage at the end = 37 GB (growing consistently)
    [screenshot]

Gatling results:

  • total transactions = 48933
    [screenshot]

After the FIX

System stats (system where the tests are executed):

  • CPU usage average = 16% (decreased significantly) 👍
  • JVM usage at the end = 50 GB (growing consistently)
    [screenshot]

Gatling results:

  • total transactions = 256395 (a ~424% increase) 👍
    [screenshot]

Even though the results are much better, I believe we may be able to decrease the JVM usage during performance testing further. When I ran a memory profiler at the start of the test, it looked like the memory used by ScenarioEngine.AttachResult() keeps growing. The following stats are from the first 15-20 minutes of the test. Even though GC runs automatically, the JVM usage trend keeps climbing. (I need to investigate the optimal GC settings to pass to the gatling plugin.)

[screenshot]

Since we don't need a lot of result data during performance testing, I was thinking of detecting when it's a perf test and then skipping some of the result calls that are not needed for Gatling (not sure about the impact on Gatling result collection).
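
Something along these lines - just sketching the idea, names and signatures are hypothetical, not actual Karate internals:

    // when running under Gatling, skip attaching the full result payload
    public class ResultGate {

        public static void maybeAttach(boolean perfMode, Runnable attachResult) {
            if (perfMode) {
                return; // Gatling only needs response times, not the full result
            }
            attachResult.run();
        }
    }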

Do you have any ideas?

@ptrthomas (Member Author)

@workwithprashant this is fantastic work. I will try to avoid the AttachResult part, but it is taking more time than I expected. will keep you posted.

@ptrthomas (Member Author)

@workwithprashant okay, made the change, please try now!

@workwithprashant

After the 6d8aa7c FIX

[screenshot]

It has helped a little, and I will keep investigating. If I find anything more I will raise a new request.

@ptrthomas (Member Author)

@workwithprashant thanks. if you use callonce or callSingle, a similar cloning of data happens.

if you can remove callonce and callSingle for a test run, please try to do that and compare. and then we will know where to focus

@ptrthomas (Member Author)

@workwithprashant an update - I tried to improve the callSingle / callonce code to avoid cloning too many objects. so ignore my message above, just run again with the latest from develop and let me know how it goes

@workwithprashant

@ptrthomas I did not see any changes, so I will investigate by migrating my karate scripts (performance-testing scenarios) to the profiling-test project, which will be easier to debug. I may have to look into the GC config passed to the gatling maven plugin.

[screenshot]

Thank you for all the improvements!!

@workwithprashant

@ptrthomas I had log levels set to ERROR, so I didn't see the following warning:
[WARN ] 2021-07-20 22:10:19.692 [GatlingSystem-akka.actor.default-dispatcher-10] | - attach - immutable map: __gatling

I will try to reproduce this in the sample project.

@ptrthomas (Member Author)

@workwithprashant thanks, that helps. this is the fix for the other issue you reported with gatling + call, but I will think of a better fix - maybe it is this hashmap coming from gatling that is the problem.
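
if it is the immutable map, one possible shape of a fix - a guess, not the actual code:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // a guess at the shape of a fix (not the actual Karate code): if a variable
    // such as __gatling arrives as an immutable map, take a mutable copy before
    // attaching it to the context
    public class AttachHelper {

        @SuppressWarnings("unchecked")
        public static Object mutableCopy(Object value) {
            if (value instanceof Map) {
                return new LinkedHashMap<>((Map<String, Object>) value);
            }
            return value;
        }
    }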

@workwithprashant commented Jul 21, 2021

@ptrthomas I migrated my Karate scripts to the profiling-test project and executed the same performance test as before, without passing any GC-related override settings to the gatling plugin. The JVM heap size keeps increasing, which is especially prominent for longer tests.

[screenshot]

Following is the profiling after the first 5 minutes; these top 3 classes keep growing.

[screenshot]

I was passing the following GC settings to the gatling plugin in my framework:

    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+UseG1GC
    -XX:+UseStringDeduplication
    -XX:+ParallelRefProcEnabled

It looks like JDK 16 has significant changes in GC strategy. I will try running the same tests with JDK 16 and Maven 3.8.1 and will update this thread.

@ptrthomas (Member Author)

@workwithprashant I have been able to get VisualVM running, and with the profiling-test project (I made a few small changes) I see absolutely no memory leaks. but I am on JDK 11!

the next step here, in my opinion, is for you to simulate the problem you see within the profiling-test project - add more call-s, or more JSON / variables etc. meanwhile I think I shall release 1.1.0.RC5

@workwithprashant

@ptrthomas RC5 sounds good!! This effort is going to be an iterative process, and I will try to simulate a similar trend using the profiling-test project, which I can share with you.

@ptrthomas (Member Author)

1.1.0 released

@ptrthomas (Member Author)

note to self to run this benchmark as part of #2546
