
Comparison Testing between Elyra and JNBG #42

Closed
aazhou1 opened this issue Jun 26, 2017 · 9 comments

Comments


aazhou1 commented Jun 26, 2017

No description provided.


aazhou1 commented Jun 27, 2017

Running with 1g each of executor and driver memory on the default YARN queue. With JNBG, I am able to start 4 Scala and 4 Python kernels before exceeding the queue's AM resource limit; at that point 41 of the 44 GB of available memory is in use. With Elyra, I am able to start only 2 Scala and 2 Python kernels before exceeding the queue's AM resource limit, with 25 of the 44 GB of available memory in use. I think there is some configuration on Elyra that I need to change to allow more of the available memory to be used; right now I cannot start new kernels on Elyra once more than half of the memory has been used.
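For reference, here is a rough, purely hypothetical sketch of how a CapacityScheduler queue's AM resource limit can cap the number of concurrent kernels independently of total queue memory. Every number below is an assumption for illustration, not a measurement from this cluster:

```python
# Hypothetical illustration of the AM resource cap on a CapacityScheduler queue.
# The relevant setting is yarn.scheduler.capacity.<queue>.maximum-am-resource-percent;
# all values here are assumed, not taken from the cluster discussed in this issue.

queue_capacity_gb = 44      # memory available to the queue (assumed)
max_am_percent = 0.10       # assumed maximum-am-resource-percent for the queue
am_container_gb = 1.0       # assumed memory reserved per application master

am_budget_gb = queue_capacity_gb * max_am_percent
max_concurrent_apps = int(am_budget_gb // am_container_gb)

print(f"AM budget: {am_budget_gb:.1f} GB -> at most {max_concurrent_apps} concurrent kernels")
```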


aazhou1 commented Jul 6, 2017

Ran 9 Toree Scala kernels on Elyra JKG node and 9 Spark Scala kernels on JNBG node and observed a pretty similar resource usage of the gateway nodes (9.32 GB for Elyra, 9.69 GB for JNBG). Now running a similar test with the Yarn cluster kernels on Elyra JKG to get a comparison. Will also work on timing kernel startups.
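In case it helps reproduce the measurement, a minimal sketch of how the gateway-node footprint could be totalled, assuming psutil is available and that the hypothetical match strings below identify the kernel-related processes:

```python
# Sketch: sum the RSS of kernel-related processes on the gateway node.
# The match strings are assumptions; adjust them for the actual process command lines.
import psutil

KERNEL_MARKERS = ("toree", "ipykernel", "kernel_gateway")  # hypothetical match strings

total_rss = 0
for proc in psutil.process_iter(["cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    mem = proc.info["memory_info"]
    if mem and any(marker in cmdline for marker in KERNEL_MARKERS):
        total_rss += mem.rss

print(f"Kernel-related RSS on this node: {total_rss / 1024 ** 3:.2f} GB")
```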


aazhou1 commented Jul 7, 2017

Kernel Startup Times.xlsx

Timed the startups of kernels on the Elyra JKG node. For the YARN cluster kernels, I included the intermediate times between kernel start and YARN assignment, and between YARN assignment and the kernel reaching the running state. I did 2 trial runs each and averaged the results. Scala YARN cluster mode takes about 14 seconds more than Python YARN cluster mode, and Scala YARN cluster mode kernels take about 5 seconds longer than regular Toree Scala YARN client mode kernels.
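For anyone repeating the timing, here is a minimal sketch of one way to time kernel starts against the gateway's REST API. The gateway URL and kernelspec name are assumptions, and the intermediate YARN timings in the spreadsheet would still need the ResourceManager rather than this call alone:

```python
# Sketch: time how long a POST /api/kernels request takes against a Kernel Gateway.
# GATEWAY and KERNEL_NAME are hypothetical and need to be adjusted for the test cluster.
import time
import requests

GATEWAY = "http://localhost:8888"             # hypothetical gateway address
KERNEL_NAME = "spark_scala_yarn_cluster"      # hypothetical kernelspec name

start = time.monotonic()
resp = requests.post(f"{GATEWAY}/api/kernels", json={"name": KERNEL_NAME})
resp.raise_for_status()
elapsed = time.monotonic() - start

kernel = resp.json()
print(f"Kernel {kernel['id']} started in {elapsed:.1f}s")
```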


aazhou1 commented Jul 7, 2017

toree_scala_yarn_client_stats.xlsx
yarn_cluster_scala.xlsx

Tested the memory usage of 9 Apache Toree Scala YARN client kernels as mentioned two comments above. Afterwards, tested 9 Scala YARN cluster mode kernels on the Elyra JKG to get a comparison. The Elyra JKG node had only one process running (the kernel gateway) plus a handful of defunct Java processes that were not using any memory, probably left over from the old JVMs. The memory footprint on the Elyra JKG node was 0.0617 GB, showing that with enough YARN memory, Elyra can support more kernels running simultaneously than JNBG.


LK-Tmac1 commented Jul 7, 2017

Based on the comparison in Kernel Startup Times.xlsx, here is my understanding:

  1. For Python YARN cluster and Toree Scala YARN cluster mode, the YARN assignment time was roughly the same (5 secs), whereas after assignment the Toree Scala kernel took more time than the Python kernel (25 secs vs. 10 secs). A way to double-check this split is sketched at the end of this comment.
  2. This is probably because YARN requires a certain amount of time to initialize and assign resources for each application, so this "assignment" time should be roughly the same for every app/kernel; after the assignment, however, starting a Toree Scala kernel involves more initialization work than starting the Python kernel (e.g. the kernel bootstrap), so the Toree Scala kernel takes longer to reach the RUNNING state.
  3. For the regular Toree Scala kernel, which runs in YARN client mode, the total startup time was slightly shorter than in the YARN cluster case, probably because in client mode YARN takes less time to assign resources (the driver always runs on the local node, versus having to choose which container on which node will run the App Master).

Based on the comparison of toree_scala_yarn_client_stats.xlsx and yarn_cluster_scala.xlsx, I believe this is what we expected, i.e. being able to run more kernels in YARN cluster mode, with better load balancing on the Elyra node.
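One possible way to corroborate the assignment vs. post-assignment split in item 1 would be to poll the YARN ResourceManager REST API while a kernel starts and record when the application first reaches the ACCEPTED and RUNNING states. A minimal sketch, assuming a hypothetical ResourceManager address and application id (available fields and states can vary by Hadoop version):

```python
# Sketch: record when a YARN application is first seen in each state.
# RM and APP_ID are hypothetical placeholders for the test cluster.
import time
import requests

RM = "http://resourcemanager:8088"            # hypothetical ResourceManager address
APP_ID = "application_1499000000000_0001"     # hypothetical application id for the kernel

seen = {}                                     # state -> time first observed
deadline = time.monotonic() + 120             # give up after 2 minutes
while "RUNNING" not in seen and time.monotonic() < deadline:
    app = requests.get(f"{RM}/ws/v1/cluster/apps/{APP_ID}").json()["app"]
    seen.setdefault(app["state"], time.monotonic())
    time.sleep(1)

if "ACCEPTED" in seen and "RUNNING" in seen:
    print(f"ACCEPTED -> RUNNING took {seen['RUNNING'] - seen['ACCEPTED']:.1f}s")
```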


aazhou1 commented Jul 13, 2017

Decided to compare kernel load on IOP clusters due to JNBG compatibility.

@kevin-bates kevin-bates modified the milestones: Sprint 6, Sprint 5 Jul 18, 2017

aazhou1 commented Jul 20, 2017

Finished testing with 15 YARN nodes: able to run 54-55 kernels on Elyra for both kernel types, while on JNBG I was able to run 48 Python kernels and 28 Scala kernels. Planning to test with 12, 9, 6, and 3 YARN nodes.


aazhou1 commented Jul 21, 2017

KernelLoads.xlsx

Finished testing kernel loads for both YARN modes. For the Python kernels, with 12 YARN nodes or fewer, the maximum kernel load exhibits the same linear growth with the number of YARN nodes in both modes. At 15 YARN nodes, the maximum kernel load in YARN client mode is limited because the gateway node is filled to capacity with drivers. For the Scala kernels, with fewer than 6 YARN nodes, the same linear growth in kernel load with YARN nodes is observed. Above 6 nodes, the maximum kernel load for YARN client mode is capped at 28, while YARN cluster mode keeps increasing the maximum kernel load linearly with the number of YARN nodes. At 15 YARN nodes, YARN cluster mode can support almost twice the maximum kernel load of YARN client mode. There seems to be a much larger difference between the 2 YARN deployment modes for Scala kernels than for Python kernels, probably because there's a larger JVM involved in starting Toree kernels that we were able to kill using Zee's Spark opt?
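A purely hypothetical back-of-the-envelope model of why client mode plateaus while cluster mode keeps scaling with YARN nodes. All numbers are assumptions for illustration only, not taken from KernelLoads.xlsx:

```python
# Hypothetical model: in client mode the drivers all live on the gateway node,
# so its memory becomes the binding limit; in cluster mode only YARN memory matters.
# Every constant below is an assumption for illustration.

GATEWAY_MEM_GB = 32         # memory on the single gateway node (assumed)
DRIVER_MEM_GB = 1           # per-kernel driver footprint in client mode (assumed)
YARN_MEM_PER_NODE_GB = 8    # YARN memory contributed per worker node (assumed)
YARN_MEM_PER_KERNEL_GB = 2  # AM + executors per kernel in YARN (assumed)

def max_kernels(yarn_nodes: int, client_mode: bool) -> int:
    yarn_cap = yarn_nodes * YARN_MEM_PER_NODE_GB // YARN_MEM_PER_KERNEL_GB
    if client_mode:
        # Drivers stay on the gateway, so its memory caps the load first.
        return min(yarn_cap, GATEWAY_MEM_GB // DRIVER_MEM_GB)
    return yarn_cap

for n in (3, 6, 9, 12, 15):
    print(f"{n} nodes: client={max_kernels(n, True)}, cluster={max_kernels(n, False)}")
```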


aazhou1 commented Jul 24, 2017

Updated the PowerPoint slides with the graph from the spreadsheet.

@aazhou1 aazhou1 closed this as completed Jul 27, 2017
@kevin-bates kevin-bates modified the milestones: Sprint 6, v0.6 Mar 26, 2018