
Comparison Testing between Elyra and JNBG #42

Closed
aazhou1 opened this issue Jun 26, 2017 · 9 comments

Comments


aazhou1 commented Jun 26, 2017

No description provided.


aazhou1 commented Jun 27, 2017

Running with 1g each of executor and driver memory on the default YARN queue. With JNBG, I am able to start 4 Scala and 4 Python kernels before exceeding the queue's AM resource limit; at that point 41 of the 44 GB of available memory is in use. With Elyra, I am able to start only 2 Scala and 2 Python kernels before exceeding the queue's AM resource limit, with 25 of the 44 GB of available memory in use. I think there is some configuration on Elyra that I need to change to allow more of the available memory to be used; right now I cannot start new kernels on Elyra once more than half of the memory has been used.
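For reference, here is a rough, purely hypothetical sketch of how a CapacityScheduler queue's AM resource limit can cap the number of concurrent kernels independently of total queue memory. Every number below is an assumption for illustration, not a measurement from this cluster:

```python
# Hypothetical illustration of the AM resource cap on a CapacityScheduler queue.
# The relevant setting is yarn.scheduler.capacity.<queue>.maximum-am-resource-percent;
# all values here are assumed, not taken from the cluster discussed in this issue.

queue_capacity_gb = 44      # memory available to the queue (assumed)
max_am_percent = 0.10       # assumed maximum-am-resource-percent for the queue
am_container_gb = 1.0       # assumed memory reserved per application master

am_budget_gb = queue_capacity_gb * max_am_percent
max_concurrent_apps = int(am_budget_gb // am_container_gb)

print(f"AM budget: {am_budget_gb:.1f} GB -> at most {max_concurrent_apps} concurrent kernels")
```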


aazhou1 commented Jul 6, 2017

Ran 9 Toree Scala kernels on Elyra JKG node and 9 Spark Scala kernels on JNBG node and observed a pretty similar resource usage of the gateway nodes (9.32 GB for Elyra, 9.69 GB for JNBG). Now running a similar test with the Yarn cluster kernels on Elyra JKG to get a comparison. Will also work on timing kernel startups.
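In case it helps reproduce the measurement, a minimal sketch of how the gateway-node footprint could be totalled, assuming psutil is available and that the hypothetical match strings below identify the kernel-related processes:

```python
# Sketch: sum the RSS of kernel-related processes on the gateway node.
# The match strings are assumptions; adjust them for the actual process command lines.
import psutil

KERNEL_MARKERS = ("toree", "ipykernel", "kernel_gateway")  # hypothetical match strings

total_rss = 0
for proc in psutil.process_iter(["cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    mem = proc.info["memory_info"]
    if mem and any(marker in cmdline for marker in KERNEL_MARKERS):
        total_rss += mem.rss

print(f"Kernel-related RSS on this node: {total_rss / 1024 ** 3:.2f} GB")
```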


aazhou1 commented Jul 7, 2017

Kernel Startup Times.xlsx

Timed the startups of kernels on the Elyra JKG node. For the YARN cluster kernels, I included the intermediate times between kernel start and YARN assignment, and between YARN assignment and the kernel reaching the running state. I did 2 trial runs each and averaged the results. Scala YARN cluster mode takes about 14 seconds more than Python YARN cluster mode, and Scala YARN cluster mode kernels take about 5 seconds longer than regular Toree Scala YARN client mode kernels.
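For anyone repeating the timing, here is a minimal sketch of one way to time kernel starts against the gateway's REST API. The gateway URL and kernelspec name are assumptions, and the intermediate YARN timings in the spreadsheet would still need the ResourceManager rather than this call alone:

```python
# Sketch: time how long a POST /api/kernels request takes against a Kernel Gateway.
# GATEWAY and KERNEL_NAME are hypothetical and need to be adjusted for the test cluster.
import time
import requests

GATEWAY = "http://localhost:8888"             # hypothetical gateway address
KERNEL_NAME = "spark_scala_yarn_cluster"      # hypothetical kernelspec name

start = time.monotonic()
resp = requests.post(f"{GATEWAY}/api/kernels", json={"name": KERNEL_NAME})
resp.raise_for_status()
elapsed = time.monotonic() - start

kernel = resp.json()
print(f"Kernel {kernel['id']} started in {elapsed:.1f}s")
```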


aazhou1 commented Jul 7, 2017

toree_scala_yarn_client_stats.xlsx
yarn_cluster_scala.xlsx

Tested the memory usage of 9 Apache Toree Scala YARN client kernels as mentioned two comments above. Afterwards, tested 9 Scala YARN cluster mode kernels on the Elyra JKG to get a comparison. The Elyra JKG node had only one process running (the kernel gateway) plus a handful of defunct Java processes that were not using any memory, probably left over from the old JVMs. The memory footprint on the Elyra JKG node was 0.0617 GB, showing that with enough YARN memory, Elyra can support more kernels running simultaneously than JNBG.


LK-Tmac1 commented Jul 7, 2017

Based on the comparison in Kernel Startup Times.xlsx, here is my understanding:

  1. For Python YARN cluster and Toree Scala YARN cluster mode, the YARN assignment time was roughly the same (5 secs), whereas after assignment the Toree Scala kernel took more time than the Python kernel (25 secs vs. 10 secs). A way to double-check this split is sketched at the end of this comment.
  2. This is probably because YARN requires a certain amount of time to initialize and assign resources for each application, so this "assignment" time should be roughly the same for every app/kernel; after the assignment, however, starting a Toree Scala kernel involves more initialization work than starting the Python kernel (e.g. the kernel bootstrap), so the Toree Scala kernel takes longer to reach the RUNNING state.
  3. For the regular Toree Scala kernel, which runs in YARN client mode, the total startup time was slightly shorter than in the YARN cluster case, probably because in client mode YARN takes less time to assign resources (the driver always runs on the local node, versus having to choose which container on which node will run the App Master).

Based on the comparison of toree_scala_yarn_client_stats.xlsx and yarn_cluster_scala.xlsx, I believe this is what we expected, i.e. being able to run more kernels in YARN cluster mode, with better load balancing on the Elyra node.
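One possible way to corroborate the assignment vs. post-assignment split in item 1 would be to poll the YARN ResourceManager REST API while a kernel starts and record when the application first reaches the ACCEPTED and RUNNING states. A minimal sketch, assuming a hypothetical ResourceManager address and application id (available fields and states can vary by Hadoop version):

```python
# Sketch: record when a YARN application is first seen in each state.
# RM and APP_ID are hypothetical placeholders for the test cluster.
import time
import requests

RM = "http://resourcemanager:8088"            # hypothetical ResourceManager address
APP_ID = "application_1499000000000_0001"     # hypothetical application id for the kernel

seen = {}                                     # state -> time first observed
deadline = time.monotonic() + 120             # give up after 2 minutes
while "RUNNING" not in seen and time.monotonic() < deadline:
    app = requests.get(f"{RM}/ws/v1/cluster/apps/{APP_ID}").json()["app"]
    seen.setdefault(app["state"], time.monotonic())
    time.sleep(1)

if "ACCEPTED" in seen and "RUNNING" in seen:
    print(f"ACCEPTED -> RUNNING took {seen['RUNNING'] - seen['ACCEPTED']:.1f}s")
```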


aazhou1 commented Jul 13, 2017

Decided to compare kernel load on IOP clusters due to JNBG compatibility.

@kevin-bates kevin-bates modified the milestones: Sprint 6, Sprint 5 Jul 18, 2017

aazhou1 commented Jul 20, 2017

Finished testing with 15 YARN nodes: able to run 54-55 kernels on Elyra for both kernel types, while on JNBG I was able to run 48 Python kernels and 28 Scala kernels. Planning to test with 12, 9, 6, and 3 YARN nodes.


aazhou1 commented Jul 21, 2017

KernelLoads.xlsx

Finished testing kernel loads for both YARN modes. For the Python kernels, with 12 YARN nodes or fewer, the maximum kernel load exhibits the same linear growth with the number of YARN nodes in both modes. At 15 YARN nodes, the maximum kernel load in YARN client mode is limited because the gateway node is filled to capacity with drivers. For the Scala kernels, with fewer than 6 YARN nodes, the same linear growth in kernel load with YARN nodes is observed. Above 6 nodes, the maximum kernel load for YARN client mode is capped at 28, while YARN cluster mode keeps increasing the maximum kernel load linearly with the number of YARN nodes. At 15 YARN nodes, YARN cluster mode can support almost twice the maximum kernel load of YARN client mode. There seems to be a much larger difference between the 2 YARN deployment modes for Scala kernels than for Python kernels, probably because there's a larger JVM involved in starting Toree kernels that we were able to kill using Zee's Spark opt?
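A purely hypothetical back-of-the-envelope model of why client mode plateaus while cluster mode keeps scaling with YARN nodes. All numbers are assumptions for illustration only, not taken from KernelLoads.xlsx:

```python
# Hypothetical model: in client mode the drivers all live on the gateway node,
# so its memory becomes the binding limit; in cluster mode only YARN memory matters.
# Every constant below is an assumption for illustration.

GATEWAY_MEM_GB = 32         # memory on the single gateway node (assumed)
DRIVER_MEM_GB = 1           # per-kernel driver footprint in client mode (assumed)
YARN_MEM_PER_NODE_GB = 8    # YARN memory contributed per worker node (assumed)
YARN_MEM_PER_KERNEL_GB = 2  # AM + executors per kernel in YARN (assumed)

def max_kernels(yarn_nodes: int, client_mode: bool) -> int:
    yarn_cap = yarn_nodes * YARN_MEM_PER_NODE_GB // YARN_MEM_PER_KERNEL_GB
    if client_mode:
        # Drivers stay on the gateway, so its memory caps the load first.
        return min(yarn_cap, GATEWAY_MEM_GB // DRIVER_MEM_GB)
    return yarn_cap

for n in (3, 6, 9, 12, 15):
    print(f"{n} nodes: client={max_kernels(n, True)}, cluster={max_kernels(n, False)}")
```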


aazhou1 commented Jul 24, 2017

Updated the PowerPoint slides with the graph from the spreadsheet.

@aazhou1 aazhou1 closed this as completed Jul 27, 2017
@kevin-bates kevin-bates modified the milestones: Sprint 6, v0.6 Mar 26, 2018