Comparison Testing between Elyra and JNBG #42
Running with 1g each of executor and driver memory and the default YARN queue. Using JNBG, I am able to start 4 Scala and 4 Python kernels before exceeding the queue's AM resource limit; at that point, 41 of 44 GB of available memory is in use. On Elyra, I am able to start only 2 Scala and 2 Python kernels before exceeding the queue's AM resource limit, with just 25 of 44 GB of available memory in use. I think there's some configuration on Elyra that I need to change to allow more of the available memory to be used; right now I cannot start new kernels on Elyra once more than half of the memory has been used up.
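For reference, the cap being hit here is presumably the capacity scheduler's per-queue ApplicationMaster limit (`yarn.scheduler.capacity.maximum-am-resource-percent`, which defaults to 0.1); raising it on the queue should let more AMs, and hence more kernels, start. A minimal sketch of the arithmetic, where the queue size matches this test but the per-AM container size (driver memory plus Spark's 384 MB minimum AM overhead) is an assumption for illustration:

```python
# Sketch of the capacity scheduler's per-queue AM limit check.
# Numbers are illustrative, not taken from the actual cluster config.
QUEUE_CAPACITY_GB = 44          # memory available to the default queue in this test
MAX_AM_PERCENT = 0.1            # yarn.scheduler.capacity.maximum-am-resource-percent (default)
AM_CONTAINER_GB = 1.0 + 0.384   # 1g driver + assumed 384 MB minimum AM overhead

am_limit_gb = QUEUE_CAPACITY_GB * MAX_AM_PERCENT

def can_start_kernel(running: int) -> bool:
    """True if one more ApplicationMaster still fits under the queue's AM limit."""
    return (running + 1) * AM_CONTAINER_GB <= am_limit_gb

n = 0
while can_start_kernel(n):
    n += 1
print(f"AM limit {am_limit_gb:.1f} GB -> roughly {n} concurrent kernels")
```

Under those assumed numbers, the AM limit (4.4 GB) is exhausted after only a few 1g drivers even though most of the queue's memory is still free, which matches the behavior described above.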
Ran 9 Toree Scala kernels on the Elyra JKG node and 9 Spark Scala kernels on the JNBG node, and observed very similar resource usage on the gateway nodes (9.32 GB for Elyra, 9.69 GB for JNBG). Now running a similar test with the YARN cluster kernels on Elyra JKG to get a comparison. Will also work on timing kernel startups.
Timed kernel startups on the Elyra JKG node. For the YARN cluster kernels, I included the intermediate times between kernel start and YARN assignment, and between YARN assignment and the kernel reaching the running state. Did 2 trial runs each and averaged the results. Scala YARN cluster mode takes about 14 seconds longer than Python YARN cluster mode, and Scala YARN cluster mode kernels take about 5 seconds longer than regular Toree Scala YARN client mode kernels.
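The end-to-end times were taken by timing kernel-start requests against the gateway's REST API (the intermediate YARN-assignment splits came from the logs). A rough sketch of that harness, where the gateway URL and kernelspec names are placeholders for our test cluster:

```python
# Rough timing harness for kernel starts against the gateway's REST API.
# GATEWAY and the kernelspec names below are hypothetical placeholders.
import time
import requests

GATEWAY = "http://gateway-node:8888"
KERNELS = ["spark_scala_yarn_cluster", "spark_python_yarn_cluster"]
TRIALS = 2

for name in KERNELS:
    samples = []
    for _ in range(TRIALS):
        start = time.time()
        resp = requests.post(f"{GATEWAY}/api/kernels", json={"name": name})
        resp.raise_for_status()
        samples.append(time.time() - start)
        # Shut the kernel down so trials don't compete for YARN resources.
        requests.delete(f"{GATEWAY}/api/kernels/{resp.json()['id']}")
    print(f"{name}: avg {sum(samples) / len(samples):.1f}s over {TRIALS} trials")
```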
toree_scala_yarn_client_stats.xlsx Tested memory usage of 9 Apache Toree Scala YARN client kernels as mentioned two comments above. Afterwards tested 9 Scala YARN cluster mode kernels on Elyra JKG to get a comparison. The Elyra JKG node had only one process running (the kernel gateway), plus a bunch of defunct Java processes that were not using any memory, probably left over from the old JVMs. The memory footprint on the Elyra JKG node was 0.0617 GB, showing that with enough YARN memory, Elyra can support more kernels running simultaneously than JNBG.
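For anyone reproducing the footprint numbers: a sketch of the tally method, summing RSS over the gateway-related process tree and skipping the defunct entries (zombies hold no memory). The process-name matching here is an assumption for illustration, not the exact script used:

```python
# Sum RSS of gateway-related processes on the JKG node, skipping zombies.
import psutil

total_rss = 0
for proc in psutil.process_iter(["cmdline", "status", "memory_info"]):
    try:
        if proc.info["status"] == psutil.STATUS_ZOMBIE:
            continue  # defunct Java processes from exited JVMs hold no memory
        cmd = " ".join(proc.info["cmdline"] or [])
        if "kernel_gateway" in cmd or "toree" in cmd.lower():
            total_rss += proc.info["memory_info"].rss
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue

print(f"gateway-node footprint: {total_rss / 1024**3:.4f} GB")
```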
Based on the comparison above, decided to compare kernel loads on IOP clusters due to JNBG compatibility.
Finished testing with 15 YARN nodes: able to run 54-55 kernels on Elyra for both kernel types, while on JNBG I was able to run 48 Python kernels and 28 Scala kernels. Planning to test with 12, 9, 6, and 3 YARN nodes.
Finished testing kernel loads for both YARN modes. For the Python kernels, with 12 YARN nodes or fewer, the maximum kernel load shows the same linear growth with the number of YARN nodes in both modes. At 15 YARN nodes, the maximum kernel load in YARN client mode is limited because the gateway node fills up with drivers. For the Scala kernels, with fewer than 6 YARN nodes, the same linear growth in kernel load with YARN nodes is observed. Above 6 nodes, the maximum kernel load for YARN client mode is capped at 28, while YARN cluster mode keeps increasing its maximum kernel load linearly with YARN nodes. At 15 YARN nodes, YARN cluster mode can support almost twice the maximum kernel load of YARN client mode. The difference between the two deployment modes is much larger for the Scala kernels than for the Python kernels, probably because there's a larger JVM involved in starting Toree kernels, which we were able to kill using Zee's Spark opt.
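The shape of those curves is consistent with a simple capacity model: in client mode every driver JVM lands on the gateway node, so the gateway's memory becomes the bottleneck once the cluster is large enough, while in cluster mode drivers are spread across the YARN nodes and capacity keeps growing with node count. A back-of-the-envelope sketch, where all sizes are hypothetical placeholders rather than measured values:

```python
# Schematic capacity model for the two deployment modes (hypothetical sizes).
GATEWAY_MEM_GB = 32   # memory on the gateway node
NODE_MEM_GB = 16      # YARN memory per worker node
DRIVER_GB = 1.0       # per-kernel driver JVM
EXECUTOR_GB = 1.0     # per-kernel executor

def max_kernels(nodes: int, mode: str) -> int:
    cluster_mem = nodes * NODE_MEM_GB
    if mode == "client":
        # Drivers all live on the gateway node; executors live on the cluster.
        return int(min(GATEWAY_MEM_GB / DRIVER_GB, cluster_mem / EXECUTOR_GB))
    # Cluster mode: drivers and executors both draw from YARN memory.
    return int(cluster_mem / (DRIVER_GB + EXECUTOR_GB))

for n in (3, 6, 9, 12, 15):
    print(n, max_kernels(n, "client"), max_kernels(n, "cluster"))
```

Under these placeholder sizes, both modes grow linearly with node count until client mode flat-lines at the gateway's capacity, which is the pattern seen in the measurements; a heavier per-kernel driver (as with Toree's Scala launcher JVM) pulls that flat-line down, hence the bigger gap for Scala.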
Updated the PowerPoint slides with the graph from the spreadsheet.