Poll prior to RUNNING state #71

kevin-bates · 2017-07-13T23:11:31Z

These changes address Issue #69 and are related to Issue #64.
To reduce the time-to-connect following a yarn-cluster kernel's launch, we now start polling for connection information as soon as host assignment occurs. This lets launchers create the connection information before creating the spark session, which allows the kernel to reach a connected state as soon as possible.
Other changes:

Decreased the wait time on each iteration from 1 second to .5 seconds to allow for more frequent polling (mostly for pull mode).
Increased the interruptible socket timeout from 1 second to 5 seconds so as to optimize socket mode.
Combined yarn queries into one to eliminate an extra query on each iteration until host assignment occurs.
Refactored code to specific methods for each of the connection file access methods.

To improve kernel startup times, we should attempt to get connection information from the assigned host as soon as the host is known. Connection information could be made available even before Yarn has placed the application into RUNNING state, since spark-submit jobs require a spark session for that state transition to occur. Attempting to get that information early also lets us look at possible asynchronous session creation, etc. I also took this opportunity to refactor the pull and socket code into local methods so as to clean up the application startup logic. This included reformatting debug messages so that useful information is visible more quickly.
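The early-polling idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `wait_for_connection_info`, `query_app`, and `fetch_connection_info` are hypothetical names, and the sketch assumes a single combined Yarn query that returns both the application state and the assigned host.

```python
import time

# Yarn application states after which no connection information will arrive.
FINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def wait_for_connection_info(query_app, fetch_connection_info, timeout=30.0,
                             poll_interval=0.5, sleep=time.sleep,
                             clock=time.monotonic):
    """Poll for connection info as soon as a host is assigned.

    query_app() -> (state, host): hypothetical combined Yarn query; host is
        None until Yarn has assigned one.
    fetch_connection_info(host) -> dict or None: hypothetical pull/socket
        access method; returns None while the info is not yet available.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        state, host = query_app()  # one query per iteration
        if state in FINAL_STATES:
            raise RuntimeError("application ended in state %s" % state)
        # Do not wait for RUNNING: try as soon as the host is known.
        if host is not None:
            info = fetch_connection_info(host)
            if info is not None:
                return info
        sleep(poll_interval)
    raise TimeoutError("no connection info within %.1f seconds" % timeout)
```

The key point is that the loop never checks for RUNNING; it only needs a host, so a launcher that publishes connection information before creating its spark session can be reached while the application is still in an earlier state.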

Decreased the non-interruptible sleep timeout from 1 to .5 seconds so that we check the status of connection information more frequently. Increased the interruptible socket timeout from 1 to 5 seconds to optimize socket mode operations, since the call is interrupted as soon as the data arrives. This way we spend less time in waits that cannot be interrupted. Also performed a little more refactoring to cut down on the number of Yarn queries performed each iteration.
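To illustrate why a longer socket timeout is cheap, here is a minimal sketch of an interruptible wait in socket mode. It is an assumption-laden example rather than the PR's implementation: `receive_connection_info` is a hypothetical helper, and it assumes the launcher connects to a listening socket, sends the connection information as JSON, and closes the connection.

```python
import json
import socket


def receive_connection_info(server_sock, timeout=5.0):
    """Wait up to `timeout` seconds for a launcher to send connection info.

    accept() returns as soon as a launcher connects, so the timeout is only
    an upper bound. Returns None if nothing arrived in time.
    """
    server_sock.settimeout(timeout)
    try:
        conn, _ = server_sock.accept()
    except socket.timeout:
        return None  # caller can check application state and try again
    with conn:
        conn.settimeout(timeout)
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # sender closed the connection
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

Because the blocking call ends the moment data arrives, raising the timeout from 1 to 5 seconds does not delay the successful case; it only reduces how often the loop falls through to the fixed sleep.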

LK-Tmac1 · 2017-07-14T16:21:59Z

It's cleaner now that the pull/socket mode handling has been refactored out of confirm_yarn_application_startup.

LGTM. Merge it now.

kevin-bates added 2 commits July 12, 2017 15:27

kevin-bates requested a review from LK-Tmac1 July 13, 2017 23:11

kevin-bates self-assigned this Jul 13, 2017

kevin-bates added this to the Sprint 5 milestone Jul 13, 2017

This was referenced Jul 13, 2017

Improve time-to-connect times for yarn-cluster kernels #69

Closed

Investigate async creation of spark session in launchers #64

Closed

kevin-bates removed this from the Sprint 5 milestone Jul 14, 2017

LK-Tmac1 merged commit 1f4c6fb into jupyter-server:elyra Jul 14, 2017

kevin-bates deleted the early-polling branch July 14, 2017 17:19
