Poll prior to RUNNING state #71

Merged
merged 2 commits into from Jul 14, 2017

kevin-bates
Member

These changes address Issue #69 and are related to Issue #64.
To reduce the time-to-connect duration following a yarn-cluster kernel's launch, we now start polling for connection information as soon as host assignment occurs. This way, launchers can create the connection information prior to creating the Spark session, which allows the kernel to reach a connected state as soon as possible.
Other changes:

  1. Decreased the wait time on each iteration from 1 second to 0.5 seconds to allow for more frequent polling (mostly for pull mode).
  2. Increased the interruptible socket timeout from 1 second to 5 seconds to optimize socket mode.
  3. Combined the Yarn queries into one to eliminate an extra query on each iteration until host assignment occurs.
  4. Refactored the code into a separate method for each of the connection file access methods.
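The polling behavior described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `wait_for_connection_info`, `query_app`, and `fetch_connection_info` are hypothetical names, and the sketch assumes a single combined query returns both the Yarn application state and the assigned host.

```python
import time

POLL_INTERVAL = 0.5  # seconds; this PR lowers the per-iteration wait from 1 second


def wait_for_connection_info(query_app, fetch_connection_info, max_wait=30.0,
                             poll_interval=POLL_INTERVAL, sleep=time.sleep):
    """Poll for connection info as soon as a host is assigned.

    query_app: hypothetical callable returning (state, host) from ONE query.
    fetch_connection_info: hypothetical callable(host) returning a dict,
        or None if the launcher has not produced the information yet.
    """
    waited = 0.0
    while waited < max_wait:
        state, host = query_app()  # one combined query per iteration
        if state in ("FINISHED", "FAILED", "KILLED"):
            raise RuntimeError("application ended in state %s" % state)
        if host:
            # Host is known: try for connection info even though the
            # application may not have reached RUNNING yet.
            info = fetch_connection_info(host)
            if info is not None:
                return info
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError("no connection info after %.1f seconds" % max_wait)
```

The point of the sketch is that the loop does not gate on the RUNNING state; it only needs a host before it starts asking for connection information.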

In order to look into improving kernel startup times, we should attempt
to get connection information from the assigned host as soon as the
host is known.  This is because connection information could be made
available even before Yarn has placed the application into RUNNING state
since spark-submit jobs require a spark session for that state transition
to occur.  By attempting to get that information, we have the ability
to look at possible asynchronous session creation, etc.

I also took this opportunity to refactor the pull and socket code into
local methods so as to clean up the application startup logic.  This
included reformatting debug messages such that useful information is
more quickly visible.
Decreased the non-interruptible sleep timeout from 1 to 0.5 seconds so
that we check the status of connection information more frequently.

Increased the interruptible socket timeout from 1 to 5 seconds so that we
optimize socket mode operations since the method will be interrupted
when the data arrives.  This way we don't spend time waiting on
non-interrupted calls.
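
As a rough illustration of why a longer socket timeout costs nothing when data arrives early, here is a minimal sketch. It is not the code from this PR: `receive_connection_info` is a hypothetical helper, and it assumes the launcher connects to a listening socket and sends the connection information as JSON before closing.

```python
import json
import socket


def receive_connection_info(server_sock, timeout=5.0):
    """Wait up to `timeout` seconds for a launcher to send connection info.

    Returns the decoded dict, or None if nothing arrived in time.
    accept() returns as soon as a client connects, so the timeout is only
    an upper bound on the wait, not a fixed delay.
    """
    server_sock.settimeout(timeout)
    try:
        conn, _addr = server_sock.accept()
        with conn:
            conn.settimeout(timeout)
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:  # sender closed the connection
                    break
                chunks.append(data)
    except socket.timeout:
        return None
    return json.loads(b"".join(chunks).decode("utf-8"))
```

In this sketch the caller can fall back to its regular 0.5 second sleep whenever `None` is returned.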

Also performed a little more refactoring to cut down on the number of
Yarn queries performed each iteration.
@kevin-bates kevin-bates self-assigned this Jul 13, 2017
@kevin-bates kevin-bates added this to the Sprint 5 milestone Jul 13, 2017
@kevin-bates kevin-bates removed this from the Sprint 5 milestone Jul 14, 2017
@LK-Tmac1
Contributor

The code is cleaner now that the pull/socket mode handling has been refactored out of confirm_yarn_application_startup.

LGTM. Merge it now.

@LK-Tmac1 LK-Tmac1 merged commit 1f4c6fb into jupyter-server:elyra Jul 14, 2017
@kevin-bates kevin-bates deleted the early-polling branch July 14, 2017 17:19