Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added one internal magic to enable retry of session creation. #716

Merged
merged 3 commits into from
Jun 8, 2021

Conversation

edwardps
Copy link
Contributor

@edwardps edwardps commented Jun 4, 2021

Description
Fix two issues:

  1. the self.session_started[1] is set as true too early for exceptional case.
  2. the session is added too early[2] before session is actually started.

Enhanced the magic kernel by adding one internal magic to allow retry session
creation. Currently, the session creation error is treated as fatal error which
make it impossible[3] to retry after fixing the endpoint info.

[1] -

self.session_started = True

[2] -
self.session_manager.add_session(name, session)

[3] -

The session leaking consideration:
After the change to only add a successfully started session to session manager, it's possible to have a leaking session when session start runs into timeout issue on client side but the session is actually created on Livy side(thanks to comment from @devstein). For example, in session start() logic, we could have timeout exception after post a session to Livy service. In order to address it, we will have session cleanup when initial session start() throws exception and session status is not in initial state.

[4] - https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/livyclientlib/livysession.py#L151

Please see the movitivation for our use cases.

Motivation
To enable user to choose an Spark cluster and connect to it at runtime(means the
sparkmagic configuration is not generated in advance), we need to configure the endpoint
after kernel starting and enable retry session creation after fixing the incorrect endpoint.

We discussed this with Alejandro(aggFTW@) and he suggested to leverage the following do_not_call*
methods to configure the endpoint dynamically and create session after specifying configuration.

  1. _do_not_call_delete_session
  2. _do_not_call_change_endpoint
  3. _do_not_call_start_session

To conditionally allow the session creation retry, the changes introduced a flag
to allow retry on previous fatal of session creation. By default, the behavior
is kept unchanged and do not allow retry on fatal.

Testing Done

  1. Added unit test.
  2. Also ran the code in a notebook to verify.

Backwards Compatibility Criteria (if any)
The enhancement is activated conditionally and no compatibility issue is expected.

Checklist

  • Wrote a description of my changes above
  • Added a bullet point for my changes to the top of the CHANGELOG.md file
  • Added or modified unit tests to reflect my changes
  • Manually tested with a notebook
  • [-] If adding a feature, there is an example notebook and/or documentation in the README.md file (N/A as no user visible feature added. The internal magic is added for kernel integration purpose)

**Description**
Fix two issues:
1. the self.session_started[1] is set as true too early for exceptional case.
2. the session is added too early[2] before session is actually started.

Enhanced the magic kernel by adding one internal magic to allow retry session
creation. Currently, the session creation error is treated as fatal error which
make it impossible[3] to retry after fixing the endpoint info.

[1] https://github.com/jupyter-incubator/sparkmagic/blob/bfabbb39a0249197c2c05c8efe681710fff9151b/sparkmagic/sparkmagic/kernels/kernelmagics.py#L357
[2] https://github.com/jupyter-incubator/sparkmagic/blob/bfabbb39a0249197c2c05c8efe681710fff9151b/sparkmagic/sparkmagic/livyclientlib/sparkcontroller.py#L90
[3] https://github.com/jupyter-incubator/sparkmagic/blob/bfabbb39a0249197c2c05c8efe681710fff9151b/sparkmagic/sparkmagic/kernels/kernelmagics.py#L350

Please see the movitivation for our use cases.
**Motivation**
To enable user to choose an Spark cluster and connect to it at runtime(means the
sparkmagic configuration is not generated in advance), we need to configure the endpoint
after kernel starting and enable retry session creation after fixing the incorrect endpoint.

We discussed this with Alejandro(aggFTW@) and he suggested to leverage the following _do_not_call_*
methods to configure the endpoint dynamically and create session after specifying configuration.
1.    _do_not_call_delete_session
2.    _do_not_call_change_endpoint
3.    _do_not_call_start_session

To conditionally allow the session creation retry, the changes introduced a flag
to allow retry on previous fatal of session creation. By default, the behavior
is kept unchanged and do not allow retry on fatal.

**Testing Done**
1. Added unit test.
2. Also ran the code in a notebook to verify.

**Backwards Compatibility Criteria (if any)**
The enhancement is activated conditionally and no compatibility issue is expected.
Copy link
Collaborator

@devstein devstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edwardps Thank you for the well-written PR 🥇 Is this also meant to address #709?

@edwardps
Copy link
Contributor Author

edwardps commented Jun 5, 2021

Hi Devin,

@edwardps Thank you for the well-written PR 🥇 Is this also meant to address #709?

No exactly, our case is kind of overlapping with original issue reported by 709 but not exactly the same.
Issue 709 can be addressed if it's fine to have python code communicating with kernel to invoke the do_not_call* methods like below.

%%local
# set endpoint
change_endpoint_internal_magic = get_ipython().find_line_magic('_do_not_call_change_endpoint')
change_endpoint_internal_magic('-s http://host:8998 -t Basic_Access -u billy -p welcome123')

# start session
start_session_internal_magic = get_ipython().find_cell_magic('_do_not_call_start_session')
start_session_internal_magic()

session.start()
self.session_manager.add_session(name, session)
Copy link
Collaborator

@devstein devstein Jun 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edwardps Can you think of any unintended side-effects of only adding the session after it's started? I agree this change makes sense, but it's been this way for many years and I don't know if there are historical reasons for the ordering

cc @itamarst

Copy link
Contributor Author

@edwardps edwardps Jun 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devstein This is a fair callout. If there is timeout issue[1], we might have session leaking. So we can clean up the session for edge case caused by e.g. timeout.
Any suggestion?

        try:
            session.start()
        except:
            if session.is_posted(): session.delete() # is_posted() is decided by self.status != constants.NOT_STARTED_SESSION_STATUS
            raise
        else:
            self.session_manager.add_session(name, session)

[1] - https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/livyclientlib/livysession.py#L151

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Devin,

Made a code fix for the session leaking issue. Please let me know if you have any suggestion or comments.
edwardps@b6c89eb

Will update the PR after confirming the solution.

Thanks,
Ed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ed, thanks for the quick turnaround. The solution looks good to me. Can you add a comment to explain the logic (avoid leaking sessions)?

Also, cc @aggFTW as he was the last to edit this logic and refactored it. @aggFTW Do you see any issues with changing this logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. I will add more details in the comment when update this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added info about the session leaking edge case handling in the PR description and git submit.

@devstein devstein requested a review from aggFTW June 8, 2021 01:09
**Description**
When session start() throws exception, session delete() is invoked when
session status is not in NOT_STARTED_SESSION_STATUS to clean up resource.
This makes it safe to add the session to session manager only when session
is started properly.

This change is to address the edge case which could cause leaking session.
From client side we could see timeout issue but actually the session is
already created on Livy side.

**Testing Done**
Added unit test cases.
Also tested in a notebook.
@devstein devstein merged commit 56c6b42 into jupyter-incubator:master Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants