Auto-enable kernel session persistence if availability mode is set
kevin-bates committed Jun 2, 2022
1 parent 86ed531 commit 464f59d
Showing 3 changed files with 28 additions and 26 deletions.
16 changes: 9 additions & 7 deletions docs/source/operators/config-availability.md
@@ -2,6 +2,10 @@

Enterprise Gateway can be optionally configured in one of two "availability modes": _single-instance_ or _multi-instance_. When configured, Enterprise Gateway can recover from failures and reconnect to any active remote kernels that were previously managed by the terminated EG instance. As such, both modes require that kernel session persistence also be enabled via `KernelSessionManager.enable_persistence=True`.

+```{note}
+Kernel session persistence will be automatically enabled whenever availability mode is configured.
+```
+
```{caution}
**Availability modes and kernel session persistence should be considered experimental!**
@@ -16,7 +20,7 @@ We hope to address these in future releases (depending on demand).

_Single-instance availability_ assumes that, upon failure of the original EG instance, another EG instance will be started. Upon startup of the second instance (following the termination of the first), EG will attempt to load and reconnect to all kernels that were deemed active when the previous instance terminated. This mode is somewhat analogous to the classic HA/DR mode of _active-passive_ and is typically used when node resources are at a premium or the number of replicas (in the Kubernetes sense) must remain at 1.

-To configure Enterprise Gateway for 'single-instance' availability, you must first enable session persistence as noted above and configure `EnterpriseGatewayApp.availability_mode=single-instance` or set env `EG_AVAILABILITY_MODE=single-instance`.
+To enable Enterprise Gateway for 'single-instance' availability, configure `EnterpriseGatewayApp.availability_mode=single-instance` or set env `EG_AVAILABILITY_MODE=single-instance`.

Here's an example for starting Enterprise Gateway with single-instance availability:

@@ -27,7 +31,6 @@ LOG=/var/log/enterprise_gateway.log
PIDFILE=/var/run/enterprise_gateway.pid

jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
---KernelSessionManager.enable_persistence=True \
--EnterpriseGatewayApp.availability_mode=single-instance > $LOG 2>&1 &

if [ "$?" -eq 0 ]; then
@@ -47,7 +50,7 @@ Configuring client affinity is **strongly recommended**, otherwise functionality

In this mode, when one node goes down, the subsequent request will be routed to a different node that doesn't know about the kernel. Prior to returning a `404` (not found) status code, EG will check its persisted store to determine if the kernel was managed and, if so, attempt to "hydrate" a `KernelManager` instance associated with the remote kernel. (Of course, if the kernel was running local to the downed server, chances are it cannot be _revived_.) Upon successful "hydration" the request continues as if on the originating node. Because _client affinity_ is in place, subsequent requests should continue to be routed to the "servicing node".

-To configure Enterprise Gateway for 'multi-instance' availability, you must first enable session persistence as noted above and configure `EnterpriseGatewayApp.availability_mode=multi-instance` or set env `EG_AVAILABILITY_MODE=multi-instance`.
+To enable Enterprise Gateway for 'multi-instance' availability, configure `EnterpriseGatewayApp.availability_mode=multi-instance` or set env `EG_AVAILABILITY_MODE=multi-instance`.

```{attention}
To preserve backwards compatibility, if only kernel session persistence is enabled via `KernelSessionManager.enable_persistence=True`, the availability mode will be automatically configured to 'multi-instance' if `EnterpriseGatewayApp.availability_mode` is not configured.
@@ -62,7 +65,6 @@ LOG=/var/log/enterprise_gateway.log
PIDFILE=/var/run/enterprise_gateway.pid

jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --log-level=DEBUG \
---KernelSessionManager.enable_persistence=True \
--EnterpriseGatewayApp.availability_mode=multi-instance > $LOG 2>&1 &

if [ "$?" -eq 0 ]; then
@@ -75,20 +77,20 @@ fi
## Kernel Session Persistence

```{attention}
-Due to its experimental nature, kernel session persistence is disabled by default. To enable this functionality, you must configure `KernelSessionManager.enable_persistence=True`.
+Due to its experimental nature, kernel session persistence is disabled by default. To enable this functionality, you must configure `KernelSessionManager.enable_persistence=True` or configure `EnterpriseGatewayApp.availability_mode` to either `single-instance` or `multi-instance`.
```

As noted above, the availability modes rely on the persisted information relative to the kernel. This information consists of the arguments and options used to launch the kernel, along with its connection information. In essence, it consists of any information necessary to re-establish communication with the kernel.

Kernel session persistence is unique to Enterprise Gateway and consists of a _bring-your-own_ model whereby subclasses of `KernelSessionManager` can be configured that manage their own persistent storage of kernel sessions. By default, Enterprise Gateway provides a `FileKernelSessionManager` that reads and writes kernel session information to a pre-configured directory. For use with `availability_mode` it is presumed that directory resides in a location accessible by all applicable nodes running Enterprise Gateway.

```{note}
-This option can also be set on subclasses of `KernelSessionsManager` (e.g., `FileKernelSessionManager.enable_persistence=True`).
+This option can also be set on subclasses of `KernelSessionManager` (e.g., `FileKernelSessionManager.enable_persistence=True`).
```

By default, the directory used to store a given kernel's session information is the `JUPYTER_DATA_DIR`. This location can be configured using `FileKernelSessionManager.persistence_root` with a value of a fully-qualified path to an existing directory.

-To introduce a different implementation, you must configure the kernel session manager class. Here's an example for starting Enterprise Gateway using a custom `KernelSessionManager` and 'single-instance' availability:
+To introduce a different implementation, you must configure the kernel session manager class. Here's an example for starting Enterprise Gateway using a custom `KernelSessionManager` and 'single-instance' availability. Note that setting `--MyCustomKernelSessionManager.enable_persistence=True` is not necessary because an availability mode is specified, but it is displayed here for completeness:

```bash
#!/bin/bash
3 changes: 1 addition & 2 deletions docs/source/operators/config-cli.md
@@ -108,8 +108,7 @@ EnterpriseGatewayApp(EnterpriseGatewayConfigMixin, JupyterApp) options
Default: ''
--EnterpriseGatewayApp.availability_mode=<CaselessStrEnum>
Specifies the type of availability. Values must be one of "single-instance"
-or "multi-instance". Configuration of this option requires that
-KernelSessionManager.enable_persistence is True. (EG_AVAILABILITY_MODE env var)
+or "multi-instance". (EG_AVAILABILITY_MODE env var)
Choices: any of ['single-instance', 'multi-instance'] (case-insensitive) or None
Default: None
--EnterpriseGatewayApp.base_url=<Unicode>
35 changes: 18 additions & 17 deletions enterprise_gateway/enterprisegatewayapp.py
@@ -20,7 +20,6 @@
from jupyter_server.utils import url_path_join
from tornado import httpserver, web
from tornado.log import enable_pretty_logging
-from traitlets import TraitError
from traitlets.config import Configurable
from zmq.eventloop import ioloop

@@ -136,24 +135,26 @@ def init_configurables(self):

         # For B/C purposes, check if session persistence is enabled. If so, and availability
         # mode is not enabled, go ahead and default availability mode to 'multi-instance'.
-        if self.kernel_session_manager.enable_persistence and self.availability_mode is None:
-            self.availability_mode = "multi-instance"
-            self.log.info(
-                f"Kernel session persistence is enabled but availability mode is not. "
-                f"Setting EnterpriseGatewayApp.availability_mode to '{self.availability_mode}'."
-            )
-
-        if self.availability_mode is not None:
-            if self.kernel_session_manager.enable_persistence is False:
-                raise TraitError(
-                    f"Availability mode is configured as '{self.availability_mode}', "
-                    f"yet kernel session persistence has not been enabled. Configure "
-                    f"KernelSessionManager.enable_persistence and restart Enterprise Gateway."
+        if self.kernel_session_manager.enable_persistence:
+            if self.availability_mode is None:
+                self.availability_mode = "multi-instance"
+                self.log.info(
+                    f"Kernel session persistence is enabled but availability mode is not. "
+                    f"Setting EnterpriseGatewayApp.availability_mode to '{self.availability_mode}'."
                 )
+        else:
+            # Persistence is not enabled, check if availability_mode is configured and, if so,
+            # auto-enable persistence
+            if self.availability_mode is not None:
+                self.kernel_session_manager.enable_persistence = True
+                self.log.info(
+                    f"Availability mode is set to '{self.availability_mode}' yet kernel session "
+                    "persistence is not enabled. Enabling kernel session persistence."
+                )

-            # If we're using single-instance availability, attempt to start persisted sessions
-            if self.availability_mode == "single-instance":
-                self.kernel_session_manager.start_sessions()
+        # If we're using single-instance availability, attempt to start persisted sessions
+        if self.availability_mode == "single-instance":
+            self.kernel_session_manager.start_sessions()

         self.contents_manager = None  # Gateways don't use contents manager
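
Reviewer note: the reconciliation this hunk introduces between the two settings can be sketched in isolation as below. This is a minimal illustration, not the project's code: `Settings` and `reconcile` are hypothetical names standing in for the `enable_persistence` and `availability_mode` traits, and the log messages are paraphrased.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("availability_sketch")


@dataclass
class Settings:
    """Hypothetical stand-in for the two configurable values touched by the commit."""

    enable_persistence: bool = False
    availability_mode: Optional[str] = None


def reconcile(settings: Settings) -> Settings:
    """Apply the defaulting rules from the hunk above to a Settings object."""
    if settings.enable_persistence:
        # Backwards compatibility: persistence alone implies 'multi-instance'.
        if settings.availability_mode is None:
            settings.availability_mode = "multi-instance"
            log.info("Persistence enabled without a mode; using 'multi-instance'.")
    elif settings.availability_mode is not None:
        # New behavior: a configured mode turns persistence on instead of raising.
        settings.enable_persistence = True
        log.info("Mode '%s' set; enabling persistence.", settings.availability_mode)
    return settings
```

With neither option set, nothing changes; setting either one alone yields a consistent pair, which appears to be the intent of replacing the earlier `TraitError`.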
