Ways to tackle public access #55
Replies: 10 comments 6 replies
-
Option 1: Exposing JupyterHub directly to the internet

This is likely to be a non-starter, but it is worth documenting anyway. The idea is that we create a load balancer and expose JupyterHub directly to the internet:

```mermaid
flowchart LR
InternalUser([Internal User]) --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph azpublic [Azure: Publicly Accessible Resources]
Entra
LB
end
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
subgraph AzureSpokeProdVNet[K8s Cluster]
Jupyter[Jupyter Hub] --> ATLAS[OHDSI Atlas]
end
end
AzureHubVNET --> Jupyter
ExternalUser([External User]) --> |Browser| LB[Public Load Balancer] --> Jupyter
InternalUser --> Entra
ExternalUser --> Entra
```
The user logs in via their own browser and is taken directly to JupyterHub, which exposes the other services. This relies entirely on authentication, and on users choosing not to transfer data out of the environment.

Scenario Assessment

- A user downloads files/data to their local machine: a user can download any files or data that are accessible to their browser. Score: 0
- A user downloads files/data to an intermediate VM: this does not apply, as there is no intermediate VM. Score: N/A
- A user transfers files from one workspace to another: a user cannot do this from within a workspace, but they can simply download a file to their local machine while logged into one workspace, log out, and log back into another workspace from the same machine to upload it. Score: 1
- A user copies and pastes files/data out of the environment via remote desktop or other means: a user can copy and paste anything displayed in their browser. Score: 0
- A user screenshots data: a user can take screenshots of data on their local machine. Score: 0

Assessment: This option scores 1, meaning that it does little to address the scenarios laid out in #54.
-
Option 2: Bastion or Azure Virtual Desktop to a virtual machine (or collection of virtual machines)

In this option, we provide a collection of virtual machines for use by external users. Once on a virtual machine, the user can access the products on the LSC-SDE via its browser.

```mermaid
flowchart LR
InternalUser([Internal User]) --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph azpublic [Azure: Publicly Accessible Resources]
Entra
LB
end
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
subgraph AzureSpokeProdVNet[K8s Cluster]
Jupyter[Jupyter Hub] --> ATLAS[OHDSI Atlas]
end
subgraph AzureVD[External User VNET]
DevVM[Virtual Desktops]
end
end
AzureHubVNET --> Jupyter
DevVM --> |Browser| Jupyter
ExternalUser([External User]) --> |Browser| LB[Azure Virtual Desktop Endpoint] --> |RDP|DevVM
InternalUser --> Entra
ExternalUser --> Entra
```
The remote desktop will be locked down so that users cannot connect local resources, including disks, printers or their clipboard. The VM will also have no persistent storage and will be deprovisioned every night, both to save money and to ensure that any data on the machine is only stored temporarily.

Scenario Assessment

- A user downloads files/data to their local machine: a user cannot directly download data to their local machine, as the remote desktop connection acts as a barrier. Score: 5
- A user downloads files/data to an intermediate VM: there is nothing to stop files or data being downloaded from the workspace to the intermediate VM. Because the VM has no persistent storage and is deprovisioned every night, exposure is limited: the VM only holds downloaded data for a short period of time. Score: 1
- A user transfers files from one workspace to another: a user cannot do this from within a workspace, but they can simply download a file to the virtual machine while logged into one workspace, log out, log back into another workspace from the VM and then upload the data. Score: 1
- A user copies and pastes files/data out of the environment via remote desktop or other means: group policies will prevent attachment of the clipboard or any other local resources, so the user will not be able to copy data out of the environment this way. Score: 5
- A user screenshots data: a user can take screenshots of data on their local machine. Score: 0

Assessment: This option scores 12, a considerable improvement on option 1 that does address some of the scenarios laid out in #54; however, there is still a great deal of room for improvement.
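The "Score: 1" for the intermediate VM rests on how long data can sit on a pooled VM before the nightly teardown. The sketch below bounds that window. It assumes the teardown job runs at 02:00 (a hypothetical hour; the discussion only says "every night") and ignores time zones and DST.

```python
from datetime import datetime, timedelta

# Hypothetical: the hour at which the nightly deprovisioning job runs.
DEPROVISION_HOUR = 2


def next_deprovision(now: datetime, hour: int = DEPROVISION_HOUR) -> datetime:
    """Return the first scheduled teardown strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's run has already happened (or is happening now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def exposure_window(downloaded_at: datetime) -> timedelta:
    """Upper bound on how long data downloaded to a pooled VM can remain on it."""
    return next_deprovision(downloaded_at) - downloaded_at


if __name__ == "__main__":
    # Data pulled onto the VM mid-morning survives until the following night's run.
    print(exposure_window(datetime(2024, 5, 1, 9, 30)))
```

The window can approach 24 hours, which is why nightly deprovisioning only reduces this risk rather than removing it.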
-
Option 3: Azure Virtual Desktop or Bastion with Workspace VNET

This option is similar to option 2 (Bastion or Azure Virtual Desktop to a virtual machine or collection of virtual machines). However, the workspace is provisioned on a virtual desktop in its own dedicated virtual network, and access is limited to the workspace in question. This is essentially an extension of the Jupyter workspace into the Azure virtual network.

```mermaid
flowchart LR
InternalUser([Internal User]) --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph azpublic [Azure: Publicly Accessible Resources]
Entra
LB
end
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
subgraph AzureSpokeProdVNet[K8s Cluster]
Jupyter[Jupyter Hub] --> ATLAS[OHDSI Atlas]
end
subgraph AzureVD[Workspace VNET]
DevVM[Virtual Desktops]
end
AzureVD -.-|Peering| AzureSpokeProdVNet
end
AzureHubVNET --> Jupyter
DevVM --> |Browser| Jupyter
ExternalUser([External User]) --> |Browser| LB[Azure Virtual Desktop Endpoint] --> |RDP|DevVM
InternalUser --> Entra
ExternalUser --> Entra
```
As in option 2, the remote desktop will be locked down so that users cannot connect local resources, including disks, printers or their clipboard. Unlike option 2, however, the virtual machine will allow persistent storage. The VM is a dedicated resource for the workspace, so data on it has not left the workspace, and no other workspaces are accessible from it.

This is significantly more work than option 2. The system will need to provision virtual machines and their supporting virtual networks and peerings automatically, along with the Azure Virtual Desktop resources that expose each virtual machine to the outside world. It would also make the solution heavily reliant on Azure Virtual Desktop functioning correctly.

A forward proxy would need to be exposed to each newly created virtual network. It would handle all browser traffic on the network, allowing us to control access via network policies and NSGs in Azure. Policies will be in place to prevent access to any workspace other than those provisioned.

We will need operators to provision:
Scenario Assessment

- A user downloads files/data to their local machine: a user cannot directly download data to their local machine, as the remote desktop connection acts as a barrier. Score: 5
- A user downloads files/data to an intermediate VM: while a user can download data to the virtual machine, the VM is part of the environment, so this does not matter. Score: 4
- A user transfers files from one workspace to another: network policies will prevent the VM from accessing any workspace other than its own. Score: 5
- A user copies and pastes files/data out of the environment via remote desktop or other means: group policies will prevent attachment of the clipboard or any other local resources, so the user will not be able to copy data out of the environment this way. Score: 5
- A user screenshots data: a user can take screenshots of data on their local machine. Score: 0

Assessment: This option scores 19, many improvements on option 2, and does address some of the scenarios laid out in #54. However, there are concerns about this solution's reliance on specific Azure products.
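The forward proxy's job is to let a workspace VNET reach its own workspace and nothing else. An HTTPS forward proxy that does not intercept TLS only sees the `CONNECT host:port` target, so the decision has to be made on hostnames. The sketch below assumes each workspace is reachable on its own hostname. The `WORKSPACE_HOSTS` table and the `*.lsc-sde.internal` names are hypothetical and are not real endpoints of the environment.

```python
# Hypothetical mapping: workspace id -> the only host its VNET may reach via the proxy.
WORKSPACE_HOSTS = {
    "workspace-a": "workspace-a.jupyter.lsc-sde.internal",
    "workspace-b": "workspace-b.jupyter.lsc-sde.internal",
}


def is_allowed(workspace: str, connect_target: str) -> bool:
    """Decide whether to accept a CONNECT request such as 'host:443'.

    Deny by default: unknown workspaces, malformed targets, other hosts and
    non-HTTPS ports are all rejected.
    """
    allowed_host = WORKSPACE_HOSTS.get(workspace)
    if allowed_host is None:
        return False
    host, sep, port = connect_target.rpartition(":")
    if not sep:
        return False  # no port given
    return host.lower() == allowed_host and port == "443"


if __name__ == "__main__":
    print(is_allowed("workspace-a", "workspace-a.jupyter.lsc-sde.internal:443"))
    print(is_allowed("workspace-a", "workspace-b.jupyter.lsc-sde.internal:443"))
```

In practice, the same rule would also be enforced at the network layer (NSGs), so bypassing the proxy does not open a path to other workspaces.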
-
Option 4: Apache Guacamole with Workspace VNET

This option is similar to option 3. However, rather than using Azure Virtual Desktop, it uses the open source Apache Guacamole.

```mermaid
flowchart TB
InternalUser([Internal User]) --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph azpublic [Azure: Publicly Accessible Resources]
Entra
LB
end
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
subgraph AzureSpokeProdVNet[K8s Cluster]
Jupyter[Jupyter Hub] --> ATLAS[OHDSI Atlas]
Guacamole[Apache Guacamole]
end
subgraph AzureVD[Workspace VNET]
DevVM[Virtual Desktops]
end
AzureSpokeProdVNet -.-|Peering| AzureVD
end
AzureHubVNET --> Jupyter
DevVM --> |Browser| Jupyter
ExternalUser([External User]) --> |Browser| LB[Load Balancer] --> Guacamole --> |RDP/VNC| DevVM
InternalUser --> Entra
ExternalUser --> Entra
```
As in option 3, the remote desktop will be locked down so that users cannot connect local resources, including disks, printers or their clipboard. The solution will also allow persistent storage, because the VM is a dedicated resource for the workspace. Data on it has not left the workspace, and no other workspaces are accessible from it. Also as in option 3, virtual machines and their supporting virtual networks and peerings will need to be provisioned automatically. Where option 3 requires us to maintain Azure Virtual Desktop, this option instead needs an operator to manage the connections in Guacamole's database.

A forward proxy would still need to be exposed to each newly created virtual network. It would handle all browser traffic on the network, allowing us to control access via network policies and NSGs in Azure. Policies will be in place to prevent access to any workspace other than those provisioned.

We will need operators to provision:
Scenario Assessment

- A user downloads files/data to their local machine: a user cannot directly download data to their local machine, as the remote desktop connection acts as a barrier. Score: 5
- A user downloads files/data to an intermediate VM: while a user can download data to the virtual machine, the VM is part of the environment, so this does not matter. Score: 4
- A user transfers files from one workspace to another: network policies will prevent the VM from accessing any workspace other than its own. Score: 5
- A user copies and pastes files/data out of the environment via remote desktop or other means: group policies will prevent attachment of the clipboard or any other local resources, so the user will not be able to copy data out of the environment this way. Score: 5
- A user screenshots data: a user can take screenshots of data on their local machine. Score: 0

Assessment: This option scores 19, the same as option 3, and does address some of the scenarios laid out in #54. Unlike option 3, however, the solution relies on Apache Guacamole rather than Azure Virtual Desktop, so it can be deployed to any environment (though new operators would be needed for other platforms).
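To make "an operator to manage the connections in Guacamole's database" concrete, here is a sketch of the write such an operator might perform for one workspace VM. It uses an in-memory SQLite database with a simplified subset of the tables used by Guacamole's JDBC authentication extension (`guacamole_connection`, `guacamole_connection_parameter`). The real schema has more columns, and users also need rows in the permission tables, which are not shown. `disable-copy`, `disable-paste`, `hostname` and `port` are Guacamole connection parameters. The helper name and the `workspace-<id>` naming scheme are hypothetical.

```python
import sqlite3

# Simplified subset of Guacamole's JDBC auth schema (illustration only).
SCHEMA = """
CREATE TABLE guacamole_connection (
    connection_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    connection_name TEXT NOT NULL UNIQUE,
    protocol        TEXT NOT NULL
);
CREATE TABLE guacamole_connection_parameter (
    connection_id   INTEGER NOT NULL REFERENCES guacamole_connection(connection_id),
    parameter_name  TEXT NOT NULL,
    parameter_value TEXT NOT NULL,
    PRIMARY KEY (connection_id, parameter_name)
);
"""


def upsert_workspace_connection(db, workspace, vm_host, port=3389):
    """Create or refresh the RDP connection for a workspace VM (hypothetical helper)."""
    name = f"workspace-{workspace}"
    params = {
        "hostname": vm_host,
        "port": str(port),
        # Block the clipboard in both directions. Drive and printer redirection
        # stay off because "enable-drive"/"enable-printing" are never set.
        "disable-copy": "true",
        "disable-paste": "true",
    }
    with db:  # single transaction: commit on success, roll back on error
        db.execute(
            "INSERT INTO guacamole_connection (connection_name, protocol) "
            "VALUES (?, 'rdp') ON CONFLICT(connection_name) DO NOTHING",
            (name,),
        )
        (conn_id,) = db.execute(
            "SELECT connection_id FROM guacamole_connection WHERE connection_name = ?",
            (name,),
        ).fetchone()
        # Replace the parameter set wholesale so the row always matches the desired state.
        db.execute("DELETE FROM guacamole_connection_parameter WHERE connection_id = ?", (conn_id,))
        db.executemany(
            "INSERT INTO guacamole_connection_parameter VALUES (?, ?, ?)",
            [(conn_id, key, value) for key, value in params.items()],
        )
    return conn_id


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    print(upsert_workspace_connection(db, "a", "10.1.0.4"))
```

Making the write idempotent suits an operator that reconciles repeatedly. Re-running it with a new VM address updates the existing connection rather than duplicating it.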
-
Option 5: Guacamole with XFCE pods over VNC

This option attempts to address the entire journey using containers rather than virtual machines. This reduces the networking complexity and the need to develop additional controllers, but perhaps shifts the emphasis onto container image development.

```mermaid
flowchart TB
InternalUser([Internal User]) --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph azpublic [Azure: Publicly Accessible Resources]
Entra
LB
end
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
subgraph AzureSpokeProdVNet[K8s Cluster]
Guacamole[Apache Guacamole]
DevVM[Firefox Container]
Jupyter[Jupyter Hub] --> ATLAS[OHDSI Atlas]
end
end
AzureHubVNET --> Jupyter
DevVM --> |Browser| Jupyter
ExternalUser([External User]) --> |Browser| LB[Load Balancer] --> Guacamole --> |VNC| DevVM
InternalUser --> Entra
ExternalUser --> Entra
```
As in options 3 and 4, the remote desktop will be locked down so that users cannot connect local resources, including disks, printers or their clipboard. Unlike options 2, 3 and 4, we will not provision any external virtual machines or virtual networks. Instead, we will run a container image inside Kubernetes. This image will run XFCE (or similar) and a browser such as Firefox, and the browser will point directly at the workspace it is targeting and no others.

Running a virtual desktop via XFCE in a container is experimental, and we may run into compatibility issues with the various products. OHDSI, for example, only officially supports Google Chrome, so we will need to check whether we can run Google Chrome under XFCE in a container.

We will need an operator to provision Guacamole entries based upon the workspace definition. We will also need to develop a reliable container image for the browsers.

Scenario Assessment

- A user downloads files/data to their local machine: a user cannot directly download data to their local machine, as the remote desktop connection acts as a barrier. Score: 5
- A user downloads files/data to an intermediate VM: while a user can potentially download data to the browser container, the container will cease to exist once the connection goes stale, and the data will not persist. Score: 5
- A user transfers files from one workspace to another: network policies will prevent the browser pod from accessing any workspace other than its own. Score: 5
- A user copies and pastes files/data out of the environment via remote desktop or other means: the Guacamole instance will prevent attachment of the clipboard or any other local resources, so the user will not be able to copy data out of the environment this way. Score: 5
- A user screenshots data: a user can take screenshots of data on their local machine. Score: 0

Assessment: This option scores 20, a mild improvement on options 3 and 4. It does address some of the scenarios laid out in #54 and could be deployed into any environment. It is also likely to be the most cost-effective and quickest-to-provision solution, as it does not rely on external VMs. A drawback is that it is highly experimental: the browser/VNC pods may not work correctly with the products, now or in the future, so there are risks to this approach.
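The transfer scenario's "Score: 5" depends on a Kubernetes NetworkPolicy that restricts each browser pod's egress. The sketch below builds one such policy as a plain dict, ready to be serialised to YAML or JSON and applied. The label keys (`lsc-sde/workspace`, `component`) are hypothetical, and the sketch assumes workspace pods carry the same workspace label. Depending on how workspaces are routed, the hub/proxy may also need to be allowed. NetworkPolicy is only enforced if the cluster's network plugin supports it (on AKS, Azure Network Policy Manager or Calico).

```python
import json

# Hypothetical label used to tie browser pods and workspace pods together.
WORKSPACE_LABEL = "lsc-sde/workspace"


def browser_egress_policy(workspace: str, namespace: str) -> dict:
    """Build a NetworkPolicy restricting a browser pod's egress to its own workspace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"browser-egress-{workspace}", "namespace": namespace},
        "spec": {
            # Applies only to this workspace's browser pod(s).
            "podSelector": {"matchLabels": {WORKSPACE_LABEL: workspace, "component": "browser"}},
            # Listing Egress means any egress not matched by a rule below is denied.
            "policyTypes": ["Egress"],
            "egress": [
                # Pods labelled with the same workspace, in any namespace.
                {"to": [{
                    "namespaceSelector": {},
                    "podSelector": {"matchLabels": {WORKSPACE_LABEL: workspace}},
                }]},
                # DNS lookups to any destination.
                {"ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(browser_egress_policy("workspace-a", "browsers"), indent=2))
```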
-
@vvcb @m1p1h Further to our discussion earlier today, I've started this discussion on public access. I've given a high-level overview of the options we discussed, including the ones we've already decided not to pursue, as I think it's important to document the decisions and reasoning. Following the meeting, I think option 4 is the one we want to look at in greater detail, so I will start drawing up plans, but I'm open to further discussion on this.
-
Happy to do so. I already have a limited POC for option 5, which we can use as the basis for it. It is definitely the quicker option if we're happy with the risks.
-
Further to discussions this morning, I am adding another option.

Option 6: Jupyter Driven Guacamole with XFCE pods over VNC

Like option 5, this option attempts to address the entire journey using containers rather than virtual machines, but it will not amend Apache Guacamole to provision containers via Kubernetes. Instead, a second instance of JupyterHub will spin up a pod that creates the browser container and exposes it via a temporary Guacamole instance spun up for the user in question. JupyterHub will therefore control both parts of the journey, which should in theory make it easier to get authentication and authorisation working.

```mermaid
flowchart TB
InternalUser([Internal User]) --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph azpublic [Azure: Publicly Accessible Resources]
Entra
LB
end
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
subgraph AzureSpokeProdVNet[K8s Cluster]
JupyterPF[Public Facing Jupyter Hub]
subgraph Browser Pod
Guacamole[Guacamole Container]
DevVM[Firefox Container]
end
Jupyter[Private Facing Jupyter Hub] --> ATLAS[OHDSI Atlas]
end
end
AzureHubVNET --> Jupyter
DevVM --> |Browser| Jupyter
ExternalUser([External User]) --> |Browser| LB[Load Balancer] --> JupyterPF --> Guacamole --> |VNC| DevVM
InternalUser --> Entra
ExternalUser --> Entra
```
As in options 3 and 4, the remote desktop will be locked down so that users cannot connect local resources, including disks, printers or their clipboard. Unlike options 2, 3 and 4, we will not provision any external virtual machines or virtual networks. Instead, we will run a container image inside Kubernetes. This image will run XFCE (or similar) and a browser such as Firefox, and the browser will point directly at the workspace it is targeting and no others.

Running a virtual desktop via XFCE in a container is experimental, and we may run into compatibility issues with the various products. OHDSI, for example, only officially supports Google Chrome, so we will need to check whether we can run Google Chrome under XFCE in a container.

We will need an operator to provision Guacamole entries based upon the workspace definition. We will also need to develop a reliable container image for the browsers.

Scenario Assessment

- A user downloads files/data to their local machine: a user cannot directly download data to their local machine, as the remote desktop connection acts as a barrier. Score: 5
- A user downloads files/data to an intermediate VM: while a user can potentially download data to the browser container, the container will cease to exist once the connection goes stale, and the data will not persist. Score: 5
- A user transfers files from one workspace to another: network policies will prevent the browser pod from accessing any workspace other than its own. Score: 5
- A user copies and pastes files/data out of the environment via remote desktop or other means: the Guacamole instance will prevent attachment of the clipboard or any other local resources, so the user will not be able to copy data out of the environment this way. Score: 5
- A user screenshots data: a user can take screenshots of data on their local machine. Score: 0

Assessment: This option scores 20, a mild improvement on options 3 and 4. It does address some of the scenarios laid out in #54 and could be deployed into any environment. Compared with option 5 it is essentially equivalent. The main difference is where the development effort is placed: in Apache Guacamole or in JupyterHub.
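The "Browser Pod" in the diagram can be sketched as a single Kubernetes pod manifest. Containers in a pod share a network namespace, so guacd can reach the desktop's VNC server on `localhost`. The pod has no volumes, so anything downloaded inside the desktop disappears with the pod. `guacamole/guacamole` and `guacamole/guacd` are the upstream images, and `GUACD_HOSTNAME` is an environment variable the `guacamole/guacamole` image reads. The desktop image, its `START_URL` variable, the hostnames and the labels are hypothetical. Guacamole's authentication configuration and how JupyterHub spawns this pod are not shown.

```python
# Hypothetical XFCE + Firefox image that runs a VNC server on port 5900.
DESKTOP_IMAGE = "registry.example/lsc-sde/xfce-firefox:latest"
VNC_PORT = 5900


def browser_pod(user: str, workspace: str) -> dict:
    """Build a per-user pod: Guacamole web app + guacd + desktop (illustration only)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # Assumes `user` is already a valid DNS-1123 label.
            "name": f"browser-{user}",
            "labels": {"lsc-sde/workspace": workspace, "component": "browser"},
        },
        "spec": {
            "containers": [
                {   # Web front end the user's browser talks to.
                    "name": "guacamole",
                    "image": "guacamole/guacamole",
                    "env": [{"name": "GUACD_HOSTNAME", "value": "localhost"}],
                    "ports": [{"containerPort": 8080}],
                },
                {   # Protocol daemon that speaks VNC to the desktop container.
                    "name": "guacd",
                    "image": "guacamole/guacd",
                    "ports": [{"containerPort": 4822}],
                },
                {   # Desktop + browser; ideally the image binds VNC to 127.0.0.1 only.
                    "name": "desktop",
                    "image": DESKTOP_IMAGE,
                    "env": [{"name": "START_URL",
                             "value": f"https://{workspace}.jupyter.lsc-sde.internal/"}],
                    "ports": [{"containerPort": VNC_PORT}],
                },
            ],
            # No "volumes" key: nothing in the pod outlives it.
        },
    }


if __name__ == "__main__":
    pod = browser_pod("alice", "workspace-a")
    print([c["name"] for c in pod["spec"]["containers"]])
```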
-
It was great to catch up with you this week! Here's a rough draft of the alternative I mentioned.

Default JupyterHub Proxy

(@minrk @yuvipanda if one of you has time, would you mind checking whether this diagram of the default proxy behaviour is correct?)

```mermaid
sequenceDiagram
Hub-->>ConfigurableHttpProxy: Initialisation: Create route / to hub
actor Alice
rect rgb(255, 255, 192)
Note right of Alice: Get hub home page
Alice->>ConfigurableHttpProxy: GET /
ConfigurableHttpProxy-->>Hub: GET / (Forward to hub)
Hub--)ConfigurableHttpProxy: Return hub homepage
ConfigurableHttpProxy->>Alice: Forward hub homepage
Note right of Alice: If needed login and set cookies
end
rect rgb(192, 255, 255)
Note right of Alice: Start alice's singleuser server
Alice->>ConfigurableHttpProxy: GET /users/alice/spawn (Start user's server)
ConfigurableHttpProxy-->>Hub: Get user server (Forward request to hub), hub creates singleuser server
Hub--)ConfigurableHttpProxy: Create route /users/alice to singleuser-ip:port
Hub--)ConfigurableHttpProxy: Redirect user to /users/alice
ConfigurableHttpProxy->>Alice: Forward redirect /users/alice
end
rect rgb(255, 192, 255)
Note right of Alice: Server started, proxy bypasses hub
Alice->>ConfigurableHttpProxy: GET /users/alice
ConfigurableHttpProxy--)Singleuser-server: Forward /users/alice to singleuser-ip:port
Note right of Singleuser-server: Singleuser server must check user is authenticated<br/>using hub cookies
end
```
In this example, if you wanted a desktop, the singleuser server pod would include the Linux desktop, VNC server, and Guacamole.

Option 7: Jupyter Driven shared Guacamole with XFCE pods over VNC

This is similar to option 6, except that instead of each desktop having its own Guacamole server, a single shared Guacamole server will be used.

JupyterHub consists of the hub (effectively the control plane for JupyterHub) and a proxy that handles routing (configurable-http-proxy by default; traefik-proxy is an alternative). This option requires replacing the proxy with a customised Guacamole server that implements the necessary proxy behaviour (https://jupyterhub.readthedocs.io/en/stable/howto/proxy.html). Principally, this means maintaining a route table, implementing an API that allows JupyterHub to create routes, and routing/proxying incoming requests accordingly. This could be done either by modifying Guacamole (a lot of work) or by wrapping Guacamole with a Python server.

```mermaid
sequenceDiagram
Hub-->>GuacamoleProxy: Initialisation: Create route / to hub
actor Alice
rect rgb(255, 255, 192)
Note right of Alice: Get hub home page
Alice->>GuacamoleProxy: GET /
GuacamoleProxy-->>Hub: GET / (Forward to hub)
Hub--)GuacamoleProxy: Return hub homepage
GuacamoleProxy->>Alice: Forward hub homepage
Note right of Alice: If needed login and set cookies
end
rect rgb(192, 255, 255)
Note right of Alice: Start alice's Desktop-pod
Alice->>GuacamoleProxy: GET /users/alice/spawn (Start user's desktop)
GuacamoleProxy-->>Hub: Get user server (Forward request to hub), hub creates desktop pod
Hub--)GuacamoleProxy: Create route /users/alice to Desktop-pod
Note right of GuacamoleProxy: Proxy should ignore this request,<br/>and instead route /users/alice to itself
Hub--)GuacamoleProxy: Redirect user to /users/alice
GuacamoleProxy->>Alice: Forward redirect /users/alice
end
rect rgb(255, 192, 255)
Note right of Alice: Desktop-pod started, proxy bypasses hub
Alice->>GuacamoleProxy: GET /users/alice
Note right of GuacamoleProxy: Don't proxy to singleuser user<br/>Guacamole handles everything
GuacamoleProxy--)Desktop-pod: VNC/RDP connection
Note right of GuacamoleProxy: GuacamoleProxy must check user is authenticated
end
```
In this example, the singleuser server pod contains only the Linux desktop and VNC server.
-
So I've been looking at this once again, and I feel that the core problem in creating a decent user experience is workspace selection rather than anything else. That is really a problem with the authentication/authorisation system we currently use. At present this is handled by JupyterHub as part of the application, i.e. the user selects their workspace after they've logged in rather than as part of the login process. I think that to do this properly we would want a customised version of oauthenticator that makes the user select their workspace as part of the login process. This information could then be carried forward as a claim to every client that uses the JupyterHub authentication service. I think if we also put Keycloak in the middle we can achieve a few things:
I would therefore suggest:
This will effectively create a chain:

```mermaid
flowchart LR
subgraph public services
PJH[Public Jupyterhub] --> KC[Keycloak]
PJH --> GC[Guacamole Container]
end
GC --> BC
IJH --> KC
subgraph private services
BC[Browser Container] ---> IJH[Internal Jupyterhub] --> JNB[Jupyter Notebook]
end
subgraph private support services
KCO[Keycloak Operator] --> KC
KAPI[K8S API] --> KCO
end
PJH --> KAPI
```
To the user, though, this should all be seamless. They will log in to JupyterHub and select their workspace. They should then be presented with a Guacamole (VNC) window containing a browser pointed at the internal JupyterHub, which will automatically log them into the correct workspace. This is essentially a variant of option 6, and we can leverage the work done by manics above to manage the proxying elements. It should require no work on Apache Guacamole itself, beyond implementing the launch of Guacamole via JupyterHub. I will begin working on these immediately.
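Carrying the workspace forward as a claim means each downstream service reads it from the token it receives. The sketch below shows only that reading step: it decodes a JWT payload with the standard library and checks the chosen workspace against the set the user may access. The claim name `workspace` is hypothetical. This decoding deliberately skips signature verification, so it is an illustration only. Real code must verify the token against Keycloak's signing keys (for example with a JWT library) before trusting any claim.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying its signature (illustration only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def selected_workspace(claims: dict, allowed: set) -> str:
    """Return the workspace chosen at login, if the user may use it."""
    workspace = claims.get("workspace")  # hypothetical claim name
    if workspace not in allowed:
        raise PermissionError(f"workspace {workspace!r} not permitted")
    return workspace


if __name__ == "__main__":
    def b64url(obj):
        return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()

    token = b64url({"alg": "none"}) + "." + b64url({"sub": "alice", "workspace": "ws1"}) + "."
    print(selected_workspace(unverified_claims(token), {"ws1", "ws2"}))
```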
-
Setting this up as a way to discuss the options with regards to public access to the environment.
At present, access to this environment is provided via a web browser for those on the LTH network, or via a jump server accessed through Azure Bastion.
```mermaid
flowchart LR
User --> |VPN| LTH[LTH Network] --> AzureHubVNET[Azure Hub VNET]
subgraph AzureHub [Azure Hub Network]
AzureHubVNET
end
subgraph LSCSDE [LSC SDE Network]
direction TB
AzureSpokeDev[Dev VNET] ~~~ AzureSpokeSandbox[Sandbox VNET] ~~~ AzureSpokeProd[Prod VNET]
end
AzureHub --> LSCSDE
```

NWSDE uses Azure Virtual Desktop. However, its implementation has proven difficult to manage and is considered difficult to secure. As a result, we have been asked to investigate other options for the LSCSDE.
We will assess each option against the scenarios identified in #54, assigning a score from 0 to 5 for each scenario (5 meaning the solution deals with it very well). Hopefully this will let us judge which of the proposed solutions is best.
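Here is a sketch of how each option's total is derived from its per-scenario scores. It assumes a scenario marked N/A contributes nothing, which matches option 1's quoted total of 1, so option 1's maximum is 20 rather than 25. The scores are the ones given in each option's assessment. The data layout and function name are illustrative.

```python
# Scenarios from #54, in the order they appear in each assessment.
SCENARIOS = (
    "download to local machine",
    "download to intermediate VM",
    "transfer between workspaces",
    "copy/paste out via remote desktop or other means",
    "screenshot data",
)

# Per-scenario scores from each option's assessment; None means N/A.
SCORES = {
    "Option 1": (0, None, 1, 0, 0),
    "Option 2": (5, 1, 1, 5, 0),
    "Option 3": (5, 4, 5, 5, 0),
    "Option 4": (5, 4, 5, 5, 0),
    "Option 5": (5, 5, 5, 5, 0),
    "Option 6": (5, 5, 5, 5, 0),
}


def total(scores):
    """Sum per-scenario scores (0-5 each), treating N/A (None) as contributing nothing."""
    if len(scores) != len(SCENARIOS):
        raise ValueError("expected one score per scenario")
    for score in scores:
        if score is not None and not 0 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
    return sum(score for score in scores if score is not None)


if __name__ == "__main__":
    # Rank options from best to worst; ties keep their original order.
    for option, scores in sorted(SCORES.items(), key=lambda kv: total(kv[1]), reverse=True):
        print(f"{option}: {total(scores)}")
```

A plain sum weights every scenario equally. If some scenarios matter more (for example, downloads to a local machine), a weighted sum would be a small change.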
What options are there for public access?