
Building and Running a Galaxy Bioinformatics Web Portal on Azure Using Kubernetes, NFS, and HTCondor


Overview

This readme describes how to build and run a Galaxy server cluster on Azure using Kubernetes, HTCondor, Helm, and some post-install incantations and configuration. It was built to support the Neurolincs epigenome workflow of the Fraenkel Lab at MIT, in support of the AnswerALS Foundation research plan seeking a cure for Amyotrophic Lateral Sclerosis.

The acceptance criteria for this configuration included the following:

  • At least 60 TB of accessible storage for genome data and algorithm output
  • FTP, Galaxy web UI import, and direct file upload options
  • Sufficient performance for uploaded data
  • Dynamic scaling
  • Kubernetes orchestration
  • Package management via Helm chart, for external configurability and consistency with next-generation Galaxy Cloudman support
  • NFS server/client configuration for Galaxy file-management support

Setup

Install the Azure CLI Tools

Before we can start working on Galaxy itself, we need to set up and configure the Kubernetes environment it will run in.

Start by creating an Azure account.

Once you have an account, install the Azure CLI tools.

macOS

Per this link, brew is the preferred way to install the Azure CLI.

$ brew update && brew install azure-cli

You can then run the Azure CLI with the az command from a Terminal window.

Windows

The recommended way to install Azure CLI tools on Windows is to use the MSI Installer.

Linux

Depending on which version of Linux you're using, you can install using yum, zypper, apt, or a script. Best to check the docs for your particular distribution.
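
For Debian/Ubuntu specifically, Microsoft publishes a one-line convenience script (worth reading before piping anything to bash):

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash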

Verify your Azure setup

Once you've installed your tools and set up your account, let's verify everything is working correctly. Open a terminal window or command prompt and enter the following:

$ az login

You should be prompted to go to https://aka.ms/devicelogin and enter the code shown in your Terminal to sign in.

To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code BPV5A449L to authenticate.

Once you've successfully signed in, verify you can access your subscription information by running the following in your Terminal window:

$ az account list --output=table

Note the value in the name field: you'll use it later. If you have multiple subscriptions and want to change the default (designated by whichever subscription shows True in the IsDefault column) or the currently active subscription, use the set command:

$ az account set -s "[your subscription Name here]"
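
If you want to double-check which subscription is now active (handy when scripting), the CLI's built-in --query support makes it a one-liner:

$ az account show --query name -o tsv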

Create Your AKS Kubernetes Cluster

To create your Kubernetes cluster, we're going to use the new Azure Kubernetes Service (AKS), Azure's managed Kubernetes offering. Some notes as to why:

  • You don't pay for the Kubernetes master VMs because the AKS control plane is free
  • You can scale and upgrade your Kubernetes clusters easily and in place
  • Full "upstream" Kubernetes API support
  • Configurations that use AKS spin up and consume fewer compute resources than previous ACS-based versions

That's no small stuff.

Let's start by creating a resource group:

az group create --name k8sGalaxy --location centralus

Note that not all regions offer all the VM sizes that might be used for Kubernetes clusters. If you're concerned about specific configurations, particularly VM sizes, refer to this product availability chart. You may get errors on create, but the message will tell you which regions are available (at the time of this writing: eastus, westeurope, centralus, canadacentral, canadaeast).

Now that you've created your resource group, let's create a k8s cluster within it.

az aks create --resource-group k8sGalaxy --name ansAlsGalaxy --generate-ssh-keys

It may take a few minutes, but if all goes according to plan you'll see a new k8s cluster appear in your portal. Congrats!
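
The create command above accepts the AKS defaults for node count and VM size. If you want to be explicit (say, to size nodes for genome-scale workloads), here's a sketch of the same command with a few common flags; the count and size shown are examples to tune to your own needs:

az aks create --resource-group k8sGalaxy --name ansAlsGalaxy \
  --node-count 3 --node-vm-size Standard_DS3_v2 \
  --generate-ssh-keys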

Install the AKS CLI

Now that we've got our shiny new cluster set up, we need to connect to it. To do that, let's install more fun tools.

az aks install-cli

If you're on a Mac and try to install and get permission issues, run again under sudo:

sudo az aks install-cli

(At the password prompt, enter the password for your Mac account.)

Check out your cluster

For the CLI tool to connect to your cluster, we have to download the credentials and keys necessary to do so securely. Yup: there's a command for that:

az aks get-credentials --resource-group k8sGalaxy --name ansAlsGalaxy

We are now ready to begin interacting with our newly created cluster. First, let's make sure we're connected. To do that we'll use kubectl, the Kubernetes command-line interface for interacting with k8s clusters, starting with kubectl get nodes.

kubectl get nodes
NAME                       STATUS    ROLES     AGE       VERSION
aks-nodepool1-37476279-0   Ready     agent     1h        v1.7.7
aks-nodepool1-37476279-1   Ready     agent     1h        v1.7.7
aks-nodepool1-37476279-2   Ready     agent     1h        v1.7.7

Post-create cluster configuration

For this Galaxy implementation, we're going to connect directly to the Kubernetes agents. One of the agent nodes will be configured as a storage node with an NFS server. Shared files will be hosted in an /export directory, and the remaining nodes will mount that /export directory as NFS clients.

First, use kubectl to designate one node (we'll use the '-0' node) for storage (remember to replace the node names in the commands below with the names of the nodes in your k8s cluster).

kubectl label nodes aks-nodepool1-37476279-0 type=store
node "aks-nodepool1-37476279-0" labeled
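
You can confirm the label took with a selector query:

kubectl get nodes -l type=store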

Find your agent pool

Since it's possible, even likely, that the resource group AKS created to house your cluster is different from the one you specified earlier using the Azure CLI, let's first list our resource groups. You should see the resource group you created initially in the results. Check as well for another resource group whose name is a concatenation of your group and cluster names, as happened with me this go-round.

az group list -o table
Name                                    Location        Status
--------------------------------------  --------------  ---------
k8sGalaxy                               centralus       Succeeded
MC_k8sGalaxy_ansAlsGalaxy_centralus     centralus       Succeeded

My suspicion is that the agents are in the concatenated version. :)

az resource list -g MC_k8sGalaxy_ansAlsGalaxy_centralus -o table

Look for your agent pool VMs in the list that results. Are they there? Good. No? You sure your cluster creation was successful? Maybe look in another resource group. Once you do find them, let's get some additional info about them:

az vm show -g MC_k8sGalaxy_ansAlsGalaxy_centralus -n aks-nodepool1-37476279-0

I'll spare you the full dump of data; suffice to say it's a lot. But we're good, right?

Create public IPs for SSH access

(Note that this section plagiarizes pretty much entirely from this great how-to.)

Okay. Now that we've verified our agents are there and accessible, let's create a public IP for each node so we can enable SSH access.

az network public-ip create -g MC_k8sGalaxy_ansAlsGalaxy_centralus -n node0-ip

We can also list the NICs attached to the agent nodes:

az network nic list -g MC_k8sGalaxy_ansAlsGalaxy_centralus -o table
EnableIpForwarding    Location    MacAddress         Name                          Primary    ProvisioningState    ResourceGroup                        ResourceGuid
--------------------  ----------  -----------------  ----------------------------  ---------  -------------------  -----------------------------------  ------------------------------------
True                  centralus   00-0D-3A-92-21-9A  aks-nodepool1-37476279-nic-0  True       Succeeded            MC_k8sGalaxy_ansAlsGalaxy_centralus  f7bb5530-5ca2-435a-b5b3-609d728bf435
True                  centralus   00-0D-3A-91-EC-9A  aks-nodepool1-37476279-nic-1  True       Succeeded            MC_k8sGalaxy_ansAlsGalaxy_centralus  277dc614-0b0e-41c0-8372-0383b6798554
True                  centralus   00-0D-3A-92-2A-4C  aks-nodepool1-37476279-nic-2  True       Succeeded            MC_k8sGalaxy_ansAlsGalaxy_centralus  ed39e5a1-93ae-445f-89b3-2827c8bcb52d

Almost there! Now we take the Name from the nic list and ask for its associated IP configuration.

az network nic ip-config list --nic-name aks-nodepool1-37476279-nic-0 -g MC_k8sGalaxy_ansAlsGalaxy_centralus

Now we can add the public IP to the NIC's IP configuration. (Crikey. ipconfig? Yes. But not forever, okay?) Note that the --name ipconfig1 value comes from the response above; it should be ipconfig1, but if for some reason it isn't, check the Name field.

az network nic ip-config update -g MC_k8sGalaxy_ansAlsGalaxy_centralus --nic-name aks-nodepool1-37476279-nic-0 --name ipconfig1 --public-ip-address node0-ip

Assuming this update is successful (which it should be), you can now ask about the public IP that was created for this node.

az network public-ip show -g MC_k8sGalaxy_ansAlsGalaxy_centralus -n node0-ip

Woo hoo! There it is. Our very own IP address. Now we can SSH into node 0.

Before leaving, let's set up SSH access for the other two nodes that were created. Thanks to command history, we can circle back and swap in -1 and then -2 for each command. Remember that you will need to create a new public IP for each node, as well as update each NIC's IP configuration to attach the new IP. (And yes, we really should script this; a sketch follows below.) Keep those IP addresses handy for our next set of work.
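
Here's what that script might look like; a minimal sketch, assuming the resource-group and NIC names from the listings above (adjust the node indices to match your pool):

#!/usr/bin/env bash
# Sketch: create a public IP for each agent node and attach it to that node's NIC.
# RG and NIC_PREFIX are the values from this walkthrough; substitute your own.
RG="MC_k8sGalaxy_ansAlsGalaxy_centralus"
NIC_PREFIX="aks-nodepool1-37476279-nic"

for i in 0 1 2; do
  # Create a public IP for node $i
  az network public-ip create -g "$RG" -n "node${i}-ip"
  # Attach it to the node's primary IP configuration
  az network nic ip-config update -g "$RG" \
    --nic-name "${NIC_PREFIX}-${i}" \
    --name ipconfig1 \
    --public-ip-address "node${i}-ip"
  # Print the resulting address for the SSH steps below
  az network public-ip show -g "$RG" -n "node${i}-ip" --query ipAddress -o tsv
done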

Configure storage node and set up NFS Server

What did we want to do again? Right: SSH into the nodes. We'll start with our storage node.

ssh [your_username]@[your.node0.IP.address]

Remember, the username and password are those of the account you created in the Azure Portal.

Let's get some proper privileges.

$ sudo su
[sudo] password for rc:
root@aks-nodepool1-37476279-0:/home/rc#

First, we're going to create an /export directory.

root@aks-nodepool1-37476279-0:/home/rc# mkdir /export
root@aks-nodepool1-37476279-0:/home/rc# chown nobody:nogroup /export

Now we'll install and start the NFS server.

root@aks-nodepool1-37476279-0:/home/rc# sudo apt install nfs-kernel-server

After the install, add the /export directory to the list of directories eligible for NFS mount with both read and write privileges (rw grants read-write access, sync commits writes to disk before replying, and no_root_squash lets a client's root user act as root on the share). We'll do that by adding the following entries to the /etc/exports file using vi (if your vi is rusty, I find this page helpful).

root@aks-nodepool1-37476279-0:/home/rc# vi /etc/exports

Once you've placed your cursor where you want it (don't forget to press the 'i' key :) ), copy the lines below and then save (Esc, then :wq):

      /ubuntu  *(ro,sync,no_root_squash)
      /export  *(rw,sync,no_root_squash)

And now for the hosts.allow file

vi /etc/hosts.allow

Copy and paste the following:

   ALL: ALL

Ok. Now we're ready to start the service.

root@aks-nodepool1-37476279-0:/etc# sudo systemctl start nfs-kernel-server.service
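
To confirm the exports are actually live, two quick sanity checks (exportfs ships with nfs-kernel-server; showmount comes with the NFS utilities):

root@aks-nodepool1-37476279-0:/etc# exportfs -v
root@aks-nodepool1-37476279-0:/etc# showmount -e localhost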

Set remaining nodes as NFS clients

We should know the SSH dance pretty well by now. On each of the remaining nodes we'll also create an /export directory, but instead of the NFS server incantations, these nodes will be NFS clients and mount the server's /export directory.

Why do we have to do this, you ask? (Or maybe you don't.) Regardless, we're going through these steps because of the way Galaxy handles files in its current state. Our experience shows the most performant configuration is enabling NFS support, especially if we want files uploaded through the Galaxy UI to be available for scalable compute jobs.

(FYI, I find it helpful to run these commands in a new terminal window, keeping the storage node session open in case I need to stop and restart the NFS server.)

Once connected to one of your remaining nodes, sudo su and create an /export directory again:

root@aks-nodepool1-37476279-1:/home/rc# mkdir /export

Now, though, we're going to mount the storage node's /export directory.

sudo mount aks-nodepool1-37476279-0:/export /export

If you get a permission denied message, return to your storage node and stop and restart the NFS service. I don't know why this works, but it does (my best guess: the restart forces the server to re-read /etc/exports).

root@aks-nodepool1-37476279-0:/etc# sudo systemctl stop nfs-kernel-server.service
root@aks-nodepool1-37476279-0:/etc# sudo systemctl start nfs-kernel-server.service
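
One caveat: a mount made by hand like this won't survive a node reboot. If you want it to persist, one option is an /etc/fstab entry on each client node (a sketch, using the node names from this walkthrough):

aks-nodepool1-37476279-0:/export  /export  nfs  defaults  0 0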

Install Galaxy via a Helm Chart

Bet you thought we'd never get here. I know I did. Now we're going to get ready to build and deploy Galaxy via a Helm Chart.

Install Helm and Tiller

OK, but first let's install Helm. On a Mac, we'll use good old brew.

brew install kubernetes-helm

With Helm installed, we'll install Tiller by running helm init.

helm init
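
helm init deploys the Tiller pod into the kube-system namespace (this guide predates Helm 3, which did away with Tiller). To verify both ends are happy, check the client/server versions and look for the Tiller pod (name=tiller is the label helm init applies to the Tiller deployment, if memory serves):

helm version
kubectl get pods --namespace kube-system -l name=tiller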

Choose Your Galaxy Flavor

Now you're (finally!) ready to start working with Galaxy! This is a critical moment. Which Galaxy do you want to install into your cluster? If you have cloned this repository and navigated to its directory in Terminal, you can install this repo into your new cluster with the following command.

helm install galaxy

Other installation options are available as well.

Post-install: Fix your permissions

While your Galaxy may be installed and running, it will likely throw permission errors if you try to load it in your browser. So let's nip that in the bud.

If by chance you haven't kept your storage node SSH session open, connect again and run these commands on your /export directory:

root@aks-nodepool1-37476279-0: cd /export
root@aks-nodepool1-37476279-0: chmod -R 777 *

To set up your k8s cluster to load the Galaxy web UI in your local browser, run this command on your local computer (not one of the agent nodes).

kubectl port-forward galaxy 8080:80

Now you can open your browser and point it at http://localhost:8080. (The command above forwards Galaxy's port 80 to local port 8080; if for some reason you get an error on port 8080, feel free to try another port such as 8090 or 13080.)

Configure HTCondor

Now we'll shell into the galaxy-htcondor container:

[localhost]:kubectl exec -it galaxy-htcondor -- /bin/bash

Edit /etc/hosts:

  [root@galaxy-htcondor]: vi /etc/hosts

Insert the following line

  127.0.0.1   galaxy-htcondor

Shell into the galaxy-htcondor-executor container:

kubectl exec -it galaxy-htcondor-executor -- /bin/bash

Edit /etc/hosts and insert the analogous line:

  [root@galaxy-htcondor-executor]: vi /etc/hosts
  127.0.0.1   galaxy-htcondor-executor

Get a shell to the galaxy container:

kubectl exec -it galaxy --container=galaxy -- /bin/bash

Edit /etc/condor/condor_config.local

[root@galaxy]: vi /etc/condor/condor_config.local

Copy and paste the following

  HOSTALLOW_READ = *
  HOSTALLOW_WRITE = *
  HOSTALLOW_NEGOTIATOR = *
  HOSTALLOW_ADMINISTRATOR = *

Restart Condor:

    [root@galaxy]:condor_restart
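
To verify the pool came back healthy, the standard HTCondor status commands work from the same shell:

    [root@galaxy]:condor_status
    [root@galaxy]:condor_q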

Configure a Public Static IP

We imagine that, having gone to the trouble of setting up this awesome Galaxy server, you don't want people to have to port-forward with kubectl to access it. You can set up a public IP in Azure from the portal, or expose the galaxy and galaxy-proftpd pods behind load balancers:

kubectl expose pod galaxy --type=LoadBalancer
kubectl expose pod galaxy-proftpd --type=LoadBalancer
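
Azure takes a minute or two to provision the load balancers; you can watch for the EXTERNAL-IP column to populate:

kubectl get svc galaxy --watch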

Create a Public IP using the Azure CLI

This article gives a great summary of the Public IP options and rationale (this section is effectively a micro-version).

First, create a new reserved IP address in the same resource group as your AKS deployment:

az network public-ip create -g MC_k8sGalaxy_ansAlsGalaxy_centralus -n galaxyWebServer --dns-name galaxy-web-ip --allocation-method Static
{
  "publicIp": {
    "dnsSettings": {
      "domainNameLabel": "galaxy-web-ip",
      "fqdn": "galaxy-web-ip.centralus.cloudapp.azure.com",
      "reverseFqdn": null
    },
    "ipAddress": "your.public.ip.address",
    "ipConfiguration": null,
    "ipTags": [],
    "location": "centralus",
    "name": "galaxyWebServer",
    "provisioningState": "Succeeded",
    "publicIpAddressVersion": "IPv4",
    "publicIpAllocationMethod": "Static",
    "resourceGroup": "MC_k8sGalaxy_ansAlsGalaxy_centralus"
  }
}

Now that we have our public IP, we can add it to our YAML configuration file. Open galaxy-webservice2.yaml and add your IP address to the loadBalancerIP: entry.

  loadBalancerIP: your.public.ip.address
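
For orientation, the relevant portion of the service definition ends up looking something like this (a sketch only; the real metadata name, ports, and selector come from galaxy-webservice2.yaml in this repo):

apiVersion: v1
kind: Service
metadata:
  name: galaxy-web        # placeholder; use the name from galaxy-webservice2.yaml
spec:
  type: LoadBalancer
  loadBalancerIP: your.public.ip.address
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: galaxy           # placeholder; match the pod labels in this repo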

Save your file. Now we're going to delete the running service and then re-create it with the updated IP.

kubectl delete -f galaxy-webservice2.yaml
kubectl create -f galaxy-webservice2.yaml

Did it work? Let's find out.

kubectl get svc --output=wide
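
If you just want the external IP by itself (for scripting, or to paste into a browser), jsonpath does the trick:

kubectl get svc galaxy -o jsonpath='{.status.loadBalancer.ingress[0].ip}'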

Troubleshooting

So many places for it all to go wrong. Hopefully you've been familiarizing yourself with all the to-ing and fro-ing along the way.

Stopping and Restarting Galaxy

If you come back to your Galaxy after some time away and find the URL isn't loading or whatnot, here are the steps to perform a Kubernetes-style "reboot". Open your terminal and run the following commands.

kubectl delete pods --all --force --grace-period=0
kubectl delete services --all --force --grace-period=0
kubectl delete deployments --all --force --grace-period=0
kubectl delete rc --all --force --grace-period=0
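
With everything torn down, redeploy from your clone of this repo the same way as before:

helm install galaxy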

Updating Tools

Upgrading Kubernetes

Additional Resources and Links
