Scanner + Kubernetes on Amazon EKS
This document guides you through setting up a Kubernetes cluster on AWS that is ready to process Scanner jobs.
- Install Scanner by following the installation instructions.
- Create an AWS account.
- Install kubectl, the Kubernetes command-line management tool. If you have the gcloud SDK installed, you can run:
  gcloud components install kubectl
- Install the jq tool for parsing JSON (used in this example):
  apt-get install jq   # Ubuntu
  brew install jq      # macOS
- Create a bucket on AWS S3 and note its name for later configuration.
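jq is used below to pull individual fields out of the JSON that the AWS CLI prints. A minimal sketch of that pattern, using a hard-coded sample document (the JSON here is illustrative, not real AWS output):

```shell
# Pipe AWS-CLI-style JSON into jq and extract one field with a filter expression.
# This sample mimics the shape of `aws ec2 describe-key-pairs` output.
json='{"KeyPairs":[{"KeyName":"ec2-key","KeyFingerprint":"ab:cd:ef"}]}'
echo "$json" | jq -r '.KeyPairs[0].KeyName'   # -r prints the raw string: ec2-key
```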
There are three steps required to get started running jobs on AWS, plus one optional step:
- Get set up to connect to AWS machines.
- Optional: create a staging machine on AWS for setting up the cluster and running jobs.
- Create a Kubernetes cluster.
- Run a job on the cluster.
Connecting to AWS machines
To connect to the AWS cluster, you need to acquire authentication keys. These take the form of an access key ID and a secret access key.
Now, install the AWS command-line interface and configure it using your access key ID and secret key, making sure to set your default region to us-west-2:
pip3 install awscli
aws configure
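For reference, aws configure writes the keys to ~/.aws/credentials (and the region to ~/.aws/config); the credentials file looks roughly like this, with the placeholders replaced by your actual keys:

```ini
[default]
aws_access_key_id = <your-access-key>
aws_secret_access_key = <your-secret-key>
```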
Finally, we will generate a key pair that can be used to sign into our EC2 machines:
aws ec2 create-key-pair --key-name ec2-key --query 'KeyMaterial' --output text > ec2-key.pem
chmod 600 ec2-key.pem
This command saves a private key to the file ec2-key.pem. Now you're ready to connect to an EC2 machine.
Optional: Creating a staging machine
Our first act will be to create a "staging" machine on AWS. The purpose of this machine is to serve as the "staging" ground for managing our cluster and executing long-running jobs. This is preferable to using your own local machine because the bandwidth from this machine to AWS services (such as S3) will be much higher and the connectivity will be more stable.
Building from scratch
If there is no existing AMI, you can build the image yourself:
- Find the current AMI ID for Ubuntu 16.04 by going to https://cloud-images.ubuntu.com/locator/ec2/ in your browser, typing us-west-2 hvm xenial ebs into the search box, and then copying the AMI ID.
- Open the spawn_staging_machine.sh script and change the AMI=... variable to the Ubuntu AMI ID you copied from step 1.
- Run bash ./spawn_staging_machine.sh to spawn a staging machine. This will take a few moments. Once it's complete, it will output the public IP of the machine, which you can use to access it.
- Connect to the remote machine by running:
ssh -i path/to/your/ec2-key.pem ubuntu@<ip-address>
- Set up your AWS keys in the environment by running (replacing the <...>):
echo "export AWS_ACCESS_KEY_ID=<your-access-key>" >> ~/.bashrc
echo "export AWS_SECRET_ACCESS_KEY=<your-secret-key>" >> ~/.bashrc
Then run exec $SHELL to reload your bash configuration.
- Run the following script to install the dependencies required for the staging machine. This process will query you for your access key ID and secret key, since it sets up the AWS CLI.
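The append-and-reload pattern above can be sketched as follows; a temporary file stands in for ~/.bashrc, and the key values are dummies:

```shell
# Append export lines to a shell rc file, then source it — sourcing is what a
# fresh shell (exec $SHELL) effectively does with ~/.bashrc at startup.
rc=$(mktemp)                                               # stand-in for ~/.bashrc
echo 'export AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY' >> "$rc"    # dummy placeholder key
echo 'export AWS_SECRET_ACCESS_KEY=exampleSecret' >> "$rc" # dummy placeholder secret
. "$rc"                                                    # reload the configuration
echo "$AWS_ACCESS_KEY_ID"                                  # the variable is now set
rm -f "$rc"
```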
Using a pre-built AMI
Once you've built the image from scratch, you can create an AMI out of it and reuse it. You can then create a new instance by replacing <AMI_ID> in spawn_staging_machine.sh with the AMI ID. Then run the script.
If the command succeeds, it will return an IP address, which you can use to log in to the machine with the following command:
ssh -i ec2-key.pem ubuntu@<ip-address>
Create a Kubernetes cluster
To create a Kubernetes cluster, ssh into the staging machine and simply run:
cd capture
bash ./create_eks_cluster.sh <cluster-name> <num-workers>
exec $SHELL # To update environment variables
<cluster-name> is the name you will use to identify the cluster, and <num-workers> is the number of worker nodes to create.
NOTE: The default worker machines are c4.8xlarge instances. If you'd like to try out different configurations, you need to modify:
- create_eks_cluster.sh: Change the NodeInstanceType value from c4.8xlarge to a machine of your choice.
- worker.yml.template: Change the 35.0 to the number of cores on your machine type less 1 (Kubernetes spawns some services on the machine that ask for ~1 core).
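The arithmetic behind that 35.0 can be sketched as follows; the 36-vCPU figure is c4.8xlarge's published core count, used here as an example:

```shell
# Derive the CPU request for worker.yml.template from the instance's core count:
# total vCPUs minus ~1 core reserved for Kubernetes system services.
vcpus=36                       # c4.8xlarge exposes 36 vCPUs; change for other types
cpu_request=$((vcpus - 1))
echo "${cpu_request}.0"        # value to put in worker.yml.template
```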
Update the cluster with new code
Before running a job on the cluster, you must build and deploy the code.
cd capture
bash ./build_and_deploy.sh
Run a job on the cluster
Make a copy of scanner_cli_script.sh.template and modify it with the correct parameters (they are the same as for the DerpCLI command). Then, simply run the script.
Scaling the cluster up or down
cd capture
bash ./scale_eks_workers.sh <cluster-name> <num-workers>
Deleting the cluster
cd capture
bash ./delete_eks_cluster.sh <cluster-name>