
Deployment Instructions Scalable Deployment

Eli Jones edited this page Mar 19, 2024 · 12 revisions

Robust, scalable setup (configuring a Beiwe cluster)

These instructions assume that your computer runs Linux, macOS, the Windows Subsystem for Linux, or the Cygwin POSIX environment on Windows. If you have the expertise to run the Beiwe Backend directly on Windows and encounter what may be a compatibility issue, please report it as a bug with the appropriate caveat. We will try to assist you.

There are many reasons to run a Beiwe cluster instead of an individual server, but the chief one is this: Onnela Lab runs a Beiwe cluster. All work is tested on a cluster deployment, and all application updates are handled by a script maintained in the beiwe-backend repository. By taking on the extra up-front work to set up a cluster, you will receive first-class support and will almost certainly be spared a few unintended bugs.

Create an Amazon Web Services (AWS) account

  1. If you don't have an Amazon Web Services account, go to https://aws.amazon.com and click "Create an AWS account".
  2. Choose an AWS region that offers the services you need: S3, EC2, RDS, and Elastic Beanstalk (comparison table: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). As of 2020 all regions appear to offer them, so pick one that is relatively close to the participants in your studies.

Obtain a domain name

You need to run your Beiwe deployment with a domain name; you can either buy one (like example.com) or obtain one from your organization (e.g., beiwe.myresearchcenter.myuniversity.edu). You need the domain name because Beiwe requires an SSL certificate or mobile devices will not be able to upload data.

Set up an IAM user with sufficient permissions

Before you can do anything with the AWS Command-Line Interface (CLI), you need to use the AWS web interface to create an Identity and Access Management (IAM) user with the AdministratorAccess policy attached. Here's how to do that:

  1. Log in to the AWS web GUI (https://signin.aws.amazon.com/console), and go to "Services" > "Security, Identity, & Compliance" > "IAM"
  2. On the left sidebar, click "Users" and then the "Add User" button. Fill in "User name" with a value of your choice, select "Programmatic access", and click "Next". On the next screen, select "Attach existing policies directly", and check the box next to AdministratorAccess. Click "Next", then review, and click "Create User".
  3. Download or copy the generated credentials ("User", "Access key ID", and "Secret access key") and save them; you will need them in a later step!
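If you want to confirm that the new credentials work before going further, one option is to ask AWS's Security Token Service (STS) who they belong to. The sketch below assumes you have installed boto3 (the AWS SDK for Python, pip install boto3), which is not necessarily part of Beiwe's own requirements. The helper names make_sts_client, describe_caller, and is_iam_user_arn are hypothetical and not part of the Beiwe code.

```python
def make_sts_client(access_key_id, secret_access_key, region):
    """Create a boto3 STS client from explicit IAM user credentials."""
    import boto3  # imported here so the other helpers work without boto3 installed

    return boto3.client(
        "sts",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name=region,
    )


def describe_caller(sts_client):
    """Return the ARN of the identity that the client's credentials belong to.

    With a real boto3 client, this raises botocore.exceptions.ClientError
    if the keys are wrong or inactive.
    """
    return sts_client.get_caller_identity()["Arn"]


def is_iam_user_arn(arn):
    """True for ARNs shaped like arn:<partition>:iam::<account>:user/<name>."""
    parts = arn.split(":", 5)
    return (
        len(parts) == 6
        and parts[0] == "arn"
        and parts[2] == "iam"
        and parts[5].startswith("user/")
    )
```

For example, describe_caller(make_sts_client(key_id, secret, "us-east-2")) should return an ARN ending in user/ followed by the user name you chose. Avoid pasting the secret access key into your shell history or into files you commit.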

Generate a deployment key

You need a deployment key so that you can SSH from your local machine into the AWS servers you're setting up. Here's how to generate one:

  1. Go to "Services" > "Compute" > "EC2". Then in the left sidebar, under "Network & Security", click "Key Pairs".
  2. Click the "Create Key Pair" button, give it a name of your choice, and click "Create". Your browser should automatically download the private key as a .PEM file.
  3. Store the private key in an appropriate location on your local computer; on a Linux or macOS machine a good location is in the ~/.ssh/ directory.
  4. Restrict the key's permissions with the command chmod 600 path/to/your/key.pem; SSH refuses to use a private key file that other users can read.
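If you would rather do this from Python (for example, inside a setup script of your own), the following sketch is equivalent to chmod 600 on a POSIX system. secure_private_key is a hypothetical helper name. On native Windows, os.chmod can only toggle the read-only flag, so it will not produce the same result there.

```python
import os
import stat
from pathlib import Path


def secure_private_key(key_path):
    """Restrict a private key to owner read/write, the same as `chmod 600`.

    Returns the resulting permission bits so the caller can confirm the change.
    """
    path = Path(key_path).expanduser()  # allows paths like ~/.ssh/key.pem
    # S_IRUSR | S_IWUSR == 0o600: the owner may read and write, nobody else has access.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(path.stat().st_mode)
```

Calling secure_private_key("~/.ssh/your-key.pem") should return 0o600 (printed as 384 in decimal).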

Create a Sentry Account

Sentry.io is a platform for monitoring errors and providing developers with runtime information. The Sentry credentials are optional, but we strongly recommend including them. We can offer much more support if you have Sentry configured.

  1. Create an account on Sentry.io. You can use the free tier; the primary limitation of that tier is that it only lets you create one login account.
  2. Create a project.
  3. Once you have created or selected a project, click "Project Settings" in the top right corner, then in the left side menu, under "Data", click "Client Keys (DSN)".
  4. Save the "DSN" and "DSN (Public)" values; you'll need them in a later step.
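Because Beiwe expects an older DSN format (see step 7 of "Configure your application" below), it can help to check which format your DSNs are in. The sketch below is a hypothetical helper, not Beiwe's own validation. It assumes that legacy DSNs embed both a public and a secret key (https://PUBLIC:SECRET@host/PROJECT_ID), while current ones carry only the public key (https://PUBLIC@host/PROJECT_ID).

```python
from urllib.parse import urlsplit


def classify_sentry_dsn(dsn):
    """Return "legacy", "modern", or "invalid" for a Sentry DSN string.

    Assumption: a legacy DSN has a secret key after the public key
    (PUBLIC:SECRET@host); a modern DSN has only the public key (PUBLIC@host).
    """
    try:
        parts = urlsplit(dsn.strip())
    except ValueError:  # e.g. malformed bracketed hosts
        return "invalid"
    # The project ID is the last path segment, e.g. "/42" -> "42".
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "invalid"
    if not parts.username or not project_id.isdigit():
        return "invalid"
    return "legacy" if parts.password else "modern"
```

A result of "modern" for a DSN copied from Sentry today is consistent with the format mismatch the deployment script warns about.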

Configure your application

  1. Clone the beiwe-backend repository to your local computer (git clone), check out the main branch (git checkout main), and navigate to the cluster_management/general_configuration directory.

  2. Make sure you are using Python 3.8 inside the beiwe-backend directory.

    • Note: Python 3.8 reaches end of support in October 2024 (2024-10). We will upgrade before then and provide guides as we did for previous platform version upgrades. (Please file a bug report if this documentation is out of date!)
    • Most operating systems have fully retired Python 2, but if you are stuck on an old platform you may need to use python3 as your Python executable, and possibly pip3 to install software dependencies.
    • We very strongly recommend using a modern platform and a Python virtual environment to install packages. This guide does not cover virtual environments in detail, but documentation on them is easy to find online.
  3. Copy the file aws_credentials.example.json to create a new file called aws_credentials.json. Open the new file aws_credentials.json and fill in the appropriate values:

    • AWS_ACCESS_KEY_ID is the "Access key ID" from the IAM user you created in the step above.
    • AWS_SECRET_ACCESS_KEY is the "Secret access key" from the IAM user you created in the step above.
  4. Copy the file global_configuration.example.json to create a new file called global_configuration.json. Open the new global_configuration.json file and fill in the appropriate values:

    • DEPLOYMENT_KEY_NAME is the name of the deployment key you generated above (don't include the ".pem" filename extension).
    • DEPLOYMENT_KEY_FILE_PATH is the absolute filepath (including the filename and ".pem" extension) for where the deployment key is stored on your local computer.
    • VPC_ID: when you select an AWS region, AWS automatically creates one Virtual Private Cloud (VPC) for you. To find its ID, go to "Services" > "Networking & Content Delivery" > "VPC". Under "Resources", you should see a link for "1 VPC" (or a higher number, if you have manually created additional VPCs); click on that link, and then get your VPC ID from the table; it should be formatted like vpc-6789abcd.
    • AWS_REGION is Amazon's lowercase, hyphenated name for the region you're using. For example, if your region is "US East (Ohio)", this value should be us-east-2; if your region is "Asia Pacific (Sydney)", this value should be ap-southeast-2. You can look up the lowercase, hyphenated names here: http://docs.aws.amazon.com/general/latest/gr/rande.html#elasticbeanstalk_region
    • SYSTEM_ADMINISTRATOR_EMAIL is the email address to which AWS (not Sentry) will send administrative alerts, performance alarm notifications, deployment operation events, etc. This can be any address you choose, but it should be one that the system administrator checks regularly.
  5. Make sure you have pip, the CLI Python package manager (https://pip.pypa.io/en/stable/), available on your computer. Then, in your local copy of the beiwe-backend repo, cd into the cluster_management/ directory and run the command below.

    • If you are using an OS-provided Python 3 and are not using a virtual environment, add the --user flag to this command. Installing packages directly into your operating system's Python requires sudo, may simply not work, will cause pip to complain, and can break other software on your computer that depends on the system's packages. To reiterate, we recommend using a modern platform and a Python virtual environment to install packages.
    $ pip install -r launch_requirements.txt
    
  6. Still in the cluster_management/ directory, run:

    $ python launch_script.py -help-setup-new-environment
    

    Give a name for your environment when prompted to do so; from here on, we'll refer to that name you provided as [YOUR-ENV-NAME]. The script will then automatically create two more configuration files in cluster_management/environment_configuration.

  7. In the directory cluster_management/environment_configuration, edit both files:

    • Beiwe currently uses a very old format for Sentry DSNs, so until Beiwe is updated, the DSNs Sentry gives you today may not work. The deployment script should tell you when a new-style DSN does not match the old format; for now, just run without those credentials. We will be updating Beiwe's Sentry support to use the new DSN format.
    • In the file [YOUR-ENV-NAME]_beiwe_environment_variables.json:
      • SENTRY_JAVASCRIPT_DSN: this should be a DSN (Public) value from your Sentry.io account. You can leave it empty or provide a dummy value if you do not have one.
      • SENTRY_ANDROID_DSN, SENTRY_ELASTIC_BEANSTALK_DSN, SENTRY_DATA_PROCESSING_DSN: these should all be non-public DSN values from your Sentry.io account. A DSN identifies which Sentry project errors get submitted to. If you want Android errors, Elastic Beanstalk errors, and Data Processing errors all lumped together in the same project, you can provide the same DSN for each of those. If you want the errors grouped in different projects, you can give each one a DSN for a different Sentry project.
      • DOMAIN: this is your server's domain name, including subdomains. (Don't include the http:// or https:// prefix.)
    • The file [YOUR-ENV-NAME]_server_settings.json contains default server types for your Beiwe deployment. You don't need to edit this in any way, but you are welcome to do so.
  8. cd back into cluster_management/ and enter the commands below. Each command will prompt you to "Enter the name of the Elastic Beanstalk Environment you want to run this operation on"; provide exactly the same environment name you entered in step 6.

    $ python launch_script.py -create-environment
    $ python launch_script.py -create-manager
    $ python launch_script.py -create-worker # (only necessary on clusters with large numbers of users.)
    

    All three of these commands take several minutes to run because they involve waiting for AWS to spin up new servers. A command can take more than 10 minutes if you have chosen very small server sizes or receive a dud server instance. For more information about what each of these commands does, run:

    $ python launch_script.py --help
    

    If any of the above commands fail with error messages about "Instance Profiles" (or any other IAM entities), you may have extra AWS Instance Profiles left over, usually from an interrupted deployment operation. You can delete all Instance Profiles by running python launch_script.py -purge-instance-profiles. DO NOT RUN THIS COMMAND if you have a functional Elastic Beanstalk Beiwe cluster running; it will probably break it.
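Typos in these configuration files tend to surface only after a launch command has been running for a while. The sketch below checks the global_configuration.json values described in step 4 and the DOMAIN value from step 7 before you run anything. It assumes both files are flat JSON objects keyed by the names above, and the region pattern is a heuristic. The function names are hypothetical; this is not the launch script's own validation.

```python
import json
import os
import re

# Keys described in step 4, assumed to sit at the top level of a flat JSON object.
REQUIRED_GLOBAL_KEYS = (
    "DEPLOYMENT_KEY_NAME",
    "DEPLOYMENT_KEY_FILE_PATH",
    "VPC_ID",
    "AWS_REGION",
    "SYSTEM_ADMINISTRATOR_EMAIL",
)
# Heuristic for region codes such as us-east-2, ap-southeast-2, or us-gov-west-1.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def check_global_configuration(config):
    """Return a list of human-readable problems in global_configuration.json values."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_GLOBAL_KEYS if not config.get(key)]
    key_name = config.get("DEPLOYMENT_KEY_NAME") or ""
    key_path = config.get("DEPLOYMENT_KEY_FILE_PATH") or ""
    vpc_id = config.get("VPC_ID") or ""
    region = config.get("AWS_REGION") or ""
    email = config.get("SYSTEM_ADMINISTRATOR_EMAIL") or ""
    if key_name.endswith(".pem"):
        problems.append("DEPLOYMENT_KEY_NAME should not include the .pem extension")
    if key_path and not (os.path.isabs(key_path) and key_path.endswith(".pem")):
        problems.append("DEPLOYMENT_KEY_FILE_PATH should be an absolute path ending in .pem")
    if vpc_id and not vpc_id.startswith("vpc-"):
        problems.append("VPC_ID should look like vpc-6789abcd")
    if region and not REGION_PATTERN.match(region):
        problems.append("AWS_REGION should be a code such as us-east-2, not a display name")
    if email and "@" not in email:
        problems.append("SYSTEM_ADMINISTRATOR_EMAIL does not look like an email address")
    return problems


def check_domain(domain):
    """Return problems with DOMAIN from [YOUR-ENV-NAME]_beiwe_environment_variables.json."""
    if not domain:
        return ["DOMAIN is missing or empty"]
    if "://" in domain or "/" in domain:
        return ["DOMAIN should be a bare host name, without http:// or https:// or a path"]
    return []


def check_file(path):
    """Load a global_configuration.json file and run check_global_configuration on it."""
    with open(path) as handle:
        return check_global_configuration(json.load(handle))
```

An empty list from check_file("general_configuration/global_configuration.json") means none of these particular mistakes were found; it does not guarantee the values are correct for your AWS account.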

Set up the AWS Elastic Beanstalk Command-Line Interface (EB CLI)

  1. Install the EB CLI; follow the instructions on this page: Install the Elastic Beanstalk Command Line Interface (EB CLI)

  2. In the beiwe-backend repo, configure the file .elasticbeanstalk/config.yml:

    branch-defaults:
      master:                            # You can change "master" to any branch name
        environment: [YOUR-ENV-NAME]
    global:
      application_name: beiwe-application
      default_ec2_keyname: [DEPLOYMENT_KEY_NAME]  # Same as in global_configuration.json
      default_platform: 64bit Amazon Linux 2018.03 v2.9.4 running Python 3.6
      default_region: [AWS_REGION]                # Same as in global_configuration.json
      profile: eb-cli                             # this name just needs to match a profile declared in ~/.aws/
      sc: git
    
  3. cd into the root directory of the beiwe-backend repo, and run

    $ eb init
    

    When asked to provide your credentials, use the same IAM user credentials you pasted into cluster_management/general_configuration/aws_credentials.json. If it prompts you to add AWS CodeCommit, you can answer "n".

  4. From the root directory of the beiwe-backend repo, run:

    WARNING: do not run this command before configuring your HTTPS certificate for the Load Balancer in the EC2 console.

    If you do, you may repeatedly be locked out of all operations for 15 minutes or more. This happens because Elastic Beanstalk errors when it attempts its default health check over HTTPS. See the Configuring SSL section below.

    $ eb deploy
    

    Any time you want to update your deployment to the current version of the code, you can just run:

    $ git pull
    $ eb deploy
    

    For more information on using Elastic Beanstalk with Git, read Using the EB CLI with Git.
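The .elasticbeanstalk/config.yml from step 2 reuses two values from global_configuration.json, so one option is to generate it rather than edit it by hand. This sketch fills in the same template shown in step 2 (with its comments dropped and the platform string copied verbatim). render_eb_config and write_eb_config are hypothetical helpers; you still supply the environment name you gave the launch script and the branch you deploy from.

```python
from pathlib import Path

# Same content as the config.yml template in step 2, minus the inline comments.
CONFIG_TEMPLATE = """\
branch-defaults:
  {branch}:
    environment: {env_name}
global:
  application_name: beiwe-application
  default_ec2_keyname: {key_name}
  default_platform: 64bit Amazon Linux 2018.03 v2.9.4 running Python 3.6
  default_region: {region}
  profile: eb-cli
  sc: git
"""


def render_eb_config(global_config, env_name, branch):
    """Fill the template using values from global_configuration.json."""
    return CONFIG_TEMPLATE.format(
        branch=branch,
        env_name=env_name,
        key_name=global_config["DEPLOYMENT_KEY_NAME"],
        region=global_config["AWS_REGION"],
    )


def write_eb_config(repo_root, text):
    """Write text to <repo_root>/.elasticbeanstalk/config.yml and return the path."""
    target = Path(repo_root) / ".elasticbeanstalk" / "config.yml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

Generating the file this way keeps DEPLOYMENT_KEY_NAME and AWS_REGION identical in both places, which is what the comments in the step 2 template ask for.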

Configuring DNS

Beiwe clusters aren't served from a single server; they run behind a Load Balancer (a kind of intelligent router) that distributes incoming requests across multiple servers, so you need to point your domain name at your deployment's Load Balancer. Go to "Services" > "EC2" and click "Load Balancers" in the left column. From the table of Load Balancers, copy the "DNS name" of your Load Balancer; this is the address you need to point your domain name at.

Note: a Load Balancer's IP address is not stable. You need to use its URL, which is formatted like this: awseb-AWSEBLoa-ABCDEF123456-1234567890.us-east-1.elb.amazonaws.com. Because of that:

  • If you want to use a root domain like example.com rather than beiwe.example.com, you may need to use Amazon Route 53 for your DNS. DNS specifications do not allow a CNAME record at the root of a domain (a CNAME cannot coexist with the SOA and NS records every root domain has), so the root normally needs A records, which hold IP addresses. Most DNS providers enforce this. Route 53 is unusual in that it lets you point an A record at a hostname, a feature it calls an "Alias" record. We've heard that some other DNS providers will let you point an A record at a hostname, but we don't know which providers those are; we do know that GoDaddy doesn't.

  • If you want to use a subdomain like beiwe.example.com, you can use a DNS provider of your choice; you can use a CNAME record to point from your subdomain to the Load Balancer's URL directly.
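If your DNS is hosted in Route 53, the record can also be created with boto3's change_resource_record_sets call. The sketch below builds either an Alias A record (needed for a root domain) or a CNAME (sufficient for a subdomain). It assumes boto3 is installed and that you know two zone IDs: your own Route 53 hosted zone, and the Load Balancer's canonical hosted zone ID, which the EC2 console lists in the Load Balancer's details. The helper names and the 300-second TTL are illustrative choices.

```python
def build_change_batch(record_name, lb_dns_name, alias_zone_id=None, ttl=300):
    """Return a Route 53 ChangeBatch that points record_name at the Load Balancer.

    With alias_zone_id (the Load Balancer's canonical hosted zone ID), an Alias
    A record is built, which also works at the root of a domain. Without it, a
    CNAME is built, which is only valid for subdomains.
    """
    if alias_zone_id:
        record_set = {
            "Name": record_name,
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": alias_zone_id,
                "DNSName": lb_dns_name,
                "EvaluateTargetHealth": False,
            },
        }
    else:
        record_set = {
            "Name": record_name,
            "Type": "CNAME",
            "TTL": ttl,
            "ResourceRecords": [{"Value": lb_dns_name}],
        }
    # UPSERT creates the record, or replaces it if one with this name and type exists.
    return {"Changes": [{"Action": "UPSERT", "ResourceRecordSet": record_set}]}


def apply_change_batch(route53_client, your_hosted_zone_id, change_batch):
    """Submit the change; route53_client is e.g. boto3.client("route53")."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=your_hosted_zone_id,
        ChangeBatch=change_batch,
    )
```

Note that alias records carry no TTL of their own, which is why only the CNAME branch sets one.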

Configuring SSL

Because Beiwe often handles sensitive data potentially covered by HIPAA regulations, it is important to add an SSL certificate so that web traffic is encrypted with HTTPS. This is so important that Beiwe will not run without one except in development environments.

You can use your own SSL certificate, one provided by your organization, or a free SSL certificate from AWS Certificate Manager (ACM). If using ACM, request an SSL certificate for your deployment through the ACM web interface (see the ACM documentation). ACM will check that you control the domain by sending verification emails to the email addresses in the domain's WHOIS listing, so make sure you can receive mail sent to at least one of those addresses.
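If you use ACM, the request can also be made programmatically. This is a sketch that assumes boto3 is installed; create the certificate in the same region as your Load Balancer, since a load balancer can only use ACM certificates from its own region. The helper names are hypothetical, and ValidationMethod="EMAIL" matches the email-based check described above.

```python
def make_acm_client(region):
    """Create a boto3 ACM client in the Load Balancer's region."""
    import boto3  # imported here so the other helpers work without boto3 installed

    return boto3.client("acm", region_name=region)


def request_certificate(acm_client, domain):
    """Request a certificate for domain and return its ARN."""
    response = acm_client.request_certificate(
        DomainName=domain,
        ValidationMethod="EMAIL",  # ACM emails the domain's contacts for approval
    )
    return response["CertificateArn"]


def certificate_status(acm_client, certificate_arn):
    """Return the certificate's status, e.g. "PENDING_VALIDATION" or "ISSUED"."""
    response = acm_client.describe_certificate(CertificateArn=certificate_arn)
    return response["Certificate"]["Status"]
```

After someone approves the verification email, certificate_status should eventually return "ISSUED"; you can then attach the certificate to the Load Balancer's HTTPS listener.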
