mini-school/infra-1
Tutor Open edX Production Devops Tools


This repository contains Terraform code and Github Actions workflows to deploy and manage a Tutor Kubernetes-managed production installation of Open edX that will automatically scale up, reliably supporting several hundred thousand learners.

The Terraform scripts in this repo provide a 1-click means of creating, updating and destroying the complete collection of AWS resources for each environment (itemized below under "About The Open edX Platform Back End").

You can also optionally create additional environments, such as dev, test and QA, each with its own subdomain of the root domain.
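Given the `environment.root_domain` naming convention used throughout this README, the resulting environments would look something like this (hypothetical subdomains shown for illustration):

```
prod (courses) → courses.schoddle.com
dev            → dev.schoddle.com
test           → test.schoddle.com
qa             → qa.schoddle.com
```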

Cookiecutter Manifest

This repository was generated using Cookiecutter. Keep your repository up to date with the latest Terraform code and the current versions of the Open edX application stack, AWS infrastructure services and API code libraries by occasionally re-generating it from the Cookiecutter template using this Makefile.

Cookiecutter Version Control

| Software | Version |
| -------- | ------- |
| Open edX Named Release | nutmeg.1 |
| MySQL Server | 5.7.33 |
| Redis Cache | 6.x |
| Tutor Docker-based Open edX Installer | latest stable |
| Tutor Plugin: Object storage for Open edX with S3 | v0.2.2 |
| Tutor Plugin: Backup & Restore | v0.0.6 |
| Tutor Plugin: Credentials Application | v13.0.2 |
| Tutor Plugin: Discovery Service | latest stable |
| Tutor Plugin: Micro Front-end Service | latest stable |
| Tutor Plugin: Ecommerce Service | latest stable |
| Tutor Plugin: Xqueue Service | latest stable |
| Tutor Plugin: Notes Service | latest stable |
| Tutor Plugin: Discussion Forum Service | latest stable |
| Tutor Plugin: Android Application | latest stable |
| Kubernetes Cluster | 1.22 |
| Terraform | ~> 1.1 |
| terraform-aws-modules/acm | ~> 3.4 |
| terraform-aws-modules/cloudfront | ~> 2.9 |
| terraform-aws-modules/eks | ~> 18.21 |
| terraform-aws-modules/iam | ~> 5.0.0 |
| terraform-aws-modules/rds | ~> 4.3.0 |
| terraform-aws-modules/s3-bucket | ~> 3.2 |
| terraform-aws-modules/security-group | ~> 4.9 |
| terraform-aws-modules/vpc | ~> 3.14 |
| Terraform Helm cert-manager | ~> 1.8 |
| Terraform Kubernetes Provider | ~> 2.11 |
| Terraform AWS Provider | ~> 4.15 |
| Terraform Local Provider | ~> 2.2 |
| Terraform Random Provider | ~> 3.2 |

Important Considerations

  • this code only works for AWS.
  • the root domain schoddle.com must be hosted in AWS Route53. Terraform will create several DNS entries inside of this hosted zone, and it will optionally create additional hosted zones (one for each additional optional environment) that will be linked to the hosted zone of your root domain.
  • resources are deployed to this AWS region: ap-south-1
  • the Github Actions workflows depend on repository secrets, located in this repository under Settings → Secrets → Actions (see 'secrets/actions' in the left menu bar)
  • the Github Actions use an AWS IAM key pair from a manually-created IAM user named ci
  • the collection of resources created by these scripts will generate AWS costs of around $0.41 USD per hour (roughly $10.00 USD per day) while the platform is in a mostly-idle pre-production state. This cost will grow proportionally to your production workloads. You can monitor these costs from the AWS Billing dashboard
  • BE ADVISED that MySQL RDS, MongoDB and Redis ElastiCache are vertically scaled manually and therefore require some insight and potential adjustments on your part. All of these services are defaulted to their minimum instance sizes which you can modify in the environment configuration file
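As a sanity check on the cost figures above, the daily rate follows directly from the quoted hourly rate (the monthly figure is an extrapolation of mine, not from this README):

```python
# Extrapolate idle-state AWS costs from the ~$0.41 USD/hour figure quoted above.
hourly = 0.41
daily = hourly * 24    # matches the roughly $10.00 USD per day quoted above
monthly = daily * 30   # simple 30-day-month extrapolation

print(f"${daily:.2f}/day, ${monthly:.2f}/month")  # → $9.84/day, $295.20/month
```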

Quick Start

I. Add Your Secret Credentials To This Repository

The Github Actions workflows in this repository depend on several workflow secrets including two sets of AWS IAM keypairs, one for CI workflows and another for the AWS Simple Email Service. Additionally, they require a Github Personal Access Token (PAT) for a Github user account with all requisite privileges in this repository as well as any other repositories that are cloned during any of the build / installation pipelines.

Github Repository Secrets

II. Configure Your Open edX Back End

Set your global parameters

locals {
  platform_name    = "schoddle"
  platform_region  = "global"
  root_domain      = "schoddle.com"
  aws_region       = "ap-south-1"
  account_id       = "108973625715"
}

Set your production environment parameters

locals {
  environment           = "courses"
  environment_domain    = "${local.environment}.${local.global_vars.locals.root_domain}"
  environment_namespace = "${local.environment}-${local.global_vars.locals.platform_name}-${local.global_vars.locals.platform_region}"

  # AWS infrastructure sizing
  mysql_instance_class           = "db.t2.small"
  redis_node_type                = "cache.t2.small"
  eks_worker_group_instance_type = "t3.large"
}
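To make the interpolation above concrete, here is what those locals evaluate to, reproduced in plain Python for illustration (using the root domain and platform name from the global parameters):

```python
# Reproduce the Terragrunt string interpolation from the locals blocks above.
platform_name   = "schoddle"
platform_region = "global"
root_domain     = "schoddle.com"
environment     = "courses"

environment_domain    = f"{environment}.{root_domain}"
environment_namespace = f"{environment}-{platform_name}-{platform_region}"

print(environment_domain)     # → courses.schoddle.com
print(environment_namespace)  # → courses-schoddle-global
```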

III. Build Your Open edX Backend

The backend build procedure is automated using Terragrunt for Terraform. Installation instructions are available at both of these web sites.

Terraform scripts rely on the AWS CLI (Command Line Interface) Tools. Installation instructions for Windows, macOS and Linux are available on this site. We also recommend that you install k9s, a popular tool for administering a Kubernetes cluster.

# -------------------------------------
# to build the entire backend
# -------------------------------------
cd ./terraform/environments/prod/vpc
terragrunt run-all init
terragrunt run-all apply

# -------------------------------------
# or, to manage an individual resource
# -------------------------------------
cd ./terraform/environments/prod/mongodb
terragrunt init
terragrunt validate
terragrunt plan
terragrunt apply
terragrunt destroy


IV. Connect To Your Backend Services

Terraform creates friendly subdomain names for each of the backend services to which you are likely to connect: Cloudfront, MySQL, Mongo and Redis. The ssh private pem key for accessing the EC2 bastion instance is stored in Kubernetes Secrets in the openedx namespace. Additionally, passwords for the root/admin accounts are accessible from Kubernetes Secrets. Note that MySQL, MongoDB and Redis each reside in a private subnet; these services can only be accessed from the command line via the bastion.

ssh bastion.courses.schoddle.com -i path/to/schoddle-global-live-bastion.pem

mysql -h mysql.courses.schoddle.com -u root -p

mongo --port 27017 --host mongo.master.courses.schoddle.com -u root -p
mongo --port 27017 --host mongo.reader.courses.schoddle.com -u root -p

redis-cli -h redis.primary.courses.schoddle.com -p 6379
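Because the root/admin passwords live in Kubernetes Secrets (see above), you will typically retrieve them with kubectl, and Kubernetes returns secret data base64-encoded. A minimal decoding sketch, using a made-up example value rather than a real secret:

```python
import base64

# Secret data returned by kubectl is base64-encoded; this value is made up.
encoded = "c3VwZXItc2VjcmV0LXBhc3N3b3Jk"
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # → super-secret-password
```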

Specifically with regard to MySQL, several 3rd party analytics tools provide out-of-the-box connectivity to MySQL via a bastion server. Following is an example of how to connect to your MySQL environment using MySQL Workbench.

Connecting to MySQL Workbench

V. Add more Kubernetes admins

By default your AWS IAM user account will be the only user who can view, interact with and manage your new Kubernetes cluster. Other IAM users, even those with admin permissions, will still need to be explicitly added to the list of cluster admins. If you're new to Kubernetes then you'll find detailed technical how-to instructions in the AWS EKS documentation, Enabling IAM user and role access to your cluster. You'll need kubectl in order to modify the aws-auth configMap in your Kubernetes cluster.

kubectl edit -n kube-system configmap/aws-auth

Following is an example aws-auth configMap with additional IAM user accounts added to the admin "masters" group.

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::012345678942:role/default-eks-node-group-20220518182244174100000002
      username: system:node:{{EC2PrivateDNSName}}
  mapUsers: |
    - groups:
      - system:masters
      userarn: arn:aws:iam::012345678942:user/lawrence.mcdaniel
      username: lawrence.mcdaniel
kind: ConfigMap
metadata:
  creationTimestamp: "2022-05-18T18:38:29Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "499488"
  uid: 52d6e7fd-01b7-4c80-b831-b971507e5228

Continuous Integration (CI)

Both the Build as well as the Deploy workflows were pre-configured based on your responses to the Cookiecutter questionnaire. Look for these two files in .github/workflows. You'll find additional Open edX deployment and configuration files in ci/tutor-build and ci/tutor-deploy.

I. Build your Tutor Docker Image

Use this automated Github Actions workflow to build a customized Open edX Docker container based on the latest stable version of Open edX (currently nutmeg.1), your Open edX custom theme repository, and your Open edX plugin repository. Your new Docker image will be automatically uploaded to Amazon Elastic Container Registry (ECR).

II. Deploy your Docker Image to a Kubernetes Cluster

Use this automated Github Actions workflow to deploy your customized Docker container to a Kubernetes Cluster. Open edX LMS and Studio configuration parameters are located here.

About The Open edX Platform Back End

The scripts in the terraform folder provide 1-click functionality to create and manage all resources in your AWS account. These scripts generally follow current best practices for implementing a large Python Django web platform like Open edX in a secure, cloud-hosted environment. Besides reducing human error, there are other tangible improvements to managing your cloud infrastructure with Terraform as opposed to creating and managing your cloud infrastructure resources manually from the AWS console. For example, all AWS resources are systematically tagged, which in turn facilitates the use of CloudWatch, consolidated logging, and AWS billing expense reporting.
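The systematic tagging mentioned above is commonly implemented with the AWS provider's default_tags block, sketched here with illustrative tag keys (the actual tags applied by this repo's modules may differ):

```hcl
# Illustrative only: apply a common set of tags to every AWS resource.
provider "aws" {
  region = "ap-south-1"

  default_tags {
    tags = {
      Platform    = "schoddle"
      Environment = "prod"
      Terraform   = "true"
    }
  }
}
```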

These scripts will create the following resources in your AWS account:

  • Compute Cluster. uses AWS EC2 behind a Classic Load Balancer.
  • Kubernetes. Uses AWS Elastic Kubernetes Service to implement a Kubernetes cluster onto which all applications and scheduled jobs are deployed as pods.
  • MySQL. uses AWS RDS for all MySQL data, accessible inside the vpc as mysql.courses.schoddle.com:3306. Instance size settings are located in the environment configuration file, and other common configuration settings are located here. Passwords are stored in Kubernetes Secrets accessible from the EKS cluster.
  • MongoDB. uses AWS DocumentDB for all MongoDB data, accessible inside the vpc as mongodb.master.courses.schoddle.com:27017 and mongodb.reader.courses.schoddle.com. Instance size settings are located in the environment configuration file, and other common configuration settings are located here. Passwords are stored in Kubernetes Secrets accessible from the EKS cluster.
  • Redis. uses AWS ElastiCache for all Django application caches, accessible inside the vpc as cache.courses.schoddle.com. Instance size settings are located in the environment configuration file. This is necessary in order to make the Open edX application layer completely ephemeral. Most importantly, users' login session tokens are persisted in Redis and so these need to be accessible to all app containers from a single Redis cache. Common configuration settings are located here. Passwords are stored in Kubernetes Secrets accessible from the EKS cluster.
  • Container Registry. uses this automated Github Actions workflow to build your tutor Open edX container and then register it in Amazon Elastic Container Registry (Amazon ECR). Uses this automated Github Actions workflow to deploy your container to AWS Amazon Elastic Kubernetes Service (EKS). EKS worker instance size settings are located in the environment configuration file. Note that tutor provides out-of-the-box support for Kubernetes. Terraform leverages Elastic Kubernetes Service to create a Kubernetes cluster onto which all services are deployed. Common configuration settings are located here
  • User Data. uses AWS S3 for storage of user data. This installation makes use of a Tutor plugin to offload object storage from the Ubuntu file system to AWS S3. It creates a public read-only bucket with a name of the form prod-schoddle-global-storage, with write access provided to edxapp so that app-generated static content like user profile images, xblock-generated file content, application badges, e-commerce pdf receipts, instructor grades downloads and so on will be saved to this bucket. This is not only a necessary step for making your application layer ephemeral but it also facilitates the implementation of a CDN (which Terraform implements for you). Terraform additionally implements a completely separate, more secure S3 bucket for archiving your daily data backups of MySQL and MongoDB. Common configuration settings are located here
  • CDN. uses AWS Cloudfront as a CDN, publicly accessible as https://cdn.courses.schoddle.com. Terraform creates Cloudfront distributions for each of your environments. These are linked to the respective public-facing S3 Bucket for each environment, and the requisite SSL/TLS ACM-issued certificate is linked. Terraform also automatically creates all Route53 DNS records of the form cdn.courses.schoddle.com. Common configuration settings are located here
  • Password & Secrets Management uses Kubernetes Secrets in the EKS cluster. Open edX software relies on many passwords and keys, collectively referred to in this documentation simply as "secrets". For all backend services, including all Open edX applications, system account and root passwords are randomly and strongly generated during automated deployment and then archived in EKS' secrets repository. This methodology facilitates routine updates to all of your passwords and other secrets, which is good practice these days. Common configuration settings are located here
  • SSL Certs. Uses AWS Certificate Manager and LetsEncrypt. Terraform creates all SSL/TLS certificates. It uses a combination of AWS Certificate Manager (ACM) as well as LetsEncrypt. Additionally, the ACM certificates are stored in two locations: your aws-region as well as in us-east-1 (as is required by AWS CloudFront). Common configuration settings are located here
  • DNS Management uses AWS Route53 hosted zones for DNS management. Terraform expects to find your root domain already present in Route53 as a hosted zone. It will automatically create additional hosted zones, one per environment for production, dev, test and so on. It automatically adds NS records to your root domain hosted zone as necessary to link the zones together. Configuration data exists within several modules but the highest-level settings are located here
  • System Access uses AWS Identity and Access Management (IAM) to manage all system users and roles. Terraform will create several user accounts with custom roles, one or more per service.
  • Network Design. uses Amazon Virtual Private Cloud (Amazon VPC) based on the AWS account number provided in the global configuration file to take a top-down approach to compartmentalize all cloud resources and to customize the operating environment for your Open edX resources. Terraform will create a new virtual private cloud into which all resources will be provisioned. It creates a sensible arrangement of private and public subnets, network security settings and security groups. See additional VPC documentation here
  • Proxy Access to Backend Services. uses an Amazon EC2 t2.micro Ubuntu instance publicly accessible via ssh as bastion.courses.schoddle.com:22 using the ssh key specified in the global configuration file. For security as well as performance reasons, all backend services like MySQL, Mongo, Redis and the Kubernetes cluster are deployed into their own private subnets, meaning that none of these are publicly accessible. See additional Bastion documentation here. Terraform creates a t2.micro EC2 instance to which you can connect via ssh. In turn, you can connect to services like MySQL via the bastion. Common configuration settings are located here. Note that if you are cost conscious then you could alternatively use AWS Cloud9 to gain access to all backend services.

Fargate Release Notes

Fargate is a serverless compute alternative to EC2 instances. This is an experimental part of the Open edX devops stack. While the Fargate compute service itself is both stable and robust, its integration with Terraform for the purpose of providing the compute layer for a Kubernetes cluster is relatively new, and comes with some headaches. For the avoidance of any doubt, Fargate runs well inside a Kubernetes cluster and is, for the most part, indistinguishable from a traditional EC2 server, aside from the obvious luxury of not needing to directly administer this aspect of the cluster. On the other hand, Terraform's life cycle management of a Kubernetes cluster running Fargate is imperfect. Before you deploy Fargate into a production environment, please consider the following:

Known Issues

  • When using Terraform to create an EKS Kubernetes Cluster configured to use Fargate, the apply operation will fail on your first attempt. See error message below. This is a known issue that is caused by a race condition between coredns and creation of the Fargate node on which it runs. Re-attempting with terragrunt apply resolves the problem.
  • When using Terraform to destroy an EKS Kubernetes Cluster configured to use Fargate instead of EC2, you might experience any of the following:
    1. Terraform fails to destroy some of the IAM roles when destroying the EKS. Each is an eks Service-Linked Role. This is a known bug in the Terraform module.
    2. Terraform fails to destroy one or more of the EKS security groups. This is a known bug in the Terraform module.
  • Terraform fails to destroy the Application Load Balancer ingress. This is due to a dependency problem which I'm still troubleshooting. The temporary resolution is to delete the Terraform file terraform/modules/kubernetes_ingress_alb_controller/ingress.tf and then run terraform apply.
  • Other AWS admin users might lack permissions to view EKS resources in the AWS console, even if they have admin permissions or are logged in as the root account user. This is an AWS issue. I'm working on a set of instructions for configuring permissions for other users.
  • If Terraform is interrupted during execution then it is possible that it will lose track of its state, leading to Terraform attempting to create already-existing resources which will result in run-time errors. This is the expected behavior of Terraform, but it can be a huge pain in the neck to resolve.

Build Error

On your first build attempt you will encounter the following error approximately 30 minutes into the Kubernetes build. This is a known bug caused by a race condition in the coredns installation when it is configured to run on Fargate nodes rather than EC2 instances. Restarting the build resolves the error, and the build should complete normally.

module.eks.aws_eks_addon.this["coredns"]: Still creating... [20m0s elapsed]
╷
│ Error: unexpected EKS Add-On (prod-stepwisemath-mexico:coredns) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'DEGRADED', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.eks.aws_eks_addon.this["coredns"],
│   on .terraform/modules/eks/main.tf line 298, in resource "aws_eks_addon" "this":
│  298: resource "aws_eks_addon" "this" {
│
╵
Releasing state lock. This may take a few moments...
ERRO[1950] 1 error occurred:
  * exit status 1

FAQ

Why Use Tutor?

Tutor is the official Docker-based Open edX distribution, both for production and local development. The goal of Tutor is to make it easy to deploy, customize, upgrade and scale Open edX. Tutor is reliable, fast, extensible, and it is already used to deploy hundreds of Open edX platforms around the world.

  • Runs on Docker
  • 1-click installation and upgrades
  • Comes with batteries included: theming, SCORM, HTTPS, web-based administration interface, mobile app, custom translations…
  • Extensible architecture with plugins
  • Works out of the box with Kubernetes
  • Amazing premium plugins available in the Tutor Wizard Edition, including Cairn the next-generation analytics solution for Open edX.

Why Use Docker?

In a word, Docker is about "Packaging" your software in a way that simplifies how it is installed and managed so that you benefit from fast, consistent delivery of your applications. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. Meanwhile, Docker is an open platform for developing, shipping, and running applications.

For context, any software which you traditionally installed via Linux package managers like apt, snap or yum can alternatively be installed and run as a Docker container. Some examples of software on which an Open edX platform depends: Nginx, MySQL, MongoDB, Redis, and the Open edX application software itself, which Tutor bundles into containers using Docker Compose.

Why Use Kubernetes?

Kubernetes manages Docker containers in a deployment environment. It provides an easy way to scale your application, and is a superior, cost-effective alternative to manually creating and maintaining individual virtual servers for each of your backend services. It keeps code operational, speeds up the delivery process, and enables automating many resource management and provisioning tasks.

Your Open edX platform runs via multiple Docker containers: the LMS Django application, the CMS Django application, one or more Celery-based worker nodes for each of these applications, nginx, Caddy, and any backend services that tutor manages, such as SMTP. Kubernetes creates EC2 instances and then decides where to place each of these containers based on various real-time resource-based factors. This leads to your EC2 instances carrying optimal workloads, all the time. Behind the scenes, Kubernetes (EKS in our case) uses an EC2 Elastic Load Balancer (ELB) with an auto-scaling policy, both of which you can see from the AWS EC2 dashboard.
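Kubernetes makes those placement decisions based on each container's declared resource requests and limits. A minimal pod spec fragment, with illustrative values rather than this repo's actual settings:

```yaml
# Illustrative: the scheduler uses 'requests' to choose a worker node
# with enough spare CPU and memory for this container.
resources:
  requests:
    cpu: "250m"      # a quarter of one vCPU
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
```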

Why Use Terraform?

Terraform allows you to manage the entire lifecycle of your AWS cloud infrastructure using infrastructure as code (IAC). That means declaring infrastructure resources in configuration files that are then used by Terraform to provision, adjust and tear down your AWS cloud infrastructure. There are several tangible benefits to using IAC:

  1. Maintain all of your backend configuration data in a single location. This allows you to take a more holistic, top-down approach to planning and managing your backend resources, which leads to more reliable service for your users.
  2. Leverage git. This is a big deal! Managing your backend as IAC means you can track individual changes to your configuration over time. More importantly, it means you can reverse backend configuration changes that didn't go as planned.
  3. It's top-down and bottom-up. You can start at the network design level and work your way up the stack, taking into consideration factors like security, performance and cost.
  4. More thorough. You see every possible configuration setting for each cloud service. This in turn helps you to consider all aspects of your configuration decisions.
  5. More secure. IAC leads to recurring reviews of software versions and things getting patched when they should. It compels you to regularly think about the ages of your passwords. It makes it easier for you to understand how network concepts like subnets, private networks, CIDRs and port settings are being used across your entire backend.
  6. Saves money. Taking a top-down approach with IAC will lead to you proactively and sensibly sizing your infrastructure, so that you don't waste money on infrastructure that you don't use.
  7. It's what the big guys use. Your Open edX backend contains a lot of complexity, and it provides a view into the far-larger worlds of platforms like Google, Facebook, Tiktok and others. Quite simply, technology stacks have evolved to a point where we no longer have the ability to artisanally manage any one part. That in a nutshell is why major internet platforms have been so quick to adopt tools like Terraform.

Why Use Terragrunt?

Terragrunt is a thin wrapper that provides extra tools for keeping your configurations DRY ("don't repeat yourself"), working with multiple Terraform modules, and managing remote state. This helped a lot with the repeated module configurations used in this architecture.
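In practice, the DRY pattern means each environment folder contains a small terragrunt.hcl that pulls shared settings from a parent folder instead of repeating them. A minimal sketch (the module path is illustrative):

```hcl
# terragrunt.hcl for a single resource in one environment (illustrative path)
include {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules//mysql"
}
```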
