From 4d9116ce27f265312eb3f1a7965236c4ab7b68b3 Mon Sep 17 00:00:00 2001 From: "Jason Crowe (Mozilla)" Date: Mon, 12 Jun 2017 18:51:44 -0400 Subject: [PATCH] Updates for lint checks --- CONTRIBUTING.md | 227 ++-- DEPLOYMENT_OVERVIEW.md | 129 +- FEATURE_REQUESTS.md | 1 + GETTING_STARTED.md | 81 +- GIT_GITHUB.md | 82 +- IMAGES.md | 98 +- MANIFESTO.md | 154 ++- NETWORKING.md | 5 +- PREREQUISITES.md | 252 +++- PROJECT_ONBOARDING.md | 145 ++- PUPPET.md | 107 +- README.md | 82 +- RELEASING.md | 234 +++- SECURITY.md | 1 + TEMPLATING.MD | 82 -- TEMPLATING.md | 200 ++++ VERSIONING.md | 122 +- WALKTHROUGH.md | 46 +- .../IT_Walk_Through_20150601_Links.md | 100 +- reports/20150203.md | 41 +- training/README.md | 363 +++--- training/assumptions.md | 80 +- training/demonstrations.md | 42 +- training/exercise-one.md | 109 +- training/exercise-two.md | 28 +- training/introduction.md | 1030 ++++++++++++---- training/labs/nubis_dpaste.md | 113 +- training/labs/nubis_skel.md | 210 +++- training/nubis-overview.md | 1036 ++++++++++++----- training/operating-principles.md | 678 +++++++---- training/working-labs.md | 39 +- 31 files changed, 4281 insertions(+), 1636 deletions(-) delete mode 100644 TEMPLATING.MD create mode 100644 TEMPLATING.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e457599..a681984 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,179 +1,254 @@ -# Nubis - Contributing + -The Nubis project is an open-source, collaborative project. And anybody is more than welcome to contribute to it. +# Nubis - Contributing + +The Nubis project is an open-source, collaborative project. And anybody is more +than welcome to contribute to it. ## Prerequisites -Before you can contribute to the Nubis project, you'll need to make sure of a few things beforehand. Head over and read the [Prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) doc first. 
+ +Before you can contribute to the Nubis project, you'll need to make sure of a +few things beforehand. Head over and read the [Prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) +doc first. ## Overview -At this point, you should have all the tooling necessary to make changes to Nubis itself. +At this point, you should have all the tooling necessary to make changes to +Nubis itself. -Take the time to read the contents of [nubis-docs](https://github.com/Nubisproject/nubis-docs) where you'll find tons of useful documentation explaining a lot more details about the various parts that make up the Nubis project. +Take the time to read the contents of [nubis-docs](https://github.com/Nubisproject/nubis-docs) +where you'll find tons of useful documentation explaining a lot more details +about the various parts that make up the Nubis project. ### Code standards - - AWS deployments *must* be written in [Terraform](https://www.terraform.io/) - - Exceptions will only be allowed for unsupported resources - - Small utility scripts *must* be written in [Bash](https://www.gnu.org/software/bash/). - - Bash code *should* be POSIX compliant - - Bash code *should* be run through [ShellCheck](http://www.shellcheck.net/) before submittal. - - Tools *must* be written in [Go](https://golang.org/). - - Go code *should* be run through the [go linter](https://github.com/golang/lint). + +* AWS deployments *must* be written in [Terraform](https://www.terraform.io/) + * Exceptions will only be allowed for unsupported resources +* Small utility scripts *must* be written in [Bash](https://www.gnu.org/software/bash/). + * Bash code *should* be POSIX compliant + * Bash code *should* be run through [ShellCheck](http://www.shellcheck.net/) + before submittal. +* Tools *must* be written in [Go](https://golang.org/). + * Go code *should* be run through the [go linter](https://github.com/golang/lint). 
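As a concrete illustration of the Bash standard above, here is a minimal sketch of a POSIX-style utility script; the script name, function, and output are invented for this example and are not part of Nubis:

```bash
#!/bin/sh
# greet.sh - hypothetical example of a small utility script written in the
# POSIX-friendly style the standards above ask for (not a real Nubis tool).
# Lint it before submitting with: shellcheck greet.sh
set -eu

greet() {
  # Quote every expansion and prefer printf over echo for portability.
  printf 'Hello, %s\n' "${1:-world}"
}

greet "${1:-world}"
```

Running such a script through ShellCheck before submittal, as the standard asks, should come back clean.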
### Process

-Independently of what you are trying to achieve, the process should be more or less the same.
+Independently of what you are trying to achieve, the process should be more or
+less the same.

#### File a GitHub issue

-That should always be the first step. You found a bug, you thought of a new feature, you'd like to see something improved, doesn't matter. File an issue with as much information as possible.
+That should always be the first step. You found a bug, you thought of a new
+feature, you'd like to see something improved, doesn't matter. File an issue
+with as much information as possible.

-This will help keep track of the work being done while at the same time giving better visibility to the rest of the Nubis contributors.
+This will help keep track of the work being done while at the same time giving
+better visibility to the rest of the Nubis contributors.

-Issues are required for all bugs, features and enhancement. A pull-request will not be merged if it doesn't close an existing issue.
+Issues are required for all bugs, features and enhancements. A pull-request will
+not be merged if it doesn't close an existing issue.

#### Issue Labels

-Issues labels are standardized across Nubis repositories and are used to classify the type of work:
+Issue labels are standardized across Nubis repositories and are used to
+classify the type of work:

- * bug
+ \* bug

-This is reserved for issues that represent a defect in an existing functionality. Small of big, if something is not behaving as it should, it's a bug.
+This is reserved for issues that represent a defect in existing
+functionality. Small or big, if something is not behaving as it should, it's a
+bug.

- * enhancement
+ \* enhancement

-This is a request for enhancement to an existing feature. It represent something that is not broken, but an opportunity for improvement.
+This is a request for enhancement to an existing feature. It represents something
+that is not broken, but an opportunity for improvement.

- * feature
+ \* feature

- This is a request for a brand new feature. When requesting that Nubis do something it doesn't do before, it's a feature request.
+This is a request for a brand new feature. When requesting that Nubis do
+something it doesn't already do, it's a feature request.

- * docs
+ \* docs

- Issues representing the need to document something, new or old. Improvements to existing documentation, or request for documenting something that currently isn't.
+Issues representing the need to document something, new or old. Improvements to
+existing documentation, or a request for documenting something that currently
+isn't.

- * question
+ \* question

- Issues that are asking a specific question about Nubis. It could be a request to better explain something, or the begging of a discussion about how to approach a certain problem or possible feature.
+Issues that are asking a specific question about Nubis. It could be a request to
+better explain something, or the beginning of a discussion about how to approach
+a certain problem or possible feature.

- * decision
+ \* decision

- Issues marked as questions that result in a concrete decision for the project. This label is used to mark the issue as decided, generally spawning a few more issues for implementation or documentation of what has been decided.
+Issues marked as questions that result in a concrete decision for the project.
+This label is used to mark the issue as decided, generally spawning a few more
+issues for implementation or documentation of what has been decided.

- * upgrade
+ \* upgrade

- Issues reserved for upgrades to external Nubis components included. This could be puppet modules, software packages, etc.
- Generally, these issues will be very log hanging fruits, requiring the bump of a version number somewhere and some testing.
+Issues reserved for upgrades to external components included in Nubis. This could
+be puppet modules, software packages, etc.
+Generally, these issues will be very low-hanging fruit, requiring the bump of a
+version number somewhere and some testing.

- * invalid
+ \* invalid

- Default GitHub label used to close issues that are not going to be adressed or are simply invalid.
+Default GitHub label used to close issues that are not going to be addressed or
+are simply invalid.

- * duplicate
+ \* duplicate

- Default GitHub label used to close an issue as a dupliate of another one.
+Default GitHub label used to close an issue as a duplicate of another one.

#### Issue Milestones

-Milestones are used to track Nubis releases, and **only** repository owners should be allowed to assign them to issues.
+Milestones are used to track Nubis releases, and **only** repository owners
+should be allowed to assign them to issues.

-Each issue that is slated for inclusion in a particular Nubis release will be assigned to the Milestone that corresponds to that release during the triage and planning process.
+Each issue that is slated for inclusion in a particular Nubis release will be
+assigned to the Milestone that corresponds to that release during the triage and
+planning process.

#### Fork the appropriate repository

-No real work should happen directly on the main Nubis repositories. You should be doing things in a personal fork of these repositories. So fork away, if not something you've already done before.
+No real work should happen directly on the main Nubis repositories. You should
+be doing things in a personal fork of these repositories. So fork away, if you
+haven't already done so.

#### Make a branch

This is the GitHub way of working, and it's a sensible one.

-Every single logical self-contained unit of work should live on a branch for it. Name it in a self-explanatory way, as that name will be shared with others.
+Every single logical self-contained unit of work should live on its own branch.
+Name it in a self-explanatory way, as that name will be shared with others.

Examples of good branch names:

- * add-feature-x
- * fix-time-sync-bug
- * improve-documentation-for-strange-feature-x
+* add-feature-x
+* fix-time-sync-bug
+* improve-documentation-for-strange-feature-x

Examples of bad branch names:

- * documentation
- * fix-bug-1234 (what is that bug again?)
- * stuff
- * work-from-2016-03-03
- * username
+* documentation
+* fix-bug-1234 (what is that bug again?)
+* stuff
+* work-from-2016-03-03
+* username

#### Do the work

-Now you get to do what you've been wanting to do. So go ahead and do it. Fix that bug, improve that feature, add this new knob.
+Now you get to do what you've been wanting to do. So go ahead and do it. Fix
+that bug, improve that feature, add this new knob.

-Working in git, commit often, commit soon. But keep in mind that your commit history will possibly be seen and reviewed by others, so keep it tidy if you can.
+Working in git, commit often, commit soon. But keep in mind that your commit
+history will possibly be seen and reviewed by others, so keep it tidy if you
+can.

#### Test the work

-No matter how small your changes, you should at a bare minimum ensure you can still build the image with *nubis-builder* before considering your work done.
+No matter how small your changes, you should at a bare minimum ensure you can
+still build the image with *nubis-builder* before considering your work done.

-Depending on what you are doing, you might want to perform much more in-depth testing, by spinning up the image you are building in AWS and such. But do try and make sure your work does what it meant to achieve, and nothing else. If you stumble on a bug or some documentation you'd like to see fixed, start back at the top, and file an issue for that.
+Depending on what you are doing, you might want to perform much more in-depth
+testing, by spinning up the image you are building in AWS and such. But do try
+and make sure your work does what it is meant to achieve, and nothing else. If
+you stumble on a bug or some documentation you'd like to see fixed, start back
+at the top, and file an issue for that.

#### Submit the work

-The Nubis project uses a sheriff system similar to Mozilla's. This means that we try very hard and assign Sheriffs to each component of Nubis, responsible for reviewing changes to that component.
+The Nubis project uses a sheriff system similar to Mozilla's. This means that we
+try very hard to assign Sheriffs to each component of Nubis, responsible for
+reviewing changes to that component.

-Once you are ready, you should submit a pull-request to the repository you forked, effectively requesting inclusion of your work into Nubis itself.
+Once you are ready, you should submit a pull-request to the repository you
+forked, effectively requesting inclusion of your work into Nubis itself.

-Remember, you are feeding this to another fellow human who will review your work. Take the time to make the pull-request contain what you think would be the best information necessary to make the job of the reviewer easier. Explain what you are doing, if there are tricky bits, point them out, etc.
+Remember, you are feeding this to another fellow human who will review your
+work. Take the time to make the pull-request contain what you think would be the
+best information necessary to make the job of the reviewer easier. Explain what
+you are doing; if there are tricky bits, point them out, etc.

-The only writes to the official Nubis repositories will be the merging of pull-requests. Feature branches are not mandatory, but highly recommended.
+The only writes to the official Nubis repositories will be the merging of
+pull-requests. Feature branches are not mandatory, but highly recommended.

#### Code Review

-Every Pull-Request needs to be reviewed (+1) by at least one committer before being allowed to be merged in. This process is still being formalized, and currently relies on the knoledge and experience of the current members of the project.
+Every Pull-Request needs to be reviewed (+1) by at least one committer before
+being allowed to be merged in. This process is still being formalized, and
+currently relies on the knowledge and experience of the current members of the
+project.

But, code reviews will at a minimum include these:

- * Code needs to be polished and of acceptable quality
- * Code needs to conform to the projet's coding standards and design principles
- * Code needs to follow a reasonably consistent indentation style
- * Code needs to be atomic ( Each Pull-Request should implement one feature or fix one bug )
- * Whitespace/indentation changes should be handled separately, to keep the noise of the review request down.
+* Code needs to be polished and of acceptable quality
+* Code needs to conform to the project's coding standards and design principles
+* Code needs to follow a reasonably consistent indentation style
+* Code needs to be atomic (each Pull-Request should implement one feature or
+  fix one bug)
+* Whitespace/indentation changes should be handled separately, to keep the noise
+  of the review request down.

-Conversation on the pull-requests is encouraged to improve the qualiy of the request prior to merging it. Anybody is welcome to add feedback and/or questions to open pull-requests.
+Conversation on the pull-requests is encouraged to improve the quality of the
+request prior to merging it. Anybody is welcome to add feedback and/or questions
+to open pull-requests.

-It is the responsability of the submitter of the pull-request to address the issues raised during the review, if they want to see their pull-request successfully merged.
+It is the responsibility of the submitter of the pull-request to address the
+issues raised during the review, if they want to see their pull-request
+successfully merged.
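The fork/branch/pull-request flow described above is plain git. As a rough sketch, exercised in a throwaway local repository since no real Nubis remote is assumed:

```bash
#!/bin/sh
# Sketch of the branch-per-change workflow described above, using a
# disposable local repository; names and paths are placeholders.
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"

# Simulate the state of your fork's master branch.
printf '# demo\n' > README.md
git add README.md
git commit -qm "Initial commit"

# One self-contained unit of work gets its own descriptively named branch.
git checkout -qb fix-time-sync-bug
printf 'fix\n' > fix.txt
git add fix.txt
git commit -qm "Fix time sync bug"

# This branch is what you would push and open a pull-request from.
git branch --show-current
```

Pushing that branch to your fork and opening a pull-request against the upstream repository completes the flow, and review feedback can be addressed with further commits to the same branch.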
-Code is written by people, but it's important to remember that reviews are about the work, not the person. Stay civil and polite, and remember that it's an evaluation of the code and how to help include it in the project.
+Code is written by people, but it's important to remember that reviews are about
+the work, not the person. Stay civil and polite, and remember that it's an
+evaluation of the code and how to help include it in the project.

-The objective of any code review, for the reviewer, should be focused on helping the submitter to get his work included into the project.
+The objective of any code review, for the reviewer, should be focused on helping
+the submitter to get their work included into the project.

#### Repeat

-At this point, it's almost done. Be prepared for possibly some back and forth with the reviewer. There might be questions about bits of code, for instance.
+At this point, it's almost done. Be prepared for possibly some back and forth
+with the reviewer. There might be questions about bits of code, for instance.

-Or there might be requests for changes or fixes to your work. In that case, it's safe to do that work back on that same branch and they will be added to the pull-request.
+Or there might be requests for changes or fixes to your work. In that case, it's
+safe to do that work back on that same branch, and the new commits will be added
+to the pull-request.

-Once the review completes successfully, your branch will be merged back into the master branch of the Nubis repository, and your work will be included in the next official image builds.
+Once the review completes successfully, your branch will be merged back into the
+master branch of the Nubis repository, and your work will be included in the
+next official image builds.

## Committers

-Each repository has a Commiter Team, whose members are allowed to merge pull-requests into the repository.
+Each repository has a Committer Team, whose members are allowed to merge
+pull-requests into the repository.

-*Note*: It's bad practice to merge your own pull-requests, as that defeats the review process.
+*Note*: It's bad practice to merge your own pull-requests, as that defeats the
+review process.

-Adding/Removing a member to that Team is the result of a majority vote among existing members.
+Adding/Removing a member from that Team is the result of a majority vote among
+existing members.

(Committer Agreement and agreement to follow established processes, etc.)

-Each Team also has an appointed Technical Lead that holds a tie-breaking vote on that repository.
+Each Team also has an appointed Technical Lead who holds a tie-breaking vote on
+that repository.

-Commit access is a privilege, not a right. It's is earned by one's contributions and the quality of the work produced. It's all about the quality and health of the project, nothing less, nothing more.
+Commit access is a privilege, not a right. It is earned by one's contributions
+and the quality of the work produced. It's all about the quality and health of
+the project, nothing less, nothing more.

## Contact

* IRC: ```#nubis-users``` on `irc.mozilla.org`
-* General mailing list: [Google Groups] (https://groups.google.com/forum/#!forum/nubis-users)
-* Developer mailing list: [Google Groups] (https://groups.google.com/forum/#!forum/nubis-dev)
+* General mailing list: [Google Groups](https://groups.google.com/forum/#!forum/nubis-users)
+* Developer mailing list: [Google Groups](https://groups.google.com/forum/#!forum/nubis-dev)

## TODO: More concrete examples

+
* Bug Fixes
* Improvements
* New base features
diff --git a/DEPLOYMENT_OVERVIEW.md b/DEPLOYMENT_OVERVIEW.md
index ed73743..9372139 100644
--- a/DEPLOYMENT_OVERVIEW.md
+++ b/DEPLOYMENT_OVERVIEW.md
@@ -1,61 +1,92 @@
-# Deployment Overview
-A Nubis Account Deployment consists of a number of standard services and security integrations. This document provides and overview of the account and services provided. Each service is self-contained and links are provided to each services' documentation which details that specific service.
+
+
+# Deployment Overview
+
+A Nubis Account Deployment consists of a number of standard services and
+security integrations. This document provides an overview of the account and
+services provided. Each service is self-contained and links are provided to each
+service's documentation, which details that specific service.

## Nubis Account Diagram
+
![Nubis Account Diagram](media/Nubis_Account_Diagram.png "Nubis Account Diagram")

-NOTE: Details for the deployment including; naming conventions, relationships, permissions, etcetera, can be found in the [Terraform template](https://github.com/nubisproject/nubis-deploy/blob/master/main.tf) used for deployment.
+NOTE: Details for the deployment, including naming conventions, relationships,
+permissions, etcetera, can be found in the [Terraform template](https://github.com/nubisproject/nubis-deploy/blob/master/main.tf)
+used for deployment.

### Services Provided
+
This is a list of all of the services available in a Nubis Account.
**TODO**: Add missing documentation links - - [VPC](https://github.com/nubisproject/nubis-deploy/blob/master/README.md#vpc-deployment) - - [Consul](https://github.com/nubisproject/nubis-consul/blob/master/README.md#consul-deployment) - - [Jumphost](https://github.com/nubisproject/nubis-jumphost/blob/master/README.md#jumphost-deployment) - - [Fluent](https://github.com/nubisproject/nubis-fluent-collector/blob/master/README.md#fluent-deployment) - - [Opsec / CloudTrail](https://github.com/nubisproject/nubis-deploy/blob/master/README.md#opsec-deployment) - - [CI](https://github.com/nubisproject/nubis-ci/blob/master/README.md#ci-deployment) - - [VPN](https://github.com/nubisproject/nubis-deploy/blob/master/README.md#vpc-deployment) - - User Management - - [NAT / Proxy](https://github.com/nubisproject/nubis-nat/blob/master/README.md#nat-deployment) - - [Prometheus](https://github.com/nubisproject/nubis-prometheus/blob/master/README.md#prometheus-deployment) - - [ELK](https://github.com/nubisproject/nubis-fluent-collector/blob/master/README.md#deployment-notes) - -It is important to note that not all services are deployed in every account. To determine which services are deployed in a specific account you will need to consult the deployment configuration file for that account. For example, you can find the configuration files for the Nubis' Teams accounts in the [nubis-accounts-nubis](https://github.com/nubisproject/nubis-accounts-nubis) repository. - -Within each configuration file are a set of feature flags, these flags are used to enable or disable specific services and are discussed [below](#feature-flags). 
+* [VPC](https://github.com/nubisproject/nubis-deploy/blob/master/README.md#vpc-deployment)
+* [Consul](https://github.com/nubisproject/nubis-consul/blob/master/README.md#consul-deployment)
+* [Jumphost](https://github.com/nubisproject/nubis-jumphost/blob/master/README.md#jumphost-deployment)
+* [Fluent](https://github.com/nubisproject/nubis-fluent-collector/blob/master/README.md#fluent-deployment)
+* [Opsec / CloudTrail](https://github.com/nubisproject/nubis-deploy/blob/master/README.md#opsec-deployment)
+* [CI](https://github.com/nubisproject/nubis-ci/blob/master/README.md#ci-deployment)
+* [VPN](https://github.com/nubisproject/nubis-deploy/blob/master/README.md#vpc-deployment)
+* User Management
+* [NAT / Proxy](https://github.com/nubisproject/nubis-nat/blob/master/README.md#nat-deployment)
+* [Prometheus](https://github.com/nubisproject/nubis-prometheus/blob/master/README.md#prometheus-deployment)
+* [ELK](https://github.com/nubisproject/nubis-fluent-collector/blob/master/README.md#deployment-notes)
+
+It is important to note that not all services are deployed in every account. To
+determine which services are deployed in a specific account you will need to
+consult the deployment configuration file for that account. For example, you can
+find the configuration files for the Nubis team's accounts in the
+[nubis-accounts-nubis](https://github.com/nubisproject/nubis-accounts-nubis) repository.
+
+Within each configuration file is a set of feature flags, which are used
+to enable or disable specific services and are discussed [below](#feature-flags).

### Decryption Keys

-You will need your GPG key added to the authorized configuration to view these files. [Git-crypt](https://github.com/AGWA/git-crypt) is used to manage encrypting the files. You will need to contact the team responsible for the deployment repository to gain decryption abilities.
+
+You will need your GPG key added to the authorized configuration to view these
+files. [Git-crypt](https://github.com/AGWA/git-crypt) is used to manage encrypting
+the files. You will need to contact the team responsible for the deployment
+repository to gain decryption abilities.

To determine which team to contact, you will need to:

- - Log into ServiceNow (The Hub)
- - You will need the 'Amazon Web Services (AWS)' module enabled (Yellow Arrow)
- - Select 'AWS Assets' (Red Circle)
- - Locate the account by name or number (Purple Arrows)
- - Locate the 'Account Email Address' (Sea-Green Square)
- - Send an email to the address requesting access
-**NOTE:** Only Nubis project accounts contain 'nubis' in the name. Application accounts are named after the deployed application.
+* Log into ServiceNow (The Hub)
+* You will need the 'Amazon Web Services (AWS)' module enabled (Yellow Arrow)
+* Select 'AWS Assets' (Red Circle)
+* Locate the account by name or number (Purple Arrows)
+* Locate the 'Account Email Address' (Sea-Green Square)
+* Send an email to the address requesting access
+
+**NOTE:** Only Nubis project accounts contain 'nubis' in the name. Application
+accounts are named after the deployed application.

![Service Now Screenshot](media/Service_Now_Screenshot.png "Service Now Screenshot")

### Security Integrations

-There are a number of security integrations deployed into a Nubis Account. These are not available via feature flags and are always deployed in an account. Note that specific services contain additional security integrations which are detailed with the documentation for the service.
+
+There are a number of security integrations deployed into a Nubis Account. These
+are not available via feature flags and are always deployed in an account. Note
+that specific services contain additional security integrations, which are
+detailed with the documentation for the service.
**TODO**: List security integrations

- - SSH security group
- - MIG
- - NSM
- - IP Blocklist
- - HTTP(S) Proxy
- - Cloud Trail
- -
+
+* SSH security group
+* MIG
+* NSM
+* IP Blocklist
+* HTTP(S) Proxy
+* Cloud Trail
+* ?

### Feature Flags
+
-Within the account deployment variables file are a number of feature flags. These flags are used to select which services to deploy into the account. For a complete list of services depoyed into a particular account you will need to consult that accounts variables file. Here is an example of some of the feature flags available:
+Within the account deployment variables file are a number of feature flags.
+These flags are used to select which services to deploy into the account. For a
+complete list of services deployed into a particular account you will need to
+consult that account's variables file. Here is an example of some of the feature
+flags available:

```bash
features.consul = 1
@@ -69,14 +100,32 @@
features.user_management_consul = 0
```

## Deployment Workflow
+
-All deployment methods use Terraform as the descriptive language. The process varies somewhat depending on weather you are deploying a Nubis account or an Application in a Nubis account.
+All deployment methods use Terraform as the descriptive language. The process
+varies somewhat depending on whether you are deploying a Nubis account or an
+Application in a Nubis account.

### Account Deployment Workflow
+
-Account deployments are quite simple in practice. To deploy an account you need access to the encrypted variables file discussed above. You also need to have admin (*.*) privileges in AWS. The actual deployment is accomplished with Terraform and is described in greater detail in the [nubis-deployment repository](https://github.com/nubisproject/nubis-deploy/blob/master/README.md).
+Account deployments are quite simple in practice. To deploy an account you need
+access to the encrypted variables file discussed above. You also need to have
+admin (*.*) privileges in AWS. The actual deployment is accomplished with
+Terraform and is described in greater detail in the
+[nubis-deployment repository](https://github.com/nubisproject/nubis-deploy/blob/master/README.md).

### Application Deployment Workflow
+
-Application deployment is a bit more complex. If you are working in a Sandbox account that you will likely be using a manual process utilizing Terraform. When working in a production account the deployment is automated by using Jenkins as the continuous integration (CI) system
-The CI system monitors the application's deployment repository hosted in git, typically GitHub. When a change lands in the repository, CI triggers AMI builds and (if successful) deploys the new image into the Stage environment (VPC). Deployment to production is typically triggered manually, through the CI system. The Prod deployment does not build an AMI, instead it uses the latest successfully built AMI from the Stage environment. This helps to ensure that only working AMIs are deployed into production.
+Application deployment is a bit more complex. If you are working in a Sandbox
+account then you will likely be using a manual process utilizing Terraform. When
+working in a production account the deployment is automated by using Jenkins as
+the continuous integration (CI) system.
+
+The CI system monitors the application's deployment repository hosted in git,
+typically GitHub. When a change lands in the repository, CI triggers AMI builds
+and (if successful) deploys the new image into the Stage environment (VPC).
+Deployment to production is typically triggered manually, through the CI system.
+The Prod deployment does not build an AMI; instead it uses the latest
+successfully built AMI from the Stage environment. This helps to ensure that
+only working AMIs are deployed into production.
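To make the `features.* = <0|1>` flag format from the example above concrete, here is a small shell sketch; the file name and helper function are invented for illustration and are not part of the Nubis tooling (Terraform itself consumes the account's variables file):

```bash
#!/bin/sh
# Sketch: reading the "features.<name> = <0|1>" format shown above.
# example-flags.tfvars and flag_value are hypothetical names.
set -eu
cd "$(mktemp -d)"

cat > example-flags.tfvars <<'EOF'
features.consul = 1
features.jumphost = 1
features.user_management_consul = 0
EOF

# Print the value of one feature flag from the file.
flag_value() {
  awk -F'=' -v want="features.$1" '
    { key = $1; gsub(/ /, "", key)
      if (key == want) { val = $2; gsub(/ /, "", val); print val } }
  ' example-flags.tfvars
}

flag_value consul                  # prints 1
flag_value user_management_consul  # prints 0
```

This only demonstrates the file format; the actual enabling and disabling of services is done by the Terraform deployment reading these flags.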
-![Application Deployment Workflow](media/Application_Deployment_Workflow.png "Application Deployment Workflow") +![Application Deployment Workflow](media/Application_Deployment_Workflow.png "Flow") diff --git a/FEATURE_REQUESTS.md b/FEATURE_REQUESTS.md index ad4d99e..690b39e 100644 --- a/FEATURE_REQUESTS.md +++ b/FEATURE_REQUESTS.md @@ -1 +1,2 @@ + # Nubis - Feature Requests diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md index 159828d..adaa454 100644 --- a/GETTING_STARTED.md +++ b/GETTING_STARTED.md @@ -1,43 +1,78 @@ -## Getting started with the Nubis Project -Welcome to the Nubis Project. We hope you will find that it meets your requirements and is easy to use. In this document I will introduce you to the Nubis Project and give you a number of links to other documents that will help you along. + -The Nubis Project is in essence a framework for deploying applications to the cloud. At this time we support only Amazon Web Services (AWS). For an overview of our design principles I recommend you read our [manifesto](https://github.com/Nubisproject/nubis-docs/blob/master/MANIFESTO.md). +# Getting started with the Nubis Project -### Familiarize yourself with the Nubis Project -Now, to get you up to speed with everything you will need to know to use the Nubis Project, I will provide for you a reading list. Not to worry, while this list looks long, most of the documents are quite short. -* [Nubis Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md) will give you an outline of the pieces of the project. -* [Git & GitHub](https://github.com/Nubisproject/nubis-docs/blob/master/GIT_GITHUB.md) provides some advice specific to Nubis. -* [CloudFormation](https://github.com/Nubisproject/nubis-docs/blob/master/CLOUDFORMATION.md) walks through some recomendations on structure and workflow. -* [Prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) will get you set up with all the necessary tools. 
-* [Project Onbording](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md) will guide you through onboarding your first application. +Welcome to the Nubis Project. We hope you will find that it meets your +requirements and is easy to use. In this document I will introduce you to the +Nubis Project and give you a number of links to other documents that will help +you along. -### Deployment -Now that you are familiar with the project and the process, it is time to get coding. The first step is to assemble your deployment repository. Then it will be time to deploy into the sandbox. +The Nubis Project is in essence a framework for deploying applications to the +cloud. At this time we support only Amazon Web Services (AWS). For an overview +of our design principles I recommend you read our [manifesto](https://github.com/Nubisproject/nubis-docs/blob/master/MANIFESTO.md). -As we have seen in various examples throughout these documents, you will need to create a deployment repository. Take a look at the [System Overview](link) document for details. +## Familiarize yourself with the Nubis Project -Once your repository is all set up the next step it to deploy into the sandbox. You can deploy following the procedures outlined in the [Project Onbording](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md#Application Build Out) doc. Some example commands can be found in our trusty [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/README.md) repository. +Now, to get you up to speed with everything you will need to know to use the +Nubis Project, I will provide for you a reading list. Not to worry, while this +list looks long, most of the documents are quite short. +* [Nubis Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md) + will give you an outline of the pieces of the project. 
+* [Git & GitHub](https://github.com/Nubisproject/nubis-docs/blob/master/GIT_GITHUB.md)
+  provides some advice specific to Nubis.
+* [CloudFormation](https://github.com/Nubisproject/nubis-docs/blob/master/CLOUDFORMATION.md)
+  walks through some recommendations on structure and workflow.
+* [Prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md)
+  will get you set up with all the necessary tools.
+* [Project Onboarding](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md)
+  will guide you through onboarding your first application.
-### Bugs, Contributions and more
-We are super excited to have you here! If you have stumbled into an issue there are several ways to address it.
+## Deployment
-First, you can fix the issue yourself and file a pull request. You will find a guide in our [Contributing Doc](https://github.com/Nubisproject/nubis-docs/blob/master/CONTRIBUTING.md).
+Now that you are familiar with the project and the process, it is time to get
+coding. The first step is to assemble your deployment repository. Then it will
+be time to deploy into the sandbox.
-Next, you can file an issue. Simply navigate to the Nubis Project space on Github [here](https://github.com/Nubisproject), select the appropriate repository and click on the issues link. For example, to file an issue against nubis-stacks you would go [here](https://github.com/Nubisproject/nubis-stacks/issues)
+As we have seen in various examples throughout these documents, you will need to
+create a deployment repository. Take a look at the [System Overview](link)
+document for details.
-Finally if you are looking for a new feature to be supported, simply follow the [Feature Requests](https://github.com/Nubisproject/nubis-docs/blob/master/FEATURE_REQUESTS.md) guide.
+Once your repository is all set up the next step is to deploy into the sandbox.
+You can deploy following the procedures outlined in the [Project Onboarding](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md#application-build-out)
+doc. Some example commands can be found in our trusty [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/README.md)
+repository.
+
+## Bugs, Contributions and more
+
+We are super excited to have you here! If you have stumbled into an issue there
+are several ways to address it.
+
+First, you can fix the issue yourself and file a pull request. You will find a
+guide in our [Contributing Doc](https://github.com/Nubisproject/nubis-docs/blob/master/CONTRIBUTING.md).
+
+Next, you can file an issue. Simply navigate to the Nubis Project space on
+GitHub [here](https://github.com/Nubisproject), select the appropriate
+repository and click on the issues link. For example, to file an issue against
+nubis-stacks you would go [here](https://github.com/Nubisproject/nubis-stacks/issues).
+
+Finally, if you are looking for a new feature to be supported, simply follow the
+[Feature Requests](https://github.com/Nubisproject/nubis-docs/blob/master/FEATURE_REQUESTS.md)
+guide.
 ---
+
 ## TODO
+
 Document these things
+
 * set up git repo
-  * add nubis directory
+  * add nubis directory
   * link to structure doc
-  * discuss packer and nubis-builder
-  * discuss packers use of puppet
+  * discuss packer and nubis-builder
+  * discuss packer's use of puppet
 * describe cloudformation template system
-  * link to cloudformation layout doc?
+  * link to cloudformation layout doc?
 * discuss what is and is not appropritae to place in the bin directory
 * walk through deployment of application
 * need to link to set up for Nubis doc (set up aws, git, github, etc...)
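The deployment-repository assembly described above can be sketched as a few
shell commands. This is a minimal sketch under stated assumptions: the
repository name (`my-app`) and the submodule URL are illustrative, and the
top-level `nubis` folder layout (puppet, builder, cloudformation) follows the
convention these docs describe; see the nubis-mediawiki repository for a real
example.

```shell
# Minimal sketch of assembling a deployment repository.
# "my-app" is an illustrative name, not a real project.
mkdir my-app && cd my-app
git init --quiet

# The automation tools expect Nubis files under a top-level "nubis" folder,
# typically containing puppet, builder and cloudformation directories.
mkdir -p nubis/puppet nubis/builder nubis/cloudformation

# Application code can live directly in this repository, or be pulled in as a
# git submodule for a clean separation of responsibility, e.g.:
#   git submodule add https://github.com/example/my-app-code.git app

ls nubis
```

With this skeleton in place, image building and sandbox deployment proceed as
described in the onboarding documentation.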
diff --git a/GIT_GITHUB.md b/GIT_GITHUB.md index 493369d..4dfb706 100644 --- a/GIT_GITHUB.md +++ b/GIT_GITHUB.md @@ -1,28 +1,74 @@ -## Recommended Practices for Git & GitHub -This document will walk you through some best practices that we recommend for working with the Nubis project. I will not cover much about the basic operation of git or GitHub as there are a large number of tutorials online that cover these topics. Instead I will concentrate on the specifics that will help you to get the most out of the Nubis project, but most importantly will help you to avoid some pitfalls along the way. + -### Deployment Repository -Lets start with what we will call the "Deployment Repository". This is a git repository, typically available on GitHub, that contains all of the pieces necessary to deploy your Application. This includes two things, your application code and a collection of Nubis files. It is important that you follow this layout as our automation tools expect to find things in specific locations. For an example, check out the example [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki) repository. +# Recommended Practices for Git & GitHub -#### Application Code -Your application code can be contained within this repository or it can simply be included as a git submodule. The choice to embed your code directly in the repository simplifies your application by having everything in one location. On the other hand if you separate out your application code from your deployment repository you can have different people responsible for different aspects of your code. Additionally this allows your application code repository to remain deployment agnostic. The choice is yours, but I generally recommend you use the submodule method for clean separation of responsibility and technology. +This document will walk you through some best practices that we recommend for +working with the Nubis project. 
I will not cover much about the basic operation
+of git or GitHub as there are a large number of tutorials online that cover
+these topics. Instead I will concentrate on the specifics that will help you to
+get the most out of the Nubis project, but most importantly will help you to
+avoid some pitfalls along the way.
-#### Nubis Files
-The nubis files are all contained in a single folder called, not surprisingly, nubis. Typically there will be three or more folders contained within the nubis folder; puppet, builder and cloudformation. You can learn more about this layout over in the [Nubis Overview](link) document.
+## Deployment Repository
-### Branching
-We recommend using topic (feature) branches while developing new features. This allows you to switch easily between different development work-flows without all that stashing nonsense. You can learn more about branching (and merging) [here](http://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging).
+Let's start with what we will call the "Deployment Repository". This is a git
+repository, typically available on GitHub, that contains all of the pieces
+necessary to deploy your application. This includes two things: your application
+code and a collection of Nubis files. It is important that you follow this
+layout as our automation tools expect to find things in specific locations. For
+an example, check out the [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki)
+repository.
-### Issues & Pull Requests
-For anything related to the Nubis Project itself, you can either file an [issue](https://guides.github.com/features/issues/) for us or submit a patch by using GtiHubs [pull request](https://help.github.com/articles/using-pull-requests/) method. Either way we will be sure to work with you to solve your issue.
-You might be interested in checking out [Hub](https://hub.github.com/), it makes working with GitHub from the command line a snap.
+### Application Code
+Your application code can be contained within this repository or it can simply
+be included as a git submodule. The choice to embed your code directly in the
+repository simplifies your application by having everything in one location. On
+the other hand, if you separate out your application code from your deployment
+repository you can have different people responsible for different aspects of
+your code. Additionally, this allows your application code repository to remain
+deployment agnostic. The choice is yours, but I generally recommend you use the
+submodule method for clean separation of responsibility and technology.
-### Code Reviews
-For everything relating to the Nubis Project itself, we require a code review before landing anything. We do not currently have a strict process for this, however one of the core maintainers (or module owner) must review code before it is merged to any production branch. We define a production branch as any branch that can affect any running systems. For example, branches that deploy the Sandbox are considered "Production" as they affect the productivity of people using this system.
+### Nubis Files
-We recommend that you adopt a code review process for all aspects of your application. This helps to reduce production downtime, helps to maintain cohesiveness and ensures code continues to follow your style guidelines.
+The nubis files are all contained in a single folder called, not surprisingly,
+nubis. Typically there will be three or more folders contained within the nubis
+folder: puppet, builder and cloudformation. You can learn more about this layout
+over in the [Nubis Overview](link) document.
+
+## Branching
+
+We recommend using topic (feature) branches while developing new features. This
+allows you to switch easily between different development work-flows without all
+that stashing nonsense.
You can learn more about branching (and merging) [here](http://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging).
+
+## Issues & Pull Requests
+
+For anything related to the Nubis Project itself, you can either file an [issue](https://guides.github.com/features/issues/)
+for us or submit a patch by using GitHub's [pull request](https://help.github.com/articles/using-pull-requests/)
+method. Either way we will be sure to work with you to solve your issue.
+
+You might be interested in checking out [Hub](https://hub.github.com/); it makes
+working with GitHub from the command line a snap.
+
+## Code Reviews
+
+For everything relating to the Nubis Project itself, we require a code review
+before landing anything. We do not currently have a strict process for this;
+however, one of the core maintainers (or module owner) must review code before
+it is merged to any production branch. We define a production branch as any
+branch that can affect any running systems. For example, branches that deploy
+the Sandbox are considered "Production" as they affect the productivity of
+people using this system.
+
+We recommend that you adopt a code review process for all aspects of your
+application. This helps to reduce production downtime, helps to maintain
+cohesiveness and ensures code continues to follow your style guidelines.
+
+---
+
+## TODO
-### TODO
 * Describe versioning
-* Details on directory layout may be in another doc and should be linked here.
\ No newline at end of file
+* Details on directory layout may be in another doc and should be linked here.
diff --git a/IMAGES.md b/IMAGES.md
index d7d499b..27b3300 100644
--- a/IMAGES.md
+++ b/IMAGES.md
@@ -1,35 +1,52 @@
+
 # Nubis - Building Quality Images
 
 Nubis is all about building system images in an automated and repeatable way.
-These images should be thought about as immutables ones, that is, images that are a static known quantity with well known and defined propreties, that only change when they are built.
+These images should be thought of as immutable ones, that is, images that are a
+static known quantity with well-known and defined properties, that only change
+when they are built.
 
 A lot of the rational behind that can be found in our [MANIFESTO](MANIFESTO.md)
 
-There are a few important principles to keep in mind when building good quality Nubis images.
+There are a few important principles to keep in mind when building good quality
+Nubis images.
 
 ## Immutable
 
-First and foremost, this is the key proprety of Nubis images. They should be immutable, baked-in with as much of your project as possible.
+First and foremost, this is the key property of Nubis images. They should be
+immutable, baked-in with as much of your project as possible.
 
 This should include dependencies, code, tools, all of it.
 
-The image should contain everything it needs to perform its task from bootup, without needing to perform special tasks during startup.
+The image should contain everything it needs to perform its task from bootup,
+without needing to perform special tasks during startup.
 
-For instance, if you have an application with static assets that need to be post-processed (i.e. minifying javascritp), do that at image build time, not as part of a bootup migration task.
+For instance, if you have an application with static assets that need to be
+post-processed (e.g. minifying JavaScript), do that at image build time, not as
+part of a bootup migration task.
 
 ## Repeatable
 
-When building an image, you can use puppet, and you can use arbitrary shell commands.
+When building an image, you can use puppet, and you can use arbitrary shell
+commands.
 
-But think of the build process as the build process for some software.
You want to make these steps as explicit and as deterministic as possible, so that if someone else chooses to rebuild your image at a specific revision, they will get the same resulting images.
+But think of the image build process as you would the build process for any
+software. You want to make these steps as explicit and as deterministic as
+possible, so that if someone else chooses to rebuild your image at a specific
+revision, they will get the same resulting images.
 
-For instance, it's easy to use a shell script to download a tool you need to your image straight out of GitHub. It's simple, convenient, and a simple *wget* invocation.
+For instance, it's easy to use a shell script to download a tool you need to
+your image straight out of GitHub. It's simple and convenient, just a single
+*wget* invocation.
 
-However, if you just grab that tool from the *master* branch, it means it can change from under you without warning at any time.
+However, if you just grab that tool from the *master* branch, it means it can
+change from under you without warning at any time.
 
-Your image might build and function just fine today, but when it gets rebuild tommorrow with an unrelated change, you'll get a different version of that tool that might now break your application.
+Your image might build and function just fine today, but when it gets rebuilt
+tomorrow with an unrelated change, you'll get a different version of that tool
+that might now break your application.
 
 ## Distributable
 
@@ -43,48 +60,71 @@
 
 They should be something that could be useful to someone else as-is.
 
 Don't **ever** include any secrets of any kind, for starters.
 
-But also make sure that there isn't anything specific to your internal project implementation hard-coded in them.
+But also make sure that there isn't anything specific to your internal project
+implementation hard-coded in them.
 
-Things like domain names, usernames, email addresses and the like don't belong in images.
They are piece of configuration, not intergral part of the images.
+Things like domain names, usernames, email addresses and the like don't belong
+in images. They are pieces of configuration, not an integral part of the images.
 
 ## Configurable
 
-Take the time to clearly identify the pieces of data the image needs to operate but can't be baked-in.
+Take the time to clearly identify the pieces of data the image needs to operate
+but can't be baked-in.
 
-Nubis offers mechanism for service discovery ([Consul](CONSUL.md)) and self-configuration ([confd](CONFD.md)), make use of them.
+Nubis offers mechanisms for service discovery ([Consul](CONSUL.md)) and
+self-configuration ([confd](CONFD.md)); make use of them.
 
-Amazon Web Services provides the *user-data* mechanism to feed information into instances at launch time, but we advise strongly against its use.
+Amazon Web Services provides the *user-data* mechanism to feed information into
+instances at launch time, but we advise strongly against its use.
 
-Every piece of data that is fed to the instance that way, in effect, creates an API into that instance that must be complied with to be able to launch a working instance. This would greatly complicate the design of a generic continuous-integration system, for instance.
+Every piece of data that is fed to the instance that way, in effect, creates an
+API into that instance that must be complied with to be able to launch a working
+instance. This would greatly complicate the design of a generic
+continuous-integration system, for instance.
 
-This can get quickly out of hands. Add to that the fact that *user-data* can't be changed once the instance has been started, and you get a very poor configuration managment system.
+This can quickly get out of hand. Add to that the fact that *user-data* can't
+be changed once the instance has been started, and you get a very poor
+configuration management system.
## Black boxes
 
-Nubis images, once launched, should be considered like black-boxes that you can't get access to, apart from it's externally defined interfaces and APIs.
+Nubis images, once launched, should be considered black boxes that you can't
+get access to, apart from their externally defined interfaces and APIs.
 
-Design your images with this in mind. Use the [Nubis logging mechanism](FLUENTD.md) if you need to get operational data out of the instances.
+Design your images with this in mind. Use the [Nubis logging mechanism](FLUENTD.md)
+if you need to get operational data out of the instances.
 
-Build tools if you need that ability to perform operator tasks of your service, if you need them.
+Build tools if you need the ability to perform operator tasks on your service.
 
-Use the configuration system to create *knobs* for your service, where it makes sense.
+Use the configuration system to create *knobs* for your service, where it makes
+sense.
 
-Want to be able to turn your web application into read-only mode while a traffic spike is going on? Make that into a configuration parameter.
+Want to be able to turn your web application into read-only mode while a traffic
+spike is going on? Make that into a configuration parameter.
 
-Want to be able to blacklist certain IPs from your service? Make that into a configuration list.
+Want to be able to blacklist certain IPs from your service? Make that into a
+configuration list.
 
-Want to be able to enable debugging output for a certain username? Make that into a user-proprety of the system, or a configuration list.
+Want to be able to enable debugging output for a certain username? Make that
+into a user-property of the system, or a configuration list.
 
 Assume you'll never have *ssh* access into the instances running in production.
 
 ## Absolutely no persistent data
 
-Running instances are disposable assets, and may be killed at any time, replaced with new fresh copies.
+Running instances are disposable assets, and may be killed at any time, replaced
+with fresh copies.
 
-This means that any data that is locally stored on the instance can vanish at any time.
+This means that any data that is locally stored on the instance can vanish at
+any time.
 
-Do not assume any kind of persistence for the data you store locally on the instance. If it's important data, **do not** store it locally, hand it off to a service that's meant for persistency.
+Do not assume any kind of persistence for the data you store locally on the
+instance. If it's important data, **do not** store it locally; hand it off to a
+service that's meant for persistence.
 
-Use a database, Amazon RDS, Amazon S3, Amazon EFS, Nubis Storage, ship the logs away, etc.
+Use a database, Amazon RDS, Amazon S3, Amazon EFS, Nubis Storage, ship the logs
+away, etc.
 
-AWS does allow for some level or persistency for instance storage, but it should be avoided as much as possible.
+AWS does allow for some level of persistence for instance storage, but it should
+be avoided as much as possible.
diff --git a/MANIFESTO.md b/MANIFESTO.md
index b7d4955..ea29bd3 100644
--- a/MANIFESTO.md
+++ b/MANIFESTO.md
@@ -1,93 +1,171 @@
-# Nubis - To the cloud, we are going!
+
 
-## A Design Manifesto, by Mozillians.
+# Nubis - To the cloud, we are going
+
+## A Design Manifesto, by Mozillians
 
 ## Open by default
 
-This is about everything; code, process, artifacts. The idea here is to treat Nubis as an open-source project, that can (and should) be contributed to by all.
+This is about everything: code, process, artifacts. The idea here is to treat
+Nubis as an open-source project, that can (and should) be contributed to by all.
 
-To achieve this, we are hosting *all* code on GitHub, publicly. We are trying to use open-source solutions wherever possible, as our first default. We are attempting to build an infrastructure that anybody should be able to reproduce themselves.
+To achieve this, we are hosting *all* code on GitHub, publicly. We are trying to
+use open-source solutions wherever possible, as our first default. We are
+attempting to build an infrastructure that anybody should be able to reproduce
+themselves.
 
-We are also using public puppet modules for as much of the provisioning as possible, contributing upstream when necessary, and forking only when absolutely required. This way, we encourage reuse of public modules, and improve the ones that we find deficient. No more single-use recipes.
+We are also using public puppet modules for as much of the provisioning as
+possible, contributing upstream when necessary, and forking only when absolutely
+required. This way, we encourage reuse of public modules, and improve the ones
+that we find deficient. No more single-use recipes.
 
-The only things contain secrets are the things that *need* to. Amazon credentials, secret keys and things that are *specific* to our deployments of Nubis. There should be no hard coded secrets in puppet modules or elsewhere.
+The only things that contain secrets are the things that *need* to: Amazon
+credentials, secret keys and things that are *specific* to our deployments of
+Nubis. There should be no hard-coded secrets in puppet modules or elsewhere.
 
 ## Dynamic discovery
 
-We have chosen to sidestep configuration management by building machine components designed for dynamic discovery upfront. This leaves the run-time configuration to dynamic discovery instead of a more traditional configuration management system.
+We have chosen to sidestep configuration management by building machine
+components designed for dynamic discovery upfront. This leaves the run-time
+configuration to dynamic discovery instead of a more traditional configuration
+management system.
 
-This is a very different approach to system design for us. However it has been used before successfully and we believe it is the best approach.
When you start thinking about systems as throw-away components, it becomes increasingly difficult to manage and configure them using processes designed for static infrastructure.
+This is a very different approach to system design for us. However, it has been
+used before successfully and we believe it is the best approach. When you start
+thinking about systems as throw-away components, it becomes increasingly
+difficult to manage and configure them using processes designed for static
+infrastructure.
 
-If we succeed in this, it will mean that we will be able to create an infrastructure that is capable of adapting to changes in its environment in near-realtime. This will give operators a lot more flexibility and freedom when solving (more interesting) problems up the stack.
+If we succeed in this, it will mean that we will be able to create an
+infrastructure that is capable of adapting to changes in its environment in
+near-realtime. This will give operators a lot more flexibility and freedom when
+solving (more interesting) problems up the stack.
 
 ## Auto-Scaling by default
 
-One of the things that the cloud provides for is easy deployments, which can provide easy scaling. We want to take full advantage of this possibility by default. This means that the norm will be to make systems auto-scalable and the ones that simply can not auto scale will be the exception.
+One of the things that the cloud provides for is easy deployments, which can
+provide easy scaling. We want to take full advantage of this possibility by
+default. This means that the norm will be to make systems auto-scalable and the
+ones that simply cannot auto-scale will be the exception.
 
-In some cases this could mean designing systems with a little more complexity than single-system equivalents. However, if we build the right tools and frameworks this increase in complexity will be small and well worth the advantages.
+In some cases this could mean designing systems with a little more complexity
+than single-system equivalents. However, if we build the right tools and
+frameworks this increase in complexity will be small and well worth the
+advantages.
 
-Who doesn't want to run an infrastructure where every component at every layer is able to grow and shrink to adapt to demand? Handling a sudden burst of 10x the usual traffic should be the norm, not the exception.
+Who doesn't want to run an infrastructure where every component at every layer
+is able to grow and shrink to adapt to demand? Handling a sudden burst of 10x
+the usual traffic should be the norm, not the exception.
 
 ## Immutable Servers
 
-In the cloud, servers are disposable resources. They can come and go in mere seconds, sometimes outside of our control. One can try and fight this, or one can chose to embrace it. We have decided to fully embrace this very unique feature.
+In the cloud, servers are disposable resources. They can come and go in mere
+seconds, sometimes outside of our control. One can try and fight this, or one
+can choose to embrace it. We have decided to fully embrace this unique feature.
 
-We have chosen to think of individual servers as immutable components. That means building system images that contain everything needed to run a given service, upfront. This also means no software upgrades on running systems and no deployments on running systems. We must learn to think of the servers we run as black boxes into which we have no write capabilities (yes, we know, this is an ideal).
+We have chosen to think of individual servers as immutable components. That
+means building system images that contain everything needed to run a given
+service, upfront. This also means no software upgrades on running systems and no
+deployments on running systems. We must learn to think of the servers we run as
+black boxes into which we have no write capabilities (yes, we know, this is an
+ideal).
-This is another change in the way traditional IT has operated in the past. This will be a learning experience. However, it will force us to think in terms of well defined service components. This will also force us to think hard about what knobs *really* need to be tunable at run-time as opposed to the ones that should be changed through a full build, test and deploy process. +This is another change in the way traditional IT has operated in the past. This +will be a learning experience. However, it will force us to think in terms of +well defined service components. This will also force us to think hard about +what knobs *really* need to be tunable at run-time as opposed to the ones that +should be changed through a full build, test and deploy process. ## Reusable by design -Individual systems should aspire to perform one task, and only one task. Ideally doing it really well. +Individual systems should aspire to perform one task, and only one task. Ideally +doing it really well. -These systems should be designed for efficiency and be as general purpose as possible. We should be able to build, say, one memcache component and builders of systems should be able to use and re-use it for their own purposes, without having to re-engineer it over and over again. +These systems should be designed for efficiency and be as general purpose as +possible. We should be able to build, say, one memcache component and builders +of systems should be able to use and re-use it for their own purposes, without +having to re-engineer it over and over again. ## Decoupled by default -Even if a working system is composed of many components, each of these should have the minimal knowledge possible about all the other components, for example: - -* A system should ship its logs off to another system, but should not have to know what is going to be processing those logs at the other end. 
- -* A system should expose telemetry data, but not know what is going to be consuming it, if anything at all. - -* A system should expose to others the services it offers, the tunables it recognizes, as well as the services it is looking for. +Even if a working system is composed of many components, each of these should +have the minimal knowledge possible about all the other components, for example: -* A system should discover its own environment and adapt to it dynamically, rather than statically. +* A system should ship its logs off to another system, but should not have to + know what is going to be processing those logs at the other end. +* A system should expose telemetry data, but not know what is going to be + consuming it, if anything at all. +* A system should expose to others the services it offers, the tunables it + recognizes, as well as the services it is looking for. +* A system should discover its own environment and adapt to it dynamically, + rather than statically. This is one of the key concepts that will make reusability possible. ## Isolated by default -Whenever possible, systems should be built and deployed in isolation. Isolation means expecting to be deployed alone, being explicit about external dependencies and externally offered services. +Whenever possible, systems should be built and deployed in isolation. Isolation +means expecting to be deployed alone, being explicit about external dependencies +and externally offered services. -When two different systems need to cooperate to achieve results, we need to ask ourselves if they can be made to operate independently. If so, we need to understand that making them isolated from one another is the ideal goal and that the effort required to achieve this goal is necessary. +When two different systems need to cooperate to achieve results, we need to ask +ourselves if they can be made to operate independently. 
If so, we need to +understand that making them isolated from one another is the ideal goal and that +the effort required to achieve this goal is necessary. ## Deployments are the norm, not the exception -Deploying new functionality to an existing system is a normal part of a system's life. However, all too often, this is an unusual process filled with exceptions and special cases. It does not have to be, nor should it be. +Deploying new functionality to an existing system is a normal part of a system's +life. However, all too often, this is an unusual process filled with exceptions +and special cases. It does not have to be, nor should it be. -From the start of any project, continuous deployments should be the norm. The process a developer uses to quickly deploy and iterate on a project should be able to live with that project throughout its lifetime. +From the start of any project, continuous deployments should be the norm. The +process a developer uses to quickly deploy and iterate on a project should be +able to live with that project throughout its lifetime. -Deploying to production, deploying to a test setup, deploying to a different cloud provider even, should be encapsulated in the exact same process, using the same tools. +Deploying to production, deploying to a test setup, deploying to a different +cloud provider even, should be encapsulated in the exact same process, using +the same tools. -Also, from an operator's point of view, deploying a new version of something, whatever it is, should feel safe and identical. A DNS Server or a complex web stack should appear to be the same in terms of deployments and rollbacks. +Also, from an operator's point of view, deploying a new version of something, +whatever it is, should feel safe and identical. A DNS Server or a complex web +stack should appear to be the same in terms of deployments and rollbacks. 
## Bit for bit repeatability

-Most production systems live in multiple copies, sometimes called Development, Staging and Production. Too often these environments claim to be the same, but fail in many subtle and not so subtle ways.
+Most production systems live in multiple copies, sometimes called Development,
+Staging and Production. Too often these environments claim to be the same, but
+fail in many subtle and not so subtle ways.

-Our ideal is to have 100% bit for bit repeatability between as many parallel environment as are needed. This means we should be running in Production the exact same bits that were tested somewhere else.
+Our ideal is to have 100% bit for bit repeatability between as many parallel
+environments as are needed. This means we should be running in Production the
+exact same bits that were tested somewhere else.

-This is an ideal that will never be fully achieved, there will always be operational differences between environments. However we should strive to keep these differences as few as possible and ensure they are all known quantities, instead of possible unknowns.
+This is an ideal that will never be fully achieved; there will always be
+operational differences between environments. However, we should strive to keep
+these differences as few as possible and ensure they are all known quantities,
+instead of possible unknowns.

## Testing from the start

-Far to often we end up with production systems that have no way to test if they are working properly. For example, loading the main page of a web application does not mean that users can log in.
+Far too often we end up with production systems that have no way to test if they
+are working properly. For example, loading the main page of a web application
+does not mean that users can log in.

-We intend to build unit tests from the start, which will be used by our continuous integration system to test for real functionality. These tests can be further utilized by being tied into the Production monitoring system to ensure that the application is actually functioning correctly.
+We intend to build unit tests from the start, which will be used by our
+continuous integration system to test for real functionality. These tests can be
+further utilized by being tied into the Production monitoring system to ensure
+that the application is actually functioning correctly.

## Monitor the things that actually matter

-Taken to the extreme we could say that is does not matter how much ram is in use or whether an instance is heavily into swap. What actually matters it whether the system is, say, responsive.
+Taken to the extreme, we could say that it does not matter how much RAM is in
+use or whether an instance is heavily into swap. What actually matters is
+whether the system is, say, responsive.

-Ideally we will use tests and metrics to monitor the things that actually matter in terms of *usability*. This may include data from many sources, including system resources. We will use this data to inform auto-scaling and tooling to adapt to the changing conditions. Humans will be alerted only when the system can not fix itself in an automated or scripted way.
+Ideally we will use tests and metrics to monitor the things that actually matter
+in terms of *usability*. This may include data from many sources, including
+system resources. We will use this data to inform auto-scaling and tooling to
+adapt to the changing conditions. Humans will be alerted only when the system
+cannot fix itself in an automated or scripted way.
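The monitoring principle above can be sketched in a few lines of shell. This is an illustrative sketch only, not part of the Nubis tooling; the function name and thresholds are invented. The point is that the alert decision is keyed to a user-facing symptom (latency) and never consults RAM or swap figures:

```shell
#!/usr/bin/env bash
# Illustrative sketch (not part of Nubis): decide whether to page a human
# based on a usability signal, here a measured response time in milliseconds.
evaluate_latency() {
  local ms=$1 threshold_ms=$2
  if [ "$ms" -gt "$threshold_ms" ]; then
    echo "ALERT: response took ${ms}ms (limit ${threshold_ms}ms)"
    return 1
  fi
  echo "OK: response took ${ms}ms"
}

# A real check would feed this a measurement, e.g. from curl's %{time_total};
# note that system resource numbers never enter the decision.
evaluate_latency 120 500
evaluate_latency 900 500 || true
```

Resource metrics can still be collected and fed to auto-scaling; they just stop being the thing that wakes people up.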
diff --git a/NETWORKING.md b/NETWORKING.md
index e8c9ab8..5e4c1fe 100644
--- a/NETWORKING.md
+++ b/NETWORKING.md
@@ -1,5 +1,6 @@
+
# Networking in Nubis

-http://www.slideshare.net/AmazonWebServices/from-one-to-manyevolving-vpc-design
-http://www.slideshare.net/AmazonWebServices/cpn208selecting-the-best-vpc-network-architecture-cpn208-aws-reinvent-2013
+[from-one-to-manyevolving-vpc-design](http://www.slideshare.net/AmazonWebServices/from-one-to-manyevolving-vpc-design)
+[selecting-the-best-vpc-network-architecture](http://www.slideshare.net/AmazonWebServices/cpn208selecting-the-best-vpc-network-architecture-cpn208-aws-reinvent-2013)
diff --git a/PREREQUISITES.md b/PREREQUISITES.md
index e72314e..52ff41c 100644
--- a/PREREQUISITES.md
+++ b/PREREQUISITES.md
@@ -1,98 +1,188 @@
-## Prerequisites
-Before you can contribute to the Nubis project, you'll need to have a set of tools installed.
+

-### GitHub Account
-Nubis is entirely hosted on github, and we derive most of our workflows from GitHub's recommended practices.
+# Prerequisites

-If you are new to git or GitHub, you are probably better familiarizing yourself with that first. There are a lot of handy tutorials straight from the source [here](https://www.atlassian.com/git/tutorials/)
+Before you can contribute to the Nubis project, you'll need to have a set of
+tools installed.

-First you will need to set up an account on GitHub. It doesn't have to be anything special, any old account will do. To set up a GitHub account, click [here](https://github.com/join).
+## GitHub Account
+
+Nubis is entirely hosted on GitHub, and we derive most of our workflows from
+GitHub's recommended practices.
+
+If you are new to git or GitHub, you are probably better off familiarizing
+yourself with that first. There are a lot of handy tutorials straight from the
+source [here](https://www.atlassian.com/git/tutorials/)
+
+First you will need to set up an account on GitHub. It doesn't have to be
+anything special, any old account will do. 
To set up a GitHub account, +click [here](https://github.com/join). You will also want to set up your ssh keys with GitHub. You can do that [here](https://github.com/settings/ssh). -### git -All our source-control is in Git, so you'll need a git client of some kind. Most distributions include a git client that you can install with your package manager. You can also get git directly from their [downloads site](https://git-scm.com/downloads). +## git + +All our source-control is in Git, so you'll need a git client of some kind. Most +distributions include a git client that you can install with your package +manager. You can also get git directly from their [downloads site](https://git-scm.com/downloads). For apt users: + ```bash + aptitude install git + ``` -### AWS CLI -Next, you need to install the AWS CLI tool. You can install it by following the instructions at the top of [this page](http://aws.amazon.com/cli/). For Mac and Linux users you can simply: +## AWS CLI + +Next, you need to install the AWS CLI tool. You can install it by following the +instructions at the top of [this page](http://aws.amazon.com/cli/). For Mac and +Linux users you can simply: + ```bash + pip install awscli + ``` Homebrew users: ```bash + brew install awscli + ``` -### AWS Credentials -In order to work with AWS you will need to set up some credentials. This is a somewhat involved process as all access to AWS requires utilizing a multi-factor authentication (MFA) device. When your user is added to an account you will receive an encrypted email containing a key pair. +## AWS Credentials + +In order to work with AWS you will need to set up some credentials. This is a +somewhat involved process as all access to AWS requires utilizing a multi-factor +authentication (MFA) device. When your user is added to an account you will +receive an encrypted email containing a key pair. + +NOTE: These keys need to remain secret. 
You need to take the utmost care; DO NOT
+check them into git, send them via unencrypted email, copy them into a pastebin,
+etcetera.

-NOTE: These keys need to remain secret. You need to take the utmost care; DO NOT check them into git, send them via unencrypted email, copy them into a pastebin, etcetera.

+### aws-vault

-#### aws-vault
-[aws-vault](https://github.com/99designs/aws-vault) is a tool to securely manage AWS API credentials. You will need to download this tool and place it on your path.
+[aws-vault](https://github.com/99designs/aws-vault) is a tool to securely manage
+AWS API credentials. You will need to download this tool and place it on your
+path.

-Once installed you will use the aws-vault tool to authenticate for all access and actions within AWS. Fist you will need to set up your MFA device.
+Once installed you will use the aws-vault tool to authenticate for all access
+and actions within AWS. First you will need to set up your MFA device.

Lets start by making sure aws-vault is installed and working correctly:

+
```bash
+
aws-vault --version
+
```
+
This should return something like ```v3.3.0```.

-If you are using linux you need to set your backend to use kwallet. I recommend placing this in one of your startup scripts, say ```~/.bashrc```:
+If you are using Linux you need to set your backend to use kwallet. I recommend
+placing this in one of your startup scripts, say ```~/.bashrc```:

+
```bash
+
export AWS_VAULT_BACKEND=kwallet
+
```

-The next thing I like to do is set up some local shell variables to make the following commands a bit simpler. Of course you will need to replace the ```ACCOUNT_NAME```, ```ACCOUNT_NUMBER``` and ```LOGIN``` with the ones you received in the user credentials email.
+The next thing I like to do is set up some local shell variables to make the
+following commands a bit simpler. 
Of course you will need to replace +the ```ACCOUNT_NAME```, ```ACCOUNT_NUMBER``` and ```LOGIN``` with the ones you +received in the user credentials email. + ```bash + ACCOUNT_NAME='nubis-training-2016'; ACCOUNT_NUMBER='517826968395'; LOGIN='jcrowe' + ``` -Now you can run the aws-vault command to set up the account. This will ask you for the 'Access Key ID' and the 'Secret Access Key'. You will need to get those from the user credentials email as well: +Now you can run the aws-vault command to set up the account. This will ask you +for the 'Access Key ID' and the 'Secret Access Key'. You will need to get those +from the user credentials email as well: + ```bash + aws-vault add ${ACCOUNT_NAME} + ``` It should look something like this (keys redacted): + ```bash + Enter Access Key ID: AKXXXXXXXXXXXXXXXXXX Enter Secret Access Key: jF6EfXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Added credentials to profile "nubis-training-2016" in vault + ``` Now it is time to create your virtual MFA device: + ```bash -aws-vault exec -n ${ACCOUNT_NAME} -- aws iam create-virtual-mfa-device --virtual-mfa-device-name ${LOGIN} --outfile ${LOGIN}.png --bootstrap-method QRCodePNG + +aws-vault exec -n ${ACCOUNT_NAME} -- \ +aws iam create-virtual-mfa-device \ +--virtual-mfa-device-name ${LOGIN} \ +--outfile ${LOGIN}.png \ +--bootstrap-method QRCodePNG + ``` -You should see output similar to the following. The number here should correspond to the account number and the 'jcrowe' part should be your user-name (not mine ;-D): +You should see output similar to the following. The number here should +correspond to the account number and the 'jcrowe' part should be your user-name +(not mine ;-D): + ```bash + { "VirtualMFADevice": { "SerialNumber": "arn:aws:iam::517826968395:mfa/jcrowe" } } + ``` -You will need to view the ${LOGIN}.png file and use it to configure your MFA application. If you have imagemagic installed you can try ```display $LOGIN.png``` or just open it with any image viewer. 
I use the [duo mobile app](https://duo.com/solutions/features/two-factor-authentication-methods/duo-mobile), however the [google authenticator app](https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&hl=en) works as well. Really, most MFA apps work here so if you have one you already use or prefer it will probably be just fine.
+You will need to view the ${LOGIN}.png file and use it to configure your MFA
+application. If you have ImageMagick installed you can try ```display $LOGIN.png```
+or just open it with any image viewer. I use the [duo mobile app](https://duo.com/solutions/features/two-factor-authentication-methods/duo-mobile),
+however the [google authenticator app](https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&hl=en)
+works as well. Really, most MFA apps work here so if you have one you already
+use or prefer it will probably be just fine.

-To finish up, you need to enable the mfa device. This step is basically proving that you are you and everything is configured correctly. You will need to provide two sequential MFA codes. You get these codes from the MFA application you just set up.
+To finish up, you need to enable the MFA device. This step is basically proving
+that you are you and everything is configured correctly. You will need to
+provide two sequential MFA codes. You get these codes from the MFA application
+you just set up.
+
+NOTE: The codes must be sequential and entered in the correct order in the
+following command. (Replace ```123456``` and ```654321``` with codes from
+your app):

-NOTE: The codes must be sequential and entered in the correct order in the following command. 
(Replace ```123456``` and ```654321``` with codes from your app): ```bash -aws-vault exec -n ${ACCOUNT_NAME} -- aws iam enable-mfa-device --user-name ${LOGIN} --serial-number arn:aws:iam::${ACCOUNT_NUMBER}:mfa/${LOGIN} --authentication-code-1 123456 --authentication-code-2 654321 + +aws-vault exec -n ${ACCOUNT_NAME} -- \ +aws iam enable-mfa-device \ +--user-name ${LOGIN} \ +--serial-number arn:aws:iam::${ACCOUNT_NUMBER}:mfa/${LOGIN} \ +--authentication-code-1 123456 \ +--authentication-code-2 654321 + ``` -You need to configure your AWS CLI tools to make use of the virtual MFA device. You can either add this to your ```~/.aws/config``` file manually or run the following bash snippet: +You need to configure your AWS CLI tools to make use of the virtual MFA device. +You can either add this to your ```~/.aws/config``` file manually or run the +following bash snippet: + ```bash + AWS_CONFIG_FILE=~/.aws/config cat >>${AWS_CONFIG_FILE} <jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. +### jq -You can install it by following the instructions on the [download](https://stedolan.github.io/jq/download/) page. +We use [jq](https://stedolan.github.io/jq/) to munge [JSON](http://json.org/) +data from within [Bash](http://www.gnu.org/software/bash/). From the [jq site](https://stedolan.github.io/jq/): +>jq is like sed for JSON data – you can use it to slice and filter and map and + transform structured data with the same ease that sed, awk, grep and friends + let you play with text. -NOTE: You need at least version 1.4 of jq. If your package manager does not have a recent enough version you will need to install it manually following the instructions above. +You can install it by following the instructions on the [download](https://stedolan.github.io/jq/download/) +page. + +NOTE: You need at least version 1.4 of jq. 
If your package manager does not have
+a recent enough version you will need to install it manually following the
+instructions above.

For Linux users you can:

+
```bash
+
aptitude install jq
+
```

Homebrew users:

```bash
+
brew install jq
+
```

-#### Packer
-[Packer](https://www.packer.io/) (from Hashicorp) is the image building tool we use to build the Nubis system images.
+### Packer

-Built in Go, it's a simple .zip file to [download](https://www.packer.io/downloads.html) with static binaries in it. No dependencies or installation pain. Simply follow the instruction [here](https://www.packer.io/docs/installation.html).
+[Packer](https://www.packer.io/) (from Hashicorp) is the image building tool we
+use to build the Nubis system images.
+
+Built in Go, it's a simple .zip file to [download](https://www.packer.io/downloads.html)
+with static binaries in it. No dependencies or installation pain. Simply follow
+the instructions [here](https://www.packer.io/docs/installation.html).

NOTE: You need packer version v0.8.1 or newer.

Homebrew users (requires Caskroom):

+
```bash
+
brew install caskroom/cask/brew-cask
brew install packer
+
```

-#### Setup Path
-While this step is not mandatory, it sure is convenient to have the nubis-builder tools on your path. You can do this one time by:
+### Setup Path
+
+While this step is not mandatory, it sure is convenient to have the
+nubis-builder tools on your path. You can do this one time by:

+
```bash
+
PATH=/path/to/your/clone/of/nubis-builder/bin:$PATH
+
```

-You can make this automatic on login by adding it to the bottom of your ```~/.bashrc``` file:
+
+You can make this automatic on login by adding it to the bottom of
+your ```~/.bashrc``` file:
+
```bash
+
echo "PATH=/path/to/your/clone/of/nubis-builder/bin:$PATH" >> ~/.bashrc
+
```

-Of course in both of these examples you will need to change */path/to/your/clone/of* to the actual path on your system. 
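Several of the tools above carry minimum-version requirements (jq 1.4, Packer v0.8.1). A small helper can check an installed version against a minimum using GNU sort's version ordering. This is an illustrative sketch only; `version_ok` is an invented name, not part of Nubis:

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of Nubis): version_ok HAVE NEED returns 0
# when HAVE is at least NEED, using GNU sort's -V (version) comparison.
version_ok() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# e.g. to check the jq 1.4 minimum mentioned above, something like:
#   version_ok "$(jq --version | sed 's/^jq-//')" 1.4 || echo "jq too old"
version_ok 1.5 1.4 && echo "new enough"
version_ok 1.3 1.4 || echo "too old"
```

Note that `sort -V` requires GNU coreutils; on macOS you may need the Homebrew coreutils package.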
-### Terraform (0.6.16+)
-Get it from [Terraform.io](https://www.terraform.io/downloads.html). We use Terraform for deploying everything from the account to the application. If you are interested in the decision to use Terraform over Cloudformation you can read about it [here](./TEMPLATING.MD).
+Of course in both of these examples you will need to change
+*/path/to/your/clone/of* to the actual path on your system.
+
+## Terraform (0.6.16+)
+
+Get it from [Terraform.io](https://www.terraform.io/downloads.html). We use
+Terraform for deploying everything from the account to the application. If you
+are interested in the decision to use Terraform over Cloudformation you can read
+about it [here](./TEMPLATING.MD).

It's a simple Go binary bundle, just unzip and drop in your $PATH

@@ -197,14 +337,24 @@ Make sure you obtain at least version 0.6.16, but less than 0.7.

Try [this path](https://releases.hashicorp.com/terraform/0.6.16/).

-### credstash (1.11.0+)
-[Credstash](https://github.com/fugue/credstash) is a tool for managing our secrets into DynamoDB and KMS. It's a dependency we are hoping to get rid of, but for now, you'll need in your $PATH as well.
+## credstash (1.11.0+)
+
+[Credstash](https://github.com/fugue/credstash) is a tool for managing our
+secrets into DynamoDB and KMS. It's a dependency we are hoping to get rid of,
+but for now, you'll need it in your $PATH as well.

It's a Python PIP package, so assuming you have a working Python, just do

```bash
+
pip install "credstash>=1.11.0"
+
```

-### Fin
-That should be all you need to get started. If you run into any issue or have any trouble at all please reach out to us. We are happy to help and are quite interested in improving the project in any way we can. We are on irc.mozilla.org in #nubis-users or you can reach us on the mailing list at nubis-users[at]googlegroups.com
+## Fin
+
+That should be all you need to get started. If you run into any issues or have
+any trouble at all please reach out to us. 
We are happy to help and are quite
+interested in improving the project in any way we can. We are on irc.mozilla.org
+in #nubis-users or you can reach us on the mailing list at
+nubis-users[at]googlegroups.com
diff --git a/PROJECT_ONBOARDING.md b/PROJECT_ONBOARDING.md
index df6941f..433a383 100644
--- a/PROJECT_ONBOARDING.md
+++ b/PROJECT_ONBOARDING.md
@@ -1,60 +1,125 @@
-## Project Onboarding
+

-This document is hear to guide us through the process of on-boarding your application in AWS using the [Nubis](https://github.com/Nubisproject) project.
+# Project Onboarding
+
+This document is here to guide
+us through the process of on-boarding your
+application in AWS using the [Nubis](https://github.com/Nubisproject) project.
+
+## Checklist

-### Checklist
The steps we will take during this process are:

- 1. You [gather information](PROJECT_ONBOARDING.md#gather-information)
- 1. We [create your AWS account](PROJECT_ONBOARDING.md#create-account) in the Sandbox
- 1. You [generate your AWS API keys](PROJECT_ONBOARDING.md#generate-api-keys)
- 1. Everyone [meets](PROJECT_ONBOARDING.md#meeting-time) to discuss architectural requirements / design
- 1. You [build out](PROJECT_ONBOARDING.md#application-build-out) your application in the Sandbox
- 1. You initiate the [promotion to Dev](PROJECT_ONBOARDING.md#promote-to-dev) process
- 1. You initiate the [promotion to Prod](PROJECT_ONBOARDING.md#promote-to-prod) process

-### Gather Information
-We need to know a few pieces of information about your application. This is used to track resources, for troubleshooting and for billing purposes. If you do not have all of this information, do not worry, we can help you figure it out. Once you have gathered all of this information you simply fill out [this form](link to Bugzilla form) to kick off the rest of the process.

+1. You [gather information](PROJECT_ONBOARDING.md#gather-information)
+1. We [create your AWS account](PROJECT_ONBOARDING.md#create-account)
+   in the Sandbox
+1. 
You [generate your AWS API keys](PROJECT_ONBOARDING.md#generate-api-keys) +1. Everyone [meets](PROJECT_ONBOARDING.md#meeting-time) to discuss architectural + requirements / design +1. You [build out](PROJECT_ONBOARDING.md#application-build-out) your application + in the Sandbox +1. You initiate the [promotion to Dev](PROJECT_ONBOARDING.md#promote-to-dev) + process +1. You initiate the [promotion to Prod](PROJECT_ONBOARDING.md#promote-to-prod) + process + +## Gather Information + +We need to know a few pieces of information about your application. This is used +to track resources, for troubleshooting and for billing purposes. If you do not +have all of this information, do not worry, we can help you figure it out. Once +you have gathered all of this information you simply fill out +[this form](link to ServiceNow form) to kick off the rest of the process. -> Link to Bugzilla form (or list here the instructions for them to send the information to us). I am suggesting we create a Bugzilla form with inputs specifically for this information. When the user fills out this form it will both initiate the on-boarding process as well as provide us with the information we need to get started. I suggest two additional Bugzilla forms later in this document. +> Link to Bugzilla form (or list here the instructions for them to send the +information to us). I am suggesting we create a Bugzilla form with inputs +specifically for this information. When the user fills out this form it will +both initiate the on-boarding process as well as provide us with the information +we need to get started. I suggest two additional Bugzilla forms later in this +document. - The information we need from you is: - 1. Name of your project (AKA the "Service Name") [found here] (https://inventory.mozilla.org/en-US/core/service/) - 1. Email address of the ["Technical Owner"](link to email address requirements) - 1. 
List of people who should attend the kickoff Meeting +The information we need from you is: + +1. Name of your project (AKA the "Service Name") [found here](https://inventory.mozilla.org/en-US/core/service/) +1. Email address of the ["Technical Owner"](link to email address requirements) +1. List of people who should attend the kickoff Meeting + +## Create Account -### Create Account As soon as you submit the above information to us we will create your AWS account. - > Will we use the Technical Owner email address for this? If not provide details here about how we will provide the information to them. +> Will we use the Technical Owner email address for this? If not provide details + here about how we will provide the information to them. Once we have done that we will notify you by... > (how?). -### Generate API Keys -Once you are logged into your new account you will need to generate an API key pair by following the instructions [here](http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html). You will use this key pair to deploy resources (such as EC2 instances) into the sandbox. You should keep in mind that these keys are [secret](https://github.com/Nubisproject/nubis-docs/blob/master/SECURITY.md) and should not be shared with anyone. +## Generate API Keys + +Once you are logged into your new account you will need to generate an API key +pair by following the instructions [here](http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html). +You will use this key pair to deploy resources (such as EC2 instances) into the +sandbox. You should keep in mind that these keys are [secret](https://github.com/Nubisproject/nubis-docs/blob/master/SECURITY.md) +and should not be shared with anyone. -### Meeting Time -We will schedule a kickoff meeting with all the necessary folks so we can all sit down and determine how we can help you to succeed. 
In this meeting we will discuss; design requirements, architectural needs, best practices and so on. Not to worry, we have a presentation all set up and will try to make this as painless as possible for you and your team. +## Meeting Time -### Application Build Out -Now that we have a design it is time to build the resources necessary to support your application. To assist you with your application build out we have prepared a number of documents. +We will schedule a kickoff meeting with all the necessary folks so we can all +sit down and determine how we can help you to succeed. In this meeting we will +discuss; design requirements, architectural needs, best practices and so on. Not +to worry, we have a presentation all set up and will try to make this as +painless as possible for you and your team. - * First you will want to check out our [System Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md) document. It will give you a general sense of how all the pieces fit together. - * Then I recommend you take a peek at our [Git & GitHub](https://github.com/Nubisproject/nubis-docs/blob/master/GIT_GITHUB.md) doc. This will aid you in setting up your repository for deploying with the Nubis Project. - * [Prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) will get you set up with all the necessary tools. - * Next you should peruse our [Nubis Builder](https://github.com/Nubisproject/nubis-builder/blob/master/README.md) document which covers building AMIs using [Packer](https://www.packer.io/) and [Puppet](https://puppetlabs.com/). - * Finally you should take a gander at out [CloudFormation](https://github.com/Nubisproject/nubis-docs/blob/master/CLOUDFORMATION.md) document which covers things like using nested stacks to simplify your CloudFormation templates. +## Application Build Out -
-Armed with this informtation you can start to craft a CloudFormation template for your application. For an example check out the nubis-mediawiki project [template](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/main.json). - -### Promote to Dev -The next step on the road to getting your application into production is to initiate the process to get it deployed into Dev. This should be a super simple process as long as you followed the best practices mentioned above. If so, you simply fill out [this little form](link to another Bugzilla form for promotion process) and we will do the rest. We will be setting up a Continuous Integration (CI) system to deploy your project into Dev. This CI system will deploy your application using the exact same CloudFormation templates that you use to deploy in the Sandbox. The process will go something like [this](https://mana.mozilla.org/wiki/display/EA/Environment+promotion). +Now that we have a design it is time to build the resources necessary to support +your application. To assist you with your application build out we have prepared +a number of documents. -### Promote to Prod -Once your application is running in Dev it is time for you to do your User Acceptance Testing (UAT). Once you have completed your UAT and you are ready to promote your application into production, simply fill out this [form](link to Bugzilla form for production promotion). We will then set up the CI system to deploy your application into Production. We will also schedule the following meetings +* First you will want to check out our [System Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md) + document. It will give you a general sense of how all the pieces fit together. +* Then I recommend you take a peek at our [Git & GitHub](https://github.com/Nubisproject/nubis-docs/blob/master/GIT_GITHUB.md) + doc. 
This will aid you in setting up your repository for deploying with the
+  Nubis Project.
+* [Prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md)
+  will get you set up with all the necessary tools.
+* Next you should peruse our [Nubis Builder](https://github.com/Nubisproject/nubis-builder/blob/master/README.md)
+  document which covers building AMIs using [Packer](https://www.packer.io/)
+  and [Puppet](https://puppetlabs.com/).
+* Finally you should take a gander at our [CloudFormation](https://github.com/Nubisproject/nubis-docs/blob/master/CLOUDFORMATION.md)
+  document which covers things like using nested stacks to simplify your
+  CloudFormation templates.
+
+Armed with this information you can start to craft a CloudFormation template
+for your application. For an example check out the nubis-mediawiki project
+[template](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/main.json).
+
+## Promote to Dev
+
+The next step on the road to getting your application into production is to
+initiate the process to get it deployed into Dev. This should be a super simple
+process as long as you followed the best practices mentioned above. If so, you
+simply fill out [this little form](link to another Bugzilla form for promotion process)
+and we will do the rest. We will be setting up a Continuous Integration (CI)
+system to deploy your project into Dev. This CI system will deploy your
+application using the exact same CloudFormation templates that you use to deploy
+in the Sandbox. The process will go something like [this](https://mana.mozilla.org/wiki/display/EA/Environment+promotion).
+
+## Promote to Prod
+
+Once your application is running in Dev it is time for you to do your User
+Acceptance Testing (UAT). Once you have completed your UAT and you are ready to
+promote your application into production, simply fill out this
+[form](link to Bugzilla form for production promotion). 
+We will then set up the CI system to deploy your application into Production. +We will also schedule the following meetings > what meetings go here? make list CAB, what else? -During these meetings we will work with you to schedule the actual go-live event. This typically includes things like coordinating with the Mozilla Operations Center (MOC) and scheduling final content sync and DNS cut-over. +During these meetings we will work with you to schedule the actual go-live +event. This typically includes things like coordinating with the Mozilla +Operations Center (MOC) and scheduling final content sync and DNS cut-over. + +## Winning -### Winning -That is all there is to it. If you have any feedback on this document, this process, or anything Nubis Project related please feel free to drop us a line. We are in #infra-aws on [irc](irc.mozilla.org) or you can shoot us an [email](mailto:infra-aws@mozilla.org). Happy clouding. \ No newline at end of file +That is all there is to it. If you have any feedback on this document, this +process, or anything Nubis Project related please feel free to drop us a line. +We are in #infra-aws on [irc](irc.mozilla.org) or you can shoot us an +[email](mailto:infra-aws@mozilla.org). Happy clouding. diff --git a/PUPPET.md b/PUPPET.md index c6578fd..ea47697 100644 --- a/PUPPET.md +++ b/PUPPET.md @@ -1,82 +1,129 @@ -# Nubis - Puppet Best Practices + -We use packer (http://packer.io) to build our system images, but under the hood, we use puppet-masterless to do the heavy lifting. +# Nubis - Puppet Best Practices -First, let's be clear, you do not have to use puppet at all to build images, you could simply rely on packer's provisionners, copying files to the image and running arbitrary shell commands. +We use [Packer](http://packer.io) to build our system images, but under the hood, +we use puppet-masterless to do the heavy lifting. 
+
+First, let's be clear, you do not have to use puppet at all to build images, you
+could simply rely on packer's provisioners, copying files to the image and
+running arbitrary shell commands.
However, that is not the recommended mechanism to deliver quality Nubis images.
-Keeping to just one tool like puppet keeps things much cleaner. But, being the declarative system that it is, puppet makes it a much better tool to express *intent* than a collection of shell scripts can do.
+Keeping to just one tool like puppet keeps things much cleaner. But, being the
+declarative system that it is, puppet makes it a much better tool to express
+*intent* than a collection of shell scripts can do.
-We are trying very hard to keep our image projects maintainable, and upgradeable, and sticking to simple puppet recipes and good puppet modules will help us achieve that.
+We are trying very hard to keep our image projects maintainable, and
+upgradeable, and sticking to simple puppet recipes and good puppet modules will
+help us achieve that.
## nubis/puppet/main.pp
-This is your project's starting point for Puppet. *nubis-builder* will detect the presence of a folder in your project and automatically upload its content onto the provisioning instance and invoke puppet-masterless with your main.pp as the manifest to apply.
+This is your project's starting point for Puppet. *nubis-builder* will detect
+the presence of a folder in your project and automatically upload
+its content onto the provisioning instance and invoke puppet-masterless with
+your main.pp as the manifest to apply.
-We recommend you keep that *main.pp* manifest to only *include* statements of other manifests in your nubis/puppet folder, keeping components separated.
+We recommend you keep that *main.pp* manifest to only *include* statements of
+other manifests in your nubis/puppet folder, keeping components separated.
## nubis/puppet
-You can put as many manifests as you want in there, but we recommend you try and group them by logical components. If the application you are building requires a webserver, a smtp server and an api service, why not structure it like:
+You can put as many manifests as you want in there, but we recommend you try and
+group them by logical components. If the application you are building requires a
+webserver, an SMTP server and an API service, why not structure it like:
- nubis/puppet/main.pp
- nubis/puppet/webserver.pp
- nubis/puppet/smtp.pp
- nubis/puppet/api.pp
+* nubis/puppet/main.pp
+* nubis/puppet/webserver.pp
+* nubis/puppet/smtp.pp
+* nubis/puppet/api.pp
-This way, you keep things nice and separated. If you discover that your webserver manifest needs to know something about the smtp manifest, try and not refer from one to the next. Instead, use the main.pp manifest to glue them together, passing values in and out of them, if necessary.
+This way, you keep things nice and separated. If you discover that your
+webserver manifest needs to know something about the smtp manifest, try and not
+refer from one to the next. Instead, use the main.pp manifest to glue them
+together, passing values in and out of them, if necessary.
## nubis/puppet/files
-If you place files under *nubis/puppet/files*, they will be automatically copied over the instance, before puppet is invoked.
+If you place files under *nubis/puppet/files*, they will be automatically copied
+over to the instance, before puppet is invoked.
-Once puppet runs, you can access the files you placed in there with the usual puppet syntax in your puppet .pp files:
+Once puppet runs, you can access the files you placed in there with the usual
+puppet syntax in your puppet .pp files:
    source => "puppet:///nubis/files/my-file"
## nubis/puppet/templates
-If you place templates under *nubis/puppet/templates*, they will be automatically copied over the instance, before puppet is invoked.
+If you place templates under *nubis/puppet/templates*, they will be
+automatically copied over to the instance, before puppet is invoked.
-Once puppet runs, you can access the templates you placed in there with the usual puppet syntax in your puppet .pp files:
+Once puppet runs, you can access the templates you placed in there with the
+usual puppet syntax in your puppet .pp files:
    source => "puppet:///nubis/templates/my-file"
## puppet-modules
-You have access to many puppet modules when building Nubis projects, they are baked into the base images and made available automatically when you build images of your own.
+You have access to many puppet modules when building Nubis projects, they are
+baked into the base images and made available automatically when you build
+images of your own.
-For the most up-to-date list of modules available, see https://github.com/Nubisproject/nubis-base/blob/master/nubis/Puppetfile
+For the most up-to-date list of modules available, see this [Puppetfile](https://github.com/Nubisproject/nubis-base/blob/master/nubis/Puppetfile)
-If you believe you could benefit from a new puppet module, or a newer version of an included one, just head over to https://github.com/Nubisproject/nubis-base and file an issue.
+If you believe you could benefit from a new puppet module, or a newer version of
+an included one, just head over to [nubis-base](https://github.com/Nubisproject/nubis-base)
+and file an issue.
## nubis/Puppetfile
-You can create your own *Puppetfile* in your project, and *nubis-builder* will use *puppet-librarian* to include that puppet module in the image you are building.
+You can create your own *Puppetfile* in your project, and *nubis-builder* will
+use *puppet-librarian* to include those puppet modules in the image you are
+building.
-This is an excellent way to include a custom module in your build process or to use a different module than the one that shipd with the *nubis-base* ami.
Any modules you include will overwrite existing modules, this is matched based on the module (directory) name with no further interrogation.
+This is an excellent way to include a custom module in your build process or to
+use a different module than the one that shipped with the *nubis-base* ami. Any
+modules you include will overwrite existing modules, this is matched based on
+the module (directory) name with no further interrogation.
## Multi-OS puppet
Nubis supports Amazon Linux & Ubuntu as base OSes.
-If you are writing complex puppet code, try and keep it OS agnostic where possible. And if not, keep this in mind and ensure your puppet code will work for all the supported OSes.
+If you are writing complex puppet code, try and keep it OS agnostic where
+possible. And if not, keep this in mind and ensure your puppet code will work
+for all the supported OSes.
## Use existing modules
-Try and not reinvent the wheel, and make good use of existing puppet modules. They are great time savers, and shrink the amount of puppet-foo we need to support and maintain.
+Try and not reinvent the wheel, and make good use of existing puppet modules.
+They are great time savers, and shrink the amount of puppet-foo we need to
+support and maintain.
-If you find a module that almost does what you want, but not quite, consider modifying the module itself and submitting it back upstream instead.
+If you find a module that almost does what you want, but not quite, consider
+modifying the module itself and submitting it back upstream instead.
## latest vs. present
-This is an area of some debate. The purely declarative approach would advocate whenever installing a package, you should describe precisely which version of it you expect. The pragmatic would tell you to just stick to 'latest' and this way, your images are always up to date. Deciding which way to go is up to your team, however there are a few things worth considering.
+This is an area of some debate. 
The purely declarative approach would advocate
+whenever installing a package, you should describe precisely which version of it
+you expect. The pragmatist would tell you to just stick to 'latest' and this way,
+your images are always up to date. Deciding which way to go is up to your team,
+however there are a few things worth considering.
-For all components of Nubis we have a policy that all libraries be pinned to specific versions while utilities can be pinned to latest. We consider this a fair trade-off between stability and maintainability.
+For all components of Nubis we have a policy that all libraries be pinned to
+specific versions while utilities can be pinned to latest. We consider this a
+fair trade-off between stability and maintainability.
-Things you rely very heavily on, consider pinning them down to explicit versions, but make it part of your work-flow to test and upgrade that version on a regular schedule that fits with your release cycle.
+Things you rely very heavily on, consider pinning them down to explicit
+versions, but make it part of your work-flow to test and upgrade that version on
+a regular schedule that fits with your release cycle.
-Things you need present, but otherwise don't care about them much (i.e. I need *unzip* installed), pin them to 'latest', this way, every time you rebuild your images, you get the newest versions automatically.
+Things you need present, but otherwise don't care about them much (i.e. I need
+*unzip* installed), pin them to 'latest', this way, every time you rebuild your
+images, you get the newest versions automatically.
-Whichever path you chose you should document it in your project and revisit the decision from time to time.
+Whichever path you choose you should document it in your project and revisit the
+decision from time to time.
diff --git a/README.md b/README.md
index 8696d2c..2afcf20 100644
--- a/README.md
+++ b/README.md
@@ -1,48 +1,88 @@
-# nubis-docs
+
-This repository is a collaborative documentation area for the Nubis project. The projects main purpose is to provide tooling and systems that enable cloud deployments to be fast, simple and secure.
+# nubis-docs
-Feel free to look around. We appreciate any feedback or pull requests you feel would add to this effort.
+This repository is a collaborative documentation area for the Nubis project. The
+project's main purpose is to provide tooling and systems that enable cloud
+deployments to be fast, simple and secure.
+
+Feel free to look around. We appreciate any feedback or pull requests you feel
+would add to this effort.
## Getting started with the Nubis Project
-Welcome to the Nubis Project. We hope you will find that it meats your requirements and is easy to use. In this document I will introduce you to the Nubis Project and give you a number of links to other documents that will help you along.
-The Nubis Project is in essence a collection of services that simplify the process of deploying applications to the cloud. We take care of the little things so you can focus on what matters, your application. We treat security as a first class citizen, so you can rest assured that your application will be safe in the cloud. At this time we support only Amazon Web Services (AWS). For an overview of our design principles I recommend you read our [manifesto](https://github.com/Nubisproject/nubis-docs/blob/master/MANIFESTO.md).
+Welcome to the Nubis Project. We hope you will find that it meets your
+requirements and is easy to use. In this document I will introduce you to the
+Nubis Project and give you a number of links to other documents that will help
+you along.
+
+The Nubis Project is in essence a collection of services that simplify the
+process of deploying applications to the cloud.
We take care of the little
+things so you can focus on what matters, your application. We treat security as
+a first class citizen, so you can rest assured that your application will be
+safe in the cloud. At this time we support only Amazon Web Services (AWS). For
+an overview of our design principles I recommend you read our [manifesto](https://github.com/Nubisproject/nubis-docs/blob/master/MANIFESTO.md).
### Familiarize yourself with the Nubis Project
-Now, to get you up to speed with everything you will need to know to use the Nubis Project, I will provide for you a reading list. Not to worry, while this list looks long most of the documents are quite short.
-* [Nubis Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md) Will give you an outline of the pieces of the project.
-* [Git & GitHub](https://github.com/Nubisproject/nubis-docs/blob/master/GIT_GITHUB.md) provides some advice specific to Nubis.
-* [CloudFormation](https://github.com/Nubisproject/nubis-docs/blob/master/CLOUDFORMATION.md) walks through some recomendations on structure and workflow.
-* [Project Onbording](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md) will guide you through on-boarding your first application.
+
+Now, to get you up to speed with everything you will need to know to use the
+Nubis Project, I will provide for you a reading list. Not to worry, while this
+list looks long most of the documents are quite short.
+
+* [Nubis Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md)
+  Will give you an outline of the pieces of the project.
+* [Git & GitHub](https://github.com/Nubisproject/nubis-docs/blob/master/GIT_GITHUB.md)
+  provides some advice specific to Nubis.
+* [CloudFormation](https://github.com/Nubisproject/nubis-docs/blob/master/CLOUDFORMATION.md)
+  walks through some recommendations on structure and workflow.
+* [Project Onboarding](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md)
+  will guide you through on-boarding your first application.
### Deployment
-Now that you are familiar with the project and the process it is time to get coding. The first step is to assemble your deployment repository. Then it will be time to deploy into the sandbox.
-As we have seen in various examples through these documents, you will need to create a deployment repository. Take a look at the [System Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md) document for details.
+Now that you are familiar with the project and the process it is time to get
+coding. The first step is to assemble your deployment repository. Then it will
+be time to deploy into the sandbox.
-Once your repository is all set up the next step it to deploy into the sandbox. You can deploy by following the procedures outlined in the [Project Onbording](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md#Application Build Out) doc. Some example commands can be found in our trusty [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/README.md) repository.
+As we have seen in various examples through these documents, you will need to
+create a deployment repository. Take a look at the [System Overview](https://github.com/Nubisproject/nubis-docs/blob/master/SYSTEM_OVERVIEW.md)
+document for details.
+Once your repository is all set up the next step is to deploy into the sandbox.
+You can deploy by following the procedures outlined in the [Project Onboarding](https://github.com/Nubisproject/nubis-docs/blob/master/PROJECT_ONBOARDING.md#application-build-out)
+doc. Some example commands can be found in our trusty [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/README.md)
+repository.
### Bugs, Contributions and more
-We are super excited to have you here. If you have stumbled into an issue there are several ways to address it.
-First, you can fix the issue yourself and file a pull request. You will find a guild in our [Contributing Doc](https://github.com/Nubisproject/nubis-docs/blob/master/CONTRIBUTING.md).
+We are super excited to have you here. If you have stumbled into an issue there
+are several ways to address it.
-Next, you can file an issue. Simply navigate to the Nubis Project organization on Github [here](https://github.com/Nubisproject), select the appropriate repository and click on the issues link. For example to file an issue against nubis-stacks you would go [here](https://github.com/Nubisproject/nubis-stacks/issues)
+First, you can fix the issue yourself and file a pull request. You will find a
+guide in our [Contributing Doc](https://github.com/Nubisproject/nubis-docs/blob/master/CONTRIBUTING.md).
-Finally if you are looking for a new feature to be supported, simply follow the [Feature Requests](https://github.com/Nubisproject/nubis-docs/blob/master/FEATURE_REQUESTS.md) guide.
+Next, you can file an issue. Simply navigate to the Nubis Project organization
+on Github [here](https://github.com/Nubisproject), select the appropriate
+repository and click on the issues link. For example to file an issue against
+nubis-stacks you would go [here](https://github.com/Nubisproject/nubis-stacks/issues)
+
+Finally if you are looking for a new feature to be supported, simply follow the
+[Feature Requests](https://github.com/Nubisproject/nubis-docs/blob/master/FEATURE_REQUESTS.md)
+guide.
---
+
## TODO
+
Document these things
+
* set up git repo
-  * add nubis directory
+  * add nubis directory
  * link to structure doc
-  * discuss packer and nubis-builder
-  * discuss packers use of puppet
+  * discuss packer and nubis-builder
+  * discuss packer's use of puppet
* describe cloudformation template system
-  * link to cloudformation layout doc?
+  * link to cloudformation layout doc?
* discuss what is and is not appropriate to place in the bin directory
* walk through deployment of application
* need to link to set up for Nubis doc (set up aws, git, github, etc...)
diff --git a/RELEASING.md b/RELEASING.md
index 758afcb..14188fb 100644
--- a/RELEASING.md
+++ b/RELEASING.md
@@ -1,186 +1,324 @@
-# Nubis - Release Management
-This is a document that helps explain all the process involved in Release Management for the Nubis project. If you are not planning to make a Nubis release, you can safely ignore this document, unless you are curious.
+
+
+# Nubis - Release Management
+
+This is a document that helps explain all the processes involved in Release
+Management for the Nubis project. If you are not planning to make a Nubis
+release, you can safely ignore this document, unless you are curious.
## Milestones
-GitHub milestones are used to track work (issues) against a given Nubis release. Issues that will be part of that release *must* be assigned to the corresponding milestone.
+
+GitHub milestones are used to track work (issues) against a given Nubis release.
+Issues that will be part of that release *must* be assigned to the
+corresponding milestone.
### Versioning
-The format for releases is documented in the [versioning](https://github.com/Nubisproject/nubis-docs/VERSIONING.md) doc.
+
+The format for releases is documented in the
+[versioning](https://github.com/Nubisproject/nubis-docs/VERSIONING.md) doc.
### Code names
-Release code-names might be exist, but will be used for purely cosmetic purposes. The *Happy Panda* release would just be a name for the v1.5.0 release.
+
+Release code-names might exist, but will be used for purely cosmetic
+purposes. The *Happy Panda* release would just be a name for the v1.5.0 release.
### Management
+
For each Milestone, one of the Tech Leads takes on the Release Manager Hat.
-That Release Manager is responsible for triaging what makes it into that Release, with input from the rest of the team.
+That Release Manager is responsible for triaging what makes it into that
+Release, with input from the rest of the team.
-Generally, it makes sense for each Milestone to represent a logical and descriptive amount of work. For example, a Milestone
-could be about documentation, bug fixes, implementing a big new feature, refactoring, etc.
+Generally, it makes sense for each Milestone to represent a logical and
+descriptive amount of work. For example, a Milestone
+could be about documentation, bug fixes, implementing a big new feature,
+refactoring, etc.
-There is no specific defined time limit for a Milestone, but when it's created and issues triaged into it, it should be
-factored in. Milestones that take too much time to complete are a bad practice. Better to split up work in multiple Milestones,
+There is no specific defined time limit for a Milestone, but when it's created
+and issues triaged into it, it should be
+factored in. Milestones that take too much time to complete are a bad practice.
+Better to split up work in multiple Milestones,
sequentially reached, instead of one big-bad Milestone.
-For that Release, the Release Manager gains a veto, solely for purposed of tie-breaking project blocking/delaying issues.
+For that Release, the Release Manager gains a veto, solely for purposes of
+tie-breaking project blocking/delaying issues.
-The ultimate role of the Release Manager is to successfully complete the Milestone, with the help of the development team.
+The ultimate role of the Release Manager is to successfully complete the
+Milestone, with the help of the development team.
## Tags
-GitHub tags will be used to make releases of each repository that is part of the Nubis Project.
-Tags *must* follow the format as defined in the [versioning](https://github.com/Nubisproject/nubis-docs/VERSIONING.md) doc.
+GitHub tags will be used to make releases of each repository that is part of the
+Nubis Project.
-All tags *must* be [GPG Signed](https://git-scm.com/book/tr/v2/Git-Tools-Signing-Your-Work) by the Release Manager. This allows their integrity to be verified.
+Tags *must* follow the format as defined in the [versioning](https://github.com/Nubisproject/nubis-docs/VERSIONING.md)
+doc.
-Each Nubis repository is allowed to follow it's own dash release tagging schedule, however we encourage them to follow the -dev model as defined in the [versioning](https://github.com/Nubisproject/nubis-docs/VERSIONING.md) doc.
+All tags *must* be [GPG Signed](https://git-scm.com/book/tr/v2/Git-Tools-Signing-Your-Work)
+by the Release Manager. This allows their integrity to be verified.
-Major, Minor and Patch releases will be coordinated across all repositories, to provide a consistent versioning scheme for each Nubis Project release.
+Each Nubis repository is allowed to follow its own dash release tagging
+schedule, however we encourage them to follow the -dev model as defined in the
+[versioning](https://github.com/Nubisproject/nubis-docs/VERSIONING.md) doc.
+
+Major, Minor and Patch releases will be coordinated across all repositories, to
+provide a consistent versioning scheme for each Nubis Project release.
## Changelogs
-Each repository *must* contain a CHANGELOG.md document in the root directory, highlighting the changes between releases. It's a very common pattern for software projects. However, maintaining ChangeLogs can quickly become tiresome.
-To address this, all ChangeLogs for Nubis will be generated using [github_changelog_generator](https://github.com/skywinder/github-changelog-generator) during the release process.
+Each repository *must* contain a CHANGELOG.md document in the root directory,
+highlighting the changes between releases. It's a very common pattern for
+software projects. However, maintaining ChangeLogs can quickly become tiresome.
+ +To address this, all ChangeLogs for Nubis will be generated using +[github_changelog_generator](https://github.com/skywinder/github-changelog-generator) +during the release process. The process is quite simple and is executed like this: - $> github_changelog_generator --future-release v1.0.0 nubisproject/nubis-docs +```bash + +github_changelog_generator --future-release v1.0.0 nubisproject/nubis-docs + +``` ## Cadence -Patch releases are not on any schedule. They are released as soon as work is completed following notification of a security vulnerability necessitating a patch release. -Minor releases will be released on a quarterly cadence. They will be released as close to the end of the quarter as practical. We allow for a delay of up to two weeks after the quarter, but every effort must be made to keep within this grace period. +Patch releases are not on any schedule. They are released as soon as work is +completed following notification of a security vulnerability necessitating a +patch release. -Major releases will occur on an as-needed basis. They are reserved for backwards incompatible changes and therefore can not be defined in advance. Major releases will happen organically, as we discover the need as well as define and complete the milestones for them. +Minor releases will be released on a quarterly cadence. They will be released as +close to the end of the quarter as practical. We allow for a delay of up to two +weeks after the quarter, but every effort must be made to keep within this +grace period. + +Major releases will occur on an as-needed basis. They are reserved for backwards +incompatible changes and therefore can not be defined in advance. Major releases +will happen organically, as we discover the need as well as define and complete +the milestones for them. 
## Announcing
-The day-to-day mechanism for communicating with Nubis users is on the #nubis-users channel on irc.mozilla.org
-For more official announcements, or announcements that require more reasonable delivery guarantees, we use the [nubis-announce](https://groups.google.com/d/forum/nubis-announce) distribution list. It can be reached at [nubis-announce@googlegroups.com](nubis-announce@googlegroups.com)
+The day-to-day mechanism for communicating with Nubis users is on the
+**#nubis-users** channel on irc.mozilla.org
+
+For more official announcements, or announcements that require more reasonable
+delivery guarantees, we use the [nubis-announce](https://groups.google.com/d/forum/nubis-announce)
+distribution list. It can be reached at [nubis-announce@googlegroups.com](mailto:nubis-announce@googlegroups.com)
-There will be a formal announcement for all releases sent to the [nubis-announce@googlegroups.com](nubis-announce@googlegroups.com) distribution list. This announcement will follow the standard template found [here](https://github.com/Nubisproject/nubis-docs/templates/announce.txt).
+There will be a formal announcement for all releases sent to the [nubis-announce@googlegroups.com](mailto:nubis-announce@googlegroups.com)
+distribution list. This announcement will follow the standard template found
+[here](https://github.com/Nubisproject/nubis-docs/templates/announce.txt).
## Process
-When it comes time to create a release; all pull-requests related to this release have been merged, the changelog has been generated and all tests have passed, you are ready to cut a release.
-This is the only time you will need to operate on an origin branch directly. This is different from the normal pull-request based work-flow in that tags can not be associated with a pull-request. Also this process is described using the master branch, you may wish to use a feature branch for your release work.
+
+When it comes time to create a release (all pull-requests related to this
+release have been merged, the changelog has been generated and all tests have
+passed), you are ready to cut a release.
+
+This is the only time you will need to operate on an origin branch directly.
+This is different from the normal pull-request based work-flow in that tags can
+not be associated with a pull-request. Also this process is described using the
+master branch, you may wish to use a feature branch for your release work.
+
+The first thing you will need to do is set up your branches in a manner similar
+to the following. That is one branch, named master, which is tracking your fork
+and one branch, named originmaster, tracking the master branch from the
+nubisproject origin. You can call these whatever you like, however these
+instructions will assume you have named them as shown here.
```bash
+
git checkout -b originmaster --track origin/master
git branch -avv
* master 0645ce9 [tinnightcap/master] Update changelog for v0.9.0-beta1 release
  originmaster 7a8254d [origin/master] Merge pull request #8 from tinnightcap/master
+
```
-Next you will need to ensure that your origin branch is current. Note that all pull-requests for this release need to have been merged onto the origin prior to this step. This includes having generated the changelog, committed it, created a pull-request, having had it code-reviewed and merged. Simply fetch and rebase.
+Next you will need to ensure that your origin branch is current. Note that all
+pull-requests for this release need to have been merged onto the origin prior to
+this step.
This includes having generated the changelog, committed it, created a +pull-request, having had it code-reviewed and merged. Simply fetch and rebase. + ```bash + git checkout master git fetch origin git rebase origin/master + ``` -Now we are ready to create the release using a signed tag as described [above](#tags). In this case I am creating a beta release for testing in advance of the v0.9.0 release. +Now we are ready to create the release using a signed tag as described [above](#tags). +In this case I am creating a beta release for testing in advance of the v0.9.0 +release. + ```bash git tag -s v0.9.0-beta -m"Signed beta release for upcoming v0.9.0 release" + ``` -Push the tag to your fork for testing. You should validate that the release files (.tar.gz & .zip) are working as expected and that there are no collisions or typos. You should be able to safely rename or delete tags from your fork, however once they are pushed to the origin they should no longer be deleted. You should ensure things are correct prior to continuing on to the next step. If you have any doubt, have your tags code-reviewed prior to continuing. +Push the tag to your fork for testing. You should validate that the release +files (.tar.gz & .zip) are working as expected and that there are no collisions +or typos. You should be able to safely rename or delete tags from your fork, +however once they are pushed to the origin they should no longer be deleted. +You should ensure things are correct prior to continuing on to the next step. +If you have any doubt, have your tags code-reviewed prior to continuing. + ```bash + git push --tags + ``` Switch to the originmaster branch. + ```bash + git checkout originmaster + ``` Make sure this branch is up to date. + ```bash + git pull + ``` Finally push the signed tag to the origin. + ```bash + git push --tags + ``` -That is it. You should verify once more that the release files are correct and send updates as appropriate. +That is it. 
You should verify once more that the release files are correct and +send updates as appropriate. ## Release Order -Due to some interdependencies between various repositories, the order in which repositories are released has become important. In general you need to release [nubis-stacks](https://github.com/Nubisproject/nubis-stacks) before you release any repositories that rely on the nested stacks. -There is a bit of a chicken and egg issue when it comes to releasing [nubis-storage](https://github.com/Nubisproject/nubis-storage). This is due to the fact that nubis-storage consumes nubis-stacks (requiring a released nested stack), however the nubis-stacks *storage.template* contains hard coded ami Ids. The process for solving this is quite simple: +Due to some interdependencies between various repositories, the order in which +repositories are released has become important. In general you need to release +[nubis-stacks](https://github.com/Nubisproject/nubis-stacks) before you release +any repositories that rely on the nested stacks. + +There is a bit of a chicken and egg issue when it comes to releasing [nubis-storage](https://github.com/Nubisproject/nubis-storage). +This is due to the fact that nubis-storage consumes nubis-stacks (requiring a +released nested stack), however the nubis-stacks *storage.template* contains +hard coded ami Ids. 
The process for solving this is quite simple:

Upload the release-ready nested stack templates to the new release directory:
+

```bash
+
bin/upload_to_s3 --path "v0.9.0-beta" push
+
```

-Next you need to [edit](https://github.com/Nubisproject/nubis-storage/blob/master/nubis/cloudformation/main.json#L35) and rebuild nubis-storage:
+Next you need to [edit](https://github.com/Nubisproject/nubis-storage/blob/master/nubis/cloudformation/main.json#L35)
+and rebuild nubis-storage:
+

```bash
+
cd path/to/nubis-storage vi nubis/cloudformation/main.json /StacksVersion ~ update to latest release ~ nubis-builder build
+
```

Place the generated ami Ids in the nubis-storage [main.json](https://github.com/Nubisproject/nubis-storage/blob/master/nubis/cloudformation/main.json#L76)
+

```bash
+
cd path/to/nubis-storage vi nubis/cloudformation/main.json /Mappings ~ edit the Mappings with new ami Ids ~
+
```

Place the generated ami Ids in the [storage.template](https://github.com/Nubisproject/nubis-stacks/blob/master/storage.template#L98)
+

```bash
+
cd path/to/nubis-stacks vi storage.template /Mappings ~ edit the Mappings with new ami Ids ~
+
```

Make your pull-request, have it code-reviewed and merged:
+

```bash
+
git add storage.template git commit -m"Update storage AMI Ids for v0.9.0-beta release" git push hub pull-request -m "Update storage AMI Ids for v0.9.0-beta release"
+
```

-Now you can cut the release of the nubis-stacks repository, making sure you are up to date first:
+Now you can cut the release of the nubis-stacks repository, making sure you are
+up to date first:
+

```bash
+
git checkout master git fetch origin git rebase origin/master git tag -s v0.9.0-beta -m"Signed beta release for upcoming v0.9.0 release" git push --tags
+
```

-Complete the dance, making sure you push the tag to *originmaster* and fetch back the release ref. This ensures you have locally what is actually in the release.
+Complete the dance, making sure you push the tag to *originmaster* and fetch +back the release ref. This ensures you have locally what is actually in the +release. + ```bash + git checkout originmaster git pull git push --tags git checkout master git fetch origin git rebase origin/master -``` -Finally push the actual release of nubis-stacks to the S3 bucket overwriting your previous, temporary uploads: -```bash -bin/upload_to_s3 --path "v0.9.0-beta" push ``` -You are now in a position to release the remaining repositories (including nubis-storage). There is generally no order to this process, however there are a few remaining points. +Finally push the actual release of nubis-stacks to the S3 bucket overwriting +your previous, temporary uploads: -You MUST test the (currently) three example repositories to make sure they work prior to releasing them. This is important due to the fact that we are making a guarantee that if a user chooses to use the project at a known good release, that this release will be, well, good. What this means is that you need to actually *nubis-builder build* AND *cloudformation create-stack* on all of the example repositories followed by some testing. Be sure to update their respective cloudformation templates to use the newly released nubis-stacks before you deploy them. The three repositories are: +```bash - * [nubis-skel](https://github.com/Nubisproject/nubis-skel) - * [nubis-dpaste](https://github.com/Nubisproject/nubis-dpaste) - * [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki) +bin/upload_to_s3 --path "v0.9.0-beta" push + +``` -That is about all there is to it. You need to close a few issues and send an announcement to the nubis-announce list, but I am sure you remember all of that from higher up in this doc. Cheers. +You are now in a position to release the remaining repositories (including +nubis-storage). There is generally no order to this process, however there are a +few remaining points. 
+
+You MUST test the (currently) three example repositories to make sure they work
+prior to releasing them. This is important because we are making a guarantee
+that if a user chooses to use the project at a known good release, this release
+will be, well, good. What this means is that you need to actually
+*nubis-builder build* AND *cloudformation create-stack* on all of the example
+repositories, followed by some testing. Be sure to update their respective
+cloudformation templates to use the newly released nubis-stacks before you
+deploy them. The three repositories are:
+
+* [nubis-skel](https://github.com/Nubisproject/nubis-skel)
+* [nubis-dpaste](https://github.com/Nubisproject/nubis-dpaste)
+* [nubis-mediawiki](https://github.com/Nubisproject/nubis-mediawiki)
+
+That is about all there is to it. You need to close a few issues and send an
+announcement to the nubis-announce list, but I am sure you remember all of that
+from higher up in this doc. Cheers.
diff --git a/SECURITY.md b/SECURITY.md
index 7dd7c5d..86fa468 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -1 +1,2 @@
+
# Nubis - Building Secure Images
diff --git a/TEMPLATING.MD b/TEMPLATING.MD
deleted file mode 100644
index baab86c..0000000
--- a/TEMPLATING.MD
+++ /dev/null
@@ -1,82 +0,0 @@
-# Templating
-This document covers templating within the Nubis project. We have gone through a few iterations of templating and are currently using Terraform for all templating within the project. Further we recommend that any projects or applications that deploy on top of a Nubis deployment also adopt Terraform as their templating framework. While we place no restrictions on the deployment framework or methodology that a team uses to deploy on top of the Nubis platform, we encourage the use of Terraform for a number of reasons.
- -This document is divided into three main sections: - - [Recommended Practices for Terraform](#recommended-practices-for-terraform) - - [Terraform versus Cloudformation](#terraform-versus-cloudformation) - - [Recommended Practices for Cloudformation](#recommended-practices-for-cloudformation) (legacy documentation) - -## Recommended Practices for Terraform -TODO - -## Terraform versus Cloudformation -We have had a bit of back and forth with this decision. Technically speaking we casually considered other deployment frameworks (boto ansible, etc...), however we only put serious development time into Terraform and Cloudformation. - -### A bit of history -When we started the Nubis project we decided to use Terraform in general. There were folks on the team who did not agree and those folks used Cloudformation. Some time into development we discovered a few limitations with Terraform, specifically how it handled RDS, and reluctantly decided to use Cloudformation. Eventually the limitations of Cloudformation led us down the path of extending Cloudformation with Lambda functions. - -We then got to a point where we were looking to streamline the account creation and update process. As part of this streamlining we discovered that we needed to use some sort of wrapper around Cloudformation in order to integrate with other 3rd party tools (DNS, IPAM, Monitoring, etc...). We did some requirements gathering and began looking to see if there were any existing open source tools or if we would need to create and maintain our own set of tools. It was at that time that we decided to take another look at Terraform, it had been more than a year since we had moved away from it and there had been a large amount of development around it during that time. - -We chose to do some prototyping of the account creation process using Terraform along with both official and 3rd-party Terraform modules. 
We were really excited with the progress that had been made and were delighted to discover just how many of our needs were either already solved or in active development. We decided, through much debate and deliberation, to go ahead and switch the entire Nubis project over to Terraform and to discontinue our use of Cloudformation. - -I should also mention for completeness that we managed to negotiate a limited NDA with Amazon Web Services (AWS) and got a briefing on the Cloudformation road-map. The information we got out of that meeting was quite informative and we took that into consideration before making our final decision. While there are some very interesting things on the horizon, we felt that Terraform still offered enough additional features as to make it the obvious choice. - -I was asked to document the rational behind the decision in the form of a Terraform versus Cloudformation list. That list follows: - -### Pros and Cons Matrix - -| Feature | Terraform | Cloudformation | -|---------|-----------|----------------| -| Documentation Capability | Yes | No[1](#1) | -| Integration with 3rd-party tools | Yes | No | -| Amazon Native | No | Yes | -| Run-time Executions | Yes | No[2](#2) | -| Dry-Run | Yes | Yes | -| Access to Outputs | Yes | No[3](#3) | -| Multi-Component Dependencies | Yes | No[4](#4) | -| Cloud Agnostic[5](#5) | Yes | No | -| Dependency Graphing | Yes | No | -| Human Readable Configuration | Yes | No[6](#6) | -| Prescriptive Operation[7](#7) | Yes| No | -| Reusable Modularity[8](#8) | Yes | No | -| Open Source[9](#9) | Yes | No | -| Multi Region Deployments | Yes | No | -| Parallelized Resource Deployment[10](#10) | Yes | No | - -1: There is technically a metadata parameter hack that offers some limited documentation, but IMHO this does not reach the minimum bar for any real metric of documentation.
-
2: Can handle *some* cases (ie: uuidgen) but it requires building and maintaining lambda functions.
-
3 We hacked this with a lambda function that required very precise coordination of output maps to necessary inputs.
-
4: We started writing a script that executed multiple cloudformation stacks but ran into a lot of issues when things did not deploy perfectly.
-
5: Our tool evaluation guidelines state that we should only consider AWS specifically, however I included this for completeness.
-
6: There are those who would argue that a machine interface language, having been invented by humans, is human readable. I posit that while a computer exchange format might be able to be understood by humans, and even authored by them, that in of itself is not enough. When things, like ease of use, readability, formatting, commenting, etc are taken into account JSON falls far short of what I consider human readable or at least reasonable human manageable.
-
7: During stack creation, if one or a few resources fail to create Terraform has the ability to retry in a graceful manner. Cloudformation partial failure handling does not exist and it is necessary to roll back the entire deployment. During the development phase it is substantially faster when using Terraform due to this behavior.
-
8: Terraform has a large number of native and 3rd-party modules to do all manner of things, like EC2 or RDS deployments. Cloudformation has no official modular support. We did exploit the nested stacks towards this aim with general success. It is worth noting that we never discovered any 3rd-party nested stacks that were usable as-is.
-
9: This is a really broad topic and I am consistently surprised at how little credit is given to this point. There are numerous advantages to participating in a healthy open source project over using closed source technology. I will highlight a few here. With open source you can, well view the source code. This is helpful for all manner of troubleshooting or, as we have done, to add instrumentation to the tool to see where things have gone awry. With open source projects you can, and we have, submit patches when you find issues. This ensures that the issues that are of the greatest impact to you get prioritized right to the top of the list. For issues that are not that urgent, you can subscribe to the issues on GitHub and therefore get notifications when there is traction on those issues, not to mention the ability to vote on issues and therefore influence their prioritization. With any healthy open source project, like Terraform, you can hang out in their public IRC channel and garner all sorts of useful tips and tricks as well as the obvious ability to chat directly with the developers, more or less, at any time. I could go on and on here but for now I will digress the point.
-
10: Terraform account deployments take around 5 minutes total. Cloudformation deployments take around 40 minutes *per region*.
- -## Recommended Practices for Cloudformation -**NOTE: This section has been left for legacy reasons and is guaranteed to no longer be accurate.** - -Cloudformation is a necessary evil when working with AWS. It uses JSON which has a number of staggering limitations. You will soon learn that it is overly rigid in its formatting. Additionally it lacks commenting, which, as you know, is a rather atrocious limitation. In an effort to limit your exposure to JSON we have adopted a nested stack model. Basically you will create a stack template which will use these ready made nested stacks. For an example check out this section of the [nubis-mediawiki template](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/main.json#L70). - -### Nested Stacks -Nested stacks are in and of themselves simply stacks that you include in a higher level, or container stack. We have created a number of stack templates to cover the most common use cases. You can take a look at them [here](https://github.com/Nubisproject/nubis-stacks). For each stack template we have included a README which includes usage code that you can copy into your stack template. Following the previous example from the nubis-mediawiki project you can see the EC2Stack nested stack template [here](https://github.com/Nubisproject/nubis-stacks/blob/master/ec2.template). - -### Stack Outputs -We have created a small [function](https://github.com/Nubisproject/nubis-stacks/blob/master/lambda/LookupStackOutputs/LookupStackOutputs.README.md) that runs in [Lambda](http://aws.amazon.com/lambda/) (an AWS compute service) which makes the outputs of other stacks available for reference in your template. You will find us using this function in nearly every nested stack, sometimes multiple times. While you may not find a need for this in your template it is necessary knowledge for understanding the nested stack templates. 
For example, in the EC2Stack example above we are calling the function as [VpcInfo](https://github.com/Nubisproject/nubis-stacks/blob/master/ec2.template#L48) and using the VpcId output of the $region-$environment-vpc stack [here](https://github.com/Nubisproject/nubis-stacks/blob/master/ec2.template#L73). - -### Parameterization -By utilizing stack outputs we are able to minimize the number of parameters (AWS name for input variables) we need. This simplifies deployments, especially when multiple developers are working on the same project. Back in the nubis-mediawiki project you will find the [parameters.json-dist file](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/parameters.json-dist) to contain only the absolute minimum[minimum](#min) number of parameters. These are the parameters that are necessary for every project that utilizes the Nubis project. - -| Parameter | Description | -|---------------|-------------| -|ServiceName | Name of service from [here](https://inventory.mozilla.org/en-US/core/service/) -|Environment | Sandbox or Dev or Prod -|SSHKeyName | Name of AWS ssh key to install on ec2 instances -|TechnicalOwner | Email address or distribution list -|AmiId | ID output from nubis-builder - -
minimum: Well, not really since technically the environment can be discovered. - -### Credentials -When deploying your stack using the [AWS cli tools](http://aws.amazon.com/cli/) you will be using an API keypair. You will need to take extra precaution to ensure that these secrets remain, well, secret. This includes dressing up your .gitignore file, taking care with pastebins and the like. diff --git a/TEMPLATING.md b/TEMPLATING.md new file mode 100644 index 0000000..4989d66 --- /dev/null +++ b/TEMPLATING.md @@ -0,0 +1,200 @@ + + +# Templating + +This document covers templating within the Nubis project. We have gone through a +few iterations of templating and are currently using Terraform for all +templating within the project. Further we recommend that any projects or +applications that deploy on top of a Nubis deployment also adopt Terraform as +their templating framework. While we place no restrictions on the deployment +framework or methodology that a team uses to deploy on top of the Nubis +platform, we encourage the use of Terraform for a number of reasons. + +This document is divided into three main sections: + +* [Recommended Practices for Terraform](#recommended-practices-for-terraform) +* [Terraform versus Cloudformation](#terraform-versus-cloudformation) +* [Recommended Practices for Cloudformation](#recommended-practices-for-cloudformation) + (legacy documentation) + +## Recommended Practices for Terraform + +TODO + +## Terraform versus Cloudformation + +We have had a bit of back and forth with this decision. Technically speaking we +casually considered other deployment frameworks (boto ansible, etc...), however +we only put serious development time into Terraform and Cloudformation. + +### A bit of history + +When we started the Nubis project we decided to use Terraform in general. There +were folks on the team who did not agree and those folks used Cloudformation. 
+Some time into development we discovered a few limitations with Terraform,
+specifically how it handled RDS, and reluctantly decided to use Cloudformation.
+Eventually the limitations of Cloudformation led us down the path of extending
+Cloudformation with Lambda functions.
+
+We then got to a point where we were looking to streamline the account creation
+and update process. As part of this streamlining we discovered that we needed to
+use some sort of wrapper around Cloudformation in order to integrate with other
+3rd-party tools (DNS, IPAM, Monitoring, etc.). We did some requirements
+gathering and began looking to see if there were any existing open source tools
+or if we would need to create and maintain our own set of tools. It was at that
+time that we decided to take another look at Terraform; it had been more than a
+year since we had moved away from it and there had been a large amount of
+development around it during that time.
+
+We chose to do some prototyping of the account creation process using Terraform
+along with both official and 3rd-party Terraform modules. We were really excited
+by the progress that had been made and were delighted to discover just how
+many of our needs were either already solved or in active development. We
+decided, through much debate and deliberation, to go ahead and switch the entire
+Nubis project over to Terraform and to discontinue our use of Cloudformation.
+
+I should also mention for completeness that we managed to negotiate a limited
+NDA with Amazon Web Services (AWS) and got a briefing on the Cloudformation
+road-map. The information we got out of that meeting was quite informative and
+we took that into consideration before making our final decision. While there
+are some very interesting things on the horizon, we felt that Terraform still
+offered enough additional features to make it the obvious choice.
+
+I was asked to document the rationale behind the decision in the form of a
+Terraform versus Cloudformation list. That list follows:
+
+### Pros and Cons Matrix
+
+| Feature | Terraform | Cloudformation |
+|---------|-----------|----------------|
+| Documentation Capability | Yes | No[^1] |
+| Integration with 3rd-party tools | Yes | No |
+| Amazon Native | No | Yes |
+| Run-time Executions | Yes | No[^2] |
+| Dry-Run | Yes | Yes |
+| Access to Outputs | Yes | No[^3] |
+| Multi-Component Dependencies | Yes | No[^4] |
+| Cloud Agnostic[^5] | Yes | No |
+| Dependency Graphing | Yes | No |
+| Human Readable Configuration | Yes | No[^6] |
+| Prescriptive Operation[^7] | Yes | No |
+| Reusable Modularity[^8] | Yes | No |
+| Open Source[^9] | Yes | No |
+| Multi Region Deployments | Yes | No |
+| Parallelized Resource Deployment[^10] | Yes | No |
+
+[^1]: There is technically a metadata parameter hack that offers some limited
+documentation, but IMHO this does not reach the minimum bar for any real metric
+of documentation.
+
+[^2]: Can handle *some* cases (e.g. uuidgen) but it requires building and
+maintaining lambda functions.
+
+[^3]: We hacked this with a lambda function that required very precise
+coordination of output maps to necessary inputs.
+
+[^4]: We started writing a script that executed multiple cloudformation stacks
+but ran into a lot of issues when things did not deploy perfectly.
+
+[^5]: Our tool evaluation guidelines state that we should only consider AWS
+specifically; however, I included this for completeness.
+
+[^6]: There are those who would argue that a machine interface language, having
+been invented by humans, is human readable. I posit that while a computer
+exchange format might be able to be understood by humans, and even authored by
+them, that in and of itself is not enough.
When things like ease of use,
+readability, formatting, commenting, etc. are taken into account, JSON falls far
+short of what I consider human readable, or at least reasonably human manageable.
+
+[^7]: During stack creation, if one or a few resources fail to create, Terraform
+has the ability to retry in a graceful manner. Cloudformation partial failure
+handling does not exist and it is necessary to roll back the entire deployment.
+During the development phase it is substantially faster when using Terraform due
+to this behavior.
+
+[^8]: Terraform has a large number of native and 3rd-party modules to do all
+manner of things, like EC2 or RDS deployments. Cloudformation has no official
+modular support. We did exploit the nested stacks towards this aim with general
+success. It is worth noting that we never discovered any 3rd-party nested stacks
+that were usable as-is.
+
+[^9]: This is a really broad topic and I am consistently surprised at how little
+credit is given to this point. There are numerous advantages to participating in
+a healthy open source project over using closed source technology. I will
+highlight a few here. With open source you can, well, view the source code. This
+is helpful for all manner of troubleshooting or, as we have done, to add
+instrumentation to the tool to see where things have gone awry. With open source
+projects you can, and we have, submit patches when you find issues. This ensures
+that the issues that are of the greatest impact to you get prioritized right to
+the top of the list. For issues that are not that urgent, you can subscribe to
+the issues on GitHub and therefore get notifications when there is traction on
+those issues, not to mention the ability to vote on issues and therefore
+influence their prioritization.
With any healthy open source project, like
+Terraform, you can hang out in their public IRC channel and garner all sorts of
+useful tips and tricks, as well as the obvious ability to chat directly with the
+developers, more or less, at any time. I could go on and on here, but I digress.
+
+[^10]: Terraform account deployments take around 5 minutes total. Cloudformation
+deployments take around 40 minutes *per region*.
+
+## Recommended Practices for Cloudformation
+
+**NOTE: This section has been left for legacy reasons and is guaranteed to no**
+**longer be accurate.**
+
+Cloudformation is a necessary evil when working with AWS. It uses JSON, which has
+a number of staggering limitations. You will soon learn that it is overly rigid
+in its formatting. Additionally it lacks commenting, which, as you know, is a
+rather atrocious limitation. In an effort to limit your exposure to JSON we have
+adopted a nested stack model. Basically you will create a stack template which
+will use these ready-made nested stacks. For an example check out this section
+of the [nubis-mediawiki template](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/main.json#L70).
+
+### Nested Stacks
+
+Nested stacks are in and of themselves simply stacks that you include in a
+higher-level, or container, stack. We have created a number of stack templates to
+cover the most common use cases. You can take a look at them [here](https://github.com/Nubisproject/nubis-stacks).
+For each stack template we have included a README which includes usage code that
+you can copy into your stack template. Following the previous example from the
+nubis-mediawiki project you can see the EC2Stack nested stack template [here](https://github.com/Nubisproject/nubis-stacks/blob/master/ec2.template).
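For illustration only, including a nested stack from a container stack looks roughly like the fragment below. This is a hand-written sketch, not the project's actual template: the bucket URL and the parameter wiring are placeholders, and only the relevant resource is shown.

```json
{
  "Resources": {
    "EC2Stack": {
      "Type": "AWS::CloudFormation::Stack",
      "Properties": {
        "TemplateURL": "https://example-bucket.s3.amazonaws.com/v0.9.0/ec2.template",
        "Parameters": {
          "ServiceName": { "Ref": "ServiceName" },
          "Environment": { "Ref": "Environment" }
        }
      }
    }
  }
}
```

The container stack simply declares an `AWS::CloudFormation::Stack` resource pointing at the nested template's S3 URL and forwards whatever parameters that template expects.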
+ +### Stack Outputs + +We have created a small [function](https://github.com/Nubisproject/nubis-stacks/blob/master/lambda/LookupStackOutputs/LookupStackOutputs.README.md) +that runs in [Lambda](http://aws.amazon.com/lambda/) (an AWS compute service) +which makes the outputs of other stacks available for reference in your +template. You will find us using this function in nearly every nested stack, +sometimes multiple times. While you may not find a need for this in your +template it is necessary knowledge for understanding the nested stack templates. +For example, in the EC2Stack example above we are calling the function as +[VpcInfo](https://github.com/Nubisproject/nubis-stacks/blob/master/ec2.template#L48) +and using the VpcId output of the $region-$environment-vpc stack [here](https://github.com/Nubisproject/nubis-stacks/blob/master/ec2.template#L73). + +### Parameterization + +By utilizing stack outputs we are able to minimize the number of parameters +(AWS name for input variables) we need. This simplifies deployments, especially +when multiple developers are working on the same project. Back in the +nubis-mediawiki project you will find the [parameters.json-dist file](https://github.com/Nubisproject/nubis-mediawiki/blob/master/nubis/cloudformation/parameters.json-dist) +to contain only the absolute minimum[^minimum] number of +parameters. These are the parameters that are necessary for every project that +utilizes the Nubis project. + +| Parameter | Description | +|---------------|-------------| +|ServiceName | Name of service from [here](https://inventory.mozilla.org/en-US/core/service/) +|Environment | Sandbox or Dev or Prod +|SSHKeyName | Name of AWS ssh key to install on ec2 instances +|TechnicalOwner | Email address or distribution list +|AmiId | ID output from nubis-builder + +[^minimum]: Well, not really since technically the environment can be discovered. 
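As a purely illustrative sketch, the minimal parameter set from the table above could be written out in the JSON list format the AWS CLI accepts for `aws cloudformation create-stack --parameters file://parameters.json`. Every value below is a placeholder:

```bash
# Hypothetical example: one ParameterKey/ParameterValue pair per row of
# the parameter table above. All values are placeholders.
cat <<'EOF' > parameters.json
[
  { "ParameterKey": "ServiceName",    "ParameterValue": "mediawiki" },
  { "ParameterKey": "Environment",    "ParameterValue": "Sandbox" },
  { "ParameterKey": "SSHKeyName",     "ParameterValue": "my-aws-key" },
  { "ParameterKey": "TechnicalOwner", "ParameterValue": "owner@example.com" },
  { "ParameterKey": "AmiId",          "ParameterValue": "ami-0123456789abcdef0" }
]
EOF

# Sanity check: five parameters, one per table row.
grep -c '"ParameterKey"' parameters.json
```

Keeping the file small like this is the payoff of the stack-outputs approach: everything else is looked up from other stacks at deploy time.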
+ +### Credentials + +When deploying your stack using the [AWS cli tools](http://aws.amazon.com/cli/) +you will be using an API keypair. You will need to take extra precaution to +ensure that these secrets remain, well, secret. This includes dressing up your +.gitignore file, taking care with pastebins and the like. diff --git a/VERSIONING.md b/VERSIONING.md index d207abe..ff51302 100644 --- a/VERSIONING.md +++ b/VERSIONING.md @@ -1,47 +1,87 @@ -# Versioning -This document describes the versioning standard for the Nubis Project. All repositories and projects that make up part of the "Nubis Platform" *must* conform to this standard. + -### Overview -Conceptually the versioning standard in Nubis is quite simple. In practice there are a lot of moving parts that go into successfully implementing the standard. For detailed information on the processes behind implementation take a look at the [releasing](https://github.com/Nubisproject/nubis-docs/RELEASING.md) doc. +# Versioning -We take advantage of [semantic versioning](http://semver.org/) in the form of vn.n.n, where v stands for 'v'ersion. Additionally we have a pre-release (dash) standard. +This document describes the versioning standard for the Nubis Project. All +repositories and projects that make up part of the "Nubis Platform" *must* +conform to this standard. -### Semantic Versioning -We use Semantic versioning without modification where the numbers stand for MAJOR.MINOR.PATCH. We also take advantage of the -PRE_RELEASE option for development and feature preview releases. We have the _BUILD_METADATA option available in cases where it is desirable to append a git hash or similar piece of identifying information. +## Overview -As a side note, we use an underscore (_) in place of a plus (+) due to limitations in Amazons IAM name spacing. This is the only place where we differ form Semantic versioning. +Conceptually the versioning standard in Nubis is quite simple. 
In practice there
+are a lot of moving parts that go into successfully implementing the standard.
+For detailed information on the processes behind implementation take a look at
+the [releasing](https://github.com/Nubisproject/nubis-docs/RELEASING.md) doc.
+
+We take advantage of [semantic versioning](http://semver.org/) in the form of
+vn.n.n, where v stands for 'v'ersion. Additionally we have a pre-release (dash)
+standard.
+
+## Semantic Versioning
+
+We use Semantic versioning without modification where the numbers stand for
+MAJOR.MINOR.PATCH. We also take advantage of the -PRE_RELEASE option for
+development and feature preview releases. We have the _BUILD_METADATA option
+available in cases where it is desirable to append a git hash or similar piece
+of identifying information.
+
+As a side note, we use an underscore (_) in place of a plus (+) due to
+limitations in Amazon's IAM name spacing. This is the only place where we differ
+from Semantic versioning.

Example Release Flow:

- - v1.2.0 (Normal Release)
- - v1.2.0-dev
- - v1.2.0-dev_githash
- - v1.2.0-fp1
- - v1.2.0-fp2
- - v1.2.0_githash
- - v1.2.1 (Security Release)
- - v1.2.1-dev
- - v1.2.1_githash
- - v1.3.0 (Normal Release)
- - v1.3.0-dev
- - v1.3.0-fp1
-
-### Patch Release
-There are two reasons for a patch release, either we discover regressions after the release or an important security vulnerability is discovered. In either case we will bump the patch segment and release through the normal process.
-
-### Pre Releases
-Pre releases are used for development work and feature previews. They take the form of -name with an optional incrementing number (-nameX).
-
-#### Development Release (-dev)
-The development dash release (-dev) is the working release. It is incremented immediately after any minor or patch release. For example, once v1.0.2 is released the v1.0.3-dev release will be "cut". In reality this is akin to riding master.
In fact most repositories in development for the next point release will be riding master (the -dev release).
-
-The -dev release is unlike other releases in that it is not a "release" in the true sense of the word. This release is not a stable target by any means. It is intended only for development work on the Nubis platform and is "released" multiple times without any notice or incrementing any numbers.
-
-#### Feature Preview Release (-fpX)
-Occasionally there is a desire to make a feature available outside of the normal release cadence. This type of release will only be initiated by the Nubis platform team. Feature previews are intended to provide downstream projects time to test and the opportunity to provide feedback before a feature "goes live" during a normal release. There is no schedule for feature preview releases and implementation is entirely optional.
-
-### Additional Notes
- - All libraries *must* be pinned to a version.
- - Utilities *may* be pinned to present or latest.
- - All packer jobs (nubis-builder) *must* be pinned to a release of nubis-base.
- - (ie: source_ami_project_version": "v1.0.2)
- - Dependant projects *must* update their nubis-base version to the current release before releasing.
+
+* v1.2.0 (Normal Release)
+  * v1.2.0-dev
+  * v1.2.0-dev_githash
+  * v1.2.0-fp1
+  * v1.2.0-fp2
+  * v1.2.0_githash
+* v1.2.1 (Security Release)
+  * v1.2.1-dev
+  * v1.2.1_githash
+* v1.3.0 (Normal Release)
+  * v1.3.0-dev
+  * v1.3.0-fp1
+
+## Patch Release
+
+There are two reasons for a patch release: either we discover regressions after
+the release, or an important security vulnerability is discovered. In either case
+we will bump the patch segment and release through the normal process.
+
+## Pre Releases
+
+Pre releases are used for development work and feature previews. They take the
+form of -name with an optional incrementing number (-nameX).
+
+### Development Release (-dev)
+
+The development dash release (-dev) is the working release.
It is incremented
+immediately after any minor or patch release. For example, once v1.0.2 is
+released the v1.0.3-dev release will be "cut". In reality this is akin to riding
+master. In fact most repositories in development for the next point release will
+be riding master (the -dev release).
+
+The -dev release is unlike other releases in that it is not a "release" in the
+true sense of the word. This release is not a stable target by any means. It is
+intended only for development work on the Nubis platform and is "released"
+multiple times without any notice or incrementing any numbers.
+
+### Feature Preview Release (-fpX)
+
+Occasionally there is a desire to make a feature available outside of the normal
+release cadence. This type of release will only be initiated by the Nubis
+platform team. Feature previews are intended to provide downstream projects time
+to test and the opportunity to provide feedback before a feature "goes live"
+during a normal release. There is no schedule for feature preview releases and
+implementation is entirely optional.
+
+## Additional Notes
+
+* All libraries *must* be pinned to a version.
+* Utilities *may* be pinned to present or latest.
+* All packer jobs (nubis-builder) *must* be pinned to a release of nubis-base.
+  * (i.e. "source_ami_project_version": "v1.0.2")
+* Dependent projects *must* update their nubis-base version to the current
+  release before releasing.
diff --git a/WALKTHROUGH.md b/WALKTHROUGH.md
index a83396d..85fb406 100644
--- a/WALKTHROUGH.md
+++ b/WALKTHROUGH.md
@@ -1,21 +1,43 @@
-## Walk Through
-The process can seem a bit overwhelming and complicated at first. In an attempt to clarify the process for you I have created this little walk through that will allow you to deploy your first application using the Nubis Project.
+
-We will be using the [nubis-dpaste](https://github.com/nubisproject/nubis-dpste) application for our deployment today.
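The vMAJOR.MINOR.PATCH tag convention described in the versioning notes above, with its optional -PRE_RELEASE and underscore-separated build-metadata parts, can be checked mechanically. The helper below is only a sketch of that convention; the function name and regex are illustrative and not part of any Nubis tooling:

```bash
# is_valid_tag TAG: check a tag against the convention sketched above --
# vMAJOR.MINOR.PATCH, an optional -PRE_RELEASE part, and an optional
# _BUILD_METADATA part (underscore instead of "+", per the IAM note).
# Illustrative only; adjust to match the project's actual tagging rules.
is_valid_tag() {
  [[ "$1" =~ ^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z]+)?(_[0-9A-Za-z]+)?$ ]]
}

# Tags taken from the example release flow above:
for tag in v1.2.0 v1.2.0-dev v1.2.0-dev_githash v1.2.0-fp1 v1.2.1; do
  is_valid_tag "$tag" && echo "$tag: ok" || echo "$tag: INVALID"
done
```

Swapping the underscore back to a plus sign would restore strict semver build metadata, at the cost of the IAM naming limitation noted above.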
+# Walk Through -If you have not already done so I recommend you read the documents linked in [this](https://github.com/Nubisproject/nubis-docs/blob/master/GETTING_STARTED.md#familiarize-yourself-with-the-nubis-project) section of the [Getting Started](https://github.com/Nubisproject/nubis-docs/blob/master/GETTING_STARTED.md) guide. +The process can seem a bit overwhelming and complicated at first. In an attempt +to clarify the process for you I have created this little walk through that will +allow you to deploy your first application using the Nubis Project. -### Prerequisites -First things first, you need to install a few tools in order to deploy an application using the Nubis Project. This is all covered in our [Prerequisites document](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md). +We will be using the [nubis-dpaste](https://github.com/nubisproject/nubis-dpste) +application for our deployment today. -### Checkout and Deploy -The next thing is to checkout the code and deploy the application. This is covered in the nubis-dpaste [README](https://github.com/Nubisproject/nubis-dpaste/blob/master/README.md) +If you have not already done so I recommend you read the documents linked in +[this](https://github.com/Nubisproject/nubis-docs/blob/master/GETTING_STARTED.md#familiarize-yourself-with-the-nubis-project) +section of the [Getting Started](https://github.com/Nubisproject/nubis-docs/blob/master/GETTING_STARTED.md) +guide. -### Play -It is now time to play around with your new deployment. You will need to look up the DNS name of the load balancer in the [AWS web console](https://us-west-2.console.aws.amazon.com/ec2/v2/home#LoadBalancers:). Place that load balancer name in your Firefox URL bar and you should see the dpaste app. +## Prerequisites + +First things first, you need to install a few tools in order to deploy an +application using the Nubis Project. 
This is all covered in our [Prerequisites document](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md).
+
+## Checkout and Deploy
+
+The next thing is to check out the code and deploy the application. This is
+covered in the nubis-dpaste [README](https://github.com/Nubisproject/nubis-dpaste/blob/master/README.md).
+
+## Play
+
+It is now time to play around with your new deployment. You will need to look up
+the DNS name of the load balancer in the [AWS web console](https://us-west-2.console.aws.amazon.com/ec2/v2/home#LoadBalancers:).
+Place that load balancer name in your Firefox URL bar and you should see the
+dpaste app.
 
 In order to ssh into your instance you will need to connect through a jumphost.
 
-    $> ssh ec2-user@jumphost1.sandbox.us-west-2.nubis.allizom.org
+```bash
+
+ssh ec2-user@jumphost1.sandbox.us-west-2.nubis.allizom.org
+
+```
 
-Congratulations on your first Nubis deployment. Don't forget to delete your stack once you are done playing around to avoid excess billing charges.
\ No newline at end of file
+Congratulations on your first Nubis deployment. Don't forget to delete your
+stack once you are done playing around to avoid excess billing charges.
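The jumphost command shown above is easy to mistype. A small helper along these lines can assemble it; the hostname pattern (jumphost1.ENVIRONMENT.REGION.nubis.allizom.org) is inferred from the sandbox example above and is an assumption, so verify it against your own account:

```bash
# jumphost_ssh_cmd [ENVIRONMENT] [REGION]: print the ssh command for an
# account's jumphost. The hostname layout is assumed from the sandbox
# example in this walk-through and may differ in your deployment.
jumphost_ssh_cmd() {
  local environment="${1:-sandbox}" region="${2:-us-west-2}"
  echo "ssh ec2-user@jumphost1.${environment}.${region}.nubis.allizom.org"
}

# Example: print the command for the sandbox account in us-west-2.
jumphost_ssh_cmd sandbox us-west-2
```

Running `jumphost_ssh_cmd stage eu-west-1` would print the equivalent command for a hypothetical stage account in eu-west-1.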
diff --git a/presentations/IT_Walk_Through_20150601_Links.md b/presentations/IT_Walk_Through_20150601_Links.md index 9d56e4a..807e042 100644 --- a/presentations/IT_Walk_Through_20150601_Links.md +++ b/presentations/IT_Walk_Through_20150601_Links.md @@ -1,48 +1,100 @@ + # IT Walk Through 20150601 Links -## Slide 1: -https://github.com/Nubisproject/nubis-docs/blob/master/presentations/IT_Walk_Through_20150601.odp +## Slide 1 + +[IT_Walk_Through_20150601.odp](https://github.com/Nubisproject/nubis-docs/blob/master/presentations/IT_Walk_Through_20150601.odp) + +## Slide 5 + +[Aws Credential](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md#aws-credentials) -## Slide 5: -https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md#aws-credentials +## Slide 6 -## Slide 6: -https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md#github-account +[Github Account](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md#github-account) -https://github.com/Nubisproject/nubis-builder#dependencies +[Dependencies](https://github.com/Nubisproject/nubis-builder#dependencies) -## Slide 8: -https://github.com/Nubisproject/nubis-dpaste +## Slide 8 + +[Nubis Dpaste](https://github.com/Nubisproject/nubis-dpaste) + +```bash git clone git@github.com:YOU/nubis-dpaste.git +``` + +```bash + git submodule update --init --recursive -## Slide 10: +``` + +## Slide 10 + +```bash + nubis-builder build -## Slide 11: -AmiId: ami-7bbb844b +``` + +## Slide 11 + +AmiId: *ami-7bbb844b* -aws cloudformation create-stack --template-body file://nubis/cloudformation/main.json --parameters file://nubis/cloudformation/parameters.json --stack-name nubis-xxx +```bash -https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#set-up +aws cloudformation create-stack \ +--template-body file://nubis/cloudformation/main.json \ +--parameters file://nubis/cloudformation/parameters.json \ +--stack-name nubis-xxx 
-https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#create +``` -## Slide 12: -nubis-consul --stack-name nubis-xxx --settings nubis/cloudformation/parameters.json get-and-update +[Set Up](https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#set-up) -https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#update-consul +[Create](https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#create) -## Slide 13: -ssh -A -t ec2-user@jumphost.sandbox.us-west-2.nubis.allizom.org "ssh -A -t ubuntu@$(nubis-consul --stack-name nubis-xxx --settings nubis/cloudformation/parameters.json get-ec2-instance-ip)" +## Slide 12 -https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#login +```bash + +nubis-consul --stack-name nubis-xxx \ +--settings nubis/cloudformation/parameters.json get-and-update + +``` + +[Update Consul](https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#update-consul) + +## Slide 13 + +```bash + +ssh -A -t ec2-user@jumphost.sandbox.us-west-2.nubis.allizom.org \ +"ssh -A -t ubuntu@$(nubis-consul \ +--stack-name nubis-xxx \ +--settings nubis/cloudformation/parameters.json \ +get-ec2-instance-ip)" + +``` + +[Login](https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#login) + +## Slide 16 + +```bash -## Slide 16: aws cloudformation delete-stack --stack-name nubis-xxx -nubis-consul --stack-name nubis-xxx --settings nubis/cloudformation/parameters.json delete +``` + +```bash + +nubis-consul --stack-name nubis-xxx \ +--settings nubis/cloudformation/parameters.json \ +delete + +``` -https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#delete +[Delete](https://github.com/tinnightcap/nubis-dpaste/blob/master/nubis/cloudformation/README.md#delete) diff --git a/reports/20150203.md b/reports/20150203.md index 
babd0b8..dd83467 100644
--- a/reports/20150203.md
+++ b/reports/20150203.md
@@ -1,26 +1,33 @@
+
 # Nubis progress report for 20150203
 
-This is the first of what should be weekly status report on the progress of the Nubis effort at Mozilla. Longer than usual, since this is the first **ever**
+This is the first of what should be a weekly status report on the progress of
+the Nubis effort at Mozilla. Longer than usual, since this is the first **ever**
 
-This week is all about preparation for the upcoming Sprint. We've been hard at work putting together the remaining bits to hit our interal goal of delivering a working sample application on top of the Nubis infrastructure.
+This week is all about preparation for the upcoming Sprint. We've been hard at
+work putting together the remaining bits to hit our internal goal of delivering
+a working sample application on top of the Nubis infrastructure.
 
-We've picked dpaste for it's simplicity and yet, it's a mature application that could be used at Mozilla.
+We've picked dpaste for its simplicity, and yet it's a mature application that
+could be used at Mozilla.
 
-We aren't there yet, but are confident we'll have it up and running before the Sprint.
+We aren't there yet, but are confident we'll have it up and running before the
+Sprint.
We are also working on many fronts, as is usual:
 
- * Finishing up a VPC design recommendation with dcurado (gozer)
- * Starting to flesh out the user-story for a new Nubis project onboarding
- * Starting to work an AMI registry (digi)
- * Thinking about security and thread models
- * Thinking about secret distribution, after great conversations with secops
+* Finishing up a VPC design recommendation with dcurado (gozer)
+* Starting to flesh out the user-story for a new Nubis project onboarding
+* Starting to work on an AMI registry (digi)
+* Thinking about security and threat models
+* Thinking about secret distribution, after great conversations with secops
 
 ## Components
 
 ### nubis-docs
 
-Still in its infancy, but work on producing content has finally started (_this document included_)
+Still in its infancy, but work on producing content has finally started
+(_this document included_)
 
 ### nubis-base
 
@@ -36,19 +43,21 @@
 Quick summary of recent changes, including last week:
 
 * moved to the new nubis/ directory strucure
 
-Otherwise, somewhat dormant, waiting for our first application prototype to get working on it again.
+Otherwise, somewhat dormant, waiting for our first application prototype to get
+working on it again.
 
 ### nubis-puppet
 
 Now includes:
 
- * confd
- * dnsmasq
- * fluentd
- * postfix
+* confd
+* dnsmasq
+* fluentd
+* postfix
 
 ### nubis-fluentd
 
 First release of a simple fluentd collector that just dumps logs to disk.
-Registers itself via Consul and is discovered by default, if present, by nubis-base.
+Registers itself via Consul and is discovered by default, if present, by
+nubis-base.
diff --git a/training/README.md b/training/README.md
index 27a22ac..7d56343 100644
--- a/training/README.md
+++ b/training/README.md
@@ -1,182 +1,199 @@
-# Index
-Welcome to the Nubis training documentation. As of this writing this is very much a work in progress.
The table of contents is currently designed to help me to design the course content and help me keep track of my progress and what I have left to accomplish. This list will eventually link out to the content covered by each section. + + +# Index + +Welcome to the Nubis training documentation. As of this writing this is very +much a work in progress. The table of contents is currently designed to help me +to design the course content and help me keep track of my progress and what I +have left to accomplish. This list will eventually link out to the content +covered by each section. ## Table of Contents -This section contains the material we will cover in the classroom session. Eventually we hope to develop this into a self-guided training course that a person can work through at their own pace. -0. [Introduction](./introduction.md) - - [On listening to our customers](./introduction.md#on-listening-to-our-customers) - - [Examples of current state](./introduction.md#examples-of-current-state) - - [How can we Improve](./introduction.md#how-can-we-improve) - - [Current operating model](./introduction.md#current-operating-model) - - [Illustration of current challenges](./introduction.md#illustration-of-current-challenges) - - [List of issues in our current work-flows](./introduction.md#list-of-issues-in-our-current-work-flows) - - [Tainted resources](./introduction.md#tainted-resources) - - [Lack of package pinning](./introduction.md#lack-of-package-pinning) - - [Untested changes to production systems](./introduction.md#untested-changes-to-production-systems) - - [Lack of isolation between applications](./introduction.md#lack-of-isolation-between-applications) - - [Too many ways a system can be mutated](./introduction.md#too-many-ways-a-system-can-be-mutated) - - [Puppetmasters ensure eventual consistency](./introduction.md#puppetmasters-ensure-eventual-consistency) - - [Puppet's inability to guarantee symmetry among 
systems](./introduction.md#puppets-inability-to-guarantee-symmetry-among-systems) - - [A word on backups](./introduction.md#a-word-on-backups) - - [User experiences](./introduction.md#user-experiences) - - [Future Operating Model](./introduction.md#future-operating-model) - - [Specific areas of improvement](./introduction.md#specific-areas-of-improvement) - - [Automate all the things](./introduction.md#automate-all-the-things) - - [Built on cloud technology](./introduction.md#built-on-cloud-technology) - - [Provide self service opportunities](./introduction.md#provide-self-service-opportunities) - - [Create standards](./introduction.md#create-standards) - - [Treat datacenters as reusable components](./introduction.md#treat-datacenters-as-reusable-components) - - [Exterminate the "Human API"](./introduction.md#exterminate-the-human-api) - - [Use more community resources](./introduction.md#use-more-community-resources) - - [Revision everything](./introduction.md#revision-everything) - - [Transition work-flow to GitHub](./introduction.md#transition-work-flow-to-github) - - [Code Reviews](./introduction.md#code-reviews) - - [Provide Application isolation](./introduction.md#provide-application-isolation) - - [Provide a platform that can autoscale](./introduction.md#provide-a-platform-that-can-autoscale) - - [Bit for bit repeatable deployments](./introduction.md#bit-for-bit-repeatable-deployments) - - [Destroy Tainted resources](./introduction.md#destroy-tainted-resources) - - [Reduce time required to stand up a new application](./introduction.md#reduce-time-required-to-stand-up-a-new-application) - - [Provide analytical and trending monitoring for applications and systems](./introduction.md#provide-analytical-and-trending-monitoring-for-applications-and-systems) - - [Log / Audit everything](./introduction.md#log--audit-everything) - - [Provide transparency into web operations systems and deployment 
methodologies](./introduction.md#provide-transparency-into-web-operations-systems-and-deployment-methodologies) - - [Provide an open structure that enables us to better support the open web and the Mozilla community](./introduction.md#provide-an-open-structure-that-enables-us-to-better-support-the-open-web-and-the-mozilla-community) - - [Provide a better customer experience](./introduction.md#provide-a-better-customer-experience) -0. [New operating principles](./operating-principles.md) - - [New way of thinking](./operating-principles.md#new-way-of-thinking) - - [Twelve Factor Review](./operating-principles.md#twelve-factor-review) - - [Agile Development](./operating-principles.md#agile-development) - - [Symantic versioning](./operating-principles.md#symantic-versioning) - - [Code Reviews](./operating-principles.md#code-reviews) - - [Decentralization](./operating-principles.md#decentralization) - - [git and GitHub](./operating-principles.md#git-and-github) - - [System level configuration](./operating-principles.md#system-level-configuration) - - [Packer](./operating-principles.md#packer) - - [Puppet (masterless)](./operating-principles.md#puppet-masterless) - - [Consul and Confd](./operating-principles.md#consul-and-confd) - - [Image Upgrades](./operating-principles.md#image-upgrades) - - [To Autoscale or Not to Autoscale](./operating-principles.md#to-autoscale-or-not-to-autoscale) - - [Tainted Instances](./operating-principles.md#tainted-instances) - - [Security Requirements](./operating-principles.md#security-requirements) -0. [Exercise One](./exercise-one.md) - - [Setup](./exercise-one.md#setup) - - [Organize into groups](./exercise-one.md#organize-into_groups) - - [Chose a Topic](./exercise-one.md#chose-a-topic) - - [Discuss Improvements](./exercise-one.md#discuss-improvements) - - [Presentations and Discussions](./exercise-one.md#presentations-and-discussions) -0. 
[Nubis overview](./nubis-overview.md) - - [What is Nubis](./nubis-overview.md#what-is-nubis) - - [Standardized design](./nubis-overview.md#standardized-design) - - [Security compliance](./nubis-overview.md#security-compliance) - - [Reduced time-to-market](./nubis-overview.md#reduced-time-to-market) - - [What can Nubis do for me](./nubis-overview.md#what-can-nubis-do-for-me) - - [What does Nubis provide](./nubis-overview.md#what-does-nubis-provide) - - [Nubis accounts](./nubis-overview.md#nubis-accounts) - - [Accounts](./nubis-overview.md#accounts) - - [Account Diagram](#account-diagram) - - [Multiple environments](./nubis-overview.md#multiple-environments) - - [Quarterly Updates](./nubis-overview.md#quarterly-updates) - - [Distribution upgrades](./nubis-overview.md#distribution-upgrades) - - [Package updates](./nubis-overview.md#package-updates) - - [New services](./nubis-overview.md#new-services) - - [Application Image Updates](./nubis-overview.md#application-image-updates) - - [Security Updates](./nubis-overview.md#security-updates) - - [Included Services](./nubis-overview.md#included-services) - - [Proxies](./nubis-overview.md#proxies) - - [NATs](./nubis-overview.md#nats) - - [Consul Integration](./nubis-overview.md#consul-integration) - - [Fluent Integration](./nubis-overview.md#fluent-integration) - - [Jumphosts](./nubis-overview.md#jumphosts) - - [User Management](./nubis-overview.md#user-management) - - [MFA](./nubis-overview.md#mfa) - - [aws-vault](./nubis-overview.md#aws-vault) - - [LDAP Integration](./nubis-overview.md#ldap-integration) - - [Security Integration](./nubis-overview.md#security-integration) - - [InfoSec security audit role](./nubis-overview.md#infoSec-security-audit-role) - - [Network Security Monitoring](./nubis-overview.md#network-security-monitoring) (NSM) - - [Integrated IP Blacklisting](./nubis-overview.md#integrated-ip-blacklisting) - - [Log Integration with Mozilla 
Investigator](./nubis-overview.md#log-integration-with-mozilla-investigator) (MIG) - - [CloudTrail Integration](./nubis-overview.md#cloudtrail-integration) - - [Additional Services](./nubis-overview.md#additional-services) - - [Cloud Health Integration](./nubis-overview.md#cloud-health-integration) - - [Billing Support](./nubis-overview.md#billing-support) - - [Tainted Resources](./nubis-overview.md#tainted-resources) - - [Platform Monitoring](./nubis-overview.md#platform-monitoring) - - [High Availability](./nubis-overview.md#high-availability) - - [Nubis deployments](./nubis-overview.md#nubis-deployments) - - [Deployment Overview](./nubis-overview.md#deployment-overview) - - [Environments and how to use them](./nubis-overview.md#environments-and-how-to-use-them) - - [Deployment Workflow Diagram](./nubis-overview.md#deployment-workflow-diagram) - - [Deployment repository](./nubis-overview.md#deployment-repository) - - [Puppet configuration](./nubis-overview.md#puppet-configuration) - - [Application Code](./nubis-overview.md#application-Code) - - [Terraform modules](./nubis-overview.md#terraform-modules) - - [Recommended practices](./nubis-overview.md#recommended-practices) - - [Architectural design services](./nubis-overview.md#architectural-design-services) - - [Example deployments](./nubis-overview.md#example-deployments) - - [nubis-skel](./nubis-overview.md#nubis-skel) - - [AWS Solutions Architect](./nubis-overview.md#aws-solutions-architect) - - [Community support](./nubis-overview.md#community-support) - - [CI System](./nubis-overview.md#ci-system) - - [Rolling Back](./nubis-overview.md#rolling-back) - - [Custom Monitors](./nubis-overview.md#custom-monitors) - - [nubis-base](./nubis-overview.md#nubis-base) - - [nubis-builder](./nubis-overview.md#nubis-builder) - - [Build Deploy Diagram](./nubis-overview.md#build-deploy-diagram) -0. 
[Exercise Two](./exercise-two.md) - - [Chose a Topic](./exercise-two.md#chose-a-topic) - - [Diagram the deployment](./exercise-two.md#diagram-the-deployment) -0. [Demonstrations](./demonstrations.md) - - [Deploy a new application](./demonstrations.md#deploy-a-new-application) - - [Deploy new application code](./demonstrations.md#deploy-new-application-code) - - [Continuous Integration work-flow](./demonstrations.md#continuous-integration-work-flow) - - [Upgrade an account](./demonstrations.md#upgrade-an-account) - - [Troubleshooting](./demonstrations.md#troubleshooting) -0. [Working Labs](./working-labs.md) - - [Setting up your local environment](./working-labs.md#setting-up-your-local-environment) - - [Working with git & GitHub](./working-labs.md#working-with-git--github) - - [Deploying the Nubis example application Dpaste](./working-labs.md#deploying-the-nubis-example-application-dpaste) - - [Deploying your own application using nubis-skel](./working-labs.md#deploying-your-own-application-using-nubis-skel) - - [Updating system level packages](./working-labs.md#updating-system-level-packages) +This section contains the material we will cover in the classroom session. +Eventually we hope to develop this into a self-guided training course that a +person can work through at their own pace. + +1. 
[Introduction](./introduction.md) + * [On listening to our customers](./introduction.md#on-listening-to-our-customers) + * [Examples of current state](./introduction.md#examples-of-current-state) + * [How can we Improve](./introduction.md#how-can-we-improve) + * [Current operating model](./introduction.md#current-operating-model) + * [Illustration of current challenges](./introduction.md#illustration-of-current-challenges) + * [List of issues in our current work-flows](./introduction.md#list-of-issues-in-our-current-work-flows) + * [Tainted resources](./introduction.md#tainted-resources) + * [Lack of package pinning](./introduction.md#lack-of-package-pinning) + * [Untested changes to production systems](./introduction.md#untested-changes-to-production-systems) + * [Lack of isolation between applications](./introduction.md#lack-of-isolation-between-applications) + * [Too many ways a system can be mutated](./introduction.md#too-many-ways-a-system-can-be-mutated) + * [Puppetmasters ensure eventual consistency](./introduction.md#puppetmasters-ensure-eventual-consistency) + * [Puppet's inability to guarantee symmetry among systems](./introduction.md#puppets-inability-to-guarantee-symmetry-among-systems) + * [A word on backups](./introduction.md#a-word-on-backups) + * [User experiences](./introduction.md#user-experiences) + * [Future Operating Model](./introduction.md#future-operating-model) + * [Specific areas of improvement](./introduction.md#specific-areas-of-improvement) + * [Automate all the things](./introduction.md#automate-all-the-things) + * [Built on cloud technology](./introduction.md#built-on-cloud-technology) + * [Provide self service opportunities](./introduction.md#provide-self-service-opportunities) + * [Create standards](./introduction.md#create-standards) + * [Treat datacenters as reusable components](./introduction.md#treat-datacenters-as-reusable-components) + * [Exterminate the Human API](./introduction.md#exterminate-the-human-api) + * [Use more 
community resources](./introduction.md#use-more-community-resources) + * [Revision everything](./introduction.md#revision-everything) + * [Transition work-flow to GitHub](./introduction.md#transition-work-flow-to-github) + * [Code Reviews](./introduction.md#code-reviews) + * [Provide Application isolation](./introduction.md#provide-application-isolation) + * [Provide a platform that can autoscale](./introduction.md#provide-a-platform-that-can-autoscale) + * [Bit for bit repeatable deployments](./introduction.md#bit-for-bit-repeatable-deployments) + * [Destroy Tainted resources](./introduction.md#destroy-tainted-resources) + * [Reduce time required to stand up a new application](./introduction.md#reduce-time-required-to-stand-up-a-new-application) + * [Provide analytical and trending monitoring for applications and systems](./introduction.md#provide-analytical-and-trending-monitoring-for-applications-and-systems) + * [Log / Audit everything](./introduction.md#log--audit-everything) + * [Provide transparency into web operations systems and deployment methodologies](./introduction.md#provide-transparency-into-web-operations-systems-and-deployment-methodologies) + * [Provide an open structure that enables us to better support the open web](./introduction.md#provide-an-open-structure-that-enables-us-to-better-support-the-open-web) + * [Provide a better customer experience](./introduction.md#provide-a-better-customer-experience) +1. 
[New operating principles](./operating-principles.md) + * [New way of thinking](./operating-principles.md#new-way-of-thinking) + * [Twelve Factor Review](./operating-principles.md#twelve-factor-review) + * [Agile Development](./operating-principles.md#agile-development) + * [Symantic versioning](./operating-principles.md#symantic-versioning) + * [Code Reviews](./operating-principles.md#code-reviews) + * [Decentralization](./operating-principles.md#decentralization) + * [git and GitHub](./operating-principles.md#git-and-github) + * [System level configuration](./operating-principles.md#system-level-configuration) + * [Packer](./operating-principles.md#packer) + * [Puppet (masterless)](./operating-principles.md#puppet-masterless) + * [Consul and Confd](./operating-principles.md#consul-and-confd) + * [Image Upgrades](./operating-principles.md#image-upgrades) + * [To Autoscale or Not to Autoscale](./operating-principles.md#to-autoscale-or-not-to-autoscale) + * [Tainted Instances](./operating-principles.md#tainted-instances) + * [Security Requirements](./operating-principles.md#security-requirements) +1. [Exercise One](./exercise-one.md) + * [Setup](./exercise-one.md#setup) + * [Organize into groups](./exercise-one.md#organize-into_groups) + * [Chose a Topic](./exercise-one.md#chose-a-topic) + * [Discuss Improvements](./exercise-one.md#discuss-improvements) + * [Presentations and Discussions](./exercise-one.md#presentations-and-discussions) +1. 
[Nubis overview](./nubis-overview.md) + * [What is Nubis](./nubis-overview.md#what-is-nubis) + * [Standardized design](./nubis-overview.md#standardized-design) + * [Security compliance](./nubis-overview.md#security-compliance) + * [Reduced time-to-market](./nubis-overview.md#reduced-time-to-market) + * [What can Nubis do for me](./nubis-overview.md#what-can-nubis-do-for-me) + * [What does Nubis provide](./nubis-overview.md#what-does-nubis-provide) + * [Nubis accounts](./nubis-overview.md#nubis-accounts) + * [Accounts](./nubis-overview.md#accounts) + * [Account Diagram](#account-diagram) + * [Multiple environments](./nubis-overview.md#multiple-environments) + * [Quarterly Updates](./nubis-overview.md#quarterly-updates) + * [Distribution upgrades](./nubis-overview.md#distribution-upgrades) + * [Package updates](./nubis-overview.md#package-updates) + * [New services](./nubis-overview.md#new-services) + * [Application Image Updates](./nubis-overview.md#application-image-updates) + * [Security Updates](./nubis-overview.md#security-updates) + * [Included Services](./nubis-overview.md#included-services) + * [Proxies](./nubis-overview.md#proxies) + * [NATs](./nubis-overview.md#nats) + * [Consul Integration](./nubis-overview.md#consul-integration) + * [Fluent Integration](./nubis-overview.md#fluent-integration) + * [Jumphosts](./nubis-overview.md#jumphosts) + * [User Management](./nubis-overview.md#user-management) + * [MFA](./nubis-overview.md#mfa) + * [aws-vault](./nubis-overview.md#aws-vault) + * [LDAP Integration](./nubis-overview.md#ldap-integration) + * [Security Integration](./nubis-overview.md#security-integration) + * [InfoSec security audit role](./nubis-overview.md#infoSec-security-audit-role) + * [Network Security Monitoring](./nubis-overview.md#network-security-monitoring) + (NSM) + * [Integrated IP Blacklisting](./nubis-overview.md#integrated-ip-blacklisting) + * [Log Integration with Mozilla 
Investigator](./nubis-overview.md#log-integration-with-mozilla-investigator) + (MIG) + * [CloudTrail Integration](./nubis-overview.md#cloudtrail-integration) + * [Additional Services](./nubis-overview.md#additional-services) + * [Cloud Health Integration](./nubis-overview.md#cloud-health-integration) + * [Billing Support](./nubis-overview.md#billing-support) + * [Tainted Resources](./nubis-overview.md#tainted-resources) + * [Platform Monitoring](./nubis-overview.md#platform-monitoring) + * [High Availability](./nubis-overview.md#high-availability) + * [Nubis deployments](./nubis-overview.md#nubis-deployments) + * [Deployment Overview](./nubis-overview.md#deployment-overview) + * [Environments and how to use them](./nubis-overview.md#environments-and-how-to-use-them) + * [Deployment Workflow Diagram](./nubis-overview.md#deployment-workflow-diagram) + * [Deployment repository](./nubis-overview.md#deployment-repository) + * [Puppet configuration](./nubis-overview.md#puppet-configuration) + * [Application Code](./nubis-overview.md#application-Code) + * [Terraform modules](./nubis-overview.md#terraform-modules) + * [Recommended practices](./nubis-overview.md#recommended-practices) + * [Architectural design services](./nubis-overview.md#architectural-design-services) + * [Example deployments](./nubis-overview.md#example-deployments) + * [nubis-skel](./nubis-overview.md#nubis-skel) + * [AWS Solutions Architect](./nubis-overview.md#aws-solutions-architect) + * [Community support](./nubis-overview.md#community-support) + * [CI System](./nubis-overview.md#ci-system) + * [Rolling Back](./nubis-overview.md#rolling-back) + * [Custom Monitors](./nubis-overview.md#custom-monitors) + * [nubis-base](./nubis-overview.md#nubis-base) + * [nubis-builder](./nubis-overview.md#nubis-builder) + * [Build Deploy Diagram](./nubis-overview.md#build-deploy-diagram) +1. 
[Exercise Two](./exercise-two.md) + * [Chose a Topic](./exercise-two.md#chose-a-topic) + * [Diagram the deployment](./exercise-two.md#diagram-the-deployment) +1. [Demonstrations](./demonstrations.md) + * [Deploy a new application](./demonstrations.md#deploy-a-new-application) + * [Deploy new application code](./demonstrations.md#deploy-new-application-code) + * [Continuous Integration work-flow](./demonstrations.md#continuous-integration-work-flow) + * [Upgrade an account](./demonstrations.md#upgrade-an-account) + * [Troubleshooting](./demonstrations.md#troubleshooting) +1. [Working Labs](./working-labs.md) + * [Setting up your local environment](./working-labs.md#setting-up-your-local-environment) + * [Working with git & GitHub](./working-labs.md#working-with-git--github) + * [Deploying the Nubis example application Dpaste](./working-labs.md#deploying-the-nubis-example-application-dpaste) + * [Deploying your own application using nubis-skel](./working-labs.md#deploying-your-own-application-using-nubis-skel) + * [Updating system level packages](./working-labs.md#updating-system-level-packages) ## Operational Documentation (HOWTOs) -Here are some links to context relevant HOWTOs which are intended to guide you through many of the tasks you will need to perform using Nubis. - - How do I deploy an app - - How do I login to AWS? - - aws-vault overview (still might like a wrapper script for account setup) - - Walk-through dpaste deploy - - Build custom app with nubis-skel - - Detailed working example for git and GitHub - - How do I build an AMI? - - Features of nubis-base - - /etc/nubis.d/* - - consul integration - - Puppet masterless - - Puppet modules - - librarian-puppet - - Packer overview - - nubis-builder overview - - distrobutions supported - - project.json file requirements and options - - How do I launch a jumphost? - - How do I access instances - - What is the meaning of immutable - - What happens when my instance is marked as tainted? 
- - How does monitoring work in AWS?
- - How do I upgrade my account to Nubis latest?
- - Terraform overview
- - Consul overview
- - Fluent overview
- - Proxy overview (including nat)
- - Database admin node
- - How do I add and remove users from my account
-   - Levels of user permissions
+Here are some links to context-relevant HOWTOs which are intended to guide you
+through many of the tasks you will need to perform using Nubis.
+
+1. How do I deploy an app
+1. How do I log in to AWS?
+   * aws-vault overview (still might like a wrapper script for account setup)
+1. Walk-through dpaste deploy
+1. Build custom app with nubis-skel
+1. Detailed working example for git and GitHub
+1. How do I build an AMI?
+   * Features of nubis-base
+   * /etc/nubis.d/*
+   * consul integration
+   * Puppet masterless
+   * Puppet modules
+   * librarian-puppet
+   * Packer overview
+   * nubis-builder overview
+   * distributions supported
+   * project.json file requirements and options
+1. How do I launch a jumphost?
+1. How do I access instances
+1. What is the meaning of immutable
+1. What happens when my instance is marked as tainted?
+1. How does monitoring work in AWS?
+1. How do I upgrade my account to Nubis latest?
+1. Terraform overview
+1. Consul overview
+1. Fluent overview
+1. Proxy overview (including nat)
+1. Database admin node
+1. How do I add and remove users from my account
+   * Levels of user permissions

## Technical Documents (Design docs)
-In this section you will find links to some of our technical and design documentation. This material is intended to help you with troubleshooting. It is also helpfull if you would like to get into helping us with Nubis development.
- - NSM monitoring
- - IP Blocklist
- - Nat setup / HA / State
- - User Management
+In this section you will find links to some of our technical and design
+documentation. This material is intended to help you with troubleshooting. It is
+also helpful if you would like to get into helping us with Nubis development.
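As a concrete illustration of the "wrapper script for account setup" idea
mentioned in the HOWTO list above, here is a minimal POSIX shell sketch. It is
not part of Nubis itself: the helper name `run_in_account` and the profile name
`nubis-training` are purely hypothetical, and for safety the sketch only builds
and prints the aws-vault command line rather than executing it.

```shell
#!/bin/sh
# Hypothetical wrapper: runs a command inside an aws-vault managed profile.
# In real use the echo would become: exec aws-vault exec "$profile" -- "$@"
run_in_account() {
    profile="$1"
    shift
    # Only print the command line here, so the sketch is safe to run anywhere.
    echo aws-vault exec "$profile" -- "$@"
}

# Example: show which identity a session would use.
run_in_account nubis-training aws sts get-caller-identity
# prints: aws-vault exec nubis-training -- aws sts get-caller-identity
```

Such a wrapper keeps credentials out of the environment until the moment a
command actually needs them, which is the main reason aws-vault appears in the
HOWTO list.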
+
+* NSM monitoring
+* IP Blocklist
+* Nat setup / HA / State
+* User Management
diff --git a/training/assumptions.md b/training/assumptions.md
index 855e2b0..6fb06bf 100644
--- a/training/assumptions.md
+++ b/training/assumptions.md
@@ -1,38 +1,50 @@
-# Assumptions
-For the sake of this training programe we are making a number of assumptions about what the student knows prior to beginning the course. These assumptions are documented here and we will update this list as we discover more. It is my intention that this list will be hyperlinked to existing trainign material or documentation for each of these topics. In that way this document can evolve into a starting point for the student to be able to reference and do some independant study prior to embarking on this journey.
+

-As to the specific day this training is being developed for, this list should in some fashon be included in the prerequisite material. The intention is that the students will review the material they feel they are weak in prior to attending the class.
+# Assumptions
+
+For the sake of this training program we are making a number of assumptions
+about what the student knows prior to beginning the course. These assumptions
+are documented here and we will update this list as we discover more. It is my
+intention that this list will be hyperlinked to existing training material or
+documentation for each of these topics. In that way this document can evolve
+into a starting point for the student to be able to reference and do some
+independent study prior to embarking on this journey.
+
+As to the specific day this training is being developed for, this list should
+in some fashion be included in the prerequisite material. The intention is that
+the students will review the material they feel they are weak in prior to
+attending the class.
Assumptions:
- - Working knowledge of:
-   - Linux
-   - Systems administration
-   - Networking principles (likely a week spot for some attendees)
-   - Database administration basics (likly a missing skill for some attendees)
-   - Open source (floss) (Might be a good idea to talk about this a bit)
-   - Agile development model theories (We might need to level up here)
-   - git and GitHub (might need to level up here)
-   - Puppet
- - Working knowledge of AWS
-   - EC2
-     - Instance sizing and pricing considerations
-   - Load Balancers
-   - RDS
-   - S3
-   - Security groups
-   - IAM
-     - Security Rolls (including assume roll)
-     - Payer accounts
-   - Virtual Networking
-     - VPCs
-     - Routing
-     - Subnets (Public vs. Private)
-     - Internet Gateways
-     - Elastic IPs
-     - VPN Connections
-   - EFS
-   - Route 53
- - No working knowledge of the AWS CLI (Amazon training does not cover this)
- - Local working environment not set up (Can we solve for this in some way?)
- - Twelve Factor
\ No newline at end of file
+
+* Working knowledge of:
+  * Linux
+  * Systems administration
+  * Networking principles (likely a weak spot for some attendees)
+  * Database administration basics (likely a missing skill for some attendees)
+  * Open source (floss) (Might be a good idea to talk about this a bit)
+  * Agile development model theories (We might need to level up here)
+  * git and GitHub (might need to level up here)
+  * Puppet
+* Working knowledge of AWS
+  * EC2
+    * Instance sizing and pricing considerations
+  * Load Balancers
+  * RDS
+  * S3
+  * Security groups
+  * IAM
+    * Security Roles (including assume role)
+    * Payer accounts
+  * Virtual Networking
+    * VPCs
+    * Routing
+    * Subnets (Public vs. Private)
+    * Internet Gateways
+    * Elastic IPs
+    * VPN Connections
+  * EFS
+  * Route 53
+* No working knowledge of the AWS CLI (Amazon training does not cover this)
+* Local working environment not set up (Can we solve for this in some way?)
+* Twelve Factor diff --git a/training/demonstrations.md b/training/demonstrations.md index c46b353..eaff19c 100644 --- a/training/demonstrations.md +++ b/training/demonstrations.md @@ -1,33 +1,51 @@ -# Demonstrations -In this section we will walk through a few demonstrations describing how to perform various tasks using Nubis. + - - [Deploy a new application](#deploy-a-new-application) - - [Deploy new application code](#deploy-new-application-code) - - [Continuous Integration work-flow](#continuous-integration-work-flow) - - [Upgrade an account](#upgrade-an-account) - - [Troubleshooting](#troubleshooting) +# Demonstrations + +In this section we will walk through a few demonstrations describing how to +perform various tasks using Nubis. + +* [Deploy a new application](#deploy-a-new-application) +* [Deploy new application code](#deploy-new-application-code) +* [Continuous Integration work-flow](#continuous-integration-work-flow) +* [Upgrade an account](#upgrade-an-account) +* [Troubleshooting](#troubleshooting) ## Deploy a new application -In this demonstration we will see a full, start to finish, example of deploying a web application from scratch using Nubis. + +In this demonstration we will see a full, start to finish, example of deploying +a web application from scratch using Nubis. **TODO** Create new application video ## Deploy new application code -Here we will see an example of updating code for an existing application in a sandbox account. This will be the process from updating code, rebuilding the image, through deploying the code in our sandbox environment. + +Here we will see an example of updating code for an existing application in a +sandbox account. This will be the process from updating code, rebuilding the +image, through deploying the code in our sandbox environment. 
Updating Sandbox [video](https://youtu.be/rBGvMJGXRR4) ## Continuous Integration work-flow -Here we will see how to promote our application code from the staging environment into the production environment. We will discover that we deployed bad code and will then see how to roll back to a working state. + +Here we will see how to promote our application code from the staging +environment into the production environment. We will discover that we deployed +bad code and will then see how to roll back to a working state. Continuous Integration [video](https://youtu.be/MTe_seH82bk) ## Upgrade an account -In this demonstration we will see how to update a Nubis account that is running a Nubis deployed application. + +In this demonstration we will see how to update a Nubis account that is running +a Nubis deployed application. Account upgrade [video](https://youtu.be/CjwkB-W009o) ## Troubleshooting -In this demonstration we will see how to deploy a jumphost into the account. We will then log into the jumphost and then log into a web server. This is a simple demonstration to illustrate how to get into a production Nubis account for troubleshooting. We will then see how tainted resources are treated. + +In this demonstration we will see how to deploy a jumphost into the account. We +will then log into the jumphost and then log into a web server. This is a simple +demonstration to illustrate how to get into a production Nubis account for +troubleshooting. We will then see how tainted resources are treated. Accessing instances [video](https://youtu.be/QschFVsEzzQ) diff --git a/training/exercise-one.md b/training/exercise-one.md index 2f73a3a..d48436f 100644 --- a/training/exercise-one.md +++ b/training/exercise-one.md @@ -1,54 +1,101 @@ -# Exercise One -For this exercise the participants will break up into small groups. Then they will chose a topic and discuss how they can improve by using the principles discussed in the previous two modules. 
+

- - [Setup](#setup)
- - [Organize into groups](#organize-into-groups)
- - [Chose a Topic](#chose-a-topic)
- - [Discuss Improvements](#discuss-improvements)
- - [Presentations and Discussions](#presentations-and-discussions)
+# Exercise One
+
+For this exercise the participants will break up into small groups. Then they
+will choose a topic and discuss how they can improve by using the principles
+discussed in the previous two modules.
+
+* [Setup](#setup)
+* [Organize into groups](#organize-into-groups)
+* [Chose a Topic](#chose-a-topic)
+* [Discuss Improvements](#discuss-improvements)
+* [Presentations and Discussions](#presentations-and-discussions)

## Setup
-You will need to have large sticky pads or small white-boards for each team. These should be placed around the room so that each team can gather around without interfering with the other teams.
+
+You will need to have large sticky pads or small white-boards for each team.
+These should be placed around the room so that each team can gather around
+without interfering with the other teams.

## Organize into groups
-Have the people count off by an appropriate number to create groups of four to five members. For example, if there are twenty people in the class have them count off to four. There will be four groups of five people each.
-The intention is to create a number of groups for the exercise that are small enough that everyone can have a voice and participate. If the groups are to large it is difficult for everyone to participate. If the groups are to small it may be difficult for them to come up with a topic.
+Have the people count off by an appropriate number to create groups of four to
+five members. For example, if there are twenty people in the class, have them
+count off to four. There will be four groups of five people each.
+
+The intention is to create a number of groups for the exercise that are small
+enough that everyone can have a voice and participate. If the groups are too
+large it is difficult for everyone to participate. If the groups are too small
+it may be difficult for them to come up with a topic.

-Generally counting off while going around the room creates diversity amongst the teams. This way teams should be made up of a cross section of the people attending the training. Any method to create the groups can be used, the thing to avoid is entire groups of people who may not be involved closely enough with the technology to come up with a topic.
+Generally counting off while going around the room creates diversity amongst the
+teams. This way teams should be made up of a cross section of the people
+attending the training. Any method to create the groups can be used; the thing
+to avoid is entire groups of people who may not be involved closely enough with
+the technology to come up with a topic.

-Once everyone has counted off, have the teams gather around their paper or white-board.
+Once everyone has counted off, have the teams gather around their paper or
+white-board.

## Chose a Topic
-Instruct the teams to each chose a topic. This should be a real topic and not a hypothetical. The topic should be a problem they have today. It will likely be around some technology or process they currently use in the datacenter that could use improvement.
+
+Instruct the teams to each choose a topic. This should be a real topic and not a
+hypothetical. The topic should be a problem they have today. It will likely be
+around some technology or process they currently use in the datacenter that
+could use improvement.

Have the team write the topic on the top of their paper.

-The topics can be anything, however it needs to be something that can be solved by applying some of the operating principles previously discussed. It is best if each team has a unique topic. This enables a broader opportunity for collaborative discussion later on.
+The topics can be anything, however it needs to be something that can be solved
+by applying some of the operating principles previously discussed. It is best if
+each team has a unique topic. This enables a broader opportunity for
+collaborative discussion later on.

-If teams are having difficulty coming up with a topic suggestions can be made from the following list. It is important for teams to try and come up with their own topic so that it is relevant and meaningful to them.
+If teams are having difficulty coming up with a topic, suggestions can be made
+from the following list. It is important for teams to try and come up with their
+own topic so that it is relevant and meaningful to them.

Topic Ideas:
- - Application deployment process
- - Team collaboration
- - System upgrading and patching
- - Dependencies between applications
- - Developer access
- - Monitoring insights
- - Rolling back
- - Dependency management for applications
- - Configuration management
- - Configuration validation and testing
- - Testing in general
+
+* Application deployment process
+* Team collaboration
+* System upgrading and patching
+* Dependencies between applications
+* Developer access
+* Monitoring insights
+* Rolling back
+* Dependency management for applications
+* Configuration management
+* Configuration validation and testing
+* Testing in general

## Discuss Improvements
-Once each team has chosen a topic, instruct them to discuss how they could improve on the topic by applying some of the new operating principles. They do not need to document in great detail, a word or two so they remember what they discussed is fine.
-This section is intended to take some time. Allow enough time for the teams to explore a number of possibilities. Do not rush this process. You should move around the room and make suggestions to help the teams along the correct path. Get a sense of the room and when the conversations begin to move off-topic that is a sign that it is time to move on.
+Once each team has chosen a topic, instruct them to discuss how they could
+improve on the topic by applying some of the new operating principles. They do
+not need to document in great detail, a word or two so they remember what they
+discussed is fine.
+
+This section is intended to take some time. Allow enough time for the teams to
+explore a number of possibilities. Do not rush this process. You should move
+around the room and make suggestions to help the teams along the correct path.
+Get a sense of the room and when the conversations begin to move off-topic that
+is a sign that it is time to move on.

## Presentations and Discussions
-Have each team briefly introduce their topic and explain what they have discussed for improving the situation. Brief discussion is encouraged in order to further explore the operating principles. This is intended primarily to ensure that folks are gaining an understanding of the material and how they can apply it.
-The pace of this should be based on the students understanding. If they are getting it right off this portion can be shortened. If many of the students are struggling with the concepts, this is the place to explore that and foster discussions that lead to understanding.
+Have each team briefly introduce their topic and explain what they have
+discussed for improving the situation. Brief discussion is encouraged in order
+to further explore the operating principles. This is intended primarily to
+ensure that folks are gaining an understanding of the material and how they can
+apply it.
+
+The pace of this should be based on the students' understanding. If they are
+getting it right off, this portion can be shortened. If many of the students are
+struggling with the concepts, this is the place to explore that and foster
+discussions that lead to understanding.

-A break should immediately follow this discussion. This gives you the opportunity to work with individuals who may benefit from additional guidance. This will also encourage the discussions to carry over to the break.
+A break should immediately follow this discussion. This gives you the
+opportunity to work with individuals who may benefit from additional guidance.
+This will also encourage the discussions to carry over to the break.
diff --git a/training/exercise-two.md b/training/exercise-two.md
index 68c261f..08ccdb7 100644
--- a/training/exercise-two.md
+++ b/training/exercise-two.md
@@ -1,13 +1,27 @@
-# Exercise Two
-For this exercise we will take one or two of the topics we discovered in exercise one and see how we can apply the new operating principles to an actual deployment using Nubis.
+

-This is a group discussion which will be led by the trainer. It should be interactive, asking lots of questions while engaging the students. The goal is to help them to start to envision how Nubis could be used to solve problems.
+# Exercise Two

- - [Chose a Topic](#chose-a-topic)
- - [Diagram the deployment](#diagram-the-deployment)
+For this exercise we will take one or two of the topics we discovered in
+exercise one and see how we can apply the new operating principles to an actual
+deployment using Nubis.
+
+This is a group discussion which will be led by the trainer. It should be
+interactive, asking lots of questions while engaging the students. The goal is
+to help them to start to envision how Nubis could be used to solve problems.
+
+* [Chose a Topic](#chose-a-topic)
+* [Diagram the deployment](#diagram-the-deployment)

## Chose a Topic
-From the topics the groups discussed in exercise one, select one that will be easy to discuss and deploy using Nubis. If the topics are to general, select an application to discuss which is directly related to one of the topics.
+
+From the topics the groups discussed in exercise one, select one that will be
+easy to discuss and deploy using Nubis. If the topics are too general, select an
+application to discuss which is directly related to one of the topics.
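To make the chosen topic concrete before diagramming, it can help to show that
a deployment ultimately reduces to a small amount of infrastructure code. The
following Terraform sketch is purely illustrative and is not a Nubis module:
the resource names, region, and AMI id are placeholders.

```hcl
# Illustrative only: a single web instance behind a security group,
# the sort of unit teams might diagram in this exercise.
provider "aws" {
  region = "us-west-2" # placeholder region
}

resource "aws_security_group" "web" {
  name        = "example-web"
  description = "Allow inbound HTTP"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami                    = "ami-12345678" # placeholder AMI id
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.web.id]
}
```

A real Nubis deployment would build on the project's shared Terraform modules
rather than raw resources, but a sketch at this level is usually enough to
anchor the whiteboard discussion.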
## Diagram the deployment -On a whiteboard or large sticky note, diagram a deployment of the application using Nubis. Discuss how to move from existing code to a full deployment. Emphasis should be placed more on the problems being solved and less on the process of deployment. + +On a whiteboard or large sticky note, diagram a deployment of the application +using Nubis. Discuss how to move from existing code to a full deployment. +Emphasis should be placed more on the problems being solved and less on the +process of deployment. diff --git a/training/introduction.md b/training/introduction.md index 6cde9aa..dd8cff7 100644 --- a/training/introduction.md +++ b/training/introduction.md @@ -1,313 +1,909 @@ -# Introduction -I am going to say some things today that are going to upset some people. I want to say right up front that I acknowledge that some of you have a lot invested in how we currently operate. Many of you have worked for years within the current datacenter paradigm. My intention is not to lambaste anyone. The intent of the discussion today is to discover some challenges we might all be facing and talk about mitigating those challenges. Ultimately our job in IT is about creating experiences that serve our customers needs while creating an environment that they enjoy working in. I believe that we are facing a crisis in IT today. Our customers are leaving in droves to other offerings that provide better experiences. This is completely understandable from a customer perspective. However, it is my opinion that we, collectively have the knowledge and experience to provide what our users need in a way that will not only serve their needs, but that they enjoy, is reliable and secure. - -We have skills, expertise and experience in this organization that can rival any other shop out there. We simply need to take a look at some of the user experiences we are currently providing and see how we can improve on them. 
Some of the ideas you are going to hear today might sound a bit radical, they might be difficult to accept. I ask however, that you keep an open mind and try to look past the how, to the why. What I am going to present here is not perfect, it is a work in progress. Some of the decisions have been made, some of the ideas are central, however we have a long ways to go before we achieve success. That is where all of you come in, we need to come together as an organization and work together to try and provide the absolute best user experience we can. - - - [On listening to our customers](#on-listening-to-our-customers) - - [Examples of current state](#examples-of-current-state) - - [How can we Improve](#how-can-we-improve) - - [Current operating model](#current-operating-model) - - [Illustration of current challenges](#illustration-of-current-challenges) - - [List of issues in our current work-flows](#list-of-issues-in-our-current-work-flows) - - [Tainted resources](#tainted-resources) - - [Lack of package pinning](#lack-of-package-pinning) - - [Untested changes to production systems](#untested-changes-to-production-systems) - - [Lack of isolation between applications](#lack-of-isolation-between-applications) - - [Too many ways a system can be mutated](#too-many-ways-a-system-can-be-mutated) - - [Puppetmasters ensure eventual consistency](#puppetmasters-ensure-eventual-consistency) - - [Puppet's inability to guarantee symmetry among systems](#puppets-inability-to-guarantee-symmetry-among-systems) - - [A word on backups](#a-word-on-backups) - - [User experiences](#user-experiences) - - [Future Operating Model](#future-operating-model) - - [Specific areas of improvement](#specific-areas-of-improvement) - - [Automate all the things](#automate-all-the-things) - - [Built on cloud technology](#built-on-cloud-technology) - - [Provide self service opportunities](#provide-self-service-opportunities) - - [Create standards](#create-standards) - - [Treat datacenters as reusable 
components](#treat-datacenters-as-reusable-components) - - [Exterminate the "Human API"](#exterminate-the-human-api) - - [Use more community resources](#use-more-community-resources) - - [Revision everything](#revision-everything) - - [Transition work-flow to GitHub](#transition-work-flow-to-github) - - [Code Reviews](#code-reviews) - - [Provide Application isolation](#provide-application-isolation) - - [Provide a platform that can autoscale](#provide-a-platform-that-can-autoscale) - - [Bit for bit repeatable deployments](#bit-for-bit-repeatable-deployments) - - [Destroy Tainted resources](#destroy-tainted-resources) - - [Reduce time required to stand up a new application](#reduce-time-required-to-stand-up-a-new-application) - - [Provide analytical and trending monitoring for applications and systems](#provide-analytical-and-trending-monitoring-for-applications-and-systems) - - [Log / Audit everything](#log--audit-everything) - - [Provide transparency into web operations systems and deployment methodologies](#provide-transparency-into-web-operations-systems-and-deployment-methodologies) - - [Provide an open structure that enables us to better support the open web and the Mozilla community](#provide-an-open-structure-that-enables-us-to-better-support-the-open-web-and-the-mozilla-community) - - [Provide a better customer experience](#provide-a-better-customer-experience) + + +# Introduction + +I am going to say some things today that are going to upset some people. I want +to say right up front that I acknowledge that some of you have a lot invested in +how we currently operate. Many of you have worked for years within the current +datacenter paradigm. My intention is not to lambaste anyone. The intent of the +discussion today is to discover some challenges we might all be facing and talk +about mitigating those challenges. Ultimately our job in IT is about creating +experiences that serve our customers needs while creating an environment that +they enjoy working in. 
I believe that we are facing a crisis in IT today. Our
+customers are leaving in droves to other offerings that provide better
+experiences. This is completely understandable from a customer perspective.
+However, it is my opinion that we, collectively, have the knowledge and
+experience to provide what our users need in a way that will not only serve
+their needs, but that they enjoy, is reliable and secure.
+
+We have skills, expertise and experience in this organization that can rival any
+other shop out there. We simply need to take a look at some of the user
+experiences we are currently providing and see how we can improve on them. Some
+of the ideas you are going to hear today might sound a bit radical, they might
+be difficult to accept. I ask, however, that you keep an open mind and try to
+look past the how, to the why. What I am going to present here is not perfect,
+it is a work in progress. Some of the decisions have been made, some of the
+ideas are central, however we have a long way to go before we achieve success.
+That is where all of you come in: we need to come together as an organization
+and work together to try and provide the absolute best user experience we can.
+ +* [On listening to our customers](#on-listening-to-our-customers) +* [Examples of current state](#examples-of-current-state) +* [How can we Improve](#how-can-we-improve) +* [Current operating model](#current-operating-model) + * [Illustration of current challenges](#illustration-of-current-challenges) + * [List of issues in our current work-flows](#list-of-issues-in-our-current-work-flows) + * [Tainted resources](#tainted-resources) + * [Lack of package pinning](#lack-of-package-pinning) + * [Untested changes to production systems](#untested-changes-to-production-systems) + * [Lack of isolation between applications](#lack-of-isolation-between-applications) + * [Too many ways a system can be mutated](#too-many-ways-a-system-can-be-mutated) + * [Puppetmasters ensure eventual consistency](#puppetmasters-ensure-eventual-consistency) + * [Puppet's inability to guarantee symmetry among systems](#puppets-inability-to-guarantee-symmetry-among-systems) + * [A word on backups](#a-word-on-backups) +* [User experiences](#user-experiences) +* [Future Operating Model](#future-operating-model) + * [Specific areas of improvement](#specific-areas-of-improvement) + * [Automate all the things](#automate-all-the-things) + * [Built on cloud technology](#built-on-cloud-technology) + * [Provide self service opportunities](#provide-self-service-opportunities) + * [Create standards](#create-standards) + * [Treat datacenters as reusable components](#treat-datacenters-as-reusable-components) + * [Exterminate the Human API](#exterminate-the-human-api) + * [Use more community resources](#use-more-community-resources) + * [Revision everything](#revision-everything) + * [Transition work-flow to GitHub](#transition-work-flow-to-github) + * [Code Reviews](#code-reviews) + * [Provide Application isolation](#provide-application-isolation) + * [Provide a platform that can autoscale](#provide-a-platform-that-can-autoscale) + * [Bit for bit repeatable 
deployments](#bit-for-bit-repeatable-deployments) + * [Destroy Tainted resources](#destroy-tainted-resources) + * [Reduce time required to stand up a new application](#reduce-time-required-to-stand-up-a-new-application) + * [Provide analytical and trending monitoring for applications and systems](#provide-analytical-and-trending-monitoring-for-applications-and-systems) + * [Log / Audit everything](#log--audit-everything) + * [Provide transparency into web operations systems and deployment methodologies](#provide-transparency-into-web-operations-systems-and-deployment-methodologies) + * [Provide an open structure that enables us to better support the open web](#provide-an-open-structure-that-enables-us-to-better-support-the-open-web) + * [Provide a better customer experience](#provide-a-better-customer-experience) ## On listening to our customers -We spend a great deal of time designing systems. We are really good at that. Just about any one of us can be handed a technical problem to solve, go off and figure out how to solve it, then implement our solution. We totally have that covered. Where things start to get tricky for us is when we need to work with other people and collaborate on the solutions they are working on. We are not great at coordinating within IT. Now, I think we, as an organization, understand that and we are working on it. The new capability model and city map are the start of what, I hope, will be a more cohesive way of working within our organization. I must say however, that we are not very good, at all, when it comes to understanding our customers needs. We do not encourage feedback from our customers, as a mater of course. We do not engage our customers when selecting problems to solve, let alone when choosing technology or defining processes. If we intend to remain relevant, we must figure out how to include our customers at all levels. We must start genuinely listening to them and taking their feedback to heart. 
We have a tradition in IT that says "We know best. We will build you what you need", this attitude simply must change. While we may be experts at the technology we specialize in, when it comes to our customers needs, we do not know best. We do not begin to know best or, in many cases, even understand their needs, let alone their wants. We can do this. We can listen. We can fix our processes to include feedback from our users, and we must do this. -## Examples of current state -We need to travel with our customers to where they want to go. We currently provide either bare bones VMs or complicated and convoluted web clusters for our infrastructure offerings. Lets look at each of those in turn. - -Think about the bare bones VM offering. In some cases this can make sense, however we must consider that in order for a developer to use this system they need to have quite a bit of knowledge about linux operating systems. Granted many of them do, however in order to operate an application in a robust, secure and reliable manner requires a whole other level of knowledge and expertise. It is not reasonable for us to assume that our customers have that knowledge any more that it would be for them to assume we were experts in the technology they are deploying. I ask you, how many people here know how to debug the java compiler? How about go? Python? PHP? I could go on, but the point is that no one person can be an expert in everything. This is precisely why we specialize. We, as IT, specialize in delivering safe, reliable and secure systems. Yet we often over burden our customers with exactly what we are experts in. +We spend a great deal of time designing systems. We are really good at that. +Just about any one of us can be handed a technical problem to solve, go off and +figure out how to solve it, then implement our solution. We totally have that +covered. 
Where things start to get tricky for us is when we need to work with
+other people and collaborate on the solutions they are working on. We are not
+great at coordinating within IT. Now, I think we, as an organization, understand
+that, and we are working on it. The new capability model and city map are the
+start of what I hope will be a more cohesive way of working within our
+organization. I must say, however, that we are not very good at all when it
+comes to understanding our customers' needs. We do not encourage feedback from
+our customers as a matter of course. We do not engage our customers when
+selecting problems to solve, let alone when choosing technology or defining
+processes. If we intend to remain relevant, we must figure out how to include
+our customers at all levels. We must start genuinely listening to them and
+taking their feedback to heart. We have a tradition in IT that says "We know
+best. We will build you what you need"; this attitude simply must change. While
+we may be experts at the technology we specialize in, when it comes to our
+customers' needs, we do not know best. We do not begin to know best or, in many
+cases, even understand their needs, let alone their wants. We can do this. We
+can listen. We can fix our processes to include feedback from our users, and we
+must do this.
When we think about the experience behind that, how our customer must feel, frustrated, under-served, like IT has a feeling of superiority and a control they will not relinquish. I am not making this stuff up, this comes directly from numerous conversations that I have had with our customers over the years.
+## Examples of current state
-In both of the examples above we have a situation in which the subject matter expert is operating outside of their field of expertise. On the one hand a developer is tasked with the low level tasks that a systems administrator specializes in. On the other hand we have systems administrators attempting to troubleshoot technologies that developers specialize in. Note well that I am not advocating against a devops model. In fact I am advocating for a devops model. One in which we, as IT, work closely with the developers to create a devops environment.
+We need to travel with our customers to where they want to go. We currently
+provide either bare bones VMs or complicated and convoluted web clusters for
+our infrastructure offerings. Let's look at each of those in turn.
+
+Think about the bare bones VM offering. In some cases this can make sense;
+however, we must consider that in order for a developer to use this system they
+need to have quite a bit of knowledge about Linux operating systems. Granted,
+many of them do; however, operating an application in a robust, secure, and
+reliable manner requires a whole other level of knowledge and expertise. It is
+not reasonable for us to assume that our customers have that knowledge any
+more than it would be for them to assume we were experts in the technology they
+are deploying. I ask you, how many people here know how to debug the Java
+compiler? How about Go? Python? PHP? I could go on, but the point is that no one
+person can be an expert in everything. This is precisely why we specialize. We,
+as IT, specialize in delivering safe, reliable and secure systems.
Yet we often
+overburden our customers with exactly what we are experts in.
+
+Let's look now at the web cluster model. The developers have zero access into
+these systems. That is by design, as they are shared systems and many of the
+applications running there contain sensitive data. The customers do not have
+any insight into the deployment of their application; they cannot even see the
+settings file for their application. When it comes to troubleshooting, we do
+not have a way for most of our customers to even get at their application logs.
+If there is ever an issue, they must contact us, and then we attempt to
+troubleshoot their application, built on a technology that they specialize in
+and that, by definition, we do not. When we think about the experience behind
+that, consider how our customers must feel: frustrated, under-served, as though
+IT has an air of superiority and a control it will not relinquish. I am not
+making this stuff up; this comes directly from numerous conversations that I
+have had with our customers over the years.
+
+In both of the examples above we have a situation in which the subject matter
+expert is operating outside of their field of expertise. On the one hand, a
+developer is tasked with the low-level tasks that a systems administrator
+specializes in. On the other hand, we have systems administrators attempting to
+troubleshoot technologies that developers specialize in. Note well that I am
+not advocating against a devops model. In fact, I am advocating for a devops
+model, one in which we, as IT, work closely with the developers to create a
+devops environment.

 ## How can we Improve

-So, how do we fix this? How do we provide our customers with what they need? First, we must engage our customers, second we need to provide them what they need and desire.
Our customers are asking for things like; faster turn around time for systems, more self-service opportunities, more control over their own applications, more insight into the systems running their applications, options for running in containers, support for more third-party (SAAS & PAAS) offerings, a better password experience, better ticketing systems, and the list goes on. I know that many of these issues are being addressed in and around IT. For our part here today we are going to be discussing our infrastructure offering, our application deployment offering and self service opportunities. We are going to start by taking a look at how we operate in the datacenter and look for some areas where we can discover opportunity for improvement.
+
+So, how do we fix this? How do we provide our customers with what they need?
+First, we must engage our customers; second, we must provide what they need
+and desire. Our customers are asking for things like: faster turnaround time
+for systems, more self-service opportunities, more control over their own
+applications, more insight into the systems running their applications, options
+for running in containers, support for more third-party (SaaS & PaaS)
+offerings, a better password experience, better ticketing systems, and the list
+goes on. I know that many of these issues are being addressed in and around IT.
+For our part here today, we are going to discuss our infrastructure offering,
+our application deployment offering, and self-service opportunities. We are
+going to start by taking a look at how we operate in the datacenter and look
+for some areas where we can discover opportunity for improvement.

 ## Current operating model

-If we look at the way we operate in the datacenter we discover that a number of our current practices are based on certain constraints. For example, it takes quite a lot of time and planning to upgrade to a bigger, more robust, server.
From specking to purchasing approval to shipping time, just to get the hardware on-site. Then the datacenter technicians need to rack and cable the hardware as well as install a basic operating system. In my experience this process can take anywhere form about six weeks to more than a year, for larger projects. Lets look at another example, say I need to increase the disk space in one of my servers. I still need to go through the specking, purchasing and shipping process to get the new drives on-site. However once they are there a process starts where I coordinate with the datacenter technicians to swap out one disk at a time while I monitor the RAID rebuild status. I am sure we have all experienced issues rebuilding RAID arrays that range from hours upon hours of degraded performance, while corrupt arrays error check and rebuild, to complete loss of array integrity and data. Time for the backups, we have good backups right? This assumes that I have RAID on-board and don't need to do an application migration to accomplish a disk upgrade.
+If we look at the way we operate in the datacenter, we discover that a number
+of our current practices are based on certain constraints. For example, it
+takes quite a lot of time and planning to upgrade to a bigger, more robust
+server. It takes time for speccing, purchasing approval, and shipping, just to
+get the hardware on-site. Then the datacenter technicians need to rack and
+cable the hardware as well as install a basic operating system. In my
+experience, this process can take anywhere from about six weeks to more than a
+year for larger projects. Let's look at another example: say I need to increase
+the disk space in one of my servers. I still need to go through the speccing,
+purchasing, and shipping process to get the new drives on-site. However, once
+they are there, a process starts where I coordinate with the datacenter
+technicians to swap out one disk at a time while I monitor the RAID rebuild
+status.
I am sure we have all
+experienced issues rebuilding RAID arrays that range from hours upon hours of
+degraded performance, while corrupt arrays error check and rebuild, to complete
+loss of array integrity and data. Time for the backups; we have good backups,
+right? This assumes that I have RAID on-board and don't need to do an
+application migration to accomplish a disk upgrade.

 ### Illustration of current challenges

-Late me take a moment to talk about people making manual changes. When I was working on a web operations team, a developer for one of the web sites on one of our web clusters contacted me about an intermittent issue with their application. When loading some of the pages, there would occasionally be an error rendering portions of the page, however a reload would often fix the issue. I began troubleshooting and after quite a bit of time noticed that the errors only happened when one particular web server was serving up the page. Having narrowed the issue down to a single server I focused in and began trying to determine why that one server was misbehaving. After several days I was getting desperate (AKA my bag of tricks was nearly empty). I decided to just start comparing installed package version with other web servers. I discovered that a library on the misbehaving web server was at a slightly newer version than on the remaining servers in the cluster. Not knowing why this library was upgraded or how it was upgraded, I went to the developer to ask if they thought this could be the issue. The developer told me that in fact there was a small change in the library that would break their application. Armed with that knowledge, I began trying to figure out what, or who, might have upgraded this library. You see I was concerned that if I simply downgraded the library that I might break something else. I started with the usual suspects, was there a change in puppet that had not made it to the remaining servers?
Did someone not pin a version and perhaps this one server came on-line at a later date than the other servers, hence getting a newer version. Perhaps another one of my team members upgraded it when troubleshooting a different issue? After exhausting my set of inquires as to the origin of the upgrade, I decided to just take the risk and downgrade the library version to match the other web servers in the cluster. I mean, after all the site was serving up errors, and had been doing so for at least three days by this point. I executed the command using the package manager and, as package managers do, it listed the other changes that it would need to make to the system to satisfy my request. In the list of changes was removal of a command line tool that I had installed a week prior in order to troubleshoot a separate issue. It turned out that I had been the source of the trouble all along. -There are several problems illustrated with that example. There was no way to ensure that all of the servers in the cluster were identical. There was no way to tell who or what might have made changes to the system. Due to the fact that we were hosting multiple applications on a single cluster, there was no way to know if my downgrading the library would not break another site. There were to many ways a system could be modified. There was a lack of monitoring resulting in the developer reporting the issue, this is not a good customer experience. The human process around troubleshooting did not include ensuring the environment was pristine on logout. There were no logs describing sudo user commands, which could have been used for historical information, I simply had to ask around and rely on peoples memory. Critical packages were not pinned at specific versions. The Puppetmaster methodology attempts to ensure eventual consistency but can only be deterministic about assets it is aware of. 
There are probably more things we can point out that are less than ideal in this example, but I hope you can agree that there must be a better way of operating.
+Let me take a moment to talk about people making manual changes. When I was
+working on a web operations team, a developer for one of the web sites on one
+of our web clusters contacted me about an intermittent issue with their
+application. When loading some of the pages, there would occasionally be an
+error rendering portions of the page; however, a reload would often fix the
+issue. I began troubleshooting and, after quite a bit of time, noticed that the
+errors only happened when one particular web server was serving up the page.
+Having narrowed the issue down to a single server, I focused in and began
+trying to determine why that one server was misbehaving. After several days I
+was getting desperate (AKA my bag of tricks was nearly empty). I decided to
+just start comparing installed package versions with other web servers. I
+discovered that a library on the misbehaving web server was at a slightly newer
+version than on the remaining servers in the cluster. Not knowing why this
+library was upgraded or how it was upgraded, I went to the developer to ask if
+they thought this could be the issue. The developer told me that, in fact,
+there was a small change in the library that would break their application.
+Armed with that knowledge, I began trying to figure out what, or who, might
+have upgraded this library. You see, I was concerned that if I simply
+downgraded the library I might break something else. I started with the usual
+suspects: was there a change in Puppet that had not made it to the remaining
+servers? Did someone not pin a version, and perhaps this one server came
+on-line at a later date than the other servers, hence getting a newer version?
+Perhaps another one of my team members upgraded it when troubleshooting a
+different issue?
After exhausting my
+set of inquiries as to the origin of the upgrade, I decided to just take the
+risk and downgrade the library version to match the other web servers in the
+cluster. I mean, after all, the site was serving up errors, and had been doing
+so for at least three days by this point. I executed the command using the
+package manager and, as package managers do, it listed the other changes that
+it would need to make to the system to satisfy my request. In the list of
+changes was removal of a command-line tool that I had installed a week prior
+in order to troubleshoot a separate issue. It turned out that I had been the
+source of the trouble all along.
+
+There are several problems illustrated with that example. There was no way to
+ensure that all of the servers in the cluster were identical. There was no way
+to tell who or what might have made changes to the system. Due to the fact that
+we were hosting multiple applications on a single cluster, there was no way to
+know if my downgrading the library would not break another site. There were too
+many ways a system could be modified. There was a lack of monitoring, resulting
+in the developer reporting the issue; this is not a good customer experience.
+The human process around troubleshooting did not include ensuring the
+environment was pristine on logout. There were no logs describing sudo user
+commands, which could have been used for historical information; I simply had
+to ask around and rely on people's memory. Critical packages were not pinned at
+specific versions. The Puppetmaster methodology attempts to ensure eventual
+consistency but can only be deterministic about assets it is aware of. There
+are probably more things we can point out that are less than ideal in this
+example, but I hope you can agree that there must be a better way of operating.
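The cross-server package comparison that finally cracked the case above is easy to automate. Here is a minimal POSIX-shell sketch; the `dpkg-query` invocation in the comment and the inline sample lists are illustrative assumptions, not part of any Nubis tooling:

```shell
#!/bin/sh
# Spot package-version drift between two hosts by diffing their package lists.
# In real use, each list would come from a host, e.g. (Debian/Ubuntu, assumed):
#   ssh "$host" "dpkg-query -W -f='\${Package} \${Version}\n'" | sort
# Two inline sample lists keep this sketch self-contained.

list_a=$(mktemp) && list_b=$(mktemp)
printf 'apache2 2.4.7\nlibbar 2.0.1\nlibfoo 1.2.3\n' > "$list_a"
printf 'apache2 2.4.7\nlibbar 2.0.1\nlibfoo 1.2.4\n' > "$list_b"

# Any diff output means the hosts have drifted apart.
if diff "$list_a" "$list_b"; then
    echo "hosts are in sync"
else
    echo "package drift detected"
fi

rm -f "$list_a" "$list_b"
```

Had something like this run on a schedule, the mystery library upgrade in the story would have surfaced in minutes rather than days.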
+
+### List of issues in our current work-flows

-### List of issues in our current work-flows:
 Let's take a moment to dig into some of the challenges we currently face.

- - [Tainted resources](#tainted-resources)
- - [Lack of package pinning](#lack-of-package-pinning)
- - [Untested changes to production systems](#untested-changes-to-production-systems)
- - [Lack of isolation between applications](#lack-of-isolation-between-applications)
- - [Too many ways a system can be mutated](#too-many-ways-a-system-can-be-mutated)
- - [Puppetmasters ensure eventual consistency](#puppetmasters-ensure-eventual-consistency)
- - [Puppet's inability to guarantee symmetry among systems](#puppets-inability-to-guarantee-symmetry-among-systems)
- - [A word on backups](#a-word-on-backups)
+* [Tainted resources](#tainted-resources)
+* [Lack of package pinning](#lack-of-package-pinning)
+* [Untested changes to production systems](#untested-changes-to-production-systems)
+* [Lack of isolation between applications](#lack-of-isolation-between-applications)
+* [Too many ways a system can be mutated](#too-many-ways-a-system-can-be-mutated)
+* [Puppetmasters ensure eventual consistency](#puppetmasters-ensure-eventual-consistency)
+* [Puppet's inability to guarantee symmetry among systems](#puppets-inability-to-guarantee-symmetry-among-systems)
+* [A word on backups](#a-word-on-backups)

 #### Tainted resources

-The idea of tainted resources will be quite new to those of you who have spent most of your careers lovingly hand crafting artisan systems. That is to say, for a long time we in IT have set up systems by hand, sure we sprinkle in a bit of bootstrapping automation and throw a little puppet on top, but we still find ourselves tweaking things by hand in order to get optimal performance. By definition these servers are tainted even before they get put into production, meaning there are customizations on the servers that are configured by hand, and not by automation.
-
-Tainted resources are bad because they are not repeatable. There is no way to guarantee one system will be identical to another. This may be a desirable trait in art, however it is a potentially disastrous trait in computing. We need to know with as near to 100 percent accuracy as possible that systems are identical. For example, we need to know that the production environment, which we are getting ready to deploy a new version of an application into, is identical to the staging environment we have just completed successful testing in.
-This brings us to tainted resources. We must consider any system that has been modified from its well defined state to be tainted. This means that any local modification taints the resource. Further any tainted resource should be destroyed and rebuilt from a golden image. As you can see, basically every asset we run in the datacenter is by definition tainted and, if we are interested in reliability, should be destroyed and rebuilt from a golden image. We will talk more about golden images and the exact definition of tainted resources in a little bit.
+The idea of tainted resources will be quite new to those of you who have spent
+most of your careers lovingly hand-crafting artisan systems. That is to say,
+for a long time we in IT have set up systems by hand; sure, we sprinkle in a
+bit of bootstrapping automation and throw a little Puppet on top, but we still
+find ourselves tweaking things by hand in order to get optimal performance. By
+definition, these servers are tainted even before they get put into
+production, meaning there are customizations on the servers that are configured
+by hand, and not by automation.
+
+Tainted resources are bad because they are not repeatable. There is no way to
+guarantee one system will be identical to another. This may be a desirable
+trait in art; however, it is a potentially disastrous trait in computing.
We need to
+know with as near to 100 percent accuracy as possible that systems are
+identical. For example, we need to know that the production environment, which
+we are getting ready to deploy a new version of an application into, is
+identical to the staging environment we have just completed successful testing
+in.
+
+This brings us to tainted resources. We must consider any system that has been
+modified from its well-defined state to be tainted. This means that any local
+modification taints the resource. Further, any tainted resource should be
+destroyed and rebuilt from a golden image. As you can see, basically every
+asset we run in the datacenter is by definition tainted and, if we are
+interested in reliability, should be destroyed and rebuilt from a golden image.
+We will talk more about golden images and the exact definition of tainted
+resources in a little bit.

 #### Lack of package pinning

-This problem seems innocuous but is actually quite insidious. Without pinning all packages to a particular version it is not possible to know if an application will run without errors.

+This problem seems innocuous but is actually quite insidious. Without pinning
+all packages to a particular version, it is not possible to know if an
+application will run without errors.
If a developer is working with one set of
+package versions and the web servers are running a different set of package
+versions, it will eventually lead to errors. This goes one step further when
+working with web clusters. As seen in the example above, it is possible to
+have different versions installed on different servers within the same
+cluster. It therefore becomes quite important, in terms of reliability, to
+have some way to ensure that we are working with the same set of dependencies
+across every environment that an application will be deployed into.

 #### Untested changes to production systems

+While we have some dev and staging environments, these only cover a narrow set
+of circumstances. For example, we have a staging server to test a web
+application before promoting it to production. We do not, however, have a
+staging system for Puppet. This means that any changes made to Puppet code are
+tested in the production environment. In fact, I have taken down the majority
+of our Apache servers whilst making a change to our base Apache Puppet module.
Furthermore,
+we do not have a testing environment for our monitoring system, for our
+code deployment pipeline, and many of our underlying systems (DNS, DHCP, NTP,
+etc.) do not have staging environments at all. I do not think I need to go
+into the details of why this is an issue, but suffice it to say this has
+caused many outages that could have been avoided with a way to test changes
+before they went into production.

 #### Lack of isolation between applications

+This is a multi-part problem; however, the basic issue is simple. First, due
+to the use of web clusters, it is not currently possible to know which
+dependencies were installed for which application. This leads to a situation
+where upgrades are difficult, as it is not possible to know with any certainty
+that an update for one application will not break another application.
Second, most application
+servers are configured from a single Puppet module with little separation of
+dependencies. This leads to the same issue but on a wider scale; in other
+words, it is nearly impossible to ascertain whether upgrading a dependency for
+one application server will affect another application server. Both of these
+causes produce the same result: operators are hesitant to upgrade any
+dependencies for fear of causing more issues. This leads to a situation where
+systems are quite out of date and open to security vulnerabilities. Further,
+it creates a situation where our customers are often required to re-code their
+applications in order to use the old dependencies we have installed on
+production systems.

 #### Too many ways a system can be mutated

+There are many ways a system can be changed today. Puppet controls some of the
+system. Administrators can log on and make manual changes. Automated
+deployments of application code are, in many cases, not tracked or logged.
+Developers can manually deploy changes, including changing packages through
+language-specific package managers like pip or npm.
During security fire-drills
+administrators make changes in a willy-nilly fashion, often with little regard
+to testing. External security auditing tools, like Mig, can modify the state
+of a running system. The list goes on, but hopefully you understand the issue.
+With so many ways a system can be modified, troubleshooting becomes difficult.
+In fact, it is often difficult to know with any certainty that an application
+will run at all, let alone without errors.

 #### Puppetmasters ensure eventual consistency

+Puppet in the datacenter attempts to ensure eventual consistency of assets it
+is aware of. This leads to a number of issues.
When upgrading or adding a new
+package to a series of servers, there is a time when different systems have
+different versions. This is not handled in a controlled rollout, but in an
+ad hoc manner, which often leads to inconsistencies during this period. Puppet
+does not control the entire system; hence, there can be a lot of inconsistency
+between servers. When we look at updating a package through Puppet in the
+datacenter, the work-flow is: the user updates a file in a locally checked-out
+copy, the user pushes the changes to the version control system, the
+puppetmasters eventually update from a cron job (taking upwards of 15
+minutes), and individual servers update on a 30-minute schedule. This means
+that without sudo user intervention, the process, from the time the user
+checks in code to the time all servers are updated, can take over an hour.
+There are very few standards around Puppet, leading to messy and inconsistent
+Puppet modules, issues with Puppet version upgrades, and interoperability
+issues between various Linux distributions. Version pinning is optional, which
+leads to version mismatches between servers in clusters or environments.

 #### Puppet's inability to guarantee symmetry among systems

-This point can be illustrated by asking a simple question "Can you guarantee that two servers in your web cluster are identical?". The answer is "No". This is no fault of puppet, but rather that puppet does not control, or even know about, all of the packages and configurations installed on a given system. Puppet can only control the assets it is aware of, and that is typically a fraction of the packages and configurations that are actually on a system.

-#### A word on backups

-When was the last time you verified your backups. I do not mean that they are running on the correct schedule, I assume you have an alert for that. I am talking about actually using a backup to restore your system to verify it actually works.
I posit that without this test there is no way to know that a system can actually be recovered. I would further suggest that backups should be restored routinely and automatically. This brings me to my point. Any process that is not tested routinely can not be expected to work when it is needed. Any process that is not automated will eventually fail due to errors. There are several places where errors creep up during manual processes; the first and perhaps most obvious is simple human error (colloquially known as fat fingering a command), the second and perhaps most insidious error happens when something on the system is different from when the process was created or last tested. There are many many ways for things to change in the way we operate in the datacenter, as we have already discussed. +This point can be illustrated by asking a simple question "Can you guarantee +that two servers in your web cluster are identical?". The answer is "No". This +is no fault of puppet, but rather that puppet does not control, or even know +about, all of the packages and configurations installed on a given system. +Puppet can only control the assets it is aware of, and that is typically a +fraction of the packages and configurations that are actually on a system. -## User experiences -TODO: List some user experiences that exist today that we can improve upon. The idea here is to humanize the issues we are addressing. To reinforce the ideas presented above. (Might not be necessary???) - - Time to market for new account - - Time to enact small changes - - lack of transparency - - etc... +#### A word on backups -## Future Operating Model -At this point we should all have a general understanding of some areas in our current work-flows and processes that could stand some improvement. Quite some time ago I sat down with a few of my coworkers and we began to brainstorm around how we could improve on our current situation. 
In doing so we came up with some basic concepts and principles under which we should operate. +When was the last time you verified your backups. I do not mean that they are +running on the correct schedule, I assume you have an alert for that. I am +talking about actually using a backup to restore your system to verify it +actually works. I posit that without this test there is no way to know that a +system can actually be recovered. I would further suggest that backups should be +restored routinely and automatically. This brings me to my point. Any process +that is not tested routinely can not be expected to work when it is needed. Any +process that is not automated will eventually fail due to errors. There are +several places where errors creep up during manual processes; the first and +perhaps most obvious is simple human error (colloquially known as fat fingering +a command), the second and perhaps most insidious error happens when something +on the system is different from when the process was created or last tested. +There are many many ways for things to change in the way we operate in the +datacenter, as we have already discussed. -We have all of these ideas of these specific technological pieces that we want to improve upon, but it is not enough to simply say "make better" or improve upon. We instead need to be able to define what it means to make better or improve upon. We started by taking this list of things that did not work as well as we would like and tried to identify a work-flow or ideal system for each of these. It became clear from the start that we were talking about creating a system that was highly agile and automated. It was also clear that we had to get out of the way of the process. We needed to remove ourselves from the path of work as much as possible, in other words get rid of the "human API". In order to achieve this we have a need to provide, not only a high level of automation, but also a high level of self service. 
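The point above — that an untested restore is not really a backup — can be
sketched as a small Bash drill, the kind of utility script this project writes
in Bash. The backup directory layout and tarball naming here are hypothetical,
not taken from any Nubis tooling:

```shell
#!/bin/bash
# Hypothetical nightly restore drill: restore the newest archive from a backup
# directory into a scratch directory and verify the data can actually be read
# back. Paths and naming are illustrative only.
set -o errexit -o nounset -o pipefail

restore_drill() {
    local backup_dir=$1
    local scratch latest files
    scratch=$(mktemp -d)

    # Newest backup wins; having no backups at all is itself worth an alert.
    latest=$(ls -1t "${backup_dir}"/*.tar.gz 2>/dev/null | head -n 1) || true
    if [ -z "${latest}" ]; then
        echo "restore drill FAILED: no backups in ${backup_dir}" >&2
        rm -rf "${scratch}"
        return 1
    fi

    tar -xzf "${latest}" -C "${scratch}"

    # A restore only counts if the restored data is readable.
    files=$(find "${scratch}" -type f | wc -l)
    rm -rf "${scratch}"
    if [ "${files}" -eq 0 ]; then
        echo "restore drill FAILED: ${latest} restored no files" >&2
        return 1
    fi
    echo "restore drill passed: ${files} files restored from ${latest}"
}
```

Run from cron and wired into whatever alerting already watches the backup
schedule, a drill like this turns "we have backups" into "we have restores".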
## User experiences

TODO: List some user experiences that exist today that we can improve upon. The
idea here is to humanize the issues we are addressing and to reinforce the
ideas presented above. (Might not be necessary???)

* Time to market for a new account
* Time to enact small changes
* Lack of transparency
* Etc...
## Future Operating Model

At this point we should all have a general understanding of some areas in our
current work-flows and processes that could stand some improvement. Quite some
time ago I sat down with a few of my coworkers and we began to brainstorm
around how we could improve on our current situation. In doing so we came up
with some basic concepts and principles under which we should operate.

We have all of these ideas about specific technological pieces that we want to
improve upon, but it is not enough to simply say "make better" or "improve
upon". We instead need to define what it means to make better or improve upon.
We started by taking this list of things that did not work as well as we would
like and tried to identify a work-flow or ideal system for each of them. It
became clear from the start that we were talking about creating a system that
was highly agile and automated. It was also clear that we had to get out of the
way of the process. We needed to remove ourselves from the path of work as much
as possible, in other words get rid of the "human API". In order to achieve
this we need to provide not only a high level of automation, but also a high
level of self service. In other words, we need to empower our users to be able
to deploy their applications and systems without going through us. This is a
paradigm shift, a new way of thinking for us. This creates an opportunity for
us to reevaluate our role in delivering systems to our end users. It also
points out the need to think about standards and security right from the start.
There is a little process built in to this system, a sort of bargain or
agreement that we all have to buy into or accept, but as much as possible we
want to stay out of the process. We want to simply deliver an awesome platform
that the end users can develop their own processes around. We need to provide
them the tools so that they can build the processes they require, without
muddying the waters by adding our own process. Everything must, therefore, be
well documented, in as open a place as possible. This leads us to some basic
tenets of this new system:

* Everything must be well documented. Preferably, things should be self
  documenting.
* We should use open source, community built and maintained pieces wherever
  possible.
* Each piece we build should be designed to be reusable as well as open source.
* Everything should be developed in the open and be available for review,
  comment, pull-request, etc.

It is important to note that we are not trying to provide a datacenter
automation strategy. We are operating in a new arena, specifically the cloud.
Our management team has decided that we must limit ourselves to working solely
with the Amazon cloud offering, Amazon Web Services (AWS).
Put succinctly, we are trying to provide a framework for deploying applications
in a simple, self-service, automated, and repeatable way. Note here that we
consider everything that runs on a system to be an application, regardless of
its complexity or protocol. In other words, a simple web site is an
application. Additionally, a DNS server is an application, as is a monitoring
system or a metrics system. We make no distinction about what is deployed; they
are all treated equally.

This brings us to some high level ideals about what we are trying to
accomplish. These include:

* Delivering standards that users can design applications around
* Providing self service opportunities to our customers
* Providing automated provisioning, networking, firewalling, monitoring,
  backups, load balancing, autoscaling, etc
* Ensuring basic security through application isolation, SSL certificate
  automation, code review, InfoSec review (RRA), etc...
* Providing a comprehensive, analytical, trending monitoring suite covering the
  entire application stack

### Specific areas of improvement

Let's take a few minutes to dig into the details of some things we can do to
help alleviate some of our current headaches while providing some of the
benefits we were just discussing.
* [Automate all the things](#automate-all-the-things)
* [Built on cloud technology](#built-on-cloud-technology)
* [Provide self service opportunities](#provide-self-service-opportunities)
* [Create standards](#create-standards)
* [Treat datacenters as reusable components](#treat-datacenters-as-reusable-components)
* [Exterminate the Human API](#exterminate-the-human-api)
* [Use more community resources](#use-more-community-resources)
* [Revision everything](#revision-everything)
* [Transition work-flow to GitHub](#transition-work-flow-to-github)
* [Code Reviews](#code-reviews)
* [Provide Application isolation](#provide-application-isolation)
* [Provide a platform that can autoscale](#provide-a-platform-that-can-autoscale)
* [Bit for bit repeatable deployments](#bit-for-bit-repeatable-deployments)
* [Destroy Tainted resources](#destroy-tainted-resources)
* [Reduce time required to stand up a new application](#reduce-time-required-to-stand-up-a-new-application)
* [Provide analytical and trending monitoring for applications and systems](#provide-analytical-and-trending-monitoring-for-applications-and-systems)
* [Log / Audit everything](#log--audit-everything)
* [Provide transparency into web operations systems and deployment methodologies](#provide-transparency-into-web-operations-systems-and-deployment-methodologies)
* [Provide an open structure that enables us to better support the open web](#provide-an-open-structure-that-enables-us-to-better-support-the-open-web)
* [Provide a better customer experience](#provide-a-better-customer-experience)

#### Automate all the things
This is an important first step on our journey. We need to automate as many
things as possible. This removes many of the issues related to human error or
"fat fingering" commands. It also has the potential to greatly reduce time to
market for many of the services we offer, and it is a key milestone on the path
towards self service. When we automate things we open ourselves up to the
opportunity to put our substantial experience down in code so that our
customers can take advantage of it time and again.

A few of the things that we could automate include:

* Provisioning
* Networking
* Firewalling
* Monitoring
* Backups
* Load Balancing
* Autoscaling
* SSL Certificates
* Proxies
* Log Aggregation
* Systems DNS
* User management
* SSH Keys
* Jumphosts
* IP Allocation
* DHCP
* VPNs
* MTAs
* Databases
* Memcache

The list goes on and on...

#### Built on cloud technology
The idea behind building on top of existing cloud technology is that it gets us
out of the datacenter game. We are historically not very good at predicting our
datacenter footprint, which has led to some challenges. We find ourselves with
more than double the necessary footprint. This is compounded by the desire to
provide some level of high availability. Additionally, we get locked into
multi-year contracts with no room to adapt when our needs change. More than
that, however, is the idea that we can get at compute and other underlying
services more rapidly. While we have been using more and more VMs in the
datacenter, the cloud offers a number of additional services such as ready made
databases, virtual networking, zero capacity planning requirements, and so on.

#### Provide self service opportunities
In my opinion, the primary reason that we are losing our customers to other
departments, PaaS and SaaS providers, and so on, is that we are not providing
what our customers need in the time they need it. While providing them what
they need may be a larger endeavor, providing them services in a timely manner
is completely within our grasp. It starts with automation, as stated
previously, but that flows directly through to self service. This is the
ultimate convenience that our customers are gaining elsewhere. With few
exceptions, our customers are more interested in developing and running their
applications than they are in operating systems. If we can provide them with
completely configured, reliable and secure systems at the push of a button that
they do not have to maintain, I believe this will be quite attractive to them.

#### Create standards
This one is huge. We have a tradition of "cowboy ops". This method of working
served us well when we were a small organization, poorly staffed and more
focused on getting stuff done than on long term viability and maintainability.
However, this has led to a situation where nearly every system is its own
"snowflake": beautifully crafted and only maintainable by the artist that
created it. While this might make for a lovely museum installation, it does not
serve us well in today's increasingly agile world.

In order to remain relevant we need to embrace agile. We need to become more
agile as well. The only way we are going to be able to do that is if we accept
the fact that we need to start doing things in a standardized and repeatable
way, in a way that allows any technician to troubleshoot any part of the
system. We need to put ourselves in a position where we are not always required
to be "rock stars" who undertake herculean efforts at every turn. We should not
be striving to be despot cowboys, independent and lonely. We need to come
together and form a team. Working together as a team we will be able to
accomplish far more than we could ever dream of when working alone. Creating a
framework, or standards, that we can agree on is one of the first steps towards
that goal.

#### Treat datacenters as reusable components
This is a bit of a misnomer. We are actually moving away from the datacenter
model. I am using the word datacenter here to denote the set of core services
that every application needs, regardless of whether it lives in a physical
datacenter or in a deployment on some cloud technology. This core set of
services contains things like DHCP, DNS, NTP, proxies, firewalls and so on. The
point here is that this core set of services should be configured in a way that
we can reuse it over and over without spending the least bit of time on
configuration. These services are for the most part so simple and so well
understood that little time should be spent on them. We have more interesting
and more important things to spend our time on than configuring a DHCP server
for the millionth time.

#### Exterminate the Human API

Today, a customer files a bug in Bugzilla or ServiceNow and then waits around
for some person to pick up the request. Oftentimes the request requires little
more than firing off some scripts or making some small change to Puppet. The
issue arises from the fact that we are all quite busy, and therefore requests
often sit for days or weeks before any action is taken. Then, when action is
taken, it is often in the form of asking for additional information. This back
and forth, along with shifting priorities, often leads to substantial delays in
fulfilling even the simplest of requests. This Bugzilla (ServiceNow) driven
back and forth is what we lovingly refer to as the "Human API".

We need to get rid of the "Human API" for each and every request where it is
feasible to do so. This is made possible through both automation and self
service opportunities. This serves us in a number of ways. First, it frees us
up to spend more time on the things that actually matter. Second, it decreases
the turnaround time for our customers' simple requests, which in turn increases
their happiness (delight).

#### Use more community resources
This is truly a force multiplier. Sure, we can create everything from scratch;
we are crazy capable. The question is not one of ability but of time. I for one
would rather be working on the complex and interesting problems of today
instead of consistently reinventing the wheel. Practically speaking, we cannot
create everything from scratch. The real question is where we draw the line. It
is my measured opinion that we should use open source technology at every
opportunity. If there is a minor thing lacking, we should add it and contribute
it back to the open source community.

We have a habit here in IT of using community technologies whilst giving little
back. I think it is time that changed as well. I will mention that, for my
part, it is far more gratifying to contribute one small patch to an existing
open source project than to reinvent ten projects on my own. There is something
exciting and rewarding about knowing that your contribution will serve more
than just your needs.

As an example, we are taking advantage of [PuppetForge](https://forge.puppet.com)
for community puppet modules. This was not a resource that existed years ago
when we adopted Puppet in the datacenter. This exemplifies the difficulty in
changing direction and the cost of technical debt.

There does come a point when no tool exists that can do what we need to
accomplish. In such cases we can create our own open source project to fit the
need. This requires a lot of careful consideration, as there is a lot involved
in running a successful open source project. From making releases to keeping
things up to date to managing community contributions, it is a lot of work and
requires an investment in time. It is almost always better to find a project
that almost fits and submit patches.

#### Revision everything

If every part of the system is revisioned in some way, then it will always be
possible to recover to a known good state. That is the basis for using a
Version Control System (VCS) for deploying assets in the cloud. Now, there are
a number of additional benefits to using a VCS, but this point cannot be
overstated. It does not matter what changes are made to a system; so long as
you know when the last working state was, you can always revert.

#### Transition work-flow to GitHub

GitHub provides a number of advantages over our current Subversion based
work-flow. GitHub takes git as a VCS a step further. It makes it trivially
simple to create patches and collaborate on them. The forking and branching
methodologies that are inherent in git provide an opportunity to test code in a
safe way that does not risk changes to production systems. Adding on the open,
collaborative enhancements that GitHub provides gives us an easy to use, out of
the box experience that aligns quite well with agile methodologies.

#### Code Reviews
Code reviews are an excellent way to not only reduce human error but also to engender collaboration and teamwork.
-I have found great advantage in having my code reviewed. I have learned a number of new things, prevented a number of errors from reaching production, as well as developed closer relationships with my coworkers in the process.
+
+Utilizing the work-flow that git and GitHub provide we find ourselves in a place
+where code reviews become not only possible but quite simple. While we
+transition away from "cowboy ops" into a more mature working model we find
+ourselves needing to reduce the number of human errors, "fat fingering",
+wherever practical. Code reviews are one way in which we can gain a lot of
+benefit for little effort. Code reviews are common in most development shops
+today and, if it is not yet obvious, when working with cloud technologies, we
+are working entirely with code. This transition brings with it a number of
+opportunities to improve. Code reviews are an excellent way to not only reduce
+human error but also to engender collaboration and teamwork.
+
+I have found great advantage in having my code reviewed. I have learned a number
+of new things, prevented a number of errors from reaching production, as well as
+developed closer relationships with my coworkers in the process.

#### Provide Application isolation
-Working with cloud technologies provides us with another great opportunity. Namely we have the opportunity to deploy applications in their own little "datacenter" islands. This separation, or isolation, gives us a number of exciting advantages.
-First, applications are isolated from each other in terms of resource consumption. This means that if one application suddenly has an increase in traffic and begins consuming a large number of resources, it will not adversely affect any other applications.
This is true not only on the compute layer, but also on the database layer, the caching layers, the load balancing layers, and so forth.
-
-Second, application isolation gives us the ability to express explicitly what dependencies a particular application has. This enables us to describe precise versions which are specific to an application. It further provides us a safe and discrete way to upgrade dependency versions with out fear or risk to any other application.
-
-Application isolation in the cloud also provides us the ability to understand the cost associated with operating any given application. While this is less an operator need, management sure finds the information handy. Truly it is a very useful tool for the company in evaluating weather or not a particular application is worth operating at its current level or if it needs to be reevaluated.
+Working with cloud technologies provides us with another great opportunity.
+Namely we have the opportunity to deploy applications in their own little
+"datacenter" islands. This separation, or isolation, gives us a number of
+exciting advantages.
+
+First, applications are isolated from each other in terms of resource
+consumption. This means that if one application suddenly has an increase in
+traffic and begins consuming a large number of resources, it will not adversely
+affect any other applications. This is true not only on the compute layer, but
+also on the database layer, the caching layers, the load balancing layers, and
+so forth.
+
+Second, application isolation gives us the ability to express explicitly what
+dependencies a particular application has. This enables us to describe precise
+versions which are specific to an application. It further provides us a safe and
+discrete way to upgrade dependency versions without fear or risk to any other
+application.
+
+Application isolation in the cloud also provides us the ability to understand
+the cost associated with operating any given application.
While this is less an
+operator need, management sure finds the information handy. Truly it is a very
+useful tool for the company in evaluating whether or not a particular
+application is worth operating at its current level or if it needs to be
+reevaluated.

#### Provide a platform that can autoscale
-Autoscaling is a new concept when operating on cloud based technology. In AWS, the cloud we are currently deploying on, they offer basic vertical scaling. Given this ability it becomes possible to allow applications to right size themselves for their current load. This has multiple advantages.
-
-When an application suddenly comes under increased load, the underlying compute layer can scale up to accommodate the demand. This can occur regardless of where the increased load is coming from. It can be from an engagement campaign, or seemingly just as likely, form a developer pushing inefficient code.
-As soon as the event requiring increased resources has passed, the application compute layer can scale back down (in) which effectively reduces cost. This means that we no longer need to size resources for the "worst case" scenario. Rather we can size for normal load and rely on autoscaling to kick in when demand requires.
+
+Autoscaling is a new concept when operating on cloud based technology. In AWS,
+the cloud we are currently deploying on, they offer basic horizontal scaling.
+Given this ability it becomes possible to allow applications to right size
+themselves for their current load. This has multiple advantages.

-This is a form of automatic right sizing. When configured smartly, autoscaling can provide great benefits to both cost and reliability.

+When an application suddenly comes under increased load, the underlying compute
+layer can scale up to accommodate the demand. This can occur regardless of where
+the increased load is coming from. It can be from an engagement campaign, or
+seemingly just as likely, from a developer pushing inefficient code.
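The right-sizing idea, scale out under load and back in afterwards, can be sketched as a toy calculation. This is an illustration only, not Nubis code; the `desired_capacity` helper and all of its thresholds are invented for the example:

```shell
#!/usr/bin/env bash
# Toy sketch of autoscaling "right sizing": derive a desired instance
# count from current load, clamped between a minimum and a maximum.
# Every name and number here is invented for illustration.

desired_capacity() {
  local load=$1 per_instance=$2 min=$3 max=$4
  # One instance per 'per_instance' units of load, rounded up.
  local want=$(( (load + per_instance - 1) / per_instance ))
  if (( want < min )); then want=$min; fi   # never scale in below the floor
  if (( want > max )); then want=$max; fi   # never scale out past the ceiling
  echo "$want"
}

desired_capacity 50 25 2 10    # normal load   -> 2
desired_capacity 400 25 2 10   # traffic spike -> capped at 10
desired_capacity 10 25 2 10    # quiet period  -> floor of 2
```

In AWS this decision lives in scaling policies attached to an autoscaling group; the point of the sketch is only that normal load, not the worst case, sets the baseline size.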
-#### Bit for bit repeatable deployments -It is time now for us to dream the impossible dream. Imagine, if you will, a world in which we could deploy an application to production with 100 percent certainty that it would work. Not just start, but actually work as designed, right down to the smallest detail. I say to you, this dream can be reality. It starts with the concept of, Golden Images - -A golden image is simply a compute (VM, EC2, etc) resource image that does not change when it is put into operation. In other words, no packages are upgraded, no scripts run which mutate state, code is not updated, and so forth. By approaching images in this way we can guarantee that if we launch an instance using this image today, that tomorrow we can launch the same instance using the same image with the exact same results. - -Now if we take this "golden image" concept up a level, to include the entire deployment, we get to the idea of bit for bit repeatability. Meaning that every resource launched, from the databases to the web servers to the load balancers, are all based on golden images. +As soon as the event requiring increased resources has passed, the application +compute layer can scale back down (in) which effectively reduces cost. This +means that we no longer need to size resources for the "worst case" scenario. +Rather we can size for normal load and rely on autoscaling to kick in when +demand requires. -By deploying the entire environment this way we can have, say a staging environment in which we run tests to validate functionality. Then when we deploy the exact same, bit for bit, environment to production, we get to do so with a certainty that we have never been able to achieve before. +This is a form of automatic right sizing. When configured smartly, autoscaling +can provide great benefits to both cost and reliability. 
-While this presents some interesting challenges in application, it none-the-less provides us with a degree of confidence that is unprecedented. It enables us to contemplate approaching things in new ways. For example, we could consider destructive, failure based, testing in a staging environment. We can deploy test branches, try out new concepts, all the while confident that if it works in our test environment it will work, with certainty, in production. +#### Bit for bit repeatable deployments -This may very well mark the beginning of a new era of enlightenment in IT... What, am I overselling it a bit? Well okay, perhaps a bit, but still you have to admit, this is really cool. +It is time now for us to dream the impossible dream. Imagine, if you will, a +world in which we could deploy an application to production with 100 percent +certainty that it would work. Not just start, but actually work as designed, +right down to the smallest detail. I say to you, this dream can be reality. It +starts with the concept of, Golden Images + +A golden image is simply a compute (VM, EC2, etc) resource image that does not +change when it is put into operation. In other words, no packages are upgraded, +no scripts run which mutate state, code is not updated, and so forth. By +approaching images in this way we can guarantee that if we launch an instance +using this image today, that tomorrow we can launch the same instance using the +same image with the exact same results. + +Now if we take this "golden image" concept up a level, to include the entire +deployment, we get to the idea of bit for bit repeatability. Meaning that every +resource launched, from the databases to the web servers to the load balancers, +are all based on golden images. + +By deploying the entire environment this way we can have, say a staging +environment in which we run tests to validate functionality. 
Then when we deploy +the exact same, bit for bit, environment to production, we get to do so with a +certainty that we have never been able to achieve before. + +While this presents some interesting challenges in application, it none-the-less +provides us with a degree of confidence that is unprecedented. It enables us to +contemplate approaching things in new ways. For example, we could consider +destructive, failure based, testing in a staging environment. We can deploy test +branches, try out new concepts, all the while confident that if it works in our +test environment it will work, with certainty, in production. + +This may very well mark the beginning of a new era of enlightenment in IT... +What, am I overselling it a bit? Well okay, perhaps a bit, but still you have +to admit, this is really cool. #### Destroy Tainted resources -This brings us right along to the concept of tainted images. A tainted image is any image that has differed from its golden image. While that sounds simple enough, I understand there is a bit of confusion around what exactly constitutes a tainted image. In other words, what exactly makes an image tainted. Simply stated, any configuration or system level change that would persist across a reboot taints a system. - -I think it is simpler to explain what changes do not constitute tainting an image. Things like running state, meaning changes to running memory (ram). Likewise changes to the '/tmp' file-system. Mounting a network attached storage volume. Any thing that is a transient state that does not persist over a reboot can generally be considered non-tainting. - -Conversely, anything that would persist over a reboot necessarily taints the resource. For example, upgrading a package would taint the resource. Locally modifying a configuration file or updating application code modifies the system in a persistent way and therefore taints the resource. 
-You may be asking yourself, "If I can't modify a configuration file, how do I configure my application?". Excellent question. I am glad you asked. Allow me to answer that by way of an example. Take database configuration. We understand that we need a username and password to connect to our database. Lets assume that we do not know this when we create our golden image. We simply make those pieces of configuration inputs that we pass to the golden image on boot. This way the configuration file, with variables in place, does not change on boot and the instance simply exposes the variables to the application on startup. We will see specific examples a little later on.
-
-The key takeaway when considering tainted resources is this. You can never trust a tainted resource to be 100 percent identical to a known good image. Therefore it is wise to terminate any suspected resource and allow autoscaling to replace it with a fresh, known good, golden image.
+This brings us right along to the concept of tainted images. A tainted image is
+any image that has differed from its golden image. While that sounds simple
+enough, I understand there is a bit of confusion around what exactly constitutes
+a tainted image. In other words, what exactly makes an image tainted. Simply
+stated, any configuration or system level change that would persist across a
+reboot taints a system.
+
+I think it is simpler to explain what changes do not constitute tainting an
+image. Things like running state, meaning changes to running memory (RAM).
+Likewise changes to the '/tmp' file-system. Mounting a network attached storage
+volume. Anything that is a transient state that does not persist over a reboot
+can generally be considered non-tainting.
+
+Conversely, anything that would persist over a reboot necessarily taints the
+resource. For example, upgrading a package would taint the resource.
Locally
+modifying a configuration file or updating application code modifies the system
+in a persistent way and therefore taints the resource.
+
+You may be asking yourself, "If I can't modify a configuration file, how do I
+configure my application?" Excellent question. I am glad you asked. Allow me
+to answer that by way of an example. Take database configuration. We understand
+that we need a username and password to connect to our database. Let's assume
+that we do not know this when we create our golden image. We simply make those
+pieces of configuration into inputs that we pass to the golden image on boot.
+This way the configuration file, with variables in place, does not change on
+boot and the instance simply exposes the variables to the application on
+startup. We will see specific examples a little later on.
+
+The key takeaway when considering tainted resources is this. You can never trust
+a tainted resource to be 100 percent identical to a known good image. Therefore
+it is wise to terminate any suspected resource and allow autoscaling to replace
+it with a fresh, known good, golden image.

#### Reduce time required to stand up a new application
-When we start to consider these concepts as a whole, we start to see how we could add them together to advantage. When we take automation together with golden images, we begin to get to a point where speed of deployments along with the level of confidence we can achieve, combine to give us something special. We can start to see that we can reduce time-to-market by a wide margin. When we sprinkle in a bit of self service, our turn around time starts to diminish as well.
-By taking all of that together with some agreed upon standards of practice, we begin to arrive at a place where we can stop spending cycles configuring the same old Apache server and start fine tuning automation to deliver instead.
When we start to automate ourselves out of the mundane tasks, we not only free up our time to focus on more interesting challenges, we also provide a better experience to our customers.
+
+When we start to consider these concepts as a whole, we start to see how we
+could add them together to advantage. When we take automation together with
+golden images, we begin to get to a point where speed of deployments along with
+the level of confidence we can achieve, combine to give us something special.
+We can start to see that we can reduce time-to-market by a wide margin. When we
+sprinkle in a bit of self-service, our turnaround time starts to diminish as
+well.
+
+By taking all of that together with some agreed upon standards of practice, we
+begin to arrive at a place where we can stop spending cycles configuring the
+same old Apache server and start fine-tuning automation to deliver instead. When
+we start to automate ourselves out of the mundane tasks, we not only free up our
+time to focus on more interesting challenges, we also provide a better
+experience to our customers.

#### Provide analytical and trending monitoring for applications and systems
-As we start to become more hands off with basic deployments, we discover a need to really monitor the things that matter. While monitoring our artisan servers, we would look for things like load going beyond some number. We would monitor things like XX gigs of free disk space. This worked well because we were basing our monitoring on hardware that we specked, purchased and installed. Hardware that did not change often.
-
-It the cloud we can no longer monitor in this manner. The underlying resources, hardware if you will, changes so rapidly that our monitors become out of date almost as soon as we have them in place. Instead of monitoring the quantity of disk space, we need to monitor for the amount of time until a disk fills up.
Instead of monitoring load on a single, or cluster of servers, we need to monitor unexpected changes in the resources required to accomplish a given task.
-
-Further, it is not sufficient to understand if the underlying resources are adequate and performing up to requirements. It does us absolutely no good to know that we have a perfectly happy and stable web server if all it is serving up are 404s. We need to get to the point of functional monitoring.
-Functional monitoring is not simply understanding the health of the underlying systems, although that is a part of it, but rather understanding the health of the entire application. We need to start to understand the performance of the application itself. For example, search latency trends over time, or responsiveness over time. We need to understand if form submissions are actually making it to the database. In essence, we need to instrument the application based on its intended function and then understand if the underlying resources are adequate to the task.
-
-Given that we now have application isolation and bit for bit repeatability, we are able to open the door to understanding the applications we are running. We can no longer be satisfied by simply stating "The server is up.". We must strive to be able to say "The application is operating flawlessly!".
+As we start to become more hands off with basic deployments, we discover a need
+to really monitor the things that matter. While monitoring our artisan servers,
+we would look for things like load going beyond some number. We would monitor
+things like XX gigs of free disk space. This worked well because we were basing
+our monitoring on hardware that we specced, purchased and installed. Hardware
+that did not change often.
+
+In the cloud we can no longer monitor in this manner. The underlying resources,
+hardware if you will, change so rapidly that our monitors become out of date
+almost as soon as we have them in place.
Instead of monitoring the quantity of
+disk space, we need to monitor for the amount of time until a disk fills up.
+Instead of monitoring load on a single, or cluster of servers, we need to
+monitor unexpected changes in the resources required to accomplish a given task.
+
+Further, it is not sufficient to understand if the underlying resources are
+adequate and performing up to requirements. It does us absolutely no good to
+know that we have a perfectly happy and stable web server if all it is serving
+up are 404s. We need to get to the point of functional monitoring.
+
+Functional monitoring is not simply understanding the health of the underlying
+systems, although that is a part of it, but rather understanding the health of
+the entire application. We need to start to understand the performance of the
+application itself. For example, search latency trends over time, or
+responsiveness over time. We need to understand if form submissions are actually
+making it to the database. In essence, we need to instrument the application
+based on its intended function and then understand if the underlying resources
+are adequate to the task.
+
+Given that we now have application isolation and bit for bit repeatability, we
+are able to open the door to understanding the applications we are running. We
+can no longer be satisfied by simply stating "The server is up". We must strive
+to be able to say "The application is operating flawlessly!"

#### Log / Audit everything
-We collect a lot of logs. Most of them never leave the host system on which they were generated. The few systems that are set up to aggregate logs, generally only process a fraction of the logs they receive. Then the ones that are processed are only available to a few people and getting at any meaningful insights is difficult at best.
-
-The shame of this is that it is not difficult to make the situation better.
There are a host of open source tools available for aggregating, parsing, map reducing, and making search-able logs of all sorts. We simply need to take advantage of what is already available to us.
-
-Simply aggregating logs and making them available for parsing, opens up a world of insights. These insights not only help us to troubleshoot issues, but help us to predict issues before they become issues. Further this could be an invaluable resource to our customers when it comes to tracking down issues with their applications or simply understanding why something is operating in the way that it is.
-Understanding our logs is more critical in the cloud. Resources are transient and often autoscale out of existence before we know there is an issue. It is commonplace for numerous scaling events to take place during the time it takes to investigate an issue. Therefore it is more critical in the cloud than in a datacenter to aggregate logs in a persistent manner. It is obvious that logs need to be retained in a persistent manner, but what is less obvious it the need to be able to get at the data within the logs in a rapid way that lends itself to insights. It is far to cumbersome to download and grep through gigs upon gigs of log entries, this simply does not scale. We need to have a system in place that makes this sort of investigation easy and fast.
+We collect a lot of logs. Most of them never leave the host system on which they
+were generated. The few systems that are set up to aggregate logs, generally
+only process a fraction of the logs they receive. Then the ones that are
+processed are only available to a few people and getting at any meaningful
+insights is difficult at best.
+
+The shame of this is that it is not difficult to make the situation better.
+There are a host of open source tools available for aggregating, parsing, map
+reducing, and making searchable logs of all sorts. We simply need to take
+advantage of what is already available to us.
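As a tiny taste of what parsed, searchable logs buy us, standard tools can already turn raw lines into a summary. The access-log lines below are fabricated for the illustration:

```shell
#!/usr/bin/env bash
# Tally responses by HTTP status code from a few fabricated access-log
# lines - the kind of one-line insight aggregated logs make cheap.

logs='10.0.0.1 GET /index.html 200
10.0.0.2 GET /missing 404
10.0.0.3 GET /index.html 200
10.0.0.4 POST /submit 500
10.0.0.5 GET /missing 404
10.0.0.6 GET /missing 404'

# Field 4 is the status code; count occurrences of each.
echo "$logs" | awk '{ count[$4]++ } END { for (c in count) print c, count[c] }' | sort
```

Six raw lines become three counts (two 200s, three 404s, one 500); scaled up across a fleet, the same idea is what the aggregation tooling automates.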
+
+Simply aggregating logs and making them available for parsing, opens up a world
+of insights. These insights not only help us to troubleshoot issues, but help
+us to predict issues before they become issues. Further this could be an
+invaluable resource to our customers when it comes to tracking down issues with
+their applications or simply understanding why something is operating in the way
+that it is.
+
+Understanding our logs is more critical in the cloud. Resources are transient
+and often autoscale out of existence before we know there is an issue. It is
+commonplace for numerous scaling events to take place during the time it takes
+to investigate an issue. Therefore it is more critical in the cloud than in a
+datacenter to aggregate logs in a persistent manner. It is obvious that logs
+need to be retained in a persistent manner, but what is less obvious is the need
+to be able to get at the data within the logs in a rapid way that lends itself
+to insights. It is far too cumbersome to download and grep through gigs upon
+gigs of log entries; this simply does not scale. We need to have a system in
+place that makes this sort of investigation easy and fast.

#### Provide transparency into web operations systems and deployment methodologies
-One of the more difficult issues facing us in the datacenter is balancing the need for security with the need to access systems. We have generally erred on the side of security and completely lock our customers out. This creates a situation where they have no insight into the systems running their applications. This in turn leads to the situation where the customer is developing blind. As a result they ofter build applications that do not run, or are plagued with errors, on the systems we provide.
-This demonstrates the need for increased transparency at a minimum.
Beyond that, I suggest that we have an opportunity to collaborate with our customers, early in the process, to provide them with systems that meet their needs. At the same time we can achieve a high level of security and reliability. It is a challenge to be sure, but one we can solve. - -By placing all of the cloud configuration code in git, we are providing the customer insights into their deployment. We are also offering them the ability to collaborate on their deployment. This in conjunction with using open source technologies grants our customers hitherto unknown insights. - -In addition to the above we should document everything possible in an open and collaborative environment. We should not place any documentation in a close walled garden. Everything from design documentation to HOWTOs should be public. This, of course, necessitates care so as to not expose any secrets. We should continuously strive to be more open and more transparent. - -#### Provide an open structure that enables us to better support the open web and the Mozilla community -By now, we have some ideas based on open source technologies, documented in a public way. We can enable bit for bit repeatable deployments with security and reliability at the forefront of the design. We have created an environment in which we can work in the open while collaborating with our customers. - -This will enable us, practically for the first time, to be able to contribute to the open source community as an organization. This is more than an open source first stance, this is an open source collaborative stance. I truly believe that when we all pull in the same direction and work together, we can be more successful than when we work as lone cowboys. - -This not only serves our own ends, but also our customers and finally helps to foster a more cohesive Mozilla community. +One of the more difficult issues facing us in the datacenter is balancing the +need for security with the need to access systems. 
We have generally erred on
+the side of security and completely locked our customers out. This creates a
+situation where they have no insight into the systems running their
+applications. This in turn leads to the situation where the customer is
+developing blind. As a result they often build applications that do not run,
+or are plagued with errors, on the systems we provide.
+
+This demonstrates the need for increased transparency at a minimum. Beyond that,
+I suggest that we have an opportunity to collaborate with our customers, early
+in the process, to provide them with systems that meet their needs. At the same
+time we can achieve a high level of security and reliability. It is a challenge
+to be sure, but one we can solve.
+
+By placing all of the cloud configuration code in git, we are providing the
+customer insights into their deployment. We are also offering them the ability
+to collaborate on their deployment. This in conjunction with using open source
+technologies grants our customers hitherto unknown insights.
+
+In addition to the above we should document everything possible in an open and
+collaborative environment. We should not place any documentation in a closed
+walled garden. Everything from design documentation to HOWTOs should be public.
+This, of course, necessitates care so as to not expose any secrets. We should
+continuously strive to be more open and more transparent.
+
+#### Provide an open structure that enables us to better support the open web
+
+By now, we have some ideas based on open source technologies, documented in a
+public way. We can enable bit for bit repeatable deployments with security and
+reliability at the forefront of the design. We have created an environment in
+which we can work in the open while collaborating with our customers.
+
+This will enable us, practically for the first time, to be able to contribute to
+the open source community as an organization.
This is more than an open source
+first stance; this is an open source collaborative stance. I truly believe that
+when we all pull in the same direction and work together, we can be more
+successful than when we work as lone cowboys.
+
+This not only serves our own ends, but also our customers and finally helps to
+foster a more cohesive Mozilla community.

#### Provide a better customer experience
-All of this adds up to a better customer experience. As I said in the opening, we are facing some unprecedented challenges in IT today. I firmly believe that if we accept that the way we have been operating is in some ways flawed and that there is room for improvement, we can truly make things better.
-At the end of the day we work in a services industry. We must serve the needs of our customers first. We must move away from the "IT knows best" mentality and into one of collaboration and inclusiveness. In this way and by adopting the types of practices I have outlined here, we can get better. We can win back our customers, and I dare say, we can have fun doing it.
+
+All of this adds up to a better customer experience. As I said in the opening,
+we are facing some unprecedented challenges in IT today. I firmly believe that
+if we accept that the way we have been operating is in some ways flawed and that
+there is room for improvement, we can truly make things better.
+
+At the end of the day we work in a services industry. We must serve the needs of
+our customers first. We must move away from the "IT knows best" mentality and
+into one of collaboration and inclusiveness. In this way and by adopting the
+types of practices I have outlined here, we can get better. We can win back our
+customers, and I dare say, we can have fun doing it.
diff --git a/training/labs/nubis_dpaste.md b/training/labs/nubis_dpaste.md
index f50fd96..27f0afa 100644
--- a/training/labs/nubis_dpaste.md
+++ b/training/labs/nubis_dpaste.md
@@ -1,26 +1,41 @@
-# Nubis-Dpaste Working Lab
-In this lab we will walk through cloning the nubis-dpaste repository, making a change and submitting a pull request.
+
+
+# Nubis-Dpaste Working Lab
+
+In this lab we will walk through cloning the nubis-dpaste repository, making a
+change and submitting a pull request.

## Fork The nubis-dpaste Repository On GitHub
-Head over the the [nubis-dpaste](https://github.com/Nubisproject/nubis-dpaste) repository on GitHub and click the fork button.
+
+Head over to the [nubis-dpaste](https://github.com/Nubisproject/nubis-dpaste)
+repository on GitHub and click the fork button.

![GitHub Fork](../media/labs/nubis-dpaste-lab/github_fork.png "GitHub Fork")

## Clone The Repository Locally
-Next we will clone the repository locally. You can copy the url from github and paste it into your terminal.
+
+Next we will clone the repository locally. You can copy the URL from GitHub and
+paste it into your terminal.

![GitHub Clone](../media/labs/nubis-dpaste-lab/github_clone.png "GitHub Clone")

```bash
+
git clone git@github.com:nubisproject/nubis-dpaste.git
cd nubis-dpaste
+
```

## Make some local changes
-Here we will make a local change. Lets say that the ```apache2ctl graceful``` command is not sufficent to apply our changes. Lets change that to ```apache2ctl restart```.
+
+Here we will make a local change. Let's say that the ```apache2ctl graceful```
+command is not sufficient to apply our changes. Let's change that
+to ```apache2ctl restart```.
```bash + vi nubis/puppet/files/update + ``` Change ```apache2ctl graceful``` @@ -32,97 +47,155 @@ To ```apache2ctl restart``` ![update_edit_restart](../media/labs/nubis-dpaste-lab/update_edit_restart.png "update_edit_restart") ## Make a Pull Request -Now we want to get our changes accepted upstream so we can get this change into production. We will use a series of git commands to check in our change and submit a pull request. + +Now we want to get our changes accepted upstream so we can get this change into +production. We will use a series of git commands to check in our change and +submit a pull request. ### Git status + Let's check the changes that git is aware of. ```bash + git status + ``` ![git_status](../media/labs/nubis-dpaste-lab/git_status.png "git_status") ### Git diff -If we would like to double check that we are not including anything unintended in the commit we can look at the diff. + +If we would like to double-check that we are not including anything unintended +in the commit we can look at the diff. ```bash + git diff + ``` ![git_diff](../media/labs/nubis-dpaste-lab/git_diff.png "git_diff") ### Git add + The next step is to add the changes we want to include. ```bash + git add nubis/puppet/files/update + ``` ### Git status -Lets double check what git is going to include in our commit. We can see here that git is only going to include the change we made in this commit. + +Let's double-check what git is going to include in our commit. We can see here +that git is only going to include the change we made in this commit. ```bash + git status + ``` ![git_status_two](../media/labs/nubis-dpaste-lab/git_status_two.png "git_status_two") ### Git commit -Now that we have added the changed file and verified what we are committing it is time to commit the changes.
While it is possible to add a message to the commit command, I encourage you to always add the message through your editor as it gives you one last chance to ensure you are committing what you intended. + +Now that we have added the changed file and verified what we are committing it +is time to commit the changes. While it is possible to add a message to the +commit command, I encourage you to always add the message through your editor +as it gives you one last chance to ensure you are committing what you intended. Go ahead and add a good commit message like ```Change apache graceful to restart``` + ```bash + git commit + ``` ![git_commit](../media/labs/nubis-dpaste-lab/git_commit.png "git_commit") ### Git push -Finally it is time to push our local commit to our remote repository. Since we cloned this from our user space on GitHub this will push our changes to the repository hosted there. + +Finally it is time to push our local commit to our remote repository. Since we +cloned this from our user space on GitHub this will push our changes to the +repository hosted there. ```bash + git push + ``` ![git_push](../media/labs/nubis-dpaste-lab/git_push.png "git_push") ### Create a Pull Request + Now we can head over to GitHub and make a pull request. #### Load GitHub in your browser -Start by going to your ```nubis-dpaste``` repository on GitHub. For me the url is https://github.com/$GIT_USERNAME/nubis-dpaste, however you will need to change ```$GIT_USERNAME``` to your GitHub user name. + +Start by going to your ```nubis-dpaste``` repository on GitHub. For me the url +is [https://github.com/$GIT_USERNAME/nubis-dpaste](https://github.com/$GIT_USERNAME/nubis-dpaste), +however you will need to change ```$GIT_USERNAME``` to your GitHub user name. #### GitHub New Pull Request + Click on the ```New Pull Request``` button. 
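The local half of the workflow above (status, diff, add, commit) can be rehearsed end to end in a throwaway repository before touching the real one. This is a sketch assuming only a working ```git``` install; the repository and identity below are hypothetical, though the file path mirrors the lab:

```bash
# Rehearse the status/add/commit cycle in a disposable repository.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"  # identity for this demo repo only
git config user.name "Demo User"
mkdir -p nubis/puppet/files
echo 'apache2ctl restart' > nubis/puppet/files/update
git status --short                        # the new path shows as untracked
git add nubis/puppet/files/update
git status --short                        # the change is now staged
git commit -q -m "Change apache graceful to restart"
git log --oneline -1                      # the single rehearsal commit
```

Only ```git push``` is missing from the rehearsal, since it needs a remote; everything else behaves exactly as in the lab.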
-![github_new_pull_request](../media/labs/nubis-dpaste-lab/github_new_pull_request.png "github_new_pull_request") +![github_new_pr](../media/labs/nubis-dpaste-lab/github_new_pull_request.png "github_new_pr") #### GitHub Create Pull Request -This is your final change to review your changes before creating the pull request. You can review all of the changes to all of the files that are going to be included. Once you have ensured that everything looks good, go ahead and click on the ```Create Pull Request``` button. -![github_create_pull_request](../media/labs/nubis-dpaste-lab/github_create_pull_request.png "github_create_pull_request") +This is your final chance to review your changes before creating the pull +request. You can review all of the changes to all of the files that are going to +be included. Once you have ensured that everything looks good, go ahead and +click on the ```Create Pull Request``` button. + +![git_create_pr](../media/labs/nubis-dpaste-lab/github_create_pull_request.png "git_create_pr") ### Code Review -Once that is complete, anyone with admin privileges on the original repository will be able to merge your pull request. At this point they may; make comments and ask you to make changes, reject the pull request, or simply ```r+``` and merge in your changes. -![github_final_pull_request](../media/labs/nubis-dpaste-lab/github_final_pull_request.png "github_final_pull_request") +Once that is complete, anyone with admin privileges on the original repository +will be able to merge your pull request. At this point they may: make comments +and ask you to make changes, reject the pull request, or simply ```r+``` and +merge in your changes. + +![github_final_pr](../media/labs/nubis-dpaste-lab/github_final_pull_request.png "git_final_pr") ## Clean up -Since this is just an example and we do not want to actually make this change, lets go ahead and reset to a clean state. -This will reset your repository to the state it was in before the last commit.
If you wish to remove the commit but leave the changed files in a ```"Changes to be committed"``` state you can use the ```--soft``` option in place of ```--hard```. The ```HEAD~1``` portion tells git to go back one commit from the tip of the repository. +Since this is just an example and we do not want to actually make this change, +let's go ahead and reset to a clean state. + +This will reset your repository to the state it was in before the last commit. +If you wish to remove the commit but leave the changed files in +a ```"Changes to be committed"``` state you can use the ```--soft``` option in +place of ```--hard```. The ```HEAD~1``` portion tells git to go back one commit +from the tip of the repository. ```bash + git reset --hard HEAD~1 + ``` Finally, go ahead and push this up to your fork on GitHub. -**NOTE:** You do not want to normally remove commits from a public repository. This is especially important if anyone has forked your repository. In this case it should be safe since no one should have forked your repository yet. **Use Caution** +**NOTE:** You do not normally want to remove commits from a public repository. +This is especially important if anyone has forked your repository. In this case +it should be safe since no one should have forked your repository yet. +**Use Caution** + ```bash + git push -f + ``` ## FIN -That is the end of this lab. If you have any questions do not hesitate to reach out to us on ```irc.mozilla.org #nubis-users```. + +That is the end of this lab. If you have any questions do not hesitate to reach +out to us on ```irc.mozilla.org #nubis-users```. diff --git a/training/labs/nubis_skel.md b/training/labs/nubis_skel.md index 114dcf5..ab2537a 100644 --- a/training/labs/nubis_skel.md +++ b/training/labs/nubis_skel.md @@ -1,30 +1,52 @@ -# Nubis-Skel Working Lab -In this lab we will walk through obtaining a release from the nubis-skel repository, building an AMI and deploying it into AWS.
We will then modify the application code, rebuilding and redeploying the application. + + +# Nubis-Skel Working Lab + +In this lab we will walk through obtaining a release from the nubis-skel +repository, building an AMI and deploying it into AWS. We will then modify the +application code, rebuilding and redeploying the application. ## Get the code -Next grab the latest [release](https://github.com/nubisproject/nubis-skel/releases/latest), extract it and copy the *nubis* directory into your code base. + +First, grab the latest [release](https://github.com/nubisproject/nubis-skel/releases/latest), +extract it and copy the *nubis* directory into your code base. ```bash + wget https://github.com/nubisproject/nubis-skel/archive/v1.2.2-training.tar.gz tar -xvf v1.2.2-training.tar.gz cd nubis-skel-1.2.2-training/ + ``` + ## Change the name of the application -Here we will change the name of the application. When you are deploying your own application you should change this to what makes sense. For the sake of this lab, you should use some form of your user name. + +Here we will change the name of the application. When you are deploying your own +application you should change this to what makes sense. For the sake of this +lab, you should use some form of your user name. **NOTE** Do not use ```myapp``` here, it will cause a build error -**NOTE2** Do not exceed 12 characters for the app name (ask me why if you are interested) +**NOTE2** Do not exceed 12 characters for the app name (ask me why if you are +interested) ```bash + grep skel * -rl | xargs perl -pi -e's/skel/myapp/g' + ``` ## Build The Application AMI -It is time to build a new AMI. We will use [nubis-builder](https://github.com/Nubisproject/nubis-builder) to do this. You should already have installed nubis-builder by following the instructions in the [prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) document. + +It is time to build a new AMI.
We will use [nubis-builder](https://github.com/Nubisproject/nubis-builder) +to do this. You should already have installed nubis-builder by following the +instructions in the [prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) +document. ```bash + nubis-builder build + ``` You should see something like this: @@ -32,34 +54,54 @@ You should see something like this: ![nubis-builder](../media/labs/nubis-skel-lab/nubis_builder.png "nubis-builder") ### Capture The AMI ID -You will need to record the AMI ID for ubuntu in us-west-2 from the nubis-builder outputs. -In this example that would be ```us-west-2: ami-d5c31fb5```, however your AMI ID will be different. +You will need to record the AMI ID for Ubuntu in us-west-2 from the +nubis-builder outputs. + +In this example that would be ```us-west-2: ami-d5c31fb5```, however your AMI ID +will be different. ![nubis-builder-amis](../media/labs/nubis-skel-lab/nubis_builder_amis.png "nubis-builder-amis") ## Deploy With Terraform -Now that we have built the AMI we can use it to deploy into AWS using Terraform. Again this tool should have been installed by following the [prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) document. -All of the terraform commands should be run from the ```nubis/terraform``` directory. We will also set some variables for convenience. +Now that we have built the AMI we can use it to deploy into AWS using Terraform. +Again this tool should have been installed by following the [prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) +document. -**NOTE:** If your POSIX login is different from your local user name you may need to replace the ```$USER``` variables below with your POSIX user name. Do not use your entire LDAP login here. Say your LDAP is ```me@mozilla.com```, you would use ```me```. +All of the terraform commands should be run from the ```nubis/terraform``` +directory.
We will also set some variables for convenience. -**NOTE-2:** You need to use the AMI ID from the ```nubis-builder``` outputs to include it in the ```AMI_ID``` variable here. +**NOTE:** If your POSIX login is different from your local user name you may +need to replace the ```$USER``` variables below with your POSIX user name. Do +not use your entire LDAP login here. Say your LDAP is ```me@mozilla.com```, +you would use ```me```. -**NOTE-3:** Make sure to edit ```SSH_KEY_FILE``` to point to a valid ssh public key. +**NOTE-2:** You need to use the AMI ID from the ```nubis-builder``` outputs to +include it in the ```AMI_ID``` variable here. + +**NOTE-3:** Make sure to edit ```SSH_KEY_FILE``` to point to a valid SSH public +key. ```bash + cd nubis/terraform -export ACCOUNT_NAME="nubis-training-2016" USER_LOGIN="$USER" SSH_KEY_NAME="$USER-skel" SSH_KEY_FILE="~/.ssh/XXX.pub" AMI_ID="ami-XXX" +export ACCOUNT_NAME="nubis-training-2016" USER_LOGIN="$USER" \ +SSH_KEY_NAME="$USER-skel" SSH_KEY_FILE="~/.ssh/XXX.pub" AMI_ID="ami-XXX" + ``` ## Configure The Deployment -In this step we will create our ```terraform.tfvars``` file. There is an example ```terraform.tfvars-dist``` file that you can copy and edit or you can run the following commands. -**NOTE:** For all of the copy-paste examples, some users have reported having to remove the curly brackets ```{ }```. +In this step we will create our ```terraform.tfvars``` file. There is an +example ```terraform.tfvars-dist``` file that you can copy and edit or you can +run the following commands. + +**NOTE:** For all of the copy-paste examples, some users have reported having to +remove the curly brackets ```{ }```. ```bash + cat <<EOF > terraform.tfvars account = "${ACCOUNT_NAME}" region = "us-west-2" @@ -72,10 +114,14 @@ EOF ``` ### Get Terraform Modules -The first step will be to grab the terraform modules that we use to deploy the application.
+ +The first step will be to grab the terraform modules that we use to deploy the +application. ```bash + terraform get -update=true + ``` You should see something like this: @@ -83,10 +129,17 @@ You should see something like this: ![terraform_get](../media/labs/nubis-skel-lab/terraform_get.png "terraform_get") ### Plan The Deployment -Next we will run a ```terraform plan```. This will show us all of the resources that we are about to create in AWS. We will be using the aws-vault tool which you installed and configured when following the instructions in the [prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) document. + +Next we will run a ```terraform plan```. This will show us all of the resources +that we are about to create in AWS. We will be using the aws-vault tool which +you installed and configured when following the instructions in the +[prerequisites](https://github.com/Nubisproject/nubis-docs/blob/master/PREREQUISITES.md) +document. ```bash + aws-vault exec ${ACCOUNT_NAME}-admin -- terraform plan -var ami=${AMI_ID} + ``` You should see something like this: @@ -94,82 +147,131 @@ You should see something like this: ![terraform_plan](../media/labs/nubis-skel-lab/terraform_plan.png "terraform_plan") ### Apply the deployment -Finally it is time to deploy our application into AWS. We do that by running a ```terraform apply```. + +Finally it is time to deploy our application into AWS. We do that by running +a ```terraform apply```. ```bash + aws-vault exec ${ACCOUNT_NAME}-admin -- terraform apply -var ami=${AMI_ID} + ``` You should see something like this: ![terraform_apply](../media/labs/nubis-skel-lab/terraform_apply.png "terraform_apply") -If you get an error similar to this one you will need to edit yout ```terraform.tfvars``` file and shorten the ```service_name```. +If you get an error similar to this one you will need to edit +your ```terraform.tfvars``` file and shorten the ```service_name```. 
```bash + * aws_elb.load_balancer: "name" cannot be longer than 32 characters: "longusername-myapp-stage-us-west-2-elb" + ``` ### Verify it worked -Load the ```address``` from the ```Outputs:``` you got during the ```terraform apply``` above and you should see the nginx default ```index.html``` page. -![nginx_default](../media/labs/nubis-skel-lab/nginx_default.png "nginx_default") +Load the ```address``` from the ```Outputs:``` you got during +the ```terraform apply``` above and you should see the nginx +default ```index.html``` page. + +![nginx_default](../media/labs/nubis-skel-lab/nginx_default.png "nginx_default") ## Update The Application -Now we are going to make a change to the application. You can customize the application to your liking. The commands here will have you simply edit the ```index.html``` file and add some custom text. Then you will rebuild the AMI by running ```nubis-builder build```, thereafter a ```terraform plan``` followed by a ```terraform apply```. -**NOTE:** Remember to use the new AMI ID from the ```nubis-builder``` outputs and include it in the ```AMI_ID``` variable here. +Now we are going to make a change to the application. You can customize the +application to your liking. The commands here will have you simply edit +the ```index.html``` file and add some custom text. Then you will rebuild the +AMI by running ```nubis-builder build```, and then run a ```terraform plan``` +followed by a ```terraform apply```. + +**NOTE:** Remember to use the new AMI ID from the ```nubis-builder``` outputs +and include it in the ```AMI_ID``` variable here. ```bash + cd ../..
# You should be in the root nubis-skel directory here vi nubis/puppet/files/index.html # <--- Make awesome changes nubis-builder build export AMI_ID="ami-XXX" -( cd nubis/terraform && aws-vault exec ${ACCOUNT_NAME}-admin -- terraform plan -var ami=${AMI_ID} ) -( cd nubis/terraform && aws-vault exec ${ACCOUNT_NAME}-admin -- terraform apply -var ami=${AMI_ID} ) +( cd nubis/terraform && \ +aws-vault exec ${ACCOUNT_NAME}-admin -- terraform plan -var ami=${AMI_ID} ) +( cd nubis/terraform && \ +aws-vault exec ${ACCOUNT_NAME}-admin -- terraform apply -var ami=${AMI_ID} ) + ``` ### Verify your changes -Load the ```address``` from the ```Outputs:``` you got during the ```terraform apply``` above and you should see your updates to the ```index.html``` page. -![nginx_update](../media/labs/nubis-skel-lab/nginx_update.png "nginx_update") +Load the ```address``` from the ```Outputs:``` you got during +the ```terraform apply``` above and you should see your updates to +the ```index.html``` page. + +![nginx_update](../media/labs/nubis-skel-lab/nginx_update.png "nginx_update") ### Logging on to your instance -The web server that we deployed is runnign in a private subnet. In order to ssh to the web server we will need to go through a jumphost. Replace "myapp" in this command with your app name. + +The web server that we deployed is running in a private subnet. In order to ssh +to the web server we will need to go through a jumphost. Replace "myapp" in +this command with your app name. ```bash -ssh -A -t ec2-user@jumphost.stage.us-west-2.${ACCOUNT_NAME}.nubis.allizom.org "ssh -A -t ubuntu@${USER_LOGIN}.nubis-myapp.service.consul" + +ssh -A -t ec2-user@jumphost.stage.us-west-2.${ACCOUNT_NAME}.nubis.allizom.org \ +"ssh -A -t ubuntu@${USER_LOGIN}.nubis-myapp.service.consul" + ``` -### Loggin into the AWS web console. -If you need to get into the web console you can do that with the aws-vault command.
+### Logging into the AWS web console + +If you need to get into the web console you can do that with the aws-vault +command. ```bash + aws-vault login nubis-training-2016-ro -``` +``` ## Create your own repository -Now we will walk through creating your own repository onGitHub. This will enable you to collaborate with other s using the workflow we discovered in the [previous lab](nubis_dpaste.md). + +Now we will walk through creating your own repository on GitHub. This will enable +you to collaborate with others using the workflow we discovered in +the [previous lab](nubis_dpaste.md). ### Initialize the repository + The first step is to initialize the repository. ```bash + git init + ``` + You should see: + ```bash + Initialized empty Git repository in ~/nubis-skel-1.2.2-training/.git/ + ``` Next let's add all of the files: + ```bash + git add . + ``` + Now we need to commit our (new) changes: + ```bash + git commit + ``` Add a nice commit message like ```First check-in of my new project``` @@ -177,30 +279,43 @@ Add a nice commit message like ```First check-in of my new project``` ![github_commit_project](../media/labs/nubis-skel-lab/github_commit_project.png "github_commit_project") ### Create a new repository on GitHub -You need to be logged into GitHub for these steps. If you are not already logged in head over to [GitHub](https://github.com) and login. + +You need to be logged into GitHub for these steps. If you are not already logged +in head over to [GitHub](https://github.com) and login. Next create a new repository [here](https://github.com/new) -You will need to name your new repository and add a description. You do not need to add a README or a LICENSE as they are already included with the nubis-skel project. +You will need to name your new repository and add a description. You do not need +to add a README or a LICENSE as they are already included with the nubis-skel +project.
Then click the ```“Create repository”``` button. -![github_new_repository](../media/labs/nubis-skel-lab/github_new_repository.png "github_new_repository") +![github_new_repository](../media/labs/nubis-skel-lab/github_new_repository.png "github_new_repository") -Now you will need follow the second set of instructions under ```“Push an existing repository…”```. +Now you will need to follow the second set of instructions +under ```“Push an existing repository…”```. ```bash + git remote add origin git@github.com:username/new_repo git push -u origin master + ``` -![git_push_new_repository](../media/labs/nubis-skel-lab/git_push_new_repository.png "git_push_new_repository") +![git_push_new_repo](../media/labs/nubis-skel-lab/git_push_new_repository.png "git_push_new_repo") ## Clean Up The Deployment -Lastly, we should clean up our deployment. In AWS everything costs money. Add onto that the fact that when using terraform it is quick to start up and shut down your deployment. It just makes sense to shut them down at the end of the day. + +Lastly, we should clean up our deployment. In AWS everything costs money. Add +to that the fact that with terraform it is quick to start up and shut +down your deployment. It just makes sense to shut them down at the end of the +day. ```bash + cd nubis/terraform aws-vault exec ${ACCOUNT_NAME}-admin -- terraform plan -var ami=${AMI_ID} -destroy aws-vault exec ${ACCOUNT_NAME}-admin -- terraform destroy -var ami=${AMI_ID} + ``` You should see something like this: @@ -208,11 +323,20 @@ ![terraform_destroy](../media/labs/nubis-skel-lab/terraform_destroy.png "terraform_destroy") ## About the AMI ID -As a final note. You can place the AMI ID into the ```terraform.tfvars``` file if that fits your work-flow. ```ami = "ami-xxx"```. In that case you can omit the ```-var ami=${AMI_ID}``` portion from all of the commands above. Just know that you will need to update it every time you rebuild the AMI.
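As a sketch, reusing the placeholder values from this lab, the relevant ```terraform.tfvars``` lines would look like the following hypothetical fragment:

```
account = "nubis-training-2016"
region  = "us-west-2"
ami     = "ami-xxx"  # update after every nubis-builder rebuild
```

With ```ami``` pinned in the file, ```terraform plan``` and ```terraform apply``` can be run without the ```-var``` flag, at the cost of editing the file after each rebuild.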
-In fact all of the variables in the ```terraform.tfvars``` file can be either replaced or overridden in this way. +As a final note, you can place the AMI ID into the ```terraform.tfvars``` +file if that fits your work-flow: ```ami = "ami-xxx"```. In that case you can +omit the ```-var ami=${AMI_ID}``` portion from all of the commands above. Just +know that you will need to update it every time you rebuild the AMI. + +In fact all of the variables in the ```terraform.tfvars``` file can be either +replaced or overridden in this way. ## FIN -Well that is it for this working lab. I hope you had fun and learned a bit about deploying into AWS using Nubis. As always if you have questions feel free to reach out to us at nubis-users@googlegroups.com or find us on ```irc.mozilla.org #nubis-users```. + +Well that is it for this working lab. I hope you had fun and learned a bit about +deploying into AWS using Nubis. As always if you have questions feel free to +reach out to us at nubis-users@googlegroups.com or find us +on ```irc.mozilla.org #nubis-users```. Thanks for playing. diff --git a/training/nubis-overview.md b/training/nubis-overview.md index d289464..f7a8b8c 100644 --- a/training/nubis-overview.md +++ b/training/nubis-overview.md @@ -1,232 +1,394 @@ -# Nubis Overview -It is time to move past ideas and start talking about the project we are working on to help us with our transition into the cloud.
- - - [What is Nubis](#what-is-nubis) - - [Standardized design](#standardized-design) - - [Security compliance](#security-compliance) - - [Reduced time-to-market](#reduced-time-to-market) - - [What can Nubis do for me](#what-can-nubis-do-for-me) - - [What does Nubis provide](#what-does-nubis-provide) - - [Nubis accounts](#nubis-accounts) - - [Accounts](#accounts) - - [Multiple environments](#multiple-environments) - - [Quarterly Updates](#quarterly-updates) - - [Distribution upgrades](#distribution-upgrades) - - [Package updates](#package-updates) - - [New services](#new-services) - - [Application Image Updates](#application-image-updates) - - [Security Updates](#security-updates) - - [Included Services](#included-services) - - [Proxies](#proxies) - - [NATs](#nats) - - [Consul Integration](#consul-integration) - - [Fluent Integration](#fluent-integration) - - [Jumphosts](#jumphosts) - - [User Management](#user-management) - - [MFA](#mfa) - - [aws-vault](#aws-vault) - - [LDAP Integration](#ldap-integration) - - [Security Integration](#security-integration) - - [InfoSec security audit role](#infoSec-security-audit-role) - - [Network Security Monitoring](#network-security-monitoring) (NSM) - - [Integrated IP Blacklisting](#integrated-ip-blacklisting) - - [Log Integration with Mozilla Investigator](#log-integration-with-mozilla-investigator) (MIG) - - [CloudTrail Integration](#cloudtrail-integration) - - [Additional Services](#additional-services) - - [Cloud Health Integration](#cloud-health-integration) - - [Billing Support](#billing-support) - - [Tainted Resources](#tainted-resources) - - [Platform Monitoring](#platform-monitoring) - - [High Availability](#high-availability) - - [Nubis deployments](#nubis-deployments) - - [Deployment Overview](#deployment-overview) - - [Environments and how to use them](#environments-and-how-to-use-them) - - [Deployment Workflow Diagram](#deployment-workflow-diagram) - - [Deployment repository](#deployment-repository) - - [Puppet 
configuration](#puppet-configuration) - - [Application Code](#application-Code) - - [Terraform modules](#terraform-modules) - - [Recommended practices](#recommended-practices) - - [Architectural design services](#architectural-design-services) - - [Example deployments](#example-deployments) - - [nubis-skel](#nubis-skel) - - [AWS Solutions Architect](#aws-solutions-architect) - - [Community support](#community-support) - - [CI System](#ci-system) - - [Rolling Back](#rolling-back) - - [Custom Monitors](#custom-monitors) - - [nubis-base](#nubis-base) - - [nubis-builder](#nubis-builder) - - [Build Deploy Diagram](#build-deploy-diagram) - + + +# Nubis Overview + +It is time to move past ideas and start talking about the project we are working +on to help us with our transition into the cloud. + +* [What is Nubis](#what-is-nubis) + * [Standardized design](#standardized-design) + * [Security compliance](#security-compliance) + * [Reduced time-to-market](#reduced-time-to-market) +* [What can Nubis do for me](#what-can-nubis-do-for-me) +* [What does Nubis provide](#what-does-nubis-provide) + * [Nubis accounts](#nubis-accounts) + * [Accounts](#accounts) + * [Multiple environments](#multiple-environments) + * [Quarterly Updates](#quarterly-updates) + * [Distribution upgrades](#distribution-upgrades) + * [Package updates](#package-updates) + * [New services](#new-services) + * [Application Image Updates](#application-image-updates) + * [Security Updates](#security-updates) + * [Included Services](#included-services) + * [Proxies](#proxies) + * [NATs](#nats) + * [Consul Integration](#consul-integration) + * [Fluent Integration](#fluent-integration) + * [Jumphosts](#jumphosts) + * [User Management](#user-management) + * [MFA](#mfa) + * [aws-vault](#aws-vault) + * [LDAP Integration](#ldap-integration) + * [Security Integration](#security-integration) + * [InfoSec security audit role](#infoSec-security-audit-role) + * [Network Security Monitoring](#network-security-monitoring) 
(NSM) + * [Integrated IP Blacklisting](#integrated-ip-blacklisting) + * [Log Integration with Mozilla Investigator](#log-integration-with-mozilla-investigator) + (MIG) + * [CloudTrail Integration](#cloudtrail-integration) + * [Additional Services](#additional-services) + * [Cloud Health Integration](#cloud-health-integration) + * [Billing Support](#billing-support) + * [Tainted Resources](#tainted-resources) + * [Platform Monitoring](#platform-monitoring) + * [High Availability](#high-availability) + * [Nubis deployments](#nubis-deployments) + * [Deployment Overview](#deployment-overview) + * [Environments and how to use them](#environments-and-how-to-use-them) + * [Deployment Workflow Diagram](#deployment-workflow-diagram) + * [Deployment repository](#deployment-repository) + * [Puppet configuration](#puppet-configuration) + * [Application Code](#application-Code) + * [Terraform modules](#terraform-modules) + * [Recommended practices](#recommended-practices) + * [Architectural design services](#architectural-design-services) + * [Example deployments](#example-deployments) + * [nubis-skel](#nubis-skel) + * [AWS Solutions Architect](#aws-solutions-architect) + * [Community support](#community-support) + * [CI System](#ci-system) + * [Rolling Back](#rolling-back) + * [Custom Monitors](#custom-monitors) + * [nubis-base](#nubis-base) + * [nubis-builder](#nubis-builder) + * [Build Deploy Diagram](#build-deploy-diagram) + ## What is Nubis -At a high level, Nubis is a collection of resources designed to help deploy an application in the Cloud. Nubis provides a large number of benefits including: - - [Standardized design](#standardized-design) - - [Security compliance](#security-compliance) - - [Reduced time-to-market](#reduced-time-to-market) + +At a high level, Nubis is a collection of resources designed to help deploy an +application in the Cloud. 
Nubis provides a large number of benefits including: + +* [Standardized design](#standardized-design) +* [Security compliance](#security-compliance) +* [Reduced time-to-market](#reduced-time-to-market) ### Standardized design -As we have discussed, there are many ways to operate in the cloud. Nubis creates a standardized way of deploying and operating in the cloud. This provides the benefit of knowing exactly what to expect when looking at anything deployed using Nubis. This aids, not only in understanding but also troubleshooting. Further, by standardizing we enable the ability to rapidly and easily share new ideas and services. Any time a new service is developed by a team using Nubis, any other team using Nubis can take advantage of it. -### Security compliance -Nubis incorporates a massive number of security compliance measures and technologies. We have spent more than a year developing nothing but security compliance within Nubis. The majority of these requirements are required by any application that will be deployed in the Cloud. By using Nubis, you get to take advantage of all of this work. You will save massive quantities of time working on compliance. +As we have discussed, there are many ways to operate in the cloud. Nubis creates +a standardized way of deploying and operating in the cloud. This provides the +benefit of knowing exactly what to expect when looking at anything deployed +using Nubis. This aids not only understanding, but also troubleshooting. +Further, by standardizing we enable the ability to rapidly and easily share new +ideas and services. Any time a new service is developed by a team using Nubis, +any other team using Nubis can take advantage of it. -Rest assured that we have been working closely with the InfoSec team to make sure we are compliant at all levels of the platform. Every single component of Nubis has been reviewed and vetted by InfoSec.
Each new component we design is discussed with the InfoSec team before we begin writing code. We work with InfoSec to discuss changes as we discover them during the build phase. Finally, once the feature is complete we work with InfoSec to do a through review of the final product, making any additional changes required. +### Security compliance -This exhaustive process ensures that Nubis is up to par with the most stringent, best practices, when it comes to securing your deployment. We will be discussing a number of the security measures and systems we have developed in greater detail a bit later on. If you are interested in a more through discussion, reach out to anyone on the Nubis development team and we will be happy to walk you through everything. +Nubis incorporates a massive number of security compliance measures and +technologies. We have spent more than a year developing nothing but security +compliance within Nubis. The majority of these measures are required for any +application that will be deployed in the Cloud. By using Nubis, you get to take +advantage of all of this work. You will save massive quantities of time working +on compliance. + +Rest assured that we have been working closely with the InfoSec team to make +sure we are compliant at all levels of the platform. Every single component of +Nubis has been reviewed and vetted by InfoSec. Each new component we design is +discussed with the InfoSec team before we begin writing code. We work with +InfoSec to discuss changes as we discover them during the build phase. Finally, +once the feature is complete we work with InfoSec to do a thorough review of the +final product, making any additional changes required. + +This exhaustive process ensures that Nubis is up to par with the most stringent +best practices when it comes to securing your deployment. We will be discussing +a number of the security measures and systems we have developed in greater +detail a bit later on.
If you are interested in a more thorough discussion, reach +out to anyone on the Nubis development team and we will be happy to walk you +through everything. ### Reduced time-to-market -One of the design goals of Nubis is to reduce the time it takes to get an application deployed in the cloud. When considering all of the things necessary to deploy an application in the cloud, from image building all the way up through monitoring, the time it takes to build it all can be daunting. When you add on the work necessary for security compliance, it can take months or even years just to get an environment set up. -All of this is before even considering the application. Many applications currently deployed in the datacenter will require modification in order to operate in the Cloud. Using Nubis frees you from all of the underlying requirements and allows you to focus on the thing that matters to you, your application. +One of the design goals of Nubis is to reduce the time it takes to get an +application deployed in the cloud. When considering all of the things necessary +to deploy an application in the cloud, from image building all the way up +through monitoring, the time it takes to build it all can be daunting. When you +add on the work necessary for security compliance, it can take months or even +years just to get an environment set up. + +All of this is before even considering the application. Many applications +currently deployed in the datacenter will require modification in order to +operate in the Cloud. Using Nubis frees you from all of the underlying +requirements and allows you to focus on the thing that matters to you: your +application. -## What does Nubis provide? -While discussing the details of what Nubis provides it is helpful to conceptualize the idea that Nubis provides two distinct things.
- - [Nubis accounts](#nubis-accounts) - - [Nubis deployments](#nubis-deployments) +## What does Nubis provide +While discussing the details of what Nubis provides it is helpful to +conceptualize the idea that Nubis provides two distinct things. + +* [Nubis accounts](#nubis-accounts) +* [Nubis deployments](#nubis-deployments) ### Nubis Accounts -Nubis accounts contain all of the basic services that one would expect to find in a datacenter, such as monitoring and integration into security systems. The account services are updated and maintained by the Nubis development team. We take care of image upgrades as well as ensuring the services operate correctly and integrate with each other. In other words, there is nothing for you to set up or configure, you simply consume these services. Lets briefly look at some of the things provided with a Nubis account: - - - [Accounts](#accounts) - - [Account Diagram](#account-diagram) - - [Multiple environments](#multiple-environments) - - [Quarterly Updates](#quarterly-updates) - - [Distribution upgrades](#distribution-upgrades) - - [Package updates](#package-updates) - - [New services](#new-services) - - [Application Image Updates](#application-image-updates) - - [Security Updates](#security-updates) - - [Included Services](#included-services) - - [Proxies](#proxies) - - [NATs](#nats) - - [Consul Integration](#consul-integration) - - [Fluent Integration](#fluent-integration) - - [Jumphosts](#jumphosts) - - [User Management](#user-management) - - [MFA](#mfa) - - [aws-vault](#aws-vault) - - [LDAP Integration](#ldap-integration) - - [Security Integration](#security-integration) - - [InfoSec security audit role](#infoSec-security-audit-role) - - [Network Security Monitoring](#network-security-monitoring) (NSM) - - [Integrated IP Blacklisting](#integrated-ip-blacklisting) - - [Log Integration with Mozilla Investigator](#log-integration-with-mozilla-investigator) (MIG) - - [CloudTrail Integration](#cloudtrail-integration) - - 
[Additional Services](#additional-services) - - [Cloud Health Integration](#cloud-health-integration) - - [Billing Support](#billing-support) - - [Tainted Resources](#tainted-resources) - - [Platform Monitoring](#platform-monitoring) - - [High Availability](#high-availability) + +Nubis accounts contain all of the basic services that one would expect to find +in a datacenter, such as monitoring and integration into security systems. The +account services are updated and maintained by the Nubis development team. We +take care of image upgrades as well as ensuring the services operate correctly +and integrate with each other. In other words, there is nothing for you to set +up or configure; you simply consume these services. Let's briefly look at some of +the things provided with a Nubis account: + +* [Accounts](#accounts) + * [Account Diagram](#account-diagram) + * [Multiple environments](#multiple-environments) +* [Quarterly Updates](#quarterly-updates) + * [Distribution upgrades](#distribution-upgrades) + * [Package updates](#package-updates) + * [New services](#new-services) + * [Application Image Updates](#application-image-updates) +* [Security Updates](#security-updates) +* [Included Services](#included-services) + * [Proxies](#proxies) + * [NATs](#nats) + * [Consul Integration](#consul-integration) + * [Fluent Integration](#fluent-integration) + * [Jumphosts](#jumphosts) +* [User Management](#user-management) + * [MFA](#mfa) + * [aws-vault](#aws-vault) + * [LDAP Integration](#ldap-integration) +* [Security Integration](#security-integration) + * [InfoSec security audit role](#infosec-security-audit-role) + * [Network Security Monitoring](#network-security-monitoring) (NSM) + * [Integrated IP Blacklisting](#integrated-ip-blacklisting) + * [Log Integration with Mozilla Investigator](#log-integration-with-mozilla-investigator) + (MIG) + * [CloudTrail Integration](#cloudtrail-integration) +* [Additional Services](#additional-services) + * [Cloud Health 
Integration](#cloud-health-integration) + * [Billing Support](#billing-support) + * [Tainted Resources](#tainted-resources) + * [Platform Monitoring](#platform-monitoring) + * [High Availability](#high-availability) #### Accounts -Each application gets deployed into its own, separate Nubis account. This creates separation between applications. This separation serves several needs. First it provides a level of security by ensuring that one compromised application can not pose a risk to another application. Second, this separation creates a limited blast radius in case of load spike, application misconfiguration, etcetera. + +Each application gets deployed into its own separate Nubis account. This +creates separation between applications. This separation serves several needs. +First, it provides a level of security by ensuring that one compromised +application cannot pose a risk to another application. Second, this separation +creates a limited blast radius in case of load spike, application +misconfiguration, etcetera.
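Account separation also lends itself to simple automation. As a purely illustrative sketch (the `-sandbox` naming convention and the helper function below are assumptions made for this example, not part of Nubis), deployment tooling could branch on the account name:

```bash
#!/bin/bash
# Illustrative only: assumes a hypothetical convention where sandbox
# account names carry a "-sandbox" suffix and production accounts do not.
account_type() {
  case "$1" in
    *-sandbox) echo "sandbox" ;;
    *)         echo "production" ;;
  esac
}

account_type "dpaste-sandbox"   # prints: sandbox
account_type "dpaste"           # prints: production
```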
We deploy two types of accounts: - - Production Accounts - - Hosts the production application along with a staging environment - - Intended to be hands-off immutable - - Sandbox accounts - - Hosts environments for working on application development, deployment developmenbt, new feature development, and so on - - Intended to allow developers and operators login abilities for development work - - Assumed to be tainted and not capable of production + +* Production Accounts + * Hosts the production application along with a staging environment + * Intended to be hands-off and immutable +* Sandbox accounts + * Hosts environments for working on application development, deployment + development, new feature development, and so on + * Intended to allow developers and operators login abilities for development + work + * Assumed to be tainted and not suitable for production #### Account Diagram + ![Nubis Account Diagram](media/account_diagram.png "Nubis Account Diagram") ##### Multiple environments (Stage & Prod) -Accounts can be provisioned with an arbitrary number of environments. Typically, a sandbox account is provisioned with a single sandbox environment. Production accounts are generally provisioned with three environments. The first is an admin environment which hosts the CI instance along with other necessary account administrative services. Next there is a staging environment which is where the CI instance automatically deploys code, builds images and runs tests. Finally there is a production environment which hosts the production facing application. - -Each environment contains all of the services that are included with a Nubis account which we will discuss shortly. The important thing to note here is that there are no services shared between the environments. Further the environments are self-contained and identical. This design is intentional and ensures bit-for-bit repeatability between the staging and production environments.
This provides near certainty that if your application works in staging it will work in production. -In AWS terms, each environment is a separate VPC containing multiple public and private subnets spread across multiple availability zones. +Accounts can be provisioned with an arbitrary number of environments. Typically, +a sandbox account is provisioned with a single sandbox environment. Production +accounts are generally provisioned with three environments. The first is an +admin environment which hosts the CI instance along with other necessary account +administrative services. Next, there is a staging environment where the +CI instance automatically deploys code, builds images and runs tests. Finally, +there is a production environment which hosts the production-facing application. + +Each environment contains all of the services that are included with a Nubis +account, which we will discuss shortly. The important thing to note here is that +there are no services shared between the environments. Further, the environments +are self-contained and identical. This design is intentional and ensures +bit-for-bit repeatability between the staging and production environments. This +provides near certainty that if your application works in staging it will work +in production. + +In AWS terms, each environment is a separate VPC containing multiple public and +private subnets spread across multiple availability zones. #### Quarterly Updates -The Nubis development team releases a new version of Nubis quarterly. It is the responsibility of the account owner to upgrade their account to the latest Nubis release within one quarter following a release. The Nubis development team works very diligently to ensure this upgrade is as seamless as possible. There is a very simple process to upgrade an account, consisting primarily of a single Terraform command. + +The Nubis development team releases a new version of Nubis quarterly.
It is the +responsibility of the account owner to upgrade their account to the latest Nubis +release within one quarter following a release. The Nubis development team works +very diligently to ensure this upgrade is as seamless as possible. There is a +very simple process to upgrade an account, consisting primarily of a single +Terraform command. ##### Distribution Updates -Included with the quarterly updates are distribution updates. For all of the services included with Nubis, the Nubis development team will perform the upgrades and make sure everything works. When you upgrade to the newest version of Nubis, these upgrades will be transparent to you. You can expect that these upgrades are occurring and you don't need to worry about it. -For your application deployed on top of a Nubis account, you will need to ensure that it continues to function correctly. We will take a closer look at deploying applications when we discuss application image updates below. +Included with the quarterly updates are distribution updates. For all of the +services included with Nubis, the Nubis development team will perform the +upgrades and make sure everything works. When you upgrade to the newest version +of Nubis, these upgrades will be transparent to you. You can expect that these +upgrades are occurring and you don't need to worry about it. + +For your application deployed on top of a Nubis account, you will need to ensure +that it continues to function correctly. We will take a closer look at deploying +applications when we discuss application image updates below. ##### Package updates -In addition to full distribution updates, the Nubis development team also takes care of package updates for all of the services provided by Nubis. Again, these are throughly tested by the Nubis team and will be transparent and seamless to you. + +In addition to full distribution updates, the Nubis development team also takes +care of package updates for all of the services provided by Nubis. 
Again, these +are thoroughly tested by the Nubis team and will be transparent and seamless to +you. ##### New Services -As if that were not enough, you also get access to new services as they are developed. When the Nubis development team builds a new service, it is automatically included for you. Again, this is seamless and transparent to you. Things "just work". As a safeguard, new services are, typically, not enabled by default. New services need to be enabled by a feature flag prior to upgrading the Nubis account. -##### Application Image Updates -When it comes time for you to upgrade your application images, Nubis helps you here as well. The Nubis development team maintains the nubis-base image. We maintain several flavors of linux for you to choose from. Your application image will be built from one of these base images. To upgrade your application image, all you need to do is fire off a new image build. This is accomplished with a tool we have built to drive Packer, called nubis-builder. We will see an example of how to use nubis-builder and build an application image a little later on. +As if that were not enough, you also get access to new services as they are +developed. When the Nubis development team builds a new service, it is +automatically included for you. Again, this is seamless and transparent to you. +Things "just work". As a safeguard, new services are typically not enabled by +default. New services need to be enabled by a feature flag prior to upgrading +the Nubis account. -Nubis can not do everything for you, so there is a bit of work you will need to do when it comes time to upgrade. You will need to update the deployment configuration for your application. Typically this is done through puppet configuration files inside of the deployment repository for your application. This will also be discussed in more detail shortly. -Lastly you will need to test your application and its deployment. This is made easy when using Nubis as we provide a staging environment for this purpose. I will go into details about environments and their use when we discuss Nubis Deployments below. +##### Application Image Updates + +When it comes time for you to upgrade your application images, Nubis helps you +here as well. The Nubis development team maintains the nubis-base image. We +maintain several flavors of Linux for you to choose from. Your application image +will be built from one of these base images. To upgrade your application image, +all you need to do is fire off a new image build. This is accomplished with a +tool we have built to drive Packer, called nubis-builder. We will see an example +of how to use nubis-builder and build an application image a little later on. + +Nubis cannot do everything for you, so there is a bit of work you will need to +do when it comes time to upgrade. You will need to update the deployment +configuration for your application. Typically this is done through Puppet +configuration files inside of the deployment repository for your application. +This will also be discussed in more detail shortly. + +Lastly, you will need to test your application and its deployment. This is made +easy when using Nubis as we provide a staging environment for this purpose. I +will go into details about environments and their use when we discuss Nubis +Deployments below. #### Security Updates -When the Nubis team is notified of a security vulnerability we will apply the necessary patches and cut a patch release. This will require you to update your account. This is done with the same process using the same Terraform command as you would use for a normal release. The only difference is that the time-line for the release is shorter. The InfoSec team sets the time-line and it is typically something like 24 hours or 3 days. -In addition to updating your Nubis Account you may need to rebuild your application image. This is also the same process you use during a normal release.
+When the Nubis team is notified of a security vulnerability, we will apply the +necessary patches and cut a patch release. This will require you to update your +account. This is done with the same process using the same Terraform command as +you would use for a normal release. The only difference is that the timeline +for the release is shorter. The InfoSec team sets the timeline and it is +typically something like 24 hours or 3 days. + +In addition to updating your Nubis Account, you may need to rebuild your +application image. This is also the same process you use during a normal release. -In short, a security release is exaclty the same as a normal release, it simply needs to be done faster. +In short, a security release is exactly the same as a normal release; it simply +needs to be done faster. #### Included Services -Nubis includes a growing number of services. As these should be familiar to you, I will only briefly mention the technology we are using and note where to locate additional information. - - [Proxies](#proxies) - - [NATs](#nats) - - [Consul Integration](#consul-integration) - - [Fluent Integration](#fluent-integration) - - [Jumphosts](#jumphosts) +Nubis includes a growing number of services. As these should be familiar to you, +I will only briefly mention the technology we are using and note where to locate +additional information. + +* [Proxies](#proxies) +* [NATs](#nats) +* [Consul Integration](#consul-integration) +* [Fluent Integration](#fluent-integration) +* [Jumphosts](#jumphosts) ##### Proxies -For each private subnet within a VPC there is a [Squid](http://www.squid-cache.org/) http proxy. Currently the proxies allow all outbound http and https connections while logging for security and auditing reasons. The InfoSec team has a goal of whitelist only proxies. That means that outbound connections would only be allowed to pre-configured addresses. This change will be coming soon.
When taking advantage of the nubis-base images, proxy environment variables are already set. Many tools and applications already take advantage of these environment variables making things quite transparent. If your application is not aware of these variables you may need to configure outbound connections manually. Documentation can be found [here](https://github.com/nubisproject/nubis-nat/blob/052d01fda8472d4c85a7a7dca507943a1fc40dfc/README.md#proxy). +For each private subnet within a VPC there is a [Squid](http://www.squid-cache.org/) +http proxy. Currently the proxies allow all outbound http and https connections +while logging for security and auditing reasons. The InfoSec team has a goal of +whitelist-only proxies. That means that outbound connections would only be +allowed to pre-configured addresses. This change will be coming soon. + +When taking advantage of the nubis-base images, proxy environment variables are +already set. Many tools and applications already take advantage of these +environment variables, making things quite transparent. If your application is +not aware of these variables, you may need to configure outbound connections +manually. Documentation can be found [here](https://github.com/nubisproject/nubis-nat/blob/052d01fda8472d4c85a7a7dca507943a1fc40dfc/README.md#proxy). ```bash + export http_proxy="http://proxy.service.consul:3128/" export https_proxy="http://proxy.service.consul:3128/" export no_proxy="localhost,127.0.0.1,.localdomain,10.0.0.0/8,169.254.169.254" export HTTP_PROXY="$http_proxy" export HTTPS_PROXY="$https_proxy" export NO_PROXY="$no_proxy" + ``` ##### NATs -At the edge of the private subnets exist some Network Address Translation (NAT) instances. These instances do exactly what you might accept in terms of address translation. -The other task these instances perform is outbound firewalling. You can basically assume that all outbound connectivity (not going through the http proxies) are blocked by these firewalls. +At the edge of the private subnets exist some Network Address Translation (NAT) +instances. These instances do exactly what you might expect in terms of address +translation. + +The other task these instances perform is outbound firewalling. You can +basically assume that all outbound connectivity (not going through the http +proxies) is blocked by these firewalls.
You can create exceptions through consul configuration. The process is documented [here](https://github.com/nubisproject/nubis-nat/blob/052d01fda8472d4c85a7a7dca507943a1fc40dfc/README.md#forcing-connection-through-proxy), but note that you will need InfoSec approval to carry this into production. +At the edge of the private subnets exist some Network Address Translation (NAT) +instances. These instances do exactly what you might accept in terms of address +translation. + +The other task these instances perform is outbound firewalling. You can +basically assume that all outbound connectivity (not going through the http +proxies) are blocked by these firewalls. You can create exceptions through +consul configuration. The process is documented [here](https://github.com/nubisproject/nubis-nat/blob/052d01fda8472d4c85a7a7dca507943a1fc40dfc/README.md#forcing-connection-through-proxy), +but note that you will need InfoSec approval to carry this into production. ```bash + # This populates the variables '$NUBIS_PROJECT' and '$NUBIS_ENVIRONMENT' NUBIS_PROJECT=$(nubis-metadata NUBIS_PROJECT) NUBIS_ENVIRONMENT=$(nubis-metadata NUBIS_ENVIRONMENT) # Look up the current list -consulate kv get nubis-nat-$NUBIS_ENVIRONMENT/$NUBIS_ENVIRONMENT/config/IptablesAllowTCP +consulate kv get \ +nubis-nat-$NUBIS_ENVIRONMENT/$NUBIS_ENVIRONMENT/config/IptablesAllowTCP # Add the allowed ports in consul including any already existing -```bash -consulate kv set nubis-nat-$NUBIS_ENVIRONMENT/$NUBIS_ENVIRONMENT/config/IptablesAllowTCP '[ "3128", "587", "443", "123" ]' +consulate kv set \ +nubis-nat-$NUBIS_ENVIRONMENT/$NUBIS_ENVIRONMENT/config/IptablesAllowTCP \ +'[ "3128", "587", "443", "123" ]' + ``` ##### Consul Integration -[Consul](https://www.consul.io/) provides a number of functions. It hosts a key-value store that we use for run-time tunables. It coordinates service discovery for all Nubis services. 
Additionally it provides a locking service that we use to coordinate high availability services, among other things. -The nubis-base image comes pre-configured to connect to the consul cluster. Additionally it contains [confd](http://www.confd.io/), a tool which is used for application tuning and run-time configuration management. +[Consul](https://www.consul.io/) provides a number of functions. It hosts a +key-value store that we use for run-time tunables. It coordinates service +discovery for all Nubis services. Additionally, it provides a locking service +that we use to coordinate high-availability services, among other things. + +The nubis-base image comes pre-configured to connect to the consul cluster. +Additionally, it contains [confd](http://www.confd.io/), a tool which is used for +application tuning and run-time configuration management. -An example of using consul's locking mechanism to ensure only one web server executes a command can be seen [here](https://github.com/nubisproject/nubis-dpaste/blob/665d5ccc01c0448dbdbe8bd0be04104a7d74ee1e/nubis/bin/migrate#L132) +An example of using consul's locking mechanism to ensure only one web server +executes a command can be seen [here](https://github.com/nubisproject/nubis-dpaste/blob/665d5ccc01c0448dbdbe8bd0be04104a7d74ee1e/nubis/bin/migrate#L132). ``` bash + consul lock $NUBIS_STACK/$NUBIS_ENVIRONMENT/syncdb \ /var/www/dpaste/manage.py syncdb --migrate + ``` -An example of setting and retrieving values for the consul key-value store can be seen [here](https://github.com/nubisproject/nubis-dpaste/blob/665d5ccc01c0448dbdbe8bd0be04104a7d74ee1e/nubis/bin/migrate#L68). It is worth noting in this example that we are generating a random password for a MySql database. This approach exemplifies the notion that things like this do not need to be set by humans. In fact, it is more secure if the humans are never involved in the process. Further this removes the possibility of a copy / past error.
An example of setting and retrieving values for the consul key-value store can +be seen [here](https://github.com/nubisproject/nubis-dpaste/blob/665d5ccc01c0448dbdbe8bd0be04104a7d74ee1e/nubis/bin/migrate#L68). +It is worth noting in this example that we are generating a random password for +a MySQL database. This approach exemplifies the notion that things like this do +not need to be set by humans. In fact, it is more secure if the humans are never +involved in the process. Further, this removes the possibility of a copy / paste +error. ``` bash + # Source the consul connection details from the metadata api NUBIS_STACK=$(nubis-metadata NUBIS_STACK) NUBIS_ENVIRONMENT=$(nubis-metadata NUBIS_ENVIRONMENT) @@ -241,18 +403,28 @@ if [ "$DB_PASSWORD" == "" ]; then DB_PASSWORD=`makepasswd --minchars=12 --maxchars=16` consulate kv set $CONSUL_PREFIX/DB_PASSWORD $DB_PASSWORD fi + ``` -An example of service discovery can be seen in this code taken from the proxy section above. +An example of service discovery can be seen in this code taken from the proxy +section above. + ``` bash + export http_proxy="http://proxy.service.consul:3128/" export https_proxy="http://proxy.service.consul:3128/" + ``` ##### Fluent Integration -[Fluent](http://www.fluentd.org/) is a log aggregation service. It is integrated into the nubis-base image as well. By default all of the system logs are sent to fluent. You will want to send any application logs to fluent as well.
An example +can be seen [here](https://github.com/nubisproject/nubis-dpaste/blob/665d5ccc01c0448dbdbe8bd0be04104a7d74ee1e/nubis/puppet/fluentd.pp) ```ruby + class { 'fluentd': service_ensure => stopped } @@ -269,182 +441,338 @@ fluentd::source { 'apache_access': }, } -fluentd::source { 'apache_error': - configfile => 'apache', - type => 'tail', - format => '/^\[[^ ]* (?