Jonathan Meyer edited this page Sep 10, 2019 · 13 revisions

Deployment Testing

Full testing of the Scale system is a somewhat arduous process. There are multiple sub-components that Scale depends on for state persistence and log capture. At its core, Scale consists of 5 main components:

  • Scheduler
  • Silo API
  • Scale API
  • UI Frontend
  • Message Handlers

These components are logically separated into 3 GitHub repositories:

  • Scale Scheduler, API and Message Handlers (github.com/ngageoint/scale)
  • Silo API (github.com/ngageoint/seed-silo)
  • Scale UI (github.com/ngageoint/scale-ui)

The primary reason for this split is to allow individual teams to iterate independently on their respective projects without undue interdependence. Ultimately, for the purposes of simplified Scale deployment and testing, these individual projects are all deployed into DCOS via the container that holds the Scale scheduler.

Scale has hard dependencies on a number of additional services, which are installed automatically in a default deployment:

  • PostgreSQL
  • RabbitMQ
  • Elasticsearch
  • Fluentd

HashiCorp Vault is an additional dependency if your system is going to leverage secrets; it must be tested before any release.
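
If you do deploy Vault, the root token produced at initialization is the value the scheduler later consumes as SECRETS_TOKEN. A minimal sketch of capturing it, assuming the vault CLI is available and using the in-cluster address that appears later in this walkthrough; the exact init output format can vary by Vault version:

```shell
# Extract the root token from `vault operator init` output; this is the
# value you will later supply to the scheduler as SECRETS_TOKEN.
root_token_from_init() {
  sed -n 's/^Initial Root Token: *//p'
}

# Usage against a freshly deployed Vault (address matches the SECRETS_URL
# used later in this walkthrough):
#   export VAULT_ADDR="https://scale-vault.marathon.l4lb.thisdcos.directory:8200"
#   vault operator init -key-shares=1 -key-threshold=1 | tee init.txt
#   root_token_from_init < init.txt
```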

This is a high-level view of the system. Throughout the following walk-through, I'll try to call out the places where you can substitute other choices.

UI Only Deploy

The following steps are for performing a rolling update to an existing Scale deployment that only replaces the UI container. This container has two responsibilities: serving the compiled UI assets and providing a single routing entry point (via Nginx) that eliminates any CORS hurdles for the UI. The settings that govern the routing behavior can be found here: https://github.com/ngageoint/scale-ui/tree/master/docker

Assumptions:

  • DCOS 1.11+ cluster w/Admin login
  • DCOS package Marathon LB installed
  • DCOS Public Agents configured to support dynamic subdomains by having wildcard DNS entry
  • A previous run-through of the End-to-end Deployment Test section below
  • Experience deploying, scaling and removing services in DCOS

Checklist:

  • Create a new build of the Scale UI Docker image.
  • Edit the configuration of the scale-ui service within DCOS. A change to the Docker image JSON key value will trigger a redeploy.
  • Once the scale-ui service has gone healthy, you can open the address specified in the HAPROXY_0_VHOST label in your browser.
  • If the service does not go healthy, check the service stdout and stderr logs for any errors.
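
The image swap in the first two checklist items can also be driven from the DCOS CLI instead of the web UI. A sketch, assuming the DCOS CLI is installed and authenticated and that the Marathon app id is /scale-ui; NEW_TAG is a placeholder for your freshly built tag:

```shell
# bump_image <app-json> <new-image>: rewrite the Docker image reference in a
# Marathon app definition, writing the result to <app-json>.new
bump_image() {
  python3 - "$1" "$2" <<'EOF'
import json, sys

path, image = sys.argv[1], sys.argv[2]
with open(path) as f:
    app = json.load(f)
app["container"]["docker"]["image"] = image
with open(path + ".new", "w") as f:
    json.dump(app, f, indent=2)
EOF
}

# Usage against a live cluster:
#   dcos marathon app show /scale-ui > scale-ui.json
#   bump_image scale-ui.json geointdev/scale-ui:NEW_TAG
#   dcos marathon app update /scale-ui < scale-ui.json.new
```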

End-to-end Deployment Test

The following steps validate a build prior to a release:

Assumptions

  • DCOS 1.11+ cluster w/Admin login
  • DCOS package Marathon LB installed
  • DCOS Public Agents configured to support dynamic subdomains by having wildcard DNS entry
  • Experience deploying, scaling and removing services in DCOS
  • Installation of Postman or Newman for running the testing collection.
  • A full set of images (geointdev/scale, geointdev/scale-ui, geointdev/scale-fluentd and optionally geointdev/scale-vault) with matched tags.

Tips

  • Deploy speeds can be vastly improved by using an in-cluster Docker Hub mirror proxy. Read to the bottom of the README for how to adjust your Docker image references to accommodate it: https://github.com/gisjedi/docker-registry-mirror
  • Ensure DCOS is free of any scale-* services. Delete them all. This eliminates any legacy data from prior testing and makes your deploys progress much faster.
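
Cleaning out old services can be scripted rather than clicked through. A sketch, assuming the DCOS CLI is installed and authenticated; the removal loop is shown as a comment so you can review the matched ids before deleting anything:

```shell
# Print the ids of all Marathon apps whose id starts with /scale, reading the
# DCOS CLI's JSON app list from stdin.
scale_app_ids() {
  python3 -c '
import json, sys

for app in json.load(sys.stdin):
    if app["id"].startswith("/scale"):
        print(app["id"])
'
}

# Usage against a live cluster:
#   dcos marathon app list --json | scale_app_ids
#   dcos marathon app list --json | scale_app_ids | while read -r app; do
#     dcos marathon app remove "$app" --force
#   done
```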

Checklist

  1. Deploy Vault https://github.com/ngageoint/scale/tree/master/dockerfiles/vault. You may need to clear out the vault key in Zookeeper if it has been previously initialized. This can be done using Exhibitor APIs:
    curl -k 'https://omega.aisohio.net/exhibitor/exhibitor/v1/explorer/znode/vault' -X DELETE \
      -H 'netflix-ticket-number: 1' -H 'netflix-reason: redeploy vault' \
      -H 'netflix-user-name: meyerjd' --compressed
    
  2. Deploy a minimal marathon.json into DCOS via Services (https://your-dcos/services) or the DCOS CLI:
    {
      "healthChecks": [
        {
          "gracePeriodSeconds": 300,
          "intervalSeconds": 30,
          "timeoutSeconds": 20,
          "maxConsecutiveFailures": 3,
          "protocol": "COMMAND",
          "command": {
            "value": "ps -ef | grep 'manage.py scale_scheduler' | grep -v grep > /dev/null"
          }
        }
      ],
      "env": {
        "SCALE_VHOST": "scale.omega.aisohio.net",
        "SECRETS_TOKEN": "ROOT_TOKEN_FROM_VAULT",
        "SECRETS_URL": "https://scale-vault.marathon.l4lb.thisdcos.directory:8200",
        "DCOS_PACKAGE_FRAMEWORK_NAME": "scale",
        "ENABLE_BOOTSTRAP": "true",
        "ADMIN_PASSWORD": "admin"
      },
      "gpus": 0,
      "disk": 0,
      "mem": 1024,
      "cpus": 1,
      "args": ["scale_scheduler"],
      "container": {
        "volumes": [],
        "docker": {
          "image": "geointdev/scale",
          "forcePullImage": true,
          "privileged": false
        },
        "type": "DOCKER"
      },
      "instances": 1,
      "id": "/scale"
    }
    
  3. Wait for all services to become healthy. Some may cycle a few times due to the timing of dependent service launches, such as Fluentd waiting on Elasticsearch. You should see scale, scale-db, scale-elasticsearch, scale-fluentd, scale-rabbitmq, scale-ui and scale-webserver.
  4. Once all services are healthy in DCOS, you can browse to the location you specified in the SCALE_VHOST environment variable. In Omega, that is: https://scale.omega.aisohio.net/
  5. The Scale UI should appear and prompt you for a login. Use the default superuser admin username (admin) and password (admin) as configured via the ADMIN_PASSWORD environment variable.
  6. The next step is to ensure that authentication worked and identifies you properly. This can be done by clicking the avatar in the top-right of the UI; it should list you as Admin User.
  7. Verify that all nodes have gone through cleanup phase and are healthy. https://scale.omega.aisohio.net/system/nodes?active=true&ready=true&paused=true&deprecated=true&offline=true&degraded=true&initial_cleanup=true&image_pull=true&scheduler_stopped=true&collapsed=true
  8. Verify that the message handlers are up and error-free: https://omega.aisohio.net/#/services/detail/%2Fscale/tasks?q=is%3Aactive+message (check logs link)
  9. Drop test data into a location for scanning. /nas/DCOS/omega/testing/input/ should be used, and test data can be found at /nas/Data/happy-couples.
  10. With an initial system configured, you can leverage our Postman (Newman) collection and environment, which use a host workspace type, to test an end-to-end processing pipeline.
    wget https://raw.githubusercontent.com/ngageoint/scale/master/tests/postman/environment.json
    wget https://raw.githubusercontent.com/ngageoint/scale/master/tests/postman/full-host-test.json
    newman run -k -e environment.json full-host-test.json
    
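Waiting for services to come up is the slow part between steps. A small retry helper can gate the newman run on the Scale API actually answering; this is a sketch, and the /api/v6/status/ path is an assumption to check against your Scale API version:

```shell
# wait_for <attempts> <command...>: run a command until it succeeds, pausing
# a second between tries; returns non-zero if all attempts fail.
wait_for() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (vhost from this walkthrough; -f makes curl fail on HTTP errors):
#   wait_for 60 curl -skf -o /dev/null https://scale.omega.aisohio.net/api/v6/status/ \
#     && newman run -k -e environment.json full-host-test.json
```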