Docker EE Operational Checklist

THE ENTERPRISE IT CHECKLIST FOR DOCKER OPERATIONS

Version: 17.06

Source: https://github.com/nicolaka/checklist

docker run -t nicolaka/checklist:17.06


☑ Infrastructure

  • Cluster Sizing and Zoning
  • Supported and compatible versions (OS, Docker Engine, UCP, DTR)
  • Adequate resources (manager vs. worker nodes)
    • Manager: 16 GB memory, 4 vCPU, 1+ Gbps network, 32+ GB disk
    • Worker (minimum): 4 GB memory, 2 vCPU, 100+ Mbps network, 8 GB disk
  • Resources
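
The sizing figures above can be turned into an automated pre-flight check. The following is a minimal sketch, not part of the original checklist: the `NodeSpec` type and the `shortfalls` function are hypothetical names, and it assumes you have already gathered each node's resources by some other means.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeSpec:
    mem_gb: float
    vcpus: int
    net_mbps: int
    disk_gb: int


# Figures copied from the checklist above (1 Gbps expressed as 1000 Mbps).
MINIMUMS = {
    "manager": NodeSpec(mem_gb=16, vcpus=4, net_mbps=1000, disk_gb=32),
    "worker": NodeSpec(mem_gb=4, vcpus=2, net_mbps=100, disk_gb=8),
}


def shortfalls(role: str, actual: NodeSpec) -> list[str]:
    """Return one message per resource that is below the figure for `role`."""
    required = MINIMUMS[role]
    problems = []
    for f in fields(NodeSpec):
        have, need = getattr(actual, f.name), getattr(required, f.name)
        if have < need:
            problems.append(f"{f.name}: {have} < {need}")
    return problems


if __name__ == "__main__":
    # A worker with too little memory is reported; everything else passes.
    print(shortfalls("worker", NodeSpec(mem_gb=2, vcpus=2, net_mbps=100, disk_gb=8)))
```

An empty list means the node meets every figure for its role.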

☑ Orchestration Management

☑ Image Distribution

☑ Security

☑ Network

  • Pick the right networking driver for your application
  • Select the proper publishing mode (ingress vs. host mode)
  • Pick a suitable load-balancing mode (client-side = dnsrr, server-side = vip)
  • Network latency < 100 ms
  • Segment applications at L3 with overlays (one overlay network per application)
  • Use the built-in encrypted overlay feature (app <--> app traffic encrypted)
  • Pick the application subnet size carefully
  • Designate non-overlapping subnets for Docker to use for overlay networks
  • Resources:
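
The subnet items above lend themselves to a quick check before any overlay network is created. This is a minimal sketch using Python's standard `ipaddress` module; the function names `find_overlaps` and `max_hosts` are hypothetical, and the CIDR ranges in the usage note are made-up examples.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR strings whose address ranges overlap."""
    return [
        (a, b)
        for a, b in combinations(cidrs, 2)
        if ip_network(a).overlaps(ip_network(b))
    ]


def max_hosts(cidr):
    """Upper bound on assignable addresses: total minus network and broadcast.

    Docker itself consumes some of these (for example a gateway address),
    so treat the result as a ceiling, not an exact capacity.
    """
    return ip_network(cidr).num_addresses - 2
```

For example, `find_overlaps(["10.0.0.0/24", "10.0.1.0/24", "10.0.0.128/25"])` flags the first and third ranges as a conflicting pair, and `max_hosts("10.0.0.0/24")` gives a ceiling of 254 addresses for that overlay.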

☑ Storage

☑ Logging and Monitoring

  • External centralized logging for engine and application container logs
  • Local logging for active troubleshooting
  • Host-level and container-level resource monitoring
  • DTR image backend storage monitoring
  • Docker engine storage monitoring
  • Use built-in application health checking functionality
  • Resources:
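
As one way to feed the built-in health checks into monitoring, the sketch below reads the JSON array that `docker inspect` prints and lists containers whose `State.Health.Status` is `unhealthy`. The function names are hypothetical, and collecting the JSON (for example by running `docker inspect` on each host) is assumed to happen elsewhere.

```python
import json


def health_status(inspect_entry):
    """Return the health status of one inspect entry, or "none" if the
    container has no health check defined."""
    health = inspect_entry.get("State", {}).get("Health")
    if health is None:
        return "none"
    return health.get("Status", "none")


def unhealthy_containers(inspect_json):
    """Given the JSON text printed by `docker inspect`, return the names of
    containers currently reported as unhealthy."""
    entries = json.loads(inspect_json)
    return [e.get("Name", "?") for e in entries if health_status(e) == "unhealthy"]
```

Containers without a health check report `"none"`, which makes it easy to spot images that still need one.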

☑ Integration

  • UCP and DTR are well integrated (SSO, DCT, etc.)
  • CI/CD tooling (Jenkins, Bamboo, CircleCI, etc.)
  • Development tooling (dev machines, IDEs)
  • Configuration automation tools (Puppet, Chef, Ansible, Salt)
  • Resource provisioning systems (Terraform, etc.)
  • Change management systems
  • Internal/external DNS or other service discovery and registration systems
  • Load balancing for both the management plane and each of the applications (L4/L7)
  • Incident/ticketing management systems (ServiceNow, etc.)

☑ Disaster Recovery

☑ Testing

  • Multi-platform image pull and push to DTR
  • Confirm users have the correct access to their respective resources
  • Confirm application resource limits work as expected
  • End-to-end stack deployment from CLI and UI
  • Update applications with new configuration, images, and networks using rolling upgrades