Eliminate cloud instance cold-start delays with pre-warmed, instantly-ready nodes.
Documentation • Issues • Quick Start
Stratos is a Kubernetes operator that eliminates cloud instance cold-start delays by maintaining pools of pre-warmed, stopped instances ready to start in seconds. Instead of waiting 3-5 minutes for new nodes to provision, boot, and initialize, Stratos enables sub-minute scale-up times by keeping instances in a "warm standby" state.
Spinning up a new cloud instance typically takes 3-5 minutes:
- Instance provisioning - Cloud provider allocates resources
- OS boot - Operating system initialization
- Kubernetes join - Node registers with the cluster
- CNI setup - Network plugin initialization
- Application initialization - User data scripts, image pulls
For time-sensitive workloads like CI/CD pipelines, autoscaling events, or burst traffic handling, this delay is unacceptable.
Stratos maintains a pool of pre-warmed, stopped instances using a four-phase lifecycle:
```
warmup --> standby --> running --> stopping
              ^                        |
              |________________________|
```
- Warmup - Stratos launches instances that run initialization scripts (join cluster, pull images, configure networking) and self-stop when ready
- Standby - Stopped instances wait in the pool, costing only storage (no compute charges)
- Running - When pods are pending, Stratos instantly starts standby nodes (seconds, not minutes)
- Stopping - Empty nodes are drained and returned to standby for reuse
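As a rough sketch of that cycle, the Go snippet below encodes the allowed phase transitions as a lookup table. The phase names follow the list above, but the types and package layout are illustrative assumptions, not Stratos's actual internals.

```go
package main

import "fmt"

// Phase names mirror the four lifecycle stages described above.
// These types are illustrative only; they are not Stratos's real API.
type Phase string

const (
	Warmup   Phase = "warmup"
	Standby  Phase = "standby"
	Running  Phase = "running"
	Stopping Phase = "stopping"
)

// transitions captures the cycle: warmup feeds the standby pool once, then
// nodes loop standby -> running -> stopping -> standby as they are reused.
var transitions = map[Phase][]Phase{
	Warmup:   {Standby},
	Standby:  {Running},
	Running:  {Stopping},
	Stopping: {Standby},
}

func main() {
	for from, to := range transitions {
		fmt.Printf("%s -> %v\n", from, to)
	}
}
```

The warmup phase runs once per instance; after that, nodes cycle standby -> running -> stopping -> standby without being re-provisioned.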
- Sub-minute scale-up - Start pre-warmed nodes in seconds instead of minutes
- Cost efficient - Stopped instances only incur storage costs, not compute
- Kubernetes native - Declarative NodePool and NodeClass CRDs, integrates with existing clusters
- CNI-aware - Properly handles startup taints for VPC CNI, Cilium, Calico
- Automatic maintenance - Pool replenishment, node recycling, state synchronization
- Observable - Prometheus metrics for all operations
- Kubernetes cluster (1.26+)
- Helm 3.x
- AWS credentials configured (for EC2 operations)
```bash
helm install stratos oci://ghcr.io/stratos-sh/charts/stratos \
  --namespace stratos-system --create-namespace \
  --set clusterName=my-cluster
```

AWSNodeClass (cloud-specific configuration):
```yaml
apiVersion: stratos.sh/v1alpha1
kind: AWSNodeClass
metadata:
  name: workers
spec:
  region: us-east-1
  instanceType: m5.large
  ami: ami-0123456789abcdef0
  subnetIds: ["subnet-12345678"]
  securityGroupIds: ["sg-12345678"]
  iamInstanceProfile: arn:aws:iam::123456789:instance-profile/node-role
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=node.eks.amazonaws.com/not-ready=true:NoSchedule'
    until curl -sf http://localhost:10248/healthz; do sleep 5; done
    sleep 30
    poweroff
```

NodePool (references the AWSNodeClass):
```yaml
apiVersion: stratos.sh/v1alpha1
kind: NodePool
metadata:
  name: workers
spec:
  poolSize: 10
  minStandby: 3
  template:
    nodeClassRef:
      kind: AWSNodeClass
      name: workers
    labels:
      stratos.sh/pool: workers
    startupTaints:
      - key: node.eks.amazonaws.com/not-ready
        value: "true"
        effect: NoSchedule
```

Verify:
```bash
kubectl get awsnodeclasses,nodepools
```

```
+------------------+     +-------------------+     +------------------+
|   NodePool CRD   | --> | Stratos Controller| --> |  Cloud Provider  |
|  (Desired State) |     |   (Reconciler)    |     |    (AWS EC2)     |
+------------------+     +-------------------+     +------------------+
         |                         |
         v                         v
+------------------+        +-------------+
| AWSNodeClass CRD |        |  K8s Nodes  |
|  (Cloud Config)  |        |  (Managed)  |
+------------------+        +-------------+
```
NodePools reference a cloud-specific NodeClass (e.g., AWSNodeClass) that contains instance configuration. This separation allows multiple NodePools to share the same cloud configuration.
The controller watches for:
- NodePool changes - Create/update/delete pools
- NodeClass changes - Cloud configuration updates (e.g., AWSNodeClass)
- Pending pods - Trigger scale-up when pods can't be scheduled (see the sketch after this list)
- Node state changes - Track node lifecycle and health
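As a simplified illustration of the pending-pod trigger, the client-go sketch below lists pods stuck in Pending and keeps those the scheduler marked Unschedulable. It is a stand-in for the controller's watch-based reconciliation, not Stratos's actual code.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// unschedulablePods returns pending pods the scheduler could not place --
// the condition that would prompt Stratos to start a standby node.
func unschedulablePods(ctx context.Context, cs kubernetes.Interface) ([]corev1.Pod, error) {
	pending, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "status.phase=Pending",
	})
	if err != nil {
		return nil, err
	}
	var out []corev1.Pod
	for _, pod := range pending.Items {
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodScheduled &&
				cond.Status == corev1.ConditionFalse &&
				cond.Reason == "Unschedulable" {
				out = append(out, pod)
			}
		}
	}
	return out, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	pods, err := unschedulablePods(context.Background(), cs)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d unschedulable pod(s) would trigger a scale-up\n", len(pods))
}
```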
Traditional autoscalers don't just make you wait for a node to boot — they give you a completely cold environment. Every pipeline run pulls all DaemonSet images from scratch, then pulls the CI agent image, and every docker build or npm install starts with an empty cache. Stratos nodes come pre-warmed with all DaemonSet images already pulled, and since nodes are reused (stopped and restarted rather than terminated), build caches, Docker layer caches, and package manager caches persist across runs. Your second pipeline run is dramatically faster than the first.
Large model images (often 10-50GB+) make cold starts painfully slow. Downloading a model, loading it into GPU memory, and running health checks can take 10+ minutes before the first request is served. With Stratos, the model image is pre-pulled during the warmup phase and persists on the node's EBS volume. When demand spikes, a standby node starts in seconds with the model image already on disk — cutting startup time from minutes to seconds.
Stratos's ~20-second pending-to-running time (when properly configured) makes true scale-to-zero viable for latency-sensitive services. Pair a simple ingress doorman with a 30-second timeout: when a request arrives at a scaled-down service, the doorman holds the connection while Stratos starts a standby node, and the request completes within the timeout window. No idle compute costs, no cold-start frustration.
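Stratos does not ship a doorman, but one can be very small. The Go sketch below is a hypothetical example: a reverse proxy that, on each request, waits up to 30 seconds for an assumed backend address to accept connections before forwarding, covering the window in which Stratos starts a standby node.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

const (
	backendAddr = "my-service.default.svc.cluster.local:80" // assumed backend; adjust for your Service
	waitBudget  = 30 * time.Second                          // matches the timeout discussed above
)

// waitForBackend polls the backend's TCP port until it accepts connections
// or the budget is exhausted.
func waitForBackend(addr string, budget time.Duration) bool {
	deadline := time.Now().Add(budget)
	for time.Now().Before(deadline) {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err == nil {
			conn.Close()
			return true
		}
		time.Sleep(time.Second)
	}
	return false
}

func main() {
	target, err := url.Parse("http://" + backendAddr)
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Hold the connection while the backend (and its node) comes up.
		if !waitForBackend(backendAddr, waitBudget) {
			http.Error(w, "backend did not become ready in time", http.StatusGatewayTimeout)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```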
Full documentation is available at stratos-sh.github.io/stratos
- Getting Started - Installation and quickstart
- Concepts - Architecture and node lifecycle
- Guides - AWS setup, scaling policies, monitoring
- API Reference - NodePool and AWSNodeClass CRDs
```bash
cd docs
npm install
npm start
```

The documentation site will be available at http://localhost:3000/stratos/.
```bash
make build
```

```bash
# With fake cloud provider (for testing)
go run ./cmd/stratos/main.go --cluster-name=main --cloud-provider=fake

# With AWS
go run ./cmd/stratos/main.go --cluster-name=main --cloud-provider=aws
```

```bash
# Unit tests
make test

# Integration tests (requires envtest setup)
make test-integration

# Coverage report
make coverage
```

Stratos is currently in alpha development. The API may change between versions.
Apache License 2.0
