PromQL-driven VM placement optimization engine for OpenStack
Kronos evaluates Prometheus metrics per Nova host aggregate and plans live migrations to balance (spread) or consolidate (pack) workloads. Multiple policies on the same aggregate are combined into a single weighted score, so the planner can trade off memory and CPU (or any other PromQL-driven dimensions) simultaneously.
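The combination itself is just a weighted sum over per-policy scores. A minimal sketch (invented names, not the engine's actual code):

```python
def combined_imbalance(per_policy_scores: dict[str, float],
                       weights: dict[str, float]) -> float:
    """Combine per-policy imbalance scores (each in [0, 1]) into one
    weighted score. Weights must sum to 1.0, so the result also stays
    in [0, 1]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("policy weights must sum to 1.0")
    for name, score in per_policy_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"policy {name!r} violated the [0, 1] contract")
    return sum(weights[name] * score
               for name, score in per_policy_scores.items())

# Example: memory dominates the combined score (weight 0.7 vs 0.3).
score = combined_imbalance({"cpu-spread": 0.20, "memory-spread": 0.10},
                           {"cpu-spread": 0.3, "memory-spread": 0.7})
# 0.3 * 0.20 + 0.7 * 0.10 = 0.13
```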
When dry-run is disabled, the engine casts migration tasks to a per-aggregate RPC topic via oslo.messaging. A dedicated executor daemon consumes the tasks and carries them out through the Nova live-migrate API.
Status: Pre-alpha. Not yet ready for production.
```
+-----------+          +------+          +----------+
| Prometheus|          | Nova |          | RabbitMQ |
+-----+-----+          +---+--+          +-----+----+
      |                    |                   |
 PromQL queries     host aggregates            |
      |                    |                   |
+-----v--------------------v--+                |
|        kronos-engine        |                |
|  for each aggregate:        |                |
|    score all policies       |                |
|    combined imbalance       |                |
|    profile all VMs          |                |
|    enforce affinity rules   |                |
|    plan combined moves      |                |
+--------------+--------------+                |
               |                               |
     MigrationTask per step                    |
     RPC cast ────────────────────────────────►|
                                               |
+----------------------------------------------v--+
|                 kronos-executor                  |
|  consume → pre-flight → live-migrate → poll     |
|          → post-flight → publish result          |
+--------------------------------------------------+
```
- Policies define PromQL queries, thresholds, and scheduling modes. All policies in one file apply to every aggregate the engine manages.
- Scorer runs each policy's PromQL imbalance query against the aggregate's host list, enforces the [0, 1] contract, and detects imbalance.
- Profiler collects per-VM resource weights across all policies in one pass. Each VM carries a per-policy weight dict.
- Combined scoring: the planner simulates moves against every policy's scores simultaneously, minimizing a weighted sum of imbalances (policy weight values sum to 1.0).
- Constraint checker respects all four Nova server-group placement policies: `affinity`, `anti-affinity`, `soft-affinity`, and `soft-anti-affinity`. A move that would break any of them is rejected.
- Affinity enforcer (optional) runs before the planner and proposes migrations to repair existing server-group violations. Enabled per policy class via `[engine] enforce_hard_affinity` and `enforce_soft_affinity`. Destinations are picked to minimize the combined imbalance and never cross a policy threshold. Repair and imbalance moves share a single `max_migrations_per_cycle` budget.
- Cooldown tracker prevents oscillation via aggregate-level and instance-level cooldowns, and quarantines VMs whose migration has definitively failed so the planner stops re-proposing them.
- Executor consumes migration tasks, validates pre-flight state, calls Nova live-migrate, polls until completion, and verifies post-flight.
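For illustration, the hard half of that constraint check can be sketched as follows. The function name and data shapes are invented, and the real checker also scores the soft variants rather than rejecting on them:

```python
def violates_hard_policy(vm: str, dest_host: str, group_policy: str,
                         group_members_by_host: dict[str, set[str]]) -> bool:
    """Return True if moving `vm` to `dest_host` would break a hard
    server-group policy. `group_members_by_host` maps each host to the
    members of the VM's server group currently on it (excluding `vm`)."""
    peers_on_dest = group_members_by_host.get(dest_host, set())
    if group_policy == "anti-affinity":
        # No two group members may share a host.
        return len(peers_on_dest) > 0
    if group_policy == "affinity":
        # All group members must share one host: the move is only legal
        # if every other member already sits on the destination.
        all_peers = set().union(*group_members_by_host.values())
        return all_peers != peers_on_dest
    return False  # soft policies never hard-reject a move

# vm-1's anti-affinity peer vm-2 already lives on host-b, so this is rejected.
blocked = violates_hard_policy("vm-1", "host-b", "anti-affinity",
                               {"host-b": {"vm-2"}})
```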
- Python 3.12+
- OpenStack cloud with Nova and Keystone
- Prometheus with host-level metrics (e.g., `node_exporter`, `libvirt_exporter`)
- RabbitMQ (the existing OpenStack broker); only needed when `dry_run = false`
```shell
git clone https://github.com/kronos-openstack/kronos.git
cd kronos
pip install -e .
```

Kronos uses two configuration files:
| File | Format | Purpose |
|---|---|---|
| `kronos.conf` | INI (oslo.config) | Daemon settings: intervals, Prometheus URL, Nova auth, messaging, executor |
| `policies.yaml` | YAML (Pydantic) | PromQL queries, thresholds, scheduling modes |
Copy the samples and edit them:
```shell
sudo mkdir -p /etc/kronos
sudo cp etc/kronos/kronos.conf.sample /etc/kronos/kronos.conf
sudo cp etc/kronos/policies.yaml.sample /etc/kronos/policies.yaml
```

Minimal `kronos.conf`:
```ini
[engine]
evaluation_interval = 60
dry_run = true
policies_file = /etc/kronos/policies.yaml

# Aggregate scope: at least one of `aggregates` or
# `include_unassigned_hosts = true` must be set.
aggregates = my-aggregate
include_unassigned_hosts = false

# Cooldowns (seconds)
cooldown = 600
instance_cooldown = 900

# Quarantine window applied to a VM after its migration definitively
# failed (retries exhausted with PreFlightError / MigrationFailed /
# MigrationTimeout). Use -1 for indefinite quarantine.
instance_quarantine_seconds = 3600

# Optional: repair existing server-group violations every cycle.
# Both off by default.
enforce_hard_affinity = false
enforce_soft_affinity = false

[prometheus]
url = http://prometheus:9090

[nova]
auth_type = password
auth_url = http://keystone:5000/v3
username = kronos
password = secret
project_name = service
user_domain_name = Default
project_domain_name = Default

[messaging]
transport_url = rabbit://guest:guest@localhost:5672/

[executor]
max_concurrent_migrations = 2
migration_timeout = 600
max_retries = 3
stagger_seconds = 30
```

Minimal `policies.yaml`:
Aggregates live on the engine (`[engine] aggregates`), not on the policy. Enabled policy weights must sum to 1.0, and all policies in one file must share a mode (`spread` or `pack`).
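These file-level rules are easy to sanity-check in plain Python. A sketch only; the real loader enforces them through Pydantic models:

```python
def validate_policies(policies: list[dict]) -> None:
    """Raise ValueError if the policy file breaks a file-level rule."""
    if not policies:
        raise ValueError("at least one policy is required")
    if len({p["mode"] for p in policies}) != 1:
        raise ValueError("all policies in one file must share a mode")
    if abs(sum(p["weight"] for p in policies) - 1.0) > 1e-9:
        raise ValueError("enabled policy weights must sum to 1.0")
    for p in policies:
        if not 0.0 <= p["threshold"] <= 1.0:
            raise ValueError(f"{p['name']}: threshold must be in [0, 1]")

# Mirrors the two sample policies below; this call passes silently.
validate_policies([
    {"name": "cpu-spread", "mode": "spread", "weight": 0.3, "threshold": 0.05},
    {"name": "memory-spread", "mode": "spread", "weight": 0.7, "threshold": 0.10},
])
```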
```yaml
policies:
  - name: cpu-spread
    mode: spread
    weight: 0.3
    imbalance_query: |
      1 - avg by (nodename) (
        rate(node_cpu_seconds_total{mode="idle"}[5m])
        * on(instance) group_left(nodename)
        node_uname_info
      )
    host_label: nodename
    vm_profile_query: |
      rate(libvirt_domain_info_cpu_time_seconds_total[5m])
      * on(domain, instance) group_left(instance_id)
      libvirt_domain_openstack_info
    vm_profile_label: instance_id
    vm_profile_label_type: nova_instance_uuid
    vm_profile_fallback: host_average
    threshold: 0.05
    max_migrations_per_cycle: 3

  - name: memory-spread
    mode: spread
    weight: 0.7
    imbalance_query: |
      1 - avg by (nodename) (
        node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
        * on(instance) group_left(nodename)
        node_uname_info
      )
    host_label: nodename
    vm_profile_query: |
      libvirt_domain_memory_stats_rss_bytes
      / on(instance) group_left()
      label_replace(node_memory_MemTotal_bytes, "instance", "$1:9177", "instance", "(.+):.*")
      * on(domain, instance) group_left(instance_id)
      libvirt_domain_openstack_info
    vm_profile_label: instance_id
    vm_profile_label_type: nova_instance_uuid
    vm_profile_fallback: skip
    threshold: 0.10
    max_migrations_per_cycle: 3
```

```shell
# Validate configuration and test connectivity
kronos-test-config --config-file /etc/kronos/kronos.conf

# Start the engine (dry-run by default)
kronos-engine --config-file /etc/kronos/kronos.conf

# Start the executor for a specific aggregate (requires dry_run = false)
kronos-executor --config-file /etc/kronos/kronos.conf --aggregate my-aggregate

# Or for the unassigned-hosts pool (clusters without aggregates)
kronos-executor --config-file /etc/kronos/kronos.conf --unassigned
```

Capture a snapshot of live OpenStack + Prometheus state and replay it locally:
```shell
# Record
kronos-record --config-file /etc/kronos/kronos.conf /tmp/snapshot

# Replay a single engine cycle against recorded data
kronos-replay --config-file /etc/kronos/kronos.conf /tmp/snapshot
```

| Mode | Behavior |
|---|---|
| `spread` | Balance load evenly across hosts: a greedy combined-score simulation picks the best single move per round |
| `pack` | Consolidate VMs onto fewer hosts: First Fit Decreasing on combined utilization |
All policies in one file must share a mode. Migrations never cross aggregate boundaries.
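As a rough illustration of pack mode, First Fit Decreasing over abstract utilization fractions might look like this (a toy model; the real planner packs on combined per-policy scores and respects constraints and cooldowns):

```python
def pack_ffd(vm_weights: dict[str, float], hosts: list[str],
             capacity: float = 1.0) -> dict[str, str]:
    """Assign VMs to the fewest hosts: sort VMs by descending weight,
    then place each one on the first host with room."""
    load = {h: 0.0 for h in hosts}
    placement: dict[str, str] = {}
    for vm, w in sorted(vm_weights.items(), key=lambda kv: -kv[1]):
        for host in hosts:
            if load[host] + w <= capacity:
                load[host] += w
                placement[vm] = host
                break
        else:
            raise ValueError(f"no host can fit {vm}")
    return placement

plan = pack_ffd({"vm-a": 0.6, "vm-b": 0.5, "vm-c": 0.4}, ["host-1", "host-2"])
# vm-a -> host-1, vm-b -> host-2, vm-c -> host-1
```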
One engine owns a set of aggregates (or the unassigned-hosts pool) and evaluates all enabled policies against each aggregate every cycle:
- Score: each policy runs its PromQL imbalance query; values must be in [0, 1]
- Profile: collect per-VM resource weights across all policies in one pass
- Constrain: reject any move that would break a Nova server-group placement rule
- Enforce (optional): when `enforce_hard_affinity` / `enforce_soft_affinity` is set, propose repair moves for VMs already violating their groups
- Plan: simulate moves minimizing the weighted combined imbalance, sharing the per-cycle migration budget with the enforcer
- Cast: send a `MigrationTask` over RPC to `kronos.migrations.<aggregate>`. Each task carries a `phase` field (`affinity`, `spread`, or `pack`) that surfaces in logs so operators can see why each migration was proposed
- Cooldown: record aggregate-level and instance-level cooldowns on plan emission; skip VMs already in cooldown or quarantine on the next cycle
- Result listener: subscribe to `kronos.results.<aggregate>` and quarantine VMs on a definitive failure (`PreFlightError`, `MigrationFailed`, `MigrationTimeout`) so the planner stops re-proposing them
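The cooldown and quarantine bookkeeping reduces to timestamp maps. A toy sketch with an injectable clock (invented class and method names; the real tracker also covers aggregate-level cooldowns):

```python
import time

class CooldownTracker:
    """Toy cooldown/quarantine bookkeeping keyed by instance UUID."""

    def __init__(self, instance_cooldown: float, quarantine_seconds: float,
                 clock=time.monotonic):
        self.instance_cooldown = instance_cooldown
        self.quarantine_seconds = quarantine_seconds  # -1 means indefinite
        self._clock = clock
        self._cooled_until: dict[str, float] = {}
        self._quarantined_until: dict[str, float] = {}

    def record_plan(self, vm: str) -> None:
        # Called when a migration for `vm` is emitted in a plan.
        self._cooled_until[vm] = self._clock() + self.instance_cooldown

    def record_definitive_failure(self, vm: str) -> None:
        # Called on PreFlightError / MigrationFailed / MigrationTimeout
        # once retries are exhausted.
        self._quarantined_until[vm] = (
            float("inf") if self.quarantine_seconds < 0
            else self._clock() + self.quarantine_seconds
        )

    def may_migrate(self, vm: str) -> bool:
        now = self._clock()
        return (self._cooled_until.get(vm, 0.0) <= now
                and self._quarantined_until.get(vm, 0.0) <= now)

# Deterministic demo with an injected fake clock:
now = [0.0]
tracker = CooldownTracker(instance_cooldown=900, quarantine_seconds=3600,
                          clock=lambda: now[0])
tracker.record_plan("vm-1")  # vm-1 now cools down until t=900
```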
One executor per aggregate consumes tasks from RabbitMQ:
- Schedule: priority queue sorted by `not_before` timestamps, with a semaphore bounding concurrency
- Pre-flight: verify the instance is ACTIVE, has no pending task_state, and is still on the source host
- Migrate: call the Nova live-migrate API
- Poll: check migration status until a terminal state or timeout
- Post-flight: confirm the instance landed on the destination host and is ACTIVE
- Retry: on failure, re-cast with exponential backoff (up to `max_retries`)
- Report: publish a `MigrationResult` notification on `kronos.results.<aggregate>`
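The retry step boils down to computing the next `not_before` timestamp with exponential backoff. A sketch with illustrative numbers (the executor's actual base delay and cap may differ):

```python
def next_not_before(now: float, attempt: int, base_delay: float = 30.0,
                    max_delay: float = 600.0) -> float:
    """Exponential backoff: attempt 0 waits base_delay, each retry
    doubles the wait, capped at max_delay."""
    return now + min(base_delay * (2 ** attempt), max_delay)

# Waits of 30s, 60s, 120s for attempts 0..2 (i.e. max_retries = 3).
delays = [next_not_before(0.0, a) for a in range(3)]  # [30.0, 60.0, 120.0]
```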
| Topic | Primitive | Publisher | Consumer |
|---|---|---|---|
| `kronos.migrations.<aggregate>` | RPC cast | Engine | Executor (competing consumers) |
| `kronos.results.<aggregate>` | Notification | Executor | Engines (broadcast: active and passive engines both update cooldown and quarantine state) |
The unassigned-hosts pool uses the reserved name `_unassigned_` in its topics.
```
kronos/
├── cmd/        CLI entry points (kronos-engine, kronos-executor, kronos-test-config, kronos-record, kronos-replay)
├── common/     Shared utilities, exceptions, oslo.config registration, oslo.messaging helpers
├── policies/   Pydantic models and YAML loader for policy definitions
├── clients/    Prometheus HTTP client, Nova/OpenStack client (read + live-migrate)
├── engine/     Control loop, scoring, profiling, constraint checking, affinity enforcement, planning, cooldown tracking
└── executor/   Migration executor: worker, scheduler, migration runner
tools/          Operational helpers (e.g. generate_fake_snapshot.py for benchmarks)
```
```shell
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check kronos/ tests/

# Type check
mypy kronos/
```

Generate a synthetic snapshot in the same shape `kronos-record` writes, then replay it with timings to measure planner performance without needing a real cluster:
```shell
python tools/generate_fake_snapshot.py \
    --hosts 50 --vms 5000 --groups 100 --seed 42 \
    /tmp/snapshot-fake

# Point [engine] policies_file at /tmp/snapshot-fake/policies.yaml, then:
kronos-replay --config-file /tmp/kronos.conf --time /tmp/snapshot-fake
```

`--time` prints per-phase wall-clock timings (scorer, profiler, enforcer, planner) so you can see where cycles are spent.
| Milestone | Scope | Status |
|---|---|---|
| M1 | Project skeleton, oslo.config, clients, dry-run engine loop | Done |
| M2 | VM profiling, simulation-based migration planning, constraint checking, record/replay | Done |
| M3 | oslo.messaging queue, migration executor, cooldown tracking | Done |
| M3.5 | Affinity enforcer, all four server-group policies, phase-tagged steps, planner perf, benchmarks | Done |
| M4 | HA via tooz distributed locks, active-passive engines and executors, distributed rate limiting | Planned |
| M5 | Audit logging (append-only JSONL) and general logging cleanup | Planned |
| M6 | PyPI packaging, container image, systemd units, documentation | Planned |
Independent named milestones (any order):
| Name | Scope |
|---|---|
| Pack-rework | Pack mode redesign + compactor parity (post_drain_action, ha_reserve, host re-enable) |
| Executor-multi-aggregate | One executor process can service multiple aggregates (one thread per topic) |
Apache 2.0; see LICENSE.