Online force-sleep controller

Controller for monitoring and restricting resource usage of free tier accounts. Listens to events in a cluster and caches data on resources by namespace (project).

The PERIOD is a rolling timeframe during which pods' resource usage is considered.

QUOTA_HOURS refers to the maximum number of quota-hours usable within the PERIOD before the resource (or project) is put into force-sleep mode. A quota-hour is defined for pods as a pod using its full memory quota for one hour.

Pods' quota-hour usage within a project accumulates during the rolling PERIOD until QUOTA_HOURS is met. Once a project has reached the QUOTA_HOURS limit, a force-sleep quota with a hard limit of pods=0 is placed on the project and the project's scalable resources are scaled to 0 replicas. This quota persists for PROJECT_SLEEP_LENGTH. Upon removal of the force-sleep quota, services are placed in an idled state. Project deployments are scaled back up to their pre-sleep values when a service within that project receives network traffic, using the same logic as oc idle and the origin unidling controller.

Every SLEEP_SYNC_PERIOD, the cached data on each project is queried and the project's quota-hour usage is calculated. If necessary, force-sleep is then added to (or removed from) the project.

Every IDLE_SYNC_PERIOD, Prometheus metrics are queried to get the cumulative network traffic received by all pods in a project over the IDLE_QUERY_PERIOD. If the network traffic received is below a configured threshold, services in the project are idled: replication controllers, replica sets, deployments and deployment configs are scaled to 0 and all pods are deleted. Upon receiving network traffic, scalable resources within idled projects are scaled back to the replica count the RC/RS/Deployment/DC had before being idled. The auto-idler uses the same logic as oc idle and the origin unidling controller.
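The idling decision can be sketched as a per-project sum compared against the threshold. This is a hedged illustration: `ProjectsToIdle` and the nested map of per-pod received bytes are hypothetical, and the step that would actually query Prometheus is deliberately omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// ProjectsToIdle returns, in sorted order, the projects whose pods together
// received fewer than threshold bytes over the IDLE_QUERY_PERIOD.
// rxBytes maps project name -> pod name -> bytes received (hypothetical
// shape; in practice these values would come from a Prometheus query).
func ProjectsToIdle(rxBytes map[string]map[string]int64, threshold int64) []string {
	var idle []string
	for project, pods := range rxBytes {
		var total int64
		for _, b := range pods {
			total += b
		}
		if total < threshold {
			idle = append(idle, project)
		}
	}
	sort.Strings(idle) // map iteration order is random; sort for stable output
	return idle
}

func main() {
	traffic := map[string]map[string]int64{
		"quiet": {"web-1": 120, "web-2": 80},
		"busy":  {"api-1": 50000},
	}
	fmt.Println(ProjectsToIdle(traffic, 1000))
}
```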

The auto-idler queries Prometheus. Therefore, Prometheus must be deployed in the cluster to run the auto-idling controller.

Note: Prometheus has a default collection interval of 1 minute. A query period has to be at least
      twice that interval. Therefore, when testing this component, the IDLE_QUERY_PERIOD should
      never be set to less than 2 minutes. Prometheus will not report any projects as below the
      idling threshold if the query period is less than 2 minutes.

Usage - deploy in cluster with the following:

oc create -f template.yaml -n openshift-infra
oc process -n openshift-infra hibernation | oc apply -n openshift-infra -f -

glog levels generally follow this structure:

3: Resource/watch event level messages
2: Project/sleep/idle level messages
1: Sleeper/Idler/cluster level messages
