Compute Energy & Emissions Monitoring Stack (CEEMS)



Compute Energy & Emissions Monitoring Stack (CEEMS, pronounced kiːms) contains a Prometheus exporter that exports metrics of compute units and a REST API server that serves the metadata and aggregated metrics of each compute unit. Optionally, it includes a TSDB load balancer that supports basic access control on the TSDB, so that one user cannot access the metrics of another user.

A "compute unit" has a broad scope in this context: it can be a batch job in HPC, a VM in a cloud, a pod in k8s, etc. The main objective of the repository is to quantify the energy consumed, and estimate the emissions produced, by each compute unit. The repository itself does not provide any frontend apps to show dashboards; it is meant to be used along with Grafana and Prometheus to show statistics to users.

Although CEEMS was born out of a need to monitor the energy and carbon footprint of compute workloads, it supports monitoring performance metrics as well. In addition, it leverages the eBPF framework to monitor IO and network metrics in a resource-manager-agnostic way.
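
To make the energy-to-emissions idea concrete, here is a minimal sketch of the usual estimate: energy consumed multiplied by an emission factor for the electricity used. It is not taken from the CEEMS code base; the function name, the units and the example numbers are assumptions chosen for illustration.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def estimate_emissions_grams(energy_joules: float,
                             emission_factor_g_per_kwh: float) -> float:
    """Estimate emissions (gCO2e) for a compute unit.

    energy_joules: energy attributed to the compute unit (assumed in joules).
    emission_factor_g_per_kwh: emission factor of the electricity used,
        in gCO2e per kWh (hypothetical input, e.g. from a grid data source).
    """
    if energy_joules < 0 or emission_factor_g_per_kwh < 0:
        raise ValueError("energy and emission factor must be non-negative")
    return energy_joules / JOULES_PER_KWH * emission_factor_g_per_kwh


if __name__ == "__main__":
    # Hypothetical batch job: 7.2 MJ of energy at 50 gCO2e/kWh.
    print(estimate_emissions_grams(7.2e6, 50.0))
```

In practice the emission factor varies with location and time, so a real estimate would likely apply it per time interval rather than once per compute unit.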

Features

Monitors energy, performance, IO and network metrics for different types of resource managers (SLURM, OpenStack, k8s)
Supports NVIDIA (MIG and vGPU) and AMD GPUs
Provides targets to Grafana Alloy using an HTTP Discovery Component to continuously profile compute units
Provides real-time access to metrics via Grafana dashboards
Provides access control to the Prometheus datasource in Grafana
Stores aggregated metrics in a separate DB that can be retained for a long time
CEEMS apps are capability aware
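
As an illustration of the HTTP discovery feature listed above, the sketch below builds a response in the JSON shape used by Prometheus-style HTTP service discovery: a list of target groups, each with `targets` and string-valued `labels`. It assumes the discovery endpoint follows that shape; the helper name, the label names (`uuid`, `manager`) and the example address are hypothetical and not taken from CEEMS.

```python
import json


def build_target_groups(units):
    """Turn compute unit records into HTTP-SD style target groups.

    units: iterable of dicts with hypothetical keys
        "address" (host:port), "uuid" and "manager".
    """
    groups = []
    for unit in units:
        groups.append({
            "targets": [unit["address"]],
            # Label values must be strings in this format.
            "labels": {
                "uuid": str(unit["uuid"]),
                "manager": str(unit["manager"]),
            },
        })
    return groups


if __name__ == "__main__":
    example_units = [
        {"address": "node-1:9010", "uuid": 12345, "manager": "slurm"},
    ]
    print(json.dumps(build_target_groups(example_units), indent=2))
```

Emitting one target group per compute unit keeps the labels of each unit separate, which is what allows a profiler or scraper to attribute its data to a single unit.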

Install CEEMS

Warning

DO NOT USE pre-release versions as the API has changed quite a lot between the pre-release and stable versions.

Installation instructions for the CEEMS components can be found in the docs.

Visualizing metrics with Grafana

CEEMS is meant to be used with Grafana for visualization. Below are some screenshots of the dashboards.

Time series compute unit CPU metrics

Time series compute unit GPU metrics

List of compute units of user with aggregate metrics

Aggregate usage metrics of a user

Talks and Demos

Contributing

We welcome contributions to this project. We hope to see it grow and become a useful tool for people who are interested in the energy and carbon footprint of their workloads.

Please feel free to open issues and/or discussions for any ideas for improvement.

