tele

dashboard and alerts for: system resources and application logs

Features

  • view timeseries graphs for: cpu, memory, storage, & networking
  • receive email alerts when a resource (cpu, memory, or storage) is in high use
  • ingest docker (rootless) container logs
  • ingest nginx access logs
  • query the logs

You're not required to use docker or nginx, or to set up email (needed only to receive alerts).

Requirements

The remote host must run NixOS with Nix flakes enabled.

Setup

assumptions for simplification:

  • docker containers run on same host
  • nginx runs on same host

promtail's scrape job for docker or nginx should fail safely if you're not running that service

NixOS Flake

Import

in your system flake.nix inputs, add

    tele.url = "github:jtara1/tele";

in your modules or imports list, append

    inputs.tele.nixosModules.default

where inputs is the first parameter of the function assigned to outputs.
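Put together, the two steps above might look like the following flake.nix. This is a minimal sketch, not the author's own configuration: the nixpkgs branch, the host name myhost, the system string, and ./configuration.nix are all assumptions for illustration.

```nix
{
  inputs = {
    # Assumption: any nixpkgs branch you already use works here.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    tele.url = "github:jtara1/tele";
  };

  # `inputs` is the first (and only) parameter of the outputs function.
  outputs = inputs: {
    # "myhost" is a hypothetical host name.
    nixosConfigurations.myhost = inputs.nixpkgs.lib.nixosSystem {
      system = "x86_64-linux"; # assumption: adjust to your architecture
      modules = [
        ./configuration.nix # assumption: your existing system config
        inputs.tele.nixosModules.default
      ];
    };
  };
}
```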

Configure

in a nix module, enable it with config such as

  services.tele = {
    enable = true;
    email = {
      host = "mail.example.com:587";
      senderAddress = "admin@example.com";
      receiverAddress = "admin@example.com";
      secretsFilePath = /etc/tele/secrets.json;
    };
  };

The email block is optional; it is used to send notification emails when alert rules fire.
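Since email is optional, the smallest configuration would presumably just enable the service. This sketch assumes the module accepts a config with the email attribute set omitted entirely, which follows from the text above but has not been checked against the module's option definitions.

```nix
{
  # Dashboard and log ingestion only; no alert emails are sent.
  # Assumption: services.tele.email may be left unset.
  services.tele = {
    enable = true;
  };
}
```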

secretsFilePath should be the path to a JSON file containing:

{
  "emailPlaintextPassword": "your-email-password"
}

Publish

On the host and its network, expose or redirect to http://localhost:3010. This is grafana, the dashboard for querying logs and viewing graphs.
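One way to do this on NixOS is an nginx reverse proxy in front of grafana. The following is a sketch under assumptions, not part of tele itself: the domain tele.example.com is hypothetical, and plain HTTP on port 80 is used only to keep the example short.

```nix
{
  services.nginx = {
    enable = true;
    # Forwards Host and related headers to the upstream.
    recommendedProxySettings = true;

    # "tele.example.com" is a hypothetical domain.
    virtualHosts."tele.example.com" = {
      locations."/" = {
        # grafana as started by tele, on its default port
        proxyPass = "http://localhost:3010";
        # grafana uses websockets for live updates
        proxyWebsockets = true;
      };
    };
  };

  # Open the HTTP port on the host firewall.
  networking.firewall.allowedTCPPorts = [ 80 ];
}
```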

Security

Secure the deployment before exposing it: change the grafana admin password, allowlist the IP addresses you access it from, and encrypt the channel with TLS or an equivalent.
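If grafana is already proxied through a NixOS nginx virtual host, TLS and an IP allowlist can be layered on with standard NixOS options. This is a sketch under assumptions: the domain tele.example.com, the contact address, and the client IP 203.0.113.7 are placeholders, and Let's Encrypt via ACME is only one of several ways to obtain a certificate.

```nix
{
  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@example.com"; # placeholder contact address
  };

  # These attributes merge with any other definition of the same virtual host.
  services.nginx.virtualHosts."tele.example.com" = {
    enableACME = true; # request a certificate via ACME
    forceSSL = true;   # redirect HTTP to HTTPS

    # Only the listed address may reach the dashboard.
    locations."/".extraConfig = ''
      allow 203.0.113.7;
      deny all;
    '';
  };

  # Port 80 stays open for the ACME HTTP challenge and the HTTPS redirect.
  networking.firewall.allowedTCPPorts = [ 80 443 ];
}
```

The allowlist and TLS do not replace changing the default grafana password; they only limit who can reach the login page.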

Usage

Go to the grafana URL, which defaults to http://localhost:3010

default login:

username: admin
password: admin

You can query logs, create visualizations, load a dashboard, or check alerts.

TODO

  • migrate everything to nixos config declaration
  • nixos config services.promtail
  • relabel docker apps logs for readability (container name instead of container id, the long-hashes)
  • .sh instead of ansible? Makefile instead of ansible? can import nix flake
  • promtail to ingest logs for non-rootless (default) docker containers
  • config to conditionally set docker daemon.json instead of docker rootless
  • improve promtail job, docker
  • re-add nginx access logs pipeline
  • prometheus for system monitoring and metrics
  • 1st alerts rule for memory
  • alert notifications via SMTPS/email
  • alert rules for core resources: cpu, storage, memory
  • ingest logs from multiple virtual machines in dedicated tele server?
  • fix and test the nginx config for local dev and for my server
  • refactor into nix module(s) to be more portable for non-flake NixOS users
  • core health dashboard - pre-configured visualization for core resources
  • improve management of email password for ease of secure and declarative use
  • publish to flakehub
  • error: timestamp of a queried log is invalid
  • alerts may need variable datasource uid

References