Grafana and Telegraf Metrics Monitoring System

Grafana is an open-source platform for monitoring and observability, primarily used to visualize time series data. It allows users to create and share dynamic dashboards, providing powerful and flexible visual representations of data.

Telegraf is the plugin-driven server agent for collecting, processing, aggregating and writing metrics to InfluxDB.

Grafana supports various data sources such as Prometheus, Graphite and OpenTSDB. In this lab we'll use InfluxDB as our preferred data source.

InfluxDB is used as a data source for Grafana due to its high performance and scalability in handling large volumes of time-stamped data.


Getting Started

  • Provision Servers with Terraform

  • User Configuration

  • Setup InfluxDB v2

  • Setup Telegraf

  • Install Grafana

  • Create Grafana Data Source

  • Import System Dashboard

  • Ansible Installation and Setup

  • Add Remote Hosts to Grafana Dashboard

Provision Servers with Terraform

Install the AWS CLI on the local machine

sudo apt install curl unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install -i /usr/local/aws-cli -b /usr/local/bin

Confirm the AWS CLI installation

aws --version

Clone this repository on the local machine

cd /
git clone git@github.com:odennav/grafana-telegraf-metrics-monitoring-system.git

Execute these Terraform commands sequentially on the local machine to create the AWS VPC (Virtual Private Cloud) and EC2 instances.

Initialize the Terraform working directory

cd grafana-telegraf-metrics-monitoring-system/terraform
terraform init

Validate the syntax of the terraform configuration files

terraform validate

Create an execution plan that describes the changes terraform will make to the infrastructure

terraform plan

Apply the changes described in execution plan

terraform apply -auto-approve

Check the AWS console to confirm the instances were created and are running

SSH access

Use the .pem key from AWS to SSH into the public EC2 instance. The IPv4 address of the public EC2 instance is shown in the Terraform outputs.

ssh -i private-key/terraform-key.pem ec2-user@<ipaddress>

We can use the public EC2 instance as a jumpbox to securely SSH into the private EC2 instances within the VPC.
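
One way to make that hop is OpenSSH's `-J` (ProxyJump) option. The sketch below assumes the key has already been loaded into a local ssh-agent (`ssh-add private-key/terraform-key.pem`) and that the private instances also accept the `ec2-user` login. The helper name `jump_ssh_cmd` and both addresses are placeholders, not values from this repo.

```shell
# Print (without running) an SSH command that reaches a private instance
# through the public jumpbox. jump_ssh_cmd is a hypothetical helper name.
jump_ssh_cmd() {
  jump_ip="$1"      # public IPv4 address taken from the terraform outputs
  private_ip="$2"   # private IPv4 address of the target instance
  printf 'ssh -J ec2-user@%s ec2-user@%s\n' "$jump_ip" "$private_ip"
}

# Example with documentation-only addresses
jump_ssh_cmd 203.0.113.10 10.0.2.15
```

Printing the command first makes it easy to review the two addresses before connecting.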

Note that the Ansible inventory is built dynamically by Terraform from the private IP addresses of the EC2 machines.


User Configuration

Add New User

We'll use central-server-1 virtual machine as our build machine.

Change password for root user

sudo passwd

Add a new user odennav and put it in the wheel group, which grants sudo privileges on RHEL-based systems.

sudo useradd odennav
sudo usermod -aG wheel odennav

By default, sudo prompts for the user's password. To disable the password prompt for every sudo command, do the following:

Add a sudoers file for odennav

echo "odennav ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/odennav

Ensure correct permissions for sudoers file

sudo chmod 0440 /etc/sudoers.d/odennav
sudo chown root:root /etc/sudoers.d/odennav
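
A syntax error in a sudoers file can lock out sudo, so it may be worth validating the rule before installing it. The sketch below assumes `visudo` is available on the server; `sudoers_line` is a hypothetical helper name, and the commented commands are meant to be run manually.

```shell
# Render the sudoers rule for a given user.
# sudoers_line is a hypothetical helper name.
sudoers_line() {
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Validate a candidate file, then install it with the right owner and mode
# (run manually on the server):
#   tmp="$(mktemp)"
#   sudoers_line odennav > "$tmp"
#   sudo visudo -cf "$tmp" && \
#     sudo install -m 0440 -o root -g root "$tmp" /etc/sudoers.d/odennav

# Show the rule that would be written
sudoers_line odennav
```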

Set a password for the odennav user

sudo passwd odennav

Test sudo privileges by switching to new user

su - odennav
sudo ls -la /root

To disable root login over SSH, set PermitRootLogin in the SSH server configuration file /etc/ssh/sshd_config as shown below:

PermitRootLogin no

Restart the SSH service

sudo systemctl restart sshd.service
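
Since this change is repeated on every server, it can be scripted. The sketch below assumes GNU sed (as shipped on RHEL-based systems); `set_permit_root_login_no` is a hypothetical helper name. On a real server it would be run with root privileges against /etc/ssh/sshd_config, followed by `sudo sshd -t` to check the configuration before restarting the service.

```shell
# Set "PermitRootLogin no" in an sshd_config-style file.
# Rewrites an existing (possibly commented) directive, or appends one.
# set_permit_root_login_no is a hypothetical helper name.
set_permit_root_login_no() {
  conf="$1"
  if grep -Eq '^[#[:space:]]*PermitRootLogin[[:space:]]' "$conf"; then
    sed -i -E 's/^[#[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' "$conf"
  else
    printf 'PermitRootLogin no\n' >> "$conf"
  fi
}

# Demonstration on a throwaway copy instead of the live configuration
demo_conf="$(mktemp)"
printf '#PermitRootLogin yes\nPort 22\n' > "$demo_conf"
set_permit_root_login_no "$demo_conf"
cat "$demo_conf"
rm -f "$demo_conf"
```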

Please note you'll have to repeat this user setup for each server provisioned.


Setup InfluxDB v2

Install InfluxDB as a service with systemd as shown below:

Download and install the appropriate .rpm file

curl -LO https://download.influxdata.com/influxdb/releases/influxdb2-2.7.6-1.x86_64.rpm
sudo yum localinstall -y influxdb2-2.7.6-1.x86_64.rpm

Start and enable the InfluxDB service

sudo systemctl start influxdb.service
sudo systemctl enable influxdb.service

Installing the InfluxDB package creates a service file at /lib/systemd/system/influxdb.service to start InfluxDB as a background service on startup.

Verify the service is running

sudo systemctl status influxdb.service
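
Besides systemd, InfluxDB v2 exposes a /health endpoint that reports whether the API is ready. The sketch below assumes the response is a JSON object containing a "status" field; `influx_health_status` is a hypothetical helper name, and the sample response is illustrative.

```shell
# Extract the value of the "status" field from /health JSON read on stdin.
# influx_health_status is a hypothetical helper name.
influx_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage on the InfluxDB host:
#   curl -fsS http://localhost:8086/health | influx_health_status

# Demonstration with an illustrative response body
printf '%s\n' '{"name":"influxdb","status":"pass"}' | influx_health_status
```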

Setup Initial User of InfluxDB

Implement the following:

  • With InfluxDB running, visit http://10.33.10.1:8086.

  • Click Get Started.

  • Enter a Username for your initial user.

  • Enter a Password and Confirm Password for your user.

  • Enter your initial Organization Name.

  • Enter your initial Bucket Name.

  • Click Continue.

  • Copy the provided operator API token and store it for safe keeping.

Your InfluxDB instance is now initialized.

Install Influx CLI

The influx CLI is used to interact with and manage your InfluxDB instance.

Confirm the CPU architecture of your local machine to choose the matching package

uname -m
lscpu | grep Architecture
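
The architecture reported by `uname -m` does not match the suffix used in the package name directly. A minimal sketch of the mapping, assuming only amd64 and arm64 builds are of interest; `influx_cli_arch` is a hypothetical helper name, and the version number is the one used in this guide.

```shell
# Map a `uname -m` value to the architecture suffix in the package name.
# influx_cli_arch is a hypothetical helper name.
influx_cli_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the package file name for this machine
arch="$(influx_cli_arch "$(uname -m)")" && \
  echo "influxdb2-client-2.7.5-linux-${arch}.tar.gz"
```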

Download the influx CLI package from the command line.

wget https://download.influxdata.com/influxdb/releases/influxdb2-client-2.7.5-linux-amd64.tar.gz

Unpack the downloaded archive

tar -xvzf ./influxdb2-client-2.7.5-linux-amd64.tar.gz

Place the unpacked influx executable in the system $PATH

sudo cp ./influx /usr/local/bin/

Confirm influx client is available

influx version

Create an All Access API token with influx CLI

The Operator token can be used to interact with InfluxDB, but it's recommended to create an All Access token that is scoped to an organization and use that token to manage InfluxDB.

Use the influx auth create command to create an All Access token

influx auth create \
  --all-access \
  --host http://localhost:8086 \
  --org Treten \
  --token <YOUR_INFLUXDB_OPERATOR_TOKEN>

Copy the generated token and store it for safe keeping.

Configure authentication credentials

A connection configuration stores your credentials to avoid having to pass your InfluxDB API token with each influx command.

It specifies connection configuration presets to switch between InfluxDB connection credentials.

Use the All Access token to interact with InfluxDB.

influx config create \
  --config-name odennav-config \
  --host-url http://localhost:8086 \
  --org Treten \
  --token <ALL ACCESS API_TOKEN> \
  --active

Alternatively, authenticate with a username and password

influx config create \
  -n odennav-config \
  -u http://localhost:8086 \
  -p odennav:<PASSWORD> \
  -o Treten

Set the following environment variables

export INFLUX_HOST=http://localhost:8086
export INFLUX_ORG=Treten
export INFLUX_ORG_ID=<ORG_ID>
export INFLUX_TOKEN=<ALL ACCESS API_TOKEN>
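
The organization ID can be looked up with `influx org list`. The sketch below assumes the command prints a table whose first two columns are the ID and the name; `org_id_for` is a hypothetical helper name, and the ID in the demonstration is made up.

```shell
# Pick the ID of a named organization out of `influx org list` output
# read on stdin. org_id_for is a hypothetical helper name.
org_id_for() {
  awk -v org="$1" '$2 == org { print $1 }'
}

# Usage on the InfluxDB host:
#   export INFLUX_ORG_ID="$(influx org list | org_id_for Treten)"

# Demonstration with made-up output
printf 'ID\tName\n0123456789abcdef\tTreten\n' | org_id_for Treten
```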

Setup Telegraf

Add the InfluxData repository so the yum package manager can install the latest stable version of Telegraf

cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxData Repository - Stable
baseurl = https://repos.influxdata.com/stable/\$basearch/main
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF

Install Telegraf from the repo

sudo yum install -y telegraf

The telegraf configuration file is installed at /etc/telegraf/telegraf.conf

Create a configuration file with default input and output plugins

telegraf config | sudo tee /etc/telegraf/telegraf.conf > /dev/null
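
For metrics to reach InfluxDB v2, Telegraf needs an `outputs.influxdb_v2` section. A minimal sketch, assuming the bucket is named telegraf (as the queries later in this guide suggest), the organization is Treten, and the token reaches the Telegraf service as an INFLUX_TOKEN environment variable. The helper name `render_influx_output` and the drop-in file name are made up for illustration.

```shell
# Render an outputs.influxdb_v2 section for Telegraf.
# render_influx_output is a hypothetical helper name.
render_influx_output() {
  url="$1"; org="$2"; bucket="$3"
  cat <<EOF
[[outputs.influxdb_v2]]
  urls = ["${url}"]
  token = "\${INFLUX_TOKEN}"
  organization = "${org}"
  bucket = "${bucket}"
EOF
}

# Usage (assumes the packaged service also loads /etc/telegraf/telegraf.d/*.conf):
#   render_influx_output http://localhost:8086 Treten telegraf |
#     sudo tee /etc/telegraf/telegraf.d/outputs-influxdb.conf > /dev/null

# Show the rendered section
render_influx_output http://localhost:8086 Treten telegraf
```

Keeping the token as `${INFLUX_TOKEN}` leaves the secret out of the configuration file; Telegraf substitutes environment variables when it loads the configuration.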

Configure the input plugins

Find the following plugins in the telegraf.conf file and uncomment them to enable them

[[inputs.conntrack]]
[[inputs.internal]]
[[inputs.interrupts]]
[[inputs.linux_sysctl_fs]]
[[inputs.net]]
[[inputs.netstat]]
[[inputs.nstat]]
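
Uncommenting the plugin headers can also be scripted, which helps when the same change is needed on several hosts. The sketch below assumes GNU sed and that the generated file contains each header as a commented line such as `# [[inputs.net]]`; `enable_inputs` is a hypothetical helper name. Only the headers are uncommented, so each plugin runs with its default settings.

```shell
# Uncomment the headers of the input plugins listed above.
# enable_inputs is a hypothetical helper name.
enable_inputs() {
  conf="$1"
  sed -i -E \
    's/^#[[:space:]]*\[\[inputs\.(conntrack|internal|interrupts|linux_sysctl_fs|net|netstat|nstat)\]\]/[[inputs.\1]]/' \
    "$conf"
}

# Usage on the Telegraf host:
#   sudo sh -c '. ./enable_inputs.sh; enable_inputs /etc/telegraf/telegraf.conf'
# (enable_inputs.sh is a hypothetical file holding the function above)

# Demonstration on a throwaway file
demo_tg="$(mktemp)"
printf '# [[inputs.net]]\n# [[inputs.nginx]]\n' > "$demo_tg"
enable_inputs "$demo_tg"
cat "$demo_tg"
rm -f "$demo_tg"
```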

Start and enable Telegraf service

sudo systemctl start telegraf
sudo systemctl enable telegraf

Verify the Telegraf service is running

sudo systemctl status telegraf.service

Check buckets in InfluxDB

influx bucket list

List the measurements stored in the telegraf bucket

influx query 'import "influxdata/influxdb/schema" schema.measurements(bucket: "telegraf")'

Check data from the swap measurement

influx query 'from(bucket: "telegraf")
  |> range(start: -1m)
  |> filter(fn: (r) => r._measurement == "swap")'

Install Grafana

To install Grafana from the RPM repository, complete the following steps:

wget -q -O gpg.key https://rpm.grafana.com/gpg.key
sudo rpm --import gpg.key

Add the Grafana repository to /etc/yum.repos.d/grafana.repo so the yum package manager can install the latest stable version of Grafana

cat << EOF | sudo tee /etc/yum.repos.d/grafana.repo
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF

Install Grafana Enterprise

sudo yum install grafana-enterprise

To start the grafana service

sudo systemctl daemon-reload
sudo systemctl start grafana-server

To configure the Grafana server to start at boot

sudo systemctl enable grafana-server.service

To verify that the grafana service is running

sudo systemctl status grafana-server

Create Grafana Data Source

Browse to http://10.33.10.1:3000.

Log in with the username admin and the password admin

Implement the following:

  • Click on the ⚙️ icon in the menu bar on the left. It will take you to the Configuration page for Grafana

  • Click on the Add data source button

  • Click on InfluxDB as your choice of time series database.

  • Fill the Settings form as below:

    Name ----------------> InfluxDB-Telegraf

    Default--------------> Click to turn the selector on.

    Query Language ------> InfluxQL

    URL -----------------> http://localhost:8086

    Access --------------> Server (default)

    Database ------------> telegraf

Click the Save & Test button.

Now Grafana can access the metrics stored in InfluxDB.
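
The same data source can also be described in a Grafana provisioning file instead of the web form. This is a sketch under assumptions, not the setup used in this repo: it assumes the default provisioning directory of a package install and that InfluxQL access to InfluxDB v2 is authenticated with an Authorization header carrying an API token. InfluxQL queries against InfluxDB v2 generally also need the bucket mapped to a database name (a DBRP mapping), which is not covered here. `render_grafana_datasource` is a hypothetical helper name.

```shell
# Render a Grafana data source provisioning file for InfluxDB.
# render_grafana_datasource is a hypothetical helper name.
render_grafana_datasource() {
  token="$1"   # InfluxDB API token, passed in rather than hard-coded
  cat <<EOF
apiVersion: 1
datasources:
  - name: InfluxDB-Telegraf
    type: influxdb
    access: proxy
    url: http://localhost:8086
    isDefault: true
    jsonData:
      dbName: telegraf
      httpHeaderName1: Authorization
    secureJsonData:
      httpHeaderValue1: Token ${token}
EOF
}

# Usage on the Grafana host (the file name is an assumption):
#   render_grafana_datasource "$INFLUX_TOKEN" |
#     sudo tee /etc/grafana/provisioning/datasources/influxdb.yaml > /dev/null
#   sudo systemctl restart grafana-server

# Show the rendered file with a dummy token
render_grafana_datasource example-token
```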


Import System Dashboard

Hover over the + sign in the menu on the left of your screen. It will expand into a menu when you hover over it.

From there, click on Import.

Implement the following:

  • In the Grafana dashboard URL or id field, enter 13095 and click the "Load" button next to it.

An alternative is to import a JSON file for the grafana dashboard.

To upload click Upload JSON File and navigate to /grafana-metrics-monitoring-system/grafana templates/13095_System in this repo.

Note the IPMI JSON template available in this repo. It's an SNMP-based dashboard for monitoring Dell hosts via iDRAC.

To use this dashboard, ensure SNMPv1 is enabled in iDRAC and enable the SNMP input plugin in the telegraf.conf file.

To view more dashboards, check Grafana Labs.

  • On the next screen in the Select an InfluxDB data source box, select InfluxDB-Telegraf. Then click on the Import button.

Now you should see a dashboard displaying system performance information for the Grafana host.

Be sure to save the imported dashboard.


Ansible Installation and Setup

Configuring remote hosts as Telegraf agents is repetitive.

We'll install and use Ansible to keep the configuration consistent and efficient.

Install Ansible

To install Ansible without upgrading the current Python version, we'll use the yum package manager.

sudo yum update

Install EPEL repository

sudo yum install epel-release

Verify installation of EPEL repository

sudo yum repolist

Install Ansible

sudo yum install ansible

Confirm installation

ansible --version

Configure Ansible Vault

Ansible communicates with target remote servers over SSH. The usual approach is to generate an RSA key pair and copy the public key to each remote server; instead, we'll use the username and password credentials of the odennav user.

These credentials are added to the inventory host file but encrypted with ansible-vault.

Ensure all IPv4 addresses and user variables of remote servers are in the inventory file.

View ansible-vault/values.yml, which holds the secret password

cat /grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/values.yml

Generate vault password file

openssl rand -base64 2048 > /grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/secret-vault.pass
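
Because this file unlocks the vault, it is worth making it readable by its owner only. A minimal sketch, with a hypothetical helper name `create_vault_pass`; the /dev/urandom fallback is an addition for machines without openssl and is not part of the original steps.

```shell
# Generate a vault password file that only its owner can read.
# create_vault_pass is a hypothetical helper name.
create_vault_pass() {
  pass_file="$1"
  (
    umask 077   # files created in this subshell get owner-only permissions
    if command -v openssl >/dev/null 2>&1; then
      openssl rand -base64 2048 > "$pass_file"
    else
      # Fallback (assumption): random bytes from the kernel, base64-encoded
      head -c 2048 /dev/urandom | base64 > "$pass_file"
    fi
  )
  chmod 600 "$pass_file"   # also tighten a file that already existed
}

# Demonstration in a throwaway directory
demo_dir="$(mktemp -d)"
create_vault_pass "$demo_dir/secret-vault.pass"
ls -l "$demo_dir/secret-vault.pass"
rm -rf "$demo_dir"
```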

Encrypt values.yml with the vault password file

ansible-vault encrypt /grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/values.yml --vault-password-file=/grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/secret-vault.pass

View the content of the encrypted vault file

ansible-vault view /grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/values.yml --vault-password-file=/grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/secret-vault.pass

Point Ansible to the vault password file with an environment variable

export ANSIBLE_VAULT_PASSWORD_FILE=/grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/secret-vault.pass

Confirm the environment variable has been exported

printenv ANSIBLE_VAULT_PASSWORD_FILE

Test Ansible by pinging all remote servers in the inventory

ansible all -m ping

Add Remote Hosts to Grafana Dashboard

Provision other remote hosts with Vagrant and implement new user configuration.

Run the Ansible playbook that deploys Telegraf to the remote hosts

ansible-playbook -i inventory /grafana-telegraf-metrics-monitoring-system/ansible/deploy_telegraf/deploy_telegraf.yml -e @/grafana-telegraf-metrics-monitoring-system/ansible/ansible-vault/values.yml

Return to your web browser and reload the dashboard page.

At the top of your browser you should see a selector box for Server.

When you click on the word Server, you will see the Grafana VM, central-server-1, as well as our newly added VMs.


Enjoy!