Run telegraf on everything, and have Galaxy log to a telegraf+statsd running host-local so these stats can be separated
natefoo committed Apr 23, 2018
1 parent b2fb46d commit 73dc2d9
Showing 32 changed files with 994 additions and 1,310 deletions.
21 changes: 21 additions & 0 deletions env/tacc/group_vars/all.yml
@@ -27,3 +27,24 @@ all_packages:
# opts: rw,nosuid,noatime,rsize=1048576,wsize=1048576,intr,nfsvers=3,tcp,soft,addr=129.114.60.34
# owner: ndc
# group: G-803372

# run telegraf on everything
telegraf_agent_output:
  - type: influxdb
    config:
      - urls = ["http://stats.galaxyproject.org:8086"]
      - database = "system"

telegraf_plugins_default:
  - plugin: cpu
    config:
      - percpu = true
  - plugin: disk
  - plugin: kernel
  - plugin: processes
  - plugin: io
  - plugin: mem
  - plugin: system
  - plugin: swap
  - plugin: net
  - plugin: netstat
48 changes: 48 additions & 0 deletions env/tacc/group_vars/galaxyservers.yml
@@ -85,3 +85,51 @@ cvmfs_http_proxies:

cvmfs_install_setuid_cvmfs_wipecache: no
cvmfs_install_setuid_cvmfs_remount_sync: yes

# log stats to telegraf locally
#
# separate db filtering for statsd (Galaxy timing) and system stuff using
# https://github.com/influxdata/telegraf/issues/1778
telegraf_agent_output:
  - type: influxdb
    config:
      - urls = ["http://stats.galaxyproject.org:8086"]
      - database = "system"
      - '[outputs.influxdb.tagdrop]'
      - ' influxdb_database = ["*"]'
  - type: influxdb
    config:
      - urls = ["http://stats.galaxyproject.org:8086"]
      - database = "telegraf"
      - 'tagexclude = ["influxdb_database"]'
      - '[outputs.influxdb.tagpass]'
      - ' influxdb_database = ["telegraf"]'

telegraf_plugins_default:
  - plugin: statsd
    config:
      - service_address = "127.0.0.1:8125"
      - delete_gauges = false
      - delete_counters = false
      - delete_sets = false
      - delete_timings = true
      - percentiles = [90]
      - metric_separator = "_"
      - parse_data_dog_tags = false
      - allowed_pending_messages = 10000
      - percentile_limit = 1000
      - udp_packet_size = 1500
      - '[inputs.statsd.tags]'
      - ' influxdb_database = "telegraf"'
  - plugin: cpu
    config:
      - percpu = true
  - plugin: disk
  - plugin: kernel
  - plugin: processes
  - plugin: io
  - plugin: mem
  - plugin: system
  - plugin: swap
  - plugin: net
  - plugin: netstat
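
Note on the statsd input above: with `service_address = "127.0.0.1:8125"`, any process on the Galaxy host can hand timings to the local telegraf over UDP. The following is a minimal sketch of what such a client could look like, assuming the plain statsd line format (`name:value|ms`). The metric name `galaxy.request_time` and the helper names are hypothetical and are not taken from Galaxy's code.

```python
import socket


def format_timing(name, millis):
    """Build a statsd timing datagram such as b'galaxy.request_time:42|ms'."""
    return f"{name}:{millis}|ms".encode("ascii")


def send_timing(name, millis, host="127.0.0.1", port=8125):
    """Send one timing to a statsd listener over UDP (fire and forget)."""
    payload = format_timing(name, millis)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical metric name, used only for illustration.
    send_timing("galaxy.request_time", 42)
```

Because the listener is bound to the loopback address, only host-local senders can reach it, which matches the "host-local" wording in the commit message.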
46 changes: 46 additions & 0 deletions env/tacc/group_vars/statsservers/vars.yml
@@ -56,3 +56,49 @@ group_crontabs:
user: stats
minute: "*/5"
job: "/srv/statslurp/bin/galaxy_slurp.py main"

# separate db filtering for statsd (Galaxy timing) and system stuff using
# https://github.com/influxdata/telegraf/issues/1778
telegraf_agent_output:
  - type: influxdb
    config:
      - urls = ["http://localhost:8086"]
      - database = "system"
      - '[outputs.influxdb.tagdrop]'
      - ' influxdb_database = ["*"]'
  - type: influxdb
    config:
      - urls = ["http://localhost:8086"]
      - database = "telegraf"
      - 'tagexclude = ["influxdb_database"]'
      - '[outputs.influxdb.tagpass]'
      - ' influxdb_database = ["telegraf"]'

telegraf_plugins_default:
  - plugin: statsd
    config:
      - service_address = "0.0.0.0:8125"
      - delete_gauges = false
      - delete_counters = false
      - delete_sets = false
      - delete_timings = true
      - percentiles = [90]
      - metric_separator = "_"
      - parse_data_dog_tags = false
      - allowed_pending_messages = 10000
      - percentile_limit = 1000
      - udp_packet_size = 1500
      - '[inputs.statsd.tags]'
      - ' influxdb_database = "telegraf"'
  - plugin: cpu
    config:
      - percpu = true
  - plugin: disk
  - plugin: kernel
  - plugin: processes
  - plugin: io
  - plugin: mem
  - plugin: system
  - plugin: swap
  - plugin: net
  - plugin: netstat
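
Note on the two-output setup above: the statsd input tags its metrics with `influxdb_database = "telegraf"`, the first output drops anything carrying that tag, and the second output passes only metrics carrying it and strips the tag before writing. The sketch below models that routing in Python to make the effect easier to follow. It is a simplified illustration and not Telegraf's implementation; the `route` and `matches` helpers and the sample host name are hypothetical, and `tagexclude` is treated as an exact-name list here.

```python
from fnmatch import fnmatchcase


def matches(tags, rules):
    """True if any tag value matches any glob listed for that tag key."""
    return any(
        key in tags and any(fnmatchcase(tags[key], pat) for pat in patterns)
        for key, patterns in rules.items()
    )


def route(metric_tags, outputs):
    """Return {database: tags_as_written} for each output accepting the metric."""
    accepted = {}
    for out in outputs:
        # tagpass: the metric must carry a matching tag to be written.
        if out.get("tagpass") and not matches(metric_tags, out["tagpass"]):
            continue
        # tagdrop: a matching tag causes the metric to be skipped.
        if out.get("tagdrop") and matches(metric_tags, out["tagdrop"]):
            continue
        excluded = out.get("tagexclude", [])
        accepted[out["database"]] = {
            k: v for k, v in metric_tags.items() if k not in excluded
        }
    return accepted


# Mirrors the two telegraf_agent_output entries in the diff.
OUTPUTS = [
    {"database": "system", "tagdrop": {"influxdb_database": ["*"]}},
    {
        "database": "telegraf",
        "tagexclude": ["influxdb_database"],
        "tagpass": {"influxdb_database": ["telegraf"]},
    },
]

cpu_route = route({"host": "galaxy01", "cpu": "cpu0"}, OUTPUTS)
statsd_route = route({"host": "galaxy01", "influxdb_database": "telegraf"}, OUTPUTS)
```

Under this model, system metrics (no `influxdb_database` tag) land only in the `system` database, while statsd metrics land only in `telegraf` and are written without the routing tag.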
2 changes: 2 additions & 0 deletions env/tacc/playbook.yml
@@ -18,6 +18,8 @@
      tags: cron
    - role: services # Manage services
      tags: services
    - role: dj-wasabi.telegraf
      tags: stats,telegraf
  tags: initial,system

- name: Forward mail for local accounts
10 changes: 10 additions & 0 deletions env/tacc/telegraf.yml
@@ -0,0 +1,10 @@
---

- name: Tasks for telegraf hosts
  hosts: baseenv
  remote_user: root
  roles:
    - role: dj-wasabi.telegraf
      tags:
        - telegraf
        - stats
12 changes: 1 addition & 11 deletions env/tacc/templates/supervisor/stats.j2
@@ -1,14 +1,4 @@

-[program:telegraf]
-process_name = telegraf
-command = /usr/bin/telegraf -config {{ stats_conf_dir }}/telegraf.conf
-user = stats
-directory = {{ stats_var_dir }}
-autostart = true
-autorestart = true
-stdout_logfile = {{ supervisord_log_dir }}/stats_telegraf.log
-redirect_stderr = true
-
 [program:influxdb]
 process_name = influxd
 command = /usr/bin/influxd -config {{ stats_conf_dir }}/influxdb.conf
@@ -31,4 +21,4 @@ stdout_logfile = {{ supervisord_log_dir }}/stats_grafana.log
 redirect_stderr = true

 [group:stats]
-programs = telegraf, influxdb, grafana
+programs = influxdb, grafana
7 changes: 7 additions & 0 deletions roles/dj-wasabi.telegraf/.gitignore
@@ -0,0 +1,7 @@
.idea
.molecule
tests/.cache
.cache
__pycache__
*.retry
pmip
15 changes: 15 additions & 0 deletions roles/dj-wasabi.telegraf/.travis.yml
@@ -0,0 +1,15 @@
---
sudo: required
language: python
services:
  - docker

install:
  - pip install molecule ansible docker

script:
  - molecule --version
  - ansible --version
  - molecule test
notifications:
  webhooks: https://galaxy.ansible.com/api/v1/notifications/
89 changes: 89 additions & 0 deletions roles/dj-wasabi.telegraf/CHANGELOG.md
@@ -0,0 +1,89 @@
dj-wasabi.telegraf
------------------

Below is an overview of all changes in the releases.

Version (Release date)

0.8.0 (2017-10-30)

* Updating to Molecule V2
* Test if LSB codename exists before using it #35 (By pull request: tszym (Thanks!))
* Remove useless packages on RedHat. fix #28 #36 (By pull request: tszym (Thanks!))
* Fix extra plugins by file / Change apt source filename / Change tags by global_tags #37 (By pull request: aarnaud (Thanks!))
* Use telegraf_global_tags for oldest telegraf versions #38 (By pull request: tszym (Thanks!))

0.7.0 (2017-02-23)

* Replace action by modules #26 (By pull request: tszym (Thanks!))
* Use yum repository to install telegraf on RedHat #25 (By pull request: tszym (Thanks!))
* Remove for-loop in extra-plugin template #24 (By pull request: emersondispatch (Thanks!))
* Update Debian.yml #23 (By pull request: zend0 (Thanks!))
* extra plugins tags #21 (By pull request: oboukili (Thanks!))
* Input tags support #20 (By pull request: szibis (Thanks!))
* Fix telegraf configuration permissions #19 (By pull request: szibis (Thanks!))

0.6.0 (2017-01-02)

* Fix the Influxdb repo for "hybrid" debian distros (like "jessie/sid") #9 (By pull request: Ismael (Thanks!))
* Do "become" for the steps that require root access on Debian #10 (By pull request: Ismael (Thanks!))
* Fix the Influxdb repo for "hybrid" debian distros (like "jessie/sid") #11 (By pull request: Ismael (Thanks!))
* Removed imports #12
* Fixing molecule #15
* set telegraf hostname in defaults. #13 (By pull request: romainbureau (Thanks!))
* use version_compare filter … #14 (By pull request: lhoss (Thanks!))
* support missing agent settings up to telegraf v1.1 #16 (By pull request: lhoss (Thanks!))
* update the README with the latest v0.13 - v1.1 agent settings #17 (By pull request: lhoss (Thanks!))

0.5.1 (2016-08-24)

* fixed issue with ansible not getting the package #6 (By pull request: thecodeassassin (Thanks!))

0.5.0 (2016-07-17)

* Removed Test Kitchen tests
* Added Molecule tests and travis make use of them
* Updated default version to 1.0.0 beta2
* Feature/add extra plugins to telegrafd folder #5 (By pull request: stvnwrgs (Thanks!))

0.4.0 (2016-02-05)

* Fixed test for test-kitchen
* Added travis-ci test for testing default installation when PR is made
* Fixed Download url for Debian
* Removed default entry for telegraf_plugins_extra

0.3.0 (2016-01-13)

* Made it work with telegraf 0.10.0
* Default installation: 0.10.0

0.2.0 (2015-11-14)

* Fixed kitchen test setup
* Adding "net" to the telegraf_plugins_default property
* Update etc-opt-telegraf-telegraf.conf.j2 #2 (By pull request: aferrari-technisys (Thanks!))
* Improvement and upgrade for v0.2.0 of telegraf #1 (By pull request: aferrari-technisys (Thanks!))

0.1.0 (2015-09-23)

* Updated `telegraf_agent_version` to 0.1.9
* Added restart when package is changed (When updated for example)
* Added several plugin options:
  * pass
  * drop
  * tagpass
  * tagdrop
  * interval
* Updated documentation


0.0.2 (2015-09-20)

* Updated README due to missing colon
* Forgot to update the meta file
* Added Changelog file

0.0.1 (2015-09-20)

* Initial release
