
Adding Prometheus and Grafana for load analysis and reimbursements analytics #484


Closed
wants to merge 13 commits into from

Conversation

ArthurSens

@ArthurSens ArthurSens commented Jun 2, 2019

What is the purpose of this Pull Request?

Provide a tool capable of monitoring the performance of all the applications that are part of the Serenata solution, and of showing that information with graphics we can use to compare the performance of different solutions.

At the same time, with Prometheus + Grafana, we can provide dashboards that can be used to study the reimbursements data. Prometheus is great for analytics thanks to its PromQL, while Grafana is great at showing data with a friendly interface.
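As an illustration of the kind of analysis PromQL enables (the metric name and the state label below come from the custom queries added later in this PR; the aggregation itself is just a sketch):

```promql
# Illustrative query: total reimbursement value per federal state
sum by (state) (chamber_of_deputies_reimbursement_total_net_value)
```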

What was done to achieve this purpose?

I've added Prometheus and Grafana to the docker-compose file, along with some agents that are responsible for providing monitoring metrics from the applications to the Prometheus server.

One agent in particular is responsible for querying Postgres once a day to provide the reimbursements data to Prometheus, so we can run our analytical studies on that data.

How to test if it really works?

First, start all applications:

docker-compose up -d

Your Prometheus will be available at http://localhost:9090
Your Grafana will be available at http://localhost:3000
You can log in to Grafana with the credentials admin/admin (they can be changed)

You can see the dashboards at http://localhost:3000/dashboards
You will find two folders: Monitoring has all the dashboards responsible for monitoring and load analysis of Serenata's applications, and Business Inteligence has the dashboards we will use for analytics studies of the reimbursements.
PS: sometimes the folders and dashboards take a long time to be provisioned after you start the Grafana server. If they're not there, wait 5 to 10 minutes for them to load.

When Prometheus and Grafana are up, you can monitor the performance of your applications.

For load analysis, for example, open the Process monitoring dashboard and then execute:

docker run --rm -v /tmp/serenata-data:/tmp/serenata-data serenata/rosie python rosie.py run chamber_of_deputies 

You will be able to see how much hardware resources each process is consuming, as shown in the image below:

image

If you make changes to Rosie's code, you will be able to evaluate whether the performance got better or worse.


Validating the reimbursements part is a little bit trickier, since it can take a whole day for Prometheus to gather the data: not because it's slow, but because it gathers that information only once a day.

But first you gotta populate the database:

docker-compose run --rm django python manage.py migrate
docker-compose run --rm django python manage.py reimbursements /mnt/data/reimbursements_sample.csv
docker-compose run --rm django python manage.py companies /mnt/data/companies_sample.xz

After that, you can check whether Prometheus got the data from Postgres: go to http://localhost:9090/targets and verify that the businessInteligence job has the UP status, as shown in the image below:
image

If you see the UNKNOWN status, it means that Prometheus hasn't queried Postgres yet.

Once Prometheus has the reimbursement data, you will be able to manipulate that information with PromQL and build dashboards like the one I made, accessible at http://localhost:3000/d/ecOrj5MZz/reimbursements-analytics?orgId=1

Who can help reviewing it?

Anyone capable of starting the containers.

TODO

@@ -32,6 +32,56 @@ services:
rosie:
image: serenata/rosie

prometheus:
image: arthursens/serenata-prometheus
Author

arthursens/serenata-prometheus image was built from this Dockerfile

- bi_exporter:bi_exporter

grafana:
image: arthursens/serenata-grafana
Author

arthursens/serenata-grafana was built from this Dockerfile

node_exporter:
image: prom/node-exporter

process_exporter:
Author

This agent is responsible for exposing metrics about the resources used by every process running on your host.
More information about it can be found in this repository

command: --procfs /host/proc -config.path /config/config.yml

postgres_exporter:
image: wrouesnel/postgres_exporter
@ArthurSens ArthurSens Jun 2, 2019

This agent is responsible for executing queries against our Postgres and exposing the results.
It already has some monitoring queries that are executed by default, and more can be added via a YAML file.

I've added some monitoring queries here.

More information about this exporter can be found at this repository

- postgres:postgres
volumes:
- ./prometheus/postgres-exporter:/prometheus
command: --disable-default-metrics --disable-settings-metrics --extend.query-path /prometheus/businessInteligence.yml
@ArthurSens ArthurSens Jun 2, 2019

This agent uses the same image as postgres_exporter, but it has a different configuration.

First, I've disabled the default monitoring queries with the --disable-default-metrics and --disable-settings-metrics flags.
The YAML file with the custom queries is also different, as you can see here

@@ -0,0 +1,3 @@
FROM grafana/grafana:6.1.3

ADD ./provisioning /etc/grafana/provisioning
@ArthurSens ArthurSens Jun 2, 2019

We will use Grafana Provisioning to provide our dashboards and datasources automatically.

Grafana reads YAML files located at /etc/grafana/provisioning to provision dashboards and datasources, so we need to add those files to our Grafana image.

updateIntervalSeconds: 300
options:
path: /etc/grafana/provisioning/dashboards/businessInteligence

Author

This YAML will create the Business Inteligence folder.
All the JSON files included in the folder /etc/grafana/provisioning/dashboards/businessInteligence will be provisioned automatically into the Business Inteligence folder.

So if you want to add new dashboards, you just need to add their corresponding JSON files to this directory.
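For reference, a full dashboard provider file could look roughly like this (a sketch assuming Grafana's standard provisioning fields; the provider name is illustrative, while the path and update interval come from the snippet above):

```yaml
apiVersion: 1
providers:
  # Illustrative provider name; "folder" is the name shown in Grafana's UI
  - name: businessInteligence
    folder: Business Inteligence
    type: file
    updateIntervalSeconds: 300
    options:
      path: /etc/grafana/provisioning/dashboards/businessInteligence
```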

options:
path: /etc/grafana/provisioning/dashboards/monitoring


Author

This YAML will create the Monitoring folder.
All the JSON files included in the folder /etc/grafana/provisioning/dashboards/monitoring will be provisioned automatically into the Monitoring folder.

So if you want to add new dashboards, you just need to add their corresponding JSON files to this directory.

datasources:
- name: Postgres
type: postgres
url: postgres
Author

The URL to our postgres is just postgres thanks to the address we created here

maxIdleConns: 2 # Grafana v5.4+
connMaxLifetime: 14400 # Grafana v5.4+
postgresVersion: 904 # 903=9.3, 904=9.4, 905=9.5, 906=9.6, 1000=10
timescaledb: false
@ArthurSens ArthurSens Jun 2, 2019

This YAML will provision our Postgres datasource.

I haven't used it for anything in particular, but you can build dashboards about reimbursements with this datasource.

url: http://localhost:9090
version: 1
editable: true

Author

This YAML will provision our Prometheus datasource.

We will use this datasource to build our monitoring and load analysis dashboards, and potentially our Reimbursements analytics dashboards too.

FROM prom/prometheus:v2.10.0

ADD ./prometheus.yml /etc/prometheus/prometheus.yml

Author

Prometheus needs a configuration file to gather data.

In that file we tell Prometheus where it must scrape data from and how often it should scrape it.

This configuration file can be found here
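A minimal sketch of what that configuration file contains, based on the snippets shown in this PR (the exporter ports other than 9187 are assumed to be the exporters' defaults):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  # Load-analysis metrics, scraped every 15 seconds
  - job_name: monitoring
    scrape_interval: 15s
    static_configs:
      - targets:
          - node_exporter:9100     # assumed default node-exporter port
          - process_exporter:9256  # assumed default process-exporter port

  # Reimbursements data, scraped once a day
  - job_name: businessInteligence
    scrape_interval: 1d
    static_configs:
      - targets:
          - bi_exporter:9187
```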

- total_net_value:
usage: "GAUGE"
description: "Reimbursement value"

Author

Extra queries can be added here so Prometheus gathers more analytical information.
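A new entry would follow postgres_exporter's custom-query format, something like this sketch (the query name, table name and SQL are illustrative; the column definitions mirror the ones used in this file):

```yaml
# Hypothetical extra query; table and column names are illustrative
reimbursements_by_month:
  query: "SELECT state, month, SUM(total_net_value) AS monthly_net_value FROM reimbursements GROUP BY state, month"
  metrics:
    - state:
        usage: "LABEL"
        description: "Federal state of the congressperson claiming the reimbursement"
    - month:
        usage: "LABEL"
        description: "Month that the reimbursement was made"
    - monthly_net_value:
        usage: "GAUGE"
        description: "Total reimbursement value in the month"
```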


- job_name: monitoring
honor_timestamps: true
scrape_interval: 15s
Author

Monitoring information is collected every 15s

metrics_path: /metrics
scheme: http
static_configs:
- targets:
Author

Address where Prometheus scrapes monitoring information

scheme: http
static_configs:
- targets:
- bi_exporter:9187
Author

Address where Prometheus scrapes analytical information


- job_name: businessInteligence
honor_timestamps: true
scrape_interval: 1d
Author

Analytical information is collected once a day

@ArthurSens
Author

Prometheus is an open-source monitoring tool that is really well accepted by the community.

It uses agents that expose metrics on a certain port, and then Prometheus scrapes those metrics via HTTP requests.

Prometheus has several agents (they call them 'exporters') built by the maintainers or by the community, and it also has libraries so we can build our own custom exporters, so it can monitor almost everything.

In this PR I've added a process-exporter, a postgres-exporter and a node-exporter, but there are other agents that can monitor RabbitMQ, Memcached, New Relic and Django. I just haven't had the time to build dashboards for them... if anyone feels comfortable contributing with that, just add the exporter to the docker-compose.yml, build a dashboard and put the JSON file in the Grafana monitoring folder.
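Adding such an exporter is mostly a matter of one more service in docker-compose.yml plus a matching scrape target. For example (the image is a real community exporter, but the service definition here is just a sketch; 9150 is assumed to be memcached-exporter's default port):

```yaml
# Hypothetical additional exporter service for docker-compose.yml
memcached_exporter:
  image: prom/memcached-exporter
  ports:
    - 9150:9150
```

A corresponding target (memcached_exporter:9150) would also need to be added to the monitoring job in prometheus.yml.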

@ArthurSens ArthurSens marked this pull request as ready for review June 2, 2019 02:22
Collaborator

@cuducos cuducos left a comment

Overall I think it's an interesting contribution. Also, I'm very glad for the effort you've put into the PR: not only in the code, but in the description and comments. Pretty rare to see outstanding documentation of the intended changes : )

In spite of that, I have two main concerns and one disclaimer. The disclaimer: as a volunteer, I haven't actually run any code, so I'm commenting based on reading the PR. The two issues are as follows:

Conceptual: grouping by criteria

As far as I got, this PR implements party analytics, which is something Serenata has never done for some very specific reasons:

  • Nature of data: we (used to) believe in social control of politicians, and they change parties more often than we would like them to, so focusing on parties is focusing on something relatively volatile when compared to focusing on people
  • Representation bias: different parties have different numbers of representatives in the house, thus parties with more elected congresspeople will be more in the spotlight of any visualization or comparison that groups reimbursements by party (the same is valid for grouping by state, by the way)

Thus I would like to invite other maintainers to think and rethink how useful these visualizations are and what risks they imply in terms of interpretation, bias and skewed meaning of data.

Pragmatic: complexity of the code base

In spite of the great documentation of the PR, no file makes this documentation part of the code base. Current and future users would have to come back to this PR to understand part of the docker-compose.yml and the grafana/ and prometheus/ directories. And it's a lot of code (loooong JSONs and a bunch of YAMLs specifying how these dashboards should work).

I do think a more experimental use of Prometheus & Grafana can be interesting for exploring this data, and I'm not asking for a use case, for example. However, I think most of what is documented in this PR should feature, for example, in the (so far, non-existing) grafana/README.md and prometheus/README.md. Also, I think the services added to docker-compose.yml should feature somewhere in README.md or CONTRIBUTING.md, for example.

prometheus:
image: arthursens/serenata-prometheus
ports:
- 9090:9090
Collaborator

To follow the syntax adopted in this (and other) YAMLs, these items should be indented by 2 spaces. For example:

    ports:
      - 9090:9090

Instead of:

    ports:
    - 9090:9090

Would you mind changing it in your YAMLs additions?

Author

No problem at all

- ./prometheus/postgres-exporter:/prometheus
command: --disable-default-metrics --disable-settings-metrics --extend.query-path /prometheus/businessInteligence.yml


Collaborator

In other places in this file we use a single line to separate services, here we have two. Can we standardize it?

Author

Sure thing!

description: "Month that the reimbursement was made"
- state:
usage: "LABEL"
description: "Federal state that the reimbursement was made"
Collaborator

This is imprecise. The state refers to the state that elected the congressperson claiming the reimbursement.

Author

Oh my bad


ADD ./prometheus.yml /etc/prometheus/prometheus.yml


Collaborator

Do we need these empty lines at the end of the file?

Author

No, we do not. I'll erase them

metrics:
- id:
usage: "LABEL"
description: "Reimbursement ID"
Collaborator

Not sure how this is going to figure in the UI, but this is useless from the UX perspective.

This ID is our Postgres-attributed unique identifier, and not the proper ID attributed by the Chamber of Deputies (which is not unique and is the document_id).

@ArthurSens ArthurSens Jun 3, 2019

It's actually kind of useless, but not really.

Similar to a relational database, Prometheus can't store metrics with the same name and label values (relational DBs can't have duplicated primary keys).

All the registries returned by this query will have the same metric name (chamber_of_deputies_reimbursement_total_net_value), and the labels will vary according to their column values...

I've noticed that if we don't query that ID column, we get duplicated metrics with exactly the same labels, which causes an error, and we wouldn't be able to collect those metrics.

In short, we need to collect that ID just to avoid some errors, but it's not used for anything
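To illustrate the duplicate-series problem with Prometheus's text exposition format (the label values and numbers are made up):

```
# Without the id label, two reimbursements with the same attributes collapse
# into one series, and the scrape fails with a duplicate-sample error:
chamber_of_deputies_reimbursement_total_net_value{state="SP",month="3"} 500.0
chamber_of_deputies_reimbursement_total_net_value{state="SP",month="3"} 300.0

# With the id label, each reimbursement is a distinct series:
chamber_of_deputies_reimbursement_total_net_value{id="1",state="SP",month="3"} 500.0
chamber_of_deputies_reimbursement_total_net_value{id="2",state="SP",month="3"} 300.0
```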

Collaborator

Got it! Many thanks for the explanation. Maybe we could just rename the label so whoever uses it knows it's an ID attributed by Serenata and not by the Chamber of Deputies. Maybe Jarbas ID, or a longer version: ID attributed to the reimbursement in Serenata

@ArthurSens
Author

ArthurSens commented Jun 3, 2019

Overall I think it's an interesting contribution. Also, I'm very glad for the effort you've put into the PR: not only in the code, but in the description and comments. Pretty rare to see outstanding documentation of the intended changes : )

Thanks! I just love projects and tools that have outstanding documentation, like yours, and I try to do the same.

As far as I got, this PR implements party analytics, which is something Serenata has never done for some very specific reasons:

Nature of data: we (used to) believe in social control of politicians, and they change parties more often than we would like them to, so focusing on parties is focusing on something relatively volatile when compared to focusing on people
Representation bias: different parties have different numbers of representatives in the house, thus parties with more elected congresspeople will be more in the spotlight of any visualization or comparison that groups reimbursements by party (the same is valid for grouping by state, by the way)
Thus I would like to invite other maintainers to think and rethink how useful these visualizations are and what risks they imply in terms of interpretation, bias and skewed meaning of data.

Yes, I've built a dashboard focused on parties, but no, that's not the main point of this PR. I want to provide a place where you guys can go when you want to develop an analytics UI, and I've built one single dashboard just for demonstration (it took me 15 minutes to build it). Grafana is soooo easy to learn, it's open source and also has excellent documentation, so anyone could create new dashboards and easily add them to Grafana just by adding their JSON files to the provisioning folder.

Just in case you are new to Grafana... we don't need to write the JSON. We build the dashboard using the UI and then we can export it to a JSON file.

image

Pragmatic: complexity of the code base

In spite of the great documentation of the PR, no file makes this documentation part of the code base. Current and future users would have to come back to this PR to understand part of the docker-compose.yml and the grafana/ and prometheus/ directories. And it's a lot of code (loooong JSONs and a bunch of YAMLs specifying how these dashboards should work).

I do think a more experimental use of Prometheus & Grafana can be interesting for exploring this data, and I'm not asking for a use case, for example. However, I think most of what is documented in this PR should feature, for example, in the (so far, non-existing) grafana/README.md and prometheus/README.md. Also, I think the services added to docker-compose.yml should feature somewhere in README.md or CONTRIBUTING.md, for example.

Hmmmm, how could I forget that? 🤔
I strongly agree that a README.md is necessary so more people feel comfortable contributing to this. I will try to write them ASAP

@ArthurSens
Author

I do have some end-of-semester exams to study for and some other stuff behind schedule at work 😟 so I won't be able to advance this PR right now...

I will try to save some time on the weekend for it

@cuducos
Collaborator

cuducos commented Jun 3, 2019

That's awesome, @ArthurSens. I think my role as a volunteer code reviewer is done now. I'll let @sergiomario and the Open Knowledge crew ponder the questions I raised : )

Once more, many thanks for the awesome PR.

Mario, LGTM, counting on the forthcoming changes Arthur mentioned in the conversation, and taking into account my disclaimer (I just reviewed the code, haven't tested the new apps).

@ArthurSens
Author

Just commenting on this last commit...

I've made the changes you asked for in the docker-compose regarding its indentation, aaaaand I've changed the Prometheus and Grafana images to their originals...

I think that having to build a new image every time a new dashboard is built would have a bad impact on contributions, so maybe using external volumes would be a better option here.

With this option you can change the Prometheus configuration just by editing prometheus.yml on your host machine... Adding new dashboards follows the same logic: add the JSON file to the monitoring folder, with no need to build a new image after that
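In docker-compose.yml that could look roughly like this (a sketch: the image tags come from the Dockerfiles in this PR, while the host paths assume this repository's directory layout):

```yaml
prometheus:
  image: prom/prometheus:v2.10.0
  volumes:
    # Edit prometheus.yml on the host; no image rebuild needed
    - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
grafana:
  image: grafana/grafana:6.1.3
  volumes:
    # Drop dashboard JSONs into the provisioning folders on the host
    - ./grafana/provisioning:/etc/grafana/provisioning
```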

I will work on the README.md over the next few days... I hope I can finish it by next weekend

@cuducos
Collaborator

cuducos commented Jun 4, 2019

I think that having to build a new image every time a new dashboard is built would have a bad impact on contributions, so maybe using external volumes would be a better option here.

Awesome ❤️

arthur.sens and others added 3 commits June 11, 2019 15:20
Documentation created and new monitoring dashboard for prometheus and grafana
@ArthurSens
Author

Documentation created...
I've tried to follow Jarbas and Rosie's READMEs as examples when structuring Prometheus and Grafana's READMEs, but please tell me if there are changes I still need to make

@ArthurSens ArthurSens closed this Nov 7, 2021