<h1 align='center'> Building your first performance monitor </h1>

<h3 align='center'>With Laura G. Funderburk</h3>

<h4 align='center'>Data Scientist at Cybera</h4>

<h4 align='center'>Support DevOps Engineer, iReceptor project </h4>

<h4 align='center'>PyLadies Berlin </h4>

<h4 align='center'>April 6 2021 </h4>

<h2 align='center'> Introductions </h2>


<center><img src="https://media1.giphy.com/media/j1soPQE95y0eXhMwKT/source.gif" alt="Drawing" style="width: 500px;"/></center>


## What does the performance monitor I will show you look like? 

https://github.com/lfunderburk/Performance-Monitor

## How I do it

### Part I: Perform queries on a regular schedule

1. Identify an API (this talk focuses on REST APIs, but the approach can be extended to other kinds of APIs)
2. Write a Python script, `perform_queries.py`, to perform a query, track how long it takes, and save the results in a CSV file
3. Use Bash to create a wrapper script for `perform_queries.py`, called `automate_queries.sh` (an advantage when you want to query multiple APIs)
4. Use Crontab to set up a regular schedule for `automate_queries.sh`
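
Step 2 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual `perform_queries.py` from the repository. The function names, the `url` column, and the endpoint in the usage example are assumptions made for this sketch. The `query.*` column names follow the sample output file shown later in the talk.

```python
import csv
import json
import time
import urllib.request
from datetime import datetime
from pathlib import Path


def fetch_json(url, timeout=30):
    """Send a GET request and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def timed_query(url, fetch=fetch_json):
    """Run one query and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = fetch(url)
    return result, time.perf_counter() - start


def append_row(csv_path, row):
    """Append one row to a CSV file, writing the header if the file is new."""
    csv_path = Path(csv_path)
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def record_query(url, csv_path, fetch=fetch_json):
    """Time one query and log its duration; return None if the query fails."""
    now = datetime.now()
    try:
        _, elapsed = timed_query(url, fetch)
    except Exception as err:
        print(f"Could not complete query: {err}")
        return None
    append_row(csv_path, {
        "url": url,
        "query.lasted": round(elapsed, 6),
        "query.date": now.strftime("%Y-%m-%d"),
        "query.time": now.strftime("%H:%M:%S"),
    })
    return elapsed


if __name__ == "__main__":
    # Assumed endpoint, chosen only for illustration.
    record_query("https://api.carbonintensity.org.uk/intensity", "query_times.csv")
```

Passing the `fetch` function as a parameter is a design choice for this sketch: it lets you try out the timing and logging logic with a fake query before pointing it at a real API.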


## How I do it

### Part II: Use Python + Bash + GitHub + GitHub pages to visualize and share results

5. Write a Python script, `plot_results.py`, to parse the CSV results and generate plots
6. Create a new GitHub repository and clone it to your local computer
7. Write a wrapper script for `plot_results.py`, called `publish_results.sh`, that pushes the plots generated as HTML files to the repository
8. Activate GitHub Pages
9. Add the Bash publishing script to crontab on a regular schedule

## Why I do it this way

This is a process I have been curating over the course of two years through experience supporting DevOps tasks. 

The team wanted to identify timeouts in their database and prevent them. 

The performance monitor I built for them lets them see when query times are getting close to the cutoff time, and that rebooting the service cuts query time in half. 

This performance monitor won't assess underlying causes for timeouts, and should only be used as a part of a systematic approach to assess how long different queries take.

    # Example
    %run -i performance_monitor.py

Output:

    0.8223390579223633
    2.769749879837036
    HTTP error occurred: 400 Client Error: Bad Request for url: https://api.carbonintensity.org.uk/intensity/stats/2021-03-05T23:59Z/2021-04-05T23:59Z
    Could not complete query


    # An example output file
    sample_file = "./logs/_pT_2021-03-10_20-24-35_Query_Times_carbonintensity_intensity.csv"
    pd.read_csv(sample_file)

| Unnamed: 0.1 | Unnamed: 0 | from | to | intensity.forecast | intensity.actual | intensity.index | query.lasted | query.date | query.time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2021-03-11T03:30Z | 2021-03-11T04:00Z | 92 | 93 | low | 0.821728 | 2021-03-10 | 20:24:35 |


## What about Bash?

There are a few things we need to consider:

1. Depending on how often you want results, you might need a machine that is always on. I used Oracle VM VirtualBox on Windows to set up a Linux VM that runs 24/7.

2. You will also need to configure crontab so that it knows the path to Bash and other environment variables, and set up an SSH key so that cron jobs can push to GitHub without prompting for your credentials. 

3. Setting up a virtualenv for Python ensures the same dependencies on every run and minimizes errors. 

4. Wrapper scripts are not necessary, but they can greatly increase automation, especially when you scale up to several REST APIs and multiple queries within each of them. 

## Configuring Crontab

On Linux, type in the command line:

    crontab -e
    
Get a few environment basics right before anything else:

    * * * * * env > /tmp/env.output
    
The above means that this command will run every minute of every hour of every day. You only need it to run once, so you can comment it out afterwards. If your crontab editor is nano, press `CTRL + O` and `ENTER` to save, then `CTRL + X` to exit. 

Read the content via `cat /tmp/env.output`.


## Configuring Crontab

Tell it where to find Bash

    SHELL=/bin/bash
 
Tell it the path to working directory

    # Use your own username instead (comments must go on their own line in a crontab)
    PWD=/home/lauragf

Tell it any other paths your scripts depend on. Note the Anaconda directory for Python. The whole assignment must sit on a single line.

    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin:/home/lauragf/anaconda3/bin

## Setting up your script

Let's go take a look at our Bash scripts....

Once they are ready to go, make sure they can be executed. I like giving everyone execute permission, but this depends entirely on you.

    chmod a+x uk_emissions.sh
    
Test the script until you are sure it does what you want

    bash uk_emissions.sh

## Regular schedule through crontab 

    0 */2 * * * /PATH/TO/BASH/SCRIPT/uk_emissions.sh > /PATH/TO/BASH/SCRIPT/uk_emissions.out 2>&1
    
The above means that the script runs at minute 0 of every second hour (00:00, 02:00, and so on), every day. A crontab entry must sit on a single line. 
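
To see which times of day a schedule like this covers, here is a small sketch that expands the minute and hour fields by hand. The helper `expand_field` is hypothetical and handles only `*`, `*/n`, and a single number. It is not how cron itself is implemented.

```python
def expand_field(field, low, high):
    """Expand one cron field ('*', '*/n', or a single number) into a list of values."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


minutes = expand_field("0", 0, 59)    # minute field of the entry
hours = expand_field("*/2", 0, 23)    # hour field of the entry
run_times = [f"{h:02d}:{m:02d}" for h in hours for m in minutes]
print(run_times)  # 12 runs per day: 00:00, 02:00, ..., 22:00
```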

File descriptor 1 is the standard output (stdout).

File descriptor 2 is the standard error (stderr).

` 2>&1 ` means "redirect the stderr to the same place we are redirecting the stdout"
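
The same idea can be tried from Python. In this sketch, `stdout=log_file` plays the role of `> uk_emissions.out` and `stderr=subprocess.STDOUT` plays the role of `2>&1`. The child command is a stand-in invented for the example, not the real script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A stand-in for the scheduled script: one line on stdout, one on stderr.
child_code = "import sys; print('query finished'); print('query failed', file=sys.stderr)"

log_path = Path(tempfile.mkdtemp()) / "uk_emissions.out"
with log_path.open("w") as log_file:
    subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=log_file,           # like `> uk_emissions.out`
        stderr=subprocess.STDOUT,  # like `2>&1`
        check=True,
    )

# Both messages end up in the same file (their order may vary with buffering).
log_text = log_path.read_text()
print(log_text)
```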

## Plotting results

    # Need to visualize
    %run -i plot_performance.py "./logs/" "intensity"
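
A plotting script along these lines might look like the sketch below. It is not the repository's actual script. It assumes that the two arguments are a log directory and a keyword found in the CSV file names, and that each CSV has the `query.lasted`, `query.date`, and `query.time` columns from the sample file. The function names and the output file name are made up for illustration. Plotly must be installed for `write_plot` to work.

```python
import csv
import sys
from pathlib import Path


def load_query_times(log_dir, keyword):
    """Collect (timestamp, seconds) pairs from every CSV in log_dir whose name contains keyword."""
    points = []
    for csv_path in sorted(Path(log_dir).glob(f"*{keyword}*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                timestamp = f"{row['query.date']} {row['query.time']}"
                points.append((timestamp, float(row["query.lasted"])))
    return sorted(points)


def write_plot(points, out_path):
    """Write an interactive HTML line chart of query duration over time."""
    import plotly.graph_objects as go  # imported lazily; requires plotly

    timestamps = [point[0] for point in points]
    durations = [point[1] for point in points]
    fig = go.Figure(go.Scatter(x=timestamps, y=durations, mode="lines+markers"))
    fig.update_layout(title="Query time", xaxis_title="Date", yaxis_title="Seconds")
    fig.write_html(out_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    log_dir, keyword = sys.argv[1], sys.argv[2]
    write_plot(load_query_times(log_dir, keyword), f"{keyword}.html")
```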

## We can write a wrapper script for the plotting script

This script will have the same pieces as before, except that instead of executing the query script, it will parse the results and generate an HTML Plotly plot that we can embed in a webpage. 

## Sharing results

Now that we have created plots...it is time to share!

1. Create a GitHub repository
2. Clone the repository locally
3. Push the HTML files that our script generated
4. Activate GitHub Pages
5. If you like, you can get fancy and export the HTML as objects and add any customization you want


## Sharing results

6. To automate, we will need to create SSH keys. Here is a guide: https://zzpanqing.github.io/2017/02/28/github-push-without-username-and-password.html
7. Fiddle with it a bit if it doesn't work at first: https://stackoverflow.com/questions/10116373/git-push-error-repository-not-found 
8. Test, test, test, test via `bash publish_results.sh` until you can push without needing to provide credentials
9. Once it works, activate GitHub Pages
10. Add to the schedule via crontab, as often as you like (I like to run it twice a day)
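
The talk does the publishing step in Bash with `publish_results.sh`. For illustration only, the same add, commit, and push sequence is sketched here in Python. The function name `publish`, the commit message, and the `run` parameter are assumptions of this sketch, not part of the repository.

```python
import subprocess
from datetime import datetime


def publish(repo_dir, run=subprocess.run):
    """Stage, commit, and push everything in repo_dir; return the commands that ran."""
    message = f"Update plots {datetime.now():%Y-%m-%d %H:%M}"
    commands = [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]
    for command in commands:
        # check=True raises if a step fails. Note that `git commit` exits
        # non-zero when there is nothing new, which stops before the push.
        run(command, cwd=repo_dir, check=True)
    return commands
```

Calling `publish("/PATH/TO/REPOSITORY")` from a scheduled job only works without prompts once the SSH key from step 6 is in place.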