<a href="https://colab.research.google.com/github/mbaersch/piwik-pro-broken-event-checker/blob/main/PiwikPRO_Broken_Events.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Piwik PRO Debugger API Example: "Near Real-Time" Alerting with Piwik PRO & Slack

**Check Piwik PRO Debugger API for broken events and send debug info as Slack notification** (demo walkthrough).

## Why a "Broken Event Checker"?
Piwik PRO counts events that cannot be processed as *broken events*. They do not appear in any report beyond the real-time view, but they do show up in your hit statistics. That means that broken events count towards your limits.

When you are not able to catch such an event, it is very hard to tell what's wrong.

## Using the Debugger API
Piwik PRO offers a separate API to access data from the tracker debugger. Every event, along with information about the session and visitor, is available here. A second endpoint even returns the raw incoming requests, including headers and all parameters and values.

This is valuable information if you want to find out when and why broken events occur, and it will hopefully help you fix the problem.

This example is a step-by-step walkthrough of the following tasks:

- use Python to get data out of the API for the last x minutes
- look for broken events
- send a message via Slack if broken events are found

This code is meant to be executed regularly. [The example repository](https://github.com/mbaersch/piwik-pro-broken-event-checker) contains some ideas on how to do that. This notebook serves more as a demo of the individual steps.

## Using Slack Apps
The function provided in this example uses a webhook to send Slack messages. For more information about receiving messages in Slack via a webhook, see [this help article from Slack](https://api.slack.com/messaging/webhooks).


## Links
You can find additional code in the [example repository](https://github.com/mbaersch/piwik-pro-broken-event-checker).

### Piwik PRO API
- [How to get API credentials](https://help.piwik.pro/support/questions/generate-api-credentials/)
- [Tracker Debugger API](https://developers.piwik.pro/en/latest/data_collection/api/tracker_debugger_api.html)

### Slack API
- [Slack Apps](https://api.slack.com/apps)
- [Slack App Webhooks](https://api.slack.com/messaging/webhooks)

### Python
- [Using "requests" for sending API calls](https://www.w3schools.com/python/ref_requests_post.asp)
- [Understanding "response" attributes](https://www.w3schools.com/python/ref_requests_response.asp)

## Preparations
Follow these steps in order to execute the following code blocks:


### Get API Credentials
Follow the steps from [this help article](https://help.piwik.pro/support/questions/generate-api-credentials/) to get a new API access token:

1. click on your email address in the menu
2. click *API keys*
3. click the *Create a key* button, enter a name and save
4. copy both *Client ID* and *Client secret*. They will be needed to get data from the API

**Note**: keep both values in a safe place. You cannot retrieve them later via the UI, and anybody who has the credentials can access your data!

In order to use the credentials in this notebook, we will store them in the *Secrets* manager. If you want to skip this part, adjust the code below to directly contain your credentials instead of reading them from *Secrets*.

### Storing your credentials in Colab *Secrets*
Click on the key symbol in the left pane and open the *Secrets* UI. Add two entries for your client ID and client secret. Name them `ppClientID` and `ppClientSecret`. We will use these names to access the values without putting them directly in the code.


### Create Slack App (if you do not have one already)
We do not really *need* Slack for this example but it is a nice feature that allows real-time alerting.

If you do not want to use this feature, you can adjust the code and delete or comment out the line beginning with `hook_response`.  



## Start using Python
Your Slack app will have a *Webhook URL* in the following format:

```
https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
```

We will use this URL to send our alerts. Just like the API credentials mentioned above, we store the URL for this example in the Colab *Secrets* Manager in order to keep public code and secret credentials separated. Use the name `slackWebhookUrl`.

To read values from *Secrets*, we import `userdata` from the `google.colab` package, in addition to `requests`, which is needed to send requests to the webhook URL.

Using `requests`, we send a POST request to the webhook URL and add a simple message as JSON payload.
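
A minimal sketch of that POST request wrapped in a reusable helper. The function name `send_slack_message` and the injectable `post` parameter are assumptions made for illustration and are not part of the example repository. `post` defaults to `requests.post` and only exists so the helper can be tried without a real webhook. Slack answers an accepted webhook call with HTTP status 200, which is what the return value reflects.

```python
def send_slack_message(webhook_url, text, post=None):
    """Send `text` to a Slack incoming webhook.

    Returns True if Slack answered with HTTP 200, False otherwise.
    `post` is a hypothetical hook for testing; requests.post is the default.
    """
    if post is None:
        import requests  # imported lazily so a fake `post` works without the package
        post = requests.post
    # json= serializes the payload and sets the Content-Type header
    response = post(webhook_url, json={"text": text}, timeout=10)
    return response.status_code == 200
```

With a real webhook URL, `send_slack_message(webhook_url, "This is a test message from Colab!")` should return `True` and the message should show up in the connected channel.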



### Sending a test alert
When you execute the following block, a new message should appear in the Slack channel that is connected to your app.

In [None]:
import requests
from google.colab import userdata

webhook_url = userdata.get('slackWebhookUrl')
#this could be your webhook URL directly, example:
#webhook_url = 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'

requests.post(webhook_url, headers = {"Content-type": 'application/json'}, json = {"text": "This is a test message from Colab!"})

### Getting a token with your credentials
Communication with the API requires a token that is sent as a header along with your API requests.

This token can be obtained by calling an authorization endpoint with your credentials. In order to do that, we define a separate function `get_auth_token()` and then call it to get a token. Even if this example only contains one call to this function, it is a good idea to keep this task in a separate code block. It makes things easier if you want to reuse the code for other API projects.

As we need `credentials`, we will store them in a separate `dict` variable. This might seem like overkill, but it is a good idea for the same reason that we create a function for getting a token. Also, we will work with the JSON response from the API in the same way, converting the content to a dict structure.

The URL of your instance (`site_url`) and the site ID (`site_id`) are not that sensitive (everyone can read them in the browser on your site), so we can define those directly in the code.

The last two lines in the following block call the function, store the response in a variable `token` and print the value to the console.

In [None]:
site_id = "bb338e1a-12f8-5353-ac63-9fd8b1f928a1"
site_url = "https://mbsl.piwik.pro"

credentials = {
  "client_id": userdata.get('ppClientID'),
  "client_secret": userdata.get('ppClientSecret')
}

def get_auth_token(credentials, site_url):
    auth_body = {"grant_type": "client_credentials", "client_id": credentials["client_id"], "client_secret": credentials["client_secret"]}
    return requests.post(site_url + '/auth/token', data=auth_body).json()["access_token"]

#--------------------------------------------------------------------------------------------

token = get_auth_token(credentials, site_url)
print(token)
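
`get_auth_token()` fails with a bare `KeyError` when the response contains no `access_token` key, for example because the credentials were rejected. A minimal sketch of a stricter extraction step, assuming the parsed JSON response is passed in as a `dict`. The helper name `extract_access_token` is hypothetical and the exact shape of an error response is not covered here.

```python
def extract_access_token(token_response):
    """Return the access token from a parsed /auth/token response.

    Raises ValueError that includes the server's answer if no token is
    present, which is easier to diagnose than a bare KeyError.
    """
    token = token_response.get("access_token")
    if not token:
        raise ValueError("No access token in response: " + str(token_response))
    return token
```

Inside `get_auth_token()`, the result of `.json()` could be passed to this helper instead of being indexed with `["access_token"]` directly.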

## Piwik PRO Tracker Debugger API

The [Tracker Debugger API](https://developers.piwik.pro/en/latest/data_collection/api/tracker_debugger_api.html) has two different endpoints / methods:

- stream of events
- stream of logs

We will use both in this example. The first one will be the stream of events, where we will look for broken events.

To get this done, we will need a few more variables, like the type of events that we want to filter. But in the following block we can "hard-code" all of them and see what an API call looks like and what we get as a response.

So we need to send an API request with the same `requests` package that was already used for the Slack example above, this time to a different endpoint, `/api/tracker/v1/debugger`, with some parameters:

- `app_id` contains our site ID from the variable defined earlier
- `lookup_window` defines how far we look into the past. It is a value in minutes, so this request will get data from the last 60 minutes
- `limit` can be used to control the number of sessions that will be returned. For a demo, a single session is enough, so the value is *1*
- `event_type` filters for sessions that contain a specific event. The different types are described [here](https://developers.piwik.pro/en/latest/data_collection/api/tracker_debugger_api.html). We use *8* for all *goal conversion* events.

Additionally, we have to define an `Authorization` header with the value `Bearer`, followed by our token.

The response will contain the most recent session from the last 60 minutes with at least one *goal conversion* event. The data received is printed to the console using `print()`.

In [None]:
url = site_url + '/api/tracker/v1/debugger?app_id=' + site_id + \
                 '&lookup_window=60&limit=1&event_type=8'
response = requests.get(url, headers={"Authorization": 'Bearer ' + token})
print(response.content.decode())
#readable version:
#print(response.content.decode().replace(",", ",\n"))
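
The query string above is concatenated by hand. A minimal sketch of the same URL assembled with `urllib.parse.urlencode` from the standard library, which also takes care of escaping. The function name `build_debugger_url` and the placeholder instance URL and site ID in the usage note are made up for illustration.

```python
from urllib.parse import urlencode

def build_debugger_url(site_url, site_id, lookup_window=60, limit=1, event_type=8):
    """Build the URL for the debugger endpoint of the Tracker Debugger API."""
    params = {
        "app_id": site_id,
        "lookup_window": lookup_window,  # minutes to look back
        "limit": limit,                  # maximum number of sessions
        "event_type": event_type,        # 8 = goal conversion
    }
    return site_url + "/api/tracker/v1/debugger?" + urlencode(params)
```

Calling `build_debugger_url("https://example.piwik.pro", "abc-123")` yields a URL with the same four parameters as the cell above. Alternatively, `requests.get()` accepts such a dict directly via its `params` argument.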

## Working with responses using JSON
The response can be parsed as JSON, which gives us a `dict` in Python. That structure allows access to specific values. So we import another package, parse the response content string and then extract single values like `session_total_goal_conversions` or `session_entry_url`, or all events, which are stored in an array under the key `events`. This only happens if the API request actually returned a response. If not, we just log that there are *no events*.

In [None]:
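import json

#a Response object is truthy for status codes below 400;
#also make sure that the body is not empty before parsing it
if response and response.content:
    cnt = response.content.decode()
    json_cnt = json.loads(cnt)

    print("Goal conversions: " + str(json_cnt["session_total_goal_conversions"]))
    print("Session entry URL: " + str(json_cnt["session_entry_url"]))
    print(json_cnt["events"])
else:
    print("no events")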
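
With `limit=1`, the response body holds a single JSON object. The complete code example at the end of this notebook treats a response with several sessions as one JSON object per line. A minimal sketch of parsing such a body into a list of dicts while skipping empty lines. The function name `parse_sessions` and the sample data are made up for illustration.

```python
import json

def parse_sessions(body):
    """Parse a body with one JSON object per line into a list of session dicts."""
    sessions = []
    for line in body.splitlines():
        line = line.strip()
        if line:  # ignore empty lines, e.g. after a trailing newline
            sessions.append(json.loads(line))
    return sessions

# made-up sample body with two sessions
sample = (
    '{"session_entry_url": "https://example.com/", "events": []}\n'
    '{"session_entry_url": "https://example.com/shop", "events": []}\n'
)
for session in parse_sessions(sample):
    print(session["session_entry_url"])
```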


This gives us everything we need to find specific events, extract data from an API response and send a message using Slack.

To build a simple application for real-time reporting, we only have to put it all together and add some code that iterates through multiple sessions, searches the events for matches and then gets the log info from another endpoint, `/api/tracker/v1/log`, in order to compose an alert and send it to Slack. Strictly speaking, it is "near real-time" reporting, because the code will be executed at a fixed interval, like once every hour.
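
The search and alert steps can be sketched as two small functions. This is only an illustration under assumptions: the names `find_events` and `compose_alert` are hypothetical, and the sample session is made up. The event layout (an `event_type` pair of ID and name, plus an optional `error_message`) follows what the complete code example reads from the API response.

```python
def find_events(session, event_type_id):
    """Return all events of a session whose type ID matches event_type_id."""
    return [e for e in session.get("events", []) if e["event_type"][0] == event_type_id]

def compose_alert(event, log_info=""):
    """Build the Slack message text for one matching event."""
    message = event.get("error_message", "no errors")
    text = "*PP Event Checker Alert*: " + event["event_type"][1] + " found. _Message_: " + message
    if log_info:
        fence = "`" * 3  # Slack renders text between triple backticks as a code block
        text += "\n" + fence + log_info + fence
    return text

# made-up sample session: 17 is the ID used for broken events in this notebook
sample_session = {"events": [
    {"event_id": 1, "event_type": [1, "Page view"]},
    {"event_id": 2, "event_type": [17, "Broken event"], "error_message": "bad url"},
]}
for event in find_events(sample_session, 17):
    print(compose_alert(event))
```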

## Complete code example

In [None]:
############################################################################################
"""
Piwik PRO API Demo
===================
Example : "near real-time" alerting for broken events in Piwik PRO
Version : 0.2.1 2024-07-16
Author  : Markus Baersch
Contact : mail@markus-baersch.de / https://www.markus-baersch.de
"""
############################################################################################

import requests
import json
from google.colab import userdata

search_debug_type = 17 #8 = Goal, 4 = Search, 17 = broken event, 18 = excluded event
site_id = "bb338e1a-12f8-5353-ac63-9fd8b1f928a1"
site_url = "https://mbsl.piwik.pro"
session_limit = 1
lookup_window = 300

#--------------------------------------------------------------------------------------------

webhook_url = userdata.get('slackWebhookUrl')
credentials = {
  "client_id": userdata.get('ppClientID'),
  "client_secret": userdata.get('ppClientSecret')
}

#--------------------------------------------------------------------------------------------

def get_auth_token(credentials, site_url):
    auth_body = {"grant_type": "client_credentials", "client_id": credentials["client_id"], "client_secret": credentials["client_secret"]}
    return requests.post(site_url + '/auth/token', data=auth_body).json()["access_token"]

#--------------------------------------------------------------------------------------------

token = get_auth_token(credentials, site_url)

auth_header = {"Authorization": 'Bearer ' + token}
rep_response = None

try:
    rep_response = requests.get(site_url + '/api/tracker/v1/debugger?app_id=' + site_id + '&lookup_window=' +\
                  str(lookup_window) + '&limit=' + str(session_limit) + '&event_type=' +\
                  str(search_debug_type), headers=auth_header)
    #requests only raises an HTTPError when asked to check the status code
    rep_response.raise_for_status()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 401:
        print("Auth token is no longer valid.")
    else:
        print("Request error occurred.")
        raise

if rep_response:
    cnt = rep_response.content.decode()
    if cnt.strip() == "":
        print("no events")
    else:
        #the response contains one JSON object (session) per line
        for session in cnt.splitlines():
            if not session.strip():
                continue
            try:
                #print(session)
                #print(session.replace(",", ",\n"))
                json_session = json.loads(session)
                rep_start = json_session["server_time"]
                rep_end = json_session["updated_at"]
                for event in json_session["events"]:
                    #event_type is a pair of ID and name
                    if event["event_type"][0] == search_debug_type:
                        err = event.get("error_message", "no errors")
                        print(event["event_id"], err)
                        #get the raw request for this event from the log endpoint
                        log_url = site_url + '/api/tracker/v1/log?app_id='+ site_id+'&event_ids=' + str(event["event_id"]) +\
                                  '&server_time_min=' + rep_start + '&server_time_max=' + rep_end
                        log_response = requests.get(log_url, headers=auth_header)
                        loginfo = log_response.content.decode()
                        print(loginfo)
                        hook_payload = {"text" : '*PP Event Checker Alert*: ' + event["event_type"][1] +\
                                        " found. _Message_: " + err +"\n```" + loginfo + "```"}
                        hook_response = requests.post(webhook_url, headers = {"Content-type": 'application/json'}, json = hook_payload)
                        #one alert per session is enough
                        break
            except Exception as e:
                print("error parsing events:", e)