<a href="https://colab.research.google.com/github/janilles/couch/blob/master/couch_GA_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# c25k app
## Reports via the Google Analytics API v4
Documentation:
- App's tracking guide for Google Analytics tagging. 
- [Google Analytics reporting API v4](https://developers.google.com/analytics/devguides/reporting/core/v4/) dev guide.
- Markdown text [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) for formatting text cells.


## Enable the API

### Upload JSON file with credentials
To get started with the Analytics Reporting API v4, first [use the setup tool](https://console.developers.google.com/flows/enableapi?apiid=analyticsreporting.googleapis.com&credential=client_key), which guides you through creating a project in the Google API Console, enabling the API and creating credentials. It generates a JSON file with the credentials (the service account key), which is uploaded in the cell below.

In [0]:
# to be able to upload a file  
from google.colab import files

# upload couch-to-5k-207810-201ce08efe2e.json
files.upload()
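
Before running the reports, it can help to confirm that the uploaded file really is a service account key and to read off the email address that needs access in Google Analytics and Google Sheets. A minimal sketch, assuming the standard service account key layout (a `type` field set to `service_account` and a `client_email` field); `service_account_email` is a hypothetical helper, not part of any Google library.

```python
import json

def service_account_email(key_path):
    """Return the service account email stored in a JSON key file.

    Hypothetical helper: assumes the standard service account key layout
    with 'type' and 'client_email' fields.
    """
    with open(key_path) as f:
        key = json.load(f)
    if key.get('type') != 'service_account':
        raise ValueError(f"{key_path} is not a service account key file")
    return key['client_email']

# usage in this notebook:
# service_account_email('couch-to-5k-207810-201ce08efe2e.json')
```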


### Import libraries and dependencies

In [0]:
# Google API dependencies
# (googleapiclient replaces the deprecated 'apiclient' alias)
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import httplib2

# assigning variables used in initialising the analytics functions below
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
DISCOVERY_URI = ('https://analyticsreporting.googleapis.com/$discovery/rest')
# generated by Google - see JSON file section above
KEY_FILE_LOCATION = 'couch-to-5k-207810-201ce08efe2e.json' 
# the service account email needs access in GA, otherwise the API returns a 403 error
SERVICE_ACCOUNT_EMAIL = 'couch-to-5k@couch-to-5k-207810.iam.gserviceaccount.com' 

# used in all (but stitching) reports below
VIEW_ID = '171109278' # Couch To 5k v3 - Main

# boilerplate libraries
import pandas as pd

# plotting within the notebook dependencies
%matplotlib inline
import matplotlib.pyplot as plt
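# the 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6
try:
    plt.style.use('seaborn-v0_8')
except OSError:
    plt.style.use('seaborn')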


### Initialise analytics reporting

In [0]:
def initialize_analyticsreporting():
    """    
    Initializes an analyticsreporting service object.

    Returns:
    authorized analyticsreporting service object.
    """
    
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION,
        scopes=SCOPES)

    http = credentials.authorize(httplib2.Http())

    # Build the service object.
    analytics = build('analytics',
                      'v4',
                      http=http,
                      discoveryServiceUrl=DISCOVERY_URI)

    return analytics

analyticsreporting = initialize_analyticsreporting()

# parses API results to a pandas DataFrame
def response_to_DataFrame(response):

    rows_list = []

    # get report data
    for report in response.get('reports', []):
        # set column headers
        columnHeader = report.get('columnHeader', {})
        dimensionHeaders = columnHeader.get('dimensions', [])
        metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
        rows = report.get('data', {}).get('rows', [])

        for row in rows:
            # create a dict for each row
            row_dict = {}
            dimensions = row.get('dimensions', [])
            dateRangeValues = row.get('metrics', [])

            # fill dict with dimension header (key) and dimension value (value)
            for header, dimension in zip(dimensionHeaders, dimensions):
                row_dict[header] = dimension

            # fill dict with metric header (key) and metric value (value)
            # (with several date ranges, later ranges overwrite earlier ones)
            for values in dateRangeValues:
                for metric, value in zip(metricHeaders, values.get('values')):
                    # set int as int, float as float
                    if '.' in value:
                        row_dict[metric.get('name')] = float(value)
                    else:
                        row_dict[metric.get('name')] = int(value)

            # one dict per row, appended once all metrics are filled in
            rows_list.append(row_dict)

    return pd.DataFrame(rows_list)
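
For reference, `response_to_DataFrame` walks a response shaped like the one below: a list of reports, each with a `columnHeader` and `data.rows`, and every value returned as a string. The mock is made up for illustration, and `rows_from_report` is a hypothetical compact equivalent of the parsing step (first date range only), handy for checking the logic without calling the API.

```python
import pandas as pd

# made-up response in the Reporting API v4 shape (values are illustrative)
mock_response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:dimension1', 'ga:operatingSystem'],
            'metricHeader': {'metricHeaderEntries': [
                {'name': 'ga:totalEvents', 'type': 'INTEGER'}]},
        },
        'data': {'rows': [
            {'dimensions': ['Trainer A', 'iOS'], 'metrics': [{'values': ['12']}]},
            {'dimensions': ['Trainer A', 'Android'], 'metrics': [{'values': ['7']}]},
        ]},
    }]
}

def rows_from_report(report):
    """Hypothetical compact parser: one dict per row, first date range only."""
    header = report.get('columnHeader', {})
    dim_names = header.get('dimensions', [])
    metric_names = [m['name'] for m in
                    header.get('metricHeader', {}).get('metricHeaderEntries', [])]
    records = []
    for row in report.get('data', {}).get('rows', []):
        record = dict(zip(dim_names, row.get('dimensions', [])))
        values = row['metrics'][0]['values']  # first (and only) date range
        for name, value in zip(metric_names, values):
            record[name] = float(value) if '.' in value else int(value)
        records.append(record)
    return records

df_mock = pd.DataFrame(rows_from_report(mock_response['reports'][0]))
# columns: ga:dimension1, ga:operatingSystem, ga:totalEvents
```

A sturdier alternative would be to branch on each metric header's `type` field (for example `INTEGER` or `FLOAT`) rather than on the presence of a decimal point.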


## Install, import and initialise libraries for creating Google Sheets from Pandas dataframes
The Google Sheet we're importing into or from has to be shared with the service email account.  
In this case: couch-to-5k@couch-to-5k-207810.iam.gserviceaccount.com


In [0]:
!pip install -q gspread
!pip install -q gspread-dataframe

In [0]:
import gspread
import gspread_dataframe as gd

# Already imported above as Google API dependency:
# from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('couch-to-5k-207810-201ce08efe2e.json', scope)

gc = gspread.authorize(credentials)

### How to create Google Sheets from Pandas data frames
- The Google Sheet we're importing into or from has to be shared with the service account email (in this case: couch-to-5k@couch-to-5k-207810.iam.gserviceaccount.com).
- Template for creating worksheets:
```python
ws = gc.open("Google Doc name").worksheet("Sheet name")
gd.set_with_dataframe(ws, df)
```
- Template for importing worksheets:
```python
ws = gc.open("Google Doc name").worksheet("Sheet name")
df = gd.get_as_dataframe(ws, usecols=[0,1,2,3,]) # specify which columns to get
```

# Google guide for obtaining reports via the API 

## How to request analytics data
```
response = analyticsreporting.reports().batchGet()
```
https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet
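
Every report below sends essentially the same request body to `batchGet()`, differing only in metrics, dimensions and filters. A minimal sketch of a helper that assembles that body, assuming the same field names used in the report cells; `build_report_request` and `dimension_filter` are hypothetical helpers, not part of the API client. Separate entries in `dimensionFilterClauses` are combined with AND, which is how the reports stack their event, run-number and OS filters.

```python
def dimension_filter(dimension, operator, expressions):
    """One dimension filter clause holding a single filter (hypothetical helper)."""
    return {"filters": [{"dimensionName": dimension,
                         "operator": operator,
                         "expressions": list(expressions)}]}

def build_report_request(view_id, date_ranges, metrics, dimensions,
                         filter_clauses=(), page_size=100000):
    """Assemble a batchGet request body for a single report (hypothetical helper)."""
    return {
        "reportRequests": [{
            "viewId": view_id,
            "pageSize": page_size,
            "dateRanges": date_ranges,
            "metrics": [{"expression": m} for m in metrics],
            "dimensions": [{"name": d} for d in dimensions],
            # separate clauses are ANDed together by the API
            "dimensionFilterClauses": list(filter_clauses),
        }]
    }

# usage sketch, mirroring the 'Before You Run' report:
# body = build_report_request(
#     VIEW_ID, dates, ["ga:totalEvents"],
#     ["ga:dimension1", "ga:dimension2", "ga:dimension4",
#      "ga:dimension8", "ga:operatingSystem"],
#     [dimension_filter("ga:eventAction", "EXACT", ["Before You Run"]),
#      dimension_filter("ga:dimension2", "BEGINS_WITH", ["Week"]),
#      dimension_filter("ga:operatingSystem", "IN_LIST", ["iOS", "Android"])])
# response = analyticsreporting.reports().batchGet(body=body).execute()
```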

## Dimensions and metrics explorer

This dimensions and metrics explorer lists and describes all the dimensions and metrics available through the Core Reporting API: https://developers.google.com/analytics/devguides/reporting/core/dimsmets

# Selecting date range
You can use the 'dates' variable below to select the same date range for all reports.  
When selecting lifetime data that includes older app versions (i.e. a different Google Analytics property and view ID), select the most recent end date so that users who haven't updated their apps to the latest version are included.

## Important dates 
* "Couch to 5k - v1" -- started on 1 March 2016
* "Couch to 5k - v2" -- started on 17 April 2017 
* "Couch to 5k - v3" -- started on 24 May 2018 
  * Some tags started firing correctly only from 22 June 2018 
  * Android screen views and events are not associated in the 'run interruptions' and 'badges' reports.
  * If the emoji selection issue in v3 (iOS=0-4, Android=1-5) is fixed so that iOS matches Android, there will be a cut-off date for the current report, and the correction step further below should not be run for data after it.

In [0]:
dates = [{ "startDate":"2018-06-22", "endDate":"2019-01-21"}]
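
The API also accepts relative values such as `today`, `yesterday` and `7daysAgo` in place of `YYYY-MM-DD` (Google's own v4 quickstart uses `7daysAgo` and `today`), which avoids editing the end date by hand before each refresh. Alternatively, the end date can be computed in Python; a minimal sketch, where `date_range_until_yesterday` is a hypothetical helper:

```python
from datetime import date, timedelta

def date_range_until_yesterday(start_date, today=None):
    """Build the 'dates' list used by the reports, ending yesterday (hypothetical helper)."""
    today = today or date.today()
    end = today - timedelta(days=1)
    return [{"startDate": start_date, "endDate": end.isoformat()}]

# e.g. dates = date_range_until_yesterday("2018-06-22")
# or let the API resolve it: dates = [{"startDate": "2018-06-22", "endDate": "yesterday"}]
```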

# Reports

## How do you feel before and after a run
Steps producing this report:
1. Get emojis before (df_emojiBefore)
2. Get emojis after (df_emojiAfter)
3. Stitch them together (df_emoji)

Drilldown by:
- Run number (week x run x)
- Run mode (initial v revisit)
- Trainer name
- Operating system (iOS v Android)

### Emoji selection before a run (df_emojiBefore)


In [0]:
# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number
              {"name":"ga:dimension4"}, # run mode
              {"name":"ga:dimension8"}, # how do you feel - before
              {"name":"ga:operatingSystem"}]


response = analyticsreporting.reports().batchGet(
    
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "pageSize": "100000", # this is the max number of rows the API returns
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "EXACT",
                    "expressions": ["Before You Run"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2",
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_emojiBefore = response_to_DataFrame(response)

df_emojiBefore.head()
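
`pageSize` caps a single page at 100,000 rows; if a report has more rows than that, the response includes a `nextPageToken`, which can be sent back as `pageToken` to fetch the next page. A minimal sketch that follows the tokens and merges all rows into one response dict of the same shape, so it can still be passed to `response_to_DataFrame`; `batch_get_all_pages` is a hypothetical helper and assumes a single report request per call.

```python
import copy

def batch_get_all_pages(service, report_request):
    """Follow nextPageToken until every row of one report has been fetched.

    Hypothetical helper: 'service' is the object returned by
    initialize_analyticsreporting(), 'report_request' is one entry of
    body['reportRequests'] (it is copied, not modified).
    """
    request = copy.deepcopy(report_request)
    merged, rows = None, []
    while True:
        response = service.reports().batchGet(
            body={"reportRequests": [request]}).execute()
        report = response["reports"][0]
        if merged is None:
            merged = report  # keep the column headers from the first page
        rows.extend(report.get("data", {}).get("rows", []))
        token = report.get("nextPageToken")
        if not token:
            break
        request["pageToken"] = token
    merged.setdefault("data", {})["rows"] = rows
    merged.pop("nextPageToken", None)
    return {"reports": [merged]}
```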


### Correcting iOS emoji values in df_emojiBefore
This step corrects the difference in emoji indexing between iOS (0-4) and Android (1-5). Both platforms should use the same range, as per the app's tracking guide. If this is fixed at a future date, this step won't be necessary, but the transition date will need to be noted. See the 'Selecting date range' section above. 

In [0]:
# change datatype of the custom dimension (emoji number) from 'object' to 'integer' 
# to be able to perform the calculation in the function below
df_emojiBefore['ga:dimension8'] = df_emojiBefore['ga:dimension8'].astype('int64')

# create a function to align the iOS values with Android
def OScorrectionBefore(row): 
    if row['ga:operatingSystem'] == 'iOS':
        return row['ga:dimension8'] + 1
    else:
        return row['ga:dimension8']
    
# create a new column with the aligned (i.e. OS-corrected) emoji values
df_emojiBefore['emoji'] = df_emojiBefore.apply(OScorrectionBefore, axis=1)

df_emojiBefore.head()
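
The same correction can be written without a row-by-row `apply`, which is usually much faster on large frames: add 1 wherever the OS is iOS. A minimal sketch, assuming the column names used in this notebook; `align_ios_emoji` is a hypothetical helper that would work for both the before (`ga:dimension8`) and after (`ga:dimension9`) columns.

```python
import pandas as pd

def align_ios_emoji(df, emoji_col, os_col='ga:operatingSystem'):
    """Shift iOS emoji values (0-4) onto the Android scale (1-5)."""
    emoji = df[emoji_col].astype('int64')
    # boolean mask -> 1 for iOS rows, 0 otherwise
    return emoji + (df[os_col] == 'iOS').astype('int64')

sample = pd.DataFrame({'ga:operatingSystem': ['iOS', 'Android', 'iOS'],
                       'ga:dimension8': ['0', '1', '4']})
sample['emoji'] = align_ios_emoji(sample, 'ga:dimension8')
# sample['emoji'] -> 1, 1, 5
```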


### Formatting df_emojiBefore

In [0]:
# drop ga:dimension8 column as it's not needed anymore
df_emojiBefore.drop(columns='ga:dimension8', inplace=True)

# convert relevant columns to data type 'category' - saves memory
df_emojiBefore['ga:dimension1'] = df_emojiBefore['ga:dimension1'].astype('category')
df_emojiBefore['ga:dimension2'] = df_emojiBefore['ga:dimension2'].astype('category')
df_emojiBefore['ga:dimension4'] = df_emojiBefore['ga:dimension4'].astype('category')
df_emojiBefore['ga:operatingSystem'] = df_emojiBefore['ga:operatingSystem'].astype('category')
df_emojiBefore['emoji'] = df_emojiBefore['emoji'].astype('category')

# rename column headers
df_emojiBefore.columns = ['trainerName',
                          'runNumber',
                          'runMode',
                          'OS',
                          'totalEventsBefore',
                          'emoji']

df_emojiBefore.head()


### Emoji selection after a run (df_emojiAfter)

In [0]:
# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number
              {"name":"ga:dimension4"}, # run mode
              {"name":"ga:dimension9"}, # how do you feel - after
              {"name":"ga:operatingSystem"}]


response = analyticsreporting.reports().batchGet(

    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "pageSize": "100000", # this is the max number of rows the API returns
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "EXACT",
                    "expressions": ["Save"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2",
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]  
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_emojiAfter = response_to_DataFrame(response)

df_emojiAfter.head()


### Correcting iOS emoji values in df_emojiAfter
This step corrects a mistake in the indexing of emojis on iOS (0-4). It should be the same as on Android (1-5), as per Panos' tagging document. If this is fixed at a future date, this step won't be necessary, but the transition date will need to be noted. See the 'Selecting date range' section above. 

In [0]:
# change datatype of the custom dimension (emoji number) from 'object' to 'integer' 
# to be able to perform the calculation in the function below
df_emojiAfter['ga:dimension9'] = df_emojiAfter['ga:dimension9'].astype('int64')

# create a function to align the iOS values with Android
def OScorrectionAfter(row):
  
    if row['ga:operatingSystem'] == 'iOS':
        return row['ga:dimension9'] + 1
    else:
        return row['ga:dimension9']

# create a new column with the aligned (i.e. OS-corrected) emoji values
df_emojiAfter['emoji'] = df_emojiAfter.apply(OScorrectionAfter, axis=1)

df_emojiAfter.head()


### Formatting df_emojiAfter

In [0]:
# drop ga:dimension9 column as it's not needed anymore
df_emojiAfter.drop(columns='ga:dimension9', inplace=True)

# convert relevant columns to data type 'category'
# saves memory (possibly also helpful for feeding Data Studio)
df_emojiAfter['ga:dimension1'] = df_emojiAfter['ga:dimension1'].astype('category')
df_emojiAfter['ga:dimension2'] = df_emojiAfter['ga:dimension2'].astype('category')
df_emojiAfter['ga:dimension4'] = df_emojiAfter['ga:dimension4'].astype('category')
df_emojiAfter['ga:operatingSystem'] = df_emojiAfter['ga:operatingSystem'].astype('category')
df_emojiAfter['emoji'] = df_emojiAfter['emoji'].astype('category')

# rename column headers
df_emojiAfter.columns = ['trainerName',
                         'runNumber',
                         'runMode',
                         'OS',
                         'totalEventsAfter',
                         'emoji']

df_emojiAfter.head()


### Joining up df_emojiBefore and df_emojiAfter into df_emoji

In [0]:
df_emoji = pd.concat([df_emojiBefore, df_emojiAfter])

df_emoji.head()


In [0]:
df_emoji = df_emoji.groupby(['runNumber',
                             'runMode',
                             'trainerName',
                             'OS',
                             'emoji'], as_index=False).sum()

df_emoji.head()


In [0]:
# replace NaN values seen in the dataframe above 
df_emoji[['totalEventsAfter',
          'totalEventsBefore']] = df_emoji[['totalEventsAfter',
                                            'totalEventsBefore']].fillna(0)

df_emoji.head()
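
Why the NaN values appear: `df_emojiBefore` only has `totalEventsBefore` and `df_emojiAfter` only has `totalEventsAfter`, so `pd.concat` fills the missing column with NaN on each side. Grouping by the shared keys and summing then puts both counts on one row per combination. A toy illustration with made-up numbers, using plain strings rather than categories to keep the output small (with categorical keys, `observed=True` in `groupby` keeps only combinations that actually occur):

```python
import pandas as pd

before = pd.DataFrame({'runNumber': ['Week 1 Run 1', 'Week 1 Run 1'],
                       'emoji': [3, 4],
                       'totalEventsBefore': [10, 5]})
after = pd.DataFrame({'runNumber': ['Week 1 Run 1'],
                      'emoji': [4],
                      'totalEventsAfter': [7]})

# concat aligns on column names; each side gets NaN for the column it lacks
combined = pd.concat([before, after], ignore_index=True)

# one row per (runNumber, emoji); sum skips NaN, then cast back to int
summary = (combined.groupby(['runNumber', 'emoji'], as_index=False)
                   [['totalEventsBefore', 'totalEventsAfter']].sum()
                   .astype({'totalEventsBefore': 'int64',
                            'totalEventsAfter': 'int64'}))
# emoji 3 -> before 10, after 0; emoji 4 -> before 5, after 7
```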


### Exporting df_emoji to Google Sheet

[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, name the worksheet tab as the dataframe imported into it 
ws_emoji = gc.open("couchColab").worksheet("df_emoji")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_emoji, df_emoji)


### Plotting df_emoji with the Altair library
https://altair-viz.github.io/gallery

#### Preparing data

In [0]:
df = df_emoji.groupby('emoji')[['totalEventsAfter',
                                'totalEventsBefore']].sum().reset_index()

df.head()


In [0]:
melted = pd.melt(df, 
                 id_vars=['emoji'], 
                 value_vars=['totalEventsAfter', 'totalEventsBefore'], 
                 value_name='events')

melted
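
`pd.melt` turns the two event columns into long format: one row per (emoji, column) pair, with the former column name in a `variable` column (the default `var_name`) and the count in `events`. That `variable` column is what the layered area chart uses for colour. A toy example with made-up counts:

```python
import pandas as pd

wide = pd.DataFrame({'emoji': [1, 2],
                     'totalEventsAfter': [5, 8],
                     'totalEventsBefore': [3, 6]})

long = pd.melt(wide,
               id_vars=['emoji'],
               value_vars=['totalEventsAfter', 'totalEventsBefore'],
               value_name='events')
# long has 4 rows and columns: emoji, variable, events
```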


#### Layered area chart

In [0]:
import altair as alt

alt.Chart(melted, height=400, width=400).mark_area(opacity=0.3).encode(
    x="emoji:O",
    y=alt.Y("events:Q", stack=None),
    color="variable:N"
)

## Run interruptions - forward/pause/rewind 

Drilldown by:
- Run number (week x run x)
- Run mode (initial v repeat)
- Screen name (intro, warm up, running, cooldown, outro)
- Operating system (Android not associated - bug raised with developers)


### df_runInterruptions

In [0]:
# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:screenName"}, 
              {"name":"ga:eventAction"}, # forward / pause / rewind
              {"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number
              {"name":"ga:operatingSystem"}] 
              #{"name":"ga:dimension4"}] run mode not associated with run screen but session


response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "pageSize": "100000", # this is the max number of rows the API returns
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "IN_LIST",
                    "expressions": ["Forward", 
                                    "Pause", 
                                    "Rewind"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:screenName",
                    "operator": "IN_LIST",
                    "expressions": ["Run: Intro", 
                                    "Run: Warm Up", 
                                    "Run: Running", 
                                    "Run: Cooldown", 
                                    "Run: Outro"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2",
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]  
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_runInterruptions = response_to_DataFrame(response)

df_runInterruptions.head()


### Formatting df_runInterruptions

In [0]:
# convert relevant columns to data type 'category'
# saves memory (possibly also helpful for feeding Data Studio)
df_runInterruptions['ga:dimension1'] = df_runInterruptions['ga:dimension1'].astype('category')
df_runInterruptions['ga:dimension2'] = df_runInterruptions['ga:dimension2'].astype('category')
df_runInterruptions['ga:eventAction'] = df_runInterruptions['ga:eventAction'].astype('category')
df_runInterruptions['ga:operatingSystem'] = df_runInterruptions['ga:operatingSystem'].astype('category')
df_runInterruptions['ga:screenName'] = df_runInterruptions['ga:screenName'].astype('category')

In [0]:
# rename screen names so that they can be ordered alphabetically in charts
# TBC

# change dimensions to data type 'category'
# TBC

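# rename column headers by name rather than position, as the column order
# produced by response_to_DataFrame can differ between pandas versions
df_runInterruptions = df_runInterruptions.rename(columns={
    'ga:dimension1': 'trainerName',
    'ga:dimension2': 'runNumber',
    'ga:eventAction': 'eventAction',
    'ga:operatingSystem': 'OS',
    'ga:screenName': 'screenName',
    'ga:totalEvents': 'totalEvents'})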

df_runInterruptions.head()
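
Regarding the TBC note on renaming screen names so that charts order them correctly: an alternative to renaming is an ordered categorical, which makes pandas sort the screens in run order (intro to outro) rather than alphabetically. A minimal sketch, assuming the five screen names used in the filter above; the ordering lives in pandas only, so once exported to Google Sheets the values are plain text again.

```python
import pandas as pd

# run screens in the order a runner sees them
SCREEN_ORDER = ['Run: Intro', 'Run: Warm Up', 'Run: Running',
                'Run: Cooldown', 'Run: Outro']

screens = pd.DataFrame({'screenName': ['Run: Outro', 'Run: Intro', 'Run: Running'],
                        'totalEvents': [4, 9, 12]})
screens['screenName'] = pd.Categorical(screens['screenName'],
                                       categories=SCREEN_ORDER, ordered=True)

ordered = screens.sort_values('screenName')
# ordered['screenName'] -> Run: Intro, Run: Running, Run: Outro
```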


### Exporting df_runInterruptions to Google Sheet

[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, name the worksheet tab as the dataframe imported into it
ws_runInterruptions = gc.open("couchColab").worksheet("df_runInterruptions")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_runInterruptions, df_runInterruptions)


## End run report (including minutes run)

Drilldown by:
- Run number (week x run x)
- Run mode (initial v revisit)
- Trainer name
- Operating system

### Getting df_endRun

In [0]:
# pageSize is set in the request body, otherwise results are cut off at the default of 1,000 rows
# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number
              {"name":"ga:dimension4"}, # run mode
              {"name":"ga:eventLabel"}, 
              {"name": "ga:operatingSystem"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "pageSize": "100000", # this is the max number of rows the API returns
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "IN_LIST",
                    "expressions": ["End Run"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2",
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]  
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_endRun = response_to_DataFrame(response)

df_endRun.head()


### Formatting df_endRun

In [0]:
df_endRun.columns = ['trainerName',
                     'runNumber',
                     'runMode', 
                     'endTime',
                     'OS',
                     'totalEvents']

df_endRun.head()


### Exporting df_endRun to Google Sheet

[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, name the worksheet tab as the dataframe imported into it
ws_endRun = gc.open("couchColab").worksheet("df_endRun")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_endRun, df_endRun)


### Importing df_endRun back from Google Sheet

In [0]:
df_get = gd.get_as_dataframe(ws_endRun, # defined above
                             usecols=[0,1,2,3,4,5]) # specify which columns to get

df_get.head()


In [0]:
# the endTime column is imported in a different format than it was exported
# (xx:xx vs xx:xx:xx), although the datatype is still 'object'
df_get.info()


## When are most runs completed - 'Save this run'
Based on users clicking the 'Save this run' button, as opposed to the 'Run in the bag' screen name.

Drilldown by:
- Week day
- Hour of day
- Run number (week x run x)
- Run mode (initial v revisit)
- Trainer name
- Operating system

### Getting df_runsSave

In [0]:
# pageSize is set in the request body, otherwise results are cut off at the default of 1,000 rows
# possible rows: 7 days * 24 hours * 27 runs * 2 run modes * 5 trainers * 2 OS

# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number 
              {"name":"ga:dimension4"}, # run mode
              {"name":"ga:hour"}, 
              {"name":"ga:dayOfWeekName"},
              {"name": "ga:operatingSystem"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "pageSize": "100000", # needs to be more than the default 1000 for this report
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "EXACT",
                    "expressions": ["Save"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2", # to filter out '(optional week 1...)'
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]  
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_runsSave = response_to_DataFrame(response)

df_runsSave.head()
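
The row-count comment at the top of the cell above is a worst-case bound: every combination of the six dimensions could in principle appear as its own row. Working it through shows why a `pageSize` of 100,000 is enough for this report (and for the 'Run in the bag' report, which uses the same dimensions), assuming 27 runs, 2 run modes, 5 trainers and 2 operating systems as stated there:

```python
# worst-case number of distinct dimension combinations (upper bound on rows)
days, hours, runs, run_modes, trainers, operating_systems = 7, 24, 27, 2, 5, 2
max_rows = days * hours * runs * run_modes * trainers * operating_systems
print(max_rows)  # 90720, just under the 100,000-row page size
```

If more trainers or runs were added, the bound could exceed 100,000 and the report would need to follow `nextPageToken` to get every row.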


### Formatting df_runsSave

In [0]:
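# rename by name rather than position, as column order can differ between pandas versions
df_runsSave = df_runsSave.rename(columns={'ga:dayOfWeekName': 'dayOfWeek',
                                          'ga:dimension1': 'trainerName',
                                          'ga:dimension2': 'runNumber',
                                          'ga:dimension4': 'runMode',
                                          'ga:hour': 'hourOfDay',
                                          'ga:operatingSystem': 'OS',
                                          'ga:totalEvents': 'totalEvents'})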

df_runsSave.head(1)


### Exporting df_runsSave to Google Sheets
[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, name the worksheet tab as the dataframe imported into it
ws_runsSave = gc.open("couchColab").worksheet("df_runsSave")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_runsSave, df_runsSave)


## When are most runs completed - 'Run in the bag'
Based on the 'Run in the bag' screen name, as opposed to users clicking the 'Save this run' button.  

Drilldown by:
- Week day
- Hour of day
- Run number (week x run x)
- Run mode (initial v revisit)
- Trainer name
- Operating system

### Getting df_runsBag

In [0]:
# pageSize is set in the request body, otherwise results are cut off at the default of 1,000 rows
# possible rows: 7 days * 24 hours * 27 runs * 2 run modes * 5 trainers * 2 OS
# see Couch to 5k tracking guide for custom dimensions and metrics

metrics = [{"expression":"ga:screenViews"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number 
              {"name":"ga:dimension4"}, # run mode
              {"name":"ga:hour"}, 
              {"name":"ga:dayOfWeekName"},
              {"name": "ga:operatingSystem"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "pageSize": "100000", # needs to be more than the default 1000 for this report
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:screenName",
                    "operator": "EXACT",
                    "expressions": ["Run: In The Bag"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2", # to filter out '(optional week 1...)'
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]  
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_runsBag = response_to_DataFrame(response)

df_runsBag.head()


### Formatting df_runsBag

In [0]:
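# rename by name rather than position, as column order can differ between pandas versions
df_runsBag = df_runsBag.rename(columns={'ga:dayOfWeekName': 'dayOfWeek',
                                        'ga:dimension1': 'trainerName',
                                        'ga:dimension2': 'runNumber',
                                        'ga:dimension4': 'runMode',
                                        'ga:hour': 'hourOfDay',
                                        'ga:operatingSystem': 'OS',
                                        'ga:screenViews': 'screenViews'})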

df_runsBag.head(1)


### Plotting df_runsBag

#### Formatting data for a heatmap

In [0]:
df_grouped = df_runsBag.groupby(['dayOfWeek', 
                                 'hourOfDay'], as_index=False)['screenViews'].sum()

pivoted_runsBag = pd.pivot_table(df_grouped, 
                                 index="hourOfDay", 
                                 columns="dayOfWeek", 
                                 values="screenViews")

pivoted_runsBag.head()


In [0]:
# manually sort columns (can't be done in Data Studio)
pivoted_runsBag = pivoted_runsBag.reindex(['Monday',
                                           'Tuesday', 
                                           'Wednesday', 
                                           'Thursday', 
                                           'Friday',
                                           'Saturday', 
                                           'Sunday'], axis=1)

# sort hourOfDay in descending order so that hour 00 sits at the bottom of the y-axis
pivoted_runsBag = pivoted_runsBag.sort_index(ascending=False, axis=0)

pivoted_runsBag.head(26)
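
If no runs were completed in a given hour, that hour is missing from the pivot and the heatmap silently skips a row. A minimal sketch that reindexes both axes to the full grid of 24 hours by 7 days, filling gaps with 0; it assumes `ga:hour` values are two-digit strings `'00'` to `'23'`, and `complete_hour_day_grid` is a hypothetical helper.

```python
import pandas as pd

DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']
HOURS = [f'{h:02d}' for h in range(24)]  # '00' ... '23'

def complete_hour_day_grid(pivoted):
    """Reindex to all 24 hours and 7 days, filling gaps with 0; hour 23 on top."""
    full = pivoted.reindex(index=HOURS, columns=DAYS, fill_value=0).fillna(0)
    return full.sort_index(ascending=False)

sample = pd.DataFrame({'Monday': [3], 'Sunday': [1]}, index=['07'])
grid = complete_hour_day_grid(sample)
# grid.shape -> (24, 7); hour '07' keeps its counts, everything else is 0
```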


#### Seaborn library
https://seaborn.pydata.org/generated/seaborn.heatmap.html

In [0]:
import seaborn as sns

In [0]:
# customise the size of the plot and assign axes object
fig, ax = plt.subplots(1, 1, figsize = (14, 10))

# increase font size
sns.set(font_scale=1.4)

sns.heatmap(pivoted_runsBag, 
            ax = ax, # draw on the axes created above with the custom figure size
            annot=True, # show numbers in the heatmap
            annot_kws={"size": 12}, # size of the text inside heatmap
            fmt='g', # format of text inside heatmap (default is scientific)
            linewidths=1, # adding the white lines and specifying their thickness
            cbar=False, # hide color bar
            cmap="YlGnBu"); # color scheme

# customising labels
ax.set_xlabel('') # removing label with a blank string
ax.set_ylabel('Hour of day')
ax.set_title('When are most runs completed?');


### Exporting df_runsBag to Google Sheet
[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, name the worksheet tab as the dataframe imported into it
ws_runsBag = gc.open("couchColab").worksheet("df_runsBag")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_runsBag, df_runsBag)


## New users (downloads) by weekday and hour of day
Drilldown by:
- Operating system
- Trainer name

### Getting df_newUsers

In [0]:
metrics = [{"expression":"ga:newUsers"}]

dimensions = [{"name":"ga:dayOfWeekName"},
              {"name":"ga:hour"}, 
              {"name":"ga:operatingSystem"}, 
              {"name":"ga:dimension1"}] # trainer name

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "dateRanges":dates,
            "pageSize": "100000", # needs to be more than the default 1000 for this report
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]  
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_newUsers = response_to_DataFrame(response)

df_newUsers.head()
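
`pageSize` caps the rows returned per request. The API's default is 1,000 and, per its documentation, it returns at most 100,000 rows per request regardless of what is asked for. When a report is larger, the response carries a `nextPageToken`, which is sent back as `pageToken` to fetch the next page. Below is a minimal sketch of that loop. `fetch_all_rows` is a hypothetical helper, and `run_batch_get` stands for whatever executes the request, for example `lambda body: analyticsreporting.reports().batchGet(body=body).execute()`. The helper returns raw row dicts. Because `response_to_DataFrame` takes a whole response, it would need adapting to use them.

```python
def fetch_all_rows(run_batch_get, report_request):
    """Collect every row of a single-report request by following nextPageToken."""
    rows, token = [], None
    while True:
        request = dict(report_request)  # shallow copy; the caller's dict is untouched
        if token:
            request['pageToken'] = token
        response = run_batch_get({'reportRequests': [request]})
        report = response['reports'][0]
        rows.extend(report.get('data', {}).get('rows', []))
        token = report.get('nextPageToken')
        if not token:
            return rows


# Offline check: two fake pages keyed by the pageToken that requests them
pages = {
    None: {'reports': [{'data': {'rows': [{'n': 1}, {'n': 2}]},
                        'nextPageToken': '2'}]},
    '2': {'reports': [{'data': {'rows': [{'n': 3}]}}]},
}
base_request = {'viewId': 'VIEW_ID', 'pageSize': 2}
all_rows = fetch_all_rows(
    lambda body: pages[body['reportRequests'][0].get('pageToken')],
    base_request)
```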


### Formatting df_newUsers

In [0]:
df_newUsers.columns = ['dayOfWeek',
                       'trainerName',
                       'hourOfDay',
                       'newUsers',
                       'OS']

df_newUsers.head(1)
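
Assigning a list to `df_newUsers.columns` relies on `response_to_DataFrame` returning the columns in one fixed order. The order used above matches the GA names sorted alphabetically (`ga:dayOfWeekName`, `ga:dimension1`, `ga:hour`, `ga:newUsers`, `ga:operatingSystem`). Below is a sketch of a name-based alternative that does not depend on that order. It assumes the raw columns are named after the GA dimensions and metrics, as the `'ga:isoYearIsoWeek'` sorts in the stitching sections suggest. `rename_strict` and `NEW_USERS_NAMES` are hypothetical names.

```python
import pandas as pd

# GA column name -> friendly name; the order of entries does not matter
NEW_USERS_NAMES = {
    'ga:dayOfWeekName': 'dayOfWeek',
    'ga:hour': 'hourOfDay',
    'ga:operatingSystem': 'OS',
    'ga:dimension1': 'trainerName',  # trainer name custom dimension
    'ga:newUsers': 'newUsers',
}


def rename_strict(df, mapping):
    """Rename columns by name, failing loudly if an expected column is missing."""
    missing = set(mapping) - set(df.columns)
    if missing:
        raise KeyError(f'missing columns: {sorted(missing)}')
    return df.rename(columns=mapping)


# empty frame with shuffled GA column names, for illustration only
raw = pd.DataFrame(columns=['ga:newUsers', 'ga:hour', 'ga:dayOfWeekName',
                            'ga:dimension1', 'ga:operatingSystem'])
tidy = rename_strict(raw, NEW_USERS_NAMES)
print(tidy.columns.tolist())
```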


### Exporting df_newUsers to Google Sheets
[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, the worksheet tab is named after the dataframe written into it
ws_newUsers = gc.open("couchColab").worksheet("df_newUsers")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_newUsers, df_newUsers)


## Health unlocked
Clicks on the 'Health unlocked' option in the app and then, on the subsequent screen, through to the 'Health unlocked' forum outside the app.  
Breakdown by:
- Trainer name
- Run number
- Run mode
- Operating system

### Getting df_healthUnlocked

In [0]:
# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:dimension2"}, # run number
              {"name":"ga:dimension4"}, # run mode
              {"name":"ga:eventAction"}, 
              {"name":"ga:operatingSystem"}]


response = analyticsreporting.reports().batchGet(
    
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "PARTIAL",
                    "expressions": ["Health"]
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:dimension2",
                    "operator": "BEGINS_WITH",
                    "expressions": ["Week"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_healthUnlocked = response_to_DataFrame(response)

df_healthUnlocked.head()
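
The report cells in this notebook repeat the same request skeleton. Below is a sketch of a hypothetical `report_request` helper that builds one `ReportRequest` dict from plain lists. Here it rebuilds the Health unlocked request. Each filter tuple becomes its own `dimensionFilterClauses` entry. The Reporting API v4 documentation states that separate clauses are combined with AND, while filters inside a single clause are combined with OR unless the clause sets `operator` to `AND`. The date range is the one from the stitching sections, standing in for the notebook's `dates` variable.

```python
def report_request(view_id, date_ranges, metrics, dimensions,
                   filters=(), page_size=None):
    """Build one ReportRequest dict for the Analytics Reporting API v4.

    filters: iterable of (dimension_name, operator, expressions) tuples.
    Each tuple becomes a separate dimensionFilterClause (clauses are ANDed).
    """
    filters = list(filters)
    request = {
        'viewId': view_id,
        'dateRanges': date_ranges,
        'metrics': [{'expression': m} for m in metrics],
        'dimensions': [{'name': d} for d in dimensions],
    }
    if filters:
        request['dimensionFilterClauses'] = [
            {'filters': [{'dimensionName': name,
                          'operator': operator,
                          'expressions': list(expressions)}]}
            for name, operator, expressions in filters
        ]
    if page_size is not None:
        request['pageSize'] = page_size
    return request


health_request = report_request(
    'VIEW_ID', [{'startDate': '2018-05-24', 'endDate': '2019-01-21'}],
    metrics=['ga:totalEvents'],
    dimensions=['ga:dimension1', 'ga:dimension2', 'ga:dimension4',
                'ga:eventAction', 'ga:operatingSystem'],
    filters=[('ga:eventAction', 'PARTIAL', ['Health']),
             ('ga:dimension2', 'BEGINS_WITH', ['Week']),
             ('ga:operatingSystem', 'IN_LIST', ['iOS', 'Android'])])
```

The dict would then be sent as `analyticsreporting.reports().batchGet(body={'reportRequests': [health_request]}).execute()`.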


### Formatting df_healthUnlocked

In [0]:
# eventAction values differ in capitalisation ('on' vs 'On'), but Tableau
# handles this automatically, so no normalisation is needed here

df_healthUnlocked.columns = ['trainerName',
                             'runNumber',
                             'runMode', 
                             'healthUnlocked',
                             'OS',
                             'totalEvents']

df_healthUnlocked.head(1)
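
If this export is ever read by a tool that, unlike Tableau, treats 'on' and 'On' as different values, the capitalisation could be normalised in pandas before export. Below is a minimal sketch with made-up rows. The helper name `normalise_case` and the event-action strings are hypothetical, and the column names follow the rename above.

```python
import pandas as pd


def normalise_case(df, column, value_col):
    """Lower-case one text column and re-aggregate the metric, so that
    values differing only in capitalisation collapse into a single row."""
    out = df.copy()
    out[column] = out[column].str.lower()
    keys = [c for c in out.columns if c != value_col]
    return out.groupby(keys, as_index=False)[value_col].sum()


# made-up rows for illustration only
toy = pd.DataFrame({'trainerName': ['Jo', 'Jo'],
                    'healthUnlocked': ['Health Unlocked on', 'Health Unlocked On'],
                    'totalEvents': [4, 6]})
merged = normalise_case(toy, 'healthUnlocked', 'totalEvents')
print(merged)
```

Note that `groupby` drops rows whose key columns contain missing values by default, which matters if any dimension comes back empty.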


### Exporting df_healthUnlocked to Google Sheets
[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, the worksheet tab is named after the dataframe written into it
ws_healthUnlocked = gc.open("couchColab").worksheet("df_healthUnlocked")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_healthUnlocked, df_healthUnlocked)


## Milestone badges and graduation
Drilldown by:
- Trainer name
- Operating system (Android not associated - bug raised with developers)

### Getting df_badges

In [0]:
# see Couch to 5k tagging guide for custom dimensions and metrics

metrics = [{"expression":"ga:totalEvents"}]

dimensions = [{"name":"ga:dimension1"}, # trainer name
              {"name":"ga:eventLabel"}, 
              {"name":"ga:operatingSystem"}]


response = analyticsreporting.reports().batchGet(
    
    body={
        "reportRequests":[
            {
            "viewId":VIEW_ID,
            "dateRanges":dates,
            "metrics":metrics,
            "dimensions": dimensions,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventLabel",
                    "operator": "ENDS_WITH",
                    "expressions": ["Badge"]                
                    }
                  ]
                },
                {
                "filters": [
                    {
                    "dimensionName": "ga:operatingSystem", # to exclude Blackberry and '(not set)'
                    "operator": "IN_LIST",
                    "expressions": ["iOS", "Android"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_badges = response_to_DataFrame(response)

df_badges.head()


### Formatting df_badges

In [0]:
df_badges.columns = ['trainerName', 'badge', 'OS', 'totalEvents']

df_badges.head(1)


### Exporting df_badges to Google Sheets
[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, the worksheet tab is named after the dataframe written into it
ws_badges = gc.open("couchColab").worksheet("df_badges")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_badges, df_badges)


## Runs started - stitching v3 and v2 

Report of ga:totalEvents for clicks on the 'Go' button that starts a run, by ISO week.

### df_v3

In [0]:
viewID_v3 = '171109278' # Couch To 5k v3 - Main

dates_v3 = [{ "startDate":"2018-05-24", "endDate":"2019-01-21"}]

metrics_v3 = [{"expression":"ga:totalEvents"}]

dimensions_v3 = [{"name":"ga:isoYearIsoWeek"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":viewID_v3, 
            "dateRanges":dates_v3,
            "metrics":metrics_v3,
            "dimensions": dimensions_v3,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventAction",
                    "operator": "EXACT",
                    "expressions": ["Go"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_v3 = response_to_DataFrame(response)

# sort by ISO year-week ('YYYYWW' strings sort chronologically) in preparation for stitching
df_v3.sort_values(by=['ga:isoYearIsoWeek'], inplace=True)

df_v3.columns = ['iso_time', 'v3_events']

df_v3.head()


In [0]:
df_v3.plot();
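
`ga:isoYearIsoWeek` values are 'YYYYWW' strings such as '201821'. Assuming the week part is always two digits (e.g. '201801'), sorting them as text matches chronological order, which is why the sort above works. For plotting against real dates, here is a sketch of a hypothetical `iso_week_to_monday` helper that maps each value to the Monday that starts its ISO week. It uses `date.fromisocalendar`, which is available from Python 3.8.

```python
from datetime import date


def iso_week_to_monday(iso_year_week):
    """Convert a ga:isoYearIsoWeek value such as '201821' to the Monday
    that starts that ISO week."""
    year, week = int(iso_year_week[:4]), int(iso_year_week[4:])
    return date.fromisocalendar(year, week, 1)  # 1 = Monday


print(iso_week_to_monday('201821'))
print(iso_week_to_monday('201901'))  # ISO week 01 of 2019 starts in 2018
```

In the notebook this could be used as `df_v3['week_start'] = pd.to_datetime(df_v3['iso_time'].map(iso_week_to_monday))` followed by `df_v3.plot(x='week_start', y='v3_events')`. ISO week 01 of 2019 begins on 31 December 2018, so the ISO year does not always match the calendar year.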


### df_v2

In [0]:
viewID_v2 = '141192389' # Couch To 5k v2 - Main

dates_v2 = [{ "startDate":"2017-04-17", "endDate":"2019-01-21"}]

metrics_v2 = [{"expression":"ga:totalEvents"}]

dimensions_v2 = [{"name":"ga:isoYearIsoWeek"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":viewID_v2,
            "dateRanges":dates_v2,
            "metrics":metrics_v2,
            "dimensions": dimensions_v2,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:eventLabel",
                    "operator": "ENDS_WITH",
                    "expressions": ["Go_start_run"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df_v2 = response_to_DataFrame(response)

# sort by ISO year-week ('YYYYWW' strings sort chronologically) in preparation for stitching
df_v2.sort_values(by=['ga:isoYearIsoWeek'], inplace=True)

df_v2.columns = ['iso_time', 'v2_events']

df_v2.head()


In [0]:
df_v2.plot();


### df_merged

In [0]:
df_merged = pd.merge(df_v3, df_v2, how='outer', on='iso_time')

df_merged.sort_values(by=['iso_time'], inplace=True)

df_merged.head()


In [0]:
df_merged.fillna(0, inplace=True)

df_merged.head()


In [0]:
df_merged['starts'] = (df_merged['v3_events'] + df_merged['v2_events']).astype('int')

df_merged.head()


In [0]:
df_merged.drop(columns=['v3_events', 'v2_events'], inplace=True)

df_merged.head()


### df_merged plot

In [0]:
df_merged.plot.bar(x='iso_time',
                   title='Runs started by week',
                   legend=False,
                   figsize=(18,10));


### Exporting df_merged to Google Sheet

[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, the worksheet tab is named after the dataframe written into it
ws_merged = gc.open("couchColab").worksheet("df_merged")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_merged, df_merged)


## Runs completed - stitching v3 and v2 

Report of 'Run in the bag' screen views by ISO week. ga:screenViews counts every view; unique views would need ga:uniqueScreenViews instead.

### df3

In [0]:
viewID_v3 = '171109278' # Couch To 5k v3 - Main

dates_v3 = [{ "startDate":"2018-05-24", "endDate":"2019-01-21"}]

metrics_v3 = [{"expression":"ga:screenViews"}]

dimensions_v3 = [{"name":"ga:isoYearIsoWeek"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":viewID_v3, 
            "dateRanges":dates_v3,
            "metrics":metrics_v3,
            "dimensions": dimensions_v3,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:screenName",
                    "operator": "EXACT",
                    "expressions": ["Run: In The Bag"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df3 = response_to_DataFrame(response)

# sort by ISO year-week ('YYYYWW' strings sort chronologically) in preparation for stitching
df3.sort_values(by=['ga:isoYearIsoWeek'], inplace=True)

df3.columns = ['iso_time', 'v3_bag']

df3.head()


In [0]:
df3.plot();


### df2

In [0]:
viewID_v2 = '141192389' # Couch To 5k v2 - Main

dates_v2 = [{ "startDate":"2017-04-17", "endDate":"2019-01-21"}]

metrics_v2 = [{"expression":"ga:screenViews"}]

dimensions_v2 = [{"name":"ga:isoYearIsoWeek"}]

response = analyticsreporting.reports().batchGet(
  
    body={
        "reportRequests":[
            {
            "viewId":viewID_v2,
            "dateRanges":dates_v2,
            "metrics":metrics_v2,
            "dimensions": dimensions_v2,
            "dimensionFilterClauses": [
                {
                "filters": [
                    {
                    "dimensionName": "ga:screenName",
                    "operator": "ENDS_WITH",
                    "expressions": ["Run In_the_bag!"]
                    }
                  ]
                }
              ]
            }
          ]
        }
    ).execute()

df2 = response_to_DataFrame(response)

# sort by ISO year-week ('YYYYWW' strings sort chronologically) in preparation for stitching
df2.sort_values(by=['ga:isoYearIsoWeek'], inplace=True)

df2.columns = ['iso_time', 'v2_bag']

df2.head()


In [0]:
df2.plot();


### df_merged_bag

In [0]:
df_merged_bag = pd.merge(df3, df2, how='outer', on='iso_time')

df_merged_bag.fillna(0, inplace=True)

df_merged_bag['inTheBag'] = (df_merged_bag['v3_bag'] + df_merged_bag['v2_bag']).astype('int')

df_merged_bag.drop(columns=['v3_bag', 'v2_bag'], inplace=True)

df_merged_bag.sort_values(by=['iso_time'], inplace=True)

df_merged_bag.head()
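
Both stitching sections follow the same steps. They outer-join the v3 and v2 weekly series on `iso_time`, treat weeks missing from one version as 0, add the two counts and sort by week. Below is a sketch of a hypothetical `stitch` helper that wraps those steps, checked here on made-up numbers.

```python
import pandas as pd


def stitch(df_a, df_b, key, out_col):
    """Outer-join two weekly series on `key`, count missing weeks as 0,
    and return `key` plus the summed column `out_col`, sorted by `key`."""
    merged = pd.merge(df_a, df_b, how='outer', on=key).fillna(0)
    value_cols = [c for c in merged.columns if c != key]
    # fillna leaves floats behind, so cast the summed counts back to int
    merged[out_col] = merged[value_cols].sum(axis=1).astype('int')
    return merged[[key, out_col]].sort_values(key).reset_index(drop=True)


# made-up weekly counts for illustration only
v3 = pd.DataFrame({'iso_time': ['201822', '201823'], 'v3_bag': [10, 12]})
v2 = pd.DataFrame({'iso_time': ['201821', '201822'], 'v2_bag': [7, 5]})
bag = stitch(v3, v2, 'iso_time', 'inTheBag')
print(bag)
```

`df_merged = stitch(df_v3, df_v2, 'iso_time', 'starts')` and `df_merged_bag = stitch(df3, df2, 'iso_time', 'inTheBag')` would then reproduce the two merged tables, apart from the row index being reset. The reset index is not exported anyway, because `set_with_dataframe` leaves out the index by default.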


In [0]:
df_merged_bag.plot.bar(x='iso_time',
                       title='Runs in the bag by week',
                       legend=False,
                       figsize=(18,10));


### Exporting df_merged_bag to Google Sheet

[couchColab](https://docs.google.com/spreadsheets/d/1XAavlY-yet8EeLTsD7GyRmE8TVWLb2CT5glqmEhwlyQ/edit?usp=sharing)

In [0]:
# for clarity, the worksheet tab is named after the dataframe written into it
ws_merged_bag = gc.open("couchColab").worksheet("df_merged_bag")

# gspread_dataframe imported as gd
gd.set_with_dataframe(ws_merged_bag, df_merged_bag)
