# Speed Metrics

Measures and saves the PageSpeed score and page load times of each media website.

Currently relies on a local PHP script that sends requests to the GTmetrix API.

A complete test may last up to 30 minutes.

Mobile performance is tested separately in “mobile_metrics.ipynb”, since GTmetrix doesn't currently offer testing on mobile devices.

Feel free to contact me for help: https://www.quel-media.com/about.html#contact

© Paul Ronga under Apache-2 Licence (see LICENCE.txt).

In [1]:
import pandas as pd
import requests
from IPython.display import HTML
import json
import datetime

In [2]:
# URL of the tester script: change this to your local tester or an external tool
TESTER_URL = 'http://rospo.local/~paul/gtmetrix/medias.php'

In [3]:
# dataframe containing media id, name and URLs
medias = pd.read_csv('df/media_list.csv')

medias.head(2)

|  | media_id | Name | URL_short | URL | URL_mobile |
|---|---|---|---|---|---|
| 0 | 19 | La Tribune de Genève | tdg.ch | https://www.tdg.ch/ | https://m.tdg.ch |
| 1 | 20 | 24 heures | 24heures.ch | https://www.24heures.ch | https://m.24heures.ch |


In [4]:
# keep media with an id below 34 (removes Konbini)
medias = medias[medias['media_id'] < 34].copy()

# media id as string
medias['media_id'] = medias['media_id'].astype(str)

missing_medias = None

In [5]:
# this new dataframe will contain our stats
df_speed = pd.DataFrame(columns=['Name', 'media_id', 'pagespeed_score', 'page_load_time', 'fully_loaded_time', 'report_url'])

In [13]:
# Run again if failed
target_medias = medias
if missing_medias is not None:
    print('Getting missing results')
    target_medias = missing_medias

columns = ['Name', 'media_id', 'pagespeed_score', 'page_load_time', 'fully_loaded_time', 'report_url']

for i, row in target_medias.iterrows():
    print('Testing', row['Name'], '...')

    payload = {'media': row[['Name', 'media_id', 'URL']].to_dict()}
    r = requests.post(TESTER_URL, json=payload)
    r.raise_for_status()

    print(r.text, end='\n\n')

    # the tester prints progress lines; the JSON result is on the last non-empty line
    result = json.loads(r.text.strip().split('\n')[-1])

    # DataFrame.append was removed in pandas 2.0: concatenate one row at a time instead,
    # so df_speed keeps the results obtained so far if a later test fails
    new_row = pd.DataFrame([[
        result['media']['Name'],
        str(result['media']['media_id']),  # same type as medias['media_id']
        result['results']['pagespeed_score'],
        result['results']['page_load_time'] / 1000,     # ms -> s
        result['results']['fully_loaded_time'] / 1000,  # ms -> s
        result['results']['report_url']
    ]], columns=columns)
    df_speed = pd.concat([df_speed, new_row])

Getting missing results
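The loop above assumes the tester prints some progress lines followed by one JSON object on its last line, with load times in milliseconds. Under that assumption, the parsing step can be isolated in a pure function and checked without calling GTmetrix. This is a sketch: `parse_tester_output` is a hypothetical helper, and the sample response is made up to match the fields the notebook reads, not captured from the actual PHP script.

```python
import json


def parse_tester_output(text):
    """Hypothetical helper: parse the tester's response into one df_speed row.

    Assumes the JSON result sits on the last non-empty line and that
    page_load_time / fully_loaded_time are given in milliseconds.
    """
    last_line = text.strip().splitlines()[-1]
    result = json.loads(last_line)
    return {
        'Name': result['media']['Name'],
        'media_id': str(result['media']['media_id']),  # same type as medias['media_id']
        'pagespeed_score': result['results']['pagespeed_score'],
        'page_load_time': result['results']['page_load_time'] / 1000,        # ms -> s
        'fully_loaded_time': result['results']['fully_loaded_time'] / 1000,  # ms -> s
        'report_url': result['results']['report_url'],
    }


# Made-up response: a placeholder progress line, then the JSON result ('/' escaped as '\/').
sample = (
    'Hypothetical progress line\n'
    '{"media": {"Name": "La Tribune de Genève", "media_id": "19"}, '
    '"results": {"pagespeed_score": 34, "page_load_time": 10302, '
    '"fully_loaded_time": 10613, '
    '"report_url": "https:\\/\\/gtmetrix.com\\/reports\\/www.tdg.ch\\/wMvInlFG"}}\n'
)
row = parse_tester_output(sample)
print(row['page_load_time'], row['report_url'])  # 10.302 https://gtmetrix.com/reports/www.tdg.ch/wMvInlFG
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved response when the tester's output format changes.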


In [7]:
# Use this if a test fails, e.g. with a “The page took too long to load” or “Unable to analyze your site” error.
# missing_medias will hold the media that still have no result; re-run the previous cell to test only those.
missing_medias = medias[~medias['media_id'].isin(df_speed['media_id'])]
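Instead of re-running the test cell by hand, the retry could be automated: test every pending media, keep what succeeded, and try the failures again for a limited number of rounds. A minimal sketch, assuming a hypothetical `test_one(media_id)` callable that raises an exception when a test fails; the fake tester below only simulates a single failure.

```python
def run_with_retries(media_ids, test_one, max_rounds=3):
    """Call test_one for each id, retrying failures for up to max_rounds passes.

    test_one is a hypothetical callable that raises on failure.
    Returns (results, missing): results maps id -> value, missing lists ids that never succeeded.
    """
    results = {}
    pending = list(media_ids)
    for _ in range(max_rounds):
        still_failing = []
        for media_id in pending:
            try:
                results[media_id] = test_one(media_id)
            except Exception as exc:  # e.g. "The page took too long to load"
                print('Failed', media_id, '-', exc)
                still_failing.append(media_id)
        pending = still_failing
        if not pending:
            break
    return results, pending


# Fake tester: media '22' fails on its first attempt only.
attempts = {}


def fake_test(media_id):
    attempts[media_id] = attempts.get(media_id, 0) + 1
    if media_id == '22' and attempts[media_id] == 1:
        raise RuntimeError('The page took too long to load')
    return {'media_id': media_id}


results, missing = run_with_retries(['19', '20', '22'], fake_test)
print(sorted(results), missing)  # ['19', '20', '22'] []
```

Catching `Exception` broadly is a deliberate simplification here, since the tester can fail in several ways (HTTP error, truncated output, invalid JSON); a real version would probably narrow it.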

In [8]:
missing_medias.head(3)

|  | media_id | Name | URL_short | URL | URL_mobile |
|---|---|---|---|---|---|


In [9]:
# To get a clickable report URL after an error (the tester's JSON output escapes '/' as '\/')
print(r"https:\/\/gtmetrix.com\/reports\/www.lacote.ch\/ZrEyp4s4".replace('\\', ''))

https://gtmetrix.com/reports/www.lacote.ch/ZrEyp4s4
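The backslashes are JSON escaping: PHP's `json_encode` escapes `/` as `\/` by default, and `\/` is a valid JSON escape for `/`. Assuming the URL is copied from the tester's raw JSON output, parsing it as a JSON string with `json.loads` gives the clean URL without a manual `replace`, and also decodes any other escapes (such as `\u00e8` in accented names):

```python
import json

# Report URL as copied from the raw JSON output, including its double quotes.
# The raw string keeps the backslashes exactly as printed.
escaped = r'"https:\/\/gtmetrix.com\/reports\/www.lacote.ch\/ZrEyp4s4"'
print(json.loads(escaped))  # https://gtmetrix.com/reports/www.lacote.ch/ZrEyp4s4
```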


In [10]:
# add current timestamp
df_speed['timestamp'] = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
df_speed

|  | Name | media_id | pagespeed_score | page_load_time | fully_loaded_time | report_url | timestamp |
|---|---|---|---|---|---|---|---|
| 0 | La Tribune de Genève | 19 | 34 | 10.302 | 10.613 | https://gtmetrix.com/reports/www.tdg.ch/wMvInlFG | 2018-09-01 18:14:52 |
| 0 | 24 heures | 20 | 32 | 9.357 | 9.757 | https://gtmetrix.com/reports/www.24heures.ch/8... | 2018-09-01 18:14:52 |
| 0 | Le Temps | 21 | 49 | 7.038 | 7.401 | https://gtmetrix.com/reports/www.letemps.ch/6I... | 2018-09-01 18:14:52 |
| 0 | Le Monde | 22 | 40 | 7.028 | 22.471 | https://gtmetrix.com/reports/www.lemonde.fr/fZ... | 2018-09-01 18:14:52 |
| 0 | RTS info | 23 | 56 | 7.109 | 7.572 | https://gtmetrix.com/reports/www.rts.ch/Kuw3qDcR | 2018-09-01 18:14:52 |
| 0 | 20 minutes (ch) | 24 | 23 | 8.928 | 12.621 | https://gtmetrix.com/reports/www.20min.ch/Ptvd... | 2018-09-01 18:14:52 |
| 0 | Le Matin | 25 | 0 | 12.455 | 13.466 | https://gtmetrix.com/reports/www.lematin.ch/2E... | 2018-09-01 18:14:52 |
| 0 | Mediapart | 26 | 38 | 4.578 | 5.059 | https://gtmetrix.com/reports/www.mediapart.fr/... | 2018-09-01 18:14:52 |
| 0 | Le Figaro | 27 | 30 | 8.879 | 28.661 | https://gtmetrix.com/reports/www.lefigaro.fr/X... | 2018-09-01 18:14:52 |
| 0 | Libération | 28 | 5 | 12.148 | 30.425 | https://gtmetrix.com/reports/www.liberation.fr... | 2018-09-01 18:14:52 |


In [11]:
outputfile = 'df/archive/speed_metrics_{}.csv'.format(datetime.datetime.now().strftime('%Y-%m-%d'))
print('Saving to {}...'.format(outputfile))

Saving to df/archive/speed_metrics_2018-09-01.csv...


In [12]:
df_speed.to_csv(outputfile) # archive
df_speed.to_csv('df/speed_metrics.csv') # latest results, overwritten on each run
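Saving to `df/archive/` fails if that folder does not exist yet. A small helper, hypothetical and not part of the original notebook, could build the dated file name with `pathlib` and create the folder first:

```python
import datetime
import tempfile
from pathlib import Path


def dated_archive_path(base_dir, when=None):
    """Hypothetical helper: return base_dir/archive/speed_metrics_YYYY-MM-DD.csv.

    Creates the archive folder (and its parents) if needed; no error if it already exists.
    """
    if when is None:
        when = datetime.date.today()
    archive_dir = Path(base_dir) / 'archive'
    archive_dir.mkdir(parents=True, exist_ok=True)
    return archive_dir / 'speed_metrics_{}.csv'.format(when.strftime('%Y-%m-%d'))


# Demo in a throwaway directory; in the notebook base_dir would be 'df'.
with tempfile.TemporaryDirectory() as tmp:
    path = dated_archive_path(tmp, datetime.date(2018, 9, 1))
    created = path.parent.is_dir()
    print(path.name, created)  # speed_metrics_2018-09-01.csv True
```

In the notebook, this would be used as `df_speed.to_csv(dated_archive_path('df'))`.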