## Background and Motivation

For the past few weeks I've been learning Python and building a data science toolkit. Until now, this has largely meant following canned tutorials and working with flat files downloaded from some online resource.

Since that is not where most data scientists get their data, I wanted to do a small project that would require pulling data from an API, populating a database, and building a dashboard on top of that database.

In [1]:
import pandas as pd
import numpy as np
import requests  # used for API calls
import json  # parse JSON bodies
from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
import psycopg2
from sqlalchemy import create_engine




# Strava API

After creating a dummy app on Strava, you get an assortment of keys that let you request information. I stored my keys in a config.py file that is not part of the GitHub repository, but you can see from context which values I'm pulling from it.
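For reference, a `config.py` along these lines would supply the four values the cells below read. This is only a sketch: the attribute names match what the notebook uses, but the environment variable names are my own assumption, and the real file simply needs to provide your actual keys.

```python
# config.py: hypothetical sketch; keep this file out of version control
import os

# The environment variable names below are illustrative assumptions,
# not something Strava requires. Each value falls back to an empty string.
client_id = os.environ.get("STRAVA_CLIENT_ID", "")
client_secret = os.environ.get("STRAVA_CLIENT_SECRET", "")
refresh_token = os.environ.get("STRAVA_REFRESH_TOKEN", "")
athlete_id = os.environ.get("STRAVA_ATHLETE_ID", "")
```

Reading from the environment keeps the secrets out of the file itself, though hard-coding them in an ignored file works just as well for a personal project.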

In [2]:
import config # create a configuration file to hide your API keys

In [3]:
# exchange the refresh token for a fresh access token (the original authorization was granted with broader permissions)
refresh = {'client_id':config.client_id,
          'client_secret':config.client_secret,
          'grant_type':'refresh_token',
          'refresh_token':config.refresh_token}
response = requests.post('https://www.strava.com/oauth/token', params=refresh)

In [4]:
# pull the access token out of the response and build the auth header
access_token = response.json()['access_token']
header = {'Authorization': 'Bearer {}'.format(access_token)}
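If the refresh fails, the line above raises a bare `KeyError`, which is not very informative. A minimal sketch of a more defensive version follows; `build_auth_header` is a hypothetical helper of mine, not part of `requests` or the Strava API, and the payload in the example is made up.

```python
def build_auth_header(token_payload):
    """Return a Bearer auth header from a parsed token response.

    Hypothetical helper: raises a readable error when the payload
    has no access token instead of failing with a KeyError.
    """
    token = token_payload.get("access_token")
    if not token:
        raise ValueError("no access_token in response: {}".format(token_payload))
    return {"Authorization": "Bearer {}".format(token)}


# Example with a made-up payload rather than a live response
header_example = build_auth_header({"access_token": "abc123"})
```

In the notebook this would be called as `build_auth_header(response.json())`.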

In [7]:
# update weight to verify other permissions are working
d = {"weight":"178"}
resp = requests.put('https://www.strava.com/api/v3/athlete', headers=header, data=d)
resp.json()['weight']

178.0

# Postgres

In [23]:
# create an engine - run once
engine = create_engine('postgresql://jakekirsch:@localhost/jakekirsch')

In [22]:
# create table
drop_create_table = """
DROP TABLE IF EXISTS activities;

CREATE TABLE activities (
achievement_count text,
athlete_id text,
athlete_resource_state text,
athlete_count text,
average_cadence float,
average_heartrate float,
average_speed float,
average_temp float,
comment_count float,
commute boolean,
display_hide_heartrate_option boolean,
distance float,
device_watts text,
elapsed_time float,
elev_high float,
elev_low float,
end_latlng text,
external_id text,
flagged boolean,
from_accepted_tag boolean,
gear_id text,
has_heartrate boolean,
has_kudoed boolean,
heartrate_opt_out boolean,
id bigint PRIMARY KEY,
kudos_count text,
location_city text,
location_country text,
location_state text,
manual boolean,
map_id text,
map_resource_state text,
map_summary_polyline text,
max_heartrate float,
max_speed float,
moving_time float,
name text,
photo_count text,
pr_count text,
private boolean,
resource_state text,
start_date timestamp,
start_date_local timestamp,
start_latitude text,
start_latlng text,
start_longitude text,
timezone text,
total_elevation_gain float,
total_photo_count text,
trainer boolean,
type text,
upload_id text,
utc_offset float,
visibility text,
workout_type text
);"""
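Column names such as `athlete_id` and `map_summary_polyline` come from flattening the nested activity records with an underscore separator, which is what `json_normalize(..., sep="_")` does further down. The pure-Python sketch below illustrates the same idea; it is not pandas' implementation, and the sample record uses made-up values.

```python
def flatten(record, sep="_", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # recurse into nested objects, carrying the parent key as a prefix
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            # scalars and lists are kept as they are
            flat[name] = value
    return flat


# Made-up sample shaped like a small slice of an activity record
sample = {
    "id": 1,
    "name": "Morning Run",
    "athlete": {"id": 42, "resource_state": 1},
    "map": {"id": "a1", "summary_polyline": "xyz", "resource_state": 2},
}
flat_sample = flatten(sample)
```

Here `flat_sample` has the keys `id`, `name`, `athlete_id`, `athlete_resource_state`, `map_id`, `map_summary_polyline`, and `map_resource_state`, matching the naming pattern of the table columns.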


In [24]:
# create the table
engine.execute(drop_create_table)

<sqlalchemy.engine.result.ResultProxy at 0x1111753c8>

In [25]:
# get stats
athlete_id = config.athlete_id
athlete_stats = requests.get('https://www.strava.com/api/v3/athletes/{}/stats'.format(athlete_id),
                             headers=header)

In [26]:
# total count of activities (rides + runs + swims); parse the JSON body once
stats = athlete_stats.json()
total_activities = sum(stats[key]['count'] for key in
                       ('all_ride_totals', 'all_run_totals', 'all_swim_totals'))

In [27]:
# params for api calls - we'll split this into pages because there is a limit to the per_page request
per_page = 100
(total_activities / per_page) + 1

4.2
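The 4.2 above is a fractional value, not a page count. If the total is complete, the number of pages actually needed is the total divided by the page size, rounded up. A small sketch, where `pages_needed` is a hypothetical helper name and 320 is the total implied by the output above:

```python
import math


def pages_needed(total_items, per_page):
    """Number of pages required to cover total_items at per_page items each."""
    return math.ceil(total_items / per_page)


n_pages_example = pages_needed(320, 100)  # 320 / 100 = 3.2, rounded up to 4
```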

In [28]:
for page in np.arange(1, (total_activities / per_page) + 2, 1):
    activities = requests.get('https://www.strava.com/api/v3/athlete/activities',
                              headers=header,
                              params={
                                  'page': int(page),  # np.arange yields floats; send a plain int
                                  'per_page': per_page
                              })
    # flatten nested fields (e.g. map.id becomes map_id) and append the page to the table
    activities_df = json_normalize(activities.json(), sep="_")
    activities_df.to_sql('activities', con=engine, if_exists="append", index=False)
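The loop above depends on the ride, run, and swim totals, so any other activity type is not reflected in the page count. An alternative is to keep requesting pages until one comes back empty. The sketch below assumes the endpoint returns an empty list past the last page; `fetch_all_pages` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for the real request so the example runs without network access.

```python
def fetch_all_pages(fetch_page, per_page=100, max_pages=1000):
    """Collect results page by page until an empty page is returned.

    fetch_page(page, per_page) must return a list of records.
    max_pages is a safety cap so a misbehaving fetcher cannot loop forever.
    """
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        if not batch:
            break
        results.extend(batch)
    return results


# Stand-in for the API: 250 made-up records served in slices
fake_data = [{"id": i} for i in range(250)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return fake_data[start:start + per_page]


all_rows = fetch_all_pages(fake_fetch)
```

With the real API, `fetch_page` would wrap the `requests.get(...)` call from the loop above and return `activities.json()`.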
    

In [29]:
# read the table back into a DataFrame to confirm the load worked
test = pd.read_sql_table('activities', con=engine)

This was a simple example of pulling data down from an API and using some awesome tools to populate a database.