# All material ©2019, Alex Siegman

---

## Welcome to 'Projects in Programming & Data Science'. We're going to jump right into the mix.

### Today we are going to leverage the CitiBike API to populate a MySQL database at regular intervals. Consider this your warm-up for the semester. 

---

## MySQL Setup

To install, in your terminal: 

1. brew install mysql
2. unset TMPDIR
3. mysql_secure_installation
4. enter your preferred password
5. skip all options presented

To run, in your terminal: 

1. mysql -u root -p
2. enter password
3. quit

To set privileges: 

1. mysql -u root -p
2. enter password
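3. grant all privileges on *.* to 'username'@'localhost' identified by 'password';

> _on MySQL 8.0 and later, `grant ... identified by` is no longer accepted; run `create user 'username'@'localhost' identified by 'password';` first, then `grant all privileges on *.* to 'username'@'localhost';`_

> _to change your password, use mysql> alter user 'user'@'localhost' identified by 'newpassword';_

4. flush privileges;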
5. quit

To create database: 

1. mysql> create database citibike;

### https://streamdata.io/developers/api-gallery/new-york-citibike-api/

In [1]:
# first, let's request the json from the CitiBike API URL

import json 
import urllib.request

with urllib.request.urlopen("https://gbfs.citibikenyc.com/gbfs/en/station_status.json") as url:
    data = json.loads(url.read().decode())
    print(data)

{'last_updated': 1564510593, 'ttl': 10, 'data': {'stations': [{'station_id': '168', 'num_bikes_available': 6, 'num_ebikes_available': 0, 'num_bikes_disabled': 2, 'num_docks_available': 39, 'num_docks_disabled': 0, 'is_installed': 1, 'is_renting': 1, 'is_returning': 0, 'last_reported': 1564510588, 'eightd_has_available_keys': False, 'eightd_active_station_services': [{'id': 'bedaaf2b-8664-469e-8681-26ff8059765b'}]}, {'station_id': '281', 'num_bikes_available': 0, 'num_ebikes_available': 0, 'num_bikes_disabled': 1, 'num_docks_available': 64, 'num_docks_disabled': 1, 'is_installed': 1, 'is_renting': 1, 'is_returning': 0, 'last_reported': 1564510535, 'eightd_has_available_keys': True, 'eightd_active_station_services': [{'id': '32461582-cd1e-4ecf-a5ea-563593fa7009'}]}, {'station_id': '285', 'num_bikes_available': 0, 'num_ebikes_available': 0, 'num_bikes_disabled': 0, 'num_docks_available': 0, 'num_docks_disabled': 0, 'is_installed': 1, 'is_renting': 0, 'is_returning': 0, 'last_reported': 18
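The response above carries `last_updated` and `ttl` fields next to the station list. A minimal sketch of how they could be used to decide when a fresh request is worthwhile, assuming `ttl` is the number of seconds the feed stays current after `last_updated` (as in the GBFS convention). The helper name `is_stale` is hypothetical, and the sample payload reuses only the two values printed above with the station list left empty:

```python
import time


def is_stale(payload, now=None):
    """Return True once `ttl` seconds have passed since `last_updated`."""
    if now is None:
        now = time.time()
    return now >= payload["last_updated"] + payload["ttl"]


# last_updated and ttl copied from the printed response; stations omitted for brevity
sample = {"last_updated": 1564510593, "ttl": 10, "data": {"stations": []}}

print(is_stale(sample, now=1564510598))  # 5 seconds after the update
print(is_stale(sample, now=1564510603))  # exactly 10 seconds after the update
```

Passing `now` explicitly keeps the check easy to try without waiting on the clock; in the notebook you would call `is_stale(data)` with no second argument.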

In [2]:
stations = data['data']['stations']

In [3]:
import pandas as pd

df_stations = pd.DataFrame(stations)
df_stations.head()

| | eightd_active_station_services | eightd_has_available_keys | is_installed | is_renting | is_returning | last_reported | num_bikes_available | num_bikes_disabled | num_docks_available | num_docks_disabled | num_ebikes_available | station_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'id': 'bedaaf2b-8664-469e-8681-26ff8059765b'}] | False | 1 | 1 | 0 | 1564510588 | 6 | 2 | 39 | 0 | 0 | 168 |
| 1 | [{'id': '32461582-cd1e-4ecf-a5ea-563593fa7009'}] | True | 1 | 1 | 0 | 1564510535 | 0 | 1 | 64 | 1 | 0 | 281 |
| 2 | | True | 1 | 0 | 0 | 18001 | 0 | 0 | 0 | 0 | 0 | 285 |
| 3 | [{'id': 'a58d9e34-2f28-40eb-b4a6-c8c01375657a'}] | True | 1 | 1 | 0 | 1564510351 | 10 | 1 | 22 | 0 | 0 | 304 |
| 4 | [{'id': '8ec29d39-9642-466a-9a20-aad1e5c4788a'}] | False | 1 | 1 | 0 | 1564510532 | 5 | 1 | 31 | 0 | 0 | 337 |
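The `last_reported` column holds Unix epoch seconds, which are hard to read at a glance. A small sketch of converting them to timezone-aware datetimes with the standard library; the helper name `to_utc` is hypothetical, and the two inputs are values taken from the rows above:

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Convert Unix epoch seconds (int or numeric string) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


print(to_utc(1564510588).isoformat())  # 2019-07-30T18:16:28+00:00
print(to_utc(18001).isoformat())       # 1970-01-01T05:00:01+00:00
```

The second value lands a few hours after the epoch, which suggests that `18001` is a placeholder for a station that has not reported rather than a real timestamp; treat that reading as an assumption and filter such rows accordingly.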


In [4]:
import MySQLdb

In [5]:
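# use the password you chose during setup; add user="username" if you are not connecting as your default user
db = MySQLdb.connect(passwd="password", db="citibike")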

In [6]:
c = db.cursor()

In [7]:
# create table

c.execute("CREATE TABLE IF NOT EXISTS StationInfo (station_id int, num_ebikes_available int, num_docks_disabled int, num_docks_available int, num_bikes_disabled int, num_bikes_available int, last_reported varchar(250), is_returning int, is_renting int, is_installed int, eightd_has_available_keys bool);")

c.close()
db.commit()

In [8]:
# populate table

# Each API response is a snapshot of time-varying station status. INSERT IGNORE skips a row
# (with a warning instead of an error) only when it would violate a PRIMARY KEY or UNIQUE index.
# StationInfo as created above defines neither, so re-running this cell appends a new set of rows.

c = db.cursor()

query_template = """INSERT IGNORE INTO StationInfo(station_id, num_ebikes_available, num_docks_disabled, num_docks_available, num_bikes_disabled, num_bikes_available, last_reported, is_returning, is_renting, is_installed, eightd_has_available_keys) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"""

for entry in stations:
    station_id = int(entry['station_id'])
    num_ebikes_available = int(entry['num_ebikes_available'])
    num_docks_disabled = int(entry['num_docks_disabled'])
    num_docks_available = int(entry['num_docks_available'])
    num_bikes_disabled = int(entry['num_bikes_disabled'])
    num_bikes_available = int(entry['num_bikes_available'])
    last_reported = str(entry['last_reported'])
    is_returning = int(entry['is_returning'])
    is_renting = int(entry['is_renting'])
    is_installed = int(entry['is_installed'])
    eightd_has_available_keys = bool(entry['eightd_has_available_keys'])
                           
    print("Inserting Station:", station_id, num_ebikes_available, num_docks_disabled, num_docks_available, num_bikes_disabled, num_bikes_available, last_reported, is_returning, is_renting, is_installed, eightd_has_available_keys) 
    query_parameters = (station_id, num_ebikes_available, num_docks_disabled, num_docks_available, num_bikes_disabled, num_bikes_available, last_reported, is_returning, is_renting, is_installed, eightd_has_available_keys) 
   
    c.execute(query_template, query_parameters)

c.close()
db.commit()

Inserting Station: 168 0 0 39 2 6 1564510588 0 1 1 False
Inserting Station: 281 0 1 64 1 0 1564510535 0 1 1 True
Inserting Station: 285 0 0 0 0 0 18001 0 0 1 True
Inserting Station: 304 0 0 22 1 10 1564510351 0 1 1 True
Inserting Station: 337 0 0 31 1 5 1564510532 0 1 1 False
Inserting Station: 347 0 0 19 1 15 1564510389 0 1 1 False
Inserting Station: 359 0 0 48 1 15 1564510579 0 1 1 False
Inserting Station: 377 0 0 23 0 24 1564510549 0 1 1 False
Inserting Station: 388 0 0 21 0 14 1564510422 0 1 1 False
Inserting Station: 402 0 0 29 1 9 1564510424 0 1 1 False
Inserting Station: 426 0 0 12 0 17 1564510415 0 1 1 True
Inserting Station: 484 0 0 37 0 9 1564510097 0 1 1 False
Inserting Station: 491 0 0 34 3 14 1564510570 0 1 1 False
Inserting Station: 520 0 0 39 1 1 1564510374 0 1 1 False
Inserting Station: 3092 0 0 21 2 4 1564510289 0 1 1 False
Inserting Station: 3233 0 0 31 0 8 1564510574 0 1 1 False
Inserting Station: 3443 0 0 38 1 2 1564510466 0 1 1 False
Inserting Station: 3459 0 0 24 ...

Inserting Station: 539 0 0 1 1 29 1564509997 1 1 1 False
Inserting Station: 540 0 0 14 2 14 1564510212 1 1 1 False
Inserting Station: 545 0 0 16 2 9 1564509623 1 1 1 False
Inserting Station: 546 0 0 17 2 26 1564510570 1 1 1 False
Inserting Station: 2000 0 0 2 1 27 1564510351 1 1 1 False
Inserting Station: 2002 0 0 1 0 26 1564510557 1 1 1 False
Inserting Station: 2003 0 0 50 1 3 1564510592 1 1 1 False
Inserting Station: 2005 0 0 2 0 10 1564509897 1 1 1 False
Inserting Station: 2006 0 0 13 1 35 1564510517 1 1 1 True
Inserting Station: 2008 0 0 12 0 12 1564510451 1 1 1 True
Inserting Station: 2009 0 0 31 2 2 1564510283 1 1 1 False
Inserting Station: 2010 0 0 0 0 0 18001 0 0 0 False
Inserting Station: 2012 0 0 14 2 20 1564510545 1 1 1 False
Inserting Station: 2017 0 0 18 0 21 1564509339 1 1 1 False
Inserting Station: 2021 0 0 42 1 0 1564509448 1 1 1 False
Inserting Station: 2022 0 0 22 0 11 1564510248 1 1 1 False
Inserting Station: 2023 0 0 0 0 0 18001 0 0 0 False
Inserting Station: 3002 0 ...

Inserting Station: 3195 0 0 2 1 31 1564508791 1 1 1 False
Inserting Station: 3196 0 0 10 1 7 1564493621 1 1 1 False
Inserting Station: 3198 0 0 12 1 5 1564510213 1 1 1 False
Inserting Station: 3199 0 0 9 0 5 1564509319 1 1 1 False
Inserting Station: 3201 0 0 15 0 2 1564506278 1 1 1 False
Inserting Station: 3202 0 0 4 0 14 1564509751 1 1 1 False
Inserting Station: 3203 0 0 15 0 11 1564508541 1 1 1 False
Inserting Station: 3205 0 0 13 0 9 1564508062 1 1 1 False
Inserting Station: 3206 0 0 24 1 1 1564508373 1 1 1 False
Inserting Station: 3207 0 0 5 3 18 1564510142 1 1 1 False
Inserting Station: 3209 0 0 21 0 1 1564504607 1 1 1 False
Inserting Station: 3210 0 0 14 0 4 1564507532 1 1 1 False
Inserting Station: 3211 0 0 21 0 1 1564509398 1 1 1 False
Inserting Station: 3212 0 0 22 0 0 1564500702 1 1 1 False
Inserting Station: 3213 0 1 14 0 6 1564509554 1 1 1 False
Inserting Station: 3214 0 0 15 0 7 1564506788 1 1 1 False
Inserting Station: 3220 0 0 12 1 5 1564501221 1 1 1 False
...

Inserting Station: 3495 0 0 6 1 19 1564509999 1 1 1 False
Inserting Station: 3496 0 0 15 0 10 1564510397 1 1 1 False
Inserting Station: 3497 0 0 9 1 13 1564510320 1 1 1 False
Inserting Station: 3498 0 0 6 0 15 1564510551 1 1 1 False
Inserting Station: 3499 0 0 12 2 11 1564502285 1 1 1 False
Inserting Station: 3500 0 0 1 0 30 1564509965 1 1 1 False
Inserting Station: 3501 0 0 9 0 20 1564508016 1 1 1 False
Inserting Station: 3502 0 0 4 0 25 1564509753 1 1 1 False
Inserting Station: 3503 0 0 4 0 26 1564505605 1 1 1 False
Inserting Station: 3504 0 0 5 1 21 1564510433 1 1 1 True
Inserting Station: 3505 0 0 9 1 23 1564510495 1 1 1 False
Inserting Station: 3506 0 0 16 1 12 1564510148 1 1 1 False
Inserting Station: 3507 0 0 21 0 15 1564510326 1 1 1 False
Inserting Station: 3508 0 0 20 1 10 1564509616 1 1 1 False
Inserting Station: 3509 0 0 19 0 12 1564509635 1 1 1 False
Inserting Station: 3510 0 0 16 0 15 1564508984 1 1 1 False
Inserting Station: 3511 0 0 10 0 12 1564506595 1 1 1 False
...

Inserting Station: 3715 0 0 6 0 25 1564510348 1 1 1 False
Inserting Station: 3716 0 0 21 0 2 1564505036 1 1 1 False
Inserting Station: 3718 0 0 45 1 0 1564510442 1 1 1 False
Inserting Station: 3720 0 0 4 0 19 1564507667 1 1 1 False
Inserting Station: 3721 0 0 7 0 28 1564510262 1 1 1 False
Inserting Station: 3723 0 0 9 0 21 1564509655 1 1 1 True
Inserting Station: 3724 0 0 19 1 19 1564510462 1 1 1 False
Inserting Station: 3725 0 0 38 0 3 1564508812 1 1 1 False
Inserting Station: 3726 0 0 4 0 16 1564509921 1 1 1 False
Inserting Station: 3727 0 0 34 0 4 1564509255 1 1 1 False
Inserting Station: 3728 0 0 19 0 5 1564510425 1 1 1 False
Inserting Station: 3733 0 0 43 2 1 1564507888 1 1 1 False
Inserting Station: 3734 0 0 42 0 8 1564510407 1 1 1 False
Inserting Station: 3737 0 0 33 1 1 1564509310 1 1 1 False
Inserting Station: 3745 0 0 18 0 9 1564505096 1 1 1 False
Inserting Station: 3746 0 0 4 3 41 1564510563 1 1 1 False
Inserting Station: 3747 0 0 32 0 1 1564510527 1 1 1 False
...
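An ignore-style insert can only skip a duplicate when the table has a PRIMARY KEY or UNIQUE index for the new row to collide with. The sketch below illustrates that behaviour with the standard-library `sqlite3` module and SQLite's `INSERT OR IGNORE`, which is a stand-in for MySQL's `INSERT IGNORE`, not the same engine. The trimmed three-column table, the composite key on `(station_id, last_reported)`, and the helper name `count_after_double_insert` are all assumptions made for illustration:

```python
import sqlite3


def count_after_double_insert(create_sql):
    """Create the table, insert the same row twice, and return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    row = (168, "1564510588", 6)
    for _ in range(2):
        conn.execute("INSERT OR IGNORE INTO StationInfo VALUES (?, ?, ?)", row)
    (count,) = conn.execute("SELECT COUNT(*) FROM StationInfo").fetchone()
    conn.close()
    return count


no_key = (
    "CREATE TABLE StationInfo "
    "(station_id int, last_reported text, num_bikes_available int)"
)
with_key = (
    "CREATE TABLE StationInfo "
    "(station_id int, last_reported text, num_bikes_available int, "
    "PRIMARY KEY (station_id, last_reported))"
)

print(count_after_double_insert(no_key))    # 2: nothing to collide with, so both rows are stored
print(count_after_double_insert(with_key))  # 1: the second insert is silently skipped
```

Keying on the station and its report time would keep one row per distinct report, which suits a table that is refilled at regular intervals; whether that is the right key for your analysis is a design choice.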

In [9]:
c = db.cursor()

c.execute("SELECT * FROM StationInfo LIMIT 5;")
rows = c.fetchall()

print(rows)

c.close()
db.commit()

((168, 0, 0, 38, 1, 8, '1564509309', 0, 1, 1, 0), (281, 0, 1, 60, 1, 4, '1564509454', 0, 1, 1, 1), (285, 0, 0, 0, 0, 0, '18001', 0, 0, 1, 1), (304, 0, 0, 20, 1, 12, '1564509353', 0, 1, 1, 1), (337, 0, 0, 31, 1, 5, '1564509023', 0, 1, 1, 0))
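The goal stated at the top is to populate the database at regular intervals, while the cells above run once. A minimal sketch of a polling loop that ties the two steps together; `poll`, `fetch`, and `store` are hypothetical names, where `fetch` would wrap the request from In [1] and `store` the insert loop from In [8]. The stand-ins below avoid the network and the database so the loop can be tried on its own:

```python
import time


def poll(fetch, store, interval_seconds, max_runs, sleep=time.sleep):
    """Call store(fetch()) max_runs times, pausing interval_seconds between runs."""
    completed = 0
    for run in range(max_runs):
        try:
            store(fetch())
            completed += 1
        except Exception as exc:
            # a failed request or insert should not end the whole collection run
            print("run", run, "failed:", exc)
        if run < max_runs - 1:
            sleep(interval_seconds)
    return completed


def fake_fetch():
    """Stand-in for the API request; returns one made-up snapshot."""
    return [{"station_id": "168", "num_bikes_available": 6}]


snapshots = []  # stand-in for the database: each stored snapshot is appended here
pauses = []     # records the requested pauses instead of actually sleeping

done = poll(fake_fetch, snapshots.append, interval_seconds=60, max_runs=3, sleep=pauses.append)
print(done, len(snapshots), pauses)  # 3 3 [60, 60]
```

For unattended collection, a scheduler such as cron running a script once per interval is a common alternative to a long-lived loop.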


---

## I hope this has helped you find your coding legs! Next week we'll get back to descriptive analytics using Python and Pandas. For now, take time to refresh yourself on the content covered in "Introduction to Programming". 

## If you need a refresher, check out the "Supplementary Info" directory in the class repo.