## Inserting data in MySQL using Python

First, let's start with a basic piece of code that fetches the data we want to insert into the database. For our example, we will get data about the Citibike stations, using the corresponding API endpoint provided by Citibike:

In [1]:
import requests

from datetime import date, datetime, timedelta

In [2]:
# Let's get the data from the Citibike API
url = 'https://gbfs.citibikenyc.com/gbfs/en/station_information.json'
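response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early if the API returned an HTTP error
results = response.json()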

In [3]:
# We only need a subset of the JSON returned by the Citibike API, so we keep only the list of stations
data = results["data"]["stations"]
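The station list sits under `data`, then `stations`, in the JSON. Here is a minimal sketch of that extraction on a hand-built payload. The single record is a subset of the `data[1]` entry shown further below, and the helper name `extract_stations` is hypothetical. It returns an empty list instead of raising when the keys are missing:

```python
from typing import Any


def extract_stations(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the station list from a GBFS-style payload, or [] if it is missing."""
    return payload.get("data", {}).get("stations", [])


# Hand-built payload mimicking the structure of the Citibike response
payload = {
    "data": {
        "stations": [
            {"station_id": "79", "name": "Franklin St & W Broadway", "capacity": 33},
        ]
    }
}

stations = extract_stations(payload)
print(len(stations))         # 1
print(extract_stations({}))  # []
```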

In [4]:
len(data)

1393

Now we will connect to our MySQL server. We will use the SQLAlchemy library together with PyMySQL, a pure-Python MySQL driver.

If you do not have these libraries, you can install them by running the following command (the `!` prefix tells the notebook to run it in the shell):

In [6]:
!sudo pip3 install -U -q PyMySQL sqlalchemy sql_magic

In [8]:
import sqlalchemy
from sqlalchemy import create_engine

conn_string = 'mysql+pymysql://{user}:{password}@{host}/'.format(
    host = 'db.ipeirotis.org', 
    user = 'student',
    password = 'dwdstudent2015')

engine = create_engine(conn_string)
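Building the URL with `str.format` works for these credentials, but a password containing characters such as `@`, `:` or `/` would corrupt it. The following small sketch percent-encodes the user name and password with the standard library's `urllib.parse.quote_plus` first. The credentials and host are placeholders, and `build_mysql_url` is a hypothetical helper:

```python
from urllib.parse import quote_plus


def build_mysql_url(user: str, password: str, host: str, database: str = "") -> str:
    """Return a mysql+pymysql URL with the user and password percent-encoded."""
    return "mysql+pymysql://{user}:{password}@{host}/{database}".format(
        user=quote_plus(user),
        password=quote_plus(password),
        host=host,
        database=database,
    )


# Placeholder credentials: unescaped, the '@', ':' and '/' would break the URL
print(build_mysql_url("student", "p@ss:w/rd", "db.example.org"))
# mysql+pymysql://student:p%40ss%3Aw%2Frd@db.example.org/
```

The resulting string can be passed to `create_engine` exactly like `conn_string` above.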

The engine does not actually open a connection until we first use it. Next, we create our database (if it does not exist already):

In [10]:
# Query to create a database
# In this example, we will try to create the (existing) database "public"
# But in general, we can give any name to the database
db_name = 'public'
create_db_query = f"CREATE DATABASE IF NOT EXISTS {db_name} DEFAULT CHARACTER SET 'utf8'"

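# Create the database: engine.begin() opens a connection and commits when the block ends
# (engine.execute() was removed in SQLAlchemy 2.0)
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(create_db_query))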
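Because `db_name` (and later `table_name`) is pasted into the SQL text with an f-string, it cannot be passed as a `%s` parameter: placeholders stand for values, not for identifiers such as database or table names. Here is a defensive sketch, assuming we only ever want plain names made of letters, digits and underscores. `checked_identifier` is a hypothetical helper, not part of SQLAlchemy:

```python
import re

# Assumption: accept only simple identifiers (letters, digits, underscore),
# at most 64 characters, which is MySQL's limit for database and table names
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]{1,64}")


def checked_identifier(name: str) -> str:
    """Return name unchanged if it is a simple identifier, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


example_db = checked_identifier("public")
example_query = f"CREATE DATABASE IF NOT EXISTS {example_db} DEFAULT CHARACTER SET 'utf8'"
print(example_query)

try:
    checked_identifier("public; DROP DATABASE public")
except ValueError as err:
    print(err)
```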

Then we create the table where we will store our data. For our example, we will store just three fields: station_id, station_name, and capacity (the number of docks at the station).

In [11]:
# To avoid conflicts between people writing to the same database, we add a random suffix to the table name
# We create the suffix only once per notebook session, so re-running this cell keeps the same name
import uuid
if 'suffix' not in globals():
    suffix = str(uuid.uuid4())[:8]
print(suffix)

d212a09e
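Eight hex characters give 16^8 = 2^32 (about 4.3 billion) possible suffixes, so accidental clashes between students are very unlikely. Here is a quick sanity check with the birthday-problem formula, assuming a hypothetical class of 100 people who each generate one suffix. The example uses its own variable name so it does not overwrite `suffix`:

```python
import uuid


def collision_probability(n: int, space: int = 16 ** 8) -> float:
    """Probability that at least two of n uniformly random suffixes coincide."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


# The first 8 characters of a uuid4 string are random lowercase hex digits
example_suffix = str(uuid.uuid4())[:8]
print(len(example_suffix), all(c in "0123456789abcdef" for c in example_suffix))  # 8 True
print(f"{collision_probability(100):.2e}")  # 1.15e-06
```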


In [12]:
data[1]

{'capacity': 33,
 'eightd_has_key_dispenser': False,
 'eightd_station_services': [],
 'electric_bike_surcharge_waiver': False,
 'external_id': '66db269c-0aca-11e7-82f6-3863bb44ef7c',
 'has_kiosk': True,
 'lat': 40.71911552,
 'legacy_id': '79',
 'lon': -74.00666661,
 'name': 'Franklin St & W Broadway',
 'region_id': '71',
 'rental_methods': ['CREDITCARD', 'KEY'],
 'rental_uris': {'android': 'https://bkn.lft.to/lastmile_qr_scan',
  'ios': 'https://bkn.lft.to/lastmile_qr_scan'},
 'short_name': '5430.08',
 'station_id': '79',
 'station_type': 'classic'}
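Note that the API returns `station_id` as a string (`'79'`), while the table we are about to create declares it as `int`. MySQL converts a numeric string like `'79'` on insert, but converting explicitly in Python makes the intent clear and fails loudly on unexpected IDs. Here is a sketch of projecting one API record down to the three columns we store. `to_row` is a hypothetical helper:

```python
from typing import Any


def to_row(entry: dict[str, Any]) -> tuple[int, str, int]:
    """Keep only (station_id, station_name, capacity) from one station record."""
    # int() raises ValueError if a station_id is not numeric
    return int(entry["station_id"]), entry["name"], int(entry["capacity"])


# Subset of the data[1] record shown above
sample = {
    "capacity": 33,
    "name": "Franklin St & W Broadway",
    "station_id": "79",
    "station_type": "classic",
}
print(to_row(sample))  # (79, 'Franklin St & W Broadway', 33)
```

Inside the insert loop, `query_parameters = to_row(entry)` could then replace the three separate assignments.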

In [15]:
table_name = f'Docks_{suffix}'
# Create a table
create_table_query = f'''CREATE TABLE IF NOT EXISTS {db_name}.{table_name} 
                                (station_id int, 
                                station_name varchar(250), 
                                capacity int,
                                PRIMARY KEY(station_id)
                                )'''
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(create_table_query))
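Declaring `station_id` as the PRIMARY KEY means the database rejects a second row with the same ID. Re-running the insert cell against a populated table will therefore fail rather than create duplicates. The sketch below demonstrates this with Python's built-in `sqlite3` module as a local stand-in for MySQL. Through SQLAlchemy and MySQL, the error surfaces as `sqlalchemy.exc.IntegrityError` instead of `sqlite3.IntegrityError`:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL
lite = sqlite3.connect(":memory:")
lite.execute(
    """CREATE TABLE IF NOT EXISTS Docks (
           station_id INTEGER,
           station_name VARCHAR(250),
           capacity INTEGER,
           PRIMARY KEY (station_id)
       )"""
)
lite.execute("INSERT INTO Docks VALUES (?, ?, ?)", (79, "Franklin St & W Broadway", 33))

try:
    # Same station_id again: violates the primary key
    lite.execute("INSERT INTO Docks VALUES (?, ?, ?)", (79, "Duplicate", 10))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = lite.execute("SELECT COUNT(*) FROM Docks").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
lite.close()
```

In MySQL, `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE` are the usual ways to make re-runs tolerate rows that already exist.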

Next, we insert the data into our table using the INSERT command. The `%s` placeholders let the database driver fill in (and escape) the values for us.

In [16]:
query_template = f'''INSERT INTO {db_name}.{table_name}(station_id, 
                                            station_name, 
                                            capacity) 
                    VALUES (%s, %s, %s)'''

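# NEVER build the query by concatenating strings, as in the line below:
# a value containing a quote breaks the query, and it opens the door to SQL injection
# query = "INSERT INTO citibike.Docks(station_id, station_name, number_of_docks) VALUES ("+entry["id"]+", "+entry["stationName"]+", "+entry["totalDocks"]+")"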


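# engine.begin() runs all inserts in one transaction and commits when the block ends
with engine.begin() as conn:
    for entry in data:
        dockid = entry["station_id"]
        addr = entry["name"]
        docks = entry["capacity"]
        # (The commented-out fields below are not present in the current station_information data)
        # available = entry["availableDocks"]
        # date = datetime.now()
        # lastcommunicationtime is a string of
        # the form "2016-02-09 10:16:49 AM"
        # See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
        # for the documentation on how to parse it:
        # date = datetime.strptime(entry["lastCommunicationTime"], '%Y-%m-%d %I:%M:%S %p')
        print("Inserting station", dockid, "at", addr)
        query_parameters = (dockid, addr, docks)
        # exec_driver_sql passes the %s-style query and its parameters straight to PyMySQL
        conn.exec_driver_sql(query_template, query_parameters)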



Inserting station 72 at W 52 St & 11 Ave
Inserting station 79 at Franklin St & W Broadway
Inserting station 82 at St James Pl & Pearl St
Inserting station 83 at Atlantic Ave & Fort Greene Pl
Inserting station 116 at W 17 St & 8 Ave
Inserting station 119 at Park Ave & St Edwards St
Inserting station 120 at Lexington Ave & Classon Ave
Inserting station 127 at Barrow St & Hudson St
Inserting station 128 at MacDougal St & Prince St
Inserting station 143 at Clinton St & Joralemon St
Inserting station 144 at Nassau St & Navy St
Inserting station 146 at Hudson St & Reade St
Inserting station 150 at E 2 St & Avenue C
Inserting station 151 at Cleveland Pl & Spring St
Inserting station 152 at Warren St & W Broadway
Inserting station 153 at E 40 St & 5 Ave
Inserting station 157 at Henry St & Atlantic Ave
Inserting station 161 at LaGuardia Pl & W 3 St
Inserting station 164 at E 47 St & 2 Ave
Inserting station 168 at W 18 St & 6 Ave
Inserting station 173 at Broadway & W 49 St
Inserting station 174 
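Parameter binding is also what makes names with quotes safe. A station called, say, `O'Brien Pl` (a made-up name) would break the concatenated query in the commented-out line, but it goes through untouched as a parameter. The sketch below shows this. It also sends all rows in a single `executemany` call instead of one `execute` per station, again using `sqlite3` as a stand-in. SQLite uses `?` placeholders where PyMySQL uses `%s`, and DB-API cursors, including PyMySQL's, also provide `executemany`:

```python
import sqlite3

lite = sqlite3.connect(":memory:")
lite.execute(
    "CREATE TABLE Docks (station_id INTEGER PRIMARY KEY, station_name TEXT, capacity INTEGER)"
)

# Illustrative rows; the second name is made up and contains an apostrophe
rows_to_insert = [
    (79, "Franklin St & W Broadway", 33),
    (9999, "O'Brien Pl", 20),
]

# The driver binds each value, so the quote never becomes part of the SQL text
with lite:  # commits the transaction if the block succeeds
    lite.executemany(
        "INSERT INTO Docks (station_id, station_name, capacity) VALUES (?, ?, ?)",
        rows_to_insert,
    )

stored = lite.execute(
    "SELECT station_name FROM Docks WHERE station_id = ?", (9999,)
).fetchone()[0]
print(stored)  # O'Brien Pl
lite.close()
```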

Now let's query the database to read back the data we inserted.

In [17]:
# The connection is closed automatically when the with-block ends
with engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text(f"SELECT * FROM {db_name}.{table_name}")).fetchall()
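Rather than eyeballing printed rows, we can let the database check the load. `COUNT(*)` should equal `len(data)`, and `SUM(capacity)` gives the total number of docks. Here is a sketch of those checks on a small `sqlite3` table holding three stations taken from the outputs in this notebook:

```python
import sqlite3

# Three stations taken from the outputs above
loaded = [
    (79, "Franklin St & W Broadway", 33),
    (396, "Lefferts Pl & Franklin Ave", 25),
    (397, "Fulton St & Clermont Ave", 75),
]

lite = sqlite3.connect(":memory:")
lite.execute(
    "CREATE TABLE Docks (station_id INTEGER PRIMARY KEY, station_name TEXT, capacity INTEGER)"
)
lite.executemany("INSERT INTO Docks VALUES (?, ?, ?)", loaded)

n_rows, total_docks = lite.execute("SELECT COUNT(*), SUM(capacity) FROM Docks").fetchone()
print(n_rows, total_docks)  # 3 133
assert n_rows == len(loaded)  # every record made it into the table
lite.close()
```

Against MySQL, the same check would be `SELECT COUNT(*), SUM(capacity) FROM {db_name}.{table_name}`, compared with `len(data)`.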

In [18]:
for row in rows:
    # Row objects expose columns as attributes (row["..."] no longer works in SQLAlchemy 2.0)
    print("Station ID:", row.station_id)
    print("Station Name:", row.station_name)
    print("Number of Docks:", row.capacity)
    print("=============================================")

Station ID: 396
Station Name: Lefferts Pl & Franklin Ave
Number of Docks: 25
Station ID: 397
Station Name: Fulton St & Clermont Ave
Number of Docks: 75
Station ID: 398
Station Name: Atlantic Ave & Furman St
Number of Docks: 37
Station ID: 399
Station Name: Lafayette Ave & St James Pl
Number of Docks: 39
Station ID: 400
Station Name: Pitt St & Stanton St
Number of Docks: 15
Station ID: 401
Station Name: Allen St & Rivington St
Number of Docks: 42
Station ID: 402
Station Name: Broadway & E 22 St
Number of Docks: 41
Station ID: 403
Station Name: E 2 St & 2 Ave
Number of Docks: 58
Station ID: 405
Station Name: Washington St & Gansevoort St
Number of Docks: 51
Station ID: 406
Station Name: Hicks St & Montague St
Number of Docks: 20
Station ID: 408
Station Name: Market St & Cherry St
Number of Docks: 45
Station ID: 410
Station Name: Suffolk St & Stanton St
Number of Docks: 79
Station ID: 411
Station Name: E 6 St & Avenue D
Numb

Finally, let's clean up: we drop our table and close the engine's database connections.

In [19]:
drop_table_query = f"DROP TABLE IF EXISTS {db_name}.{table_name}"
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(drop_table_query))

# Close all pooled connections held by the engine
engine.dispose()

## Exercise

At `https://gbfs.citibikenyc.com/gbfs/en/station_status.json` we can access the live status of all the stations (e.g., the number of bikes and docks available). Using the approach outlined above, create a table in the database (using the same table suffix that we created above) and store the data in it.
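As a starting point: according to the GBFS specification (versions 1.x and 2.x), each record in `station_status.json` includes fields such as `station_id`, `num_bikes_available`, `num_docks_available` and `last_reported`. The last of these is a POSIX timestamp in seconds. Check the live feed to confirm these names before relying on them. Here is a sketch that turns one such record into values for a hypothetical status table, converting the timestamp to a `datetime`. The record below is invented for illustration, and `to_status_row` is a hypothetical helper:

```python
from datetime import datetime, timezone
from typing import Any


def to_status_row(entry: dict[str, Any]) -> tuple[int, int, int, datetime]:
    """Extract (station_id, bikes, docks, reported_at) from one station_status record."""
    # last_reported is assumed to be seconds since the Unix epoch (GBFS 1.x/2.x)
    reported_at = datetime.fromtimestamp(entry["last_reported"], tz=timezone.utc)
    return (
        int(entry["station_id"]),
        int(entry["num_bikes_available"]),
        int(entry["num_docks_available"]),
        reported_at,
    )


# Invented record, shaped like a GBFS station_status entry
record = {
    "station_id": "79",
    "num_bikes_available": 5,
    "num_docks_available": 28,
    "last_reported": 1700000000,
}
print(to_status_row(record))
```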