## Inserting data in MySQL using Python

First, let's start with a basic piece of code that fetches the data we want to insert into the database. For our example, we will get data about the Citibike stations, using the corresponding API call provided by the Citibike website:

In [None]:
import requests
import json

In [None]:
# Let's get the data from the Citibike API
url = 'http://www.citibikenyc.com/stations/json'
resp = requests.get(url)

In [None]:
# We transform the returned JSON answer from the API into a Python dictionary object
results = json.loads(resp.text)

In [None]:
# We only need a subset of the data in the JSON returned by the Citibike API, so we keep only the part we need
data = results["stationBeanList"]
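To make the shape of the data concrete without calling the live API, here is a minimal sketch that parses a small hand-written payload in the same layout. Only the `stationBeanList` key and the station field names come from the code in this notebook; the sample values, the `someOtherKey` entry, and the helper name `extract_stations` are invented for illustration.

```python
import json

# Hand-written sample in the same layout as the API answer; all values are invented.
sample_text = '''
{
  "someOtherKey": "ignored",
  "stationBeanList": [
    {"id": 1, "stationName": "Sample St & 1 Ave", "totalDocks": 20,
     "availableDocks": 0, "statusValue": "In Service"}
  ]
}
'''

def extract_stations(text):
    """Parse the JSON text and return the list of station dictionaries."""
    payload = json.loads(text)
    # .get with a default returns an empty list instead of raising KeyError
    return payload.get("stationBeanList", [])

sample_stations = extract_stations(sample_text)
print(len(sample_stations))
print(sample_stations[0]["stationName"])
```

Each element of the list is a plain dictionary, which is why the rest of the notebook can read fields with `station["id"]` or `station.get("availableDocks")`.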

In [None]:
# Let's see which Citibike stations are full and which are empty

# Let's write a function that takes as input a dictionary
# corresponding to a station, checks the "availableDocks"
# entry, and returns True if and only if availableDocks == 0
# and the station is in service
def isStationFull(station):
    if station.get("availableDocks") == 0:
        if station.get("statusValue") == "Not In Service":
            # a station that is out of service is not counted as full
            return False
        else:
            return True
    else:
        return False

# A station is empty (it has no bikes) when all of its docks
# are available and the station is in service
def isStationEmpty(station):
    available = station.get("availableDocks")
    total = station.get("totalDocks")
    status = station.get("statusValue")
    if available == total and status != "Not In Service":
        return True
    else:
        return False

print "Stations that are full"
for station in data:
    if isStationFull(station):
        print station["id"], station["stationName"]
        
print "\n\nStations that are empty"
for station in data:
    if isStationEmpty(station):
        print station["id"], station["stationName"]
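The same two checks can be tried on a handful of made-up stations, which makes it easy to confirm the logic before running it on live data. This is only a sketch: the sample stations and the helper names `in_service`, `is_full`, and `is_empty` are hypothetical, and the field names are the ones used in the cell above.

```python
# Invented sample stations; only the field names follow the notebook.
sample = [
    {"id": 1, "stationName": "A", "totalDocks": 10, "availableDocks": 0,
     "statusValue": "In Service"},
    {"id": 2, "stationName": "B", "totalDocks": 10, "availableDocks": 10,
     "statusValue": "In Service"},
    {"id": 3, "stationName": "C", "totalDocks": 10, "availableDocks": 0,
     "statusValue": "Not In Service"},
    {"id": 4, "stationName": "D", "totalDocks": 10, "availableDocks": 4,
     "statusValue": "In Service"},
]

def in_service(station):
    """True unless the station is explicitly marked as out of service."""
    return station.get("statusValue") != "Not In Service"

def is_full(station):
    """Full: in service and no dock is available."""
    return in_service(station) and station.get("availableDocks") == 0

def is_empty(station):
    """Empty: in service and every dock is available."""
    return in_service(station) and station.get("availableDocks") == station.get("totalDocks")

# List comprehensions collect the matching station ids in one pass each
full_ids = [s["id"] for s in sample if is_full(s)]
empty_ids = [s["id"] for s in sample if is_empty(s)]
print(full_ids)
print(empty_ids)
```

Station 3 has no available docks but is out of service, so it is reported as neither full nor empty.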

In [None]:
# The code below is just to be able to look at the results in an easy-to-read DataFrame format.
# Our subsequent code that interacts with MySQL will operate using the dictionary object named "data".
import pandas

df = pandas.DataFrame(data)
df

Now we will connect to our MySQL server, using the MySQLdb library for Python.

If you do not have the library, you need to install it by typing in the shell:

`sudo apt-get install python-mysqldb`

In [None]:
import MySQLdb as mdb

con = mdb.connect(host='localhost',
                  user='root',
                  passwd='dwdstudent2015',
                  charset='utf8', use_unicode=True)

Once we have connected successfully, we need to create our database:

In [None]:
# Query to create a database
db_name = 'citibike'
create_db_query = "CREATE DATABASE IF NOT EXISTS {0} DEFAULT CHARACTER SET 'utf8'".format(db_name)

# Create a database
cursor = con.cursor()
cursor.execute(create_db_query)
cursor.close()

Then we create the table where we will store our data. For our example, we will import just four fields into the database: station_id, station_name, number_of_docks, and available_docks.

In [None]:
cursor = con.cursor()
db_name = 'citibike'
table_name = 'Docks'
# Create a table
# The {0} and {1} are placeholders for the parameters in the format(....) statement
create_table_query = '''CREATE TABLE IF NOT EXISTS {0}.{1} 
                                (station_id int, 
                                station_name varchar(250), 
                                number_of_docks int,
                                available_docks int,
                                PRIMARY KEY(station_id)
                                )'''.format(db_name, table_name)
cursor.execute(create_table_query)
cursor.close()
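Database and table names cannot be passed as query parameters, which is why the cells above build them into the SQL text with `format`. That is fine here because the names are hard-coded. If they ever came from user input, one option would be to validate them first. The helper below is a hypothetical sketch: the name `check_identifier` and the allowed pattern are assumptions, and the pattern is deliberately stricter than what MySQL itself accepts.

```python
import re

# Assumption: letters, digits and underscores only, not starting with a digit.
# \A and \Z anchor the match to the whole string.
_IDENTIFIER = re.compile(r'\A[A-Za-z_][A-Za-z0-9_]*\Z')

def check_identifier(name):
    """Return name unchanged if it looks like a plain SQL identifier, else raise."""
    if not _IDENTIFIER.match(name):
        raise ValueError("Unsafe SQL identifier: {0!r}".format(name))
    return name

# Validate the names before placing them into the SQL text
example_query = "CREATE TABLE IF NOT EXISTS {0}.{1} (station_id int)".format(
    check_identifier('citibike'), check_identifier('Docks'))
print(example_query)
```

A name such as `Docks; DROP TABLE Docks` would raise `ValueError` instead of reaching the database.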

Finally, we import the data into our table, using the INSERT command. 

In [None]:
query_template = '''INSERT INTO 
    citibike.Docks(station_id, station_name, number_of_docks, available_docks) 
    VALUES (%s, %s, %s, %s)'''

cursor = con.cursor()

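# THIS IS PROHIBITED: never build the query by concatenating values into the SQL string.
# It breaks on values that contain quotes and opens the door to SQL injection.
# query = "INSERT INTO citibike.Docks(station_id, station_name, number_of_docks) VALUES ("+entry["id"]+", "+entry["stationName"]+", "+entry["totalDocks"]+")"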

for entry in data:
    dockid = entry["id"]
    addr = entry["stationName"]
    docks = entry["totalDocks"]
    available = entry["availableDocks"]
    print "Inserting station", dockid, "at", addr
    query_parameters = (dockid, addr, docks, available)
    cursor.execute(query_template, query_parameters)

# Commit once, after all the rows have been inserted
con.commit()

cursor.close()
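To see why the concatenated query is discouraged, the sketch below tries both styles against an in-memory SQLite database from the standard library. SQLite is used here only as a stand-in so that the example runs without a MySQL server; note that `sqlite3` uses `?` as its placeholder where MySQLdb uses `%s`. The station name and the `demo_` variable names are invented.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_cursor = demo_con.cursor()
demo_cursor.execute(
    "CREATE TABLE Docks (station_id int PRIMARY KEY, station_name varchar(250))")

station_id = 72
station_name = "St Mark's Pl"   # invented name; the apostrophe is the point

# Concatenation: the apostrophe ends the SQL string early, so the query is invalid.
bad_query = ("INSERT INTO Docks(station_id, station_name) VALUES ("
             + str(station_id) + ", '" + station_name + "')")
try:
    demo_cursor.execute(bad_query)
    concatenation_failed = False
except sqlite3.OperationalError:
    concatenation_failed = True

# Parameters: the value is passed separately from the SQL text.
demo_cursor.execute(
    "INSERT INTO Docks(station_id, station_name) VALUES (?, ?)",
    (station_id, station_name))
demo_con.commit()

demo_cursor.execute(
    "SELECT station_name FROM Docks WHERE station_id = ?", (station_id,))
stored_name = demo_cursor.fetchone()[0]
print(concatenation_failed)
print(stored_name)
demo_con.close()
```

The parameterized insert stores the name exactly as given, apostrophe included, while the concatenated version fails before anything is written.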

In [None]:
cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT * FROM citibike.Docks")
rows = cur.fetchall()

for row in rows:
    print "Station ID:", row["station_id"]
    print "Station Name:", row["station_name"]
    print "Number of Docks:", row["number_of_docks"]
    print "Available Docks:", row["available_docks"]
    print "============================================="

We can, of course, transform the results back into a DataFrame (see below), or we can use the data directly from the rows object (a tuple containing one dictionary for each row of the result).

In [None]:
rows

In [None]:
df_from_sql = pandas.DataFrame(list(rows))
df_from_sql

Finally, let's close our database connection.

In [None]:
con.close()
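If an earlier cell raises an error, the explicit `close()` call may never run. One possible pattern, sketched here with the standard library's `sqlite3` as a stand-in for a MySQL connection, is `contextlib.closing`, which calls `close()` on exit from the `with` block even when an exception occurs. The `demo_` names are hypothetical, and whether this fits a notebook workflow spread over several cells is a judgment call.

```python
import sqlite3
from contextlib import closing

# closing(...) calls .close() on the wrapped object when the block ends
with closing(sqlite3.connect(":memory:")) as demo_con:
    with closing(demo_con.cursor()) as demo_cursor:
        demo_cursor.execute("SELECT 1")
        value = demo_cursor.fetchone()[0]

# After the block the connection is closed, so further use raises an error
try:
    demo_con.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
print(value)
print(still_open)
```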