## Inserting data in MySQL using Python

First, let's start with a basic piece of code that fetches the data we want to insert into the database. For our example, we will get data about the Citibike stations, using the corresponding API call provided by the Citibike website:

In [1]:
import requests

In [2]:
# Let's get the data from the Citibike API
url = 'http://www.citibikenyc.com/stations/json'
results = requests.get(url).json() 

In [3]:
# We only need a subset of the data in the JSON returned by the Citibike API, so we keep only what we need
data = results["stationBeanList"]

In [4]:
len(data)

813
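
Each element of `data` is a dictionary describing one station. The sketch below shows, for a single hand-written entry, how the fields used in the rest of this notebook can be pulled out and how the `lastCommunicationTime` string parses into a `datetime`. The dock counts in `sample_entry` are made-up values for illustration, and `to_row` is a hypothetical helper name, not part of the Citibike API.

```python
from datetime import datetime

# Hand-written entry with the keys used later in this notebook.
# The numeric values are illustrative, not real API output.
sample_entry = {
    "id": 72,
    "stationName": "W 52 St & 11 Ave",
    "totalDocks": 39,
    "availableDocks": 10,
    "lastCommunicationTime": "2016-02-09 10:16:49 AM",
}

def to_row(entry):
    """Return (id, name, total docks, available docks, timestamp) for one station."""
    timestamp = datetime.strptime(entry["lastCommunicationTime"],
                                  "%Y-%m-%d %I:%M:%S %p")
    return (entry["id"], entry["stationName"],
            entry["totalDocks"], entry["availableDocks"], timestamp)

row = to_row(sample_entry)
print(row)
```

The resulting tuple has the same order as the columns of the table created later in this notebook.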

Now we will connect to our MySQL server. We will use the MySQLdb library of Python.

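If you do not have the library, you need to install it by typing in the shell (on Python 3, the Debian/Ubuntu package is named `python3-mysqldb`, and `pip install mysqlclient` is an alternative that also provides the `MySQLdb` module):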

`sudo apt-get install python-mysqldb`

In [5]:
import MySQLdb as mdb

con = mdb.connect(host = 'localhost', 
                  user = 'root', 
                  passwd = 'dwdstudent2015', 
                  charset='utf8', use_unicode=True)

Once we have connected successfully, we need to create our database:

In [6]:
# Query to create a database
db_name = 'citibike_mysql_test'
create_db_query = "CREATE DATABASE IF NOT EXISTS {db} DEFAULT CHARACTER SET 'utf8'".format(db=db_name)

# Create a database
cursor = con.cursor()
cursor.execute(create_db_query)
cursor.close()
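
Database and table names cannot be passed as `%s` query parameters, because the driver would quote them as string values, which is why they are interpolated with `format`. If such a name ever comes from outside the notebook, it is worth checking it first. A minimal sketch, assuming that plain identifiers (letters, digits, underscore) are all we need; `checked_identifier` is a hypothetical helper, not part of MySQLdb:

```python
import re

# Hypothetical helper: accept only plain identifiers (letters, digits, underscore).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def checked_identifier(name):
    """Return `name` unchanged if it is a plain identifier, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe SQL identifier: {!r}".format(name))
    return name

db_name = "citibike_mysql_test"
create_db_query = "CREATE DATABASE IF NOT EXISTS {db} DEFAULT CHARACTER SET 'utf8'".format(
    db=checked_identifier(db_name))
print(create_db_query)
```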



Then we create the table where we will store our data. For our example, we will import five fields into the database: station_id, station_name, number_of_docks, available_docks, and date. The combination of station_id and date serves as the primary key.

In [7]:
cursor = con.cursor()
table_name = 'Docks'
# Create a table
# The {db} and {table} are placeholders for the parameters in the format(....) statement
create_table_query = '''CREATE TABLE IF NOT EXISTS {db}.{table} 
                                (station_id int, 
                                station_name varchar(250), 
                                number_of_docks int,
                                available_docks int,
                                date datetime,
                                PRIMARY KEY(station_id, date)
                                )'''.format(db=db_name, table=table_name)
cursor.execute(create_table_query)
cursor.close()
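
The composite `PRIMARY KEY(station_id, date)` lets the same station appear many times, once per timestamp, but rejects a second row with the same station and timestamp. A minimal sketch of that behavior, using Python's built-in `sqlite3` module with an in-memory database as a stand-in so that it runs without a MySQL server. SQLite's types and its `?` placeholder differ from MySQL's, and the rows are made up:

```python
import sqlite3

con_demo = sqlite3.connect(":memory:")
con_demo.execute("""CREATE TABLE Docks
                    (station_id int,
                     station_name varchar(250),
                     number_of_docks int,
                     available_docks int,
                     date datetime,
                     PRIMARY KEY(station_id, date))""")

insert = "INSERT INTO Docks VALUES (?, ?, ?, ?, ?)"
# Same station, two different timestamps: both rows are accepted.
con_demo.execute(insert, (72, "W 52 St & 11 Ave", 39, 10, "2016-02-09 10:16:49"))
con_demo.execute(insert, (72, "W 52 St & 11 Ave", 39, 12, "2016-02-09 10:21:49"))

# Same station and same timestamp: the primary key rejects the row.
try:
    con_demo.execute(insert, (72, "W 52 St & 11 Ave", 39, 12, "2016-02-09 10:21:49"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = con_demo.execute("SELECT COUNT(*) FROM Docks").fetchone()[0]
print(duplicate_rejected, row_count)
con_demo.close()
```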



Finally, we import the data into our table, using the INSERT command. 
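
The cell that follows passes the values separately from the SQL text, through `%s` placeholders, instead of concatenating them into the query string. A short sketch of why concatenation is fragile, using a made-up station name that contains an apostrophe (no database is needed, since the code only builds strings):

```python
# Made-up station name containing an apostrophe.
station_name = "St Mark's Pl & 2 Ave"

# Concatenation: the apostrophe ends the SQL string literal early,
# so the statement is malformed (or, with hostile input, injectable).
concatenated = ("INSERT INTO Docks(station_id, station_name) VALUES (" +
                str(1) + ", '" + station_name + "')")
print(concatenated)

# Parameterized: the SQL text is fixed and the values travel separately;
# the driver escapes them when cursor.execute(template, parameters) runs.
template = "INSERT INTO Docks(station_id, station_name) VALUES (%s, %s)"
parameters = (1, station_name)
print(template, parameters)
```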

In [8]:
from datetime import datetime

query_template = '''INSERT INTO {db}.{table}(station_id, 
                                            station_name, 
                                            number_of_docks, 
                                            available_docks, 
                                            date) 
                    VALUES (%s, %s, %s, %s, %s)'''.format(db=db_name, table=table_name)
cursor = con.cursor()

# THIS IS PROHIBITED: never build the query by concatenating values into the SQL string
# query = "INSERT INTO citibike.Docks(station_id, station_name, number_of_docks) VALUES ("+entry["id"]+", "+entry["stationName"]+", "+entry["totalDocks"]+")"
for entry in data:
    dockid = entry["id"]
    addr = entry["stationName"]
    docks = entry["totalDocks"]
    available = entry["availableDocks"]
    # last_communication = datetime.now()
    # lastCommunicationTime is a string of
    # the form "2016-02-09 10:16:49 AM"
    # See https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
    # for the format codes used to parse it
    last_communication = datetime.strptime(entry["lastCommunicationTime"],
                                           '%Y-%m-%d %I:%M:%S %p')
    print("Inserting station", dockid, "at", addr)
    query_parameters = (dockid, addr, docks, available, last_communication)
    cursor.execute(query_template, query_parameters)

con.commit()
cursor.close()

Inserting station 72 at W 52 St & 11 Ave
Inserting station 79 at Franklin St & W Broadway
Inserting station 82 at St James Pl & Pearl St
Inserting station 83 at Atlantic Ave & Fort Greene Pl
Inserting station 116 at W 17 St & 8 Ave
Inserting station 119 at Park Ave & St Edwards St
Inserting station 120 at Lexington Ave & Classon Ave
Inserting station 127 at Barrow St & Hudson St
Inserting station 128 at MacDougal St & Prince St
Inserting station 143 at Clinton St & Joralemon St
Inserting station 144 at Nassau St & Navy St
Inserting station 146 at Hudson St & Reade St
Inserting station 150 at E 2 St & Avenue C
Inserting station 151 at Cleveland Pl & Spring St
Inserting station 152 at Warren St & Church St
Inserting station 157 at Henry St & Atlantic Ave
Inserting station 161 at LaGuardia Pl & W 3 St
Inserting station 164 at E 47 St & 2 Ave
Inserting station 167 at E 39 St & 3 Ave
Inserting station 168 at W 18 St & 6 Ave
Inserting station 173 at Broadway & W 49 St
Inserting station 174 a

Inserting station 3078 at Broadway & Roebling St
Inserting station 3080 at S 4 St & Rodney St
Inserting station 3081 at Graham Ave & Grand St
Inserting station 3082 at Hope St & Union Ave
Inserting station 3083 at Bushwick Ave & Powers St
Inserting station 3085 at Roebling St & N 4 St
Inserting station 3086 at Graham Ave & Conselyea St
Inserting station 3087 at Metropolitan Ave & Meeker Ave
Inserting station 3088 at Union Ave & Jackson St
Inserting station 3090 at N 8 St & Driggs Ave
Inserting station 3091 at Frost St & Meeker St
Inserting station 3092 at Berry St & N 8 St
Inserting station 3093 at N 6 St & Bedford Ave
Inserting station 3094 at Graham Ave & Withers St
Inserting station 3095 at Graham Ave & Herbert St
Inserting station 3096 at Union Ave & N 12 St
Inserting station 3100 at Nassau Ave & Newell St
Inserting station 3101 at N 12 St & Bedford Ave
Inserting station 3102 at Driggs Ave & Lorimer St
Inserting station 3103 at N 11 St & Wythe Ave
Inserting station 3105 at N 15 St 

IntegrityError: (1062, "Duplicate entry '3148-2017-09-05 09:05:47' for key 'PRIMARY'")
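
Error 1062 means that a row with the same `(station_id, date)` pair is already in the table, which happens, for example, when the insert cell is run twice before a station reports again. One possible way to keep the loop going is to catch the error per row. The sketch below is written against a stand-in cursor so that it runs without a server; `insert_skipping_duplicates`, `FakeCursor`, and `DuplicateKey` are hypothetical names. With MySQLdb, the exception class to pass would be `mdb.IntegrityError`.

```python
def insert_skipping_duplicates(cursor, query, rows, duplicate_error):
    """Run `query` once per parameter tuple; skip rows that raise `duplicate_error`."""
    inserted, skipped = 0, 0
    for parameters in rows:
        try:
            cursor.execute(query, parameters)
            inserted += 1
        except duplicate_error:
            skipped += 1
    return inserted, skipped


class DuplicateKey(Exception):
    """Stand-in for the driver's IntegrityError."""


class FakeCursor:
    """Stand-in cursor that enforces a (station_id, date) primary key in memory."""

    def __init__(self):
        self.keys = set()

    def execute(self, query, parameters):
        key = (parameters[0], parameters[4])
        if key in self.keys:
            raise DuplicateKey(key)
        self.keys.add(key)


# Made-up rows; the second one repeats the key of the first.
rows = [
    (72, "W 52 St & 11 Ave", 39, 10, "2016-02-09 10:16:49"),
    (72, "W 52 St & 11 Ave", 39, 10, "2016-02-09 10:16:49"),
    (79, "Franklin St & W Broadway", 33, 5, "2016-02-09 10:17:02"),
]
query = "INSERT INTO Docks VALUES (%s, %s, %s, %s, %s)"
counts = insert_skipping_duplicates(FakeCursor(), query, rows, DuplicateKey)
print(counts)
```

Note that `IntegrityError` covers other constraint violations as well, so a stricter version would inspect the error code (1062 here) before skipping a row. MySQL also offers `INSERT IGNORE` as a statement-level alternative.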

In [None]:
cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT * FROM {db}.{table}".format(db=db_name, table=table_name))
rows = cur.fetchall()
cur.close()

In [None]:
for row in rows:
    print("Station ID:", row["station_id"])
    print("Station Name:", row["station_name"])
    print("Number of Docks:", row["number_of_docks"])
    print("Available Docks:", row["available_docks"])
    print("Last Communication:", row["date"])
    print("=============================================")
    


We can, of course, transform the results into a DataFrame (see below), or we can use the data directly from the `rows` object (a tuple containing one dictionary for each row of the results).

In [None]:
import pandas as pd
cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT * FROM {db}.{table}".format(db=db_name, table=table_name))
rows = cur.fetchall()
cur.close()

In [None]:
df_from_sql = pd.DataFrame(list(rows))
df_from_sql

In [None]:
# We can then compute functions directly on the dataframe
sum(df_from_sql["number_of_docks"])

In [None]:
# The same works for any other column, for example the available docks
sum(df_from_sql["available_docks"])

In [None]:
# And we can also create new columns, derived from the existing ones
df_from_sql["bikes_docked"] = df_from_sql["number_of_docks"] - df_from_sql["available_docks"]

In [None]:
sum(df_from_sql['bikes_docked'])
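
The same totals can be computed directly from `rows`, without pandas, since each row is a dictionary keyed by column name. A minimal sketch with two made-up rows standing in for the query result:

```python
# Two made-up rows in the shape returned by a DictCursor:
# a tuple of dictionaries keyed by column name.
rows = (
    {"station_id": 72, "number_of_docks": 39, "available_docks": 10},
    {"station_id": 79, "number_of_docks": 33, "available_docks": 5},
)

total_docks = sum(row["number_of_docks"] for row in rows)
total_available = sum(row["available_docks"] for row in rows)
bikes_docked = total_docks - total_available
print(total_docks, total_available, bikes_docked)
```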

Finally, let's clean up and close our database connection.

In [None]:
drop_db_query = "DROP DATABASE IF EXISTS {db}".format(db=db_name)

# Drop the database
cursor = con.cursor()
cursor.execute(drop_db_query)
cursor.close()

In [None]:
con.close()