<a href="https://colab.research.google.com/github/ipeirotis/introduction-to-databases/blob/master/session1/A5-Inserting_Data_in_MySQL_using_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!sudo pip3 install -U -q PyMySQL sqlalchemy sql_magic


## Inserting data in MySQL using Python

Let's start with a basic piece of code that fetches the data we want to insert into the database. For our example, we will get the data about the Citibike stations, using the corresponding API call provided by the Citibike website:

In [2]:
import requests
import uuid
from datetime import date, datetime, timedelta

In [3]:
# Let's get the data from the Citibike API
url = "https://gbfs.citibikenyc.com/gbfs/en/station_information.json"
results = requests.get(url).json()

In [4]:
# We only need a subset of the data in the JSON returned by the Citibike API, so we keep only what we need
data = results["data"]["stations"]

In [5]:
data[1]

{'rental_methods': ['KEY', 'CREDITCARD'],
 'lat': 40.763604677958625,
 'external_id': 'b442a648-e9f4-4893-951a-64d258bc0e55',
 'lon': -73.98917958140373,
 'capacity': 30,
 'station_id': 'b442a648-e9f4-4893-951a-64d258bc0e55',
 'eightd_has_key_dispenser': False,
 'station_type': 'classic',
 'region_id': '71',
 'electric_bike_surcharge_waiver': False,
 'name': 'W 50 St & 9 Ave',
 'has_kiosk': True,
 'short_name': '6854.05',
 'rental_uris': {'android': 'https://bkn.lft.to/lastmile_qr_scan',
  'ios': 'https://bkn.lft.to/lastmile_qr_scan'},
 'eightd_station_services': []}

In [6]:
len(data)

2098
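
Each record carries many more fields than the three we will store in the database. As a minimal sketch, a small helper can project a station record onto just those fields. The helper name `to_row` and the abbreviated sample record are illustrative assumptions, not part of the Citibike API or of the cells in this notebook.

```python
# Illustrative sample shaped like one entry of results["data"]["stations"],
# abbreviated to a few of the fields shown in the output above.
sample_station = {
    "station_id": "b442a648-e9f4-4893-951a-64d258bc0e55",
    "name": "W 50 St & 9 Ave",
    "capacity": 30,
    "lat": 40.763604677958625,
    "lon": -73.98917958140373,
}


def to_row(entry):
    """Keep only the three fields that we plan to store (hypothetical helper)."""
    return {
        "station_id": entry["station_id"],
        "station_name": entry["name"],
        "capacity": entry["capacity"],
    }


row = to_row(sample_station)
print(row)
```

Applying the same helper to every element of `data` would give a list of dictionaries whose keys match the column names used later in the notebook.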

In [7]:
from sqlalchemy import create_engine
from sqlalchemy import text

conn_string = "mysql+pymysql://{user}:{password}@{host}/".format(
    host="db.ipeirotis.org", user="student", password="dwdstudent2015"
)

engine = create_engine(conn_string)
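
The password in this example contains only letters and digits, so it can be placed in the connection URL as-is. A password containing characters such as `@` or `/` would need to be percent-encoded first. Here is a minimal sketch using only the standard library; the credentials and host are made-up placeholders, and `build_conn_string` is a hypothetical helper.

```python
from urllib.parse import quote


def build_conn_string(user, password, host):
    """Build a mysql+pymysql URL, percent-encoding the user and password."""
    return "mysql+pymysql://{user}:{password}@{host}/".format(
        user=quote(user, safe=""),
        password=quote(password, safe=""),
        host=host,
    )


# Placeholder credentials, for illustration only
example_conn_string = build_conn_string("student", "p@ss/word", "db.example.org")
print(example_conn_string)
```

The resulting string can be passed to `create_engine` in the same way as `conn_string` above.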

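Once the engine is set up, we need to create our database. (Note that `create_engine` does not connect right away; the connection is opened the first time we call `engine.connect()`.)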

In [8]:
# Query to create a database
# In this example, we will try to create the (existing) database "public"
# But in general, we can give any name to the database
db_name = "public"
create_db_query = (
    f"CREATE DATABASE IF NOT EXISTS {db_name} DEFAULT CHARACTER SET 'utf8'"
)

# Create a database
with engine.connect() as connection:
  connection.execute(text(create_db_query))

Then we create the table where we will store our data. For our example, we will import just three fields into the database: `station_id`, `station_name`, and `capacity`.

In [9]:
# To avoid conflicts between people writing in the same database, we add a random suffix to the table names
# We only create the variable once while running the notebook
if "suffix" not in globals():
    suffix = str(uuid.uuid4())[:8]
print(suffix)

d94ec55c
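
Table and database names cannot be passed as bound parameters, which is why the cells that follow interpolate them into the SQL with f-strings. That is safe here because the suffix consists only of hexadecimal characters taken from a UUID. Here is a minimal sketch of a guard that makes this assumption explicit; `make_table_name` and the character rule it enforces are hypothetical choices made for this sketch.

```python
import re
import uuid

# Letters, digits, and underscores only (a conservative rule chosen for this sketch)
SAFE_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def make_table_name(prefix, suffix):
    """Return prefix_suffix, rejecting names outside the conservative character set."""
    name = f"{prefix}_{suffix}"
    if not SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsafe table name: {name!r}")
    return name


example_suffix = str(uuid.uuid4())[:8]  # eight hexadecimal characters
print(make_table_name("Docks", example_suffix))
```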


In [10]:
table_name = f"Docks_{suffix}"

# Drop the table if there is one already
drop_table_query = f"DROP TABLE IF EXISTS {db_name}.{table_name}"
with engine.connect() as connection:
  connection.execute(text(drop_table_query))

# Create a table
create_table_query = f"""CREATE TABLE IF NOT EXISTS {db_name}.{table_name}
                                (station_id varchar(50),
                                station_name varchar(50),
                                capacity int,
                                PRIMARY KEY(station_id)
                                )"""

with engine.connect() as connection:
  connection.execute(text(create_table_query))


Finally, we import the data into our table, using the INSERT command. (_Note: The `INSERT IGNORE` directs the database to ignore attempts to insert another tuple with the same primary key. In our case, we do not want to allow two entries for the same `station_id`._)

In [None]:
query_template = f"""
                    INSERT IGNORE INTO
                    {db_name}.{table_name}(station_id,  station_name,  capacity)
                    VALUES (:station_id, :station_name, :capacity)
                  """

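# THIS IS PROHIBITED: never build a query by concatenating values into the SQL string.
# It breaks on values that contain quotes and opens the door to SQL injection.
# query = f"INSERT INTO {db_name}.{table_name}(station_id, station_name, capacity) " + \
#         "VALUES ('" + entry["station_id"] + "', '" + entry["name"] + "', " + str(entry["capacity"]) + ")"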

with engine.connect() as connection:
  for entry in data:
      query_parameters = {
          "station_id": entry["station_id"],
          "station_name": entry["name"],
          "capacity": entry["capacity"]
      }
      print("Inserting station", entry["station_id"], "at", entry["name"], "with", entry["capacity"], "docks")
      connection.execute(text(query_template), query_parameters)
  connection.commit()
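
To see the effect of bound parameters and of ignoring duplicate keys without touching the shared MySQL server, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of MySQL. Note the swap: SQLite spells the statement `INSERT OR IGNORE` rather than MySQL's `INSERT IGNORE`, and the table and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for the MySQL server
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Docks (station_id TEXT PRIMARY KEY, station_name TEXT, capacity INTEGER)"
)

sample_rows = [
    {"station_id": "a1", "station_name": "W 50 St & 9 Ave", "capacity": 30},
    # A name containing a quote would break a query built by string concatenation
    {"station_id": "b2", "station_name": "St Mark's Pl", "capacity": 25},
    # Same primary key as the first row: ignored instead of raising an error
    {"station_id": "a1", "station_name": "Duplicate entry", "capacity": 99},
]

insert_query = """
    INSERT OR IGNORE INTO Docks(station_id, station_name, capacity)
    VALUES (:station_id, :station_name, :capacity)
"""
for row in sample_rows:
    conn.execute(insert_query, row)  # values are bound, never pasted into the SQL
conn.commit()

stored = conn.execute(
    "SELECT station_id, station_name, capacity FROM Docks ORDER BY station_id"
).fetchall()
print(stored)
conn.close()
```

Only two rows end up in the table: the third insert reuses the primary key `a1` and is silently skipped, and the apostrophe in the second station name causes no trouble because the value is bound, not spliced into the SQL text.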

## Query the Database to retrieve the data

Now let's see how to query the database and read back the rows we inserted:

In [12]:
with engine.connect() as connection:
  results = connection.execute(text(f"SELECT station_id, station_name, capacity FROM {db_name}.{table_name}"))
  rows = results.mappings().all()


In [13]:
# Let's check how many data points we got back
print(f"Number of rows: {len(rows)}")
print("=============================================")

Number of rows: 2098


In [None]:
# And now let's go over the results
for row in rows:
    print("Station ID:", row['station_id'])
    print("Station Name:", row['station_name'])
    print("Number of Docks:", row['capacity'])
    print("=============================================")

Finally, let's clean up by dropping the table we created. Each `with engine.connect()` block above has already closed its own connection on exit.

In [15]:
drop_table_query = f"DROP TABLE IF EXISTS {db_name}.{table_name}"
with engine.connect() as connection:
  connection.execute(text(drop_table_query))

## Exercise

At `https://gbfs.citibikenyc.com/gbfs/en/station_status.json` we can access the live status of all the stations (e.g., the number of bikes available). Using the approach outlined above, create a table in the database (using the same table suffix that we created above) and store the station status data in it.