# SQLAlchemy and the SQL magic extension

To execute SQL queries from within a Jupyter notebook, we will use the `sql_magic` extension (https://github.com/pivotal/sql_magic).

    sudo pip3 install -U sql_magic
    !conda install -c conda-forge ipython-sql 

## 1. Connecting to a remote database

In [None]:
# Use the SQL alchemy package to connect
from sqlalchemy import create_engine

# Supply credentials and point to the database server we're trying to connect to
host = 'db.ipeirotis.org'
user = 'student'
password = 'dwdstudent2015'
engine = create_engine(f'mysql://{user}:{password}@{host}/?charset=utf8')

# Load the SQL_magic extension and configure the connection
%reload_ext sql_magic
%config SQL.conn_name = 'engine'
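
One caveat when building the connection URL with an f-string: a password containing characters such as `@` or `/` would be misread as part of the URL. The following is a minimal sketch of one workaround, assuming the standard-library function `urllib.parse.quote_plus` is used to escape the credentials first. The password shown is a made-up example, not a real credential.

```python
from urllib.parse import quote_plus

host = 'db.ipeirotis.org'
user = 'student'
password = 'p@ss/word'  # hypothetical password containing URL-special characters

# Escape the credentials so '@' and '/' are not mistaken for URL separators
safe_user = quote_plus(user)
safe_password = quote_plus(password)

conn_string = f'mysql://{safe_user}:{safe_password}@{host}/?charset=utf8'
print(conn_string)
```

The resulting string can then be passed to `create_engine` in the same way as before.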

### Running queries 

To write SQL queries within the notebook, you can include `%%read_sql` on the first line of your code cell. 

If the connection succeeded, the `SHOW DATABASES` command will list the same databases you see when you open MySQL Workbench. Run the cell below to test the connection:

In [None]:
%%read_sql
SHOW DATABASES;

In [None]:
%%read_sql
USE imdb;
SHOW TABLES;

After `%%read_sql`, you can add a name that you want to use to store your result.    
This will put the result of your query in a DataFrame with the name of your choice.

For example, run the following to select two records from the movies table:

In [None]:
%%read_sql test_query
SELECT * 
FROM movies 
LIMIT 2;

Now, you can run the line below to see the contents stored in the `test_query` DataFrame:

In [None]:
test_query
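
Outside of the magic, a comparable DataFrame can be produced with `pandas.read_sql_query`. The sketch below assumes an in-memory SQLite database with a small made-up `movies` table, so that it runs without a server. The columns and rows are hypothetical, not the real `imdb` schema.

```python
import sqlite3
import pandas as pd

# In-memory SQLite with a tiny made-up movies table (stand-in for the imdb database)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE movies (id int, name varchar(100), year int)')
conn.executemany('INSERT INTO movies VALUES (?, ?, ?)',
                 [(1, 'Movie A', 1999), (2, 'Movie B', 2004), (3, 'Movie C', 2010)])

# Run the query and collect the result in a DataFrame
test_query = pd.read_sql_query('SELECT * FROM movies LIMIT 2', conn)
print(test_query.shape)
```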

## 2. Connecting to your local database

Now, instead of querying a remote database, we'll insert data into our local database. Try to connect as follows:

In [None]:
# We use the SQL alchemy package to connect
from sqlalchemy import create_engine

# We supply our credentials and point to the database we're trying to connect to 
host = '127.0.0.1'
user = 'root'
password = 'dwdstudent2015'
engine = create_engine(f'mysql://{user}:{password}@{host}/?charset=utf8')

# Load the SQL_magic extension and configure the connection
%reload_ext sql_magic
%config SQL.conn_name = 'engine'

### Create a database

Once we have connected successfully, we need to create a database. Let's make one called `test_db`:

In [None]:
%%read_sql
CREATE DATABASE IF NOT EXISTS test_db DEFAULT CHARACTER SET 'utf8'

### Create a table

Then we create the table where we will store our data. For example, let's create a table called `test_table` with four columns: `id`, `name`, `purchase`, and `cost`.

In [None]:
%%read_sql
CREATE TABLE IF NOT EXISTS test_db.test_table
                                (id int, 
                                name varchar(50), 
                                purchase varchar(50),
                                cost int,
                                PRIMARY KEY(id))

### Use the database

In [None]:
%%read_sql
USE test_db

### Import data into our table

Finally, we add rows to our table using the `INSERT` command.

In [None]:
%%read_sql
INSERT INTO test_table (id, name, purchase, cost) 
VALUES (1, 'Amy', 'Apples', 5);

In [None]:
%%read_sql
INSERT INTO test_table (id, name, purchase, cost) 
VALUES (2, 'Bill', 'Barley', 4);
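
Because `id` is the primary key, re-running either `INSERT` cell fails with a duplicate-key error instead of adding a second copy of the row. The sketch below illustrates that behavior using Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL. This is an assumption made so the example runs without a server, and MySQL words its error message differently.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE test_table
                (id int, name varchar(50), purchase varchar(50), cost int,
                 PRIMARY KEY(id))''')

insert = "INSERT INTO test_table (id, name, purchase, cost) VALUES (1, 'Amy', 'Apples', 5)"
conn.execute(insert)  # the first insert succeeds

try:
    conn.execute(insert)  # the same id again violates the primary key
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print('Rejected:', err)

# Only the first row was stored
row_count = conn.execute('SELECT COUNT(*) FROM test_table').fetchone()[0]
print(row_count)
```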

Let's check if it worked:

In [None]:
%%read_sql
SELECT * FROM test_table;

### Passing parameters into queries

Instead of relying on the SQL magic, we can also build a query as a Python string and pass it directly to SQLAlchemy:

In [None]:
con = engine.connect()

In [None]:
%%read_sql
USE test_db

In [None]:
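# Build the INSERT statement as a Python string and send it through the engine.
# Note: Engine.execute() exists in SQLAlchemy 1.x but was removed in 2.0.
data = "3, 'Carolyn', 'Cabbage', 6"
query = f'INSERT INTO test_table (id, name, purchase, cost) VALUES ( {data} )'

engine.execute(query)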

This is helpful if we would like to run many queries -- for example, by embedding them in a loop:

In [None]:
for data in ["4, 'Dave', 'Dill', 8",
             "5, 'Eve', 'Endive', 10", 
             "6, 'Fred', 'Figs', 12"]:
    
    query = f'INSERT INTO test_table (id, name, purchase, cost) VALUES ( {data} )'
    engine.execute(query)
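
Pasting values into a SQL string works for fixed test data, but it is fragile when a value contains a quote and unsafe when values come from user input. A more robust pattern is to use bound parameters. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later and using an in-memory SQLite database in place of MySQL so that it runs without a server. The rows are modelled on the sample data above, with an apostrophe added to one name to show that it is handled correctly.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the MySQL server (assumption for illustration)
demo_engine = create_engine('sqlite://')

rows = [
    {'id': 4, 'name': 'Dave', 'purchase': 'Dill', 'cost': 8},
    {'id': 5, 'name': 'Eve', 'purchase': 'Endive', 'cost': 10},
    {'id': 6, 'name': "Fred's", 'purchase': 'Figs', 'cost': 12},  # apostrophe needs no escaping
]

# :id, :name, ... are placeholders filled in by the database driver
insert = text('INSERT INTO test_table (id, name, purchase, cost) '
              'VALUES (:id, :name, :purchase, :cost)')

# engine.begin() opens a transaction and commits it when the block exits
with demo_engine.begin() as conn:
    conn.execute(text('CREATE TABLE test_table '
                      '(id int, name varchar(50), purchase varchar(50), cost int, '
                      'PRIMARY KEY(id))'))
    conn.execute(insert, rows)  # one statement, executed once per dictionary
    total = conn.execute(text('SELECT SUM(cost) FROM test_table')).scalar()
    names = [r[0] for r in conn.execute(text('SELECT name FROM test_table ORDER BY id'))]

print(total, names)
```

Keeping the SQL text fixed and passing the values separately also means the same statement can be reused for any number of rows.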

### Delete a table or database

Let's remove this extra database, since it was just for testing.

In [None]:
%%read_sql
DROP TABLE IF EXISTS test_table;
DROP DATABASE IF EXISTS test_db;

## 3. Importing datasets to your SQL database

Now let's try to actually insert a substantive dataset. We'll use the citibike stations dataset, which I've posted at: [http://people.stern.nyu.edu/khoffman/intro_programming_datasci/assets/csv/citibike_stations.txt](http://people.stern.nyu.edu/khoffman/intro_programming_datasci/assets/csv/citibike_stations.txt)

We'll use Pandas to read the CSV file with the syntax: `pd.read_csv(url)`.
Note that `read_csv` can also take as an argument a path to a local file on your computer.

In [None]:
import pandas as pd 

url = 'http://people.stern.nyu.edu/khoffman/intro_programming_datasci/assets/csv/citibike_stations.txt'
stations_data = pd.read_csv(url)

# Inspect the first 5 rows of the result
stations_data.head(5)

#### Inserting a Pandas dataframe into your database

You should still be connected to your database from the exercises above. Otherwise, re-run the code below to connect again:

In [None]:
'''
# We use the SQL alchemy package to connect
from sqlalchemy import create_engine

# We supply our credentials and point to the database server we're trying to connect to
host = '127.0.0.1'
user = 'root'
password = 'dwdstudent2015'
engine = create_engine(f'mysql://{user}:{password}@{host}/?charset=utf8')

# Load the SQL_magic extension and configure the connection
%reload_ext sql_magic
%config SQL.conn_name = 'engine'
'''

Now, let's create a new database to store our citibike stations data.

In [None]:
%%read_sql
CREATE DATABASE IF NOT EXISTS citibike_stations DEFAULT CHARACTER SET 'utf8';
USE citibike_stations;

Then we create the table where we will store our data. Since the data is already in a Pandas DataFrame, loading it into a database is straightforward. First, let's declare the column types:

In [None]:
# This step is optional, but it is good practice to define the data types
# explicitly before storing data in a database.
import sqlalchemy
dtype = {
    'capacity': sqlalchemy.types.SMALLINT(),
    'eightd_has_key_dispenser':  sqlalchemy.types.BOOLEAN,
    'lat': sqlalchemy.types.Float, 
    'lon': sqlalchemy.types.Float,
    'name': sqlalchemy.types.VARCHAR(50),
    'region_id': sqlalchemy.types.VARCHAR(5),
    'rental_url': sqlalchemy.types.VARCHAR(100),
    'short_name': sqlalchemy.types.VARCHAR(10),
    'station_id': sqlalchemy.types.SMALLINT()
}
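
When choosing sizes such as `VARCHAR(50)`, it helps to check the longest value actually present in each text column. MySQL either rejects or truncates longer values, depending on the server's SQL mode. Here is a minimal sketch using a small made-up DataFrame; the rows are hypothetical and not taken from the Citi Bike file.

```python
import pandas as pd

# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    'name': ['W 52 St & 11 Ave', 'Franklin St & W Broadway'],
    'short_name': ['6926.01', '5430.08'],
    'capacity': [39, 33],
})

# Longest string in each text column we plan to store as VARCHAR
text_columns = ['name', 'short_name']
max_lengths = {col: int(sample[col].str.len().max()) for col in text_columns}
print(max_lengths)
```

Running the same check on `stations_data` would show whether the declared lengths leave enough room.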

In [None]:
stations_data.to_sql(
          name = 'stations',              # Desired name of the table
          schema = 'citibike_stations',   # Name of the database
          con = engine,                   
          if_exists = 'replace', 
          index = False, 
          dtype = dtype)                  # Can be omitted if you don't want to declare explicitly
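
The `if_exists` argument controls what happens when the table is already present: `'replace'` drops and recreates it, `'append'` adds rows to it, and `'fail'` (the default) raises an error. The sketch below shows the difference between the first two. It assumes an in-memory SQLite connection in place of the MySQL engine, which pandas also accepts as `con`, and the two sample rows are made up.

```python
import sqlite3
import pandas as pd

# Hypothetical sample rows; an in-memory SQLite connection stands in for MySQL
stations = pd.DataFrame({'station_id': [72, 79], 'capacity': [39, 33]})
conn = sqlite3.connect(':memory:')

# Writing twice with 'replace' rebuilds the table, so the row count stays at 2
stations.to_sql('stations', con=conn, if_exists='replace', index=False)
stations.to_sql('stations', con=conn, if_exists='replace', index=False)
after_replace = pd.read_sql_query('SELECT COUNT(*) AS n FROM stations', conn)['n'][0]

# Writing again with 'append' adds the same rows a second time
stations.to_sql('stations', con=conn, if_exists='append', index=False)
after_append = pd.read_sql_query('SELECT COUNT(*) AS n FROM stations', conn)['n'][0]

print(after_replace, after_append)
```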

In [None]:
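# Once the data is in the table, we also specify a primary key.
# If we had FOREIGN KEYS, we could add them in the same way.
# The table name is qualified with the database so the statement does not depend on a prior USE.
add_key_query = 'ALTER TABLE citibike_stations.stations ADD PRIMARY KEY(station_id)'
engine.execute(add_key_query)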
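
`ADD PRIMARY KEY` only succeeds if every `station_id` is present and unique, so it can be worth checking the DataFrame before loading it. A small sketch of such a check, again with made-up rows rather than the real dataset:

```python
import pandas as pd

# Hypothetical rows; the third one repeats station_id 72
sample = pd.DataFrame({'station_id': [72, 79, 72],
                       'name': ['A St', 'B St', 'A St (duplicate)']})

has_nulls = bool(sample['station_id'].isna().any())
is_unique = bool(sample['station_id'].is_unique)

# Rows whose key appears more than once
duplicates = sample[sample['station_id'].duplicated(keep=False)]
print(has_nulls, is_unique, len(duplicates))
```

If `is_unique` is `False`, the duplicated rows need to be resolved before the column can serve as a primary key.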

In [None]:
%%read_sql
SELECT * 
FROM stations
WHERE capacity<15
LIMIT 10;

#### Retrieving a Pandas dataframe from your database

As noted above, you can also extract a Pandas dataframe _from_ your database by supplying the dataframe name after `%%read_sql`.

In [None]:
%%read_sql stations4
SELECT * 
FROM stations
WHERE name LIKE '%% 4 St &%%';

In [None]:
stations4

#### Exporting a Pandas dataframe to a file

Finally, you can save your dataframe to a CSV file:

In [None]:
stations4.to_csv("fourth_street_stations.csv")
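
By default, `to_csv` also writes the DataFrame's index as an unnamed first column. If that column is not wanted, pass `index=False`. Here is a minimal sketch with a made-up one-row DataFrame, writing to in-memory buffers instead of files so the output can be inspected directly:

```python
import io
import pandas as pd

# Hypothetical single-row DataFrame for illustration
sample = pd.DataFrame({'station_id': [72], 'name': ['W 52 St & 11 Ave']})

with_index = io.StringIO()
sample.to_csv(with_index)                  # index written as the first column

without_index = io.StringIO()
sample.to_csv(without_index, index=False)  # index column left out

print(with_index.getvalue())
print(without_index.getvalue())
```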

#### Clean up

In [None]:
%%read_sql
DROP DATABASE IF EXISTS citibike_stations;