# Querying a SQL database in a Jupyter notebook

Use [ipython-sql](https://github.com/catherinedevlin/ipython-sql), which provides the `%%sql` magic for IPython (and therefore Jupyter notebooks).

In [1]:
# Setup

import os

import pandas as pd
from sqlalchemy import create_engine

NOTEBOOK_DIR = os.getcwd()
DATA_DIR = os.path.abspath(os.path.join(NOTEBOOK_DIR, os.pardir, 'data'))
DATA_DIR_TMP = os.path.join(DATA_DIR, 'tmp')

os.makedirs(DATA_DIR_TMP, exist_ok=True)

In [2]:
# Bootstrap code to create a database.

DB_PATH = os.path.join(DATA_DIR_TMP, 'example.db')
# Note that for SQLite databases with an absolute path, you need three
# slashes after the `sqlite:`, so that's four slashes in total when
# you count the leading `/` of the path.
DB_URL = f"sqlite:///{DB_PATH}"

engine = create_engine(DB_URL)

df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
df.to_sql('users', con=engine, if_exists='replace')
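
The slash-counting rule from the comment on `DB_URL` can be checked with plain string formatting. This is a minimal sketch: `sqlite_url` is a hypothetical helper, not part of SQLAlchemy, and the paths are made-up POSIX-style examples.

```python
def sqlite_url(db_path: str) -> str:
    """Build a SQLite database URL by appending a path to `sqlite:///`."""
    return f"sqlite:///{db_path}"


# A relative path keeps exactly three slashes after `sqlite:`.
relative_url = sqlite_url("data/tmp/example.db")

# An absolute POSIX path brings its own leading `/`, giving four in total.
absolute_url = sqlite_url("/data/tmp/example.db")

print(relative_url)
print(absolute_url)
```

If the URL for an absolute path shows only three slashes, the path was probably relative after all.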

## Load the extension

In [3]:
%load_ext sql

## Connect to the database

This uses [SQLAlchemy database URLs](http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls).

Note that you can access Python variables by prefixing them with `$`, as in `$DB_URL` in the next cell.

In [4]:
%sql $DB_URL

'Connected: @/Users/ghing/workspace/python-data-cheatsheet/data/tmp/example.db'

## Query the database

In [5]:
%%sql
SELECT * FROM users;

 * sqlite:////Users/ghing/workspace/python-data-cheatsheet/data/tmp/example.db
Done.


index,name
0,User 1
1,User 2
2,User 3
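
Outside a notebook, where the `%%sql` magic is not available, roughly the same query can be run with Python's standard-library `sqlite3` module. The sketch below is an assumption-laden stand-in rather than what ipython-sql does internally: it uses an in-memory database and recreates a `users` table by hand that mirrors the one from the bootstrap cell.

```python
import sqlite3

# In-memory database standing in for example.db (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# `index` is an SQL keyword, so the column name is quoted.
conn.execute('CREATE TABLE users ("index" INTEGER, name TEXT)')
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(0, "User 1"), (1, "User 2"), (2, "User 3")],
)

cursor = conn.execute("SELECT * FROM users;")
# cursor.description holds one tuple per column; the first item is the name.
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)

conn.close()
```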


## Save the query results to a Pandas DataFrame

You can specify a variable name where the query results will be stored as part of the `%%sql` magic, using the `variable <<` syntax. Then you can convert the results object to a Pandas `DataFrame` by calling its `DataFrame()` method.

In [6]:
%%sql users_results << 
SELECT * FROM users;

 * sqlite:////Users/ghing/workspace/python-data-cheatsheet/data/tmp/example.db
Done.
Returning data to local variable users_results


In [7]:
users_results\
    .DataFrame()

Unnamed: 0,index,name
0,0,User 1
1,1,User 2
2,2,User 3


## `pd.read_sql`

Pandas also has a [`read_sql`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html) function for querying a SQL database.

In [8]:
# Query a SQL database using pd.read_sql

# Pandas' database interactions use SQLAlchemy behind the scenes, so it's
# worth checking out the SQLAlchemy documentation, at least for database
# connections:
# https://docs.sqlalchemy.org/en/13/core/engines.html

engine = create_engine(DB_URL)

sql_query = """
SELECT * FROM users;
"""

results_df = pd.read_sql(
    sql_query,
    con=engine
)

results_df

Unnamed: 0,index,name
0,0,User 1
1,1,User 2
2,2,User 3
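
The `index` column in these results is there because `DataFrame.to_sql` writes the DataFrame's index as a column by default. One way to avoid carrying it around as a regular column is to pass `index_col` to `pd.read_sql`. The sketch below assumes an in-memory SQLite database and a plain `sqlite3` connection instead of the SQLAlchemy engine used above, so that it runs on its own.

```python
import sqlite3

import pandas as pd

# In-memory database so the sketch does not touch example.db.
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"name": ["User 1", "User 2", "User 3"]})
# index=True is the default, so the index is stored in a column named `index`.
df.to_sql("users", con=conn, if_exists="replace")

# Use the stored `index` column as the DataFrame index when reading back.
results_df = pd.read_sql("SELECT * FROM users;", con=conn, index_col="index")

print(results_df)

conn.close()
```

Alternatively, passing `index=False` to `to_sql` keeps the index out of the table in the first place.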


## Other things to track

I haven't used it, but [jupyterlab-sql](https://github.com/pbugnion/jupyterlab-sql) is a SQL user interface for JupyterLab.