# SQL in Jupyter Notebook

## Objectives

* Create and access local database using sqlite3
* Create a table
* Insert data into the table
* Query data from the table
* Retrieve the result set into a pandas dataframe

## Import Library

[ipython-sql](https://pypi.org/project/ipython-sql/) adds a %sql (or %%sql) magic to your notebook. It lets you connect to a database using [SQLAlchemy](https://www.sqlalchemy.org/) connection strings and then issue SQL commands within IPython or IPython Notebook.

In [1]:
# load the ipython-sql extension
%load_ext sql
import pandas as pd

## Create a SQLite Database

SQLite databases are file-based. By default, SQLAlchemy connects to them through the Python built-in module [sqlite3](https://docs.python.org/3/library/sqlite3.html).

As SQLite connects to local files, the URL format is slightly different. The “file” portion of the URL is the filename of the database. For a relative file path, this requires three slashes.

Using the %sql magic from ipython-sql with a SQLAlchemy connection string, we can create a SQLite database, or connect to it if it already exists:

In [2]:
%sql sqlite:///mydb.db

'Connected: @mydb.db'
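
To make the slash rule above concrete, here is a minimal sketch of how the URL is assembled. The helper `sqlite_url` is hypothetical (it is not part of ipython-sql or SQLAlchemy) and assumes POSIX-style paths.

```python
def sqlite_url(path=None):
    """Build a SQLAlchemy-style SQLite URL (hypothetical helper for illustration)."""
    if path is None:
        # No file portion at all: an in-memory database.
        return "sqlite://"
    # "sqlite://" + empty host + "/" + path. A relative path therefore ends up
    # with three slashes; a POSIX absolute path (starting with "/") with four.
    return "sqlite:///" + path


print(sqlite_url("mydb.db"))       # sqlite:///mydb.db
print(sqlite_url("/tmp/mydb.db"))  # sqlite:////tmp/mydb.db
print(sqlite_url())                # sqlite://
```

Note that `sqlite://` with no file name opens an empty in-memory database, so queries run against it will not see the tables stored in `mydb.db`.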

## Create a Table

For convenience, we can use %%sql (two %'s instead of one) at the top of a cell to indicate we want the entire cell to be treated as SQL. 

Let's use this to create a table and fill it with some test data for experimenting.

* If the table already exists in the database, the statement raises an error.
* Setting **PRIMARY KEY** on **AlbumID** prevents duplicate albums from being inserted into the table.

In [3]:
%%sql
CREATE TABLE ABBA(
    AlbumID INTEGER NOT NULL PRIMARY KEY,
    Title VARCHAR(50) NOT NULL,
    Released INTEGER NOT NULL
);

 * sqlite:///mydb.db
Done.


[]
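
The two points above can be checked outside the notebook magic. The following is a sketch that uses the standard `sqlite3` module directly against a throwaway in-memory database (an assumption made so that `mydb.db` is left untouched). `IF NOT EXISTS` is one way to avoid the "table already exists" error when a cell is re-run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, separate from mydb.db

create_sql = """
CREATE TABLE IF NOT EXISTS ABBA(
    AlbumID INTEGER NOT NULL PRIMARY KEY,
    Title VARCHAR(50) NOT NULL,
    Released INTEGER NOT NULL
);
"""
conn.execute(create_sql)
conn.execute(create_sql)  # running it again is a no-op instead of an error

conn.execute("INSERT INTO ABBA VALUES (1, 'Ring Ring', 1973)")
try:
    # Same AlbumID as the row above, so the primary key rejects it.
    conn.execute("INSERT INTO ABBA VALUES (1, 'Waterloo', 1974)")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("Rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM ABBA").fetchone()[0]
print(row_count)
```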

## List Tables in a Database

You can verify that the table was created by retrieving the list of all tables:

In [4]:
# Retrieve the list of tables
%sql SELECT name FROM sqlite_master WHERE type='table'

 * sqlite:///mydb.db
Done.


name
ABBA
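
The same catalog query works from plain Python. This is a sketch using the standard `sqlite3` module; the helper name `list_tables` is hypothetical, and the extra `NOT LIKE 'sqlite_%'` filter is an addition that hides SQLite's internal tables.

```python
import sqlite3


def list_tables(connection):
    """Return the names of user-created tables (hypothetical helper)."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")  # throwaway database for the demonstration
conn.execute("CREATE TABLE ABBA(AlbumID INTEGER PRIMARY KEY, Title TEXT)")

tables = list_tables(conn)
print(tables)  # ['ABBA']
```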


## Insert Data to the Table

The table **ABBA** you created is empty. Let's insert some data into it.

| Title | Released | Sales |
| --- | --- | --- |
| Ring Ring | 1973 | 616627 |
| Waterloo | 1974 | 942473 |
| ABBA | 1975 | 1588000 |
| Arrival | 1976 | 6212100 |
| Abba: The Album | 1977 | 3377848 |
| Voulez-Vous | 1979 | 2710155 |
| Super Trouper | 1980 | 2249062 |
| The Visitors | 1981 | 1234802 |
| Voyage | 2021 | 1028000 |

Insert the rows with AlbumID, Title, and Released:

In [5]:
%%sql
INSERT INTO ABBA 
(AlbumID, Title, Released)
VALUES 
(1, 'Ring Ring', 1973),
(2, 'Waterloo', 1974),
(3, 'ABBA', 1975),
(4, 'Arrival', 1976),
(5, 'Abba: The Album', 1977),
(6, 'Voulez-Vous', 1979),
(7, 'Super Trouper', 1980),
(8, 'The Visitors', 1981),
(9, 'Voyage', 2021);

 * sqlite:///mydb.db
9 rows affected.


[]
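
When the rows come from Python data rather than a hand-written statement, a parameterized insert is a common alternative. The sketch below uses `sqlite3.Connection.executemany` with `?` placeholders against an in-memory database (an assumption, so the notebook's `mydb.db` is not modified).

```python
import sqlite3

albums = [
    (1, "Ring Ring", 1973),
    (2, "Waterloo", 1974),
    (3, "ABBA", 1975),
    (4, "Arrival", 1976),
    (5, "Abba: The Album", 1977),
    (6, "Voulez-Vous", 1979),
    (7, "Super Trouper", 1980),
    (8, "The Visitors", 1981),
    (9, "Voyage", 2021),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ABBA("
    "AlbumID INTEGER NOT NULL PRIMARY KEY, "
    "Title VARCHAR(50) NOT NULL, "
    "Released INTEGER NOT NULL)"
)

# The connection as a context manager commits on success and rolls back
# if any insert raises, so the table never ends up half-filled.
with conn:
    conn.executemany(
        "INSERT INTO ABBA (AlbumID, Title, Released) VALUES (?, ?, ?)",
        albums,
    )

inserted = conn.execute("SELECT COUNT(*) FROM ABBA").fetchone()[0]
print(inserted)  # 9
```

Placeholders also keep values such as `'Abba: The Album'` out of the SQL text, so quoting is handled by the driver.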

Add a new column, Sales, and fill it with data:

In [6]:
%%sql
ALTER TABLE ABBA
ADD COLUMN Sales INTEGER;

 * sqlite:///mydb.db
Done.


[]

In [7]:
%%sql
UPDATE ABBA
SET Sales = 616627
WHERE Title = 'Ring Ring';

UPDATE ABBA
SET Sales = 942473
WHERE Title = 'Waterloo';

UPDATE ABBA
SET Sales = 1588000
WHERE Title = 'ABBA';

UPDATE ABBA
SET Sales = 6212100
WHERE Title = 'Arrival';

UPDATE ABBA
SET Sales = 3377848
WHERE Title = 'Abba: The Album';

UPDATE ABBA
SET Sales = 2710155
WHERE Title = 'Voulez-Vous';

UPDATE ABBA
SET Sales = 2249062
WHERE Title = 'Super Trouper';

UPDATE ABBA
SET Sales = 1234802
WHERE Title = 'The Visitors';

UPDATE ABBA
SET Sales = 1028000
WHERE Title = 'Voyage';

 * sqlite:///mydb.db
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.


[]
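
The nine `UPDATE` statements above all share one shape, so they can also be driven from a mapping. This is a sketch with the standard `sqlite3` module and an in-memory copy of the table (an assumption); the name `sales_by_title` is introduced here for illustration.

```python
import sqlite3

sales_by_title = {
    "Ring Ring": 616627,
    "Waterloo": 942473,
    "ABBA": 1588000,
    "Arrival": 6212100,
    "Abba: The Album": 3377848,
    "Voulez-Vous": 2710155,
    "Super Trouper": 2249062,
    "The Visitors": 1234802,
    "Voyage": 1028000,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ABBA(AlbumID INTEGER PRIMARY KEY, Title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO ABBA (Title) VALUES (?)",
    [(title,) for title in sales_by_title],
)

# Add the new column, then run one parameterized UPDATE per album.
conn.execute("ALTER TABLE ABBA ADD COLUMN Sales INTEGER")
with conn:
    conn.executemany(
        "UPDATE ABBA SET Sales = ? WHERE Title = ?",
        [(sales, title) for title, sales in sales_by_title.items()],
    )

# Every row should now have a Sales value.
missing = conn.execute(
    "SELECT COUNT(*) FROM ABBA WHERE Sales IS NULL"
).fetchone()[0]
print(missing)  # 0
```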

## Query Data in the Table

You can verify the data you added by querying the table.

In [8]:
# Select all columns from ABBA and save the result set to abba
abba = %sql SELECT * from ABBA
type(abba)

 * sqlite:///mydb.db
Done.


sql.run.ResultSet

You can also use %%sql with a `result <<` prefix to store the result set in the variable `result`, and then convert it to a pandas DataFrame.

> When you use %%sql, the whole cell is treated as SQL, so a regular `# comment` does not work. Use a SQL comment such as `/* comment */` instead.

In [9]:
%%sql result << 
/*Select all columns from ABBA and save to result*/
SELECT * 
FROM ABBA;

 * sqlite:///mydb.db
Done.
Returning data to local variable result


In [10]:
df = pd.DataFrame(result)
df.head()

,AlbumID,Title,Released,Sales
0,1,Ring Ring,1973,616627
1,2,Waterloo,1974,942473
2,3,ABBA,1975,1588000
3,4,Arrival,1976,6212100
4,5,Abba: The Album,1977,3377848
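
Outside the `%sql` magic, pandas can run the query itself. The sketch below assumes a plain `sqlite3` connection and uses `pandas.read_sql_query`; it works on a small in-memory copy of the table rather than on `mydb.db`.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ABBA("
    "AlbumID INTEGER PRIMARY KEY, Title TEXT, Released INTEGER, Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO ABBA VALUES (?, ?, ?, ?)",
    [(1, "Ring Ring", 1973, 616627), (2, "Waterloo", 1974, 942473)],
)

# read_sql_query executes the statement and uses the column names
# from the result set as the DataFrame's column labels.
df = pd.read_sql_query("SELECT * FROM ABBA ORDER BY AlbumID", conn)
print(df)
```

This skips the intermediate result set, which can be convenient when the DataFrame is the only thing needed.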


In [11]:
%sql SELECT * FROM ABBA LIMIT 2;

 * sqlite:///mydb.db
Done.


AlbumID,Title,Released,Sales
1,Ring Ring,1973,616627
2,Waterloo,1974,942473


In [8]:
%sql SELECT COUNT(*) FROM ABBA

 * sqlite:///mydb.db
Done.


COUNT(*)
9


In [12]:
%sql SELECT MAX(Sales), Title FROM ABBA;

 * sqlite:///mydb.db
Done.


MAX(Sales),Title
6212100,Arrival
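
The query above relies on SQLite-specific behavior: when an aggregate query uses `MAX()`, a bare column such as `Title` takes its value from the row that holds the maximum. Many other database engines reject or handle such a query differently, so a more portable way to ask for the best-selling album is to sort and take the first row. The following is a sketch against a small in-memory table (an assumption for the demonstration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABBA(Title TEXT NOT NULL, Sales INTEGER)")
conn.executemany(
    "INSERT INTO ABBA VALUES (?, ?)",
    [("Waterloo", 942473), ("Arrival", 6212100), ("Voyage", 1028000)],
)

# Sort by Sales, highest first, and keep only the top row.
best = conn.execute(
    "SELECT Title, Sales FROM ABBA ORDER BY Sales DESC LIMIT 1"
).fetchone()
print(best)  # ('Arrival', 6212100)
```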


In [13]:
%%sql 
SELECT * FROM ABBA 
WHERE Released = 2021;

 * sqlite:///mydb.db
Done.


AlbumID,Title,Released,Sales
9,Voyage,2021,1028000


In [14]:
%%sql 
SELECT COUNT(DISTINCT Title) FROM ABBA

 * sqlite:///mydb.db
Done.


COUNT(DISTINCT Title)
9
