# Sakila database

The Sakila sample database was initially developed by Mike Hillyer, a former member of the MySQL AB documentation team, and is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth.

![Image](images/sakila-schema.png)

# Get the Database

In [None]:
# Uncomment the commands below to rebuild datasets/sakila.db from scratch.
# Remove existing Sakila files
#!rm datasets/*sakila*
# Download the schema and data scripts
#!curl -o datasets/sqlite-sakila-schema.sql https://raw.githubusercontent.com/jOOQ/jOOQ/master/jOOQ-examples/Sakila/sqlite-sakila-db/sqlite-sakila-schema.sql
#!curl -o datasets/sqlite-sakila-insert-data.sql https://raw.githubusercontent.com/jOOQ/jOOQ/master/jOOQ-examples/Sakila/sqlite-sakila-db/sqlite-sakila-insert-data.sql
# Import the scripts into SQLite (schema first, then data)
#!sqlite3 datasets/sakila.db < datasets/sqlite-sakila-schema.sql
#!sqlite3 datasets/sakila.db < datasets/sqlite-sakila-insert-data.sql
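
If the `sqlite3` command-line tool is not available, the import step can probably be reproduced with Python's standard `sqlite3` module. The following is a minimal sketch under that assumption: the helper name `load_sql_scripts` is hypothetical, and the demo uses two tiny stand-in scripts rather than the real Sakila files.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_sql_scripts(db_path, script_paths):
    """Run each SQL script file, in order, against the SQLite database at db_path."""
    conn = sqlite3.connect(str(db_path))
    try:
        for path in script_paths:
            # executescript runs every statement in the file,
            # much like `sqlite3 db < file` on the command line.
            conn.executescript(Path(path).read_text(encoding="utf-8"))
        conn.commit()
    finally:
        conn.close()


# Demo with stand-in scripts (hypothetical content, not the real Sakila files).
workdir = Path(tempfile.mkdtemp())
schema_file = workdir / "schema.sql"
data_file = workdir / "insert-data.sql"
schema_file.write_text(
    "CREATE TABLE city (city_id INTEGER PRIMARY KEY, city TEXT);",
    encoding="utf-8",
)
data_file.write_text(
    "INSERT INTO city VALUES (1, 'Abha'); INSERT INTO city VALUES (2, 'Adana');",
    encoding="utf-8",
)

demo_db = workdir / "demo.db"
load_sql_scripts(demo_db, [schema_file, data_file])

# Check that both scripts were applied.
conn = sqlite3.connect(str(demo_db))
row_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
conn.close()
print(row_count)
```

For the real database, the same helper would be called with `datasets/sakila.db` and the two downloaded script paths, schema first.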

# Package Requirements

In [1]:
!pip install --upgrade pip

Collecting pip
  Downloading https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl (1.4MB)
Installing collected packages: pip
  Found existing installation: pip 19.2.2
    Uninstalling pip-19.2.2:
      Successfully uninstalled pip-19.2.2
Successfully installed pip-19.3.1


In [None]:
!pip install sqlalchemy ipython-sql

# Jupyter Magic 

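Magic functions ("magics") are predefined commands in the IPython kernel used by Jupyter; each one runs the command supplied to it.
There are two kinds of magics:
- line magics, prefixed with `%`, which take the rest of the line as input (for example `%sql`)
- cell magics, prefixed with `%%`, which take the whole cell as input (for example `%%sql`)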

# SQL Extension

## Load Extension

In [3]:
%load_ext sql

## Connection

In [4]:
%sql sqlite:///datasets/sakila.db

'Connected: @datasets/sakila.db'

# Basic Query

In [5]:
%sql select * from city limit 10

 * sqlite:///datasets/sakila.db
Done.


city_id,city,country_id,last_update
1,A Corua (La Corua),87,2019-10-25 18:28:28
2,Abha,82,2019-10-25 18:28:28
3,Abu Dhabi,101,2019-10-25 18:28:28
4,Acua,60,2019-10-25 18:28:28
5,Adana,97,2019-10-25 18:28:28
6,Addis Abeba,31,2019-10-25 18:28:28
7,Aden,107,2019-10-25 18:28:28
8,Adoni,44,2019-10-25 18:28:28
9,Ahmadnagar,44,2019-10-25 18:28:28
10,Akishima,50,2019-10-25 18:28:28
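
Outside a notebook, roughly the same query can be run with Python's standard `sqlite3` module. This is a sketch, not what the `%sql` magic does internally: it uses an in-memory stand-in table holding three rows copied from the output above (without the `last_update` column) instead of `datasets/sakila.db`, and it adds an `ORDER BY` so the row order is deterministic.

```python
import sqlite3

# In-memory stand-in for datasets/sakila.db (assumption: a reduced city table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city (city_id INTEGER PRIMARY KEY, city TEXT, country_id INTEGER)"
)
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(1, "A Corua (La Corua)", 87), (2, "Abha", 82), (3, "Abu Dhabi", 101)],
)

# Bind the limit as a parameter instead of formatting it into the SQL string.
cursor = conn.execute("SELECT * FROM city ORDER BY city_id LIMIT ?", (2,))
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

print(column_names)
for row in rows:
    print(row)
```

To query the real database, the connection string would be the file path, for example `sqlite3.connect("datasets/sakila.db")`.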


# Integration with Pandas

In [6]:
result = %sql select * from city;  
dataframe = result.DataFrame()

 * sqlite:///datasets/sakila.db
Done.


In [7]:
dataframe.describe()

Unnamed: 0,city_id,country_id
count,600.0,600.0
mean,300.5,56.4
std,173.349358,30.064594
min,1.0,1.0
25%,150.75,28.75
50%,300.5,50.0
75%,450.25,80.0
max,600.0,109.0


In [8]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 4 columns):
city_id        600 non-null int64
city           600 non-null object
country_id     600 non-null int64
last_update    600 non-null object
dtypes: int64(2), object(2)
memory usage: 18.8+ KB


In [9]:
dataframe.head()

Unnamed: 0,city_id,city,country_id,last_update
0,1,A Corua (La Corua),87,2019-10-25 18:28:28
1,2,Abha,82,2019-10-25 18:28:28
2,3,Abu Dhabi,101,2019-10-25 18:28:28
3,4,Acua,60,2019-10-25 18:28:28
4,5,Adana,97,2019-10-25 18:28:28


In [29]:
country = %sql select country,count(distinct(c.customer_id)) from customer c join address a on a.address_id=c.address_id join city ci on ci.city_id=a.city_id join country co on co.country_id=ci.country_id group by country order by 2 desc

 * sqlite:///datasets/sakila.db
Done.


In [30]:
dataframe = country.DataFrame()

In [31]:
dataframe.head()

Unnamed: 0,country,count(distinct(c.customer_id))
0,India,60
1,China,53
2,United States,36
3,Japan,31
4,Mexico,30
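
The customers-per-country query is easier to read when split across lines and given a column alias. The sketch below runs a reformatted version with the standard `sqlite3` module against a tiny in-memory stand-in schema; the tables are reduced to the join keys, the rows are made up for illustration, and the alias `customer_count` is an addition that the original query does not use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in schema: only the columns needed for the joins (assumption, not the full Sakila schema).
conn.executescript("""
CREATE TABLE country  (country_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE city     (city_id INTEGER PRIMARY KEY, city TEXT, country_id INTEGER);
CREATE TABLE address  (address_id INTEGER PRIMARY KEY, city_id INTEGER);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, address_id INTEGER);

-- Illustrative rows only.
INSERT INTO country  VALUES (1, 'India'), (2, 'Japan');
INSERT INTO city     VALUES (1, 'Adoni', 1), (2, 'Ahmadnagar', 1), (3, 'Akishima', 2);
INSERT INTO address  VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO customer VALUES (1, 1), (2, 2), (3, 3);
""")

query = """
SELECT co.country,
       COUNT(DISTINCT c.customer_id) AS customer_count
FROM customer AS c
JOIN address  AS a  ON a.address_id  = c.address_id
JOIN city     AS ci ON ci.city_id    = a.city_id
JOIN country  AS co ON co.country_id = ci.country_id
GROUP BY co.country
ORDER BY customer_count DESC, co.country
"""

customers_per_country = conn.execute(query).fetchall()
conn.close()
print(customers_per_country)
```

In a notebook, a multi-line query like this would normally go in a `%%sql` cell, since the `%sql` line magic reads only a single line.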


In [32]:
dataframe.to_csv(r'datasets/test.csv')