# Lesson 12: A Brief Introduction to SQL Clauses

Data scientists work with data stored in tables. This lesson introduces relations, one of the most widely used ways to represent data tables. We'll also introduce SQL, the standard programming language for working with relations.

We focus on operations for taking subsets of relations. When data scientists begin working with a relation, they often want to subset the specific data that they plan to use. For example, a data scientist can slice out the ten relevant features from a relation with hundreds of columns. Or, they can filter a relation to remove rows with incomplete data. For the rest of this lesson, we'll introduce relation operations using a small dataset of people, their homes, and their pets.

To work with relations, we'll introduce a domain-specific programming language called **SQL (Structured Query Language)**. We commonly pronounce *"SQL"* like *"sequel"* instead of spelling out the acronym. SQL is a specialized language for working with relations, so it has its own syntax that makes it easier to write programs that operate on relational data.

In [1]:
import pandas as pd
import numpy as np

In [2]:
people = pd.read_csv('data/sf-people.csv')
homes = pd.read_csv('data/sf-homes.csv')
pets = pd.read_csv('data/sf-pets.csv')

In [3]:
people.head()

Unnamed: 0,name,sex,age
0,Austin,M,33
1,Blair,M,90
2,Carolina,F,28
3,Dani,F,41
4,Donald,M,70


In [4]:
homes.head()

Unnamed: 0,owner_name,area,value
0,Austin,urban,145000
1,Blair,suburbs,95000
2,Carolina,suburbs,220000
3,Carolina,urban,190000
4,Dani,country,67000


In [5]:
pets.head()

Unnamed: 0,name,type,owner_name
0,Maru,cat,Austin
1,Icey,dog,Blair
2,Maxie,dog,Blair
3,Rex,dog,Carolina
4,Harambe,bird,Dani


## 1. What is SQLite?

- [Relational Database](https://www.ibm.com/topics/relational-databases)
- [SQLite](https://sqlite.org/index.html)
- [Self-contained](https://www.oreilly.com/library/view/using-sqlite/9781449394592/ch01s01.html)


Our database is stored in a file called `simplefolks.db`. This file is a SQLite database, so we'll create a `sqlalchemy` engine that can read this format.

In [6]:
import sqlalchemy
db = sqlalchemy.create_engine('sqlite:///data/simplefolks.db')

Let's inspect the tables.

In [7]:
insp = sqlalchemy.inspect(db)
insp.get_table_names()

['homes', 'people', 'pets']
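
The same inspection can be done in SQL itself. The sketch below is a hedged illustration, not the lesson's own setup: it uses Python's built-in `sqlite3` module and an in-memory database with empty `homes`, `people`, and `pets` tables as a stand-in for `simplefolks.db`. It reads table names from SQLite's `sqlite_master` catalog and column details from `PRAGMA table_info`.

```python
import sqlite3

# Assumption: an in-memory database stands in for data/simplefolks.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL, sex TEXT NOT NULL, age INTEGER NOT NULL)")
conn.execute("CREATE TABLE homes (owner_name TEXT NOT NULL, area TEXT NOT NULL, value INTEGER NOT NULL)")
conn.execute("CREATE TABLE pets (name TEXT NOT NULL, type TEXT NOT NULL, owner_name TEXT NOT NULL)")

# sqlite_master is SQLite's built-in catalog of schema objects.
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
info = conn.execute("PRAGMA table_info(people)").fetchall()
people_columns = [(row[1], row[2]) for row in info]

conn.close()

print(table_names)     # ['homes', 'people', 'pets']
print(people_columns)  # [('name', 'TEXT'), ('sex', 'TEXT'), ('age', 'INTEGER')]
```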

In [8]:
insp.get_columns('people')

[{'name': 'name',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'primary_key': 0},
 {'name': 'sex',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'primary_key': 0},
 {'name': 'age',
  'type': INTEGER(),
  'nullable': False,
  'default': None,
  'primary_key': 0}]

A relation has rows and columns, and every column has a label. For example, here are the first three rows of the `people` relation:

|**name**|**sex**|**age**|
|--------|-------|-------|
|Austin|M|33|
|Blair|M|90|
|Carolina|F|28|

Unlike dataframes, however, individual rows in a relation don't have labels. Also, unlike dataframes, rows of a relation aren't ordered.
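
To make the point about row order concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The in-memory table and its three rows (copied from the excerpt above) are assumptions for illustration. Rows come back as plain tuples with no index labels, and their order is only guaranteed when the query includes an `ORDER BY` clause, which is covered later in this lesson.

```python
import sqlite3

# Assumption: a three-row in-memory copy of the people relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, sex TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Austin", "M", 33), ("Blair", "M", 90), ("Carolina", "F", 28)],
)

# Without ORDER BY, SQL makes no promise about row order,
# so we treat the result as a set rather than a sequence.
unordered = set(conn.execute("SELECT name FROM people").fetchall())

# With ORDER BY, the order is part of the result.
by_age = conn.execute("SELECT name, age FROM people ORDER BY age").fetchall()

conn.close()

print(by_age)  # [('Carolina', 28), ('Austin', 33), ('Blair', 90)]
```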

### SELECT

In [10]:
query = ''' 
SELECT name, age, sex
FROM people
WHERE age > 28
LIMIT 5;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,age,sex
0,Austin,33,M
1,Blair,90,M
2,Dani,41,F
3,Donald,70,M
4,Eliza,37,F


In [17]:
query = ''' 
SELECT name
FROM people;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name
0,Austin
1,Blair
2,Carolina
3,Dani
4,Donald
5,Eliza
6,Farida
7,Georgina
8,Hillary
9,Leland


### WHERE

In [18]:
query = ''' 
SELECT name 
FROM people
WHERE sex = 'M';
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name
0,Austin
1,Blair
2,Donald
3,Leland
4,Liam
5,Michael
6,Zed


In [20]:
query = ''' 
SELECT name, age
FROM people
WHERE sex = 'M'
LIMIT 5;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,age
0,Austin,33
1,Blair,90
2,Donald,70
3,Leland,16
4,Liam,22


In [22]:
query = ''' 
SELECT name, age, sex
FROM people
WHERE age >= 48 AND sex = 'M';
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,age,sex
0,Blair,90,M
1,Donald,70,M
2,Michael,48,M


### ORDER BY

In [4]:
query = ''' 
SELECT name, age, sex
FROM people
WHERE age >= 48 AND sex = 'M'
ORDER BY age;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,age,sex
0,Michael,48,M
1,Donald,70,M
2,Blair,90,M


In [6]:
query = ''' 
SELECT name, age, sex
FROM people
WHERE age >= 48 AND sex = 'M'
ORDER BY age DESC;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,age,sex
0,Blair,90,M
1,Donald,70,M
2,Michael,48,M
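
`ORDER BY` is often paired with `LIMIT` to pick out the top few rows. The following sketch is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module and an in-memory table holding only the ten `people` rows displayed earlier in this lesson, so results on the full `simplefolks.db` file could differ.

```python
import sqlite3

# Assumption: only the ten people rows shown in this lesson.
people_rows = [
    ("Austin", "M", 33), ("Blair", "M", 90), ("Carolina", "F", 28),
    ("Dani", "F", 41), ("Donald", "M", 70), ("Eliza", "F", 37),
    ("Farida", "F", 23), ("Georgina", "F", 19), ("Hillary", "F", 68),
    ("Leland", "M", 16),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, sex TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", people_rows)

# Sort oldest first, then keep only the first three rows.
oldest_three = conn.execute(
    "SELECT name, age FROM people ORDER BY age DESC LIMIT 3"
).fetchall()

conn.close()

print(oldest_three)  # [('Blair', 90), ('Donald', 70), ('Hillary', 68)]
```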


### Aggregation

In [7]:
query = ''' 
SELECT * 
FROM homes;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,owner_name,area,value
0,Austin,urban,145000
1,Blair,suburbs,95000
2,Carolina,suburbs,220000
3,Carolina,urban,190000
4,Dani,country,67000
5,Donald,urban,450000
6,Donald,urban,260000
7,Donald,urban,660000
8,Eliza,urban,210000
9,Farida,suburbs,180000


In [9]:
query = ''' 
SELECT SUM(value) as total_value
FROM homes;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,total_value
0,4247000
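
`SUM` is one of several aggregate functions. As a hedged sketch, the block below runs `COUNT`, `MIN`, `MAX`, and `AVG` with Python's built-in `sqlite3` module on an in-memory table. The table contains only the ten `homes` rows displayed above, which is an assumption for illustration. The full table evidently has more rows, since these ten values sum to less than the total shown above.

```python
import sqlite3

# Assumption: only the ten homes rows shown in this lesson.
home_rows = [
    ("Austin", "urban", 145000), ("Blair", "suburbs", 95000),
    ("Carolina", "suburbs", 220000), ("Carolina", "urban", 190000),
    ("Dani", "country", 67000), ("Donald", "urban", 450000),
    ("Donald", "urban", 260000), ("Donald", "urban", 660000),
    ("Eliza", "urban", 210000), ("Farida", "suburbs", 180000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (owner_name TEXT, area TEXT, value INTEGER)")
conn.executemany("INSERT INTO homes VALUES (?, ?, ?)", home_rows)

# Each aggregate collapses the whole table into a single value.
n_homes, min_value, max_value, avg_value = conn.execute(
    "SELECT COUNT(*), MIN(value), MAX(value), AVG(value) FROM homes"
).fetchone()

conn.close()

print(n_homes, min_value, max_value, avg_value)  # 10 67000 660000 247700.0
```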


### GROUP BY

In [13]:
query = ''' 
SELECT area as location, SUM(value) as total_value
FROM homes
GROUP BY area;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,location,total_value
0,country,830000
1,suburbs,815000
2,urban,2602000


In [14]:
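# No GROUP BY here: WHERE keeps only Donald's rows 
# and SUM collapses them into a single row. 
# SQLite fills the bare `area` column from an arbitrary 
# matching row; many other databases reject this query.
query = ''' 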
SELECT owner_name, area, SUM(value)
FROM homes
WHERE owner_name = 'Donald';
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,owner_name,area,SUM(value)
0,Donald,urban,1370000


In [16]:
query = ''' 
SELECT owner_name, area, SUM(value) as total_value
FROM homes
WHERE owner_name = 'Donald';
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,owner_name,area,total_value
0,Donald,urban,1370000
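
The two queries above compute a total for a single owner by filtering with `WHERE`. Grouping by `owner_name` instead produces one total per owner in a single query. This is a minimal sketch, assuming an in-memory table (built with Python's `sqlite3` module) that holds only the ten `homes` rows displayed earlier, so totals on the full database may differ for owners with additional homes.

```python
import sqlite3

# Assumption: only the ten homes rows shown in this lesson.
home_rows = [
    ("Austin", "urban", 145000), ("Blair", "suburbs", 95000),
    ("Carolina", "suburbs", 220000), ("Carolina", "urban", 190000),
    ("Dani", "country", 67000), ("Donald", "urban", 450000),
    ("Donald", "urban", 260000), ("Donald", "urban", 660000),
    ("Eliza", "urban", 210000), ("Farida", "suburbs", 180000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (owner_name TEXT, area TEXT, value INTEGER)")
conn.executemany("INSERT INTO homes VALUES (?, ?, ?)", home_rows)

# One output row per distinct owner_name; SUM runs within each group.
totals = conn.execute(
    """
    SELECT owner_name, SUM(value) AS total_value
    FROM homes
    GROUP BY owner_name
    ORDER BY owner_name
    """
).fetchall()

conn.close()

for owner, total in totals:
    print(owner, total)
```

Every selected column here is either the grouping column or an aggregate, so the query does not rely on SQLite's handling of bare columns.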


### JOIN

In [19]:
insp.get_columns('people')

[{'name': 'name',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'autoincrement': 'auto',
  'primary_key': 0},
 {'name': 'sex',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'autoincrement': 'auto',
  'primary_key': 0},
 {'name': 'age',
  'type': INTEGER(),
  'nullable': False,
  'default': None,
  'autoincrement': 'auto',
  'primary_key': 0}]

In [20]:
insp.get_columns('pets')

[{'name': 'name',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'autoincrement': 'auto',
  'primary_key': 0},
 {'name': 'type',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'autoincrement': 'auto',
  'primary_key': 0},
 {'name': 'owner_name',
  'type': TEXT(),
  'nullable': False,
  'default': None,
  'autoincrement': 'auto',
  'primary_key': 0}]

In [25]:
query = ''' 
SELECT *
FROM people;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,sex,age
0,Austin,M,33
1,Blair,M,90
2,Carolina,F,28
3,Dani,F,41
4,Donald,M,70
5,Eliza,F,37
6,Farida,F,23
7,Georgina,F,19
8,Hillary,F,68
9,Leland,M,16


In [24]:
query = ''' 
SELECT *
FROM pets;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,type,owner_name
0,Maru,cat,Austin
1,Icey,dog,Blair
2,Maxie,dog,Blair
3,Rex,dog,Carolina
4,Harambe,bird,Dani
5,Syd,dog,Dani
6,Artemis,cat,Dani
7,Mr. Muggles,cat,Donald
8,Meowser,cat,Donald
9,Donald,cat,Donald


In [23]:
# JOIN is an inner join by default. 
# A row appears in the result only if it 
# has a match in both tables.
query = ''' 
SELECT *
FROM people JOIN pets
    ON people.name = pets.owner_name;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,sex,age,name.1,type,owner_name
0,Austin,M,33,Maru,cat,Austin
1,Blair,M,90,Icey,dog,Blair
2,Blair,M,90,Maxie,dog,Blair
3,Carolina,F,28,Rex,dog,Carolina
4,Dani,F,41,Artemis,cat,Dani
5,Dani,F,41,Harambe,bird,Dani
6,Dani,F,41,Syd,dog,Dani
7,Donald,M,70,Donald,cat,Donald
8,Donald,M,70,Meowser,cat,Donald
9,Donald,M,70,Mr. Muggles,cat,Donald


In [28]:
# A left join keeps every row 
# from the left table. 
# If a row has no match, the columns 
# from the right table are NULL (None in pandas).
query = ''' 
SELECT *
FROM people LEFT JOIN pets
    ON people.name = pets.owner_name;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=db.connect())

Unnamed: 0,name,sex,age,name.1,type,owner_name
0,Austin,M,33,Maru,cat,Austin
1,Blair,M,90,Icey,dog,Blair
2,Blair,M,90,Maxie,dog,Blair
3,Carolina,F,28,Rex,dog,Carolina
4,Dani,F,41,Artemis,cat,Dani
5,Dani,F,41,Harambe,bird,Dani
6,Dani,F,41,Syd,dog,Dani
7,Donald,M,70,Donald,cat,Donald
8,Donald,M,70,Meowser,cat,Donald
9,Donald,M,70,Mr. Muggles,cat,Donald
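
The first ten rows of the two join results above look identical, so the difference is easier to see on a tiny example. The sketch below uses Python's built-in `sqlite3` module with hypothetical in-memory tables: three people and three pets, where Georgina is assumed to own no pets (this may not hold in `simplefolks.db`). The column aliases `person` and `pet` are also introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, sex TEXT, age INTEGER)")
conn.execute("CREATE TABLE pets (name TEXT, type TEXT, owner_name TEXT)")

# Assumption: Georgina has no rows in pets in this toy data.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Austin", "M", 33), ("Blair", "M", 90), ("Georgina", "F", 19)],
)
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [("Maru", "cat", "Austin"), ("Icey", "dog", "Blair"), ("Maxie", "dog", "Blair")],
)

# Template query; {join} is filled with JOIN or LEFT JOIN below.
template = """
SELECT people.name AS person, pets.name AS pet
FROM people {join} pets
    ON people.name = pets.owner_name
ORDER BY person, pet
"""

inner_rows = conn.execute(template.format(join="JOIN")).fetchall()
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

conn.close()

print(inner_rows)  # Georgina is dropped: she has no matching pet
print(left_rows)   # Georgina is kept, with None for the pet column
```

Aliasing the two `name` columns also avoids the ambiguous `name` and `name.1` labels that `SELECT *` produces in the outputs above.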


![northwinds.png](images/northwinds.png)

In [11]:
nw = sqlalchemy.create_engine('sqlite:///data/northwind.db')

Let's inspect the tables.

In [12]:
insp = sqlalchemy.inspect(nw)
insp.get_table_names()

['Categories',
 'CustomerCustomerDemo',
 'CustomerDemographics',
 'Customers',
 'EmployeeTerritories',
 'Employees',
 'Order Details',
 'Orders',
 'Products',
 'Regions',
 'Shippers',
 'Suppliers',
 'Territories']

In [13]:
insp.get_columns('Orders')

[{'name': 'OrderID',
  'type': INTEGER(),
  'nullable': False,
  'default': None,
  'primary_key': 1},
 {'name': 'CustomerID',
  'type': TEXT(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'EmployeeID',
  'type': INTEGER(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'OrderDate',
  'type': DATETIME(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'RequiredDate',
  'type': DATETIME(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'ShippedDate',
  'type': DATETIME(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'ShipVia',
  'type': INTEGER(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'Freight',
  'type': NUMERIC(),
  'nullable': True,
  'default': '0',
  'primary_key': 0},
 {'name': 'ShipName',
  'type': TEXT(),
  'nullable': True,
  'default': None,
  'primary_key': 0},
 {'name': 'ShipAddress',
  'type': TEXT(),
  'nullable': True,
  'd

In [16]:
query = ''' 
SELECT *
FROM Employees JOIN Orders
    ON Employees.EmployeeID=Orders.EmployeeID
WHERE Employees.EmployeeID = 1;
'''

pd.read_sql(sql=sqlalchemy.text(query), con=nw.connect())

Unnamed: 0,EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,...,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry
0,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2016-08-14,2016-07-23,1,140.51,Ernst Handel,Kirchgasse 6,Graz,Western Europe,8010,Austria
1,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2016-08-29,2016-08-02,1,136.54,Wartian Herkku,Torikatu 38,Oulu,Scandinavia,90110,Finland
2,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2016-09-04,2016-08-09,1,26.93,Magazzini Alimentari Riuniti,Via Ludovico il Moro 22,Bergamo,Southern Europe,24100,Italy
3,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2016-09-17,2016-08-26,2,76.83,QUICK-Stop,Taucherstraße 10,Cunewalde,Western Europe,1307,Germany
4,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2016-09-25,2016-09-02,2,1.35,Tradiçao Hipermercados,"Av. Inês de Castro, 414",Sao Paulo,South America,05634-030,Brazil
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
118,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2018-05-29,2018-05-04,1,30.09,Save-a-lot Markets,187 Suffolk Ln.,Boise,North America,83720,USA
119,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2018-05-18,2018-05-06,2,7.98,Drachenblut Delikatessen,Walserweg 21,Aachen,Western Europe,52066,Germany
120,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2018-06-01,2018-05-06,2,15.67,Tortuga Restaurante,Avda. Azteca 123,México D.F.,Central America,5033,Mexico
121,1,Davolio,Nancy,Sales Representative,Ms.,1968-12-08,2012-05-01,507 - 20th Ave. E.Apt. 2A,Seattle,North America,...,2018-06-02,,1,0.93,LILA-Supermercado,Carrera 52 con Ave. Bolívar #65-98 Llano Largo,Barquisimeto,South America,3508,Venezuela
