# SQL with Python

Outline:
 - Introduction to SQL with Python
   . Importing Modules
   . Connecting to a Database
   . Viewing Tables
 - Selecting Data
 - Filtering Data
 - Ordering Data
 - Aggregate Functions (Counting and Summing)
 - Grouping and Ordering Data
 - Advanced Queries
 - Creating and Manipulating Databases

We'll use the [SQLAlchemy](https://www.sqlalchemy.org) library in this notebook. There are many types of databases; SQLite and MySQL are used throughout this study.

## Introduction to SQL with Python

### Importing Modules

In [3]:
# Import necessary modules
from sqlalchemy import create_engine  # Connecting to database
from sqlalchemy import Table # Reflecting & viewing data
from sqlalchemy import MetaData # Reflecting & viewing data
from sqlalchemy import select # Selecting data
from sqlalchemy import and_ # Filtering data
from sqlalchemy import func # Aggregate functions
from sqlalchemy import desc # Ordering data in descending order


### Connecting to a Database

First, we need to connect to our database. An **engine** is an interface to a database. We'll create an engine that connects to a local SQLite file named "census.sqlite".

In [4]:
# Connect to a SQLite Database

# Create an engine that connects to 'census.sqlite'. 
engine = create_engine("sqlite:///census.sqlite")

# Connect
connection = engine.connect()

In [5]:
# Connect to a MySQL Database

#
# CREDENTIALS
#

# Create an engine that connects to the census database
# engine = create_engine('mysql+pymysql://' + 
#                        username + ':' + password + '@' +
#                        host + ':' + port +
#                        '/' + database)

# engine = create_engine('mysql://scott:tiger@localhost/test')

In [6]:
# Connect to a PostgreSQL Database

#
# CREDENTIALS
#

# Create an engine that connects to the census database
# engine = create_engine('postgresql+psycopg2://' + 
#                        username + ':' + password + '@' +
#                        host + ':' + port +
#                        '/' + database)
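The commented-out templates above build the connection URL by string concatenation. A minimal sketch of the same idea follows, assuming placeholder credentials and a hypothetical helper named `build_database_url` (neither comes from the original notebook). It percent-encodes the username and password so that characters such as `@` or `/` are not mistaken for URL delimiters, and it accepts the port as an integer.

```python
from urllib.parse import quote_plus


def build_database_url(driver, username, password, host, port, database):
    """Assemble a URL of the form driver://user:password@host:port/database."""
    # Percent-encode the credentials so that '@', '/' and similar characters
    # cannot be confused with the separators of the URL itself.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"{driver}://{user}:{secret}@{host}:{port}/{database}"


# Placeholder credentials, for illustration only
mysql_url = build_database_url(
    "mysql+pymysql", "scott", "p@ss/word", "localhost", 3306, "census"
)
print(mysql_url)
# mysql+pymysql://scott:p%40ss%2Fword@localhost:3306/census
```

The resulting string can be passed to `create_engine()` in place of the concatenated expression.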

### Viewing Tables

Table names can be viewed by using the `.table_names()` method on the _engine_.

In [7]:
# Table names
engine.table_names()

['census', 'state_fact']
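`engine.table_names()` belongs to the SQLAlchemy 1.x API; it was deprecated in 1.4 and removed in 2.0. A hedged sketch of the inspector-based alternative is shown here. It assumes an in-memory SQLite database as a stand-in for `census.sqlite`, and the single `name` column of `state_fact` is a placeholder, not the real schema.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")

# Create two empty tables with the same names as in the output above
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
        "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"
    ))
    # 'name' is a placeholder column, not the real state_fact schema
    conn.execute(text("CREATE TABLE state_fact (name VARCHAR(30))"))

# Ask the inspector for the table names
table_names = inspect(engine).get_table_names()
print(table_names)
```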

Reflecting a table allows us to work with it in Python. After reading the database, we'll need to build the metadata.

> MetaData is a container object that keeps together many different features of a database (or multiple databases) being described. [Source](https://docs.sqlalchemy.org/en/latest/core/metadata.html#sqlalchemy.schema.MetaData)

In [8]:
# from sqlalchemy import MetaData
metadata = MetaData()

We'll now reflect the _census_ table by using the `Table` object. `metadata` is one of the arguments; `autoload=True` together with `autoload_with=engine` loads the column definitions from the database through the engine.

In [9]:
# Reflect census table from the engine
census = Table('census', metadata, autoload=True, autoload_with=engine)

# Print census table metadata
print(repr(census))

Table('census', MetaData(bind=None), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)


In [10]:
# Column names
census.columns.keys()

['state', 'sex', 'age', 'pop2000', 'pop2008']
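The reflection step can also be tried without the census file. The sketch below assumes an in-memory SQLite database whose `census` table has the same five columns as listed above. It passes only `autoload_with=engine`, which is enough to trigger reflection and is the form SQLAlchemy 2.0 expects, since `autoload=True` was removed there.

```python
from sqlalchemy import MetaData, Table, create_engine, text

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
        "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"
    ))

# Reflect the table: the column definitions are read from the database
metadata = MetaData()
census = Table("census", metadata, autoload_with=engine)

column_names = census.columns.keys()
print(column_names)
# ['state', 'sex', 'age', 'pop2000', 'pop2008']
```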

In [11]:
# Table metadata
repr(metadata.tables['census'])

"Table('census', MetaData(bind=None), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)"

## Selecting Data

We can run raw SQL commands by passing them to the connection's `.execute()` method.

In [12]:
# Build select statement for census table to select all records
stmt = 'SELECT * FROM census'

# Execute the statement (Result Proxy)
results_proxy = connection.execute(stmt)

# Fetch the results (Result Set)
results = results_proxy.fetchall()

# Print the first five results
results[:5]

[('Illinois', 'M', 0, 89600, 95012),
 ('Illinois', 'M', 1, 88445, 91829),
 ('Illinois', 'M', 2, 88729, 89547),
 ('Illinois', 'M', 3, 88868, 90037),
 ('Illinois', 'M', 4, 91947, 91111)]
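Passing a plain string to `.execute()` is the SQLAlchemy 1.x style; version 2.0 no longer accepts it and expects the string to be wrapped in `text()`. The sketch below shows that form together with a bound parameter (`:state`), which keeps user-supplied values out of the SQL string. It assumes an in-memory SQLite database filled with two rows copied from the outputs in this notebook.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
        "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"
    ))
    # Two rows copied from the outputs shown in this notebook
    conn.execute(
        text("INSERT INTO census VALUES (:state, :sex, :age, :pop2000, :pop2008)"),
        [
            {"state": "Illinois", "sex": "M", "age": 0, "pop2000": 89600, "pop2008": 95012},
            {"state": "New York", "sex": "M", "age": 0, "pop2000": 126237, "pop2008": 128088},
        ],
    )

# Raw SQL wrapped in text(), with the state supplied as a bound parameter
stmt = text("SELECT * FROM census WHERE state = :state")
with engine.connect() as connection:
    rows = connection.execute(stmt, {"state": "New York"}).fetchall()

print(rows)
# [('New York', 'M', 0, 126237, 128088)]
```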

SQLAlchemy helps us query the data efficiently in a Pythonic way. For instance, instead of writing a raw query, which may differ between database types, we'll use SQLAlchemy's `select()` function to get all the rows from the census table.

In [13]:
# from sqlalchemy import select

# Select query: Build select statement for census table
stmt = select([census])

# Print the statement
print(stmt)

SELECT census.state, census.sex, census.age, census.pop2000, census.pop2008 
FROM census


The output above shows that `select([census])` produces the same query as `SELECT * FROM census`, but in a database-independent manner.

In [14]:
# Execute the statement and print the first five results
connection.execute(stmt).fetchall()[:5]

[('Illinois', 'M', 0, 89600, 95012),
 ('Illinois', 'M', 1, 88445, 91829),
 ('Illinois', 'M', 2, 88729, 89547),
 ('Illinois', 'M', 3, 88868, 90037),
 ('Illinois', 'M', 4, 91947, 91111)]
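Slicing `fetchall()[:5]` first fetches every row and then discards most of them. When only a few rows are needed, `.limit()` lets the database do the trimming. The sketch below is an illustration under assumptions: it uses an in-memory SQLite table filled with three rows copied from the output above, and the `select(census)` calling style of SQLAlchemy 1.4 and later rather than the list form `select([census])` used in this notebook.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)
metadata.create_all(engine)

# Three rows copied from the output shown above
sample_rows = [
    {"state": "Illinois", "sex": "M", "age": 0, "pop2000": 89600, "pop2008": 95012},
    {"state": "Illinois", "sex": "M", "age": 1, "pop2000": 88445, "pop2008": 91829},
    {"state": "Illinois", "sex": "M", "age": 2, "pop2000": 88729, "pop2008": 89547},
]
with engine.begin() as conn:
    conn.execute(census.insert(), sample_rows)

# LIMIT is applied by the database, so only two rows are transferred
stmt = select(census).order_by(census.columns.age).limit(2)
with engine.connect() as connection:
    first_rows = connection.execute(stmt).fetchall()

print(first_rows)
# [('Illinois', 'M', 0, 89600, 95012), ('Illinois', 'M', 1, 88445, 91829)]
```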

In [15]:
# Execute the statement and print the first item in the first row
connection.execute(stmt).fetchall()[0][0]

'Illinois'

## Filtering Data

### `where` with Comparison Operators

`where()` is used to filter the data, along with comparison operators (`==`, `<=`, `<`, `!=`, etc.).

In [16]:
# Select query
stmt = select([census])

# Where clause to filter the results to only those for New York
stmt = stmt.where(census.columns.state == 'New York')

# Execute the query
results = connection.execute(stmt).fetchall()

# Print column names once more
print(census.columns.keys())

results[:5]

['state', 'sex', 'age', 'pop2000', 'pop2008']


[('New York', 'M', 0, 126237, 128088),
 ('New York', 'M', 1, 124008, 125649),
 ('New York', 'M', 2, 124725, 121615),
 ('New York', 'M', 3, 126697, 120580),
 ('New York', 'M', 4, 131357, 122482)]

It is possible to loop over the results and print only the specified columns.

In [17]:
# Loop over the results and print the age, sex, and pop2000
for result in results[:5]:
    print(result.age, result.sex, result.pop2000)

0 M 126237
1 M 124008
2 M 124725
3 M 126697
4 M 131357


### `where` with Column Element and Expressions

Constructs such as `all_()`, `not_()`, and `in_()` are used for further filtering. Additionally, conjunctions such as `and_()` and `or_()` can be used to combine multiple criteria in a `where` clause. More on [Column Elements and Expressions](https://docs.sqlalchemy.org/en/latest/core/sqlelement.html#module-sqlalchemy.sql.expression).

In [18]:
# Create a list of states
states = ['New York', 'California']

# Select query
stmt = select([census])

# Append a where clause to match all the states in_ the list states
stmt = stmt.where(census.columns.state.in_(states))


# Loop over the ResultProxy and print the state and its population in 2000
for i, result in enumerate(connection.execute(stmt)):
    if i == 5: # Print only the first five results
        break
    print(result.state, result.pop2000)

New York 126237
New York 124008
New York 124725
New York 126697
New York 131357


In [19]:
# from sqlalchemy import and_

# Select query
stmt = select([census])

# Append a where clause
stmt = stmt.where(and_(census.columns.state == 'California', census.columns.age == 34))

# Execute the query
result = connection.execute(stmt).first()

# Print results
result

('California', 'M', 34, 269607, 257167)

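Note that the `and_()` and `or_()` conjunctions are also available as the Python `&` and `|` operators, respectively. Each comparison must be wrapped in parentheses, because `&` and `|` bind more tightly than `==`.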

In [22]:
# Select query
stmt = select([census])

# Append a where clause
stmt = stmt.where((census.columns.state == 'California') &
                  (census.columns.age == 34))

# Execute the query
result = connection.execute(stmt).first()

# Print results
result

('California', 'M', 34, 269607, 257167)
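The cells above exercise `and_()` and `&`. For completeness, here is a hedged sketch of `or_()` and its `|` counterpart. It assumes an in-memory SQLite table filled with three rows copied from the outputs in this notebook, and it uses the `select(census)` calling style of SQLAlchemy 1.4 and later instead of `select([census])`.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, or_, select

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)
metadata.create_all(engine)

# Three rows copied from the outputs shown in this notebook
sample_rows = [
    {"state": "Illinois", "sex": "M", "age": 0, "pop2000": 89600, "pop2008": 95012},
    {"state": "New York", "sex": "M", "age": 0, "pop2000": 126237, "pop2008": 128088},
    {"state": "California", "sex": "M", "age": 34, "pop2000": 269607, "pop2008": 257167},
]
with engine.begin() as conn:
    conn.execute(census.insert(), sample_rows)

is_california = census.columns.state == "California"
is_new_york = census.columns.state == "New York"

# The same filter written with or_() and with the | operator
stmt_or = select(census).where(or_(is_california, is_new_york)).order_by(census.columns.state)
stmt_pipe = select(census).where(is_california | is_new_york).order_by(census.columns.state)

with engine.connect() as connection:
    rows_or = connection.execute(stmt_or).fetchall()
    rows_pipe = connection.execute(stmt_pipe).fetchall()

print([row.state for row in rows_or])
# ['California', 'New York']
```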

------

## Ordering Data

Ordering is done with the `.order_by()` method. The default ordering is ascending; to sort in descending order, we wrap the column in another function, `desc()`.

In [26]:
"""
Ascending Order
"""
# SELECT state FROM census
stmt = select([census.columns.state])

# Order stmt by the state column
stmt = stmt.order_by(census.columns.state)

# Execute the query
results = connection.execute(stmt).fetchall()

# Print the first 5 results
print(results[:5])

[('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',), ('Alabama',)]


In [25]:
"""
Descending Order
"""

# from sqlalchemy import desc

# SELECT state FROM census
stmt = select([census.columns.state])

# Order stmt by the state column in DESCENDING order
stmt = stmt.order_by(desc(census.columns.state))

# Execute the query
results = connection.execute(stmt).fetchall()

# Print the first 5 results
print(results[:5])

[('Wyoming',), ('Wyoming',), ('Wyoming',), ('Wyoming',), ('Wyoming',)]


In [27]:
"""
Ordering by Multiple Columns
"""

# SELECT state, age FROM census
stmt = select([census.columns.state, census.columns.age])

# Append order by to ascend by state and descend by age
stmt = stmt.order_by(census.columns.state, desc(census.columns.age))

# Execute the statement
results = connection.execute(stmt).fetchall()

# Print the first 5 results
print(results[:5])

[('Alabama', 85), ('Alabama', 85), ('Alabama', 84), ('Alabama', 84), ('Alabama', 83)]


## Aggregate Functions

### Counting

In [29]:
# from sqlalchemy import func

# Count the distinct state values
stmt = select([func.count(census.columns.state.distinct())])

# Execute the query and store the scalar result
distinct_state_count = connection.execute(stmt).scalar()

# Print the distinct_state_count
distinct_state_count

51

### Summing

In [33]:
# Calculate the sum of the population values
stmt = select([func.sum(census.columns.pop2008)])

# Execute the query 
results = connection.execute(stmt).scalar()

# Print the results
results

302876613

## Grouping Data

In [39]:
# Select the state and count of ages by state
stmt = select([census.columns.state, func.count(census.columns.age)])

# Group stmt by state
stmt = stmt.group_by(census.columns.state)

# Execute the statement
results = connection.execute(stmt).fetchall()

# Print keys/column names and the results
results[0].keys(), results[:5]

(['state', 'count_1'],
 [('Alabama', 172),
  ('Alaska', 172),
  ('Arizona', 172),
  ('Arkansas', 172),
  ('California', 172)])

In [41]:
'''
Labelled key values with the same result
'''
# Select the state and count of ages by state
stmt = select([census.columns.state, func.count(census.columns.age).label('population')])

# Group stmt by state
stmt = stmt.group_by(census.columns.state)

# Execute the statement
results = connection.execute(stmt).fetchall()

# Print keys/column names and the results
results[0].keys(), results[:5]

(['state', 'population'],
 [('Alabama', 172),
  ('Alaska', 172),
  ('Arizona', 172),
  ('Arkansas', 172),
  ('California', 172)])
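Grouping, aggregation and ordering can be combined in one statement. The following sketch sums `pop2008` per state and sorts the groups by that sum in descending order. It is an illustration under assumptions: an in-memory SQLite table holding five rows copied from the outputs in this notebook (so the totals are sums over this sample only, not real state populations), and the `select(...)` calling style of SQLAlchemy 1.4 and later.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, desc, func, select)

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)
metadata.create_all(engine)

# Five rows copied from the outputs shown in this notebook
sample_rows = [
    {"state": "Illinois", "sex": "M", "age": 0, "pop2000": 89600, "pop2008": 95012},
    {"state": "Illinois", "sex": "M", "age": 1, "pop2000": 88445, "pop2008": 91829},
    {"state": "New York", "sex": "M", "age": 0, "pop2000": 126237, "pop2008": 128088},
    {"state": "New York", "sex": "M", "age": 1, "pop2000": 124008, "pop2008": 125649},
    {"state": "California", "sex": "M", "age": 34, "pop2000": 269607, "pop2008": 257167},
]
with engine.begin() as conn:
    conn.execute(census.insert(), sample_rows)

# Sum pop2008 per state and give the aggregate column a readable label
pop2008_sum = func.sum(census.columns.pop2008).label("population")

stmt = (
    select(census.columns.state, pop2008_sum)
    .group_by(census.columns.state)
    .order_by(desc(pop2008_sum))
)

with engine.connect() as connection:
    totals = connection.execute(stmt).fetchall()

print(totals)
# [('California', 257167), ('New York', 253737), ('Illinois', 186841)]
```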

-----

In [27]:
connection.close()
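Instead of calling `connection.close()` by hand, the connection can be opened in a `with` block, which closes it automatically even if a query raises an error. A minimal sketch, assuming an in-memory SQLite database:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database standing in for census.sqlite (assumption)
engine = create_engine("sqlite://")

# The connection is closed automatically when the block exits
with engine.connect() as connection:
    value = connection.execute(text("SELECT 1")).scalar()

print(value, connection.closed)
# 1 True
```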