# Introduction to Relational Databases in Python
- Learn the basics of using SQL with Python
- SQLAlchemy, a Python SQL toolkit, provides an accessible and intuitive way to query, build, and write to SQLite, MySQL, and PostgreSQL databases (among many others)

Outline:
1. Basics of Relational Databases
2. Applying Filtering, Ordering, and Grouping to Queries
3. Advanced SQLAlchemy Queries
4. Creating and Manipulating your own Databases
5. Project: Putting it all together

# 1. Basics of Relational Databases

- relational databases are made up of tables
- Tables consist of columns and rows
- Tables can be related (thus "Relational" databases)


SQLAlchemy
- 2 main pieces
    1. Core (Relational Model focused)
    2. ORM (User Data Model focused)

SQLAlchemy can connect to many types of databases:
- SQLite
- PostgreSQL
- MySQL
- MS SQL
- Oracle
- Many more

Connecting to a database
- Engine - common interface to the db from SQLAlchemy
- Connection string: all the details required to find the database (and login if necessary)

Note on connection strings
- example: 'sqlite:///census_nyc.sqlite'
    - Dialect+Driver: sqlite:/// (only the dialect is given here, so the default driver is used)
    - Filename: census_nyc.sqlite
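
The parts of a connection string can be inspected without opening a connection. The sketch below is a minimal illustration that uses `make_url` from `sqlalchemy.engine.url`, which these notes do not otherwise use; it only parses the string and does not check that the file exists.

```python
from sqlalchemy.engine.url import make_url

# Parse the example connection string; no database connection is opened
url = make_url("sqlite:///census_nyc.sqlite")

# The dialect name (a "+driver" suffix would appear here if one were given)
print(url.drivername)

# The path to the SQLite file
print(url.database)
```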
    

## 1.1 Connecting to a DB - sqlalchemy

In [None]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///census_nyc.sqlite')
connection = engine.connect()


## 1.2 What's in the database?
- before querying your db, check what's in it
    - what the tables are

In [None]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///census_nyc.sqlite')

# check table
print(engine.table_names())
# ['census', 'state_fact']
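
`engine.table_names()` was deprecated in SQLAlchemy 1.4 and is no longer available in 2.0. A minimal sketch of the inspector-based alternative follows, assuming SQLAlchemy 1.4 or later. It uses an in-memory SQLite database with two empty, illustrative tables rather than the real census file.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect

# In-memory SQLite database with two empty tables (illustrative only)
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
Table("census", metadata, Column("state", String(30)), Column("age", Integer))
Table("state_fact", metadata, Column("name", String(30)))
metadata.create_all(engine)

# The inspector lists the tables that exist in the database
table_names = inspect(engine).get_table_names()
print(table_names)
```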

## 1.3 Reflection
- access the table
- Reflection reads db and builds SQLAlchemy Table objects

In [None]:
from sqlalchemy import MetaData, Table
metadata = MetaData()
census = Table('census', metadata, autoload=True, 
               autoload_with=engine)

# view column names and data types
print(repr(census))
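
On SQLAlchemy 1.4 and later, `autoload_with=engine` is enough on its own and `autoload=True` is deprecated. The sketch below assumes that version range and reflects a small table created with plain DDL in an in-memory database, since the real census file is not available here.

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite:///:memory:")

# Create a table with plain SQL so there is something to reflect
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
        "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"
    ))

# Reflection: SQLAlchemy reads the column definitions from the database
metadata = MetaData()
census = Table("census", metadata, autoload_with=engine)

column_names = census.columns.keys()
print(column_names)
print(repr(census))
```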

## 1.4 Examples

### 1.4.1 Example: Engines and Connection Strings
Alright, it's time to create your first engine! An engine is just a common interface to a database, and the information it requires to connect to one is contained in a connection string, such as sqlite:///census_nyc.sqlite. Here, sqlite is the database driver, while census_nyc.sqlite is a SQLite file contained in the local directory.

You can learn a lot more about connection strings in the SQLAlchemy documentation. http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls

Your job in this exercise is to create an engine that connects to a local SQLite file named census.sqlite. Then, print the names of the tables it contains using the .table_names() method. Note that when you just want to print the table names, you do not need to call engine.connect() after creating the engine.

- Using the create_engine() function, create an engine for a local file named census.sqlite with sqlite as the driver. Be sure to enclose the connection string within quotation marks.
- Print the output from the .table_names() method on the engine.

In [None]:
# Import create_engine
from sqlalchemy import create_engine

# Create an engine that connects to the census.sqlite file: engine
engine = create_engine('sqlite:///census.sqlite')

# Print table names
print(engine.table_names())

# output: this database has 2 tables
['census', 'state_fact']

### 1.4.2 Example: Autoloading Tables from a Database using Reflection
SQLAlchemy can be used to automatically load tables from a database using something called reflection. Reflection is the process of reading the database and building the metadata based on that information. It's the opposite of creating a Table by hand and is very useful for working with existing databases. To perform reflection, you need to import the Table object from the SQLAlchemy package. Then, you use this Table object to read your table from the engine and autoload the columns. Using the Table object in this manner is a lot like passing arguments to a function. For example, to autoload the columns with the engine, you have to specify the keyword arguments autoload=True and autoload_with=engine to Table().

In this exercise, your job is to reflect the census table available on your engine into a variable called census. The metadata has already been loaded for you using MetaData() and is available in the variable metadata.

- Reflect the census table by using the Table object with the arguments:
    - The name of the table as a string ('census').
    - The metadata, contained in the variable metadata.
    - autoload=True
    - The engine to autoload with - in this case, engine.
- Print the details of census using the repr() function.

In [None]:
# Import Table
from sqlalchemy import Table

# Reflect census table from the engine: census
census = Table('census', metadata, autoload=True, 
               autoload_with=engine)

# Print census table metadata
print(repr(census))

# output
Table('census', MetaData(bind=None), 
      Column('state', VARCHAR(length=30), table=<census>), 
      Column('sex', VARCHAR(length=1), table=<census>), 
      Column('age', INTEGER(), table=<census>), 
      Column('pop2000', INTEGER(), table=<census>), 
      Column('pop2008', INTEGER(), table=<census>), 
      schema=None)


### 1.4.3 Example: Viewing Table Details
- example.columns.keys()
- metadata.tables['example']
    - similar to repr()

Great job reflecting the census table! Now you can begin to learn more about the columns and structure of your table. It is important to get an understanding of your database by examining the column names. This can be done by using the .columns attribute and accessing the .keys() method. For example, census.columns.keys() would return a list of column names of the census table.

Following this, we can use the metadata container to find out more details about the reflected table such as the columns and their types. For example, table objects are stored in the metadata.tables dictionary, so you can get the metadata of your census table with metadata.tables['census']. This is similar to your use of the repr() function on the census table from the previous exercise.

In [None]:
# Reflect the census table from the engine: census
census = Table('census', metadata, autoload=True, 
               autoload_with=engine)

# Print the column names
print(census.columns.keys())

# Print full table metadata
print(repr(metadata.tables['census']))

# output
['state', 'sex', 'age', 'pop2000', 'pop2008']
Table('census', MetaData(bind=None), 
      Column('state', VARCHAR(length=30), table=<census>), 
      Column('sex', VARCHAR(length=1), table=<census>), 
      Column('age', INTEGER(), table=<census>), 
      Column('pop2000', INTEGER(), table=<census>), 
      Column('pop2008', INTEGER(), table=<census>), 
      schema=None)


## 1.5 Intro to SQL
SQL statements
- Select, Insert, Update, Delete data
- Create and Alter data and columns

Basic SQL querying
- SELECT column_name FROM table_name
- SELECT pop2008 FROM People
- SELECT * FROM People
    - the asterisk selects every column


### 1.5.1 Basic SQL querying

In [None]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///census_nyc.sqlite')
connection = engine.connect()
stmt = 'SELECT * FROM people'
# ResultProxy
result_proxy = connection.execute(stmt)
# ResultSet
results = result_proxy.fetchall()
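
Passing a plain string to `connection.execute()` works on SQLAlchemy 1.3, but newer versions expect raw SQL to be wrapped in `text()`. The following is a sketch under that assumption (SQLAlchemy 1.4 or later), with a tiny made-up `people` table in an in-memory database.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as connection:
    # Toy table and rows, made up for this sketch
    connection.execute(text("CREATE TABLE people (state VARCHAR(30), pop2008 INTEGER)"))
    connection.execute(text("INSERT INTO people VALUES ('New York', 100), ('Texas', 200)"))

    # Wrap the raw SQL in text() before executing it
    result_proxy = connection.execute(text("SELECT * FROM people"))
    # fetchall() turns the result object into a list of rows
    results = result_proxy.fetchall()

print(results)
```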


### 1.5.2 ResultProxy vs ResultSet
- ResultProxy from execute statement
- get a ResultSet from fetch method

### 1.5.3 Handling ResultSets

In [None]:
first_row = results[0]
print(first_row)
# column names
print(first_row.keys())
# print value
print(first_row.state)

## 1.6 SQLAlchemy to Build Queries
- provides pythonic way to build SQL statements
- hides differences b/n backend db types

### 1.6.1 SQLAlchemy querying

In [None]:
from sqlalchemy import Table, MetaData, select
metadata = MetaData()
census = Table('census', metadata, autoload=True, 
               autoload_with=engine)
stmt = select([census])
results = connection.execute(stmt).fetchall()


### 1.6.2 SQLAlchemy Select Statement
- requires a list of one or more Tables or Columns
- using a table will select all the columns in it

In [None]:
stmt = select([census])

In [None]:
# check statement with print
print(stmt)
# out: a SELECT that lists every census column, equivalent to 'SELECT * FROM census'
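
To see the SQL that a select statement produces, no database is needed. This sketch defines the census table by hand (column names taken from the reflected table in these notes) and assumes SQLAlchemy 1.4 or later, where `select()` takes tables and columns as positional arguments instead of a list.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

# Hand-written table definition; nothing is read from a database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)

# Selecting a table selects all of its columns
stmt = select(census)
sql_text = str(stmt)
print(sql_text)
```

The printed statement names each column explicitly rather than using an asterisk.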

### 1.6.3 Example: Selecting data from a Table: raw SQL
Using what we just learned about SQL and applying the .execute() method on our connection, we can leverage a raw SQL query to query all the records in our census table. The object returned by the .execute() method is a ResultProxy. On this ResultProxy, we can then use the .fetchall() method to get our results - that is, the ResultSet.

In this exercise, you'll use a traditional SQL query. In the next exercise, you'll move to SQLAlchemy and begin to understand its advantages. Go for it!

- Build a SQL statement to query all the columns from census and store it in stmt. Note that your SQL statement must be a string.
- Use the .execute() and .fetchall() methods on connection and store the result in results. Remember that .execute() comes before .fetchall() and that stmt needs to be passed to .execute().

In [None]:
# Build select statement for census table: stmt
stmt = 'SELECT * from census'

# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()

# Print results
print(results)

### 1.6.4 Example: Selecting data from a Table with SQLAlchemy
Excellent work so far! It's now time to build your first select statement using SQLAlchemy. SQLAlchemy provides a nice "Pythonic" way of interacting with databases. So rather than dealing with the differences between specific dialects of traditional SQL such as MySQL or PostgreSQL, you can leverage the Pythonic framework of SQLAlchemy to streamline your workflow and more efficiently query your data. For this reason, it is worth learning even if you may already be familiar with traditional SQL.

In this exercise, you'll once again build a statement to query all records from the census table. This time, however, you'll make use of the select() function of the sqlalchemy module. This function requires a list of tables or columns as the only required argument.

Table and MetaData have already been imported. The metadata is available as metadata and the connection to the database as connection.

In [None]:
# Import select
from sqlalchemy import select

# Reflect census table via engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)

# Build select statement for census table: stmt
stmt = select([census])

# Print the emitted statement to see the SQL emitted
print(stmt)

# Execute the statement and print the results
print(connection.execute(stmt).fetchall())


### 1.6.5 Example: Handling a ResultSet
Recall the differences between a ResultProxy and a ResultSet:

ResultProxy: The object returned by the .execute() method. It can be used in a variety of ways to get the data returned by the query.
ResultSet: The actual data asked for in the query when using a fetch method such as .fetchall() on a ResultProxy.
This separation between the ResultSet and ResultProxy allows us to fetch as much or as little data as we desire.

Once we have a ResultSet, we can use Python to access all the data within it by column name and by list style indexes. For example, you can get the first row of the results by using results[0]. With that first row then assigned to a variable first_row, you can get data from the first column by either using first_row[0] or by column name such as first_row['column_name']. You'll now practice exactly this using the ResultSet you obtained from the census table in the previous exercise. It is stored in the variable results. Enjoy!

In [None]:
# Get the first row of the results by using an index: first_row
first_row = results[0]

# Print the first row of the results
print(first_row)

# Print the first column of the first row by using an index
print(first_row[0])

# Print the 'state' column of the first row by using its name
print(first_row['state'])
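
On SQLAlchemy 2.0, rows behave like named tuples, so lookup by string key goes through `row._mapping` and column names come from the result object rather than from `row.keys()`. A minimal sketch assuming SQLAlchemy 1.4 or later, with one made-up row:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("age", Integer))

with engine.begin() as conn:
    metadata.create_all(conn)
    # A single made-up row
    conn.execute(census.insert(), {"state": "New York", "age": 37})

    result = conn.execute(select(census))
    column_names = list(result.keys())   # column names of the result
    first_row = result.fetchall()[0]

by_index = first_row[0]                  # positional access
by_attr = first_row.state                # attribute access
by_key = first_row._mapping["state"]     # dictionary-style access
print(column_names, by_index, by_attr, by_key)
```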


# 2. Applying Filtering, Ordering, and Grouping to Queries

## 2.1 Where Clauses
- restrict data returned by a query based on boolean conditions
- compare a column against a value or another column
- often used comparisons:
    - ==, <=, >=, !=


In [None]:
stmt = select([census])
stmt = stmt.where(census.columns.state == 'California')
results = connection.execute(stmt).fetchall()
for result in results:
    print(result.state, result.age)



## 2.2 Expressions
- provide more complex conditions than simple operators
- examples
    - in_() - matches a column's value against a list
    - like() - matches a column's value against a partial value with wildcards
    - between() - checks whether a column's value lies between two values
- many more in documentation
- available as method on a Column
    

### 2.2.1 Example: use expressions to filter states with 'new'

In [None]:
stmt = select([census])
# use where and startswith
stmt = stmt.where(census.columns.state.startswith('New'))
for result in connection.execute(stmt):
    print(result.state, result.pop2000)
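
The three expressions listed above can be tried side by side. This is a sketch assuming SQLAlchemy 1.4 or later, with made-up rows in an in-memory SQLite database.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("age", Integer))
# Made-up rows, not real census figures
toy_rows = [
    {"state": "California", "age": 21},
    {"state": "California", "age": 22},
    {"state": "New York", "age": 37},
    {"state": "Texas", "age": 40},
]

in_stmt = select(census).where(census.columns.state.in_(["California", "Texas"]))
like_stmt = select(census).where(census.columns.state.like("New%"))
between_stmt = select(census).where(census.columns.age.between(30, 40))

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    in_count = len(conn.execute(in_stmt).fetchall())
    like_states = [row.state for row in conn.execute(like_stmt)]
    between_ages = sorted(row.age for row in conn.execute(between_stmt))

print(in_count, like_states, between_ages)
```

Note that `between()` includes both endpoints.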

## 2.3 Conjunctions
- allows us to have multiple criteria in a where clause
- examples
    - and_()
    - not_()
    - or_()
- can nest conjunctions to get more specific


### 2.3.1 Example: conjunctions
- get all records from California or New York

In [None]:
# conjunctions
from sqlalchemy import or_
stmt = select([census])
stmt = stmt.where(or_(census.columns.state == 'California',
                     census.columns.state == 'New York'
                     )
                 )
for result in connection.execute(stmt):
    print(result.state, result.sex)
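
`not_()` can be nested inside another conjunction in the same way. A small sketch, assuming SQLAlchemy 1.4 or later and made-up rows, that selects female records from anywhere except California:

```python
from sqlalchemy import Column, MetaData, String, Table, and_, create_engine, not_, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("sex", String(1)))
# Made-up rows, not real census figures
toy_rows = [
    {"state": "California", "sex": "F"},
    {"state": "California", "sex": "M"},
    {"state": "New York", "sex": "F"},
    {"state": "Texas", "sex": "M"},
]

# Both conditions must hold; the second one is negated
stmt = select(census).where(
    and_(census.columns.sex == "F",
         not_(census.columns.state == "California"))
)

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    results = conn.execute(stmt).fetchall()

matches = [(row.state, row.sex) for row in results]
print(matches)
```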

## 2.4 Connecting to a PostgreSQL Database on AWS
In these exercises, you will be working with real databases hosted on the cloud via Amazon Web Services (AWS)!

Let's begin by connecting to a PostgreSQL database. When connecting to a PostgreSQL database, many prefer the psycopg2 database driver, as it supports practically all of PostgreSQL's features efficiently and is the default driver for the PostgreSQL dialect in SQLAlchemy.

You might recall from Chapter 1 that we use the create_engine() function and a connection string to connect to a database.

There are four components to the connection string in this exercise: the dialect and driver ('postgresql+psycopg2://'), the username and password ('student:datacamp'), the host and port ('@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/'), and the database name ('census'). Pass the concatenated string as an argument to create_engine() to connect to the database.

In [None]:
# Import create_engine function
from sqlalchemy import create_engine

# Create an engine to the census database
# Concatenate args: dialect and driver, username and password,
#  host and port, database name
engine = create_engine('postgresql+psycopg2://' +
'student:datacamp' +
'@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/' +
'census')

# Use the .table_names() method on the engine to print the table names
print(engine.table_names())

# output: 4 tables in the census database
['census', 'state_fact', 'data', 'users']
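
The components of this connection string can be checked without contacting the server. The sketch below only parses the string with `make_url` from `sqlalchemy.engine.url` (not used elsewhere in these notes); it makes no claim about whether the host is still reachable.

```python
from sqlalchemy.engine.url import make_url

# Parse the connection string from the exercise; no network access happens here
url = make_url(
    "postgresql+psycopg2://student:datacamp"
    "@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/census"
)

parts = {
    "dialect+driver": url.drivername,
    "username": url.username,
    "host": url.host,
    "port": url.port,
    "database": url.database,
}
print(parts)
```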

## 2.5 Filter data selected from a Table - Simple
Having connected to the database, it's now time to practice filtering your queries!

As mentioned in the video, a where() clause is used to filter the data that a statement returns. For example, to select all the records from the census table where the sex is Female (or 'F') we would do the following:

select([census]).where(census.columns.sex == 'F')

In addition to ==, we can use almost any Python comparison operator (such as <= or !=) in the where() clause.

In [None]:
# Create a select query: stmt
stmt = select([census])

# Add a where clause to filter the results to only those for New York
stmt = stmt.where(census.columns.state == 'New York')

# Execute the query to retrieve all the data returned: results
results = connection.execute(stmt).fetchall()

# Loop over the results and print the age, sex, and pop2008
for result in results:
    print(result.age, result.sex, result.pop2008)


## 2.6 Filter data selected from a Table - Expressions
In addition to standard Python comparators, we can also use methods such as in_() to create more powerful where() clauses. You can see a full list of expressions in the SQLAlchemy Documentation. https://docs.sqlalchemy.org/en/latest/core/sqlelement.html#module-sqlalchemy.sql.expression

We've already created a list of some of the most densely populated states.

In [None]:
# Create a query for the census table: stmt
stmt = select([census])

# Append a where clause to match all the states in_ the list states
stmt = stmt.where(census.columns.state.in_(states))

# Loop over the ResultProxy and print the state and its population 
# in 2000
for i in connection.execute(stmt):
    print(i.state, i.pop2000)


## 2.7 Filter data selected from a Table - Advanced
You're really getting the hang of this! SQLAlchemy also allows users to use conjunctions such as and_(), or_(), and not_() to build more complex filtering. For example, we can get a set of records for people in New York who are 21 or 37 years old with the following code:

In [None]:
# example code noted above
from sqlalchemy import and_, or_, select

stmt = select([census]).where(
  and_(census.columns.state == 'New York',
       or_(census.columns.age == 21,
          census.columns.age == 37
         )
      )
  )

In [None]:
# Import and_
from sqlalchemy import and_

# Build a query for the census table: stmt
stmt = select([census])

# Append a where clause to select only non-male records from California using and_
stmt = stmt.where(
    # The state of California with a non-male sex
    and_(census.columns.state == 'California',
         census.columns.sex != 'M'
         )
)

# Loop over the ResultProxy printing the age and sex
for result in connection.execute(stmt):
    print(result.age, result.sex)


## 2.8 Overview of Ordering
Order by Clauses
- allows us to control the order in which records are returned in the query results
- available as a method on statements order_by()


### 2.8.1 Order by Ascending (low to high)

In [None]:
# check contents
print(results[:10])

stmt = select([census.columns.state])
stmt = stmt.order_by(census.columns.state)
# execute the statement
results = connection.execute(stmt).fetchall()
print(results[:10])

#### Example: Ordering by a Single Column
To sort the result output by a field, we use the .order_by() method. By default, the .order_by() method sorts from lowest to highest on the supplied column. You just have to pass in the name of the column you want sorted to .order_by(). In the video, for example, Jason used stmt.order_by(census.columns.state) to sort the result output by the state column.


In [None]:
# Build a query to select the state column: stmt
stmt = select([census.columns.state])

# Order stmt by the state column
stmt = stmt.order_by(census.columns.state)

# Execute the query and store the results: results
results = connection.execute(stmt).fetchall()

# Print the first 10 results
print(results[:10])


### 2.8.2 Order by Descending (high to low)
- wrap the column with desc() in the order_by() clause

#### Example: Ordering in Descending Order by a Single Column
You can also use .order_by() to sort from highest to lowest by wrapping a column in the desc() function. Although you haven't seen this function in action, it generalizes what you have already learned.

Pass desc() (for "descending") inside an .order_by() with the name of the column you want to sort by. For instance, stmt.order_by(desc(table.columns.column_name)) sorts column_name in descending order.


In [None]:
# Import desc
from sqlalchemy import desc

# Build a query to select the state column: stmt
stmt = select([census.columns.state])

# Order stmt by state in descending order: rev_stmt
rev_stmt = stmt.order_by(desc(census.columns.state))

# Execute the query and store the results: rev_results
rev_results = connection.execute(rev_stmt).fetchall()

# Print the first 10 rev_results
print(rev_results[:10])


### 2.8.3 Order by Multiple
- just separate multiple columns with a comma
- orders completely by the first column
- then if there are duplicates in the 1st column, orders by the 2nd column
- repeat until all columns are ordered


In [None]:
# check contents
print(results)

stmt = select([census.columns.state, census.columns.sex])
# order by state, then sex
stmt = stmt.order_by(census.columns.state, census.columns.sex)
results = connection.execute(stmt).first()
print(results)

#### Example: Ordering by Multiple Columns
We can pass multiple arguments to the .order_by() method to order by multiple columns. In fact, we can also sort in ascending or descending order for each individual column. Each column in the .order_by() method is fully sorted from left to right. This means that the first column is completely sorted, and then within each matching group of values in the first column, it's sorted by the next column in the .order_by() method. This process is repeated until all the columns in the .order_by() are sorted.

Use .order_by() to sort the output of the state column in ascending order and age in descending order. (NOTE: desc is already imported).

In [None]:
# Build a query to select state and age: stmt
stmt = select([census.columns.state, census.columns.age])

# Append order by to ascend by state and descend by age
stmt = stmt.order_by(census.columns.state, desc(census.columns.age))

# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()

# Print the first 20 results
print(results[:20])
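
With only a handful of rows, the left-to-right sorting is easy to see. A sketch assuming SQLAlchemy 1.4 or later, with made-up rows: state ascending, then age descending within each state.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, desc, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("age", Integer))
# Made-up rows, deliberately inserted out of order
toy_rows = [
    {"state": "Alabama", "age": 10},
    {"state": "Alaska", "age": 30},
    {"state": "Alabama", "age": 50},
    {"state": "Alaska", "age": 20},
]

stmt = select(census.columns.state, census.columns.age)
stmt = stmt.order_by(census.columns.state, desc(census.columns.age))

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    results = conn.execute(stmt).fetchall()

ordered = [tuple(row) for row in results]
print(ordered)
```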


## 2.9 Counting, Summing and Grouping Data
SQL functions
- ie. Count, Sum
- from sqlalchemy import func
    - Note: access sum as func.sum() rather than importing it by name, since a bare sum would shadow Python's built-in sum()
- more efficient than processing in Python
- Aggregate data

### 2.9.1 Counting Distinct Data - func.sum() or func.count()
As mentioned in the video, SQLAlchemy's func module provides access to built-in SQL functions that can make operations like counting and summing faster and more efficient.

In the video, Jason used func.sum() to get a sum of the pop2008 column of census as shown below:
- select([func.sum(census.columns.pop2008)])

If instead you want to count the number of values in pop2008, you could use func.count() like this:
- select([func.count(census.columns.pop2008)])

Furthermore, if you only want to count the distinct values of pop2008, you can use the .distinct() method:
- select([func.count(census.columns.pop2008.distinct())])

In this exercise, you will practice using func.count() and .distinct() to get a count of the distinct number of states in census.

So far, you've seen .fetchall() and .first() used on a ResultProxy to get the results. The ResultProxy also has a method called .scalar() for getting just the value of a query that returns only one row and column.

This can be very useful when you are querying for just a count or sum.

In [None]:
# example: use func.count() and .distinct()
# use .scalar() on ResultProxy to get a 1 row x 1 column value
from sqlalchemy import func

# Build a query to count the distinct states values: stmt
stmt = select([func.count(census.columns.state.distinct())])

# Execute the query and store the scalar result: distinct_state_count
distinct_state_count = connection.execute(stmt).scalar()

# Print the distinct_state_count
print(distinct_state_count)

# output
51
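
The difference between counting values and counting distinct values shows up clearly on a small table. A sketch assuming SQLAlchemy 1.4 or later, with four made-up rows covering three states:

```python
from sqlalchemy import Column, MetaData, String, Table, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)))
# Made-up rows: California appears twice
toy_rows = [{"state": s} for s in ["California", "California", "New York", "Texas"]]

count_stmt = select(func.count(census.columns.state))
distinct_stmt = select(func.count(census.columns.state.distinct()))

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    # scalar() returns the single value of a one-row, one-column result
    total = conn.execute(count_stmt).scalar()
    distinct_total = conn.execute(distinct_stmt).scalar()

print(total, distinct_total)
```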

#### 2.9.1.a Example: Count of Records by State
Often, we want to get a count for each record with a particular value in another column. The .group_by() method helps answer this type of query. You can pass a column to the .group_by() method and pair it with an aggregate function like sum() or count(). Much like the .order_by() method, .group_by() can take multiple columns as arguments.

In [None]:
# Import func
from sqlalchemy import func

# Build a query to select the state and count of ages by state: stmt
stmt = select([census.columns.state, func.count(census.columns.age)])

# Group stmt by state
stmt = stmt.group_by(census.columns.state)

# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()

# Print results
print(results)

# Print the keys/column names of the results returned
print(results[0].keys())

### 2.9.2 Sum

In [None]:
# example

from sqlalchemy import func
stmt = select([func.sum(census.columns.pop2008)])
# use scalar() fetch method to get just the value
results = connection.execute(stmt).scalar()
print(results)

### 2.9.3 Group by
- allows us to group rows by common values
- supports multiple columns to group by with a pattern similar to order_by()
- requires all selected columns to be grouped or aggregated by a function (like sum or count)


In [None]:
# example
stmt = select([census.columns.sex,
              func.sum(census.columns.pop2008)])
stmt = stmt.group_by(census.columns.sex)
results = connection.execute(stmt).fetchall()
print(results)

#### 2.9.3.1 Group by Multiple    

In [None]:
stmt = select([census.columns.sex,
               census.columns.age,
               func.sum(census.columns.pop2008)])
stmt = stmt.group_by(census.columns.sex, census.columns.age)
results = connection.execute(stmt).fetchall()
print(results)           
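
Grouping by one column and by two columns can be compared on a small table. This sketch assumes SQLAlchemy 1.4 or later and uses made-up rows; the totals are collected into dictionaries so the grouping keys are easy to read.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("sex", String(1)), Column("age", Integer), Column("pop2008", Integer),
)
# Made-up rows, not real census figures
toy_rows = [
    {"sex": "F", "age": 21, "pop2008": 120},
    {"sex": "M", "age": 21, "pop2008": 105},
    {"sex": "F", "age": 37, "pop2008": 95},
    {"sex": "M", "age": 40, "pop2008": 70},
]

pop_sum = func.sum(census.columns.pop2008)
by_sex = select(census.columns.sex, pop_sum).group_by(census.columns.sex)
by_sex_age = select(census.columns.sex, census.columns.age, pop_sum).group_by(
    census.columns.sex, census.columns.age
)

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    sex_totals = {row[0]: row[1] for row in conn.execute(by_sex)}
    sex_age_totals = {(row[0], row[1]): row[2] for row in conn.execute(by_sex_age)}

print(sex_totals)
print(sex_age_totals)
```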

### 2.9.4 Creating descriptive labels for ResultSets from Functions - using label() to label new column (ie. of a sum)
- SQLAlchemy autogenerates "column names" for functions in the ResultSet
- The column names are often func_# such as count_1
- To make it easier, you can replace them with the label() method

In [None]:
# using label()
print(results[0].keys())
stmt = select([census.columns.sex,
              func.sum(census.columns.pop2008).label(
              'pop2008_sum')
              ])
stmt = stmt.group_by(census.columns.sex)
results = connection.execute(stmt).fetchall()
print(results[0].keys())
# you should see the new column name here
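
The effect of `label()` on the result's column names can be checked directly. A sketch assuming SQLAlchemy 1.4 or later, with made-up rows; `column_names` is a small helper written for this example, not a SQLAlchemy function.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("sex", String(1)), Column("pop2008", Integer))
toy_rows = [{"sex": "F", "pop2008": 120}, {"sex": "M", "pop2008": 105}]

unlabeled = select(census.columns.sex, func.sum(census.columns.pop2008))
labeled = select(census.columns.sex,
                 func.sum(census.columns.pop2008).label("pop2008_sum"))


def column_names(conn, stmt):
    """Hypothetical helper: run stmt and return its result column names."""
    result = conn.execute(stmt.group_by(census.columns.sex))
    names = list(result.keys())
    result.close()
    return names


with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    auto_keys = column_names(conn, unlabeled)      # autogenerated name for the sum
    labeled_keys = column_names(conn, labeled)     # name chosen with label()

print(auto_keys)
print(labeled_keys)
```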

#### 2.9.4.1 Example: Determining the Population Sum by State
To avoid confusion with query result column names like count_1, we can use the .label() method to provide a name for the resulting column. This gets appended to the function method we are using, and its argument is the name we want to use.

We can pair func.sum() with .group_by() to get a sum of the population by State and use the label() method to name the output.

We can also create the func.sum() expression before using it in the select statement. We do it the same way we would inside the select statement and store it in a variable. Then we use that variable in the select statement where the func.sum() would normally be.

In [None]:
# Import func
from sqlalchemy import func

# Build an expression to calculate the sum of pop2008 labeled as 
# population
pop2008_sum = func.sum(census.columns.pop2008).label('population')

# Build a query to select the state and sum of pop2008: stmt
stmt = select([census.columns.state, pop2008_sum])

# Group stmt by state
stmt = stmt.group_by(census.columns.state)

# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()

# Print results
print(results)

# Print the keys/column names of the results returned
print(results[0].keys())


## 2.10 Use Pandas and Matplotlib to visualize data
SQLAlchemy and Pandas
- DataFrame can take a SQLAlchemy ResultSet
- Make sure to set the DataFrame columns to the ResultSet keys


### 2.10.1 DataFrame Example - pd.DataFrame(ResultSet)

In [None]:
import pandas as pd
# results is a SQLAlchemy ResultSet
df = pd.DataFrame(results)
df.columns = results[0].keys()
# print df and check
print(df)
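
On newer SQLAlchemy versions, the column names are available from the result object before fetching, which avoids calling `.keys()` on a row. A sketch assuming SQLAlchemy 1.4 or later and pandas, with made-up rows; converting each row to a plain tuple is a cautious choice, not a requirement stated in these notes.

```python
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("pop2008", Integer))
# Made-up rows, not real census figures
toy_rows = [
    {"state": "California", "pop2008": 120},
    {"state": "California", "pop2008": 105},
    {"state": "New York", "pop2008": 95},
    {"state": "Texas", "pop2008": 70},
]

stmt = select(census.columns.state,
              func.sum(census.columns.pop2008).label("population"))
stmt = stmt.group_by(census.columns.state)

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    result = conn.execute(stmt)
    column_names = list(result.keys())   # read the names from the result object
    rows = result.fetchall()

# Build the DataFrame from plain tuples and set the column names in one step
df = pd.DataFrame([tuple(row) for row in rows], columns=column_names)
print(df)
```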

#### 2.10.1.a SQLAlchemy ResultProxy and Pandas DataFrames
We can feed the ResultSet fetched from a ResultProxy directly into a pandas DataFrame, which is the workhorse of many Data Scientists in PythonLand. Jason demonstrated this in the video. In this exercise, you'll follow exactly the same approach to convert query results into a DataFrame.


In [None]:
# example 2

# import pandas
import pandas as pd

# Create a DataFrame from the results: df
df = pd.DataFrame(results)

# Set column names
df.columns = results[0].keys()

# Print the Dataframe
print(df)

### 2.10.2 Graphing
- graph like normal

In [None]:
import matplotlib.pyplot as plt
# horizontal bar chart of population count by age
# limited to 10 rows
df[10:20].plot.barh()
plt.show()

#### 2.10.2.a From SQLAlchemy results to a Graph
We can also take advantage of pandas and Matplotlib to build figures of our data. Remember that data visualization is essential for both exploratory data analysis and communication of your data!

- Use the plot.bar() method on df to create a bar plot of the results.
- Display the plot with plt.show().

In [None]:
# Import Pyplot as plt from matplotlib
import matplotlib.pyplot as plt

# Create a DataFrame from the results: df
df = pd.DataFrame(results)

# Set Column names
df.columns = results[0].keys()

# Print the DataFrame
print(df)

# Plot the DataFrame
df.plot.bar()
plt.show()


<img src="Images\sqlalchemy1.png" width="400" />

# 3. Advanced SQLAlchemy Queries
Calculating Values in a Query

Math Operators
- addition + 
- subtraction - 
- multiplication *
- division /
- modulus %
- Work differently on different data types

## 3.1 Calculating Difference
- notice wrapping expression with () and then labeling it

In [None]:
# difference
stmt = select([census.columns.age,
              (census.columns.pop2008-
              census.columns.pop2000).label('pop_change')
              ])
# group by age
stmt = stmt.group_by(census.columns.age)
# order by pop_change
stmt = stmt.order_by(desc('pop_change'))
# return top 5 results
stmt = stmt.limit(5)
# execute the statement
results = connection.execute(stmt).fetchall()
# print
print(results)
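
A small worked version of the difference calculation follows, assuming SQLAlchemy 1.4 or later and made-up rows. Unlike the cell above, this sketch sums each population column per age before subtracting, so every selected column is either grouped or aggregated; that variation is a choice made for this example.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, desc, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("age", Integer), Column("pop2000", Integer), Column("pop2008", Integer),
)
# Made-up rows, not real census figures
toy_rows = [
    {"age": 21, "pop2000": 100, "pop2008": 120},
    {"age": 21, "pop2000": 110, "pop2008": 105},
    {"age": 37, "pop2000": 90, "pop2008": 95},
    {"age": 40, "pop2000": 80, "pop2008": 70},
]

# Wrap the arithmetic in parentheses, then label the resulting column
pop_change = (func.sum(census.columns.pop2008)
              - func.sum(census.columns.pop2000)).label("pop_change")
stmt = select(census.columns.age, pop_change)
stmt = stmt.group_by(census.columns.age)
stmt = stmt.order_by(desc("pop_change"))
stmt = stmt.limit(5)

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    results = [tuple(row) for row in conn.execute(stmt)]

print(results)
```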

## 3.2 Case Statement
- notice wrapping expression with () and then labeling it
- used to treat data differently based on a condition
- accepts a list of conditions to match and a column to return if the condition matches
- the list of conditions ends with an else clause to determine what to do when a record doesn't match any prior conditions

In [None]:
# Case example
from sqlalchemy import case

stmt = select([
    func.sum(
        case([
            (census.columns.state == 'New York',
             census.columns.pop2008)
    ], else_=0))
])
results = connection.execute(stmt).fetchall()
print(results)
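
On SQLAlchemy 1.4 and later, `case()` also accepts its conditions as positional tuples instead of a list, and 2.0 expects that form. A sketch under that assumption, with made-up rows, that sums pop2008 for New York only:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, case, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("pop2008", Integer))
# Made-up rows, not real census figures
toy_rows = [
    {"state": "California", "pop2008": 120},
    {"state": "New York", "pop2008": 95},
    {"state": "Texas", "pop2008": 70},
]

# Return pop2008 for New York rows and 0 for every other row
ny_pop = case(
    (census.columns.state == "New York", census.columns.pop2008),
    else_=0,
)
stmt = select(func.sum(ny_pop))

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    ny_total = conn.execute(stmt).scalar()

print(ny_total)
```

The string comparison is case-sensitive here, so 'New york' would match no rows.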

## 3.3 Cast Statement
- converts data to another type
- useful for converting
    - integers to floats for division
    - strings to dates and times
- accepts a column or expression and the target Type
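
A minimal `cast()` sketch follows, assuming SQLAlchemy 1.4 or later and made-up rows. It casts an integer sum to `Float` before dividing so the ratio keeps its fractional part.

```python
from sqlalchemy import Column, Float, Integer, MetaData, Table, cast, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table("census", metadata, Column("pop2000", Integer), Column("pop2008", Integer))
# Made-up rows: pop2000 sums to 380, pop2008 sums to 390
toy_rows = [
    {"pop2000": 100, "pop2008": 120},
    {"pop2000": 110, "pop2008": 105},
    {"pop2000": 90, "pop2008": 95},
    {"pop2000": 80, "pop2008": 70},
]

# cast(expression, target type): convert the integer sum to a float first
ratio_expr = (cast(func.sum(census.columns.pop2000), Float)
              / func.sum(census.columns.pop2008)).label("ratio")

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(census.insert(), toy_rows)
    ratio = conn.execute(select(ratio_expr)).scalar()

print(ratio)
```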


## 3.4 Percentage Example - Case and Cast

In [None]:
from sqlalchemy import case, cast, Float
# calculate a percentage, note: convert to float type
stmt = select([
    (func.sum(
        case([
            (census.columns.state == 'New York',
             census.columns.pop2008)
        ], else_=0)) /
     cast(func.sum(census.columns.pop2008),
          Float) * 100).label('ny_percent')
])
results = connection.execute(stmt).fetchall()
print(results)

# 4. Creating and Manipulating your own Databases

# 5. Project: Putting it all together
