# Basic Statements

### In this notebook we study some use-case examples of the simplest SQL statements. We use Python as our main programming language and libraries such as Pandas and Psycopg2 to present the results of the queries.

First, we import the pandas and psycopg2 libraries.

In [1]:
import pandas as pd
import psycopg2 as pg2

We need to create a connection object, which will allow us to communicate with the database.

In [2]:
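# Connect to the local PostgreSQL server hosting the dvdrental sample database
# (adjust the user and password to match your own installation)
connection = pg2.connect(database='dvdrental', user='postgres', password='password')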

To present the results more clearly, we define the following function, which uses the connection object. It receives an SQL query and returns the result as a pandas DataFrame. The rows argument sets how many rows are returned (ten by default); passing rows='all' returns every row. Since this notebook is only a demonstration, we display a small subset of each result to keep the output short. Note that fetchmany only limits how many rows are turned into the DataFrame: with psycopg2's default client-side cursor the whole result is still transferred from the server, so LIMIT (introduced below) is the way to reduce the data the server sends.

In [3]:
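def get_data(query, rows=10):
    """Run an SQL query and return up to `rows` rows as a pandas DataFrame.

    Pass rows='all' to fetch the whole result.
    """
    with connection.cursor() as cursor:
        cursor.execute(query)

        if rows == 'all':
            raw_data = cursor.fetchall()
        else:
            raw_data = cursor.fetchmany(rows)

        # cursor.description has one entry per result column; item 0 is the column name
        col_names = [col_desc[0] for col_desc in cursor.description]
        data = pd.DataFrame(raw_data, columns=col_names)

    return data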
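The helper above relies only on the Python DB-API 2.0 cursor interface (PEP 249): execute, fetchmany, fetchall and description. As a minimal sketch that runs without a PostgreSQL server or pandas, the same pattern can be tried with the standard-library sqlite3 module. The in-memory actor table, its three rows and the get_rows name are hypothetical stand-ins, and the helper returns plain tuples instead of a DataFrame.

```python
import sqlite3

# Hypothetical stand-in for the PostgreSQL connection: an in-memory SQLite
# database with three rows mirroring the first actors shown in Ex. 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?, ?)",
    [(1, "Penelope", "Guiness"), (2, "Nick", "Wahlberg"), (3, "Ed", "Chase")],
)

def get_rows(query, rows=10):
    """Hypothetical variant of get_data: return (column_names, list_of_row_tuples)."""
    cursor = conn.cursor()  # sqlite3 cursors are not context managers, hence try/finally
    try:
        cursor.execute(query)
        raw_data = cursor.fetchall() if rows == "all" else cursor.fetchmany(rows)
        # The first item of each cursor.description entry is the column name.
        col_names = [col_desc[0] for col_desc in cursor.description]
    finally:
        cursor.close()
    return col_names, raw_data

cols, data = get_rows("SELECT * FROM actor ORDER BY actor_id", rows=2)
print(cols)  # ['actor_id', 'first_name', 'last_name']
print(data)  # [(1, 'Penelope', 'Guiness'), (2, 'Nick', 'Wahlberg')]
```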

# SELECT Statement

## Ex. 1

As our first example, we retrieve every column of the actor table from the database.

In [4]:
query_1 = 'SELECT * FROM actor'
actor_table = get_data(query_1)

actor_table

|   | actor_id | first_name | last_name | last_update |
|---|---|---|---|---|
| 0 | 1 | Penelope | Guiness | 2013-05-26 14:47:57.620 |
| 1 | 2 | Nick | Wahlberg | 2013-05-26 14:47:57.620 |
| 2 | 3 | Ed | Chase | 2013-05-26 14:47:57.620 |
| 3 | 4 | Jennifer | Davis | 2013-05-26 14:47:57.620 |
| 4 | 5 | Johnny | Lollobrigida | 2013-05-26 14:47:57.620 |
| 5 | 6 | Bette | Nicholson | 2013-05-26 14:47:57.620 |
| 6 | 7 | Grace | Mostel | 2013-05-26 14:47:57.620 |
| 7 | 8 | Matthew | Johansson | 2013-05-26 14:47:57.620 |
| 8 | 9 | Joe | Swank | 2013-05-26 14:47:57.620 |
| 9 | 10 | Christian | Gable | 2013-05-26 14:47:57.620 |


## Ex. 2

We can grab only a subset of columns out of the table. 

In [5]:
query_2 = 'SELECT first_name, actor_id FROM actor'
two_columns = get_data(query_2)

two_columns

|   | first_name | actor_id |
|---|---|---|
| 0 | Penelope | 1 |
| 1 | Nick | 2 |
| 2 | Ed | 3 |
| 3 | Jennifer | 4 |
| 4 | Johnny | 5 |
| 5 | Bette | 6 |
| 6 | Grace | 7 |
| 7 | Matthew | 8 |
| 8 | Joe | 9 |
| 9 | Christian | 10 |
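The order of the columns in the result follows the SELECT list rather than the table definition, which is why first_name comes before actor_id above. A small sketch, using an in-memory SQLite table as a hypothetical stand-in for actor:

```python
import sqlite3

# Hypothetical three-row stand-in for the dvdrental actor table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?, ?)",
    [(1, "Penelope", "Guiness"), (2, "Nick", "Wahlberg"), (3, "Ed", "Chase")],
)

# Result columns appear in the order they are listed after SELECT.
cursor = conn.execute("SELECT first_name, actor_id FROM actor ORDER BY actor_id")
col_names = [desc[0] for desc in cursor.description]
rows = cursor.fetchall()

print(col_names)  # ['first_name', 'actor_id']
print(rows)       # [('Penelope', 1), ('Nick', 2), ('Ed', 3)]
```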


# DISTINCT 

## Ex. 3

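Sometimes we want to list only the distinct values in a column, and the DISTINCT keyword does exactly this. Suppose we want to know the different category_ids of the movies in the film_category table; we can answer this question with the following query (the parentheses around category_id are optional). Because get_data returns ten rows by default, only ten of the distinct values are shown below, and without an ORDER BY clause their order is arbitrary.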

In [6]:
query_3 = 'SELECT DISTINCT(category_id) FROM film_category'
movie_categories = get_data(query_3)

movie_categories

|   | category_id |
|---|---|
| 0 | 4 |
| 1 | 14 |
| 2 | 3 |
| 3 | 10 |
| 4 | 7 |
| 5 | 13 |
| 6 | 9 |
| 7 | 1 |
| 8 | 5 |
| 9 | 2 |
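A sketch of what DISTINCT does, using a hypothetical five-row stand-in for film_category in SQLite. The results are sorted in Python because DISTINCT alone does not fix an order, and the last comparison checks that DISTINCT(category_id) returns the same values as DISTINCT category_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: five films spread over three categories.
conn.execute("CREATE TABLE film_category (film_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO film_category VALUES (?, ?)",
    [(1, 6), (2, 11), (3, 6), (4, 11), (5, 8)],
)

all_ids = [r[0] for r in conn.execute("SELECT category_id FROM film_category")]
distinct_ids = sorted(r[0] for r in conn.execute("SELECT DISTINCT category_id FROM film_category"))
# The parentheses only group the expression; they do not make DISTINCT a function.
with_parens = sorted(r[0] for r in conn.execute("SELECT DISTINCT(category_id) FROM film_category"))

print(len(all_ids))                 # 5
print(distinct_ids)                 # [6, 8, 11]
print(with_parens == distinct_ids)  # True
```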


# COUNT Function 

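The COUNT function returns the number of rows that match the query. COUNT(*) counts every row, whereas COUNT(column) counts only the rows in which that column is not NULL.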

## Ex. 4

Using the COUNT function we can calculate how many movies are in our database. 

In [7]:
query_4 = 'SELECT COUNT(film_id) FROM film'
amount_movies = get_data(query_4)

amount_movies

|   | count |
|---|---|
| 0 | 1000 |
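The difference between COUNT(*) and COUNT(column) only shows up when the column contains NULLs. In this sketch the film table is a hypothetical SQLite stand-in with one unknown length; film_id plays the role of the key and is never NULL, so COUNT(film_id) agrees with COUNT(*), as in Ex. 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in: four films, one with an unknown (NULL) length.
conn.execute("CREATE TABLE film (film_id INTEGER, length INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [(1, 86), (2, 48), (3, None), (4, 117)],
)

# COUNT(*) counts rows; COUNT(col) skips rows where col is NULL.
count_star, count_id, count_length = conn.execute(
    "SELECT COUNT(*), COUNT(film_id), COUNT(length) FROM film"
).fetchone()

print(count_star, count_id, count_length)  # 4 4 3
```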


## Ex. 5 

Combining COUNT with DISTINCT, we can go beyond listing the distinct category_ids and count how many there are with a single query.

In [8]:
query_5 = 'SELECT COUNT(DISTINCT(category_id)) FROM film_category'
amount_categories = get_data(query_5)

amount_categories

|   | count |
|---|---|
| 0 | 16 |
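A sketch contrasting COUNT with COUNT(DISTINCT ...) on the same kind of hypothetical film_category stand-in: the first counts every non-NULL value, the second counts each value once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: five films spread over three categories.
conn.execute("CREATE TABLE film_category (film_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO film_category VALUES (?, ?)",
    [(1, 6), (2, 11), (3, 6), (4, 11), (5, 8)],
)

total, unique = conn.execute(
    "SELECT COUNT(category_id), COUNT(DISTINCT category_id) FROM film_category"
).fetchone()

print(total, unique)  # 5 3
```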


# WHERE Statement

The WHERE statement allows us to specify conditions that the returned rows must satisfy.

## Ex. 6

For example, we may be interested in the information contained in the customer table, but only for the active customers.

In [9]:
query_6 = '''
          SELECT * FROM customer
          WHERE active = 1
          '''

active_customers = get_data(query_6)
active_customers

|   | customer_id | store_id | first_name | last_name | email | address_id | activebool | create_date | last_update | active |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 524 | 1 | Jared | Ely | jared.ely@sakilacustomer.org | 530 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 1 | 1 | 1 | Mary | Smith | mary.smith@sakilacustomer.org | 5 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 2 | 2 | 1 | Patricia | Johnson | patricia.johnson@sakilacustomer.org | 6 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 3 | 3 | 1 | Linda | Williams | linda.williams@sakilacustomer.org | 7 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 4 | 4 | 2 | Barbara | Jones | barbara.jones@sakilacustomer.org | 8 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 5 | 5 | 1 | Elizabeth | Brown | elizabeth.brown@sakilacustomer.org | 9 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 6 | 6 | 2 | Jennifer | Davis | jennifer.davis@sakilacustomer.org | 10 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 7 | 7 | 1 | Maria | Miller | maria.miller@sakilacustomer.org | 11 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 8 | 8 | 2 | Susan | Wilson | susan.wilson@sakilacustomer.org | 12 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 9 | 9 | 2 | Margaret | Moore | margaret.moore@sakilacustomer.org | 13 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
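The filtering in Ex. 6 and Ex. 7 can be sketched on a hypothetical four-row customer table in SQLite: WHERE keeps only the rows for which the condition is true, and COUNT(*) then counts the rows that survive the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the customer table; active is 1 or 0.
conn.execute("CREATE TABLE customer (customer_id INTEGER, first_name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Mary", 1), (2, "Patricia", 0), (3, "Nancy", 1), (4, "Linda", 0)],
)

# WHERE keeps only the rows for which the condition is true.
active_customers = conn.execute(
    "SELECT customer_id, first_name FROM customer WHERE active = 1 ORDER BY customer_id"
).fetchall()
# COUNT(*) counts the rows that pass the WHERE filter.
num_active = conn.execute("SELECT COUNT(*) FROM customer WHERE active = 1").fetchone()[0]

print(active_customers)  # [(1, 'Mary'), (3, 'Nancy')]
print(num_active)        # 2
```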


## Ex. 7
We can also count the number of active customers.

In [10]:
query_7 = '''
          SELECT COUNT(*) FROM customer
          WHERE active = 1
          '''

num_active_customers = get_data(query_7)
num_active_customers

|   | count |
|---|---|
| 0 | 584 |


## Ex. 8

Using the logical operators AND, OR and NOT, we can impose several conditions on the returned data. For example, we could ask for the number of active customers whose first name is Nancy.

In [11]:
query_8 = '''
          SELECT COUNT(*) FROM customer
          WHERE active = 1 AND first_name = 'Nancy'
          '''

custom_query = get_data(query_8)
custom_query

|   | count |
|---|---|
| 0 | 1 |
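Conditions can be combined in more ways than Ex. 8 shows. In this sketch (hypothetical SQLite data; count_where is an illustrative helper, not part of the notebook), the value 'Nancy' is passed as a query parameter instead of being written into the SQL string: sqlite3 uses ? placeholders, whereas psycopg2 uses %s. The last two calls show that AND binds more tightly than OR, so parentheses change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the customer table (not the dvdrental data).
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, first_name TEXT, store_id INTEGER, active INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Nancy", 1, 1), (2, "Nancy", 2, 0), (3, "Mary", 1, 1), (4, "Linda", 2, 1)],
)

def count_where(condition, params=()):
    """Hypothetical helper: count customers matching `condition`.

    The condition text is fixed code; only values travel as ? parameters.
    """
    sql = "SELECT COUNT(*) FROM customer WHERE " + condition
    return conn.execute(sql, params).fetchone()[0]

nancy_active = count_where("active = 1 AND first_name = ?", ("Nancy",))
nancy_or_store2 = count_where("first_name = ? OR store_id = 2", ("Nancy",))
not_nancy = count_where("NOT first_name = ?", ("Nancy",))
# AND binds tighter than OR: this means first_name = 'Nancy' OR (store_id = 2 AND active = 1)
no_parens = count_where("first_name = ? OR store_id = 2 AND active = 1", ("Nancy",))
with_parens = count_where("(first_name = ? OR store_id = 2) AND active = 1", ("Nancy",))

print(nancy_active, nancy_or_store2, not_nancy, no_parens, with_parens)  # 1 3 2 3 2
```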


# ORDER BY Statement

With the ORDER BY statement we can sort the data based on a column value in either ascending or descending order.

## Ex. 9

For example, we can sort the customer table by first_name in ascending order.

In [12]:
query_9 = '''
          SELECT * FROM customer
          ORDER BY first_name ASC
          '''

custom_query = get_data(query_9)
custom_query

|   | customer_id | store_id | first_name | last_name | email | address_id | activebool | create_date | last_update | active |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 375 | 2 | Aaron | Selby | aaron.selby@sakilacustomer.org | 380 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 1 | 367 | 1 | Adam | Gooch | adam.gooch@sakilacustomer.org | 372 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 2 | 525 | 2 | Adrian | Clary | adrian.clary@sakilacustomer.org | 531 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 3 | 217 | 2 | Agnes | Bishop | agnes.bishop@sakilacustomer.org | 221 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 4 | 389 | 1 | Alan | Kahn | alan.kahn@sakilacustomer.org | 394 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 5 | 352 | 1 | Albert | Crouse | albert.crouse@sakilacustomer.org | 357 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 6 | 568 | 2 | Alberto | Henning | alberto.henning@sakilacustomer.org | 574 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 7 | 454 | 2 | Alex | Gresham | alex.gresham@sakilacustomer.org | 459 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 8 | 439 | 2 | Alexander | Fennell | alexander.fennell@sakilacustomer.org | 444 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 9 | 423 | 2 | Alfred | Casillas | alfred.casillas@sakilacustomer.org | 428 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |


## Ex. 10

We can also sort the data by multiple columns. For example, suppose we want to sort the customer table by store_id in descending order and then, within each store, by first_name in ascending order.

In [13]:
query_10 = '''
          SELECT * FROM customer
          ORDER BY store_id DESC, first_name ASC 
          '''

custom_query = get_data(query_10)
custom_query

|   | customer_id | store_id | first_name | last_name | email | address_id | activebool | create_date | last_update | active |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 375 | 2 | Aaron | Selby | aaron.selby@sakilacustomer.org | 380 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 1 | 525 | 2 | Adrian | Clary | adrian.clary@sakilacustomer.org | 531 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 2 | 217 | 2 | Agnes | Bishop | agnes.bishop@sakilacustomer.org | 221 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 3 | 568 | 2 | Alberto | Henning | alberto.henning@sakilacustomer.org | 574 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 4 | 454 | 2 | Alex | Gresham | alex.gresham@sakilacustomer.org | 459 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 5 | 439 | 2 | Alexander | Fennell | alexander.fennell@sakilacustomer.org | 444 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 6 | 423 | 2 | Alfred | Casillas | alfred.casillas@sakilacustomer.org | 428 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 7 | 567 | 2 | Alfredo | Mcadams | alfredo.mcadams@sakilacustomer.org | 573 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 8 | 412 | 2 | Allen | Butterfield | allen.butterfield@sakilacustomer.org | 417 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 9 | 228 | 2 | Allison | Stanley | allison.stanley@sakilacustomer.org | 232 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
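The multi-column sort of Ex. 10 can be reproduced on a hypothetical four-row SQLite stand-in: rows are ordered by store_id first, and first_name only decides the order among rows that share the same store_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the customer table.
conn.execute("CREATE TABLE customer (first_name TEXT, store_id INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Adam", 1), ("Aaron", 2), ("Alan", 1), ("Adrian", 2)],
)

# store_id DESC is the primary key of the sort; first_name ASC breaks ties within a store.
rows = conn.execute(
    "SELECT store_id, first_name FROM customer ORDER BY store_id DESC, first_name ASC"
).fetchall()

print(rows)  # [(2, 'Aaron'), (2, 'Adrian'), (1, 'Adam'), (1, 'Alan')]
```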


# LIMIT Statement

The LIMIT statement restricts the number of rows returned by a query: LIMIT n returns at most n rows. Unless the query also has an ORDER BY clause, which n rows are returned is not guaranteed.

## Ex. 11 

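For example, we can ask for only five rows of the customer table. Since there is no ORDER BY, these are not necessarily the customers with the lowest customer_id (the first row below is customer 524).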

In [14]:
query_11 = '''
          SELECT * FROM customer
          LIMIT 5
          '''

first_five = get_data(query_11)
first_five

|   | customer_id | store_id | first_name | last_name | email | address_id | activebool | create_date | last_update | active |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 524 | 1 | Jared | Ely | jared.ely@sakilacustomer.org | 530 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 1 | 1 | 1 | Mary | Smith | mary.smith@sakilacustomer.org | 5 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 2 | 2 | 1 | Patricia | Johnson | patricia.johnson@sakilacustomer.org | 6 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 3 | 3 | 1 | Linda | Williams | linda.williams@sakilacustomer.org | 7 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |
| 4 | 4 | 2 | Barbara | Jones | barbara.jones@sakilacustomer.org | 8 | True | 2006-02-14 | 2013-05-26 14:49:45.738 | 1 |


## Ex. 12 

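We can combine LIMIT with ORDER BY to ask, for example, for the five most recent purchases in the payment table. Note that all five rows below share the same payment_date, so the choice among tied rows is arbitrary; adding a second sort key, such as payment_id, would make the result deterministic.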

In [15]:
query_12 = '''
          SELECT * FROM payment
          ORDER BY payment_date DESC  
          LIMIT 5
          '''

recent_pur = get_data(query_12)
recent_pur

|   | payment_id | customer_id | staff_id | rental_id | amount | payment_date |
|---|---|---|---|---|---|---|
| 0 | 31920 | 269 | 2 | 12610 | 0.0 | 2007-05-14 13:44:29.996577 |
| 1 | 31917 | 267 | 2 | 12066 | 7.98 | 2007-05-14 13:44:29.996577 |
| 2 | 31918 | 267 | 2 | 13713 | 0.0 | 2007-05-14 13:44:29.996577 |
| 3 | 31919 | 269 | 1 | 13025 | 3.98 | 2007-05-14 13:44:29.996577 |
| 4 | 31921 | 274 | 1 | 13486 | 0.99 | 2007-05-14 13:44:29.996577 |
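A sketch of the tie-breaking idea, on hypothetical SQLite payment rows where three payments share the latest timestamp. Dates are stored as ISO-formatted text, which sorts chronologically; adding payment_id DESC as a second key makes LIMIT pick the same rows every time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical payments; ISO-formatted date strings sort chronologically as text.
conn.execute("CREATE TABLE payment (payment_id INTEGER, amount REAL, payment_date TEXT)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        (1, 2.99, "2007-05-13 10:00:00"),
        (2, 7.98, "2007-05-14 13:44:29"),
        (3, 3.98, "2007-05-14 13:44:29"),
        (4, 0.00, "2007-05-14 13:44:29"),
    ],
)

# Payments 2, 3 and 4 tie on payment_date; payment_id DESC decides among them.
latest_two = conn.execute(
    "SELECT payment_id FROM payment ORDER BY payment_date DESC, payment_id DESC LIMIT 2"
).fetchall()

print(latest_two)  # [(4,), (3,)]
```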


## Ex. 13

We can also ask for the five most recent purchases whose amount is different from zero. The WHERE condition is applied before the rows are sorted and limited.

In [16]:
query_13 = '''
          SELECT * FROM payment
          WHERE amount != 0
          ORDER BY payment_date DESC  
          LIMIT 5
          '''

recent_pur = get_data(query_13)
recent_pur

|   | payment_id | customer_id | staff_id | rental_id | amount | payment_date |
|---|---|---|---|---|---|---|
| 0 | 31922 | 279 | 2 | 13538 | 4.99 | 2007-05-14 13:44:29.996577 |
| 1 | 31917 | 267 | 2 | 12066 | 7.98 | 2007-05-14 13:44:29.996577 |
| 2 | 31919 | 269 | 1 | 13025 | 3.98 | 2007-05-14 13:44:29.996577 |
| 3 | 31921 | 274 | 1 | 13486 | 0.99 | 2007-05-14 13:44:29.996577 |
| 4 | 31923 | 282 | 2 | 15430 | 0.99 | 2007-05-14 13:44:29.996577 |
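Because WHERE is applied before ORDER BY and LIMIT, Ex. 13 still returns five rows, just different ones. The sketch below compares the two kinds of query on hypothetical SQLite data; it uses != as the notebook does, which PostgreSQL and SQLite both accept alongside the standard <> operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical payments: payment 4 is the latest (by id) but has a zero amount.
conn.execute("CREATE TABLE payment (payment_id INTEGER, amount REAL, payment_date TEXT)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        (1, 2.99, "2007-05-13 10:00:00"),
        (2, 7.98, "2007-05-14 13:44:29"),
        (3, 3.98, "2007-05-14 13:44:29"),
        (4, 0.00, "2007-05-14 13:44:29"),
    ],
)

# Without a filter, the zero-amount payment 4 is among the latest two.
unfiltered = conn.execute(
    "SELECT payment_id FROM payment "
    "ORDER BY payment_date DESC, payment_id DESC LIMIT 2"
).fetchall()
# WHERE runs first, so LIMIT still returns two rows, all with non-zero amounts.
nonzero = conn.execute(
    "SELECT payment_id FROM payment WHERE amount != 0 "
    "ORDER BY payment_date DESC, payment_id DESC LIMIT 2"
).fetchall()

print(unfiltered)  # [(4,), (3,)]
print(nonzero)     # [(3,), (2,)]
```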


# BETWEEN Statement

We use the BETWEEN statement to select the rows whose value in a column lies within a given range. Both endpoints are included: x BETWEEN a AND b is equivalent to x >= a AND x <= b.

## Ex. 14

Using the BETWEEN statement, we can find the payments whose amount lies in a certain range, say between \$5 and \$8.

In [18]:
query_14 = '''
          SELECT * FROM payment
          WHERE amount BETWEEN 5 AND 8
          '''

payments = get_data(query_14)
payments

|   | payment_id | customer_id | staff_id | rental_id | amount | payment_date |
|---|---|---|---|---|---|---|
| 0 | 17503 | 341 | 2 | 1520 | 7.99 | 2007-02-15 22:25:46.996577 |
| 1 | 17505 | 341 | 1 | 1849 | 7.99 | 2007-02-16 22:41:45.996577 |
| 2 | 17507 | 341 | 2 | 3130 | 7.99 | 2007-02-20 17:31:48.996577 |
| 3 | 17508 | 341 | 1 | 3382 | 5.99 | 2007-02-21 12:33:49.996577 |
| 4 | 17509 | 342 | 2 | 2190 | 5.99 | 2007-02-17 23:58:17.996577 |
| 5 | 17510 | 342 | 1 | 2914 | 5.99 | 2007-02-20 02:11:44.996577 |
| 6 | 17513 | 343 | 1 | 1564 | 6.99 | 2007-02-16 01:15:33.996577 |
| 7 | 17516 | 343 | 2 | 2461 | 6.99 | 2007-02-18 18:26:38.996577 |
| 8 | 17526 | 346 | 1 | 1994 | 5.99 | 2007-02-17 09:35:32.996577 |
| 9 | 17533 | 347 | 1 | 3326 | 7.99 | 2007-02-21 07:33:16.996577 |
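Since BETWEEN includes both endpoints, amounts of exactly 5 or 8 belong to the result. A small SQLite sketch with hypothetical amounts checks this against the explicit >= / <= form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical amounts just below, on, and just above the range endpoints.
conn.execute("CREATE TABLE payment (payment_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 4.99), (2, 5.00), (3, 7.99), (4, 8.00), (5, 8.99)],
)

between = conn.execute(
    "SELECT payment_id FROM payment WHERE amount BETWEEN 5 AND 8 ORDER BY payment_id"
).fetchall()
# The inclusive range written out with comparison operators.
explicit = conn.execute(
    "SELECT payment_id FROM payment WHERE amount >= 5 AND amount <= 8 ORDER BY payment_id"
).fetchall()

print(between)              # [(2,), (3,), (4,)]
print(between == explicit)  # True
```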


## Ex. 15

If we want to know the total number of these transactions, we can use the following query:

In [20]:
query_15 = '''
          SELECT COUNT(*) FROM payment
          WHERE amount BETWEEN 5 AND 8
          '''

payments = get_data(query_15)
payments

|   | count |
|---|---|
| 0 | 2838 |


## Ex. 16

We can also negate the BETWEEN condition with NOT. Let's count the payments whose amount is not within the \$5 to \$8 range:

In [21]:
query_16 = '''
          SELECT COUNT(*) FROM payment
          WHERE amount NOT BETWEEN 5 AND 8
          '''

payments = get_data(query_16)
payments

|   | count |
|---|---|
| 0 | 11758 |
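Every non-NULL amount is either within the range or not, so the counts of Ex. 15 and Ex. 16 add up to the number of payments with a non-NULL amount: 2838 + 11758 = 14596. A row whose amount is NULL would be counted by neither query, because both conditions evaluate to NULL for it. The sketch below checks this on hypothetical SQLite data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical payments, including one with an unknown (NULL) amount.
conn.execute("CREATE TABLE payment (payment_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 4.99), (2, 5.00), (3, 7.99), (4, 8.99), (5, None)],
)

inside, outside, total, non_null = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM payment WHERE amount BETWEEN 5 AND 8),
        (SELECT COUNT(*) FROM payment WHERE amount NOT BETWEEN 5 AND 8),
        (SELECT COUNT(*) FROM payment),
        (SELECT COUNT(amount) FROM payment)
""").fetchone()

# The NULL row falls in neither group, so inside + outside equals the non-NULL count.
print(inside, outside, total, non_null)  # 2 2 5 4
```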
