## Data Types and Casting

Let's talk about data types. Not every database supports the same types (for example, SQLite has no dedicated date or timestamp type and stores such values as text or numbers). The database we are using, Postgres, supports many types, from timestamps to varchar arrays.

This exercise will get you up to speed on understanding data types and learning how to cast from one type to another.

In [1]:
# Load the SQL magic extension
%load_ext sql
# Connect to the default database (using SQLAlchemy)
%sql postgresql://localhost/postgres
# Truncate output of your queries so that it's not blowing up the notebook
%config SqlMagic.displaylimit = 10

### Overview of Data Types

There are several kinds of data. A few important ones are numbers, text, and collections. In database terms, these are referred to as integers or doubles for numbers, strings for text, and arrays for collections. Most of the time, you can guess a column's data type just by looking at its values.
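
When a guess needs checking, one option in Postgres is the built-in `pg_typeof()` function, which reports the type of any expression. The sketch below uses made-up literal values rather than columns from the sample database, and the column aliases are only for readability.

```sql
-- pg_typeof() returns the data type Postgres assigns to an expression
select
  pg_typeof(42)                  as whole_number,    -- integer
  pg_typeof(4.99)                as decimal_number,  -- numeric (not double precision)
  pg_typeof('Penelope'::varchar) as text_value,      -- character varying
  pg_typeof(array['Trailers'])   as collection,      -- text[]
  pg_typeof(now())               as point_in_time;   -- timestamp with time zone
```

The same function accepts a column name, so it can also be applied to a field in a `select` against a real table.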

Take a quick look at the `actor` table.

In [28]:
%%sql
-- What are the data types for actor_id and first_name?
select * from actor limit 5

 * postgresql://localhost/postgres
5 rows affected.


actor_id,first_name,last_name,last_update
1,Penelope,Guiness,2006-02-15 10:05:00
2,Nick,Wahlberg,2006-02-15 10:05:00
3,Ed,Chase,2006-02-15 10:05:00
4,Jennifer,Davis,2006-02-15 10:05:00
5,Johnny,Lollobrigida,2006-02-15 10:05:00


To double-check our guesses, we can query `information_schema.columns`, which lists the data type of every column. The query below inspects the `film` table; change the filter to `'actor'` to check the columns above. Some systems have a user interface layered on top of the database (e.g. Toad for Oracle) that makes this information easier to find.

In [29]:
%%sql
select 
  column_name, 
  data_type 
from 
  information_schema.columns
where 
  table_name = 'film';

 * postgresql://localhost/postgres
13 rows affected.


column_name,data_type
film_id,integer
title,character varying
description,character varying
release_year,integer
language_id,integer
rental_duration,integer
rental_rate,double precision
length,integer
replacement_cost,double precision
rating,character varying


Data types matter when we compare two values (e.g. checking equality or ordering). Some systems let you compare integers to strings because the backend automatically converts one type to the other. In other systems, we have to cast the values to a common data type before we can apply conditional logic to them.

In [20]:
%%sql
-- Try checking if last_update is greater than last_name
-- What kind of error do you get?
select * from actor where last_update > last_name

 * postgresql://localhost/postgres
(psycopg2.errors.UndefinedFunction) operator does not exist: timestamp without time zone > character varying
LINE 3: select * from actor where last_update > last_name
                                              ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
 [SQL: '-- Try checking if last_update is greater than last_name\n-- What kind of error do you get?\nselect * from actor where last_update > last_name'] (Background on this error at: http://sqlalche.me/e/f405)


In [16]:
%%sql
-- To cast a variable to another type, you need to use the cast() function
-- But note that comparing last_update to last_name might not make sense!
select * from actor where cast(last_update as varchar) < last_name

 * postgresql://localhost/postgres
200 rows affected.


actor_id,first_name,last_name,last_update
1,Penelope,Guiness,2006-02-15 10:05:00
2,Nick,Wahlberg,2006-02-15 10:05:00
3,Ed,Chase,2006-02-15 10:05:00
4,Jennifer,Davis,2006-02-15 10:05:00
5,Johnny,Lollobrigida,2006-02-15 10:05:00
6,Bette,Nicholson,2006-02-15 10:05:00
7,Grace,Mostel,2006-02-15 10:05:00
8,Matthew,Johansson,2006-02-15 10:05:00
9,Joe,Swank,2006-02-15 10:05:00
10,Christian,Gable,2006-02-15 10:05:00
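
The same `cast()` syntax covers other conversions as well. The following is a minimal sketch using literal values that are not taken from the sample tables; it also shows the `::` shorthand, which is specific to Postgres and equivalent to `cast()`.

```sql
-- cast() is the SQL-standard form; :: is Postgres shorthand for the same operation
select
  cast('42' as integer) + 1      as string_to_integer,  -- the string becomes a number, so arithmetic works
  '2006-02-15'::date             as string_to_date,     -- the string becomes a date
  cast(3.99 as integer)          as numeric_to_integer, -- rounds to the nearest integer
  cast(10 as varchar) || ' rows' as integer_to_string;  -- the number becomes text, so concatenation works
```

A cast only succeeds when the value is valid for the target type; for example, `cast('abc' as integer)` raises an error instead of returning a number.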


### Strings
A string is a sequence of characters, either as a constant or as a variable. Strings usually represent text, but numbers and dates can also be stored as strings.
     
#### Example Strings: 
“String”, “Str!nG2!”, “34578”, “U84-32-12-44”, “2018-12-10”  

In [21]:
%%sql
-- How do you manually code a string as a field?
-- Note that for Postgres, strings are in single quotes
select
  'Julie' as cdl_employee_0,
  1 as employee_0_id,
  'Laura' as cdl_employee_1,
  2 as employee_1_id
from
  actor

 * postgresql://localhost/postgres
200 rows affected.


cdl_employee_0,employee_0_id,cdl_employee_1,employee_1_id
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2
Julie,1,Laura,2


### Substrings
When working with strings, you may want to investigate certain portions of strings in your table (AKA substrings). For example, **“Cascade”** is a substring of **“Cascade Data Labs”**.  
  
There are many functions in SQL used to pull substrings. The most common are:

[`left()`](https://www.techonthenet.com/sql_server/functions/left.php) pulls a substring starting from the left side <br>
[`right()`](https://www.techonthenet.com/sql_server/functions/right.php) pulls a substring starting from the right side <br>
[`substring()`](https://www.techonthenet.com/sql_server/functions/substring.php) pulls a substring starting from anywhere (as long as you specify the start position and the length)
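
As a quick illustration, the three functions can be tried on the example string from above. This is a minimal sketch written for Postgres; the column aliases are made up for readability.

```sql
-- Character positions start at 1, not 0
select
  left('Cascade Data Labs', 7)                as left_7,        -- 'Cascade'
  right('Cascade Data Labs', 4)               as right_4,       -- 'Labs'
  substring('Cascade Data Labs' from 9 for 4) as from_9_for_4;  -- 'Data'
```

Postgres also accepts the comma form `substring('Cascade Data Labs', 9, 4)`, which matches the SQL Server syntax described in the linked pages.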

In [27]:
%%sql
-- Pull the first three characters of each actor's first name
select first_name, left(first_name, 3) as first_three from actor limit 5


### String Matching

When working with data, you often want to filter your output to certain instances of strings. For example, you may want to filter out specific products or filter for a certain customer. We will investigate different string matching methods using the `film` and `inventory` tables.

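Common string cleaning functions: <br>
`upper()` converts a string to uppercase <br>
`lower()` converts a string to lowercase <br>
`trim()` removes leading and trailing spaces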

**WHERE-RELATIONAL OPERATOR Clause**  
The WHERE-RELATIONAL OPERATOR Clause is the simplest string matching method (and is not limited to the string data type), but also the least robust. 

**TIP:** For exact string matching of alphabetical strings, it is often easiest to apply `UPPER(string)` (a function that makes the entire string uppercase) to the field you are searching, because string data can be inconsistent.
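
To see why cleaning matters for exact matching, the sketch below compares a few literal strings directly. The values are illustrative only and the aliases are hypothetical; each expression returns a boolean.

```sql
-- String comparison with = is case-sensitive in Postgres
select
  'African Egg' = 'african egg'                 as raw_match,      -- false: the capitalization differs
  upper('African Egg') = upper('african egg')   as upper_match,    -- true: both sides are uppercased
  upper(trim('  african egg ')) = 'AFRICAN EGG' as cleaned_match;  -- true: spaces trimmed, then uppercased
```

Applying the cleaning function to the column and writing the search term in the matching case keeps the comparison consistent no matter how the data was entered.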

**Film Table**
To get a sense for the information in the film table, let's pull all the fields for the film "Airport Pollock". 

### Example Query 3:

In [25]:
%%sql
SELECT * 
FROM film 
-- WHERE-RELATIONAL OPERATOR CLAUSE
WHERE UPPER(title) = 'AIRPORT POLLOCK'

 * postgresql+psycopg2://andrewmahler@localhost:5432/sqltool
1 rows affected.


film_id,title,description,release_year,language_id,rental_duration,rental_rate,length,replacement_cost,rating,last_update,special_features
8,Airport Pollock,A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India,2006,1,6,4.99,54,15.99,R,2013-05-26 14:50:58.951000,['Trailers']


The above output reveals some interesting things about this film table:

    1. This DVD store contains films where moose are protagonists and monkeys are anti-heroes.
    2. Values for the special features field are arrays.
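
Array values can be inspected with a few built-in Postgres operations. The sketch below works on a literal array invented for illustration; in the `film` table the array would come from the `special_features` column instead.

```sql
-- Illustrative literal array; array indexing in Postgres starts at 1
select
  (array['Trailers', 'Deleted Scenes'])[1]              as first_element,  -- 'Trailers'
  'Trailers' = any(array['Trailers', 'Deleted Scenes']) as has_trailers,   -- true
  array_length(array['Trailers', 'Deleted Scenes'], 1)  as element_count;  -- 2
```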

### Test Query 3: In the cell below, write a query to pull the description, release_year, and rental_rate for the film "african egg". Write your query on the lines below "answer <<".

In [26]:
%%sql 
answer << 
select description, release_year, rental_rate from film where UPPER(title) = 'AFRICAN EGG'

 * postgresql+psycopg2://andrewmahler@localhost:5432/sqltool
1 rows affected.
Returning data to local variable answer


### Question 3: Run the cell below after you've run Test Query 3

In [20]:
# Check your query by running this cell
''' 
NOTE: The output of a dataframe contains an index that uniquely identifies rows,
which is different than the output of a SQL query. 

This does not mean that it serves as the table's primary key.
'''
AfricanEggQuestion(answer)

Correct!


**WHERE-LIKE Clause**  
In that last query, we pulled rows with the title "african egg". But what if we wanted to pull rows where the title contains the word "africa" (or an extension of the word, like "african")? To accomplish this, you need the LIKE operator, which allows for pattern matching. LIKE uses wildcards to identify patterns. There are two wildcards used in conjunction with the LIKE operator:   
    1. % The percent sign represents zero, one, or multiple characters       
    2. _ The underscore represents exactly one character
  
**TIP**: When pattern matching, make your search pattern as specific as possible. The more general your pattern is, the more likely it will return unexpected outputs.
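
The difference between the two wildcards, and the effect of a more specific pattern, can be sketched with literal strings. The values below are illustrative and the aliases are hypothetical. Note that a LIKE pattern must match the entire string, not just part of it.

```sql
select
  'ITALIAN AFRICAN' like '%AFRICA%' as contains_africa,    -- true: % matches any text on either side
  'ITALIAN AFRICAN' like 'AFRICA%'  as starts_with_africa, -- false: the string does not begin with AFRICA
  'AFRICAN EGG'     like 'AFRICA%'  as egg_starts_with,    -- true
  'AFRICAN'         like 'AFRICA_'  as one_extra_char,     -- true: _ matches the final N
  'AFRICA'          like 'AFRICA_'  as no_extra_char;      -- false: _ requires exactly one character
```

Dropping the leading `%` narrows the match to titles that start with the word, which is one way to make a pattern more specific.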

### Example Query 4:

In [22]:
%%sql
SELECT film_id, title FROM film WHERE UPPER(title) LIKE '%AFRICA%'

 * postgresql+psycopg2://andrewmahler@localhost:5432/sqltool
3 rows affected.


film_id,title
5,African Egg
472,Italian African
637,Open African
