In [4]:
import pandas as pd
import psycopg2

def execute_query(sql_query, dbname='sakila', user='postgres', password='postgres', port='5432'):
    """Run sql_query against PostgreSQL and return the result as a DataFrame."""
    # Create a connection to the PostgreSQL database
    conn = psycopg2.connect(dbname=dbname, user=user, password=password, port=port)
    try:
        # Execute the query and load the results into a DataFrame.
        # pandas 2.x emits a UserWarning for DBAPI connections other than
        # sqlite3 (it prefers SQLAlchemy); the query still runs.
        return pd.read_sql(sql_query, conn)
    finally:
        # Close the connection even if the query raises an error
        conn.close()



# Text data types

You learned about some of the common data types you'll work with in PostgreSQL, some characteristics of these types, and how to determine the data type of a column in an existing table. Think back and answer the following question:

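Which of the following are valid text data types in PostgreSQL? All three are:

- `TEXT`: variable-length strings with no length limit
- `CHAR(n)`: fixed-length strings, blank-padded to n characters (a bare `CHAR` means `CHAR(1)`)
- `VARCHAR(n)`: variable-length strings limited to n characters (a bare `VARCHAR` accepts any length)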
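
The types also differ in what happens when a string is stored. The sketch below is a small Python model of those rules. It is an illustration only, not a PostgreSQL or psycopg2 API, and the names `store_text`, `store_varchar`, `store_char` and `ValueTooLong` are hypothetical. It follows PostgreSQL's documented behaviour in three cases:

- A value longer than n raises an error.
- If the excess characters are all spaces, they are silently truncated instead.
- `CHAR(n)` pads shorter values with spaces.

```python
"""A Python model (not a real API) of how PostgreSQL stores strings in
TEXT, VARCHAR(n) and CHAR(n) columns."""


class ValueTooLong(ValueError):
    """Hypothetical stand-in for PostgreSQL's 'value too long for type ...' error."""


def _enforce_limit(value: str, n: int, type_name: str) -> str:
    """Apply the length rule shared by VARCHAR(n) and CHAR(n) assignments."""
    if len(value) <= n:
        return value
    if value[n:].strip(" ") == "":
        # Only spaces exceed the limit: they are silently truncated
        return value[:n]
    raise ValueTooLong(f"value too long for type {type_name}({n})")


def store_text(value: str) -> str:
    """TEXT has no declared limit, so the value is stored unchanged."""
    return value


def store_varchar(value: str, n: int) -> str:
    """VARCHAR(n) enforces the limit but never pads shorter values."""
    return _enforce_limit(value, n, "character varying")


def store_char(value: str, n: int = 1) -> str:
    """CHAR(n) enforces the limit, then blank-pads to exactly n characters.

    A bare CHAR means CHAR(1), hence the default.
    """
    return _enforce_limit(value, n, "character").ljust(n)


print(repr(store_varchar("hello", 45)))   # 'hello'
print(repr(store_char("ab", 5)))          # 'ab   '
print(repr(store_varchar("abc   ", 3)))   # 'abc'
try:
    store_varchar("abcdef", 3)
except ValueTooLong as err:
    print(err)  # value too long for type character varying(3)
```

One behaviour is not modelled here: PostgreSQL treats the trailing spaces in `CHAR(n)` values as insignificant when comparing them.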

# Getting information about your database

PostgreSQL provides a system schema called `INFORMATION_SCHEMA`: a set of SQL-standard views that let us extract information about the objects in our database, including tables, views, and columns.

In this exercise, we will query the `tables` view of `INFORMATION_SCHEMA` to discover the name, type, schema, and catalog of every table and view in the DVD Rentals database. We will then use the results to look up additional information about the columns in our tables.

In [9]:
query_result = execute_query(
    """
 -- Select all columns from the INFORMATION_SCHEMA.tables view
 SELECT * 
 FROM INFORMATION_SCHEMA.tables
 -- Filter by schema
 WHERE table_schema = 'public';
    """)
query_result.head()

| | table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sakila | public | actor_info | VIEW |  |  |  |  |  | NO | NO |  |
| 1 | sakila | public | customer_list | VIEW |  |  |  |  |  | NO | NO |  |
| 2 | sakila | public | film_list | VIEW |  |  |  |  |  | NO | NO |  |
| 3 | sakila | public | nicer_but_slower_film_list | VIEW |  |  |  |  |  | NO | NO |  |
| 4 | sakila | public | sales_by_film_category | VIEW |  |  |  |  |  | NO | NO |  |


In [8]:
query_result = execute_query(
    """
 -- Select all columns from the INFORMATION_SCHEMA.columns view
 SELECT * 
 FROM INFORMATION_SCHEMA.COLUMNS 
 WHERE table_name = 'actor';
    """)
query_result.head()

| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sakila | public | actor | actor_id | 1 | nextval('actor_actor_id_seq'::regclass) | NO | integer |  |  | ... | NO |  |  |  |  |  | NO | NEVER |  | YES |
| 1 | sakila | public | actor | last_update | 4 | now() | NO | timestamp without time zone |  |  | ... | NO |  |  |  |  |  | NO | NEVER |  | YES |
| 2 | sakila | public | actor | first_name | 2 |  | NO | character varying | 45 | 180 | ... | NO |  |  |  |  |  | NO | NEVER |  | YES |
| 3 | sakila | public | actor | last_name | 3 |  | NO | character varying | 45 | 180 | ... | NO |  |  |  |  |  | NO | NEVER |  | YES |


# Determining data types

The `columns` view of `INFORMATION_SCHEMA` also lets us extract the data type of each column in a table. Beyond `data_type` itself, it records details such as:

- the maximum length of a `CHAR` or `VARCHAR` column (`character_maximum_length`)
- the precision and scale of a `DECIMAL` or `NUMERIC` column (`numeric_precision`, `numeric_scale`)

Note that `DECIMAL` and `NUMERIC` are exact, arbitrary-precision types, not floating-point types.

In [10]:
query_result = execute_query(
    """
-- Get the column name and data type
SELECT
    column_name,
    data_type
-- From the columns view of the information schema
FROM INFORMATION_SCHEMA.COLUMNS 
-- For the customer table
WHERE table_name = 'customer';
    """)
query_result.head()

| | column_name | data_type |
|---|---|---|
| 0 | active | integer |
| 1 | store_id | integer |
| 2 | create_date | date |
| 3 | last_update | timestamp without time zone |
| 4 | customer_id | integer |
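
On its own, `data_type` drops the declared length or precision. The sketch below assumes you extend the `SELECT` above with the `character_maximum_length`, `numeric_precision` and `numeric_scale` columns of `INFORMATION_SCHEMA.COLUMNS`. The hypothetical helper `format_column_type` then rebuilds a readable declaration from one row. pandas turns missing values into `NaN`, which is also why lengths appear as floats such as `45.0`, so the helper treats both `None` and `NaN` as absent.

```python
import math
from typing import Optional


def _present(value: Optional[float]) -> bool:
    """True unless the value is missing (None from SQL NULL, NaN from pandas)."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def format_column_type(data_type: str,
                       character_maximum_length: Optional[float] = None,
                       numeric_precision: Optional[float] = None,
                       numeric_scale: Optional[float] = None) -> str:
    """Hypothetical helper: rebuild e.g. 'character varying(45)' from
    information_schema.columns fields."""
    if _present(character_maximum_length):
        # CHAR(n) / VARCHAR(n): append the declared length
        return f"{data_type}({int(character_maximum_length)})"
    if data_type == "numeric" and _present(numeric_precision):
        # Integer columns also report a precision (in bits), so only
        # numeric columns get precision and scale appended here
        scale = int(numeric_scale) if _present(numeric_scale) else 0
        return f"numeric({int(numeric_precision)},{scale})"
    return data_type


print(format_column_type("character varying", 45.0))      # character varying(45)
print(format_column_type("integer", float("nan"), 32, 0))  # integer
print(format_column_type("numeric", None, 5, 2))           # numeric(5,2)
```

The `actor` output also shows a `character_octet_length` of 180 for the `character varying(45)` columns. Assuming a UTF8 database encoding, this is 45 characters × 4 bytes, the maximum width of one UTF-8 character.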


# Properties of date and time data types

Which of the following statements are correct? All four are:

- `TIMESTAMP` data types contain both date and time values.
- `DATE` values are displayed in `yyyy-mm-dd` format by default.
- `INTERVAL` types represent periods of time.
- `TIME` data types are stored without a time zone by default.
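
By default, psycopg2 returns PostgreSQL date and time values as Python `datetime` objects:

| PostgreSQL type | Python type |
|---|---|
| `DATE` | `datetime.date` |
| `TIMESTAMP` | `datetime.datetime` |
| `TIME` | `datetime.time` |
| `INTERVAL` | `datetime.timedelta` |

The properties above therefore carry over to Python. The standard-library sketch below uses sample values copied from the first row of the `rental` output in the next section. The variable names are illustrative.

```python
from datetime import date, datetime, time, timedelta

# Sample values from the first row of the rental output
rental_ts = datetime(2005, 5, 24, 22, 53, 30)    # TIMESTAMP: date and time together
returned_ts = datetime(2005, 5, 26, 22, 4, 30)

rental_day: date = rental_ts.date()      # the DATE part only
rental_clock: time = rental_ts.time()    # the TIME part only

# DATE values print as yyyy-mm-dd (ISO 8601)
print(rental_day.isoformat())             # 2005-05-24

# A plain time carries no time zone, like PostgreSQL's TIME WITHOUT TIME ZONE
print(rental_clock, rental_clock.tzinfo)  # 22:53:30 None

# Subtracting two timestamps yields a period of time (INTERVAL <-> timedelta)
loan: timedelta = returned_ts - rental_ts
print(loan)                               # 1 day, 23:11:00
```

In PostgreSQL, subtracting one timestamp from another likewise returns an `INTERVAL`. Unlike `INTERVAL`, Python's `timedelta` has no month unit.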

# Interval data types

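An `INTERVAL` represents a span of time, such as `'3 days'` or `'2 hours 30 minutes'`. Adding an interval to a `DATE` or `TIMESTAMP`, or subtracting one from it, shifts the value by that amount, which makes `INTERVAL` a convenient tool for date arithmetic. The query below uses it to compute an expected return date three days after each rental.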

In [11]:
query_result = execute_query(
    """
SELECT
    -- Select the rental and return dates
    rental_date,
    return_date,
    -- Calculate the expected_return_date
    rental_date + INTERVAL '3 days' AS expected_return_date
FROM rental;
    """)
query_result.head()

| | rental_date | return_date | expected_return_date |
|---|---|---|---|
| 0 | 2005-05-24 22:53:30 | 2005-05-26 22:04:30 | 2005-05-27 22:53:30 |
| 1 | 2005-05-24 22:54:33 | 2005-05-28 19:40:33 | 2005-05-27 22:54:33 |
| 2 | 2005-05-24 23:03:39 | 2005-06-01 22:12:39 | 2005-05-27 23:03:39 |
| 3 | 2005-05-24 23:04:41 | 2005-06-03 01:43:41 | 2005-05-27 23:04:41 |
| 4 | 2005-05-24 23:05:21 | 2005-06-02 04:33:21 | 2005-05-27 23:05:21 |
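
The same arithmetic can flag late returns. The Python sketch below assumes the three-day rental period from the query above. `RENTAL_PERIOD` and `lateness` are illustrative names, and the rows are copied from the output. For time-zone-free timestamps like these, `timedelta(days=3)` is exactly 72 hours, which matches `INTERVAL '3 days'`.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed rental period, mirroring INTERVAL '3 days' in the query above
RENTAL_PERIOD = timedelta(days=3)


def lateness(rental_date: datetime, return_date: datetime) -> Optional[timedelta]:
    """Return how late a rental came back, or None if it was on time."""
    expected_return_date = rental_date + RENTAL_PERIOD
    if return_date <= expected_return_date:
        return None
    return return_date - expected_return_date


# (rental_date, return_date) pairs from the first two rows of the output above
rentals = [
    (datetime(2005, 5, 24, 22, 53, 30), datetime(2005, 5, 26, 22, 4, 30)),
    (datetime(2005, 5, 24, 22, 54, 33), datetime(2005, 5, 28, 19, 40, 33)),
]

for rental_date, return_date in rentals:
    print(rental_date, lateness(rental_date, return_date))
# 2005-05-24 22:53:30 None
# 2005-05-24 22:54:33 20:46:00
```

In SQL, the equivalent test is `return_date > rental_date + INTERVAL '3 days'`. The expression `return_date - (rental_date + INTERVAL '3 days')` gives the lateness as an `INTERVAL`.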


# Accessing data in an ARRAY

In our DVD Rentals database, the `film` table has a `special_features` column of type `TEXT[]`, that is, an ARRAY of `TEXT` values. Like any PostgreSQL ARRAY type, it stores several values of the same type in a single column. This comes in handy for things like a list of phone numbers or email addresses, as we saw in the lesson.

Let's take a look at the `special_features` column and also practice accessing data in the ARRAY.

In [12]:
query_result = execute_query(
    """
-- Select the title and special features columns
SELECT 
  title, 
  special_features 
FROM film;
    """)
query_result.head()

| | title | special_features |
|---|---|---|
| 0 | ACADEMY DINOSAUR | [Deleted Scenes, Behind the Scenes] |
| 1 | ACE GOLDFINGER | [Trailers, Deleted Scenes] |
| 2 | ADAPTATION HOLES | [Trailers, Deleted Scenes] |
| 3 | AFFAIR PREJUDICE | [Commentaries, Behind the Scenes] |
| 4 | AFRICAN EGG | [Deleted Scenes] |


In [13]:
query_result = execute_query(
    """
-- Select the title and special features columns
SELECT 
  title, 
  special_features 
FROM film
-- Keep films whose first special feature is 'Trailers' (arrays start at index 1)
WHERE special_features[1] = 'Trailers';
    """)
query_result.head()

| | title | special_features |
|---|---|---|
| 0 | ACE GOLDFINGER | [Trailers, Deleted Scenes] |
| 1 | ADAPTATION HOLES | [Trailers, Deleted Scenes] |
| 2 | AIRPLANE SIERRA | [Trailers, Deleted Scenes] |
| 3 | AIRPORT POLLOCK | [Trailers] |
| 4 | ALABAMA DEVIL | [Trailers, Deleted Scenes] |


In [14]:
query_result = execute_query(
    """
-- Select the title and special features columns
SELECT 
  title, 
  special_features 
FROM film
-- Keep films whose second special feature is 'Deleted Scenes'
WHERE special_features[2] = 'Deleted Scenes';
    """)
query_result.head()

| | title | special_features |
|---|---|---|
| 0 | ACE GOLDFINGER | [Trailers, Deleted Scenes] |
| 1 | ADAPTATION HOLES | [Trailers, Deleted Scenes] |
| 2 | AIRPLANE SIERRA | [Trailers, Deleted Scenes] |
| 3 | ALABAMA DEVIL | [Trailers, Deleted Scenes] |
| 4 | ALADDIN CALENDAR | [Trailers, Deleted Scenes] |
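
PostgreSQL arrays are 1-indexed by default. A subscript outside the array's bounds returns `NULL` instead of raising an error. The Python model below reproduces that behaviour. `pg_subscript` is a hypothetical helper, and the rows are copied from the unfiltered output earlier.

The model also exposes a side effect of filtering by position. `AFRICAN EGG` has `Deleted Scenes` only as its first feature, so `special_features[2] = 'Deleted Scenes'` skips it.

```python
from typing import Optional, Sequence


def pg_subscript(array: Optional[Sequence[str]], index: int) -> Optional[str]:
    """Model PostgreSQL's array[index] for a default 1-based array.

    None plays the role of SQL NULL.
    """
    if array is None or not 1 <= index <= len(array):
        return None  # NULL array or out-of-range subscript -> NULL, no error
    return array[index - 1]  # shift the 1-based SQL index to Python's 0-based


# Rows copied from the unfiltered special_features output above
films = {
    "ACADEMY DINOSAUR": ["Deleted Scenes", "Behind the Scenes"],
    "ACE GOLDFINGER": ["Trailers", "Deleted Scenes"],
    "ADAPTATION HOLES": ["Trailers", "Deleted Scenes"],
    "AFFAIR PREJUDICE": ["Commentaries", "Behind the Scenes"],
    "AFRICAN EGG": ["Deleted Scenes"],
}

# Equivalent of WHERE special_features[2] = 'Deleted Scenes'
# (a NULL comparison result drops the row, just like False)
second_is_deleted = [title for title, features in films.items()
                     if pg_subscript(features, 2) == "Deleted Scenes"]
print(second_is_deleted)  # ['ACE GOLDFINGER', 'ADAPTATION HOLES']
```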


# Searching an ARRAY with ANY

PostgreSQL can also filter rows by searching an ARRAY for a value. The `ANY` construct, written `value = ANY (array)`, is true when `value` equals the element at any position in the array. Unlike the index-based filters above, it therefore finds 'Trailers' wherever it appears. Here's an example.

In [18]:
query_result = execute_query(
    """
SELECT
  title, 
  special_features 
FROM film 
-- Keep films where 'Trailers' matches any element of special_features
WHERE 'Trailers' = ANY (special_features);
    """)
query_result.head()

| | title | special_features |
|---|---|---|
| 0 | ACE GOLDFINGER | [Trailers, Deleted Scenes] |
| 1 | ADAPTATION HOLES | [Trailers, Deleted Scenes] |
| 2 | AIRPLANE SIERRA | [Trailers, Deleted Scenes] |
| 3 | AIRPORT POLLOCK | [Trailers] |
| 4 | ALABAMA DEVIL | [Trailers, Deleted Scenes] |
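
`ANY` follows SQL's three-valued logic:

- It is true if any element matches.
- It is false if none do, including for an empty array.
- It is `NULL` if the array itself is `NULL`, or if no element matches but the array contains a `NULL`.

The sketch below models those rules in Python, with `None` standing in for `NULL`. `pg_any_equals` is a hypothetical helper, not a library function.

```python
from typing import Optional, Sequence


def pg_any_equals(value: Optional[str],
                  array: Optional[Sequence[Optional[str]]]) -> Optional[bool]:
    """Model `value = ANY (array)`; None stands for SQL NULL."""
    if array is None:
        return None            # NULL array -> NULL
    if len(array) == 0:
        return False           # no comparisons to make -> false
    if value is None:
        return None            # NULL compared with anything -> NULL
    saw_null = False
    for element in array:
        if element is None:
            saw_null = True    # NULL element: this comparison is unknown
        elif element == value:
            return True        # one match is enough
    return None if saw_null else False


# Rows copied from the unfiltered special_features output above
films = {
    "ACADEMY DINOSAUR": ["Deleted Scenes", "Behind the Scenes"],
    "ACE GOLDFINGER": ["Trailers", "Deleted Scenes"],
    "AFRICAN EGG": ["Deleted Scenes"],
}

# WHERE keeps a row only when the condition is true (not false, not NULL)
with_trailers = [title for title, features in films.items()
                 if pg_any_equals("Trailers", features) is True]
print(with_trailers)  # ['ACE GOLDFINGER']
```

The `NULL` cases matter when negating. `NOT ('Trailers' = ANY (special_features))` also drops rows whose array is `NULL`, or whose array contains `NULL` elements but no match.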


# Searching an ARRAY with @>

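The contains operator `@>` is an alternative to `ANY` for searching an ARRAY. The expression `a @> b` is true when array `a` contains every element of array `b`, in any order. Wrapping a single value as `ARRAY['Deleted Scenes']` therefore checks whether that value appears anywhere in `special_features`.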

In [19]:
query_result = execute_query(
    """
SELECT 
  title, 
  special_features 
FROM film 
-- Filter where special_features contains 'Deleted Scenes'
WHERE special_features @> ARRAY['Deleted Scenes'];
    """)
query_result.head()

| | title | special_features |
|---|---|---|
| 0 | ACADEMY DINOSAUR | [Deleted Scenes, Behind the Scenes] |
| 1 | ACE GOLDFINGER | [Trailers, Deleted Scenes] |
| 2 | ADAPTATION HOLES | [Trailers, Deleted Scenes] |
| 3 | AFRICAN EGG | [Deleted Scenes] |
| 4 | AGENT TRUMAN | [Deleted Scenes] |
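
`@>` compares two whole arrays. `a @> b` is true when every element of `b` appears in `a`, regardless of order or duplicates. PostgreSQL treats `NULL` elements as never equal to anything, and a `NULL` array on either side yields `NULL`. The Python model below follows these rules. `pg_contains` is a hypothetical helper, and the rows are copied from the earlier unfiltered output.

```python
from typing import Optional, Sequence


def pg_contains(left: Optional[Sequence[Optional[str]]],
                right: Optional[Sequence[Optional[str]]]) -> Optional[bool]:
    """Model `left @> right`; None stands for SQL NULL."""
    if left is None or right is None:
        return None  # NULL array on either side -> NULL
    # Every element of `right` must equal some element of `left`;
    # a NULL element never matches anything.
    return all(element is not None and element in left for element in right)


# Rows copied from the unfiltered special_features output above
films = {
    "ACADEMY DINOSAUR": ["Deleted Scenes", "Behind the Scenes"],
    "ACE GOLDFINGER": ["Trailers", "Deleted Scenes"],
    "ADAPTATION HOLES": ["Trailers", "Deleted Scenes"],
    "AFFAIR PREJUDICE": ["Commentaries", "Behind the Scenes"],
    "AFRICAN EGG": ["Deleted Scenes"],
}

# Equivalent of WHERE special_features @> ARRAY['Deleted Scenes']
deleted = [t for t, f in films.items() if pg_contains(f, ["Deleted Scenes"]) is True]
print(deleted)  # ['ACADEMY DINOSAUR', 'ACE GOLDFINGER', 'ADAPTATION HOLES', 'AFRICAN EGG']

# A multi-element right-hand array requires all of its values to be present
both = [t for t, f in films.items()
        if pg_contains(f, ["Deleted Scenes", "Trailers"]) is True]
print(both)  # ['ACE GOLDFINGER', 'ADAPTATION HOLES']
```

The second filter corresponds to `special_features @> ARRAY['Deleted Scenes', 'Trailers']`. A single `= ANY` test cannot express it.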
