**Setup**

In [1]:
# Library
import pandas as pd
from sqlalchemy import create_engine

In [2]:
# Define the database connection parameters
db_params = {
    'host': 'localhost',
    'database': 'dvdrental',
    'user': 'postgres',
    'password': 'admin',
    'port': '5432'  # PostgreSQL default port
}

# Connect to the 'dvdrental' database
engine = create_engine(f'postgresql://{db_params["user"]}:{db_params["password"]}@{db_params["host"]}:{db_params["port"]}/{db_params["database"]}')
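
The connection string above is assembled by hand from `db_params`. As a minimal sketch, the helper below builds the same kind of URL while including the port and percent-encoding the credentials, so that a password containing characters such as `@` or `/` does not break the URL. The function name `build_postgres_url` is hypothetical and not part of SQLAlchemy.

```python
from urllib.parse import quote_plus

# Same parameters as in the setup cell.
db_params = {
    'host': 'localhost',
    'database': 'dvdrental',
    'user': 'postgres',
    'password': 'admin',
    'port': '5432',  # PostgreSQL default port
}

def build_postgres_url(params):
    """Build a postgresql:// URL from a parameter dict (hypothetical helper)."""
    # Percent-encode the credentials so special characters stay unambiguous.
    user = quote_plus(params['user'])
    password = quote_plus(params['password'])
    return (
        f"postgresql://{user}:{password}"
        f"@{params['host']}:{params['port']}/{params['database']}"
    )

print(build_postgres_url(db_params))
# postgresql://postgres:admin@localhost:5432/dvdrental
```

The resulting string can be passed to `create_engine` in place of the inline f-string.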

**Getting information about your database**

As we saw in the video, PostgreSQL has a system schema called `INFORMATION_SCHEMA`, present in every database, that allows us to extract information about objects, including tables, in our database.

In this exercise we will query the `tables` view of `INFORMATION_SCHEMA` to discover information about the tables in the DVD Rentals database, including the name, type, schema, and catalog of all tables and views. We will then use the results to get additional information about the columns in our tables.

**Instructions**

- Select all columns from the `INFORMATION_SCHEMA.TABLES` system view. Limit the results to rows whose `table_schema` is `public`.

In [4]:
query = """
 -- Select all columns from the TABLES system view
 SELECT * 
 FROM INFORMATION_SCHEMA.TABLES
 -- Filter by schema
 WHERE table_schema = 'public';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,table_catalog,table_schema,table_name,table_type,self_referencing_column_name,reference_generation,user_defined_type_catalog,user_defined_type_schema,user_defined_type_name,is_insertable_into,is_typed,commit_action
0,dvdrental,public,actor,BASE TABLE,,,,,,YES,NO,
1,dvdrental,public,actor_info,VIEW,,,,,,NO,NO,
2,dvdrental,public,customer_list,VIEW,,,,,,NO,NO,
3,dvdrental,public,film_list,VIEW,,,,,,NO,NO,
4,dvdrental,public,nicer_but_slower_film_list,VIEW,,,,,,NO,NO,
5,dvdrental,public,sales_by_film_category,VIEW,,,,,,NO,NO,
6,dvdrental,public,store,BASE TABLE,,,,,,YES,NO,
7,dvdrental,public,sales_by_store,VIEW,,,,,,NO,NO,
8,dvdrental,public,staff_list,VIEW,,,,,,NO,NO,
9,dvdrental,public,address,BASE TABLE,,,,,,YES,NO,


- Select all columns from the `INFORMATION_SCHEMA.COLUMNS` system view. Limit the results to rows whose `table_name` is `actor`.

In [5]:
query = """
 -- Select all columns from the COLUMNS system view
 SELECT * 
 FROM INFORMATION_SCHEMA.COLUMNS 
 -- Limit to the actor table
 WHERE table_name = 'actor';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,...,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
0,dvdrental,public,actor,actor_id,1,nextval('actor_actor_id_seq'::regclass),NO,integer,,,...,NO,,,,,,NO,NEVER,,YES
1,dvdrental,public,actor,last_update,4,now(),NO,timestamp without time zone,,,...,NO,,,,,,NO,NEVER,,YES
2,dvdrental,public,actor,first_name,2,,NO,character varying,45.0,180.0,...,NO,,,,,,NO,NEVER,,YES
3,dvdrental,public,actor,last_name,3,,NO,character varying,45.0,180.0,...,NO,,,,,,NO,NEVER,,YES


**Determining data types**

The `columns` view of `INFORMATION_SCHEMA` also allows us to extract information about the data types of columns in a table. We can extract details such as the maximum character length of a `CHAR` or `VARCHAR` column or the precision of a `DECIMAL` or `NUMERIC` column (an exact numeric type, not a floating-point one).

Using the techniques you learned in the lesson, let's explore the `customer` table of our DVD Rental database.


**Instructions**

- Select the column name and data type from the `INFORMATION_SCHEMA.COLUMNS` system view.
- Limit results to only include the `customer` table.

In [6]:
query = """
-- Get the column name and data type
SELECT
    column_name,
    data_type
-- From the COLUMNS view of the information schema
FROM INFORMATION_SCHEMA.COLUMNS
-- For the customer table
WHERE table_name = 'customer';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,column_name,data_type
0,active,integer
1,store_id,smallint
2,create_date,date
3,last_update,timestamp without time zone
4,customer_id,integer
5,address_id,smallint
6,activebool,boolean
7,first_name,character varying
8,last_name,character varying
9,email,character varying
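
The query above returns only the name and type of each column, while the text also mentions character lengths and numeric precision. As a hedged sketch, the query below additionally selects `character_maximum_length`, `numeric_precision`, and `numeric_scale`, which are standard columns of `INFORMATION_SCHEMA.COLUMNS`. The names `COLUMN_DETAILS_SQL` and `describe_columns` are hypothetical, and the function assumes the `engine` from the setup cell and a pandas/SQLAlchemy combination that accepts a `text()` query with bound parameters.

```python
# Column details for one table; :table_name is a bound parameter,
# so the table name is never pasted into the SQL string.
COLUMN_DETAILS_SQL = """
SELECT
    column_name,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'public'
  AND table_name = :table_name
ORDER BY ordinal_position;
"""

def describe_columns(engine, table_name):
    """Return a DataFrame of column details for table_name (hypothetical helper)."""
    # Imported here so the query text can be inspected without a database driver.
    import pandas as pd
    from sqlalchemy import text

    return pd.read_sql_query(
        text(COLUMN_DETAILS_SQL), engine, params={"table_name": table_name}
    )
```

With the setup cell's engine, this would be called as `describe_columns(engine, 'customer')`.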


**Interval data types**

`INTERVAL` data types provide you with a very useful tool for performing arithmetic on date and time data types. For example, let's say our rental policy requires a DVD to be returned within 3 days. We can calculate the `expected_return_date` for a given DVD rental by adding an `INTERVAL` of 3 days to the `rental_date` from the `rental` table. We can then compare this result to the actual `return_date` to determine if the DVD was returned late.

**Instructions**

- Select the rental date and return date from the `rental` table.
- Add an `INTERVAL` of 3 days to the `rental_date` to calculate the expected return date.

In [7]:
query = """
SELECT
    -- Select the rental and return dates
    rental_date,
    return_date,
    -- Calculate the expected_return_date
    rental_date + INTERVAL '3 days' AS expected_return_date
FROM rental;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,rental_date,return_date,expected_return_date
0,2005-05-24 22:54:33,2005-05-28 19:40:33,2005-05-27 22:54:33
1,2005-05-24 23:03:39,2005-06-01 22:12:39,2005-05-27 23:03:39
2,2005-05-24 23:04:41,2005-06-03 01:43:41,2005-05-27 23:04:41
3,2005-05-24 23:05:21,2005-06-02 04:33:21,2005-05-27 23:05:21
4,2005-05-24 23:08:07,2005-05-27 01:32:07,2005-05-27 23:08:07
...,...,...,...
16039,2005-08-23 22:26:47,2005-08-27 18:02:47,2005-08-26 22:26:47
16040,2005-08-23 22:42:48,2005-08-25 02:48:48,2005-08-26 22:42:48
16041,2005-08-23 22:43:07,2005-08-31 21:33:07,2005-08-26 22:43:07
16042,2005-08-23 22:50:12,2005-08-30 01:01:12,2005-08-26 22:50:12
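
The query computes `expected_return_date` but stops short of the comparison described earlier. As a rough Python counterpart, the sketch below uses `datetime.timedelta` in the role of `INTERVAL '3 days'` and checks two rows from the output above. The names `RENTAL_PERIOD`, `expected_return`, and `is_late` are hypothetical.

```python
from datetime import datetime, timedelta

# Rental policy from the text: a DVD is due 3 days after it was rented.
RENTAL_PERIOD = timedelta(days=3)

def expected_return(rental_date):
    """Python counterpart of rental_date + INTERVAL '3 days'."""
    return rental_date + RENTAL_PERIOD

def is_late(rental_date, return_date):
    """True when the DVD came back after the expected return date."""
    return return_date > expected_return(rental_date)

# Rows 0 and 4 of the output above: (rental_date, return_date).
row0 = (datetime(2005, 5, 24, 22, 54, 33), datetime(2005, 5, 28, 19, 40, 33))
row4 = (datetime(2005, 5, 24, 23, 8, 7), datetime(2005, 5, 27, 1, 32, 7))

print(expected_return(row0[0]))  # 2005-05-27 22:54:33
print(is_late(*row0))            # True
print(is_late(*row4))            # False
```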


**Accessing data in an ARRAY**

In our DVD Rentals database, the `film` table has an ARRAY column, `special_features`, of type `TEXT[]`. Like any `ARRAY` data type in PostgreSQL, a `TEXT[]` column stores an array of `TEXT` values. This comes in handy when you want to store things like phone numbers or email addresses, as we saw in the lesson.

Let's take a look at the `special_features` column and also practice accessing data in the ARRAY.

**Instructions**

- Select the title and special features from the `film` table and compare the results between the two columns.

In [8]:
query = """
-- Select the title and special features column 
SELECT 
  title, 
  special_features 
FROM film;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,special_features
0,Chamber Italian,[Trailers]
1,Grosse Wonderful,[Behind the Scenes]
2,Airport Pollock,[Trailers]
3,Bright Encounters,[Trailers]
4,Academy Dinosaur,"[Deleted Scenes, Behind the Scenes]"
...,...,...
995,Young Language,"[Trailers, Behind the Scenes]"
996,Youth Kick,"[Trailers, Behind the Scenes]"
997,Zhivago Core,[Deleted Scenes]
998,Zoolander Fiction,"[Trailers, Deleted Scenes]"


- Select all films that have a special feature `Trailers` by filtering on the first index of the `special_features` ARRAY.

In [9]:
query = """
-- Select the title and special features column 
SELECT 
  title, 
  special_features 
FROM film
-- Use the array index of the special_features column
WHERE special_features[1] = 'Trailers';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,special_features
0,Chamber Italian,[Trailers]
1,Airport Pollock,[Trailers]
2,Bright Encounters,[Trailers]
3,Ace Goldfinger,"[Trailers, Deleted Scenes]"
4,Adaptation Holes,"[Trailers, Deleted Scenes]"
...,...,...
530,Yentl Idaho,"[Trailers, Commentaries, Deleted Scenes]"
531,Young Language,"[Trailers, Behind the Scenes]"
532,Youth Kick,"[Trailers, Behind the Scenes]"
533,Zoolander Fiction,"[Trailers, Deleted Scenes]"


- Now let's select all films that have `Deleted Scenes` in the second index of the `special_features` ARRAY.

In [10]:
query = """
-- Select the title and special features column 
SELECT 
  title, 
  special_features 
FROM film
-- Use the array index of the special_features column
WHERE special_features[2] = 'Deleted Scenes';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,special_features
0,Ace Goldfinger,"[Trailers, Deleted Scenes]"
1,Adaptation Holes,"[Trailers, Deleted Scenes]"
2,Airplane Sierra,"[Trailers, Deleted Scenes]"
3,Alabama Devil,"[Trailers, Deleted Scenes]"
4,Aladdin Calendar,"[Trailers, Deleted Scenes]"
...,...,...
241,Whisperer Giant,"[Trailers, Deleted Scenes]"
242,Wind Phantom,"[Commentaries, Deleted Scenes]"
243,Wizard Coldblooded,"[Commentaries, Deleted Scenes, Behind the Scenes]"
244,Working Microcosmos,"[Commentaries, Deleted Scenes]"
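
PostgreSQL arrays are 1-based by default, so `special_features[1]` is the first element, whereas Python lists start at 0. An out-of-range subscript in PostgreSQL yields NULL rather than an error, which is why films with a single feature simply drop out of the `special_features[2]` filter. The sketch below mimics that behaviour on a Python list; `pg_subscript` is a hypothetical helper, and it assumes the default lower bound of 1.

```python
def pg_subscript(values, index):
    """Mimic PostgreSQL's 1-based array subscript on a Python list.

    Returns None (standing in for SQL NULL) when the index is out of range.
    """
    if 1 <= index <= len(values):
        return values[index - 1]
    return None

ace_goldfinger = ["Trailers", "Deleted Scenes"]
chamber_italian = ["Trailers"]

print(pg_subscript(ace_goldfinger, 1))   # Trailers
print(pg_subscript(ace_goldfinger, 2))   # Deleted Scenes
print(pg_subscript(chamber_italian, 2))  # None
```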


**Searching an ARRAY with ANY**

As we saw in the video, PostgreSQL also provides the ability to filter results by searching for values in an ARRAY. The `ANY` function allows you to search for a value in any index position of an ARRAY. Here's an example.

```
WHERE 'search text' = ANY(array_name)
```

When using `ANY`, the value you are filtering on appears on the left side of the `=` comparison, and the name of the ARRAY column is passed as the argument to `ANY`.

**Instructions**

- Match `'Trailers'` in any index of the `special_features` ARRAY regardless of position.

In [11]:
query = """
SELECT 
  title, 
  special_features 
FROM film 
-- Modify the query to use the ANY function 
WHERE 'Trailers' = ANY (special_features);
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,special_features
0,Chamber Italian,[Trailers]
1,Airport Pollock,[Trailers]
2,Bright Encounters,[Trailers]
3,Ace Goldfinger,"[Trailers, Deleted Scenes]"
4,Adaptation Holes,"[Trailers, Deleted Scenes]"
...,...,...
530,Yentl Idaho,"[Trailers, Commentaries, Deleted Scenes]"
531,Young Language,"[Trailers, Behind the Scenes]"
532,Youth Kick,"[Trailers, Behind the Scenes]"
533,Zoolander Fiction,"[Trailers, Deleted Scenes]"
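
To make the difference between a positional filter and `ANY` concrete, the sketch below applies both to a few films taken from the outputs in this notebook. `matches_any` is a hypothetical, rough Python analogue of `value = ANY(array)`; it ignores SQL NULL handling.

```python
def matches_any(value, array):
    """Rough analogue of value = ANY(array): true if any element equals value."""
    return any(value == element for element in array)

# A few films copied from the outputs in this notebook.
films = {
    "Chamber Italian": ["Trailers"],
    "Academy Dinosaur": ["Deleted Scenes", "Behind the Scenes"],
    "Wind Phantom": ["Commentaries", "Deleted Scenes"],
    "Yentl Idaho": ["Trailers", "Commentaries", "Deleted Scenes"],
}

# Positional filter, like special_features[2] = 'Deleted Scenes'.
second_position = [
    title for title, features in films.items()
    if len(features) >= 2 and features[1] == "Deleted Scenes"
]

# Position-independent filter, like 'Deleted Scenes' = ANY(special_features).
any_position = [
    title for title, features in films.items()
    if matches_any("Deleted Scenes", features)
]

print(second_position)  # ['Wind Phantom']
print(any_position)     # ['Academy Dinosaur', 'Wind Phantom', 'Yentl Idaho']
```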


**Searching an ARRAY with @>**

The contains operator `@>` is an alternative to `ANY` for matching data in an ARRAY. It returns true when every element of the array on the right is present in the array on the left, and it uses the following syntax.

```
WHERE array_name @> ARRAY['search text'] :: type[]
```

Let's practice using this operator in the exercise.

**Instructions**

- Use the contains operator to match the text `Deleted Scenes` in the `special_features` column.

In [12]:
query = """
SELECT 
  title, 
  special_features 
FROM film 
-- Filter where special_features contains 'Deleted Scenes'
WHERE special_features @> ARRAY[ 'Deleted Scenes' ];
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,special_features
0,Academy Dinosaur,"[Deleted Scenes, Behind the Scenes]"
1,Ace Goldfinger,"[Trailers, Deleted Scenes]"
2,Adaptation Holes,"[Trailers, Deleted Scenes]"
3,African Egg,[Deleted Scenes]
4,Agent Truman,[Deleted Scenes]
...,...,...
498,Worst Banger,"[Deleted Scenes, Behind the Scenes]"
499,Wyoming Storm,[Deleted Scenes]
500,Yentl Idaho,"[Trailers, Commentaries, Deleted Scenes]"
501,Zhivago Core,[Deleted Scenes]
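
Unlike `ANY`, the `@>` operator takes an array on its right-hand side, so it can test for several values at once. The sketch below is a rough Python analogue, assuming no NULL elements; `array_contains` is a hypothetical helper. As in PostgreSQL, element order does not matter and duplicates are not treated specially.

```python
def array_contains(left, right):
    """Rough analogue of left @> right: every element of right appears in left."""
    return all(item in left for item in right)

yentl_idaho = ["Trailers", "Commentaries", "Deleted Scenes"]
zhivago_core = ["Deleted Scenes"]

# Single value, as in the exercise.
print(array_contains(yentl_idaho, ["Deleted Scenes"]))               # True
# Several values at once; order on the right does not matter.
print(array_contains(yentl_idaho, ["Deleted Scenes", "Trailers"]))   # True
print(array_contains(zhivago_core, ["Deleted Scenes", "Trailers"]))  # False
```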
