**Setup**

In [1]:
# Library
import pandas as pd
from sqlalchemy import create_engine

In [2]:
# Define the database connection parameters
db_params = {
    'host': 'localhost',
    'database': 'dvdrental',
    'user': 'postgres',
    'password': 'admin',
    'port': '5432'  # PostgreSQL default port
}

# Connect to the 'dvdrental' database (the port from db_params is now included)
engine = create_engine(
    f'postgresql://{db_params["user"]}:{db_params["password"]}'
    f'@{db_params["host"]}:{db_params["port"]}/{db_params["database"]}'
)
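
If the password ever contains characters such as `@` or `:`, the connection URL needs escaping. The sketch below shows one way to do that with the standard library only; `build_pg_url` is a hypothetical helper introduced for illustration, not part of SQLAlchemy or pandas.

```python
from urllib.parse import quote_plus


def build_pg_url(params):
    """Build a PostgreSQL connection URL from a parameter dict.

    Hypothetical helper for illustration; it escapes the user name and
    password so reserved characters do not break the URL.
    """
    user = quote_plus(params["user"])
    password = quote_plus(params["password"])
    return (
        f"postgresql://{user}:{password}"
        f"@{params['host']}:{params['port']}/{params['database']}"
    )


db_params = {
    "host": "localhost",
    "database": "dvdrental",
    "user": "postgres",
    "password": "admin",
    "port": "5432",
}

url = build_pg_url(db_params)
print(url)
```

The resulting string can be passed to `create_engine()` in the same way as the f-string used in the cell above.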

**Adding and subtracting date and time values**

In this exercise, you will calculate the actual number of days rented as well as the true `expected_return_date` by using the `rental_duration` column from the `film` table along with the familiar `rental_date` from the `rental` table.

This will require that you dust off the skills you learned from prior courses on how to join two or more tables together. To select columns from both the `film` and `rental` tables in a single query, we'll need to use the `inventory` table to join these two tables together since there is no explicit relationship between them. Let's give it a try!

**Instructions**

- Subtract the `rental_date` from the `return_date` to calculate the number of `days_rented`.

In [3]:
query = """
SELECT f.title, f.rental_duration,
       -- Calculate the number of days rented
       r.return_date - r.rental_date AS days_rented
FROM film AS f
     INNER JOIN inventory AS i ON f.film_id = i.film_id
     INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,rental_duration,days_rented
0,Academy Dinosaur,6,3 days 02:23:00
1,Academy Dinosaur,6,5 days 18:51:00
2,Academy Dinosaur,6,4 days 20:17:00
3,Academy Dinosaur,6,4 days 03:25:00
4,Academy Dinosaur,6,0 days 23:15:00
...,...,...,...
16039,Zorro Ark,3,7 days 00:49:00
16040,Zorro Ark,3,6 days 01:03:00
16041,Zorro Ark,3,4 days 18:41:00
16042,Zorro Ark,3,1 days 00:05:00
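
To make the timestamp subtraction concrete, here is a minimal Python sketch of the same arithmetic. The two timestamps are taken from an Academy Dinosaur row that appears in this notebook's output; Python's `timedelta` stands in for a PostgreSQL interval, which is an analogy rather than an exact equivalent.

```python
from datetime import datetime, timedelta

# One Academy Dinosaur rental, copied from this notebook's output
rental_date = datetime(2005, 7, 31, 21, 36, 7)
return_date = datetime(2005, 8, 3, 23, 59, 7)

# Subtracting two timestamps yields a duration, like return_date - rental_date in SQL
days_rented = return_date - rental_date
print(days_rented)  # 3 days, 2:23:00
```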


- Now use the `AGE()` function to calculate the `days_rented`.

In [4]:
query = """
SELECT f.title, f.rental_duration,
       -- Calculate the number of days rented
       AGE(r.return_date, r.rental_date) AS days_rented
FROM film AS f
     INNER JOIN inventory AS i ON f.film_id = i.film_id
     INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,rental_duration,days_rented
0,Academy Dinosaur,6,3 days 02:23:00
1,Academy Dinosaur,6,5 days 18:51:00
2,Academy Dinosaur,6,4 days 20:17:00
3,Academy Dinosaur,6,4 days 03:25:00
4,Academy Dinosaur,6,0 days 23:15:00
...,...,...,...
16039,Zorro Ark,3,7 days 00:49:00
16040,Zorro Ark,3,6 days 01:03:00
16041,Zorro Ark,3,4 days 18:41:00
16042,Zorro Ark,3,1 days 00:05:00


**INTERVAL arithmetic**

If you were running a real DVD Rental store, there would be times when you would need to determine what film titles were currently out for rental with customers. In the previous exercise, we saw that some of the records in the results had a `NULL` value for the `return_date`. This is because the rental was still outstanding.

Each film in the `film` table has an associated `rental_duration` column, which represents the number of days that a DVD can be rented by a customer before it is considered late. In this example, you will exclude rentals that have a `NULL` value for the `return_date` and also convert the `rental_duration` to an `INTERVAL` type. Here's a reminder of one method for performing this conversion.

```
-- Multiply a 1-day interval by a number of days (an interval cannot be multiplied by a timestamp)
SELECT INTERVAL '1' day * 3;

```
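
The same idea, scaling a one-day unit by an integer number of days, can be sketched in Python with `timedelta`. The value `6` is the `rental_duration` shown for Academy Dinosaur in this notebook; the variable names are illustrative.

```python
from datetime import timedelta

rental_duration = 6  # whole days, as stored in film.rental_duration

# Multiply a one-day duration by the integer, like INTERVAL '1' day * rental_duration
as_interval = timedelta(days=1) * rental_duration
print(as_interval)  # 6 days, 0:00:00
```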

**Instructions**

- Convert `rental_duration` by multiplying it with a 1-day `INTERVAL`.
- Subtract the `rental_date` from the `return_date` to calculate the number of `days_rented`.
- Exclude rentals with a `NULL` value for `return_date`.

In [5]:
query = """
SELECT
    f.title,
    -- Convert the rental_duration to an interval
    INTERVAL '1' day * f.rental_duration,
    -- Calculate the days rented as we did previously
    r.return_date - r.rental_date AS days_rented
FROM film AS f
    INNER JOIN inventory AS i ON f.film_id = i.film_id
    INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
-- Filter the query to exclude outstanding rentals
WHERE r.return_date IS NOT NULL
ORDER BY f.title;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,?column?,days_rented
0,Academy Dinosaur,6 days,3 days 19:44:00
1,Academy Dinosaur,6 days,0 days 23:15:00
2,Academy Dinosaur,6 days,2 days 19:02:00
3,Academy Dinosaur,6 days,5 days 18:51:00
4,Academy Dinosaur,6 days,7 days 03:12:00
...,...,...,...
15856,Zorro Ark,3 days,2 days 04:40:00
15857,Zorro Ark,3 days,0 days 21:00:00
15858,Zorro Ark,3 days,7 days 22:08:00
15859,Zorro Ark,3 days,4 days 18:41:00
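
As a rough Python analogue of the query above, the sketch below drops outstanding rentals (a missing `return_date`, represented here by `None`) before doing the arithmetic. The second row is made up to illustrate an outstanding rental; it is not taken from the database.

```python
from datetime import datetime, timedelta

# Rows of (title, rental_duration, rental_date, return_date).
# The second row is hypothetical: return_date is None, i.e. still outstanding.
rentals = [
    ("Academy Dinosaur", 6, datetime(2005, 7, 31, 21, 36, 7), datetime(2005, 8, 3, 23, 59, 7)),
    ("Zorro Ark", 3, datetime(2005, 6, 17, 15, 47, 0), None),
]

# Keep only returned rentals, like WHERE r.return_date IS NOT NULL
returned = [
    (title, timedelta(days=duration), return_date - rental_date)
    for title, duration, rental_date, return_date in rentals
    if return_date is not None
]

for title, allowed, days_rented in returned:
    print(title, allowed, days_rented)
```

Filtering first matters in Python because subtracting from `None` raises a `TypeError`, whereas in SQL the subtraction would simply yield `NULL`.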


**Calculating the expected return date**

So now that you've practiced how to add and subtract timestamps and perform relative calculations using intervals, let's use those new skills to calculate the actual expected return date of a specific rental. As you've seen in previous exercises, the `rental_duration` is the number of days allowed for a rental before it's considered late. To calculate the `expected_return_date` you will want to use the `rental_duration` and add it to the `rental_date`.

**Instructions**

- Convert `rental_duration` by multiplying it with a 1-day `INTERVAL`.
- Add it to the rental date.

In [6]:
query = """

SELECT
    f.title,
    r.rental_date,
    f.rental_duration,
    -- Add the rental duration to the rental date
    INTERVAL '1' day * f.rental_duration + r.rental_date AS expected_return_date,
    r.return_date
FROM film AS f
    INNER JOIN inventory AS i ON f.film_id = i.film_id
    INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,title,rental_date,rental_duration,expected_return_date,return_date
0,Academy Dinosaur,2005-07-31 21:36:07,6,2005-08-06 21:36:07,2005-08-03 23:59:07
1,Academy Dinosaur,2005-07-31 22:08:29,6,2005-08-06 22:08:29,2005-08-06 16:59:29
2,Academy Dinosaur,2005-07-27 07:51:11,6,2005-08-02 07:51:11,2005-08-01 04:08:11
3,Academy Dinosaur,2005-08-18 18:36:16,6,2005-08-24 18:36:16,2005-08-22 22:01:16
4,Academy Dinosaur,2005-08-02 00:47:19,6,2005-08-08 00:47:19,2005-08-03 00:02:19
...,...,...,...,...,...
16039,Zorro Ark,2005-06-17 15:47:00,3,2005-06-20 15:47:00,2005-06-24 16:36:00
16040,Zorro Ark,2005-08-02 21:00:05,3,2005-08-05 21:00:05,2005-08-08 22:03:05
16041,Zorro Ark,2005-08-01 10:11:25,3,2005-08-04 10:11:25,2005-08-06 04:52:25
16042,Zorro Ark,2005-05-31 11:10:17,3,2005-06-03 11:10:17,2005-06-01 11:15:17
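
The calculation can be checked by hand with a small Python sketch. `expected_return_date` is a hypothetical helper name, and the timestamps are copied from one of the Zorro Ark rows in the output above.

```python
from datetime import datetime, timedelta


def expected_return_date(rental_date, rental_duration):
    """Add rental_duration (whole days) to rental_date."""
    return rental_date + timedelta(days=rental_duration)


# One Zorro Ark rental from the output above (rental_duration = 3)
rental_date = datetime(2005, 6, 17, 15, 47, 0)
return_date = datetime(2005, 6, 24, 16, 36, 0)

expected = expected_return_date(rental_date, 3)
print(expected)                # 2005-06-20 15:47:00
print(return_date > expected)  # True: this rental came back late
```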


**Working with the current date and time**

Because the Sakila database is a bit dated and most of the date and time values are from 2005 or 2006, you are going to practice using the current date and time in our queries without using Sakila. You'll get back into working with this database in the next video and throughout the remainder of the course. For now, let's practice the techniques you learned about so far in this chapter to work with the current date and time.

As you learned in the video, `NOW()` and `CURRENT_TIMESTAMP` can be used interchangeably.

**Instructions**

- Use `NOW()` to select the current timestamp with timezone.
- Select the current date without any time value.
- Now, let's use the `CAST()` function to eliminate the timezone from the current timestamp.
- Finally, let's select the current date. Use `CAST()` to retrieve the same result from the `NOW()` function.

In [8]:
query = """
-- Select the current timestamp
SELECT NOW();
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,now
0,2023-09-19 16:16:04.143422+00:00


In [9]:
query = """
--Select the current timestamp without a timezone
SELECT CAST( NOW() AS timestamp )
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,now
0,2023-09-19 23:16:21.046312


In [11]:
query = """
SELECT 
    -- Select the current date
    CURRENT_DATE,
    -- CAST the result of the NOW() function to a date
    CAST( NOW() AS date )
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,current_date,now
0,2023-09-19,2023-09-19
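
For comparison, the same three values can be produced on the Python side. This is only an analogy: it uses the client machine's clock and local time zone, whereas the SQL above uses the database server's clock and the session time zone.

```python
from datetime import datetime, timezone

# Timezone-aware "now", comparable to NOW() / CURRENT_TIMESTAMP
now_with_tz = datetime.now(timezone.utc)

# Convert to the local zone, then drop the zone information,
# similar in spirit to CAST(NOW() AS timestamp)
now_without_tz = now_with_tz.astimezone().replace(tzinfo=None)

# Keep only the calendar date, similar to CAST(NOW() AS date) or CURRENT_DATE
today = now_without_tz.date()

print(now_with_tz)
print(now_without_tz)
print(today)
```

The conversion step explains why the outputs above differ by several hours: the timezone-aware value is displayed in UTC, while the cast value is expressed in the session's local time.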


**Manipulating the current date and time**

Most of the time when you work with the current date and time, you will want to transform, manipulate, or perform operations on the value in your queries. In this exercise, you will practice adding an `INTERVAL` to the current timestamp as well as perform some more advanced calculations.

Let's practice retrieving the current timestamp. For this exercise, please use `CURRENT_TIMESTAMP` instead of the `NOW()` function, and if you need to convert a date or time value to a timestamp data type, please use the PostgreSQL-specific `::` casting syntax rather than the `CAST()` function.

**Instructions**

- Select the current timestamp without timezone and alias it as `right_now`.

In [12]:
query = """
--Select the current timestamp without timezone
SELECT CURRENT_TIMESTAMP::timestamp AS right_now;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,right_now
0,2023-09-19 23:18:21.571930


- Now select a timestamp five days from now and alias it as `five_days_from_now`.

In [13]:
query = """
SELECT
    CURRENT_TIMESTAMP::timestamp AS right_now,
    interval '5 days' + CURRENT_TIMESTAMP AS five_days_from_now;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,right_now,five_days_from_now
0,2023-09-19 23:19:07.166335,2023-09-24 16:19:07.166335+00:00


- Finally, let's use a second-level precision with no fractional digits for both the `right_now` and `five_days_from_now` fields.

In [14]:
query = """
SELECT
    CURRENT_TIMESTAMP(0)::timestamp AS right_now,
    interval '5 days' + CURRENT_TIMESTAMP(0) AS five_days_from_now;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,right_now,five_days_from_now
0,2023-09-19 23:19:54,2023-09-24 16:19:54+00:00
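
The interval addition and the reduction to second-level precision can be sketched in Python as follows. A fixed timestamp taken from an earlier output in this exercise is used so the result is reproducible. Note one assumption: this sketch truncates the fractional seconds, while PostgreSQL rounds to the requested precision, so the two can differ when the fraction is 0.5 seconds or more.

```python
from datetime import datetime, timedelta

# Fixed value copied from an earlier output, so the example is reproducible
right_now = datetime(2023, 9, 19, 23, 19, 7, 166335)

# Like interval '5 days' + CURRENT_TIMESTAMP
five_days_from_now = right_now + timedelta(days=5)

# Drop fractional seconds (truncation, not rounding)
right_now_s = right_now.replace(microsecond=0)
five_days_s = five_days_from_now.replace(microsecond=0)

print(right_now_s)   # 2023-09-19 23:19:07
print(five_days_s)   # 2023-09-24 23:19:07
```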


**Using EXTRACT**

You can use `EXTRACT()` and `DATE_PART()` to easily create new fields in your queries by extracting sub-fields from a source timestamp field.

Now suppose you want to produce a predictive model that will help forecast DVD rental activity by day of the week. You could use the `EXTRACT()` function with the `dow` field identifier in your query to create a new field called `dayofweek` as a sub-field of the `rental_date` column from the `rental` table. In PostgreSQL, `dow` runs from 0 (Sunday) to 6 (Saturday).

You can `COUNT()` the number of records in the rental table for a given date range and aggregate by the newly created `dayofweek` column.

**Instructions**

- Get the day of the week from the `rental_date` column.

In [15]:
query = """
SELECT 
  -- Extract day of week from rental_date
  EXTRACT(dow FROM rental_date) AS dayofweek 
FROM rental 
LIMIT 100;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,dayofweek
0,2.0
1,2.0
2,2.0
3,2.0
4,2.0
...,...
95,3.0
96,3.0
97,3.0
98,3.0


- Count the total number of rentals by day of the week.

In [16]:
query = """
-- Extract day of week from rental_date
SELECT 
  EXTRACT(dow FROM rental_date) AS dayofweek, 
  -- Count the number of rentals
  COUNT(rental_id) as rentals 
FROM rental 
GROUP BY 1;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,dayofweek,rentals
0,0.0,2320
1,6.0,2311
2,1.0,2247
3,2.0,2463
4,3.0,2231
5,5.0,2272
6,4.0,2200


**Using DATE_TRUNC**

The `DATE_TRUNC()` function truncates a timestamp or interval to a specified precision. The precision values are a subset of the field identifiers that can be used with the `EXTRACT()` and `DATE_PART()` functions. Unlike those functions, `DATE_TRUNC()` returns a timestamp or interval rather than a number. For example:

```
SELECT DATE_TRUNC('month', TIMESTAMP '2005-05-21 15:30:30');

```

**Result: 2005-05-01 00:00:00**
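
Truncation simply resets every field below the chosen precision to its lowest value. A minimal Python sketch of that behavior for three precisions follows; `date_trunc` here is a hypothetical helper written for illustration, not a library function.

```python
from datetime import datetime


def date_trunc(precision, ts):
    """Mimic DATE_TRUNC for 'year', 'month' and 'day' (illustrative only)."""
    if precision == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported precision: {precision}")


ts = datetime(2005, 5, 21, 15, 30, 30)
print(date_trunc("year", ts))   # 2005-01-01 00:00:00
print(date_trunc("month", ts))  # 2005-05-01 00:00:00
print(date_trunc("day", ts))    # 2005-05-21 00:00:00
```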

Now, let's experiment with different precisions and ultimately modify the queries from the previous exercises to aggregate rental activity.

**Instructions**

- Truncate the `rental_date` field by `year`.
- Now modify the previous query to truncate the `rental_date` by `month`.
- Let's see what happens when we truncate by day of the month.
- Finally, count the total number of rentals by `rental_day` and alias it as `rentals`.

In [17]:
query = """
-- Truncate rental_date by year
SELECT DATE_TRUNC('year', rental_date) AS rental_year
FROM rental;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,rental_year
0,2005-01-01
1,2005-01-01
2,2005-01-01
3,2005-01-01
4,2005-01-01
...,...
16039,2005-01-01
16040,2005-01-01
16041,2005-01-01
16042,2005-01-01


In [18]:
query = """
-- Truncate rental_date by month
SELECT DATE_TRUNC('month', rental_date) AS rental_month
FROM rental;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,rental_month
0,2005-05-01
1,2005-05-01
2,2005-05-01
3,2005-05-01
4,2005-05-01
...,...
16039,2005-08-01
16040,2005-08-01
16041,2005-08-01
16042,2005-08-01


In [19]:
query = """
-- Truncate rental_date by day of the month 
SELECT DATE_TRUNC('day', rental_date) AS rental_day 
FROM rental;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,rental_day
0,2005-05-24
1,2005-05-24
2,2005-05-24
3,2005-05-24
4,2005-05-24
...,...
16039,2005-08-23
16040,2005-08-23
16041,2005-08-23
16042,2005-08-23


In [20]:
query = """
SELECT 
  DATE_TRUNC('day', rental_date) AS rental_day,
  -- Count total number of rentals 
  COUNT(rental_id) as rentals 
FROM rental
GROUP BY 1;
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,rental_day,rentals
0,2005-05-28,196
1,2005-05-25,137
2,2005-05-29,154
3,2005-08-16,23
4,2005-05-31,163
5,2005-07-11,461
6,2005-07-10,480
7,2005-06-18,344
8,2005-07-31,679
9,2005-06-14,16


**Putting it all together**

Many of the techniques you've learned in this course will be useful when building queries to extract data for model training. Now let's use some date/time functions to extract and manipulate some DVD rentals data from our fictional DVD rental store.

In this exercise, you are going to extract a list of customers and their rental history over 90 days. You will be using the `EXTRACT()`, `DATE_TRUNC()`, and `AGE()` functions that you learned about during this chapter along with some general SQL skills from the prerequisites to extract a data set that could be used to determine what day of the week customers are most likely to rent a DVD and the likelihood that they will return the DVD late.

**Instructions**

- Extract the day of the week from the `rental_date` column using the alias `dayofweek`.
- Use an `INTERVAL` in the `WHERE` clause to select records for the 90-day period starting on 2005-05-01 (May 1, 2005).

In [21]:
query = """
SELECT 
  -- Extract the day of week date part from the rental_date
  EXTRACT(dow FROM rental_date) AS dayofweek,
  AGE(return_date, rental_date) AS rental_days
FROM rental AS r 
WHERE 
  -- Use an INTERVAL for the upper bound of the rental_date 
  rental_date BETWEEN CAST('2005-05-01' AS DATE)
   AND CAST('2005-05-01' AS DATE) + INTERVAL '90 day';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,dayofweek,rental_days
0,2.0,3 days 20:46:00
1,2.0,7 days 23:09:00
2,2.0,9 days 02:39:00
3,2.0,8 days 05:28:00
4,2.0,2 days 02:24:00
...,...,...
8858,5.0,5 days 20:36:00
8859,5.0,0 days 22:20:00
8860,5.0,1 days 05:16:00
8861,5.0,2 days 05:21:00
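
It can be useful to check where the 90-day window actually ends. The sketch below computes the upper bound in Python and applies the same inclusive test as `BETWEEN`; `in_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

window_start = datetime(2005, 5, 1)
# Like CAST('2005-05-01' AS DATE) + INTERVAL '90 day'
window_end = window_start + timedelta(days=90)
print(window_end)  # 2005-07-30 00:00:00


def in_window(ts):
    """BETWEEN is inclusive on both ends."""
    return window_start <= ts <= window_end


print(in_window(datetime(2005, 7, 29, 23, 58, 19)))  # True
print(in_window(datetime(2005, 8, 2, 21, 0, 5)))     # False
```

This matches the output above, where the last rental in the window is dated 2005-07-29.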


- Finally, use a `CASE` statement and `DATE_TRUNC()` to create a new column called `past_due`, which will be `TRUE` if the `rental_days` is greater than the `rental_duration`; otherwise, it will be `FALSE`.

In [22]:
query = """
SELECT 
  c.first_name || ' ' || c.last_name AS customer_name,
  f.title,
  r.rental_date,
  -- Extract the day of week date part from the rental_date
  EXTRACT(dow FROM r.rental_date) AS dayofweek,
  AGE(r.return_date, r.rental_date) AS rental_days,
  -- Use DATE_TRUNC to get days from the AGE function
  CASE WHEN DATE_TRUNC('day', AGE(r.return_date, r.rental_date)) > 
    f.rental_duration * INTERVAL '1' day 
  THEN TRUE 
  ELSE FALSE END AS past_due 
FROM 
  film AS f 
  INNER JOIN inventory AS i 
  	ON f.film_id = i.film_id 
  INNER JOIN rental AS r 
  	ON i.inventory_id = r.inventory_id 
  INNER JOIN customer AS c 
  	ON c.customer_id = r.customer_id 
WHERE 
  -- Use an INTERVAL for the upper bound of the rental_date 
  r.rental_date BETWEEN CAST('2005-05-01' AS DATE) 
  AND CAST('2005-05-01' AS DATE) + INTERVAL '90 day';
"""
result = pd.read_sql_query(query, engine)
result

Unnamed: 0,customer_name,title,rental_date,dayofweek,rental_days,past_due
0,Tommy Collazo,Freaky Pocus,2005-05-24 22:54:33,2.0,3 days 20:46:00,False
1,Manuel Murrell,Graduate Lord,2005-05-24 23:03:39,2.0,7 days 23:09:00,False
2,Andrew Purdy,Love Suicides,2005-05-24 23:04:41,2.0,9 days 02:39:00,True
3,Delores Hansen,Idols Snatchers,2005-05-24 23:05:21,2.0,8 days 05:28:00,True
4,Nelson Christenson,Mystic Truman,2005-05-24 23:08:07,2.0,2 days 02:24:00,False
...,...,...,...,...,...,...
8858,Frances Parker,Alien Center,2005-07-29 23:52:01,5.0,5 days 20:36:00,False
8859,Sally Pierce,Kwai Homeward,2005-07-29 23:52:12,5.0,0 days 22:20:00,False
8860,Richard Mccrary,Hanover Galaxy,2005-07-29 23:54:54,5.0,1 days 05:16:00,False
8861,Marie Turner,Freddy Storm,2005-07-29 23:58:19,5.0,2 days 05:21:00,False
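
The `past_due` logic can be sketched in Python to show what truncating to whole days does to the comparison. This is a simplified sketch under two assumptions: the rental has been returned, and it lasted less than a month (for longer spans `AGE()` reports months, which `timedelta` does not model). The function name `past_due` mirrors the column alias; the third example uses made-up timestamps.

```python
from datetime import datetime, timedelta


def past_due(rental_date, return_date, rental_duration):
    """Return True if the whole days rented exceed rental_duration."""
    rental_days = return_date - rental_date
    # Drop hours/minutes/seconds, like DATE_TRUNC('day', AGE(...))
    whole_days = timedelta(days=rental_days.days)
    return whole_days > timedelta(days=rental_duration)


# Zorro Ark, rental_duration 3: rented for 7 days 00:49:00
print(past_due(datetime(2005, 6, 17, 15, 47), datetime(2005, 6, 24, 16, 36), 3))        # True

# Academy Dinosaur, rental_duration 6: rented for 3 days 02:23:00
print(past_due(datetime(2005, 7, 31, 21, 36, 7), datetime(2005, 8, 3, 23, 59, 7), 6))  # False

# Hypothetical rental: 3 days 05:00:00 with rental_duration 3
print(past_due(datetime(2005, 6, 1, 10, 0), datetime(2005, 6, 4, 15, 0), 3))            # False
```

The last case shows the effect of the truncation: a rental returned a few hours after the limit is not flagged, because only whole days are compared.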
