In [1]:
import pandas as pd
import psycopg2

def execute_query(sql_query, dbname='sakila', user='postgres', password='postgres', port='5432'):
    """Run sql_query against the PostgreSQL database and return the rows as a DataFrame."""
    # Create a connection to the PostgreSQL database
    conn = psycopg2.connect(dbname=dbname, user=user, password=password, port=port)
    try:
        # Use read_sql to execute the query and load the results into a DataFrame
        df = pd.read_sql(sql_query, conn)
    finally:
        # Close the database connection even if the query raises an error
        conn.close()

    # Return the DataFrame
    return df



# Concatenating strings

In this exercise and the ones that follow, we are going to derive new fields from columns within the customer and film tables of the DVD rental database.

In [2]:
query_result = execute_query(
    """
-- Concatenate first_name, last_name, and email with the || operator
SELECT first_name || ' ' || last_name || ' <' || email || '>' AS full_email 
FROM customer
    """)
query_result.head()

Unnamed: 0,full_email
0,MARY SMITH <MARY.SMITH@sakilacustomer.org>
1,PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacusto...
2,LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer....
3,BARBARA JONES <BARBARA.JONES@sakilacustomer.org>
4,ELIZABETH BROWN <ELIZABETH.BROWN@sakilacustome...


In [3]:
query_result = execute_query(
    """
-- Concatenate first_name, last_name, and email with the CONCAT() function
SELECT CONCAT(first_name, ' ', last_name,  ' <', email, '>') AS full_email 
FROM customer
    """)
query_result.head()

Unnamed: 0,full_email
0,MARY SMITH <MARY.SMITH@sakilacustomer.org>
1,PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacusto...
2,LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer....
3,BARBARA JONES <BARBARA.JONES@sakilacustomer.org>
4,ELIZABETH BROWN <ELIZABETH.BROWN@sakilacustome...
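
Both queries return the same values for the rows shown. One difference worth knowing is how the two forms treat NULL in PostgreSQL: `||` yields NULL as soon as any operand is NULL, while `CONCAT` skips NULL arguments. The block below is a minimal Python sketch of that difference, with `None` standing in for NULL. The helper names `pipe_concat` and `concat` are hypothetical, and the row with a missing email is made up for illustration.

```python
def pipe_concat(*parts):
    """Model SQL's || operator: any NULL (None) operand makes the result NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def concat(*parts):
    """Model PostgreSQL's CONCAT(): NULL (None) arguments are skipped."""
    return "".join(p for p in parts if p is not None)


first, last, email = "MARY", "SMITH", "MARY.SMITH@sakilacustomer.org"
full_pipe = pipe_concat(first, " ", last, " <", email, ">")
full_concat = concat(first, " ", last, " <", email, ">")

# Hypothetical row with a missing email, to show where the two forms diverge
missing_pipe = pipe_concat(first, " ", last, " <", None, ">")
missing_concat = concat(first, " ", last, " <", None, ">")

print(full_pipe)
print(missing_pipe, "|", missing_concat)
```

If a column can contain NULL, the choice between the two forms changes the result, so it is worth checking the column definition before picking one.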


# Changing the case of string data

Now you are going to use the film and category tables to create a new field called film_category by concatenating the category name with the film's title. You will also format the result with the SQL case-conversion functions UPPER, INITCAP, and LOWER.

In [4]:
query_result = execute_query(
    """
SELECT 
  -- Concatenate the category name, converted to uppercase,
  -- with the film title, converted to title case
  UPPER(name)  || ': ' || INITCAP(title) AS film_category, 
  -- Convert the description column to lowercase
  LOWER(description) AS description
FROM 
  film AS f 
  INNER JOIN film_category AS fc 
  	ON f.film_id = fc.film_id 
  INNER JOIN category AS c 
  	ON fc.category_id = c.category_id;
    """)
query_result.head()

Unnamed: 0,film_category,description
0,DOCUMENTARY: Academy Dinosaur,a epic drama of a feminist and a mad scientist...
1,HORROR: Ace Goldfinger,a astounding epistle of a database administrat...
2,DOCUMENTARY: Adaptation Holes,a astounding reflection of a lumberjack and a ...
3,HORROR: Affair Prejudice,a fanciful documentary of a frisbee and a lumb...
4,FAMILY: African Egg,a fast-paced documentary of a pastry chef and ...
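
For readers more familiar with Python, the three case functions map roughly onto `str.upper()`, `str.title()`, and `str.lower()`. This is only a sketch: `str.title()` matches `INITCAP` for plain alphabetic words like the titles here, but the two may differ on words that contain digits or punctuation. The raw category value `"Documentary"` is an assumption, and the description is just the visible part of the first row.

```python
category = "Documentary"      # assumed raw value of category.name
title = "ACADEMY DINOSAUR"
description = "A Epic Drama of a Feminist And a Mad Scientist"

# UPPER(name) || ': ' || INITCAP(title)
film_category = category.upper() + ": " + title.title()

# LOWER(description)
description_lower = description.lower()

print(film_category)
print(description_lower)
```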


# Replacing string data

Sometimes you will need to make sure that the data you are extracting does not contain any whitespace. There are many different approaches you can take to cleanse and prepare your data for these situations. A common technique is to replace any whitespace with an underscore.

In [5]:
query_result = execute_query(
    """
SELECT 
  -- Replace whitespace in the film title with an underscore
  REPLACE(title, ' ', '_') AS title
FROM film; 
    """)
query_result.head()

Unnamed: 0,title
0,ACADEMY_DINOSAUR
1,ACE_GOLDFINGER
2,ADAPTATION_HOLES
3,AFFAIR_PREJUDICE
4,AFRICAN_EGG
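
Note that `REPLACE(title, ' ', '_')` only matches the literal space character. The Python sketch below shows the same literal replacement with `str.replace`, and contrasts it with a regular-expression replacement that also catches tabs and runs of whitespace (roughly what `REGEXP_REPLACE` with the `'g'` flag would do in PostgreSQL). The `messy` value is a made-up example, not data from the film table.

```python
import re

title = "ACADEMY DINOSAUR"
underscored = title.replace(" ", "_")

# Hypothetical title containing a tab between two spaces
messy = "ACE \t GOLDFINGER"

# Literal replacement leaves the tab in place
literal_only = messy.replace(" ", "_")

# Regex replacement collapses any run of whitespace into one underscore
collapsed = re.sub(r"\s+", "_", messy)

print(underscored)
print(repr(literal_only))
print(collapsed)
```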


# Determining the length of strings

Determining the number of characters in a string is something that you will do frequently when working with data in a SQL database. Here you will use LENGTH to find the length of each film description.

In [6]:
query_result = execute_query(
    """
SELECT 
  -- Select the title and description columns
  title,
  description,
  -- Determine the length of the description column
  LENGTH(description) AS desc_len
FROM film;
    """)
query_result.head()

Unnamed: 0,title,description,desc_len
0,ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist...,96
1,ACE GOLDFINGER,A Astounding Epistle of a Database Administrat...,100
2,ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a ...,96
3,AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a Lumb...,92
4,AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And ...,117
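
`LENGTH` on a text column counts characters, not bytes (PostgreSQL has `OCTET_LENGTH` for the byte count). Python's `len()` on a `str` behaves similarly, which the following sketch illustrates. The accented string is a hypothetical value chosen to make the character and byte counts differ.

```python
title = "ACADEMY DINOSAUR"
char_count = len(title)  # counts characters, like LENGTH(title)

# Hypothetical non-ASCII value: the accented E takes two bytes in UTF-8
accented = "AM\u00c9LIE"
accented_chars = len(accented)
accented_bytes = len(accented.encode("utf-8"))

print(char_count, accented_chars, accented_bytes)
```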


# Truncating strings

In the previous exercise, you calculated the length of the description column and noticed that the number of characters varied but most of the results were over 75 characters. There will be many times when you need to truncate a text column to a certain length to meet specific criteria for an application.

In [7]:
query_result = execute_query(
    """
SELECT 
  -- Select the first 50 characters of description
  LEFT(description, 50) AS short_desc
FROM 
  film AS f; 
    """)
query_result.head()

Unnamed: 0,short_desc
0,A Epic Drama of a Feminist And a Mad Scientist...
1,A Astounding Epistle of a Database Administrat...
2,A Astounding Reflection of a Lumberjack And a ...
3,A Fanciful Documentary of a Frisbee And a Lumb...
4,A Fast-Paced Documentary of a Pastry Chef And ...
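
The `...` at the end of each displayed value is added by pandas when it abbreviates long strings for display; it is not part of the returned data. As a rough Python analogue, `LEFT(description, 50)` behaves like slicing off the first 50 characters. The helper name `left` and the sample description below are hypothetical.

```python
def left(text, n):
    """Model LEFT(text, n) for n >= 0: keep at most the first n characters."""
    return text[:n]


# Hypothetical description, used only for illustration
description = "A Thoughtful Tale of a Librarian And a Robot who must Find a Lost Book"

short_desc = left(description, 50)

# A value shorter than the limit comes back unchanged
unchanged = left("Short text", 50)

print(len(description), len(short_desc))
print(unchanged)
```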


# Extracting substrings from text data

In this exercise, you are going to practice how to extract substrings from text columns. You'll use several functions that you've learned about in the video to manipulate the address column and return only the street name, that is, everything after the first space in the address.

In [8]:
query_result = execute_query(
    """
SELECT 
  -- Select only the street name from the address table
  SUBSTRING(address FROM POSITION(' ' IN address)+1 FOR LENGTH(address))
FROM 
  address;
    """)
query_result.head()

Unnamed: 0,substring
0,MySakila Drive
1,MySQL Boulevard
2,Workhaven Lane
3,Lillydale Drive
4,Hanoi Way
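
The query works in two steps: `POSITION` finds the first space, and `SUBSTRING` returns everything after it. The Python sketch below models both steps with hypothetical helpers `position` and `substring`. The house number `47` in the sample address is an assumption, since the output above only shows the street names.

```python
def position(needle, text):
    """Model POSITION(needle IN text): 1-based index, 0 when not found."""
    return text.find(needle) + 1


def substring(text, start, count):
    """Model SUBSTRING(text FROM start FOR count) for start >= 1."""
    begin = start - 1
    return text[begin:begin + count]


address = "47 MySakila Drive"  # house number assumed for illustration
street = substring(address, position(" ", address) + 1, len(address))

# With no space, position() returns 0, so the whole string comes back
no_space = substring("Plaza", position(" ", "Plaza") + 1, len("Plaza"))

print(street)
print(no_space)
```

Passing the full length as the count is simply a convenient upper bound: asking for more characters than remain returns the rest of the string.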


# Combining functions for string manipulation

In the next example, we are going to break apart the email column from the customer table into two new derived fields: the username before the '@' and the domain after it. Parsing a single column into multiple columns can be useful when you need to work with certain subsets of data.

In [9]:
query_result = execute_query(
    """
SELECT
  -- Extract the characters to the left of the '@'
  LEFT(email, POSITION('@' IN email)-1) AS username,
  -- Extract the characters to the right of the '@'
  SUBSTRING(email FROM POSITION('@' IN email)+1 FOR LENGTH(email)) AS domain
FROM customer;
    """)
query_result.head()

Unnamed: 0,username,domain
0,MARY.SMITH,sakilacustomer.org
1,PATRICIA.JOHNSON,sakilacustomer.org
2,LINDA.WILLIAMS,sakilacustomer.org
3,BARBARA.JONES,sakilacustomer.org
4,ELIZABETH.BROWN,sakilacustomer.org
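
The same split can be sketched in Python. The helper `split_email_sql_style` is a hypothetical name; it mirrors the SQL expressions step by step, including what happens when an email has no `'@'`. In that case `POSITION` returns 0, so `LEFT(email, -1)` drops only the last character and the "domain" is the whole string. The second input below is a made-up value to show that edge case.

```python
def split_email_sql_style(email):
    """Mirror LEFT(email, POSITION('@' IN email)-1) and the SUBSTRING call."""
    at = email.find("@") + 1      # POSITION('@' IN email), 0 if absent
    username = email[:at - 1]     # LEFT(email, at - 1)
    domain = email[at:]           # SUBSTRING(email FROM at + 1 FOR LENGTH(email))
    return username, domain


username, domain = split_email_sql_style("MARY.SMITH@sakilacustomer.org")

# Hypothetical malformed value without an '@'
bad_username, bad_domain = split_email_sql_style("no-at-sign")

# str.partition is the more idiomatic Python split and handles the
# missing '@' case by returning empty strings for the separator and tail
p_user, _, p_domain = "MARY.SMITH@sakilacustomer.org".partition("@")

print(username, domain)
print(bad_username, bad_domain)
```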


# Padding

Padding strings is useful in many real-world situations. Earlier in this course, we used string concatenation to combine the customer's first and last name separated by a single blank space, and to combine the customer's full name with their email address. Here we build the same results with the padding functions RPAD and LPAD instead of literal separator strings.

In [10]:
query_result = execute_query(
    """
-- Concatenate the padded first_name and last_name 
SELECT 
	RPAD(first_name, LENGTH(first_name)+1) || last_name AS full_name
FROM customer;
    """)
query_result.head()

Unnamed: 0,full_name
0,MARY SMITH
1,PATRICIA JOHNSON
2,LINDA WILLIAMS
3,BARBARA JONES
4,ELIZABETH BROWN


In [11]:
query_result = execute_query(
    """
-- Concatenate the first_name and the padded last_name
SELECT 
	first_name || LPAD(last_name, LENGTH(last_name)+1) AS full_name
FROM customer; 
    """)
query_result.head()

Unnamed: 0,full_name
0,MARY SMITH
1,PATRICIA JOHNSON
2,LINDA WILLIAMS
3,BARBARA JONES
4,ELIZABETH BROWN


In [12]:
query_result = execute_query(
    """
-- Concatenate the padded first_name, last_name, and email
SELECT 
	RPAD(first_name, LENGTH(first_name)+1) 
    || RPAD(last_name, LENGTH(last_name)+2, ' <') 
    || RPAD(email, LENGTH(email)+1, '>') AS full_email
FROM customer; 
    """)
query_result.head()

Unnamed: 0,full_email
0,MARY SMITH <MARY.SMITH@sakilacustomer.org>
1,PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacusto...
2,LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer....
3,BARBARA JONES <BARBARA.JONES@sakilacustomer.org>
4,ELIZABETH BROWN <ELIZABETH.BROWN@sakilacustome...
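
To make the padding behaviour concrete, here is a minimal Python sketch of `RPAD` and `LPAD`. The helpers `rpad` and `lpad` are hypothetical names written to model the PostgreSQL semantics as I understand them: the fill string is repeated as needed, and a string that is already longer than the target length is truncated on the right.

```python
def rpad(text, length, fill=" "):
    """Model RPAD: pad on the right with repeated fill, or truncate to length."""
    length = max(length, 0)
    if len(text) >= length or not fill:
        return text[:length]
    needed = length - len(text)
    return text + (fill * needed)[:needed]


def lpad(text, length, fill=" "):
    """Model LPAD: pad on the left with repeated fill, or truncate to length."""
    length = max(length, 0)
    if len(text) >= length or not fill:
        return text[:length]
    needed = length - len(text)
    return (fill * needed)[:needed] + text


first, last, email = "MARY", "SMITH", "MARY.SMITH@sakilacustomer.org"

# RPAD(first_name, LENGTH(first_name)+1) || last_name
name_rpad = rpad(first, len(first) + 1) + last

# first_name || LPAD(last_name, LENGTH(last_name)+1)
name_lpad = first + lpad(last, len(last) + 1)

# The three-part expression that rebuilds full_email
full_email = (
    rpad(first, len(first) + 1)
    + rpad(last, len(last) + 2, " <")
    + rpad(email, len(email) + 1, ">")
)

print(name_rpad, "|", name_lpad)
print(full_email)
```

Adding 1 or 2 to the current length is what turns the padding functions into a way of appending a fixed separator.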


# The TRIM function

Truncating a string at a fixed length can leave whitespace at the end of the result. We can use the TRIM function to remove that whitespace after the string has been truncated.

In [13]:
query_result = execute_query(
    """
-- Concatenate the uppercase category name and film title
SELECT 
  CONCAT(UPPER(name), ': ', title) AS film_category, 
  -- Truncate the description and remove trailing whitespace
  TRIM(LEFT(description, 50)) AS film_desc
FROM 
  film AS f 
  INNER JOIN film_category AS fc 
  	ON f.film_id = fc.film_id 
  INNER JOIN category AS c 
  	ON fc.category_id = c.category_id;
    """)
query_result.head()

Unnamed: 0,film_category,film_desc
0,DOCUMENTARY: ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist...
1,HORROR: ACE GOLDFINGER,A Astounding Epistle of a Database Administrat...
2,DOCUMENTARY: ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a Car
3,HORROR: AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a Lumb...
4,FAMILY: AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And ...
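
The third row shows the effect: its first 50 characters end in a space, so after `TRIM` the value is 49 characters long and pandas prints it in full. A rough Python equivalent is slicing followed by `str.strip(" ")`, which, like `TRIM` with no extra arguments, removes spaces from both ends. The tail of the description below is placeholder text, not the real data.

```python
# Visible part of the third description, followed by placeholder text
description = "A Astounding Reflection of a Lumberjack And a Car " + "placeholder tail"

truncated = description[:50]      # LEFT(description, 50)
trimmed = truncated.strip(" ")    # TRIM(...): spaces removed from both ends

print(repr(truncated), len(truncated))
print(repr(trimmed), len(trimmed))
```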


# Putting it all together

In this exercise, we are going to use the `film` and `category` tables to create a new field called `film_category` by concatenating the category `name` with the film's `title`. You will also practice how to truncate text fields like the `film` table's `description` column without cutting off a word.

In [14]:
query_result = execute_query(
    """
SELECT 
  UPPER(c.name) || ': ' || f.title AS film_category, 
  -- Truncate the description without cutting off a word
  LEFT(description, 50 - 
    -- Subtract the position of the first space in the reversed string,
    -- which is the distance from the end back to the last space
    POSITION(
      ' ' IN REVERSE(LEFT(description, 50))
    )
  ) 
FROM 
  film AS f 
  INNER JOIN film_category AS fc 
  	ON f.film_id = fc.film_id 
  INNER JOIN category AS c 
  	ON fc.category_id = c.category_id;
    """)
query_result.head()

Unnamed: 0,film_category,left
0,DOCUMENTARY: ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist
1,HORROR: ACE GOLDFINGER,A Astounding Epistle of a Database Administrator
2,DOCUMENTARY: ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a Car
3,HORROR: AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a
4,FAMILY: AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And a
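
The nested expression is easier to follow one step at a time. The Python sketch below mirrors it under a hypothetical helper name, `truncate_at_word`: take the first 50 characters, reverse them, find the first space (which is the last space of the unreversed text), and cut that many characters off the limit. The sample sentences are made up, apart from the visible start of the third description.

```python
def truncate_at_word(text, limit=50):
    """Mirror LEFT(text, limit - POSITION(' ' IN REVERSE(LEFT(text, limit))))."""
    head = text[:limit]
    # 1-based position of the first space in the reversed head,
    # i.e. how far the last space sits from the end; 0 if there is no space
    offset = head[::-1].find(" ") + 1
    return text[:limit - offset]


# Hypothetical sentence whose 50th character falls inside a word
sample = "The quick brown fox jumps over the lazy dog near the riverbank today"
cut_mid_word = truncate_at_word(sample)

# The 50th character is a space here, so only that space is dropped
cut_at_space = truncate_at_word(
    "A Astounding Reflection of a Lumberjack And a Car " + "placeholder tail"
)

# Text shorter than the limit is returned unchanged
short = truncate_at_word("Short text")

print(cut_mid_word)
print(cut_at_space)
print(short)
```

Because the cut lands on the last space, the result never has trailing whitespace, so no separate `TRIM` is needed in this version.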
