# **Outline**

- [**1. Introduction**](#-1.-introduction)
    - [1.1. Aliasing, DISTINCT, VIEW, COUNT](#1.1.-Aliasing,-DISTINCT,-VIEW,-COUNT)
- [**2. Filtering Records**](#-2.-filtering-records)

In [1]:
import duckdb
import pandas as pd

First, establish a connection to DuckDB and create the tables from the CSV files.

In [2]:
conn = duckdb.connect(  database='data/database.duckdb',
                        read_only=False)

# Creating and loading tables from CSV files
conn.sql("CREATE OR REPLACE TABLE books AS SELECT * FROM read_csv_auto('data/books.csv');")
conn.sql("CREATE OR REPLACE TABLE people AS SELECT * FROM read_csv_auto('data/people.csv')")
conn.sql("CREATE OR REPLACE TABLE films AS SELECT * FROM read_csv_auto('data/films.csv')")
conn.sql("CREATE OR REPLACE TABLE reviews AS SELECT * FROM read_csv_auto('data/reviews.csv')")
conn.sql("CREATE OR REPLACE TABLE roles AS SELECT * FROM read_csv_auto('data/roles.csv')")

conn.sql("SHOW TABLES")

┌─────────────────┐
│      name       │
│     varchar     │
├─────────────────┤
│ books           │
│ films           │
│ library_authors │
│ people          │
│ reviews         │
│ roles           │
└─────────────────┘

In [3]:
# Helper: run a query and display the result as a pandas DataFrame
def sql(query):
    display(conn.sql(query).df())

# **1. Introduction**

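SQL executes clauses in a different order from the one in which they are written. Below, the clauses are listed as they are written, and the number in front of each gives its position in the execution order: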

2º SELECT column_name

1º FROM table

3º LIMIT n

- `SELECT`: which columns to show
- `FROM`: which table to use
- `LIMIT`: how many results to show

## 1.1. Aliasing, DISTINCT, VIEW, COUNT

- `AS`: Aliasing renames a column or table in the query result. This is useful when the original name is too long or not descriptive.

- `DISTINCT`: Select only distinct records from a table.

In [4]:
# Returns a result set with just one column listing the unique authors in the books table
sql(
"""
SELECT DISTINCT author 
FROM books
LIMIT 5;
""")


Unnamed: 0,author
0,JJ Smith
1,National Geographic Kids
2,Larry Schweikart
3,Chris Kyle
4,Khaled Hosseini


In [5]:

# Return the unique author and genre combinations in the books table
sql(
"""
SELECT DISTINCT author, genre 
FROM books
LIMIT 5;
""")


Unnamed: 0,author,genre
0,Atul Gawande,Non Fiction
1,Marjorie Sarnat,Non Fiction
2,David Zinczenko,Non Fiction
3,Rachel Hollis,Non Fiction
4,Gillian Flynn,Fiction


In [6]:

# Rename the author column to unique_author
sql( 
"""
SELECT DISTINCT author AS unique_author
FROM books
LIMIT 5;
""" )

Unnamed: 0,unique_author
0,Amor Towles
1,Jaycee Dugard
2,Veronica Roth
3,Neil deGrasse Tyson
4,Atul Gawande



- `VIEW`: A view is a virtual table defined by a saved SQL `SELECT` statement. A view stores no data of its own; it is just a saved query that runs again each time the view is queried.
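As a minimal sketch of the "saved query" idea, the snippet below uses Python's built-in `sqlite3` module instead of DuckDB so that it runs without the CSV files. The table `books_demo`, the view `authors_demo` and their rows are made up for illustration. It suggests that a view holds no data of its own, since a row inserted after the view was created still shows up through it.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the notebook's DuckDB database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE books_demo (title TEXT, author TEXT)")
demo.executemany(
    "INSERT INTO books_demo VALUES (?, ?)",
    [("Book A", "Ann"), ("Book B", "Ann"), ("Book C", "Bob")],
)

# The view saves the query text, not the query result.
demo.execute(
    "CREATE VIEW authors_demo AS "
    "SELECT DISTINCT author AS unique_author FROM books_demo"
)
before = demo.execute("SELECT COUNT(*) FROM authors_demo").fetchone()[0]

# A row added to the base table afterwards is visible through the view.
demo.execute("INSERT INTO books_demo VALUES ('Book D', 'Cy')")
after = demo.execute("SELECT COUNT(*) FROM authors_demo").fetchone()[0]

print(before, after)  # 2 3
```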

In [7]:
# Saves the results of the written query as a view called library_authors
display(conn.sql(
"""
CREATE OR REPLACE VIEW library_authors AS
SELECT DISTINCT author AS unique_author
FROM books;

SHOW TABLES;
""" 
))

sql(
"""
SELECT * 
FROM library_authors
LIMIT 5;
"""
)



┌─────────────────┐
│      name       │
│     varchar     │
├─────────────────┤
│ books           │
│ films           │
│ library_authors │
│ people          │
│ reviews         │
│ roles           │
└─────────────────┘

Unnamed: 0,unique_author
0,JJ Smith
1,National Geographic Kids
2,Larry Schweikart
3,Chris Kyle
4,Khaled Hosseini


- `COUNT(field_name)`: Count the number of non-NULL values in a field.
    - `COUNT(*)`: Use `*` to count all records in a table, including those with missing values.
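The difference between the two forms only shows up when a field contains NULLs. The sketch below is an illustration on a made-up table `people_demo`, using Python's built-in `sqlite3` rather than DuckDB so that it is self-contained.

```python
import sqlite3

# Assumption: toy rows, one of them with a missing birthdate.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people_demo (name TEXT, birthdate TEXT)")
demo.executemany(
    "INSERT INTO people_demo VALUES (?, ?)",
    [("Ann", "1970-01-01"), ("Bob", None), ("Cy", "1985-06-30")],
)

# COUNT(*) counts rows; COUNT(birthdate) skips the NULL birthdate.
count_all, count_birthdate = demo.execute(
    "SELECT COUNT(*), COUNT(birthdate) FROM people_demo"
).fetchone()

print(count_all, count_birthdate)  # 3 2
```

This matches the pattern in the notebook's own results, where `COUNT(birthdate)` is smaller than `COUNT(*)` for the `people` table.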

In [8]:
# Return the number of records containing a film_id in the reviews table
sql(
"""
SELECT COUNT(film_id) AS count_film_id
FROM reviews;
"""
)

Unnamed: 0,count_film_id
0,4968


In [9]:
# Count the total number of records in the people table
sql(
"""
SELECT COUNT(*) AS count_records
FROM people
"""
)


Unnamed: 0,count_records
0,8397


In [10]:

# Count the number of records with a birthdate in the people table
sql(
"""
SELECT COUNT(birthdate) AS count_birthdate
FROM people
"""
)


Unnamed: 0,count_birthdate
0,6152


In [11]:

# Count the records for languages and countries in the films table
sql(
"""
SELECT 
    COUNT(language) AS count_languages,
    COUNT(country) AS count_countries
FROM films;
"""
)

Unnamed: 0,count_languages,count_countries
0,4957,4966


- Combine `COUNT()` with the `DISTINCT` keyword to count the unique values in a field: `COUNT(DISTINCT field_name)`.

In [12]:
# Return the unique countries represented in the films table using DISTINCT
sql(
"""
SELECT DISTINCT(country) 
FROM films
LIMIT 3;
"""
)


Unnamed: 0,country
0,Aruba
1,Colombia
2,Romania


In [13]:

# Return the number of unique countries represented in the films table, aliased as count_distinct_countries
sql(
"""
SELECT COUNT(DISTINCT(country)) AS count_distinct_countries
FROM films;
"""
)

Unnamed: 0,count_distinct_countries
0,64


# **2. Filtering Records**

With filtering, the execution order becomes the following (clauses are listed as written; the numbers give the execution order):

3º SELECT column_name

1º FROM table

2º WHERE condition

4º LIMIT n

- `WHERE`: Filter records based on a condition. 

## 2.1 Filtering Numbers

Comparison operators are used to filter records based on a condition. The comparison operators are:

- `>`: Greater than or after
- `<`: Less than or before
- `=`: Equal to
- `>=`: Greater than or equal to
- `<=`: Less than or equal to
- `<>`: Not equal to

In [14]:
# Select the title of the films released after the year 2000
sql(
"""
SELECT title
FROM films
WHERE release_year > 2000   
LIMIT 5;
"""
)

Unnamed: 0,title
0,15 Minutes
1,3000 Miles to Graceland
2,A Beautiful Mind
3,A Knight's Tale
4,A.I. Artificial Intelligence


In [15]:
# Select the film_id and imdb_score from the reviews table and filter on scores higher than 7.0.
sql(
"""
SELECT film_id, imdb_score
FROM reviews
WHERE imdb_score > 7.0
LIMIT 3;
"""
)


Unnamed: 0,film_id,imdb_score
0,3934,7.1
1,74,7.6
2,1254,8.0


In [16]:

# Select the film_id and facebook_likes of the first ten records with fewer than 1000 likes from the reviews table.
sql(
"""
SELECT film_id, facebook_likes
FROM reviews
WHERE facebook_likes < 1000
LIMIT 10;
"""
)


Unnamed: 0,film_id,facebook_likes
0,3405,0
1,478,491
2,74,930
3,740,0
4,2869,689
5,1181,0
6,2020,0
7,2312,912
8,1820,872
9,831,975


In [17]:
# Count how many records have a num_votes of at least 100,000; use the alias films_over_100K_votes.
sql(
"""
SELECT COUNT(*) AS films_over_100K_votes
FROM reviews
WHERE num_votes >= 100000;
"""
)

Unnamed: 0,films_over_100K_votes
0,1211


In [18]:
# Select and count the language field using the alias count_spanish for table films.
# Apply a filter to select only Spanish from the language field.
sql(
"""
SELECT COUNT(*) AS count_spanish
FROM films
WHERE language = 'Spanish';
"""
)

Unnamed: 0,count_spanish
0,40


## 2.2 Multiple Criteria

- `OR`: Show a record if any of the conditions are true.

- `AND`: Show a record if all of the conditions are true.

- `BETWEEN`: Filter records within a range of values using `BETWEEN low AND high`; both endpoints are included. It can be combined with further `AND` and `OR` conditions.
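To make the range semantics concrete, here is a small sketch on a made-up table `films_demo`, using Python's built-in `sqlite3` in place of DuckDB. It compares `BETWEEN` with the equivalent pair of `>=` and `<=` comparisons; the boundary years 1990 and 2000 are both returned.

```python
import sqlite3

# Assumption: toy rows chosen to sit on and around the range boundaries.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE films_demo (title TEXT, release_year INTEGER)")
demo.executemany(
    "INSERT INTO films_demo VALUES (?, ?)",
    [("W", 1989), ("X", 1990), ("Y", 1995), ("Z", 2000), ("Q", 2001)],
)

with_between = demo.execute(
    "SELECT title FROM films_demo "
    "WHERE release_year BETWEEN 1990 AND 2000 ORDER BY release_year"
).fetchall()

with_comparisons = demo.execute(
    "SELECT title FROM films_demo "
    "WHERE release_year >= 1990 AND release_year <= 2000 ORDER BY release_year"
).fetchall()

print(with_between)                      # [('X',), ('Y',), ('Z',)]
print(with_between == with_comparisons)  # True
```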

In [19]:
# Select the title and release_year for all German-language films released before 2000.
sql(
"""
SELECT title, release_year
FROM films
WHERE language = 'German'
    AND release_year < 2000 
"""
)

Unnamed: 0,title,release_year
0,Metropolis,1927.0
1,Pandora's Box,1929.0
2,The Torture Chamber of Dr. Sadism,1967.0
3,Das Boot,1981.0
4,Run Lola Run,1998.0
5,Aimee & Jaguar,1999.0


In [20]:
# Select all details for German-language films released after 2000 but before 2010 using only WHERE and AND.
sql(
"""
SELECT *
FROM films
WHERE language = 'German'
    AND release_year > 2000
    AND release_year < 2010;
"""
)

Unnamed: 0,id,title,release_year,country,duration,language,certification,gross,budget
0,1952,Good Bye Lenin!,2003.0,Germany,121.0,German,R,4063859.0,4800000.0
1,2130,Downfall,2004.0,Germany,178.0,German,R,5501940.0,13500000.0
2,2224,Summer Storm,2004.0,Germany,98.0,German,R,95016.0,2700000.0
3,2709,The Lives of Others,2006.0,Germany,137.0,German,R,11284657.0,2000000.0
4,3100,The Baader Meinhof Complex,2008.0,Germany,184.0,German,R,476270.0,20000000.0
5,3143,The Wave,2008.0,Germany,107.0,German,,,5000000.0
6,3220,Cargo,2009.0,Switzerland,112.0,German,,,4500000.0
7,3346,Soul Kitchen,2009.0,Germany,99.0,German,,274385.0,4000000.0
8,3412,The White Ribbon,2009.0,Germany,144.0,German,R,2222647.0,12000000.0


In [21]:
# 1. Select the title and release_year for films released in 1990 or 1999 using only WHERE and OR.
# 2. Filter the records to only include English or Spanish-language films.
# 3. Finally, restrict the query to only return films that grossed more than $2,000,000.
sql(
"""
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
    AND (language = 'English' OR language = 'Spanish')
    AND gross > 2000000
LIMIT 5;
"""
)

Unnamed: 0,title,release_year
0,Arachnophobia,1990.0
1,Back to the Future Part III,1990.0
2,Child's Play 2,1990.0
3,Dances with Wolves,1990.0
4,Days of Thunder,1990.0


In [22]:
# 1. Select the title and release_year of all films released between 1990 and 2000 (inclusive) using BETWEEN.
# 2. Now select only films with a budget over $100 million.
# 3. Now, restrict the query to only return Spanish-language films.
# 4. Finally, amend the query to include all Spanish-language or French-language films
sql(
"""
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
    AND budget > 100000000
    AND (language = 'Spanish' OR language = 'French');
"""
)

Unnamed: 0,title,release_year
0,Tango,1998.0
1,Les couloirs du temps: Les visiteurs II,1998.0


## 2.3 Filtering text

The `WHERE` clause can also be used to filter text. We can filter by a pattern or by a list of values using the following operators:

- `LIKE`: Search for a pattern in a field.

    - `%`: A substitute for zero, one or more characters

    - `_`: A substitute for a single character

- `IN`: Specify multiple values in a WHERE clause.

- `NOT`: Can be used with `LIKE` and `IN` to filter records that do not match a pattern or value.
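The operators above can be tried side by side on a tiny table. This is a sketch only: it uses Python's built-in `sqlite3` instead of DuckDB, the table `people_demo` and the helper `names` are hypothetical, and the four names are borrowed from the outputs in this notebook. Note that SQLite's `LIKE` ignores ASCII case by default while DuckDB's is case-sensitive; the rows are chosen so that both behave the same here.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people_demo (name TEXT)")
demo.executemany(
    "INSERT INTO people_demo VALUES (?)",
    [("Ara Celi",), ("Archie Kao",), ("B.J. Novak",), ("Bai Ling",)],
)

def names(condition):
    """Return the names matching a WHERE condition, sorted by name."""
    query = f"SELECT name FROM people_demo WHERE {condition} ORDER BY name"
    return [row[0] for row in demo.execute(query)]

starts_with_b = names("name LIKE 'B%'")            # % matches any run of characters
second_letter_r = names("name LIKE '_r%'")         # _ matches exactly one character
not_starting_with_a = names("name NOT LIKE 'A%'")  # NOT inverts the match
in_list = names("name IN ('Ara Celi', 'Bai Ling')")

print(starts_with_b)    # ['B.J. Novak', 'Bai Ling']
print(second_letter_r)  # ['Ara Celi', 'Archie Kao']
```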

In [23]:
# Select the names of all people whose names begin with 'B'
sql(
"""
SELECT name 
FROM people
WHERE name LIKE 'B%'
LIMIT 5;
"""
)

Unnamed: 0,name
0,B.J. Novak
1,Babak Najafi
2,Babar Ahmed
3,Bahare Seddiqi
4,Bai Ling


In [24]:
# Select the names of people whose names have 'r' as the second letter.
sql(
"""
SELECT name 
FROM people
WHERE name LIKE '_r%'
LIMIT 5;
"""
)

Unnamed: 0,name
0,Ara Celi
1,Aramis Knight
2,Arben Bajraktaraj
3,Arcelia Ramírez
4,Archie Kao


In [25]:
# Select the names of people whose names don't start with 'A'.
sql(
"""
SELECT name 
FROM people
WHERE name NOT LIKE 'A%'
LIMIT 5;
"""
)

Unnamed: 0,name
0,50 Cent
1,Álex Angulo
2,Álex de la Iglesia
3,Ángela Molina
4,B.J. Novak


In [26]:
# Select the title and release_year of all films released in 1990 or 2000 that were longer than two hours
sql(
"""
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 2000)
    AND duration > 120
LIMIT 5;
"""
)

# Select the title and language of all films in English, Spanish, or French using IN.
sql(
"""
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French')
LIMIT 5;
"""
)

Unnamed: 0,title,release_year
0,Dances with Wolves,1990.0
1,Die Hard 2,1990.0
2,Ghost,1990.0
3,Goodfellas,1990.0
4,Mo' Better Blues,1990.0


Unnamed: 0,title,language
0,The Broadway Melody,English
1,Hell's Angels,English
2,A Farewell to Arms,English
3,42nd Street,English
4,She Done Him Wrong,English


In [27]:
# Select the title, certification and language of all films certified NC-17 or R that are in English, Italian, or Greek.
sql(
"""
SELECT title, certification, language
FROM films
WHERE language IN ('English', 'Italian', 'Greek')
    AND certification IN ('NC-17', 'R') 
LIMIT 5;
"""
)

Unnamed: 0,title,certification,language
0,Pink Flamingos,NC-17,English
1,The Evil Dead,NC-17,English
2,Showgirls,NC-17,English
3,Orgazmo,NC-17,English
4,L.I.E.,NC-17,English


In [28]:
#   1. Count the unique titles from the films table and use the alias nineties_english_films_for_teens.
#   2. Filter to include only movies with a release_year from 1990 to 1999, inclusive.
#   3. Add another filter narrowing your query down to English-language films.
#   4. Add a final filter to select only films with 'G', 'PG', 'PG-13' certifications.
sql(
"""
SELECT COUNT(DISTINCT(title)) AS nineties_english_films_for_teens
FROM films
WHERE release_year BETWEEN 1990 AND 1999
    AND language = 'English'
    AND certification IN ('G', 'PG', 'PG-13');

"""    
)

Unnamed: 0,nineties_english_films_for_teens
0,310


## 2.4 Filtering NULL values

`IS NULL` and `IS NOT NULL` are operators used in the `WHERE` clause to test for missing (NULL) values.
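A common pitfall is to test for missing values with `=`. The sketch below, on a made-up table `films_demo` and with Python's built-in `sqlite3` standing in for DuckDB, shows that a comparison with `= NULL` matches no rows, whereas `IS NULL` finds them.

```python
import sqlite3

# Assumption: toy rows, two of them without a budget.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE films_demo (title TEXT, budget REAL)")
demo.executemany(
    "INSERT INTO films_demo VALUES (?, ?)",
    [("Film A", 1000000.0), ("Film B", None), ("Film C", None)],
)

def count_where(condition):
    """Count the rows of films_demo that satisfy a WHERE condition."""
    query = f"SELECT COUNT(*) FROM films_demo WHERE {condition}"
    return demo.execute(query).fetchone()[0]

equals_null = count_where("budget = NULL")      # comparison with NULL is never true
is_null = count_where("budget IS NULL")
is_not_null = count_where("budget IS NOT NULL")

print(equals_null, is_null, is_not_null)  # 0 2 1
```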

In [29]:
# Select the title of every film that doesn't have a budget associated with it and use the alias no_budget_info.
sql(
"""
SELECT title AS no_budget_info
FROM films
WHERE budget IS NULL
LIMIT 5;
"""    
)

Unnamed: 0,no_budget_info
0,Pandora's Box
1,The Prisoner of Zenda
2,The Blue Bird
3,Bambi
4,State Fair


In [30]:
# Count the number of films with a language associated with them and use the alias count_language_known.
sql(
"""
SELECT COUNT(*) AS count_language_known
FROM films
WHERE language IS NOT NULL;
"""    
)

Unnamed: 0,count_language_known
0,4957


# **3. Aggregate Functions**

## 3.1. Summarizing Data

An aggregate function takes the values of multiple rows and returns a single summary value. Consider the following aggregate functions:

- `COUNT()`: Count the number of rows in a table.

- `SUM()`: Calculate the sum of a set of values.

- `AVG()`: Calculate the average of a set of values.

- `MAX()`: Get the maximum value in a set of values.

- `MIN()`: Get the minimum value in a set of values.

`MAX` and `MIN` can also be used with text fields, where values are compared as whole strings in sort order (not only by their first letter).

- `ROUND()`: Round a number to a specified number of decimal places. Can wrap `SUM`, `AVG`, `MAX` and `MIN`. A negative second argument rounds to the left of the decimal point, e.g. `-3` rounds to the nearest thousand.
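As a quick illustration of `MIN`/`MAX` on text and of `ROUND` wrapped around an aggregate, here is a sketch on a made-up table `films_demo`, run with Python's built-in `sqlite3` rather than DuckDB. Two titles share their first two letters, so the result depends on more than the first letter.

```python
import sqlite3

# Assumption: toy titles and durations, not taken from films.csv.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE films_demo (title TEXT, duration REAL)")
demo.executemany(
    "INSERT INTO films_demo VALUES (?, ?)",
    [("Batman", 95.0), ("Bambi", 70.0), ("Carlos", 334.0)],
)

first_title, last_title, avg_duration = demo.execute(
    "SELECT MIN(title), MAX(title), ROUND(AVG(duration), 1) FROM films_demo"
).fetchone()

print(first_title, last_title)  # Bambi Carlos
print(avg_duration)             # 166.3
```

The average is 499 / 3, which `ROUND(..., 1)` cuts to one decimal place.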

In [31]:
# Use the SUM() function to calculate the total duration of all films and alias with total_duration.
sql(
"""
SELECT SUM(duration) AS total_duration
FROM films;
"""
)

# Calculate the average duration of all films and alias with average_duration.
sql(
"""
SELECT AVG(duration) AS average_duration
FROM films;
"""
)

# Find the most recent release_year in the films table, aliasing as latest_year.
sql(
"""
SELECT MAX(release_year) AS latest_year
FROM films;
"""
)

# Find the duration of the shortest film and use the alias shortest_film.
sql(
"""
SELECT MIN(duration) AS shortest_film
FROM films;
"""
)



Unnamed: 0,total_duration
0,534882.0


Unnamed: 0,average_duration
0,107.947931


Unnamed: 0,latest_year
0,2016.0


Unnamed: 0,shortest_film
0,7.0


In [32]:
# Use SUM() to calculate the total gross for all films made in the year 2000 or later, and use the alias total_gross.
sql(
"""
SELECT SUM(gross) AS total_gross
FROM films
WHERE release_year >= 2000;
"""
)

# Calculate the average amount grossed by all films whose titles start with the letter 'A' and alias with avg_gross_A.
sql(
"""
SELECT AVG(gross) AS avg_gross_A
FROM films
WHERE title LIKE 'A%';
"""
)

# Calculate the lowest gross film in 1994 and use the alias lowest_gross.
sql(
"""
SELECT MIN(gross) AS lowest_gross
FROM films
WHERE release_year = 1994;
"""
)

# Calculate the highest gross film between 2000 and 2012, inclusive, and use the alias highest_gross.
sql(
"""
SELECT MAX(gross) AS highest_gross
FROM films
WHERE release_year BETWEEN 2000 AND 2012;
"""
)

# Calculate the average facebook_likes to one decimal place and assign to the alias, avg_facebook_likes.
sql(
"""
SELECT ROUND(AVG(facebook_likes),1) AS avg_facebook_likes
FROM reviews;
"""
)

# Calculate the average budget from the films table, aliased as avg_budget_thousands, and round to the nearest thousand.
sql(
"""
SELECT ROUND(AVG(budget),-3) AS avg_budget_thousands
FROM films;
"""
)

Unnamed: 0,total_gross
0,150900900000.0


Unnamed: 0,avg_gross_A
0,47893240.0


Unnamed: 0,lowest_gross
0,125169.0


Unnamed: 0,highest_gross
0,760505847.0


Unnamed: 0,avg_facebook_likes
0,7802.9


Unnamed: 0,avg_budget_thousands
0,39903000.0


## 3.2 Arithmetic

We can use arithmetic operators to perform calculations in SQL: `+`, `-`, `*` and `/`. Arithmetic combines the fields of each row and returns one value per row, whereas aggregate functions work down a column and return a single value.

<center>
<img src="figures/agg_arithmetic.png" alt="drawing" width = 600/>
</center>
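The contrast between the two can be sketched on a two-row table. The table `films_demo` and its numbers are made up, and Python's built-in `sqlite3` is used in place of DuckDB so the example is self-contained.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE films_demo (title TEXT, gross REAL, budget REAL)")
demo.executemany(
    "INSERT INTO films_demo VALUES (?, ?, ?)",
    [("Film A", 500.0, 200.0), ("Film B", 300.0, 400.0)],
)

# Arithmetic: one result per row, computed from that row's fields.
profit_per_film = demo.execute(
    "SELECT title, gross - budget AS profit FROM films_demo ORDER BY title"
).fetchall()

# Aggregate: one result for the whole column.
total_gross = demo.execute("SELECT SUM(gross) FROM films_demo").fetchone()[0]

print(profit_per_film)  # [('Film A', 300.0), ('Film B', -100.0)]
print(total_gross)      # 800.0
```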

In [33]:
# Select the title and duration in hours for all films and alias as duration_hours; divide duration by 60.0.
sql(
"""
SELECT title, duration / 60.0 AS duration_hours
FROM films
LIMIT 5;
"""
)

# Calculate the percentage of people who are no longer alive and alias the result as percentage_dead.
sql(
"""
SELECT ROUND(COUNT(deathdate)*100.0/COUNT(*),2) AS percentage_dead
FROM people;
"""
)

# Find how many decades (period of ten years) the films table covers by using MIN() and MAX(); alias as number_of_decades.
sql(
"""
SELECT (MAX(release_year) - MIN(release_year))/10 AS number_of_decades
FROM films;
"""
)

Unnamed: 0,title,duration_hours
0,Intolerance: Love's Struggle Throughout the Ages,2.05
1,Over the Hill to the Poorhouse,1.833333
2,The Big Parade,2.516667
3,Metropolis,2.416667
4,Pandora's Box,1.833333


Unnamed: 0,percentage_dead
0,9.37


Unnamed: 0,number_of_decades
0,10.0


# **4. Sorting and Grouping**

- **Execution order of the clauses when `ORDER BY` is used:**

3º SELECT column_name

1º FROM table

2º WHERE condition

4º ORDER BY column_name ASC|DESC

5º LIMIT n


- **Execution order of the clauses when `GROUP BY` is used:**

3º SELECT AGGREGATION(column_name) AS alias

1º FROM table

2º GROUP BY column_name

4º ORDER BY column_name ASC|DESC

5º LIMIT n


## 4.1. Sorting results

- `ORDER BY`: Sort the result set in ascending or descending order.

    - `ASC`: Ascending order (default).

    - `DESC`: Descending order.

In [34]:
# Select the name of each person in the people table, sorted alphabetically.
sql(
"""
SELECT name
FROM people
ORDER BY name
LIMIT 5;
"""
)

# Select the title and duration for every film, from longest duration to shortest.

sql(
"""
SELECT title, duration
FROM films
ORDER BY duration DESC
LIMIT 5;
"""
)

Unnamed: 0,name
0,50 Cent
1,A. Michael Baldwin
2,A. Raven Cruz
3,A.J. Buckley
4,A.J. DeLucia


Unnamed: 0,title,duration
0,Carlos,334.0
1,"Blood In, Blood Out",330.0
2,Heaven's Gate,325.0
3,The Legend of Suriyothai,300.0
4,Das Boot,293.0


In [35]:
# Select the release_year, duration, and title of films ordered by their release year and duration, in that order.
sql(
"""
SELECT release_year, duration, title
FROM films
ORDER BY release_year, duration
LIMIT 5;
"""
)

# Select the certification, release_year, and title from films ordered first by certification (alphabetically) and second by release year, starting with the most recent year.
sql(
"""
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year DESC 
LIMIT 5;
"""
)

Unnamed: 0,release_year,duration,title
0,1916.0,123.0,Intolerance: Love's Struggle Throughout the Ages
1,1920.0,110.0,Over the Hill to the Poorhouse
2,1925.0,151.0,The Big Parade
3,1927.0,145.0,Metropolis
4,1929.0,100.0,The Broadway Melody


Unnamed: 0,certification,release_year,title
0,Approved,1967.0,You Only Live Twice
1,Approved,1967.0,In Cold Blood
2,Approved,1967.0,Point Blank
3,Approved,1966.0,A Funny Thing Happened on the Way to the Forum
4,Approved,1966.0,A Man for All Seasons


## 4.2. Grouping and Filtering Grouped Data

- `GROUP BY`: Groups rows that share the same values in one or more columns, so that aggregate functions return one result per group.


`GROUP BY` organizes rows into groups. It can be combined with `WHERE`, which filters individual rows before grouping, and with `HAVING`, which filters the groups after they have been created. The execution order of the clauses when `HAVING` is used is:


5º `SELECT` AGGREGATION(column_name)

1º `FROM` table

2º `WHERE` condition (filter individual records)

3º `GROUP BY` column_name

4º `HAVING` condition (filter grouped records)

6º `ORDER BY` column_name `ASC`|`DESC`

7º `LIMIT` n
    

- `WHERE` filters individual records, `HAVING` filters grouped records.

- An aggregate function in the `SELECT` clause determines **what values will be displayed** in the final output for each group.
- An aggregate function in the `HAVING` clause determines **which groups will be displayed** in the final output.
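The two filtering stages can be followed on a handful of rows. This is a sketch under assumptions: the table `films_demo` and its values are made up, and Python's built-in `sqlite3` replaces DuckDB so that the block runs on its own.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE films_demo (release_year INTEGER, budget REAL)")
demo.executemany(
    "INSERT INTO films_demo VALUES (?, ?)",
    [(1989, 90.0), (1995, 10.0), (1995, 30.0), (2005, 60.0), (2005, 80.0)],
)

rows = demo.execute(
    """
    SELECT release_year, AVG(budget) AS avg_budget
    FROM films_demo
    WHERE release_year > 1990   -- row filter: drops the 1989 row before grouping
    GROUP BY release_year
    HAVING AVG(budget) > 50     -- group filter: drops 1995 (average 20.0)
    ORDER BY release_year
    """
).fetchall()

print(rows)  # [(2005, 70.0)]
```

The 1989 row has the largest budget, yet it never reaches `HAVING` because `WHERE` removes it first.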


In [36]:
# Select the release_year and count of films released in each year aliased as film_count.
sql(
"""
SELECT release_year, COUNT(*) AS film_count
FROM films
GROUP BY release_year
LIMIT 5;
"""
)

# Select the release_year and average duration aliased as avg_duration of all films, grouped by release_year.

sql(
"""
SELECT release_year, AVG(duration) AS avg_duration
FROM films
GROUP BY release_year
LIMIT 5;
"""
)

Unnamed: 0,release_year,film_count
0,1949.0,2
1,1960.0,3
2,1961.0,5
3,1963.0,8
4,1968.0,11


Unnamed: 0,release_year,avg_duration
0,1929.0,105.0
1,1940.0,108.0
2,1945.0,103.75
3,1950.0,107.0
4,1958.0,108.0


In [37]:
# Select the release_year, country, and the maximum budget aliased as max_budget for each year and each country; sort your results by release_year and country.

sql(
"""
SELECT release_year, country, MAX(budget) AS max_budget
FROM films
GROUP BY release_year, country
ORDER BY release_year, country
LIMIT 5;
"""
)

Unnamed: 0,release_year,country,max_budget
0,1916.0,USA,385907.0
1,1920.0,USA,100000.0
2,1925.0,USA,245000.0
3,1927.0,Germany,6000000.0
4,1929.0,Germany,


In [46]:

# Select country from the films table, and get the distinct count of certification aliased as certification_count.
# Group the results by country.
# Filter the unique count of certifications to only results greater than 10.
sql(
"""
SELECT country, COUNT(DISTINCT(certification)) AS certification_count
FROM films
GROUP BY country
HAVING COUNT(DISTINCT(certification)) > 10;
"""
)


# Select the country and the average budget as average_budget, rounded to two decimal places, from films.
# Group the results by country.
# Filter the results to countries with an average budget of more than one billion (1000000000).
# Sort by descending order of the average_budget.
sql(
"""
SELECT country, ROUND(AVG(budget), 2) AS average_budget
FROM films
GROUP BY country
HAVING AVG(budget) > 1000000000
ORDER BY average_budget DESC;       
"""
)


Unnamed: 0,country,certification_count
0,USA,12


Unnamed: 0,country,average_budget
0,South Korea,1383960000.0
1,Hungary,1260000000.0


In [50]:

# Select the release_year for each film in the films table, filter for records released after 1990, and group by release_year.
sql(
"""
SELECT release_year
FROM films
WHERE release_year > 1990
GROUP BY release_year
LIMIT 5;       
"""
)

# Modify the query to include the average budget aliased as avg_budget and average gross aliased as avg_gross for the results we have so far.
sql(
"""
SELECT  release_year, 
        AVG(budget) AS avg_budget,
        AVG(gross) AS avg_gross        
FROM films
WHERE release_year > 1990
GROUP BY release_year
LIMIT 5;       
"""
)

# Modify the query once more so that only years with an average budget of greater than 60 million are included.
sql(
"""
SELECT  release_year, 
        AVG(budget) AS avg_budget,
        AVG(gross) AS avg_gross        
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000;       
"""
)

# Finally, order the results from the highest average gross and limit to one.
sql(
"""
SELECT  release_year, 
        AVG(budget) AS avg_budget,
        AVG(gross) AS avg_gross        
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY avg_gross DESC
LIMIT 1;       
"""
)


Unnamed: 0,release_year
0,1996.0
1,2001.0
2,2004.0
3,2008.0
4,2014.0


Unnamed: 0,release_year,avg_budget,avg_gross
0,1996.0,31620610.0,42044170.0
1,2001.0,37687310.0,43255720.0
2,2004.0,46865340.0,40726530.0
3,2008.0,41804890.0,44573510.0
4,2014.0,35325800.0,62412140.0


Unnamed: 0,release_year,avg_budget,avg_gross
0,2005.0,70323940.0,41159140.0
1,2006.0,93968930.0,39237860.0


Unnamed: 0,release_year,avg_budget,avg_gross
0,2005.0,70323940.0,41159140.0
