In [1]:
import pandas as pd
import sqlalchemy as sa
import psycopg2 as ps
from sqlalchemy import create_engine

In [2]:
%load_ext sql
%sql postgresql://postgres:lingga28@localhost:2828/datacamp
# Use the same port as the %sql connection above
conn = create_engine('postgresql://postgres:lingga28@localhost:2828/datacamp')

# 1. Aggregate functions and data types
### Exercises
Aggregate functions are another valuable tool for the SQL programmer. They are used extensively across businesses to calculate important metrics, such as the average cost of making a film.

You know five different aggregate functions:

- AVG()
- SUM()
- MIN()
- MAX()
- COUNT()

Test your knowledge of what data types they are compatible with.

### Instruction
Classify the function based on what data type it is compatible with.

### Answers
Numerical Data Only:
- AVG()
- SUM()

Various Data Types:
- MIN()
- MAX()
- COUNT()
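
The split above can be checked on a tiny table. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module and a hypothetical three-row `films` table rather than the course's PostgreSQL database, and it only demonstrates that MIN(), MAX(), and COUNT() accept text values while AVG() and SUM() are applied to a numeric column.

```python
import sqlite3

# Hypothetical in-memory table; not the course's cinema.films data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE films (title TEXT, duration INTEGER)")
db.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Charlie", 100), ("Alpha", 120), ("Bravo", 140)],
)

# MIN(), MAX(), and COUNT() work on a text column (alphabetical order).
text_stats = db.execute(
    "SELECT MIN(title), MAX(title), COUNT(title) FROM films"
).fetchone()

# AVG() and SUM() are applied to a numeric column.
numeric_stats = db.execute(
    "SELECT AVG(duration), SUM(duration) FROM films"
).fetchone()

print(text_stats)     # ('Alpha', 'Charlie', 3)
print(numeric_stats)  # (120.0, 360)
```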

# 2. Practice with aggregate functions
### Exercises
Now let's try extracting summary information from a table using these new aggregate functions. Summarizing is helpful in real life when extracting top-line details from your dataset. Perhaps you'd like to know how old the oldest film in the films table is, what the most expensive film is, or how many films you have listed.

Now it's your turn to get more insights about the films table!

### task 1
### Instruction
Use the SUM() function to calculate the total duration of all films and alias with total_duration.

In [3]:
%%sql
-- Query the sum of film durations
SELECT SUM(duration) AS total_duration
FROM cinema.films;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


total_duration
534882.0


### task 2
### Instruction
Calculate the average duration of all films and alias with average_duration.

In [4]:
%%sql

-- Calculate the average duration of all films
SELECT AVG(duration) AS average_duration
FROM cinema.films;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


average_duration
107.94793138244198


### task 3
### Instruction
Find the most recent release_year in the films table, aliasing as latest_year.

In [5]:
%%sql

-- Find the latest release_year
SELECT MAX(release_year) AS latest_year
FROM cinema.films;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


latest_year
2016.0


### task 4
### Instruction
Find the duration of the shortest film and use the alias shortest_film.

In [6]:
%%sql

-- Find the duration of the shortest film
SELECT MIN(duration) AS shortest_film
FROM cinema.films;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


shortest_film
7.0


# 3. Combining aggregate functions with WHERE
### Exercises
Combining aggregate functions with WHERE gives you a powerful tool for more granular insights, for example, the total budget of movies made from the year 2010 onwards.

This combination is useful when you only want to summarize a subset of your data. In your film-industry role, for example, you might summarize each certification category to compare how each performs or to check whether one certification has a higher average budget than another.

Let's see what insights you can gain about the financials in the dataset.
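
As a rough illustration of how WHERE narrows the rows before the aggregate runs, here is a minimal sketch using Python's built-in `sqlite3` module. The table and its three rows are hypothetical and stand in for the course's PostgreSQL `films` table.

```python
import sqlite3

# Hypothetical in-memory table; not the course's cinema.films data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE films (title TEXT, release_year INTEGER, budget INTEGER)")
db.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Alpha", 2005, 100), ("Bravo", 2010, 200), ("Charlie", 2015, 300)],
)

# Without WHERE, SUM() covers every row.
total_all = db.execute("SELECT SUM(budget) FROM films").fetchone()[0]

# WHERE filters rows first; SUM() then runs on the remaining rows only.
total_recent = db.execute(
    "SELECT SUM(budget) AS total_budget FROM films WHERE release_year >= 2010"
).fetchone()[0]

print(total_all, total_recent)  # 600 500
```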

### task 1
### Instruction
Use SUM() to calculate the total gross for all films made in the year 2000 or later, and use the alias total_gross.

In [7]:
%%sql

-- Calculate the sum of gross from the year 2000 or later
SELECT SUM(gross) AS total_gross
FROM cinema.films
WHERE release_year >= 2000;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


total_gross
150900926358.0


### task 2
### Instruction
Calculate the average amount grossed by all films whose titles start with the letter 'A' and alias with avg_gross_A.

In [8]:
%%sql

-- Calculate the average gross of films that start with A
SELECT AVG(gross) AS avg_gross_A
FROM cinema.films
WHERE title LIKE 'A%';

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


avg_gross_a
47893236.42248062


### task 3
### Instruction
Find the lowest gross among films released in 1994 and use the alias lowest_gross.

In [9]:
%%sql

-- Calculate the lowest gross film in 1994
SELECT MIN(gross) AS lowest_gross
FROM cinema.films
WHERE release_year = 1994;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


lowest_gross
125169.0


### task 4
### Instruction
Find the highest gross among films released between 2000 and 2012, inclusive, and use the alias highest_gross.

In [10]:
%%sql

-- Calculate the highest gross film released between 2000-2012
SELECT MAX(gross) AS highest_gross
FROM cinema.films
WHERE release_year BETWEEN 2000 AND 2012;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


highest_gross
760505847.0


# 4. ROUND() with a negative parameter
### Exercises
ROUND() also accepts a negative number as its decimal-place parameter, which rounds to the left of the decimal point: -2 rounds to the nearest hundred, -3 to the nearest thousand. This can come in handy if your manager only needs to know the average number of facebook_likes to the nearest hundred, since granularity below one hundred likes won't impact decision making.

Social media plays a significant role in determining success. If a movie trailer is posted and barely gets any likes, the movie itself may not be successful. Remember how 2020's "Sonic the Hedgehog" movie got a revamp after the public saw the trailer?

Let's apply this to other parts of the dataset and see what the benchmark is for movie budgets so, in the future, it's clear whether the film is above or below budget.
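
To get a feel for what a negative decimal-place argument does, here is a small sketch using Python's built-in `round()`, which also accepts a negative second argument. It is an analogy rather than the SQL function itself (the two can differ on exact ties), and the input value is hypothetical.

```python
# Hypothetical average number of likes; not taken from the dataset.
avg_likes = 7802.93

to_hundreds = round(avg_likes, -2)   # two places left of the decimal point
to_thousands = round(avg_likes, -3)  # three places left of the decimal point

print(to_hundreds, to_thousands)  # 7800.0 8000.0
```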

### Instruction
Calculate the average budget from the films table, aliased as avg_budget_thousands, and round to the nearest thousand.

In [15]:
%%sql

-- Calculate the average budget rounded to the thousands
SELECT ROUND(AVG(budget), -3) AS avg_budget_thousands
FROM cinema.films;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


avg_budget_thousands
39903000


# 5. Using arithmetic
### Exercises
SQL arithmetic comes in handy when your table is missing a metric you want to review. Suppose you have some data on movie ticket sales, but the table only has fields for ticket price and discount. In that case, you could combine these by subtracting the discount from the ticket price to get the amount the film-goer paid.

You have seen that SQL can act strangely when dividing integers. What is the result if you divide a discount of two dollars by the paid_price of ten dollars to get the discount percentage?

### Possible Answers
- A. 2
- B. 0.222
- C. 0
- D. 0.2

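Answer: C. Both values are integers, so SQL performs integer division and discards the fractional part: 2 / 10 returns 0 rather than 0.2.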
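
The integer-division behavior can be reproduced with a quick sketch. This one assumes Python's built-in `sqlite3` module instead of the course's PostgreSQL database; for these two integer literals SQLite truncates the same way.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Both operands are integers, so the fractional part is discarded.
int_result = db.execute("SELECT 2 / 10").fetchone()[0]

# Writing one operand as a decimal forces non-integer division.
dec_result = db.execute("SELECT 2 / 10.0").fetchone()[0]

print(int_result, dec_result)  # 0 0.2
```

This is why the later exercises divide by 60.0 and 10.0 and multiply by 100.0 rather than using whole-number literals.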

# 6. Aliasing with functions
### Exercises
Aliasing can be a lifesaver, especially as we start to write more complex SQL queries with multiple criteria. Aliases help you keep your code clean and readable. For example, if you find the MAX() value of several fields without aliasing, the result has several columns all called max, with no way to tell which is which. You can fix this with aliasing.
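
A minimal sketch of aliased aggregates, assuming Python's built-in `sqlite3` module and a hypothetical two-row table. SQLite labels unaliased columns differently from PostgreSQL, so only the aliased form is shown here; the point is that each result column carries the name you gave it.

```python
import sqlite3

# Hypothetical in-memory table; not the course's cinema.films data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE films (budget INTEGER, gross INTEGER)")
db.executemany("INSERT INTO films VALUES (?, ?)", [(100, 250), (300, 150)])

cur = db.execute(
    "SELECT MAX(budget) AS max_budget, MAX(gross) AS max_gross FROM films"
)

# cursor.description holds one tuple per result column; item 0 is its name.
column_names = [col[0] for col in cur.description]
row = cur.fetchone()

print(column_names)  # ['max_budget', 'max_gross']
print(row)           # (300, 250)
```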

Now, it's over to you to clean up the following queries.

### task 1
### Instruction
Select the title and duration in hours for all films and alias as duration_hours; since the current durations are in minutes, you'll need to divide duration by 60.0.

In [18]:
%%sql

-- Select the title and calculate duration_hours from films
SELECT title, duration / 60.0 AS duration_hours
FROM cinema.films
LIMIT 3; -- added only to keep the displayed output short

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


title,duration_hours
Intolerance: Love's Struggle Throughout the Ages,2.05
Over the Hill to the Poorhouse,1.8333333333333333
The Big Parade,2.5166666666666666


### task 2
### Instruction
Calculate the percentage of people who are no longer alive and alias the result as percentage_dead.

In [19]:
%%sql

-- Calculate the percentage of people who are no longer alive
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM cinema.people;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


percentage_dead
9.360485887817076
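
The query above works because COUNT(deathdate) counts only non-NULL values while COUNT(*) counts every row. A small sketch of that difference, assuming Python's built-in `sqlite3` module and a hypothetical four-row `people` table:

```python
import sqlite3

# Hypothetical in-memory table; not the course's cinema.people data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT, deathdate TEXT)")
db.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("A", "1990-01-01"), ("B", None), ("C", None), ("D", None)],
)

# COUNT(column) skips NULLs; COUNT(*) counts all rows.
dead, total, pct = db.execute(
    "SELECT COUNT(deathdate), COUNT(*), "
    "COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead "
    "FROM people"
).fetchone()

print(dead, total, pct)  # 1 4 25.0
```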


### task 3
### Instruction
Find how many decades the films table covers by using MIN() and MAX() and alias as number_of_decades.

In [22]:
%%sql

-- Find the number of decades in the films table
SELECT (MAX(release_year) - MIN(release_year)) / 10.0 AS number_of_decades
FROM cinema.films;

 * postgresql://postgres:***@localhost:2828/datacamp
1 rows affected.


number_of_decades
10.0


# 7. Rounding results
### Exercises
You found some valuable insights in the previous exercise, but many of the results were inconveniently long. We forgot to round! We won't make you redo them all; however, you'll update the worst offender in this exercise.

### Instruction
Update the query by adding ROUND() around the calculation and round to two decimal places.

In [29]:
%%sql

-- Round duration_hours to two decimal places
SELECT title, ROUND(duration / 60.0, 2) AS duration_hours
FROM cinema.films
LIMIT 3; -- added only to keep the displayed output short

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


title,duration_hours
Intolerance: Love's Struggle Throughout the Ages,2.05
Over the Hill to the Poorhouse,1.83
The Big Parade,2.52
