In [1]:
import pandas as pd
import sqlalchemy as sa
import psycopg2 as ps
from sqlalchemy import create_engine

In [2]:
%load_ext sql
%sql postgresql://postgres:lingga28@localhost:2828/datacamp
conn = create_engine('postgresql://postgres:lingga28@localhost:2828/datacamp')

# 1. Basic Correlated Subqueries
### Exercises
Correlated subqueries are subqueries that reference one or more columns in the main query. Correlated subqueries depend on information in the main query to run, and thus, cannot be executed on their own.

Correlated subqueries are evaluated in SQL once per row of data retrieved -- a process that takes a lot more computing power and time than a simple subquery.
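The per-row behavior described above can be sketched with Python's built-in `sqlite3` module, using an in-memory database rather than the course's PostgreSQL server. This is a minimal illustration under stated assumptions: the table name `matches` and its rows are invented stand-ins for the course's `match` table. It applies this exercise's "above 3 times the average" filter. It also tries to run the inner query by itself, which fails because that query refers to the outer alias `m`.

```python
import sqlite3

# Hypothetical toy data: `matches` stands in for the course's `match` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, country_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 0), (2, 1, 0, 1), (3, 1, 1, 0),
        (4, 1, 0, 1), (5, 1, 1, 0), (6, 1, 6, 3),  # match 6 is the outlier
        (7, 2, 2, 2), (8, 2, 3, 3),
    ],
)

# The inner query refers to m.country_id from the outer query, so
# conceptually it is re-evaluated for each outer row.
OUTLIER_SQL = """
SELECT m.id, m.country_id, m.home_goal, m.away_goal
FROM matches AS m
WHERE (m.home_goal + m.away_goal) >
      (SELECT AVG((sub.home_goal + sub.away_goal) * 3)
       FROM matches AS sub
       WHERE sub.country_id = m.country_id)
ORDER BY m.id
"""

# The inner query on its own: `m` is undefined outside the main query.
INNER_ONLY_SQL = """
SELECT AVG((sub.home_goal + sub.away_goal) * 3)
FROM matches AS sub
WHERE sub.country_id = m.country_id
"""


def inner_runs_alone(connection):
    """Return True if the subquery can be executed without its outer query."""
    try:
        connection.execute(INNER_ONLY_SQL).fetchall()
        return True
    except sqlite3.OperationalError:
        return False


outliers = conn.execute(OUTLIER_SQL).fetchall()
print(outliers)                # [(6, 1, 6, 3)]
print(inner_runs_alone(conn))  # False
```

In country 1 the six match totals are 1, 1, 1, 1, 1 and 9, so the threshold is 3 × 14/6 = 7 goals and only match 6 passes. Country 2's threshold is 15, which neither of its matches reaches.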

In this exercise, you will practice using correlated subqueries to examine matches with scores that are extreme outliers for each country -- above 3 times the average score!

### Instructions
- Select the country_id, date, home_goal, and away_goal columns in the main query.
- Complete the AVG value in the subquery.
- Complete the subquery column references, so that country_id is matched in the main and subquery.

In [3]:
%%sql

SELECT 
	-- Select country ID, date, home, and away goals from match
	main.country_id,
    main.date,
    main.home_goal, 
    main.away_goal
FROM match AS main
WHERE 
	-- Filter the main query by the subquery
	(home_goal + away_goal) > 
        (SELECT AVG((sub.home_goal + sub.away_goal) * 3)
         FROM match AS sub
         -- Join the main query to the subquery in WHERE
         WHERE main.country_id = sub.country_id)
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country_id,date,home_goal,away_goal
1,2011-10-29,4,5
1729,2011-08-28,8,2
1729,2012-12-29,7,3


# 2. Correlated subquery with multiple conditions
### Exercises
Correlated subqueries are useful for matching data across multiple columns. In the previous exercise, you generated a list of matches with extremely high scores for each country. In this exercise, you're going to add an additional column for matching to answer the question -- what was the highest scoring match for each country, in each season?

*Note: this query may take a while to run.*
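Here is a minimal sketch of the two-column correlation. It again uses `sqlite3` in memory with an invented `matches` table, a hypothetical stand-in for `match`. Because the inner WHERE matches on both `country_id` and `season`, the MAX is computed separately for each country and season pair.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `match` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, country_id INTEGER, season TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2011/2012", 2, 1),
        (2, 1, "2011/2012", 4, 2),  # highest for country 1, 2011/2012
        (3, 1, "2012/2013", 1, 1),  # tied highest for country 1, 2012/2013
        (4, 1, "2012/2013", 2, 0),  # tied highest for country 1, 2012/2013
        (5, 2, "2011/2012", 0, 0),
        (6, 2, "2011/2012", 3, 2),  # highest for country 2, 2011/2012
    ],
)

# Both conditions in the inner WHERE must hold, so the MAX is computed
# per (country_id, season) pair of the current outer row.
TOP_MATCH_SQL = """
SELECT m.id, m.country_id, m.season, m.home_goal, m.away_goal
FROM matches AS m
WHERE (m.home_goal + m.away_goal) =
      (SELECT MAX(sub.home_goal + sub.away_goal)
       FROM matches AS sub
       WHERE sub.country_id = m.country_id
         AND sub.season = m.season)
ORDER BY m.id
"""

top_ids = [row[0] for row in conn.execute(TOP_MATCH_SQL)]
print(top_ids)  # [2, 3, 4, 6]
```

The equality test keeps every match tied for the maximum, which is why matches 3 and 4 both appear.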

### Instructions
- Select the country_id, date, home_goal, and away_goal columns in the main query.
- Complete the subquery: Select the matches with the highest number of total goals.
- Match the subquery to the main query using country_id and season.
- Fill in the correct logical operator so that total goals equals the max goals recorded in the subquery.

In [4]:
%%sql

SELECT 
	-- Select country ID, date, home, and away goals from match
	main.country_id,
    main.date,
    main.home_goal,
    main.away_goal
FROM match AS main
WHERE 
	-- Filter for matches with the highest number of goals scored
	(home_goal + away_goal) =
        (SELECT MAX(sub.home_goal + sub.away_goal)
         FROM match AS sub
         WHERE main.country_id = sub.country_id
               AND main.season = sub.season)
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country_id,date,home_goal,away_goal
1,2011-10-29,4,5
1,2012-11-17,2,6
1,2012-12-09,1,7


# 3. Nested simple subqueries
### Exercises
Nested subqueries can be either simple or correlated.

A nested simple subquery, just like an unnested one, can be executed independently of the outer query. A nested correlated subquery, by contrast, references the outer query, so the outer and inner queries must run together to produce results.

In this exercise, you will practice creating a nested subquery to examine the highest total number of goals in each season, overall, and during July across all seasons.
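To illustrate the difference, the sketch below builds this exercise's query and also runs the innermost July filter by itself. That works because the July filter is a simple subquery. The sketch uses Python `sqlite3` in memory, with invented `matches` rows standing in for `match`. SQLite has no `EXTRACT`, so it uses `strftime('%m', match_date)` instead; `match_date` is a hypothetical column name.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `match` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, season TEXT, match_date TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2011/2012", "2011-07-30", 2, 1),
        (2, "2011/2012", "2011-08-15", 5, 3),
        (3, "2012/2013", "2012-07-28", 4, 1),
        (4, "2012/2013", "2013-03-02", 6, 4),
    ],
)

# Innermost simple subquery: IDs of July matches. It references nothing
# outside itself, so it can also be run on its own.
# strftime('%m', ...) is SQLite's stand-in for EXTRACT(MONTH FROM date).
JULY_IDS_SQL = """
SELECT id FROM matches
WHERE strftime('%m', match_date) = '07'
ORDER BY id
"""

NESTED_SQL = """
SELECT
    season,
    MAX(home_goal + away_goal) AS max_goals,
    (SELECT MAX(home_goal + away_goal) FROM matches) AS overall_max_goals,
    (SELECT MAX(home_goal + away_goal)
     FROM matches
     WHERE id IN (
         SELECT id FROM matches
         WHERE strftime('%m', match_date) = '07')) AS july_max_goals
FROM matches
GROUP BY season
ORDER BY season
"""

july_ids = conn.execute(JULY_IDS_SQL).fetchall()
season_rows = conn.execute(NESTED_SQL).fetchall()
print(july_ids)     # [(1,), (3,)]
print(season_rows)  # [('2011/2012', 8, 10, 5), ('2012/2013', 10, 10, 5)]
```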

### Instructions
- Complete the main query to select the season and the max total goals in a match for each season. Name this max_goals.
- Complete the first simple subquery to select the max total goals in a match across all seasons. Name this overall_max_goals.
- Complete the nested subquery to select the maximum total goals in a match played in July across all seasons.
- Select the maximum total goals in the outer subquery. Name this entire subquery july_max_goals.

In [7]:
%%sql

SELECT
	-- Select the season and max goals scored in a match
	season,
    MAX(home_goal + away_goal) AS max_goals,
    -- Select the overall max goals scored in a match
   (SELECT MAX(home_goal + away_goal) FROM match) AS overall_max_goals,
   -- Select the max number of goals scored in any match in July
   (SELECT MAX(home_goal + away_goal) 
    FROM match
    WHERE id IN (
          SELECT id FROM match WHERE EXTRACT(MONTH FROM date) = 07)) AS july_max_goals
FROM match
GROUP BY season;

 * postgresql://postgres:***@localhost:2828/datacamp
4 rows affected.


season,max_goals,overall_max_goals,july_max_goals
2013/2014,10,11,7
2012/2013,11,11,7
2014/2015,10,11,7
2011/2012,10,11,7


# 4. Nest a subquery in FROM
### Exercises
What's the average number of matches per season where a team scored 5 or more goals? How does this differ by country?

Let's use nested subqueries in FROM to perform this operation. In the real world, you will probably find that nesting multiple subqueries is a task you don't have to perform often. In some cases, however, you may struggle to group by the column you want, or to calculate information that requires multiple mathematical transformations (e.g., an AVG of a COUNT).

Nesting subqueries lets you perform your transformations one step at a time. You compute one set of results, wrap it in a subquery, and then apply the next transformation on top of it. This is often the easiest way to get accurate information about your data. Let's get to it!
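The step-by-step nesting can be sketched as follows, assuming toy `country` and `matches` tables in an in-memory SQLite database. All names and rows are invented for illustration. Each level of nesting does one transformation: filter, then COUNT per season, then AVG per country.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `country` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, country_id INTEGER, season TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [(1, "Belgium"), (2, "England"), (3, "Poland")],
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2011/2012", 5, 0),
        (2, 1, "2011/2012", 1, 6),
        (3, 1, "2011/2012", 1, 1),
        (4, 1, "2012/2013", 5, 5),
        (5, 2, "2011/2012", 7, 0),
        (6, 2, "2011/2012", 0, 5),
        (7, 2, "2011/2012", 5, 2),
        (8, 3, "2011/2012", 2, 2),
    ],
)

# Step 1 (inner_s): keep matches where a team scored 5+ goals.
# Step 2 (outer_s): COUNT those matches per country and season.
# Step 3 (main query): AVG the per-season counts for each country.
AVG_OF_COUNT_SQL = """
SELECT c.name AS country,
       AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM country AS c
LEFT JOIN (
    SELECT country_id, season, COUNT(id) AS matches
    FROM (
        SELECT country_id, season, id
        FROM matches
        WHERE home_goal >= 5 OR away_goal >= 5
    ) AS inner_s
    GROUP BY country_id, season
) AS outer_s
ON c.id = outer_s.country_id
GROUP BY c.name
ORDER BY c.name
"""

averages = conn.execute(AVG_OF_COUNT_SQL).fetchall()
print(averages)  # [('Belgium', 1.5), ('England', 3.0), ('Poland', None)]
```

Belgium has two qualifying matches in 2011/2012 and one in 2012/2013, so its average is 1.5. Poland has none, so the LEFT JOIN leaves it with NULL (`None` in Python).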

### task 1
### Instruction
Generate a list of matches where at least one team scored 5 or more goals.

In [8]:
%%sql

-- Select matches where a team scored 5+ goals
SELECT
	country_id,
    season,
	id
FROM match
WHERE home_goal >=5 OR away_goal >=5
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country_id,season,id
1,2011/2012,764
1,2011/2012,766
1,2011/2012,781


### task 2
### Instruction
- Turn the query from the previous step into a subquery in the FROM statement.
- COUNT the match ids generated in the previous step, and group the query by country_id and season.

In [9]:
%%sql

-- Count match ids
SELECT
    country_id,
    season,
    COUNT(id) AS matches
-- Set up and alias the subquery
FROM (
	SELECT
    	country_id,
    	season,
    	id
	FROM match
	WHERE home_goal >= 5 OR away_goal >= 5) 
    AS subquery
-- Group by country_id and season
GROUP BY country_id, season
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country_id,season,matches
19694,2012/2013,5
21518,2012/2013,23
13274,2011/2012,24


### task 3
### Instruction
- Finally, declare the same query from step 2 as a subquery in FROM with the alias outer_s.
- Left join it to the country table using the outer query's country_id column.
- Calculate an AVG of high scoring matches per country in the main query.

In [10]:
%%sql

SELECT
	c.name AS country,
    -- Calculate the average matches per season
	AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM country AS c
-- Left join outer_s to country
LEFT JOIN (
  SELECT country_id, season,
         COUNT(id) AS matches
  FROM (
    SELECT country_id, season, id
	FROM match
	WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
  -- Close parentheses and alias the subquery
  GROUP BY country_id, season) AS outer_s
ON c.id = outer_s.country_id
GROUP BY country;

 * postgresql://postgres:***@localhost:2828/datacamp
11 rows affected.


# 5. Clean up with CTEs
### Exercises
In chapter 2, you generated a list of countries and the number of matches in each country with 10 or more total goals. The query in that exercise used a subquery in the FROM statement to filter the matches before counting them in the main query. Below is the query you created:

    SELECT
      c.name AS country,
      COUNT(sub.id) AS matches
    FROM country AS c
    INNER JOIN (
      SELECT country_id, id
      FROM match
      WHERE (home_goal + away_goal) >= 10) AS sub
    ON c.id = sub.country_id
    GROUP BY country;

You can declare one or more subqueries as common table expressions (CTEs) ahead of your main query. This is an excellent tool for organizing a query and placing its steps in a logical order.

In this exercise, let's rewrite a similar query using a CTE.
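As a quick check that a CTE only reorganizes the same logic, this sketch runs the chapter 2 query and its CTE form against the same invented data and compares the results. It uses an in-memory SQLite database, with `matches` as a hypothetical stand-in for `match`.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `country` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, country_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany("INSERT INTO country VALUES (?, ?)", [(1, "Belgium"), (2, "England")])
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [(1, 1, 6, 4), (2, 1, 2, 2), (3, 2, 8, 2), (4, 2, 5, 5), (5, 2, 1, 0)],
)

# Chapter 2 style: the filtered match list is a subquery in FROM.
SUBQUERY_SQL = """
SELECT c.name AS country, COUNT(sub.id) AS matches
FROM country AS c
INNER JOIN (
    SELECT country_id, id FROM matches
    WHERE (home_goal + away_goal) >= 10
) AS sub ON c.id = sub.country_id
GROUP BY c.name
ORDER BY c.name
"""

# Same logic, with the subquery declared up front as a CTE.
CTE_SQL = """
WITH match_list AS (
    SELECT country_id, id FROM matches
    WHERE (home_goal + away_goal) >= 10
)
SELECT c.name AS country, COUNT(match_list.id) AS matches
FROM country AS c
INNER JOIN match_list ON c.id = match_list.country_id
GROUP BY c.name
ORDER BY c.name
"""

via_subquery = conn.execute(SUBQUERY_SQL).fetchall()
via_cte = conn.execute(CTE_SQL).fetchall()
print(via_subquery == via_cte, via_cte)  # True [('Belgium', 1), ('England', 2)]
```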

### Instructions
- Complete the syntax to declare your CTE.
- Select the country_id and match id from the match table in your CTE.
- Left join the CTE to the league table using country_id.

In [11]:
%%sql

-- Set up your CTE
WITH match_list AS (
    SELECT 
  		country_id, 
  		id
    FROM match
    WHERE (home_goal + away_goal) >= 10)
-- Select league and count of matches from the CTE
SELECT
    l.name AS league,
    COUNT(match_list.id) AS matches
FROM league AS l
-- Join the CTE to the league table
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name;

 * postgresql://postgres:***@localhost:2828/datacamp
11 rows affected.


league,matches
Switzerland Super League,0
Poland Ekstraklasa,0
Netherlands Eredivisie,1
Scotland Premier League,0
France Ligue 1,0
Spain LIGA BBVA,4
Germany 1. Bundesliga,1
Italy Serie A,0
Portugal Liga ZON Sagres,0
England Premier League,3


# 6. Organizing with CTEs
### Exercises
In the previous exercise, you used a common table expression to rewrite a query based on one you completed in chapter 2.

This time, let's expand on the exercise by looking at details about matches with very high scores using CTEs. Just like a subquery in FROM, you can join tables inside a CTE.
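Here is a minimal sketch of a join inside a CTE. It assumes invented `league` and `matches` tables in an in-memory SQLite database, with `match_date` as a hypothetical column name standing in for `date`. The main query then filters on the `total_goals` column computed inside the CTE.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `league` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE league (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, country_id INTEGER, match_date TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO league VALUES (?, ?)",
    [(1, "Spain LIGA BBVA"), (2, "England Premier League")],
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2011-08-01", 3, 1),
        (2, 2, "2011-08-28", 8, 2),
        (3, 2, "2012-12-29", 7, 3),
        (4, 2, "2013-01-05", 1, 1),
    ],
)

# The join happens inside the CTE, so the main query can treat
# match_list as one table that already carries the league name.
HIGH_SCORING_SQL = """
WITH match_list AS (
    SELECT l.name AS league, m.match_date,
           m.home_goal, m.away_goal,
           (m.home_goal + m.away_goal) AS total_goals
    FROM matches AS m
    LEFT JOIN league AS l ON m.country_id = l.id
)
SELECT league, match_date, home_goal, away_goal
FROM match_list
WHERE total_goals >= 10
ORDER BY match_date
"""

rows = conn.execute(HIGH_SCORING_SQL).fetchall()
for row in rows:
    print(row)
```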

### Instructions
- Declare your CTE, where you create a list of all matches with the league name.
- Select the league, date, home, and away goals from the CTE.
- Filter the main query for matches with 10 or more goals.

In [12]:
%%sql

-- Set up your CTE
WITH match_list as (
  -- Select the league, date, home, and away goals
    SELECT 
  		l.name AS league, 
     	date, 
  		m.home_goal, 
  		m.away_goal,
       (m.home_goal + m.away_goal) AS total_goals
    FROM match AS m
    LEFT JOIN league as l ON m.country_id = l.id)
-- Select the league, date, home, and away goals from the CTE
SELECT league, date, home_goal, away_goal
FROM match_list
-- Filter by total goals
WHERE total_goals >= 10;

 * postgresql://postgres:***@localhost:2828/datacamp
9 rows affected.


league,date,home_goal,away_goal
England Premier League,2011-08-28,8,2
England Premier League,2012-12-29,7,3
England Premier League,2013-05-19,5,5
Germany 1. Bundesliga,2013-03-30,9,2
Netherlands Eredivisie,2011-11-06,6,4
Spain LIGA BBVA,2013-10-30,7,3
Spain LIGA BBVA,2015-04-05,9,1
Spain LIGA BBVA,2015-05-23,7,3
Spain LIGA BBVA,2014-09-20,2,8


# 7. CTEs with nested subqueries
### Exercises
If you find yourself listing multiple subqueries in the FROM clause with nested statements, your query will likely become long, complex, and difficult to read.

Since many queries are written with the intention of being saved and re-run in the future, proper organization is key to a seamless workflow. Arranging subqueries as CTEs will save you time, space, and confusion in the long run!
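Here is a sketch of the same layout, using in-memory SQLite, invented rows, and hypothetical `matches` and `match_date` names. The nested subquery sits inside the CTE, so the main query only joins and averages. `strftime('%m', ...)` replaces PostgreSQL's `EXTRACT(MONTH FROM ...)`.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `league` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE league (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, country_id INTEGER, season TEXT, "
    "match_date TEXT, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO league VALUES (?, ?)",
    [(1, "England Premier League"), (2, "Spain LIGA BBVA")],
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2013/2014", "2013-08-17", 1, 0),
        (2, 1, "2013/2014", "2013-08-24", 2, 1),
        (3, 1, "2013/2014", "2013-09-14", 4, 4),  # September: filtered out
        (4, 2, "2012/2013", "2012-08-18", 3, 3),  # other season: filtered out
    ],
)

# The nested subquery lives inside the CTE, keeping the main query short.
AUGUST_AVG_SQL = """
WITH match_list AS (
    SELECT country_id, (home_goal + away_goal) AS goals
    FROM matches
    WHERE id IN (
        SELECT id FROM matches
        WHERE season = '2013/2014'
          AND strftime('%m', match_date) = '08'
    )
)
SELECT l.name, AVG(match_list.goals) AS avg_goals
FROM league AS l
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name
ORDER BY l.name
"""

averages = conn.execute(AUGUST_AVG_SQL).fetchall()
print(averages)  # [('England Premier League', 2.0), ('Spain LIGA BBVA', None)]
```

Because the join is a LEFT JOIN, a league with no August 2013/2014 matches still appears, with NULL (`None`) as its average.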

### Instructions
- Declare a CTE that calculates the total goals from matches in August of the 2013/2014 season.
- Left join the CTE onto the league table using country_id from the match_list CTE.
- Filter the list on the inner subquery to only select matches in August of the 2013/2014 season.

In [13]:
%%sql

-- Set up your CTE
WITH match_list AS (
    SELECT 
  		country_id,
  	   (home_goal + away_goal) AS goals
    FROM match
  	-- Create a list of match IDs to filter data in the CTE
    WHERE id IN (
       SELECT id
       FROM match
       WHERE season = '2013/2014' AND EXTRACT(MONTH FROM DATE) = 08 ))
-- Select the league name and average of goals in the CTE
SELECT 
	l.name,
    AVG(match_list.goals)
FROM league AS l
-- Join the CTE onto the league table
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name;

 * postgresql://postgres:***@localhost:2828/datacamp
11 rows affected.


name,avg
Switzerland Super League,1.9375
Poland Ekstraklasa,2.310344827586207
Netherlands Eredivisie,3.414634146341464
Scotland Premier League,2.1379310344827585
France Ligue 1,2.027027027027027
Spain LIGA BBVA,2.92
Germany 1. Bundesliga,3.235294117647059
Italy Serie A,2.75
Portugal Liga ZON Sagres,3.0
England Premier League,2.0


# 8. Get team names with a subquery
### Exercises
Let's solve a problem we've encountered a few times in this course so far -- How do you get both the home and away team names into one final query result?

Of the four techniques we just discussed, three can solve this problem: subqueries, correlated subqueries, and CTEs. Let's practice creating similar result sets with each of these three methods over the next three exercises, starting with subqueries in FROM.
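Here is a minimal sketch of the subqueries-in-FROM approach on a small invented dataset in an in-memory SQLite database. The team IDs and the `matches` table are hypothetical stand-ins for the course data. Match 3 has an away team ID (99) with no row in `team`, which shows that the LEFT JOIN returns NULL (`None`) for it.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `team` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, hometeam_id INTEGER, awayteam_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [(10, "KRC Genk"), (20, "RAEC Mons"), (30, "Beerschot AC")],
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 30, 3, 1), (2, 20, 10, 1, 1), (3, 30, 99, 0, 2)],
)

# One subquery per team role; each is joined back to matches on id.
TEAM_NAMES_SQL = """
SELECT m.id, home.hometeam, away.awayteam, m.home_goal, m.away_goal
FROM matches AS m
LEFT JOIN (
    SELECT matches.id, team.team_long_name AS hometeam
    FROM matches LEFT JOIN team ON matches.hometeam_id = team.team_api_id
) AS home ON home.id = m.id
LEFT JOIN (
    SELECT matches.id, team.team_long_name AS awayteam
    FROM matches LEFT JOIN team ON matches.awayteam_id = team.team_api_id
) AS away ON away.id = m.id
ORDER BY m.id
"""

rows = conn.execute(TEAM_NAMES_SQL).fetchall()
for row in rows:
    print(row)
```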

### task 1
### Instruction
Create a query that left joins team to match in order to get the identity of the home team. This becomes the subquery in the next step.

In [14]:
%%sql

SELECT 
	m.id, 
    t.team_long_name AS hometeam
-- Left join team to match
FROM match AS m
LEFT JOIN team as t
ON m.hometeam_id = t.team_api_id
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


id,hometeam
757,Oud-Heverlee Leuven
758,RAEC Mons
759,KRC Genk


### task 2
### Instruction
Add a second subquery to the FROM statement to get the away team name. It is identical to the home team subquery except that hometeam_id is replaced with awayteam_id. Left join both subqueries to the match table on the id column.

Warning: if your code is timing out, you have probably made a mistake in the JOIN and tried to join on the wrong fields which caused the table to be too big! Read the provided code and comments carefully, and check your ON conditions!

In [15]:
%%sql

SELECT
	m.date,
    -- Get the home and away team names
    hometeam,
    awayteam,
    m.home_goal,
    m.away_goal
FROM match AS m

-- Join the home subquery to the match table
LEFT JOIN (
  SELECT match.id, team.team_long_name AS hometeam
  FROM match
  LEFT JOIN team
  ON match.hometeam_id = team.team_api_id) AS home
ON home.id = m.id

-- Join the away subquery to the match table
LEFT JOIN (
  SELECT match.id, team.team_long_name AS awayteam
  FROM match
  LEFT JOIN team
  -- Get the away team ID in the subquery
  ON match.awayteam_id = team.team_api_id) AS away
ON away.id = m.id
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


date,hometeam,awayteam,home_goal,away_goal
2011-07-29,Oud-Heverlee Leuven,RSC Anderlecht,2,1
2011-07-30,RAEC Mons,Standard de Liège,1,1
2011-07-30,KRC Genk,Beerschot AC,3,1


# 9. Get team names with correlated subqueries
### Exercises
Let's solve the same problem using correlated subqueries -- How do you get both the home and away team names into one final query result?

This can easily be performed using correlated subqueries. But how might that impact the performance of your query? Complete the following steps and let's find out!

Please note that this query will run more slowly than the one in the previous exercise!
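The same result can be sketched with correlated subqueries in SELECT, using a small invented dataset in an in-memory SQLite database. The team IDs and the `matches` table are hypothetical. Each scalar subquery looks up one team name for the current match row, so every match triggers two lookups. When no team matches (ID 99), the scalar subquery returns NULL (`None`), just as a LEFT JOIN would.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `team` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, hometeam_id INTEGER, awayteam_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [(10, "KRC Genk"), (20, "RAEC Mons"), (30, "Beerschot AC")],
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 30, 3, 1), (2, 20, 10, 1, 1), (3, 30, 99, 0, 2)],
)

# Each subquery in SELECT is correlated with the current row m.
TEAM_NAMES_SQL = """
SELECT m.id,
       (SELECT t.team_long_name FROM team AS t
        WHERE t.team_api_id = m.hometeam_id) AS hometeam,
       (SELECT t.team_long_name FROM team AS t
        WHERE t.team_api_id = m.awayteam_id) AS awayteam,
       m.home_goal, m.away_goal
FROM matches AS m
ORDER BY m.id
"""

rows = conn.execute(TEAM_NAMES_SQL).fetchall()
for row in rows:
    print(row)
```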

### task 1
### Instruction
Using a correlated subquery in the SELECT statement, match the team_api_id column from team to the hometeam_id from match.

In [17]:
%%sql

SELECT
    m.date,
   (SELECT team_long_name
    FROM team AS t
    -- Connect the team to the match table
    WHERE m.hometeam_id = t.team_api_id) AS hometeam
FROM match AS m
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


date,hometeam
2011-07-29,Oud-Heverlee Leuven
2011-07-30,RAEC Mons
2011-07-30,KRC Genk


### task 2
### Instruction
- Create a second correlated subquery in SELECT, yielding the away team's name.
- Select the home and away goal columns from match in the main query.

In [18]:
%%sql

SELECT
    m.date,
   (SELECT team_long_name
    FROM team AS t
    WHERE t.team_api_id = m.hometeam_id) AS hometeam,
    -- Connect the team to the match table
   (SELECT team_long_name
    FROM team AS t
    WHERE t.team_api_id = m.awayteam_id) AS awayteam,
   -- Select home and away goals
    m.home_goal,
    m.away_goal
FROM match AS m
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


date,hometeam,awayteam,home_goal,away_goal
2011-07-29,Oud-Heverlee Leuven,RSC Anderlecht,2,1
2011-07-30,RAEC Mons,Standard de Liège,1,1
2011-07-30,KRC Genk,Beerschot AC,3,1


# 10. Get team names with CTEs
### Exercises
You've now explored two methods for answering the question: how do you get both the home and away team names into one final query result?

Let's explore the final method: common table expressions. Common table expressions are similar to the subquery method for generating results; they differ mainly in syntax and in the order in which information is processed.
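Here is a minimal sketch of the CTE approach on a small invented dataset in an in-memory SQLite database. The `matches` table and the team IDs are hypothetical stand-ins for the course data. `home` and `away` are each declared once, then joined on `id` in the main query.

```python
import sqlite3

# Hypothetical toy data standing in for the course's `team` and `match` tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT)")
conn.execute(
    "CREATE TABLE matches (id INTEGER, hometeam_id INTEGER, awayteam_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [(10, "KRC Genk"), (20, "RAEC Mons"), (30, "Beerschot AC")],
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 30, 3, 1), (2, 20, 10, 1, 1), (3, 30, 99, 0, 2)],
)

# Two CTEs, one per team role, declared before the main query.
TEAM_NAMES_SQL = """
WITH home AS (
    SELECT m.id, m.home_goal, t.team_long_name AS hometeam
    FROM matches AS m LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id
),
away AS (
    SELECT m.id, m.away_goal, t.team_long_name AS awayteam
    FROM matches AS m LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id
)
SELECT home.id, home.hometeam, away.awayteam, home.home_goal, away.away_goal
FROM home
INNER JOIN away ON home.id = away.id
ORDER BY home.id
"""

rows = conn.execute(TEAM_NAMES_SQL).fetchall()
for row in rows:
    print(row)
```

The INNER JOIN between the two CTEs is safe here because both are built from every row of `matches`, so each `id` appears in both.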

### task 1
### Instruction
Select id from match and team_long_name from team. Join these two tables together on hometeam_id in match and team_api_id in team.

In [19]:
%%sql

SELECT 
	-- Select match id and team long name
    m.id, 
    t.team_long_name AS hometeam
FROM match AS m
-- Join team to match using team_api_id and hometeam_id
LEFT JOIN team AS t 
ON t.team_api_id = m.hometeam_id
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


id,hometeam
757,Oud-Heverlee Leuven
758,RAEC Mons
759,KRC Genk


### task 2
### Instruction
Declare the query from the previous step as a common table expression. SELECT everything from the CTE into the main query. Your results will not change at this step!

In [20]:
%%sql

-- Declare the home CTE
WITH home AS (
	SELECT m.id, t.team_long_name AS hometeam
	FROM match AS m
	LEFT JOIN team AS t 
	ON m.hometeam_id = t.team_api_id)
-- Select everything from home
SELECT *
FROM home
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


id,hometeam
757,Oud-Heverlee Leuven
758,RAEC Mons
759,KRC Genk


### task 3
### Instruction
- Let's declare the second CTE, away. Join it to the first CTE on the id column.
- The date, home_goal, and away_goal columns have been added to the CTEs. SELECT them into the main query.

In [21]:
%%sql

WITH home AS (
  SELECT m.id, m.date, 
  		 t.team_long_name AS hometeam, m.home_goal
  FROM match AS m
  LEFT JOIN team AS t 
  ON m.hometeam_id = t.team_api_id),
-- Declare and set up the away CTE
away AS (
  SELECT m.id, m.date, 
  		 t.team_long_name AS awayteam, m.away_goal
  FROM match AS m
  LEFT JOIN team AS t 
  ON m.awayteam_id = t.team_api_id)
-- Select date, team names, home_goal, and away_goal
SELECT 
	home.date,
    home.hometeam,
    away.awayteam,
    home.home_goal,
    away.away_goal
-- Join away and home on the id column
FROM home
INNER JOIN away
ON home.id = away.id
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


date,hometeam,awayteam,home_goal,away_goal
2011-07-29,Oud-Heverlee Leuven,RSC Anderlecht,2,1
2011-07-30,RAEC Mons,Standard de Liège,1,1
2011-07-30,KRC Genk,Beerschot AC,3,1


# 11. Which technique to use?
### Exercises
The previous three exercises demonstrated that, in many cases, you can use multiple techniques in SQL to answer the same question.

Based on what you learned, which of the following statements is false regarding differences in the use and performance of multiple/nested subqueries, correlated subqueries, and common table expressions?

### Answer the question
### Possible Answers
- A. Correlated subqueries can allow you to circumvent multiple, complex joins.
- B. Common table expressions are declared first, improving query run time.
- C. Correlated subqueries can reduce the length of your query, which improves query run time.
- D. Multiple or nested subqueries are processed first, before your main query.

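Answer: C

C is the false statement. A correlated subquery can make a query shorter, but it is evaluated once per row of the outer query, so it usually makes the query run more slowly, not faster.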