## Correlated Queries, Nested Queries, and Common Table Expressions
- Learn how to use nested and correlated subqueries to extract more complex data from a relational database.
- Learn about common table expressions (CTEs) and how best to construct queries using multiple CTEs.

In [1]:
import pandas as pd
import sqlite3

# Load the ipython-sql extension and connect it to the SQLite database
%reload_ext sql
%sql sqlite:///database.sqlite

# List the tables in the database via the sqlite_master catalog
con = sqlite3.connect("database.sqlite")
mycur = con.cursor()
mycur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;")
available_table = mycur.fetchall()
con.close()
available_table


[('Country',),
 ('League',),
 ('Match',),
 ('Player',),
 ('Player_Attributes',),
 ('Team',),
 ('Team_Attributes',),
 ('sqlite_sequence',)]

### Basic Correlated Subqueries
- Correlated subqueries are subqueries that reference one or more columns from the main (outer) query. Because they depend on values supplied by the outer query, they cannot be executed on their own.

- A correlated subquery is evaluated once for each row the outer query processes, at least conceptually (query optimizers can sometimes rewrite it). This can take considerably more computing power and time than a simple subquery, which runs only once.
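To see this dependency in isolation, here is a minimal sketch using Python's standard-library `sqlite3` module and a small in-memory table. The table layout and rows are hypothetical toy data, not the real `Match` table. The full correlated query runs, but the inner query on its own fails because `main.country_id` only exists in the outer query.

```python
import sqlite3

# Hypothetical toy stand-in for the Match table: (id, country_id, home_goal, away_goal)
rows = [
    (1, 1, 1, 0),  # country 1, total 1
    (2, 1, 3, 2),  # country 1, total 5
    (3, 1, 0, 0),  # country 1, total 0
    (4, 2, 2, 2),  # country 2, total 4
    (5, 2, 1, 1),  # country 2, total 2
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, country_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
con.executemany("INSERT INTO match VALUES (?, ?, ?, ?)", rows)

# The inner query references main.country_id from the outer query,
# so (conceptually) it is re-evaluated for every outer row.
correlated = """
SELECT main.id
FROM match AS main
WHERE (main.home_goal + main.away_goal) >
      (SELECT AVG(sub.home_goal + sub.away_goal)
       FROM match AS sub
       WHERE sub.country_id = main.country_id)
ORDER BY main.id
"""
above_country_avg = [row[0] for row in con.execute(correlated)]
print(above_country_avg)  # [2, 4]: country 1 avg is 2.0, country 2 avg is 3.0

# Run the inner query by itself: main.country_id is undefined here.
inner_only = """
SELECT AVG(sub.home_goal + sub.away_goal)
FROM match AS sub
WHERE sub.country_id = main.country_id
"""
try:
    con.execute(inner_only)
    inner_runs_alone = True
except sqlite3.OperationalError as exc:
    inner_runs_alone = False
    print("Inner query fails on its own:", exc)

con.close()
```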

In [2]:
%%sql
SELECT * FROM Match
LIMIT 5;

 * sqlite:///database.sqlite
Done.


id,country_id,league_id,season,stage,date,match_api_id,hometeam_id,awayteam_id,home_goal,away_goal,home_player_X1,home_player_X2,home_player_X3,home_player_X4,home_player_X5,home_player_X6,home_player_X7,home_player_X8,home_player_X9,home_player_X10,home_player_X11,away_player_X1,away_player_X2,away_player_X3,away_player_X4,away_player_X5,away_player_X6,away_player_X7,away_player_X8,away_player_X9,away_player_X10,away_player_X11,home_player_Y1,home_player_Y2,home_player_Y3,home_player_Y4,home_player_Y5,home_player_Y6,home_player_Y7,home_player_Y8,home_player_Y9,home_player_Y10,home_player_Y11,away_player_Y1,away_player_Y2,away_player_Y3,away_player_Y4,away_player_Y5,away_player_Y6,away_player_Y7,away_player_Y8,away_player_Y9,away_player_Y10,away_player_Y11,home_player_1,home_player_2,home_player_3,home_player_4,home_player_5,home_player_6,home_player_7,home_player_8,home_player_9,home_player_10,home_player_11,away_player_1,away_player_2,away_player_3,away_player_4,away_player_5,away_player_6,away_player_7,away_player_8,away_player_9,away_player_10,away_player_11,goal,shoton,shotoff,foulcommit,card,cross,corner,possession,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,LBH,LBD,LBA,PSH,PSD,PSA,WHH,WHD,WHA,SJH,SJD,SJA,VCH,VCD,VCA,GBH,GBD,GBA,BSH,BSD,BSA
1,1,1,2008/2009,1,2008-08-17 00:00:00,492473,9987,9993,1,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.73,3.4,5.0,1.75,3.35,4.2,1.85,3.2,3.5,1.8,3.3,3.75,,,,1.7,3.3,4.33,1.9,3.3,4.0,1.65,3.4,4.5,1.78,3.25,4.0,1.73,3.4,4.2
2,1,1,2008/2009,1,2008-08-16 00:00:00,492474,10000,9994,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.95,3.2,3.6,1.8,3.3,3.95,1.9,3.2,3.5,1.9,3.2,3.5,,,,1.83,3.3,3.6,1.95,3.3,3.8,2.0,3.25,3.25,1.85,3.25,3.75,1.91,3.25,3.6
3,1,1,2008/2009,1,2008-08-16 00:00:00,492475,9984,8635,0,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.38,3.3,2.75,2.4,3.3,2.55,2.6,3.1,2.3,2.5,3.2,2.5,,,,2.5,3.25,2.4,2.63,3.3,2.5,2.35,3.25,2.65,2.5,3.2,2.5,2.3,3.2,2.75
4,1,1,2008/2009,1,2008-08-17 00:00:00,492476,9991,9998,5,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.44,3.75,7.5,1.4,4.0,6.8,1.4,3.9,6.0,1.44,3.6,6.5,,,,1.44,3.75,6.0,1.44,4.0,7.5,1.45,3.75,6.5,1.5,3.75,5.5,1.44,3.75,6.5
5,1,1,2008/2009,1,2008-08-16 00:00:00,492477,7947,9985,1,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,3.5,1.65,5.0,3.5,1.6,4.0,3.3,1.7,4.0,3.4,1.72,,,,4.2,3.4,1.7,4.5,3.5,1.73,4.5,3.4,1.65,4.5,3.5,1.65,4.75,3.3,1.67


In [3]:
%%sql
SELECT 
    main.country_id,
    main.date,
    main.home_goal,
    main.away_goal
FROM Match AS main
WHERE 
    main.season = '2013/2014'
    -- Keep matches whose total goals exceed 3x the country's average (over all seasons)
    AND (main.home_goal + main.away_goal) > 
        (SELECT AVG((sub.home_goal + sub.away_goal) * 3)
         FROM Match AS sub
         WHERE main.country_id = sub.country_id)
LIMIT 5;

 * sqlite:///database.sqlite
Done.


country_id,date,home_goal,away_goal
1729,2013-12-14 00:00:00,6,3
1729,2014-03-22 00:00:00,3,6
4769,2014-04-20 00:00:00,2,6
4769,2014-04-20 00:00:00,4,4
10257,2014-04-13 00:00:00,3,5
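Since each country's average does not change from row to row, one common alternative (swapped in here for comparison, not part of the original exercise) computes each country's threshold once in a grouped subquery and joins it back. The minimal sketch below checks, on hypothetical in-memory rows, that both forms return the same matches; whether the rewrite is actually faster depends on the database's query planner.

```python
import sqlite3

# Hypothetical toy rows: (id, country_id, season, home_goal, away_goal)
rows = [
    (1, 1, "2013/2014", 1, 0),
    (2, 1, "2013/2014", 0, 1),
    (3, 1, "2013/2014", 1, 0),
    (4, 1, "2013/2014", 6, 4),  # total 10 > 3 * 3.25
    (5, 2, "2013/2014", 1, 1),
    (6, 2, "2012/2013", 7, 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, country_id INTEGER, "
    "season TEXT, home_goal INTEGER, away_goal INTEGER)"
)
con.executemany("INSERT INTO match VALUES (?, ?, ?, ?, ?)", rows)

# Correlated form, mirroring the query above
correlated = """
SELECT main.id
FROM match AS main
WHERE main.season = '2013/2014'
  AND (main.home_goal + main.away_goal) >
      (SELECT AVG((sub.home_goal + sub.away_goal) * 3)
       FROM match AS sub
       WHERE main.country_id = sub.country_id)
ORDER BY main.id
"""

# Join form: compute each country's threshold once, then join it back
joined = """
SELECT m.id
FROM match AS m
JOIN (SELECT country_id,
             AVG((home_goal + away_goal) * 3) AS threshold
      FROM match
      GROUP BY country_id) AS t
  ON t.country_id = m.country_id
WHERE m.season = '2013/2014'
  AND (m.home_goal + m.away_goal) > t.threshold
ORDER BY m.id
"""

correlated_ids = [row[0] for row in con.execute(correlated)]
joined_ids = [row[0] for row in con.execute(joined)]
print(correlated_ids, joined_ids)  # [4] [4]
con.close()
```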


### Correlated subquery with multiple conditions

In [4]:
%%sql
SELECT 
    main.country_id,
    main.date,
    main.home_goal,
    main.away_goal
FROM Match AS main
WHERE 
    -- Keep the highest-scoring match(es) for each country and season
    (main.home_goal + main.away_goal) =
        (SELECT MAX(sub.home_goal + sub.away_goal)
         FROM Match AS sub
         WHERE main.country_id = sub.country_id
               AND main.season = sub.season)
LIMIT 5;

 * sqlite:///database.sqlite
Done.


country_id,date,home_goal,away_goal
1,2008-10-25 00:00:00,7,1
1,2009-12-04 00:00:00,2,5
1,2009-12-26 00:00:00,5,2
1,2010-11-20 00:00:00,4,4
1,2010-09-19 00:00:00,5,3
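Each extra condition in the inner `WHERE` narrows the group that the aggregate is computed over. The minimal sketch below, on hypothetical in-memory rows, compares correlating on country only with correlating on country and season. It also shows that a tie for the maximum returns every tied match.

```python
import sqlite3

# Hypothetical toy rows: (id, country_id, season, home_goal, away_goal)
rows = [
    (1, 1, "A", 2, 1),  # total 3
    (2, 1, "A", 4, 0),  # total 4 (max for country 1, season A)
    (3, 1, "B", 1, 1),  # total 2 (tied max for country 1, season B)
    (4, 1, "B", 0, 2),  # total 2 (tied max for country 1, season B)
    (5, 2, "A", 3, 3),  # total 6 (max for country 2, season A)
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, country_id INTEGER, "
    "season TEXT, home_goal INTEGER, away_goal INTEGER)"
)
con.executemany("INSERT INTO match VALUES (?, ?, ?, ?, ?)", rows)

# Only constant, trusted SQL fragments are substituted into this template
QUERY = """
SELECT main.id
FROM match AS main
WHERE (main.home_goal + main.away_goal) =
      (SELECT MAX(sub.home_goal + sub.away_goal)
       FROM match AS sub
       WHERE {condition})
ORDER BY main.id
"""

by_country = QUERY.format(condition="main.country_id = sub.country_id")
by_country_season = QUERY.format(
    condition="main.country_id = sub.country_id AND main.season = sub.season"
)

max_per_country = [r[0] for r in con.execute(by_country)]
max_per_country_season = [r[0] for r in con.execute(by_country_season)]
print(max_per_country)         # [2, 5]
print(max_per_country_season)  # [2, 3, 4, 5]
con.close()
```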


### Nested simple subqueries
- Nested subqueries can be either simple or correlated.

- Like an unnested simple subquery, a simple nested subquery can be executed independently of the outer query. A correlated nested subquery, by contrast, references columns of the outer query, so the inner and outer queries must run together to produce results.
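Because a simple nested subquery does not reference the outer query, it can be run and inspected on its own before being embedded. The minimal sketch below uses hypothetical in-memory rows to run the inner July filter by itself and then inside the same nested pattern used for `july_max_goals`; `STRFTIME('%m', date)` is SQLite's way of extracting the month as a two-digit string.

```python
import sqlite3

# Hypothetical toy rows: (id, date, home_goal, away_goal)
rows = [
    (1, "2012-07-15 00:00:00", 2, 1),  # July, total 3
    (2, "2012-07-20 00:00:00", 4, 4),  # July, total 8
    (3, "2012-08-01 00:00:00", 6, 6),  # August, total 12
    (4, "2013-07-02 00:00:00", 1, 0),  # July, total 1
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, date TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
con.executemany("INSERT INTO match VALUES (?, ?, ?, ?)", rows)

# Innermost simple subquery, run on its own
inner = "SELECT id FROM match WHERE STRFTIME('%m', date) = '07' ORDER BY id"
july_ids = [r[0] for r in con.execute(inner)]

# The same subquery nested inside scalar subqueries in SELECT
nested = """
SELECT
    (SELECT MAX(home_goal + away_goal) FROM match) AS overall_max_goals,
    (SELECT MAX(home_goal + away_goal)
     FROM match
     WHERE id IN (SELECT id FROM match
                  WHERE STRFTIME('%m', date) = '07')) AS july_max_goals
"""
overall_max, july_max = con.execute(nested).fetchone()
print(july_ids, overall_max, july_max)  # [1, 2, 4] 12 8
con.close()
```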

In [5]:
%%sql
SELECT
    season,
    MAX(home_goal + away_goal) AS max_goals,
    (SELECT MAX(home_goal + away_goal) FROM Match) AS overall_max_goals,
    (SELECT MAX(home_goal + away_goal) 
        FROM Match
        WHERE id IN (
          SELECT id FROM Match WHERE STRFTIME('%m', date) = '07')) AS july_max_goals
        -- PostgreSQL equivalent of the month filter: EXTRACT(MONTH FROM date) = 7
FROM Match
GROUP BY season;

 * sqlite:///database.sqlite
Done.


season,max_goals,overall_max_goals,july_max_goals
2008/2009,9,12,8
2009/2010,12,12,8
2010/2011,10,12,8
2011/2012,10,12,8
2012/2013,11,12,8
2013/2014,10,12,8
2014/2015,10,12,8
2015/2016,12,12,8


### Nest a subquery in FROM

In [6]:
%%sql
SELECT
    c.name AS country,
    -- Average number of high-scoring matches per season
    AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM country AS c
-- Left join the per-season counts (outer_s) to country
LEFT JOIN (
  SELECT country_id, season,
         COUNT(id) AS matches
  FROM (
    -- Matches where either team scored 5 or more goals
    SELECT country_id, season, id
    FROM match
    WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
  GROUP BY country_id, season) AS outer_s
ON c.id = outer_s.country_id
GROUP BY country;

 * sqlite:///database.sqlite
Done.


country,avg_seasonal_high_scores
Belgium,9.571428571428571
England,14.5
France,8.0
Germany,13.75
Italy,8.5
Netherlands,20.125
Poland,5.857142857142857
Portugal,8.625
Scotland,7.125
Spain,19.125
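The query aggregates twice: the subqueries in `FROM` count high-scoring matches per country and season, and the outer query averages those counts per country. The minimal sketch below replays that shape on hypothetical in-memory rows. It includes a country with no high-scoring matches to show that the `LEFT JOIN` keeps it, with a `NULL` (Python `None`) average.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, country_id INTEGER, "
    "season TEXT, home_goal INTEGER, away_goal INTEGER)"
)
# Hypothetical toy data
con.executemany("INSERT INTO country VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
con.executemany(
    "INSERT INTO match VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "s1", 5, 0),  # high-scoring
        (2, 1, "s1", 1, 5),  # high-scoring
        (3, 1, "s2", 6, 2),  # high-scoring
        (4, 1, "s2", 1, 1),
        (5, 2, "s1", 5, 5),  # high-scoring
        (6, 2, "s1", 0, 0),
        (7, 3, "s1", 2, 2),  # country C has no high-scoring match
    ],
)

query = """
SELECT c.name AS country,
       AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM country AS c
LEFT JOIN (
    SELECT country_id, season, COUNT(id) AS matches
    FROM (SELECT country_id, season, id
          FROM match
          WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
    GROUP BY country_id, season) AS outer_s
  ON c.id = outer_s.country_id
GROUP BY c.name
ORDER BY c.name
"""
result = con.execute(query).fetchall()
print(result)  # [('A', 1.5), ('B', 1.0), ('C', None)]
con.close()
```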


### Common Table Expressions (CTEs)


In [7]:
%%sql
-- Define a CTE of matches with 10 or more total goals
WITH match_list AS (
    SELECT 
        country_id, 
        id
    FROM match
    WHERE (home_goal + away_goal) >= 10)
-- Count those matches per league using the CTE
SELECT
    l.name AS league,
    COUNT(match_list.id) AS matches
FROM league AS l
-- LEFT JOIN keeps leagues with no matching rows (count 0)
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name;

 * sqlite:///database.sqlite
Done.


league,matches
Belgium Jupiler League,0
England Premier League,4
France Ligue 1,1
Germany 1. Bundesliga,1
Italy Serie A,0
Netherlands Eredivisie,2
Poland Ekstraklasa,0
Portugal Liga ZON Sagres,0
Scotland Premier League,1
Spain LIGA BBVA,5
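A CTE names a subquery up front so the main query can refer to it like a table. The minimal sketch below, on hypothetical in-memory rows, checks that the CTE version returns the same result as writing the identical subquery inline in `FROM`; the table contents are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, country_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
# Hypothetical toy data
con.executemany("INSERT INTO league VALUES (?, ?)", [(1, "League A"), (2, "League B")])
con.executemany(
    "INSERT INTO match VALUES (?, ?, ?, ?)",
    [(1, 1, 9, 1), (2, 1, 5, 5), (3, 1, 1, 1), (4, 2, 3, 3)],
)

with_cte = """
WITH match_list AS (
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) >= 10)
SELECT l.name AS league, COUNT(match_list.id) AS matches
FROM league AS l
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name
ORDER BY l.name
"""

inline_subquery = """
SELECT l.name AS league, COUNT(match_list.id) AS matches
FROM league AS l
LEFT JOIN (SELECT country_id, id
           FROM match
           WHERE (home_goal + away_goal) >= 10) AS match_list
  ON l.id = match_list.country_id
GROUP BY l.name
ORDER BY l.name
"""

cte_rows = con.execute(with_cte).fetchall()
inline_rows = con.execute(inline_subquery).fetchall()
print(cte_rows)  # [('League A', 2), ('League B', 0)]
con.close()
```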


In [8]:
%%sql

-- Define a CTE with the league name, date, goals, and total goals per match
WITH match_list AS (
    SELECT 
        l.name AS league, 
        m.date, 
        m.home_goal, 
        m.away_goal,
        (m.home_goal + m.away_goal) AS total_goals
    FROM match AS m
    LEFT JOIN league AS l ON m.country_id = l.id)
-- Select the league, date, home, and away goals from the CTE
SELECT league, date, home_goal, away_goal
FROM match_list
-- Filter on the total_goals column computed in the CTE
WHERE total_goals >= 10;

 * sqlite:///database.sqlite
Done.


league,date,home_goal,away_goal
England Premier League,2009-11-22 00:00:00,9,1
England Premier League,2011-08-28 00:00:00,8,2
England Premier League,2012-12-29 00:00:00,7,3
England Premier League,2013-05-19 00:00:00,5,5
France Ligue 1,2009-11-08 00:00:00,5,5
Germany 1. Bundesliga,2013-03-30 00:00:00,9,2
Netherlands Eredivisie,2010-10-24 00:00:00,10,0
Netherlands Eredivisie,2011-11-06 00:00:00,6,4
Scotland Premier League,2010-05-05 00:00:00,6,6
Spain LIGA BBVA,2013-10-30 00:00:00,7,3


### CTEs with nested subqueries

In [9]:
%%sql

WITH match_list AS (
    SELECT 
        country_id,
        (home_goal + away_goal) AS goals
    FROM Match
    -- Nested subquery: ids of matches played in August of the 2013/2014 season
    WHERE id IN (
        SELECT id
        FROM Match
        WHERE season = '2013/2014'
          AND STRFTIME('%m', date) = '08'
          -- PostgreSQL equivalent: AND EXTRACT(MONTH FROM date) = 8
    ))
SELECT 
    l.name,
    avg(match_list.goals)
FROM league AS l
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name
LIMIT 10;

 * sqlite:///database.sqlite
Done.


name,avg(match_list.goals)
Belgium Jupiler League,
England Premier League,2.0
France Ligue 1,2.027027027027027
Germany 1. Bundesliga,3.235294117647059
Italy Serie A,2.75
Netherlands Eredivisie,3.414634146341464
Poland Ekstraklasa,2.310344827586207
Portugal Liga ZON Sagres,3.0
Scotland Premier League,2.1379310344827585
Spain LIGA BBVA,2.92
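The blank average for Belgium Jupiler League is a `NULL`: the `LEFT JOIN` keeps every league, and a league with no rows in `match_list` (presumably no August 2013/2014 matches in this database) gets an average over no rows, which is `NULL`. The minimal sketch below reproduces that behavior with hypothetical in-memory rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, country_id INTEGER, season TEXT, "
    "date TEXT, home_goal INTEGER, away_goal INTEGER)"
)
# Hypothetical toy data: league B has no August 2013/2014 matches
con.executemany("INSERT INTO league VALUES (?, ?)", [(1, "A"), (2, "B")])
con.executemany(
    "INSERT INTO match VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2013/2014", "2013-08-10 00:00:00", 2, 1),  # kept, total 3
        (2, 1, "2013/2014", "2013-08-17 00:00:00", 1, 0),  # kept, total 1
        (3, 1, "2013/2014", "2013-09-01 00:00:00", 5, 5),  # September: excluded
        (4, 2, "2013/2014", "2013-10-05 00:00:00", 2, 2),  # October: excluded
        (5, 2, "2012/2013", "2012-08-11 00:00:00", 3, 0),  # wrong season: excluded
    ],
)

query = """
WITH match_list AS (
    SELECT country_id, (home_goal + away_goal) AS goals
    FROM match
    WHERE id IN (SELECT id FROM match
                 WHERE season = '2013/2014'
                   AND STRFTIME('%m', date) = '08'))
SELECT l.name, AVG(match_list.goals) AS avg_goals
FROM league AS l
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name
ORDER BY l.name
"""
result = con.execute(query).fetchall()
print(result)  # [('A', 2.0), ('B', None)]
con.close()
```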


In [10]:
%%sql
SELECT
    m.date,
    -- Get the home and away team names
    home.hometeam,
    away.awayteam,
    m.home_goal,
    m.away_goal
FROM match AS m

-- Join the home subquery to the match table
INNER JOIN (
  SELECT match.id, team.team_long_name AS hometeam
  FROM match
  LEFT JOIN team
  ON match.hometeam_id = team.team_api_id) AS home
ON home.id = m.id

-- Join the away subquery to the match table
INNER JOIN (
  SELECT match.id, team.team_long_name AS awayteam
  FROM match
  LEFT JOIN team
  -- Get the away team ID in the subquery
  ON match.awayteam_id = team.team_api_id) AS away
ON away.id = m.id
LIMIT 10;

 * sqlite:///database.sqlite
Done.


date,hometeam,awayteam,home_goal,away_goal
2008-08-17 00:00:00,KRC Genk,Beerschot AC,1,1
2008-08-16 00:00:00,SV Zulte-Waregem,Sporting Lokeren,0,0
2008-08-16 00:00:00,KSV Cercle Brugge,RSC Anderlecht,0,3
2008-08-17 00:00:00,KAA Gent,RAEC Mons,5,0
2008-08-16 00:00:00,FCV Dender EH,Standard de Liège,1,3
2008-09-24 00:00:00,KV Mechelen,Club Brugge KV,1,1
2008-08-16 00:00:00,KSV Roeselare,KV Kortrijk,2,2
2008-08-16 00:00:00,Tubize,Royal Excel Mouscron,1,2
2008-08-16 00:00:00,KVC Westerlo,Sporting Charleroi,1,0
2008-11-01 00:00:00,Club Brugge KV,KV Kortrijk,4,1


In [11]:
%%sql

SELECT
    m.date,
    (SELECT team_long_name
     FROM team AS t
     WHERE t.team_api_id = m.hometeam_id) AS hometeam,
    -- Connect the team to the match table
    (SELECT team_long_name
     FROM team AS t
     WHERE m.awayteam_id = t.team_api_id) AS awayteam,
    -- Select home and away goals
    m.home_goal,
    m.away_goal
FROM match AS m
LIMIT 10;

 * sqlite:///database.sqlite
Done.


date,hometeam,awayteam,home_goal,away_goal
2008-08-17 00:00:00,KRC Genk,Beerschot AC,1,1
2008-08-16 00:00:00,SV Zulte-Waregem,Sporting Lokeren,0,0
2008-08-16 00:00:00,KSV Cercle Brugge,RSC Anderlecht,0,3
2008-08-17 00:00:00,KAA Gent,RAEC Mons,5,0
2008-08-16 00:00:00,FCV Dender EH,Standard de Liège,1,3
2008-09-24 00:00:00,KV Mechelen,Club Brugge KV,1,1
2008-08-16 00:00:00,KSV Roeselare,KV Kortrijk,2,2
2008-08-16 00:00:00,Tubize,Royal Excel Mouscron,1,2
2008-08-16 00:00:00,KVC Westerlo,Sporting Charleroi,1,0
2008-11-01 00:00:00,Club Brugge KV,KV Kortrijk,4,1


In [12]:
%%sql

-- Home CTE: home team name and goals for each match
WITH home AS (
  SELECT m.id, m.date, 
         t.team_long_name AS hometeam, m.home_goal
  FROM match AS m
  LEFT JOIN team AS t 
  ON m.hometeam_id = t.team_api_id),
-- Away CTE: away team name and goals for each match
away AS (
  SELECT m.id, m.date, 
         t.team_long_name AS awayteam, m.away_goal
  FROM match AS m
  LEFT JOIN team AS t 
  ON m.awayteam_id = t.team_api_id)
-- Select date, team names, home_goal, and away_goal
SELECT 
    home.date,
    home.hometeam,
    away.awayteam,
    home.home_goal,
    away.away_goal
-- Join away and home on the id column
FROM home
INNER JOIN away
ON home.id = away.id
LIMIT 10;

 * sqlite:///database.sqlite
Done.


date,hometeam,awayteam,home_goal,away_goal
2008-08-17 00:00:00,KRC Genk,Beerschot AC,1,1
2008-08-16 00:00:00,SV Zulte-Waregem,Sporting Lokeren,0,0
2008-08-16 00:00:00,KSV Cercle Brugge,RSC Anderlecht,0,3
2008-08-17 00:00:00,KAA Gent,RAEC Mons,5,0
2008-08-16 00:00:00,FCV Dender EH,Standard de Liège,1,3
2008-09-24 00:00:00,KV Mechelen,Club Brugge KV,1,1
2008-08-16 00:00:00,KSV Roeselare,KV Kortrijk,2,2
2008-08-16 00:00:00,Tubize,Royal Excel Mouscron,1,2
2008-08-16 00:00:00,KVC Westerlo,Sporting Charleroi,1,0
2008-11-01 00:00:00,Club Brugge KV,KV Kortrijk,4,1
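The last three cells return the same columns in three ways: joining to subqueries in `FROM`, correlated subqueries in `SELECT`, and two CTEs. A minimal sketch on hypothetical in-memory rows can check that they agree. An `ORDER BY` is added to each query so the comparison does not rely on row order, which SQL does not guarantee without one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE team (team_api_id INTEGER PRIMARY KEY, team_long_name TEXT)")
con.execute(
    "CREATE TABLE match (id INTEGER PRIMARY KEY, date TEXT, hometeam_id INTEGER, "
    "awayteam_id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
# Hypothetical toy data
con.executemany("INSERT INTO team VALUES (?, ?)", [(10, "Reds"), (20, "Blues"), (30, "Greens")])
con.executemany(
    "INSERT INTO match VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "2008-08-16", 10, 20, 1, 0), (2, "2008-08-17", 30, 10, 2, 2)],
)

# 1) Join the match table to two subqueries in FROM
subquery_joins = """
SELECT m.date, home.hometeam, away.awayteam, m.home_goal, m.away_goal
FROM match AS m
INNER JOIN (SELECT match.id, team.team_long_name AS hometeam
            FROM match LEFT JOIN team
            ON match.hometeam_id = team.team_api_id) AS home ON home.id = m.id
INNER JOIN (SELECT match.id, team.team_long_name AS awayteam
            FROM match LEFT JOIN team
            ON match.awayteam_id = team.team_api_id) AS away ON away.id = m.id
ORDER BY m.id
"""

# 2) Correlated subqueries in SELECT, one lookup per team column
correlated = """
SELECT m.date,
       (SELECT team_long_name FROM team AS t
        WHERE t.team_api_id = m.hometeam_id) AS hometeam,
       (SELECT team_long_name FROM team AS t
        WHERE t.team_api_id = m.awayteam_id) AS awayteam,
       m.home_goal, m.away_goal
FROM match AS m
ORDER BY m.id
"""

# 3) Two CTEs joined on the match id
ctes = """
WITH home AS (
    SELECT m.id, m.date, t.team_long_name AS hometeam, m.home_goal
    FROM match AS m LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
away AS (
    SELECT m.id, t.team_long_name AS awayteam, m.away_goal
    FROM match AS m LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
SELECT home.date, home.hometeam, away.awayteam, home.home_goal, away.away_goal
FROM home INNER JOIN away ON home.id = away.id
ORDER BY home.id
"""

results = [con.execute(q).fetchall() for q in (subquery_joins, correlated, ctes)]
print(results[0])
# [('2008-08-16', 'Reds', 'Blues', 1, 0), ('2008-08-17', 'Greens', 'Reds', 2, 2)]
con.close()
```

The choice among the three is mostly about readability and cost: the correlated version conceptually performs two lookups per match row, while the CTE version keeps each derived table named and separate, which tends to scale better to queries with several of them.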
