# Storing Fantasy Premier League data in a database - part 2
- Author: Steffan Rees
- 19/04/2022

## Background
This notebook follows on from **Storing Fantasy Premier League data in a database - part 1**, where we extracted Fantasy Premier League (FPL) data from the API and stored the transformed data in a SQLite database.

In this notebook we will query the data using SQL and pandas.

The database and its tables can also be browsed using SQLiteStudio.

The pandas documentation includes a guide comparing pandas operations with SQL: https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html

In [1]:
# Import libraries
import pandas as pd
import sqlite3
from pandasql import sqldf
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import re

In [2]:
# Connect to the database
def sql_connection(db):
    try:
        con = sqlite3.connect(db)
    except sqlite3.Error as e:
        # Report the failure and re-raise so that con is never used while undefined
        print(e)
        raise

    cursor = con.cursor()
    return con, cursor

In [3]:
con, cursor = sql_connection('fpl.db')
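
As a minimal sketch of an alternative, the connection can be wrapped in `contextlib.closing` so that it is closed even if a query raises an error. The in-memory database (`:memory:`) and the `demo` table are assumptions made for illustration and are not part of `fpl.db`.

```python
import sqlite3
from contextlib import closing

# ':memory:' and the demo table are illustrative assumptions, not part of fpl.db
with closing(sqlite3.connect(':memory:')) as demo_con:
    demo_con.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
    demo_con.execute("INSERT INTO demo (name) VALUES ('Arsenal'), ('Aston Villa')")
    demo_rows = demo_con.execute("SELECT id, name FROM demo ORDER BY id").fetchall()

# The connection is closed at this point, but the fetched rows remain available
print(demo_rows)
```

The rest of this notebook keeps a single open connection (`con`) instead, which is convenient for interactive work as long as it is closed at the end.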

## List the tables in the database

In [4]:
# Using SQLite
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor.fetchall())

[('gameweeks',), ('teams',), ('positions',), ('players',), ('player_stats',), ('fixtures',)]


In [5]:
# Using pandas
pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table'", con)

Unnamed: 0,name
0,gameweeks
1,teams
2,positions
3,players
4,player_stats
5,fixtures
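
Beyond listing table names, SQLite can also describe the columns of a table through `PRAGMA table_info`. The sketch below runs against an in-memory stand-in database with a hypothetical two-column `teams` table, so the names `demo_con` and `schema` are assumptions; the same statement could be pointed at `con` instead.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for fpl.db; the two-column teams table is an assumption
demo_con = sqlite3.connect(':memory:')
demo_con.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY, team_name TEXT)")

# PRAGMA table_info returns one row per column:
# cid, name, type, notnull, dflt_value, pk
schema = pd.read_sql_query("PRAGMA table_info(teams)", demo_con)
print(schema[['name', 'type', 'pk']])

demo_con.close()
```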


## Import data into dataframe

### Teams

In [6]:
teams = pd.read_sql_query("SELECT * FROM teams", con)
print(teams.shape)
teams.head()

(20, 21)


Unnamed: 0,team_id,team_code,team_name,team_short_name,draw,form,loss,played,points,position,...,team_division,unavailable,win,strength_overall_home,strength_overall_away,strength_attack_home,strength_attack_away,strength_defence_home,strength_defence_away,pulse_id
0,1,3,Arsenal,ARS,0,,0,0,0,0,...,,0,0,1250,1270,1150,1210,1190,1220,1
1,2,7,Aston Villa,AVL,0,,0,0,0,0,...,,0,0,1100,1100,1140,1110,1090,1090,2
2,3,94,Brentford,BRE,0,,0,0,0,0,...,,0,0,1060,1070,1120,1150,1080,1120,130
3,4,36,Brighton,BHA,0,,0,0,0,0,...,,0,0,1100,1090,1160,1160,1100,1120,131
4,5,90,Burnley,BUR,0,,0,0,0,0,...,,0,0,1060,1060,1080,1130,1060,1100,43


In [7]:
teams.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   team_id                20 non-null     int64 
 1   team_code              20 non-null     int64 
 2   team_name              20 non-null     object
 3   team_short_name        20 non-null     object
 4   draw                   20 non-null     int64 
 5   form                   0 non-null      object
 6   loss                   20 non-null     int64 
 7   played                 20 non-null     int64 
 8   points                 20 non-null     int64 
 9   position               20 non-null     int64 
 10  strength               20 non-null     int64 
 11  team_division          0 non-null      object
 12  unavailable            20 non-null     int64 
 13  win                    20 non-null     int64 
 14  strength_overall_home  20 non-null     int64 
 15  strength_overall_away  20

### Gameweeks

In [8]:
gameweeks = pd.read_sql_query("SELECT * FROM gameweeks", con, dtype={'highest_scoring_entry_id': 'Int32',
                                                                     'highest_score': 'Int32',
                                                                     'most_selected_id': 'Int32',
                                                                     'most_transferred_in_id': 'Int32',
                                                                     'top_player_id': 'Int32',
                                                                     'most_captained_id': 'Int32',
                                                                     'most_vice_captained_id': 'Int32',
                                                                     'most_points_score': 'Int8',
                                                                     'bboost_played': 'Int32',
                                                                     'freehit_played': 'Int32',
                                                                     'wildcard_played': 'Int32',
                                                                     'triple_captain_played': 'Int32'
                                                                    })
print(gameweeks.shape)
gameweeks.head()

(38, 26)


Unnamed: 0,gameweek_id,gameweek_name,deadline_time,average_entry_score,finished,data_checked,highest_scoring_entry_id,deadline_time_epoch,deadline_time_game_offset,highest_score,...,most_transferred_in_id,top_player_id,transfers_made,most_captained_id,most_vice_captained_id,most_points_score,bboost_played,freehit_played,wildcard_played,triple_captain_played
0,1,Gameweek 1,2021-08-13T17:30:00Z,69,1,1,5059647,1628875800,0,150,...,1,277,0,233,277,20,145658,,,225749
1,2,Gameweek 2,2021-08-21T10:00:00Z,56,1,1,6882931,1629540000,0,146,...,272,142,12038724,233,277,18,95038,102410.0,277209.0,269514
2,3,Gameweek 3,2021-08-28T10:00:00Z,54,1,1,7516002,1630144800,0,119,...,419,268,15553648,277,277,18,94049,117627.0,372083.0,138714
3,4,Gameweek 4,2021-09-11T10:00:00Z,57,1,1,7797969,1631354400,0,120,...,579,432,28985870,233,233,13,77204,168976.0,1117718.0,157121
4,5,Gameweek 5,2021-09-17T17:30:00Z,55,1,1,3139954,1631899800,0,144,...,579,44,18706283,233,579,15,49533,127510.0,708431.0,104192


In [9]:
gameweeks.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38 entries, 0 to 37
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   gameweek_id                38 non-null     int64 
 1   gameweek_name              38 non-null     object
 2   deadline_time              38 non-null     object
 3   average_entry_score        38 non-null     int64 
 4   finished                   38 non-null     int64 
 5   data_checked               38 non-null     int64 
 6   highest_scoring_entry_id   37 non-null     Int32 
 7   deadline_time_epoch        38 non-null     int64 
 8   deadline_time_game_offset  38 non-null     int64 
 9   highest_score              37 non-null     Int32 
 10  is_previous                38 non-null     int64 
 11  is_current                 38 non-null     int64 
 12  is_next                    38 non-null     int64 
 13  cup_leagues_created        38 non-null     int64 
 14  h2h_ko_match
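

The `dtype` mapping passed to `pd.read_sql_query` for the gameweeks table uses the nullable integer types (`Int32`, `Int8`, with a capital "I"). A small sketch on made-up values shows why this matters: by default, an integer column containing a missing value is upcast to `float64`, whereas a nullable integer type keeps whole numbers and marks the gap as `<NA>`.

```python
import pandas as pd

# Made-up values standing in for a column such as freehit_played
raw = pd.Series([102410, None, 117627])

# Default inference upcasts to float64 because of the missing value
print(raw.dtype)

# Casting to the nullable Int32 type keeps integers and stores <NA> for the gap
nullable = raw.astype('Int32')
print(nullable.dtype)
print(nullable.isna().sum())
```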

### Fixtures

In [10]:
fixtures = pd.read_sql_query("SELECT * FROM fixtures", con, dtype={'gameweek_id': 'Int8',
                                                                   'started': 'Int8',
                                                                   'away_team_score': 'Int8',
                                                                   'home_team_score': 'Int8'
                                                                  })
print(fixtures.shape)
fixtures.head()

(380, 16)


Unnamed: 0,fixture_id,fixture_code,gameweek_id,finished,finished_provisional,kickoff_time,minutes,provisional_start_time,started,away_team_id,away_team_score,home_team_id,home_team_score,home_team_difficulty,away_team_difficulty,pulse_id
0,1,2210271,1,1,1,2021-08-13T19:00:00Z,90,0,1,1,0,3,2,4,2,66342
1,2,2210272,1,1,1,2021-08-14T14:00:00Z,90,0,1,4,2,5,1,2,2,66343
2,3,2210273,1,1,1,2021-08-14T14:00:00Z,90,0,1,7,0,6,3,2,4,66344
3,4,2210274,1,1,1,2021-08-14T14:00:00Z,90,0,1,16,1,8,3,2,2,66345
4,5,2210275,1,1,1,2021-08-14T14:00:00Z,90,0,1,20,0,9,1,3,3,66346


In [11]:
fixtures.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 380 entries, 0 to 379
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   fixture_id              380 non-null    int64 
 1   fixture_code            380 non-null    int64 
 2   gameweek_id             380 non-null    Int8  
 3   finished                380 non-null    int64 
 4   finished_provisional    380 non-null    int64 
 5   kickoff_time            380 non-null    object
 6   minutes                 380 non-null    int64 
 7   provisional_start_time  380 non-null    int64 
 8   started                 380 non-null    Int8  
 9   away_team_id            380 non-null    int64 
 10  away_team_score         366 non-null    Int8  
 11  home_team_id            380 non-null    int64 
 12  home_team_score         366 non-null    Int8  
 13  home_team_difficulty    380 non-null    int64 
 14  away_team_difficulty    380 non-null    int64 
 15  pulse_

### Players

In [12]:
players = pd.read_sql_query("SELECT * FROM players", con)
print(players.shape)
players.head()

(736, 66)


Unnamed: 0,player_id,player_code,first_name,second_name,web_name,team_id,team_code,chance_of_playing_next_round,chance_of_playing_this_round,cost_change_event,...,threat_rank,threat_rank_type,ict_index_rank,ict_index_rank_type,corners_and_indirect_freekicks_order,corners_and_indirect_freekicks_text,direct_freekicks_order,direct_freekicks_text,penalties_order,penalties_text
0,1,80201,Bernd,Leno,Leno,1,3,100.0,100.0,0,...,598,60,442,30,,,,,,
1,2,115918,Rúnar Alex,Rúnarsson,Rúnarsson,1,3,0.0,0.0,0,...,490,19,541,55,,,,,,
2,3,47431,Willian,Borges Da Silva,Willian,1,3,0.0,0.0,0,...,732,306,732,306,,,,,,
3,4,54694,Pierre-Emerick,Aubameyang,Aubameyang,1,3,0.0,0.0,0,...,56,24,174,31,,,,,,
4,5,58822,Cédric,Soares,Cédric,1,3,100.0,100.0,0,...,319,104,261,86,2.0,,3.0,,,


In [13]:
players.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 736 entries, 0 to 735
Data columns (total 66 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   player_id                             736 non-null    int64  
 1   player_code                           736 non-null    int64  
 2   first_name                            736 non-null    object 
 3   second_name                           736 non-null    object 
 4   web_name                              736 non-null    object 
 5   team_id                               736 non-null    int64  
 6   team_code                             736 non-null    int64  
 7   chance_of_playing_next_round          570 non-null    float64
 8   chance_of_playing_this_round          569 non-null    float64
 9   cost_change_event                     736 non-null    int64  
 10  cost_change_event_fall                736 non-null    int64  
 11  cost_change_start  

### Positions

In [14]:
positions = pd.read_sql_query("SELECT * FROM positions", con)
print(positions.shape)
positions.head()

(4, 9)


Unnamed: 0,position_id,position_name_short,position_name,squad_select,squad_min_play,squad_max_play,ui_shirt_specific,sub_positions_locked,element_count
0,1,GKP,Goalkeeper,2,1,1,1,12.0,83
1,2,DEF,Defender,5,3,5,0,,247
2,3,MID,Midfielder,5,2,5,0,,308
3,4,FWD,Forward,3,1,3,0,,98


In [15]:
positions.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   position_id           4 non-null      int64  
 1   position_name_short   4 non-null      object 
 2   position_name         4 non-null      object 
 3   squad_select          4 non-null      int64  
 4   squad_min_play        4 non-null      int64  
 5   squad_max_play        4 non-null      int64  
 6   ui_shirt_specific     4 non-null      int64  
 7   sub_positions_locked  1 non-null      float64
 8   element_count         4 non-null      int64  
dtypes: float64(1), int64(6), object(2)
memory usage: 416.0+ bytes


## Queries

### Add team name and position to players data

#### SQLite

In [16]:
query = """
SELECT 
    pl.player_id,
    pl.first_name,
    pl.second_name,
    pl.team_id,
    t.team_name,
    pl.position_id,
    po.position_name
FROM players as pl 
INNER JOIN positions as po
    ON po.position_id = pl.position_id
INNER JOIN teams as t
    on t.team_id = pl.team_id
ORDER BY pl.player_id ASC;
"""

cursor.execute(query)
print(cursor.fetchall())

[(1, 'Bernd', 'Leno', 1, 'Arsenal', 1, 'Goalkeeper'), (2, 'Rúnar Alex', 'Rúnarsson', 1, 'Arsenal', 1, 'Goalkeeper'), (3, 'Willian', 'Borges Da Silva', 1, 'Arsenal', 3, 'Midfielder'), (4, 'Pierre-Emerick', 'Aubameyang', 1, 'Arsenal', 4, 'Forward'), (5, 'Cédric', 'Soares', 1, 'Arsenal', 2, 'Defender'), (6, 'Alexandre', 'Lacazette', 1, 'Arsenal', 4, 'Forward'), (7, 'Granit', 'Xhaka', 1, 'Arsenal', 3, 'Midfielder'), (8, 'Pablo', 'Marí', 1, 'Arsenal', 2, 'Defender'), (9, 'Héctor', 'Bellerín', 1, 'Arsenal', 2, 'Defender'), (10, 'Calum', 'Chambers', 2, 'Aston Villa', 2, 'Defender'), (11, 'Sead', 'Kolasinac', 1, 'Arsenal', 2, 'Defender'), (12, 'Mohamed Naser', 'El Sayed Elneny', 1, 'Arsenal', 3, 'Midfielder'), (13, 'Ainsley', 'Maitland-Niles', 1, 'Arsenal', 3, 'Midfielder'), (14, 'Rob', 'Holding', 1, 'Arsenal', 2, 'Defender'), (15, 'Thomas', 'Partey', 1, 'Arsenal', 3, 'Midfielder'), (16, 'Kieran', 'Tierney', 1, 'Arsenal', 2, 'Defender'), (17, 'Nicolas', 'Pépé', 1, 'Arsenal', 3, 'Midfielder'), 

In [17]:
# Storing the query results in a dataframe
query = """
SELECT 
    pl.player_id,
    pl.first_name,
    pl.second_name,
    pl.team_id,
    t.team_name,
    pl.position_id,
    po.position_name
FROM players as pl 
INNER JOIN positions as po
    ON po.position_id = pl.position_id
INNER JOIN teams as t
    on t.team_id = pl.team_id
ORDER BY pl.player_id ASC;
"""

pd.read_sql_query(query, con)

Unnamed: 0,player_id,first_name,second_name,team_id,team_name,position_id,position_name
0,1,Bernd,Leno,1,Arsenal,1,Goalkeeper
1,2,Rúnar Alex,Rúnarsson,1,Arsenal,1,Goalkeeper
2,3,Willian,Borges Da Silva,1,Arsenal,3,Midfielder
3,4,Pierre-Emerick,Aubameyang,1,Arsenal,4,Forward
4,5,Cédric,Soares,1,Arsenal,2,Defender
...,...,...,...,...,...,...,...
731,732,Tiago,Çukur,18,Watford,4,Forward
732,733,Adrian,Blake,18,Watford,3,Midfielder
733,734,Jack,Grieves,18,Watford,4,Forward
734,735,Joseph,McGlynn,5,Burnley,4,Forward


#### Pandas

In [18]:
(players[['player_id',
          'first_name',
          'second_name',
          'team_id',
          'position_id'
         ]]
 .assign(team_name = players.team_id.map(teams.set_index('team_id').team_name),
         position_name = players.position_id.map(positions.set_index('position_id').position_name)
        )
 .reindex(columns=['player_id',
                   'first_name',
                   'second_name',
                   'team_id',
                   'team_name',
                   'position_id',
                   'position_name'
                  ])
)

Unnamed: 0,player_id,first_name,second_name,team_id,team_name,position_id,position_name
0,1,Bernd,Leno,1,Arsenal,1,Goalkeeper
1,2,Rúnar Alex,Rúnarsson,1,Arsenal,1,Goalkeeper
2,3,Willian,Borges Da Silva,1,Arsenal,3,Midfielder
3,4,Pierre-Emerick,Aubameyang,1,Arsenal,4,Forward
4,5,Cédric,Soares,1,Arsenal,2,Defender
...,...,...,...,...,...,...,...
731,732,Tiago,Çukur,18,Watford,4,Forward
732,733,Adrian,Blake,18,Watford,3,Midfielder
733,734,Jack,Grieves,18,Watford,4,Forward
734,735,Joseph,McGlynn,5,Burnley,4,Forward


In [19]:
# Using pandasql
pysqldf = lambda q: sqldf(q, globals())
pysqldf("""
SELECT 
    pl.player_id,
    pl.first_name,
    pl.second_name,
    pl.team_id,
    t.team_name,
    pl.position_id,
    po.position_name
FROM players as pl 
INNER JOIN positions as po
    ON po.position_id = pl.position_id
INNER JOIN teams as t
    on t.team_id = pl.team_id
ORDER BY pl.player_id ASC;
"""
)

Unnamed: 0,player_id,first_name,second_name,team_id,team_name,position_id,position_name
0,1,Bernd,Leno,1,Arsenal,1,Goalkeeper
1,2,Rúnar Alex,Rúnarsson,1,Arsenal,1,Goalkeeper
2,3,Willian,Borges Da Silva,1,Arsenal,3,Midfielder
3,4,Pierre-Emerick,Aubameyang,1,Arsenal,4,Forward
4,5,Cédric,Soares,1,Arsenal,2,Defender
...,...,...,...,...,...,...,...
731,732,Tiago,Çukur,18,Watford,4,Forward
732,733,Adrian,Blake,18,Watford,3,Midfielder
733,734,Jack,Grieves,18,Watford,4,Forward
734,735,Joseph,McGlynn,5,Burnley,4,Forward
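

The pandas version above looks up names with `map`. A closer structural match to the SQL `INNER JOIN` is `DataFrame.merge`, sketched below on toy frames. The `_demo` frames and their values are assumptions for illustration; the real `players`, `teams` and `positions` frames could be substituted.

```python
import pandas as pd

# Toy frames with the same key columns as the real tables (values are illustrative)
players_demo = pd.DataFrame({'player_id': [1, 2, 3],
                             'second_name': ['Leno', 'Soares', 'Xhaka'],
                             'team_id': [1, 1, 1],
                             'position_id': [1, 2, 3]})
teams_demo = pd.DataFrame({'team_id': [1], 'team_name': ['Arsenal']})
positions_demo = pd.DataFrame({'position_id': [1, 2, 3, 4],
                               'position_name': ['Goalkeeper', 'Defender',
                                                 'Midfielder', 'Forward']})

# Each merge corresponds to one INNER JOIN ... ON clause in the SQL query
joined = (players_demo
          .merge(positions_demo, how='inner', on='position_id')
          .merge(teams_demo, how='inner', on='team_id')
          .sort_values('player_id')
          .reset_index(drop=True))
print(joined)
```

One difference to keep in mind: `map` keeps every player row and leaves `NaN` where no match exists, while an inner merge drops unmatched rows, as the SQL `INNER JOIN` does.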


### Count of fixtures per gameweek for each team

#### SQL

In [20]:
query = """
SELECT
    t2.gameweek_id,
    t2.team_id,
    t.team_name,
    COUNT(*) AS Fixtures
FROM
(
SELECT
    t1.gameweek_id,
    t1.team_id
FROM
(
    SELECT 
       f1.gameweek_id,
       f1.home_team_id AS team_id
    FROM fixtures as f1
    UNION ALL
    SELECT 
       f2.gameweek_id,
       f2.away_team_id AS team_id
    FROM fixtures as f2
) AS t1
) AS t2
INNER JOIN teams as t
    ON t.team_id = t2.team_id
WHERE t2.gameweek_id IS NOT NULL
GROUP BY 
    t2.gameweek_id,
    t2.team_id,
    t.team_name
ORDER BY 
    t2.gameweek_id ASC,
    t.team_name ASC
"""

cursor.execute(query)
print(cursor.fetchall())

[(1, 1, 'Arsenal', 1), (1, 2, 'Aston Villa', 1), (1, 3, 'Brentford', 1), (1, 4, 'Brighton', 1), (1, 5, 'Burnley', 1), (1, 6, 'Chelsea', 1), (1, 7, 'Crystal Palace', 1), (1, 8, 'Everton', 1), (1, 10, 'Leeds', 1), (1, 9, 'Leicester', 1), (1, 11, 'Liverpool', 1), (1, 12, 'Man City', 1), (1, 13, 'Man Utd', 1), (1, 14, 'Newcastle', 1), (1, 15, 'Norwich', 1), (1, 16, 'Southampton', 1), (1, 17, 'Spurs', 1), (1, 18, 'Watford', 1), (1, 19, 'West Ham', 1), (1, 20, 'Wolves', 1), (2, 1, 'Arsenal', 1), (2, 2, 'Aston Villa', 1), (2, 3, 'Brentford', 1), (2, 4, 'Brighton', 1), (2, 5, 'Burnley', 1), (2, 6, 'Chelsea', 1), (2, 7, 'Crystal Palace', 1), (2, 8, 'Everton', 1), (2, 10, 'Leeds', 1), (2, 9, 'Leicester', 1), (2, 11, 'Liverpool', 1), (2, 12, 'Man City', 1), (2, 13, 'Man Utd', 1), (2, 14, 'Newcastle', 1), (2, 15, 'Norwich', 1), (2, 16, 'Southampton', 1), (2, 17, 'Spurs', 1), (2, 18, 'Watford', 1), (2, 19, 'West Ham', 1), (2, 20, 'Wolves', 1), (3, 1, 'Arsenal', 1), (3, 2, 'Aston Villa', 1), (3, 3, 

#### Pandas

In [21]:
(pd.concat([fixtures[['gameweek_id', 'home_team_id', 'away_team_id']],
            # Swap home and away so that every fixture is listed once per team
            fixtures.rename(columns={'home_team_id': 'away_team_id', 'away_team_id': 'home_team_id'})
           ])
 .sort_index(ignore_index=True)
 .drop(columns=['away_team_id'])
 .rename(columns={'home_team_id': 'team_id'})
 .groupby(['gameweek_id', 'team_id']).size().to_frame('fixtures').reset_index()
 .assign(team_name = lambda x: x.team_id.map(teams.set_index('team_id').team_name))
 .astype({'gameweek_id': 'int8'})
 .reindex(columns=['gameweek_id',
                   'team_id',
                   'team_name',
                   'fixtures'
                  ])
)

Unnamed: 0,gameweek_id,team_id,team_name,fixtures
0,1,1,Arsenal,1
1,1,2,Aston Villa,1
2,1,3,Brentford,1
3,1,4,Brighton,1
4,1,5,Burnley,1
...,...,...,...,...
694,38,16,Southampton,1
695,38,17,Spurs,1
696,38,18,Watford,1
697,38,19,West Ham,1
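

The same count can be reached more directly by reshaping the home and away columns into a single `team_id` column with `melt`, which plays the role of the `UNION ALL` in the SQL query. This is a sketch on a toy `fixtures_demo` frame (an assumption, not the real data), not the approach used in the cell above.

```python
import pandas as pd

# Toy fixtures: teams 1 and 3 each play twice in gameweek 2
fixtures_demo = pd.DataFrame({'gameweek_id': [1, 1, 2, 2, 2],
                              'home_team_id': [1, 3, 1, 2, 4],
                              'away_team_id': [2, 4, 3, 1, 3]})

counts = (fixtures_demo
          # Stack home and away ids into one team_id column (like UNION ALL)
          .melt(id_vars='gameweek_id',
                value_vars=['home_team_id', 'away_team_id'],
                value_name='team_id')
          # Count rows per gameweek and team (like GROUP BY ... COUNT(*))
          .groupby(['gameweek_id', 'team_id'])
          .size()
          .reset_index(name='fixtures'))
print(counts)
```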


### Which teams have upcoming gameweeks with more than one fixture?

In [22]:
current_gameweek = (gameweeks[gameweeks.finished == False]
                    .sort_values('gameweek_id', ascending=True)
                    .gameweek_id
                    .min()
                   )

In [23]:
(pd.concat([fixtures[['gameweek_id', 'home_team_id', 'away_team_id']],
            # Swap home and away so that every fixture is listed once per team
            fixtures.rename(columns={'home_team_id': 'away_team_id', 'away_team_id': 'home_team_id'})
           ])
 .sort_index(ignore_index=True)
 .drop(columns=['away_team_id'])
 .rename(columns={'home_team_id': 'team_id'})
 .groupby(['gameweek_id', 'team_id']).size().to_frame('fixtures').reset_index()
 .assign(team_name = lambda x: x.team_id.map(teams.set_index('team_id').team_name))
 .astype({'gameweek_id': 'int8'})
 .reindex(columns=['gameweek_id',
                   'team_id',
                   'team_name',
                   'fixtures'
                  ])
 .query(f"gameweek_id >= {current_gameweek} and fixtures > 1")
)

Unnamed: 0,gameweek_id,team_id,team_name,fixtures
661,37,2,Aston Villa,2
664,37,5,Burnley,2
666,37,7,Crystal Palace,2
667,37,8,Everton,2
668,37,9,Leicester,2
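

For comparison, a SQL equivalent of this filter can be sketched with a `HAVING` clause, which filters on the aggregated count after `GROUP BY`. The in-memory database, the toy rows and `current_gameweek_demo` are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for fpl.db with a reduced fixtures table (assumed data)
demo_con = sqlite3.connect(':memory:')
demo_con.execute(
    "CREATE TABLE fixtures (gameweek_id INTEGER, home_team_id INTEGER, away_team_id INTEGER)"
)
demo_con.executemany("INSERT INTO fixtures VALUES (?, ?, ?)",
                     [(36, 1, 2), (37, 1, 3), (37, 2, 1), (37, 4, 3)])

current_gameweek_demo = 37  # assumed value for the sketch

query = """
SELECT gameweek_id, team_id, COUNT(*) AS fixtures
FROM (
    SELECT gameweek_id, home_team_id AS team_id FROM fixtures
    UNION ALL
    SELECT gameweek_id, away_team_id AS team_id FROM fixtures
)
WHERE gameweek_id >= ?
GROUP BY gameweek_id, team_id
HAVING COUNT(*) > 1
ORDER BY gameweek_id, team_id
"""

# The gameweek is passed as a bound parameter rather than formatted into the string
doubles = demo_con.execute(query, (current_gameweek_demo,)).fetchall()
print(doubles)
demo_con.close()
```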


### Table of gameweek fixture counts for each team for the remaining fixtures

In [24]:
(pd.concat([fixtures[['gameweek_id', 'home_team_id', 'away_team_id']],
            # Swap home and away so that every fixture is listed once per team
            fixtures.rename(columns={'home_team_id': 'away_team_id', 'away_team_id': 'home_team_id'})
           ])
 .sort_index(ignore_index=True)
 .drop(columns=['away_team_id'])
 .rename(columns={'home_team_id': 'team_id'})
 .groupby(['gameweek_id', 'team_id']).size().to_frame('fixtures').reset_index()
 .assign(team_name = lambda x: x.team_id.map(teams.set_index('team_id').team_name))
 .astype({'gameweek_id': 'int8'})
 .reindex(columns=['gameweek_id',
                   'team_id',
                   'team_name',
                   'fixtures'
                  ])
 .query(f"gameweek_id >= {current_gameweek}")
 .pivot_table(index=['team_id', 'team_name'], columns=['gameweek_id'], values='fixtures', fill_value=0)
)

Unnamed: 0_level_0,gameweek_id,37,38
team_id,team_name,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Arsenal,1,1
2,Aston Villa,2,1
3,Brentford,1,1
4,Brighton,1,1
5,Burnley,2,1
6,Chelsea,1,1
7,Crystal Palace,2,1
8,Everton,2,1
9,Leicester,2,1
10,Leeds,1,1


### Table of remaining fixtures

In [25]:
# Hard-coded as a constant because the code below, which builds the table of remaining fixtures, is not yet dynamic
CURRENT_GAMEWEEK = 36

In [26]:
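# Colour home and away matches: upper-case opponents (home fixture) are green,
# lower-case opponents (away fixture) are red, and empty cells are white
def colour(team):
    if team.isupper():
        background = 'green'
    elif team.islower():
        background = 'red'
    else:
        background = 'white'
    return f'background-color: {background}'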

In [27]:
(
    pd.concat([fixtures[['fixture_id', 'kickoff_time', 'gameweek_id', 'home_team_id', 'away_team_id']],
               # Swap home and away so that every fixture is listed once per team
               fixtures.rename(columns={'home_team_id': 'away_team_id',
                                        'away_team_id': 'home_team_id'
                                       })
              ])
    .sort_index(ignore_index=True)
    .drop(columns=['away_team_id'])
    .rename(columns={'home_team_id': 'team_id'})
    .assign(team_name = lambda x: x.team_id.map(teams.set_index('team_id').team_name),
            team_short_name = lambda x: x.team_id.map(teams.set_index('team_id').team_short_name),
            gameweek_name = lambda x: x.gameweek_id.map(gameweeks.set_index('gameweek_id').gameweek_name)
           )
    .query(f"gameweek_id >= {CURRENT_GAMEWEEK}")
    .merge(fixtures[['fixture_id', 'home_team_id']], how='left',
           left_on=['fixture_id', 'team_id'],
           right_on=['fixture_id', 'home_team_id']
          )
    .astype({'gameweek_id': 'int8',
             'home_team_id': 'Int8'
            })
    .assign(home = lambda x: np.where(x.home_team_id.isnull(), 0, 1),
            venue = lambda x: np.where(x.home == 1, 'home', 'away'),
            fixture_dummy_id = lambda x: np.where(x.venue == 'home', x.fixture_id, -1*x.fixture_id)
           )
    .sort_values(['gameweek_id', 'kickoff_time'])
    .assign(gameweek_fixture_number = lambda x: x.groupby(['gameweek_id', 'team_id']).cumcount()+1,
            gameweek_fixture_name = lambda x: np.where(x.gameweek_fixture_number == 1, 'Fixture One', 'Fixture Two')
           )
    .reindex(columns=['fixture_dummy_id',
                      'gameweek_name',
                      'team_id',
                      'gameweek_fixture_name'
                     ])
    .pivot(index=['team_id'], columns=['gameweek_name', 'gameweek_fixture_name'], values='fixture_dummy_id').sort_index(axis='columns', level='gameweek_name')
    .fillna(0)
    .astype('Int16')
    .pipe(lambda x: x.set_axis([' '.join(col) for col in x.columns.values], axis=1))
    .reset_index()
    .assign(team_name = lambda x: x.team_id.map(teams.set_index('team_id').team_short_name),
            Gameweek_36_Fixture_One_Team = lambda x: np.where(x['Gameweek 36 Fixture One'] < 0,
                                                              (-1*x['Gameweek 36 Fixture One']).map(fixtures.set_index('fixture_id').home_team_id),
                                                              x['Gameweek 36 Fixture One'].map(fixtures.set_index('fixture_id').away_team_id)
                                                             ).astype('int8'),
            Gameweek_36_Fixture_One_Opponent = lambda x: np.where(x['Gameweek 36 Fixture One'] < 0,
                                                                  x['Gameweek_36_Fixture_One_Team'].map(teams.set_index('team_id').team_short_name).str.lower(),
                                                                  x['Gameweek_36_Fixture_One_Team'].map(teams.set_index('team_id').team_short_name).str.upper()
                                                                 ),
            Gameweek_36_Fixture_Two_Team = lambda x: np.where(x['Gameweek 36 Fixture Two'] < 0,
                                                              (-1*x['Gameweek 36 Fixture Two']).map(fixtures.set_index('fixture_id').home_team_id),
                                                              x['Gameweek 36 Fixture Two'].map(fixtures.set_index('fixture_id').away_team_id)
                                                             ).astype('int8'),
            Gameweek_36_Fixture_Two_Opponent = lambda x: np.where(x['Gameweek 36 Fixture Two'] < 0,
                                                                  x['Gameweek_36_Fixture_Two_Team'].map(teams.set_index('team_id').team_short_name).str.lower(),
                                                                  x['Gameweek_36_Fixture_Two_Team'].map(teams.set_index('team_id').team_short_name).str.upper()
                                                                 ),
            Gameweek_37_Fixture_One_Team = lambda x: np.where(x['Gameweek 37 Fixture One'] < 0,
                                                              (-1*x['Gameweek 37 Fixture One']).map(fixtures.set_index('fixture_id').home_team_id),
                                                              x['Gameweek 37 Fixture One'].map(fixtures.set_index('fixture_id').away_team_id)
                                                             ).astype('int8'),
            Gameweek_37_Fixture_One_Opponent = lambda x: np.where(x['Gameweek 37 Fixture One'] < 0,
                                                                  x['Gameweek_37_Fixture_One_Team'].map(teams.set_index('team_id').team_short_name).str.lower(),
                                                                  x['Gameweek_37_Fixture_One_Team'].map(teams.set_index('team_id').team_short_name).str.upper()
                                                                 ),
            Gameweek_37_Fixture_Two_Team = lambda x: np.where(x['Gameweek 37 Fixture Two'] < 0,
                                                              (-1*x['Gameweek 37 Fixture Two']).map(fixtures.set_index('fixture_id').home_team_id),
                                                              x['Gameweek 37 Fixture Two'].map(fixtures.set_index('fixture_id').away_team_id)
                                                             ).astype('int8'),
            Gameweek_37_Fixture_Two_Opponent = lambda x: np.where(x['Gameweek 37 Fixture Two'] < 0,
                                                                  x['Gameweek_37_Fixture_Two_Team'].map(teams.set_index('team_id').team_short_name).str.lower(),
                                                                  x['Gameweek_37_Fixture_Two_Team'].map(teams.set_index('team_id').team_short_name).str.upper()
                                                                 ),
            Gameweek_38_Fixture_One_Team = lambda x: np.where(x['Gameweek 38 Fixture One'] < 0,
                                                              (-1*x['Gameweek 38 Fixture One']).map(fixtures.set_index('fixture_id').home_team_id),
                                                              x['Gameweek 38 Fixture One'].map(fixtures.set_index('fixture_id').away_team_id)
                                                             ).astype('int8'),
            Gameweek_38_Fixture_One_Opponent = lambda x: np.where(x['Gameweek 38 Fixture One'] < 0,
                                                                  x['Gameweek_38_Fixture_One_Team'].map(teams.set_index('team_id').team_short_name).str.lower(),
                                                                  x['Gameweek_38_Fixture_One_Team'].map(teams.set_index('team_id').team_short_name).str.upper()
                                                                 ) 
           )
    .drop(columns=['team_id',
                   'Gameweek 36 Fixture One',
                   'Gameweek_36_Fixture_One_Team',
                   'Gameweek 36 Fixture Two',
                   'Gameweek_36_Fixture_Two_Team',
                   'Gameweek 37 Fixture One',
                   'Gameweek_37_Fixture_One_Team',
                   'Gameweek 37 Fixture Two',
                   'Gameweek_37_Fixture_Two_Team',
                   'Gameweek 38 Fixture One',
                   'Gameweek_38_Fixture_One_Team'
                  ])
    .fillna('')
    .pipe(lambda x: x.set_axis(x.columns.str.replace('_',' '), axis=1))
    .pipe(lambda x: x.style.applymap(colour, subset=x.iloc[:, 1:7].columns))
)

Unnamed: 0,team name,Gameweek 36 Fixture One Opponent,Gameweek 36 Fixture Two Opponent,Gameweek 37 Fixture One Opponent,Gameweek 37 Fixture Two Opponent,Gameweek 38 Fixture One Opponent
0,ARS,LEE,tot,new,,EVE
1,AVL,bur,LIV,CRY,BUR,mci
2,BRE,SOU,,eve,,LEE
3,BHA,MUN,,lee,,WHU
4,BUR,AVL,,tot,avl,NEW
5,CHE,WOL,lee,LEI,,WAT
6,CRY,WAT,,avl,eve,MUN
7,EVE,lei,wat,BRE,CRY,ars
8,LEI,EVE,NOR,wat,che,SOU
9,LEE,ars,CHE,BHA,,bre
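

The cell above names each gameweek column explicitly, which is why the gameweek is hard-coded. One possible way to make the table dynamic is sketched below: build one row per team per fixture with the opponent already formatted, then pivot on gameweek and fixture slot. This is a sketch under assumptions, not the notebook's method; the `_demo` frames, the `slot` column and the kickoff times are all made up for illustration.

```python
import pandas as pd

# Toy data (assumed): three teams, team 1 has two fixtures in gameweek 37
teams_demo = pd.DataFrame({'team_id': [1, 2, 3],
                           'team_short_name': ['ARS', 'AVL', 'BRE']})
fixtures_demo = pd.DataFrame({
    'fixture_id': [1, 2, 3],
    'gameweek_id': [37, 37, 38],
    'kickoff_time': ['2022-05-10T19:00:00Z', '2022-05-12T19:00:00Z',
                     '2022-05-22T15:00:00Z'],
    'home_team_id': [1, 3, 2],
    'away_team_id': [2, 1, 3]})

short_name = teams_demo.set_index('team_id').team_short_name

# One row per team per fixture: opponent in upper case when the team is at home,
# lower case when the team is away (same convention as the table above)
home = fixtures_demo.assign(
    team_id=fixtures_demo.home_team_id,
    opponent=fixtures_demo.away_team_id.map(short_name).str.upper())
away = fixtures_demo.assign(
    team_id=fixtures_demo.away_team_id,
    opponent=fixtures_demo.home_team_id.map(short_name).str.lower())

remaining = (pd.concat([home, away], ignore_index=True)
             .sort_values(['gameweek_id', 'kickoff_time'])
             # Number each team's fixtures within a gameweek: 1, 2, ...
             .assign(slot=lambda x: x.groupby(['gameweek_id', 'team_id']).cumcount() + 1,
                     team_name=lambda x: x.team_id.map(short_name))
             .pivot(index='team_name', columns=['gameweek_id', 'slot'], values='opponent')
             .sort_index(axis='columns')
             .fillna(''))

# Flatten the (gameweek, slot) column pairs into readable labels
remaining.columns = [f'Gameweek {gw} Fixture {slot}' for gw, slot in remaining.columns]
print(remaining)
```

Because the columns come from the data, the number of gameweeks and fixture slots no longer needs to be known in advance. The result could then be filtered by gameweek before the pivot and styled with the `colour` function in the same way as above.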


### Close connection to the database

In [28]:
con.close()

## Summary
We have explored some simple queries against the FPL database and compared equivalent SQL and pandas approaches.

Further queries may be added in the future.