# SQL Data Analysis: European Soccer Data

In this small project, we analyze the European Soccer dataset using SQL queries run from Python 3. My motivation is to gain a better understanding of how to query data with SQL and to improve my Python data analysis skills.

First, we import the libraries needed for this study.

In [8]:
# Import libraries
import numpy as np
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# Input data files path
path = "data/" 
database = path + 'database.sqlite'

## 1. Data Exploration
First, we create a connection to the database and list the tables it contains.

In [9]:
# Establish a connection

conn = sqlite3.connect(database)

# Explore which tables exist in the data

tables = pd.read_sql("""SELECT *
                        FROM sqlite_master
                        WHERE type='table';""", conn)
tables

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,sqlite_sequence,sqlite_sequence,4,"CREATE TABLE sqlite_sequence(name,seq)"
1,table,Player_Attributes,Player_Attributes,11,"CREATE TABLE ""Player_Attributes"" (\n\t`id`\tIN..."
2,table,Player,Player,14,CREATE TABLE `Player` (\n\t`id`\tINTEGER PRIMA...
3,table,Match,Match,18,CREATE TABLE `Match` (\n\t`id`\tINTEGER PRIMAR...
4,table,League,League,24,CREATE TABLE `League` (\n\t`id`\tINTEGER PRIMA...
5,table,Country,Country,26,CREATE TABLE `Country` (\n\t`id`\tINTEGER PRIM...
6,table,Team,Team,29,"CREATE TABLE ""Team"" (\n\t`id`\tINTEGER PRIMARY..."
7,table,Team_Attributes,Team_Attributes,2,CREATE TABLE `Team_Attributes` (\n\t`id`\tINTE...
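
The `sql` column above is truncated, so the column names of each table are hard to read from it. As a minimal sketch, assuming a toy in-memory database in place of `database.sqlite` (the two-table schema and the name `demo_conn` are illustrative, not the real file), SQLite's `PRAGMA table_info` can list the columns of every user table:

```python
import sqlite3

# Toy in-memory database standing in for database.sqlite
# (hypothetical subset of the schema, for illustration only)
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE League (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT);
""")

# Keep only the table names, skipping SQLite's internal tables
# such as sqlite_sequence
table_names = [row[0] for row in demo_conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
    "ORDER BY name;")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = {
    name: [col[1] for col in demo_conn.execute(f'PRAGMA table_info("{name}");')]
    for name in table_names
}
print(columns)
demo_conn.close()
```

The same statements should also work through `pd.read_sql(query, conn)` if a DataFrame is preferred.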


Now we explore the tables further.

In [10]:
# Load each table into its own pandas DataFrame

player_attributes = pd.read_sql("""SELECT * FROM Player_Attributes;""", conn)
player = pd.read_sql("""SELECT * FROM Player;""", conn)
match = pd.read_sql("""SELECT * FROM Match;""", conn)
league = pd.read_sql("""SELECT * FROM League;""", conn)
country = pd.read_sql("""SELECT * FROM Country;""", conn)
team = pd.read_sql("""SELECT * FROM Team;""", conn)
team_attributes = pd.read_sql("""SELECT * FROM Team_Attributes;""", conn)
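
Loading every table in full is convenient, but the size of each table can also be checked in SQL before pulling it into memory. The following is a sketch under assumptions: it uses a toy in-memory database rather than the real file, and the second country row is a made-up placeholder.

```python
import sqlite3

# Toy in-memory database (hypothetical rows, for illustration only)
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE League (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT);
    INSERT INTO Country VALUES (1, 'Belgium'), (2, 'Exampleland');
    INSERT INTO League VALUES (1, 1, 'Belgium Jupiler League');
""")

# Count the rows of each table with COUNT(*) instead of loading them
row_counts = {}
for name in ("Country", "League"):
    (n_rows,) = demo_conn.execute(f'SELECT COUNT(*) FROM "{name}";').fetchone()
    row_counts[name] = n_rows

print(row_counts)
demo_conn.close()
```

On the real database, the loop would run over the seven table names found earlier.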

Next, we look at the first row of each table.

In [12]:
player_attributes.head(1)

Unnamed: 0,id,player_fifa_api_id,player_api_id,date,overall_rating,potential,preferred_foot,attacking_work_rate,defensive_work_rate,crossing,...,vision,penalties,marking,standing_tackle,sliding_tackle,gk_diving,gk_handling,gk_kicking,gk_positioning,gk_reflexes
0,1,218353,505942,2016-02-18 00:00:00,67.0,71.0,right,medium,medium,49.0,...,54.0,48.0,65.0,69.0,69.0,6.0,11.0,10.0,8.0,8.0


In [14]:
player.head(1)

Unnamed: 0,id,player_api_id,player_name,player_fifa_api_id,birthday,height,weight
0,1,505942,Aaron Appindangoye,218353,1992-02-29 00:00:00,182.88,187


In [15]:
match.head(1)

Unnamed: 0,id,country_id,league_id,season,stage,date,match_api_id,home_team_api_id,away_team_api_id,home_team_goal,...,SJA,VCH,VCD,VCA,GBH,GBD,GBA,BSH,BSD,BSA
0,1,1,1,2008/2009,1,2008-08-17 00:00:00,492473,9987,9993,1,...,4.0,1.65,3.4,4.5,1.78,3.25,4.0,1.73,3.4,4.2


In [16]:
league.head(1)

Unnamed: 0,id,country_id,name
0,1,1,Belgium Jupiler League


In [17]:
country.head(1)

Unnamed: 0,id,name
0,1,Belgium


In [20]:
team.head(1)

Unnamed: 0,id,team_api_id,team_fifa_api_id,team_long_name,team_short_name
0,1,9987,673.0,KRC Genk,GEN


In [19]:
team_attributes.head(1)

Unnamed: 0,id,team_fifa_api_id,team_api_id,date,buildUpPlaySpeed,buildUpPlaySpeedClass,buildUpPlayDribbling,buildUpPlayDribblingClass,buildUpPlayPassing,buildUpPlayPassingClass,...,chanceCreationShooting,chanceCreationShootingClass,chanceCreationPositioningClass,defencePressure,defencePressureClass,defenceAggression,defenceAggressionClass,defenceTeamWidth,defenceTeamWidthClass,defenceDefenderLineClass
0,1,434,9930,2010-02-22 00:00:00,60,Balanced,,Little,50,Mixed,...,55,Normal,Organised,50,Medium,55,Press,45,Normal,Cover
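
Since the goal of the project is to practice SQL, the same one-row preview can also be written as a query with `LIMIT`, which avoids loading the whole table. This is a minimal sketch on a toy in-memory `Team` table; the second team and the name `demo_conn` are hypothetical.

```python
import sqlite3

# Toy in-memory Team table (second row is a made-up placeholder)
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE Team (id INTEGER PRIMARY KEY, team_api_id INTEGER,
                       team_long_name TEXT, team_short_name TEXT);
    INSERT INTO Team VALUES (1, 9987, 'KRC Genk', 'GEN'),
                            (2, 9993, 'Placeholder FC', 'PLA');
""")

# SQL counterpart of team.head(1), restricted to two columns
cursor = demo_conn.execute(
    "SELECT team_api_id, team_long_name FROM Team ORDER BY id LIMIT 1;")

# cursor.description holds one tuple per result column; item 0 is its name
column_names = [desc[0] for desc in cursor.description]
first_row = cursor.fetchone()

print(column_names, first_row)
demo_conn.close()
```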


We can see that the dataset contains the following tables:
* **Country**: It lists the countries where the leagues are played, each with a unique id.
* **League**: It contains the name of each league and the id of the country it belongs to.
* **Match**: It contains one row per match, with the season, stage, date, home and away teams, and goals, followed by many further columns such as bookmaker odds. Each match refers to the Country, League, and Team tables through their ids.
* **Player**: It has players’ names, heights, weights, birth dates, FIFA ids, and API ids.
* **Player_Attributes**: It has dated ratings for each player, such as overall rating, potential, preferred foot, work rates, and individual skill scores (crossing, vision, penalties, etc.).
* **Team**: It contains teams’ names (short and long) and their API and FIFA ids.
* **Team_Attributes**: It contains dated attributes that describe each team’s style of play, such as build-up play, chance creation, and defence settings.
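
The relationships described above can be sketched with a join. The example below is an illustration under assumptions, not the analysis itself: it builds a toy in-memory database with a reduced schema, the away team name and the away score are made-up values, and the `away_team_goal` column is assumed by analogy with `home_team_goal`.

```python
import sqlite3

# Toy in-memory database with a reduced, partly hypothetical schema
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE League (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT);
    CREATE TABLE Team (id INTEGER PRIMARY KEY, team_api_id INTEGER,
                       team_long_name TEXT);
    CREATE TABLE "Match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          league_id INTEGER, season TEXT,
                          home_team_api_id INTEGER, away_team_api_id INTEGER,
                          home_team_goal INTEGER, away_team_goal INTEGER);
    INSERT INTO Country VALUES (1, 'Belgium');
    INSERT INTO League VALUES (1, 1, 'Belgium Jupiler League');
    INSERT INTO Team VALUES (1, 9987, 'KRC Genk'), (2, 9993, 'Placeholder FC');
    INSERT INTO "Match" VALUES (1, 1, 1, '2008/2009', 9987, 9993, 1, 1);
""")

# Resolve the ids stored in Match into readable names.
# Team is joined twice: once for the home side, once for the away side.
query = """
    SELECT m.season,
           c.name AS country,
           l.name AS league,
           home.team_long_name AS home_team,
           away.team_long_name AS away_team,
           m.home_team_goal,
           m.away_team_goal
    FROM "Match" AS m
    JOIN Country AS c ON c.id = m.country_id
    JOIN League AS l ON l.id = m.league_id
    JOIN Team AS home ON home.team_api_id = m.home_team_api_id
    JOIN Team AS away ON away.team_api_id = m.away_team_api_id;
"""
joined_rows = demo_conn.execute(query).fetchall()
print(joined_rows)
demo_conn.close()
```

Note that Match links to Team through `team_api_id` rather than through the Team table's own `id` column.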