# Part 3: Soccer Data

*Introductory - Intermediate level SQL*

---

## Setup

Download the [SQLite database](https://www.kaggle.com/hugomathien/soccer/download). *Note: You may be asked to log in, or "continue and download".* Unpack the ZIP file into your working directory (i.e., wherever you'd like to complete this challenge set). There should be a *database.sqlite* file.

As with Part 2, you can check the schema (adjust the path passed to `sqlite3.connect` if *database.sqlite* lives elsewhere):

In [103]:
import pandas as pd
import sqlite3

conn = sqlite3.connect('../../data/database.sqlite')

query = "SELECT * FROM sqlite_master"

df_schema = pd.read_sql_query(query, conn)

df_schema.tbl_name.unique()

array(['sqlite_sequence', 'Player_Attributes', 'Player', 'Match',
       'League', 'Country', 'Team', 'Team_Attributes'], dtype=object)
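
To go one step further than table names, the columns of a table can be read with `PRAGMA table_info`. The following is a minimal sketch that runs against a small in-memory database; the two toy tables and the helper names `list_tables` and `list_columns` are made up for illustration, so point `conn` at *database.sqlite* to inspect the real schema.

```python
import sqlite3

# Toy in-memory database standing in for database.sqlite (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Team (team_api_id INTEGER, team_long_name TEXT)")
conn.execute("CREATE TABLE Match (match_api_id INTEGER, home_team_goal INTEGER)")


def list_tables(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def list_columns(conn, table):
    """Return the column names of one table via PRAGMA table_info."""
    # Each row is (cid, name, type, notnull, dflt_value, pk); keep the name.
    # The table name is interpolated into the statement, so pass trusted names only.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


print(list_tables(conn))
print(list_columns(conn, "Match"))
```

Filtering `sqlite_master` on `type = 'table'` leaves out indexes and views, which the unfiltered query above would also return.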

Please complete this exercise using sqlite3 (the soccer data, above) and your Jupyter notebook.

1. Which team scored the most points when playing at home?

In [42]:
query = '''
SELECT team_long_name Team, SUM(home_team_goal) Points
FROM Match
JOIN Team ON Match.home_team_api_id = Team.team_api_id
GROUP BY home_team_api_id
ORDER BY SUM(home_team_goal) DESC
LIMIT 1
'''

df = pd.read_sql_query(query, conn)
df.head()

|   | Team | Points |
|---|------|--------|
| 0 | Real Madrid CF | 505 |


2. Did this team also score the most points when playing away?

No. FC Barcelona scored the most away goals (354), while Real Madrid CF led at home.

In [43]:
query = '''
SELECT team_long_name Team, SUM(away_team_goal) Points
FROM Match
JOIN Team ON Match.away_team_api_id = Team.team_api_id
GROUP BY away_team_api_id
ORDER BY SUM(away_team_goal) DESC
LIMIT 1
'''

df = pd.read_sql_query(query, conn)
df.head()

|   | Team | Points |
|---|------|--------|
| 0 | FC Barcelona | 354 |
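
The home and away totals can also be compared in a single result set. The sketch below stacks home and away goals with `UNION ALL` and then aggregates per team. It runs on a toy in-memory database whose team names and scores are invented for illustration; only the column names are taken from the queries above.

```python
import sqlite3

# Toy data (invented): three teams and five matches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Team (team_api_id INTEGER, team_long_name TEXT)")
conn.execute(
    "CREATE TABLE Match (home_team_api_id INTEGER, away_team_api_id INTEGER, "
    "home_team_goal INTEGER, away_team_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO Team VALUES (?, ?)",
    [(1, "Alpha FC"), (2, "Beta United"), (3, "Gamma Town")],
)
conn.executemany(
    "INSERT INTO Match VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 1), (2, 1, 0, 2), (3, 1, 1, 4), (1, 3, 2, 0), (2, 3, 4, 1)],
)

query = """
WITH goals AS (
    -- One row per team per match, tagged with the venue.
    SELECT home_team_api_id AS team_api_id, 'home' AS venue, home_team_goal AS goals
    FROM Match
    UNION ALL
    SELECT away_team_api_id, 'away', away_team_goal
    FROM Match
)
SELECT Team.team_long_name,
       SUM(CASE WHEN venue = 'home' THEN goals ELSE 0 END) AS home_goals,
       SUM(CASE WHEN venue = 'away' THEN goals ELSE 0 END) AS away_goals
FROM goals
JOIN Team ON goals.team_api_id = Team.team_api_id
GROUP BY Team.team_api_id, Team.team_long_name
ORDER BY home_goals DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Sorting by `away_goals` instead answers the away question from the same query.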


3. How many matches resulted in a tie?

In [46]:
query = '''
SELECT COUNT(match_api_id) as Total
FROM Match
WHERE home_team_goal = away_team_goal
'''

df = pd.read_sql_query(query, conn)
df.head()

|   | Total |
|---|-------|
| 0 | 6596 |


4. How many players have Smith for their last name? How many have 'smith' anywhere in their name?

In [61]:
query = '''
SELECT COUNT(player_name) as CountSmithLastName
FROM Player
WHERE player_name LIKE '% smith'
'''

df = pd.read_sql_query(query, conn)
df

|   | CountSmithLastName |
|---|--------------------|
| 0 | 15 |


In [63]:
query = '''
SELECT COUNT(player_name) as CountSmith
FROM Player
WHERE player_name LIKE '%smith%'
'''

df = pd.read_sql_query(query, conn)
df

|   | CountSmith |
|---|------------|
| 0 | 18 |
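
Both patterns are written in lowercase yet match names such as "Smith" because SQLite's `LIKE` is case-insensitive for ASCII letters by default, whereas `GLOB` is case-sensitive. The sketch below shows the difference on a toy `Player` table; the names are invented for illustration.

```python
import sqlite3

# Toy Player table (invented names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Player (player_name TEXT)")
conn.executemany(
    "INSERT INTO Player VALUES (?)",
    [("Alan Smith",), ("Smithson Jones",), ("Adam Smithers",), ("John Doe",), ("Smith",)],
)


def count_where(condition):
    """Count players matching a WHERE condition (trusted, hard-coded strings only)."""
    return conn.execute(
        f"SELECT COUNT(*) FROM Player WHERE {condition}"
    ).fetchone()[0]


# LIKE ignores ASCII case; '% smith' requires a space before the last name.
last_name = count_where("player_name LIKE '% smith'")
anywhere = count_where("player_name LIKE '%smith%'")

# GLOB is case-sensitive and uses * instead of %.
glob_lower = count_where("player_name GLOB '*smith*'")
glob_upper = count_where("player_name GLOB '*Smith*'")

print(last_name, anywhere, glob_lower, glob_upper)
```

Note that `'% smith'` does not match a player stored under the single name "Smith", because the pattern requires a preceding space.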


5. What was the median tie score? Use the number of tie games determined in question 3. *Hint:* PostgreSQL does not have a median function. Instead, think about the steps required to calculate a median and use the [`WITH`](https://www.postgresql.org/docs/8.4/static/queries-with.html) command to store stepwise results as a table and then operate on these results. The hint refers to PostgreSQL, but SQLite also supports `WITH` (since version 3.8.3).

In [72]:
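query = '''
-- 6596 is the tie count from question 3. ROW_NUMBER() requires SQLite 3.25 or newer.
-- The count is even, so a strict median would average rows 3298 and 3299;
-- this query reads row 3298 only.
SELECT temp.score
FROM
    (SELECT home_team_goal AS score,
            ROW_NUMBER() OVER (ORDER BY home_team_goal DESC) AS RowNumber
     FROM Match
     WHERE home_team_goal = away_team_goal) temp
WHERE RowNumber = (6596 / 2)
'''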

df = pd.read_sql_query(query, conn)
df

|   | score |
|---|-------|
| 0 | 1 |
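
The hint's `WITH` approach can also avoid the hard-coded tie count. The following is a sketch, not the exercise's reference solution: it numbers the tied matches by score, counts them in the same step, and averages the one or two middle rows. It runs on toy in-memory data; the helper names `make_toy_db` and `median_tie_score` are made up for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

MEDIAN_TIE_QUERY = """
WITH ties AS (
    SELECT home_team_goal AS score,
           ROW_NUMBER() OVER (ORDER BY home_team_goal) AS rn,
           COUNT(*) OVER () AS n
    FROM Match
    WHERE home_team_goal = away_team_goal
)
-- Odd n: both expressions give the same middle row.
-- Even n: they give the two middle rows, which AVG then averages.
SELECT AVG(score) AS median_score
FROM ties
WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
"""


def make_toy_db(scores):
    """Build an in-memory Match table from (home_goal, away_goal) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Match (home_team_goal INTEGER, away_team_goal INTEGER)")
    conn.executemany("INSERT INTO Match VALUES (?, ?)", scores)
    return conn


def median_tie_score(conn):
    """Return the median score among tied matches."""
    return conn.execute(MEDIAN_TIE_QUERY).fetchone()[0]


# Four ties (0, 1, 2, 3) plus one non-tie that the WHERE clause drops.
even_db = make_toy_db([(0, 0), (1, 1), (2, 2), (3, 3), (2, 1)])
# Three ties (0, 1, 4) plus one non-tie.
odd_db = make_toy_db([(0, 0), (1, 1), (4, 4), (0, 3)])

print(median_tie_score(even_db))
print(median_tie_score(odd_db))
```

The same query string can be passed to `pd.read_sql_query` with the real connection.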


6. What percentage of players prefer their left or right foot? *Hint:* Calculate either the right or left foot, whichever is easier based on how you set up the problem.

Left

In [101]:
query = '''
SELECT (COUNT(DISTINCT(player_api_id)) * 100.0 / (SELECT COUNT(DISTINCT(player_api_id)) FROM Player_Attributes)) as percentage
FROM Player_Attributes
WHERE preferred_foot LIKE '%left%'
'''

df = pd.read_sql_query(query, conn)
df

|   | percentage |
|---|------------|
| 0 | 28.951175 |


Right

In [102]:
query = '''
SELECT (COUNT(DISTINCT(player_api_id)) * 100.0 / (SELECT COUNT(DISTINCT(player_api_id)) FROM Player_Attributes)) as percentage
FROM Player_Attributes
WHERE preferred_foot LIKE '%right%'
'''

df = pd.read_sql_query(query, conn)
df

|   | percentage |
|---|------------|
| 0 | 81.184448 |
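
The two results add up to about 110%, which means some players are counted in both queries: `Player_Attributes` holds several records per player, and a player recorded with both values appears under each foot. One way to get shares that sum to 100% is to classify each player by a single record, for example the most recent one. The sketch below illustrates this on a toy in-memory table; it assumes a `date` column for ordering records and is not the exercise's reference solution.

```python
import sqlite3

# Toy data (invented): player 1 switches from left to right over time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Player_Attributes "
    "(player_api_id INTEGER, date TEXT, preferred_foot TEXT)"
)
conn.executemany(
    "INSERT INTO Player_Attributes VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", "left"),
        (1, "2015-01-01", "right"),
        (2, "2015-01-01", "right"),
        (3, "2015-01-01", "left"),
        (4, "2015-01-01", "right"),
    ],
)

# Approach used above: a player counts if ANY of their records matches.
ANY_RECORD = """
SELECT COUNT(DISTINCT player_api_id) * 100.0
       / (SELECT COUNT(DISTINCT player_api_id) FROM Player_Attributes)
FROM Player_Attributes
WHERE preferred_foot = ?
"""

# Alternative: keep only each player's most recent record, then group.
LATEST_RECORD = """
WITH latest AS (
    SELECT player_api_id, preferred_foot,
           ROW_NUMBER() OVER (PARTITION BY player_api_id ORDER BY date DESC) AS rn
    FROM Player_Attributes
)
SELECT preferred_foot,
       COUNT(*) * 100.0 / (SELECT COUNT(*) FROM latest WHERE rn = 1) AS percentage
FROM latest
WHERE rn = 1
GROUP BY preferred_foot
ORDER BY preferred_foot
"""

any_left = conn.execute(ANY_RECORD, ("left",)).fetchone()[0]
any_right = conn.execute(ANY_RECORD, ("right",)).fetchone()[0]
latest = dict(conn.execute(LATEST_RECORD).fetchall())

print(any_left, any_right)  # overlapping shares
print(latest)               # shares based on the latest record per player
```

In the toy data the any-record shares overlap because player 1 is counted under both feet, while the latest-record shares partition the players.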
