![](https://i.imgur.com/A3Cxp7P.png)

----

- [Some analysis of Goals...](#Some-analysis-of-Goals...)
    - [Goals scored vs conceded per team](#Goals-scored-vs-conceded-per-team)
    - [Goal period](#Goal-period)
    - [Shirt number matters...](#Shirt-number-matters...)
    
    
- [Some team analysis...](#Some-team-analysis...)
    - [What was the best stage native/foreign coaches were able to take their team to?](#What-was-the-best-stage-native/foreign-coaches-were-able-to-take-their-team-to?)
    
    
- [Some analysis on players...](#Some-analysis-on-players...)
    - [Goalkeepers are biggg!!!](#Goalkeepers-are-biggg!!!)
    - [Some age analysis](#Some-age-analysis)
    - [None of the teams with average age above 28 could make it past Quarterfinals...](#None-of-the-teams-with-average-age-above-28-could-make-it-past-Quarterfinals...)
    
    
- [Some match attendance analysis...](#Some-match-attendance-analysis...)
    - [Why so low attendance in quarters? Maybe bcoz of no Portugal, Spain and Argentina!](#Why-so-low-attendance-in-quarters?-Maybe-bcoz-of-no-Portugal,-Spain-and-Argentina!)
    
    
- [Some analysis on team tactics...](#Some-analysis-on-team-tactics...)
    - [4-2-3-1 seems to be everyone's favorite. Does it guarantee a win?](#4-2-3-1-seems-to-be-everyone's-favorite.-Does-it-guarantee-a-win?)
    - [How does 4-2-3-1 tactic fares at big stage?](#How-does-4-2-3-1-tactic-fares-at-big-stage?)
    
    
- [Some analysis on ball possession...](#Some-analysis-on-ball-possession...)
    - [In 4/6 matches, France had lesser ball possession than the opponent..](#In-4/6-matches,-France-had-lesser-ball-possession-than-the-opponent..)
    
----

In [None]:
import pandas as pd
%matplotlib inline

In [None]:
teams_df = pd.read_csv("data/teams.csv", index_col="IdTeam")

In [None]:
teams_df.head()

In [None]:
players_df = pd.read_csv("data/players.csv", parse_dates=['BirthDate'], index_col="IdPlayer")

In [None]:
players_df.head()

In [None]:
matches_df = pd.read_csv("data/matches.csv", parse_dates=['Date'], index_col="IdMatch")

In [None]:
matches_df.head()

In [None]:
goals_df = pd.read_csv("data/goals.csv")

In [None]:
goals_df.head()

## Some analysis of Goals...
![](https://i.imgflip.com/219r12.jpg)

### Goals scored vs conceded per team

In [None]:
goals_scored_per_team = goals_df['TeamName'].value_counts()

In [None]:
goals_conceded_per_team = goals_df['OppositionTeamName'].value_counts()

In [None]:
goals_scored_per_team.get('Croatia')

In [None]:
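# Teams missing from the counts have no goals recorded, so fill the gaps with 0
teams_df['GoalsScored'] = teams_df['TeamName'].map(goals_scored_per_team).fillna(0).astype(int)

In [None]:
teams_df['GoalsConceded'] = teams_df['TeamName'].map(goals_conceded_per_team).fillna(0).astype(int)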

In [None]:
# goals scored vs conceded
teams_df.plot.bar(x='TeamName', y=['GoalsScored', 'GoalsConceded'], figsize=(15,6))

In [None]:
# Which teams conceded more goals than they scored?
teams_df.query('GoalsScored < GoalsConceded')

### Goal period

In [None]:
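def set_goal_period(minute):
    # Minute labels look like 45'+2, so add base and stoppage minutes without eval()
    total = sum(int(part) for part in minute.replace("'", "").split("+") if part)
    if "45'+" in minute or total <= 45:
        return "FirstHalf"
    elif "90'+" in minute or 45 < total <= 90:
        return "SecondHalf"
    # Stoppage time after the 120th minute still belongs to extra time
    elif "120'+" in minute or 90 < total <= 120:
        return "ExtraTime"
    else:
        return "PenaltyShootout"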

In [None]:
goals_df['Period'] = goals_df['Minute'].apply(set_goal_period)

In [None]:
goals_df['Period'].value_counts().plot.pie(autopct='%1.1f%%')
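
The period above depends on both the summed minute and the raw label, because stoppage time pushes the total past the end of a half. A minimal sketch of an alternative that classifies by the regulation minute only, assuming labels look like `23'` or `90'+4'` (the helper names `parse_minute` and `goal_period` are hypothetical, not part of the notebook):

```python
def parse_minute(minute):
    """Split a minute label such as "90'+4'" into (base, stoppage)."""
    parts = minute.replace("'", "").split("+")
    base = int(parts[0])
    # A label without stoppage time has no second part
    stoppage = int(parts[1]) if len(parts) > 1 and parts[1] else 0
    return base, stoppage


def goal_period(minute):
    """Classify a goal by its regulation minute, ignoring stoppage time."""
    base, _ = parse_minute(minute)
    if base <= 45:
        return "FirstHalf"
    if base <= 90:
        return "SecondHalf"
    if base <= 120:
        return "ExtraTime"
    return "PenaltyShootout"


for label in ["23'", "45'+2'", "90'+4'", "109'", "120'+1'"]:
    print(label, goal_period(label))
```

Keeping the base minute and the stoppage minutes separate means a goal in stoppage time never spills into the next period.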

### Shirt number matters...

In [None]:
# Goals per shirt number: do the number 10s outscore the number 7s?
goals_df['PlayerShirtNumber'].value_counts()

In [None]:
# Players wearing number 10 who scored, with their goal counts
goals_df.query("PlayerShirtNumber == 10")['PlayerName'].value_counts()

## Some team analysis...
![](https://michaelsloredotcom.files.wordpress.com/2017/12/545399686-france-team-members-celebrate-after-beating-germany-2-0-crop-promo-xlarge2-e1512248877901.jpg?w=845&h=450&crop=1)

In [None]:
# Best stage each team reached: walk the matches in date order so that
# a team's latest stage overwrites its earlier ones
best_stage_map = {}
for _, match in matches_df.sort_values('Date').iterrows():
    best_stage_map[match['HomeTeamName']] = match['Stage']
    best_stage_map[match['AwayTeamName']] = match['Stage']

In [None]:
best_stage_map

In [None]:
teams_df['BestStage'] = teams_df['TeamName'].apply(best_stage_map.get)

### What was the best stage native/foreign coaches were able to take their team to?

In [None]:
teams_df.query('TeamName != CoachCountry')['BestStage'].value_counts().plot.pie(autopct='%1.1f%%')

In [None]:
teams_df.query('TeamName == CoachCountry')['BestStage'].value_counts().plot.pie(autopct='%1.1f%%')
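
The two pies are easier to compare side by side as a single table. A rough sketch using `pd.crosstab`, with a small made-up frame standing in for `teams_df` (the team names, coach countries and stages below are illustrative only, and the `CoachOrigin` column is a hypothetical addition):

```python
import pandas as pd

# Hypothetical rows standing in for teams_df; values are illustrative only
teams = pd.DataFrame({
    "TeamName": ["A", "B", "C", "D"],
    "CoachCountry": ["A", "X", "C", "Y"],
    "BestStage": ["Final", "First stage", "First stage", "Round of 16"],
})

# A coach is native when the coach's country matches the team
teams["CoachOrigin"] = (teams["TeamName"] == teams["CoachCountry"]).map(
    {True: "Native", False: "Foreign"}
)

# One row per stage, one column per coach origin, cells are team counts
stage_by_origin = pd.crosstab(teams["BestStage"], teams["CoachOrigin"])
print(stage_by_origin)
```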

## Some analysis on players...
![](http://n.sinaimg.cn/sports/transform/224/w615h409/20180707/hZYi-hexfcvm1472143.jpg)

In [None]:
players_df['Height'].plot.hist()

In [None]:
# Who is the tallest player?
players_df.loc[players_df['Height'].idxmax()]

In [None]:
# Who is the shortest player?
players_df.loc[players_df['Height'].idxmin()]

In [None]:
# Player position breakdown
players_df.Position.value_counts()

In [None]:
def get_position_color(position):
    color_map = {
        "Defender": 'y',
        "Midfielder": 'b',
        "Forward": 'g',
        "Goalkeeper": 'r'
    }
    return color_map[position]

In [None]:
color_map = players_df['Position'].apply(get_position_color)

In [None]:
players_df.plot.scatter(x='Height', y='Weight', c=color_map)

In [None]:
players_df.groupby('Position')[['Height', 'Weight']].mean()

### Goalkeepers are biggg!!!
![](https://i.imgur.com/v81dlTt.jpg)

### Some age analysis

![](https://t.resfu.com/media/img_news/montage-of-essam-el-hadary-and-kylian-mbappe--besoccer.png?size=776x&q=60)

In [None]:
from datetime import datetime

In [None]:
now = datetime.now()

In [None]:
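# Age in whole years as of the moment the notebook runs.
# Casting a timedelta to '<m8[Y]' is rejected by recent pandas versions, so divide the day count instead.
players_df['Age'] = (now - players_df['BirthDate']).dt.days // 365.25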
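
Because `now` changes every time the notebook is rerun, the ages drift over time. A minimal sketch of computing completed years on a fixed reference date instead; the birth dates and the reference date below are assumptions for illustration (in the notebook the reference could be something like `matches_df['Date'].min()`), and `age_on` is a hypothetical helper:

```python
import pandas as pd

# Hypothetical birth dates standing in for players_df['BirthDate']
birth_dates = pd.Series(pd.to_datetime(["1998-12-20", "1973-01-15", "1990-06-14"]))

# Assumed reference date, used only for this example
reference_date = pd.Timestamp("2018-06-14")


def age_on(reference, births):
    """Completed years between each birth date and the reference date."""
    years = reference.year - births.dt.year
    # Subtract one year when the birthday has not yet occurred by the reference date
    had_birthday = (births.dt.month < reference.month) | (
        (births.dt.month == reference.month) & (births.dt.day <= reference.day)
    )
    return years - (~had_birthday).astype(int)


ages = age_on(reference_date, birth_dates)
print(ages.tolist())
```

Comparing month and day directly avoids the small errors that come from dividing a day count by an average year length.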

In [None]:
# Who are the oldest players?
players_df.sort_values('Age', ascending=False).head()

In [None]:
# Who are the youngest players?
players_df.sort_values('Age').head()

In [None]:
avg_country_ages = players_df.groupby('TeamName')['Age'].mean().sort_values()

In [None]:
teams_df["AverageAge"] = teams_df['TeamName'].apply(avg_country_ages.get)

### None of the teams with average age above 28 could make it past Quarterfinals... 

In [None]:
teams_df.sort_values('AverageAge')[['TeamName', 'AverageAge', 'BestStage']]
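
The claim in the heading can also be checked directly instead of being read off the sorted table. A sketch under stated assumptions: the rows below are made up, and the stage labels other than `First stage` are guesses that would need to match the values in `matches_df['Stage']`:

```python
import pandas as pd

# Hypothetical rows standing in for teams_df; values are illustrative only
teams = pd.DataFrame({
    "TeamName": ["A", "B", "C", "D"],
    "AverageAge": [25.9, 28.4, 27.1, 29.0],
    "BestStage": ["Final", "Round of 16", "Semi-finals", "First stage"],
})

# Assumed stage labels, ordered from earliest to latest
stage_order = ["First stage", "Round of 16", "Quarter-finals", "Semi-finals", "Final"]
teams["BestStage"] = pd.Categorical(teams["BestStage"], categories=stage_order, ordered=True)

# Furthest stage reached by any team whose average age is above 28
older_teams = teams[teams["AverageAge"] > 28]
furthest_stage = older_teams["BestStage"].max()
print(furthest_stage)
```

An ordered categorical lets `max()` respect the tournament order where plain strings would be compared alphabetically.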

## Some match attendance analysis...
![](https://i.imgur.com/MyTtkVy.jpg)

In [None]:
# Attendance in France matches
matches_df.query("HomeTeamName == 'France' or AwayTeamName == 'France'").plot.bar(x="Stage", y="Attendance", figsize=(15,7))

In [None]:
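def get_avg_team_attendance(team_name):
    # Boolean mask instead of a formatted query string, so team names containing quotes are safe
    played = (matches_df["HomeTeamName"] == team_name) | (matches_df["AwayTeamName"] == team_name)
    return matches_df.loc[played, "Attendance"].mean()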

In [None]:
teams_df['AvgAttendance'] = teams_df['TeamName'].apply(get_avg_team_attendance)

In [None]:
teams_df[['TeamName', 'AvgAttendance', 'BestStage']].sort_values('AvgAttendance', ascending=False)

In [None]:
# Average attendance per stage
matches_df.groupby("Stage")['Attendance'].mean().sort_values(ascending=False)

In [None]:
matches_df.groupby("Stage")['Attendance'].mean().sort_values(ascending=False).plot.bar()

### Why so low attendance in quarters? Maybe bcoz of no Portugal, Spain and Argentina!

## Some analysis on team tactics...
![](https://theresonlyoneball.files.wordpress.com/2018/01/tactics.png?w=648)

In [None]:
matches_df["HomeTeamTactics"].value_counts()

### 4-2-3-1 seems to be everyone's favorite. Does it guarantee a win?

In [None]:
def set_winning_team_tactics(match):
    if match["Winner"] == match["HomeTeamName"]:
        return match["HomeTeamTactics"]
    elif match["Winner"] == match["AwayTeamName"]:
        return match["AwayTeamTactics"]
    else:
        return None

In [None]:
matches_df["WinningTeamTactics"] = matches_df.apply(set_winning_team_tactics, axis=1)

In [None]:
matches_df["WinningTeamTactics"].value_counts().plot.pie(autopct='%1.1f%%')

### 4-2-3-1 is the formation winning teams use most often... what about the losing teams' tactics?

In [None]:
def set_losing_team_tactics(match):
    if match["Winner"] == match["HomeTeamName"]:
        return match["AwayTeamTactics"]
    elif match["Winner"] == match["AwayTeamName"]:
        return match["HomeTeamTactics"]
    else:
        return None

In [None]:
matches_df["LosingTeamTactics"] = matches_df.apply(set_losing_team_tactics, axis=1)

In [None]:
matches_df["LosingTeamTactics"].value_counts().plot.pie(autopct='%1.1f%%')
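
Since 4-2-3-1 is also the most commonly fielded formation, it will show up often among winners and losers alike. A fairer comparison divides wins by the number of times each formation was used. A minimal sketch of that win rate, with a made-up frame whose column names mirror `matches_df` (the rows are illustrative only):

```python
import pandas as pd

# Hypothetical matches; values are illustrative only
matches = pd.DataFrame({
    "HomeTeamTactics": ["4-2-3-1", "4-3-3", "4-2-3-1", "3-5-2"],
    "AwayTeamTactics": ["4-3-3", "4-2-3-1", "4-4-2", "4-2-3-1"],
    "WinningTeamTactics": ["4-2-3-1", "4-3-3", None, "4-2-3-1"],
})

# How often each formation was fielded (home and away combined)
uses = pd.concat([matches["HomeTeamTactics"], matches["AwayTeamTactics"]]).value_counts()

# How often it was on the winning side; formations that never won get 0
wins = matches["WinningTeamTactics"].value_counts().reindex(uses.index, fill_value=0)

# Share of appearances that ended in a win
win_rate = (wins / uses).sort_values(ascending=False)
print(win_rate)
```

Draws count as an appearance without a win for both sides, so a formation that draws often will score lower here.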

### How does 4-2-3-1 tactic fares at big stage?

In [None]:
matches_df.query("WinningTeamTactics == '4-2-3-1' and Stage != 'First stage'")

![](https://img.memecdn.com/leaked-portugal-world-cup-tactics_o_3411817.jpg)

## Some analysis on ball possession...

![](https://sportzwiki.com/wp-content/uploads/2018/03/sprinkle.jpg)

In [None]:
def set_winning_team_ball_possession(match):
    if match["Winner"] == match["HomeTeamName"]:
        return match["BallPossessionHome"]
    elif match["Winner"] == match["AwayTeamName"]:
        return match["BallPossessionAway"]
    else:
        return None
    
def set_losing_team_ball_possession(match):
    if match["Winner"] == match["HomeTeamName"]:
        return match["BallPossessionAway"]
    elif match["Winner"] == match["AwayTeamName"]:
        return match["BallPossessionHome"]
    else:
        return None

In [None]:
matches_df["BallPossessionWinner"] = matches_df.apply(set_winning_team_ball_possession, axis=1)
matches_df["BallPossessionLoser"] = matches_df.apply(set_losing_team_ball_possession, axis=1)

In [None]:
matches_df.query("BallPossessionWinner < BallPossessionLoser").describe(include='object')

### In 4/6 matches, France had lesser ball possession than the opponent..
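
A count like this can be computed for any team without reading it off the `describe` output. A rough sketch, assuming the same column names as `matches_df`; the teams and possession figures are placeholders, and `possession_deficit_count` is a hypothetical helper that looks at every match the team played, not only the ones it won:

```python
import pandas as pd

# Hypothetical matches; team names and values are placeholders
matches = pd.DataFrame({
    "HomeTeamName": ["Team X", "Team B", "Team X"],
    "AwayTeamName": ["Team A", "Team X", "Team C"],
    "BallPossessionHome": [51, 38, 44],
    "BallPossessionAway": [49, 62, 56],
})


def possession_deficit_count(df, team):
    """Return (matches with less possession than the opponent, matches played)."""
    home = df[df["HomeTeamName"] == team]
    away = df[df["AwayTeamName"] == team]
    fewer_at_home = (home["BallPossessionHome"] < home["BallPossessionAway"]).sum()
    fewer_away = (away["BallPossessionAway"] < away["BallPossessionHome"]).sum()
    return int(fewer_at_home + fewer_away), len(home) + len(away)


print(possession_deficit_count(matches, "Team X"))
```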

In [None]:
def get_avg_team_ball_possession(team_name):
    home_possessions = matches_df.query("HomeTeamName == '{0}'".format(team_name))["BallPossessionHome"]
    away_possessions = matches_df.query("AwayTeamName == '{0}'".format(team_name))["BallPossessionAway"]
    return (sum(home_possessions) + sum(away_possessions))/ (len(home_possessions) + len(away_possessions))

In [None]:
teams_df["AverageBallPossession"] = teams_df["TeamName"].apply(get_avg_team_ball_possession)

In [None]:
teams_df[["TeamName", "AverageBallPossession", "BestStage"]].sort_values(by="AverageBallPossession", ascending=False)

![](http://ift.tt/UeVgbp)