# Arsène Wenger's years in charge of Arsenal FC


## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
<li><a href="#statistical">Statistical Tests</a></li>
</ul>

<a id='intro'></a>
## Introduction

> This is an analysis of the period when Arsène Wenger managed Arsenal Football Club. The dataset comes from Kaggle and contains Premier League match results from the 1993-94 season (the Premier League's second season) to the 2017-18 season. Wenger was appointed early in the 1996-97 season and left at the end of the 2017-18 season.

<a id='wrangling'></a>
## Data Wrangling


### Gather

In [36]:
import zipfile
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
# Set the random seed so that results are reproducible
random.seed(42)

In [2]:
# Extract contents from Premier League zip file 

with zipfile.ZipFile('epl-results-19932018.zip', 'r') as myzip:
    myzip.extractall()

In [4]:
# Read CSV

df = pd.read_csv('EPL_Set.csv')

### Assess

In [5]:
df.head()

Unnamed: 0,Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Season
0,E0,14/08/93,Arsenal,Coventry,0,3,A,,,,1993-94
1,E0,14/08/93,Aston Villa,QPR,4,1,H,,,,1993-94
2,E0,14/08/93,Chelsea,Blackburn,1,2,A,,,,1993-94
3,E0,14/08/93,Liverpool,Sheffield Weds,2,0,H,,,,1993-94
4,E0,14/08/93,Man City,Leeds,1,1,D,,,,1993-94


#### Columns Used

- **Div:** The division the match was played in.
- **Date:** The date the match was played.
- **HomeTeam:** The name of the home team.
- **AwayTeam:** The name of the away team.
- **FTHG:** The total number of goals scored by the home team during the match at full time.
- **FTAG:** The total number of goals scored by the away team during the match at full time.
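- **FTR:** The full time result, denoted as 'H' for home team win, 'A' for away team win, or 'D' for draw.
- **HTHG:** The number of goals scored by the home team at half time.
- **HTAG:** The number of goals scored by the away team at half time.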
- **Season:** The season in which the match was played.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9664 entries, 0 to 9663
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Div       9664 non-null   object 
 1   Date      9664 non-null   object 
 2   HomeTeam  9664 non-null   object 
 3   AwayTeam  9664 non-null   object 
 4   FTHG      9664 non-null   int64  
 5   FTAG      9664 non-null   int64  
 6   FTR       9664 non-null   object 
 7   HTHG      8740 non-null   float64
 8   HTAG      8740 non-null   float64
 9   HTR       8740 non-null   object 
 10  Season    9664 non-null   object 
dtypes: float64(2), int64(2), object(7)
memory usage: 830.6+ KB


### Clean

#### Selecting the seasons from 1996-97 to 2017-18
> These are the seasons in which Arsène Wenger was in charge of Arsenal FC.

In [9]:
# Extract only seasons between season 1996-97 and 2017-18.

seasons = ['1996-97','1997-98', '1998-99', '1999-00', '2000-01', '2001-02', '2002-03', '2003-04', '2004-05', '2005-06', '2006-07', '2007-08', '2008-09',
           '2009-10', '2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16', '2016-17', '2017-18']

df_aw = df.query('Season in @seasons')
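The season labels follow a regular pattern, so the list could also be generated instead of typed by hand. The following is a minimal sketch, assuming every label has the form `YYYY-YY`; `season_labels` is a hypothetical helper and not part of the original notebook.

```python
def season_labels(first_start_year, last_start_year):
    """Build labels such as '1996-97' for every season that starts
    between first_start_year and last_start_year (inclusive)."""
    return [
        f"{year}-{(year + 1) % 100:02d}"
        for year in range(first_start_year, last_start_year + 1)
    ]


# Seasons from 1996-97 to 2017-18
generated_seasons = season_labels(1996, 2017)
print(len(generated_seasons))
print(generated_seasons[0], generated_seasons[3], generated_seasons[-1])
```

The modulo keeps the century rollover correct, so the season starting in 1999 is labelled `1999-00`. The resulting list could be passed to `df.query('Season in @seasons')` in the same way as the handwritten one.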

In [13]:
# Select games where Arsenal was involved.

df_aw = df_aw.query('HomeTeam == "Arsenal" or AwayTeam == "Arsenal"')

In [15]:
df_aw.head()

Unnamed: 0,Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Season
1304,E0,17/08/96,Arsenal,West Ham,2,0,H,2.0,0.0,H,1996-97
1314,E0,19/08/96,Liverpool,Arsenal,2,0,H,0.0,0.0,D,1996-97
1326,E0,24/08/96,Leicester,Arsenal,0,2,A,0.0,1.0,A,1996-97
1335,E0,4/9/1996,Arsenal,Chelsea,3,3,D,1.0,2.0,A,1996-97
1344,E0,7/9/1996,Aston Villa,Arsenal,2,2,D,1.0,0.0,H,1996-97


<a id='eda'></a>
## Exploratory Data Analysis

### Research Question 1 - How many goals did Arsenal score at home and away?
> **Goals scored at home** = 910, **goals scored away** = 668

In [48]:
# Goals playing home 
print(df_aw.query('HomeTeam == "Arsenal"').FTHG.sum())

# Goals playing away
print(df_aw.query('AwayTeam == "Arsenal"').FTAG.sum())

910
668


### Research Question 2 - What is the mean number of goals scored per match at home and away?
> **Mean goals per home match** = 2.18, **mean goals per away match** = 1.60

In [31]:
# Mean goals per home match
home_mean = df_aw.query('HomeTeam == "Arsenal"').FTHG.mean()
print(round(home_mean, 6))

2.177033


In [33]:
# Mean goals per away match
away_mean = df_aw.query('AwayTeam == "Arsenal"').FTAG.mean()
print(round(away_mean, 6))

1.598086
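The totals and means for questions 1 and 2 could also be computed in one pass by first expressing each match from Arsenal's point of view. The sketch below is only an illustration on five rows copied from the `df_aw.head()` output above; the column names `venue`, `goals_for`, and `goals_against` are assumptions introduced here and do not exist in the dataset.

```python
import numpy as np
import pandas as pd

# Five matches copied from the df_aw.head() output (illustrative sample only)
sample = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Liverpool", "Leicester", "Arsenal", "Aston Villa"],
    "AwayTeam": ["West Ham", "Arsenal", "Arsenal", "Chelsea", "Arsenal"],
    "FTHG": [2, 2, 0, 3, 2],
    "FTAG": [0, 0, 2, 3, 2],
})

# Hypothetical Arsenal-centric columns
is_home = sample["HomeTeam"] == "Arsenal"
sample["venue"] = np.where(is_home, "home", "away")
sample["goals_for"] = np.where(is_home, sample["FTHG"], sample["FTAG"])
sample["goals_against"] = np.where(is_home, sample["FTAG"], sample["FTHG"])

# Total and mean goals scored, split by venue
summary = sample.groupby("venue")["goals_for"].agg(["sum", "mean"])
print(summary)
```

Applied to the full `df_aw`, the same grouping should reproduce the home and away figures reported above, since it only rearranges the `FTHG` and `FTAG` sums.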


### Research Question 3 - What is the mean number of goals scored per match in the first and second half?
> **Mean goals per match in the first half:** 0.85 **Mean goals per match in the second half:** 1.04

In [44]:
# First- and second-half goals at home
home = df_aw.query('HomeTeam == "Arsenal"')
fh_goals_home = home.HTHG.sum()
sh_goals_home = home.FTHG.sum() - fh_goals_home

# First- and second-half goals away
away = df_aw.query('AwayTeam == "Arsenal"')
fh_goals_away = away.HTAG.sum()
sh_goals_away = away.FTAG.sum() - fh_goals_away

# Mean goals per match in the first half
print(round((fh_goals_home + fh_goals_away) / len(df_aw), 6))

# Mean goals per match in the second half
print(round((sh_goals_home + sh_goals_away) / len(df_aw), 6))

0.84689
1.04067
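The split between halves can also be computed per match, which makes it easy to inspect individual games. This is a minimal sketch on five rows copied from the `df_aw.head()` output; the variable names `full_time`, `first_half`, and `second_half` are introduced here for illustration. It assumes the half-time columns are populated for every selected match. `df.info()` shows missing half-time values in the full dataset, and `sum()` and `mean()` skip them silently, so that is worth checking before reusing the approach on other seasons.

```python
import pandas as pd

# Five matches copied from the df_aw.head() output (illustrative sample only)
sample = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Liverpool", "Leicester", "Arsenal", "Aston Villa"],
    "AwayTeam": ["West Ham", "Arsenal", "Arsenal", "Chelsea", "Arsenal"],
    "FTHG": [2, 2, 0, 3, 2],
    "FTAG": [0, 0, 2, 3, 2],
    "HTHG": [2.0, 0.0, 0.0, 1.0, 1.0],
    "HTAG": [0.0, 0.0, 1.0, 2.0, 0.0],
})

is_home = sample["HomeTeam"] == "Arsenal"

# Take the home column when Arsenal is at home, otherwise the away column
full_time = sample["FTHG"].where(is_home, sample["FTAG"])
first_half = sample["HTHG"].where(is_home, sample["HTAG"])
second_half = full_time - first_half

print(first_half.mean(), second_half.mean())
```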


<a id='conclusion123'></a>
### Conclusions for questions 1 to 3 

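> Arsenal scored more goals at home (910, about 2.18 per match) than away (668, about 1.60 per match). Goals were also more frequent in the second half (about 1.04 per match) than in the first half (about 0.85 per match).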

### Research Questions
- Distribution of goals per match
- Winning percentage against all teams 
- Winning percentage against the big four
- Winning percentage in derbies 
- Winning percentage in derbies at home and away
- Winning percentage in title seasons (97-98, 01-02, 03-04)
- Goals scored in title seasons vs. non-title seasons
- Goals conceded in title seasons vs. non-title seasons
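Several of the questions above depend on a winning percentage, which requires reading `FTR` from Arsenal's point of view: an `'H'` is a win only when Arsenal is the home team, and an `'A'` only when Arsenal is the away team. The sketch below shows one possible way to do this on five rows copied from the `df_aw.head()` output. The helpers `outcomes_for` and `win_percentage` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Five matches copied from the df_aw.head() output (illustrative sample only)
sample = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Liverpool", "Leicester", "Arsenal", "Aston Villa"],
    "AwayTeam": ["West Ham", "Arsenal", "Arsenal", "Chelsea", "Arsenal"],
    "FTR": ["H", "H", "A", "D", "D"],
})


def outcomes_for(matches, team):
    """Return 'W', 'D', or 'L' for each match from `team`'s point of view."""
    is_home = matches["HomeTeam"] == team
    won = (is_home & (matches["FTR"] == "H")) | (~is_home & (matches["FTR"] == "A"))
    drawn = matches["FTR"] == "D"
    outcome = pd.Series("L", index=matches.index)  # default to a loss
    outcome[drawn] = "D"
    outcome[won] = "W"
    return outcome


def win_percentage(matches, team):
    """Percentage of matches won by `team` (expects at least one match)."""
    outcome = outcomes_for(matches, team)
    return 100 * (outcome == "W").sum() / len(outcome)


print(win_percentage(sample, "Arsenal"))

# The same helper works on any subset, for example home matches only
home_only = sample[sample["HomeTeam"] == "Arsenal"]
print(win_percentage(home_only, "Arsenal"))
```

Filtering `df_aw` by opponent or by season before calling `win_percentage` would give the per-opponent and per-season figures listed above.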