## 2021: Week 28 - It's Coming Rome

'55 years of hurt, Never stopped me dreaming!'

It was another night of pain for England fans on Sunday evening as the team lost yet another penalty shootout, this time in the European Championship final. This seems to have been a common outcome in the tournaments England have taken part in over the years, but does the data agree?

The challenge this week is to analyse all of the penalty shootouts in the World Cup and European Championships (Euros) since 1976.

### Input
The data comes from Wikipedia (World Cup & Euros) and is split across two sheets.

### Requirements
- Input Data
- Determine what competition each penalty was taken in
- Clean any fields, correctly format the date the penalty was taken, & group the two German countries (eg, West Germany & Germany)
- Rank the countries on the following: 
    - Shootout win % (exclude teams who have never won a shootout)
    - Penalties scored %
- What is the most and least successful time to take a penalty? (What penalty number are you most likely to score or miss?)
- Output the Data

### Outputs
3 Outputs:
1. Win % Rankings (5 fields, 26 rows)
2. Scored % Rankings (5 fields, 34 rows)
3. Penalty Position Rankings (6 fields, 9 rows)

In [227]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input Data

In [228]:
data = pd.read_excel("./data/InternationalPenalties.xlsx", sheet_name=["WorldCup", "Euros"])

In [229]:
world_cup = data["WorldCup"].copy()
euros = data["Euros"].copy()

### Determine what competition each penalty was taken in

In [230]:
world_cup.columns

Index(['No.', 'Penalty Number ', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date'],
      dtype='object')

In [231]:
world_cup["Penalty Number "].value_counts()

1    30
2    30
3    30
4    30
5    24
6     2
Name: Penalty Number , dtype: int64

In [232]:
world_cup.head()

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08


In [233]:
euros.columns

Index(['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing team GK', 'Round', 'Date'],
      dtype='object')

In [234]:
euros["Penalty Number"].value_counts()

1    22
2    22
3    22
4    22
5    19
6     6
7     3
8     2
9     2
Name: Penalty Number, dtype: int64
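
The two sheets share a layout but differ in small header details: `'Penalty Number '` has a trailing space in the World Cup sheet, and the goalkeeper columns differ in capitalisation. The sketch below shows one possible way to tag each row with its competition and stack the sheets. It uses tiny stand-in frames rather than the real workbook, and the `Competition` column name and the `tag_competition` helper are assumptions, not part of the original notebook.

```python
import pandas as pd

def tag_competition(df: pd.DataFrame, competition: str) -> pd.DataFrame:
    """Hypothetical helper: normalise headers and add a Competition column."""
    out = df.copy()
    # Strip stray whitespace and unify capitalisation,
    # e.g. "Winning Team GK" and "Winning team GK" both become "Winning Team Gk"
    out.columns = out.columns.str.strip().str.title()
    out["Competition"] = competition
    return out

# Stand-in data mimicking the header differences seen above (values are placeholders)
wc_demo = pd.DataFrame({"Penalty Number ": [1, 2], "Winning Team GK": ["Keeper A", "Keeper A"]})
eu_demo = pd.DataFrame({"Penalty Number": [1], "Winning team GK": ["Keeper B"]})

penalties_demo = pd.concat(
    [tag_competition(wc_demo, "World Cup"), tag_competition(eu_demo, "Euros")],
    ignore_index=True,
)
print(penalties_demo)
```

Normalising the headers before concatenating keeps the two sheets from producing near-duplicate columns that are half empty.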

In [235]:
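# Split each entry (e.g. "Kaltz Penalty scored") into one column per token
winner_penalty = world_cup["Winning team Taker"].str.split(expand=True)

In [236]:
# Token 2 holds the outcome only when the taker's name is a single word
winner_penalty["Winner Penalty type"] = winner_penalty[2]

In [237]:
# Keep only the outcome column
winner_penalty = winner_penalty.drop([0, 1, 2, 3, 4, 5], axis=1)
winner_penalty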

Unnamed: 0,Winner Penalty type
0,scored
1,scored
2,missed
3,scored
4,scored
...,...
141,scored
142,missed
143,scored
144,scored


In [238]:
world_cup = world_cup.join(winner_penalty, how="left")
world_cup

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08,scored
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08,scored
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08,missed
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08,scored
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08,scored
...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,30,1,2018,Croatia,2–2,Russia,Subašić,Brozović Penalty scored,Penalty missed Smolov,Akinfeev,Quarter-finals,2021-07-07,scored
142,30,2,2018,Croatia,2–2,Russia,Subašić,Kovačić Penalty missed,Penalty scored Dzagoev,Akinfeev,Quarter-finals,2021-07-07,missed
143,30,3,2018,Croatia,2–2,Russia,Subašić,Modrić Penalty scored,Penalty missed Fernandes,Akinfeev,Quarter-finals,2021-07-07,scored
144,30,4,2018,Croatia,2–2,Russia,Subašić,Vida Penalty scored,Penalty scored Ignashevich,Akinfeev,Quarter-finals,2021-07-07,scored


In [239]:
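# Losing-side entries put the outcome before the name (e.g. "Penalty scored Giresse"),
# so token 1 is the outcome regardless of how long the name is
loser_penalty = world_cup["Losing team Taker"].str.split(expand=True)
loser_penalty["Loser Penalty type"] = loser_penalty[1]
loser_penalty = loser_penalty.drop([0, 1, 2, 3, 4], axis=1)
loser_penalty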

Unnamed: 0,Loser Penalty type
0,scored
1,scored
2,scored
3,missed
4,scored
...,...
141,missed
142,scored
143,missed
144,scored


In [240]:
world_cup = world_cup.join(loser_penalty, how="left")
world_cup

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type,Loser Penalty type
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08,scored,scored
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08,scored,scored
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08,missed,scored
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08,scored,missed
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08,scored,scored
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,30,1,2018,Croatia,2–2,Russia,Subašić,Brozović Penalty scored,Penalty missed Smolov,Akinfeev,Quarter-finals,2021-07-07,scored,missed
142,30,2,2018,Croatia,2–2,Russia,Subašić,Kovačić Penalty missed,Penalty scored Dzagoev,Akinfeev,Quarter-finals,2021-07-07,missed,scored
143,30,3,2018,Croatia,2–2,Russia,Subašić,Modrić Penalty scored,Penalty missed Fernandes,Akinfeev,Quarter-finals,2021-07-07,scored,missed
144,30,4,2018,Croatia,2–2,Russia,Subašić,Vida Penalty scored,Penalty scored Ignashevich,Akinfeev,Quarter-finals,2021-07-07,scored,scored
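
Splitting on whitespace and picking a fixed token position assumes every winning-side taker has a single-word name; a two-word name would push "scored"/"missed" to a different position. A hedged alternative is to pull the outcome out with a regular expression, which does not depend on name length or on whether the outcome comes before or after the name. The sample strings below are partly made up for illustration.

```python
import pandas as pd

takers = pd.Series([
    "Kaltz Penalty scored",       # single-word name, outcome last
    "First Last Penalty missed",  # hypothetical two-word name
    "Penalty scored Giresse",     # losing-side layout, outcome before the name
    None,                         # no penalty taken in this round
])

# expand=False returns a Series; rows without a match stay missing
outcome = takers.str.extract(r"\b(scored|missed)\b", expand=False)
print(outcome.tolist())
```

The same pattern could be applied to both the `Winning team Taker` and `Losing team Taker` columns, which would remove the need for the token-dropping step.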


### Clean any fields, correctly format the date the penalty was taken, & group the two German countries (eg, West Germany & Germany)

In [241]:
world_cup.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   No.                  146 non-null    int64         
 1   Penalty Number       146 non-null    int64         
 2   Event Year           146 non-null    object        
 3   Winner               146 non-null    object        
 4   Full Time Score      146 non-null    object        
 5   Loser                146 non-null    object        
 6   Winning Team GK      146 non-null    object        
 7   Winning team Taker   141 non-null    object        
 8   Losing team Taker    138 non-null    object        
 9   Losing Team GK       146 non-null    object        
 10  Round                146 non-null    object        
 11  Date                 146 non-null    datetime64[ns]
 12  Winner Penalty type  141 non-null    object        
 13  Loser Penalty type   138 non-null    object        

In [242]:
# Remove any thousands separators (e.g. "1,982") and store the year as an integer
world_cup["Event Year"] = world_cup["Event Year"].astype(str).str.replace(",", "", regex=False).astype(int)

In [243]:
world_cup["Winner"] = world_cup["Winner"].str.strip()
world_cup["Loser"] = world_cup["Loser"].str.strip()

In [244]:
# Treat West Germany and Germany as one team, on both sides of the result
world_cup[["Winner", "Loser"]] = world_cup[["Winner", "Loser"]].replace("West Germany", "Germany")

In [245]:
world_cup[world_cup["Winner"] == "Germany"]

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type,Loser Penalty type
0,1,1,1982,Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08,scored,scored
1,1,2,1982,Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08,scored,scored
2,1,3,1982,Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08,missed,scored
3,1,4,1982,Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08,scored,missed
4,1,5,1982,Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08,scored,scored
5,1,6,1982,Germany,3–3,France,Schumacher,Hrubesch Penalty scored,Penalty missed Bossis,Ettori,Semi-finals,2021-07-08,scored,missed
11,3,1,1986,Germany,0–0,Mexico,Schumacher,Allofs Penalty scored,Penalty scored Negrete,Larios,Quarter-finals,2021-06-21,scored,scored
12,3,2,1986,Germany,0–0,Mexico,Schumacher,Brehme Penalty scored,Penalty missed Quirarte,Larios,Quarter-finals,2021-06-21,scored,missed
13,3,3,1986,Germany,0–0,Mexico,Schumacher,Matthäus Penalty scored,Penalty missed Servín,Larios,Quarter-finals,2021-06-21,scored,missed
14,3,4,1986,Germany,0–0,Mexico,Schumacher,Littbarski Penalty scored,,Larios,Quarter-finals,2021-06-21,scored,
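
The `Date` values above all carry the year 2021, presumably because the source only lists day and month, so the real year has to come from `Event Year`. One possible way to rebuild the date is sketched below on stand-in rows copied from the preview. It assumes `Event Year` has already been converted to an integer and `Date` is a datetime column.

```python
import pandas as pd

demo = pd.DataFrame({
    "Event Year": [1982, 1986],
    "Date": pd.to_datetime(["2021-07-08", "2021-06-21"]),
})

# Rebuild each date from the tournament year plus the month and day already parsed;
# pd.to_datetime can assemble dates from a frame with year/month/day columns
demo["Date"] = pd.to_datetime(
    pd.DataFrame({
        "year": demo["Event Year"],
        "month": demo["Date"].dt.month,
        "day": demo["Date"].dt.day,
    })
)
print(demo)
```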


### Rank the countries on the following: 
- Shootout win % (exclude teams who have never won a shootout)
- Penalties scored %
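
Each shootout spans several rows, so shootout win % needs one row per shootout before counting wins and losses. The following is a rough sketch, assuming `No.` identifies a shootout within a sheet (as the previews suggest). The rows are made-up stand-ins and the output column names are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({
    "No.":    [1, 1, 2, 2, 3, 3],
    "Winner": ["A", "A", "B", "B", "A", "A"],
    "Loser":  ["B", "B", "C", "C", "C", "C"],
})

# Collapse to one row per shootout
shootouts = demo.drop_duplicates(subset="No.")

wins = shootouts["Winner"].value_counts()
losses = shootouts["Loser"].value_counts()

# Align wins and losses on team name; a team missing from one side gets 0
win_pct = pd.DataFrame({"Shootouts Won": wins, "Shootouts Lost": losses}).fillna(0)
win_pct = win_pct[win_pct["Shootouts Won"] > 0]  # exclude teams that never won
win_pct["Total Shootouts"] = win_pct["Shootouts Won"] + win_pct["Shootouts Lost"]
win_pct["Shootout Win %"] = (100 * win_pct["Shootouts Won"] / win_pct["Total Shootouts"]).round(1)
win_pct["Win % Rank"] = win_pct["Shootout Win %"].rank(ascending=False, method="dense").astype(int)
print(win_pct.sort_values("Win % Rank"))
```

If the World Cup and Euros sheets are combined first, `No.` restarts in each sheet, so the de-duplication would need a competition identifier alongside it.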

In [246]:
# Penalties scored %: encode outcomes as 1 (scored) / 0 (missed) so that sum() counts goals
world_cup["Winner Penalty type"] = world_cup["Winner Penalty type"].map({"scored": 1, "missed": 0})
world_cup["Loser Penalty type"] = world_cup["Loser Penalty type"].map({"scored": 1, "missed": 0})

In [247]:
winner_scored = world_cup.groupby(["Winner"])["Winner Penalty type"].sum()
winner_scored

Winner
Argentina              15.0
Belgium                 4.0
Brazil                  9.0
Bulgaria                3.0
Costa Rica              5.0
Croatia                 7.0
England                 4.0
France                  8.0
Germany                17.0
Italy                   3.0
Netherlands             3.0
Paraguay                5.0
Portugal                3.0
Republic of Ireland     5.0
Russia                  4.0
South Korea             0.0
Spain                   3.0
Sweden                  3.0
Ukraine                 3.0
Uruguay                 4.0
Name: Winner Penalty type, dtype: float64

In [248]:
loser_scored = world_cup.groupby(["Loser"])["Loser Penalty type"].sum()
loser_scored

Loser
Argentina               2.0
Brazil                  3.0
Chile                   2.0
Colombia                3.0
Costa Rica              3.0
Denmark                 2.0
England                 7.0
France                  7.0
Ghana                   2.0
Greece                  3.0
Italy                   8.0
Japan                   3.0
Mexico                  2.0
Netherlands             4.0
Republic of Ireland     2.0
Romania                 8.0
Russia                  3.0
Spain                  10.0
Switzerland             0.0
Yugoslavia              2.0
Name: Loser Penalty type, dtype: float64

In [254]:
total_scored = pd.concat([winner_scored, loser_scored], axis=1)
total_scored = total_scored.fillna(0)
total_scored["Total scored"] = total_scored["Winner Penalty type"] + total_scored["Loser Penalty type"]
total_scored

Unnamed: 0,Winner Penalty type,Loser Penalty type,Total scored
Argentina,15.0,2.0,17.0
Belgium,4.0,0.0,4.0
Brazil,9.0,3.0,12.0
Bulgaria,3.0,0.0,3.0
Costa Rica,5.0,3.0,8.0
Croatia,7.0,0.0,7.0
England,4.0,7.0,11.0
France,8.0,7.0,15.0
Germany,17.0,0.0,17.0
Italy,3.0,8.0,11.0


In [255]:
winner_shoot_count = world_cup.groupby(["Winner"])["Winner Penalty type"].count()
winner_shoot_count

Winner
Argentina              18
Belgium                 4
Brazil                 12
Bulgaria                4
Costa Rica              5
Croatia                10
England                 5
France                 10
Germany                18
Italy                   3
Netherlands             3
Paraguay                5
Portugal                5
Republic of Ireland     5
Russia                  4
South Korea             0
Spain                   5
Sweden                  4
Ukraine                 4
Uruguay                 4
Name: Winner Penalty type, dtype: int64

In [256]:
loser_shoot_count = world_cup.groupby(["Loser"])["Loser Penalty type"].count()
loser_shoot_count

Loser
Argentina               4
Brazil                  5
Chile                   5
Colombia                5
Costa Rica              5
Denmark                 5
England                14
France                 10
Ghana                   4
Greece                  4
Italy                  15
Japan                   4
Mexico                  7
Netherlands             8
Republic of Ireland     5
Romania                11
Russia                  5
Spain                  14
Switzerland             3
Yugoslavia              5
Name: Loser Penalty type, dtype: int64

In [259]:
total_shoot = pd.concat([winner_shoot_count, loser_shoot_count], axis=1).fillna(0)
total_shoot["Total count"] = total_shoot["Winner Penalty type"] + total_shoot["Loser Penalty type"]
total_shoot

Unnamed: 0,Winner Penalty type,Loser Penalty type,Total count
Argentina,18.0,4.0,22.0
Belgium,4.0,0.0,4.0
Brazil,12.0,5.0,17.0
Bulgaria,4.0,0.0,4.0
Costa Rica,5.0,5.0,10.0
Croatia,10.0,0.0,10.0
England,5.0,14.0,19.0
France,10.0,10.0,20.0
Germany,18.0,0.0,18.0
Italy,3.0,15.0,18.0
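
With totals for penalties scored and penalties taken per country, the scored percentage and its rank follow directly. The minimal sketch below uses the first three rows of the two tables above (World Cup only). The output column names are illustrative assumptions.

```python
import pandas as pd

summary = pd.DataFrame(
    {"Total scored": [17.0, 4.0, 12.0], "Total count": [22.0, 4.0, 17.0]},
    index=["Argentina", "Belgium", "Brazil"],
)

# Share of penalties converted, as a percentage rounded to one decimal place
summary["% Total Penalties Scored"] = (
    100 * summary["Total scored"] / summary["Total count"]
).round(1)

# Dense ranking: the highest percentage gets rank 1, ties share a rank
summary["Penalties Scored % Rank"] = (
    summary["% Total Penalties Scored"].rank(ascending=False, method="dense").astype(int)
)
print(summary.sort_values("Penalties Scored % Rank"))
```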
