## 2021: Week 28 - It's Coming Rome

'55 years of hurt, Never stopped me dreaming!'

It was another night of pain for England fans on Sunday evening when they lost yet another penalty shootout, this time in the European Football Championship final. This seems to have been a common outcome in the tournaments England have taken part in over the years, but does the data agree?

The challenge this week is to analyse all of the penalty shootouts in the World Cup and European Championships (Euro's) since 1976.

### Input
The data is from Wikipedia (World Cup & Euro's) and comes as two sheets, one per competition.

### Requirements
- Input Data
- Determine what competition each penalty was taken in
- Clean any fields, correctly format the date the penalty was taken, & group the two German countries (eg, West Germany & Germany)
- Rank the countries on the following: 
    - Shootout win % (exclude teams who have never won a shootout)
    - Penalties scored %
- What is the most and least successful time to take a penalty? (What penalty number are you most likely to score or miss?)
- Output the Data

### Outputs
3 Outputs:
1. Win % Rankings (5 fields, 26 rows)
2. Scored % Rankings (5 fields, 34 rows)
3. Penalty Position Rankings (6 fields, 9 rows)

In [240]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input Data

In [241]:
data = pd.read_excel("./data/InternationalPenalties.xlsx", sheet_name=["WorldCup", "Euros"])

In [242]:
world_cup = data["WorldCup"].copy()
euros = data["Euros"].copy()

### Determine what competition each penalty was taken in

In [243]:
world_cup.columns

Index(['No.', 'Penalty Number ', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date'],
      dtype='object')

In [244]:
world_cup["Penalty Number "].value_counts()

1    30
2    30
3    30
4    30
5    24
6     2
Name: Penalty Number , dtype: int64

In [245]:
world_cup.head()

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08


In [246]:
euros.columns

Index(['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing team GK', 'Round', 'Date'],
      dtype='object')

In [247]:
euros["Penalty Number"].value_counts()

1    22
2    22
3    22
4    22
5    19
6     6
7     3
8     2
9     2
Name: Penalty Number, dtype: int64
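
The requirement in this section's heading can be met by adding a column that records which sheet each row came from. The block below is a minimal sketch on toy frames, not the notebook's own method: the `Competition` column name and the display names in `competition_names` are assumptions, and only the trailing-space difference in `Penalty Number ` is handled. The real sheets also differ in capitalisation (`Winning Team GK` vs `Winning team GK`), which would still need renaming.

```python
import pandas as pd

# Toy stand-ins for the two sheets; the real frames come from pd.read_excel.
sheets = {
    "WorldCup": pd.DataFrame(
        {"No.": [1, 1], "Penalty Number ": [1, 2], "Winner": ["West Germany", "West Germany"]}
    ),
    "Euros": pd.DataFrame(
        {"No.": [1], "Penalty Number": [1], "Winner": [" Czechoslovakia"]}
    ),
}

# Hypothetical display names for the new column.
competition_names = {"WorldCup": "World Cup", "Euros": "Euros"}

tagged = []
for sheet_name, frame in sheets.items():
    frame = frame.copy()
    # Strip stray whitespace so "Penalty Number " lines up with "Penalty Number".
    frame.columns = frame.columns.str.strip()
    frame["Competition"] = competition_names[sheet_name]
    tagged.append(frame)

# ignore_index=True gives the stacked frame a fresh 0..n-1 index.
penalties = pd.concat(tagged, ignore_index=True)
print(penalties)
```

Tagging before concatenating also matters later on, because `No.` starts again at 1 in each sheet.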

In [248]:
winner_penalty = world_cup["Winning team Taker"].str.split().apply(pd.Series)

In [249]:
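# Token 2 is the outcome when the taker's name is one word ("Kaltz Penalty scored");
# multi-word names shift the tokens, so those rows pick up the wrong value.
winner_penalty["Winner Penalty type"] = winner_penalty[2]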

In [250]:
winner_penalty = winner_penalty.drop([0, 1, 2, 3, 4, 5], axis=1)
winner_penalty

Unnamed: 0,Winner Penalty type
0,scored
1,scored
2,missed
3,scored
4,scored
...,...
141,scored
142,missed
143,scored
144,scored


In [251]:
world_cup = world_cup.join(winner_penalty, how="left")
world_cup

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08,scored
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08,scored
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08,missed
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08,scored
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08,scored
...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,30,1,2018,Croatia,2–2,Russia,Subašić,Brozović Penalty scored,Penalty missed Smolov,Akinfeev,Quarter-finals,2021-07-07,scored
142,30,2,2018,Croatia,2–2,Russia,Subašić,Kovačić Penalty missed,Penalty scored Dzagoev,Akinfeev,Quarter-finals,2021-07-07,missed
143,30,3,2018,Croatia,2–2,Russia,Subašić,Modrić Penalty scored,Penalty missed Fernandes,Akinfeev,Quarter-finals,2021-07-07,scored
144,30,4,2018,Croatia,2–2,Russia,Subašić,Vida Penalty scored,Penalty scored Ignashevich,Akinfeev,Quarter-finals,2021-07-07,scored


In [252]:
# Loser strings start with "Penalty <outcome>", so token 1 is the outcome
loser_penalty = world_cup["Losing team Taker"].str.split().apply(pd.Series)
loser_penalty["Loser Penalty type"] = loser_penalty[1]
loser_penalty = loser_penalty.drop([0, 1, 2, 3, 4], axis=1)
loser_penalty

Unnamed: 0,Loser Penalty type
0,scored
1,scored
2,scored
3,missed
4,scored
...,...
141,missed
142,scored
143,missed
144,scored


In [253]:
world_cup = world_cup.join(loser_penalty, how="left")
world_cup

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type,Loser Penalty type
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08,scored,scored
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08,scored,scored
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08,missed,scored
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08,scored,missed
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08,scored,scored
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
141,30,1,2018,Croatia,2–2,Russia,Subašić,Brozović Penalty scored,Penalty missed Smolov,Akinfeev,Quarter-finals,2021-07-07,scored,missed
142,30,2,2018,Croatia,2–2,Russia,Subašić,Kovačić Penalty missed,Penalty scored Dzagoev,Akinfeev,Quarter-finals,2021-07-07,missed,scored
143,30,3,2018,Croatia,2–2,Russia,Subašić,Modrić Penalty scored,Penalty missed Fernandes,Akinfeev,Quarter-finals,2021-07-07,scored,missed
144,30,4,2018,Croatia,2–2,Russia,Subašić,Vida Penalty scored,Penalty scored Ignashevich,Akinfeev,Quarter-finals,2021-07-07,scored,scored
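
Picking a fixed token position works for one-word names but breaks when a taker's name contains spaces. A possible alternative, sketched here on a few sample strings, is to capture the word that follows "Penalty" with a regular expression. The third string uses a made-up two-word name, and the `None` stands for a kick that was never taken.

```python
import pandas as pd

takers = pd.Series(
    [
        "Kaltz Penalty scored",        # winner layout, one-word name
        "Penalty missed R. Baggio",    # loser layout, name with a space
        "First Second Penalty scored", # hypothetical two-word name
        None,                          # kick not taken
    ]
)

# Capture the outcome wherever "Penalty <outcome>" sits in the string.
outcome = takers.str.extract(r"Penalty (scored|missed)", expand=False)
print(outcome)
```

The same call would serve both the `Winning team Taker` and `Losing team Taker` columns, and missing takers stay missing rather than needing a manual patch.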


### Clean any fields, correctly format the date the penalty was taken, & group the two German countries (eg, West Germany & Germany)

In [254]:
world_cup.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   No.                  146 non-null    int64         
 1   Penalty Number       146 non-null    int64         
 2   Event Year           146 non-null    object        
 3   Winner               146 non-null    object        
 4   Full Time Score      146 non-null    object        
 5   Loser                146 non-null    object        
 6   Winning Team GK      146 non-null    object        
 7   Winning team Taker   141 non-null    object        
 8   Losing team Taker    138 non-null    object        
 9   Losing Team GK       146 non-null    object        
 10  Round                146 non-null    object        
 11  Date                 146 non-null    datetime64[ns]
 12  Winner Penalty type  141 non-null    object        
 13  Loser Penalty type   138 non-null  

In [255]:
world_cup["Event Year"] = pd.to_datetime(world_cup["Event Year"].str.replace(",", ""))
world_cup["Event Year"] = world_cup["Event Year"].map(lambda x: x.year)

In [256]:
world_cup["Winner"] = world_cup["Winner"].str.strip()
world_cup["Loser"] = world_cup["Loser"].str.strip()

In [257]:
world_cup.loc[world_cup["Winner"] == "West Germany", "Winner"] = "Germany"

In [258]:
world_cup[world_cup["Winner"] == "Germany"].shape

(19, 14)
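
The cell above only renames West Germany in the `Winner` column. A slightly more defensive sketch, assuming the same two team columns, strips whitespace and applies the rename to both sides so neither can be missed. The rows below are illustrative.

```python
import pandas as pd

# Illustrative rows with stray spaces and both spellings of Germany.
shootouts = pd.DataFrame(
    {
        "Winner": [" West Germany", "Czechoslovakia "],
        "Loser": ["France", "West Germany"],
    }
)

for col in ["Winner", "Loser"]:
    # Strip first so " West Germany" matches the key exactly.
    shootouts[col] = shootouts[col].str.strip().replace({"West Germany": "Germany"})

print(shootouts)
```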

In [259]:
euros["Winner"].value_counts()

 Spain             20
 Italy             19
 Germany           15
 Czechoslovakia    14
 Portugal          12
 Czech Republic     6
 Netherlands        6
 Denmark            5
 France             5
 Poland             5
  Switzerland       5
 England            4
 Turkey             4
Name: Winner, dtype: int64

In [260]:
euros["Winner"] = euros["Winner"].str.strip()
euros["Winner"].value_counts()

Spain             20
Italy             19
Germany           15
Czechoslovakia    14
Portugal          12
Czech Republic     6
Netherlands        6
Denmark            5
France             5
Poland             5
Switzerland        5
England            4
Turkey             4
Name: Winner, dtype: int64

In [261]:
euros["Loser"] = euros["Loser"].str.strip()
euros.loc[euros["Loser"] == "West Germany", "Loser"] = "Germany"

In [262]:
euros.shape

(120, 12)

In [263]:
euros.head()

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning team GK,Winning team Taker,Losing team Taker,Losing team GK,Round,Date
0,1,1,1976,Czechoslovakia,2–2,Germany,Viktor,Masný Penalty scored,Penalty scored Bonhof,Maier,Final,2021-06-20
1,1,2,1976,Czechoslovakia,2–2,Germany,Viktor,Nehoda Penalty scored,Penalty scored Flohe,Maier,Final,2021-06-20
2,1,3,1976,Czechoslovakia,2–2,Germany,Viktor,Ondruš Penalty scored,Penalty scored Bongartz,Maier,Final,2021-06-20
3,1,4,1976,Czechoslovakia,2–2,Germany,Viktor,Jurkemik Penalty scored,Penalty missed Hoeneß,Maier,Final,2021-06-20
4,1,5,1976,Czechoslovakia,2–2,Germany,Viktor,Panenka Penalty scored,,Maier,Final,2021-06-20
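
The `Date` column still carries the year 2021 for every match (for example 2021-06-20 for the 1976 final), which suggests only the day and month were parsed from the source. One way to meet the "correctly format the date" requirement, sketched under that assumption, is to rebuild the date from `Event Year` plus the month and day of `Date`.

```python
import pandas as pd

# Two rows taken from the head() outputs above.
penalties = pd.DataFrame(
    {
        "Event Year": [1982, 1976],
        "Date": pd.to_datetime(["2021-07-08", "2021-06-20"]),
    }
)

# pd.to_datetime assembles dates from a frame with year/month/day columns.
penalties["Date"] = pd.to_datetime(
    pd.DataFrame(
        {
            "year": penalties["Event Year"],
            "month": penalties["Date"].dt.month,
            "day": penalties["Date"].dt.day,
        }
    )
)
print(penalties)
```

This relies on `Event Year` already being an integer, as it is after the cleaning cell for the World Cup sheet.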


### Null value replacement

In [264]:
world_cup.loc[world_cup["Winner"] == "South Korea", "Winner Penalty type"] = "scored"

In [265]:
world_cup.loc[19, "Winner Penalty type"] = "scored"

In [266]:
world_cup.loc[[45, 48, 94, 95, 107, 117], "Winner Penalty type"] = "scored"
world_cup.loc[105, "Winner Penalty type"] = "missed"

In [267]:
world_cup[world_cup["Winner Penalty type"].isna()]

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type,Loser Penalty type
34,7,5,1990,Argentina,1–1,Italy,Goycochea,,Penalty missed Serena,Zenga,Semi-finals,2021-07-03,,missed
39,8,5,1990,Germany,1–1,England,Illgner,,Penalty missed Waddle,Shilton,Semi-finals,2021-07-04,,missed
54,11,5,1994,Brazil,0–0,Italy,Taffarel,,Penalty missed R. Baggio,Pagliuca,Final,2021-07-17,,missed
121,25,5,2014,Netherlands,0–0,Costa Rica,Krul,,Penalty missed Umaña,Navas,Quarter-finals,2021-07-05,,missed
130,27,5,2018,Russia,1–1,Spain,Akinfeev,,Penalty missed Aspas,De Gea,Second round,2021-07-01,,missed


### Rank the countries on the following: 
- Shootout win % (exclude teams who have never won a shootout)
- Penalties scored %

In [268]:
world_cup_shootout = world_cup.drop(["Winner Penalty type", "Loser Penalty type"], axis=1)
world_cup_shootout.shape

(146, 12)

In [269]:
world_cup_shootout.columns = ['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date']
euros.columns = ['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date']

In [270]:
total_shootout = pd.concat([world_cup_shootout, euros], axis=0)
total_shootout.shape

(266, 12)

In [271]:
# Penalties scored %
# world_cup["Winner Penalty type"] = world_cup["Winner Penalty type"].map({"scored": 1, "missed": 0})
# world_cup["Loser Penalty type"] = world_cup["Loser Penalty type"].map({"scored": 1, "missed": 0})

In [272]:
total_shootout = total_shootout.drop(["Full Time Score", "Winning Team GK", "Winning team Taker",
                "Losing team Taker", "Losing Team GK", "Round", "Date"], axis=1)
total_shootout

Unnamed: 0,No.,Penalty Number,Winner,Loser
0,1,1,Germany,France
1,1,2,Germany,France
2,1,3,Germany,France
3,1,4,Germany,France
4,1,5,Germany,France
...,...,...,...,...
115,22,1,Italy,England
116,22,2,Italy,England
117,22,3,Italy,England
118,22,4,Italy,England


In [290]:
total_shootout[(total_shootout["Winner"] == "Germany") | (total_shootout["Loser"] == "Germany")]

Unnamed: 0,No.,Penalty Number,Winner,Loser
0,1,1,Germany,France
1,1,2,Germany,France
2,1,3,Germany,France
3,1,4,Germany,France
4,1,5,Germany,France
5,1,6,Germany,France
11,3,1,Germany,Mexico
12,3,2,Germany,Mexico
13,3,3,Germany,Mexico
14,3,4,Germany,Mexico


In [283]:
winner = total_shootout.drop_duplicates(subset=["No.", "Winner"])[["No.", "Winner"]]
loser = total_shootout.drop_duplicates(subset=["No.", "Loser"])[["No.", "Loser"]]

In [284]:
winner = winner.groupby(["Winner"])["No."].nunique()
loser = loser.groupby(["Loser"])["No."].nunique()

In [285]:
winner

Winner
Argentina              4
Belgium                1
Brazil                 3
Bulgaria               1
Costa Rica             1
Croatia                2
Czech Republic         1
Czechoslovakia         2
Denmark                1
England                2
France                 3
Germany                4
Italy                  5
Netherlands            2
Paraguay               1
Poland                 1
Portugal               3
Republic of Ireland    1
Russia                 1
South Korea            1
Spain                  4
Sweden                 1
Switzerland            1
Turkey                 1
Ukraine                1
Uruguay                1
Name: No., dtype: int64

In [286]:
loser

Loser
Argentina              1
Brazil                 1
Chile                  1
Colombia               1
Costa Rica             1
Croatia                1
Denmark                2
England                6
France                 4
Germany                1
Ghana                  1
Greece                 1
Italy                  5
Japan                  1
Mexico                 2
Netherlands            5
Poland                 1
Portugal               1
Republic of Ireland    1
Romania                2
Russia                 1
Spain                  5
Sweden                 1
Switzerland            3
Yugoslavia             1
Name: No., dtype: int64

In [287]:
win_pct = pd.concat([winner, loser], axis=1).fillna(0)
win_pct.columns = ["Shootouts", "lose_shootouts"]
win_pct["Total Shootouts"] = win_pct["Shootouts"] + win_pct["lose_shootouts"]
win_pct = win_pct.reset_index()
win_pct["Shootout Win %"] = (win_pct["Shootouts"] / win_pct["Total Shootouts"] * 100).round(0)

In [288]:
# Sort by win %, then drop teams that have never won a shootout (win % of 0)
win_pct = win_pct.sort_values(by="Shootout Win %", ascending=False)
no_shootout_idx = win_pct[win_pct["Shootout Win %"] == 0].index
win_pct = win_pct.drop(no_shootout_idx, axis=0).rename(columns={"index": "Team"})
win_pct = win_pct.drop("lose_shootouts", axis=1)

In [289]:
win_pct["Win % Rank"] = win_pct["Shootout Win %"].rank(method="dense", ascending=False).astype(int)
win_pct = win_pct.reset_index(drop=True)
win_pct

Unnamed: 0,Team,Shootouts,Total Shootouts,Shootout Win %,Win % Rank
0,Bulgaria,1.0,1.0,100.0,1
1,Uruguay,1.0,1.0,100.0,1
2,Czech Republic,1.0,1.0,100.0,1
3,Czechoslovakia,2.0,2.0,100.0,1
4,Ukraine,1.0,1.0,100.0,1
5,Turkey,1.0,1.0,100.0,1
6,South Korea,1.0,1.0,100.0,1
7,Paraguay,1.0,1.0,100.0,1
8,Belgium,1.0,1.0,100.0,1
9,Argentina,4.0,5.0,80.0,2
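
One thing to watch in the counts above: `No.` starts at 1 in both sheets, so counting unique `No.` values per team after concatenation can merge a World Cup shootout with a Euros shootout that shares the same number. The sketch below uses made-up teams and assumes a `Competition` column has been added beforehand, so that each shootout is identified by competition and number together.

```python
import pandas as pd

# Made-up rows, one per penalty; teams A, B and C are placeholders.
penalties = pd.DataFrame(
    {
        "Competition": ["World Cup"] * 3 + ["Euros"] * 3,
        "No.": [1, 1, 2, 1, 1, 2],
        "Winner": ["A", "A", "B", "A", "A", "C"],
        "Loser": ["B", "B", "C", "C", "C", "A"],
    }
)

# One row per shootout; Competition keeps World Cup No. 1 apart from Euros No. 1.
shootouts = penalties.drop_duplicates(subset=["Competition", "No."])

wins = shootouts["Winner"].value_counts()
losses = shootouts["Loser"].value_counts()

win_pct = pd.concat([wins, losses], axis=1, keys=["Shootouts", "Lost"]).fillna(0).astype(int)
win_pct["Total Shootouts"] = win_pct["Shootouts"] + win_pct["Lost"]

# Exclude teams that have never won a shootout, as the requirements ask.
win_pct = win_pct[win_pct["Shootouts"] > 0].drop(columns="Lost")
win_pct["Shootout Win %"] = (win_pct["Shootouts"] / win_pct["Total Shootouts"] * 100).round(0)
win_pct["Win % Rank"] = win_pct["Shootout Win %"].rank(method="dense", ascending=False).astype(int)
win_pct = win_pct.rename_axis("Team").reset_index().sort_values("Win % Rank")
print(win_pct)
```

In this toy data, team A wins shootout No. 1 in both competitions. Counting unique `No.` values alone would credit A with one win instead of two.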


In [61]:
winner_scored = world_cup.groupby(["Winner"])["Winner Penalty type"].sum()
winner_scored

Winner
Argentina              scoredscoredmissedmissedscoredscoredscoredscor...
Belgium                                      scoredscoredscoredscoredder
Brazil                 missedscoredscoredscoredscoredscoredscoredscor...
Bulgaria                                        missedscoredscoredscored
Costa Rica                                scoredscoredscoredscoredscored
Croatia                missedscoredscoredmissedscoredscoredmissedscor...
England                                   scoredscoredmissedscoredscored
France                 scoredscoredscoredmissedscoredscoredmissedscor...
Germany                scoredscoredmissedscoredscoredscoredscoredscor...
Italy                                   scoredscoredPenaltyPenaltyscored
Netherlands                                    Penaltyscoredscoredscored
Paraguay                                  scoredscoredscoredscoredscored
Portugal                                  scoredmissedmissedscoredscored
Republic of Ireland                       sc

In [22]:
loser_scored = world_cup.groupby(["Loser"])["Loser Penalty type"].sum()
loser_scored

Loser
Argentina               2.0
Brazil                  3.0
Chile                   2.0
Colombia                3.0
Costa Rica              3.0
Denmark                 2.0
England                 7.0
France                  7.0
Ghana                   2.0
Greece                  3.0
Italy                   8.0
Japan                   3.0
Mexico                  2.0
Netherlands             4.0
Republic of Ireland     2.0
Romania                 8.0
Russia                  3.0
Spain                  10.0
Switzerland             0.0
Yugoslavia              2.0
Name: Loser Penalty type, dtype: float64

In [23]:
total_scored = pd.concat([winner_scored, loser_scored], axis=1)
total_scored = total_scored.fillna(0)
total_scored["Total scored"] = total_scored["Winner Penalty type"] + total_scored["Loser Penalty type"]
total_scored

Unnamed: 0,Winner Penalty type,Loser Penalty type,Total scored
Argentina,15.0,2.0,17.0
Belgium,4.0,0.0,4.0
Brazil,9.0,3.0,12.0
Bulgaria,3.0,0.0,3.0
Costa Rica,5.0,3.0,8.0
Croatia,7.0,0.0,7.0
England,4.0,7.0,11.0
France,8.0,7.0,15.0
Germany,17.0,0.0,17.0
Italy,3.0,8.0,11.0


In [24]:
winner_shoot_count = world_cup.groupby(["Winner"])["Winner Penalty type"].count()
winner_shoot_count

Winner
Argentina              18
Belgium                 4
Brazil                 12
Bulgaria                4
Costa Rica              5
Croatia                10
England                 5
France                 10
Germany                18
Italy                   3
Netherlands             3
Paraguay                5
Portugal                5
Republic of Ireland     5
Russia                  4
South Korea             0
Spain                   5
Sweden                  4
Ukraine                 4
Uruguay                 4
Name: Winner Penalty type, dtype: int64

In [25]:
loser_shoot_count = world_cup.groupby(["Loser"])["Loser Penalty type"].count()
loser_shoot_count

Loser
Argentina               4
Brazil                  5
Chile                   5
Colombia                5
Costa Rica              5
Denmark                 5
England                14
France                 10
Ghana                   4
Greece                  4
Italy                  15
Japan                   4
Mexico                  7
Netherlands             8
Republic of Ireland     5
Romania                11
Russia                  5
Spain                  14
Switzerland             3
Yugoslavia              5
Name: Loser Penalty type, dtype: int64

In [26]:
total_shoot = pd.concat([winner_shoot_count, loser_shoot_count], axis=1).fillna(0)
total_shoot["Total count"] = total_shoot["Winner Penalty type"] + total_shoot["Loser Penalty type"]
total_shoot

Unnamed: 0,Winner Penalty type,Loser Penalty type,Total count
Argentina,18.0,4.0,22.0
Belgium,4.0,0.0,4.0
Brazil,12.0,5.0,17.0
Bulgaria,4.0,0.0,4.0
Costa Rica,5.0,5.0,10.0
Croatia,10.0,0.0,10.0
England,5.0,14.0,19.0
France,10.0,10.0,20.0
Germany,18.0,0.0,18.0
Italy,3.0,15.0,18.0


In [27]:
total = pd.concat([total_scored["Total scored"], total_shoot["Total count"]], axis=1)
total["Total missed"] = total["Total count"] - total["Total scored"]
total

Unnamed: 0,Total scored,Total count,Total missed
Argentina,17.0,22.0,5.0
Belgium,4.0,4.0,0.0
Brazil,12.0,17.0,5.0
Bulgaria,3.0,4.0,1.0
Costa Rica,8.0,10.0,2.0
Croatia,7.0,10.0,3.0
England,11.0,19.0,8.0
France,15.0,20.0,5.0
Germany,17.0,18.0,1.0
Italy,11.0,18.0,7.0


In [28]:
total["Total scored"] / total["Total count"]

Argentina              0.772727
Belgium                1.000000
Brazil                 0.705882
Bulgaria               0.750000
Costa Rica             0.800000
Croatia                0.700000
England                0.578947
France                 0.750000
Germany                0.944444
Italy                  0.611111
Netherlands            0.636364
Paraguay               1.000000
Portugal               0.600000
Republic of Ireland    0.700000
Russia                 0.777778
South Korea                 NaN
Spain                  0.684211
Sweden                 0.750000
Ukraine                0.750000
Uruguay                1.000000
Chile                  0.400000
Colombia               0.600000
Denmark                0.400000
Ghana                  0.500000
Greece                 0.750000
Japan                  0.750000
Mexico                 0.285714
Romania                0.727273
Switzerland            0.000000
Yugoslavia             0.400000
dtype: float64
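
The winner and loser tallies above could also be produced in one pass by reshaping the data to one row per kick. The sketch below uses made-up rows in the same wide layout and assumes the outcome columns already hold `scored`, `missed` or a missing value. The output column names (`Penalties Scored`, `Penalties Taken`, `% Scored`, `Rank`) are placeholders rather than the challenge's required field names. The same helper then covers both the per-team ranking and the penalty-position ranking.

```python
import pandas as pd

# Made-up rows in the notebook's wide layout (one row per penalty round).
wide = pd.DataFrame(
    {
        "Penalty Number": [1, 2, 3, 1, 2],
        "Winner": ["A", "A", "A", "B", "B"],
        "Loser": ["B", "B", "B", "A", "A"],
        "Winner Penalty type": ["scored", "missed", "scored", "scored", "scored"],
        "Loser Penalty type": ["missed", "scored", None, "missed", "missed"],
    }
)


def one_side(team_col, outcome_col):
    """Return the kicks of one side with uniform column names."""
    side = wide[["Penalty Number", team_col, outcome_col]].copy()
    side.columns = ["Penalty Number", "Team", "Outcome"]
    return side


# Stack both sides; a missing outcome means the kick was never taken.
long_df = pd.concat(
    [one_side("Winner", "Winner Penalty type"), one_side("Loser", "Loser Penalty type")],
    ignore_index=True,
).dropna(subset=["Outcome"])
long_df["Scored"] = (long_df["Outcome"] == "scored").astype(int)


def scored_pct(by):
    """Scored %, with a dense rank, grouped by the given column."""
    out = long_df.groupby(by)["Scored"].agg(["sum", "count"])
    out.columns = ["Penalties Scored", "Penalties Taken"]
    out["% Scored"] = (out["Penalties Scored"] / out["Penalties Taken"] * 100).round(0)
    out["Rank"] = out["% Scored"].rank(method="dense", ascending=False).astype(int)
    return out.sort_values("Rank").reset_index()


team_ranking = scored_pct("Team")
position_ranking = scored_pct("Penalty Number")
print(team_ranking)
print(position_ranking)
```

Dropping kicks that were never taken before counting avoids dividing by zero, so no team ends up with `NaN` the way South Korea does above.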