## 2021: Week 28 - It's Coming Rome

'55 years of hurt, Never stopped me dreaming!'

It was another night of pain for England fans on Sunday evening when England lost yet another penalty shootout, this time in the European Championship final. This seems to have been a common outcome in the tournaments England have taken part in over the years, but does the data agree?

The challenge this week is to analyse all of the penalty shootouts in the World Cup and European Championships (Euros) since 1976.

### Input
The data comes from Wikipedia (World Cup and Euros) and is supplied as a single Excel workbook with two sheets.

### Requirements
- Input Data
- Determine what competition each penalty was taken in
- Clean any fields, correctly format the date the penalty was taken, and group the two German countries (e.g. West Germany & Germany)
- Rank the countries on the following: 
    - Shootout win % (exclude teams who have never won a shootout)
    - Penalties scored %
- What is the most and least successful time to take a penalty? (What penalty number are you most likely to score or miss?)
- Output the Data

### Outputs
3 Outputs:
1. Win % Rankings (5 fields, 26 rows)
2. Scored % Rankings (5 fields, 34 rows)
3. Penalty Position Rankings (6 fields, 9 rows)

In [744]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input Data

In [745]:
data = pd.read_excel("./data/InternationalPenalties.xlsx", sheet_name=["WorldCup", "Euros"])

In [746]:
world_cup = data["WorldCup"].copy()
euros = data["Euros"].copy()

### Determine what competition each penalty was taken in

In [747]:
world_cup.columns

Index(['No.', 'Penalty Number ', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date'],
      dtype='object')

In [748]:
world_cup["Penalty Number "].value_counts()

1    30
2    30
3    30
4    30
5    24
6     2
Name: Penalty Number , dtype: int64

In [749]:
world_cup.head()

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date
0,1,1,1982,West Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08
1,1,2,1982,West Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08
2,1,3,1982,West Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08
3,1,4,1982,West Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08
4,1,5,1982,West Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08


In [750]:
euros.columns

Index(['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing team GK', 'Round', 'Date'],
      dtype='object')

In [751]:
euros["Penalty Number"].value_counts()

1    22
2    22
3    22
4    22
5    19
6     6
7     3
8     2
9     2
Name: Penalty Number, dtype: int64
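The two sheets do not share identical column names: the World Cup sheet has a trailing space in `Penalty Number ` and capitalises `Team` in the goalkeeper columns. The following is a minimal sketch of one way to tag each row with its competition and stack the sheets. It uses toy frames, and `tidy_columns`, the `Competition` column and its labels are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the two sheets (the real frames come from read_excel).
world_cup_demo = pd.DataFrame(
    {"No.": [1], "Penalty Number ": [1], "Winning Team GK": ["Schumacher"]}
)
euros_demo = pd.DataFrame(
    {"No.": [1], "Penalty Number": [1], "Winning team GK": ["Viktor"]}
)


def tidy_columns(df):
    """Strip stray whitespace and unify the 'Team'/'team' casing in column names."""
    return df.rename(columns=lambda c: c.strip().replace(" Team ", " team "))


# Label each sheet with its competition, then stack them into one frame.
combined = pd.concat(
    [
        tidy_columns(world_cup_demo).assign(Competition="World Cup"),
        tidy_columns(euros_demo).assign(Competition="Euros"),
    ],
    ignore_index=True,
)
print(combined)
```

With the names aligned, later steps could run once on `combined` instead of being repeated per sheet.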

In [752]:
# winner_penalty = world_cup["Winning team Taker"].str.split().apply(pd.Series)
# winner_penalty["Winner Penalty type"] = winner_penalty[2]
# winner_penalty = winner_penalty.drop([0, 1, 2, 3, 4, 5], axis=1)
# winner_penalty

In [753]:
import re

def find_penalty_type(string):
    """Classify a taker string as "scored" or "missed"; "Unknown" becomes NaN."""
    if string == "Unknown":
        return np.nan
    # Any taker string that does not contain "scored" is treated as a miss.
    if re.search("scored", string) is not None:
        return "scored"
    return "missed"
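As an alternative to mapping a Python function over each row, the outcome word could be pulled out with a vectorised string method. This is only a sketch on made-up strings in the same shape as the taker columns. Note one behavioural difference from `find_penalty_type`: a string that contains neither word becomes NaN instead of defaulting to "missed".

```python
import pandas as pd

# Toy strings shaped like the "Winning team Taker" / "Losing team Taker" columns.
takers = pd.Series(["Kaltz Penalty scored", "Penalty missed Six", None])

# Capture the first occurrence of either outcome word; missing values stay NaN.
outcome = takers.str.extract(r"(scored|missed)", expand=False)
print(outcome.tolist())
```

Because missing values pass through untouched, this variant would not need the `fillna("Unknown")` step first.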

In [754]:
world_cup["Winning team Taker"] = world_cup["Winning team Taker"].fillna("Unknown")
world_cup["Losing team Taker"] = world_cup["Losing team Taker"].fillna("Unknown")

winner_penalty = world_cup["Winning team Taker"].map(find_penalty_type)
loser_penalty = world_cup["Losing team Taker"].map(find_penalty_type)

In [755]:
winner_penalty.value_counts()

scored    120
missed     21
Name: Winning team Taker, dtype: int64

In [756]:
loser_penalty.value_counts()

scored    76
missed    62
Name: Losing team Taker, dtype: int64

In [757]:
# loser_penalty = world_cup["Losing team Taker"].str.split().apply(pd.Series)
# loser_penalty["Loser Penalty type"] = loser_penalty[1]
# loser_penalty = loser_penalty.drop([0, 1, 2, 3, 4], axis=1)
# loser_penalty

In [758]:
winner_penalty

0      scored
1      scored
2      missed
3      scored
4      scored
        ...  
141    scored
142    missed
143    scored
144    scored
145    scored
Name: Winning team Taker, Length: 146, dtype: object

In [759]:
world_cup["Winner Penalty type"] = winner_penalty
world_cup["Loser Penalty type"] = loser_penalty

In [760]:
euros

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning team GK,Winning team Taker,Losing team Taker,Losing team GK,Round,Date
0,1,1,1976,Czechoslovakia,2–2,West Germany,Viktor,Masný Penalty scored,Penalty scored Bonhof,Maier,Final,2021-06-20
1,1,2,1976,Czechoslovakia,2–2,West Germany,Viktor,Nehoda Penalty scored,Penalty scored Flohe,Maier,Final,2021-06-20
2,1,3,1976,Czechoslovakia,2–2,West Germany,Viktor,Ondruš Penalty scored,Penalty scored Bongartz,Maier,Final,2021-06-20
3,1,4,1976,Czechoslovakia,2–2,West Germany,Viktor,Jurkemik Penalty scored,Penalty missed Hoeneß,Maier,Final,2021-06-20
4,1,5,1976,Czechoslovakia,2–2,West Germany,Viktor,Panenka Penalty scored,,Maier,Final,2021-06-20
...,...,...,...,...,...,...,...,...,...,...,...,...
115,22,1,2020,Italy,1–1,England,Donnarumma,Berardi Penalty scored,Penalty scored Kane,Pickford,Final,2021-07-11
116,22,2,2020,Italy,1–1,England,Donnarumma,Belotti Penalty missed,Penalty scored Maguire,Pickford,Final,2021-07-11
117,22,3,2020,Italy,1–1,England,Donnarumma,Bonucci Penalty scored,Penalty missed Rashford,Pickford,Final,2021-07-11
118,22,4,2020,Italy,1–1,England,Donnarumma,Bernardeschi Penalty scored,Penalty missed Sancho,Pickford,Final,2021-07-11


In [761]:
euros["Winning team Taker"] = euros["Winning team Taker"].fillna("Unknown")
euros["Losing team Taker"] = euros["Losing team Taker"].fillna("Unknown")

euros_winner_penalty = euros["Winning team Taker"].map(find_penalty_type)
euros_loser_penalty = euros["Losing team Taker"].map(find_penalty_type)

In [762]:
euros_winner_penalty.value_counts()

scored    105
missed     14
Name: Winning team Taker, dtype: int64

In [763]:
euros_loser_penalty.value_counts()

scored    73
missed    40
Name: Losing team Taker, dtype: int64

In [764]:
euros["Winner Penalty type"] = euros_winner_penalty
euros["Loser Penalty type"] = euros_loser_penalty
euros

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning team GK,Winning team Taker,Losing team Taker,Losing team GK,Round,Date,Winner Penalty type,Loser Penalty type
0,1,1,1976,Czechoslovakia,2–2,West Germany,Viktor,Masný Penalty scored,Penalty scored Bonhof,Maier,Final,2021-06-20,scored,scored
1,1,2,1976,Czechoslovakia,2–2,West Germany,Viktor,Nehoda Penalty scored,Penalty scored Flohe,Maier,Final,2021-06-20,scored,scored
2,1,3,1976,Czechoslovakia,2–2,West Germany,Viktor,Ondruš Penalty scored,Penalty scored Bongartz,Maier,Final,2021-06-20,scored,scored
3,1,4,1976,Czechoslovakia,2–2,West Germany,Viktor,Jurkemik Penalty scored,Penalty missed Hoeneß,Maier,Final,2021-06-20,scored,missed
4,1,5,1976,Czechoslovakia,2–2,West Germany,Viktor,Panenka Penalty scored,Unknown,Maier,Final,2021-06-20,scored,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,22,1,2020,Italy,1–1,England,Donnarumma,Berardi Penalty scored,Penalty scored Kane,Pickford,Final,2021-07-11,scored,scored
116,22,2,2020,Italy,1–1,England,Donnarumma,Belotti Penalty missed,Penalty scored Maguire,Pickford,Final,2021-07-11,missed,scored
117,22,3,2020,Italy,1–1,England,Donnarumma,Bonucci Penalty scored,Penalty missed Rashford,Pickford,Final,2021-07-11,scored,missed
118,22,4,2020,Italy,1–1,England,Donnarumma,Bernardeschi Penalty scored,Penalty missed Sancho,Pickford,Final,2021-07-11,scored,missed


### Clean any fields, correctly format the date the penalty was taken, and group the two German countries (e.g. West Germany & Germany)

In [765]:
world_cup.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   No.                  146 non-null    int64         
 1   Penalty Number       146 non-null    int64         
 2   Event Year           146 non-null    object        
 3   Winner               146 non-null    object        
 4   Full Time Score      146 non-null    object        
 5   Loser                146 non-null    object        
 6   Winning Team GK      146 non-null    object        
 7   Winning team Taker   146 non-null    object        
 8   Losing team Taker    146 non-null    object        
 9   Losing Team GK       146 non-null    object        
 10  Round                146 non-null    object        
 11  Date                 146 non-null    datetime64[ns]
 12  Winner Penalty type  141 non-null    object        
 13  Loser Penalty type   138 non-null  

In [766]:
# Event Year is stored as text; strip any commas, parse it, and keep just the year.
world_cup["Event Year"] = pd.to_datetime(world_cup["Event Year"].str.replace(",", "")).dt.year
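In the previews above, the `Date` column shows 2021 even for a 1982 match, which suggests that only the day and month were parsed and the year defaulted to the current one. A minimal sketch, on toy data, of rebuilding the date from `Event Year` plus the month and day already in `Date`. It assumes `Event Year` is the year the match was actually played, which does not hold for Euro 2020 (played in 2021), so those rows would need separate handling.

```python
import pandas as pd

# Toy rows: the year in "Date" is wrong, the month and day are right.
demo = pd.DataFrame(
    {
        "Event Year": [1982, 2018],
        "Date": pd.to_datetime(["2021-07-08", "2021-07-07"]),
    }
)

# pd.to_datetime can assemble dates from year/month/day columns.
demo["Date"] = pd.to_datetime(
    pd.DataFrame(
        {
            "year": demo["Event Year"],
            "month": demo["Date"].dt.month,
            "day": demo["Date"].dt.day,
        }
    )
)
print(demo)
```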

In [767]:
world_cup["Winner"] = world_cup["Winner"].str.strip()
world_cup["Loser"] = world_cup["Loser"].str.strip()

In [768]:
world_cup.loc[world_cup["Winner"] == "West Germany", "Winner"] = "Germany"

In [769]:
world_cup[world_cup["Winner"] == "Germany"].shape

(19, 14)

In [770]:
euros["Winner"].value_counts()

 Spain             20
 Italy             19
 Germany           15
 Czechoslovakia    14
 Portugal          12
 Czech Republic     6
 Netherlands        6
 Denmark            5
 France             5
 Poland             5
  Switzerland       5
 England            4
 Turkey             4
Name: Winner, dtype: int64

In [771]:
euros["Winner"] = euros["Winner"].str.strip()
euros["Winner"].value_counts()

Spain             20
Italy             19
Germany           15
Czechoslovakia    14
Portugal          12
Czech Republic     6
Netherlands        6
Denmark            5
France             5
Poland             5
Switzerland        5
England            4
Turkey             4
Name: Winner, dtype: int64

In [772]:
euros["Loser"] = euros["Loser"].str.strip()
euros["Loser"].value_counts()

Italy           23
England         23
Netherlands     14
France          11
Switzerland     10
Spain            9
Sweden           6
West Germany     5
Denmark          5
Portugal         5
Poland           5
Croatia          4
Name: Loser, dtype: int64

In [773]:
euros.loc[euros["Loser"] == "West Germany", "Loser"] = "Germany"
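The cells above replace "West Germany" in `Winner` for the World Cup sheet and in `Loser` for the Euros sheet, which matches where the value appears in the outputs shown. A slightly more defensive sketch, on toy data, strips and replaces in both team columns at once so the result does not depend on which column a name shows up in. `team_cols` is a hypothetical helper name.

```python
import pandas as pd

# Toy frame with stray whitespace and both German spellings.
demo = pd.DataFrame(
    {
        "Winner": [" West Germany", "Italy "],
        "Loser": ["France", "West Germany"],
    }
)

team_cols = ["Winner", "Loser"]

# Strip whitespace in each team column, then map West Germany onto Germany.
demo[team_cols] = (
    demo[team_cols]
    .apply(lambda col: col.str.strip())
    .replace({"West Germany": "Germany"})
)
print(demo)
```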

In [774]:
euros.shape

(120, 14)

In [775]:
euros.head()

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning team GK,Winning team Taker,Losing team Taker,Losing team GK,Round,Date,Winner Penalty type,Loser Penalty type
0,1,1,1976,Czechoslovakia,2–2,Germany,Viktor,Masný Penalty scored,Penalty scored Bonhof,Maier,Final,2021-06-20,scored,scored
1,1,2,1976,Czechoslovakia,2–2,Germany,Viktor,Nehoda Penalty scored,Penalty scored Flohe,Maier,Final,2021-06-20,scored,scored
2,1,3,1976,Czechoslovakia,2–2,Germany,Viktor,Ondruš Penalty scored,Penalty scored Bongartz,Maier,Final,2021-06-20,scored,scored
3,1,4,1976,Czechoslovakia,2–2,Germany,Viktor,Jurkemik Penalty scored,Penalty missed Hoeneß,Maier,Final,2021-06-20,scored,missed
4,1,5,1976,Czechoslovakia,2–2,Germany,Viktor,Panenka Penalty scored,Unknown,Maier,Final,2021-06-20,scored,


### Null value replacement

In [776]:
world_cup.loc[world_cup["Winner"] == "South Korea", "Winner Penalty type"] = "scored"

In [777]:
world_cup.loc[19, "Winner Penalty type"] = "scored"

In [778]:
world_cup.loc[[45, 48, 94, 95, 107, 117], "Winner Penalty type"] = "scored"
world_cup.loc[105, "Winner Penalty type"] = "missed"

In [779]:
world_cup[world_cup["Winner Penalty type"].isna()]

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date,Winner Penalty type,Loser Penalty type
34,7,5,1990,Argentina,1–1,Italy,Goycochea,Unknown,Penalty missed Serena,Zenga,Semi-finals,2021-07-03,,missed
39,8,5,1990,Germany,1–1,England,Illgner,Unknown,Penalty missed Waddle,Shilton,Semi-finals,2021-07-04,,missed
54,11,5,1994,Brazil,0–0,Italy,Taffarel,Unknown,Penalty missed R. Baggio,Pagliuca,Final,2021-07-17,,missed
121,25,5,2014,Netherlands,0–0,Costa Rica,Krul,Unknown,Penalty missed Umaña,Navas,Quarter-finals,2021-07-05,,missed
130,27,5,2018,Russia,1–1,Spain,Akinfeev,Unknown,Penalty missed Aspas,De Gea,Second round,2021-07-01,,missed


In [780]:
euros[euros["Winner Penalty type"].isna()]

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning team GK,Winning team Taker,Losing team Taker,Losing team GK,Round,Date,Winner Penalty type,Loser Penalty type
65,12,4,2008,Turkey,1–1,Croatia,Reçber,Unknown,Penalty missed Petrić,Pletikosa,Quarter-finals,2021-06-20,,missed


In [781]:
euros[euros["Loser Penalty type"].isna()]

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning team GK,Winning team Taker,Losing team Taker,Losing team GK,Round,Date,Winner Penalty type,Loser Penalty type
4,1,5,1976,Czechoslovakia,2–2,Germany,Viktor,Panenka Penalty scored,Unknown,Maier,Final,2021-06-20,scored,
70,13,5,2008,Spain,0–0,Italy,Casillas,Fàbregas Penalty scored,Unknown,Buffon,Quarter-finals,2021-06-22,scored,
75,14,5,2012,Italy,0–0,England,Buffon,Diamanti Penalty scored,Unknown,Hart,Quarter-finals,2021-06-24,scored,
80,15,5,2012,Spain,0–0,Portugal,Casillas,Fàbregas Penalty scored,Unknown,Patrício,Semi-finals,2021-06-27,scored,
90,17,5,2016,Portugal,1–1,Poland,Patrício,Quaresma Penalty scored,Unknown,Fabiański,Quarter-finals,2021-06-30,scored,
109,20,5,2020,Spain,1–1,Switzerland,Simón,Oyarzabal Penalty scored,Unknown,Sommer,Quarter-finals,2021-07-02,scored,
114,21,5,2020,Italy,1–1,Spain,Donnarumma,Jorginho Penalty scored,Unknown,Simón,Semi-finals,2021-07-06,scored,


### Rank the countries on the following: 
- Shootout win % (exclude teams who have never won a shootout)
- Penalties scored %

### Shootout win %

In [782]:
world_cup_shootout = world_cup.drop(["Winner Penalty type", "Loser Penalty type"], axis=1)
euros_shootout = euros.drop(["Winner Penalty type", "Loser Penalty type"], axis=1)
world_cup_shootout.shape, euros_shootout.shape

((146, 12), (120, 12))

In [783]:
world_cup_shootout.columns = ['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date']
euros_shootout.columns = ['No.', 'Penalty Number', 'Event Year', 'Winner', 'Full Time Score',
       'Loser', 'Winning Team GK', 'Winning team Taker', 'Losing team Taker',
       'Losing Team GK', 'Round', 'Date']

In [784]:
world_cup_shootout

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Full Time Score,Loser,Winning Team GK,Winning team Taker,Losing team Taker,Losing Team GK,Round,Date
0,1,1,1982,Germany,3–3,France,Schumacher,Kaltz Penalty scored,Penalty scored Giresse,Ettori,Semi-finals,2021-07-08
1,1,2,1982,Germany,3–3,France,Schumacher,Breitner Penalty scored,Penalty scored Amoros,Ettori,Semi-finals,2021-07-08
2,1,3,1982,Germany,3–3,France,Schumacher,Stielike Penalty missed,Penalty scored Rocheteau,Ettori,Semi-finals,2021-07-08
3,1,4,1982,Germany,3–3,France,Schumacher,Littbarski Penalty scored,Penalty missed Six,Ettori,Semi-finals,2021-07-08
4,1,5,1982,Germany,3–3,France,Schumacher,Rummenigge Penalty scored,Penalty scored Platini,Ettori,Semi-finals,2021-07-08
...,...,...,...,...,...,...,...,...,...,...,...,...
141,30,1,2018,Croatia,2–2,Russia,Subašić,Brozović Penalty scored,Penalty missed Smolov,Akinfeev,Quarter-finals,2021-07-07
142,30,2,2018,Croatia,2–2,Russia,Subašić,Kovačić Penalty missed,Penalty scored Dzagoev,Akinfeev,Quarter-finals,2021-07-07
143,30,3,2018,Croatia,2–2,Russia,Subašić,Modrić Penalty scored,Penalty missed Fernandes,Akinfeev,Quarter-finals,2021-07-07
144,30,4,2018,Croatia,2–2,Russia,Subašić,Vida Penalty scored,Penalty scored Ignashevich,Akinfeev,Quarter-finals,2021-07-07


In [785]:
world_cup_shootout = world_cup_shootout.drop(["Full Time Score", "Winning Team GK", "Winning team Taker",
                "Losing team Taker", "Losing Team GK", "Round", "Date"], axis=1)
world_cup_shootout

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Loser
0,1,1,1982,Germany,France
1,1,2,1982,Germany,France
2,1,3,1982,Germany,France
3,1,4,1982,Germany,France
4,1,5,1982,Germany,France
...,...,...,...,...,...
141,30,1,2018,Croatia,Russia
142,30,2,2018,Croatia,Russia
143,30,3,2018,Croatia,Russia
144,30,4,2018,Croatia,Russia


In [786]:
euros_shootout = euros_shootout.drop(["Full Time Score", "Winning Team GK", "Winning team Taker",
                "Losing team Taker", "Losing Team GK", "Round", "Date"], axis=1)
euros_shootout

Unnamed: 0,No.,Penalty Number,Event Year,Winner,Loser
0,1,1,1976,Czechoslovakia,Germany
1,1,2,1976,Czechoslovakia,Germany
2,1,3,1976,Czechoslovakia,Germany
3,1,4,1976,Czechoslovakia,Germany
4,1,5,1976,Czechoslovakia,Germany
...,...,...,...,...,...
115,22,1,2020,Italy,England
116,22,2,2020,Italy,England
117,22,3,2020,Italy,England
118,22,4,2020,Italy,England


In [787]:
world_cup_shootout = world_cup_shootout.drop_duplicates(subset=["Winner", "Loser", "Event Year"])
euros_shootout = euros_shootout.drop_duplicates(subset=["Winner", "Loser", "Event Year"])
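The de-duplication step works because every penalty row repeats the same winner, loser and year for its shootout, so keeping one row per combination leaves one row per shootout. A small sketch with placeholder team names (`A`, `B`, `C` are hypothetical):

```python
import pandas as pd

# One row per penalty: three kicks in the first shootout, two in the second.
kicks = pd.DataFrame(
    {
        "Penalty Number": [1, 2, 3, 1, 2],
        "Event Year": [1982, 1982, 1982, 1986, 1986],
        "Winner": ["A", "A", "A", "B", "B"],
        "Loser": ["B", "B", "B", "C", "C"],
    }
)

# Keep the first row of each (winner, loser, year) combination.
shootouts = kicks.drop_duplicates(subset=["Winner", "Loser", "Event Year"])
print(shootouts[["Event Year", "Winner", "Loser"]])
```

This relies on the same two teams not meeting in two shootouts in the same tournament with the same result. If that could happen, de-duplicating on the shootout identifier column would be safer.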

In [788]:
world_cup_winner = world_cup_shootout["Winner"].value_counts()
world_cup_loser = world_cup_shootout["Loser"].value_counts()

In [789]:
euros_winner = euros_shootout["Winner"].value_counts()
euros_loser = euros_shootout["Loser"].value_counts()

In [790]:
world_cup_win_pct = pd.concat([world_cup_winner, world_cup_loser], axis=1).fillna(0)
world_cup_win_pct.columns = ["Shootouts", "lose_shootouts"]
world_cup_win_pct["Total Shootouts"] = world_cup_win_pct["Shootouts"] + world_cup_win_pct["lose_shootouts"]

In [791]:
euros_win_pct = pd.concat([euros_winner, euros_loser], axis=1).fillna(0)
euros_win_pct.columns = ["Shootouts", "lose_shootouts"]
euros_win_pct["Total Shootouts"] = euros_win_pct["Shootouts"] + euros_win_pct["lose_shootouts"]
euros_win_pct

Unnamed: 0,Shootouts,lose_shootouts,Total Shootouts
Spain,4.0,2.0,6.0
Italy,4.0,3.0,7.0
Czechoslovakia,2.0,0.0,2.0
Germany,2.0,1.0,3.0
Portugal,2.0,1.0,3.0
Denmark,1.0,1.0,2.0
England,1.0,4.0,5.0
France,1.0,2.0,3.0
Czech Republic,1.0,0.0,1.0
Netherlands,1.0,3.0,4.0


In [792]:
total = pd.concat([world_cup_win_pct, euros_win_pct], axis=1).fillna(0)
total.columns = ["world_shootouts", "world_lose_shootouts", "world_total_shootouts",
                 "euros_shootouts", "euros_lose_shootouts", "euros_total_shootouts"]

In [793]:
total = total.drop(["world_lose_shootouts", "euros_lose_shootouts"], axis=1)

In [794]:
total.columns

Index(['world_shootouts', 'world_total_shootouts', 'euros_shootouts',
       'euros_total_shootouts'],
      dtype='object')

In [795]:
total["Total Shootouts"] = total["world_total_shootouts"] + total["euros_total_shootouts"]
total["Shootouts"] = total["world_shootouts"] + total["euros_shootouts"]
total = total.reset_index().rename(columns={"index": "Team"})
total = total.drop(["world_shootouts", "world_total_shootouts",
                    "euros_shootouts", "euros_total_shootouts"], axis=1)
total

Unnamed: 0,Team,Total Shootouts,Shootouts
0,Germany,7.0,6.0
1,Argentina,5.0,4.0
2,Brazil,4.0,3.0
3,Croatia,3.0,2.0
4,France,7.0,3.0
5,Italy,11.0,5.0
6,Russia,2.0,1.0
7,Netherlands,7.0,2.0
8,Costa Rica,2.0,1.0
9,Uruguay,1.0,1.0


In [796]:
total["Shootout Win %"] = (total["Shootouts"] / total["Total Shootouts"] * 100).round(0)
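The win percentage is simply shootouts won divided by shootouts played, times 100. The whole calculation can be sketched compactly on a toy table with one row per shootout. Team names are placeholders and the `Losses` column name is an assumption.

```python
import pandas as pd

# One row per shootout; team names are placeholders.
shootouts = pd.DataFrame(
    {
        "Winner": ["A", "A", "B"],
        "Loser": ["C", "D", "A"],
    }
)

wins = shootouts["Winner"].value_counts()
losses = shootouts["Loser"].value_counts()

# Align wins and losses by team; a team missing from one side gets 0.
summary = (
    pd.concat([wins, losses], axis=1, keys=["Shootouts", "Losses"])
    .fillna(0)
    .astype(int)
)
summary["Total Shootouts"] = summary["Shootouts"] + summary["Losses"]

# Exclude teams that have never won a shootout.
summary = summary[summary["Shootouts"] > 0].copy()
summary["Shootout Win %"] = (
    (summary["Shootouts"] / summary["Total Shootouts"] * 100).round().astype(int)
)
print(summary)
```

Here team `A` has won 2 of 3 shootouts (67%), `B` has won 1 of 1 (100%), and `C` and `D` are dropped because they have no wins.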

In [797]:
total = total.sort_values(by="Shootout Win %", ascending=False)
# Exclude teams that have never won a shootout (win % of 0).
no_shootout_idx = total[total["Shootout Win %"] == 0].index
total = total.drop(no_shootout_idx, axis=0)
total

Unnamed: 0,Team,Total Shootouts,Shootouts,Shootout Win %
32,Turkey,1.0,1.0,100.0
31,Czech Republic,1.0,1.0,100.0
30,Czechoslovakia,2.0,2.0,100.0
18,Belgium,1.0,1.0,100.0
9,Uruguay,1.0,1.0,100.0
10,Paraguay,1.0,1.0,100.0
11,Ukraine,1.0,1.0,100.0
13,South Korea,1.0,1.0,100.0
16,Bulgaria,1.0,1.0,100.0
0,Germany,7.0,6.0,86.0


In [798]:
total["Win % Rank"] = total["Shootout Win %"].rank(method="dense", ascending=False).astype(int)
total = total.reset_index(drop=True)
total = total.loc[:, ["Win % Rank", "Shootout Win %", "Total Shootouts", "Shootouts", "Team"]]

total["Shootout Win %"] = total["Shootout Win %"].astype(int)
total["Total Shootouts"] = total["Total Shootouts"].astype(int)
total["Shootouts"] = total["Shootouts"].astype(int)

total

Unnamed: 0,Win % Rank,Shootout Win %,Total Shootouts,Shootouts,Team
0,1,100,1,1,Turkey
1,1,100,1,1,Czech Republic
2,1,100,2,2,Czechoslovakia
3,1,100,1,1,Belgium
4,1,100,1,1,Uruguay
5,1,100,1,1,Paraguay
6,1,100,1,1,Ukraine
7,1,100,1,1,South Korea
8,1,100,1,1,Bulgaria
9,2,86,7,6,Germany
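The ranking uses `method="dense"`, which is why Germany is ranked 2 even though nine teams are tied on rank 1. A short sketch on placeholder teams contrasts it with `method="min"` (standard competition ranking), which would leave gaps after a tie:

```python
import pandas as pd

# Placeholder win percentages; two teams are tied at the top.
pct = pd.Series([100, 100, 86, 75], index=["A", "B", "C", "D"])

# Dense ranking: tied values share a rank and the next rank follows directly.
dense = pct.rank(method="dense", ascending=False).astype(int)

# Competition ranking: tied values share a rank and the next ranks are skipped.
competition = pct.rank(method="min", ascending=False).astype(int)

print(pd.DataFrame({"pct": pct, "dense": dense, "min": competition}))
```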


### Penalties scored %

In [799]:
# Penalties scored %
# world_cup["Winner Penalty type"] = world_cup["Winner Penalty type"].map({"scored": 1, "missed": 0})
# world_cup["Loser Penalty type"] = world_cup["Loser Penalty type"].map({"scored": 1, "missed": 0})

In [800]:
winner = (world_cup[["Winner", "Winner Penalty type"]]
              .melt(id_vars="Winner", value_name="Penalties")
              .drop("variable", axis=1)
              .rename(columns={"Winner": "Team"})
         )
winner["Penalties"] = winner["Penalties"].str.strip()
winner["Penalties"] = winner["Penalties"].map({"scored": 1, "missed": 0})
winner

Unnamed: 0,Team,Penalties
0,Germany,1.0
1,Germany,1.0
2,Germany,0.0
3,Germany,1.0
4,Germany,1.0
...,...,...
141,Croatia,1.0
142,Croatia,0.0
143,Croatia,1.0
144,Croatia,1.0


In [801]:
loser = (world_cup[["Loser", "Loser Penalty type"]]
             .melt(id_vars="Loser", value_name="Penalties")
             .drop("variable", axis=1)
             .rename(columns={"Loser": "Team"})
        )
loser["Penalties"] = loser["Penalties"].str.strip()
loser["Penalties"] = loser["Penalties"].map({"scored": 1, "missed": 0})
loser

Unnamed: 0,Team,Penalties
0,France,1.0
1,France,1.0
2,France,1.0
3,France,0.0
4,France,1.0
...,...,...
141,Russia,0.0
142,Russia,1.0
143,Russia,0.0
144,Russia,1.0


In [802]:
world_cup_penalties = pd.concat([winner, loser], axis=0)

In [803]:
winner = (euros[["Winner", "Winner Penalty type"]]
              .melt(id_vars="Winner", value_name="Penalties")
              .drop("variable", axis=1)
              .rename(columns={"Winner": "Team"})
         )
winner["Penalties"] = winner["Penalties"].str.strip()
winner["Penalties"] = winner["Penalties"].map({"scored": 1, "missed": 0})
winner

Unnamed: 0,Team,Penalties
0,Czechoslovakia,1.0
1,Czechoslovakia,1.0
2,Czechoslovakia,1.0
3,Czechoslovakia,1.0
4,Czechoslovakia,1.0
...,...,...
115,Italy,1.0
116,Italy,0.0
117,Italy,1.0
118,Italy,1.0


In [804]:
loser = (euros[["Loser", "Loser Penalty type"]]
              .melt(id_vars="Loser", value_name="Penalties")
              .drop("variable", axis=1)
              .rename(columns={"Loser": "Team"})
         )
loser["Penalties"] = loser["Penalties"].str.strip()
loser["Penalties"] = loser["Penalties"].map({"scored": 1, "missed": 0})
loser

Unnamed: 0,Team,Penalties
0,Germany,1.0
1,Germany,1.0
2,Germany,1.0
3,Germany,0.0
4,Germany,
...,...,...
115,England,1.0
116,England,1.0
117,England,0.0
118,England,0.0


In [805]:
euros_penalties = pd.concat([winner, loser], axis=0)

In [806]:
total_penalties = pd.concat([world_cup_penalties, euros_penalties], axis=0)
total_penalties = total_penalties.dropna()

In [808]:
by_team = total_penalties.groupby("Team")["Penalties"]
penalties_scored = by_team.sum()
penalties_total = by_team.count()
penalties_missed = penalties_total - penalties_scored

In [809]:
score_pct = pd.concat([penalties_missed, penalties_scored, penalties_total], axis=1)
score_pct.columns = ["Penalties Missed", "Penalties Scored", "Penalties Total"]
score_pct

Team,Penalties Missed,Penalties Scored,Penalties Total
Argentina,5.0,17.0,22
Belgium,0.0,5.0,5
Brazil,5.0,13.0,18
Bulgaria,1.0,3.0,4
Chile,3.0,2.0,5
Colombia,2.0,3.0,5
Costa Rica,2.0,8.0,10
Croatia,6.0,8.0,14
Czech Republic,0.0,6.0,6
Czechoslovakia,0.0,14.0,14


In [810]:
score_pct["% Total Penalties Scored"] = (score_pct["Penalties Scored"] / score_pct["Penalties Total"] * 100).round(0).astype(int)
score_pct["Penalties Scored %Rank"] = score_pct["% Total Penalties Scored"].rank(method="dense", ascending=False).astype(int)
score_pct = score_pct.sort_values(by="% Total Penalties Scored", ascending=False)
score_pct = score_pct.reset_index()
score_pct = score_pct.loc[:, ["Penalties Scored %Rank", "% Total Penalties Scored", "Penalties Missed", "Penalties Scored", "Team"]]
score_pct[["Penalties Missed", "Penalties Scored"]] = score_pct[["Penalties Missed", "Penalties Scored"]].astype(int)
score_pct

Unnamed: 0,Penalties Scored %Rank,% Total Penalties Scored,Penalties Missed,Penalties Scored,Team
0,1,100,0,5,Belgium
1,1,100,0,3,Turkey
2,1,100,0,5,South Korea
3,1,100,0,6,Czech Republic
4,1,100,0,14,Czechoslovakia
5,1,100,0,5,Paraguay
6,2,89,1,8,Poland
7,3,86,5,32,Germany
8,4,81,7,29,France
9,5,80,1,4,Uruguay
