## <font color='blue'> Section 2. Data Transformation </font>

Once the data has been cleaned, the next step is to transform it by aggregating rows and deriving the columns needed for modelling.
The dataset records personal attributes per player, but the analysis needs team attributes per game (generally the average over the team's players). This requires combining several tables and then running aggregate queries.

### Data aggregation process

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import altair as alt
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 300) # show all columns when displaying a dataframe

In [None]:
#read the 'cleaned' csv files into pandas dataframe
appearances = pd.read_csv("./appearances_cleaned.csv")
clubs = pd.read_csv("./clubs_cleaned.csv")
competitions = pd.read_csv("./competitions_cleaned.csv")
games = pd.read_csv("./games_cleaned.csv")
players = pd.read_csv("./players_cleaned.csv")

#### process the player_appearance dataframe

In this dataframe, each row is one player in one game, together with all of that player's attributes. In the original dataset, the **appearances** table records which players took part in each game, and the **players** table holds the player attributes (country_of_birth, country_of_citizenship, date_of_birth, position, sub_position, foot, height_in_cm, continent). We therefore merge the two dataframes with pandas.

We also need each player's age on the date of the game; the game date comes from the **games** table.
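
As a minimal sketch of the merge and age calculation described above, the snippet below uses small made-up tables (`appearances_demo`, `players_demo`, and `games_demo` are hypothetical stand-ins, not the project's data). It assumes the same key columns, `player_id` and `game_id`, as the cleaned CSV files.

```python
import pandas as pd

# Toy stand-ins for the cleaned tables (illustrative values only)
appearances_demo = pd.DataFrame({"player_id": [1, 2], "game_id": [10, 10]})
players_demo = pd.DataFrame({
    "player_id": [1, 2],
    "date_of_birth": ["1990-06-20", "1991-08-28"],
})
games_demo = pd.DataFrame({"game_id": [10], "date": ["2014-08-08"]})

# Left joins keep every appearance row, even when a lookup finds no match
merged = (
    appearances_demo
    .merge(players_demo, how="left", on="player_id")
    .merge(games_demo, how="left", on="game_id")
)

# Age on match day in years; dividing by 365 ignores leap days
age_days = (
    pd.to_datetime(merged["date"]) - pd.to_datetime(merged["date_of_birth"])
).dt.days
merged["age_in_matchday"] = age_days / 365
print(merged)
```

A left join is used so that appearances without a matching player or game stay visible as rows with missing values, which can then be dropped explicitly.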

In [None]:
#merge the table appearances & players & games. 
player_appearance = appearances.merge(players, how='left', on='player_id')
player_appearance = player_appearance.merge(games, how='left', on='game_id')

#drop the incomplete row
player_appearance.dropna(subset=['home_club_id'],inplace=True)

# from the games table, only the game date is needed (to compute the player's age on match day); drop the other columns
player_appearance.drop(['Unnamed: 0_x', 'Unnamed: 0_y', 'competition_code', 'season','Unnamed: 0', 'round','home_club_id', 'away_club_id', 'home_club_goals', 'away_club_goals', 'home_club_position', 'away_club_position'],inplace=True,axis=1)

#calculate the player's age on match day
player_appearance.rename(columns={"date": "game_date"},inplace=True)
#keep both columns as datetime64 so the subtraction does not rely on object-dtype inference
player_appearance['game_date'] = pd.to_datetime(player_appearance['game_date'])
player_appearance['date_of_birth'] = pd.to_datetime(player_appearance['date_of_birth'])
#age in years; dividing by 365 ignores leap days, so ages are slightly overstated
player_appearance['age_in_matchday'] = (player_appearance['game_date'] - player_appearance['date_of_birth']).dt.days/365

#process the player_appearance table, adding the club_game id for each record
player_appearance['game_id'] = player_appearance['game_id'].astype('Int64')
player_appearance['club_game'] = player_appearance['player_club_id'].map(str) +'_'+ player_appearance['game_id'].map(str)
player_appearance.head()

Unnamed: 0,player_id,game_id,appearance_id,player_club_id,last_season,current_club_id,pretty_name,country_of_citizenship,date_of_birth,position,foot,height_in_cm,continent,game_date,age_in_matchday,club_game
0,52453,2483937,2483937_52453,28095,2015.0,28095.0,Haris Handzic,Bosnia and Herzegovina,1990-06-20,Attack,Left,191.0,Europe,2014-08-08,24.150685,28095_2483937
1,67064,2479929,2479929_67064,28095,2017.0,4128.0,Felicio Brown Forbes,Costa Rica,1991-08-28,Attack,Right,189.0,North America,2014-08-03,22.947945,28095_2479929
2,67064,2483937,2483937_67064,28095,2017.0,4128.0,Felicio Brown Forbes,Costa Rica,1991-08-28,Attack,Right,189.0,North America,2014-08-08,22.961644,28095_2483937
3,67064,2484582,2484582_67064,28095,2017.0,4128.0,Felicio Brown Forbes,Costa Rica,1991-08-28,Attack,Right,189.0,North America,2014-08-13,22.975342,28095_2484582
4,67064,2485965,2485965_67064,28095,2017.0,4128.0,Felicio Brown Forbes,Costa Rica,1991-08-28,Attack,Right,189.0,North America,2014-08-16,22.983562,28095_2485965


#### process the result dataframe

In this dataframe, each row is one team in one game, together with all the factors that might influence the result. Our modelling will be based on this dataframe.

In the original dataset, the **games** table holds the result of each game, with one row per game. We need to split it so that each row represents one team in one game.

We therefore use pandas to transform the **games** dataframe.
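
The idea of going from one row per game to one row per team per game can be sketched on a toy table. `games_demo` and the helper `team_rows` are hypothetical names used only for this illustration. The notebook itself keeps the home and away sides in two separate dataframes, whereas this sketch stacks them into one.

```python
import pandas as pd

# Toy games table with one row per game (illustrative values only)
games_demo = pd.DataFrame({
    "game_id": [100, 101],
    "home_club_id": [1, 3],
    "away_club_id": [2, 4],
    "home_club_goals": [2, 0],
    "away_club_goals": [1, 0],
})

def team_rows(df, side):
    """Return one row per game for the given side ('home' or 'away')."""
    out = df[["game_id", f"{side}_club_id", f"{side}_club_goals"]].copy()
    out.columns = ["game_id", "club_id", "goals"]
    out["side"] = side
    # Same key format as the notebook: "<club_id>_<game_id>"
    out["club_game"] = out["club_id"].astype(str) + "_" + out["game_id"].astype(str)
    return out

# Stack the home rows on top of the away rows
long_games = pd.concat(
    [team_rows(games_demo, "home"), team_rows(games_demo, "away")],
    ignore_index=True,
)
print(long_games)
```

Each game now contributes two rows, and `club_game` identifies a team in a game uniquely, which is what the later merges key on.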

In [None]:
#inspect the games table before splitting it by team
games.head()

Unnamed: 0.1,Unnamed: 0,game_id,competition_code,season,round,date,home_club_id,away_club_id,home_club_goals,away_club_goals,home_club_position,away_club_position
0,0,2457642,NLSC,2014,Final,2014-08-03,1269,610,1,0,0.0,0.0
1,1,2639088,BESC,2013,Final,2014-07-20,58,498,2,1,0.0,0.0
2,2,2481145,SUC,2014,final 1st leg,2014-08-19,418,13,1,1,0.0,0.0
3,3,2484338,POSU,2014,Final,2014-08-10,294,2425,3,2,0.0,0.0
4,4,2502472,FRCH,2014,Final,2014-08-02,583,855,2,0,0.0,0.0


In [None]:
#drop the duplicate index column
games.drop(games.columns[0], axis=1, inplace=True)

#function to get the result based on the goals
def get_result(x):
    if x["home_club_goals"] > x["away_club_goals"]:
        return "homewin"
    elif x["home_club_goals"] < x["away_club_goals"]:
        return "awaywin"
    else:
        return "draw"
    
games.loc[:,"result"] = games.apply(get_result,axis=1)
games.head()

Unnamed: 0,game_id,competition_code,season,round,date,home_club_id,away_club_id,home_club_goals,away_club_goals,home_club_position,away_club_position,result
0,2457642,NLSC,2014,Final,2014-08-03,1269,610,1,0,0.0,0.0,homewin
1,2639088,BESC,2013,Final,2014-07-20,58,498,2,1,0.0,0.0,homewin
2,2481145,SUC,2014,final 1st leg,2014-08-19,418,13,1,1,0.0,0.0,draw
3,2484338,POSU,2014,Final,2014-08-10,294,2425,3,2,0.0,0.0,homewin
4,2502472,FRCH,2014,Final,2014-08-02,583,855,2,0,0.0,0.0,homewin
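
The row-wise `get_result` function above can also be written in vectorized form. The following is a sketch on a made-up dataframe (`goals_demo` is a hypothetical name), assuming the same two goal columns. It is an alternative formulation, not the code the notebook ran.

```python
import numpy as np
import pandas as pd

# Toy goal counts (illustrative values only)
goals_demo = pd.DataFrame({
    "home_club_goals": [1, 0, 2],
    "away_club_goals": [0, 3, 2],
})

# Conditions are checked in order; rows matching neither get the default
conditions = [
    goals_demo["home_club_goals"] > goals_demo["away_club_goals"],
    goals_demo["home_club_goals"] < goals_demo["away_club_goals"],
]
goals_demo["result"] = np.select(conditions, ["homewin", "awaywin"], default="draw")
print(goals_demo)
```

On large tables this avoids calling a Python function once per row, which is what `apply(..., axis=1)` does.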


In [None]:
#copy the dataframe and only process home club
home_result = games.copy(deep=True)
home_result.drop(['away_club_id', 'away_club_goals', 'away_club_position'],inplace=True,axis=1)
home_result['club_game'] = home_result['home_club_id'].map(str) +'_'+ home_result['game_id'].map(str)
home_result = home_result[['club_game','result','game_id', 'competition_code', 'season', 'round', 'date', 'home_club_id', 'home_club_goals', 'home_club_position']]
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0


In [None]:
#copy the dataframe and only process away club
away_result = games.copy(deep=True)
away_result.drop(['home_club_id', 'home_club_goals', 'home_club_position'],inplace=True,axis=1)
away_result['club_game'] = away_result['away_club_id'].map(str) +'_'+ away_result['game_id'].map(str)
away_result = away_result[['club_game','result','game_id', 'competition_code', 'season', 'round', 'date', 'away_club_id', 'away_club_goals', 'away_club_position']]
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0


### Add player attributes columns to the *result* dataframe for the modeling

columns =  
['player_number', 'num_attack', 'num_defender', 'num_midfield', 'num_goalkeeper',

'attack_ratio', 'defender_ratio', 'midfield_ratio',

'avg_age_team', 'avg_age_attack', 'avg_age_defender', 'avg_age_midfield', 'avg_age_goalkeeper',

'avg_height_team', 'avg_height_attack', 'avg_height_defender', 'avg_height_midfield', 'avg_height_goalkeeper', 

'Europe_num', 'North_America_num', 'South_America_num', 'Asia_num', 'Oceania_num', 'Africa_num', 

'EU_ratio', 'NA_ratio', 'SA_ratio', 'AS_ratio', 'AF_ratio', 'OC_ratio', 

'left_num', 'right_num', 'both_num', 'left_ratio', 'right_ratio', 'both_ratio']


#### 1. Count the number of players

Obtain the following attributes:<br>
'player_number', 'num_attack', 'num_defender', 'num_midfield', 'num_goalkeeper'
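
As a compact illustration of what these counts mean, the sketch below builds them with `pd.crosstab` on a made-up table (`pa_demo` is a hypothetical stand-in for `player_appearance`). It is an alternative to the step-by-step cells that follow, assuming the four position labels used in this dataset.

```python
import pandas as pd

# Toy appearances: one row per player per club_game (illustrative values only)
pa_demo = pd.DataFrame({
    "club_game": ["1_100"] * 4 + ["2_100"] * 2,
    "position": ["Attack", "Defender", "Defender", "Goalkeeper", "Attack", "Midfield"],
})

# Rows: club_game, columns: position, cells: number of players
counts = pd.crosstab(pa_demo["club_game"], pa_demo["position"])

# Fix the column order and add any position that never appears
counts = counts.reindex(
    columns=["Attack", "Defender", "Midfield", "Goalkeeper"], fill_value=0
)
counts.columns = ["num_attack", "num_defender", "num_midfield", "num_goalkeeper"]

# Total players per club_game, then one example ratio
counts["player_number"] = counts.sum(axis=1)
counts["attack_ratio"] = counts["num_attack"] / counts["player_number"]
print(counts)
```

Here a zero count is a true zero (no player in that position), so filling missing cells with 0 is appropriate.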

In [None]:
group_by_posi = player_appearance.groupby(['club_game','position'])
posi_count = group_by_posi['player_id'].count().reset_index(name='player_number')

posi_list = ['num_attack','num_defender','num_midfield','num_goalkeeper']
for i in posi_list:
    posi_count[i] = ""
    
def cnt_at(x):
    if x["position"] == "Attack":
        return x["player_number"]

def cnt_df(x):
    if x["position"] == "Defender":
        return x["player_number"]

def cnt_mf(x):
    if x["position"] == "Midfield":
        return x["player_number"]

def cnt_gk(x):
    if x["position"] == "Goalkeeper":
        return x["player_number"]


posi_count.loc[:,'num_attack'] = posi_count.apply(cnt_at,axis=1)
posi_count.loc[:,'num_defender'] = posi_count.apply(cnt_df,axis=1)
posi_count.loc[:,'num_midfield'] = posi_count.apply(cnt_mf,axis=1)
posi_count.loc[:,'num_goalkeeper'] = posi_count.apply(cnt_gk,axis=1)
posi_count

Unnamed: 0,club_game,position,player_number,num_attack,num_defender,num_midfield,num_goalkeeper
0,1002_3052476,Defender,1,,1.0,,
1,1002_3052491,Defender,1,,1.0,,
2,1002_3076559,Defender,1,,1.0,,
3,1002_3076560,Defender,1,,1.0,,
4,1003_2460298,Attack,5,5.0,,,
...,...,...,...,...,...,...,...
288709,995_3589358,Goalkeeper,1,,,,1.0
288710,995_3589358,Midfield,4,,,4.0,
288711,9_3292267,Attack,1,1.0,,,
288712,9_3292267,Midfield,1,,,1.0,


In [None]:
posi_group = posi_count.groupby(['club_game'])
posi_stat = posi_group.sum()
posi_stat.eval('attack_ratio = num_attack / player_number', inplace=True)
posi_stat.eval('defender_ratio = num_defender / player_number', inplace=True)
posi_stat.eval('midfield_ratio = num_midfield / player_number', inplace=True)

posi_stat

Unnamed: 0_level_0,player_number,num_attack,num_defender,num_midfield,num_goalkeeper,attack_ratio,defender_ratio,midfield_ratio
club_game,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1002_3052476,1,0.0,1.0,0.0,0.0,0.000000,1.000000,0.000000
1002_3052491,1,0.0,1.0,0.0,0.0,0.000000,1.000000,0.000000
1002_3076559,1,0.0,1.0,0.0,0.0,0.000000,1.000000,0.000000
1002_3076560,1,0.0,1.0,0.0,0.0,0.000000,1.000000,0.000000
1003_2460298,14,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714
...,...,...,...,...,...,...,...,...
995_3589330,15,4.0,6.0,4.0,1.0,0.266667,0.400000,0.266667
995_3589334,14,4.0,5.0,4.0,1.0,0.285714,0.357143,0.285714
995_3589358,15,4.0,6.0,4.0,1.0,0.266667,0.400000,0.266667
9_3292267,2,1.0,0.0,1.0,0.0,0.500000,0.000000,0.500000


In [None]:
home_result = home_result.merge(posi_stat, how='left', on='club_game')
home_result.rename(columns={'player_number':'h_player_number', 'num_attack':'h_num_attack', 'num_defender':'h_num_defender', 'num_midfield':'h_num_midfield', 'num_goalkeeper':'h_num_goalkeeper', 'attack_ratio':'h_attack_ratio', 'defender_ratio':'h_defender_ratio', 'midfield_ratio':'h_midfield_ratio'},inplace=True)
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143


In [None]:
away_result = away_result.merge(posi_stat, how='left', on='club_game')
away_result.rename(columns={'player_number':'a_player_number', 'num_attack':'a_num_attack', 'num_defender':'a_num_defender', 'num_midfield':'a_num_midfield', 'num_goalkeeper':'a_num_goalkeeper', 'attack_ratio':'a_attack_ratio', 'defender_ratio':'a_defender_ratio', 'midfield_ratio':'a_midfield_ratio'},inplace=True)
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714


#### 2. Average attribute of the team

Group player_appearance by club_game ID to obtain 'avg_age_team' and 'avg_height_team'.
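
A minimal sketch of this grouping on toy data follows (`pa_demo` and `team_stat_demo` are hypothetical names). It uses pandas named aggregation to compute both averages in one pass, which is one possible way to arrive at the same two columns.

```python
import pandas as pd

# Toy appearances with age and height per player (illustrative values only)
pa_demo = pd.DataFrame({
    "club_game": ["1_100", "1_100", "2_100"],
    "age_in_matchday": [24.0, 26.0, 30.0],
    "height_in_cm": [180.0, 190.0, 175.0],
})

# Named aggregation: output_column=(input_column, aggregation)
team_stat_demo = (
    pa_demo.groupby("club_game")
    .agg(
        avg_age_team=("age_in_matchday", "mean"),
        avg_height_team=("height_in_cm", "mean"),
    )
    .reset_index()
)
print(team_stat_demo)
```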

In [None]:
group_team = player_appearance.groupby(['club_game'])
team_age = group_team['age_in_matchday'].mean().reset_index(name='avg_age_team')
team_height = group_team['height_in_cm'].mean().reset_index(name='avg_height_team')
team_stat = team_age.merge(team_height, how='left', on='club_game')
team_stat

Unnamed: 0,club_game,avg_age_team,avg_height_team
0,1002_3052476,22.797260,195.000000
1,1002_3052491,22.778082,195.000000
2,1002_3076559,22.821918,195.000000
3,1002_3076560,22.841096,195.000000
4,1003_2460298,26.335616,182.500000
...,...,...,...
74956,995_3589330,26.688767,180.666667
74957,995_3589334,24.833464,180.571429
74958,995_3589358,26.014977,180.333333
74959,9_3292267,24.820548,180.500000


In [None]:
#merge the result dataframe with the team-level age and height stats dataframe
home_result = home_result.merge(team_stat, how='left', on='club_game')
home_result.rename(columns={'avg_age_team':'h_avg_age_team', 'avg_height_team':'h_avg_height_team'},inplace=True)
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio,h_avg_age_team,h_avg_height_team
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538,25.197471,180.461538
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,22.627397,180.785714
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692,27.648894,181.153846
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,27.850881,182.428571
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,25.655577,181.5


In [None]:
#merge the result dataframe with the team-level age and height stats dataframe
away_result = away_result.merge(team_stat, how='left', on='club_game')
away_result.rename(columns={'avg_age_team':'a_avg_age_team', 'avg_height_team':'a_avg_height_team'},inplace=True)
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio,a_avg_age_team,a_avg_height_team
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,23.325636,181.928571
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,26.525832,183.285714
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286,27.065949,183.857143
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,24.600587,181.0
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,29.161057,179.642857


#### 4. Age attribute by position
Group player_appearance by club_game ID and position.<br>

Obtain the following attributes:
'avg_age_attack', 'avg_age_defender', 'avg_age_midfield', 'avg_age_goalkeeper'
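
One point worth illustrating is how positions a team did not field are represented. In the outputs of this section they appear as 0.000000, which a model could read as an average age of zero. The sketch below, on made-up data (`pa_demo` and `age_wide` are hypothetical names), shows a pivot-based alternative that leaves such cells as NaN instead. Whether NaN or 0 is preferable depends on the downstream model.

```python
import pandas as pd

# Toy appearances (illustrative values only)
pa_demo = pd.DataFrame({
    "club_game": ["1_100", "1_100", "1_100", "2_100"],
    "position": ["Attack", "Attack", "Defender", "Midfield"],
    "age_in_matchday": [22.0, 24.0, 30.0, 27.0],
})

# Mean age per club_game and position, one column per position
age_wide = pa_demo.pivot_table(
    index="club_game", columns="position",
    values="age_in_matchday", aggfunc="mean",
)

# Add absent positions as all-NaN columns and fix the column order
age_wide = age_wide.reindex(columns=["Attack", "Defender", "Midfield", "Goalkeeper"])
age_wide.columns = [
    "avg_age_attack", "avg_age_defender", "avg_age_midfield", "avg_age_goalkeeper"
]
print(age_wide)
```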

In [None]:
group_by_posi = player_appearance.groupby(['club_game','position'])
avg_age = group_by_posi['age_in_matchday'].mean().reset_index(name='avg_age')

age_list = ['avg_age_attack','avg_age_defender','avg_age_midfield','avg_age_goalkeeper']
for i in age_list:
    avg_age[i] = ""
    
def age_at(x):
    if x["position"] == "Attack":
        return x["avg_age"]

def age_df(x):
    if x["position"] == "Defender":
        return x["avg_age"]

def age_mf(x):
    if x["position"] == "Midfield":
        return x["avg_age"]

def age_gk(x):
    if x["position"] == "Goalkeeper":
        return x["avg_age"]


avg_age.loc[:,'avg_age_attack'] = avg_age.apply(age_at,axis=1)
avg_age.loc[:,'avg_age_defender'] = avg_age.apply(age_df,axis=1)
avg_age.loc[:,'avg_age_midfield'] = avg_age.apply(age_mf,axis=1)
avg_age.loc[:,'avg_age_goalkeeper'] = avg_age.apply(age_gk,axis=1)
avg_age.head()

Unnamed: 0,club_game,position,avg_age,avg_age_attack,avg_age_defender,avg_age_midfield,avg_age_goalkeeper
0,1002_3052476,Defender,22.79726,,22.79726,,
1,1002_3052491,Defender,22.778082,,22.778082,,
2,1002_3076559,Defender,22.821918,,22.821918,,
3,1002_3076560,Defender,22.841096,,22.841096,,
4,1003_2460298,Attack,25.269041,25.269041,,,


In [None]:
age_group = avg_age.groupby(['club_game'])
age_stat = age_group.sum()
age_stat

Unnamed: 0_level_0,avg_age,avg_age_attack,avg_age_defender,avg_age_midfield,avg_age_goalkeeper
club_game,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1002_3052476,22.797260,0.000000,22.797260,0.000000,0.000000
1002_3052491,22.778082,0.000000,22.778082,0.000000,0.000000
1002_3076559,22.821918,0.000000,22.821918,0.000000,0.000000
1002_3076560,22.841096,0.000000,22.841096,0.000000,0.000000
1003_2460298,106.705342,25.269041,27.787671,25.851370,27.797260
...,...,...,...,...,...
995_3589330,102.628539,25.470548,29.375799,24.804110,22.978082
995_3589334,97.453425,22.813699,26.830137,24.817808,22.991781
995_3589358,100.111416,22.838356,29.414155,24.842466,23.016438
9_3292267,49.641096,30.838356,0.000000,18.802740,0.000000


In [None]:
#the avg_age column was summed across positions and no longer represents an average age, so drop it
age_stat.drop(columns = ['avg_age'],inplace=True)

In [None]:
#merge the result dataframe with the age stats dataframe
home_result = home_result.merge(age_stat, how='left', on='club_game')
home_result.rename(columns={'avg_age_attack':'h_avg_age_attack', 'avg_age_defender':'h_avg_age_defender', 'avg_age_midfield':'h_avg_age_midfield', 'avg_age_goalkeeper':'h_avg_age_goalkeeper'},inplace=True)
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio,h_avg_age_team,h_avg_height_team,h_avg_age_attack,h_avg_age_defender,h_avg_age_midfield,h_avg_age_goalkeeper
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538,25.197471,180.461538,21.578082,28.562329,22.713699,33.879452
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,22.627397,180.785714,22.262466,24.601096,20.237443,21.753425
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692,27.648894,181.153846,26.969178,27.203425,27.368493,33.271233
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,27.850881,182.428571,25.666849,30.741781,26.262329,33.561644
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,25.655577,181.5,25.896575,25.787671,24.973699,27.572603


In [None]:
away_result = away_result.merge(age_stat, how='left', on='club_game')
away_result.rename(columns={'avg_age_attack':'a_avg_age_attack', 'avg_age_defender':'a_avg_age_defender', 'avg_age_midfield':'a_avg_age_midfield', 'avg_age_goalkeeper':'a_avg_age_goalkeeper'},inplace=True)
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio,a_avg_age_team,a_avg_height_team,a_avg_age_attack,a_avg_age_defender,a_avg_age_midfield,a_avg_age_goalkeeper
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,23.325636,181.928571,20.721096,24.075616,24.66484,28.580822
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,26.525832,183.285714,25.482192,26.853425,27.467123,26.668493
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286,27.065949,183.857143,25.782192,28.646575,24.505023,30.4
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,24.600587,181.0,22.578082,24.973973,24.036712,34.016438
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,29.161057,179.642857,26.285479,31.278082,31.810274,24.473973


#### 5. Height attribute
Group player_appearance by club_game ID and position.<br>

Obtain the following attributes:
'avg_height_attack', 'avg_height_defender', 'avg_height_midfield', 'avg_height_goalkeeper'
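
Collapsing the per-position rows with a grouped `sum()` turns an all-missing group into 0, so a team without a recorded goalkeeper ends up with a goalkeeper height of 0.0. The sketch below uses a made-up frame (`avg_height_demo` is a hypothetical name) to show the difference the `min_count` argument of the grouped `sum` makes. It illustrates an option and is not a change to the notebook's results.

```python
import numpy as np
import pandas as pd

# Toy per-position rows, shaped like the intermediate avg_height table
avg_height_demo = pd.DataFrame({
    "club_game": ["1_100", "1_100", "2_100"],
    "avg_height_attack": [182.0, np.nan, np.nan],
    "avg_height_defender": [np.nan, 185.0, 190.0],
})

# Default behaviour: a group with only NaN values sums to 0.0
zero_filled = avg_height_demo.groupby("club_game").sum()

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN
nan_kept = avg_height_demo.groupby("club_game").sum(min_count=1)

print(zero_filled)
print(nan_kept)
```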


In [None]:
avg_height = group_by_posi['height_in_cm'].mean().reset_index(name='avg_height')

height_list = ['avg_height_attack','avg_height_defender','avg_height_midfield','avg_height_goalkeeper']
for i in height_list:
    avg_height[i] = ""
    
def hgt_at(x):
    if x["position"] == "Attack":
        return x["avg_height"]

def hgt_df(x):
    if x["position"] == "Defender":
        return x["avg_height"]

def hgt_mf(x):
    if x["position"] == "Midfield":
        return x["avg_height"]

def hgt_gk(x):
    if x["position"] == "Goalkeeper":
        return x["avg_height"]


avg_height.loc[:,'avg_height_attack'] = avg_height.apply(hgt_at,axis=1)
avg_height.loc[:,'avg_height_defender'] = avg_height.apply(hgt_df,axis=1)
avg_height.loc[:,'avg_height_midfield'] = avg_height.apply(hgt_mf,axis=1)
avg_height.loc[:,'avg_height_goalkeeper'] = avg_height.apply(hgt_gk,axis=1)

avg_height

Unnamed: 0,club_game,position,avg_height,avg_height_attack,avg_height_defender,avg_height_midfield,avg_height_goalkeeper
0,1002_3052476,Defender,195.00,,195.0,,
1,1002_3052491,Defender,195.00,,195.0,,
2,1002_3076559,Defender,195.00,,195.0,,
3,1002_3076560,Defender,195.00,,195.0,,
4,1003_2460298,Attack,182.00,182.0,,,
...,...,...,...,...,...,...,...
288709,995_3589358,Goalkeeper,196.00,,,,196.0
288710,995_3589358,Midfield,177.75,,,177.75,
288711,9_3292267,Attack,184.00,184.0,,,
288712,9_3292267,Midfield,177.00,,,177.00,


In [None]:
hgt_group = avg_height.groupby(['club_game'])
height_stat = hgt_group.sum()
height_stat.drop(columns = ['avg_height'],inplace=True)
height_stat

Unnamed: 0_level_0,avg_height_attack,avg_height_defender,avg_height_midfield,avg_height_goalkeeper
club_game,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1002_3052476,0.00,195.000000,0.00,0.0
1002_3052491,0.00,195.000000,0.00,0.0
1002_3076559,0.00,195.000000,0.00,0.0
1002_3076560,0.00,195.000000,0.00,0.0
1003_2460298,182.00,183.500000,180.50,189.0
...,...,...,...,...
995_3589330,181.75,179.333333,177.75,196.0
995_3589334,180.50,179.800000,177.75,196.0
995_3589358,180.50,179.333333,177.75,196.0
9_3292267,184.00,0.000000,177.00,0.0


In [None]:
#merge the result dataframe with the height stats dataframe
home_result = home_result.merge(height_stat, how='left', on='club_game')
home_result.rename(columns={'avg_height_attack':'h_avg_height_attack', 'avg_height_defender':'h_avg_height_defender', 'avg_height_midfield':'h_avg_height_midfield', 'avg_height_goalkeeper':'h_avg_height_goalkeeper'},inplace=True)
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio,h_avg_age_team,h_avg_height_team,h_avg_age_attack,h_avg_age_defender,h_avg_age_midfield,h_avg_age_goalkeeper,h_avg_height_attack,h_avg_height_defender,h_avg_height_midfield,h_avg_height_goalkeeper
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538,25.197471,180.461538,21.578082,28.562329,22.713699,33.879452,182.5,181.0,177.166667,194.0
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,22.627397,180.785714,22.262466,24.601096,20.237443,21.753425,178.0,181.6,181.0,190.0
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692,27.648894,181.153846,26.969178,27.203425,27.368493,33.271233,184.0,179.5,179.75,182.0
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,27.850881,182.428571,25.666849,30.741781,26.262329,33.561644,180.8,184.0,180.25,193.0
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,25.655577,181.5,25.896575,25.787671,24.973699,27.572603,183.25,181.5,178.0,192.0


In [None]:
#merge the result dataframe with the height stats dataframe
away_result = away_result.merge(height_stat, how='left', on='club_game')
away_result.rename(columns={'avg_height_attack':'a_avg_height_attack', 'avg_height_defender':'a_avg_height_defender', 'avg_height_midfield':'a_avg_height_midfield', 'avg_height_goalkeeper':'a_avg_height_goalkeeper'},inplace=True)
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio,a_avg_age_team,a_avg_height_team,a_avg_age_attack,a_avg_age_defender,a_avg_age_midfield,a_avg_age_goalkeeper,a_avg_height_attack,a_avg_height_defender,a_avg_height_midfield,a_avg_height_goalkeeper
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,23.325636,181.928571,20.721096,24.075616,24.66484,28.580822,186.4,182.4,173.666667,182.0
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,26.525832,183.285714,25.482192,26.853425,27.467123,26.668493,182.4,183.5,182.25,191.0
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286,27.065949,183.857143,25.782192,28.646575,24.505023,30.4,184.5,184.333333,180.333333,189.0
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,24.600587,181.0,22.578082,24.973973,24.036712,34.016438,178.75,183.0,180.6,184.0
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,29.161057,179.642857,26.285479,31.278082,31.810274,24.473973,180.4,180.25,173.5,198.0


#### 6. Nationality

Group player_appearance by club_game ID and continent.

Obtain the following attributes:
'Europe_num', 'North_America_num', 'South_America_num', 'Asia_num', 'Oceania_num', 'Africa_num'
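
The counts and their ratios can be sketched together with `pd.crosstab`, shown here on made-up data (`pa_demo`, `ctn_num`, and `ctn_ratio` are hypothetical names). With `normalize="index"`, every ratio column is derived from the count column of the same name, so a ratio name cannot be paired with the wrong continent by hand.

```python
import pandas as pd

# Toy appearances with the player's continent (illustrative values only)
pa_demo = pd.DataFrame({
    "club_game": ["1_100"] * 4 + ["2_100"] * 2,
    "continent": ["Europe", "Europe", "Africa", "South America", "Europe", "Europe"],
})

# Number of players per continent for each club_game
ctn_num = pd.crosstab(pa_demo["club_game"], pa_demo["continent"])

# Share of players per continent; each row sums to 1
ctn_ratio = pd.crosstab(pa_demo["club_game"], pa_demo["continent"], normalize="index")

print(ctn_num)
print(ctn_ratio)
```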

In [None]:
group_by_ctn = player_appearance.groupby(['club_game','continent'])
ctn_count = group_by_ctn['player_id'].count().reset_index(name='count')
ctn_list = ['Europe_num','North_America_num','South_America_num','Asia_num','Oceania_num','Africa_num']
for i in ctn_list:
    ctn_count[i] = ""

def eur(x):
    if x["continent"] == "Europe":
        return x["count"]

def afr(x):
    if x["continent"] == "Africa":
        return x["count"]

def nam(x):
    if x["continent"] == "North America":
        return x["count"]

def sam(x):
    if x["continent"] == "South America":
        return x["count"]

def asi(x):
    if x["continent"] == "Asia":
        return x["count"]

def oce(x):
    if x["continent"] == "Oceania":
        return x["count"]

ctn_count.loc[:,'Europe_num'] = ctn_count.apply(eur,axis=1)
ctn_count.loc[:,'North_America_num'] = ctn_count.apply(nam,axis=1)
ctn_count.loc[:,'South_America_num'] = ctn_count.apply(sam,axis=1)
ctn_count.loc[:,'Asia_num'] = ctn_count.apply(asi,axis=1)
ctn_count.loc[:,'Oceania_num'] = ctn_count.apply(oce,axis=1)
ctn_count.loc[:,'Africa_num'] = ctn_count.apply(afr,axis=1)


ctn_count

Unnamed: 0,club_game,continent,count,Europe_num,North_America_num,South_America_num,Asia_num,Oceania_num,Africa_num
0,1002_3052476,Europe,1,1.0,,,,,
1,1002_3052491,Europe,1,1.0,,,,,
2,1002_3076559,Europe,1,1.0,,,,,
3,1002_3076560,Europe,1,1.0,,,,,
4,1003_2460298,Africa,2,,,,,,2.0
...,...,...,...,...,...,...,...,...,...
209655,995_3589358,Africa,2,,,,,,2.0
209656,995_3589358,Europe,11,11.0,,,,,
209657,995_3589358,South America,2,,,2.0,,,
209658,9_3292267,Europe,2,2.0,,,,,


In [None]:
ctn_group = ctn_count.groupby(['club_game'])
ctn_stat = ctn_group.sum()

ctn_stat.eval('EU_ratio = Europe_num / count', inplace=True)
ctn_stat.eval('NA_ratio = North_America_num / count', inplace=True)
ctn_stat.eval('SA_ratio = South_America_num / count', inplace=True)
ctn_stat.eval('AS_ratio = Asia_num / count', inplace=True)
ctn_stat.eval('AF_ratio = Africa_num / count', inplace=True)
ctn_stat.eval('OC_ratio = Oceania_num / count', inplace=True)

ctn_stat.drop(columns = ['count'],inplace=True)
ctn_stat

Unnamed: 0_level_0,Europe_num,North_America_num,South_America_num,Asia_num,Oceania_num,Africa_num,EU_ratio,NA_ratio,SA_ratio,AS_ratio,AF_ratio,OC_ratio
club_game,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
1002_3052476,1.0,0.0,0.0,0.0,0.0,0.0,1.000000,0.000000,0.000000,0.0,0.000000,0.000000
1002_3052491,1.0,0.0,0.0,0.0,0.0,0.0,1.000000,0.000000,0.000000,0.0,0.000000,0.000000
1002_3076559,1.0,0.0,0.0,0.0,0.0,0.0,1.000000,0.000000,0.000000,0.0,0.000000,0.000000
1002_3076560,1.0,0.0,0.0,0.0,0.0,0.0,1.000000,0.000000,0.000000,0.0,0.000000,0.000000
1003_2460298,8.0,2.0,1.0,0.0,1.0,2.0,0.571429,0.142857,0.071429,0.0,0.142857,0.071429
...,...,...,...,...,...,...,...,...,...,...,...,...
995_3589330,11.0,0.0,2.0,0.0,0.0,2.0,0.733333,0.000000,0.133333,0.0,0.133333,0.000000
995_3589334,9.0,0.0,2.0,0.0,0.0,3.0,0.642857,0.000000,0.142857,0.0,0.214286,0.000000
995_3589358,11.0,0.0,2.0,0.0,0.0,2.0,0.733333,0.000000,0.133333,0.0,0.133333,0.000000
9_3292267,2.0,0.0,0.0,0.0,0.0,0.0,1.000000,0.000000,0.000000,0.0,0.000000,0.000000


In [None]:
print(list(ctn_stat))

['Europe_num', 'North_America_num', 'South_America_num', 'Asia_num', 'Oceania_num', 'Africa_num', 'EU_ratio', 'NA_ratio', 'SA_ratio', 'AS_ratio', 'AF_ratio', 'OC_ratio']


When these columns are merged into the result dataframes, they are renamed with an `h_` prefix for the home team and an `a_` prefix for the away team.

In [None]:
#merge the home result dataframe with the continent stats dataframe
home_result = home_result.merge(ctn_stat, how='left', on='club_game')
home_result.rename(columns={'Europe_num':'h_Europe_num', 'North_America_num':'h_North_America_num', 'South_America_num':'h_South_America_num', 'Asia_num':'h_Asia_num', 'Oceania_num':'h_Oceania_num', 'Africa_num':'h_Africa_num', 'EU_ratio':'h_EU_ratio', 'NA_ratio':'h_NA_ratio', 'SA_ratio':'h_SA_ratio', 'AS_ratio':'h_AS_ratio', 'AF_ratio':'h_AF_ratio', 'OC_ratio':'h_OC_ratio'},inplace=True)
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio,h_avg_age_team,h_avg_height_team,h_avg_age_attack,h_avg_age_defender,h_avg_age_midfield,h_avg_age_goalkeeper,h_avg_height_attack,h_avg_height_defender,h_avg_height_midfield,h_avg_height_goalkeeper,h_Europe_num,h_North_America_num,h_South_America_num,h_Asia_num,h_Oceania_num,h_Africa_num,h_EU_ratio,h_NA_ratio,h_SA_ratio,h_AS_ratio,h_AF_ratio,h_OC_ratio
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538,25.197471,180.461538,21.578082,28.562329,22.713699,33.879452,182.5,181.0,177.166667,194.0,10.0,0.0,0.0,0.0,2.0,1.0,0.769231,0.0,0.0,0.0,0.153846,0.076923
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,22.627397,180.785714,22.262466,24.601096,20.237443,21.753425,178.0,181.6,181.0,190.0,8.0,1.0,1.0,0.0,0.0,4.0,0.571429,0.071429,0.071429,0.0,0.0,0.285714
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692,27.648894,181.153846,26.969178,27.203425,27.368493,33.271233,184.0,179.5,179.75,182.0,10.0,0.0,3.0,0.0,0.0,0.0,0.769231,0.0,0.230769,0.0,0.0,0.0
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,27.850881,182.428571,25.666849,30.741781,26.262329,33.561644,180.8,184.0,180.25,193.0,4.0,0.0,10.0,0.0,0.0,0.0,0.285714,0.0,0.714286,0.0,0.0,0.0
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,25.655577,181.5,25.896575,25.787671,24.973699,27.572603,183.25,181.5,178.0,192.0,10.0,0.0,4.0,0.0,0.0,0.0,0.714286,0.0,0.285714,0.0,0.0,0.0


In [None]:
#merge the away result dataframe with the continent stats dataframe
away_result = away_result.merge(ctn_stat, how='left', on='club_game')
away_result.rename(columns={'Europe_num':'a_Europe_num', 'North_America_num':'a_North_America_num', 'South_America_num':'a_South_America_num', 'Asia_num':'a_Asia_num', 'Oceania_num':'a_Oceania_num', 'Africa_num':'a_Africa_num', 'EU_ratio':'a_EU_ratio', 'NA_ratio':'a_NA_ratio', 'SA_ratio':'a_SA_ratio', 'AS_ratio':'a_AS_ratio', 'AF_ratio':'a_AF_ratio', 'OC_ratio':'a_OC_ratio'},inplace=True)
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio,a_avg_age_team,a_avg_height_team,a_avg_age_attack,a_avg_age_defender,a_avg_age_midfield,a_avg_age_goalkeeper,a_avg_height_attack,a_avg_height_defender,a_avg_height_midfield,a_avg_height_goalkeeper,a_Europe_num,a_North_America_num,a_South_America_num,a_Asia_num,a_Oceania_num,a_Africa_num,a_EU_ratio,a_NA_ratio,a_SA_ratio,a_AS_ratio,a_AF_ratio,a_OC_ratio
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,23.325636,181.928571,20.721096,24.075616,24.66484,28.580822,186.4,182.4,173.666667,182.0,13.0,0.0,0.0,0.0,0.0,1.0,0.928571,0.0,0.0,0.0,0.0,0.071429
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,26.525832,183.285714,25.482192,26.853425,27.467123,26.668493,182.4,183.5,182.25,191.0,9.0,0.0,2.0,0.0,0.0,3.0,0.642857,0.0,0.142857,0.0,0.0,0.214286
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286,27.065949,183.857143,25.782192,28.646575,24.505023,30.4,184.5,184.333333,180.333333,189.0,9.0,1.0,4.0,0.0,0.0,0.0,0.642857,0.071429,0.285714,0.0,0.0,0.0
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,24.600587,181.0,22.578082,24.973973,24.036712,34.016438,178.75,183.0,180.6,184.0,6.0,0.0,5.0,0.0,0.0,3.0,0.428571,0.0,0.357143,0.0,0.0,0.214286
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,29.161057,179.642857,26.285479,31.278082,31.810274,24.473973,180.4,180.25,173.5,198.0,10.0,1.0,0.0,0.0,0.0,3.0,0.714286,0.071429,0.0,0.0,0.0,0.214286
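
The hand-written rename dictionaries can be replaced by `DataFrame.add_prefix`, which renames every column at once. This is a small sketch under assumptions: the frames `stats` and `home` are invented toy data that only mimic the shape of `ctn_stat` and `home_result`.

```python
import pandas as pd

# Toy stats table indexed by club_game, like ctn_stat (hypothetical values).
stats = pd.DataFrame(
    {"Europe_num": [10.0], "EU_ratio": [0.75]},
    index=pd.Index(["1_10"], name="club_game"),
)
# Toy result table with club_game as a regular column, like home_result.
home = pd.DataFrame({"club_game": ["1_10"], "game_id": [10]})

# add_prefix renames all stats columns; the club_game index is left untouched.
home_stats = stats.add_prefix("h_")

# 'on' may name an index level on the right-hand side, as in the notebook's merges.
home = home.merge(home_stats, how="left", on="club_game")
print(list(home))
```

The same call with `"a_"` would give the away-team column names.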


#### 7. Foot of the team

Group `player_appearance` by `club_game` and `foot` to obtain the following attributes:

'left_num', 'right_num', 'both_num', 'left_ratio', 'right_ratio', 'both_ratio'

In [None]:
# count the players of each club_game by preferred foot
group_by_ft = player_appearance.groupby(['club_game','foot'])
ft_count = group_by_ft['player_id'].count().reset_index(name='count')

# one column per foot: a row keeps its count only in the column matching its foot,
# and is NaN in the other two
foot_columns = {'left_num': 'Left', 'right_num': 'Right', 'both_num': 'Both'}
for column, foot in foot_columns.items():
    ft_count[column] = ft_count['count'].where(ft_count['foot'] == foot)

ft_count

Unnamed: 0,club_game,foot,count,left_num,right_num,both_num
0,1002_3052476,Right,1,,1.0,
1,1002_3052491,Right,1,,1.0,
2,1002_3076559,Right,1,,1.0,
3,1002_3076560,Right,1,,1.0,
4,1003_2460298,Left,4,4.0,,
...,...,...,...,...,...,...
178598,995_3589358,Both,1,,,1.0
178599,995_3589358,Left,4,4.0,,
178600,995_3589358,Right,10,,10.0,
178601,9_3292267,Right,2,,2.0,


In [None]:
ft_group = ft_count.groupby(['club_game'])
ft_stat = ft_group.sum()

ft_stat.eval('left_ratio = left_num / count', inplace=True)
ft_stat.eval('right_ratio = right_num / count', inplace=True)
ft_stat.eval('both_ratio = both_num / count', inplace=True)
ft_stat.drop(columns = ['count'],inplace=True)
ft_stat

club_game,left_num,right_num,both_num,left_ratio,right_ratio,both_ratio
1002_3052476,0.0,1.0,0.0,0.000000,1.000000,0.000000
1002_3052491,0.0,1.0,0.0,0.000000,1.000000,0.000000
1002_3076559,0.0,1.0,0.0,0.000000,1.000000,0.000000
1002_3076560,0.0,1.0,0.0,0.000000,1.000000,0.000000
1003_2460298,4.0,10.0,0.0,0.285714,0.714286,0.000000
...,...,...,...,...,...,...
995_3589330,4.0,10.0,1.0,0.266667,0.666667,0.066667
995_3589334,3.0,10.0,1.0,0.214286,0.714286,0.071429
995_3589358,4.0,10.0,1.0,0.266667,0.666667,0.066667
9_3292267,0.0,2.0,0.0,0.000000,1.000000,0.000000
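
A quick sanity check on a table like `ft_stat` is that the three ratios of each row add up to 1. The sketch below runs that check on invented toy data; `toy_stat` is a hypothetical stand-in, not the notebook's dataframe.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ft_stat before the ratio columns are added (hypothetical values).
toy_stat = pd.DataFrame(
    {"left_num": [4.0, 0.0], "right_num": [10.0, 2.0], "both_num": [1.0, 0.0]},
    index=pd.Index(["995_1", "9_2"], name="club_game"),
)

# Total players per club_game, computed before any ratio column exists.
total = toy_stat.sum(axis=1)
for side in ["left", "right", "both"]:
    toy_stat[f"{side}_ratio"] = toy_stat[f"{side}_num"] / total

# The ratios of each row should sum to 1 up to floating-point error.
ratio_sum = toy_stat[["left_ratio", "right_ratio", "both_ratio"]].sum(axis=1)
ratios_ok = bool(np.isclose(ratio_sum, 1.0).all())
print(ratios_ok)
```

A row whose ratios do not sum to 1 would point to players with a missing `foot` value or to a mismatched numerator.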


In [None]:
print(list(ft_stat))

['left_num', 'right_num', 'both_num', 'left_ratio', 'right_ratio', 'both_ratio']


As before, these columns are renamed with an `h_` prefix for the home team and an `a_` prefix for the away team when they are merged into the result dataframes.

In [None]:
#merge the home result dataframe with the foot stats dataframe
home_result = home_result.merge(ft_stat, how='left', on='club_game')
home_result.rename(columns={'left_num':'h_left_num', 'right_num':'h_right_num', 'both_num':'h_both_num', 'left_ratio':'h_left_ratio', 'right_ratio':'h_right_ratio', 'both_ratio':'h_both_ratio'},inplace=True)
home_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio,h_avg_age_team,h_avg_height_team,h_avg_age_attack,h_avg_age_defender,h_avg_age_midfield,h_avg_age_goalkeeper,h_avg_height_attack,h_avg_height_defender,h_avg_height_midfield,h_avg_height_goalkeeper,h_Europe_num,h_North_America_num,h_South_America_num,h_Asia_num,h_Oceania_num,h_Africa_num,h_EU_ratio,h_NA_ratio,h_SA_ratio,h_AS_ratio,h_AF_ratio,h_OC_ratio,h_left_num,h_right_num,h_both_num,h_left_ratio,h_right_ratio,h_both_ratio
0,1269_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538,25.197471,180.461538,21.578082,28.562329,22.713699,33.879452,182.5,181.0,177.166667,194.0,10.0,0.0,0.0,0.0,2.0,1.0,0.769231,0.0,0.0,0.0,0.153846,0.076923,3.0,8.0,2.0,0.230769,0.615385,0.153846
1,58_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,22.627397,180.785714,22.262466,24.601096,20.237443,21.753425,178.0,181.6,181.0,190.0,8.0,1.0,1.0,0.0,0.0,4.0,0.571429,0.071429,0.071429,0.0,0.0,0.285714,4.0,9.0,1.0,0.285714,0.642857,0.071429
2,418_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692,27.648894,181.153846,26.969178,27.203425,27.368493,33.271233,184.0,179.5,179.75,182.0,10.0,0.0,3.0,0.0,0.0,0.0,0.769231,0.0,0.230769,0.0,0.0,0.0,5.0,6.0,2.0,0.384615,0.461538,0.153846
3,294_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,27.850881,182.428571,25.666849,30.741781,26.262329,33.561644,180.8,184.0,180.25,193.0,4.0,0.0,10.0,0.0,0.0,0.0,0.285714,0.0,0.714286,0.0,0.0,0.0,3.0,11.0,0.0,0.214286,0.785714,0.0
4,583_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,25.655577,181.5,25.896575,25.787671,24.973699,27.572603,183.25,181.5,178.0,192.0,10.0,0.0,4.0,0.0,0.0,0.0,0.714286,0.0,0.285714,0.0,0.0,0.0,2.0,10.0,2.0,0.142857,0.714286,0.142857


In [None]:
#merge the away result dataframe with the foot stats dataframe
away_result = away_result.merge(ft_stat, how='left', on='club_game')
away_result.rename(columns={'left_num':'a_left_num', 'right_num':'a_right_num', 'both_num':'a_both_num', 'left_ratio':'a_left_ratio', 'right_ratio':'a_right_ratio', 'both_ratio':'a_both_ratio'},inplace=True)
away_result.head()

Unnamed: 0,club_game,result,game_id,competition_code,season,round,date,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio,a_avg_age_team,a_avg_height_team,a_avg_age_attack,a_avg_age_defender,a_avg_age_midfield,a_avg_age_goalkeeper,a_avg_height_attack,a_avg_height_defender,a_avg_height_midfield,a_avg_height_goalkeeper,a_Europe_num,a_North_America_num,a_South_America_num,a_Asia_num,a_Oceania_num,a_Africa_num,a_EU_ratio,a_NA_ratio,a_SA_ratio,a_AS_ratio,a_AF_ratio,a_OC_ratio,a_left_num,a_right_num,a_both_num,a_left_ratio,a_right_ratio,a_both_ratio
0,610_2457642,homewin,2457642,NLSC,2014,Final,2014-08-03,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,23.325636,181.928571,20.721096,24.075616,24.66484,28.580822,186.4,182.4,173.666667,182.0,13.0,0.0,0.0,0.0,0.0,1.0,0.928571,0.0,0.0,0.0,0.0,0.071429,5.0,8.0,1.0,0.357143,0.571429,0.071429
1,498_2639088,homewin,2639088,BESC,2013,Final,2014-07-20,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,26.525832,183.285714,25.482192,26.853425,27.467123,26.668493,182.4,183.5,182.25,191.0,9.0,0.0,2.0,0.0,0.0,3.0,0.642857,0.0,0.142857,0.0,0.0,0.214286,2.0,12.0,0.0,0.142857,0.857143,0.0
2,13_2481145,draw,2481145,SUC,2014,final 1st leg,2014-08-19,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286,27.065949,183.857143,25.782192,28.646575,24.505023,30.4,184.5,184.333333,180.333333,189.0,9.0,1.0,4.0,0.0,0.0,0.0,0.642857,0.071429,0.285714,0.0,0.0,0.0,3.0,10.0,1.0,0.214286,0.714286,0.071429
3,2425_2484338,homewin,2484338,POSU,2014,Final,2014-08-10,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,24.600587,181.0,22.578082,24.973973,24.036712,34.016438,178.75,183.0,180.6,184.0,6.0,0.0,5.0,0.0,0.0,3.0,0.428571,0.0,0.357143,0.0,0.0,0.214286,3.0,11.0,0.0,0.214286,0.785714,0.0
4,855_2502472,homewin,2502472,FRCH,2014,Final,2014-08-02,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,29.161057,179.642857,26.285479,31.278082,31.810274,24.473973,180.4,180.25,173.5,198.0,10.0,1.0,0.0,0.0,0.0,3.0,0.714286,0.071429,0.0,0.0,0.0,0.214286,2.0,10.0,2.0,0.142857,0.714286,0.142857


In [None]:
print(list(away_result))

['club_game', 'result', 'game_id', 'competition_code', 'season', 'round', 'date', 'away_club_id', 'away_club_goals', 'away_club_position', 'a_player_number', 'a_num_attack', 'a_num_defender', 'a_num_midfield', 'a_num_goalkeeper', 'a_attack_ratio', 'a_defender_ratio', 'a_midfield_ratio', 'a_avg_age_team', 'a_avg_height_team', 'a_avg_age_attack', 'a_avg_age_defender', 'a_avg_age_midfield', 'a_avg_age_goalkeeper', 'a_avg_height_attack', 'a_avg_height_defender', 'a_avg_height_midfield', 'a_avg_height_goalkeeper', 'a_Europe_num', 'a_North_America_num', 'a_South_America_num', 'a_Asia_num', 'a_Oceania_num', 'a_Africa_num', 'a_EU_ratio', 'a_NA_ratio', 'a_SA_ratio', 'a_AS_ratio', 'a_AF_ratio', 'a_OC_ratio', 'a_left_num', 'a_right_num', 'a_both_num', 'a_left_ratio', 'a_right_ratio', 'a_both_ratio']


In [None]:
away_result.drop(columns = ['result', 'competition_code', 'season', 'round', 'date'],inplace=True)

#### Merge the home and away dataframes

In [None]:
result_new = home_result.merge(away_result, how='inner', on='game_id')
result_new.drop(columns = ['club_game_x','club_game_y'],inplace=True)
result_new.head()

Unnamed: 0,result,game_id,competition_code,season,round,date,home_club_id,home_club_goals,home_club_position,h_player_number,h_num_attack,h_num_defender,h_num_midfield,h_num_goalkeeper,h_attack_ratio,h_defender_ratio,h_midfield_ratio,h_avg_age_team,h_avg_height_team,h_avg_age_attack,h_avg_age_defender,h_avg_age_midfield,h_avg_age_goalkeeper,h_avg_height_attack,h_avg_height_defender,h_avg_height_midfield,h_avg_height_goalkeeper,h_Europe_num,h_North_America_num,h_South_America_num,h_Asia_num,h_Oceania_num,h_Africa_num,h_EU_ratio,h_NA_ratio,h_SA_ratio,h_AS_ratio,h_AF_ratio,h_OC_ratio,h_left_num,h_right_num,h_both_num,h_left_ratio,h_right_ratio,h_both_ratio,away_club_id,away_club_goals,away_club_position,a_player_number,a_num_attack,a_num_defender,a_num_midfield,a_num_goalkeeper,a_attack_ratio,a_defender_ratio,a_midfield_ratio,a_avg_age_team,a_avg_height_team,a_avg_age_attack,a_avg_age_defender,a_avg_age_midfield,a_avg_age_goalkeeper,a_avg_height_attack,a_avg_height_defender,a_avg_height_midfield,a_avg_height_goalkeeper,a_Europe_num,a_North_America_num,a_South_America_num,a_Asia_num,a_Oceania_num,a_Africa_num,a_EU_ratio,a_NA_ratio,a_SA_ratio,a_AS_ratio,a_AF_ratio,a_OC_ratio,a_left_num,a_right_num,a_both_num,a_left_ratio,a_right_ratio,a_both_ratio
0,homewin,2457642,NLSC,2014,Final,2014-08-03,1269,1,0.0,13.0,2.0,4.0,6.0,1.0,0.153846,0.307692,0.461538,25.197471,180.461538,21.578082,28.562329,22.713699,33.879452,182.5,181.0,177.166667,194.0,10.0,0.0,0.0,0.0,2.0,1.0,0.769231,0.0,0.0,0.0,0.153846,0.076923,3.0,8.0,2.0,0.230769,0.615385,0.153846,610,0,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,23.325636,181.928571,20.721096,24.075616,24.66484,28.580822,186.4,182.4,173.666667,182.0,13.0,0.0,0.0,0.0,0.0,1.0,0.928571,0.0,0.0,0.0,0.0,0.071429,5.0,8.0,1.0,0.357143,0.571429,0.071429
1,homewin,2639088,BESC,2013,Final,2014-07-20,58,2,0.0,14.0,5.0,5.0,3.0,1.0,0.357143,0.357143,0.214286,22.627397,180.785714,22.262466,24.601096,20.237443,21.753425,178.0,181.6,181.0,190.0,8.0,1.0,1.0,0.0,0.0,4.0,0.571429,0.071429,0.071429,0.0,0.0,0.285714,4.0,9.0,1.0,0.285714,0.642857,0.071429,498,1,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,26.525832,183.285714,25.482192,26.853425,27.467123,26.668493,182.4,183.5,182.25,191.0,9.0,0.0,2.0,0.0,0.0,3.0,0.642857,0.0,0.142857,0.0,0.0,0.214286,2.0,12.0,0.0,0.142857,0.857143,0.0
2,draw,2481145,SUC,2014,final 1st leg,2014-08-19,418,1,0.0,13.0,4.0,4.0,4.0,1.0,0.307692,0.307692,0.307692,27.648894,181.153846,26.969178,27.203425,27.368493,33.271233,184.0,179.5,179.75,182.0,10.0,0.0,3.0,0.0,0.0,0.0,0.769231,0.0,0.230769,0.0,0.0,0.0,5.0,6.0,2.0,0.384615,0.461538,0.153846,13,1,0.0,14.0,4.0,6.0,3.0,1.0,0.285714,0.428571,0.214286,27.065949,183.857143,25.782192,28.646575,24.505023,30.4,184.5,184.333333,180.333333,189.0,9.0,1.0,4.0,0.0,0.0,0.0,0.642857,0.071429,0.285714,0.0,0.0,0.0,3.0,10.0,1.0,0.214286,0.714286,0.071429
3,homewin,2484338,POSU,2014,Final,2014-08-10,294,3,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,27.850881,182.428571,25.666849,30.741781,26.262329,33.561644,180.8,184.0,180.25,193.0,4.0,0.0,10.0,0.0,0.0,0.0,0.285714,0.0,0.714286,0.0,0.0,0.0,3.0,11.0,0.0,0.214286,0.785714,0.0,2425,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,24.600587,181.0,22.578082,24.973973,24.036712,34.016438,178.75,183.0,180.6,184.0,6.0,0.0,5.0,0.0,0.0,3.0,0.428571,0.0,0.357143,0.0,0.0,0.214286,3.0,11.0,0.0,0.214286,0.785714,0.0
4,homewin,2502472,FRCH,2014,Final,2014-08-02,583,2,0.0,14.0,4.0,4.0,5.0,1.0,0.285714,0.285714,0.357143,25.655577,181.5,25.896575,25.787671,24.973699,27.572603,183.25,181.5,178.0,192.0,10.0,0.0,4.0,0.0,0.0,0.0,0.714286,0.0,0.285714,0.0,0.0,0.0,2.0,10.0,2.0,0.142857,0.714286,0.142857,855,0,0.0,14.0,5.0,4.0,4.0,1.0,0.357143,0.285714,0.285714,29.161057,179.642857,26.285479,31.278082,31.810274,24.473973,180.4,180.25,173.5,198.0,10.0,1.0,0.0,0.0,0.0,3.0,0.714286,0.071429,0.0,0.0,0.0,0.214286,2.0,10.0,2.0,0.142857,0.714286,0.142857
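
The merge on `game_id` assumes each game appears exactly once on the home side and once on the away side. A hedged sketch of how that assumption could be enforced with the `validate` argument of `DataFrame.merge` follows; the frames `home`, `away` and `dup` are invented toy data.

```python
import pandas as pd

# Toy home and away tables, one row per game (hypothetical values).
home = pd.DataFrame({"game_id": [10, 11], "club_game": ["1_10", "3_11"],
                     "h_player_number": [14.0, 13.0]})
away = pd.DataFrame({"game_id": [10, 11], "club_game": ["2_10", "4_11"],
                     "a_player_number": [14.0, 15.0]})

# validate="one_to_one" raises MergeError if game_id repeats on either side.
merged = home.merge(away, how="inner", on="game_id",
                    validate="one_to_one", suffixes=("_home", "_away"))
merged = merged.drop(columns=["club_game_home", "club_game_away"])
print(list(merged))

# A duplicated away row is rejected instead of silently duplicating the game.
dup = pd.concat([away, away.iloc[[0]]])
try:
    home.merge(dup, on="game_id", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```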


In [None]:
list(result_new)

['result',
 'game_id',
 'competition_code',
 'season',
 'round',
 'date',
 'home_club_id',
 'home_club_goals',
 'home_club_position',
 'h_player_number',
 'h_num_attack',
 'h_num_defender',
 'h_num_midfield',
 'h_num_goalkeeper',
 'h_attack_ratio',
 'h_defender_ratio',
 'h_midfield_ratio',
 'h_avg_age_team',
 'h_avg_height_team',
 'h_avg_age_attack',
 'h_avg_age_defender',
 'h_avg_age_midfield',
 'h_avg_age_goalkeeper',
 'h_avg_height_attack',
 'h_avg_height_defender',
 'h_avg_height_midfield',
 'h_avg_height_goalkeeper',
 'h_Europe_num',
 'h_North_America_num',
 'h_South_America_num',
 'h_Asia_num',
 'h_Oceania_num',
 'h_Africa_num',
 'h_EU_ratio',
 'h_NA_ratio',
 'h_SA_ratio',
 'h_AS_ratio',
 'h_AF_ratio',
 'h_OC_ratio',
 'h_left_num',
 'h_right_num',
 'h_both_num',
 'h_left_ratio',
 'h_right_ratio',
 'h_both_ratio',
 'away_club_id',
 'away_club_goals',
 'away_club_position',
 'a_player_number',
 'a_num_attack',
 'a_num_defender',
 'a_num_midfield',
 'a_num_goalkeeper',
 'a_attack_ra

#### 2. Exceptional Data Dropping
A team can only have between 11 and 16 players appear in a game (11 starters plus substitutes), so rows with any other player count should be dropped.

In [None]:
result_new.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 43161 entries, 0 to 43160
Data columns (total 84 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   result                   43161 non-null  object 
 1   game_id                  43161 non-null  int64  
 2   competition_code         43161 non-null  object 
 3   season                   43161 non-null  int64  
 4   round                    43161 non-null  object 
 5   date                     43161 non-null  object 
 6   home_club_id             43161 non-null  int64  
 7   home_club_goals          43161 non-null  int64  
 8   home_club_position       43161 non-null  float64
 9   h_player_number          37176 non-null  float64
 10  h_num_attack             37176 non-null  float64
 11  h_num_defender           37176 non-null  float64
 12  h_num_midfield           37176 non-null  float64
 13  h_num_goalkeeper         37176 non-null  float64
 14  h_attack_ratio        

In [None]:
result_new['h_player_number'].value_counts()

14.0    24272
13.0     4217
15.0     2983
16.0     2257
1.0      1316
12.0      972
2.0       460
3.0       202
11.0      183
17.0       90
4.0        68
10.0       48
5.0        25
9.0        22
18.0       18
6.0        14
7.0        13
8.0         8
19.0        4
20.0        2
29.0        1
24.0        1
Name: h_player_number, dtype: int64

In [None]:
#drop games with an implausible player count (<11 or >16), no goalkeeper, or missing player information
result_new.drop(result_new[result_new.h_player_number < 11].index, inplace=True)
result_new.drop(result_new[result_new.h_player_number > 16].index, inplace=True)
result_new.drop(result_new[result_new.h_num_goalkeeper < 1].index, inplace=True)
result_new.drop(result_new[result_new.a_player_number < 11].index, inplace=True)
result_new.drop(result_new[result_new.a_player_number > 16].index, inplace=True)
result_new.drop(result_new[result_new.a_num_goalkeeper < 1].index, inplace=True)
result_new.dropna(subset=['h_player_number'],inplace=True)
result_new.dropna(subset=['a_player_number'],inplace=True)
result_new.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32993 entries, 0 to 43160
Data columns (total 84 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   result                   32993 non-null  object 
 1   game_id                  32993 non-null  int64  
 2   competition_code         32993 non-null  object 
 3   season                   32993 non-null  int64  
 4   round                    32993 non-null  object 
 5   date                     32993 non-null  object 
 6   home_club_id             32993 non-null  int64  
 7   home_club_goals          32993 non-null  int64  
 8   home_club_position       32993 non-null  float64
 9   h_player_number          32993 non-null  float64
 10  h_num_attack             32993 non-null  float64
 11  h_num_defender           32993 non-null  float64
 12  h_num_midfield           32993 non-null  float64
 13  h_num_goalkeeper         32993 non-null  float64
 14  h_attack_ratio        
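
The same filter can be written as a single boolean mask, which avoids modifying the dataframe several times in place. This is a sketch on invented toy data that only reuses the notebook's column names; it is an alternative formulation, not the code that produced the output above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for result_new (hypothetical values).
toy = pd.DataFrame({
    "h_player_number": [14.0, 1.0, np.nan, 17.0, 13.0],
    "a_player_number": [14.0, 14.0, 14.0, 14.0, 16.0],
    "h_num_goalkeeper": [1.0, 0.0, np.nan, 1.0, 1.0],
    "a_num_goalkeeper": [1.0, 1.0, 1.0, 1.0, 1.0],
})

# between() is inclusive on both ends and False for NaN,
# so it also removes rows without player information.
valid = (
    toy["h_player_number"].between(11, 16)
    & toy["a_player_number"].between(11, 16)
    # ~(x < 1) drops only rows with a known goalkeeper count below 1.
    & ~(toy["h_num_goalkeeper"] < 1)
    & ~(toy["a_num_goalkeeper"] < 1)
)
filtered = toy[valid]
print(filtered)
```

In this toy frame only rows 0 and 4 survive: row 1 has a single player, row 2 has no player information, and row 3 lists 17 players.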

In [None]:
result_new.to_csv("./result_new.csv")

In [None]:
import pandas as pd
result_new = pd.read_csv("./result_new.csv")
print(list(result_new))

['Unnamed: 0', 'result', 'game_id', 'competition_code', 'season', 'round', 'date', 'home_club_id', 'home_club_goals', 'home_club_position', 'h_player_number', 'h_num_attack', 'h_num_defender', 'h_num_midfield', 'h_num_goalkeeper', 'h_attack_ratio', 'h_defender_ratio', 'h_midfield_ratio', 'h_avg_age_team', 'h_avg_height_team', 'h_avg_age_attack', 'h_avg_age_defender', 'h_avg_age_midfield', 'h_avg_age_goalkeeper', 'h_avg_height_attack', 'h_avg_height_defender', 'h_avg_height_midfield', 'h_avg_height_goalkeeper', 'h_Europe_num', 'h_North_America_num', 'h_South_America_num', 'h_Asia_num', 'h_Oceania_num', 'h_Africa_num', 'h_EU_ratio', 'h_NA_ratio', 'h_SA_ratio', 'h_AS_ratio', 'h_AF_ratio', 'h_OC_ratio', 'h_left_num', 'h_right_num', 'h_both_num', 'h_left_ratio', 'h_right_ratio', 'h_both_ratio', 'away_club_id', 'away_club_goals', 'away_club_position', 'a_player_number', 'a_num_attack', 'a_num_defender', 'a_num_midfield', 'a_num_goalkeeper', 'a_attack_ratio', 'a_defender_ratio', 'a_midfield_r
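
The `Unnamed: 0` column in the list above is the old row index, which `to_csv` writes by default and `read_csv` then reads back as an ordinary column. A minimal sketch of the difference, using an invented toy frame and an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# Toy frame (hypothetical values).
toy = pd.DataFrame({"game_id": [10, 11], "result": ["homewin", "draw"]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

# index=False: only the data columns are written.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index))
print(list(without_index))
```

Passing `index=False` when saving, or `index_col=0` when loading, keeps the extra column out of later steps.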

In [None]:
import pandas as pd
result = pd.read_csv("./result.csv")
print(list(result))

['Unnamed: 0', 'club_game', 'result', 'game_id', 'competition_code', 'season', 'round', 'date', 'club_id', 'club_goals', 'club_position', 'location', 'player_number', 'num_attack', 'num_defender', 'num_midfield', 'num_goalkeeper', 'attack_ratio', 'defender_ratio', 'midfield_ratio', 'avg_age_team', 'avg_height_team', 'avg_age_attack', 'avg_age_defender', 'avg_age_midfield', 'avg_age_goalkeeper', 'avg_height_attack', 'avg_height_defender', 'avg_height_midfield', 'avg_height_goalkeeper', 'Europe_num', 'North_America_num', 'South_America_num', 'Asia_num', 'Oceania_num', 'Africa_num', 'EU_ratio', 'NA_ratio', 'SA_ratio', 'AS_ratio', 'AF_ratio', 'OC_ratio', 'left_num', 'right_num', 'both_num', 'left_ratio', 'right_ratio', 'both_ratio']

