<h1><center>Reverse Engineering the Overall Ratings in FIFA 20</center></h1>
<p>We are going to train a model to predict the overall rating of players from EA Sports' FIFA 20 video game. The data used in this notebook was scraped with Scrapy; see the other notebooks to scrape your own data.</p><p>We will predict each player's overall and potential ratings from their attributes. From personal experience playing the game, the overall rating also depends on the player's position, while the potential rating depends strongly on the player's age.</p>

### Imports

In [1]:
# These will help us import the json lines file and create a DataFrame using Pandas

import json
from collections import OrderedDict
import pandas as pd
from datetime import datetime

# Feature Engineering
import re

# Machine Learning Libraries


### Load the data and create a DataFrame

In [2]:
cd data

/Users/rfelix/Desktop/Projects/GitHub/sofifa_data/data


In [3]:
# Load the sofi_stats.jl file, save each line as an ordered dictionary, 
# each of which will be an element of a list called data

data=[]
with open('sofi_stats.jl') as stats:
    for line in stats:
        data.append(json.loads(line,object_pairs_hook=OrderedDict)) # OrderedDict keeps our keys in order here

In [4]:
cd ..

/Users/rfelix/Desktop/Projects/GitHub/sofifa_data


In [5]:
# Create the DataFrame and check it out

df = pd.DataFrame(data)

pd.set_option('display.max_columns', None) # We want to see all the columns!

df.head()

Unnamed: 0,name,id,full_name,country,positions,age,dob,height,weight,overall,potential,value,wage,preferred_foot,int_rep,weak_foot,skill_moves,work_rate,body_type,real_face,release_clause,club,club_rating,club_position,jersey_number,national_team,nt_rating,nt_position,nt_jersey,crossing,finishing,heading accuracy,short passing,volleys,dribbling,curve,fk accuracy,long passing,ball control,acceleration,sprint speed,agility,reactions,balance,shot power,jumping,stamina,strength,long shots,aggression,interceptions,positioning,vision,penalties,composure,defensive awareness,standing tackle,sliding tackle,gk diving,gk handling,gk kicking,gk positioning,gk reflexes,traits
0,E. Camavinga,248243,Eduardo Camavinga,France,[CM],16,"Nov 10, 2002",72,150,70,90,€3.5M,€3K,Left,1,3,3,High/ High,Lean,No,€10.1M,Stade Rennais FC,75,LCM,18,,,,,59,52,55,72,51,73,64,48,71,73,73,70,74,68,70,70,62,65,51,56,74,67,63,68,52,74,66,69,68,12,6,8,12,12,[Long Passer (CPU AI Only)]
1,E. Håland,239085,Erling Braut Håland,Norway,[ST],18,"Jul 21, 2000",76,192,77,88,€14.5M,€16K,Left,1,3,3,High/ Medium,Normal,No,€24.7M,Norway,76,LS,23,FC Red Bull Salzburg,73.0,LS,2023.0,46,81,67,69,72,72,69,62,49,76,79,89,76,75,65,78,70,75,85,67,75,35,77,62,80,80,38,31,15,7,14,13,11,7,"[Finesse Shot, Speed Dribbler (CPU AI Only)]"
2,F. Valverde,239053,Federico Valverde,Uruguay,"[CM, LM, CDM]",20,"Jul 22, 1998",72,172,79,87,€18.5M,€105K,Right,1,3,3,High/ High,Normal,Yes,€41.6M,Real Madrid,86,SUB,15,,,,,59,67,59,81,47,76,68,56,83,80,72,75,69,79,67,77,54,80,77,77,77,75,68,79,48,80,70,73,59,6,10,6,15,8,[]
3,M. Rashica,229167,Milot Rashica,Kosovo,"[LW, CF, RW]",23,"Jun 28, 1996",70,161,80,85,€20M,€34K,Right,1,4,4,High/ Medium,Lean,No,€35.5M,SV Werder Bremen,77,LW,7,,,,,74,78,53,76,77,85,78,69,69,81,94,84,90,78,77,85,47,76,58,79,67,47,77,73,71,78,55,37,36,7,14,7,14,6,"[Power Free-Kick, Long Shot Taker (CPU AI Only..."
4,R. Sterling,202652,Raheem Sterling,England,"[LW, RW]",24,"Dec 8, 1994",67,152,89,91,€82.5M,€265K,Right,3,3,4,High/ Medium,Lean,Yes,€158.8M,Manchester City,86,LW,7,England,82.0,LW,7.0,78,86,38,84,67,90,77,63,69,88,96,91,94,90,94,78,57,79,56,79,38,30,92,82,69,80,47,53,47,15,12,12,15,9,"[Flair, Speed Dribbler (CPU AI Only)]"


## Considerations for our models
When building these models, it makes sense to train an individual model for each position rather than one-hot encode the position and fit a single model for everyone. In my experience playing the game, some attributes have no effect on the overall rating for a given position. For example, maxing out the GK attributes for a striker won't change their overall rating at all.

For simplicity's sake, we will use the first element of each player's list of positions, as this is their favored/preferred position according to the game. It seems safe to assume that it is this position that is being used to calculate the overall rating. 

For now, we will assume that all the other attributes affect the overall rating linearly. We may need to come back and reconsider how "work_rate", "weak_foot", and "skill_moves" affect the overall rating.

## Feature Engineering for Overall Ratings
We need to reduce each player's list of positions to a single position, split each work rate into its attacking and defensive parts, and encode those parts as integers. We may have to try one-hot encoding later for the work rates. For now, we will use a Multiple Linear Regression model from scikit-learn.
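The one-hot alternative for the work rates could look like the sketch below. It runs on a small made-up DataFrame rather than the scraped data, and the `awr`/`dwr` prefixes simply mirror the column names used later in this notebook.

```python
import pandas as pd

# Toy stand-in for the scraped data; the rows are made up for illustration
toy = pd.DataFrame({'work_rate': ['High/ High', 'High/ Medium', 'Low/ Medium']})

# Split "Attack/ Defense" into two columns and strip the stray space
rates = toy['work_rate'].str.split('/', expand=True)
toy['awr'] = rates[0].str.strip()
toy['dwr'] = rates[1].str.strip()

# One indicator column per (work rate, level) pair instead of a 1-3 integer
one_hot = pd.get_dummies(toy[['awr', 'dwr']], prefix=['awr', 'dwr'])
print(sorted(one_hot.columns))
```

Unlike the integer encoding, this does not assume that the step from Low to Medium is worth the same as the step from Medium to High, at the cost of more columns.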

In [6]:
# First things first, let's create a new position column

df['first_pos']=df['positions'].apply(lambda x: x[0]) # This gives us the first position in their list

df['first_pos'].unique() # This shows us all the different positions in the game

    

array(['CM', 'ST', 'LW', 'CAM', 'RW', 'LM', 'LB', 'RB', 'CDM', 'RM', 'CB',
       'CF', 'GK', 'RWB', 'LWB'], dtype=object)

In [7]:
# We will define a function that we can apply to the position column 

def position_selector(pos):
    
    OWB=['RWB','LWB'] # Wing Backs
    OB=['RB','LB'] # Outside Backs
    Winger=['LW','RW'] # Wingers
    OM=['LM','RM'] # Outside Mids
    ST=['ST','CF'] # Strikers
    
    if pos in OWB:
        return 'OWB'
    elif pos in OB:
        return 'OB'
    elif pos in Winger:
        return 'W'
    elif pos in OM:
        return 'OM'
    elif pos in ST:
        return 'ST'
    else:
        return pos
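
The same grouping can also be written as a lookup table and applied with `Series.map`, which avoids the chain of `elif` branches. This is only an alternative sketch; `POSITION_GROUPS` is a hypothetical name and the sample positions are illustrative.

```python
import pandas as pd

# Same grouping as position_selector, expressed as a lookup table
POSITION_GROUPS = {
    'RWB': 'OWB', 'LWB': 'OWB',   # Wing Backs
    'RB': 'OB', 'LB': 'OB',       # Outside Backs
    'LW': 'W', 'RW': 'W',         # Wingers
    'LM': 'OM', 'RM': 'OM',       # Outside Mids
    'ST': 'ST', 'CF': 'ST',       # Strikers
}

first_pos = pd.Series(['CM', 'CF', 'LWB', 'RM', 'GK'])

# Positions missing from the table (CM, GK, ...) keep their own label
model_pos = first_pos.map(POSITION_GROUPS).fillna(first_pos)
print(model_pos.tolist())
```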


In [8]:
# New column that will give us our position for the model

df['model_pos']=df['first_pos'].apply(position_selector)

df['model_pos'].unique()

array(['CM', 'ST', 'W', 'CAM', 'OM', 'OB', 'CDM', 'CB', 'GK', 'OWB'],
      dtype=object)

In [9]:
# New columns for the work rate

df['awr']=df['work_rate'].apply(lambda x: x.split('/')[0])

df['dwr']=df['work_rate'].apply(lambda x: x.split('/')[1][1:]) # [1:] gets rid of a space character

print(df['awr'].unique())
print(df['dwr'].unique())

['High' 'Medium' 'Low']
['High' 'Medium' 'Low']


In [10]:
# Now we will map these to integers

def wr_map(wr):
    if wr=='High':
        return 3
    elif wr=='Medium':
        return 2
    elif wr=='Low':
        return 1
    
df['awr']=df['awr'].apply(wr_map)
df['dwr']=df['dwr'].apply(wr_map)

print(df['awr'].unique())
print(df['dwr'].unique())

[3 2 1]
[3 2 1]


Now we just need to create a new df with only the columns we want!

In [11]:
# Here are our columns
df.columns

Index(['name', 'id', 'full_name', 'country', 'positions', 'age', 'dob',
       'height', 'weight', 'overall', 'potential', 'value', 'wage',
       'preferred_foot', 'int_rep', 'weak_foot', 'skill_moves', 'work_rate',
       'body_type', 'real_face', 'release_clause', 'club', 'club_rating',
       'club_position', 'jersey_number', 'national_team', 'nt_rating',
       'nt_position', 'nt_jersey', 'crossing', 'finishing', 'heading accuracy',
       'short passing', 'volleys', 'dribbling', 'curve', 'fk accuracy',
       'long passing', 'ball control', 'acceleration', 'sprint speed',
       'agility', 'reactions', 'balance', 'shot power', 'jumping', 'stamina',
       'strength', 'long shots', 'aggression', 'interceptions', 'positioning',
       'vision', 'penalties', 'composure', 'defensive awareness',
       'standing tackle', 'sliding tackle', 'gk diving', 'gk handling',
       'gk kicking', 'gk positioning', 'gk reflexes', 'traits', 'first_pos',
       'model_pos', 'awr', 'dwr'],
      dt

In [12]:
# figure out how to not manually type them all out

print(df.columns.get_loc('model_pos'))
print(df.columns.get_loc('int_rep'))
print(df.columns.get_loc('crossing'))
print(df.columns.get_loc('gk reflexes'))
print(df.columns.get_loc('overall'))
print(df.columns[65])

65
14
29
62
9
model_pos
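
One way to avoid hard-coded integer positions is to slice by column label with `.loc`. The sketch below uses a toy frame that borrows a few of the notebook's column names with made-up values; the real DataFrame has many more columns between the endpoints.

```python
import pandas as pd

# Toy frame with a few of the notebook's column names; values are made up
toy = pd.DataFrame({
    'overall': [70, 77],
    'int_rep': [1, 1],
    'weak_foot': [3, 3],
    'skill_moves': [3, 3],
    'work_rate': ['High/ High', 'High/ Medium'],
    'crossing': [59, 46],
    'finishing': [52, 81],
    'gk reflexes': [12, 7],
    'traits': [[], []],
    'model_pos': ['CM', 'ST'],
})

# .loc label slices include both endpoints, unlike positional slices
cols = (
    ['model_pos']
    + list(toy.loc[:, 'int_rep':'skill_moves'].columns)
    + list(toy.loc[:, 'crossing':'gk reflexes'].columns)
    + ['overall']
)
print(cols)
```

The label slices still rely on the attribute columns being contiguous, but they keep working if columns are added or removed elsewhere in the frame.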


In [13]:
# For overall rating, let's assume only the skill attributes matter
overall_cols=list(df.columns[65:])
overall_cols.extend(df.columns[14:17])
overall_cols.extend(df.columns[29:63])
overall_cols.append(df.columns[9])     

overall_data = df[overall_cols]
overall_data.head()


Unnamed: 0,model_pos,awr,dwr,int_rep,weak_foot,skill_moves,crossing,finishing,heading accuracy,short passing,volleys,dribbling,curve,fk accuracy,long passing,ball control,acceleration,sprint speed,agility,reactions,balance,shot power,jumping,stamina,strength,long shots,aggression,interceptions,positioning,vision,penalties,composure,defensive awareness,standing tackle,sliding tackle,gk diving,gk handling,gk kicking,gk positioning,gk reflexes,overall
0,CM,3,3,1,3,3,59,52,55,72,51,73,64,48,71,73,73,70,74,68,70,70,62,65,51,56,74,67,63,68,52,74,66,69,68,12,6,8,12,12,70
1,ST,3,2,1,3,3,46,81,67,69,72,72,69,62,49,76,79,89,76,75,65,78,70,75,85,67,75,35,77,62,80,80,38,31,15,7,14,13,11,7,77
2,CM,3,3,1,3,3,59,67,59,81,47,76,68,56,83,80,72,75,69,79,67,77,54,80,77,77,77,75,68,79,48,80,70,73,59,6,10,6,15,8,79
3,W,3,2,1,4,4,74,78,53,76,77,85,78,69,69,81,94,84,90,78,77,85,47,76,58,79,67,47,77,73,71,78,55,37,36,7,14,7,14,6,80
4,W,3,2,3,3,4,78,86,38,84,67,90,77,63,69,88,96,91,94,90,94,78,57,79,56,79,38,30,92,82,69,80,47,53,47,15,12,12,15,9,89


In [14]:
# Remove spaces in column names

overall_data.columns=list(map(lambda x: x.replace(' ','_'),overall_cols))
overall_data.head()

Unnamed: 0,model_pos,awr,dwr,int_rep,weak_foot,skill_moves,crossing,finishing,heading_accuracy,short_passing,volleys,dribbling,curve,fk_accuracy,long_passing,ball_control,acceleration,sprint_speed,agility,reactions,balance,shot_power,jumping,stamina,strength,long_shots,aggression,interceptions,positioning,vision,penalties,composure,defensive_awareness,standing_tackle,sliding_tackle,gk_diving,gk_handling,gk_kicking,gk_positioning,gk_reflexes,overall
0,CM,3,3,1,3,3,59,52,55,72,51,73,64,48,71,73,73,70,74,68,70,70,62,65,51,56,74,67,63,68,52,74,66,69,68,12,6,8,12,12,70
1,ST,3,2,1,3,3,46,81,67,69,72,72,69,62,49,76,79,89,76,75,65,78,70,75,85,67,75,35,77,62,80,80,38,31,15,7,14,13,11,7,77
2,CM,3,3,1,3,3,59,67,59,81,47,76,68,56,83,80,72,75,69,79,67,77,54,80,77,77,77,75,68,79,48,80,70,73,59,6,10,6,15,8,79
3,W,3,2,1,4,4,74,78,53,76,77,85,78,69,69,81,94,84,90,78,77,85,47,76,58,79,67,47,77,73,71,78,55,37,36,7,14,7,14,6,80
4,W,3,2,3,3,4,78,86,38,84,67,90,77,63,69,88,96,91,94,90,94,78,57,79,56,79,38,30,92,82,69,80,47,53,47,15,12,12,15,9,89


In [15]:
# Let's make sure there are linear relationships!!

index = [position for position in overall_data['model_pos'].unique()]
columns=[attribute for attribute in overall_data.drop(['model_pos','overall'],axis=1).columns]

corr_data=pd.DataFrame(index=index,columns=columns)

for position in overall_data['model_pos'].unique():
    
    df_pos = overall_data[overall_data['model_pos']==position]
    df_att = df_pos.drop(['model_pos','overall'],axis=1)  # keep only the numeric attributes
    
    corr_by_pos = df_att.corrwith(df_pos['overall'])
    
    corr_data.loc[position]=corr_by_pos

    
corr_data

Unnamed: 0,awr,dwr,int_rep,weak_foot,skill_moves,crossing,finishing,heading_accuracy,short_passing,volleys,dribbling,curve,fk_accuracy,long_passing,ball_control,acceleration,sprint_speed,agility,reactions,balance,shot_power,jumping,stamina,strength,long_shots,aggression,interceptions,positioning,vision,penalties,composure,defensive_awareness,standing_tackle,sliding_tackle,gk_diving,gk_handling,gk_kicking,gk_positioning,gk_reflexes
CM,0.22907,0.25685,0.535738,0.295434,0.686828,0.78816,0.754441,0.48282,0.943148,0.677447,0.897084,0.716046,0.641913,0.919026,0.951048,0.0901842,0.0508327,0.377639,0.91701,0.17244,0.757373,0.237444,0.580093,0.477135,0.825843,0.57353,0.81114,0.824257,0.924017,0.617142,0.858306,0.745207,0.763653,0.637312,0.0128501,0.0478628,0.058128,0.0734738,0.0315978
ST,0.250742,0.0480433,0.534862,0.34462,0.658516,0.651072,0.930584,0.670951,0.862391,0.855883,0.833746,0.672082,0.58881,0.65204,0.919593,0.196715,0.231519,0.31904,0.913794,0.0813745,0.890404,0.352164,0.51396,0.463398,0.824454,0.54673,0.47172,0.942943,0.738464,0.639324,0.867097,0.464436,0.426373,0.332049,0.0733487,0.0604127,0.0868148,0.0374783,0.0543973
W,0.199389,0.0952405,0.603781,0.248971,0.681596,0.852785,0.80642,0.397738,0.910432,0.712364,0.929628,0.767306,0.637578,0.776546,0.94334,0.467728,0.413049,0.564199,0.916532,0.319887,0.710978,0.316711,0.628624,0.318048,0.837649,0.484573,0.428712,0.909235,0.878566,0.566579,0.870297,0.415547,0.268506,0.201495,0.0682715,0.0839739,0.0698019,0.0566946,-0.0569956
CAM,0.27033,0.0249681,0.545283,0.328418,0.725627,0.863565,0.863349,0.441291,0.916486,0.78459,0.914361,0.790766,0.740052,0.816581,0.952056,0.303883,0.249519,0.475211,0.893465,0.272472,0.796212,0.189833,0.619545,0.368427,0.866473,0.424937,0.458935,0.898842,0.925895,0.697767,0.837168,0.364674,0.317104,0.128896,0.077875,0.101456,0.0494408,0.0639051,0.0645371
OM,0.252325,0.0600873,0.482466,0.264559,0.639791,0.850207,0.772213,0.379132,0.888022,0.682859,0.905318,0.727506,0.572327,0.721231,0.92942,0.455873,0.419498,0.515392,0.885519,0.270702,0.728376,0.257975,0.547854,0.265758,0.776778,0.403141,0.352595,0.88665,0.858895,0.521507,0.824897,0.357969,0.266518,0.201947,0.0581827,0.0145622,0.0633564,0.00316056,0.0338567
OB,0.34448,0.0635817,0.474453,0.15826,0.642689,0.895851,0.578039,0.677241,0.871419,0.571749,0.794771,0.682816,0.485001,0.775576,0.894922,0.402759,0.458734,0.439337,0.897865,0.224824,0.698512,0.388363,0.677308,0.469615,0.652935,0.661474,0.905839,0.687528,0.729574,0.477813,0.849893,0.880042,0.899288,0.892289,0.0489512,0.0513431,0.104492,0.0500604,0.0965046
CDM,0.0659081,0.247875,0.493606,0.1911,0.534707,0.604507,0.551673,0.567722,0.904127,0.537554,0.749091,0.549824,0.466137,0.861923,0.871268,0.0597915,0.0512198,0.241559,0.905903,0.0376957,0.672935,0.309606,0.624829,0.550419,0.636752,0.661859,0.94071,0.650737,0.796259,0.490533,0.848732,0.922376,0.92843,0.858037,0.0878672,0.0660563,0.0613346,0.118109,-0.00427754
CB,0.126912,0.189711,0.517776,0.17956,0.201416,0.415144,0.447478,0.883837,0.789302,0.369013,0.605371,0.424938,0.341372,0.723772,0.758216,0.0269608,0.117636,0.209723,0.899362,-0.0588639,0.529108,0.264286,0.352569,0.565435,0.446972,0.786297,0.937769,0.384103,0.526685,0.3321,0.842889,0.934799,0.93604,0.914647,0.0536218,0.044928,0.0540777,0.0384157,0.032758
GK,,,0.515279,0.173408,,0.188223,0.447486,0.216502,0.375336,0.459331,0.415728,0.242692,0.240529,0.364807,0.42836,0.540079,0.550277,0.471659,0.898715,0.200205,0.82041,0.490927,0.488183,0.44555,0.466002,0.305833,0.488178,0.477014,0.496539,0.36758,0.667172,0.429016,0.248474,0.21196,0.963153,0.943158,0.822204,0.95867,0.961646
OWB,0.259051,0.0547725,0.3982,0.10001,0.62988,0.921789,0.480397,0.692956,0.936209,0.493232,0.863322,0.766202,0.548742,0.848606,0.92878,0.279359,0.394766,0.4392,0.892097,0.208897,0.806036,0.329519,0.724794,0.562813,0.613563,0.715289,0.926899,0.75358,0.738355,0.535771,0.88155,0.927158,0.92261,0.912908,0.360724,-0.0486898,0.169199,0.0161095,0.062981
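
The same per-position table can be built by iterating over a `groupby`, which also yields a float-typed result. The sketch below uses synthetic data with a made-up rating rule (strikers rated on finishing, centre backs on standing tackle), so the numbers say nothing about the real game.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'model_pos': np.repeat(['ST', 'CB'], n // 2),
    'finishing': rng.integers(30, 95, n),
    'standing_tackle': rng.integers(30, 95, n),
})

# Made-up rule: each position's rating follows one attribute plus small noise
is_st = (toy['model_pos'] == 'ST').to_numpy()
toy['overall'] = (
    np.where(is_st, toy['finishing'], toy['standing_tackle'])
    + rng.normal(0, 2, n)
)

attrs = ['finishing', 'standing_tackle']

# One Pearson correlation per (position, attribute) pair
corr = pd.DataFrame({
    pos: grp[attrs].corrwith(grp['overall'])
    for pos, grp in toy.groupby('model_pos')
}).T
print(corr.round(2))
```

Keep in mind that a Pearson correlation only measures linear association between one attribute and the rating, so a low value does not prove that an attribute is unused.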


We want to keep only the attributes that clearly have a linear relationship with the overall rating. We will write a script that sets a correlation threshold and drops the attributes that fall below it.

We will create a dictionary that maps each position to its relevant attributes.

In [16]:
corr_data_T=corr_data.T

rel_attributes={}

for position in corr_data_T.columns:
    
    rel_attributes[position]=list(corr_data_T[corr_data_T[position]>0.55][position].index)
    rel_attributes[position].append('overall')
    
print(rel_attributes['CM'])


['skill_moves', 'crossing', 'finishing', 'short_passing', 'volleys', 'dribbling', 'curve', 'fk_accuracy', 'long_passing', 'ball_control', 'reactions', 'shot_power', 'stamina', 'long_shots', 'aggression', 'interceptions', 'positioning', 'vision', 'penalties', 'composure', 'defensive_awareness', 'standing_tackle', 'sliding_tackle', 'overall']


Now we are ready to build a pipeline to split the data, fit the models, and display performance for every position!

# Pipeline Creation 

### Regression Metrics Function

In [17]:
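# Create a function to print out regression analysis
# (imported here as well so that this cell works on its own)
import numpy as np
import sklearn.metrics as metrics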

def regression_results(y_true, y_pred):

    # Regression metrics
    explained_variance=metrics.explained_variance_score(y_true, y_pred)
    mean_absolute_error=metrics.mean_absolute_error(y_true, y_pred) 
    mse=metrics.mean_squared_error(y_true, y_pred) 
    mean_squared_log_error=metrics.mean_squared_log_error(y_true, y_pred)
    median_absolute_error=metrics.median_absolute_error(y_true, y_pred)
    r2=metrics.r2_score(y_true, y_pred)

    print('explained_variance: ', round(explained_variance,4))    
    print('mean_squared_log_error: ', round(mean_squared_log_error,4))
    print('r2: ', round(r2,4))
    print('MAE: ', round(mean_absolute_error,4))
    print('MSE: ', round(mse,4))
    print('RMSE: ', round(np.sqrt(mse),4))
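
To make the printed numbers easier to read, here is what three of these metrics compute, written out with NumPy on four made-up ratings. This is an illustration of the standard definitions, not a replacement for the scikit-learn calls above.

```python
import numpy as np

# Four made-up true ratings and rounded predictions
y_true = np.array([70, 77, 79, 80])
y_pred = np.array([71, 75, 79, 82])

errors = y_true - y_pred

mae = np.abs(errors).mean()        # average miss in rating points
mse = (errors ** 2).mean()         # penalises large misses more heavily
rmse = np.sqrt(mse)                # back in rating points

# R^2: share of the variance in y_true explained by the predictions
ss_res = (errors ** 2).sum()
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, round(r2, 4))
```

Because ratings are whole numbers, MAE is the easiest of these to interpret: it is the average number of rating points by which a prediction misses.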

### Pipeline Code

In [18]:
# Imports
from sklearn.model_selection import train_test_split
import sklearn.metrics as metrics
import numpy as np


# Class definition

class overall_pipeline(object):
        
    def __init__(self):
        
        self.positions=[]
        self.position_data_dic={}
        self.position_split_data_dic={}
        self.model_coef={}
        self.model_preds={}
        self.regs=None
        
    def split(self,data):
        
        for x in data['model_pos'].unique():   # Iterate through the different positions
            
            self.positions.append(x)         # List the different positions 
            
            self.position_data_dic[x]=data[data['model_pos']==x]   # New DataFrame filtered by position
            
            self.position_data_dic[x]=self.position_data_dic[x][rel_attributes[x]]  # Filtered further by relevant attributes
            
            X=self.position_data_dic[x].drop(['overall'],axis=1)
            y=self.position_data_dic[x]['overall']
            
            X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
            
            self.position_split_data_dic[x]=[X_train,X_test,y_train,y_test]  # We have run a train/test split on the data and now have a list containing the split data
            
    
            
    
    def run(self,regressors):    # You will have to set the parameters in the list of regressors that you pass...
        
        self.regs=regressors
        
        for x in self.position_split_data_dic.keys():
            
            reg_count=0
            
            for reg in regressors:
                reg_count+=1
                reg.fit(self.position_split_data_dic[x][0],self.position_split_data_dic[x][2])
            
            ######     Need to change this out of a dataframe 
                try: 
                    self.model_coef[(x,reg)]={'Intercept':reg.intercept_,'Coefficients':pd.DataFrame(reg.coef_,index=self.position_split_data_dic[x][0].columns)} #############
                except:
                    self.model_coef[(x,reg)]='Not Applicable for this Regressor'
            ######
                
                #### Predictions#####
                
                self.model_preds[(x,reg)]=np.array([round(p) for p in reg.predict(self.position_split_data_dic[x][1])])  # Round to whole rating points
            
                print(f'Regressor {reg_count} Results for: ' + x + ':\n')
            
                print(f"Number of {x}'s: {len(self.model_preds[(x,reg)])}")
                print(f"Total Difference in rating points: {abs(self.model_preds[(x,reg)]-self.position_split_data_dic[x][3]).sum()}\n")
            
                regression_results(self.position_split_data_dic[x][3],self.model_preds[(x,reg)])  # This is our function we defined earlier
                print('\n')
    
    
    def coef(self,positions=None):  # Here we can print all coefficients, one, or a list of them

        if self.regs is None:
            print('You must run the model first')
        
        else:
            if isinstance(positions,str): # A single position
                for reg in self.regs:
                    print('Model: '+str(reg))
                    print('Coefficients for ' + positions +':' + '\n \n',self.model_coef[(positions,reg)],'\n \n \n')
            elif isinstance(positions,list): # A list of positions
                for x in positions:
                    for reg in self.regs:
                        print('Coefficients for ' + x +':' + '\n \n',self.model_coef[(x,reg)],'\n \n \n')
            else:    # Print coefficients for all positions
                for x in self.model_coef.keys():
                    print('Coefficients for ' + x[0] +' with '+ str(x[1]) +':' + '\n \n',self.model_coef[x],'\n \n \n')
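
The core loop of the class (split by position, train/test split, fit, round, score) can be seen in isolation in the sketch below. It is a simplified stand-in, not the class itself: the data is synthetic, the 0.8/0.2 rating rule is invented, `results` is a hypothetical name, and `random_state` is fixed only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300
toy = pd.DataFrame({
    'model_pos': np.repeat(['ST', 'CB'], n // 2),
    'finishing': rng.integers(40, 95, n),
    'standing_tackle': rng.integers(40, 95, n),
})

# Invented rule: each position weights the two attributes differently
is_st = (toy['model_pos'] == 'ST').to_numpy()
toy['overall'] = np.where(
    is_st,
    0.8 * toy['finishing'] + 0.2 * toy['standing_tackle'],
    0.2 * toy['finishing'] + 0.8 * toy['standing_tackle'],
)

results = {}
for pos, grp in toy.groupby('model_pos'):
    X = grp[['finishing', 'standing_tackle']]
    y = grp['overall']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    reg = LinearRegression().fit(X_train, y_train)
    preds = np.round(reg.predict(X_test))   # ratings are whole numbers
    results[pos] = {
        'coef': dict(zip(X.columns, reg.coef_)),
        'mae': mean_absolute_error(y_test, preds),
    }

print(results['ST']['coef'])
```

On this noise-free toy data, each position's model recovers its own weights, which is the behaviour the per-position design is counting on.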

            




## Run the Models and Write Output to a File

In [19]:
%%capture cap 

print('Overall Model Results\n\n\n')   # Header for the Text File

################# CHOOSE REGRESSORS HERE ################
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

Regressors=[LinearRegression(fit_intercept=False),XGBRegressor(booster='gblinear'),RandomForestRegressor()]  

################# RUN IT ALL ################

runner=overall_pipeline()     
runner.split(overall_data)
runner.run(Regressors)
runner.coef()



In [20]:
cd results/

/Users/rfelix/Desktop/Projects/GitHub/sofifa_data/results


In [21]:
time=datetime.now().strftime('%d-%m-%y %H:%M')

with open(f'Model Results {time}.txt','w') as file:
    file.write(str(cap))
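
Note that the timestamp format above puts a colon in the file name, which Windows does not allow. A more portable variant, sketched below, uses `pathlib` and a colon-free timestamp. The temporary directory and the `model_results_` prefix are placeholders chosen for this example, and the written string stands in for the captured output.

```python
import tempfile
from datetime import datetime
from pathlib import Path

# Colon-free timestamp that also sorts chronologically by name
stamp = datetime.now().strftime('%Y-%m-%d_%H-%M')

# Placeholder location; in the notebook this would be the results/ folder
results_dir = Path(tempfile.mkdtemp()) / 'results'
results_dir.mkdir(parents=True, exist_ok=True)

out_path = results_dir / f'model_results_{stamp}.txt'
out_path.write_text('Overall Model Results\n')   # stand-in for str(cap)
print(out_path.name)
```

Building the path explicitly also removes the need for the `cd` cells, since the working directory never changes.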

## Visualize the model performance (Work in Progress)

In [29]:
overall_data

Unnamed: 0,model_pos,awr,dwr,int_rep,weak_foot,skill_moves,crossing,finishing,heading_accuracy,short_passing,volleys,dribbling,curve,fk_accuracy,long_passing,ball_control,acceleration,sprint_speed,agility,reactions,balance,shot_power,jumping,stamina,strength,long_shots,aggression,interceptions,positioning,vision,penalties,composure,defensive_awareness,standing_tackle,sliding_tackle,gk_diving,gk_handling,gk_kicking,gk_positioning,gk_reflexes,overall,Lin,XGB,Ran
0,CM,3,3,1,3,3,59,52,55,72,51,73,64,48,71,73,73,70,74,68,70,70,62,65,51,56,74,67,63,68,52,74,66,69,68,12,6,8,12,12,70,,,
1,ST,3,2,1,3,3,46,81,67,69,72,72,69,62,49,76,79,89,76,75,65,78,70,75,85,67,75,35,77,62,80,80,38,31,15,7,14,13,11,7,77,,,
2,CM,3,3,1,3,3,59,67,59,81,47,76,68,56,83,80,72,75,69,79,67,77,54,80,77,77,77,75,68,79,48,80,70,73,59,6,10,6,15,8,79,,,
3,W,3,2,1,4,4,74,78,53,76,77,85,78,69,69,81,94,84,90,78,77,85,47,76,58,79,67,47,77,73,71,78,55,37,36,7,14,7,14,6,80,,,
4,W,3,2,3,3,4,78,86,38,84,67,90,77,63,69,88,96,91,94,90,94,78,57,79,56,79,38,30,92,82,69,80,47,53,47,15,12,12,15,9,89,,,
5,CAM,2,2,3,2,5,84,74,52,83,68,84,81,77,81,85,82,76,86,74,84,72,46,80,55,73,60,65,75,88,63,76,65,49,51,14,15,15,8,10,82,,,
6,CM,2,2,1,3,3,70,79,75,82,58,84,75,69,76,83,69,68,76,75,75,79,66,75,76,75,77,78,73,80,58,80,70,74,69,9,12,9,13,14,80,,,
7,ST,2,2,1,3,3,50,80,79,68,70,69,50,42,45,75,68,78,63,77,56,77,70,66,79,67,41,22,79,67,72,73,35,23,15,12,7,8,13,13,77,,,
8,W,3,2,4,2,5,87,79,51,83,77,88,88,88,80,87,85,84,94,84,77,83,72,78,60,85,73,42,85,88,73,86,41,55,52,10,7,11,12,11,87,,,
9,CM,2,2,1,3,3,56,58,58,75,62,70,65,56,74,74,69,75,74,60,67,75,71,62,72,61,54,51,52,71,60,64,54,56,49,12,8,6,11,12,68,,,
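
One possible way to fill a prediction column such as `Lin` is to use the fact that `train_test_split` preserves the DataFrame index, so test-set predictions can be written back to the rows they belong to. The sketch below shows the idea on synthetic data with a single made-up attribute; in the pipeline, the matching `X_test` would be `position_split_data_dic[x][1]`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
toy = pd.DataFrame({'finishing': rng.integers(40, 95, 50)})
toy['overall'] = toy['finishing'] - 3    # made-up rating rule
toy['Lin'] = np.nan                      # empty prediction column

X_train, X_test, y_train, y_test = train_test_split(
    toy[['finishing']], toy['overall'], test_size=0.3, random_state=0
)
reg = LinearRegression().fit(X_train, y_train)

# X_test keeps the original row labels, so .loc aligns predictions to rows
toy.loc[X_test.index, 'Lin'] = np.round(reg.predict(X_test))

print(toy.dropna().head())
```

Rows that landed in the training set stay `NaN`, which makes it clear which players were actually held out when plotting predicted against true ratings.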


In [35]:
runner.model_preds[list(runner.model_preds.keys())[0]]

array([71., 48., 67., 74., 88., 65., 68., 74., 76., 71., 79., 70., 72.,
       72., 77., 82., 78., 55., 83., 72., 71., 74., 67., 64., 61., 69.,
       59., 55., 82., 78., 66., 71., 67., 66., 65., 61., 66., 62., 71.,
       60., 70., 69., 60., 58., 67., 69., 75., 71., 63., 60., 82., 62.,
       64., 69., 69., 65., 74., 62., 65., 65., 72., 72., 55., 62., 74.,
       73., 62., 75., 61., 70., 64., 63., 74., 55., 64., 72., 77., 79.,
       72., 67., 63., 64., 54., 73., 78., 73., 76., 63., 55., 56., 74.,
       73., 59., 69., 48., 81., 64., 63., 77., 69., 64., 69., 65., 76.,
       79., 63., 74., 71., 74., 62., 72., 65., 62., 81., 77., 76., 66.,
       68., 65., 74., 67., 78., 83., 74., 50., 57., 71., 63., 74., 60.,
       60., 70., 71., 77., 65., 74., 57., 66., 75., 65., 71., 62., 73.,
       70., 79., 58., 69., 73., 75., 53., 70., 69., 63., 69., 55., 75.,
       69., 60., 86., 65., 77., 55., 86., 74., 57., 75., 70., 71., 67.,
       64., 51., 69., 67., 61., 66., 82., 72., 62., 67., 70., 75