# FIFA 22 Players - Regression

The objective of this project is to mine the FIFA 22 players dataset and build a regression model that predicts a player's `Growth` value from the most informative features.

The dataset was downloaded from: https://www.kaggle.com/cashncarry/fifa-22-complete-player-dataset

The dataset contains attributes for various soccer-related abilities as well as other features, e.g. value, wage, nationality, club name, playing position in the club team, national team name, playing position in the national team, jersey number in the national team and club, release clause, and contract information. The scores for the soccer-related abilities are assigned by the FIFA ratings division on a scale of 0 to 100. TotalStats is the sum of the scores of all soccer-ability attributes, and BaseStats is the sum of the scores of the position-related attributes.

# Import libraries

In [1]:
import pandas as pd
import numpy as np

In [2]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

# Import CSV file

In [3]:
# Read the csv file into a pandas DataFrame
df = pd.read_csv('players_fifa22.csv')
df.head()

Unnamed: 0,ID,Name,FullName,Age,Height,Weight,PhotoUrl,Nationality,Overall,Potential,...,LMRating,CMRating,RMRating,LWBRating,CDMRating,RWBRating,LBRating,CBRating,RBRating,GKRating
0,158023,L. Messi,Lionel Messi,34,170,72,https://cdn.sofifa.com/players/158/023/22_60.png,Argentina,93,93,...,93,90,93,69,67,69,64,53,64,22
1,188545,R. Lewandowski,Robert Lewandowski,32,185,81,https://cdn.sofifa.com/players/188/545/22_60.png,Poland,92,92,...,87,83,87,67,69,67,64,63,64,22
2,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,36,187,83,https://cdn.sofifa.com/players/020/801/22_60.png,Portugal,91,91,...,89,81,89,66,62,66,63,56,63,23
3,231747,K. Mbappé,Kylian Mbappé,22,182,73,https://cdn.sofifa.com/players/231/747/22_60.png,France,91,95,...,92,84,92,70,66,70,66,57,66,21
4,200389,J. Oblak,Jan Oblak,28,188,87,https://cdn.sofifa.com/players/200/389/22_60.png,Slovenia,91,93,...,38,41,38,35,39,35,35,36,35,92


In [4]:
# Checking the data types of each column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19260 entries, 0 to 19259
Data columns (total 90 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   ID                 19260 non-null  int64  
 1   Name               19260 non-null  object 
 2   FullName           19260 non-null  object 
 3   Age                19260 non-null  int64  
 4   Height             19260 non-null  int64  
 5   Weight             19260 non-null  int64  
 6   PhotoUrl           19260 non-null  object 
 7   Nationality        19260 non-null  object 
 8   Overall            19260 non-null  int64  
 9   Potential          19260 non-null  int64  
 10  Growth             19260 non-null  int64  
 11  TotalStats         19260 non-null  int64  
 12  BaseStats          19260 non-null  int64  
 13  Positions          19260 non-null  object 
 14  BestPosition       19260 non-null  object 
 15  Club               19260 non-null  object 
 16  ValueEUR           192

# Transforming categorical data to numerical data

In [5]:
# Converting AttackingWorkRate & DefensiveWorkRate to numerical columns
df['AttackingWorkRate_code'] = df['AttackingWorkRate'].apply(lambda x: 3 if x == 'High' \
                                                                  else (2 if x == 'Medium' else 1))
df['DefensiveWorkRate_code'] = df['DefensiveWorkRate'].apply(lambda x: 3 if x == 'High' \
                                                                  else (2 if x == 'Medium' else 1))
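
The nested lambda above sends every value other than 'High' or 'Medium' to 1, including missing or misspelled labels. A minimal sketch of an alternative, assuming the only valid labels are 'Low', 'Medium' and 'High': an explicit dictionary passed to `Series.map`, which leaves unknown labels as NaN so they can be inspected. The toy series and the names `rate_order` and `work_rate_code` are illustrative, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for a work-rate column; the real values come from players_fifa22.csv.
work_rate = pd.Series(["High", "Medium", "Low", "High"])

# Explicit ordinal mapping: Low < Medium < High (assumed label set).
rate_order = {"Low": 1, "Medium": 2, "High": 3}

# map() returns NaN for labels missing from the dictionary, so unexpected
# values surface instead of silently becoming 1.
work_rate_code = work_rate.map(rate_order)
unknown_code = pd.Series(["Unknown"]).map(rate_order)

print(work_rate_code.tolist())
print(unknown_code.isna().all())
```

Whether NaN or a default of 1 is preferable depends on how clean the column is; the dictionary version simply makes that choice visible.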

In [6]:
# Converting BestPosition to numerical categorical column
df['BestPosition_code'] = df['BestPosition'].astype('category').cat.codes
# df[['BestPosition', 'BestPosition_code']].value_counts()

In [7]:
# Converting PreferredFoot to numerical categorical column
df['PreferredFoot_code'] = df['PreferredFoot'].astype('category').cat.codes
# df[['PreferredFoot', 'PreferredFoot_code']].value_counts()

In [8]:
# Creating a dictionary of BestPosition name and assigned code for future reference 
pos_dict = df[['BestPosition', 'BestPosition_code']].drop_duplicates()\
            .set_index('BestPosition', drop=True).to_dict(orient= 'dict')
pos_dict

{'BestPosition_code': {'RW': 12,
  'ST': 14,
  'GK': 5,
  'CM': 4,
  'LW': 8,
  'CDM': 2,
  'LM': 7,
  'CF': 3,
  'CB': 1,
  'CAM': 0,
  'LB': 6,
  'RB': 10,
  'RM': 11,
  'LWB': 9,
  'RWB': 13}}

In [9]:
# Creating a dictionary of PreferredFoot name and assigned code for future reference 
PreferredFoot_dict = df[['PreferredFoot', 'PreferredFoot_code']].drop_duplicates()\
            .set_index('PreferredFoot', drop=True).to_dict(orient= 'dict')
PreferredFoot_dict

{'PreferredFoot_code': {'Left': 0, 'Right': 1}}
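
The two dictionaries above are rebuilt from the encoded columns with `drop_duplicates`. As a hedged alternative, the same lookup can be read directly from the categorical itself, because `cat.categories` lists the labels in code order (sorted alphabetically for strings, which matches the codes printed above). The toy frame and the names `code_to_pos` and `pos_to_code` are illustrative only.

```python
import pandas as pd

# Small illustrative frame; the notebook uses the full players table.
toy = pd.DataFrame({"BestPosition": ["ST", "GK", "CB", "ST", "CAM"]})

positions = toy["BestPosition"].astype("category")
toy["BestPosition_code"] = positions.cat.codes

# cat.categories holds the labels in code order, so enumerate() yields code -> label.
code_to_pos = dict(enumerate(positions.cat.categories))
pos_to_code = {label: code for code, label in code_to_pos.items()}

print(pos_to_code)
print(toy["BestPosition_code"].tolist())
```

Keeping `code_to_pos` around makes it easy to translate encoded values back into position names when reading model output.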

# Dropping unnecessary columns

In [10]:
# Keeping only numeric columns, then dropping identifiers and other unnecessary columns
numerics = ['int8','int16', 'int32', 'int64', 'float16', 'float32', 'float64']

# .copy() returns an independent DataFrame, which avoids pandas' SettingWithCopyWarning
df_select = df.select_dtypes(include=numerics).copy()
df_select = df_select.drop(columns=['ID', 'Potential', 'NationalNumber', 'ContractUntil', 'ClubNumber',
                                    'ClubJoined', 'IntReputation', 'ReleaseClause'])
df_select = df_select.dropna(axis=0, how='any')
df_select.head()

Unnamed: 0,Age,Height,Weight,Overall,Growth,TotalStats,BaseStats,ValueEUR,WageEUR,WeakFoot,...,CDMRating,RWBRating,LBRating,CBRating,RBRating,GKRating,AttackingWorkRate_code,DefensiveWorkRate_code,BestPosition_code,PreferredFoot_code
0,34,170,72,93,0,2219,462,78000000,320000,4,...,67,69,64,53,64,22,2,1,12,0
1,32,185,81,92,0,2212,460,119500000,270000,4,...,69,67,64,63,64,22,3,2,14,1
2,36,187,83,91,0,2208,457,45000000,270000,4,...,62,66,63,56,63,23,3,1,14,1
3,22,182,73,91,4,2175,470,194000000,230000,4,...,66,70,66,57,66,21,3,1,14,1
4,28,188,87,91,2,1413,489,112000000,130000,3,...,39,35,35,36,35,92,2,2,5,1


In [11]:
df_select.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19260 entries, 0 to 19259
Data columns (total 72 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   Age                     19260 non-null  int64
 1   Height                  19260 non-null  int64
 2   Weight                  19260 non-null  int64
 3   Overall                 19260 non-null  int64
 4   Growth                  19260 non-null  int64
 5   TotalStats              19260 non-null  int64
 6   BaseStats               19260 non-null  int64
 7   ValueEUR                19260 non-null  int64
 8   WageEUR                 19260 non-null  int64
 9   WeakFoot                19260 non-null  int64
 10  SkillMoves              19260 non-null  int64
 11  PaceTotal               19260 non-null  int64
 12  ShootingTotal           19260 non-null  int64
 13  PassingTotal            19260 non-null  int64
 14  DribblingTotal          19260 non-null  int64
 15  DefendingTotal     

# Feature selection, splitting and scaling

In [12]:
X = df_select.drop('Growth', axis = 1)
y = df_select['Growth']

In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [14]:
from sklearn.preprocessing import StandardScaler

# Create a StandardScaler model and fit it to the training data

X_scaler = StandardScaler().fit(X_train)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
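
Fitting the scaler on the training split only, as above, keeps test-set statistics out of the transform. When cross-validation is run on data that was scaled beforehand, each validation fold has still contributed to the scaler's mean and variance. One way to avoid that, sketched here on synthetic data, is to wrap the scaler and the estimator in a pipeline so the scaler is refit inside every fold. The data and the names `X_toy`, `y_toy` and `scaled_ridge` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the player features.
rng = np.random.default_rng(0)
X_toy = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_toy = 0.5 * X_toy[:, 0] - 0.2 * X_toy[:, 1] + rng.normal(scale=1.0, size=200)

# The scaler is refit on the training part of every fold, so no
# statistics from the validation part leak into the transform.
scaled_ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
cv_r2 = cross_val_score(scaled_ridge, X_toy, y_toy, cv=5, scoring="r2")

print(cv_r2.mean())
```

For standardization the practical effect is usually small, but the pipeline form also guarantees that the same transform is applied at prediction time.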

# Linear Regression

In [15]:
from sklearn.linear_model import LinearRegression
lin_reg_model = LinearRegression()
lin_reg_model.fit(X_train_scaled, y_train)
print(lin_reg_model)

LinearRegression()


In [16]:
print('Weight coefficients: ', lin_reg_model.coef_)
print('y-axis intercept: ', lin_reg_model.intercept_) 

Weight coefficients:  [-4.15916127e+00  1.61822466e-01 -1.40846374e-01 -1.05996938e+00
 -9.20054363e+12 -1.12611698e+12  9.63771506e-03  1.14573533e-01
  6.25403458e-02  7.86437988e-02  3.00222077e+11  3.89970882e+11
  2.77344840e+11  2.74715054e+11  4.61410880e+11  2.70783046e+11
  6.09283368e+11  6.66774106e+11  5.86989942e+11  4.91133015e+11
  5.98112245e+11  6.36653485e+11  6.15149312e+11  5.80041238e+11
  5.06922676e+11  5.64420860e+11  5.14004593e+11  5.07306926e+11
  5.03266776e+11  3.03856353e+11  4.83864569e+11  4.42902178e+11
  4.06965259e+11  5.45395020e+11  4.26351750e+11  6.56192761e+11
  5.74473953e+11  6.98653343e+11  6.63597042e+11  4.58521297e+11
  5.35278581e+11  2.11425781e-01  6.82029712e+11  7.17244923e+11
  7.01179664e+11  5.99676777e+11  5.74878934e+11  5.64329166e+11
  5.82512915e+11  6.10635858e+11  2.06274414e+00 -9.09679670e+10
 -1.96179085e+11  9.41984858e+10  1.01980600e+11  9.09679670e+10
  1.96643066e+00  6.48481929e+10 -1.20898438e+00 -6.48481929e+10
 -3
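
Coefficients of the order of 1e11 to 1e12 on standardized features, as printed above, are a typical symptom of (near-)exact linear dependence between columns. One plausible cause here, stated as an assumption rather than a verified diagnosis, is that aggregate columns such as TotalStats are sums of other columns that are also in X. The toy example below shows how such a dependence lowers the rank of the design matrix; all names and data are illustrative.

```python
import numpy as np

# Four independent "skill" columns plus a fifth column that is their exact sum.
rng = np.random.default_rng(0)
skills = rng.integers(20, 100, size=(100, 4)).astype(float)
total = skills.sum(axis=1, keepdims=True)
X_toy = np.hstack([skills, total])

# Centre the columns (as StandardScaler would) and check the rank.
centred = X_toy - X_toy.mean(axis=0)
rank = np.linalg.matrix_rank(centred)

# A rank below the number of columns means the least-squares solution is
# not unique, so individual coefficients can become arbitrarily large.
print(X_toy.shape[1], rank)
```

Dropping the aggregate columns or adding regularization (as in the ridge section) are common ways to stabilize the coefficients; predictions themselves are often unaffected, which would be consistent with the reasonable R2 reported for this model.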

In [17]:
print(f"Training Data Score: {lin_reg_model.score(X_train_scaled, y_train)}")
print(f"Testing Data Score: {lin_reg_model.score(X_test_scaled, y_test)}")

Training Data Score: 0.7946445372593808
Testing Data Score: 0.7949090244324191


In [18]:
predictions = lin_reg_model.predict(X_test_scaled)
print(f"Predicted Labels: {predictions[:5]}")
print(f"Actual Labels: {list(y_test[:5])}")

Predicted Labels: [-0.44936263  5.96106118  8.78927286 -1.49140145 12.8269926 ]
Actual Labels: [0, 5, 15, 0, 15]


In [19]:
predictions_lin_reg = lin_reg_model.predict(X_test_scaled)
MSE = mean_squared_error(y_test, predictions_lin_reg)
r2 = lin_reg_model.score(X_test_scaled, y_test)

print(f"RMSE: {np.sqrt(MSE)}, R2: {r2}")

RMSE: 2.4574517852799835, R2: 0.7949090244324191
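
RMSE and R2 are two views of the same squared error: R2 equals one minus the MSE divided by the variance of the true targets. The short sketch below checks that identity on five values taken from the sample predictions printed earlier (rounded to two decimals), so it illustrates the formula rather than reproducing the full test-set numbers.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# First five test targets and (rounded) linear-regression predictions from above.
y_true = np.array([0.0, 5.0, 15.0, 0.0, 15.0])
y_pred = np.array([-0.45, 5.96, 8.79, -1.49, 12.83])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

# R2 = 1 - MSE / Var(y), using the population variance (ddof=0).
r2_manual = 1.0 - mse / np.var(y_true)

print(rmse, r2_manual, r2_score(y_true, y_pred))
```

Because the denominator depends only on the test targets, models evaluated on the same split can be ranked by either metric with the same result.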


# Ridge regression

In [21]:
from sklearn.linear_model import Ridge
reg_ridge = Ridge(alpha=.1)
reg_ridge.fit(X_train_scaled, y_train)

predictions_ridge = reg_ridge.predict(X_test_scaled)
MSE = mean_squared_error(y_test, predictions_ridge)
r2 = reg_ridge.score(X_test_scaled, y_test)

print(f"RMSE: {np.sqrt(MSE)}, R2: {r2}")

RMSE: 2.4574925236825735, R2: 0.7949022245854935


In [23]:
print(f"Training Data Score: {reg_ridge.score(X_train_scaled, y_train)}")
print(f"Testing Data Score: {reg_ridge.score(X_test_scaled, y_test)}")

Training Data Score: 0.7946673515033441
Testing Data Score: 0.7949022245854935


In [24]:
%%time
parameters = {'alpha': [x for x in range(11)]}
# define the grid search
Ridge_reg= GridSearchCV(reg_ridge, parameters, scoring='neg_mean_squared_error',cv=5, verbose = 3)

#fit the grid search
Ridge_reg.fit(X_train_scaled, y_train)

Fitting 5 folds for each of 11 candidates, totalling 55 fits
[CV 1/5] END ..........................alpha=0;, score=-6.271 total time=   0.0s
[CV 2/5] END ..........................alpha=0;, score=-6.361 total time=   0.0s
[CV 3/5] END ..........................alpha=0;, score=-6.440 total time=   0.0s
[CV 4/5] END ..........................alpha=0;, score=-6.131 total time=   0.0s
[CV 5/5] END ..........................alpha=0;, score=-6.272 total time=   0.0s
[CV 1/5] END ..........................alpha=1;, score=-6.256 total time=   0.0s
[CV 2/5] END ..........................alpha=1;, score=-6.269 total time=   0.0s
[CV 3/5] END ..........................alpha=1;, score=-6.418 total time=   0.0s
[CV 4/5] END ..........................alpha=1;, score=-6.128 total time=   0.0s
[CV 5/5] END ..........................alpha=1;, score=-6.170 total time=   0.0s
[CV 1/5] END ..........................alpha=2;, score=-6.255 total time=   0.0s
[CV 2/5] END ..........................alpha=2;,

GridSearchCV(cv=5, estimator=Ridge(alpha=0.1),
             param_grid={'alpha': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},
             scoring='neg_mean_squared_error', verbose=3)

In [25]:
Ridge_reg.best_params_

{'alpha': 2}
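
The per-fold scores in the log above are negated MSE values, which makes them awkward to compare with the RMSE figures reported elsewhere. A minimal sketch, on synthetic data with an illustrative alpha grid, of turning `cv_results_` into a small table with a cross-validated RMSE column:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data; the notebook uses X_train_scaled and y_train instead.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(150, 4))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=150)

search = GridSearchCV(Ridge(), {"alpha": [0.1, 1, 10]},
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X_toy, y_toy)

results = pd.DataFrame(search.cv_results_)[["param_alpha", "mean_test_score"]]
# The scorer is negated MSE, so flip the sign before taking the root.
# This is the root of the mean fold MSE, not the mean of the fold RMSEs.
results["cv_rmse"] = np.sqrt(-results["mean_test_score"])

print(results.sort_values("cv_rmse"))
```

Looking at the whole table rather than only `best_params_` shows how flat the error surface is, which matters when neighbouring alpha values differ only in the third decimal.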

In [26]:
Ridge_reg.best_estimator_

Ridge(alpha=2)

In [27]:
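# Best model (GridSearchCV has already refit it on the full training set, so the fit below is redundant but harmless)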
best_ridge_model = Ridge_reg.best_estimator_
best_ridge_model.fit(X_train_scaled, y_train)

Ridge(alpha=2)
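
With the default `refit=True`, `GridSearchCV` retrains the best configuration on all the data passed to `fit`, so `best_estimator_` is ready to use. The sketch below checks this on synthetic data by comparing its coefficients with a model fitted by hand using the selected alpha; the data and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the scaled training set.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(120, 3))
y_toy = X_toy @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.3, size=120)

search = GridSearchCV(Ridge(), {"alpha": [0.5, 1, 2]}, cv=5)  # refit=True by default
search.fit(X_toy, y_toy)

# Fit the winning alpha by hand on the same data and compare coefficients.
manual = Ridge(alpha=search.best_params_["alpha"]).fit(X_toy, y_toy)
same = np.allclose(search.best_estimator_.coef_, manual.coef_)

print(same)
```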

In [28]:
print(f"Training Data Score: {best_ridge_model.score(X_train_scaled, y_train)}")
print(f"Testing Data Score: {best_ridge_model.score(X_test_scaled, y_test)}")

Training Data Score: 0.7946494423867194
Testing Data Score: 0.794922873167468


In [29]:
predictions_ridge = best_ridge_model.predict(X_test_scaled)
MSE = mean_squared_error(y_test, predictions_ridge)
r2 = best_ridge_model.score(X_test_scaled, y_test)

print(f"RMSE: {np.sqrt(MSE)}, R2: {r2}")

RMSE: 2.457368814361574, R2: 0.794922873167468


# Random Forest Regressor

In [30]:
%%time
# Fit the data into model
rfr = RandomForestRegressor(n_estimators=200)
rfr.fit(X_train_scaled, y_train)

Wall time: 1min 14s


RandomForestRegressor(n_estimators=200)

In [31]:
sorted(zip(rfr.feature_importances_, X.columns), reverse=True)

[(0.8964824895865219, 'Age'),
 (0.009005134799600278, 'ValueEUR'),
 (0.008641971025045662, 'Overall'),
 (0.003619516528459422, 'WageEUR'),
 (0.0030708865033409094, 'Reactions'),
 (0.0028118624710873103, 'Crossing'),
 (0.002514057296041595, 'Positioning'),
 (0.0024825374674556765, 'Stamina'),
 (0.002125831068951181, 'HeadingAccuracy'),
 (0.0020680899191387776, 'Jumping'),
 (0.0019443015082688192, 'FKAccuracy'),
 (0.001927830193113276, 'Aggression'),
 (0.00186033652584893, 'Marking'),
 (0.0018497683050164966, 'BaseStats'),
 (0.0018217680215895369, 'Composure'),
 (0.0017978541308899265, 'BestPosition_code'),
 (0.0017120314047635772, 'Strength'),
 (0.0016976070565076463, 'Weight'),
 (0.0016891312984820476, 'Penalties'),
 (0.0016679280284369136, 'Agility'),
 (0.001667672364619912, 'GKReflexes'),
 (0.0016346022574873775, 'Vision'),
 (0.0016181322938855535, 'ShotPower'),
 (0.0015720571851052083, 'Height'),
 (0.0015631653977902335, 'Balance'),
 (0.0015296998442654046, 'SlidingTackle'),
 (0.001
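
The `feature_importances_` values above are impurity-based and computed on the training data, and Age alone accounts for about 0.90 of the total. A common cross-check is permutation importance on held-out data, which measures how much the score drops when one column is shuffled. The sketch below uses synthetic data with one dominant feature; the feature names `f0` to `f3` are placeholders, and the result says nothing about the FIFA data itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data in which column 0 drives most of the target.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(400, 4))
y_toy = 5.0 * X_toy[:, 0] + 0.5 * X_toy[:, 1] + rng.normal(scale=0.5, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Mean drop in R2 on the test split when each column is shuffled in turn.
perm = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=0)
for name, drop in zip(["f0", "f1", "f2", "f3"], perm.importances_mean):
    print(name, round(drop, 3))
```

If the two rankings agree, the impurity-based picture is probably trustworthy; a large disagreement would suggest looking more closely at correlated features.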

In [32]:
predictions = rfr.predict(X_test_scaled)
base_train_accuracy = round(rfr.score(X_train_scaled, y_train)*100,3)
base_test_accuracy = round(rfr.score(X_test_scaled, y_test)*100,3)
print(f"Training Data Score: {base_train_accuracy}")
print(f"Testing Data Score: {base_test_accuracy}")

Training Data Score: 98.968
Testing Data Score: 92.778


In [33]:
print(f"Predicted Labels: {predictions[:5]}")
print(f"Actual Labels: {list(y_test[:5])}")

Predicted Labels: [ 0.     4.38   9.76   0.    16.885]
Actual Labels: [0, 5, 15, 0, 15]


In [34]:
# mean_squared_error returns a single scalar, so no further averaging is needed
MSE_score = mean_squared_error(y_test, predictions)
print("Mean Squared Error", MSE_score)

Mean Squared Error 2.126703753894081


In [35]:
np.sqrt(MSE_score)

1.4583222393881543

In [36]:
# Get randomforest params
rfr.get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'criterion': 'squared_error',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 200,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}

In [37]:
# Create the GridSearchCV model
from sklearn.model_selection import GridSearchCV
param_grid = {
    'n_estimators': [100, 200,300]
}
rfr_grid = GridSearchCV(rfr, param_grid, cv=5, verbose=3)

In [38]:
%%time
# Train the model with GridSearch
rfr_grid.fit(X_train_scaled, y_train)

Fitting 5 folds for each of 3 candidates, totalling 15 fits
[CV 1/5] END ..................n_estimators=100;, score=0.918 total time=  28.4s
[CV 2/5] END ..................n_estimators=100;, score=0.923 total time=  28.6s
[CV 3/5] END ..................n_estimators=100;, score=0.920 total time=  28.4s
[CV 4/5] END ..................n_estimators=100;, score=0.923 total time=  28.7s
[CV 5/5] END ..................n_estimators=100;, score=0.925 total time=  27.9s
[CV 1/5] END ..................n_estimators=200;, score=0.919 total time=  55.3s
[CV 2/5] END ..................n_estimators=200;, score=0.924 total time=  55.2s
[CV 3/5] END ..................n_estimators=200;, score=0.921 total time=  53.9s
[CV 4/5] END ..................n_estimators=200;, score=0.924 total time=  54.0s
[CV 5/5] END ..................n_estimators=200;, score=0.925 total time=  53.3s
[CV 1/5] END ..................n_estimators=300;, score=0.919 total time= 1.4min
[CV 2/5] END ..................n_estimators=300;,

GridSearchCV(cv=5, estimator=RandomForestRegressor(n_estimators=200),
             param_grid={'n_estimators': [100, 200, 300]}, verbose=3)

In [39]:
rfr_grid.best_params_
# {'n_estimators': 300}

{'n_estimators': 300}

In [40]:
rfr_grid.score(X_train_scaled, y_train)

0.9897172994023559

In [41]:
rfr_grid.score(X_test_scaled, y_test)

0.9275252473297764

In [45]:
%%time
# Fit the data into model
rfr_tuned = RandomForestRegressor(n_estimators=300, random_state=1)
rfr_tuned.fit(X_train_scaled, y_train)

Wall time: 1min 49s


RandomForestRegressor(n_estimators=300, random_state=1)

In [46]:
print(f' Training Score: {rfr_tuned.score(X_train_scaled, y_train)}')
print(f' Testing Score: {rfr_tuned.score(X_test_scaled, y_test)}')

 Training Score: 0.9898423876603489
 Testing Score: 0.9277775626980825


In [47]:
rfr_predictions = rfr_tuned.predict(X_test_scaled)
rfr_mse = mean_squared_error(y_test, rfr_predictions)
rfr_rmse = np.sqrt(rfr_mse)
print(rfr_rmse)

1.4583031054573803


# SVR

In [39]:
%%time
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
svm_reg_ln = SVR(kernel="linear")
svm_reg_ln.fit(X_train_scaled, y_train)
predictions = svm_reg_ln.predict(X_test_scaled)
svm_ln_mse = mean_squared_error(y_test, predictions)
svm_ln_rmse = np.sqrt(svm_ln_mse)
print(svm_ln_rmse)

2.4818275822054643
Wall time: 43.2 s


In [40]:
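# The test score must use the held-out split. The identical values recorded below
# came from scoring the training split twice, so the test figure needs a re-run.
print(f' Training Score: {svm_reg_ln.score(X_train_scaled, y_train)}')
print(f' Testing Score: {svm_reg_ln.score(X_test_scaled, y_test)}')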

 Training Score: 0.7893478474803727
 Testing Score: 0.7893478474803727


In [41]:
%%time
svm_reg_rbf = SVR(kernel="rbf")
svm_reg_rbf.fit(X_train_scaled, y_train)
predictions = svm_reg_rbf.predict(X_test_scaled)
svm_rbf_mse = mean_squared_error(y_test, predictions)
svm_rbf_rmse = np.sqrt(svm_rbf_mse)
print(svm_rbf_rmse)

1.870760400772687
Wall time: 29.6 s


In [42]:
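# The test score must use the held-out split. The identical values recorded below
# came from scoring the training split twice, so the test figure needs a re-run.
print(f' Training Score: {svm_reg_rbf.score(X_train_scaled, y_train)}')
print(f' Testing Score: {svm_reg_rbf.score(X_test_scaled, y_test)}')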

 Training Score: 0.8861223626619954
 Testing Score: 0.8861223626619954


In [43]:
# %%time
# param_grid = {'kernel': ['linear', 'rbf'], 'C': [2**x for x in range(1,11)]}
# # train across 3 folds: 2 kernels x 10 values of C = 20 candidates, i.e. 20*3=60 rounds of training 
# svr_grid = GridSearchCV(estimator=SVR(), param_grid= param_grid, cv=3,
#                            return_train_score=True, verbose = 3)
# svr_grid.fit(X_train_scaled, y_train)

Fitting 3 folds for each of 20 candidates, totalling 60 fits
[CV 1/3] END C=2, kernel=linear;, score=(train=0.791, test=0.786) total time=  29.5s
[CV 2/3] END C=2, kernel=linear;, score=(train=0.793, test=0.783) total time=  31.6s
[CV 3/3] END C=2, kernel=linear;, score=(train=0.786, test=0.791) total time=  30.1s
[CV 1/3] END C=2, kernel=rbf;, score=(train=0.897, test=0.881) total time=  16.2s
[CV 2/3] END C=2, kernel=rbf;, score=(train=0.897, test=0.882) total time=  16.0s
[CV 3/3] END C=2, kernel=rbf;, score=(train=0.896, test=0.887) total time=  16.4s
[CV 1/3] END C=4, kernel=linear;, score=(train=0.791, test=0.786) total time=  48.1s
[CV 2/3] END C=4, kernel=linear;, score=(train=0.793, test=0.783) total time=  49.3s
[CV 3/3] END C=4, kernel=linear;, score=(train=0.786, test=0.791) total time=  48.6s
[CV 1/3] END C=4, kernel=rbf;, score=(train=0.913, test=0.895) total time=  17.4s
[CV 2/3] END C=4, kernel=rbf;, score=(train=0.913, test=0.895) total time=  16.8s
[CV 3/3] END C=4, k

GridSearchCV(cv=3, estimator=SVR(),
             param_grid={'C': [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024],
                         'kernel': ['linear', 'rbf']},
             return_train_score=True, verbose=3)

In [44]:
# print(svr_grid.best_params_)
# print(svr_grid.best_score_)
# {'C': 32, 'kernel': 'rbf'}
# 0.9096533136006023

{'C': 32, 'kernel': 'rbf'}
0.9096533136006023
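
The log above reports 20 candidates and 60 fits for this search. That count can be checked before launching a long run by expanding the grid with `ParameterGrid`; the snippet reuses the grid from the commented-out cell, and `n_candidates` and `n_fits` are illustrative names.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the SVR search: 2 kernels x 10 values of C.
param_grid = {"kernel": ["linear", "rbf"], "C": [2**x for x in range(1, 11)]}

n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 3  # cv=3 folds per candidate

print(n_candidates, n_fits)
```

Given that single linear-kernel fits already take 30 to 50 seconds in the log, estimating the number of fits up front helps decide whether the grid should be narrowed.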


In [46]:
%%time
svm_reg_rbf_tuned = SVR(kernel="rbf", C = 32)
svm_reg_rbf_tuned.fit(X_train_scaled, y_train)

Wall time: 2min 9s


SVR(C=32)

In [49]:
print(f' Training Score: {svm_reg_rbf_tuned.score(X_train_scaled, y_train)}')
print(f' Testing Score: {svm_reg_rbf_tuned.score(X_test_scaled, y_test)}')

 Training Score: 0.9484316955397023
 Testing Score: 0.9188830311313904


In [53]:
svm_predictions = svm_reg_rbf_tuned.predict(X_test_scaled)
svm_rbf_tuned_mse = mean_squared_error(y_test, svm_predictions)
svm_rbf_tuned_rmse = np.sqrt(svm_rbf_tuned_mse)
print(svm_rbf_tuned_rmse)

1.5454949391538066


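So, the random forest regressor performs best on the test set (R2 of about 0.928, RMSE of about 1.458), ahead of the tuned RBF SVR (R2 of about 0.919, RMSE of about 1.545) and well ahead of linear and ridge regression (R2 of about 0.795, RMSE of about 2.457). Its training R2 of about 0.990 is noticeably higher than its test R2, so the forest overfits to some degree.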
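
The test-set figures printed in the cells above can be collected in one place to make the comparison explicit. The values below are copied from those outputs and rounded to four decimals; the list name `reported` and the layout are illustrative.

```python
# (model, test RMSE, test R2) as printed in the cells above, rounded to 4 decimals.
reported = [
    ("Linear regression", 2.4575, 0.7949),
    ("Ridge (alpha=2)", 2.4574, 0.7949),
    ("Random forest (300 trees)", 1.4583, 0.9278),
    ("SVR (rbf, C=32)", 1.5455, 0.9189),
]

# Rank by test RMSE, lowest first.
ranked = sorted(reported, key=lambda row: row[1])
for name, rmse, r2 in ranked:
    print(f"{name:<28} RMSE={rmse:.4f}  R2={r2:.4f}")

best_model = ranked[0][0]
```

All four rows come from the same train/test split (`random_state=42`), so the ranking is a like-for-like comparison on that split only.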