# FIFA 19 Cluster Analysis

## Problem Definition
Individually scouting the thousands of players in the FIFA 19 dataset to identify transfer targets is impractical. A player acquisition has to weigh team positional need, transfer fee, and player wage, and a team should balance these variables to maximize its return on investment. A player's ability develops and then plateaus, so timing the acquisition within a player's career is critical to maximizing value.

To improve the scouting process, players are clustered. The dataset's features allow similar players to be grouped together. The goal of the cluster analysis is to group players by the value they can offer a team. The most valuable cluster contains players who have the potential to develop into highly rated players and whose market value is currently low but could rise substantially. This undervalued cluster can serve as a list of suggested transfer targets that would be quality assets for a team.

In [1]:
# import libraries
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import collections

from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

sns.set_style('whitegrid')

In [2]:
df = pd.read_csv('cleaned data/fifa19data_clean.csv')
print(df.columns)

Index(['Name', 'Age', 'Nationality', 'Overall', 'Potential', 'Club', 'Value',
       'Wage', 'Special', 'Preferred Foot', 'International Reputation',
       'Weak Foot', 'Skill Moves', 'Work Rate', 'Position', 'Height', 'Weight',
       'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM',
       'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB',
       'LCB', 'CB', 'RCB', 'RB', 'Crossing', 'Finishing', 'HeadingAccuracy',
       'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy',
       'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility',
       'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',
       'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle',
       'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes'],
      dtype='object')


## Feature Engineering

In [3]:
# split Work Rate into 2 separate features
work_rate = df['Work Rate'].str.split('/ ', expand=True)
df['Attack Work Rate'] = work_rate[0]
df['Defensive Work Rate'] = work_rate[1]

In [4]:
# group similar positions together and make new feature 'Position Category'
def groupPosition(position):
    forward = ['RS', 'LS', 'RF', 'LF', 'CF', 'ST']
    attack_mid = ['RAM', 'LAM', 'CAM']
    wings = ['RM', 'RW', 'LM', 'LW']
    central_mid = ['CM', 'LCM', 'RCM']
    defensive_mid = ['CDM', 'LDM', 'RDM']
    fullback = ['RB', 'RWB', 'LB', 'LWB']
    cb_def = ['CB', 'LCB', 'RCB']

    if position == 'GK':
        return 'GK'
    elif position in forward:
        return 'FW'
    elif position in attack_mid:
        return 'AM'
    elif position in wings:
        return 'W'
    elif position in central_mid:
        return 'CM'
    elif position in defensive_mid:
        return 'DM'
    elif position in fullback:
        return 'FB'
    elif position in cb_def:
        return 'CB'

df['Position Category'] = df['Position'].apply(groupPosition)
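
The same grouping can also be written as a lookup table. The sketch below is one possible alternative, not the notebook's method: `POSITION_GROUPS` and `POSITION_TO_GROUP` are hypothetical names, and the sample positions (including the unknown code `XX`) are made up. Like `groupPosition`, which returns `None` when no branch matches, the lookup leaves unrecognized positions missing.

```python
import pandas as pd

# Hypothetical lookup table mirroring the lists in groupPosition.
POSITION_GROUPS = {
    "GK": ["GK"],
    "FW": ["RS", "LS", "RF", "LF", "CF", "ST"],
    "AM": ["RAM", "LAM", "CAM"],
    "W": ["RM", "RW", "LM", "LW"],
    "CM": ["CM", "LCM", "RCM"],
    "DM": ["CDM", "LDM", "RDM"],
    "FB": ["RB", "RWB", "LB", "LWB"],
    "CB": ["CB", "LCB", "RCB"],
}

# Invert the table so each position code points at its group.
POSITION_TO_GROUP = {
    position: group
    for group, members in POSITION_GROUPS.items()
    for position in members
}

# Made-up sample; "XX" is deliberately not a known position code.
positions = pd.Series(["ST", "RWB", "GK", "XX"])
categories = positions.map(POSITION_TO_GROUP)
print(categories)
```

Counting the missing values in the mapped column is a quick way to spot position codes that were not assigned to any group.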

In [5]:
# one-hot encode categorical features
categorical = ['Preferred Foot', 'Attack Work Rate', 'Defensive Work Rate', 'Position Category']
dummy_prefix = ['Foot', 'AWR', 'DWR', 'Pos']

for column, prefix in zip(categorical, dummy_prefix):
    df = pd.concat([df, pd.get_dummies(df[column], prefix=prefix)], axis=1)
    
df_raw = df.copy()

In [6]:
# standardize numerical features (zero mean, unit variance);
# the one-hot columns created above are standardized as well

cat_cols = ['Name', 'Nationality', 'Club', 'Preferred Foot', 'Work Rate', 'Attack Work Rate', 'Defensive Work Rate', 'Position', 'Position Category']

num_cols = [col for col in df.columns if col not in cat_cols]

for col in num_cols:
    df[col] = StandardScaler().fit_transform(df[col].values.reshape(-1, 1))
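
`StandardScaler` standardizes every column independently, so scaling column by column and scaling all numeric columns in one call should give the same result. The sketch below checks this on a small made-up frame (`toy` and its values are assumptions, not the FIFA data). It also shows why a one-hot column no longer holds 0 and 1 after scaling, which is the reason the `Pos_*` columns in the later tables contain values such as 2.41 and -0.41.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the FIFA 19 data.
toy = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Age": [18, 22, 27, 33],
    "Overall": [60, 70, 80, 75],
    "Pos_FW": [1, 0, 0, 0],
})
num_cols = toy.select_dtypes(include="number").columns

# Column-by-column scaling, as in the loop above.
looped = pd.DataFrame(index=toy.index)
for col in num_cols:
    column = toy[col].to_numpy(dtype=float).reshape(-1, 1)
    looped[col] = StandardScaler().fit_transform(column).ravel()

# One call over all numeric columns.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(toy[num_cols]),
    columns=num_cols,
    index=toy.index,
)

print(np.allclose(looped.to_numpy(), scaled.to_numpy()))
print(scaled["Pos_FW"].round(3).tolist())
```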



## Model Training 1

In [7]:
# select features to cluster data by
X_columns = ['Age', 'Overall', 'Potential', 'Value', 'Wage', 'Pos_FW', 'Pos_AM', 'Pos_W', 'Pos_CM', 
             'Pos_DM', 'Pos_FB', 'Pos_CB', 'Pos_GK']

In [8]:
model = DBSCAN(eps=2, min_samples=30)
model.fit(df[X_columns])

cluster_labels = model.labels_
n_clusters = len(set(cluster_labels) - {-1})  # exclude DBSCAN's noise label (-1)
print(collections.Counter(cluster_labels))

df['cluster'] = cluster_labels

Counter({2: 3060, 4: 2932, 5: 2769, 0: 2639, 3: 2147, 1: 2008, 6: 1423, 7: 980, -1: 201})
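
The values `eps=2` and `min_samples=30` are set by hand. One common heuristic for choosing `eps` is to sort every point's distance to its k-th nearest neighbour and look for the bend in that curve. The sketch below computes the sorted k-distances with NumPy on synthetic data; the data, the seed, and `k = 5` are assumptions for illustration. scikit-learn counts the point itself in `min_samples`, so `k` is usually chosen close to that setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data: two tight blobs plus a few scattered points.
X = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 2)),
    rng.normal(5.0, 0.3, size=(100, 2)),
    rng.uniform(-3.0, 8.0, size=(10, 2)),
])

k = 5  # neighbour rank to inspect (an assumption)

# Pairwise Euclidean distances via broadcasting.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Column 0 of each sorted row is the point itself (distance 0),
# so column k is the distance to the k-th nearest neighbour.
k_dist = np.sort(np.sort(dist, axis=1)[:, k])

print(k_dist[:3], k_dist[-3:])
```

Plotting `k_dist` against its index and reading `eps` off the point where the curve turns sharply upward is the usual next step. The full pairwise matrix is only practical for small samples.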


## Model Evaluation

In [9]:
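# Inter-Cluster
# NOTE: reshape(-1, 1) turns each centroid into a column of one-feature samples,
# so [0][0] is only the absolute difference in the first feature (Age);
# reshape(1, -1), as in the inertia loop, would compare the full vectors.
# The mean also includes the noise label (-1) and each centroid's zero distance to itself.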
centroids = []
for cluster in sorted(set(model.labels_)):
    centroids.append(df[df['cluster']==cluster][X_columns].mean().values)
distances = []
for c1 in centroids:
    for c2 in centroids:
        distances.append(euclidean_distances(c1.reshape(-1, 1), c2.reshape(-1, 1))[0][0])
print('Inter Cluster distance', np.mean(distances))

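# Intra-Cluster
# NOTE: reshape(-1, 1) with [0][0] compares only the first feature (Age) of each row to the centroid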
distances = []
for cluster in sorted(set(model.labels_)):
    df_filter = df[df['cluster']==cluster]
    centroid = df_filter[X_columns].mean().values
    for k, v in df_filter[X_columns].iterrows():
        distances.append(euclidean_distances(centroid.reshape(-1, 1), v.values.reshape(-1, 1))[0][0])
print('Intra Cluster distance', np.mean(distances))

# Inertia
distances = []
for cluster in sorted(set(model.labels_)):
    df_filter = df[df['cluster']==cluster]
    centroid = df_filter[X_columns].mean().values
    for k, v in df_filter[X_columns].iterrows():
        distances.append(euclidean_distances(centroid.reshape(1, -1), v.values.reshape(1, -1), squared=True)[0][0])
print('Inertia', np.sum(distances))

Inter Cluster distance 0.2018833868622251
Intra Cluster distance 0.8182788896248376
Inertia 72442.4264390069
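
How the vectors are reshaped decides what `euclidean_distances` measures. The sketch below uses two made-up three-feature centroids (`c1` and `c2` are assumptions) to contrast the column-vector form used in the inter- and intra-cluster loops with the row-vector form used in the inertia loop.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Two hypothetical centroids with three features each.
c1 = np.array([0.0, 0.0, 0.0])
c2 = np.array([1.0, 2.0, 2.0])

# Column vectors: three one-feature "samples" each, so [0][0] only
# compares the first feature of c1 with the first feature of c2.
first_feature_only = euclidean_distances(
    c1.reshape(-1, 1), c2.reshape(-1, 1)
)[0][0]

# Row vectors: one sample with three features, i.e. the full distance.
full_distance = euclidean_distances(
    c1.reshape(1, -1), c2.reshape(1, -1)
)[0][0]

print(first_feature_only, full_distance)  # approximately 1.0 and 3.0
```

Under this reading, the inter- and intra-cluster figures printed above reflect differences in the first listed feature only, while the inertia uses all features.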


## Cluster Descriptions 1

In [10]:
df[df['cluster']==-1]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
0,L. Messi,1.258441,Argentina,4.013364,3.697415,FC Barcelona,19.296676,25.211255,2.213984,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.410140,-0.354276,-0.442195,-1
1,Cristiano Ronaldo,1.686666,Portugal,4.013364,3.697415,Juventus,13.315778,17.946385,2.309273,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.410140,-0.354276,-0.442195,-1
2,Neymar Jr,0.187878,Brazil,3.724114,3.534396,Paris Saint-Germain,20.724951,12.724759,1.997752,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,-1
3,De Gea,0.401990,Spain,3.579489,3.534396,Manchester United,12.423106,11.362595,-0.465097,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.822660,-0.442195,-1
4,K. De Bruyne,0.401990,Belgium,3.579489,3.371377,Manchester City,17.779135,15.676112,2.503515,Right,...,-1.685689,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,-1
5,E. Hazard,0.401990,Belgium,3.579489,3.208358,Chelsea,16.172326,14.995031,1.994087,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.410140,-0.354276,-0.442195,-1
6,L. Modrić,1.472553,Croatia,3.579489,3.208358,Real Madrid,11.530435,18.627466,2.499850,Right,...,-1.685689,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,-1
7,L. Suárez,1.258441,Uruguay,3.579489,3.208358,FC Barcelona,13.851381,20.216657,2.741737,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.410140,-0.354276,-0.442195,-1
8,Sergio Ramos,1.472553,Spain,3.579489,3.208358,Real Madrid,8.673886,16.811248,2.210319,Right,...,0.593229,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,-1
9,J. Oblak,-0.026235,Slovenia,3.434863,3.534396,Atlético Madrid,11.708969,3.825292,-0.978191,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.822660,-0.442195,-1


In [11]:
df[df['cluster']==0]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
76,Iniesta,1.900779,Spain,2.856362,2.393263,Vissel Kobe,3.407125,0.510694,1.686231,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
109,Z. Ibrahimović,2.329004,Sweden,2.711737,2.230244,LA Galaxy,2.068118,0.238262,1.517643,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
130,Iago Aspas,1.044328,Spain,2.567112,2.067225,RC Celta,4.924666,1.600425,1.682566,Left,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
145,Jonas,1.900779,Brazil,2.567112,2.067225,SL Benfica,2.514453,0.692316,1.557957,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
151,A. Gómez,1.044328,Argentina,2.567112,2.067225,Atalanta,4.924666,1.963669,1.653246,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
159,Louri Beretta,0.187878,Brazil,2.422487,1.904205,Atlético Mineiro,4.656865,2.281507,1.184132,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
174,Gerard Moreno,0.187878,Spain,2.422487,2.230244,Villarreal CF,5.281735,1.963669,1.352720,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
193,Rodrigo,0.401990,Spain,2.422487,2.067225,Valencia CF,5.013933,2.145290,1.693561,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
204,B. Dost,0.830215,Netherlands,2.422487,1.904205,Sporting CP,4.210529,0.737722,0.715018,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0
206,M. Balotelli,0.401990,Italy,2.422487,1.904205,OGC Nice,4.478331,2.054479,1.158477,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,2.41014,-0.354276,-0.442195,0


In [12]:
df[df['cluster']==1]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
41,G. Buffon,3.185454,Italy,3.145613,2.719301,Paris Saint-Germain,0.282775,3.053399,-0.967196,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
126,A. Lopes,0.401990,Portugal,2.567112,2.393263,Olympique Lyonnais,4.031995,2.826372,-0.890232,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
133,L. Hrádecký,0.616103,Finland,2.567112,2.067225,Bayer 04 Leverkusen,3.496392,2.917183,-1.707516,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
141,Sergio Asenjo,0.830215,Spain,2.567112,2.230244,Villarreal CF,3.853461,1.464209,-0.970861,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
147,S. Ruffier,1.258441,France,2.567112,2.067225,AS Saint-Étienne,2.960789,1.418803,-1.260392,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
149,K. Schmeichel,1.258441,Denmark,2.567112,2.067225,Leicester City,2.960789,3.098805,-0.904892,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
180,J. Pickford,-0.240348,England,2.422487,2.719301,Everton,4.031995,3.098805,-0.597035,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
189,T. Horn,-0.026235,Germany,2.422487,2.556282,1. FC Köln,3.853461,0.692316,-1.234737,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
196,Neto,0.616103,Brazil,2.422487,2.067225,Valencia CF,3.317858,1.418803,-1.110129,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1
197,O. Baumann,0.616103,Germany,2.422487,2.067225,TSG 1899 Hoffenheim,3.317858,1.464209,-1.377671,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,2.82266,-0.442195,1


In [13]:
df[df['cluster']==2]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
102,Naldo,2.114891,Brazil,2.711737,2.230244,FC Schalke 04,1.175446,1.282587,1.323400,Right,...,0.593229,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
104,Miranda,1.686666,Brazil,2.711737,2.230244,Inter,2.335919,3.916103,1.030204,Right,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
108,Pepe,2.114891,Portugal,2.711737,2.230244,Beşiktaş JK,1.175446,2.145290,0.857951,Right,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
127,S. de Vrij,0.187878,Netherlands,2.567112,2.393263,Inter,5.192468,3.552859,0.769992,Right,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
150,Raúl Albiol,1.472553,Spain,2.567112,2.067225,Napoli,2.782255,3.552859,0.612399,Right,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
152,A. Barzagli,2.543116,Italy,2.567112,2.067225,Juventus,0.318482,3.870697,0.520775,Right,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
158,Josué Chiamulera,0.187878,Brazil,2.422487,1.904205,Grêmio,3.585659,1.509614,1.085178,Right,...,0.593229,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
160,P. Kimpembe,-0.668573,France,2.422487,2.882320,Paris Saint-Germain,5.013933,3.280426,1.103503,Left,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
165,J. Tah,-0.668573,Germany,2.422487,2.719301,Bayer 04 Leverkusen,4.835399,2.599345,0.275223,Right,...,0.593229,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2
175,Felipe,0.830215,Brazil,2.422487,1.904205,FC Porto,3.139323,0.556100,0.696693,Right,...,-1.685689,-0.241409,2.209187,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,2


In [14]:
df[df['cluster']==3]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
143,A. Witsel,0.830215,Belgium,2.567112,2.067225,Borussia Dortmund,3.853461,3.053399,2.206654,Right,...,0.593229,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
178,Manu Trigueros,0.187878,Spain,2.422487,2.067225,Villarreal CF,4.924666,1.782047,1.917123,Right,...,0.593229,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
190,Allan,0.401990,Brazil,2.422487,1.904205,Napoli,4.299796,3.507454,2.049061,Right,...,-1.685689,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
194,Pizzi,0.616103,Portugal,2.422487,1.904205,SL Benfica,4.121262,0.556100,2.019741,Right,...,0.593229,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
195,K. Kampl,0.401990,Slovenia,2.422487,1.904205,RB Leipzig,4.299796,2.917183,1.752200,Right,...,-1.685689,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
205,Oscar,0.187878,Brazil,2.422487,2.067225,Shanghai SIPG FC,5.013933,0.873938,1.554292,Right,...,0.593229,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
227,Rodri,-0.668573,Spain,2.277861,2.556282,Atlético Madrid,4.746132,2.236101,1.675236,Right,...,-1.685689,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
230,Rosberto Dourado,1.044328,Brazil,2.277861,1.741186,Atlético Mineiro,2.514453,1.645830,1.697225,Right,...,0.593229,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
235,L. Torreira,-0.668573,Uruguay,2.277861,2.556282,Arsenal,4.031995,4.006913,1.880473,Right,...,-1.685689,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3
240,A. Kramarić,0.401990,Croatia,2.277861,1.904205,TSG 1899 Hoffenheim,4.210529,2.009074,1.521308,Right,...,0.593229,-0.241409,-0.452655,2.707363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,3


In [16]:
df[df['cluster']==4]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
153,Quaresma,1.900779,Portugal,2.567112,2.067225,Beşiktaş JK,2.335919,3.189615,1.429684,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
154,A. Robben,1.900779,Netherlands,2.567112,2.067225,FC Bayern München,2.335919,4.551779,1.634921,Left,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
157,Ronaldo Cabrais,0.187878,Brazil,2.422487,1.904205,Grêmio,4.567598,1.872858,1.821834,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
164,K. Coman,-0.668573,France,2.422487,2.556282,FC Bayern München,5.638803,3.416643,1.173137,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
167,T. Werner,-0.668573,Germany,2.422487,2.556282,RB Leipzig,5.728071,2.735561,1.352720,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
170,Q. Promes,0.187878,Netherlands,2.422487,2.067225,Sevilla FC,5.013933,0.828532,1.678901,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
173,Y. Carrasco,-0.240348,Belgium,2.422487,2.393263,Dalian YiFang FC,5.460269,0.465289,1.528638,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
221,F. Ribéry,2.114891,France,2.422487,1.904205,FC Bayern München,1.621782,2.826372,1.213452,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
233,Gelson Martins,-0.454460,Portugal,2.277861,2.556282,Atlético Madrid,4.835399,2.236101,1.557957,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4
234,Gonçalo Guedes,-0.882685,Portugal,2.277861,2.393263,Valencia CF,4.746132,1.191776,1.495653,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,2.261447,4


In [17]:
df[df['cluster']==5]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
106,Filipe Luís,1.472553,Brazil,2.711737,2.230244,Atlético Madrid,3.407125,3.235021,2.005082,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
200,Marcos Alonso,0.401990,Spain,2.422487,2.067225,Chelsea,3.853461,5.459888,2.210319,Left,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
216,L. Bender,0.830215,Germany,2.422487,1.904205,Bayer 04 Leverkusen,3.139323,3.280426,1.653246,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
238,L. Hernández,-0.668573,France,2.277861,2.719301,Atlético Madrid,4.121262,1.963669,1.301410,Left,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
241,A. Robertson,-0.240348,Scotland,2.277861,2.556282,Liverpool,3.942728,4.006913,1.557957,Left,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
247,João Cancelo,-0.240348,Portugal,2.277861,2.719301,Juventus,4.121262,3.870697,2.111365,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
258,A. Florenzi,0.401990,Italy,2.277861,1.904205,Roma,3.228591,2.871777,2.400897,Right,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
276,Mário Fernandes,0.401990,Russia,2.277861,1.741186,PFC CSKA Moscow,2.871522,-0.397415,1.847489,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
290,K. Trippier,0.401990,England,2.277861,1.741186,Tottenham Hotspur,2.871522,4.052319,2.063721,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5
291,A. Kolarov,1.472553,Serbia,2.277861,1.741186,Roma,1.800316,2.871777,2.316603,Left,...,0.593229,-0.241409,-0.452655,-0.369363,-0.293368,2.353023,-0.414914,-0.354276,-0.442195,5


In [15]:
df[df['cluster']==6]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
187,Danilo Pereira,0.187878,Portugal,2.422487,2.393263,FC Porto,4.389063,0.510694,1.576282,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
203,K. Strootman,0.616103,Netherlands,2.422487,1.904205,Olympique de Marseille,3.317858,2.553939,1.983092,Left,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
215,Javi Martínez,0.830215,Spain,2.422487,1.904205,FC Bayern München,3.139323,3.825292,1.283086,Right,...,0.593229,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
217,Lucas Leiva,1.258441,Brazil,2.422487,1.904205,Lazio,2.692988,2.463128,1.821834,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
223,D. De Rossi,1.900779,Italy,2.422487,1.904205,Roma,0.996912,3.189615,1.843824,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
249,T. Partey,-0.026235,Ghana,2.277861,2.719301,Atlético Madrid,4.121262,2.236101,1.722880,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
287,S. Nzonzi,0.830215,France,2.277861,1.741186,Roma,2.603721,3.053399,1.495653,Right,...,0.593229,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
298,M. Parolo,1.686666,Italy,2.277861,1.741186,Lazio,1.175446,2.236101,2.027071,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
299,L. Fejsa,0.830215,Serbia,2.277861,1.741186,SL Benfica,2.603721,0.374478,1.272091,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6
313,L. Biglia,1.472553,Argentina,2.277861,1.741186,Milan,1.800316,4.006913,1.854818,Right,...,-1.685689,-0.241409,-0.452655,-0.369363,3.408692,-0.424985,-0.414914,-0.354276,-0.442195,6


In [18]:
df[df['cluster']==7]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
208,Giuliano,0.616103,Brazil,2.422487,1.904205,Al Nassr,4.210529,2.463128,1.741205,Right,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
282,J. Pastore,0.830215,Argentina,2.277861,1.741186,Roma,3.407125,3.371237,1.656911,Right,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
295,N. Gaitán,1.044328,Argentina,2.277861,1.741186,Dalian YiFang FC,3.317858,0.556100,1.510313,Left,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
320,Laure Santeiro,1.044328,Brazil,2.133236,1.578167,Fluminense,2.603721,1.373398,0.934915,Left,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
325,Pablo Fornals,-0.668573,Spain,2.133236,2.393263,Villarreal CF,3.853461,1.191776,1.733875,Right,...,-1.685689,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
337,João Mário,-0.026235,Portugal,2.133236,2.067225,Inter,3.585659,2.780967,1.675236,Right,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
360,J. Iličić,1.044328,Slovenia,2.133236,1.578167,Atalanta,2.603721,1.418803,1.385704,Left,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
365,D. Tadić,0.830215,Serbia,2.133236,1.578167,Ajax,2.692988,0.737722,1.371045,Left,...,-1.685689,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
379,S. Kagawa,0.830215,Japan,2.133236,1.578167,Borussia Dortmund,2.692988,2.508534,1.283086,Right,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7
402,Borja Valero,1.686666,Spain,2.133236,1.578167,Inter,1.532515,2.917183,1.616596,Right,...,0.593229,4.142342,-0.452655,-0.369363,-0.293368,-0.424985,-0.414914,-0.354276,-0.442195,7


The initial analysis clustered players by position group, with the exception of the unclustered (noise) group, which contained many of the game's top players from all position groups. Cluster labels for each position group were as follows:

        -1: unclustered
        0: forward
        1: goalkeeper
        2: centre back
        3: central midfield
        4: wingers
        5: fullback
        6: defensive midfield
        7: attacking midfield

This clustering did not separate players by age, overall rating, potential, value, or wage: every cluster spans a wide range of values for these features. For scouting players and finding good-value transfer targets, it therefore produced no meaningful clusters.
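
One way to check how widely a feature varies inside each cluster is to compare its per-cluster minimum and maximum. The sketch below does this on a made-up labelled frame; `toy` and its values are assumptions, and the real check would run on the clustered frame with the features of interest.

```python
import pandas as pd

# Hypothetical miniature of a labelled frame; values are made up.
toy = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, 1],
    "Age": [17, 25, 38, 18, 27, 40],
    "Overall": [55, 70, 86, 52, 71, 88],
})

# Minimum and maximum of each feature inside each cluster.
spread = toy.groupby("cluster")[["Age", "Overall"]].agg(["min", "max"])

# Width of the range per cluster and feature.
span = spread.xs("max", axis=1, level=1) - spread.xs("min", axis=1, level=1)
print(span)
```

Spans that are close to the range of the whole dataset indicate that the cluster label carries little information about that feature.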

## Model Training 2

In [19]:
# select features to cluster data by
X_columns = ['Age', 'Overall', 'Potential', 'Value', 'Wage']

In [20]:
model = DBSCAN(eps=5, min_samples=15)
model.fit(df[X_columns])

cluster_labels = model.labels_
n_clusters = len(set(cluster_labels) - {-1})  # exclude DBSCAN's noise label (-1)
print(collections.Counter(cluster_labels))

Counter({0: 18157, -1: 2})


In [21]:
# select features to cluster data by
X_columns = ['Age', 'Overall', 'Potential', 'Value', 'Wage', 'Pos_FW', 'Pos_AM', 'Pos_W', 'Pos_CM', 
             'Pos_DM', 'Pos_FB', 'Pos_CB', 'Pos_GK']

In [22]:
model = DBSCAN(eps=2, min_samples=15)
model.fit(df_raw[X_columns])

cluster_labels = model.labels_
n_clusters = len(set(cluster_labels) - {-1})  # exclude DBSCAN's noise label (-1)
print(collections.Counter(cluster_labels))

df_raw['cluster'] = cluster_labels

Counter({1: 12861, -1: 5218, 4: 16, 0: 15, 3: 15, 2: 14, 6: 12, 5: 8})
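
The runs above change `eps`, `min_samples`, and the feature set by hand. A small sweep can make the effect of `eps` on the number of clusters and the amount of noise easier to compare. The sketch below uses synthetic blobs; the data, the seed, and the three `eps` values are assumptions chosen to show the extremes.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical data: two well-separated blobs of 50 points each.
X = np.vstack([
    rng.normal(0.0, 0.2, size=(50, 2)),
    rng.normal(5.0, 0.2, size=(50, 2)),
])

results = {}
for eps in (0.01, 0.5, 20.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit(X).labels_
    n_clusters = len(set(labels) - {-1})   # -1 is DBSCAN's noise label
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(eps, n_clusters, n_noise)
```

A very small `eps` marks every point as noise and a very large one merges everything into a single cluster, which matches the pattern of the `eps=5` run above.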


## Model Evaluation with Unscaled Data

In [23]:
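# Inter-Cluster
# NOTE: as written, this cell filters df (scaled data, labels from the first model)
# rather than df_raw, so its figures do not describe the clustering fitted on df_raw.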
centroids = []
for cluster in sorted(set(model.labels_)):
    centroids.append(df[df['cluster']==cluster][X_columns].mean().values)
distances = []
for c1 in centroids:
    for c2 in centroids:
        distances.append(euclidean_distances(c1.reshape(-1, 1), c2.reshape(-1, 1))[0][0])
print('Inter Cluster distance', np.mean(distances))

# Intra-Cluster
distances = []
for cluster in sorted(set(model.labels_)):
    df_filter = df[df['cluster']==cluster]
    centroid = df_filter[X_columns].mean().values
    for k, v in df_filter[X_columns].iterrows():
        distances.append(euclidean_distances(centroid.reshape(-1, 1), v.values.reshape(-1, 1))[0][0])
print('Intra Cluster distance', np.mean(distances))

# Inertia
distances = []
for cluster in sorted(set(model.labels_)):
    df_filter = df[df['cluster']==cluster]
    centroid = df_filter[X_columns].mean().values
    for k, v in df_filter[X_columns].iterrows():
        distances.append(euclidean_distances(centroid.reshape(1, -1), v.values.reshape(1, -1), squared=True)[0][0])
print('Inertia', np.sum(distances))

Inter Cluster distance 0.2073226601303598
Intra Cluster distance 0.8154145950076919
Inertia 68900.1226028749


## Cluster Descriptions 2

Running DBSCAN on unscaled features yielded very different results from the analysis on scaled data. No meaningful clusters formed when scaled age, overall, potential, value, and wage were the only features: all but two players fell into a single cluster. With unscaled data and position groups, the clusters were more informative. The reported inertia is also lower than in the initial analysis (68,900 versus 72,442), but the evaluation cell computes its metrics from `df` rather than `df_raw`, so the two figures are not directly comparable. Because the features were unscaled, age, overall, potential, value, and wage carried more weight than the position groups. The result was one cluster holding the majority of players (12,861), with a sizeable share (5,218) left unclustered. As expected, this analysis grouped players mainly by age, ratings, value, and wage.
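
The shift in weight between ratings and position can be illustrated with a small distance calculation. The sketch below compares three made-up players before and after dividing each feature by a standard deviation. The players and the values in `scale` are rough assumptions chosen to resemble the spacing seen in the scaled tables, not figures computed from the dataset.

```python
import numpy as np

# Hypothetical players: [Overall, Pos_FW, Pos_GK]; values are illustrative.
forward_72 = np.array([72.0, 1.0, 0.0])
keeper_72 = np.array([72.0, 0.0, 1.0])
forward_62 = np.array([62.0, 1.0, 0.0])

def dist(a, b, scale=None):
    """Euclidean distance, optionally after dividing by per-feature scales."""
    if scale is not None:
        a, b = a / scale, b / scale
    return float(np.linalg.norm(a - b))

# Assumed standard deviations for Overall, Pos_FW, Pos_GK.
scale = np.array([6.9, 0.35, 0.31])

# Unscaled: a 10-point rating gap dwarfs a change of position.
raw_position_change = dist(forward_72, keeper_72)
raw_rating_gap = dist(forward_72, forward_62)

# Scaled: the change of position now outweighs the rating gap.
scaled_position_change = dist(forward_72, keeper_72, scale)
scaled_rating_gap = dist(forward_72, forward_62, scale)

print(raw_position_change, raw_rating_gap)
print(scaled_position_change, scaled_rating_gap)
```

Under these assumptions a change of position costs more than `eps=2` on scaled data, which is consistent with the first model splitting players by position group.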

In [25]:
print(df_raw['Overall'].mean())
print(df_raw['Potential'].mean())
print(df_raw['Value'].mean())

66.24990362905446
71.31912550250564
2.4161313949005554


In [26]:
df_raw[df_raw['cluster']==1].sort_values(by='Potential', ascending=False).head()

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
11916,J. Carbonero,18,Colombia,64,82,Once Caldas,0.925,1.0,1522,Right,...,1,0,0,0,0,0,0,0,1,1
13568,A. Almendra,18,Argentina,62,81,Boca Juniors,0.625,2.0,1630,Right,...,1,0,0,1,0,0,0,0,0,1
9685,M. Coulibaly,19,Senegal,66,81,Udinese,1.4,4.0,1763,Right,...,1,0,0,1,0,0,0,0,0,1
12791,Y. Dhanda,19,England,63,81,Swansea City,0.8,3.0,1655,Right,...,1,1,0,0,0,0,0,0,0,1
10113,J. Pérez,20,United States,65,81,Los Angeles FC,1.2,2.0,1623,Left,...,1,0,0,0,0,0,0,0,1,1


In [27]:
df_raw[df_raw['cluster']==1].sort_values(by='Value', ascending=False).head()

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
2210,Míchel,29,Spain,74,74,Real Valladolid CF,5.5,18.0,1891,Right,...,1,0,0,1,0,0,0,0,0,1
2219,J. Kembo-Ekoko,30,DR Congo,74,74,Bursaspor,5.5,18.0,1873,Right,...,0,0,0,0,0,0,0,0,1,1
2301,Diogo Figueiras,27,Portugal,74,75,SC Braga,5.0,12.0,1964,Right,...,1,0,0,0,0,1,0,0,0,1
2432,J. Kana-Biyik,28,Cameroon,74,75,Kayserispor,5.0,13.0,1609,Right,...,1,0,1,0,0,0,0,0,0,1
3158,Y. Namli,24,Denmark,73,76,PEC Zwolle,5.0,8.0,1811,Left,...,1,0,0,0,0,0,0,0,1,1


The majority cluster varies broadly in age, position group, overall rating, potential, value, and wage. It contains older players whose potential has been, or almost has been, fulfilled, and younger players with large unfulfilled potential and a reasonable market value. Players in the latter group can be viewed as possible transfer targets. The top world-class players are not part of this cluster.
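
The younger, undervalued group can be pulled out of a cluster with a simple filter. The sketch below uses four rows copied from the two tables above; the cut-offs `MIN_GROWTH` and `MAX_VALUE` and the `Growth` column are hypothetical choices for illustration, not part of the analysis.

```python
import pandas as pd

# Four rows copied from the tables above (Value as listed there).
players = pd.DataFrame({
    "Name": ["J. Carbonero", "M. Coulibaly", "Míchel", "Y. Namli"],
    "Age": [18, 19, 29, 24],
    "Overall": [64, 66, 74, 73],
    "Potential": [82, 81, 74, 76],
    "Value": [0.925, 1.4, 5.5, 5.0],
})

# Hypothetical cut-offs, not taken from the analysis.
MIN_GROWTH = 10     # minimum gap between Potential and Overall
MAX_VALUE = 2.416   # roughly the dataset mean Value printed earlier

players["Growth"] = players["Potential"] - players["Overall"]
targets = players[
    (players["Growth"] >= MIN_GROWTH) & (players["Value"] <= MAX_VALUE)
]
print(targets[["Name", "Age", "Growth", "Value"]])
```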

In [28]:
df_raw[df_raw['cluster']==4]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
3168,Hervías,25,Spain,73,76,SD Eibar,5.0,17.0,1783,Right,...,1,0,0,0,0,0,0,0,1,4
3189,A. Solari,26,Argentina,73,74,Racing Club,4.7,17.0,1956,Right,...,0,0,0,0,0,0,0,0,1,4
3222,Rober Ibáñez,25,Spain,72,75,Getafe CF,3.9,15.0,1836,Right,...,1,0,0,0,0,0,0,0,1,4
3265,K. Karaman,24,Turkey,72,76,Fortuna Düsseldorf,4.1,18.0,1726,Right,...,1,0,0,0,0,0,0,0,1,4
3403,T. Pledl,24,Germany,72,75,FC Ingolstadt 04,3.9,16.0,1872,Right,...,1,0,0,0,0,0,0,0,1,4
3523,João Schmidt,25,Brazil,72,76,Rio Ave FC,3.9,18.0,1977,Left,...,1,0,0,1,0,0,0,0,0,4
3651,R. Gómez,25,Argentina,72,75,Unión de Santa Fe,3.9,17.0,1903,Right,...,0,0,0,0,0,0,0,0,1,4
3664,R. Matos,25,Brazil,72,75,Hellas Verona,3.9,19.0,1814,Right,...,1,0,0,0,0,0,0,0,1,4
3743,A. Biyogo Poko,25,Gabon,72,76,Göztepe SK,3.9,17.0,1941,Right,...,0,0,0,0,1,0,0,0,0,4
3802,O. Rivero,26,Uruguay,72,75,Club Atlas,3.9,17.0,1683,Right,...,1,0,0,0,0,0,1,0,0,4


In [29]:
df_raw[df_raw['cluster']==3]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
2700,S. Thioub,23,France,73,78,Nîmes Olympique,5.5,14.0,1785,Left,...,1,0,0,0,0,0,0,0,1,3
2930,T. Murg,23,Austria,73,77,SK Rapid Wien,5.0,14.0,1912,Left,...,1,0,0,0,0,0,0,0,1,3
3255,P. Gerkens,23,Belgium,72,77,RSC Anderlecht,4.2,13.0,1909,Right,...,1,0,0,0,0,0,1,0,0,3
3469,A. Castro,23,Argentina,72,78,San Lorenzo de Almagro,4.3,13.0,1918,Left,...,1,0,0,0,0,0,0,0,1,3
3575,A. Barboza,23,Argentina,72,77,Defensa y Justicia,3.5,13.0,1449,Left,...,1,0,1,0,0,0,0,0,0,3
3669,M. Møller Dæhli,23,Norway,72,77,FC St. Pauli,4.2,12.0,1619,Right,...,0,0,0,0,0,0,0,0,1,3
3703,D. Bouanga,23,Gabon,72,77,Nîmes Olympique,4.2,12.0,1778,Right,...,1,0,0,0,0,0,0,0,1,3
3742,J. Otero,23,Colombia,72,79,Amiens SC,4.4,12.0,1847,Right,...,1,0,0,0,0,0,0,0,1,3
3787,M. Rodríguez,23,Chile,72,78,U.N.A.M.,4.3,14.0,1816,Right,...,1,0,0,0,0,0,0,0,1,3
3864,L. Phiri,23,South Africa,72,78,En Avant de Guingamp,4.2,14.0,1962,Right,...,0,0,0,1,0,0,0,0,0,3


The two clusters above contain players in their early to mid-20s who have not yet reached their full potential. Wingers are the most represented position group in both clusters. Every player in these clusters has an overall rating and a potential rating above the respective dataset means, which suggests they are decent players worth investigating as possible transfer targets. However, every player's value is also above the dataset mean value.
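The comparison against the dataset means can be checked directly rather than by eye. The following is a minimal sketch, assuming a hypothetical helper `share_above_mean`; the sample rows are copied from the outputs in this notebook, so the means computed here are those of the small sample, not of the full dataset.

```python
import pandas as pd

# A handful of rows copied from the cluster outputs; df_raw would be used in practice.
sample = pd.DataFrame({
    "Name": ["S. Thioub", "T. Murg", "Hervías", "A. Kay", "E. Clarke"],
    "Overall": [73, 73, 73, 60, 48],
    "Potential": [78, 77, 76, 60, 59],
    "Value": [5.5, 5.0, 5.0, 0.05, 0.04],
    "cluster": [3, 3, 4, 6, -1],
})


def share_above_mean(df, column, cluster_labels):
    """Fraction of players in the given clusters whose value exceeds the mean of df[column]."""
    dataset_mean = df[column].mean()
    members = df[df["cluster"].isin(cluster_labels)]
    return float((members[column] > dataset_mean).mean())


shares = {
    col: share_above_mean(sample, col, [3, 4])
    for col in ["Overall", "Potential", "Value"]
}
print(shares)
```

A share of 1.0 for a column means every player in the selected clusters is above the mean for that column.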

In [30]:
df_raw[df_raw['cluster']==0]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
977,O. Kıvrak,30,Turkey,77,77,Trabzonspor,5.5,24.0,1289,Right,...,1,0,0,0,0,0,0,1,0,0
1167,N. Pallois,30,France,77,77,FC Nantes,6.5,25.0,1754,Left,...,0,0,1,0,0,0,0,0,0,0
1220,S. Langkamp,30,Germany,76,76,SV Werder Bremen,5.5,24.0,1488,Right,...,1,0,1,0,0,0,0,0,0,0
1225,M. Esser,30,Germany,76,76,Hannover 96,4.9,24.0,1193,Right,...,1,0,0,0,0,0,0,1,0,0
1282,Alexo Baia,30,Brazil,76,76,Cruzeiro,5.5,25.0,1920,Right,...,1,0,0,0,0,1,0,0,0,0
1392,F. Lustenberger,30,Switzerland,76,76,Hertha BSC,5.5,24.0,1758,Right,...,0,0,1,0,0,0,0,0,0,0
1433,P. Aguilar,31,Paraguay,76,76,Cruz Azul,5.0,23.0,1781,Right,...,0,0,1,0,0,0,0,0,0,0
1455,Júnior Caiçara,29,Brazil,76,76,Medipol Başakşehir FK,6.0,24.0,2059,Right,...,1,0,0,0,0,1,0,0,0,0
1486,Andeson Trigo,30,Brazil,76,76,Fluminense,5.5,25.0,2105,Left,...,1,0,0,0,0,1,0,0,0,0
1513,Raúl Navas,30,Spain,76,76,Real Sociedad,5.5,24.0,1688,Right,...,1,0,1,0,0,0,0,0,0,0


This cluster contains players who are approaching or have reached 30 years of age and have already reached their potential. Their market values are also above the dataset mean. Acquiring a player from this cluster is therefore unlikely to yield a return on investment in market-value terms.
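The "reached their potential" observation can be expressed as the gap between `Potential` and `Overall`. As a rough sketch, the column name `Headroom` below is a hypothetical label, and the rows are copied from the cluster 0 and cluster 3 outputs above for contrast.

```python
import pandas as pd

# Three rows each from cluster 0 and cluster 3, copied from the outputs above.
sample = pd.DataFrame({
    "Name": ["O. Kıvrak", "N. Pallois", "S. Langkamp",
             "S. Thioub", "T. Murg", "P. Gerkens"],
    "Age": [30, 30, 30, 23, 23, 23],
    "Overall": [77, 77, 76, 73, 73, 72],
    "Potential": [77, 77, 76, 78, 77, 77],
    "cluster": [0, 0, 0, 3, 3, 3],
})

# Headroom (hypothetical name): rating points a player can still gain.
headroom = (
    sample.assign(Headroom=sample["Potential"] - sample["Overall"])
    .groupby("cluster")["Headroom"]
    .mean()
)
print(headroom)
```

A mean headroom of zero indicates a cluster whose players have no projected rating growth left.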

In [31]:
df_raw[df_raw['cluster']==5]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
4112,P. Škuletić,28,Serbia,71,71,Montpellier HSC,2.4,18.0,1565,Left,...,0,0,0,0,0,0,1,0,0,5
4179,N. Ghilas,28,Algeria,71,71,Göztepe SK,2.4,18.0,1745,Right,...,1,0,0,0,0,0,1,0,0,5
4295,D. Kaiser,29,Germany,71,71,Brøndby IF,2.3,18.0,1970,Right,...,1,1,0,0,0,0,0,0,0,5
4515,F. Navarro,29,Mexico,71,71,Club León,1.8,18.0,1928,Right,...,1,0,0,0,0,1,0,0,0,5
4623,R. Herrera,29,Uruguay,71,71,Pachuca,1.8,18.0,1549,Right,...,1,0,1,0,0,0,0,0,0,5
4731,Cristian López,29,Spain,71,71,Angers SCO,2.4,17.0,1703,Right,...,1,0,0,0,0,0,1,0,0,5
4970,D. Blum,27,Germany,70,71,UD Las Palmas,2.1,19.0,1727,Left,...,0,0,0,0,0,0,1,0,0,5
5455,L. Jutkiewicz,29,England,70,70,Birmingham City,1.8,18.0,1761,Left,...,1,0,0,0,0,0,1,0,0,5


This cluster resembles the previous one: its players are approaching 30 years of age and have reached their potential. The difference is that every player here has a market value below the dataset mean. Because their potential ratings are also all below the dataset mean, these players are unlikely to help a team and should not be targeted in transfers.

In [32]:
df_raw[df_raw['cluster']==2]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
3193,C. Roldan,23,United States,73,79,Seattle Sounders FC,4.7,6.0,1987,Right,...,0,0,0,0,0,0,0,0,1,2
3247,Bruno Tabata,21,Brazil,72,81,Portimonense SC,4.8,6.0,1827,Left,...,0,0,0,0,0,0,0,0,1,2
3253,H. Diallo,23,Senegal,72,80,FC Metz,4.6,6.0,1643,Right,...,1,0,0,0,0,0,1,0,0,2
3272,M. Murillo,22,Panama,72,80,New York Red Bulls,3.9,5.0,1913,Right,...,0,0,0,0,0,1,0,0,0,2
3294,Pedro Nuno,23,Portugal,72,79,Moreirense FC,4.4,5.0,1830,Right,...,1,0,0,0,0,0,0,0,1,2
3381,Rafa Soares,23,Portugal,72,80,Vitória Guimarães,3.9,6.0,1945,Left,...,1,0,0,0,0,1,0,0,0,2
3391,S. Adegbenro,22,Nigeria,72,80,Rosenborg BK,4.6,6.0,1873,Right,...,1,0,0,0,0,0,0,0,1,2
3480,S. Mosquera,23,Colombia,72,80,FC Dallas,4.6,6.0,1734,Right,...,0,0,0,0,0,0,0,0,1,2
3656,L. Agbenyenu,21,Ghana,72,80,Sporting CP,3.9,6.0,1802,Left,...,1,0,0,0,0,1,0,0,0,2
3849,K. Acosta,22,United States,72,80,Colorado Rapids,4.5,6.0,2032,Right,...,0,0,0,1,0,0,0,0,0,2


This cluster resembles the two clusters of players in their 20s who have not fulfilled their potential, but its players have slightly higher potential and lower wages. No player in this cluster earns more than 7K, whereas no player in those two clusters earns less than 12K. The lower wages make these players better value, and given their age and potential they should be treated as transfer targets. As in the similar clusters, wingers are well represented.
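The wage gap between the clusters can be confirmed with a per-cluster minimum and maximum. The snippet below is a small sketch using rows copied from the cluster 2, 3, and 4 outputs above; on the full data the same `groupby` would run on `df_raw`.

```python
import pandas as pd

# Rows copied from the cluster 2, 3 and 4 outputs above (Wage in thousands).
sample = pd.DataFrame({
    "Name": ["C. Roldan", "Bruno Tabata", "M. Murillo",
             "S. Thioub", "M. Møller Dæhli",
             "Hervías", "R. Matos"],
    "Wage": [6.0, 6.0, 5.0, 14.0, 12.0, 17.0, 19.0],
    "cluster": [2, 2, 2, 3, 3, 4, 4],
})

# Lowest and highest wage within each cluster
wage_range = sample.groupby("cluster")["Wage"].agg(["min", "max"])
print(wage_range)
```

If the maximum for cluster 2 stays below the minimum for clusters 3 and 4, the wage ranges do not overlap.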

In [33]:
df_raw[df_raw['cluster']==6]

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
14815,A. Kay,35,England,60,60,Port Vale,0.05,2.0,1579,Right,...,1,0,0,0,1,0,0,0,0,6
15342,T. Enomoto,35,Japan,59,59,Urawa Red Diamonds,0.04,1.0,841,Right,...,1,0,0,0,0,0,0,1,0,6
15431,M. Sawa,35,Japan,59,59,Kashiwa Reysol,0.07,1.0,1476,Right,...,0,0,0,0,0,0,1,0,0,6
15476,L. Kryger,35,Denmark,59,59,AC Horsens,0.07,1.0,1620,Right,...,0,0,0,0,0,0,0,0,1,6
15624,Ahn Seong Nam,34,Korea Republic,59,59,Gyeongnam FC,0.05,1.0,1757,Right,...,0,0,0,0,1,0,0,0,0,6
15644,S. Russell,35,England,59,59,Grimsby Town,0.04,1.0,1171,Right,...,1,0,0,0,0,0,0,1,0,6
15720,K. Tokushige,34,Japan,59,59,V-Varen Nagasaki,0.06,1.0,899,Right,...,1,0,0,0,0,0,0,1,0,6
15808,P. Cherrie,34,Scotland,58,58,Derry City,0.05,1.0,977,Right,...,1,0,0,0,0,0,0,1,0,6
15844,B. Williams,35,England,58,58,Bolton Wanderers,0.03,1.0,1123,Right,...,1,0,0,0,0,0,0,1,0,6
16023,S. Farelli,35,Italy,58,58,Pescara,0.03,1.0,1033,Right,...,1,0,0,0,0,0,0,1,0,6


This cluster contains players whose overall rating is below the dataset mean and who are at an age when most players are past their prime and approaching retirement. These players should not be considered for acquisition.

In [34]:
df_raw[df_raw['cluster']==-1].head()

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
0,L. Messi,31,Argentina,94,94,FC Barcelona,110.5,565.0,2202,Left,...,1,0,0,0,0,0,1,0,0,-1
1,Cristiano Ronaldo,33,Portugal,94,94,Juventus,77.0,405.0,2228,Right,...,0,0,0,0,0,0,1,0,0,-1
2,Neymar Jr,26,Brazil,92,93,Paris Saint-Germain,118.5,290.0,2143,Right,...,1,0,0,0,0,0,0,0,1,-1
3,De Gea,27,Spain,91,93,Manchester United,72.0,260.0,1471,Right,...,1,0,0,0,0,0,0,1,0,-1
4,K. De Bruyne,27,Belgium,91,92,Manchester City,102.0,355.0,2281,Right,...,0,0,0,1,0,0,0,0,0,-1


In [35]:
df_raw[df_raw['cluster']==-1].tail(20)

Unnamed: 0,Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,Preferred Foot,...,DWR_Medium,Pos_AM,Pos_CB,Pos_CM,Pos_DM,Pos_FB,Pos_FW,Pos_GK,Pos_W,cluster
18127,E. Clarke,19,England,48,59,Fleetwood Town,0.04,1.0,1225,Left,...,1,0,0,0,0,1,0,0,0,-1
18128,T. Hillman,17,Wales,48,57,Newport County,0.04,1.0,1218,Right,...,1,0,0,0,0,0,0,0,1,-1
18129,R. Roache,18,Republic of Ireland,48,69,Blackpool,0.07,1.0,1178,Right,...,1,0,0,0,0,0,1,0,0,-1
18132,M. Hurst,22,Scotland,48,58,St. Johnstone FC,0.04,1.0,987,Right,...,1,0,0,0,0,0,0,1,0,-1
18135,K. Pilkington,44,England,48,48,Cambridge United,0.0,1.0,774,Right,...,1,0,0,0,0,0,0,1,0,-1
18136,D. Horton,18,England,48,55,Lincoln City,0.04,1.0,1368,Right,...,1,0,0,1,0,0,0,0,0,-1
18137,E. Tweed,19,Republic of Ireland,48,59,Derry City,0.05,1.0,1315,Right,...,1,0,0,1,0,0,0,0,0,-1
18138,Zhang Yufeng,20,China PR,47,64,Beijing Renhe FC,0.06,1.0,1389,Right,...,1,0,0,1,0,0,0,0,0,-1
18139,C. Ehlich,19,Germany,47,59,SpVgg Unterhaching,0.04,1.0,1366,Right,...,1,0,0,0,0,1,0,0,0,-1
18140,L. Collins,17,Wales,47,62,Newport County,0.06,1.0,1297,Right,...,1,0,0,1,0,0,0,0,0,-1


The players that could not be clustered (label -1) include the top world-class players as well as players whose potential ratings are below the mean potential of all players. The young players in this group have room to grow, but their fulfilled potential rating would still be below the dataset mean, so they would not be quality additions to a team. None of the unclustered players should be considered for acquisition: the world-class players would cost too much for a substantial return on investment, and the rest would not be an asset to a team.
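DBSCAN assigns the label -1 to points that do not sit in any sufficiently dense region, which is why players at both extremes can end up unclustered together. The toy example below illustrates this on a single feature. It is a sketch only: the `eps` and `min_samples` values are hypothetical and not the ones used in this analysis, and the ratings are a few `Overall` values taken from the outputs above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Overall ratings: one elite player, four mid-rated players, one low-rated player.
overall = np.array([[94.0], [72.0], [72.0], [73.0], [72.0], [48.0]])

# eps and min_samples are hypothetical values chosen for this toy example.
labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(overall)

# Points with no dense neighbourhood are labelled -1 (noise).
noise_ratings = overall[labels == -1].ravel().tolist()
print(labels.tolist())
print(noise_ratings)
```

Both the highest and the lowest rating fall outside the dense group, so they share the noise label even though they have nothing else in common.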