# Football Talent Recommender

![](https://www.si.com/.image/ar_16:9%2Cc_fill%2Ccs_srgb%2Cfl_progressive%2Cq_auto:good%2Cw_1200/MTc1NzkwNzE4OTg2NDk1MDMx/usmnt-abroad-top-clubs.jpg)

Suppose you are a football scout assigned to find new talent for a club, or perhaps a Football Manager/FIFA Manager geek with no idea which player to sign. It is often hard to spell out the exact specification of the player we want. Instead of saying 'I want to hire a striker who can run at up to 70 km/h and kick the ball with a force of up to 1500 pounds', we are more likely to say 'Find me a winger like Cristiano Ronaldo' or 'Find me a midfielder like Paul Pogba'.

As a football fan and data scientist, I will try to build a recommender system using content-based filtering that gives player recommendations based on a single input player, returning players whose attributes are similar to those of the player we want. Hopefully, this recommender system can give an initial idea to anyone who is looking for a new player.

The sections of this project are:
1. Data Cleansing & Preparation
2. Recommender Model Building
3. Test the Recommender

**import libraries**

In [1]:
import pandas as pd
import numpy as np

### 1. Data Cleansing & Preparation

A quick look at the data

In [48]:
pd.set_option('display.max_columns', 200)
df = pd.read_csv('players_21.csv')
df.head(3)

Unnamed: 0,sofifa_id,player_url,short_name,long_name,age,dob,height_cm,weight_kg,nationality,club_name,league_name,league_rank,overall,potential,value_eur,wage_eur,player_positions,preferred_foot,international_reputation,weak_foot,skill_moves,work_rate,body_type,real_face,release_clause_eur,player_tags,team_position,team_jersey_number,loaned_from,joined,contract_valid_until,nation_position,nation_jersey_number,pace,shooting,passing,dribbling,defending,physic,gk_diving,gk_handling,gk_kicking,gk_reflexes,gk_speed,gk_positioning,player_traits,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,skill_dribbling,skill_curve,skill_fk_accuracy,skill_long_passing,skill_ball_control,movement_acceleration,movement_sprint_speed,movement_agility,movement_reactions,movement_balance,power_shot_power,power_jumping,power_stamina,power_strength,power_long_shots,mentality_aggression,mentality_interceptions,mentality_positioning,mentality_vision,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes,ls,st,rs,lw,lf,cf,rf,rw,lam,cam,ram,lm,lcm,cm,rcm,rm,lwb,ldm,cdm,rdm,rwb,lb,lcb,cb,rcb,rb
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,33,1987-06-24,170,72,Argentina,FC Barcelona,Spain Primera Division,1.0,93,93,67500000,560000,"RW, ST, CF",Left,5,4,4,Medium/Low,Messi,Yes,138400000.0,"#Dribbler, #Distance Shooter, #FK Specialist, ...",CAM,10.0,,2004-07-01,2021.0,RW,10.0,85.0,92.0,91.0,95.0,38.0,65.0,,,,,,,"Finesse Shot, Long Shot Taker (AI), Speed Drib...",85,95,70,91,88,96,93,94,91,96,91,80,91,94,95,86,68,72,69,94,44,40,93,95,75,96,,35,24,6,11,15,14,8,89+3,89+3,89+3,92+0,93+0,93+0,93+0,92+0,93+0,93+0,93+0,91+2,87+3,87+3,87+3,91+2,66+3,65+3,65+3,65+3,66+3,62+3,52+3,52+3,52+3,62+3
1,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,35,1985-02-05,187,83,Portugal,Juventus,Italian Serie A,1.0,92,92,46000000,220000,"ST, LW",Right,5,4,5,High/Low,C. Ronaldo,Yes,75900000.0,"#Aerial Threat, #Dribbler, #Distance Shooter, ...",LS,7.0,,2018-07-10,2022.0,LS,7.0,89.0,93.0,81.0,89.0,35.0,77.0,,,,,,,"Power Free-Kick, Flair, Long Shot Taker (AI), ...",84,95,90,82,86,88,81,76,77,92,87,91,87,95,71,94,95,84,78,93,63,29,95,82,84,95,,32,24,7,11,15,14,11,91+1,91+1,91+1,89+0,91+0,91+0,91+0,89+0,88+3,88+3,88+3,88+3,81+3,81+3,81+3,88+3,65+3,61+3,61+3,61+3,65+3,61+3,54+3,54+3,54+3,61+3
2,200389,https://sofifa.com/player/200389/jan-oblak/210002,J. Oblak,Jan Oblak,27,1993-01-07,188,87,Slovenia,Atlético Madrid,Spain Primera Division,1.0,91,93,75000000,125000,GK,Right,3,3,1,Medium/Medium,PLAYER_BODY_TYPE_259,Yes,159400000.0,,GK,13.0,,2014-07-16,2023.0,GK,1.0,,,,,,,87.0,92.0,78.0,90.0,52.0,90.0,"GK Long Throw, Comes For Crosses",13,11,15,43,13,12,13,14,40,30,43,60,67,88,49,59,78,41,78,12,34,19,11,65,11,68,,12,18,87,92,78,90,90,33+3,33+3,33+3,32+0,35+0,35+0,35+0,32+0,38+3,38+3,38+3,35+3,38+3,38+3,38+3,35+3,32+3,36+3,36+3,36+3,32+3,32+3,33+3,33+3,33+3,32+3


In [3]:
df.info(max_cols=200)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18944 entries, 0 to 18943
Data columns (total 106 columns):
 #    Column                      Non-Null Count  Dtype  
---   ------                      --------------  -----  
 0    sofifa_id                   18944 non-null  int64  
 1    player_url                  18944 non-null  object 
 2    short_name                  18944 non-null  object 
 3    long_name                   18944 non-null  object 
 4    age                         18944 non-null  int64  
 5    dob                         18944 non-null  object 
 6    height_cm                   18944 non-null  int64  
 7    weight_kg                   18944 non-null  int64  
 8    nationality                 18944 non-null  object 
 9    club_name                   18719 non-null  object 
 10   league_name                 18719 non-null  object 
 11   league_rank                 18719 non-null  float64
 12   overall                     18944 non-null  int64  
 13   potential     

Instead of dropping null values, we will simply replace them with 0.

In [4]:
df2 = df.fillna(0)
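
The effect of this step can be illustrated on a tiny, made-up frame. This is only a sketch: the `toy` DataFrame and its values are assumptions standing in for the real dataset, where outfield players lack goalkeeper ratings and goalkeepers lack outfield ratings.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame standing in for players_21.csv
toy = pd.DataFrame({
    'short_name': ['Outfield', 'Keeper'],
    'pace': [85.0, np.nan],        # the goalkeeper has no outfield rating
    'gk_diving': [np.nan, 87.0],   # the outfield player has no goalkeeper rating
})

# Count missing values per column before filling
null_counts = toy.isna().sum()
print(null_counts)

# Replace every missing value with 0, the same call used for df2
toy_filled = toy.fillna(0)
print(toy_filled)
```

Filling with 0 keeps every player in the dataset, at the cost of treating a missing rating as the lowest possible score.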

In [47]:
df2.head(3)

Unnamed: 0,sofifa_id,player_url,short_name,long_name,age,dob,height_cm,weight_kg,nationality,club_name,league_name,league_rank,overall,potential,value_eur,wage_eur,player_positions,preferred_foot,international_reputation,weak_foot,skill_moves,work_rate,body_type,real_face,release_clause_eur,player_tags,team_position,team_jersey_number,loaned_from,joined,contract_valid_until,nation_position,nation_jersey_number,pace,shooting,passing,dribbling,defending,physic,gk_diving,gk_handling,gk_kicking,gk_reflexes,gk_speed,gk_positioning,player_traits,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,skill_dribbling,skill_curve,skill_fk_accuracy,skill_long_passing,skill_ball_control,movement_acceleration,movement_sprint_speed,movement_agility,movement_reactions,movement_balance,power_shot_power,power_jumping,power_stamina,power_strength,power_long_shots,mentality_aggression,mentality_interceptions,mentality_positioning,mentality_vision,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,33,1987-06-24,170,72,Argentina,FC Barcelona,Spain Primera Division,1.0,93,93,67500000,560000,"RW, ST, CF",Left,5,4,4,Medium/Low,Messi,Yes,138400000.0,"#Dribbler, #Distance Shooter, #FK Specialist, ...",CAM,10.0,0,2004-07-01,2021.0,RW,10.0,85.0,92.0,91.0,95.0,38.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,"Finesse Shot, Long Shot Taker (AI), Speed Drib...",85,95,70,91,88,96,93,94,91,96,91,80,91,94,95,86,68,72,69,94,44,40,93,95,75,96,0.0,35,24,6,11,15,14,8
1,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,35,1985-02-05,187,83,Portugal,Juventus,Italian Serie A,1.0,92,92,46000000,220000,"ST, LW",Right,5,4,5,High/Low,C. Ronaldo,Yes,75900000.0,"#Aerial Threat, #Dribbler, #Distance Shooter, ...",LS,7.0,0,2018-07-10,2022.0,LS,7.0,89.0,93.0,81.0,89.0,35.0,77.0,0.0,0.0,0.0,0.0,0.0,0.0,"Power Free-Kick, Flair, Long Shot Taker (AI), ...",84,95,90,82,86,88,81,76,77,92,87,91,87,95,71,94,95,84,78,93,63,29,95,82,84,95,0.0,32,24,7,11,15,14,11
2,200389,https://sofifa.com/player/200389/jan-oblak/210002,J. Oblak,Jan Oblak,27,1993-01-07,188,87,Slovenia,Atlético Madrid,Spain Primera Division,1.0,91,93,75000000,125000,GK,Right,3,3,1,Medium/Medium,PLAYER_BODY_TYPE_259,Yes,159400000.0,0,GK,13.0,0,2014-07-16,2023.0,GK,1.0,0.0,0.0,0.0,0.0,0.0,0.0,87.0,92.0,78.0,90.0,52.0,90.0,"GK Long Throw, Comes For Crosses",13,11,15,43,13,12,13,14,40,30,43,60,67,88,49,59,78,41,78,12,34,19,11,65,11,68,0.0,12,18,87,92,78,90,90


We select the numeric attribute columns that will be used later for the cosine similarity computation.

In [6]:
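# Select the attribute columns by position and drop the text column in one step,
# which avoids modifying a slice of df2 in place (SettingWithCopyWarning)
df_final = df2.iloc[:, 33:79].drop(columns='player_traits')
df_final.head()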

Unnamed: 0,pace,shooting,passing,dribbling,defending,physic,gk_diving,gk_handling,gk_kicking,gk_reflexes,gk_speed,gk_positioning,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,skill_dribbling,skill_curve,skill_fk_accuracy,skill_long_passing,skill_ball_control,movement_acceleration,movement_sprint_speed,movement_agility,movement_reactions,movement_balance,power_shot_power,power_jumping,power_stamina,power_strength,power_long_shots,mentality_aggression,mentality_interceptions,mentality_positioning,mentality_vision,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning
0,85.0,92.0,91.0,95.0,38.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,85,95,70,91,88,96,93,94,91,96,91,80,91,94,95,86,68,72,69,94,44,40,93,95,75,96,0.0,35,24,6,11,15,14
1,89.0,93.0,81.0,89.0,35.0,77.0,0.0,0.0,0.0,0.0,0.0,0.0,84,95,90,82,86,88,81,76,77,92,87,91,87,95,71,94,95,84,78,93,63,29,95,82,84,95,0.0,32,24,7,11,15,14
2,0.0,0.0,0.0,0.0,0.0,0.0,87.0,92.0,78.0,90.0,52.0,90.0,13,11,15,43,13,12,13,14,40,30,43,60,67,88,49,59,78,41,78,12,34,19,11,65,11,68,0.0,12,18,87,92,78,90
3,78.0,91.0,78.0,85.0,43.0,82.0,0.0,0.0,0.0,0.0,0.0,0.0,71,94,85,84,89,85,79,85,70,88,77,78,77,93,82,89,84,76,86,85,81,49,94,79,88,88,0.0,42,19,15,6,12,8
4,91.0,85.0,86.0,94.0,36.0,59.0,0.0,0.0,0.0,0.0,0.0,0.0,85,87,62,87,87,95,88,89,81,95,94,89,96,91,83,80,62,81,50,84,51,36,87,90,92,93,0.0,30,29,9,9,15,15


### 2. Recommender Model Building

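We use CountVectorizer and cosine similarity as the 'engine' of our recommender system. CountVectorizer turns each player's position string into one count column per position, and cosine similarity compares players on the combined attribute and position columns.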

In [7]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [8]:
cv = CountVectorizer()
by_desc = cv.fit_transform(df2['player_positions'])
by_desc = by_desc.toarray()
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
df_desc = pd.DataFrame(by_desc, columns=cv.get_feature_names_out())
df_desc.tail()

Unnamed: 0,cam,cb,cdm,cf,cm,gk,lb,lm,lw,lwb,rb,rm,rw,rwb,st
18939,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
18940,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
18941,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
18942,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
18943,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [9]:
df_use = df_final.join(df_desc)
df_use.head()

Unnamed: 0,pace,shooting,passing,dribbling,defending,physic,gk_diving,gk_handling,gk_kicking,gk_reflexes,gk_speed,gk_positioning,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,skill_dribbling,skill_curve,skill_fk_accuracy,skill_long_passing,skill_ball_control,movement_acceleration,movement_sprint_speed,movement_agility,movement_reactions,movement_balance,power_shot_power,power_jumping,power_stamina,power_strength,power_long_shots,mentality_aggression,mentality_interceptions,mentality_positioning,mentality_vision,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,cam,cb,cdm,cf,cm,gk,lb,lm,lw,lwb,rb,rm,rw,rwb,st
0,85.0,92.0,91.0,95.0,38.0,65.0,0.0,0.0,0.0,0.0,0.0,0.0,85,95,70,91,88,96,93,94,91,96,91,80,91,94,95,86,68,72,69,94,44,40,93,95,75,96,0.0,35,24,6,11,15,14,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1
1,89.0,93.0,81.0,89.0,35.0,77.0,0.0,0.0,0.0,0.0,0.0,0.0,84,95,90,82,86,88,81,76,77,92,87,91,87,95,71,94,95,84,78,93,63,29,95,82,84,95,0.0,32,24,7,11,15,14,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1
2,0.0,0.0,0.0,0.0,0.0,0.0,87.0,92.0,78.0,90.0,52.0,90.0,13,11,15,43,13,12,13,14,40,30,43,60,67,88,49,59,78,41,78,12,34,19,11,65,11,68,0.0,12,18,87,92,78,90,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,78.0,91.0,78.0,85.0,43.0,82.0,0.0,0.0,0.0,0.0,0.0,0.0,71,94,85,84,89,85,79,85,70,88,77,78,77,93,82,89,84,76,86,85,81,49,94,79,88,88,0.0,42,19,15,6,12,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,91.0,85.0,86.0,94.0,36.0,59.0,0.0,0.0,0.0,0.0,0.0,0.0,85,87,62,87,87,95,88,89,81,95,94,89,96,91,83,80,62,81,50,84,51,36,87,90,92,93,0.0,30,29,9,9,15,15,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [13]:
csn = cosine_similarity(df_use)
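
For intuition about what each entry of `csn` means, the sketch below computes cosine similarity by hand with NumPy. The helper `cosine_sim` and the three made-up vectors are assumptions for illustration only; each vector holds two rating-like values followed by two position flags.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical players: [rating 1, rating 2, position flag 1, position flag 2]
player_a = [90, 85, 1, 0]
player_b = [88, 86, 0, 1]   # similar ratings, different position
player_c = [20, 15, 1, 0]   # much lower ratings, same position

sim_ab = cosine_sim(player_a, player_b)
sim_ac = cosine_sim(player_a, player_c)
print(sim_ab, sim_ac)
```

Two things are worth noting. Cosine similarity compares the direction of the vectors rather than their length, so a weaker player with a similarly shaped profile can still score high. Also, because ratings reach into the 90s while position flags are only 0 or 1, the ratings are likely to dominate the score unless the columns are rescaled.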

### 3. Test the Recommender

Let's test our recommender system!

In [68]:
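player_name = input('Who is your player model? ')

# Find the row of the chosen player; the name must match 'short_name' exactly
index_player = df2[df2['short_name'] == player_name].index
if len(index_player) == 0:
    raise ValueError('Player not found: {}'.format(player_name))

# Pair every player index with its similarity to the chosen player, highest first
player_rec = list(enumerate(csn[index_player[0]]))
our_recom = sorted(player_rec, key=lambda x: x[1], reverse=True)

# Skip the first entry (normally the chosen player itself) and keep the next 10
recom = our_recom[1:11]

print('')
print('Recommended player based on your model: ')
for i, (idx, _) in enumerate(recom, start=1):
    print('{}. {} ({}) |{}|   - {}'.format(i, df2.loc[idx, 'short_name'], df2.loc[idx, 'player_positions'],
                                           df2.loc[idx, 'overall'], df2.loc[idx, 'club_name']))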

Who is your player model? L. Messi

Recommended player based on your model: 
1. A. Miranchuk (CAM, ST) |79|   - Atalanta
2. R. Mahrez (RW, RM) |85|   - Manchester City
3. Z. Labyad (CF, CAM, RW) |75|   - Ajax
4. A. Januzaj (RW, RM) |80|   - Real Sociedad
5. A. Robben (RM, CAM) |80|   - FC Groningen
6. Neymar Jr (LW, CAM) |91|   - Paris Saint-Germain
7. C. Vela (RW, LW, CAM) |83|   - Los Angeles FC
8. R. Ghezzal (RW, RM) |73|   - Leicester City
9. G. dos Santos (CAM, ST, CF) |73|   - Club América
10. E. Hazard (LW, ST) |88|   - Real Madrid


Suppose we want to find a player like Lionel Messi. We input the name 'L. Messi' (because that is how it is written in our dataset), and the recommender system shows us some names. Let's look at a few of them more closely: R. Mahrez, A. Robben, Neymar Jr, and E. Hazard. These players are very similar to Lionel Messi: they run fast, dribble well, and are good at shooting with both feet. Hence, we can say that this recommender system works well, since it provides a list of recommended players who have skills similar to those of our player model.
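
The lookup could also be wrapped in a reusable function. The following is a sketch under stated assumptions, not the notebook's own code: `recommend_players`, `toy_players`, and `toy_features` are hypothetical names, and the toy values are made up. It compares only the chosen player's row against all others, rather than building the full player-by-player matrix.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_players(players, features, name, top_n=10):
    """Return the top_n players most similar to `name` (hypothetical helper)."""
    matches = players.index[players['short_name'] == name]
    if len(matches) == 0:
        raise ValueError(f'No player named {name!r} in the dataset')
    idx = matches[0]
    # Similarity of the chosen player's row against every row
    sims = pd.Series(cosine_similarity(features.loc[[idx]], features)[0],
                     index=features.index)
    # Remove the chosen player, then keep the highest scores
    top = sims.drop(idx).sort_values(ascending=False).head(top_n)
    return players.loc[top.index, ['short_name']].assign(similarity=top.values)

# Made-up data: A and B are attackers, C is a defender, D is in between
toy_players = pd.DataFrame({'short_name': ['A', 'B', 'C', 'D']})
toy_features = pd.DataFrame({
    'pace':      [90, 88, 30, 60],
    'shooting':  [85, 86, 20, 40],
    'defending': [30, 35, 85, 70],
})

result = recommend_players(toy_players, toy_features, 'A', top_n=2)
print(result)
```

With the real data, the call would look like `recommend_players(df2, df_use, 'L. Messi')`. Computing a single row of similarities avoids holding an 18944 x 18944 matrix in memory, which may matter on smaller machines.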