# brainstorm solutions by feature

what kind of information can each field give us?

what kind of questions are we able to ask of each field?

how can each field contribute to our goal of recommending players?

## procedure
* import the data to list all fields
* group fields
* thoroughly explore each field
    * denotation
    * how could this be useful generally?
    * list all possible causal chains from a field to a player skill
* triage connections and chains for the most promising

### imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
players = pd.read_csv("data/complete_dataset_cleaned.csv",index_col='index')
players.head()


index,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,Value,...,RB,RCB,RCM,RDM,RF,RM,RS,RW,RWB,ST
0,Cristiano Ronaldo,32,https://cdn.sofifa.org/48/18/players/20801.png,Portugal,https://cdn.sofifa.org/flags/38.png,94,94,Real Madrid CF,https://cdn.sofifa.org/24/18/teams/243.png,95500000.0,...,61.0,53.0,82.0,62.0,91.0,89.0,92.0,91.0,66.0,92.0
1,L. Messi,30,https://cdn.sofifa.org/48/18/players/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,93,93,FC Barcelona,https://cdn.sofifa.org/24/18/teams/241.png,105000000.0,...,57.0,45.0,84.0,59.0,92.0,90.0,88.0,91.0,62.0,88.0
2,Neymar,25,https://cdn.sofifa.org/48/18/players/190871.png,Brazil,https://cdn.sofifa.org/flags/54.png,92,94,Paris Saint-Germain,https://cdn.sofifa.org/24/18/teams/73.png,123000000.0,...,59.0,46.0,79.0,59.0,88.0,87.0,84.0,89.0,64.0,84.0
3,L. Suárez,30,https://cdn.sofifa.org/48/18/players/176580.png,Uruguay,https://cdn.sofifa.org/flags/60.png,92,92,FC Barcelona,https://cdn.sofifa.org/24/18/teams/241.png,97000000.0,...,64.0,58.0,80.0,65.0,88.0,85.0,88.0,87.0,68.0,88.0
4,M. Neuer,31,https://cdn.sofifa.org/48/18/players/167495.png,Germany,https://cdn.sofifa.org/flags/21.png,92,92,FC Bayern Munich,https://cdn.sofifa.org/24/18/teams/21.png,61000000.0,...,,,,,,,,,,


In [4]:
players.columns

Index(['Name', 'Age', 'Photo', 'Nationality', 'Flag', 'Overall', 'Potential',
       'Club', 'Club Logo', 'Value', 'Wage', 'Special', 'Acceleration',
       'Aggression', 'Agility', 'Balance', 'Ball control', 'Composure',
       'Crossing', 'Curve', 'Dribbling', 'Finishing', 'Free kick accuracy',
       'GK diving', 'GK handling', 'GK kicking', 'GK positioning',
       'GK reflexes', 'Heading accuracy', 'Interceptions', 'Jumping',
       'Long passing', 'Long shots', 'Marking', 'Penalties', 'Positioning',
       'Reactions', 'Short passing', 'Shot power', 'Sliding tackle',
       'Sprint speed', 'Stamina', 'Standing tackle', 'Strength', 'Vision',
       'Volleys', 'CAM', 'CB', 'CDM', 'CF', 'CM', 'ID', 'LAM', 'LB', 'LCB',
       'LCM', 'LDM', 'LF', 'LM', 'LS', 'LW', 'LWB', 'Preferred Positions',
       'RAM', 'RB', 'RCB', 'RCM', 'RDM', 'RF', 'RM', 'RS', 'RW', 'RWB', 'ST'],
      dtype='object')

### group fields
* personal attributes (Nationality, Club, Photo, Age, Value etc.)
* performance attributes (Overall, Potential, Aggression, Agility etc.)
* preferred position and ratings at all positions.

In [7]:
personal_attributes = ['Name', 'Age', 'Photo', 'Nationality', 'Flag', 'Club', 'Club Logo', 'Value', 'Wage']

In [6]:
performance_attributes = ['Overall', 'Potential', 'Special', 'Acceleration',
       'Aggression', 'Agility', 'Balance', 'Ball control', 'Composure',
       'Crossing', 'Curve', 'Dribbling', 'Finishing', 'Free kick accuracy',
       'GK diving', 'GK handling', 'GK kicking', 'GK positioning',
       'GK reflexes', 'Heading accuracy', 'Interceptions', 'Jumping',
       'Long passing', 'Long shots', 'Marking', 'Penalties', 'Positioning',
       'Reactions', 'Short passing', 'Shot power', 'Sliding tackle',
       'Sprint speed', 'Stamina', 'Standing tackle', 'Strength', 'Vision',
       'Volleys']

In [11]:
positions = ['CAM', 'CB', 'CDM', 'CF', 'CM', 'LAM', 'LB', 'LCB',
       'LCM', 'LDM', 'LF', 'LM', 'LS', 'LW', 'LWB', 'Preferred Positions',
       'RAM', 'RB', 'RCB', 'RCM', 'RDM', 'RF', 'RM', 'RS', 'RW', 'RWB', 'ST']
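
Before describing the fields, it may help to check that the three lists cover every column. The column listing above includes 'ID', which none of the lists picks up. The sketch below uses a hypothetical helper, `ungrouped_columns`, and a one-row toy frame with made-up values standing in for `players`.

```python
import pandas as pd

def ungrouped_columns(df, *groups):
    """Return the columns of df that appear in none of the given groups."""
    grouped = set().union(*groups)
    return [col for col in df.columns if col not in grouped]

# Toy frame with made-up values, standing in for `players`.
toy = pd.DataFrame({"Name": ["A"], "Age": [20], "Overall": [70], "ST": [68.0], "ID": [1]})

leftover = ungrouped_columns(toy, ["Name", "Age"], ["Overall"], ["ST"])
print(leftover)
```

On the real data the call would be `ungrouped_columns(players, personal_attributes, performance_attributes, positions)`.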

### describe fields

#### personal attributes

In [8]:
players[personal_attributes].head()

index,Name,Age,Photo,Nationality,Flag,Club,Club Logo,Value,Wage
0,Cristiano Ronaldo,32,https://cdn.sofifa.org/48/18/players/20801.png,Portugal,https://cdn.sofifa.org/flags/38.png,Real Madrid CF,https://cdn.sofifa.org/24/18/teams/243.png,95500000.0,565000.0
1,L. Messi,30,https://cdn.sofifa.org/48/18/players/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,FC Barcelona,https://cdn.sofifa.org/24/18/teams/241.png,105000000.0,565000.0
2,Neymar,25,https://cdn.sofifa.org/48/18/players/190871.png,Brazil,https://cdn.sofifa.org/flags/54.png,Paris Saint-Germain,https://cdn.sofifa.org/24/18/teams/73.png,123000000.0,280000.0
3,L. Suárez,30,https://cdn.sofifa.org/48/18/players/176580.png,Uruguay,https://cdn.sofifa.org/flags/60.png,FC Barcelona,https://cdn.sofifa.org/24/18/teams/241.png,97000000.0,510000.0
4,M. Neuer,31,https://cdn.sofifa.org/48/18/players/167495.png,Germany,https://cdn.sofifa.org/flags/21.png,FC Bayern Munich,https://cdn.sofifa.org/24/18/teams/21.png,61000000.0,230000.0


##### name
* the name of the player as displayed, sometimes with the first name abbreviated (e.g. "L. Messi")
* connections
    * does the name suggest an origin that differs from the player's nationality?
    * can we run a name recognition study to estimate the popularity or brand value of a name?
    * a cooler name leads to more press, which leads to more value, then a better contract, better training, and finally better skill
    
##### age
* the age in years of the player
* connections
    * **players peak in skill at a certain age**
    * preferred positions vary with age
    * **player value varies with age**
    * **as far as recruiting is concerned, the main prediction task is to know whether a young player will get better as he ages. so which fields, coupled with age, allow us to predict overall skill?**
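
One way to start on the age questions is to average Overall, Potential, and their difference at each age. This is only a sketch: `age_profile` and the `Gap` column are hypothetical names, the toy values are made up, and a single snapshot of the data can only show how players of different ages compare, not how one player develops over time.

```python
import pandas as pd

def age_profile(df):
    """Mean Overall, Potential and (Potential - Overall) for each age."""
    with_gap = df.assign(Gap=df["Potential"] - df["Overall"])
    return with_gap.groupby("Age")[["Overall", "Potential", "Gap"]].mean()

# Toy frame with made-up values, standing in for `players`.
toy = pd.DataFrame({
    "Age": [18, 18, 25, 25, 32],
    "Overall": [60, 64, 78, 80, 82],
    "Potential": [80, 84, 82, 84, 82],
})

profile = age_profile(toy)
print(profile)
```

A gap that shrinks toward zero with age would be consistent with young players having more room to grow.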

##### photo
* a portrait of the player
* connections
    * do a facial analysis
    * derive personality based on face
    * face leads to personality leads to skill
    
##### nationality
* the country of origin of the player
* connections
    * poorer countries produce better players
    * richer countries have better clubs
    * certain countries train better players of a certain skill or position
    * **because of the culture of a country, we may be able to predict a player's skill growth and skill cap**
    
##### flag
* url link to the flag of the country of the player
    
##### club
* the name of the club the player is in
* connections
    * certain clubs have better players
    * clubs in certain countries have better players
    * certain clubs recruit players by select attributes
    * clubs will position a player by select attributes
    * clubs will favor players of a certain nationality
    * **which clubs lead to the highest growth and skill?**
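
A first pass at the club questions could summarise each club by squad size, mean Overall, and mean remaining growth (Potential minus Overall). The helper name `group_summary`, the `min_players` cutoff, and the toy values are all assumptions made for illustration. The summary describes which clubs hold strong players; it does not show that a club causes growth.

```python
import pandas as pd

def group_summary(df, by, min_players=2):
    """Per-group player count, mean Overall and mean (Potential - Overall)."""
    with_gap = df.assign(Gap=df["Potential"] - df["Overall"])
    summary = with_gap.groupby(by).agg(
        players=("Overall", "size"),
        mean_overall=("Overall", "mean"),
        mean_gap=("Gap", "mean"),
    )
    # Drop groups too small for the averages to mean much.
    summary = summary[summary["players"] >= min_players]
    return summary.sort_values("mean_overall", ascending=False)

# Toy frame with made-up values, standing in for `players`.
toy = pd.DataFrame({
    "Club": ["A", "A", "B", "B", "C"],
    "Overall": [80, 90, 70, 72, 95],
    "Potential": [85, 90, 80, 82, 95],
})

summary = group_summary(toy, "Club")
print(summary)
```

The same function works for the nationality questions by passing `"Nationality"` as `by`.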
    
##### club logo
* url link to the logo

##### value
* the market value of a player for a given year
* connections
    * value is based on skill
    * value is based on marketability
    * future value can be anticipated based on skill for young players
    * **when a player is high-value while still young, how does this affect skill development moving forwards?**

##### wage
* the weekly wage of a player in euros
* connections
    * what is the strongest predictor of wage?
    * better players earn higher wages
    * players from richer countries earn better wages
    * players of higher value earn better wages
    * **how do wages affect skill development over time?**
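
For the "strongest predictor of wage" question, a simple starting point is to rank the numeric fields by the absolute value of their correlation with Wage. This is a rough sketch: `rank_predictors` is a hypothetical helper, the toy values are made up, and a linear correlation is only one possible notion of "predictor".

```python
import pandas as pd

def rank_predictors(df, target):
    """Rank numeric columns by absolute Pearson correlation with target."""
    numeric = df.select_dtypes(include="number")  # skips text columns such as Club
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# Toy frame with made-up values, standing in for `players`.
toy = pd.DataFrame({
    "Club": ["A", "B", "C", "D"],
    "Age": [30, 20, 25, 22],
    "Overall": [60, 70, 80, 90],
    "Wage": [10.0, 20.0, 30.0, 40.0],
})

ranked = rank_predictors(toy, "Wage")
print(ranked)
```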


#### performance attributes

In [9]:
players[performance_attributes].head()

index,Overall,Potential,Special,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,...,Reactions,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys
0,94,94,2228,89,63,89,63,93,95,85,...,96,83,94,23,91,92,31,80,85,88
1,93,93,2154,92,48,90,95,95,96,77,...,95,88,85,26,87,73,28,59,90,85
2,92,94,2100,94,56,96,82,95,92,75,...,88,81,80,33,90,78,24,53,80,83
3,92,92,2291,88,78,86,60,91,83,77,...,93,83,87,38,77,89,45,80,84,88
4,92,92,1493,58,29,52,35,48,70,15,...,85,55,25,11,61,44,10,83,70,11


##### overall
* the overall rating for a player, on a scale of 100
* connections
    * this is the collective skill metric
    * how determined is this by lower-level technique?
    * **can I come up with a better overall metric, based on technique?**
    
##### potential
* the maximum overall rating a player can achieve
* connections
    * difference between potential and overall
    * connection between potential and age
    * connection between potential and technique
    * potential is limited by team placement
    
##### special
* undefined

##### [techniques]
* various football techniques rated on a scale of 100
* connections
    * predicting overall skill based on technique
    * predicting best position based on techniques
    * understand which techniques are clustered
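
To get a feel for "predicting overall skill based on technique", one option is an ordinary least-squares fit of Overall on a handful of technique columns. The sketch below uses `numpy.linalg.lstsq`. The helper `fit_overall` is hypothetical, and the toy values are made up so that Overall is exactly the average of the two techniques; nothing here claims to be how the Overall rating is actually computed.

```python
import numpy as np
import pandas as pd

def fit_overall(df, techniques, target="Overall"):
    """Least-squares coefficients for target ~ intercept + techniques."""
    X = df[techniques].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + list(techniques))

# Toy frame with made-up values: Overall = 0.5 * Finishing + 0.5 * Dribbling.
toy = pd.DataFrame({
    "Finishing": [50, 60, 70, 80, 65],
    "Dribbling": [70, 50, 90, 60, 85],
    "Overall": [60, 55, 80, 70, 75],
})

weights = fit_overall(toy, ["Finishing", "Dribbling"])
print(weights.round(3))
```

On the real data, the size of the residuals would hint at how much of Overall is determined by lower-level technique.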

#### positions

In [13]:
players[positions].head()

index,CAM,CB,CDM,CF,CM,LAM,LB,LCB,LCM,LDM,...,RB,RCB,RCM,RDM,RF,RM,RS,RW,RWB,ST
0,89.0,53.0,62.0,91.0,82.0,89.0,61.0,53.0,82.0,62.0,...,61.0,53.0,82.0,62.0,91.0,89.0,92.0,91.0,66.0,92.0
1,92.0,45.0,59.0,92.0,84.0,92.0,57.0,45.0,84.0,59.0,...,57.0,45.0,84.0,59.0,92.0,90.0,88.0,91.0,62.0,88.0
2,88.0,46.0,59.0,88.0,79.0,88.0,59.0,46.0,79.0,59.0,...,59.0,46.0,79.0,59.0,88.0,87.0,84.0,89.0,64.0,84.0
3,87.0,58.0,65.0,88.0,80.0,87.0,64.0,58.0,80.0,65.0,...,64.0,58.0,80.0,65.0,88.0,85.0,88.0,87.0,68.0,88.0
4,,,,,,,,,,,...,,,,,,,,,,


##### [positions]
* the rating for a given position on a scale of 100
* connections
    * does the position you're best at make you more likely to have a better overall rating?
    
##### preferred positions
* the positions the player prefers to play
* connections
    * are preferred positions the same as best positions?
    * are certain positions generally more preferred than others?
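
The first question can be checked by asking, for each player, whether any preferred position reaches that player's top position rating. The sketch assumes that 'Preferred Positions' holds whitespace-separated position codes, which the output above does not show, so that format is an assumption. `prefers_best` is a hypothetical helper and the toy values are made up. Players with no position ratings at all (such as the goalkeeper in row 4 above) are skipped.

```python
import pandas as pd

def prefers_best(df, rating_cols, preferred_col="Preferred Positions"):
    """True where at least one preferred position ties the player's top rating."""
    rated = df.dropna(subset=rating_cols, how="all")  # skip players with no ratings
    top = rated[rating_cols].max(axis=1)
    flags = []
    for idx, row in rated.iterrows():
        # Assumed format: codes separated by whitespace, e.g. "ST RW".
        preferred = [p for p in row[preferred_col].split() if p in rating_cols]
        flags.append(any(row[p] == top[idx] for p in preferred))
    return pd.Series(flags, index=rated.index)

# Toy frame with made-up values, standing in for `players`.
toy = pd.DataFrame({
    "ST": [92.0, 60.0, None],
    "RW": [91.0, 62.0, None],
    "CB": [53.0, 75.0, None],
    "Preferred Positions": ["ST RW", "ST", "GK"],
})

result = prefers_best(toy, ["ST", "RW", "CB"])
print(result)
```

Comparing against the maximum rating, rather than a single best column, keeps ties such as equal RS and ST ratings from being counted as mismatches.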