# FIFA Awards 2017

FIFA honored the best football performers of the year with the FIFA Awards. The voting rules can be found in [1]; in short, coaches and captains of national teams, together with media representatives from all associated countries, selected the best male and female player, the best coach, etc.

Results for FIFA MEN'S WORLD PLAYER 2017 can be found in [2], the individual votes in [3], and the rules in [1]. Below we analyze the results and identify some discrepancies.

We identified at least two issues. First, the rules of allocation state that voters chose from a list of 23 male players compiled by a panel of experts, yet the official results list [2] contains 24 players. Second, the percentages derived from the votes [3] differ from the official results [2].

[1] http://resources.fifa.com/mm/document/the-best/general/02/90/27/31/fifaawards2017_thebestawards_rulesofallocation_v5_en_neutral.pdf

[2] http://resources.fifa.com/mm/document/the-best/general/02/91/68/84/fullresults-tbffa_award_rankingpresslist2017_neutral.pdf

[3] http://resources.fifa.com/mm/document/the-best/playeroftheyear-men/02/91/68/49/faward_menplayer2017_neutral.pdf
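The first issue, the number of players on the list, can be checked directly against the ballots. The following is a minimal sketch, assuming the votes are held in a table with `First`, `Second` and `Third` columns as in the rest of this notebook. The helper name `distinct_players` and the three-row `sample` are illustrative only; the sample rows are copied from the published votes.

```python
import pandas as pd

# Three ballots copied from the vote list, used here only as a small sample.
sample = pd.DataFrame(
    {
        "First": ["Cristiano Ronaldo", "Cristiano Ronaldo", "Neymar"],
        "Second": ["Messi Lionel", "Buffon Gianluigi", "Modric Luka"],
        "Third": ["Buffon Gianluigi", "Messi Lionel", "Buffon Gianluigi"],
    }
)


def distinct_players(votes: pd.DataFrame) -> set:
    """Return the set of names that appear in any of the three ballot columns."""
    all_names = votes[["First", "Second", "Third"]].to_numpy().ravel()
    return set(pd.unique(all_names))


players = distinct_players(sample)
print(len(players), sorted(players))
```

Applied to the full vote table, the size of this set can be compared with the 23 nominees that the rules describe.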

In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('./player-votes.csv', index_col=0)

In [50]:
# Kuwait and Guatemala are suspended,
# but their media votes (and only those) are on the list, so I guess they are intended to be included
df.loc[df.Country.isin(['Kuwait','Guatemala'])]
# to remove them from the set, uncomment the following line
#df = df.loc[~df.Country.isin(['Kuwait','Guatemala'])]

Unnamed: 0,Vote,Country,Name,First,Second,Third
366,Media,Guatemala,Solares Lucho,Cristiano Ronaldo,Messi Lionel,Navas Keylor
382,Media,Kuwait,Adnad Yousif,Cristiano Ronaldo,Messi Lionel,Kroos Toni


In [4]:
# the table structure is as follows
df.head()

Unnamed: 0,Vote,Country,Name,First,Second,Third
0,Captain,Albania,Agolli Ansi,Cristiano Ronaldo,Messi Lionel,Buffon Gianluigi
1,Captain,Algeria,Mbolhi Adi Rais,Cristiano Ronaldo,Buffon Gianluigi,Messi Lionel
2,Captain,American Samoa,Ott Ramin,Cristiano Ronaldo,Neymar,Dybala Paulo
3,Captain,Andorra,Lima Sola Ildefons,Neymar,Modric Luka,Buffon Gianluigi
4,Captain,Angola,Gaspar Wilson Pinto,Cristiano Ronaldo,Ramos Sergio,Modric Luka


In [52]:
# let us verify our conversion to csv; the following occurrence counts are based on counts in the original PDF [3]
cpt = 4*33 + 19
coach = 4*33 + 20
media = 3*33 + 30 + 27

sumx = cpt+media+coach

print('Occurances {} == {}'.format(sumx, len(df)))

# Carvajal is easy to verify, as he appears only 3 times in the vote summary (3 times in second place, i.e. 9 pts)
print('Carvajal result with sumx={} \t==> {}'.format(sumx, 100*9/(sumx*9)))

# we remove Kuwait and Guatemala
sumx-=2
print('Carvajal result with sumx={} \t==> {}'.format(sumx, 100*9/(sumx*9)))

# result from [2]
print('Carvajal result by FIFA \t=>  0.62')


Occurances 459 == 459
Carvajal result with sumx=459 	==> 0.2178649237472767
Carvajal result with sumx=457 	==> 0.2188183807439825
Carvajal result by FIFA 	=>  0.62
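The arithmetic behind this check can be written as a small function. This is a sketch under the assumption, used throughout this notebook, that every ballot hands out 5 + 3 + 1 = 9 points; the name `share_of_points` is hypothetical.

```python
POINTS_PER_BALLOT = 5 + 3 + 1  # first, second and third place on one ballot


def share_of_points(points: int, ballots: int) -> float:
    """Percentage of all awarded points, assuming each ballot awards 9 points."""
    return 100 * points / (ballots * POINTS_PER_BALLOT)


carvajal_points = 3 * 3  # three second-place votes at 3 points each

# 459 ballots in total, 457 if Kuwait and Guatemala are excluded.
for ballots in (459, 457):
    print(ballots, round(share_of_points(carvajal_points, ballots), 4))
```

Removing the two suspended associations changes the share only in the third decimal place, so it does not explain the gap to the published 0.62.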


In [5]:
# how many times each player was put in first position
df.First.value_counts()

Cristiano Ronaldo            340
Messi Lionel                  55
Neymar                        15
Buffon Gianluigi               8
Kroos Toni                     7
Aubameyang Pierre-Emerick      6
Ramos Sergio                   6
Modric Luka                    5
Suárez Luis                    4
Kanté N'Golo                   3
Navas Keylor                   2
Lewandowski Robert             2
Ibrahimovic Zlatan             2
Neuer Manuel                   2
Hazard Eden                    1
Kane Harry                     1
Name: First, dtype: int64

In [6]:
# votes without Ronaldo and Messi
# side note: am I the only one irritated by the fact that Ronaldo is the only player listed in natural first name, surname order?
no_ronaldo = df[df.apply(lambda x: 'Cristiano Ronaldo' not in [x['First'], x['Second'], x['Third']], axis='columns')]
no_messi = df[df.apply(lambda x: 'Messi Lionel' not in [x['First'], x['Second'], x['Third']], axis='columns')]

print('{0:.2f}% of votes did not include Ronaldo'.format(100*len(no_ronaldo)/len(df)))
print('{0:.2f}% of votes did not include Messi'.format(100*len(no_messi)/len(df)))


9.80% of votes did not include Ronaldo
36.60% of votes did not include Messi
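The row-wise `apply` above works, but the same filter can be expressed with a vectorized comparison. The sketch below is one possible alternative, not the notebook's own method; `ballots_without` is a hypothetical helper and `sample` holds three ballots copied from the vote list.

```python
import pandas as pd

BALLOT_COLUMNS = ["First", "Second", "Third"]


def ballots_without(votes: pd.DataFrame, player: str) -> pd.DataFrame:
    """Return the ballots on which `player` does not appear in any position."""
    named = votes[BALLOT_COLUMNS].eq(player).any(axis="columns")
    return votes[~named]


sample = pd.DataFrame(
    {
        "Vote": ["Captain", "Captain", "Captain"],
        "Country": ["Albania", "American Samoa", "Andorra"],
        "First": ["Cristiano Ronaldo", "Cristiano Ronaldo", "Neymar"],
        "Second": ["Messi Lionel", "Neymar", "Modric Luka"],
        "Third": ["Buffon Gianluigi", "Dybala Paulo", "Buffon Gianluigi"],
    }
)

print(ballots_without(sample, "Cristiano Ronaldo").Country.tolist())
print(ballots_without(sample, "Messi Lionel").Country.tolist())
```

Comparing whole columns at once avoids calling a Python function per row, which matters little for 459 ballots but keeps the intent easier to read.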


In [7]:
# ballots that leave Ronaldo out, by voter category
no_ronaldo.Vote.value_counts()

Coach      22
Captain    16
Media       7
Name: Vote, dtype: int64

In [8]:
# and same for Messi by category
no_messi.Vote.value_counts()

Captain    58
Media      57
Coach      53
Name: Vote, dtype: int64
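Raw counts are hard to compare because the three voter categories differ slightly in size. The sketch below turns them into rates, assuming the per-category ballot counts taken from the PDF in the conversion check (151 captains, 152 coaches, 156 media) also describe the CSV. All numbers are copied from cells in this notebook; the column labels are illustrative.

```python
import pandas as pd

# Ballots per voter category, as counted from the PDF in the conversion check.
ballots = pd.Series({"Captain": 151, "Coach": 152, "Media": 156})

# Ballots that leave each player out, copied from the two outputs above.
without_ronaldo = pd.Series({"Coach": 22, "Captain": 16, "Media": 7})
without_messi = pd.Series({"Captain": 58, "Media": 57, "Coach": 53})

# Series division aligns on the category labels, so the input order does not matter.
rates = pd.DataFrame(
    {
        "Ronaldo omitted (%)": 100 * without_ronaldo / ballots,
        "Messi omitted (%)": 100 * without_messi / ballots,
    }
).round(1)

print(rates)
```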

In [54]:
# let us prepare a table with the number of times each player was put in each position (first, second, third)
# e.g. how many times each player's name occurred in the 'First' column:
df.First.value_counts().head()

Cristiano Ronaldo    340
Messi Lionel          55
Neymar                15
Buffon Gianluigi       8
Kroos Toni             7
Name: First, dtype: int64

In [9]:
# pivoting the table to get summaries by player
nf = pd.DataFrame(df.First.value_counts())
nf2 = pd.DataFrame(df.Second.value_counts())
nf3 = pd.DataFrame(df.Third.value_counts())

res = nf.join((nf2.join(nf3, how='outer')), how='outer')

# the outer joins leave NaN where a player never appeared in a given position, which turns those columns into float
# so we set the missing values to 0 and convert back to int
res.fillna(0.0, inplace=True)
res = res.astype(int)
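The same per-player summary can also be built in one step with a cross-tabulation, which yields integer counts and zeros without the `fillna` and `astype` fix. This is an alternative sketch rather than the approach used above; `sample`, `long` and `counts` are illustrative names, and the three ballots are copied from the vote list.

```python
import pandas as pd

POSITIONS = ["First", "Second", "Third"]

sample = pd.DataFrame(
    {
        "First": ["Cristiano Ronaldo", "Cristiano Ronaldo", "Neymar"],
        "Second": ["Messi Lionel", "Buffon Gianluigi", "Modric Luka"],
        "Third": ["Buffon Gianluigi", "Messi Lionel", "Buffon Gianluigi"],
    }
)

# Reshape to one row per (position, player) pair.
long = sample.melt(value_vars=POSITIONS, var_name="Position", value_name="Player")

# Count how often each player occupies each position; absent combinations become 0.
counts = pd.crosstab(long["Player"], long["Position"])
counts = counts.reindex(columns=POSITIONS, fill_value=0)

print(counts)
```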

In [10]:
# calculate the number of votes and points according to FIFA rules (first = 5 pts, second = 3 pts, third = 1 pt)
res['Occurances'] = res.First + res.Second + res.Third
res['Points'] = res.First * 5 + res.Second * 3 + res.Third

In [11]:
# top 10 with most points
res.Points.sort_values(ascending=False).head(10)

Cristiano Ronaldo            1888
Messi Lionel                  823
Buffon Gianluigi              317
Neymar                        298
Ramos Sergio                  126
Modric Luka                   115
Kroos Toni                     76
Kanté N'Golo                   63
Aubameyang Pierre-Emerick      56
Hazard Eden                    42
Name: Points, dtype: int64
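The 5/3/1 scoring can also be written as a weighted sum, which makes a simple sanity check possible: the points over all players must add up to 9 times the number of ballots. The sketch below assumes a table of placement counts per player; the small `counts` table is derived from three sample ballots (the captains of Albania, Algeria and Andorra), not from the full data.

```python
import pandas as pd

WEIGHTS = pd.Series({"First": 5, "Second": 3, "Third": 1})

# Placement counts for the three sample ballots.
counts = pd.DataFrame(
    {"First": [2, 0, 0, 1, 0], "Second": [0, 1, 1, 0, 1], "Third": [0, 1, 2, 0, 0]},
    index=["Cristiano Ronaldo", "Messi Lionel", "Buffon Gianluigi", "Neymar", "Modric Luka"],
)

# Weighted sum per player: 5 * First + 3 * Second + 1 * Third.
points = counts[WEIGHTS.index].dot(WEIGHTS)

# Every ballot has exactly one first place, so the First column counts the ballots.
n_ballots = int(counts["First"].sum())
assert points.sum() == n_ballots * WEIGHTS.sum()

print(points.sort_values(ascending=False))
```

If the check fails on the full table, either a ballot is incomplete or a name was lost during the PDF to CSV conversion.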

In [18]:
# points summary, compare to [2]
points_total = len(df)*9 
relative = 100*res.Points/points_total
relative.sort_values(ascending=False)

Cristiano Ronaldo            45.703220
Messi Lionel                 19.922537
Buffon Gianluigi              7.673687
Neymar                        7.213750
Ramos Sergio                  3.050109
Modric Luka                   2.783830
Kroos Toni                    1.839748
Kanté N'Golo                  1.525054
Aubameyang Pierre-Emerick     1.355604
Hazard Eden                   1.016703
Marcelo                       0.944081
Suárez Luis                   0.847252
Ibrahimovic Zlatan            0.750424
Lewandowski Robert            0.702009
Dybala Paulo                  0.629388
Griezmann Antoine             0.605180
Neuer Manuel                  0.532559
Iniesta Andrés                0.508351
Vidal Arturo                  0.508351
Bonucci Leonardo              0.484144
Navas Keylor                  0.435730
Sanchez Alexis                0.387315
Kane Harry                    0.363108
Carvajal Dani                 0.217865
Name: Points, dtype: float64

In [13]:
# the 10 players named least often
res.Occurances.sort_values().head(10)

Carvajal Dani          3
Navas Keylor           6
Kane Harry             7
Neuer Manuel           8
Vidal Arturo           9
Griezmann Antoine      9
Sanchez Alexis        10
Iniesta Andrés        11
Bonucci Leonardo      12
Ibrahimovic Zlatan    13
Name: Occurances, dtype: int64