# Analysis of the highest paid athletes in the world


**Here is an analysis of the world's Top 10 highest paid athletes from 1990 to 2020. The analysis brings out some interesting facts:**

- Tiger Woods dominates the top rank in the more recent years, while Michael Jordan dominated it earlier
- USA dominates the world when it comes to earnings
- Monica Seles is the only woman to make the top-10 highest paid athlete list from 1990 to 2020
- Basketball players earn the most, followed by boxers and golfers.



## Updated for year 2020

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from datetime import datetime

#Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
from plotly.offline import init_notebook_mode, iplot 
import plotly.graph_objs as go
import plotly.offline as py
import pycountry
py.init_notebook_mode(connected=True)
import folium 
from folium import plugins

# Image (numpy is already imported above)
from PIL import Image

# Animation
import matplotlib.ticker as ticker
import matplotlib.animation as animation
from IPython.display import HTML

# Graphics in retina format 
%config InlineBackend.figure_format = 'retina' 

# Increase the default plot size and set the color scheme
plt.rcParams['figure.figsize'] = 8, 5


# Disable warnings in Anaconda
import warnings
warnings.filterwarnings('ignore')
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

# Objective

We begin by exploring the data, then use plotly to visualise it and draw out insights. We also plot a racing bar chart using only matplotlib to animate how the figures change over time.


# 1. Reading and Preprocessing the data


### Steps :
* Read in the data using pandas
* Convert Year column to datetime
* Convert the text in the Sport column to a single case (upper case here) so that differently cased spellings of the same sport are counted together
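The steps above can be sketched on a small in-memory frame before running them on the real file. The three rows below are copied from outputs shown in this notebook; the frame name `sample` is only for illustration, and the year is cast to a string first so that the `%Y` format applies unambiguously.

```python
import pandas as pd

# Illustrative rows copied from the notebook's outputs; the real data
# comes from the Forbes CSV read in the next cell.
sample = pd.DataFrame({
    "Name": ["Mike Tyson", "Ayrton Senna", "Stephen Curry"],
    "Sport": ["boxing", "auto racing", "Basketball"],
    "Year": [1990, 1990, 2020],
})

# Parse the year as a datetime (1 January of that year).
sample["Year"] = pd.to_datetime(sample["Year"].astype(str), format="%Y")

# Normalise the casing so "Basketball" and "basketball" count as one sport.
sample["Sport"] = sample["Sport"].str.upper()

print(sample.dtypes)
print(sample)
```

The mixed casing is visible in the raw data itself: the 1990 rows use `boxing` while the 2020 rows use `Basketball`.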

In [2]:
df = pd.read_csv(r'C:\Users\bmthm\Downloads\archive (7)\Forbes Richest Atheletes (Forbes Richest Athletes 1990-2020).csv')
df.head()

Unnamed: 0,S.NO,Name,Nationality,Current Rank,Previous Year Rank,Sport,Year,earnings ($ million)
0,1,Mike Tyson,USA,1,,boxing,1990,28.6
1,2,Buster Douglas,USA,2,,boxing,1990,26.0
2,3,Sugar Ray Leonard,USA,3,,boxing,1990,13.0
3,4,Ayrton Senna,Brazil,4,,auto racing,1990,10.0
4,5,Alain Prost,France,5,,auto racing,1990,9.0
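The read above uses an absolute path from one machine, so it will not resolve elsewhere. A minimal sketch of locating the file by name instead is shown below; the helper `find_csv` and the keyword `"Forbes"` are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path


def find_csv(root, keyword):
    """Return CSV paths under `root` whose file name contains `keyword`."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to search, e.g. when not running on Kaggle.
        return []
    return sorted(
        p for p in root.rglob("*.csv")
        if keyword.lower() in p.name.lower()
    )


# Hypothetical usage: search the Kaggle input folder walked earlier.
matches = find_csv("/kaggle/input", "Forbes")
print(matches)
```

If exactly one match comes back, it can be passed straight to `pd.read_csv(matches[0])`.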


In [3]:
# Creating a copy of the original dataframe- df
df1 = df.copy()

df1.drop('S.NO',axis=1,inplace=True)
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Name                  301 non-null    object 
 1   Nationality           301 non-null    object 
 2   Current Rank          301 non-null    int64  
 3   Previous Year Rank    277 non-null    object 
 4   Sport                 301 non-null    object 
 5   Year                  301 non-null    int64  
 6   earnings ($ million)  301 non-null    float64
dtypes: float64(1), int64(2), object(4)
memory usage: 16.6+ KB


In [4]:

# Convert the integer year to datetime64 (vectorised, no per-row apply)
df1['Year'] = pd.to_datetime(df1['Year'], format='%Y')

df1

Unnamed: 0,Name,Nationality,Current Rank,Previous Year Rank,Sport,Year,earnings ($ million)
0,Mike Tyson,USA,1,,boxing,1990-01-01,28.6
1,Buster Douglas,USA,2,,boxing,1990-01-01,26.0
2,Sugar Ray Leonard,USA,3,,boxing,1990-01-01,13.0
3,Ayrton Senna,Brazil,4,,auto racing,1990-01-01,10.0
4,Alain Prost,France,5,,auto racing,1990-01-01,9.0
...,...,...,...,...,...,...,...
296,Stephen Curry,USA,6,9,Basketball,2020-01-01,74.4
297,Kevin Durant,USA,7,10,Basketball,2020-01-01,63.9
298,Tiger Woods,USA,8,11,Golf,2020-01-01,62.3
299,Kirk Cousins,USA,9,>100,American Football,2020-01-01,60.5


In [5]:
# Extract the calendar year from the datetime column and set it as the index.
df1['year'] = pd.DatetimeIndex(df1['Year']).year  
df1.set_index('year', inplace=True)
df1.drop('Year',axis=1,inplace=True)

df1

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
1990,Mike Tyson,USA,1,,boxing,28.6
1990,Buster Douglas,USA,2,,boxing,26.0
1990,Sugar Ray Leonard,USA,3,,boxing,13.0
1990,Ayrton Senna,Brazil,4,,auto racing,10.0
1990,Alain Prost,France,5,,auto racing,9.0
...,...,...,...,...,...,...
2020,Stephen Curry,USA,6,9,Basketball,74.4
2020,Kevin Durant,USA,7,10,Basketball,63.9
2020,Tiger Woods,USA,8,11,Golf,62.3
2020,Kirk Cousins,USA,9,>100,American Football,60.5


In [6]:
# Converting the sport column to uppercase
df1['Sport'] = df1['Sport'].str.upper()
df1.head()

# df is the original dataframe, while df1 is a copy in which the year has been set as the index

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
1990,Mike Tyson,USA,1,,BOXING,28.6
1990,Buster Douglas,USA,2,,BOXING,26.0
1990,Sugar Ray Leonard,USA,3,,BOXING,13.0
1990,Ayrton Senna,Brazil,4,,AUTO RACING,10.0
1990,Alain Prost,France,5,,AUTO RACING,9.0



# 2. World's Highest-Paid Athletes in 2020


In [7]:
data_2020 = df1[df1.index == 2020]

data_2020

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
2020,Roger Federer,Switzerland,1,5,TENNIS,106.3
2020,Cristiano Ronaldo,Portugal,2,2,SOCCER,105.0
2020,Lionel Messi,Argentina,3,1,SOCCER,104.0
2020,Neymar,Brazil,4,3,SOCCER,95.5
2020,LeBron James,USA,5,8,BASKETBALL,88.2
2020,Stephen Curry,USA,6,9,BASKETBALL,74.4
2020,Kevin Durant,USA,7,10,BASKETBALL,63.9
2020,Tiger Woods,USA,8,11,GOLF,62.3
2020,Kirk Cousins,USA,9,>100,AMERICAN FOOTBALL,60.5
2020,Carson Wentz,USA,10,>100,AMERICAN FOOTBALL,59.1


In [32]:

trace = go.Bar(x = data_2020["earnings ($ million)"],y = data_2020['Name'],orientation='h')

layout = go.Layout(barmode = "group",title="World's Highest-Paid Athletes in 2020",width=800, height=400, 
                       xaxis=dict(title='Earnings ($ million)'),
                       yaxis=dict(autorange="reversed"),
                       showlegend=False)
data = [trace]

fig = go.Figure(data = data, layout = layout)
iplot(fig)

# 3. Analysis of the Highest Paid Athlete each year from 1990 to 2020
---

In [35]:
# Top Paid Athlete for Each Year

Top_paid = df1[df1['Current Rank'] == 1].sort_values(by='year',ascending=False)
Top_paid

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
2020,Roger Federer,Switzerland,1,5,TENNIS,106.3
2019,Lionel Messi,Argentina,1,2,SOCCER,127.0
2018,Floyd Mayweather,USA,1,>100,BOXING,285.0
2017,Cristiano Ronaldo,Portugal,1,1,SOCCER,93.0
2016,Cristiano Ronaldo,Portugal,1,3,SOCCER,88.0
2015,Floyd Mayweather,USA,1,1,BOXING,300.0
2014,Floyd Mayweather,USA,1,14,BOXING,105.0
2013,Tiger Woods,USA,1,3,GOLF,78.1
2012,Floyd Mayweather,USA,1,?,BOXING,85.0
2011,Tiger Woods,USA,1,1,GOLF,75.0


In [41]:
Top_Paid = Top_paid[['Name','Sport','Nationality','earnings ($ million)']].sort_values(by='year',ascending=False)

#z.style.set_properties(**{'background-color': 'pink',
                           # 'color': 'black',
                            #'border-color': 'white'})
Top_Paid.style.background_gradient(cmap='PuBu')  

year,Name,Sport,Nationality,earnings ($ million)
2020,Roger Federer,TENNIS,Switzerland,106.3
2019,Lionel Messi,SOCCER,Argentina,127.0
2018,Floyd Mayweather,BOXING,USA,285.0
2017,Cristiano Ronaldo,SOCCER,Portugal,93.0
2016,Cristiano Ronaldo,SOCCER,Portugal,88.0
2015,Floyd Mayweather,BOXING,USA,300.0
2014,Floyd Mayweather,BOXING,USA,105.0
2013,Tiger Woods,GOLF,USA,78.1
2012,Floyd Mayweather,BOXING,USA,85.0
2011,Tiger Woods,GOLF,USA,75.0


## Athletes ranked highest paid the most times

In [43]:
high_count = Top_paid['Name'].value_counts().to_frame()


trace = go.Bar(
                    y = high_count.index,
                    x = high_count['Name'] ,
                    orientation='h',
                    marker = dict(color='pink',
                                 line=dict(color='black',width=1)),
                    )
data = [trace]
layout = go.Layout(barmode = "group",title='Athletes ranked highest paid the most times',width=800, height=500, 
                       xaxis= dict(title='No of times ranked highest'),
                       yaxis=dict(autorange="reversed"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)
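The cell above reads the counts from a column called `Name` after `value_counts().to_frame()`. That matches the pandas version used here, but pandas 2.0 and later name that column `count` instead, so the lookup would fail there. A sketch that names both columns explicitly and should behave the same on either version follows; the toy series and the column name `times_ranked_first` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for Top_paid['Name'].
names = pd.Series(
    ["Tiger Woods", "Tiger Woods", "Michael Jordan",
     "Tiger Woods", "Michael Jordan", "Mike Tyson"],
    name="Name",
)

counts = (
    names.value_counts()          # counts per athlete, sorted descending
    .rename_axis("Name")          # make sure the index is called "Name"
    .reset_index(name="times_ranked_first")  # explicit name for the counts
)
print(counts)
```

The bar chart can then use `counts['Name']` and `counts['times_ranked_first']` without depending on how pandas labels the result.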


## Country which produces the maximum earners in Sports

In [46]:
counts_top = Top_paid['Nationality'].value_counts().to_frame()


trace = go.Bar(
                    x = counts_top.index,
                    y = counts_top['Nationality'] ,
                    orientation='v',
                    marker = dict(color='pink',
                                 line=dict(color='black',width=1)),
                    )
data = [trace]
layout = go.Layout(barmode = "group",title='Country which produces the maximum earners in Sports',width=800, height=500, 
                       xaxis= dict(title='Country'),
                       yaxis=dict(title='No of times ranked highest'),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)

## How much did the top-paid athlete earn each year?

In [47]:


trace = go.Scatter(
                    x = Top_paid.index,
                    y = Top_paid['earnings ($ million)'] ,
                    orientation='v',
                    marker = dict(color='red',
                                 line=dict(color='royalblue',width=2)),
                    )
data = [trace]
layout = go.Layout(title='How much did the top-paid athlete earn each year?',width=800, height=500, 
                       xaxis= dict(title='Years'),
                       yaxis=dict(title="Earning in US Dollars(million)"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)


# 4. Analysis of the Top Ten Highest Paid Athletes each year from 1990 to 2020
---



## Sport which dominates in earnings

In [48]:
df['Sport'] = df['Sport'].str.upper() # Converting the text to uppercase
top_sport = df['Sport'].value_counts().to_frame()

trace = go.Bar(
                    y = top_sport.index,
                    x = top_sport['Sport'] ,
                    orientation='h',
                    marker = dict(color='pink',
                                 line=dict(color='black',width=1)),
                    )
data = [trace]
layout = go.Layout(barmode = "group",title='Sport which dominates in earnings',width=800, height=500, 
                       xaxis= dict(title='No of appearances in the top 10'),
                       yaxis=dict(autorange="reversed"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)
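The chart above counts how often each sport appears in the top 10, which is not the same as how much money the sport accounts for. A small sketch that puts both measures side by side is given below, using the five 1990 rows shown earlier as toy data; the column names `appearances` and `total_earnings` are made up for this example.

```python
import pandas as pd

# The 1990 rows displayed near the top of the notebook, sport upper-cased.
rows = pd.DataFrame({
    "Sport": ["BOXING", "BOXING", "BOXING", "AUTO RACING", "AUTO RACING"],
    "earnings ($ million)": [28.6, 26.0, 13.0, 10.0, 9.0],
})

by_sport = (
    rows.groupby("Sport")["earnings ($ million)"]
    .agg(appearances="count", total_earnings="sum")
    .sort_values("total_earnings", ascending=False)
)
print(by_sport)
```

Run on the full `df`, the two columns may rank sports differently, which is worth checking before concluding that one sport "earns the most".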

## Country which dominates in Sports earnings

In [49]:

top_country = df['Nationality'].value_counts().to_frame()


trace = go.Bar(
                    y = top_country.index,
                    x = top_country['Nationality'] ,
                    orientation='h',
                    marker = dict(color='pink',
                                 line=dict(color='black',width=1)),
                    )
data = [trace]
layout = go.Layout(barmode = "group",title='Country which dominates in Sports earnings',width=800, height=500, 
                       xaxis= dict(title='No of appearances in the top 10'),
                       yaxis=dict(autorange="reversed"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)

## Athletes appearing the most times on the list

In [51]:

athelte_top = df['Name'].value_counts().to_frame()[:5]
athelte_top.style.background_gradient(cmap='Reds')  


Unnamed: 0,Name
Tiger Woods,19
Michael Jordan,19
Kobe Bryant,14
LeBron James,13
Michael Schumacher,13


# 5. Where are the Women?


In [52]:
# People who have appeared once on the list.
names = df['Name'].value_counts().to_frame()
names[names['Name']==1].index


Index(['Matthew Stafford', 'Aaron Rodgers', 'Rafael Nadal', 'Kirk Cousins',
       'Aaron Rogers', 'Novak Djokovic', 'Jordan Spieth', 'Cam Newton',
       'Canelo Alvarez', 'Andrew Luck', 'Rory McIlroy', 'Drew Brees',
       'James Harden', 'Lewis Hamilton', 'Russell Wilson', 'Conor McGregor',
       'Deion Sanders', 'Donovan "Razor" Ruddock', 'Terrell Suggs',
       'Eli Manning', 'Emmit Smith', 'Dennis Rodman', 'Gerhard Berger',
       'Joe Sakic', 'Cecil Fielder', 'Sergei Federov', 'Gary Sheffield',
       'Jeff Gordon', 'Buster Douglas', 'Monica Seles', 'Michael Vick',
       'Lance Armstrong', 'Muhammad Ali', 'Tom Brady', 'Michael Moorer',
       'Dale Earnhardt Jr.', 'Greg Norman', 'Carson Wentz'],
      dtype='object')
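The output above contains both `Aaron Rodgers` and `Aaron Rogers`, which appear to be two spellings of the same athlete and would split his appearances across two names. A rough sketch for flagging such pairs with the standard library's `difflib` follows; the helper `near_duplicates` and the 0.9 threshold are assumptions chosen for illustration, and any flagged pair still needs a manual check.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(names, threshold=0.9):
    """Return pairs of distinct names whose similarity ratio is >= threshold."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# A few names taken from the output above.
once_only = ["Matthew Stafford", "Aaron Rodgers", "Kirk Cousins",
             "Aaron Rogers", "Drew Brees", "Tom Brady"]
print(near_duplicates(once_only))
```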

In [53]:
# Scanning the list, we find only one woman athlete: Monica Seles
monica = df[df['Name'] == 'Monica Seles']
monica.style.set_properties(**{'background-color': 'pink',
                            'color': 'black',
                            'border-color': 'black'})

Unnamed: 0,S.NO,Name,Nationality,Current Rank,Previous Year Rank,Sport,Year,earnings ($ million)
29,30,Monica Seles,USA,10,12,TENNIS,1992,8.5


# 6. Analysis of the top three earners of all time
---



In [55]:
top_earners_alltime = pd.pivot_table(df, index='Name',values="earnings ($ million)", aggfunc='sum')
top3_earners_all = top_earners_alltime.sort_values(by="earnings ($ million)",ascending=False)[:3]

top3_earners_all.style.background_gradient(cmap='Reds')  

Name,earnings ($ million)
Tiger Woods,1373.8
LeBron James,844.8
Floyd Mayweather,840.0
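The pivot table above sums each athlete's earnings over all the years in which they made the list. The same aggregation can be sketched with `groupby` and `nlargest`, shown here on a handful of rows copied from the tables in this notebook, so the totals below are for the toy data only and not the full dataset.

```python
import pandas as pd

# Toy rows copied from outputs shown in this notebook.
rows = pd.DataFrame({
    "Name": ["Floyd Mayweather", "LeBron James", "Floyd Mayweather",
             "LeBron James", "Tiger Woods"],
    "earnings ($ million)": [300.0, 64.8, 285.0, 88.2, 62.3],
})

# Sum earnings per athlete, then keep the three largest totals.
totals = rows.groupby("Name")["earnings ($ million)"].sum()
top3 = totals.nlargest(3)
print(top3)
```

Note that these are sums over top-10 appearances only, so years in which an athlete fell outside the list contribute nothing to the total.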


# Tiger Woods Earnings

In [60]:
Tiger_Woods = df1[df1['Name'] == 'Tiger Woods']

In [61]:
Tiger_Woods

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
1997,Tiger Woods,USA,6,26.0,GOLF,26.1
1998,Tiger Woods,USA,4,,GOLF,26.8
1999,Tiger Woods,USA,2,4.0,GOLF,47.0
2000,Tiger Woods,USA,2,2.0,GOLF,53.0
2002,Tiger Woods,USA,1,2.0,GOLF,69.0
2003,Tiger Woods,USA,1,1.0,GOLF,78.0
2004,Tiger Woods,USA,1,1.0,GOLF,80.3
2005,Tiger Woods,USA,1,1.0,GOLF,87.0
2006,Tiger Woods,USA,1,1.0,GOLF,90.0
2007,Tiger Woods,USA,1,1.0,GOLF,100.0


In [62]:
trace = go.Scatter(
                    x = Tiger_Woods.index,
                    y = Tiger_Woods['earnings ($ million)'] ,
                    orientation='v',
                    marker = dict(color='red',
                                 line=dict(color='royalblue',width=2)),
                    )
data = [trace]
layout = go.Layout(title='How much did Tiger Woods earn each year?',width=800, height=500, 
                       xaxis= dict(title='Years'),
                       yaxis=dict(title="Earning in US Dollars(million)"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)

# LeBron James Earnings

In [63]:
LeBron_James = df1[df1['Name'] == 'LeBron James']
LeBron_James

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
2008,LeBron James,USA,7,>10,BASKETBALL,38.0
2009,LeBron James,USA,6,7,BASKETBALL,40.0
2010,LeBron James,USA,7,6,BASKETBALL,42.8
2011,LeBron James,USA,3,7,BASKETBALL,48.0
2012,LeBron James,USA,4,3,BASKETBALL,53.0
2013,LeBron James,USA,4,4,BASKETBALL,59.8
2014,LeBron James,USA,3,4,BASKETBALL,72.3
2015,LeBron James,USA,6,3,BASKETBALL,64.8
2016,LeBron James,USA,3,6,BASKETBALL,77.2
2017,LeBron James,USA,2,3,BASKETBALL,86.2


In [64]:
trace = go.Scatter(
                    x = LeBron_James.index,
                    y = LeBron_James['earnings ($ million)'] ,
                    orientation='v',
                    marker = dict(color='red',
                                 line=dict(color='royalblue',width=2)),
                    )
data = [trace]
layout = go.Layout(title='How much did LeBron James earn each year?',width=800, height=500, 
                       xaxis= dict(title='Years'),
                       yaxis=dict(title="Earning in US Dollars(million)"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)

# Floyd Mayweather Earnings

In [65]:
Floyd_Mayweather = df1[df1['Name'] == 'Floyd Mayweather']
Floyd_Mayweather

year,Name,Nationality,Current Rank,Previous Year Rank,Sport,earnings ($ million)
2010,Floyd Mayweather,USA,2,>10,BOXING,65.0
2012,Floyd Mayweather,USA,1,?,BOXING,85.0
2014,Floyd Mayweather,USA,1,14,BOXING,105.0
2015,Floyd Mayweather,USA,1,1,BOXING,300.0
2018,Floyd Mayweather,USA,1,>100,BOXING,285.0


In [66]:
trace = go.Scatter(
                    x = Floyd_Mayweather.index,
                    y = Floyd_Mayweather['earnings ($ million)'] ,
                    orientation='v',
                    marker = dict(color='red',
                                 line=dict(color='royalblue',width=2)),
                    )
data = [trace]
layout = go.Layout(title='How much did Floyd Mayweather earn each year?',width=800, height=500, 
                       xaxis= dict(title='Years'),
                       yaxis=dict(title="Earning in US Dollars(million)"),
                       showlegend=False)
fig = go.Figure(data = data, layout = layout)
iplot(fig)
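The three per-athlete cells above differ only in the athlete's name, so they could be folded into one helper. The sketch below builds a plain figure dictionary with the same trace and layout settings; the function name `earnings_figure` and the toy frame are assumptions for illustration, and the dictionary form is used so that the example needs only pandas.

```python
import pandas as pd


def earnings_figure(frame, name):
    """Build a plotly-style figure dict of yearly earnings for one athlete.

    `frame` is expected to be indexed by year, like df1 in this notebook.
    """
    rows = frame[frame["Name"] == name].sort_index()
    trace = {
        "type": "scatter",
        "x": rows.index.tolist(),
        "y": rows["earnings ($ million)"].tolist(),
        "marker": {"color": "red"},
    }
    layout = {
        "title": f"How much did {name} earn each year?",
        "width": 800,
        "height": 500,
        "xaxis": {"title": "Years"},
        "yaxis": {"title": "Earnings in US dollars (million)"},
        "showlegend": False,
    }
    return {"data": [trace], "layout": layout}


# Toy frame shaped like df1, with rows copied from the tables above.
toy = pd.DataFrame(
    {
        "Name": ["Floyd Mayweather", "LeBron James",
                 "Floyd Mayweather", "LeBron James"],
        "earnings ($ million)": [105.0, 72.3, 300.0, 64.8],
    },
    index=pd.Index([2014, 2014, 2015, 2015], name="year"),
)
fig_spec = earnings_figure(toy, "Floyd Mayweather")
print(fig_spec["data"][0])
```

In the notebook, `iplot(earnings_figure(df1, 'Tiger Woods'))` should render the same kind of chart as the cells above, since `iplot` accepts a figure dictionary as well as a `go.Figure`.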