# Exploring Darts players
### This notebook takes a look at the variables present in the darts dataset.
### Any comments are appreciated.

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

In [None]:
darts = pd.read_csv('../input/darts-players-top-500/PlayerProfiles.csv')

### Getting familiar with the dataset

#### Getting some sample rows from the dataset

In [None]:
darts.sample(10)

#### Just from the sample we can see that there are duplicate rows, which would distort our analysis and could easily pass unnoticed. We will fix that later. Now on to some information about the dataset.

In [None]:
darts.info()

### Data preparation

#### Alright, so we found there were duplicate rows. Let's see how many of them there are, and then drop them.

In [None]:
# duplicated() returns a boolean mask; summing it counts the duplicate rows
darts.duplicated().sum()

#### There are 18 duplicate rows in the data. Better to just drop them.

In [None]:
darts = darts.drop_duplicates()
darts
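#### As a side note, here is a minimal sketch of how `duplicated()` and `drop_duplicates()` behave, using a small made-up table (the names and values in `toy` are hypothetical, not taken from the darts data).

```python
import pandas as pd

# Toy data standing in for the player table (hypothetical names and values).
toy = pd.DataFrame({
    "Name": ["A", "B", "A", "C", "B"],
    "Age": [30, 41, 30, 25, 41],
})

# duplicated() flags each repeat of an earlier row; the first occurrence is not flagged.
n_duplicates = int(toy.duplicated().sum())

# keep=False flags every member of a duplicated group, which helps when inspecting them.
all_copies = toy[toy.duplicated(keep=False)]

# drop_duplicates() keeps the first occurrence; reset_index tidies the row labels.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(all_copies), len(deduped))
```

#### Counting with the default `keep='first'` gives the number of rows that will be removed, while `keep=False` is handy for looking at all copies side by side before dropping anything.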

#### It seems the career earnings column is not stored as a number. Let's fix that.

In [None]:
# Drop the leading currency symbol, remove the thousands separators, then convert to int
darts['Career Earnings :'] = darts['Career Earnings :'].str[1:].str.replace(',', '', regex=False).astype(int)
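#### A slightly more defensive variant of the same cleaning step, sketched on made-up strings. The exact format of the real column is an assumption here: a currency symbol followed by whole numbers with comma separators. The `raw` series and its values are hypothetical.

```python
import pandas as pd

# Hypothetical earnings strings; the real column's exact format is an assumption.
raw = pd.Series(["£1,250,000", "£980", "£12,500"])

# Remove every non-digit character (symbol and commas alike), then convert to integers.
earnings = raw.str.replace(r"\D", "", regex=True).astype(int)

print(earnings.tolist())
```

#### Stripping all non-digits does not depend on the symbol sitting in the first position, but it would silently mangle values with decimals, so it only fits whole-number amounts.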

#### Now moving on to analysis

-------------------------
## Statistics and plots 

### This section analyzes each feature in turn

### Career earnings
#### Let's rank the players by earnings, and then get a view of the distribution

In [None]:
darts.sort_values(by = 'Career Earnings :', ascending = False, inplace = True)
darts

In [None]:
darts['Career Earnings :'].hist();

#### Alright, so most darts professionals did not earn more than 2 million in their career. Let's look at that range in more depth.

In [None]:
darts[darts['Career Earnings :'] < 2e6]['Career Earnings :'].hist(bins=20);

#### The histogram shows that most darts players did not earn more than 1.5 million pounds in their career
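#### A reading like this can be backed with a number instead of an eyeballed histogram. The sketch below uses a made-up `earnings` series (hypothetical values, not the real data) to show the share of players under a threshold and a couple of quantiles.

```python
import pandas as pd

# Hypothetical career earnings in pounds, standing in for the real column.
earnings = pd.Series([50_000, 120_000, 300_000, 900_000, 1_600_000, 8_000_000])

# The mean of a boolean mask is the fraction of True values.
share_below = (earnings < 1.5e6).mean()

# Median and 90th percentile summarize a skewed distribution better than the mean.
quantiles = earnings.quantile([0.5, 0.9])

print(f"{share_below:.0%} of players earned less than 1.5 million")
print(quantiles)
```

#### On the real data, the same two lines applied to `darts['Career Earnings :']` would give the exact share behind the statement above.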

#### Now, let's visualize which countries earned the most money in total

In [None]:
# Select the earnings column before summing so the text columns are not aggregated
darts.groupby('Country :')['Career Earnings :'].sum().sort_values().reset_index() \
.plot(kind = 'barh', x = 'Country :', y = 'Career Earnings :', figsize = (12,10));

#### Alright, so we see England has earned the most money in total, but which country has the highest earnings per player?

In [None]:
# Mean career earnings per player for each country
# (select the column first: mean() cannot be taken over the text columns)
grouped = darts.groupby('Country :')['Career Earnings :'].mean().reset_index()
grouped.columns = ['Country', 'EarningPerPlayer']
grouped

In [None]:
grouped.sort_values('EarningPerPlayer') \
.plot(kind ='barh',x = 'Country', figsize = (12,10));

#### So when we look at average earnings per player, Scotland gets the win, while England falls way behind. England may have the most money earned because it has the largest number of players, but its mean earnings per player are quite low, falling behind Northern Ireland.
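#### To see the total, the mean and the player count in one table, a named aggregation can help. This is a minimal sketch on made-up numbers: the `toy` frame and its column names are hypothetical, and the median is added only as an extra, outlier-resistant reference.

```python
import pandas as pd

# Hypothetical players: many modest earners in one country, a single high earner in another.
toy = pd.DataFrame({
    "Country": ["England", "England", "England", "Scotland"],
    "Earnings": [100, 200, 300, 900],
})

# Named aggregation: each keyword becomes a column in the result.
summary = toy.groupby("Country")["Earnings"].agg(
    players="count",
    total="sum",
    mean="mean",
    median="median",
)

print(summary.sort_values("mean", ascending=False))
```

#### Having `players` next to `mean` makes it obvious when a high average rests on only a handful of players.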

### Now let's see the amount of money earned by each player of each country in a scatterplot, along with the mean by country

In [None]:
plt.figure(figsize = (12,10))
sns.scatterplot(data = darts, x = 'Career Earnings :', y ='Country :', marker ='o', s= 100)
sns.scatterplot(data = grouped, x = 'EarningPerPlayer', y = 'Country', marker = 'x', s = 200)

#### The plot shows how many players England has, and that their average earnings are not that high. It is also easy to tell that Scotland has a high average because it has only a few players, with relatively high earnings.
#### The Netherlands is also up there, mainly because of the top earner, with more than 8 million pounds earned in his career. Let's see who he is. Since our data is already sorted by earnings, this is easy to do.

In [None]:
darts.head(1)

#### We can see that Michael is the most successful player by career earnings.

#### Alright, so that was a good look at the earnings. Let's move on to another feature.

### Age

#### Let's get the distribution of the players' ages

In [None]:
sns.set(palette = 'summer')

In [None]:
darts['Age :'].hist(bins = 15);

#### From the histogram, we can see the youngest player is 25 years old, and the oldest is a little over 55. Michael, the top earner, is quite young for the money and success he has achieved.
#### Let's see if career earnings are correlated with age

In [None]:
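# Median earnings at each age, without error bands
# (errorbar=None needs seaborn >= 0.12; older versions use ci=None instead)
sns.lineplot(data = darts, x = 'Age :', y = 'Career Earnings :', errorbar = None, estimator = np.median);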

#### So the plot shows no clear relationship between career earnings and age. A trend is only visible for players older than 50.
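#### A correlation coefficient gives a number to go with the plot. The sketch below uses made-up values (the `toy` frame is hypothetical) and computes Pearson's coefficient plus a rank-based (Spearman) one, obtained here by correlating the ranks, since earnings are heavily skewed.

```python
import pandas as pd

# Hypothetical ages and earnings, standing in for the real columns.
toy = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45, 50],
    "Earnings": [300, 100, 800, 200, 900, 150],
})

# Pearson measures linear association and is sensitive to extreme values.
pearson = toy["Age"].corr(toy["Earnings"])

# Spearman is Pearson applied to ranks, so one huge earner cannot dominate it.
spearman = toy["Age"].rank().corr(toy["Earnings"].rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

#### Values close to 0 would support the reading that age and earnings are not strongly related, while values near 1 or -1 would contradict it.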

#### Visualizing age by country in a scatterplot

In [None]:
plt.figure(figsize = (8,6))
sns.scatterplot(data = darts, x = 'Age :', y = 'Country :');

# Feel free to post suggestions