**PUBG Competition Analysis**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**Read and drop any NaN values**

In [None]:
df = pd.read_csv('../input/train_V2.csv')
df = df.dropna()
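
Before dropping rows it can help to see how many values are actually missing and how many rows the drop removes. The sketch below uses a small made-up frame (`toy` is a hypothetical stand-in, not the competition data) so that it runs on its own; the same two calls apply to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the competition data (all values are made up).
toy = pd.DataFrame({
    'kills': [0, 2, 5, 1],
    'walkDistance': [120.0, 850.5, 2300.0, 40.0],
    'winPlacePerc': [0.10, 0.55, np.nan, 0.02],
})

# Count missing values per column before deciding to drop anything.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop rows containing a NaN and report how many were removed.
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)
print('Dropped {} of {} rows'.format(rows_dropped, len(toy)))
```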


**Describe the Dataset**

In [None]:
df.describe()

**Analysis of players based on their placement (winPlacePerc)**

In [None]:
top10_players = df[df['winPlacePerc'] >= 0.9]

mid_players = df[(df['winPlacePerc'] >= 0.5) & (df['winPlacePerc'] < 0.9)]

bottom_players = df[df['winPlacePerc'] < 0.5]
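
The three boolean masks above can also be expressed as a single tier column, which keeps the thresholds in one place. This is a minimal sketch on made-up values; the `tier` column and its labels are names chosen here for illustration, not part of the dataset.

```python
import pandas as pd

# Hypothetical placement values covering each side of the two thresholds.
toy = pd.DataFrame({'winPlacePerc': [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]})

# right=False makes every bin closed on the left, matching the masks above:
# [0, 0.5) -> bottom, [0.5, 0.9) -> mid, [0.9, inf) -> top10
toy['tier'] = pd.cut(
    toy['winPlacePerc'],
    bins=[0.0, 0.5, 0.9, float('inf')],
    labels=['bottom', 'mid', 'top10'],
    right=False,
)
print(toy)
```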

**Plot a correlation map**

In [None]:
f,ax = plt.subplots(figsize=(15, 15))
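# numeric_only=True skips the string ID columns, which recent pandas versions refuse to correlate
sns.heatmap(df.corr(numeric_only=True), annot=True, linewidths=.5, fmt='.1f', ax=ax)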
plt.show()
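
A large heatmap is hard to scan, so one option is to pull out only the column of correlations with `winPlacePerc` and sort it. The sketch below does this on a small made-up frame (`toy` and its values are hypothetical); on the real data the same expression would be applied to `df`.

```python
import pandas as pd

# Hypothetical, all-numeric stand-in for the competition data.
toy = pd.DataFrame({
    'boosts': [0, 1, 2, 4, 5],
    'walkDistance': [50.0, 400.0, 900.0, 2100.0, 3000.0],
    'killPlace': [95, 70, 40, 12, 3],
    'winPlacePerc': [0.05, 0.30, 0.55, 0.85, 1.00],
})

# Pearson correlation of every feature with the target, strongest first.
target_corr = (
    toy.corr()['winPlacePerc']
    .drop('winPlacePerc')          # the target correlates perfectly with itself
    .sort_values(ascending=False)
)
print(target_corr)
```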


**We can see that boosts, walkDistance, and weaponsAcquired have a strong positive correlation with winPlacePerc.
So let's see how players in different placement groups play**

In [None]:
print('The top 10% of players average {:.2f} kills, while the mid players average {:.2f} and the bottom players average {:.2f}'.format(top10_players['kills'].mean(),mid_players['kills'].mean(),bottom_players['kills'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='kills',data=df, height=10, ratio=3)
plt.show()

**Players with more kills tend to place higher, so you better train your aim!**

In [None]:
print('The top 10% of players average {:.2f} headshot kills, while the mid players average {:.2f} and the bottom players average {:.2f}'.format(top10_players['headshotKills'].mean(),mid_players['headshotKills'].mean(),bottom_players['headshotKills'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='headshotKills',data=df, height=10, ratio=3)
plt.show()

**The top 10% players are better shooters, and we can see that having better aim makes a difference**

In [None]:
print('The top 10% of players have an average longest kill of {:.2f}, while the mid players average {:.2f} and the bottom players average {:.2f}'.format(top10_players['longestKill'].mean(),mid_players['longestKill'].mean(),bottom_players['longestKill'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='longestKill',data=df, height=10, ratio=3)
plt.show()


**Kill distance has a correlation of 0.4 with winPlacePerc, and the top 10% players prefer to keep their distance from the enemy**

In [None]:
print('The top 10% of players deal {:.2f} damage on average, while the mid players deal {:.2f} and the bottom players deal {:.2f}'.format(top10_players['damageDealt'].mean(),mid_players['damageDealt'].mean(),bottom_players['damageDealt'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='damageDealt',data=df, height=10, ratio=3)
plt.show()

**Dealing damage is a great start toward a better winPlacePerc; the top 10% players deal almost 2x the damage of mid players**

In [None]:
print('The top 10% of players acquire {:.2f} weapons on average, while the mid players acquire {:.2f} and the bottom players acquire {:.2f}'.format(top10_players['weaponsAcquired'].mean(),mid_players['weaponsAcquired'].mean(),bottom_players['weaponsAcquired'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='weaponsAcquired',data=df, height=10, ratio=3)
plt.show()

**Finding a weapon gives you a much better chance in a fight!**

In [None]:
print('The top 10% of players walk {:.2f} on average, while the mid players walk {:.2f} and the bottom players walk {:.2f}'.format(top10_players['walkDistance'].mean(),mid_players['walkDistance'].mean(),bottom_players['walkDistance'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='walkDistance',data=df, height=10, ratio=3)
plt.show()

**The more you walk, the better your chances of winning, so you better start running!**

In [None]:
print('The top 10% of players use {:.2f} heals on average, while the mid players use {:.2f} heals and the bottom players use {:.2f} heals'.format(top10_players['heals'].mean(),mid_players['heals'].mean(),bottom_players['heals'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='heals',data=df, height=10, ratio=3)
plt.show()

**The top 10% players use a lot of heals, and heals have a strong relationship with winning**

In [None]:
print('The top 10% of players use {:.2f} boosts on average, while the mid players use {:.2f} boosts and the bottom players use {:.2f} boosts'.format(top10_players['boosts'].mean(),mid_players['boosts'].mean(),bottom_players['boosts'].mean()))


In [None]:
sns.jointplot(x='winPlacePerc',y='boosts',data=df, height=10, ratio=3)
plt.show()

**Using boosts increases your chances of winning**
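
The per-group averages printed throughout this notebook can also be collected into one table. The sketch below is one possible way to do that, assuming the same 0.5 and 0.9 thresholds; it runs on a small made-up frame (`toy`, the `tier` column, and all values are hypothetical). It reports the median next to the mean, since count columns such as kills are often skewed by a few large values.

```python
import pandas as pd

# Hypothetical stand-in for the competition data (values are made up).
toy = pd.DataFrame({
    'winPlacePerc': [0.1, 0.3, 0.6, 0.8, 0.95, 1.0],
    'kills': [0, 1, 1, 2, 4, 6],
    'boosts': [0, 0, 1, 2, 4, 5],
})

# Label each row with its placement group: [0, 0.5), [0.5, 0.9), [0.9, inf).
toy['tier'] = pd.cut(
    toy['winPlacePerc'],
    bins=[0.0, 0.5, 0.9, float('inf')],
    labels=['bottom', 'mid', 'top10'],
    right=False,
)

# One row per group, with mean and median for each feature of interest.
summary = toy.groupby('tier', observed=True)[['kills', 'boosts']].agg(['mean', 'median'])
print(summary)
```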