**About this Competition**

In a PUBG game, up to 100 players start in each match (matchId). Players can be on teams (groupId) which get ranked at the end of the game (winPlacePerc) based on how many other teams are still alive when they are eliminated. In game, players can pick up different munitions, revive downed-but-not-out (knocked) teammates, drive vehicles, swim, run, shoot, and experience all of the consequences -- such as falling too far or running themselves over and eliminating themselves.

You are provided with a large number of anonymized PUBG game stats, formatted so that each row contains one player's post-game stats. The data comes from matches of all types: solos, duos, squads, and custom; there is no guarantee of there being 100 players per match, nor at most 4 players per group.

You must create a model which predicts players' finishing placement based on their final stats, on a scale from 1 (first place) to 0 (last place).

**Data**

* DBNOs - Number of enemy players knocked.
* assists - Number of enemy players this player damaged that were killed by teammates.
* boosts - Number of boost items used.
* damageDealt - Total damage dealt. Note: Self inflicted damage is subtracted.
* headshotKills - Number of enemy players killed with headshots.
* heals - Number of healing items used.
* killPlace - Ranking in match of number of enemy players killed.
* killPoints - Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.)
* killStreaks - Max number of enemy players killed in a short amount of time.
* kills - Number of enemy players killed.
* longestKill - Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
* matchId - Integer ID to identify match. There are no matches that are in both the training and testing set.
* revives - Number of times this player revived teammates.
* rideDistance - Total distance traveled in vehicles measured in meters.
* roadKills - Number of kills while in a vehicle.
* swimDistance - Total distance traveled by swimming measured in meters.
* teamKills - Number of times this player killed a teammate.
* vehicleDestroys - Number of vehicles destroyed.
* walkDistance - Total distance traveled on foot measured in meters.
* weaponsAcquired - Number of weapons picked up.
* winPoints - Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.)
* groupId - Integer ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
* numGroups - Number of groups we have data for in the match.
* maxPlace - Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
* winPlacePerc - The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
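
As a rough illustration of how a percentile placement can be derived from `maxPlace`, here is a minimal sketch. The formula `(maxPlace - winPlace) / (maxPlace - 1)` and the `winPlace` argument (a 1-based finishing position) are assumptions made for this example; the data description only says that the target is calculated off of `maxPlace`.

```python
def win_place_perc(win_place, max_place):
    """Hypothetical helper: map a 1-based placement to a value in [0, 1].

    Assumes winPlacePerc = (maxPlace - winPlace) / (maxPlace - 1), which is
    not stated in the data description.
    """
    if max_place <= 1:
        # Only one placement recorded, so a percentile is undefined; use 0.
        return 0.0
    return (max_place - win_place) / (max_place - 1)


print(win_place_perc(1, 50))   # first place
print(win_place_perc(50, 50))  # last place
print(win_place_perc(25, 50))  # somewhere in the middle
```

Under this assumption, a match with `maxPlace` of 50 but fewer than 50 groups would still use steps of 1/49, which is one way the "missing chunks" mentioned above could appear.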

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('../input/train.csv') #import data
df.head(5)

In [None]:
df.shape #check shape

In [None]:
df.describe()

In [None]:
df.dtypes #check type

In [None]:
df.isnull().sum(axis=0) # Check missing values

In [None]:
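# count() tallies rows, so this is the number of players recorded per match
players_per_match = df.groupby('matchId')['kills'].count()
players_per_match = players_per_match.sort_values(ascending=False)
players_per_match.head(5)

In [None]:
players_per_match.tail(5)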

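Up to 100 players start in each match, and the largest count above is 100. Because `count()` tallies rows rather than summing the `kills` column, these figures are the number of players recorded per match, not the number of kills. At the other end, a few matches have only 1 player recorded.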
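
To make the difference between counting rows and summing kills concrete, here is a small sketch on made-up data. The `toy` frame and its values are hypothetical stand-ins for the training set; only the `matchId` and `kills` column names come from the data description.

```python
import pandas as pd

# Toy stand-in for the training data; the values are invented for illustration.
toy = pd.DataFrame({
    "matchId": [1, 1, 1, 2, 2],
    "kills":   [2, 0, 0, 1, 0],
})

per_match = toy.groupby("matchId")["kills"].agg(
    players="count",     # number of player rows in the match
    total_kills="sum",   # kills summed over those players
)
print(per_match)
```

On the real data, the same aggregation would give both the number of players and the total kills per match side by side.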

In [None]:
df_corr = df.iloc[:, 3:]  # drop the first three columns (the IDs)
corr=df_corr.corr()

sns.set(font_scale=1.15)
plt.figure(figsize=(24, 18))

sns.heatmap(corr, vmax=.8, linewidths=0.01,
            square=True,annot=True,cmap='YlGnBu',linecolor="black")
plt.title('Correlation between features');
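
A full heatmap can be hard to scan when there are many features, so one possible companion view is to rank features by their correlation with the target alone. The sketch below uses a small synthetic frame (`toy`, with invented values) rather than the real data; on the notebook's data, the same steps could be applied to `corr['winPlacePerc']`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
walk = rng.uniform(0, 3000, n)

# Synthetic stand-in: winPlacePerc is tied linearly to walkDistance by construction.
toy = pd.DataFrame({
    "walkDistance": walk,
    "kills": rng.integers(0, 5, n),
    "winPlacePerc": walk / 3000,
})

# Correlation of every feature with the target, excluding the target itself.
c = toy.corr()["winPlacePerc"].drop("winPlacePerc")

# Order by absolute strength while keeping the sign of each correlation.
target_corr = c.reindex(c.abs().sort_values(ascending=False).index)
print(target_corr)
```

Sorting by absolute value keeps strongly negative features near the top, which a plain descending sort would push to the bottom.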

In [None]:
fig, ax = plt.subplots(3,3, figsize=(20, 14))
sns.distplot(df.rideDistance, bins = 20, ax=ax[0,0])  
sns.distplot(df.damageDealt, bins = 20, ax=ax[0,1]) 
sns.distplot(df.killPlace, bins = 20, ax=ax[0,2]) 
sns.distplot(df.longestKill, bins = 20, ax=ax[1,0]) 
sns.distplot(df.maxPlace, bins = 20, ax=ax[1,1]) 
sns.distplot(df.killPoints, bins = 20, ax=ax[1,2])  # rideDistance is already shown at ax[0,0]
sns.distplot(df.swimDistance, bins = 20, ax=ax[2,0]) 
sns.distplot(df.walkDistance, bins = 20, ax=ax[2,1]) 
sns.distplot(df.winPoints, bins = 20, ax=ax[2,2]) 
plt.show()

In [None]:
sns.lmplot(x='kills', y='damageDealt', data=df)

In [None]:
sns.lmplot(x='winPlacePerc', y='walkDistance', data=df)

In [None]:
g = sns.catplot(x='kills', y='DBNOs', data=df,
                kind='point',  # point plot, as factorplot drew by default
                hue='boosts',
                height=18,
                aspect=0.7,
                palette='Blues',
                join=False,
                )

In [None]:
sns.jointplot(x="damageDealt", y="kills", data=df, height=10, ratio=3, color="r")
plt.show()

**...TO BE CONTINUED**