# PlayerUnknown’s Battlegrounds

PlayerUnknown’s Battlegrounds (more commonly referred to as PUBG) is a multiplayer online battle royale game developed and published by the South Korean company Bluehole. Like several other popular esports titles, PUBG grew out of a mod: “DayZ: Battle Royale” for ARMA 2, created by Brendan “PlayerUnknown” Greene.

<img src='https://www.esportsguide.com/wp-content/uploads/2019/06/playerunknown-s-battlegrounds-characters-pubg-artwork-1024x576.jpeg'/>

The Battle Royale game mode is heavily inspired by the Japanese film “Battle Royale”, in which a group of students is taken to an island and forced by their government to fight to the death. The game was first released for Microsoft Windows through Steam as an early access title in March 2017, and the full release followed on December 20, 2017. Since then it has also been released on other platforms, including PS4, Xbox One, and mobile devices.

PlayerUnknown’s Battlegrounds flies up to 100 players over an island, and each player decides when to parachute down. Once on the ground, players must immediately scavenge buildings and smaller compounds for the weapons and equipment they need to survive and eliminate their enemies. Early in the game the map is very large, but the playable area shrinks over time.

Players outside the playable area slowly take damage until they either die or make it back inside. The game can be played in Solo, Duo, or Squad mode (up to 4 players). The goal is to be the last one standing.

It’s a very intense and exciting game that keeps you on your toes from start to finish, and it forces players to be more cautious and strategic than a typical run-and-gun shooter. The game was praised on release even though it shipped with a list of irritating bugs and server issues, and it gained traction very quickly thanks to big Twitch streamers who popularized it.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
import warnings
warnings.filterwarnings('ignore')
# Any results you write to the current directory are saved as output.

In [None]:
df=pd.read_csv('/kaggle/input/pubg-finish-placement-prediction/train_V2.csv')

In [None]:
pd.set_option('display.max_columns', None)

## Let's take a look at the data

In [None]:
df.head()

### Column description:
* DBNOs - Number of enemy players knocked.
* assists - Number of enemy players this player damaged that were killed by teammates.
* boosts - Number of boost items used.
* damageDealt - Total damage dealt. Note: Self inflicted damage is subtracted.
* headshotKills - Number of enemy players killed with headshots.
* heals - Number of healing items used.
* Id - Player’s Id
* killPlace - Ranking in match of number of enemy players killed.
* killPoints - Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
* killStreaks - Max number of enemy players killed in a short amount of time.
* kills - Number of enemy players killed.
* longestKill - Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
* matchDuration - Duration of match in seconds.
* matchId - ID to identify match. There are no matches that are in both the training and testing set.
* matchType - String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
* rankPoints - Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
* revives - Number of times this player revived teammates.
* rideDistance - Total distance traveled in vehicles measured in meters.
* roadKills - Number of kills while in a vehicle.
* swimDistance - Total distance traveled by swimming measured in meters.
* teamKills - Number of times this player killed a teammate.
* vehicleDestroys - Number of vehicles destroyed.
* walkDistance - Total distance traveled on foot measured in meters.
* weaponsAcquired - Number of weapons picked up.
* winPoints - Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
* groupId - ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
* numGroups - Number of groups we have data for in the match.
* maxPlace - Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
* winPlacePerc - The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
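
The sentinel rules described above for `rankPoints`, `killPoints`, and `winPoints` can be made explicit before any analysis. The following is a minimal sketch on an invented three-row frame. The helper name `mask_sentinels` is hypothetical and is not part of the original notebook.

```python
import numpy as np
import pandas as pd


def mask_sentinels(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which the documented 'None' sentinels become NaN."""
    out = frame.copy()
    # Rows where rankPoints carries a real value (anything other than -1).
    has_rank = out["rankPoints"] != -1
    for col in ("killPoints", "winPoints"):
        out[col] = out[col].astype("float64")
        # A 0 here means "None" only when rankPoints is populated.
        out.loc[has_rank & (out[col] == 0), col] = np.nan
    # In rankPoints itself, -1 stands in for "None".
    out["rankPoints"] = out["rankPoints"].astype("float64").replace(-1, np.nan)
    return out


# Invented rows, for illustration only.
toy = pd.DataFrame({
    "killPoints": [0, 1200, 0],
    "winPoints": [0, 1500, 0],
    "rankPoints": [1480, -1, -1],
})
masked = mask_sentinels(toy)
print(masked)
```

Converting the sentinels to `NaN` keeps them out of means and correlations, which would otherwise treat 0 and -1 as real scores.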

# Primary Questions:

### - What should you do to win a match in PUBG?
### - What should you not do to win a match in PUBG?

In [None]:
backup=df.copy()

## Let's see the info of the data

In [None]:
df.info()

## Let's see the distribution of the data

In [None]:
df.describe(include='all')

From the table above, we can at least say that many columns are stored as int64 even though their maximum values fit comfortably in a smaller integer type. Changing these datatypes significantly reduces the memory footprint of the dataset.


# Quality:

- The data is too large in memory. Many columns are stored as int64 but fit in int16 or int32.
- The `rankPoints` column represents null values as -1.
- `headshotKills`, `heals`, `revives`, and `kills` have implausibly high maximum values.


# Cleaning Data
## Define: There are lots of columns which have int64 but can be reduced to int16 or int32
### Code

In [None]:
# Downcast each column to a smaller dtype that still fits its value range.
dtype_map = {
    'assists': 'int16', 'boosts': 'int16', 'DBNOs': 'int16',
    'headshotKills': 'int16', 'heals': 'int16', 'killPlace': 'int16',
    'killPoints': 'int32', 'kills': 'int16', 'killStreaks': 'int16',
    'matchDuration': 'int32', 'maxPlace': 'int16', 'numGroups': 'int16',
    'rankPoints': 'int32', 'roadKills': 'int16', 'teamKills': 'int16',
    'vehicleDestroys': 'int16', 'weaponsAcquired': 'int16', 'winPoints': 'int32',
    'winPlacePerc': 'float32', 'damageDealt': 'float32', 'longestKill': 'float32',
    'rideDistance': 'float32', 'swimDistance': 'float32', 'walkDistance': 'float32',
}
df = df.astype(dtype_map)

### Test

In [None]:
df.info()

There we go: from 1000 MB to 460 MB
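
The saving can also be measured directly, and the target dtypes can be chosen automatically rather than by hand. The sketch below is an alternative to the manual casts, not the method used in this notebook. It runs on a small invented frame, and `downcast_numeric` is a hypothetical helper built on `pd.to_numeric(..., downcast=...)`.

```python
import numpy as np
import pandas as pd


def downcast_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each numeric column shrunk to the smallest safe dtype."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Invented data: small counts, mid-sized points, and distances.
toy = pd.DataFrame({
    "kills": np.arange(1000, dtype="int64") % 30,
    "killPoints": np.arange(1000, dtype="int64") * 2,
    "walkDistance": np.arange(1000) * 0.5,
})

before = toy.memory_usage(deep=True).sum()
small = downcast_numeric(toy)
after = small.memory_usage(deep=True).sum()

print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Automatic downcasting can pick int8 where the manual mapping uses int16, so the resulting dtypes may differ from the ones chosen above.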

## Define: `rankPoints` column has null values represented as -1
### Code

In [None]:
df['rankPoints'].value_counts()/df.shape[0]*100

Dropping the rank column as this ranking is inconsistent and is being deprecated in the API’s next version. 

In [None]:
df.drop(columns=['rankPoints'], inplace=True)

### Test

In [None]:
df.head(1)

## Define: `HeadshotKills`, `Heals`,`Revives` and `Kills` have unacceptable maximum values
### Code

In [None]:
df[(df['headshotKills']>40) | (df['heals']>50) | (df['revives']>30) | (df['kills']>60)]

### Some very interesting discrepancies:
- Rows 334400 and 3431247 have a very high number of `Kills`. It's as if these people wiped out three quarters of the map by themselves.
- Rows 3431247, 2020831, 1454065, and 3673965 have a very high number of `Headshots`. Either these players are very, very good or they are hackers.
- Rows 4193891 and 3088672 have 39 and 32 `Revives` respectively, with 0 and 3 `Kills`. Who are these players who play PUBG only to revive others and not kill anyone?

Every one of these rows has its own discrepancy. It is better to remove these players, as such outliers might distort our calculations. Note that the filter below uses stricter thresholds than the inspection above and leaves `heals` unfiltered.

In [None]:
df=df[(df['headshotKills']<25) & (df['revives']<20) & (df['kills']<30)]

### Test

In [None]:
df.shape
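
To see how many rows a threshold filter of this kind removes, the cut-offs can be kept in one place and applied in a loop. This is a minimal sketch on invented rows. It reuses the thresholds from the filter above, but the names `LIMITS` and `drop_outliers` are hypothetical.

```python
import pandas as pd

# Exclusive upper bounds, taken from the filter used in this notebook.
LIMITS = {"headshotKills": 25, "revives": 20, "kills": 30}


def drop_outliers(frame: pd.DataFrame, limits: dict):
    """Keep rows strictly below every limit; also return the number removed."""
    keep = pd.Series(True, index=frame.index)
    for col, limit in limits.items():
        keep &= frame[col] < limit
    return frame[keep], int((~keep).sum())


# Invented rows: two plausible players and two suspicious ones.
toy = pd.DataFrame({
    "headshotKills": [1, 40, 0, 2],
    "revives": [0, 0, 39, 1],
    "kills": [3, 65, 0, 29],
})
clean, removed = drop_outliers(toy, LIMITS)
print(f"kept {len(clean)} rows, removed {removed}")
```

Reporting the removed count alongside `df.shape` makes it easy to confirm that only a handful of rows were dropped.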

## Cleaned Data

In [None]:
df.head()

# Exploratory Data Analysis

## Let's start with the correlation between all the columns

In [None]:
plt.subplots(figsize=(20,7))
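# Restrict to numeric columns; Id, groupId, matchId and matchType are strings.
sns.heatmap(df.select_dtypes(include='number').corr())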
plt.title('Correlation between the columns')
plt.show()

### Inferences from the above correlation heatmap:
- `Boost` has a moderate amount of correlation with `Walk Distance` and `Win Place`.
- `Damage Dealt` is highly correlated with `Kills`.
- `Kill Place` is an interesting column which has high negative correlation with `Win Place`, `Kill Streaks`, `Damage Dealt`.
- `Kills` is also correlated with `Kill Streaks`.
- `Walk Distance` is highly correlated with `Win Place`.
- There is a significant amount of correlation between `Weapons Acquired` and `Win Place`.
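
A heatmap is hard to read precisely, so it can help to list the correlations with the target in order of strength. The sketch below uses synthetic data generated only for illustration; the column relationships are built in by construction and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: walkDistance drives the target, killPlace is its mirror
# image, and noise is unrelated to either.
rng = np.random.default_rng(0)
n = 500
walk = rng.uniform(0, 5000, n)
toy = pd.DataFrame({
    "walkDistance": walk,
    "killPlace": 100 - walk / 50 + rng.normal(0, 5, n),
    "noise": rng.normal(0, 1, n),
    "matchType": ["squad-fpp"] * n,
    "winPlacePerc": np.clip(walk / 5000 + rng.normal(0, 0.05, n), 0, 1),
})

# Correlate numeric columns only, then rank by absolute correlation.
corr = toy.select_dtypes(include="number").corr()["winPlacePerc"].drop("winPlacePerc")
order = corr.abs().sort_values(ascending=False).index
target_corr = corr.loc[order]
print(target_corr)
```

Sorting by absolute value keeps strong negative correlations, such as `killPlace`, next to the strong positive ones.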

### Comparing `Boosts` with `Win Place`

In [None]:
plt.subplots(figsize=(30,12))
sns.violinplot(x='boosts',y='winPlacePerc',data=df)
plt.title('Boosts vs Win Place')
plt.show()

It is unclear why 24 boosts has such a peculiar distribution. The interesting takeaway from the graph is that players with a Win Place above 0.8 typically use 6 to 13 boosts in a match.

### Comparing `Walk Distance` with `Win Place`

In [None]:
sns.jointplot(x='walkDistance',y='winPlacePerc',data=df,height=15)
plt.show()

Players who walk more than about 6000 m usually finish with a Win Place above 0.6. That does not mean players who walk less cannot win.
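
A scatter plot of millions of points hides how typical this pattern is, so one way to check it is to bin the walk distance and summarise Win Place per bin. This is a minimal sketch on invented values. The bin edges are arbitrary assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Invented rows, for illustration only.
toy = pd.DataFrame({
    "walkDistance": [100, 900, 1500, 2500, 3500, 6500, 7000],
    "winPlacePerc": [0.05, 0.20, 0.40, 0.55, 0.70, 0.90, 0.95],
})

# Left-closed distance bins in meters; the last bin is open-ended.
bins = [0, 1000, 2000, 4000, 6000, np.inf]
toy["walkBin"] = pd.cut(toy["walkDistance"], bins=bins, right=False)

# Mean Win Place and row count per bin; empty bins are skipped.
summary = toy.groupby("walkBin", observed=True)["winPlacePerc"].agg(["mean", "count"])
print(summary)
```

The `count` column matters as much as the mean: a high average in a bin with very few players is weak evidence.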

### Relationship between `Damage Dealt` and `Kills`

In [None]:
sns.jointplot(x='kills',y='damageDealt',data=df,height=10)
plt.show()

It's fairly self-explanatory that the more kills a player gets, the more damage they have dealt, since `damageDealt` is the damage the player inflicts on enemies. My interest is in players who rack up many kills while dealing very little damage.
There are very few of them, though: most players get 0-5 kills with correspondingly low damage, as the graph below shows.


In [None]:
sns.jointplot(x='kills',y='damageDealt',data=df,height=5,kind='hex')
plt.show()

# Target-centric Analysis

### Let's see how `Assists` goes with `Win Place`

In [None]:
plt.subplots(figsize=(20,10))
sns.boxplot(x='assists',y='winPlacePerc',data=df)
plt.title('Assist vs Win Place')
plt.show()

The graph above shows that 2-10 assists are associated with a high Win Place. However, I don't really trust assist values 3-6, as they contain so many outliers.

Assist values 7-10 give a promising result.

It is also interesting that more than 10 assists gives worse results, but we may simply not have enough data at those values to draw any conclusion.
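
Whether there is enough data at the high assist values can be checked by counting rows per value alongside the median. The following is a minimal sketch on invented rows. The cut-off `MIN_ROWS` is an arbitrary assumption chosen for the toy data.

```python
import pandas as pd

# Invented rows: many players with 0 or 1 assists, a single one with 12.
toy = pd.DataFrame({
    "assists": [0] * 6 + [1] * 3 + [12],
    "winPlacePerc": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.5, 0.6, 0.7, 0.2],
})

# Median Win Place and number of rows for each assist value.
per_value = toy.groupby("assists")["winPlacePerc"].agg(["median", "count"])

# Keep only assist values backed by at least MIN_ROWS players.
MIN_ROWS = 3
reliable = per_value[per_value["count"] >= MIN_ROWS]

print(per_value)
print(reliable)
```

Values that fall out of `reliable` are the ones where a box plot is likely to mislead.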

### Let's see how `Boosts` goes with `Win Place`

In [None]:
plt.subplots(figsize=(20,10))
sns.boxplot(x='boosts',y='winPlacePerc',data=df)
plt.title('Boosts vs Win Place')
plt.show()

It is logical that the more boosts you use, the longer you survive, and the more likely you are to win the match.

### Damage Dealt vs Win Place

In [None]:
sns.jointplot(x='damageDealt',y='winPlacePerc',data=df,height=10,kind='hex')
plt.show()

Surprisingly Damage Dealt has no significant relationship with Win Place.

### DBNOs vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.barplot(x='DBNOs',y='winPlacePerc',data=df)
plt.title('DBNOs vs Win Place')
plt.show()

### Headshots vs Win Place


In [None]:
plt.subplots(figsize=(20,10))
sns.barplot(x='headshotKills',y='winPlacePerc',data=df)
plt.title('Headshots vs Win Place')
plt.show()

### Heals vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.boxplot(x='heals',y='winPlacePerc',data=df)
plt.title('Heals vs Win Place')
plt.show()

### Kill Points vs Win Place

In [None]:
sns.jointplot(x='killPoints',y='winPlacePerc',data=df,height=10)
plt.show()

The graph might look interesting at first glance, but it gives us no useful information other than that players score at least 500-1000 Kill Points in every match, even if they die very quickly.

### Kills vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.barplot(x='kills',y='winPlacePerc',data=df)
plt.title('Kills vs Win Place')
plt.show()

Players with more than 6 kills average a Win Place of 0.8 or more.

### Kill Streaks vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.barplot(x='killStreaks',y='winPlacePerc',data=df)
plt.title('Kill Streaks vs Win Place')
plt.show()

### Longest Kill vs Win Place

In [None]:
sns.jointplot(x='longestKill',y='winPlacePerc',data=df,height=10)
plt.show()

Not a very significant relationship

### Match Type vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.violinplot(x='matchType',y='winPlacePerc',data=df)
plt.title('Match Type vs Win Place')
plt.show()

It is hard to draw a firm conclusion from this graph, as every match type spans the same Win Place range. Judging by the upper quartiles and the KDEs, normal-solo, normal-squad-fpp, and normal-solo-fpp appear to have a slight upper hand on Win Place.

### Revives vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
ax=sns.barplot(x='revives',y='winPlacePerc',data=df)
ax.set_xticklabels(ax.get_xticklabels(),rotation=90,ha='right')
plt.title('Revives vs Win Place')
plt.show()

### Ride Distance vs Win place

In [None]:
sns.jointplot(x='rideDistance',y='winPlacePerc',data=df,height=15)
plt.show()

The only conclusion that can be drawn is that players who ride more have a higher chance of winning the match.

### Road Kills vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.barplot(x='roadKills',y='winPlacePerc',data=df)
plt.title('Road Kills vs Win Place')
plt.show()

### Swim Distance vs Win Place

In [None]:
sns.jointplot(x='swimDistance',y='winPlacePerc',data=df,height=10)
plt.show()

### Team Kills vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.boxplot(x='teamKills',y='winPlacePerc',data=df)
plt.title('Team Kills vs Win Place')
plt.show()

### Vehicle Destroys vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.boxplot(x='vehicleDestroys',y='winPlacePerc',data=df)
plt.title('Vehicle Destroys vs Win Place')
plt.show()

Vehicle Destroys has a significant relation with Win Place.

### Walk Distance vs Win Place

In [None]:
sns.jointplot(x='walkDistance',y='winPlacePerc',data=df,kind='hex')
plt.show()

This graph is really promising: the greater the Walk Distance, the higher the chance of winning.

### Weapons Acquired vs Win Place

In [None]:
plt.subplots(figsize=(20,10))
sns.barplot(x='weaponsAcquired',y='winPlacePerc',data=df)
plt.title('Weapons Acquired vs Win Place')
plt.show()

### Win Points vs Win Place

In [None]:
sns.jointplot(x='winPoints',y='winPlacePerc',data=df,height=10,kind='hex')
plt.show()

No significant relation

In [None]:
# Take a copy so the column added next is not written to a view of df.
good=df[df['winPlacePerc']>0.6].copy()

In [None]:
good['player_type']=np.where(good['winPlacePerc']>0.9,"Winners","Medium")

## Comparison of Winners and Medium players in terms of Walk Distance and Kills

In [None]:
sns.relplot(x='walkDistance',y='kills',hue='player_type',data=good,height=7,aspect=3)
plt.title('Walk Distance vs Kills')
plt.show()

#### Well, finally we have something very interesting. The Winners find a sweet spot between killing and walking. The Medium players either walk less and chase more kills, or walk a great deal and avoid fights (they try to survive). The Winners balance walking and killing.

## Comparison between Winners and Medium Players in terms of Headshots and Kills

In [None]:
sns.relplot(x='headshotKills',y='kills',hue='player_type',data=good,height=7,aspect=3)
plt.title('Headshots vs Kills')
plt.show()

#### Interestingly, Medium players try to get as many kills as they can, while Winners favor headshots over simply killing more people.

## Comparison between Winners and Medium Players in terms of Walk and Ride Distance

In [None]:
sns.relplot(x='rideDistance',y='walkDistance',hue='player_type',data=good,height=7,aspect=3)
plt.title('Ride Distance vs Walk Distance')
plt.show()

#### The Medium players are all over the place: they try both walking and riding, doing either extensively. Winners pick a sweet spot where they neither walk nor ride very much and just focus on eliminating enemies.

## Which MatchType is better?

In [None]:
plt.subplots(figsize=(10,10))
ax=sns.countplot(x='matchType',hue='player_type',data=good)
ax.set_xticklabels(ax.get_xticklabels(),rotation=40,ha='right')
plt.title('Frequency of Matches Played')
plt.show()

#### The graph above shows that squad-fpp and duo-fpp are the most played Match Types, among both Medium players and Winners. Note that a count plot measures how often a mode is played, not how favorable it is.
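
Raw counts mostly reflect how popular each mode is, so a fairer comparison is the share of Winners within each match type. This is a minimal sketch on invented rows that uses `pd.crosstab` with `normalize="index"`. The labels mirror the `player_type` column defined earlier, but the numbers are made up.

```python
import pandas as pd

# Invented rows: squad-fpp is played far more often than solo.
toy = pd.DataFrame({
    "matchType": ["squad-fpp"] * 6 + ["solo"] * 2,
    "player_type": ["Winners", "Medium", "Medium", "Medium", "Medium", "Medium",
                    "Winners", "Medium"],
})

# Each row sums to 1: the fraction of Medium and Winners per match type.
share = pd.crosstab(toy["matchType"], toy["player_type"], normalize="index")
print(share)
```

In this toy data squad-fpp has the most rows, yet solo has the larger share of Winners, which is exactly the distinction a count plot cannot show.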

## Comparison between Winners and Medium Players in terms of Kills and Weapons Acquired

In [None]:
sns.relplot(x='kills',y='weaponsAcquired',hue='player_type',data=good,height=7,aspect=3)
plt.title('Weapons Acquired vs Kills')
plt.show()

#### Medium players pick up as many weapons as they can while killing fewer enemies, whereas Winners take only the weapons they need and focus on eliminating enemies.

## The Ultimate Comparison

In [None]:
sns.pairplot(good[['kills','walkDistance','rideDistance','matchType','weaponsAcquired','player_type']],hue='player_type',height=5,aspect=2,plot_kws={"s": 5})
plt.show()

### We have seen most of these observations before, but there are some additions:
- Winners tend to ride less and kill more.
- Winners tend to pick up the sweet spot between walking and riding.
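
Observations read off a pair plot can be cross-checked with a plain per-group average. The sketch below uses invented rows and the same 0.9 cut-off used earlier to label Winners. It illustrates the check only and is not a result from the real data.

```python
import numpy as np
import pandas as pd

# Invented rows, all above the 0.6 Win Place cut-off.
toy = pd.DataFrame({
    "winPlacePerc": [0.65, 0.70, 0.85, 0.92, 0.95, 1.00],
    "kills": [1, 0, 2, 4, 3, 5],
    "rideDistance": [3000.0, 0.0, 1500.0, 500.0, 0.0, 1000.0],
})
toy["player_type"] = np.where(toy["winPlacePerc"] > 0.9, "Winners", "Medium")

# Average kills and ride distance for each player type.
profile = toy.groupby("player_type")[["kills", "rideDistance"]].mean()
print(profile)
```

On the real `good` frame, medians may be a safer choice than means because the distance columns are heavily skewed.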

# PUBG helpful tips to find yourself in the top:
### - Do not be just a walker or just a rider. Find a sweet spot between walking and riding (do either when required).
### - Do not pick up every weapon you find. Take only what you need and focus on killing enemies.
### - Go for Headshots, because they not only kill your enemies faster but also give you more points.
### - Try to play in a squad or a duo, as it increases your chances of winning: you will have people to back you up.
### - Use as many boosts as you can, because they increase your chance of survival.
## Now go and tear the ground up. Best of Luck!!!

# Thank You very much for staying along this far.