## PlayerUnknown’s Battlegrounds

PlayerUnknown’s Battlegrounds (more commonly referred to as PUBG) is a multiplayer online battle royale game developed and published by the South Korean company Bluehole. Like several other popular esports titles, PUBG grew out of a mod: “DayZ: Battle Royale” for ARMA 2, created by Brendan “PlayerUnknown” Greene.



The battle royale game mode is heavily inspired by the Japanese film “Battle Royale”, in which a group of students is taken to an island and forced by their government to fight to the death. The game first launched on Microsoft Windows via Steam as an early access title in March 2017, and the full version was released on December 20, 2017. Since then it has also been released on other platforms, including PS4, Xbox One and mobile devices.

PlayerUnknown’s Battlegrounds flies up to 100 players over an island, and each player decides when to parachute down. Once on the ground, players must immediately scavenge buildings and smaller compounds for the weapons and equipment they need to survive and eliminate their enemies. Early in the game the map is very large, but the playable area shrinks over time.

Being outside the playable area slowly inflicts damage on players until they either die or make it back inside. The game can be played in Solo, Duo or Squad mode (teams of up to 4 players). The goal is to be the last player or team standing.

It’s a very intense and exciting game that keeps you on your toes from start to finish. It also forces players to be more cautious and strategic than in a typical run-and-gun shooter. The game was praised on release even though it launched with a list of irritating bugs and server issues, and it gained traction quickly thanks to big Twitch streamers who popularized it.

In [None]:
## Importing some libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')

### Importing the dataset and creating a backup copy.

In [None]:
df=pd.read_csv('../input/pubg-finish-placement-prediction/train_V2.csv')
reserved=df.copy()

### Shape of the dataset.

In [None]:
df.shape

### Head of the dataset.

In [None]:
df.head()

### Info of the data.

In [None]:
df.info()

### All the columns of the dataset.

In [None]:
df.columns

### Column description:
- DBNOs - Number of enemy players knocked.
- assists - Number of enemy players this player damaged that were killed by teammates.
- boosts - Number of boost items used.
- damageDealt - Total damage dealt. Note: Self inflicted damage is subtracted.
- headshotKills - Number of enemy players killed with headshots.
- heals - Number of healing items used.
- Id - Player’s Id
- killPlace - Ranking in match of number of enemy players killed.
- killPoints - Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
- killStreaks - Max number of enemy players killed in a short amount of time.
- kills - Number of enemy players killed.
- longestKill - Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
- matchDuration - Duration of match in seconds.
- matchId - ID to identify match. There are no matches that are in both the training and testing set.
- matchType - String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
- rankPoints - Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
- revives - Number of times this player revived teammates.
- rideDistance - Total distance traveled in vehicles measured in meters.
- roadKills - Number of kills while in a vehicle.
- swimDistance - Total distance traveled by swimming measured in meters.
- teamKills - Number of times this player killed a teammate.
- vehicleDestroys - Number of vehicles destroyed.
- walkDistance - Total distance traveled on foot measured in meters.
- weaponsAcquired - Number of weapons picked up.
- winPoints - Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
- groupId - ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
- numGroups - Number of groups we have data for in the match.
- maxPlace - Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
- winPlacePerc - The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
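
The placeholder rules described for rankPoints, killPoints and winPoints can be applied in code before any analysis. The snippet below is a minimal sketch on a small made-up table (the `toy` and `cleaned` names and all values are illustrative, not taken from the dataset); it converts the placeholders to NaN so they are not mistaken for real scores.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values) that mimic the three ranking columns.
toy = pd.DataFrame({
    'rankPoints': [-1, 1500, 1480, -1],
    'killPoints': [1200, 0, 0, 0],
    'winPoints':  [1400, 0, 0, 1510],
})

# Work on a float copy so the columns can hold NaN.
cleaned = toy.astype('float64')

# A value of -1 in rankPoints stands for "None".
cleaned['rankPoints'] = cleaned['rankPoints'].replace(-1, np.nan)

# A 0 in killPoints/winPoints means "None" only when rankPoints holds a real value.
has_rank = toy['rankPoints'] != -1
for col in ['killPoints', 'winPoints']:
    cleaned.loc[has_rank & (toy[col] == 0), col] = np.nan

print(cleaned)
```

Note that the last toy row keeps its killPoints of 0, because its rankPoints is -1 and the rule only applies when rankPoints has a real value.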

### Some quick conclusions about the dataset.
- The dataset is huge and consumes a lot of memory.
- Many columns are stored as int64 but would fit in int32 or even int16, which would reduce memory consumption.
- The 'rankPoints' column stores missing values as -1.

### Let's see the value counts of the rankPoints column.

In [None]:
df['rankPoints'].value_counts()
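
The raw counts can also be condensed into a single number: the share of rows holding the -1 placeholder. A minimal sketch on a made-up series standing in for `df['rankPoints']` (the values are illustrative only):

```python
import pandas as pd

# Toy stand-in for df['rankPoints'] (made-up values).
rank_points = pd.Series([-1, -1, 1500, -1, 1432, -1, -1, 1501], name='rankPoints')

# The mean of a boolean mask is the fraction of True entries.
missing_share = (rank_points == -1).mean()
print(f"{missing_share:.1%} of rankPoints values are -1")
```

On the real data the same expression would be `(df['rankPoints'] == -1).mean()`.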

### As we can see, a huge number of values are -1, so the column is of little use. Let's drop it.

In [None]:
df.drop(columns=['rankPoints'],inplace=True)

### Many columns are stored as int64 but can be reduced to int32 or int16, so let's do that.
### 'matchType' is a categorical column but it is stored with the object data type, so we also change it from object to category.

In [None]:
# Downcast the numeric columns and convert matchType to a categorical dtype.
int16_cols=['assists','boosts','DBNOs','headshotKills','heals','killPlace','kills','killStreaks',
            'maxPlace','numGroups','roadKills','teamKills','vehicleDestroys','weaponsAcquired']
int32_cols=['killPoints','matchDuration','winPoints']
float32_cols=['winPlacePerc','damageDealt','longestKill','rideDistance','swimDistance','walkDistance']
dtypes={col:'int16' for col in int16_cols}
dtypes.update({col:'int32' for col in int32_cols})
dtypes.update({col:'float32' for col in float32_cols})
dtypes['matchType']='category'
df=df.astype(dtypes)
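
Downcasting is only safe when every value fits in the smaller type. The snippet below is a rough sketch of a range check that could be run beforehand; `fits_in` is a hypothetical helper written for this example, and the toy values are made up.

```python
import numpy as np
import pandas as pd

def fits_in(series, dtype):
    """Return True if every value of an integer series fits in the given integer dtype."""
    info = np.iinfo(dtype)
    return bool(series.min() >= info.min and series.max() <= info.max)

# Toy columns (made-up values); 40000 is above the int16 maximum of 32767.
toy = pd.DataFrame({
    'kills': [0, 3, 72],
    'matchDuration': [1300, 1850, 40000],
})

for col in toy.columns:
    print(col, 'fits in int16:', fits_in(toy[col], 'int16'))
```

A column that fails the check should stay in a wider type such as int32.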

### Now let's see what has changed.

In [None]:
print(df.shape)
df.info()

### Wow, memory usage has dropped from 983 MB to 411 MB. That's great.
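
The effect of a dtype change on memory can be measured directly with `memory_usage`. A small sketch on a made-up two-column frame (not the real dataset) that compares the byte counts before and after the conversion:

```python
import numpy as np
import pandas as pd

# Toy frame with 1000 rows in the default 64-bit types.
toy = pd.DataFrame({
    'kills': np.zeros(1000, dtype='int64'),
    'walkDistance': np.zeros(1000, dtype='float64'),
})

# Bytes used by the column data, ignoring the index.
before = toy.memory_usage(index=False).sum()

small = toy.astype({'kills': 'int16', 'walkDistance': 'float32'})
after = small.memory_usage(index=False).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

For columns holding strings, passing `deep=True` to `memory_usage` gives a more realistic figure because it also counts the string objects themselves.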

### Let's plot a heatmap of the correlation between all the numeric columns.

In [None]:
plt.figure(figsize=(20,8))
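# Restrict to numeric columns; the ID columns and matchType cannot be correlated.
sns.heatmap(df.select_dtypes(include='number').corr())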

### Points to note from the correlation heatmap above.
- killPoints is highly correlated with winPoints.
- damageDealt is correlated with DBNOs and kills.
- boosts is correlated with winPlacePerc and walkDistance.
- weaponsAcquired is also correlated with walkDistance and winPlacePerc.
- Several other pairs show similar relationships.
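
Reading pairs off a heatmap by eye is error-prone, so it can help to list the strongest correlations as a sorted table. The sketch below uses synthetic data with a built-in kills/damageDealt relationship (the numbers are generated for illustration and do not come from the dataset).

```python
import numpy as np
import pandas as pd

# Synthetic data: damageDealt is made to depend on kills, swimDistance is independent.
rng = np.random.default_rng(0)
kills = rng.poisson(1.0, 500)
toy = pd.DataFrame({
    'kills': kills,
    'damageDealt': kills * 100 + rng.normal(0, 20, 500),
    'swimDistance': rng.exponential(5.0, 500),
})

corr = toy.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).unstack().dropna().sort_values(key=abs, ascending=False)

print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top as well.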

### Comparing 'killPoints' with 'winPoints'

In [None]:
sns.jointplot(x='killPoints',y='winPoints',data=df)

### Comparing 'damageDealt' with 'kills'.

In [None]:
sns.jointplot(x='kills',y='damageDealt',data=df,height=8)

### Comparing walkDistance with winPlacePerc.

In [None]:
sns.jointplot(x='walkDistance',y='winPlacePerc',data=df,height=15)
plt.show()

### Comparing boosts with winPlacePerc and walkDistance

In [None]:
# jointplot creates its own figure, so no plt.figure call is needed.
sns.jointplot(x='boosts',y='walkDistance',data=df,height=10,color='green')

In [None]:
sns.jointplot(x='winPlacePerc',y='boosts',data=df,height=10,kind='scatter',color='red')

### From the above graphs we can conclude that:
- Players who walk more than 6000 metres mostly finish with a winPlacePerc above 0.6.
- Players who use 6 or more boosts have a high chance of winning.
- Players with many kills also deal a large amount of damage.
- Players with high killPoints also have high winPoints, which is to be expected.
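
Threshold statements like the 6000 metre one can be checked numerically by binning the distance and summarising the target per bin. This is a sketch on synthetic data with an invented relationship between the two columns, so the printed numbers say nothing about the real dataset; the bin edges are also an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic data: the relationship below is invented for illustration only.
rng = np.random.default_rng(1)
walk = rng.uniform(0, 8000, 2000)
perc = np.clip(walk / 8000 + rng.normal(0, 0.1, 2000), 0, 1)
toy = pd.DataFrame({'walkDistance': walk, 'winPlacePerc': perc})

# Cut the distance into 2000 m bins and take the median placement in each bin.
bins = [0, 2000, 4000, 6000, 8000]
toy['walkBin'] = pd.cut(toy['walkDistance'], bins=bins, include_lowest=True)
summary = toy.groupby('walkBin', observed=True)['winPlacePerc'].median()

print(summary)
```

Running the same grouping on `df` would show whether the median winPlacePerc really exceeds 0.6 beyond 6000 metres.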

## Now let us compare winPlacePerc with the other columns one by one.

### 'kills' vs 'winPlacePerc'

In [None]:
plt.figure(figsize=(20,6))
sns.boxenplot(x='kills',y='winPlacePerc',data=df)

### We can see that players with 6-15 kills typically finish with a winPlacePerc of 0.8 or above.

### 'assists' vs 'winPlacePerc'

In [None]:
plt.figure(figsize=(20,6))
sns.boxplot(x='assists',y='winPlacePerc',data=df)

### We can see that players with 2-10 assists generally have a winPlacePerc above 0.6. The groups with 3-6 assists are harder to judge because they contain many outliers, while 7-9 assists show consistently good results.
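
When judging box plots per group, it helps to know how many players each box represents, since boxes built from very few rows are unreliable. A minimal sketch with made-up rows that reports the group size next to the median:

```python
import pandas as pd

# Toy rows (made-up values): few players have a high assist count.
toy = pd.DataFrame({
    'assists':      [0,   0,   0,   0,   1,   1,   1,   2,   2,   7],
    'winPlacePerc': [0.1, 0.3, 0.5, 0.2, 0.4, 0.6, 0.5, 0.7, 0.9, 0.95],
})

# Group size and median placement for each assist count.
stats = toy.groupby('assists')['winPlacePerc'].agg(['count', 'median'])

print(stats)
```

In this toy table the highest assist count rests on a single row, which is the kind of case where a box plot alone can mislead.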

### 'boosts' vs 'winPlacePerc'

In [None]:
plt.figure(figsize=(20,6))
sns.boxplot(x='boosts',y='winPlacePerc',data=df)

### We can see that players who used 4 or more boosts generally have a winPlacePerc above 0.7. This is natural: the more boosts you use to recover health, the longer you can stay alive and the better your chance of winning the match.

### DBNOs vs winPlacePerc

In [None]:
plt.figure(figsize=(20,6))
sns.boxenplot(x='DBNOs',y='winPlacePerc',data=df)

### 'headshotKills' vs 'winPlacePerc'

In [None]:
plt.figure(figsize=(20,6))
sns.boxenplot(x='headshotKills',y='winPlacePerc',data=df)

### Players with more headshot kills have a higher chance of winning the match. This makes sense: if you aim for the head, you take down your enemy faster.

### 'heals' vs 'winPlacePerc'

In [None]:
plt.figure(figsize=(20,6))
sns.boxenplot(x='heals',y='winPlacePerc',data=df)

### 'matchType' vs 'winPlacePerc'

In [None]:
plt.figure(figsize=(20,6))
sns.violinplot(x='matchType',y='winPlacePerc',data=df)

### 'rideDistance' vs 'winPlacePerc'

In [None]:
# jointplot creates its own figure, so no plt.figure call is needed.
sns.jointplot(x='rideDistance',y='winPlacePerc',data=df,color='green',height=8)

### We can clearly see that players with a high rideDistance have a higher chance of winning.

### 'swimDistance' vs 'winPlacePerc'

In [None]:
# jointplot creates its own figure, so no plt.figure call is needed.
sns.jointplot(x='swimDistance',y='winPlacePerc',data=df,color='green',height=8)

### This graph is similar to the rideDistance vs winPlacePerc graph and leads to the same conclusion.

## Lastly, let us see which match type is played the most.

In [None]:
plt.figure(figsize=(20,10))
sns.countplot(x='matchType',data=df)

### We can see that squad-fpp is the most played match type, followed by duo-fpp, squad and solo-fpp.
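
The bar heights of a count plot can also be expressed as shares, which are easier to quote than raw counts. A minimal sketch on a made-up series standing in for `df['matchType']`:

```python
import pandas as pd

# Toy stand-in for df['matchType'] (made-up distribution).
match_type = pd.Series(
    ['squad-fpp'] * 5 + ['duo-fpp'] * 3 + ['solo-fpp'] * 2,
    name='matchType',
)

# normalize=True returns fractions instead of counts, sorted from most to least common.
shares = match_type.value_counts(normalize=True)

print(shares)
```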

## Final Conclusion
- Try to aim for headshots, as they dramatically increase the chance of winning a fight as well as the whole match.
- Prefer squad matches (TPP or FPP); if not, then at least play duo matches. Unlike in solo matches, your teammates can give you a second chance to survive.
- Always try to stay at full health and fully boosted by using med-kits and boosts; it will help you win a match.
- Always keep moving, on foot or in a vehicle.
- Choose good weapons with high damage so that you can take down your enemies easily.