# ***BIBLE FOR CHICKEN WINNER***
![](https://steamuserimages-a.akamaihd.net/ugc/869617814907901964/4668BA6C6879C1D260F265E2AA04E3E9E08FE109/)
> Image from https://steamuserimages-a.akamaihd.net/ugc/869617814907901964/4668BA6C6879C1D260F265E2AA04E3E9E08FE109/ by  Küni

### Winner Winner Chicken Dinner! 
PlayerUnknown's Battlegrounds (PUBG) is one of the hottest games in the world and ranked 1st on Steam, a global game platform.  
Using the PUBG data, this report looks at the game from several angles.  

### ***The main objective of this report is to examine the factors that affect winning the game.***


## 1. Import library and data

In [None]:
#Import library and data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
import warnings
warnings.filterwarnings("ignore")

train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")

In [None]:
#check train data set
train.head(10)

In [None]:
#check test data set
test.head(10)

In [None]:
#characteristics of variables
train.info()

### 1. a) Data variables
* **DBNOs** - Number of enemy players knocked.
* **assists** - Number of enemy players this player damaged that were killed by teammates.
* **boosts** - Number of boost items used.
* **damageDealt** - Total damage dealt. Note: Self-inflicted damage is subtracted.
* **headshotKills** - Number of enemy players killed with headshots.
* **heals** - Number of healing items used.
* **killPlace** - Ranking in match of number of enemy players killed.
* **killPoints** - Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.)
* **killStreaks** - Max number of enemy players killed in a short amount of time.
* **kills** - Number of enemy players killed.
* **longestKill** - Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
* **matchId** - Integer ID to identify match. There are no matches that are in both the training and testing set.
* **revives** - Number of times this player revived teammates.
* **rideDistance** - Total distance traveled in vehicles measured in meters.
* **roadKills** - Number of kills while in a vehicle.
* **swimDistance** - Total distance traveled by swimming measured in meters.
* **teamKills** - Number of times this player killed a teammate.
* **vehicleDestroys** - Number of vehicles destroyed.
* **walkDistance** - Total distance traveled on foot measured in meters.
* **weaponsAcquired** - Number of weapons picked up.
* **winPoints** - Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.)
* **groupId** - Integer ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
* **numGroups** - Number of groups we have data for in the match.
* **maxPlace** - Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
* **winPlacePerc** - The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
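
To make the winPlacePerc definition concrete, here is a minimal sketch. The formula `(max_place - place) / (max_place - 1)` is an assumption that is consistent with the description above (1 for 1st place, 0 for last place, scaled by maxPlace); it is not confirmed by the data description. The function name `win_place_perc` and the `place` argument are hypothetical and are not columns in the data set.

```python
def win_place_perc(place, max_place):
    """Assumed mapping from a finishing place to a percentile placement.

    place: 1 for the winner, max_place for the last group (hypothetical input).
    max_place: worst placement recorded for the match (the maxPlace column).
    """
    if max_place < 2:
        # With a single placement there is no spread to scale over.
        raise ValueError("max_place must be at least 2")
    return (max_place - place) / (max_place - 1)


print(win_place_perc(1, 100))    # winner of a 100-placement match
print(win_place_perc(100, 100))  # last place in the same match
```

Because the scale depends on maxPlace rather than numGroups, two matches with the same number of groups can still map the same finishing place to different percentiles.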

### Check point 1
* **Cheaters and hack users** are a serious concern in PUBG these days. To study the usual behavior of PUBG players, their records need to be removed.  
    A few variables may help identify them: **walkDistance, heals, damageDealt, DBNOs, kills, headshotKills**.  
    This is not confirmed yet and will be examined further in the ***Data Cleaning*** stage.
* The **numGroups** variable distinguishes the match types *Solo, Duo, Squad*. However, it does not fully capture certain match types such as *solo squad*.  
* **Objective of this study** is to find out what kind of strategy a player should use to win the game.  
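
The numGroups point above can be sketched as a rough match-type label. The cut-offs (25 and 50 groups) and the column name `matchType` are assumptions chosen for illustration, based on a match holding up to about 100 players; they are not taken from the data description, and as noted they will mislabel cases such as *solo squad*. The toy data frame stands in for `train`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train; only numGroups is needed here.
toy = pd.DataFrame({"numGroups": [96, 48, 27, 25, 12]})

# Hypothetical cut-offs: many groups -> solo, few groups -> squad.
bins = [0, 25, 50, np.inf]
labels = ["squad", "duo", "solo"]

# pd.cut uses right-closed intervals: (0, 25], (25, 50], (50, inf).
toy["matchType"] = pd.cut(toy["numGroups"], bins=bins, labels=labels).astype(str)
print(toy)
```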
  
  
With these checks done, the data is ready for further analysis.

## 2. Data cleaning
As shown above, some data cleaning is needed to remove cheaters and hack users.  
A few variables may reveal their characteristics.  
* **walkDistance** - A player may die right after landing from the plane, but there is also a hack that lets a player fly. As a result, a few records show extremely low walkDistance together with high kills. These players are suspected cheaters and will be removed in this analysis.
* **heals** - It is hardly possible to win without using a single healing item or boost item (boosts also restore health). Therefore, records with 0 heals and high kills need to be examined.
* **damageDealt** - Extremely high damage should be treated as suspicious, since it may not be achievable in a game with limited time.
* **DBNOs** - This is one of the strongest indicators of a hack user. Their program is designed to kill people, and that will show up in the DBNOs value.
* **headshotKills** - Hacks are set to aim at opponents' heads, which leads to headshot kills. Extremely high headshot-kill counts, which may not be seen even among pro players, should be treated as suspicious.

*These factors will be examined to check for outliers in the data.*
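
As a rough illustration of how such rules could be turned into a filter, the sketch below flags rows on a toy data frame. The thresholds `MIN_WALK` and `HIGH_KILLS` and the `suspect` column are hypothetical values picked for the example, not cut-offs derived from the data; the real thresholds would have to come from the distributions examined in this section.

```python
import pandas as pd

# Toy stand-in for train with only the columns used by the rules.
toy = pd.DataFrame({
    "walkDistance": [0.0, 1500.0, 5.0, 2300.0],
    "kills": [12, 3, 0, 25],
    "heals": [0, 4, 0, 0],
    "boosts": [0, 2, 0, 0],
})

MIN_WALK = 10.0   # hypothetical: "extremely low" walking distance in meters
HIGH_KILLS = 10   # hypothetical: "high" number of kills

# Rule 1: many kills while barely moving on foot.
no_move_kills = (toy["walkDistance"] < MIN_WALK) & (toy["kills"] >= HIGH_KILLS)
# Rule 2: many kills without using any healing or boost item.
no_heal_kills = ((toy["heals"] + toy["boosts"]) == 0) & (toy["kills"] >= HIGH_KILLS)

toy["suspect"] = no_move_kills | no_heal_kills
cleaned = toy[~toy["suspect"]]
print(cleaned)
```

Flagging first and dropping afterwards keeps the suspect rows available for inspection before they are removed.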

#### 2. a) Factor Analysis

In [None]:
#plot walkDistance
plt.figure(figsize=(15,10))
plt.title('Walking Distance Distribution')
sns.distplot(train['walkDistance'], kde = False)
plt.show()

In [None]:
#plot heals
plt.figure(figsize=(15,10))
plt.title('Heals Distribution')
sns.distplot(train['heals'], kde = False)
plt.show()

In [None]:
#plot Damage Dealt
plt.figure(figsize=(15,10))
plt.title('Damage Dealt Distribution')
sns.distplot(train['damageDealt'], kde = False)
plt.show()

In [None]:
#plot DBNOs
plt.figure(figsize=(15,10))
plt.title('DBNOs Distribution')
sns.distplot(train['DBNOs'], kde = False)
plt.show()
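
Histograms of heavily skewed counts can hide the extreme tail, so upper quantiles may be a useful numeric complement to the plots. The sketch below uses a toy column in place of `train`; the quantile levels 0.5 and 0.99 are arbitrary choices for illustration.

```python
import pandas as pd

# Toy stand-in for a count column such as kills or DBNOs.
toy = pd.DataFrame({"kills": list(range(101))})

# Median and 99th percentile; values far above the latter are outlier candidates.
q = toy["kills"].quantile([0.5, 0.99])
print(q)
```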