This kernel uses statistics to identify which aspects of play (survival, kills, boosts, ...)
matter most for winning a Squad game (TPP & FPP).

It builds on experience & intuition acquired over months of gaming.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
import warnings
warnings.filterwarnings("ignore")

In [None]:
train = pd.read_csv('../input/train_V2.csv')
test = pd.read_csv('../input/test_V2.csv')
train.head()

**Data fields**
* DBNOs - Number of enemy players knocked.
* assists - Number of enemy players this player damaged that were killed by teammates.
* boosts - Number of boost items used.
* damageDealt - Total damage dealt. Note: Self inflicted damage is subtracted.
* headshotKills - Number of enemy players killed with headshots.
* heals - Number of healing items used.
* Id - Player’s Id
* killPlace - Ranking in match of number of enemy players killed.
* killPoints - Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
* killStreaks - Max number of enemy players killed in a short amount of time.
* kills - Number of enemy players killed.
* longestKill - Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
* matchDuration - Duration of match in seconds.
* matchId - ID to identify match. There are no matches that are in both the training and testing set.
* matchType - String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
* rankPoints - Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
* revives - Number of times this player revived teammates.
* rideDistance - Total distance traveled in vehicles measured in meters.
* roadKills - Number of kills while in a vehicle.
* swimDistance - Total distance traveled by swimming measured in meters.
* teamKills - Number of times this player killed a teammate.
* vehicleDestroys - Number of vehicles destroyed.
* walkDistance - Total distance traveled on foot measured in meters.
* weaponsAcquired - Number of weapons picked up.
* winPoints - Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
* groupId - ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
* numGroups - Number of groups we have data for in the match.
* maxPlace - Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
* winPlacePerc - The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.

In [None]:
# Load data for squad-tpp (squad) & squad-fpp.
# .copy() gives an independent frame, so later edits do not act on a slice of train.
squad_data = train.loc[train['matchType'].isin(['squad', 'squad-fpp'])].copy()
squad_data.head()

In [None]:
# Drop identifier columns, which carry no gameplay information
useless_columns = ['Id', 'groupId', 'matchId']
squad_data = squad_data.drop(columns=useless_columns)
squad_data.head()


In a squad game, let us test various strategies and see which strategy, or combination of strategies, yields more **Chicken Dinners**.

We will check whether running, covering the map by vehicle, swimming, or some combination of these three leads to more Chicken Dinners.

**Case 1 : Running**

Running can be subdivided into two sub strategies

1. ***Runners who play aggressively*** : Runners who roam about to kill enemies
2. ***Runners who hide from fighting zones*** : Runners who run and hide from fights & encounters

***(Here, "runners" means squads whose walking distance is above the average.)***

Now, I will classify runners who play aggressively based on the damage they deal.
I cannot justifiably classify aggressiveness based on kills because
* The mean kill count is about 0.90, and even 2 kills above that average is not considered aggressive. I can't declare a squad aggressive just because its kills exceed 0.90, nor can I set a cutoff (such as 5 kills) based on intuition alone.

Hence, damage dealt is a better basis for classifying aggressiveness.
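The damage-based split can be sketched as below. This is only an illustration on a tiny made-up frame, assuming the cutoffs are the column means (as used later in this kernel). The helper name `label_play_style` and the label strings are hypothetical, and rows exactly at the damage mean are counted as campers here.

```python
import pandas as pd

def label_play_style(df):
    """Label rows by comparing walkDistance and damageDealt to their column means."""
    walk_mean = df['walkDistance'].mean()
    damage_mean = df['damageDealt'].mean()

    runners = df['walkDistance'] > walk_mean
    aggressive = df['damageDealt'] > damage_mean

    # Start with everyone as a non-runner, then overwrite the two runner groups
    labels = pd.Series('non_runner', index=df.index)
    labels.loc[runners & aggressive] = 'aggressive_runner'
    labels.loc[runners & ~aggressive] = 'camper_runner'
    return labels

# Made-up values, for illustration only (not taken from train_V2.csv)
demo = pd.DataFrame({
    'walkDistance': [100.0, 2000.0, 2500.0, 300.0],
    'damageDealt': [50.0, 400.0, 20.0, 10.0],
})
demo_labels = label_play_style(demo)
print(demo_labels.tolist())
```

On the real data the same function could be called as `label_play_style(squad_data)`, which avoids typing the printed means by hand.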

In [None]:
# Average walking distance in a game
print("In a squad game, the whole squad or the alive players walk/sprint {:.1f}m on an average  ".format(squad_data['walkDistance'].mean()))

Let us see if there is a correlation between walking and Chicken Dinners (I think there is, but let the data do the talking).

In [None]:
sns.jointplot(x="winPlacePerc", y="walkDistance",  data=squad_data, height=10, ratio=3, color="m")
plt.show()

Let us now take only the data where the walking distance is higher than the average 

In [None]:
# 1237.8 is the threshold used for the average walkDistance (see the mean printed above)
squad_data_run = squad_data.loc[squad_data['walkDistance'] > 1237.8]
squad_data_run.head()

In [None]:
sns.jointplot(x="winPlacePerc", y="walkDistance",  data=squad_data_run, height=10, ratio=3, color="m")
plt.show()

Let's find out the Pearson correlation score

In [None]:
squad_data_run['walkDistance'].corr(squad_data_run['winPlacePerc'])
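For reference, `Series.corr` uses the Pearson method by default: the covariance of the two columns divided by the product of their standard deviations. The sketch below recomputes that by hand on a few made-up numbers and compares it with the pandas result. The helper `pearson_r` and the sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations over the root of the squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up sample, not taken from the competition data
walk = pd.Series([1300.0, 1800.0, 2500.0, 3200.0, 4100.0])
place = pd.Series([0.35, 0.50, 0.45, 0.80, 0.95])

manual = pearson_r(walk, place)
builtin = walk.corr(place)  # method='pearson' is the default
print(round(manual, 4), round(builtin, 4))
```

The score lies between -1 and 1. It only measures how linear the relationship is, so it says nothing about cause and effect.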

**Case 1.1. Aggressive runners**

Runners who roam about to kill enemies


In [None]:
#Average damage in a game
print("In a squad game, the whole squad or the alive players deal a damage of {:.1f} on an average  ".format(squad_data['damageDealt'].mean()))

We treat damage above 132.0 (the average) as aggressive gameplay.
Let us filter for runners who are aggressive, i.e., squads with walking distance above average & damage above average

In [None]:
squad_runKillers = squad_data_run.loc[(squad_data_run['damageDealt'] > 132.0 )]
squad_runKillers.head()

Now let us look for a correlation between aggressive running & Chicken Dinners with the help of
* a scatter plot
* the Pearson correlation

In [None]:
sns.jointplot(x="winPlacePerc", y="walkDistance",  data=squad_runKillers, height=10, ratio=3, color="m")
plt.show()

In [None]:
squad_runKillers['walkDistance'].corr(squad_runKillers['winPlacePerc'])

**Case 1.2. Camper Runners**

Runners who run and hide from fights & encounters

Let us filter out runners who are campers, i.e., squads with walking distance above average & damage below average

In [None]:
squad_runCampers = squad_data_run.loc[(squad_data_run['damageDealt'] < 132.0 )]
squad_runCampers.head()

Now let us look for a correlation between camper running & Chicken Dinners with the help of
* a scatter plot
* the Pearson correlation

In [None]:
sns.jointplot(x="winPlacePerc", y="walkDistance",  data=squad_runCampers, height=10, ratio=3, color="m")
plt.show()

In [None]:
squad_runCampers['walkDistance'].corr(squad_runCampers['winPlacePerc'])
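To read the two runner groups side by side instead of from separate cells, a small summary helper can be used. This is a sketch only: `corr_by_subset` is a hypothetical name, and the two demo frames are made-up stand-ins for `squad_runKillers` and `squad_runCampers`.

```python
import pandas as pd

def corr_by_subset(subsets, x_col, y_col='winPlacePerc'):
    """Return one row per named subset: its size and the Pearson r of x_col vs y_col."""
    rows = [
        {'subset': name, 'rows': len(df), 'pearson_r': df[x_col].corr(df[y_col])}
        for name, df in subsets.items()
    ]
    return pd.DataFrame(rows).set_index('subset')

# Made-up frames, for illustration only
demo_killers = pd.DataFrame({'walkDistance': [1500.0, 2000.0, 3000.0],
                             'winPlacePerc': [0.2, 0.4, 0.8]})
demo_campers = pd.DataFrame({'walkDistance': [1500.0, 2000.0, 3000.0],
                             'winPlacePerc': [0.8, 0.4, 0.2]})

summary = corr_by_subset({'aggressive': demo_killers, 'campers': demo_campers},
                         x_col='walkDistance')
print(summary)
```

With the real frames the call would be `corr_by_subset({'aggressive': squad_runKillers, 'campers': squad_runCampers}, x_col='walkDistance')`.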

**Case 2 : Driving**


In [None]:
#Average vehicle distance traversed in a game
print("In a squad game, the whole squad or the alive players ride {:.1f}m on an average  ".format(squad_data['rideDistance'].mean()))

Let us see if there is a correlation between driving and Chicken Dinners. After spending hours daily in this game, my intuition says it shouldn't be strong,
because driving around increases the chance of being spotted and can get you killed, especially in tense places or small zones.

In [None]:
# Plot all squad rows here; the above-average drivers are filtered in the next step
sns.jointplot(x="winPlacePerc", y="rideDistance",  data=squad_data, height=10, ratio=3, color="m")
plt.show()

Let us now take only the data where riding distance is higher than the average 

In [None]:
squad_data_drive = squad_data.loc[squad_data['rideDistance'] > 636.4]
squad_data_drive.head()

In [None]:
sns.jointplot(x="winPlacePerc", y="rideDistance",  data=squad_data_drive, height=10, ratio=3, color="m")
plt.show()

Finding the Pearson correlation score

In [None]:
squad_data_drive['rideDistance'].corr(squad_data_drive['winPlacePerc'])

There is a small correlation between riding & winning (as expected)

**Case 3 : Swimming**

While many disregard this strategy, it has a very high probability of landing you in the Top 10 more often, provided the zones and playing areas are in your favour, i.e., towards water bodies.

(*Again, this is an experience-based intuition. Let's see what the data says.*)

In [None]:
#Average distance swam in a game
print("In a squad game, the whole squad or the alive players swim {:.1f}m on an average  ".format(squad_data['swimDistance'].mean()))

In [None]:
sns.jointplot(x="winPlacePerc", y="swimDistance",  data=squad_data, height=10, ratio=3, color="m")
plt.show()

Let's check out players/squads with swimming distance above average

In [None]:
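# Filter on swimDistance within squad_data (4.4 is the average swim distance used as the threshold)
squad_data_swim = squad_data.loc[squad_data['swimDistance'] > 4.4]
squad_data_swim.head()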

In [None]:
sns.jointplot(x="winPlacePerc", y="swimDistance",  data=squad_data_swim, height=10, ratio=3, color="m")
plt.show()

It seems my intuition that swimming leads to Top 10 finishes more often wasn't wrong.
However, it should also be stated that swimming won't help you win matches, mainly because of
1. Less availability of loot
2. Inability to fire when in water (you would only be taking hits)
3. Difficulty getting out of the water and onto land during the final or pre-final circle, as doing so increases the chance of getting spotted
4. The playzone never ends in a water body (at most, below the bridge)

However, let's still go forward and assign a numerical value to my intuition and the visualization
by calculating the Pearson correlation...

In [None]:
squad_data_swim['swimDistance'].corr(squad_data_swim['winPlacePerc'])

**Case 4 : Killing**

Killing can be subdivided into two categories
1. Camping & Killing :-

    Here I will consider squads whose walking distance is less than average. They usually end up predicting the zone and staying put in one place
2. Aggressive Killing without camping (Running around) :-

    Already discussed in the Runners section as Aggressive Runners

In [None]:
# Average kills in a game
print("In a squad game, the whole squad or the alive players kill {:.1f} players on an average  ".format(squad_data['kills'].mean()))

In [None]:
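# Campers: walking distance below the average (1237.8, the walkDistance threshold from the Running section).
# .copy() lets the killsCategories column be added without modifying a slice.
squad_camperKillers = squad_data.loc[squad_data['walkDistance'] < 1237.8].copy()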

In [None]:
squad_camperKillers['killsCategories'] = pd.cut(squad_camperKillers['kills'], [-1, 0, 2, 5, 10, 60], labels=['0_kills','1-2_kills', '3-5_kills', '6-10_kills', '10+_kills'])

plt.figure(figsize=(15,8))
sns.boxplot(x="killsCategories", y="winPlacePerc", data=squad_camperKillers)
plt.show()
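To make the binning explicit: `pd.cut` uses right-closed intervals by default, so the edges `[-1, 0, 2, 5, 10, 60]` produce the groups (-1, 0], (0, 2], (2, 5], (5, 10] and (10, 60]. The sketch below applies the same bins and labels to a handful of made-up kill counts. The `kills` values are illustrative only, and 61 is included to show that anything above the last edge becomes NaN.

```python
import pandas as pd

bins = [-1, 0, 2, 5, 10, 60]
labels = ['0_kills', '1-2_kills', '3-5_kills', '6-10_kills', '10+_kills']

# Made-up kill counts chosen to sit on and around the bin edges
kills = pd.Series([0, 1, 2, 3, 5, 6, 10, 11, 60, 61])
categories = pd.cut(kills, bins, labels=labels)

print(pd.DataFrame({'kills': kills, 'killsCategories': categories}))
```

Note that the `10+_kills` label therefore covers 11 to 60 kills, while exactly 10 kills falls in `6-10_kills`.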

In [None]:
squad_camperKillers['kills'].corr(squad_camperKillers['winPlacePerc'])

Hence, camper squads have a lower chance of getting a Chicken Dinner than squads who explore the entire playing area and kill

**As of now...**
Squads who tend to be runners and play aggressively, i.e., squads with walkDistance & damageDealt above average, have a higher correlation with winPlacePerc...
I will try to think about the logical inference of a correlation between..
* Heals & boosts
* weapons used
and a Chicken Dinner