# PUBG Linear Exploration

HOW COOL IS THIS DATASET

#### Labels

* DBNOs - Number of enemy players knocked.
* assists - Number of enemy players this player damaged that were killed by teammates.
* boosts - Number of boost items used.
* damageDealt - Total damage dealt. Note: Self inflicted damage is subtracted.
* headshotKills - Number of enemy players killed with headshots.
* heals - Number of healing items used.
* Id - Player’s Id
* killPlace - Ranking in match of number of enemy players killed.
* killPoints - Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
* killStreaks - Max number of enemy players killed in a short amount of time.
* kills - Number of enemy players killed.
* longestKill - Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
* matchDuration - Duration of match in seconds.
* matchId - ID to identify match. There are no matches that are in both the training and testing set.
* matchType - String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
* rankPoints - Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
* revives - Number of times this player revived teammates.
* rideDistance - Total distance traveled in vehicles measured in meters.
* roadKills - Number of kills while in a vehicle.
* swimDistance - Total distance traveled by swimming measured in meters.
* teamKills - Number of times this player killed a teammate.
* vehicleDestroys - Number of vehicles destroyed.
* walkDistance - Total distance traveled on foot measured in meters.
* weaponsAcquired - Number of weapons picked up.
* winPoints - Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
* groupId - ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
* numGroups - Number of groups we have data for in the match.
* maxPlace - Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
* winPlacePerc - The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
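
The sentinel rules described for `rankPoints`, `killPoints`, and `winPoints` can be sketched in code. This is an optional, minimal illustration on made-up rows, not a step the notebook performs; the helper name `mask_missing_points` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "rankPoints": [-1, 1500, 1480, -1],
    "killPoints": [1200, 0, 0, 0],
    "winPoints": [1400, 0, 0, 0],
})

def mask_missing_points(frame):
    """Return a float copy with the documented 'None' sentinels replaced by NaN."""
    out = frame.astype(float)
    has_rank = out["rankPoints"] != -1
    # A 0 in killPoints / winPoints means "None" only when rankPoints is present.
    for col in ("killPoints", "winPoints"):
        out.loc[has_rank & (out[col] == 0), col] = np.nan
    # A -1 in rankPoints means "None".
    out.loc[~has_rank, "rankPoints"] = np.nan
    return out

cleaned = mask_missing_points(toy)
print(cleaned)
```

Note that `LinearRegression` does not accept NaN values, so masking like this would need to be followed by some imputation choice before fitting.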

In [None]:
# Our imports so data can be used
import pandas as pd
import numpy as np

In [None]:
# Looking at the training data
df = pd.read_csv('../input/train_V2.csv')

In [None]:
# Looking at the head
df.head()

In [None]:
# Looking at the tail
df.tail()

In [None]:
# Checking for duplicate 'Id's
df['Id'].value_counts().head()

# No duplicate 'Id's
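
Reading the top of `value_counts()` works, but the same check can be made explicit. A small sketch on a toy column (the values are made up), using `Series.is_unique` and `Series.duplicated`:

```python
import pandas as pd

# Toy 'Id' column with one deliberate repeat, for illustration only.
toy = pd.DataFrame({"Id": ["a1", "b2", "c3", "b2"]})

# True only if every value appears exactly once.
ids_are_unique = toy["Id"].is_unique

# Counts rows whose 'Id' already appeared earlier in the column.
n_dupes = int(toy["Id"].duplicated().sum())

print(ids_are_unique, n_dupes)
```

On the real data, `df['Id'].is_unique` returning `True` would confirm the observation above in one line.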

In [None]:
# Looking at the structure of 'df'
df.info(show_counts=True)

# Almost perfectly even! Every column is fully populated except 'winPlacePerc', which is missing a single value
# We will also clean up 'df' by dropping the object-type columns
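
Scanning `info()` output for one short column is easy to get wrong on a wide frame. A minimal sketch, on toy data, of counting missing values per column directly:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing target value, mirroring the situation described above.
toy = pd.DataFrame({
    "kills": [0, 2, 1],
    "winPlacePerc": [0.5, np.nan, 1.0],
})

# Number of missing values in each column.
null_counts = toy.isna().sum()

# Keep only the columns that actually have something missing.
missing = null_counts[null_counts > 0]

# The rows that contain at least one missing value.
bad_rows = toy[toy.isna().any(axis=1)]

print(missing)
print(bad_rows)
```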

In [None]:
# Dropping the single row with the missing value
df.dropna(inplace=True)

In [None]:
# Double checking that the number of rows match
df.info(show_counts=True)

# Nice and symmetrical!

In [None]:
# Dropping the object-type columns (EXCEPT FOR Id)
df.drop(['groupId','matchId','matchType'],axis=1,inplace=True)

In [None]:
# Checking info again (hopefully the last time)
df.info()

# Perfect!
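
The non-numeric columns were listed by hand above. As an alternative sketch (an assumption about how one might generalize it, not what the notebook does), `select_dtypes` can find them on a toy frame:

```python
import pandas as pd

# Toy frame with the same mix of identifier, categorical, and numeric columns.
toy = pd.DataFrame({
    "Id": ["a1", "b2"],
    "groupId": ["g1", "g2"],
    "matchType": ["solo", "duo"],
    "kills": [1, 3],
    "winPlacePerc": [0.2, 0.9],
})

# Every non-numeric column, minus 'Id', which is kept for the submission file.
non_numeric = toy.select_dtypes(exclude="number").columns.drop("Id")

numeric = toy.drop(columns=non_numeric)
print(list(numeric.columns))
```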

## Basic Linear Regression

In [None]:
# Setting X and y for train test split
X = df[df.columns[1:-1]] # indexed to exclude Id and winPlacePerc
y = df['winPlacePerc']
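
The positional slice `df.columns[1:-1]` only works while `Id` is the first column and `winPlacePerc` is the last. A sketch of the same selection by name, shown on a toy frame so it runs on its own:

```python
import pandas as pd

# Toy frame laid out like the cleaned training data: Id first, target last.
toy = pd.DataFrame({
    "Id": ["a1", "b2", "c3"],
    "kills": [1, 3, 0],
    "walkDistance": [120.5, 2400.0, 15.2],
    "winPlacePerc": [0.2, 0.9, 0.0],
})

# Features: everything except the identifier and the target, selected by name.
X_by_name = toy.drop(columns=["Id", "winPlacePerc"])
y_by_name = toy["winPlacePerc"]

# Same result as the positional version, but robust to column reordering.
X_by_position = toy[toy.columns[1:-1]]
print(X_by_name.equals(X_by_position))
```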

In [None]:
# Importing train_test_split to split the data
from sklearn.model_selection import train_test_split

In [None]:
# Splitting the data for the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

In [None]:
# Import the linear model
from sklearn.linear_model import LinearRegression

In [None]:
# Instantiating the model as 'lm'
lm = LinearRegression()

In [None]:
# Training the model
lm.fit(X_train,y_train)

In [None]:
# Making predictions
predictions = lm.predict(X_test)

In [None]:
# Visualizing the model predictions against 'y_test'
import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(y_test,predictions)
plt.xlabel('Y Test')
plt.ylabel('Predicted Y')

# Pretty noisy but not terrible... let's check the mean absolute error

In [None]:
# Checking MAE - Mean Absolute Error
from sklearn import metrics

print('MAE:', metrics.mean_absolute_error(y_test, predictions))

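# Not bad!
#   1 - MAE is around 0.90, i.e. predictions are off by roughly 0.10
#   on the 0-1 'winPlacePerc' scale on average (MAE is an error size, not an accuracy)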
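
To make the MAE figure easier to interpret, here is a minimal sketch that computes it by hand and compares it with a naive baseline that always predicts the mean. The arrays are made-up numbers, not results from this model.

```python
import numpy as np

# Made-up targets and predictions on the 0-1 winPlacePerc scale.
y_true = np.array([0.0, 0.5, 1.0, 0.25])
y_hat = np.array([0.1, 0.4, 0.8, 0.25])

# MAE is the average absolute gap between prediction and truth.
mae = np.mean(np.abs(y_true - y_hat))

# Baseline: predict the mean of the targets for every row.
baseline_mae = np.mean(np.abs(y_true - y_true.mean()))

print("model MAE:", round(mae, 4))
print("baseline MAE:", round(baseline_mae, 4))
```

A model is only "not bad" relative to something, so comparing against a baseline like this on `y_test` gives the MAE some context.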

In [None]:
# Plotting the residuals (y_test - predictions) to see if they are normally distributed
import seaborn as sns

sns.histplot(y_test - predictions, bins=50, kde=True);

# The residuals look to be mostly normally distributed
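
A histogram can be backed up with a few numbers. The sketch below uses simulated residuals as a stand-in for `y_test - predictions`, so the values it prints say nothing about this model.

```python
import numpy as np
import pandas as pd

# Simulated residuals standing in for (y_test - predictions); not real results.
rng = np.random.default_rng(0)
residuals = pd.Series(rng.normal(loc=0.0, scale=0.1, size=10_000))

summary = {
    "mean": residuals.mean(),  # near 0 if the model is not biased overall
    "std": residuals.std(),    # typical size of an error
    "skew": residuals.skew(),  # near 0 for a symmetric distribution
}
print(summary)
```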

## Test Predictions

In [None]:
# Loading Sample Submission as 'ss'
ss = pd.read_csv('../input/sample_submission_V2.csv')

In [None]:
# Displaying 'ss' so that we know how to format our results
ss.head(3)

In [None]:
# Reading in the test data
test = pd.read_csv('../input/test_V2.csv')

In [None]:
# Checking the head
test.head(3)

In [None]:
# Cleaning 'test'
# Dropping the object-type columns (EXCEPT FOR Id)
test.drop(['groupId','matchId','matchType'],axis=1,inplace=True)

In [None]:
# Checking 'test' with .info
test.info(show_counts=True)
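
Because the model was fit on columns selected by position, the test features need to arrive in the same order as the training features. A small sketch of aligning them by name, on toy frames with hypothetical contents:

```python
import pandas as pd

# Toy stand-ins: same feature names, different column order.
train_features = pd.DataFrame({"kills": [1], "heals": [0], "walkDistance": [10.0]})
test_features = pd.DataFrame({"heals": [2], "walkDistance": [5.0], "kills": [0]})

# Reorder the test columns to match training.
# This raises a KeyError if a training column is missing from the test frame.
aligned = test_features[train_features.columns]

print(list(aligned.columns))
```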

In [None]:
# Making Predictions!
pred = lm.predict(test[test.columns[1:]]) # Without Id column
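
A linear model is not constrained to the 0-1 range that `winPlacePerc` lives in, so some predictions may fall outside it. One optional post-processing step, sketched on made-up values, is to clip them with `np.clip`:

```python
import numpy as np

# Made-up raw predictions, two of which fall outside the valid range.
raw_pred = np.array([-0.07, 0.42, 1.13, 0.98])

# Force every prediction into [0, 1], leaving in-range values untouched.
clipped = np.clip(raw_pred, 0.0, 1.0)

print(clipped)
```

Applied to the notebook's variables, this would be `pred = np.clip(pred, 0, 1)`; whether it helps the score is something to check rather than assume.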

In [None]:
# Instantiating our predictions dataframe 'test_pred'
# (.copy() avoids a SettingWithCopyWarning when a column is added later)
test_pred = test[['Id']].copy()

In [None]:
# Filling out 'test_pred' with our predictions
test_pred['winPlacePerc'] = pred

In [None]:
# Making sure that 'test_pred' is well formatted
test_pred.head()

In [None]:
# And boom goes the dynamite... we have PUBG predictions!
test_pred.to_csv('PUBGpredictions_V2_1.csv', index=False)
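
Before uploading, it can be worth comparing the output against the sample submission. A minimal sketch on toy frames; the helper `check_submission` is hypothetical and only covers a few basic checks.

```python
import pandas as pd

# Toy stand-ins for the sample submission and our predictions.
sample = pd.DataFrame({"Id": ["a1", "b2"], "winPlacePerc": [1, 1]})
submission = pd.DataFrame({"Id": ["a1", "b2"], "winPlacePerc": [0.31, 0.77]})

def check_submission(submission, sample):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if list(submission.columns) != list(sample.columns):
        problems.append("column names or order differ from the sample")
    if len(submission) != len(sample):
        problems.append("row count differs from the sample")
    if submission["winPlacePerc"].isna().any():
        problems.append("predictions contain missing values")
    return problems

print(check_submission(submission, sample))
```

With the notebook's variables this would be called as `check_submission(test_pred, ss)`.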