## CSGO Project

### Problem Statement:
Predicting the outcome of CSGO matches is complex due to numerous variables like player stats, team dynamics, and match conditions. Existing methods often lack accuracy and depth.



### Key Goal:

Build machine learning models that accurately predict the winning team in CSGO matches using player and team performance data.



### Why Is This Important?

* Supports strategic decisions in esports.
* Enhances fan engagement with data-driven insights.
* Helps teams improve performance.
* Demonstrates AI’s potential in gaming analytics.

**Counter-Strike (CS)** is a popular series of tactical first-person shooter (FPS) video games that have been enjoyed by gamers worldwide for many years. The series originated as a modification for the popular game Half-Life and quickly gained its own dedicated following. Here's an overview of Counter-Strike:

**Gameplay Overview:**

Counter-Strike is primarily a multiplayer game where two teams, the Counter-Terrorists (CTs) and the Terrorists (Ts), compete against each other.

The objective of each round varies based on the game mode, but the primary goals include:

**Counter-Terrorists:** Prevent the Terrorists from achieving their objectives, such as defusing a bomb or rescuing hostages.

**Terrorists:** Achieve their objectives, which may include planting a bomb at a designated site or holding hostages.
Rounds are relatively short, typically lasting a few minutes, and players have only one life per round. When a player is eliminated, they must wait until the next round to respawn.

**Key Features:**

**Weapons:** Players can purchase and use a wide variety of firearms, grenades, and equipment. The choice of weaponry is an essential strategic element in the game.

**Economy:** Players earn in-game money based on their performance in the previous rounds. Money is used to buy weapons and equipment for the next round.

**Maps:** Counter-Strike features a range of maps, each with its own layout and objectives. Popular maps include Dust II, Mirage, Inferno, and more.

**Teamwork:** Successful gameplay in Counter-Strike heavily relies on teamwork, communication, and strategy. Players often coordinate their actions with their teammates to achieve objectives.

**Competitive Play:** Counter-Strike is well-known for its competitive scene, with professional esports tournaments held worldwide.

**Popular Game Modes:**

**Bomb Defusal (de_):** In this mode, Terrorists attempt to plant a bomb at one of the designated bomb sites, while Counter-Terrorists aim to prevent the bomb from being planted or defuse it if it's planted.

**Hostage Rescue (cs_):** In hostage rescue mode, Counter-Terrorists must rescue hostages held by the Terrorists, while the Terrorists aim to prevent the rescues.

**Arms Race:** A fast-paced mode where players cycle through a series of weapons, aiming to be the first to get a kill with each weapon.

**Deathmatch:** A mode where players respawn quickly and aim to get as many kills as possible within a set time limit.

**Wingman:** A 2v2 competitive mode with smaller maps and shorter rounds.

Counter-Strike has evolved over the years with different versions, including Counter-Strike 1.6, Counter-Strike: Source, and Counter-Strike: Global Offensive (CS:GO), which is the most recent and widely played installment as of my last knowledge update in September 2021.

CS:GO is known for its competitive gameplay, professional esports scene, and ongoing updates that have kept the game relevant and enjoyable for players worldwide. It remains a cornerstone of the first-person shooter genre.



In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
df = pd.read_csv("CSGO Round Snapshots.csv")

In [5]:
df

Unnamed: 0,time_left,ct_score,t_score,map,bomb_planted,ct_health,t_health,ct_armor,t_armor,ct_money,...,t_grenade_flashbang,ct_grenade_smokegrenade,t_grenade_smokegrenade,ct_grenade_incendiarygrenade,t_grenade_incendiarygrenade,ct_grenade_molotovgrenade,t_grenade_molotovgrenade,ct_grenade_decoygrenade,t_grenade_decoygrenade,round_winner
0,175.00,0.0,0.0,de_dust2,False,500.0,500.0,0.0,0.0,4000.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
1,156.03,0.0,0.0,de_dust2,False,500.0,500.0,400.0,300.0,600.0,...,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
2,96.03,0.0,0.0,de_dust2,False,391.0,400.0,294.0,200.0,750.0,...,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
3,76.03,0.0,0.0,de_dust2,False,391.0,400.0,294.0,200.0,750.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
4,174.97,1.0,0.0,de_dust2,False,500.0,500.0,192.0,0.0,18350.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
122405,15.41,11.0,14.0,de_train,True,200.0,242.0,195.0,359.0,100.0,...,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,T
122406,174.93,11.0,15.0,de_train,False,500.0,500.0,95.0,175.0,11500.0,...,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,T
122407,114.93,11.0,15.0,de_train,False,500.0,500.0,495.0,475.0,1200.0,...,4.0,3.0,5.0,1.0,0.0,0.0,5.0,0.0,0.0,T
122408,94.93,11.0,15.0,de_train,False,500.0,500.0,495.0,475.0,1200.0,...,5.0,0.0,3.0,0.0,0.0,0.0,4.0,0.0,0.0,T


In [7]:
# How many columns are there

df.columns

Index(['time_left', 'ct_score', 't_score', 'map', 'bomb_planted', 'ct_health',
       't_health', 'ct_armor', 't_armor', 'ct_money', 't_money', 'ct_helmets',
       't_helmets', 'ct_defuse_kits', 'ct_players_alive', 't_players_alive',
       'ct_weapon_ak47', 't_weapon_ak47', 'ct_weapon_aug', 't_weapon_aug',
       'ct_weapon_awp', 't_weapon_awp', 'ct_weapon_bizon', 't_weapon_bizon',
       'ct_weapon_cz75auto', 't_weapon_cz75auto', 'ct_weapon_elite',
       't_weapon_elite', 'ct_weapon_famas', 't_weapon_famas',
       'ct_weapon_g3sg1', 't_weapon_g3sg1', 'ct_weapon_galilar',
       't_weapon_galilar', 'ct_weapon_glock', 't_weapon_glock',
       'ct_weapon_m249', 't_weapon_m249', 'ct_weapon_m4a1s', 't_weapon_m4a1s',
       'ct_weapon_m4a4', 't_weapon_m4a4', 'ct_weapon_mac10', 't_weapon_mac10',
       'ct_weapon_mag7', 't_weapon_mag7', 'ct_weapon_mp5sd', 't_weapon_mp5sd',
       'ct_weapon_mp7', 't_weapon_mp7', 'ct_weapon_mp9', 't_weapon_mp9',
       'ct_weapon_negev', 't_weapon_negev',

## Columns Details 

* time_left - Time remaining in the current round (in seconds).
* map - The map being played (e.g., Dust2, Mirage, Inferno).
* bomb_planted - Indicates if the bomb is planted (0 = No, 1 = Yes).
* round_winner - Indicates the winner of the round (CT for Counter-Terrorists, T for Terrorists).
* ct_score - Current score of the Counter-Terrorists team.
* t_score - Current score of the Terrorists team.
* ct_health - Total health of the Counter-Terrorists.
* t_health - Total health of the Terrorists.
* ct_armor - Total armor of the Counter-Terrorists.
* t_armor - Total armor of the Terrorists.
* ct_money - Total money available for the Counter-Terrorists.
* t_money - Total money available for the Terrorists.
* ct_helmets - Number of Counter-Terrorists with helmets.
* t_helmets - Number of Terrorists with helmets.
* ct_defuse_kits - Number of defuse kits held by Counter-Terrorists.
* ct_players_alive - Number of Counter-Terrorists alive.
* t_players_alive - Number of Terrorists alive.
* ct_weapon_ak47, t_weapon_ak47 - Number of players holding AK-47.
* ct_weapon_aug, t_weapon_aug - Number of players holding AUG.
* ct_weapon_awp, t_weapon_awp - Number of players holding AWP.
* ct_weapon_famas, t_weapon_famas - Number of players holding FAMAS.
* ct_weapon_m4a1s, t_weapon_m4a1s - Number of players holding M4A1-S.
* ct_weapon_m4a4, t_weapon_m4a4 - Number of players holding M4A4.
* ct_weapon_mp5sd, t_weapon_mp5sd - Number of players holding MP5-SD.
* ct_weapon_mp7, t_weapon_mp7 - Number of players holding MP7.
* ct_weapon_mac10, t_weapon_mac10 - Number of players holding MAC-10.
* ct_weapon_p90, t_weapon_p90 - Number of players holding P90.
* ct_weapon_m249, t_weapon_m249 - Number of players holding M249.
* ct_weapon_negev, t_weapon_negev - Number of players holding Negev.
* ct_weapon_nova, t_weapon_nova - Number of players holding Nova.
* ct_weapon_xm1014, t_weapon_xm1014 - Number of players holding XM1014.
* ct_weapon_glock, t_weapon_glock - Number of players holding Glock.
* ct_weapon_deagle, t_weapon_deagle - Number of players holding Desert Eagle.
* ct_weapon_fiveseven, t_weapon_fiveseven - Number of players holding Five-Seven.
* ct_weapon_usps, t_weapon_usps - Number of players holding USP-S.
* ct_weapon_p250, t_weapon_p250 - Number of players holding P250.
* ct_weapon_ssg08, t_weapon_ssg08 - Number of players holding SSG08.
* ct_weapon_g3sg1, t_weapon_g3sg1 - Number of players holding G3SG1.
* ct_weapon_scar20, t_weapon_scar20 - Number of players holding SCAR-20.
* ct_grenade_hegrenade, t_grenade_hegrenade - Number of HE grenades held.
* ct_grenade_flashbang, t_grenade_flashbang - Number of Flashbang grenades held.
* ct_grenade_smokegrenade, t_grenade_smokegrenade - Number of Smoke grenades held.
* ct_grenade_incendiarygrenade, t_grenade_molotovgrenade - Number of Incendiary/Molotov grenades held.
* ct_grenade_decoygrenade, t_grenade_decoygrenade - Number of Decoy grenades held.



In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 122410 entries, 0 to 122409
Data columns (total 97 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   time_left                     122410 non-null  float64
 1   ct_score                      122410 non-null  float64
 2   t_score                       122410 non-null  float64
 3   map                           122410 non-null  object 
 4   bomb_planted                  122410 non-null  bool   
 5   ct_health                     122410 non-null  float64
 6   t_health                      122410 non-null  float64
 7   ct_armor                      122410 non-null  float64
 8   t_armor                       122410 non-null  float64
 9   ct_money                      122410 non-null  float64
 10  t_money                       122410 non-null  float64
 11  ct_helmets                    122410 non-null  float64
 12  t_helmets                     122410 non-nul

## Total 97 columns and dtypes: bool(1), float64(94), object(2) memory usage: 89.8+ MB

In [8]:
# Check How many null values are present.

df.isnull().sum()

time_left                    0
ct_score                     0
t_score                      0
map                          0
bomb_planted                 0
                            ..
ct_grenade_molotovgrenade    0
t_grenade_molotovgrenade     0
ct_grenade_decoygrenade      0
t_grenade_decoygrenade       0
round_winner                 0
Length: 97, dtype: int64

In [9]:
df.isnull().sum().sum()

0

In [11]:
# Check how many duplicates records are there.

df.duplicated().sum()

4962

### Total 4962 Duplicates records are there

In [13]:
# Drop all duplicates records.

df.drop_duplicates(inplace=True)

In [14]:
df

Unnamed: 0,time_left,ct_score,t_score,map,bomb_planted,ct_health,t_health,ct_armor,t_armor,ct_money,...,t_grenade_flashbang,ct_grenade_smokegrenade,t_grenade_smokegrenade,ct_grenade_incendiarygrenade,t_grenade_incendiarygrenade,ct_grenade_molotovgrenade,t_grenade_molotovgrenade,ct_grenade_decoygrenade,t_grenade_decoygrenade,round_winner
0,175.00,0.0,0.0,de_dust2,False,500.0,500.0,0.0,0.0,4000.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
1,156.03,0.0,0.0,de_dust2,False,500.0,500.0,400.0,300.0,600.0,...,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
2,96.03,0.0,0.0,de_dust2,False,391.0,400.0,294.0,200.0,750.0,...,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
3,76.03,0.0,0.0,de_dust2,False,391.0,400.0,294.0,200.0,750.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
4,174.97,1.0,0.0,de_dust2,False,500.0,500.0,192.0,0.0,18350.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
122405,15.41,11.0,14.0,de_train,True,200.0,242.0,195.0,359.0,100.0,...,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,T
122406,174.93,11.0,15.0,de_train,False,500.0,500.0,95.0,175.0,11500.0,...,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,T
122407,114.93,11.0,15.0,de_train,False,500.0,500.0,495.0,475.0,1200.0,...,4.0,3.0,5.0,1.0,0.0,0.0,5.0,0.0,0.0,T
122408,94.93,11.0,15.0,de_train,False,500.0,500.0,495.0,475.0,1200.0,...,5.0,0.0,3.0,0.0,0.0,0.0,4.0,0.0,0.0,T


In [None]:
df.describe()

Unnamed: 0,time_left,ct_score,t_score,ct_health,t_health,ct_armor,t_armor,ct_money,t_money,ct_helmets,...,ct_grenade_flashbang,t_grenade_flashbang,ct_grenade_smokegrenade,t_grenade_smokegrenade,ct_grenade_incendiarygrenade,t_grenade_incendiarygrenade,ct_grenade_molotovgrenade,t_grenade_molotovgrenade,ct_grenade_decoygrenade,t_grenade_decoygrenade
count,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,...,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0,117448.0
mean,94.648097,6.769566,6.828775,408.522623,398.658828,317.751064,302.5623,10000.738625,11492.634613,2.120028,...,1.90532,1.895034,1.583084,1.671778,1.026318,0.020383,0.049605,1.386248,0.027689,0.025117
std,53.224518,4.802249,4.832447,133.833268,141.393442,170.339769,174.118608,11308.757451,12245.826779,1.831718,...,1.769392,1.803067,1.73846,1.835046,1.462231,0.145991,0.231219,1.671632,0.169642,0.162253
min,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,54.91,3.0,3.0,336.0,309.0,195.0,179.0,1300.0,1650.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,94.89,6.0,6.0,500.0,500.0,382.0,353.0,5900.0,7650.0,2.0,...,2.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
75%,114.96,10.0,11.0,500.0,500.0,487.0,471.0,15000.0,18350.0,4.0,...,3.0,3.0,3.0,3.0,2.0,0.0,0.0,3.0,0.0,0.0
max,175.0,32.0,33.0,500.0,600.0,500.0,500.0,80000.0,80000.0,5.0,...,7.0,7.0,6.0,9.0,5.0,3.0,3.0,5.0,3.0,2.0


## Feature  Engineering

In [15]:
# Label Encoding

from sklearn.preprocessing import LabelEncoder

In [17]:
LE = LabelEncoder()

In [18]:
for i in df: 
    if df[i].dtypes == 'object': 
        df[i] = LE.fit_transform(df[i])

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 117448 entries, 0 to 122409
Data columns (total 97 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   time_left                     117448 non-null  float64
 1   ct_score                      117448 non-null  float64
 2   t_score                       117448 non-null  float64
 3   map                           117448 non-null  int32  
 4   bomb_planted                  117448 non-null  bool   
 5   ct_health                     117448 non-null  float64
 6   t_health                      117448 non-null  float64
 7   ct_armor                      117448 non-null  float64
 8   t_armor                       117448 non-null  float64
 9   ct_money                      117448 non-null  float64
 10  t_money                       117448 non-null  float64
 11  ct_helmets                    117448 non-null  float64
 12  t_helmets                     117448 non-null  fl

In [20]:
# spliting the data into dependent and independent data

x = df.drop(columns = ['round_winner'])
y = df['round_winner']

## Standardization

In [23]:
from sklearn.preprocessing import StandardScaler

In [24]:
SS = StandardScaler()

**Standardization is necessary because our dataset contain so many high values and the so many low values. With the help of standardization our model give equal priority.** 

In [29]:
x = SS.fit_transform(x)

In [30]:
from sklearn.model_selection import train_test_split

In [31]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.2,random_state = 42)

In [None]:
# Feature selection   LEt'sa proceed with LDA for feature selection but not extraction as we are having only 2 classes in the target variables

## LDA implementation

In [32]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

In [33]:
LDA = LinearDiscriminantAnalysis()

In [34]:
LDA.fit(x_train,y_train)

In [35]:
LDA.transform(x_test)

array([[-0.43790026],
       [-0.47385472],
       [ 2.04945778],
       ...,
       [-0.16614537],
       [ 0.83720344],
       [ 0.91863186]])

In [36]:
LDA.coef_

array([[ 1.32372281e-01, -1.28978150e-02,  1.80963770e-02,
        -8.74185514e-02,  2.40076034e-01, -4.42476990e-01,
         3.27769273e-01, -6.05125130e-01,  6.33651067e-01,
        -2.14669010e-01,  1.68799592e-01,  2.93023855e-02,
         7.34037417e-02, -2.68519863e-02, -1.96531128e-01,
         3.65538537e-01, -1.93260654e-01,  6.06824785e-01,
        -2.03305620e-01,  3.73297583e-02, -3.08489259e-01,
         2.55585122e-01, -1.13545867e-15, -5.51710293e-04,
         1.05045089e-02, -6.88320630e-03, -2.87143079e-03,
         1.98834906e-02, -1.37645265e-01,  4.58321868e-02,
         6.88186482e-16,  1.92752285e-02, -4.90614330e-02,
         1.34278032e-01,  2.05839755e-02, -1.39874336e-01,
         2.67752772e+14,  5.81547183e-15, -9.79776987e-02,
         4.86793492e-02, -4.53596017e-01,  1.04187273e-01,
        -9.38185221e-03,  1.14825405e-01, -1.60833971e-02,
         7.00108869e-03, -1.39624021e-02,  3.88652522e-02,
         6.27665196e-03, -6.29920886e-03, -1.21072628e-0

**After the LDA.coef_ we get positive and negative value but (+ & - ) sign indicate the direction.**

In [39]:
# Ww need to convert all negtaive values into positive values. 

LDA_coefficients = np.exp(np.abs(LDA.coef_))
LDA_coefficients

  LDA_coefficients = np.exp(np.abs(LDA.coef_))


array([[1.14153321, 1.01298135, 1.01826111, 1.09135337, 1.27134581,
        1.55655803, 1.38786872, 1.83148137, 1.88447839, 1.23945158,
        1.18388286, 1.02973592, 1.07616494, 1.02721575, 1.21717321,
        1.44128999, 1.21319898, 1.8345969 , 1.22544693, 1.03803527,
        1.36136689, 1.29121692, 1.        , 1.00055186, 1.01055987,
        1.00690695, 1.00287556, 1.02008248, 1.1475684 , 1.04689871,
        1.        , 1.0194622 , 1.05028487, 1.14371076, 1.02079729,
        1.15012926,        inf, 1.        , 1.10293819, 1.04988365,
        1.57396202, 1.10980827, 1.009426  , 1.12167758, 1.01621343,
        1.00702565, 1.01406033, 1.03963039, 1.00629639, 1.00631909,
        1.12870688, 1.03983244, 1.        , 1.00699706, 1.018258  ,
        1.01402578, 1.02478867, 1.01828867, 1.        , 1.00566942,
        1.        , 1.01091521, 1.00143482, 1.01114707, 1.22388652,
        1.74695808, 1.08676621, 1.02341686, 1.05282986, 1.09863758,
        1.01639207, 1.00120661, 1.01381084, 1.00

In [40]:
LDA_coefficients = LDA_coefficients.flatten()

In [41]:
LDA_coefficients

array([1.14153321, 1.01298135, 1.01826111, 1.09135337, 1.27134581,
       1.55655803, 1.38786872, 1.83148137, 1.88447839, 1.23945158,
       1.18388286, 1.02973592, 1.07616494, 1.02721575, 1.21717321,
       1.44128999, 1.21319898, 1.8345969 , 1.22544693, 1.03803527,
       1.36136689, 1.29121692, 1.        , 1.00055186, 1.01055987,
       1.00690695, 1.00287556, 1.02008248, 1.1475684 , 1.04689871,
       1.        , 1.0194622 , 1.05028487, 1.14371076, 1.02079729,
       1.15012926,        inf, 1.        , 1.10293819, 1.04988365,
       1.57396202, 1.10980827, 1.009426  , 1.12167758, 1.01621343,
       1.00702565, 1.01406033, 1.03963039, 1.00629639, 1.00631909,
       1.12870688, 1.03983244, 1.        , 1.00699706, 1.018258  ,
       1.01402578, 1.02478867, 1.01828867, 1.        , 1.00566942,
       1.        , 1.01091521, 1.00143482, 1.01114707, 1.22388652,
       1.74695808, 1.08676621, 1.02341686, 1.05282986, 1.09863758,
       1.01639207, 1.00120661, 1.01381084, 1.00103125, 1.03914

In [47]:
feature_names = x.columns
feature_names

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

In [45]:
plt.figure(figsize=(20,10))
plt.bar(feature_names,lda_coefficients)
plt.xticks(rotation=90)
plt.xlabel("Features")
plt.ylabel("Score")
plt.show()

NameError: name 'feature_names' is not defined

<Figure size 2000x1000 with 0 Axes>

In [46]:
df_feature_score = pd.DataFrame({"Feature_names":feature_names,"feature_scores":LDA_coefficients})

NameError: name 'feature_names' is not defined

In [None]:
df_feature_score

Unnamed: 0,Feature_names,feature_scores
0,time_left,1.151192
1,ct_score,1.016513
2,t_score,1.016677
3,map,1.088077
4,bomb_planted,1.278696
...,...,...
91,t_grenade_incendiarygrenade,1.014425
92,ct_grenade_molotovgrenade,1.011119
93,t_grenade_molotovgrenade,1.113260
94,ct_grenade_decoygrenade,1.000134


In [None]:
top_20_values = df_feature_score.nlargest(20,"feature_scores")
# df_feature_score.nlargest(20, "feature_scores") is using the nlargest method to retrieve the top 20 rows from
#  a DataFrame based on a specific column named "feature_scores."

In [None]:
top_20_values

Unnamed: 0,Feature_names,feature_scores
8,t_armor,1.901714
17,t_weapon_ak47,1.868838
7,ct_armor,1.831334
65,t_weapon_sg553,1.765034
40,ct_weapon_m4a4,1.596029
5,ct_health,1.530782
15,t_players_alive,1.464153
6,t_health,1.384316
20,ct_weapon_awp,1.366133
4,bomb_planted,1.278696


In [None]:
imp_cols=top_20_values.index
imp_cols

Int64Index([8, 17, 7, 65, 40, 5, 15, 6, 20, 4, 21, 89, 14, 9, 18, 64, 87, 16,
            35, 10],
           dtype='int64')

In [None]:
x_train

array([[ 1.41686578,  0.04867334, -1.40859784, ..., -0.8280369 ,
        -0.1628247 , -0.15382351],
       [ 0.38227138, -0.99065426, -0.99586095, ...,  2.16418228,
        -0.1628247 , -0.15382351],
       [-1.49969795,  0.88013541,  1.06782348, ..., -0.8280369 ,
        -0.1628247 , -0.15382351],
       ...,
       [ 0.28772178,  1.29586645,  0.44871815, ...,  2.16418228,
        -0.1628247 , -0.15382351],
       [ 0.00689256,  1.29586645,  1.48056037, ...,  1.56573845,
        -0.1628247 , -0.15382351],
       [ 1.51085147,  0.46440437,  0.03598127, ..., -0.8280369 ,
        -0.1628247 , -0.15382351]])

In [None]:
x_train=x_train[:,imp_cols]

In [None]:
x_test=x_test[:,imp_cols]

In [None]:
x_train

array([[-1.73825231, -0.94029438, -0.19849244, ...,  1.2210558 ,
         1.06731306,  0.50792636],
       [ 1.02521179,  1.2541644 ,  0.48371375, ..., -0.47725904,
         1.06731306, -0.68489092],
       [-0.60069328, -0.94029438, -1.28061261, ...,  1.2210558 ,
        -1.99263947, -0.40302656],
       ...,
       [ 0.80689238,  1.2541644 ,  1.07182254, ..., -0.47725904,
         1.06731306,  0.94093538],
       [ 1.08266427,  0.52267814,  1.07182254, ..., -0.47725904,
         1.06731306, -0.47655639],
       [-0.08936625, -0.94029438, -1.86872139, ..., -0.47725904,
        -0.15666795,  0.56103124]])

In [None]:
pd.DataFrame(x_train)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,-1.738252,-0.940294,-0.198492,-0.743743,-0.907498,0.684387,0.614814,0.717496,2.084163,-0.364083,-0.689961,-0.909722,0.619567,3.166033,-0.296341,-0.440699,-1.049887,1.221056,1.067313,0.507926
1,1.025212,1.254164,0.483714,-0.743743,-0.907498,0.684387,0.614814,0.717496,-0.875420,-0.364083,-0.689961,1.814845,0.619567,-0.856265,-0.296341,-0.440699,1.168070,-0.477259,1.067313,-0.684891
2,-0.600693,-0.940294,-1.280613,-0.743743,-0.907498,-2.303445,-0.989928,-0.865139,-0.875420,2.746628,-0.689961,-0.364809,-2.656273,-0.865095,-0.296341,-0.440699,-1.049887,1.221056,-1.992639,-0.403027
3,1.059683,1.254164,1.071823,0.128663,3.174906,0.684387,0.614814,0.717496,-0.875420,-0.364083,1.389839,0.725018,0.619567,-0.622256,-0.296341,-0.440699,1.722559,-0.477259,-0.156668,1.132930
4,-1.738252,-0.940294,-0.692504,-0.743743,-0.907498,0.684387,0.614814,0.717496,2.084163,-0.364083,-0.689961,-0.909722,0.619567,4.945380,-0.296341,-0.440699,-1.049887,-0.477259,1.067313,4.151738
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93953,0.898816,-0.208808,1.071823,1.873476,0.725463,0.684387,0.614814,0.717496,2.084163,-0.364083,1.389839,1.814845,0.619567,-0.776790,1.841857,-0.440699,1.722559,-0.477259,0.455323,0.438482
93954,0.801147,0.522678,-0.716028,1.001070,-0.907498,0.684387,0.614814,0.717496,-0.875420,-0.364083,-0.689961,1.814845,0.619567,0.044447,-0.296341,-0.440699,1.722559,-0.477259,-0.156668,0.128022
93955,0.806892,1.254164,1.071823,1.001070,2.358425,0.684387,0.614814,0.717496,0.604371,-0.364083,-0.689961,1.814845,0.619567,-0.851849,-0.296341,-0.440699,1.722559,-0.477259,1.067313,0.940935
93956,1.082664,0.522678,1.071823,0.128663,-0.907498,0.684387,0.614814,0.717496,-0.875420,-0.364083,-0.689961,1.269932,0.619567,-0.798866,-0.296341,1.340562,1.722559,-0.477259,1.067313,-0.476556


In [None]:
# Model Building
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import *

In [None]:
# logistic model
log_model=LogisticRegression()

In [None]:
log_model.fit(x_train,y_train)

In [None]:
log_pred=log_model.predict(x_test)

In [None]:
y_test

86634     0
29985     0
3264      0
51472     1
44943     0
         ..
40083     1
65879     0
102343    0
25198     0
32293     1
Name: round_winner, Length: 23490, dtype: int64

In [None]:
accuracy_score(y_test,log_pred)

0.7532141336739038

In [None]:
confusion_matrix(y_test,log_pred)

array([[8686, 2814],
       [2983, 9007]])

In [None]:
# Decision Tree implementation
dt_model = DecisionTreeClassifier()

In [None]:
dt_model.fit(x_train,y_train)

In [None]:
dt_pred=dt_model.predict(x_test)

In [None]:
accuracy_score(y_test,dt_pred)

0.8126436781609195

In [None]:
confusion_matrix(y_test,dt_pred)


array([[9288, 2212],
       [2189, 9801]])

In [None]:
9288 +9801

19089

### Random Forest Classifier implementation

In [None]:
rf_model = RandomForestClassifier()

In [None]:
rf_model.fit(x_train,y_train)

In [None]:
rf_pred=rf_model.predict(x_test)

In [None]:
accuracy_score(y_test,rf_pred)

0.8537675606641124

In [None]:
confusion_matrix(y_test,rf_pred)

array([[ 9856,  1644],
       [ 1791, 10199]])

In [None]:
9856 +10199

20055

In [None]:
# Best Model in Random Forest for the given problem