# Overwatch: Predicting Competitive Rank

## Background

Overwatch Origins Edition was released in May 2016 for Windows, PS4, and Xbox One by Blizzard Entertainment. Blizzard is known for its Massively Multiplayer Online (MMO) and strategy-based games, like World of Warcraft, Heroes of the Storm, and StarCraft. Overwatch is a team-based first-person shooter with 26 characters split among four group classifications: Offense, Defense, Tank, and Support. The game is built around teamwork: teams capture objectives, transport cargo, or hold capture points. The game's design offers a variety of modes to suit a player's preferred style: Arcade mode, which presents challenges for a player to play solo or as a team; Quickplay mode, where players are queued with other players in non-ranked matches for the map's objectives; and Competitive mode, where players are queued with other players based on Skill Rating for map objectives, and which is only available during seasons.

Since its release, the game has grown in popularity, with the most recent player count at 35 million in October 2017, according to statista.com (2017) and pcgamesn.com (2017). Overwatch entered the esports community with the launch of the Overwatch League (overwatchleague.com/en-us/) in 2018, showcasing the top competitive players in the Overwatch community. Competitive players are ranked through the game's Skill Rating system in Competitive matches, ranging from 1 to 4000+ (overwatch.gamepedia.com/Competitive_Play), with Bronze being the lowest rank and Grandmaster being the highest (outside of the top 500 for the region).

With the demonstrated interest from the gaming community in the Overwatch League, at 10 million viewers in the League's first week (dotesports.com/overwatch/news/overwatch-league-10-million-viewers-20274), Competitive Overwatch is a popular game mode. As a regular gamer and an Overwatch fan, I regularly participate in Competitive matches, and I am working to rise in the ranks with each passing season. During my time playing Overwatch Competitive, I have tried to maintain a balance in the number of characters I play well, with the hope that this will help me raise my Skill Rating. The Skill Rating system has been a topic in numerous gaming threads on Reddit and the Overwatch forums, as the community tries to determine the best way to rise in the ranks. My goal is to determine the best way to increase a player's Skill Rating, and thus their Competitive Rank, by analyzing players' Overwatch console data.

---
## Problem Statement

**Does having the flexibility to play 2-3 characters increase your chances of a higher Skill Rating (SR), or should a player focus on 1 character and master them for a higher SR? Can I predict a player's SR from the number of characters they have played, by analyzing the total number of games played and the total wins/losses/ties for each character, over a span of three seasons?**

---
### Hypothesis

I believe that being able to play 2-3 characters well lets a player adapt to a competitive match's team composition by filling the group classification that match needs. Creating a balanced team increases the chances of winning the match, which gives the player a higher SR throughout their competitive season.

By analyzing a player's game data, I will be able to predict their SR for a future season based on the number of characters they have played and the total games played with each character over a season.


---
## Data
The data for this project was obtained by contacting Blizzard Entertainment through Omnicmeta.com. The data was provided for Seasons 5, 6, and 7 for about 100k players on console (Xbox and PS4 combined). The data columns contain the player id (numerically randomized), SR, and wins/losses/ties for each hero (26 x 3).

The data is classified as confidential by Blizzard and therefore cannot be made public. Using a Python script, I have created a dummy data set with a similar structure, to represent the data preparation work I will be performing on the data obtained from Omnicmeta. The final results will reflect the data from Omnicmeta.
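
The script that produced the dummy file is not shown here. As a rough illustration only, a stand-in with the same general shape could be generated along these lines. The column naming scheme (`<Hero>_win`, `<Hero>_loss`, `<Hero>_tie`), the three-hero subset, the value ranges, and the function name `make_dummy_data` are all assumptions made for this sketch, not details of the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical hero subset and column suffixes; the real file's names may differ.
HEROES = ['Genji', 'Mercy', 'Reinhardt']
SUFFIXES = ['win', 'loss', 'tie']


def make_dummy_data(n_players=5, seed=0):
    """Build a small random table shaped like the data set described above."""
    rng = np.random.RandomState(seed)
    data = {
        'player': np.arange(1, n_players + 1),
        # Skill Rating drawn uniformly from 1 to 5000 (an assumed range).
        'skillrating': rng.randint(1, 5001, size=n_players),
    }
    hero_columns = []
    for hero in HEROES:
        for suffix in SUFFIXES:
            name = '{}_{}'.format(hero, suffix)
            # Games per outcome drawn from 0 to 49 (an assumed range).
            data[name] = rng.randint(0, 50, size=n_players)
            hero_columns.append(name)
    return pd.DataFrame(data, columns=['player', 'skillrating'] + hero_columns)


dummy = make_dummy_data()
print(dummy.shape)
```

Fixing the seed keeps the stand-in reproducible, so the preparation steps can be rerun against identical rows.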

---
### Data Import

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
%matplotlib inline
import seaborn as sns
sns.set(font_scale=1.5)
import matplotlib.pyplot as plt

#Defining constants
TG = 'totGames'
SR = 'skillrating'
PLAYER = 'player'
TC = 'Main_Char'
OFF = 'Offense'
DEF = 'Defense'
TAN = 'Tank'
SUP = 'Support'

# Path to the csv file (raw string, so the backslashes are not treated as escape sequences)
url = r'C:\Users\Chupi\Documents\GADataScience\Project_Work\OverwatchRankingsProject\owdata_dummy.csv'
# Create a dataframe from the dummy data
OW = pd.read_csv(url, skipinitialspace = True)
OW.head()



---
### Processing the data
Because the data is so large, we need to reduce the amount of information by creating new columns: the total number of games played in the season, each character's games as a share of the total games played, and the total number of characters played in the season. The ultimate goal is a dataframe with 5 columns:

* Player
* SR
* Rank status
* Total Games Played
* Total Characters played

This will require several steps. First, we want to find the total number of games played by each player. This is achieved by building a list of the per-character columns in the dataframe and summing them across each row. The resulting total is then added as a new column at the end of the dataframe.

---

In [None]:
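# Adding a new column totGames that is the sum of the character columns
col_list = list(OW.columns.values)

# Assumes the first two columns are player and skillrating and every later column is a per-character count
col_new = col_list[2:]
OW[TG] = OW[col_new].sum(axis = 1)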

OW.head()

---

Here, we create a new empty dataframe to hold the summed games for each character. Each new column is named after the existing column, minus anything after the '_', so the win, loss, and tie columns for a character collapse into one.
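
To make the collapse concrete, here is a small sketch on a toy frame. The column names (`Genji_win` and so on) and the values are hypothetical, and the grouping is done on the transposed frame as one possible vectorized way to get the same per-character totals.

```python
import pandas as pd

# Toy data: two players, two heroes, three outcome columns per hero (hypothetical names).
toy = pd.DataFrame({
    'Genji_win': [3, 0], 'Genji_loss': [2, 1], 'Genji_tie': [0, 0],
    'Mercy_win': [5, 4], 'Mercy_loss': [1, 6], 'Mercy_tie': [1, 0],
})

# Transpose so the column names become the index, group rows by the text
# before the first '_', sum each group, and transpose back.
per_hero = toy.T.groupby(lambda name: name.split('_')[0]).sum().T

print(per_hero)
```

Each row of `per_hero` now holds one total per hero for that player, which is the shape the later ratio step needs.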

---

In [None]:
#Create a blank df
OW_new = pd.DataFrame()

#redefine col_list
col_list = list(OW.columns.values)
#Extract the character name: split the column name on '_' and keep the first element
for name in col_list:
    char_name = name.split('_')[0]
    column = OW[name]
    #If this character is not yet a column in the new dataframe, add it.
    if char_name not in OW_new.columns:
        OW_new[char_name] = column
    #If it already exists, add this column to the running sum.
    else:
        OW_new[char_name] = OW_new[char_name] + column
OW_new.head()

To add more data to predict with, we will break down the total characters played into their respective classification categories (https://overwatch.gamepedia.com/Heroes) as follows:

Offense:
* Doomfist
* Genji
* McCree
* Pharah
* Reaper
* Soldier: 76
* Sombra
* Tracer

Defense:
* Bastion
* Hanzo
* Junkrat
* Mei
* Torbjörn
* Widowmaker

Tank:
* D.Va
* Orisa
* Reinhardt
* Roadhog
* Winston
* Zarya

Support:
* Ana
* Lúcio
* Mercy
* Moira
* Symmetra
* Zenyatta

In [None]:
# Reorder OW_new so the id columns come first, followed by the characters grouped by class
col_list = ['player', 'skillrating', 'totGames','Doomfist', 'Genji', 'McCree', 'Pharah', 'Reaper', 'Soldier76', 
            'Sombra', 'Tracer', 'Bastion', 'Hanzo', 'Junkrat', 'Mei', 'Torbjorn', 'Widowmaker', 
            'D.Va',  'Orisa', 'Reinhardt', 'Roadhog', 'Winston', 'Zarya', 'Ana', 'Lucio', 'Mercy', 'Symmetra',
             'Zenyatta','Moira']
OW_new = OW_new[col_list]
OW_new.head()
# grouping the characters into their classification category
Offense = col_list[3:11]
Defense = col_list[11:17] 
Tank = col_list[17:23]
Support = col_list[23:29]

# Copy OW_new so the new columns are added to an independent dataframe, not a view
OW_new2 = OW_new[col_list].copy()

# Count, per player, the characters in each class with a non-zero number of games
OW_new2[OFF] = (OW_new[Offense] != 0).sum(axis = 1)
OW_new2[DEF] = (OW_new[Defense] != 0).sum(axis = 1)
OW_new2[TAN] = (OW_new[Tank] != 0).sum(axis = 1)
OW_new2[SUP] = (OW_new[Support] != 0).sum(axis = 1)

OW_new2.head()

---
Now we will divide each character column by the total number of games played to create a new column holding the share of the season's games played on that character (the character ratio). After this, we will want to determine the most played characters, based on the character ratio.
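
As a small worked example of the ratio, the sketch below divides each character column by the row's total and then picks the character with the largest share. The toy values are made up, and using `idxmax` to name the most played character is just one possible way to read the ratios.

```python
import pandas as pd

# Toy data: per-player game totals and per-character game counts (made-up values).
toy = pd.DataFrame({
    'totGames': [10, 20],
    'Genji': [7, 5],
    'Mercy': [3, 15],
})

# Divide every character column by that row's total number of games.
ratios = toy[['Genji', 'Mercy']].div(toy['totGames'], axis=0)

# The column with the largest ratio in each row is that player's most played character.
most_played = ratios.idxmax(axis=1)

print(ratios)
print(most_played)
```

Here the first player spends 70% of their games on Genji and the second spends 75% on Mercy.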

---

In [None]:
#Create a blank df
OW_ratio = pd.DataFrame()

#redefine col_list
col_list = list(OW_new.columns.values)

#Add the columns player, SR and totalGames to the OW_ratio df
OW_ratio[PLAYER] = OW_new[PLAYER]
OW_ratio[SR] = OW_new[SR]
OW_ratio[TG] = OW_new[TG]
OW_ratio[OFF] = OW_new2[OFF]
OW_ratio[DEF] = OW_new2[DEF]
OW_ratio[TAN] = OW_new2[TAN]
OW_ratio[SUP] = OW_new2[SUP]

#Finding the character play time ratio (character games divided by total games) for the character columns at index 3 through 28
for name in col_list[3:29]:
    char_total = OW_new[name]
    total_games = OW_new[TG]
    OW_ratio[name] = char_total.div(total_games, axis = 0)
    
OW_ratio.head()

---

The next step is to count the characters whose combined share reaches 80% of the total games played. To make sure only characters with meaningful play time are counted, I set a parameter that excludes from the summation any character whose share is less than or equal to 5%.
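
One reading of this step is to count the fewest characters needed to reach the 80% mark, which means ranking the shares from largest to smallest before accumulating them. The sketch below illustrates that reading on a toy frame. The function name `count_main_characters` and the toy values are hypothetical, and this is offered as a variant to consider rather than as the notebook's method.

```python
import pandas as pd


def count_main_characters(shares, min_share=0.05, target=0.80):
    """Count characters, largest share first, until their shares reach the target."""
    count = 0
    running_total = 0.0
    for share in sorted(shares, reverse=True):
        if share <= min_share:
            break  # every remaining share is at or below the floor too
        count += 1
        running_total += share
        if running_total >= target:
            break
    return count


# Toy character ratios for two players (made-up values; each row sums to 1).
toy = pd.DataFrame({
    'Genji': [0.50, 0.04],
    'Mercy': [0.35, 0.90],
    'Reinhardt': [0.15, 0.06],
})
toy['Main_Char'] = toy.apply(count_main_characters, axis=1)

print(toy)
```

The first player needs two characters (50% + 35%) to pass 80%, while the second reaches it with Mercy alone.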

---

In [None]:
# Create a new empty df
OW = pd.DataFrame()

# Pulling columns to new df
OW[PLAYER] = OW_ratio[PLAYER]
OW[SR] = OW_ratio[SR]
OW[TG] = OW_ratio[TG]
OW[OFF] = OW_ratio[OFF]
OW[DEF] = OW_ratio[DEF]
OW[TAN] = OW_ratio[TAN]
OW[SUP] = OW_ratio[SUP]

# Count the characters above the 5% floor, in column order, until their shares sum to at least 80%.
min_value = 0.05
max_tot = 0.80
#creating an empty list for the results to populate later
col_counter = []
for index, row in OW_ratio.iterrows():
    counter = 0
    running_total = 0  # named so it does not shadow the built-in sum()
    for cell in row[7:33]:
        if cell > min_value:
            counter = counter + 1
            running_total = running_total + cell
            if running_total >= max_tot:
                break
    # filling in the empty list with the results of counter
    col_counter.append(counter)
# creating a new column that is set to the total count
OW[TC] = col_counter

OW.head()      


---
To build the logistic regression model, I will create a new column that maps each SR value to its corresponding Rank, per https://overwatch.gamepedia.com/Competitive_Play (20 January 2018).

* Bronze - 1-1499 SR

* Silver - 1500-1999 SR

* Gold - 2000-2499 SR

* Platinum - 2500-2999 SR

* Diamond - 3000-3499 SR

* Master - 3500-3999 SR

* Grandmaster - 4000+
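
The same SR-to-rank mapping can be sketched with `pandas.cut`, which assigns each value to a labeled bin. This is a minimal illustration on made-up SR values, assuming integer SR; the bin edges simply restate the thresholds listed above.

```python
import numpy as np
import pandas as pd

# Upper edge of each rank; with the default right=True every bin is (low, high].
bins = [0, 1499, 1999, 2499, 2999, 3499, 3999, np.inf]
labels = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond', 'Master', 'Grandmaster']

# Made-up SR values chosen to sit on and around the boundaries.
sr = pd.Series([1, 1499, 1500, 2750, 3999, 4000, 4321])
ranks = pd.cut(sr, bins=bins, labels=labels)

print(ranks.tolist())
```

Boundary values are worth checking explicitly: 1499 should land in Bronze, 1500 in Silver, and exactly 4000 in Grandmaster.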


---

In [None]:
# Adding a new column for categorizing the SR into a string Rank. First create an empty list for the results of the for loop

col_rank = []
for value in OW[SR]:
    if value <= 1499:
        rank = 'Bronze'
    elif value <= 1999:
        rank = 'Silver'
    elif value <= 2499:
        rank = 'Gold'
    elif value <= 2999:
        rank = 'Platinum'
    elif value <= 3499:
        rank = 'Diamond'
    elif value <=3999:
        rank = 'Master'
    else:  # 4000 and above
        rank = 'Grandmaster'
    # filling in the empty list with the results of rank
    col_rank.append(rank)
# adding rank column to the df
OW['rank'] = col_rank

OW.head()

In [None]:
feature_col = [SR, TC, TG, OFF, DEF, TAN, SUP]
OW[feature_col].describe()

In [None]:
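# setting up Logistic Regression Model
logreg = LogisticRegression()

# Creating an X feature matrix and a y response.
# SR is left out of X because the rank being predicted is derived directly from it.
X = OW[[col for col in feature_col if col != SR]]
y = OW['rank']

# fitting the model
logreg.fit(X, y)
OW['rank_pred'] = logreg.predict(X)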

In [None]:
#import standard scaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit(X)
X_scaled = scaler.transform(X)

# check that the first two features now have mean ~0 and standard deviation ~1
print(X_scaled[:, 0].mean())
print(X_scaled[:, 0].std())
print(X_scaled[:, 1].mean())
print(X_scaled[:, 1].std())

In [None]:
# split into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# standardize X_train
scaler.fit(X_train)
X_trained_scaled = scaler.transform(X_train)

In [None]:
# check that it standardized correctly
print(X_trained_scaled[:, 0].mean())
print(X_trained_scaled[:, 0].std())
print(X_trained_scaled[:, 1].mean())
print(X_trained_scaled[:, 1].std())

In [None]:
# standardize X_test
X_test_scaled = scaler.transform(X_test)

#Check that it worked
print(X_test_scaled[:, 0].mean())
print(X_test_scaled[:, 0].std())
print(X_test_scaled[:, 1].mean())
print(X_test_scaled[:, 1].std())

In [None]:
#checking for accuracy on train_test_split data

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5)

knn.fit(X_trained_scaled, y_train)
y_pred_class = knn.predict(X_test_scaled)

from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred_class))

In [None]:
#checking for accuracy on my data
knn = KNeighborsClassifier(n_neighbors = 5)
from sklearn.model_selection import cross_val_score

cross_val_score(knn, X, y, cv=5, scoring='accuracy').mean()

In [None]:
cross_val_score(knn, X_scaled, y, cv = 5, scoring='accuracy').mean()
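
The scaled cross-validation above standardizes the features using every row before the folds are split, so each validation fold has already influenced the scaler. One way to keep the scaling inside each fold is to wrap the scaler and the classifier in a scikit-learn pipeline. The sketch below shows the idea on synthetic data; `X_demo` and `y_demo` are made-up stand-ins, not the Overwatch features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three features on very different scales.
rng = np.random.RandomState(1)
X_demo = rng.rand(100, 3) * np.array([1000.0, 10.0, 1.0])
# The label depends only on the second feature.
y_demo = (X_demo[:, 1] > 5).astype(int)

# The scaler is refit on the training part of every fold before KNN is trained.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring='accuracy')

print(scores.mean())
```

Because the pipeline is a single estimator, the same object can also be passed to `train_test_split`-style evaluation without a separate scaling step.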



In [None]:
# fitting the model with X_scaled

logreg.fit(X_scaled, y)
OW['rank_pred'] = logreg.predict(X_scaled)
OW.head()

In [None]:
# store the predicted probabilities of the class at index 1 of logreg.classes_
OW['rank_predict_prob'] = logreg.predict_proba(X_scaled)[:, 1]
OW.head()

---
### Cited Sources:
* Overwatch player statistics: https://www.pcgamesn.com/overwatch/overwatch-sales-numbers, https://www.statista.com/statistics/618035/number-gamers-overwatch-worldwide/
* Overwatch character data: https://overwatch.gamepedia.com/Heroes