## Inferential Statistics: League of Legends Ranked Games

### Overview

Each game in League of Legends has a *blue* and a *red* side. Although sides are **randomly assigned** for each game, each side has its own differences, which may or may not create advantages before the game even starts.

Blue side has its red buff on the **bottom side** (near its duo lane), whereas red side has its red buff on the **top side** (near its top lane). Certain junglers who rely heavily on mana tend to start at their blue buff, so being randomly assigned to red or blue side will **dictate** where the jungler finishes their early-game jungle route (there will always be variations depending on the jungler and strategy).

This is noteworthy because it affects not only the respective **lanes** but also **objectives** (towers, dragons, Scuttle Crab, Rift Herald, etc.), which will be closer to, or directly on the route of, certain sides and junglers.

### Since a random side assignment alone affects lanes and objectives, I am curious (and even doubtful) whether the proportion of wins on blue side is **equal** to the proportion of wins on red side.


```python
# Import modules/packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import math
import sys
```

```python
df = pd.read_csv('macro_stats.csv')  # Load the game data
winner = df[['winner']].copy()       # Keep a copy of only the necessary column
```

### Sample Proportion Calculation for Blue / Red Side Wins

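```python
# `winner` is coded 1 when blue side won and 2 when red side won
conditions = [
    winner.winner == 1,
    winner.winner == 2,
]
choices = ['Blue', 'Red']  # side label for each condition

# Label each game with its winning side; any unexpected code becomes 'Unknown'
# (a string default avoids mixing the string labels with the integer default 0)
winner['side'] = np.select(conditions, choices, default='Unknown')
```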

```python
wins_b = winner[winner.side == 'Blue']['winner'].count()  # number of wins on blue side
wins_r = winner[winner.side == 'Red']['winner'].count()   # number of wins on red side
```

```python
n = len(winner)  # number of games
```

```python
prop_b = wins_b / n  # win proportion for blue side
prop_r = wins_r / n  # win proportion for red side

print("The sample proportion of wins for blue side is " + str(round(prop_b, 3)))
print("The sample proportion of wins for red side is " + str(round(prop_r, 3)))
```

```
The sample proportion of wins for blue side is 0.506
The sample proportion of wins for red side is 0.494
```


### Central Limit Theorem Conditions

**Random Condition:** As mentioned above, teams are **randomly** assigned blue and red side. Thus, our sample **meets** the random condition of the Central Limit Theorem.

**Normal Condition:** Both sample proportions, when multiplied by the sample size, give counts (blue-side wins and red-side wins) of **at least 10**. Both proportions are close to 0.5, far from the extremes of 0 and 1, and the sample contains a **large number of records**. Thus, the sampling distributions of both sample proportions **meet** the normal condition.

**Independence Condition:** The sample of games is **less than 10%** of all ranked games played overall, so individual games can be treated as independent. Thus, the sampling distributions of both sample proportions **meet** the independence condition.
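
The normal and independence conditions can also be checked in code. Below is a minimal sketch. The function names are hypothetical, and the threshold of 10 comes from the text. `population_size` (the total number of ranked games played) is an assumed input that the notebook does not compute.

```python
def meets_normal_condition(successes: int, n: int, threshold: int = 10) -> bool:
    """Success-failure check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


def meets_independence_condition(n: int, population_size: int) -> bool:
    """10% condition: the sample is less than 10% of the population."""
    return n < 0.10 * population_size


# Hypothetical counts (not from macro_stats.csv): 5,060 blue-side wins in 10,000 games.
# With the notebook's variables this would be meets_normal_condition(wins_b, n).
print(meets_normal_condition(5_060, 10_000))  # True
```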


### Null & Alternative Hypotheses

*Null Hypothesis:* There is **no difference** in win proportion between teams assigned to blue side and teams assigned to red side.

*Alternative Hypothesis:* There **is a difference** in win proportion between teams assigned to blue side and teams assigned to red side.
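
Let $p_b$ and $p_r$ be the true proportions of games won by blue side and red side. The two-sided hypotheses are then:

$$
H_0: p_b - p_r = 0 \qquad\qquad H_a: p_b - p_r \neq 0
$$

If every game has exactly one winner, then $p_r = 1 - p_b$. The hypotheses are therefore equivalent to $H_0: p_b = 0.5$ versus $H_a: p_b \neq 0.5$.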

### Significance Level & Power

**Significance Level**: α = 0.01 

**Power**: We are most worried about making a **Type I error**. Suppose there is actually no difference between blue and red side win proportions, but we wrongly reject the null hypothesis. Players might then leave games before they start whenever they land on the supposedly unfavorable side. As a result, they would **waste time** and **lose unnecessary LP** (ranking points). A strict α of 0.01 guards against this, at the cost of somewhat lower power (a higher chance of a Type II error) for a given sample size.
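
The sketch below illustrates the trade-off between α and power. It computes the power of a two-sided one-sample z-test of H0: p_blue = 0.5 using the normal approximation. It assumes every game has exactly one winner, so blue side's win proportion alone captures the question. The function name, the sample size, and the "true" blue-side win rate are hypothetical values, not taken from `macro_stats.csv`. To keep the block self-contained, the standard library's `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_one_sample_prop(n_games: int, p_true: float, alpha: float, p0: float = 0.5) -> float:
    """Power of a two-sided one-sample z-test of H0: p = p0 (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)             # two-sided critical value
    se_null = math.sqrt(p0 * (1 - p0) / n_games)           # SE of p_hat if H0 is true
    se_true = math.sqrt(p_true * (1 - p_true) / n_games)   # SE of p_hat if p = p_true
    upper = p0 + z_crit * se_null  # reject H0 if p_hat lands above this...
    lower = p0 - z_crit * se_null  # ...or below this
    # Probability that p_hat ~ N(p_true, se_true^2) falls in the rejection region
    above = 1 - STD_NORMAL.cdf((upper - p_true) / se_true)
    below = STD_NORMAL.cdf((lower - p_true) / se_true)
    return above + below


# Hypothetical scenario: 10,000 games and a true blue-side win rate of 0.51
for alpha in (0.05, 0.01):
    print(alpha, round(power_one_sample_prop(10_000, 0.51, alpha), 3))
```

For the same sample size, lowering α from 0.05 to 0.01 reduces power, and a larger sample recovers it. As a sanity check, setting `p_true` equal to `p0` returns a power equal to α.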

### Margin of Error & Confidence Interval

```python
prop_diff = prop_b - prop_r  # sample statistic: difference in win proportions

# standard error of the difference (two-sample formula)
std_error = math.sqrt((prop_b * (1 - prop_b) / n) + (prop_r * (1 - prop_r) / n))

z_score = round(stats.norm.ppf(.995), 3)  # critical z-value for a two-sided 99% confidence interval

print("The difference between the sample proportions of the blue and red side is", round(prop_diff, 3))
```

```
The difference between the sample proportions of the blue and red side is 0.013
```
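
The derivation below assumes every row of `macro_stats.csv` has `winner` equal to 1 or 2. In that case each game has exactly one winner, so `wins_b + wins_r = n`. The two sample proportions are then not independent samples, since $\hat p_r = 1 - \hat p_b$. Substituting that identity gives the standard error of the difference:

$$
\hat p_b - \hat p_r = 2\hat p_b - 1
\quad\Longrightarrow\quad
\operatorname{SE}(\hat p_b - \hat p_r) = 2\sqrt{\frac{\hat p_b(1-\hat p_b)}{n}}
= \sqrt{2}\,\sqrt{\frac{\hat p_b(1-\hat p_b)}{n} + \frac{\hat p_r(1-\hat p_r)}{n}}
$$

The right-hand side is $\sqrt{2}$ times the two-sample formula used for `std_error` above, which assumes the blue and red proportions come from independent samples. The margin of error scales by the same factor.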


```python
moe = z_score * std_error  # margin of error

lower = prop_diff - moe  # lower bound of confidence interval
upper = prop_diff + moe  # upper bound of confidence interval

print("Margin of Error:", round(moe, 3))
print("Confidence Interval:", [round(lower, 3), round(upper, 3)])
```

```
Margin of Error: 0.008
Confidence Interval: [0.005, 0.021]
```


We are **99% confident** that the true difference between blue and red side win proportions is between *0.005 and 0.021*. In other words, if we repeated this sampling procedure many times, about 99% of the intervals built this way would contain the true difference.

Because the entire interval lies above 0, the data suggest a real difference in win proportions, with teams assigned to blue side **more likely to win**.
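
Below is a minimal sketch of a 99% interval for p_blue − p_red computed from blue-side wins alone. It assumes every game has exactly one winner, so `prop_r = 1 - prop_b`, and uses the standard error 2·sqrt(p̂_b(1 − p̂_b)/n) that follows from that identity. The function name and the counts in the usage lines are hypothetical, not from `macro_stats.csv`. `statistics.NormalDist().inv_cdf` stands in for `stats.norm.ppf` to keep the block self-contained.

```python
import math
from statistics import NormalDist


def side_diff_confidence_interval(wins_blue: int, n_games: int, confidence: float = 0.99):
    """Wald CI for p_blue - p_red when every game has exactly one winner."""
    p_blue = wins_blue / n_games
    diff = 2 * p_blue - 1  # equals p_blue - p_red because p_red = 1 - p_blue
    se = 2 * math.sqrt(p_blue * (1 - p_blue) / n_games)      # SE of the difference
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    moe = z_crit * se
    return diff, moe, (diff - moe, diff + moe)


# Hypothetical counts: 5,300 blue-side wins out of 10,000 games
diff, moe, (lower, upper) = side_diff_confidence_interval(5_300, 10_000)
print(round(diff, 3), round(moe, 3), [round(lower, 3), round(upper, 3)])
```

If the interval excludes 0, the difference is significant at the matching two-sided α (here 0.01).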



### Z-Score & P-Value

```python
# Under the null hypothesis, p_blue = p_red = p

p_hat = (wins_b + wins_r) / (n + n)  # pooled proportion assuming the null hypothesis is true

std_error = math.sqrt((2 * p_hat * (1 - p_hat)) / n)  # standard error of the difference assuming the null hypothesis is true

z_score = (prop_diff - 0) / std_error  # z-statistic for the observed difference

p_value = 2 * stats.norm.sf(abs(z_score))  # two-sided p-value, matching the two-sided alternative

print('Z-score:', round(z_score, 2))
print('P-Value:', round(p_value, 4))
```

```
Z-score: 4.14
P-Value: 0.0
```


*Assuming the null hypothesis is true*, the probability of getting a Z-score **at least as extreme** as 4.14 is less than 0.0001 (the p-value rounds to 0.0 at four decimal places).

Since our p-value is **less than** our predetermined significance level of 0.01, we **reject** the null hypothesis and conclude that the difference between blue and red side win proportions is **statistically significant**.
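
Below is a minimal sketch of the equivalent one-sample test of H0: p_blue = 0.5. It assumes every game has exactly one winner. Under that assumption, the z-statistic for `prop_b - prop_r` with null standard error 2·sqrt(0.25/n) reduces to (p̂_b − 0.5)/sqrt(0.25/n). The two-sided p-value 2(1 − Φ(|z|)) is computed with `math.erfc`, which avoids round-off for large |z|. The function name and counts are hypothetical.

```python
import math


def side_win_z_test(wins_blue: int, n_games: int, p0: float = 0.5):
    """Two-sided one-sample z-test of H0: p_blue = p0 (normal approximation)."""
    p_hat = wins_blue / n_games
    se_null = math.sqrt(p0 * (1 - p0) / n_games)  # SE of p_hat if H0 is true
    z = (p_hat - p0) / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 5,300 blue-side wins out of 10,000 games
z, p_value = side_win_z_test(5_300, 10_000)
print(round(z, 2), p_value)
```

Compare this with the pooled statistic `(prop_diff - 0) / math.sqrt(2 * p_hat * (1 - p_hat) / n)` from the cell above, which treats the blue and red proportions as independent samples. For the same counts, this one-sample z-score is smaller by a factor of √2.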

### Conclusion

Firstly, our analysis gives a 99% confidence interval of 0.005 to 0.021 for the true difference between blue and red side win proportions. Thus, we are quite confident that blue side has a real advantage, **BUT** that advantage is **AT MOST** about 2 percentage points.

Furthermore, our analysis **ONLY** looked at the relationship between assigned sides and win proportions. Several other variables are worth examining, such as the win rates of individual champions, or even the success rates of every combination of champions (which may get quite complex).

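Thus, if I were advising a player seeking to climb the ranked ladder, I would not suggest leaving a game just because they were randomly assigned to a particular side. That advice rests on both statistical **AND** practical significance: the blue-side edge is statistically significant, but it is too small to justify the lost time and LP.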