
Test 1:

H0 (Null Hypothesis): Deck composition has no significant impact on match outcomes.

H1 (Alternative Hypothesis): Certain deck compositions significantly increase the probability of winning.

In [1]:
import pandas as pd
from scipy.stats import chi2_contingency

# Load the match data
matches = pd.read_excel("matches.xlsx")

# Find the 10 most common decks (matched by exact string, so card order matters)
top_decks = matches['playerDeck'].value_counts().head(10)

# Label each deck as one of the top 10 or 'other'
def label_deck(deck_str):
    return deck_str if deck_str in top_decks.index else 'other'

matches['deckGroup'] = matches['playerDeck'].apply(label_deck)

# Create a contingency table: deckGroup vs match outcome (win/loss)
contingency_table = pd.crosstab(matches['deckGroup'], matches['win'])

# Perform the Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

# Show results
print("Contingency Table (Deck vs Win):")
print(contingency_table)
print(f"\nChi-Square Statistic: {chi2_stat:.4f}")
print(f"Degrees of Freedom: {dof}")
print(f"P-Value: {p_value:.4f}")

# Hypothesis test result
alpha = 0.05
if p_value < alpha:
    print("\nConclusion: Deck composition has a statistically significant effect on match outcome. (Reject H0)")
else:
    print("\nConclusion: No significant effect of deck composition on match outcome was found. (Fail to reject H0)")


Contingency Table (Deck vs Win):
win                                                 False  True 
deckGroup                                                       
Bats,Mortar,Arrows,Boss Bandit,Cannon Cart,Ice ...      2      0
Goblin Barrel,Royal Recruits,Tesla,Dart Goblin,...      2      1
Goblin Barrel,Valkyrie,Dart Goblin,Goblin Gang,...      9     11
Hunter,Barbarians,Wizard,Bats,Witch,Miner,Skele...      1      1
Hunter,Barbarians,Wizard,Dart Goblin,Bats,Miner...     12     14
Hunter,Valkyrie,Skeleton King,Arrows,Firecracke...     10      9
Hunter,Valkyrie,Skeleton King,Arrows,Goblin Gan...      3      2
Hunter,Valkyrie,Skeleton King,Arrows,Ice Wizard...      1      2
Mega Knight,Witch,Electro Giant,P.E.K.K.A,Wizar...      5      5
Valkyrie,Hog Rider,Firecracker,Boss Bandit,Bats...     14     16
other                                                   8      4

Chi-Square Statistic: 4.7106
Degrees of Freedom: 10
P-Value: 0.9097

Conclusion: No significant effect of deck composition on match outcome was found. (Fail to reject H0)
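Many cells in the contingency table above are small, because several deck groups appear in only two or three matches. Pearson's chi-square p-value relies on a large-sample approximation. A common rule of thumb, often attributed to Cochran, treats that approximation as reliable only when no more than about 20% of cells have an expected count below 5 and none falls below 1. The sketch below re-runs `chi2_contingency` on the loss/win counts printed above and measures how many cells fall short. The counts are copied from the table; the check itself is an addition, not part of the original notebook.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Loss/win counts copied from the contingency table printed above
# (one row per deck group, columns = [False, True]).
observed = np.array([
    [2, 0], [2, 1], [9, 11], [1, 1], [12, 14], [10, 9],
    [3, 2], [1, 2], [5, 5], [14, 16], [8, 4],
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Rule-of-thumb check: share of cells whose expected count is below 5,
# plus the number of cells whose expected count is below 1.
share_small = (expected < 5).mean()
n_below_one = int((expected < 1).sum())

print(f"Chi-square: {chi2_stat:.4f}, dof: {dof}, p-value: {p_value:.4f}")
print(f"Cells with expected count < 5: {share_small:.0%}")
print(f"Cells with expected count < 1: {n_below_one}")
```

On these counts, 11 of the 22 cells (50%) have an expected count below 5, and two cells are below 1. The p-value of 0.9097 is therefore only approximate. Common remedies are merging rare deck groups into `other` or computing the p-value by permutation instead of from the chi-square distribution.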

Test 2:

H0 (Null Hypothesis): Average elixir cost does not affect match performance.

H1 (Alternative Hypothesis): Extremely high or low elixir costs impact match performance significantly.
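H1 is about extreme values, but a comparison of mean elixir cost between wins and losses can miss that pattern. If both very cheap and very expensive decks underperform, the two group means may still be close. One way to target the extremes directly is to bin matches into low, middle, and high elixir bands and then run a chi-square test of band against outcome. The following is a sketch under stated assumptions:

- The helper `elixir_band_test` is hypothetical.
- The 20th/80th percentile cut-offs are illustrative choices, not part of the original analysis.
- The demo runs on randomly generated data rather than `matches.xlsx`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def elixir_band_test(df, col="playerAvgElixir", low_q=0.2, high_q=0.8):
    """Chi-square test of match outcome across low/middle/high elixir bands.

    The quantile cut-offs (low_q, high_q) are illustrative assumptions.
    """
    lo, hi = df[col].quantile([low_q, high_q])
    # Label each match by band; values between the cut-offs are "middle".
    band = pd.Series(
        np.select([df[col] <= lo, df[col] >= hi], ["low", "high"], default="middle"),
        index=df.index,
        name="elixirBand",
    )
    table = pd.crosstab(band, df["win"])
    chi2, p, dof, _ = chi2_contingency(table)
    return table, chi2, p, dof


# Synthetic demo data only; in the notebook, pass `matches` instead.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "playerAvgElixir": rng.uniform(2.6, 5.2, size=200),
    "win": rng.random(200) < 0.5,
})
table, chi2, p, dof = elixir_band_test(demo)
print(table)
print(f"Chi-square: {chi2:.4f}, dof: {dof}, p-value: {p:.4f}")
```

With only a few dozen matches in each tail band, check the expected counts before trusting the resulting p-value.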

In [2]:
import pandas as pd
from scipy.stats import ttest_ind, shapiro

# Load your data
matches = pd.read_excel("matches.xlsx")

# Separate the average elixir costs by match outcome
wins = matches[matches['win'] == True]['playerAvgElixir']
losses = matches[matches['win'] == False]['playerAvgElixir']

# Step 1: Check normality using Shapiro-Wilk test
shapiro_win_p = shapiro(wins).pvalue
shapiro_loss_p = shapiro(losses).pvalue

print("Shapiro-Wilk p-value (Win group):", shapiro_win_p)
print("Shapiro-Wilk p-value (Loss group):", shapiro_loss_p)

# Step 2: Perform Welch's t-test (does not assume equal variances)
t_stat, p_value = ttest_ind(wins, losses, equal_var=False)

# Step 3: Show results
print("\nAverage elixir cost (wins):", round(wins.mean(), 3))
print("Average elixir cost (losses):", round(losses.mean(), 3))
print("T-statistic:", round(t_stat, 4))
print("P-value:", round(p_value, 4))

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("\nConclusion: The difference in average elixir cost is statistically significant (Reject H0).")
else:
    print("\nConclusion: No statistically significant difference in elixir cost between win/loss (Fail to reject H0).")


Shapiro-Wilk p-value (Win group): 7.921260728095923e-09
Shapiro-Wilk p-value (Loss group): 4.68307206338889e-09

Average elixir cost (wins): 3.951
Average elixir cost (losses): 3.968
T-statistic: -0.1951
P-value: 0.8456

Conclusion: No statistically significant difference in elixir cost between win/loss (Fail to reject H0).
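Both Shapiro-Wilk p-values are far below 0.05, so neither group's average elixir cost looks normally distributed. Welch's t-test is often reasonably robust to this at moderate sample sizes. Still, a rank-based test such as SciPy's `mannwhitneyu` makes no normality assumption and is a natural cross-check. The sketch below assumes a hypothetical helper `compare_groups` and uses synthetic, skewed demo data. In the notebook you would pass the existing `wins` and `losses` Series.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def compare_groups(wins, losses):
    """Return (Welch t-test p-value, Mann-Whitney U p-value) for two samples."""
    welch = ttest_ind(wins, losses, equal_var=False)
    # Two-sided rank-based test: no normality assumption on either group.
    mwu = mannwhitneyu(wins, losses, alternative="two-sided")
    return welch.pvalue, mwu.pvalue


# Synthetic, right-skewed demo data; in the notebook, use `wins` and `losses`.
rng = np.random.default_rng(1)
demo_wins = 2.6 + rng.gamma(shape=2.0, scale=0.7, size=60)
demo_losses = 2.6 + rng.gamma(shape=2.0, scale=0.7, size=70)
welch_p, mwu_p = compare_groups(demo_wins, demo_losses)
print(f"Welch p-value: {welch_p:.4f}, Mann-Whitney U p-value: {mwu_p:.4f}")
```

If the two p-values lead to the same conclusion, the non-normality flagged by Shapiro-Wilk is unlikely to be driving the result.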


Test 3:

H0 (Null Hypothesis): The difference between the player's and the opponent's average elixir cost does not affect the match outcome.

H1 (Alternative Hypothesis): The average elixir cost difference (player minus opponent) significantly affects the probability of winning.

In [3]:
import pandas as pd
from scipy.stats import ttest_ind

# Load your dataset
df = pd.read_excel("matches.xlsx")

# Compute the elixir difference between player and opponent
df["elixir_diff"] = df["playerAvgElixir"] - df["opponentAvgElixir"]

# Split the dataset into wins and losses
won = df[df["win"] == True]["elixir_diff"]
lost = df[df["win"] == False]["elixir_diff"]

# Perform Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = ttest_ind(won, lost, equal_var=False)

# Print the results
print("Elixir Cost Difference Hypothesis Test")
print("---------------------------------------")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Mean Elixir Diff (Win): {won.mean():.3f}")
print(f"Mean Elixir Diff (Loss): {lost.mean():.3f}")

# Interpretation
if p_value < 0.05:
    print("\nResult: Reject H0 — elixir difference significantly affects match outcomes.")
else:
    print("\nResult: Fail to reject H0 — no significant effect of elixir difference on outcome.")


Elixir Cost Difference Hypothesis Test
---------------------------------------
t-statistic: 1.0347
p-value: 0.3028
Mean Elixir Diff (Win): 0.434
Mean Elixir Diff (Loss): 0.320

Result: Fail to reject H0 — no significant effect of elixir difference on outcome.
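H1 here is stated in terms of the probability of winning, and a p-value alone says nothing about how strong the association is. One simple effect-size measure is the point-biserial correlation between the 0/1 win indicator and `elixir_diff`. Its p-value equals that of a Student's (equal-variance) t-test, so it will be close to, but not identical to, the Welch p-value reported above. `elixir_diff_effect` is a hypothetical helper, and the demo uses synthetic data.

```python
import numpy as np
from scipy.stats import pointbiserialr


def elixir_diff_effect(win, elixir_diff):
    """Point-biserial correlation between a 0/1 outcome and a continuous score."""
    r, p = pointbiserialr(np.asarray(win, dtype=int), np.asarray(elixir_diff, dtype=float))
    return r, p


# Synthetic demo; in the notebook, call elixir_diff_effect(df["win"], df["elixir_diff"]).
rng = np.random.default_rng(2)
demo_win = rng.random(150) < 0.5
demo_diff = rng.normal(loc=0.35, scale=0.6, size=150)
r, p = elixir_diff_effect(demo_win, demo_diff)
print(f"Point-biserial r: {r:.3f}, p-value: {p:.4f}")
```

Values of |r| near 0 indicate that the elixir difference explains very little of the variation in outcomes, whatever the p-value.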


Test 4:

H0 (Null Hypothesis): The number of shared cards between the player and opponent has no effect on the match outcome.

H1 (Alternative Hypothesis): Sharing more cards with the opponent significantly affects the probability of winning.

In [11]:
import pandas as pd
from scipy.stats import ttest_ind

# Load the dataset
df = pd.read_excel("matches.xlsx")

# Split each comma-separated deck string into a list of card names
df["playerCards"] = df["playerDeck"].str.split(",")
df["opponentCards"] = df["opponentDeck"].str.split(",")

# Count how many cards are shared between player and opponent
def count_common(row):
    return len(set(row["playerCards"]) & set(row["opponentCards"]))

df["common_cards"] = df.apply(count_common, axis=1)

# Split by win/loss and run Welch's t-test
win_common = df[df["win"] == True]["common_cards"]
loss_common = df[df["win"] == False]["common_cards"]

t_stat, p_val = ttest_ind(win_common, loss_common, equal_var=False)

# Print results
print("Hypothesis Test: Number of Shared Cards")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f}")
print(f"Mean Shared Cards (Win): {win_common.mean():.3f}")
print(f"Mean Shared Cards (Loss): {loss_common.mean():.3f}")

if p_val < 0.05:
    print("Result: Reject H0 — Shared cards significantly affect outcome.")
else:
    print("Result: Fail to reject H0 — No significant effect of shared cards.")


Hypothesis Test: Number of Shared Cards
t-statistic: -2.5736
p-value: 0.0112
Mean Shared Cards (Win): 0.877
Mean Shared Cards (Loss): 1.433
Result: Reject H0 — Shared cards significantly affect outcome.
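In this sample, wins had fewer shared cards on average than losses (0.877 vs 1.433). Because a Clash Royale deck holds eight cards, the shared-card count is a small integer between 0 and 8, so the t-distribution is only an approximation here. A permutation test makes no distributional assumption: it shuffles the win/loss labels many times and asks how often a difference in means at least as large as the observed one arises by chance. The sketch below is a plain NumPy version. The helper name `permutation_test_mean_diff` and the synthetic Poisson demo data are illustrative assumptions.

```python
import numpy as np


def permutation_test_mean_diff(x, y, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_resamples):
        # Randomly reassign observations to the two groups, keeping group sizes.
        perm = rng.permutation(pooled)
        diff = perm[: len(x)].mean() - perm[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value


# Synthetic shared-card counts; in the notebook, pass win_common and loss_common.
rng = np.random.default_rng(3)
demo_win = rng.poisson(0.9, size=60)
demo_loss = rng.poisson(1.4, size=70)
obs, p = permutation_test_mean_diff(demo_win, demo_loss)
print(f"Observed difference in means: {obs:.3f}, permutation p-value: {p:.4f}")
```

Either test only shows an association. Fewer shared cards may simply co-occur with other factors that drive wins.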


Test 5:

H0 (Null Hypothesis): The opponent’s average elixir cost has no impact on the match result.

H1 (Alternative Hypothesis): A lower opponent average elixir cost significantly increases the chance of winning.
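This H1 is directional, since it predicts lower opponent elixir in won matches. A plain `ttest_ind` call, however, is two-sided. SciPy's `ttest_ind` accepts an `alternative` argument (SciPy 1.6 and later). With the win group passed first, `alternative="less"` tests whether its mean is lower. The t-distribution is symmetric, so when t has the hypothesized (negative) sign, the one-sided p-value is half the two-sided one. For the t = -1.4647, p = 0.1455 reported for this test, that gives about 0.073, which is still above 0.05. The helper `one_sided_welch` and the synthetic demo data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind


def one_sided_welch(win_values, loss_values):
    """Welch t-test of H1: mean(win_values) < mean(loss_values)."""
    res = ttest_ind(win_values, loss_values, equal_var=False, alternative="less")
    return res.statistic, res.pvalue


# Synthetic demo; in the notebook, pass win_elixir and loss_elixir.
rng = np.random.default_rng(4)
demo_win = rng.normal(3.5, 0.5, size=60)
demo_loss = rng.normal(3.65, 0.5, size=70)
t_one, p_one = one_sided_welch(demo_win, demo_loss)
# Same statistic, two-sided, for comparison with a default ttest_ind call.
p_two = ttest_ind(demo_win, demo_loss, equal_var=False).pvalue
print(f"t = {t_one:.4f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```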

In [10]:
import pandas as pd
from scipy.stats import ttest_ind

# Load your dataset
df = pd.read_excel("matches.xlsx")

# Split into two groups: win vs. loss
win_elixir = df[df["win"] == True]["opponentAvgElixir"]
loss_elixir = df[df["win"] == False]["opponentAvgElixir"]

# Run Welch's two-sample t-test (two-sided, does not assume equal variances)
t_stat, p_val = ttest_ind(win_elixir, loss_elixir, equal_var=False)

# Print results
print("Hypothesis Test: Opponent's Average Elixir Cost")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f}")
print(f"Mean Opponent Elixir (Win): {win_elixir.mean():.3f}")
print(f"Mean Opponent Elixir (Loss): {loss_elixir.mean():.3f}")

if p_val < 0.05:
    print("Result: Reject H0 — Opponent elixir cost significantly affects win rate.")
else:
    print("Result: Fail to reject H0 — No significant effect detected.")


Hypothesis Test: Opponent's Average Elixir Cost
t-statistic: -1.4647
p-value: 0.1455
Mean Opponent Elixir (Win): 3.517
Mean Opponent Elixir (Loss): 3.647
Result: Fail to reject H0 — No significant effect detected.
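The notebook runs five tests, each at α = 0.05. The chance of at least one false rejection across the whole family is therefore higher than 5%. The Holm-Bonferroni step-down procedure controls that family-wise error rate. The sketch below implements it in plain NumPy and applies it to the five p-values printed above. The helper `holm_adjust` is illustrative; statsmodels' `multipletests(..., method="holm")` provides the same correction.

```python
import numpy as np

# Two-sided p-values as printed by the five cells above (rounded to 4 d.p.).
p_values = {
    "Test 1 (deck group)": 0.9097,
    "Test 2 (avg elixir)": 0.8456,
    "Test 3 (elixir diff)": 0.3028,
    "Test 4 (shared cards)": 0.0112,
    "Test 5 (opponent elixir)": 0.1455,
}


def holm_adjust(pvals):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Multiply the k-th smallest p-value (k = 0..m-1) by (m - k).
    scaled = (m - np.arange(m)) * p[order]
    # Enforce the step-down monotonicity and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


adjusted = holm_adjust(list(p_values.values()))
for (name, raw), adj in zip(p_values.items(), adjusted):
    verdict = "reject H0" if adj < 0.05 else "fail to reject H0"
    print(f"{name}: raw p = {raw:.4f}, Holm-adjusted p = {adj:.4f} -> {verdict}")
```

After the correction, the shared-cards result (Test 4) has an adjusted p-value of 0.056. That is just above 0.05, so it should be read as suggestive rather than conclusive.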
