# Potential Correlations and Analyses

## Player Performance Metrics vs. Economy Rounds
- Correlate **"Econ" rating** with:
  - **"Kills Per Round (KPR)"**
  - **"Assists Per Round (APR)"**
  - **"Rating (R)"**
- Analyze how **economy round types** (e.g., eco, semi-eco) influence player performance metrics.

## Agent Picks and Match Outcomes
- Examine correlations between **"Pick Rate"** and **"Total Wins By Map"** to identify strong agent-map combinations.
- Analyze **"Attacker Side Win Percentage"** and **"Defender Side Win Percentage"** based on agent picks.

## Map Characteristics and Team Success
- Analyze correlations between:
  - **"Attacker Side Win Percentage"** and **"Team Attacker Score"**
  - **"Defender Side Win Percentage"** and **"Team Defender Score"**
- Investigate if **"Total Maps Played"** affects win rates on specific maps.

## Impact of Loadout Value
- Compare **"Loadout Value"** with:
  - **"Remaining Credits"**
  - **"Type"**  
  To determine their effect on winning specific economy rounds.
- Study **"Loadout Value"** impact on:
  - **"Kills"**
  - **"Spike Plants"**
  - **"Spike Defuse"**

## Clutch Success and Player Metrics
- Investigate correlations between:
  - **"Clutch Success %"** and **"Kills"**
  - **"Clutch Success %"** and **"Assists"**
  - **"Clutch Success %"** and **"First Kills"**
- Analyze how **"Clutch Success %"** affects the overall **"Rating (R)"**.

## Game Durations and Strategy
- Correlate **"Duration"** with:
  - **"Team Score"**
  - **"Eliminations"**
  - **"Spike Defuse"** strategies  
  To understand how strategies evolve in longer matches.

---

# Techniques to Apply

## Apriori Algorithm
- Identify frequent patterns in:
  - **"Agent Picks"**
  - **"Winning Economy Types"**
  - **"Map-Based Wins"**
- Example:  
  *Rules like "If a team spends $$ (Semi-buy) and plays Agent X, the win rate is higher on Map Y."*

## K-means Clustering
- Cluster players based on:
  - **"Rating (R)"**
  - **"Kills"**
  - **"KAST"**
  - **"ACS"**  
  To classify player performance levels.
- Group maps by similarities in **"Attacker/Defender Win Percentages"**.

## EM Clustering (Gaussian Mixture Models)
- Identify latent patterns in:
  - **"Spike Plants"**
  - **"Eliminations"**
  - **"Eco (won)" metrics**
- Example:  
  *Classify rounds based on complex relationships like credits spent and strategies used.*

## DBScan Clustering
- Find anomalies in performance data:
  - Players with exceptionally high **"Kills Per Round"** or **"Clutch Success %"**
- Analyze unusual strategies based on **"Remaining Credits"** and **"Econ"**.

## SLINK Clustering
- Hierarchically cluster teams or players based on multidimensional performance statistics.
- Example:  
  *Discover tiers of teams based on "Team Score," "Kills Per Round," and "Duration."*

## Linear Regression
- Predict **"Rating (R)"** based on:
  - **"Kills"**
  - **"Assists"**
  - **"ACS"**
  - **"Clutch Success %"**
- Model **"Attacker Side Win Percentage"** as a function of **"Agent Picks"** and **"Loadout Value"**.

---

# Action Plan

## 1. Data Cleaning
- Handle missing values.
- Normalize numerical data.
- Encode categorical variables like **"Type"** and **"Agent"**.

## 2. Exploratory Data Analysis (EDA)
- Visualize distributions, correlations, and outliers using:
  - Scatter plots
  - Boxplots
  - Heatmaps

## 3. Feature Selection
- Identify the most impactful features for clustering or prediction using:
  - Statistical tests
  - Feature importance techniques.

## 4. Apply Techniques
- Run clustering algorithms to identify patterns or groupings.
- Use regression to predict or explain relationships between variables.
- Use association rule mining to extract meaningful game strategies.

## 5. Interpret Results
- Validate findings using:
  - Cross-validation
  - Statistical metrics (e.g., R² for regression, silhouette score for clustering).


In [3]:
#INITIALIZE LIBRARIES
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import os


In [6]:

#FILE PATHS

## AGENTS
file_path_agentPickRate = r"Dataset3\vct_2024\agents\agents_pick_rates.csv"
file_path_mapsStats = r"Dataset3\vct_2024\agents\maps_stats.csv"
file_path_teamsPickedAgents = r"Dataset3\vct_2024\agents\teams_picked_agents.csv"

## IDS
file_path_playersIds = r"Dataset3\vct_2024\ids\players_ids.csv"
file_path_teamsIds = r"Dataset3\vct_2024\ids\teams_ids.csv"
file_path_tournaments_stages_match_types_ids = r"Dataset3\vct_2024\ids\tournaments_stages_match_types_ids.csv"
file_path_tournaments_stages_matches_games_ids = r"Dataset3\vct_2024\ids\tournaments_stages_matches_games_ids.csv"

## MATCHES
file_path_draftPhase = r"Dataset3\vct_2024\matches\draft_phase.csv"
file_path_ecoRounds = r"Dataset3\vct_2024\matches\eco_rounds.csv"
file_path_ecoStats = r"Dataset3\vct_2024\matches\eco_stats.csv"
file_path_killsStats = r"Dataset3\vct_2024\matches\kills_stats.csv"
file_path_kills = r"Dataset3\vct_2024\matches\kills.csv"
file_path_mapsPlayed = r"Dataset3\vct_2024\matches\maps_played.csv"
file_path_mapsScores = r"Dataset3\vct_2024\matches\maps_scores.csv"
file_path_overview = r"Dataset3\vct_2024\matches\overview.csv"
file_path_roundsKills = r"Dataset3\vct_2024\matches\rounds_kills.csv"
file_path_scores = r"Dataset3\vct_2024\matches\scores.csv"
file_path_teamMapping = r"Dataset3\vct_2024\matches\team_mapping.csv"
file_path_win_loss_methods_count = r"Dataset3\vct_2024\matches\win_loss_methods_count.csv"
file_path_win_loss_methods_round_number = r"Dataset3\vct_2024\matches\win_loss_methods_round_number.csv"

##PLAYERS
file_path_players_stats = r"Dataset3\vct_2024\players_stats\players_stats.csv"


In [None]:
data = pd.read_csv('your_dataset.csv')

# Ensure correct data types
data['Pick Rate'] = data['Pick Rate'].astype(float)
data['Rating (R)'] = data['Rating (R)'].astype(float)
data['Kills Per Round (KPR)'] = data['Kills Per Round (KPR)'].astype(float)

In [16]:
# Load individual CSV files
df_pick_rate = pd.read_csv(file_path_agentPickRate)
df_win_percentages = pd.read_csv(file_path_mapsStats)
df_ratings = pd.read_csv(file_path_players_stats)
df_kills = pd.read_csv(file_path_players_stats)

# Filter rows where Tournament = "Valorant Champions 2024"
df_pick_rate = df_pick_rate[df_pick_rate['Tournament'] == "Valorant Champions 2024"]
df_win_percentages = df_win_percentages[df_win_percentages['Tournament'] == "Valorant Champions 2024"]
df_ratings = df_ratings[df_ratings['Tournament'] == "Valorant Champions 2024"]
df_kills = df_kills[df_kills['Tournament'] == "Valorant Champions 2024"]

# Ensure correct data types
df_pick_rate['Pick Rate'] = df_pick_rate['Pick Rate'].str.replace('%', '', regex=False)  # Remove %
df_pick_rate['Pick Rate'] = df_pick_rate['Pick Rate'].astype(float) / 100  # Convert to float and scale to fraction

df_win_percentages['Attacker Side Win Percentage'] = (
    df_win_percentages['Attacker Side Win Percentage']
    .str.replace('%', '', regex=False)
    .astype(float) / 100
)

df_win_percentages['Defender Side Win Percentage'] = (
    df_win_percentages['Defender Side Win Percentage']
    .str.replace('%', '', regex=False)
    .astype(float) / 100
)

# Merge all DataFrames on a common key (e.g., 'Tournament')
merged_data = df_pick_rate.merge(df_win_percentages, on='Tournament') \
                          .merge(df_ratings, on='Tournament') \
                          .merge(df_kills, on='Tournament')

# Compute correlation matrix for the selected columns
corr_matrix = merged_data[['Pick Rate', 'Attacker Side Win Percentage', 
                           'Defender Side Win Percentage', 'Rating (R)', 
                           'Kills Per Round (KPR)']].corr()

# Plot correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()


MemoryError: Unable to allocate 2.81 GiB for an array with shape (376648800,) and data type int64

In [40]:
#import the modules
import os
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

def convert_to_numeric(value):
    """Convert string values like '24.7k' to numerical values like 24700"""
    if isinstance(value, str):
        if 'k' in value.lower():
            return float(value.lower().replace('k', '').strip()) * 1000
        elif 'm' in value.lower():
            return float(value.lower().replace('m', '').strip()) * 1000000
        # Add more cases as needed (e.g., handling 'b' for billions)
    return pd.to_numeric(value, errors='coerce')

# Load the eco_rounds.csv file
file_path = r'Dataset3\vct_2024\matches\eco_rounds.csv'
eco_rounds = pd.read_csv(file_path)

# Convert 'Loadout Value' and 'Remaining Credits' to numeric using the function
eco_rounds['Loadout Value'] = eco_rounds['Loadout Value'].apply(convert_to_numeric)
eco_rounds['Remaining Credits'] = eco_rounds['Remaining Credits'].apply(convert_to_numeric)

# Handle any missing values (e.g., drop rows with missing values)
eco_rounds.dropna(subset=['Loadout Value', 'Remaining Credits', 'Outcome'], inplace=True)

# Convert Outcome to a binary value (1 for win, 0 for loss, assuming 'Outcome' is categorical)
eco_rounds['Outcome'] = eco_rounds['Outcome'].apply(lambda x: 1 if x == 'Win' else 0)

# Show a snapshot of the data
eco_rounds.head()

# First, we need to encode the data into a format suitable for Apriori
# For simplicity, let's encode 'Loadout Value', 'Remaining Credits' as categorical (binned)
eco_rounds['Loadout Value Binned'] = pd.cut(eco_rounds['Loadout Value'], bins=5, labels=['Low', 'Medium', 'High', 'Very High', 'Max'])
eco_rounds['Remaining Credits Binned'] = pd.cut(eco_rounds['Remaining Credits'], bins=5, labels=['Low', 'Medium', 'High', 'Very High', 'Max'])

# Encode categorical columns into dummy variables
basket = pd.get_dummies(eco_rounds[['Loadout Value Binned', 'Remaining Credits Binned', 'Outcome']])

# Apply Apriori algorithm to find frequent item sets
frequent_itemsets = apriori(basket, min_support=0.1, use_colnames=True)

# Find association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1, num_itemsets=10000)

# Show the resulting rules
rules.head()



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Loadout Value Binned_Very High),(Outcome),0.592571,0.5,0.325897,0.549971,1.099941,1.0,0.029611,1.111039,0.22301,0.425078,0.099941,0.600882
1,(Outcome),(Loadout Value Binned_Very High),0.5,0.592571,0.325897,0.651793,1.099941,1.0,0.029611,1.170078,0.181721,0.425078,0.145356,0.600882
2,(Loadout Value Binned_Low),(Remaining Credits Binned_Low),0.148896,0.625717,0.121632,0.81689,1.305527,1.0,0.028465,2.044038,0.274967,0.186271,0.510772,0.505639
3,(Remaining Credits Binned_Low),(Loadout Value Binned_Low),0.625717,0.148896,0.121632,0.194388,1.305527,1.0,0.028465,1.056469,0.625264,0.186271,0.05345,0.505639
4,(Remaining Credits Binned_Low),(Loadout Value Binned_High),0.625717,0.141798,0.11746,0.18772,1.323859,1.0,0.028734,1.056535,0.653603,0.180692,0.05351,0.508041


In [41]:
from sklearn.linear_model import LogisticRegression

#import the modules
import os
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

def convert_to_numeric(value):
    """Convert string values like '24.6k' to numerical values like 24600 (int)."""
    if isinstance(value, str):
        if 'k' in value.lower():
            return int(float(value.lower().replace('k', '').strip()) * 1000)
        elif 'm' in value.lower():
            return int(float(value.lower().replace('m', '').strip()) * 1000000)
        # Add more cases as needed (e.g., handling 'b' for billions)
    return pd.to_numeric(value, errors='coerce', downcast='integer')

# Load the eco_rounds.csv file
file_path = r'Dataset3\vct_2024\matches\eco_rounds.csv'
eco_rounds = pd.read_csv(file_path)

# Convert 'Loadout Value' and 'Remaining Credits' using the convert_to_numeric function
eco_rounds['Loadout Value'] = eco_rounds['Loadout Value'].apply(convert_to_numeric)
eco_rounds['Remaining Credits'] = eco_rounds['Remaining Credits'].apply(convert_to_numeric)

# Filter the data where Tournament is "Valorant Champions 2024"
eco_rounds_filtered = eco_rounds[eco_rounds['Tournament'] == 'Valorant Champions 2024']

# Drop rows with NaN values (if any) in relevant columns
eco_rounds_filtered = eco_rounds_filtered.dropna(subset=['Loadout Value', 'Remaining Credits', 'Outcome'])

# Prepare the features and target variable
X = eco_rounds_filtered[['Loadout Value', 'Remaining Credits']]
y = eco_rounds_filtered['Outcome']

# Fit a logistic regression model (as Outcome is binary)
log_reg = LogisticRegression()
log_reg.fit(X, y)

# Coefficients and intercept of the model
print(f"Coefficients: {log_reg.coef_}")
print(f"Intercept: {log_reg.intercept_}")

# Predicting and evaluating the model
predictions = log_reg.predict(X)

# Visualizing the decision boundary
x_min, x_max = eco_rounds_filtered['Loadout Value'].min() - 1, eco_rounds_filtered['Loadout Value'].max() + 1
y_min, y_max = eco_rounds_filtered['Remaining Credits'].min() - 1, eco_rounds_filtered['Remaining Credits'].max() + 1

# Increase step size from 0.1 to 0.5 to reduce memory usage
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))

# Ensure valid numeric values for predictions
Z = log_reg.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Make sure Z is a valid numeric array (replace any NaN with a valid number)
Z = np.nan_to_num(Z)

# Plot the decision boundary
plt.contourf(xx, yy, Z, alpha=0.75, cmap='coolwarm')
plt.scatter(eco_rounds_filtered['Loadout Value'], eco_rounds_filtered['Remaining Credits'], c=eco_rounds_filtered['Outcome'], cmap='coolwarm', edgecolors='k')
plt.xlabel('Loadout Value')
plt.ylabel('Remaining Credits')
plt.title('Logistic Regression Decision Boundary (Filtered for Valorant Champions 2024)')
plt.show()


Coefficients: [[ 5.56471571e-05 -1.58500919e-05]]
Intercept: [-0.86160298]


MemoryError: Unable to allocate 888. GiB for an array with shape (415020, 287020) and data type float64

In [1]:


# Visualization 1: Correlation Heatmap
plt.figure(figsize=(12, 8))
corr_matrix = data[['Pick Rate', 'Attacker Side Win Percentage', 'Defender Side Win Percentage', 'Rating (R)', 'Kills Per Round (KPR)']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

# Visualization 2: Scatter Plot - Pick Rate vs Rating (R)
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Pick Rate', y='Rating (R)', hue='Total Wins By Map', size='Total Maps Played', sizes=(20, 200), data=data)
plt.title("Pick Rate vs Rating (R) with Total Wins and Maps Played")
plt.xlabel("Pick Rate")
plt.ylabel("Rating (R)")
plt.legend(title="Legend", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

# Visualization 3: Boxplot - Economy Type vs Kills Per Round (KPR)
plt.figure(figsize=(12, 6))
sns.boxplot(x='Type', y='Kills Per Round (KPR)', data=data)
plt.title("Economy Round Type vs Kills Per Round (KPR)")
plt.xlabel("Economy Round Type")
plt.ylabel("Kills Per Round (KPR)")
plt.show()

# Visualization 4: Pairplot for Player Performance Metrics
performance_metrics = data[['Rating (R)', 'Kills Per Round (KPR)', 'Assists Per Round (APR)', 'Clutch Success %', 'Average Combat Score (ACS)']]
sns.pairplot(performance_metrics)
plt.suptitle("Player Performance Metrics Pairplot", y=1.02)
plt.show()

# Visualization 5: Bar Plot - Map Win Rates by Attacker and Defender Side
plt.figure(figsize=(14, 8))
win_rates = data.groupby('Game ID')[['Attacker Side Win Percentage', 'Defender Side Win Percentage']].mean().reset_index()
win_rates = win_rates.melt(id_vars='Game ID', var_name='Side', value_name='Win Percentage')
sns.barplot(x='Game ID', y='Win Percentage', hue='Side', data=win_rates)
plt.title("Average Win Rates by Side and Map")
plt.xlabel("Game ID")
plt.ylabel("Win Percentage")
plt.xticks(rotation=45)
plt.show()


FileNotFoundError: [Errno 2] No such file or directory: 'your_dataset.csv'