# Name: Gabriel Delgado Panovich de Barros
# RA: 176313

# About the dataset
---
> This file was copied directly from the data [source](https://www.kaggle.com/datasets/christianlillelund/csgo-round-winner-classification), except for the images.

## Context

CS:GO is a tactical shooter in which two teams (CT and Terrorist) play a best of 30 rounds, with each round lasting 1 minute and 55 seconds. There are 5 players on each team (10 in total), and the first team to reach 16 rounds wins the game. At the start, one team plays as CT and the other as Terrorist. After 15 rounds, the teams swap sides. A game can be played on 7 different maps. You win a round as Terrorist by either planting the bomb and making sure it explodes, or by eliminating the other team. You win a round as CT by either eliminating the other team or defusing the bomb, should it have been planted.

## Content

The dataset was originally published by Skybox as part of their CS:GO AI Challenge, which ran from Spring to Fall 2020. It consists of ~700 demos from high-level tournament play in 2019 and 2020. Warmup rounds and restarts have been filtered out, and for the remaining live rounds a snapshot has been recorded every 20 seconds until the round is decided. Following the initial publication, the data has been pre-processed and flattened to improve readability and make it easier for algorithms to process. The total number of snapshots is 122411.

Skybox website: https://skybox.gg/

Learn more about CS:GO: https://en.wikipedia.org/wiki/Counter-Strike:_Global_Offensive

View CS:GO on Steam Store: https://store.steampowered.com/app/730/CounterStrike_Global_Offensive/

Find in-depth information on competitive CS:GO: https://www.hltv.org/

## Acknowledgements

Thanks to Skybox for taking the time to sample all the snapshots and organising the challenge. It wouldn't be possible to publish any of this without their help.

## Inspiration

- What types of machine learning models perform best on this dataset?
- Which features are most indicative of which team wins the round?
- How often does the team with most money win?
- Are some weapons more favourable than others?
- What attributes should your team have to win? Health, armor or money?
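
The money question in the list above can be explored with a few boolean masks. The sketch below runs on a toy DataFrame with made-up values (not taken from the real dataset) and assumes only the `ct_money`, `t_money` and `round_winner` columns described in the data dictionary. Snapshots where both teams hold the same amount are left out of the rate.

```python
import pandas as pd

# Toy snapshots with made-up values; the real file has far more rows and columns.
snapshots = pd.DataFrame({
    "ct_money": [4000, 1500, 8000, 2000, 3000],
    "t_money": [2500, 6000, 8000, 500, 9000],
    "round_winner": ["CT", "T", "T", "T", "T"],
})

# Which side holds more money in each snapshot.
ct_richer = snapshots["ct_money"] > snapshots["t_money"]
t_richer = snapshots["ct_money"] < snapshots["t_money"]
ct_won = snapshots["round_winner"] == "CT"

# The richer side "won" if it matches the round winner; ties are ignored.
richer_won = (ct_richer & ct_won) | (t_richer & ~ct_won)
decided = ct_richer | t_richer
richer_win_rate = richer_won[decided].mean()

print(f"Richer side wins {richer_win_rate:.0%} of non-tied snapshots")
```

Because several snapshots can come from the same round, a rate computed this way weights long rounds more heavily than short ones.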

## Data Dictionary
Note: All snapshots are i.i.d. in the sense that each one describes the state of a round and can therefore be treated individually, although multiple snapshots can be taken from the same round.

You are supposed to predict a label (round winner) based on each individual snapshot.

|Variable        |Definition                                            |Key                                      |
|----------------|------------------------------------------------------|-----------------------------------------|
|time_left       |The time left in the current round.                   |                                         |
|ct_score        |The current score of the Counter-Terrorist team.      |                                         |
|t_score         |The current score of the Terrorist team.              |                                         |
|map             |The map the round is being played on.                 |E.g. de_dust2, de_inferno and de_overpass|
|bomb_planted    |If the bomb has been planted or not.                  |False = No, True = Yes                   |
|ct_health       |The total health of all Counter-Terrorist players.    |Player health in range 0-100.            |
|t_health        |The total health of all Terrorist players.            |Player health in range 0-100.            |
|ct_armor        |The total armor of all Counter-Terrorist players.     |                                         |
|t_armor         |The total armor of all Terrorist players.             |                                         |
|ct_money        |The total bankroll of all Counter-Terrorist players.  |Amount in USD.                           |
|t_money         |The total bankroll of all Terrorist players.          |Amount in USD.                           |
|ct_helmets      |Number of helmets on the Counter-Terrorist team.      |                                         |
|t_helmets       |Number of helmets on the Terrorist team.              |                                         |
|ct_defuse_kits  |Number of defuse kits on the Counter-Terrorist team.  |                                         |
|ct_players_alive|Number of alive players on the Counter-Terrorist team.|Range 0 to 5.                            |
|t_players_alive |Number of alive players on the Terrorist team.        |Range 0 to 5.                            |
|ct_weapon_X     |Weapon X count on Counter-Terrorist team.             |E.g. Ak47, Deagle and UMP45.             |
|t_weapon_X      |Weapon X count on Terrorist team.                     |E.g. Ak47, Deagle and UMP45.             |
|ct_grenade_X    |Grenade X count on Counter-Terrorist team.            |E.g. HeGrenade, Flashbang.               |
|t_grenade_X     |Grenade X count on Terrorist team.                    |E.g. HeGrenade, Flashbang.               |
|round_winner    |Winner.                                               |CT = Counter-Terrorist, T = Terrorist    |
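
The `ct_weapon_X` and `ct_grenade_X` rows above stand for whole families of columns. One way to work with such a family is to select it by prefix, as in the sketch below. The concrete column names in the toy frame (for example `ct_weapon_ak47`) are hypothetical stand-ins for the `X` placeholder, so check them against the header of the real file before reusing the pattern.

```python
import pandas as pd

# Toy frame; the column names are hypothetical instances of ct_weapon_X / ct_grenade_X.
snapshots = pd.DataFrame({
    "ct_weapon_ak47": [1, 0],
    "ct_weapon_deagle": [2, 1],
    "t_weapon_ak47": [4, 3],
    "ct_grenade_flashbang": [3, 0],
})

# Keep only the columns whose name starts with the CT weapon prefix.
ct_weapons = snapshots.filter(regex=r"^ct_weapon_")

# Total number of weapons held by the CT side in each snapshot.
ct_weapon_total = ct_weapons.sum(axis=1)

print(list(ct_weapons.columns))
print(ct_weapon_total.tolist())
```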

# Goal of the neural network model
---
The purpose of this neural network is to recognize patterns in the teams (Terrorists or Counter-Terrorists) and to assess how much these attributes influence who wins a round of the match, which is played as a best of 30 rounds.

We use the number of players alive on each team, the total health of each team's players, each team's money in USD, the time left in the round, and whether the bomb has been planted, as well as the team that won the round.

# Results
---
The analysis of each model is at the [end of the notebook](#resultados-obtidos-resumo).

In [None]:
import os
import math
import torch
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from typing import Dict, Tuple

# Data preparation

In [None]:
def prepareData(path: str,
                shuffle: bool = True,
                batch_size: int = 32) -> Tuple[DataLoader, torch.Tensor]:
    """
    Loads the dataset at `path` and returns it processed and ready to be used
    in the model's training process.

    INPUT:
        path: Path to the CSV file.
        shuffle (optional): Whether the DataLoader shuffles the samples on each pass.
        batch_size (optional): Batch size used by the DataLoader.
    OUTPUT:
        Tuple[DataLoader, torch.Tensor]: A DataLoader over the standardized features,
        without labels, and a tensor with the dataset labels.

    Note: the labels are returned in file order. With shuffle=True, the order of the
    samples yielded by the DataLoader does not match the order of the label tensor.
    """
    # --- 1. Load the full dataset; any splitting happens later ---
    full_data = pd.read_csv(path)

    # --- 2. Separate the features from the label (last column) ---
    feats_df = full_data.iloc[:, :-1]
    label_df = full_data.iloc[:, -1]
    # Convert to tensors before normalizing
    feats_tensor = torch.tensor(feats_df.values, dtype=torch.float32)
    label_tensor = torch.tensor(label_df.values, dtype=torch.float32)

    # --- 3. Normalization step ---
    # Z-score standardization: subtract the mean and divide by the standard deviation
    # dim=0 computes the statistics per column (per feature)
    mean = feats_tensor.mean(dim=0)
    std = feats_tensor.std(dim=0)
    # Avoid division by zero when a feature is constant
    std[std == 0] = 1.0
    feats_normalized = (feats_tensor - mean) / std

    # --- 4. Build the TensorDataset and the DataLoader ---
    feats_dataset = TensorDataset(feats_normalized)
    data_loader = DataLoader(feats_dataset, batch_size=batch_size, shuffle=shuffle)

    return data_loader, label_tensor

def filterData(path: str = 'db/') -> None:
    """
    Dataset-specific function. Filters the data to keep only:

    'ct_health',
    't_health',
    'ct_money',
    't_money',
    'time_left',
    'bomb_planted',
    'ct_players_alive',
    't_players_alive',
    'round_winner' (saved as 'round_winner_numeric')

    INPUT:
        path: Directory that contains the original dataset, without the file name.
    OUTPUT:
        None. Writes 'csgo_processed.csv' to the same directory.
    """
    # Load the large CSV file.
    db_path = os.path.join(path, 'csgo_round_snapshots.csv')
    df = pd.read_csv(db_path)

    # --- 1. Convert the target column (label) ---
    # Maps 'CT' to 1 and 'T' to 0
    df['round_winner_numeric'] = df['round_winner'].map({'CT': 1, 'T': 0})

    # --- 2. Convert the bomb_planted column to integers ---
    # False becomes 0 and True becomes 1
    df['bomb_planted'] = df['bomb_planted'].astype(int)

    # --- 3. Select and order the final columns ---
    # List of the selected features
    features = [
        'ct_health',
        't_health',
        'ct_money',
        't_money',
        'time_left',
        'bomb_planted',
        'ct_players_alive',
        't_players_alive'
    ]

    # Label (target) column
    label = 'round_winner_numeric'

    # Build the final DataFrame with the features and the label as the LAST column.
    # .copy() avoids pandas' SettingWithCopyWarning when filtering below.
    final_df = df[features + [label]].copy()
    # Drop the snapshots in which both teams still have full health (500 each)
    full_health = (final_df['ct_health'] == 500.0) & (final_df['t_health'] == 500.0)
    final_df = final_df[~full_health]

    # --- 4. Save the new CSV file ---
    output_filename = 'csgo_processed.csv'
    final_path = os.path.join(path, output_filename)
    final_df.to_csv(final_path, index=False)

def viewData(path: str = 'db/') -> pd.DataFrame:
    """Loads the processed CSV written by filterData and returns it as a DataFrame."""
    arch = 'csgo_processed.csv'
    file_path = os.path.join(path, arch)

    df = pd.read_csv(file_path)
    return df

if __name__ == "__main__":
    filterData()
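
The steps in the cell above (label encoding, dropping full-health snapshots, column-wise standardization) can be traced on a toy frame. This is a minimal sketch with made-up values and only a subset of the selected columns. It uses NumPy instead of PyTorch so that it runs without a GPU stack, with `ddof=1` chosen to match the default sample standard deviation of `torch.Tensor.std`. The name `toy` and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy snapshots with made-up values; only a few of the selected columns are shown.
toy = pd.DataFrame({
    "ct_health": [500.0, 320.0, 500.0, 100.0],
    "t_health": [500.0, 500.0, 410.0, 0.0],
    "bomb_planted": [False, True, True, True],
    "round_winner": ["CT", "T", "T", "CT"],
})

# Encode the label and the boolean flag as integers.
toy["round_winner_numeric"] = toy["round_winner"].map({"CT": 1, "T": 0})
toy["bomb_planted"] = toy["bomb_planted"].astype(int)

# Drop snapshots where BOTH teams still have full health (5 players x 100 HP).
full_health = (toy["ct_health"] == 500.0) & (toy["t_health"] == 500.0)
columns = ["ct_health", "t_health", "bomb_planted", "round_winner_numeric"]
kept = toy.loc[~full_health, columns]

# Column-wise z-score on the features (every column except the last).
features = kept.iloc[:, :-1].to_numpy(dtype=np.float32)
mean = features.mean(axis=0)
std = features.std(axis=0, ddof=1)  # ddof=1: sample standard deviation
std[std == 0] = 1.0                 # guard: constant columns are left at zero
standardized = (features - mean) / std

print(kept.shape)
print(standardized.round(3))
```

In this toy frame `bomb_planted` is constant after filtering, so its standard deviation is zero and the guard keeps that column at zero instead of producing NaN values.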

# Plotting functions

In [None]:
def plotLosses(loss1: Dict[int, float], loss2: Dict[int, float], loss1_label: str, loss2_label: str) -> None:
    """Plots two loss curves, each given as {epoch: loss}, on the same axes."""
    plt.close("all")
    plt.figure()
    # Convert the dict views to lists so matplotlib receives plain sequences
    plt.plot(list(loss1.keys()), list(loss1.values()), label=loss1_label)
    plt.plot(list(loss2.keys()), list(loss2.values()), label=loss2_label)
    plt.title(f"{loss1_label} and {loss2_label}")
    plt.xlabel("Epochs")
    plt.ylabel("Error")
    plt.legend()
    plt.show()

def hitMap(hit_map, saving: str):
    plt.close("all")
    plt.figure()
    # fmt="d" formats the annotations as integers, so hit_map must hold integer counts
    sns.heatmap(hit_map, annot=True, fmt="d", cmap="viridis")
    plt.title("Hit Map (BMU Frequency)")
    plt.xlabel("X Coordinate")
    plt.ylabel("Y Coordinate")
    plt.savefig(saving)
    plt.show()

def uMatrix(u_matrix, saving: str):
    plt.close("all")
    plt.figure()
    # A reversed colormap (e.g. "gray_r") is common for U-Matrix plots
    sns.heatmap(u_matrix, cmap="gray_r", annot=False)
    plt.title("U-Matrix (Cluster Boundaries)")
    plt.xlabel("X Coordinate")
    plt.ylabel("Y Coordinate")
    plt.savefig(saving)
    plt.show()

def heatMap(plane_data: Dict[str, np.ndarray], saving: str) -> None:
    """
    Receives a dictionary of component planes (heatmaps) and plots them
    in a grid.

    plane_data: Dictionary in the form {feature_name: 2D_weight_array}
    saving: Path where the image is saved (e.g. "component_planes.png")
    """
    plt.close("all")
    num_features = len(plane_data)

    # --- 1. Determine the grid layout (e.g. 8 features -> 3x3 grid) ---
    grid_size = math.ceil(math.sqrt(num_features))

    # --- 2. Create the figure and the axes (subplots) ---
    # squeeze=False keeps axs a 2D array even for a 1x1 grid, so flatten() always works
    fig, axs = plt.subplots(grid_size, grid_size,
                            figsize=(grid_size * 5, grid_size * 5), squeeze=False)
    # Flatten axs into a 1D array for easy iteration
    axs = axs.flatten()

    i = 0
    # --- 3. Iterate over the dictionary and plot each heatmap ---
    for feature_name, feature_plane in plane_data.items():
        sns.heatmap(feature_plane, ax=axs[i], cmap="coolwarm", cbar=True)
        axs[i].set_title(feature_name)
        axs[i].set_aspect('equal')  # Keeps the "pixels" square
        i += 1

    # --- 4. Turn off the extra axes (subplots) that were not used ---
    for j in range(i, len(axs)):
        axs[j].axis('off')

    plt.tight_layout()  # Adjusts the layout to avoid overlapping titles
    plt.savefig(saving)
    plt.show()
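
The plotting helpers above expect inputs of a particular shape. The sketch below shows one way an integer hit map could be accumulated from BMU coordinates, and how the grid side used by `heatMap` grows with the number of features. The `bmu_coords` array and the `grid_side` helper are hypothetical names introduced only for this illustration, and the 3x3 map size is an assumption.

```python
import math
import numpy as np

# Hypothetical BMU coordinates (row, col) for six samples on a 3x3 map.
bmu_coords = np.array([[0, 0], [0, 0], [1, 2], [2, 1], [1, 2], [0, 0]])

# Integer counts per node. np.add.at accumulates repeated indices correctly,
# whereas plain fancy-index assignment would count each node at most once.
hit_map = np.zeros((3, 3), dtype=int)
np.add.at(hit_map, (bmu_coords[:, 0], bmu_coords[:, 1]), 1)

def grid_side(num_features: int) -> int:
    """Side of the smallest square grid that fits num_features subplots."""
    return math.ceil(math.sqrt(num_features))

print(hit_map)
print([grid_side(n) for n in (1, 8, 9, 10)])
```

Keeping the hit map as an integer array matters because `hitMap` annotates the cells with `fmt="d"`, which fails on floating-point values.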