# Revised Autoencoder
In this notebook we attempt to train an autoencoder on the input data for one-minute segments. Two main challenges are that the input data is a mix of analog and digital signals and that buttons are unpressed far more often than they are pressed. This necessitates a custom loss function.

## Import


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam
from torchsummary import summary
from torch.cuda.amp import autocast, GradScaler
from torch.optim.lr_scheduler import ReduceLROnPlateau

import numpy as np
import gzip
import pickle
import os
from sklearn.model_selection import train_test_split
from tqdm import tqdm
import gc
import time
import random
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from collections import deque
import pandas as pd
import math


import sys
sys.path.append('..')
from slp_package.input_dataset import InputDataSet
import slp_package.pytorch_functions as slp_pytorch_functions

def set_seed(seed=42):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using CUDA
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
torch.cuda.is_available()

True

## Create the training and test sets
We use the `InputDataSet` class to create the training and test sets. Unlike the classification with ResNet, we do not create a balanced dataset (with respect to the character the player is playing). We still need to specify the labels, but they are not used in the training.

In [2]:
source_data = ['ranked','public','mango']

general_features = {
    'stage_name': ['FOUNTAIN_OF_DREAMS','FINAL_DESTINATION','BATTLEFIELD','YOSHIS_STORY','POKEMON_STADIUM','DREAMLAND'],
    'num_players': [2],
    'conclusive': [True],
}
player_features = {
    'character_name': ['FOX', 'CAPTAIN_FALCON', 'SHEIK', 'FALCO', 'GAME_AND_WATCH', 'MARTH', 'LINK', 'ICE_CLIMBERS', 'SAMUS', 'GANONDORF', 'BOWSER', 'MEWTWO', 'YOSHI', 'PIKACHU', 'JIGGLYPUFF', 'NESS', 'DR_MARIO', 'MARIO', 'PEACH', 'ROY', 'LUIGI', 'YOUNG_LINK', 'DONKEY_KONG', 'PICHU', 'KIRBY'],
    'type_name': ['HUMAN']
}
opposing_player_features = {
    # 'character_name': ['MARTH'],
    # 'netplay_code': ['KOD#0', 'ZAIN#0']
    'type_name': ['HUMAN']
}

# We will not be training with a label.
label_info = {
    'source': ['player'], # Can be 'general', 'player'
    'feature': ['character_name']
}

In [3]:
dataset = InputDataSet(source_data, general_features, player_features, opposing_player_features, label_info)
dataset.dataset.head()



| | stage_name | num_players | conclusive | player_character_name | player_type_name | opposing_player_type_name | player_inputs_np_sub_path | length | labels |
|---|---|---|---|---|---|---|---|---|---|
| 0 | FINAL_DESTINATION | 2 | True | FALCO | HUMAN | HUMAN | `mango\FALCO\727e819f-8cb3-4c3f-bf0a-ceefa9e41c...` | 5606 | FALCO |
| 1 | FINAL_DESTINATION | 2 | True | FALCO | HUMAN | HUMAN | `mango\FALCO\76fe3db5-60de-46bb-8f0d-80d48822a8...` | 5754 | FALCO |
| 2 | POKEMON_STADIUM | 2 | True | MARTH | HUMAN | HUMAN | `mango\MARTH\7e6b417f-249d-4629-b6dc-2fe1d95d8f...` | 6213 | MARTH |
| 3 | FOUNTAIN_OF_DREAMS | 2 | True | FOX | HUMAN | HUMAN | `mango\FOX\32305eaf-71d8-46e5-a8a1-2c7c890a9baf...` | 7621 | FOX |
| 4 | FINAL_DESTINATION | 2 | True | FALCO | HUMAN | HUMAN | `mango\FALCO\a5396c32-6f2c-4b88-8582-f8b875bb55...` | 7840 | FALCO |


Here we specify the segment length and the shift between consecutive segments. We choose a shift of 1800 frames, meaning consecutive segments overlap by half. The training and test sets do not contain segments from the same game, which avoids data leakage.
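
The segment layout can be sketched as follows. This is a minimal illustration, assuming segments start every `shift` frames and that only full-length segments are kept; the helper `segment_starts` is hypothetical and is not the `InputDataSet` implementation.

```python
def segment_starts(game_length, segment_length=3600, shift=1800):
    """Start indices of all full-length segments in one game (hypothetical helper)."""
    if game_length < segment_length:
        return []
    # Assumption: segments begin every `shift` frames and partial segments are dropped.
    last_start = game_length - segment_length
    return list(range(0, last_start + 1, shift))


# An 11040-frame game yields five half-overlapping segments
starts = segment_starts(11040)
print(starts)  # [0, 1800, 3600, 5400, 7200]

# Consecutive segments share segment_length - shift frames
overlap = 3600 - 1800
print(overlap / 3600)  # 0.5
```

Under these assumptions, a game shorter than one segment contributes nothing, and every additional 1800 frames adds one more segment.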

In [4]:
segment_length = 3600
shift = 1800

train_df, test_df = dataset.all_segments_train_test_split_dataframes(segment_length,shift=shift, proportion_of_segments=1, test_ratio = .1, val = False)
# proportion = 1
# train_df = train_df.sample(frac=proportion, random_state = 42)
proportion = .5
test_df = test_df.sample(frac=proportion, random_state = 42)
print(train_df.shape)
print(test_df.shape)
train_df.head()


(1472456, 8)
(81775, 8)


| | player_inputs_np_sub_path | length | num_segments | labels | encoded_labels | segment_index | segment_start_index | segment_length |
|---|---|---|---|---|---|---|---|---|
| 0 | `ranked\FALCO\e34756b0-46b5-4db2-a3c4-453aec014...` | 11040 | 5 | FALCO | 4 | 0 | 0 | 3600 |
| 1 | `ranked\FALCO\e34756b0-46b5-4db2-a3c4-453aec014...` | 11040 | 5 | FALCO | 4 | 1 | 1800 | 3600 |
| 2 | `ranked\FALCO\e34756b0-46b5-4db2-a3c4-453aec014...` | 11040 | 5 | FALCO | 4 | 2 | 3600 | 3600 |
| 3 | `ranked\FALCO\e34756b0-46b5-4db2-a3c4-453aec014...` | 11040 | 5 | FALCO | 4 | 3 | 5400 | 3600 |
| 4 | `ranked\FALCO\e34756b0-46b5-4db2-a3c4-453aec014...` | 11040 | 5 | FALCO | 4 | 4 | 7200 | 3600 |


## Data Loaders
We have a mix of binary and semi-continuous data.

### The Analog Inputs
The analog sticks take values in $[-1, -0.2875]\cup\{0\}\cup[0.2875, 1]$ in increments of 0.0125, so there is a dead zone around 0. The values are not uniformly distributed: there are more values at the edges of the circle. The distribution of the values is shown below (we remove the (0,0) value so that the other values show up on the plots).
![Analog Sticks](stick_hist.png)
In the `TrainingDataset` class we transform the analog inputs so that they are evenly spaced in the interval $[0, 1]$.
```python
# 1) Shift and scale analog inputs to [0, 1]
analog_transformed = np.copy(segment[0:4])
analog_transformed[analog_transformed > 0] -= 0.2875 + 0.0125
analog_transformed[analog_transformed < 0] += 0.2875 - 0.0125
analog_transformed *= 0.5 / 0.725
analog_transformed += 0.5
transformed[0:4] = analog_transformed
```
When we assess the model's performance, we care whether a prediction lands in the correct bin of the analog input and, if not, how many bins it is from the target.
```python
integer_stick_targets = np.round(target[:,0:4] / 0.008620689655172415 ).astype(np.int32)
integer_stick_pred = np.round(pred[:,0:4] / 0.008620689655172415).astype(np.int32)
```
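
The constant `0.008620689655172415` is the spacing between adjacent stick values after the transform: the raw increment of 0.0125 multiplied by the scale factor `0.5 / 0.725`. A minimal sketch of that arithmetic and of the bin-distance calculation, using plain Python floats rather than the arrays above (the helper `bins_off` is hypothetical):

```python
RAW_STEP = 0.0125      # raw stick increment
SCALE = 0.5 / 0.725    # scale factor applied in the transform
BIN_WIDTH = RAW_STEP * SCALE


def bins_off(pred, target, bin_width=BIN_WIDTH):
    """Signed number of bins between a prediction and its target (hypothetical helper)."""
    return round(pred / bin_width) - round(target / bin_width)


print(BIN_WIDTH)        # approximately 0.00862
print(0.5 / BIN_WIDTH)  # a neutral stick (0.5) sits about 58 bins above 0

neutral = 0.5
print(bins_off(neutral + 0.3 * BIN_WIDTH, neutral))  # 0: rounds back to the target bin
print(bins_off(neutral + 2.4 * BIN_WIDTH, neutral))  # 2: two bins away
```
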
Because the model has the most trouble predicting the J-Stick values (followed by the C-Stick values), we engineer a feature that is 1 if the stick value is zero on a particular axis (this is optional).
```python
# 2) Mark positions where analog inputs are zero
transformed[4:8] += (segment[:4] == 0)
```
### The Digital Inputs
The digital inputs are binary, but the buttons are unpressed far more often than they are pressed. We do not perform any transformations on those inputs.

In [5]:
class TrainingDataset(Dataset):
    """
    Custom dataset for loading and optionally transforming game segments from compressed NumPy files.
    
    Parameters
    ----------
    df : pd.DataFrame
        Must include the following columns:
          - 'player_inputs_np_sub_path': file paths to the compressed NumPy files
          - 'encoded_labels': integer-encoded labels
          - 'segment_start_index': start index for each segment
          - 'segment_length': length of each segment in frames
    transform : bool, default=False
        If True, applies a specific transformation to each loaded segment (e.g., scaling analog inputs).
    """
    def __init__(self, df, transform=False):
        self.file_paths = df['player_inputs_np_sub_path'].to_numpy()
        self.encoded_labels = df['encoded_labels'].to_numpy()
        self.segment_start_index = df['segment_start_index'].to_numpy()
        self.segment_length = df['segment_length'].to_numpy()
        self.transform = transform

        # Optional: you can store a shape attribute to document the shape 
        # of data that __getitem__ will return. 
        # We'll initialize it to None and fill it when the first item is fetched.
        self.sample_shape = None

    def __len__(self):
        """
        Returns the total number of samples in the dataset.
        """
        return len(self.file_paths)

    def __getitem__(self, idx):
        """
        Retrieves the sample (and possibly label) from the dataset at index 'idx'.

        In this custom dataset:
          1. We open the compressed file corresponding to self.file_paths[idx].
          2. We slice out the segment using self.segment_start_index[idx] and
             self.segment_length[idx].
          3. If transform=True, we apply additional transformations (shifting, scaling, etc.).
          4. We return a PyTorch tensor containing the processed segment.

        Parameters
        ----------
        idx : int
            Index of the sample to be fetched.

        Returns
        -------
        torch.Tensor
            A tensor representing the selected segment, after optional transformations.
        """
        # Load the gzip-compressed NumPy file
        file_path = self.file_paths[idx].replace('\\', '/')
        with gzip.open('/workspace/melee_project_data/input_np/' + file_path, 'rb') as f:
            segment = np.load(f)

        # Determine slice boundaries
        start = int(self.segment_start_index[idx])
        end = start + int(self.segment_length[idx])

        # Extract the segment
        segment = segment[:, start:end]

        # Apply transformations if requested
        if self.transform:
            # Output shape is (9 + 4, segment_length): 4 analog rows, 4 zero-indicator rows, 5 button rows
            transformed = np.zeros((9 + 4, int(self.segment_length[idx])))

            # 1) Shift and scale analog inputs to [0, 1]
            analog_transformed = np.copy(segment[0:4])
            analog_transformed[analog_transformed > 0] -= 0.2875 + 0.0125
            analog_transformed[analog_transformed < 0] += 0.2875 - 0.0125
            analog_transformed *= 0.5 / 0.725
            analog_transformed += 0.5
            transformed[0:4] = analog_transformed

            # 2) Mark positions where analog inputs are zero
            transformed[4:8] += (segment[:4] == 0)

            # # Possible additional transformations:
            # # 3) Some custom “transition” measure on last 5 rows
            # prepend = np.expand_dims(segment[-5:, 0], axis=1)
            # transitions = np.abs(np.diff(segment[-5:], axis=1, prepend=prepend))
            # transformed[8:13] += transitions

            # 4) Add button inputs
            transformed[-5:] += segment[-5:]

        else:
            # If not transforming, produce the simpler (9, segment_length) array
            transformed = np.zeros((9, int(self.segment_length[idx])))

            # 1) Shift and scale analog inputs to [0, 1]
            analog_transformed = np.copy(segment[0:4])
            analog_transformed[analog_transformed > 0] -= 0.2875 + 0.0125
            analog_transformed[analog_transformed < 0] += 0.2875 - 0.0125
            analog_transformed *= 0.5 / 0.725
            analog_transformed += 0.5
            transformed[0:4] = analog_transformed

            # 2) Transform the Trigger to 0/1
            transformed[-5] += (segment[-5] > 0.5)

            # 3) The last 4 rows become button inputs
            transformed[-4:] += segment[-4:]

        # Convert to PyTorch tensor
        segment_tensor = torch.from_numpy(transformed).float()

        # Optionally store the shape of the output the first time __getitem__ is called
        if self.sample_shape is None:
            self.sample_shape = segment_tensor.shape

        return segment_tensor


def prepare_data_loaders(train_df, test_df, batch_size, num_workers, transform=True):
    """
    Creates DataLoader objects for training and testing sets.

    Parameters
    ----------
    train_df : pd.DataFrame
    test_df : pd.DataFrame
    batch_size : int
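    num_workers : int
    transform : bool, default=True
        Passed to TrainingDataset; if True, the zero-indicator rows are included.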

    Returns
    -------
    dict of DataLoader
        'train' -> training DataLoader
        'test' -> testing DataLoader
    """
    train_dataset = TrainingDataset(train_df, transform=transform)
    test_dataset = TrainingDataset(test_df, transform=transform)

    loaders = {
        'train': DataLoader(
            train_dataset, 
            batch_size=batch_size, 
            shuffle=True, 
            num_workers=num_workers, 
            pin_memory=True,
            persistent_workers=True
        ),
        'test': DataLoader(
            test_dataset, 
            batch_size=batch_size, 
            shuffle=False, 
            num_workers=num_workers, 
            pin_memory=True,
            persistent_workers=True
        )
    }
    return loaders


## Training and Prediction
Our training loop includes a progress bar whose statistics update after every five minutes of training. It displays the batches per second, which is useful for comparing how much of a difference compiling the model makes. The progress bar shows the average loss over the previous five minutes as well as the best loss the model has achieved. A patience counter keeps track of the number of five-minute intervals in which the loss has not improved.

We also display the maximum and minimum of both the gradients and the parameters. This is a holdover from debugging the model, where we learned that we cannot use mixed-precision training to train the autoencoder.
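
The five-minute bookkeeping can be sketched in isolation. This is a minimal illustration using a hypothetical class, `VirtualEpochTracker`, with an injectable clock so that it can be demonstrated without waiting five minutes; the notebook's own training loop keeps the same counters inline rather than in a class.

```python
import time


class VirtualEpochTracker:
    """Accumulates a sample-weighted loss and reports it every `interval` seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(self, interval=5 * 60, clock=time.time):
        self.interval = interval
        self.clock = clock
        self.start = clock()
        self.loss_sum = 0.0
        self.count = 0
        self.best = float('inf')
        self.patience = 0

    def update(self, batch_loss, batch_size):
        """Record one batch; return the virtual-epoch loss if the interval has elapsed, else None."""
        self.loss_sum += batch_loss * batch_size
        self.count += batch_size
        if self.clock() - self.start <= self.interval:
            return None
        vepoch_loss = self.loss_sum / self.count
        if vepoch_loss < self.best:
            self.best = vepoch_loss
        else:
            self.patience += 1  # counts intervals without improvement
        # Reset the accumulators for the next virtual epoch
        self.start = self.clock()
        self.loss_sum, self.count = 0.0, 0
        return vepoch_loss


# Demonstration with a fake clock that advances 100 seconds per call
ticks = iter(range(0, 10_000, 100))
tracker = VirtualEpochTracker(interval=300, clock=lambda: next(ticks))
for loss in [1.0, 0.8, 0.6, 0.4, 0.9, 0.9, 0.9, 0.9]:
    result = tracker.update(loss, batch_size=16)
    if result is not None:
        print(f'virtual epoch loss={result:.3f} best={tracker.best:.3f} patience={tracker.patience}')
```

Reporting on a wall-clock interval rather than per epoch gives regular feedback even when one pass over the data takes hours.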

In [None]:
def train_model_with_virtual_epochs(model, criterion, optimizer, loaders, device, channels, segment_length, num_epochs=1, bce_scale=100):
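    # NOTE: the scheduler is created here but never stepped below, so the learning rate stays constant.
    scheduler = ReduceLROnPlateau(optimizer, 'min', patience=15, factor=0.1)
    best_loss = float('inf')
    # NOTE: best_model is never updated below, so this function returns None.
    best_model = None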
    
    vepoch_total = 0
    vepoch_loss_sum = 0
    best_vepoch_loss = float('inf')
    early_stopping_patience = 0

    for epoch in range(num_epochs):
        model.train()
        train_loader_tqdm = tqdm(loaders['train'], desc=f'Epoch {epoch+1}/{num_epochs}', unit='batch')
        virtual_epoch_start_time = time.time()

        # Initialize variables for tracking gradient and parameter stats
        grad_max = float('-inf')
        grad_min = float('inf')
        param_max = float('-inf')
        param_min = float('inf')

        for batch_number, target_cpu in enumerate(train_loader_tqdm):
            target_gpu = target_cpu.to(device)
            optimizer.zero_grad()
            output_gpu = model(target_gpu)
            loss = criterion(output_gpu, target_gpu) / (channels * segment_length * target_cpu.size(0))
            
            loss.backward()

            # Track max and min of gradients
            batch_grad_max = max((p.grad.max().item() for p in model.parameters() if p.grad is not None), default=grad_max)
            batch_grad_min = min((p.grad.min().item() for p in model.parameters() if p.grad is not None), default=grad_min)
            grad_max = max(grad_max, batch_grad_max)
            grad_min = min(grad_min, batch_grad_min)

            optimizer.step()

            vepoch_total += target_cpu.size(0)
            vepoch_loss_sum += loss.item() * target_cpu.size(0)

            if time.time() - virtual_epoch_start_time > 60*5:
                vepoch_loss = vepoch_loss_sum / vepoch_total
                if best_vepoch_loss > vepoch_loss:
                    best_vepoch_loss = vepoch_loss
                else:
                    early_stopping_patience += 1

                # Calculate max and min of model parameters at the end of the virtual epoch
                param_max = max(p.data.max().item() for p in model.parameters())
                param_min = min(p.data.min().item() for p in model.parameters())

                train_loader_tqdm.set_postfix(
                    Best=f'{best_vepoch_loss * bce_scale:.10f}',
                    Vepoch=f'{vepoch_loss * bce_scale:.10f}',
                    patience=early_stopping_patience,
                    Grad_Max=grad_max,
                    Grad_Min=grad_min,
                    Param_Max=param_max,
                    Param_Min=param_min
                )
                # print('Grad Max:', grad_max, ' Grad Min:', grad_min)
                virtual_epoch_start_time = time.time()
                vepoch_total = 0
                vepoch_loss_sum = 0
                grad_max = float('-inf')  # Reset for next virtual epoch
                grad_min = float('inf')   # Reset for next virtual epoch

    return best_model

def predict(model, loaders, loader, device):
    model.eval()
    predictions = []
    targets = []
    
    with torch.no_grad():
        eval_loader_tqdm = tqdm(loaders[loader], unit='batch')
        
        for _, target_cpu in enumerate(eval_loader_tqdm):
            target_gpu = target_cpu.to(device)
            output_gpu = model(target_gpu)
            # output_gpu = torch.sigmoid(output_gpu)
            
            predictions.append(torch.sigmoid(output_gpu).cpu().numpy())
            targets.append(target_cpu.numpy())
    
    predictions = np.concatenate(predictions, axis=0)
    targets = np.concatenate(targets, axis=0)
    
    

    return predictions, targets



### Custom Loss Function Explanation

Our dataset contains a mixture of analog and binary signals, so we need a loss function that can handle both types effectively. In the code snippet, we define a `CustomLoss` class that does the following:

1. **Analog Channels (first 4 channels) with MSE-like Objective**  
   - We want our network to reconstruct the analog stick values in a way that maps closely back to the original controller bins.  
   - Because the model outputs logits that get fed through a sigmoid for the analog channels, we first apply:
     ```python
     torch.sigmoid(pred[:, 0:4, :])
     ```
     to constrain predictions to the range $[0, 1]$.  
   - We then compare that to the ground-truth analog values (also in $[0, 1]$), but we scale the difference by a factor $\tfrac{1}{5 \times 0.0086206896}$. Here $0.0086206896$ is approximately the bin width of the transformed analog sticks, and the extra factor of 5 is a manual scaling that keeps the MSE term on a comparable scale to the binary loss term.
   - Finally, we take the squared difference and feed it to a BCE-like criterion (with `zeros` as target). This construction effectively pushes the scaled difference toward zero. Although slightly unorthodox in its implementation (since we typically see `nn.MSELoss` directly), the outcome is similar: small differences between predictions and targets lead to a smaller loss, encouraging precise analog reconstruction.
   
   We apply a log to the MSE term for the analog channels to make the loss less sensitive to small errors. In the graph below, $x^2$ (green) does not level off near zero, but $\log(x^2)$ (blue) does. We prefer the leveling off because we do not want the model to be penalized too much for small errors (once a prediction is within half a bin width of the target, where a bin is $0.008620689655172415$ wide, rounding recovers the target).

   ![Log MSE](quadratic_vs_log_quadratic.png)

2. **Binary Channels (remaining channels) with BCE**  
   - The rest of the channels are binary (buttons pressed or not). We use a binary cross-entropy loss to measure how well the model predicts these button states:
     ```python
     self.BCE_buttons(pred[:, 4:, :], target[:, 4:, :])
     ```
   - Because some buttons are pressed much more (or less) frequently than others, we use `pos_weight` for class-imbalance correction if `weighted=True`. This gives the model a stronger penalty for misclassifying the minority (pressed) cases, preventing it from trivially predicting zeros.

3. **Combining the Losses**  
   - We sum the analog-channel loss and the binary-channel loss into a single scalar.  
   - The final loss ensures the model learns to reconstruct both the analog inputs (through the scaled MSE-like term) and the binary inputs (through the BCE term).  
   - The hyperparameter `bce_scale` (or an equivalent division factor) can be adjusted to balance the importance of reconstructing analog vs. binary channels.
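
The stick term can be worked through numerically. With a zero target, binary cross-entropy on a logit $z$ reduces to $\log(1 + e^{z})$, so the loss for an error of $k$ bins depends approximately only on $z = (k/5)^2 / 2$. A minimal sketch in plain Python, assuming the bin width and the factor of 5 described above (the function names are illustrative and are not part of the notebook):

```python
import math

BIN_WIDTH = 0.008620689655172415  # approximately the constant used in the loss
MANUAL_SCALE = 5                  # extra scaling used in the loss


def bce_with_logits_zero_target(z):
    """Binary cross-entropy of logit z against target 0, i.e. -log(1 - sigmoid(z))."""
    sigmoid = 1.0 / (1.0 + math.exp(-z))
    return -math.log(1.0 - sigmoid)


def stick_loss(pred, target):
    """Per-element stick loss; `pred` is the prediction after the sigmoid, in [0, 1]."""
    z = ((pred - target) / (MANUAL_SCALE * BIN_WIDTH)) ** 2 / 2
    return bce_with_logits_zero_target(z)


# The zero-target BCE matches log(1 + e^z) (softplus)
for z in [0.0, 0.5, 2.0]:
    assert math.isclose(bce_with_logits_zero_target(z), math.log1p(math.exp(z)))

# Loss as a function of how many bins the prediction is off
for k in [0, 1, 5, 10]:
    print(k, round(stick_loss(0.5 + k * BIN_WIDTH, 0.5), 4))
```

In this sketch the smallest possible value of the stick term is $\log 2$, reached at zero error, and the loss grows slowly for errors of only a few bins.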

Overall, this custom loss function is designed to tackle two key challenges: (1) making sure the model can accurately generate analog outputs that match the discrete bins of the controller sticks, and (2) dealing with significant class imbalance in the binary button-press signals. By scaling and weighting different components of the loss, we aim to encourage the network to pay attention to both the analog and digital elements of the signal without allowing one to dominate the optimization.
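
The class-imbalance correction can be illustrated with the button frequencies used in `CustomLoss`. This sketch, in plain Python, computes `pos_weight = (1 - p) / p` and evaluates the weighted binary cross-entropy for a single logit. `weighted_bce` is a hypothetical helper that follows the `pos_weight` formula documented for PyTorch's `BCEWithLogitsLoss`; it is not the notebook's implementation.

```python
import math

# Fraction of frames in which each button is pressed (values from the notebook)
button_means = {
    'TRIGGER_LOGICAL': 0.16908772957310006,
    'Z': 0.008974353071937505,
    'A': 0.060945588829374495,
    'B': 0.04591526858731047,
    'X_or_Y': 0.09663690337362206,
}

# Rarer buttons get larger weights on their positive (pressed) frames
pos_weight = {name: (1.0 - p) / p for name, p in button_means.items()}


def weighted_bce(logit, target, weight):
    """-[weight * t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))] for one element."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(weight * target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


for name, w in pos_weight.items():
    missed_press = weighted_bce(-2.0, 1.0, w)  # pressed, but predicted "not pressed"
    false_press = weighted_bce(2.0, 0.0, w)    # not pressed, but predicted "pressed"
    print(f'{name:16s} pos_weight={w:7.2f} missed={missed_press:8.2f} false={false_press:5.2f}')
```

The weight multiplies only the pressed-frame term, so a missed press on a rare button such as Z costs far more than a false press of the same confidence.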

In [None]:
class CustomLoss(nn.Module):
    def __init__(self, bce_scale=100, transform=False, weighted=False, channels=13, segment_length=3600):
        super(CustomLoss, self).__init__()
        
        # Fraction of times each button is pressed in your sample
        buttons_sample_mean = [
            0.16908772957310006,  # TRIGGER_LOGICAL
            0.008974353071937505, # Z
            0.060945588829374495, # A
            0.04591526858731047,  # B
            0.09663690337362206   # X_or_Y
        ]
        # Sample means of the stick-zero indicator channels (only used when transform=True)
        stick_zero_sample_mean = [
            0.45849791926398437,  # JSTICK_X_LOGICAL
            0.6879025510132348,   # JSTICK_Y_LOGICAL
            0.9726537459234259,   # CSTICK_X_LOGICAL
            0.971675825912117     # CSTICK_Y_LOGICAL
        ]

        # Choose the per-channel means that match the non-analog channels
        if transform:
            # The zero-indicator channels come first, followed by the buttons
            sample_means = stick_zero_sample_mean + buttons_sample_mean
        else:
            sample_means = buttons_sample_mean

        # pos_weight for each dimension: (1 - p) / p
        pos_weight_vals = np.zeros((channels-4, segment_length))
        for i, mean in enumerate(sample_means):
            p_pos = mean
            p_neg = 1.0 - mean
            
            pos_weight_vals[i,:] += p_neg / p_pos
        pos_weight_tensor = torch.tensor(pos_weight_vals, dtype=torch.float, device='cuda')

        if weighted:
            # Use pos_weight instead of 'weight'
            self.BCE_buttons = nn.BCEWithLogitsLoss(reduction='sum', pos_weight=pos_weight_tensor)
        else:
            # forward() passes raw logits, so the unweighted case also needs the with-logits variant
            self.BCE_buttons = nn.BCEWithLogitsLoss(reduction='sum')

        # Save the other components
        self.bce_scale = bce_scale
        self.BCE_sticks = nn.BCEWithLogitsLoss(reduction='sum')

    def forward(self, pred, target):
        """
        pred, target shape: (B, Channels, T)
        We'll assume:
          - pred[:, 0:4, :] are analog predictions (MSE)
          - pred[:, 4:, :] are button predictions (BCE)
        """
        # 1) MSE for first 4 analog channels
        mse_loss = ((torch.sigmoid(pred[:, 0:4, :])- target[:, 0:4, :]) / (5 * 0.008620689623058)).pow(2) / 2
        zeros = torch.zeros_like(mse_loss)
        bce_loss_sticks = self.BCE_sticks(mse_loss, zeros)
        # mse_loss_cstick = self.MSE(pred[:, 2:4, :], target[:, 2:4, :]) 
        # 2) BCE for the rest
        bce_loss_buttons = self.BCE_buttons(pred[:, 4:, :], target[:, 4:, :])

        # Return the combined loss
        return  bce_loss_buttons + bce_loss_sticks

## Initialize the Model

In [14]:
# transform = True adds binary features corresponding to when the analog inputs are 0.
transform = True
# bce_scale is a tunable parameter that scales the binary cross-entropy loss.
bce_scale = 1
# weighted = True weights the loss function to account for the imbalance of the button being pressed.
weighted = True

loaders = prepare_data_loaders(train_df, test_df, batch_size=16, num_workers=20, transform=transform)
# Grab one item (segment tensor) from the train dataset
train_dataset = loaders['train'].dataset
first_item = train_dataset[0]
channels = first_item.size(0)

from Convolutional_Autoencoder_Model import ResNet_Autoencoder
# Initialize the model
model = ResNet_Autoencoder(channels)
state_dict = torch.load('/workspace/melee_project_data/autoencoder_models/autoencoder_revised_one_epoch_4.pt')
model.load_state_dict(state_dict)
model = model.cuda()
model = torch.compile(model, mode='max-autotune')
# With the size of an input we can get a model summary.
summary(model, input_size=(channels, segment_length))

13


## Model Training
We train the model for an epoch (which takes about three hours), evaluate it, then train another epoch.

In [23]:
criterion = CustomLoss(bce_scale=bce_scale, transform=transform, weighted=weighted, channels=channels, segment_length=segment_length)

optimizer = Adam(model.parameters(), lr=0.0001)
num_epochs = 1

# This seems to sometimes help
gc.collect()
torch.cuda.empty_cache()
# Train the model
train_model_with_virtual_epochs(model, criterion, optimizer, loaders, 'cuda', channels, segment_length, num_epochs, bce_scale=bce_scale)


Epoch 1/1: 100%|██████████| 92029/92029 [3:09:59<00:00,  8.07batch/s, Best=0.4978289099, Grad_Max=4.41, Grad_Min=-6.17, Param_Max=3.56, Param_Min=-2.35, Vepoch=0.4978289099, patience=23]  


### Save the Model
We need to save the model in its original (uncompiled) form so that we can load it later. If we saved only the state dictionary of the compiled model, its keys would not match those of a freshly constructed `ResNet_Autoencoder`, and loading it would fail.

In [16]:
orig_model = model._orig_mod
torch.save(orig_model.state_dict(), '/workspace/melee_project_data/autoencoder_models/autoencoder_revised_one_epoch_5.pt')
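
If a state dictionary has already been saved from the compiled wrapper, its keys are expected to carry an `_orig_mod.` prefix. A minimal sketch of removing that prefix so the keys match an uncompiled model; the helper `strip_compile_prefix` is hypothetical, and a plain dict with made-up layer names stands in for a real state dictionary:

```python
PREFIX = '_orig_mod.'


def strip_compile_prefix(state_dict, prefix=PREFIX):
    """Return a copy of state_dict with a leading `prefix` removed from every key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Stand-in for a state dict saved from a compiled model (layer names are made up)
compiled_style = {'_orig_mod.encoder.weight': [1.0], '_orig_mod.encoder.bias': [0.0]}
print(list(strip_compile_prefix(compiled_style)))  # ['encoder.weight', 'encoder.bias']
```

Saving from `model._orig_mod`, as in the cell above, avoids the need for this step.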


## Stop the notebook
We make sure to stop the notebook here because the next cells use a lot of memory and we do not want to run them by accident.

In [17]:
# stop it from running all
raise Exception('Stop running')

Exception: Stop running

Check to see that the model loads correctly.

In [24]:
from Convolutional_Autoencoder_Model import ResNet_Autoencoder
# Initialize the model
model_2 = ResNet_Autoencoder(channels)
state_dict_2 = torch.load('/workspace/melee_project_data/autoencoder_models/autoencoder_revised_one_epoch_5.pt')
model_2.load_state_dict(state_dict_2)
model_2.to('cuda')
model_2 = torch.compile(model_2, mode = 'max-autotune')


Predict on the test set.

In [25]:
gc.collect()
torch.cuda.empty_cache()

pred, target = predict(model_2, loaders, 'test','cuda')

100%|██████████| 5111/5111 [02:28<00:00, 34.45batch/s]


### Accuracy of the Stick predictions
The transformed stick values are in the range $[0, 1]$, but only take values in increments of roughly $0.00862$. We round the predictions to the nearest permissible value and calculate the percentage of frames that are correct to within a certain number of bins.
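
The metric can be illustrated on a toy example. This sketch assumes the predictions and targets have already been converted to integer bin indices; `within_k_bins` is a hypothetical helper, and the values are made up rather than taken from the notebook's data.

```python
def within_k_bins(pred_bins, target_bins, max_k):
    """Percentage of frames whose predicted bin is at most k bins from the target, for k < max_k."""
    errors = [abs(p - t) for p, t in zip(pred_bins, target_bins)]
    return [100.0 * sum(e <= k for e in errors) / len(errors) for k in range(max_k)]


# Toy integer bin indices (made-up values)
target_bins = [58, 58, 60, 10, 100]
pred_bins = [58, 59, 57, 10, 104]
print(within_k_bins(pred_bins, target_bins, max_k=4))  # [40.0, 60.0, 60.0, 80.0]
```

Each entry is cumulative, so the percentages can only stay the same or increase as the tolerance `k` grows.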

In [26]:
unique = np.unique(target[:,0])
unique_diff = np.diff(unique)
scale = np.mean(unique_diff[1:])

integer_stick_targets = np.round(target[:,0:4] / scale ).astype(np.int16)
integer_stick_pred = np.round(pred[:,0:4] / scale).astype(np.int16)



n = 10

buttons = ['JSTICK_X', 'JSTICK_Y', 'CSTICK_X', 'CSTICK_Y']
# buttons = ['X_or_Y']


stick_accuracy_df = pd.DataFrame(np.arange(n,dtype=np.int16),columns=['How Close'])
# print(summary_df)


for j in range(4):
    unique, counts = np.unique(integer_stick_pred[:,j] - integer_stick_targets[:,j], return_counts=True)
    data = []
    num = np.sum(counts)
    for i in range(n):
        mask = np.abs(unique) <= i
        data += [np.sum(counts[mask]) / num * 100]
    stick_accuracy_df[buttons[j]] = data
        
stick_accuracy_df



| | How Close | JSTICK_X | JSTICK_Y | CSTICK_X | CSTICK_Y |
|---|---|---|---|---|---|
| 0 | 0 | 21.846934 | 30.849162 | 63.738467 | 53.31581 |
| 1 | 1 | 38.235865 | 51.5148 | 87.480318 | 82.347296 |
| 2 | 2 | 47.925924 | 59.925596 | 92.453506 | 89.021015 |
| 3 | 3 | 55.149839 | 65.521103 | 94.54251 | 91.803883 |
| 4 | 4 | 60.622391 | 69.701577 | 95.611314 | 93.420068 |
| 5 | 5 | 64.945117 | 73.027633 | 96.227636 | 94.466899 |
| 6 | 6 | 68.510912 | 75.825144 | 96.615769 | 95.182737 |
| 7 | 7 | 71.553068 | 78.244224 | 96.876054 | 95.696722 |
| 8 | 8 | 74.206838 | 80.364655 | 97.060998 | 96.083443 |
| 9 | 9 | 76.563239 | 82.240707 | 97.198743 | 96.383779 |


We check to see how accurate the model is at predicting the stick being zero. We engineered a binary feature that is 1 if the stick value is zero, but here we check whether the model's predicted stick value, rather than the binary feature, is zero.

In [27]:
target_stick_is_zero = ((integer_stick_targets == 0)*1).astype(np.int16)
pred_stick_is_zero = ((integer_stick_pred == 0)*1).astype(np.int16)

zero_accuracy = []
# find the accuracy of the model when the stick is zero
for j in range(4):
    diff = np.abs(target_stick_is_zero[:,j] - pred_stick_is_zero[:,j]) 
    data = []
    num_correct = np.sum(diff == 0)
    zero_accuracy.append(num_correct / np.prod(diff.shape) * 100)
print(zero_accuracy)
# zero_accuracy_df = pd.DataFrame(columns=buttons, data=[zero_accuracy])


[95.10578993851693, 97.6362474948198, 99.72547980569992, 99.66317877645301]


By this point the notebook is taking up a lot of memory so we delete some variables.

In [28]:
del integer_stick_targets
del integer_stick_pred
del target_stick_is_zero
del pred_stick_is_zero
del diff
# del train_df
# del train_dataset
# del pred
# del pred_buttons
# del target_buttons
# del target

We check the accuracy of the button values.

In [29]:
buttons = [ 'TRIGGER_LOGICAL', 'Z', 'A', 'B', 'X_or_Y']

# Initializing the DataFrame
button_accuracy_df = pd.DataFrame(columns=['Button', 'Accuracy', 'Acc of 0', 'Acc of 1'])

target_buttons = target[:, 4 + 4 * transform:]
pred_buttons = pred[:, 4 + 4 * transform:] > 0.5
total = target_buttons.shape[0] * target_buttons.shape[2]

# Computing accuracies and filling the DataFrame
rows = []  # List to hold row data

for i, button in enumerate(buttons):
    correct_predictions = np.sum(target_buttons[:, i] == pred_buttons[:, i])
    correct_zeros = np.sum((target_buttons[:, i] == 0) & (pred_buttons[:, i] == 0))
    correct_ones = np.sum((target_buttons[:, i] == 1) & (pred_buttons[:, i] == 1))

    accuracy = correct_predictions / total * 100
    acc_of_0 = correct_zeros / np.sum(target_buttons[:, i] == 0) * 100 if np.sum(target_buttons[:, i] == 0) > 0 else 0
    acc_of_1 = correct_ones / np.sum(target_buttons[:, i] == 1) * 100 if np.sum(target_buttons[:, i] == 1) > 0 else 0

    rows.append({
        'Button': button,
        'Accuracy': accuracy,
        'Acc of 0': acc_of_0,
        'Acc of 1': acc_of_1
    })

# Build the DataFrame directly from the collected rows
button_accuracy_df = pd.DataFrame(rows, columns=['Button', 'Accuracy', 'Acc of 0', 'Acc of 1'])

# Output the DataFrame
button_accuracy_df

| | Button | Accuracy | Acc of 0 | Acc of 1 |
|---|---|---|---|---|
| 0 | TRIGGER_LOGICAL | 88.519695 | 90.179586 | 95.243926 |
| 1 | Z | 98.780531 | 98.769754 | 99.966871 |
| 2 | A | 93.515164 | 93.13387 | 99.312793 |
| 3 | B | 95.820327 | 95.658194 | 99.278211 |
| 4 | X_or_Y | 87.05754 | 86.287204 | 94.053059 |
