# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; font-size:120%; text-align:left;padding: 0px; border-bottom: 3px solid ">Google Slow VS Fast AI Runtime</p>

## Competition Understanding

At first glance this competition has a lot of moving parts, which makes it hard to see how to approach it. In this notebook I go through some of my thought processes to understand the competition and its data.

Initial thoughts:
- **What are we predicting?** - Predict the runtime length of ML graphs and configurations.
- **What does the data look like?** - Graph configurations in npz format.
    - Two "collection types": `tile` and `layout` collections
- **What does the target look like?** - "Finally, for the layout collections, your job is to predict the order of the indices from best-to-worse configurations (i.e., ones leading to the smallest `d["config_runtime"]`)"
- **How are we evaluated?** - Two evaluation metrics, described below.

## Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from glob import glob
from pathlib import Path
from tqdm import tqdm
import seaborn as sns
import os


## Data Structure
- `npz_all` contains the data.
    - Two collection types: `layout` and `tile` folders.
        - `layout` has `nlp` and `xla`
        - `tile` has `xla`
    - Each `layout` sub-collection has `default` and `random` variants; `tile/xla` has no such split.
    - The data is split into `train/valid/test`.
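
The folder layout described above can be checked by counting `.npz` files per directory. The snippet below is a minimal sketch using only the standard library; `count_npz_files` and `DATA_ROOT` are names I made up for illustration, and the root path is assumed from the paths used elsewhere in this notebook.

```python
from collections import Counter
from pathlib import Path


def count_npz_files(root):
    """Count .npz files per directory, keyed by the path relative to root."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*.npz"):
        counts[path.parent.relative_to(root).as_posix()] += 1
    return dict(sorted(counts.items()))


# Assumed Kaggle location (hypothetical constant, based on paths in this notebook).
DATA_ROOT = "/kaggle/input/predict-ai-model-runtime/npz_all/npz"
if Path(DATA_ROOT).exists():
    for folder, n_files in count_npz_files(DATA_ROOT).items():
        print(f"{folder}: {n_files}")
```

On Kaggle this should print one line per leaf folder, for example `tile/xla/train` or `layout/nlp/default/valid`, which makes the collection/variant/split hierarchy easy to see.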

In [None]:
# !tree -I *.npz /kaggle/input/predict-ai-model-runtime/

## Evaluation Metric

1. For collection `tile:xla`
    - We use (1 - slowdown) of the top-K predictions, which reflects how much slower the best of the model's top-K predicted configurations is than the actual fastest configuration.
2. For collection `layout:*`
    - We use the Kendall Tau correlation, a ranking metric that measures how well the model-predicted ranking corresponds to the real ranking of runtimes.
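
To make the two metrics concrete, here is a small pure-Python sketch on toy numbers. It is my own illustration, not the official scoring code: `topk_slowdown_score` uses the `2 - predbest/best` form that `score_tile` uses later in this notebook, and `kendall_tau` is the simple tau-a variant that ignores ties, which may differ from how the competition handles them.

```python
from itertools import combinations


def topk_slowdown_score(runtimes, top_k_indices):
    """Return 2 - (best runtime among the picks) / (best runtime overall)."""
    best_overall = min(runtimes)
    best_picked = min(runtimes[i] for i in top_k_indices)
    return 2.0 - best_picked / best_overall


def kendall_tau(predicted, actual):
    """Kendall tau-a: (concordant - discordant) / number of pairs, ties ignored."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(actual)), 2))
    for i, j in pairs:
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy example: configuration 1 is the fastest, but the picks are 3 and 0.
toy_runtimes = [120, 100, 150, 110]
print(topk_slowdown_score(toy_runtimes, [3, 0]))       # a bit below 1: best pick is 10% slower
print(kendall_tau([1, 2, 4, 3], [10, 20, 30, 40]))      # one swapped pair out of six
```

A perfect top-K pick scores 1.0 on the first metric, and a perfectly ordered ranking gives a tau of 1.0 on the second.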

## EDA

From the repo we get that:

> Suppose a `.npz` file stores a graph (representing a kernel) with `n` nodes and `m` edges. In addition, suppose we compile the graph with `c` different configurations, and run each on a TPU. Crucially, the configuration is at the graph-level. Then, the `.npz` file stores the following dictionary (can be loaded with `d = dict(np.load("npz/tile/xla/train/<pick 1>.npz"))`):
>   - Key `node_feat`: contains `float32` matrix with shape `(n, 140)`. The `u`-th row contains the feature vector for node `u` < `n` (please see Subsection "Node Features", below). Nodes are ordered topologically.
>   - Key `node_opcode` contains `int32` vector with shape `(n, )`. The `u`-th entry stores the op-code for node u (please see the mapping of opcode to instruction name here).
>   - Key `edge_index` contains `int32` matrix with shape `(m, 2)`. If entry `i` is = `[u, v]` (where `0 <= u, v < n`), then there is a directed edge from node `u` to node `v`, where `u` consumes the output of `v`.
>   - Key `config_feat` contains `float32` matrix with shape `(c, 24)` with row `j` containing the (graph-level) configuration feature vector (please see Subsection "Tile Config Features").
>   - Keys `config_runtime` and `config_runtime_normalizers`: both are `int64` vectors of length `c`. Entry `j` stores the runtime (in nanoseconds) of the given graph compiled with configuration `j` and a default configuration, respectively. Samples from the same graph may have slightly different `config_runtime_normalizers` because they are measured from different runs on multiple machines.
> Finally, for the tile collection, your job is to predict the indices of the best configurations (i.e., ones leading to the smallest `d["config_runtime"] / d["config_runtime_normalizers"]`).

>Let's look at one file from each collection:<br>
>    - `layout/nlp` : '/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/nlp/default/train/electra_base_batch_size_16_train.npz'
>    - `layout/xla` : '/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/xla/default/train/resnet_v2_152_batch_64.npz'
>    - `tile/xla` :   '/kaggle/input/predict-ai-model-runtime/npz_all/npz/tile/xla/train/xception_imagenet_9b1704c883ceb0d.npz'
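
Before plotting, it helps to confirm the keys, dtypes, and shapes described in the repo text. The helper below is a minimal sketch; `describe_npz` is a name I introduce for illustration, and the example path is simply the `tile/xla` file listed above.

```python
import os

import numpy as np


def describe_npz(path):
    """Return {key: (dtype, shape)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {key: (str(data[key].dtype), data[key].shape) for key in data.files}


# Example file taken from the list above; skipped when the data is not mounted.
example_path = (
    "/kaggle/input/predict-ai-model-runtime/npz_all/npz/"
    "tile/xla/train/xception_imagenet_9b1704c883ceb0d.npz"
)
if os.path.exists(example_path):
    for key, (dtype, shape) in describe_npz(example_path).items():
        print(f"{key:30s} {dtype:10s} {shape}")
```

Running this on one file from each collection is a quick way to see how the `layout` files differ from the `tile` files in their keys and shapes.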

In [None]:
# Helper for the per-file EDA in the cells below

# Function to plot histograms for 1D arrays
def plot_histogram(data, title, xlabel, ylabel):
    plt.figure(figsize=(10, 6))
    sns.histplot(data, bins=30, kde=True)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()

In [None]:
# Load the .npz file
npz_file_path = '/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/nlp/default/train/electra_base_batch_size_16_train.npz'
npz_data = np.load(npz_file_path)

# Plot histogram for node_opcode
plot_histogram(npz_data['node_opcode'], 'Distribution of Node Opcodes', 'Opcode', 'Frequency')

# Plot histogram for config_runtime
plot_histogram(npz_data['config_runtime'], 'Distribution of Config Runtime', 'Runtime', 'Frequency')

# Plot some sample histograms for node_feat (first 5 features)
for i in range(5):
    plot_histogram(npz_data['node_feat'][:, i], f'Distribution of Node Feature {i+1}', f'Feature {i+1}', 'Frequency')

In [None]:
# Load the .npz file
npz_file_path = '/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/xla/default/train/resnet_v2_152_batch_64.npz'
npz_data = np.load(npz_file_path)

# Plot histogram for node_opcode
plot_histogram(npz_data['node_opcode'], 'Distribution of Node Opcodes', 'Opcode', 'Frequency')

# Plot histogram for config_runtime
plot_histogram(npz_data['config_runtime'], 'Distribution of Config Runtime', 'Runtime', 'Frequency')

# Plot some sample histograms for node_feat (first 5 features)
for i in range(5):
    plot_histogram(npz_data['node_feat'][:, i], f'Distribution of Node Feature {i+1}', f'Feature {i+1}', 'Frequency')

In [None]:
# Load the .npz file
npz_file_path = '/kaggle/input/predict-ai-model-runtime/npz_all/npz/tile/xla/train/xception_imagenet_9b1704c883ceb0d.npz'
npz_data = np.load(npz_file_path)

# Plot histogram for node_opcode
plot_histogram(npz_data['node_opcode'], 'Distribution of Node Opcodes', 'Opcode', 'Frequency')

# Plot histogram for config_runtime
plot_histogram(npz_data['config_runtime'], 'Distribution of Config Runtime', 'Runtime', 'Frequency')

# Plot some sample histograms for node_feat (first 5 features)
for i in range(5):
    plot_histogram(npz_data['node_feat'][:, i], f'Distribution of Node Feature {i+1}', f'Feature {i+1}', 'Frequency')

## Insights from basic EDA

### Node Opcodes (`node_opcode`)
- The operation codes (opcodes) are spread over a wide range, roughly 0 to 100.

### Config Runtime (`config_runtime`)
- The runtime values are distributed between 2.15 * 10^7 and 2.31 * 10^7. There's no distinct peak, but the data is slightly right-skewed.

### Node Features (`node_feat`)
- Histograms were plotted for the first 5 features.
- Features 1 and 5 exhibit a continuous distribution.
- Features 2 and 3 are all zeros.
- Feature 4 appears to be categorical (either 0 or 1).

Through this EDA, we have revealed the basic characteristics and distributions of each feature and target variable. The next step could involve investigating the correlations between features, as well as the relationship between the features and the target variables.
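
The observations about constant, binary, and continuous node features were read off the histograms. They can also be checked programmatically. This is a rough sketch under my own assumptions: `summarize_columns` is a hypothetical helper, and labelling a column by its number of unique values is a heuristic, not a rule from the competition.

```python
import numpy as np


def summarize_columns(features, max_columns=5):
    """Label each of the first columns as 'constant', 'binary', or 'other'."""
    labels = []
    for col in range(min(max_columns, features.shape[1])):
        n_unique = np.unique(features[:, col]).size
        if n_unique == 1:
            labels.append("constant")
        elif n_unique == 2:
            labels.append("binary")
        else:
            labels.append("other")
    return labels


# Toy matrix mimicking the pattern described above (not real competition data).
toy = np.array(
    [
        [0.1, 0.0, 0.0, 0.0, 3.2],
        [0.7, 0.0, 0.0, 1.0, 1.5],
        [0.4, 0.0, 0.0, 1.0, 2.8],
    ],
    dtype=np.float32,
)
print(summarize_columns(toy))
```

Applied to `npz_data['node_feat']` with a larger `max_columns`, the same helper gives a quick overview of all 140 feature columns without plotting each one.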

## Improved Model

In [None]:
!pip install torch-geometric torch-scatter

In [None]:
# Model imports
import torch
from torch import nn
from torch import Tensor
from torch_geometric.nn import GCNConv
from torch.utils.data import DataLoader, Dataset
# Use the GPU when one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

In [None]:
# We can load all the data in the dataframes to make working with it easier
def load_df(directory):
    splits = ["train", "valid", "test"]
    dfs = dict()
    
    for split in splits:
        path = os.path.join(directory, split)
        files = os.listdir(path)
        list_df = []
        
        for file in files:
            d = dict(np.load(os.path.join(path,file)))
            d['file'] = file
            list_df.append(d)
        dfs[split] = pd.DataFrame.from_dict(list_df)
    return dfs

If you run the following cell with every line uncommented, the Kaggle kernel runs out of memory and crashes, so we have to study the datasets one at a time.

In [None]:
tile_xla = load_df("/kaggle/input/predict-ai-model-runtime/npz_all/npz/tile/xla/")
#layout_nlp_random = load_df("/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/nlp/random/")
#layout_nlp_default = load_df("/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/nlp/default/")
#layout_xla_random = load_df("/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/xla/random/")
#layout_xla_default = load_df("/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/xla/default/")
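
One way around the memory problem is to stream files instead of loading a whole collection into a dataframe. The generator below is a minimal sketch; `iter_npz` is a hypothetical helper of mine, and I have not verified that it is enough to process the full `layout` collections within the Kaggle memory limit.

```python
import os

import numpy as np


def iter_npz(directory, split):
    """Yield (filename, arrays) one file at a time instead of loading a whole split."""
    path = os.path.join(directory, split)
    for name in sorted(os.listdir(path)):
        if not name.endswith(".npz"):
            continue
        with np.load(os.path.join(path, name)) as data:
            yield name, {key: data[key] for key in data.files}


# Example: count configurations per file without keeping the arrays around.
LAYOUT_DIR = "/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout/xla/default/"
if os.path.isdir(LAYOUT_DIR):
    for name, arrays in iter_npz(LAYOUT_DIR, "train"):
        print(name, len(arrays["config_runtime"]))
```

Only one file is held in memory at a time, so per-file statistics can be gathered even when the full split does not fit.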

In [None]:
train_data = tile_xla["train"]

In [None]:
train_data.head()

In [None]:
# Define dataset and Model

class TileDataset(Dataset):
    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        config_feat = torch.tensor(row['config_feat'].astype(np.float32))
        node_feat = torch.tensor(row['node_feat'].astype(np.float32))
        # Use int64 indices: PyG layers generally expect torch.long edge indices
        node_opcode = torch.tensor(row['node_opcode'].astype(np.int64))
        edge_index = torch.tensor(np.swapaxes(row['edge_index'],0,1).astype(np.int64))
        target = (row['config_runtime']/row['config_runtime_normalizers']).astype(np.float32)
        # minmax scale the target, we only care about order
        target = (target-target.min())/(target.max()-target.min())
        target = torch.tensor(target)
        return config_feat,node_feat,node_opcode,edge_index,target

In [None]:
class SimpleModel(torch.nn.Module):
    def __init__(self, hidden_channels, graph_feats, hidden_dim):
        super().__init__()
        op_embedding_dim = 4 # I choose 4-dimensional embedding
        self.embedding = torch.nn.Embedding(120, #120 different op-codes
                                            op_embedding_dim,
                                           )
        assert len(hidden_channels)>0
        in_channels = op_embedding_dim+140
        self.convs = torch.nn.ModuleList()
        last_dim = hidden_channels[-1]
        self.convs.append(GCNConv(in_channels, hidden_channels[0]))
        for i in range(len(hidden_channels)-1):
            self.convs.append(GCNConv(hidden_channels[i], hidden_channels[i+1]))
        self.convs.append(GCNConv(last_dim, graph_feats))
        
        self.dense = torch.nn.Sequential(nn.Linear(graph_feats+24, 64),
                                         nn.ReLU(),
                                         nn.Linear(64, 64),
                                         nn.ReLU(),
                                         nn.Linear(64, 1),
                                        )

        self.norms = torch.nn.ModuleList()
        for i in range(len(hidden_channels)):
            self.norms.append(torch.nn.BatchNorm1d(hidden_channels[i]))
        self.norms.append(torch.nn.BatchNorm1d(graph_feats))

    def forward(self, x_cfg: Tensor,x_feat: Tensor, x_op: Tensor, edge_index: Tensor) -> Tensor:
        
        # get graph features: node features concatenated with opcode embeddings
        x = torch.concat([x_feat,self.embedding(x_op)],dim = 1)
        # pass through conv layers
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index).relu()
            x = self.norms[i](x)
        # get 1d graph embedding using average pooling over nodes
        x_graph = torch.mean(x,0)

        # append the (repeated) graph embedding to every configuration row
        x = torch.concat([x_cfg,x_graph.repeat((len(x_cfg),1))],dim=1)
        # put into dense nn, one prediction per configuration
        x = torch.flatten(self.dense(x))
        return x

model = SimpleModel(hidden_channels = [16,32,16,48],graph_feats = 64,hidden_dim=64).to(device)
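
The pooling and concatenation step in `forward` is easiest to follow by tracking shapes. The NumPy sketch below mirrors that step on random data; the sizes (7 nodes, 3 configurations) are arbitrary values I picked for illustration, while 64 and 24 match `graph_feats` and the tile config feature width used above.

```python
import numpy as np

n_nodes, graph_feats, n_configs, config_dim = 7, 64, 3, 24
rng = np.random.default_rng(0)

# Stand-ins for the output of the last conv layer and for config_feat
node_embeddings = rng.normal(size=(n_nodes, graph_feats))
config_feat = rng.normal(size=(n_configs, config_dim))

# Average pooling over nodes gives one vector per graph
graph_embedding = node_embeddings.mean(axis=0)               # shape (64,)

# Repeat the graph vector for every configuration, then concatenate
tiled = np.tile(graph_embedding, (n_configs, 1))             # shape (3, 64)
dense_input = np.concatenate([config_feat, tiled], axis=1)   # shape (3, 88)

print(graph_embedding.shape, tiled.shape, dense_input.shape)
```

Each row of `dense_input` has `24 + 64 = 88` entries, which is why the first dense layer is `nn.Linear(graph_feats+24, 64)`, and the network produces one runtime score per configuration.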

In [None]:
%%time
# Let's train for several epochs

dataset = TileDataset(tile_xla["train"])
criterion = torch.nn.HuberLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,weight_decay = 0.01)

model.train()
epoch_num = 35
for now_epoch in range(epoch_num):
    print('--------------epoch {}: ------------------'.format(now_epoch))
    # Track the running loss per epoch
    loss_sum = 0
    n = 0
    # Create a fresh progress bar for every epoch
    pbar = tqdm(range(len(dataset)))
    for i in pbar:
        cfg_ft,nd_ft,nd_op,ind,target = dataset[i]
        cfg_ft,nd_ft,nd_op,ind,target = cfg_ft.to(device),nd_ft.to(device),nd_op.to(device),ind.to(device),target.to(device)

        # Reset gradients so they do not accumulate across steps
        optimizer.zero_grad()
        out = model(cfg_ft,nd_ft,nd_op,ind)
        loss = criterion(out, target)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.01)
        optimizer.step()

        loss_sum+=loss.item()
        n+=1
#         pbar.set_description(f'running loss: {(loss_sum/n):.6f},current loss: {(loss.item()):.6f}')

In [None]:
# Evaluate on validation dataset

dataset = TileDataset(tile_xla["valid"])
tile_xla_predictions = []
model.eval()

pbar = tqdm(range(len(dataset)))
# No gradients are needed for evaluation
with torch.no_grad():
    for i in pbar:
        cfg_ft,nd_ft,nd_op,ind,target = dataset[i]
        cfg_ft,nd_ft,nd_op,ind,target = cfg_ft.to(device),nd_ft.to(device),nd_op.to(device),ind.to(device),target.to(device)

        out = model(cfg_ft,nd_ft,nd_op,ind)
        # Move to CPU before converting to NumPy (required when running on GPU)
        tile_xla_predictions.append(np.argsort(out.cpu().numpy())[:5])

def score_tile(predictions, df):
    score = 0
    for i in range(len(df)):
        predbest = min(df.iloc[i]['config_runtime'][predictions[i]])
        best = min(df.iloc[i]['config_runtime'])
        score +=2 - predbest/best
    score /= len(df)
    return score
score_tile(tile_xla_predictions, tile_xla["valid"])

In [None]:
# Predict (only tile:xla predictions)

dataset = TileDataset(tile_xla["test"])
tile_xla_predictions = []
model.eval()
pbar = tqdm(range(len(dataset)))
# No gradients are needed for prediction
with torch.no_grad():
    for i in pbar:
        cfg_ft,nd_ft,nd_op,ind,target = dataset[i]
        cfg_ft,nd_ft,nd_op,ind,target = cfg_ft.to(device),nd_ft.to(device),nd_op.to(device),ind.to(device),target.to(device)

        out = model(cfg_ft,nd_ft,nd_op,ind)
        # Move to CPU before converting to NumPy (required when running on GPU)
        tile_xla_predictions.append(np.argsort(out.cpu().numpy())[:5])

## Submission

In [None]:
sub = pd.read_csv('/kaggle/input/predict-ai-model-runtime/sample_submission.csv')
for i,filename in enumerate(tile_xla["test"]['file'].values):
    # Strip the ".npz" extension; avoid shadowing the built-in `id`
    sub_id = 'tile:xla:' +filename[:-4]
    sub.loc[sub.ID == sub_id,'TopConfigs'] = ';'.join(tile_xla_predictions[i].astype(str))
sub.to_csv('submission.csv',index=False)
sub