# Action prediction on EASG using GNNs


This notebook lays the groundwork for the action prediction task on EASG. To make the original dataset compatible with PyTorch Geometric, it was converted into a set of tensors representing nodes and connections. Verbs, objects, and relationships are encoded through a dictionary that maps each word to the index of the line where it appears in the corresponding vocabulary file. The graphs follow the original dataset as faithfully as possible. In line with the methodology proposed in the paper, the graphs representing a scene are split into subgraphs that cover an adjustable number of consecutive graphs. A few features were added to maximize the available information. First, the CW node is connected to the objects that do not take part in the action, since they may take part in later actions.
In addition, each subgraph includes a relation with the objects seen in the graphs that precede it, to account for the observer's prior experience. The task follows the one proposed in the paper, but the model used here is radically different from GPT. The original files were modified by removing non-conforming graphs and adding the necessary annotations.
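
To make the conversion concrete, here is a minimal, simplified sketch of how the triplets of a single graph can be turned into the index lists that PyTorch Geometric works with. The vocabularies and triplets are made up for illustration, and the sketch ignores details that the actual parser handles, such as duplicate objects and the separate `dobj` edge type.

```python
# Hypothetical toy vocabularies; the real ones are read from text files.
toy_o_dict = {"knife": 1, "onion": 2}
toy_r_dict = {"dobj": 1, "with": 2}

# Hypothetical triplets of one graph: (verb, relationship, object).
toy_triplets = [("cut", "dobj", "onion"), ("cut", "with", "knife")]

object_nodes = []       # vocabulary index of each object node
edge_index = [[], []]   # row 0: verb node ids, row 1: object node ids
edge_attr = []          # relationship index of each edge

for _verb, rel, obj in toy_triplets:
    object_nodes.append(toy_o_dict[obj])
    edge_index[0].append(0)                      # the graph has one verb node, id 0
    edge_index[1].append(len(object_nodes) - 1)  # id of the object just added
    edge_attr.append(toy_r_dict[rel])

print(object_nodes)  # [2, 1]
print(edge_index)    # [[0, 0], [0, 1]]
print(edge_attr)     # [1, 2]
```

The two-row layout of `edge_index` is the `[2, num_edges]` convention used by every `edge_index` tensor built later in the notebook.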

## Setup

In [None]:
import json
import random
import os
import sys
import torch
import torch.nn.functional as F
os.environ['TORCH'] = torch.__version__
print(torch.__version__)

!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
!pip install wandb
!pip install torchmetrics
!pip install gensim

2.3.0+cu121

In [None]:
base_path = ''

In [None]:
from google.colab import drive
drive.mount('/content/drive')
base_path = '/content/drive/MyDrive/EASG/'

Mounted at /content/drive


In [None]:
import gensim.downloader as api
w2v = api.load('word2vec-google-news-300')




## Files

In [None]:
with open(base_path + 'easg.json', 'r') as f:
    data = json.load(f)

In [None]:
def create_word_index_dict(file_path):
    word_index_dict = {}
    with open(file_path, 'r') as file:
        for index, line in enumerate(file, start=1):
            word = line.strip()
            word_index_dict[word] = index
    return word_index_dict

objects_path = base_path + 'objects.txt'
verbs_path = base_path + 'verbs.txt'
relationships_path = base_path + 'relationships.txt'

o_dict = create_word_index_dict(objects_path)
v_dict = create_word_index_dict(verbs_path)
r_dict = create_word_index_dict(relationships_path)

print(len(o_dict))
print(len(v_dict))
print(len(r_dict))

483
227
16
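
As a quick illustration of the 1-based mapping built by `create_word_index_dict`, the sketch below applies the same logic to an in-memory stream instead of a file. The three-word vocabulary and the helper name `build_word_index` are made up for the example.

```python
import io

def build_word_index(lines):
    # Same logic as create_word_index_dict, but for any iterable of lines.
    return {line.strip(): index for index, line in enumerate(lines, start=1)}

# Hypothetical vocabulary file content, one word per line.
toy_vocab = io.StringIO("knife\nbowl\nspoon\n")
toy_dict = build_word_index(toy_vocab)
print(toy_dict)  # {'knife': 1, 'bowl': 2, 'spoon': 3}
```

Because numbering starts at 1, index 0 is never assigned to any word.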


In [None]:
import pickle
verb_feats = torch.load(base_path + 'verb_features.pt')
with open(base_path + 'roi_feats_val.pkl', 'rb') as f:
  roi_val_feats = pickle.load(f)
with open(base_path + 'roi_feats_train.pkl', 'rb') as f:
  roi_train_feats = pickle.load(f)

## Utilities

In [None]:
def rindex(lst, item):
    try:
        return len(lst) - 1 - lst[::-1].index(item)
    except ValueError:
        return -1
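
The parser relies on `rindex` to resolve an object to its most recently added node, so that edges point to the instance in the current graph rather than to an earlier one. A small usage sketch, with made-up object ids:

```python
def rindex(lst, item):
    # Index of the last occurrence of item, or -1 if it is absent.
    try:
        return len(lst) - 1 - lst[::-1].index(item)
    except ValueError:
        return -1

object_ids = [3, 5, 3, 7]  # hypothetical ids; 3 appears twice
print(rindex(object_ids, 3))  # 2
print(rindex(object_ids, 9))  # -1
```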

In [None]:
def calc_possible_seq(seq_len, data_obj_list):
  non_valid = 0
  valid = 0
  tot_seq = 0
  for data in data_obj_list:
    if len(data.extra_features["graph_uids"]) <= seq_len:
      non_valid = non_valid + 1
    if len(data.extra_features["graph_uids"]) > seq_len:
      valid = valid + 1
      tot_seq = tot_seq + (len(data.extra_features["graph_uids"]) - seq_len)
  return non_valid, valid, tot_seq
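
The arithmetic behind `tot_seq` can be checked by hand: a sequence with `n` graphs contributes `n - seq_len` samples when `n > seq_len`, and none otherwise. The sketch below reproduces only that count on plain integers; the graph counts and the helper name `count_samples` are hypothetical.

```python
def count_samples(graph_counts, seq_len):
    # Sum n - seq_len over the sequences that are longer than seq_len.
    return sum(n - seq_len for n in graph_counts if n > seq_len)

graph_counts = [25, 20, 30]  # hypothetical number of graphs per sequence
print(count_samples(graph_counts, 20))  # 15
```

The sequence with exactly 20 graphs is discarded because no graph would be left to serve as the prediction target.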

In [None]:
def create_mask(original, id_list):
    if isinstance(original, list):
        return torch.tensor([x in id_list for x in original], dtype=torch.bool)
    elif isinstance(original, torch.Tensor):
        return torch.tensor([x.item() in id_list for x in original], dtype=torch.bool)
    else:
        return False

In [None]:
import torch.nn.init as init
def init_model(model):
    # Kaiming-initialize the weights of the linear layers only.
    for name, param in model.named_parameters():
        if 'weight' in name and ('lin' in name or 'linear' in name):
            init.kaiming_normal_(param, mode='fan_in', nonlinearity='relu')

## Parsing

In [None]:
def extract_from_sequence(seq, include_extracted_features = False):
  num_obj_features = 13
  num_extra_features = 1
  num_verb_features = 2305
  # One per sequence
  objects = [] # plain counter
  seen_objects = []
  CW_to_seen = [[], []]
  verbs = [] # doubles as the verb feature
  v_to_o_edges = [[], []] # uses the index in the object features to identify the objects
  v_to_o_attr = [] # uses the edge index to identify the edge
  o_to_o_edges = [[],[]]
  o_to_o_attr = []
  time_edges = [[],[]] # uses object indices, has no features
  dobj_edge = [[],[]]
  previous_graph_objects = {} # bookkeeping structure
  current_graph_objects = {} # bookkeeping structure
  current_seen_objects = {}
  objects_features = torch.empty(0, num_obj_features, dtype=torch.float) # single index of the objects in the final structure
  seen_objects_features = torch.empty(0, num_obj_features, dtype=torch.float)
  graph_index_list = []
  verb_uids = []
  seen_obj_uid = []
  CW_uids = []
  extra_features = {"split": seq["split"], "video_uid": seq["video_uid"], "shape": [seq["W"], seq["H"]], "graph_uids": []}

  for g in seq['graphs']:
    extra_features['graph_uids'].append(g['graph_uid'])
    current_verb = "" # reset
    o_count = 0
    duplicate_index = -1
    CW_uids.append(extra_features['graph_uids'].index(g['graph_uid']))
    # Triplets
    for ent1, rel, ent2 in g['triplets']:
      if rel != 'verb':
        if ent1 in v_dict.keys(): # verb-to-object triplet
          if current_verb == "": # first triplet
            current_verb = v_dict[ent1]
            verbs.append(current_verb) # this verb is the one chosen for the graph
            verb_uids.append(extra_features['graph_uids'].index(g['graph_uid']))
          elif current_verb != v_dict[ent1]:
            print(f"Error a: two verbs present {ent1} {ent2} {current_verb}")
            print(g['graph_uid'])
            sys.exit()
          if rel == 'dobj':
            dobj_edge[0].append(len(verbs) - 1)
            if o_dict[ent2] not in current_graph_objects.keys():
              o_count = o_count + 1
              current_graph_objects[o_dict[ent2]] = len(objects)
              objects.append(o_dict[ent2])
            dobj_edge[1].append(rindex(objects,o_dict[ent2]))
          else:
            v_to_o_edges[0].append(len(verbs) - 1) # the verb is referenced by its index (it is the last one)

            if ":1" in ent2: # handle the case in which :1 comes before the original object
              ent2 = ent2[:len(ent2)- 2]
              o_count = o_count + 1
              duplicate_index = len(objects)
              objects.append(o_dict[ent2])
              current_graph_objects[o_dict[ent2] + len(o_dict.keys())] = rindex(objects, o_dict[ent2]) # only used while building the graph; in the object tensor it gets the correct index

            elif o_dict[ent2] not in current_graph_objects.keys():
              o_count = o_count + 1
              current_graph_objects[o_dict[ent2]] = len(objects)
              objects.append(o_dict[ent2])


            v_to_o_edges[1].append(rindex(objects,o_dict[ent2])) # use the index of the most recently added instance of the object
            v_to_o_attr.append(r_dict[rel])

        else:
          # check whether the object is already among the current objects;
          # if not, add it.
          # A graph that connects objects to each other should not contain several objects of the same type
            if o_dict[ent1] not in current_graph_objects.keys():
                o_count = o_count + 1
                current_graph_objects[o_dict[ent1]] = len(objects)
                objects.append(o_dict[ent1])
            o_to_o_edges[0].append(current_graph_objects[o_dict[ent1]])
            if o_dict[ent2] not in current_graph_objects.keys():
                o_count = o_count + 1
                current_graph_objects[o_dict[ent2]] = len(objects)
                objects.append(o_dict[ent2])
            o_to_o_edges[1].append(current_graph_objects[o_dict[ent2]])
            o_to_o_attr.append(r_dict[rel])
      else:
        if current_verb == "":
          current_verb = v_dict[ent2]
          verbs.append(current_verb)
          verb_uids.append(extra_features['graph_uids'].index(g['graph_uid']))
        elif current_verb != v_dict[ent2]:
          print(f"Error b: two verbs present {ent1} {ent2} {current_verb}")
          print(g['graph_uid'])
          sys.exit()
    # Features
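    for j in range(0, o_count):
      index = len(objects_features)
      objects_features = torch.cat((objects_features, torch.zeros(num_obj_features).unsqueeze(0)), dim=0)
      # objects is shared by the whole sequence, so the new row matches objects[index], not objects[j]
      objects_features[index, 0] = objects[index]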
    i = 1
    groundings = g['groundings']
    for frame in groundings:
        for key in groundings[frame].keys():
            values = list(groundings[frame][key].values())
            values = [float(v) for v in values]
            values = torch.tensor(values, dtype = torch.float)
            index = -1
            if ":1" in key and duplicate_index != -1:
                index = duplicate_index
                key = key[:len(key)-2]
            else:
                if key in o_dict.keys() and o_dict[key] in current_graph_objects.keys():
                    index = current_graph_objects[o_dict[key]]
            if index != -1:
                for j in range(0, 4):
                    objects_features[index, j + i] = values[j]
            else:
              if o_dict[key] not in current_seen_objects.keys():
                CW_to_seen[0].append(extra_features['graph_uids'].index(g['graph_uid']))
                CW_to_seen[1].append(len(seen_objects_features))
                current_seen_objects[o_dict[key]] = len(seen_objects_features)
                index = current_seen_objects[o_dict[key]]
                seen_objects_features = torch.cat((seen_objects_features, torch.zeros(num_obj_features).unsqueeze(0)), dim=0)
                seen_objects_features[-1][0] = o_dict[key]
                seen_obj_uid.append(extra_features['graph_uids'].index(g['graph_uid']))
              else:
                index = current_seen_objects[o_dict[key]]
              for j in range(0, 4):
                    seen_objects_features[index, j + i] = values[j]
        i = i + 4
    #ID assignment
    for j in range(0, o_count):
       graph_index_list.append(extra_features['graph_uids'].index(g['graph_uid']))
    #Time edges
    if previous_graph_objects != {}:
        for key in previous_graph_objects.keys():
            if key in current_graph_objects.keys():
                time_edges[0].append(previous_graph_objects[key])
                time_edges[1].append(current_graph_objects[key])
    previous_graph_objects = current_graph_objects
    current_graph_objects = {}
    current_seen_objects = {}
  #v to v
  v_to_v = [[], []]
  for i in range(len(verbs) - 1):
    v_to_v[0].append(i)
    v_to_v[1].append(i + 1)
  # Verbs features
  verbs_features = torch.tensor(verbs, dtype=torch.float)
  if include_extracted_features:
    verbs_features = torch.empty(0, num_verb_features, dtype=torch.float)
    for i in range(len(verbs)):
      t = torch.cat((torch.tensor([verbs[i]]), verb_feats[extra_features['graph_uids'][verb_uids[i]]]))
      verbs_features = torch.cat((verbs_features, t.unsqueeze(0)), dim=0)
  # Seen objects
  s_index = len(objects_features)
  objects_features = torch.cat((objects_features, seen_objects_features), dim=0)
  obj_graph_uids = torch.tensor(graph_index_list + seen_obj_uid, dtype=torch.int64)
  for i in range(len(CW_to_seen[0])):
    CW_to_seen[1][i] = CW_to_seen[1][i] + s_index

  dictionary = {}
  if include_extracted_features:
    dictionary['verbs_features'] = verbs_features
  else:
    dictionary['verbs_features'] = verbs_features.unsqueeze(1)
  dictionary['obj_features'] = objects_features
  # dictionary['seen_obj_features'] = seen_objects_features
  dictionary['v_to_o'] = torch.tensor(v_to_o_edges, dtype=torch.int64)
  dictionary['v_to_o_attr'] = torch.tensor(v_to_o_attr, dtype=torch.float)
  dictionary['dobj_edge'] = torch.tensor(dobj_edge, dtype=torch.int64)
  dictionary['o_to_o'] = torch.tensor(o_to_o_edges, dtype=torch.int64)
  dictionary['o_to_o_attr'] = torch.tensor(o_to_o_attr, dtype=torch.float)
  dictionary['time_edges'] = torch.tensor(time_edges, dtype=torch.int64)
  dictionary['v_to_v'] = torch.tensor(v_to_v, dtype=torch.int64)
  dictionary['CW_to_seen'] = torch.tensor(CW_to_seen, dtype=torch.int64)
  dictionary['obj_graph_uids'] = obj_graph_uids
  dictionary['verb_graph_uids'] = torch.tensor(verb_uids, dtype=torch.int64)
  # dictionary['seen_obj_graph_uids'] = torch.tensor(seen_obj_uid, dtype=torch.int64)
  dictionary['CW_graph_uids'] = torch.tensor(CW_uids, dtype=torch.int64)
  dictionary['extra'] = extra_features

  return dictionary

In [None]:
from torch_geometric.data import HeteroData

def createHeteroData(source):
  data = HeteroData()

  data['verb'].x = source['verbs_features']
  data['object'].x = source['obj_features']
  # data['seen_object'].x = source['seen_obj_features']
  data['CW'].num_nodes = len(source['verbs_features'])

  data['verb', 'rel', 'object'].edge_index = source['v_to_o']
  data['verb', 'rel', 'object'].edge_attr = source['v_to_o_attr']
  data['verb', 'dobj', 'object'].edge_index = source['dobj_edge']
  data['object', 'rel', 'object'].edge_index = source['o_to_o']
  data['object', 'rel', 'object'].edge_attr = source['o_to_o_attr']
  data['object', 'time', 'object'].edge_index = source['time_edges']
  data['verb', 'next', 'verb'].edge_index = source['v_to_v']
  data['CW', 'sees', 'object'].edge_index = source['CW_to_seen']

  data['object'].extra_features = source['obj_graph_uids']
  data['verb'].extra_features = source['verb_graph_uids']
  # data['seen_object'].extra_features = source['seen_obj_graph_uids']
  data['CW'].extra_features = source['CW_graph_uids']

  data.extra_features = source['extra']
  return data


In [None]:
def from_json_to_heteroData_list(data):
  # Convert every sequence of the JSON file into a HeteroData object.
  # include_extracted_features is read from the Hyperparameters section.
  data_list = []
  for d in data.keys():
    graph_dict = extract_from_sequence(data[d], include_extracted_features)
    data_list.append(createHeteroData(graph_dict))
  return data_list

In [None]:
import copy

def extract_subgraph(original, span, include_future_object = False, word2vec = False):
  label = {'verb': -1, 'object': -1}
  target_index = span[-1] + 1
  verb_idx = torch.nonzero(create_mask(original['verb'].extra_features, [target_index]))
  label['verb'] = int(original['verb'].x[int(verb_idx)][0])
  for index in range(len(original['verb', 'dobj', 'object'].edge_index[0])):
    if original['verb', 'dobj', 'object'].edge_index[0][index] == verb_idx:
      label['object'] = int(original['object'].x[original['verb', 'dobj', 'object'].edge_index[1][index]][0])
      break
  """
  label_objects = torch.nonzero(create_mask(original['object'].extra_features, [target_index]))
  for index in range(len(original['verb', 'rel', 'object'].edge_index[0])):
    if original['verb', 'rel', 'object'].edge_index[1][index] in label_objects and original['verb', 'rel', 'object'].edge_attr[index] == r_dict['dobj']:
      label['object'] = int(original['object'].x[original['verb', 'rel', 'object'].edge_index[1][index]][0])
      break"""
  if(label['object'] == -1):
    print("NO LABEL")
    sys.exit()
  masks = {}
  for node_type in original.node_types:
    masks[node_type] = create_mask(original[node_type].extra_features, span)
  subgraph_data = original.subgraph(masks)
  subgraph_data.extra_features = original.extra_features.copy()
  subgraph_data.extra_features['graph_uids'] = copy.deepcopy(original.extra_features['graph_uids'][span[0]:span[0]+len(span)])
  subgraph_data.extra_features['label_uid'] = original.extra_features['graph_uids'][target_index]
  inv_mask = ~masks['object']

  if not include_future_object:
    inv_mask[int(torch.nonzero(masks['object'])[0]):] = False
  other_obj = original['object'].x[inv_mask]
  other_obj_extra = original['object'].extra_features[inv_mask]
  other_edges = [[],[]]
  s_index = len(subgraph_data['object'].x)
  subgraph_data['object'].x = torch.cat((subgraph_data['object'].x, other_obj), dim=0)
  subgraph_data['object'].extra_features = torch.cat((subgraph_data['object'].extra_features, other_obj_extra), dim=0)
  for i in range(len(other_obj)):
    for j in range(len(span)):
      other_edges[0].append(i + s_index)
      other_edges[1].append(j)
  subgraph_data['object', 'has_seen', 'CW'].edge_index = torch.tensor(other_edges, dtype=torch.int64)
  other_edges.reverse()
  subgraph_data['CW', 'has_seen', 'object'].edge_index = torch.tensor(other_edges, dtype=torch.int64)
  subgraph_data.y_verb = torch.tensor([label['verb']], dtype=torch.int64)
  subgraph_data.y_obj = torch.tensor([label['object']], dtype=torch.int64)
  subgraph_data['CW'].x = torch.zeros(subgraph_data['CW'].num_nodes, 1, dtype=torch.float)

  if word2vec:
    subgraph_data = create_embeddings(subgraph_data)
  return remove_empty(subgraph_data)

In [None]:
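def remove_empty(data):
  # Drop node types without nodes and edge types without edges.
  empty_node_types = []
  for node_type in data.node_types:
    if 'x' in data[node_type].keys() and data[node_type].x.size(0) == 0:
      empty_node_types.append(node_type)

  empty_edge_types = []
  for edge_type in data.edge_types:
    # edge_index has shape [2, num_edges], so the edge count is size(1)
    if 'edge_index' in data[edge_type].keys() and data[edge_type].edge_index.size(1) == 0:
      empty_edge_types.append(edge_type)

  for edge_type in empty_edge_types:
    del data[edge_type]
  for node_type in empty_node_types:
    del data[node_type]
  return data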

In [None]:
rev_o_dict = {v: k for k, v in o_dict.items()}
rev_v_dict = {v: k for k, v in v_dict.items()}
rev_r_dict = {v: k for k, v in r_dict.items()}
rev_r_dict[r_dict['dobj']] ='direct object'
rev_r_dict[r_dict['to']] ='To'
rev_v_dict[v_dict['unhang']] = 'not hang'

def create_embedding(label):
  # Average the word2vec vectors of the words that make up the label
  # (a plain mean, so every word carries the same weight).
  embeddings = [torch.tensor(w2v[word], dtype=torch.float) for word in label.split()]
  return torch.stack(embeddings, dim=0).mean(dim=0)

def create_embeddings(hData):
  for node_type, y in hData.node_items():
    u_dict = rev_o_dict
    if 'x' in y.keys() and node_type != 'CW':
      if hData[node_type].x.numel() != 0:
        new_x = torch.empty(0, len(hData[node_type].x[0]) + 299, dtype=torch.float)
        if node_type == 'verb':
          u_dict = rev_v_dict
        for i in range(0, len(hData[node_type].x)):
          obj_embedding = create_embedding(u_dict[int(hData[node_type].x[i][0])])
          existing_features = hData[node_type].x[i][1:]
          obj = torch.cat((obj_embedding, existing_features), dim=0)
          new_x = torch.cat((new_x, obj.unsqueeze(0)), dim=0)
        hData[node_type].x = new_x
  for edge_type, y in hData.edge_items():
    if 'edge_attr' in y.keys() and len(hData[edge_type].edge_attr) >0:
      original_t = hData[edge_type].edge_attr
      embeddings = []
      for i in range(len(hData[edge_type].edge_attr)):
        embeddings.append(create_embedding(rev_r_dict[int(hData[edge_type].edge_attr[i])]))
      hData[edge_type].edge_attr = torch.stack(embeddings)
  return hData
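
Multi-word labels such as `'direct object'` are embedded by averaging the vectors of their words, which is presumably why `dobj` and `unhang` are renamed above to phrases made of ordinary words. The sketch below shows the averaging on made-up 3-dimensional vectors that stand in for the 300-dimensional word2vec ones; `toy_vectors` and `average_embedding` are hypothetical names.

```python
# Hypothetical 3-dimensional vectors in place of the real word2vec model.
toy_vectors = {"direct": [1.0, 0.0, 2.0], "object": [3.0, 2.0, 0.0]}

def average_embedding(label, vectors):
    # Element-wise mean of the vectors of the words in the label.
    words = label.split()
    dims = zip(*(vectors[w] for w in words))
    return [sum(d) / len(words) for d in dims]

print(average_embedding("direct object", toy_vectors))  # [2.0, 1.0, 1.0]
print(average_embedding("object", toy_vectors))         # [3.0, 2.0, 0.0]
```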

In [None]:
from torch_geometric.data import Dataset

class HeteroDataset(Dataset):
    def __init__(self, data_list):
        super().__init__()
        self.data_list = data_list
        self._indices = None

    def len(self):
        return len(self.data_list)

    def get(self, idx):
        return self.data_list[idx]

In [None]:
def create_samples(data_obj_list, seq_length):
  data_list = []
  id_lists = []
  for hData in data_obj_list:
    if len(hData.extra_features["graph_uids"]) > seq_length:
      id_lists.append(hData.extra_features["video_uid"])
      for i in range(len(hData.extra_features["graph_uids"]) - seq_length):
        data_list.append(extract_subgraph(hData, range(i, i + seq_length), include_future_object, word2vec=word2vec))
  return data_list
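
Each sample pairs a window of `seq_length` consecutive graphs with the graph that immediately follows it, which `extract_subgraph` uses as the prediction target (`span[-1] + 1`). The sketch below lists those pairs for a short, hypothetical sequence; `sliding_windows` is an illustrative helper, not part of the notebook.

```python
def sliding_windows(num_graphs, seq_length):
    # Yield (input span, target graph index) pairs, as create_samples does.
    for start in range(num_graphs - seq_length):
        span = range(start, start + seq_length)
        yield list(span), span[-1] + 1

pairs = list(sliding_windows(6, 3))  # hypothetical sequence of 6 graphs
for span, target in pairs:
    print(span, "->", target)
# [0, 1, 2] -> 3
# [1, 2, 3] -> 4
# [2, 3, 4] -> 5
```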

## Hyperparameters

In [None]:
seq_length = 20
samples = 3000

include_future_object = True
word2vec= True
include_extracted_features = True

out_channels = (len(v_dict), len(o_dict))

In [None]:
batch_size = 10
epochs = 20
epoch_max = 10
lr = 0.01
momentum = 0.9
num_layers = 4
hidden_channels = 256
extra_features = False
trained_loss = True
init_weights = False
loss_weights = (1,2)

In [None]:
device ='cpu'
if torch.cuda.is_available():
  device = 'cuda'
print(f"The active device is {device}")

The active device is cuda


## Models

### Classifier

In [None]:
from torch_geometric.nn import Linear, to_hetero
from torch.nn import Softmax, Parameter

class GNNWithClassifier(torch.nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels):
        super().__init__()

        model = GNN(hidden_channels, hidden_channels)
        self.gnn = to_hetero(model, metadata, aggr='sum')
        self.linear_verb = Linear(hidden_channels, out_channels[0])
        self.linear_obj = Linear(hidden_channels, out_channels[1])

        self.alpha = Parameter(torch.tensor(1.0), requires_grad=True)
        self.beta = Parameter(torch.tensor(1.0), requires_grad=True)

    def forward(self, x, edge_index):
        x = self.gnn(x, edge_index)

        x_verb = self.linear_verb(x['verb'])
        x_obj = self.linear_obj(x['object'])

        return x_verb, x_obj

### SAGEConv

In [None]:
from torch_geometric.nn import Linear, to_hetero, SAGEConv
from torch.nn import Softmax, Parameter
import torch.nn.init as init

class SAGEConvGNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), hidden_channels)
        self.conv3 = SAGEConv((-1, -1), out_channels)
    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        x = self.conv3(x, edge_index)
        return x

class SAGEWithClassifier(torch.nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels):
        super().__init__()
        model = SAGEConvGNN(hidden_channels, hidden_channels)
        self.gnn = to_hetero(model, metadata, aggr='sum')

        self.lin1 = Linear(-1, hidden_channels)
        self.lin2 = Linear(-1, hidden_channels)
        self.lin3 = Linear(-1, hidden_channels)

        self.linear_verb = Linear(-1, out_channels[0])
        self.linear_obj = Linear(-1, out_channels[1])

        self.theta = Parameter(torch.randn(1))
        self.register_buffer('alpha', torch.zeros(1))
        self.register_buffer('beta', torch.zeros(1))

    def forward(self, x, edge_index):
        self.beta = torch.sigmoid(self.theta)
        self.alpha = 1 - self.beta

        x = self.gnn(x, edge_index)

        x_verb = self.lin1(x['verb']).relu()
        x_verb = self.lin2(x_verb).relu()
        x_verb = self.lin3(x_verb).relu()
        x_verb = self.linear_verb(x_verb)

        x_obj = self.lin1(x['object']).relu()
        x_obj = self.lin2(x_obj).relu()
        x_obj = self.lin3(x_obj).relu()
        x_obj = self.linear_obj(x_obj)

        return x_verb, x_obj
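
When `trained_loss` is enabled, the training loop weighs the verb and object losses with `alpha` and `beta`, which the model derives from the single learnable scalar `theta`. The plain-Python sketch below reproduces that arithmetic so the constraint `alpha + beta = 1` is easy to see; the function names and loss values are hypothetical, and the factor 10 mirrors the one used in the training loop.

```python
import math

def loss_weights_from_theta(theta):
    # beta = sigmoid(theta), alpha = 1 - beta, so the two weights sum to 1.
    beta = 1.0 / (1.0 + math.exp(-theta))
    alpha = 1.0 - beta
    return alpha, beta

def combined_loss(theta, verb_loss, obj_loss, scale=10.0):
    # Weighted sum of the two losses, scaled as in the training loop.
    alpha, beta = loss_weights_from_theta(theta)
    return scale * (alpha * verb_loss + beta * obj_loss)

alpha, beta = loss_weights_from_theta(0.0)
print(alpha, beta)                    # 0.5 0.5
print(combined_loss(0.0, 2.0, 4.0))   # 30.0
```

Tying both weights to one parameter keeps them in the interval (0, 1) and prevents the optimizer from shrinking both towards zero at once.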

### GAT


In [None]:
from torch_geometric.nn import GAT
from torch_geometric.nn import Linear, to_hetero
from torch.nn import Softmax, Parameter


class GATWithClassifier(torch.nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels):
        super().__init__()

        model = GAT((-1, -1), hidden_channels, out_channels=hidden_channels, num_layers=num_layers, add_self_loops=False)
        self.gnn = to_hetero(model, metadata, aggr='sum')
        self.lin1 = Linear(hidden_channels, hidden_channels)
        self.lin2 = Linear(hidden_channels, hidden_channels)
        self.lin3 = Linear(hidden_channels, hidden_channels)
        self.linear_verb = Linear(hidden_channels, out_channels[0])
        self.linear_obj = Linear(hidden_channels, out_channels[1])

        self.theta = Parameter(torch.randn(1))
        self.register_buffer('alpha', torch.zeros(1))
        self.register_buffer('beta', torch.zeros(1))

    def forward(self, x, edge_index, edge_attr=None):
        self.beta = torch.sigmoid(self.theta)
        self.alpha = 1 - self.beta

        if edge_attr is None:
          x = self.gnn(x, edge_index)
        else:
          x = self.gnn(x, edge_index, edge_attr)

        x_verb = self.lin1(x['verb']).relu()
        x_verb = self.lin2(x_verb).relu()
        x_verb = self.lin3(x_verb).relu()
        x_verb = self.linear_verb(x_verb)

        x_obj = self.lin1(x['object']).relu()
        x_obj = self.lin2(x_obj).relu()
        x_obj = self.lin3(x_obj).relu()
        x_obj = self.linear_obj(x_obj)

        return x_verb, x_obj

### Transformer


In [None]:
from torch_geometric.nn import Linear, to_hetero, TransformerConv
from torch.nn import Softmax, Parameter
import torch.nn.init as init

class TransformerConvGNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = TransformerConv((-1, -1), hidden_channels, heads=num_layers)
        self.conv2 = TransformerConv((-1, -1), hidden_channels, heads=num_layers)
        self.conv3 = TransformerConv((-1, -1), out_channels, heads=num_layers)
    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        x = self.conv3(x, edge_index)
        return x

class TransformerWithClassifier(torch.nn.Module):
    def __init__(self, metadata, hidden_channels, out_channels):
        super().__init__()
        model = TransformerConvGNN(hidden_channels, hidden_channels)
        self.gnn = to_hetero(model, metadata, aggr='sum')

        self.lin1 = Linear(-1, hidden_channels)
        self.lin2 = Linear(-1, hidden_channels)
        self.lin3 = Linear(-1, hidden_channels)

        self.linear_verb = Linear(-1, out_channels[0])
        self.linear_obj = Linear(-1, out_channels[1])

        self.theta = Parameter(torch.randn(1))
        self.register_buffer('alpha', torch.zeros(1))
        self.register_buffer('beta', torch.zeros(1))

    def forward(self, x, edge_index):
        self.beta = torch.sigmoid(self.theta)
        self.alpha = 1 - self.beta

        x = self.gnn(x, edge_index)

        x_verb = self.lin1(x['verb']).relu()
        x_verb = self.lin2(x_verb).relu()
        x_verb = self.lin3(x_verb).relu()
        x_verb = self.linear_verb(x_verb)

        x_obj = self.lin1(x['object']).relu()
        x_obj = self.lin2(x_obj).relu()
        x_obj = self.lin3(x_obj).relu()
        x_obj = self.linear_obj(x_obj)

        return x_verb, x_obj

## Loops

In [None]:
import torch.nn.functional as F

def process_logits(verb_logits, object_logits, batch_dict):
  # Turn node-level logits into one prediction per graph of the batch:
  # average the logits of the nodes that belong to the same graph, then apply softmax.
  batches = torch.unique(batch_dict['verb'])
  v_temp_list = []
  o_temp_list = []
  for b in batches:
    # Mean over the verb nodes of graph b
    v_indices = torch.nonzero(batch_dict['verb'] == b, as_tuple=False).squeeze(1)
    v_temp = torch.mean(verb_logits[v_indices], dim=0, keepdim=True)
    v_temp_list.append(v_temp)

    # Mean over the object nodes of graph b
    o_indices = torch.nonzero(batch_dict['object'] == b, as_tuple=False).squeeze(1)
    o_temp = torch.mean(object_logits[o_indices], dim=0, keepdim=True)
    o_temp_list.append(o_temp)

  # Note: the returned rows are probabilities; F.cross_entropy applies log-softmax on top of them.
  v_temp_matrix = torch.cat(v_temp_list, dim=0)
  v_temp_matrix = F.softmax(v_temp_matrix, dim=1)

  o_temp_matrix = torch.cat(o_temp_list, dim=0)
  o_temp_matrix = F.softmax(o_temp_matrix, dim=1)

  return v_temp_matrix, o_temp_matrix
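
The pooling step in `process_logits` can be followed on a tiny example. The sketch below uses plain Python lists instead of tensors: it averages the scores of the nodes that share a batch id and then normalizes each pooled row with softmax. The scores, batch ids, and helper names are made up for illustration.

```python
import math
from collections import defaultdict

def mean_pool_by_graph(node_scores, batch_ids):
    # Group node score rows by graph id and average them column by column.
    groups = defaultdict(list)
    for scores, b in zip(node_scores, batch_ids):
        groups[b].append(scores)
    return {b: [sum(col) / len(rows) for col in zip(*rows)]
            for b, rows in groups.items()}

def softmax(row):
    # Numerically stable softmax of one row.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

node_scores = [[1.0, 3.0], [3.0, 1.0], [0.0, 2.0]]  # hypothetical: 3 nodes, 2 classes
batch_ids = [0, 0, 1]                               # first two nodes belong to graph 0
pooled = mean_pool_by_graph(node_scores, batch_ids)
print(pooled)              # {0: [2.0, 2.0], 1: [0.0, 2.0]}
print(softmax(pooled[0]))  # [0.5, 0.5]
```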

In [None]:
from tqdm import tqdm
import wandb

def train(model, train_loader, optimizer, scheduler, device = 'cpu', epochs = 10, val_loader = None, extra_features = False, trained_loss = False, loss_weights=(1,1)):
    model = model.to(device)

    history = []
    for epoch in range(epochs):
        model.train()

        total_loss = 0
        total_verb_correct_1 = 0
        total_object_correct_1 = 0
        total_verb_correct_5 = 0
        total_object_correct_5 = 0
        total_action_correct_1 = 0
        total_action_correct_5 = 0
        total_verb_loss = 0
        total_object_loss = 0

        for data in tqdm(train_loader, desc="Training", unit="batch"):
            data = data.to(device)
            optimizer.zero_grad()

            if extra_features:
              verb_logits, object_logits = model(data.x_dict, data.edge_index_dict, data.edge_attr_dict)
            else:
              verb_logits, object_logits = model(data.x_dict, data.edge_index_dict)
            n_verb_logits, n_object_logits = process_logits(verb_logits, object_logits, data.batch_dict)
            loss1 = F.cross_entropy(n_verb_logits, data.y_verb)
            loss2 = F.cross_entropy(n_object_logits, data.y_obj)
            if trained_loss:
              loss = (10 * model.alpha * loss1) + (model.beta * loss2 * 10)
            else:
              loss = loss_weights[0] * loss1 + loss_weights[1] * loss2
            total_verb_loss += loss1.item()
            total_object_loss += loss2.item()
            total_loss += loss.item()

            loss.backward()
            optimizer.step()

            _, verb_logits_topk = torch.topk(n_verb_logits, 5, largest=True, sorted=True)
            _, object_logits_topk = torch.topk(n_object_logits, 5, largest=True, sorted=True)

            total_verb_correct_1 += (verb_logits_topk[:, 0] == data.y_verb).sum().item()
            total_object_correct_1 += (object_logits_topk[:, 0] == data.y_obj).sum().item()
            total_action_correct_1 += ((verb_logits_topk[:, 0] == data.y_verb) & (object_logits_topk[:, 0] == data.y_obj)).sum().item()

            total_verb_correct_5 += sum([data.y_verb[i] in verb_logits_topk[i] for i in range(data.y_verb.size(0))])
            total_object_correct_5 += sum([data.y_obj[i] in object_logits_topk[i] for i in range(data.y_obj.size(0))])
            total_action_correct_5 += sum([(data.y_verb[i] in verb_logits_topk[i]) & (data.y_obj[i] in object_logits_topk[i]) for i in range(data.y_obj.size(0))])

        num_samples = len(train_loader.dataset)  # avoids relying on a global train_dataset
        avg_loss = total_loss / num_samples
        train_metrics = {
            'lr': optimizer.param_groups[0]['lr'],
            'train_loss': avg_loss,
            'train_verb_loss': total_verb_loss,
            'train_obj_loss': total_object_loss,
            'train_v_acc_1': total_verb_correct_1 / num_samples,
            'train_o_acc_1': total_object_correct_1 / num_samples,
            'train_a_acc_1': total_action_correct_1 / num_samples,
            'train_v_acc_5': total_verb_correct_5 / num_samples,
            'train_o_acc_5': total_object_correct_5 / num_samples,
            'train_a_acc_5': total_action_correct_5 / num_samples
        }
        if trained_loss:
          train_metrics['loss alpha'] = model.alpha.item()
          train_metrics['loss beta'] = model.beta.item()
        val_metrics = {}  # stays empty when no validation loader is given
        if val_loader is not None:
          val_metrics = validate(model, val_loader, device, extra_features, trained_loss, loss_weights)

        history.append({**train_metrics, **val_metrics})
        wandb.log({**train_metrics, **val_metrics})
        scheduler.step()
        print(f"Epoch {epoch+1}/{epochs}, "
              f"Train Loss: {avg_loss:.4f}, "
              f"Verb Acc@1: {train_metrics['train_v_acc_1']*100:.2f}%, "
              f"Object Acc@1: {train_metrics['train_o_acc_1']*100:.2f}%, "
              f"Action Acc@1: {train_metrics['train_a_acc_1']*100:.2f}%, "
              f"Verb Acc@5: {train_metrics['train_v_acc_5']*100:.2f}%, "
              f"Object Acc@5: {train_metrics['train_o_acc_5']*100:.2f}%, "
              f"Action Acc@5: {train_metrics['train_a_acc_5']*100:.2f}%")
        if val_loader is not None:
          print(f"Validation Loss: {val_metrics['val_loss']:.4f}, "
                f"Verb Acc@1: {val_metrics['val_v_acc_1']*100:.2f}%, "
                f"Object Acc@1: {val_metrics['val_o_acc_1']*100:.2f}%, "
                f"Action Acc@1: {val_metrics['val_a_acc_1']*100:.2f}%, "
                f"Verb Acc@5: {val_metrics['val_v_acc_5']*100:.2f}%, "
                f"Object Acc@5: {val_metrics['val_o_acc_5']*100:.2f}%, "
                f"Action Acc@5: {val_metrics['val_a_acc_5']*100:.2f}%")
          print(f"Loss alpha: {model.alpha},"
                f"Loss beta: {model.beta}")
    return history

def validate(model, val_loader, device, extra_features=False, trained_loss = False, loss_weights=(1,1)):
    model.eval()

    total_loss = 0
    total_verb_correct_1 = 0
    total_object_correct_1 = 0
    total_verb_correct_5 = 0
    total_object_correct_5 = 0
    total_action_correct_1 = 0
    total_action_correct_5 = 0
    total_verb_loss = 0
    total_object_loss = 0

    with torch.no_grad():
        for data in tqdm(val_loader, desc="Evaluating", unit="batch"):
            data = data.to(device)

            if extra_features:
              verb_logits, object_logits = model(data.x_dict, data.edge_index_dict, data.edge_attr_dict)
            else:
              verb_logits, object_logits = model(data.x_dict, data.edge_index_dict)
            n_verb_logits, n_object_logits = process_logits(verb_logits, object_logits, data.batch_dict)
            loss1 = F.cross_entropy(n_verb_logits, data.y_verb)
            loss2 = F.cross_entropy(n_object_logits, data.y_obj)
            if trained_loss:
              loss = (10 * model.alpha * loss1) + (model.beta * loss2 * 10)
            else:
              loss = loss_weights[0] * loss1 + loss_weights[1] * loss2
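            # Accumulate the per-task losses too, so val_verb_loss / val_obj_loss are not always 0
            total_verb_loss += loss1.item()
            total_object_loss += loss2.item()
            total_loss += loss.item()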

            _, verb_logits_topk = torch.topk(n_verb_logits, 5, largest=True, sorted=True)
            _, object_logits_topk = torch.topk(n_object_logits, 5, largest=True, sorted=True)

            total_verb_correct_1 += (verb_logits_topk[:, 0] == data.y_verb).sum().item()
            total_object_correct_1 += (object_logits_topk[:, 0] == data.y_obj).sum().item()
            total_action_correct_1 += ((verb_logits_topk[:, 0] == data.y_verb) & (object_logits_topk[:, 0] == data.y_obj)).sum().item()

            total_verb_correct_5 += sum([data.y_verb[i] in verb_logits_topk[i] for i in range(data.y_verb.size(0))])
            total_object_correct_5 += sum([data.y_obj[i] in object_logits_topk[i] for i in range(data.y_obj.size(0))])
            total_action_correct_5 += sum([(data.y_verb[i] in verb_logits_topk[i]) & (data.y_obj[i] in object_logits_topk[i]) for i in range(data.y_obj.size(0))])


    n_samples = len(val_loader.dataset)  # size of the loader that was passed in, not of a global
    avg_loss = total_loss / n_samples
    val_metrics = {
        'val_loss': avg_loss,
        'val_verb_loss': total_verb_loss,
        'val_obj_loss': total_object_loss,
        'val_v_acc_1': total_verb_correct_1 / n_samples,
        'val_o_acc_1': total_object_correct_1 / n_samples,
        'val_a_acc_1': total_action_correct_1 / n_samples,
        'val_v_acc_5': total_verb_correct_5 / n_samples,
        'val_o_acc_5': total_object_correct_5 / n_samples,
        'val_a_acc_5': total_action_correct_5 / n_samples
    }

    return val_metrics

def test(model, test_loader, device, extra_features=False, trained_loss = False, loss_weights=(1,1)):
    model.eval()

    total_loss = 0
    total_verb_correct_1 = 0
    total_object_correct_1 = 0
    total_verb_correct_5 = 0
    total_object_correct_5 = 0
    total_action_correct_1 = 0
    total_action_correct_5 = 0
    total_verb_loss = 0
    total_object_loss = 0

    with torch.no_grad():
        for data in tqdm(test_loader, desc="Testing", unit="batch"):
            data = data.to(device)

            if extra_features:
              verb_logits, object_logits = model(data.x_dict, data.edge_index_dict, data.edge_attr_dict)
            else:
              verb_logits, object_logits = model(data.x_dict, data.edge_index_dict)
            n_verb_logits, n_object_logits = process_logits(verb_logits, object_logits, data.batch_dict)
            loss1 = F.cross_entropy(n_verb_logits, data.y_verb)
            loss2 = F.cross_entropy(n_object_logits, data.y_obj)
            if trained_loss:
              loss = (10 * model.alpha * loss1) + (model.beta * loss2 * 10)
            else:
              loss = loss_weights[0] * loss1 + loss_weights[1] * loss2
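            # Accumulate the per-task losses too, so test_verb_loss / test_obj_loss are not always 0
            total_verb_loss += loss1.item()
            total_object_loss += loss2.item()
            total_loss += loss.item()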

            _, verb_logits_topk = torch.topk(n_verb_logits, 5, largest=True, sorted=True)
            _, object_logits_topk = torch.topk(n_object_logits, 5, largest=True, sorted=True)

            total_verb_correct_1 += (verb_logits_topk[:, 0] == data.y_verb).sum().item()
            total_object_correct_1 += (object_logits_topk[:, 0] == data.y_obj).sum().item()
            total_action_correct_1 += ((verb_logits_topk[:, 0] == data.y_verb) & (object_logits_topk[:, 0] == data.y_obj)).sum().item()

            total_verb_correct_5 += sum([data.y_verb[i] in verb_logits_topk[i] for i in range(data.y_verb.size(0))])
            total_object_correct_5 += sum([data.y_obj[i] in object_logits_topk[i] for i in range(data.y_obj.size(0))])
            total_action_correct_5 += sum([(data.y_verb[i] in verb_logits_topk[i]) & (data.y_obj[i] in object_logits_topk[i]) for i in range(data.y_obj.size(0))])


    n_samples = len(test_loader.dataset)  # size of the test split, not of the validation split
    avg_loss = total_loss / n_samples
    test_metrics = {
        'test_loss': avg_loss,
        'test_verb_loss': total_verb_loss,
        'test_obj_loss': total_object_loss,
        'test_v_acc_1': total_verb_correct_1 / n_samples,
        'test_o_acc_1': total_object_correct_1 / n_samples,
        'test_a_acc_1': total_action_correct_1 / n_samples,
        'test_v_acc_5': total_verb_correct_5 / n_samples,
        'test_o_acc_5': total_object_correct_5 / n_samples,
        'test_a_acc_5': total_action_correct_5 / n_samples
    }
    print(f"Test Loss: {test_metrics['test_loss']:.4f}, "
          f"Verb Acc@1: {test_metrics['test_v_acc_1']*100:.2f}%, "
          f"Object Acc@1: {test_metrics['test_o_acc_1']*100:.2f}%, "
          f"Action Acc@1: {test_metrics['test_a_acc_1']*100:.2f}%, "
          f"Verb Acc@5: {test_metrics['test_v_acc_5']*100:.2f}%, "
          f"Object Acc@5: {test_metrics['test_o_acc_5']*100:.2f}%, "
          f"Action Acc@5: {test_metrics['test_a_acc_5']*100:.2f}%")
    wandb.log({**test_metrics})
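
The Acc@1 and Acc@5 figures computed above count an action as correct only when both the verb and the object fall inside their respective top-k lists. The sketch below restates that rule in plain Python on made-up scores, without PyTorch. The helper names `topk_indices` and `action_accuracy` are hypothetical and not part of the notebook.

```python
def topk_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def action_accuracy(verb_scores, obj_scores, verb_targets, obj_targets, k):
    """Fraction of samples whose verb AND object targets are both in the top k."""
    hits = 0
    for v_row, o_row, v_true, o_true in zip(verb_scores, obj_scores, verb_targets, obj_targets):
        verb_hit = v_true in topk_indices(v_row, k)
        obj_hit = o_true in topk_indices(o_row, k)
        hits += verb_hit and obj_hit
    return hits / len(verb_targets)


# Toy batch (illustrative numbers only): 2 samples, 4 verb classes, 3 object classes.
verb_scores = [[0.1, 0.7, 0.2, 0.0], [0.4, 0.3, 0.2, 0.1]]
obj_scores = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
verb_targets = [1, 1]
obj_targets = [2, 0]

acc_at_1 = action_accuracy(verb_scores, obj_scores, verb_targets, obj_targets, k=1)
acc_at_2 = action_accuracy(verb_scores, obj_scores, verb_targets, obj_targets, k=2)
print(acc_at_1, acc_at_2)
```

Because the action metric is a logical AND of two conditions, it can never exceed the lower of the verb and object accuracies at the same k.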


## Experiments

#### Baseline

In [None]:
batch_size = 10
epochs = 20
epoch_max = 10
lr = 0.01
momentum = 0.9
num_layers = 4
hidden_channels = 256
extra_features = False
trained_loss = True
init_weights = False
loss_weights = (1,2)

seq_length = 20
samples = 3000

include_future_object = True
word2vec = True
include_extracted_features = True

out_channels = (len(v_dict), len(o_dict))

#### No extracted features

In [None]:
batch_size = 10
epochs = 20
epoch_max = 10
lr = 0.01
momentum = 0.9
num_layers = 4
hidden_channels = 256
extra_features = False
trained_loss = True
init_weights = False
loss_weights = (1,2)

seq_length = 20
samples = 3000

include_future_object = True
word2vec = True
include_extracted_features = False  # changed from baseline

out_channels = (len(v_dict), len(o_dict))

#### No Word2Vec

In [None]:
batch_size = 10
epochs = 20
epoch_max = 10
lr = 0.01
momentum = 0.9
num_layers = 4
hidden_channels = 256
extra_features = False
trained_loss = True
init_weights = False
loss_weights = (1,2)

seq_length = 20
samples = 3000

include_future_object = True
word2vec = False  # changed from baseline
include_extracted_features = True

out_channels = (len(v_dict), len(o_dict))

#### No future object

In [None]:
batch_size = 10
epochs = 20
epoch_max = 10
lr = 0.01
momentum = 0.9
num_layers = 4
hidden_channels = 256
extra_features = False
trained_loss = True
init_weights = False
loss_weights = (1,2)

seq_length = 20
samples = 3000

include_future_object = False  # changed from baseline
word2vec = True
include_extracted_features = True

out_channels = (len(v_dict), len(o_dict))

#### No trained loss

In [None]:
batch_size = 10
epochs = 20
epoch_max = 10
lr = 0.01
momentum = 0.9
num_layers = 4
hidden_channels = 256
extra_features = False
trained_loss = False  # changed from baseline
init_weights = False
loss_weights = (1,2)

seq_length = 20
samples = 3000

include_future_object = False  # also differs from baseline (True)
word2vec = True
include_extracted_features = True

out_channels = (len(v_dict), len(o_dict))
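
The five configuration cells above repeat the same values and differ in only one or two fields. A possible way to make the differences explicit is to keep one baseline dictionary and apply per-experiment overrides, as in the sketch below. The names `BASELINE`, `ABLATIONS`, `make_config` and `changed_keys` are hypothetical; the values are copied from the cells above, and `out_channels` is left out because it depends on `v_dict` and `o_dict`.

```python
BASELINE = {
    "batch_size": 10,
    "epochs": 20,
    "epoch_max": 10,
    "lr": 0.01,
    "momentum": 0.9,
    "num_layers": 4,
    "hidden_channels": 256,
    "extra_features": False,
    "trained_loss": True,
    "init_weights": False,
    "loss_weights": (1, 2),
    "seq_length": 20,
    "samples": 3000,
    "include_future_object": True,
    "word2vec": True,
    "include_extracted_features": True,
}

# Only the fields that differ from the baseline are listed for each experiment.
ABLATIONS = {
    "baseline": {},
    "no_extracted_features": {"include_extracted_features": False},
    "no_word2vec": {"word2vec": False},
    "no_future_object": {"include_future_object": False},
    # The "No trained loss" cell also sets include_future_object = False.
    "no_trained_loss": {"trained_loss": False, "include_future_object": False},
}


def make_config(name):
    """Return a full configuration: baseline values plus the named overrides."""
    return {**BASELINE, **ABLATIONS[name]}


def changed_keys(name):
    """List the fields in which an experiment differs from the baseline."""
    cfg = make_config(name)
    return sorted(key for key in BASELINE if cfg[key] != BASELINE[key])


for name in ABLATIONS:
    print(name, changed_keys(name))
```

Printing the changed fields per experiment makes it easy to check that each ablation varies exactly what its heading claims.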

## Main

### Working

In [None]:
data_obj_list = from_json_to_heteroData_list(data)

In [None]:
non_valid, valid, tot = calc_possible_seq(seq_length, data_obj_list)
print(f"{tot} sequences available from {valid} valid scenes.\n{non_valid} invalid scenes for a sequence length of {seq_length}")
dataset = HeteroDataset(create_samples(data_obj_list, seq_length))
metadata = dataset[0].metadata()

3029 sequences available from 106 valid scenes.
115 invalid scenes for a sequence length of 20


In [None]:
seed = random.randint(0, 10000)
print(f"The seed is: {seed}")
random.seed(seed)

subset = random.sample(dataset.data_list, samples)

random.shuffle(subset)

num_train = int(0.70 * len(subset))  # 70% for training
num_val = int(0.15 * len(subset))    # 15% for validation
num_test = len(subset) - num_train - num_val  # 15% for testing

train_data = subset[:num_train]
val_data = subset[num_train:num_train + num_val]
test_data = subset[num_train + num_val:]

train_dataset = HeteroDataset(train_data)
val_dataset = HeteroDataset(val_data)
test_dataset = HeteroDataset(test_data)

# Print sizes to verify
print(f"Training set size: {len(train_data)}")
print(f"Validation set size: {len(val_data)}")
print(f"Testing set size: {len(test_data)}")


The seed is: 3260
Training set size: 2100
Validation set size: 450
Testing set size: 450
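
The split above is reproducible only if the printed seed is recorded and reused. A minimal sketch of a seeded 70/15/15 split over indices follows. It uses a local `random.Random` generator, so it will not reproduce the exact split of the run above, and `split_indices` is a hypothetical helper name.

```python
import random


def split_indices(n_items, seed, train_frac=0.70, val_frac=0.15):
    """Shuffle range(n_items) with a fixed seed and cut it into train/val/test parts."""
    rng = random.Random(seed)  # local generator: the global random state is left untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = int(train_frac * n_items)
    n_val = int(val_frac * n_items)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains goes to the test split
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(3000, seed=3260)
print(len(train_idx), len(val_idx), len(test_idx))
```

Calling the function twice with the same seed returns the same three index lists, which makes it possible to compare experiments on identical data.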


In [None]:
from torch_geometric.loader import DataLoader

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)  # evaluation order does not need shuffling
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [None]:
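# Run one forward pass on a single batch before training; this is required
# if the model relies on lazily initialized layers.
batch = next(iter(train_loader))
batch = batch.to(device)
model = TransformerWithClassifier(metadata, hidden_channels, out_channels).to(device)
with torch.no_grad():
  model(batch.x_dict, batch.edge_index_dict)
if init_weights:
  init_model(model)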

In [None]:
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
total_epochs = epochs

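def lr_lambda(epoch):
    # Linear warm-up to the base lr over the first `epoch_max` epochs (never 0),
    # then linear decay until the last epoch.
    if epoch < epoch_max:
        return (epoch + 1) / epoch_max
    else:
        return (total_epochs - epoch) / (total_epochs - epoch_max)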

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
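
`LambdaLR` multiplies the base learning rate by the factor that the lambda returns for each epoch, so the schedule can be inspected without PyTorch. The sketch below is a standalone variant of a linear warm-up followed by a linear decay. The name `warmup_decay_factor` is hypothetical, and the sketch assumes the factor should start above zero and peak at exactly 1.

```python
def warmup_decay_factor(epoch, warmup_epochs, total_epochs):
    """Multiplier for the base lr: linear ramp up to 1, then linear decay."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return (total_epochs - epoch) / (total_epochs - warmup_epochs)


base_lr = 0.01
schedule = [
    base_lr * warmup_decay_factor(epoch, warmup_epochs=10, total_epochs=20)
    for epoch in range(20)
]

# Print the learning rate that each (1-based) epoch would use.
for epoch, lr_value in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr_value:.4f}")
```

Tabulating the values this way makes off-by-one problems visible, such as a factor of 0 in the first epoch or a peak above 1.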

In [None]:
import wandb
wandb.login()
wandb.init(
        settings=wandb.Settings(start_method="fork"),
        project="EASG GNN",
        config={
            "epochs": epochs,
            "batch_size": batch_size,
            "lr": lr,
            "model_name": type(model).__name__,
            "sequence_length": seq_length,
            "extra_features": extra_features,
            "trained_loss": trained_loss,
            "init_weights": init_weights,
            "loss_weights": loss_weights,
            "word2vec": word2vec,
            "other objects": include_future_object,
            "extracted features": include_extracted_features
            })

config = wandb.config
history = train(model, train_loader, optimizer, scheduler, device, epochs, val_loader=val_loader, extra_features=extra_features, loss_weights=loss_weights)
test(model, test_loader, device, extra_features=extra_features, loss_weights=loss_weights)
wandb.finish()

Training: 100%|██████████| 210/210 [00:36<00:00,  5.76batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.59batch/s]


Epoch 1/20, Train Loss: 1.7791, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.57%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7791, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.00%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.74batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.71batch/s]


Epoch 2/20, Train Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.57%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.22%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.73batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.65batch/s]


Epoch 3/20, Train Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.67%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.22%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.61batch/s]


Epoch 4/20, Train Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.71%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.22%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.70batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.56batch/s]


Epoch 5/20, Train Loss: 1.7792, Verb Acc@1: 0.14%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.90%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 0.44%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.50batch/s]


Epoch 6/20, Train Loss: 1.7792, Verb Acc@1: 0.14%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 1.86%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 0.00%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 5.56%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.70batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.60batch/s]


Epoch 7/20, Train Loss: 1.7792, Verb Acc@1: 3.43%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 10.90%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 11.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 15.33%, Object Acc@5: 0.00%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.65batch/s]


Epoch 8/20, Train Loss: 1.7792, Verb Acc@1: 13.14%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 14.86%, Object Acc@5: 0.10%, Action Acc@5: 0.00%
Validation Loss: 1.7792, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 16.22%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:04<00:00, 11.03batch/s]


Epoch 9/20, Train Loss: 1.7792, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 20.19%, Object Acc@5: 0.19%, Action Acc@5: 0.00%
Validation Loss: 1.7791, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 21.78%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.47batch/s]


Epoch 10/20, Train Loss: 1.7745, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 31.38%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.68batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.61batch/s]


Epoch 11/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.59batch/s]


Epoch 12/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.72batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.62batch/s]


Epoch 13/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.61batch/s]


Epoch 14/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.51batch/s]


Epoch 15/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.56batch/s]


Epoch 16/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.71batch/s]


Epoch 17/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.73batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.54batch/s]


Epoch 18/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.72batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.57batch/s]


Epoch 19/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Training: 100%|██████████| 210/210 [00:36<00:00,  5.71batch/s]
Evaluating: 100%|██████████| 45/45 [00:03<00:00, 12.60batch/s]


Epoch 20/20, Train Loss: 1.7659, Verb Acc@1: 14.05%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 34.29%, Object Acc@5: 0.19%, Action Acc@5: 0.14%
Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%
Loss alpha: tensor([0.3126], device='cuda:0'),Loss beta: tensor([0.6874], device='cuda:0')


Testing: 100%|██████████| 45/45 [00:03<00:00, 12.63batch/s]


Validation Loss: 1.7644, Verb Acc@1: 15.56%, Object Acc@1: 0.00%, Action Acc@1: 0.00%, Verb Acc@5: 32.44%, Object Acc@5: 0.22%, Action Acc@5: 0.00%



Run summary:
lr              0.001
test_a_acc_1    0.0
test_a_acc_5    0.0
test_loss       1.76441
test_o_acc_1    0.0
test_o_acc_5    0.00222
test_obj_loss   0.0
test_v_acc_1    0.15556
test_v_acc_5    0.32444
test_verb_loss  0.0
