This cell sets up the environment for fine-tuning by importing libraries, configuring random seeds, and enabling GPU if available. It ensures the data directory exists, initializes spaCy for NLP tasks, and loads the RoBERTa model and tokenizer for generating embeddings.

In [None]:
import spacy
import torch
import pandas as pd
import pickle
import os
from torch_geometric.loader import DataLoader
from torch_geometric.data import Data
from transformers import AutoTokenizer, AutoModel
import pytorch_lightning as pl
from torch_geometric.nn import GCNConv
import random
import numpy as np
import warnings
from sklearn.metrics import accuracy_score, f1_score

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

warnings.filterwarnings("ignore", category=UserWarning, module="torch_geometric.data.collate")
warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.modules.loss")

os.environ["TOKENIZERS_PARALLELISM"] = "false"

torch.manual_seed(42)
np.random.seed(42)
random.seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    torch.cuda.set_device(0)
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
    print(f"PyTorch CUDA version: {torch.version.cuda}")
    print(f"Number of CUDA devices: {torch.cuda.device_count()}")
else:
    print("Using CPU for fine-tuning.")
os.makedirs('data', exist_ok=True)
print("Ensured data/ directory exists")
print("PyTorch Lightning version:", pl.__version__)

try:
    nlp = spacy.load('en_core_web_sm', disable=['ner', 'lemmatizer'])
    tokenizer = AutoTokenizer.from_pretrained('roberta-base')
    roberta_model = AutoModel.from_pretrained('roberta-base').to(device)
    print(f"RoBERTa model loaded successfully on {device}.")
except Exception as e:
    print(f"Failed to load spaCy or RoBERTa: {e}")
    # Re-raise so the cell stops with the original traceback instead of calling exit() in a notebook.
    raise

Using GPU: NVIDIA L40S
PyTorch CUDA version: 11.7
Number of CUDA devices: 1
Ensured data/ directory exists
PyTorch Lightning version: 2.0.0


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


RoBERTa model loaded successfully on cuda.


This cell loads dependency types from dep_types.pkl for graph construction and the SemEval ABSA dataset. It validates the dataset for required columns (sentence, aspect, sentiment) and prints sample data to ensure proper loading.


# Loading ABSA dataset

In [None]:
try:
    with open('data/dep_types.pkl', 'rb') as f:
        dep_data = pickle.load(f)
    dep_types = dep_data['dep_types']
    dep_type_to_idx = dep_data['dep_type_to_idx']
    print(f"Loaded dep_types and dep_type_to_idx (num_dep_types: {len(dep_types)})")
except FileNotFoundError:
    print("Dependency types file not found. Ensure 'data/dep_types.pkl' exists from pre-training.")
    # Re-raise so later cells do not run without dep_types.
    raise

try:
    train_df = pd.read_csv('data/semeval_train.csv')
    val_df = pd.read_csv('data/semeval_val.csv')
except FileNotFoundError as e:
    print(f"Dataset file not found: {e}. Ensure 'data/semeval_train.csv' and 'data/semeval_val.csv' exist.")
    raise

required_columns = ['sentence', 'aspect', 'sentiment']
cleaned = {}
for name, df in [('train_df', train_df), ('val_df', val_df)]:
    missing_columns = [col for col in required_columns if col not in df.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns in {name}: {missing_columns}")
    # Boolean indexing returns a new frame; store it so the cleaning actually takes effect.
    df = df[df['sentence'].notna() & (df['sentence'].str.strip() != '') & df['aspect'].notna()].copy()
    if df.empty:
        raise ValueError(f"No valid data found in {name} after cleaning.")
    print(f"Loaded {name} with {len(df)} samples")
    print(f"Sample of {name}:")
    print(df.head().to_json(orient='records'))
    cleaned[name] = df
train_df, val_df = cleaned['train_df'], cleaned['val_df']


Loaded dep_types and dep_type_to_idx (num_dep_types: 44)
Loaded train_df with 160 samples
Sample of train_df:
[{"sentence":"But the staff was so horrible to us.","aspect":"staff","sentiment":"negative"},{"sentence":"To be completely fair, the only redeeming factor was the food, which was above average, but couldn't make up for all the other deficiencies of Teodora.","aspect":"food","sentiment":"positive"},{"sentence":"The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.","aspect":"food","sentiment":"positive"},{"sentence":"The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.","aspect":"kitchen","sentiment":"positive"},{"sentence":"The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.","asp
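
The column check and row cleaning described for this cell can be sketched without pandas. This is a minimal illustration using only the standard library, not the notebook's loader: the function name `load_clean_rows` and the inline sample CSV are hypothetical, and it additionally drops rows whose aspect is an empty string.

```python
import csv
import io

REQUIRED_COLUMNS = ("sentence", "aspect", "sentiment")

def load_clean_rows(handle):
    """Read CSV rows, check required columns, and drop rows with an empty sentence or aspect."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    rows = [
        row for row in reader
        if (row["sentence"] or "").strip() and (row["aspect"] or "").strip()
    ]
    if not rows:
        raise ValueError("No valid rows after cleaning.")
    return rows

# Hypothetical two-row sample; the second row has an empty sentence and is dropped.
sample_csv = (
    "sentence,aspect,sentiment\n"
    '"But the staff was so horrible to us.",staff,negative\n'
    ",food,positive\n"
)
clean_rows = load_clean_rows(io.StringIO(sample_csv))
print(len(clean_rows), clean_rows[0]["aspect"])
```

Raising an exception on a missing column, rather than printing and continuing, keeps later cells from running on a partially loaded dataset.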

This cell maps sentiment labels to indices, computes class weights, and generates dependency and contextual graphs for the ABSA dataset. It creates RoBERTa embeddings, identifies aspect nodes, and saves the train and validation graphs to pickle files.
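
The class-weight step can be sketched in plain Python. The helper name `inverse_frequency_weights` is hypothetical; the counts 99, 27, and 30 are the train_df class counts printed in this cell's output. Each weight is the inverse class frequency, rescaled so that the weights sum to the number of classes.

```python
def inverse_frequency_weights(class_counts):
    """Return inverse-frequency weights rescaled to sum to the number of classes."""
    inverse = [1.0 / count for count in class_counts]
    total = sum(inverse)
    num_classes = len(class_counts)
    return [w / total * num_classes for w in inverse]

# Counts for positive, negative, neutral as printed for train_df.
weights = inverse_frequency_weights([99, 27, 30])
print([round(w, 4) for w in weights])  # [0.3766, 1.3808, 1.2427]
```

The majority class (positive) receives the smallest weight, so errors on the rarer negative and neutral classes contribute more to the weighted cross-entropy loss.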

In [None]:

sentiment_to_idx = {'positive': 0, 'negative': 1, 'neutral': 2}
idx_to_sentiment = {idx: label for label, idx in sentiment_to_idx.items()}
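def add_sentiment_labels(df, name):
    """Map sentiment strings to class indices and drop rows whose label cannot be mapped."""
    df = df.copy()
    df['sentiment_label'] = df['sentiment'].map(sentiment_to_idx)
    unmapped = df['sentiment_label'].isna()
    if unmapped.any():
        print(f"Warning: Some sentiment labels in {name} could not be mapped.")
        print(df[unmapped][['sentence', 'sentiment']])
        df = df[~unmapped].copy()
        print(f"Dataset size after dropping unmapped sentiments: {len(df)}")
    return df

# Reassign the results; rebinding a loop variable would leave train_df and val_df unchanged.
train_df = add_sentiment_labels(train_df, 'train_df')
val_df = add_sentiment_labels(val_df, 'val_df')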

print("Class distribution in train_df:")
class_counts = train_df['sentiment_label'].value_counts().sort_index()
print(class_counts)
print("Class distribution in val_df:")
print(val_df['sentiment_label'].value_counts().sort_index())
class_weights = torch.tensor([1.0 / class_counts[i] for i in range(len(class_counts))], dtype=torch.float).to(device)
class_weights = class_weights / class_weights.sum() * len(class_counts)
print("Class weights:", class_weights)

def get_roberta_embeddings(sentence, nodes):
    sentences = [sentence]
    nodes_list = [nodes]
    valid_sentences = []
    valid_nodes = []
    valid_indices = []
    for i, (sent, node) in enumerate(zip(sentences, nodes_list)):
        if isinstance(sent, str) and sent.strip() and node:
            valid_sentences.append(sent)
            valid_nodes.append(node)
            valid_indices.append(i)
    if not valid_sentences:
        return None, 0
    try:
        inputs = tokenizer(valid_sentences, return_tensors='pt', padding=True, truncation=True, max_length=512)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = roberta_model(**inputs)
        embeddings = outputs.last_hidden_state.cpu()
        num_nodes = min(len(valid_nodes[0]), embeddings[0].shape[0] - 2)
        if num_nodes == 0:
            print(f"Warning: No valid nodes for sentence {valid_sentences[0]}")
            return None, 0
        emb = embeddings[0, 1:num_nodes+1].clone()
        return emb, num_nodes
    except RuntimeError as e:
        print(f"Error in RoBERTa embeddings: {e}")
        return None, 0

def create_dependency_graph(sentence, num_nodes=None):
    doc = nlp(sentence)
    original_nodes = [token.text.lower() for token in doc]
    if num_nodes is None:
        num_nodes = len(original_nodes)
    nodes = original_nodes[:num_nodes]
    edges = []
    edge_types = []
    for token in doc:
        if token.head != token and token.i < num_nodes and token.head.i < num_nodes:
            edges.append([token.i, token.head.i])
            edge_types.append(dep_type_to_idx.get(token.dep_, 0))
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous() if edges else torch.empty((2, 0), dtype=torch.long)
    edge_types = torch.tensor(edge_types, dtype=torch.long) if edge_types else torch.empty((0,), dtype=torch.long)
    noun_indices = [i for i, token in enumerate(doc) if token.pos_ == 'NOUN' and i < num_nodes]
    adj_indices = [i for i, token in enumerate(doc) if token.pos_ == 'ADJ' and i < num_nodes]
    graph = Data(edge_index=edge_index, edge_attr=edge_types, sentence=sentence, nodes=nodes, noun_indices=noun_indices, adj_indices=adj_indices)
    return graph

def create_contextual_graph(sentence, num_nodes=None):
    doc = nlp(sentence)
    original_nodes = [token.text.lower() for token in doc]
    if num_nodes is None:
        num_nodes = len(original_nodes)
    nodes = original_nodes[:num_nodes]
    edges = []
    window = 2
    for i in range(num_nodes):
        for j in range(max(0, i - window), min(num_nodes, i + window + 1)):
            if i != j and i < num_nodes and j < num_nodes:
                edges.append([i, j])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous() if edges else torch.empty((2, 0), dtype=torch.long)
    noun_indices = [i for i, token in enumerate(doc) if token.pos_ == 'NOUN' and i < num_nodes]
    adj_indices = [i for i, token in enumerate(doc) if token.pos_ == 'ADJ' and i < num_nodes]
    graph = Data(edge_index=edge_index, sentence=sentence, nodes=nodes, noun_indices=noun_indices, adj_indices=adj_indices)
    return graph

def normalize_tokens(text):
    return ' '.join(text.lower().replace('-', ' ').split())

train_graphs = []
for _, row in train_df.iterrows():
    dep_graph = create_dependency_graph(row['sentence'])
    if dep_graph is None:
        continue
    context_graph = create_contextual_graph(row['sentence'])
    if context_graph is None:
        continue
    embeddings, num_nodes = get_roberta_embeddings(row['sentence'], dep_graph.nodes)
    if embeddings is None:
        continue
    dep_graph = create_dependency_graph(row['sentence'], num_nodes)
    context_graph = create_contextual_graph(row['sentence'], num_nodes)
    dep_graph.x = embeddings
    context_graph.x = embeddings
    aspect = normalize_tokens(row['aspect'])
    aspect_tokens = aspect.split()
    aspect_node_idx = None
    nodes = [normalize_tokens(node) for node in dep_graph.nodes]
    for i in range(len(nodes) - len(aspect_tokens) + 1):
        if ' '.join(nodes[i:i + len(aspect_tokens)]) == aspect:
            aspect_node_idx = i
            break
    if aspect_node_idx is None:
        print(f"Warning: Aspect '{aspect}' not found in sentence '{row['sentence']}'.")
        continue
    dep_graph.aspect_node_idx = aspect_node_idx
    context_graph.aspect_node_idx = aspect_node_idx
    dep_graph.aspect = row['aspect']
    context_graph.aspect = row['aspect']
    dep_graph.label = row['sentiment_label']
    context_graph.label = row['sentiment_label']
    train_graphs.append((context_graph, dep_graph))

val_graphs = []
for _, row in val_df.iterrows():
    dep_graph = create_dependency_graph(row['sentence'])
    if dep_graph is None:
        continue
    context_graph = create_contextual_graph(row['sentence'])
    if context_graph is None:
        continue
    embeddings, num_nodes = get_roberta_embeddings(row['sentence'], dep_graph.nodes)
    if embeddings is None:
        continue
    dep_graph = create_dependency_graph(row['sentence'], num_nodes)
    context_graph = create_contextual_graph(row['sentence'], num_nodes)
    dep_graph.x = embeddings
    context_graph.x = embeddings
    aspect = normalize_tokens(row['aspect'])
    aspect_tokens = aspect.split()
    aspect_node_idx = None
    nodes = [normalize_tokens(node) for node in dep_graph.nodes]
    for i in range(len(nodes) - len(aspect_tokens) + 1):
        if ' '.join(nodes[i:i + len(aspect_tokens)]) == aspect:
            aspect_node_idx = i
            break
    if aspect_node_idx is None:
        print(f"Warning: Aspect '{aspect}' not found in sentence '{row['sentence']}'.")
        continue
    dep_graph.aspect_node_idx = aspect_node_idx
    context_graph.aspect_node_idx = aspect_node_idx
    dep_graph.aspect = row['aspect']
    context_graph.aspect = row['aspect']
    dep_graph.label = row['sentiment_label']
    context_graph.label = row['sentiment_label']
    val_graphs.append((context_graph, dep_graph))

with open('data/semeval_train_graphs_embedded.pkl', 'wb') as f:
    pickle.dump(train_graphs, f)
with open('data/semeval_val_graphs_embedded.pkl', 'wb') as f:
    pickle.dump(val_graphs, f)
print(f"Total train graphs: {len(train_graphs)}, Total validation graphs: {len(val_graphs)}")


                                              sentence sentiment
65   The food was delicious but do not come here on...  conflict
102  The service varys from day to day- sometimes t...  conflict
134  Though the Spider Roll may look like a challen...  conflict
143  An oasis of refinement:  Food, though somewhat...  conflict
Dataset size after dropping unmapped sentiments: 156
Class distribution in train_df:
0.0    99
1.0    27
2.0    30
Name: sentiment_label, dtype: int64
Class distribution in val_df:
0    20
1     8
2    12
Name: sentiment_label, dtype: int64
Class weights: tensor([0.3766, 1.3808, 1.2427], device='cuda:0')


Total train graphs: 156, Total validation graphs: 40
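
One caveat about `get_roberta_embeddings`: it keeps the first `num_nodes` rows of the RoBERTa output, but RoBERTa's tokenizer splits text into subwords, so row i does not necessarily correspond to spaCy token i. A common remedy is to align subwords to tokens by character offsets and average the subword vectors per token. The sketch below shows that idea in plain Python. The function names, spans, and vectors are hypothetical; in practice the token spans could come from spaCy's `token.idx` and the subword spans from a fast tokenizer called with `return_offsets_mapping=True`, which is an assumption about how this notebook would be adapted rather than something it currently does.

```python
def align_subwords_to_tokens(token_spans, subword_spans):
    """For each token span, list the indices of subwords whose character span overlaps it."""
    alignment = []
    for tok_start, tok_end in token_spans:
        overlapping = [
            i for i, (sub_start, sub_end) in enumerate(subword_spans)
            if sub_start < tok_end and sub_end > tok_start
        ]
        alignment.append(overlapping)
    return alignment

def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Hypothetical character spans for the tokens of "Food was uniformly good".
token_spans = [(0, 4), (5, 8), (9, 18), (19, 23)]
# Hypothetical subword spans: "Food", " was", " uniform", "ly", " good".
subword_spans = [(0, 4), (5, 8), (9, 16), (16, 18), (19, 23)]
# Hypothetical 2-dimensional subword vectors standing in for 768-dimensional ones.
subword_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [1.0, 1.0]]

alignment = align_subwords_to_tokens(token_spans, subword_spans)
token_vectors = [mean_pool([subword_vectors[i] for i in idxs]) for idxs in alignment]
print(alignment)         # [[0], [1], [2, 3], [4]]
print(token_vectors[2])  # [3.0, 1.0]
```

Zero-width spans such as (0, 0), which are typically reported for special tokens, never overlap a token under this test and are skipped automatically.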


This cell defines a CustomData subclass that stores aspect_node_idx as a plain integer, a collate function for the DataLoader that expects a batch size of 1, and helpers to validate rows, oversample the minority classes, and build graphs. It then processes the ABSA dataset, attaches RoBERTa embeddings to each graph, filters out invalid graphs, and saves the results to pickle files.
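
The class-balancing step can be sketched with the standard library alone. The helper `oversample_to_majority` and the toy rows are hypothetical. Like the `resample(..., replace=True, n_samples=max_size)` call in the cell, it draws with replacement for every class, including the largest one, until each class has as many rows as the largest class.

```python
import random
from collections import Counter

def oversample_to_majority(rows, label_key="sentiment_label", seed=42):
    """Resample each class with replacement up to the size of the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.choices(by_label[label], k=target))
    return balanced

# Hypothetical toy data: 5 positive (0), 2 negative (1), 1 neutral (2).
toy_rows = [{"sentiment_label": 0}] * 5 + [{"sentiment_label": 1}] * 2 + [{"sentiment_label": 2}]
balanced_rows = oversample_to_majority(toy_rows)
print(Counter(row["sentiment_label"] for row in balanced_rows))
```

Once the classes are balanced this way, inverse-frequency class weights computed from the new counts are all equal, so the weighted loss behaves like an unweighted one.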

In [None]:
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # same import path as the first cell
import pickle
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from torch.optim.lr_scheduler import CosineAnnealingLR
from sklearn.utils import resample

class CustomData(Data):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if hasattr(self, 'aspect_node_idx') and isinstance(self.aspect_node_idx, torch.Tensor):
            self.aspect_node_idx = self.aspect_node_idx.item()

    def __setattr__(self, key, value):
        if key == 'aspect_node_idx' and isinstance(value, torch.Tensor):
            value = value.item()
        super().__setattr__(key, value)

    def to(self, *args, **kwargs):
        obj = super().to(*args, **kwargs)
        if hasattr(obj, 'aspect_node_idx') and isinstance(obj.aspect_node_idx, torch.Tensor):
            obj.aspect_node_idx = obj.aspect_node_idx.item()
        return obj

    def __getstate__(self):
        state = self.__dict__.copy()
        if 'aspect_node_idx' in state and isinstance(state['aspect_node_idx'], torch.Tensor):
            state['aspect_node_idx'] = state['aspect_node_idx'].item()
        return state

def custom_collate(batch):
    if len(batch) != 1:
        raise ValueError(f"Expected batch_size=1, but got batch size {len(batch)}")
    context_graph, dep_graph = batch[0]
    if isinstance(context_graph.aspect_node_idx, torch.Tensor):
        context_graph.aspect_node_idx = context_graph.aspect_node_idx.item()
    if isinstance(dep_graph.aspect_node_idx, torch.Tensor):
        dep_graph.aspect_node_idx = dep_graph.aspect_node_idx.item()
    return context_graph, dep_graph

def validate_data(df, name="dataset"):
    valid_rows = []
    for idx, row in df.iterrows():
        aspect = normalize_tokens(row['aspect'])
        sentence_tokens = [normalize_tokens(token.text) for token in nlp(row['sentence'])]
        sentiment = str(row['sentiment']).lower() 
    
        if not any(token in sentence_tokens for token in aspect.split()):
            print(f"Skipping row {idx} in {name}: No aspect tokens from '{aspect}' found in sentence '{row['sentence']}'")
            continue
        if sentiment not in [k.lower() for k in sentiment_to_idx.keys()]:
            print(f"Skipping row {idx} in {name}: Invalid sentiment '{sentiment}' in sentence '{row['sentence']}'")
            continue
        valid_rows.append(row)
    return pd.DataFrame(valid_rows)

sentiment_to_idx = {'positive': 0, 'negative': 1, 'neutral': 2}
idx_to_sentiment = {idx: label for label, idx in sentiment_to_idx.items()}
train_df = validate_data(train_df, "train_df")
val_df = validate_data(val_df, "val_df")

# Map labels before balancing so the oversampling step works on clean integer classes.
label_map = {k.lower(): v for k, v in sentiment_to_idx.items()}

def map_labels(df, name):
    """Return a copy of df with an integer sentiment_label column and unmapped rows dropped."""
    df = df.copy()
    df['sentiment_label'] = df['sentiment'].str.lower().map(label_map)
    unmapped = df['sentiment_label'].isna()
    if unmapped.any():
        print(f"Warning: NaN values found in sentiment_label for {name}")
        print(df[unmapped][['sentence', 'sentiment']])
        df = df[~unmapped].copy()
    df['sentiment_label'] = df['sentiment_label'].astype(int)
    return df

train_df = map_labels(train_df, 'train_df')
val_df = map_labels(val_df, 'val_df')

# Oversample every class (with replacement) up to the size of the largest class.
max_size = train_df['sentiment_label'].value_counts().max()
balanced_dfs = []
for label in sentiment_to_idx.values():
    df_label = train_df[train_df['sentiment_label'] == label]
    df_oversampled = resample(df_label, replace=True, n_samples=max_size, random_state=42)
    balanced_dfs.append(df_oversampled)
train_df = pd.concat(balanced_dfs)

print("Class distribution in train_df:")
class_counts = train_df['sentiment_label'].value_counts().sort_index()
print(class_counts)
print("Class distribution in val_df:")
print(val_df['sentiment_label'].value_counts().sort_index())
if any(class_counts == 0):
    raise ValueError("One or more classes have zero instances in train_df.")
class_weights = torch.tensor([1.0 / class_counts[i] for i in range(len(class_counts))], dtype=torch.float).to(device)
class_weights = class_weights / class_weights.max()
print("Class weights:", class_weights)

def get_roberta_embeddings(sentence, nodes):
    sentences = [sentence]
    nodes_list = [nodes]
    valid_sentences = []
    valid_nodes = []
    for i, (sent, node) in enumerate(zip(sentences, nodes_list)):
        if isinstance(sent, str) and sent.strip() and node:
            valid_sentences.append(sent)
            valid_nodes.append(node)
    if not valid_sentences:
        return None, 0
    try:
        inputs = tokenizer(valid_sentences, return_tensors='pt', padding=True, truncation=True, max_length=512)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = roberta_model(**inputs)
        embeddings = outputs.last_hidden_state.cpu()
        num_nodes = min(len(valid_nodes[0]), embeddings[0].shape[0] - 2)
        if num_nodes == 0:
            return None, 0
        emb = embeddings[0, 1:num_nodes+1].clone()
        return emb, num_nodes
    except RuntimeError:
        return None, 0

def create_dependency_graph(sentence, num_nodes=None):
    doc = nlp(sentence)
    original_nodes = [token.text.lower() for token in doc]
    if not original_nodes:
        return None
    if num_nodes is None:
        num_nodes = len(original_nodes)
    nodes = original_nodes[:num_nodes]
    edges = []
    edge_types = []
    for token in doc:
        if token.head != token and token.i < num_nodes and token.head.i < num_nodes:
            edges.append([token.i, token.head.i])
            edge_types.append(dep_type_to_idx.get(token.dep_, 0))
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous() if edges else torch.empty((2, 0), dtype=torch.long)
    edge_types = torch.tensor(edge_types, dtype=torch.long) if edge_types else torch.empty((0,), dtype=torch.long)
    noun_indices = [i for i, token in enumerate(doc) if token.pos_ == 'NOUN' and i < num_nodes]
    adj_indices = [i for i, token in enumerate(doc) if token.pos_ == 'ADJ' and i < num_nodes]
    graph = CustomData(edge_index=edge_index, edge_attr=edge_types, sentence=sentence, nodes=nodes, noun_indices=noun_indices, adj_indices=adj_indices)
    return graph


def create_contextual_graph(sentence, num_nodes=None):
    doc = nlp(sentence)
    original_nodes = [token.text.lower() for token in doc]
    if not original_nodes:
        return None
    if num_nodes is None:
        num_nodes = len(original_nodes)
    nodes = original_nodes[:num_nodes]
    edges = []
    window = 2
    for i in range(num_nodes):
        for j in range(max(0, i - window), min(num_nodes, i + window + 1)):
            if i != j and i < num_nodes and j < num_nodes:
                edges.append([i, j])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous() if edges else torch.empty((2, 0), dtype=torch.long)
    noun_indices = [i for i, token in enumerate(doc) if token.pos_ == 'NOUN' and i < num_nodes]
    adj_indices = [i for i, token in enumerate(doc) if token.pos_ == 'ADJ' and i < num_nodes]
    graph = CustomData(edge_index=edge_index, sentence=sentence, nodes=nodes, noun_indices=noun_indices, adj_indices=adj_indices)
    return graph

def normalize_tokens(text):
    return ' '.join(str(text).lower().replace('-', ' ').split())

train_graphs = []
for idx, row in train_df.iterrows():
    dep_graph = create_dependency_graph(row['sentence'])
    if dep_graph is None:
        continue
    context_graph = create_contextual_graph(row['sentence'])
    if context_graph is None:
        continue
    embeddings, num_nodes = get_roberta_embeddings(row['sentence'], dep_graph.nodes)
    if embeddings is None:
        continue
    dep_graph = create_dependency_graph(row['sentence'], num_nodes)
    context_graph = create_contextual_graph(row['sentence'], num_nodes)
    dep_graph.x = embeddings
    context_graph.x = embeddings
    aspect = normalize_tokens(row['aspect'])
    aspect_tokens = aspect.split()
    aspect_node_idx = None
    nodes = [normalize_tokens(node) for node in dep_graph.nodes]
    for i in range(len(nodes) - len(aspect_tokens) + 1):
        if ' '.join(nodes[i:i + len(aspect_tokens)]) == aspect:
            aspect_node_idx = i
            break
    if aspect_node_idx is None:
        continue
    dep_graph.aspect_node_idx = int(aspect_node_idx)
    context_graph.aspect_node_idx = int(aspect_node_idx)
    dep_graph.aspect = row['aspect']
    context_graph.aspect = row['aspect']
    dep_graph.label = row['sentiment_label']
    context_graph.label = row['sentiment_label']
    train_graphs.append((context_graph, dep_graph))

val_graphs = []
for idx, row in val_df.iterrows():
    dep_graph = create_dependency_graph(row['sentence'])
    if dep_graph is None:
        continue
    context_graph = create_contextual_graph(row['sentence'])
    if context_graph is None:
        continue
    embeddings, num_nodes = get_roberta_embeddings(row['sentence'], dep_graph.nodes)
    if embeddings is None:
        continue
    dep_graph = create_dependency_graph(row['sentence'], num_nodes)
    context_graph = create_contextual_graph(row['sentence'], num_nodes)
    dep_graph.x = embeddings
    context_graph.x = embeddings
    aspect = normalize_tokens(row['aspect'])
    aspect_tokens = aspect.split()
    aspect_node_idx = None
    nodes = [normalize_tokens(node) for node in dep_graph.nodes]
    for i in range(len(nodes) - len(aspect_tokens) + 1):
        if ' '.join(nodes[i:i + len(aspect_tokens)]) == aspect:
            aspect_node_idx = i
            break
    if aspect_node_idx is None:
        continue
    dep_graph.aspect_node_idx = int(aspect_node_idx)
    context_graph.aspect_node_idx = int(aspect_node_idx)
    dep_graph.aspect = row['aspect']
    context_graph.aspect = row['aspect']
    dep_graph.label = row['sentiment_label']
    context_graph.label = row['sentiment_label']
    val_graphs.append((context_graph, dep_graph))

def is_valid_graph(c, d):
    try:
        if c.x.shape[0] == 0 or d.x.shape[0] == 0:
            return False
        if c.edge_index.numel() > 0 and c.edge_index.max() >= c.x.shape[0]:
            return False
        if d.edge_index.numel() > 0 and d.edge_index.max() >= d.x.shape[0]:
            return False
        if not isinstance(c.aspect_node_idx, int) or not isinstance(d.aspect_node_idx, int):
            return False
        if c.aspect_node_idx < 0 or c.aspect_node_idx >= c.x.shape[0]:
            return False
        if d.aspect_node_idx < 0 or d.aspect_node_idx >= d.x.shape[0]:
            return False
        return True
    except Exception:
        return False

train_graphs = [(c, d) for c, d in train_graphs if is_valid_graph(c, d)]
val_graphs = [(c, d) for c, d in val_graphs if is_valid_graph(c, d)]

with open('data/semeval_train_graphs_embedded.pkl', 'wb') as f:
    pickle.dump(train_graphs, f)
with open('data/semeval_val_graphs_embedded.pkl', 'wb') as f:
    pickle.dump(val_graphs, f)
print(f"Total train graphs: {len(train_graphs)}, Total validation graphs: {len(val_graphs)}")
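
The contextual graph connects each token to its neighbours within a fixed window. The edge construction can be sketched on its own as follows; the helper name `window_edges` is hypothetical, and the default window of 2 matches `create_contextual_graph`.

```python
def window_edges(num_nodes, window=2):
    """Return directed edges (i, j) for every pair of distinct nodes at most `window` apart."""
    edges = []
    for i in range(num_nodes):
        for j in range(max(0, i - window), min(num_nodes, i + window + 1)):
            if i != j:
                edges.append((i, j))
    return edges

edges = window_edges(4)
print(edges)

# Transpose into the 2 x E layout (source row, target row) expected for an edge_index tensor.
sources = [i for i, _ in edges]
targets = [j for _, j in edges]
print(sources)
print(targets)
```

Every edge appears in both directions, so the resulting graph is effectively undirected, and a single-token sentence produces no edges at all.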


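This cell defines the HybridGCNForABSA model for aspect-based sentiment analysis. It combines two GCN branches, T-GCN over the contextual (sliding-window) graph and ARGCN over the dependency graph, and feeds the concatenated aspect-node representations into a linear classification head. The class also implements the training and validation steps, per-epoch accuracy and macro-F1 logging, and an Adam optimizer with a cosine-annealing learning-rate schedule. Loading the pre-trained weights and fine-tuning happen in the next cell.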

In [None]:

class HybridGCNForABSA(pl.LightningModule):
    def __init__(self, input_dim=768, hidden_dim=256, num_dep_types=50, num_classes=3, class_weights=None):
        super().__init__()
        self.save_hyperparameters()
        self.tgcn_conv1 = GCNConv(input_dim, hidden_dim)
        self.tgcn_conv2 = GCNConv(hidden_dim, hidden_dim)
        self.tgcn_dropout = nn.Dropout(0.3)
        self.argcn_conv1 = GCNConv(input_dim, hidden_dim)
        self.argcn_conv2 = GCNConv(hidden_dim, hidden_dim)
        self.argcn_dropout = nn.Dropout(0.3)
        self.type_emb = nn.Embedding(num_dep_types, hidden_dim)
        self.fc_sentiment = nn.Linear(hidden_dim * 2, num_classes)
        self.dep_type_to_idx = dep_type_to_idx
        self.criterion = nn.CrossEntropyLoss(weight=class_weights)
        self.val_preds = []
        self.val_labels = []
        self.train_preds = []
        self.train_labels = []

    def forward(self, context_data, dep_data, target_idx=None):
        x_tgcn = context_data.x.to(device)
        edge_index_tgcn = context_data.edge_index.to(device)
        x_argcn = dep_data.x.to(device)
        edge_index_argcn = dep_data.edge_index.to(device)
        edge_attr = dep_data.edge_attr.to(device) if dep_data.edge_attr is not None else None

        num_nodes_tgcn = x_tgcn.shape[0]
        num_nodes_argcn = x_argcn.shape[0]
        if edge_index_tgcn.numel() > 0 and edge_index_tgcn.max().item() >= num_nodes_tgcn:
            raise ValueError(f"Invalid edge_index_tgcn: max index {edge_index_tgcn.max().item()}, num_nodes {num_nodes_tgcn}")
        if edge_index_argcn.numel() > 0 and edge_index_argcn.max().item() >= num_nodes_argcn:
            raise ValueError(f"Invalid edge_index_argcn: max index {edge_index_argcn.max().item()}, num_nodes {num_nodes_argcn}")
        if target_idx is None or target_idx >= num_nodes_tgcn or target_idx >= num_nodes_argcn:
            raise ValueError(f"Invalid target_idx: {target_idx}, num_nodes_tgcn={num_nodes_tgcn}, num_nodes_argcn={num_nodes_argcn}")

        x_tgcn1 = torch.relu(self.tgcn_conv1(x_tgcn, edge_index_tgcn))
        x_tgcn1 = self.tgcn_dropout(x_tgcn1)
        x_tgcn2 = torch.relu(self.tgcn_conv2(x_tgcn1, edge_index_tgcn))
        x_tgcn = (x_tgcn1 + x_tgcn2) / 2

        edge_weights = torch.ones(edge_index_argcn.size(1), device=device)
        if edge_attr is not None:
            valid_mask = edge_attr != -1
            valid_edge_attr = edge_attr[valid_mask]
            if valid_edge_attr.numel() > 0:
                edge_weights[valid_mask] = self.type_emb(valid_edge_attr).mean(dim=1)
        x_argcn1 = torch.relu(self.argcn_conv1(x_argcn, edge_index_argcn, edge_weights))
        x_argcn1 = self.argcn_dropout(x_argcn1)
        x_argcn2 = torch.relu(self.argcn_conv2(x_argcn1, edge_index_argcn, edge_weights))
        x_argcn = (x_argcn1 + x_argcn2) / 2

        x_tgcn_aspect = x_tgcn[target_idx].unsqueeze(0)
        x_argcn_aspect = x_argcn[target_idx].unsqueeze(0)
        x_combined = torch.cat([x_tgcn_aspect, x_argcn_aspect], dim=-1)
        sentiment_logits = self.fc_sentiment(x_combined)
        return sentiment_logits

    def on_train_epoch_start(self):
        self.train_preds = []
        self.train_labels = []

    def on_validation_epoch_start(self):
        self.val_preds = []
        self.val_labels = []

    def training_step(self, batch, batch_idx):
        context_data, dep_data = batch
        context_data = context_data.to(device)
        dep_data = dep_data.to(device)
        try:
            # Skip graphs that have no nodes.
            if context_data.x.shape[0] == 0 or dep_data.x.shape[0] == 0:
                return None

            # Skip graphs whose edges point at nodes that do not exist.
            if context_data.edge_index.numel() > 0 and context_data.edge_index.max().item() >= context_data.x.shape[0]:
                return None
            if dep_data.edge_index.numel() > 0 and dep_data.edge_index.max().item() >= dep_data.x.shape[0]:
                return None

            # The aspect index must resolve to a single Python int.
            target_idx = context_data.aspect_node_idx
            if isinstance(target_idx, torch.Tensor):
                if target_idx.numel() == 1:
                    target_idx = target_idx.item()
                else:
                    return None
            elif not isinstance(target_idx, int):
                return None

            label = torch.tensor([int(context_data.label)], device=device, dtype=torch.long)

            logits = self(context_data, dep_data, target_idx=target_idx)
            loss = self.criterion(logits, label)
            pred = torch.argmax(logits, dim=1).cpu().item()
            self.train_preds.append(pred)
            self.train_labels.append(label.cpu().item())
            self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, batch_size=1)
            return loss

        except Exception as e:
            # Returning None makes Lightning skip this batch; report why instead of failing silently.
            print(f"Skipping training sample {batch_idx}: {e}")
            return None

    def validation_step(self, batch, batch_idx):
        context_data, dep_data = batch
        context_data = context_data.to(device)
        dep_data = dep_data.to(device)

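        try:
            # Skip graphs that have no nodes.
            if context_data.x.shape[0] == 0 or dep_data.x.shape[0] == 0:
                return

            # Skip graphs whose edges point at nodes that do not exist.
            if context_data.edge_index.numel() > 0 and context_data.edge_index.max().item() >= context_data.x.shape[0]:
                return
            if dep_data.edge_index.numel() > 0 and dep_data.edge_index.max().item() >= dep_data.x.shape[0]:
                return

            # The aspect index must resolve to a single Python int.
            target_idx = context_data.aspect_node_idx
            if isinstance(target_idx, torch.Tensor):
                if target_idx.numel() == 1:
                    target_idx = target_idx.item()
                else:
                    return
            elif not isinstance(target_idx, int):
                return

            # int() accepts Python ints, NumPy scalars, and one-element tensors alike.
            label = int(context_data.label)

            logits = self(context_data, dep_data, target_idx=target_idx)
            pred = torch.argmax(logits, dim=1).cpu().item()

            self.val_preds.append(pred)
            self.val_labels.append(label)

        except Exception as e:
            # Report skipped samples so an empty validation set is not mistaken for 0.0 accuracy.
            print(f"Skipping validation sample {batch_idx}: {e}")
            return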

    def on_train_epoch_end(self):
        if self.train_preds and self.train_labels:
            train_accuracy = accuracy_score(self.train_labels, self.train_preds)
            train_f1 = f1_score(self.train_labels, self.train_preds, average='macro')
            self.log('train_accuracy', train_accuracy, on_epoch=True, prog_bar=True)
            self.log('train_f1', train_f1, on_epoch=True, prog_bar=True)
            print(f"Training Accuracy: {train_accuracy:.4f}, F1-Score: {train_f1:.4f}")
        self.train_preds = []
        self.train_labels = []

    def on_validation_epoch_end(self):
        if not self.val_preds or not self.val_labels:
            self.log('val_accuracy', 0.0, on_epoch=True, prog_bar=True)
            self.log('val_f1', 0.0, on_epoch=True, prog_bar=True)
            print("Validation Accuracy: 0.0000, F1-Score: 0.0000")
        else:
            accuracy = accuracy_score(self.val_labels, self.val_preds)
            f1 = f1_score(self.val_labels, self.val_preds, average='macro')
            self.log('val_accuracy', accuracy, on_epoch=True, prog_bar=True)
            self.log('val_f1', f1, on_epoch=True, prog_bar=True)
            print(f"Validation Accuracy: {accuracy:.4f}, F1-Score: {f1:.4f}")

        self.val_preds = []
        self.val_labels = []

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=5e-5)
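        # Use the fully qualified name so the scheduler does not depend on a separate import.
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=1e-6)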
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                "interval": "epoch",
                "frequency": 1
            }
        }
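
To see what the scheduler configured above does to the learning rate, the sketch below evaluates the closed-form cosine annealing expression from the PyTorch documentation with the same values as `configure_optimizers` (`lr=5e-5`, `T_max=10`, `eta_min=1e-6`). The helper `cosine_annealing_lr` is a hypothetical stand-in written for illustration; it does not call PyTorch.

```python
import math


def cosine_annealing_lr(epoch, base_lr=5e-5, eta_min=1e-6, t_max=10):
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


# One value per epoch from the start of training up to T_max.
schedule = [cosine_annealing_lr(epoch) for epoch in range(11)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

The rate starts at the base value, passes the midpoint between 5e-5 and 1e-6 at epoch 5, and reaches `eta_min` at epoch `T_max`. Under this closed form the value climbs again past `T_max`, so a run that lasts longer than 10 epochs would likely see the rate increase; matching `T_max` to the epoch budget is one way to avoid that.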

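This cell initializes the `HybridGCNForABSA` model, loads pre-trained weights from `data/pretrained_gcn_1.ckpt` (excluding the node and edge prediction layers), and creates DataLoaders for the training and validation graphs. It fine-tunes the model for up to 15 epochs, stopping early if validation accuracy fails to improve by at least 0.005 for 3 consecutive epochs. The `ModelCheckpoint` callback keeps the checkpoint with the best validation accuracy, and the model state at the end of training is saved to `data/finetuned_gcn_200_clean.ckpt`.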
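
The exclusion of the node and edge prediction layers comes down to filtering the checkpoint's keys by prefix. The sketch below shows that step on a toy dictionary; `drop_prefixed_keys` is a hypothetical helper, and the key names and string values are made up for illustration (a real state dict maps parameter names to tensors).

```python
def drop_prefixed_keys(state_dict, excluded_prefixes):
    """Return a copy of state_dict without keys that start with any excluded prefix."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.startswith(tuple(excluded_prefixes))
    }


# Toy stand-in for checkpoint['state_dict']; the names are illustrative only.
toy_state_dict = {
    "tgcn_conv1.lin.weight": "w1",
    "tgcn_conv2.lin.weight": "w2",
    "fc_node.weight": "node head",
    "fc_edge.weight": "edge head",
}

kept = drop_prefixed_keys(toy_state_dict, ["fc_node", "fc_edge"])
print(sorted(kept))
```

When the filtered dictionary is passed to `load_state_dict(..., strict=False)`, the call returns an object with `missing_keys` and `unexpected_keys`; printing both is a quick way to confirm which layers kept their initial values.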

In [None]:

try:
    model = HybridGCNForABSA(num_dep_types=len(dep_types), num_classes=3, class_weights=class_weights)
    model.dep_type_to_idx = dep_type_to_idx
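    # Keep the path in one variable so the log message names the file that was actually loaded.
    pretrained_path = 'data/pretrained_gcn_1.ckpt'
    checkpoint = torch.load(pretrained_path, map_location=device)
    state_dict = checkpoint['state_dict']
    # Drop the node and edge prediction heads used during pre-training.
    state_dict = {k: v for k, v in state_dict.items() if not k.startswith(('fc_node', 'fc_edge'))}
    # strict=False tolerates keys that are present on only one side.
    model.load_state_dict(state_dict, strict=False)
    print(f"Loaded pre-trained weights from '{pretrained_path}'.")
except Exception as e:
    print(f"Failed to load pre-trained model: {e}")
    raise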

train_loader = DataLoader(train_graphs, batch_size=1, shuffle=True, num_workers=0, pin_memory=True, collate_fn=custom_collate)
val_loader = DataLoader(val_graphs, batch_size=1, shuffle=False, num_workers=0, pin_memory=True, collate_fn=custom_collate)

try:
    trainer = pl.Trainer(
        max_epochs=15,
        accelerator='gpu' if torch.cuda.is_available() else 'cpu',
        devices=1,
        log_every_n_steps=5,
        enable_checkpointing=True,
        default_root_dir='checkpoints/',
        val_check_interval=1.0,
        limit_val_batches=1.0,
        callbacks=[
            pl.callbacks.EarlyStopping(monitor='val_accuracy', patience=3, mode='max', min_delta=0.005),
            pl.callbacks.ModelCheckpoint(monitor='val_accuracy', mode='max', save_top_k=1)
        ]
    )
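    trainer.fit(model, train_loader, val_loader)
    # This saves the model state at the end of training. The best-scoring checkpoint is
    # written separately by ModelCheckpoint (see trainer.checkpoint_callback.best_model_path).
    finetuned_path = 'data/finetuned_gcn_200_clean.ckpt'
    trainer.save_checkpoint(finetuned_path)
    print(f"Fine-tuning completed. Checkpoint saved at: {finetuned_path}")
except Exception as e:
    print(f"Fine-tuning failed: {e}")
    raise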

Class distribution in train_df:
0    99
1    99
2    99
Name: sentiment_label, dtype: int64
Class distribution in val_df:
0    20
1     8
2    12
Name: sentiment_label, dtype: int64
Class weights: tensor([1., 1., 1.], device='cuda:0')


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA L40S') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type             | Params
---------------------------------------------------
0 | tgcn_conv1    | GCNConv          | 196 K 
1 | tgcn_conv2    | GCNConv          | 65.8 K
2 | tgcn_dropout  | Dropout          | 0     
3 | argcn_conv1   | GCNC

Total train graphs: 292, Total validation graphs: 40
Loaded pre-trained weights from 'data/pretrained_gcn_200_clean.ckpt'.


Sanity Checking: 0it [00:00, ?it/s]

Validation Accuracy: 0.5000, F1-Score: 0.3333




Training: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation Accuracy: 0.5750, F1-Score: 0.4750
Training Accuracy: 0.3904, F1-Score: 0.3414


Validation: 0it [00:00, ?it/s]

Validation Accuracy: 0.8750, F1-Score: 0.8628
Training Accuracy: 0.6130, F1-Score: 0.6089


Validation: 0it [00:00, ?it/s]

Validation Accuracy: 0.8250, F1-Score: 0.8111
Training Accuracy: 0.7979, F1-Score: 0.7935


Validation: 0it [00:00, ?it/s]

Validation Accuracy: 0.8500, F1-Score: 0.8330
Training Accuracy: 0.8253, F1-Score: 0.8246


Validation: 0it [00:00, ?it/s]

Validation Accuracy: 0.8500, F1-Score: 0.8330
Training Accuracy: 0.8801, F1-Score: 0.8796
Fine-tuning completed. Checkpoint saved at: data/finetuned_gcn_Final_clean.ckpt
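
The log above ends after five training epochs even though `max_epochs=15`. The sketch below replays the logged validation accuracies through a simplified re-implementation of the stopping rule configured in the trainer (`patience=3`, `min_delta=0.005`, `mode='max'`). The function `early_stop_epoch` is hypothetical and is not Lightning's own code; it also assumes that the sanity-check pass (0.5000) does not count toward early stopping.

```python
def early_stop_epoch(scores, patience=3, min_delta=0.005):
    """Return (stop_index, best_score) for a metric that should increase.

    stop_index is None if the patience limit is never reached.
    """
    best = float("-inf")
    wait = 0
    for index, score in enumerate(scores):
        if score - min_delta > best:
            # Improvement large enough to count: remember it and reset the counter.
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return index, best
    return None, best


# Validation accuracy per epoch, copied from the log above.
val_accuracy = [0.5750, 0.8750, 0.8250, 0.8500, 0.8500]
stop_index, best = early_stop_epoch(val_accuracy)
print(f"stopped after epoch index {stop_index}, best val_accuracy {best:.4f}")
# prints "stopped after epoch index 4, best val_accuracy 0.8750"
```

The best score, 0.8750, comes from the second epoch; the three epochs that follow never exceed it by `min_delta`, which exhausts the patience of 3.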
