# Text Classification - Adjacency Matrix


## $\color{blue}{Sections:}$

* Preamble
1.   Admin
2.   Data
3.   Adjacency Matrix
4.   Save


## $\color{blue}{Preamble:}$

The representation of the graph is central to any graph neural network.

In this notebook we create the adjacency matrices used by all of the GNN approaches.
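
As a rough illustration of the idea, the sketch below treats each text passage as a node and connects two nodes when they mention the same entity. The toy `passages` data is made up for this example, and plain nested lists stand in for the tensors built later in the notebook.

```python
# Toy data (illustrative only): the set of entities mentioned in each passage.
passages = [
    {"stephen", "bloom"},   # passage 0
    {"bloom"},              # passage 1
    {"borgo pass"},         # passage 2
]

n = len(passages)
# adjacency[i][j] is 1 when passages i and j share at least one entity.
# The diagonal stays 0, so there are no self-loops.
adjacency = [
    [1 if i != j and passages[i] & passages[j] else 0 for j in range(n)]
    for i in range(n)
]

for row in adjacency:
    print(row)
# [0, 1, 0]
# [1, 0, 0]
# [0, 0, 0]
```

Passages 0 and 1 are linked through "bloom", while passage 2 shares nothing with the others and stays isolated.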

## $\color{blue}{Admin}$
* Install relevant Libraries
* Import relevant Libraries

In [1]:
import openai
import re
import pandas as pd
from google.colab import drive
from google.colab import userdata
import os

## $\color{blue}{Data}$

* Connect to Drive
* Load the train, dev, and test DataFrames

In [2]:
drive.mount("/content/drive")
%cd '/content/drive/MyDrive'

Mounted at /content/drive
/content/drive/MyDrive


In [3]:
import pandas as pd
path = 'class/datasets/'
df_train = pd.read_pickle(path + 'df_train')
df_dev = pd.read_pickle(path + 'df_dev')
df_test = pd.read_pickle(path + 'df_test')

In [4]:
df_test.columns

Index(['index', 'master', 'book_idx', 'book', 'chapter_idx', 'chapter',
       'author', 'content', 'vanilla_embedding', 'vanilla_embedding.1',
       'ft_embedding', 'ft_embedding_pal', 'ner_responses'],
      dtype='object')

In [7]:
for el in df_train['ner_responses'][0:5]:
  print(el)
  print()

“Is it @@John of Tuam##Person ?”   “Are you sure of that now?” asked @@Mr Fogarty##Person dubiously. “I thought it was some Italian or American.”   “@@John of Tuam##Person ,” repeated @@Mr Cunningham##Person , “was the man.”   He drank and the other gentlemen followed his lead.

sibly there were several others. He personally, being of a sceptical bias, believed and didn’t make the smallest bones about saying so either that man or men in the plural were always hanging around on the waiting list about a lady,

@@Stephen##Person , who was trying his dead best to yawn if he could, suffering from lassitude generally, replied:   —To fill the ear of a cow elephant. They were haggling over money.   —Is that so? @@Mr Bloom##Person asked.

Now to the historical, for as @@Madam Mina##Person write not in her stenography, I must, in my cumbrous old fashion, that so each day of us may not go unrecorded. We got to the @@Borgo Pass##Location just after sunrise yesterday morning. When I saw the signs o
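
The output above shows the markup convention: each entity is wrapped as `@@text##Label`. A minimal sketch of pulling `(text, label)` pairs out of one string follows. The `sample` sentence is stitched together from fragments of the output above purely for illustration, and the pattern is the one used in `get_entities` in the next section.

```python
import re

# Entity pattern: "@@" + entity text + "##" + label
pattern = r"@@([^#]*)##(\w+\b)\S*"

# Illustrative sample only, assembled from the printed rows above
sample = (
    "Is that so? @@Mr Bloom##Person asked. "
    "We got to the @@Borgo Pass##Location just after sunrise."
)

# With two capture groups, findall returns one (text, label) tuple per match
pairs = re.findall(pattern, sample)
print(pairs)
# [('Mr Bloom', 'Person'), ('Borgo Pass', 'Location')]
```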

## $\color{blue}{Adjacency-Matrix}$


In [None]:
def get_entities(df):
  """Return per-row lists of people, locations, and all entities (None where a row has none)."""

  # Extract (entity, label) pairs from markup such as "@@Mr Bloom##Person"
  entity_pattern = r"@@([^#]*)##(\w+\b)\S*"
  all_entities = [re.findall(entity_pattern, text) for text in df['ner_responses']]

  # Honorifics stripped from person names. The optional period sits after the
  # word boundary so that "mr. bloom" becomes "bloom" rather than ". bloom"
  title_pattern = r'\b(?:dr|mrs|mr|miss)\b\.?'

  # One slot per row; a slot stays None if the row has no entity of that kind
  n_rows = df.shape[0]
  people = [None] * n_rows
  locations = [None] * n_rows
  entities = [None] * n_rows

  # populate entity holders
  for i in range(n_rows):

    people_holder = []
    locations_holder = []
    entity_holder = []

    for entity, label in all_entities[i]:
      if label in ('Person', 'person'):
        # names are lowercased first, so no case-insensitive flag is needed
        person_clean = re.sub(title_pattern, '', entity.lower()).strip()
        people_holder.append(person_clean)
        entity_holder.append(person_clean)
      elif label in ('Location', 'location'):
        location_clean = entity.lower().strip()
        locations_holder.append(location_clean)
        entity_holder.append(location_clean)

    if people_holder:
      people[i] = people_holder
    if locations_holder:
      locations[i] = locations_holder
    if entity_holder:
      entities[i] = entity_holder

  return people, locations, entities
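
Stripping honorifics matters because two passages are linked only when their entity strings match exactly, so "Mr Bloom" and "Bloom" need to collapse to the same string. The sketch below isolates that step. `normalize_person` and `TITLE_PATTERN` are hypothetical names introduced here, and the pattern is an assumed variant that also drops a period after the title.

```python
import re

# Assumed pattern: a listed honorific, then an optional trailing period
TITLE_PATTERN = r"\b(?:dr|mrs|mr|miss)\b\.?"

def normalize_person(name):
    """Hypothetical helper: lowercase a name and remove the listed honorifics."""
    return re.sub(TITLE_PATTERN, "", name.lower()).strip()

names = ["Mr Bloom", "Mr. Bloom", "Bloom", "Madam Mina"]
normalized = [normalize_person(name) for name in names]
print(normalized)
# ['bloom', 'bloom', 'bloom', 'madam mina']
```

Titles outside the list, such as "Madam", are left in place, so "Madam Mina" and "Mina" would still be treated as different entities.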

In [None]:
train_people, train_locations, train_entities = get_entities(df_train)
dev_people, dev_locations, dev_entities = get_entities(df_dev)
test_people, test_locations, test_entities = get_entities(df_test)

In [None]:
# make adjacency of train + dev nodes
df1 = df_train[['index', 'ner_responses']]
df2 = df_dev[['index', 'ner_responses']]
df_val = pd.concat([df1,df2])
val_people, val_locations, val_entities = get_entities(df_val)

In [None]:
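import torch

def create_adjacency(lstr):
  """Build an n x n 0/1 matrix with an edge wherever rows i and j share an entity."""
  n = len(lstr)
  matrix = torch.zeros((n, n))
  for i in range(n):
    # rows without entities (None) get no edges
    if lstr[i] is None:
      continue
    for j in range(n):
      # no self-loops
      if (i != j) and (lstr[j] is not None):
        if any(entity in lstr[j] for entity in lstr[i]):
          matrix[i, j] = 1
  return matrix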
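
The nested loop above compares every pair of rows, which can be slow for large splits. One possible alternative, sketched here under assumptions and not the notebook's method, first builds an index from each entity to the rows that mention it and then links only the rows that meet in that index. `adjacency_via_index` is a hypothetical name, and it returns nested lists rather than a tensor so the example runs without torch.

```python
from collections import defaultdict
from itertools import combinations

def adjacency_via_index(entity_lists):
    """Hypothetical alternative: 0/1 adjacency built from an entity -> rows index."""
    n = len(entity_lists)

    # Map each entity string to the set of rows that mention it
    index = defaultdict(set)
    for row, ents in enumerate(entity_lists):
        for ent in ents or []:      # rows with no entities are stored as None
            index[ent].add(row)

    # Link every pair of distinct rows that share an entity (symmetric, no self-loops)
    matrix = [[0] * n for _ in range(n)]
    for rows in index.values():
        for i, j in combinations(sorted(rows), 2):
            matrix[i][j] = 1
            matrix[j][i] = 1
    return matrix

toy = [["bloom", "stephen"], None, ["bloom"], ["borgo pass"]]
adj = adjacency_via_index(toy)
for row in adj:
    print(row)
# [0, 0, 1, 0]
# [0, 0, 0, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 0]
```

On this toy input the result matches what the pairwise definition gives: rows 0 and 2 share "bloom", the `None` row has no edges, and the diagonal is zero.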


## $\color{blue}{Save}$


In [None]:
path = 'class/tensors/adj_{}.pt'

In [None]:
# train
train_people_adj = create_adjacency(train_people)
torch.save(train_people_adj, path.format('train_people'))

train_locations_adj = create_adjacency(train_locations)
torch.save(train_locations_adj, path.format('train_locations'))

train_entities_adj = create_adjacency(train_entities)
torch.save(train_entities_adj, path.format('train_entities'))


In [None]:
# dev
dev_people_adj = create_adjacency(dev_people)
torch.save(dev_people_adj, path.format('dev_people'))

dev_locations_adj = create_adjacency(dev_locations)
torch.save(dev_locations_adj, path.format('dev_locations'))

dev_entities_adj = create_adjacency(dev_entities)
torch.save(dev_entities_adj, path.format('dev_entities'))

In [None]:
# test
test_people_adj = create_adjacency(test_people)
torch.save(test_people_adj, path.format('test_people'))

test_locations_adj = create_adjacency(test_locations)
torch.save(test_locations_adj, path.format('test_locations'))

test_entities_adj = create_adjacency(test_entities)
torch.save(test_entities_adj, path.format('test_entities'))


In [None]:
# val (train + dev)
val_people_adj = create_adjacency(val_people)
torch.save(val_people_adj, path.format('val_people'))

val_locations_adj = create_adjacency(val_locations)
torch.save(val_locations_adj, path.format('val_locations'))

val_entities_adj = create_adjacency(val_entities)
torch.save(val_entities_adj, path.format('val_entities'))
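
Because `df_val` stacks the train rows ahead of the dev rows, the top-left block of each `val_*` matrix should repeat the corresponding `train_*` matrix, with the remaining rows and columns holding the dev nodes and any train-dev links. The toy sketch below illustrates that layout. `shared_entity_adjacency` is a hypothetical pure-Python stand-in for `create_adjacency`, and the entity lists are made up.

```python
def shared_entity_adjacency(entity_lists):
    """Hypothetical stand-in for create_adjacency that returns nested lists."""
    n = len(entity_lists)
    return [
        [
            1 if i != j and entity_lists[i] and entity_lists[j]
            and set(entity_lists[i]) & set(entity_lists[j]) else 0
            for j in range(n)
        ]
        for i in range(n)
    ]

# Illustrative data only
toy_train = [["bloom"], ["stephen", "bloom"], None]
toy_dev = [["bloom"], ["mina"]]
toy_val = toy_train + toy_dev          # same ordering as pd.concat([df1, df2])

train_adj = shared_entity_adjacency(toy_train)
val_adj = shared_entity_adjacency(toy_val)

# The first len(toy_train) rows and columns reproduce the train adjacency
n_train = len(toy_train)
top_left = [row[:n_train] for row in val_adj[:n_train]]
print(top_left == train_adj)           # True

# The first dev node (row 3) links back to train rows 0 and 1 via "bloom"
print(val_adj[3][:n_train])            # [1, 1, 0]
```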