This is my solution to the case-study question. I first describe the overall procedure briefly, and explain the details in docstrings and comments where needed. The task is to classify publicly listed companies into industries based on their business descriptions. Using a GNN model was a requirement, so I read a couple of research papers before attempting it. My approach is as follows:

# Formulation
1. Pre-process the data and tokenize it
2. Convert the corpus into a single graph using the approach from "Graph Convolutional Networks for Text Classification" (Text GCN) by Liang Yao, Chengsheng Mao and Yuan Luo. Every document and every word becomes a node, and the adjacency matrix A is built from the text data with the following rule:

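$$
A_{ij} =
\begin{cases}
\mathrm{PMI}(i, j) & i, j \text{ are words and } \mathrm{PMI}(i, j) > 0 \\
\text{TF-IDF}_{ij} & i \text{ is a document}, j \text{ is a word} \\
1 & i = j \\
0 & \text{otherwise}
\end{cases}
$$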

3. After building the graph with networkx, define the GCN model. I used a 2-layer model.
4. Finally, train the GCN with the Industry Classification Tag of each document as its target label.
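A minimal sketch of how the four rules for A(i, j) combine, assuming the TF-IDF weights and PMI values have already been computed and are passed in as plain dictionaries. The input format and the function name `build_adjacency` are hypothetical, not part of the notebook. Nodes are tagged as `("doc", id)` or `("word", w)` so a document index can never collide with a word, and any pair not stored is implicitly 0.

```python
def build_adjacency(doc_tfidf, pmi):
    """Assemble a sparse Text GCN-style adjacency matrix as a dict of dicts.

    doc_tfidf: {doc_id: {word: tfidf_weight}}   (hypothetical input format)
    pmi:       {(word_a, word_b): pmi_value}     (hypothetical input format)
    """
    A = {}

    def add(u, v, w):
        # undirected graph: store the weight in both directions
        A.setdefault(u, {})[v] = w
        A.setdefault(v, {})[u] = w

    for doc, weights in doc_tfidf.items():
        A.setdefault(("doc", doc), {})
        for word, w in weights.items():
            if w > 0:
                add(("doc", doc), ("word", word), w)        # document-word: TF-IDF
    for (a, b), value in pmi.items():
        if a != b and value > 0:
            add(("word", a), ("word", b), value)            # word-word: positive PMI only
    for node in A:
        A[node][node] = 1.0                                 # A(i, i) = 1
    return A


# toy example with made-up numbers
doc_tfidf = {0: {"advertising": 0.8, "agency": 0.6}, 1: {"oil": 1.0}}
pmi = {("advertising", "agency"): 1.2, ("agency", "oil"): -0.5}
A = build_adjacency(doc_tfidf, pmi)
```

The negative PMI pair `("agency", "oil")` gets no edge, which is the "otherwise 0" case.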

# Implementation
I ran this notebook on Google Cloud, on a VM instance created through AI Notebooks, because the free tiers of Google Colab and Kaggle both cap RAM at 13 GB, which was not enough once the notebook reached the graph-building step.
The VM had 8 vCPUs and 64 GB of RAM. Even so, it crashed with error 524 every time graph construction reached about 40%. I have not been able to complete the task, but having worked through several research papers on the method, I am confident I could complete it with a bit more time and compute.
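To see why memory ran out, here is a rough estimate of the dense arrays the notebook allocates, using the sizes it reports later (6045 documents, 48760 vocabulary words). It counts only the arrays themselves, not pandas overhead, temporary copies (a whole-frame operation such as `p_ij + 1E-9` allocates a new full-size frame before the old one is freed), or the Python objects created for graph edges, so actual peak usage is higher.

```python
# Rough dense-memory estimate for the matrices built in this notebook.
# Sizes are taken from the notebook's own output: 6045 documents, 48760 words.
N_DOCS, N_WORDS = 6045, 48760

def gigabytes(n_elements, bytes_per_element):
    """Size of a dense array in decimal gigabytes (10**9 bytes)."""
    return n_elements * bytes_per_element / 1e9

cooc_int32 = gigabytes(N_WORDS * N_WORDS, 4)     # np.int32 co-occurrence counts
pmi_float64 = gigabytes(N_WORDS * N_WORDS, 8)    # float64 PMI DataFrame
tfidf_float64 = gigabytes(N_DOCS * N_WORDS, 8)   # dense TF-IDF after .toarray()

print(f"co-occurrence: {cooc_int32:.1f} GB, PMI: {pmi_float64:.1f} GB, "
      f"TF-IDF: {tfidf_float64:.2f} GB")
# prints: co-occurrence: 9.5 GB, PMI: 19.0 GB, TF-IDF: 2.36 GB
```

The word-by-word matrices dominate: they grow with the square of the vocabulary, so even a 64 GB machine has little headroom once copies are made.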

---------------------------------------------------------------------------------------------------------------------------------------------------------------------

In [1]:
import os
import pickle
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk
import numpy as np
import networkx as nx
from collections import OrderedDict
from itertools import combinations
import math
from tqdm import tqdm

In [2]:
df = pd.read_csv("Training_Data.01 (1).csv")
df.head()

|   | Company Name | Business Description | Industry Classification Tag |
|---|---|---|---|
| 0 | ADSOUTH PARTNERS, INC. | Adsouth Partners, Inc. provides advertising ag... | Advertising |
| 1 | Artec Global Media, Inc. | Artec Global Media, Inc., formerly Artec Consu... | Advertising |
| 2 | Betawave Corp. | Betawave Corporation provides online marketing... | Advertising |
| 3 | BOSTON OMAHA Corp | Boston Omaha Corporation is engaged in the bus... | Advertising |
| 4 | Bright Mountain Media Inc | Bright Mountain Media, Inc. is a digital media... | Advertising |


In [3]:
df = df.astype(str)
df.dtypes

Company Name                   object
Business Description           object
Industry Classification Tag    object
dtype: object

In [4]:
def word_word_edges(p_ij):
    word_word = []
    cols = list(p_ij.columns); cols = [str(w) for w in cols]
    for w1, w2 in tqdm(combinations(cols, 2), total=nCr(len(cols), 2)):
        if (p_ij.loc[w1,w2] > 0):
            word_word.append((w1,w2,{"weight":p_ij.loc[w1,w2]}))
    return word_word

def nCr(n, r):
    # number of r-element combinations of n items
    return math.comb(n, r)

def dummy_fun(doc):
    return doc

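# corpora needed by stopwords.words() and word_tokenize()
# (recent NLTK releases may require "punkt_tab" instead of "punkt")
nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)
stopwords = set(nltk.corpus.stopwords.words("english"))

# punctuation and tokenizer debris to drop along with the stopwords
junk_tokens = {".", ",", ";", "&", "'s", ":", "?", "!", "(", ")",
               "'", "'m", "'no", "***", "--", "...", "[", "]"}

# remove stopwords and non-words from tokens list
def filter_tokens(tokens, stopwords):
    return [token for token in tokens if token not in stopwords and token not in junk_tokens]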

# Company name isn't important for our problem
df.drop(["Company Name"], axis=1, inplace=True)
    
# tokenize & remove stopwords and punctuation
df["Business Description"] = df["Business Description"].apply(lambda x: nltk.word_tokenize(x)).apply(lambda x: filter_tokens(x, stopwords))
    
# Tf-idf (the text is already tokenized, so the tokenizer and preprocessor are identity functions)
print("Calculating Tf-idf...")
vectorizer = TfidfVectorizer(input="content", max_features=None, tokenizer=dummy_fun,
                             preprocessor=dummy_fun, token_pattern=None)
vectorizer.fit(df["Business Description"])
df_tfidf = vectorizer.transform(df["Business Description"])
df_tfidf = df_tfidf.toarray()
vocab = vectorizer.get_feature_names_out()   # get_feature_names() was removed in scikit-learn 1.2
vocab = np.array(vocab)
df_tfidf = pd.DataFrame(df_tfidf, columns=vocab)
    
# PMI between words
names = vocab
n_i  = OrderedDict((name, 0) for name in names)
word2index = OrderedDict( (name,index) for index,name in enumerate(names) )

occurrences = np.zeros( (len(names),len(names)) ,dtype=np.int32)
# Find the co-occurrences:
no_windows = 0
print("Calculating co-occurences...")
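window = 10
for l in tqdm(df["Business Description"], total=len(df["Business Description"])):
    # slide a window of `window` tokens over the document, including the last full window;
    # a document shorter than `window` counts as a single window
    for i in range(max(1, len(l) - window + 1)):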
        no_windows += 1
        d = set(l[i:(i+window)])

        for w in d:
            n_i[w] += 1
        for w1,w2 in combinations(d,2):
            i1 = word2index[w1]
            i2 = word2index[w2]

            occurrences[i1][i2] += 1
            occurrences[i2][i1] += 1

### convert to PMI: PMI(i, j) = log( p(i, j) / (p(i) * p(j)) )
p_i = np.array(list(n_i.values()), dtype=np.float64) / no_windows
p_ij = occurrences / no_windows          # float64 copy of the co-occurrence counts

del occurrences
del n_i

# work on the NumPy array in place so no extra full-size copies are made
p_ij /= p_i[np.newaxis, :]               # divide column j by p(j)
p_ij /= p_i[:, np.newaxis]               # divide row i by p(i)
p_ij += 1E-9                             # avoid log(0) for pairs that never co-occur
np.log(p_ij, out=p_ij)
p_ij = pd.DataFrame(p_ij, index=names, columns=names, copy=False)
print(len(p_ij.columns))

Calculating Tf-idf...



Calculating co-occurences...


100%|██████████| 6045/6045 [03:34<00:00, 28.25it/s]


48760
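Text GCN defines PMI over sliding windows as PMI(i, j) = log(p(i, j) / (p(i) p(j))), with p(i, j) = #W(i, j) / #W and p(i) = #W(i) / #W, where #W is the total number of windows, #W(i) the number containing word i, and #W(i, j) the number containing both. The cell above stores every pair in a dense 48760 × 48760 matrix. Below is a sketch of a sparse alternative, assuming only positive-PMI pairs are needed (which is all the graph uses): windows are counted with `collections.Counter`, so memory grows with the number of pairs that actually co-occur rather than with the square of the vocabulary. `sparse_pmi` is a hypothetical helper and the toy documents are made up.

```python
import math
from collections import Counter
from itertools import combinations

def sparse_pmi(docs, window=10):
    """Positive PMI for word pairs that co-occur in at least one sliding window.

    docs: list of token lists. Returns {(word_a, word_b): pmi} with word_a < word_b.
    """
    word_windows = Counter()   # #W(i): windows containing word i
    pair_windows = Counter()   # #W(i, j): windows containing both i and j
    n_windows = 0              # #W
    for tokens in docs:
        # a document shorter than `window` counts as a single window
        for start in range(max(1, len(tokens) - window + 1)):
            words = set(tokens[start:start + window])
            n_windows += 1
            word_windows.update(words)
            pair_windows.update(combinations(sorted(words), 2))
    pmi = {}
    for (a, b), n_ab in pair_windows.items():
        # p(i,j) / (p(i) p(j)) simplifies to #W(i,j) * #W / (#W(i) * #W(j))
        value = math.log(n_ab * n_windows / (word_windows[a] * word_windows[b]))
        if value > 0:          # only positive PMI becomes an edge
            pmi[(a, b)] = value
    return pmi


docs = [["oil", "gas", "drilling"], ["oil", "gas"],
        ["advertising", "agency"], ["oil", "advertising"]]
pmi = sparse_pmi(docs)
```

Here "oil" and "advertising" share one of four windows but each is frequent, so their PMI is negative and the pair is dropped.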


In [5]:
### Build graph
print("Building graph (No. of document, word nodes: %d, %d)..." %(len(df_tfidf.index), len(vocab)))
G = nx.Graph()
print("Adding document nodes to graph...")
G.add_nodes_from(df_tfidf.index) ## document nodes
print("Adding word nodes to graph...")
G.add_nodes_from(vocab) ## word nodes
### build edges between document-word pairs
print("Building document-word edges...")

document_word = []
flag = 0
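for doc in tqdm(df_tfidf.index, total=len(df_tfidf.index)):
    flag += 1
    row = df_tfidf.loc[doc]
    # connect the document only to words it contains (non-zero TF-IDF); a zero weight means "no edge"
    for w, weight in row[row > 0].items():
        document_word.append((doc, w, {"weight": weight}))
    if flag == 2000:                    # stop after the first 2000 documents (partial run)
        break
print(flag)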
    
print("Building word-word edges...")
word_word = word_word_edges(p_ij)
print("Adding document-word and word-word edges...")
G.add_edges_from(document_word)
G.add_edges_from(word_word)


Building graph (No. of document, word nodes: 6045, 48760)...
Adding document nodes to graph...
Adding word nodes to graph...
Building document-word edges...


 33%|███▎      | 1999/6045 [26:38<53:56,  1.25it/s]  

2000
Building word-word edges...


  7%|▋         | 88852806/1188744420 [23:30<4:51:03, 62983.71it/s] 


KeyboardInterrupt: 
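The word-word step above calls `p_ij.loc` once for each of the 1,188,744,420 word pairs, which tqdm estimated at roughly five hours. Below is a sketch of a faster alternative that scans the upper triangle one row at a time with NumPy, assuming the PMI values are available as a square array (for example via `p_ij.to_numpy()`) and the node labels as a list (for example `list(p_ij.columns)`). The helper name `positive_upper_edges` is hypothetical.

```python
import numpy as np

def positive_upper_edges(matrix, labels):
    """Return (label_i, label_j, {"weight": w}) for every i < j with matrix[i, j] > 0.

    Works row by row, so no second full-size matrix (e.g. from np.triu) is allocated.
    The diagonal is skipped; self-loops are handled separately.
    """
    edges = []
    for i in range(matrix.shape[0]):
        row = matrix[i, i + 1:]                  # upper triangle only (j > i)
        js = np.nonzero(row > 0)[0] + i + 1      # column indices of positive entries
        edges.extend((labels[i], labels[j], {"weight": float(matrix[i, j])}) for j in js)
    return edges


pmi = np.array([[0.0, 0.7, -0.2],
                [0.7, 0.0, 1.1],
                [-0.2, 1.1, 0.0]])
edges = positive_upper_edges(pmi, ["oil", "gas", "drilling"])
```

The result has the same `(u, v, {"weight": w})` shape as `word_word_edges`, so it could be passed directly to `G.add_edges_from`.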

In [None]:
nx.draw(G, with_labels = True)

In [None]:
# 2-layer GCN
# We use a two-layer GCN (node features are convolved twice) because, according to the Text GCN paper, it gives the best results.
# The output of the network below is
#     Z = fc1(ReLU(A_hat @ (ReLU(A_hat @ (X @ W1 + b1)) @ W2 + b2)))
import torch
import torch.nn as nn
import torch.nn.functional as F

class gcn(nn.Module):
    def __init__(self, X_size, A_hat, args, bias=True):  # X_size = number of input features per node
        super().__init__()
        # normalized adjacency matrix; a buffer is not trained but moves with .to(device)
        self.register_buffer("A_hat", torch.as_tensor(A_hat, dtype=torch.float32))
        self.weight = nn.Parameter(torch.empty(X_size, args.hidden_size_1))
        self.weight2 = nn.Parameter(torch.empty(args.hidden_size_1, args.hidden_size_2))
        # Glorot/Xavier normal init: std = sqrt(2 / (fan_in + fan_out))
        nn.init.xavier_normal_(self.weight)
        nn.init.xavier_normal_(self.weight2)
        if bias:
            self.bias = nn.Parameter(torch.zeros(args.hidden_size_1))
            self.bias2 = nn.Parameter(torch.zeros(args.hidden_size_2))
        else:
            # register both so forward() can check them against None
            self.register_parameter("bias", None)
            self.register_parameter("bias2", None)
        self.fc1 = nn.Linear(args.hidden_size_2, args.num_classes)
        
    def forward(self, X): ### 2-layer GCN architecture
        X = torch.mm(X, self.weight)
        if self.bias is not None:
            X = (X + self.bias)
        X = F.relu(torch.mm(self.A_hat, X))
        X = torch.mm(X, self.weight2)
        if self.bias2 is not None:
            X = (X + self.bias2)
        X = F.relu(torch.mm(self.A_hat, X))
        return self.fc1(X)
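`gcn.__init__` expects an already-normalized `A_hat`, which the notebook does not compute yet. Below is a NumPy sketch of the symmetric normalization used by GCNs, A_hat = D^(-1/2) A D^(-1/2) with D the diagonal degree matrix, assuming A already contains the self-loops A(i, i) = 1 from the formulation. It also includes the plain two-layer forward pass from the Text GCN paper, which feeds one-hot node features (X = I). The notebook's class adds biases and a final `fc1` layer on top of this form. The function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 A D^-1/2, where D holds the row sums of A.

    Assumes A already includes self-loops, so every degree is at least 1.
    """
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: A_hat @ relu(A_hat @ X @ W0) @ W1, returning class logits."""
    H = np.maximum(A_hat @ X @ W0, 0.0)
    return A_hat @ H @ W1


A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
A_hat = normalize_adjacency(A)
X = np.eye(3)                                    # one-hot node features
rng = np.random.default_rng(0)
logits = gcn_forward(A_hat, X, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))
```

For the full graph, a dense A would be far too large (see the memory estimate under Implementation), so in practice the same normalization would have to be applied to a sparse matrix.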

In [None]:
df_test = pd.read_csv("Testing_Data_2_ (1).csv")
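Step 4 of the formulation (training on the classification tags) is not implemented yet. In Text GCN the test documents are added to the same graph as unlabeled nodes, and the loss is computed only over the labeled training documents. Below is a NumPy sketch of that setup, under stated assumptions: tags are mapped to integer ids, and a boolean mask selects which node rows contribute to the softmax cross-entropy. The helper names, the tag "Banking", and the train/test split are hypothetical.

```python
import numpy as np

def encode_labels(tags):
    """Map string tags to integer class ids (classes sorted for reproducibility)."""
    classes = sorted(set(tags))
    index = {c: i for i, c in enumerate(classes)}
    return np.array([index[t] for t in tags]), classes

def masked_cross_entropy(logits, labels, train_mask):
    """Mean softmax cross-entropy over the rows where train_mask is True."""
    z = logits[train_mask]
    z = z - z.max(axis=1, keepdims=True)                    # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(z)), labels[train_mask]].mean()


labels, classes = encode_labels(["Advertising", "Banking", "Advertising", "Banking"])
train_mask = np.array([True, True, True, False])   # last document node treated as a test doc
logits = np.zeros((4, 2))                          # uninformative model output
loss = masked_cross_entropy(logits, labels, train_mask)
```

With all-zero logits, the loss equals log(2), the value for a uniform guess over two classes. In PyTorch the same loss would be `F.cross_entropy(logits[train_mask], labels[train_mask])`.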