### Guidelines

1. Load dataset
2. Run parallel representation
3. Run tGBS (formerly AFRN)

_Note:_ This file has been updated since the YouTube video to reflect the following change in the package:

- The AFRN module has been renamed to tGBS 
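
Code written against an older release may still refer to the previous name. The following is a minimal sketch of an import that tolerates both names. The old module path `textagon.AFRN` and class name `AFRN` are assumptions inferred from the rename note above, and `first_importable` is a hypothetical helper, not part of Textagon.

```python
import importlib


def first_importable(candidates):
    """Return the first attribute importable from (module, attribute) pairs.

    Returns None when no candidate can be imported.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this environment; try the next one
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


# Current name first, then the assumed pre-rename name.
Ranker = first_importable([
    ("textagon.tGBS", "tGBS"),
    ("textagon.AFRN", "AFRN"),  # assumption: old module/class name
])
if Ranker is None:
    print("Neither textagon.tGBS nor textagon.AFRN could be imported.")
```

If `Ranker` is not `None`, it can stand in for `tGBS` in the cells below, provided the constructor arguments are unchanged between releases, which this sketch does not verify.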


#### 1. Load dataset

In this step, we use the distress dataset that is included in the examples folder of the repository. The raw file is tab-separated and has no header row, so the column names are assigned when it is read.

Textagon requires the text column of the dataframe to be named "corpus" and the label column to be named "classLabels".
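
The naming requirement can be checked before constructing `Textagon`. The sketch below is illustrative only: `missing_textagon_columns` is a hypothetical helper, and the two column names are the only details taken from the text above.

```python
REQUIRED_COLUMNS = ("corpus", "classLabels")


def missing_textagon_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Works with any iterable of names, e.g. a pandas DataFrame's `df.columns`.
print(missing_textagon_columns(["text", "classLabels"]))  # ['corpus']
```

If the returned list is not empty, rename the columns first, for example with pandas `df.rename(columns={"text": "corpus"})`, where "text" stands for whatever the text column is currently called.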

In [2]:
import pandas as pd
from textagon.textagon import Textagon
from textagon.tGBS import tGBS

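# The raw file is tab-separated with no header row, so name the columns
# as Textagon expects: label first, then text.
df = pd.read_csv(
    "sample_data/distress_raw.txt",
    sep="\t",
    header=None,
    names=["classLabels", "corpus"],
)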


[nltk_data] Downloading package stopwords to /home/lalor/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/lalor/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/lalor/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /home/lalor/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     /home/lalor/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/lalor/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] 

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 76.3 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


ModuleNotFoundError: No module named 'textagon.tGBS'

#### 2. Run parallel representation



In [None]:

tgon = Textagon(
    inputFile=df, 
    outputFileName="distress"
)

tgon.RunFeatureConstruction()
tgon.RunPostFeatureConstruction()

NameError: name 'Textagon' is not defined

#### 3. Run tGBS

In this step we apply tGBS to score and rank the representations. 


Before running tGBS, we need to unzip the archive that stores the generated representations.
In this case, it is named "distress_representations.zip" and is located in the output folder.

In [None]:
import zipfile
import os

# Specify the path to the zip file
zip_file_path = './output/distress_representations.zip'

# Specify the directory to extract files to
extract_to_directory = './output/distress_representations'

# Ensure the directory exists
os.makedirs(extract_to_directory, exist_ok=True)

# Open the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    # Extract all the contents
    zip_ref.extractall(extract_to_directory)

print(f"Files extracted to {extract_to_directory}")


Files extracted to ./output/distress_representations


In [None]:

featuresFile = './output/distress_key.txt'
trainFile = './output/distress.csv'
weightFile = './output/distress_weights.txt'


ranker = tGBS(
    featuresFile=featuresFile,
    trainFile=trainFile,
    weightFile=weightFile
)

ranker.RankRepresentations()

Loading features
0 NA
1 BINARY
2 CHARBINARY
Total categories found =  3
Total features found =  163813
Total lexicons =  0
Loading training data
Classes= 2 1 0 Num Instances =  1860
Number of features in Features file and Train file are different!!! 163812 163813
Loading sentiment scores 4763
Loading lexicons...
NumLex =  0 NumLexItems =  0
Assigning training weights
Adding semantic weights
0...
10000...
20000...
30000...
40000...
50000...
60000...
70000...
80000...
90000...
100000...
110000...
120000...
130000...
140000...
150000...
160000...

Running within-category subsumption relations
Subsuming category  1  of  3 NA
Subsuming category  2  of  3 BINARY
Subsuming category  3  of  3 CHARBINARY
Running cross-category subsumption relations
Running parallel relations


The Textagon representations and weights are now stored in the *output* folder, where they can be used for downstream tasks. 
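
Before moving on to a downstream task, it can help to confirm that the files referenced in the cells above are actually present. This is a minimal sketch: the file names are the ones used earlier in this notebook, while `report_outputs` is a hypothetical helper and not part of Textagon.

```python
from pathlib import Path

# File names taken from the cells above; adjust if outputFileName differs.
EXPECTED_OUTPUTS = (
    "distress.csv",
    "distress_key.txt",
    "distress_weights.txt",
)


def report_outputs(output_dir, names=EXPECTED_OUTPUTS):
    """Map each expected file name to True if it exists in output_dir."""
    root = Path(output_dir)
    return {name: (root / name).is_file() for name in names}


for name, found in report_outputs("./output").items():
    print(f"{name}: {'found' if found else 'missing'}")
```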

Two such examples are shown in the other notebooks in the examples folder:

- 2-calculate_informativeness.ipynb
- 3-classification_with_textagon.ipynb