In [13]:
import os

import nltk
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score, ParameterGrid
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC
from tqdm.auto import tqdm

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

# Task 1: Sentence-level Aspect Based Sentiment Analysis

The core idea for this part is to predict product categories and polarities from given text inputs. One SVM classifier predicts categories, while the other classifier predicts polarities based on the one-hot encoding of categories combined with text feature vectors.

## Category Prediction

In the category prediction phase, instead of using the default assignment of SVM, the probabilities for each category are predicted, and a threshold is manually set to enhance performance and avoid empty outputs. By analyzing the probability distribution of the categories, greater control over the predictions is achieved, making the system more adaptable to different text inputs.

## Polarity Prediction

For the polarity prediction, the categories are split and the text and category vectors are concatenated. This approach strengthens the model's recognition of different categories within a single text, making polarity prediction more specific and relevant to the analyzed aspect.

## Neutral Prediction Strategy

Neutral prediction can be challenging, but it was observed that when the probabilities of positive and negative are close, there is a high likelihood that the polarity is neutral. This observation makes intuitive sense since, when it's difficult to determine whether the sentiment is positive or negative, it often indicates a neutral stance. Based on this insight, a formula was designed to calculate the deviation between different polarities:

```
max(abs(negative - positive)/max(negative, positive), abs(negative - neutral)/max(negative, neutral), abs(neutral - positive)/max(neutral, positive))
```

By comparing the deviation with a threshold, if the deviation is below the threshold, the result is considered as neutral.

## Reason for Choosing SVM

SVM was chosen over simple probabilistic classifiers like Naive Bayes due to its powerful ability to handle high-dimensional data and complex decision boundaries. SVM's margin maximization principle enhances the classifier's generalization, making it a suitable option for text analysis tasks.

## Highlights of the Design

Some key highlights of the system's design include:

- One-hot encoding of categories to enhance feature representation
- Grid search for hyperparameter tuning to optimize model performance
- Threshold management to control category and polarity predictions 

## Design's Rationality

The design is rational as it leverages SVM's core benefits, implementing techniques like one-hot encoding and grid search for performance enhancement. Combining text and category vectors and using probability thresholds provides higher recognition for polarity prediction. Additionally, the neutral prediction strategy showcases the system's logical foundation and adaptability.

In [15]:
# Load the dataset from the XML file
def load_dataset(filename):
    # Open the file and read its contents
    with open(filename, "r", encoding="utf-8") as f:
        contents = f.read()

    # Parse the XML with BeautifulSoup
    bsoup = BeautifulSoup(contents, "lxml")
    sentences = bsoup.find_all("sentence")
    data = []

    # Iterate through the sentences in the XML
    for sent in sentences:
        out_of_scope = sent.get("outofscope", "FALSE")
        id = sent["id"]
        text = sent.text.strip()
        opinions = sent.findAll("opinion")
        labels = []
        
        # Skip the sentence if it is out-of-scope or has no opinions
        if out_of_scope == "TRUE" or len(opinions) == 0:
            continue
        
        # Extract category and polarity from the opinions
        for opi in opinions:
            labels.append(opi["category"] + "#" + opi["polarity"])
        
        # Append the data to the list
        data.append((id, text, labels))
    
    return data

# Define the train and test data files
train_data_file = "Laptops_Train_p1.xml"
test_data_file = "Laptops_Test_p1_gold.xml"

# Load the train and test datasets
train_data = load_dataset(train_data_file)
test_data = load_dataset(test_data_file)

# Define stopwords and stemmer
stopwords = nltk.corpus.stopwords.words("english")
stemmer = nltk.stem.PorterStemmer()



#### Data Processing
This part focus on loading the dataset and preparing the data for training and testing. The preprocess_text function preprocesses the text, in the following aspects: removing stopwords, lower the alphabet, and applying stemming to create a simpler representation of the source text.

In [16]:
# Preprocess the text
def preprocess_text(text):
    tokens = nltk.word_tokenize(text)
    tokens = [
        tok.lower() for tok in tokens if tok.isalpha() and tok.lower() not in stopwords
    ]
    tokens = [stemmer.stem(tok) for tok in tokens]
    return " ".join(tokens)

# Preprocess the train and test texts
train_texts = [preprocess_text(text) for id, text, label in train_data]
test_texts = [preprocess_text(text) for id, text, label in test_data]

# Generate a list of all categories
all_categories = list(
    set(
        [
            cat.split("#")[0] + "#" + cat.split("#")[1]
            for _, _, labels in train_data
            for cat in labels
        ]
    )
)

'''
Method 2: Generate all entity and attribute conbinations

entity_labels = ['DISPLAY', 'KEYBOARD', 'MOUSE', 'MOTHERBOARD', 'CPU', 'FANS_COOLING', 'PORTS', 'MEMORY', 'POWER_SUPPLY', 'OPTICAL_DRIVES', 'BATTERY', 'GRAPHICS', 'HARD_DISK', 'MULTIMEDIA_DEVICES', 'HARDWARE', 'SOFTWARE', 'OS', 'WARRANTY', 'SHIPPING', 'SUPPORT', 'COMPANY']
attribute_labels = ['GENERAL', 'PRICE', 'QUALITY', 'DESIGN_FEATURES', 'OPERATION_PERFORMANCE', 'USABILITY', 'PORTABILITY', 'CONNECTIVITY', 'MISCELLANEOUS']
all_categories = [e + '#' + a for e in entity_labels for a in attribute_labels]
'''

"\nMethod 2: Generate all entity and attribute conbinations\n\nentity_labels = ['LAPTOP', 'DISPLAY', 'KEYBOARD', 'MOUSE', 'MOTHERBOARD', 'CPU', 'FANS_COOLING', 'PORTS', 'MEMORY', 'POWER_SUPPLY', 'OPTICAL_DRIVES', 'BATTERY', 'GRAPHICS', 'HARD_DISK', 'MULTIMEDIA_DEVICES', 'HARDWARE', 'SOFTWARE', 'OS', 'WARRANTY', 'SHIPPING', 'SUPPORT', 'COMPANY']\nattribute_labels = ['GENERAL', 'PRICE', 'QUALITY', 'DESIGN_FEATURES', 'OPERATION_PERFORMANCE', 'USABILITY', 'PORTABILITY', 'CONNECTIVITY', 'MISCELLANEOUS']\nall_categories = [e + '#' + a for e in entity_labels for a in attribute_labels]\n"

#### Feature Engineering
This part mainly focus on generating feature vectors for texts using TfidfVectorizer to account for term frequency and inverse document frequency. It also creates extended vectors by concatenating category one-hot encoding with the text vector representation.

In [17]:
# Initialize the TfidfVectorizer
vectorizer = TfidfVectorizer(ngram_range=(1, 2))

# Transform the train and test texts into feature vectors
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Create label mappings for categories
label_mapping = {label: index for index, label in enumerate(all_categories)}
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

# Generate extended feature vectors by concatenating category one-hot encoding and text vector
def generate_extended_vectors(data, categories, label_mapping):
    X_extend = []
    y = []

    for id, text, labels in data:
        for label in labels:
            entity, attribute, polarity = label.split("#")
            category = entity + "#" + attribute

            # Generate one-hot encoding for the category
            category_onehot = np.zeros(len(categories))
            if category in label_mapping:
                category_onehot[label_mapping[category]] = 1

            # Transform the text into a vector
            text_vector = vectorizer.transform([text]).toarray()[0]

            # Concatenate the category one-hot encoding and text vector
            extended_vector = np.hstack((category_onehot, text_vector))
            X_extend.append(extended_vector)
            y.append(polarity)

    return np.array(X_extend), np.array(y)

# Generate extended feature vectors for train and test datasets
X_train_extend, y_train_extend = generate_extended_vectors(
    train_data, all_categories, label_mapping
)
X_test_extend, y_test_extend = generate_extended_vectors(
    test_data, all_categories, label_mapping
)

# Generate target labels for train and test datasets
y_train = []
y_test = []

for _, _, labels in train_data:
    label_mask = [0] * len(label_mapping)
    for cat in labels:
        if cat.split("#")[0] + "#" + cat.split("#")[1] in all_categories:
            label_mask[label_mapping[cat.split("#")[0] + "#" + cat.split("#")[1]]] = 1
        else:
            print("Some thing went wrong, check all_categories.")
    y_train.append(label_mask)

for _, _, labels in test_data:
    label_mask = [0] * len(label_mapping)
    for cat in labels:
        if cat.split("#")[0] + "#" + cat.split("#")[1] in all_categories:
            label_mask[label_mapping[cat.split("#")[0] + "#" + cat.split("#")[1]]] = 1
        else:
            print(
                str(cat.split("#")[0] + "#" + cat.split("#")[1])
                + " will not be predicted."
            )
    y_test.append(label_mask)

# Convert the target labels to NumPy arrays
y_train = np.array(y_train)
y_test = np.array(y_test)

POWER_SUPPLY#GENERAL will not be predicted.
OPTICAL_DRIVES#DESIGN_FEATURES will not be predicted.
OPTICAL_DRIVES#DESIGN_FEATURES will not be predicted.
HARD_DISC#GENERAL will not be predicted.
OPTICAL_DRIVES#GENERAL will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.
BATTERY#DESIGN_FEATURES will not be predicted.
CPU#GENERAL will not be predicted.
HARD_DISC#GENERAL will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.


#### Model Training and Hyperparameter Search
Using grid search and cross-validation, the best set of parameters for two One-vs-Rest SVM classifiers are identified - one classifier for category prediction and another for polarity prediction. The best classifiers are then trained on the entire training dataset.

In [18]:
# Category Prediction

# Define the grid of parameters for the category prediction model
param_grid_category = {
    "C": [0.1, 1, 10],
    "kernel": ["rbf"],
    "gamma": [0.1, 1, "scale"],
    "probability": [True],
}

# Initialize variables to store the best category model, its score, and parameters
best_score_category = -1
best_params_category = None
best_classifier_category = None

# Iterate through parameter combinations to find the best model
param_combinations_category = tqdm(ParameterGrid(param_grid_category), desc="Parameter combinations", unit="comb")

for params in param_combinations_category:
    # Train a One-vs-Rest classifier using a Support Vector Machine with the current parameters
    clf_category = OneVsRestClassifier(SVC(**params))
    
    # Use a 2-fold cross-validation to select the best model based on its score
    cv_score_category = GridSearchCV(clf_category, [{}], refit=True, cv=2, verbose=0, return_train_score=False, n_jobs=-1).fit(X_train, y_train).best_score_
    
    # If the current CV score is better than the previous best score, update the best model
    if cv_score_category > best_score_category:
        best_score_category = cv_score_category
        best_params_category = params
        best_classifier_category = clf_category

# Print the best parameters found
print("Best parameters: ", best_params_category)
# Best parameters:  {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}

# Fit the best classifier to the training data
best_classifier_category.fit(X_train, y_train)


Parameter combinations:   0%|          | 0/9 [00:00<?, ?comb/s]



Best parameters:  {'C': 10, 'gamma': 0.1, 'kernel': 'rbf', 'probability': True}


In [19]:
# Polarity Prediction

'''
It takes a long time to train this model, so the best parameters are provided without searching every time.

# Define the grid of parameters for the polarity prediction model
param_grid_polarity = {
    "C": [0.1, 1, 10],
    "kernel": ["rbf"],
    "gamma": [0.1, 1, "scale"],
    "probability": [True],
}

# Initialize variables to store the best polarity model, its score, and parameters
best_score = -1
best_params = None
best_classifier_polarity = None

# Iterate through parameter combinations to find the best model
param_combinations = tqdm(ParameterGrid(param_grid_polarity), desc="Parameter combinations", unit="comb")

for params in param_combinations:
    # Train a One-vs-Rest classifier using a Support Vector Machine with the current parameters
    clf = OneVsRestClassifier(SVC(**params))
    
    # Use a 2-fold cross-validation to select the best model based on its score
    cv_score = GridSearchCV(clf, [{}], refit=True, cv=2, verbose=0, return_train_score=False, n_jobs=-1).fit(X_train_extend, y_train_extend).best_score_
    
    # If the current CV score is better than the previous best score, update the best model
    if cv_score > best_score:
        best_score = cv_score
        best_params = params
        best_classifier_polarity = clf

# Print the best parameters found
print("Best parameters: ", best_params)
# Best parameters:  {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'}
'''

# Set the best parameters for the polarity prediction model
best_params = {'C': 1, 'gamma': 'scale', 'kernel': 'rbf', 'probability': True}
best_classifier_polarity = OneVsRestClassifier(SVC(**best_params))

# Fit the best classifier for polarity prediction to the training data
best_classifier_polarity.fit(X_train_extend, y_train_extend)



#### Category Prediction 
This part implemented a function that predicts categories for a given text input using the best category classifier. It filters the predictions based on a probability threshold to select the most relevant categories.

In [20]:
# Predict the category given a text input
def predict_category(
    text,
    category_classifier,
    vectorizer,
    label_mapping,
    reverse_label_mapping,
    threshold=0.33,
):
    # Preprocess the input text
    preprocessed_text = preprocess_text(text)
    
    # Transform the preprocessed text into a feature vector
    x = vectorizer.transform([preprocessed_text])
    
    # Predict the probabilities of each category using the category classifier
    category_probabilities = category_classifier.predict_proba(x)[0]

    # Create a list of categories whose probability is above the threshold
    categories = []
    for idx, prob in enumerate(category_probabilities):
        if prob >= threshold:
            categories.append(reverse_label_mapping[idx])

    return categories


# Example
text = "Waiting to install MS Office and see how it goes from there."
predicted_category = predict_category(
    text,
    best_classifier_category,
    vectorizer,
    label_mapping,
    reverse_label_mapping,
)
print(predicted_category)

['SUPPORT#QUALITY']


#### Polarity Prediction
This part implemented a function that predicts polarities for a given text input and category using the best polarity classifier. It returns polarity probabilities in a dictionary format.

In [21]:
# Predict the polarity given a text input and category
def predict_polarity(text, category, polarity_classifier, vectorizer, label_mapping):
    # Preprocess the input text
    preprocessed_text = preprocess_text(text)
    
    # Create an empty one-hot encoding for the provided category
    category_vector = np.zeros(len(all_categories))
    if category in label_mapping:
        category_vector[label_mapping[category]] = 1
    
    # Transform the preprocessed text into a feature vector
    text_vector = vectorizer.transform([preprocessed_text]).toarray()[0]
    
    # Concatenate the category one-hot encoding and text vector
    extended_vector = np.hstack((category_vector, text_vector))
    
    # Predict the probabilities of each polarity using the polarity classifier
    polarity_probabilities = polarity_classifier.predict_proba([extended_vector])[0]
    
    # Return the predictions in a dictionary format
    return {
        "positive": polarity_probabilities[2],
        "negative": polarity_probabilities[0],
        "neutral": polarity_probabilities[1],
    }

# Example
text = "Medium settings on titles like Tomb Raider for it to be acceptable (60 fps is the minimum acceptable fps #pcmasterrace)."
category = "DISPLAY#OPERATION_PERFORMANCE"
polarity_probs = predict_polarity(
    text, category, best_classifier_polarity, vectorizer, label_mapping
)
print(polarity_probs)

{'positive': 0.19389414272155903, 'negative': 0.7581240411717293, 'neutral': 0.047981816106711775}


#### General Prediction 
This part implemented a function to predict categories and polarities simultaneously for a given text input. It calls predict_category and predict_polarity functions for each category and stores the highest probability polarity to the result along with the category.

In [22]:
# Performing general predictions on the given text input
def predict(text, threshold=0.50):
    # Call the 'predict_category' function to predict the categories of the text
    categories = predict_category(
        text,
        best_classifier_category,
        vectorizer,
        label_mapping,
        reverse_label_mapping,
    )
    print(categories)
    
    # Initialize an empty list to store the results of category and polarity predictions
    result = []
    
    # Iterate over the predicted categories
    for category in categories:
        # Call the 'predict_polarity' function to predict the polarity for each category
        polarity_probs = predict_polarity(
            text, category, best_classifier_polarity, vectorizer, label_mapping
        )
        
        # Find the polarity with the highest probability
        max_polarity = max(polarity_probs, key=polarity_probs.get)
        
        # Check if the polarity probabilities are below the threshold
        if polarity_probs["positive"] < threshold and polarity_probs["negative"] < threshold:
            result.append([category, "neutral"])
        else:
            result.append([category, max_polarity])

    # Return the final result as a list of categories and their predicted polarities
    return result

# Example
text = "Waiting to install MS Office and see how it goes from there."
output = predict(text)
print(output)

['SUPPORT#QUALITY']
[['SUPPORT#QUALITY', 'negative']]


#### XML Capture
This part implemented a function to predict and complete the test set with missing categories and polarities by iterating through sentences and making predictions to fill in opinions when necessary.

In [23]:
# Predict and complete the XML file with categories and polarities
def predict_and_complete_xml(input_file, output_file):
    # Open the input XML file and read its contents
    with open(input_file, "r", encoding="utf-8") as f:
        contents = f.read()

    # Parse the XML with BeautifulSoup
    bsoup = BeautifulSoup(contents, "lxml", from_encoding='utf-8')
    sentences = bsoup.find_all("sentence")

    # Iterate through the sentences in the XML file
    for sent in sentences:
        out_of_scope = sent.get("outofscope", "FALSE")
        opinions = sent.find("opinions")

        # If the sentence is not out of scope and has no opinions, predict categories and polarities
        if out_of_scope != "TRUE" and opinions is not None and not opinions.find_all('opinion'):
            text = sent.text.strip()
            predictions = predict(text)

            # If no predictions were made, add an UNKNOWN category and polarity
            if not predictions:
                new_opinion = bsoup.new_tag(
                    "opinion", category="UNKNOWN#UNKNOWN", polarity="UNKNOWN"
                )
                opinions.insert(0, new_opinion)
            else:
                print(predictions)
                
                # For each prediction, create a new opinion tag with the predicted category and polarity
                for pred in predictions:
                    category, polarity = pred
                    new_opinion = bsoup.new_tag(
                        "opinion", category=category, polarity=polarity
                    )
                    opinions.insert(0, new_opinion)

    # Generate the output XML string
    reviews = bsoup.find("reviews")
    xml_output = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n' + str(reviews)

    # Write the output XML string to a new file
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(xml_output)

# Define input and output XML file paths
input_file = 'Laptops_Test_p1_gold.xml'
output_file = 'output.xml'

# Call the predict_and_complete_xml function to generate and save the output XML file
predict_and_complete_xml(input_file, output_file)
print('XML file has been generated and saved as output.xml.')



['SUPPORT#QUALITY']
[['SUPPORT#QUALITY', 'negative']]
['LAPTOP#QUALITY', 'COMPANY#GENERAL']
[['LAPTOP#QUALITY', 'negative'], ['COMPANY#GENERAL', 'negative']]
[]
['LAPTOP#DESIGN_FEATURES', 'LAPTOP#CONNECTIVITY', 'COMPANY#GENERAL']
[['LAPTOP#DESIGN_FEATURES', 'neutral'], ['LAPTOP#CONNECTIVITY', 'negative'], ['COMPANY#GENERAL', 'negative']]
[]
['LAPTOP#CONNECTIVITY', 'LAPTOP#MISCELLANEOUS']
[['LAPTOP#CONNECTIVITY', 'positive'], ['LAPTOP#MISCELLANEOUS', 'positive']]
['LAPTOP#DESIGN_FEATURES']
[['LAPTOP#DESIGN_FEATURES', 'positive']]
['SUPPORT#QUALITY']
[['SUPPORT#QUALITY', 'negative']]
[]
['LAPTOP#GENERAL', 'LAPTOP#MISCELLANEOUS']
[['LAPTOP#GENERAL', 'positive'], ['LAPTOP#MISCELLANEOUS', 'positive']]
['MOUSE#OPERATION_PERFORMANCE', 'SHIPPING#QUALITY']
[['MOUSE#OPERATION_PERFORMANCE', 'negative'], ['SHIPPING#QUALITY', 'negative']]
['LAPTOP#OPERATION_PERFORMANCE']
[['LAPTOP#OPERATION_PERFORMANCE', 'neutral']]
['SOFTWARE#MISCELLANEOUS', 'LAPTOP#MISCELLANEOUS']
[['SOFTWARE#MISCELLANEOUS', 'pos

#### Model Evaluation
Evaluating the performance of the category and polarity prediction models using a variety of metrics like accuracy, weighted precision, recall, and F1-score.

In [24]:
# Evaluate the category prediction model
def evaluate_model_category(y_true, y_probas, model_name, target_names=None, threshold=0.33):
    # Create binary predictions using the threshold
    y_pred = (y_probas >= threshold).astype(int)
    
    # Calculate evaluation metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average="weighted", zero_division=1)
    recall = recall_score(y_true, y_pred, average="weighted", zero_division=1)
    f1 = f1_score(y_true, y_pred, average="weighted", zero_division=1)

    report = f"""{model_name} Evaluation Report:
        Accuracy : {accuracy:.4f}
        Precision: {precision:.4f}
        Recall   : {recall:.4f}
        F1-Score : {f1:.4f}
    """
    print(report)

    # Print the classification report
    if target_names is not None:
        print(classification_report(y_true, y_pred, target_names=target_names, zero_division=0))

# Category prediction evaluation
y_test_category_probas = best_classifier_category.predict_proba(X_test)
cat_target_names = [reverse_label_mapping[i] for i in range(len(reverse_label_mapping))]
evaluate_model_category(y_test, y_test_category_probas, model_name="Category Prediction", target_names=cat_target_names)

Category Prediction Evaluation Report:
        Accuracy : 0.2949
        Precision: 0.5318
        Recall   : 0.5725
        F1-Score : 0.5147
    
                                          precision    recall  f1-score   support

                  LAPTOP#DESIGN_FEATURES       0.55      0.62      0.58        73
                          SOFTWARE#PRICE       0.00      0.00      0.00         1
                      SOFTWARE#USABILITY       0.25      0.14      0.18         7
                        KEYBOARD#GENERAL       0.17      0.50      0.25         2
            MULTIMEDIA_DEVICES#USABILITY       0.00      0.00      0.00         1
                          LAPTOP#GENERAL       0.58      0.77      0.66       158
              POWER_SUPPLY#MISCELLANEOUS       0.00      0.00      0.00         0
                           SUPPORT#PRICE       0.00      0.00      0.00         2
            LAPTOP#OPERATION_PERFORMANCE       0.63      0.74      0.68        70
                          LAPTO

In [25]:
reverse_label_mapping_polarity = {0: 'negative', 1: 'neutral', 2: 'positive'}
pol_target_names = ['negative', 'neutral', 'positive']

# Evaluate the polarity prediction model
def evaluate_model_polarity(y_true, y_pred, model_name, target_names=None):
    # Calculate evaluation metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average="weighted", zero_division=1)
    recall = recall_score(y_true, y_pred, average="weighted", zero_division=1)
    f1 = f1_score(y_true, y_pred, average="weighted", zero_division=1)

    # Print the evaluation report
    report = f"""{model_name} Evaluation Report:
        Accuracy : {accuracy:.4f}
        Precision: {precision:.4f}
        Recall   : {recall:.4f}
        F1-Score : {f1:.4f}
    """
    print(report)

    # If target names are provided, print the classification report
    if target_names is not None:
        print(classification_report(y_true, y_pred, target_names=target_names, zero_division=0))

# Define a function to predict polarities with a threshold
def predict_polarity_with_threshold(X, classifier, threshold_ratio=0.1):
    # Get polarity probabilities by using the classifier
    probabilities = classifier.predict_proba(X)
    
    results = []
    
    # Iterate over the probabilities and apply the threshold ratio
    for prob in probabilities:
        negative, neutral, positive = prob
        max_diff_percentage = max(abs(negative - positive)/max(negative, positive), abs(negative - neutral)/max(negative, neutral), abs(neutral - positive)/max(neutral, positive))
        if max_diff_percentage <= threshold_ratio:
            results.append("neutral")
        else:
            max_idx = np.argmax(prob)
            results.append(reverse_label_mapping_polarity[max_idx])
    
    return results

# Create the y_test_polarity based on the test dataset
y_test_polarity = []
for id, text, labels in test_data:
    for label in labels:
        y_test_polarity.append(label.split("#")[2])

# Evaluate the polarity prediction model
y_test_polarity_pred_with_threshold = predict_polarity_with_threshold(X_test_extend, best_classifier_polarity)
evaluate_model_polarity(y_test_polarity, y_test_polarity_pred_with_threshold, model_name="Polarity Prediction", target_names=pol_target_names)


Polarity Prediction Evaluation Report:
        Accuracy : 0.6991
        Precision: 0.6780
        Recall   : 0.6991
        F1-Score : 0.6862
    
              precision    recall  f1-score   support

    negative       0.61      0.62      0.62       274
     neutral       0.20      0.07      0.10        46
    positive       0.76      0.80      0.78       481

    accuracy                           0.70       801
   macro avg       0.52      0.50      0.50       801
weighted avg       0.68      0.70      0.69       801



# Task 1 Model Performance Evaluation

## Category Prediction Model Evaluation

```
Accuracy: 0.2949
Precision: 0.5318
Recall: 0.5725
F1-Score: 0.5147
```

### Possible Reasons for Category Prediction Model Performance:

1. **Imbalanced Data**: In the dataset, some categories occur much more frequently than others. This imbalance can affect the model's performance, as it may overfit to the more frequent categories and struggle to generalize well to less frequent ones. The model will be unable to generalize to categories it has never encountered.

2. **Limited Hyperparameter Search**: Grid search for hyperparameters was conducted over a small, finite parameter space. Exploring a wider set of parameters might help achieve better results under different settings.

3. **Strong Coupling of Entity and Attribute**: The combination of entity and attribute used for training significantly increases the possibilities for model predictions. This may result in reduced prediction effectiveness.

4. **Feature Extraction**: Features were extracted using TfidfVectorizer with an n-gram range of (1, 2). Different feature extraction methods or adjustments to the n-gram range could potentially lead to improved feature representation and model performance.

## Sentiment Polarity Prediction Model Evaluation

```
Accuracy: 0.6991
Precision: 0.6780
Recall: 0.6991
F1-Score: 0.6862
```

### Possible Reasons for Sentiment Polarity Prediction Model Performance:

1. **Parameter Selection**: Due to time constraints, the best parameters for the polarity prediction model could not be obtained through grid search. Therefore, the model may not be trained with the best hyperparameters, possibly affecting its performance.

2. **Neutral Polarity Threshold Ratio**: The function `predict_polarity_with_threshold` applies a threshold ratio to determine neutral polarity. Different threshold ratios may produce varying results; exploring the optimal threshold could be beneficial.

3. **Feature Extraction**: Similar to the category prediction model, the sentiment polarity prediction model also uses TfidfVectorizer with an n-gram range of (1, 2). Different feature extraction methods or adjustments to the n-gram range might lead to better model performance.

## Conclusion

The proposed sentence level ABSA model achieves moderately acceptable results. Further improvements may be attained through extensive grid search for hyperparameters, exploration of the best threshold values, and experimentation.

---

# Task 2: Text-level Aspect-Based Sentiment Analysis
For Task 2, a similar approach to Task 1 is employed, with a few notable differences. The goal here is to analyze sentiment per aspect in the texts, while also identifying conflicts where both positive and negative sentiment coexist for a particular category.

### Dataset Processing
The dataset is processed in a way that all sentences are combined into a single unit for analysis. Tokens are used to better recognize sentence boundaries during training. A [CLS] token represents the start, and a [SEP] token separates sentences, which is inspired by the BERT model implementation.

### Conflict Handling
Since the dataset contains a limited number of examples, training a model to learn the relationship between conflict and text directly may not yield effective features. To address this limitation, a novel approach is proposed, involving training two separate models - an optimistic model and a pessimistic model.

The conflict fundamentally represents both positive and negative perspectives of a particular category in the description. Therefore, an optimistic model is trained to focus more on the positive aspects, whereas a pessimistic model is trained to emphasize the negative aspects. The difference between the positive and negative sentiment probabilities predicted by the two models is analyzed. If the difference exceeds a threshold, it can be inferred that the models' focus is diverse, indicating the presence of conflict.

### SVM Model with Optimistic and Pessimistic Polarity

For Task 2, the SVM model from Task 1 is utilized, but with a significant modification: polarity prediction is split into optimistic and pessimistic polarities. Two separate models are trained to predict the text's polarities, where one focuses on optimistic predictions and the other on pessimistic predictions.

When predicting text, both models analyze the input simultaneously. If they exhibit conflict features (i.e., the difference between positive or negative predictions of the two models exceeds the threshold), the text and corresponding category are defined as in conflict. Otherwise, the models' predictions are averaged, and the process from Task 1 for judgment is repeated.

In [26]:
def load_dataset(filename):
    with open(filename, "r", encoding="utf-8") as f:
        contents = f.read()

    bsoup = BeautifulSoup(contents, "lxml")
    reviews = bsoup.find_all("review")
    data = []

    for review in reviews:
        sentences = review.find_all("sentence")
        text = "[CLS]"
        for sent in sentences:
            text += sent.text.strip() + "[SEP]"
        text = text.strip()
        opinions = review.find_all("opinion")
        labels = []
        for opi in opinions:
            labels.append(opi["category"] + "#" + opi["polarity"])
        id = review["rid"]
        data.append((id, text, labels))
    return data

train_data_file = "Laptops_Train_p2.xml"
test_data_file = "Laptops_Test_p2_gold.xml"

train_data = load_dataset(train_data_file)
test_data = load_dataset(test_data_file)

stopwords = nltk.corpus.stopwords.words("english")
stemmer = nltk.stem.PorterStemmer()

def preprocess_text(text):
    tokens = nltk.word_tokenize(text)
    tokens = [
        tok.lower() for tok in tokens if tok.isalpha() and tok.lower() not in stopwords
    ]
    tokens = [stemmer.stem(tok) for tok in tokens]
    return " ".join(tokens)

train_texts = [preprocess_text(text) for id, text, label in train_data]
test_texts = [preprocess_text(text) for id, text, label in test_data]

all_categories = list(
    set(
        [
            cat.split("#")[0] + "#" + cat.split("#")[1]
            for _, _, labels in train_data
            for cat in labels
        ]
    )
)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))

X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

label_mapping = {label: index for index, label in enumerate(all_categories)}
reverse_label_mapping = {index: label for label, index in label_mapping.items()}

y_train = []
y_test = []

for _, _, labels in train_data:
    label_mask = [0] * len(label_mapping)
    for cat in labels:
        if cat.split("#")[0] + "#" + cat.split("#")[1] in all_categories:
            label_mask[label_mapping[cat.split("#")[0] + "#" + cat.split("#")[1]]] = 1
        else:
            print("Some thing went wrong, check all_categories.")
    y_train.append(label_mask)

for _, _, labels in test_data:
    label_mask = [0] * len(label_mapping)
    for cat in labels:
        if cat.split("#")[0] + "#" + cat.split("#")[1] in all_categories:
            label_mask[label_mapping[cat.split("#")[0] + "#" + cat.split("#")[1]]] = 1
        else:
            print(
                str(cat.split("#")[0] + "#" + cat.split("#")[1])
                + " will not be predicted."
            )
    y_test.append(label_mask)

y_train = np.array(y_train)
y_test = np.array(y_test)



POWER_SUPPLY#GENERAL will not be predicted.
OPTICAL_DRIVES#DESIGN_FEATURES will not be predicted.
HARD_DISC#GENERAL will not be predicted.
OPTICAL_DRIVES#GENERAL will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.
BATTERY#DESIGN_FEATURES will not be predicted.
CPU#GENERAL will not be predicted.
HARD_DISC#GENERAL will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.
HARD_DISC#OPERATION_PERFORMANCE will not be predicted.


In [27]:
param_grid_category = {
    "C": [10],
    "kernel": ["rbf"],
    "gamma": ["scale"],
    "probability": [True],
}
best_score_category = -1
best_params_category = None
best_classifier_category = None
param_combinations_category = tqdm(ParameterGrid(param_grid_category), desc="Parameter combinations", unit="comb")
for params in param_combinations_category:
    clf_category = OneVsRestClassifier(SVC(**params))
    cv_score_category = GridSearchCV(clf_category, [{}], refit=True, cv=2, verbose=0, return_train_score=False, n_jobs=-1).fit(X_train, y_train).best_score_
    
    if cv_score_category > best_score_category:
        best_score_category = cv_score_category
        best_params_category = params
        best_classifier_category = clf_category
print("Best parameters: ", best_params_category)
# Best parameters:  {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
best_classifier_category.fit(X_train, y_train)

Parameter combinations:   0%|          | 0/1 [00:00<?, ?comb/s]



Best parameters:  {'C': 10, 'gamma': 'scale', 'kernel': 'rbf', 'probability': True}




In [28]:
def predict_category(
    text,
    category_classifier,
    vectorizer,
    label_mapping,
    reverse_label_mapping,
    threshold=0.33,
):
    preprocessed_text = preprocess_text(text)
    x = vectorizer.transform([preprocessed_text])
    category_probabilities = category_classifier.predict_proba(x)[0]

    categories = []
    for idx, prob in enumerate(category_probabilities):
        if prob >= threshold:
            categories.append(reverse_label_mapping[idx])

    return categories

text = "I love the size, keyboard, the functions. I don't really have a complaint. It is easy to use, good quality and good price. Perfect trifecta! I would recommend this product."
predicted_category = predict_category(
    text,
    best_classifier_category,
    vectorizer,
    label_mapping,
    reverse_label_mapping,
)
print(predicted_category)

['LAPTOP#DESIGN_FEATURES', 'KEYBOARD#GENERAL', 'LAPTOP#GENERAL', 'LAPTOP#OPERATION_PERFORMANCE', 'LAPTOP#QUALITY', 'LAPTOP#PRICE', 'LAPTOP#USABILITY']


In [29]:
# For test set
def generate_extended_vectors(data, categories, label_mapping):
    X_extend = []
    y = []

    for id, text, labels in data:
        for label in labels:
            entity, attribute, polarity = label.split("#")
            category = entity + "#" + attribute
            category_onehot = np.zeros(len(categories))
            if category in label_mapping:
                category_onehot[label_mapping[category]] = 1
            text_vector = vectorizer.transform([text]).toarray()[0]
            extended_vector = np.hstack((category_onehot, text_vector))
            X_extend.append(extended_vector)
            y.append(polarity)

    return np.array(X_extend), np.array(y)

# Positive model
def generate_extended_vectors_positive(data, categories, label_mapping):
    X_extend = []
    y = []

    for id, text, labels in data:
        for label in labels:
            entity, attribute, polarity = label.split("#")
            if(polarity == "conflict"):
                polarity = "positive"
            category = entity + "#" + attribute
            category_onehot = np.zeros(len(categories))
            if category in label_mapping:
                category_onehot[label_mapping[category]] = 1
            text_vector = vectorizer.transform([text]).toarray()[0]
            extended_vector = np.hstack((category_onehot, text_vector))            
            X_extend.append(extended_vector)
            y.append(polarity)
                
    return np.array(X_extend), np.array(y)

# Negative model
def generate_extended_vectors_negative(data, categories, label_mapping):
    X_extend = []
    y = []

    for id, text, labels in data:
        for label in labels:
            entity, attribute, polarity = label.split("#")
            if(polarity == "conflict"):
                polarity = "negative"
            category = entity + "#" + attribute
            category_onehot = np.zeros(len(categories))
            if category in label_mapping:
                category_onehot[label_mapping[category]] = 1
            text_vector = vectorizer.transform([text]).toarray()[0]
            extended_vector = np.hstack((category_onehot, text_vector))
            X_extend.append(extended_vector)
            y.append(polarity)
                
    return np.array(X_extend), np.array(y)

X_train_extend_positive, y_train_extend_positive = generate_extended_vectors_positive(
    train_data, all_categories, label_mapping
)
X_train_extend_negative, y_train_extend_negative = generate_extended_vectors_negative(
    train_data, all_categories, label_mapping
)

X_test_extend, y_test_extend = generate_extended_vectors(
    test_data, all_categories, label_mapping
)

In [30]:
# Polarity Prediction

'''
param_grid_polarity = {
    "C": [1],
    "kernel": ["rbf"],
    "gamma": ["scale"],
    "probability": [True],
}

best_score = -1
best_params = None
best_classifier_polarity = None

param_combinations = tqdm(ParameterGrid(param_grid_polarity), desc="Parameter combinations", unit="comb")

for params in param_combinations:
    clf = OneVsRestClassifier(SVC(**params))
    cv_score = GridSearchCV(clf, [{}], refit=True, cv=2, verbose=0, return_train_score=False, n_jobs=-1).fit(X_train_extend, y_train_extend).best_score_
    
    if cv_score > best_score:
        best_score = cv_score
        best_params = params
        best_classifier_polarity = clf

print("Best parameters: ", best_params)
# Best parameters:  {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'}
'''

'\nparam_grid_polarity = {\n    "C": [1],\n    "kernel": ["rbf"],\n    "gamma": ["scale"],\n    "probability": [True],\n}\n\nbest_score = -1\nbest_params = None\nbest_classifier_polarity = None\n\nparam_combinations = tqdm(ParameterGrid(param_grid_polarity), desc="Parameter combinations", unit="comb")\n\nfor params in param_combinations:\n    clf = OneVsRestClassifier(SVC(**params))\n    cv_score = GridSearchCV(clf, [{}], refit=True, cv=2, verbose=0, return_train_score=False, n_jobs=-1).fit(X_train_extend, y_train_extend).best_score_\n    \n    if cv_score > best_score:\n        best_score = cv_score\n        best_params = params\n        best_classifier_polarity = clf\n\nprint("Best parameters: ", best_params)\n# Best parameters:  {\'C\': 1, \'gamma\': \'scale\', \'kernel\': \'rbf\'}\n'

In [31]:
# Positive SVC
best_params = {'C': 1, 'gamma': 'scale', 'kernel': 'rbf', 'probability': True}
best_classifier_polarity_positive = OneVsRestClassifier(SVC(**best_params))
best_classifier_polarity_positive.fit(X_train_extend_positive, y_train_extend_positive)



In [32]:
# Negative SVC
best_params = {'C': 1, 'gamma': 'scale', 'kernel': 'rbf', 'probability': True}
best_classifier_polarity_negative = OneVsRestClassifier(SVC(**best_params))
best_classifier_polarity_negative.fit(X_train_extend_negative, y_train_extend_negative)

In [33]:
def predict_polarity(text, category, polarity_classifier_positive, polarity_classifier_negative, vectorizer, label_mapping, threshold=0.09):
    preprocessed_text = preprocess_text(text)
    category_vector = np.zeros(len(all_categories))
    if category in label_mapping:
        category_vector[label_mapping[category]] = 1
    text_vector = vectorizer.transform([preprocessed_text]).toarray()[0]
    extended_vector = np.hstack((category_vector, text_vector))
    #posi
    polarity_probabilities_positive = polarity_classifier_positive.predict_proba([extended_vector])[0]
    print(polarity_probabilities_positive)
    #neg
    polarity_probabilities_negative = polarity_classifier_negative.predict_proba([extended_vector])[0]
    print(polarity_probabilities_negative)
    
    if(max(abs(polarity_probabilities_positive[2] - polarity_probabilities_negative[2]),abs(polarity_probabilities_positive[0] - polarity_probabilities_negative[0])) >= threshold):
        print("Conflict Detected")
        conflict_probabilities = 1
    else:
        conflict_probabilities = 0
    
    return {
        "positive": (polarity_probabilities_positive[2] + polarity_probabilities_negative[2])/2,
        "negative": (polarity_probabilities_positive[0] + polarity_probabilities_negative[0])/2,
        "neutral": (polarity_probabilities_positive[1] + polarity_probabilities_negative[1])/2,
        "conflict": conflict_probabilities
    }

text = "so far so good, i have not encountered any issues with this computer." + "love the flip feature and the touch screen feature." + "for the price that i paid i feel that i got good value." + "this lap top is not the lightest you can purchase, so if that is important to you maybe you should shop some more." + "my previous laptop served me well and i am very with its replacement."

category = "LAPTOP#DESIGN_FEATURES"

polarity_probs = predict_polarity(
    text, category, best_classifier_polarity_positive, best_classifier_polarity_negative, vectorizer, label_mapping
)
print(polarity_probs)

[0.15109307 0.04767473 0.8012322 ]
[0.23308192 0.05347366 0.71344442]
{'positive': 0.7573383088825846, 'negative': 0.19208749517204549, 'neutral': 0.050574195945369924, 'conflict': 0}


In [34]:
# General prediction
def predict(text, threshold=0.50):
    categories = predict_category(
        text,
        best_classifier_category,
        vectorizer,
        label_mapping,
        reverse_label_mapping,
    )
    print(categories)
    result = []
    for category in categories:
        polarity_probs = predict_polarity(
            text, category, best_classifier_polarity_positive, best_classifier_polarity_negative, vectorizer, label_mapping
        )
        
        max_polarity = max(polarity_probs, key=polarity_probs.get)
        if polarity_probs["positive"] < threshold and polarity_probs["negative"] < threshold:
            result.append([category, "neutral"])
        else:
            result.append([category, max_polarity])

    return result


text = "One of the very few laptops still available with Win 7 Pro, multiple USB ports, and a DVD read/write drive." + "Only drawback is no LED to show when the hard drive is being accessed." + "Not a necessity, but something I've come to expect." + "Does what I need it to do and the price was right!"
text = "so far so good, i have not encountered any issues with this computer." + "love the flip feature and the touch screen feature." + "for the price that i paid i feel that i got good value." + "this lap top is not the lightest you can purchase, so if that is important to you maybe you should shop some more." + "my previous laptop served me well and i am very with its replacement."
output = predict(text)
print(output)

['LAPTOP#GENERAL', 'LAPTOP#OPERATION_PERFORMANCE', 'LAPTOP#QUALITY', 'DISPLAY#DESIGN_FEATURES', 'LAPTOP#PRICE']
[0.12469728 0.02878839 0.84651433]
[0.17345347 0.03507022 0.79147631]
[0.17422146 0.03233042 0.79344811]
[0.21684981 0.03822241 0.74492778]
[0.26209871 0.04284442 0.69505687]
[0.37661468 0.04747905 0.57590626]
Conflict Detected
[0.15841119 0.07082912 0.77075969]
[0.21679266 0.07285355 0.71035379]
[0.09071379 0.13784827 0.77143795]
[0.10471326 0.12784676 0.76743999]
[['LAPTOP#GENERAL', 'positive'], ['LAPTOP#OPERATION_PERFORMANCE', 'positive'], ['LAPTOP#QUALITY', 'conflict'], ['DISPLAY#DESIGN_FEATURES', 'positive'], ['LAPTOP#PRICE', 'positive']]


In [35]:
# Takes the untagged reviews and automatically labels them
def predict_and_complete_xml(input_file, output_file):
    with open(input_file, "r", encoding="utf-8") as f:
        contents = f.read()

    bsoup = BeautifulSoup(contents, "lxml", from_encoding='utf-8')
    sentences = bsoup.find_all("sentence")

    for sent in sentences:
        out_of_scope = sent.get("outofscope", "FALSE")
        opinions = sent.find("opinions")

        if out_of_scope != "TRUE" and opinions is not None and not opinions.find_all('opinion'):
            text = sent.text.strip()
            predictions = predict(text)

            if not predictions:
                new_opinion = bsoup.new_tag(
                    "opinion", category="UNKNOWN#UNKNOWN", polarity="UNKNOWN"
                )
                opinions.insert(0, new_opinion)
            else:
                print(predictions)
                for pred in predictions:
                    category, polarity = pred
                    new_opinion = bsoup.new_tag(
                        "opinion", category=category, polarity=polarity
                    )
                    opinions.insert(0, new_opinion)

    reviews = bsoup.find("reviews")
    xml_output = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n' + str(reviews)

    with open(output_file, "w", encoding="utf-8") as f:
        f.write(xml_output)

input_file = 'Laptops_Test_p1_gold.xml'
output_file = 'output.xml'

predict_and_complete_xml(input_file, output_file)
print('XML file has been generated and saved as output.xml.')

['LAPTOP#DESIGN_FEATURES', 'LAPTOP#GENERAL', 'LAPTOP#OPERATION_PERFORMANCE', 'LAPTOP#USABILITY']




[0.26701207 0.05092878 0.68205915]
[0.38839232 0.05509314 0.55651455]
Conflict Detected
[0.33889989 0.0278645  0.63323561]
[0.36269736 0.03335366 0.60394898]
[0.38719815 0.04433915 0.5684627 ]
[0.39472863 0.04826691 0.55700446]
[0.34069326 0.04418091 0.61512583]
[0.35648293 0.04837495 0.59514211]
[['LAPTOP#DESIGN_FEATURES', 'conflict'], ['LAPTOP#GENERAL', 'positive'], ['LAPTOP#OPERATION_PERFORMANCE', 'positive'], ['LAPTOP#USABILITY', 'positive']]
['LAPTOP#GENERAL', 'COMPANY#GENERAL']
[0.32413584 0.03068551 0.64517864]
[0.36607081 0.03593465 0.59799454]
[0.41886726 0.05455911 0.52657363]
[0.43044864 0.05661346 0.51293789]
[['LAPTOP#GENERAL', 'positive'], ['COMPANY#GENERAL', 'positive']]
['LAPTOP#GENERAL', 'LAPTOP#OPERATION_PERFORMANCE', 'LAPTOP#MISCELLANEOUS']
[0.39551354 0.03050695 0.57397951]
[0.39885136 0.03573284 0.5654158 ]
[0.39187978 0.03939069 0.56872953]
[0.37702948 0.04376123 0.57920928]
[0.3519014  0.04376681 0.60433178]
[0.38386692 0.04763629 0.56849679]
[['LAPTOP#GENERAL', 

In [36]:
def evaluate_model_category(y_true, y_probas, model_name, target_names=None, threshold=0.33):
    y_pred = (y_probas >= threshold).astype(int)
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average="weighted", zero_division=1)
    recall = recall_score(y_true, y_pred, average="weighted", zero_division=1)
    f1 = f1_score(y_true, y_pred, average="weighted", zero_division=1)

    report = f"""{model_name} Evaluation Report:
        Accuracy : {accuracy:.4f}
        Precision: {precision:.4f}
        Recall   : {recall:.4f}
        F1-Score : {f1:.4f}
    """
    print(report)

    if target_names is not None:
        print(classification_report(y_true, y_pred, target_names=target_names, zero_division=0))

# Category prediction evaluation
y_test_category_probas = best_classifier_category.predict_proba(X_test)
cat_target_names = [reverse_label_mapping[i] for i in range(len(reverse_label_mapping))]
evaluate_model_category(y_test, y_test_category_probas, model_name="Category Prediction", target_names=cat_target_names)

Category Prediction Evaluation Report:
        Accuracy : 0.0125
        Precision: 0.6501
        Recall   : 0.6891
        F1-Score : 0.5834
    
                                          precision    recall  f1-score   support

                  LAPTOP#DESIGN_FEATURES       0.59      0.92      0.72        39
                          SOFTWARE#PRICE       0.00      0.00      0.00         1
                      SOFTWARE#USABILITY       0.23      0.50      0.32         6
                        KEYBOARD#GENERAL       0.00      0.00      0.00         2
            MULTIMEDIA_DEVICES#USABILITY       0.00      0.00      0.00         1
                          LAPTOP#GENERAL       1.00      1.00      1.00        80
              POWER_SUPPLY#MISCELLANEOUS       0.00      0.00      0.00         0
                           SUPPORT#PRICE       0.00      0.00      0.00         2
            LAPTOP#OPERATION_PERFORMANCE       0.59      0.96      0.73        46
                          LAPTO

In [37]:
reverse_label_mapping_polarity = {0: 'negative', 1: 'neutral', 2: 'positive', 3: 'conflict'}
pol_target_names = ['conflict', 'negative', 'neutral', 'positive']

def evaluate_model_polarity(y_true, y_pred, model_name, target_names=None):
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average="weighted", zero_division=1)
    recall = recall_score(y_true, y_pred, average="weighted", zero_division=1)
    f1 = f1_score(y_true, y_pred, average="weighted", zero_division=1)

    report = f"""{model_name} Evaluation Report:
        Accuracy : {accuracy:.4f}
        Precision: {precision:.4f}
        Recall   : {recall:.4f}
        F1-Score : {f1:.4f}
    """
    print(report)

    if target_names is not None:
        print(classification_report(y_true, y_pred, target_names=target_names, zero_division=0))

def predict_polarity_with_threshold(X, classifier_neg, classifier_posi, threshold=0.06, threshold_ratio=0.8):
    polarity_probabilities_negative = classifier_neg.predict_proba(X)
    polarity_probabilities_positive = classifier_posi.predict_proba(X)
    
    prob_array_neg = np.array(polarity_probabilities_negative)
    prob_array_pos = np.array(polarity_probabilities_positive)
    
    averaged_probabilities = (prob_array_neg + prob_array_pos) / 2
    print(averaged_probabilities)
    results = []
    
    for idx, prob in enumerate(averaged_probabilities):
        negative, neutral, positive = prob
        if max(abs(prob_array_pos[idx, 2] - prob_array_neg[idx, 2]),
               abs(prob_array_pos[idx, 0] - prob_array_neg[idx, 0])) >= threshold:
            conflict_probabilities = 1
        else:
            conflict_probabilities = 0

        max_diff_percentage = max(abs(negative - positive)/max(negative, positive), 
                                  abs(negative - neutral)/max(negative, neutral), 
                                  abs(neutral - positive)/max(neutral, positive))

        if conflict_probabilities == 1:
            results.append("conflict")
        elif max_diff_percentage <= threshold_ratio:
            results.append("neutral")
        else:
            max_idx = np.argmax(prob)
            results.append(reverse_label_mapping_polarity[max_idx])
    return results

y_test_polarity = []
for id, text, labels in test_data:
    for label in labels:
        y_test_polarity.append(label.split("#")[2])

y_test_polarity_pred_with_threshold = predict_polarity_with_threshold(X_test_extend, best_classifier_polarity_negative, best_classifier_polarity_positive)
evaluate_model_polarity(y_test_polarity, y_test_polarity_pred_with_threshold, model_name="Polarity Prediction", target_names=pol_target_names)

[[0.05324172 0.03449044 0.91226784]
 [0.06619864 0.04453815 0.88926321]
 [0.05157048 0.03097093 0.91745858]
 ...
 [0.69433492 0.05675874 0.24890634]
 [0.5946218  0.08882448 0.31655372]
 [0.10326006 0.03015902 0.86658092]]
Polarity Prediction Evaluation Report:
        Accuracy : 0.6349
        Precision: 0.7414
        Recall   : 0.6349
        F1-Score : 0.6749
    
              precision    recall  f1-score   support

    conflict       0.08      0.57      0.14        14
    negative       0.76      0.51      0.61       162
     neutral       0.14      0.13      0.13        31
    positive       0.82      0.75      0.78       338

    accuracy                           0.63       545
   macro avg       0.45      0.49      0.42       545
weighted avg       0.74      0.63      0.67       545



# Task 2 Model Performance Evaluation

## Category Prediction Model Evaluation

```
Accuracy: 0.0125
Precision: 0.6501
Recall: 0.6891
F1-Score: 0.5834
```

### Possible Reasons for Category Prediction Model Performance:

Since the category prediction model was not further improved, the possible reasons for its performance are the same as those stated in Task 1.

## Sentiment Polarity Prediction Model Evaluation

```
Accuracy: 0.6349
Precision: 0.7414
Recall: 0.6349
F1-Score: 0.6749
```

### Possible Reasons for Sentiment Polarity Prediction Model Performance and Comparison to Task 1:

1. **Threshold Setting**: Multiple functions apply threshold values, and different thresholds may produce varying results. Manually-determined threshold values in experiments may not be optimal; exploring the best threshold settings could be beneficial.

2. **Text Scope**: Adding more text and opinion labels in the text-level approach may make it difficult for the model to learn relationships between text vectors and opinion labels. This might contribute to the model's slightly lower performance in Task 2 compared to Task 1.

3. **Lower Scores for Conflict and Neutral Labels**: Detailed results show that the F1-scores for 'conflict' (0.14) and 'neutral' (0.13) are less than ideal. This may be related to suboptimal methods for determining these labels.

## Conclusion

The proposed aspect-based sentiment analysis model yields moderately acceptable results, although it exhibits less-than-ideal performance for specific labels, such as 'conflict'. Further improvements may be achieved by conducting extensive hyperparameter grid searches, exploring the best threshold values, and experimenting with different feature extraction methods or adjustments to the n-gram range.

# Future Considerations

The current implementation of aspect-based sentiment analysis utilizes several evaluation metrics, including accuracy, precision, recall, and F1-score, to accurately assess the performance of the classifiers. Due to limitations in dataset size and other factors, model performance is suboptimal, particularly for neutral and conflict sentiment analysis. In future:

1. **Entity and Attribute Decoupling**: The current category prediction treats entity and attribute as a combined unit, resulting in strong coupling between them. This limits the model to predicting only combinations that appear in the training set. To mitigate this issue, entity and attribute can be separated into two distinct attributes, requiring two separate models for prediction.

2. **Finer-Grained Sentence Segmentation**: The current model is expected to learn the relationship between categories and text, which might be challenging. More advanced techniques, such as Context-Free Grammars (CFG), can be utilized to further segment sentences, thus improving training efficacy by addressing noun phrases and identifying polarity-related adjectives.

3. **Enhanced Neutral Detection**: The accuracy of neutral detection is low, which could be due to various factors, such as limited sample size. Alternative methods may address this issue more effectively. For example, avoiding direct prediction of neutral and instead treating positive and negative predictions as two independent events with their respective probability scores. If both probabilities are below a certain threshold, neutral polarity can be predicted. This approach can effectively reduce noise and increase the model's accuracy.

4. **Identifying Opinion Labels in Text**: For Task 2, the model might struggle to learn the relationship between longer texts and opinions. In the future, it could be possible to train a separate model to detect the presence of an opinion in a sentence, analyze each sentence individually, and average the results. If one sentence's polarity is negative and another's is positive, a conflict might be present, which could improve model performance.

5. **Advanced Text Preprocessing Techniques**: Implementing advanced text preprocessing methods, such as lemmatization, removal of special characters, and even considering n-gram representations, may further refine the dataset and enhance model performance.

6. **State-of-the-Art Feature Extraction**: Adopting additional feature extraction techniques, such as word embeddings (e.g., Word2Vec, GloVe, FastText) or contextual embeddings (e.g., BERT, ELMo, GPT-2), can contribute to better capturing of semantic information in text, thereby improving prediction accuracy.

7. **Fine-Tuning Threshold Settings**: In the current experiments, thresholds are set by trial and error, which may not be optimal. Performing a grid search to determine the best threshold values could potentially improve model performance and robustness.

8. **Multi-Task Learning**: Treating category and polarity predictions as interrelated tasks and training a multi-task learning model, either by incorporating shared hidden layers in neural networks or using a common feature representation, could enhance predictive abilities through the learning of shared knowledge across tasks.

9. **Transfer Learning**: Leveraging pre-trained models on related tasks or larger, annotated datasets, such as BERT, to initialize the classifier's weights and subsequently fine-tuning these models on the current dataset can improve performance by harnessing previously learned knowledge in similar contexts.

# References

1. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). **BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding**. arXiv preprint arXiv:1810.04805. [Link](https://arxiv.org/abs/1810.04805)

2. Cortes, C., & Vapnik, V. (1995). **Support-vector networks**. Machine Learning, 20(3), 273-297. [Link](https://link.springer.com/article/10.1007/BF00994018)

3. Pontiki et al. (2015). **[SemEval-2015 Task 12: Aspect Based Sentiment Analysis]**. SemEval 2015. [Link](https://aclanthology.org/S15-2082)

4. Yue, Shihong, Ping Li, and Peiyi Hao. (2003). **SVM classification: Its contents and challenges**. Applied Mathematics-A Journal of Chinese Universities 18, 332-342.

5. Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. (2002). **Thumbs up? Sentiment classification using machine learning techniques**. arXiv preprint cs/0205070. [Link](https://arxiv.org/abs/cs/0205070)