# Aspect-Based Sentiment Analysis (ABSA): Cleaning, Detection & Evaluation

To extract structured signals from unstructured Glassdoor reviews, this notebook applies **Aspect-Based Sentiment Analysis (ABSA)** to the `pros` and `cons` sections of each review. The final goal is to build company-level summaries of how employees feel about aspects such as “pay”, “management”, and “culture”.

We proceed in several key stages:

1. **Data Sampling & Cleaning**  
   We preprocess a small (1%) sample of reviews using four cleaning strategies to test which one yields the most accurate aspect detection and sentiment classification.

2. **Modeling Approaches for ABSA**  
   We implement two models:
   - **PyABSA** for automatic aspect term extraction
   - **BERT-based classifier** for assigning sentiment scores to each known aspect

3. **Manual Aspect Dictionary Construction**  
   Using PyABSA output and domain knowledge, we build an enhanced dictionary of aspects and their keywords (e.g., “pay” includes “salary”, “bonus”, etc.).

4. **Pipeline Evaluation**  
   We test four cleaning variants using a manually annotated evaluation set and compare performance across accuracy metrics.

5. **Full Dataset Inference**  
   Once the best approach is identified, we apply it to the full dataset to generate aspect-based star ratings, which feed into downstream company analysis, recommendation, and summarization.
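
Stage 4 compares predicted aspects against manual annotations. A minimal sketch of one way to score aspect detection, assuming each review's gold and predicted aspects are stored as sets, computes micro-averaged precision, recall, and F1 over all (review, aspect) pairs. The helper `micro_prf` and the toy annotations are hypothetical, not the notebook's own evaluation code:

```python
def micro_prf(gold, pred):
    """Micro-averaged precision/recall/F1 over per-review aspect sets.

    gold, pred: lists of sets, one set of aspect names per review.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # aspects both annotated and predicted
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy annotations for two reviews
gold = [{"pay"}, {"management", "culture"}]
pred = [{"pay", "culture"}, {"management"}]
p, r, f = micro_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Micro-averaging lets frequent aspects dominate the score; macro-averaging per aspect (for example, scikit-learn's `f1_score` with `average="macro"` on a binarized label matrix) would weight rare aspects equally.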


In [1]:
import pandas as pd
import numpy as np
import spacy
import nltk
import string
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax
import torch
from textblob import TextBlob
from tqdm.notebook import tqdm
from symspellpy import SymSpell, Verbosity
import pkg_resources
tqdm.pandas()
pd.set_option('display.max_colwidth', None)
from pyabsa import ATEPCCheckpointManager
from collections import defaultdict
import ast, re, itertools
from pathlib import Path
from sklearn.metrics import f1_score, precision_recall_fscore_support, accuracy_score



No CUDA GPU found in your device
/bin/sh: nvidia-smi: command not found
[2025-05-14 21:58:45] (2.4.1.post1) PyABSA(2.4.1.post1): If your code crashes on Colab, please use the GPU runtime. Then run "pip install pyabsa[dev] -U" and restart the kernel.
Or if it does not work, you can use v1.x versions, e.g., pip install pyabsa<2.0 -U
Try to downgrade transformers<=4.29.0.

In [2]:
df = pd.read_csv('filtered_england_recent.csv')

## 1. Text Cleaning Strategies

Employee reviews vary in formality, spelling, and structure. To ensure clean, consistent input for aspect detection and sentiment modeling, we test four cleaning pipelines:

- **min**: Minimal cleaning (newlines flattened, non-ASCII characters removed)
- **lemma**: Lemmatization only, keeping all words
- **stopword**: Lemmatization with stopwords removed
- **symspell**: Spelling correction with SymSpell followed by lemmatization; it also drops stopwords, numbers, and tokens shorter than three characters

Each cleaned version is stored in its own column (e.g., `pros_clean_min`, `cons_clean_with_symspell`) so we can later compare how each strategy affects ABSA performance.
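
The **min** strategy strips every non-ASCII character, which also deletes accented letters. A small, dependency-free sketch illustrates this: `clean_min` mirrors the notebook's function, while `clean_min_nfkd` is a hypothetical variant that applies Unicode NFKD normalization first, so an accented letter decomposes into a base letter plus a combining mark and only the mark is dropped:

```python
import unicodedata


def clean_min(text):
    # Same logic as the notebook's clean_min: flatten newlines, drop non-ASCII
    if not isinstance(text, str):
        return ""
    return text.replace("\n", " ").replace("\r", "").encode("ascii", errors="ignore").decode().strip()


def clean_min_nfkd(text):
    # Hypothetical variant: decompose accents (é -> e + combining accent) before dropping non-ASCII
    if not isinstance(text, str):
        return ""
    text = unicodedata.normalize("NFKD", text)
    return text.replace("\n", " ").replace("\r", "").encode("ascii", errors="ignore").decode().strip()


sample = "Great café\r\nFlexible hours"
print(clean_min(sample))       # Great caf Flexible hours
print(clean_min_nfkd(sample))  # Great cafe Flexible hours
```

Plain-ASCII reviews come out identical under both functions; the difference only shows up in reviews that contain accented characters.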


In [3]:
# Load SpaCy English model
nlp = spacy.load("en_core_web_sm")

# Sample 1% of the data
df_sample = df.sample(frac=0.01, random_state=21).copy()

def clean_min(text):
    if not isinstance(text, str):
        return ""
    return text.replace('\n', ' ').replace('\r', '').encode("ascii", errors="ignore").decode().strip()

# Lemmatize only (no stopword removal) to keep as many words as possible
def preprocess_text_lemmaonly(text):
    if pd.isnull(text):
        return ""
    doc = nlp(text.lower())
    return " ".join(token.lemma_ for token in doc if not token.is_punct and not token.is_space)

def preprocess_text_with_stopwords(text):
    if pd.isnull(text):
        return ""
    doc = nlp(text.lower())
    return " ".join(
        token.lemma_ 
        for token in doc 
        if not token.is_punct 
        and not token.is_space 
        and not token.is_stop
    )

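sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
# Load the English frequency dictionary bundled with symspellpy; without a loaded
# dictionary, lookup() returns no suggestions and no correction takes place
dictionary_path = pkg_resources.resource_filename("symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)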
def correct_token(token_text):
    suggestions = sym_spell.lookup(token_text, Verbosity.CLOSEST, max_edit_distance=2)
    if suggestions:
        return suggestions[0].term
    return token_text

def preprocess_text_symspell(text):
    if pd.isnull(text):
        return ""
    corrected_tokens = []
    for token in nlp(text.lower()):
        if token.is_punct or token.is_space or token.like_num or token.is_stop or len(token) < 3:
            continue
        corrected = correct_token(token.text)
        doc = nlp(corrected)  # lemmatize after correction
        for sub_token in doc:
            if not sub_token.is_punct and not sub_token.is_space:
                corrected_tokens.append(sub_token.lemma_)
    return " ".join(corrected_tokens)


# Apply preprocessing with progress bar
for col in ['pros', 'cons']:
    print(f"Processing column: {col}")
    df_sample[f'{col}_clean_min'] = df_sample[col].progress_apply(clean_min)
    df_sample[f'{col}_clean_lemma_only'] = df_sample[col].progress_apply(preprocess_text_lemmaonly)
    df_sample[f'{col}_clean_with_stopwords'] = df_sample[col].progress_apply(preprocess_text_with_stopwords)
    df_sample[f'{col}_clean_with_symspell'] = df_sample[col].progress_apply(preprocess_text_symspell)


# Preview result
df_sample.head()

Processing column: pros
Processing column: cons

Unnamed: 0.1,Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,diversity_inclusion,...,cons,year,pros_clean_min,pros_clean_lemma_only,pros_clean_with_stopwords,pros_clean_with_symspell,cons_clean_min,cons_clean_lemma_only,cons_clean_with_stopwords,cons_clean_with_symspell
72228,819748,Vodafone,2021-05-04,Business Analyst,"Current Employee, more than 1 year","Reading, England, England",4,,,,...,Pay could be a little higher,2021,Champions team work and builds a great network,champion team work and build a great network,champion team work build great network,champion team work build great network,Pay could be a little higher,pay could be a little high,pay little high,pay little higher
37518,323899,Hastings Direct,2019-05-31,Customer Service Representative,"Current Employee, more than 3 years","Bexhill, England, England",2,2.0,2.0,,...,"The company is compiled of 90% student employees, however, team leaders at Hastings, do not remember this, and it is an ordeal for students to get time off for exams If a student needs time off for exams and they have the holiday time available management should not require them to make up the time, they should have the option to use a holiday. The company should not be allowed to deny any student time off for exams. I have been there for three years and I notice the changes; now it is a pressure cooker. The original TL I had recently quit because the direction the company is going in is not about treating employees fairly. The TL's have their favourites and if you are not one of them you will be subjected to bullying. Upper management does nothing to intervene and you are treated like just a number, and your only purpose here is to make as much money for the company as possible and spend less time on the phone trying to help customers. Different information told from coaches so customers are angry. Some TL's have inappropriate relationships with team members. No chance for advancement in the company unless you are friends with TL's\r\n\r\nShow Less\r\nAdvice to Management",2019,Pay and the other employees,pay and the other employee,pay employee,pay employee,"The company is compiled of 90% student employees, however, team leaders at Hastings, do not remember this, and it is an ordeal for students to get time off for exams If a student needs time off for exams and they have the holiday time available management should not require them to make up the time, they should have the option to use a holiday. The company should not be allowed to deny any student time off for exams. I have been there for three years and I notice the changes; now it is a pressure cooker. The original TL I had recently quit because the direction the company is going in is not about treating employees fairly. The TL's have their favourites and if you are not one of them you will be subjected to bullying. Upper management does nothing to intervene and you are treated like just a number, and your only purpose here is to make as much money for the company as possible and spend less time on the phone trying to help customers. Different information told from coaches so customers are angry. Some TL's have inappropriate relationships with team members. No chance for advancement in the company unless you are friends with TL's Show Less Advice to Management",the company be compile of 90 student employee however team leader at hasting do not remember this and it be an ordeal for student to get time off for exam if a student need time off for exam and they have the holiday time available management should not require they to make up the time they should have the option to use a holiday the company should not be allow to deny any student time off for exam I have be there for three year and I notice the change now it be a pressure cooker the original tl I have recently quit because the direction the company be go in be not about treat employee fairly the tl 's have their favourite and if you be not one of they you will be subject to bully upper management do nothing to intervene and you be treat like just a number and your only purpose here be to make as much money for the company as possible and spend less time on the phone try to help customer different information tell from coach so customer be angry some tl 's have inappropriate relationship with team member no chance for advancement in the company unless you be friend with tl 's show less advice to management,company compile 90 student employee team leader hasting remember ordeal student time exam student need time exam holiday time available management require time option use holiday company allow deny student time exam year notice change pressure cooker original tl recently quit direction company go treat employee fairly tl favourite subject bully upper management intervene treat like number purpose money company possible spend time phone try help customer different information tell coach customer angry tl inappropriate relationship team member chance advancement company friend tl advice management,company compile student employee team leader hasting remember ordeal student time exam student need time exam holiday time available management require time option use holiday company allow deny student time exam year notice change pressure cooker original recently quit direction company go treat employee fairly favourite subject bully upper management intervene treat like number purpose money company possible spend time phone try help customer different information tell coach customer angry inappropriate relationship team member chance advancement company friend advice management
38840,327300,Her Majesty s Revenue and Customs,2016-12-07,Senior Officer,"Current Employee, more than 10 years","London, England, England",3,2.0,2.0,,...,"Often you get the feeling of disorganised chaos because the department is breaking through to the digital world and setting up for tax payers to become more able to self-serve. \r\nOften you have to work long hours to get the job done, even though work-life balance is promoted by senior managers, which is a contradiction.",2016,"Flexible working is one of the biggest pros. Some departments are more stretched than others, which will mean work-life balance in some areas could be better. If you are able and willing to move for a job then this is fine as the Civil Service is massively reducing their building locations. There are some opportunities to work from home. Promotion prospects are good, generally speaking.",flexible working be one of the big pro some department be more stretched than other which will mean work life balance in some area could be well if you be able and willing to move for a job then this be fine as the civil service be massively reduce their building location there be some opportunity to work from home promotion prospect be good generally speak,flexible working big pro department stretched mean work life balance area well able willing job fine civil service massively reduce building location opportunity work home promotion prospect good generally speak,flexible work big pro department stretch mean work life balance area well able willing job fine civil service massively reduce build location opportunity work home promotion prospect good generally speak,"Often you get the feeling of disorganised chaos because the department is breaking through to the digital world and setting up for tax payers to become more able to self-serve. Often you have to work long hours to get the job done, even though work-life balance is promoted by senior managers, which is a contradiction.",often you get the feeling of disorganised chaos because the department be break through to the digital world and set up for tax payer to become more able to self serve often you have to work long hour to get the job do even though work life balance be promote by senior manager which be a contradiction,feeling disorganised chaos department break digital world set tax payer able self serve work long hour job work life balance promote senior manager contradiction,feel disorganise chaos department break digital world set tax payer able self serve work long hour job work life balance promote senior manager contradiction
32069,270294,Foxtons,2016-05-13,.,Former Employee,"London, England, England",1,1.0,1.0,,...,"Back-stabbing workers, no such thing as friends here, everyone is out for themselves.\r\nMicromanagement from 'team leaders' who think they are something special when really the power of managing someone else gets to their head.\r\nYou cannot take sick leave/ trips to the doctor or you are given a warning\r\nThe hours are nearly 12 hour days, starting with pointless morning and afternoon meetings\r\nThe uniform policy is very strict and they treat you like children.\r\nYou have no work-life balance, say goodbye to your 'outside work friend' and say hello to your new foxtons family, you eat, sleep and play with them. - I say sleep literally as there are a lot of relationships that go on with people in the same offices, (lots of bias and nepotism)",2016,The play music in both front office and head office!,the play music in both front office and head office,play music office head office,play music office head office,"Back-stabbing workers, no such thing as friends here, everyone is out for themselves. Micromanagement from 'team leaders' who think they are something special when really the power of managing someone else gets to their head. You cannot take sick leave/ trips to the doctor or you are given a warning The hours are nearly 12 hour days, starting with pointless morning and afternoon meetings The uniform policy is very strict and they treat you like children. You have no work-life balance, say goodbye to your 'outside work friend' and say hello to your new foxtons family, you eat, sleep and play with them. - I say sleep literally as there are a lot of relationships that go on with people in the same offices, (lots of bias and nepotism)",back stab worker no such thing as friend here everyone be out for themselves micromanagement from team leader who think they be something special when really the power of manage someone else get to their head you can not take sick leave/ trip to the doctor or you be give a warning the hour be nearly 12 hour day start with pointless morning and afternoon meeting the uniform policy be very strict and they treat you like child you have no work life balance say goodbye to your outside work friend and say hello to your new foxton family you eat sleep and play with they I say sleep literally as there be a lot of relationship that go on with people in the same office lot of bias and nepotism,stab worker thing friend micromanagement team leader think special power manage get head sick leave/ trip doctor give warning hour nearly 12 hour day start pointless morning afternoon meeting uniform policy strict treat like child work life balance goodbye outside work friend hello new foxton family eat sleep play sleep literally lot relationship people office lot bias nepotism,stab worker thing friend micromanagement team leader think special power manage get head sick leave/ trip doctor give warn hour nearly hour day start pointless morning afternoon meeting uniform policy strict treat like child work life balance goodbye outside work friend hello new foxton family eat sleep play sleep literally lot relationship people office lot bias nepotism
68859,781407,Thomson Reuters,2018-06-17,Solutions Architect,"Current Employee, less than 1 year","London, England, England",1,1.0,2.0,,...,- Poor management \r\n- No future career developments \r\n- Lack of support from management\r\n- Job role are not clear and not based on the job descriptions\r\n- long work hours as required to work with US region,2018,There is not any positive side in this organisation,there be not any positive side in this organisation,positive organisation,positive organisation,- Poor management - No future career developments - Lack of support from management - Job role are not clear and not based on the job descriptions - long work hours as required to work with US region,poor management no future career development lack of support from management job role be not clear and not base on the job description long work hour as require to work with us region,poor management future career development lack support management job role clear base job description long work hour require work region,poor management future career development lack support management job role clear base job description long work hour require work region


In [None]:
# Preview the raw and cleaned text columns side by side
df_sample[['pros', 'cons', 'pros_clean_lemma_only', 'pros_clean_with_stopwords', 'pros_clean_with_symspell', 'cons_clean_lemma_only','cons_clean_with_stopwords', 'cons_clean_with_symspell']].head()

Unnamed: 0,pros,cons,pros_clean_lemma_only,pros_clean_with_stopwords,pros_clean_with_symspell,cons_clean_lemma_only,cons_clean_with_stopwords,cons_clean_with_symspell
72228,Champions team work and builds a great network,Pay could be a little higher,champion team work and build a great network,champion team work build great network,champion team work build great network,pay could be a little high,pay little high,pay little higher
37518,Pay and the other employees,"The company is compiled of 90% student employees, however, team leaders at Hastings, do not remember this, and it is an ordeal for students to get time off for exams If a student needs time off for exams and they have the holiday time available management should not require them to make up the time, they should have the option to use a holiday. The company should not be allowed to deny any student time off for exams. I have been there for three years and I notice the changes; now it is a pressure cooker. The original TL I had recently quit because the direction the company is going in is not about treating employees fairly. The TL's have their favourites and if you are not one of them you will be subjected to bullying. Upper management does nothing to intervene and you are treated like just a number, and your only purpose here is to make as much money for the company as possible and spend less time on the phone trying to help customers. Different information told from coaches so customers are angry. Some TL's have inappropriate relationships with team members. No chance for advancement in the company unless you are friends with TL's\r\n\r\nShow Less\r\nAdvice to Management",pay and the other employee,pay employee,pay employee,the company be compile of 90 student employee however team leader at hasting do not remember this and it be an ordeal for student to get time off for exam if a student need time off for exam and they have the holiday time available management should not require they to make up the time they should have the option to use a holiday the company should not be allow to deny any student time off for exam I have be there for three year and I notice the change now it be a pressure cooker the original tl I have recently quit because the direction the company be go in be not about treat employee fairly the tl 's have their favourite and if you be not one of they you will be subject to bully upper management do nothing to intervene and you be treat like just a number and your only purpose here be to make as much money for the company as possible and spend less time on the phone try to help customer different information tell from coach so customer be angry some tl 's have inappropriate relationship with team member no chance for advancement in the company unless you be friend with tl 's show less advice to management,company compile 90 student employee team leader hasting remember ordeal student time exam student need time exam holiday time available management require time option use holiday company allow deny student time exam year notice change pressure cooker original tl recently quit direction company go treat employee fairly tl favourite subject bully upper management intervene treat like number purpose money company possible spend time phone try help customer different information tell coach customer angry tl inappropriate relationship team member chance advancement company friend tl advice management,company compile student employee team leader hasting remember ordeal student time exam student need time exam holiday time available management require time option use holiday company allow deny student time exam year notice change pressure cooker original recently quit direction company go treat employee fairly favourite subject bully upper management intervene treat like number purpose money company possible spend time phone try help customer different information tell coach customer angry inappropriate relationship team member chance advancement company friend advice management
38840,"Flexible working is one of the biggest pros. Some departments are more stretched than others, which will mean work-life balance in some areas could be better. \r\nIf you are able and willing to move for a job then this is fine as the Civil Service is massively reducing their building locations. There are some opportunities to work from home.\r\nPromotion prospects are good, generally speaking.","Often you get the feeling of disorganised chaos because the department is breaking through to the digital world and setting up for tax payers to become more able to self-serve. \r\nOften you have to work long hours to get the job done, even though work-life balance is promoted by senior managers, which is a contradiction.",flexible working be one of the big pro some department be more stretched than other which will mean work life balance in some area could be well if you be able and willing to move for a job then this be fine as the civil service be massively reduce their building location there be some opportunity to work from home promotion prospect be good generally speak,flexible working big pro department stretched mean work life balance area well able willing job fine civil service massively reduce building location opportunity work home promotion prospect good generally speak,flexible work big pro department stretch mean work life balance area well able willing job fine civil service massively reduce build location opportunity work home promotion prospect good generally speak,often you get the feeling of disorganised chaos because the department be break through to the digital world and set up for tax payer to become more able to self serve often you have to work long hour to get the job do even though work life balance be promote by senior manager which be a contradiction,feeling disorganised chaos department break digital world set tax payer able self serve work long hour job work life balance promote senior manager contradiction,feel disorganise chaos department break digital world set tax payer able self serve work long hour job work life balance promote senior manager contradiction
32069,The play music in both front office and head office!,"Back-stabbing workers, no such thing as friends here, everyone is out for themselves.\r\nMicromanagement from 'team leaders' who think they are something special when really the power of managing someone else gets to their head.\r\nYou cannot take sick leave/ trips to the doctor or you are given a warning\r\nThe hours are nearly 12 hour days, starting with pointless morning and afternoon meetings\r\nThe uniform policy is very strict and they treat you like children.\r\nYou have no work-life balance, say goodbye to your 'outside work friend' and say hello to your new foxtons family, you eat, sleep and play with them. - I say sleep literally as there are a lot of relationships that go on with people in the same offices, (lots of bias and nepotism)",the play music in both front office and head office,play music office head office,play music office head office,back stab worker no such thing as friend here everyone be out for themselves micromanagement from team leader who think they be something special when really the power of manage someone else get to their head you can not take sick leave/ trip to the doctor or you be give a warning the hour be nearly 12 hour day start with pointless morning and afternoon meeting the uniform policy be very strict and they treat you like child you have no work life balance say goodbye to your outside work friend and say hello to your new foxton family you eat sleep and play with they I say sleep literally as there be a lot of relationship that go on with people in the same office lot of bias and nepotism,stab worker thing friend micromanagement team leader think special power manage get head sick leave/ trip doctor give warning hour nearly 12 hour day start pointless morning afternoon meeting uniform policy strict treat like child work life balance goodbye outside work friend hello new foxton family eat sleep play sleep literally lot relationship people office lot bias nepotism,stab worker thing friend micromanagement team leader think special power manage get head sick leave/ trip doctor give warn hour nearly hour day start pointless morning afternoon meeting uniform policy strict treat like child work life balance goodbye outside work friend hello new foxton family eat sleep play sleep literally lot relationship people office lot bias nepotism
68859,There is not any positive side in this organisation,- Poor management \r\n- No future career developments \r\n- Lack of support from management\r\n- Job role are not clear and not based on the job descriptions\r\n- long work hours as required to work with US region,there be not any positive side in this organisation,positive organisation,positive organisation,poor management no future career development lack of support from management job role be not clear and not base on the job description long work hour as require to work with us region,poor management future career development lack support management job role clear base job description long work hour require work region,poor management future career development lack support management job role clear base job description long work hour require work region


## 2. Discovering Aspects Using PyABSA

We use PyABSA’s pretrained ATEPC model to **automatically identify aspect terms** in both `pros` and `cons`. This serves two purposes:

- Provides an empirical list of common aspects directly from user language
- Informs construction of our manual aspect dictionary (later used with BERT)

We apply PyABSA to a random 50% sample of the cleaned text and count how often each aspect term is extracted across reviews. We use the minimally cleaned text (`clean_min`) to preserve informal expressions and domain-specific phrasing that may matter for accurate automatic aspect detection.
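
PyABSA's `extract_aspect` returns one result per input text, and the aggregation cell reads the `"aspect"` list from each. A minimal sketch of ranking those terms by frequency, using hypothetical mock results shaped like that output (only the `"aspect"` key is assumed here), shows how `Counter.most_common` orders terms by count, which a plain `count >= 3` filter over `.items()` does not:

```python
from collections import Counter

# Hypothetical mock results; only the "aspect" key is assumed here
mock_results = [
    {"aspect": ["Pay", "management"]},
    {"aspect": ["pay", "culture"]},
    {"aspect": []},
    {"aspect": ["Management", "pay"]},
]

aspect_counter = Counter()
for result in mock_results:
    for asp in result.get("aspect", []):
        aspect_counter[asp.lower()] += 1  # case-fold so "Pay" and "pay" merge

min_count = 2
# most_common() sorts by descending count, so the list is a genuine ranking
top_aspects = [asp for asp, count in aspect_counter.most_common() if count >= min_count]
print(top_aspects)  # ['pay', 'management']
```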


In [5]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [6]:
def get_aspect_sentiment(text, aspect, tone):
    if pd.isnull(text):
        return None

    # Compose prompt-like input for aspect analysis
    if tone == 'pos':
        combined = f"The following is a positive experience about {aspect}: {text}"
    else:
        combined = f"The following is a negative experience about {aspect}: {text}"

    inputs = tokenizer(combined, return_tensors="pt", truncation=True, max_length=512)

    with torch.no_grad():
        logits = model(**inputs).logits
    scores = softmax(logits.numpy()[0])

    # This model returns scores for: [1 star, 2 stars, 3 stars, 4 stars, 5 stars]
    star_labels = [1, 2, 3, 4, 5]
    result = dict(zip(star_labels, scores))
    result["label"] = star_labels[scores.argmax()]
    return result
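
`get_aspect_sentiment` keeps only the arg-max star as `"label"`, which discards how the probability mass is spread across the five classes. A minimal sketch, assuming the five probabilities for 1 to 5 stars that the function already returns, computes a probability-weighted (expected) star rating as a smoother alternative. `expected_stars` and `argmax_stars` are hypothetical helpers, and the probabilities are made up for illustration:

```python
def expected_stars(probs, star_labels=(1, 2, 3, 4, 5)):
    """Probability-weighted mean star rating from 1-5 star class probabilities."""
    if len(probs) != len(star_labels):
        raise ValueError("expected one probability per star label")
    total = sum(probs)  # normalise defensively in case probs do not sum to exactly 1
    return sum(s * p for s, p in zip(star_labels, probs)) / total


def argmax_stars(probs, star_labels=(1, 2, 3, 4, 5)):
    # Mirrors star_labels[scores.argmax()]: the first maximum wins on ties
    best = max(range(len(probs)), key=lambda i: probs[i])
    return star_labels[best]


probs = [0.1, 0.1, 0.2, 0.3, 0.3]  # hypothetical softmax output
print(argmax_stars(probs))              # 4
print(round(expected_stars(probs), 2))  # 3.6
```

For a result dict from `get_aspect_sentiment`, the probabilities can be passed as `[result[s] for s in range(1, 6)]`, since its keys are the integers 1 to 5.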


In [7]:
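from collections import Counter
failed_sentiment = Counter()  # aspects whose sentiment call returned no result

def analyze_all_aspect_groups(text, aspect_groups, tone):
    """Score each aspect whose keywords occur (as substrings) in the text."""
    results = {}
    if not isinstance(text, str):
        return results  # NaN/None would otherwise crash on .lower()
    text_lower = text.lower()  # lowercase once rather than once per keyword
    for aspect, keywords in aspect_groups.items():
        if any(kw in text_lower for kw in keywords):
            sent = get_aspect_sentiment(text, aspect, tone)
            if sent:
                results[aspect] = sent
            else:
                failed_sentiment[aspect] += 1  # Track failures
    return results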
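
The keyword test in `analyze_all_aspect_groups` is a plain substring check, so a short keyword can fire inside an unrelated word (for example, `"pay"` inside `"repayment"`). A minimal sketch of a word-boundary alternative using Python's `re` module, with a hypothetical two-aspect dictionary (not the notebook's final one):

```python
import re

# Hypothetical aspect dictionary for illustration only
ASPECT_GROUPS = {
    "pay": ["pay", "salary", "bonus"],
    "management": ["management", "manager"],
}


def detect_aspects_substring(text, aspect_groups):
    # Same test as analyze_all_aspect_groups: plain substring containment
    text_lower = text.lower()
    return {a for a, kws in aspect_groups.items() if any(kw in text_lower for kw in kws)}


def detect_aspects_word(text, aspect_groups):
    # \b anchors require the keyword to start and end on a word boundary
    text_lower = text.lower()
    return {
        a
        for a, kws in aspect_groups.items()
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text_lower) for kw in kws)
    }


review = "Student loan repayment support is great; my manager is helpful"
print(sorted(detect_aspects_substring(review, ASPECT_GROUPS)))  # ['management', 'pay']
print(sorted(detect_aspects_word(review, ASPECT_GROUPS)))       # ['management']
```

Word boundaries also stop plural forms such as `"managers"` from matching `"manager"`, so this approach pairs best with the lemmatized columns or with plural forms listed explicitly in the dictionary.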


In [8]:
# Load ABSA model
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(
    checkpoint='english',
    auto_device=True
)

# Take a 50% sample of the minimally cleaned text for ABSA (adjust frac as needed)
texts_pros = df_sample['pros_clean_min'].dropna().sample(frac=0.5, random_state=42).tolist()
texts_cons = df_sample['cons_clean_min'].dropna().sample(frac=0.5, random_state=42).tolist()

# Extract aspects from pros and cons
absa_results_pros = aspect_extractor.extract_aspect(inference_source=texts_pros, print_result=False)
absa_results_cons = aspect_extractor.extract_aspect(inference_source=texts_cons, print_result=False)

# Collect discovered aspects
aspect_counter = Counter()
for result in absa_results_pros + absa_results_cons:
    for asp in result.get("aspect", []):
        aspect_counter[asp.lower()] += 1

# Keep aspects mentioned at least 3 times (listed in order of first appearance, not ranked by frequency)
top_aspects = [asp for asp, count in aspect_counter.items() if count >= 3]
print(f"Top discovered aspects: {top_aspects}")


[2025-05-14 21:59:37] (2.4.1.post1) ********** Available ATEPC model checkpoints for Version:2.4.1.post1 (this version) **********
[2025-05-14 21:59:37] (2.4.1.post1) Downloading checkpoint:english 
[2025-05-14 21:59:37] (2.4.1.post1) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets
[2025-05-14 21:59:37] (2.4.1.post1) Checkpoint already downloaded, skip
[2025-05-14 21:59:37] (2.4.1.post1) Load aspect extractor from checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43
[2025-05-14 21:59:37] (2.4.1.post1) config: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43/fast_lcf_atepc.config
[2025-05-14 21:59:37] (2.4.1.post1) state_dict: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apc

preparing ate inference dataloader: 100%|██████████| 380/380 [00:00<00:00, 3691.11it/s]
extracting aspect terms: 100%|██████████| 12/12 [00:39<00:00,  3.30s/it]
preparing apc inference dataloader: 100%|██████████| 473/473 [00:00<00:00, 2140.42it/s]
classifying aspect sentiments: 100%|██████████| 15/15 [00:53<00:00,  3.59s/it]


[2025-05-14 22:01:13] (2.4.1.post1) The results of aspect term extraction have been saved in /Users/monikakaczorowska/Desktop/glassdoor_reviews/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


preparing ate inference dataloader: 100%|██████████| 380/380 [00:00<00:00, 2594.14it/s]
extracting aspect terms: 100%|██████████| 12/12 [00:41<00:00,  3.48s/it]
preparing apc inference dataloader: 100%|██████████| 374/374 [00:00<00:00, 1437.90it/s]
classifying aspect sentiments: 100%|██████████| 12/12 [00:44<00:00,  3.69s/it]


[2025-05-14 22:02:40] (2.4.1.post1) The results of aspect term extraction have been saved in /Users/monikakaczorowska/Desktop/glassdoor_reviews/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json
Top discovered aspects: ['pay', 'projects', 'environment', 'people', 'organisation', 'company', 'colleagues', 'work', 'management', 'place', 'office', 'atmosphere', 'training', 'team', 'bonus', 'benefits', 'culture', 'employees', 'managers', 'communication', 'hours', 'staff', 'salary', 'manager', 'job', 'development', 'food', 'events', 'location', 'values', 'workload', 'facilities', 'pension', 'time', 'support', 'opportunities', 'learn', 'flexibility', 'teams', 'benefits package', 'perks', 'bonuses', 'networking', 'diversity', 'department', 'service', 'progression', 'it', 'bureaucracy', 'politics', 'processes']


In [None]:
# Combine pros and cons into one string per row
combined_texts = (
    df_sample['pros_clean_min'].fillna('') + ' ' + df_sample['cons_clean_min'].fillna('')
).str.lower()

# Count how many reviews mention each aspect at least once (whole-word match; re.escape guards special characters)
aspect_review_counts = {asp: combined_texts.str.contains(rf'\b{re.escape(asp.lower())}\b').sum() for asp in top_aspects}

# Display sorted
print("\nNumber of Reviews Mentioning Each Auto-Detected Aspect (Unique Mentions Only):")
for asp, count in sorted(aspect_review_counts.items(), key=lambda x: -x[1]):
    print(f"{asp:<15}: {count}")



🧾 Number of Reviews Mentioning Each Auto-Detected Aspect (Unique Mentions Only):
work           : 325
people         : 212
company        : 158
it             : 147
management     : 143
hours          : 122
pay            : 114
opportunities  : 106
staff          : 104
team           : 97
environment    : 90
benefits       : 89
culture        : 86
training       : 78
salary         : 75
progression    : 74
job            : 67
time           : 67
support        : 54
colleagues     : 50
place          : 50
managers       : 50
development    : 50
office         : 45
employees      : 45
organisation   : 32
bonus          : 32
manager        : 30
projects       : 25
teams          : 25
learn          : 24
politics       : 21
processes      : 21
location       : 20
perks          : 19
service        : 19
atmosphere     : 18
events         : 18
flexibility    : 17
values         : 16
communication  : 13
pension        : 13
food           : 12
workload       : 12
bureaucracy    : 12
bonuses  
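The counts above are raw whole-word matches, so generic terms inflate them: "it" (147 reviews) almost certainly matches the pronoun rather than an aspect, and words such as "work" or "company" are too broad to score. A minimal sketch of filtering such terms before building the dictionary is shown below. `GENERIC_TERMS`, `filter_aspects` and `review_mentions` are hypothetical names, and the stoplist is an illustrative assumption to be tuned by inspecting the output. Each term is passed through `re.escape` so that multi-word or punctuated aspects are matched literally.

```python
import re
from collections import Counter

# Hypothetical stoplist of pronouns and overly generic terms (an assumption; tune it on the counts above)
GENERIC_TERMS = {"it", "work", "job", "place", "company", "time"}

def filter_aspects(counter: Counter, min_count: int = 3, stoplist=GENERIC_TERMS) -> list:
    """Keep aspects seen at least `min_count` times that are not in the stoplist, most frequent first."""
    return [asp for asp, n in counter.most_common()
            if n >= min_count and asp not in stoplist]

def review_mentions(texts: list, aspect: str) -> int:
    """Count texts containing `aspect` as a whole word or phrase (escaped, case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(aspect)}\b", re.IGNORECASE)
    return sum(bool(pattern.search(t)) for t in texts if isinstance(t, str))

counts = Counter({"pay": 10, "it": 8, "management": 5, "perks": 2})
print(filter_aspects(counts))  # ['pay', 'management']

texts = ["Great pay and perks", "It is fine", "Management is poor; pay is low"]
print(review_mentions(texts, "pay"))  # 2
```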

## 3. Building a Curated Aspect Dictionary

While PyABSA is useful for aspect discovery, we want **full control** over which aspects are analyzed. To achieve this, we build a **manual aspect dictionary** that maps high-level aspects to keyword variants.

Examples:
- `"pay"`: ["pay", "salary", "wage", "bonus"]
- `"culture"`: ["culture", "environment", "atmosphere"]

We combine:
- Automatically extracted top aspects
- Domain knowledge of what matters to candidates
- Refinements from multiple review iterations

This enhanced dictionary is later used with a classifier to evaluate only these controlled aspects.
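The keyword-to-aspect mapping can be sketched as one whole-word, case-insensitive regex per aspect group. This is an illustrative assumption about how such matching could work, not the notebook's `analyze_all_aspect_groups` (defined earlier); `build_patterns`, `detect_aspects` and the toy `ASPECTS` dictionary are hypothetical.

```python
import re

# Toy dictionary in the same {group: [keyword variants]} shape as the curated one
ASPECTS = {
    "pay": ["pay", "salary", "wage", "bonus"],
    "culture": ["culture", "environment", "atmosphere"],
    "work-life balance": ["work-life balance", "hours", "flexibility"],
}

def build_patterns(aspect_groups: dict) -> dict:
    """Compile one case-insensitive, whole-word regex per aspect group."""
    patterns = {}
    for group, keywords in aspect_groups.items():
        # Deduplicate; longest first so a phrase is preferred over a shorter keyword it contains
        alternatives = sorted({kw.lower() for kw in keywords}, key=len, reverse=True)
        joined = "|".join(re.escape(kw) for kw in alternatives)
        patterns[group] = re.compile(rf"\b(?:{joined})\b", re.IGNORECASE)
    return patterns

def detect_aspects(text, patterns: dict) -> list:
    """Return the aspect groups (in dictionary order) whose keywords occur in `text`."""
    if not isinstance(text, str):  # NaN or missing review text
        return []
    return [group for group, pattern in patterns.items() if pattern.search(text)]

patterns = build_patterns(ASPECTS)
print(detect_aspects("Good salary, but long hours and a toxic environment", patterns))
# ['pay', 'culture', 'work-life balance']
```

Because matching is whole-word, "bonus" does not match "bonuses", which is one reason a dictionary like this needs plural and misspelled variants listed explicitly.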


In [None]:
# 1. Normalise everything to lowercase
existing_keys = {k.lower() for k in aspect_groups.keys()}

# flatten every keyword in the dictionary
existing_keywords = {kw.lower() for kws in aspect_groups.values() for kw in kws}

# 2. Compare
# a) aspects that are missing entirely
missing_new_keys = [asp for asp in top_aspects
                    if asp.lower() not in existing_keys
                    and asp.lower() not in existing_keywords]

# b) aspects that already exist as *keywords* but not as a top‑level key
present_as_keyword = [asp for asp in top_aspects
                      if asp.lower() not in existing_keys    # not a key
                      and asp.lower() in existing_keywords]  # but inside values

print("Candidates to ADD as new keys:")
for asp in missing_new_keys:
    print("  •", asp)

print("\nAlready present as a keyword (maybe promote to key?):")
for asp in present_as_keyword:
    print("  •", asp)

🔍 Candidates to ADD as new keys:
  • projects
  • organisation
  • company
  • work
  • place
  • office
  • employees
  • communication
  • staff
  • manager
  • job
  • food
  • events
  • location
  • workload
  • facilities
  • time
  • opportunities
  • learn
  • teams
  • benefits package
  • bonuses
  • networking
  • department
  • service
  • progression
  • it
  • bureaucracy
  • politics
  • processes

✅ Already present as a keyword (maybe promote to key?):
  • environment
  • colleagues
  • atmosphere
  • training
  • team
  • bonus
  • managers
  • hours
  • salary
  • development
  • values
  • pension
  • support
  • perks
  • diversity


### Combining into the final aspect dictionary

In [13]:
modified_aspect_groups = {
    'perks & benefits': [
        'benefit', 'benefits', 'employee benefits', 'perks', 'health',
        'healthcare', 'insurance', 'pension', 'pension scheme', 'rewards',
        'gym', 'lunch', 'dental', 'benefits package', 'package', 'nhs',
        'stock', 'discount', 'discounts', 'food', 'coffee', 'drink',
        'drinks', 'meal', 'meals'],

    'culture': [
        'culture', 'family', 'environment', 'atmosphere', 'company culture',
        'values', 'enviroment', 'work environment', 'office environment',
        'morale', 'communication', 'communications', 'vision', 'sustainability', 'attitude',
        'bullying', 'interaction', 'gossip', 'dress code'],

    'diversity & inclusion': [
        'diversity', 'inclusion', 'equality', 'equity', 'inclusive',
        'minority', 'belonging'],

    'growth & learning': [
        'growth', 'grow', 'career', 'promotion', 'promotions', 'advancement',
        'development', 'opportunity', 'progress', 'opportunities', 'progression',
        'career progression', 'learn', 'learning', 'knowledge', 'courses', 'academics',
        'academic', 'academy', 'student', 'students', 'teaching', 'university', 'internal movement'],

    'hr & recruitment': [
        'hr', 'recruitment', 'talent', 'staff', 'staff members', 'staffing',
        'policies', 'training', 'onboarding', 'orientation', 'new hire', 'introduction',
        'induction'],

    'job role clarity': [
        'role', 'responsibility', 'job description', 'tasks', 'duties',
        'expectation', 'scope'],

    'job security': [
        'job security', 'security', 'safety', 'layoff', 'firing', 'termination',
        'redundancy', 'stable job', 'employment', 'staff turnover', 'turnover'],

    'management': [
        'directors', 'management', 'managers', 'leader', 'leaders',
        'leadership', 'boss', 'supervisor', 'supervisors', 'ceo', 'bosses',
        'management structure', 'management team', 'middle management',
        'senior management', 'structure', 'bureaucracy', 'bureaucratic',
        'red tape', 'tape', 'politics'],

    'office & location': [
        'office', 'office location', 'office space',
        'offices', 'location', 'locations', 'area', 'workplace',
        'building', 'buildings', 'facilities', 'space', 'room',
        'infrastructure', 'campus', 'canteen', 'parking'],

    'pay': [
        'pay', 'paid', 'money', 'salary', 'compensation', 'wage', 'wages',
        'income', 'bonus', 'bonuses', 'reward', 'salaries', 'sick pay',
        'incentives', 'pay rates', 'commission', 'contract',
        'contracts', 'terms', 'conditions'],

    'people': [
        'people', 'colleagues', 'collegues', 'coworker', 'coworkers', 'community',
        'teammates', 'social', 'crew'],

    'performance feedback': [
        'feedback', 'review', 'appraisal', 'evaluation', 'performance',
        'supervision', 'goals', 'targets', 'kpi'],

    'remote work': [
        'remote', 'work from home', 'wfh', 'home office', 'hybrid', 'telework'],

    'stress & pressure': [
        'stress', 'paced environment', 'pace', 'pressure', 'burnout',
        'demanding', 'high pressure', 'overload', 'long hours', 'stressful'],

    'teamwork & collaboration': [
        'team', 'teams', 'teamwork', 'collaboration', 'supportive',
        'helpful', 'team spirit', 'group'],

    'tools & resources': [
        'tools', 'resources', 'equipment', 'software', 'support', 'access',
        'tech stack', 'technology', 'tech', 'innovation',
        'technologies', 'research'],

    'work-life balance': [
        'work-life balance', 'workload', 'weekends', 'balance', 'working hours',
        'hours', 'flexibility', 'overtime', 'holiday', 'life balance', 'lifestyle',
        'annual leave', 'christmas', 'flexible', 'adjustable schedule', 'shift', 'adaptable',
        'break', 'breaks']
}
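Hand-built dictionaries are easy to get subtly wrong, for example by listing the same keyword twice in one group, or by omitting a comma between two string literals, which makes Python concatenate them silently (`'sustainability' 'attitude'` becomes `'sustainabilityattitude'`). A minimal audit sketch, assuming the same `{group: [keywords]}` shape, is shown below; `audit_aspect_groups` and `unused_keywords` are hypothetical helpers. The second check is a plain substring test against a sample of review texts, so it is only a rough signal.

```python
from collections import Counter, defaultdict

def audit_aspect_groups(aspect_groups: dict) -> dict:
    """Report keywords repeated within a group and keywords shared by several groups."""
    duplicates = {}
    owners = defaultdict(set)
    for group, keywords in aspect_groups.items():
        counts = Counter(kw.lower() for kw in keywords)
        repeated = sorted(kw for kw, n in counts.items() if n > 1)
        if repeated:
            duplicates[group] = repeated
        for kw in counts:
            owners[kw].add(group)
    shared = {kw: sorted(groups) for kw, groups in owners.items() if len(groups) > 1}
    return {"duplicates": duplicates, "shared": shared}

def unused_keywords(aspect_groups: dict, texts) -> list:
    """Keywords that never occur (as a substring) in `texts`; a rough check for typos and fused strings."""
    corpus = " ".join(t.lower() for t in texts if isinstance(t, str))
    return sorted({kw.lower() for kws in aspect_groups.values() for kw in kws
                   if kw.lower() not in corpus})

toy = {
    "tools & resources": ["tools", "technology", "technology", "support"],
    "teamwork & collaboration": ["team", "support"],
}
print(audit_aspect_groups(toy))
# {'duplicates': {'tools & resources': ['technology']}, 'shared': {'support': ['teamwork & collaboration', 'tools & resources']}}
print(unused_keywords({"culture": ["vision", "sustainabilityattitude"]}, ["Clear vision and sustainability"]))
# ['sustainabilityattitude']
```

Shared keywords are not necessarily errors, but with keyword matching a shared keyword will typically trigger both groups for the same mention.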

## 4. Sentiment Classification (ASC) with BERT

To classify sentiment for each detected aspect, we use a **pretrained BERT model** (`nlptown/bert-base-multilingual-uncased-sentiment`). This model:

- Accepts a sentence describing an aspect (prompt-based input)
- Returns a sentiment rating from 1 to 5 stars

We pass each review segment (pro or con) and aspect as a prompt and record the predicted sentiment distribution and label.
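As a sketch of the final step, assume the classifier returns five logits ordered from 1 star to 5 stars (which we believe matches the nlptown model's label order). A softmax then turns them into a star distribution. `stars_from_logits` is a hypothetical helper that uses NumPy in place of the `scipy.special.softmax` imported at the top of the notebook. It produces the `"label"` key and the per-star keys 1 to 5 that `flatten_aspect_sentiments` reads in Section 8. The `"expected"` value (probability-weighted mean star) is an optional extra summary that the notebook does not use elsewhere.

```python
import numpy as np

def stars_from_logits(logits) -> dict:
    """Convert 5-class logits (index 0 = 1 star ... index 4 = 5 stars) into a star distribution.

    Returns {1: p1, ..., 5: p5, 'label': argmax star, 'expected': probability-weighted mean star}.
    """
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())          # subtract the max for numerical stability
    probs /= probs.sum()                 # softmax: probabilities sum to 1
    stars = np.arange(1, 6)
    out = {int(s): float(p) for s, p in zip(stars, probs)}
    out["label"] = int(stars[probs.argmax()])
    out["expected"] = float((stars * probs).sum())
    return out

result = stars_from_logits([-2.0, -1.0, 0.5, 2.0, 1.0])
print(result["label"])  # 4
```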


In [14]:
df_sample['min_pro_aspect_sentiments'] = df_sample['pros_clean_min'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='pos'))
df_sample['min_con_aspect_sentiments'] = df_sample['cons_clean_min'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='neg'))
df_sample['lemma_pro_aspect_sentiments'] = df_sample['pros_clean_lemma_only'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='pos'))
df_sample['lemma_con_aspect_sentiments'] = df_sample['cons_clean_lemma_only'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='neg'))
df_sample['Stopword_pro_aspect_sentiments'] = df_sample['pros_clean_with_stopwords'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='pos'))
df_sample['Stopword_con_aspect_sentiments'] = df_sample['cons_clean_with_stopwords'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='neg'))
df_sample['Symspell_pro_aspect_sentiments'] = df_sample['pros_clean_with_symspell'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='pos'))
df_sample['Symspell_con_aspect_sentiments'] = df_sample['cons_clean_with_symspell'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='neg'))

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

  0%|          | 0/761 [00:00<?, ?it/s]

In [15]:
df_sample.to_csv('comparison_cleans.csv')

In [16]:
# Export only the raw text, the four cleaned variants and their aspect sentiments
# (this overwrites the full export written by the previous cell)
comparison_cols = [
    'firm', 'pros', 'cons',
    'pros_clean_min', 'pros_clean_lemma_only', 'pros_clean_with_stopwords', 'pros_clean_with_symspell',
    'cons_clean_min', 'cons_clean_lemma_only', 'cons_clean_with_stopwords', 'cons_clean_with_symspell',
    'min_pro_aspect_sentiments', 'lemma_pro_aspect_sentiments',
    'Stopword_pro_aspect_sentiments', 'Symspell_pro_aspect_sentiments',
    'min_con_aspect_sentiments', 'lemma_con_aspect_sentiments',
    'Stopword_con_aspect_sentiments', 'Symspell_con_aspect_sentiments',
]
df_sample[comparison_cols].to_csv('comparison_cleans.csv')

## 5. Evaluating Cleaning Pipelines

We compare how each cleaning method performs when used as input to the ABSA pipeline. Metrics include:

- Number of aspects detected
- Coverage of known aspects
- Consistency in predictions

This allows us to select the preprocessing pipeline that preserves important content without introducing noise or omissions.
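The counting cell that follows reports how often each aspect is detected per method. To quantify "consistency in predictions" more directly, here is a minimal sketch. It assumes each variant's output has been parsed into one `{aspect: star}` dict per review, in the same row order. `variant_agreement` is a hypothetical helper.

```python
def variant_agreement(preds_a: list, preds_b: list) -> dict:
    """Compare two cleaning variants review by review.

    preds_a / preds_b: one {aspect: star_label} dict per review, in the same order.
    Returns the mean Jaccard overlap of detected aspect sets and the share of
    aspects detected by both variants that received the same star label.
    """
    jaccards, same, shared = [], 0, 0
    for a, b in zip(preds_a, preds_b):
        union = set(a) | set(b)
        inter = set(a) & set(b)
        # Two empty predictions count as perfect agreement
        jaccards.append(len(inter) / len(union) if union else 1.0)
        shared += len(inter)
        same += sum(a[k] == b[k] for k in inter)
    return {
        "mean_jaccard": sum(jaccards) / len(jaccards) if jaccards else float("nan"),
        "label_agreement": same / shared if shared else float("nan"),
    }

min_preds = [{"pay": 4, "culture": 5}, {"management": 2}]
lemma_preds = [{"pay": 4}, {"management": 1}]
print(variant_agreement(min_preds, lemma_preds))
# {'mean_jaccard': 0.75, 'label_agreement': 0.5}
```

A mean Jaccard close to 1 means two cleaning methods detect nearly the same aspects per review. Label agreement then shows whether cleaning also changes the predicted star.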


In [17]:
# Define clean types and columns
clean_types = ['min', 'lemma', 'Stopword', 'Symspell']
pro_cols = [f'{c}_pro_aspect_sentiments' for c in clean_types]
con_cols = [f'{c}_con_aspect_sentiments' for c in clean_types]

# Collect all aspects seen in the data
all_aspects = sorted({
    aspect
    for col in pro_cols + con_cols
    for entry in df_sample[col].dropna()
    for aspect in entry.keys()  # no need to eval, it's already a dict
})

# Count mentions in pros
pro_counts = defaultdict(dict)
for method in clean_types:
    col = f'{method}_pro_aspect_sentiments'
    for aspect in all_aspects:
        pro_counts[aspect][method] = df_sample[col].dropna().apply(lambda x: aspect in x).sum()

# Count mentions in cons
con_counts = defaultdict(dict)
for method in clean_types:
    col = f'{method}_con_aspect_sentiments'
    for aspect in all_aspects:
        con_counts[aspect][method] = df_sample[col].dropna().apply(lambda x: aspect in x).sum()

# Create DataFrames
pro_df = pd.DataFrame.from_dict(pro_counts, orient='index').fillna(0).astype(int).sort_index()
con_df = pd.DataFrame.from_dict(con_counts, orient='index').fillna(0).astype(int).sort_index()

# Print aligned tables: one row per aspect, one column per cleaning method
print("\nAspect Mentions in PROS:")
print(f"{'Aspect':30}" + "".join([f"{ct:>12}" for ct in clean_types]))
print("-" * (30 + 12 * len(clean_types)))
for aspect, row in pro_df.iterrows():
    print(f"{aspect:30}" + "".join([f"{row[ct]:12}" for ct in clean_types]))

print("\nAspect Mentions in CONS:")
print(f"{'Aspect':30}" + "".join([f"{ct:>12}" for ct in clean_types]))
print("-" * (30 + 12 * len(clean_types)))
for aspect, row in con_df.iterrows():
    print(f"{aspect:30}" + "".join([f"{row[ct]:12}" for ct in clean_types]))



Aspect Mentions in PROS:
Aspect                                 min       lemma    Stopword    Symspell
------------------------------------------------------------------------------
culture                                151         149         149         149
diversity & inclusion                   14          14          14          14
growth & learning                      199         198         198         198
hr & recruitment                       135         132         128          83
job role clarity                        27          28          28          28
job security                            13          13          13          13
management                              81          64          64          64
office & location                       80          72          72          71
pay                                    151         143         143         143
people                                 208         170         170         170
performance feedback      

## 6. Manual Evaluation Against Ground Truth

To measure accuracy, we created a **gold-standard dataset** of 100 reviews:

- Labeled independently by two human annotators
- Includes aspect-sentiment pairs (e.g., "pay: 2", "management: 4")
- Includes a ChatGPT-labeled version for triangulation

We compare model predictions against human annotations for each cleaning method using:

- **Macro / Micro F1**
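- **Exact match**: share of gold (aspect, sentiment) pairs for which the model predicts the same star rating; aspects the model misses count as errors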
- **Aspect precision, recall, and F1** (regardless of sentiment)
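In the evaluation cells below, rows without a gold label are dropped before scoring, so an aspect that the model detects but the annotators did not label never counts as a false positive. A set-based alternative that does count them is sketched here. It assumes aligned lists of `{aspect: star}` dicts per review, for example the output of `parse_human` and `parse_machine`. `aspect_prf` is a hypothetical helper, and its `exact_match` follows the definition above (missed gold aspects count as errors).

```python
def aspect_prf(gold: list, pred: list) -> dict:
    """Pooled aspect-detection precision/recall/F1 plus sentiment exact match.

    gold / pred: one {aspect: star} dict per review, aligned by position.
    Aspects predicted but absent from the gold dict count as false positives.
    """
    tp = fp = fn = exact = 0
    for g, p in zip(gold, pred):
        tp += len(set(g) & set(p))
        fp += len(set(p) - set(g))
        fn += len(set(g) - set(p))
        exact += sum(p.get(a) == s for a, s in g.items())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n_gold = sum(len(g) for g in gold)
    return {"precision": precision, "recall": recall, "f1": f1,
            "exact_match": exact / n_gold if n_gold else 0.0}

gold = [{"pay": 2, "management": 4}, {"culture": 5}]
pred = [{"pay": 2, "people": 4}, {"culture": 4}]
print(aspect_prf(gold, pred))
# precision = recall = f1 = 2/3, exact_match = 1/3
```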



In [18]:
dfManual = pd.read_csv("Manual aspect detection.csv")

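First, we define two parsers: `parse_machine` turns the stored classifier output for each of the four cleaning variants (a dictionary-like string per cell) into `{aspect: star}`, and `parse_human` does the same for the annotators' `aspect:score; ...` strings.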

In [None]:
def parse_machine(cell: str) -> dict[str, int]:
    """
    Turn the JSON-ish string from the model into {aspect: label}.
    Empty/malformed → {}.
    """
    try:
        d = ast.literal_eval(cell)
        return {k: int(v["label"]) for k, v in d.items() if "label" in v}
    except Exception:
        return {}

_hum_re = re.compile(r"\s*([^:;]+)\s*:\s*([1-5])")

def parse_human(cell) -> dict[str, int]:
    """
    Turn 'culture:4; pay:2' into {'culture':4, 'pay':2}.
    Safely skip non-string (NaN) cells.
    """
    if not isinstance(cell, str):
        return {}
    return {m.group(1).strip(): int(m.group(2)) for m in _hum_re.finditer(cell)}



In [20]:
CLEAN_VARIANTS = {
    "min"      : ("min_pro_aspect_sentiments",      "min_con_aspect_sentiments"),
    "lemma"    : ("lemma_pro_aspect_sentiments",    "lemma_con_aspect_sentiments"),
    "stopword" : ("Stopword_pro_aspect_sentiments", "Stopword_con_aspect_sentiments"),
    "symspell" : ("Symspell_pro_aspect_sentiments", "Symspell_con_aspect_sentiments"),
}

In [21]:
def merge_reviews(*dicts):
    combined = {}
    keys = set().union(*[d.keys() for d in dicts])
    for key in keys:
        scores = [d.get(key) for d in dicts if d.get(key) is not None]
        if scores:
            combined[key] = round(sum(scores) / len(scores))
    return combined
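`merge_reviews` averages the available scores and rounds with Python's built-in `round`, which rounds halves to the nearest even integer. A 2/3 disagreement between two annotators therefore becomes 2, while a 3/4 disagreement becomes 4. If a consistent tie-breaking rule is preferred, a round-half-up variant is sketched below. `round_half_up` and `merge_scores_half_up` are hypothetical, and the metrics reported in this notebook were computed with the original `merge_reviews`.

```python
import math

# Python's built-in round() uses round-half-to-even ("banker's rounding")
assert round(2.5) == 2 and round(3.5) == 4

def round_half_up(x: float) -> int:
    """Round .5 upwards for positive scores, e.g. 2.5 -> 3 and 3.5 -> 4."""
    return math.floor(x + 0.5)

def merge_scores_half_up(*dicts: dict) -> dict:
    """Hypothetical variant of merge_reviews: average each aspect's available scores, rounding .5 up."""
    keys = set().union(*dicts)
    merged = {}
    for key in keys:
        scores = [d[key] for d in dicts if d.get(key) is not None]
        if scores:
            merged[key] = round_half_up(sum(scores) / len(scores))
    return merged

print(sorted(merge_scores_half_up({"pay": 2, "culture": 4}, {"pay": 3}, {"culture": 5}).items()))
# [('culture', 5), ('pay', 3)]
```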


We first evaluate against a gold standard that merges the two human annotations with the LLM-generated labels.

In [None]:
records = []

for idx, row in dfManual.iterrows():
    gold_marie_pro = parse_human(row["Marie's pros"])
    gold_marie_con = parse_human(row["Marie's cons"])
    gold_javi_pro  = parse_human(row["Javi's pros"])
    gold_javi_con  = parse_human(row["Javi's cons"])
    gold_chat_pro  = parse_human(row["Chat's pros"])
    gold_chat_con  = parse_human(row["Chat's cons"])

    # Gold standard: merge both human annotators and the LLM (ChatGPT) labels
    gold_pro = merge_reviews(gold_marie_pro, gold_javi_pro, gold_chat_pro)
    gold_con = merge_reviews(gold_marie_con, gold_javi_con, gold_chat_con)

    gold_full = {**gold_pro, **gold_con}          # treat pros & cons together (con score wins if an aspect is in both)

    for variant, (pro_col, con_col) in CLEAN_VARIANTS.items():
        pred_pro  = parse_machine(row[pro_col])
        pred_con  = parse_machine(row[con_col])
        pred_full = {**pred_pro, **pred_con}

        aspects = set(gold_full) | set(pred_full)
        for asp in aspects:
            records.append(
                dict(method=variant,
                     aspect=asp,
                     gold = gold_full.get(asp),     # None if missing
                     pred = pred_full.get(asp))
            )

long = pd.DataFrame.from_records(records)
# drop rows where the gold standard has no label for that aspect (so aspects predicted only by the model are never scored)
long = long[long.gold.notna()]
long["gold_binary"] = 1
long["pred_binary"] = long["pred"].notna().astype(int)



In [None]:
METRICS = {}

for variant, grp in long.groupby("method"):
    y_true = grp["gold"]
    y_pred = grp["pred"].fillna(0)          # 0 = “model didn’t predict this aspect”

    # Score only the five star classes; the placeholder 0 is never a predicted class, so missed aspects lower recall
    labels = [1, 2, 3, 4, 5]

    macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
    micro_f1 = f1_score(y_true, y_pred, labels=labels, average="micro")
    acc      = accuracy_score(y_true, y_pred)

    METRICS[variant] = dict(macro_f1=macro_f1,
                            micro_f1=micro_f1,
                            exact_match=acc,
                            samples=len(grp))

results = (pd.DataFrame(METRICS)
           .T.sort_values("macro_f1", ascending=False)
           .round(3))


In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

aspect_metrics = {}

for variant, grp in long.groupby("method"):
    prec = precision_score(grp["gold_binary"], grp["pred_binary"])
    rec  = recall_score(grp["gold_binary"], grp["pred_binary"])
    f1   = f1_score(grp["gold_binary"], grp["pred_binary"])

    aspect_metrics[variant] = {
        "aspect_precision": round(prec, 3),
        "aspect_recall": round(rec, 3),
        "aspect_f1": round(f1, 3)
    }

aspect_results = pd.DataFrame(aspect_metrics).T


In [None]:
combined_results = results.join(aspect_results)
print("Including LLM")
print(combined_results.sort_values("macro_f1", ascending=False))


including chat
          macro_f1  micro_f1  exact_match  samples  aspect_precision  \
min          0.309     0.388        0.302    559.0               1.0   
lemma        0.276     0.391        0.295    559.0               1.0   
stopword     0.275     0.390        0.293    559.0               1.0   
symspell     0.266     0.377        0.281    559.0               1.0   

          aspect_recall  aspect_f1  
min               0.558      0.716  
lemma             0.510      0.675  
stopword          0.504      0.671  
symspell          0.490      0.658  
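How consistent the LLM labels are can be quantified by comparing them with the human annotations on the aspects both sources labelled. A minimal sketch, assuming aligned lists of `{aspect: star}` dicts such as those returned by `parse_human`; `annotation_agreement` is a hypothetical helper and no results from it are reported in this notebook:

```python
def annotation_agreement(labels_a: list, labels_b: list) -> dict:
    """Agreement between two annotation sources on aspects both of them labelled.

    Returns the number of shared aspects, the share with identical stars,
    and the mean absolute star difference.
    """
    diffs = [abs(a[k] - b[k])
             for a, b in zip(labels_a, labels_b)
             for k in set(a) & set(b)]
    if not diffs:
        return {"shared": 0, "exact_agreement": float("nan"), "mean_abs_diff": float("nan")}
    return {"shared": len(diffs),
            "exact_agreement": sum(d == 0 for d in diffs) / len(diffs),
            "mean_abs_diff": sum(diffs) / len(diffs)}

human = [{"pay": 2, "culture": 4}, {"management": 1}]
llm = [{"pay": 2, "culture": 5}, {"management": 3, "people": 4}]
print(annotation_agreement(human, llm))
# shared = 3, exact_agreement = 1/3, mean_abs_diff = 1.0
```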


We initially included the LLM-generated labels for triangulation and scalability, but found them inconsistent for fine-grained ABSA. In that evaluation, minimal cleaning (min) led on macro F1, exact match, aspect recall and aspect F1, so we now re-run the comparison for all four variants against the human-only gold standard to check whether this ranking holds with more reliable labels.

In [22]:
records = []

for idx, row in dfManual.iterrows():
    gold_marie_pro = parse_human(row["Marie's pros"])
    gold_marie_con = parse_human(row["Marie's cons"])
    gold_javi_pro  = parse_human(row["Javi's pros"])
    gold_javi_con  = parse_human(row["Javi's cons"])

    # Gold standard: human annotators only
    gold_pro = merge_reviews(gold_marie_pro, gold_javi_pro)
    gold_con = merge_reviews(gold_marie_con, gold_javi_con)

    gold_full = {**gold_pro, **gold_con}          # treat pros & cons together (con score wins if an aspect is in both)

    for variant, (pro_col, con_col) in CLEAN_VARIANTS.items():
        pred_pro  = parse_machine(row[pro_col])
        pred_con  = parse_machine(row[con_col])
        pred_full = {**pred_pro, **pred_con}

        aspects = set(gold_full) | set(pred_full)
        for asp in aspects:
            records.append(
                dict(method=variant,
                     aspect=asp,
                     gold = gold_full.get(asp),     # None if missing
                     pred = pred_full.get(asp))
            )

long = pd.DataFrame.from_records(records)
# drop rows where the human annotators had no label for that aspect (so aspects predicted only by the model are never scored)
long = long[long.gold.notna()]
long["gold_binary"] = 1
long["pred_binary"] = long["pred"].notna().astype(int)



In [23]:
METRICS = {}

for variant, grp in long.groupby("method"):
    y_true = grp["gold"]
    y_pred = grp["pred"].fillna(0)          # 0 = “model didn’t predict this aspect”

    # Score only the five star classes; the placeholder 0 is never a predicted class, so missed aspects lower recall
    labels = [1, 2, 3, 4, 5]

    macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
    micro_f1 = f1_score(y_true, y_pred, labels=labels, average="micro")
    acc      = accuracy_score(y_true, y_pred)

    METRICS[variant] = dict(macro_f1=macro_f1,
                            micro_f1=micro_f1,
                            exact_match=acc,
                            samples=len(grp))

results = (pd.DataFrame(METRICS)
           .T.sort_values("macro_f1", ascending=False)
           .round(3))


In [24]:
from sklearn.metrics import precision_score, recall_score, f1_score

aspect_metrics = {}

for variant, grp in long.groupby("method"):
    prec = precision_score(grp["gold_binary"], grp["pred_binary"])
    rec  = recall_score(grp["gold_binary"], grp["pred_binary"])
    f1   = f1_score(grp["gold_binary"], grp["pred_binary"])

    aspect_metrics[variant] = {
        "aspect_precision": round(prec, 3),
        "aspect_recall": round(rec, 3),
        "aspect_f1": round(f1, 3)
    }

aspect_results = pd.DataFrame(aspect_metrics).T


In [None]:
combined_results = results.join(aspect_results)
print("Excluding LLM")
print(combined_results.sort_values("macro_f1", ascending=False))


excluding chat
          macro_f1  micro_f1  exact_match  samples  aspect_precision  \
min          0.345     0.443        0.365    444.0               1.0   
lemma        0.331     0.443        0.351    444.0               1.0   
stopword     0.300     0.415        0.329    444.0               1.0   
symspell     0.300     0.405        0.318    444.0               1.0   

          aspect_recall  aspect_f1  
min               0.646      0.785  
lemma             0.588      0.740  
stopword          0.586      0.739  
symspell          0.568      0.724  


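**Result:** Minimal cleaning (min) performs best overall. It has the highest macro F1, exact match, aspect recall and aspect F1 both against the human-only gold standard and against the combined human + LLM gold standard. Micro F1 is the exception: min ties with lemma (0.443) on the human-only labels and trails lemma and stopword slightly (0.388 vs 0.391 and 0.390) when the LLM labels are included. Aspect precision is 1.0 for every variant by construction. Rows without a gold label are dropped before scoring, so aspects that the model detects but the annotators did not label never count as false positives. Aspect recall is therefore the informative detection metric here.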


## 7. Apply Best Pipeline to Full Dataset

We apply the best ABSA configuration:
- **Minimal cleaning**
- **Manual aspect dictionary**
- **BERT-based sentiment scoring**

For each review:
- `pros` and `cons` are analyzed separately
- All matching aspects are assigned a 1–5 star rating
- Probabilities for each star level are retained

We flatten the output into long-format rows, one per (review, aspect, source), each tagged with its firm, to prepare for aggregation.


In [30]:
dfFinal = pd.read_csv('filtered_england_recent.csv')

In [31]:
for col in ['pros', 'cons']:
    print(f"Processing column: {col}")
    dfFinal[f'{col}_clean_min'] = dfFinal[col].progress_apply(clean_min)

Processing column: pros


  0%|          | 0/76126 [00:00<?, ?it/s]

Processing column: cons


  0%|          | 0/76126 [00:00<?, ?it/s]

In [32]:
dfFinal.to_csv("preprocessed_england_dataset.csv", index=False)

### Applying aspect groups

In [None]:
dfFinal['min_pro_aspect_sentiments'] = dfFinal['pros_clean_min'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='pos'))
dfFinal['min_con_aspect_sentiments'] = dfFinal['cons_clean_min'].progress_apply(lambda x: analyze_all_aspect_groups(x, modified_aspect_groups, tone='neg'))

  0%|          | 0/76126 [00:00<?, ?it/s]

## 8. Company-Level Aggregation

Using the full set of aspect sentiments, we compute:

- **Average star rating** per (firm, aspect)
- **Number of mentions** per (firm, aspect)

We output two matrices:
- **Long format**: for analysis and visualizations
- **Wide format**: for use in modeling and recommendation (e.g., similarity matching)

These outputs are saved and used downstream for aspect-based EDA, modelling, and summarization.

In [None]:
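def flatten_aspect_sentiments(row, source):
    """Turn one review's {aspect: scores} dict into long-format rows."""
    results = []
    firm = row.get('firm') or row.get('company')  # fallback if there is no 'firm' column
    sentiments = row.get(source, {})
    if not isinstance(sentiments, dict):  # e.g. a missing value
        return results
    for aspect, scores in sentiments.items():
        # Predicted star rating (1–5) plus the probability for each star level
        rating = scores.get("label")
        all_stars = {i: scores.get(i, 0.0) for i in range(1, 6)}

        results.append({
            "firm": firm,
            "aspect": aspect,
            # 'min_pro_aspect_sentiments' -> 'pro', 'min_con_aspect_sentiments' -> 'con' (Python 3.9+)
            "source": source.removeprefix('min_').removesuffix('_aspect_sentiments'),
            "star_rating": rating,
            **all_stars
        })
    return results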

In [None]:
# Flatten pro and con aspect sentiments for every review
flattened = []
for _, row in dfFinal.iterrows():
    flattened += flatten_aspect_sentiments(row, 'min_pro_aspect_sentiments')
    flattened += flatten_aspect_sentiments(row, 'min_con_aspect_sentiments')

flat_df = pd.DataFrame(flattened)

# Aggregate star ratings and count mentions per company/aspect
company_aspect_matrix = flat_df.groupby(['firm', 'aspect'])['star_rating'].agg(
    avg_star_rating='mean',
    n_mentions='count'
).reset_index()
company_aspect_pivot = company_aspect_matrix.pivot(
    index='firm',
    columns='aspect',
    values='avg_star_rating'
)
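The shape of the two outputs can be illustrated on toy data. The sketch below repeats the same `groupby`/`agg`/`pivot` steps and, as an optional extra, masks averages that rest on very few mentions. `MIN_MENTIONS` is a hypothetical threshold that the notebook does not apply.

```python
import pandas as pd

# Toy long-format rows: one per (review, aspect, source), as produced by flatten_aspect_sentiments
flat = pd.DataFrame({
    "firm":        ["A", "A", "A", "B", "B"],
    "aspect":      ["pay", "pay", "culture", "pay", "culture"],
    "star_rating": [4, 2, 5, 1, 3],
})

# Long format: mean star and number of mentions per (firm, aspect)
long_df = (flat.groupby(["firm", "aspect"])["star_rating"]
               .agg(avg_star_rating="mean", n_mentions="count")
               .reset_index())

# Wide format: one row per firm, one column per aspect
avg = long_df.pivot(index="firm", columns="aspect", values="avg_star_rating")
counts = long_df.pivot(index="firm", columns="aspect", values="n_mentions")

# Hypothetical reliability threshold: hide averages based on fewer than MIN_MENTIONS ratings
MIN_MENTIONS = 2
wide = avg.where(counts >= MIN_MENTIONS)
print(wide)
```

When writing such a wide table with `to_csv`, keeping the default `index=True` preserves the `firm` labels, which live in the index after `pivot`.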

### Exporting results

In [None]:
# Export: long format with counts, and wide format keeping the firm index
company_aspect_matrix.to_csv('Final_company_aspect_matrix_with_counts.csv', index=False)
company_aspect_pivot.to_csv('Final_company_aspect_matrix_wide.csv')


# Summary

This notebook implemented and evaluated an ABSA pipeline for extracting fine-grained aspect-level sentiments from employee reviews.

Key components included:
- Testing four text cleaning strategies
- Using PyABSA for automatic aspect discovery
- Building a custom aspect dictionary
- Applying a BERT classifier for sentiment scoring
- Manually evaluating performance against human labels

The output is a structured, interpretable dataset capturing how each company is perceived across a set of predefined candidate-relevant aspects. This forms the foundation for recommendation, exploration, and summarization in the following modules.
