# Subtheme Sentiment Analysis 

# Problem statement
The Subtheme Sentiment Analysis task requires a method that identifies the subthemes mentioned in customer reviews and the sentiment attached to each one. The approach below combines natural language processing (NLP) techniques and machine learning to achieve this.

In [1]:
# Import Pandas library
import pandas as pd




In [3]:
# Load Dataset
df = pd.read_csv(r"C:\Users\subha\Downloads\Evaluation-dataset.csv")
df.head()

Unnamed: 0,"Tires where delivered to the garage of my choice,the garage notified me when they had been delivered. A day and time was arranged with the garage and I went and had them fitted,a Hassel free experience.",garage service positive,ease of booking positive,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14
0,"Easy Tyre Selection Process, Competitive Prici...",garage service positive,value for money positive,,,,,,,,,,,,
1,Very easy to use and good value for money.,value for money positive,,,,,,,,,,,,,
2,Really easy and convenient to arrange,ease of booking positive,,,,,,,,,,,,,
3,It was so easy to select tyre sizes and arrang...,location positive,value for money positive,ease of booking positive,,,,,,,,,,,
4,service was excellent. Only slight downside wa...,length of fitting positive,ease of booking positive,ease of booking negative,,,,,,,,,,,


In [4]:
# Load the single-column version of the dataset (review text and labels combined into one 'Combined' column)
df = pd.read_csv(r"C:\Users\subha\Downloads\Evaluation-dataset-single-column.csv")
df

Unnamed: 0,Combined
0,"Easy Tyre Selection Process, Competitive Prici..."
1,Very easy to use and good value for money. val...
2,Really easy and convenient to arrange ease of ...
3,It was so easy to select tyre sizes and arrang...
4,service was excellent. Only slight downside wa...
...,...
10126,"I ordered the wrong tyres, however [REDACTED] ..."
10127,"Good experience, first time I have used [REDAC..."
10128,"I ordered the tyre I needed on line, booked a ..."
10129,Excellent service from point of order to fitti...
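The single-column file loaded above was prepared outside this notebook. A minimal sketch of how the `Combined` column could be built from the wide file is shown below. It assumes the original CSV has no header row (the first review appears as the column names in the `In [3]` output, which suggests it does not), and it uses two rows from that output as inline sample data. The helper name `combine_row` is hypothetical.

```python
import io

import pandas as pd

# Two rows copied from the In [3] output, in the wide layout:
# review text first, then one label per column, empty cells after that.
raw = io.StringIO(
    '"Very easy to use and good value for money.",value for money positive,\n'
    '"Really easy and convenient to arrange",ease of booking positive,\n'
)

# header=None keeps the first review as data instead of using it as column names.
wide = pd.read_csv(raw, header=None)


def combine_row(row):
    """Join every non-empty cell of a row into one space-separated string."""
    return " ".join(str(value) for value in row if pd.notna(value))


combined = wide.apply(combine_row, axis=1)
single = pd.DataFrame({"Combined": combined})
print(single)
```

Note that joining text and labels this way means every later step sees the label words as part of the review.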


In [5]:
df.head()

Unnamed: 0,Combined
0,"Easy Tyre Selection Process, Competitive Prici..."
1,Very easy to use and good value for money. val...
2,Really easy and convenient to arrange ease of ...
3,It was so easy to select tyre sizes and arrang...
4,service was excellent. Only slight downside wa...



In [7]:
import re
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

In [9]:
# Data Preprocessing
def clean_text(text):
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = text.lower()  # Convert to lowercase
    text = re.sub(r'\d+', '', text)  # Remove numbers
    return text


In [10]:
df['cleaned'] = df['Combined'].apply(clean_text)

In [11]:
# Tokenization and Lemmatization
nltk.download('punkt')
nltk.download('wordnet')
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
df['tokens'] = df['cleaned'].apply(lambda x: [lemmatizer.lemmatize(word) for word in word_tokenize(x)])


[nltk_data] Error loading punkt: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
[nltk_data] Error loading wordnet: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>


In [12]:
# Subtheme Extraction (Example using simple keyword matching)
subthemes = ['garage service', 'wait time', 'incorrect tyres']
def extract_subthemes(text):
    found_themes = []
    for theme in subthemes:
        if theme in text:
            found_themes.append(theme)
    return found_themes

df['subthemes'] = df['cleaned'].apply(extract_subthemes)


In [13]:
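# Sentiment Analysis
# Note: VADER needs its lexicon; run nltk.download('vader_lexicon') once if it is missing.
sid = SentimentIntensityAnalyzer()
def get_sentiment(text):
    # Map VADER's compound score (-1 to 1) to a label using the conventional +/-0.05 cut-offs
    scores = sid.polarity_scores(text)
    if scores['compound'] >= 0.05:
        return 'positive'
    elif scores['compound'] <= -0.05:
        return 'negative'
    else:
        return 'neutral'

# Caveat: 'Combined' also contains the appended label text (e.g. "value for money positive"),
# so the label words themselves feed into the score.
df['sentiment'] = df['Combined'].apply(get_sentiment)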


In [16]:
# Combine Subthemes and Sentiments
def combine_subtheme_sentiment(row):
    subtheme_sentiments = {}
    for subtheme in row['subthemes']:
        subtheme_sentiments[subtheme] = row['sentiment']
    return subtheme_sentiments

df['subtheme_sentiment'] = df.apply(combine_subtheme_sentiment, axis=1)

# Evaluation (Placeholder, depends on labeled data)
# Assuming df['true_subthemes'] and df['true_sentiments'] are the true labels
# y_true = df[['true_subthemes', 'true_sentiments']]
# y_pred = df[['subthemes', 'sentiment']]
# print(classification_report(y_true, y_pred))

# Output result
df[['Combined', 'subtheme_sentiment']].head()


Unnamed: 0,Combined,subtheme_sentiment
0,"Easy Tyre Selection Process, Competitive Prici...",{'garage service': 'positive'}
1,Very easy to use and good value for money. val...,{}
2,Really easy and convenient to arrange ease of ...,{}
3,It was so easy to select tyre sizes and arrang...,{}
4,service was excellent. Only slight downside wa...,{}
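Rows 1 to 4 come back empty because their labelled subthemes (for example `value for money` and `ease of booking`) are not in the hard-coded `subthemes` list. One possible fix is to derive the list from the label columns of the wide dataset. The sketch below assumes every label has the form `<subtheme> <sentiment>`, as in the `In [3]` output, and uses the labels of rows 1 to 4 from that output as inline data. `split_label` is a hypothetical helper.

```python
from collections import Counter

# Labels of rows 1-4, copied from the In [3] output.
label_rows = [
    ["value for money positive"],
    ["ease of booking positive"],
    ["location positive", "value for money positive", "ease of booking positive"],
    ["length of fitting positive", "ease of booking positive", "ease of booking negative"],
]


def split_label(label):
    """Split 'ease of booking positive' into ('ease of booking', 'positive')."""
    subtheme, sentiment = label.strip().rsplit(" ", 1)
    return subtheme, sentiment


# Count how often each subtheme occurs, ignoring its sentiment.
subtheme_counts = Counter(
    split_label(label)[0] for row in label_rows for label in row
)
subthemes = sorted(subtheme_counts)
print(subthemes)
print(subtheme_counts.most_common(1))
```

The resulting `subthemes` list could replace the three hand-picked entries used in the keyword-matching cell.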


# Approach

## Data Preprocessing

- Clean the text: remove unnecessary characters, punctuation, and stop words (common words that carry little meaning).
- Tokenization: split the text into individual words.
- Normalization: reduce words to a base form with lemmatization or stemming.
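The cleaning and tokenization steps can be sketched without any downloaded resources, which is useful given that the NLTK downloads failed in this notebook. The stop-word set below is a small hand-written assumption, not NLTK's list, and `preprocess` is a hypothetical helper. Lemmatization is left out.

```python
import re

# Tiny illustrative stop-word set (an assumption; a real list would be much longer).
# Negations such as "not" are deliberately kept because they change the sentiment.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "was", "it", "is", "for", "so", "very"}


def preprocess(text):
    """Lowercase, strip punctuation and digits, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space, so "choice,the" stays two words
    text = re.sub(r"\d+", " ", text)      # remove numbers
    return [token for token in text.split() if token not in STOP_WORDS]


tokens = preprocess("Very easy to use and good value for money.")
print(tokens)
```

One design caveat: dropping stop words before subtheme matching would break phrases such as `value for money`, so phrase matching would need to run on the text before this step.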

## Subtheme and Sentiment Identification

### Subtheme Extraction

- Use a pretrained language model (such as BERT or GPT) to pick out the key topics in the text.
- Alternatively, match the text against a list of known topics.

### Sentiment Analysis

- Apply a tool such as VADER or TextBlob, or train a custom model, to decide whether the context around each topic is positive or negative.
- Fine-tune the model on domain-specific examples to improve its accuracy.

## Combining Results

- For each review, identify the topics and pair each one with its sentiment (positive or negative).
- Make sure that problem-related topics (for example, incorrect tyres) are correctly identified as negative.
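The notebook code assigns the sentiment of the whole review to every subtheme it finds, so a review cannot carry one positive and one negative subtheme. A possible refinement is to score each clause separately. The sketch below is an illustration under stated assumptions: `TOY_LEXICON` and `toy_score` are a stand-in scorer invented for this example, the keyword lists are hypothetical, and the sample review is made up.

```python
import re

# Stand-in polarity lexicon (hypothetical); swap in a real scorer in practice.
TOY_LEXICON = {"excellent": 1.0, "easy": 0.5, "good": 0.5,
               "slow": -0.5, "difficult": -0.5, "poor": -1.0}


def toy_score(text):
    """Sum the lexicon values of the words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)


def to_label(score, threshold=0.05):
    """Map a numeric score to positive / negative / neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def subtheme_sentiments(review, subtheme_keywords, score_fn=toy_score):
    """Score each clause on its own and attach the label to the subthemes it mentions."""
    parts = re.split(r"[.!?;]|\bbut\b|\bhowever\b", review, flags=re.IGNORECASE)
    clauses = [part.strip() for part in parts if part.strip()]
    result = {}
    for clause in clauses:
        lowered = clause.lower()
        for subtheme, keywords in subtheme_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                # A later clause overwrites an earlier one for the same subtheme.
                result[subtheme] = to_label(score_fn(clause))
    return result


keywords = {"garage service": ["garage"], "ease of booking": ["booking"]}
review = "The garage service was excellent, but booking a slot was difficult."
pairs = subtheme_sentiments(review, keywords)
print(pairs)
```

To use VADER instead of the toy scorer, `score_fn=lambda text: sid.polarity_scores(text)["compound"]` should work with the `sid` object created earlier.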

## Evaluation

- Measure how well topic and sentiment identification work using precision (the share of predicted labels that are correct), recall (the share of true labels that are found), and F1-score (the harmonic mean of the two).
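Because each review can carry several (subtheme, sentiment) pairs, one reasonable way to score predictions is micro-averaged precision, recall, and F1 over sets of pairs. The sketch below is hand-rolled rather than taken from scikit-learn. `micro_prf` is a hypothetical helper and the predictions are invented purely to show the arithmetic. The gold pairs come from rows 0 and 2 of the `In [3]` output.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over per-review sets of pairs."""
    true_pos = false_pos = false_neg = 0
    for gold_pairs, predicted_pairs in zip(gold, predicted):
        true_pos += len(gold_pairs & predicted_pairs)   # predicted and correct
        false_pos += len(predicted_pairs - gold_pairs)  # predicted but wrong
        false_neg += len(gold_pairs - predicted_pairs)  # missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


gold = [
    {("garage service", "positive"), ("value for money", "positive")},
    {("ease of booking", "positive")},
]
# Invented predictions: one correct pair, one missed pair, one wrong sentiment.
predicted = [
    {("garage service", "positive")},
    {("ease of booking", "negative")},
]

precision, recall, f1 = micro_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one of two predicted pairs is correct (precision 0.50) and one of three gold pairs is recovered (recall 0.33), which gives an F1 of 0.40.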

## Improvements and Iteration

- Keep training the model on more labelled examples.
- Use more sophisticated techniques that capture how the words in a sentence relate to each other.

## Explanation and Motivation

- The method starts with basic text cleaning and uses existing tools for quick results.
- It’s easy to understand and adjust.

## Improvements

- Upgrade to more advanced language models for deeper understanding.
- Train specialized models for identifying topics and sentiments.
- Use parsing techniques to better grasp the structure and meaning of sentences.

## Possible Problems

- Simple keyword matching can miss synonyms and context-specific phrasings of a topic.
- Off-the-shelf sentiment tools may need tuning to work well on domain-specific text.
- Sentences whose literal wording differs from their meaning, such as sarcasm or double negatives, are hard to classify.
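The first problem, keyword matching that misses related wording, can be reduced by mapping each subtheme to several surface forms. The following is a minimal sketch that reuses the three subthemes from the keyword-matching cell. The regular expressions are illustrative guesses, not patterns derived from the dataset, and the sample sentence is made up.

```python
import re

# Each subtheme maps to a pattern covering a few variants (all hypothetical).
SUBTHEME_PATTERNS = {
    "garage service": re.compile(r"\b(garage|fitter|fitting centre)s?\b"),
    "wait time": re.compile(r"\b(wait(ed|ing)?|delay(ed|s)?|queue)\b"),
    "incorrect tyres": re.compile(r"\b(wrong|incorrect)\s+(tyre|tire)s?\b"),
}


def extract_subthemes(text):
    """Return every subtheme whose pattern occurs in the lowercased text."""
    lowered = text.lower()
    return [
        subtheme
        for subtheme, pattern in SUBTHEME_PATTERNS.items()
        if pattern.search(lowered)
    ]


found = extract_subthemes("I ordered the wrong tyres and waited an hour at the garage.")
print(found)
```

The word boundaries (`\b`) keep a short keyword from matching inside a longer, unrelated word, which plain substring checks such as `theme in text` cannot guarantee.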
