<a href="https://colab.research.google.com/github/nutyfreshz/MADT8101_Customer_Analytics/blob/main/EP_14_Topics_Modeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Voice of Customer**

Voice of Customer (VoC) analytics converts qualitative and quantitative customer feedback, drawn from sources such as surveys, customer support cases, reviews, and social media comments, into categorized datasets for analysis. Combining multiple data sources gives a fuller picture of customer touchpoints along their journey.

## **Benefits of Voice of Customer**

* **Understanding Customer Feedback**

Customer feedback analysis helps businesses identify communication, marketing, and product/service gaps, enabling informed decisions to enhance customer satisfaction.

* **Increasing Customer Loyalty**

Customer experience strongly influences customer loyalty, accounting for two-thirds of its driving factors according to a Gartner study. Businesses can enhance loyalty by delivering exceptional experiences that address pain points and positively impact customers' lives.

* **Product Research and Development**

Voice of Customer feedback provides valuable insights into a brand's products and services. Analyzing customer feedback helps understand pain points, inspire new ideas for development, and address customer needs, fostering innovation and customer satisfaction.

* **Crisis Management**

Real-time monitoring of Voice of Customer insights is invaluable during crises: it helps businesses anticipate emerging issues and address them swiftly, demonstrating strong crisis management.

* **Enhancing Brand Reputation**

A customer-centric company that values feedback improves its reputation and attracts new customers, boosting market share.

* **Identifying Customer Pain Points**

Voice of Customer insights help businesses identify customer pain points, leading to targeted solutions for improved satisfaction and experience.


## **Topic Modeling with LDA**

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique in natural language processing and machine learning. It is a probabilistic model that treats each document as a mixture of topics and each topic as a distribution over words, and it uses this structure to identify the underlying topics in a collection of text documents. The process involves preprocessing the text, creating a Document-Term Matrix, applying LDA, and interpreting the results.
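
To make the Document-Term Matrix step concrete, here is a minimal sketch in plain Python. The toy documents and the helper name `build_document_term_matrix` are illustrative assumptions, not part of the workshop data; the workshop below uses gensim for this step.

```python
from collections import Counter

# Toy tokenized documents (hypothetical, not taken from the review dataset).
docs = [
    ["sweet", "berry", "taste", "sweet"],
    ["relaxing", "happy", "taste"],
    ["berry", "relaxing", "relaxing"],
]

# Vocabulary: every distinct token, sorted so the column order is stable.
vocab = sorted({token for doc in docs for token in doc})


def build_document_term_matrix(docs, vocab):
    """Return one row per document holding the count of each vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


dtm = build_document_term_matrix(docs, vocab)
print(vocab)
for row in dtm:
    print(row)
```

Each row is one document and each column is one vocabulary term. LDA works from counts like these rather than from the raw text.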

# **WORKSHOP: Topic Modeling with LDA**

In [None]:
!pip install pyLDAvis

Collecting pyLDAvis
  Downloading pyLDAvis-3.4.1-py3-none-any.whl (2.6 MB)
Collecting numpy>=1.24.2 (from pyLDAvis)
  Downloading numpy-1.25.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Collecting pandas>=2.0.0 (from pyLDAvis)
  Downloading pandas-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting funcy (from pyLDAvis)
  Downloading funcy-2.0-py2.py3-none-any.whl (30 kB)
Collecting tzdata>=2022.1 (from pandas>=2.0.0->pyLDAvis)
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)

In [None]:
import pandas as pd
import gensim
from gensim import corpora
from gensim.models import LdaModel
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import pyLDAvis.gensim_models  # pyLDAvis 3.3+ renamed pyLDAvis.gensim to pyLDAvis.gensim_models
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning, module="ipykernel\\.*")

pyLDAvis.enable_notebook()

In [None]:
!gdown --id 1e6Iu3QTW_o1sjVUFpSP_jd1fFRTKqHx-

Downloading...
From: https://drive.google.com/uc?id=1e6Iu3QTW_o1sjVUFpSP_jd1fFRTKqHx-
To: /content/cannabis-reviews - blue_dream.csv
100% 16.8k/16.8k [00:00<00:00, 51.8MB/s]


In [None]:
path = '/content/cannabis-reviews - blue_dream.csv'

df = pd.read_csv(path)

df

Unnamed: 0,review_id,review_context
0,1001,"Friends, stoners, red-eyed countrymen, lend me..."
1,1002,I consider this strain perfect for my psycholo...
2,1003,TL:DR - This is easily the best strain I have ...
3,1004,"Finally, I've gotten to try Blue Dream. This i..."
4,1005,Recently I've come to grips with having a frus...
5,1006,this bud will have you soaring! I've been smok...
6,1007,This strain was a staple in Bay Area weed cult...
7,1008,Super common strain over here on the east coas...
8,1009,Lineage: Super Silver Haze x Blueberry Blue Dr...
9,1010,Today's go-to for a blast and a boost! Enjoyed...


In [None]:
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

stop_words = set(stopwords.words('english'))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [None]:
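lemmatizer = WordNetLemmatizer()  # create once and reuse for every review

def preprocess_text(text):
    """Tokenize a review, drop stop words and non-alphabetic tokens, then lemmatize."""
    # Lowercase before tokenizing so stop-word matching is case-insensitive
    words = word_tokenize(text.lower())

    # Keep purely alphabetic tokens that are not stop words
    words = [word for word in words if word.isalpha() and word not in stop_words]

    # Reduce each word to its dictionary form (default part of speech: noun)
    words = [lemmatizer.lemmatize(word) for word in words]

    return words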
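
The filtering steps in `preprocess_text` can be tried without downloading any NLTK data using the rough stand-in below. It is only an approximation: the regex tokenizer and the tiny `TOY_STOP_WORDS` set are assumptions made for illustration, it skips lemmatization, and it will not match NLTK's output exactly (contractions such as "I've" are split differently).

```python
import re

# Tiny hand-picked stop-word set standing in for NLTK's English list (assumption).
TOY_STOP_WORDS = {"i", "the", "this", "is", "a", "to", "have", "and", "it", "in"}


def toy_preprocess(text, stop_words=TOY_STOP_WORDS):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


tokens = toy_preprocess("This is easily the BEST strain I have tried in 2023!")
print(tokens)
```

Numbers and punctuation disappear, and only content-bearing words remain. The same thing happens to each review before the topic model sees it.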

In [None]:
df['processed_review'] = df['review_context'].apply(preprocess_text)

df

Unnamed: 0,review_id,review_context,processed_review
0,1001,"Friends, stoners, red-eyed countrymen, lend me...","[friend, stoner, countryman, lend, ear, bring,..."
1,1002,I consider this strain perfect for my psycholo...,"[consider, strain, perfect, psychological, pro..."
2,1003,TL:DR - This is easily the best strain I have ...,"[tl, dr, easily, best, strain, ever, enjoyed, ..."
3,1004,"Finally, I've gotten to try Blue Dream. This i...","[finally, gotten, try, blue, dream, highly, to..."
4,1005,Recently I've come to grips with having a frus...,"[recently, come, grip, frustratingly, high, to..."
5,1006,this bud will have you soaring! I've been smok...,"[bud, soaring, smoking, regularly, year, got, ..."
6,1007,This strain was a staple in Bay Area weed cult...,"[strain, staple, bay, area, weed, culture, ca,..."
7,1008,Super common strain over here on the east coas...,"[super, common, strain, east, coast, lot, stra..."
8,1009,Lineage: Super Silver Haze x Blueberry Blue Dr...,"[lineage, super, silver, haze, x, blueberry, b..."
9,1010,Today's go-to for a blast and a boost! Enjoyed...,"[today, blast, boost, enjoyed, one, coffee, go..."


In [None]:
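# Map each unique token to an integer id
dictionary = corpora.Dictionary(df['processed_review'])

# Convert each review to a bag of words: a list of (token_id, count) pairs
doc_term_matrix = [dictionary.doc2bow(doc) for doc in df['processed_review']]

# Build the LDA model
num_topics = 3  # Adjust this parameter based on your analysis requirements
# random_state fixes the seed so topic numbering stays stable across runs
lda_model = LdaModel(doc_term_matrix, num_topics=num_topics, id2word=dictionary, passes=50, random_state=42)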

In [None]:
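def get_dominant_topic(lda_model, doc_term_matrix):
    """Return the 0-based id of the most probable topic for each document."""
    dominant_topics = []
    for doc_topics in lda_model.get_document_topics(doc_term_matrix):
        # doc_topics is a list of (topic_id, probability) pairs
        topic_id, _ = max(doc_topics, key=lambda pair: pair[1])
        dominant_topics.append(topic_id)
    return dominant_topics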
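
The selection rule used by `get_dominant_topic` can be illustrated on hand-written topic distributions. This is a sketch under stated assumptions: the sample distributions, the helper `dominant_topic_with_prob`, and the `MIN_CONFIDENCE` cut-off are all hypothetical. It shows one possible refinement, which is to keep the winning probability and leave a review unlabeled when no topic clearly dominates.

```python
def dominant_topic_with_prob(doc_topics):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(doc_topics, key=lambda pair: pair[1])


# Hypothetical per-document distributions as (topic_id, probability) pairs,
# the same shape gensim returns for each document.
sample_distributions = [
    [(0, 0.10), (1, 0.85), (2, 0.05)],
    [(0, 0.40), (2, 0.60)],
    [(0, 0.34), (1, 0.33), (2, 0.33)],
]

MIN_CONFIDENCE = 0.5  # hypothetical cut-off, tune for your data

results = []
for doc_topics in sample_distributions:
    topic_id, prob = dominant_topic_with_prob(doc_topics)
    # Shift to a 1-based label, or None when no topic clearly dominates
    label = topic_id + 1 if prob >= MIN_CONFIDENCE else None
    results.append((label, prob))

print(results)
```

The third document is spread almost evenly across the three topics, so a plain `max` would still assign it to topic 1. The threshold leaves it unlabeled instead.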

In [None]:
df['dominant_topic'] = get_dominant_topic(lda_model, doc_term_matrix)
# Shift gensim's 0-based topic ids to 1-based labels
df['dominant_topic'] = df['dominant_topic'] + 1

df

Unnamed: 0,review_id,review_context,processed_review,dominant_topic
0,1001,"Friends, stoners, red-eyed countrymen, lend me...","[friend, stoner, countryman, lend, ear, bring,...",2
1,1002,I consider this strain perfect for my psycholo...,"[consider, strain, perfect, psychological, pro...",3
2,1003,TL:DR - This is easily the best strain I have ...,"[tl, dr, easily, best, strain, ever, enjoyed, ...",1
3,1004,"Finally, I've gotten to try Blue Dream. This i...","[finally, gotten, try, blue, dream, highly, to...",3
4,1005,Recently I've come to grips with having a frus...,"[recently, come, grip, frustratingly, high, to...",3
5,1006,this bud will have you soaring! I've been smok...,"[bud, soaring, smoking, regularly, year, got, ...",1
6,1007,This strain was a staple in Bay Area weed cult...,"[strain, staple, bay, area, weed, culture, ca,...",3
7,1008,Super common strain over here on the east coas...,"[super, common, strain, east, coast, lot, stra...",2
8,1009,Lineage: Super Silver Haze x Blueberry Blue Dr...,"[lineage, super, silver, haze, x, blueberry, b...",2
9,1010,Today's go-to for a blast and a boost! Enjoyed...,"[today, blast, boost, enjoyed, one, coffee, go...",3


In [None]:
df.to_csv('10_review_topics_modeling.csv', index=False)

In [None]:
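import pyLDAvis.gensim_models  # pyLDAvis 3.3+ renamed pyLDAvis.gensim to pyLDAvis.gensim_models
pyLDAvis.gensim_models.prepare(lda_model, doc_term_matrix, dictionary, sort_topics=False)  # keep the model's topic order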

In [None]:
topic_mapping = {
    1: "Experience of User",
    2: "Strains comparison",
    3: "Taste of Buds"
}

df['dominant_topic_trans'] = df['dominant_topic'].map(topic_mapping)

In [None]:
df

Unnamed: 0,review_id,review_context,processed_review,dominant_topic,dominant_topic_trans
0,1001,"Friends, stoners, red-eyed countrymen, lend me...","[friend, stoner, countryman, lend, ear, bring,...",2,Strains comparison
1,1002,I consider this strain perfect for my psycholo...,"[consider, strain, perfect, psychological, pro...",3,Taste of Buds
2,1003,TL:DR - This is easily the best strain I have ...,"[tl, dr, easily, best, strain, ever, enjoyed, ...",1,Experience of User
3,1004,"Finally, I've gotten to try Blue Dream. This i...","[finally, gotten, try, blue, dream, highly, to...",3,Taste of Buds
4,1005,Recently I've come to grips with having a frus...,"[recently, come, grip, frustratingly, high, to...",3,Taste of Buds
5,1006,this bud will have you soaring! I've been smok...,"[bud, soaring, smoking, regularly, year, got, ...",1,Experience of User
6,1007,This strain was a staple in Bay Area weed cult...,"[strain, staple, bay, area, weed, culture, ca,...",3,Taste of Buds
7,1008,Super common strain over here on the east coas...,"[super, common, strain, east, coast, lot, stra...",2,Strains comparison
8,1009,Lineage: Super Silver Haze x Blueberry Blue Dr...,"[lineage, super, silver, haze, x, blueberry, b...",2,Strains comparison
9,1010,Today's go-to for a blast and a boost! Enjoyed...,"[today, blast, boost, enjoyed, one, coffee, go...",3,Taste of Buds
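
A natural follow-up for Voice of Customer reporting is to summarize how many reviews fall under each labeled topic. The sketch below uses only the standard library and copies the `dominant_topic` values from the table above. With the DataFrame at hand, calling `value_counts()` on the `dominant_topic_trans` column would give the same counts.

```python
from collections import Counter

topic_mapping = {
    1: "Experience of User",
    2: "Strains comparison",
    3: "Taste of Buds",
}

# dominant_topic values for the ten reviews, copied from the table above
dominant_topics = [2, 3, 1, 3, 3, 1, 3, 2, 2, 3]

counts = Counter(topic_mapping[t] for t in dominant_topics)
total = len(dominant_topics)

# Print the topics from most to least frequent with their share of reviews
for label, n in counts.most_common():
    print(f"{label}: {n} reviews ({n / total:.0%})")
```

With only ten reviews these shares are illustrative rather than statistically meaningful.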
