<a href="https://colab.research.google.com/github/muhammedmusa16/code/blob/main/Topicmodeling_of_mtnapp_customer_review.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Topic Modeling 

**Simple topic modeling with two algorithms: Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA).**

Text is loaded from a Google Sheet, and topics are then derived with both NMF and LDA. The only inputs are a few form fields:

* the name of the Google Sheet
* the number of topics to generate
* the number of top words and top documents to print for each topic





In [None]:
#@title Install pyLDAvis (specific version for Google Colab)
!pip install pyLDAvis==2.1.2

Collecting pyLDAvis==2.1.2
  Downloading pyLDAvis-2.1.2.tar.gz (1.6 MB)
[?25l[K     |▏                               | 10 kB 17.0 MB/s eta 0:00:01[K     |▍                               | 20 kB 22.6 MB/s eta 0:00:01[K     |▋                               | 30 kB 25.3 MB/s eta 0:00:01[K     |▉                               | 40 kB 11.6 MB/s eta 0:00:01[K     |█                               | 51 kB 13.5 MB/s eta 0:00:01[K     |█▏                              | 61 kB 15.6 MB/s eta 0:00:01[K     |█▍                              | 71 kB 14.9 MB/s eta 0:00:01[K     |█▋                              | 81 kB 15.6 MB/s eta 0:00:01[K     |█▉                              | 92 kB 16.8 MB/s eta 0:00:01[K     |██                              | 102 kB 15.3 MB/s eta 0:00:01[K     |██▎                             | 112 kB 15.3 MB/s eta 0:00:01[K     |██▍                             | 122 kB 15.3 MB/s eta 0:00:01[K     |██▋                             | 133 kB 15.3 MB/s eta 0:

In [None]:
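# Authenticate this Colab session with the user's Google account.
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default

# Use the session's default credentials to build an authorized gspread client.
creds, _ = default()
gc = gspread.authorize(creds)

# Note: this creates an empty spreadsheet that the rest of the notebook never reads.
sh = gc.create('A new spreadsheet')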

In [None]:
#@title Install gspread, authenticate and load data from a Google Sheet
!pip install --upgrade -q gspread

from google.colab import auth
auth.authenticate_user()

import gspread
# Credentials come from google.auth.default() in the authentication cell above,
# so the deprecated oauth2client import is not needed here.

# Default data from
# http://web.eecs.utk.edu/~berry/order/node4.html#SECTION00022000000000000000

googlesheet_filename = 'my data' #@param {type:"string"}
data_rows_to_preview = 10 #@param {type:"integer"}


In [None]:
#@title Load and preview data from a Google Sheet

# `gc` is the authorized gspread client created in the authentication cell above.

worksheet = gc.open(googlesheet_filename).sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()

# Collect the second column (the review text), skipping the header row.
documents = [row[1] for row in rows[1:]]

# Convert to a DataFrame and render.
import pandas as pd
dataset_df = pd.DataFrame.from_records(rows)
dataset_df.head(n=data_rows_to_preview)


   0      1
0  score  content
1  3      Excellent
2  1      Sorry to say but the old app was far better th...
3  5      Interesting
4  1      Very useless app
5  2      this app is a step back ward the previous app ...
6  2      Former app is better. I can't find the data de...
7  1      Very very very very poor app, the toggle butto...
8  1      The older app was navigated better.
9  3      Always logging me out automatically
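
The loading cell reads the review text from a fixed column position (`row[1]`). As a minimal sketch, assuming the sheet has a header row with a `content` column as in the preview, the column could instead be located by name and blank reviews skipped. The helper name `rows_to_documents` and the sample rows are hypothetical and only for illustration.

```python
def rows_to_documents(rows, text_column="content"):
    """Return the non-empty texts from `text_column`; the first row is the header."""
    if not rows:
        return []
    header, *body = rows
    col = header.index(text_column)  # raises ValueError if the column is missing
    # Keep only rows that are long enough and whose text is not blank.
    return [row[col].strip() for row in body if len(row) > col and row[col].strip()]


# Hypothetical rows in the shape returned by worksheet.get_all_values().
sample_rows = [
    ["score", "content"],
    ["3", "Excellent"],
    ["1", "Very useless app"],
    ["2", "   "],
]
documents_sketch = rows_to_documents(sample_rows)
print(documents_sketch)
```

Looking the column up by name keeps the cell working if columns are reordered in the sheet, at the cost of failing loudly when the header is renamed.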




---






In [None]:
#@title Set topic modeling algorithm arguments

no_topics = 3 #@param {type:"integer"}

no_top_words = 4 #@param {type:"integer"}

no_top_documents = 3 #@param {type:"integer"}

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation
import numpy as np

In [None]:
#@title Run NMF

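def display_topics(H, W, feature_names, documents, no_top_words, no_top_documents):
    """Print the top words (with weights) and top documents for each topic.

    H is the topic-word matrix (topics x terms); W is the document-topic
    matrix (documents x topics).
    """
    for topic_idx, topic in enumerate(H):
        print("Topic %d:" % (topic_idx))
        # Indices of the highest-weighted terms, largest first.
        print(" ".join([ (feature_names[i] + " (" + str(topic[i].round(2)) + ")")
          for i in topic.argsort()[:-no_top_words - 1:-1]]))
        # Indices of the documents that load most heavily on this topic.
        top_doc_indices = np.argsort( W[:,topic_idx] )[::-1][0:no_top_documents]
        for doc_index in top_doc_indices:
            print(str(doc_index) + ". " + documents[doc_index])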

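# NMF can work with tf-idf weighted terms
tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(documents)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it (1.0+).
tfidf_feature_names = tfidf_vectorizer.get_feature_names_out()

# Run NMF
# Note: the `alpha` argument was removed in scikit-learn 1.2 in favour of `alpha_W`/`alpha_H`;
# this call keeps the older API that produced the output shown below.
nmf_model = NMF(n_components=no_topics, random_state=1, alpha=.1, l1_ratio=.5, init='nndsvd').fit(tfidf)
nmf_W = nmf_model.transform(tfidf)
nmf_H = nmf_model.components_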

print("NMF Topics")
display_topics(nmf_H, nmf_W, tfidf_feature_names, documents, no_top_words, no_top_documents)
print("--------------")



NMF Topics
Topic 0:
good (5.12) app (0.12) use (0.07) application (0.07)
760. Good
55. Good
1057. Good
Topic 1:
nice (4.34) app (0.15) experience (0.04) easy (0.02)
430. Nice one
1264. Very nice 👍
1116. Nice
Topic 2:
app (4.1) better (0.88) old (0.87) useless (0.75)
1149. My favourite app
578. Ineed your app
217. Sweet app.
--------------
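
`display_topics` selects the top words with the slice `topic.argsort()[:-no_top_words - 1:-1]` and the top documents with `np.argsort(...)[::-1][0:no_top_documents]`. The following small NumPy sketch shows both idioms on made-up weights; the arrays and vocabulary are illustrative assumptions, not values from the fitted model.

```python
import numpy as np

# Hypothetical weights of one topic over a five-word vocabulary.
vocab = ["app", "good", "use", "bad", "application"]
weights = np.array([0.12, 5.12, 0.07, 0.01, 0.05])

n_top = 3
# argsort() is ascending, so a negative-step slice from the end yields
# the indices of the n_top largest weights, largest first.
top_word_idx = weights.argsort()[:-n_top - 1:-1]
top_words = [vocab[i] for i in top_word_idx]
print(top_words)

# Hypothetical document-topic matrix: 3 documents x 2 topics.
W = np.array([[0.0, 0.9],
              [0.7, 0.1],
              [0.2, 0.3]])
# Same idea for documents: sort one topic's column, reverse it, keep the first n.
top_doc_idx = np.argsort(W[:, 0])[::-1][0:2]
print(top_doc_idx)
```

Both forms rank from largest to smallest; they are just two spellings of the same descending sort followed by a truncation.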




In [None]:
#@title Visualise NMF with pyLDAvis

import pyLDAvis.sklearn

pyLDAvis.enable_notebook()

#pyLDAvis_data = pyLDAvis.sklearn.prepare(nmf_model, tfidf, tfidf_vectorizer)
# Visualization can be displayed in the notebook
#pyLDAvis.display(pyLDAvis_data)



In [None]:
#@title Run LDA

# LDA takes raw term counts rather than tf-idf weights because it is a probabilistic graphical model
tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
tf = tf_vectorizer.fit_transform(documents)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it (1.0+).
tf_feature_names = tf_vectorizer.get_feature_names_out()

# Run LDA
lda_model = LatentDirichletAllocation(
    n_components=no_topics,
    max_iter=5,
    learning_method='online',
    learning_offset=50.,
    random_state=0,
).fit(tf)
lda_W = lda_model.transform(tf)
lda_H = lda_model.components_

print("LDA Topics")
display_topics(lda_H, lda_W, tf_feature_names, documents, no_top_words, no_top_documents)



LDA Topics
Topic 0:
good (211.52) app (75.2) mtn (59.27) great (50.61)
66. There's something the developers need to work on this app. No where one can edit. Paraventure one make a mistake when filling details on the app. You know when u first download the app. The ask of email. If one make mistake. There's isn't where to edit it. Gone to check on the app. It's not responding. MTN should provide a platform where one can make correction. Like edit profile. GLO NETWORK AS IT. MTN SHOULD FOLLOW SUIT.
245. Feel the best my MTN NG sim card App ever for keeping save profile this MTN NG App is the best ever up MTN NG
156. The last app show the las previos call make which it has been used to track the person who stole my line but now is not included mtn need to work on this thak you mtn you are the best
Topic 1:
app (1004.64) old (214.58) data (212.19) better (161.29)
865. Your rent upgrade is a disaster, I wish I could get my formal version. For weeks now I can't buy data, I can't share data w
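
The numbers printed next to each LDA word are entries of `lda_model.components_`, which the scikit-learn documentation describes as pseudo-counts rather than probabilities, so they are not on the same scale as the NMF weights. A minimal sketch of normalising each row into a topic-word distribution follows; the small `components` array is made up and merely stands in for `lda_H`.

```python
import numpy as np

# Hypothetical pseudo-counts: 2 topics x 3 words (stand-in for lda_model.components_).
components = np.array([[6.0, 3.0, 1.0],
                       [1.0, 1.0, 2.0]])

# Divide each row by its sum so every topic becomes a distribution over words.
topic_word = components / components.sum(axis=1, keepdims=True)
print(topic_word.round(2))
```

After normalisation each row sums to 1, which makes word weights comparable across topics of very different sizes.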

In [None]:
#@title Visualise LDA with pyLDAvis

import pyLDAvis.sklearn

pyLDAvis.enable_notebook()

pyLDAvis_data = pyLDAvis.sklearn.prepare(lda_model, tf, tf_vectorizer)
# Visualization can be displayed in the notebook
pyLDAvis.display(pyLDAvis_data)

