# Topic Modeling Demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EEVMxtS_8f47gWhrGnoTfHUYryO98JXG?usp=sharing)

In [16]:
texts_daily = [
    "Just made coffee and starting work. Mondays are tough.",
    "Gym was packed today, but I finally hit my step goal.",
    "Cooking pasta for dinner and watching Netflix.",
    "Had a meeting with my manager about the new project deadlines.",
    "I need to buy groceries before the weekend.",
    "Traffic was bad this morning but I found a great podcast.",
    "Met my friend for brunch downtown — amazing pancakes!",
    "Finally cleaned my apartment, feels so much better now.",
    "My cat knocked over a plant again while I was on a call.",
    "Booked tickets for next month’s concert. Can’t wait!"
]

In [15]:
texts_disaster = [
    "Made coffee and checked emails before work.",
    "Went for a quick run by the river this morning.",
    "Cooking lunch and listening to music.",
    "Just started reading a new mystery novel.",
    "Suddenly felt the floor shaking — is this an earthquake?",
    "People are running outside; the power just went out.",
    "Trying to reach family but cell service is down.",
    "Aftershock just hit, buildings are damaged everywhere.",
    "Emergency crews are setting up shelters near the park.",
    "It’s quiet now, but everyone is still scared to go back inside."
]

In [26]:
tweets_movie_feelings = [
    "Watched *Inside Out* again and cried at the part where Bing Bong fades away. Pixar always finds that perfect mix of joy and heartbreak.",
    "Rewatched *Interstellar* and it hit differently this time — that final scene between Cooper and Murph just shattered me.",
    "The soundtrack in *La La Land* makes me both happy and sad every time. It’s the perfect film for people who’ve loved and lost while chasing dreams.",
    "Horror movies like *Hereditary* don’t scare me because of the gore — it’s that slow dread that seeps into your bones long after it’s over.",
    "Watched *The Grand Budapest Hotel* again and it instantly lifted my mood. The colors, the symmetry — it’s like visual serotonin.",
    "The ending of *The Green Mile* always leaves me speechless. Sad movies about kindness hurt the most.",
    "I find comfort in rewatching *The Secret Life of Walter Mitty*. It’s about the quiet courage to actually live, not just dream.",
    "That rainy scene in *Eternal Sunshine of the Spotless Mind* gets me every single time — heartbreak in slow motion.",
    "Action movies like *Mad Max: Fury Road* make me feel so alive — pure adrenaline, chaos, and precision in motion.",
    "Animated films like *Up* remind me that joy and grief are never far apart — those first ten minutes say more about love than most romances ever do."
]


In [32]:
tweets_books = [
    "Reading *1984* again and it feels even more relevant now — language as control, truth as illusion. It’s terrifying how timeless it is.",
    "Finished *The Night Circus* and I can’t stop thinking about the imagery — everything feels like it’s dipped in candlelight and magic.",
    "*The Great Gatsby* hits different as an adult. It’s less about parties and more about loneliness hiding under all that glitter.",
    "Just started *Beloved* by Toni Morrison. The writing is heavy and lyrical, like every sentence carries history’s weight.",
    "Reading *The Alchemist* again reminds me how simple the pursuit of meaning can be — a boy, a dream, and the desert as his teacher.",
    "Halfway through *The Handmaid’s Tale* and I keep pausing — the quiet moments are the most horrifying.",
    "*Norwegian Wood* feels like reading a memory. Everything is muted and drenched in melancholy.",
    "Rereading *To Kill a Mockingbird* and realizing how deeply it shaped how I think about empathy and justice.",
    "Got lost in *Circe* last night. I love how Madeline Miller turns myth into something deeply human — power, exile, and identity.",
    "I just finished *Project Hail Mary* and felt this weird mix of awe and sadness. Science and loneliness, but with hope at the end."
]


In [17]:
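import numpy as np
import plotly.graph_objects as go
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def top_words_for_topic(topic_idx, topn=10):
    """Return the topn highest-weight terms and their weights for one topic.

    Relies on the globals `lda` and `terms`, which are defined in the next cell.
    """
    comp = lda.components_[topic_idx]
    ids = np.argsort(comp)[::-1][:topn]  # indices sorted by descending weight
    return terms[ids], comp[ids]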

In [18]:
vec = CountVectorizer(lowercase=True, stop_words="english", min_df=1, max_df=0.95)
X = vec.fit_transform(texts_disaster)

n_topics = 5
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda.fit(X)

terms = np.array(vec.get_feature_names_out())
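
The fitted model also assigns each document a mixture over topics, which is often easier to read than the topic-word weights alone. The following is a minimal sketch, assuming a small stand-in corpus; `sample_texts`, `sample_vec`, `sample_lda`, and `doc_topic` are illustrative names that do not appear elsewhere in this notebook.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini-corpus, used only to keep this sketch self-contained
sample_texts = [
    "Made coffee and checked emails before work.",
    "Cooking lunch and listening to music.",
    "Suddenly felt the floor shaking, is this an earthquake?",
    "Aftershock just hit, buildings are damaged everywhere.",
]

sample_vec = CountVectorizer(lowercase=True, stop_words="english")
sample_X = sample_vec.fit_transform(sample_texts)

sample_lda = LatentDirichletAllocation(n_components=2, random_state=0)
# fit_transform returns one row per document; each row is a topic mixture
doc_topic = sample_lda.fit_transform(sample_X)

# Index of the highest-weight topic for each document
dominant_topic = doc_topic.argmax(axis=1)

for text, topic_id, weights in zip(sample_texts, dominant_topic, doc_topic):
    print(f"Topic {topic_id} {np.round(weights, 2)} {text}")
```

With a corpus this small the assignments are unstable, so the printed mixtures are better read as a sanity check than as a result.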

In [19]:
# Interactive viewer
fig = go.Figure()
tt, ss = top_words_for_topic(0, topn=12)
fig.add_trace(go.Bar(x=tt, y=ss, name="Topic 0"))

buttons = []
for t in range(n_topics):
    tt, ss = top_words_for_topic(t, topn=12)
    buttons.append(dict(
        label=f"Topic {t}",
        method="update",
        args=[{"x":[tt], "y":[ss]},
              {"title":f"Top Words — Topic {t}"}]
    ))

fig.update_layout(
    title="LDA Topics (texts_disaster): select topic",
    updatemenus=[dict(type="dropdown", buttons=buttons, x=0.3, y=1.2, xanchor="left")],
    yaxis_title="Topic-word weight"
)
fig.show()

## LDA Visualization

In [2]:
import warnings
warnings.filterwarnings("ignore")

# Install pyLDAvis before importing it, in case it is not already available
!pip install pyLDAvis

from gensim import corpora
from gensim.models.ldamodel import LdaModel
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis

In [3]:
pyLDAvis.enable_notebook()

In [27]:
dataset = tweets_movie_feelings

In [28]:
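# Whitespace tokenization. Note that isalpha() drops any token with attached
# punctuation (e.g. "again." or "*Inside"), and stop words are kept.
texts_tokenized = [
    [word.lower() for word in doc.split() if word.isalpha()]
    for doc in dataset
]

# Map each token to an integer id, then encode each document as (id, count) pairs
dictionary = corpora.Dictionary(texts_tokenized)
corpus = [dictionary.doc2bow(text) for text in texts_tokenized]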


In [29]:
lda_model = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=5,       # change to explore
    random_state=0,
    passes=15,
    alpha='auto',
    per_word_topics=True
)

In [30]:
print("\nTop words per topic:\n")
for idx, topic in lda_model.print_topics(-1):
    print(f"Topic {idx}: {topic}")


Top words per topic:

Topic 0: 0.061*"the" + 0.042*"and" + 0.032*"of" + 0.022*"about" + 0.022*"like" + 0.022*"watched" + 0.022*"again" + 0.022*"always" + 0.022*"me" + 0.022*"movies"
Topic 1: 0.024*"those" + 0.024*"say" + 0.024*"apart" + 0.024*"than" + 0.024*"me" + 0.024*"romances" + 0.024*"are" + 0.024*"joy" + 0.024*"films" + 0.024*"ten"
Topic 2: 0.055*"in" + 0.030*"scene" + 0.030*"every" + 0.030*"time" + 0.030*"slow" + 0.030*"sunshine" + 0.030*"spotless" + 0.030*"single" + 0.030*"gets" + 0.030*"rainy"
Topic 3: 0.055*"and" + 0.030*"that" + 0.030*"time" + 0.030*"between" + 0.030*"this" + 0.030*"scene" + 0.030*"differently" + 0.030*"shattered" + 0.030*"final" + 0.030*"murph"
Topic 4: 0.050*"the" + 0.034*"and" + 0.034*"me" + 0.034*"that" + 0.019*"while" + 0.019*"makes" + 0.019*"chasing" + 0.019*"loved" + 0.019*"happy" + 0.019*"la"
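
The topics above are dominated by function words such as "the" and "and" because the tokenizer keeps stop words, and `word.isalpha()` discards every token that has punctuation attached. One possible alternative is sketched below using only the standard library. `STOP_WORDS` and `tokenize` are hypothetical names, and the stop list is deliberately tiny; a real run would likely use a fuller list.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {"a", "an", "and", "at", "in", "it", "me", "of",
              "that", "the", "this", "where"}

def tokenize(doc):
    """Lowercase, extract alphabetic runs, drop stop words and 1-letter tokens."""
    words = re.findall(r"[a-z]+", doc.lower())
    return [w for w in words if len(w) > 1 and w not in STOP_WORDS]

sample = "Watched *Inside Out* again and cried at the part where Bing Bong fades away."

# Tokenization as used in the cell above: punctuated tokens are lost
naive_tokens = [w.lower() for w in sample.split() if w.isalpha()]

# Regex-based tokenization: title words and sentence-final words survive
clean_tokens = tokenize(sample)

print(naive_tokens)
print(clean_tokens)
```

Passing `[tokenize(doc) for doc in dataset]` to `corpora.Dictionary` in place of `texts_tokenized` should produce topics built from content words, although the exact weights will differ from the listing above.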


In [31]:
vis = gensimvis.prepare(lda_model, corpus, dictionary)
vis