In [1]:
!pip install bertopic sentence-transformers

import pandas as pd
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

# Load dataset
df = pd.read_csv("/content/drive/MyDrive/Dessertation/processed_bluesky_data.csv", encoding="latin1")
df['processed_text'] = df['text'].fillna("")

# Drop posts of 20 characters or fewer: very short posts carry too little context to form coherent topics
df = df[df['processed_text'].str.len() > 20]

# Initialize BERTopic model
topic_model = BERTopic(language="english", calculate_probabilities=True, verbose=True)

# Fit BERTopic model on social media text
topics, probs = topic_model.fit_transform(df['processed_text'].tolist())

# Get topic summary; the first row, Topic -1, collects outlier posts that were not assigned to any topic
print("\n🔹 Top 5 Topics Found:")
print(topic_model.get_topic_info().head())

# Save topic model
topic_model.save("bertopic_model")

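# Visualize topics interactively (in Jupyter/Colab); only a cell's last expression is
# displayed automatically, so call .show() on the returned Plotly figure
topic_model.visualize_topics().show()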

# Get top words for a specific topic
print("\n🔹 Words for Topic 0:")
print(topic_model.get_topic(0))

# Extract representative example posts per topic (Representative_Docs column)
topics_info = topic_model.get_topic_info()
print("\n🔹 Representative posts:\n", topics_info[["Topic", "Representative_Docs"]].head())


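In the summary printed below, the `-1` row (3949 posts) is BERTopic's outlier topic: posts the clustering step did not assign to any topic, so its label is not a real theme. A small helper that reports each topic's share of the corpus makes the size of that bucket explicit. `topic_shares` is a hypothetical helper, not part of BERTopic, and it assumes it is given the `topics` list returned by `fit_transform`.

```python
from collections import Counter


def topic_shares(topics):
    """Hypothetical helper: map each topic id to its fraction of all documents.

    `topics` is a non-empty list of topic ids, such as the one returned by
    BERTopic.fit_transform; -1 marks outlier documents.
    """
    counts = Counter(topics)
    total = len(topics)
    # Largest topics first
    return {topic_id: n / total for topic_id, n in counts.most_common()}


# Toy example: 2 of 5 documents are outliers
shares = topic_shares([-1, 0, -1, 1, 0])
print(shares)
```

In the notebook, `topic_shares(topics)[-1]` would give the outlier fraction of the Bluesky posts. If that fraction is large, BERTopic's `reduce_outliers` method (available in recent versions) can reassign outlier documents to existing topics.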
Collecting bertopic
  Downloading bertopic-0.17.0-py3-none-any.whl.metadata (23 kB)


2025-03-26 10:06:53,524 - BERTopic - Embedding - Transforming documents to embeddings.

2025-03-26 10:11:51,395 - BERTopic - Embedding - Completed ✓
2025-03-26 10:11:51,397 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2025-03-26 10:12:31,429 - BERTopic - Dimensionality - Completed ✓
2025-03-26 10:12:31,431 - BERTopic - Cluster - Start clustering the reduced embeddings
2025-03-26 10:12:45,191 - BERTopic - Cluster - Completed ✓
2025-03-26 10:12:45,215 - BERTopic - Representation - Fine-tuning topics using representation models.
2025-03-26 10:12:45,754 - BERTopic - Representation - Completed ✓
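The log above traces BERTopic's pipeline: embed the posts, reduce the embedding dimensions, cluster them, then build a word-based representation for each topic. That last step weights words with a class-based TF-IDF (c-TF-IDF), in which all posts of one topic are treated as a single document. The sketch below is a simplified NumPy version of the weighting given in the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf is normalised within each topic, f(t) is the frequency of term t across all topics, and A is the average number of words per topic. It is not BERTopic's own `ClassTfidfTransformer`, and the vocabulary and counts are toy numbers.

```python
import numpy as np


def class_tfidf(counts):
    """Simplified c-TF-IDF.

    counts[c, t] = number of times term t occurs across all documents of
    topic c. Assumes every term occurs at least once (f(t) > 0).
    """
    counts = np.asarray(counts, dtype=float)
    # Term frequency, L1-normalised within each topic (row)
    tf = counts / counts.sum(axis=1, keepdims=True)
    # f(t): frequency of each term summed over all topics
    f = counts.sum(axis=0)
    # A: average number of words per topic
    A = counts.sum(axis=1).mean()
    idf = np.log(1 + A / f)
    return tf * idf


# Toy vocabulary and counts for two topics
vocab = ["the", "riots", "racism"]
counts = [[10, 5, 0],   # topic 0
          [10, 0, 5]]   # topic 1
weights = class_tfidf(counts)
for c, row in enumerate(weights):
    ranked = [vocab[i] for i in np.argsort(row)[::-1]]
    print(f"topic {c}:", ranked)
```

Although "the" is twice as frequent as "riots" in topic 0, the topic-specific word scores higher because "the" is spread across every topic. The penalty is only partial, though: "the" still outranks words absent from a topic, which is consistent with stop words surviving in broad topics such as `-1_the_to_riots_uk`.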



🔹 Top 5 Topics Found:
   Topic  Count                                        Name  \
0     -1   3949                          -1_the_to_riots_uk   
1      0    863             0_antiracism_white_racism_black   
2      1    514                            1_he_him_hes_his   
3      2    254  2_canadiangreens_liberalparty_ndp_oligarch   
4      3    125    3_rich_ripoffuk_money_peoplebeforeprofit   

                                      Representation  \
0    [the, to, riots, uk, and, of, in, is, that, it]   
1  [antiracism, white, racism, black, antiracist,...   
2  [he, him, hes, his, up, wont, turn, hell, will...   
3  [canadiangreens, liberalparty, ndp, oligarch, ...   
4  [rich, ripoffuk, money, peoplebeforeprofit, ti...   

                                 Representative_Docs  
0  [So long as enough of the good people are here...  
1  [Happy Juneteenth! Non Black people who got th...  
2  [We as a country need to remind him that we wi...  
3  [Meanwhile, in the trenches OF THE OLI