
# Topic Extraction Experiment
This notebook demonstrates multiple methods for extracting the top 5 discussion topics with 5 different models from 4 transcripts ( generated by different models using OpenAI Whisper from transcriptioin_demo.ipynb).

## Decision : KeyBERT 
- Balance between semantic quality, resource usage, and task fit
- Topics Extracted: ['starships', 'terraforming', 'scales', 'plan', 'synchronization']


Methods Choices
1. TF-IDF - fast but not ideal with conversational fillers
2. spaCy - fast and a little better with conversational fillers than TF-IDF
3. KeyBERT - good balance between quality and resources needed
4. DistilBERT + clustering - good semantic meaning but the output type doesn't fit this task
5. SentenceTransformer + clustering - good semantic meaning but the output type doesn't fit this task


### Step1 : Env Setup

In [55]:
import time
import psutil
import os
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
import spacy


In [None]:
def get_memory_usage_mb():
    """Return current memory usage (MB) of this process."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / (1024 * 1024)

### Step2 : Load Transcript from transciption_demo.ipynb

In [None]:
# transcripts detected by different models using OpenAI Whisper
transcripts = {
    "tiny": "So let's take it past the point where you have the scales, you have reusable ship. Yeah. You've got it dialed in, then what are the steps? What's next step after that? Is it an unmanned voyage tomorrow's first? I'm Anne Fliedamores. The Earth and Mars will be synchronized every two years. Or every 26 months, technically. So the next orbital synchronization is November of next year. So, and you can launch plus minus a month roughly. So we'd have to launch in November or December of next year. And so the default plan is to launch hopefully several starships tomorrow at the end of next year. And what would they be doing? Well, at first we're just going to try to land on Mars and see if we succeed in landing. Do we do we succeed in landing? Like let's say we were able to send five ships to all five land intact or do we add some creators to Mars? If we add some creators, we've got to be put more courses about setting people. You know, we need to. So we're going to make sure the thing lands safely. How does it land on Mars? With on rocket thrusters. So it'll just land. We'll add legs. Okay. I'll just land and have legs in the air. So it'll be remote controlled from Earth. Or just autonomous autonomous completely. Mars is you can't remote control things from both because Mars. Mars is far. Yeah, it's too far. It's beautiful light constraints. So Mars at closest approaches roughly four light minutes. And when it's on the other side of the sun, it's about 12 light minutes. So you know, round trip would be like 40 minutes. Press case. If Mars is on the other side of the sun. So once you do that, then how long do you think before you start sending people up there? What we're going to try to go as fast as possible. You can think this is really. Erase against time. Can we make Mars self sufficient before. Civilization has some sort of. Future folk in the road where there's either like a war, nuclear war or something or a. We'll get hit by a meteor. Or simply, civilization might just die with a whimper in adult diapers instead of with a bang. I think we can do this and. Well, I don't know at least I think we've built within 15 earthmoss. Inconciliation event or you know, so it's like 30. Years. If we have an exponential increase in. If if every year, if every two years, we have like a major increase in. Number of people in town is tomorrow's. I think as a rough approximation, we need about a million tons of the so far as maybe a million people. That kind of thing to actually have a civilization. Yeah, that that would you terraform like what would you do you would eventually. Terrible moss at first people would live in some kind of protected environment like domes and underground. Kind of thing. Charafoing would take too long. I mean, we're at this point in time. Where. In the for the first time in the four and a half billion year history of earth. It is possible to extend consciousness beyond our home planet. And that window may be open for a long time or maybe open for a short time. I hope it's open for a long time. But it might only be open for a short time. And we just make sure that we extend the light of consciousness tomorrow. Before some. Before civilization. Either extinguishes or subsides. You know, all that and all that. Any's evidence that the technology level of Mars drops below. Or technology level of earth drops below what is necessary to send spaceships to Mars. So. If there's some really destructive war or like said some natural cataclysm. Or or simply the birth rate is so low that. You know, we we just like said die in adult diapers with a one per. That's one of the possible outcomes for a lot of countries ahead of that way. So Japan is right Japan. Yeah. Yeah. I mean, at at dangerously. Yeah, at current birth rates in three generations. Korea will be about 4% of its current size. That's insane. Yeah, maybe maybe even less than that. There are only at one third of placement rates. If you have three generations that one that's your one twenty-seventh. Of your current population, which is three percentish. Jesus Christ. Yeah. Basically population collapse happens fast. So. And seems to be accelerating in most possible. So basically, I mean, for my same one, I'm like. This is the first time it's been possible to extend life. You extend consciousness beyond earth. Maybe that went over you open for a long time. But I might only be open for a short time. We should make sure that we make life multi-planetary and make consciousness multi-planetary while it's possible.",
    "base": "So let's let's take it past the point where you have these scales you have a reusable ship Yeah, and you've you've got it dialed in then what are the steps? What what's next step after that is it an unmanned Voyage to Mars first. I'm an flight of Mars the Earth and Mars Orbit synchronize every two years or every 26 months technically so The next orbital synchronization is November of next year So and you can launch plus minus a month roughly so we'd have to launch in November or December of next year so the default plan is to launch hopefully several Starships to Mars at the end of next year And what would they be doing? Well at first we're just gonna try to land on Mars and see if we succeed in landing Do we succeed in landing like let's say we were able to send five ships do all five land intact or do we? Add some craters to Mars If we add some craters we've got to be But more cautious about sending people, you know, and we need to So we're gonna make sure the thing lands Safely how does it land on Mars with on rocket's restors? So it'll just land. Oh, well add legs. Okay. Yeah, we'll just land and have legs and yeah, so It'll be remote controlled from Earth Or just autonomous autonomous completely Mars is you can't remote control things from both because Mars. Yeah, it's too far speed of light you have speed of light constraints so Mars at closest approaches roughly four light minutes and When it's on the other side of the Sun it's it's about 12 light minutes, so you know round trip would be like 40 minutes best case if Mars is on the other side of the Sun So once you do that then how long do you think before you start sending people up there? Well, we're gonna try to go as fast as possible. You can think this is really Erase against time can we make Mars self-sufficient before Civilization has some sort of Future fork in the road where there's either like a war or nuclear war or something or a We'll get hit by a meteor Or or simply civilization might just die with a Womper in adult diapers instead of with a bang I Think we can do this and I don't know at least I think we do it within 15 Earth Mars Inquanization events or you know, so we select 30ish years If we have an exponential increase in If every year if every two years we're we have like a major increase in Number of people and tonished Mars like I think as a rough approximation We need about a million tons of the surface of Mars maybe a million people that kind of thing To actually have a civilization Yeah, the The would you terraform like what would you do you would eventually terraform Mars at first people would live in Some kind of protected environment like domes and underground kind of thing Terraforming would take too long I were at this point in time where In the for the first time in the four and a half billion year history of Earth It is possible to extend Consciousness beyond our home planet and That window may be open for long time or it may be open for a short time. I hope it's open for a long time but it might only be open for a short time and we just make sure that we extend the light of consciousness to Mars before Civilization either extinguishes or subsides You know, we'll let me let any Savon is that the technology level of Mars drops below or technology level of Earth drops below what is necessary to send space ships to Mars so If there's some really destructive war or like some natural cataclysm Or simply the birth rate is so low that you know here We're just like to die in In adult diapers with a one per that's one of the possible outcomes for a lot of countries ahead of that way By the way, so Japan is right? Japan Korea yeah, yeah, I mean at dangerously yeah at current birth rates in three generations Korea will be about 4% of its current size That's insane. Yeah, maybe maybe even less than that There they're only at one third replacement rate so if you if you have three generations that want that's your 127th Of your current population, which is three percentish Jesus Christ yeah Basically population collapse happens fast So and seems to be accelerating in most parts of the world So so basically I mean for myself when I'm like This is the first time it's been possible to extend life you extend consciousness beyond Earth Maybe that window will be open for a long time, but it might only be open for a short time We should make sure that we make life multi planetary and make consciousness multi planetary while it's possible",
    "small": "So let's let's take it past the point where you have these scales you have a reusable ship Yeah, and you've got it dialed in then what are the steps? What what's next step after that? Is it an unmanned? Vert or voyage to Mars first? I'm an flight of Mars the Earth and Mars Or would synchronize every two years Or every 26 months technically so the next orbital synchronization is November of next year So and you can launch plus minus a month roughly so we'd have to launch in November or December of next year that's so the default plan is to launch hopefully several Starships to Mars at the end of next year and what would they be doing? Well at first we're just gonna Try to land on Mars and see if we succeed in landing Do we succeed in landing? Like let's say we were able to send five ships do all five land intact or do we add some craters to Mars? If we add some craters we've got to be a bit more cautious about setting people you know if we need to So we're gonna make sure the thing lands Safely how does it land on Mars? With our rockets rusters, so it'll just land. Oh, we'll add legs. Okay. Yeah, I just land and have legs and yeah, so It'll be remote controlled from Earth Or just autonomous autonomous completely Mars is you can't remote control things from earth because Mars far. Yeah, it's too far speed of light. Yes, be like constraints. So Mars at closest approach is roughly four light minutes and When it's on the other side of the Sun, it's it's about 12 light minutes. So, you know round trip would be like 40 minutes Best case if Mars is on the other side of the Sun So once you do that then how long do you think before you start sending people up there? Well, we're gonna try to go as fast as possible you can think of this is really a Race against time. Can we make Mars? Self-sufficient before Civilization has some sort of Future folk in the road where there's either like a war nuclear war or something or a We'll get hit by a meteor Or or simply so the civilization might just die with a whimper in adult diapers instead of with a bang I think we can do this and Well, I don't know at least I think we do it within 15 earth Mars synchronization events or you know, so basically like 30 years if We have an exponential increase in If every year if every two years we have like a major increase in Number of people and tundish to Mars like I think as a rough approximation We need about a million tons this of some Mars maybe a million people that kind of thing to actually have a civilization Yeah, the that would you terraform like what would you do you would eventually terraform Mars at first people would live in Some kind of protected environment like domes and underground and kind of thing Terraforming would take too long And we're at this point in time where In the for the first time in the four and a half billion year history of earth it is possible to extend consciousness beyond our home planet and That window may be open for a long time or it may be open for a short time. I hope it's open for a long time but it might only be open for a short time and We should just make sure that we extend the light of consciousness to Mars before some Before civilization either extinguishes or subsides You know it will let me let any saviours that the Technology level of Mars drops below or technology level of earth drops below what is necessary to send space ships to Mars so If there's some really destructive war or like said some natural cataclysm or simply the birth rate is so low that you know We just likes to die In adult diapers with a whimper. That's one of the possible outcomes A lot of countries ahead of that way by the way, Japan is right Japan. Yeah, yeah, I mean at dangerously Yeah, at current both rates in three generations Korea will be about four percent of its current size That's insane. Yeah, maybe maybe even less than that There they're only at one third replacement rates if you if you have three generations that one that's your 127th Of your current population, which is three percent dish Jesus Christ. Yeah basically population class happens fast So it seems to be accelerating in most parts of the world So so basically I mean for myself when I'm like This is the first time it's been possible to extend life Extend consciousness beyond earth Maybe that window will be open for a long time, but it might only be open for a short time We should make sure that we make life multi planetary and make consciousness multi planetary while it's possible",
    "medium": "So let's let's take it past the point where you have these scales you have a reusable Ship yeah, and you've got it dialed in then. What are the steps? What what's next step after that is it an unmanned? Ver voyage to Mars first I'm and flight of Mars the Earth and Mars Orbit synchronized every two years Or every 26 months technically so the next orbital synchronization is November of next year So and you can launch plus minus a month roughly so we'd have to launch in November or December of next year and so the default plan is to launch hopefully several Starships to Mars at the end of next year and what would they be doing? Well at first we're just gonna Try to land on Mars and see if we succeed in landing Do we succeed in landing? Like let's say we were able to send five ships do all five land intact or do we add some craters to Mars? If we add some craters we've got to be a bit more cautious about sending people you know we need to So we're gonna make sure the thing lands Safely how does it land on Mars with our rocket thrusters, so it'll just land No, we'll add legs. Okay. Yeah, I just land and have legs and yeah, so It'll be remote controlled from Earth Or just autonomous autonomous completely Mars is you can't remote control things from Earth because Mars. Yes too far speed of light. You have speed of light constraints, so Mars at closest approach is roughly four light minutes and When it's on the other side of the Sun, it's it's about 12 light minutes So you know round trip would be like 40 minutes best case if Mars is on the other side of the Sun. Mm-hmm So once you do that, then how long do you think before you start sending people up there? Well, we're gonna try to go as fast as possible You think this is really A race against time can we make Mars self-sufficient before Civilization has some sort of Future folk in the road where there's either like a war nuclear war or something or a We'll get hit by a meteor Or simply civilization might just die with a whimper in adult diapers instead of with a bang I Think we can do this and Why don't at least I think we do it within 15 Earth Mars synchronization events or you know, so basically like 30 ish years If We have an exponential increase in If every year if every two years we have like a major increase in Number of people and tundish to Mars like I think as a rough approximation We need about a million tons the surface of Mars maybe a million people that kind of thing to actually have a civilization Yeah, the Would you terraform like what would you do? You would eventually terraform Mars at first people would live in some kind of protected environment like domes and underground kind of thing Terraforming would take too long And we're at this point in time where In the for the first time in the four and a half billion year history of Earth. It is possible to extend Consciousness beyond our home planet and That window may be open for a long time or it may be open for a short time. I hope it's open for a long time but it might only be open for a short time and We should just make sure that we extend the light of consciousness to Mars Before some Before civilization either extinguishes or subsides You know, we'll let any Any savannas that the technology level of Mars drops below or technology level of Earth drops below What is necessary to send spaceships to Mars? So If there's some really destructive war or like said some natural cataclysm Or simply the birth rate is so low that you know here We just like to die In adult diapers with a whimper that's one of the possible outcomes For a lot of countries ahead of that way, by the way, Japan is right Japan Korea. Yeah I mean Dangerously. Yeah at current birth rates in three generations Korea will be about four percent of its current size That's insane. Yeah, maybe even less than that Yeah, maybe maybe even less than that There there are only at one third replacement rate. So if you have three generations that one that's your 127th Of your current population, which is three percent dish Jesus Christ, yeah, basically population collapse happens fast So and seems to be accelerating in most parts of the world So so basically I mean from my standpoint I'm like This is the first time it's been possible to extend life extend consciousness beyond Earth Maybe that window will be open for a long time, but it might only be open for a short time We should make sure that we make life multi-planetary and make consciousness multi-planetary while it's possible"
}

In [3]:
transcript = """So let's let's take it past the point where you have these scales you have a reusable ship Yeah, and you've you've got it dialed in then what are the steps? What what's next step after that is it an unmanned Voyage to Mars first. I'm an flight of Mars the Earth and Mars Orbit synchronize every two years or every 26 months technically so The next orbital synchronization is November of next year So and you can launch plus minus a month roughly so we'd have to launch in November or December of next year so the default plan is to launch hopefully several Starships to Mars at the end of next year And what would they be doing? Well at first we're just gonna try to land on Mars and see if we succeed in landing Do we succeed in landing like let's say we were able to send five ships do all five land intact or do we? Add some craters to Mars If we add some craters we've got to be But more cautious about sending people, you know, and we need to So we're gonna make sure the thing lands Safely how does it land on Mars with on rocket's restors? So it'll just land. Oh, well add legs. Okay. Yeah, we'll just land and have legs and yeah, so It'll be remote controlled from Earth Or just autonomous autonomous completely Mars is you can't remote control things from both because Mars. Yeah, it's too far speed of light you have speed of light constraints so Mars at closest approaches roughly four light minutes and When it's on the other side of the Sun it's it's about 12 light minutes, so you know round trip would be like 40 minutes best case if Mars is on the other side of the Sun So once you do that then how long do you think before you start sending people up there? Well, we're gonna try to go as fast as possible. You can think this is really Erase against time can we make Mars self-sufficient before Civilization has some sort of Future fork in the road where there's either like a war or nuclear war or something or a We'll get hit by a meteor Or or simply civilization might just die with a Womper in adult diapers instead of with a bang I Think we can do this and I don't know at least I think we do it within 15 Earth Mars Inquanization events or you know, so we select 30ish years If we have an exponential increase in If every year if every two years we're we have like a major increase in Number of people and tonished Mars like I think as a rough approximation We need about a million tons of the surface of Mars maybe a million people that kind of thing To actually have a civilization Yeah, the The would you terraform like what would you do you would eventually terraform Mars at first people would live in Some kind of protected environment like domes and underground kind of thing Terraforming would take too long I were at this point in time where In the for the first time in the four and a half billion year history of Earth It is possible to extend Consciousness beyond our home planet and That window may be open for long time or it may be open for a short time. I hope it's open for a long time but it might only be open for a short time and we just make sure that we extend the light of consciousness to Mars before Civilization either extinguishes or subsides You know, we'll let me let any Savon is that the technology level of Mars drops below or technology level of Earth drops below what is necessary to send space ships to Mars so If there's some really destructive war or like some natural cataclysm Or simply the birth rate is so low that you know here We're just like to die in In adult diapers with a one per that's one of the possible outcomes for a lot of countries ahead of that way By the way, so Japan is right? Japan Korea yeah, yeah, I mean at dangerously yeah at current birth rates in three generations Korea will be about 4% of its current size That's insane. Yeah, maybe maybe even less than that There they're only at one third replacement rate so if you if you have three generations that want that's your 127th Of your current population, which is three percentish Jesus Christ yeah Basically population collapse happens fast So and seems to be accelerating in most parts of the world So so basically I mean for myself when I'm like This is the first time it's been possible to extend life you extend consciousness beyond Earth Maybe that window will be open for a long time, but it might only be open for a short time We should make sure that we make life multi planetary and make consciousness multi planetary while it's possible"""
print("The length of the transcript: ", len(transcript))

The length of the transcript:  4487


### Step3 : Experiment on Different Model with Different Transcripts
#### Step3-1: TF-IDF Method

Output Type : Keywords (single words or short n-grams)

 
**Observation:** 
- TF-IDF alone may not work well for very short transcripts with conversational fillers (e.g. 'like', 'time', 'yeah', 'just')

In [18]:
def extract_topics_tfidf(text, top_n=5):
    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2))
    X = vectorizer.fit_transform([text])
    scores = X.toarray()[0]
    terms = vectorizer.get_feature_names_out()
    term_score_pairs = list(zip(terms, scores))
    term_score_pairs.sort(key=lambda x: x[1], reverse=True)
    top_terms = [t for t, s in term_score_pairs[:top_n]]
    return top_terms

In [46]:
print(f"TF-IDF Topics")
for model, transcript in transcripts.items():
    start_mem = get_memory_usage_mb()
    start_time = time.time()
    tfidf_topics = extract_topics_tfidf(transcript, top_n=5)
    tfidf_time = time.time() - start_time
    tfidf_mem = get_memory_usage_mb() - start_mem

    print(f"Model {model}: ")
    print(f"\t Topics: {tfidf_topics}")
    print(f"\t Time: {tfidf_time:.3f}s, Memory Used: {tfidf_mem:.1f} MB")

TF-IDF Topics
Model tiny: 
	 Topics: ['mars', 'like', 'time', 'yeah', 'just']
	 Time: 0.005s, Memory Used: 0.0 MB
Model base: 
	 Topics: ['mars', 'like', 'time', 'yeah', 'just']
	 Time: 0.002s, Memory Used: 0.0 MB
Model small: 
	 Topics: ['mars', 'like', 'time', 'yeah', 'earth']
	 Time: 0.002s, Memory Used: 0.0 MB
Model medium: 
	 Topics: ['mars', 'like', 'time', 'yeah', 'earth']
	 Time: 0.034s, Memory Used: 0.0 MB


#### Step3-2 : spaCy Method (small nlp model for CPU-friendly environment)

Output Type : Words based on POS (nouns/proper nouns)

**Observation:**
- spaCy alone may not work well for very short transcripts with conversational fillers either
- it better filters out the stopping words than tf-idf (e.g. filtered out "yeah", "just")

In [None]:
# !python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")  # small pre-trained English language model 

def extract_topics_spacy(text, top_n=5):
    doc = nlp(text.lower())
    # Use nouns and proper nouns only
    candidates = [token.text for token in doc if token.pos_ in ["NOUN", "PROPN"] and not token.is_stop]
    # Count frequency
    from collections import Counter
    counts = Counter(candidates)
    top_keywords = [word for word, _ in counts.most_common(top_n)]
    return top_keywords

In [63]:
print(f"spaCy Topics")
for model, transcript in transcripts.items():
    start_mem = get_memory_usage_mb()
    start_time = time.time()
    spacy_topics = extract_topics_spacy(transcript, top_n=5)
    spacy_time = time.time() - start_time
    spacy_mem = get_memory_usage_mb() - start_mem

    print(f"Model {model}: ")
    print(f"\t Topics: {spacy_topics}")
    print(f"\t Time: {spacy_time:.3f}s, Memory Used: {spacy_mem:.1f} MB")

spaCy Topics
Model tiny: 
	 Topics: ['mars', 'time', 'earth', 'year', 'people']
	 Time: 0.180s, Memory Used: 58.5 MB
Model base: 
	 Topics: ['mars', 'time', 'earth', 'year', 'people']
	 Time: 0.155s, Memory Used: 7.8 MB
Model small: 
	 Topics: ['mars', 'time', 'earth', 'year', 'people']
	 Time: 0.145s, Memory Used: 6.9 MB
Model medium: 
	 Topics: ['mars', 'time', 'earth', 'year', 'people']
	 Time: 0.148s, Memory Used: 1.2 MB


#### Step3-3 : KeyBERT Method

Output Type : Keywords / multi-word phrases

**Observation:** 
- KeyBERT captures semantic meaning better due to its embedding-based semantic mechanism 
- Moderate speed, an average of 3 seconds

In [32]:
def extract_topics_keybert(text, top_n=5, model_name="all-MiniLM-L6-v2"):
    kw_model = KeyBERT(model_name)
    # stop_words='english' removes common English stop words
    # use_mmr=True (Maximal Marginal Relevance) reduces repeated/similar keywords
    keywords = kw_model.extract_keywords(text, top_n=top_n, stop_words='english', use_mmr=True)
    return [kw for kw, score in keywords]

In [47]:
print(f"KeyBERT Topics")
for model, transcript in transcripts.items():
    start_mem = get_memory_usage_mb()
    start_time = time.time()
    keybert_topics = extract_topics_keybert(transcript, top_n=5)
    keybert_time = time.time() - start_time
    keybert_mem = get_memory_usage_mb() - start_mem

    print(f"Model {model}: ")
    print(f"\t Topics: {keybert_topics}")
    print(f"\t Time: {keybert_time:.3f}s, Memory Used: {keybert_mem:.1f} MB")

KeyBERT Topics
Model tiny: 
	 Topics: ['mars', 'ship', 'tomorrow', 'synchronization', 'scales']
	 Time: 3.455s, Memory Used: -183.1 MB
Model base: 
	 Topics: ['mars', 'ship', 'scales', 'synchronization', 'hopefully']
	 Time: 2.821s, Memory Used: 12.4 MB
Model small: 
	 Topics: ['starships', 'terraforming', 'scales', 'plan', 'synchronization']
	 Time: 2.931s, Memory Used: 14.2 MB
Model medium: 
	 Topics: ['starships', 'terraforming', 'scales', 'synchronization', 'hopefully']
	 Time: 2.868s, Memory Used: 5.7 MB


### Step3-4: DistilBERT + Clustering Method

Output Type : Representative sentence from each cluster

**Observation** : 
- Sentence result type is not suitable for our task here
- Good semantic clustering
- Longer execution time

In [49]:
def extract_topics_distilbert(text, top_n=5, model_name="distilbert-base-nli-stsb-mean-tokens"):
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences)
    
    n_clusters = min(top_n, len(sentences))
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    kmeans.fit(embeddings)
    
    labels = kmeans.labels_
    topics = []
    for i in range(n_clusters):
        cluster_sentences = [sentences[j] for j in range(len(sentences)) if labels[j]==i]
        topics.append(max(cluster_sentences, key=len))  # pick longest sentence
    return topics

In [51]:
print(f"DistilBERT Topics")
for model, transcript in transcripts.items():
    start_mem = get_memory_usage_mb()
    start_time = time.time()
    distilbert_topics = extract_topics_distilbert(transcript)
    distilbert_time = time.time() - start_time
    distilbert_mem = get_memory_usage_mb() - start_mem

    print(f"Model {model}: ")
    print(f"\t Topics: {distilbert_topics}")
    print(f"\t Time: {distilbert_time:.3f}s, Memory Used: {distilbert_mem:.1f} MB")

DistilBERT Topics
Model tiny: 
	 Topics: ["Do we do we succeed in landing? Like let's say we were able to send five ships to all five land intact or do we add some creators to Mars? If we add some creators, we've got to be put more courses about setting people", "You've got it dialed in, then what are the steps? What's next step after that? Is it an unmanned voyage tomorrow's first? I'm Anne Fliedamores", "So let's take it past the point where you have the scales, you have reusable ship", 'Terrible moss at first people would live in some kind of protected environment like domes and underground', "I mean, we're at this point in time"]
	 Time: 3.145s, Memory Used: 32.5 MB
Model base: 
	 Topics: ['Oh, well add legs', "You can think this is really Erase against time can we make Mars self-sufficient before Civilization has some sort of Future fork in the road where there's either like a war or nuclear war or something or a We'll get hit by a meteor Or or simply civilization might just die w

### Step3-5 : SentenceTransformer + Clustering Method

Output Type : Representative sentence from each cluster

**Observation** : 
- Sentence result type is not suitable for our task here
- Good semantic clustering
- Longer execution time

In [48]:
def extract_topics_st_clustering(text, top_n=5, model_name="all-MiniLM-L6-v2"):
    # Split transcript into sentences
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences)
    
    n_clusters = min(top_n, len(sentences))
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    kmeans.fit(embeddings)
    
    cluster_centers = kmeans.cluster_centers_
    labels = kmeans.labels_
    
    topics = []
    for i in range(n_clusters):
        cluster_sentences = [sentences[j] for j in range(len(sentences)) if labels[j]==i]
        # pick the longest sentence as representative topic
        topics.append(max(cluster_sentences, key=len))
    return topics

In [62]:
print(f"SentenceTransformer + Clustering Topics")
for model, transcript in transcripts.items():
    start_mem = get_memory_usage_mb()
    start_time = time.time()
    st_topics = extract_topics_st_clustering(transcript)
    st_time = time.time() - start_time
    st_mem = get_memory_usage_mb() - start_mem

    print(f"Model {model}: ")
    print(f"\t Topics: {st_topics}")
    print(f"\t Time: {st_time:.3f}s, Memory Used: {st_mem:.1f} MB")

SentenceTransformer + Clustering Topics
Model tiny: 
	 Topics: ['Terrible moss at first people would live in some kind of protected environment like domes and underground', "We should make sure that we make life multi-planetary and make consciousness multi-planetary while it's possible", "Do we do we succeed in landing? Like let's say we were able to send five ships to all five land intact or do we add some creators to Mars? If we add some creators, we've got to be put more courses about setting people", "So once you do that, then how long do you think before you start sending people up there? What we're going to try to go as fast as possible", "So basically, I mean, for my same one, I'm like"]
	 Time: 3.932s, Memory Used: 74.0 MB
Model base: 
	 Topics: ['Oh, well add legs', "You can think this is really Erase against time can we make Mars self-sufficient before Civilization has some sort of Future fork in the road where there's either like a war or nuclear war or something or a We'll 

### Step4: Result Comparison 
With transcript generated by **base** model (decision of transciption_demo.ipynb) 

In [66]:
tfidf_topics = extract_topics_tfidf(transcripts["base"])
spacy_topics = extract_topics_spacy(transcripts["base"])
keybert_topics = extract_topics_keybert(transcripts["base"])
distilbert_topics = extract_topics_distilbert(transcripts["base"])
st_topics = extract_topics_st_clustering(transcripts["base"])

summary_df = pd.DataFrame([
    {"method": "TF-IDF", "topics": tfidf_topics, "time_sec": tfidf_time, "memory_mb": tfidf_mem},
    {"method": "spaCy", "topics": spacy_topics, "time_sec": spacy_time, "memory_mb": spacy_mem},
    {"method": "KeyBERT", "topics": keybert_topics, "time_sec": keybert_time, "memory_mb": keybert_mem},
    {"method": "DistilBERT+Clustering", "topics": distilbert_topics, "time_sec": distilbert_time, "memory_mb": distilbert_mem},
    {"method": "ST+Clustering", "topics": st_topics, "time_sec": st_time, "memory_mb": st_mem}
])

summary_df

Unnamed: 0,method,topics,time_sec,memory_mb
0,TF-IDF,"[mars, like, time, yeah, just]",0.033735,0.0
1,spaCy,"[mars, time, earth, year, people]",0.148005,1.21875
2,KeyBERT,"[mars, ship, scales, synchronization, hopefully]",2.867706,5.65625
3,DistilBERT+Clustering,"[Oh, well add legs, You can think this is real...",3.758725,3.484375
4,ST+Clustering,"[Oh, well add legs, You can think this is real...",3.066503,9.765625


### Observation
1. Speed / Resource
- TF-IDF and spaCy are extremely lightweight — ideal for CPU and quick testing
- KeyBERT is moderately heavy (embedding-based), but still practical on CPU
- DistilBERT + clustering and SentenceTransformer + clustering are heavy, usually need GPU for larger transcripts

2.	Semantic Accuracy
- TF-IDF and spaCy ignore context; may select trivial words
- KeyBERT captures semantic meaning and multi-word phrases
- Clustering methods produce semantically coherent representative sentences, but sentences is not a output type we're looking for here

3.	Task Requirements 
- Since the next step is vectorizing topics and combining with psychometrics, short phrases are easier to encode than full sentences


### Trade-off and Conclusion
- For this pipeline, **KeyBERT** is selected as the default method due to its balance of semantic quality, resource usage, and task fit