In [None]:
reddit_sample = """Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife. Not because I stopped loving her or because we grew apart. I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.

My ex-wife and I met when we were just kids, barely 18, with the kind of naive hope that thinks love can conquer anything. She was my best friend, my other half, my safe place in a world that sometimes felt too big and too cold. When we stood there exchanging vows, I truly believed we’d be together forever. And maybe we could have been—if it was just us. But it wasn’t.

Her family had expectations. They wanted a doctor, a respected man with a title and prestige, someone who fit into their perfect image of success. They never said it outright, but every look, every subtle comment chipped away at me. They didn’t see me for who I was—they only saw the gap between who I was and who they thought I should be. And my wife…she was caught in the middle, torn between their love and mine.

We tried to make it work. God, we tried everything. We moved to different cities, created distance from her family, hoping that it would give us space to build something that was just ours. For a while, it worked, and I thought maybe we’d found a way. But every holiday, every phone call brought those expectations crashing back down on us. It was like her family’s hopes for her future were this invisible weight pressing down on us, one I couldn’t shake no matter how hard I tried.

My wife started to change. She didn’t want to, I could see that. But that constant tension wore on her. She looked at me like she was caught in a trap, like she was torn in two. And I felt powerless. I couldn’t ask her to choose, but I could see the pain she felt, the guilt she carried. Her family had her in their grip, and I was just another weight, pulling her in the opposite direction.

Eventually, we both broke. I remember the day we decided to separate. There wasn’t yelling or accusations, no one stormed out. It was quiet, like a mutual surrender. We sat on the couch, both of us just holding each other, crying, knowing that love wasn’t enough to save us from the world around us. And in that moment, I felt like my heart was being torn out of my chest.

I let go of the one person who knew me inside and out, not because I wanted to, but because I felt like I was suffocating. I thought that maybe, if I wasn’t in her life, she’d finally be free to live without that constant push and pull, free from the guilt and the expectations that haunted us. I thought I was doing something selfless, but the truth? I was broken. Walking away shattered me in ways I didn’t know were possible.

In the months after, I barely recognized myself. I was hollow, a shell of who I’d been, wondering if I’d made the biggest mistake of my life. Every day felt like a battle to just get out of bed, to go on with a life that suddenly felt like it wasn’t mine. And the worst part? The memories. Everywhere I looked, I saw her—our inside jokes, her smile, the way she’d curl up next to me at night. I felt haunted by a love that was still so painfully alive, yet completely out of reach.

I’ve spent the last two years trying to rebuild, piece by piece, but there’s a part of me that’s still living in that moment on the couch, holding her, crying, both of us knowing that what we had wasn’t strong enough to withstand the world. I still love her. I don’t think that will ever change. And I don’t know if I’ll ever truly move on.

People say time heals, but some wounds feel too deep to ever really close. I wanted to share this here, not because I’m looking for advice or sympathy, but because the weight of this loss is too much to carry alone. I lost myself trying to make a life with her, and now, in letting go, I’m still trying to find the pieces of who I am without her.

If you’re reading this, thank you. Sometimes, just putting it into words feels like the only way to make sense of it all."""


In [65]:
import spacy


nlp = spacy.load('en_core_web_sm')

In [66]:
doc = nlp(reddit_sample)

In [67]:
sentences = list(doc.sents)


In [68]:
for sentence in sentences:
    print(sentence.text)

Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife.
Not because I stopped loving her or because we grew apart.
I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.


My ex-wife
and I met when we were just kids, barely 18, with the kind of naive hope that thinks love can conquer anything.
She was my best friend, my other half, my safe place in a world that sometimes felt too big and too cold.
When we stood there exchanging vows, I truly believed we’d be together forever.
And maybe we could have been—if it was just us.
But it wasn’t.


Her family had expectations.
They wanted a doctor, a respected man with a title and prestige, someone who fit into their perfect image of success.
They never said it outright, but every look, every subtle comment chipped away at me.
They didn’t see me for who I was—they only saw the gap between who I was and who they
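The output shows spaCy splitting "My ex-wife" away from "and I met when...", which leaves a two-word fragment that later becomes its own chunk. One possible repair is a hypothetical heuristic, not a spaCy feature: glue any segment that starts with a lowercase letter onto the previous one. The sketch below works on plain strings, so it could be applied to the stripped sentence texts.

```python
def merge_lowercase_fragments(segments):
    """Attach any segment that starts with a lowercase letter to the previous one.

    Hypothetical heuristic: a real sentence rarely begins in lowercase, so such a
    segment is treated as the continuation of a wrongly split sentence.
    """
    merged = []
    for seg in segments:
        seg = seg.strip()
        if not seg:
            continue  # drop whitespace-only segments
        if merged and seg[0].islower():
            merged[-1] = f"{merged[-1]} {seg}"  # continuation of the previous sentence
        else:
            merged.append(seg)
    return merged


pieces = ["My ex-wife", "and I met when we were just kids.", "She was my best friend."]
print(merge_lowercase_fragments(pieces))
# ['My ex-wife and I met when we were just kids.', 'She was my best friend.']
```

Merging changes the number of sentences, so it breaks the one-to-one mapping between the stripped texts and the spaCy `sentences` that the character-offset cell at the end relies on; offsets would have to be tracked separately.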

In [69]:
len(sentences)

54

In [70]:
# Print the first three sentences with their character offsets in the original text
sample = sentences[0:3]

for sam in sample:
    print(f"\"{sam.text}\" {sam.start_char}")

"Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife." 0
"Not because I stopped loving her or because we grew apart." 93
"I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.

" 152


In [71]:
bare_text = [sen.text.strip() for sen in sentences]
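`bare_text` strips surrounding whitespace but still keeps one entry (possibly empty) per spaCy sentence, and it drops the character offsets. If whitespace-only sentences should be skipped without losing the ability to locate each sentence in `reddit_sample`, one option is to carry `(text, start_char)` pairs and shift each offset past the stripped leading whitespace. A minimal sketch; `clean_spans` is a hypothetical helper name:

```python
def clean_spans(spans):
    """Strip each (text, start_char) pair and skip whitespace-only ones.

    `spans` is assumed to be a list of (text, start_char) tuples, e.g.
    [(s.text, s.start_char) for s in sentences]. The start offset is shifted
    past any leading whitespace so it still points into the original string.
    """
    cleaned = []
    for text, start in spans:
        stripped = text.strip()
        if not stripped:
            continue  # whitespace-only sentence: skip it
        lead = len(text) - len(text.lstrip())  # characters removed from the front
        cleaned.append((stripped, start + lead))
    return cleaned


original = "Hi there.\n\n  Bye.\n"
spans = [("Hi there.\n\n", 0), ("  Bye.", 11), ("\n", 17)]
print(clean_spans(spans))  # [('Hi there.', 0), ('Bye.', 13)]
```

With this representation, chunk indices refer to positions in the cleaned list, and the offsets come from the tuples rather than from `sentences`.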

In [72]:
import torch

# Prefer a CUDA GPU, then Apple Silicon (MPS), otherwise fall back to the CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"Using device: {device}")

Using device: cuda


In [73]:
from transformers import DistilBertTokenizer, DistilBertModel

# Load the DistilBERT model and tokenizer
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased").to(device)


In [74]:
def embed_sentences(sentences):
    embeddings = []
    for sentence in sentences:
        # Tokenize and send to device
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=512).to(device)

        # Forward pass without gradients to save memory
        with torch.no_grad():
            outputs = model(**inputs)

        # Mean pooling on the last hidden state to get a single embedding vector for the sentence
        sentence_embedding = outputs.last_hidden_state.mean(dim=1).squeeze().cpu().numpy()
        embeddings.append(sentence_embedding)

    return embeddings
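`embed_sentences` runs one sentence per forward pass, so there is no padding and `.mean(dim=1)` averages exactly the real tokens (including `[CLS]` and `[SEP]`). If the sentences were batched instead, padded positions would leak into that mean. A common fix is to average only where the attention mask is 1. In PyTorch that would be roughly `m = inputs["attention_mask"].unsqueeze(-1).float()` and `(outputs.last_hidden_state * m).sum(1) / m.sum(1)`; this is an assumption about how the cell could be adapted, not what the notebook does. The pure-Python sketch below shows the arithmetic on toy numbers:

```python
def masked_mean(token_vectors, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0.

    token_vectors: list of T vectors (each a list of D floats) for one sequence.
    attention_mask: list of T ints (1 = real token, 0 = padding).
    """
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for d in range(dim):
                totals[d] += vec[d]
    if count == 0:
        raise ValueError("attention_mask has no real tokens")
    return [t / count for t in totals]


# Two real tokens followed by one padding token (toy values)
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean(hidden, mask))  # [2.0, 3.0]
```

An unmasked mean over the same three rows would be about [34.67, 35.33], dominated by the padding row.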

In [75]:
embeddings = embed_sentences(bare_text)
len(embeddings)

54

In [76]:

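from sklearn.cluster import KMeans
import numpy as np

# One cluster per sentence (54 here). With as many clusters as points, and no two
# embeddings identical, K-means puts every sentence in its own cluster, so the merge
# step below does all of the grouping. A smaller value (e.g. 20 or 30) would let
# K-means group similar sentences itself.
n_initial_clusters = len(sentences)
kmeans = KMeans(n_clusters=n_initial_clusters, random_state=0)
labels = kmeans.fit_predict(np.array(embeddings))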


In [77]:
from collections import defaultdict

# Group sentences by initial cluster labels
initial_chunks = defaultdict(list)
for sentence, label in zip(bare_text, labels):
    initial_chunks[label].append(sentence)

# Convert to a list of lists for further processing
initial_chunks_list = list(initial_chunks.values())
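Because `defaultdict` preserves insertion order, `initial_chunks_list` is ordered by where each label first appears. K-means ignores position in the text, so with fewer clusters than sentences a single cluster can hold sentences that are far apart. The pure-Python sketch below reproduces the grouping on made-up labels; `group_by_label` and `is_contiguous` are hypothetical helpers:

```python
from collections import defaultdict


def group_by_label(items, labels):
    """Group items by cluster label, ordered by each label's first appearance."""
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[label].append(item)
    return list(groups.values())


def is_contiguous(labels):
    """True if every label occupies one unbroken run of positions."""
    seen = set()
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        if label != previous:
            if label in seen:
                return False  # label reappears after a different one
            seen.add(label)
            previous = label
    return True


sentences_demo = ["s0", "s1", "s2", "s3"]
labels_demo = [0, 1, 0, 2]  # hypothetical labels: cluster 0 spans s0 and s2
print(group_by_label(sentences_demo, labels_demo))  # [['s0', 's2'], ['s1'], ['s3']]
print(is_contiguous(labels_demo))  # False
```

When `is_contiguous(labels)` is False, a "chunk" built from one cluster no longer corresponds to a single passage of the post.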


In [78]:
from sklearn.metrics.pairwise import cosine_similarity

# Define similarity threshold for merging clusters
similarity_threshold = 0.75

# Reuse the embeddings computed above instead of re-running the model for every comparison
embedding_by_text = dict(zip(bare_text, embeddings))

final_chunks = []
current_chunk = list(initial_chunks_list[0])  # copy so extend() doesn't mutate initial_chunks_list

for i in range(1, len(initial_chunks_list)):
    # Similarity between the last sentence of the current chunk and the first sentence of the next cluster
    similarity = cosine_similarity(
        [embedding_by_text[current_chunk[-1]]],
        [embedding_by_text[initial_chunks_list[i][0]]]
    )[0, 0]

    if similarity > similarity_threshold:
        current_chunk.extend(initial_chunks_list[i])
    else:
        final_chunks.append(current_chunk)
        current_chunk = list(initial_chunks_list[i])

# Append the last chunk
final_chunks.append(current_chunk)

# Display the results
for i, chunk in enumerate(final_chunks):
    print(f"Chunk {i + 1}: {chunk}\n")


Chunk 1: ['Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife.', 'Not because I stopped loving her or because we grew apart.', 'I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.']

Chunk 2: ['My ex-wife']

Chunk 3: ['and I met when we were just kids, barely 18, with the kind of naive hope that thinks love can conquer anything.', 'She was my best friend, my other half, my safe place in a world that sometimes felt too big and too cold.', 'When we stood there exchanging vows, I truly believed we’d be together forever.']

Chunk 4: ['And maybe we could have been—if it was just us.', 'But it wasn’t.']

Chunk 5: ['Her family had expectations.']

Chunk 6: ['They wanted a doctor, a respected man with a title and prestige, someone who fit into their perfect image of success.']

Chunk 7: ['They never said it outright, but every look, every subtle comment
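scikit-learn's `cosine_similarity` takes two 2-D arrays and returns the full matrix of pairwise similarities, which is why each vector is wrapped in a list and the result is read with `[0, 0]`. For a single pair the value is the dot product divided by the product of the vector norms. A pure-Python version for reference:

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # zero vector: scikit-learn also yields 0 here
    return dot / (norm_u * norm_v)


print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(round(cosine([1.0, 2.0], [2.0, 4.0]), 6))  # 1.0 (same direction)
```

Because the measure ignores vector length, two sentences count as similar when their mean-pooled embeddings point the same way, regardless of how long the sentences are.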

In [79]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

similarity_threshold = 0.75

# Initialize the list to store chunks, with each sentence as (index, text)
final_chunks = []
current_chunk = [(0, bare_text[0])]  # Start the first chunk with the first sentence and its index

# Iterate through sentences sequentially
for i in range(1, len(bare_text)):
    # Calculate similarity between the last sentence of the current chunk and the next sentence
    similarity = cosine_similarity(
        [embeddings[i - 1]], [embeddings[i]]
    )[0, 0]

    # If similarity is above the threshold, add to current chunk with the index
    if similarity >= similarity_threshold:
        current_chunk.append((i, bare_text[i]))  # Append as (index, text)
    else:
        # Otherwise, finalize the current chunk and start a new one
        final_chunks.append(current_chunk)
        current_chunk = [(i, bare_text[i])]  # Start a new chunk with the new sentence

# Append the last chunk if it wasn't added
if current_chunk:
    final_chunks.append(current_chunk)

# Display the results
for i, chunk in enumerate(final_chunks):
    print(f"Chunk {i + 1}:")
    for idx, sentence in chunk:
        print(f"Sentence {idx + 1}: {sentence}")
    print("\n")


Chunk 1:
Sentence 1: Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife.
Sentence 2: Not because I stopped loving her or because we grew apart.
Sentence 3: I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.


Chunk 2:
Sentence 4: My ex-wife


Chunk 3:
Sentence 5: and I met when we were just kids, barely 18, with the kind of naive hope that thinks love can conquer anything.
Sentence 6: She was my best friend, my other half, my safe place in a world that sometimes felt too big and too cold.
Sentence 7: When we stood there exchanging vows, I truly believed we’d be together forever.


Chunk 4:
Sentence 8: And maybe we could have been—if it was just us.
Sentence 9: But it wasn’t.


Chunk 5:
Sentence 10: Her family had expectations.


Chunk 6:
Sentence 11: They wanted a doctor, a respected man with a title and prestige, someone who fit into their per
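The sequential rule in the cell above depends only on the similarity between each pair of neighboring sentences. Factoring it into a function over those similarities, as in the hypothetical `chunk_ranges` below, makes it easy to try other thresholds and yields the `(start_index, end_index)` pairs that the emotion cells use later. It keeps the `>=` comparison of the sequential cell (the cluster-merging cell used `>`).

```python
def chunk_ranges(adjacent_similarities, threshold=0.75):
    """Split a sequence into chunks of consecutive items.

    adjacent_similarities[i] is the similarity between item i and item i + 1,
    so n items give n - 1 values (assumes at least one item). A new chunk starts
    wherever the similarity drops below `threshold`. Returns inclusive
    (start_index, end_index) pairs.
    """
    ranges = []
    start = 0
    for i, sim in enumerate(adjacent_similarities):
        if sim < threshold:
            ranges.append((start, i))  # item i closes the current chunk
            start = i + 1
    ranges.append((start, len(adjacent_similarities)))  # close the final chunk
    return ranges


# Four sentences: 0-1 similar, 1-2 not, 2-3 similar (made-up values)
print(chunk_ranges([0.9, 0.2, 0.8]))  # [(0, 1), (2, 3)]
```

In the notebook, the input would be `[cosine_similarity([embeddings[i]], [embeddings[i + 1]])[0, 0] for i in range(len(embeddings) - 1)]`.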

In [80]:
from transformers import pipeline

# Load the emotion detection pipeline
emotion_detector = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", top_k=None, device=device)



In [81]:
def analyze_emotions_in_chunk(chunk_text):
    # Get emotion scores from the pipeline
    results = emotion_detector(chunk_text)

    # Process results to get a dictionary of emotions and their scores
    emotions = {item['label']: item['score'] for item in results[0]}

    # Return the chunk text and its emotions
    return {
        "chunk": chunk_text,
        "emotions": emotions
    }
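With `top_k=None` the pipeline returns a score for every label the model knows (seven in the outputs below: anger, disgust, fear, joy, neutral, sadness, surprise), and those scores sum to roughly 1. One simple summary is the top label plus its margin over the runner-up, which flags chunks where the model is torn between two emotions. A small sketch of a hypothetical helper over the `emotions` dictionary that `analyze_emotions_in_chunk` returns:

```python
def dominant_emotion(emotions):
    """Return (label, score, margin) for the top-scoring emotion.

    `emotions` is a {label: score} dict such as result["emotions"];
    margin is the gap between the best and second-best scores.
    """
    ranked = sorted(emotions.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_label, top_score, top_score - runner_up


# Rounded scores for chunk 1, copied from the printed output
chunk1 = {"sadness": 0.693, "fear": 0.214, "anger": 0.044, "disgust": 0.026,
          "neutral": 0.017, "joy": 0.003, "surprise": 0.002}
label, score, margin = dominant_emotion(chunk1)
print(label, score, round(margin, 3))  # sadness 0.693 0.479
```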


In [82]:
# Analyze emotions for each chunk individually
chunk_emotions = []

for chunk in final_chunks:
    # Join only the sentence texts to create a single text passage for emotion detection
    chunk_text = " ".join(sentence for _, sentence in chunk)

    # Run emotion analysis on the joined text passage
    analysis_result = analyze_emotions_in_chunk(chunk_text)

    # Record the indices of the first and last sentences in this chunk
    start_index = chunk[0][0]
    end_index = chunk[-1][0]

    # Add the text, emotion analysis, and index range to the result
    analysis_result["start_index"] = start_index
    analysis_result["end_index"] = end_index
    chunk_emotions.append(analysis_result)

# Display results
for i, result in enumerate(chunk_emotions):
    print(f"Chunk {i + 1}:")
    print(f"Text: {result['chunk']}")
    print(f"Sentence Indices in Original Text: {result['start_index']} to {result['end_index']}")
    print("Emotions:")
    for emotion, score in result['emotions'].items():
        print(f"  {emotion}: {score:.3f}")
    print("\n")


Chunk 1:
Text: Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife. Not because I stopped loving her or because we grew apart. I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.
Sentence Indices in Original Text: 0 to 2
Emotions:
  sadness: 0.693
  fear: 0.214
  anger: 0.044
  disgust: 0.026
  neutral: 0.017
  joy: 0.003
  surprise: 0.002


Chunk 2:
Text: My ex-wife
Sentence Indices in Original Text: 3 to 3
Emotions:
  sadness: 0.968
  anger: 0.010
  disgust: 0.007
  surprise: 0.006
  joy: 0.004
  neutral: 0.004
  fear: 0.001


Chunk 3:
Text: and I met when we were just kids, barely 18, with the kind of naive hope that thinks love can conquer anything. She was my best friend, my other half, my safe place in a world that sometimes felt too big and too cold. When we stood there exchanging vows, I truly believed we’d be together forever.
Sentence I

In [83]:
chunk_emotions

[{'chunk': 'Two years ago, I made the hardest, most soul-shattering decision of my life: I left my wife. Not because I stopped loving her or because we grew apart. I walked away because I couldn’t keep tearing myself apart, trying to be someone I wasn’t, to fit into a life that felt like it was slowly strangling me.',
  'emotions': {'sadness': 0.6933683753013611,
   'fear': 0.21393989026546478,
   'anger': 0.04419312626123428,
   'disgust': 0.026415416970849037,
   'neutral': 0.016583560034632683,
   'joy': 0.0032139967661350965,
   'surprise': 0.002285611815750599},
  'start_index': 0,
  'end_index': 2},
 {'chunk': 'My ex-wife',
  'emotions': {'sadness': 0.9677837491035461,
   'anger': 0.009964397177100182,
   'disgust': 0.00702268909662962,
   'surprise': 0.006057017482817173,
   'joy': 0.004129399079829454,
   'neutral': 0.003566598054021597,
   'fear': 0.0014762701466679573},
  'start_index': 3,
  'end_index': 3},
 {'chunk': 'and I met when we were just kids, barely 18, with the ki

In [84]:
# Map each chunk's sentence indices back to character offsets in reddit_sample
for chunk in chunk_emotions:
    first_sentence = sentences[chunk['start_index']]
    last_sentence = sentences[chunk['end_index']]
    print(f"{first_sentence.start_char} - {last_sentence.end_char}")
    print(" Emotions:")
    for emotion, score in chunk['emotions'].items():
        print(f"  {emotion}: {score:.3f}")
    print()

0 - 307
 Emotions:
  sadness: 0.693
  fear: 0.214
  anger: 0.044
  disgust: 0.026
  neutral: 0.017
  joy: 0.003
  surprise: 0.002

307 - 317
 Emotions:
  sadness: 0.968
  anger: 0.010
  disgust: 0.007
  surprise: 0.006
  joy: 0.004
  neutral: 0.004
  fear: 0.001

318 - 615
 Emotions:
  joy: 0.631
  anger: 0.164
  fear: 0.092
  neutral: 0.046
  sadness: 0.043
  surprise: 0.020
  disgust: 0.004

616 - 680
 Emotions:
  neutral: 0.779
  surprise: 0.092
  sadness: 0.045
  disgust: 0.029
  anger: 0.028
  fear: 0.022
  joy: 0.006

680 - 708
 Emotions:
  neutral: 0.796
  disgust: 0.091
  anger: 0.075
  surprise: 0.018
  sadness: 0.014
  joy: 0.004
  fear: 0.002

709 - 826
 Emotions:
  neutral: 0.935
  disgust: 0.017
  joy: 0.017
  surprise: 0.014
  anger: 0.009
  sadness: 0.006
  fear: 0.001

827 - 1097
 Emotions:
  disgust: 0.335
  neutral: 0.283
  anger: 0.121
  surprise: 0.115
  fear: 0.093
  sadness: 0.048
  joy: 0.004

1097 - 1148
 Emotions:
  anger: 0.415
  disgust: 0.296
  surprise: 0.1
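The last cell maps each chunk's sentence indices to character offsets through the spaCy spans. This works because `bare_text` has exactly one entry per item in `sentences`. The offsets can then be used to slice each passage out of the original post, for example to highlight it by its dominant emotion. A minimal sketch on a toy string, with a hypothetical helper that takes `(start_char, end_char)` pairs in place of spaCy spans:

```python
def chunk_char_span(sentence_offsets, start_index, end_index):
    """Character span covered by sentences start_index..end_index (inclusive)."""
    return sentence_offsets[start_index][0], sentence_offsets[end_index][1]


text = "I left. It hurt. Time passed."
# (start_char, end_char) per sentence, as spaCy's Span.start_char / Span.end_char would give
offsets = [(0, 7), (8, 16), (17, 29)]
start, end = chunk_char_span(offsets, 0, 1)
print(start, end, repr(text[start:end]))  # 0 16 'I left. It hurt.'
```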