# Journal Analyzer Project

#### 0) Goal of this notebook 

Create a prototype that shows how well LLMs handle:
- Sentiment Analysis: understand user mood
- Clustering Themes: group similar topics from multiple journal entries
- Empathetic Response Generation: kind, helpful replies

#### 1) Setup


In [1]:
import torch
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import random
from sklearn.decomposition import LatentDirichletAllocation
import re
import spacy
import nltk
from nltk.corpus import stopwords
from collections import Counter
from flask import Flask, request, jsonify
import numpy as np
from rake_nltk import Rake
from transformers import AutoModelForCausalLM, AutoTokenizer
from pathlib import Path
from llama_cpp import Llama

#### 2) User Journal Entries

In [2]:
user_journal_entries = [
    "I met a pretty girl at uni and asked for her number, she gave it to me and smiled at me",
    "I went fishing this morning with my dad. It was a nice, relaxing morning.",
    "I felt a little uneasy catching and killing a fish.",
    "In class today I made a good comment and it made me feel smart.",
    "I texted the girl and she responded. We are going out on a date this weekend."
]                      

#### 3) Sentiment Analysis

In [15]:
device = 0 if torch.backends.mps.is_available() else -1

sentiment_pipeline = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    tokenizer="SamLowe/roberta-base-go_emotions",
    framework="pt",
    device=device
)

def get_sentiment(journal_entries: list[str]) -> list[str]:
    emotion_counts = Counter()

    for entry in journal_entries:
        results = sentiment_pipeline(entry)
        for r in results:
            emotion_counts[r['label']] += 1

    if emotion_counts:
        top_3 = [label for label, _ in emotion_counts.most_common(3)]
        return f'{top_3[0]}, {top_3[1]}, {top_3[2]}'
    return "neutral"

Device set to use mps:0


In [16]:
print(get_sentiment(user_journal_entries))

joy, nervousness, admiration


#### 4) Clustering Themes

In [13]:
def get_themes(journal_entries: list[str], top_n=3) -> list[str]:
    # Combine all journal entries into one string
    text = " ".join(journal_entries)
    
    # Initialize Rake with default settings
    r = Rake()
    
    # Extract keywords from the combined text
    r.extract_keywords_from_text(text)

    themes = [kw for _ , kw in r.get_ranked_phrases_with_scores()[:top_n]]
    
    # Get the top 'n' ranked key phrases (we want only the phrase, so we ignore the scores)
    return f'{themes[0]}, {themes[1]}, {themes[2]}'

In [14]:
print(get_themes(user_journal_entries))

little uneasy catching, went fishing, good comment


#### 5) Empathetic Response Generation

In [61]:
llm = Llama(model_path="MODEL/zephyr-quiklang-3b-4k.Q5_K_M.gguf", n_ctx=4096)

llama_model_load_from_file_impl: using device Metal (Apple M1) - 4404 MiB free
llama_model_loader: loaded meta data with 21 key-value pairs and 356 tensors from MODEL/zephyr-quiklang-3b-4k.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = stablelm
llama_model_loader: - kv   1:                               general.name str              = source
llama_model_loader: - kv   2:                    stablelm.context_length u32              = 4096
llama_model_loader: - kv   3:                  stablelm.embedding_length u32              = 2560
llama_model_loader: - kv   4:                       stablelm.block_count u32              = 32
llama_model_loader: - kv   5:               stablelm.feed_forward_length u32              = 6912
llama_model_loader: - kv   6:              stablelm.rope.dimension_count u32            

In [68]:
def get_empathy(user_responses, emotions):

    combined_responses = " ".join(user_responses)

    # Use Zephyr-style formatting
    prompt = f"""<|system|>
You are a calm, soothing, relaxed therapist and spiritual guru. You give short, gentle advice based on what the user is feeling. Keep it vague, uplifting, and one sentence max. Do not thank the user or explain what you're doing.
</s>
<|user|>
Today's stories: {combined_responses}
Emotions: {emotions}

Give one short, positive suggestion that helps me stay centered or feel okay.
</s>
<|assistant|>
"""

    output = llm(
        prompt,
        max_tokens=50,
        temperature=0.4,
        top_p=0.8,
        stop=["</s>"]  # Zephyr models stop properly on this
    )

    raw_text = output['choices'][0]['text'].strip()

    # Clean output to ensure it's a single sentence
    clean_text = re.sub(r'[^\w\s.,!?\'-]', '', raw_text)
    sentences = re.findall(r'[^.!?]*[.!?]', clean_text)
    return sentences[0].strip() if sentences else clean_text


In [69]:
print(get_empathy(user_journal_entries,get_sentiment(user_journal_entries)))

Llama.generate: 10 prefix-match hit, remaining 185 prompt tokens to eval
llama_perf_context_print:        load time =    5842.54 ms
llama_perf_context_print: prompt eval time =    3926.93 ms /   185 tokens (   21.23 ms per token,    47.11 tokens per second)
llama_perf_context_print:        eval time =    3578.42 ms /    33 runs   (  108.44 ms per token,     9.22 tokens per second)
llama_perf_context_print:       total time =    7532.33 ms /   218 tokens


Acknowledge your achievements and feelings, then remind yourself to be compassionate and enjoy your time on the date, knowing you've shared a positive experience with the girl.


#### 6) Testing & Research Summary

- Sentiment output was very accurate, with over a 0.9 accuracy per emotion (of 28 emotions).
- Theme extraction worked well for finding key events and feelings in the entries.
- Empathetic messages were at times very appropriate and contextual, but also could be random / off topic.
- Next step: evaluate with real user inputs or longer entries, and maybe fine-tune prompts for consistency.