# **Overview of the Message Generation and Evaluation Process**


This notebook generates and evaluates messages aligned with 15 psychological constructs drawn from four popular behavioral theories: self-determination theory, cognitive dissonance theory, social norms theory, and self-efficacy theory. The process works as follows:

### **1. Message Generation by Llama 3.3 70B**

- **Construct-Specific Customization**: Each prompt is built from the target construct's definition, example messages, and statements that differentiate it from the other constructs.
    - This information is used to create a tailored prompt that guides the LLM toward generating a message that aligns with the target construct.
    - The prompt includes explicit instructions on how the message should differentiate from competing constructs.
    
- **Context Selection**: Rotates through these task contexts:
    - People working on tasks requiring persistence and skill development
    - Individuals engaging with learning activities to develop expertise
    - Participants facing problem-solving scenarios testing their abilities
    - People working through a series of progressive challenges
    - Learners developing new skills through practice and persistence

- **Diversity Enhancement**: Multiple techniques are employed to ensure message diversity:
    - Diversity focuses: Each message focuses on a different aspect of the construct (e.g., "how this construct manifests in personal growth" vs. "practical day-to-day applications").
    - Structural variations: The system rotates through different message structures (question-answer format, when-then conditional format, comparison structure, cause-effect format).
    - Phrase avoidance: Key phrases from previous messages are extracted and explicitly listed as phrases to avoid.
    - Temperature modulation: The LLM's temperature parameter is increased for subsequent messages to encourage more creative outputs.

- **Message Validation and Refinement**:
    - After generation, each message undergoes cleaning to remove prefixes, quotes, and other artifacts.
    - The system checks if the new message is too semantically similar to previous messages.
    - If similarity exceeds a threshold (0.8 by default), the system regenerates the message with higher temperature.
    - The process allows up to 3 attempts to generate a sufficiently diverse message.
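
The validation-and-retry step above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: `generate` and `embed` are hypothetical callables standing in for the LLM call and the embedding model, and the starting temperature and step size are assumed values. Only the 0.8 threshold and the 3-attempt limit come from the description above.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # default threshold from the description above
MAX_ATTEMPTS = 3            # attempt limit from the description above
BASE_TEMPERATURE = 0.7      # assumed starting temperature
TEMPERATURE_STEP = 0.1      # assumed increase per retry


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate_diverse_message(generate, embed, previous_messages):
    """Regenerate until the message is dissimilar enough or attempts run out.

    generate(temperature) and embed(text) are hypothetical stand-ins for the
    LLM call and the embedding model.
    """
    previous_vectors = [embed(m) for m in previous_messages]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        temperature = BASE_TEMPERATURE + (attempt - 1) * TEMPERATURE_STEP
        message = generate(temperature)
        vector = embed(message)
        max_similarity = max(
            (cosine(vector, p) for p in previous_vectors), default=0.0
        )
        if max_similarity <= SIMILARITY_THRESHOLD:
            break  # diverse enough, stop retrying
    return message, max_similarity, attempt


# Toy demonstration with hand-made 2-D "embeddings" (illustrative values only).
toy_vectors = {
    "Keep practicing daily.": [1.0, 0.0],
    "Keep practicing every day.": [0.9, 0.1],
    "Ask a peer for feedback.": [0.0, 1.0],
}
queue = iter(["Keep practicing every day.", "Ask a peer for feedback."])
temperatures_used = []


def fake_generate(temperature):
    temperatures_used.append(temperature)
    return next(queue)


message, similarity, attempts = generate_diverse_message(
    fake_generate, toy_vectors.__getitem__, ["Keep practicing daily."]
)
print(message, attempts)
```

In the toy run, the first candidate is rejected as a near-duplicate of the earlier message, and the second candidate is accepted on the second attempt at a higher temperature. If every attempt exceeds the threshold, this sketch returns the last candidate anyway; whether the notebook does the same is not stated here.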

### **2. Semantic Similarity Measurement**

  - Uses SentenceTransformer to convert messages into embeddings
  - Calculates cosine similarity between messages
  - Generates diversity metrics (average similarity, maximum/minimum similarity, diversity score)
  - Assigns uniqueness scores to each message
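
The metrics listed above could be computed along these lines. The sketch uses plain Python and hand-made vectors so it runs without downloading a model; in the notebook the vectors would come from the SentenceTransformer model (for example via `model.encode(messages)`). The definitions of `diversity_score` (one minus the average pairwise similarity) and the per-message uniqueness score (one minus a message's mean similarity to the others) are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity_metrics(embeddings):
    """Pairwise cosine-similarity statistics for at least two embeddings."""
    n = len(embeddings)
    sims = {
        (i, j): cosine(embeddings[i], embeddings[j])
        for i, j in combinations(range(n), 2)
    }
    values = list(sims.values())
    average = sum(values) / len(values)

    # Assumed definition: uniqueness = 1 - mean similarity to all other messages.
    uniqueness = []
    for i in range(n):
        others = [s for pair, s in sims.items() if i in pair]
        uniqueness.append(1.0 - sum(others) / len(others))

    return {
        "average_similarity": average,
        "max_similarity": max(values),
        "min_similarity": min(values),
        "diversity_score": 1.0 - average,  # assumed definition
        "uniqueness_scores": uniqueness,
    }


# Toy embeddings: the first two messages are identical in direction,
# the third is orthogonal to both.
metrics = diversity_metrics([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(metrics["uniqueness_scores"])  # [0.5, 0.5, 1.0]
```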

### **3. Message Evaluation by GPT-4o**

- Creates a detailed prompt that provides the evaluator model with:
    - The message to evaluate
    - Target construct definition and description
    - Examples of messages that exemplify the construct
    - Differentiation information to distinguish the construct from others
    - A scoring rubric with detailed criteria for each score range (0-100%):
      - 95-100%: Perfect alignment with all aspects of the construct
      - 90-94%: Excellent alignment with nearly all aspects
      - 85-89%: Strong alignment with most aspects
      - 80-84%: Clear alignment with several key aspects
      - Lower ranges continue down to 0-49%: Poor alignment with, or contradiction of, the construct

- **LLM-Based Evaluation**:
    - Uses an OpenAI model (typically GPT-4o) to evaluate the message.
    - The model returns a structured JSON response with:
      - A target score for the primary construct (0-100%)
      - Ratings for all psychological constructs to assess differentiation
      - Feedback on message strengths and improvement areas
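
A structured response of this kind might be parsed as sketched below. The JSON field names (`target_score`, `construct_ratings`, `feedback`) and the sample values are hypothetical, since the notebook's actual schema is not shown in this overview. The sketch also assumes that the score difference used later for ranking is the target score minus the highest rating given to any competing construct.

```python
import json

# Hypothetical evaluator output; the field names and values are illustrative only.
raw_response = """
{
  "target_score": 88,
  "construct_ratings": {"Autonomy": 88, "Competence": 61, "Relatedness": 35},
  "feedback": {
    "strengths": "Emphasizes personal choice.",
    "improvements": "Avoid references to skill mastery."
  }
}
"""


def parse_evaluation(raw, target_construct):
    """Extract the target score and its margin over competing constructs."""
    data = json.loads(raw)
    ratings = data["construct_ratings"]
    competitors = [
        score for name, score in ratings.items() if name != target_construct
    ]
    highest_competitor = max(competitors, default=0)
    return {
        "target_score": data["target_score"],
        # Assumed definition of the differentiation margin.
        "score_difference": data["target_score"] - highest_competitor,
        "feedback": data.get("feedback", {}),
    }


result = parse_evaluation(raw_response, "Autonomy")
print(result["target_score"], result["score_difference"])  # 88 27
```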

### **4. Message Ranking**
  - Calculates a combined score: `target_score × 0.5 + score_difference × 0.3 + uniqueness_score × 0.2`
  - Ranks messages by combined score
  - Selects top messages as best examples for the target construct
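
The weighted ranking can be illustrated with a short sketch. The weights are the ones given above; the sample records are made up, and the sketch assumes all three inputs have already been scaled to the same 0-1 range, which the overview does not specify.

```python
def combined_score(target_score, score_difference, uniqueness_score):
    """Weighted sum using the weights stated above (0.5, 0.3, 0.2)."""
    return 0.5 * target_score + 0.3 * score_difference + 0.2 * uniqueness_score


# Illustrative records, all on an assumed 0-1 scale.
candidates = [
    {"id": "A", "target_score": 0.90, "score_difference": 0.30, "uniqueness_score": 0.50},
    {"id": "B", "target_score": 0.85, "score_difference": 0.50, "uniqueness_score": 0.70},
    {"id": "C", "target_score": 0.95, "score_difference": 0.10, "uniqueness_score": 0.20},
]

# Attach the combined score to each record.
for c in candidates:
    c["combined_score"] = combined_score(
        c["target_score"], c["score_difference"], c["uniqueness_score"]
    )

# Rank from best to worst and keep the top two as best examples.
ranked = sorted(candidates, key=lambda c: c["combined_score"], reverse=True)
top_messages = ranked[:2]
print([c["id"] for c in ranked])  # ['B', 'A', 'C']
```

Message C has the highest target score but ranks last because it is poorly differentiated and close to the other messages, which shows how the two secondary terms can outweigh a small lead in alignment.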

# **Code**

## Imports and Setup Section

In [None]:
!pip install sentence-transformers together openai python-dotenv matplotlib numpy scikit-learn ipywidgets

import os
import json
import time
import random
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from together import Together
from openai import OpenAI

# API Key setup
from google.colab import userdata
from google.colab import files, drive

TOGETHER_API_KEY = userdata.get('TOGETHER_API_KEY')
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
# TOGETHER_API_KEY = input("Enter your TOGETHER API key: ")
# OPENAI_API_KEY = input("Enter your OPENAI API key: ")

# Load the sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

drive.mount('/content/drive')
storage = os.path.join("/content/drive/My Drive", "Suvo Pedro Shared Folder", "Python code for message generation + results")  ## Change to your destination folder in Google Drive





In [None]:
# @title
# Resources shared by the generator and the evaluator

task_contexts = [
    """People are working on tasks that require persistence and skill development.
    Some might take shortcuts, while others invest in mastering the challenge properly
    through honest effort and commitment to improvement.""",

    """Individuals are engaging with learning activities where they can develop expertise over time.
    Their approach to these activities reflects their values and motivations, with some choosing
    authentic skill development and others looking for easier paths.""",

    """Participants are faced with problem-solving scenarios that test their abilities.
    How they approach these problems varies based on their personal goals and ethics,
    with a choice between genuine mastery and taking shortcuts.""",

    """People are working through a series of challenges that build upon one another.
    Their commitment to honest effort determines both their skill growth and self-perception,
    creating opportunities for authentic achievement or temptations to cut corners.""",

    """Learners are developing new skills through practice and persistence.
    Their choices about whether to take the time to develop genuine capability or
    to find shortcuts reflect their approach to personal development."""
]

all_constructs = {
    # SDT constructs
    "Autonomy": {
        "theory": "Self-Determination Theory (SDT)",
        "description": "Autonomy refers to the intrinsic motivation arising from an individual's experience of volition and self-regulation, with a sense of ownership over their choices and actions. It emphasizes acting according to personal values without external pressure.",
        "examples": [
            "Choose your own approach to these tasks based on what works best for you. Set personal goals that align with your values. Your authentic choices matter more than external expectations. When you make decisions based on your own judgment, you develop a deeper connection to your work.",
            "Trust your instincts about which method fits your learning style best. The freedom to determine your own path creates more meaningful outcomes. You can customize your approach without seeking permission or validation. Your genuine choices reflect your unique perspective and priorities."
        ],
        "differentiation": {
            "from_Competence": "Unlike Competence (skill development), Autonomy focuses on freedom of choice regardless of skill level.",
            "from_Relatedness": "Unlike Relatedness (social connections), Autonomy focuses on individual independence without reference to others.",
            "from_Self-concept": "Unlike Self-concept (identity alignment), Autonomy concerns only making choices freely, not identity reflection.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Autonomy emphasizes freedom from external constraints.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Autonomy emphasizes positive experiences without negative emotions.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Autonomy focuses on freedom without resolving cognitions.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (past achievements), Autonomy focuses on present freedom regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Autonomy focuses on personal choice.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Autonomy focuses on present freedom and self-direction.",
            "from_Emotional arousal": "Unlike Emotional arousal (affective states), Autonomy focuses on cognitive aspects of choice.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Autonomy emphasizes individual choice regardless of group norms.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Autonomy rejects external standards for personal choice.",
            "from_Social Sanctions": "Unlike Social Sanctions (social consequences), Autonomy emphasizes freedom from social judgment.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group alignment), Autonomy emphasizes independence from group standards."
        }
    },
    "Competence": {
        "theory": "Self-Determination Theory (SDT)",
        "description": "Competence refers to the innate need to feel effective and capable in activities, driven by a desire to master challenges through effort and persistence. It includes both task performance ability and the satisfaction from overcoming obstacles and learning new skills.",
        "examples": [
            "Each time you tackle a challenging task honestly, you're building valuable skills that make you more effective. The satisfaction of knowing your growing abilities are truly your own creates a genuine sense of competence that no shortcut can provide.",
            "As you practice solving difficult problems through your own effort, you're developing abilities that transfer to many situations. The challenge might seem difficult at first, but your brain is quietly mastering these skills, and you'll experience a deeper sense of capability with each honest attempt."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Competence emphasizes skill development regardless of choice freedom.",
            "from_Relatedness": "Unlike Relatedness (social connections), Competence focuses on individual skill mastery.",
            "from_Self-concept": "Unlike Self-concept (identity), Competence is about objective capability rather than identity reflection.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Competence addresses progressive skill development.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Competence emphasizes positive mastery experiences.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Competence focuses on skill building.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (specific achievements), Competence focuses on ongoing skill development.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Competence focuses on direct personal experience.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Competence emphasizes developing actual capabilities rather than belief in capabilities.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Competence focuses on cognitive aspects of mastery.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Competence emphasizes individual skill development.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Competence focuses on personal skill mastery.",
            "from_Social Sanctions": "Unlike Social Sanctions (external judgment), Competence emphasizes internal satisfaction from skill development.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Competence focuses on individual capability."
        }
    },
    "Relatedness": {
        "theory": "Self-Determination Theory (SDT)",
        "description": "Relatedness refers to the inherent human need for social connection, belonging, and feeling understood by others. It encompasses feelings of companionship, acceptance, and validation essential for psychological well-being and motivation.",
        "examples": [
            "Sharing your experiences with others builds meaningful connections during this journey. Your contributions help everyone learn and grow together. Supporting others through challenges creates bonds of mutual trust. Knowing that others face similar struggles makes the work more meaningful for everyone involved.",
            "Working together gives us different perspectives to solve problems better. Your unique insights help the whole group succeed. Asking questions often helps others clarify their own thinking too. The connections we form through collaboration create value beyond just completing tasks. We achieve more when we support each other's authentic effort."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (independence), Relatedness focuses on interdependence and connection with others.",
            "from_Competence": "Unlike Competence (skill development), Relatedness emphasizes social bonds regardless of skill level.",
            "from_Self-concept": "Unlike Self-concept (personal identity), Relatedness focuses on interpersonal connections.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Relatedness focuses on social harmony.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Relatedness emphasizes positive social connections.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Relatedness focuses on building social bonds.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Relatedness focuses on connections regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Relatedness focuses on connecting without skill acquisition.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Relatedness focuses on genuine connection rather than motivation.",
            "from_Emotional arousal": "Unlike Emotional arousal (individual emotions), Relatedness focuses on shared social experiences.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Relatedness emphasizes emotional connections.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Relatedness focuses on genuine personal connections.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Relatedness emphasizes positive social bonds.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Relatedness focuses on quality of interpersonal connections."
        }
    },

    # CDT constructs
    "Self-concept": {
        "theory": "Cognitive Dissonance Theory (CDT)",
        "description": "Self-concept refers to an individual's cognitive representation of themselves, encompassing perceived identity, values, and traits. It involves the dynamic interplay between self-image, goals, and behaviors, influencing self-perception in relation to others and the world.",
        "examples": [
            "Your thoughtful approach to challenges reflects the kind of person you truly are. The persistence you show when facing obstacles demonstrates your authentic character. You value genuine understanding over quick fixes. Your commitment to honest effort aligns with your core values and shapes your achievements.",
            "Your willingness to tackle difficult tasks shows your genuine commitment to growth. You value finding the right solution, not just the quickest one. Your methodical approach reflects your belief in doing things with integrity. These actions align with your values as someone who takes pride in authentic accomplishment."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Self-concept emphasizes how actions reflect identity regardless of choice freedom.",
            "from_Competence": "Unlike Competence (skill mastery), Self-concept emphasizes identity regardless of actual skill level.",
            "from_Relatedness": "Unlike Relatedness (social connections), Self-concept emphasizes internal self-perception independent of relationships.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Self-concept focuses on consistent identity aspects.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Self-concept emphasizes identity without negative emotional states.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Self-concept maintains identity coherence.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (specific achievements), Self-concept involves broader identity traits.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Self-concept focuses on personal identity.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Self-concept focuses on internal identity perceptions.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Self-concept focuses on cognitive self-perception.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Self-concept emphasizes personal identity independent of group behaviors.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Self-concept focuses on internal self-perception.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Self-concept emphasizes internal identity without external judgments.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Self-concept focuses on individual identity traits."
        }
    },

    "Cognitive inconsistency": {
        "theory": "Cognitive Dissonance Theory (CDT)",
        "description": "Cognitive inconsistency refers to recognizing the discrepancy between one's cognitions, values, or goals and their current behavior or choices. It represents the initial stage of cognitive dissonance where individuals become aware of conflicting information within themselves.",
        "examples": [
            "You notice that you aim to complete tasks through your own honest effort, yet sometimes find yourself looking for shortcuts or assistance. You recognize this gap between your stated values and actual behaviors without judging yourself. This pattern appears consistently in different situations. You can observe both aspects of your approach existing side by side.",
            "You find yourself encouraging others to put in the time for thorough work while sometimes taking shortcuts yourself. You recognize these opposing tendencies in your approach to tasks. This contradiction exists in your daily habits without causing immediate distress. You see how these contrasting patterns operate in your routine. The inconsistency is simply a fact you've observed."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Cognitive inconsistency emphasizes contradictions regardless of choice freedom.",
            "from_Competence": "Unlike Competence (skill development), Cognitive inconsistency emphasizes conflicts between cognitions.",
            "from_Relatedness": "Unlike Relatedness (social connections), Cognitive inconsistency emphasizes internal cognitive conflicts.",
            "from_Self-concept": "Unlike Self-concept (stable identity), Cognitive inconsistency focuses on conflicts between cognition or behavior.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (emotional discomfort), Cognitive inconsistency emphasizes rational recognition of contradictions without emotional reaction.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Cognitive inconsistency focuses on initial recognition of contradictions without resolving them.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Cognitive inconsistency focuses on contradictions regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Cognitive inconsistency emphasizes internal conflicts.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Cognitive inconsistency focuses on internal contradictions regardless of external influence.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Cognitive inconsistency focuses on logical contradictions.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Cognitive inconsistency emphasizes internal contradictions.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Cognitive inconsistency focuses on internal contradictions.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Cognitive inconsistency emphasizes internal conflicts.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Cognitive inconsistency focuses on internal contradictions."
        }
    },
    "Dissonance arousal": {
        "theory": "Cognitive Dissonance Theory (CDT)",
        "description": "Dissonance arousal refers to the emotional discomfort or tension that arises when self-perception is threatened by inconsistency between actions, behaviors, or cognitions. It captures the motivational drive to resolve cognitive dissonance triggered by incongruence between values, goals, and behavior.",
        "examples": [
            "Taking shortcuts on tasks while valuing thorough work creates a growing sense of inner discomfort. Your unease increases when you receive praise for work that didn't reflect your best effort. This internal conflict between your actions and values feels increasingly stressful. The tension remains even after completing assignments, making it hard to focus on new tasks.",
            "Using assistance on problems you claimed to solve independently creates internal tension and anxiety. Each time you receive credit, your discomfort intensifies. The clash between your public image and private reality feels deeply unsettling. This feeling of unease follows you even into new challenges. Your mind repeatedly returns to this troubling inconsistency."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Dissonance arousal emphasizes negative emotional states from conflicting cognitions.",
            "from_Competence": "Unlike Competence (skill development), Dissonance arousal emphasizes emotional discomfort.",
            "from_Relatedness": "Unlike Relatedness (social connections), Dissonance arousal emphasizes internal psychological discomfort.",
            "from_Self-concept": "Unlike Self-concept (stable identity), Dissonance arousal focuses on emotional distress when identity is threatened.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (logical recognition), Dissonance arousal focuses on emotional discomfort from contradictions.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Dissonance arousal focuses on uncomfortable emotional state before resolution.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Dissonance arousal focuses on emotional discomfort.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Dissonance arousal emphasizes internal emotional discomfort.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Dissonance arousal focuses on internal emotional discomfort regardless of external feedback.",
            "from_Emotional arousal": "Unlike Emotional arousal (various emotions), Dissonance arousal specifically focuses on negative emotional states from contradictions.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Dissonance arousal emphasizes internal emotional discomfort.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Dissonance arousal focuses on internal discomfort.",
            "from_Social Sanctions": "Unlike Social Sanctions (social consequences), Dissonance arousal emphasizes internal psychological discomfort.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Dissonance arousal focuses on internal emotional discomfort."
        }
    },
    "Dissonance reduction": {
        "theory": "Cognitive Dissonance Theory (CDT)",
        "description": "Dissonance reduction is the psychological process of reconciling conflicting cognitions or behaviors to mitigate discomfort from perceived inconsistencies. It involves deliberate reconfiguration of thoughts, attitudes, or behaviors to restore consonance and alleviate dissonant states.",
        "examples": [
            "After feeling conflicted about using resources for help, you now view strategic assistance as part of effective learning while maintaining your commitment to honesty. This perspective aligns your actions with your goal of developing genuine understanding. You've created a balanced approach that values both independence and ethical use of resources. Your new framework eliminates the previous mental conflict and restores consistency between your behavior and values.",
            "To resolve the tension between wanting perfect results and needing to move forward, you've redefined success as making steady progress through honest effort. This new view bridges the gap between your high standards and practical constraints. You now set specific quality thresholds for different parts of your work. This thoughtful compromise reduces your previous anxiety and aligns your expectations with your actions."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Dissonance reduction emphasizes resolving cognitive conflicts.",
            "from_Competence": "Unlike Competence (skill development), Dissonance reduction emphasizes resolving psychological tension.",
            "from_Relatedness": "Unlike Relatedness (social connections), Dissonance reduction emphasizes resolving internal conflicts.",
            "from_Self-concept": "Unlike Self-concept (stable identity), Dissonance reduction focuses on aligning contradictory aspects.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (identifying contradictions), Dissonance reduction focuses on resolving contradictions.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (emotional discomfort), Dissonance reduction focuses on alleviating that discomfort.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Dissonance reduction focuses on resolving cognitive conflicts.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Dissonance reduction emphasizes resolving internal conflicts.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Dissonance reduction focuses on internal resolution processes regardless of external input.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Dissonance reduction focuses on cognitive resolution processes.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Dissonance reduction emphasizes resolving internal conflicts.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Dissonance reduction focuses on internal resolution.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Dissonance reduction emphasizes internal psychological processes.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Dissonance reduction focuses on internal resolution."
        }
    },

    # SNT constructs
    "Descriptive Norms": {
        "theory": "Social Norm Theory (SNT)",
        "description": "Descriptive Norms refer to the perception of what is typical or common behavior in a group, emphasizing observation and understanding of others' actions as a guide for individual behavior. This focuses on how individuals perceive and internalize others' behaviors.",
        "examples": [
            "Most people who excel at these tasks spend extra time understanding the fundamentals before moving forward through honest effort. Successful participants typically review their work carefully before submitting. Many people find that taking short breaks improves their overall performance while maintaining integrity. Those who do well generally tackle challenging parts when their energy is highest, rather than taking shortcuts.",
            "You'll observe that effective learners often ask questions when they're uncertain rather than guessing or using shortcuts. People typically achieve better results when they follow instructions step by step with genuine effort. Most successful participants create a distraction-free environment while working. Many find that explaining concepts to themselves improves their understanding more than looking up answers."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (personal choice), Descriptive Norms focus on what most people actually do as a behavioral guide.",
            "from_Competence": "Unlike Competence (skill development), Descriptive Norms emphasize observing common behaviors regardless of skill.",
            "from_Relatedness": "Unlike Relatedness (emotional connections), Descriptive Norms emphasize behavioral patterns without personal relationships.",
            "from_Self-concept": "Unlike Self-concept (identity), Descriptive Norms focus on observed behaviors without identity reflection.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Descriptive Norms emphasize consistent group behavior patterns.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Descriptive Norms focus on observation without emotional discomfort.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Descriptive Norms emphasize observation without addressing inconsistencies.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Descriptive Norms focus on common behaviors regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Descriptive Norms focus on observing what people do without emphasizing learning.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Descriptive Norms focus on observed behaviors rather than explicit guidance.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Descriptive Norms focus on behavior patterns without emotional responses.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (what people should do), Descriptive Norms focus solely on what people typically do.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Descriptive Norms focus on observation without rewards or punishments.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group identity), Descriptive Norms focus on observing behaviors without group identification."
        }
    },
    "Injunctive Norms": {
        "theory": "Social Norm Theory (SNT)",
        "description": "Injunctive Norms refer to community-level standards that dictate what behavior is considered \"right\" or \"acceptable,\" focusing on collective expectations and moral obligations guiding individual actions. They emphasize the \"should\" and \"ought\" aspects of behavior.",
        "examples": [
            "In our learning community, we value thorough understanding achieved through honest effort over quick completion. Everyone should take the time needed to produce quality work they can be proud of without resorting to shortcuts. We expect all participants to rely on their own understanding when completing tasks. Careful work based on genuine effort is essential for meaningful achievement. Respecting these standards leads to authentic growth.",
            "Our group believes in putting genuine effort into mastering each concept through legitimate practice. Participants should seek help when truly needed rather than searching for easy answers. Everyone is expected to represent their own work and understanding accurately. We value the learning process as much as the final outcome. Following these principles builds a foundation for lasting knowledge and skill development."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom from external influence), Injunctive Norms focus on external social expectations.",
            "from_Competence": "Unlike Competence (skill development), Injunctive Norms emphasize adherence to social standards regardless of skill.",
            "from_Relatedness": "Unlike Relatedness (personal connections), Injunctive Norms emphasize community standards rather than relationships.",
            "from_Self-concept": "Unlike Self-concept (identity), Injunctive Norms focus on external standards without reference to personal identity.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Injunctive Norms emphasize clear social standards.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Injunctive Norms focus on social approval without emotional states.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Injunctive Norms emphasize external standards.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Injunctive Norms focus on social standards regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Injunctive Norms emphasize what one should do.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Injunctive Norms focus on social expectations rather than personal capability feedback.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Injunctive Norms focus on social standards without emotional responses.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (what people do), Injunctive Norms focus on what people should do according to standards.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Injunctive Norms emphasize the standards rather than punishments.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group identification), Injunctive Norms focus on standards without personal identification."
        }
    },
    "Social Sanctions": {
        "theory": "Social Norm Theory (SNT)",
        "description": "Social Sanctions refer to the perceived consequences associated with conforming to or deviating from social norms, including both positive and negative outcomes. This emphasizes the subjective experience of being judged by others in relation to one's behavior.",
        "examples": [
            "People who take the time to understand concepts thoroughly through honest effort often receive recognition for their insightful contributions. Those who rush through work without genuine effort typically need to repeat tasks more often. Participants who demonstrate consistent integrity receive more opportunities for advancement. Those who take shortcuts often find themselves struggling with later, more complex material. Your approach to these tasks influences how others perceive your commitment to quality.",
            "Learners who show dedication to mastering difficult concepts through authentic effort earn respect from peers and instructors. Those who regularly seek unauthorized shortcuts miss valuable learning opportunities that benefit others. People who maintain high personal standards of honesty find doors opening to new challenges and responsibilities. Those who prioritize appearance over substance eventually find their progress limited. The reputation you build through your ethical work approach follows you to future opportunities."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom from influence), Social Sanctions focus on external judgments and consequences.",
            "from_Competence": "Unlike Competence (skill development), Social Sanctions emphasize social consequences regardless of capability.",
            "from_Relatedness": "Unlike Relatedness (genuine connections), Social Sanctions emphasize evaluative judgments rather than authentic relationships.",
            "from_Self-concept": "Unlike Self-concept (identity), Social Sanctions focus on external judgments without personal identity.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Social Sanctions emphasize external consequences.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (internal discomfort), Social Sanctions focus on anticipated external judgments.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Social Sanctions emphasize external consequences.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Social Sanctions focus on social consequences regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Social Sanctions emphasize consequences of behavior.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Social Sanctions focus on community reactions rather than specific capability feedback.",
            "from_Emotional arousal": "Unlike Emotional arousal (personal emotions), Social Sanctions focus on anticipated social judgments.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (what people do), Social Sanctions emphasize consequences of conforming or deviating.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (standards themselves), Social Sanctions emphasize consequences of adherence or violation.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Social Sanctions focus on consequences without group identification."
        }
    },
    "Reference Group Identification": {
        "theory": "Social Norm Theory (SNT)",
        "description": "Reference Group Identification captures the extent to which individuals define themselves by membership in a particular social group, prioritizing alignment with its values and norms over personal preferences. This emphasizes the self-referential process of identifying with a specific collective.",
        "examples": [
            "As a member of this learning community, you value thorough understanding through honest effort over taking shortcuts. You feel a sense of pride when upholding our shared commitment to genuine mastery through ethical approaches. Your approach to challenges reflects the high standards of integrity our group maintains. Being part of this community influences how you tackle difficult problems. Your connection to fellow learners who value authentic achievement guides your choices.",
            "Your identity as someone who values deep, honest learning shapes how you approach these tasks. You evaluate your progress based on the principles of integrity our community respects. The satisfaction of representing our group's dedication to quality motivates your ethical efforts. You naturally consider whether your work reflects the standards of honesty we collectively value. Your membership in this community of committed ethical learners influences your approach to challenges."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (independence), Reference Group Identification focuses on alignment with group values and identity.",
            "from_Competence": "Unlike Competence (skill development), Reference Group Identification emphasizes group membership regardless of skill.",
            "from_Relatedness": "Unlike Relatedness (personal connections), Reference Group Identification emphasizes identity with a collective.",
            "from_Self-concept": "Unlike Self-concept (personal identity), Reference Group Identification focuses on how group membership shapes identity.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Reference Group Identification emphasizes consistent alignment with group values.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Reference Group Identification focuses on positive identification.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Reference Group Identification emphasizes group alignment.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (achievements), Reference Group Identification focuses on group membership regardless of success.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Reference Group Identification emphasizes identity with the group.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Reference Group Identification focuses on group belonging rather than capability feedback.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Reference Group Identification focuses on social identity.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (what people do), Reference Group Identification emphasizes personal identification with the group.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community standards), Reference Group Identification emphasizes personal identification with standards as identity.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Reference Group Identification emphasizes voluntary alignment with group values."
        }
    },

    # SET constructs
    "Performance accomplishments": {
        "theory": "Self-Efficacy Theory (SET)",
        "description": "Performance accomplishments refer to the belief that one's past successes serve as concrete evidence of capability to perform specific tasks or behaviors. This emphasizes the direct connection between prior accomplishments and future performance, allowing individuals to infer competence from their track record.",
        "examples": [
            "Remember how you solved that challenging problem last week through your own honest effort? That achievement proves you have the skills needed for today's tasks. You've already demonstrated your ability to understand complex instructions before. The quality improvement in your recent work shows your growing capability. These past achievements through legitimate effort are solid evidence of your ability.",
            "Your steady progress from basic to advanced concepts shows your ability to master new material through dedicated practice. You've already overcome several obstacles that seemed difficult at first through perseverance. Your successful completion of similar tasks through your own effort proves you can handle this challenge too. Your track record of honest achievement is the best predictor of your continued success."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Performance accomplishments emphasizes specific past achievements.",
            "from_Competence": "Unlike Competence (general skill development), Performance accomplishments focuses on concrete past achievements as evidence.",
            "from_Relatedness": "Unlike Relatedness (social connections), Performance accomplishments emphasizes personal achievements.",
            "from_Self-concept": "Unlike Self-concept (broad identity), Performance accomplishments focuses narrowly on specific achievements.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Performance accomplishments emphasizes consistent success patterns.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Performance accomplishments focuses on positive achievement history.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Performance accomplishments emphasizes achievement history.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Performance accomplishments emphasizes direct personal achievements.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Performance accomplishments emphasizes one's own proven track record rather than others' feedback.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Performance accomplishments focuses on concrete achievements.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Performance accomplishments emphasizes personal achievement history.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Performance accomplishments focuses on personal achievements.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Performance accomplishments emphasizes personal achievement history.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Performance accomplishments focuses on individual achievements."
        }
    },
    "Vicarious experience": {
        "theory": "Self-Efficacy Theory (SET)",
        "description": "Vicarious experience refers to acquiring confidence and motivation by observing others' successful experiences, achievements, and behaviors. It emphasizes how witnessing peers overcome challenges enhances one's own belief in their capacity for similar success.",
        "examples": [
            "Notice how others with similar backgrounds have successfully completed these challenging tasks through persistent honest effort. They started with the same questions you have now. Their step-by-step progress shows a clear path forward based on genuine work. Watch how they tackle difficult sections with persistence and integrity. Their success shows these challenges are absolutely conquerable with authentic effort.",
            "Seeing your colleagues work through these problems demonstrates effective approaches you can adopt while maintaining your commitment to honest work. They faced the same initial confusion yet found their way forward through genuine effort. Their methods of breaking down complex tasks into manageable steps are strategies you can apply to your own learning journey. Their achievements show what's possible with consistent, ethical effort."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (personal choice), Vicarious experience emphasizes learning from others' behaviors.",
            "from_Competence": "Unlike Competence (direct skill development), Vicarious experience focuses on observational learning.",
            "from_Relatedness": "Unlike Relatedness (emotional connections), Vicarious experience emphasizes observing others' successes.",
            "from_Self-concept": "Unlike Self-concept (identity), Vicarious experience focuses on learning from others.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Vicarious experience emphasizes learning from consistent patterns.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Vicarious experience focuses on positive observational learning.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Vicarious experience emphasizes observational learning.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (personal achievements), Vicarious experience focuses on learning from others' successes.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement and feedback), Vicarious experience focuses on observing rather than hearing about capabilities.",
            "from_Emotional arousal": "Unlike Emotional arousal (emotions), Vicarious experience focuses on cognitive learning processes.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Vicarious experience emphasizes learning derived from observation.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Vicarious experience focuses on observational learning.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Vicarious experience emphasizes learning opportunities.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Vicarious experience focuses on learning from others."
        }
    },
    "Verbal persuasion": {
        "theory": "Self-Efficacy Theory (SET)",
        "description": "Verbal persuasion refers to encouragement, feedback, and expressions of confidence from others that influence an individual's belief in their capabilities. It emphasizes how external validation and guidance can strengthen one's conviction in their ability to succeed at specific tasks.",
        "examples": [
            "Your thoughtful approach to problems shows you have a natural talent for this work when you apply yourself honestly. The insightful questions you ask demonstrate your growing understanding of key concepts. Your progress so far indicates you'll do well with these more challenging sections if you continue to put in genuine effort. Your careful attention to details shows your commitment to quality work.",
            "I've seen how quickly you picked up earlier concepts through your own honest effort, which shows your strong learning ability. Your systematic method of working through problems is exactly what these tasks require. The improvements in your recent work demonstrate your capacity to master this material through authentic practice. Your persistence when facing obstacles will serve you well here."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom from external influence), Verbal persuasion emphasizes the positive impact of others' encouragement.",
            "from_Competence": "Unlike Competence (actual skill development), Verbal persuasion focuses on beliefs about capabilities based on others' feedback.",
            "from_Relatedness": "Unlike Relatedness (emotional connection), Verbal persuasion focuses on motivational aspects of social communication.",
            "from_Self-concept": "Unlike Self-concept (identity), Verbal persuasion focuses on specific task-related confidence.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Verbal persuasion emphasizes consistent encouraging feedback.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort), Verbal persuasion focuses on positive reinforcement.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Verbal persuasion emphasizes building confidence.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (personal history), Verbal persuasion relies on others' evaluations and encouragement.",
            "from_Vicarious experience": "Unlike Vicarious experience (observing others), Verbal persuasion focuses on direct communication and feedback.",
            "from_Emotional arousal": "Unlike Emotional arousal (physiological states), Verbal persuasion focuses on externally provided confidence.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Verbal persuasion emphasizes personal encouragement.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community standards), Verbal persuasion focuses on specific capability feedback.",
            "from_Social Sanctions": "Unlike Social Sanctions (consequences), Verbal persuasion emphasizes positive encouragement.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Verbal persuasion focuses on personal capability feedback."
        }
    },
    "Emotional arousal": {
        "theory": "Self-Efficacy Theory (SET)",
        "description": "Emotional Arousal refers to the intense, personally relevant emotional states experienced during and after successful or challenging tasks. It encompasses positive emotions like excitement, pride, and satisfaction directly tied to perceived mastery and accomplishment.",
        "examples": [
            "Notice the deep satisfaction you feel when solving a difficult problem through your own honest efforts. That sense of genuine accomplishment creates a positive energy that fuels further progress. Your confidence grows with each challenge you overcome independently. The excitement of discovery makes the ethical approach worthwhile. These authentic positive feelings strengthen your connection to the work.",
            "Pay attention to that moment of clarity when a confusing concept suddenly makes sense through your own effort. Your mind feels sharper and more focused when you're fully engaged in challenging tasks with integrity. The tension of struggle followed by the relief of genuine understanding creates a rewarding cycle. That feeling of authentic mastery motivates you to take on greater challenges. These emotional responses make honest learning deeply satisfying."
        ],
        "differentiation": {
            "from_Autonomy": "Unlike Autonomy (freedom of choice), Emotional arousal emphasizes affective responses.",
            "from_Competence": "Unlike Competence (skill development), Emotional arousal emphasizes emotional responses to achievements.",
            "from_Relatedness": "Unlike Relatedness (social connections), Emotional arousal emphasizes personal emotional experiences.",
            "from_Self-concept": "Unlike Self-concept (identity), Emotional arousal focuses on momentary emotional experiences.",
            "from_Cognitive inconsistency": "Unlike Cognitive inconsistency (contradictions), Emotional arousal emphasizes affective responses.",
            "from_Dissonance arousal": "Unlike Dissonance arousal (discomfort from contradictions), Emotional arousal includes any task-related emotions.",
            "from_Dissonance reduction": "Unlike Dissonance reduction (resolving conflicts), Emotional arousal emphasizes emotional experiences.",
            "from_Performance accomplishments": "Unlike Performance accomplishments (concrete achievements), Emotional arousal focuses on emotional responses to achievements.",
            "from_Vicarious experience": "Unlike Vicarious experience (learning from others), Emotional arousal emphasizes direct personal emotional experiences.",
            "from_Verbal persuasion": "Unlike Verbal persuasion (encouragement from others), Emotional arousal focuses on internal feeling states rather than external feedback.",
            "from_Descriptive Norms": "Unlike Descriptive Norms (common behaviors), Emotional arousal emphasizes personal emotional experiences.",
            "from_Injunctive Norms": "Unlike Injunctive Norms (community approval), Emotional arousal focuses on personal emotions.",
            "from_Social Sanctions": "Unlike Social Sanctions (social consequences), Emotional arousal emphasizes personal emotional experiences.",
            "from_Reference Group Identification": "Unlike Reference Group Identification (group membership), Emotional arousal focuses on individual emotional experiences."
        }
    },
}
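
Because every construct is expected to carry a `from_<other construct>` entry for each of the other constructs, a quick consistency check can catch a forgotten or misspelled key before prompts are built. The following is a minimal sketch under that assumption; the function name `find_missing_differentiations` and the abbreviated `toy_constructs` data are hypothetical and not part of the notebook.

```python
def find_missing_differentiations(constructs):
    """Map each construct to the other constructs it has no 'from_<name>' entry for."""
    missing = {}
    for name, spec in constructs.items():
        # Every other construct should appear as a "from_<name>" key
        expected = {f"from_{other}" for other in constructs if other != name}
        absent = sorted(expected - set(spec.get("differentiation", {})))
        if absent:
            missing[name] = [key[len("from_"):] for key in absent]
    return missing


# Hypothetical, abbreviated data that mirrors the structure of the dictionary above
toy_constructs = {
    "Descriptive Norms": {
        "differentiation": {"from_Injunctive Norms": "...", "from_Social Sanctions": "..."}
    },
    "Injunctive Norms": {
        "differentiation": {"from_Descriptive Norms": "..."}
    },
    "Social Sanctions": {
        "differentiation": {"from_Descriptive Norms": "...", "from_Injunctive Norms": "..."}
    },
}

print(find_missing_differentiations(toy_constructs))
```

An empty result would indicate that every construct differentiates itself from all of the others.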

## Helper Functions Section
These utility functions facilitate user interaction and handle data conversion for the message generation and evaluation process.

In [None]:
# Helper functions for user input

def get_user_input(prompt, default=None, cast_func=str):
    """Ask user for input with a default value."""
    if default is not None:
        prompt = f"{prompt} (default: {default}): "
    else:
        prompt = f"{prompt}: "

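    user_input = input(prompt).strip()
    if not user_input:
        return default

    # Fall back to the default if the input cannot be converted (e.g. "abc" for an int)
    try:
        return cast_func(user_input)
    except (ValueError, TypeError):
        print(f"Invalid value '{user_input}'; using default: {default}")
        return default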

def default_serializer(obj):
    """Custom serializer for objects that json cannot encode natively."""
    if hasattr(obj, 'to_json'):
        return obj.to_json()  # If object has a JSON method
    if isinstance(obj, plt.Axes):
        return "Matplotlib Axes object (not serializable)"  # Avoid dumping it
    return str(obj)  # Convert any other unknown objects to string
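
A serializer like this is typically passed to `json.dumps` through its `default` parameter, which is called only for objects the encoder cannot handle on its own. The sketch below illustrates that mechanism with a simplified stand-in, `fallback_serializer`, that omits the Matplotlib branch so it runs without extra dependencies; the `Score` class and the sample record are hypothetical.

```python
import json
from datetime import date


def fallback_serializer(obj):
    """Simplified stand-in for default_serializer (no Matplotlib check)."""
    if hasattr(obj, "to_json"):
        return obj.to_json()
    return str(obj)


class Score:
    """Hypothetical result object that knows how to describe itself."""

    def __init__(self, value):
        self.value = value

    def to_json(self):
        return {"value": self.value}


# date, Score and set are not JSON-serializable by default
record = {"run_date": date(2025, 1, 15), "score": Score(4), "tags": {"a"}}
encoded = json.dumps(record, default=fallback_serializer)
print(encoded)
```

Note that anything handled by the final `str(obj)` branch is stored as a string, so it will not round-trip back to its original type when the file is loaded.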

## Semantic Analysis Section
These functions measure and analyze semantic similarity between generated messages, ensuring diversity and preventing repetition in the message set.

In [None]:
# Functions for measuring message similarity and semantic analysis

def calculate_semantic_similarity(message1, message2):
    """Calculate semantic similarity between two messages using cosine similarity.

    Args:
        message1: First message
        message2: Second message

    Returns:
        float: Cosine similarity in [-1, 1]; sentence embeddings of related text typically score between 0 and 1
    """
    # Encode the messages to get their embeddings
    embeddings = model.encode([message1, message2], convert_to_tensor=True)

    # Convert to numpy and calculate cosine similarity
    embedding1 = embeddings[0].cpu().numpy().reshape(1, -1)
    embedding2 = embeddings[1].cpu().numpy().reshape(1, -1)

    sim_score = cosine_similarity(embedding1, embedding2)[0][0]
    # Return a plain Python float rather than a NumPy scalar
    return float(sim_score)

def check_message_similarity(new_message, previous_messages, threshold=0.8):
    """Check if new message is too semantically similar to previous ones.

    Args:
        new_message: New message to check
        previous_messages: List of previous messages
        threshold: Similarity threshold (0-1)

    Returns:
        bool: True if too similar, False otherwise
    """
    if not previous_messages:
        return False

    for prev_msg in previous_messages:
        similarity = calculate_semantic_similarity(new_message, prev_msg)
        if similarity > threshold:
            return True
    return False

def calculate_pairwise_similarities(messages):
    """Calculate pairwise semantic similarities between all messages.

    Args:
        messages: List of messages

    Returns:
        list: List of tuples (pair_name, similarity_score)
    """
    if len(messages) <= 1:
        return []

    # Encode all messages at once for efficiency
    embeddings = model.encode(messages, convert_to_tensor=True)

    # Convert to numpy for cosine_similarity calculation
    embeddings_np = embeddings.cpu().numpy()

    # Calculate full similarity matrix
    similarity_matrix = cosine_similarity(embeddings_np)

    # Extract pairwise similarities (upper triangle of the matrix)
    pairs = []
    similarities = []
    for i in range(len(messages)):
        for j in range(i+1, len(messages)):
            pairs.append(f"M{i+1}-M{j+1}")
            similarities.append(similarity_matrix[i, j])

    return list(zip(pairs, similarities))

def calculate_message_similarity_metrics(message, previous_messages):
    """Calculate similarity metrics between a message and all previous messages.

    Args:
        message: Current message
        previous_messages: List of all previous messages

    Returns:
        dict: Similarity metrics
    """
    metrics = {"has_previous": False}

    if len(previous_messages) > 0:
        metrics["has_previous"] = True

        # Compare with the last message
        prev_message = previous_messages[-1]
        similarity = calculate_semantic_similarity(message, prev_message)
        metrics["last_message_similarity"] = similarity

        # If we have multiple messages, calculate avg similarity with all previous
        if len(previous_messages) > 1:
            all_similarities = []
            for prev_msg in previous_messages:
                sim = calculate_semantic_similarity(message, prev_msg)
                all_similarities.append(sim)

            metrics["all_similarities"] = all_similarities
            metrics["avg_similarity"] = sum(all_similarities) / len(all_similarities)
            metrics["max_similarity"] = max(all_similarities)
            metrics["min_similarity"] = min(all_similarities)

    return metrics
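
To make the similarity logic concrete without loading an embedding model, the sketch below reproduces the same three ideas (cosine similarity, the threshold check, and the `M1-M2` pair labels) on small hand-written vectors. It is a pure-Python stand-in, not the notebook's implementation: the helper names and the toy vectors are hypothetical, and real message embeddings would have hundreds of dimensions.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def too_similar(new_vec, previous_vecs, threshold=0.8):
    """True if new_vec exceeds the threshold against any previous vector."""
    return any(cosine_sim(new_vec, prev) > threshold for prev in previous_vecs)


def pairwise_labels(vectors):
    """Label each unordered pair as 'M<i>-M<j>' with its similarity."""
    return [
        (f"M{i + 1}-M{j + 1}", cosine_sim(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]


# Toy "embeddings": the first two point in nearly the same direction
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(too_similar(toy[1], [toy[0]]))   # near-duplicate of the first vector
print(too_similar(toy[2], toy[:2]))    # orthogonal to the first, far from the second
for label, score in pairwise_labels(toy):
    print(label, round(score, 3))
```

With three messages there are three pairs; in general `n` messages yield `n * (n - 1) / 2` pairs, which is why the notebook encodes all messages once and reads the upper triangle of the similarity matrix.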

## LLM Models Setup Section
These functions configure the Llama message generator and the GPT-4o evaluator with the appropriate parameters and API connections for the generation and evaluation pipeline.
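
Both setup functions take an API key as an argument and fall back to a placeholder when it is missing. One common way to supply the keys is through environment variables; the sketch below assumes the variable names `TOGETHER_API_KEY` (mentioned in the generator's warning message) and `OPENAI_API_KEY` (an assumption here), and the helper `read_api_keys` is hypothetical.

```python
import os


def read_api_keys(env=None):
    """Return (keys, missing) where missing lists providers without a key."""
    env = os.environ if env is None else env
    keys = {
        "together": env.get("TOGETHER_API_KEY"),
        "openai": env.get("OPENAI_API_KEY"),  # assumed variable name
    }
    missing = [name for name, value in keys.items() if not value]
    return keys, missing


# Demonstration with a fake environment so no real credentials are needed
keys, missing = read_api_keys({"TOGETHER_API_KEY": "example-key"})
print(missing)
```

Reporting the missing providers up front makes it clear which part of the pipeline will run with a placeholder instead of a real model.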

In [None]:
# Functions to set up the generator and evaluator

def setup_generator(generator_model, generator_temp, together_api_key):
    """Set up the text generator and return the function and configuration.

    Args:
        generator_model: Model name to use for generation
        generator_temp: Temperature for generation
        together_api_key: API key for Together.ai

    Returns:
        tuple: (generation_function, config_dict)
    """
    # Model parameters are the same whether or not an API key is available
    generator_config = {
        "model": generator_model,
        "temperature": generator_temp,
        "top_p": 0.95,
        "max_tokens": 2048
    }

    if together_api_key:
        # Set up Together.ai client
        together_client = Together(api_key=together_api_key)

        # Function to generate text with Llama
        def generate_text(prompt):
            response = together_client.chat.completions.create(
                model=generator_config["model"],
                messages=[{"role": "user", "content": prompt}],
                temperature=generator_config["temperature"],
                top_p=generator_config["top_p"],
                max_tokens=generator_config["max_tokens"]
            )
            return response.choices[0].message.content.strip()
    else:
        print("Warning: TOGETHER_API_KEY not found. Llama model will not be available.")

        # Placeholder generator so the rest of the pipeline can still run
        def generate_text(prompt):
            return "API key missing - unable to generate text"

    return generate_text, generator_config

def setup_evaluator(evaluator_model, evaluator_temp, openai_api_key):
    """Set up the message evaluator and return the function and configuration.

    Args:
        evaluator_model: Model name to use for evaluation
        evaluator_temp: Temperature for evaluation
        openai_api_key: API key for OpenAI

    Returns:
        tuple: (evaluation_function, config_dict)
    """
    if openai_api_key:
        # Set up OpenAI client
        openai_client = OpenAI(api_key=openai_api_key)

        # Configure model parameters
        evaluator_config = {
            "model": evaluator_model,
            "temperature": evaluator_temp,
            "max_tokens": 2048
        }

        # Function to evaluate text with OpenAI
        def evaluate_message(message, construct_name, context=None):
            # Create the evaluation prompt
            evaluation_prompt = create_evaluation_prompt(message, construct_name, context)

            response = openai_client.chat.completions.create(
                model=evaluator_config["model"],
                messages=[{"role": "user", "content": evaluation_prompt}],
                temperature=evaluator_config["temperature"],
                max_tokens=evaluator_config["max_tokens"],  # Apply the configured output limit
                response_format={"type": "json_object"}
            )

            # Parse JSON response
            result_text = response.choices[0].message.content
            try:
                result = json.loads(result_text)
                score = result["score"]  # Raises KeyError if the overall score is missing
                if construct_name not in result["ratings"]:
                    # If target construct is missing, add it using the overall score
                    result["ratings"][construct_name] = score
                return result
            except (json.JSONDecodeError, KeyError, TypeError):
                # Fallback if the response is not valid JSON or lacks the expected keys
                print(f"Error parsing evaluation response: {str(result_text)[:100]}...")
                return {
                    "score": 50,
                    "ratings": {construct_name: 50},
                    "feedback": {
                        "strengths": "Error parsing evaluation response",
                        "improvements": "Error parsing evaluation response",
                        "differentiation_tips": "Error parsing evaluation response"
                    }
                }
    else:
        print("Warning: OPENAI_API_KEY not found. Evaluation will not be available.")
        evaluate_message = lambda message, construct_name, context=None: {
            "score": 0,
            "ratings": {construct_name: 0},
            "feedback": {
                "strengths": "API key missing - unable to evaluate",
                "improvements": "API key missing - unable to evaluate",
                "differentiation_tips": "API key missing - unable to evaluate"
            }
        }
        evaluator_config = {
            "model": evaluator_model,
            "temperature": evaluator_temp,
            "max_tokens": 2048
        }

    return evaluate_message, evaluator_config

def create_evaluation_prompt(message, construct_name, context=None):
    """Create a detailed evaluation prompt for the message.

    Args:
        message: The message to evaluate
        construct_name: Name of the target construct
        context: Optional context to use

    Returns:
        str: Evaluation prompt
    """
    # Get construct info
    construct_info = all_constructs.get(construct_name, {})
    construct_description = construct_info.get("description", "")

    # Get examples
    construct_examples = construct_info.get("examples", [])
    examples_text = ""
    for i, example in enumerate(construct_examples, 1):
        examples_text += f"{i}. \"{example}\"\n"

    # Create differentiation text
    differentiation = construct_info.get("differentiation", {})
    differentiation_text = ""
    for other_construct, diff_description in differentiation.items():
        differentiation_text += f"- {diff_description}\n"

    # Prepare evaluation prompt - use angle brackets instead of square brackets
    evaluation_prompt = f"""
    Context: {context or "Task completion scenario with ethical considerations"}

    Message to evaluate: "{message}"

    Target Construct: {construct_name}

    Construct Description: {construct_description}

    Construct Examples: {examples_text}

    Construct Differentiation: {differentiation_text}

    IMPORTANT SCORING GUIDANCE:
    - Each message should align strongly with ONE primary construct - the target construct
    - When a message strongly aligns with the target construct, competing constructs MUST receive proportionally LOWER scores
    - High differentiation is essential - if target construct scores > 80%, competing constructs should score significantly lower
    - Avoid score inflation for non-target constructs - they should score at least 30% lower than the target construct
    - Be critical and demanding in your evaluation of alignment

    Evaluate how well this message aligns with the target construct {construct_name}.
    For each construct, assign a confidence score (0-100%) based on the following criteria:

    SCORING RUBRIC:
    - 95-100%: Message perfectly captures all aspects of the construct with ideal emphasis while completely avoiding elements of differentiated constructs. Message uses natural language that precisely captures the psychological mechanism and perfectly resembles the provided examples.
    - 90-94%: Message excellently captures nearly all aspects of the construct with appropriate emphasis while clearly avoiding most elements of differentiated constructs. Message uses language that very clearly captures the psychological mechanism and closely resembles the provided examples.
    - 85-89%: Message strongly captures most aspects of the construct with good emphasis while avoiding important elements of differentiated constructs. Message uses language that clearly captures the psychological mechanism and resembles the provided examples well.
    - 80-84%: Message clearly captures several key aspects of the construct and largely avoids elements of differentiated constructs. Message contains similar themes to the examples with only minimal overlap with related constructs.
    - 75-79%: Message adequately captures some important aspects of the construct but may include minor elements from differentiated constructs. Message shows similarity to examples but lacks precision in differentiating from other constructs.
    - 70-74%: Message conveys basic aspects of the construct but includes elements from one or two differentiated constructs. Message shows general similarity to examples but lacks precision.
    - 60-69%: Message only partially relates to the construct description and fails to maintain boundaries from multiple differentiated constructs. Message has limited similarity to examples.
    - 50-59%: Message tangentially relates to the construct description but primarily reflects aspects of differentiated constructs. Message has minimal similarity to examples.
    - 0-49%: Message contradicts the construct description or primarily exemplifies differentiated constructs. Message bears little resemblance to provided examples.

    Provide ratings for all psychological constructs, not just the target. Ensure proper differentiation between scores.

    Respond in this JSON format:
    {{
        "score": <score for the target construct>,
        "ratings": {{
            "Autonomy": <score>,
            "Competence": <score>,
            "Relatedness": <score>,
            ...
        }},
        "feedback": {{
            "strengths": <what the message does well>,
            "improvements": <how the message could better align with the target construct>,
            "differentiation_tips": <how to better differentiate from competing constructs>
        }}
    }}
    """

    return evaluation_prompt

## Results Processing Section
These functions process evaluation results into standardized formats, calculate summary statistics, and extract key metrics for analysis and visualization.
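
To make the selection metrics concrete, the sketch below computes the gap between the target score and the strongest competing construct, then a weighted combined score for a single message. The function name `score_message` and the sample ratings are illustrative assumptions. The weights (0.5, 0.3, 0.2) follow `calculate_message_metrics`, and the sketch assumes uniqueness is rescaled from 0-1 to 0-100 so that all three terms share one scale.

```python
def score_message(target_score, ratings, target_name, avg_similarity):
    """Hypothetical helper: per-message metrics on a common 0-100 scale."""
    competing = {k: v for k, v in ratings.items() if k != target_name}
    if competing:
        # Strongest competing construct and its distance from the target score
        rival, rival_score = max(competing.items(), key=lambda kv: kv[1])
        gap = float(target_score) - float(rival_score)
    else:
        rival, gap = None, 0.0

    uniqueness = 1.0 - avg_similarity  # Lower similarity means higher uniqueness
    # Assumption: uniqueness is multiplied by 100 before weighting
    combined = 0.5 * float(target_score) + 0.3 * gap + 0.2 * (uniqueness * 100)
    return {
        "competing_construct": rival,
        "score_difference": gap,
        "uniqueness_score": uniqueness,
        "combined_score": combined,
    }


# Made-up ratings for one message targeting "Autonomy"
sample = {"Autonomy": 90, "Competence": 60, "Relatedness": 45}
print(score_message(90, sample, "Autonomy", avg_similarity=0.25))
```

Without the rescaling step, a uniqueness value between 0 and 1 can shift the combined score by at most 0.2 points, which is negligible next to the two score terms.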

In [None]:
# Functions for processing and analyzing results

def process_evaluation_results(evaluations, construct_name):
    """Process evaluation results to ensure serializable data.

    Args:
        evaluations: List of evaluation results
        construct_name: Target construct name

    Returns:
        tuple: (serializable_ratings, evaluation_scores)
    """
    # Create a JSON-serializable copy of all ratings
    serializable_ratings = []
    for evaluation in evaluations:
        # Convert each rating dict to ensure all values are native Python types
        serializable_rating = {}
        for k, v in evaluation["ratings"].items():
            # Ensure we're storing native Python types, not numpy types
            if hasattr(v, "item"):  # Check if it's a numpy type
                serializable_rating[k] = v.item()
            else:
                serializable_rating[k] = float(v)
        serializable_ratings.append(serializable_rating)

    # Create a list of evaluation scores using native Python types
    evaluation_scores = []
    for e in evaluations:
        if hasattr(e["score"], "item"):  # Check if it's a numpy type
            evaluation_scores.append(e["score"].item())
        else:
            evaluation_scores.append(float(e["score"]))

    return serializable_ratings, evaluation_scores

def calculate_summary_statistics(evaluations, generation_times, evaluation_times):
    """Calculate summary statistics from test results.

    Args:
        evaluations: List of evaluation results
        generation_times: List of generation times
        evaluation_times: List of evaluation times

    Returns:
        dict: Summary statistics
    """
    # Calculate average score
    avg_score = np.mean([e["score"] for e in evaluations])
    if hasattr(avg_score, "item"):  # Convert numpy types to Python native types
        avg_score = avg_score.item()

    # Calculate average generation time
    avg_generation_time = np.mean(generation_times)
    if hasattr(avg_generation_time, "item"):
        avg_generation_time = avg_generation_time.item()

    # Calculate average evaluation time
    avg_evaluation_time = np.mean(evaluation_times)
    if hasattr(avg_evaluation_time, "item"):
        avg_evaluation_time = avg_evaluation_time.item()

    return {
        "avg_score": avg_score,
        "avg_generation_time": avg_generation_time,
        "avg_evaluation_time": avg_evaluation_time
    }

def calculate_diversity_metrics(messages):
    """Calculate diversity metrics for a set of messages.

    Args:
        messages: List of messages

    Returns:
        dict: Diversity metrics
    """
    if len(messages) <= 1:
        return {}

    # Calculate pairwise semantic similarities
    pairwise_similarities = calculate_pairwise_similarities(messages)

    # Extract just the similarity values
    similarities = [sim for _, sim in pairwise_similarities]

    # Calculate metrics
    avg_similarity = sum(similarities) / len(similarities)
    max_similarity = max(similarities)
    min_similarity = min(similarities)
    diversity_score = 1.0 - avg_similarity  # Inverse of average similarity

    return {
        "average_similarity": avg_similarity,
        "max_similarity": max_similarity,
        "min_similarity": min_similarity,
        "diversity_score": diversity_score,
        "pairwise_similarities": pairwise_similarities
    }

def calculate_message_metrics(messages, evaluations, construct_name, all_ratings):
    """Calculate metrics for all messages including target score, differentiation, and uniqueness.

    Args:
        messages: List of all messages
        evaluations: List of evaluation scores
        construct_name: Target construct name
        all_ratings: List of all ratings dictionaries

    Returns:
        list: List of message metric dictionaries
    """
    message_metrics = []

    for i, (message, eval_score, rating) in enumerate(zip(messages, evaluations, all_ratings)):
        # Calculate score difference from highest competing construct
        other_scores = {k: v for k, v in rating.items() if k != construct_name}
        score_difference = 0
        highest_competing_name = None

        if other_scores:
            highest_competing_item = max(other_scores.items(), key=lambda x: x[1])
            highest_competing_name = highest_competing_item[0]
            highest_competing = highest_competing_item[1]
            score_difference = float(eval_score) - highest_competing

        # Calculate message uniqueness (how different it is from all other messages)
        # Lower similarity = higher uniqueness/diversity
        total_similarity = 0
        comparison_count = 0

        for j, other_message in enumerate(messages):
            if i != j:  # Don't compare with itself
                similarity = calculate_semantic_similarity(message, other_message)
                total_similarity += similarity
                comparison_count += 1

        # Average similarity to all other messages (if there are any)
        avg_similarity = total_similarity / comparison_count if comparison_count > 0 else 0
        uniqueness_score = 1 - avg_similarity  # Convert to uniqueness (higher is better)

        # Store metrics for this message
        message_metrics.append({
            "index": i,
            "message": message,
            "target_score": float(eval_score),
            "score_difference": score_difference,
            "competing_construct": highest_competing_name,
            "uniqueness_score": uniqueness_score,
            "avg_similarity": avg_similarity,  # Store both for completeness
            # Weighted score: uniqueness (0-1) is rescaled to 0-100 so it is on the
            # same scale as the two score terms and its 0.2 weight has an effect
            "combined_score": float(eval_score) * 0.5 + score_difference * 0.3 + uniqueness_score * 100 * 0.2
        })

    return message_metrics

def select_best_messages(message_metrics, count=3):
    """Select the best messages based on combined metrics.

    Args:
        message_metrics: List of message metric dictionaries
        count: Number of best messages to select

    Returns:
        list: Top N message metrics
    """
    # Sort by combined score (descending)
    sorted_metrics = sorted(message_metrics, key=lambda x: x["combined_score"], reverse=True)

    # Get top N messages
    return sorted_metrics[:count]


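## Message Generation Section
These functions create diverse messages aligned with psychological constructs. Each call rotates the task context, diversity focus, and message structure, lists phrases from recent messages to avoid, and raises the sampling temperature, retrying when a message is too similar to earlier ones.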
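
The rotation logic can be sketched in isolation as follows. The function name `diversity_schedule` is hypothetical; the modulo rotation, the temperature formula, and the 0.9 cap follow `generate_diverse_message`, while the counts of 5 task contexts and 10 diversity focuses are taken from the lists in this notebook and are passed in as parameters.

```python
def diversity_schedule(iteration, base_temp=0.7, n_contexts=5, n_focuses=10):
    """Hypothetical helper: settings applied to the message at a 0-based iteration."""
    structures = ["question-answer", "when-then", "comparison", "cause-effect"]
    # Temperature bump grows by 0.05 per iteration, up to a maximum bump of 0.5
    temp_increase = min(0.5, 0.1 + iteration * 0.05)
    return {
        "context_index": iteration % n_contexts,
        "focus_index": iteration % n_focuses,
        "structure": structures[iteration % len(structures)],
        # The final temperature is capped at 0.9 before any similarity retries
        "temperature": min(0.9, base_temp + temp_increase),
    }


for i in range(6):
    print(i, diversity_schedule(i))
```

With the default base temperature of 0.7, the 0.9 cap is already reached by the third message, so later messages rely on the rotating focus, structure, and phrase-avoidance instructions for variety rather than on further temperature increases.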

In [None]:
# Functions for message generation and text cleaning

def generate_diverse_message(generator_config, generate_text_fn, construct_name,
                            previous_messages, iteration, diversity_level=0.5):
    """Generate a message with enforced diversity from previous messages.

    Args:
        generator_config: Dictionary with generator configuration
        generate_text_fn: Function to generate text using LLM
        construct_name: Name of the construct
        previous_messages: List of previously generated messages
        iteration: Current iteration number
        diversity_level: How aggressive diversity should be (0-1); currently not used by this function

    Returns:
        str: Generated message
    """
    # Store original temperature
    original_temp = generator_config.get("temperature", 0.7)

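    # Rotate through the available task contexts so consecutive messages differ
    if task_contexts:
        context = task_contexts[iteration % len(task_contexts)]
    else:
        # Fallback so `context` is always defined (same default as the evaluation prompt)
        context = "Task completion scenario with ethical considerations"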

    # Create diversity focus based on iteration
    diversity_focuses = [
        "how this construct manifests in personal growth over time",
        "how this construct helps people overcome specific challenges and obstacles",
        "the universal experience of engaging with this construct",
        "the long-term benefits of applying this construct consistently",
        "how this construct relates to authentic skill development",
        "the relationship between this construct and maintaining integrity",
        "how this construct guides effective decision-making processes",
        "practical day-to-day applications of this construct",
        "how this construct shapes perspectives on learning",
        "the relationship between this construct and genuine satisfaction"
    ]

    # Use a different diversity focus for each iteration
    diversity_focus = diversity_focuses[iteration % len(diversity_focuses)]

    # Increase temperature for diversity (scales with iteration)
    temp_increase = min(0.5, 0.1 + (iteration * 0.05))
    generator_config["temperature"] = min(0.9, original_temp + temp_increase)

    # Get construct info for more targeted generation
    construct_info = all_constructs.get(construct_name, {})
    construct_description = construct_info.get("description", "")

    # Get examples
    construct_examples = construct_info.get("examples", [])
    examples_text = ""
    for i, example in enumerate(construct_examples, 1):
        examples_text += f"{i}. \"{example}\"\n"

    # Create differentiation text
    differentiation = construct_info.get("differentiation", {})
    differentiation_text = ""
    for other_construct, diff_description in differentiation.items():
        differentiation_text += f"- {diff_description}\n"

    # Craft a diversity-focused generation prompt
    generation_prompt = f"""
    Context: {context}

    I need you to craft a message that strongly aligns with the psychological construct of {construct_name}, but approaches it from a unique angle.

    Here is the detailed description of {construct_name}:
    {construct_description}

    Here are examples of messages that exemplify {construct_name}:
    {examples_text}

    This construct is differentiated from other constructs in these ways:
    {differentiation_text}

    IMPORTANT DIVERSITY GUIDANCE:
    - Focus on {diversity_focus}
    - Use a different approach than typical messages about this construct
    - Avoid common phrasings or standard motivational language
    - Create a message that feels distinctly different from the previous messages
    - Still maintain perfect alignment with the core principles of {construct_name}
    - When a message aligns strongly with the target construct, ensure competing constructs receive proportionally lower scores for the next messages

    Create exactly one message that is 3-4 sentences long, using simple, conversational and natural language at approximately an 8th-grade reading level. Use short sentences and avoid complex vocabulary, long words, jargon, or academic phrasing. The message should be easily understood by the average person while still conveying the core principles of {construct_name}.

    Critical:
    - Keep each sentence short for better readability. Break longer ideas into multiple short sentences.
    - Your message should have NO elements that could be confused with other constructs. Refer to the differentiation guidance above to ensure clear boundaries between constructs.

    The message should encourage honest effort and authentic skill development, rather than taking shortcuts or using unauthorized assistance.
    """

    # Extract key phrases from previous messages to avoid
    if previous_messages:
        # Randomly sample 3-word phrases from recent messages (at most 5 are listed in the prompt)
        key_phrases = []
        for msg in previous_messages[-3:]:  # Consider last 3 messages
            words = msg.split()
            # Get some 3-word phrases to avoid
            if len(words) > 3:
                for i in range(len(words) - 2):
                    if random.random() < 0.3:  # Only sample some phrases
                        phrase = " ".join(words[i:i+3])
                        if len(phrase) > 10:  # Only meaningful phrases
                            key_phrases.append(phrase)

        # Add these phrases to avoid
        if key_phrases:
            generation_prompt += "\n\nAvoid using these specific phrases or very similar ones:"
            for phrase in key_phrases[:5]:  # Limit to 5 phrases
                generation_prompt += f"\n- \"{phrase}\""

    # Add specific structural diversity based on iteration
    if iteration % 4 == 0:
        generation_prompt += "\nCreate a message that uses a question-answer format."
    elif iteration % 4 == 1:
        generation_prompt += "\nCreate a message that uses a conditional 'when-then' structure."
    elif iteration % 4 == 2:
        generation_prompt += "\nCreate a message that uses a comparison or contrast structure."
    else:
        generation_prompt += "\nCreate a message that uses cause-effect reasoning format."

    generation_prompt += """
        IMPORTANT:
        1. Create exactly one message that is 3-4 sentences long, using simple, conversational and natural language.
        2. Avoid using first-person perspective (I, me, my). Address the reader directly with 'you' or use universal third-person language (people, one, learners, etc.).
        3. Keep each sentence short for better readability. Break longer ideas into multiple short sentences.
        4. The message should be easily understood by the average person while still conveying the core principles of the construct.
    """

    max_attempts = 3
    message = None

    try:
        for attempt in range(max_attempts):
            # Generate message with diversity instructions
            raw_message = generate_text_fn(generation_prompt)

            # Clean up the message
            message = clean_message(raw_message)

            # Check if it's similar to previous messages
            if not check_message_similarity(message, previous_messages, threshold=0.8):
                # We found a sufficiently different message
                break

            # If too similar, increase temperature and try again
            print("  Generated message too semantically similar, retrying with higher temperature...")
            generator_config["temperature"] = min(0.95, generator_config["temperature"] + 0.1)
    finally:
        # Reset generator temperature, even if generation raised an exception
        generator_config["temperature"] = original_temp

    return message


def clean_message(message):
    """Clean up generated message by removing prefixes, quotes, etc."""
    # Remove common prefixes
    prefixes = [
        "Here's an improved message:",
        "Improved message:",
        "Here is the improved message:",
        "Here's a message:",
        "Message:",
        "Here is a message that",
        "Here's my message:",
        "Here is my response:",
        "Here is the message:"
    ]

    for prefix in prefixes:
        if message.startswith(prefix):
            message = message[len(prefix):].strip()

    # Remove surrounding quotes
    if (message.startswith('"') and message.endswith('"')) or \
       (message.startswith("'") and message.endswith("'")):
        message = message[1:-1]

    # Remove trailing quoted attribution
    if message.endswith('"'):
        last_quote_start = message.rfind('"', 0, -1)
        if last_quote_start != -1 and last_quote_start > len(message) // 2:
            message = message[:last_quote_start].strip()

    return message.strip()


## Main Execution Section
This function orchestrates the entire generation and evaluation process, from model initialization to results storage, tracking detailed metrics throughout the pipeline.
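
The results-storage step can be sketched on its own as below. The names `save_results` and `to_native` are hypothetical: `to_native` is only a stand-in for the notebook's `default_serializer`, whose actual definition may differ, and it assumes that non-serializable values are NumPy-style scalars exposing an `.item()` method. The timestamped filename pattern follows the main function.

```python
import json
import os
import tempfile
from datetime import datetime


def to_native(obj):
    """Hypothetical fallback for json.dump: unwrap scalar objects that have .item()."""
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def save_results(results, output_dir, construct_name):
    """Write results to <output_dir>/<construct>_<timestamp>.json and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filepath = os.path.join(output_dir, f"{construct_name}_{timestamp}.json")
    with open(filepath, "w") as f:
        # `default` is only called for values the json module cannot encode itself
        json.dump(results, f, indent=2, default=to_native)
    return filepath


if __name__ == "__main__":
    # Write to a temporary directory so the example leaves no files behind
    with tempfile.TemporaryDirectory() as tmp_dir:
        saved = save_results({"construct": "Autonomy", "avg_score": 87.5}, tmp_dir, "Autonomy")
        print("Saved to:", saved)
```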

In [None]:
# Main execution function
def generate_and_evaluate_messages(construct_name, num_messages=10,
                                 generator_temp=0.7, evaluator_temp=0.2,
                                 generator_model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
                                 evaluator_model="gpt-4o",
                                 output_dir="results"):
    """Generate and evaluate messages for a construct with diversity.

    Args:
        construct_name: Name of the construct to test
        num_messages: Number of messages to generate and evaluate
        generator_temp: Temperature for the generator
        evaluator_temp: Temperature for the evaluator
        generator_model: Model for the generator
        evaluator_model: Model for the evaluator
        output_dir: Directory to save results

    Returns:
        dict: Test results
    """
    # Validate construct exists before creating directories or API clients
    if construct_name not in all_constructs:
        raise ValueError(f"Unknown construct: {construct_name}")

    # Create output directory if it doesn't exist
    output_dir = os.path.join(storage, output_dir)
    os.makedirs(output_dir, exist_ok=True)
    print("google drive path:", output_dir)

    print(f"\nInitializing generator ({generator_model}) and evaluator ({evaluator_model})...")

    # Set up generator and evaluator
    generate_text, generator_config = setup_generator(generator_model, generator_temp, TOGETHER_API_KEY)
    evaluate_message, evaluator_config = setup_evaluator(evaluator_model, evaluator_temp, OPENAI_API_KEY)

    # Track results
    messages = []
    evaluations = []
    generation_times = []
    evaluation_times = []

    print(f"\nGenerating and evaluating {num_messages} messages for '{construct_name}'...")

    # Generate and evaluate messages
    for i in range(1, num_messages + 1):
        print(f"\nMessage {i}/{num_messages}:")

        # Generate message with diversity
        start_time = time.time()
        message = generate_diverse_message(
            generator_config,
            generate_text,
            construct_name,
            messages,  # Pass all previous messages
            i - 1,     # 0-based iteration
            diversity_level=0.7  # Higher diversity level
        )
        generation_time = time.time() - start_time
        generation_times.append(generation_time)

        messages.append(message)
        print(f"Generated: {message}")
        print(f"Generation time: {generation_time:.2f}s")

        # Calculate similarity metrics and print info
        similarity_metrics = calculate_message_similarity_metrics(message, messages[:-1])
        if similarity_metrics["has_previous"]:
            print(f"Semantic similarity with previous message: {similarity_metrics['last_message_similarity']:.2f}")
            if len(messages) > 2:
                print(f"Average similarity with all previous messages: {similarity_metrics['avg_similarity']:.2f}")

        # Evaluate message
        start_time = time.time()
        evaluation = evaluate_message(message, construct_name)
        evaluation_time = time.time() - start_time
        evaluation_times.append(evaluation_time)

        evaluations.append(evaluation)
        print(f"Evaluation score: {evaluation['score']}%")
        print(f"Evaluation time: {evaluation_time:.2f}s")

        # Show top competing constructs
        other_scores = {k: v for k, v in evaluation["ratings"].items() if k != construct_name}
        if other_scores:
            top_competing = sorted(other_scores.items(), key=lambda x: x[1], reverse=True)[:3]
            top_competing_str = ", ".join([f"{name} ({score}%)" for name, score in top_competing])
            print(f"Top competing constructs: {top_competing_str}")

    # Process results for proper serialization
    serializable_ratings, evaluation_scores = process_evaluation_results(evaluations, construct_name)

    # Calculate summary statistics
    stats = calculate_summary_statistics(evaluations, generation_times, evaluation_times)

    # Calculate diversity metrics
    diversity_metrics = calculate_diversity_metrics(messages)
    if diversity_metrics:
        print(f"\nSemantic Diversity Metrics:")
        print(f"  Average semantic similarity: {diversity_metrics['average_similarity']:.2f}")
        print(f"  Maximum similarity: {diversity_metrics['max_similarity']:.2f}")
        print(f"  Minimum similarity: {diversity_metrics['min_similarity']:.2f}")
        print(f"  Overall diversity score: {diversity_metrics['diversity_score']:.2f}")

    # Create message metrics
    message_metrics = calculate_message_metrics(messages, evaluation_scores, construct_name, serializable_ratings)

    # Select best messages
    best_messages = select_best_messages(message_metrics, count=num_messages)
    print("\n===== Messages in order (best --> worst) =====")
    for i, msg_data in enumerate(best_messages, 1):
        print(f"\n{i}. Message {msg_data['index'] + 1}: \"{msg_data['message']}\"")
        print(f"   Target Score: {msg_data['target_score']:.1f}%")
        print(f"   Score Difference: {msg_data['score_difference']:.1f}%")
        print(f"   Uniqueness/Diversity: {msg_data['uniqueness_score']:.2f}")
        print(f"   Combined Score: {msg_data['combined_score']:.2f}")


    # Create results dictionary
    results = {
        "construct": construct_name,
        "num_messages": num_messages,
        "generator_model": generator_model,
        "evaluator_model": evaluator_model,
        "generator_temperature": generator_temp,
        "evaluator_temperature": evaluator_temp,
        "avg_score": stats["avg_score"],
        "avg_generation_time": stats["avg_generation_time"],
        "avg_evaluation_time": stats["avg_evaluation_time"],
        "messages": messages,
        "evaluations": evaluation_scores,
        "all_ratings": serializable_ratings,
        "diversity_metrics": diversity_metrics,
        "best_messages": [
            {
                "message_index": m["index"] + 1,
                "message": m["message"],
                "target_score": m["target_score"],
                "score_difference": m["score_difference"],
                "diversity_score": m["uniqueness_score"],
                "avg_similarity": m["avg_similarity"],
                "combined_score": m["combined_score"]
            } for m in best_messages
        ]
    }

    # Save results to file
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{construct_name}_{timestamp}.json"
    filepath = os.path.join(output_dir, filename)

    with open(filepath, "w") as f:
        json.dump(results, f, indent=2, default=default_serializer)

    print(f"\nResults saved to: {filepath}")

    return results


## Visualization Section
This function creates comprehensive visualizations of analysis results, showcasing alignment scores, competing constructs and semantic diversity patterns.
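
Before plotting, the competing constructs are ranked by their total score across all messages. A minimal sketch of that aggregation is shown below; the helper name `top_competing_constructs` and the sample ratings are illustrative assumptions, while the default of four constructs matches the number plotted in `plot_results`.

```python
def top_competing_constructs(all_ratings, target_name, top_n=4):
    """Hypothetical helper: rank non-target constructs by summed score."""
    totals = {}
    for ratings in all_ratings:
        for construct, score in ratings.items():
            if construct != target_name:
                totals[construct] = totals.get(construct, 0.0) + float(score)
    # Highest total first; keep only the names of the top N constructs
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Made-up ratings for two messages targeting "Autonomy"
ratings_per_message = [
    {"Autonomy": 90, "Competence": 60, "Relatedness": 30},
    {"Autonomy": 85, "Competence": 40, "Relatedness": 75},
]
print(top_competing_constructs(ratings_per_message, "Autonomy"))
```

Ranking by the summed score favors constructs that compete consistently across messages over ones that spike on a single message.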

In [None]:
# Visualization function
def plot_results(results, output_dir="results"):
    """Create visualization of test results with improved graphs matching message_optimizer style.

    Args:
        results: Test results from generate_and_evaluate_messages
    """
    import matplotlib.pyplot as plt
    import numpy as np
    from datetime import datetime
    import os

    construct_name = results["construct"]
    messages = results["messages"]
    evaluations = results["evaluations"]
    all_ratings = results["all_ratings"]
    diversity_metrics = results.get("diversity_metrics", {})

    # Create a figure with multiple subplots
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 18),
                                       gridspec_kw={'height_ratios': [1, 1, 0.7]})

    # Plot 1: Target construct scores versus top competing constructs across messages
    message_indices = list(range(1, len(messages) + 1))

    # Plot target construct scores
    ax1.plot(message_indices, evaluations, marker='o', label=construct_name, linewidth=2)

    # Identify top competing constructs across all messages
    all_competing = {}
    for ratings in all_ratings:
        for construct, score in ratings.items():
            if construct != construct_name:
                if construct not in all_competing:
                    all_competing[construct] = 0
                all_competing[construct] += float(score)

    # Get top 4 competing constructs overall
    top_competing = sorted(all_competing.items(), key=lambda x: x[1], reverse=True)[:4]
    top_competing_names = [c[0] for c in top_competing]

    # Plot top competing constructs
    colors = ['#EA4335', '#FBBC05', '#34A853', '#4285F4']
    for i, comp_name in enumerate(top_competing_names):
        comp_scores = [float(ratings.get(comp_name, 0)) for ratings in all_ratings]
        ax1.plot(message_indices, comp_scores, marker='s', label=comp_name,
                 color=colors[i % len(colors)], linewidth=1.5, alpha=0.8)

    ax1.axhline(y=85, color='#EA4335', linestyle='--', alpha=0.7, label='Target threshold (85%)')
    ax1.set_title(f"Score Trends: Target vs Top Competing Constructs")
    ax1.set_xlabel("Message Number")
    ax1.set_ylabel("Score (%)")
    ax1.set_yticks(range(10, 110, 10))
    ax1.set_xticks(range(1, len(messages) + 1))  # include a tick for the last message
    ax1.grid(True, alpha=0.3)
    ax1.legend(loc='upper left', bbox_to_anchor=(1, 1))

    # Plot 2: Average construct scores across all messages with error bars
    # Get all construct names
    all_construct_names = list(all_constructs.keys())

    # Calculate average scores and standard deviations across all messages
    avg_scores = {}
    std_devs = {}

    for construct in all_construct_names:
        scores = [float(ratings.get(construct, 0)) for ratings in all_ratings]
        avg_scores[construct] = np.mean(scores)
        std_devs[construct] = np.std(scores) if len(scores) > 1 else 0

    # Sort constructs by theory groups for better visualization
    constructs_by_theory = {}
    for construct in all_construct_names:
        theory = all_constructs[construct]["theory"]
        if theory not in constructs_by_theory:
            constructs_by_theory[theory] = []
        constructs_by_theory[theory].append(construct)

    # Create ordered list of constructs grouped by theory
    all_constructs_ordered = []
    theory_boundaries = []
    current_position = 0

    for theory, constructs in constructs_by_theory.items():
        all_constructs_ordered.extend(constructs)
        current_position += len(constructs)
        theory_boundaries.append((current_position - len(constructs)/2, theory))

    # Calculate values for plotting
    values = [avg_scores[c] for c in all_constructs_ordered]
    errors = [std_devs[c] for c in all_constructs_ordered]
    positions = np.arange(len(all_constructs_ordered))

    # Create bar colors, highlighting the target construct
    colors = ['darkblue' if construct == construct_name else 'gray' for construct in all_constructs_ordered]

    # Create bars with error bars
    ax2.bar(positions, values, color=colors, width=0.7, alpha=0.7)
    ax2.errorbar(positions, values, yerr=errors, fmt='none', ecolor='black',
                capsize=4, elinewidth=1, capthick=1)

    # Format the plot
    ax2.set_title(f"Average Construct Scores Across All Messages", fontsize=14)
    ax2.set_ylabel("Average Score (%)", fontsize=12)
    ax2.set_ylim(0, 100)
    ax2.set_xticks(positions)
    ax2.set_xticklabels(all_constructs_ordered, rotation=45, ha='right', fontsize=9)

    # Highlight target construct label
    if construct_name in all_constructs_ordered:
        target_idx = all_constructs_ordered.index(construct_name)
        plt.setp(ax2.get_xticklabels()[target_idx], color='darkblue', weight='bold')

    # Add grid lines
    ax2.grid(axis='y', linestyle='--', alpha=0.3)

    # Remove top and right spines
    for spine in ['top', 'right']:
        ax2.spines[spine].set_visible(False)

    # Plot 3: Message similarity matrix (if more than one message)
    if len(messages) > 1 and diversity_metrics and "pairwise_similarities" in diversity_metrics:
        if len(messages) <= 10:
            # Get all pairwise similarities
            pairwise_similarities = diversity_metrics["pairwise_similarities"]
            pairs = [pair for pair, _ in pairwise_similarities]
            similarities = [sim for _, sim in pairwise_similarities]

            # Create a bar chart for message similarities
            bars = ax3.bar(pairs, similarities, color='#34A853')
            ax3.axhline(y=0.8, color='#EA4335', linestyle='--', alpha=0.7,
                       label='Similarity threshold (0.8)')

            # Add text labels above the bars
            for i, v in enumerate(similarities):
                ax3.text(i, v + 0.02, f"{v:.2f}", ha='center', fontsize=8)

            ax3.set_title("Pairwise Semantic Similarities")
            ax3.set_xlabel("Message Pairs")
            ax3.set_ylabel("Semantic Similarity (0-1)")
            ax3.set_ylim(0, 1.1)
            ax3.set_xticks(range(len(pairs)))
            ax3.set_xticklabels(pairs, rotation=90, ha="center", fontsize=8)
            ax3.grid(True, alpha=0.3)
            ax3.legend()
        else:
            # Just show summary diversity metrics as text
            ax3.axis('off')  # Hide the axes
            avg_sim = diversity_metrics.get("average_similarity", 0)
            max_sim = diversity_metrics.get("max_similarity", 0)
            min_sim = diversity_metrics.get("min_similarity", 0)
            div_score = diversity_metrics.get("diversity_score", 0)

            # Create a text box with diversity metrics
            textstr = '\n'.join((
                f'Semantic Diversity Metrics:',
                f'Average Semantic Similarity: {avg_sim:.2f}',
                f'Max Semantic Similarity: {max_sim:.2f}',
                f'Min Semantic Similarity: {min_sim:.2f}',
                f'Diversity Score: {div_score:.2f} (higher is better)'
            ))

            # Place a text box at the center of ax3
            props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
            ax3.text(0.5, 0.5, textstr, transform=ax3.transAxes, fontsize=12,
                    verticalalignment='center', horizontalalignment='center', bbox=props)
    else:
        # If no diversity metrics, hide this plot
        ax3.axis('off')

    plt.tight_layout(pad=3.0)

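    # Save the plot (`storage` is a base path expected to be defined earlier in the notebook)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{construct_name}_{timestamp}.png"
    filepath = os.path.join(storage, output_dir, filename)
    os.makedirs(os.path.dirname(filepath), exist_ok=True)  # create the folder if it is missing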

    plt.savefig(filepath, dpi=300, bbox_inches='tight')
    print(f"Test summary visualization saved to: {filepath}")
    plt.close(fig)

    return filepath


## Interactive UI Section
These widgets select the target construct and set the generation parameters (number of messages, generator and evaluator temperatures and models, and output directory); the button then runs the full generation and evaluation process.
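
The construct dropdown lists constructs grouped by their parent theory. A minimal sketch of that grouping step is shown here with a small stand-in dictionary; `all_constructs_demo`, its entries, and the helper name `dropdown_options` are hypothetical and only mimic the `{"theory": ...}` shape that the notebook's `all_constructs` is read with.

```python
from collections import defaultdict

# Hypothetical three-entry stand-in; the notebook's own dictionary holds 15 constructs.
all_constructs_demo = {
    "Autonomy": {"theory": "Self-Determination Theory"},
    "Descriptive Norms": {"theory": "Social Norms Theory"},
    "Competence": {"theory": "Self-Determination Theory"},
}


def dropdown_options(constructs):
    """Return construct names ordered so that constructs of one theory sit together."""
    grouped = defaultdict(list)
    for name, info in constructs.items():
        grouped[info["theory"]].append(name)
    # Theories appear in first-seen order; constructs keep their original order within a theory.
    return [name for names in grouped.values() for name in names]


options = dropdown_options(all_constructs_demo)
print(options)
```

Because dictionaries preserve insertion order, the result keeps theories in the order they are first encountered, so related constructs end up adjacent in the dropdown.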

In [None]:
# Widgets for selecting a construct, setting parameters, and running the entire process
import ipywidgets as widgets
from IPython.display import display, clear_output

# Create a progress output area
progress_output = widgets.Output()

# Store results for later use
global_results = {"data": None}

def display_construct_choices():
    """Display available constructs organized by theory."""
    # Group constructs by theory
    constructs_by_theory = {}
    for construct in all_constructs:
        theory = all_constructs[construct]["theory"]
        if theory not in constructs_by_theory:
            constructs_by_theory[theory] = []
        constructs_by_theory[theory].append(construct)

    # Create options for dropdown
    all_options = []
    for theory, constructs in constructs_by_theory.items():
        all_options.extend(constructs)

    # Create dropdown widget
    construct_dropdown = widgets.Dropdown(
        options=all_options,
        description='Construct:',
        style={'description_width': 'initial'},
        layout={'width': 'auto'}
    )

    display(construct_dropdown)
    return construct_dropdown

# Call the function to display the dropdown
construct_dropdown = display_construct_choices()

# Parameter selection widgets
num_messages_slider = widgets.IntSlider(
    value=10, min=1, max=30, step=1,
    description='Number of messages:',
    style={'description_width': 'initial'}
)

generator_temp_slider = widgets.FloatSlider(
    value=0.7, min=0.1, max=1.0, step=0.1,
    description='Generator temperature:',
    style={'description_width': 'initial'}
)

evaluator_temp_slider = widgets.FloatSlider(
    value=0.2, min=0.1, max=1.0, step=0.1,
    description='Evaluator temperature:',
    style={'description_width': 'initial'}
)

generator_model_dropdown = widgets.Dropdown(
    options=['meta-llama/Llama-3.3-70B-Instruct-Turbo',
             'meta-llama/Llama-3.3-8B-Instruct-Turbo'],
    value='meta-llama/Llama-3.3-70B-Instruct-Turbo',
    description='Generator model:',
    style={'description_width': 'initial'},
    layout={'width': 'auto'}
)

evaluator_model_dropdown = widgets.Dropdown(
    options=['gpt-4o', 'gpt-3.5-turbo'],
    value='gpt-4o',
    description='Evaluator model:',
    style={'description_width': 'initial'},
    layout={'width': 'auto'}
)

output_dir_text = widgets.Text(
    value='results',
    description='Output directory:',
    style={'description_width': 'initial'},
    layout={'width': 'auto'}
)

# Display all widgets
display(num_messages_slider, generator_temp_slider, evaluator_temp_slider,
        generator_model_dropdown, evaluator_model_dropdown, output_dir_text)


def run_full_analysis(button):
    """Run the complete analysis process with the selected parameters."""
    construct_name = construct_dropdown.value

    with progress_output:
        clear_output()
        print(f"Starting analysis for: {construct_name}")
        print(f"Using parameters: {num_messages_slider.value} messages, "
              f"generator temp={generator_temp_slider.value}, "
              f"evaluator temp={evaluator_temp_slider.value}")
        print("This may take several minutes. Please wait...")

        try:
            # Run the analysis
            results = generate_and_evaluate_messages(
                construct_name,
                num_messages=num_messages_slider.value,
                generator_temp=generator_temp_slider.value,
                evaluator_temp=evaluator_temp_slider.value,
                generator_model=generator_model_dropdown.value,
                evaluator_model=evaluator_model_dropdown.value,
                output_dir=output_dir_text.value
            )

            # Store results for later visualization
            global_results["data"] = results

            print("\nAnalysis completed successfully!")
            print(f"Average score: {results['avg_score']:.1f}%")
            print(f"Generated {len(results['messages'])} messages")
            print("\nClick the 'Visualize Results' button below to create charts.")

        except Exception as e:
            print(f"\nError during analysis: {str(e)}")
            print("Please check your API keys and parameters and try again.")
            import traceback
            traceback.print_exc()

# Create run button
run_button = widgets.Button(
    description='Generate & Evaluate Messages',
    button_style='success',
    icon='play',
    layout=widgets.Layout(width='300px', height='50px')
)

run_button.on_click(run_full_analysis)

# Display the button and progress area
display(run_button)
display(progress_output)


## Results Visualization Section
This cell adds a "Visualize Results" button that renders the charts for the most recent analysis and prints the ranked messages with their alignment score, margin over the top competing construct, uniqueness, and combined score.
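
The printed ranking describes the combined score as 0.5 × alignment + 0.3 × score difference + 0.2 × uniqueness. The sketch below works through that weighting for two made-up messages. It assumes uniqueness is stored on a 0 to 1 scale and converted to a percentage before weighting (the output multiplies it by 100 for display); the helper name `combined_score` and all numbers are illustrative, not taken from the notebook's results.

```python
def combined_score(target_score, score_difference, uniqueness):
    """Weighted ranking score; uniqueness is assumed to be in [0, 1]."""
    return 0.5 * target_score + 0.3 * score_difference + 0.2 * (uniqueness * 100)


# Hypothetical evaluation values for two messages.
candidates = [
    {"message_index": 1, "target_score": 85.0, "score_difference": 50.0, "uniqueness": 0.3},
    {"message_index": 2, "target_score": 90.0, "score_difference": 40.0, "uniqueness": 0.6},
]

for c in candidates:
    c["combined_score"] = combined_score(
        c["target_score"], c["score_difference"], c["uniqueness"]
    )

# Rank best to worst, as in the printed list.
ranked = sorted(candidates, key=lambda c: c["combined_score"], reverse=True)
for c in ranked:
    print(f"Message {c['message_index']}: combined score {c['combined_score']:.1f}%")
```

In this example, message 2 ranks first (45 + 12 + 12 = 69) ahead of message 1 (42.5 + 15 + 6 = 63.5): its higher alignment and uniqueness outweigh its smaller margin over the competing construct.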

In [None]:
# Create a visualization button and output area
viz_output = widgets.Output()

def create_visualizations(button):
    """Create and display visualizations of the results."""
    with viz_output:
        clear_output()

        if global_results["data"] is None:
            print("No analysis results available.")
            print("Please run the analysis first by clicking the 'Generate & Evaluate Messages' button above.")
            return

        print("Creating visualizations... Please wait.")
        try:
            # Generate visualizations
            chart_path = plot_results(global_results["data"], output_dir_text.value)

            # Display the chart
            from IPython.display import Image
            display(Image(chart_path))

            # Also show the best messages, ranked by combined score
            if "best_messages" in global_results["data"]:
                print("\n===== Messages in order (best --> worst) =====")
                for msg in global_results["data"]["best_messages"]:
                    print(f"\nMessage {msg['message_index']}: {msg['message']}")
                    print(f"\tAlignment Score: {msg['target_score']:.1f}%")
                    print(f"\tScore Difference from Top Competing Construct: {msg['score_difference']:.1f}%")
                    print(f"\tMessage Uniqueness: {msg['diversity_score']*100:.1f}%")
                    print(f"\tCombined score for ranking: {msg['combined_score']:.1f}% (0.5*Alignment + 0.3*Score difference + 0.2*Uniqueness)")

        except Exception as e:
            print(f"Error creating visualizations: {str(e)}")
            import traceback
            traceback.print_exc()

# Create visualization button
viz_button = widgets.Button(
    description='Visualize Results',
    button_style='info',
    icon='chart-bar',
    layout=widgets.Layout(width='300px', height='50px')
)

viz_button.on_click(create_visualizations)

# Display the button and output area
display(viz_button)
display(viz_output)
