<a href="https://colab.research.google.com/github/jerichosy/THS-STX_user-idea-similarity-for-fixation/blob/main/Similarity_Metric_on_ChatGPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

For context, these are the ideas the users submitted throughout the session.

We want to know how each user's ideas evolved across iterations: how similar is each idea to the one that follows it as the user iterated with ChatGPT? A high similarity between successive ideas may indicate that the user remained fixated on the initial idea rather than exploring or branching out.
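
A minimal sketch of the metric described above, assuming each idea has already been turned into an embedding vector. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `consecutive_similarities` are hypothetical stand-ins; the notebook itself uses Cohere embeddings served through Amazon Bedrock and computes similarity as 1 minus SciPy's `cosine` distance.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (equals 1 - SciPy's cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consecutive_similarities(vectors):
    """Similarity of each idea to the idea that immediately follows it."""
    return [cosine_similarity(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


# Hypothetical embeddings for three successive ideas from one user
ideas = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
scores = consecutive_similarities(ideas)
print(scores)        # n ideas -> n - 1 pair scores
print(mean(scores))  # a higher mean suggests the user stayed close to earlier ideas
```

Only adjacent pairs are compared, so the score tracks step-by-step drift rather than distance from the very first idea.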

In [None]:
# Install the required libraries
!pip install boto3 cohere

In [None]:
import json
import os

import boto3
import numpy as np
from scipy.spatial.distance import cosine

# Embeddings are fetched through Amazon Bedrock below, so a direct
# Cohere client (cohere.Client) is not needed in this notebook.

# Set AWS credentials and region (fill in your own keys; never commit real credentials)
os.environ['AWS_ACCESS_KEY_ID'] = ''
os.environ['AWS_SECRET_ACCESS_KEY'] = ''
os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'

# Initialize the Bedrock client
bedrock = boto3.client(service_name='bedrock-runtime')
model_id = "cohere.embed-english-v3"

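def get_embeddings(texts):
    """Return one Cohere embedding vector per input text (one Bedrock request per text)."""
    embeddings = []
    for text in texts:
        body = json.dumps({
            "texts": [text],
            "input_type": "clustering"  # Cohere v3 input type intended for clustering tasks
        })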
        response = bedrock.invoke_model(
            body=body,
            modelId=model_id,
            accept="application/json",
            contentType="application/json"
        )
        response_body = json.loads(response['body'].read())
        embeddings.append(response_body['embeddings'][0])
    return embeddings

def calculate_similarity_scores(data):
    """For each user, score every idea against the next one and summarize the scores."""
    results = {}

    for user, ideas in data.items():
        all_similarities = []
        print(f"-- Analyzing {user} --")  # DEBUG info

        if len(ideas) > 1:
            for i in range(len(ideas) - 1):
                pair = (ideas[i], ideas[i + 1])
                similarity = process_pair(pair)
                all_similarities.append(similarity)

        if all_similarities:
            mean_similarity = np.mean(all_similarities)
            std_similarity = np.std(all_similarities)  # population std (ddof=0); 0.0 when there is only one pair

            results[user] = {
                'mean_similarity': mean_similarity,
                'std_similarity': std_similarity,
                'similarity_scores': all_similarities
            }
        else:
            results[user] = {
                'mean_similarity': None,
                'std_similarity': None,
                'similarity_scores': None
            }

    return results

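def process_pair(pair):
    """Embed two consecutive ideas and return their cosine similarity (1 - cosine distance)."""
    embeddings = get_embeddings(pair)
    similarity = float(1 - cosine(embeddings[0], embeddings[1]))  # plain float for readable printing
    print(f"Pair: {pair}, Similarity: {similarity:.4f}")  # DEBUG info
    return similarity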

# Ideas per user, listed in the order they were submitted
data = {
    "user 2": [
        "Optimal locations, food types, and food prices for attracting college students to eat at my restaurant.",
        "Further developing the restaurant to expand, keep, and grow the customer base, in terms of the optimal locations, food types, and food prices."
    ],
    "user 5": [
        "Features for a social media platform that doesn't distract users and promote productivity."
    ],
    "user 7": [
        "Unsure of new trends for putting up a restaurant where college students can dine in.",
        "Combating inevitable increase of prices because of inflation while still securing customers long-term.",
        "Is using only social media to promote the new restaurant a good option?",
        "Loyalty programs for encouraging students to dine in more at my potential restaurant business.",
        "What is one thing I can do to attract customers without going all-out with my promotions?"
    ],
    "user 8": [
        "List of suggested features for a social media aimed at encouraging professional growth.",
        "How can we implement Project Showcases and Job Board Integration?"
    ],
    "user 9": [
        "Is gamification of job posting and the like a good idea?",
        "Other aspects to gamify?",
        "Event hosting for job opportunities."
    ],
    "user 11": [
        "Restaurant aimed at a younger demographic that will remain loyal even after they grow older.",
        "Factors to consider and weigh in in creating a brand identity.",
        "Other ways to promote aside from seasonal menus since options are limited in an equatorial country.",
        "How to create an impactful and unique brand identity despite being a startup without much funding?"
    ]
}

# Calculate similarity scores
similarity_results = calculate_similarity_scores(data)

# Print results
all_mean_similarities = []
for user, result in similarity_results.items():

    print(f"{user}:")
    if result['mean_similarity'] is not None:
        print(f"  Mean similarity: {result['mean_similarity']:.4f} (std: {result['std_similarity']:.4f})")
        print(f"  Similarity scores: {result['similarity_scores']}")

        all_mean_similarities.append(result['mean_similarity'])
    else:
        print("  Not enough ideas to calculate similarity.")
    print()

print(all_mean_similarities)
print(f"{np.mean(all_mean_similarities)=}")

-- Analyzing user 2 --
Pair: ('Optimal locations, food types, and food prices for attracting college students to eat at my restaurant.', 'Further developing the restaurant to expand, keep, and grow the customer base, in terms of the optimal locations, food types, and food prices.'), Similarity: 0.6738
-- Analyzing user 5 --
-- Analyzing user 7 --
Pair: ('Unsure of new trends for putting up a restaurant where college students can dine in.', 'Combating inevitable increase of prices because of inflation while still securing customers long-term.'), Similarity: 0.2917
Pair: ('Combating inevitable increase of prices because of inflation while still securing customers long-term.', 'Is using only social media to promote the new restaurant a good option?'), Similarity: 0.2791
Pair: ('Is using only social media to promote the new restaurant a good option?', 'Loyalty programs for encouraging students to dine in more at my potential restaurant business.'), Similarity: 0.4564
Pair: ('Loyalty progra
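
The loop above calls `get_embeddings` on each consecutive pair, so every middle idea is embedded twice: a user with n ideas costs 2(n - 1) Bedrock requests. Embedding each idea once cuts that to n, or to a single request per user if the texts are batched, assuming the endpoint accepts several texts per request as its list-valued `texts` field suggests. The sketch below illustrates the reuse pattern with a hypothetical `embed_batch` stub (toy 2-d vectors, plus a call counter) in place of a real Bedrock request.

```python
import math

CALLS = {"count": 0}


def embed_batch(texts):
    """Hypothetical stub for one batched embedding request.

    Real code would send all `texts` in a single request; this stub returns a
    toy 2-d vector per text (length, vowel count) and counts how often it runs.
    """
    CALLS["count"] += 1
    return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))] for t in texts]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def consecutive_scores(ideas):
    """Embed each idea once, then score each idea against the next one."""
    if len(ideas) < 2:
        return []  # e.g. a user with a single idea has no pairs to compare
    vectors = embed_batch(ideas)  # one request per user instead of two per pair
    return [cosine_similarity(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


data = {
    "user a": ["first idea", "second idea", "third idea"],
    "user b": ["only idea"],
}
results = {user: consecutive_scores(ideas) for user, ideas in data.items()}
print(results)
print(CALLS["count"])  # only "user a" needed embeddings
```

Skipping users with fewer than two ideas before embedding also avoids spending requests on users like user 5, whose single idea cannot be scored.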

### Analysis Methodology
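
One way to state the metric computed by `calculate_similarity_scores`, read off the code above (the notation is ours, not the original's). For user $u$ with ideas $x_{u,1}, \dots, x_{u,n_u}$ in submission order, and $e(\cdot)$ the `cohere.embed-english-v3` embedding with `input_type="clustering"`, the pair scores, their per-user mean and population standard deviation (NumPy's default `ddof=0`), and the overall mean over the set $U'$ of users with $n_u \ge 2$ are:

$$
s_{u,i} = \frac{e(x_{u,i}) \cdot e(x_{u,i+1})}{\lVert e(x_{u,i}) \rVert \, \lVert e(x_{u,i+1}) \rVert}, \qquad i = 1, \dots, n_u - 1
$$

$$
\bar{s}_u = \frac{1}{n_u - 1} \sum_{i=1}^{n_u - 1} s_{u,i}, \qquad
\sigma_u = \sqrt{\frac{1}{n_u - 1} \sum_{i=1}^{n_u - 1} \left( s_{u,i} - \bar{s}_u \right)^2}, \qquad
\bar{S} = \frac{1}{|U'|} \sum_{u \in U'} \bar{s}_u
$$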



### Results and Discussion

#### User Analysis
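
A worked check of how the per-user statistics behave on the reported output: user 2 submitted two ideas, so there is exactly one pair (reported similarity 0.6738). Its mean is that single score, and its standard deviation is 0 by construction because `np.std` uses the population formula by default; a sample standard deviation would be undefined. The sketch uses Python's `statistics` module as a stand-in for NumPy.

```python
from statistics import StatisticsError, mean, pstdev, stdev

user_2_scores = [0.6738]  # the single pair score reported for user 2 above

print(mean(user_2_scores))    # the mean of one value is the value itself
print(pstdev(user_2_scores))  # population std, matching np.std's default: 0.0

try:
    stdev(user_2_scores)  # sample std needs at least two data points
except StatisticsError as err:
    print("sample std undefined:", err)
```

A std of 0.0 for user 2 therefore reflects having a single pair, not unusually consistent iterations.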



### Interpretation Summary


### Conclusions


### Limitations


### Scratch code