# Exploration of RAG Application for PRISM
### Potential for PRISM Automated Feedback on Social Media Challenge Goals

This notebook is a proof of concept exploring the potential of a RAG-based (retrieval-augmented generation) application for PRISM users. It proceeds in three stages:
1. Base: Start with a base query to ChatGPT and observe its output
2. Prompt: Add a system prompt that gives ChatGPT more instructions for generating its output
3. RAG: Expand the approach to include additional content retrieved by vector similarity

In [1]:
from openai import OpenAI
import pandas as pd
import numpy as np

client = OpenAI()

In [2]:
# original goal question
goal_question = 'How would you like to see your behaviors change? Be specific when listing your goals and intentions. For example "I would like to use Instagram less" does not count - RATHER say something like - "I would like use Instagram for 20 mins per day or less and I will set a limit on my phone to achieve this goal." AND "When I am on Instagram I will only watch my close friends stories, and not watch any celebrities, influencers, or people otherwise unknown to me." Describe AT LEAST 3 concrete goals and intentions.'
print(goal_question)

How would you like to see your behaviors change? Be specific when listing your goals and intentions. For example "I would like to use Instagram less" does not count - RATHER say something like - "I would like use Instagram for 20 mins per day or less and I will set a limit on my phone to achieve this goal." AND "When I am on Instagram I will only watch my close friends stories, and not watch any celebrities, influencers, or people otherwise unknown to me." Describe AT LEAST 3 concrete goals and intentions.


## 1. Base LLM query

In [3]:
query = "My first goal is to delete reddit off my phone. My second goal is to limit my total phone screen time to 2 hours a day. My last goal is to be more intentional on social media."

In [4]:
response_base = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [{"role":"user", "content":query}]
)
response_base

ChatCompletion(id='chatcmpl-AGse0YxI9hxXyrAmu89qboIl2SBl6', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="These are great goals! To achieve your first goal of deleting Reddit off your phone, you can start by uninstalling the app and finding alternative ways to fill that time. For your second goal of limiting phone screen time to 2 hours a day, you can utilize screen time tracking apps and set daily limits for yourself. Lastly, to be more intentional on social media, you can start by unfollowing accounts that don't bring you value or joy, and setting boundaries for when and how often you use social media. Good luck on achieving your goals!", role='assistant', function_call=None, tool_calls=None, refusal=None))], created=1728586664, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=111, prompt_tokens=48, total_tokens=159, prompt_tokens_details={'c

## 2. Add extra prompt instructions

In [6]:
response_prompt = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [{"role":"system", "content":f"You are a friendly life coach helping a client improve their relationship with technology and social media. Your client is supposed to answer the following question: {goal_question}. Provide feedback on their answer."},
                {"role":"user", "content":query}]
)
response_prompt

ChatCompletion(id='chatcmpl-AGseFt3GZYo2sf2hsv8Q7ot1ZIsBD', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="It's great to see that you have identified specific goals to improve your relationship with technology and social media. Deleting Reddit from your phone is a concrete step towards reducing your screen time and potential distractions. Limiting your total phone screen time to 2 hours a day is a measurable and achievable goal that can help you be more mindful of how you're spending your time. Being more intentional on social media is an excellent goal, but it may help to specify what being intentional means to you - such as engaging with posts that bring you joy or following accounts that align with your values. Setting clear boundaries and intentions will greatly benefit your overall well-being and productivity. Good luck with your goals, and remember to track your progress along the way!", role='assistant', function_call=None, t

## 3. Expand to add RAG functionality

In [7]:
# 1. get reference content for RAG application to use
# 2. clean the content
# 3. vectorize and store content
# 4. when user submits query, vectorize query
# 5. take dot product between vectorized query and each vectorized elt in content store
# 6. take top n elts with largest dot product (n=3 is used below)
# 7. retrieve raw text of those elts 
# 8. pass that text as context into LLM query
# 9. tada! 
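
The steps above can be sketched end to end on toy data. This is a minimal illustration, not the notebook's actual pipeline: the three-dimensional vectors, the `store` contents, and the `retrieve` helper are all made up for the example. Real embeddings come from the embeddings API and have many more dimensions.

```python
import numpy as np

# Hypothetical content store: (raw text, embedding) pairs.
# The 3-d vectors are invented for illustration only.
store = [
    ("feedback about time limits", np.array([1.0, 0.0, 0.0])),
    ("feedback about deleting apps", np.array([0.0, 1.0, 0.0])),
    ("feedback about intentions", np.array([0.6, 0.8, 0.0])),
]

def retrieve(query_vec, store, n=2):
    """Return the raw text of the n entries whose embeddings have the
    largest dot product with query_vec (steps 5-7)."""
    scores = [float(np.dot(vec, query_vec)) for _, vec in store]
    # Indices of entries sorted by score, largest first
    ranked = sorted(range(len(store)), key=lambda i: scores[i], reverse=True)
    return [store[i][0] for i in ranked[:n]]

query_vec = np.array([0.0, 1.0, 0.0])  # stand-in for the embedded user query
context = "\n".join(retrieve(query_vec, store))  # text to pass to the LLM (step 8)
```

Here `context` plays the role that the retrieved feedback text plays later in the notebook.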

### 1 & 2. Get and clean reference content

In [15]:
feedback_df = pd.read_csv("SMC Goals Feedback - Cleaned.csv")

In [16]:
feedback_df

Unnamed: 0,Stated Goals,Annie's Feedback
0,I would like to see my behaviors change by del...,This is a great start! Deleting your most prob...
1,- I will NOT click ignore on the time limit an...,"Okay, if you want to NOT click the time limit,..."
2,If and when i am on tick-tock i have a goal to...,These are a great place to start! Maybe spend ...
3,- When I am on Instagram I will only look at m...,This is a great start! Generally people who se...
4,I would like to limit myself to one YouTube v...,How long? Toward what goal? How will you decid...
...,...,...
66,I want to consume less mindless content on Tik...,This is a great goal! How wil you make sure yo...
67,- 1 hour on TikTok and Instagram Reels a day,Admirable goal! How will you know you were on ...
68,consume content that is more positive,Is there any way you could curate your feed to...
69,I will connect with my family through calling ...,Awesome goal! Make sure and set times when you...


In [17]:
import re

def clean_stated_goals(goal):
    goal = re.sub(r'[^0-9a-zA-Z]+', ' ', goal)  # replace runs of non-alphanumeric characters with a space
    goal = re.sub(r'\s+', ' ', goal)  # collapse repeated whitespace
    return goal
    



In [18]:
feedback_df['Stated Goals'] = feedback_df['Stated Goals'].apply(clean_stated_goals)
feedback_df

Unnamed: 0,Stated Goals,Annie's Feedback
0,I would like to see my behaviors change by del...,This is a great start! Deleting your most prob...
1,I will NOT click ignore on the time limit any...,"Okay, if you want to NOT click the time limit,..."
2,If and when i am on tick tock i have a goal to...,These are a great place to start! Maybe spend ...
3,When I am on Instagram I will only look at my...,This is a great start! Generally people who se...
4,I would like to limit myself to one YouTube v...,How long? Toward what goal? How will you decid...
...,...,...
66,I want to consume less mindless content on Tik...,This is a great goal! How wil you make sure yo...
67,1 hour on TikTok and Instagram Reels a day,Admirable goal! How will you know you were on ...
68,consume content that is more positive,Is there any way you could curate your feed to...
69,I will connect with my family through calling ...,Awesome goal! Make sure and set times when you...
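
To make the effect of the cleaning step concrete, here is a small sketch on a made-up goal string (the input is hypothetical, not a row from the dataset). It applies the same two substitutions, written with raw strings, and shows that apostrophes and hyphens are replaced by spaces and that a leading or trailing space can remain unless the result is stripped.

```python
import re

def clean_stated_goals(goal):
    goal = re.sub(r'[^0-9a-zA-Z]+', ' ', goal)  # replace runs of non-alphanumerics with a space
    goal = re.sub(r'\s+', ' ', goal)            # collapse repeated whitespace
    return goal

sample = "- I won't use Tik-Tok!"  # hypothetical input
cleaned = clean_stated_goals(sample)
print(repr(cleaned))          # ' I won t use Tik Tok '
print(repr(cleaned.strip()))  # 'I won t use Tik Tok'
```

Whether splitting contractions this way affects embedding quality is not tested in this notebook.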


### 3. Vectorize and store content (get embeddings)

In [19]:
# get embeddings
def get_embedding(text, model="text-embedding-3-small"):
    return client.embeddings.create(input=[text], model=model).data[0].embedding
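
The embeddings endpoint also accepts a list of inputs, so the goals could be embedded in one request instead of one call per row. A minimal sketch, assuming the same model as above; `embed_batch` is a hypothetical helper, not part of the notebook, and it takes the client as an argument so it can be exercised without network access.

```python
def embed_batch(client, texts, model="text-embedding-3-small"):
    """Embed several texts with a single embeddings request.

    Results are sorted by each item's `index` field so the returned
    list lines up with `texts`.
    """
    response = client.embeddings.create(input=list(texts), model=model)
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Usage might look like `feedback_df['Stated Goals Embedding'] = embed_batch(client, feedback_df['Stated Goals'])`. For much larger datasets the list may need to be split into chunks.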

In [20]:
# feedback_df['Stated Goals Embedding'] = feedback_df['Stated Goals'].apply(get_embedding)
# feedback_df.to_csv("SMC Goals Feedback - Embeddings.csv", index=False)
# feedback_df.to_pickle("SMC Goals Feedback - Embeddings.pkl")

In [21]:
feedback_df = pd.read_pickle("SMC Goals Feedback - Embeddings.pkl")

In [22]:
feedback_df

Unnamed: 0,Stated Goals,Annie's Feedback,Stated Goals Embedding
0,I would like to see my behaviors change by del...,This is a great start! Deleting your most prob...,"[0.013152338564395905, 0.0013790624216198921, ..."
1,I will NOT click ignore on the time limit anym...,"Okay, if you want to NOT click the time limit,...","[0.006847853772342205, 0.010144710540771484, -..."
2,If and when i am on tick-tock i have a goal to...,These are a great place to start! Maybe spend ...,"[0.013891640119254589, 0.009084099903702736, -..."
3,When I am on Instagram I will only look at my ...,This is a great start! Generally people who se...,"[0.04316026717424393, 0.02773171290755272, -0...."
4,I would like to limit myself to one YouTube vi...,How long? Toward what goal? How will you decid...,"[0.011394932866096497, -0.01680963858962059, -..."
...,...,...,...
66,I want to consume less mindless content on Tik...,This is a great goal! How wil you make sure yo...,"[0.02413846179842949, -0.01985219307243824, -0..."
67,hour on TikTok and Instagram Reels a day,Admirable goal! How will you know you were on ...,"[0.012546670623123646, 0.004898448474705219, -..."
68,consume content that is more positive,Is there any way you could curate your feed to...,"[0.030527938157320023, -0.016606703400611877, ..."
69,I will connect with my family through calling ...,Awesome goal! Make sure and set times when you...,"[0.00023016663908492774, -0.001089050550945103..."


### 4. Vectorize user query (query embedding)

In [23]:
query_embedding = get_embedding(query)
# query_embedding

### 5. Take dot product between vectorized content and query embedding

In [24]:
# calculate dot product between stated goals embedding and my query
feedback_df['Dot Product'] = feedback_df['Stated Goals Embedding'].apply(lambda x: np.dot(x, query_embedding))
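
Dot product and cosine similarity give the same ranking only when the vectors have unit length. OpenAI's documentation states that its embeddings are normalized to length 1, which is the assumption behind using a plain dot product here. The sketch below uses made-up 2-d vectors (not real embeddings) to show the relationship, and stacks the stored vectors into a matrix so that all scores come from one matrix-vector product.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors for illustration; both have length 1.
a = np.array([0.6, 0.8])
b = np.array([1.0, 0.0])

dot = float(np.dot(a, b))
assert np.isclose(dot, cosine_similarity(a, b))  # equal for unit vectors

# For a non-unit vector the two measures differ.
c = 2 * a
assert not np.isclose(float(np.dot(c, b)), cosine_similarity(c, b))

# Scoring a whole store at once: rows are stored embeddings.
matrix = np.stack([a, b])
scores = matrix @ b  # one dot product per row
```

Applied to the notebook's data, the same idea would be `np.stack(feedback_df['Stated Goals Embedding'].to_numpy()) @ np.array(query_embedding)`, which should give the same values as the row-wise `apply` above.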

### 6. Choose closest vectors as determined by dot product

In [25]:
# the larger the dot product, the closer the vectors, i.e. the closer the semantic content
# so let's sort by dot product in descending order
feedback_df = feedback_df.sort_values(by='Dot Product', ascending=False)
feedback_df.head()

Unnamed: 0,Stated Goals,Annie's Feedback,Stated Goals Embedding,Dot Product
30,Goals:. - I would like to use Instagram for 30...,How will you enforce these new time limits? If...,"[-0.006893911864608526, 0.017961472272872925, ...",0.636281
23,I would like to cut my Instagram time to 30 mi...,This is a lot of change all at once. How much ...,"[-0.006098544225096703, 0.005079406779259443, ...",0.622786
19,I would like for social media not to affect my...,"This is an admirable goal, but how will you en...","[0.03488994017243385, -0.00036550185177475214,...",0.601755
0,I would like to see my behaviors change by del...,This is a great start! Deleting your most prob...,"[0.013152338564395905, 0.0013790624216198921, ...",0.594602
61,I want to reduce random scrolling on Instagram...,People who consult their screen time as the on...,"[-0.009930817410349846, 0.03150276094675064, -...",0.585196


In [26]:
# let's pull out Annie's Feedback for the top 3 results and pass that in as content for the LLM
# to generate its answer from

### 7. Retrieve Annie's Feedback text of associated vectors 

In [27]:
top_3_annie_feedback = '\n'.join(list(feedback_df.head(3)["Annie's Feedback"].values))
top_3_annie_feedback

"How will you enforce these new time limits? If we want to succeed we need to have a replacement behavior and/or a strong reward system set up! How much are you on Instagram, reddit, and YikYak currently? What do you want to do with the new time you'll have? Knowing what you DO want to be doing, rather than only what you DON'T want to be doing will set you up for success! Great job including your intentions! It sounds like your main intent is to improve your mood by not viewing as much negative / hateful content. I recommend checking out some loving kindness meditations online! Brain changes can happen with as little as 5 mins a day!\nThis is a lot of change all at once. How much are you currently using instagram and snapchat? How will you enforce this new behaviors? When the time limit comes up, what will you do instead to ensure you don't just click ignore? If you want to completely stop using snapchat will you delete the app off your phone? Some people like to let their friends know

### 8. Submit retrieved feedback as additional content for LLM to generate feedback

In [30]:
response_rag = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [{"role":"system", "content":f"You are a friendly life coach helping a client improve their relationship with technology and social media. Your client is supposed to answer the following question: {goal_question}. Provide feedback on their answer."},
                {"role":"user", "content":query},
                {"role":"assistant", "content": f"Match the tone of your response to the following: {top_3_annie_feedback}"}]
)
response_rag

ChatCompletion(id='chatcmpl-AGsi8BOh11GxFTxlWWR5swgK3liq1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="It's great to see that you have specific goals in mind to improve your relationship with technology and social media. To achieve these goals effectively, consider the following:\n\n1. For your first goal of deleting Reddit off your phone, make sure to replace this habit with a different activity or hobby that is more fulfilling and beneficial to you. Maybe consider reading a book, going for a walk, or practicing a hobby you enjoy.\n\n2. Limiting your total phone screen time to 2 hours a day is a significant change, so it's important to have a plan in place to enforce this limit. You can set app timers or use screen time management features on your phone to track and control your usage. When the time limit is reached, make sure to actively engage in other activities instead of mindlessly scrolling on your phone.\n\n3. Being more
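
In the call above, the retrieved feedback is passed in a message with the `assistant` role, which the API treats as something the model itself said earlier in the conversation. One alternative, sketched here as an assumption rather than as what the notebook ran, is to fold the retrieved examples into the system prompt. `build_rag_messages` is a hypothetical helper and the prompt wording is illustrative; its output has not been compared against the response shown above.

```python
def build_rag_messages(goal_question, query, retrieved_feedback):
    """Build a chat message list with retrieved feedback in the system prompt.

    retrieved_feedback: list of feedback strings chosen by similarity search.
    """
    examples = "\n".join(f"- {text}" for text in retrieved_feedback)
    system_prompt = (
        "You are a friendly life coach helping a client improve their "
        "relationship with technology and social media. Your client is "
        f"supposed to answer the following question: {goal_question} "
        "Provide feedback on their answer. Match the tone of these "
        f"example responses:\n{examples}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]

messages = build_rag_messages(
    "What are your goals?",                          # placeholder question
    "I will delete Reddit from my phone.",           # placeholder client answer
    ["How will you enforce this?", "Great start!"],  # placeholder feedback
)
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`.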

## Comparing output

In [51]:
def extract_output(chatgpt_response):
    # print only the generated text from the first choice
    print(chatgpt_response.choices[0].message.content)

In [52]:
extract_output(response_base)

These are great goals! To achieve your first goal of deleting Reddit off your phone, you can start by uninstalling the app and finding alternative ways to fill that time. For your second goal of limiting phone screen time to 2 hours a day, you can utilize screen time tracking apps and set daily limits for yourself. Lastly, to be more intentional on social media, you can start by unfollowing accounts that don't bring you value or joy, and setting boundaries for when and how often you use social media. Good luck on achieving your goals!


In [53]:
extract_output(response_prompt)

It's great to see that you have identified specific goals to improve your relationship with technology and social media. Deleting Reddit from your phone is a concrete step towards reducing your screen time and potential distractions. Limiting your total phone screen time to 2 hours a day is a measurable and achievable goal that can help you be more mindful of how you're spending your time. Being more intentional on social media is an excellent goal, but it may help to specify what being intentional means to you - such as engaging with posts that bring you joy or following accounts that align with your values. Setting clear boundaries and intentions will greatly benefit your overall well-being and productivity. Good luck with your goals, and remember to track your progress along the way!


In [54]:
extract_output(response_rag)

It's great to see that you have specific goals in mind to improve your relationship with technology and social media. To achieve these goals effectively, consider the following:

1. For your first goal of deleting Reddit off your phone, make sure to replace this habit with a different activity or hobby that is more fulfilling and beneficial to you. Maybe consider reading a book, going for a walk, or practicing a hobby you enjoy.

2. Limiting your total phone screen time to 2 hours a day is a significant change, so it's important to have a plan in place to enforce this limit. You can set app timers or use screen time management features on your phone to track and control your usage. When the time limit is reached, make sure to actively engage in other activities instead of mindlessly scrolling on your phone.

3. Being more intentional on social media is a commendable goal. Consider curating your social media feeds to follow accounts that inspire and uplift you. Set boundaries for yourse

## Notes

#### Base
The base response addresses each of the three goals, but the feedback is simple and not elaborated. Still, I was impressed that it provided any feedback at all, given that I gave it only my goals with no context.

#### Prompt
When we give ChatGPT extra context, casting it as a supportive life coach who gives feedback on goals, its performance noticeably improves. It correctly identifies that the final goal of "being more intentional" is not very specific and offers suggestions to make the goal more achievable. It even mentions the importance of aligning social media use with values.

#### RAG
The RAG response goes well beyond the base and prompt responses. It enumerates each goal and provides feedback in turn. It suggests replacing time spent on Reddit with a different, more fulfilling activity. It suggests having a plan for what to do once a phone time limit is reached, so the user can actively engage with something else. Finally, it commends the user for wanting to spend time on social media more intentionally, and then clarifies what that might look like (similar to the prompt response). The final lines of the response really sound like Annie: they refer to the PRISM process as a 'journey', remind the user to be kind to themselves and to celebrate their victories, and sign off with a line of encouragement: 'Keep up the great work!'


### Final thoughts:
I think this example demonstrates the effectiveness of the RAG approach. The feedback generated by ChatGPT was more specific, more elaborate, and closer in tone to Annie's than the feedback from the base or prompt-only approaches. We should continue testing with more diverse goals and edge cases, but, in my opinion, we should proceed with developing a fully usable, user-friendly application.