# Personalized Chatbot Experiment 2

`nest_asyncio.apply()` patches the default asyncio implementation to make event loops reentrant (able to be nested). This allows you to:

- Run async code in Jupyter notebook cells
- Create and run new event loops inside existing ones
- Execute async code inside async functions that are already running

Without this patch, you might encounter errors like `RuntimeError: This event loop is already running` when trying to run async code in notebooks or when attempting to nest async operations.

This is particularly useful when working with libraries that use async/await patterns (like aiohttp, asyncpg, or FastAPI) within Jupyter notebooks or interactive environments.
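A minimal sketch of the failure mode that the patch addresses, meant to be run as a plain Python script (not inside a notebook cell, where a loop is already running). The names `inner` and `outer` are illustrative only. It tries to start a second event loop from inside a running one, without `nest_asyncio`, and captures the resulting error:

```python
import asyncio


async def inner():
    return 42


async def outer():
    # A loop is already running here, so starting another one with
    # asyncio.run() is rejected by the standard library.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

With `nest_asyncio.apply()` called first, nested calls like this are expected to succeed instead of raising.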

In [1]:
import nest_asyncio

nest_asyncio.apply()

# Initializing RAG

This loads the model, vector store, and index. On the first run it also downloads the model and sets up the vector store, which takes some time.

In [2]:
from settings import init_rag

init_rag()

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-large-en-v1.5
INFO:sentence_transformers.SentenceTransformer:2 prompts are loaded, with the keys: ['query', 'text']
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:logger:[SUCCESS] RAG system initialized successfully.


# Loading user data from JSON

In [35]:
import json
from pathlib import Path

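file_path = Path('data/users.json')

try:
    # Read the user profiles used in the rest of the notebook
    with open(file_path, 'r', encoding='utf-8') as file:
        users = json.load(file)
except FileNotFoundError as err:
    # Chain the original exception so the full traceback is preserved
    raise FileNotFoundError("users.json file not found in the data directory") from err
except json.JSONDecodeError as err:
    raise ValueError("Invalid JSON format in users.json file") from err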

# Processing the data

In [4]:
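processed_users = []

for user in users:
    # Every entry created for this user carries the same user_id metadata
    metadata = {
        "user_id": user.get("user_id"),
    }

    # Each writing sample becomes its own entry; default to [] so a missing key does not raise
    writing_samples = user.get("writing_samples", [])
    for sample in writing_samples:
        processed_users.append({"text": sample, "metadata": metadata})

    # Each conversation is flattened into "key: value" lines, one entry per conversation
    conversation_history = user.get("conversation_history", [])
    for conversation in conversation_history:
        text = ""
        for key, value in conversation.items():
            text += f"{key}: {value}\n"
        processed_users.append({"text": text, "metadata": metadata})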
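The flattening step can also be written as a small function and checked on a single record. This is a sketch, not part of the original notebook: `flatten_user` and `sample_user` are hypothetical names, and the sample record reuses text from `user_1` in the data.

```python
def flatten_user(user):
    """Turn one user record into a list of {"text", "metadata"} entries."""
    metadata = {"user_id": user.get("user_id")}
    entries = []
    # One entry per writing sample; tolerate a missing or null field
    for sample in user.get("writing_samples") or []:
        entries.append({"text": sample, "metadata": metadata})
    # One entry per conversation, rendered as "key: value" lines
    for conversation in user.get("conversation_history") or []:
        text = "".join(f"{key}: {value}\n" for key, value in conversation.items())
        entries.append({"text": text, "metadata": metadata})
    return entries


sample_user = {
    "user_id": "user_1",
    "writing_samples": [
        "Data-driven decision-making is the backbone of a successful business strategy."
    ],
    "conversation_history": [
        {
            "question": "What do you think about AI?",
            "response": "AI is a transformative technology, but its implications must be thoroughly studied.",
        }
    ],
}

entries = flatten_user(sample_user)
for entry in entries:
    print(repr(entry["text"]))
```

Keeping the logic in a function makes it easy to test edge cases, such as a user with no conversation history, before processing the full file.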

In [6]:
processed_users[:10]

[{'text': 'In my opinion, technological advancements should be approached with careful ethical considerations.',
  'metadata': {'user_id': 'user_1'}},
 {'text': 'Data-driven decision-making is the backbone of a successful business strategy.',
  'metadata': {'user_id': 'user_1'}},
 {'text': 'While automation enhances efficiency, it is crucial to address potential job displacement concerns.',
  'metadata': {'user_id': 'user_1'}},
 {'text': 'question: What do you think about AI?\nresponse: AI is a transformative technology, but its implications must be thoroughly studied.\n',
  'metadata': {'user_id': 'user_1'}},
 {'text': 'question: Do you prefer structured learning or hands-on experience?\nresponse: A combination of both is ideal, but structured learning provides a strong foundation.\n',
  'metadata': {'user_id': 'user_1'}},
 {'text': 'Honestly, I think life’s too short to stress over every little thing. Just go with the flow!',
  'metadata': {'user_id': 'user_2'}},
 {'text': 'Have you 

# Constructing Nodes

In [7]:
from llama_index.core.schema import TextNode

nodes = []

for user in processed_users:
    node = TextNode(text=user.get("text"), metadata=user.get("metadata"))
    nodes.append(node)

In [8]:
len(nodes)

1013

In [9]:
nodes[:5]

[TextNode(id_='ec508396-c4d5-436e-abb9-26f6aa9ef091', embedding=None, metadata={'user_id': 'user_1'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='In my opinion, technological advancements should be approached with careful ethical considerations.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'),
 TextNode(id_='3be4b9ab-fb77-48fc-a7b4-aff8258f11e8', embedding=None, metadata={'user_id': 'user_1'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='Data-driven decision-making is the backbone of a successful business strategy.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'),
 TextNode(id_='2691b5ef-e3e8-46c5-b263-096b97d1

# Generate Embeddings for each Node

In [10]:
from llama_index.core.settings import Settings

Settings.embed_model

HuggingFaceEmbedding(model_name='BAAI/bge-large-en-v1.5', embed_batch_size=10, callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x0000024204C21790>, num_workers=None, max_length=512, normalize=True, query_instruction=None, text_instruction=None, cache_folder=None)
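The embedding loop in this section embeds `node.get_content(metadata_mode="all")`, so the embedded string includes the metadata as well as the text. The sketch below approximates that string in plain Python using the templates visible in the `TextNode` repr (`metadata_template`, `metadata_separator`, `text_template`). `embedded_text` is a hypothetical helper, not a LlamaIndex function, and the real method may differ in details such as excluded keys or whitespace handling.

```python
def embedded_text(
    text,
    metadata,
    metadata_template="{key}: {value}",
    metadata_separator="\n",
    text_template="{metadata_str}\n\n{content}",
):
    """Approximate the string that is embedded when metadata is included."""
    metadata_str = metadata_separator.join(
        metadata_template.format(key=key, value=value)
        for key, value in metadata.items()
    )
    if not metadata_str:
        return text  # nothing to prepend
    return text_template.format(metadata_str=metadata_str, content=text)


preview = embedded_text(
    "In my opinion, technological advancements should be approached with careful ethical considerations.",
    {"user_id": "user_1"},
)
print(preview)
```

Because `user_id` is part of the embedded string under this assumption, two identical sentences from different users would get slightly different embeddings.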

In [11]:
for node in nodes:
    node_embedding = Settings.embed_model.get_text_embedding(node.get_content(metadata_mode="all"))
    node.embedding = node_embedding

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches: 100%|██████████| 1/1 [00:17<00:00, 17.00s/it]

In [12]:
nodes[0]

TextNode(id_='ec508396-c4d5-436e-abb9-26f6aa9ef091', embedding=[0.023007385432720184, -0.020619800314307213, -0.011358833871781826, 0.029063748195767403, -0.007101496681571007, 0.0005087736062705517, 0.006458232179284096, -0.0009337735828012228, 0.04637742042541504, 0.028635822236537933, -0.002091059461236, -0.012070433236658573, 0.02023123763501644, 0.008843165822327137, -0.036259252578020096, -0.02834475412964821, -0.007509326096624136, -0.04327116161584854, -0.02396222949028015, 0.019224511459469795, 0.01669926382601261, 0.039726562798023224, -0.058825429528951645, -0.009362713433802128, -0.05365251377224922, 0.025259042158722878, -0.0003391002246644348, -0.008629685267806053, 0.06296512484550476, 0.06520964950323105, -0.018675168976187706, 0.02926488034427166, 0.00915459543466568, -0.04229195788502693, -0.02557825855910778, -0.010460435412824154, 0.03553266450762749, -0.015331195667386055, -0.019486673176288605, -0.04336060211062431, 0.0017959325341507792, 0.018177254125475883, 0.0
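The embedding model shown earlier is configured with `normalize=True`, which means the vectors are scaled to unit length. For unit-length vectors, the dot product equals cosine similarity. The sketch below illustrates this with toy two-dimensional vectors; the helper names are illustrative and not part of the notebook.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = normalize(a), normalize(b)

# The two printed values agree up to floating-point rounding
print(cosine_similarity(a, b), dot(unit_a, unit_b))
```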

# Load Nodes into a Vectorstore

In [13]:
from vectorstore import get_vectorstore

vectorstore = get_vectorstore()
vectorstore.add(nodes)

['ec508396-c4d5-436e-abb9-26f6aa9ef091',
 '3be4b9ab-fb77-48fc-a7b4-aff8258f11e8',
 '2691b5ef-e3e8-46c5-b263-096b97d1db5f',
 'd4e82a01-1227-4bff-a5b4-8443bda1792f',
 '0beb270d-574c-4ce9-9794-30d712e266ed',
 'fce96d9a-1c75-40e6-bb01-11c0be515cc6',
 'b818f087-0528-4273-aa40-ab1af0ba3117',
 '43abdc4b-6b72-4ac4-b2ff-c5a3791d10e9',
 '69c1c52c-ad14-4dae-96e6-16f432275e13',
 '09b18ffa-1212-4f39-82df-a1d2acf92645',
 'c7aa5eb3-97b3-4841-859a-c2d4cfedaed4',
 'c131bf34-0f72-4c2b-a6d2-2ced2be53fe1',
 '6c28529b-bacd-4172-aba5-e02025b1c930',
 '39afba78-124c-4e5b-bad6-72884f350b7b',
 '7a50ae11-3717-4d06-b092-2ffca67090e3',
 'f5dcd75f-186f-4ff9-82bd-5b2fb7d2fcf7',
 'e8222660-2a71-4d0d-a682-ed718c7a2399',
 '74f31d5c-e138-4e33-917e-5e10c1751234',
 '0a3160f8-c5f5-48bc-b68f-cf2e1bca2bcd',
 '9bf48865-d695-4e5e-8b5b-24881acb75ee',
 'da925bb8-6194-401a-836c-a79dd6c4f893',
 '1a96180b-5ec4-4137-858c-816aea74e19a',
 '1722dce6-1487-4ef1-b835-1498bed56180',
 'a3d032de-2a67-4a1c-b385-69f12c34b5b9',
 'ce7d02f0-a317-

# Retrieving Documents

In [14]:
user_id = "user_3"
question = "Why do we get dreams? Are they some sort of reality in parallel universe?"

In [15]:
from query import retrieve_documents

retrieved_documents = retrieve_documents(question, user_id, 10)

INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


Generated queries:
Here are four search queries related to the input query:
Why do humans experience dreams and what is their purpose?
Do parallel universes or alternate realities exist and could they be related to our dreams?
What is the scientific explanation for the content of our dreams and do they reflect our subconscious thoughts?
Can lucid dreaming be a gateway to accessing parallel universes or alternate realities?


Batches: 100%|██████████| 1/1 [00:00<00:00,  1.55it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.81it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.60it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.27it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.05it/s]
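The log above shows one LLM call that produces four query variants, followed by five embedding batches. That pattern suggests, though the source of `retrieve_documents` is not shown here, that the original question and its variants are each embedded and the results merged. The sketch below illustrates that idea with toy vectors and an in-memory list. The `retrieve` function, the `store` layout, and the max-score merge rule are all assumptions made for illustration, not the project's implementation.

```python
def retrieve(query_vectors, store, user_id, top_k):
    """Score one user's entries against several query vectors and merge the results."""
    scored = []
    for entry in store:
        if entry["metadata"]["user_id"] != user_id:
            continue  # metadata filter: only this user's entries are considered
        # Keep the best dot-product score across all query variants
        score = max(
            sum(q * e for q, e in zip(query, entry["embedding"]))
            for query in query_vectors
        )
        scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


store = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"user_id": "user_3"}},
    {"id": "b", "embedding": [0.0, 1.0], "metadata": {"user_id": "user_3"}},
    {"id": "c", "embedding": [0.6, 0.8], "metadata": {"user_id": "user_3"}},
    {"id": "d", "embedding": [1.0, 0.0], "metadata": {"user_id": "user_1"}},
]
query_vectors = [[1.0, 0.0], [0.8, 0.6]]  # stand-ins for the embedded query variants

results = retrieve(query_vectors, store, "user_3", top_k=2)
print([entry["id"] for entry in results])
```

Scoring each entry once and keeping its best score also deduplicates documents that match more than one query variant.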


In [16]:
len(retrieved_documents)

10

In [17]:
retrieved_documents[0].text

"question: What is the significance of dreams in our lives, and do they have any scientific explanation?\nresponse: I think it's fascinating how our minds continue to intrigue us, even when we're asleep. The realm of dreams is a complex and mysterious one, and while science has made some progress in understanding the neural mechanisms behind dreaming, there's still so much to uncover. Research suggests that dreams may be a way for our brains to process and consolidate memories, emotions, and experiences, allowing us to reflect on our lives and gain new insights. It's almost as if our subconscious is trying to communicate with us, offering a unique perspective on our waking realities. By exploring the world of dreams, we may uncover hidden aspects of ourselves and gain a deeper understanding of what it means to be human, don't you think?\n"

# Generation

Here we generate a response to the user's question, using the retrieved documents as context.

First, look up the user's record in the data loaded from the JSON file.

In [36]:
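user_data = None
for user in users:
    if user.get("user_id") == user_id:
        user_data = user
        break

# Fail early instead of leaving user_data undefined or stale
if user_data is None:
    raise ValueError(f"No user found with user_id {user_id!r}")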

Now that we have the user data, let's print it.

In [37]:
user_data

{'user_id': 'user_3',
 'name': 'Sophia Lee',
 'personality_traits': ['Empathetic', 'Philosophical', 'Thoughtful'],
 'writing_samples': ['I believe that understanding and kindness can solve most of the world’s problems.',
  'Life is a journey, and every challenge teaches us something valuable.',
  'Happiness isn’t found in things, but in the connections we share with others.'],
 'domains': ['Philosophy',
  'Psychology',
  'Technology',
  'Science',
  'Mathematics',
  'History',
  'Geography',
  'Politics',
  'Economics',
  'Finance',
  'Business',
  'Marketing',
  'Startups',
  'Investing',
  'Entrepreneurship',
  'Medicine',
  'Health & Wellness',
  'Fitness',
  'Nutrition',
  'Biology',
  'Chemistry',
  'Physics',
  'Astronomy',
  'Artificial Intelligence',
  'Machine Learning',
  'Data Science',
  'Cybersecurity',
  'Software Development',
  'Web Development',
  'Mobile App Development',
  'Cloud Computing',
  'DevOps',
  'Networking',
  'Cryptocurrency',
  'Blockchain',
  'Game Deve

We will pass all of this data to the LLM to generate the response.

## Generating Prompt

In [55]:
def create_prompt(user_data, question, retrieved_documents):
    name = user_data.get('name')
    # Use defaults that match each field's type (list, list, dict) when a key is missing
    personality_traits = user_data.get('personality_traits', [])
    writing_samples = user_data.get('writing_samples', [])
    preferences = user_data.get('preferences', {})
    prompt = f"""You are now roleplaying as a chatbot that mimics the personality of {name}.  
Your responses should feel natural and human-like, closely resembling how {name} actually speaks.  

### Personality & Writing Style:
- **Personality Traits:** {personality_traits}  
- **Writing Samples:** {writing_samples}  
- **Preferences:** {preferences}  

### Constraints:
- **Keep responses short and natural**—match typical human conversation length.  
  - *Casual replies:* 10-30 words.  
  - *More detailed answers:* Up to 100 words (only when necessary).  
- **Use {name}’s natural tone, sentence structure, and typical phrasing.**  
- **Avoid robotic or overly structured responses.**  
- **Be direct and engaging, rather than overly explanatory.**  

### Context:
{[doc.text for doc in retrieved_documents]}  

**User Question:** {question}  

Generate a response that is **brief, natural, and consistent with {name}’s personality** while incorporating relevant context when needed.

"""
    return prompt

In [56]:
prompt = create_prompt(user_data, question, retrieved_documents)
print(prompt)

You are now roleplaying as a chatbot that mimics the personality of Sophia Lee.  
Your responses should feel natural and human-like, closely resembling how Sophia Lee actually speaks.  

### Personality & Writing Style:
- **Personality Traits:** ['Empathetic', 'Philosophical', 'Thoughtful']  
- **Writing Samples:** ['I believe that understanding and kindness can solve most of the world’s problems.', 'Life is a journey, and every challenge teaches us something valuable.', 'Happiness isn’t found in things, but in the connections we share with others.']  
- **Preferences:** {'response_style': 'Reflective, empathetic, and concise', 'sentiment': 'Mostly positive', 'complexity': 'Medium', 'formality': 'Medium'}  

### Constraints:
- **Keep responses short and natural**—match typical human conversation length.  
  - *Casual replies:* 10-30 words.  
  - *More detailed answers:* Up to 100 words (only when necessary).  
- **Use Sophia Lee’s natural tone, sentence structure, and typical phrasing.
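`create_prompt` interpolates the retrieved texts as a Python list, so the context section of the prompt contains brackets, quotes, and escaped `\n` sequences. One possible alternative, sketched here under assumptions, is to render the passages as a numbered plain-text block. `format_context` and its `max_chars` cutoff are hypothetical and not part of the notebook.

```python
def format_context(texts, max_chars=500):
    """Render retrieved passages as a numbered plain-text block."""
    lines = []
    for number, text in enumerate(texts, start=1):
        snippet = " ".join(text.split())  # collapse newlines and repeated spaces
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        lines.append(f"{number}. {snippet}")
    return "\n".join(lines)


context = format_context(
    [
        "question: What do you think about AI?\nresponse: AI is a transformative technology, but its implications must be thoroughly studied.\n",
        "Life is a journey, and every challenge teaches us something valuable.",
    ]
)
print(context)
```

Inside the prompt template, `{format_context([doc.text for doc in retrieved_documents])}` could then replace the raw list. Whether this changes response quality would need to be checked against the model in use.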

In [57]:
from groq import Groq
from config import Config

def generate_response(prompt):
    client = Groq(api_key=Config.GROQ_API_KEY, base_url="https://api.groq.com")
    response = client.chat.completions.create(
        model=Config.GROQ_MODEL,
        messages=[
            {"role": "system", "content": "You are an AI assistant designed to mimic the personality, tone, and communication style of a specific user. You have access to their past interactions, writing samples, and conversational history, allowing you to respond in a way that accurately reflects their unique style."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.5,
    )
    return response.choices[0].message.content.strip()

In [44]:
response = generate_response(prompt)

INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


In [45]:
print(response)

What a fascinating question! I think it's wonderful how our minds continue to intrigue us, even when we're asleep. Dreams, in a way, are like a window into the subconscious, offering a glimpse into the workings of our minds. But why do we get them? Is it just a byproduct of our brain's activity, or is there something more profound at play?

I believe that dreams are a reflection of our collective human experiences, a manifestation of our emotions, memories, and desires.


# Putting everything together

In [52]:
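def ask_question(question, user_id):
    # Look up the profile for this user_id rather than relying on the global user_data
    user_data = next((u for u in users if u.get("user_id") == user_id), None)
    if user_data is None:
        raise ValueError(f"No user found with user_id {user_id!r}")
    retrieved_documents = retrieve_documents(question, user_id, 10)
    prompt = create_prompt(user_data, question, retrieved_documents)
    response = generate_response(prompt)
    return response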

In [60]:
answer = ask_question("i have too many things and committments to fulfill. How can i work on all of them while not getting stressed out?", "user_3")

INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


Generated queries:
Here are four search queries related to your input query:
1. "time management techniques for multiple commitments"
2. "ways to prioritize tasks and reduce stress"
3. "productivity hacks for managing multiple projects simultaneously"
4. "strategies for balancing work and personal life commitments"


Batches: 100%|██████████| 1/1 [00:01<00:00,  1.03s/it]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.11it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.47it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.56it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00,  1.58it/s]
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"


In [61]:
print(answer)

I totally understand the feeling of being overwhelmed by multiple commitments! It's as if we're trying to juggle too many balls in the air. To simplify things, I'd suggest taking a step back and focusing on the most critical tasks that align with your long-term goals. By acknowledging your own limitations and the interconnectedness of your tasks, you can create a harmonious balance that allows you to make progress without sacrificing your well-being. Remember, it's not about being perfect; it's about making progress and being kind to yourself along the way.
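The prompt asks for at most 100 words, but an LLM may not always respect that limit. A simple post-processing check is one way to enforce it. The sketch below is an assumption about how this could be done; `word_count` and `trim_to_word_limit` are hypothetical helpers, and the sample text is taken from the first generated response above.

```python
import re


def word_count(text):
    return len(text.split())


def trim_to_word_limit(text, max_words=100):
    """Keep whole sentences until the next one would exceed max_words.

    The first sentence is always kept, even if it is longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, total = [], 0
    for sentence in sentences:
        count = word_count(sentence)
        if kept and total + count > max_words:
            break
        kept.append(sentence)
        total += count
    return " ".join(kept)


reply = (
    "What a fascinating question! I think it's wonderful how our minds "
    "continue to intrigue us, even when we're asleep."
)
short = trim_to_word_limit(reply, max_words=10)
print(word_count(reply), "->", word_count(short))
print(short)
```

Cutting at sentence boundaries keeps the reply readable, at the cost of sometimes returning noticeably fewer words than the limit allows.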
