# Testing LLM Capabilities for Convey - An Interactive Survey Interface

In this notebook, we will explore the capabilities of Large Language Models (LLMs) for our project in building an interactive survey interface. We'll focus on the following tasks:

## 1. RAG (Retrieval-Augmented Generation)
- Implementing and fine-tuning RAG for tasks such as responding to users and asking them follow-up questions in a personalised manner.
- Exploring RAG's ability to provide relevant product-specific responses based on retrieval from a knowledge source.

## 2. Prompt Engineering
- Crafting effective prompts to guide the LLM's responses.
- Experimenting with different prompt formats and strategies to optimise performance.

## 3. Vector Store Manipulation
- Manipulating vector stores to enhance the understanding and generation capabilities of the LLM.
- Examining the impact of vector store modifications on the quality and relevance of generated responses.

We'll use this notebook to test various features and functionalities provided by the LLM and assess its suitability for the Convey platform.

# Getting Started

1. Create and activate a virtual environment before running the command below to install the necessary Python packages.
2. Create a Hugging Face API token and store it in a .env file in the current working directory, as follows:

    HUGGINGFACEHUB_API_TOKEN="hf_***************"

In [None]:
#%pip install -r requirements.txt

# Import Packages

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
from pathlib import Path
import pandas as pd

from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFaceEndpoint
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate, ChatPromptTemplate
from operator import itemgetter
from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from transformers import pipeline
from transformers.utils import logging

# Loading Hugging Face Hub API Token into OS

In [None]:
# Load API keys from local .env file if available
if os.path.isfile('.env'):
    # Set path to api key
    dotenv_path = Path('.env')
    load_dotenv(dotenv_path=dotenv_path)
else:
    load_dotenv(find_dotenv())
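
As a quick sanity check after loading, a small helper along these lines could confirm that the token is actually present in the environment. The name `has_hf_token` is hypothetical and not part of any library; it only inspects the `HUGGINGFACEHUB_API_TOKEN` variable described in Getting Started.

```python
import os

def has_hf_token(env=None) -> bool:
    """Return True when HUGGINGFACEHUB_API_TOKEN is set to a non-empty value."""
    # Hypothetical helper: falls back to the real process environment
    env = os.environ if env is None else env
    return bool(env.get("HUGGINGFACEHUB_API_TOKEN", "").strip())

# Warn early instead of failing later when the LLM endpoint is first called
if not has_hf_token():
    print("HUGGINGFACEHUB_API_TOKEN is not set; check the .env file.")
```

Passing a plain dictionary instead of the real environment makes the helper easy to try out without touching any secrets.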

# Vector Store Using Survey Questions

## Defining Survey Questions and Creating Document Objects

In [None]:
demographic_questions = [
    #{'id': 1, 'question': "What is your name?", "check_user_response": 0},   # This question is taken out and assumed as the first survey question
    {'id': 2, 'question': "What is your age group?", "check_user_response": 0},
    {'id': 3, 'question': "What is your gender identity?", "check_user_response": 0},
]

# Creating Document objects for survey questions
demographic_documents = [
    Document(
        page_content=question['question'],
        metadata={
            "id": question['id'],
            "stage": -1,
            "check": question['check_user_response']
        }
    ) for question in demographic_questions
]

stage_0_questions = [
    {'id': 4, 'question': "What is your hair length?", "check_user_response": 0},
    {'id': 5, 'question': "What is your hair type?", "check_user_response": 0},
    {'id': 6, 'question': "What are your hair concerns?", "check_user_response": 0},
    {'id': 7, 'question': "What is your scalp type?", "check_user_response": 0},
    {'id': 8, 'question': "What are your scalp concerns?", "check_user_response": 0},
    {'id': 9, 'question': "What hair treatments have you done?", "check_user_response": 0},
]
# Creating Document objects for survey questions
stage_0_documents = [
    Document(
        page_content=question['question'],
        metadata={
            "id": question['id'],
            "stage": 0,
            "check": question['check_user_response']
        }
    ) for question in stage_0_questions
]

stage_1_questions = [
    {'id': 10, 'question': "How often do you wash your hair?", "check_user_response": 0},
    {'id': 11, 'question': "What hair products do you use regularly?", "check_user_response": 0},
    {'id': 12, 'question': "What hair styling products do you use regularly?", "check_user_response": 0},
    {'id': 13, 'question': "How often do you switch hair product brands?", "check_user_response": 0},
    {'id': 14, 'question': "How often do you visit hair salons or barber shops?", "check_user_response": 0},
    {'id': 15, 'question': "What is your ideal hair goal?", "check_user_response": 0},
    {'id': 16, 'question': "How important is hair health to you?", "check_user_response": 0},
]
stage_1_documents = [
    Document(
        page_content=question['question'],
        metadata={
            "id": question['id'],
            "stage": 1,
            "check": question['check_user_response']
        }
    ) for question in stage_1_questions
]

stage_2_questions = [
    {'id': 17, 'question': "Which of the following Pantene product series (collections) are you aware of?", "check_user_response": 0},
    {'id': 18, 'question': "From where did you know Pantene?", "check_user_response": 0},
    {'id': 19, 'question': "What is your favorite Pantene product and what do you like about it?", "check_user_response": 0},
    {'id': 20, 'question': "What is your least favorite Pantene product and what do you dislike about it?", "check_user_response": 0},
    {'id': 21, 'question': "How would you rate the overall effectiveness of Pantene products?", "check_user_response": 0},
    {'id': 22, 'question': "Would you recommend your current hair products to others? Why?", "check_user_response": 1},
    {'id': 23, 'question': "What hair product improvements would you like to see in the future?", "check_user_response": 1},
]
stage_2_documents = [
    Document(
        page_content=question['question'],
        metadata={
            "id": question['id'],
            "stage": 2,
            "check": question['check_user_response']
        }
    ) for question in stage_2_questions
]

stage_3_questions = [
    {'id': 24, 'question': "When choosing hair products, how important are the following factors to you?", "check_user_response": 0},
    {'id': 25, 'question': "What is your preferred price range for hair products?", "check_user_response": 0},
    {'id': 26, 'question': "Do you prefer to purchase hair products online or in-store? If in-store, which stores?", "check_user_response": 1},
]
stage_3_documents = [
    Document(
        page_content=question['question'],
        metadata={
            "id": question['id'],
            "stage": 3,
            "check": question['check_user_response']
        }
    ) for question in stage_3_questions
]
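
The five list comprehensions above differ only in the question list and the stage number, so they could be folded into a single helper. The sketch below is one possible shape for it, not the notebook's actual code: `build_documents` is a hypothetical name, and `SimpleDocument` is a stand-in dataclass used only so the example runs without LangChain installed. In the notebook, the real `Document` class would be passed as `doc_cls`.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDocument:
    # Stand-in for LangChain's Document (assumption: same two fields)
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_documents(questions, stage, doc_cls=SimpleDocument):
    """Wrap each question dict in a document tagged with its survey stage."""
    return [
        doc_cls(
            page_content=q["question"],
            metadata={
                "id": q["id"],
                "stage": stage,
                "check": q["check_user_response"],
            },
        )
        for q in questions
    ]

# Two questions copied from stage 0 as sample input
sample_questions = [
    {"id": 4, "question": "What is your hair length?", "check_user_response": 0},
    {"id": 5, "question": "What is your hair type?", "check_user_response": 0},
]
sample_documents = build_documents(sample_questions, stage=0)
print(sample_documents[0].metadata)
```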

In [None]:
# Define prompt templates
demographic_prompt_template = "Hey there! Welcome to the survey! We're thrilled to have you on board. Let's kick things off by getting to know you a little better. Please take a moment to answer the following demographic questions:\n{}"
stage_0_prompt_template = "Great! Now, let's talk about your hair care routine. We're here to make sure our products match your needs perfectly. Share your thoughts with us:\n{}"
stage_1_prompt_template = "Awesome! We're diving deeper into your hair care habits and preferences. Your feedback is invaluable in helping us improve. Let's get started:\n{}"
stage_2_prompt_template = "You're doing great! Now, we're eager to hear what you think about Pantene products. Your insights will shape our future offerings. Share your thoughts with us:\n{}"
stage_3_prompt_template = "Almost there! We're curious about your shopping preferences and priorities. Let's wrap up with a few more questions:\n{}"

# Few-shot examples
# Demographic Questions
demographic_few_shot_examples = [
    ("What is your age group?", "Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "Above 65"),
    ("What is your gender identity?", "Male", "Female", "Non-binary", "Prefer not to share")
]

# Stage 0 Questions
stage_0_few_shot_examples = [
    ("What is your hair length?", "Short", "Medium", "Long", "No hair"),
    ("What is your hair type?", "Curly", "straight", "wavy", "dry", "normal", "oily", "thin", "thick"),
    ("What are your hair concerns?", "Frizzy", "dry", "split ends", "hair loss", "breakage", "none", "others"),
    ("What is your scalp type?", "Oily", "dry", "normal"),
    ("What are your scalp concerns?", "Itchiness", "sensitive", "allergies", "dandruff", "dryness", "none", "others"),
    ("What hair treatments have you done?", "Keratin treatments", "dyed", "permed", "bleached", "none", "others")
]

# Stage 1 Questions
stage_1_few_shot_examples = [
    ("How often do you wash your hair?", "Daily", "several times a day", "every other day", "others"),
    ("What hair products do you use regularly?", "shampoo", "conditioner", "leave-in treatments", "hair masks"),
    ("What hair styling products do you use regularly?", "gel", "hair dryer", "flat iron", "curler", "mousses", "serums", "others"),
    ("How often do you switch hair product brands?", "every few months", "every year", "every few years", "I do not switch"),
    ("How often do you visit hair salons or barber shops?", "every few weeks", "every few months", "once a year", "I do not visit"),
    ("What is your ideal hair goal?", "Shiny", "healthy", "volume", "smoothness", "others"),
    ("How important is hair health to you?", "Very important", "1", "5", "7", "10")
]

# Stage 2 Questions
stage_2_few_shot_examples = [
    ("Which of the following Pantene product series (collections) are you aware of?", "Pantene Pro-V", "Hair Care Shampoo and Conditioner", "I don't know any"),
    ("From where did you know Pantene?", "TV commercials", "word of mouth", "retail shops", "social media", "others"),
    ("What is your favorite Pantene product and what do you like about it?", "Pro-V shampoo, makes my hair soft", "conditioner, smells nice"),
    ("What is your least favorite Pantene product and what do you dislike about it?", "Pantene conditioner, weighs down my hair", "conditioner, makes my hair fall"),
    ("How would you rate the overall effectiveness of Pantene products?", "Highly effective", "1", "5", "7", "10"),
    ("Would you recommend your current hair products to others? Why?", "Yes, they make my hair feel great", "yes, they are affordable", "no, there are better brands", "no, they made me drop more hair"),
    ("What hair product improvements would you like to see in the future?", "More natural ingredients", "cheaper", "more benefits in a product")
]

# Stage 3 Questions
stage_3_few_shot_examples = [
    ("When choosing hair products, how important are the following factors to you?", "natural or synthetic ingredients", "fragrance", "specific certifications", "specific claims", "price", "celebrity endorsements or influencer recommendations", "specific hair concerns", "long-lasting effects", "multi-functional benefits", "eco-friendly or sustainable packaging", "hair stylists for salon professionals", "packaging", "advertising campaigns or promotions"), 
    ("What is your preferred price range for hair products?", "under $10", "$10-50", "$50-100", "above $100"),
    ("Do you prefer to purchase hair products online or in-store? If in-store, which stores?", "Online, Amazon", "online, shopee", "in store, NTUC", "in-store, salons")
]

# Connect prompt templates for smooth conversation flow
demographic_prompt = demographic_prompt_template.format("\n".join([q['question'] for q in demographic_questions]))
stage_0_prompt = stage_0_prompt_template.format("\n".join([q['question'] for q in stage_0_questions]))
stage_1_prompt = stage_1_prompt_template.format("\n".join([q['question'] for q in stage_1_questions]))
stage_2_prompt = stage_2_prompt_template.format("\n".join([q['question'] for q in stage_2_questions]))
stage_3_prompt = stage_3_prompt_template.format("\n".join([q['question'] for q in stage_3_questions]))

# Define function to select prompt based on the stage of the survey
def get_prompt(stage):
    if stage == 0:
        return stage_0_prompt
    elif stage == 1:
        return stage_1_prompt
    elif stage == 2:
        return stage_2_prompt
    elif stage == 3:
        return stage_3_prompt
    else:
        return "Invalid stage number"

# Example usage:
current_stage = 0
current_prompt = get_prompt(current_stage)
print(current_prompt)
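
Note that `get_prompt` has no branch for the demographic stage and returns an ordinary string for an unknown stage, which a caller could mistake for a real prompt. A dictionary lookup is one way to cover both cases. The sketch below uses short placeholder strings in place of the formatted prompts, and the names `STAGE_PROMPTS` and `get_prompt_from_table` are hypothetical.

```python
# Placeholder texts; in the notebook these would be demographic_prompt and
# stage_0_prompt to stage_3_prompt. The key -1 mirrors the "stage" metadata
# given to the demographic documents.
STAGE_PROMPTS = {
    -1: "demographic prompt",
    0: "stage 0 prompt",
    1: "stage 1 prompt",
    2: "stage 2 prompt",
    3: "stage 3 prompt",
}

def get_prompt_from_table(stage: int) -> str:
    """Look up the prompt for a stage, failing loudly on an unknown stage."""
    try:
        return STAGE_PROMPTS[stage]
    except KeyError:
        raise ValueError(f"Invalid stage number: {stage}") from None

print(get_prompt_from_table(0))
```

Raising an exception rather than returning an error string keeps an invalid stage from silently reaching the respondent.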

## Initialising an Embedding Model from Hugging Face

In [None]:
# Using an embedding model from Hugging Face
embedding_model = HuggingFaceEmbeddings(
    model_name='all-MiniLM-L6-v2', 
    model_kwargs={'device': 'cpu'},
    encode_kwargs = {'normalize_embeddings': False}
)

## Employing FAISS Vector Store

In [None]:
# Creating a vectorstore for the documents/survey questions
demographic_db = FAISS.from_documents(
    demographic_documents,
    embedding=embedding_model,
)
# Saving the vectorstore in local directory - persistence
demographic_db.save_local("demographic_questions")
# Loading the vectorstore from local directory
demographic_db = FAISS.load_local("demographic_questions", embedding_model, allow_dangerous_deserialization=True)

stage_0_db = FAISS.from_documents(
    stage_0_documents,
    embedding=embedding_model,
)
stage_0_db.save_local("stage_0_questions")
stage_0_db = FAISS.load_local("stage_0_questions", embedding_model, allow_dangerous_deserialization=True)

stage_1_db = FAISS.from_documents(
    stage_1_documents,
    embedding=embedding_model,
)
stage_1_db.save_local("stage_1_questions")
stage_1_db = FAISS.load_local("stage_1_questions", embedding_model, allow_dangerous_deserialization=True)

stage_2_db = FAISS.from_documents(
    stage_2_documents,
    embedding=embedding_model,
)
stage_2_db.save_local("stage_2_questions")
stage_2_db = FAISS.load_local("stage_2_questions", embedding_model, allow_dangerous_deserialization=True)

stage_3_db = FAISS.from_documents(
    stage_3_documents,
    embedding=embedding_model,
)
stage_3_db.save_local("stage_3_questions")
stage_3_db = FAISS.load_local("stage_3_questions", embedding_model, allow_dangerous_deserialization=True)

## Similarity Search

In [None]:
text = "30 years old"

# Filter on a metadata key that the documents actually carry ("id", "stage" or "check")
demographic_db.similarity_search_with_score(text, k=1, filter=dict(stage=-1))
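
To help interpret the returned score: to our understanding, LangChain's FAISS wrapper defaults to a flat L2 index, so the score is a (squared) Euclidean distance and a lower value means a closer match. The toy sketch below mimics that ranking with hand-written three-dimensional vectors in place of real embeddings. The function `nearest_with_score` and the vectors themselves are illustrative assumptions, not part of FAISS or LangChain.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_with_score(query_vec, store, k=1):
    """Return the k closest (text, distance) pairs, smallest distance first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Made-up vectors standing in for the embeddings of two survey questions
toy_store = {
    "What is your age group?": [0.9, 0.1, 0.0],
    "What is your gender identity?": [0.1, 0.9, 0.0],
}
# Made-up vector standing in for the embedding of "30 years old"
toy_query = [0.8, 0.2, 0.1]

toy_result = nearest_with_score(toy_query, toy_store, k=1)
print(toy_result)
```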

# RAG Pipeline

## Initialising an Open-source LLM from Hugging Face 

In [None]:
# ENDPOINT_URL = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1"
ENDPOINT_URL = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# callbacks = [StreamingStdOutCallbackHandler()]
llm = HuggingFaceEndpoint(
    endpoint_url=ENDPOINT_URL,
    task="text-generation",
    max_new_tokens=250,
    top_k=300,
    temperature=1,
    return_full_text=False,
    streaming=True,
    stop_sequences=['</s>'],
    # callbacks=callbacks,
)

## Creating a Retriever with Vector Store

In [None]:
def get_retriever(vectorstore: FAISS):
    # Setting retriever to only retrieve the best follow-up question 
    retriever = vectorstore.as_retriever(search_kwargs={'k': 1})
    return retriever

retriever = get_retriever(demographic_db)

## Simulating First Survey Question

In [None]:
# Ask the first question
def generate_first_question(question: str) -> str:
    prompt = ChatPromptTemplate.from_template("""
        [INST] Welcome the survey respondent to my survey on hair routines and hair products in a friendly and cheerful language. Ask the first question given:

        # Question:
        {question}

        [/INST]"""
    )
    chain = prompt | llm | StrOutputParser()
    output = chain.invoke({"question": question})
    return output

first_question = generate_first_question("What is your name?")
first_question

## Creating a Chat Log Object 

In [None]:
# Logging of chat
def create_chat_log():
    memory = ConversationBufferMemory(return_messages=False, memory_key='chat_history')
    return memory

def add_to_chat_log(chat_log, message_type: str, message: str):
    if message_type == 'ai':
        chat_log.chat_memory.add_ai_message(message)
    else:
        chat_log.chat_memory.add_user_message(message)

def get_chat_history(chat_log):
    chat_history = chat_log.load_memory_variables({})['chat_history']
    return chat_history


chat_log = create_chat_log()
add_to_chat_log(chat_log, message_type='ai', message=first_question)
get_chat_history(chat_log)

## Initialising RAG Chain

In [None]:
#from langchain_core.runnables import RunnableLambda - to be used for multiple arguments input

def get_rag_chain(retriever):
    # General prompt for all questions
    #prompt_template = """
    prompt = ChatPromptTemplate.from_template("""
        [INST] As a friendly survey interface assistant, your task is to respond to the user's survey response in a personalized and friendly manner but do not ask any questions here.
        Additionally, ask the follow-up question provided below.
    
        # Question:
        {previous_question}
        # User response:
        {user_response}
        # User sentiment:
        {sentiment}
        # Follow-up question:
        {next_question}

        Reply: [/INST]"""
    )
    
    # prompt = PromptTemplate(
    #     template=prompt_template, input_variables=['previous_question', 'user_response', 'next_question', 'sentiment']
    # )

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
        # return "\n\n".join(doc.metadata['prompt'] + '\n' + doc.page_content for doc in docs)

    rag_chain = (
        # Retrieve next best question
        RunnableParallel({"docs": itemgetter("user_response") | retriever, "user_response": itemgetter("user_response"), "sentiment": itemgetter("sentiment"), "previous_question": itemgetter("previous_question")})
        # Optional: Format question to ask user
        | ({"docs": lambda x: x['docs'], "user_response": itemgetter("user_response"), "sentiment": itemgetter("sentiment"), "next_question": lambda x: format_docs(x['docs']), "previous_question": itemgetter("previous_question")})
        # Optional: Prompt Engineering - Each question to have their own prompt template for LLM to ask the question
        | ({"docs": lambda x: x['docs'], "prompt": prompt, "user_response": itemgetter("user_response"), "sentiment": itemgetter("sentiment"), "next_question": itemgetter("next_question"), "previous_question": itemgetter("previous_question")}) 
        # Output results
        | ({"answer": itemgetter("prompt") | llm | StrOutputParser(), "docs": lambda x: x['docs'], "user_response": itemgetter("user_response"), "sentiment": itemgetter("sentiment"), "previous_question": itemgetter("previous_question")})
    )
    return rag_chain 


rag_chain = get_rag_chain(retriever)

## Invoking RAG Chain with User Response to First Question

In [None]:
user_response = "I am Xiao Ming."
add_to_chat_log(chat_log, message_type='user', message=user_response)
get_chat_history(chat_log)

### Sentiment of user response

In [None]:
logging.set_verbosity_error() 

def get_user_sentiment(user_response: str):
    pipe = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
    user_sentiment = pipe(user_response)[0]['label']
    return user_sentiment

user_sentiment = get_user_sentiment(user_response)
user_sentiment

In [None]:
def invoke_rag_chain(rag_chain, user_response: str, user_sentiment: str, previous_question: str):
    output = {}
    for chunk in rag_chain.stream(dict(user_response=user_response, sentiment=user_sentiment, previous_question=previous_question)):
        for key in chunk:
            if key not in output:
                output[key] = chunk[key].strip() if key == 'answer' else chunk[key]
            # if key == 'answer':
                # new_token = chunk[key]
                # yield new_token
                # output[key] += new_token
            else:
                output[key] += chunk[key]
            if key == 'answer':
                print(chunk[key], end="", flush=True)
    return output
    
def get_llm_outputs(rag_chain, user_response: str, previous_question: str):
    user_sentiment = get_user_sentiment(user_response)
    output = invoke_rag_chain(rag_chain, user_response, user_sentiment, previous_question)
    # LLM reply to output to frontend
    llm_reply = output['answer']
    # Get document of question asked by LLM 
    next_question_document = output['docs'][0]
    # id of question asked to output to frontend 
    next_question_id = next_question_document.metadata['id']
    return llm_reply, next_question_document, next_question_id


llm_reply, next_question_document, next_question_id = get_llm_outputs(rag_chain, user_response, first_question)

## Deleting Asked Question from Vector Store Object

In [None]:
def remove_question_from_db(vectorstore: FAISS, document_to_delete: Document):
    # Look up the docstore id of the matching document and delete it by id.
    # If no document matches, the vector store is left unchanged.
    for doc_id, item in vectorstore.docstore._dict.items():
        if item == document_to_delete:
            vectorstore.delete([doc_id])
            break
    return vectorstore


print(len(demographic_db.docstore._dict))
demographic_db = remove_question_from_db(demographic_db, next_question_document)
print(len(demographic_db.docstore._dict))

## End Survey Chain

In [None]:
# LLM chain to end the survey:
def end_survey(user_response: str, question: str) -> str:
    # print("It was interesting to get to know more about you! Thank you for participating in the survey!")
    # print("If you have any further questions or feedback, feel free to reach out to us.")
    prompt = ChatPromptTemplate.from_template("""
        [INST] Respond kindly to the user's input to the given question below. Avoid asking further questions at this stage. Finally, thank the survey participant for their participation warmly in a clear and exaggerated tone.

        # User Response:
        {response}
        # Question:
        {question}

        [/INST]"""
                                            
    )
    chain = prompt | llm | StrOutputParser()
    output = chain.invoke({"response": user_response, "question": question})
    return output

# end_survey("Convenient to buy online", "Why buy online")

## Open Ended Questions

In [None]:
# Open-ended questions to test; keep only one assignment uncommented at a time
# question = "What hair product improvements would you like to see in the future?"
# question = "Would you recommend your current hair products to others? Why?"
question = "Do you prefer to purchase hair products online or in-store? If in-store, which stores?"

### Assess if Follow-Up Question is Necessary

In [None]:
def evaluate_response(user_response: str, question: str) -> dict:
    prompt = ChatPromptTemplate.from_template("""
        [INST] Evaluate whether a follow-up question is necessary based on the user's response to the given question. Provide a "Yes" if a follow-up question is necessary or "No" otherwise, along with a confidence score between 0.0 and 1.0, and the reasoning. Your response should be in the form of a JSON object with the keys "Assessment" and "Confidence" and "Reason".

        # User Response:
        {response}

        # Question:
        {question}

        [/INST]"""
    )
    chain = prompt | llm | JsonOutputParser()
    output = chain.invoke({"response": user_response, "question": question})
    return output

# response = "No, I don't like my products because they are consistently expensive, and despite the high cost, the quality is often subpar. Additionally, the products are notoriously difficult to find, which adds to the frustration of already dissatisfied customers. The combination of these factors makes it challenging to justify purchasing these products when there are more affordable and higher-quality alternatives available in the market. As a result, I am actively seeking alternative options that offer better value for money and a more satisfying shopping experience."
# response = "No"
response = "Online"

assessment = evaluate_response(response, question)
assessment
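
Because the assessment comes back from an LLM, its keys and value formats are not guaranteed. A defensive reader such as the one sketched below could reduce it to a single boolean. The name `needs_follow_up` and the `0.5` confidence threshold are hypothetical choices made for illustration, not something the notebook prescribes.

```python
def needs_follow_up(assessment, min_confidence=0.5) -> bool:
    """Return True only for a sufficiently confident "Yes" assessment."""
    # Anything that is not a dict (e.g. a failed parse) counts as "no follow-up"
    if not isinstance(assessment, dict):
        return False
    verdict = str(assessment.get("Assessment", "")).strip().lower()
    try:
        confidence = float(assessment.get("Confidence", 0.0))
    except (TypeError, ValueError):
        # A missing or non-numeric confidence is treated as zero
        confidence = 0.0
    return verdict == "yes" and confidence >= min_confidence

example_assessment = {"Assessment": "Yes", "Confidence": 0.9, "Reason": "Too brief"}
print(needs_follow_up(example_assessment))
```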

### Ask a Follow-up Question Based on User Response

In [None]:
def generateFollowUp(user_response: str, question: str):
    prompt = ChatPromptTemplate.from_template(
        """
        [INST] You are a follow-up question generator. Provide a follow-up question based on the survey user's response to the question asked.
        Phrase the follow-up question in a clear and friendly tone.
        
        # User Response:
        {response}
        
        # Question:
        {question}

        Follow-up question: [/INST]"""
    )
    chain = prompt | llm | StrOutputParser()
    output = chain.invoke({"response": user_response, "question": question})
    return output


if assessment["Assessment"] == "Yes":
    follow_up_q = generateFollowUp(response, question)
    print(follow_up_q)

### Evaluation Chain with Langchain Presets

In [None]:
from langchain.evaluation import Criteria

list(Criteria)

In [None]:
from langchain.evaluation import load_evaluator
from langchain.evaluation import EvaluatorType

evaluator = load_evaluator(EvaluatorType.CRITERIA, criteria="coherence", llm=llm)

In [None]:
response = "ahhahahah"
eval_result = evaluator.evaluate_strings(
        prediction=response,
        input='What is your gender identity?',
    )
eval_result

In [None]:
def verify_user_response(question,response):
    eval_result = evaluator.evaluate_strings(
        prediction=response,
        input=question,
    )

    return eval_result['value']

# Conversation Simulation

Make sure to run the above functions.

## Reload Vector Store From Local Directory

In [None]:
demographic_db = FAISS.load_local("demographic_questions", embedding_model, allow_dangerous_deserialization=True)
stage_0_db = FAISS.load_local("stage_0_questions", embedding_model, allow_dangerous_deserialization=True)
stage_1_db = FAISS.load_local("stage_1_questions", embedding_model, allow_dangerous_deserialization=True)
stage_2_db = FAISS.load_local("stage_2_questions", embedding_model, allow_dangerous_deserialization=True)
stage_3_db = FAISS.load_local("stage_3_questions", embedding_model, allow_dangerous_deserialization=True)

if os.path.exists('history.json'):
    os.remove('history.json')

## Begin Loop

In [None]:
# Get question asked
def get_question_asked(question_document):
    # Retrieve original question based on question_id
    return question_document.page_content

# get_question_asked(next_question_document)

In [None]:
stage = None # Change this for testing different stages
db = demographic_db
retriever = get_retriever(demographic_db)
question_asked = "What is your name?"
user_response = ""
next_question_document = None
clarified = False

first_question = generate_first_question("What is your name?")
print(f"LLM: {first_question}")

# Create a json file to store the survey history 
history = pd.DataFrame({'id': [1], 'question': ["What is your name?"], 'llm_question': [first_question], 'user_response': [""], 'stage': [-1]})
history.to_json("history.json", orient="records")

while True:
    # User responded
    if user_response:
        
        # Load in survey history
        history = pd.read_json("history.json")
        # Add user response to history
        history.loc[history.index[-1], "user_response"] = user_response

        # Check user response for questions that are specified to check
        if (next_question_document is not None) and (next_question_document.metadata['check'] == 1):
            # Check if a follow up question is needed based on user response and the question asked
            assessment = evaluate_response(user_response, question_asked)
            needFollowUp = assessment["Assessment"] == "Yes"
            # If a follow-up is needed, ask a follow-up question
            if needFollowUp:
                # Allow only one follow-up per question
                if clarified:
                    clarified = False
                else:
                    clarified = True
                    # TO DO: Improve the instruction or construct a LLM chain to ask the question again.
                    follow_up_question = generateFollowUp(user_response, question_asked)
                    print('\n')
                    print(f"LLM: {follow_up_question}")
                    # Wait for user input
                    user_response = input()
                    print('\n')
                    print("User: ", end='')
                    print(user_response)

                    # Saving the question that the RAG chain has chosen to history
                    new_row = pd.DataFrame({'id': [next_question_id], 'question': [question_asked], 'llm_question': [follow_up_question], 'user_response': [""], 'stage': [next_question_document.metadata['stage']]})
                    history = pd.concat([history, new_row], ignore_index=True)
                    history.to_json("history.json", orient="records")
                    continue

        
        # Survey flow
        if len(db.docstore._dict) == 0 and stage is None:
            stage = 0
            db = stage_0_db
        elif len(db.docstore._dict) == 0 and stage == 0:
            stage = 1
            db = stage_1_db
        elif len(db.docstore._dict) == 0 and stage == 1:
            stage = 2
            db = stage_2_db
        elif len(db.docstore._dict) == 0 and stage == 2:
            stage = 3
            db = stage_3_db
        elif len(db.docstore._dict) == 0 and stage == 3:
            history.to_json("history.json", orient="records")
            # To end the survey gracefully
            end = end_survey(user_response, question_asked)
            print('\n')
            print(f"LLM: {end}")
            break

        ## Ask the next best question based on previous survey user response
        # Create new retriever object with updated vectorstore
        retriever = get_retriever(db)
        # Create new RAG chain with updated retriever
        qa_chain = get_rag_chain(retriever)
        print('\n')
        print("LLM: ", end='')
        # Get LLM reply, next question to ask and its question id
        llm_reply, next_question_document, next_question_id = get_llm_outputs(qa_chain, user_response, question_asked)
        # Get question asked
        question_asked = get_question_asked(next_question_document)
        # Updated vectorstore with asked question removed
        db = remove_question_from_db(db, next_question_document)

        # Saving the question that the RAG chain has chosen to history
        new_row = pd.DataFrame({'id': [next_question_id], 'question': [next_question_document.page_content], 'llm_question': [llm_reply], 'user_response': [""], "stage": [next_question_document.metadata['stage']]})
        history = pd.concat([history, new_row], ignore_index=True)
        history.to_json("history.json", orient="records")

    # Wait for user input
    user_response = input()
    #user_response = f'The user\'s response is:{user_response}'
    print('\n')
    print("User: ", end='')
    print(user_response)
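
The if/elif ladder that advances the survey could also be written as a walk through an ordered list of stages. A minimal sketch, assuming the stage order used above (demographic stage represented by `None`, then 0 to 3); `STAGE_ORDER` and `next_stage` are hypothetical names.

```python
STAGE_ORDER = [None, 0, 1, 2, 3]  # None stands for the demographic stage

def next_stage(current):
    """Return (stage, finished); finished is True once the last stage is done."""
    position = STAGE_ORDER.index(current)
    if position == len(STAGE_ORDER) - 1:
        # No stage left: the caller should end the survey
        return current, True
    return STAGE_ORDER[position + 1], False

print(next_stage(None))
print(next_stage(3))
```

In the loop, such a helper would be called only when the current vector store has no questions left, with the returned stage used to pick the next store.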


In [None]:
# My To Dos:
# Add updated survey first question chain in utils
# Add updated RAG prompt and LLM parameters in utils
# Add evaluation chain of open-ended questions in utils
# Add updated way in which history.json file is stored in utils
# Add updated end survey chain in utils
# Explore feature: adding conversation history into the RAG chain

## Update Database

In [None]:
import os
from dotenv import load_dotenv
import mysql.connector

load_dotenv()
mysql_root_password = os.getenv("MYSQL_ROOT_PASSWORD")

In [None]:
import json

#run api test to get history.json
with open('../../history.json', 'r') as file:
    hist = json.load(file)

In [None]:
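# Returns every user_response recorded for the given question id, joined by commas
# (a question id can appear more than once when a follow-up question was asked)
def get_r(hist, question_id):
    value = []
    for chat in hist:
        if chat['id'] == question_id:
            value.append(chat['user_response'])
    return ','.join(value)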

# Accepts a chat log and writes its responses to the database
def update_db(history):
    # Connect to the database
    db = mysql.connector.connect(
        host="localhost",
        port=3307,
        user="root",
        password=mysql_root_password,
    )
    mycursor = db.cursor()

    try:
        # Add to the database
        mycursor.execute("USE testdatabase")

        # Question ids 4 to 9 fill the Stage_0 columns in order
        mycursor.execute(
            "INSERT INTO Stage_0(hair_length, hair_type, hair_concerns, scalp_type, scalp_concerns, hair_treatment) "
            "VALUES (%s,%s,%s,%s,%s,%s)",
            tuple(get_r(history, i) for i in range(4, 10)),
        )
        stage0_id = mycursor.lastrowid

        # Question ids 10 to 16 fill the Stage_1 columns in order
        mycursor.execute(
            "INSERT INTO Stage_1(wash_frequency, hair_products, styling_products, prod_switch_freq, salon_freq, hair_goal, hair_health_importance) "
            "VALUES (%s,%s,%s,%s,%s,%s,%s)",
            tuple(get_r(history, i) for i in range(10, 17)),
        )
        stage1_id = mycursor.lastrowid

        # Question ids 17 to 23 fill the Stage_2 columns in order
        mycursor.execute(
            "INSERT INTO Stage_2(pantene_prod, pantene_info, most_fav_product, least_fav_product, prod_effectiveness, prod_recommend, desired_ingredients) "
            "VALUES (%s,%s,%s,%s,%s,%s,%s)",
            tuple(get_r(history, i) for i in range(17, 24)),
        )
        stage2_id = mycursor.lastrowid

        # Question ids 24 to 26 fill the Stage_3 columns in order
        mycursor.execute(
            "INSERT INTO Stage_3(important_factors, preferred_price_range, purchase_method) "
            "VALUES (%s,%s,%s)",
            tuple(get_r(history, i) for i in range(24, 27)),
        )
        stage3_id = mycursor.lastrowid

        # Question ids 1 to 3 are the demographic answers; link them to the stage rows
        mycursor.execute(
            "INSERT INTO Demographic(name, age, gender, stage0_id, stage1_id, stage2_id, stage3_id) "
            "VALUES (%s,%s,%s,%s,%s,%s,%s)",
            (get_r(history, 1), get_r(history, 2), get_r(history, 3), stage0_id, stage1_id, stage2_id, stage3_id),
        )

        db.commit()
        print('db updated')
    finally:
        # Always release the cursor and connection, even if an insert fails
        mycursor.close()
        db.close()
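
Since `update_db` reads question ids 1 to 26, it can help to confirm that a chat log covers all of them before inserting. The helper below is a minimal sketch of such a check. `missing_question_ids` is a hypothetical name, not part of the project's utils, and the sketch assumes each chat log entry is a dict with an integer `id` key.

```python
def missing_question_ids(history, expected_ids=range(1, 27)):
    """Return the expected question ids that never appear in the chat log."""
    seen = {chat["id"] for chat in history}
    return [qid for qid in expected_ids if qid not in seen]


# Hypothetical partial log that only covers the three demographic questions
partial_log = [
    {"id": 1, "user_response": "ti"},
    {"id": 3, "user_response": "male"},
    {"id": 2, "user_response": "799"},
]

gaps = missing_question_ids(partial_log)
print(gaps)
```

Any id reported here would be written to the database as an empty string, because `get_r` returns `''` when no entry matches.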


In [None]:
#test one response per question
test_json = [{"id":1,"question":"What is your name?","user_response":"the user's response is 'ti'"},{"id":3,"question":"what is your gender identity?","user_response":"the user's response is 'male'"},{"id":2,"question":"What is your age group?","user_response":"the user's response is '799'"},{"id":4,"question":"What is your hair length?","user_response":"the user's response is 'lengthy'"},{"id":5,"question":"What is your hair type?","user_response":"the user's response is 'brownian motion'"},{"id":7,"question":"What is your scalp type?","user_response":"the user's response is 'the second law of thermodynamics'"},{"id":6,"question":"What are your hair concerns?","user_response":"the user's response is 'e do be equal to mc squared'"},{"id":8,"question":"What are your scalp concerns?","user_response":"the user's response is 'eahahahahhaha'"},{"id":9,"question":"What hair treatments have you done?","user_response":"the user's response is 'no hair treatments'"},{"id":11,"question":"What hair products do you use regularly?","user_response":"the user's response is \"none\""},{"id":15,"question":"What is your ideal hair goal?","user_response":""},{"id":16,"user_response":"awwd"},{"id":12,"user_response":"a"},{"id":13,"user_response":"a"},{"id":14,"user_response":"a"},{"id":10,"user_response":"a"},{"id":17,"user_response":"a"},{"id":18,"user_response":"a"},{"id":19,"user_response":"a"},{"id":20,"user_response":"a"},{"id":21,"user_response":"a"},{"id":22,"user_response":"a"},{"id":23,"user_response":"a"},{"id":24,"user_response":"a"},{"id":25,"user_response":"a"},{"id":26,"user_response":"a"}]
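
Several responses in `test_json` still carry the `the user's response is '...'` wrapper. If only the raw answer should reach the database, one option is to strip the wrapper first. The sketch below is an assumption about how that could be done: `strip_wrapper` is a hypothetical helper, and the pattern only covers the wrapper format seen in the sample data above.

```python
import re

# Matches "the user's response is 'answer'" or the same with double quotes
_WRAPPER = re.compile(
    r"""^the user's response is\s*:?\s*(['"])(.*)\1$""",
    re.IGNORECASE | re.DOTALL,
)


def strip_wrapper(response):
    """Return the quoted answer if the wrapper is present, else the input unchanged."""
    match = _WRAPPER.match(response.strip())
    return match.group(2) if match else response


# Sample values copied from the wrapper format used in test_json
wrapped_single = strip_wrapper("the user's response is 'ti'")
wrapped_double = strip_wrapper('the user\'s response is "none"')
unwrapped = strip_wrapper("awwd")
print(wrapped_single, wrapped_double, unwrapped)
```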


In [None]:
#test multiple responses for the same question
test_double = [{"id":23,"question":"What hair product improvements would you like to see in the future?","llm_question":"I'm glad to hear that you're taking the time to share your thoughts with us! It seems like you're not aware of any Pantene product series at the moment. That's totally okay! We're always looking to improve and innovate, so I'm curious: what hair product improvements would you like to see in the future? Your input is truly valuable to us.","user_response":"None","stage":2},
    {"id":23,"question":"What hair product improvements would you like to see in the future?","llm_question":" I'm sorry for any confusion, but it seems like I didn't receive a response from you yet regarding the hair product improvements you'd like to see in the future. Your insights are valuable to us, so could you please share what changes or enhancements you'd like to see in hair products?","user_response":"Ok priec","stage":2}]
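
To see what `get_r` would pass to the database for a log like `test_double`, the sketch below runs a local copy of the function on trimmed copies of those two entries (only the `id` and `user_response` keys are kept). The local copy is there so the cell runs on its own.

```python
# Local copy of get_r so this cell is self-contained
def get_r(hist, question_id):
    responses = []
    for chat in hist:
        if chat['id'] == question_id:
            responses.append(chat['user_response'])
    return ','.join(responses)


# Trimmed copies of the test_double entries: question 23 was asked twice
double_log = [
    {"id": 23, "user_response": "None"},
    {"id": 23, "user_response": "Ok priec"},
]

combined = get_r(double_log, 23)   # both answers, comma-joined
absent = get_r(double_log, 24)     # id not in the log, so an empty string
print(repr(combined), repr(absent))
```

Both answers end up in a single column value, so an answer that itself contains a comma cannot be told apart from two separate answers.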

In [None]:
# Start the MySQL Docker container before running (docker start test-mysql)
update_db(hist)