## AI workshop: Rachitt Shah

### Data cleaning and preprocessing:

The JSON files had different schemas, so I converted them to CSV to make them easier for the LLMs to interpret. Known issues with the data:

- The responses JSON had an invalid key, `18_1` (music type), which had no matching question.
- The questions JSON had a question about the spacing of episode airing frequencies, which had no matching key in the responses JSON.
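
The conversion and the two mismatches above can be sketched with the standard library alone. This is a minimal sketch, not the script actually used: the field names (`id`, `text`, `respondent`, `answers`) and the sample records are hypothetical, since the real survey schemas are not shown here. Only the `18_1` key comes from the data itself.

```python
import csv
import io
import json

# Hypothetical, heavily simplified stand-ins for the two JSON files.
questions_json = '[{"id": "17", "text": "Age"}, {"id": "19", "text": "Airing frequency"}]'
responses_json = '[{"respondent": 1, "answers": {"17": "34", "18_1": "Jazz"}}]'


def responses_to_csv(questions_raw: str, responses_raw: str):
    """Flatten responses into CSV text, keeping only keys that have a question.

    Returns (csv_text, orphan_keys, unanswered_question_ids).
    """
    questions = {q["id"]: q["text"] for q in json.loads(questions_raw)}
    responses = json.loads(responses_raw)

    orphan_keys = set()  # response keys with no matching question
    seen_keys = set()    # question ids that appear in at least one response
    rows = []
    for record in responses:
        row = {"respondent": record["respondent"]}
        for key, value in record["answers"].items():
            if key not in questions:
                orphan_keys.add(key)
                continue
            seen_keys.add(key)
            row[questions[key]] = value
        rows.append(row)

    unanswered = sorted(set(questions) - seen_keys)

    # Missing answers are written as empty cells (restval="").
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["respondent", *questions.values()], restval=""
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), sorted(orphan_keys), unanswered


csv_text, orphans, unanswered = responses_to_csv(questions_json, responses_json)
```

Reporting the orphan keys and unanswered questions, rather than silently dropping them, keeps the known data issues visible each time the conversion runs.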

## Approach One:

My first approach is to feed the CSVs directly to the agents. Every approach ultimately ends with this step, and it was the fastest to demo.

There are two ways to do this:

- Chat models: these worked best on the more qualitative data and gave the best results overall, but they are also the most expensive. They are prone to missing actions and need to be prompted well. Tested models: gpt-3.5-turbo-16k (to save time on chunking) and GPT-4. GPT-4 gave the best results but costs the most; GPT-3.5 offers the best value, though it is still expensive.

- Instruct models: text-davinci-003 performed best on the quantitative data, since it's an instruct-tuned model. It did best on straightforward asks, such as finding gender splits. Known constraints are the token limit and hallucinations when scaling to more complex prompts.

A mixed approach would use LangChain's prompt selectors, which can invoke the best model for the job. Unfortunately, there are no docs for this, but it is a viable plan for production-scale deployments and for reducing costs.
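
The routing idea can be illustrated in plain Python. This is not LangChain's prompt selector API, only a hypothetical stand-in: the keyword list and the `pick_model` helper are assumptions made for illustration, and a real router would need tuning against actual research questions.

```python
import re

# Hypothetical keyword list marking a question as a straightforward quantitative ask.
QUANT_PATTERN = re.compile(
    r"\b(average|mean|median|count|how many|split|percentage|top)\b",
    re.IGNORECASE,
)


def pick_model(question: str) -> str:
    """Return the model name to use for a question.

    Quantitative asks go to the cheaper instruct model;
    everything else goes to the chat model.
    """
    if QUANT_PATTERN.search(question):
        return "text-davinci-003"
    return "gpt-4"


quant_choice = pick_model("What is the gender split in the survey?")
qual_choice = pick_model("Why do respondents cancel their subscriptions?")
```

Routing on the question keeps GPT-4 for the qualitative asks where it performs best, while simple aggregates go to the cheaper model.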

My agent type is zero-shot, since future data is unknown and it is important to have scalable prompts and agent pipelines.

In [None]:
from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType
import pandas as pd
import os
from trulens_eval import TruChain, Feedback, Huggingface, Tru
tru = Tru()

### Chat template improvement

For a production-scale system, a custom chat prompt template (for example, one built on LangChain's `ChatPromptTemplate`) is the best approach to take.

In [None]:
# Never hardcode API keys in a notebook: read them from the environment instead.
# Set OPENAI_API_KEY and HUGGINGFACE_API_KEY in your shell before launching Jupyter.
for key_name in ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY"):
    if key_name not in os.environ:
        raise EnvironmentError(f"{key_name} is not set")

prompt = """
### Instructions:

As a User Researcher, your expertise lies in analyzing datasets for user research. Your task is to transform a user researcher's question into a Python code snippet using pandas. This code snippet should be designed to extract insights from the provided datasets using a blend of qualitative and quantitative methods.

**Important Guidelines**:

- **Contextual Understanding**: Recognize the context of each question and determine which dataset it pertains to. Choose the appropriate dataset for extraction.
- **Clarity and Thoroughness**: Ensure clarity in the code. Make use of clear variable names, and break down the code into smaller chunks if necessary for better readability. Deliver thorough, actionable insights with every interaction.
- **Annotations and Comments**: Each code snippet should have comments explaining the steps being taken, so you can understand and potentially modify it in the future.
- **Visual Representations**: When visual clarity is beneficial or suggested by the question, produce graphs, plots, and histograms using libraries like `matplotlib` or `seaborn`.
- **Optimization**: Ensure that the code is optimized for performance, especially when dealing with large datasets.

### Input:
Translate the user researcher's question: {input} to extract the desired information from the datasets.

### Response:
Your response should include graphs, plots, and histograms, as well as a summary of your results. Exclude code outputs, and give a comprehensive understanding of your results as a User Researcher.
"""

### Toolkits

Why have I gone for pandas agents instead of llm-math?

- LLM-Math is not optimised for large datasets or for the complex operations that pandas handles easily.
- If LLM-Math has to be implemented, I'd recommend creating a list of quantitative operation functions for the agent to invoke.
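
One way such a list of quantitative operations could look is a small registry that the agent calls by name. This is a sketch under assumptions: the operation names and the `run_operation` helper are hypothetical, and wiring them into an agent as tools is left out.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List


def average(values: List[float]) -> float:
    """Arithmetic mean of a numeric column."""
    return mean(values)


def split(values: List[str]) -> Dict[str, float]:
    """Share of each category, as a fraction of all values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Registry of vetted operations; the agent only picks a name and a column.
OPERATIONS: Dict[str, Callable] = {"average": average, "split": split}


def run_operation(name: str, values: list):
    """Look up an operation by name and apply it to the values."""
    if name not in OPERATIONS:
        raise KeyError(f"unknown operation: {name}")
    return OPERATIONS[name](values)


average_age = run_operation("average", [20, 30, 40])
gender_split = run_operation("split", ["F", "M", "F", "F"])
```

Because the arithmetic lives in tested functions, the model only has to choose the right operation rather than compute the result itself.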

In [None]:
df_1 = pd.read_csv('/home/rachitt/qual-llm/raw_data/survey_1.csv')
df_1.head(5)

In [None]:
pd_1_instruct = create_pandas_dataframe_agent(OpenAI(model="text-davinci-003", temperature=0.3), df_1, verbose=True)

In [None]:
pd_1_instruct.run(f"{prompt} what is the average age of the respondents? Show as a graph")
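
Note that the call above appends the question after the template, so the `{input}` placeholder inside the prompt is sent to the model as literal text. A minimal sketch of filling the slot with `str.format` instead, using a shortened stand-in for the full template (this works here because `{input}` is the only pair of braces in the prompt):

```python
# Shortened stand-in for the full template; only the {input} slot matters here.
template = (
    "### Input:\n"
    "Translate the user researcher's question: {input} "
    "to extract the desired information from the datasets.\n"
)

question = "What is the average age of the respondents? Show as a graph"

# Appending leaves the literal "{input}" in the text sent to the model.
appended = f"{template} {question}"

# str.format substitutes the question into the slot instead.
filled = template.format(input=question)
```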

In [None]:
# Use the agent_type keyword, as in the multi-dataframe agent further down
pd_agent = create_pandas_dataframe_agent(ChatOpenAI(model="gpt-4-0613", temperature=0.3), df_1, verbose=True, agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

In [None]:
df_1_string = pd_agent.run(f"{prompt} How many respondents prefer to watch alone vs. with friends/family, show a graph too")

In [None]:
df_2 = pd.read_csv('/home/rachitt/qual-llm/raw_data/survey_2.csv')
df_2.head(5)

In [None]:
pd_2_instruct = create_pandas_dataframe_agent(OpenAI(model="text-davinci-003", temperature=0.3), df_2, verbose=True)

In [None]:
pd_2_instruct.run(f"{prompt} Is Subway the most popular sandwich shop?")

In [None]:
# Use the agent_type keyword, as in the multi-dataframe agent further down
pd_2_agent = create_pandas_dataframe_agent(ChatOpenAI(model="gpt-4-0613", temperature=0.5), df_2, verbose=True, agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

In [None]:
pd_2_agent.run(f"{prompt} What is the gender split among folks who ranked Taco bell as their top choice? Plot a piechart to show this.")

### Multiple CSV agents

A brute-force way to index the CSVs, which can be used for faster A/B testing. It is not recommended: because we're using zero-shot agents, the model loses context across files and produces lower-quality results.

A known issue with multi-file CSV agents is that LangChain's `OutputParser` class frequently throws errors.
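
One generic mitigation is to retry the call when parsing fails. The sketch below is plain Python and assumes the parser failure surfaces as a `ValueError` (verify this against the exception your LangChain version actually raises); `flaky_agent` is a hypothetical stand-in for an agent's `run` method.

```python
from typing import Callable


def run_with_retries(run: Callable[[str], str], question: str, max_attempts: int = 3) -> str:
    """Call run(question), retrying when the output cannot be parsed."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return run(question)
        except ValueError as error:  # assumed type of the parsing failure
            last_error = error
    raise RuntimeError(f"agent failed after {max_attempts} attempts") from last_error


# Hypothetical stand-in for an agent whose first two outputs fail to parse.
calls = {"count": 0}


def flaky_agent(question: str) -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("Could not parse LLM output")
    return f"answer to: {question}"


result = run_with_retries(flaky_agent, "What is the gender split?")
```

Retries cost extra tokens, so a low `max_attempts` is a reasonable default for the more expensive chat models.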

In [None]:
from langchain.agents.agent_types import AgentType

Multi_DF_agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    [df_1,df_2],
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)

In [None]:
Multi_DF_agent.run(f"{prompt} What is the gender split among folks who ranked Taco bell as their top choice? Plot a piechart to show this.")

In [None]:
from langchain.agents import create_pandas_dataframe_agent, create_csv_agent
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType

agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    ["/home/rachitt/qual-llm/raw_data/survey_2.csv", "/home/rachitt/qual-llm/raw_data/survey_1.csv"],
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
)

## Running Agents with monitoring

All of the agent calls are run through [TruLens](https://www.trulens.org/), which provides evals and feedback for our LLM calls.

In [None]:
# Initialize Huggingface-based feedback function collection class:
hugs = Huggingface()

# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.

In [None]:
truchain_instruct_survey_1 = TruChain(pd_1_instruct, 
    app_id='Knit_AI_Workshop',
    feedbacks=[f_lang_match],                                  
    tags = "prototype")

# Instrumented chain can operate like the original:
prompt_input=f"{prompt} What are the most common reasons for cancelling a streaming subscription for people who pay for the streaming service themselves? Plot a chart."

llm_response = truchain_instruct_survey_1(prompt_input)

display(llm_response)

In [None]:
truchain_chat_survey_1 = TruChain(pd_agent, 
    app_id='Knit_AI_Workshop',
    feedbacks=[f_lang_match],
    tags = "prototype")

# Instrumented chain can operate like the original:
prompt_input=f"{prompt} What is the gender split between people who pay for the streaming service themselves vs share an account with someone?"

llm_response = truchain_chat_survey_1(prompt_input)

display(llm_response)

In [None]:
truchain_instruct_survey_2 = TruChain(pd_2_instruct, 
    app_id='Knit_AI_Workshop',
    feedbacks=[f_lang_match],
    tags = "prototype")

# Instrumented chain can operate like the original:
prompt_input=f"{prompt} What is the gender split in the survey?"

llm_response = truchain_instruct_survey_2(prompt_input)

display(llm_response)

In [None]:
truchain_chat_survey_2 = TruChain(pd_2_agent, 
    app_id='Knit_AI_Workshop',
    feedbacks=[f_lang_match],
    tags = "prototype")

# Instrumented chain can operate like the original:
prompt_input=f"{prompt} Is Subway the most popular sandwich shop?"

llm_response = truchain_chat_survey_2(prompt_input)

display(llm_response)

In [None]:
truchain_multiple_df = TruChain(Multi_DF_agent, 
    app_id='Knit_AI_Workshop',
    feedbacks=[f_lang_match],
    tags = "prototype")

# Instrumented chain can operate like the original:
prompt_input=f"{prompt} What are the top three highest ranked Mexican fast food restaurants? Plot a chart."

llm_response = truchain_multiple_df(prompt_input)

display(llm_response)

In [None]:
tru.run_dashboard()

In [None]:
tru.stop_dashboard()

### Vectorstore+Agents approach

Vectorstores can significantly increase the amount of context available to the models, because only the chunks relevant to a question are retrieved from a much larger store.

I wanted to benchmark each approach to understand which one performed best on question answering.
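
A benchmark of this kind can be sketched as a small harness that runs the same questions through each approach and records the answer, any error, and the latency. The two agents below are hypothetical stand-ins so the example runs without API calls; in the notebook they would be callables such as the `run` method of each agent.

```python
import time
from typing import Callable, Dict, List


def benchmark(approaches: Dict[str, Callable[[str], str]], questions: List[str]) -> List[dict]:
    """Run every question through every approach and collect one record per call."""
    results = []
    for name, run in approaches.items():
        for question in questions:
            start = time.perf_counter()
            try:
                answer, error = run(question), None
            except Exception as exc:  # keep going so one failure doesn't end the run
                answer, error = None, repr(exc)
            results.append(
                {
                    "approach": name,
                    "question": question,
                    "answer": answer,
                    "error": error,
                    "seconds": time.perf_counter() - start,
                }
            )
    return results


# Hypothetical stand-ins for real agents.
def echo_agent(question: str) -> str:
    return question.upper()


def broken_agent(question: str) -> str:
    raise ValueError("parse failure")


results = benchmark({"echo": echo_agent, "broken": broken_agent}, ["is subway popular?"])
```

Answer quality still needs a separate judgment (manual review or a feedback function); the harness only makes the runs comparable.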

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import CSVLoader

llm = OpenAI(temperature=0)

loader = DirectoryLoader("/home/rachitt/qual-llm/raw_data", glob="**/*.csv",use_multithreading=True, loader_cls=CSVLoader, show_progress=True)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings, collection_name="survey_1")

The agents here should ideally be of type OpenAI functions for better compatibility with toolkits.

In [None]:
from langchain.agents import initialize_agent, AgentType
from langchain.chains import LLMCheckerChain
from langchain.llms import OpenAI
from langchain.tools import Tool

def checker_tool_function(text: str) -> str:
    """Run `text` through an LLMCheckerChain and return the checked answer."""
    # Initialize a separate LLM used only by the checker chain
    checker_llm = OpenAI(temperature=0.3)

    # Initialize the LLMCheckerChain
    checker_chain = LLMCheckerChain.from_llm(checker_llm, verbose=True)

    # Run the chain with the given text
    return checker_chain.run(text)

# Tool names must be unique, otherwise the agent cannot tell the two tools apart
pandas_tool = [
    Tool(
        name="Survey 1 pandas dataframe tool",
        func=pd_agent.run,
        description="Useful for when you need to answer questions about survey_1, which holds data on media consumption.",
        return_direct=True
    ),
    Tool(
        name="Survey 2 pandas dataframe tool",
        func=pd_2_agent.run,
        description="Useful for when you need to answer questions about survey_2, which holds data on eating habits.",
        return_direct=True
    )
]

checker_tool = Tool.from_function(
    func=checker_tool_function,
    name="LLMChecker",
    description="Checks and validates the response from LLM"
)

# Initialize the tools list and then append the tools to it
tools = []
tools.extend(pandas_tool)
tools.append(checker_tool)

conversational_agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    verbose=True,
)


In [None]:
truchain_qa_vectordb = TruChain(conversational_agent, 
    app_id='Knit_AI_Workshop',
    feedbacks=[f_lang_match],
    tags = "prototype")

# Instrumented chain can operate like the original:
prompt_input=f"{prompt} What are the most common reasons for cancelling a streaming subscription for people who pay for the streaming service themselves? Plot a chart."

llm_response = truchain_qa_vectordb(prompt_input)

display(llm_response)

In [None]:
print(pd_agent.agent.llm_chain.prompt.template) 

# sys_message = "YOUR NEW PROMPT IS HERE"
# zero_shot_agent.agent.llm_chain.prompt.template = sys_message