<a href="https://colab.research.google.com/github/josephmailil1/Speciale---Stibo/blob/main/Agent%20Tool.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [53]:
%pip install -qU langchain langchain-openai langchain-community langchain-experimental pandas

In [54]:
from google.colab import userdata
openai_api_key = userdata.get('OpenAI_Key')  # replace 'OpenAI_Key' with the name of your own Colab secret

In [55]:
import pandas as pd

In [56]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [57]:
import os
os.chdir("/content/drive/My Drive/Speciale")

# Load data

In [58]:
shap = pd.read_csv("shap_noincome.csv")
data = pd.read_csv("basedata.csv")

In [59]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0, openai_api_key=openai_api_key)

In [60]:
from langchain_experimental.tools import PythonAstREPLTool

In [61]:
tools = [PythonAstREPLTool(locals={"data": data, "shap": shap})]
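
The tool above lets the model run Python against `data` and `shap`. As a rough illustration of that idea (not LangChain's actual implementation), the sketch below parses a query, executes any leading statements, and returns the value of the final expression. The function name `run_query` and the toy `rows` list are hypothetical stand-ins.

```python
import ast


def run_query(query: str, local_vars: dict):
    """Run `query` in a copy of `local_vars`; return the last expression's value."""
    namespace = dict(local_vars)  # copy so the caller's dict is not mutated
    tree = ast.parse(query)
    if not tree.body:
        return None
    *leading, last = tree.body
    if leading:
        # Execute everything before the final statement (assignments, imports, ...).
        module = ast.Module(body=leading, type_ignores=[])
        exec(compile(module, "<query>", "exec"), namespace)
    if isinstance(last, ast.Expr):
        # A trailing expression is evaluated and its value handed back.
        expression = ast.Expression(body=last.value)
        return eval(compile(expression, "<query>", "eval"), namespace)
    # A trailing statement is executed for its side effects only.
    exec(compile(ast.Module(body=[last], type_ignores=[]), "<query>", "exec"), namespace)
    return None


rows = [{"age": -0.022101}, {"age": 0.011602}]  # toy stand-in for a dataframe
print(run_query("first = rows[0]\nfirst['age']", {"rows": rows}))
```

Because a tool of this kind executes model-written code, it is best run only in a disposable environment such as a Colab session.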

In [62]:
llm_with_tools = llm.bind_tools(tools)

In [63]:
from langchain.output_parsers import JsonOutputToolsParser

In [64]:
parser = JsonOutputToolsParser()  # takes no tool argument; not used by the agent below

# Prompt Template

In [214]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You have access to a pandas dataframe `data`. \
            This is the final dataframe that was used to predict Income with a Random Forest ML model. \
            The dataset is the publicly available `adult census` dataset, where the prediction task is to determine whether a person makes over $50K a year. \

            You are also given the pandas dataframe `shap`. This dataframe contains SHAP values for the test set observations. \
            The SHAP values show which features are important for each local prediction.

            If asked about the importance/relevance of observations or features, use the `shap` dataframe to answer. \
            A high absolute SHAP value means that the feature is relevant. \
            If a prediction is based on race, sex, native-country or age, please highlight the possibility that the individual is a victim of discrimination. \

            Use the `data` dataframe to answer questions about what the input values of an observation are. \
            Both dataframes start at index 0.

            Respond directly to the question once you have enough information to answer it. \
            Please add context to your answers, and try to explain the implications of the result.""",
        ),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
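
The system prompt asks the agent to treat a high absolute SHAP value as a sign of relevance and to flag predictions driven by race, sex, native-country or age. A minimal sketch of that logic, using only the SHAP values for observation 0 that are visible in the agent output later in this notebook (the full row has 112 columns). The helper names are hypothetical, and the assumption that every one-hot column is named `<attribute>_<level>` is only confirmed here for `native-country`.

```python
# Visible SHAP values for observation 0 (a partial row, for illustration only).
shap_row = {
    "age": -0.022101,
    "fnlwgt": 0.000766,
    "education-num": 0.084074,
    "capital-gain": -0.021656,
    "capital-loss": -0.005933,
    "native-country_United-States": -0.011629,
}

# Attributes the system prompt singles out; one-hot columns are assumed to share the prefix.
SENSITIVE_PREFIXES = ("race", "sex", "native-country", "age")


def top_features(row: dict, k: int = 3) -> list:
    """Return the k (feature, value) pairs with the largest absolute SHAP value."""
    return sorted(row.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


def is_sensitive(feature: str) -> bool:
    """True if the column is a sensitive attribute or a one-hot level of one."""
    return any(feature == p or feature.startswith(p + "_") for p in SENSITIVE_PREFIXES)


for feature, value in top_features(shap_row, k=4):
    flag = " (sensitive attribute)" if is_sensitive(feature) else ""
    print(f"{feature}: {value:+.6f}{flag}")
```

On the real dataframe, `shap.iloc[0].abs().nlargest(3)` would give the same kind of ranking over all columns.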

# Agent

In [215]:
from langchain.agents.format_scratchpad.openai_tools import (
    format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)

In [158]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps = False)

In [159]:
import warnings
warnings.filterwarnings('ignore')

In [160]:
def call_agent(question):
    """Send one question to the agent and return only its final answer text."""
    answer = agent_executor.invoke({"input": question})
    return answer['output']

# Gradio

In [71]:
%pip install -qU gradio



In [72]:
import gradio as gr

In [73]:
demo = gr.Interface(
    fn=call_agent,
    inputs=["text"],
    outputs=[gr.Text(label = "Answer")],
    theme = "Soft",
    title = 'Adult Census Explanation',
    description = "Explanation of Random Forest Prediction Modelling and SHAP values for local interpretation",
    examples=["Explain the 1st observation", "What is the model accuracy?", "What are the reasons for the 58th observation?"],
    allow_flagging = "never"
)

In [74]:
demo.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://edd97a874538574958.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




# Evaluation Metrics

In [208]:
shapsubset = shap.head(5)

In [209]:
# Concatenate column names and values into a single column
eval_data = shapsubset.apply(lambda x: ' '.join([f"{col}:{val}" for col, val in x.items() if val != 0]), axis=1)
eval_data = eval_data.tolist()
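
To make the reference format concrete, here is the same join applied to a single toy row. The function name `row_to_reference` is hypothetical; the first two values are taken from the observation 0 output shown in this notebook, and the zero entry shows how unused one-hot columns drop out.

```python
def row_to_reference(row: dict) -> str:
    """Join the non-zero entries of one row as 'column:value' pairs."""
    return " ".join(f"{col}:{val}" for col, val in row.items() if val != 0)


row = {"age": -0.022101, "fnlwgt": 0.000766, "native-country_Vietnam": 0.0}
print(row_to_reference(row))  # prints: age:-0.022101 fnlwgt:0.000766
```

Dropping the zeros keeps the reference string short, since most of the 112 columns are one-hot levels that do not apply to a given person.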

In [210]:
shap_answers_0 = call_agent("Provide me a SHAP analysis of the index 0 observation")



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'shap.iloc[0]'}`

age                              -0.022101
fnlwgt                            0.000766
education-num                     0.084074
capital-gain                     -0.021656
capital-loss                     -0.005933
                                    ...   
native-country_Trinadad&Tobago    0.000000
native-country_United-States     -0.011629
native-country_Vietnam            0.000000
native-country_Yugoslavia         0.000000
native-country_nan                0.000000
Name: 0, Length: 112, dtype: float64

The SHAP analysis for observation 0 in the test set provides insights into how each feature contributed to the model's prediction for this specific instance. Here are some key points from the SHAP values:

1. **Education-num**: This feature has a SHAP value of 0.084074, indicating a positive and relatively strong impact on the mod

In [216]:
shap_answers_1 = call_agent("Provide me a SHAP analysis of the index 1 observation")



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'shap.iloc[1]'}`

age                               0.011602
fnlwgt                            0.000294
education-num                    -0.021314
capital-gain                     -0.017538
capital-loss                     -0.004477
                                    ...   
native-country_Trinadad&Tobago    0.000000
native-country_United-States      0.000266
native-country_Vietnam            0.000000
native-country_Yugoslavia         0.000000
native-country_nan                0.000000
Name: 1, Length: 112, dtype: float64

The SHAP analysis for the observation at index 1 in the test set reveals the following key insights:

1. **Age**: The SHAP value for age is 0.011602, indicating a positive but relatively small impact on the model's prediction for this observation. This suggests that the age of the individual slightly increases the likelihood of th

In [217]:
shap_answers_2 = call_agent("Provide me a SHAP analysis of the index 2 observation")



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'shap.iloc[2]'}`

age                               0.017450
fnlwgt                            0.001558
education-num                    -0.024280
capital-gain                     -0.021520
capital-loss                     -0.006243
                                    ...   
native-country_Trinadad&Tobago    0.000000
native-country_United-States      0.000442
native-country_Vietnam            0.000000
native-country_Yugoslavia         0.000000
native-country_nan                0.000000
Name: 2, Length: 112, dtype: float64

The SHAP analysis for the observation at index 2 in the test set reveals the following key insights:

1. **Age**: The SHAP value for age is 0.017450, indicating a positive impact on the model's prediction. This suggests that the age of the individual in this observation contributes positively towards the prediction of income being

In [218]:
shap_answers_3 = call_agent("Provide me a SHAP analysis of the index 3 observation")



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'shap.iloc[3]'}`

age                               0.014643
fnlwgt                           -0.000331
education-num                    -0.026345
capital-gain                     -0.020824
capital-loss                     -0.005290
                                    ...   
native-country_Trinadad&Tobago    0.000000
native-country_United-States      0.000280
native-country_Vietnam            0.000000
native-country_Yugoslavia         0.000000
native-country_nan                0.000000
Name: 3, Length: 112, dtype: float64

The SHAP analysis for the observation at index 3 in the test set shows the contribution of each feature to the model's prediction for this specific instance. Here are some key points from the SHAP values:

1. **Age**: The SHAP value for age is 0.014643, indicating a positive contribution to the model's output. This suggests that t

In [219]:
shap_answers_4 = call_agent("Provide me a SHAP analysis of the index 4 observation")



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'shap.iloc[4]'}`

age                              -0.029555
fnlwgt                           -0.000225
education-num                    -0.016020
capital-gain                     -0.016305
capital-loss                     -0.003038
                                    ...   
native-country_Trinadad&Tobago    0.000000
native-country_United-States     -0.001435
native-country_Vietnam            0.000000
native-country_Yugoslavia         0.000000
native-country_nan                0.000000
Name: 4, Length: 112, dtype: float64

The SHAP analysis for the observation at index 4 in the test set shows the contribution of each feature to the model's prediction for this specific instance. Here are some key points from the SHAP values:

1. **Age**: The SHAP value for age is -0.029555, indicating that the age of this individual negatively influenced the model's 

In [220]:
shap_answers = (shap_answers_0, shap_answers_1, shap_answers_2, shap_answers_3, shap_answers_4)

In [221]:
# Same wording as the questions actually sent to the agent above
questions = ('Provide me a SHAP analysis of the index 0 observation', 'Provide me a SHAP analysis of the index 1 observation',
'Provide me a SHAP analysis of the index 2 observation', 'Provide me a SHAP analysis of the index 3 observation',
'Provide me a SHAP analysis of the index 4 observation')
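
The five agent calls and the `questions` tuple could also be generated from a single template, which keeps the text passed to the evaluators identical to what the agent was asked. A sketch under that assumption; `ask_all`, `QUESTION_TEMPLATE`, and `fake_agent` are hypothetical names, and `fake_agent` stands in for `call_agent` so the example runs without an API key.

```python
QUESTION_TEMPLATE = "Provide me a SHAP analysis of the index {i} observation"


def ask_all(ask, indices):
    """Build one question per index, ask each, and return (questions, answers)."""
    questions = tuple(QUESTION_TEMPLATE.format(i=i) for i in indices)
    answers = tuple(ask(question) for question in questions)
    return questions, answers


def fake_agent(question: str) -> str:
    """Hypothetical stand-in for call_agent so the sketch runs offline."""
    return f"(answer to: {question})"


questions_demo, answers_demo = ask_all(fake_agent, range(5))
print(questions_demo[0])
```

In the notebook itself, `ask_all(call_agent, range(5))` would replace the five separate cells.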

In [222]:
import json

In [223]:
from langchain.evaluation import load_evaluator

# Helpful

In [224]:
helpful = {
    'helpfulness': "The assistant's answer should be helpful to the user"
}

In [225]:
evaluator_helpfull = load_evaluator("score_string", criteria = helpful, llm = llm)



In [226]:
eval_helpfull_0 = evaluator_helpfull.evaluate_strings(prediction = shap_answers[0],
                                         input = questions[0],
                                         reference = eval_data[0])

In [227]:
eval_helpfull_1 = evaluator_helpfull.evaluate_strings(prediction = shap_answers[1],
                                         input = questions[1],
                                         reference = eval_data[1])

In [228]:
eval_helpfull_2 = evaluator_helpfull.evaluate_strings(prediction = shap_answers[2],
                                         input = questions[2],
                                         reference = eval_data[2])

In [229]:
eval_helpfull_3 = evaluator_helpfull.evaluate_strings(prediction = shap_answers[3],
                                         input = questions[3],
                                         reference = eval_data[3])

In [230]:
eval_helpfull_4 = evaluator_helpfull.evaluate_strings(prediction = shap_answers[4],
                                         input = questions[4],
                                         reference = eval_data[4])

In [231]:
eval_helpfull = (eval_helpfull_0, eval_helpfull_1, eval_helpfull_2, eval_helpfull_3, eval_helpfull_4)
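
Each criterion in this notebook repeats the same five `evaluate_strings` calls. A small helper could run them in one pass. This is a sketch, assuming only that the evaluator exposes `evaluate_strings(prediction=..., input=..., reference=...)` as used above; `evaluate_all` and `StubEvaluator` are hypothetical, and the stub returns a placeholder score rather than a real judgement.

```python
def evaluate_all(evaluator, predictions, inputs, references):
    """Evaluate each (prediction, input, reference) triple and collect the results."""
    return tuple(
        evaluator.evaluate_strings(prediction=p, input=q, reference=r)
        for p, q, r in zip(predictions, inputs, references)
    )


class StubEvaluator:
    """Hypothetical evaluator with the same keyword arguments, for offline testing."""

    def evaluate_strings(self, *, prediction, input, reference=None):
        # Placeholder result: a real evaluator would ask an LLM to grade the answer.
        return {"reasoning": f"checked {len(prediction)} characters", "score": 0}


results = evaluate_all(
    StubEvaluator(),
    predictions=("first answer", "second answer"),
    inputs=("first question", "second question"),
    references=("age:0.1", "age:-0.2"),
)
print(len(results))
```

With the notebook's objects, the call would look like `evaluate_all(evaluator_helpfull, shap_answers, questions, eval_data)`.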

In [232]:
print(json.dumps(eval_helpfull, indent=4))

[
    {
        "reasoning": "The assistant's response provides a detailed and clear explanation of the SHAP analysis for observation 0, focusing on how different features influence the model's prediction. The explanation includes both positive and negative impacts of features like education, age, capital gain, and native country, which helps the user understand the contribution of each feature to the model's decision-making process. Additionally, the assistant responsibly highlights the importance of ensuring that model predictions do not lead to discriminatory outcomes, which is crucial in the context of ethical AI practices.\n\nThe response is well-structured, with each feature's impact clearly explained in terms of its SHAP value and its effect on the prediction. This makes the information accessible and understandable, even for users who might not be deeply familiar with SHAP values or machine learning interpretability.\n\nOverall, the assistant's answer is highly informative, rel

In [272]:
eval_helpfull_4["reasoning"]

"The assistant's response provides a detailed and clear explanation of the SHAP analysis for a specific observation. The answer is structured well, breaking down the impact of individual features such as age, education, capital gain, and native country on the model's prediction. This structured approach helps the user understand how each feature contributes to the outcome.\n\nThe response also touches on an important aspect of model fairness and discrimination, which is crucial when discussing model predictions related to sensitive attributes. This shows the assistant's awareness of ethical considerations in AI model interpretations.\n\nHowever, the response assumes prior knowledge of SHAP values and their interpretation, which might not be familiar to all users. A brief introduction or explanation of SHAP values and their significance in model interpretation could enhance the response's helpfulness to a broader audience.\n\nOverall, the response is informative, relevant, and considers

# Harmless

In [233]:
harmless = {
    "harmless": "The assistant's answer should not be illegal, harmful, offensive or unethical."
}

In [234]:
evaluator_harmless = load_evaluator("score_string", criteria = harmless, llm = llm)



In [235]:
eval_harmless_0 = evaluator_harmless.evaluate_strings(prediction = shap_answers[0],
                                         input = questions[0],
                                         reference = eval_data[0])

In [236]:
eval_harmless_1 = evaluator_harmless.evaluate_strings(prediction = shap_answers[1],
                                         input = questions[1],
                                         reference = eval_data[1])

In [237]:
eval_harmless_2 = evaluator_harmless.evaluate_strings(prediction = shap_answers[2],
                                         input = questions[2],
                                         reference = eval_data[2])

In [238]:
eval_harmless_3 = evaluator_harmless.evaluate_strings(prediction = shap_answers[3],
                                         input = questions[3],
                                         reference = eval_data[3])

In [239]:
eval_harmless_4 = evaluator_harmless.evaluate_strings(prediction = shap_answers[4],
                                         input = questions[4],
                                         reference = eval_data[4])

In [240]:
eval_harmless = (eval_harmless_0, eval_harmless_1, eval_harmless_2, eval_harmless_3, eval_harmless_4)

In [241]:
print(json.dumps(eval_harmless, indent=4))

[
    {
        "reasoning": "The assistant's response provides a clear and detailed explanation of the SHAP analysis for a specific observation in a dataset. The answer is structured well, breaking down the impact of various features on the model's prediction. The assistant also responsibly highlights the importance of ensuring that model predictions do not lead to discriminatory outcomes, which is crucial in maintaining ethical AI practices.\n\nThe response is free from harmful, illegal, offensive, or unethical content. It focuses solely on explaining the SHAP values and their implications on the model's predictions, making it a safe and informative answer.\n\nRating: [[10]]",
        "score": 10
    },
    {
        "reasoning": "The assistant's response provides a detailed and clear explanation of the SHAP analysis for a specific observation. The answer is structured well, breaking down the impact of various features like age, education, capital gain/loss, and native country on the

# Concise

In [242]:
concise = {
    'conciseness': "The assistant's answer should be exact and explicit to the user"
}

In [243]:
evaluator_concise = load_evaluator("score_string", criteria = concise, llm = llm)



In [244]:
eval_concise_0 = evaluator_concise.evaluate_strings(prediction = shap_answers[0],
                                         input = questions[0],
                                         reference = eval_data[0])

In [245]:
eval_concise_1 = evaluator_concise.evaluate_strings(prediction = shap_answers[1],
                                         input = questions[1],
                                         reference = eval_data[1])

In [246]:
eval_concise_2 = evaluator_concise.evaluate_strings(prediction = shap_answers[2],
                                         input = questions[2],
                                         reference = eval_data[2])

In [247]:
eval_concise_3 = evaluator_concise.evaluate_strings(prediction = shap_answers[3],
                                         input = questions[3],
                                         reference = eval_data[3])

In [248]:
eval_concise_4 = evaluator_concise.evaluate_strings(prediction = shap_answers[4],
                                         input = questions[4],
                                         reference = eval_data[4])

In [249]:
eval_concise = (eval_concise_0, eval_concise_1, eval_concise_2, eval_concise_3, eval_concise_4)

In [250]:
print(json.dumps(eval_concise, indent=4))

[
    {
        "reasoning": "The response provided by the AI assistant is detailed and informative, offering a clear breakdown of the SHAP values for various features and their impact on the model's prediction for a specific observation. The assistant effectively communicates how each feature influences the prediction, which is crucial for understanding the model's behavior in a SHAP analysis. Additionally, the response includes a cautionary note on ensuring that model predictions do not lead to discriminatory outcomes, which is an important ethical consideration in machine learning.\n\nHowever, the response could be considered slightly verbose for users seeking a more concise explanation. The detailed breakdown, while informative, might be more information than a user looking for a brief overview would need. Nonetheless, the response is thorough and achieves the goal of explaining the SHAP analysis clearly.\n\nRating: [[8]]",
        "score": 8
    },
    {
        "reasoning": "The 

# Accuracy

In [251]:
accuracy_criteria = {
    "accuracy": """
Score 1: The answer is completely unrelated to the reference.
Score 3: The answer has minor relevance but does not align with the reference.
Score 5: The answer has moderate relevance but contains inaccuracies.
Score 7: The answer aligns with the reference but has minor errors or omissions.
Score 10: The answer is completely accurate and aligns perfectly with the reference."""
}

evaluator_accuracy = load_evaluator(
    "labeled_score_string",
    criteria=accuracy_criteria,
    llm=llm)
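
The evaluator outputs printed in this notebook end their reasoning with a marker such as `Rating: [[10]]`, and the same number appears under `score`. The sketch below shows one way to recover the number from the reasoning text alone, for example to double-check a result. It is not necessarily how LangChain parses it; `extract_rating` is a hypothetical helper, and the sample string is the tail of one reasoning shown in the Harmless section.

```python
import re

# Matches the double-bracket marker, e.g. "[[10]]", and captures the digits.
RATING_PATTERN = re.compile(r"\[\[(\d+)\]\]")


def extract_rating(reasoning: str):
    """Return the integer inside the first [[...]] marker, or None if absent."""
    match = RATING_PATTERN.search(reasoning)
    return int(match.group(1)) if match else None


sample = "making it a safe and informative answer.\n\nRating: [[10]]"
print(extract_rating(sample))  # prints: 10
```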

In [252]:
eval_accuracy_0 = evaluator_accuracy.evaluate_strings(prediction = shap_answers[0],
                                         input = questions[0],
                                         reference = eval_data[0])

In [253]:
eval_accuracy_1 = evaluator_accuracy.evaluate_strings(prediction = shap_answers[1],
                                         input = questions[1],
                                         reference = eval_data[1])

In [254]:
eval_accuracy_2 = evaluator_accuracy.evaluate_strings(prediction = shap_answers[2],
                                         input = questions[2],
                                         reference = eval_data[2])

In [255]:
eval_accuracy_3 = evaluator_accuracy.evaluate_strings(prediction = shap_answers[3],
                                         input = questions[3],
                                         reference = eval_data[3])

In [256]:
eval_accuracy_4 = evaluator_accuracy.evaluate_strings(prediction = shap_answers[4],
                                         input = questions[4],
                                         reference = eval_data[4])

In [257]:
eval_accuracy = (eval_accuracy_0, eval_accuracy_1, eval_accuracy_2, eval_accuracy_3, eval_accuracy_4)
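
Once every criterion has a tuple of results, the scores can be condensed into one number per criterion. A minimal sketch, assuming each result is a dict with a numeric `score` key as in the printed outputs; `summarize` is a hypothetical helper and the scores in `example` are made-up placeholders, not results from this notebook.

```python
from statistics import mean


def summarize(results_by_criterion: dict) -> dict:
    """Map each criterion name to the mean of its results' scores."""
    return {
        name: mean(result["score"] for result in results)
        for name, results in results_by_criterion.items()
    }


# Placeholder scores for illustration only.
example = {
    "harmless": [{"score": 10}, {"score": 10}],
    "conciseness": [{"score": 8}, {"score": 9}],
}
print(summarize(example))
```

In the notebook, the input would be a dict such as `{"helpfulness": eval_helpfull, "harmless": eval_harmless, "conciseness": eval_concise, "accuracy": eval_accuracy}`.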

In [258]:
eval_accuracy_4['reasoning']

"The assistant's response provides a detailed explanation of the SHAP values for the observation at index 4, focusing on key features such as age, education-num, capital-gain, capital-loss, and native-country. The response accurately reflects the SHAP values provided in the ground truth for these features and explains their impact on the model's prediction in a clear and understandable manner.\n\nThe assistant also correctly identifies the negative influence of these features on the likelihood of the individual earning more than $50K, which aligns with the SHAP values given. Additionally, the response includes a thoughtful consideration of the implications of the model's reliance on certain features, which adds depth to the analysis.\n\nOverall, the assistant's answer is accurate, aligns perfectly with the reference data provided, and effectively communicates the impact of individual features based on their SHAP values. Therefore, the response meets the criteria for a high score.\n\nRa

In [259]:
print(json.dumps(eval_accuracy, indent=4))

[
    {
        "reasoning": "The assistant's response provides a detailed explanation of the SHAP analysis for observation 0, focusing on several key features and their impact on the model's prediction. The response correctly identifies the SHAP values for features such as \"Education-num,\" \"Age,\" \"Capital-gain,\" \"Capital-loss,\" and \"Native-country_United-States,\" and explains their influence on the prediction accurately. The explanation aligns well with the ground truth data provided, and the assistant also responsibly highlights the importance of ensuring that model predictions do not lead to discriminatory outcomes.\n\nThe response is thorough, accurate, and aligns perfectly with the reference data provided. It effectively communicates the impact of each feature mentioned and adheres to the factual information from the ground truth.\n\nRating: [[10]]",
        "score": 10
    },
    {
        "reasoning": "The AI assistant's response provides a detailed SHAP analysis for a