<a href="https://colab.research.google.com/github/explainable-digital-twins/RAG-DDDAS/blob/main/dddas2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-chroma

In [None]:
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
from google.colab import userdata
import os

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
BASE_URL = userdata.get('OPENAI_API_URL')

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["OPENAI_API_BASE"] = BASE_URL
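
`google.colab.userdata` is only available inside Colab. A minimal sketch of a fallback for local runs, assuming the same two secret names are exported as environment variables beforehand; `read_setting` is a hypothetical helper, not part of any library.

```python
import os

def read_setting(name, default=None):
    """Return the environment variable `name`, or `default` if unset or empty."""
    value = os.environ.get(name, "").strip()
    return value if value else default

# Hypothetical usage outside Colab: export the variables in the shell first.
api_key = read_setting("OPENAI_API_KEY")
base_url = read_setting("OPENAI_API_URL")

if api_key is None:
    print("OPENAI_API_KEY is not set; the OpenAI-backed cells will fail.")
if base_url is not None:
    # Mirror the cell above: expose the custom endpoint under OPENAI_API_BASE.
    os.environ["OPENAI_API_BASE"] = base_url
```

Checking for a missing key up front gives a clearer message than the `TypeError` raised when `None` is assigned into `os.environ`.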

# Prepare the knowledge base

In [None]:
!gdown https://drive.google.com/uc?id=1Iz3tm1Qunnmb_tk_XKUmmuT00ESq2rth

Downloading...
From: https://drive.google.com/uc?id=1Iz3tm1Qunnmb_tk_XKUmmuT00ESq2rth
To: /content/dddas_paper.tex

In [None]:
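from langchain_community.document_loaders import TextLoader

# Load the LaTeX source of the paper as a single document.
loader = TextLoader('./dddas_paper.tex')
documents = loader.load()

# Split the text into overlapping chunks of up to 500 characters.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
splits = text_splitter.split_documents(documents)

# Embed the chunks and index them in an in-memory Chroma vector store.
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),
)

retriever = vectorstore.as_retriever()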
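
To see what `chunk_size=500` and `chunk_overlap=50` mean, here is a simplified sketch that cuts fixed-width windows with a shared margin. It is only an illustration: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks before merging pieces, so its real chunk boundaries will differ. `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split `text` into fixed-width windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Hypothetical 1200-character sample standing in for the paper text.
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(sample)
for i, chunk in enumerate(chunks):
    print(i, len(chunk))
```

The overlap means the tail of one chunk reappears at the head of the next, so a sentence that straddles a boundary is still retrievable in one piece.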

# Example DDDAS simulation results

In [None]:
sim_results_1 = """
{
    "type": "scouting area",
    "simulation_id": "1",
    "tile_of_interest": {
		"tile_id": "1505",
	    "confidence_value": 0.44,
	    "latitude_tile": -2.3983453706,
	    "longitude_tile": -80.22853516268195,
    },
    "T_b": "15",
    "t_insp": "9",
    "simulation_results:": {
	        "options":{
	        	"1": {
	        	"drone_id": "1",
	            "drone_state": {
	                "latitude": -2.3976273476956114,
	                "longitude": -80.23033179325019,
	                "current_status": "busy",
	                "battery": 60,
	                "t_rem": 8
	            },
	            "prediction": {
	            	"t_disp": 21.49771366538664,
	            	"t_insp": 9,
	            	"delta_t": 38.49771366538664,
	            	"battery": 50
	            }
	        },
		        "2": {
		        	"drone_id": "2",
		            "drone_state": {
		                "latitude": -2.3976273476956114,
		                "longitude": -80.22781651045466,
		                "current_status": "busy",
		                "battery": 20,
		                "t_rem": 5
		            },
		            "prediction": {
			            "t_disp": 11.29115238415127,
			            "t_insp": 9,
			            "delta_t": 25.291152384151268,
			            "battery": 10
			        }
		        },
		        "3": {
		        	"drone_id": "3",
		            "drone_state": {
		                "latitude": -2.3961913007566613,
		                "longitude": -80.23212842381844,
		                "current_status": "ready",
		                "battery": 100,
		                "t_rem": 0
		            },
		            "prediction": {
			            "t_disp": 46.554638340734314,
			            "t_insp": 9,
			            "delta_t": 55.554638340734314,
			            "battery": 90
			        }
		        }
	        }
	    },
    "final_decision": {
    	"selected_drone_id": "1",
    }
}
"""

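In the scenario 1 log, each option's `delta_t` agrees with `t_disp + t_insp + t_rem`. This relation is inferred from the logged numbers and is an assumption; the notebook does not state it. The sketch below checks it with values copied by hand from `sim_results_1`.

```python
import math

# (t_disp, t_insp, t_rem, logged delta_t) copied from sim_results_1.
options = {
    "1": (21.49771366538664, 9, 8, 38.49771366538664),
    "2": (11.29115238415127, 9, 5, 25.291152384151268),
    "3": (46.554638340734314, 9, 0, 55.554638340734314),
}

for drone_id, (t_disp, t_insp, t_rem, logged) in options.items():
    # Assumed relation: dispatch time + inspection time + remaining busy time.
    total = t_disp + t_insp + t_rem
    matches = math.isclose(total, logged, rel_tol=1e-9)
    print(f"drone {drone_id}: {total:.2f} vs logged {logged:.2f} -> {matches}")
```
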
In [None]:
sim_results_2 = """
{
    "type": "scouting area",
    "simulation_id": "2",
    "tile_of_interest": {
		"tile_id": "304",
	    "confidence_value": 0.38,
	    "latitude_tile": -2.396819571477832,
	    "longitude_tile": -80.23140977159115,
    },
    "T_b": "15",
    "t_insp": "9",
    "simulation_results:": {
	        "options":{
	        	"1": {
	        	"drone_id": "1",
	            "drone_state": {
	                "latitude": -2.3969093244144606,
                	"longitude": -80.22799617351149,
	                "current_status": "ready",
	                "battery": 78,
	                "t_rem": 0
	            },
	            "prediction": {
	            	"t_disp": 37.93739997765076,
            		"t_insp": 3,
            		"delta_t": 44.93739997765076,
	            	"battery": 60
	            }
	        },
		        "2": {
		        	"drone_id": "2",
		            "drone_state": {
		                "latitude": -2.397178583189037,
                		"longitude": -80.22763684739785,
                		"current_status": "busy",
                		"battery": 81,
                		"t_rem": 2
		            },
		            "prediction": {
			            "t_disp": 42.10596239824288,
            			"t_insp": 3,
            			"delta_t": 47.10596239824288,
			            "battery": 60
			        }
		        },
		        "3": {
		        	"drone_id": "3",
		            "drone_state": {
		                "latitude": -2.3961913007566613,
                		"longitude": -80.23212842381844,
                		"current_status": "busy",
                		"battery": 90,
                		"t_rem": 8
		            },
		            "prediction": {
			            "t_disp": 10.60896399486359,
            			"t_insp": 3,
            			"delta_t": 21.60896399486359,
			            "battery": 73
			        }
		        }
	        }
	    },
    "final_decision": {
    	"selected_drone_id": "3",
    }
}
"""

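Both logs record a `final_decision`. A minimal sketch of a rule that reproduces those decisions, assuming the criterion is "smallest `delta_t` among options whose predicted battery is above `T_b`"; `select_drone` is a hypothetical helper. The values are copied by hand because the log strings contain trailing commas, which `json.loads` would reject.

```python
def select_drone(options, battery_threshold):
    """Return the id of the feasible option with the smallest delta_t, or None."""
    feasible = {
        drone_id: opt
        for drone_id, opt in options.items()
        if opt["battery"] > battery_threshold
    }
    if not feasible:
        return None
    return min(feasible, key=lambda drone_id: feasible[drone_id]["delta_t"])

# Predicted values copied from the two logs above (T_b is 15 in both).
scenario_1 = {
    "1": {"delta_t": 38.49771366538664, "battery": 50},
    "2": {"delta_t": 25.291152384151268, "battery": 10},
    "3": {"delta_t": 55.554638340734314, "battery": 90},
}
scenario_2 = {
    "1": {"delta_t": 44.93739997765076, "battery": 60},
    "2": {"delta_t": 47.10596239824288, "battery": 60},
    "3": {"delta_t": 21.60896399486359, "battery": 73},
}

print(select_drone(scenario_1, 15))  # drone 2 is faster but fails the battery check
print(select_drone(scenario_2, 15))  # every option passes the battery check here
```

In scenario 1 the battery filter changes the outcome; in scenario 2 the decision comes down to `delta_t` alone, even though the chosen drone is currently busy.
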
# LLM Results

## Prompt with system message

In [None]:
from langchain_core.prompts import PromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate

system_template = """You are an assistant for question-answering tasks of a multi-resolution system using a fleet of inspection drones for agriculture.
Use the following pieces of context and simulation logs to answer the question at the end.
The simulation logs (in JSON) record the what-if analysis results of the different options.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
Use ten sentences maximum and keep the answer as concise as possible.
Explain your answer step-by-step.
Always say "thanks for asking!" at the end of the answer.

"""
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

human_template = """Question:
{question}

Context:
{context}

Simulation logs:
{logs}

Helpful Answer:
"""
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
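
To preview what the human message looks like once its three placeholders are filled, the sketch below uses plain `str.format` as a stand-in for LangChain's template rendering. The question, context, and log values are hypothetical; message wrapping and input validation are omitted.

```python
human_template = """Question:
{question}

Context:
{context}

Simulation logs:
{logs}

Helpful Answer:
"""

# Hypothetical stand-in values; the notebook supplies retrieved text and a full log.
rendered = human_template.format(
    question="Why was drone 1 selected?",
    context="The option with the minimum total time is selected.",
    logs='{"final_decision": {"selected_drone_id": "1"}}',
)
print(rendered)
```

Braces inside the substituted log value are not re-interpreted as placeholders, which is why a JSON string can be passed in as `logs` without escaping.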

## Scenario 1

In [None]:
from langchain_openai import ChatOpenAI

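def format_docs(docs):
    # Join the retrieved chunks into one context string.
    return "\n\n".join(doc.page_content for doc in docs)

llm = ChatOpenAI(model="gpt-4-turbo-2024-04-09", temperature=0.2)

# Bind the scenario 1 logs so the chain only needs the question at run time.
chat_prompt_partial = chat_prompt.partial(logs=sim_results_1)
print(chat_prompt_partial)

# The question goes to the retriever (for context) and, unchanged, to the prompt.
rag_chain_system_1 = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_prompt_partial
    | llm
    | StrOutputParser()
)

rag_chain_system_1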

input_variables=['context', 'question'] partial_variables={'logs': '\n{\n    "type": "scouting area",\n    "simulation_id": "1",\n    "tile_of_interest": {\n\t\t"tile_id": "1505",\n\t    "confidence_value": 0.44,\n\t    "latitude_tile": -2.3983453706,\n\t    "longitude_tile": -80.22853516268195,\n    },\n    "T_b": "15",\n    "t_insp": "9",\n    "simulation_results:": {    \t\n\t        "options":{\n\t        \t"1": {\n\t        \t"drone_id": "1",\n\t            "drone_state": {\n\t                "latitude": -2.3976273476956114,\n\t                "longitude": -80.23033179325019,\n\t                "current_status": "busy",\n\t                "battery": 60,\n\t                "t_rem": 8\n\t            },\n\t            "prediction": {\n\t            \t"t_disp": 21.49771366538664,\n\t            \t"t_insp": 9,\n\t            \t"delta_t": 38.49771366538664,\n\t            \t"battery": 50\n\t            }\n\t        },\n\t\t        "2": {\n\t\t        \t"drone_id": "2",\n\t\t            

{
  context: VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7ec7f2c51f60>)
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], partial_variables={'logs': '\n{\n    "type": "scouting area",\n    "simulation_id": "1",\n    "tile_of_interest": {\n\t\t"tile_id": "1505",\n\t    "confidence_value": 0.44,\n\t    "latitude_tile": -2.3983453706,\n\t    "longitude_tile": -80.22853516268195,\n    },\n    "T_b": "15",\n    "t_insp": "9",\n    "simulation_results:": {    \t\n\t        "options":{\n\t        \t"1": {\n\t        \t"drone_id": "1",\n\t            "drone_state": {\n\t                "latitude": -2.3976273476956114,\n\t                "longitude": -80.23033179325019,\n\t                "current_status": "busy",\n\t                "battery": 60,\n\t                "t_rem": 8\n\t            },\n\t            "prediction": {\n\t   
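
The chain printed above can be read as four steps: fan the question out to the retriever and the prompt, fill the prompt, call the model, and return plain text. The sketch below traces that data flow in plain Python. `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins, not LangChain APIs, and no model is called.

```python
def fake_retriever(question):
    """Hypothetical stand-in for `retriever`: returns fixed text chunks."""
    return [
        "The option with the minimum total time is selected.",
        "The predicted battery must be above a threshold.",
    ]

def format_chunks(chunks):
    """Join chunk texts with blank lines, as format_docs does for page_content."""
    return "\n\n".join(chunks)

def fake_llm(prompt_text):
    """Hypothetical stand-in for the chat model: reports the prompt size."""
    return f"(model reply to a prompt of {len(prompt_text)} characters)"

def run_chain(question, logs):
    # Step 1: the same input feeds both branches of the dict.
    inputs = {
        "context": format_chunks(fake_retriever(question)),
        "question": question,
    }
    # Step 2: fill the prompt; `logs` was bound ahead of time via partial().
    prompt_text = (
        f"Question:\n{inputs['question']}\n\n"
        f"Context:\n{inputs['context']}\n\n"
        f"Simulation logs:\n{logs}\n"
    )
    # Steps 3 and 4: call the model and return its reply as plain text.
    return prompt_text, fake_llm(prompt_text)

prompt_text, reply = run_chain("Why was drone 1 selected?", '{"simulation_id": "1"}')
print(prompt_text)
print(reply)
```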

In [None]:
rag_chain_system_1.invoke("Explain why this drone is selected, and why others are not selected. Use no more than 70 words.")

"The drone is selected based on the minimum total time $\\Delta t_j$ and battery threshold $T_b$. Drone 1 has a $\\Delta t_j$ of 38.50, which is less than Drone 3's 55.55, and its battery after the mission will be 50, above the threshold $T_b$ of 15. Drone 2, while having a lower $\\Delta t_j$ of 25.29, will have a battery of 10, below the threshold. Hence, Drone 1 is selected. Thanks for asking!"

In [None]:
print(retriever.invoke("Explain why the drone is selected, and why others are not selected."))
rag_chain_system_1.invoke("Explain why the drone is selected, and why others are not selected.")

[Document(page_content="All candidate options are then sorted by the total time, and the option with the minimum time $\\Delta t_j$ is selected.\nTherefore, the option that leads to a minimum time $\\Delta t_j$ and the drone's predicted battery is above a threshold $T_b$ is selected as the final decision. \nThe detailed workflow is shown in  Algorithm \\ref{algorithm}.", metadata={'source': './dddas_paper.tex'}), Document(page_content="The rest of the paper specifically investigates one specific scenario above where the DDT is the decision-maker, with the LLM explaining the DDT's decisions on \\textit{system behaviour adaptation} to optimise task-related goals.\n\n\n\\vspace{-.4cm}\n\\section{An Illustrative Example: Drone Fleet in Smart Farming} \\label{sec:example}\n\\vspace{-.2cm}", metadata={'source': './dddas_paper.tex'}), Document(page_content="Smart farming applications can be assisted by aerial monitoring using drones and remote sensing to detect problems assessing crop health 

"To determine why drone 1 is selected and the others are not, we need to consider the total time $\\Delta t_j$ and the predicted battery life after the mission for each drone, as per the context provided.\n\n1. Drone 1 has a predicted total time $\\Delta t_j$ of 38.49771366538664 and a predicted battery life of 50 after the mission.\n2. Drone 2 has a predicted total time $\\Delta t_j$ of 25.291152384151268 and a predicted battery life of 10 after the mission.\n3. Drone 3 has a predicted total time $\\Delta t_j$ of 55.554638340734314 and a predicted battery life of 90 after the mission.\n\nThe threshold for the battery $T_b$ is 15. Therefore, we can immediately disqualify Drone 2 because its predicted battery life after the mission is below this threshold.\n\nBetween Drone 1 and Drone 3, although Drone 3 has a higher remaining battery life, it has a significantly higher total time $\\Delta t_j$ compared to Drone 1.\n\nThe algorithm selects the drone with the minimum total time $\\Delta 

## Scenario 2

In [None]:
# Rebind the prompt to the scenario 2 logs; the retriever and model are reused.
chat_prompt_partial = chat_prompt.partial(logs=sim_results_2)
print(chat_prompt_partial)

rag_chain_system_2 = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_prompt_partial
    | llm
    | StrOutputParser()
)

input_variables=['context', 'question'] partial_variables={'logs': '\n{\n    "type": "scouting area",\n    "simulation_id": "2",\n    "tile_of_interest": {\n\t\t"tile_id": "304",\n\t    "confidence_value": 0.38,\n\t    "latitude_tile": -2.396819571477832,\n\t    "longitude_tile": -80.23140977159115,\n    },\n    "T_b": "15",\n    "t_insp": "9",\n    "simulation_results:": {    \t\n\t        "options":{\n\t        \t"1": {\n\t        \t"drone_id": "1",\n\t            "drone_state": {\n\t                "latitude": -2.3969093244144606,\n                \t"longitude": -80.22799617351149,\n\t                "current_status": "ready",\n\t                "battery": 78,\n\t                "t_rem": 0\n\t            },\n\t            "prediction": {\n\t            \t"t_disp": 37.93739997765076,\n            \t\t"t_insp": 3,\n            \t\t"delta_t": 44.93739997765076,\n\t            \t"battery": 60\n\t            }\n\t        },\n\t\t        "2": {\n\t\t        \t"drone_id": "2",\n\t\t       

In [None]:
rag_chain_system_2.invoke("Explain why I don't see any drone going to the new tile now? Will any drone go there? Use no more than 50 words.")

'No drone is going to the new tile now because the final decision selected drone 3, which is currently busy. Drone 3 will go there after finishing its current task and the predicted time to reach and inspect the new tile is 21.61 time units. Thanks for asking!'

In [None]:
rag_chain_system_2.invoke("Explain why the drone is selected, and why others are not selected. Use no more than 50 words.")

'The drone is selected based on the minimum total time $\\Delta t_j$ and sufficient battery life above threshold $T_b$. Drone 3 has the lowest $\\Delta t_j$ of 21.61 and battery after prediction is 73, above $T_b$ of 15. Other drones have higher $\\Delta t_j$ or are busy. Thanks for asking!'

In [None]:
print(retriever.invoke("Explain why the drone is selected, and why others are not selected."))
rag_chain_system_2.invoke("Explain why the drone is selected, and why others are not selected.")

[Document(page_content="All candidate options are then sorted by the total time, and the option with the minimum time $\\Delta t_j$ is selected.\nTherefore, the option that leads to a minimum time $\\Delta t_j$ and the drone's predicted battery is above a threshold $T_b$ is selected as the final decision. \nThe detailed workflow is shown in  Algorithm \\ref{algorithm}.", metadata={'source': './dddas_paper.tex'}), Document(page_content="The rest of the paper specifically investigates one specific scenario above where the DDT is the decision-maker, with the LLM explaining the DDT's decisions on \\textit{system behaviour adaptation} to optimise task-related goals.\n\n\n\\vspace{-.4cm}\n\\section{An Illustrative Example: Drone Fleet in Smart Farming} \\label{sec:example}\n\\vspace{-.2cm}", metadata={'source': './dddas_paper.tex'}), Document(page_content="Smart farming applications can be assisted by aerial monitoring using drones and remote sensing to detect problems assessing crop health 

"The drone is selected based on the criteria that it must lead to the minimum total time $\\Delta t_j$ for the task while ensuring the drone's predicted battery is above a threshold $T_b$. In the provided simulation logs, three drones were considered as options for a scouting area task. \n\n1. Drone 1 had a predicted total time ($\\Delta t_j$) of 44.93739997765076 minutes and a post-mission predicted battery of 60%.\n2. Drone 2 had a predicted total time ($\\Delta t_j$) of 47.10596239824288 minutes and a post-mission predicted battery of 60%.\n3. Drone 3 had a predicted total time ($\\Delta t_j$) of 21.60896399486359 minutes and a post-mission predicted battery of 73%.\n\nAll drones had a predicted battery level above the threshold $T_b$ of 15%. However, Drone 3 was selected because it had the lowest total time $\\Delta t_j$ of 21.60896399486359 minutes, which meets the selection criteria of minimizing the total time while ensuring the battery is above the threshold. Drones 1 and 2 wer

In [None]:
rag_chain_system_2.invoke("Explain why a busy drone is selected, and why other ready drones are not selected. Use no more than 100 words.")

'The busy drone (drone 3) is selected because it has the minimum total time $\\Delta t_j$ (21.61) for the task, even though it is currently busy. The system prioritizes the option with the least total time to optimize task-related goals, as stated in the context. The other drones, despite being ready or busy, have higher $\\Delta t_j$ values (44.94 for drone 1 and 47.11 for drone 2), making them less optimal choices. Additionally, all drones have predicted battery levels above the threshold $T_b$ of 15, so battery life is not a disqualifying factor. Thanks for asking!'

In [None]:
rag_chain_system_2.invoke("Explain why the drone is selected, and why others are not selected. Use no more than 70 words.")

"The drone is selected based on the minimum total time $\\Delta t_j$ required for the task, provided the drone's predicted battery is above the threshold $T_b$. In the simulation logs, drone 3 has the lowest $\\Delta t_j$ of 21.61, and its predicted battery after the task is 73, which is above the threshold $T_b$ of 15. Drones 1 and 2 have higher $\\Delta t_j$ values of 44.94 and 47.11, respectively, and are therefore not selected. Thanks for asking!"