In [1]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_chroma import Chroma

import os, shutil

Preparing the database and embeddings for queries

In [2]:
embeddings = OllamaEmbeddings(model='nomic-embed-text')
database = Chroma(persist_directory='./database', embedding_function=embeddings)

Template for the prompt sent to the LLM

In [3]:
PROMPT_TEMPLATE = """
Answer the question based only on the following context: {context}

---

Answer the question based only on the above context: {question}
"""
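
To make the two placeholders concrete, here is a minimal sketch of how the template gets filled. It uses plain `str.format` as a stand-in for `ChatPromptTemplate`, whose formatted output may differ in detail, and the chunks and question are hypothetical values, not taken from the database.

```python
# Copy of the notebook's PROMPT_TEMPLATE, repeated so this sketch runs on its own.
PROMPT_TEMPLATE = """
Answer the question based only on the following context: {context}

---

Answer the question based only on the above context: {question}
"""

# Hypothetical chunks standing in for documents returned by the vector store.
chunks = ["Chunk A text.", "Chunk B text."]

# Chunks are joined with a blank line, as in the retrieval cells of this notebook.
context = "\n\n".join(chunks)

# Plain str.format is an assumption here; the notebook itself uses ChatPromptTemplate.
prompt = PROMPT_TEMPLATE.format(context=context, question="What is chunk A?")
print(prompt)
```

The question appears after the context, so the instruction to stay within the context is restated right before the model answers.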

Query texts used to find relevant chunks for the LLM

In [4]:
# Query texts for the LLM
query_text1 = 'What is Dynamic Time Warping?'
query_text2 = 'Is deep learning popular in predictive maintenance?'
query_text3 = 'What does Tor traffic mean and/or darknet?'
query_text4 = 'Is it difficult to work with coolant temperature data?'


In [5]:
model = OllamaLLM(model='llama3.1')

Testing the LLM with prompts

In [6]:
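# Retrieve the 4 chunks most relevant to the query, each paired with a relevance score
results = database.similarity_search_with_relevance_scores(query_text1, k=4)
# Join the chunk texts into one context string; the scores are not used here
contexts = '\n\n'.join([result.page_content for result, _ in results])
# Fill the template with the context and the question, then query the model
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text1)
response = model.invoke(prompt)
print(response)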

Dynamic Time Warping (DTW) is a technique that aligns two sequences in a non-linear way to match each other's trends [2, 24]. It is used for comparing time series or sequences without considering their indices or length.
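
The retrieve, format, and invoke steps are the same for every query, so they could be wrapped in one helper. The sketch below is an illustration, not part of the original notebook: `answer_query`, `FakeDatabase`, and `FakeModel` are hypothetical names, and the fakes only mimic the two methods the notebook calls so that the example runs without Ollama or Chroma.

```python
from types import SimpleNamespace


def answer_query(database, model, template, query, k=4):
    """Retrieve the top-k chunks for `query`, build the prompt, and ask the model.

    Returns the model's response and the ids of the chunks used as context.
    `template` only needs a .format(context=..., question=...) method.
    """
    results = database.similarity_search_with_relevance_scores(query, k=k)
    sources = [doc.metadata.get('id') for doc, _score in results]
    context = '\n\n'.join(doc.page_content for doc, _score in results)
    prompt = template.format(context=context, question=query)
    return model.invoke(prompt), sources


# Hypothetical stand-ins for the Chroma database and the Ollama model.
class FakeDatabase:
    def similarity_search_with_relevance_scores(self, query, k=4):
        doc = SimpleNamespace(page_content='DTW aligns two sequences.',
                              metadata={'id': 'demo.pdf-0:0'})
        return [(doc, 0.9)][:k]


class FakeModel:
    def invoke(self, prompt):
        # Echo the prompt back so the example shows what the model would receive.
        return prompt


response, sources = answer_query(
    FakeDatabase(), FakeModel(),
    'Context: {context}\nQuestion: {question}', 'What is DTW?')
print(response)
print(sources)
```

In the notebook, the same call would presumably take `database`, `model`, and the `ChatPromptTemplate` built from `PROMPT_TEMPLATE` in place of the stand-ins.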


In [7]:
results = database.similarity_search_with_relevance_scores(query_text2, k=4)
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text2)
response = model.invoke(prompt)
print(response)

Yes, according to the text, the use of deep learning will "gain popularity" as newer methods are developed (such as attention mechanisms) and will enable identifying patterns over large volumes of data, increasing accuracy of predictions. However, it's also mentioned that there is a trade-off between accuracy and computational effort required by deep learning methods.


In [8]:
results = database.similarity_search_with_relevance_scores(query_text3, k=4)
sources = [result.metadata.get('id') for result, _ in results]
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text3)
response = model.invoke(prompt)
print(response)
print(sources)

According to the provided context, Tor traffic refers to network traffic that originates from or passes through the Tor network, which is used for anonymous communication. The Darknet specifically refers to the part of the internet address space that does not interact with other computers in the world, often associated with illegal activities due to its anonymity features.
['TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-1:7', 'TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-1:6', 'TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-1:5', 'TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-3:1']


In [9]:
results = database.similarity_search_with_relevance_scores(query_text4, k=4)
sources = [result.metadata.get('id') for result, _ in results]
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text4)
response = model.invoke(prompt)
print(response)
print(sources)

Yes, according to the text, working with coolant temperature data was found to be particularly challenging due to its noisy and inconsistent nature. The synthetic data generated from this dataset failed to capture almost all of the temporal relations, resulting in flat and constant readings that did not accurately reflect the real data. In addition, a larger sampling value (200) was needed compared to other sensors to make the data more interpretable.
['Vira_Gautam_2024_thesis.pdf-59:4', 'Predictive_Maintenance_by_Detection_of_Gradual_Faults_in_an_IoT-Enabled_Public_Bus.pdf-5:4', 'ACM_Journal_TOSN.pdf-9:3', 'ACM_Journal_TOSN.pdf-19:5']
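
The printed ids appear to follow a `<file>-<page>:<chunk index>` pattern. That reading is an assumption based only on the output above, and `parse_chunk_id` is a hypothetical helper. Under that assumption, the sources can be grouped by file as follows:

```python
from collections import defaultdict


def parse_chunk_id(chunk_id):
    """Split an id of the assumed form '<file>-<page>:<chunk index>'."""
    # Split from the right so hyphens inside the file name are preserved.
    source, _, location = chunk_id.rpartition('-')
    page, _, chunk = location.partition(':')
    return source, int(page), int(chunk)


# The ids printed by the cell above.
sources = [
    'Vira_Gautam_2024_thesis.pdf-59:4',
    'Predictive_Maintenance_by_Detection_of_Gradual_Faults_in_an_IoT-Enabled_Public_Bus.pdf-5:4',
    'ACM_Journal_TOSN.pdf-9:3',
    'ACM_Journal_TOSN.pdf-19:5',
]

# Collect the distinct pages cited per file.
pages_by_file = defaultdict(set)
for chunk_id in sources:
    source, page, _chunk = parse_chunk_id(chunk_id)
    pages_by_file[source].add(page)

for source, pages in pages_by_file.items():
    print(source, sorted(pages))
```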


In [10]:
results = database.similarity_search_with_relevance_scores('Are elephants capable of driving?', k=4)
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question='Are elephants capable of driving?')
response = model.invoke(prompt)
print(response)

The provided context does not mention elephants at all. It appears to be a technical paper discussing bus engine torque systems and faulty buses. Therefore, it is not possible to answer whether elephants are capable of driving based on this context.
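
For an off-topic question like this one, the retriever still returns four chunks, and the relevance scores that come with them are discarded. One possible guard, sketched below, is to drop low-scoring chunks and skip the LLM call when none remain. The `filter_by_relevance` helper, the 0.5 cutoff, and the example scores are all hypothetical; a suitable cutoff would have to be tuned against the real database.

```python
def filter_by_relevance(results, min_score=0.5):
    """Keep only (document, score) pairs whose score is at least `min_score`."""
    return [(doc, score) for doc, score in results if score >= min_score]


# Hypothetical (document, score) pairs; the scores are made up for illustration.
results = [
    ('chunk about bus engine torque', 0.31),
    ('chunk about faulty buses', 0.28),
]

relevant = filter_by_relevance(results)
if not relevant:
    # Nothing cleared the cutoff, so there is no useful context to send.
    print('No sufficiently relevant context found; skipping the LLM call.')
```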


In [11]:
# Drop the reference to the Chroma database so it can be released
database = None