In [32]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_chroma import Chroma

import os
import shutil

Preparing the database and embeddings for queries

In [33]:
embeddings = OllamaEmbeddings(model='nomic-embed-text')
database = Chroma(persist_directory='./database', embedding_function=embeddings)

Template for prompt to LLM

In [34]:
PROMPT_TEMPLATE = """
Answer the question based only on the following context: {context}

---

Answer the question based only on the above context: {question}
"""
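
To make the two placeholders concrete, here is a minimal, dependency-free sketch of how the template gets filled. It uses plain `str.format` instead of `ChatPromptTemplate`, so it only illustrates the substitution. The names `TEMPLATE_TEXT` and `fill_template` and the sample context are hypothetical and are not part of the notebook.

```python
# Same wording as PROMPT_TEMPLATE above, repeated so this cell runs on its own.
TEMPLATE_TEXT = """
Answer the question based only on the following context: {context}

---

Answer the question based only on the above context: {question}
"""


def fill_template(context: str, question: str) -> str:
    """Substitute the two placeholders using plain str.format (illustration only)."""
    return TEMPLATE_TEXT.format(context=context, question=question)


# Made-up context: two chunks separated by a blank line, as in the cells below.
example_prompt = fill_template(
    context="Chunk A text\n\nChunk B text",
    question="What is Dynamic Time Warping?",
)
print(example_prompt)
```

The question is deliberately placed after the context, so the instruction to stay within the context is the last thing the model reads.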

Queries used to find the relevant chunks to feed to the LLM

In [44]:
# Query texts for the LLM
query_text1 = 'What is Dynamic Time Warping?'
query_text2 = 'Is deep learning popular in predictive maintenance?'
query_text3 = 'What does Tor traffic mean and/or darknet?'
query_text4 = 'Is it difficult to work with coolant temperature data?'


In [45]:
model = OllamaLLM(model='llama3.1')

Testing LLM with prompts

In [46]:
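# Retrieve the 4 chunks most similar to the query, each paired with a relevance score
results = database.similarity_search_with_relevance_scores(query_text1, k=4)
# Join the chunk texts into a single context string
contexts = '\n\n'.join([result.page_content for result, _ in results])
# Fill the template with the context and the question, then query the model
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text1)
response = model.invoke(prompt)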
print(response)

Dynamic Time Warping (DTW) is a technique that aligns two sequences in a non-linear way to match each other's trends. The idea behind using DTW is that it aligns similar trends between two time series that are being compared, independent of the indices or length of the time series.
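
The retrieve, format, and invoke steps are repeated for every query in this notebook, so they could be wrapped in one helper. The sketch below is one possible refactor, not the notebook's own code: `answer_query` and `PROMPT` are hypothetical names, and the prompt is filled with plain `str.format` rather than `ChatPromptTemplate` so that the block has no external dependencies. It only assumes that `database` exposes `similarity_search_with_relevance_scores` and that `model` exposes `invoke`, as the objects created above do.

```python
# Plain-string stand-in for PROMPT_TEMPLATE (assumption: same wording, filled with str.format).
PROMPT = (
    "Answer the question based only on the following context: {context}\n\n"
    "---\n\n"
    "Answer the question based only on the above context: {question}\n"
)


def answer_query(database, model, query, k=4):
    """Retrieve k chunks, build the prompt, and return (response, source ids).

    `database` needs similarity_search_with_relevance_scores(query, k=...)
    and `model` needs invoke(prompt), like the objects in this notebook.
    """
    results = database.similarity_search_with_relevance_scores(query, k=k)
    # Each result is a (document, score) pair; only the document is used here.
    context = "\n\n".join(doc.page_content for doc, _score in results)
    sources = [doc.metadata.get("id") for doc, _score in results]
    prompt = PROMPT.format(context=context, question=query)
    return model.invoke(prompt), sources
```

With the notebook's objects, a call would look like `response, sources = answer_query(database, model, query_text1)`.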


In [47]:
results = database.similarity_search_with_relevance_scores(query_text2, k=4)
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text2)
response = model.invoke(prompt)
print(response)

According to the provided context, yes, deep learning is gaining popularity in predictive maintenance. The analysis of works mentioned in this survey suggests that the use of deep learning will become more prevalent as newer methods are developed, such as attention mechanisms, which will enable identifying patterns over large volumes of data and increase accuracy.


In [48]:
results = database.similarity_search_with_relevance_scores(query_text3, k=4)
sources = [result.metadata.get('id') for result, _ in results]
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text3)
response = model.invoke(prompt)
print(response)
print(sources)

According to the context, "Tor traffic" refers to internet traffic that passes through the Onion Router (TOR) network, which is a system designed for anonymity. The TOR network allows users to browse the internet without being tracked or identified.

The "darknet", on the other hand, is defined as the part of the internet address space that does not interact with other computers in the world, and is often associated with illegal activities due to its anonymous nature. It is said to harbor malicious software and make it difficult to detect such traffic.

In this context, Tor traffic is a subset of darknet traffic, which can be malicious or legitimate.
['TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-1:7', 'TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-1:6', 'TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-1:5', 'TAVo_Tor_Application_Detection_with_Voting_Critic.pdf-3:1']
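
The printed ids appear to follow a `<file>-<page>:<chunk>` pattern. That reading is an assumption inferred from the output above, not something the notebook states. Under that assumption, a small parser makes it easy to see which documents contributed to an answer. `ChunkRef` and `parse_chunk_id` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple


class ChunkRef(NamedTuple):
    source: str  # file name, which may itself contain hyphens
    page: int    # assumed to be the page number
    chunk: int   # assumed to be the chunk index within that page


def parse_chunk_id(chunk_id: str) -> ChunkRef:
    """Split '<file>-<page>:<chunk>' on the LAST hyphen, since file names can contain '-'."""
    source, _, location = chunk_id.rpartition("-")
    page, _, chunk = location.partition(":")
    return ChunkRef(source, int(page), int(chunk))


# Ids copied from the coolant temperature query output in this notebook.
ids = [
    "Vira_Gautam_2024_thesis.pdf-59:4",
    "Predictive_Maintenance_by_Detection_of_Gradual_Faults_in_an_IoT-Enabled_Public_Bus.pdf-5:4",
    "ACM_Journal_TOSN.pdf-9:3",
    "ACM_Journal_TOSN.pdf-19:5",
]
refs = [parse_chunk_id(i) for i in ids]
per_file = Counter(ref.source for ref in refs)
print(per_file["ACM_Journal_TOSN.pdf"])  # 2
```

Splitting on the last hyphen matters here because one of the file names contains `IoT-Enabled`.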


In [49]:
results = database.similarity_search_with_relevance_scores(query_text4, k=4)
sources = [result.metadata.get('id') for result, _ in results]
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text4)
response = model.invoke(prompt)
print(response)
print(sources)

Yes, it appears that working with coolant temperature data was challenging due to its noisy and inconsistent nature. This is evident in several parts of the text, such as:

* The synthetic data failed to capture almost all of the temporal relations, resulting in flat and constantly around 90 degrees Celsius readings.
* The real data contained temperature dips.
* A larger sampling value (200) was needed to make the noisy coolant temperature data more interpretable.

This suggests that the coolant temperature data required special handling and processing to be useful for analysis.
['Vira_Gautam_2024_thesis.pdf-59:4', 'Predictive_Maintenance_by_Detection_of_Gradual_Faults_in_an_IoT-Enabled_Public_Bus.pdf-5:4', 'ACM_Journal_TOSN.pdf-9:3', 'ACM_Journal_TOSN.pdf-19:5']


In [50]:
# Out-of-scope question, used to check that the model does not answer beyond the context
query_text5 = 'Are elephants capable of driving?'
results = database.similarity_search_with_relevance_scores(query_text5, k=4)
contexts = '\n\n'.join([result.page_content for result, _ in results])
template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = template.format(context=contexts, question=query_text5)
response = model.invoke(prompt)
print(response)

There is no mention of elephants or their ability to drive in the provided context. The text appears to be discussing buses and a study on machine learning models for predicting vehicle condition, with no reference to elephants whatsoever.
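
In the run above the model declined on its own, but the retrieval step still returned four chunks for an unrelated question. The relevance scores that the cells discard (the `_` in each loop) could be used to drop weak matches before the prompt is built. The sketch below shows one way to do that. The `0.5` cutoff and the name `context_above_threshold` are assumptions for illustration, and a useful threshold would have to be tuned on real scores from this database.

```python
def context_above_threshold(results, min_score=0.5):
    """Join the text of chunks scoring at least min_score; return None if none qualify.

    `results` is a list of (document, score) pairs, as returned by
    similarity_search_with_relevance_scores. min_score=0.5 is an assumed cutoff.
    """
    kept = [doc.page_content for doc, score in results if score >= min_score]
    return "\n\n".join(kept) if kept else None
```

When the function returns `None`, the caller can skip the LLM call entirely and report that no relevant context was found.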
