# RAG skeleton 
In the following we set up the skeleton of the RAG system. It is a very basic implementation that we will expand on in later milestones.

In [1]:
import os
from pathlib import Path
from llama_index.llms.ollama import Ollama
from llama_index.core import VectorStoreIndex
from llama_index.core.embeddings import resolve_embed_model
from llama_index.readers.json import JSONReader
from llama_index.core.node_parser import JSONNodeParser
from llama_index.readers.file import FlatReader

### Loading and indexing
Load the JSON files as documents so that their embeddings can be computed.

In [None]:
# path to the folder containing all the JSON files
DATA_PATH = "./data/"

# set up the JSON reader and the LLM (Mistral, served locally through Ollama)
reader = JSONReader()

# parser = JSONNodeParser()     # if we want to split the documents into nodes
llm = Ollama(model="mistral", request_timeout=60.0)

In [None]:
# creating the documents out of the json files
documents = []
for filename in os.listdir(DATA_PATH):
    if filename.endswith(".json"):
        file_path = os.path.join(DATA_PATH, filename)
        #documents.extend(FlatReader().load_data(Path(file_path)))     # if we want to load the data to then split it into nodes
        documents.extend(reader.load_data(input_file=file_path))

# nodes = parser.get_nodes_from_documents(documents)            # if we want to split documents into nodes
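For reference, here is a stdlib-only sketch of the same loading step, without llama_index. It assumes each file holds a single JSON value and uses `sorted(Path(...).glob("*.json"))` so that files are loaded in a reproducible order (the order returned by `os.listdir` is arbitrary). The `load_json_documents` helper and the dict layout it returns are hypothetical; they are not the `JSONReader` API.

```python
import json
from pathlib import Path


def load_json_documents(data_path: str) -> list[dict]:
    """Hypothetical helper: parse every *.json file in data_path.

    Returns one dict per file holding the file name and the parsed JSON.
    """
    documents = []
    # sorted() makes the load order the same on every run and platform
    for file_path in sorted(Path(data_path).glob("*.json")):
        with file_path.open(encoding="utf-8") as f:
            data = json.load(f)
        documents.append({"file_name": file_path.name, "data": data})
    return documents
```

Keeping the file name next to the parsed content is one way to know later which file a retrieved chunk came from; in the retriever output further down, the documents built by `JSONReader` carry empty metadata.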


### Document splitting
In later milestones, we will split large Documents into nodes and index those nodes instead of whole documents.
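As an illustration of what such splitting could look like, here is a minimal sketch that flattens a nested JSON document into `path: value` lines and groups them into fixed-size text chunks. It is a stand-in for `JSONNodeParser`, not its actual algorithm; `flatten_json`, `split_into_chunks`, and `max_lines` are hypothetical names.

```python
def flatten_json(value, prefix=""):
    """Yield one 'path: value' line for every leaf of a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten_json(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_json(child, f"{prefix}[{index}]")
    else:
        yield f"{prefix}: {value}"


def split_into_chunks(data, max_lines=50):
    """Group the flattened lines into text chunks of at most max_lines lines."""
    lines = list(flatten_json(data))
    return [
        "\n".join(lines[start:start + max_lines])
        for start in range(0, len(lines), max_lines)
    ]
```

Keeping the full key path on every line means a chunk still records which key a number belongs to after it has been separated from the rest of the file.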

In [None]:
# parser = JSONNodeParser()
# nodes = parser.get_nodes_from_documents(documents)

### Querying strategy

In [None]:
embed_model = resolve_embed_model("local:BAAI/bge-m3")

In [9]:
# if we work with nodes (nodes go to the constructor; from_documents expects Documents)
# vector_index = VectorStoreIndex(nodes, embed_model=embed_model)

In [12]:
# if we work with documents
vector_index = VectorStoreIndex.from_documents(documents, embed_model=embed_model, show_progress=True)

Parsing nodes: 100%|████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 168.36it/s]
Generating embeddings: 100%|█████████████████████████████████████████████████████████████| 7/7 [00:21<00:00,  3.01s/it]


We use a top-k similarity strategy: the k nodes whose embeddings are most similar to the query embedding are retrieved (k = 2 for the query engine below).
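Conceptually, top-k retrieval scores the query embedding against every stored node embedding and keeps the k best. A brute-force sketch in plain Python, assuming cosine similarity as the scoring function and using small toy vectors in place of the `BAAI/bge-m3` embeddings; `cosine_similarity` and `top_k` are illustrative names, not llama_index functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, node_embeddings, k=2):
    """Return (node_id, score) pairs for the k nodes most similar to the query."""
    scored = [
        (node_id, cosine_similarity(query_embedding, embedding))
        for node_id, embedding in node_embeddings.items()
    ]
    # Highest similarity first, then keep only the k best
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan is fine for the handful of nodes in this skeleton; larger collections usually rely on an approximate nearest-neighbour index instead.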

In [13]:
query_engine = vector_index.as_query_engine(llm=llm, verbose=True, similarity_top_k=2)
retriever = vector_index.as_retriever(verbose=True)
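The query engine combines retrieval with generation: it fetches the top-k nodes and passes their text to the LLM together with the question. The sketch below shows only the prompt-assembly step with a hypothetical template (`build_prompt` is an illustrative name, and llama_index's default prompt wording differs); the call to Mistral through Ollama is left out.

```python
def build_prompt(question, retrieved_chunks):
    """Hypothetical template: numbered context chunks first, then the question."""
    context = "\n\n".join(
        f"[{rank}] {chunk}" for rank, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because the LLM only sees these k chunks, a question whose answer is spread over more than k nodes (for example, a listing for every machine) may only be answered partially.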


### Evaluation
We test the RAG system with a few queries about the data in the JSON files.
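One way to make retrieval testing repeatable is to pair each question with the id of the node that should answer it, then measure hit rate and mean reciprocal rank (MRR) over the retrieved ids. A minimal sketch, assuming such hand-labelled pairs exist (they are not part of this notebook); the ranked ids could come from something like `[n.node_id for n in retriever.retrieve(q)]`. The function name is hypothetical.

```python
def hit_rate_and_mrr(ranked_ids_per_query, expected_ids):
    """Compute hit rate and MRR for a batch of retrieval results.

    ranked_ids_per_query: one list of retrieved node ids per query, best first
    expected_ids: the single relevant node id for each query
    """
    if len(ranked_ids_per_query) != len(expected_ids):
        raise ValueError("need exactly one expected id per query")
    if not expected_ids:
        raise ValueError("no queries to evaluate")
    hits = 0
    reciprocal_ranks = []
    for ranked_ids, expected in zip(ranked_ids_per_query, expected_ids):
        if expected in ranked_ids:
            hits += 1
            # 1-based rank: first position scores 1.0, second 0.5, and so on
            reciprocal_ranks.append(1.0 / (ranked_ids.index(expected) + 1))
        else:
            reciprocal_ranks.append(0.0)
    n = len(expected_ids)
    return hits / n, sum(reciprocal_ranks) / n
```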

In [14]:
result = query_engine.query("What was the average  of Assembly Machines?")
print(result)

 The average consumption for Testing Machine 1 is approximately 1.7630247e-06, while the average consumption for Riveting Machine is around 6.5934167e-05. It's important to note that these values are not directly comparable due to their different scales and units.


In [23]:
result = query_engine.query("What was the average consumption of machines?")
print(result)

 The average consumption for "Testing Machine 1" is 1.7630247097690623e-06 and for "Medium Capacity Cutting Machine 2", it is 0.0023731402496540857.


In [14]:
result = query_engine.query("List the conspumption for each machine in March 2024?")
print(result)

 The consumption for each machine in March 2024 is as follows:

1. Assembly Machine 1: 0.0
2. Assembly Machine 2: 0.0
3. Assembly Machine 3: 0.0
4. Large Capacity Cutting Machine 1: 0.0021111835419563543
5. Large Capacity Cutting Machine 2: 0.00152712326133702


In [17]:
result = query_engine.query("General Summarized Overview Large Capacity Cutting Machine 2?")
print(result)

 In March 2024, the Large Capacity Cutting Machine 2 has an average number of cycles per month of approximately 742, with a range from 0 to 18,860. The average cycle time is 0 seconds, with a minimum and maximum not provided. The machine had no bad cycles during this period.

The consumption for this machine averages at 0.0009503 kWh, with a range from 0 to 0.15497427 kWh. This includes an average idle consumption of 0.0006514 kWh and working consumption of 0.0015271 kWh.

The cost for this machine is approximately 0.0007899 USD, with no recorded costs related to idle or working time.

The number of good cycles in March was about 651, with a range from 0 to 18,860. The machine did not have any offline time or idle time during this period, and the working time is not provided. The power consumption averages at 0.0025818 kW, with a range from 0 to 0.0723988 kW.

The Medium Capacity Cutting Machine 2 has similar KPIs for the same month but different values due to differences in the machin

In [20]:
result = query_engine.query("Which machine has higher idle time")
print(result)

 The machine with higher idle time is "Assembly Machine 1". This can be inferred from the context data where the average idle time for "Assembly Machine 1" is 4175.258064516129 hours, while for "Large Capacity Cutting Machine 1", it is 0.0 hours.


In [13]:
retriever.retrieve("General Summarized Overview Assembly Machine 1?")


[NodeWithScore(node=TextNode(id_='7a3f1ecb-1cb2-4aa0-9cf9-e1ace4e882a2', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='6ee8dc74-411b-4b67-bba4-83d3ec586f9a', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='34496ed108b9f8ca29ad1e0bf7cb60af6c22315f29d8d73ac50bdc3339d33446')}, text='"data_structure_overview": {\n"machines": [\n"category": "Metal Cutting Machines",\n"machines": [\n"Large Capacity Cutting Machine 1",\n"Large Capacity Cutting Machine 2",\n"Medium Capacity Cutting Machine 1",\n"Medium Capacity Cutting Machine 2",\n"Medium Capacity Cutting Machine 3",\n"Low Capacity Cutting Machine 1"\n"category": "Laser Welding Machines",\n"machines": [\n"Laser Welding Machine 1",\n"Laser Welding Machine 2"\n"category": "Assembly Machines",\n"machines": [\n"Assembly Machine 1",\n"Assembly Machine 2",\n"Assembly Machine 3"\n"category": "Testing Machines",\n"machines":

In [18]:
result = query_engine.query("Which one was more effective and productive: Medium Capacity machine 1 vs Medium Capacity machine 2?")
print(result)

 Based on the provided KPIs for both machines, we can calculate certain metrics to compare their effectiveness and productivity. However, please note that this comparison is based on the numerical data given in the context, and other factors such as the nature of work, maintenance requirements, etc., might influence their overall performance.

To make a comparison, we can focus on the following key points:

1. Number of good cycles: Medium Capacity Cutting Machine 1 had an average of 1935.8064516129032 good cycles per month, while Medium Capacity Cutting Machine 2 had an average of 1822.032258064516 good cycles per month. Machine 1 seems to have a slightly higher number of good cycles per month.

2. Consumption: The average consumption for Machine 1 is 0.0025908645679010044, while the average consumption for Machine 2 is 0.0023731402496540857. Machine 1 consumes slightly more resources on average compared to Machine 2.

3. Power: The average power consumed by Machine 1 is 0.00600611930