# My Update
In this notebook I extend the functionality of the RAG system that my group and I implemented.

In [1]:
import json
import pandas as pd

from rag_service.rag_chat import rag_interaction, ChatMemory

# CHAT
Chat use case: general queries on the Knowledge Base, plus KPI generation.

## Knowledge Base
Instead of fetching the Knowledge Base (KB) through the API, as we did for the project, I downloaded the data locally so that I do not have to log in every time. The KB consists of:
- a JSON file that contains the description of the available KPIs
- a JSON file that contains the list of machines with the corresponding KPIs
- a txt file with general information about the factory site
- 3 JSON files with the analysis carried out by Topic 3 (cost prediction, utilization, and energy efficiency).

In [2]:
file_path = "data/documentation.txt"

with open(file_path, "r") as file:
    documentation = file.read()

file_path = "data/kpis.json"

with open(file_path, "r") as file:
    kpi_data = json.load(file)

file_path = "data/machines.json"

with open(file_path, "r") as file:
    machine_data = json.load(file)

In [3]:
file_path = "data/costs.json"

with open(file_path, "r") as file:
    costs = json.load(file)

file_path = "data/energies.json"

with open(file_path, "r") as file:
    energies = json.load(file)

file_path = "data/utilizations.json"

with open(file_path, "r") as file:
    utilizations = json.load(file)

In [4]:
data = [kpi_data, machine_data, costs, energies, utilizations]
data.append({"type": "txt", "content": documentation})  # wrapping the text in a dict lets the KB include non-JSON file types
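
The loading cells above repeat the same pattern for every file. A minimal sketch of a helper that builds the same list is given below; `load_knowledge_base` is a hypothetical name, not part of `rag_service`, and it assumes that the JSON files can be passed on as parsed while text files need the `{"type": "txt", "content": ...}` wrapper used above.

```python
import json
from pathlib import Path


def load_knowledge_base(json_paths, text_paths=()):
    """Load KB files into the list format passed to rag_interaction (hypothetical helper)."""
    data = []
    for path in json_paths:
        # JSON files are appended as parsed, like kpi_data and machine_data above
        with open(path, "r", encoding="utf-8") as file:
            data.append(json.load(file))
    for path in text_paths:
        # Plain-text files are wrapped in a dict so the KB can tell them apart
        text = Path(path).read_text(encoding="utf-8")
        data.append({"type": "txt", "content": text})
    return data


# Usage with the files of this notebook (left commented so the sketch runs anywhere):
# data = load_knowledge_base(
#     ["data/kpis.json", "data/machines.json", "data/costs.json",
#      "data/energies.json", "data/utilizations.json"],
#     ["data/documentation.txt"],
# )
```

The order of the entries follows the order of the paths, so the JSON entries come first and the text entries last, as in the cell above.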

## Queries

In [7]:
query = "How many machines are there?"
response = rag_interaction(data, query)

Using chatbot mode...
Creating new vector index...


Parsing nodes:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/6 [00:00<?, ?it/s]



In [8]:
print(response)

There are 16 machines in total.


---

In [9]:
query = "How many KPIs are there?"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [10]:
print(response)

There are 14 key performance indicators (KPIs) in total.


---

In [11]:
query = "Are there KPIs related to consumption?"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [12]:
print(response)

Yes, there are KPIs related to consumption. The consumption-related KPIs include "consumption", "consumption_working", and "consumption_idle".


---

In [13]:
query = "Show me the list of the KPIs available for the Assembly Machine 1"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [14]:
print(response)

The available KPIs for the Assembly Machine 1 are: working_time, idle_time, offline_time, average_cycle_time, consumption, consumption_working, cost, cost_idle, good_cycles and bad_cycles.


---

In [15]:
query = "Generate a new KPI"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [16]:
print(response)

{
  "KPIs": [
    {
      "name": "mean_average_cycle_time",
      "type": "Performance Metric",
      "description": "The average time taken by machines to complete their cycles",
      "unit_of_measure": "Minutes",
      "formula": "(working_time + idle_time) / 2"
    },
    {
      "name": "energy_efficiency_ratio",
      "type": "Efficiency Metric",
      "description": "The ratio of energy efficiency to consumption of machines",
      "unit_of_measure": "",
      "formula": "(1 - (consumption / energy_production)) * 100"
    }
  ]
}
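
Generated KPIs are only usable if their formulas refer to KPIs that exist in the KB. The sketch below checks this for a response like the one above. It assumes that the response can be converted to its JSON text with `str(response)`; the helper names are hypothetical, and the set of known KPI names is copied from the KB content printed in the MEMORY section of this notebook.

```python
import json
import re

# The 14 KPI names found in the KB (copied by hand, an assumption of this sketch)
KNOWN_KPIS = {
    "working_time", "idle_time", "offline_time", "average_cycle_time",
    "consumption", "consumption_working", "consumption_idle",
    "power", "cycles",
    "cost", "cost_working", "cost_idle",
    "good_cycles", "bad_cycles",
}


def unknown_terms(formula, known=KNOWN_KPIS):
    """Return the identifiers used in a formula that are not known KPI names."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", formula))
    return identifiers - set(known)


def check_generated_kpis(response_text):
    """Map each generated KPI name to the unknown terms in its formula."""
    payload = json.loads(response_text)
    return {kpi["name"]: unknown_terms(kpi["formula"]) for kpi in payload["KPIs"]}


# Trimmed copy of the response above (only the fields the check needs)
response_text = """
{"KPIs": [
  {"name": "mean_average_cycle_time", "formula": "(working_time + idle_time) / 2"},
  {"name": "energy_efficiency_ratio", "formula": "(1 - (consumption / energy_production)) * 100"}
]}
"""
report = check_generated_kpis(response_text)
print(report)
```

For this response the check flags `energy_production` in the second formula, since no KPI with that name exists in the KB. The check is purely syntactic: it does not tell whether a formula is meaningful.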


# REPORT
I use a different Knowledge Base for the Report Generation use case. The reason is that this RAG is meant to be integrated with a proper interface, where the user chooses whether to `CHAT` or to make a `REPORT`. I assume that the user can choose a time period, a way to aggregate the data in the dataset (the pkl file), and the machines they are interested in. The Knowledge Base is then built from the data the user selected, so it changes with every request and it is better not to store the embeddings in this case.
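
One possible way to decide whether a stored index can be reused is to compare a fingerprint of the KB content. This is only a sketch of the idea, not how `rag_service` decides between "Creating new vector index..." and "Loading precomputed vector index..."; both function names are hypothetical, and the KB is assumed to be JSON-serializable.

```python
import hashlib
import json


def kb_fingerprint(data):
    """Stable SHA-256 digest of a JSON-serializable knowledge base."""
    # sort_keys makes the digest independent of dictionary key order
    canonical = json.dumps(data, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_reuse_index(data, stored_fingerprint):
    """Reuse a persisted index only if it was built from identical KB content."""
    return stored_fingerprint is not None and kb_fingerprint(data) == stored_fingerprint
```

With a report KB that differs on every request the fingerprint would almost never match, which is another way of seeing why persisting the embeddings brings little benefit here.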

In the following section, I will create a JSON file that could be the product of the choices of the user.

## Report Data
In this section I extract a small part of the original dataset provided by the professors. I will use this subset to test the report generation functionality of the RAG system.

I considered two machines, the Riveting Machine and the Laser Cutter, over the working week from 14/10/2024 to 18/10/2024. I aggregated the data by taking the weekly average of each KPI, obtaining 28 rows of data.
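
The selection described above (machines, period, aggregation) can be sketched as a single parametrized function, which is roughly what an interface would call with the user's choices. `build_report_data` and the metadata keys are hypothetical; the column names `time`, `name`, `kpi` and `avg` are those of the dataset used in this notebook.

```python
import pandas as pd


def build_report_data(df, machines, start, end, agg="mean"):
    """Filter by machine and date range, then aggregate `avg` per machine and KPI."""
    # Normalize timestamps to naive dates so string bounds compare cleanly
    days = pd.to_datetime(df["time"], utc=True).dt.tz_localize(None).dt.normalize()
    mask = (
        df["name"].isin(machines)
        & (days >= pd.Timestamp(start))
        & (days <= pd.Timestamp(end))
    )
    grouped = (
        df.loc[mask]
        .groupby(["name", "kpi"], as_index=False)["avg"]
        .agg(agg)
    )
    return {
        "metadata": {
            "period": f"{start} to {end}",
            "aggregation": agg,
            "machines": list(machines),
        },
        "data": grouped.to_dict(orient="records"),
    }


# Usage with the choices made in this section (requires the pkl file):
# df = pd.read_pickle("data/smart_app_data.pkl")
# result = build_report_data(
#     df, ["Riveting Machine", "Laser Cutter"], "2024-10-14", "2024-10-18"
# )
```

Passing `agg="sum"` or `agg="max"` would give other aggregations over the same period, without changing the rest of the pipeline.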

In [None]:
df = pd.read_pickle('data/smart_app_data.pkl')
df.head()

Unnamed: 0,time,asset_id,name,kpi,sum,avg,min,max
0,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,working_time,0.0,0.0,0.0,0.0
1,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,idle_time,0.0,0.0,0.0,0.0
2,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,offline_time,0.0,0.0,0.0,0.0
3,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,consumption,0.066106,0.002321,0.0,0.066106
4,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,power,,0.003673,0.0,0.012801


In [None]:
# Convert the "time" column to just the date (YYYY-MM-DD) format
df['time'] = pd.to_datetime(df['time']).dt.date
df.head()

Unnamed: 0,time,asset_id,name,kpi,sum,avg,min,max
0,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,working_time,0.0,0.0,0.0,0.0
1,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,idle_time,0.0,0.0,0.0,0.0
2,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,offline_time,0.0,0.0,0.0,0.0
3,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,consumption,0.066106,0.002321,0.0,0.066106
4,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,power,,0.003673,0.0,0.012801


In [None]:
df['time'] = pd.to_datetime(df['time'])

In [None]:
# Keep only the time, name, kpi, and avg columns
df = df[['time', 'name', 'kpi', 'avg']]
df.head()

Unnamed: 0,time,name,kpi,avg
0,2024-03-01,Large Capacity Cutting Machine 1,working_time,0.0
1,2024-03-01,Large Capacity Cutting Machine 1,idle_time,0.0
2,2024-03-01,Large Capacity Cutting Machine 1,offline_time,0.0
3,2024-03-01,Large Capacity Cutting Machine 1,consumption,0.002321
4,2024-03-01,Large Capacity Cutting Machine 1,power,0.003673


In [None]:
# Consider only the working week that goes from the 2024-10-14 to the 2024-10-18
df_week = df[(df['time'] >= '2024-10-14') & (df['time'] <= '2024-10-18')]
df_week.head()

Unnamed: 0,time,name,kpi,avg
92482,2024-10-14,Large Capacity Cutting Machine 1,working_time,25406.0
92483,2024-10-14,Large Capacity Cutting Machine 1,idle_time,21791.0
92484,2024-10-14,Large Capacity Cutting Machine 1,offline_time,888.0
92485,2024-10-14,Large Capacity Cutting Machine 1,consumption,0.004759
92486,2024-10-14,Large Capacity Cutting Machine 1,power,0.004834


In [None]:
# Keep only two machines: Riveting Machine and Laser Cutter
df_week = df_week[df_week['name'].isin(['Riveting Machine', 'Laser Cutter'])]
df_week.head()

Unnamed: 0,time,name,kpi,avg
92768,2024-10-14,Riveting Machine,working_time,30038.0
92769,2024-10-14,Riveting Machine,idle_time,17380.0
92770,2024-10-14,Riveting Machine,offline_time,2698.0
92771,2024-10-14,Riveting Machine,consumption,0.001
92772,2024-10-14,Riveting Machine,power,0.000131


In [None]:
# Aggregate the data by machine and kpi and consider the average over the week
# Select 'avg' before averaging so the 'time' column is never aggregated
df_week = df_week.groupby(['name', 'kpi'], as_index=False)['avg'].mean()
df_week

Unnamed: 0,name,kpi,avg
0,Laser Cutter,average_cycle_time,9.511204
1,Laser Cutter,bad_cycles,2.2
2,Laser Cutter,consumption,0.0
3,Laser Cutter,consumption_idle,0.0
4,Laser Cutter,consumption_working,0.0
5,Laser Cutter,cost,0.0
6,Laser Cutter,cost_idle,0.0
7,Laser Cutter,cost_working,0.0
8,Laser Cutter,cycles,1.0
9,Laser Cutter,good_cycles,553.6


In [None]:
# count how many rows of df_week have avg column equal to 0
len(df_week[df_week['avg'] == 0])

7

In [None]:
# Convert DataFrame to JSON
json_result = df_week.to_json(orient="records", indent=4)  

# Parse the JSON string back into a Python list
list_result = json.loads(json_result)

# Add metadata to the JSON result
result = {
    "metadata": {
        "working_week": "2024-10-14 to 2024-10-18",
        "description": "The average values are calculated over the days within the specified week."
    },
    "data": list_result
}

#### Save JSON to a file

In [None]:
# Disabled: uncomment to (re)write the file
# with open("data/report_data.json", "w") as f:
#     json.dump(result, f, indent=4)

## Report Generation

In [20]:
file_path = "data/report_data.json"

with open(file_path, "r") as file:
    report_data = json.load(file)

In [21]:
data = []
data.append({"type": "json", "content": report_data})

In [22]:
response = rag_interaction(data, generate_report=True)

Generating report with dynamic KB...


Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

In [23]:
print(response)

**Industrial Machine Performance Report**

**Time Period Covered:** October 14, 2024 - October 18, 2024

This report provides an in-depth analysis of the performance of two industrial machines: Laser Cutter and Riveting Machine. The analysis is based on key performance indicators (KPIs) such as average cycle time, consumption, cost, idle time, offline time, power usage, working time, good cycles, bad cycles, and cycles.

**Machine Performance Comparison**

| KPI | Laser Cutter | Riveting Machine |
| --- | --- | --- |
| Average Cycle Time | 9.51 seconds (avg) | 3.31 seconds (avg) |
| Consumption | 0.00 units (low) | 0.001 units (low) |
| Cost | 0.00 units (zero) | 0.001 units (low) |
| Idle Time | 19,150.2 hours (high) | 26,285.8 hours (high) |
| Offline Time | 327.2 hours (moderate) | 703.0 hours (moderate) |
| Power Usage | 0.00012 units (very low) | 0.00012 units (very low) |
| Working Time | 7,144.9 hours (high) | 4,715.8 hours (moderate) |

**Key Insights and Recommendations**

* T

---

### Let's visualize the response!

---

**Industrial Machine Performance Report**

**Time Period Covered:** October 14, 2024 - October 18, 2024

This report provides an in-depth analysis of the performance of two industrial machines: Laser Cutter and Riveting Machine. The analysis is based on key performance indicators (KPIs) such as average cycle time, consumption, cost, idle time, offline time, power usage, working time, good cycles, bad cycles, and cycles.

**Machine Performance Comparison**

| KPI | Laser Cutter | Riveting Machine |
| --- | --- | --- |
| Average Cycle Time | 9.51 seconds (avg) | 3.31 seconds (avg) |
| Consumption | 0.00 units (low) | 0.001 units (low) |
| Cost | 0.00 units (zero) | 0.001 units (low) |
| Idle Time | 19,150.2 hours (high) | 26,285.8 hours (high) |
| Offline Time | 327.2 hours (moderate) | 703.0 hours (moderate) |
| Power Usage | 0.00012 units (very low) | 0.00012 units (very low) |
| Working Time | 7,144.9 hours (high) | 4,715.8 hours (moderate) |

**Key Insights and Recommendations**

* The Laser Cutter has a significantly shorter average cycle time compared to the Riveting Machine, indicating potential for process optimization.
* Both machines have very low consumption rates, suggesting efficient energy usage.
* However, the Riveting Machine has a lower idle time, which may indicate a more productive work schedule.
* The offline and working times suggest varying levels of downtime and productivity across both machines.
* Power usage is extremely low in both cases, indicating efficient operation.

**Actionable Recommendations**

1. **Optimize Process**: Analyze and optimize the process to reduce average cycle time for the Laser Cutter.
2. **Monitor Energy Consumption**: Regularly monitor energy consumption to ensure it remains at optimal levels.
3. **Improve Productivity**: Investigate ways to increase productivity of the Riveting Machine, particularly during working times.

This report provides a comprehensive analysis of machine performance and offers actionable recommendations to improve efficiency, reduce waste, and optimize resource allocation.

# MEMORY
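Test the memory functionality: a `ChatMemory` object is passed to every call in a session, so that follow-up queries can refer to earlier turns.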
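
To make the idea concrete, here is a minimal sketch of how a conversation memory could work. It is not the implementation of `ChatMemory` in `rag_service`, which may differ; `SimpleChatMemory` and its methods are hypothetical, and the sketch assumes that the history is simply prepended to the new query.

```python
class SimpleChatMemory:
    """Hypothetical stand-in for ChatMemory: keeps the last few turns of a session."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []  # list of (query, response) pairs, oldest first

    def add(self, query, response):
        """Store a completed turn and drop the oldest ones beyond max_turns."""
        self.turns.append((query, response))
        self.turns = self.turns[-self.max_turns:]

    def contextualize(self, query):
        """Prefix the new query with the stored conversation history."""
        if not self.turns:
            return query
        history = "\n".join(f"User: {q}\nAssistant: {r}" for q, r in self.turns)
        return f"{history}\nUser: {query}"
```

With the history in the prompt, a follow-up such as "What are their names?" can be resolved against the previous question, which is the behaviour tested in the cells below.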

In [17]:
# Use the KB of the chat use case: run the cells in CHAT > Knowledge Base 
data

[{'type': 'json',
  'description': 'List of KPIs categorized by type and including their units of measure',
  'content': {'kpis': {'Time': [{'name': 'working_time',
      'unit_of_measure': 'Minutes'},
     {'name': 'idle_time', 'unit_of_measure': 'Minutes'},
     {'name': 'offline_time', 'unit_of_measure': 'Minutes'},
     {'name': 'average_cycle_time', 'unit_of_measure': 'Minutes'}],
    'Consumption': [{'name': 'consumption', 'unit_of_measure': 'kWh'},
     {'name': 'consumption_working', 'unit_of_measure': 'kWh'},
     {'name': 'consumption_idle', 'unit_of_measure': 'kWh'}],
    'Efficiency': [{'name': 'power', 'unit_of_measure': 'Percentage'},
     {'name': 'cycles', 'unit_of_measure': 'Units'}],
    'Cost': [{'name': 'cost', 'unit_of_measure': 'Currency'},
     {'name': 'cost_working', 'unit_of_measure': 'Currency'},
     {'name': 'cost_idle', 'unit_of_measure': 'Currency'}],
    'Quality': [{'name': 'good_cycles', 'unit_of_measure': 'units'},
     {'name': 'bad_cycles', 'unit_of

### Test

In [18]:
memory = ChatMemory() # Start a session

query = "Are there metal cutter machines?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
Yes, there are metal cutter machines in the knowledge base. In fact, it's mentioned that there are 6 types of metal cutters in total. Additionally, some specific models like "Large Capacity Cutting Machine 1", "Medium Capacity Cutting Machine 1", and "Low Capacity Cutting Machine 1" also fall under this category.


In [19]:
query = "What are their names?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
The metal cutter machines have the following names: Large Capacity Cutting Machine 1, Medium Capacity Cutting Machine 1, Low Capacity Cutting Machine 1, Large Capacity Cutting Machine 2, Medium Capacity Cutting Machine 2, and Medium Capacity Cutting Machine 3.


### Error Correction
Handle cases where users correct themselves mid-dialogue.

In [20]:
memory = ChatMemory() # Start a new session

query = "What are the KPIs of the plasma cutter machine?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
I'm unable to provide information on a plasma cutter machine as it does not appear to be part of the provided knowledge base. The knowledge base only includes information about metal cutting machines, riveters, laser cutters, assemblers, welding machines, and testing machines. If you have any questions about one of these specific types of machines, I'll do my best to help.


In [21]:
query = "Sorry, I meant the laser cutter machine."
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
The KPIs of a laser cutter machine include:
- working_time
- idle_time
- offline_time
- average_cycle_time
- power


---

In [23]:
memory = ChatMemory() # Start a new session

query = "How many plasma cutter machines are there?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
I'm unable to provide a specific number of plasma cutter machines as this type of machine is not mentioned in the provided knowledge base.


In [24]:
query = "Sorry, I meant laser cutter machines."
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
There is only 1 laser cutter machine mentioned in the provided knowledge base.


In [25]:
query = "And metal cutters?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
There are 6 metal cutters mentioned in the provided knowledge base.


In [26]:
query = "And assemblers?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
There are 3 assemblers mentioned in the provided knowledge base.


In [27]:
query = "And welding?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
There are 2 welding machines mentioned in the provided knowledge base.


In [28]:
query = "And testing?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
There are 3 testing machines mentioned in the provided knowledge base.


In [29]:
query = "And drills?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
There is no mention of drilling machines in the provided knowledge base.


### Personalized Recommendations

In [6]:
memory = ChatMemory() # Start a new session

query = "What are the KPIs related to efficiency?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
The KPIs related to efficiency are power and cycles.


In [7]:
query = "Can you generate a new KPI out of them?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
{
  "KPIs": [
    {
      "name": "efficiency_index",
      "type": "ratio",
      "description": "Ratio of power to cycles",
      "unit_of_measure": "",
      "formula": "(power / cycles) * 100"
    }
  ]
}


### Topic 3 Analysis

In [8]:
memory = ChatMemory() # Start a new session

query = "What is the first machine in terms of usage?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
Based on the utilization data provided, "Assembly Machine 1" has the highest utilization rate with a value of 0.7296168144672052, indicating it is the first machine in terms of usage.


In [9]:
query = "What is the category of this machine?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
The category of "Assembly Machine 1" isAssembler.


In [10]:
query = "How many machines of this category are there?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
According to the data provided, there are 3 machines classified as Assemblers.


---

In [5]:
memory = ChatMemory() # Start a new session

query = "What is the energy efficiency of the laser cutter?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
The energy efficiency of the Laser Cutter is 0.025994032221087916.


In [6]:
query = "Is it the best in terms of energy efficiency?"
response = rag_interaction(data, query, memory=memory)
print(response)

Using chatbot mode...
Loading precomputed vector index...
No, the Laser Cutter's energy efficiency of 0.025994032221087916 is lower than that of some other machines, such as Testing Machine 2 (0.003305897555474169) and Assembly Machine 3 (0.08959678406509398).
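
Answers that compare numbers are worth checking against the raw data: of the two machines named in the last answer, only Assembly Machine 3 has a value above the Laser Cutter's. The sketch below ranks the three values quoted above. The flat dictionary is an assumption made for illustration (the actual structure of `energies.json` may differ), and so is the convention that a higher value means better efficiency.

```python
# Values quoted in the answers above; the real layout of energies.json may differ
energy_efficiency = {
    "Laser Cutter": 0.025994032221087916,
    "Testing Machine 2": 0.003305897555474169,
    "Assembly Machine 3": 0.08959678406509398,
}


def rank_machines(values, higher_is_better=True):
    """Return machine names sorted from best to worst."""
    return sorted(values, key=values.get, reverse=higher_is_better)


ranking = rank_machines(energy_efficiency)
print(ranking)  # ['Assembly Machine 3', 'Laser Cutter', 'Testing Machine 2']
```

Among these three machines the Laser Cutter ranks second, above Testing Machine 2, so the conclusion of the answer holds but one of its two examples does not.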
