# My Update
In this notebook I extend the RAG system that I implemented together with my group.

In [1]:
import json
import pandas as pd

from rag_service.rag_chat import rag_interaction

# CHAT
Chat use case: ask general questions about the Knowledge Base and generate new KPIs.

## Knowledge Base
Instead of fetching the Knowledge Base data through the API, as we did for the project, I downloaded it so that it is available locally and I do not have to log in every time. The KB consists of two JSON files, one describing the available KPIs and one listing the machines with their corresponding KPIs, plus a plain-text documentation file.

In [2]:
file_path = "data/documentation.txt"

with open(file_path, "r") as file:
    documentation = file.read()

In [3]:
file_path = "data/kpis.json"

with open(file_path, "r") as file:
    kpi_data = json.load(file)

In [4]:
file_path = "data/machines.json"

with open(file_path, "r") as file:
    machine_data = json.load(file)

In [5]:
data = [kpi_data, machine_data]
data.append({"type": "txt", "content": documentation})  # the "type" field lets us add other file types to the KB
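
The three loading cells above repeat the same pattern. As a minimal sketch, the loading could be folded into one helper that picks the parser from the file extension. The name `load_kb_source` is hypothetical, and the sketch assumes that `rag_interaction` accepts JSON sources in the same `{"type": ..., "content": ...}` wrapper used for the text file, which this notebook only shows later for the report data.

```python
import json
from pathlib import Path


def load_kb_source(path):
    """Read one file and wrap it as {"type": ..., "content": ...}.

    Hypothetical helper: the wrapper keys mirror the ones used in this
    notebook, but the function itself is not part of rag_service.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        # JSON files are parsed so the content is a dict or a list
        return {"type": "json", "content": json.loads(text)}
    # Anything else is kept as raw text
    return {"type": "txt", "content": text}
```

With it, the Knowledge Base could be built as `data = [load_kb_source(p) for p in ("data/kpis.json", "data/machines.json", "data/documentation.txt")]`.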

## Queries

In [6]:
query = "How many machines are there?"
response = rag_interaction(data, query)

Using chatbot mode...
Creating new vector index...


Parsing nodes:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]



In [7]:
print(response)

There are 16 machines in total.


---

In [8]:
query = "How many KPIs are there?"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [9]:
print(response)

There are 14 key performance indicators (KPIs).


---

In [10]:
query = "Are there KPIs related to consumption?"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [11]:
print(response)

Yes, there are KPIs related to consumption.


---

In [12]:
query = "Can you speak italian?"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [13]:
print(response)

I'm not capable of speaking any languages, including Italian. I can provide information and assist with tasks based on my training data, but I don't have personal capabilities such as speech.


---

In [14]:
query = "Show me the list of the KPIs available for the Assembly Machine 1"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [15]:
print(response)

The KPIs available for the Assembly Machine 1 are:

- working_time
- idle_time
- offline_time
- average_cycle_time
- consumption
- consumption_working
- consumption_idle
- power
- cycles
- cost
- cost_working
- cost_idle
- good_cycles
- bad_cycles


---

In [16]:
query = "Generate a new KPI"
response = rag_interaction(data, query)

Using chatbot mode...
Loading precomputed vector index...


In [17]:
print(response)

{
  "KPIs": [
    {
      "name": "Cycle Efficiency",
      "type": "Ratio",
      "description": "The ratio of good cycles to total cycles",
      "unit_of_measure": "",
      "formula": "(good_cycles / (good_cycles + bad_cycles)) * 100"
    }
  ]
}
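
The generated formula can be checked by hand. A minimal sketch, where the function name and the sample counts are illustrative and not taken from the dataset:

```python
import math


def cycle_efficiency(good_cycles, bad_cycles):
    """Percentage of good cycles over all cycles, as in the generated formula."""
    total = good_cycles + bad_cycles
    if total == 0:
        # No cycles at all: the ratio is undefined
        return math.nan
    return (good_cycles / total) * 100


# Illustrative counts, not real data: 90 good and 10 bad cycles
print(round(cycle_efficiency(90, 10), 2))  # 90.0
```

Since the formula multiplies by 100, the result is a percentage, although the response leaves `unit_of_measure` empty.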


# REPORT
I use a different Knowledge Base for the Report Generation use case. The reason is that this RAG is meant to be integrated with a proper interface in which the user can choose whether to `CHAT` or to generate a `REPORT`. I assume that the user can choose a time period, a way to aggregate the data in the dataset (the pkl file), and the machines they are interested in. The Knowledge Base then contains only the data the user selected, so it is better not to store the embeddings in this case.

In the following section, I will create a JSON file that could result from the user's choices.

## Report Data
In this section I extract a small part of the original dataset provided by the professors. I will use this subset to test the report generation functionality of the RAG system.

In [None]:
df = pd.read_pickle('data/smart_app_data.pkl')
df.head()

Unnamed: 0,time,asset_id,name,kpi,sum,avg,min,max
0,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,working_time,0.0,0.0,0.0,0.0
1,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,idle_time,0.0,0.0,0.0,0.0
2,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,offline_time,0.0,0.0,0.0,0.0
3,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,consumption,0.066106,0.002321,0.0,0.066106
4,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,power,,0.003673,0.0,0.012801


In [None]:
# Convert the "time" column to just the date (YYYY-MM-DD) format
df['time'] = pd.to_datetime(df['time']).dt.date
df.head()

Unnamed: 0,time,asset_id,name,kpi,sum,avg,min,max
0,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,working_time,0.0,0.0,0.0,0.0
1,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,idle_time,0.0,0.0,0.0,0.0
2,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,offline_time,0.0,0.0,0.0,0.0
3,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,consumption,0.066106,0.002321,0.0,0.066106
4,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,power,,0.003673,0.0,0.012801


In [None]:
# Convert back to datetime64 so the column can be compared with the date strings used below
df['time'] = pd.to_datetime(df['time'])

In [None]:
# Keep only the time, name, kpi, and avg columns
df = df[['time', 'name', 'kpi', 'avg']]
df.head()

Unnamed: 0,time,name,kpi,avg
0,2024-03-01,Large Capacity Cutting Machine 1,working_time,0.0
1,2024-03-01,Large Capacity Cutting Machine 1,idle_time,0.0
2,2024-03-01,Large Capacity Cutting Machine 1,offline_time,0.0
3,2024-03-01,Large Capacity Cutting Machine 1,consumption,0.002321
4,2024-03-01,Large Capacity Cutting Machine 1,power,0.003673


In [None]:
# Consider only the working week from 2024-10-14 to 2024-10-18
df_week = df[(df['time'] >= '2024-10-14') & (df['time'] <= '2024-10-18')]
df_week.head()

Unnamed: 0,time,name,kpi,avg
92482,2024-10-14,Large Capacity Cutting Machine 1,working_time,25406.0
92483,2024-10-14,Large Capacity Cutting Machine 1,idle_time,21791.0
92484,2024-10-14,Large Capacity Cutting Machine 1,offline_time,888.0
92485,2024-10-14,Large Capacity Cutting Machine 1,consumption,0.004759
92486,2024-10-14,Large Capacity Cutting Machine 1,power,0.004834


In [None]:
# Keep only two machines: Riveting Machine and Laser Cutter
df_week = df_week[df_week['name'].isin(['Riveting Machine', 'Laser Cutter'])]
df_week.head()

Unnamed: 0,time,name,kpi,avg
92768,2024-10-14,Riveting Machine,working_time,30038.0
92769,2024-10-14,Riveting Machine,idle_time,17380.0
92770,2024-10-14,Riveting Machine,offline_time,2698.0
92771,2024-10-14,Riveting Machine,consumption,0.001
92772,2024-10-14,Riveting Machine,power,0.000131


In [None]:
# Aggregate by machine and KPI, averaging the daily values over the week
df_week = df_week.groupby(['name', 'kpi'], as_index=False)['avg'].mean()
df_week

Unnamed: 0,name,kpi,avg
0,Laser Cutter,average_cycle_time,9.511204
1,Laser Cutter,bad_cycles,2.2
2,Laser Cutter,consumption,0.0
3,Laser Cutter,consumption_idle,0.0
4,Laser Cutter,consumption_working,0.0
5,Laser Cutter,cost,0.0
6,Laser Cutter,cost_idle,0.0
7,Laser Cutter,cost_working,0.0
8,Laser Cutter,cycles,1.0
9,Laser Cutter,good_cycles,553.6


In [None]:
# Count the rows of df_week whose avg is 0
len(df_week[df_week['avg'] == 0])

7

In [None]:
# Convert DataFrame to JSON
json_result = df_week.to_json(orient="records", indent=4)  

# Parse the JSON string back into a Python list
list_result = json.loads(json_result)

# Add metadata to the JSON result
result = {
    "metadata": {
        "working_week": "2024-10-14 to 2024-10-18",
        "description": "The average values are calculated over the days within the specified week."
    },
    "data": list_result
}
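
The filtering and aggregation steps of this section could be wrapped into one function driven by the user's choices (period, machines, aggregation). This is only a sketch: `build_report_data` and the metadata keys `period`, `aggregation` and `machines` are hypothetical names, and the `time` column is assumed to be timezone-naive, as it is after the conversion earlier in this section.

```python
import pandas as pd


def build_report_data(df, start, end, machines, agg="mean"):
    """Filter by period and machines, then aggregate `avg` per (machine, KPI).

    Hypothetical helper: it mirrors the manual steps of this notebook but
    is not part of rag_service.
    """
    times = pd.to_datetime(df["time"])
    # Series.between is inclusive on both ends by default
    in_period = times.between(pd.Timestamp(start), pd.Timestamp(end))
    selected = df.loc[in_period & df["name"].isin(machines)]
    grouped = selected.groupby(["name", "kpi"], as_index=False)["avg"].agg(agg)
    return {
        "metadata": {
            "period": f"{start} to {end}",
            "aggregation": agg,
            "machines": sorted(machines),
        },
        "data": grouped.to_dict(orient="records"),
    }
```

A call such as `build_report_data(df, "2024-10-14", "2024-10-18", ["Riveting Machine", "Laser Cutter"])` would select the same week and machines as the cells above.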

#### Save JSON to a file

In [None]:
"""# Save JSON to a file
with open("data/report_data.json", "w") as f:
    json.dump(result, f, indent=4)"""

## Report Generation

In [18]:
file_path = "data/report_data.json"

with open(file_path, "r") as file:
    report_data = json.load(file)

In [19]:
data = []
data.append({"type": "json", "content": report_data})

In [20]:
response = rag_interaction(data, generate_report=True)

Generating report with dynamic KB...


Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

In [21]:
print(response)

**Industrial Performance Report**

**Covered Period:** October 14th to October 18th, 2024

The following report provides an in-depth analysis of the performance of two industrial machines over a specified week. The comparison is based on key performance indicators (KPIs) such as average cycle time, consumption, cost, and productivity.

**Machine Performance Overview**

| Machine | Average Cycle Time (min) | Good Cycles (%) | Idle Time (min) | Power Consumption (W) |
| --- | --- | --- | --- | --- |
| Laser Cutter | 9.5112035935 | 553.6 | 19150.2 | 0.0001175526 |
| Riveting Machine | 3.3066611148 | 1042.0 | 26285.8 | 0.0001209674 |

**Comparison Highlights**

* The Laser Cutter has a significantly shorter average cycle time compared to the Riveting Machine, indicating improved efficiency.
* Both machines have similar good cycle percentages, suggesting comparable productivity levels.
* The Riveting Machine has notably lower idle times, indicating better utilization of its working capacity

---

### Let's visualize the response!

---

**Industrial Performance Report**

**Covered Period:** October 14th to October 18th, 2024

The following report provides an in-depth analysis of the performance of two industrial machines over a specified week. The comparison is based on key performance indicators (KPIs) such as average cycle time, consumption, cost, and productivity.

**Machine Performance Overview**

| Machine | Average Cycle Time (min) | Good Cycles (%) | Idle Time (min) | Power Consumption (W) |
| --- | --- | --- | --- | --- |
| Laser Cutter | 9.5112035935 | 553.6 | 19150.2 | 0.0001175526 |
| Riveting Machine | 3.3066611148 | 1042.0 | 26285.8 | 0.0001209674 |

**Comparison Highlights**

* The Laser Cutter has a significantly shorter average cycle time compared to the Riveting Machine, indicating improved efficiency.
* Both machines have similar good cycle percentages, suggesting comparable productivity levels.
* The Riveting Machine has notably lower idle times, indicating better utilization of its working capacity.

**Insights and Recommendations**

1. **Optimize Cycle Times**: Analyzing the gap between average cycle time and good cycles can help identify opportunities to improve processing efficiency. Targeting shorter cycle times may lead to increased productivity.
2. **Idle Time Reduction**: Identifying causes for prolonged idle periods is crucial for improving machine utilization. Regular maintenance and adjustments to operational procedures may help reduce idle times, ensuring more effective use of machine capacity.

**Future Steps**

* Monitor and analyze the performance metrics over subsequent weeks to track progress in addressing identified areas for improvement.
* Adjust operational protocols and schedule maintenance as needed to maintain optimal machine performance and efficiency.

This report provides a solid foundation for informed decision-making regarding industrial machine optimization. By focusing on key performance indicators and identifying areas for improvement, operators can make data-driven decisions to enhance overall productivity and reduce waste.
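
Because the highlights are generated text, it can help to check them against the table in the same response. The sketch below parses the first pipe table of a Markdown string with plain Python (`parse_markdown_table` is a hypothetical helper, not part of `rag_service`) and looks up which machine has the lowest value in two columns.

```python
def parse_markdown_table(text):
    """Return the first pipe table in `text` as a list of {header: cell} dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 3:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the --- separator
    return [dict(zip(header, row)) for row in body]


# Table copied from the report above
report_excerpt = """
| Machine | Average Cycle Time (min) | Good Cycles (%) | Idle Time (min) | Power Consumption (W) |
| --- | --- | --- | --- | --- |
| Laser Cutter | 9.5112035935 | 553.6 | 19150.2 | 0.0001175526 |
| Riveting Machine | 3.3066611148 | 1042.0 | 26285.8 | 0.0001209674 |
"""

rows = parse_markdown_table(report_excerpt)
shortest_cycle = min(rows, key=lambda r: float(r["Average Cycle Time (min)"]))
lowest_idle = min(rows, key=lambda r: float(r["Idle Time (min)"]))
print(shortest_cycle["Machine"])  # Riveting Machine
print(lowest_idle["Machine"])     # Laser Cutter
```

On this response the check points to the Riveting Machine for the shortest cycle time and to the Laser Cutter for the lowest idle time, the opposite of what the highlights state, which makes this report a candidate for the Error Correction test below.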

# MEMORY
This section exploits the memory functionality of the RAG system.

### Dynamic Reports

In [None]:
# Ask to split the report by machine
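
The memory interface of `rag_service` is not shown in this notebook, so the following is only a generic sketch of how a follow-up request, such as splitting the report by machine, could reuse earlier turns. `ConversationMemory` and its methods are hypothetical; `ask` stands for any function that maps a prompt to an answer, and a stub is used here in place of the real RAG call.

```python
class ConversationMemory:
    """Keep recent (query, answer) turns and prepend them to the next query."""

    def __init__(self, ask, max_turns=5):
        self.ask = ask              # callable: prompt string -> answer string
        self.max_turns = max_turns  # how many past turns to replay
        self.turns = []

    def build_prompt(self, query):
        recent = self.turns[-self.max_turns:] if self.max_turns > 0 else []
        if not recent:
            return query
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
        return f"Previous conversation:\n{history}\n\nUser: {query}"

    def chat(self, query):
        answer = self.ask(self.build_prompt(query))
        self.turns.append((query, answer))
        return answer


def stub_model(prompt):
    # Stand-in for the real RAG call, used only to make the sketch runnable
    return f"(answer to a prompt of {len(prompt.splitlines())} lines)"


memory = ConversationMemory(stub_model)
memory.chat("Generate the weekly report")
memory.chat("Split the report by machine")
print(len(memory.turns))  # 2
```

With the real system, `ask` could be something like `lambda q: rag_interaction(data, q)`, assuming the chat mode is the one used for follow-up queries.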

### Error Correction

In [None]:
# Take a query that got a wrong answer and ask a follow-up that helps the RAG recognize the mistake

### Personalized Recommendations