# My Update
In this notebook I extend the functionality of the RAG system that my group and I implemented.

In [1]:
import json
import pandas as pd

from rag_service.rag_chat import rag_interaction

## Knowledge Base
Instead of fetching the Knowledge Base (KB) data through the API, as we did for the project, I downloaded it so that I can work locally without logging in every time. The KB consists of two JSON files: one describes the available KPIs and the other lists the machines with their corresponding KPIs. A plain-text documentation file is loaded alongside them.

In [2]:
file_path = "data/documentation.txt"

with open(file_path, "r") as file:
    documentation = file.read()

In [3]:
file_path = "data/kpis.json"

with open(file_path, "r") as file:
    kpi_data = json.load(file)

In [4]:
file_path = "data/machines.json"

with open(file_path, "r") as file:
    machine_data = json.load(file)

In [5]:
data = [kpi_data, machine_data]
data.append({"type": "txt", "content": documentation})  # wrapping the content with a "type" lets us add other file types to the KB
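
The three loading cells above repeat the same pattern, so they could be folded into one helper. The following is a minimal sketch, not part of the project code: `load_kb_entry` is a hypothetical name, and it assumes that `rag_interaction` accepts entries shaped like `{"type": ..., "content": ...}`, which is the shape used for the text and JSON entries in this notebook.

```python
import json
from pathlib import Path


def load_kb_entry(path):
    """Load a file and wrap it as {"type": ..., "content": ...}.

    Hypothetical helper: the wrapper shape mirrors the entries appended in
    this notebook; support for other types in rag_interaction is an assumption.
    """
    path = Path(path)
    suffix = path.suffix.lower().lstrip(".")
    with path.open("r", encoding="utf-8") as file:
        if suffix == "json":
            content = json.load(file)
        else:
            # Anything that is not JSON is read as plain text
            content = file.read()
            suffix = "txt"
    return {"type": suffix, "content": content}
```

With this helper, the knowledge base could be assembled as `data = [load_kb_entry(p) for p in paths]`, where `paths` is the list of local files.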

## Report Data
In this section I extract a small part of the original dataset provided by the professors. I will use this subset to test the report generation functionality of the RAG system.

In [76]:
df = pd.read_pickle('data/smart_app_data.pkl')
df.head()

Unnamed: 0,time,asset_id,name,kpi,sum,avg,min,max
0,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,working_time,0.0,0.0,0.0,0.0
1,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,idle_time,0.0,0.0,0.0,0.0
2,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,offline_time,0.0,0.0,0.0,0.0
3,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,consumption,0.066106,0.002321,0.0,0.066106
4,2024-03-01T00:00:00Z,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,power,,0.003673,0.0,0.012801


In [77]:
# Convert the "time" column to just the date (YYYY-MM-DD) format
df['time'] = pd.to_datetime(df['time']).dt.date
df.head()

Unnamed: 0,time,asset_id,name,kpi,sum,avg,min,max
0,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,working_time,0.0,0.0,0.0,0.0
1,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,idle_time,0.0,0.0,0.0,0.0
2,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,offline_time,0.0,0.0,0.0,0.0
3,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,consumption,0.066106,0.002321,0.0,0.066106
4,2024-03-01,ast-yhccl1zjue2t,Large Capacity Cutting Machine 1,power,,0.003673,0.0,0.012801


In [78]:
# Convert back to datetime64 so the column can be compared against date strings
df['time'] = pd.to_datetime(df['time'])

In [79]:
# Keep only the time, name, kpi, and avg columns
df = df[['time', 'name', 'kpi', 'avg']]
df.head()

Unnamed: 0,time,name,kpi,avg
0,2024-03-01,Large Capacity Cutting Machine 1,working_time,0.0
1,2024-03-01,Large Capacity Cutting Machine 1,idle_time,0.0
2,2024-03-01,Large Capacity Cutting Machine 1,offline_time,0.0
3,2024-03-01,Large Capacity Cutting Machine 1,consumption,0.002321
4,2024-03-01,Large Capacity Cutting Machine 1,power,0.003673


In [80]:
# Consider only the working week from 2024-10-14 to 2024-10-18 (inclusive)
df_week = df[(df['time'] >= '2024-10-14') & (df['time'] <= '2024-10-18')]
df_week.head()

Unnamed: 0,time,name,kpi,avg
92482,2024-10-14,Large Capacity Cutting Machine 1,working_time,25406.0
92483,2024-10-14,Large Capacity Cutting Machine 1,idle_time,21791.0
92484,2024-10-14,Large Capacity Cutting Machine 1,offline_time,888.0
92485,2024-10-14,Large Capacity Cutting Machine 1,consumption,0.004759
92486,2024-10-14,Large Capacity Cutting Machine 1,power,0.004834


In [81]:
# Keep only two machines: Riveting Machine and Laser Cutter
df_week = df_week[df_week['name'].isin(['Riveting Machine', 'Laser Cutter'])]
df_week.head()

Unnamed: 0,time,name,kpi,avg
92768,2024-10-14,Riveting Machine,working_time,30038.0
92769,2024-10-14,Riveting Machine,idle_time,17380.0
92770,2024-10-14,Riveting Machine,offline_time,2698.0
92771,2024-10-14,Riveting Machine,consumption,0.001
92772,2024-10-14,Riveting Machine,power,0.000131


In [82]:
# Aggregate by machine and KPI, taking the mean of 'avg' over the week
df_week = df_week.groupby(['name', 'kpi'], as_index=False)['avg'].mean()
df_week

Unnamed: 0,name,kpi,avg
0,Laser Cutter,average_cycle_time,9.511204
1,Laser Cutter,bad_cycles,2.2
2,Laser Cutter,consumption,0.0
3,Laser Cutter,consumption_idle,0.0
4,Laser Cutter,consumption_working,0.0
5,Laser Cutter,cost,0.0
6,Laser Cutter,cost_idle,0.0
7,Laser Cutter,cost_working,0.0
8,Laser Cutter,cycles,1.0
9,Laser Cutter,good_cycles,553.6
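
The filtering and aggregation cells above can be combined into one reusable function, which makes it easier to build the same subset for a different week or set of machines. This is a sketch under assumptions: `weekly_kpi_average` is a hypothetical name, and the input frame is assumed to have the `time`, `name`, `kpi` and `avg` columns shown in the outputs above.

```python
import pandas as pd


def weekly_kpi_average(df, start, end, machines):
    """Average 'avg' per (name, kpi) for the given machines over [start, end]."""
    # Parse timestamps, drop the timezone, and keep only the calendar day
    times = pd.to_datetime(df["time"], utc=True).dt.tz_localize(None).dt.normalize()
    in_week = (times >= pd.Timestamp(start)) & (times <= pd.Timestamp(end))
    in_machines = df["name"].isin(machines)
    subset = df.loc[in_week & in_machines, ["name", "kpi", "avg"]]
    # mean() skips missing values by default
    return subset.groupby(["name", "kpi"], as_index=False)["avg"].mean()
```

For the subset used in this notebook the call would be `weekly_kpi_average(df, "2024-10-14", "2024-10-18", ["Riveting Machine", "Laser Cutter"])`.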


In [92]:
# Count how many rows of df_week have an avg equal to 0
len(df_week[df_week['avg'] == 0])

7
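
The count says how many rows are zero but not which ones. A small sketch for listing them follows; `zero_valued_kpis` is a hypothetical helper name, and whether a zero means "no activity" or "missing measurement" is something the data alone does not settle.

```python
import pandas as pd


def zero_valued_kpis(df_week):
    """Return the (name, kpi) pairs whose weekly average is exactly 0."""
    zeros = df_week.loc[df_week["avg"] == 0, ["name", "kpi"]]
    return list(zeros.itertuples(index=False, name=None))
```

Calling `zero_valued_kpis(df_week)` returns a list of tuples that can be inspected before deciding whether those rows belong in the report data.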

In [93]:
# Convert DataFrame to JSON
json_result = df_week.to_json(orient="records", indent=4)  

# Parse the JSON string back into a Python list
list_result = json.loads(json_result)

# Add metadata to the JSON result
result = {
    "metadata": {
        "working_week": "2024-10-14 to 2024-10-18",
        "description": "The average values are calculated over the days within the specified week."
    },
    "data": list_result
}

In [94]:
# Save JSON to a file
# with open("data/report_data.json", "w") as f:
#     json.dump(result, f, indent=4)

In [16]:
# TODO: Evaluate whether the report_data file should be added to the KB.
# Probably not: once the KB is fixed for the KPI generation and chat use cases,
# compute the embeddings and store them.

# file_path = "data/report_data.json"
#
# with open(file_path, "r") as file:
#     report_data = json.load(file)
#
# # Add the report_data file to the KB
# data.append({"type": "json", "content": report_data})

## Queries

In [17]:
query = "How many machines are there?"
response = rag_interaction(data, query)

Parsing nodes:   0%|          | 0/4 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/54 [00:00<?, ?it/s]

In [18]:
print(response)

There appear to be a total of 16 separate machines mentioned in the context.
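
The query cells in this section all follow the same call-then-print pattern. A minimal sketch of a batch runner is given here; `run_queries` is a hypothetical helper, and it takes the query function as an argument (`ask`) so that it does not depend on how `rag_interaction` is implemented.

```python
def run_queries(ask, data, queries):
    """Call ask(data, query) for each query and collect the answers in order."""
    results = []
    for query in queries:
        try:
            answer = ask(data, query)
        except Exception as exc:
            # Record the failure and continue with the remaining queries
            answer = f"ERROR: {exc}"
        results.append({"query": query, "response": str(answer)})
    return results
```

In this notebook it would be called as `run_queries(rag_interaction, data, ["How many machines are there?", "How many KPIs are there?"])`, and the resulting list can be turned into a table with `pd.DataFrame(results)`.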


---

In [19]:
query = "How many KPIs are there?"
response = rag_interaction(data, query)

In [20]:
print(response)

There are 10 unique KPIs being tracked across all machines. These include:

1. working_time
2. idle_time
3. offline_time
4. consumption
5. power
6. cycles
7. cost
8. good_cycles
9. bad_cycles
10. average_cycle_time


---

In [21]:
query = "Are there KPIs related to consumption?"
response = rag_interaction(data, query)

In [22]:
print(response)

Yes, there are KPIs related to consumption. The metrics include "consumption", "consumption_working", and "consumption_idle" which track energy usage or resource expenditure, allowing for monitoring of the machine's power consumption during operation and when idle. These KPIs help optimize resource utilization.


---

In [23]:
query = "Can you speak italian?"
response = rag_interaction(data, query)

In [24]:
print(response)

I don't understand the language.


---

In [25]:
query = "Show me the list of the KPIs available for the Assembly Machine 1"
response = rag_interaction(data, query)

In [26]:
print(response)

Repeat. 

The KPIs available for the Assembly Machine 1 are not explicitly listed in our knowledge base.


---

In [27]:
query = "Generate a new KPI"
response = rag_interaction(data, query)

In [28]:
print(response)

{
  "KPIs": [
    {
      "name": "Utilization Rate",
      "type": "Number",
      "description": "A measure of the percentage of time spent in a productive state, considering both working and idle times.",
      "unit_of_measure": "",
      "formula": "(working_time / (working_time + idle_time))"
    },
    {
      "name": "Downtime Ratio",
      "type": "Number",
      "description": "A measure of the ratio of offline time to total processing time, indicating equipment reliability.",
      "unit_of_measure": "",
      "formula": "(offline_time / (working_time + idle_time))"
    }
  ]
}
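
A generated KPI is only usable if its formula refers to KPIs that actually exist. The sketch below checks that; it is an illustration under assumptions, not part of the RAG service. `unknown_formula_terms` and `KNOWN_KPIS` are hypothetical names, the KPI set is copied from the outputs printed in this notebook and may be incomplete, and the response is assumed to be a JSON string with the `"KPIs"` structure shown above.

```python
import json
import re

# KPI names seen in the outputs of this notebook (possibly incomplete)
KNOWN_KPIS = {
    "working_time", "idle_time", "offline_time", "consumption", "power",
    "cycles", "cost", "good_cycles", "bad_cycles", "average_cycle_time",
    "consumption_working", "consumption_idle", "cost_working", "cost_idle",
}


def unknown_formula_terms(response_text, known_kpis=KNOWN_KPIS):
    """Map each generated KPI name to the formula identifiers not in known_kpis."""
    payload = json.loads(response_text)
    problems = {}
    for kpi in payload.get("KPIs", []):
        # Identifiers are runs of letters, digits and underscores not starting with a digit
        terms = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", kpi.get("formula", "")))
        unknown = sorted(terms - set(known_kpis))
        if unknown:
            problems[kpi.get("name", "<unnamed>")] = unknown
    return problems
```

An empty dictionary from `unknown_formula_terms(str(response))` means every identifier in every formula matches a known KPI name; it does not say whether the formula is meaningful.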


## Report Generation

In [97]:
file_path = "data/report_data.json"

with open(file_path, "r") as file:
    report_data = json.load(file)

In [98]:
data = []
data.append({"type": "json", "content": report_data})

In [99]:
response = rag_interaction(data, generate_report=True)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

In [100]:
print(response)

**Industrial Performance Report**

**Time Period:** October 14, 2024, to October 18, 2024 (Working Week)

This report analyzes the industrial data provided, focusing on key performance indicators (KPIs) for two machines: Laser Cutter and Riveting Machine.

**Machine-wise Summary**

### Laser Cutter

* **Average Cycle Time**: 9.5112035935 seconds
* **Bad Cycles**: 2.2 occurrences
* **Consumption**: 0.0 units
* **Idle Time**: 19,150.2 seconds (19150.2 minutes)
* **Offline Time**: 327.2 seconds
* **Power Consumption**: 0.0001175526 units
* **Working Time**: 71,449.0 seconds (71,449 minutes)
* **Average Good Cycles**: 553.6 occurrences
* **Cycles**: 1.0 cycle
* **Idle Idle Time**: 0.001 units
* **Offline Time**: 0.001 units
* **Consumption Idle**: 0.001 units
* **Consumption Working**: 0.001 units

### Riveting Machine

* **Average Cycle Time**: 3.3066611148 seconds
* **Bad Cycles**: 0.0 occurrences
* **Consumption**: 0.001 units
* **Idle Time**: 26,285.8 seconds (26,285.8 minutes)
* **Offline Time**: 703.0 seconds
* **Power Consumption**: 0.0001209674 units
* **Working Time**: 47,148.2 seconds (47,148.2 minutes)
* **Average Good Cycles**: 1,042.0 occurrences
* **Cycles**: 1.0 cycle
* **Idle Idle Time**: 0.001 units
* **Offline Time**: 0.001 units
* **Consumption Idle**: 0.001 units
* **Consumption Working**: 0.001 units

**Comparison and Insights**

Both machines have demonstrated efficient working times, with minimal idle and offline periods. The Laser Cutter has a shorter average cycle time compared to the Riveting Machine, indicating faster processing times.

The Riveting Machine's consumption remains relatively low, while its good cycles are higher than that of the Laser Cutter. This suggests that the Riveting Machine is more effective in producing quality products within a given timeframe.

This report provides an initial analysis of industrial performance data for two machines over a working week. Further insights and comparisons can be made by exploring additional KPIs or analyzing historical data.

## Exploit the Memory Functionality

### Dynamic Reports

### Error Correction

### Personalized Recommendations