# RaGPT (using gpt-4o-mini)
The idea behind my update is to reimplement the AI assistant from scratch in a slimmer, more efficient form: it relies entirely on the OpenAI API, preserves all the previous features, and adds new ones.
The main difference between the RAG system implemented during the course and the one proposed here is that the latter operates as a classic chatbot assistant, which the user can ask any question about the production site. The model retrieves stored information, computes KPIs, generates and suggests new KPIs, and provides insights.

I won't cover the report generation feature, since it was implemented separately from the chatbot and its implementation is trivial.

### Working with group 4's API
This gives the model the ability to compute KPIs for the specified machines and to access the data in the MongoDB database.

In [186]:
import requests

url_user = 'http://127.0.0.1:8000/api/v1.0/user/login'

# Make the GET request (the login endpoint rejects GET, hence the 405 below)
response = requests.get(url_user)

data = response.json()
print(response)
print(data)

<Response [405]>
{'detail': 'Method Not Allowed'}


In [187]:
from requests.auth import HTTPBasicAuth
# Not sure why this doesn't work, but it works if I put them manually in the URL.
json_data = {
  "email": "ffm@example.com",
  "password": "passwordffm"
}

url_user = 'http://127.0.0.1:8000/api/v1.0/user/login'

# Make the POST request
response = requests.post(url_user, json=json_data)

data = response.json()
print(response)

<Response [200]>


In [188]:
data

{'success': True,
 'data': {'uid': 'xM2kea8akaOKvYta26NMFBy8YnJ3',
  'email': 'ffm@example.com',
  'site': 1,
  'first_name': 'Mario',
  'last_name': 'Rossi',
  'phone_number': '0987654321',
  'id_token': 'eyJhbGciOiJSUzI1NiIsImtpZCI6IjBhYmQzYTQzMTc4YzE0MjlkNWE0NDBiYWUzNzM1NDRjMDlmNGUzODciLCJ0eXAiOiJKV1QifQ.eyJyb2xlIjoiRkZNIiwiaXNzIjoiaHR0cHM6Ly9zZWN1cmV0b2tlbi5nb29nbGUuY29tL3NtYXJ0YXBwLTlmMjg3IiwiYXVkIjoic21hcnRhcHAtOWYyODciLCJhdXRoX3RpbWUiOjE3MzY4NDgxMTgsInVzZXJfaWQiOiJ4TTJrZWE4YWthT0t2WXRhMjZOTUZCeThZbkozIiwic3ViIjoieE0ya2VhOGFrYU9Ldll0YTI2Tk1GQnk4WW5KMyIsImlhdCI6MTczNjg0ODExOCwiZXhwIjoxNzM2ODUxNzE4LCJlbWFpbCI6ImZmbUBleGFtcGxlLmNvbSIsImVtYWlsX3ZlcmlmaWVkIjpmYWxzZSwiZmlyZWJhc2UiOnsiaWRlbnRpdGllcyI6eyJlbWFpbCI6WyJmZm1AZXhhbXBsZS5jb20iXX0sInNpZ25faW5fcHJvdmlkZXIiOiJwYXNzd29yZCJ9fQ.qYK3rn5nBhxwuiRmi_sE07sQips8lEKhE6VCf-P7GLNLibvxy9Nq2QeZw4hjL3k3W0DSZA_matREI82kNqQmGtaOeJ6cDTVkTLA4_LoHGJ3gMd3uqmXTyMuXaNxZ2KIGNTkkhBPhEpoD53oji71pxZ_c-6sNP9G0Cqw6rQXCsY_YozmTOIIyt7rUVMnc1kLfvPIvPp8Mi9oujMR5

In [189]:
token = data['data']['id_token']
headers = {
    'Authorization': f'Bearer {token}'
}
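
The login and header cells above can be folded into one small helper. This is a minimal sketch, assuming the login response always has the shape printed above (`success` plus `data.id_token`) and that failures carry a `detail` field like the 405 response did; `auth_headers_from_login` is a hypothetical name, not part of group 4's API or of `lib`.

```python
def auth_headers_from_login(payload: dict) -> dict:
    """Build the Authorization header from a decoded login response.

    Assumes the response shape shown in the notebook output:
    {'success': True, 'data': {'id_token': '...', ...}}.
    """
    if not payload.get("success"):
        # Avoid echoing the whole payload, which may contain credentials.
        raise ValueError(f"Login failed: {payload.get('detail', 'unknown error')}")
    token = payload["data"]["id_token"]
    return {"Authorization": f"Bearer {token}"}


# Usage with the POST response from the cell above:
#   headers = auth_headers_from_login(response.json())
```

Failing early here avoids sending later requests with a missing or malformed token.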

In [42]:
machine_url = 'http://127.0.0.1:8000/api/v1.0/machine/'

response = requests.get(machine_url, headers=headers)

machine_data = response.json()
print(response)

<Response [200]>


In [43]:
machine_data

{'success': True,
 'data': [{'_id': '6740f1cfa8e3f95f42703128',
   'category': 'Metal cutter',
   'name': 'Large Capacity Cutting Machine 1',
   'asset_id': 'ast-yhccl1zjue2t',
   'kpis_ids': ['673a6ad2d9e0b151b88cbed0',
    '673a6ad2d9e0b151b88cbed1',
    '673a6ad3d9e0b151b88cbed2',
    '673a6ad3d9e0b151b88cbed3',
    '673a6ad4d9e0b151b88cbed4',
    '673a6ad4d9e0b151b88cbed5',
    '673a6ad4d9e0b151b88cbed6',
    '673a6ad5d9e0b151b88cbed7',
    '673a6ad5d9e0b151b88cbed8',
    '673a6ad5d9e0b151b88cbed9',
    '673a6ad6d9e0b151b88cbeda',
    '673a6ad7d9e0b151b88cbedb',
    '673a6ad8d9e0b151b88cbedc',
    '673a6ad8d9e0b151b88cbedd']},
  {'_id': '6740f1cfa8e3f95f42703129',
   'category': 'Rivetter',
   'name': 'Riveting Machine',
   'asset_id': 'ast-o8xtn5xa8y87',
   'kpis_ids': ['673a6ad2d9e0b151b88cbed0',
    '673a6ad2d9e0b151b88cbed1',
    '673a6ad3d9e0b151b88cbed2',
    '673a6ad3d9e0b151b88cbed3',
    '673a6ad4d9e0b151b88cbed4',
    '673a6ad4d9e0b151b88cbed5',
    '673a6ad4d9e0b151b88cb

In [48]:
site_url = 'http://127.0.0.1:8000/api/v1.0/site/1'

response = requests.get(site_url, headers=headers)

site_data = response.json()
print(response)

<Response [200]>


In [49]:
site_data

{'success': True,
 'data': {'_id': '6749e28bb76e0afac9e0254f',
  'machines_ids': ['6740f1cfa8e3f95f4270312c',
   '6740f1cfa8e3f95f4270312d',
   '6740f1cfa8e3f95f4270312f',
   '6740f1cfa8e3f95f42703134',
   '6740f1cfa8e3f95f42703136',
   '6740f1cfa8e3f95f42703129',
   '6740f1cfa8e3f95f4270312b',
   '6740f1cfa8e3f95f42703128',
   '6740f1cfa8e3f95f4270312a',
   '6740f1cfa8e3f95f4270312e',
   '6740f1cfa8e3f95f42703130',
   '6740f1cfa8e3f95f42703132',
   '6740f1cfa8e3f95f42703133',
   '6740f1cfa8e3f95f42703131',
   '6740f1cfa8e3f95f42703135',
   '6740f1cfa8e3f95f42703137'],
  'kpis_ids': ["id=ObjectId('673a6ad2d9e0b151b88cbed0') name='working_time'",
   "id=ObjectId('675dca35bdc5654aee19df33') name='overall_energy_efficiency'",
   "id=ObjectId('6777ff388093441c748bfd6e') name='energy_cost_ratio'",
   "id=ObjectId('673a6ad5d9e0b151b88cbed7') name='cost'",
   "id=ObjectId('673a6ad5d9e0b151b88cbed8') name='cost_working'",
   "id=ObjectId('673a6ad4d9e0b151b88cbed5') name='consumption_working'",

In [53]:
kpi_url = 'http://127.0.0.1:8000/api/v1.0/kpi/'
params = {
    'site': 1
}

response = requests.get(kpi_url, headers=headers, params=params)

kpi_data = response.json()
print(response)

<Response [200]>


In [54]:
kpi_data

{'success': True,
 'data': [{'_id': '673a6ad2d9e0b151b88cbed0',
   'name': 'working_time',
   'type': 'Time',
   'description': 'placeholder',
   'unite_of_measure': 'Minutes'},
  {'_id': '673a6ad2d9e0b151b88cbed1',
   'name': 'idle_time',
   'type': 'Time',
   'description': 'placeholder',
   'unite_of_measure': 'Minutes'},
  {'_id': '673a6ad3d9e0b151b88cbed2',
   'name': 'offline_time',
   'type': 'Time',
   'description': 'placeholder',
   'unite_of_measure': 'Minutes'},
  {'_id': '673a6ad3d9e0b151b88cbed3',
   'name': 'consumption',
   'type': 'Consumption',
   'description': 'placeholder',
   'unite_of_measure': 'kWh'},
  {'_id': '673a6ad4d9e0b151b88cbed4',
   'name': 'power',
   'type': 'Efficiency',
   'description': 'placeholder',
   'unite_of_measure': 'Percentage'},
  {'_id': '673a6ad4d9e0b151b88cbed5',
   'name': 'consumption_working',
   'type': 'Consumption',
   'description': 'placeholder',
   'unite_of_measure': 'kWh'},
  {'_id': '673a6ad4d9e0b151b88cbed6',
   'name': 'c

In [230]:
compute_url = 'http://127.0.0.1:8000/api/v1.0/kpi/machine/6740f1cfa8e3f95f4270312e/compute/'
params = {
    'kpi_id': '673a6ad8d9e0b151b88cbedd',
    'start_date': '2024-09-09 00:00:00',
    'end_date': '2024-10-01 00:00:00',
    'granularity_op': 'avg'
}

response = requests.get(compute_url, headers=headers, params=params)

result = response.json()
print(response)

<Response [200]>


In [231]:
result

{'success': True,
 'data': [{'value': 0.2807651026789001}],
 'message': 'KPI computed successfully'}

Testing the API call for a specific machine and KPI.

In [1]:
from lib import compute_kpi_by_machine_id

compute_kpi_by_machine_id(
    machine_id="6740f1cfa8e3f95f42703134",
    kpi_id="673a6ad8d9e0b151b88cbedd",
    start_date="2024-09-30 00:00:00",
    end_date="2024-10-01 00:00:00",
    granularity_op="avg",
)


Logging in...


{'success': True,
 'data': [{'value': 3.646914313439856}],
 'message': 'KPI computed successfully'}
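
The source of `compute_kpi_by_machine_id` lives in `lib` and is not shown in this notebook. Judging from the raw request made earlier, it presumably wraps the compute endpoint. The sketch below covers only the request-building part, under that assumption; `build_compute_request` is a hypothetical helper, and the date validation is my addition rather than something the API is known to require.

```python
from datetime import datetime

BASE_URL = "http://127.0.0.1:8000/api/v1.0"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used in the requests above


def build_compute_request(machine_id, kpi_id, start_date, end_date, granularity_op="avg"):
    """Return (url, params) for the KPI compute endpoint of one machine."""
    # Parse the dates only to validate them; the API receives the original strings.
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    if end <= start:
        raise ValueError("end_date must be later than start_date")

    url = f"{BASE_URL}/kpi/machine/{machine_id}/compute/"
    params = {
        "kpi_id": kpi_id,
        "start_date": start_date,
        "end_date": end_date,
        "granularity_op": granularity_op,
    }
    return url, params


# Usage, with `headers` holding the bearer token obtained at login:
#   url, params = build_compute_request(machine_id, kpi_id, start, end)
#   result = requests.get(url, headers=headers, params=params).json()
```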

### Complete workflow
Here I will use the library to make the code look cleaner.


#### What can the model do?
It can answer questions about the structure of the site (e.g. the total number of machines, the number of machines per category, or a listing of all machines per category), compute a specific KPI for a specific machine (the model calls group 4's API to compute it), and compare KPIs across a list of machines.


#### How is this possible?
To make this possible I used a recent OpenAI feature that resembles a RAG system: the **Assistants API**. An assistant can run *file search* on a vector store (exactly what happens in a RAG system) and can also learn when to *call a particular function*. Together, these two tools make it possible both to retrieve information from the dataset and to compute KPIs.
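
The two tools map onto the `tools` list that an assistant is configured with. The sketch below shows what that list might look like, assuming the function tool mirrors the keyword arguments of `compute_kpi_by_machine_id` as called earlier. The descriptions and the choice of required fields are my assumptions; the actual definition used in `lib` is not shown here.

```python
# Function tool: a JSON Schema description of the arguments the model must supply.
COMPUTE_KPI_TOOL = {
    "type": "function",
    "function": {
        "name": "compute_kpi_by_machine_id",
        "description": "Compute a KPI for one machine over a date range.",
        "parameters": {
            "type": "object",
            "properties": {
                "machine_id": {"type": "string", "description": "Machine ID from the knowledge base."},
                "kpi_id": {"type": "string", "description": "KPI ID from the knowledge base."},
                "start_date": {"type": "string", "description": "Start, as 'YYYY-MM-DD HH:MM:SS'."},
                "end_date": {"type": "string", "description": "End, as 'YYYY-MM-DD HH:MM:SS'."},
                "granularity_op": {"type": "string", "description": "Aggregation operator, e.g. 'avg'."},
            },
            "required": ["machine_id", "kpi_id", "start_date", "end_date", "granularity_op"],
        },
    },
}

# File search needs no schema: it is enabled by type and reads the linked vector store.
TOOLS = [{"type": "file_search"}, COMPUTE_KPI_TOOL]
```

Keeping the parameter names identical to the Python function means the arguments returned by the model can be passed straight through as keyword arguments.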

More specifically, the system first analyzes the user's request. Whenever it is asked to compute a KPI, it retrieves both the machine and KPI IDs from the knowledge base in order to call the `compute_kpi_by_machine_id` function. This step is needed because the user most likely won't know the IDs of machines and KPIs and will provide their names instead.
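
In the assistant this lookup is performed by file search over the vector store, not by hand-written code, but the logic is equivalent to the following. It is a minimal sketch, assuming in-memory lists shaped like `machine_data['data']` and `kpi_data['data']`; `resolve_id` and `compute_by_name` are hypothetical names, and `compute_fn` stands in for `compute_kpi_by_machine_id`.

```python
def resolve_id(records, name):
    """Return the `_id` of the single record whose name matches (case-insensitive)."""
    wanted = name.strip().lower()
    matches = [r["_id"] for r in records if r["name"].lower() == wanted]
    if len(matches) != 1:
        raise LookupError(f"Expected exactly one match for {name!r}, found {len(matches)}")
    return matches[0]


def compute_by_name(machines, kpis, machine_name, kpi_name,
                    start_date, end_date, compute_fn, granularity_op="avg"):
    """Translate human-readable names into IDs, then delegate to the compute function."""
    machine_id = resolve_id(machines, machine_name)
    kpi_id = resolve_id(kpis, kpi_name)
    return compute_fn(
        machine_id=machine_id,
        kpi_id=kpi_id,
        start_date=start_date,
        end_date=end_date,
        granularity_op=granularity_op,
    )
```

Comparing a KPI across several machines then amounts to calling `compute_by_name` once per machine name and collecting the results.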


#### What is in the knowledge base
Since the data stored in MongoDB only concerns the machines and the KPIs, I built a JSON file containing all of it and stored it in an assistant's vector store. This makes it easier for the assistant to index and query the data (as opposed to passing all of it as payload in each request).
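
As an illustration, such a file could be assembled from the machine and KPI responses fetched earlier. The actual layout of `db_data.json` is not shown in this notebook, so the structure below is an assumption, and `build_knowledge_base` and `save_knowledge_base` are hypothetical helpers.

```python
import json


def build_knowledge_base(machine_payload, kpi_payload):
    """Merge the machine and KPI API responses into one JSON-serializable dict."""
    kpis = [
        {
            "id": k["_id"],
            "name": k["name"],
            "type": k["type"],
            "unit": k["unite_of_measure"],  # field name as spelled by the API
        }
        for k in kpi_payload["data"]
    ]
    names_by_id = {k["id"]: k["name"] for k in kpis}

    machines = [
        {
            "id": m["_id"],
            "name": m["name"],
            "category": m["category"],
            # Replace KPI IDs with names where known; keep the raw ID otherwise.
            "kpis": [names_by_id.get(kpi_id, kpi_id) for kpi_id in m["kpis_ids"]],
        }
        for m in machine_payload["data"]
    ]
    return {"machines": machines, "kpis": kpis}


def save_knowledge_base(knowledge_base, path):
    """Write the knowledge base to disk so it can be pushed to the vector store."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(knowledge_base, fh, indent=2)
```

Storing names next to IDs in the same record is what lets a single retrieval return everything needed for a function call.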

In [7]:
from lib import push_file_in_vector_store, link_vector_store, start_thread, query_rag

# Push the data file into the vector store and link it to the assistant
vstore_id = push_file_in_vector_store("./data/db_data.json", verbose=True)
assistant_id = link_vector_store(vstore_id)

# Start a new thread
thread_id = start_thread()

completed
FileCounts(cancelled=0, completed=1, failed=0, in_progress=0, total=1)


#### General questions, KPI computations and KPI comparisons

In [6]:
answer = query_rag(thread_id, assistant_id, "How many machines are there?")
print(f"RaGPT: {answer}")

RaGPT: There are 16 machines in total at the site【4:0†source】.


In [5]:
answer = query_rag(thread_id, assistant_id, "Compare average_cycle_time kpi for Assembly Machine 2 and Testing Machine 1 between 9 September 2024 and 1 October 2024")
print(f"RaGPT: {answer}")

RaGPT: It looks like you've repeated your previous request, which I already addressed. Here are the results again for the average cycle time KPI for the two machines between September 9, 2024, and October 1, 2024:

- **Assembly Machine 2:** 37.17 minutes
- **Testing Machine 1:** 1.89 minutes

If you need further comparisons or have any other questions, feel free to ask!


In [6]:
answer = query_rag(thread_id, assistant_id, "List all the machines present")
print(f"RaGPT: {answer}")

RaGPT: Here is the list of all machines present at the production site:

1. **Large Capacity Cutting Machine 1** (Metal cutter)
2. **Riveting Machine** (Rivetter)
3. **Medium Capacity Cutting Machine 1** (Metal cutter)
4. **Assembly Machine 1** (Assembler)
5. **Assembly Machine 2** (Assembler)
6. **Assembly Machine 3** (Assembler)
7. **Medium Capacity Cutting Machine 2** (Metal cutter)
8. **Testing Machine 1** (Tester)
9. **Testing Machine 2** (Tester)
10. **Low Capacity Cutting Machine 1** (Metal cutter)
11. **Laser Welding Machine 1** (Laser Welder)
12. **Laser Welding Machine 2** (Laser Welder)
13. **Medium Capacity Cutting Machine 3** (Metal cutter)
14. **Testing Machine 3** (Tester)
15. **Laser Cutter** (Laser Cutter)
16. **Large Capacity Cutting Machine 2** (Metal cutter)

If you have any further questions or require additional information, feel free to ask!


In [4]:
answer = query_rag(thread_id, assistant_id, "Compute the consumption kpi for all the machines on the month of September 2024")
print(f"RaGPT: {answer}")

RaGPT: Here are the consumption KPIs (in kWh) for all the machines during September 2024:

1. **Large Capacity Cutting Machine 1**: 1.54 kWh
2. **Riveting Machine**: 0.07 kWh
3. **Medium Capacity Cutting Machine 1**: 1.16 kWh
4. **Laser Welding Machine 1**: 0.96 kWh
5. **Assembly Machine 1**: 0.29 kWh
6. **Assembly Machine 2**: 0.18 kWh
7. **Assembly Machine 3**: 0.36 kWh
8. **Medium Capacity Cutting Machine 2**: 0.02 kWh
9. **Large Capacity Cutting Machine 2**: 0.03 kWh
10. **Testing Machine 1**: 0.71 kWh

Feel free to ask if you need more information or further computations!


In [7]:
answer = query_rag(thread_id, assistant_id, "compute the cost kpi for all the machines on the month of September 2024")
print(f"RaGPT: {answer}")

RaGPT: Here are the cost KPIs (in currency units) for all the machines during September 2024:

1. **Large Capacity Cutting Machine 1**: **€462,639**
2. **Riveting Machine**: **€548,231**
3. **Medium Capacity Cutting Machine 1**: **€519,210**
4. **Assembly Machine 1**: **€0**
5. **Assembly Machine 2**: **€40,767**
6. **Assembly Machine 3**: **€394,013**
7. **Medium Capacity Cutting Machine 2**: **€89,401**
8. **Large Capacity Cutting Machine 2**: **€164,628**
9. **Low Capacity Cutting Machine 1**: **€80,278**
10. **Testing Machine 1**: **€13,191**

If you need more details or wish to compute additional KPIs, just let me know!


#### KPI generation (group 7 link-up)

In [2]:
answer = query_rag(thread_id, assistant_id, "Suggest me a new KPI for all the machines")
print(f"RaGPT: {answer}")

RaGPT: Based on the available KPIs, I suggest creating a new KPI called **"Production Efficiency Ratio."** This KPI will provide insights into the overall efficiency of the production machines by relating the good cycles to the total cycles performed.

### Formula:
```
Production Efficiency Ratio = (good_cycles / cycles) * 100
```
### Description:
This KPI will measure how effectively the machines are producing quality products compared to total outputs. It is expressed as a percentage.

If you would like to implement this KPI or need further assistance, let me know!


In [3]:
answer = query_rag(thread_id, assistant_id, "Suggest me another one")
print(f"RaGPT: {answer}")

RaGPT: I propose another KPI called **"Energy Cost Ratio."** This KPI helps to evaluate the proportion of energy costs relative to the total production costs. 

### Formula:
```
Energy Cost Ratio = (cost_working + cost_idle) / cost * 100
```
### Description:
This KPI will be expressed as a percentage and provides insights into how much of the total production cost is attributable to energy consumption. It can be useful for assessing energy efficiency and making decisions regarding cost-cutting measures.

If you'd like to implement this KPI or need further assistance, just let me know!


In [4]:
answer = query_rag(thread_id, assistant_id, "Make a new KPI that is the sum of the cycle_time and the consumption")
print(f"RaGPT: {answer}")

RaGPT: ### New KPI Suggestion

**Name:** Sum of Cycle Time and Consumption  

**Description:** This KPI represents the total time taken for machine cycles added to the total energy consumption of the machine, providing insights on both operational time and energy usage efficiency.

**Formula:**  
\[ \text{Sum of Cycle Time and Consumption} = \text{cycle_time} + \text{consumption} \]

### Note:
- **Cycle Time** refers to the duration of one cycle of operation.
- **Consumption** refers to the total energy consumed.

This new KPI can be useful for understanding the combined impact of energy use and operational efficiency of machines【12:4†source】. If you have additional requests or need further modifications, just let me know!


#### Asking questions about cost prediction, energy efficiency rate and utilization rate (group 3 link-up)

In [2]:
answer = query_rag(thread_id, assistant_id, "Is the testing machine 1 working efficiently?")
print(f"RaGPT: {answer}")

RaGPT: Based on the retrieved data about **Testing Machine 1**, here are its efficiency metrics:

1. **Utilization Rate:** 0.0027 (or 0.27% utilization)
2. **Energy Efficiency Rate:** 18.38% (which indicates how effectively the machine uses energy)

### Summary:
- **Low Utilization Rate:** At 0.27%, this suggests that Testing Machine 1 is not being used effectively.
- **Moderate Energy Efficiency:** An efficiency rate of 18.38% indicates some level of energy use optimization, but it could still be improved【4:0†source】.

Overall, Testing Machine 1 appears to be underutilized, which may warrant further investigation into its operational efficiency.


In [3]:
answer = query_rag(thread_id, assistant_id, "Is the testing machine 1 costing too much?")
print(f"RaGPT: {answer}")

RaGPT: The cost prediction data for **Testing Machine 1** indicates a cost of approximately **0.0103 EUR/kWh** for its operation【8:0†source】. 

### Cost Comparison:
Here’s how this cost compares with other machine categories:
- **Metal Cutting Machines:** 0.0464 EUR/kWh
- **Laser Cutter:** 0.00052 EUR/kWh
- **Assembly Machines:** 0.00719 EUR/kWh
- **Riveting Machine:** 0.00963 EUR/kWh
- **Laser Welding Machines:** 0.0236 EUR/kWh

### Summary:
- The cost of **0.0103 EUR/kWh** for Testing Machine 1 is lower than most machine categories except for some specific types (like Laser Cutter and Assembly Machines).
- Overall, it does not appear to be excessively costly compared to others, particularly the Metal Cutting Machines, which is significantly higher.

If you have further questions or need additional insights, feel free to ask!


#### Assistant's insights
This is what my update brings to the table: the chatbot is not just an information retriever. It also helps the user better understand the production site's strengths and weaknesses by analyzing all the available data.

In [5]:
answer = query_rag(thread_id, assistant_id, "What do you think can be done to improve overall efficiency of the production site?")
print(f"RaGPT: {answer}")

RaGPT: Improving the overall efficiency of a production site can involve several strategies, including:

1. **Enhancing Machine Utilization:**
   - **Conduct a Usage Audit:** Evaluate machine usage patterns to identify underutilized machines.
   - **Plan Maintenance Schedules:** Regular maintenance can reduce downtime and ensure machines are operating at optimal capacity.

2. **Implementing Energy Efficiency Measures:**
   - **Upgrade Equipment:** Invest in energy-efficient machines that consume less power while maintaining performance.
   - **Optimize Processes:** Streamline operations to minimize energy wasted during production cycles.

3. **Training and Workforce Development:**
   - **Employee Training:** Provide staff training on best practices for machine operation and maintenance to enhance performance.
   - **Encourage Feedback:** Build a culture where employees can suggest improvements based on their hands-on experience.

4. **Utilizing Data and KPIs:**
   - **Leverage KPI Trac

In [6]:
answer = query_rag(thread_id, assistant_id, "Exploiting all the data you have available, give me some insights on the production site")
print(f"RaGPT: {answer}")

RaGPT: Based on the available data regarding the production site, here's an overview and key insights related to efficiency, cost, and utilization:

### 1. Machine Utilization Patterns:
- **Testing Machine 1:** 
  - **Utilization Rate:** 0.27%, indicating very low usage.
  - **Energy Efficiency Rate:** 18.38%, which is moderate. This suggests that while the machine consumes energy, it is not being used frequently enough to justify its operational cost.
  
### 2. Cost Analysis:
- **Cost of Testing Machine 1:** 
  - The operational cost is **0.0103 EUR/kWh**, which is relatively lower compared to many other machines. However, considering its low utilization, the overall expenses may not be justified.

- **Cost Comparisons with Other Machines:**
  - **Metal Cutting Machines:** 0.0464 EUR/kWh (much higher).
  - **Laser Welding Machines:** 0.0236 EUR/kWh (higher energy footprint).
  - This indicates that Testing Machine 1 could be seen as economically beneficial, but its low utilization mig