<a href="https://colab.research.google.com/github/kairamilanifitria/PurpleBox-Intern/blob/main/03_13_RETRIEVAL_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# JSON - EMBEDDING SUPABASE

In [6]:
import json
import re
import os

# Load Markdown file
file_path = "/content/drive/MyDrive/document_rag_italy/md/Manuale-IRIS_SLIM_IN_TEC_IT.md"
file_name = os.path.basename(file_path)
with open(file_path, "r", encoding="utf-8") as file:
    markdown_text = file.read()

# Function to check if a chunk contains a Markdown table
def is_table(chunk):
    return bool(re.search(r'^\|.*\|\n\|[-| ]+\|\n(\|.*\|\n)*', chunk, re.MULTILINE))

# Function to extract and split long tables
def extract_and_split_table(chunk, max_rows=10):
    lines = chunk.strip().split("\n")
    header, table_rows = None, []
    for i, line in enumerate(lines):
        if re.match(r'^\|[-| ]+\|$', line):
            header = lines[i - 1].strip("|").split("|")
            header = [h.strip() for h in header]
            continue
        if header:
            row_data = line.strip("|").split("|")
            row_data = [cell.strip() for cell in row_data]
            table_rows.append(row_data)

    # Split table into chunks if too many rows
    table_chunks = []
    for i in range(0, len(table_rows), max_rows):
        chunk_rows = table_rows[i:i + max_rows]
        table_chunks.append({"headers": header, "rows": chunk_rows})

    return table_chunks if header and table_rows else None

# Function to extract section headers
def extract_section_title(header):
    match = re.match(r'^(#+)\s+(.*)', header.strip())
    return match.group(2) if match else None

# Function to detect table title
def detect_table_title(pre_table_text):
    lines = pre_table_text.strip().split("\n")
    if lines and len(lines[-1].split()) < 10:  # Assuming a title is a short line before a table
        return lines[-1]
    return None

# Function to split text into chunks of max 400 words with 40-word overlap
def split_text(text, section_title, max_words=400, overlap=40):
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunk = " ".join(words[start:end])
        # Prepend section title to first chunk
        if start == 0:
            chunk = f"## {section_title}\n{chunk}"
        chunks.append(chunk)
        start += max_words - overlap
    return chunks

# Process Markdown
sections = re.split(r'^(#+\s+.*)', markdown_text, flags=re.MULTILINE)
final_chunks = []
current_section = "Unknown"
chunk_id = 1

for i in range(1, len(sections), 2):
    section_title = extract_section_title(sections[i]) or current_section
    content = sections[i + 1].strip()
    current_section = section_title  # Update current section to maintain hierarchy

    table_matches = list(re.finditer(r'(\|.*\|\n\|[-| ]+\|\n(?:\|.*\|\n)+)', content, re.MULTILINE))
    last_index = 0

    for match in table_matches:
        start, end = match.span()
        pre_table_text = content[last_index:start].strip()
        table_text = match.group(0)
        last_index = end

        table_title = detect_table_title(pre_table_text)  # Extract table title if present
        if pre_table_text:
            text_chunks = split_text(pre_table_text, section_title)
            for chunk in text_chunks:
                final_chunks.append({
                    "chunk_id": chunk_id,
                    "content": chunk,
                    "metadata": {
                        "source": file_name,
                        "section": section_title,
                        "position": chunk_id
                    }
                })
                chunk_id += 1

        table_chunks = extract_and_split_table(table_text)
        if table_chunks:
            for table_chunk in table_chunks:
                final_chunks.append({
                    "chunk_id": chunk_id,
                    "table": table_chunk,
                    "metadata": {
                        "source": file_name,
                        "section": section_title,
                        "table_title": table_title,
                        "position": chunk_id
                    }
                })
                chunk_id += 1

    remaining_text = content[last_index:].strip()
    if remaining_text:
        text_chunks = split_text(remaining_text, section_title)
        for chunk in text_chunks:
            final_chunks.append({
                "chunk_id": chunk_id,
                "content": chunk,
                "metadata": {
                    "source": file_name,
                    "section": section_title,
                    "position": chunk_id
                }
            })
            chunk_id += 1

# Save JSON output
output_file = "/content/Manuale-IRIS_SLIM_IN_TEC_IT.md.json"
with open(output_file, "w", encoding="utf-8") as json_file:
    json.dump(final_chunks, json_file, indent=4, ensure_ascii=False)

print(f"Chunking completed. JSON saved to: {output_file}")

Chunking completed. JSON saved to: /content/Manuale-IRIS_SLIM_IN_TEC_IT.md.json


In [None]:
!pip install supabase numpy psycopg2

In [None]:
import os
import json
import torch
import uuid
import numpy as np
from supabase import create_client, Client
from transformers import AutoTokenizer, AutoModel

# Initialize Supabase
#SUPABASE_URL = "_______________"
#SUPABASE_KEY = "_______________"
SUPABASE_URL = ""
SUPABASE_KEY = ""

supabase: Client = create_client(SUPABASE_URL, SUPABASE_KEY)

# Load Embedding Model
tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
model = AutoModel.from_pretrained("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True).to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))


In [62]:
import json
import uuid
import torch

def get_embedding(text):
    """Generates an embedding vector from input text."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().tolist()

def generate_table_description(table_data, metadata):
    """Generates a structured description of the table including metadata."""
    section = metadata.get("section", "Unknown Section")
    table_title = metadata.get("table_title", "Unknown Table")

    headers = table_data.get("headers", [])
    rows = table_data.get("rows", [])

    if len(headers) < 2:
        header_key = headers[0] if headers else "Unknown Header"
        header_value = "Value"
    else:
        header_key, header_value = headers[:2]

    row_descriptions = [f"{header_key}: {row[0]}, {header_value}: {row[1]}" for row in rows if len(row) >= 2]

    description = f"{section}\n{table_title}\n" + " | ".join(row_descriptions)
    return description

def convert_table_to_text(table_data, metadata):
    """Converts a table (headers + rows) into a structured text format with metadata and description for embedding."""
    headers = ", ".join(table_data.get("headers", []))
    rows = [" | ".join(row) for row in table_data.get("rows", [])]

    # Generate description from table data
    table_description = generate_table_description(table_data, metadata)

    # Combine metadata with table content
    return (
        f"Table Title: {metadata.get('table_title', 'Unknown Table')}. Section: {metadata.get('section', 'Unknown Section')}\n"
        f"Table Data:\nHeaders: {headers}\n" + "\n".join(rows) +
        f"\nDescription: {table_description}"
    ), table_description  # Return both formatted text & natural description

def store_chunks_in_supabase(chunks):
    """Stores text and table chunks into Supabase with improved embeddings."""
    document_entries = []
    table_entries = []

    for chunk in chunks:
        chunk_id = str(uuid.uuid4())  # Generate unique chunk_id

        # Process text content
        if "content" in chunk and chunk["content"]:
            content = chunk["content"]
            embedding = get_embedding(content)

            document_entries.append({
                "chunk_id": chunk_id,
                "content": content,
                "embedding": embedding,
                "metadata": chunk["metadata"],
                "type": "text"
            })

        # Process table data
        if "table" in chunk and chunk["table"]:
            table_data = chunk["table"]
            metadata = chunk.get("metadata", {})

            # Generate both structured table text & natural description
            table_text, table_description = convert_table_to_text(table_data, metadata)
            table_embedding = get_embedding(table_text)

            table_entries.append({
                "chunk_id": chunk_id,
                "table_data": json.dumps(table_data, ensure_ascii=False),
                "description": table_description,  # Store the generated description
                "embedding": table_embedding,
                "metadata": metadata
            })

    # Batch insert into Supabase
    if document_entries:
        supabase.table("documents").insert(document_entries).execute()
    if table_entries:
        supabase.table("tables").insert(table_entries).execute()

In [63]:
# Load JSON chunks
json_file_path = "/content/Manuale-IRIS_SLIM_IN_TEC_IT.md.json"
with open(json_file_path, "r", encoding="utf-8") as json_file:
    json_chunks = json.load(json_file)

# Store chunks in Supabase
store_chunks_in_supabase(json_chunks)
print("Text and table embeddings stored successfully in Supabase!")

Text and table embeddings stored successfully in Supabase!


# WORKS

In [None]:
import nltk
nltk.download('all')

In [171]:
import numpy as np
import ast
import re
from scipy.spatial.distance import cosine
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
nltk.download('stopwords')

def get_embedding(text):
    """Generates an embedding vector from input text."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().tolist()

def extract_keywords_simple(text):
    """Extracts important words from a query using simple filtering."""
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text.lower())
    keywords = [word for word in words if word.isalnum() and word not in stop_words]
    return keywords

def query_requires_table(user_query):
    """Determines if the query is likely asking for table data."""
    table_keywords = {"table", "data", "values", "measurements", "limits", "thresholds", "parameters"}
    return any(word in user_query.lower() for word in table_keywords)

def query_supabase(user_query):
    """Retrieves both text and table chunks based on query, ensuring relevance balance."""
    query_embedding = np.array(get_embedding(user_query), dtype=np.float32).flatten()
    requires_table = query_requires_table(user_query)
    keywords = extract_keywords_simple(user_query)

    #### Step 1: Retrieve Text Chunks (Vector Search) ####
    response_text = supabase.table("documents").select("chunk_id, content, embedding, type, metadata").execute()
    text_results = []

    for record in response_text.data:
        chunk_embedding = ast.literal_eval(record["embedding"]) if isinstance(record["embedding"], str) else record["embedding"]
        chunk_embedding = np.array(chunk_embedding, dtype=np.float32).flatten()

        if chunk_embedding.shape == query_embedding.shape:
            similarity = 1 - cosine(query_embedding, chunk_embedding)
            text_results.append((record["chunk_id"], "text", record["content"], similarity))

    text_results.sort(key=lambda x: x[3], reverse=True)  # Sort by similarity
    top_text_chunks = text_results[:3]  # Keep top 3 text chunks

    #### Step 2: Retrieve Table Chunks Using Extracted Keywords ####
    response_tables = supabase.table("tables").select("chunk_id, table_data, description, embedding, metadata").execute()
    table_results = []

    for record in response_tables.data:
        table_data = record["table_data"].lower()
        table_description = record["description"].lower()
        keyword_match_score = sum(1 for word in keywords if word in table_data or word in table_description)

        if keyword_match_score > 0:
            table_results.append((record["chunk_id"], "table", record["description"], keyword_match_score))

    table_results.sort(key=lambda x: x[3], reverse=True)  # Sort by keyword relevance

    #### Step 3: Merge & Sort Results ####
    final_results = text_results[:3] + table_results[:2]  # Ensure text priority, limit tables
    final_results.sort(key=lambda x: x[3], reverse=True)  # Sort again by relevance

    return final_results[:5]  # Return top 5 most relevant results

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
import openai

# OpenAI API Key
OPENAI_API_KEY = ""
openai.api_key = OPENAI_API_KEY

# Function to call OpenAI LLM with chat history
def call_openai_llm(user_query, retrieved_chunks, chat_history=[]):
    """Send the query along with retrieved context and chat history to OpenAI API."""

    # Prepare context from retrieved chunks
    context_text = "\n\n".join([f"Chunk {i+1}: {chunk[2]}" for i, chunk in enumerate(retrieved_chunks)])

    # Construct messages for conversational memory
    messages = [
        {"role": "system", "content": "You are an intelligent assistant. Use the following retrieved information to answer the user's query."},
    ]

    # Append chat history
    messages.extend(chat_history)

    # Append current query with retrieved context
    messages.append({"role": "user", "content": f"Context:\n{context_text}\n\nUser's Question: {user_query}"})

    # Call OpenAI's Chat API with the new format
    client = openai.OpenAI(api_key=openai.api_key)  # Ensure you are using the new client-based API
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # You can change this to another OpenAI model
        messages=messages,
        temperature=0.7
    )

    answer = response.choices[0].message.content  # Adjusted based on the new API response format

    # Append response to chat history
    chat_history.append({"role": "user", "content": user_query})
    chat_history.append({"role": "assistant", "content": answer})

    return answer, chat_history

In [172]:
user_query = "usage limit"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 8941c9c0-9dc7-44c4-bf17-a05a40e824ff
Type: table
Content: Indice

...
Relevance: 1.0000

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 1.0000

Chunk ID: 8843584e-ca30-4846-b7cf-234dfabb403b
Type: text
Content: ## 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego...
Relevance: 0.8015

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzionamento per lunghi periodi può ridurre la normale durata dei componenti....
Relevance: 0.7723

Chunk ID: 61b9d9e2-6e2e-4430-ba82-46680cca3884
Type: 

In [173]:
user_query = "what are the usage limit?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 8941c9c0-9dc7-44c4-bf17-a05a40e824ff
Type: table
Content: Indice

...
Relevance: 1.0000

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 1.0000

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzionamento per lunghi periodi può ridurre la normale durata dei componenti....
Relevance: 0.8683

Chunk ID: 64f46084-3a66-425c-8eb7-86b9ddd27c9c
Type: text
Content: ## 2.2. Usi Non Previsti E Controindicazioni
Non sono ammesse le seguenti applicazioni: - · Funzionamento all'aperto - · Funzionam

In [175]:
# Example usage
user_query = "what are the Limits of use of the unit?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The limits of use of the unit, as specified in Chunk 3 (2.2. Usi Non Previsti E Controindicazioni) and Chunk 5 (2.5. Limiti Di Impiego), include:

1. **Prohibited Applications**:
   - Operation outdoors.
   - Operation in humid, explosive, or dusty environments.
   - Operation in corrosive environments, particularly harmful for the aluminum fins of the battery.
   - Operation in environments with electromagnetic disturbances.

2. **User Restrictions**:
   - The machine is not intended for use by individuals (including children) with reduced physical, mental, or sensory capacities, or by those who have not received sufficient instruction, unless supervised by someone responsible for their safety.

3. **Operational Extremes**:
   - It is advised to operate the machine at the extremes of the specified limits of use only for short periods because prolonged operation can reduce the normal lifespan of the components.

These limitations are important to ensure the safe a

In [176]:
# Example usage
user_query = "what are the usage limit of the unit?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The usage limits of the unit, as specified in Chunk 2 ("2.5. Limiti Di Impiego"), are as follows:

1. **Electrical Supply:**
   - **Water Inlet Temperature for Battery:** 5 - 70 °C when operating at 220 - 240 V / 50 Hz.
   - **Air Recovery Temperature:** 10 - 35 °C when operating at 220 - 240 V / 50 Hz.
   - **Relative Humidity of Recovered Air:** 10 - 70 % when operating at 220 - 240 V / 50 Hz.

Additionally, it is advised in Chunk 4 that the unit should only be operated at these extreme limits for short periods to avoid reducing the normal lifespan of its components.


# **BEST PRACTICE**

In [None]:
import nltk
nltk.download('all')

In [28]:
import numpy as np
import ast
import re
from scipy.spatial.distance import cosine
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
nltk.download('stopwords')

def get_embedding(text):
    """Generates an embedding vector from input text."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().tolist()

def extract_keywords_simple(text):
    """Extracts important words from a query using simple filtering."""
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text.lower())
    keywords = [word for word in words if word.isalnum() and word not in stop_words]
    return keywords

def query_requires_table(user_query):
    """Determines if the query is likely asking for table data."""
    table_keywords = {"table", "data", "values", "measurements", "limits", "thresholds", "parameters", "average", "sum", "percentage"}
    return any(word in user_query.lower() for word in table_keywords)

def get_most_similar_keywords(query_keywords, top_text_chunks):
    """Extracts most relevant words from top retrieved text chunks."""
    all_text_words = set()
    for chunk in top_text_chunks:
        chunk_words = set(word_tokenize(chunk[2].lower()))  # Extract words from chunk text
        all_text_words.update(chunk_words)
    common_words = [word for word in query_keywords if word in all_text_words]
    return common_words if common_words else query_keywords  # Fallback to original keywords if no match

def query_supabase(user_query):
    """Retrieves both text and table chunks based on query, ensuring relevance balance."""
    query_embedding = np.array(get_embedding(user_query), dtype=np.float32).flatten()
    keywords = extract_keywords_simple(user_query)
    requires_table = query_requires_table(user_query)

    #### Step 1: Retrieve Text Chunks (Vector Search) ####
    response_text = supabase.table("documents").select("chunk_id, content, embedding, type, metadata").execute()
    text_results = []

    for record in response_text.data:
        chunk_embedding = ast.literal_eval(record["embedding"]) if isinstance(record["embedding"], str) else record["embedding"]
        chunk_embedding = np.array(chunk_embedding, dtype=np.float32).flatten()

        if chunk_embedding.shape == query_embedding.shape:
            similarity = 1 - cosine(query_embedding, chunk_embedding)
            text_results.append((record["chunk_id"], "text", record["content"], similarity))

    text_results.sort(key=lambda x: x[3], reverse=True)
    top_text_chunks = text_results[:3]

    #### Step 2: Expand Query Using Retrieved Text ####
    refined_keywords = get_most_similar_keywords(keywords, top_text_chunks)

    #### Step 3: Retrieve Table Chunks Using Specialized Scoring ####
    response_tables = supabase.table("tables").select("chunk_id, table_data, description, embedding, metadata").execute()
    table_results = []
    table_weight = 2.5 if requires_table else 1.5  # Increase weight dynamically

    for record in response_tables.data:
        table_embedding = ast.literal_eval(record["embedding"]) if isinstance(record["embedding"], str) else record["embedding"]
        table_embedding = np.array(table_embedding, dtype=np.float32).flatten()
        table_data = record["table_data"].lower()
        table_description = record["description"].lower()
        keyword_match_score = sum(3 if word in table_data.split(" ")[:5] else 1 for word in refined_keywords if word in table_data or word in table_description)

        if table_embedding.shape == query_embedding.shape:
            embedding_similarity = 1 - cosine(query_embedding, table_embedding)
            keyword_embedding_score = sum(1 - cosine(get_embedding(word), table_embedding) for word in refined_keywords) / max(len(refined_keywords), 1)

            final_table_score = (embedding_similarity ** 0.8) * 0.2 + (keyword_match_score ** 2.5) * 0.6 + (keyword_embedding_score ** 1.2) * 0.2

            if final_table_score > 0:
                table_results.append((record["chunk_id"], "table", record["description"], final_table_score))

    table_results.sort(key=lambda x: x[3], reverse=True)

    #### Step 4: Merge & Rank Results with Adaptive Prioritization ####
    if table_results and table_results[0][3] > 0.75:
        final_results = [table_results[0]] + text_results[:2] + table_results[1:2] + text_results[2:]
    else:
        final_results = text_results[:3] + table_results[:2]  # Natural sorting if no table is required

    return final_results[:5]  # Return top 5 most relevant results

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [29]:
user_query = "usage limit"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 0.8898

Chunk ID: 8843584e-ca30-4846-b7cf-234dfabb403b
Type: text
Content: ## 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego...
Relevance: 0.8015

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzionamento per lunghi periodi può ridurre la normale durata dei componenti....
Relevance: 0.7723

Chunk ID: 8941c9c0-9dc7-44c4-bf17-a05a40e824ff
Type: table
Content: Indice

...
Relevance: 0.8775

Chunk ID: 61b9d9e2-6e2e-4430-ba82-46680cca3884
Type: 

In [30]:
user_query = "what are the usage limit?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 0.9077

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzionamento per lunghi periodi può ridurre la normale durata dei componenti....
Relevance: 0.8683

Chunk ID: 64f46084-3a66-425c-8eb7-86b9ddd27c9c
Type: text
Content: ## 2.2. Usi Non Previsti E Controindicazioni
Non sono ammesse le seguenti applicazioni: - · Funzionamento all'aperto - · Funzionamento in ambienti umidi o esplosivi o polverosi - · Funzionamento in ambienti corrosivi, in particol

In [36]:
# Example usage
user_query = "what are the usage limit of the unit?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The usage limits of the unit, as outlined in Chunk 1 under "2.5. Limiti Di Impiego," include the following specifications:

- **Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz**: 5 - 70 °C
- **Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz**: 10 - 35 °C
- **Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V / 50 Hz**: 10 - 70 %


In [33]:
user_query = "what are the dimensions and weight?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 5435d029-60dc-4c33-8ce9-9272abc58495
Type: text
Content: ## 2.6.1. IRIS Slim Verticale Con Mobile
| Peso (kg) | 17 | 20 | 23 | 26 |...
Relevance: 0.8080

Chunk ID: 6bc39d25-a0e3-4eb5-bb19-671143f300d2
Type: text
Content: ## 2.6.1. IRIS Slim Verticale Con Mobile
Figura 1. Iris slim verticale con mobile ![Image](/content/drive/MyDrive/document_rag_italy/md/Manuale-IRIS_SLIM_IN_TEC_IT_artifacts/image_000006_b273c4f4e7016f5303fc41823055cb7b77ee9a415b0e6e68f0edbe8aa520b8a6.png) *Image Description:* The image depicts a sk...
Relevance: 0.8072

Chunk ID: eb0edfb9-bbde-471a-8c16-8ff1eb336945
Type: text
Content: ## 2.6.2. IRIS Slim Verticale Da Incasso
Figura 3. Controcassa ![Image](/content/drive/MyDrive/document_rag_italy/md/Manuale-IRIS_SLIM_IN_TEC_IT_artifacts/image_000007_e4e65540859ec50cd31bba396970d1ffc6d937f61170ddfca4bfc718bb687e96.png) *Image Description:* The image shows two technical drawings of...
Relevance: 0.8048

Chunk ID: 59bafe85-ef03-4a7a-8c83-97a08aae26c4
Type: tab

In [37]:
# Example usage
user_query = "what are the dimensions and weight of IRIS Slim Vertical With Cabinet?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The dimensions and weight of the IRIS Slim Verticale Con Mobile are as follows:

- **Dimension (Height in mm)**: 600
- **Weight (kg)**: The weight options available are 17 kg, 20 kg, 23 kg, and 26 kg.


In [34]:
user_query = "explain about Installation Arrangements?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 40d0a96f-de61-4b5a-abb6-cae3d3cbb5ac
Type: text
Content: ## 4.1. Predisposizioni All'installazione Di IRIS Slim
Fissare l'unità al muro con le quattro viti (in base alle dimensioni delle teste delle viti possono essere necessarie delle rondelle). Al termine dell'installazione l'unità deve risultare perfettamente in orizzontale o con lieve pendenza nella d...
Relevance: 0.8430

Chunk ID: 96c40b33-dc34-467d-bd8f-bd7bc845d40d
Type: text
Content: ## 4.2. Posizionamento
L'unità deve essere installata a parete che deve essere perfettamente verticale (90° rispetto al pavimento). Rispettare le misure minime riportate in figura, che sono necessarie per una agevole installazione e corretto funzionamento dell'unità. L'unità non deve essere esposta ...
Relevance: 0.8334

Chunk ID: 07c9674e-5746-445f-8f00-05232d58a6ec
Type: text
Content: ## 4.1. Predisposizioni All'installazione Di IRIS Slim
Forare il muro con gli interassi riportati ed inserire i quattro tasselli nei fori. ![Image](/cont

In [38]:
# Example usage
user_query = "explain about Installation Arrangements"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The installation arrangements for the IRIS Slim unit, as described in the provided texts, involve several important steps and considerations to ensure proper installation and functionality. Here is a detailed explanation based on the chunks provided:

1. **Wall Preparation and Unit Mounting**:
   - **Drilling the Wall**: Begin by drilling the wall at specified intervals (interassi), which are likely provided by the technical diagrams in the manual. The holes are meant for inserting dowels (tasselli) that will hold the unit securely.
   - **Securing the Unit**: Attach the IRIS Slim unit to the wall using four screws. Depending on the screw head size, washers may be needed to ensure a secure fit. The unit should be perfectly horizontal or slightly tilted toward the condensate drain side. It is crucial to avoid any backward tilt opposite the condensate drain as this can impede the natural flow of the condensate, potentially causing operational issues.

2. **Positioni

# TRIAL ERROR

In [64]:
import numpy as np
import ast
import re
from scipy.spatial.distance import cosine
from collections import deque

def get_embedding(text):
    """Generates an embedding vector from input text."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().tolist()

def query_supabase(user_query):
    """Retrieves both text and table chunks based on query, using improved embeddings and keyword matching."""

    #### Step 1: Get Query Embedding ####
    query_embedding = np.array(get_embedding(user_query), dtype=np.float32).flatten()

    #### Step 2: Retrieve Text Chunks (Vector Search) ####
    response_text = supabase.table("documents").select("chunk_id, content, embedding, type, metadata").execute()
    text_results = []

    for record in response_text.data:
        chunk_embedding = record["embedding"]

        # Convert stored string embeddings to list if needed
        if isinstance(chunk_embedding, str):
            chunk_embedding = ast.literal_eval(chunk_embedding)

        chunk_embedding = np.array(chunk_embedding, dtype=np.float32).flatten()

        if chunk_embedding.shape == query_embedding.shape:
            similarity = 1 - cosine(query_embedding, chunk_embedding)
            text_results.append((record["chunk_id"], "text", record["content"], similarity))

    #### Step 3: Retrieve Table Chunks (Description + Embedding Match) ####
    response_tables = supabase.table("tables").select("chunk_id, table_data, description, embedding, metadata").execute()
    table_results = []

    for record in response_tables.data:
        table_data = record["table_data"]
        metadata = record.get("metadata", {})
        table_description = record.get("description", "")  # Use generated description
        table_embedding = record.get("embedding", None)

        # Ensure metadata fields are strings
        table_title = str(metadata.get("table_title", ""))
        section = str(metadata.get("section", ""))

        # Extract table number from the query (if any)
        table_number_match = re.search(r'table (\d+)', user_query, re.IGNORECASE)
        specified_table_number = table_number_match.group(1) if table_number_match else None

        # Step 3.1: Keyword Matching for Table Title, Section & Description
        keyword_match_score = 0
        if re.search(rf"\b{re.escape(user_query)}\b", table_title, re.IGNORECASE):
            keyword_match_score += 0.5  # Higher weight for title match
        if re.search(rf"\b{re.escape(user_query)}\b", section, re.IGNORECASE):
            keyword_match_score += 0.3  # Lower weight for section match
        if re.search(rf"\b{re.escape(user_query)}\b", table_description, re.IGNORECASE):
            keyword_match_score += 0.7  # Highest weight for description match

        # Prioritize the exact table number if mentioned
        if specified_table_number and specified_table_number in table_title.lower():
            keyword_match_score += 1.0  # Give a strong boost to matching table numbers

        # Step 3.2: Compute Embedding Similarity
        if table_embedding:
            if isinstance(table_embedding, str):
                table_embedding = ast.literal_eval(table_embedding)  # Convert string to list
            table_embedding = np.array(table_embedding, dtype=np.float32).flatten()

            if table_embedding.shape == query_embedding.shape:
                similarity = 1 - cosine(query_embedding, table_embedding)
                final_score = (0.7 * similarity) + (1.3 * keyword_match_score)  # Boost keyword matching
                table_results.append((record["chunk_id"], "table", table_description, final_score))

    #### Step 4: Merge & Sort Results ####
    all_results = text_results + table_results
    all_results.sort(key=lambda x: x[3], reverse=True)  # Sort by final similarity score

    return all_results[:5]  # Return top 5 results

In [96]:
import numpy as np
import ast
import re
from scipy.spatial.distance import cosine
from collections import Counter

def get_embedding(text):
    """Generates an embedding vector from input text."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().tolist()

def extract_relevant_words(query, text):
    """Finds the most relevant words from the retrieved text based on similarity to the query."""
    query_words = query.lower().split()
    text_words = text.lower().split()
    word_counts = Counter(text_words)
    relevant_words = [word for word in text_words if word in query_words or word_counts[word] > 1]
    return list(set(relevant_words))  # Return unique words

def query_supabase(user_query):
    """Retrieves both text and table chunks based on query, ensuring relevance balance."""
    query_embedding = np.array(get_embedding(user_query), dtype=np.float32).flatten()

    #### Step 1: Retrieve Text Chunks (Vector Search) ####
    response_text = supabase.table("documents").select("chunk_id, content, embedding, type, metadata").execute()
    text_results = []

    for record in response_text.data:
        chunk_embedding = record["embedding"]
        if isinstance(chunk_embedding, str):
            chunk_embedding = ast.literal_eval(chunk_embedding)
        chunk_embedding = np.array(chunk_embedding, dtype=np.float32).flatten()

        if chunk_embedding.shape == query_embedding.shape:
            similarity = 1 - cosine(query_embedding, chunk_embedding)
            text_results.append((record["chunk_id"], "text", record["content"], similarity))

    text_results.sort(key=lambda x: x[3], reverse=True)  # Sort by similarity
    top_text_chunks = text_results[:3]  # Keep top 3 text chunks

    #### Step 2: Extract Most Relevant Words from Top Text Chunks ####
    relevant_words = []
    for chunk in top_text_chunks:
        relevant_words.extend(extract_relevant_words(user_query, chunk[2]))
    relevant_words = list(set(relevant_words))  # Remove duplicates

    #### Step 3: Retrieve Table Chunks Using Extracted Words ####
    response_tables = supabase.table("tables").select("chunk_id, table_data, description, embedding, metadata").execute()
    table_results = []

    for record in response_tables.data:
        metadata = record.get("metadata", {})
        table_data = record.get("table_data", "")
        table_description = record.get("description", "")
        table_embedding = record.get("embedding", None)

        keyword_match_score = 0
        for word in relevant_words:
            if re.search(rf"\b{re.escape(word)}\b", str(metadata), re.IGNORECASE):
                keyword_match_score += 0.5
            if re.search(rf"\b{re.escape(word)}\b", table_data, re.IGNORECASE):
                keyword_match_score += 0.3
            if re.search(rf"\b{re.escape(word)}\b", table_description, re.IGNORECASE):
                keyword_match_score += 0.7

        embedding_match_score = 0
        if table_embedding:
            if isinstance(table_embedding, str):
                table_embedding = ast.literal_eval(table_embedding)
            table_embedding = np.array(table_embedding, dtype=np.float32).flatten()

            if table_embedding.shape == query_embedding.shape:
                similarity = 1 - cosine(query_embedding, table_embedding)
                embedding_match_score = similarity

        final_score = (1.3 * keyword_match_score) + embedding_match_score
        if keyword_match_score > 0:  # Only keep tables that match relevant words
            table_results.append((record["chunk_id"], "table", table_description, final_score))

    table_results.sort(key=lambda x: x[3], reverse=True)  # Sort by final score

    #### Step 4: Merge & Sort Results ####
    all_results = top_text_chunks + table_results[:2]  # Ensure text context is maintained
    all_results.sort(key=lambda x: x[3], reverse=True)  # Sort again by relevance

    return all_results[:5]  # Return top 5 most relevant results


In [97]:
user_query = "Limiti Di Impiego"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 5.4732

Chunk ID: 05e47d48-c7be-4c38-949e-f599df68a060
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Valvola termostatica che trafila | Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Isolamenti staccati | Anomalia: Perdita d'acqua in riscaldamento ...
Relevance: 5.4258

Chunk ID: 8843584e-ca30-4846-b7cf-234dfabb403b
Type: text
Content: ## 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego...
Relevance: 0.8592

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5.

In [99]:
user_query = "what are the operating limits?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 05e47d48-c7be-4c38-949e-f599df68a060
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Valvola termostatica che trafila | Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Isolamenti staccati | Anomalia: Perdita d'acqua in riscaldamento ...
Relevance: 5.5398

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 5.5133

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzio

In [154]:
import numpy as np
import ast
import re
from scipy.spatial.distance import cosine
from collections import Counter

def get_embedding(text):
    """Generates an embedding vector from input text."""
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().tolist()

def extract_relevant_words(query, text):
    """Finds the most relevant words from the retrieved text based on similarity to the query."""
    query_words = query.lower().split()
    text_words = text.lower().split()
    word_counts = Counter(text_words)
    relevant_words = [word for word in text_words if word in query_words or word_counts[word] > 1]
    return list(set(relevant_words))  # Return unique words

def query_requires_table(user_query):
    """Determines if the query is likely asking for table data."""
    table_keywords = {"table", "data", "values", "measurements", "limits", "thresholds", "parameters"}
    return any(word in user_query.lower() for word in table_keywords)

def query_supabase(user_query):
    """Retrieves both text and table chunks based on query, ensuring relevance balance."""
    query_embedding = np.array(get_embedding(user_query), dtype=np.float32).flatten()
    requires_table = query_requires_table(user_query)

    #### Step 1: Retrieve Text Chunks (Vector Search) ####
    response_text = supabase.table("documents").select("chunk_id, content, embedding, type, metadata").execute()
    text_results = []

    for record in response_text.data:
        chunk_embedding = record["embedding"]
        if isinstance(chunk_embedding, str):
            chunk_embedding = ast.literal_eval(chunk_embedding)
        chunk_embedding = np.array(chunk_embedding, dtype=np.float32).flatten()

        if chunk_embedding.shape == query_embedding.shape:
            similarity = 1 - cosine(query_embedding, chunk_embedding)
            text_results.append((record["chunk_id"], "text", record["content"], similarity))

    text_results.sort(key=lambda x: x[3], reverse=True)  # Sort by similarity
    top_text_chunks = text_results[:3]  # Keep top 3 text chunks

    #### Step 2: Extract Most Relevant Words from Top Text Chunks ####
    relevant_words = []
    for chunk in top_text_chunks:
        relevant_words.extend(extract_relevant_words(user_query, chunk[2]))
    relevant_words = list(set(relevant_words))  # Remove duplicates

    #### Step 3: Retrieve Table Chunks Using Extracted Words ####
    response_tables = supabase.table("tables").select("chunk_id, table_data, description, embedding, metadata").execute()
    table_results = []

    for record in response_tables.data:
        metadata = record.get("metadata", {})
        table_data = record.get("table_data", "")
        table_description = record.get("description", "")
        table_embedding = record.get("embedding", None)

        keyword_match_score = 0
        for word in relevant_words:
            if re.search(rf"\b{re.escape(word)}\b", str(metadata), re.IGNORECASE):
                keyword_match_score += 0.5
            if re.search(rf"\b{re.escape(word)}\b", table_data, re.IGNORECASE):
                keyword_match_score += 0.3
            if re.search(rf"\b{re.escape(word)}\b", table_description, re.IGNORECASE):
                keyword_match_score += 0.7

        embedding_match_score = 0
        if table_embedding:
            if isinstance(table_embedding, str):
                table_embedding = ast.literal_eval(table_embedding)
            table_embedding = np.array(table_embedding, dtype=np.float32).flatten()

            if table_embedding.shape == query_embedding.shape:
                similarity = 1 - cosine(query_embedding, table_embedding)
                embedding_match_score = similarity * 0.5  # Adjusted weight for better balance

        final_score = (1.0 * keyword_match_score) + embedding_match_score  # Balanced weight ratio

        # Include tables only if they are highly relevant
        if (requires_table and keyword_match_score > 0.2) or keyword_match_score > 0.5 or embedding_match_score > 0.3:
            table_results.append((record["chunk_id"], "table", table_description, final_score))

    table_results.sort(key=lambda x: x[3], reverse=True)  # Sort by final score

    #### Step 4: Merge & Sort Results ####
    final_results = text_results[:3] + table_results[:2]  # Ensure text priority, limit tables
    final_results.sort(key=lambda x: x[3], reverse=True)  # Sort again by relevance

    return final_results[:5]  # Return top 5 most relevant results


In [155]:
user_query = "usage limit"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 3.9680

Chunk ID: e377dbf0-9b8b-4ae7-aab7-eee8f1466ac1
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Il pannello radiante frontale non si  riscalda, Possibili guasti: Presenza di aria nelle tubazioni | Anomalia: La ventilazione non risponde  immediatamente alle nuove  impostazioni, Possibili guasti: La valvola impiega quale minuto ad  aprirsi | Anomalia: ...
Relevance: 2.9668

Chunk ID: 8843584e-ca30-4846-b7cf-234dfabb403b
Type: text
Content: ## 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego...
Relevance: 0.8015

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5.

In [156]:
user_query = "what are the usage limit?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 05e47d48-c7be-4c38-949e-f599df68a060
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Valvola termostatica che trafila | Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Isolamenti staccati | Anomalia: Perdita d'acqua in riscaldamento ...
Relevance: 9.1226

Chunk ID: e377dbf0-9b8b-4ae7-aab7-eee8f1466ac1
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Il pannello radiante frontale non si  riscalda, Possibili guasti: Presenza di aria nelle tubazioni | Anomalia: La ventilazione non risponde  immediatamente alle nuove  impostazioni, Possibili guasti: La valvola impiega quale minuto ad  aprirsi | Anomalia: ...
Relevance: 7.7224

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzio

In [113]:
user_query = "what are the operating limits?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 05e47d48-c7be-4c38-949e-f599df68a060
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Valvola termostatica che trafila | Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Isolamenti staccati | Anomalia: Perdita d'acqua in riscaldamento ...
Relevance: 4.0299

Chunk ID: a551dee1-3d8e-485d-b25e-769ee09d5b20
Type: table
Content: 2.5. Limiti Di Impiego
Tabella 1. Limiti di impiego
Alimentazione elettrica: Temperatura acqua ingresso batteria, 220 - 240 V / 50 Hz: 5 - 70 °C | Alimentazione elettrica: Temperatura ripresa aria, 220 - 240 V / 50 Hz: 10 - 35 °C | Alimentazione elettrica: Umidità relativa ripresa aria, 220 - 240 V ...
Relevance: 4.0167

Chunk ID: 76d151bc-6a7a-442e-b924-981cc588e5b2
Type: text
Content: ## 2.5. Limiti Di Impiego
Si consiglia di far lavorare la macchina agli estremi dei suddetti limiti di impiego solo per brevi periodi, perché il funzio

In [114]:
user_query = "what does it explain about?"

retrieved_chunks = query_supabase(user_query)

for chunk in retrieved_chunks:
    print(f"Chunk ID: {chunk[0]}\nType: {chunk[1]}\nContent: {chunk[2][:300]}...\nRelevance: {chunk[3]:.4f}\n")

Chunk ID: 05e47d48-c7be-4c38-949e-f599df68a060
Type: table
Content: 6.3.1. Troubleshooting IRIS Slim

Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Valvola termostatica che trafila | Anomalia: Sono presenti gocce di rugiada sul  pannello frontale, Possibili guasti: Isolamenti staccati | Anomalia: Perdita d'acqua in riscaldamento ...
Relevance: 2.6994

Chunk ID: 83b3e7ff-bfd9-4290-8267-4a1aee36f4e6
Type: table
Content: 2.6.1. IRIS Slim Verticale Con Mobile
Tabella 2. Dimensioni e peso
Grandezza: A (mm), 601: 600...
Relevance: 2.5954

Chunk ID: ebaa168e-5aa7-45b3-b95f-129870908ee6
Type: text
Content: ## 1.1.1. Descrizione Dei Simboli
Relevance: 0.8111

Chunk ID: 3fd64fa2-565c-4dea-bb68-cd4e61e8cd41
Type: text
Content: ## 2.6.2. IRIS Slim Verticale Da Incasso
![Image](/content/drive/MyDrive/document_rag_italy/md/Manuale-IRIS_SLIM_IN_TEC_IT_artifacts/image_000008_494da229566d9c06790515f5c0b58614e2452ec5bc0bec2b0cd785e86d64fa98.png) *Image Description:* 

In [35]:
import openai

# OpenAI API Key
OPENAI_API_KEY = ""
openai.api_key = OPENAI_API_KEY

# Function to call OpenAI LLM with chat history
def call_openai_llm(user_query, retrieved_chunks, chat_history=[]):
    """Send the query along with retrieved context and chat history to OpenAI API."""

    # Prepare context from retrieved chunks
    context_text = "\n\n".join([f"Chunk {i+1}: {chunk[2]}" for i, chunk in enumerate(retrieved_chunks)])

    # Construct messages for conversational memory
    messages = [
        {"role": "system", "content": "You are an intelligent assistant. Use the following retrieved information to answer the user's query."},
    ]

    # Append chat history
    messages.extend(chat_history)

    # Append current query with retrieved context
    messages.append({"role": "user", "content": f"Context:\n{context_text}\n\nUser's Question: {user_query}"})

    # Call OpenAI's Chat API with the new format
    client = openai.OpenAI(api_key=openai.api_key)  # Ensure you are using the new client-based API
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # You can change this to another OpenAI model
        messages=messages,
        temperature=0.7
    )

    answer = response.choices[0].message.content  # Adjusted based on the new API response format

    # Append response to chat history
    chat_history.append({"role": "user", "content": user_query})
    chat_history.append({"role": "assistant", "content": answer})

    return answer, chat_history

In [119]:
# Example usage
user_query = "is there any statistical or tables data?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 In the provided chunks, there is reference to a table in Chunk 5: "Tabella 3. Dimensioni e peso" which indicates dimensions and weight. However, the specific content or data of this table is not provided in the text snippets you shared. Therefore, while it is mentioned that such statistical or table data exists, the actual data from the table is not available in the excerpts.


In [120]:
# Example usage
user_query = "what does it explain about?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The provided information discusses various aspects of the IRIS Slim appliance, including troubleshooting common issues, dimensions and weight for specific models, and handling instructions during transportation. Here's a breakdown of what each chunk explains:

1. **Chunk 1 (Troubleshooting IRIS Slim)**: This section lists potential anomalies and their possible causes associated with the IRIS Slim. It covers issues like condensation on the front panel, water leakage in both heating and cooling modes, noisy operations due to fan damages or misalignments, and other specific conditions leading to equipment malfunctions.

2. **Chunk 2 (IRIS Slim Verticale Con Mobile)**: Provides the dimensions and weight for a particular model of the IRIS Slim, specifically stating that model 601 has a height of 600 mm.


4. **Chunk 4 (IRIS Slim Verticale Da Incasso)**: Describes an image related to a sensor setup used for calibration based on measurements of weight, which might be par

In [121]:
# Example usage
user_query = "What are the operating limits of the IRIS Slim unit?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The operating limits of the IRIS Slim unit, specifically related to its dimensions and weight, as per the provided chunks are as follows:

- **Dimensions:** The device has a height of 601 mm, and the total length ranges from 163 to 198 mm, with a height from 96 to 129 mm. 
- **Weight:** The unit's weight varies between 17 kg to 26 kg.

These specifications indicate the physical constraints within which the IRIS Slim unit operates. However, any operational temperature ranges, pressure limits, or electrical specifications were not provided in the retrieved information.


In [124]:
# Example usage
user_query = "what are the Limits of use of the unit?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The limits of use for the unit are specified in Chunk 3 and Chunk 5:

1. **Non-Permitted Applications and Contraindications (Chunk 3)**
   - The unit should not be operated outdoors.
   - It should not be used in humid, explosive, or dusty environments.
   - It is not suitable for operation in corrosive environments, especially concerning the aluminum fins of the battery.
   - The unit should not be operated in areas subjected to electromagnetic disturbances.
   - It is not intended for use by individuals (including children) with reduced physical, mental, or sensory capacities, or by those who have not received adequate instruction, unless they are under the supervision of a person responsible for their safety.

2. **Employment Limits (Chunk 5)**
   - It is advised to operate the machine at the extremes of the specified employment limits only for short periods, as long-term operation may reduce the normal lifespan of the components.

These limitations are designe

In [125]:
# Example usage
user_query = "what are the usage limit of the unit?"
retrieved_chunks = query_supabase(user_query)
chat_history = []  # Store conversation history

if retrieved_chunks:
    response, chat_history = call_openai_llm(user_query, retrieved_chunks, chat_history)
    print("\n🔹 Chatbot Response:\n", response)
else:
    print("No relevant information found.")


🔹 Chatbot Response:
 The usage limits of the unit, as described in Chunk 4 ("2.5. Limiti Di Impiego"), recommend operating the machine at the extremes of its operational limits only for short periods. This is because prolonged operation can reduce the normal lifespan of its components. This guidance suggests that while the unit can handle extreme conditions, it is best used within its normal operating parameters to ensure longevity and optimal performance.
