<a href="https://colab.research.google.com/github/nerealegui/capstone/blob/update-agents-backend/AgentsCreation_withRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook creates two agents:
- Agent 1 creates a JSON file. It takes a rule-creation request in natural language as input and outputs JSON that separates the conditions from the actions of the request. Making the conditions explicit shows what must hold before an action is executed, which will help a future agent validate the rules by checking that the conditions are met and that the actions do not contradict other rules.
Agent 1 has RAG implemented. At the moment it works with a single document, which is saved in a Drive folder (make sure to change the file path to your own).
This document describes some basic business considerations; both its content and its preprocessing still need to be improved.
The output of Agent 1 is passed as input to Agent 2.

- Agent 2 takes the JSON generated by Agent 1 and uses Gemini's knowledge to generate a DRL rule with the correct Drools syntax. This will need to be fine-tuned to use the correct fields and entities for our rules, which have yet to be defined.
Agent 2 outputs text that is later converted into a .drl file and stored locally (the path can be specified).
This agent is not connected to any RAG system at the moment.

In [2]:
# Installing packages
%pip install --upgrade --user google-cloud-aiplatform pymupdf rich colorama
!pip install -U -q "google"
!pip install -U -q "google.genai"
!pip install python-docx PyPDF2
!pip install python-dotenv

import os
from dotenv import load_dotenv
from pathlib import Path
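# google.colab only exists inside Colab; fall back to None when running locally
try:
    from google.colab import drive
except ImportError:
    drive = None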

# Load API key from .env file
env_path = Path('../.env')
load_dotenv(dotenv_path=env_path)

# Get API key from environment variable
API_KEY = os.getenv("GOOGLE_API_KEY")

# If not found in .env file, you can still set it manually
if not API_KEY:
    print("Warning: API_KEY not found in .env file. Please create a .env file with your API key.")
    # You can uncomment the line below to set API_KEY manually if needed
    # API_KEY = "your_api_key_here"

# Only slice the key when it exists; slicing None would raise a TypeError
if API_KEY:
    print(f"API key loaded: {API_KEY[:5]}...{API_KEY[-5:]}")

try:
    drive.mount("/content/drive")
    # Please ensure that uploaded files are available in the AI Studio folder or change the working folder.
    #os.chdir("/content/drive/MyDrive/Google AI Studio")
except Exception:
    print("Running locally, not in Colab.")

Collecting python-dotenv
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.1.0


### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [3]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

### Define Google Cloud project information

In [None]:
# Define project information

# If you have a personal API key, set it here (never commit a real key to the repository)
# API_KEY = "your_api_key_here"

### Import libraries

In [2]:
from IPython.display import Markdown, display
from rich.markdown import Markdown as rich_Markdown
import sys
from google.colab import userdata
from google.colab import drive
import os
import base64
from google import genai
from google.genai import types
import json
import re
from docx import Document
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import PyPDF2

In [27]:
# Mounting the drive
drive.mount('/content/drive')

# Add the location of the knowledge base for each agent

folder_path_agent_1 = "/content/drive/My Drive/Colab Notebooks/Capstone/Agent1RAG"
folder_path_agent_2 = "/content/drive/My Drive/Colab Notebooks/Capstone/Agent2RAG"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Defining generic functions and variables

In [None]:
# Defining a function to read .docx files
def read_docx(file_path):
    doc = Document(file_path)
    doc_text = []
    for para in doc.paragraphs:
        doc_text.append(para.text)
    return "\n".join(doc_text)

# Defining a function to read .pdf files
def read_pdf(file_path):
    with open(file_path, 'rb') as f:
        reader = PyPDF2.PdfReader(f)
        # Extract each page once and skip pages without extractable text
        page_texts = [page.extract_text() for page in reader.pages]
        return "\n".join(text for text in page_texts if text)

# Function to read documents in the knowledge base (only docx and pdf supported)
def read_documents(folder_path):
    documents = []
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if filename.endswith(".docx"):
            try:
                text = read_docx(file_path)
                documents.append({'filename': filename, 'text': text})
            except Exception as e:
                print(f"Error reading DOCX {filename}: {e}")
        elif filename.endswith(".pdf"):
            try:
                text = read_pdf(file_path)
                documents.append({'filename': filename, 'text': text})
            except Exception as e:
                print(f"Error reading PDF {filename}: {e}")
    return documents


In [None]:
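# Function to split the text into fixed-size chunks with overlap
def chunk_text(text, chunk_size, chunk_overlap):
    """
    Splits text into chunks of a fixed size (in characters) with optional overlap.
    """
    # The step must be positive, otherwise the loop below never advances
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start += chunk_size - chunk_overlap
    return chunks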
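
To make the overlap behaviour concrete, here is a small self-contained sketch on a toy string. The function is repeated so the cell runs on its own; the sample text and the sizes are illustrative values, not the ones used later in this notebook.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Same fixed-size chunking as above, repeated so this cell runs on its own."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Each new chunk starts (chunk_size - chunk_overlap) characters later
        start += chunk_size - chunk_overlap
    return chunks

sample = "ABCDEFGHIJ"  # toy text, 10 characters
chunks = chunk_text(sample, chunk_size=4, chunk_overlap=1)
print(chunks)  # ['ABCD', 'DEFG', 'GHIJ', 'J']
```

Each chunk repeats the last character of the previous one. Note that the final chunk can be very short (here it contains only the overlap), which may be worth filtering out before embedding.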


In [None]:
# Function to create the embeddings
def embed_texts(texts):
    out = client.models.embed_content(
        model="models/text-embedding-004",
        contents=texts,
        config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY")
    )
    return [emb.values for emb in out.embeddings]
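
`embed_texts` sends all texts in a single request. If the embedding service limits how many texts one request may contain (an assumption here; check the API documentation for the actual limit), the texts can be sent in batches. The sketch below is generic: `embed_in_batches`, `fake_embed` and the default `batch_size` are hypothetical names and values used for illustration only.

```python
from typing import Callable, List, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,  # assumed value, check the limit of the API you use
) -> List[List[float]]:
    """Call embed_fn on consecutive slices of texts and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(batch))
    return vectors

# Stand-in embedder for demonstration only: one number per text (its length)
def fake_embed(batch: List[str]) -> List[List[float]]:
    return [[float(len(t))] for t in batch]

vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```

In this notebook the call would look like `embed_in_batches(all_chunks, embed_texts)`, which keeps the order of the chunks unchanged.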


In [None]:
# Defining a function to retrieve the top_k chunks most similar to a query (cosine similarity)

def retrieve(query: str, df: pd.DataFrame, top_k: int = 3) -> pd.DataFrame:
    # embed the query
    q_emb = embed_texts([query])[0]
    # stack chunk embeddings into an array
    emb_matrix = np.vstack(df["embedding"].values)
    # cosine similarity
    sims = cosine_similarity([q_emb], emb_matrix)[0]
    df_scores = df.copy()
    df_scores["score"] = sims
    return df_scores.sort_values("score", ascending=False).head(top_k)
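
The ranking step can be illustrated without any API call. The following sketch uses only the standard library and made-up 2-D vectors; `cosine`, `top_k` and the chunk labels are hypothetical names, and real embeddings have many more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=3):
    """rows: list of (label, vector) pairs; returns the k best (label, score) pairs."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical 2-D "embeddings"
rows = [
    ("chunk_a", [1.0, 0.0]),
    ("chunk_b", [0.0, 1.0]),
    ("chunk_c", [1.0, 1.0]),
]
ranked = top_k([2.0, 0.0], rows, k=2)
print(ranked)  # chunk_a first (score 1.0), then chunk_c (about 0.707)
```

Cosine similarity depends only on direction, so the query `[2.0, 0.0]` scores exactly like `[1.0, 0.0]` would.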

In [None]:
# Defining generic chunk values

# Parameters to change chunk_size and chunk_overlap
chunk_size=500
chunk_overlap=50

# Agent 1
**First agent - Input Normalizer / Business Rules Extractor**

In [3]:
client = genai.Client(vertexai=False, api_key=API_KEY)


def call_llm_with_context(user_input: str) -> str:
    prompt = f"""
You are an expert in translating restaurant business rules into structured logic.
Your task is to extract the key logic (conditions and actions) from the user's sentence.

User Input:
"{user_input}"

Respond with structured JSON like this:
{{
  "conditions": [...],
  "actions": [...]
}}
"""

    contents = [
        types.Content(
            role="user",
            parts=[types.Part.from_text(text=prompt)],
        )
    ]

    generate_content_config = types.GenerateContentConfig(response_mime_type="application/json")

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=contents,
        config=generate_content_config,
    )

    return response.text

## RAG implementation

### Read files and create embeddings

In [11]:
raw_docs = read_documents(folder_path_agent_1)

all_chunks = []
all_filenames = []

for doc in raw_docs:
    chunks = chunk_text(doc['text'], chunk_size, chunk_overlap)
    all_chunks.extend(chunks)
    all_filenames.extend([doc['filename']] * len(chunks))

# Generate embeddings
embeddings = embed_texts(all_chunks)

In [12]:
# Combine files, chunks and embeddings into a DataFrame
df = pd.DataFrame({
    'filename': all_filenames,
    'chunk': all_chunks,
    'embedding': embeddings
})


# Display the DataFrame
print(df)

                   filename  \
0        initial_rules.docx   
1   restaurant_content.docx   
2   restaurant_content.docx   
3   restaurant_content.docx   
4   restaurant_content.docx   
5   restaurant_content.docx   
6         initial_rules.pdf   
7         initial_rules.pdf   
8         initial_rules.pdf   
9         initial_rules.pdf   
10        initial_rules.pdf   
11        initial_rules.pdf   
12        initial_rules.pdf   
13        initial_rules.pdf   
14        initial_rules.pdf   

                                                chunk  \
0   {"conditions": [{"field":"Restaurant.size","op...   
1   # BUSINESS CONTEXT: RESTAURANT STAFFING RULES\...   
2   \n- extra_employees: Integer (Additional staff...   
3   \n\nExample:\nIF Forecast.total_sales > 50 THE...   
4   \n## OUTPUT FORMAT EXPECTED\n\nThe model shoul...   
5   e": "Small"},{"field":"Staffing.base_employees...   
6   [\n \n  \n{\n \n    \n"conditions":\n \n[\n \n...   
7   \n \n    \n"conditions":\n \n[\n \n      \n

### Retrieve the most relevant results based on a query, with a score

In [13]:
pd.set_option('display.max_columns', None)

# (Optional) Also widen the display so it doesn’t wrap or truncate by width:
pd.set_option('display.width', 0)           # auto-detect width
pd.set_option('display.max_colwidth', None)

In [15]:
# Check the results
results_df = retrieve("Give me the existing rules ", df, top_k=5)
results_df

Unnamed: 0,filename,chunk,embedding,score
2,restaurant_content.docx,"\n- extra_employees: Integer (Additional staff needed per time slot)\n\n\n## RULE LOGIC DEFINITIONS\n\n### Rule 1: Base staff by restaurant size\nUse the restaurant's size (Small, Medium, Large) to determine the base number of employees needed.\n\nExample:\nIF Restaurant.size == ""Small"" THEN Staffing.base_employees = 5\n\n### Rule 2: Base staff by total daily sales forecast\nUse the total daily sales forecast to define how many employees are required overall.\n\nExample:\nIF Forecast.total_sales > 50 THEN Staff","[0.04734337, 0.021319663, -0.007882023, 0.035351366, 0.018459542, -0.008353763, 0.052792884, -0.0007198902, -0.007953369, -0.018720526, -0.0392115, 0.08857228, 0.01193808, -0.023311196, -0.01249336, -0.068232216, 0.028487548, 0.008804122, -0.039057534, -0.0016995355, -0.040856294, 0.0061985543, -0.0385517, -0.08457024, -0.028692983, 0.010077333, 0.0005970798, -0.0039844303, 0.009323259, 0.0045622597, 0.05734186, -0.014152076, 0.009895689, -0.11716814, 0.014800227, 0.024968952, -0.003105904, 0.014154518, 0.083711274, -0.029660717, -0.053441484, -0.00092393893, -0.018610148, -0.003939307, 0.0045565623, -0.011314834, -0.049880333, -0.028370067, 0.024066562, 0.019344714, 0.016265875, -0.038536824, 0.009119721, 0.011165543, -0.09751009, -0.029513828, -0.04795187, -0.017814737, 0.08141338, -0.037284315, -0.004035499, -0.030885054, -0.029715892, -0.016712435, 0.09067202, 0.035299055, 0.047424834, 0.023513786, -0.042069476, 0.022288017, -0.028851623, 0.04704884, -0.08766758, -0.0147685185, -0.041466817, -0.033008423, 0.04292418, -0.011430756, 0.00028996175, 0.06426661, -0.005806688, -0.036065403, 0.049590092, 0.014158774, 0.026046798, -0.010950218, 0.027190883, -0.05487133, -0.021192642, 0.03245152, 0.04977327, 0.043495636, -0.0321435, -0.0137100015, 0.0649539, -0.042972244, -0.079606935, -0.047164805, 0.052495517, 0.054372605, ...]",0.429675
1,restaurant_content.docx,"# BUSINESS CONTEXT: RESTAURANT STAFFING RULES\n\nThis context outlines the rules used by a restaurant to determine employee staffing levels based on different inputs.\n\n\n## ENTITIES & FIELDS\n\nRestaurant:\n- size: String (Small, Medium, Large)\n\nForecast:\n- total_sales: Float (Total forecasted sales for the full day)\n- partial_sales: Float (Forecasted sales for a specific time slot)\n\nStaffing:\n- base_employees: Integer (Recommended minimum staff count)\n- extra_employees: Integer (Additional staff need","[0.044675924, 0.0047622896, -0.017632442, 0.03940838, 0.018238258, 0.018227454, 0.06690627, -0.016683524, -0.0228027, -0.032323204, -0.015426469, 0.08994328, 0.01873884, -0.033435527, 0.008830144, -0.056070227, 0.04275974, 0.028653476, -0.0673903, -0.0067725964, -0.04020685, -0.02690084, -0.024455324, -0.055232868, -0.027390992, -0.004793453, 0.021235108, -0.0051843757, 0.00715406, -3.7753023e-06, 0.038873207, -0.0047147004, 0.012641883, -0.08770717, 0.008425233, 0.0007528131, -0.018995184, 0.0110235205, 0.08597618, -0.028640652, -0.05058396, -0.010282526, -0.042297494, 0.010587021, -0.024684677, 0.018631436, -0.06558408, -0.032824263, -0.005744986, 0.04355614, 0.008295231, -0.052317414, -0.003145101, 0.015487199, -0.09427343, -0.019910157, -0.017276157, -0.021578189, 0.06902988, -0.017269257, -0.01595741, -0.04082219, -0.021728544, -0.033724453, 0.076025516, 0.028313972, 0.0776918, 0.0248787, -0.068694726, 0.037691638, -0.029729852, 0.060119577, -0.092306376, 0.0031897905, -0.009342853, -0.010759055, 0.029113734, 0.0065628365, 0.0013009297, 0.042154063, -0.025127243, -0.025301503, 0.03982866, 0.017805912, 0.03001135, -0.013990573, 0.019166952, -0.06259477, -0.03393418, -0.003703386, 0.032783955, 0.04778049, -0.034852866, 0.0027774407, 0.06917336, -0.038214456, -0.106953256, -0.06278646, 0.04447599, 0.065669805, ...]",0.413107
4,restaurant_content.docx,"\n## OUTPUT FORMAT EXPECTED\n\nThe model should convert natural language input into a JSON structure with the following format:\n\n## NOTES\n\n- All numeric comparisons can use: >, >=, <, <=, ==\n- Conditions can combine multiple fields\n- Multiple actions may be applied per rule\n\n## EXAMPLE RULES\n\nNatural Language: IF the restaurant is Small, THEN Staffing.base_employees==5 - Output format: {""conditions"": [{""field"":""Restaurant.size"",""operator"":""=="",""value"": ""Small""},{""field"":""Staffing.base_employees"",""o","[0.009334445, -0.0036603238, -0.037388694, 0.007727036, 0.0014576827, 0.020584455, 0.05311057, -0.033425808, -0.024594964, 0.017180644, -0.014425184, 0.06216374, 0.024641214, 0.034039192, 0.027032392, -0.05505793, 0.05618605, 0.075661615, -0.074663505, 0.025947232, -0.039244946, -0.07094404, -0.03980696, -0.0278908, -0.032847013, -0.015469556, 0.022477027, -0.011701883, 0.0019843793, -0.03688501, 0.04619988, -0.032076254, 0.010295515, -0.08804843, 0.036046058, -0.00032045276, -0.020487448, 0.0044383663, 0.049406666, -0.030046588, -0.07503966, 0.0002812776, 0.0027480698, 0.016886523, -0.009594904, -0.017346138, -0.05030288, -0.020030987, -0.007025172, 0.09268486, 0.013963076, -0.028820159, -0.008875976, 0.0037396527, -0.041734915, 0.008845026, -0.05570012, -0.07708367, 0.059574556, -0.026362885, -0.024832647, -0.026244, 0.004980138, -0.019387737, 0.056083705, -0.002157857, 0.014458028, 0.059371416, -0.06612831, -0.026974443, -0.039268512, 0.03338256, -0.06554062, 0.024572877, -0.02974784, 0.008607635, 0.045234837, -0.021674877, 0.053144563, 0.049866196, -0.018187257, -0.04555992, 0.03070989, 0.029628795, 0.0059647956, -0.037706353, -0.016515793, -0.05059936, -0.0544746, 0.010957873, 0.0089686075, 0.009862392, -0.041862477, -0.0634147, 0.056370914, -0.0056470917, -0.063606866, -0.093938194, 0.020977788, 0.052810777, ...]",0.401688
3,restaurant_content.docx,\n\nExample:\nIF Forecast.total_sales > 50 THEN Staffing.base_employees = 3\n\n### Rule 3: Extra staff by time-slot-specific forecast\nUse the forecasted total sales for a time slot to determine if extra staff are needed.\n\nExample:\nIF Forecast.partial_sales > 5 THEN Staffing.extra_employees = 1\n\n\n## VALID VALUES\n\nRestaurant.size:\n- Small\n- Medium\n- Large\n\n## ACTION TYPES\n\n- Set Staffing.base_employees = [value]\n- Set Staffing.extra_employees = [value]\n\n## OUTPUT FORMAT EXPECTED\n\nThe model should conve,"[0.03376585, 0.02870116, -0.03042911, 0.045254327, 0.0036695902, 0.013115063, 0.061164632, 0.0200572, 0.001165253, -0.025278673, -0.03470604, 0.09599119, 0.010967018, -0.018361237, -0.0039755832, -0.055299837, 0.024062492, 0.056274615, -0.014480988, 0.0002255054, -0.039971318, 0.0011417208, -0.034545384, -0.06400885, -0.01677574, -0.018998232, 0.023232777, 0.01767621, -0.011030865, -0.01716379, 0.052556902, 0.0052025155, 0.0145014785, -0.10324066, 0.018187199, 0.0221062, -0.017857943, -0.0127517525, 0.06529052, -0.051017366, -0.065543085, -0.009064577, -0.012194326, 0.027005404, -0.008900758, -0.011319928, -0.036938947, -0.009797685, 0.021298487, 0.035882436, -0.008152707, -0.021734687, 0.027771426, -0.002539677, -0.10378549, -0.01206313, -0.044808503, -0.010363205, 0.07389317, -0.050304472, 0.004075962, -0.04064845, -0.014714882, -0.021454088, 0.08665647, 0.011100918, 0.043152463, 0.060776632, -0.06602413, 0.0022999782, 0.008612954, 0.06553249, -0.088530086, -0.010816473, -0.024802724, -0.0258194, 0.017351428, -0.014197668, 0.029139446, 0.053006105, 0.0032597762, -0.035740476, 0.047336347, 0.020389067, 0.039939225, -0.032897677, 0.02612498, -0.061472613, -0.046066374, 0.015233098, 0.036431275, 0.02839095, -0.03355845, -0.013550337, 0.08313076, -0.02054669, -0.08262598, -0.051591583, 0.03960648, 0.028505908, ...]",0.381942
6,initial_rules.pdf,"[\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=="",\n \n""value"":\n \n""Small""\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n"">="",\n \n""value"":\n \n0\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n""<"",\n \n""value"":\n \n50\n \n}\n \n \n],\n \n \n""actions"":\n \n[\n \n \n{\n \n""field"":\n \n""Staffing.base_employees"",\n \n""operator"":\n \n""="",\n \n""value"":\n \n2\n \n}\n \n \n]\n \n \n},\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n","[0.04919858, -0.010067676, -0.060101226, 0.016268793, 0.03116375, -0.0043034935, 0.05381633, -0.028441971, 0.005918334, -0.013388913, -0.017381676, 0.10779931, 0.003037862, -0.0099967895, 0.02202179, -0.052892298, 0.08155548, 0.046749413, -0.03144321, 0.007933805, -0.014465134, 0.007716479, -0.04269964, -0.052540343, -0.03171504, -0.04394418, -0.01125445, 0.012866279, -0.006961281, 0.010076911, 0.053764537, -0.007951014, 0.023413938, -0.09197664, 0.02816584, 0.032855745, -0.044921413, 0.012058379, 0.0643297, -0.046337213, -0.05716247, 0.0043887706, -0.014260607, -0.014139757, -0.013150639, -0.021319121, -0.055767428, -0.02645386, 0.01573202, 0.021369424, -0.010861691, -0.004468037, -0.010341104, -0.020255065, -0.08822379, 0.001994822, -0.03604258, -0.06923737, 0.062156826, -0.039390337, -0.00024247698, -0.0545765, -0.03646127, -0.021813264, 0.06426366, 0.007534424, 0.028266918, 0.029077137, -0.056083504, -0.013474556, -0.04001026, 0.051787376, -0.031573523, -0.013667688, -0.031852126, -0.0031117643, 0.0057831877, -0.021372423, 0.043803953, 0.025256692, -0.034020916, -0.030068336, 0.039033167, 0.042692337, 0.009480337, -0.02103771, 0.010957637, -0.035968874, -0.04718371, -0.0024774158, 0.039056316, 0.0059929686, -0.03818446, -0.028948782, 0.06847276, -0.022803029, -0.07342502, -0.08546454, 0.042401955, 0.02051047, ...]",0.3815


In [16]:
# 4. Retrieval WITH LLM (RAG)
def rag_query(query: str, df: pd.DataFrame, top_k: int = 3) -> str:
    # Retrieve the top_k most similar chunks and pass them to the LLM as context
    docs = retrieve(query, df, top_k)
    context = "\n\n".join(docs["chunk"].tolist())
    prompt = f"""
Use the following business context when extracting the rule logic:
----------------
{context}
----------------

Question: {query}"""
    # call_llm_with_context already adds the role description and the JSON output format
    return call_llm_with_context(prompt)


### RAG call

In [17]:
# Example RAG call
response = rag_query("Modification to the restaurant size rule. The required base number of employees for large restaurants increases from 10 to 12. ", df, top_k=5)
print(response)

{
  "conditions": [
    "Restaurant size is large"
  ],
  "actions": [
    "Change required base number of employees from 10 to 12"
  ]
}
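
The response above contains free-text strings, whereas the example in the knowledge base uses objects with `field`, `operator` and `value`. A minimal sketch for detecting which shape was returned, assuming the structured shape is the target (the notebook itself does not enforce this); `find_unstructured_items` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {"field", "operator", "value"}  # taken from the knowledge-base example

def find_unstructured_items(rule: dict) -> list:
    """Return (section, index) pairs for entries that are not field/operator/value objects."""
    problems = []
    for section in ("conditions", "actions"):
        for i, item in enumerate(rule.get(section, [])):
            if not (isinstance(item, dict) and REQUIRED_KEYS.issubset(item)):
                problems.append((section, i))
    return problems

free_text = json.loads(
    '{"conditions": ["Restaurant size is large"],'
    ' "actions": ["Change required base number of employees from 10 to 12"]}'
)
structured = json.loads(
    '{"conditions": [{"field": "Restaurant.size", "operator": "==", "value": "Large"}],'
    ' "actions": [{"field": "Staffing.base_employees", "operator": "=", "value": 12}]}'
)

print(find_unstructured_items(free_text))   # [('conditions', 0), ('actions', 0)]
print(find_unstructured_items(structured))  # []
```

A non-empty result could be used to re-prompt Agent 1 or to reject the rule before it reaches Agent 2.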


# Agent 2
**Code generator, generating .drl files with drools syntax**

In [23]:
# Parse the Agent 1 response (a JSON string) into a Python dict
rule_json = json.loads(response)

In [24]:
# Creation of agent2

def agent2_generate_drl_with_rag(agent1_output: dict, retrieved_docs: pd.DataFrame) -> str:
    # Convert Agent1 output
    conditions = "\n".join(f"- {cond}" for cond in agent1_output.get("conditions", []))
    actions = "\n".join(f"- {act}" for act in agent1_output.get("actions", []))

    # Format RAG context (top-k chunks)
    context = "\n\n".join(retrieved_docs['chunk'].tolist())

    # Rule name from the first condition (simple heuristic); fall back to a generic name if there is none
    cond_keywords = agent1_output.get("conditions") or ["GeneratedRule"]
    rule_name = "RuleFor_" + "_".join(str(cond_keywords[0]).split()[:3]).capitalize()

    # Prompt for the LLM
    prompt = f"""
You are a Drools rule generation expert.

Here is some **context** from existing Drools rules:
----------------
{context}
----------------

And here is a new rule description you must convert into DRL format:
Rule name: {rule_name}

Conditions:
{conditions}

Actions:
{actions}

Generate the DRL using this format:
rule "RuleName"
when
    <conditions>
then
    <actions>;
end
"""

    # Call Gemini
    contents = [types.Content(role="user", parts=[types.Part.from_text(text=prompt)])]
    config = types.GenerateContentConfig(response_mime_type="text/plain")

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=contents,
        config=config
    )

    return response.text

## RAG implementation

### Read files and create embeddings

In [28]:
raw_docs = read_documents(folder_path_agent_2)

all_chunks_2 = []
all_filenames_2 = []

for doc in raw_docs:
    chunks = chunk_text(doc['text'], chunk_size, chunk_overlap)
    all_chunks_2.extend(chunks)
    all_filenames_2.extend([doc['filename']] * len(chunks))

# Generate embeddings
embeddings_2 = embed_texts(all_chunks_2)

In [29]:
# Combine files, chunks and embeddings into a DataFrame
df_2 = pd.DataFrame({
    'filename': all_filenames_2,
    'chunk': all_chunks_2,
    'embedding': embeddings_2
})


# Display the DataFrame
print(df_2)

            filename  \
0  initial_rules.pdf   
1  initial_rules.pdf   
2  initial_rules.pdf   
3  initial_rules.pdf   
4  initial_rules.pdf   
5  initial_rules.pdf   
6  initial_rules.pdf   
7  initial_rules.pdf   
8  initial_rules.pdf   

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           chunk  \
0  [\n \n  \n{\n \n    \n"conditions":\n \n[\n \n      \n{\n \n"field":\n \n"Restaurant.size",\n \n"operator":\n \n"==",\n \n"value":\n \n"Small"\n

### Retrieve the most relevant results based on a query, with a score

In [30]:
pd.set_option('display.max_columns', None)

# (Optional) Also widen the display so it doesn’t wrap or truncate by width:
pd.set_option('display.width', 0)           # auto-detect width
pd.set_option('display.max_colwidth', None)

In [31]:
# Check the results
results_df_2 = retrieve("Give me the existing rules ", df_2, top_k=5)
results_df_2

Unnamed: 0,filename,chunk,embedding,score
0,initial_rules.pdf,"[\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=="",\n \n""value"":\n \n""Small""\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n"">="",\n \n""value"":\n \n0\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n""<"",\n \n""value"":\n \n50\n \n}\n \n \n],\n \n \n""actions"":\n \n[\n \n \n{\n \n""field"":\n \n""Staffing.base_employees"",\n \n""operator"":\n \n""="",\n \n""value"":\n \n2\n \n}\n \n \n]\n \n \n},\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n","[0.04919858, -0.010067676, -0.060101226, 0.016268793, 0.03116375, -0.0043034935, 0.05381633, -0.028441971, 0.005918334, -0.013388913, -0.017381676, 0.10779931, 0.003037862, -0.0099967895, 0.02202179, -0.052892298, 0.08155548, 0.046749413, -0.03144321, 0.007933805, -0.014465134, 0.007716479, -0.04269964, -0.052540343, -0.03171504, -0.04394418, -0.01125445, 0.012866279, -0.006961281, 0.010076911, 0.053764537, -0.007951014, 0.023413938, -0.09197664, 0.02816584, 0.032855745, -0.044921413, 0.012058379, 0.0643297, -0.046337213, -0.05716247, 0.0043887706, -0.014260607, -0.014139757, -0.013150639, -0.021319121, -0.055767428, -0.02645386, 0.01573202, 0.021369424, -0.010861691, -0.004468037, -0.010341104, -0.020255065, -0.08822379, 0.001994822, -0.03604258, -0.06923737, 0.062156826, -0.039390337, -0.00024247698, -0.0545765, -0.03646127, -0.021813264, 0.06426366, 0.007534424, 0.028266918, 0.029077137, -0.056083504, -0.013474556, -0.04001026, 0.051787376, -0.031573523, -0.013667688, -0.031852126, -0.0031117643, 0.0057831877, -0.021372423, 0.043803953, 0.025256692, -0.034020916, -0.030068336, 0.039033167, 0.042692337, 0.009480337, -0.02103771, 0.010957637, -0.035968874, -0.04718371, -0.0024774158, 0.039056316, 0.0059929686, -0.03818446, -0.028948782, 0.06847276, -0.022803029, -0.07342502, -0.08546454, 0.042401955, 0.02051047, ...]",0.3815
4,initial_rules.pdf,"s"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=="",\n \n""value"":\n \n""Large""\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n"">="",\n \n""value"":\n \n0\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n""<"",\n \n""value"":\n \n50\n \n}\n \n \n],\n \n \n""actions"":\n \n[\n \n \n{\n \n""field"":\n \n""Staffing.base_employees"",\n \n""operator"":\n \n""="",\n \n""value"":\n \n4\n \n}\n \n \n]\n \n \n},\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""oper","[0.06071765, 0.009070737, -0.053456202, 0.03083036, 0.024674656, -0.008818782, 0.05143102, -0.0264886, 0.010213832, -0.004751141, -0.030963944, 0.1032203, 0.013343755, -0.024723422, 0.014676226, -0.054412436, 0.06916734, 0.03606529, -0.02980119, 0.010511553, -0.03717279, 0.00066998816, -0.03543053, -0.060344804, -0.033005465, -0.03521867, -0.011779955, 0.009590996, -0.009388559, -0.0007020367, 0.05931526, 0.000880899, 0.017146045, -0.09980507, 0.034933403, 0.034839693, -0.052384388, 0.013171714, 0.057273522, -0.050504807, -0.059842188, 9.799446e-05, -0.016549936, -0.019828858, -0.015456106, -0.021533126, -0.06517853, -0.027196715, 0.012065453, 0.02142812, -0.014210188, -0.0063792486, -0.008416008, -0.015089314, -0.09574672, -0.007740778, -0.03017222, -0.05469909, 0.06029189, -0.03433038, -0.0034178903, -0.057428036, -0.041685097, -0.017567785, 0.066233434, -0.0074072196, 0.031497505, 0.03406924, -0.049400587, -0.010639216, -0.03681672, 0.052818034, -0.027948348, -0.013798447, -0.028780453, -0.0014758788, -2.3156139e-05, -0.028639533, 0.038940154, 0.026977073, -0.03945377, -0.03492173, 0.035343014, 0.033910025, 0.010371719, -0.011777108, 0.030704577, -0.04191618, -0.04795579, 0.012587864, 0.026574798, 0.01110025, -0.037624247, -0.030156702, 0.07561322, -0.017067237, -0.07156622, -0.08263796, 0.040013887, 0.033862866, ...]",0.380497
5,initial_rules.pdf,"\n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=="",\n \n""value"":\n \n""Large""\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n"">="",\n \n""value"":\n \n50\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n""<"",\n \n""value"":\n \n100\n \n}\n \n \n],\n \n \n""actions"":\n \n[\n \n \n{\n \n""field"":\n \n""Staffing.base_employees"",\n \n""operator"":\n \n""="",\n \n""value"":\n \n5\n \n}\n \n \n]\n \n \n},\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":","[0.049771342, 0.003469822, -0.055141006, 0.017056536, 0.022335293, -0.002680773, 0.049963642, -0.027200213, 0.015760103, -0.0060623027, -0.028575523, 0.10357171, 0.0006913679, -0.010869974, 0.019129487, -0.05781063, 0.0724276, 0.032165732, -0.027313441, 0.013378446, -0.02846697, 0.004737604, -0.039338782, -0.055223145, -0.019840874, -0.03330796, -0.019348986, 0.012439854, -0.01340362, -0.0002563108, 0.050787218, -0.00788948, 0.02367622, -0.099269845, 0.036090046, 0.03621195, -0.054317527, 0.005090671, 0.060762644, -0.056719515, -0.062117416, -0.00346072, -0.016727204, -0.016641311, -0.012119036, -0.02361772, -0.0559163, -0.020049144, 0.002893754, 0.013259914, -0.021637043, -0.0061858236, 0.0023416793, -0.015357091, -0.07903292, 0.0022778602, -0.035039406, -0.05597765, 0.06515684, -0.038305324, -0.00314176, -0.05617843, -0.049408793, -0.010050985, 0.08172735, -0.002440536, 0.020532994, 0.028445095, -0.048720293, -0.013988645, -0.040259965, 0.044613432, -0.026575757, -0.009674836, -0.03406963, -0.009669099, 0.0073625715, -0.0381566, 0.036437266, 0.016819138, -0.032431327, -0.037653327, 0.036967617, 0.033713732, 0.014764241, -0.0086916555, 0.028636092, -0.04366636, -0.05871176, 0.0012137889, 0.031414647, 0.004109089, -0.043691084, -0.03215476, 0.06475321, -0.022494566, -0.068936944, -0.07794888, 0.04281139, 0.030068938, ...]",0.380262
6,initial_rules.pdf,"\n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=="",\n \n""value"":\n \n""Large""\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n"">="",\n \n""value"":\n \n100\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n""<"",\n \n""value"":\n \n200\n \n}\n \n \n],\n \n \n""actions"":\n \n[\n \n \n{\n \n""field"":\n \n""Staffing.base_employees"",\n \n""operator"":\n \n""="",\n \n""value"":\n \n6\n \n}\n \n \n]\n \n \n},\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=","[0.050289694, 0.0031000231, -0.055422064, 0.016255604, 0.022128517, -0.00257259, 0.05601591, -0.030948708, 0.01476166, -0.0017119299, -0.03135002, 0.10288264, 0.0024365957, -0.013705261, 0.016528694, -0.057606947, 0.06781349, 0.03317944, -0.02886567, 0.0031919272, -0.027832821, 0.0051840786, -0.037973166, -0.05724613, -0.024096632, -0.033228293, -0.022118445, 0.009259252, -0.009111798, -0.0064876284, 0.053012665, -0.0016303356, 0.021622643, -0.10356684, 0.03869891, 0.034161687, -0.053065874, 0.0025803428, 0.06038317, -0.047641736, -0.060313247, -0.0035389296, -0.015459583, -0.014781072, -0.013266972, -0.024279963, -0.057347402, -0.020816006, 0.00736225, 0.021084372, -0.016453484, 0.0010368153, 0.0025685327, -0.015871655, -0.08107633, 0.0020198454, -0.03441385, -0.059727494, 0.06226695, -0.037688185, 0.0005727891, -0.057308223, -0.047843717, -0.006487325, 0.085608646, -0.0028416745, 0.021744391, 0.033478152, -0.05062023, -0.01300394, -0.038871437, 0.044324547, -0.030956088, -0.008492629, -0.03000384, -0.007144448, -0.0005560699, -0.028418912, 0.0333718, 0.01736899, -0.032603096, -0.031234825, 0.044809986, 0.032167934, 0.016942257, -0.008899811, 0.022973983, -0.04469447, -0.057952195, -0.0022322407, 0.028292766, 0.012933225, -0.043160815, -0.025513422, 0.06332486, -0.018532544, -0.065354496, -0.0793613, 0.03846572, 0.03758659, 
...]",0.378613
1,initial_rules.pdf,"\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Restaurant.size"",\n \n""operator"":\n \n""=="",\n \n""value"":\n \n""Small""\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n"">="",\n \n""value"":\n \n50\n \n},\n \n \n{\n \n""field"":\n \n""Forecast.total_sales"",\n \n""operator"":\n \n""<"",\n \n""value"":\n \n100\n \n}\n \n \n],\n \n \n""actions"":\n \n[\n \n \n{\n \n""field"":\n \n""Staffing.base_employees"",\n \n""operator"":\n \n""="",\n \n""value"":\n \n3\n \n}\n \n \n]\n \n \n},\n \n \n{\n \n \n""conditions"":\n \n[\n \n \n{\n \n""field"":\n \n""Resta","[0.05456588, -0.0066666594, -0.05697148, 0.009857149, 0.02660761, -0.008977223, 0.051652964, -0.028512156, 0.005845277, -0.007359031, -0.016766578, 0.10431383, -0.00049552467, -0.011104158, 0.012149745, -0.053159587, 0.07560737, 0.044290744, -0.029975226, 0.010971423, -0.018154692, 0.0053198296, -0.04804022, -0.049352165, -0.032199774, -0.038613863, -0.0089941565, 0.009171813, -0.010646751, 0.0072951308, 0.05118638, -0.0071644234, 0.023315774, -0.101643816, 0.031792484, 0.03654953, -0.044305068, 0.012465371, 0.07037262, -0.047536336, -0.0602184, 0.0007264612, -0.017677443, -0.008988425, -0.009316057, -0.024523059, -0.05373987, -0.021116342, 0.008916735, 0.02031957, -0.023063805, 0.0018714505, -0.0032531652, -0.020840434, -0.08608842, -0.005103642, -0.033952743, -0.06356983, 0.064777374, -0.04113601, -0.0008030758, -0.054324232, -0.044997655, -0.008378552, 0.07161215, 0.0027760384, 0.030108487, 0.033128858, -0.054694735, -0.0067321253, -0.040235307, 0.050656836, -0.024157315, -0.010510802, -0.032494605, -0.004695337, 0.010410307, -0.026089743, 0.043544155, 0.021706466, -0.028561074, -0.03384399, 0.04705138, 0.041584652, 0.008056857, -0.023206804, 0.016499909, -0.04212173, -0.050808907, -0.005791444, 0.027129242, 0.001909234, -0.038135804, -0.032856792, 0.05754362, -0.0224903, -0.06895786, -0.07809935, 0.03977637, 0.025536004, ...]",0.37654


## Saving .drl files

In [32]:
# Path to save DRL files locally
path_to_drl = "/content/drive/My Drive/Colab Notebooks/Capstone/Agent2RAG/drl_files/"

# Save the DRLs to folders locally to be able to use them for RAG later
def save_drl_to_file(drl_content: str, directory: str = path_to_drl):
    # Ensure the output directory exists
    os.makedirs(directory, exist_ok=True)

    # Trim leading and trailing spaces and backticks
    drl_content = drl_content.strip("`\n")

    # Extract the rule name from the DRL content
    match = re.search(r'rule\s+"([^"]+)"', drl_content.strip())
    if match:
        rule_name = match.group(1)  # Extracted rule name
    else:
        raise ValueError("Could not extract rule name from DRL content")

    # Set filename
    filename = f"{rule_name}.drl"
    filepath = os.path.join(directory, filename)

    # Write the DRL content to file
    with open(filepath, "w") as f:
        f.write(drl_content)

    print(f"✓ DRL file saved at: {filepath}")
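
The rule name is used directly as a file name, so a name containing characters such as `/` or `:` could produce an invalid or unexpected path. A possible safeguard is sketched below; `safe_rule_filename` and its character whitelist are assumptions for illustration, not part of the function above.

```python
import re

def safe_rule_filename(rule_name: str, extension: str = ".drl") -> str:
    """Replace every run of characters outside [A-Za-z0-9_.-] with a single underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "_", rule_name).strip("_.")
    # Fall back to a generic name if nothing usable is left
    return (cleaned or "unnamed_rule") + extension

print(safe_rule_filename("RuleFor_Restaurant_size_is"))
# RuleFor_Restaurant_size_is.drl
print(safe_rule_filename("Large restaurants: base staff / day"))
# Large_restaurants_base_staff_day.drl
```

Names that are already safe pass through unchanged, so existing files keep their names.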


In [33]:
# Need to change the drl name for each file. In the future we need to automate this.
drl_text = agent2_generate_drl_with_rag(rule_json, results_df_2)

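# Clean response from Agent 2: remove the Markdown code fence, whatever its language tag (```drl, ```drools, ...)
def clean_drools_block(text):
    text = text.strip()
    text = re.sub(r"^```[\w-]*\s*\n", "", text)  # opening fence and language tag
    return text.removesuffix("```").strip()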

cleaned_2_response = clean_drools_block(drl_text)
print(cleaned_2_response)

# Save the drl, need to change the drl name for each file. In the future we need to automate this.
save_drl_to_file(cleaned_2_response)


```drl
rule "RuleFor_Restaurant_size_is"
when
    Restaurant( size == "Large" )
then
    Staffing.base_employees = 12;
end
✓ DRL file saved at: /content/drive/My Drive/Colab Notebooks/Capstone/Agent2RAG/drl_files/RuleFor_Restaurant_size_is.drl
