# GenAI with Azure Databricks - Developing RAG System

### Loading the CSV file into DBFS (Databricks File System)

### Loading the CSV file into a DataFrame

In [0]:
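# Read every CSV file in the volume directory into a Spark DataFrame,
# treating the first row of each file as the header
df = spark.read.load('/Volumes/dbx_premium/default/rag_lab', format='csv', header=True)
display(df.limit(10))  # preview the first 10 rows
df.printSchema()       # without schema inference, every column is read as a string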

Topic,Description
What is diabetes?,"Diabetes is a chronic condition that affects how the body processes glucose (sugar). It occurs when the body cannot produce enough insulin or the insulin it produces is ineffective in regulating blood sugar. Insulin is a hormone produced by the pancreas that helps glucose enter the cells of the body for energy. Without sufficient insulin, glucose builds up in the bloodstream, leading to high blood sugar levels. Over time, uncontrolled diabetes can cause serious health complications such as heart disease, kidney damage, nerve damage, and vision problems. Proper management and treatment of diabetes are essential to preventing these complications and maintaining a good quality of life. Early detection, lifestyle changes, and medication are key factors in effectively managing the disease."
What are the different types of diabetes?,"Diabetes is categorized into two main types: Type 1 and Type 2. Type 1 diabetes is an autoimmune condition where the body’s immune system attacks and destroys the insulin-producing cells in the pancreas, leading to little or no insulin production. It typically develops in children or young adults and requires lifelong insulin therapy. Type 2 diabetes, on the other hand, occurs when the body becomes resistant to insulin or does not produce enough insulin to meet the body’s needs. It is more common in adults, particularly those who are overweight, inactive, or have a family history of the disease. While Type 1 is not preventable, Type 2 can often be prevented or delayed through lifestyle changes, including diet and exercise."
What are the symptoms of diabetes?,"The symptoms of diabetes can vary depending on the type and how long the condition has been present. Common signs include frequent urination, excessive thirst, hunger, and unexplained weight loss. Some people may experience blurred vision, fatigue, and slow-healing wounds. In the case of Type 1 diabetes, symptoms often develop rapidly, while Type 2 diabetes symptoms may be more subtle and develop over time. Because the early symptoms may not always be noticeable, it is important to get regular check-ups, especially if you are at risk for diabetes. Uncontrolled diabetes can lead to serious complications, so timely diagnosis and treatment are essential."
How is diabetes diagnosed?,"Diabetes is diagnosed through various blood tests. The fasting blood glucose test measures blood sugar levels after an overnight fast, while the oral glucose tolerance test checks how well the body processes sugar after consuming a sugary drink. The HbA1c test, which reflects the average blood sugar levels over the past 2-3 months, is also commonly used to diagnose and monitor diabetes. An HbA1c level of 6.5% or higher is typically indicative of diabetes. A diagnosis may also involve checking for other conditions associated with diabetes, such as high blood pressure or cholesterol imbalances. Early detection allows for better management and prevention of complications."
What is the role of insulin in diabetes?,"Insulin is a hormone produced by the pancreas that helps regulate blood sugar levels by allowing glucose to enter cells for energy. In people with diabetes, either the body does not produce enough insulin (Type 1 diabetes) or the body’s cells do not respond effectively to insulin (Type 2 diabetes). As a result, glucose accumulates in the bloodstream, leading to high blood sugar. Insulin therapy, typically in the form of injections or an insulin pump, helps to lower blood sugar levels and mimic the body’s natural insulin production. Insulin is a crucial part of managing diabetes, particularly for those with Type 1, and can also be used in Type 2 when lifestyle changes and oral medications are not sufficient."
What are the treatment options for type 1 diabetes?,"For people with Type 1 diabetes, treatment primarily involves lifelong insulin therapy to manage blood sugar levels. Insulin can be administered through injections or via an insulin pump. In addition to insulin, individuals with Type 1 diabetes must closely monitor their blood sugar levels throughout the day, maintain a healthy diet, and engage in regular physical activity to help manage their condition. People with Type 1 diabetes also need to be vigilant about potential complications, such as diabetic ketoacidosis, and should regularly consult with their healthcare providers to adjust their treatment plan as needed."
What are the treatment options for type 2 diabetes?,"For Type 2 diabetes, treatment typically starts with lifestyle modifications such as a balanced diet, regular exercise, and weight management. If lifestyle changes are not sufficient, oral medications such as metformin may be prescribed to help regulate blood sugar levels. In some cases, individuals with Type 2 diabetes may require insulin therapy or other injectable medications to improve insulin sensitivity. For those with severe Type 2 diabetes or obesity, bariatric surgery may be considered to help improve blood sugar control. The goal of treatment is to achieve normal blood sugar levels and prevent complications such as heart disease, kidney damage, or nerve damage."
How can I manage my blood sugar levels?,"Managing blood sugar levels is essential for diabetes control. The key factors in blood sugar management include regular monitoring, taking prescribed medications, following a healthy diet, exercising regularly, and managing stress. Monitoring your blood sugar levels allows you to understand how food, exercise, and medications affect your glucose. A balanced diet rich in fiber, lean proteins, and healthy fats can help regulate blood sugar. Exercise increases insulin sensitivity, which helps control blood sugar levels. Stress management techniques, such as yoga and relaxation exercises, are also important for maintaining stable blood sugar. Regular doctor visits are essential to adjust treatment plans as needed."
What is a healthy diet for someone with diabetes?,"A healthy diet for people with diabetes focuses on foods that help regulate blood sugar levels and promote overall health. This includes consuming whole grains, vegetables, lean proteins, and healthy fats while limiting foods with a high glycemic index, such as refined sugars and processed foods. Eating smaller, more frequent meals can also help maintain steady blood sugar levels throughout the day. Carbohydrate counting is a common practice for those with diabetes, as carbohydrates have a direct impact on blood sugar. It is important to work with a dietitian to develop a meal plan that meets individual needs and preferences while promoting blood sugar control."
How often should I check my blood sugar levels?,"Blood sugar monitoring is essential for people with diabetes to understand how their body responds to different foods, activities, and medications. The frequency of testing depends on the type of diabetes and treatment plan. For individuals with Type 1 diabetes or those on insulin, blood sugar may need to be tested multiple times per day, including before and after meals or exercise. For Type 2 diabetes, the frequency of testing may vary based on medication use and blood sugar control. A healthcare provider can provide personalized recommendations for how often to check blood sugar levels, ensuring that levels remain within the target range to prevent complications."


root
 |-- Topic: string (nullable = true)
 |-- Description: string (nullable = true)



### Installing the openai SDK in our python kernel

In [0]:
%pip install openai==1.56.0

### Restarting our python kernel

In [0]:
dbutils.library.restartPython()

### Creating an Azure OpenAI Client


In [0]:
from openai import AzureOpenAI

# azure_endpoint takes the resource's base URL; the deployment name is passed
# later as `model` and the API version is passed as `api_version`
openai_endpoint = "https://tosrn-meqscj95-eastus2.cognitiveservices.azure.com/"

# Read the key from a Databricks secret scope instead of hardcoding it.
# The scope and key names below are placeholders, not values from this lab.
openai_key = dbutils.secrets.get(scope="<secret-scope>", key="<azure-openai-key-name>")

client = AzureOpenAI(
    api_key=openai_key,
    api_version="2024-12-01-preview",
    azure_endpoint=openai_endpoint
)
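
The `azure_endpoint` argument of `AzureOpenAI` is meant to hold the resource's base URL, with the deployment passed as `model` and the API version as `api_version`. When only a full chat-completions URL is at hand, the three pieces can be separated first. The sketch below is a minimal illustration using the standard library; the helper name `split_azure_openai_url` and the `example-resource` URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def split_azure_openai_url(full_url):
    """Split a full chat-completions URL into (base_endpoint, deployment, api_version)."""
    parts = urlsplit(full_url)
    # Base endpoint is just scheme + host, e.g. https://<resource>.cognitiveservices.azure.com/
    base_endpoint = f"{parts.scheme}://{parts.netloc}/"

    # The deployment name is the path segment that follows "deployments"
    segments = [s for s in parts.path.split("/") if s]
    deployment = None
    if "deployments" in segments:
        idx = segments.index("deployments")
        if idx + 1 < len(segments):
            deployment = segments[idx + 1]

    # The API version travels in the "api-version" query parameter
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return base_endpoint, deployment, api_version


# Hypothetical URL, used only to show the shape of the result
base, deployment, version = split_azure_openai_url(
    "https://example-resource.cognitiveservices.azure.com/openai/deployments/"
    "gpt-4o-mini/chat/completions?api-version=2025-01-01-preview"
)
print(base, deployment, version)
```

The returned values map onto `azure_endpoint`, `model`, and `api_version` respectively.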

### Saving the updated/new dataframe into ADLS as parquet storage

In [0]:
# Save the DataFrame as Parquet files in the volume and as a Delta table
df.write.mode("overwrite").parquet("/Volumes/dbx_premium/default/rag_lab/diabetes_faq.parquet")
# Use the fully qualified name so the table matches the one referenced in later cells
df.write.format("delta").mode("overwrite").saveAsTable("dbx_premium.default.diabetes_faq_table")


### Installing the databricks vectorsearch SDK

In [0]:
%pip install databricks-vectorsearch

### Restarting our python environment

In [0]:
dbutils.library.restartPython()

### Enabling Change Data Feed on Our Table

In [0]:
# Enable change data feed for the existing Delta table
spark.sql("""
ALTER TABLE dbx_premium.default.diabetes_faq_table
SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

DataFrame[]

### Developing the Cluster managed Vector index

In [0]:
from databricks.vector_search.client import VectorSearchClient

# Create the client used to manage Vector Search endpoints and indexes
vector_client = VectorSearchClient()

# Uncomment on the first run to create the endpoint
# vector_client.create_endpoint(
#     name="vector_search_endpoint",
#     endpoint_type="STANDARD"
# )

# Delta Sync index: embeddings for the Description column are computed
# by the named embedding model endpoint
index = vector_client.create_delta_sync_index(
    endpoint_name="vector_search_endpoint",
    source_table_name="dbx_premium.default.diabetes_faq_table",
    index_name="dbx_premium.default.diabetes_faq_index",
    pipeline_type="TRIGGERED",
    primary_key="Topic",
    embedding_source_column="Description",
    embedding_model_endpoint_name="databricks-gte-large-en"
)

### Triggering our Vector Index - Information Retriever

In [0]:
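user_question = "what is diabetes?"

# Pass the variable itself: the plain string "{user_question}" (no f-prefix)
# would be sent to the index literally instead of the question
results_dict = index.similarity_search(
    query_text=user_question,
    columns=["Topic", "Description"],
    num_results=1
)

# Keep the top hit, a list of the requested columns followed by the score
content = str(results_dict['result']['data_array'][0])
print(content)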

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.
['What are the different types of diabetes?', 'Diabetes is categorized into two main types: Type 1 and Type 2. Type 1 diabetes is an autoimmune condition where the body’s immune system attacks and destroys the insulin-producing cells in the pancreas, leading to little or no insulin production. It typically develops in children or young adults and requires lifelong insulin therapy. Type 2 diabetes, on the other hand, occurs when the body becomes resistant to insulin or does not produce enough insulin to meet the body’s needs. It is more common in adults, particularly those who are overweight, inactive, or have a family history of the disease. While Type 1 is not preventable, Type 2 can often be prevented or delayed through lifestyle changes, including diet and exercise.', 0.001570
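
The retrieval step reads the top hit from `results_dict['result']['data_array']`, where each row is a list of the requested columns followed by a score. A small helper can turn those rows into dictionaries that are easier to inspect and pass on. This is a minimal sketch: the helper name `rows_from_search` and the sample response are hypothetical, and it assumes the score is appended after the requested columns, as in the printed output above.

```python
def rows_from_search(results_dict, columns):
    """Convert `data_array` rows into dicts keyed by column name, plus a score."""
    rows = []
    data_array = results_dict.get("result", {}).get("data_array") or []
    for raw in data_array:
        row = dict(zip(columns, raw))
        # Assumption: the similarity score follows the requested columns
        row["score"] = raw[len(columns)] if len(raw) > len(columns) else None
        rows.append(row)
    return rows


# Hypothetical response with the same shape as the similarity_search result
sample = {
    "result": {
        "data_array": [
            ["What is diabetes?", "Diabetes is a chronic condition.", 0.5]
        ]
    }
}
for row in rows_from_search(sample, ["Topic", "Description"]):
    print(row["Topic"], row["score"])
```

Returning an empty list when nothing matches lets the caller decide how to answer without supporting knowledge, instead of failing on `data_array[0]`.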

### Developing the Generation Component of our RAG architecture


In [0]:
gpt_response = client.chat.completions.create(
    model="gpt-4o-mini",  # for Azure OpenAI, this is the deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You will be passed the user query and the supporting knowledge that can be used to answer the user_query"},
        {"role": "user", "content": f"user query : {user_question} and supporting knowledge: {content}"}
    ]
)
print(gpt_response.choices[0].message.content)

Diabetes is a chronic medical condition that occurs when the body is unable to properly process food for use as energy. Specifically, it involves issues with insulin, a hormone produced by the pancreas that helps regulate blood sugar (glucose) levels. When diabetes is present, it can result in high levels of glucose in the blood, which can lead to various health complications over time.

There are two main types of diabetes: 

1. **Type 1 Diabetes**: This is an autoimmune condition where the body's immune system attacks and destroys the insulin-producing cells in the pancreas. This leads to little or no insulin production. Type 1 diabetes typically develops in children or young adults and requires lifelong insulin therapy.

2. **Type 2 Diabetes**: This type occurs when the body becomes resistant to insulin or does not produce enough insulin to meet its needs. It is more common in adults, particularly those who are overweight, inactive, or have a family history of the disease. Unlike Ty
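
The generation step combines the user question and the retrieved text into a two-message chat prompt. One way to keep that prompt easy to adjust is to build the `messages` list in a helper. The sketch below is a variation, not the prompt used in this notebook: the function name `build_messages`, the numbered-passage layout, and the instruction to answer only from the supplied knowledge are all assumptions.

```python
def build_messages(user_question, passages):
    """Build a chat `messages` list from a question and (topic, description) pairs."""
    # Number each passage so the model can tell the sources apart
    context = "\n\n".join(
        f"[{i}] {topic}: {description}"
        for i, (topic, description) in enumerate(passages, start=1)
    )
    system_prompt = (
        "You are a helpful assistant. Answer the user query using only the "
        "supporting knowledge provided. If it is not sufficient, say so."
    )
    user_prompt = f"user query: {user_question}\n\nsupporting knowledge:\n{context}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages(
    "what is diabetes?",
    [("What is diabetes?", "Diabetes is a chronic condition.")],
)
print(messages[1]["content"])
```

The returned list can be passed as the `messages` argument of `client.chat.completions.create`.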

### Developing the RAG model

In [0]:
import os

import mlflow
from mlflow import pyfunc
from openai import AzureOpenAI

class RAGModel(pyfunc.PythonModel):
    def __init__(self, vector_index):
        self.vector_index = vector_index

    def retrieve(self, query):
        # Return the raw similarity-search response for the best-matching row
        results_dict = self.vector_index.similarity_search(
            query_text=query,
            columns=["Topic", "Description"],
            num_results=1
        )
        return results_dict

    def chatCompletionsAPI(self, user_query, supporting_knowledge):
        # The key is read from an environment variable rather than hardcoded.
        # AZURE_OPENAI_API_KEY is an assumed name; set it wherever the model runs.
        openai_client = AzureOpenAI(
            azure_endpoint="https://tosrn-meqscj95-eastus2.cognitiveservices.azure.com/",
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-02-15-preview"
        )

        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",  # for Azure OpenAI, this is the deployment name
            messages=[
                {"role": "system", "content": "You are a helpful assistant. You will be passed the user query and the supporting knowledge that can be used to answer the user_query"},
                {"role": "user", "content": f"user query : {user_query} and supporting knowledge: {supporting_knowledge}"}
            ]
        )
        return response.choices[0].message.content

    def predict(self, context, data):
        # Use the first row of the input DataFrame as the question
        query = data["query"].iloc[0]
        text_data = self.retrieve(query)
        return self.chatCompletionsAPI(query, text_data)
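
The retrieve-then-generate flow inside `predict` can be checked without a Vector Search index or an Azure OpenAI deployment by substituting stand-ins for both. The sketch below is a minimal illustration of that idea: `FakeIndex`, `answer`, and `fake_generate` are hypothetical names, and the fake index only mimics the `result` / `data_array` shape used earlier in this notebook.

```python
class FakeIndex:
    """Stand-in for a vector index; returns fixed rows in the data_array shape."""

    def __init__(self, rows):
        self.rows = rows

    def similarity_search(self, query_text, columns, num_results=1):
        return {"result": {"data_array": self.rows[:num_results]}}


def answer(query, index, generate):
    """Retrieve the top hit for `query`, then hand it to `generate`."""
    results = index.similarity_search(
        query_text=query,
        columns=["Topic", "Description"],
        num_results=1,
    )
    rows = results["result"]["data_array"]
    # Fall back to empty knowledge when the index returns nothing
    knowledge = str(rows[0]) if rows else ""
    return generate(query, knowledge)


def fake_generate(query, knowledge):
    # Stand-in for the chat-completions call; just echoes its inputs
    return f"Q: {query} | K: {knowledge}"


index_stub = FakeIndex([["What is diabetes?", "Diabetes is a chronic condition.", 0.5]])
print(answer("what is diabetes?", index_stub, fake_generate))
```

Keeping the retriever and generator as injected arguments makes the orchestration logic testable on its own, separate from network calls.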
          


      



### Saving our Model

In [0]:
test_model = RAGModel(vector_index=index)

In [0]:
from mlflow.models import infer_signature
import pandas as pd

signature = infer_signature(pd.DataFrame([{"query": "what is diabetes?"}]))
model_path = "RAGKULJOTmodel"
mlflow.pyfunc.save_model(path=model_path, python_model=test_model, signature=signature)



### Loading Our Saved Model

In [0]:
# Load our custom model from the local artifact store
loaded_pyfunc_model = mlflow.pyfunc.load_model(model_path)


### Testing our Loaded/Saved Model

In [0]:
model_input = pd.DataFrame([{"query": "what is diabetes?"}])

model_response = loaded_pyfunc_model.predict(model_input)

print(model_response)

Diabetes is a chronic condition that affects how the body processes glucose (sugar). It happens when the body cannot produce enough insulin or when the insulin it produces is ineffective in regulating blood sugar levels. Insulin is a hormone created by the pancreas that allows glucose to enter the cells for energy. When there is insufficient insulin, glucose accumulates in the bloodstream, leading to high blood sugar levels. 

If diabetes is left uncontrolled, it can result in serious health complications, including heart disease, kidney damage, nerve damage, and vision problems. Effective management and treatment are crucial for preventing these complications and maintaining a good quality of life. This typically involves early detection, lifestyle changes, and, when necessary, 

### Logging our saved model as an artifact

In [0]:
import mlflow

# Log the model as an artifact
with mlflow.start_run() as run:
    mlflow.log_artifacts(local_dir=model_path, artifact_path="rag_model")
    print(f"Model logged with run ID: {run.info.run_id}")



Model logged with run ID: f59e52a2bc304bc0a93bbbdc6779a7e3


### Inferencing the real-time endpoint

In [0]:
{
  "dataframe_records":[
    {
        "query":"what is diabetes?"
    }
  ]
}

{'dataframe_records': [{'query': 'what is diabetes?'}]}
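
The `dataframe_records` payload above is the request body sent to the real-time endpoint. A minimal sketch of preparing such a request with the standard library follows. It assumes the model is served behind a Databricks Model Serving endpoint whose invocation path is `/serving-endpoints/<endpoint-name>/invocations` with bearer-token authentication; the workspace URL, endpoint name, token, and the helper name `build_invocation_request` are all hypothetical.

```python
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, queries):
    """Prepare (but do not send) a POST request carrying a dataframe_records payload."""
    payload = {"dataframe_records": [{"query": q} for q in queries]}
    # Assumed invocation path for a Databricks Model Serving endpoint
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with a real workspace URL, endpoint name, and token
request = build_invocation_request(
    "https://example-workspace.azuredatabricks.net",
    "rag-endpoint",
    "<access-token>",
    ["what is diabetes?"],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending the prepared request with `urllib.request.urlopen(request)` would return the endpoint's JSON response; that call is left out here because it needs a live endpoint and a valid token.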