# **Project 2**

## **Business Use Case**

### **Problem Statement:**

In the fast-paced environment of healthcare centers, healthcare professionals often face the challenge of quick and accurate diagnosis of patients while managing an ever-increasing volume of medical information. Ensuring that healthcare providers have access to the latest and most comprehensive medical knowledge is crucial for improving patient outcomes and reducing the time needed to make informed decisions.

There are multiple challenges that these professionals encounter daily, a few being

- Information Overload: Medical professionals need to go through vast amounts of data and research to make accurate diagnoses and treatment plans. This can be overwhelming and time-consuming.
- Efficiency: For overall patient care and quality health outcomes, quick and accurate diagnosis is vital, especially in emergency situations.
- Access to Trusted Knowledge: In the ever-evolving healthcare industry, providing access to reliable and up-to-date medical information from renowned manuals and research papers is essential for maintaining high standards of care.


### **Objective:**

A renowned chain of hospitals has decided to leverage AI to build a state-of-the-art solution to help healthcare professionals overcome the aforementioned challenges. They have recruited you as an AI specialist and tasked you with building a RAG-based AI solution that leverages renowned medical manuals as its knowledge base. This AI system will act as a POC towards an end product that’ll assist healthcare professionals in making better, quicker, and more accurate diagnoses, ultimately leading to faster patient resolutions and enabling better patient outcomes by reducing errors in diagnosis, saving valuable time for information retrieval, and standardizing care practices across the board.

### **Questions:**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### **How This Application Empowers Professionals and Elevates Healthcare Organizations**

- **Enhanced Diagnostic Accuracy:**
Provides evidence-based insights for better diagnosis and treatment.

- **Time Efficiency:**
Instantly retrieves critical information, saving valuable time.

- **Improved Patient Care:**
Ensures informed decisions with up-to-date medical knowledge.

- **Cost-Effective Operations:**
Reduces redundant tests and consultation delays, lowering costs.

- **Knowledge Empowerment:**
Keeps doctors updated on the latest advancements.

- **Competitive Edge for the Hospital:**
Positions the hospital as a leader in healthcare innovation.

This collaboration between St. Bernard’s Medical Center and InnoviTech Solutions highlights the transformative potential of AI in revolutionizing healthcare.

## **1. Install and Import Required Libraries**

In [1]:
# Install the Azure Machine Learning SDK and FAISS-related utilities
%pip install azure-ai-ml
%pip install -U 'azureml-rag[faiss,hugging_face]>=0.2.36'

Note: you may need to restart the kernel to use updated packages.
Collecting azureml-rag>=0.2.36 (from azureml-rag[faiss,hugging_face]>=0.2.36)
  Downloading azureml_rag-0.2.38-py3-none-any.whl.metadata (23 kB)
Collecting openai>=0.27.8 (from azureml-rag>=0.2.36->azureml-rag[faiss,hugging_face]>=0.2.36)
  Downloading openai-1.65.1-py3-none-any.whl.metadata (27 kB)
Collecting tiktoken<1.0,>=0.7 (from azureml-rag>=0.2.36->azureml-rag[faiss,hugging_face]>=0.2.36)
  Downloading tiktoken-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting mmh3 (from azureml-rag>=0.2.36->azureml-rag[faiss,hugging_face]>=0.2.36)
  Downloading mmh3-5.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting faiss-cpu~=1.7.3 (from azureml-rag[faiss,hugging_face]>=0.2.36)
  Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Collecting sentence-tra

## **2. Configure Azure Machine Learning Workspace**

### Get client for AzureML Workspace

In [3]:
#Write your code here
# Import necessary AzureML and authentication libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azureml.core import Workspace

In [6]:

%%writefile creds.json
{
    "subscription_id": "b113073a-8845-458d-8462-4792938c8faa",
    "resource_group": "default_resorce_group",
    "workspace_name": "pizzasalesworkspace",
    "account_name": "azureai202501"
}

Overwriting creds.json


In [5]:
%%writefile config.json
{
    "AZURE_OPENAI_KEY":"BispoQ9bDuajs4NvQV6hLb6trvEMh4MdpqtBclvImgiiCkHF4bBnJQQJ99BAACYeBjFXJ3w3AAABACOG99SW",
    "AZURE_OPENAI_ENDPOINT":"https://azureai202501.openai.azure.com/",
    "AZURE_OPENAI_APIVERSION":"2023-05-15",
    "AZURE_OPENAI_EMBEDDING_MODEL":"text-embedding-ada-002",
    "AZURE_OPENAI_EMBEDING_DEPLOYMENT":"text-embedding-ada-002"

}

Overwriting config.json


In [7]:

# Initialize credentials for Azure authentication
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential not work
    credential = InteractiveBrowserCredential()



# Initialize the MLClient to connect with AzureML
ml_client = MLClient.from_config(credential=credential, path="creds.json")



# Create an AzureML Workspace object
ws = Workspace(
    subscription_id=ml_client.subscription_id,
    resource_group=ml_client.resource_group_name,
    workspace_name=ml_client.workspace_name,
)


# Verify the client and workspace details
print(ml_client)

Found the config file in: creds.json


MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x7f2b24317fa0>,
         subscription_id=b113073a-8845-458d-8462-4792938c8faa,
         resource_group_name=default_resorce_group,
         workspace_name=pizzasalesworkspace)


## **3. Register the Reports Dataset as a Data Asset**

In [8]:
#Write you are code here
# Import libraries for data registration
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
import zipfile
import os

# Path to the ZIP file containing Tesla annual reports
zip_file_path = 'MedicalDiagnosisManuals.zip'


# Directory to extract the reports
extract_to_directory = './MedicalDiagnosisManual'
os.makedirs(extract_to_directory, exist_ok=True)

# Extract the ZIP file containing the reports
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_to_directory)

# Register the extracted data as a Data asset in AzureML
local_data_path = extract_to_directory
data_asset_name = "Medical-Diagnosis-list"
data_asset_description = "A collection of medical manuals used by St. Bernard's Medical Center for embedding generation and knowledge retrieval in the RAG system."

data_asset = Data(
    path=local_data_path,
    type=AssetTypes.URI_FOLDER,  # Registering as a folder URI
    description=data_asset_description,
    name=data_asset_name
)

# Use the MLClient to register the data asset
ml_client.data.create_or_update(data_asset)
print(f"Data asset '{data_asset.name}' registered successfully.")

Data asset 'Medical-Diagnosis-list' registered successfully.


## **4. Set Up Azure OpenAI Connection**

### Run the cells under _either_ heading (OpenAI or HuggingFace) to use the respective embedding model

#### OpenAI

In [9]:
# # Azure Open AI redentials and the id of the deployed chat model are stored as
# # key value pairs in a json file

with open('creds.json', 'r') as az_creds:   #Fill the blank with json credentails file 
     data = az_creds.read()

# # Credentials to authenticate to the personalized Open AI model server
import json
creds = json.loads(data)

In [28]:
from azureml.rag.utils.connections import get_connection_by_name_v2, create_connection_v2

# # Define the connection name for Azure OpenAI
aoai_connection_name = "Custom_AzureOpenAI_Connection"
endpoint = creds.get("endpoint", "https://azureai202501.openai.azure.com/")  # Use a default or handle missing key
api_key = creds.get("key", "BispoQ9bDuajs4NvQV6hLb6trvEMh4MdpqtBclvImgiiCkHF4bBnJQQJ99BAACYeBjFXJ3w3AAABACOG99SW")  # Use a default or handle missing key
api_version = creds.get("api_version", "2023-05-15")  # Use a default or handle missing key


# Create the Azure OpenAI connection
aoai_connection = create_connection_v2(
    workspace=ws,
    name=aoai_connection_name,
    category="AzureOpenAI",
    target=endpoint,
    auth_type="ApiKey",
    credentials={"key": api_key},
    metadata={"ApiType": "azure", "ApiVersion": "2023-05-15"},
)

aoai_connection_id = aoai_connection["id"]

print(f"Azure OpenAI connection created or retrieved successfully: {aoai_connection_id}")
print(endpoint)
print(f"ao ai connection{aoai_connection}")

Azure OpenAI connection created or retrieved successfully: /subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourceGroups/default_resorce_group/providers/Microsoft.MachineLearningServices/workspaces/pizzasalesworkspace/connections/Custom_AzureOpenAI_Connection
https://azureai202501.openai.azure.com/
ao ai connection{'tags': None, 'location': None, 'id': '/subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourceGroups/default_resorce_group/providers/Microsoft.MachineLearningServices/workspaces/pizzasalesworkspace/connections/Custom_AzureOpenAI_Connection', 'name': 'Custom_AzureOpenAI_Connection', 'type': 'Microsoft.MachineLearningServices/workspaces/connections', 'properties': {'authType': 'ApiKey', 'credentials': None, 'group': 'AzureAI', 'category': 'AzureOpenAI', 'expiryTime': None, 'target': 'https://azureai202501.openai.azure.com/', 'createdByWorkspaceArmId': '/subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourceGroups/default_resorce_group/providers/Microsoft.Machine

In [29]:

from azureml.rag.utils.deployment import infer_deployment

aoai_embedding_model_name = creds.get("AZURE_OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002")
aoai_embedding_deployment_name = creds.get("AZURE_OPENAI_EMBEDING_DEPLOYMENT", "text-embedding-ada-002")

embeddings_model_uri = f"azure_open_ai://openai/deployments/{aoai_embedding_deployment_name}/model/{aoai_embedding_model_name}"
print(f"Embedding Model URI: {embeddings_model_uri}")

Embedding Model URI: azure_open_ai://openai/deployments/text-embedding-ada-002/model/text-embedding-ada-002


In [34]:
from azureml.rag.utils.deployment import infer_deployment

aoai_embedding_model_name = "text-embedding-ada-002"
try:
      aoai_embedding_deployment_name = infer_deployment(
          aoai_connection, aoai_embedding_model_name
      )
      print(
          f"Deployment name in AOAI workspace for model '{aoai_embedding_model_name}' is '{aoai_embedding_deployment_name}'"
      )
except Exception as e:
      print(f"Deployment name in AOAI workspace for model '{aoai_embedding_model_name}' is not found.")
      print(
          f"Please create a deployment for this model by following the deploy instructions on the resource page for '{aoai_connection['properties']['target']}' in Azure Portal."
      )

Deployment name in AOAI workspace for model 'text-embedding-ada-002' is not found.
Please create a deployment for this model by following the deploy instructions on the resource page for 'https://azureai202501.openai.azure.com/' in Azure Portal.


#### HuggingFace


## **5. Setup Pipeline to process data into Index**

### **Define Pipeline Components**

In [15]:
# Write you are code here
#Import the MLClient to access the AzureML registry
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_registry = MLClient(credential=credential, registry_name="azureml")

# Retrieve components for processing data, generating embeddings, and creating the FAISS index
crack_and_chunk_component = ml_registry.components.get(
    "llm_rag_crack_and_chunk", label="latest"
)
generate_embeddings_component = ml_registry.components.get(
    "llm_rag_generate_embeddings", label="latest"
)
create_faiss_index_component = ml_registry.components.get(
    "llm_rag_create_faiss_index", label="latest"
)
register_mlindex_component = ml_registry.components.get(
    "llm_rag_register_mlindex_asset", label="latest"
)

In [156]:
print(crack_and_chunk_component)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: llm_rag_crack_and_chunk
version: 0.0.78
display_name: LLM - Crack and Chunk Data
description: 'Creates chunks no larger than `chunk_size` from `input_data`, extracted
  document titles are prepended to each chunk


  LLM models have token limits for the prompts passed to them, this is a limiting
  factor at embedding time and even more limiting at prompt completion time as only
  so much context can be passed along with instructions to the LLM and user queries.

  Chunking allows splitting source data of various formats into small but coherent
  snippets of information which can be ''packed'' into LLM prompts when asking for
  answers to user query related to the source documents.


  Supported formats: md, txt, html/htm, pdf, ppt(x), doc(x), xls(x), py

  '
tags:
  Preview: ''
type: command
inputs:
  input_data:
    type: uri_folder
    description: Uri Folder containing files to be chunked.
    op

### **Build the AzureML Pipeline**

In [16]:
# Write you are code here
from azure.ai.ml import Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities._job.pipeline._io import PipelineInput
from typing import Optional

In [17]:
# Utility function for automatic compute configuration
def use_automatic_compute(component, instance_count=1, instance_type="Standard_NC4as_T4_v3"):
    """Configure a component to use automatic compute."""
    component.set_resources(
        instance_count=instance_count,
        instance_type=instance_type,
        properties={"compute_specification": {"automatic": True}},
    )
    return component


# Utility function to check if optional pipeline inputs are provided
def optional_pipeline_input_provided(input: Optional[PipelineInput]):
    """Check if optional pipeline inputs are provided."""
    return input is not None and input._data is not None

In [18]:
@pipeline(default_compute="serverless")   #Fill the blank with name of the pipeline and compute
def diagnosismanuals_to_faiss(
    data_asset_path: str,
    embeddings_model: str,
    asset_name: str,
    chunk_size: int = 1024,    #Provide the chink size 
    data_source_glob: str = None,
    document_path_replacement_regex: str = None,
    aoai_connection_id=None,
    embeddings_container=None,
):
    """Pipeline to process medical diagnosis manuals and create a FAISS vector index for knowledge retrieval."""
    
    # Step 1: Chunk data into smaller pieces
    crack_and_chunk = crack_and_chunk_component(
        input_data=Input(type="uri_folder", path=data_asset_path),  # Input data asset and fill the blank with proper type 
        input_glob=data_source_glob,
        chunk_size=chunk_size,
        document_path_replacement_regex=document_path_replacement_regex,
    )
    use_automatic_compute(crack_and_chunk)  # Apply compute configuration

    # Step 2: Generate embeddings for the data chunks
    generate_embeddings = generate_embeddings_component(
        chunks_source=crack_and_chunk.outputs.output_chunks,
        embeddings_container=embeddings_container,
        embeddings_model=embeddings_model,
    )
    use_automatic_compute(generate_embeddings)  # Apply compute configuration
    
    #Optional: Include Azure OpenAI connection ID
    if optional_pipeline_input_provided(aoai_connection_id):
        generate_embeddings.environment_variables[                        #Fill the blank with proper variable
            "AZUREML_WORKSPACE_CONNECTION_ID_AOAI"
        ] = aoai_connection_id
    if optional_pipeline_input_provided(embeddings_container):
       generate_embeddings.outputs.embeddings = Output(
            type="uri_folder", path=f"{embeddings_container.path}/{{name}}"    #Fill the blank with proper type
        )

    # Step 3: Create a FAISS vector index from embeddings
    create_faiss_index = create_faiss_index_component(
        embeddings=generate_embeddings.outputs.embeddings,    #Fill the balnk with proper function
    )
    use_automatic_compute(create_faiss_index)  # Apply compute configuration

    # Step 4: Register the FAISS index as an MLIndex asset
    register_mlindex = register_mlindex_component(
        storage_uri=create_faiss_index.outputs.index,    #Fill the balnk with proper function
        asset_name=asset_name
    )
    use_automatic_compute(register_mlindex) # Apply compute configuration
    
    return {
        "mlindex_asset_uri": create_faiss_index.outputs.index,
        "mlindex_asset_id": register_mlindex.outputs.asset_id,
    }

## **6.Submit the Pipeline**

In [19]:
# Define the asset name and data source glob pattern
asset_name = "Medical-Diagnosis-list"  # Name for the FAISS index asset
data_source_glob = "**/*.pdf"  # Pattern to match input data files

In [20]:
# Get the input data asset path from the workspace datastore
datastore_path = ml_client.data.get(data_asset_name, version="1").path
print(f"Datastore path: {datastore_path}")
print(f"assest name: {asset_name}")

Datastore path: azureml://subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourcegroups/default_resorce_group/workspaces/pizzasalesworkspace/datastores/workspaceblobstore/paths/LocalUpload/a0c35917db9eab5fa2442a9597890dcf/MedicalDiagnosisManual/
assest name: Medical-Diagnosis-list


In [22]:
# Create the pipeline job by calling the defined pipeline function
pipeline_job = diagnosismanuals_to_faiss(
    embeddings_model=embeddings_model_uri,  # URI of the embeddings model
    aoai_connection_id=aoai_connection_id,  # Connection ID for Azure OpenAI (optional)
    embeddings_container=Input(
        type="uri_folder",
        path=f"azureml://datastores/workspaceblobstore/paths/embeddings/{asset_name}"    
    ),  # Path for storing generated embeddings
    data_asset_path=Input(
        type="uri_folder",
        path=datastore_path
    ),  # Input data asset path
    chunk_size=1024,  # Size of chunks for processing
    data_source_glob=data_source_glob,  # Glob pattern for input files
    asset_name=asset_name  # Name of the MLIndex asset
)

In [23]:
# Add properties for better indexing and artifact tracking in the AzureML UI
pipeline_job.properties["azureml.mlIndexAssetName"] = asset_name
pipeline_job.properties["azureml.mlIndexAssetKind"] = "faiss"
pipeline_job.properties["azureml.mlIndexAssetSource"] = "Data asset"

In [24]:
print(f"AZUREML_WORKSPACE_CONNECTION_ID_AOAI is : {aoai_connection_id }")

AZUREML_WORKSPACE_CONNECTION_ID_AOAI is : /subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourceGroups/default_resorce_group/providers/Microsoft.MachineLearningServices/workspaces/pizzasalesworkspace/connections/Custom_AzureOpenAI_Connection


In [26]:
# Submit the pipeline job for execution
submitted_pipeline = ml_client.jobs.create_or_update(pipeline_job)
print(f"Pipeline submitted successfully! Job ID: {submitted_pipeline.id}")

Pipeline submitted successfully! Job ID: /subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourceGroups/default_resorce_group/providers/Microsoft.MachineLearningServices/workspaces/pizzasalesworkspace/jobs/modest_lock_n1nv6040l4


In [27]:
# Stream the pipeline job logs for real-time monitoring
ml_client.jobs.stream(submitted_pipeline.name)

RunId: modest_lock_n1nv6040l4
Web View: https://ml.azure.com/runs/modest_lock_n1nv6040l4?wsid=/subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourcegroups/default_resorce_group/workspaces/pizzasalesworkspace

Streaming logs/azureml/executionlogs.txt

[2025-03-01 03:40:38Z] Completing processing run id 7edac6d8-8226-47be-83a4-405f0cdaf056.
[2025-03-01 03:40:40Z] Submitting 1 runs, first five are: cf100eef:8109a56a-ca5e-40cb-8870-b1432afed460
[2025-03-01 03:48:44Z] Execution of experiment failed, update experiment status and cancel running nodes.

Execution Summary
RunId: modest_lock_n1nv6040l4
Web View: https://ml.azure.com/runs/modest_lock_n1nv6040l4?wsid=/subscriptions/b113073a-8845-458d-8462-4792938c8faa/resourcegroups/default_resorce_group/workspaces/pizzasalesworkspace


JobException: Exception : 
 {
    "error": {
        "code": "ServiceError",
        "message": "Pipeline has failed child jobs. Failed nodes: /generate_embeddings. For more details and logs, please go to the job detail page and check the child jobs.",
        "message_format": "Pipeline has failed child jobs. {0}",
        "message_parameters": {},
        "reference_code": "PipelineHasStepJobFailed",
        "details": []
    },
    "environment": "eastus",
    "location": "eastus",
    "time": "2025-03-01T03:48:44.542734Z",
    "component_name": ""
} 

# **Information Retrieval and Response Generation Using LangChain-FAISS and Azure OpenAI**

## **1.Installing Required Libraries**

In [52]:
# Install the required LangChain and HuggingFace libraries
%pip install -U langchain-community
%pip install -U langchain-huggingface
%pip install -U langchain-openai

Collecting langchain-community
  Downloading langchain_community-0.3.18-py3-none-any.whl (2.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m13.8 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hCollecting dataclasses-json<0.7,>=0.5.7
  Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
Collecting numpy<2,>=1.26.4
  Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.2/18.2 MB[0m [31m10.0 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
Collecting httpx-sse<1.0.0,>=0.4.0
  Downloading httpx_sse-0.4.0-py3-none-any.whl (7.8 kB)
Collecting langsmith<0.4,>=0.1.125
  Downloading langsmith-0.3.11-py3-none-any.whl (335 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m335.3/335.3 kB[0m [31m11.9 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting SQLAlchemy<3,>=1.4
  Downloading SQLAlchemy-2.0.38-cp310-cp310-manylinux_

## **2. Setting Up Data Retrieval**

### **Downloading and Setting Up FAISS Index Assets**


In [53]:
# Import necessary utilities for artifact retrieval
import azure.ai.ml._artifacts._artifact_utilities as artifact_utils

# Retrieve the path to the latest FAISS index asset from Azure ML
data_info = ml_client.data.get(name=asset_name, label="latest").path

# Download the FAISS index asset to a local directory
artifact_utils.download_artifact_from_aml_uri(
    uri=data_info,
    destination="./medicaldiagnosisfaissindexasset/",
    datastore_operation=ml_client.datastores
)

# The FAISS index asset will be used for vector-based similarity search.

'./medicaldiagnosisfaissindexasset/'

## **3. Loading the FAISS Index**

### **Loading the FAISS Index and Preparing the Retriever**

In [54]:
# Path to the directory containing FAISS index files
index_folder_path = "./medicaldiagnosisfaissindexasset/"

In [55]:
creds = {
    "AZURE_OPENAI_EMBEDDING_MODEL": "text-embedding-ada-002",  # Replace with your actual model name
    "AZURE_OPENAI_ENDPOINT": "https://azureai202501.openai.azure.com/",
    "AZURE_OPENAI_KY": "BispoQ9bDuajs4NvQV6hLb6trvEMh4MdpqtBclvImgiiCkHF4bBnJQQJ99BAACYeBjFXJ3w3AAABACOG99SW",  # Replace with your actual API key
    "AZURE_OPENAI_APIVERSION": "2023-05-15-preview"  # Replace with the correct API version
}

In [56]:
from langchain_openai import AzureOpenAIEmbeddings

# Initialize the embedding model with the provided credentials
embedding_model = AzureOpenAIEmbeddings(
    model=creds["AZURE_OPENAI_EMBEDDING_MODEL"],
    azure_endpoint=creds["AZURE_OPENAI_ENDPOINT"],
    api_key=creds["AZURE_OPENAI_KY"],
    openai_api_version=creds["AZURE_OPENAI_APIVERSION"]
)
print(f"embeddings model: {embedding_model}")


embeddings model: client=<openai.resources.embeddings.Embeddings object at 0x7f2446a27520> async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7f2446889f30> model='text-embedding-ada-002' dimensions=None deployment=None openai_api_version='2023-05-15-preview' openai_api_base=None openai_api_type='azure' openai_proxy=None embedding_ctx_length=8191 openai_api_key=SecretStr('**********') openai_organization=None allowed_special=None disallowed_special=None chunk_size=2048 max_retries=2 request_timeout=None headers=None tiktoken_enabled=True tiktoken_model_name=None show_progress_bar=False model_kwargs={} skip_empty=False default_headers=None default_query=None retry_min_seconds=4 retry_max_seconds=20 http_client=None http_async_client=None check_embedding_ctx_length=True azure_endpoint='https://azureai202501.openai.azure.com/' azure_ad_token=None azure_ad_token_provider=None azure_ad_async_token_provider=None validate_base_url=True


In [57]:
from langchain.vectorstores import FAISS

# Load the FAISS index and associate it with the embedding model
retriever = FAISS.load_local(
    folder_path=index_folder_path, 
    embeddings=embedding_model, 
    allow_dangerous_deserialization=True  # Acknowledge the source of the data for safe loading
)

# The retriever is now ready to perform similarity searches.

RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /project/faiss/faiss/impl/io.cpp:67: Error: 'f' failed: could not open medicaldiagnosisfaissindexasset/index.faiss for reading: No such file or directory

In [58]:
print(f"FAISS index saved at {index_folder_path}")

FAISS index saved at ./medicaldiagnosisfaissindexasset/


## **4. Performing a Similarity Search**

In [59]:
# Define a query to test the retriever - you can feel free to pick any disease or medical condition for this 

query = "What are the common symptoms and treatments for pulmonary embolism?"

# Retrieve the top 3 most relevant documents
results = retriever.similarity_search(query, k=3)

# Display the results
for doc in results:
    print(f"Document: {doc.page_content}\nMetadata: {doc.metadata}")

# This step helps validate that the retriever is functioning as expected.

NameError: name 'retriever' is not defined

## **5: Creating the System and User Prompt Templates**

In [60]:
# Define the system prompt for the Azure OpenAI model
qna_system_message = """
    #Write you are message here
"""
# Define the user message template
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""

## **6. Generating the Response**

In [61]:
# Install the rquired packages
%pip install openai==1.2.0 tiktoken==0.6 session-info --quiet

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-openai 0.3.7 requires openai<2.0.0,>=1.58.1, but you have openai 1.2.0 which is incompatible.
langchain-openai 0.3.7 requires tiktoken<1,>=0.7, but you have tiktoken 0.6.0 which is incompatible.
dask-sql 2024.5.0 requires dask[dataframe]>=2024.4.1, but you have dask 2023.2.0 which is incompatible.
dask-sql 2024.5.0 requires distributed>=2024.4.1, but you have distributed 2023.2.0 which is incompatible.
dask-sql 2024.5.0 requires pandas>=1.4.0, but you have pandas 1.3.5 which is incompatible.
azureml-rag 0.2.38 requires tiktoken<1.0,>=0.7, but you have tiktoken 0.6.0 which is incompatible.[0m[31m
[0mNote: you may need to restart the kernel to use updated packages.


In [62]:
# Import required libraries
import json
import tiktoken
import pandas as pd
from openai import AzureOpenAI

In [63]:
# Load Azure OpenAI credentials
with open('config.json', 'r') as az_creds:
    data = az_creds.read()

creds = json.loads(data)

#print(creds)

In [None]:
#endpoint = creds.get("endpoint", "https://azureai2025.openai.azure.com/")  # Use a default or handle missing key
#api_key = creds.get("key", "BispoQ9bDuajs4NvQV6hLb6trvEMh4MdpqtBclvImgiiCkHF4bBnJQQJ99BAACYeBjFXJ3w3AAABACOG99SW")  # Use a default or handle missing key
#api_version = creds.get("api_version", "2024-05-01-preview")

In [64]:
# Initialize the Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint=creds["endpoint"],
    api_key=creds["key"],
    api_version=creds["api_version"]
)

KeyError: 'endpoint'

In [65]:
def generate_rag_response(user_input):
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.similarity_search(user_input, k=3)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    # Compose the prompt
    prompt = [
        {'role': 'system', 'content': qna_system_message},
        {'role': 'user', 'content': qna_user_message_template.format(
            context=context_for_query,
            question=user_input
            )
        }
    ]

    # Generate the response using Azure OpenAI
    try:
        response = client.chat.completions.create(
            model=creds["CHATGPT_MODEL"],
            messages=prompt,
            temperature=0
        )

        # Extract and print the model's response
        response = response.choices[0].message.content.strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'


    print(response)


#### Question 1: What is the protocol for managing sepsis in a critical care unit?

In [66]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"    # Enter the question to be answered by the system here 
generate_rag_response(user_input)

NameError: name 'retriever' is not defined

#### Question 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [104]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

NameError: name 'retriever' is not defined

#### Question 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Question 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Question 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"   # Enter the question to be answered by the system here 
generate_rag_response(user_input)

#### Power Ahead!