# COVID-19 Agent

# Creating an LLM-based AI Research Agent for the CORD-19 Dataset.

---



---


The CORD-19 (COVID-19 Open Research Dataset) is a comprehensive collection of
scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses.
This project aims to build an AI agent that specializes in analyzing this dataset to answer questions about the relationship between COVID-19 and smoking (including cigarettes, vaping, and tobacco). The agent will leverage a Large Language Model (LLM) and Retrieval Augmented Generation (RAG) to provide insights based on the scientific literature within CORD-19.

# Install All Necessary Libraries

---



---



The !pip install commands below install all the required Python packages.

In [None]:
# Run this cell to install all required libraries
!pip install gradio
!pip install llama-index llama-index-embeddings-huggingface llama-index-llms-huggingface
!pip install pandas pyarrow tqdm
!pip install transformers accelerate bitsandbytes torch

# Verify that PyTorch can see and use the CUDA-enabled GPU.

---



---



In [None]:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}") # 0 refers to the first GPU
else:
    print("WARNING: CUDA not available. Check Colab runtime settings.")

# Hugging Face Hub Setup

---



---



This code snippet runs the huggingface-cli login command to securely connect your Colab notebook to your Hugging Face account. This authentication is necessary to download the dataset and the language model we'll be using later. You'll be prompted to enter an access token from your Hugging Face account settings.

In [None]:
!huggingface-cli login

# Google Drive Integration


---



---


This code mounts your Google Drive to the Colab environment. This allows you to save important files, like the vector index we will create later, directly to your personal Google Drive. This prevents data loss if your Colab session disconnects and saves you from having to rebuild the index every time you open the notebook. You'll be asked to authorize Colab to access your Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Download and Load the Dataset

---



---



In [None]:
import pandas as pd

print("Attempting to load the CORD-19 abstracts dataset from Hugging Face...")

try:
    # This reads the dataset directly into a pandas DataFrame.
    # It might take a moment to download.
    df_cord19_abstracts = pd.read_parquet("hf://datasets/pritamdeka/cord-19-abstract/data/train-00000-of-00001.parquet")
    print("\n✅ Successfully loaded dataset.")
    print(f"The dataset has {len(df_cord19_abstracts)} abstracts.")

except Exception as e:
    print(f"\n❌ Error loading dataset from Hugging Face: {e}")
    print("Please ensure you are logged in to Hugging Face (see 'Hugging Face Hub Setup') and have network access.")
    df_cord19_abstracts = pd.DataFrame() # Create empty df to avoid later errors

# Data Acquisition & Preprocessing

---



**Inspect and Clean Data**

---



This code inspects the loaded data to understand its structure. .info() provides a technical summary (columns, data types), and .head() shows the first 5 rows to give us a look at the actual content. We then perform basic cleaning by removing any rows that might have an empty abstract and ensuring the abstract column is treated as text, which prevents errors in later steps.



In [None]:
# Inspect the structure of the loaded data and perform basic cleaning.

# Check if the DataFrame from the previous step exists
if 'df_cord19_abstracts' in locals() and not df_cord19_abstracts.empty:
    print("--- Dataset Information ---")
    df_cord19_abstracts.info()

    print("\n\n--- First 5 Rows of the Dataset ---")
    # Using display() in Colab provides a nicer table format
    display(df_cord19_abstracts.head())

    # --- Data Cleaning ---
    print("\n\n--- Cleaning Data ---")
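    # Remove rows where the 'abstract' column is missing (NaN)
    df_cord19_abstracts.dropna(subset=['abstract'], inplace=True)
    # Ensure the abstract column is treated as text
    df_cord19_abstracts['abstract'] = df_cord19_abstracts['abstract'].astype(str)
    # Also drop abstracts that are empty or whitespace-only, which dropna() does not catch
    df_cord19_abstracts = df_cord19_abstracts[df_cord19_abstracts['abstract'].str.strip() != '']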

    print("✅ Data cleaned: Removed empty abstracts and ensured text format.")
    print(f"The dataset now has {len(df_cord19_abstracts)} abstracts after cleaning.")
else:
    print("❌ DataFrame 'df_cord19_abstracts' was not loaded correctly in the previous step. Please check that the 'Download and Load the Dataset' step ran successfully.")

**Keyword-based Filtering**

---



This code snippet filters our large DataFrame of abstracts to create a smaller, more focused one. First, it defines a keywords list containing terms related to smoking. It then joins them into a single "keyword1|keyword2|..." pattern and passes it to pandas' str.contains(), which treats the pattern as a regular expression and searches the 'abstract' column case-insensitively for any of these keywords. The result is a new DataFrame, df_relevant_abstracts, containing only the documents relevant to our research question.
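
Because the pattern is matched as a regular expression, the same logic can be tried on a few strings with Python's re module before running it on the full DataFrame. The following is a minimal sketch under that assumption: the sample abstracts and the is_relevant helper are made up for illustration, and re.escape is an optional safeguard that the notebook cell itself does not use.

```python
import re

keywords = ["smoking", "cigarette", "nicotine", "vaping", "tobacco", "e-cigarette", "smoker"]

# re.escape protects against keywords that contain regex metacharacters;
# "|" joins the escaped keywords into "keyword1 OR keyword2 OR ...".
pattern = re.compile("|".join(re.escape(k) for k in keywords), flags=re.IGNORECASE)

def is_relevant(abstract):
    """Return True if any keyword occurs anywhere in the abstract (substring match)."""
    return bool(pattern.search(abstract))

# Hypothetical sample texts, not taken from CORD-19.
sample_abstracts = [
    "Current SMOKERS showed higher ACE2 expression in airway cells.",
    "E-cigarette use among adolescents during lockdown.",
    "Remdesivir outcomes in hospitalized adults.",
]

relevant = [a for a in sample_abstracts if is_relevant(a)]
print(len(relevant))
```

Note that this is substring matching, so "smoker" also matches inside words such as "nonsmoker". If that matters for your question, word boundaries in the pattern would be one way to tighten the filter.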

In [None]:
# Filter the DataFrame to include only abstracts containing specific keywords.

# First, check if the main DataFrame from the previous steps exists
if 'df_cord19_abstracts' in locals() and not df_cord19_abstracts.empty:
    print("--- Filtering Abstracts by Keywords ---")

    # Define your keywords related to smoking
    keywords = ["smoking", "cigarette", "nicotine", "vaping", "tobacco", "e-cigarette", "smoker"]
    print(f"Keywords for filtering: {keywords}")

    # Create a search pattern: "keyword1|keyword2|keyword3" which means "keyword1 OR keyword2 OR keyword3"
    search_terms_pattern = '|'.join(keywords)

    # Filter the DataFrame and create a new one with only relevant abstracts
    df_relevant_abstracts = df_cord19_abstracts[df_cord19_abstracts['abstract'].str.contains(search_terms_pattern, case=False, na=False)]

    print(f"\n✅ Found {len(df_relevant_abstracts)} relevant abstracts after keyword filtering.")

    # Display the first few relevant abstracts to confirm the filtering worked
    if len(df_relevant_abstracts) > 0:
        print("\n--- First 5 Relevant Abstracts ---")
        display(df_relevant_abstracts.head())
    else:
        print("\n⚠️ Warning: No relevant abstracts were found for the given keywords.")

else:
    print("❌ DataFrame 'df_cord19_abstracts' was not found or is empty. Please run the previous steps first.")
    # Create an empty DataFrame to prevent errors in later cells
    df_relevant_abstracts = pd.DataFrame()

# Vector Database Creation & Management

**Configure LlamaIndex Settings**


---


This code snippet configures LlamaIndex's global settings. It specifies which model to use for creating vector embeddings (sentence-transformers/all-MiniLM-L6-v2) and explicitly tells LlamaIndex to use the GPU (cuda) for this process, which significantly speeds it up. The Settings.llm is set to None for now, as we only need the embedding model in this phase.

In [None]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

print("--- Configuring LlamaIndex Settings for Embedding Model ---")

# Set the embedding model to be used for converting text to vectors.
# 'all-MiniLM-L6-v2' is a popular and efficient model for this.
# 'device="cuda"' ensures the GPU is used for this computationally intensive task.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    device="cuda"
)

# We are not using a Large Language Model (LLM) in this phase,
# so we set it to None in the global settings for now.
Settings.llm = None

print("\n✅ Embedding model configured to run on CUDA.")
print("   LlamaIndex will now use 'all-MiniLM-L6-v2' for creating text embeddings.")



---



**Prepare Documents and Chunking**


---



This code takes the filtered abstracts (from df_relevant_abstracts) and breaks each one down into smaller, 150-word "chunks." Each chunk is then converted into a LlamaIndex Document object. This chunking process is important because it helps the AI pinpoint very specific pieces of information within the larger abstracts when searching for answers. A progress bar (tqdm) will show the status as it processes the documents.
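
The word-based splitting can be checked in isolation on a synthetic text. This is a small sketch of the same idea as the cell below; the chunk_by_words helper name and the 320-word dummy abstract are assumptions made for the example.

```python
def chunk_by_words(text, chunk_size=150):
    """Split text on whitespace and regroup it into chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Dummy abstract with exactly 320 words: word0 word1 ... word319
sample_abstract = " ".join(f"word{n}" for n in range(320))

chunks = chunk_by_words(sample_abstract)
chunk_lengths = [len(chunk.split()) for chunk in chunks]
print(chunk_lengths)  # → [150, 150, 20]
```

The last chunk is simply whatever remains, so short trailing chunks are expected. The chunks do not overlap, which means a sentence that straddles a boundary is split across two chunks.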

In [None]:
from llama_index.core import Document
from tqdm import tqdm # For displaying a progress bar

# This list will hold all our chunked Document objects
documents = []

# First, check if the DataFrame with relevant abstracts exists and is not empty
if 'df_relevant_abstracts' in locals() and not df_relevant_abstracts.empty:
    print(f"--- Preparing and chunking {len(df_relevant_abstracts)} relevant abstracts ---")

    # Get the list of abstract texts to process from the 'abstract' column
    texts_to_process = df_relevant_abstracts['abstract'].tolist()

    # Define the desired chunk size in words
    chunk_size_by_words = 150

    # Loop through each abstract, split it into words, and create chunks
    for text in tqdm(texts_to_process, desc="Chunking abstracts"):
        words = text.split() # Split the abstract into a list of words
        # Iterate through the words list, taking 'chunk_size_by_words' at a time
        for i in range(0, len(words), chunk_size_by_words):
            # Join the words in the current chunk back into a string
            chunk_text = " ".join(words[i:i + chunk_size_by_words]).strip()
            # Create a LlamaIndex Document object if the chunk is not empty
            if chunk_text:
                documents.append(Document(text=chunk_text))

    print(f"\n✅ Created {len(documents)} document chunks from the relevant abstracts.")

else:
    print("❌ No relevant abstracts found ('df_relevant_abstracts' is missing or empty).")
    print("   Please ensure the 'Keyword-based Filtering' step ran successfully and found some abstracts.")



---



**Create and Persist VectorStoreIndex**

---



This code snippet takes the 5915 document chunks we just created and uses the configured embedding model (all-MiniLM-L6-v2 on the GPU) to convert each chunk into a numerical vector. These vectors are then stored in a VectorStoreIndex, which is an optimized data structure that allows for very fast similarity searches. The code also "persists" (saves) this index to a specified directory (either on your Google Drive if mounted, or in Colab's temporary storage) so it can be reloaded later without rebuilding. This step will take several minutes.
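
To make "similarity search" concrete, here is a conceptual sketch of ranking stored vectors by cosine similarity to a query vector. It is not how LlamaIndex is implemented internally; the toy 3-dimensional vectors stand in for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy "embeddings" for four chunks (made-up values).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]

nearest = top_k(query, toy_vectors, k=2)
print(nearest)  # → [0, 1]
```

A brute-force scan like this is fine for a few thousand chunks; the point of persisting the index is to avoid recomputing the embeddings, which is the expensive part.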

In [None]:
from llama_index.core import VectorStoreIndex
import os # To check for Google Drive path and create directories

print("--- Creating and Persisting VectorStoreIndex ---")

# Initialize the index variable
index = None

# --- Define the directory where the index will be saved ---
# Option 1: Google Drive (Recommended for persistence)
# IMPORTANT: Change 'CORD19_Smoking_Chatbot_Index' below if you want a different folder name in your Google Drive.
# This folder will be created if it doesn't exist.
drive_persist_dir = "/content/drive/MyDrive/CORD19_Smoking_Chatbot_Index"

# Option 2: Local Colab storage (index will be lost if Colab session ends)
colab_local_persist_dir = "storage_cord19_smoking_index_colab"

# Determine the persist_dir based on whether Google Drive is mounted
persist_dir = ""
if os.path.exists("/content/drive/MyDrive/"): # Check if the base MyDrive folder exists
    persist_dir = drive_persist_dir
    print(f"Google Drive detected. Index will be saved to: {persist_dir}")
else:
    persist_dir = colab_local_persist_dir
    print("Google Drive not detected or not accessible at '/content/drive/MyDrive/'.")
    print(f"Index will be saved to local Colab storage: {persist_dir}")

# Create the directory if it doesn't exist
# This is important for both Google Drive and local Colab storage.
try:
    os.makedirs(persist_dir, exist_ok=True)
    print(f"Ensured directory exists: {persist_dir}")
except OSError as e:
    print(f"Error creating directory {persist_dir}: {e}. Please check the path and permissions.")
    # If directory creation fails, we should not proceed with saving.
    # For now, we'll let the next step potentially fail if 'documents' is empty,
    # but a more robust solution might stop here.


# Check if the 'documents' list exists and has content
if 'documents' in locals() and documents:
    print(f"\nCreating vector index from {len(documents)} document chunks...")
    print("This process will use the GPU and may take 5-15 minutes. Please be patient.")

    # This is the core step: generate embeddings and build the index.
    # LlamaIndex will use the 'Settings.embed_model' we configured previously.
    index = VectorStoreIndex.from_documents(
        documents,
        show_progress=True # Displays a progress bar
    )

    # Save the created index to the specified 'persist_dir'
    index.storage_context.persist(persist_dir=persist_dir)
    print(f"\n✅ VectorStoreIndex created and successfully saved to: {persist_dir}")

else:
    print("\n❌ No document chunks found ('documents' list is missing or empty).")
    print("   Cannot create the VectorStoreIndex. Please ensure 'Prepare Documents and Chunking' ran successfully.")



---



**Load Index from Storage**

---



This code snippet checks if a previously saved vector index exists in the specified directory (persist_dir). If the index was just created in the current session, it won't try to reload it. However, if this is a new Colab session and the index object doesn't exist yet, this code will load the index from disk (from your Google Drive or local Colab storage, depending on where it was saved). This avoids the time-consuming process of rebuilding the index every time.
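
The decision described above has three outcomes: reuse the in-memory index, load it from disk, or report that nothing is available. A minimal sketch of that branching follows, using a temporary directory in place of the real persist_dir; the decide_index_action helper and its return labels are assumptions for illustration, and no actual index is loaded.

```python
import os
import tempfile

def decide_index_action(index_in_memory, persist_dir):
    """Mirror the cell's branching: reuse, load from disk, or report missing."""
    if index_in_memory is not None:
        return "reuse"    # index was created or loaded earlier in this session
    if os.path.exists(persist_dir):
        return "load"     # new session, but a saved index directory exists
    return "missing"      # nothing in memory and nothing on disk

with tempfile.TemporaryDirectory() as saved_dir:
    missing_dir = os.path.join(saved_dir, "does_not_exist")
    actions = [
        decide_index_action(object(), saved_dir),  # stand-in for a live index object
        decide_index_action(None, saved_dir),
        decide_index_action(None, missing_dir),
    ]

print(actions)  # → ['reuse', 'load', 'missing']
```

Checking only that the directory exists is a weak test: an empty directory also passes, which is why the real cell wraps the load call in try/except.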

In [None]:
from llama_index.core import StorageContext, load_index_from_storage
import os

print("--- Attempting to Load Index from Storage ---")

# --- Ensure 'persist_dir' is defined and matches the location where the index was saved ---
# The 'persist_dir' variable should still be in memory from the previous cell.
# If you are running this cell in a new session, you might need to redefine 'persist_dir'
# to point to where your index was saved. For example:
# persist_dir = "/content/drive/MyDrive/CORD19_Smoking_Chatbot_Index"
# OR
# persist_dir = "storage_cord19_smoking_index_colab"

# Check if 'persist_dir' is defined. If not, the previous snippet was likely not run in this session.
if 'persist_dir' not in locals():
    print("❌ 'persist_dir' is not defined. This usually means the previous snippet (saving the index) was not run in this session.")
    print("   Please define 'persist_dir' to point to your saved index location, or run the previous snippet to create it.")
    # Attempt to set a default if drive is mounted, otherwise local. This is a fallback.
    if os.path.exists("/content/drive/MyDrive/"):
        persist_dir = "/content/drive/MyDrive/CORD19_Smoking_Chatbot_Index" # Default drive path
        print(f"   Attempting to use default Google Drive path: {persist_dir}")
    else:
        persist_dir = "storage_cord19_smoking_index_colab" # Default local path
        print(f"   Attempting to use default local Colab path: {persist_dir}")

print(f"Checking for existing index in: {persist_dir}")

# We try to load the index if:
# 1. The 'index' variable doesn't already exist OR it exists but is None (meaning it wasn't successfully created/loaded yet).
# 2. The 'persist_dir' (the directory where the index should be saved) actually exists.
if ('index' not in locals() or index is None) and os.path.exists(persist_dir):
    print("Found existing index directory. Attempting to load index...")
    try:
        # Prepare the storage context pointing to the directory.
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        # Load the index. This will re-populate the 'index' variable.
        index = load_index_from_storage(storage_context)
        print("✅ Index loaded successfully from storage.")
    except Exception as e:
        print(f"❌ Error loading index from storage: {e}")
        print("   The saved index might be corrupted, or the path might be incorrect.")
        print("   You might need to rebuild the index by re-running the previous snippet.")
        index = None # Ensure index is None if loading failed

elif 'index' in locals() and index is not None:
    # This case means the index was already created or loaded in the current session (e.g., by running the previous snippet).
    print("ℹ️ Index object already exists in this session (likely created in the previous snippet). No need to reload.")

else:
    # This case means 'persist_dir' does not exist, and 'index' is not already populated.
    print(f"ℹ️ No existing index found at {persist_dir}.")
    print("   If you have not built the index yet, run the 'Create and Persist VectorStoreIndex' step first.")
    print("   If you expected to load an index from a previous session, ensure 'persist_dir' is correct")
    print("   and that you successfully saved the index in that previous session.")

# Final check to see if the 'index' object is now available for use
if 'index' in locals() and index is not None:
    print("\n👍 Index object is available and ready for use.")
else:
    print("\n⚠️ Index object is NOT available. Subsequent steps requiring the index may fail.")
    print("   Please review the messages above to diagnose the issue.")

# AI Agent Development

---



**Configure the Large Language Model (LLM)**

---



This code snippet configures and loads the specific Large Language Model (LLM) that our agent will use as its "brain." We're using unsloth/llama-3-8b-Instruct-bnb-4bit, a 4-bit quantized version of Llama 3 8B, which offers a good balance of performance (speed) and quality. The code specifies the model name, tokenizer, context window size, maximum new tokens to generate, and ensures it runs on the GPU using 16-bit precision for efficiency. This configured LLM is then set as the global default for LlamaIndex.
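
The benefit of 4-bit quantization can be seen with some rough arithmetic on the memory needed for the weights alone. This is a back-of-the-envelope sketch: the nominal 8 billion parameter count is an assumption taken from the model name, and the estimate ignores quantization metadata, layers kept in higher precision, activations, and the KV cache.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_parameters * bits_per_weight / 8
    return total_bytes / 1e9

NOMINAL_PARAMS = 8e9  # assumption: "8B" read as 8 billion parameters

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # unquantized half precision
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # 4-bit quantized

print(f"16-bit weights: ~{fp16_gb:.0f} GB, 4-bit weights: ~{int4_gb:.0f} GB")
```

Under these assumptions the weights shrink by roughly a factor of four. Actual GPU memory use will be higher than the 4-bit figure for the reasons listed above.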

In [None]:
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM
import torch # For torch_dtype

print("--- Configuring the Large Language Model (LLM) ---")

# Configure and load the Unsloth Quantized Llama 3 8B model
try:
    llm = HuggingFaceLLM(
        model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
        tokenizer_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
        context_window=2048, # Max tokens the model can see in total
        max_new_tokens=256,  # Max tokens the model will generate in one answer
        device_map="auto",   # Automatically use the GPU
        model_kwargs={"torch_dtype": torch.float16}, # Use 16-bit (half) precision to reduce memory use
        generate_kwargs={"temperature": 0.7, "do_sample": True} # Controls response creativity
    )
    Settings.llm = llm # Set this as the global LLM for LlamaIndex
    print("\n✅ LLM (unsloth/llama-3-8b-Instruct-bnb-4bit) configured successfully.")
    print("   It's set as the default LLM in LlamaIndex Settings and will run on the GPU.")

except Exception as e:
    print(f"\n❌ CRITICAL ERROR: Failed to configure LLM. Error: {e}")
    print("   Troubleshooting suggestions:")
    print("     - Ensure you are logged into Hugging Face.")
    print("     - Double-check the model name for typos.")
    print("     - Ensure your Colab environment has the A100 GPU selected and enough resources.")
    Settings.llm = None # Ensure LLM is None if setup fails



---

**Assemble the RAG Query Engine**

---
This code snippet assembles the core components of our Retrieval Augmented Generation (RAG) system.

1. Retriever: It creates a retriever from our previously loaded vector index (index), configured to fetch the top 7 most relevant document chunks (similarity_top_k=7) for any query.
2. Prompt Template: It defines a specific set of instructions (qa_prompt_template_str) telling the LLM how to behave: act as a research assistant, use only the provided CORD-19 context, synthesize information, and be clear.
3. Response Synthesizer: This component takes the retrieved chunks, the user's query, and the prompt template, and uses the configured LLM to generate the final textual answer. We enable streaming=True here for a better user experience later in the UI.
4. Query Engine: Finally, it combines the retriever and response synthesizer into a RetrieverQueryEngine, which is our complete system for answering questions based on the CORD-19 data.
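
To see what the LLM actually receives, the following sketch fills a shortened version of the prompt template by hand using plain string formatting. The placeholder names {context_str} and {query_str} come from the notebook's template; the build_prompt helper, the blank-line separator between chunks, and the placeholder chunk texts are assumptions for illustration and do not reproduce how LlamaIndex packs chunks internally.

```python
# Shortened version of the notebook's template (system instructions omitted).
TEMPLATE = (
    "Provided Context (from relevant scientific abstracts):\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "User Query: {query_str}\n\n"
    "Assistant Answer: "
)

def build_prompt(retrieved_chunks, query):
    """Join the retrieved chunk texts and substitute them into the template."""
    context_str = "\n\n".join(retrieved_chunks)
    return TEMPLATE.format(context_str=context_str, query_str=query)

# Placeholder texts standing in for chunks returned by the retriever.
toy_chunks = ["[chunk 1: placeholder text]", "[chunk 2: placeholder text]"]

prompt = build_prompt(toy_chunks, "Is smoking associated with COVID-19 severity?")
print(prompt)
```

Printing the filled prompt like this is a quick way to check that the retrieved context, the instructions, and the answer budget (max_new_tokens) plausibly fit within the configured context window.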


In [None]:
from llama_index.core import Settings, PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
# 'index' and 'Settings.llm' are expected to exist from the previous steps.

print("--- Creating the Base RAG Query Engine ---")

# Initialize the engine variable
base_query_engine = None

# Ensure the index is loaded and Settings.llm is configured from previous steps
if 'index' in locals() and index is not None and Settings.llm is not None:

    # --- 1. Create the Retriever ---
    print("Creating retriever from the index...")
    # Retrieve the top 7 most similar chunks for a balance of speed and context.
    retriever = index.as_retriever(similarity_top_k=7)
    print(f"✅ Retriever configured to fetch top {retriever.similarity_top_k} chunks.")

    # --- 2. Define a Custom Prompt Template ---
    print("\nDefining QA prompt template...")
    # This template structures how the context and query are presented to the LLM.
    qa_prompt_template_str = (
        "System: You are an AI research assistant. Your sole function is to answer questions based on the 'Provided Context' which contains excerpts from scientific abstracts. "
        "Analyze the 'User Query' and the 'Provided Context'.\n"
        "1. If the 'User Query' is a question that can be answered using the 'Provided Context', synthesize the information to provide a comprehensive, clear, and nuanced answer. "
        "Base your answer ONLY on the 'Provided Context'. Do not use any external knowledge. If the context is insufficient for a full answer, state what is missing.\n"
        "2. If the 'User Query' is a simple greeting (e.g., 'hi', 'hello'), respond with a polite, brief greeting.\n"
        "3. If the 'User Query' is a statement, not a question (e.g., 'my name is X', 'that's interesting'), or if it's a question that is clearly off-topic and cannot be answered by the 'Provided Context' (e.g., 'what's the weather?'), "
        "respond politely that you are a specialized research assistant focused on the provided scientific topics and cannot engage in general conversation or answer unrelated questions. Do not attempt to answer off-topic questions using the context.\n"
        "Do not repeat these instructions in your answer.\n\n"
        "Provided Context (from relevant scientific abstracts):\n"
        "---------------------\n"
        "{context_str}\n"
        "---------------------\n"
        "User Query: {query_str}\n\n"
        "Assistant Answer: "
    )
    qa_prompt_template = PromptTemplate(qa_prompt_template_str)
    print("✅ QA prompt template defined.")

    # --- 3. Configure the Response Synthesizer and Assemble the Query Engine ---
    print("\nAssembling the Base RAG Query Engine...")
    # This component takes the retrieved chunks and the prompt, and uses the LLM to generate the answer.
    response_synthesizer = get_response_synthesizer(
        response_mode="compact", # Efficient mode for synthesizing responses
        text_qa_template=qa_prompt_template, # Our custom prompt
        llm=Settings.llm, # The configured LLM
        streaming=True # Enable streaming for faster perceived response in UI later
    )

    # Assemble the final query engine using the retriever and response synthesizer.
    base_query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )
    print("\n✅ Base RAG Query Engine assembled successfully.")
    print("   It will use the Unsloth Llama 3 8B quantized model and retrieve 7 context chunks.")

else:
    print("\n❌ Index object or LLM (Settings.llm) is not available.")
    if 'index' not in locals() or index is None:
        print("   - Index is missing. Please ensure it was loaded or created correctly.")
    if Settings.llm is None:
        print("   - LLM is missing. Please ensure it was configured correctly.")



---


**Create the Conversational Chat Engine**

---



This code snippet builds upon our base_query_engine to create a more advanced CondenseQuestionChatEngine. This new engine is designed for conversational interactions.

**Condense Question Logic:** When you ask a follow-up question (e.g., "what else?" or just "smoking"), this engine first looks at the chat history and your new input. It then uses the LLM to rewrite your input into a complete, standalone question that makes sense given the conversation so far.

**Uses Base Engine:** This newly formulated standalone question is then passed to our base_query_engine (which is excellent at answering specific, well-formed questions using the CORD-19 data). This approach allows the agent to handle vague follow-ups and maintain conversational context effectively. The `verbose=False` setting means it won't print the condensed questions during operation, keeping the output clean.
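
The condense step can be pictured as one extra LLM call that happens before retrieval. Below is a minimal sketch of that idea with a stand-in for the model; the template wording, the condense_question helper, and fake_llm are all hypothetical and are not LlamaIndex's actual default prompt or API.

```python
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up message, rewrite the "
    "follow-up as a standalone question.\n\n"
    "Chat history:\n{chat_history}\n\n"
    "Follow-up: {follow_up}\n"
    "Standalone question: "
)

def condense_question(chat_history, follow_up, llm):
    """Format the history and follow-up into one prompt and ask the LLM to rewrite it."""
    history_text = "\n".join(f"{role}: {text}" for role, text in chat_history)
    prompt = CONDENSE_TEMPLATE.format(chat_history=history_text, follow_up=follow_up)
    return llm(prompt).strip()

seen_prompts = []

def fake_llm(prompt):
    """Stand-in for a real model: records the prompt and returns a canned rewrite."""
    seen_prompts.append(prompt)
    return " What else does the literature say about smoking and COVID-19? "

history = [
    ("user", "Does smoking affect COVID-19 outcomes?"),
    ("assistant", "[previous answer]"),
]

standalone = condense_question(history, "what else?", fake_llm)
print(standalone)
```

The standalone question, not the raw follow-up, is what would then be sent to the base query engine for retrieval and answering.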

In [None]:
from llama_index.core.chat_engine import CondenseQuestionChatEngine
# This cell assumes that 'Settings' and 'base_query_engine' already exist from the previous cells.
# After a kernel restart, re-run those cells before running this one.

print("--- Creating the Conversational Chat Engine ---")

# Initialize the conversational chat engine variable
conversational_chat_engine = None

# Ensure the base_query_engine (from Assemble the RAG Query Engine) and Settings.llm are available
if 'base_query_engine' in locals() and base_query_engine is not None and Settings.llm is not None:

    try:
        # Create the CondenseQuestionChatEngine.
        # This engine uses the base_query_engine to answer the rephrased (condensed) question.
        # It manages chat history internally to understand follow-up questions.
        conversational_chat_engine = CondenseQuestionChatEngine.from_defaults(
            query_engine=base_query_engine, # The engine we built in Assemble the RAG Query Engine
            # We can customize the condense_prompt if needed, but defaults are often good.
            # For example, to see the condensed questions, you can set verbose=True
            verbose=False
        )
        print("\n✅ Conversational Chat Engine (CondenseQuestionChatEngine) created successfully.")
        print("   It will use the base RAG query engine to answer questions after rephrasing them based on chat history.")

    except Exception as e:
        print(f"\n❌ CRITICAL ERROR: Failed to create CondenseQuestionChatEngine. Error: {e}")
        print("   Ensure 'base_query_engine' was created successfully in the previous step.")

else:
    print("\n❌ Base Query Engine ('base_query_engine') or LLM (Settings.llm) is not available.")
    if 'base_query_engine' not in locals() or base_query_engine is None:
        print("   - 'base_query_engine' is missing. Please ensure the 'Assemble the RAG Query Engine' step completed successfully.")
    if Settings.llm is None: # Should have been caught in 'Assemble the RAG Query Engine', but good to check
        print("   - LLM is missing. Please ensure the 'Configure the Large Language Model (LLM)' step was successful.")

# Application Development (Gradio GUI)

---
**Structure Code for UI Application**

---

This code defines a single, crucial function called initialize_ai_system(). This function encapsulates all the setup steps required to get our AI agent ready:

1. Configuring the embedding model (from Configure LlamaIndex Settings).
2. Loading our saved vector index (from Load Index from Storage).
3. Configuring the Unsloth Llama 3 8B quantized LLM (from Configure the Large Language Model (LLM)).
4. Assembling the complete RAG query engine (retriever, prompt template, response synthesizer, from Assemble the RAG Query Engine).
5. Wrapping that query engine in the CondenseQuestionChatEngine (from Create the Conversational Chat Engine).

It also includes a simple caching mechanism (CACHED_CHAT_ENGINE) so that this entire setup process only runs once per Colab session, making subsequent uses of the agent much faster. This function is essential for a clean and efficient Gradio application.
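
The caching idea is a module-level variable that is filled on the first call and returned on every later call. Here is a stripped-down sketch of that pattern with a cheap placeholder in place of the real model loading; the names _CACHED_SYSTEM, expensive_setup, and get_system are hypothetical.

```python
_CACHED_SYSTEM = None  # filled on first successful setup
init_calls = 0         # counts how often the expensive setup actually runs

def expensive_setup():
    """Placeholder for loading models and the index; returns a dummy object."""
    global init_calls
    init_calls += 1
    return {"engine": "placeholder"}

def get_system():
    """Return the cached system, running the setup only if nothing is cached yet."""
    global _CACHED_SYSTEM
    if _CACHED_SYSTEM is None:
        _CACHED_SYSTEM = expensive_setup()
    return _CACHED_SYSTEM

first = get_system()
second = get_system()
print(first is second, init_calls)  # → True 1
```

One consequence of this design is that a failed setup which returns None leaves the cache empty, so the next call retries the whole initialization instead of returning a broken object.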


In [None]:
import os
import torch # For torch.float16
import gradio as gr
from llama_index.core import (
    Settings,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
    PromptTemplate
)
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

# --- Global Cache for the Chat Engine ---
CACHED_CHAT_ENGINE = None

def initialize_ai_system():
    """
    Initializes all AI components (embedding model, index, LLM,
    base query engine, and then the CondenseQuestionChatEngine with a custom condense prompt).
    Caches and returns the CondenseQuestionChatEngine.
    """
    global CACHED_CHAT_ENGINE

    if CACHED_CHAT_ENGINE is not None:
        print("Returning cached AI system (CondenseQuestionChatEngine).")
        return CACHED_CHAT_ENGINE

    print("--- Initializing AI System with CondenseQuestionChatEngine (Custom Condense Prompt) ---")

    # --- 1. CONFIGURE EMBEDDING MODEL ---
    print("Configuring embedding model...")
    try:
        Settings.embed_model = HuggingFaceEmbedding(
            model_name="sentence-transformers/all-MiniLM-L6-v2",
            device="cuda"
        )
        print("✅ Embedding model configured.")
    except Exception as e:
        print(f"❌ ERROR configuring embedding model: {e}")
        return None

    # --- 2. LOAD THE VECTOR INDEX ---
    print("\nLoading vector index...")
    persist_dir = ""
    drive_path = "/content/drive/MyDrive/CORD19_Smoking_Chatbot_Index"
    colab_local_path = "storage_cord19_smoking_index_colab"
    if os.path.exists(drive_path):
        persist_dir = drive_path
        print(f"   Attempting to load index from Google Drive: {persist_dir}")
    elif os.path.exists(colab_local_path):
        persist_dir = colab_local_path
        print(f"   Attempting to load index from local Colab storage: {persist_dir}")
    else:
        print(f"❌ CRITICAL ERROR: Index directory not found at expected Google Drive path ('{drive_path}') or local Colab path ('{colab_local_path}').")
        return None

    if not os.path.exists(persist_dir):
        print(f"❌ CRITICAL ERROR: Selected persist_dir ('{persist_dir}') does not exist. Cannot load index.")
        return None

    try:
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        index = load_index_from_storage(storage_context)
        print("✅ Vector index loaded successfully.")
    except Exception as e:
        print(f"❌ CRITICAL ERROR: Failed to load index from {persist_dir}. Error: {e}")
        return None

    # --- 3. CONFIGURE LLM ---
    print("\nConfiguring LLM (unsloth/llama-3-8b-Instruct-bnb-4bit)...")
    try:
        llm = HuggingFaceLLM(
            model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
            tokenizer_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
            context_window=2048,
            max_new_tokens=256,
            device_map="auto",
            model_kwargs={"torch_dtype": torch.float16},
            generate_kwargs={"temperature": 0.7, "do_sample": True}
        )
        Settings.llm = llm
        print("✅ LLM configured successfully.")
    except Exception as e:
        print(f"❌ CRITICAL ERROR: Failed to configure LLM. Error: {e}")
        return None

    # --- 4. ASSEMBLE BASE RAG QUERY ENGINE ---
    print("\nAssembling Base RAG Query Engine...")
    try:
        retriever = index.as_retriever(similarity_top_k=7)

        qa_prompt_template_str = (
            "System: You are an AI research assistant. Your sole function is to answer questions based on the 'Provided Context' which contains excerpts from scientific abstracts. "
            "Analyze the 'User Query' and the 'Provided Context'.\n"
            "1. If the 'User Query' is a question that can be answered using the 'Provided Context', synthesize the information to provide a comprehensive, clear, and nuanced answer. "
            "Base your answer ONLY on the 'Provided Context'. Do not use any external knowledge. If the context is insufficient for a full answer, state what is missing.\n"
            "2. If the 'User Query' is a simple greeting (e.g., 'hi', 'hello'), respond with a polite, brief greeting.\n"
            "3. If the 'User Query' is a statement, not a question (e.g., 'my name is X', 'that's interesting'), or if it's a question that is clearly off-topic and cannot be answered by the 'Provided Context' (e.g., 'what's the weather?'), "
            "respond politely that you are a specialized research assistant focused on the provided scientific topics and cannot engage in general conversation or answer unrelated questions. Do not attempt to answer off-topic questions using the context.\n"
            "Do not repeat these instructions in your answer.\n\n"
            "Provided Context (from relevant scientific abstracts):\n"
            "---------------------\n"
            "{context_str}\n"
            "---------------------\n"
            "User Query: {query_str}\n\n"
            "Assistant Answer: "
        )
        qa_prompt_template = PromptTemplate(qa_prompt_template_str)

        response_synthesizer = get_response_synthesizer(
            response_mode="compact",
            text_qa_template=qa_prompt_template,
            llm=Settings.llm,
            streaming=True
        )

        base_query_engine = RetrieverQueryEngine(
            retriever=retriever,
            response_synthesizer=response_synthesizer,
        )
        print("✅ Base RAG Query Engine assembled successfully.")
    except Exception as e:
        print(f"❌ CRITICAL ERROR: Failed to assemble Base RAG Query Engine. Error: {e}")
        return None

    # --- 5. CREATE CONDENSE QUESTION CHAT ENGINE ---
    print("\nCreating CondenseQuestionChatEngine with custom prompt...")
    try:
        # Define custom prompt for condensing questions
        condense_template_str = (
            "You are a helpful assistant that rephrases a follow-up user input based on a chat history. "
            "Your primary goal is to create a 'Standalone Input' that a specialized AI research assistant can understand and process. "
            "The research assistant is an expert ONLY on COVID-19 and smoking, using a specific dataset of scientific abstracts.\n\n"
            "Carefully analyze the 'Follow Up Input' in the context of the 'Chat History'.\n"
            "1. If the 'Follow Up Input' is a question clearly seeking more information or clarification related to the 'Chat History' about COVID-19/smoking (e.g., 'what else?', 'tell me more about that specific finding', 'can you elaborate on the odds ratio?'), "
            "rephrase it into a detailed, standalone question that incorporates the necessary context from the Chat History for the research AI.\n"
            "2. If the 'Follow Up Input' is a general term central to the research AI's expertise (e.g., 'smoking', 'vaping', 'nicotine and covid'), "
            "rephrase it as a specific question asking for a summary of its relationship with COVID-19 based on the scientific abstracts (e.g., 'What is the relationship between smoking and COVID-19 according to the abstracts?').\n"
            "3. If the 'Follow Up Input' is clearly a simple greeting (e.g., 'hi', 'hello'), a personal statement (e.g., 'my name is Ayse', 'I am a doctor'), or a question completely unrelated to COVID-19/smoking (e.g., 'what's the weather?', 'tell me a joke'), "
            "then the 'Standalone Input' should be EXACTLY the same as the 'Follow Up Input' without any modification or rephrasing.\n\n"
            "Chat History (summarized if long):\n"
            "{chat_history}\n\n"
            "Follow Up Input: {question}\n\n"
            "Standalone Input: "
        )
        custom_condense_prompt = PromptTemplate(condense_template_str)

        condense_chat_engine = CondenseQuestionChatEngine.from_defaults(
            query_engine=base_query_engine,
            # The keyword is `condense_question_prompt`. A misspelled keyword can be
            # absorbed by **kwargs, which would silently keep the default prompt.
            condense_question_prompt=custom_condense_prompt,
            verbose=True  # Prints the condensed standalone question for debugging.
        )
        print("✅ CondenseQuestionChatEngine created successfully with custom condense prompt.")
    except Exception as e:
        print(f"❌ CRITICAL ERROR: Failed to create CondenseQuestionChatEngine. Error: {e}")
        return None

    # --- 6. CACHE AND RETURN THE CHAT ENGINE ---
    print("\nCaching the CondenseQuestionChatEngine.")
    CACHED_CHAT_ENGINE = condense_chat_engine
    print("\n✅ AI system initialized and CondenseQuestionChatEngine cached successfully.")
    return CACHED_CHAT_ENGINE
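
The index lookup in step 2 of the function above tries the Google Drive path first and falls back to local Colab storage. As a minimal sketch of that fallback, the helper below returns the first directory that exists. The name `first_existing_dir` is hypothetical (it is not part of the notebook), and the demonstration uses temporary directories as stand-ins for the real index paths.

```python
import os
import tempfile
from typing import Iterable, Optional


def first_existing_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


# Demonstration with throwaway directories instead of the real index paths.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "drive_index")  # stands in for the Drive path
    local = os.path.join(tmp, "local_index")    # stands in for the Colab path
    os.mkdir(local)

    chosen = first_existing_dir([missing, local])
    fallback_used = chosen == local                        # Drive copy absent, local used
    nothing_found = first_existing_dir([missing]) is None  # caller should abort here
```

With a helper like this, a `None` result is the single place where the function needs to report the missing index and return.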



---
**Develop the Gradio Application**

---

This code snippet builds and launches your interactive web UI using Gradio.



1.   **`chat_response_streaming(message, history)` function:** Gradio calls this function every time a user sends a message.

        *   It first answers common casual inputs (greetings, "my name is ...", and short acknowledgements such as "thanks") with canned replies, without invoking the model.
        *   Otherwise it calls our `initialize_ai_system()` to get the (potentially cached) chat engine.
        *   It then passes the user's `message` to the engine's `stream_chat()` method.
        *   Crucially, it iterates over `response_stream.response_gen`, appends each token to the text received so far, and `yield`s that accumulated text. Gradio replaces the displayed message with each yielded value, which creates a true **streaming effect**: the UI feels much more responsive because the user sees words appear almost immediately.

2.   **Pre-initialization:** Before launching the UI, we explicitly call `initialize_ai_system()`. On a first run, this makes the potentially slow setup (**model loading, etc.**) happen before the UI link is generated, which prevents a timeout or a very slow first interaction for the user. If initialization fails, the UI is not launched.

3.   **gr.ChatInterface:** This Gradio component quickly creates a full chatbot UI. We tell it to use our `chat_response_streaming` function.
4.   **iface.launch(share=True, debug=True):** This launches the web server for the UI. `share=True` generates a public URL that you can open in your browser (and share with others for ~72 hours). `debug=True` will show any Gradio-specific errors in the Colab output.
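
The streaming step in item 1 can be illustrated without a model or a UI. The sketch below is a stand-in, not the notebook's code: `accumulate_stream` is a hypothetical helper, and `fake_tokens` replaces `response_stream.response_gen` with placeholder text. It shows why each `yield` carries the whole answer so far rather than a single token.

```python
from typing import Iterable, Iterator


def accumulate_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield the text received so far after each token arrives."""
    text = ""
    for token in tokens:
        text += token
        yield text  # each yielded value replaces the message shown in the chat


# Placeholder token stream standing in for response_stream.response_gen.
fake_tokens = ["The ", "answer ", "arrives ", "in ", "pieces."]
snapshots = list(accumulate_stream(fake_tokens))
```

Yielding only the latest token instead would make the chat window show one fragment at a time, because each yielded value overwrites the previous one.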








In [None]:
import gradio as gr
# Requires initialize_ai_system() from the "Structure Code for UI Application" cell; run that cell first.

def chat_response_streaming(message, history):
    """
    Handles a user's message, pre-filters for some casual inputs,
    gets a response from the AI system for others, and streams it back.
    """
    print(f"\nUser query for Gradio: {message}")
    normalized_message = message.strip().lower()

    # --- Simple Pre-filter for Common Casual Inputs ---
    if normalized_message in ["hi", "hello", "hey"]:
        yield "Hello there! I'm an AI assistant focused on COVID-19 and smoking. How can I help with your research today?"
        return

    if normalized_message.startswith("my name is"):
        # Slice the original (not lowercased) text by prefix length. Splitting the
        # original on the lowercase phrase fails for inputs such as "My name is Bob".
        name_part = message.strip()[len("my name is"):].strip()
        if name_part:
            name = name_part.split()[0].capitalize()
            yield f"Nice to meet you, {name}! I can assist with questions about COVID-19 and smoking. What's your query?"
        else:
            yield "Okay! I'm here to help with your research on COVID-19 and smoking."
        return

    # Handle common affirmations/closings
    common_affirmations = ["perfect", "great", "thanks", "thank you", "ok", "okay", "got it", "sounds good", "excellent"]
    if normalized_message in common_affirmations:
        yield "You're welcome! Is there anything else I can help you with regarding COVID-19 and smoking research?"
        return

    # --- End of Pre-filter ---

    # If not caught by pre-filters, proceed with the AI engine
    chat_engine_instance = initialize_ai_system()

    if not chat_engine_instance:
        yield "Error: The AI chat engine is not available. Please check the Colab notebook for errors during initialization."
        return

    try:
        response_stream = chat_engine_instance.stream_chat(message)
        accumulated_response = ""
        for token in response_stream.response_gen:
            accumulated_response += token
            yield accumulated_response

    except Exception as e:
        print(f"Error during Gradio query engine processing: {e}")
        import traceback
        traceback.print_exc()
        yield f"Sorry, an error occurred while processing your request: {str(e)}"

# --- Pre-initialize the AI system before launching the UI ---
print("Pre-initializing AI system for Gradio Interface... This might take a few minutes if it's the first run in this session.")
engine_instance = initialize_ai_system()

if engine_instance is None:
    print("CRITICAL ERROR: Could not initialize AI system for Gradio. The UI cannot be launched reliably.")
else:
    print("AI system pre-initialized successfully and is cached.")

    title = "AI Research Assistant: CORD-19 & Smoking Linkages"

    iface = gr.ChatInterface(
        fn=chat_response_streaming,
        title=title,
        description="Ask questions about the relationship between COVID-19 and smoking, based on the CORD-19 dataset. Powered by a Llama 3 8B model.",
        examples=[
            ["What is the link between smoking and COVID-19 severity?"],
            ["Does vaping affect COVID-19 outcomes?"],
            ["Are smokers more susceptible to COVID-19?"]
        ],
        chatbot=gr.Chatbot(height=600, label="Chat Conversation"),
        textbox=gr.Textbox(placeholder="Type your question here and press Enter...", container=False, scale=7, label="Your Question")
    )

    print("\nLaunching Gradio Interface... Please wait for the public URL.")
    iface.launch(share=True, debug=True)
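
The pre-filter inside `chat_response_streaming` is easier to check when it is a plain function that either returns a canned reply or returns `None` to defer to the chat engine. The sketch below is one possible factoring, not the notebook's implementation: `prefilter_reply` and the constants are hypothetical names, and the reply strings are shortened for brevity.

```python
from typing import Optional

GREETINGS = {"hi", "hello", "hey"}
AFFIRMATIONS = {"perfect", "great", "thanks", "thank you", "ok", "okay",
                "got it", "sounds good", "excellent"}
NAME_PREFIX = "my name is"
TOPIC = "COVID-19 and smoking"


def prefilter_reply(message: str) -> Optional[str]:
    """Return a canned reply for casual input, or None to defer to the chat engine."""
    stripped = message.strip()
    normalized = stripped.lower()

    if normalized in GREETINGS:
        return f"Hello there! I'm an AI assistant focused on {TOPIC}."

    if normalized.startswith(NAME_PREFIX):
        # Slice the original text so the typed name is preserved.
        words = stripped[len(NAME_PREFIX):].split()
        if words:
            return f"Nice to meet you, {words[0].capitalize()}!"
        return f"Okay! I'm here to help with your research on {TOPIC}."

    if normalized in AFFIRMATIONS:
        return "You're welcome!"

    return None  # not casual input: let the RAG chat engine answer
```

A function like this can be exercised in a plain Python cell, with no GPU and no Gradio server, before it is wired into the streaming handler.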