# Generative AI Capstone Project

**Title:** Simulating Generative AI Capabilities: Document Understanding + Controlled Output + Few-Shot Prompting  
**Author:** Shilan Rashidian

**Date:** April 18, 2025

## Overview

This project demonstrates key capabilities of Generative AI using local simulation methods due to limitations in accessing external APIs (e.g., Google Gemini) within the Kaggle Notebook environment.

### Covered Capabilities:
1. Document Understanding  
2. Controlled Generation (Structured Output)  
3. Few-Shot Prompting (Simulated)

We will process a document, extract meaningful answers to questions, format responses as JSON, and simulate how few-shot examples improve model accuracy.
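As a rough illustration of the last two capabilities, the sketch below assembles a few-shot prompt and checks that a reply has the expected JSON shape. Everything in it is an assumption made for illustration: the function names `build_few_shot_prompt` and `parse_structured_reply`, the `answer` / `found_in_document` keys, and the example questions are hypothetical and are not taken from the notebook's code.

```python
import json

# Keys every structured reply is expected to contain (an assumption for this sketch).
REQUIRED_KEYS = {"answer", "found_in_document"}

# Hypothetical worked examples; in practice these would be written against the real document.
EXAMPLES = [
    {
        "question": "What is a tensor?",
        "response": {"answer": "The basic data structure in TensorFlow.", "found_in_document": True},
    },
    {
        "question": "Who won the 2010 World Cup?",
        "response": {"answer": "", "found_in_document": False},
    },
]


def build_few_shot_prompt(examples, question):
    """Show worked question/JSON pairs before the new question."""
    parts = ["Answer using only the document. Reply with a JSON object."]
    for example in examples:
        parts.append(f"Q: {example['question']}")
        parts.append(f"A: {json.dumps(example['response'])}")
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


def parse_structured_reply(reply_text):
    """Parse a reply and verify it is a JSON object with the required keys."""
    data = json.loads(reply_text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("Reply must be a JSON object.")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return data
```

The prompt string would be sent to whichever model is in use, and the reply passed through `parse_structured_reply` so that malformed output is rejected early rather than propagated.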


## 📄 Enhanced PDF Document Understanding with Conversational Interface

This notebook demonstrates an interactive system for understanding PDF documents using LangChain and Google Generative AI (Gemini).

**Functionality:**
1.  **Upload PDF:** Allows you to upload a PDF document.
2.  **Process Document:** Extracts text, splits it into manageable chunks, and creates vector embeddings.
3.  **Conversational Q&A:** Enables you to ask questions about the document content. The system uses the document context and remembers the conversation history.
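
To make the "manageable chunks" step concrete, here is a minimal sketch of fixed-size splitting with overlap, using the sizes that `load_pdf` passes to the splitter later in this notebook (1000 characters, 150 overlap). It is a simplified stand-in: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this sketch does not do. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=150):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.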

**Note:** This notebook requires a Google API Key configured in Kaggle Secrets (or as an environment variable) to use Google's Generative AI models.

### 1. Install Dependencies

In [1]:
# Upgrade pip and install all required libraries together for better dependency resolution
!pip install --upgrade -q pip
!pip install -U -q langchain langchain-core langchain-community langchain-google-genai google-generativeai pypdf chromadb tiktoken ipywidgets


### 2. Setup Google API Key

Load the Google API Key from Kaggle Secrets and set it as an environment variable. LangChain components will automatically detect and use this key.

In [13]:
import os
import warnings

try:
    from kaggle_secrets import UserSecretsClient
    GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
    # Export the key so LangChain components can detect it automatically
    os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
    print("✅ Google API Key loaded successfully from Kaggle Secrets.")
except ImportError:
    print("🔑 Kaggle Secrets not available. Checking for GOOGLE_API_KEY environment variable.")
    # Fall back to the environment; .get() returns None when the variable is unset
    GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
    if GOOGLE_API_KEY:
        print("✅ Google API Key found in environment variables.")
    else:
        warnings.warn("🛑 Google API Key not found in Kaggle Secrets or environment variables. AI features will fail.")
except Exception as e:
    warnings.warn(f"❌ Failed to load Google API Key: {e}. AI features may fail.")
    GOOGLE_API_KEY = None

✅ Google API Key loaded successfully from Kaggle Secrets.
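
The lookup order described above (Kaggle Secrets first, then an environment variable) can also be written as one small function, which makes the fallback easier to test outside Kaggle. This is a sketch under assumptions: `resolve_api_key` and its `secret_lookup` parameter are hypothetical names, and on Kaggle the lookup would be something like `UserSecretsClient().get_secret`.

```python
import os


def resolve_api_key(name="GOOGLE_API_KEY", secret_lookup=None, environ=None):
    """Return the key from secret_lookup if possible, else from the environment."""
    environ = os.environ if environ is None else environ
    if secret_lookup is not None:
        try:
            key = secret_lookup(name)
            if key:
                return key
        except Exception:
            # The secret store is unavailable or the secret is not defined;
            # fall through to the environment variable.
            pass
    return environ.get(name) or None
```

Returning `None` when nothing is found lets callers such as `create_retriever` check for a missing key before calling the API.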


### 3. Import Libraries & Define Helper Functions

In [14]:
import tempfile
from IPython.display import display, Markdown
import ipywidgets as widgets
import os # Ensure os is imported

# LangChain components - Using updated/correct import paths
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# --- Helper Functions ---

def load_pdf(file_path):
    """Loads and splits the PDF document into chunks."""
    try:
        loader = PyPDFLoader(file_path)
        docs = loader.load()
        if not docs:
            print("⚠️ Warning: No text could be extracted from the PDF.")
            return []
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
        split_docs = splitter.split_documents(docs)
        print(f"📄 PDF loaded and split into {len(split_docs)} chunks.")
        return split_docs
    except Exception as e:
        print(f"❌ Error loading or splitting PDF: {e}")
        return None

def create_retriever(docs):
    """Creates embeddings and a vector store retriever."""
    # Use the API key loaded in the setup cell
    api_key = GOOGLE_API_KEY

    if not api_key:
        print("❌ Cannot create retriever: Google API Key is missing.")
        return None
    if not docs:
        print("❌ Cannot create retriever: No documents provided.")
        return None

    try:
        # Explicitly pass the API key here
        embedding = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
            google_api_key=api_key # <-- Explicitly pass the key
        )
        print("⏳ Creating Chroma vector store (this may take a moment)...")
        vectordb = Chroma.from_documents(
            documents=docs,
            embedding=embedding
        )
        retriever = vectordb.as_retriever(search_kwargs={'k': 5}) # Retrieve top 5 relevant chunks
        print("✅ Vector store and retriever created successfully.")
        return retriever
    except Exception as e:
        # Catch potential errors during embedding or Chroma creation
        print(f"❌ Error creating retriever: {e}")
        # You might want to print more details for debugging:
        # import traceback
        # traceback.print_exc()
        return None

def setup_qa_chain(retriever):
    """Sets up the conversational Q&A chain with memory."""
    api_key = GOOGLE_API_KEY
    if not api_key:
        print("❌ Cannot setup QA chain: Google API Key is missing.")
        return None
    if not retriever:
        print("❌ Cannot setup QA chain: Retriever is not available.")
        return None
    try:
        # Pass the key explicitly to the chat model too for consistency
        model = ChatGoogleGenerativeAI(
            model="gemini-1.5-flash-latest",
            temperature=0.2,
            convert_system_message_to_human=True,
            google_api_key=api_key
        )
        memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
        qa_chain = ConversationalRetrievalChain.from_llm(
            llm=model,
            retriever=retriever,
            memory=memory,
            verbose=False
        )
        print("✅ Conversational Q&A chain is ready.")
        return qa_chain
    except Exception as e:
        print(f"❌ Error setting up QA chain: {e}")
        return None

print("Libraries imported and helper functions defined (create_retriever updated).")

Libraries imported and helper functions defined (create_retriever updated).
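
For intuition about what the retriever returned by `create_retriever` does with `k=5`, the sketch below ranks chunk vectors by similarity to a query vector and keeps the best `k`. It is an illustration only: cosine similarity is used here as one common choice, while the distance metric Chroma actually applies depends on how the collection is configured, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(chunk_vecs)
    ]
    # Highest similarity first; sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

In the real pipeline the vectors come from the embedding model, and the text of the selected chunks is what gets passed to the chat model as context.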


### 4. Upload PDF Document

In [4]:
# Create and display the file uploader widget
uploader = widgets.FileUpload(
    accept='.pdf',  # Only accept PDF files
    multiple=False, # Allow only one file upload
    description='Upload PDF'
)
display(uploader)

FileUpload(value=(), accept='.pdf', description='Upload PDF')

### 5. Process Uploaded PDF and Initialize Chat System

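Run this cell to process the PDF and set up the Q&A chain. As written, the cell reads the file from a fixed Kaggle dataset path (`kaggle_file_path`) rather than from the upload widget above, so edit that path to point at your own document.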

In [15]:
import os # Ensure os is imported

# --- Specify the path to your PDF file within the Kaggle environment ---
kaggle_file_path = "/kaggle/input/test-pdf/Learning_Tenserflow_buliding_deep.pdf"
# ---------------------------------------------------------------------

# Initialize state variables
pdf_docs = None
retriever = None
qa_chain = None
file_path = None # Keep track of the path being used

print(f"Attempting to process file: {kaggle_file_path}")

# Check if the file exists
if os.path.exists(kaggle_file_path):
    file_path = kaggle_file_path # Set the global file_path variable

    # --- Direct Processing Logic ---
    try:
        # 1. Load and split the PDF
        print("⏳ Loading and splitting PDF...")
        pdf_docs = load_pdf(file_path) # Use the helper function defined earlier

        if pdf_docs:
            # 2. Create the retriever
            print("⏳ Creating vector store and retriever...")
            retriever = create_retriever(pdf_docs) # Use the helper function

            if retriever:
                # 3. Setup the QA chain
                print("⏳ Setting up conversational Q&A chain...")
                qa_chain = setup_qa_chain(retriever) # Use the helper function

                if qa_chain:
                    print("\n✅ Chat system is ready! You can now ask questions in the next cell.")
                else:
                    print("\n❌ Failed to initialize the chat system after creating retriever.")
            else:
                print("\n❌ Failed to initialize the chat system because retriever creation failed.")
        else:
            print("\n❌ Failed to initialize the chat system because PDF processing failed.")

    except Exception as e:
        print(f"❌ An error occurred during file processing: {e}")
    # No finally block needed here for temp file cleanup as we are using a direct path
else:
    print(f"❌ Error: File not found at the specified path: {kaggle_file_path}")
    print("Please ensure the path is correct and the dataset is added to the notebook.")

Attempting to process file: /kaggle/input/test-pdf/Learning_Tenserflow_buliding_deep.pdf
⏳ Loading and splitting PDF...
📄 PDF loaded and split into 603 chunks.
⏳ Creating vector store and retriever...
⏳ Creating Chroma vector store (this may take a moment)...
✅ Vector store and retriever created successfully.
⏳ Setting up conversational Q&A chain...
✅ Conversational Q&A chain is ready.

✅ Chat system is ready! You can now ask questions in the next cell.




### 6. Start Chatting!

Run the cell below to start an interactive chat session. Ask questions about the PDF processed in the previous step. Type `exit` to end the chat.

In [None]:
import time
# os should be imported already, but ensure it is if running cells independently
import os

# Check if qa_chain was successfully created in the previous step
if 'qa_chain' not in globals() or qa_chain is None:
    print("⚠️ Chat system is not ready. Please check the output of the previous cell for errors.")
    if 'file_path' not in globals() or not file_path:
        print("   Reason: PDF file path was not set or the file was not found/processed.")
    elif 'pdf_docs' not in globals() or not pdf_docs:
        print("   Reason: PDF document loading/splitting failed.")
    elif 'retriever' not in globals() or not retriever:
        print("   Reason: Vector store retriever creation failed (check API key and embeddings).")
    else:
        # Every earlier stage succeeded, so the chain setup itself must have failed
        print("   Reason: Conversational chain setup failed (check model initialization).")

else:
    print(f"💬 Starting chat session about '{os.path.basename(file_path)}'. Type 'exit' to quit.")
    print("---")
    while True:
        try:
            question = input("👤 You: ")
            if question.strip().lower() == "exit":
                print("\n👋 Goodbye!")
                break
            if not question.strip():
                continue

            start_time = time.time()
            # Invoke the chain
            result = qa_chain.invoke({"question": question})
            end_time = time.time()

            # Print the answer
            print(f"\n🤖 Assistant ({end_time - start_time:.2f}s):")
            # Display the answer using Markdown for better formatting potential
            display(Markdown(result['answer']))
            print("---")

        except EOFError:
            # Handle abrupt termination if running in certain environments
            print("\n👋 Session ended unexpectedly.")
            break
        except Exception as e:
            print(f"\n❌ An error occurred during chat: {e}")
            # Optional: break the loop on error, or allow user to continue
            # break

# No temporary file cleanup needed as we used a direct path
print("\nChat session finished.")

💬 Starting chat session about 'Learning_Tenserflow_buliding_deep.pdf'. Type 'exit' to quit.
---


👤 You:  Hi, what is this about?

🤖 Assistant (3.07s):


This text is an index and excerpts from a book about TensorFlow, a deep learning framework.  The excerpts cover topics such as:

* **TensorFlow's capabilities:**  Including using pre-trained models and utilities.
* **Image captioning:**  A deep learning application focusing on generating natural language descriptions for images.
* **TensorFlow Serving:**  A system for deploying and serving TensorFlow models.
* **Tensors:**  The fundamental data structures in TensorFlow, including their attributes, data types, and manipulation.
* **Deep learning concepts:**  Such as backpropagation, word embeddings, and various model architectures (RNNs, autoencoders).
* **Practical examples and code snippets:** Hints at the inclusion of practical examples using TensorFlow.

The overall subject is a guide to using TensorFlow for deep learning, with a focus on practical applications and implementation details.

---
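
The chat loop above relies on the chain's memory to handle follow-up questions. As a simplified picture of that idea, the sketch below stores past turns and folds them into a prompt that asks for a standalone version of a follow-up. It is not LangChain's implementation: `ChatHistory`, `build_condense_prompt`, and the prompt wording are hypothetical and only approximate what a conversational retrieval chain does before it queries the retriever.

```python
class ChatHistory:
    """Keeps (question, answer) pairs in the order they occurred."""

    def __init__(self):
        self.turns = []

    def add_turn(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self, max_turns=None):
        """Render the most recent turns as plain text (all turns by default)."""
        turns = self.turns
        if max_turns is not None:
            turns = turns[max(0, len(turns) - max_turns):]
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


def build_condense_prompt(history, follow_up):
    """Ask the model to rewrite a follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up "
        "as a standalone question.\n\n"
        f"{history.as_text()}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )
```

A buffer like this grows with every turn, so long sessions eventually need trimming (for example via `max_turns`) to stay within the model's context window.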


## Conclusion

This notebook demonstrated how to build a conversational interface for PDF documents using LangChain and Google Generative AI. Key steps included:

1.  Setting up the environment and API keys.
2.  Loading and processing PDF documents (`PyPDFLoader`, `RecursiveCharacterTextSplitter`).
3.  Creating vector embeddings and a retriever (`GoogleGenerativeAIEmbeddings`, `Chroma`).
4.  Building a conversational chain with memory (`ChatGoogleGenerativeAI`, `ConversationBufferMemory`, `ConversationalRetrievalChain`).
5.  Providing an interactive chat interface.

This serves as a foundation for more advanced document interaction applications.