# 1. Install and Import Required Libraries
In this section, we will install and import the necessary libraries for working with FAISS, OpenAI, and PDF processing.

- **faiss**: For creating and querying the vector store
- **openai**: For generating embeddings and chatting with the model
- **numpy**: For handling numerical data
- **PyPDF2**: For reading and processing PDF documents

You may need to run the installation cell below if these packages are not already installed.

In [None]:
# Install required packages (skip this cell if they are already installed)
!pip install faiss-cpu openai numpy PyPDF2

In [None]:
# Import required libraries
import faiss
from openai import OpenAI
import numpy as np
from PyPDF2 import PdfReader

# 2. Set Up OpenAI API Key in Google Colab

To use OpenAI's services securely in Google Colab, we'll store the API key in Colab's secrets manager. This is safer than hardcoding the key in the notebook, where it could be shared or committed by accident.

1. Click on the 'key' icon in the left sidebar to open the Secrets manager
2. Add a new secret, turn on its notebook access toggle, and fill in:
   - Name: `OPENAI_API_KEY`
   - Value: Your OpenAI API key from https://platform.openai.com/account/api-keys
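
When the notebook runs outside Colab, the secrets manager is not available. The following is a minimal sketch, not part of the original notebook, of a lookup that tries Colab secrets first and then falls back to an environment variable with the same name. The helper name `load_openai_api_key` is hypothetical.

```python
import os

def load_openai_api_key(secret_name="OPENAI_API_KEY"):
    """Return the key from Colab secrets if possible, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not in Colab, secret missing, or notebook access not granted
        pass

    # Fallback (an assumption of this sketch): an environment variable of the same name
    key = os.environ.get(secret_name)
    if not key:
        raise RuntimeError(
            f"No API key found: set the {secret_name} secret or environment variable."
        )
    return key
```

The returned string can be passed to `OpenAI(api_key=...)` in the same way as the value read from Colab secrets.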

The next cell will retrieve the key from Colab's secrets.

In [None]:
# Set up OpenAI API key from Google Colab secrets
from google.colab import userdata  # Colab-only module that exposes the secrets manager

try:
    api_key = userdata.get('OPENAI_API_KEY')
    client = OpenAI(api_key=api_key)
    print("OpenAI client successfully initialized with API key from Colab secrets!")
except Exception as e:
    print(f"Error: Could not load OpenAI API key from Colab secrets ({e}).")
    print("Please make sure you've added your API key to the Colab secrets manager.")
    print("Instructions are in the markdown cell above.")

# 3. Generate Embeddings Using OpenAI SDK
We will use OpenAI's embedding model to convert PDF document content into numerical vectors. These embeddings will be used to populate our FAISS vector store.

First, we'll read and process a PDF file, splitting it into chunks, and then generate embeddings for each chunk.
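
Generating one embedding per request is the simplest approach, but the chunks can also be sent in groups to cut down on round trips. The sketch below shows only the batching logic. `embed_in_batches`, `batched`, and the `embed_batch` callable are hypothetical names, and `fake_embed_batch` is a stand-in so the example runs offline.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_batch, batch_size=64):
    """Embed `texts` batch by batch, preserving input order.

    `embed_batch` is a hypothetical callable: list of strings in, list of vectors out.
    """
    vectors = []
    for batch in batched(texts, batch_size):
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch returned the wrong number of vectors")
        vectors.extend(batch_vectors)
    return vectors

# Stand-in embedder for offline use: maps each text to [character count, word count]
def fake_embed_batch(batch):
    return [[float(len(t)), float(len(t.split()))] for t in batch]

demo_vectors = embed_in_batches(
    ["a b", "hello", "one two three"], fake_embed_batch, batch_size=2
)
print(demo_vectors)
```

With the notebook's `client`, `embed_batch` could wrap `client.embeddings.create(model=..., input=batch)` and return `[item.embedding for item in response.data]`. The endpoint accepts a list of strings, but per-request size limits apply and are worth checking in the API reference before choosing `batch_size`.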

In [None]:
# Function to read PDF and split into chunks
def read_pdf_and_split(file_path, chunk_size=1000):
    reader = PdfReader(file_path)
    chunks = []
    current_chunk = ""

    # Extract text from each page
    for page in reader.pages:
        text = page.extract_text() or ""  # pages without extractable text yield nothing
        words = text.split()

        # Create chunks of approximately chunk_size characters
        for word in words:
            if len(current_chunk) + len(word) + 1 <= chunk_size:
                current_chunk += word + " "
            else:
                if current_chunk:  # avoid an empty chunk when the first word exceeds chunk_size
                    chunks.append(current_chunk.strip())
                current_chunk = word + " "

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

# Function to generate embeddings
def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return response.data[0].embedding

# Read and process the PDF file
pdf_path = "ncert_physics.pdf"  # Replace with your PDF file path
chunks = read_pdf_and_split(pdf_path)
print(f"Document split into {len(chunks)} chunks")

# Generate embeddings for each chunk
chunk_embeddings = []
for chunk in chunks:
    embedding = get_embedding(chunk)
    chunk_embeddings.append(embedding)

print("Embeddings generated successfully")

# 4. Create and Populate FAISS Vector Store
Now that we have an embedding for each chunk, we will create a FAISS index and add these embeddings to it. `IndexFlatL2` performs an exact (brute-force) search using L2 distance, which lets us look up the chunks most similar to a query.
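
`IndexFlatL2` reports squared Euclidean distances. If the embeddings are unit-length (worth verifying for the model in use), ranking by squared L2 distance gives the same order as ranking by cosine similarity, because for unit vectors the squared distance equals 2 minus twice the cosine. The following pure-Python sketch checks that identity on two small vectors; the helper names are illustrative and not part of FAISS.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
lhs = squared_l2(a, b)
rhs = 2 - 2 * cosine_similarity(a, b)
print(math.isclose(lhs, rhs))
```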

In [None]:
# Create FAISS index and add embeddings
embedding_dim = len(chunk_embeddings[0])
index = faiss.IndexFlatL2(embedding_dim)

# Convert embeddings to numpy array and add to index
embeddings = np.array(chunk_embeddings).astype('float32')
index.add(embeddings)

print(f"FAISS index created with {index.ntotal} vectors")

# 5. Query the FAISS Index
Next, we query the FAISS index: we take a question, generate its embedding with the same model used for the chunks, and retrieve the `k` nearest chunks from the vector store.
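
To make the search step concrete, here is a small pure-Python stand-in for what an exact L2 search computes: the squared distance from the query to every stored vector, followed by selection of the `k` smallest. This illustrates the idea only and is not how FAISS is implemented; `search_flat_l2` is a hypothetical helper.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_flat_l2(vectors, query, k):
    """Return (distances, indices) of the k nearest vectors, closest first."""
    scored = [(squared_l2(vec, query), i) for i, vec in enumerate(vectors)]
    nearest = heapq.nsmallest(k, scored)  # returns fewer than k if the store is smaller
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices

stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
demo_distances, demo_indices = search_flat_l2(stored, [0.9, 0.1], k=2)
print(demo_indices)
```

Unlike this sketch, FAISS pads its result arrays with index `-1` when fewer than `k` vectors are available, so code that consumes the indices should skip that value.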

In [None]:
# Query the FAISS index
query_text = "What is the physical significance of the electric field?"  # Replace with your question
query_embedding = np.array([get_embedding(query_text)]).astype('float32')

k = 3  # Number of nearest chunks to retrieve
distances, indices = index.search(query_embedding, k)

print("Query:", query_text)
print("\nMost relevant chunks:")
for i, (dist, idx) in enumerate(zip(distances[0], indices[0])):
    if idx == -1:  # FAISS returns -1 when fewer than k vectors are available
        continue
    print(f"\nChunk {i+1} (squared L2 distance: {dist:.2f}):")
    print(chunks[idx])

# 6. Integrate Chat with OpenAI Using Retrieved Context
We can pass the chunks retrieved from the FAISS index as context in a chat request to OpenAI's language model. Grounding the prompt in the document text helps the model answer from the source material instead of relying only on what it learned in training.
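
With a larger `k` or longer chunks, the combined context can grow beyond what is useful to send. The sketch below, which is an addition and not part of the original notebook, joins chunks in rank order until a character budget is reached and then builds the message list. `build_context`, `build_messages`, and the `max_chars` default are hypothetical choices.

```python
def build_context(retrieved_chunks, max_chars=4000, separator="\n\n"):
    """Join chunks in rank order, stopping before the character budget is exceeded."""
    parts = []
    used = 0
    for chunk in retrieved_chunks:
        extra = len(chunk) + (len(separator) if parts else 0)
        if used + extra > max_chars:
            break
        parts.append(chunk)
        used += extra
    return separator.join(parts)

def build_messages(question, retrieved_chunks, max_chars=4000):
    """Assemble system and user messages for a chat completion request."""
    context = build_context(retrieved_chunks, max_chars)
    system = (
        "You are a helpful assistant that answers questions "
        "based on the provided document context."
    )
    user = f"Context from the document:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

demo_messages = build_messages(
    "What is an electric field?", ["aaaa", "bbbb", "cccc"], max_chars=10
)
print(demo_messages[1]["content"])
```

A character budget is a rough proxy for the model's token limit. Counting tokens would be more precise, at the cost of an extra dependency.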

In [None]:
# Use retrieved chunks as context for chat
# Skip -1 entries, which FAISS returns when fewer than k vectors are available
context_chunks = "\n\n".join([chunks[i] for i in indices[0] if i != -1])

chat_prompt = f"""Context from the document:\n{context_chunks}\n\nBased on the above context from the document, please answer this question: {query_text}\nAnswer:"""

# Chat model to use (replace with any chat model available to your account)
model = "gpt-4o"
print(f"Using model: {model}")

response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided document context."},
        {"role": "user", "content": chat_prompt}
    ]
)

print("Chatbot response:")
print(response.choices[0].message.content)