# **InterviewRAG: A Context-Aware Interview Preparation Assistant**

## **Project Overview**

**InterviewRAG** is a Retrieval-Augmented Generation (RAG) assistant designed to help users prepare for job interviews by asking and answering questions based on **uploaded interview preparation PDFs**. It provides conversational, context-grounded answers powered entirely by **local open-source models**, offering both **privacy** and **zero API cost**.

The assistant expands vague questions, retrieves the most relevant context, and generates helpful answers using an instruction-tuned LLM—all while tracking the conversation to enhance memory and flow. It's ideal for students, job seekers, or professionals looking to practice or refine their interview readiness.

> **Note on Output Visibility**
>
> Some output cells have been intentionally cleared to avoid rendering issues across platforms such as **Kaggle** and **GitHub**.
> 
> This includes:
>
> - Model pipeline and LLM initialization 
> - Embedding generation and vectorstore creation
> 
> All code executes correctly, and full results are reproducible in local or Colab environments.

## **Objectives**

- Upload an interview preparation PDF and split it into semantic chunks.  
- Create a persistent vectorstore for fast semantic retrieval.  
- Automatically rewrite user questions for better relevance.  
- Return answers grounded **only** in the PDF content using a custom prompt.  
- Maintain short-term conversation memory using a buffer.  
- Operate fully offline using local models for both LLM and embeddings.

## **Core Components**

- **Document Loader**: `PyPDFLoader` for parsing interview preparation PDFs.  
- **Text Chunking**: `RecursiveCharacterTextSplitter` to break content into useful segments.  
- **Embeddings**: `all-MiniLM-L6-v2` via `HuggingFaceEmbeddings` for fast vector search.  
- **Vector Store**: `Chroma` for storing and retrieving document chunks efficiently.  
- **LLM (Answering + Query Expansion)**: `mistralai/Mistral-7B-Instruct-v0.1` for answering and refining questions.  
- **Memory**: Lightweight `ConversationBufferMemory` to hold short-term interaction history.  
- **RAG Chain**: LangChain `ConversationalRetrievalChain` to combine context, memory, and the LLM response.

**InterviewRAG** demonstrates how Retrieval-Augmented Generation, prompt engineering, and memory can be combined with local models to build a smart, cost-effective, and personalized interview prep assistant.

#### **Environment Setup & Dependency Installation**

Install core libraries including LangChain, Hugging Face Transformers, ChromaDB, and document parsing utilities required for Retrieval-Augmented Generation workflows.

In [None]:
!pip install -q langchain langchain-huggingface chromadb huggingface_hub sentence-transformers
!pip install -q langchain-community langchainhub pypdf

#### **Import Core Libraries**

Import standard libraries, LangChain modules, document loaders, memory, prompt tools, and evaluation utilities necessary for building the RAG system.

In [2]:
import os
from huggingface_hub import login
from google.colab import files
from langchain.vectorstores import Chroma
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
import warnings

warnings.filterwarnings("ignore")

#### **Hugging Face Hub Authentication**

Authenticate with the Hugging Face Hub to access pre-trained models for language generation, embeddings, and summarization.

In [None]:
login()

#### **Load and Configure Mistral-7B-Instruct**

Load the Mistral-7B-Instruct model using `transformers.pipeline` for natural language generation. Set up the tokenizer, model, and parameters for conversational output.

In [None]:
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True
)

llm = HuggingFacePipeline(pipeline=text_gen_pipeline)

#### **Initialize Embedding Model (MiniLM)**

Use `all-MiniLM-L6-v2` from Hugging Face as the embedding model for semantic vector search over document chunks.

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

#### **Set Up Conversational Memory Buffer**

Initialize `ConversationBufferMemory` to retain recent dialogue history. This enables short-term memory during multi-turn interactions and improves contextual relevance of responses.

In [7]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
    k=5
)

#### **Upload and Chunk PDF into Document Segments**

Define a utility to upload a single interview preparation PDF. Load its content and split it into semantically meaningful chunks using `RecursiveCharacterTextSplitter`.

In [8]:
def load_and_chunk_uploaded_pdf(chunk_size=500, chunk_overlap=100):
    uploaded = files.upload()
    pdf_path = list(uploaded.keys())[0]
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_documents(documents)

#### **Create Persistent Vectorstore from PDF Chunks**

Store the document chunks in a Chroma vector database for efficient similarity search. Persist the vectorstore locally for future reuse.

In [9]:
def create_vectorstore(docs, embedding_model, persist_directory="interviewrag_chroma"):
    return Chroma.from_documents(
        documents=docs,
        embedding=embedding_model,
        persist_directory=persist_directory,
        collection_name="interviewrag-docs"
    )

#### **PDF Upload + Vectorstore Preparation Workflow**

Upload the PDF, split it into chunks, and build the vectorstore in one step. This function returns a ready-to-query Chroma instance for downstream use.

In [10]:
def upload_and_prepare_vectorstore(embedding_model):
    print("📂 Please upload your interview PDF...")
    chunks = load_and_chunk_uploaded_pdf()
    print(f"✅ Loaded and split {len(chunks)} chunks.")
    vectorstore = create_vectorstore(chunks, embedding_model)
    print("📦 Vectorstore created successfully.")
    return vectorstore

#### **Query Expansion using LLM Reasoning**

Use the LLM to rewrite vague or brief interview questions into more specific and context-rich queries. This improves the relevance of retrieved chunks.

In [11]:
def expand_query(llm, user_input):
    prompt = f"Rewrite this interview question to be more specific and detailed: {user_input}"
    return llm.invoke(prompt)

#### **Define RAG Prompt Template for Interview Questions**

Create a structured prompt to guide the LLM in generating grounded, professional answers. Enforce constraints to avoid hallucinations and maintain alignment with the provided context.

In [12]:
QA_TEMPLATE = """
You are a helpful AI assistant tasked with answering interview-related questions using only the provided context.

Question:
{question}

Relevant Context:
{context}

Instructions:
- Base your answer only on the context above.
- Do not make up information beyond the context.
- If the answer is not available, say:
  "I'm sorry, the provided materials do not include an answer to this specific question."
- Provide a helpful, friendly, and well-structured answer that is informative but not too lengthy.
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(QA_TEMPLATE)

#### **Create Conversational Retrieval-Augmented QA Chain**

Build a LangChain `ConversationalRetrievalChain` that integrates the LLM, retriever, memory, and custom prompt template. Configure it to return both answers and source documents.

In [13]:
def create_rag_chain(llm, retriever, memory):
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        return_source_documents=True,
        combine_docs_chain_kwargs={"prompt": QA_CHAIN_PROMPT},
        output_key="answer",
        verbose=True
    )

#### **InterviewRAG Conversational Chat Function**

Define the main chat loop that handles user interaction. It expands interview questions using the LLM, retrieves relevant document context from the vectorstore, and generates answers using a RAG chain with memory support.

In [14]:
def interview_rag_chat(llm, vectorstore, memory):

    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    rag_chain = create_rag_chain(llm, retriever, memory)

    while True:
        user_input = input("📝 Ask your interview question (or type 'exit'):\n> ")
        if user_input.lower().strip() == "exit":
            print("👋 Session ended.")
            break

        expanded = expand_query(llm, user_input)
        result = rag_chain.invoke(expanded)
        print("\n🤖 Answer:\n", result["answer"], "\n")

#### **Upload and Initialize Vectorstore from Interview PDF**

Trigger the upload of the PDF, process it into chunks, and create a persistent Chroma vectorstore for use in retrieval. This prepares the system for question-answering.

In [None]:
vectorstore = upload_and_prepare_vectorstore(embedding_model)

#### **Start the InterviewRAG Chat Interface**

Launch the interactive chat session using the initialized LLM, memory buffer, and vectorstore. The assistant answers user questions in real time with document-grounded responses.

In [16]:
interview_rag_chat(llm, vectorstore, memory)

📝 Ask your interview question (or type 'exit'):
> What should I do before going to an interview?


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
You are a helpful AI assistant tasked with answering interview-related questions using only the provided context.

Question:
Rewrite this interview question to be more specific and detailed: What should I do before going to an interview?

To be more specific and detailed, the interview question could be: Could you please provide me with a comprehensive list of steps that I should take before going to an interview, including any relevant research, preparation of responses to common interview questions, and any other necessary actions to ensure that I make a positive first impression?

Relevant Context:
▪ Demonstrate enthusiasm and interest by making eye contact, smiling, and a firm handshake. 
▪ Be courteous to everyone you meet; others not directly in the i

## **Final Thoughts**

**InterviewRAG** demonstrates the practical application of Retrieval-Augmented Generation (RAG) for focused, context-aware interview preparation—built entirely with local, open-source tools.

By combining **Mistral-7B-Instruct** for question answering and query refinement, **MiniLM embeddings** for efficient semantic retrieval, and a lightweight buffer-based memory system, InterviewRAG provides grounded, conversational support tailored to user-uploaded interview materials. It ensures that all responses remain tightly aligned with the content of the provided PDF, making the assistant both reliable and domain-specific.

Its modular design, offline operation, and zero-cost architecture make it suitable for individual learners, students, and professionals seeking a private, intelligent way to practice and refine their interview skills.

---

**Thank you for exploring InterviewRAG.**