**Candidate:** Sneha Santha Prabakar

**Week 11:** RAG with conversational memory

For this assignment, we will be using the *GitHub for Developers – Training Manual* (https://githubtraining.github.io/training-manual/legacy-manual.pdf) to train our AI assistant.


The GitHub for Developers training manual is a comprehensive, 60-page technical document that’s clear, structured, and easy to work with. It covers practical topics like Git workflows, branching, and collaboration - exactly the kind of content people ask about in real-world scenarios. Its length and clearly structured format make it ideal for chunking and testing semantic search. It also allows for meaningful follow-up questions, which is perfect for evaluating conversational memory in a RAG setup.

# Part 1: Install libraries

In [None]:
!pip install langchain
!pip install pymupdf
!pip install -U langchain-community
!pip install -U langchain-openai
!pip install sentence-transformers
!pip install openai
!pip install faiss-cpu



This cell installs all the required libraries for the RAG assistant:

- `langchain`: Core library for building chains and agents.

- `pymupdf`: Parses PDF files, used by PyMuPDFLoader.

- `langchain-community`: Needed for loaders like PyMuPDFLoader.

- `langchain-openai`: Used for language models like ChatOpenAI.

- `sentence-transformers`: Provides HuggingFace embedding models.

- `openai`: Official library for OpenAI API.

- `faiss-cpu`: Used for storing and searching vector embeddings efficiently.

In [None]:
# Import libraries

# System and environment
import os
import openai
from openai import OpenAI

# LangChain document loading and text processing
# (loaders, embeddings, and vector stores live in langchain-community, installed above)
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# LangChain embeddings and vector storage
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# LangChain conversational retrieval components
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory


# Part 2: Document Loading and Preprocessing

In [None]:
# from google.colab import files
# uploaded = files.upload()

## Step 1: Loading the PDF document

In [None]:
loader = PyMuPDFLoader("github for developers - trianing manual.pdf")
documents = loader.load()

## Step 2: Split into ~500-character Chunks

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,       # max 500 characters per chunk (length is measured in characters by default, not tokens)
    chunk_overlap=50      # small overlap for context continuity
)

chunks = text_splitter.split_documents(documents)
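
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter also prefers to cut at separators such as paragraph and line breaks); the function name `split_with_overlap` and the sample text are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return pieces


# Hypothetical 1200-character sample standing in for a page of the manual
sample = "".join(str(i % 10) for i in range(1200))
pieces = split_with_overlap(sample, chunk_size=500, chunk_overlap=50)
print([len(p) for p in pieces])
```

The last 50 characters of one piece reappear at the start of the next, which is what keeps a sentence that straddles a boundary visible in both chunks.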

## Step 3: Display 3 Representative Chunks



In [None]:
for i, chunk in enumerate(chunks[:3]):
    print(f"--- Chunk {i+1} ---")
    print(chunk.page_content[:1000])  # limit to ~1000 characters

--- Chunk 1 ---
GitHub for Developers
Training Manual
-v1.0
--- Chunk 2 ---
Table of Contents
Welcome to GitHub for Developers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1
License. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1
Getting Ready for Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
Step 1: Set Up Your GitHub.com Account . . . . . . . . . . . . . . . . . . . . . . . . .  2
--- Chunk 3 ---
Step 2: Install Git. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
Step 3: Set Up Your Text Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
Exploring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  4
Getting Started With Collaboration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5


# Part 3: Text Embedding and Vector Store Setup


## Step 1: Choose and Initialize Embedding Model

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

This sets up the text embedding model using HuggingFace.

"all-MiniLM-L6-v2" is a widely-used sentence transformer for semantic similarity.

These embeddings will be used to convert text chunks into vectors for search.
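
As a rough illustration of what "semantic similarity" between vectors means, the sketch below computes cosine similarity by hand. The three-dimensional vectors are made-up stand-ins, not real model output, and the helper name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings"; real sentence embeddings have far more dimensions
query_vec = [1.0, 0.0, 1.0]
related_vec = [0.9, 0.1, 0.8]    # points in nearly the same direction as the query
unrelated_vec = [0.0, 1.0, 0.0]  # orthogonal to the query

sim_related = cosine_similarity(query_vec, related_vec)
sim_unrelated = cosine_similarity(query_vec, unrelated_vec)
print(sim_related > sim_unrelated)
```

Texts with similar meaning should map to vectors that point in similar directions, so a higher score indicates a closer semantic match.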


## Step 2: Create and Populate Vector Store

In [None]:

vectorstore = FAISS.from_documents(chunks, embedding_model)

FAISS is used to store and index vector representations of our document chunks.

It allows for efficient semantic search based on similarity to a query.

This fulfills the vector store setup requirement in the assignment.
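
Conceptually, a similarity search ranks stored vectors by their distance to the query vector and returns the closest `k`. The sketch below does this by brute force with squared Euclidean distance; it assumes an exact (non-approximate) search and is only an illustration, not FAISS's actual implementation. The name `top_k` and the toy vectors are hypothetical.

```python
def top_k(query, vectors, k):
    """Return the positions of the k vectors closest to query (squared L2 distance)."""
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]


# Toy 2-dimensional "index" standing in for the embedded chunks
toy_index = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
nearest = top_k([1.0, 1.0], toy_index, k=2)
print(nearest)
```

A dedicated library becomes worthwhile once the collection is large enough that scanning every vector in plain Python is too slow.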

## Step 3: Run a Semantic Search Query

In [None]:
query = "How do I resolve a merge conflict in Git?"
results = vectorstore.similarity_search(query, k=3) # retrieves the top 3 relevant chunks from the vector store.

print("Query:", query)
for i, res in enumerate(results):
    print(f"\n--- Result {i+1} ---\n{res.page_content[:1000]}")

Query: How do I resolve a merge conflict in Git?

--- Result 1 ---
Resolving Merge Conflicts
Merge conflicts can be intimidating, but resolving them is actually quite
easy. In this section you will learn how!
Local Merge Conflicts
To practice merge conflicts, we first need to create one. Complete the
following:
Activity Instructions
1. Type git checkout -b stats-update origin/stats-update.
2. Type git diff —color-words gh-pages..stats-update.
3. Check out the gh-pages branch.
4. Merge the stats-update branch into gh-pages.

--- Result 2 ---
Local Merge Conflicts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  41
Remote Merge Conflicts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  42
Exploring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  42
Helpful Git Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  43

--- Result 3 ---
3

## Explanation

We used the all-MiniLM-L6-v2 model from HuggingFace to generate semantic embeddings for each chunk of our document. This model is lightweight, efficient, and widely used for sentence-level semantic similarity tasks. It converts each text chunk into a 384-dimensional vector that captures the meaning of the content, rather than just surface-level keywords. This makes it particularly suitable for semantic search, where users may phrase questions differently from how the content is worded in the original text.

To store and index these embeddings, we used FAISS (Facebook AI Similarity Search), a high-performance library for fast nearest-neighbor search in vector space. Using `FAISS.from_documents`, we embedded all chunks and built a searchable index from them. This setup ensures that when a user submits a query, we can efficiently retrieve the most relevant parts of the document based on meaning, not just string matches.

To test the semantic search pipeline, we ran the query:
"*How do I resolve a merge conflict in Git?*"
The system retrieved the top 3 chunks from the training manual. The first result is the manual's "Resolving Merge Conflicts" section, which directly matches the intent of the question and begins the hands-on activity for creating a local merge conflict. The second result is a table-of-contents fragment listing the merge-conflict sections, and the third contains only the text "3", most likely a stray page number. The top hit shows that the embedding model and vector store are configured correctly, while the lower-ranked results show that short, low-information chunks such as table-of-contents lines and page numbers can still surface in the top k.

# Part 4: RAG + Conversational Memory

## Step 1: Set Up Conversational Retrieval Chain

In [None]:

# Read the API key from the environment rather than hard-coding it
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Quick sanity check that the key and model are reachable
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    store=True,
    messages=[
        {"role": "user", "content": "write a haiku about ai"}
    ]
)

print(completion.choices[0].message)

ChatCompletionMessage(content='Silent circuits hum,  \nWhispers of thought intertwined,  \nSteel dreams shape the dawn.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None)


This makes a direct call to OpenAI's API using the openai Python SDK. It generates a response to a simple haiku prompt using GPT-4o-mini.

In [None]:
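llm = ChatOpenAI(temperature=0)  # temperature=0 for more deterministic answers; no model is named, so the library default is used (pass model=... to choose, e.g., GPT-4)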

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory
)

## Step 2: Simulate a 3-Turn Conversation


In [None]:
print("Turn 1:")
response1 = qa_chain.run("What is the purpose of branching in Git?")
print(response1)

print("\nTurn 2:")
response2 = qa_chain.run("How do I create a new branch?")
print(response2)

print("\nTurn 3:")
response3 = qa_chain.run("Can I switch branches without committing changes?")
print(response3)

Turn 1:
The purpose of branching in Git is to allow developers to work on separate features or fixes without affecting the main codebase (master branch). By creating a branch, developers can experiment, make changes, and fix issues in isolation, keeping the main codebase safe. Branches in Git are lightweight and cheap, allowing developers to work on different tasks simultaneously and merge their changes back into the main codebase when ready.

Turn 2:
To create a new branch in Git, you can follow these steps:

1. Navigate to the repository where you want to create the branch.
2. Use the command `git branch branchname` to create a new branch. Replace `branchname` with the name you want for your branch.
3. To switch to the newly created branch, you can use the command `git checkout branchname`.

These steps will help you create and switch to a new branch in Git.

Turn 3:
Yes, you can switch branches without committing changes in Git. Git allows you to switch branches even if you have cha

## Explanation of Conversational Memory Use:


In this step, we implemented a Retrieval-Augmented Generation (RAG) pipeline that combines a language model with document-based retrieval and conversational memory. This setup allows the assistant to generate accurate, context-aware responses based on both the document content and the flow of an ongoing conversation.

We used ChatOpenAI as the underlying language model and integrated it with our FAISS-based vector retriever using LangChain’s ConversationalRetrievalChain. This chain handles both the retrieval of relevant document chunks (based on user queries) and the generation of coherent responses using those chunks. To maintain continuity across multiple user turns, we added ConversationBufferMemory, which stores the full sequence of prior messages and responses.
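
The flow described above can be sketched in plain Python as three steps: condense the follow-up question with the chat history, retrieve with the condensed question, then generate an answer. This is a simplified illustration of the idea, not LangChain's implementation; `answer_with_history` and the `fake_*` helpers are hypothetical stand-ins for the LLM and the FAISS retriever.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs


def answer_with_history(
    question: str,
    history: History,
    condense: Callable[[str, History], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    # 1. Rewrite a follow-up into a standalone question (skipped on the first turn)
    standalone = condense(question, history) if history else question
    # 2. Fetch supporting chunks for the standalone question
    context = retrieve(standalone)
    # 3. Produce an answer from the question plus retrieved chunks
    answer = generate(standalone, context)
    history.append((question, answer))
    return answer


seen_queries = []  # records what the retriever was actually asked


def fake_condense(question, history):
    # A real chain would ask the LLM to rephrase; here we just prepend the last question
    return f"{history[-1][0]} {question}"


def fake_retrieve(query):
    seen_queries.append(query)
    return ["(retrieved chunk text)"]


def fake_generate(question, context):
    return f"Answer using {len(context)} chunk(s)"


history: History = []
answer_with_history("What is the purpose of branching in Git?", history,
                    fake_condense, fake_retrieve, fake_generate)
answer_with_history("How do I create a new branch?", history,
                    fake_condense, fake_retrieve, fake_generate)
print(seen_queries)
```

The second retrieval query carries the earlier question along with it, which is why a follow-up can be matched against the right part of the document even when it is phrased tersely.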

The key benefit of this memory component is that it allows the assistant to reference earlier questions and answers, enabling it to track the conversation’s context over time.
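
A buffer-style memory can be pictured as an append-only list of messages that is replayed on every turn. The class below is a hypothetical, minimal sketch of that idea and is not the `ConversationBufferMemory` implementation; the answers stored in it are shortened placeholders.

```python
class BufferMemory:
    """Keeps every message of the conversation, in order."""

    def __init__(self):
        self.messages = []  # list of (role, text) tuples

    def save_turn(self, user_text, assistant_text):
        # Store both sides of the exchange so later turns can see them
        self.messages.append(("human", user_text))
        self.messages.append(("ai", assistant_text))

    def as_transcript(self):
        # Render the full history as text that could be placed in a prompt
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


memory_sketch = BufferMemory()
memory_sketch.save_turn("What is the purpose of branching in Git?",
                        "It isolates work from the main codebase.")
memory_sketch.save_turn("How do I create a new branch?",
                        "Use git branch branchname.")
print(memory_sketch.as_transcript())
```

Because nothing is ever dropped, the transcript grows with each turn; that is fine for a three-turn demo but is worth keeping in mind for long conversations, where the prompt grows with it.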

To demonstrate this, we simulated a three-turn interaction:

Turn 1: The user asks, “What is the purpose of branching in Git?”
The assistant explains that branches let developers work on features or fixes in isolation from the main codebase, drawing on chunks retrieved from the training manual.

Turn 2: The user follows up with, “How do I create a new branch?”
Before retrieval, the chain combines the new question with the stored chat history, and the assistant returns step-by-step instructions based on `git branch branchname` and `git checkout branchname`.

Turn 3: The user asks, “Can I switch branches without committing changes?”
The assistant stays on the topic of branching and confirms that switching is possible; the recorded output is cut off after that point.

Because of memory, the assistant doesn't treat each query in isolation. Instead, each turn is interpreted alongside the earlier ones, so users can ask follow-up questions naturally, just like in a real conversation. Retrieval, in turn, grounds the assistant's answers in content from the GitHub training manual.