<a href="https://colab.research.google.com/github/siddhartha-alexander/Retrieval-Augmented-Generation-RAG-Based-Conversational-AI-Chatbot/blob/main/conversation_ai_using_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain langchain-community langchain-chroma

# Retrieval-Augmented Generation (RAG) Based Conversational AI Chatbot

**Domain:** Generative AI | NLP | Information Retrieval  
**Project Type:** Applied Machine Learning / GenAI  

## 📖 Introduction

This project implements a **Retrieval-Augmented Generation (RAG)** based conversational AI system.
The chatbot enhances Large Language Model (LLM) responses by retrieving relevant information from
external documents before generating answers.

By combining **vector-based retrieval** with **LLM generation**, the system grounds its answers
in the source documents, which tends to reduce hallucination compared with an LLM that answers
from its training data alone.


In [None]:
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [None]:
!pip install pypdf


In [None]:
from langchain_community.document_loaders import PyPDFLoader
file='/content/NIPS-2017-attention-is-all-you-need-Paper.pdf'
loader=PyPDFLoader(file)
doc=loader.load_and_split()

In [None]:
doc[0]

In [None]:
len(doc)

12

**Split The Data**

In [None]:
text_split=CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
    length_function=len
)

**Segmentation**

In [None]:
texts=text_split.split_documents(doc)

In [None]:
texts[0]

Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On E

In [None]:
len(texts)

12
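
The page count and the chunk count above are both 12, which suggests the splitter left each page as a single chunk. A likely reason, stated here as an assumption, is that `CharacterTextSplitter` splits only on its separator (a blank line by default) and PDF page text rarely contains one. The sketch below is not LangChain's algorithm. It is a plain fixed-window splitter (the helper name `chunk_text` is hypothetical) that shows what `chunk_size=1000` and `chunk_overlap=100` mean when they do take effect.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Synthetic 2500-character input, used only for illustration
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

Each chunk starts 900 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next. If finer chunks are wanted in the notebook, a splitter that falls back to smaller separators is one option to try.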

**Embeddings**

In [None]:
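# Import from langchain_community (installed above); the langchain.embeddings path is deprecated
from langchain_community.embeddings import HuggingFaceEmbeddings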
embedding_model=HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')


**Database Creation**

In [None]:
vectordb=Chroma(
    collection_name='sid',
    embedding_function=embedding_model
)

In [None]:
vectordb

<langchain_chroma.vectorstores.Chroma at 0x7cf823ec2bd0>

**Inserting Documents into the Database**

In [None]:
storage_id=vectordb.add_documents(texts)

In [None]:
storage_id[0]

'3c8f23c2-3fb1-4a38-b0b5-a58bf443f422'

In [None]:
storage_id[1]

'ae4c3c2a-261c-46e6-94df-88feabb5dd60'

**Similarity Search**

In [None]:
res=vectordb.similarity_search(
    query='What problem does the Transformer architecture aim to solve?',
    k=2
)

In [None]:
res
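
To make the insert-then-search flow above concrete, here is a toy in-memory store in plain Python. It is an illustration only: the class `ToyVectorStore`, the bag-of-words `embed` function, and the use of cosine similarity are all assumptions made for this sketch. The notebook itself uses a neural embedding model, and the distance metric Chroma applies depends on how the collection is configured.

```python
import math
import uuid
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a neural encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical minimal store: keeps (text, vector) pairs keyed by a generated id."""

    def __init__(self):
        self._rows = {}

    def add_texts(self, texts):
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())  # one unique id per stored text
            self._rows[doc_id] = (text, embed(text))
            ids.append(doc_id)
        return ids

    def similarity_search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self._rows.values(), key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
ids = store.add_texts([
    "the transformer relies on attention instead of recurrence",
    "bleu scores measure translation quality",
    "adam optimizer with warmup was used for training",
])
hits = store.similarity_search("what does the transformer use instead of recurrence", k=2)
print(hits[0])
```

As in the notebook, adding documents returns one id per item, and the search returns the `k` stored texts whose vectors are closest to the query vector.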

**Setting Up the Retrieval Components**

a. Retriever  
b. LLM

In [None]:
retriver=vectordb.as_retriever()

In [None]:
from transformers import AutoTokenizer,AutoModelForSeq2SeqLM,pipeline

In [None]:
tokenizer=AutoTokenizer.from_pretrained('google/flan-t5-base')

In [None]:
# Add a pad token only when the tokenizer does not already define one
if tokenizer.pad_token is None:
  tokenizer.add_special_tokens({'pad_token':'[PAD]'})

In [None]:
model=AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')

In [None]:
model.resize_token_embeddings(len(tokenizer))

Embedding(32101, 768)

In [None]:
model.config.pad_token_id=tokenizer.pad_token_id

In [None]:
generator=pipeline('text2text-generation',model=model,tokenizer=tokenizer)

Device set to use cpu


In [None]:
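# Import from langchain_community (installed above); the langchain.llms path is deprecated
from langchain_community.llms import HuggingFacePipeline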
llm=HuggingFacePipeline(pipeline=generator)

**Prompt Design**

In [None]:
template="""Use the context to answer the question. If you don't know the answer, say "I don't know".

context:
{context}

question:
{question}

answer:"""

In [None]:
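custom_template=PromptTemplate(
    template=template,
    # Name the placeholders explicitly so the template does not rely on inference
    input_variables=['context','question']
)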

In [None]:
rag_chain=(
    {'context':retriver,'question':RunnablePassthrough()}
    |custom_template
    |llm
    |StrOutputParser()
)
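
In the chain above, the retriever's output (a list of documents) is passed to `{context}` as is, so the prompt receives the string form of the whole list, metadata included. One common alternative is to join only the page text before filling the template. The sketch below shows that step in plain Python. The `Doc` class, `format_docs`, `build_prompt`, and the sample texts are hypothetical stand-ins, not part of the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = (
    "Use the context to answer the question. "
    "If you don't know the answer, say \"I don't know\".\n\n"
    "context:\n{context}\n\nquestion:\n{question}\n\nanswer:"
)


def format_docs(docs):
    """Keep only the text of each document, separated by blank lines."""
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(docs, question):
    """Fill the two placeholders the same way the prompt step of the chain does."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


docs = [
    Doc("Attention replaces recurrence.", {"page": 0}),
    Doc("Training is more parallelizable.", {"page": 1}),
]
prompt = build_prompt(docs, "What does the Transformer replace?")
print(prompt)
```

In a LangChain chain, a function like `format_docs` can be piped after the retriever so that only the joined text reaches the prompt.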

**Test**

In [None]:
query='What problem does the Transformer architecture aim to solve?'
res=rag_chain.invoke(query)
res

Token indices sequence length is longer than the specified maximum sequence length for this model (5382 > 512). Running this sequence through the model will result in indexing errors


'recurrence and convolutions'
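
The warning above reports a 5382-token prompt against a 512-token model limit, so the model probably cannot use most of the retrieved context. One way to handle this is to keep retrieved chunks, in ranked order, only while they fit a budget. The sketch below is a minimal illustration under stated assumptions: `fit_to_budget` is a hypothetical helper, whitespace word counts are a crude proxy for real tokenizer counts, and the 64 tokens reserved for the instructions and question is an arbitrary choice.

```python
def fit_to_budget(chunks, max_tokens=512, reserved=64, count_tokens=lambda s: len(s.split())):
    """Keep leading chunks while their combined size stays within the context budget.

    max_tokens:   model input limit (512 is taken from the warning above)
    reserved:     tokens set aside for the instructions and the question (assumed value)
    count_tokens: token counter; defaults to a rough whitespace word count
    """
    budget = max_tokens - reserved
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += cost
    return kept


# Three synthetic chunks of 200 "words" each, used only for illustration
chunks = ["a " * 200, "b " * 200, "c " * 200]
kept = fit_to_budget(chunks)
print(len(kept))
```

For real use, `count_tokens` could be replaced with a function that measures length using the model's own tokenizer, which gives counts that match the limit in the warning.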

In [None]:
!pip install gradio

In [None]:
import gradio as gr

def chat(message,history):
  # Answer the new message with the RAG chain and record the exchange
  bot_message=rag_chain.invoke(message)
  history.append((message,bot_message))
  # First output clears the textbox, second updates the chat window
  return '',history

with gr.Blocks() as demo:
  chatbot=gr.Chatbot()
  msg=gr.Textbox()
  clear=gr.Button('clear')
  msg.submit(chat, [msg,chatbot], [msg,chatbot])
  clear.click(lambda: None, None, chatbot, queue=False)

demo.launch()

## ✅ Conclusion

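This project demonstrates how Retrieval-Augmented Generation (RAG) can ground a conversational
AI system's answers in a source document, here a PDF indexed in a Chroma vector store and
queried through a flan-t5 model.
Because the loader, splitter, embedding model, vector store, and LLM are separate components,
each can be swapped independently to change the dataset, scale up, or deploy.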
