# RAG Application Tutorial

This notebook will guide you through building a Retrieval-Augmented Generation (RAG) application. The application will scrape information from a website, store it in a local vector database, and provide a Gradio interface for users to ask questions and receive relevant information.

## Step 1: Setup

First, we need to install the libraries for web scraping (`WebBaseLoader` also needs BeautifulSoup), text splitting, the vector database, the OpenAI integration, and Gradio.

In [None]:
!pip install langchain langchain_community langchain_openai langchain_text_splitters faiss-cpu gradio beautifulsoup4

Define the website you would like to scrape and enter your OpenAI API key.

In [None]:
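import os

# Paste your OpenAI API key between the quotes (do not share the notebook with the key filled in)
os.environ["OPENAI_API_KEY"] = ""
# Full URL of the page to scrape, including http:// or https://
website = ""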
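Hard-coding the key in a cell makes it easy to leak when the notebook is shared. As an alternative sketch (assuming you run the notebook interactively), the standard-library `getpass` module can prompt for the key without displaying it, and only when `OPENAI_API_KEY` is not already set. The helper name `ensure_openai_key` is hypothetical.

```python
import os
from getpass import getpass


def ensure_openai_key() -> str:
    """Hypothetical helper: return OPENAI_API_KEY, prompting only if it is unset or empty."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        # getpass hides the typed key, so it does not end up in the cell output
        key = getpass("OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

You would call `ensure_openai_key()` in place of assigning an empty string to `os.environ["OPENAI_API_KEY"]`; if the variable is already exported in your shell, no prompt appears.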

## Step 2a: Web Scraping

We will use LangChain Community's `WebBaseLoader` to download the website and load its text as LangChain documents.

In [None]:
from langchain_community.document_loaders import WebBaseLoader

# Download the page and parse its text into a list of LangChain Document objects
loader = WebBaseLoader(website)
docs = loader.load()
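`WebBaseLoader` downloads the page and extracts its text with BeautifulSoup. To make that step concrete without third-party packages, here is a simplified standard-library sketch of the same idea: fetch the HTML with `urllib` and keep the non-empty text nodes, skipping anything inside `<script>` or `<style>`. It is an illustration, not `WebBaseLoader`'s actual implementation, and the names `TextExtractor`, `html_to_text`, and `fetch_page_text` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect non-empty text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_page_text(url: str) -> str:
    # Network call: download the page and decode it with its declared charset (UTF-8 fallback)
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

For example, `fetch_page_text(website)` returns the page text as a single string, roughly the kind of content that ends up in `docs[0].page_content`.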

## Step 2b: Text Splitting

We will use a text splitter to split the scraped text into smaller, overlapping chunks that can be embedded and retrieved individually.

In [None]:
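# The splitters are published in the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Sizes are measured in characters (the default length function is len)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=250,     # maximum characters per chunk
    chunk_overlap=100,  # up to 100 characters shared by consecutive chunks
)
split_docs = text_splitter.split_documents(docs)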

In [None]:
# Inspect the text of each chunk
[sd.page_content for sd in split_docs]
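With `chunk_size=250` and `chunk_overlap=100`, each chunk holds at most 250 characters, and consecutive chunks can share up to 100 characters, so text cut at one boundary can still appear intact in a neighbouring chunk. `RecursiveCharacterTextSplitter` additionally prefers to break on paragraph, line, and word separators before falling back to single characters. The sketch below deliberately ignores separators and uses a plain sliding window, so it only shows how the size and overlap parameters interact; `sliding_window_chunks` is a hypothetical name, not a LangChain function.

```python
def sliding_window_chunks(text: str, chunk_size: int = 250, chunk_overlap: int = 100) -> list[str]:
    """Fixed-size character windows; each window starts chunk_size - chunk_overlap after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

On a 600-character string this yields four chunks starting at offsets 0, 150, 300, and 450; the last 100 characters of each chunk are the first 100 characters of the next.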

## Step 3: Store Information in a Local Vector Database

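We will use FAISS to create a local vector database: each chunk is embedded with OpenAI's embedding model, the vectors are stored in an in-memory FAISS index, and the retriever returns the 5 chunks most similar to a query.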

In [None]:
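from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Embed every chunk (this calls the OpenAI API) and build an in-memory FAISS index
embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(split_docs, embedding_model)
# Return the 5 most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})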
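`FAISS.from_documents` embeds every chunk and stores the vectors in an index; `as_retriever(search_kwargs={"k": 5})` then returns the 5 chunks whose vectors are nearest to the embedded question. LangChain's FAISS wrapper uses an exact Euclidean-distance (L2) index by default. The standard-library sketch below performs the same kind of exact nearest-neighbour search by brute force over hand-made toy vectors; `ToyVectorStore` is a hypothetical class, not part of LangChain or FAISS, and real vectors would come from the embedding model.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyVectorStore:
    """Hypothetical in-memory store with exact (brute-force) k-nearest-neighbour search."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Sequence[float]]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self._items.append((text, vector))

    def search(self, query: Sequence[float], k: int = 5) -> List[str]:
        # Rank every stored vector by distance to the query and keep the k closest
        ranked = sorted(self._items, key=lambda item: l2_distance(query, item[1]))
        return [text for text, _ in ranked[:k]]
```

In the notebook, the equivalent vectors come from `embedding_model.embed_documents(...)` for the chunks and `embedding_model.embed_query(...)` for the question. Brute force is fine for a single web page; FAISS becomes worthwhile as the number of chunks grows.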

## Step 4: Create a Function to Retrieve Information and Generate a Response from the LLM

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
prompt = """You are a helpful assistant that answers questions based on the provided context.
Use the context below to answer the question. If the answer is not in the context, say "I don't know"."""

def answer_question(question, history=None):
    # gr.ChatInterface passes the chat history as a second argument; it is not used here
    # retriever.invoke replaces the deprecated get_relevant_documents
    relevant_documents = retriever.invoke(question)

    # Join the retrieved chunks into one context string and build the final prompt
    text_string = "\n".join(doc.page_content for doc in relevant_documents)
    question_prompt = f"{prompt}\n\nContext:\n{text_string}\n\nQuestion: {question}"

    # Calling the model directly (llm(...)) is deprecated; invoke returns an AIMessage
    llm_response = llm.invoke(question_prompt).content
    return llm_response
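Because `answer_question` combines retrieval, prompt assembly, and the API call, it can only be exercised with a live OpenAI key. One way to check the prompt logic offline (a design choice, not something the tutorial requires) is to pass the retrieval and generation steps in as plain functions. In the sketch below, `build_prompt` and `answer_with` are hypothetical names, and the test stubs stand in for the real retriever and model.

```python
from typing import Callable, List

PROMPT = """You are a helpful assistant that answers questions based on the provided context.
Use the context below to answer the question. If the answer is not in the context, say "I don't know"."""


def build_prompt(instructions: str, chunks: List[str], question: str) -> str:
    """Stuff the retrieved chunks into the prompt, using the same layout as answer_question."""
    context = "\n".join(chunks)
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"


def answer_with(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the question, build the prompt, and hand it to the model."""
    chunks = retrieve(question)
    return generate(build_prompt(PROMPT, chunks, question))
```

In the notebook you could wire it to the real components with `answer_with(question, lambda q: [d.page_content for d in retriever.invoke(q)], lambda p: llm.invoke(p).content)`, while a test can pass a fake retriever and a model stub that simply returns the prompt it receives.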

## Step 5: Create a Simple Gradio UI for Testing

In [None]:
import gradio as gr

iface = gr.ChatInterface(
    fn=answer_question,
    title="RAG Application",
    description="Ask a question and get an answer based on the website."
)

iface.launch()