## The one-time database creation cells are in RAW format; the cells that only retrieve results are runnable

In [10]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_chroma import Chroma
import chromadb
from IPython.display import Markdown, display


from getpass import getpass
import os
import numpy as np

In [11]:
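# Folder holding the dataset, located inside the working directory
current_folder = os.getcwd()
file_path = os.path.join(current_folder, "RAG Project Dataset")
# The Chroma database is persisted in the working directory itself
db_path = current_folder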

## Load the documents and split into Chunks
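The RAW cell for this step is not shown here. As a rough illustration of what splitting into chunks means, the sketch below cuts text into fixed-size character windows that overlap. It is a simplified stand-in, not the notebook's code: `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraphs and sentences. The function name `split_into_chunks` and the default sizes are hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Small values so the overlap is easy to see
print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.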

## Convert to Embeddings and Store in Vector DB

In [3]:
# Prompt for the API key without echoing it, then expose it to the OpenAI clients
os.environ['OPENAI_API_KEY'] = getpass()
openai_api_key = os.environ['OPENAI_API_KEY']
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, openai_api_key=openai_api_key)


## Declaring the chromadb client

In [4]:
persistent_client = chromadb.PersistentClient(path=db_path)

## Saving embeddings into ChromaDB (run only once, when creating the embedding database for the first time)

## Loading the DB

In [5]:
vector_store_from_client = Chroma(
    client=persistent_client,
    collection_name="knowledge_database",
    embedding_function=embedding_model, 
)
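To make the role of the vector store concrete, here is a minimal sketch of similarity search over stored embeddings in plain Python. It is an illustration only: the index structure and distance metric Chroma actually uses may differ, and the names `cosine_similarity` and `top_k`, as well as the toy two-dimensional vectors, are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored items most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": in the notebook these would come from the embedding model
stored = [
    ("chunk a", [1.0, 0.0]),
    ("chunk b", [0.0, 1.0]),
    ("chunk c", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], stored, k=2))
```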

## Instantiating the Retriever and the QA chain

In [6]:
chain = RetrievalQA.from_chain_type(
    llm, chain_type="stuff", retriever=vector_store_from_client.as_retriever()
)
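The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. The sketch below shows that idea in plain Python. The prompt wording and the function name `build_stuff_prompt` are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, documents: list[str]) -> str:
    """Concatenate every retrieved document into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What are the two sub-layers in each encoder layer?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit in the model's context window.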

## Querying the vector DB

In [7]:
q1 = "What are the main components of a RAG model and how do they interact?"
q2 = "What are the two sub-layers in each encoder layer of the Transformer model?"
q3 = "Explain how positional encoding is implemented in Transformers and why it is necessary."
q4 = "Describe the concept of multi-headed attention in the Transformers architecture. Why is it beneficial?"
q5 = "What is few shot learning and how does GPT-3 implement it during inference?"

In [8]:
output = chain.invoke(
    {"query": q5},
    return_only_outputs=True,
)

In [9]:
display(Markdown(output['result']))

Few-shot learning is a machine learning approach where a model is trained to perform a task with only a small number of examples or demonstrations. In the context of GPT-3, few-shot learning is implemented during inference by providing the model with a few examples of the task at hand as part of the input. This allows GPT-3 to adapt to the specific task without requiring any gradient updates or fine-tuning.

During inference, GPT-3 takes in a prompt that includes the task description along with a few demonstrations of the desired output. The model then uses this context to generate responses that are relevant to the task, leveraging its pre-trained knowledge and the provided examples to perform effectively. This method allows GPT-3 to achieve strong performance on various NLP tasks by utilizing in-context learning.
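The prompt layout described in the answer above, a task description followed by a few demonstrations and then the new input, can be sketched as follows. The `Input:`/`Output:` labels and the function name `build_few_shot_prompt` are assumptions chosen for illustration, not a format taken from the source documents.

```python
def build_few_shot_prompt(
    task_description: str,
    demonstrations: list[tuple[str, str]],
    query: str,
) -> str:
    """Assemble a prompt: task description, worked examples, then the unanswered query."""
    lines = [task_description, ""]
    for example_input, example_output in demonstrations:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # The final item is left open for the model to complete
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


few_shot_prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage")],
    "bread",
)
print(few_shot_prompt)
```

No model weights change here: the demonstrations only appear in the input text, which is what distinguishes in-context learning from fine-tuning.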