In [2]:
# Importing Libraries
from langchain_community.llms import Ollama

In [8]:
# Initialize the model
llm = Ollama(model='tinyllama')

In [10]:
# Generate answers to a question
question = "When did India get Independence?"
response = llm.invoke(question)  # invoke() replaces the deprecated predict()
print(response)

India got Independence on August 15, 1947. It was achieved through the resolution of the Indian National Congress and the British Raj to give complete independence to India from British rule, which ended their 208-year long rule over the country.


In [12]:
# Generate answers to a question
question = "Who is Sachin Tendulkar"
response = llm.invoke(question)
print(response)

Sachin Tendulkar is an Indian former cricketer who played as a left-handed batsman for Karnataka and India. He was known for his aggressive style of play and scoring runs at a rapid pace. He made his debut in the Ranji Trophy in 1972, and went on to represent India in the following years. During his international career, he played 48 Test matches, scoring 2636 runs with an average of 45.10. He was also a member of the Indian team that won the 1983 World Cup. In terms of domestic cricket, Tendulkar was known for his explosive innings and was a key player in Karnataka's historic win against Mumbai Indians in the IPL in 2014. He retired from international cricket in 1987 and later moved into coaching and management roles with various Indian cricket teams, including BCCI and T20 franchises.


## Implementing RAG for custom data

In [17]:
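# Vector store, embeddings, PDF loader, and text splitter for the RAG pipeline
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
# from langchain_community.embeddings import OpenAIEmbeddings  # alternative (needs an OpenAI API key)
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter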

In [19]:
# Load the PDF (one Document per page) and split it into overlapping chunks
pdf_reader = PyPDFLoader("RAGPaper.pdf")
documents = pdf_reader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)
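
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal pure-Python sketch of a fixed-size sliding window. It is only an illustration: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, so its real chunk boundaries will differ. The function name `sliding_window_chunks` and the sample string are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Toy input (not the PDF): 26 characters, windows of 10 with an overlap of 4
sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed chunk starts with the last four characters of the previous one, which is the role the 200-character overlap plays for the 1000-character chunks above.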

In [21]:
# Create embeddings using a free HF model
embeddings = HuggingFaceEmbeddings()
db = FAISS.from_documents(documents=chunks, embedding=embeddings)
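
A vector store answers a query by finding the stored vectors closest to the query's embedding. The sketch below does this by brute force with Euclidean distance on made-up 3-dimensional vectors. It is an assumption-laden illustration, not how FAISS is implemented: real embeddings have hundreds of dimensions, and the distance metric and index type depend on how the index is configured. The helper `top_k_nearest` and the toy data are hypothetical.

```python
import math


def top_k_nearest(query_vec, stored, k=2):
    """Return the k stored texts whose vectors are closest to query_vec (Euclidean distance)."""
    scored = [(math.dist(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [text for _, text in scored[:k]]


# Hypothetical toy "embeddings"; a real model would produce these vectors
toy_store = [
    ("RAG combines retrieval with generation", [0.9, 0.1, 0.0]),
    ("FAISS is a similarity search library", [0.1, 0.9, 0.0]),
    ("Cricket is popular in India", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about RAG
print(top_k_nearest(query_vec, toy_store, k=2))
```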

In [23]:
# Initialize the model
llm = Ollama(model='tinyllama')

In [25]:
# Generate answers to a question
question = "When did India get Independence?"
response = llm.invoke(question)
print(response)

India got independence on August 15, 1947. It was declared as a republic by the British Empire's Governor General, Lord Mountbatten. The ceremony of India's Independence Day was held in Delhi on August 15, 1947, at Rajghat.


In [27]:
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# Prompt that rewrites a follow-up question as a standalone question
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History: {chat_history}
Follow up Input: {question}
Standalone question:""")
# Build the retrieval chain on top of the FAISS index
qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=db.as_retriever(),
                                           condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                                           return_source_documents=True, verbose=False)
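
To see what the condense prompt looks like once its placeholders are filled, the sketch below renders the same template text with plain `str.format`. The `Human:` / `Assistant:` layout of the history and the helper `render_condense_prompt` are assumptions made for illustration; the chain does its own history formatting internally, and the example answer is a placeholder rather than real model output.

```python
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n"
    "Chat History: {chat_history}\n"
    "Follow up Input: {question}\n"
    "Standalone question:"
)


def render_condense_prompt(chat_history, question):
    """Fill the template from a list of (human, assistant) turns and a new question."""
    # Assumed layout for the history; the real chain may format turns differently
    history_text = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return CONDENSE_TEMPLATE.format(chat_history=history_text, question=question)


history = [("what is a RAG-sequence model?", "(previous answer goes here)")]
print(render_condense_prompt(history, "how does it differ from RAG-token?"))
```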
                                                        

In [37]:
chat_history = []  # no previous turns yet
query = "what is a RAG-sequence model?"
result = qa.invoke({"question": query, "chat_history": chat_history})

In [38]:
print(result['answer'])

A RAG-Sequence model is a type of generative model that allows for multiple output tokens to be produced by a single document, using a topic-word sequence as the input. The sequence is generated by iteratively selecting and combining words based on their topic probabilities, such that each word appears in a specific order and contributes to the overall message or theme being conveyed in the text.

The key concept of RAG-Sequence is the use of the topic-word sequence as the input to generate multiple output tokens at once, rather than using a single beam search for all possible outputs. This allows for efficient decoding by running multiple forward passes for each document and combining probabilities across beams in a thresholded manner.

The RAG-Sequence model is particularly useful when producing short or medium-length text sequences that require many output tokens to convey their meaning effectively, such as news articles or social media posts.
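
Because the chain was built with `return_source_documents=True`, the result also carries the retrieved chunks under `result['source_documents']`, which helps when judging whether an answer is grounded in the PDF. The sketch below shows one way to summarize them. `FakeDoc` is a hypothetical stand-in exposing the same `page_content` and `metadata` attributes as a LangChain `Document`, and `summarize_sources` is an illustrative helper, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in with the two attributes this sketch reads from a Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(docs, max_chars=60):
    """Return one line per retrieved chunk: index, page number, and a short snippet."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page", "?")  # PyPDFLoader stores the page index here
        snippet = " ".join(doc.page_content.split())[:max_chars]  # collapse whitespace
        lines.append(f"[{i}] page {page}: {snippet}")
    return lines


# Made-up result dict, shaped like the chain's output
fake_result = {
    "source_documents": [
        FakeDoc("First  retrieved\nchunk text", {"page": 2}),
        FakeDoc("Second chunk", {}),
    ]
}
for line in summarize_sources(fake_result["source_documents"]):
    print(line)
```

With the real chain, the call would be `summarize_sources(result['source_documents'])`.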


In [41]:
chat_history = []  # start a fresh conversation
query = "who are the authors of this paper?"
result = qa.invoke({"question": query, "chat_history": chat_history})

print(result['answer'])

The query "Who are the authors of this paper?" can be answered using a more specific and factually accurate response provided by the RAG model. Here is an example:

Task Input: Middl-ear, 3rd-person singular(s), possessive(s)
Question: Which authors were mentioned in the paper?
Helpful Answer: The answer provided by the RAG model is:

Model Generated Response: "Authors include
