# LangChain Q&A RAG

Based on [LangChain_QnA_RAG.ipynb](https://github.com/mishragauravgm/qna-faiss-rag/blob/main/LangChain_QnA_RAG.ipynb).

The pipeline has five steps:

1. Read a PDF from a file location.
2. Split the PDF text into chunks.
3. Embed the chunks with an embedding model and store the embeddings.
4. Read a question or prompt from the user:
    - Embed the question as well.
    - Run a FAISS similarity search for the question against the stored embeddings.
    - Pass the text of the k nearest chunks as context through the prompt template.
5. Pass the context and the question to the LLM and get the response.
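
The retrieval part of these steps (embed, search, keep the k nearest) can be sketched without any external library. This is only an illustration: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the brute-force `top_k` is a stand-in for FAISS, and all names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    # Embed the question, score every chunk, keep the k most similar ones.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, standing in for the pieces of a split PDF.
chunks = [
    "Staff software engineer working on geospatial AI.",
    "Enjoys hiking and photography on weekends.",
    "Built deep learning models for food inspection.",
]
context = top_k("What deep learning work has this person done?", chunks, k=1)
print(context)
```

A real pipeline replaces `embed` with a learned model and the sorted scan with an index, but the contract is the same: a question goes in and the k closest chunks come out.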

In [1]:
!pip install openai
!pip install faiss-cpu
!pip install langchain
!pip install python-dotenv
!pip install pypdf2
!pip install langchain_openai
!pip install sentence-transformers
!pip install langchain_community
!pip install streamlit



In [2]:
from openai import OpenAI
import os
from dotenv import load_dotenv
import PyPDF2 as pypdf
from langchain.text_splitter import RecursiveCharacterTextSplitter
#from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate

from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI

from langchain_community.llms import HuggingFaceHub
import streamlit as st

from openai_commands.env import OPENAI_API_KEY

load_dotenv()

client = OpenAI(api_key = OPENAI_API_KEY)

### Split PDF to chunks

In [3]:
def pdf_to_faiss(pdf_location, chunk_size=800, chunk_overlap=100):
    # Read the PDF and concatenate the text of all pages.
    pdf = pypdf.PdfReader(pdf_location)
    full_text = ''
    for page in pdf.pages:
        # Guard against pages that yield no text.
        full_text += page.extract_text() or ''
    # Split on newlines into overlapping chunks, honoring the function arguments.
    text_splits = RecursiveCharacterTextSplitter(
        separators=['\n'],
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
    ).split_text(full_text)
    # Embed the chunks with the default HuggingFace model and index them with FAISS.
    embeddings = HuggingFaceEmbeddings()
    db = FAISS.from_texts(text_splits, embeddings)
    return db
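
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (which also splits on separators); the function name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text, chunk_size=800, chunk_overlap=100):
    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous one,
    # so consecutive chunks share chunk_overlap characters.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(repr(piece))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.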

In [4]:
def answer(db, question):
    # Hosted LLM on the Hugging Face Hub (alternative: "google/flan-t5-base").
    repo_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    model = HuggingFaceHub(
        repo_id=repo_id, model_kwargs={"temperature": 0.7, "max_length": 5000}
    )
    # Retrieve the 10 chunks most similar to the question.
    docs = db.similarity_search(question, k=10)
    prompt = """Answer the following QUESTION based on the CONTEXT
    given. If you do not know the answer and the CONTEXT doesn't
    contain the answer truthfully say "I don't know"

        CONTEXT:{context}
        QUESTION:{question}
        ANSWER:
        """

    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template=prompt,
    )
    chain = LLMChain(llm=model, prompt=prompt_template)
    # Note: docs is a list of Document objects, so their repr() ends up in the prompt.
    return prompt, docs, chain.run(context = docs, question=question)
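
Because `docs` is passed to the chain as a list of objects, the recorded output in this notebook shows `Document(id=..., metadata=..., page_content=...)` wrappers inside the CONTEXT. One possible alternative, sketched below under assumptions, is to join only the chunk text before filling the template. The `Doc` class is a hypothetical stand-in for a retrieved document (only a `page_content` attribute is assumed), and `TEMPLATE` is a shortened version of the prompt above.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a retrieved document; only page_content is used.
    page_content: str


TEMPLATE = """Answer the following QUESTION based on the CONTEXT given.

CONTEXT:{context}
QUESTION:{question}
ANSWER:
"""


def build_prompt(docs, question):
    # Join the chunk texts instead of passing the list of objects itself.
    context = "\n\n".join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, question=question)


docs = [Doc("15+ years of machine vision."), Doc("Staff Software Engineer.")]
prompt_text = build_prompt(docs, "Recommend a research topic for this person.")
print(prompt_text)
```

This keeps ids and metadata out of the prompt, which leaves more of the model's input budget for the actual chunk text.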

In [5]:
db = pdf_to_faiss('/Users/kamangir/Desktop/arash-abadpour-resume-full.pdf')


In [6]:
ask = "Recommend a research topic for this person."
prompt, docs, ans = answer(db, ask)
print(ans)


Answer the following QUESTION based on the CONTEXT
    given. If you do not know the answer and the CONTEXT doesn't
    contain the answer truthfully say "I don't know"

        CONTEXT:[Document(id='cdba225c-2e43-45d9-94b3-1493ecbc75d0', metadata={}, page_content='Arash Abadpour\n15+ Years of Hands-On Machine Vision, Deep Learning & Geospa tial AI.\nrepo: github.com/kamangir\nExperience\n2022–2025 Staﬀ Software Engineer ,EarthDaily Analytics, Vancouver, Canada .\nEarthDaily observes, veriﬁes, & predicts changes to the Ear thś surface to help people\nunderstand & take action.\n2022 Senior Machine Learning Engineer (Computer Vision) ,Vivid Machines,\nToronto, Canada .\nSmart technology to help fruit and vegetable farmers optimi ze quality and yield.\n2020–2022 Vice President, Data Science ,Savormetrics Inc., Mississauga, Canada .\nDeep Learning + Food Inspection = Increasing Proﬁts. Reduci ng Waste. Improving Cus-\ntomer Satisfaction.\n2019–2020 Lead Data Scientist ,Betterview Marketpla

In [7]:
ans

'Answer the following QUESTION based on the CONTEXT\n    given. If you do not know the answer and the CONTEXT doesn\'t\n    contain the answer truthfully say "I don\'t know"\n\n        CONTEXT:[Document(id=\'cdba225c-2e43-45d9-94b3-1493ecbc75d0\', metadata={}, page_content=\'Arash Abadpour\\n15+ Years of Hands-On Machine Vision, Deep Learning & Geospa tial AI.\\nrepo: github.com/kamangir\\nExperience\\n2022–2025 Staﬀ Software Engineer ,EarthDaily Analytics, Vancouver, Canada .\\nEarthDaily observes, veriﬁes, & predicts changes to the Ear thś surface to help people\\nunderstand & take action.\\n2022 Senior Machine Learning Engineer (Computer Vision) ,Vivid Machines,\\nToronto, Canada .\\nSmart technology to help fruit and vegetable farmers optimi ze quality and yield.\\n2020–2022 Vice President, Data Science ,Savormetrics Inc., Mississauga, Canada .\\nDeep Learning + Food Inspection = Increasing Proﬁts. Reduci ng Waste. Improving Cus-\\ntomer Satisfaction.\\n2019–2020 Lead Data Scientis

In [8]:
# END