# Introduction

This demo project walks through the steps for building a chat bot that answers questions from documents. It uses a Llama-2-7b model together with all-MiniLM-L6-v2, a sentence-transformers embedding model, to build an AI chat bot based on RAG (Retrieval-Augmented Generation).

RAG is a technique for augmenting LLM knowledge with additional data. LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of retrieving the appropriate information and inserting it into the model prompt is known as Retrieval-Augmented Generation (RAG).
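
The retrieve-then-prompt flow can be sketched without any libraries. The following is a toy illustration, not the LangChain implementation: the bag-of-words `embed` function, the `retrieve` and `build_prompt` helpers, and the sample snippets are all hypothetical stand-ins for a real embedding model, vector store, and document collection.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, snippets, k=1):
    """Return the k snippets most similar to the question."""
    q = embed(question)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]


def build_prompt(question, context_snippets):
    """Insert the retrieved snippets into the prompt that would be sent to the LLM."""
    context = "\n".join(context_snippets)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


# Hypothetical private notes that a pretrained model would not know about.
snippets = [
    "The office wifi password rotates every Monday.",
    "Project Falcon ships in the third quarter.",
    "Lunch is served at noon in the main hall.",
]
question = "When does project Falcon ship?"
rag_prompt = build_prompt(question, retrieve(question, snippets, k=1))
print(rag_prompt)
```

Only the snippet that shares words with the question ends up in the prompt. The rest of this notebook follows the same pattern, with real embeddings and a FAISS index in place of the word counts.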

LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.

In [39]:
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA

In [40]:
import warnings
warnings.filterwarnings('ignore')

# Load Data

In [41]:
DB_FAISS_PATH = 'vectorstore/db_faiss'
DATA_PATH = 'demo_pdfs/'

In [42]:
loader = DirectoryLoader(DATA_PATH, glob='*.pdf', loader_cls=PyPDFLoader)

In [43]:
documents = loader.load()

# Text Splitter

Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is splitting a long document into smaller chunks that fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

When you want to deal with long pieces of text, it is necessary to split that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep semantically related pieces of text together. What "semantically related" means can depend on the type of text. This notebook uses `RecursiveCharacterTextSplitter`, one of several splitters that LangChain provides.

At a high level, text splitters work as follows:

1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).

That means there are two different axes along which you can customize your text splitter:

1. How the text is split
2. How the chunk size is measured
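
The three steps and the two axes can be sketched in plain Python. This is a simplified illustration, not LangChain's `RecursiveCharacterTextSplitter`: the `split_text` function is hypothetical, it splits on sentence boundaries only, and its `chunk_overlap` counts carried-over sentences rather than characters.

```python
import re


def split_text(text, chunk_size=60, chunk_overlap=1, length_function=len):
    """Greedy sentence-based splitter (illustrative sketch only).

    chunk_size is measured by length_function; chunk_overlap is the number
    of trailing sentences repeated at the start of the next chunk.
    """
    # Step 1: split into small, semantically meaningful pieces (sentences).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        # Step 2: keep combining pieces until the size limit would be exceeded.
        if current and length_function(candidate) > chunk_size:
            # Step 3: emit the chunk, then start a new one with some overlap.
            chunks.append(" ".join(current))
            current = current[-chunk_overlap:] if chunk_overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "RAG adds context to prompts. Splitting keeps chunks small. "
    "Overlap preserves context. Size can be measured in words."
)

# Axis 2, variant A: measure chunk size in characters.
char_chunks = split_text(sample, chunk_size=60, chunk_overlap=1)

# Axis 2, variant B: measure chunk size in words instead.
word_chunks = split_text(
    sample, chunk_size=12, chunk_overlap=1,
    length_function=lambda s: len(s.split()),
)

for chunk in char_chunks:
    print(repr(chunk))
print(len(char_chunks), len(word_chunks))
```

Changing the regular expression changes how the text is split, and changing `length_function` changes how the chunk size is measured. A piece that is longer than `chunk_size` on its own is emitted as is in this sketch.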

**Ref:** [Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/)

In [44]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)
texts[0]

Document(page_content='PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and\nGeneralizable Compound-Protein Interaction Prediction\nLirong Wu1,2, Yufei Huang1,2, Cheng Tan1,2, Zhangyang Gao1,2, Bozhen Hu1,2, Haitao Lin1,2,\nZicheng Liu1,2, Stan Z. Li1,†\n1AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China, 310030\n2Zhejiang University, Hangzhou, China, 310058', metadata={'source': 'demo_pdfs\\PSC-CPI.pdf', 'page': 0})

# Embedding

**Ref:** [Hugging Face Embedding Models](https://python.langchain.com/docs/integrations/platforms/huggingface#embedding-models)

In [45]:
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2', 
                                   model_kwargs={'device': 'cpu'})

In [46]:
db = FAISS.from_documents(texts, embeddings)
db.save_local(DB_FAISS_PATH)

# Load Model

In [47]:
llm = CTransformers(model='../models/llama-2-7b-chat.ggmlv3.q8_0.bin',
                    model_type='llama',
                    config={'max_new_tokens': 256, 'temperature': 0.3, 'context_length': 700})

# Prompt Template

In [48]:
prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful and accurate answer:
"""

In [49]:
prompt = PromptTemplate(template = prompt_template, 
                        input_variables=['context', 'question'])

# Retriever

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of `Document` objects as output.
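
To illustrate that a retriever is only an interface, here is a minimal sketch of one with no vector store behind it. The `Doc` dataclass, the `Retriever` protocol, and the `KeywordRetriever` class are hypothetical names for this example and are not part of LangChain.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """The whole contract: a string query in, a list of documents out."""

    def retrieve(self, query: str) -> List[Doc]: ...


class KeywordRetriever:
    """Ranks documents by shared words; uses no embeddings and no index."""

    def __init__(self, docs: List[Doc], k: int = 2):
        self.docs = docs
        self.k = k

    def retrieve(self, query: str) -> List[Doc]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(d.page_content.lower().split())), d)
            for d in self.docs
        ]
        # Drop documents with no shared words, then keep the k best matches.
        hits = [(score, d) for score, d in scored if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in hits[: self.k]]


docs = [
    Doc("FAISS is a library for similarity search", {"page": 0}),
    Doc("Llama 2 is a family of language models", {"page": 1}),
    Doc("Similarity search compares embedding vectors", {"page": 2}),
]
keyword_retriever: Retriever = KeywordRetriever(docs, k=2)
results = keyword_retriever.retrieve("similarity search")
print([d.metadata["page"] for d in results])
```

Any object with a matching `retrieve` method satisfies the protocol, so a vector-store-backed retriever and this keyword one would be interchangeable from the caller's point of view.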

**Ref:** [Retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/)

In [50]:
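retriever = db.as_retriever(search_kwargs={'k': 2})  # return the 2 most similar chunks

In [51]:
# The custom prompt is a chain option, not a retriever option, so it is passed here.
# Adding return_source_documents=True would also return the retrieved chunks
# under the 'source_documents' key of the result.
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type='stuff',
                                       retriever=retriever,
                                       chain_type_kwargs={'prompt': prompt})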

# Chat

In [52]:
query = "List the Deep learning-based Methods mentioned in the paper"

In [53]:
result = qa_chain({"query": query})
print(result)

{'query': 'List the Deep learning-based Methods mentioned in the paper', 'result': ' The following deep learning-based methods are mentioned in the paper:\n1. DeepDTA (Ozturk, Ozgur, and Ozkirimli, 2018)\n2. Deep-ConvDTI (Lee, Keum, and Nam, 2019)\n3. GraphDTA (Nguyen et al., 2021)\n4. GNNs (Kipf and Welling, 2016; Wu et al., 2021a, 2022b, 2023)\n5. RNNs (Armenteros et al., 2020)\n6. TransformerCPI (Chen et al., 2020a)\n7. HyperattentionDTI (Zhao et al., 2022)'}


In [54]:
print(result['result'])

 The following deep learning-based methods are mentioned in the paper:
1. DeepDTA (Ozturk, Ozgur, and Ozkirimli, 2018)
2. Deep-ConvDTI (Lee, Keum, and Nam, 2019)
3. GraphDTA (Nguyen et al., 2021)
4. GNNs (Kipf and Welling, 2016; Wu et al., 2021a, 2022b, 2023)
5. RNNs (Armenteros et al., 2020)
6. TransformerCPI (Chen et al., 2020a)
7. HyperattentionDTI (Zhao et al., 2022)
