In this notebook we will carry out the following steps:

1. Create a small set of LangChain `Document` objects: short abstracts of well-known machine learning papers, each with metadata (title, year, authors, citation count and field).
2. Create embeddings for the documents with an open-source sentence-transformers model.
3. Save the embeddings in a Chroma vector store.
4. Perform semantic search without an LLM.
5. Perform self-query retrieval with an LLM (Llama-2), which turns a natural-language question into a semantic query plus a metadata filter.

In [1]:
import torch 
import time
import transformers # HF import
from langchain import HuggingFacePipeline # To build the HF pipeline using Llama-2
from langchain import PromptTemplate,  LLMChain # To create PromptTemplate and LLMChain
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM , AutoModel  # For creating the model and tokenizer


In [2]:
model_name = 'meta-llama/Llama-2-7b-chat-hf' # Model path for Llama-2 finetuned chat model

device = f'cuda:{torch.cuda.current_device()}' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True
)

model_config = transformers.AutoConfig.from_pretrained(
    model_name,
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    #trust_remote_code=True,
    #config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
)

model.eval()

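# Text-generation pipeline around the quantized model.
# top_k=1 keeps only the most likely next token, so even with do_sample=True
# decoding is effectively greedy.
pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_new_tokens=512,
    do_sample=True,
    top_k=1,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    repetition_penalty=1.2,
)

# Wrap the Hugging Face pipeline as a LangChain LLM
llm = HuggingFacePipeline(pipeline=pipe,
                          model_kwargs={'temperature': 0.7})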


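The pipeline above combines `do_sample=True` with `top_k=1`. The following minimal pure-Python sketch of temperature plus top-k sampling illustrates the technique and is not the `transformers` implementation. The logits are made-up numbers and `sample_top_k` is a hypothetical helper. It shows why this setting behaves like greedy decoding: dividing all logits by a positive temperature never changes which token scores highest, and top-k with k=1 keeps only that token. A temperature such as 0.7 therefore cannot change the output here.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=None):
    """Sample a token index after temperature scaling and top-k truncation."""
    rng = rng or random.Random()
    # Dividing by a positive temperature changes the spread of the
    # probabilities but never the ranking of the tokens.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the k highest-scoring tokens.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the surviving candidates (max subtracted for stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [1.2, 3.5, 0.3, 3.4]  # hypothetical scores for a 4-token vocabulary

# With k=1 only the argmax (index 1) survives, whatever the temperature or seed.
picks = {
    sample_top_k(logits, k=1, temperature=t, rng=random.Random(seed))
    for t in (0.1, 0.7, 2.0)
    for seed in range(20)
}
print(picks)  # {1}
```

With a larger `top_k` (say 2), the same function can return either of the two best tokens, and the temperature then does affect how often each one is drawn.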

#### Document Preparation

In [3]:
from langchain.schema import Document

docs = [
    Document(
        page_content="The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. This significantly reduces overfitting and gives major improvements over other regularization methods",
        metadata={"name":"Dropout: a simple way to prevent neural networks from overfitting", "year": 2014, 
                  "Authors": "Hinton, G.E., Krizhevsky, A., Srivastava, N., Sutskever, I., & Salakhutdinov, R.", "cited":2084 , 
                  "Field" : "'Neural Network','Regularization'"},
    ),
    Document(
        page_content="We present a residual learning framework to ease the training of deep neural networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.",
        metadata={"name":"Deep Residual Learning for Image Recognition", "year": 2016, 
                  "Authors": "He, K., Ren, S., Sun, J., & Zhang, X.", "cited":1436 , 
                  "Field" : "'Image Recognition','Computer Vision'"},
    ),
    Document(
        page_content="Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change.  We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs.  Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.",
        metadata={"name":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", "year": 2015, 
                  "Authors": "Sergey Ioffe, Christian Szegedy", "cited":946 , 
                  "Field" : "'Neural Network','Deep Learning','Speed up Training Process'"},
    ),
    Document(
        page_content="Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes",
        metadata={"name":"Large-Scale Video Classification with Convolutional Neural Networks", "year": 2014, 
                  "Authors": "Fei-Fei, L., Karpathy, A., Leung, T., Shetty, S., Sukthankar, R., & Toderici, G.", "cited":865 , 
                  "Field" : "'Convolutional Neural Network','Deep Learning','Video Classification'"},
    ),
    Document(
        page_content="We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.",
        metadata={"name":"Microsoft COCO: Common Objects in Context", "year": 2014, 
                  "Authors": "Belongie, S.J., Dollár, P., Hays, J., Lin, T., Maire, M., Perona, P., Ramanan, D., & Zitnick, C.L", "cited":830 , 
                  "Field" : "'Convolutional Neural Network','Object Detection','Dataset'"},
    ),
]

#### Creation of Embeddings

We use the open-source sentence-transformers model `all-MiniLM-L6-v2` to create an embedding for each document.

In [4]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')


#### Vector Store

In [5]:
from langchain.vectorstores import Chroma

# Embed the documents and load them into Chroma.
# No persist_directory is passed, so the collection is kept in memory only.

db = Chroma.from_documents(docs,
                           embedding=embeddings)
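Retrieval from the vector store is a nearest-neighbour search over embedding vectors. Below is a minimal pure-Python sketch of the idea. It assumes squared L2 distance, which to our understanding is Chroma's default metric, and unit-length vectors, which all-MiniLM-L6-v2 is expected to produce. The 3-dimensional vectors and document ids are made up; the real embeddings have 384 dimensions. For unit vectors, squared L2 distance equals 2 - 2·cosine similarity, so ranking by distance and ranking by cosine similarity give the same order.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(query, store, k=4):
    """Return the ids of the k stored vectors closest to `query`."""
    ranked = sorted(store, key=lambda item: squared_l2(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy embeddings standing in for the real document vectors.
store = [
    ("dropout", l2_normalize([0.9, 0.1, 0.0])),
    ("resnet", l2_normalize([0.2, 0.9, 0.1])),
    ("coco", l2_normalize([0.1, 0.3, 0.9])),
]
query = l2_normalize([0.8, 0.3, 0.1])

print(nearest(query, store, k=2))  # ['dropout', 'resnet']
```

`nearest` always returns `k` results, however unrelated the query is. This is why the plain retriever used later still returns documents for a query like "langchain concepts".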


#### Metadata Field Info

In [6]:
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="name",
        description="Name of the corresponding Paper",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="The year the paper was published",
        type="integer",
    ),
    AttributeInfo(
        name="Authors",
        description="Authors of the corresponding paper",
        type="string",
    ),
    AttributeInfo(
        name="cited",
        description="Number of times the paper is cited",
        type="integer",
    ),
    AttributeInfo(
        name="Field",
        description="Field(s) of research the paper belongs to",
        type="string",
    ),
]
document_content_description = "Brief description of some famous Machine Learning Papers"


#### Creating a Self-Query Retriever using the LLM (Llama-2)

In [7]:
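# Create the self-query retriever: the LLM translates each question into a
# semantic query plus a metadata filter (requires the `lark` package)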

from langchain.retrievers.self_query.base import SelfQueryRetriever

retriever = SelfQueryRetriever.from_llm(
    llm,
    db,
    document_content_description,
    metadata_field_info,
    verbose=True
)
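The self-query retriever prompts the LLM to answer with a JSON object holding a `query` string and a `filter` expression. The replies quoted in the error messages of the later cells show this format. The sketch below covers only the first step, pulling that JSON out of the fenced reply. `split_structured_output` is a hypothetical helper, not LangChain's parser. The real parser also parses the filter with a grammar and validates the attribute names, and those are the two steps that fail in the later cells.

```python
import json
import re

TICKS = "`" * 3  # a Markdown code fence, built this way to keep the snippet fence-safe
FENCED_JSON = re.compile(TICKS + r"json\s*(.*?)\s*" + TICKS, re.DOTALL)


def split_structured_output(llm_text):
    """Return (query, filter_string) from an LLM reply containing a JSON object."""
    match = FENCED_JSON.search(llm_text)
    # Fall back to the whole reply if it is not wrapped in a fenced block.
    payload = match.group(1) if match else llm_text
    data = json.loads(payload)
    return data["query"], data.get("filter")


# A reply in the same shape as one printed by a later cell.
reply = (
    TICKS + "json\n"
    "{\n"
    '    "query": "less than 1000 citations",\n'
    '    "filter": "lt(cited, 1000)"\n'
    "}\n"
    + TICKS
)
print(split_structured_output(reply))  # ('less than 1000 citations', 'lt(cited, 1000)')
```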


In [9]:
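# Plain semantic search (no LLM, no metadata filter): the retriever returns the
# k nearest documents (k=4 by default), even for an off-topic query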
db_retriever = db.as_retriever()
db_retriever.get_relevant_documents("langchain concepts")


[Document(page_content='We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.', metadata={'Authors': 'Belongie, S.J., Dollár, P., Hays, J., Lin, T., Maire, M., Perona, P., Ramanan, D., & Zitnick, C.L', 'Field': "'Convolutional Neural Network','Object Detection','Dataset'", 'cited': 830, 'name': 'Microsoft COCO: Common Objects in Context', 'year': 2014}),
 Document(page_content='We present a residual learning framework to ease the training of deep neural networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the la

In [15]:
retriever.get_relevant_documents("Which paper cited more than 2000 ?")


[Document(page_content='The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. This significantly reduces overfitting and gives major improvements over other regularization methods', metadata={'Authors': 'Hinton, G.E., Krizhevsky, A., Srivastava, N., Sutskever, I., & Salakhutdinov, R.', 'Field': "'Neural Network','Regularization'", 'cited': 2084, 'name': 'Dropout: a simple way to prevent neural networks from overfitting', 'year': 2014})]
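This question succeeded. The LLM produced a metadata filter on `cited`, presumably a "greater than 2000" comparison (the generated filter is not printed here), and only the Dropout paper, with 2084 citations, passed it. Below is a minimal in-memory sketch of how such a filter selects documents. The tuple filter format and the `matches`/`select` helpers are hypothetical. In this notebook the filtering is delegated to Chroma rather than done in Python.

```python
import operator

# Comparison operators that appear in the filters printed by this notebook.
COMPARATORS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def matches(metadata, flt):
    """Return True if one document's metadata satisfies a filter.

    A filter is a hypothetical tuple such as ("gt", "cited", 2000)
    or ("and", [sub_filter, sub_filter, ...]).
    """
    if flt[0] == "and":
        return all(matches(metadata, sub) for sub in flt[1])
    op, attribute, value = flt
    # A missing attribute never matches instead of raising KeyError.
    if attribute not in metadata:
        return False
    return COMPARATORS[op](metadata[attribute], value)


# Year and citation count of the five documents defined above (titles shortened).
papers = [
    {"name": "Dropout", "year": 2014, "cited": 2084},
    {"name": "Deep Residual Learning", "year": 2016, "cited": 1436},
    {"name": "Batch Normalization", "year": 2015, "cited": 946},
    {"name": "Large-Scale Video Classification", "year": 2014, "cited": 865},
    {"name": "Microsoft COCO", "year": 2014, "cited": 830},
]


def select(flt):
    """Names of the papers whose metadata satisfies `flt`."""
    return [p["name"] for p in papers if matches(p, flt)]


print(select(("gt", "cited", 2000)))  # ['Dropout']
print(select(("lt", "cited", 1000)))  # ['Batch Normalization', 'Large-Scale Video Classification', 'Microsoft COCO']
print(select(("and", [("eq", "year", 2014), ("gt", "cited", 850)])))  # ['Dropout', 'Large-Scale Video Classification']
```

The `("lt", "cited", 1000)` case corresponds to the `lt(cited, 1000)` filter that fails to parse in a later cell.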

In [11]:
retriever.get_relevant_documents("object detection Dataset?")


OutputParserException: Parsing text
```json
{
    "query": "object detection",
    "filter": "and(eq(\"field\", \"Object Detection\"), gt(\"cited\", 50))"
}
```
 raised following error:
Received invalid attributes field. Allowed attributes are ['name', 'year', 'Authors', 'cited', 'Field']
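The attribute check is case-sensitive. The LLM wrote `field` (and, in later cells, `authors`), while the metadata keys are `Field` and `Authors`. One option is to rename the keys to lowercase in both the documents and `metadata_field_info`. Another is to resolve attribute names case-insensitively before validation. The sketch below shows the second option. `check_attribute` only mirrors the behaviour visible in the error above, and `normalize_attribute` is a hypothetical helper, not a LangChain setting.

```python
# Attribute names exactly as declared in metadata_field_info.
ALLOWED = ["name", "year", "Authors", "cited", "Field"]


def check_attribute(attr, allowed=ALLOWED):
    """Exact, case-sensitive check, mirroring the error above."""
    if attr not in allowed:
        raise ValueError(f"Received invalid attributes {attr}. Allowed attributes are {allowed}")
    return attr


def normalize_attribute(attr, allowed=ALLOWED):
    """Hypothetical fallback: resolve an attribute name case-insensitively."""
    by_lower = {a.lower(): a for a in allowed}
    if attr.lower() not in by_lower:
        raise ValueError(f"Unknown attribute {attr!r}; allowed: {allowed}")
    return by_lower[attr.lower()]


print(normalize_attribute("field"))    # Field
print(normalize_attribute("authors"))  # Authors
try:
    check_attribute("field")
except ValueError as err:
    print(err)
```

Fixing the name alone would not rescue these queries. Each `Field` value is a single string listing several quoted fields, so `eq("Field", "Object Detection")` matches no document. The later `authors` filters also name authors who do not appear in any document, so they would return nothing even with the correct attribute name.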

In [12]:
retriever.get_relevant_documents("Which papers has less 1000 citation?")


OutputParserException: Parsing text
```json
{
    "query": "less than 1000 citations",
    "filter": "lt(cited, 1000)"
}
```
 raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 9.
Expected one of: 
	* LPAR
Previous tokens: [Token('CNAME', 'cited')]
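The parse error says that after reading the bare word `cited` the parser expected an opening parenthesis (`LPAR`). In this filter grammar an unquoted name is read as the start of a function call, so attribute names must be written as quoted strings, as in the filters the LLM produced in the other cells. One possible workaround is to quote bare attribute names before the filter is parsed. The sketch below shows this as a hypothetical pre-processing step, not a LangChain option. The list of comparator names is an assumption.

```python
import re

# Comparators whose first argument should be a quoted attribute name
# (assumed list; it covers the filters seen in this notebook).
_BARE_ATTRIBUTE = re.compile(r"\b(eq|ne|gt|gte|lt|lte)\(\s*([A-Za-z_]\w*)\s*,")


def quote_bare_attributes(filter_text):
    """Hypothetical repair: wrap an unquoted attribute name in double quotes."""
    return _BARE_ATTRIBUTE.sub(r'\1("\2",', filter_text)


print(quote_bare_attributes("lt(cited, 1000)"))  # lt("cited", 1000)
```

Already-quoted attributes are left untouched, because the pattern only matches an identifier that directly follows the opening parenthesis.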


In [13]:
retriever.get_relevant_documents("Please let me know the details about batch normalization ?")


OutputParserException: Parsing text
```json
{
    "query": "batch normalization",
    "filter": "and(eq(\"authors\", \"Ian Goodfellow, Aaron Courville, Pierre Vincent\" ), gt(\"cited\", 500))"
}
```
 raised following error:
Received invalid attributes authors. Allowed attributes are ['name', 'year', 'Authors', 'cited', 'Field']

In [14]:
retriever.get_relevant_documents("Please let me know the details about Regularizaton ?")



OutputParserException: Parsing text
```json
{
    "query": "Regularization",
    "filter": "and(eq(\"authors\", \"J. Shlensky-Barletta, J. A. M. Maaten, C. E. Rasmussen\"), gt(\"cited\", 50))"
}
```
 raised following error:
Received invalid attributes authors. Allowed attributes are ['name', 'year', 'Authors', 'cited', 'Field']