# Introduction

<img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F769452%2Fb18d0513200d426e556b2b7b7c825981%2FRAG.png?generation=1695504022336680&alt=media"></img>

## Objective

Intterrogate your own 10-Q financial statements using Llama 2.0 from Meta, Langchain and ChromaDB to create a Retrieval Augmented Generation (RAG) system. This will allow us to ask questions about our documents (that were not included in the training data), without fine-tunning the Large Language Model (LLM).
When using RAG, if you are given a question, you first do a retrieval step to fetch any relevant documents from a special database, a vector database where these documents were indexed. 

## Prerequisits
* Ensure you have free API token with hugging face
* Ensure you have approval with the Meta Llama models
* Upload your own 10-Q financial documents for customisation

## Definitions

* LLM - Large Language Model  
* Llama 2.0 - LLM from Meta 
* Langchain - a framework designed to simplify the creation of applications using LLMs
* Vector database - a database that organizes data through high-dimmensional vectors  
* ChromaDB - vector database  
* RAG - Retrieval Augmented Generation (see below more details about RAGs)

## Model 1 details

* **Model**: Llama 2  
* **Variation**: 7b-chat-hf  (7b: 7B dimm. hf: HuggingFace build)
* **Version**: V1  
* **Framework**: PyTorch  
* **GPU** P100

LlaMA 2 model is pretrained and fine-tuned with 2 Trillion tokens and 7 to 70 Billion parameters which makes it one of the powerful open source models. It is a highly improvement over LlaMA 1 model.

# Installations, imports, utils

In [None]:
!pip install transformers accelerate langchain sentence_transformers chromadb
!pip install -i https://pypi.org/simple/ bitsandbytes
    
# !pip install transformers==4.33.0 accelerate==0.22.0 einops==0.6.1 langchain==0.0.300 xformers==0.0.21 \
# bitsandbytes==0.41.1 sentence_transformers==2.2.2 chromadb==0.4.12

In [2]:
from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer,GenerationConfig
from time import time
#import chromadb
#from chromadb.config import Settings
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma
from langchain.chains.question_answering import load_qa_chain


# Initialize model, tokenizer, query pipeline

Define the model, the device, and the `bitsandbytes` configuration. Commented out 13b version of the model as struggled to run it.

In [3]:
model_id = '/kaggle/input/llama-2/pytorch/7b-chat-hf/1'
# model_id = '/kaggle/input/gemma/transformers/1.1-7b-it/1/'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

Prepare the model and the tokenizer.

In [4]:
# !pip install optimum
# !pip install auto-gptq

In [5]:
time_1 = time()
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
time_2 = time()
print(f"Prepare model, tokenizer: {round(time_2-time_1, 3)} sec.")

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Prepare model, tokenizer: 206.44 sec.


Define the query pipeline.

In [6]:
tokenizer.model_max_length

1000000000000000019884624838656

In [7]:
# New: Using GenerationConfig
# time_1 = time()
# # Initialize GenerationConfig with your desired settings
# generation_config = GenerationConfig(
#     do_sample=True,
#     top_k=10,
#     num_return_sequences=1,
# #     max_length=200,
#     max_new_tokens=200,
# )

# # When creating your pipeline, you now pass the generation_config using the **vars() method
# # to convert the GenerationConfig instance to a dictionary of parameters.
# query_pipeline = transformers.pipeline(
#     "text-generation",
#     model=model,
#     tokenizer=tokenizer,
# #     torch_dtype=torch.float16,
#     device_map="auto",
#     **vars(generation_config)  # Unpack generation_config parameters
# )
# time_2 = time()
# print(f"Prepare pipeline: {round(time_2-time_1, 3)} sec.")

# def test_model(tokenizer, pipeline, prompt_to_test):
#     """
#     Perform a query and print the result.
#     """
#     time_1 = time()
#     sequences = pipeline(prompt_to_test)
#     time_2 = time()
#     print(f"Test inference: {round(time_2-time_1, 3)} sec.")
#     for seq in sequences:
#         print(f"Result: {seq['generated_text']}")
        
# test_model(tokenizer,
#            query_pipeline,
#            "Please explain what are the 10Q documents. Keep it in 100 words.")

In [7]:
time_1 = time()
query_pipeline = transformers.pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        torch_dtype=torch.float16,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,
        max_new_tokens=200,
        device_map="auto",)
time_2 = time()
print(f"Prepare pipeline: {round(time_2-time_1, 3)} sec.")

def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Perform a query
    print the result
    Args:
        tokenizer: the tokenizer
        pipeline: the pipeline
        prompt_to_test: the prompt
    Returns
        None
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,
        max_new_tokens=200)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)} sec.")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

test_model(tokenizer,
           query_pipeline,
           "Please explain what are the 10Q documents. Keep it in 100 words.")

2024-04-22 19:15:25.263330: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-22 19:15:25.263438: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-22 19:15:25.391494: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
The model 'LlamaForCausalLM' is not supported for text2text-generation. Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalG

Prepare pipeline: 11.649 sec.
Test inference: 10.245 sec.
Result: Please explain what are the 10Q documents. Keep it in 100 words.
The 10Q documents are a set of 10 questions that businesses can use to evaluate their sustainability performance. Developed by the Global Reporting Initiative (GRI), these questions cover areas such as energy use, water consumption, waste management, and social impact. The documents provide a framework for companies to measure and report on their sustainability performance, enabling stakeholders to evaluate their progress and identify areas for improvement.


We define a function for testing the pipeline.

# Retrieval Augmented Generation

## Check the model with a HuggingFace pipeline


We check the model with a HF pipeline, using a query about what are financial markets

In [28]:
import langchain
langchain.verbose = False

In [29]:
# wrapper or adapter for hugginface that enables the LangChain framework
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Please explain what are financial markets. Give just a definition. Keep it in 100 words.")

Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


'Please explain what are financial markets. Give just a definition. Keep it in 100 words.\nFinancial markets are platforms where buyers and sellers engage in the exchange of financial assets, such as stocks, bonds, commodities, and currencies. These markets provide a mechanism for individuals, businesses, and governments to raise capital, manage risk, and invest in various assets.'

## PDF loader

In [9]:
from langchain.document_loaders import PyPDFLoader

In [10]:
# create a loader
loader = PyPDFLoader("/kaggle/input/tesla10q/tsla-20230930.pdf")

# load your data
data = loader.load()

# multiple pdfs loaded as single
# pdf_pages = []

# # specify dataset path
# pdfs_path = '/kaggle/input/python-pdfs/'
# for directory, _, files in os.walk(pdfs_path):
#     for file in files:
#         print(os.path.join(directory, file))
#         # load data from pdf
#         loader = PyPDFLoader(os.path.join(directory, file))
#         pdf_pages.extend(loader.load_and_split())

In [11]:
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')

You have 55 document(s) in your data
There are 2593 characters in your document


## Split data in chunks

We split data in chunks using a recursive character text splitter.

In [12]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=300)
all_splits = text_splitter.split_documents(data)

## Creating Embeddings and Storing in Vector Store

Create the embeddings using Sentence Transformer and HuggingFace embeddings.

In [13]:
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Initialize ChromaDB with the document splits, the embeddings defined previously and with the option to persist it locally.

In [14]:
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

## Initialize chain

Initialise the chain and do test query.

In [16]:
chain = load_qa_chain(llm, chain_type="map_reduce")

In [17]:
query = "What is the gross margin for Tesla?"

In [18]:
docs = vectordb.similarity_search(query)

In [19]:
docs

[Document(page_content='Table of Contents\nTesla, Inc.\nConsolidated Statements of Redeemable Noncontrolling Interests and Equity\n(in millions, except per share data)\n(unaudited)\nThree Months Ended September\n30, 2023Redeemable\nNoncontrolling\nInterestsCommon StockAdditional\nPaid-In\nCapitalAccumulated\nOther\nComprehensive\nLossRetained\nEarningsTotal\nStockholders’\nEquityNoncontrolling\nInterests in\nSubsidiariesTotal\nEquity Shares Amount\nBalance as of June 30, 2023$ 288 3,174$ 3 $33,436 $ (410)$18,101 $ 51,130 $ 764 $51,894', metadata={'page': 8, 'source': '/kaggle/input/tesla10q/tsla-20230930.pdf'}),
 Document(page_content='Table of Contents\nTesla, Inc.\nConsolidated Statements of Redeemable Noncontrolling Interests and Equity\n(in millions, except per share data)\n(unaudited)\nThree Months Ended September\n30, 2023Redeemable\nNoncontrolling\nInterestsCommon StockAdditional\nPaid-In\nCapitalAccumulated\nOther\nComprehensive\nLossRetained\nEarningsTotal\nStockholders’\nEqui

In [20]:
inputs = {"input_documents": docs, "question": query}  
outputs = chain.run(inputs)
txt = outputs
print(outputs)

  warn_deprecated(
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classe

Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.

11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.

11.8 No A

In [21]:
retriever = vectordb.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=False
)

Lets query the document

In [31]:
query = "What is the total assets?"
docs = vectordb.similarity_search(query)
print(f"Retrieved documents: {len(docs)}")
for doc in docs:
    doc_details = doc.to_json()['kwargs']
    print("Source: ", doc_details['metadata']['source'])
    print("Text: ", doc_details['page_content'], "\n")
inputs = {"input_documents": docs, "question": query}  
outputs = chain.run(inputs)
txt = outputs
print('-'*10)
print('OUTPUT')
print('-'*10)
print(outputs)

Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Retrieved documents: 4
Source:  /kaggle/input/tesla10q/tsla-20230930.pdf
Text:  Property , plant and equipment, net 27,744  23,548  
Operating lease right-of-use assets 3,637  2,563  
Digital assets, net 184 184 
Intangible assets, net 191 215 
Goodwill 250 194 
Other non-current assets 5,497  4,193  
Total assets $ 93,941  $ 82,338  
Liabilities
Current liabilities
Accounts payable $ 13,937  $ 15,255  
Accrued liabilities and other 7,636  7,142  
Deferred revenue 2,206  1,747  
Customer deposits 894 1,063  
Current portion of debt and finance leases 1,967  1,502 

Source:  /kaggle/input/tesla10q/tsla-20230930.pdf
Text:  Property , plant and equipment, net 27,744  23,548  
Operating lease right-of-use assets 3,637  2,563  
Digital assets, net 184 184 
Intangible assets, net 191 215 
Goodwill 250 194 
Other non-current assets 5,497  4,193  
Total assets $ 93,941  $ 82,338  
Liabilities
Current liabilities
Accounts payable $ 13,937  $ 15,255  
Accrued liabilities and other 7,636  7,142  

Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


----------
OUTPUT
----------
Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.

11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) a

In [26]:
query = "What are the biggest risks for Tesla as a business?"
docs = vectordb.similarity_search(query)
inputs = {"input_documents": docs, "question": query}  
outputs = chain.run(inputs)
txt = outputs
print(outputs)

Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.

11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.

11.8 No A

In [27]:
query = "What is the total assets in the balance sheet?"
docs = vectordb.similarity_search(query)
inputs = {"input_documents": docs, "question": query}  
outputs = chain.run(inputs)
txt = outputs
print(outputs)

Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.

11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.

11.8 No A

In [30]:
query = "Is Tesla profitable?"
docs = vectordb.similarity_search(query)
inputs = {"input_documents": docs, "question": query}
outputs = chain.run(inputs)
txt = outputs
print(outputs)

Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.

11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.

11.8 No A

In [17]:
query = "How much cash was provided by or used in operating activities during the quarter?"
docs = vectordb.similarity_search(query)
inputs = {"input_documents": docs, "question": query}
outputs = chain.run(inputs)
txt = outputs
print(outputs)

  warn_deprecated(
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classe

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (2373 > 1024). Running this sequence through the model will result in indexing errors
Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.


Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.

11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.

11.8 No A

# Using Conversation RAG Chain (better)

In [19]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from datasets import Dataset
from tqdm import tqdm

In [17]:
# Define prompt template
template = """You are an assistant for question-answering tasks for Retrieval Augmented Generation system for the financial reports such as 10Q and 10K.
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use two sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""

prompt = ChatPromptTemplate.from_template(template)
retriever = vectordb.as_retriever()

# Setup RAG pipeline
conversation_chain = (
    {"context": retriever,  "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | StrOutputParser() 
)

In [18]:
questions = ["What’s the total assets?",
             "How much cash was provided by or used in operating activities during the quarter?",
             "What are the biggest risks for Tesla as a business?",
            ]
ground_truth = [["The total assets of the company is $93,941."],
                ["The amount of cash provided by or used in operating activities during the quarter was $8.89 billion."],
                ["The biggest risks for Tesla as a business are its ability to continue as a going concern and its inability to raise additional capital to fund its operations and growth."]]
answers = []
contexts = []

In [20]:
for query in tqdm(questions):
    answers.append(conversation_chain.invoke(query))
    contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])

  0%|          | 0/3 [00:00<?, ?it/s]Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 33%|███▎      | 1/3 [00:07<00:14,  7.11s/it]Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 67%|██████▋   | 2/3 [00:11<00:05,  5.75s/it]Both `max_new_tokens` (=200) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
100%|██████████| 3/3 [00:23<00:00,  7.80s/it]


In [40]:
answers[1]

"Human: You are an assistant for question-answering tasks for Retrieval Augmented Generation system for the financial reports such as 10Q and 10K.\nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse two sentences maximum and keep the answer concise.\nQuestion: How much cash was provided by or used in operating activities during the quarter? \nContext: [Document(page_content='Net cash provided by operating activities 8,886 11,446 \\nCash Flows from Investing Activities\\nPurchases of property and equipment excluding finance leases, net of sales (6,592) (5,300)\\nPurchases of solar energy systems, net of sales — (5)\\nProceeds from sales of digital assets — 936 \\nPurchase of intangible assets — (9)\\nPurchases of investments (13,221) (1,467)\\nProceeds from maturities of investments 8,959 3 \\nProceeds from sales of investments 138 —', metadata={'page': 10, 'source': '/kaggle/input/tesla10q/tsla-20230

In [41]:
contexts[1]

['Net cash provided by operating activities 8,886 11,446 \nCash Flows from Investing Activities\nPurchases of property and equipment excluding finance leases, net of sales (6,592) (5,300)\nPurchases of solar energy systems, net of sales — (5)\nProceeds from sales of digital assets — 936 \nPurchase of intangible assets — (9)\nPurchases of investments (13,221) (1,467)\nProceeds from maturities of investments 8,959 3 \nProceeds from sales of investments 138 —',
 'Cash and cash equivalents and restricted cash, end of period $ 16,590 $ 20,149 \nSupplemental Non-Cash Investing and Financing Activities\nAcquisitions of property and equipment included in liabilities $ 1,717 $ 1,877 \nLeased assets obtained in exchange for finance lease liabilities $ 1 $ 36 \nLeased assets obtained in exchange for operating lease liabilities $ 1,548 $ 691',
 'Total as presented in the consolidated statements of cash\nflows $ 16,590 $ 16,924 $ 20,149 $ 18,144 \nAccounts Receivable and Allowance for Doubtful Acco