# L7: Conversational RAG


In [1]:
import warnings
warnings.filterwarnings('ignore')

## Import libraries

In [2]:
from ai21 import AI21Client
from ai21.models.chat import ChatMessage
import uuid
import time


## Load API key and create AI21Client

In [3]:
from utils import get_ai21_api_key
ai21_api_key = get_ai21_api_key()
client = AI21Client(api_key=ai21_api_key)

In [4]:
from utils import call_convrag

conversation_history = []
def convrag_response(message):
  conversation_history.append(ChatMessage(content=message, role="user"))
  chat_response = call_convrag(client, conversation_history)
  # The LLM's answer to the user query
  response = chat_response.choices[0].content
  # The most relevant retrieved text segment (computed here but not returned)
  text_retrieval = chat_response.sources[0].text
  # Name of the file that contains the retrieved text segment (also not returned)
  file_retrieval = chat_response.sources[0].file_name
  conversation_history.append(ChatMessage(content=response, role="assistant"))
  return response
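
The function above looks up the top retrieved segment and its file name but only returns the answer text. A minimal sketch of how the sources could be surfaced alongside the answer is shown below. It assumes the response object exposes `choices[0].content`, `sources[i].text`, and `sources[i].file_name` exactly as the cell above uses them; `answer_with_sources`, `max_sources`, and the stand-in response are hypothetical names introduced only for illustration, not part of the AI21 SDK or `utils.py`.

```python
from types import SimpleNamespace


def answer_with_sources(chat_response, max_sources=3, snippet_chars=200):
    """Return the answer text plus (file_name, snippet) pairs for the top sources."""
    answer = chat_response.choices[0].content
    # Guard against a response that carries no sources at all.
    sources = getattr(chat_response, "sources", None) or []
    cited = [(s.file_name, s.text[:snippet_chars]) for s in sources[:max_sources]]
    return answer, cited


# Stand-in response object for illustration only (not real API output).
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(content="Placeholder answer.")],
    sources=[SimpleNamespace(file_name="example.txt", text="Placeholder segment.")],
)

answer, cited = answer_with_sources(fake_response)
print(answer)
for file_name, snippet in cited:
    print(f"- {file_name}: {snippet}")
```

Returning the sources next to the answer lets a caller display where each statement came from, and the guard avoids an `IndexError` when nothing was retrieved.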

## Prompt the Conversational RAG

<p style="background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px"> 🚨
&nbsp; <b>Different Run Results:</b> The output generated by AI chat models can vary with each execution due to their probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>

In [5]:
message = "You are a financial analyst and what is the summary with Nvidia annual earnings report?"

response = convrag_response(message)

print(response)

NVIDIA's annual earnings report highlights significant growth across its business segments. For fiscal year 2024, the company reported a 126% increase in total revenue, reaching $60.9 billion, with Data Center revenue up an impressive 217%. The Compute & Networking segment saw a 530% increase in operating income, while Graphics revenue grew by 28%. Professional Visualization revenue increased by 1%, and Automotive revenue rose by 21%. NVIDIA also launched several new products and services, including the NVIDIA DGX Cloud, an AI-training-as-a-service platform, and introduced new GPUs based on the NVIDIA RTX Ada Lovelace architecture. Additionally, the company announced a partnership with MediaTek to develop automotive systems on chips. These results reflect strong demand across various industry verticals, including automotive, financial services, and healthcare


In [6]:
message = "How much did the Nvidia's revenue increase in the period?"

response = convrag_response(message)

print(response)

$60.9 billion, up 126% from fiscal year 2023


In [7]:
message = "Should I buy Nvidia stock now?"

response = convrag_response(message)

print(response)

NVIDIA's revenue increased by 126% to $60.9 billion in fiscal year 2024, driven by strong demand in Data Center, Gaming, and Automotive segments. However, whether to buy NVIDIA stock now depends on various factors, including market conditions, future growth prospects, and your investment goals. It's important to conduct a thorough analysis or consult a financial advisor before making any investment decisions


## Create a Gradio chat app

In [8]:
import gradio as gr

demo = gr.Interface(
    fn=convrag_response,
    inputs=[gr.Textbox(label="Your questions:", lines=2)],
    outputs=[gr.Textbox(label="AI21 Conversational RAG answer:", lines=2)],
    examples=[
    "How have revenue, gross margin, and net income trended over the past year?",
    "What are the actions taken by the company about sustainability?",
    "What are the main risks of the company?",
    "How is the company allocating its capital (e.g., dividends, share repurchases, acquisitions)?",
    "Are there any concerning trends in operating cash flow?",
    ],
    title="Nvidia 10-K Q&A",
    description="Use AI21 Conversational RAG to retrieve insights from SEC filings",
    allow_flagging="never"
)


demo.launch(server_name="0.0.0.0")

* Running on local URL:  https://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.




## RAG with AI21 Jamba model in LangChain

In [9]:
from langchain_ai21 import ChatAI21
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [10]:
llm = ChatAI21(model="jamba-large",
               max_tokens = 4096,
               temperature = 0.4,
               top_p = 1)

In [11]:
loader = TextLoader("./Nvidia_10K_20240128.txt")
doc = loader.load()

In [12]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=400)
documents = text_splitter.split_documents(doc)
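
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper that only shows how consecutive chunks share their boundary characters.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows whose neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.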

In [13]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = Chroma.from_documents(documents, embedding=embeddings)


### Prompt template

In [14]:
prompt = PromptTemplate.from_template(
    """You are an expert in answering questions based on provided context.
    Answer the question based on the provided context below to the best of your ability.
    The response must be complete, coherent and concise.
    If the answer is not contained in the context, please respond with "answer not in the document"\n
    Here is the context you should use to answer the question: \n
    <context>
    {context}
    </context> \n
    Based on the provided context, answer the following question: {question} \n
    Answer:"""
)

In [15]:
retriever = vectorstore.as_retriever(
    search_type="mmr", 
    search_kwargs={"k": 10})
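
The `search_type="mmr"` setting asks the retriever for maximal marginal relevance: results that are relevant to the query but not near-duplicates of each other. The sketch below illustrates the idea on toy 2-D vectors in plain Python. It is an illustration under stated assumptions, not the code Chroma or LangChain runs; `cosine`, `mmr_select`, and the toy vectors are hypothetical, and the `lambda_mult` name simply mirrors the relevance/diversity trade-off parameter that LangChain documents for MMR search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -1.0]]  # toy "embeddings"; 0 and 1 are near-duplicates

print(mmr_select(query, docs, k=2))                   # [0, 2]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1]
```

With the balanced setting the second pick skips the near-duplicate of the first result in favour of a less similar document; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.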

In [16]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
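
The chain above can be read as four steps: build a dict with the retrieved context and the untouched question, fill the prompt template, call the model, and extract the reply text. The plain-Python sketch below mimics that data flow with stand-ins so it runs without any API key. `fake_retriever`, `fake_llm`, `TEMPLATE`, and `run_chain` are hypothetical names for illustration only, and the shortened template is not the prompt used in this notebook.

```python
from types import SimpleNamespace


def format_docs(docs):
    # Same helper as in the notebook: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Stand-in for the retriever: returns fixed placeholder chunks.
    return [
        SimpleNamespace(page_content="chunk A"),
        SimpleNamespace(page_content="chunk B"),
    ]


def fake_llm(prompt_text):
    # Stand-in for the chat model: returns a message-like object.
    return SimpleNamespace(content=f"(model reply to a {len(prompt_text)}-character prompt)")


TEMPLATE = "<context>\n{context}\n</context>\nQuestion: {question}\nAnswer:"


def run_chain(question):
    # Step 1: the dict stage. "context" runs retrieval and formatting,
    # "question" is passed through unchanged.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Step 2: fill the prompt template.
    prompt_text = TEMPLATE.format(**inputs)
    # Step 3: call the model. Step 4: keep only the reply text.
    return fake_llm(prompt_text).content


print(run_chain("What changed?"))
```

Writing the steps out this way can help when debugging: each intermediate value (retrieved chunks, filled prompt, raw model message) can be inspected on its own.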

### Query

In [17]:
q = "How has the company revenue and profit changed from last year?"

response = rag_chain.invoke(q)
print(f"Answer: {response}")

Answer: The company's revenue and profit saw significant growth in fiscal year 2024 compared to fiscal year 2023. Revenue increased by 126%, rising from $26.97 billion to $60.92 billion, driven by strong performance in the Compute & Networking and Graphics segments. Operating income surged by 681%, from $4.22 billion to $32.97 billion, reflecting higher revenue and improved margins. Net income also grew substantially, up 581% from $4.37 billion to $29.76 billion, with net income per diluted share increasing by 586%, from $1.74 to $11.93.


In [18]:
docs = retriever.invoke(q)
docs

[Document(metadata={'source': './Nvidia_10K_20240128.txt'}, page_content='Income tax expense (benefit) 6.6  (0.7) Net income 48.9  % 16.2  % Reportable Segments Revenue by Reportable Segments Year Ended Jan 28, 2024 Jan 29, 2023 $ Change % Change ($ in millions) Compute & Networking $ 47,405  $ 15,068  $ 32,337  215  % Graphics 13,517  11,906  1,611  14  % Total $ 60,922  $ 26,974  $ 33,948  126  % Operating Income by Reportable Segments Year Ended Jan 28, 2024 Jan 29, 2023 $ Change % Change ($ in millions) Compute & Networking $ 32,016  $ 5,083  $ 26,933  530  % Graphics 5,846  4,552  1,294  28  % All Other (4,890) (5,411) 521  (10) % Total $ 32,972  $ 4,224  $ 28,748  681  % Compute & Networking revenue  – The year-on-year increase was due to higher Data Center revenue. Compute grew 266% due to higher shipments of the NVIDIA Hopper GPU computing platform for the training and inference of LLMs, recommendation engines and generative AI applications. Networking was up 133% due to higher

In [19]:
questions = ["What are the main business risks for the company?",
             "What are the key financial metrics of the company?",
             "What is the profit growth of the company in the reporting period?",
             "Did the company have a cybersecurity incident based on the following SEC filing document?"
]

for q in questions:
    response = rag_chain.invoke(q)
    print("="*80)
    print(f"Question: {q}")
    print(f"Answer: {response}")

Question: What are the main business risks for the company?
Answer: The company faces several significant business risks, including:

1. **Industry and Market Challenges**:
	- Failure to adapt to evolving industry needs could harm financial results.
	- Intense competition may reduce market share and impact financial performance.
2. **Demand, Supply, and Manufacturing Risks**:
	- Inaccurate demand forecasting can lead to supply-demand imbalances.
	- Dependence on third-party suppliers for manufacturing and assembly reduces control over product quality, quantity, and delivery schedules.
	- Product defects can result in significant remediation costs and reputational damage.
3. **Global Operations Risks**:
	- Adverse economic conditions and international operations expose the company to financial and operational risks.
	- Cybersecurity breaches and data privacy violations could disrupt operations and damage reputation.
	- Business disruptions, including natural disasters and geopolitical t

In [20]:
vectorstore.delete_collection()