<a href="https://colab.research.google.com/github/hg402/Flowise_EduLLM/blob/main/wcars_python_advanced_rag_with_llama_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><h1>Basic Python and Implementing Your Own LLM for Teaching<br>and Research</h1></center>

<center><h3>Haoyun (Harry) Gao <br>
Rutgers University, Department of Accounting and Information Systems</h3></center>

# Introduction

Welcome to this comprehensive guide on building your own Large Language Model (LLM). In this session, you will learn how to leverage one of the latest open-source LLMs, Meta's Llama 3, along with powerful tools like LangChain and API services from Groq and LlamaCloud. Our focus will be on retrieval-augmented generation (RAG), which lets you ask contextual questions about PDF files while reducing hallucinations.

By the end of this guide, you will gain hands-on experience in:
* Basic Python Commands: Master the essential syntax of today’s most widely used programming language.
* Document Parsing: Learn how to efficiently extract and preprocess information from various documents.
* Prompt Engineering: Understand the art of designing effective prompts to enhance the performance of your LLM.
* Vector Databases: Explore how to store and manage large-scale data efficiently using vector databases.

**Learning Outcomes**

After completing this script, you should be able to:

* Set up and query the Llama 3 model from your own notebook environment.
* Integrate LangChain to complete retrieval-augmented generation (RAG) tasks.
* Utilize Groq and LlamaCloud APIs to extend the functionality of your LLM.
* Implement document parsing, prompt engineering, and vector databases in your projects.

Let's get started on this exciting journey to build and customize your own powerful Large Language Model!

# Basics of Python Syntax

## 1. Basic Data Types and Operations

Python has a variety of data types that let you store and manipulate different kinds of data.

Common Data Types:
* Integer (int): Whole numbers
* Float (float): Numbers with decimals
* String (str): Text data
* Boolean (bool): True or False

The hash symbol (`#`) marks a comment in the code. Python ignores comments when it runs the code.

In [1]:
# Integer
num1 = 10

# Float
num2 = 5.5

# Boolean
is_active = True

# String
text = "Welcome to Python!"

In [2]:
print(num1)
print(num2)
print(is_active)
print(text)

10
5.5
True
Welcome to Python!
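You can check a value's type with the built-in `type()` function. To convert between types, use `int()`, `float()` and `str()`. Here is a small sketch that reuses two of the values above:

```python
# Check the type of a value with type()
num1 = 10
num2 = 5.5
print(type(num1))   # <class 'int'>
print(type(num2))   # <class 'float'>

# Convert between types with float(), int(), and str()
as_float = float(num1)      # 10.0
as_int = int(num2)          # 5 (int() drops the decimal part)
as_text = str(num1) + "!"   # '10!' (a number must become a string before concatenation)
print(as_float, as_int, as_text)
```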


Operations:
* Addition (+): Adds numbers or concatenates strings.
* Subtraction (-): Subtracts numbers.
* Multiplication (*): Multiplies numbers or repeats strings.
* Division (/): Divides numbers.

In [3]:
# Example operations
result = num1 + num2     # Addition
greeting = "Hello" + " World"  # String concatenation
repeated = "Python! " * 3  # Repeats the string 3 times

In [4]:
print(result)
print(greeting)
print(repeated)

15.5
Hello World
Python! Python! Python! 
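Division appears in the list above but is not shown in the example. One detail worth knowing is that `/` always returns a float, even when the result is a whole number. The sketch below also shows three related operators that the list does not cover: floor division (`//`), remainder (`%`) and exponent (`**`).

```python
# Division with / always returns a float
quotient = 10 / 4        # 2.5
whole = 10 / 5           # 2.0 (a float, not 2)

# Related arithmetic operators
floor_div = 10 // 4      # 2: divide, then round down
remainder = 10 % 4       # 2: what is left over after dividing
power = 2 ** 3           # 8: 2 raised to the power 3

print(quotient, whole, floor_div, remainder, power)
```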


## 2. Working with Strings

Strings are sequences of characters and can be manipulated in many ways.

**Basic String Operations**:
* Concatenation: + to join strings.
* Slicing: Use [start:end] to get part of a string.
* Methods: Functions that work on strings, like `.lower()`, `.upper()`, `.replace()`.

In [5]:
# Defining a string
greeting = "Hello, Python!"

In [6]:
# String methods
greeting_lower = greeting.lower()  # 'hello, python!'
greeting_upper = greeting.upper()  # 'HELLO, PYTHON!'

In [7]:
# Slicing
partial_text = greeting[0:5]  # 'Hello'
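The `.replace()` method listed above is not yet demonstrated. Here is a short sketch of it, together with negative slicing and `len()`, using the same `greeting` string:

```python
greeting = "Hello, Python!"

# .replace(old, new) returns a new string; the original is unchanged
replaced = greeting.replace("Python", "World")   # 'Hello, World!'

# Negative indices count from the end of the string
last_char = greeting[-1]      # '!'
last_word = greeting[-7:-1]   # 'Python'

# len() returns the number of characters
length = len(greeting)        # 14

print(replaced, last_char, last_word, length)
```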

## 3. Lists

Lists are collections that let you store multiple items. Lists can store any data type and are great for managing data collections.

**Defining and Accessing Lists**

In [8]:
# Defining a list
fruits = ["apple", "banana", "cherry"]
print(fruits)

['apple', 'banana', 'cherry']


In [9]:
# Accessing elements
print(fruits[0])  # Output: apple

apple


In [10]:
# Adding and removing items
fruits.append("date")   # Adds 'date' to the end of the list
fruits.remove("banana") # Removes 'banana' from the list

In [11]:
print(fruits)

['apple', 'cherry', 'date']


**List Slicing**
* Just like strings, lists support slicing to access specific portions.

In [12]:
# Getting the first two elements
first_two = fruits[0:2]  # Output: ['apple', 'cherry']
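Here are a few more list operations. They use the same `fruits` list as above (redefined so the snippet runs on its own), plus a list that mixes data types, as the section introduction mentions:

```python
# A list can hold values of different data types
mixed = [10, 5.5, "text", True]

fruits = ["apple", "cherry", "date"]

print(len(fruits))          # 3: number of items
print(fruits[-1])           # 'date': negative indices count from the end
print("cherry" in fruits)   # True: membership test
print(fruits[1:])           # ['cherry', 'date']: omit the end to slice to the end
```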

## 4. Control Flow

Python uses control flow statements to make decisions and repeat actions.


* **If Statements**: `if`, `elif` and `else` let us run code conditionally.
* Examples:

In [13]:
# Conditional check
age = 18
if age >= 18:
    print("You're an adult.")

You're an adult.
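The example above uses only `if`. The sketch below adds `elif` and `else`. Python checks the conditions from top to bottom and runs only the first branch whose condition is `True`:

```python
age = 16

if age >= 65:
    category = "Senior"
elif age >= 18:          # checked only if the first condition was False
    category = "Adult"
else:                    # runs when no condition above was True
    category = "Minor"

print(category)  # Minor
```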


* **Loops**: Loops are used for repeating tasks. We’ll cover `for` and `while` loops here.
* Examples:

In [14]:
# Iterate over a list
for fruit in fruits:
    print(fruit)

apple
cherry
date


In [15]:
# Repeat a task until a condition is met
count = 0
while count < 5:
    print("Counting:", count)
    count += 1

Counting: 0
Counting: 1
Counting: 2
Counting: 3
Counting: 4
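When you know in advance how many times to repeat a task, a `for` loop over `range()` is a common alternative to the `while` loop above. `enumerate()` gives you both the position and the item when you loop over a list:

```python
# range(5) produces 0, 1, 2, 3, 4, so this prints the same lines as the while loop
for count in range(5):
    print("Counting:", count)

# enumerate() yields (index, item) pairs
fruits = ["apple", "cherry", "date"]
numbered = []
for index, fruit in enumerate(fruits):
    numbered.append(str(index) + ": " + fruit)

print(numbered)  # ['0: apple', '1: cherry', '2: date']
```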


## 5. Basic Functions

Functions let you package code into reusable pieces, which can be called with different inputs to get different results.

**Defining a Function**:
To define a function, use `def` followed by the function name and parameters.

In [16]:
def add(a, b):
    return a + b

sum_result = add(5, 10)

print(sum_result)

15


Example Function with Conditionals:

In [17]:
def check_age(age):
    if age >= 18:
        return "Adult"
    else:
        return "Minor"

status = check_age(16)  # Output: 'Minor'
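Parameters can also have default values, which lets callers leave them out. You can pass arguments by name (keyword arguments) as well. The function `greet` below is a hypothetical example:

```python
def greet(name, greeting="Hello"):
    # greeting has a default value, so callers may omit it
    return greeting + ", " + name + "!"

print(greet("Harry"))                       # Hello, Harry!
print(greet("Harry", "Welcome"))            # Welcome, Harry!
print(greet(name="Harry", greeting="Hi"))   # keyword arguments: Hi, Harry!
```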

## 6. Importing Modules

Modules are collections of functions and variables that expand Python’s capabilities. For example, the `math` module provides advanced math operations.

Importing and Using Modules:

In [18]:
# Import the math module
import math

# Using math functions
pi_value = math.pi
square_root = math.sqrt(16)  # Output: 4.0
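Besides `import math`, Python has two other common import styles. You can import specific names directly, or give a module a shorter alias. Both forms appear often in the LLM code later in this notebook (for example, `from pathlib import Path`):

```python
# Import specific names from a module
from math import sqrt, pi

# Import a module under a shorter alias
import math as m

print(sqrt(16))       # 4.0
print(m.floor(pi))    # 3
```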

# Simple Q&A Model Using Llama 3

## Housekeeping

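To start with, let's set the timezone of the Python environment to US Eastern Time and display the resulting timezone names. Note that `time.tzset()` is available only on Unix-like systems, such as the Linux machine behind Colab.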

In [1]:
import os
import time
os.environ['TZ'] = 'US/Eastern'
time.tzset()
time.tzname

('EST', 'EDT')

In [2]:
# Install the `ipython-autotime` package to measure the execution time of each cell.
!pip -qqq install ipython-autotime
%load_ext autotime

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.6 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.4/1.6 MB[0m [31m11.7 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m1.6/1.6 MB[0m [31m27.5 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.6/1.6 MB[0m [31m18.2 MB/s[0m eta [36m0:00:00[0m
time: 453 µs (started: 2024-11-01 06:05:29 -04:00)


When you run the code below, Google Drive will ask for permission to access your Google account. Accept the request:

In [3]:
#  Mount your Google Drive to access files.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
time: 2min 8s (started: 2024-11-01 06:05:29 -04:00)


We now install the necessary libraries. You may ignore the compatibility issues reported here for now. It takes about 4 minutes.

In [4]:
# Upgrade the pip package manager.
!pip -qqq install --upgrade pip --progress-bar off > /dev/null 2>&1
# Install LangChain Groq integration library version 0.1.3.
!pip -qqq install langchain-groq==0.1.3 --progress-bar off > /dev/null 2>&1
# Install LangChain framework version 0.1.17.
!pip -qqq install langchain==0.1.17 --progress-bar off > /dev/null 2>&1
# Install LlamaParse library version 0.1.3 for document parsing.
!pip -qqq install llama-parse==0.1.3 --progress-bar off > /dev/null 2>&1
# Install Qdrant client library version 1.9.1 for vector database interaction.
!pip -qqq install qdrant-client==1.9.1  --progress-bar off > /dev/null 2>&1
# Install Unstructured library version 0.13.6 for handling Markdown files.
!pip -qqq install "unstructured[md]"==0.13.6 --progress-bar off > /dev/null 2>&1
# Install FastEmbed library version 0.2.7 for embedding generation.
!pip -qqq install fastembed==0.2.7 --progress-bar off > /dev/null 2>&1
# Install FlashRank library version 0.2.4 for reranking retrieved passages.
!pip -qqq install flashrank==0.2.4 --progress-bar off > /dev/null 2>&1

time: 3min 59s (started: 2024-11-01 06:07:38 -04:00)


Now we import the necessary packages:

In [5]:
# Importing the os module to interact with the operating system
import os

# Importing the textwrap module for text formatting functions
import textwrap

# Importing the Path class from pathlib to handle file paths
from pathlib import Path

# Importing the userdata module from Google Colab for user data interactions
from google.colab import userdata

# Importing the Markdown class from IPython.display to display markdown content in Jupyter notebooks
from IPython.display import Markdown

# Importing the RetrievalQA chain from LangChain for question-answering capabilities
from langchain.chains import RetrievalQA

# Importing the PromptTemplate class from LangChain for creating prompt templates
from langchain.prompts import PromptTemplate

# Importing the ContextualCompressionRetriever from LangChain for contextual information retrieval
from langchain.retrievers import ContextualCompressionRetriever

# Importing the FlashrankRerank class from LangChain for document compression and ranking
from langchain.retrievers.document_compressors import FlashrankRerank

# Importing the RecursiveCharacterTextSplitter from LangChain for splitting text recursively by characters
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Importing the Qdrant class from LangChain for vector store operations
from langchain.vectorstores import Qdrant

# Importing the UnstructuredMarkdownLoader from LangChain Community for loading markdown documents
from langchain_community.document_loaders import UnstructuredMarkdownLoader

# Importing the FastEmbedEmbeddings from LangChain Community for fast embeddings generation
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

# Importing the ChatPromptTemplate from LangChain Core for creating chat prompt templates
from langchain_core.prompts import ChatPromptTemplate

# Importing the ChatGroq class from LangChain Groq for chat functionalities with Groq
from langchain_groq import ChatGroq

# Importing the LlamaParse class from llama_parse for parsing documents
from llama_parse import LlamaParse

# Importing the getpass module for securely handling the API key prompts
import getpass

# Suppress warnings (for a cleaner experience during this session; not recommended in general, since warnings can flag real problems)
import warnings
warnings.filterwarnings("ignore")

time: 19.6 s (started: 2024-11-01 06:11:37 -04:00)


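The helper function below wraps the text of an LLM response at 100 characters per line, which makes it easier to read in the notebook. It expects the dictionary returned by a LangChain chain, which stores the answer under the `"result"` key.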

In [6]:
def print_response(response):
    response_txt = response["result"]
    for chunk in response_txt.split("\n"):
        if not chunk:
            print()
            continue
        print("\n".join(textwrap.wrap(chunk, 100, break_long_words=False)))

time: 956 µs (started: 2024-11-01 06:12:53 -04:00)
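The sketch below shows what the `textwrap.wrap` call inside `print_response` does on its own. The sample sentence is made up for illustration. By default `textwrap.wrap` treats a newline like any other whitespace and replaces it with a space. That is why the helper first splits the response on `"\n"`: it keeps the paragraph and list breaks in the model's answer.

```python
import textwrap

# One long line, like a single paragraph of an LLM response (sample text)
line = ("RAG grounds the model's answer in passages retrieved from your own "
        "documents, which helps reduce hallucinations in the final response.")

# Same settings as print_response: at most 100 characters per line,
# and never split a word across lines
wrapped = textwrap.wrap(line, 100, break_long_words=False)
print("\n".join(wrapped))

# Without splitting on "\n" first, the line break would be lost
merged = textwrap.wrap("Line one.\nLine two.", 100)
print(merged)  # ['Line one. Line two.']
```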


## Run Llama 3 with general questions

Now we set up access to the Groq API. Running an LLM with performance comparable to ChatGPT normally requires GPU resources on your computer or a cloud server. Groq hosts open-source LLMs such as Llama 3, so you can query them without managing GPU resources yourself.

You may get a free API key at: https://console.groq.com/login.

Alternatively, you may input `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` for the purpose of this session only. It will be deactivated after this session.

In [7]:
api_key = getpass.getpass('Enter your GROQ API KEY:')

os.environ["GROQ_API_KEY"] = api_key

Enter your GROQ API KEY:··········
time: 4.78 s (started: 2024-11-01 06:13:00 -04:00)


Now we set up the LlamaCloud API, whose LlamaParse service can parse PDF documents that contain tables.

You may get a free API key at: https://cloud.llamaindex.ai/login.

Alternatively, you may input `llx-WDx6KhktOtTqbc1zEJM3VqcaHjCnyHSW3zcCDbsSknMrFk5K` for the purpose of this session only. It will be deactivated after this session.

In [8]:
os.environ["LLAMA_PARSE"] = getpass.getpass('Enter your LlamaCloud API KEY:')

Enter your LlamaCloud API KEY:··········
time: 3.02 s (started: 2024-11-01 06:13:18 -04:00)


We have now loaded all the necessary packages and APIs, so we can make simple queries to Llama 3, much as we would interact with ChatGPT.

In [9]:
from groq import Groq
client = Groq(
    api_key= os.environ["GROQ_API_KEY"]
)

time: 174 ms (started: 2024-11-01 06:27:34 -04:00)


In [10]:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Plan for me a half-day trip in Newark, NJ.",
        }
    ],
    model="llama3-70b-8192",
)

time: 1.7 s (started: 2024-11-01 06:27:38 -04:00)
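The `messages` argument above holds a single user message. Groq's chat API follows the OpenAI-style message format, so you can also add a `"system"` message to set the model's behavior, and earlier `"assistant"` replies to give it conversation history. The helper `build_messages` below is hypothetical and not part of the Groq SDK. It only assembles the list:

```python
def build_messages(question, system_prompt=None, history=None):
    """Assemble a chat-completions style message list (hypothetical helper).

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    "Plan for me a half-day trip in Newark, NJ.",
    system_prompt="You are a concise local travel guide.",
)
print(messages)
```

You would then pass the list to the same call as above, for example `client.chat.completions.create(messages=messages, model="llama3-70b-8192")`.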


Here is another helper function that makes the LLM response more readable in the notebook. It works like `print_response` above, except that a Groq chat completion stores the text in `response.choices[0].message.content`:

In [11]:
def print_response_chat(response):
    response_txt = response.choices[0].message.content
    for chunk in response_txt.split("\n"):
        if not chunk:
            print()
            continue
        print("\n".join(textwrap.wrap(chunk, 100, break_long_words=False)))

time: 1.4 ms (started: 2024-11-01 06:44:33 -04:00)


Here we print out the result, using the helper function:

In [12]:
print_response_chat(chat_completion)

Newark, NJ! While often overshadowed by its neighboring city, New York, Newark has a rich history,
cultural attractions, and a vibrant community worth exploring. Here's a half-day itinerary to help
you discover the best of Newark:

**Stop 1: Newark Museum (9:30 am - 10:30 am)**
Begin your day at the Newark Museum, the state's largest and most comprehensive museum. With a
collection of over 80,000 objects, you'll find art, science, and local history exhibits. Be sure to
check out the Tibetan Art Gallery, featuring a stunning collection of Tibetan art and artifacts.

**Stop 2: Washington Park and the Newark Public Library (11:00 am - 12:00 pm)**
Take a short walk to Washington Park, a beautiful green space in the heart of the city. Admire the
park's historic monuments and take a stroll around the scenic walking paths. Right across the park
is the Newark Public Library, a stunning Beaux-Arts building with a beautiful reading room.

**Stop 3: Lunch at Ferry Street (12:00 pm - 1:00 pm)**
He

We can now put these pieces together to create a "Chat Llama 3" in your browser!

In [32]:
from groq import Groq
client = Groq(
    api_key= os.environ["GROQ_API_KEY"]
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": input("Enter your question: "),
        }
    ],
    model="llama3-70b-8192",
)

def print_response_chat(response):
    response_txt = response.choices[0].message.content
    for chunk in response_txt.split("\n"):
        if not chunk:
            print()
            continue
        print("\n".join(textwrap.wrap(chunk, 100, break_long_words=False)))

print_response_chat(chat_completion)

Enter your question: What are some latest regulation by PCAOB?
The Public Company Accounting Oversight Board (PCAOB) is a non-profit corporation that oversees the
audits of publicly traded companies and other issuers, as well as the auditors who perform those
audits. The PCAOB regularly issues new rules, guidance, and amendments to existing rules to improve
audit quality, enhance auditor independence, and protect investors. Here are some of the latest
regulations and guidance issued by the PCAOB:

1. **Critical Audit Matters (CAMs)**: In 2017, the PCAOB adopted a standard requiring auditors to
include Critical Audit Matters (CAMs) in their audit reports. CAMs are matters that have been
communicated to the audit committee and are related to material accounts or disclosures. The goal is
to provide more transparency and insights into the audit process.
2. **Audit Firm Holding Companies**: In 2020, the PCAOB issued a rule requiring audit firm holding
companies to register with the PCAOB an
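The "Chat Llama 3" cell above answers one question at a time, and each call forgets the previous one. A multi-turn chat needs to send the earlier questions and replies back with every new question. The sketch below does this with a hypothetical `chat_session` function. It takes a `send` callable so it can run offline here. `fake_send` is a stand-in and not a real API:

```python
def chat_session(send, questions, system_prompt=None):
    """Run a multi-turn chat, keeping earlier turns as context (sketch).

    send: a callable that takes the full message list and returns the reply text.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    replies = []
    for question in questions:
        messages.append({"role": "user", "content": question})
        reply = send(messages)
        # Keep the reply so the next question is answered with the full history
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies, messages


# Offline stand-in for the Groq call, so the sketch runs without an API key
def fake_send(messages):
    return "Reply to: " + messages[-1]["content"]


replies, history = chat_session(fake_send, ["What is the PCAOB?", "What does it regulate?"])
print(replies)
```

With the Groq client from above, `send` could be something like `lambda msgs: client.chat.completions.create(messages=msgs, model="llama3-70b-8192").choices[0].message.content`.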

To move on, we need a PDF document for the LLM to query, which minimizes the chance of hallucination. Below is an overview of the RAG framework adopted in this demonstration:

![overview](https://drive.google.com/uc?export=view&id=1LuaFBASrwC6oPlelemAnqa5YbiH4kghq)

This demonstration covers two use cases: the PCAOB auditing standards and the Meta 2024 Q1 results. I have preloaded the chosen PDF files in my Google Drive, and you may use the code in each section to download them into a temporary `data` folder in the Colab runtime.

# RAG with PCAOB Audit Standards

Here, we present an example of how this advanced RAG methodology can be applied in an audit setting.

In [37]:
# Create a local folder "data" in the Colab runtime (-p: no error if it already exists), then download the auditing standards into it
!mkdir -p data
!gdown 1-Ex0IQMYtyGPyl-tU6-GOtrcqqoFbZU8 -O "data/auditing_standards.pdf"

Downloading...
From: https://drive.google.com/uc?id=1-Ex0IQMYtyGPyl-tU6-GOtrcqqoFbZU8
To: /content/data/audit standards.pdf
100% 5.70M/5.70M [00:00<00:00, 88.1MB/s]
time: 7.95 s (started: 2024-10-30 10:54:39 -04:00)


You may also query your own document. Upload it to a Google Drive folder, then load it in Colab using the code below (select all, then uncomment using "`Ctrl` + `/`" for Windows users OR "`Command` + `/`" for Mac users):

In [13]:
# # Create a folder called "documents_wcars" in your Google Drive
# # and upload the file you want to query (here, 'auditing_standards.pdf') to it

# from google.colab import drive
# drive.mount('/content/drive')

# # Specify the path to your document in Google Drive
# upload_path = '/content/drive/My Drive/documents_wcars/auditing_standards.pdf'

# # ... rest of your code to process the documents ...

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
time: 1.62 s (started: 2024-11-01 07:10:33 -04:00)


## Document Parsing

Document parsing is the process of recognizing/examining data in a document and extracting useful information from it. For instance, data from PDF and Word documents can be extracted using document parser APIs and stored in a JSON file. (source: https://www.edenai.co/post/best-document-parsing-apis)

Parsing documents efficiently is a crucial skill, especially when dealing with detailed financial reports or any other data-rich documents. By following this code snippet, you will learn how to extract and preprocess information from a PDF document using LlamaParse, an API service provided by LlamaCloud.

**Instruction Definition:** An instruction is defined to guide the parsing process. It tells the parser that the document contains PCAOB's auditing standards, which describe in detail how an audit should be conducted, and it emphasizes precision in answering questions.

In [None]:
instruction = """The provided document is PCAOB's audit standards.
This document provides detailed information about how an audit should be conducted.
Try to be precise while answering the questions."""

**Parser Initialization**: The `LlamaParse` class is initialized with an API key, result type (markdown), parsing instructions, and a maximum timeout setting.

In [None]:
parser = LlamaParse(
    api_key=os.environ["LLAMA_PARSE"],  # alternatively: userdata.get("LLAMA_PARSE") if the key is stored in Colab secrets
    result_type="markdown",
    parsing_instruction=instruction,
    max_timeout=5000,
)

**Asynchronous Data Loading**: The auditing standards document is parsed asynchronously. This allows for efficient handling of large documents without blocking the execution of other tasks. The first parsed document is accessed and stored in the `parsed_doc` variable.

In [15]:
llama_parse_documents = await parser.aload_data("./data/auditing_standards.pdf")

# if you uploaded your own document, comment out the above line and run the below command:

# llama_parse_documents = await parser.aload_data(upload_path)

parsed_doc = llama_parse_documents[0]

Started parsing the file under job_id f3a91674-e0e9-4bba-9b5e-47d34a37d996
time: 1min 3s (started: 2024-11-01 07:12:18 -04:00)
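Why use `await parser.aload_data(...)`? While the parsing API works, the coroutine waits without blocking the event loop, so other awaitable tasks, such as parsing several files, can proceed at the same time. The sketch below shows that pattern with `asyncio.gather`. `fake_parse` is a hypothetical stand-in for `parser.aload_data`, and its sleep simulates waiting on the API. In a plain `.py` script you start the event loop with `asyncio.run(...)`. Colab and Jupyter already run an event loop, which is why the cell above can use `await` directly. There you would write `results = await parse_all(paths)` instead, because calling `asyncio.run` inside a running loop raises an error. If you do not need concurrency, LlamaParse also offers a synchronous `load_data` method.

```python
import asyncio


async def fake_parse(path, delay):
    # Hypothetical stand-in for parser.aload_data(path); awaiting the sleep
    # hands control back to the event loop, as waiting on a network API would
    await asyncio.sleep(delay)
    return "parsed " + path


async def parse_all(paths):
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_parse(p, 0.1) for p in paths))


# Script usage; inside a notebook use: results = await parse_all([...])
results = asyncio.run(parse_all(["a.pdf", "b.pdf"]))
print(results)
```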


Here we save the first parsed document as a markdown file:

In [21]:
document_path = Path("data/parsed_document_audit.md")  # a temporary folder in the Colab runtime, deleted when the session ends

# again, if you are uploading your own document and would like to keep the output after the session, comment out the above line and run the below command:
# document_path = Path("/content/drive/My Drive/documents_wcars/parsed_document_audit.md")

# "w" (write) instead of "a" (append), so rerunning this cell does not duplicate the text
with document_path.open("w", encoding="utf-8") as f:
    f.write(parsed_doc.text)

time: 25.5 ms (started: 2024-11-01 07:21:00 -04:00)


## Text Splitters and Vector Embeddings

In this part of the script, we use the `UnstructuredMarkdownLoader` class to load the markdown file we just saved. We initialize the loader with the path to the document (`document_path`) and call its `load()` method. This returns a list of LangChain `Document` objects, each holding the text together with metadata such as the source path. The `loaded_documents` variable holds this list, ready for the next steps in our workflow.

In [22]:
loader = UnstructuredMarkdownLoader(document_path)
loaded_documents = loader.load()

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


time: 15.8 s (started: 2024-11-01 07:22:36 -04:00)


The parsed document is too long to query efficiently with an LLM in one go, so we break it into smaller, manageable chunks. The key idea is that the LLM receives only the chunks with the most relevant information, as determined by the similarity between the user's question and each chunk.

To achieve this, we use LangChain's `RecursiveCharacterTextSplitter` to split the text into smaller segments:

In [23]:
# Note: free Groq queries have a limit of 6000 tokens per minute of context input.
# By default, chunk_size and chunk_overlap are measured in characters, not tokens.
# Previous parameter setting: chunk_size=2048, chunk_overlap=128
text_splitter = RecursiveCharacterTextSplitter(chunk_size=6000, chunk_overlap=128)
docs = text_splitter.split_documents(loaded_documents)
len(docs)

297

time: 41.6 ms (started: 2024-11-01 07:22:51 -04:00)
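To show how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window chunker. `chunk_text` is a hypothetical helper and not LangChain's algorithm. `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then line breaks, then spaces, so its chunks usually end at natural boundaries. This sketch only cuts fixed character windows. Each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. The last `chunk_overlap` characters of one chunk therefore reappear at the start of the next, which helps keep a sentence that straddles a boundary retrievable.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap (simplified sketch)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```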


To capture the meaning of each chunk, we convert the chunks and the user's question into vector embeddings and compare them to determine similarity and relevance.

We first use `FastEmbedEmbeddings` and specify the model as `BAAI/bge-small-en-v1.5`:

In [25]:
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5")


time: 8.44 s (started: 2024-11-01 07:23:22 -04:00)
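To see how vectors can be compared, here is a sketch of cosine similarity, a common way to compare text embeddings. It measures the angle between two vectors: 1.0 means the same direction, and values near 0 mean unrelated. The three-dimensional vectors are made up for illustration. Real `bge-small-en-v1.5` embeddings have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (made-up values, not produced by a real model)
question = [1.0, 0.0, 1.0]            # "using specialists in an audit?"
chunk_specialists = [0.9, 0.1, 0.8]   # a chunk about specialists
chunk_revenue = [0.0, 1.0, 0.1]       # a chunk about revenue

print(round(cosine_similarity(question, chunk_specialists), 3))
print(round(cosine_similarity(question, chunk_revenue), 3))
```

The specialists chunk scores much higher, so it would be the one retrieved for this question.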


You might ask why Meta's Llama 3 is not used to generate embeddings. Llama 3 is primarily designed as a large language model (LLM) for generating text, understanding context, and performing natural language processing tasks. Its internal representations can serve as embeddings, but specialized embedding models often offer several advantages:

* Efficiency: Models like `BAAI/bge-small-en-v1.5` are specifically optimized for generating embeddings quickly and efficiently, making them more suitable for tasks requiring large-scale embedding generation.

* Specialization: Embedding models are trained to create dense vector representations that capture semantic meaning in a way that's optimized for similarity searches and clustering. They excel in tasks where understanding the relationship between different pieces of text is crucial.

* Performance: Dedicated embedding models often have better performance metrics (e.g., accuracy, precision, recall) for embedding tasks compared to general-purpose language models.

* Resource Management: Using a specialized embedding model can be more resource-efficient, reducing computational overhead and memory usage compared to using a large language model like Llama 3 for the same purpose.

Thus, in this script, the `FastEmbedEmbeddings` class with the `BAAI/bge-small-en-v1.5` model is used to generate embeddings, ensuring efficiency and high performance for the embedding tasks required in the workflow.

Qdrant is an open-source vector search engine and database that specializes in storing and querying vector embeddings efficiently. Here, we store the vector database for the chunks in a local folder, `./db5`:

Running the code below takes 3 to 4 minutes.

In [26]:
qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    # location=":memory:",
    path="./db5",
    collection_name="document_embeddings",
)

time: 3min 48s (started: 2024-11-01 07:23:32 -04:00)


By creating a vector store with Qdrant, we can efficiently store and query embeddings, enabling tasks such as similarity search and clustering based on the semantic content of the documents. This setup optimizes performance and scalability for handling large volumes of text data effectively.

In this code section, we define our query and turn the `qdrant` vector store into a retriever. The retriever returns the 5 chunks (`k=5`) most similar to the query "What does the standard say about using specialists in an audit?". For a demo of the similarity search and its scores, please refer to the second use case (Meta Earnings).

In [27]:
query = "What does the standard say about using specialists in an audit?"

time: 477 µs (started: 2024-11-01 07:27:20 -04:00)


In [28]:
retriever = qdrant.as_retriever(search_kwargs={"k": 5})

time: 692 µs (started: 2024-11-01 07:27:20 -04:00)
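Conceptually, `k=5` means "score every chunk against the query and keep the 5 best". The brute-force sketch below does exactly that on made-up toy vectors with `k=2`. The helper `top_k` and the chunk ids are hypothetical. A vector database like Qdrant gets a similar result far faster on large collections by using an index instead of scoring every chunk.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, chunk_vecs, k):
    """Return (chunk_id, score) pairs for the k most similar chunks (brute force)."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy embeddings keyed by chunk id
chunk_vecs = {
    "specialists": [0.9, 0.1, 0.8],
    "evidence": [0.6, 0.4, 0.5],
    "revenue": [0.0, 1.0, 0.1],
}

results = top_k([1.0, 0.0, 1.0], chunk_vecs, k=2)
for cid, score in results:
    print(cid, round(score, 3))
```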


## Reranking

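FlashrankRerank *reranks* the documents returned by the retriever. It scores each document against the query with a small cross-encoder model and keeps only the highest-scoring ones, as controlled by its `top_n` parameter. LangChain calls this a document *compressor* because it shrinks the set of documents passed to the LLM. A smaller set reduces the prompt size and tends to make the context more relevant.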

In the below section, a `FlashrankRerank` compressor is initialized with a specific model (`ms-marco-MiniLM-L-12-v2`). This compressor is then used to create a `ContextualCompressionRetriever` (`compression_retriever`), which integrates both the retrieval and compression capabilities for further processing of document data.

In [29]:
compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

Downloading ms-marco-MiniLM-L-12-v2...


ms-marco-MiniLM-L-12-v2.zip: 100%|██████████| 21.6M/21.6M [00:02<00:00, 10.5MiB/s]


time: 3.19 s (started: 2024-11-01 07:27:21 -04:00)
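The reranking step boils down to scoring every (query, passage) pair and keeping the best `top_n`. The sketch below shows that shape. It deliberately swaps in a toy word-overlap score where FlashrankRerank uses the `ms-marco-MiniLM-L-12-v2` cross-encoder, so it only illustrates the "score, sort, truncate" logic and not Flashrank's actual scoring. The helpers `overlap_score` and `rerank` are hypothetical.

```python
def overlap_score(query, passage):
    """Toy relevance score: fraction of query words that also appear in the passage.

    A stand-in for a cross-encoder, which reads query and passage together.
    """
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


def rerank(query, passages, top_n=3):
    # Score every (query, passage) pair, then keep only the best top_n
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:top_n]


candidates = [
    "the auditor should evaluate the work of the specialist",
    "revenue increased compared to the prior year",
    "using the work of a specialist as audit evidence",
    "audit committee communications",
]
top = rerank("work of a specialist in an audit", candidates, top_n=2)
print(top)
```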


You may refer to the second use case (Meta earnings) to see how the reranking process is executed in more detail.

## Q&A Over Document

In this script segment, we employ the Llama 3 model to perform question-answering (Q&A) over the auditing standards document. LangChain's `ChatGroq` class connects to the Llama 3 model hosted on Groq's cloud, so the model can be used inside `LangChain` chains.

In [30]:
# Initialize the ChatGroq model
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

time: 133 ms (started: 2024-11-01 07:27:24 -04:00)


Prompt engineering is used here to steer the model toward the kind of response we expect. For example, telling the model "don't try to make up an answer" can help reduce hallucination.

In [31]:
# Define the prompt template for generating responses
prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Answer the question and provide additional helpful information,
based on the pieces of information, if applicable. Be succinct.

Responses should be properly formatted to be easily read.
"""

# Initialize the PromptTemplate with input variables so that both context and question will be considered by the model.
prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

time: 1.29 ms (started: 2024-11-01 07:27:24 -04:00)


Using `RetrievalQA`, we build a question-answering pipeline (`qa`) from three parts: the Llama 3 model (`llm`), the `stuff` chain type, and the compression retriever from the previous step (`compression_retriever`). The `stuff` chain type "stuffs" all retrieved documents into a single prompt. The pipeline retrieves and reranks relevant passages and passes them to the model, which generates answers grounded in the document.

The `verbose` parameter in `chain_type_kwargs` controls how much detail the chain prints while it runs. With `verbose=True`, the chain logs its internal steps, including the fully formatted prompt sent to the model. This helps you understand how the retrieval and processing pipeline works.

In [32]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": True},
)

time: 2.07 ms (started: 2024-11-01 07:27:24 -04:00)
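Roughly speaking, the `stuff` chain type joins the text of the retrieved documents and substitutes it into the prompt's `{context}` slot. The question goes into `{question}`. The sketch below mimics that step with plain string formatting. It is not LangChain's implementation: `stuff_prompt` is a hypothetical helper, the template is an abbreviated version of `prompt_template` above, and the passages are shortened examples.

```python
# Abbreviated version of the prompt template defined above
template = "Context: {context}\nQuestion: {question}\nAnswer succinctly."


def stuff_prompt(template, passages, question, separator="\n\n"):
    """Mimic the 'stuff' strategy: put every retrieved passage into one prompt."""
    context = separator.join(passages)
    return template.format(context=context, question=question)


passages = [
    "Appendix A - Using the Work of a Company's Specialist as Audit Evidence",
    "The auditor should evaluate the specialist's work.",
]
prompt_text = stuff_prompt(template, passages, "What does the standard say about specialists?")
print(prompt_text)
```

The trade-off is that `stuff` needs only one LLM call, but every passage must fit in the model's context window. Reranking down to a few passages before this step helps keep the prompt small.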


In [33]:
%%time
# Perform Q&A with a specific query (a cell magic such as %%time must be the first line of the cell)
response = qa.invoke("What does the standard say about using specialists in an audit?")

Running pairwise ranking..


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: For Audits of FYE On 12/15/2020-12/14/2024

Inconsistency in, or Doubts about the Reliability of, Audit Evidence

.29 If audit evidence obtained from one source is inconsistent with that obtained from another, or if the auditor has doubts about the reliability of information to be used as audit evidence, the auditor should perform the audit procedures necessary to resolve the matter and should determine the effect, if any, on other aspects of the audit.

Appendix A - Using the Work of a Company's Specialist as Audit Evidence

.A1 This appendix describes the auditor's responsibilities with respect to using the work of a specialist, employed or engaged by th

In [34]:
print_response(response)

**Answer:** The standard provides guidance on using specialists in an audit, including company's
specialists and auditor-engaged specialists. It outlines the auditor's responsibilities when using
the work of a specialist as audit evidence, including assessing the specialist's knowledge, skill,
and ability, evaluating the specialist's work, and determining the necessary evidence to support the
auditor's conclusion.

**Additional helpful information:**

* The standard applies to specialists with special skill or knowledge in a particular field other
than accounting or auditing.
* The auditor should obtain an understanding of the professional qualifications, experience, and
reputation of the specialist, as well as assess the specialist's objectivity and relationship with
the company.
* The auditor should evaluate the specialist's work, including the data, significant assumptions,
and methods used, and determine whether the specialist's work provides sufficient appropriate
evidence to supp

# RAG with Meta earnings

Here, we present another example of how this advanced RAG methodology can be applied in analyzing company disclosures.

In [36]:
# Create a local folder "data" in the Colab runtime (-p: no error if it already exists)
!mkdir -p data
# download Meta 2024 Q1 Earnings to the specified path
!gdown 1ee-BhQiH-S9a2IkHiFbJz9eX_SfcZ5m9 -O "data/meta-earnings.pdf"

Downloading...
From: https://drive.google.com/uc?id=1ee-BhQiH-S9a2IkHiFbJz9eX_SfcZ5m9
To: /content/data/meta-earnings.pdf
100% 160k/160k [00:00<00:00, 69.7MB/s]
time: 3.83 s (started: 2024-10-30 10:52:30 -04:00)


## Document Parsing

*Document parsing* is the process of examining the data in a document and extracting useful information from it. For instance, data from PDF and Word documents can be extracted using document parser APIs and stored in a JSON file. (source: https://www.edenai.co/post/best-document-parsing-apis)

Parsing documents efficiently is a crucial skill, especially when dealing with detailed financial reports or any other data-rich documents. By following this code snippet, you will learn how to extract and preprocess information from a PDF document using LlamaParse, an API service provided by LlamaCloud.

**Instruction Definition:** An instruction is defined to guide the parsing process. The instruction specifies that the document contains detailed financial information and numerous tables, emphasizing precision in answering questions.

In [None]:
instruction = """The provided document is Meta First Quarter 2024 Results.
This form provides detailed financial information about the company's performance for a specific quarter.
It includes unaudited financial statements, management discussion and analysis, and other relevant disclosures required by the SEC.
It contains many tables.
Try to be precise while answering the questions."""

time: 547 µs (started: 2024-07-10 05:53:06 +00:00)


**Parser Initialization**: The `LlamaParse` class is initialized with an API key, result type (markdown), parsing instructions, and a maximum timeout setting.

In [None]:
parser = LlamaParse(
    api_key=os.environ["LLAMA_PARSE"],
    result_type="markdown",
    parsing_instruction=instruction,
    max_timeout=5000,
)

time: 823 µs (started: 2024-07-10 05:53:10 +00:00)


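**Asynchronous Data Loading**: The document located at `./data/meta-earnings.pdf` is parsed with `aload_data`, the asynchronous counterpart of `load_data`. While the LlamaCloud API processes the file, the event loop is free to run other coroutines, such as requests for additional documents. Colab and Jupyter already run an event loop, which is why `await` can be used directly at the top level of a cell. The result is a list of documents; the first one is stored in the `parsed_doc` variable.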

In [None]:
llama_parse_documents = await parser.aload_data("./data/meta-earnings.pdf")

parsed_doc = llama_parse_documents[0]

Started parsing the file under job_id 848a8ac4-326d-42e3-a43f-1e4bb53c7d53
time: 27.2 s (started: 2024-07-10 05:53:12 +00:00)
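A minimal, self-contained sketch of why asynchronous loading helps: `fake_parse` is a hypothetical stand-in for a network-bound call such as `aload_data`, simulated with `asyncio.sleep`. Awaiting three such calls together with `asyncio.gather` takes roughly as long as one call, not three. The sketch starts the event loop with `asyncio.run`, as you would in a standalone `.py` file; inside Colab or Jupyter, where a loop is already running, you would write `results = await parse_all(...)` instead.

```python
import asyncio
import time


async def fake_parse(path: str, delay: float) -> str:
    """Hypothetical stand-in for a network-bound parsing call."""
    await asyncio.sleep(delay)  # yields control to other coroutines while "waiting"
    return f"parsed:{path}"


async def parse_all(paths: list[str], delay: float) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_parse(p, delay) for p in paths))


start = time.perf_counter()
results = asyncio.run(parse_all(["a.pdf", "b.pdf", "c.pdf"], 0.2))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f} s (three sequential calls would take about 0.6 s)")
```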


Here we display the first 1,024 characters of the parsed document, rendered as Markdown:

In [None]:
Markdown(parsed_doc.text[:1024])

# Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta founder and CEO. "The new version of Meta AI with Llama 3 is another step towards building the world's leading AI. We're seeing healthy growth across our apps and we continue making steady progress building the metaverse as well."

# First Quarter 2024 Financial Highlights

|In millions, except percentages and per share amounts (three months ended March 31)|2024|2023|% Change|
|---|---|---|---|
|Revenue|$36,455|$28,645|27%|
|Costs and expenses|$22,637|$21,418|6%|
|Income from operations|$13,818|$7,227|91%|
|Operating margin|38%|25%| |
|Provision for income taxes|$1,814|$1,598|14%|
|Effective tax rate|13%|22%| |
|Net income|$12,369|$5,709|117%|
|Diluted earnings per share (EPS)|$4.71|$2.20|114%|

# First Quarter 2024 Operational and Other Fin

time: 3.38 ms (started: 2024-07-10 05:53:54 +00:00)


Here we save the parsed document as a Markdown file in the local `data` folder of the Colab runtime (temporary storage that is cleared when the session ends):

In [None]:
document_path = Path("data/parsed_document.md")
# "w" overwrites the file, so re-running this cell does not append a second copy
with document_path.open("w", encoding="utf-8") as f:
    f.write(parsed_doc.text)

time: 1.84 ms (started: 2024-07-10 05:54:25 +00:00)


## Text Splitters and Vector Embeddings

In this part of the script, we use LangChain's `UnstructuredMarkdownLoader` to load the Markdown file we just saved. We initialize the loader with the path to the document (`document_path`) and call its `load()` method, which returns a list of LangChain `Document` objects; each holds the extracted text in `page_content` and details such as the source path in `metadata`. The `loaded_documents` variable therefore contains the document's text, ready to be split into chunks. Note that the loader extracts plain text, so Markdown table syntax is flattened (compare the financial highlights table with the first chunk printed below).

In [None]:
loader = UnstructuredMarkdownLoader(document_path)
loaded_documents = loader.load()

time: 145 ms (started: 2024-07-10 05:54:29 +00:00)


The parsed document is too long to pass to the LLM efficiently in a single prompt, so we break it into smaller, manageable chunks. At question time, only the chunks most similar to the user's question are retrieved and passed to the LLM as context.

To achieve this, we use LangChain's `RecursiveCharacterTextSplitter` to split the text into smaller segments:

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=128)
docs = text_splitter.split_documents(loaded_documents)
len(docs)

11

time: 7.05 ms (started: 2024-07-10 05:54:33 +00:00)


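Each chunk is at most 2,048 characters long (`chunk_size`; length is measured in characters by default), and consecutive chunks can share up to 128 characters (`chunk_overlap`), so text cut at a chunk boundary still appears with some of its surrounding context. The splitter tries to break at paragraph breaks first, then line breaks, then spaces, which keeps most paragraphs intact. For this document, the result is 11 chunks.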
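To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. This is an illustrative sketch, not LangChain's algorithm: `sliding_window_chunks` is a hypothetical helper that cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers paragraph, line, and word boundaries.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: it ignores paragraph and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']: each chunk repeats the last character of the previous one
```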

Let's print out the first chunk and see what it looks like:

In [None]:
print(docs[0].page_content)

Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta founder and CEO. "The new version of Meta AI with Llama 3 is another step towards building the world's leading AI. We're seeing healthy growth across our apps and we continue making steady progress building the metaverse as well."

First Quarter 2024 Financial Highlights

In millions, except percentages and per share amounts Three Months Ended March 31, 2024 2023 % Change Revenue $36,455 $28,645 27% Costs and expenses $22,637 $21,418 6% Income from operations $13,818 $7,227 91% Operating margin 38% 25% Provision for income taxes $1,814 $1,598 14% Effective tax rate 13% 22% Net income $12,369 $5,709 117% Diluted earnings per share (EPS) $4.71 $2.20 114%

First Quarter 2024 Operational and Other Financial Highlights

Family daily active peo

To capture the meaning of each chunk, we convert it into a vector embedding. The user's question is embedded the same way, and comparing the two vectors tells us how similar, and therefore how relevant, each chunk is to the question.
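Cosine similarity is a common way to compare embedding vectors: it measures the angle between them, so 1 means "pointing the same way" and 0 means "unrelated". The sketch below uses made-up 3-dimensional vectors purely for illustration; real `BAAI/bge-base-en-v1.5` embeddings have 768 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (illustrative values, not produced by a real model)
question = [1.0, 0.0, 1.0]
chunk_about_ai = [0.9, 0.1, 0.8]
chunk_about_cash = [0.0, 1.0, 0.1]

print(f"question vs. AI chunk:   {cosine_similarity(question, chunk_about_ai):.3f}")
print(f"question vs. cash chunk: {cosine_similarity(question, chunk_about_cash):.3f}")
```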

We first use `FastEmbedEmbeddings` and specify the model as `BAAI/bge-base-en-v1.5`:

In [None]:
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")

time: 8.98 s (started: 2024-07-10 05:54:42 +00:00)


You might ask why Meta's Llama 3 is not used to generate the embeddings. Llama 3 is primarily designed as a large language model (LLM) for generating text, understanding context, and performing other natural language processing tasks. While its internal representations can be used as embeddings, specialized embedding models often offer several advantages:

* Efficiency: Models like `BAAI/bge-base-en-v1.5` are specifically optimized for generating embeddings quickly and efficiently, making them more suitable for tasks requiring large-scale embedding generation.

* Specialization: Embedding models are trained to create dense vector representations that capture semantic meaning in a way that's optimized for similarity searches and clustering. They excel in tasks where understanding the relationship between different pieces of text is crucial.

* Performance: Dedicated embedding models often have better performance metrics (e.g., accuracy, precision, recall) for embedding tasks compared to general-purpose language models.

* Resource Management: Using a specialized embedding model can be more resource-efficient, reducing computational overhead and memory usage compared to using a large language model like Llama 3 for the same purpose.

Thus, in this script, the `FastEmbedEmbeddings` class with the `BAAI/bge-base-en-v1.5` model is used to generate embeddings, keeping the embedding step fast and lightweight.

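Qdrant is an open-source vector search engine and database that specializes in storing and querying vector embeddings. Below, `Qdrant.from_documents` embeds every chunk with `embeddings` and stores the vectors in a local folder, `./db`; passing `location=":memory:"` instead of `path` would keep the database in memory only: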

In [None]:
qdrant = Qdrant.from_documents(
    docs,
    embeddings, # Represents the embeddings model (FastEmbedEmbeddings in this case).
    # alternative to path: location=":memory:" keeps the database in RAM only
    path="./db",
    collection_name="document_embeddings", # Defines the name of the collection within Qdrant where the embeddings will be stored.
)

time: 24.9 s (started: 2024-07-10 05:54:51 +00:00)


With the chunks indexed in Qdrant, we can embed any query with the same model and look up the chunks whose vectors are most similar to it, which is the core operation behind retrieval in RAG. The same setup also works for collections much larger than this 11-chunk document.
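Conceptually, a similarity search scores the query vector against every stored vector and keeps the best `k`. The brute-force sketch below shows that idea with made-up 3-dimensional vectors and a hypothetical `top_k` helper. Qdrant returns the same kind of (document, score) pairs, but this sketch is not a description of how Qdrant implements its search internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4) -> list[tuple[str, float]]:
    """Brute-force nearest-neighbour search: score every vector, keep the best k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical index of chunk IDs -> toy embeddings
index = {
    "chunk-0": [0.9, 0.1, 0.8],
    "chunk-1": [0.2, 0.9, 0.1],
    "chunk-2": [0.5, 0.5, 0.5],
}
top_matches = top_k([1.0, 0.0, 1.0], index, k=2)
for doc_id, score in top_matches:
    print(f"{doc_id}: {score:.3f}")
```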

In the next two cells, we run a similarity search against the `qdrant` vector store for the query "What is the most important innovation from Meta?" and print the top results with their similarity scores (higher means more similar). By default, `similarity_search_with_score` returns the 4 closest chunks.

In [None]:
%%time
# Perform similarity search based on a query using the Qdrant vector store
query = "What is the most important innovation from Meta?"
similar_docs = qdrant.similarity_search_with_score(query)

CPU times: user 273 ms, sys: 960 µs, total: 274 ms
Wall time: 273 ms
time: 275 ms (started: 2024-07-10 05:55:16 +00:00)


In [None]:
# Print the similarity search results as (document, score) pairs
for doc, score in similar_docs:
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {score}")
    print("-" * 80)
    print()

text: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta foun

score: 0.6154119568600498
--------------------------------------------------------------------------------

text: Webcast and Conference Call Information

Meta will host a conference call to discuss the results at 2:00 p.m. PT / 5:00 p.m. ET today. The live webcast of Meta's earnings conference call can be accessed at investor.fb.com, along with the earnings press rel

score: 0.5711460522832437
--------------------------------------------------------------------------------

text: Reconciliation of cash, cash equivalents, and restricted cash to the condensed consolidated balance sheets

Cash and cash equivalents $32,307 $11,551 Restricted cash, included in prepaid expenses and other current assets 84 224 Restricted cash, inclu

In the context of natural language processing (NLP) and information retrieval systems, a **retriever** refers to a component or module responsible for fetching and retrieving relevant information or documents from a database or corpus based on a given query. Retrievers play a crucial role in various applications, including search engines, question-answering systems, and document retrieval tasks.

The cells below configure `qdrant` as a retriever (`retriever`) that returns the top 5 chunks (`k=5`), run it on the same query, and print each chunk's ID and the first 256 characters of its text.

In [None]:
%%time
# Configure Qdrant as a retriever (top 5 chunks) and perform retrieval based on the query
retriever = qdrant.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke(query)

CPU times: user 321 ms, sys: 27 ms, total: 348 ms
Wall time: 351 ms
time: 352 ms (started: 2024-07-10 05:55:16 +00:00)


In [None]:
# Print the retriever results
for doc in retrieved_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print("-" * 80)
    print()

id: 8ee19e6473454db5b1e1bb318f900f64

text: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta foun

--------------------------------------------------------------------------------

id: b1190e896859405b9f247816f63da732

text: Webcast and Conference Call Information

Meta will host a conference call to discuss the results at 2:00 p.m. PT / 5:00 p.m. ET today. The live webcast of Meta's earnings conference call can be accessed at investor.fb.com, along with the earnings press rel

--------------------------------------------------------------------------------

id: defde6ff0a9a40b4b5911e73df9fc6f2

text: Reconciliation of cash, cash equivalents, and restricted cash to the condensed consolidated balance sheets

Cash and cash equivalents $32,307 $11,551 Restricted cash, included in prepaid e

## Reranking

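`FlashrankRerank` is a LangChain *document compressor* that reranks retrieved chunks using a small cross-encoder model from the FlashRank library and keeps only the highest-scoring ones (by default, the top 3). Unlike the embedding search, which compares the query and each chunk as separate vectors, a cross-encoder reads the query and a chunk together, which usually gives a more accurate relevance judgment at a higher computational cost. Here, "compression" means passing fewer chunks to the LLM, not shortening the chunks themselves.

In the section below, a `FlashrankRerank` compressor is initialized with a specific model (`ms-marco-MiniLM-L-12-v2`). This compressor is then combined with our retriever in a `ContextualCompressionRetriever` (`compression_retriever`): each query first fetches the top 5 chunks from Qdrant, and the reranker then reorders them and keeps the most relevant ones.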

In [None]:
compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

time: 212 ms (started: 2024-07-10 05:55:23 +00:00)
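The rerank-and-truncate step can be sketched in a few lines of plain Python. Both `rerank` and `word_overlap_score` are hypothetical helpers: the toy word-overlap scorer stands in for the cross-encoder, which is far more accurate, but the surrounding logic (score each query-chunk pair, sort, keep the top `top_n`) illustrates the same idea.

```python
from typing import Callable


def word_overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)


def rerank(query: str, docs: list[str], score_fn: Callable[[str, str], float],
           top_n: int = 3) -> list[tuple[str, float]]:
    """Score every (query, doc) pair, sort by score, and keep the top_n docs."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable sort keeps ties in input order
    return scored[:top_n]


query = "most important innovation from Meta"
candidates = [  # e.g., the 5 chunks returned by the base retriever
    "Webcast and conference call information",
    "Meta AI with Llama 3 is an important innovation",
    "Cash and cash equivalents",
    "Forward-looking statements from Meta",
    "Revenue grew 27% year over year",
]
reranked = rerank(query, candidates, word_overlap_score, top_n=3)
for doc, score in reranked:
    print(f"{score:.2f}  {doc}")
```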


Below is the reranking process being executed:

In [None]:
%%time
reranked_docs = compression_retriever.invoke(query)
len(reranked_docs)

Running pairwise ranking..
CPU times: user 3.45 s, sys: 172 ms, total: 3.62 s
Wall time: 3.72 s


3

time: 3.73 s (started: 2024-07-08 16:01:53 +00:00)


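After reranking, the code iterates through `reranked_docs` and prints each chunk's ID, the start of its text, and the reranker's `relevance_score`. Only 3 of the 5 retrieved chunks remain, now ordered by relevance. Note that these scores come from the reranking model and are on a different scale from the similarity scores printed earlier, so the two sets of numbers should not be compared directly.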

In [None]:
for doc in reranked_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {doc.metadata['relevance_score']}")
    print("-" * 80)
    print()

id: 8ee19e6473454db5b1e1bb318f900f64

text: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta foun

score: 0.1650884598493576
--------------------------------------------------------------------------------

id: b1190e896859405b9f247816f63da732

text: Webcast and Conference Call Information

Meta will host a conference call to discuss the results at 2:00 p.m. PT / 5:00 p.m. ET today. The live webcast of Meta's earnings conference call can be accessed at investor.fb.com, along with the earnings press rel

score: 0.006229652091860771
--------------------------------------------------------------------------------

id: dca20c6e4e64421b81c98df52a089ca4

text: This press release contains forward-looking statements regarding our future business plans and expectations. These forward-looking sta

## Q&A Over Document

In this script segment, we use the Llama 3 model to perform question answering (Q&A) over the earnings document. The model runs on Groq's cloud; `ChatGroq` is the LangChain chat-model class that sends requests to Groq's API, so the hosted model can be used like any other LangChain LLM.

In [None]:
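# Initialize the ChatGroq model
# temperature=0 makes the output as deterministic as possible;
# "llama3-70b-8192" is Llama 3 70B with an 8,192-token context window
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")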

time: 118 ms (started: 2024-07-10 05:55:39 +00:00)


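Prompt engineering is used here to steer the model toward the kind of response we expect. For example, the instruction "If you don't know the answer, just say that you don't know, don't try to make up an answer" encourages the model to admit when the retrieved context does not contain the answer, which can reduce (but not eliminate) hallucination.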

In [None]:
# Define the prompt template for generating responses
prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Answer the question and provide additional helpful information,
based on the pieces of information, if applicable. Be succinct.

Responses should be properly formatted to be easily read.
"""

# Initialize the PromptTemplate with input variables so that both context and question will be considered by the model.
prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

time: 683 µs (started: 2024-07-10 05:55:50 +00:00)


Using `RetrievalQA`, we build a question-answering pipeline (`qa`) that combines the Llama 3 model (`llm`), the `stuff` chain type, and the compression retriever from the previous step (`compression_retriever`). With `stuff`, all chunks returned by the retriever are concatenated ("stuffed") into the `{context}` slot of the prompt, and the LLM answers in a single call. This works well as long as the retrieved chunks fit in the model's context window.

The `verbose` setting in `chain_type_kwargs` controls how much the chain logs while it runs. With `verbose=True`, LangChain prints each step, including the fully formatted prompt sent to the LLM (as in the logs under the first two questions below); with `verbose=False`, as in this cell, those logs are suppressed.

In [None]:
# Initialize the RetrievalQA chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff", # concatenate all retrieved chunks into a single prompt
    retriever=compression_retriever,
    return_source_documents=False,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

time: 4.87 ms (started: 2024-07-10 06:06:25 +00:00)
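The sketch below illustrates what the `stuff` chain type does with the retrieved chunks: join them into one string and substitute it, together with the question, into the prompt template. `stuff_prompt` and the shortened `PROMPT_TEMPLATE` are hypothetical, and the blank-line separator is an assumption chosen for illustration.

```python
PROMPT_TEMPLATE = """Use the following pieces of information to answer the user's question.

Context: {context}
Question: {question}
"""


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate ("stuff") every retrieved chunk into the single {context} slot."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt_text = stuff_prompt(
    ["Revenue was $36,455 million.", "Net income was $12,369 million."],
    "What was revenue?",
)
print(prompt_text)
```

Because every chunk goes into one prompt, the combined length of the reranked chunks plus the template must stay within the model's context window.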


### significant innovation

In [None]:
# Perform Q&A with a specific query
response = qa.invoke("What is the most significant innovation from Meta?")

Running pairwise ranking..


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta founder and CEO. "The new version of Meta AI with Llama 3 is another step towards building the world's leading AI. We're seeing healthy growth across our apps and we continue making steady progress building the metaverse as well."

First Quarter 2024 Financial Highlights

In millions, except percentages and per share amounts Three Months Ended March 31, 2024 2023 % Change Revenue $36,455 $28,645 27% Costs and

In [None]:
# Print the response
print_response(response)

Based on the provided information, the most significant innovation from Meta is the new version of
Meta AI with Llama 3, which is mentioned in the quote from Mark Zuckerberg, Meta founder and CEO.
This innovation is part of Meta's efforts to build the world's leading AI.

Additionally, the press release highlights Meta's progress in building the metaverse, which is
another significant innovation from the company. However, the exact details of these innovations are
not provided in the given information.
time: 972 µs (started: 2024-07-10 05:56:16 +00:00)


In [None]:
# Print the response from an earlier run of the same query (July 8 rather than July 10):
# similar to the one above but not identical, since even with temperature=0 outputs can vary between runs
print_response(response)

Based on the provided information, the most significant innovation from Meta is the new version of
Meta AI with Llama 3, which is mentioned in the quote from Mark Zuckerberg, Meta founder and CEO.
This is described as "another step towards building the world's leading AI".

Additionally, the press release highlights Meta's efforts in building the metaverse, which is
mentioned as one of the areas where the company is making steady progress. However, it does not
provide further details on specific innovations or developments in this area.
time: 901 µs (started: 2024-07-08 16:02:28 +00:00)


### revenue for 2024 and % change

In [None]:
%%time
response = qa.invoke("What is the revenue for 2024 and % change?")

Running pairwise ranking..


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: Reconciliation of GAAP to Non-GAAP Results

Three Months Ended March 31, 2024 Three Months Ended March 31, 2023 GAAP revenue $36,455 $28,645 Foreign exchange effect on 2024 revenue using 2023 rates (106) Revenue excluding foreign exchange effect $36,349 GAAP revenue year-over-year change % 27% Revenue excluding foreign exchange effect year-over-year change % 27% GAAP advertising revenue $35,635 $28,101 Foreign exchange effect on 2024 advertising revenue using 2023 rates (105) Advertising revenue excluding foreign exchange effect $35,530 GAAP advertising revenue year-over-year change % 27% Advertising revenue excluding foreign exchange effect year-over-year

In [None]:
Markdown(response["result"])

**Revenue for 2024 and % Change:**

The revenue for 2024 is $36,455 million, which represents a 27% year-over-year change compared to 2023.

**Additional Helpful Information:**

* Revenue excluding foreign exchange effect is $36,349 million, which also represents a 27% year-over-year change.
* Advertising revenue is $35,635 million, which represents a 27% year-over-year change, and advertising revenue excluding foreign exchange effect is $35,530 million, which represents a 26% year-over-year change.

time: 5.23 ms (started: 2024-07-08 16:08:42 +00:00)
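Because LLM answers can contain arithmetic or attribution mistakes, it is good practice to recompute reported percentages from the underlying figures. The hypothetical helper below (not part of the RAG pipeline) checks the year-over-year changes using the amounts shown in the retrieved context and the earnings table above.

```python
def yoy_change(current: float, prior: float) -> float:
    """Year-over-year change as a percentage: (current - prior) / prior * 100."""
    return (current - prior) / prior * 100


# Amounts in $ millions for the three months ended March 31 (2024 vs. 2023)
checks = {
    "Revenue": (36_455, 28_645),
    "Advertising revenue": (35_635, 28_101),
    "Advertising revenue excl. FX effect": (35_530, 28_101),
    "Net income": (12_369, 5_709),
}
for label, (current, prior) in checks.items():
    print(f"{label}: {yoy_change(current, prior):.1f}%")
```

These figures are for the first quarter only, not full fiscal years. The same check also helps with the second answer to the 2023 revenue question below: the 27% figure is the growth of 2024 over 2023, and the foreign-exchange adjustment applies to 2024 revenue ($36,349 million), so that answer's additional information is incorrect.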


In [None]:
response["result"]

'**Revenue for 2024 and % Change:**\n\nThe revenue for 2024 is $36,455 million, which represents a 27% year-over-year change compared to 2023.\n\n**Additional Helpful Information:**\n\n* Revenue excluding foreign exchange effect is $36,349 million, which also represents a 27% year-over-year change.\n* Advertising revenue is $35,635 million, which represents a 27% year-over-year change, and advertising revenue excluding foreign exchange effect is $35,530 million, which represents a 26% year-over-year change.'

time: 12.4 ms (started: 2024-07-08 16:09:07 +00:00)


### revenue for 2023

In [None]:
%%time
response = qa.invoke("What is the revenue for 2023?")

Running pairwise ranking..
CPU times: user 2.41 s, sys: 6.13 ms, total: 2.41 s
Wall time: 3.39 s
time: 3.39 s (started: 2024-07-08 16:19:43 +00:00)


In [None]:
print_response(response)

**Answer:** The revenue for 2023 is $28,645.

**Additional helpful information:**

* The revenue for 2024 is $36,455, which is a 27% year-over-year increase from 2023.
* The foreign exchange effect on 2024 revenue using 2023 rates is ($106), which means that if the
exchange rates were the same as in 2023, the revenue would be $36,349.
time: 1.78 ms (started: 2024-07-08 16:19:52 +00:00)


In [None]:
print_response(response)

The revenue for 2023 is $28,645.

Additional information: This is a 27% increase from the previous year, and the revenue excluding
foreign exchange effect is also $28,645.


### expected revenue

In [None]:
%%time
response = qa.invoke("What is the expected revenue for the second quarter of 2024?")

Running pairwise ranking..
CPU times: user 2.59 s, sys: 9.42 ms, total: 2.6 s
Wall time: 6.58 s


In [None]:
Markdown(response["result"])

**Answer:** The expected revenue for the second quarter of 2024 is in the range of $36.5-39 billion.

**Additional information:** This guidance assumes a 1% headwind to year-over-year total revenue growth due to foreign currency exchange rates.

### overall outlook

In [None]:
%%time
response = qa.invoke("What is the overall outlook of Q1 2024?")

Running pairwise ranking..
CPU times: user 3.44 s, sys: 12.9 ms, total: 3.45 s
Wall time: 8.28 s


In [None]:
print_response(response)

**Overall Outlook of Q1 2024:**

The overall outlook of Q1 2024 is positive. According to Mark Zuckerberg, "It's been a good start to
the year." The company has reported strong financial results, with revenue increasing by 27% year-
over-year to $36.46 billion. Net income has also increased by 117% year-over-year to $12.37 billion.

**Additional Highlights:**

* Family daily active people (DAP) increased by 7% year-over-year to 3.24 billion.
* Ad impressions increased by 20% year-over-year.
* Average price per ad increased by 6% year-over-year.
* Capital expenditures were $6.72 billion, and free cash flow was $12.53 billion.
* The company has also reported a strong capital return program, with share repurchases of $14.64
billion and dividend payments of $1.27 billion.


### and ... any further questions?

To let users type their own questions, you can use Python's built-in `input()` function, like this:

In [None]:
# Define a function to perform Q&A with user input
def perform_qa_with_input():
    question = input("Enter your question: ")
    response = qa.invoke(question)
    return response

# Example usage
response = perform_qa_with_input()
print_response(response)

Enter your question: When will Meta host the conference call?
Running pairwise ranking..
**Answer:** Meta will host the conference call at 2:00 p.m. PT / 5:00 p.m. ET today.

**Additional helpful information:**

* The live webcast of Meta's earnings conference call can be accessed at investor.fb.com.
* A replay of the call will be available at the same website after the call.
* Transcripts of conference calls with publishing equity research analysts held today will also be
posted to the investor.fb.com website.
time: 14.8 s (started: 2024-07-10 06:06:47 +00:00)
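To ask several questions in a row, the single `input()` call can be wrapped in a loop. This is a minimal sketch: `prompt_user` and `qa_loop` are hypothetical helpers, and `answer_fn` stands in for the chain so the loop can be tested offline with canned questions.

```python
from typing import Callable, Iterable, Iterator


def prompt_user() -> Iterator[str]:
    """Yield questions typed by the user, one per input() call."""
    while True:
        yield input("Enter your question (blank line to stop): ")


def qa_loop(questions: Iterable[str], answer_fn: Callable[[str], str]) -> list[tuple[str, str]]:
    """Answer each question until a blank line, 'quit', or 'exit' is entered."""
    transcript = []
    for question in questions:
        question = question.strip()
        if question.lower() in {"", "quit", "exit"}:
            break
        answer = answer_fn(question)
        print(f"Q: {question}\nA: {answer}\n")
        transcript.append((question, answer))
    return transcript


# Offline demo with canned questions and a dummy answer function
demo = qa_loop(["Revenue?", "Net income?", ""], lambda q: f"(answer to {q!r})")
```

In the notebook, the interactive version would be `qa_loop(prompt_user(), lambda q: qa.invoke(q)["result"])`, since `qa.invoke` returns a dictionary whose `"result"` key holds the answer text.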


# References
- [Meta Reports First Quarter 2024 Results](https://s21.q4cdn.com/399680738/files/doc_financials/2024/q1/Meta-03-31-2024-Exhibit-99-1_FINAL.pdf)
- [PCAOB Audit Standards](https://assets.pcaobus.org/pcaob-dev/docs/default-source/standards/auditing/documents/auditing_standards_audits_after_december_15_2020_december_14_2024.pdf?sfvrsn=915b22d3_1)