# Answer questions about Reddit's SEC filing with Opper and Mistral

In [3]:
# Install the Opper SDK and pydantic
!pip install opperai -U
!pip install pydantic


Collecting opperai
  Downloading opperai-0.5.2-py2.py3-none-any.whl.metadata (3.7 kB)
Downloading opperai-0.5.2-py2.py3-none-any.whl (25 kB)
Installing collected packages: opperai
  Attempting uninstall: opperai
    Found existing installation: opperai 0.5.1
    Uninstalling opperai-0.5.1:
      Successfully uninstalled opperai-0.5.1
Successfully installed opperai-0.5.2


## Imports

In [4]:
import os
from opperai import fn, Client, AsyncClient
from opperai.types.indexes import RetrievalResponse
from pydantic import BaseModel, Field
from typing import List, Literal

# Set your Opper API key (placeholder shown; keep real keys out of shared notebooks)
os.environ["OPPER_API_KEY"] = "<your-opper-api-key>"

opper = Client()




## Index PDF

In [106]:

# Get the index, or create it if it does not exist yet
index = opper.indexes.get(name="mistral-rag2")
if not index:
    index = opper.indexes.create(name="mistral-rag2")

    print(index)

    # Upload the PDF to the newly created index
    opper.indexes.upload_file(
        id=index.id,
        file_path="./reddit-sec.pdf",
    )


## Retrieve relevant content for answering the question

In [107]:
question = "What are the key financial and growth numbers for Reddit?"

# Retrieve the three most relevant chunks (k=3) from the index
results = opper.indexes.retrieve(
    id=index.id,
    query=question,
    k=3
)

print(results)


[RetrievalResponse(content='04/05/2024, 23:15 Document\nhttps://www.sec.gov/Archives/edgar/data/1713445/000162828024006294/reddits-1q423.htm 19/281Table of Contents\nexcluding China and Russia, is expected to grow at a CAGR of 20% to $1.0 trillion in 2027. We believe the importance of data to all\ntypes of analytics and AI, from training to testing and refining models, positions us well to tap into this strong market.\nUser Economy\nCommerce is already at the core of many communities today. As we introduce new ways to enable developers to add additional\nfunctionality to their communities, we believe there will be further development of economic features on Reddit (e.g., games). We\nsee informal exchanges today of digital goods, services, and even physical goods. We recognize the opportunity that commerce\npresents and we have continued to invest in the future of Reddit’s user economy. Using estimates from IDC’s Consumer Market\nModel and focusing on six core geographies (United States

In [131]:
# Extract the content, file name, and page number from each result
class Source(BaseModel):
    file_name: str
    content: str
    page_number: int

processed_results = [
    Source(
        content=result.content,
        file_name=result.metadata.get("file_name"),
        page_number=result.metadata.get("page")
    ) for result in results
]

print(processed_results)


[Source(content='04/05/2024, 23:15 Document\nhttps://www.sec.gov/Archives/edgar/data/1713445/000162828024006294/reddits-1q423.htm 19/281Table of Contents\nexcluding China and Russia, is expected to grow at a CAGR of 20% to $1.0 trillion in 2027. We believe the importance of data to all\ntypes of analytics and AI, from training to testing and refining models, positions us well to tap into this strong market.\nUser Economy\nCommerce is already at the core of many communities today. As we introduce new ways to enable developers to add additional\nfunctionality to their communities, we believe there will be further development of economic features on Reddit (e.g., games). We\nsee informal exchanges today of digital goods, services, and even physical goods. We recognize the opportunity that commerce\npresents and we have continued to invest in the future of Reddit’s user economy. Using estimates from IDC’s Consumer Market\nModel and focusing on six core geographies (United States, Canada, A
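The list comprehension above assumes that every retrieved chunk carries both `file_name` and `page` in its metadata. If either key is missing, `.get()` returns `None`, and the `Source` model would most likely reject it during validation. The following is a minimal sketch of a more defensive variant. It uses plain dictionaries as stand-ins for the retrieval results; the helper name `source_fields` and the fallback values (`"unknown"`, `-1`) are assumptions made for illustration and are not part of the Opper SDK.

```python
from typing import Any, Dict, Optional


def source_fields(content: str, metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Build keyword arguments for Source, with fallbacks for missing metadata.

    Hypothetical helper: the fallback values are an illustrative convention.
    """
    metadata = metadata or {}
    page = metadata.get("page")
    return {
        "content": content,
        # Fall back to a placeholder when the file name is absent or empty
        "file_name": metadata.get("file_name") or "unknown",
        # -1 marks an unknown page number
        "page_number": int(page) if page is not None else -1,
    }


# Stand-in data shaped like the metadata used in this notebook
fields = source_fields("Some chunk text", {"file_name": "reddit-sec.pdf", "page": 19})
missing = source_fields("Another chunk", {})
print(fields)
print(missing)
```

With this helper, each entry could be built as `Source(**source_fields(result.content, result.metadata))` instead of passing the keyword arguments explicitly.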

## Create response with citations

In [130]:
class Citation(BaseModel):
    source: str = Field(..., description="The source, such as file name, used to answer the question")
    page_number: int = Field(..., description="The page number of the source")
    citation: str = Field(..., description="A relevant citation from the source, preferably short")

class Response(BaseModel):
    answer: str = Field(..., description="The answer to the question")
    citations: List[Citation] = Field(..., description="Citations used to answer the question")

@fn(path="test/mistral-rag/response", model="mistral/mistral-large-latest-eu")
def answer_question(question: str, sources: List[Source]) -> Response:
    """ Produce a detailed answer to the question using the sources. Clearly mark each fact or statement with inline citations in the style of [1], [2]. 
    """

response = answer_question(question, processed_results)
print(response)


answer="Reddit's key financial and growth numbers are not explicitly provided in the given sources. However, the sources mention that the market size for Reddit's user economy is estimated to be $1.3 trillion today and is expected to grow at a CAGR of 12% to $2.1 trillion in 2027. Reddit believes it can generate revenue based on the volume of commerce conducted on its platform [1]. The sources also mention that Reddit is an emerging growth company as defined in the Jumpstart Our Business Startups Act of 2012 and will remain so until certain conditions are met [2]. However, specific financial figures such as revenue, profit, or user base numbers are not provided." citations=[Citation(source='reddit-sec.pdf', page_number=19, citation='Using estimates from IDC’s Consumer Market Model and focusing on six core geographies (United States, Canada, Australia, Western Europe, India, and Latin America), we believe this market size is $1.3 trillion today, and it is expected to grow at a CAGR of 1

## Print it! 

In [132]:
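print(response.answer)
print()
# Number citations from 1 to match the inline [1], [2] markers. Using enumerate
# avoids overwriting the `index` variable that holds the Opper index.
for i, citation in enumerate(response.citations, start=1):
    print(f"[{i}]", citation.source, "page:", citation.page_number, "segment:", f'"{citation.citation[:50]}..."')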


Reddit's key financial and growth numbers are not explicitly provided in the given sources. However, the sources mention that the market size for Reddit's user economy is estimated to be $1.3 trillion today and is expected to grow at a CAGR of 12% to $2.1 trillion in 2027. Reddit believes it can generate revenue based on the volume of commerce conducted on its platform [1]. The sources also mention that Reddit is an emerging growth company as defined in the Jumpstart Our Business Startups Act of 2012 and will remain so until certain conditions are met [2]. However, specific financial figures such as revenue, profit, or user base numbers are not provided.

[1] reddit-sec.pdf page: 19 segment: "Using estimates from IDC’s Consumer Market Model a..."
[2] reddit-sec.pdf page: 21 segment: "We are an emerging growth company as defined in th..."
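
The numbered list only lines up with the answer if every inline marker such as `[1]` has a matching entry in `citations`, which the model is asked to do but is not guaranteed to do. A minimal sketch of a consistency check follows, using only the standard library. The function names `cited_indices` and `check_citations` and the sample answer are hypothetical and not part of Opper.

```python
import re
from typing import List, Set


def cited_indices(answer: str) -> Set[int]:
    """Return the citation numbers that appear as [n] in the answer text."""
    return {int(n) for n in re.findall(r"\[(\d+)\]", answer)}


def check_citations(answer: str, n_citations: int) -> List[str]:
    """List mismatches between inline markers and the citation list."""
    used = cited_indices(answer)
    available = set(range(1, n_citations + 1))
    problems = [f"marker [{i}] has no matching citation" for i in sorted(used - available)]
    problems += [f"citation [{i}] is never referenced" for i in sorted(available - used)]
    return problems


# Shortened stand-in for response.answer
sample_answer = (
    "The market size is $1.3 trillion today [1]. "
    "Reddit is an emerging growth company [2]."
)
print(check_citations(sample_answer, 2))
print(check_citations(sample_answer, 1))
```

In the notebook, the call would be `check_citations(response.answer, len(response.citations))`; an empty list means the markers and the citation list agree.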
