# RAG with o1-preview and o1-mini

Appendix to the blog post.

In [6]:
# Install the Opper SDK and pydantic
!pip install opperai -U
!pip install pydantic




## Imports

In [7]:
import os
from typing import List

from opperai import Opper
from pydantic import BaseModel

# Set your API key before creating the client
os.environ["OPPER_API_KEY"] = "<your opper api key here>"

opper = Opper()


## Index PDF

In [8]:

# Get the index if it already exists, otherwise create it
index = opper.indexes.get(name="reddit-s1")
if not index:
    index = opper.indexes.create(name="reddit-s1")

    # Upload the PDF only when the index is new, so re-running the cell doesn't upload it again
    index.upload_file(file_path="./reddit-sec.pdf")

print(index)


Index(_client=<opperai._client.Client object at 0x106ee6650>, _index=Index(uuid=UUID('d485b596-9ff0-49a6-b78c-13fdad697f00'), name='reddit-s1', created_at=datetime.datetime(2024, 9, 15, 12, 57, 39, 717955, tzinfo=TzInfo(UTC))))
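The notebook leaves chunking to `upload_file`, and the exact strategy Opper uses isn't stated here. To make the idea concrete, here is a minimal sketch assuming fixed-size word windows with overlap. `chunk_words`, `chunk_size` and `overlap` are hypothetical names and defaults chosen for illustration, not Opper's implementation.

```python
# Hypothetical illustration: split extracted document text into overlapping
# word windows, the kind of unit a retrieval index typically stores.
# This is NOT Opper's implementation; chunk_size/overlap are made-up defaults.
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words, each sharing
    `overlap` words with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means a sentence that straddles a window boundary still appears intact in at least one chunk, at the cost of storing some words twice.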


## Retrieve relevant knowledge

In [9]:
question = "Provide a data-driven SWOT analysis of Reddit with emphasis on impact from AI"

subquestions, _ = opper.call(
    name="generate_subquestions",
    instructions="Given that you can query Reddit's S1 filing to answer the question, generate a list of subquestions that you would want the answer to in order to answer the main question. Only return the subquestions, not the question.",
    input=question,
    output_type=List[str],
    model="openai/o1-mini"
)
knowledge = []
for subquestion in subquestions:
    print(subquestion)
    # Retrieve the single most relevant chunk (k=1) for each subquestion
    result = index.query(
        query=subquestion,
        k=1
    )
    knowledge.append(result)

What are Reddit's core strengths in terms of user engagement and community building, and how do these strengths compare to competitors in the presence of AI-driven content moderation tools?
What weaknesses does Reddit face in its current business model and platform infrastructure that could be exacerbated by the integration of AI technologies?
What opportunities exist for Reddit to leverage AI to enhance user experience, personalization, and content discovery?
What potential threats does the rise of AI pose to Reddit's market position, including risks related to AI-generated content, privacy concerns, and regulatory challenges?
How is Reddit currently utilizing AI in its operations, and what data supports the effectiveness of these implementations?
In what ways can AI-driven analytics provide insights into Reddit's user behavior and content trends to inform strategic decisions?
What are the financial implications for Reddit in adopting and scaling AI technologies within its platform?
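With `k=1` per subquestion, two subquestions can land on the same chunk, so the same passage may reach the model twice. A minimal sketch of the fan-out step with deduplication follows. It assumes a toy in-memory index: `toy_query` and `retrieve_unique` are hypothetical helpers, and the word-overlap ranking is a stand-in for `index.query`, not a description of how Opper scores results.

```python
# Hypothetical sketch: fan out subquestions over a small in-memory "index",
# keep the top-k chunk ids per subquestion, and drop repeats.
# toy_query ranks by shared lowercase words; it is a stand-in for
# Opper's index.query, not a description of how that works.
from typing import Dict, List


def toy_query(chunks: Dict[str, str], query: str, k: int = 1) -> List[str]:
    """Return the ids of the k chunks sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda cid: len(query_words & set(chunks[cid].lower().split())),
        reverse=True,  # sorted() stays stable with reverse=True, so ties keep insertion order
    )
    return ranked[:k]


def retrieve_unique(chunks: Dict[str, str], subquestions: List[str], k: int = 1) -> List[str]:
    """Collect chunk ids across all subquestions in first-seen order, without duplicates."""
    seen = set()
    ordered: List[str] = []
    for subquestion in subquestions:
        for cid in toy_query(chunks, subquestion, k):
            if cid not in seen:
                seen.add(cid)
                ordered.append(cid)
    return ordered


if __name__ == "__main__":
    chunks = {
        "c1": "reddit revenue from advertising",
        "c2": "ai data licensing agreements",
        "c3": "risk factors and regulation",
    }
    subquestions = [
        "what is reddit advertising revenue",
        "how does ai data licensing work",
        "what ai licensing deals exist",
    ]
    print(retrieve_unique(chunks, subquestions))  # ['c1', 'c2']
```

The second and third subquestions both hit `c2`, and it is kept once. Deduplicating before the generation call keeps the prompt shorter and avoids the model citing the same passage under two different numbers.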

## Create response with citations

In [10]:
class Citation(BaseModel):
    file_name: str 
    page_number: int 
    citation: str 

class Response(BaseModel):
    thoughts: str
    answer: str 
    citations: List[Citation]

response, _ = opper.call(
    name="o1/respond",
    model="openai/o1-preview",
    instructions="Produce an answer to the question using knowledge. Refer to any facts with [1], [2] etc.",
    input={
        "question": question,
        "knowledge": knowledge
    },
    output_type=Response
)

print(response)


thoughts="After analyzing the provided knowledge, I have identified key points related to Reddit's strengths, weaknesses, opportunities, and threats, particularly focusing on the impact of AI. I have organized these points into a SWOT analysis with appropriate citations." answer="Strengths:\n- Reddit's massive corpus of conversational data is foundational to current AI technology and many LLMs, making it valuable for model training [1].\n- Reddit is investing in AI to enhance the user experience, making it more personalized and safer, and to improve search capabilities, which is expected to increase user engagement and retention [2].\n- AI is expected to improve Reddit's ability to localize content and moderate content as they expand internationally [2].\n\nWeaknesses:\n- New AI applications require additional investment, increasing costs and complexity, which may impact gross margin [3].\n- Market acceptance of AI technologies is uncertain; Reddit may be unsuccessful in its product de
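The instructions ask the model to mark facts with [1], [2] and so on, while `citations` is a separate list with no number field, so nothing in the schema guarantees the two line up. A minimal post-check sketch, assuming citation *n* is the *n*-th entry of the list (which is how this notebook prints them). `cited_numbers`, `unresolved_markers` and `unused_citations` are hypothetical helpers, not part of Opper.

```python
# Hypothetical post-check (not part of Opper): compare the [n] markers in the
# answer with a citations list assumed to be numbered 1..N in order.
# Only single-number markers like [2] are matched, not combined ones like [1, 2].
import re
from typing import List, Set

MARKER = re.compile(r"\[(\d+)\]")


def cited_numbers(answer: str) -> Set[int]:
    """All citation numbers referenced as [n] in the answer text."""
    return {int(n) for n in MARKER.findall(answer)}


def unresolved_markers(answer: str, num_citations: int) -> List[int]:
    """Markers that point outside 1..num_citations."""
    return sorted(n for n in cited_numbers(answer) if not 1 <= n <= num_citations)


def unused_citations(answer: str, num_citations: int) -> List[int]:
    """Citation numbers the answer never refers to."""
    cited = cited_numbers(answer)
    return [n for n in range(1, num_citations + 1) if n not in cited]


if __name__ == "__main__":
    answer = "Data is valuable [1]. Costs may rise [3]."
    print(unresolved_markers(answer, 2))  # [3]
    print(unused_citations(answer, 2))    # [2]
```

Against the real output this would be called as `unresolved_markers(response.answer, len(response.citations))`. A non-empty result is a signal to re-prompt or to add an explicit number field to `Citation`.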

## Print results 

In [11]:
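print(response.answer)
print()
# Number citations from 1 so they line up with the [1], [2] markers in the answer.
# enumerate avoids reusing the name `index`, which still holds the Opper index.
for i, citation in enumerate(response.citations, start=1):
    print(f"[{i}]", f'"{citation.citation}" from {citation.file_name} page {citation.page_number}')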


Strengths:
- Reddit's massive corpus of conversational data is foundational to current AI technology and many LLMs, making it valuable for model training [1].
- Reddit is investing in AI to enhance the user experience, making it more personalized and safer, and to improve search capabilities, which is expected to increase user engagement and retention [2].
- AI is expected to improve Reddit's ability to localize content and moderate content as they expand internationally [2].

Weaknesses:
- New AI applications require additional investment, increasing costs and complexity, which may impact gross margin [3].
- Market acceptance of AI technologies is uncertain; Reddit may be unsuccessful in its product development efforts [3].
- Reddit may face competition from LLMs; users might choose to use AI models instead of visiting Reddit directly [4].

Opportunities:
- Emerging opportunity in data licensing given the value of Reddit's data in sentiment analysis and trend identification [1].
- Red