In [30]:
import pandas as pd
from openai import OpenAI
from ast import literal_eval
from embeddings import CHUNK_SIZE
from retrieval import query_embeddings


client = OpenAI()

In [31]:
df = pd.read_csv("data/20e01e08-12bd-4258-ab45-5cf9244b727f.csv")
df["embedding"] = df["embedding"].apply(literal_eval)
texts, embeddings = df["text"].tolist(), df["embedding"].tolist()
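
The `embedding` column is stored in the CSV as a stringified list, which is why the cell above runs `literal_eval` over it. A minimal sketch of that round trip follows, using only the standard library and made-up data; `parse_embedding` is a hypothetical helper, not part of the notebook's modules.

```python
import csv
import io
from ast import literal_eval


def parse_embedding(cell: str) -> list[float]:
    """Turn a CSV cell such as "[0.1, -0.2]" back into a list of floats."""
    value = literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return [float(x) for x in value]


# Write a tiny in-memory CSV shaped like the file loaded above (toy values).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "embedding"])
writer.writerow(["hello world", str([0.25, -0.5, 1.0])])

# Read it back: the embedding arrives as a string and must be parsed.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
parsed = parse_embedding(rows[0]["embedding"])
print(parsed)
```

`literal_eval` only accepts Python literals, so it is a safer choice than `eval` for cells that come from a file.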

In [32]:
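def query_results(query: str, debug: bool = True) -> list[str]:
    """Return the text of the 5 chunks most similar to `query`."""
    # Each result is a (chunk index, similarity score) pair.
    results = query_embeddings(query, embeddings, 5)
    results_text = [texts[i] for i, _ in results]
    if debug:
        for rank, (text, (_, similarity)) in enumerate(zip(results_text, results), start=1):
            print(f"Result {rank} (Similarity: {similarity}):")
            print(text)
            print("-" * 100)
    return results_text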


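def ask(query: str, results_text: list[str], debug: bool = True):
    """Answer `query` with gpt-4o, using only the retrieved chunks as context."""
    # Separate chunks with a visible delimiter so the model can tell them apart.
    context = "\n\n###\n\n".join(results_text)
    system_message = (
        "Answer the question based on the context below, and if the question "
        "can't be answered based on the context, say \"I don't know\"\n\n"
    )
    user_message = f"Context: {context}\n\n---\n\nQuestion: {query}\nAnswer:"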

    if debug:
        print("System message:")
        print(system_message)
        print("User message:")
        print(user_message)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": system_message,
            },
            {
                "role": "user",
                "content": user_message,
            },
        ],
    )

    return response.choices[0].message.content
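
`query_embeddings` comes from a local `retrieval` module that is not shown here. From the way `query_results` uses it, it appears to return `(index, similarity)` pairs for the top matches. The sketch below shows one way such a ranking could work, assuming cosine similarity and an already-embedded query vector. Both are assumptions, and `top_k_by_cosine` is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_vector: list[float], embeddings: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Return the k (index, similarity) pairs with the highest similarity."""
    scored = [
        (i, cosine_similarity(query_vector, embedding))
        for i, embedding in enumerate(embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors: the first is identical to the query, the third is at 45 degrees.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k_by_cosine([1.0, 0.0], toy_embeddings, k=2)
print(ranked)
```

The real helper also has to embed the query string first, which requires an API call and is left out of this sketch.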

In [33]:
query = "What is Stargate?"
results_text = query_results(query)

Result 1 (Similarity: 0.9342499421203767):
While some consumers and professionals have embraced ChatGPT and other conversational AI as well as AI-generated video, turning these recent breakthroughs into technology that produces significant revenue could take longer than practitioners in the field anticipated. Firms including Amazon and Google have quietly tempered expectations for sales, in part because such AI is costly and requires a lot of work to launch inside large enterprises or to power new features in apps used by millions of people. Altman said at an Intel event last month that AI models get “predictably better” when researchers throw more computing power at them. OpenAI has published research on this topic, which it refers to as the “scaling laws” of conversational AI. OpenAI “throwing ever more compute [power to scale up existing AI] risks leading to a ‘trough of disillusionment’” among customers as they realize the limits of the technology, said Ali Ghodsi, CEO of Databrick

In [34]:
ask(query, results_text)

System message:
Answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know"


User message:
Context: While some consumers and professionals have embraced ChatGPT and other conversational AI as well as AI-generated video, turning these recent breakthroughs into technology that produces significant revenue could take longer than practitioners in the field anticipated. Firms including Amazon and Google have quietly tempered expectations for sales, in part because such AI is costly and requires a lot of work to launch inside large enterprises or to power new features in apps used by millions of people. Altman said at an Intel event last month that AI models get “predictably better” when researchers throw more computing power at them. OpenAI has published research on this topic, which it refers to as the “scaling laws” of conversational AI. OpenAI “throwing ever more compute [power to scale up existing AI] risks leading to a

"I don't know. The context provided does not mention anything about Stargate."

In [35]:
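# Draft a hypothetical answer to the question; the next cell retrieves with
# this paragraph instead of the raw query.
response = client.chat.completions.create(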
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": f"""Given a question, generate a paragraph of text that answers the question.
Question: {query}
Answer:
         """,
        },
    ],
    max_tokens=CHUNK_SIZE,  # cap the draft at the chunk size imported from embeddings
)

generated_text = response.choices[0].message.content
generated_text

'Stargate is a popular science fiction franchise that originated with the 1994 film of the same name, directed by Roland Emmerich and starring Kurt Russell and James Spader. The premise centers around a mysterious ancient device called the Stargate, which allows for instant travel to other planets across the galaxy. Following the film’s success, the franchise expanded into several television series, most notably "Stargate SG-1," which debuted in 1997 and ran for ten seasons, as well as spin-offs like "Stargate Atlantis" and "Stargate Universe." The shows combine elements of exploration, adventure, and mythology, as a team of military personnel and scientists use the Stargate to explore new worlds, encounter alien civilizations, and defend Earth from threats. The franchise has garnered a dedicated fan base and remains influential in the realm of science fiction.'

In [36]:
generated_results_text = query_results(generated_text)

Result 1 (Similarity: 0.9120038019165091):
With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.
----------------------------------------------------------------------------------------------------
Result 2 (Similarity: 0.9069401840199753):
An OpenAI spokesperson did not have a comment for this article. Altman has said privately that Google, one of OpenAI’s biggest rivals, will have more computing capacity than OpenAI in the near term, and publicly he has complained about not having as many AI server chips as he’d like. That’s o

In [37]:
# Ask the original question; the generated paragraph is only used for retrieval.
ask(query, generated_results_text)

System message:
Answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know"


User message:
Context: With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.

###

An OpenAI spokesperson did not have a comment for this article. Altman has said privately that Google, one of OpenAI’s biggest rivals, will have more computing capacity than OpenAI in the near term, and publicly he has complained about not having as many AI server chips as he’d like. That’s one reason he 

"I don't know."
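
The cells above follow a three-step pattern: draft a hypothetical answer, retrieve chunks similar to that draft, then answer the original question from the retrieved chunks. A minimal sketch of that flow as one function follows. The model call, retriever, and answerer are passed in as plain callables, so the example runs without network access. `hyde_answer` and the stub functions are hypothetical names, not part of the notebook's modules.

```python
from typing import Callable


def hyde_answer(
    query: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
) -> str:
    """Retrieve with a generated draft, then answer the original query."""
    hypothetical = generate(query)      # step 1: draft a plausible answer
    passages = retrieve(hypothetical)   # step 2: search with the draft, not the query
    return answer(query, passages)      # step 3: answer the original question


# Stubs standing in for the chat model and the embedding search.
seen: dict[str, str] = {}


def fake_generate(question: str) -> str:
    return f"A hypothetical paragraph answering: {question}"


def fake_retrieve(text: str) -> list[str]:
    seen["retrieval_input"] = text
    return [f"chunk retrieved for: {text}"]


def fake_answer(question: str, passages: list[str]) -> str:
    seen["question"] = question
    return f"{question} | {len(passages)} passage(s)"


result = hyde_answer("What is Stargate?", fake_generate, fake_retrieve, fake_answer)
print(result)
```

In the notebook's terms, `generate` would wrap the `gpt-4o-mini` call, `retrieve` would be `query_results`, and `answer` would be `ask`.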