In [50]:
from ast import literal_eval

import pandas as pd
from openai import OpenAI

from embeddings import CHUNK_SIZE
from retrieval import query_embeddings


# With no arguments, the client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

In [51]:
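# The CSV stores each embedding as the string form of a Python list,
# so parse it back into a list of floats before use.
df = pd.read_csv("data/20e01e08-12bd-4258-ab45-5cf9244b727f.csv")
df["embedding"] = df["embedding"].apply(literal_eval)
texts, embeddings = df["text"].tolist(), df["embedding"].tolist()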
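
The `literal_eval` step is needed because a CSV can only hold each embedding as text. A minimal sketch of that round trip, using only the standard library and made-up two-row data (the column names match the notebook; the values are illustrative, not taken from the real file):

```python
import csv
import io
from ast import literal_eval

# Illustrative CSV contents; real embeddings have far more dimensions.
raw_csv = """text,embedding
first chunk,"[0.1, 0.2, 0.3]"
second chunk,"[0.4, 0.5, 0.6]"
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Straight out of the CSV, each embedding is still a string.
assert isinstance(rows[0]["embedding"], str)

# literal_eval parses the string as a Python literal without executing code.
parsed = [literal_eval(row["embedding"]) for row in rows]
sample_texts = [row["text"] for row in rows]
```

After parsing, `parsed` holds lists of floats that can be passed to a similarity function.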

In [52]:
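def query_results(query: str, debug: bool = True) -> list[str]:
    """Return the text of up to 5 chunks that query_embeddings ranks closest to `query`."""
    # Each result is used as an (index, similarity) pair; the index points into `texts`.
    results = query_embeddings(query, embeddings, 5)
    results_text = [texts[i] for i, _ in results]
    if debug:
        for rank, (index, similarity) in enumerate(results, start=1):
            print(f"Result {rank} (Similarity: {similarity}):")
            print(texts[index])
            print("-" * 100)
    return results_text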


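def ask(query: str, results_text: list[str], debug: bool = True):
    """Ask gpt-4o to answer `query`, instructing it to rely on the retrieved chunks."""
    # Join chunks with a visible delimiter so the model can tell them apart.
    context = "\n\n###\n\n".join(results_text)
    system_message = "Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\n"
    user_message = f"Context: {context}\n\n---\n\nQuestion: {query}\nAnswer:"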

    if debug:
        print("System message:")
        print(system_message)
        print("User message:")
        print(user_message)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": system_message,
            },
            {
                "role": "user",
                "content": user_message,
            },
        ],
    )

    return response.choices[0].message.content
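
The `retrieval` module is not shown here, so the following is only a sketch of the ranking step that `query_embeddings` appears to perform, judging by how its results are used above (pairs of chunk index and similarity). It assumes cosine similarity and non-zero vectors, and it takes an already-embedded query vector rather than a query string. The names `cosine_similarity` and `top_k_by_cosine` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_vector: list[float], vectors: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Return the k (index, similarity) pairs with the highest similarity."""
    scored = [(i, cosine_similarity(query_vector, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k_by_cosine([1.0, 0.0], toy_embeddings, 2)
```

Here `ranked` lists index 0 first (identical direction to the query) and index 2 second, in the same pair shape that `query_results` unpacks.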

In [53]:
query = "What is Microsoft working on?"
results_text = query_results(query)

Result 1 (Similarity: 0.7596776731928896):
With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.
----------------------------------------------------------------------------------------------------
Result 2 (Similarity: 0.7341175261900564):
While some consumers and professionals have embraced ChatGPT and other conversational AI as well as AI-generated video, turning these recent breakthroughs into technology that produces significant revenue could take longer than practitioners in the field anticipated. Firms including Amazon an

In [54]:
ask(query, results_text)

System message:
Answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know"


User message:
Context: With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.

###

While some consumers and professionals have embraced ChatGPT and other conversational AI as well as AI-generated video, turning these recent breakthroughs into technology that produces significant revenue could take longer than practitioners in the field anticipated. Firms including Amazon and Google have

"Microsoft is working on supporting the development of alternative AI server chips to reduce reliance on Nvidia's GPUs. This involves discussing data center projects with OpenAI and potentially switching from InfiniBand to more generic Ethernet cables."

In [55]:
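# Generate a hypothetical answer to the question (the idea behind HyDE,
# Hypothetical Document Embeddings); it is used as the retrieval query in the next cell.
# max_tokens caps the answer at CHUNK_SIZE tokens, presumably so that it is
# about as long as an indexed chunk.
response = client.chat.completions.create(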
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": f"""Given a question, generate a paragraph of text that answers the question.
Question: {query}
Answer:
         """,
        },
    ],
    max_tokens=CHUNK_SIZE,
)

generated_text = response.choices[0].message.content
generated_text

'Microsoft is actively working on a variety of innovative projects and technologies to enhance productivity and user experience across its platforms. One of their significant focuses is on artificial intelligence, including advancements in AI tools integrated into Microsoft 365 applications, which aim to improve efficiency and creativity for users. Additionally, Microsoft is investing in cloud computing through its Azure platform, expanding services to support businesses in their digital transformation journeys. The company is also dedicated to augmenting its gaming division, particularly with services like Xbox Game Pass and exploring the metaverse through initiatives like Mesh for Microsoft Teams. Furthermore, Microsoft is committed to sustainability and accessibility, working towards making technology inclusive and environmentally friendly. These efforts reflect Microsoft’s vision to empower every person and organization on the planet to achieve more.'
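
The generation step above and the `query_results` call on its output can be read as one retrieval strategy: search with a generated answer instead of the raw question. A minimal sketch of that pattern follows, with the model call and the retriever passed in as functions so it runs without an API key. The function names and the stand-in corpus are hypothetical; in this notebook `generate` would wrap `client.chat.completions.create` and `retrieve` would be `query_results`.

```python
from typing import Callable


def retrieve_via_hypothetical_answer(
    query: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Retrieve chunks using a generated answer, not the question, as the search text."""
    prompt = (
        "Given a question, generate a paragraph of text that answers the question.\n"
        f"Question: {query}\nAnswer:"
    )
    hypothetical_answer = generate(prompt)
    return retrieve(hypothetical_answer)


# Stand-ins for the model and the vector search (illustrative only).
def fake_generate(prompt: str) -> str:
    return "Microsoft is investing in AI and cloud computing."


def fake_retrieve(text: str) -> list[str]:
    corpus = ["cloud computing chunk", "gaming chunk"]
    # Keep a chunk if its first word appears in the search text.
    return [chunk for chunk in corpus if chunk.split()[0] in text]


hits = retrieve_via_hypothetical_answer(
    "What is Microsoft working on?", fake_generate, fake_retrieve
)
```

The generated answer may contain claims that are not in the corpus; it only steers the search, and the final answer still comes from `ask` over the retrieved chunks.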

In [56]:
generated_results_text = query_results(generated_text)

Result 1 (Similarity: 0.6715918790361837):
With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.
----------------------------------------------------------------------------------------------------
Result 2 (Similarity: 0.6693152458536474):
“We should really focus on making this technology useful for humans and enterprises. That takes time. I believe it’ll be amazing, but [it] doesn’t happen overnight.” The stakes are high for OpenAI to prove that its next major conversational AI, known as a large language model, is significantl

In [57]:
ask(query, generated_results_text)

System message:
Answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know"


User message:
Context: With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.

###

“We should really focus on making this technology useful for humans and enterprises. That takes time. I believe it’ll be amazing, but [it] doesn’t happen overnight.” The stakes are high for OpenAI to prove that its next major conversational AI, known as a large language model, is significantly better than

'Microsoft is involved in discussions about launching Stargate, a project aimed at developing supercomputers. The project is expected to start as soon as 2028 and expand through 2030, potentially requiring up to 5 gigawatts of power by the end.'
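
The two runs share their first result but differ on the second, which is one reason the answers differ. A small helper for seeing which chunks each run retrieved might look like the sketch below. `compare_retrievals` is a hypothetical name, and the toy lists stand in for `results_text` and `generated_results_text`.

```python
def compare_retrievals(first: list[str], second: list[str]) -> dict[str, list[str]]:
    """Split two result lists into shared chunks and chunks unique to each run."""
    first_set, second_set = set(first), set(second)
    return {
        "shared": [chunk for chunk in first if chunk in second_set],
        "only_first": [chunk for chunk in first if chunk not in second_set],
        "only_second": [chunk for chunk in second if chunk not in first_set],
    }


# Toy data; in the notebook these would be the two retrieved lists.
direct = ["chunk A", "chunk B", "chunk C"]
via_generated = ["chunk A", "chunk D", "chunk C"]
diff = compare_retrievals(direct, via_generated)
```

Calling `compare_retrievals(results_text, generated_results_text)` would show how much the generated query changed the context passed to `ask`.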