In [58]:
import pandas as pd
from openai import OpenAI
from ast import literal_eval
from embeddings import CHUNK_SIZE
from retrieval import query_embeddings


client = OpenAI()

In [59]:
df = pd.read_csv("data/20e01e08-12bd-4258-ab45-5cf9244b727f.csv")
df["embedding"] = df["embedding"].apply(literal_eval)
texts, embeddings = df["text"].tolist(), df["embedding"].tolist()
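
The `query_embeddings` helper imported from `retrieval` is not shown in this notebook. Judging by how its results are used below (pairs of index and similarity), it probably ranks the stored vectors against the embedded query. A minimal sketch of that ranking step, assuming cosine similarity and an already-embedded query vector; the names `cosine_similarity` and `top_k_by_cosine` are hypothetical and not part of the original module:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_vector: list[float], vectors: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Return the k (index, similarity) pairs with the highest similarity."""
    scored = [(i, cosine_similarity(query_vector, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The real helper presumably also calls an embedding model to turn the query string into a vector before ranking; that step is left out here so the sketch runs without network access.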

In [60]:
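def query_results(query: str, debug: bool = False) -> list[str]:
    """Return the text of the 5 chunks most similar to `query`."""
    # Each result is an (index, similarity) pair; the index points into `texts`.
    results = query_embeddings(query, embeddings, 5)
    results_text = [texts[i] for i, _ in results]
    if debug:
        for rank, ((_, similarity), text) in enumerate(zip(results, results_text), start=1):
            print(f"Result {rank} (Similarity: {similarity}):")
            print(text)
            print("-" * 100)
    return results_text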


def ask(query: str, results_text: list[str], debug: bool = False):
    """Answer `query` with gpt-4o, using the retrieved passages as context."""
    context = "\n\n###\n\n".join(results_text)
    system_message = "Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\n"
    user_message = f"Context: {context}\n\n---\n\nQuestion: {query}\nAnswer:"

    if debug:
        print("System message:")
        print(system_message)
        print("User message:")
        print(user_message)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": system_message,
            },
            {
                "role": "user",
                "content": user_message,
            },
        ],
    )

    return response.choices[0].message.content

In [61]:
query = "What is Microsoft working on?"
results_text = query_results(query, debug=True)

Result 1 (Similarity: 0.7596963656376869):
With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.
----------------------------------------------------------------------------------------------------
Result 2 (Similarity: 0.7341242006069101):
While some consumers and professionals have embraced ChatGPT and other conversational AI as well as AI-generated video, turning these recent breakthroughs into technology that produces significant revenue could take longer than practitioners in the field anticipated. Firms including Amazon an

In [62]:
ask(query, results_text)

"Microsoft is working on a data center project with OpenAI. Additionally, Microsoft has other potential reasons to support an alternative chip pitched by OpenAI's Altman to rival Nvidia’s GPU, which could help control costs and reduce their reliance on Nvidia."

In [63]:
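# Draft a hypothetical answer to use as the retrieval query in place of the raw question.
# max_tokens=CHUNK_SIZE caps the draft at CHUNK_SIZE tokens.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": (
                "Given a question, generate a paragraph of text that answers the question.\n"
                f"Question: {query}\n"
                "Answer:"
            ),
        },
    ],
    max_tokens=CHUNK_SIZE,
)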

generated_text = response.choices[0].message.content
generated_text

"Microsoft is currently focusing on several key areas, including the advancement of artificial intelligence, cloud computing, and enhancing productivity tools. The company is heavily investing in AI technologies, particularly through its Azure cloud platform, which supports various AI applications and services. Additionally, Microsoft is continuously improving its Microsoft 365 suite, incorporating features like intelligent suggestions and collaboration tools to facilitate remote work and enhance user efficiency. Furthermore, Microsoft is exploring innovations in gaming with its Xbox division, emphasizing cloud gaming and subscription services. The company's commitment to sustainability and security also remains a priority, as it seeks to reduce its carbon footprint and enhance data protection for users. Overall, Microsoft is at the forefront of technological advancements aimed at transforming how individuals and organizations operate in a rapidly evolving digital landscape."

In [64]:
generated_results_text = query_results(generated_text, debug=True)

Result 1 (Similarity: 0.6895649117362705):
With more servers available, some OpenAI leaders believe the company can use its existing AI and recent technical breakthroughs such as Q*—a model that can reason about math problems it hasn’t previously been trained to solve—to create the right synthetic (non–human-generated) data for training better models after running out of human-generated data to give them. These models may also be able to figure out the flaws in existing models like GPT-4 and suggest technical improvements—in other words, self-improving AI.
----------------------------------------------------------------------------------------------------
Result 2 (Similarity: 0.6836703203605505):
“We should really focus on making this technology useful for humans and enterprises. That takes time. I believe it’ll be amazing, but [it] doesn’t happen overnight.” The stakes are high for OpenAI to prove that its next major conversational AI, known as a large language model, is significantl
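
Comparing the two debug printouts by eye is tedious, so a small helper can report how the retrieved passages differ between the raw query and the generated text. This is a sketch only; `compare_retrievals` is a hypothetical name that does not appear elsewhere in the notebook:

```python
def compare_retrievals(
    baseline: list[str], candidate: list[str]
) -> dict[str, list[str]]:
    """Split two retrieval result lists into shared and unique passages."""
    baseline_set = set(baseline)
    candidate_set = set(candidate)
    return {
        # Order follows the list each passage came from.
        "shared": [t for t in baseline if t in candidate_set],
        "only_baseline": [t for t in baseline if t not in candidate_set],
        "only_candidate": [t for t in candidate if t not in baseline_set],
    }
```

In this notebook it could be called as `compare_retrievals(results_text, generated_results_text)`.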

In [65]:
ask(query, generated_results_text)

'The context suggests that Microsoft is involved in discussions and plans surrounding the development of supercomputers, specifically through a project called Stargate. This project involves building a large-scale supercomputer to support advanced AI capabilities, with the goal of launching as soon as 2028 and expanding through 2030. The supercomputer could potentially need as much as 5 gigawatts of power by the end of the project.'
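
The last three cells follow a pattern often called HyDE (hypothetical document embeddings): generate a draft answer, retrieve with the draft instead of the question, then answer the original question from the retrieved passages. A minimal sketch of that flow as one function, assuming the three steps are passed in as callables; `hyde_answer` and its parameter names are hypothetical:

```python
from typing import Callable


def hyde_answer(
    query: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
) -> str:
    """Answer `query` using passages retrieved with a generated draft answer."""
    # Step 1: draft a hypothetical answer to the question.
    hypothetical = generate(query)
    # Step 2: retrieve passages similar to the draft, not to the question.
    passages = retrieve(hypothetical)
    # Step 3: answer the original question from the retrieved passages.
    return answer(query, passages)
```

With the functions defined above, `retrieve=query_results` and `answer=ask` would fit these signatures, and `generate` would wrap the `gpt-4o-mini` call. Passing the steps in as arguments also lets the flow be exercised with stub functions, without calling the API.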