# Introduction

This notebook shows how to summarize a PDF document using the OpenAI Assistants API.

# Load OpenAI API key

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

if api_key:
    print(f"API key loaded: {api_key[:5]}... (truncated)")
else:
    print("API key not loaded")

API key loaded: sk-pr... (truncated)
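
The cell above only prints a message when the key is missing, so later cells would still run and fail at the first API call. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name, not part of `python-dotenv` or the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

After `load_dotenv()`, calling `require_env("OPENAI_API_KEY")` would stop the notebook early instead of continuing without a key.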


# Create OpenAI client

In [2]:
from openai import OpenAI

try:
    openai_client = OpenAI()
    print("OpenAI client created successfully.")
except Exception as e:
    print(f"Failed to create OpenAI client: {e}")

OpenAI client created successfully.


# Upload a PDF document for the OpenAI Assistants API

In [3]:
# Open the PDF in a context manager so the file handle is closed after the upload
with open("Tech_Screening_Cloud_Architect.pdf", "rb") as pdf:
    file = openai_client.files.create(file=pdf, purpose="assistants")
print("Uploaded file ID:", file.id)

Uploaded file ID: file-HzTxHJ3uV9A4MaqcELqxbw


In [4]:
# Check the file you have uploaded
files = openai_client.files.list()
for f in files.data:
    print(f"{f.id} | {f.filename} | {f.purpose} | {f.status}")

file-HzTxHJ3uV9A4MaqcELqxbw | Tech_Screening_Cloud_Architect.pdf | assistants | processed


# Create an assistant

In [5]:
assistant = openai_client.beta.assistants.create(
    name="Document Analyzer",
    instructions="You help summarize and analyze uploaded documents.",
    tools=[{"type": "file_search"}],
    model="gpt-4o"
)
print("Assistant ID:", assistant.id)

Assistant ID: asst_RhoI95pCTUQzSzMXoVWSAzDN


# Create a thread

In [6]:
thread = openai_client.beta.threads.create()
print("Thread ID:", thread.id)

Thread ID: thread_eJgqh0j5TwoIq7yu4PLefUWR


# Send a message with attached file

In [7]:
message = openai_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Can you summarize this document and write the summary in markdown format?",
    attachments=[
        {
            "file_id": file.id,
            "tools": [{"type": "file_search"}]
        }
    ]
)
print("Message ID:", message.id)

Message ID: msg_ajky531qoMBaChK10trXQJXF
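
The `attachments` argument is a list, so the same message shape extends to several uploaded files. A small sketch of building that list follows; `build_attachments` is a hypothetical helper, and it assumes every file should be searched with the `file_search` tool as in the cell above.

```python
def build_attachments(file_ids):
    """Return one file_search attachment entry per uploaded file ID."""
    return [
        {"file_id": file_id, "tools": [{"type": "file_search"}]}
        for file_id in file_ids
    ]
```

With a single upload, `attachments=build_attachments([file.id])` produces the same structure as the literal list used above.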


# Start a run on the thread

In [8]:
run = openai_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)


# Poll for result

In [9]:
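import time

# Stop on every status that will not change by waiting, not only on success or failure
done_states = {"completed", "failed", "cancelled", "expired", "incomplete", "requires_action"}
while True:
    run_status = openai_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run_status.status in done_states:
        break
    time.sleep(1)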
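
A polling loop without a deadline can hang the notebook if the run never settles. The sketch below adds a timeout; `wait_for_status`, `STOP_STATES`, and the default timeout are illustrative choices rather than SDK features, and the status source is passed in as a callable so the helper does not depend on a live client.

```python
import time
from typing import Callable

# Run statuses after which further polling will not help
STOP_STATES = {"completed", "failed", "cancelled", "expired", "incomplete", "requires_action"}


def wait_for_status(fetch_status: Callable[[], str],
                    timeout_s: float = 120.0,
                    interval_s: float = 1.0) -> str:
    """Poll fetch_status() until it returns a stop state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in STOP_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_s} seconds")
        time.sleep(interval_s)
```

In this notebook it could be called as `wait_for_status(lambda: openai_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status)`.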


# Read the assistant’s response

In [10]:
from IPython.display import Markdown, display

def display_markdown(text):
    display(Markdown(text))

In [11]:
messages = openai_client.beta.threads.messages.list(thread_id=thread.id)
# The list is returned newest first, so reverse it to show the conversation in order
for msg in reversed(messages.data):
    display_markdown(msg.content[0].text.value)


Can you summarize this document and write the summary in markdown format?

# Summary of Tech Screening Document

## Introduction
The document presents a technical challenge for a hypothetical Cloud Architect role. It centers on enhancing "Mixtapes," a music-sharing platform initiated by Rocky Roads (RR), a record company. The goal is to make the platform scalable, faster, and future-proof, allowing musicians worldwide to share their talents easily.

## Background
Gert-Jan, founder of RR, conceptualized Mixtapes to revolutionize music distribution and discovery. Initially launched as a local platform in the Benelux, it is now poised for international growth, especially in the American and potentially Asian markets. Mixtapes serves as a community for artists to showcase their work for free, while record labels and other entities pay to scout talent【4:1†Tech_Screening_Cloud_Architect.pdf】.

## Current Challenges
The platform currently operates with traditional web hosting in Amsterdam, which suffices for regional users but does not deliver optimal performance globally. Gert-Jan is open to changes that improve cost-efficiency and scalability without altering the user experience on the frontend【4:4†Tech_Screening_Cloud_Architect.pdf】.

## Technical Considerations
Mixtapes utilizes a backend comprising SQL, a Redis-cluster, and PHP, with a RESTful API interfacing the frontend. The document highlights the need for a cloud solution that balances performance with costs, given the potential for significant international traffic【4:3†Tech_Screening_Cloud_Architect.pdf】.

## Conclusion
The document tasks the Cloud Architect with proposing a strategic plan for upgrading Mixtapes to handle its growing international presence while ensuring the platform remains cost-effective and user-friendly【4:5†Tech_Screening_Cloud_Architect.pdf】. It invites creative freedom in the approach, encouraging innovative solutions to enhance the platform's technological infrastructure.
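
The output above echoes the user prompt and keeps file-search citation markers such as `【4:1†Tech_Screening_Cloud_Architect.pdf】` in the text. If only the clean summary is wanted, a sketch like the following could be used; `assistant_text` and `CITATION_MARKER` are hypothetical names, and the regular expression assumes the markers always use the `【…】` bracket pair seen in this output.

```python
import re

# Matches file-search citation markers such as 【4:1†report.pdf】
CITATION_MARKER = re.compile(r"【[^】]*】")


def assistant_text(messages, strip_citations=True):
    """Join the text parts of assistant messages, oldest first."""
    parts = []
    for msg in reversed(list(messages)):  # the API returns newest first
        if msg.role != "assistant":
            continue  # skip the echoed user prompt
        for block in msg.content:
            if getattr(block, "type", None) == "text":  # ignore non-text blocks
                parts.append(block.text.value)
    text = "\n\n".join(parts)
    return CITATION_MARKER.sub("", text) if strip_citations else text
```

Calling `display_markdown(assistant_text(messages.data))` would then render the summary without the prompt or the markers.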

# Delete attached files

In [12]:
# Caution: this deletes every file returned by files.list(), not only the PDF uploaded above
files = openai_client.files.list()
for f in files.data:
    openai_client.files.delete(file_id=f.id)

In [13]:
# Check if all files have been deleted

files = openai_client.files.list()
if not files.data:
    print("All files have been deleted successfully.")
else:
    print("Some files were not deleted:")
    for f in files.data:
        print(f"{f.id} | {f.filename} | {f.purpose} | {f.status}")

All files have been deleted successfully.
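
Deleting everything that `files.list()` returns also removes files unrelated to this notebook. A narrower sketch that deletes only known IDs follows; `delete_files` is a hypothetical helper, and it assumes the delete response exposes a boolean `deleted` attribute.

```python
def delete_files(client, file_ids):
    """Delete only the given file IDs and return the IDs confirmed as deleted."""
    deleted = []
    for file_id in file_ids:
        result = client.files.delete(file_id)
        if getattr(result, "deleted", False):
            deleted.append(file_id)
    return deleted
```

Here the call would be `delete_files(openai_client, [file.id])`, which leaves any other stored files untouched.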
