# GPT Chat Completion Lab

Welcome! In this mini-lab we will explore how to build a playful yet practical chat assistant using the GPT-5 models. The goal is to make the workflow clear enough for beginners while giving you a template you can adapt for your own use cases.

Objectives:
- Build a basic GPT-powered chat assistant  
- Adjust assistant behavior using system prompts  
- Build a simple Gradio UI

## Game Plan
- **Context:** We are using Google Colab, so everything happens in the cloud.
- **Model:** `gpt-5-nano` keeps responses smart while staying cost-efficient.
- **Secret management:** We read the API key from the Colab secret named `OpenAI_API_Key`.
- **Flow:** install the SDK → load the key securely → define a helper function → experiment with prompts.
- **Stretch idea:** tweak the conversation style and system prompt with your own ideas.


In [1]:
from google.colab import userdata
import os
from openai import OpenAI
import gradio as gr
from IPython.display import Markdown, display

MODEL = "gpt-5-nano"  # Model used for every request in this lab

## Load Secrets (No Hard-Coding!)
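Colab lets us keep keys in its Secrets vault, which `userdata` reads from. Make sure your workspace already stores `OpenAI_API_Key`; otherwise add it once through the Secrets panel (the key icon in the left sidebar) and enable notebook access for it. Never share the value.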


In [2]:
os.environ['OPENAI_API_KEY'] = userdata.get('OpenAI_API_Key')
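
If the notebook might also run outside Colab (for example on a laptop), `google.colab` will not be importable. Below is a minimal sketch of a fallback, assuming the key is otherwise available in the `OPENAI_API_KEY` environment variable. `load_api_key` is a hypothetical helper name, not part of any library.

```python
import os


def load_api_key(secret_name="OpenAI_API_Key", env_var="OPENAI_API_KEY"):
    """Return the API key from Colab secrets, else from the environment.

    `load_api_key` is a hypothetical helper for this lab.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata

        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not on Colab, or the secret is missing / not shared with this notebook.
        pass

    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: add the Colab secret {secret_name!r} "
            f"or set the {env_var} environment variable."
        )
    return key
```

Calling `os.environ['OPENAI_API_KEY'] = load_api_key()` would then work in both settings. The error message names the secret but never prints the key itself.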

## Wrap the GPT Client
We use the official `openai` package. The cell below:
1. Initializes a single `OpenAI` client, which reads the key from `OPENAI_API_KEY`.
2. Sends one prompt through the Responses API.
3. Displays the raw response object, which holds the model reply plus token usage so we can discuss cost control.


In [3]:
client = OpenAI()

response = client.responses.create(
    model=MODEL,
    input="Write a one-sentence bedtime story about a unicorn."
)

response

Response(id='resp_036e9c1f14e632c100691c9280ae988191af423652c01b050c', created_at=1763480192.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_036e9c1f14e632c100691c92822fa48191aebd6028584ed575', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_036e9c1f14e632c100691c92854ff48191b2759534d8c22db9', content=[ResponseOutputText(annotations=[], text='Under a silver moon, a gentle unicorn wandered through a lullaby-soft meadow, listening to the crickets whisper goodnight as the stars tucked themselves into the sky.', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_k

In [4]:
response.usage.output_tokens

423
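
A count of 423 output tokens for a one-sentence story may look high. For reasoning models such as the GPT-5 family, the output count also includes the model's hidden reasoning tokens. Below is a minimal sketch for splitting the two, assuming the usage object exposes `output_tokens_details.reasoning_tokens` (worth confirming against your installed SDK version). `summarize_usage` is a hypothetical helper name.

```python
def summarize_usage(usage):
    """Split output tokens into hidden reasoning vs. visible reply tokens.

    Assumes `usage` looks like `response.usage` from the Responses API.
    """
    details = getattr(usage, "output_tokens_details", None)
    # Fall back to 0 if the details field is absent.
    reasoning = getattr(details, "reasoning_tokens", 0) or 0
    return {
        "input": usage.input_tokens,
        "reasoning": reasoning,
        "visible_output": usage.output_tokens - reasoning,
        "total": usage.total_tokens,
    }
```

Running `summarize_usage(response.usage)` on the response above would show how the 423 tokens divide. The split is not reproduced here because it varies from run to run.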

Let's extract the reply part only:

In [5]:
print(response.output_text)

Under a silver moon, a gentle unicorn wandered through a lullaby-soft meadow, listening to the crickets whisper goodnight as the stars tucked themselves into the sky.


## System Instructions
Formerly known as the system (or developer) prompt. The `instructions` parameter sets high-level guidance for how the model should behave (its tone, goals, and style), while message roles give more specific, task-level directions.


<img src="https://raw.githubusercontent.com/soltaniehha/Business-Analytics-Toolbox/master/docs/images/Prof-Owl-1.png"
     width="300">


In [6]:
instructions = "You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal."
input = "why do data analysts prefer Python or SQL instead of Excel for big datasets?"

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Formerly known as system prompt
    input=input,                 # User prompt
    text={ "verbosity": "low" }  # Low: short, concise outputs; High: detailed explanations or big refactors
)

Markdown(response.output_text)

Great question. For big datasets, Python and SQL beat Excel in several practical ways:

- Capacity and performance
  - SQL databases are built to store and query huge tables efficiently with indexing and optimized engines.
  - Python (with libraries like pandas) handles complex transforms, but you often avoid loading multi-million-row datasets into memory by chunking or using tools like Dask.

- Data integrity and governance
  - SQL enforces data consistency (ACID), and databases handle concurrent access safely.
  - Excel files are easy to edit by many people at once, which can lead to inconsistent versions and data corruption.

- Reproducibility and auditing
  - SQL and Python code can be saved, versioned, and re-run exactly the same way, which is important for audits and collaboration.
  - Excel workflows are often manual, making it hard to reproduce steps exactly.

- Automation and scalability
  - SQL and Python can be scheduled, automated, and integrated into data pipelines (ETL/ELT, dashboards).
  - Excel isn’t built for automated, repeatable pipelines.

- Data access patterns and complexity
  - SQL shines at filtering, joining many tables, aggregating data, and pulling only what you need.
  - Python handles more complex logic, modeling, and machine learning once you’ve got a clean dataset.

- Ecosystem and collaboration
  - SQL/Python tools fit well with version control, testing, and collaborative workflows.
  - Excel is more ad-hoc and harder to share reliably at scale.

A common pattern: use SQL to pull and summarize data from a database, then use Python for deeper analysis or modeling, and store results back as needed. Excel remains handy for quick checks or small, self-contained analyses.

If you want, tell me your data size and task, and I can suggest a concrete workflow.
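
The pieces used so far (a client, `instructions`, an `input`, and `usage`) can be bundled into one helper that accepts a system message and a list of user turns and returns the reply plus token usage. Below is a minimal sketch under those assumptions. `ask` and its parameters are hypothetical names, and the client is passed in rather than created inside the function.

```python
def ask(client, instructions, user_turns, model="gpt-5-nano", verbosity="low"):
    """Send user turns with system-level instructions; return (reply, usage).

    `ask` is a hypothetical helper; `client` is expected to be an `OpenAI()` instance.
    """
    # Each user turn becomes one message in the input list.
    messages = [{"role": "user", "content": turn} for turn in user_turns]

    response = client.responses.create(
        model=model,
        instructions=instructions,
        input=messages,
        text={"verbosity": verbosity},
    )

    # Keep only the counts needed for a cost discussion.
    usage = {
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "total_tokens": response.usage.total_tokens,
    }
    return response.output_text, usage
```

For example, `reply, usage = ask(client, instructions, [input])` would repeat the Professor Owl call above while also reporting its token counts.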

## Chat History

In [7]:
# Keep history
history = [{"role": "developer", "content": instructions}]

def chat(message):
    history.append({"role": "user", "content": message})  # Add the new user message to history

    # Send entire history to the model
    response = client.responses.create(
        model=MODEL,
        input=history,
        text={ "verbosity": "low" }
    )

    # Add model response to history
    history.append({"role": "assistant", "content": response.output_text})

    return response.output_text

In [8]:
Markdown(chat(input))

Great question! Here’s the short version.

- Excel has limits: it can’t reliably handle very large data sizes (row/column limits, memory limits) and becomes slow or crashes with big datasets.
- Excel is manual and error-prone: many steps, copy-paste, and formulas. Not easy to reproduce or audit.
- SQL is built for big data: runs on a database, uses indexes, and is optimized for fast joins, filters, and aggregations on large tables.
- Python (with pandas) is for flexible data work: powerful cleaning, transformation, and modeling; can handle data in chunks, stream data, and connect to many data sources; great for reproducible workflows and automation.
- Reproducibility and collaboration: code (Python/SQL scripts) can be versioned, shared, and rerun exactly the same way, unlike spreadsheets.
- Data governance: databases (and SQL) offer better data quality, access controls, and auditing; Excel files are harder to govern at scale.

In short: Excel is great for small, quick, on-the-fly checks. For big datasets and robust workflows, Python and SQL (often together) are preferred because they’re scalable, reproducible, and better for automation and analysis at scale. If you want, I can give a quick example of how a task would look in SQL vs Python.

In [9]:
chat("Please highlight the most important point")

'Most important point: Excel isn’t built for big data. Use SQL and Python because they scale to large datasets, support reproducible workflows, and are easier to automate and govern.'

In [10]:
history

[{'role': 'developer',
  'content': 'You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal.'},
 {'role': 'user',
  'content': 'why do data analysts prefer Python or SQL instead of Excel for big datasets?'},
 {'role': 'assistant',
  'content': 'Great question! Here’s the short version.\n\n- Excel has limits: it can’t reliably handle very large data sizes (row/column limits, memory limits) and becomes slow or crashes with big datasets.\n- Excel is manual and error-prone: many steps, copy-paste, and formulas. Not easy to reproduce or audit.\n- SQL is built for big data: runs on a database, uses indexes, and is optimized for fast joins, filters, and aggregations on large tables.\n- Python (with pandas) is for flexible data work: powerful cleaning, transformation, and modeling; can handle data in chunks, stream data, and connect to many data sources; great for reproducible workflows and automation.\n- Re
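
Because `chat` sends the entire `history` on every call, input tokens grow with each turn. One common way to cap that growth is to keep the developer message plus only the most recent messages. Below is a minimal sketch, assuming the first entry is always the developer message as in the cell above. `trim_history` and `max_messages` are hypothetical names.

```python
def trim_history(history, max_messages=6):
    """Keep the leading developer message plus the last `max_messages` entries.

    Returns a new list; the original `history` is left unchanged.
    """
    if not history:
        return []
    head, rest = history[:1], history[1:]
    if max_messages <= 0:
        # Guard: rest[-0:] would return everything instead of nothing.
        return list(head)
    return head + rest[-max_messages:]
```

Passing `input=trim_history(history)` inside `chat` would bound the request size, at the cost of the model forgetting older turns.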

## Chatbot
We use `Gradio` to build a chatbot whose workflow we control.

In [11]:
instructions = "You are Professor Owl, a wise but friendly teacher of Business Analytics. Explain concepts clearly and simply, using gentle guidance."

def respond(message, history):
    messages = [{"role": "developer", "content": instructions}]
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})


    response = client.responses.create(
        model=MODEL,
        input=messages,
        text={"verbosity": "low"}
    )
    return response.output_text

demo = gr.ChatInterface(
    respond,
    type="messages",
    title="🦉 Professor Owl – Business Analytics Helper",
    description="Ask Professor Owl anything about data analytics!"
)

demo.launch(share=True)  # Add debug=True to debug, if needed

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://e7b0f48bc5df314112.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
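
The message-building step inside `respond` can be pulled into a pure function, which makes it testable without calling the API. With `type="messages"`, Gradio passes `history` as a list of dictionaries that may carry extra keys beyond `role` and `content` (an assumption worth checking against your Gradio version), which is why only those two are copied. `build_messages` is a hypothetical helper name, and the history below is made up for illustration.

```python
def build_messages(instructions, history, message):
    """Assemble the input list: developer message, prior turns, new user message."""
    messages = [{"role": "developer", "content": instructions}]
    # Copy only the fields the API needs from each prior turn.
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})
    return messages


# Example with a made-up one-exchange history
example = build_messages(
    "Be brief.",
    [
        {"role": "user", "content": "Hi", "metadata": None},
        {"role": "assistant", "content": "Hello!", "metadata": None},
    ],
    "What is SQL?",
)
print([m["role"] for m in example])
# ['developer', 'user', 'assistant', 'user']
```

Inside `respond`, the three list-building lines could then become `messages = build_messages(instructions, history, message)`.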




## Your Turn
Plug in your own scenario: rephrase the instructions to shift the tone or guidelines.



In [None]:
# Your code goes here