# GPT Chat Completion Lab

Welcome! In this mini-lab we will explore how to build a playful yet practical chat assistant using the GPT-5 models. The goal is to make the workflow clear enough for beginners while giving you a template you can adapt to your own use cases.

Objectives:
- Build a basic GPT-powered chat assistant  
- Adjust assistant behavior using system prompts  
- Build a simple Gradio UI

## Game Plan
- **Context:** We are using Google Colab, so everything happens in the cloud.
- **Model:** `gpt-5-nano` keeps responses smart while staying cost-efficient.
- **Secret management:** We read the API key from the Colab secret named `OpenAI_API_Key`.
- **Flow:** install the SDK → load the key securely → define a helper function → experiment with prompts.
- **Stretch idea:** tweak the conversation style and system prompt with your own ideas.


In [1]:
from google.colab import userdata
import os
from openai import OpenAI
import gradio as gr
from IPython.display import Markdown, display

MODEL = "gpt-5-nano"  # Model name reused by every API call below

## Load Secrets (No Hard-Coding!)
Colab lets us keep keys in the `userdata` vault. Make sure your workspace already stores `OpenAI_API_Key`; otherwise add it once through the Secrets panel (the key icon in the left sidebar) and enable notebook access for it. Never share the value.


In [2]:
os.environ['OPENAI_API_KEY'] = userdata.get('OpenAI_API_Key')
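
If the secret is missing or notebook access is switched off, the line above fails with an error that can be confusing for beginners. A minimal sketch of a friendlier loader follows. `load_api_key` and its parameters are hypothetical names, not part of Colab or the OpenAI SDK; the lookup function is passed in, so in Colab you would hand it `userdata.get`.

```python
import os


def load_api_key(get_secret, secret_name="OpenAI_API_Key", env_var="OPENAI_API_KEY"):
    """Copy a secret into an environment variable, failing with a clear message.

    `get_secret` is any callable that maps a secret name to its value
    (in Colab, `userdata.get`). Illustrative helper, not an official API.
    """
    try:
        value = get_secret(secret_name)
    except Exception as exc:  # e.g. secret not found or notebook access denied
        raise RuntimeError(
            f"Could not read the secret '{secret_name}'. "
            "Add it in Colab's Secrets panel and enable notebook access."
        ) from exc
    if not value:
        raise RuntimeError(f"The secret '{secret_name}' is empty.")
    os.environ[env_var] = value
    return env_var  # Return the variable name, never the key itself
```

In this notebook the call would be `load_api_key(userdata.get)`. Returning only the variable name keeps the key out of the cell output.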

## Wrap the GPT Client
We use the official `openai` package. The cell below:
1. Initializes a single `OpenAI` client, which reads `OPENAI_API_KEY` from the environment.
2. Sends one prompt to the model through the Responses API.
3. Returns a response object that holds the model reply plus token usage, so we can discuss cost control.


In [3]:
client = OpenAI()

response = client.responses.create(
    model=MODEL,
    input="Write a one-sentence bedtime story about a unicorn."
)

response

Response(id='resp_028e2abd172cb2b600691c927aef3c8192add7080f133ab320', created_at=1763480186.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_028e2abd172cb2b600691c927c60708192ba247f85cead9c01', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_028e2abd172cb2b600691c928091f081928834e28cef38f235', content=[ResponseOutputText(annotations=[], text='Under a silver moon, a gentle unicorn trotted through the sleeping meadow and sang a lullaby of starlight, whispering good-night to the sleepy flowers until the whole world drifted into a peaceful dream.', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response

In [4]:
response.usage.output_tokens

433
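
A count of 433 output tokens may look high for a one-sentence story. In the Responses API, `output_tokens` also includes the model's hidden reasoning tokens, which are reported under `output_tokens_details.reasoning_tokens`. The sketch below separates the two. `usage_breakdown` is a hypothetical helper name, and it assumes the usage object exposes those fields, falling back to zero reasoning tokens when they are absent.

```python
def usage_breakdown(usage):
    """Split output tokens into hidden reasoning tokens and visible reply tokens."""
    details = getattr(usage, "output_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) or 0
    return {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "reasoning_tokens": reasoning,
        "visible_output_tokens": usage.output_tokens - reasoning,
    }
```

In the notebook you would call `usage_breakdown(response.usage)` right after a request.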

Let's extract the reply part only:

In [5]:
print(response.output_text)

Under a silver moon, a gentle unicorn trotted through the sleeping meadow and sang a lullaby of starlight, whispering good-night to the sleepy flowers until the whole world drifted into a peaceful dream.
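
The pieces of this section (one client, one request, `output_text`, and `usage`) can be wrapped into a reusable helper that accepts a system message and a list of user turns and returns the reply plus token usage. This is a minimal sketch under assumptions: `ask` is a hypothetical name, the client is passed in as an argument, and the system message is sent through the `instructions` parameter covered in the next section.

```python
def ask(client, system_message, user_turns, model="gpt-5-nano"):
    """Send a system message plus user turns; return (reply_text, usage_dict)."""
    # Each user turn becomes one message with the "user" role
    messages = [{"role": "user", "content": turn} for turn in user_turns]
    response = client.responses.create(
        model=model,
        instructions=system_message,
        input=messages,
    )
    usage = response.usage
    return response.output_text, {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.total_tokens,
    }
```

With the `client` created above, a call might look like `reply, usage = ask(client, "Answer in one sentence.", ["What is SQL?"])`. Passing the client in, rather than creating it inside the function, keeps a single client per notebook and makes the helper easy to try with a stand-in object.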


## System Instructions
Formerly known as the system (or developer) prompt, the `instructions` parameter sets high-level guidance for how the model should behave (its tone, goals, and style), while message roles give more specific, task-level directions.


<img src="https://raw.githubusercontent.com/soltaniehha/Business-Analytics-Toolbox/master/docs/images/Prof-Owl-1.png"
     width="300">


In [6]:
instructions = "You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal."
input = "why do data analysts prefer Python or SQL instead of Excel for big datasets?"

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Formerly known as system prompt
    input=input,                 # User prompt
    text={"verbosity": "low"}    # "low": short, concise outputs; "high": detailed explanations or big refactors
)

Markdown(response.output_text)

Great question! Here’s the short answer: for big datasets, Python or SQL are preferred because they scale, automate, and keep things reproducible. Excel just isn’t built for that scale.

Key reasons in simple terms:
- Scale and performance: Excel has row limits and memory constraints. SQL databases and Python data tools are designed to process large data efficiently.
- Data integrity and governance: Databases enforce rules and keep data in a central, auditable place. Excel files can get out of sync and messy.
- Reproducibility and automation: SQL scripts and Python notebooks can be versioned, shared, and rerun automatically. Excel steps are often manual and error-prone.
- Transformation power: SQL shines at filtering, joining, and aggregating big tables. Python (pandas, Dask) handles complex wrangling and later modeling.
- Collaboration: Code in SQL/Python + version control works well for teams. Excel files are harder to track changes in.
- Ecosystem: Rich libraries, connectors, and tooling exist for SQL and Python (data cleaning, ML, dashboards). Excel is more limited in these areas for big data.

When Excel is fine:
- Small datasets (well under Excel’s limits).
- Quick ad-hoc checks, pivots, or lightweight dashboards.
- Prototyping or learning, before moving data to SQL/Python workflows.

Quick guideline:
- Extract and summarize large data with SQL.
- Do cleaning, feature engineering, and modeling with Python.
- Use Excel only for small, quick inspections or presentations.

If you want, tell me about your data size and tools you have, and I’ll suggest a simple start plan.

## Chat History

In [7]:
# Keep history
history = [{"role": "developer", "content": instructions}]

def chat(message):
    history.append({"role": "user", "content": message})  # Add the new user message to history

    # Send entire history to the model
    response = client.responses.create(
        model=MODEL,
        input=history,
        text={ "verbosity": "low" }
    )

    # Add model response to history
    history.append({"role": "assistant", "content": response.output_text})

    return response.output_text

In [8]:
Markdown(chat(input))

Short answer: for big datasets, Python and SQL are more scalable, faster, and better for reproducible analyses. Excel just isn’t built for that scale.

Key reasons:

- Size and memory
  - Excel has a hard size limit and can slow to a crawl with large files.
  - SQL databases store data on disk and use indexing; Python can stream data or use out-of-core tools when needed.

- Performance and operations
  - SQL excels at fast, set-based joins, aggregations, and filtering on huge tables.
  - Python (with pandas) is great for flexible cleaning and feature engineering but can be memory-heavy; it’s often used with chunking or on machines with enough RAM.

- Reproducibility and automation
  - SQL scripts and Python notebooks can be versioned, tested, and automated (pipelines, schedulers).
  - Excel files are harder to track changes in and less friendly to automated workflows.

- Data integrity and multi-user work
  - SQL databases support transactions, constraints, and concurrent access.
  - Excel is prone to human errors and file conflicts when multiple people edit the same file.

- Advanced analytics and tooling
  - Python offers ML, statistical modeling, APIs, and visualization libraries.
  - Excel provides quick ad-hoc calculations but lacks scalable analytics and modeling capabilities.

- Data wrangling at scale
  - SQL is ideal for extracting and consolidating data from many tables.
  - Python is great for deeper cleaning, feature engineering, and modeling after data is pulled.

When to use Excel instead
- For small, simple datasets and quick, human-focused analysis.
- For business users who need to do simple calculations or pivot tables without coding.

Bottom line: big datasets benefit from SQL for data retrieval/aggregation and Python for deeper analysis and modeling, while Excel is best kept for small, quick explorations.

In [9]:
chat("Please highlight the most important point")

'Big datasets require scalable tools: SQL for fast, set-based retrieval/aggregation in a database, and Python for flexible analysis; Excel isn’t designed to handle large data.'

In [10]:
history

[{'role': 'developer',
  'content': 'You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal.'},
 {'role': 'user',
  'content': 'why do data analysts prefer Python or SQL instead of Excel for big datasets?'},
 {'role': 'assistant',
  'content': 'Short answer: for big datasets, Python and SQL are more scalable, faster, and better for reproducible analyses. Excel just isn’t built for that scale.\n\nKey reasons:\n\n- Size and memory\n  - Excel has a hard size limit and can slow to a crawl with large files.\n  - SQL databases store data on disk and use indexing; Python can stream data or use out-of-core tools when needed.\n\n- Performance and operations\n  - SQL excels at fast, set-based joins, aggregations, and filtering on huge tables.\n  - Python (with pandas) is great for flexible cleaning and feature engineering but can be memory-heavy; it’s often used with chunking or on machines with enough RAM.\n\
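
Because the whole `history` list is sent on every call, input tokens (and therefore cost) grow with each turn. One common way to bound this is to keep the leading developer message and only the most recent messages. The sketch below is an assumption about how one might do that, not something this notebook does; `trim_history` and `max_messages` are hypothetical names.

```python
def trim_history(history, max_messages=6):
    """Return a shortened copy: the leading developer message plus the last `max_messages` others."""
    if history and history[0]["role"] == "developer":
        head, rest = history[:1], history[1:]
    else:
        head, rest = [], list(history)
    if max_messages <= 0:
        return list(head)  # Keep only the developer message, if any
    return head + rest[-max_messages:]
```

Inside `chat`, one could pass `trim_history(history)` as `input` instead of `history`. The trade-off is that the model no longer sees older turns, so it may lose track of details from early in the conversation.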

## Chatbot
We use `Gradio` to build a chatbot whose workflow we control.

In [11]:
instructions = "You are Professor Owl, a wise but friendly teacher of Business Analytics. Explain concepts clearly and simply, using gentle guidance."

def respond(message, history):
    messages = [{"role": "developer", "content": instructions}]
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})


    response = client.responses.create(
        model=MODEL,
        input=messages,
        text={"verbosity": "low"}
    )
    return response.output_text

demo = gr.ChatInterface(
    respond,
    type="messages",
    title="🦉 Professor Owl – Business Analytics Helper",
    description="Ask Professor Owl anything about data analytics!"
)

demo.launch(share=True)  # Add debug=True to debug, if needed

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://8135e1dcc538abb267.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Your Turn
Plug in your own scenario: rephrase the instructions to shift the assistant's tone or guidelines.



In [None]:
# Your code goes here