# GPT Chat Completion Lab

Welcome! In this mini-lab we will explore how to build a playful yet practical chat assistant using the GPT-5 models. The goal is to make the workflow clear enough for beginners while giving you a template you can adapt to your own use cases.

Objectives:
- Build a basic GPT-powered chat assistant  
- Adjust assistant behavior using system prompts  
- Build a simple Gradio UI

## Game Plan
- **Context:** We are using Google Colab, so everything happens in the cloud.
- **Model:** `gpt-5-nano` keeps responses smart while staying cost-efficient.
- **Secret management:** We read the API key from the Colab secret named `OpenAI_API_Key`.
- **Flow:** install the SDK → load the key securely → define a helper function → experiment with prompts.
- **Stretch idea:** tweak the conversation style and system prompt with your own ideas.


In [1]:
from google.colab import userdata
import os
from openai import OpenAI
import gradio as gr
from IPython.display import Markdown, display

MODEL = "gpt-5-nano"  # low-cost model in the GPT-5 family

## Load Secrets (No Hard-Coding!)
Colab keeps keys in its Secrets vault, which the `userdata` module reads. Make sure your workspace already stores `OpenAI_API_Key`; otherwise add it once through the Secrets panel (the key icon in the left sidebar) and grant this notebook access. Never share the value.


In [2]:
os.environ['OPENAI_API_KEY'] = userdata.get('OpenAI_API_Key')
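
The cell above can raise an error if the secret is missing or if the notebook runs outside Colab. A minimal sketch of a more forgiving loader follows. The helper name `load_openai_key` and the fallback to an already-set `OPENAI_API_KEY` environment variable are assumptions for illustration, not part of the original lab.

```python
import os


def load_openai_key(secret_name="OpenAI_API_Key", env_name="OPENAI_API_KEY"):
    """Return the API key from the environment or, failing that, from Colab secrets.

    Hypothetical helper: the name and the fallback order are illustrative choices.
    """
    # Prefer a key that is already present in the environment.
    key = os.environ.get(env_name)
    if key:
        return key

    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        # Not in Colab, secret missing, or notebook access not granted.
        key = None

    if not key:
        raise RuntimeError(
            f"No API key found: add a Colab secret named {secret_name!r} "
            f"or set the {env_name} environment variable."
        )

    # The OpenAI client reads this environment variable by default.
    os.environ[env_name] = key
    return key
```

Calling `load_openai_key()` before `client = OpenAI()` would then surface a readable message instead of a raw traceback when the key is unavailable.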

## Wrap the GPT Client
We use the official `openai` package. The helper below:
1. Initializes a single `OpenAI` client.
2. Accepts a system message and a list of user turns.
3. Returns the model reply plus token usage so we can discuss cost control.
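
The three steps above can be sketched as a small function. This is one possible shape, not the lab's official helper: the name `ask` and the returned dictionary layout are assumptions, and the client is passed in as a parameter (created once elsewhere with `OpenAI()`) so that the function is easy to test.

```python
def ask(client, system_message, user_turns, model="gpt-5-nano"):
    """Send a system message plus user turns; return (reply_text, usage_dict).

    Hypothetical helper: `client` is expected to be an `OpenAI()` instance.
    """
    # The system message goes first, using the "developer" role.
    messages = [{"role": "developer", "content": system_message}]
    # Each user turn becomes its own message, in order.
    messages.extend({"role": "user", "content": turn} for turn in user_turns)

    response = client.responses.create(model=model, input=messages)

    # Collect the token counts reported by the API for cost discussions.
    usage = {
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "total_tokens": response.usage.total_tokens,
    }
    return response.output_text, usage
```

With a real client, a call might look like `reply, usage = ask(client, "You are a concise tutor.", ["What is a token?"])`.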


In [3]:
client = OpenAI()

response = client.responses.create(
    model=MODEL,
    input="Write a one-sentence bedtime story about a unicorn."
)

response

Response(id='resp_069904794ffce0b100691cd076fbe881a2ad6bdd6d49ba8a09', created_at=1763496054.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_069904794ffce0b100691cd0773fdc81a295ce8351ac5c57ba', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_069904794ffce0b100691cd079e1fc81a29b8f14dee93d2624', content=[ResponseOutputText(annotations=[], text='Under a silver moon, a gentle unicorn whispered lullabies to the sleeping forest and curled up by the quiet brook to dream of dawn.', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort

In [4]:
response.usage.output_tokens

417
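
The output token count includes any reasoning tokens the model used, which is why a one-sentence story can report several hundred. To relate usage to cost, a rough sketch follows. The function name `estimate_cost`, the input token count, and both per-million-token prices are hypothetical placeholders, not actual rates; substitute the values from the current pricing page.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices in USD per one million tokens (placeholders only).
HYPOTHETICAL_INPUT_PRICE = 1.00
HYPOTHETICAL_OUTPUT_PRICE = 2.00

# 417 output tokens is the count reported above; 20 input tokens is an assumed value.
cost = estimate_cost(20, 417, HYPOTHETICAL_INPUT_PRICE, HYPOTHETICAL_OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.6f}")
```

In the notebook, `response.usage.input_tokens` and `response.usage.output_tokens` would replace the hard-coded counts.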

Let's extract the reply part only:

In [5]:
print(response.output_text)

Under a silver moon, a gentle unicorn whispered lullabies to the sleeping forest and curled up by the quiet brook to dream of dawn.


## System Instructions
The `instructions` parameter takes over the role of what was formerly known as the system/developer prompt. It sets high-level guidance for how the model should behave (its tone, goals, and style), while message roles give more specific, task-level directions.


<img src="https://raw.githubusercontent.com/soltaniehha/Business-Analytics-Toolbox/master/docs/images/Prof-Owl-1.png"
     width="300">


In [6]:
instructions = "You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal."
user_prompt = "why do data analysts prefer Python or SQL instead of Excel for big datasets?"  # avoids shadowing the built-in input()

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Formerly known as system prompt
    input=user_prompt,           # User prompt
    text={ "verbosity": "low" }  # Low: short, concise outputs; High: detailed explanations or big refactors
)

Markdown(response.output_text)

Great question. In short: Excel is handy for quick, small-scope work, but Python or SQL win big-data tasks because they’re built for scale, automation, and reproducibility. Here’s why:

- Size and performance
  - Excel has hard limits (about 1 million rows per sheet) and can slow to a crawl or crash with big datasets.
  - SQL databases and Python data tools stream, chunk, or use distributed engines to handle much larger data efficiently.

- Reproducibility and automation
  - SQL and Python let you write scripts or queries that you can rerun exactly the same way every time, with version control (git) and tests.
  - Excel spreadsheets are easy to edit by hand, which makes audits and reproducibility messy.

- Data quality and governance
  - Databases enforce constraints, data types, and access controls; you pull clean data from a single source of truth.
  - Excel is spread across files, copies, and multiple users, making governance harder.

- Transformations and modelling
  - SQL excels at set-based operations (filters, joins, aggregates) directly in the database.
  - Python (pandas, numpy) is flexible for complex cleaning, feature engineering, and ML pipelines; it also works well with big data (via Dask, PySpark) when needed.

- Collaboration and sharing
  - SQL scripts and Python notebooks are easy to share, review, and run in different environments.
  - Excel files are harder to version-control and review at scale.

When to use what:
- Use SQL to extract and join data from databases (where the engine can optimize performance).
- Use Python for cleaning, analytics, automation, and modeling (or when data is not in a tidy SQL-friendly form).
- Excel for quick checks, small subsets, or presentation-ready summaries.

If you want, tell me your data setup and I can suggest a simple workflow.

In [12]:
user_prompt = "Please highlight the most important point"  # avoids shadowing the built-in input()

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Formerly known as system prompt
    input=user_prompt,           # User prompt
    text={ "verbosity": "low" }  # Low: short, concise outputs; High: detailed explanations or big refactors
)

Markdown(response.output_text)

Sure, what text or topic should I highlight? If you just want a quick method:

- Find the main claim or thesis (the “what this is really about” statement).
- This point is often in the opening or closing sentences and echoed in the body.
- The most important point answers “so what”: why it matters.

Share the passage and I’ll pull out the key point for you.

In [7]:
user_prompt = "my dataset is 12BG in size and I am doing ML."  # avoids shadowing the built-in input()

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Formerly known as system prompt
    input=user_prompt,           # User prompt
    text={ "verbosity": "low" }  # Low: short, concise outputs; High: detailed explanations or big refactors
)

Markdown(response.output_text)

Nice. A 12 GB dataset is big but manageable with the right setup. A few quick questions to tailor help: what type of data is it (images, text, tabular?), and what hardware do you have (RAM, GPU)? Are you training from scratch or fine-tuning?

Here are simple, practical tips:

- Check memory vs. streaming
  - If you can fit the data in RAM with some headroom, load once and train.
  - If not, use on-disk formats and streaming (memmap, HDF5, Parquet/Feather, or Zarr) and load in chunks.

- Choose efficient data formats
  - Tabular: Parquet/Feather with proper dtypes (use float32/int32 where possible).
  - Images: keep compressed formats (JPEG/PNG) and decode on the fly; or store as a memory-mapped array.

- Data loading strategy
  - Use data loaders that fetch batches on the fly (e.g., PyTorch DataLoader, TensorFlow tf.data).
  - Tune batch size to fit GPU/CPU memory; use prefetching, workers, and pin_memory as available.

- Start with a simple baseline
  - Tabular: logistic regression or random forest to get a baseline quickly.
  - Images: start with a smaller model or transfer learning (e.g., fine-tune a pretrained CNN).
  - Text: start with a lightweight model (e.g., TF-IDF + linear model) before heavy DL.

- Memory-conscious features and data prep
  - Downcast numeric data to smaller types (float32 instead of float64, int16/uint8 where possible).
  - For categorical features, use efficient encodings (category dtype, target encoding with care).
  - Normalize/standardize on the fly to avoid storing extra copies.

- Training strategy if memory is tight
  - Use gradient accumulation to simulate larger batches.
  - Train in chunks and update model incrementally (especially for very large datasets).
  - Consider mixed-precision training to save memory on GPUs.

- If you’re using deep learning
  - Leverage transfer learning to reduce data needs.
  - Resize inputs to smaller dimensions if accuracy impact is acceptable.
  - Use data augmentation on the fly.
  - Monitor memory and use checkpointing/early stopping to save time.

- Validation and experiment management
  - Split data into train/validation/test early.
  - Keep a simple baseline to compare against.
  - Version data and experiments (DVC, MLflow) to stay organized.

If you share specifics (data type, sample count, features, hardware, and your framework), I’ll tailor a concrete plan.

## Chat History

In [8]:
# Keep history
history = [{"role": "developer", "content": instructions}]

def chat(message):
    history.append({"role": "user", "content": message})  # Add the new user message to history

    # Send entire history to the model
    response = client.responses.create(
        model=MODEL,
        input=history,
        text={ "verbosity": "low" }
    )

    # Add model response to history
    history.append({"role": "assistant", "content": response.output_text})

    return response.output_text
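
Because `chat` sends the entire history on every call, the number of input tokens grows with each turn. A minimal sketch of one way to cap that growth follows. The helper name `trim_history` and the default of ten messages are assumptions for illustration; the original lab sends the full history.

```python
def trim_history(history, max_messages=10):
    """Keep the leading developer message plus the most recent messages.

    Hypothetical helper: returns a new list and leaves `history` unchanged.
    """
    if not history:
        return []

    # Preserve the developer message so the persona is never dropped.
    has_preamble = history[0]["role"] == "developer"
    head = history[:1] if has_preamble else []
    tail = history[1:] if has_preamble else history

    # Keep only the last `max_messages` user/assistant messages.
    recent = tail[-max_messages:] if max_messages > 0 else []
    return head + recent
```

Inside `chat`, passing `input=trim_history(history)` instead of `input=history` would apply the cap. The trade-off is that the model no longer sees the older turns.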

In [9]:
user_prompt = "my dataset is 12BG in size and I am doing ML."  # avoids shadowing the built-in input()
Markdown(chat(user_prompt))

Nice, 12 GB is a solid dataset size. Depending on what your data looks like (tabular, images, text, etc.), here are practical steps to handle it efficiently.

Key questions (so I can tailor):
- What is the data type (tabular features, images, text, etc.)?
- Are you training on a single machine or distributed?
- What framework are you using (scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, etc.)?

Guidance (quick-start):

- Plan memory needs
  - Estimate: RAM > dataset size + model + overhead. If you’re on 16 GB, expect memory pressure; 32 GB is nicer.
  - Use smaller data types where possible (downcast integers/floats, use category for strings).

- Store and load efficiently
  - Tabular: use Parquet/Feather or HDF5; avoid loading everything into a plain CSV.
  - Images/text: keep on disk in compressed form and load in batches; don’t materialize huge in-memory tensors.

- Enable out-of-core / incremental learning
  - Tabular: use Dask/Vaex/Modin to operate out-of-core; or use scikit-learn’s SGD-based models with partial_fit.
  - Deep learning: use data pipelines that stream batches (TF.data, PyTorch DataLoader with efficient transforms) and consider gradient accumulation if memory is tight.

- Optimize data representation
  - Downcast numeric columns (e.g., float64 → float32, int64 → int32).
  - Convert repeated strings to categorical codes if using pandas; saves memory.

- Start small, then scale
  - Prototype on a random subset (e.g., 5–10%) to tune features, models, and pipelines.
  - Measure memory usage and runtime, then incrementally scale up.

- Choose models wisely
  - Tabular: XGBoost/LightGBM handle large data well; ensure you have enough RAM for DMatrix/LBMs.
  - If experimenting with classical ML, SGD/PartialFit can train on chunks.
  - For images/text, prefer models designed for large data with efficient batching.

- Validation plan
  - Keep a hold-out test set. Cross-validation may be expensive on big data; start with a single well-stratified split, then widen.

If you share more details (type of data, toolchain, whether you’re on CPU/GPU, and your goal), I’ll give you a tailored, step-by-step plan.

In [13]:
chat("Please highlight the most important point")

"Don't load all 12 GB into RAM at once; process in batches using memory-efficient formats (Parquet/Feather), downcast numerics, and out-of-core/incremental training."

In [11]:
history

[{'role': 'developer',
  'content': 'You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal.'},
 {'role': 'user', 'content': 'my dataset is 12BG in size and I am doing ML.'},
 {'role': 'assistant',
  'content': 'Nice, 12 GB is a solid dataset size. Depending on what your data looks like (tabular, images, text, etc.), here are practical steps to handle it efficiently.\n\nKey questions (so I can tailor):\n- What is the data type (tabular features, images, text, etc.)?\n- Are you training on a single machine or distributed?\n- What framework are you using (scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, etc.)?\n\nGuidance (quick-start):\n\n- Plan memory needs\n  - Estimate: RAM > dataset size + model + overhead. If you’re on 16 GB, expect memory pressure; 32 GB is nicer.\n  - Use smaller data types where possible (downcast integers/floats, use category for strings).\n\n- Store and load efficiently\

## Chatbot
We use `Gradio` to build a chatbot whose workflow we control.

In [14]:
instructions = "You are Professor Owl, a wise but friendly teacher of Business Analytics. Explain concepts clearly and simply, using gentle guidance."

def respond(message, history):
    messages = [{"role": "developer", "content": instructions}]
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})


    response = client.responses.create(
        model=MODEL,
        input=messages,
        text={"verbosity": "low"}
    )
    return response.output_text

demo = gr.ChatInterface(
    respond,
    type="messages",
    title="🦉 Professor Owl – Business Analytics Helper",
    description="Ask Professor Owl anything about data analytics!"
)

demo.launch(share=True)  # Add debug=True to debug, if needed

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://376315e1d17372f93b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
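
The message-assembly step inside `respond` can be pulled into a pure function, which makes it possible to check without calling the API or launching the UI. This is a sketch under stated assumptions: the name `build_messages` is hypothetical, and the history entries are assumed to be dictionaries with `role` and `content` keys, as in the `type="messages"` format used above.

```python
def build_messages(instructions, history, message):
    """Combine instructions, prior turns, and the new user message into one list.

    Hypothetical helper mirroring the first three lines of `respond`.
    """
    messages = [{"role": "developer", "content": instructions}]
    # Copy only the role and content keys; history entries may carry extra keys.
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})
    return messages


# Small self-check with a made-up two-message history.
sample_history = [
    {"role": "user", "content": "What is a KPI?", "metadata": None},
    {"role": "assistant", "content": "A key performance indicator."},
]
for m in build_messages("You are Professor Owl.", sample_history, "Give one example."):
    print(m["role"], "->", m["content"])
```

`respond` could then pass `input=build_messages(instructions, history, message)` to `client.responses.create`.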




## Your Turn
Plug in your own scenario: rephrase the instructions to shift the tone or guidelines.



In [None]:
# Your code goes here