# Project Overview

In this notebook, I will develop a chatbot app that generates datasets according to the user's request.

The LLM used in this app is `Llama-3.2-3B-Instruct`, which can easily be replaced with another model in the pipeline below.
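
As a rough illustration of such a swap, the sketch below keeps candidate checkpoints in one place and resolves a single model ID from a short key. The `MODEL_CHOICES` mapping, its keys, and `resolve_model` are hypothetical names, and the two non-Llama IDs are assumed alternatives that this notebook does not test.

```python
# Hypothetical registry of instruction-tuned checkpoints.
# Only the "llama" entry is used in this notebook; the others are assumed alternatives.
MODEL_CHOICES = {
    "llama": "meta-llama/Llama-3.2-3B-Instruct",
    "qwen": "Qwen/Qwen2.5-3B-Instruct",
    "phi": "microsoft/Phi-3-mini-4k-instruct",
}


def resolve_model(choice: str = "llama") -> str:
    """Return the Hugging Face model ID for a short, illustrative key."""
    try:
        return MODEL_CHOICES[choice]
    except KeyError:
        raise ValueError(
            f"Unknown model choice {choice!r}; expected one of {sorted(MODEL_CHOICES)}"
        ) from None


MODEL_ID = resolve_model("llama")
print(MODEL_ID)
```

The resolved ID could then be passed wherever the model name constant is used. Code that relies on Llama-specific special tokens such as `<|eot_id|>` would likely need adjusting for a different model family.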

In [1]:
!pip install -q --upgrade bitsandbytes accelerate

In [27]:
# imports

import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, TextIteratorStreamer
import torch
import pandas as pd
import io
import re

In [3]:
# Constants

LLAMA = "meta-llama/Llama-3.2-3B-Instruct"

In [4]:
# Sign in to Hugging Face Hub

# Access to the Llama model requires a Hugging Face token; use it to log in.
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)


# Step 1: Setting up the model and tokenizer

In [None]:
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(LLAMA)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map="auto", quantization_config=quant_config)
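
For a sense of why 4-bit loading helps on a small GPU, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It assumes roughly 3 billion parameters (read off the "3B" in the model name) and ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 3e9  # assumption: taken from the "3B" in the model name

print(f"bf16 : {weight_memory_gb(NUM_PARAMS, 16):.1f} GB")
print(f"4-bit: {weight_memory_gb(NUM_PARAMS, 4):.1f} GB")
```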


# Step 2: Running an Example

In [17]:

system_message = """
You receive information from the user about generating a dataset.
The information includes:
1. the columns of the dataset (names and meanings).
2. number of rows (20 unless the user specified otherwise).

if the user's prompt does not meet the requirements - ask the user to follow the instructions.
otherwise, you should answer directly with the table, as a markdown table.
"""

messages = [
    {"role": "system", "content": system_message}
  ]

user_prompt = """
please generate me a dataset of users, including the next columns:
id, user_name, password, rank.
please generate 5 rows only.
"""
messages.append({"role": "user", "content": user_prompt})


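# add_generation_prompt=True appends the assistant header so the model starts its reply directly
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")
streamer = TextStreamer(tokenizer)
outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)

# Keep only the assistant's reply: take the text after the last assistant header
# and drop the end-of-turn token
response = tokenizer.decode(outputs[0]).split("assistant<|end_header_id|>")[-1]
response = response.replace("<|eot_id|>", "").strip()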

display(Markdown(response))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 29 Dec 2025

You receive information from the user about generating a dataset. 
The information includes: 
1. the columns of the dataset (names and meanings).
2. number of rows (20 unless the user specified otherwise).

if the user's prompt does not meet the requirements - ask the user to follow the instructions.
otherwise, you should answer directly with the table, as a markdown table.<|eot_id|><|start_header_id|>user<|end_header_id|>

please generate me a dataset of users, including the next columns: 
id, user_name, password, rank.
please generate 5 rows only.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Here is the generated dataset as a markdown table:

| id | user_name | password | rank |
|----|-----------|----------|------|
| 1  | John       | password1 | 1     |
| 2  | Alice       | password2 | 2     |
| 3  | Bob        | password3 | 3     |
| 4  | Jane      
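
A reply like the one above mixes prose with a markdown table, so exporting it means isolating the table rows first. The following is a minimal stdlib-only sketch of that idea, assuming each table row sits on its own line and both begins and ends with `|`. `markdown_table_to_csv` and the `sample` string are hypothetical; the sample is modeled on the complete rows of the output above.

```python
import csv
import io


def markdown_table_to_csv(text: str) -> str:
    """Convert the first markdown table found in `text` to a CSV string.

    Returns an empty string when no table rows are found.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        is_row = line.startswith("|") and line.endswith("|") and len(line) > 1
        if not is_row:
            if rows:
                break  # the first table has ended
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the header separator row, e.g. |----|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)

    if not rows:
        return ""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = """Here is the generated dataset as a markdown table:

| id | user_name | password | rank |
|----|-----------|----------|------|
| 1  | John       | password1 | 1     |
| 2  | Alice       | password2 | 2     |
"""
print(markdown_table_to_csv(sample))
```

With this approach, a final row that is cut off before its closing `|` is treated as the end of the table and dropped rather than exported half-finished.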

# Step 3: Building a Gradio app

In [47]:
import gradio as gr
from threading import Thread


SYSTEM_MESSAGE = """
You receive information from the user about generating a dataset.
The information includes:
1. the columns of the dataset (names and meanings).
2. number of rows (20 unless the user specified otherwise).

if the user's prompt does not meet the requirements - ask the user to follow the instructions.
otherwise, you should answer directly with the table, as a markdown table.
"""


def extract_table_to_csv(history):
    """Parses the last assistant message for a markdown table and saves to CSV."""
    if not history:
        return None

    # Get the last message from the assistant
    last_message = history[-1]["content"]

    # Simple regex to find a markdown table (starts and ends with |)
    table_match = re.search(r'(\|.*\|(?:\n\|.*\|)+)', last_message)
    if not table_match:
        return None

    table_str = table_match.group(1)
    # Defensive cleanup: if an end-of-turn token follows the table, the greedy regex above
    # stops at the token's last '|', so the match ends with '<|eot_id|' (no closing '>').
    table_str = table_str.replace('<|eot_id|', '').strip()

    # Use pandas to read the markdown table
    # We clean the string to remove the '---' separator line which can confuse basic parsers
    lines = [line for line in table_str.split('\n') if '---|' not in line and line.strip()]
    df = pd.read_csv(io.StringIO('\n'.join(lines)), sep='|', skipinitialspace=True).dropna(axis=1, how='all')

    # Clean column names and data (removing whitespace)
    df.columns = df.columns.str.strip()
    df = df.map(lambda x: x.strip() if isinstance(x, str) else x)

    # Save to a temporary CSV
    file_path = "generated_dataset.csv"
    df.to_csv(file_path, index=False)
    return file_path

def generate_dataset(user_input, history):
    messages = [{"role": "system", "content": SYSTEM_MESSAGE}]
    for msg in history:
        messages.append(msg)
    messages.append({"role": "user", "content": user_input})

    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")
    streamer = TextIteratorStreamer(tokenizer, timeout=10., skip_prompt=True, skip_special_tokens=True)

    generate_kwargs = dict(input_ids=inputs, streamer=streamer, max_new_tokens=2000)

    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": ""})

    partial_text = ""
    for new_text in streamer:
        partial_text += new_text
        history[-1]["content"] = partial_text
        yield history, gr.update(visible=False) # Hide download button while generating

    # Once finished, show the download button
    yield history, gr.update(visible=True)

# --- Gradio UI ---
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 📊 Dataset Generator with Export")

    chatbot = gr.Chatbot(label="Dataset Preview", type="messages", height=300)
    msg = gr.Textbox(label="Input requirements", placeholder="Columns: name, age. Rows: 10.")

    with gr.Row():
        clear = gr.Button("Clear Chat")
        # The download component
        download_btn = gr.DownloadButton("Download last table as CSV (if exists)", visible=False)

    # Sequence: 1. Generate text -> 2. When done, user can click download
    msg.submit(generate_dataset, [msg, chatbot], [chatbot, download_btn])
    msg.submit(lambda: "", None, [msg])

    # This function triggers when the download button is clicked
    download_btn.click(extract_table_to_csv, [chatbot], [download_btn])

    clear.click(lambda: ([], gr.update(visible=False)), None, [chatbot, download_btn])

demo.launch(inbrowser=True)


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f5316e9896ef84cc7b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
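
The streaming in `generate_dataset` follows a producer/consumer pattern: generation runs in a background thread while the handler consumes text chunks as they arrive and yields the growing reply. The sketch below imitates that pattern with only the standard library. `fake_generate`, `stream_reply`, and the hard-coded chunks are hypothetical stand-ins for `model.generate` and `TextIteratorStreamer`, not their actual implementation.

```python
import queue
from threading import Thread

_DONE = object()  # sentinel that marks the end of the stream


def fake_generate(chunks, out_queue):
    """Stand-in for model.generate: pushes text chunks, then the sentinel."""
    for chunk in chunks:
        out_queue.put(chunk)
    out_queue.put(_DONE)


def stream_reply(chunks, timeout=10.0):
    """Yield the accumulated text after each chunk, as the Gradio handler does."""
    out_queue = queue.Queue()
    worker = Thread(target=fake_generate, args=(chunks, out_queue))
    worker.start()

    partial_text = ""
    while True:
        # Raises queue.Empty if the producer stalls for longer than `timeout`
        item = out_queue.get(timeout=timeout)
        if item is _DONE:
            break
        partial_text += item
        yield partial_text
    worker.join()


for snapshot in stream_reply(["| id |", " name |", "\n| 1 | Ann |"]):
    print(repr(snapshot))
```

Each yielded snapshot corresponds to one UI update, which is why the handler yields the full text so far rather than only the newest chunk.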


