Skip to content

jiuling-ssk/STA323Project2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

STA323 Project2 - Data Agent System

This repository contains the Q1 Data Agent implementation for STA323 Project2.

What is included

The Q1 files are split by assignment parts:

  • q1_first_three/: code and notes for Q1 parts 1-3, including data selection, preprocessing, Ray Train + LoRA training, reports, and reproducibility notes.

  • q1_fourth_demo/: code and local deployment assets for Q1 part 4, including the Gradio model demo, GitHub Pages static page, and model download helper.

  • q1_fourth_demo/web_demo.py: Qwen-style Gradio web demo adapted from the official Qwen demo pattern.

  • q1_first_three/q1_pipeline.py: Data download, selection, Qwen SFT preprocessing, Ray Train + LoRA training, and safe code execution utilities.

  • q1_first_three/q1_report_cn.md: Chinese report draft for Q1.

  • q1_first_three/q1_reproduce.md: Reproduction commands and training result summary.

  • q1_first_three/project2_q1_complete.ipynb: Notebook entry for the completed Q1 workflow.

  • q1_fourth_demo/docs/index.html: GitHub Pages static project page.

Final training result

Base model: Qwen/Qwen3.5-0.8B
Training method: Ray Train + LoRA
Train samples: 2000
Validation samples: 500
GPU: NVIDIA A30 24GB
train_loss: 0.7628931648731232
validation_loss: 0.7091293325424194
global_step: 250
checkpoint: project2_outputs/ray_results/qwen-datamind-lora/checkpoint_2026-06-01_03-31-48.192484

Run the demo

python3 -m venv --system-site-packages .venv
.venv/bin/python -m pip install -r requirements.txt

Download the official checkpoint for laptop/offline demos:

.venv/bin/python q1_fourth_demo/download_qwen_checkpoint.py \
  --repo-id Qwen/Qwen3.5-0.8B \
  --output-dir models/Qwen3.5-0.8B

Run with the official checkpoint:

NO_PROXY=127.0.0.1,localhost \
no_proxy=127.0.0.1,localhost \
GRADIO_ANALYTICS_ENABLED=False \
.venv/bin/python q1_fourth_demo/web_demo.py \
  --checkpoint-path models/Qwen3.5-0.8B \
  --server-port 7860 \
  --inbrowser

If the fine-tuned LoRA checkpoint exists on the machine, run:

NO_PROXY=127.0.0.1,localhost \
no_proxy=127.0.0.1,localhost \
GRADIO_ANALYTICS_ENABLED=False \
.venv/bin/python q1_fourth_demo/web_demo.py \
  --checkpoint-path /Work/STA323/Project2/project2_outputs/ray_results/qwen-datamind-lora/checkpoint_2026-06-01_03-31-48.192484 \
  --base-model-path /Work/STA323/Project2/models/Qwen3.5-0.8B \
  --server-port 7860

Open http://127.0.0.1:7860.

The Web UI is optimized for data-analysis demos: upload a CSV/Excel file, inspect the generated data diagnostics and preview, choose a suggested analysis question, generate Python code, execute it in the restricted analysis sandbox, and show the chart/output in the same page.

Local deployment note

GitHub Pages hosts only the static q1_fourth_demo/docs/index.html project page. The actual model demo is intended to run locally with Gradio at http://127.0.0.1:7860. For the course demo, start q1_fourth_demo/web_demo.py locally, then open the local Gradio URL in the browser.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors