Hα» thα»ng AI tα»± Δα»ng sinh BαΊ£ng cΓ’u hα»i khαΊ£o sΓ‘t (Questionnaire) vΓ PhαΊ‘m vi cΓ΄ng viα»c (Scope of Work) cho cΓ‘c dα»± Γ‘n tΖ° vαΊ₯n, tΓch hợp Google Gemini, RAG (BigQuery Vector Search), vΓ xuαΊ₯t kαΊΏt quαΊ£ ra Google Sheets.
- Features
- Architecture
- Tech Stack
- Prerequisites
- Installation
- Configuration
- Usage
- Deployment
- Project Structure
- Documentation
- Contributing
- AI-Powered Generation: Sα» dα»₯ng Google Gemini vα»i RAG Δα» sinh cΓ’u hα»i thΓ΄ng minh
- Multi-Category Support: Tα» chα»©c cΓ’u hα»i theo cΓ‘c danh mα»₯c logic (Technical, Business, Infrastructure, etc.)
- Quality Control: AI Judge tα»± Δα»ng ΔΓ‘nh giΓ‘ chαΊ₯t lượng cΓ’u hα»i (coverage, relevance, clarity)
- Retry Mechanism: Tα»± Δα»ng regenerate nαΊΏu chαΊ₯t lượng khΓ΄ng ΔαΊ‘t (max 3 lαΊ§n)
- RAG Integration: Vector search trΓͺn BigQuery Δα» lαΊ₯y cΓ’u hα»i tΖ°Ζ‘ng tα»±
- Google Sheets Export: XuαΊ₯t trα»±c tiαΊΏp vΓ o Google Sheet vα»i formatting chuyΓͺn nghiα»p
- Custom Worksheet: NgΖ°α»i dΓΉng tα»± ΔαΊ·t tΓͺn worksheet, tα»± Δα»ng xΓ³a & tαΊ‘o mα»i
- Requirements Display: Hiα»n thα» yΓͺu cαΊ§u gα»c cα»§a ngΖ°α»i dΓΉng (truncated 200 chars)
- Multi-Agent System: 4 agents chuyΓͺn biα»t (Generate, Quality, Refine, Export)
- Intelligent Orchestration: Judge-based quality control vα»i auto-retry
- Comprehensive Output: Project Detail + Assumptions + Scope of Work
- Professional Formatting: 15-column task breakdown vα»i man-days, owners, dates
- Dual Worksheet Export:
- Worksheet 1 "Overview": Project metadata + Detail (left) + Assumptions (right)
- Worksheet 2 "Scope of Work": Full task table vα»i Progress dropdown
- Category Grouping: Tasks grouped by category vα»i auto-calculated totals
- Interactive UI: Phase-by-phase generation vα»i real-time status updates
- Streamlit UI: Modern, responsive web interface
- Session Management: Persistent state across interactions
- JSON Export: Download results as structured JSON
- Error Handling: Comprehensive error reporting vΓ user guidance
- Cost Tracking: Token usage monitoring (if implemented)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β User (Web Browser) β
βββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Streamlit Frontend β
β βββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββ β
β β questionnaire_ui.py β β sow_agent_ui.py β β
β β - Input form β β - 4-phase workflow β β
β β - Config sidebar β β - Agent orchestration β β
β β - Export interface β β - Quality control β β
β βββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Business Logic Layer β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Questionnaire Module (app/questionaires/) β β
β β - engine.py: Core generation logic β β
β β - model.py: Pydantic models (Question, Category, etc.) β β
β β - prompts.py: LLM prompt templates β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β SOW Module (app/scope_of_work/) β β
β β - engine.py: Multi-agent orchestrator β β
β β - model.py: SOW data models β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Sheet Connectors (app/sheet/, app/sow_sheet/) β β
β β - connect.py: Google Sheets API integration β β
β β - model.py: Sheet operation models β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β External Services β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββ β
β β Google β β BigQuery β β Google Sheets β β
β β Gemini API β β Vector Searchβ β API β β
β β (LLM) β β (RAG) β β (Export) β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Questionnaire Generation:
User Input β RAG Retrieval (BigQuery) β LLM Generation (Gemini)
β
AI Judge
β
Pass? β Export to Sheets
Fail? β Retry (max 3x)
SOW Generation:
Phase 1: Gather Info β Extract requirements
Phase 2: Generate β Multi-agent system (Generate + Quality + Refine)
Phase 3: Quality Control β Judge evaluation + auto-retry
Phase 4: Export β Dual-worksheet formatting β Google Sheets
- Python 3.11+: Primary language
- Streamlit 1.28+: Web UI framework
- Pydantic 2.0+: Data validation
- Google Gemini: LLM for generation (gemini-1.5-flash, gemini-1.5-pro)
- BigQuery Vector Search: RAG for retrieval
- Vertex AI: ML platform
- Google Cloud Run: Serverless deployment
- Secret Manager: Credential management
- Cloud Build: Container builds
- Artifact Registry: Docker registry
- Google Sheets API: Export functionality
- gspread: Python wrapper for Sheets
- google-auth: Authentication
- python-dotenv: Environment variables
- pytest: Testing framework
- black: Code formatting
- Python 3.11 or higher
- pip (Python package manager)
- Git
- Google Cloud SDK (for deployment)
- GCP Project vα»i cΓ‘c API enabled:
- Vertex AI API (
aiplatform.googleapis.com) - BigQuery API (
bigquery.googleapis.com) - Google Sheets API (
sheets.googleapis.com) - Google Drive API (
drive.googleapis.com) - Secret Manager API (
secretmanager.googleapis.com) - Cloud Run API (
run.googleapis.com)
- Vertex AI API (
- Google API Key: Gemini API access
- Service Account JSON: BigQuery + Sheets access
- Required roles:
- BigQuery Data Viewer
- BigQuery Job User
- Vertex AI User
- Required roles:
- Shared Sheet URL (ngΖ°α»i dΓΉng cung cαΊ₯p)
- Service account email added as Editor
-
Clone repository
git clone https://github.com/hungchou8994/int-data-qa-n-sow-generation.git cd int-data-qa-n-sow-generation -
Create virtual environment
# Windows python -m venv venv .\venv\Scripts\activate # Linux/Mac python3 -m venv venv source venv/bin/activate
-
Install dependencies
cd app pip install -r requirements.txt -
Setup environment variables
# Create .env file in project root cp .env.example .env # Edit .env with your credentials GOOGLE_API_KEY=your_gemini_api_key_here GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account-key.json
-
Verify installation
python -c "import streamlit; import google.generativeai; print('OK')"
Create .env file in project root:
# Required
GOOGLE_API_KEY=AIzaSy... # Gemini API key
GOOGLE_APPLICATION_CREDENTIALS=D:\path\to\service-account.json
# Optional
PROJECT_ID=int-data-qa-n-sow-generation # GCP project ID
BQ_DATASET=your_dataset # BigQuery dataset name
BQ_TABLE=your_table # BigQuery table name-
Create service account in GCP Console
-
Grant IAM roles:
gcloud projects add-iam-policy-binding int-data-qa-n-sow-generation \ --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \ --role=roles/bigquery.dataViewer gcloud projects add-iam-policy-binding int-data-qa-n-sow-generation \ --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \ --role=roles/bigquery.jobUser gcloud projects add-iam-policy-binding int-data-qa-n-sow-generation \ --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \ --role=roles/aiplatform.user
-
Download JSON key vΓ set trong
.env
Prepare your questionnaire embeddings table:
CREATE TABLE `project.dataset.questionnaire_embeddings` (
id STRING,
question TEXT,
embedding ARRAY<FLOAT64>,
category STRING,
metadata JSON
);-
Start app
cd app streamlit run questionnaire_ui.py -
Open browser: http://localhost:8501
-
Configure (Sidebar)
- Google API Key (if not in .env)
- Model selection (gemini-1.5-flash recommended)
- Temperature (0.7 default)
-
Generate questionnaire
- Enter customer name
- Describe business domain
- Specify requirements
- Click "Generate Questionnaire"
-
Export to Google Sheets
- Paste Google Sheet URL
- Enter worksheet name (default: "Questionnaire")
- Click "Export to Google Sheets"
-
Start app
cd app streamlit run sow_agent_ui.py -
Phase 1: Gather Information
- Customer name
- Business context
- Requirements
- Click "Start Generation"
-
Phase 2-3: AI Generation (automatic)
- Agent system generates SOW
- Judge evaluates quality
- Auto-retry if needed
-
Phase 4: Export
- Paste Google Sheet URL
- Enter worksheet names (default: "Overview", "Scope of Work")
- Click "Export to Google Sheets"
- Or download JSON
cd app/sow_sheet
python test_export.py-
Setup secrets (one-time)
# Create secrets in Secret Manager echo -n "YOUR_API_KEY" | gcloud secrets create google-api-key --data-file=- gcloud secrets create service-account-key ` --data-file="D:\path\to\service-account.json" # Grant access to Cloud Run service account .\setup-permissions.ps1
-
Deploy
# Windows .\deploy.ps1 # Linux/Mac chmod +x deploy.sh ./deploy.sh
-
Access deployed app
- URL will be displayed after deployment
- Format:
https://SERVICE_NAME-HASH-REGION.run.app
See detailed guide: DEPLOYMENT.md
See setup guide: SERVICE_ACCOUNT_SETUP.md
See monitoring guide: OBSERVABILITY.md
int-data-qa-n-sow-generation/
βββ app/ # Main application
β βββ questionnaire_ui.py # Questionnaire Streamlit UI
β βββ sow_agent_ui.py # SOW Streamlit UI
β βββ requirements.txt # Python dependencies
β βββ questionaires/ # Questionnaire module
β β βββ engine.py # Core generation logic
β β βββ model.py # Pydantic data models
β β βββ prompts.py # LLM prompt templates
β β βββ test.py # Unit tests
β βββ scope_of_work/ # SOW module
β β βββ engine.py # Multi-agent orchestrator
β β βββ model.py # SOW data models
β βββ sheet/ # Questionnaire sheets connector
β β βββ connect.py # Google Sheets integration
β β βββ model.py # Sheet models
β βββ sow_sheet/ # SOW sheets connector
β βββ connect.py # SOW export logic
β βββ model.py # SOW sheet models
β βββ test_export.py # Export testing script
βββ rag/ # RAG module
β βββ main.py # RAG orchestrator
β βββ bq_vector.py # BigQuery vector search
β βββ requirements.txt # RAG dependencies
β βββ embedding/ # Embedding generation
β β βββ model.py # Embedding models
β βββ sheet/ # RAG data ingestion
β βββ ingest_questionnaires.py # Ingest questionnaires
β βββ ingest_sows.py # Ingest SOWs
β βββ prompt.py # Embedding prompts
β βββ questionaires.csv # Sample data
β βββ sow_hung.csv # Sample SOW data
βββ Dockerfile # Container definition
βββ .dockerignore # Docker build exclusions
βββ deploy.ps1 # PowerShell deployment script
βββ deploy.sh # Bash deployment script
βββ setup-permissions.ps1 # IAM permission setup
βββ .env.example # Environment template
βββ .gitignore # Git exclusions
βββ README.md # This file
βββ DEPLOYMENT.md # Deployment guide
βββ SERVICE_ACCOUNT_SETUP.md # Permission guide
βββ OBSERVABILITY.md # Monitoring guide
- DEPLOYMENT.md: Cloud Run deployment guide
- SERVICE_ACCOUNT_SETUP.md: IAM permissions setup
- OBSERVABILITY.md: Monitoring & logging setup
- engine.py: Core generation with retry logic
- model.py: Data models (Question, Category, Questionnaire)
- prompts.py: System prompts vΓ templates
- engine.py: Multi-agent orchestrator (4 agents)
- model.py: SOW models (ProjectDetail, ProjectAssumption, ScopeOfWork)
- connect.py: Google Sheets API wrapper
- Delete + recreate worksheet approach
- Professional formatting (colors, borders, widths)
- Frozen headers, data validation (dropdowns)
- bq_vector.py: BigQuery vector similarity search
- embedding/model.py: Text embedding generation
- sheet/ingest_*.py: Data ingestion scripts
cd app/questionaires
python -m pytest test.py -vcd app/sow_sheet
python test_export.py- Generate questionnaire (5-10 categories)
- Judge passes (score >= 75)
- Export to Google Sheets works
- Worksheet name customization works
- Generate SOW (all 4 phases)
- SOW export creates 2 worksheets
- Formatting is correct (colors, borders, widths)
-
Create feature branch
git checkout -b feature/your-feature-name
-
Make changes
- Follow PEP 8 style guide
- Add docstrings to functions
- Update tests if needed
-
Test locally
streamlit run questionnaire_ui.py
-
Commit & push
git add . git commit -m "Add: your feature description" git push origin feature/your-feature-name
-
Create Pull Request
- Use black for formatting:
black app/ - Use flake8 for linting:
flake8 app/ - Follow PEP 8 conventions
- Questionnaire Generation: 15-30s (depends on RAG retrieval + LLM)
- SOW Generation: 45-90s (multi-agent with retries)
- Sheet Export: 3-8s (depends on data size)
- Cold start: 10-15s (first request)
- Warm start: 2-5s (subsequent requests)
- Memory usage: ~500MB-1GB
- CPU: 2 vCPU recommended
- β
Never commit
.envor credentials to git - β Use Secret Manager for production secrets
- β Rotate API keys regularly
- β Use service accounts with least privilege
- β Enable Cloud Armor (DDoS protection) for production
- β Implement authentication if needed (Cloud IAP, OAuth)
- Local:
.envfile (gitignored) - Cloud Run: Secret Manager
- CI/CD: GitHub Secrets or Cloud Build substitutions
1. Module import error after folder rename
# Clear Python cache
find . -type d -name "__pycache__" -exec rm -rf {} +
# Or Windows PowerShell
Get-ChildItem -Recurse -Directory -Filter __pycache__ | Remove-Item -Recurse -Force2. Google Sheets "Insufficient permissions"
- Share sheet vα»i service account email (from JSON)
- Grant "Editor" permission
3. BigQuery "Access Denied"
- Verify service account has
roles/bigquery.dataViewer - Check dataset/table permissions
4. Gemini API quota exceeded
- Check API quotas in GCP Console
- Increase quota or use exponential backoff
5. Streamlit connection error
# Disable XSRF protection
streamlit run app.py --server.enableXsrfProtection=false6. Cloud Run deployment fails
# Check build logs
gcloud builds log BUILD_ID
# Check service logs
gcloud run logs tail SERVICE_NAME --region asia-southeast1- Google Cloud Platform for Vertex AI & BigQuery
- Streamlit team for excellent web framework
- Google Gemini for powerful LLM capabilities
Issues & Questions:
- GitHub Issues: Create an issue
Documentation:
- Migrate to Google ADK (Agent Development Kit)
- Implement A2A (Agent-to-Agent) communication
- Add OpenTelemetry observability
- Custom dashboards in Cloud Monitoring
- Automated testing pipeline (CI/CD)
- Multi-language support (English, Vietnamese)
- PDF export functionality
- Template management system
- User authentication & authorization
- Team collaboration features