Gemini Live Agent Challenge 2026 · Track: Creative Storyteller
Future Artist is a multimodal AI storytelling platform. Give it a topic, a tone, and a target audience, and it generates a complete story with text, AI-illustrated images, and narrated audio, streamed in real time.
Built with Google Gemini 2.5 Flash · Google ADK · FastAPI · Next.js 14
Live demo: futureartist-frontend-226638196775.us-central1.run.app
| Mode | What you get |
|---|---|
| Children's Storybook | Cartoon illustrations + playful narration per scene |
| Marketing Campaign | Brand visuals + inspiring structured copy |
| Educational | Clean diagrams + professional tone |
┌─────────────────────────────────────────────────────────────────┐
│                        Google Cloud Run                         │
│                                                                 │
│  ┌──────────────────────┐        ┌───────────────────────────┐  │
│  │  Next.js Frontend    │        │  FastAPI Backend          │  │
│  │  (port 3000)         │        │  (port 8000)              │  │
│  │                      │        │                           │  │
│  │  StoryCreator UI     │◄──────►│  /api/story (REST)        │  │
│  │  ControlPanel        │        │  /ws/story (WebSocket)    │  │
│  │  MediaViewer         │        │                           │  │
│  │  AudioPlayer         │        │  ┌─────────────────────┐  │  │
│  └──────────────────────┘        │  │ Orchestrator Agent  │  │  │
│         ▲ WebSocket stream       │  │ (Google ADK)        │  │  │
│         │ chunks in real-time    │  └──────────┬──────────┘  │  │
│         └────────────────────────┼─────────────┤             │  │
│                                  │             │             │  │
│                                  │ ┌───────────▼───────────┐ │  │
│                                  │ │ Multi-Agent Pipeline  │ │  │
│                                  │ │                       │ │  │
│                                  │ │ Story Planner Agent   │ │  │
│                                  │ │ Style Director Agent  │ │  │
│                                  │ │ Text Generator Agent  │ │  │
│                                  │ │ Image Generator Agent │ │  │
│                                  │ │ Audio Generator Agent │ │  │
│                                  │ └───────────┬───────────┘ │  │
│                                  └─────────────┼─────────────┘  │
└────────────────────────────────────────────────┬────────────────┘
                                                 │
                                  ┌──────────────▼──────────────┐
                                  │      Google Gemini API      │
                                  │                             │
                                  │  gemini-2.5-flash           │
                                  │    ├─ Text generation       │
                                  │    └─ Image generation      │
                                  └─────────────────────────────┘
Request flow:
- User fills the form and clicks Generate
- Frontend opens a WebSocket connection to the backend
- Backend Orchestrator (Google ADK) kicks off the agent pipeline
- Each agent calls the Gemini API and streams results back through the Orchestrator
- Backend sends typed chunks (text, image, audio) over the WebSocket as they arrive
- Frontend renders each chunk inline as it streams, so the story builds in real time
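The request flow implies a small typed-message protocol between backend and frontend. The sketch below shows one way such chunks could be encoded and folded into per-scene state. The field names (`type`, `scene`, `payload`) and the helpers `parse_chunk` and `apply_chunk` are assumptions made for illustration, not the project's actual wire format.

```python
import json
from dataclasses import dataclass, field

# Chunk types named in the request flow; everything else here is assumed.
CHUNK_TYPES = {"text", "image", "audio"}


@dataclass
class Chunk:
    type: str      # one of CHUNK_TYPES
    scene: int     # hypothetical: index of the scene this chunk belongs to
    payload: str   # text fragment, image reference, or audio config

    def to_json(self) -> str:
        return json.dumps(
            {"type": self.type, "scene": self.scene, "payload": self.payload}
        )


def parse_chunk(raw: str) -> Chunk:
    """Decode one WebSocket message and reject unknown chunk types."""
    data = json.loads(raw)
    if data.get("type") not in CHUNK_TYPES:
        raise ValueError(f"unknown chunk type: {data.get('type')!r}")
    return Chunk(data["type"], int(data["scene"]), str(data["payload"]))


@dataclass
class Scene:
    text: str = ""
    images: list = field(default_factory=list)
    audio: list = field(default_factory=list)


def apply_chunk(story: dict, chunk: Chunk) -> dict:
    """Fold a chunk into the story state, as a renderer would while streaming."""
    scene = story.setdefault(chunk.scene, Scene())
    if chunk.type == "text":
        scene.text += chunk.payload      # text arrives in fragments
    elif chunk.type == "image":
        scene.images.append(chunk.payload)
    else:
        scene.audio.append(chunk.payload)
    return story


if __name__ == "__main__":
    story = {}
    for raw in (
        Chunk("text", 0, "Once upon ").to_json(),
        Chunk("text", 0, "a time.").to_json(),
        Chunk("image", 0, "scene-0.png").to_json(),
    ):
        apply_chunk(story, parse_chunk(raw))
    print(story[0].text, story[0].images)
```

Keying state by scene index lets text and images arrive in any order without the renderer having to buffer whole scenes.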
User Request
  ├─► Story Planner: narrative structure, scene breakdown, character bios
  ├─► Style Director: visual style guide, color palette, character consistency rules
  ├─► Text Generator: scene-by-scene story text (streamed)
  ├─► Image Generator: scene illustrations via Gemini image generation (streamed)
  └─► Audio Generator: narration config; browser reads full text via Web Speech API
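As a rough illustration of the stage order, the pipeline can be pictured as a chain of async steps, each building on what earlier steps produced. This sketch uses plain asyncio with stubbed stage functions. It does not use the Google ADK API, and every function name and return shape here is a placeholder.

```python
import asyncio
from typing import AsyncIterator


async def plan_story(topic: str) -> list[str]:
    # Stub for the Story Planner: one outline entry per scene.
    return [f"{topic}: beginning", f"{topic}: ending"]


async def direct_style(visual_style: str) -> dict:
    # Stub for the Style Director: a style guide shared by all scenes.
    return {"style": visual_style, "palette": ["#ffcc00", "#3366ff"]}


async def run_pipeline(topic: str, visual_style: str) -> AsyncIterator[dict]:
    """Yield typed chunks scene by scene, in the order the stages run."""
    scenes = await plan_story(topic)
    style = await direct_style(visual_style)
    for index, outline in enumerate(scenes):
        # Text Generator stub: real text would come from the model.
        yield {"type": "text", "scene": index, "payload": f"Scene {index + 1}. {outline}"}
        # Image Generator stub: a prompt stands in for generated image data.
        yield {
            "type": "image",
            "scene": index,
            "payload": f"{style['style']} illustration of {outline}",
        }
    # Audio Generator stub: narration settings only; the browser does the speaking.
    yield {"type": "audio", "scene": -1, "payload": {"rate": 1.0, "pitch": 1.0}}


async def collect(topic: str, visual_style: str) -> list[dict]:
    return [chunk async for chunk in run_pipeline(topic, visual_style)]


if __name__ == "__main__":
    for chunk in asyncio.run(collect("A brave little robot", "cartoon")):
        print(chunk["type"], chunk["scene"])
```

Writing the pipeline as an async generator means a WebSocket handler can forward each chunk as soon as it is yielded instead of waiting for the whole story.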
| Layer | Technology |
|---|---|
| AI Model | Gemini 2.5 Flash (text + image generation) |
| Agent Framework | Google ADK (Agent Development Kit) |
| Backend | Python 3.11, FastAPI, Uvicorn |
| Frontend | Next.js 14, TypeScript, Tailwind CSS |
| Streaming | WebSocket (real-time chunk delivery) |
| TTS | Web Speech API (browser-native, tone-matched) |
| Hosting | Google Cloud Run (us-central1) |
- Python 3.11+
- Node.js 18+
- A Gemini API key (get one free at aistudio.google.com/apikey)
git clone https://github.com/stevenchendan/futureArtist.git
cd futureArtist

cd backend
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env

Open backend/.env and set your API key (this is the only required change):

GEMINI_API_KEY=your-gemini-api-key-here

Start the backend:

python -m app.adk.main

Confirm it's running:

curl http://localhost:8000/health
# → {"status":"healthy","services":{"gemini":"connected","storage":"connected"}}

Open a new terminal from the repo root:

cd frontend # from the repo root; use cd ../frontend if you're still in backend/
npm install
npm run dev

The frontend connects to http://localhost:8000 by default. No .env.local changes needed.
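
The curl health check from the backend setup can also be scripted, which is handy when waiting for the backend to come up before starting the frontend. This is a minimal sketch using only the standard library. It assumes the response shape shown in the curl example, and the helper names `is_healthy` and `wait_for_backend` are invented here.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # endpoint from the curl example


def is_healthy(body: dict) -> bool:
    """True when status is 'healthy' and every listed service is 'connected'."""
    services = body.get("services", {})
    return body.get("status") == "healthy" and all(
        state == "connected" for state in services.values()
    )


def wait_for_backend(url: str = HEALTH_URL, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if is_healthy(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # backend not up yet, or the response was not valid JSON
        time.sleep(delay)
    return False


if __name__ == "__main__":
    ready = wait_for_backend(attempts=3, delay=0.5)
    print("backend ready" if ready else "backend not reachable")
```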
| Field | Value |
|---|---|
| Story Topic | A brave little robot who gets lost in a magical forest |
| Story Type | Storybook |
| Tone | Playful |
| Target Audience | Children |
| Visual Style | Cartoon |
| Include | Text + Images + Audio |
Expected: Multi-scene illustrated storybook with cartoon images inline, an audio player per scene, and a Reading Mode button.
| Field | Value |
|---|---|
| Story Topic | Launching a sustainable coffee brand for eco-conscious millennials |
| Story Type | Marketing |
| Tone | Inspiring |
| Target Audience | Young Adults |
| Visual Style | Modern |
| Include | Text + Images |
Expected: Structured marketing copy with on-brand visuals.
| Field | Value |
|---|---|
| Story Topic | How the human immune system fights viruses |
| Story Type | Educational |
| Tone | Professional |
| Target Audience | General Public |
| Visual Style | Minimalist |
| Include | Text + Images |
Expected: Clear, well-structured educational narrative with clean illustrations.
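The three scenarios can also be written down as data, which makes them easy to replay against a local backend. This is a sketch only: the `StoryRequest` class, its field names, and the lowercase values are inferred from the form fields in the tables, not taken from the project's Pydantic models.

```python
import json
from dataclasses import asdict, dataclass

# Allowed values are limited to those that appear in the scenario tables.
STORY_TYPES = {"storybook", "marketing", "educational"}
MEDIA = {"text", "images", "audio"}


@dataclass(frozen=True)
class StoryRequest:
    topic: str
    story_type: str
    tone: str
    audience: str
    visual_style: str
    include: tuple = ("text", "images")

    def __post_init__(self):
        # Validate early so a bad scenario fails before any request is sent.
        if not self.topic.strip():
            raise ValueError("topic must not be empty")
        if self.story_type not in STORY_TYPES:
            raise ValueError(f"unsupported story type: {self.story_type}")
        unknown = set(self.include) - MEDIA
        if unknown:
            raise ValueError(f"unsupported media: {sorted(unknown)}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


SCENARIOS = [
    StoryRequest(
        "A brave little robot who gets lost in a magical forest",
        "storybook", "playful", "children", "cartoon",
        ("text", "images", "audio"),
    ),
    StoryRequest(
        "Launching a sustainable coffee brand for eco-conscious millennials",
        "marketing", "inspiring", "young adults", "modern",
    ),
    StoryRequest(
        "How the human immune system fights viruses",
        "educational", "professional", "general public", "minimalist",
    ),
]

if __name__ == "__main__":
    for request in SCENARIOS:
        print(request.to_json())
```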
- Content streams progressively: text and images appear as each scene completes
- Images match the scene and the selected visual style
- Audio player reads the full story text aloud with tone-matched rate/pitch
- Reading Mode: distraction-free full-width reading view, exits cleanly
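Tone-matched narration could be as simple as a lookup from tone to speech rate and pitch, which the browser then applies to a Web Speech API utterance. The preset numbers below are invented for illustration. Only the valid ranges (rate 0.1 to 10, pitch 0 to 2) come from the Web Speech API itself.

```python
from dataclasses import dataclass

# Ranges accepted by SpeechSynthesisUtterance in the Web Speech API.
RATE_RANGE = (0.1, 10.0)
PITCH_RANGE = (0.0, 2.0)

# Hypothetical presets as (rate, pitch); the project's real values may differ.
TONE_PRESETS = {
    "playful": (1.1, 1.3),
    "inspiring": (1.0, 1.1),
    "professional": (0.95, 1.0),
}


@dataclass(frozen=True)
class NarrationConfig:
    rate: float
    pitch: float


def clamp(value: float, bounds: tuple) -> float:
    """Keep a value inside the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


def narration_for(tone: str) -> NarrationConfig:
    """Return rate and pitch for a tone, falling back to neutral 1.0 / 1.0."""
    rate, pitch = TONE_PRESETS.get(tone.lower(), (1.0, 1.0))
    return NarrationConfig(clamp(rate, RATE_RANGE), clamp(pitch, PITCH_RANGE))


if __name__ == "__main__":
    print(narration_for("Playful"))
```

Sending only this small config from the backend keeps audio chunks tiny, since the speech itself is synthesized in the browser.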
| Variable | Required | Default | Description |
|---|---|---|---|
| GEMINI_API_KEY | Yes | (none) | Gemini API key from AI Studio |
| GEMINI_MODEL | No | gemini-2.5-flash | Model to use |
| GOOGLE_CLOUD_PROJECT | No | (none) | GCP project (only needed for Cloud deployment) |
| PORT | No | 8000 | Server port |
| ALLOWED_ORIGINS | No | * | CORS origins |
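
For reference, the backend variables and their defaults could be read along these lines. This is a sketch with an invented `Settings` class and `load_settings` helper, not the project's actual config module. It also assumes that multiple CORS origins are given as a comma-separated list, which the table does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model: str
    google_cloud_project: Optional[str]
    port: int
    allowed_origins: list


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, applying the documented defaults."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required")
    origins = env.get("ALLOWED_ORIGINS", "*")
    return Settings(
        gemini_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT") or None,
        port=int(env.get("PORT", "8000")),
        # Assumption: several origins are separated by commas.
        allowed_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )


if __name__ == "__main__":
    print(load_settings({"GEMINI_API_KEY": "demo-key"}))
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.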
| Variable | Default | Description |
|---|---|---|
| NEXT_PUBLIC_API_URL | http://localhost:8000 | Backend HTTP URL |
| NEXT_PUBLIC_WS_URL | ws://localhost:8000 | Backend WebSocket URL |
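
The two frontend defaults differ only in scheme, so a deployment script might derive the WebSocket URL from the HTTP one. The helper below is a hypothetical sketch in Python (the frontend itself is TypeScript) using only the standard library. When the backend is served over HTTPS, the matching WebSocket scheme is wss.

```python
from urllib.parse import urlsplit, urlunsplit

SCHEME_MAP = {"http": "ws", "https": "wss"}


def websocket_url(api_url: str) -> str:
    """Map an http(s) base URL to the ws(s) URL on the same host and port."""
    parts = urlsplit(api_url)
    if parts.scheme not in SCHEME_MAP:
        raise ValueError(f"expected an http or https URL, got: {api_url!r}")
    return urlunsplit(
        (SCHEME_MAP[parts.scheme], parts.netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(websocket_url("http://localhost:8000"))
```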
# Backend
cd backend
gcloud run deploy futureartist-backend \
--source . \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GEMINI_API_KEY=your-key,GEMINI_MODEL=gemini-2.5-flash
# Frontend
cd ../frontend
gcloud run deploy futureartist-frontend \
--source . \
--region us-central1 \
  --allow-unauthenticated

futureArtist/
├── backend/
│   └── app/
│       ├── adk/       # ADK entry point & config
│       ├── agents/    # Story Planner, Style Director, Text/Image/Audio Generators
│       ├── api/       # FastAPI routes & WebSocket handler
│       └── models/    # Pydantic data models
└── frontend/
    └── src/
        └── components/
            ├── StoryCreator/   # Main creation form
            ├── MediaViewer/    # Streaming content renderer + AudioPlayer
            └── ControlPanel/   # Style, tone, audience controls
Steven (Liang) Chen · Kuan Yu
Built for the Gemini Live Agent Challenge 2026, Creative Storyteller Track
This content was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge
MIT (see LICENSE)
