stevenchendan/futureArtist

Future Artist: Creative Storyteller AI Agent

Gemini Live Agent Challenge 2026 · Track: Creative Storyteller ✍️

Future Artist is a multimodal AI storytelling platform. Give it a topic, a tone, and a target audience, and it generates a complete story with text, AI-illustrated images, and narrated audio, all streamed in real time.

Built with Google Gemini 2.5 Flash · Google ADK · FastAPI · Next.js 14


Live demo: futureartist-frontend-226638196775.us-central1.run.app

| Mode | What you get |
| --- | --- |
| Children's Storybook | Cartoon illustrations + playful narration per scene |
| Marketing Campaign | Brand visuals + inspiring structured copy |
| Educational | Clean diagrams + professional tone |

System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Google Cloud Run                         │
│                                                                  │
│  ┌──────────────────────┐        ┌────────────────────────────┐  │
│  │   Next.js Frontend   │        │    FastAPI Backend         │  │
│  │   (port 3000)        │        │    (port 8000)             │  │
│  │                      │        │                            │  │
│  │  StoryCreator UI     │◄──────►│  /api/story (REST)         │  │
│  │  ControlPanel        │        │  /ws/story  (WebSocket)    │  │
│  │  MediaViewer         │        │                            │  │
│  │  AudioPlayer         │        │  ┌──────────────────────┐  │  │
│  └──────────────────────┘        │  │  Orchestrator Agent  │  │  │
│         ▲  WebSocket stream      │  │  (Google ADK)        │  │  │
│         │  chunks in real-time   │  └──────────┬───────────┘  │  │
│         └────────────────────────┼─────────────┤              │  │
│                                  │             │              │  │
│                                  │ ┌───────────▼────────────┐ │  │
│                                  │ │  Multi-Agent Pipeline  │ │  │
│                                  │ │                        │ │  │
│                                  │ │  Story Planner Agent   │ │  │
│                                  │ │  Style Director Agent  │ │  │
│                                  │ │  Text Generator Agent  │ │  │
│                                  │ │  Image Generator Agent │ │  │
│                                  │ │  Audio Generator Agent │ │  │
│                                  │ └───────────┬────────────┘ │  │
│                                  └─────────────┼──────────────┘  │
└────────────────────────────────────────────────┼─────────────────┘
                                                 │
                                 ┌───────────────▼──────────────┐
                                 │      Google Gemini API       │
                                 │                              │
                                 │  gemini-2.5-flash            │
                                 │  ├─ Text generation          │
                                 │  └─ Image generation         │
                                 └──────────────────────────────┘

Request flow:

  1. User fills the form and clicks Generate
  2. Frontend opens a WebSocket connection to the backend
  3. Backend Orchestrator (Google ADK) kicks off the agent pipeline
  4. Each agent calls the Gemini API and streams results back through the Orchestrator
  5. Backend sends typed chunks (text, image, audio) over the WebSocket as they arrive
  6. Frontend renders each chunk inline as it streams, so the story builds in real time
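
Steps 5 and 6 depend on every WebSocket message carrying a type the frontend can switch on. The wire format is not documented here, so the following is only a minimal sketch, assuming a JSON envelope with hypothetical `type`, `scene`, and `data` fields:

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical chunk types; the real backend may name them differently.
CHUNK_TYPES = {"text", "image", "audio"}


@dataclass
class StoryChunk:
    type: str   # one of CHUNK_TYPES
    scene: int  # 1-based scene number (assumed field)
    data: str   # story text, base64 image data, or narration config

    def to_message(self) -> str:
        """Serialize the chunk to the JSON string sent over the WebSocket."""
        if self.type not in CHUNK_TYPES:
            raise ValueError(f"unknown chunk type: {self.type}")
        return json.dumps(asdict(self))


def parse_message(raw: str) -> StoryChunk:
    """Parse a received message back into a StoryChunk, validating its type."""
    chunk = StoryChunk(**json.loads(raw))
    if chunk.type not in CHUNK_TYPES:
        raise ValueError(f"unknown chunk type: {chunk.type}")
    return chunk
```

A receiver would call `parse_message` on each frame and dispatch on `chunk.type` to pick the text, image, or audio renderer.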

Agent Pipeline

User Request
    └─► Story Planner: narrative structure, scene breakdown, character bios
        └─► Style Director: visual style guide, color palette, character consistency rules
            ├─► Text Generator: scene-by-scene story text (streamed)
            ├─► Image Generator: scene illustrations via Gemini image generation (streamed)
            └─► Audio Generator: narration config; browser reads full text via Web Speech API
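
The ordering above (planner, then style director, then the three generators) can be sketched with `asyncio`. The agent functions below are hypothetical stand-ins that return canned values instead of calling Gemini, and running the three generators concurrently per scene is an assumption rather than something this README states:

```python
import asyncio


async def plan_story(topic: str) -> list[str]:
    # Stand-in for the Story Planner: returns scene outlines.
    return [f"{topic}: scene {i}" for i in (1, 2)]


async def direct_style(scenes: list[str], style: str) -> dict:
    # Stand-in for the Style Director: returns a shared style guide.
    return {"style": style, "scene_count": len(scenes)}


async def generate(kind: str, scene: str, guide: dict) -> dict:
    # Stand-in for the Text / Image / Audio generators.
    return {"type": kind, "scene": scene, "style": guide["style"]}


async def run_pipeline(topic: str, style: str) -> list[dict]:
    scenes = await plan_story(topic)           # 1. narrative structure
    guide = await direct_style(scenes, style)  # 2. one style guide for all scenes
    chunks: list[dict] = []
    for scene in scenes:
        # 3. The three generators share the same plan and style guide.
        results = await asyncio.gather(
            generate("text", scene, guide),
            generate("image", scene, guide),
            generate("audio", scene, guide),
        )
        chunks.extend(results)
    return chunks


if __name__ == "__main__":
    for chunk in asyncio.run(run_pipeline("lost robot", "cartoon")):
        print(chunk)
```

Because the style guide is produced once and passed to every generator call, each scene's image request sees the same character and palette rules.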

Tech Stack

| Layer | Technology |
| --- | --- |
| AI Model | Gemini 2.5 Flash (text + image generation) |
| Agent Framework | Google ADK (Agent Development Kit) |
| Backend | Python 3.11, FastAPI, Uvicorn |
| Frontend | Next.js 14, TypeScript, Tailwind CSS |
| Streaming | WebSocket (real-time chunk delivery) |
| TTS | Web Speech API (browser-native, tone-matched) |
| Hosting | Google Cloud Run (us-central1) |

Local Setup

Prerequisites

1. Clone

git clone https://github.com/stevenchendan/futureArtist.git
cd futureArtist

2. Backend

cd backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # macOS/Linux
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env

Open backend/.env and set your API key. This is the only required change:

GEMINI_API_KEY=your-gemini-api-key-here

Start the backend:

python -m app.adk.main

Confirm it's running:

curl http://localhost:8000/health
# → {"status":"healthy","services":{"gemini":"connected","storage":"connected"}}
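
The same check can be scripted. This is a minimal sketch using only the standard library, assuming the response shape shown above; the helper names `is_healthy` and `check_backend` are hypothetical and not part of the project:

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if the /health payload reports a healthy backend."""
    payload = json.loads(body)
    services = payload.get("services", {})
    return payload.get("status") == "healthy" and all(
        state == "connected" for state in services.values()
    )


def check_backend(base_url: str = "http://localhost:8000") -> bool:
    """Fetch /health from a running backend and evaluate the response."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return is_healthy(resp.read().decode("utf-8"))
```

`check_backend()` needs the server from the previous step to be running; `is_healthy` can be exercised on its own with a captured response body.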

3. Frontend

Open a new terminal from the repo root:

cd frontend   # or: cd ../frontend if you're still in backend/
npm install
npm run dev

The frontend connects to http://localhost:8000 by default. No .env.local changes needed.

Open http://localhost:3000


Reproducible Test Scenarios

Scenario A: Children's Storybook

| Field | Value |
| --- | --- |
| Story Topic | A brave little robot who gets lost in a magical forest |
| Story Type | Storybook |
| Tone | Playful |
| Target Audience | Children |
| Visual Style | Cartoon |
| Include | Text + Images + Audio |

Expected: Multi-scene illustrated storybook with cartoon images inline, audio player per scene, and Reading Mode button.

Scenario B: Marketing Campaign

| Field | Value |
| --- | --- |
| Story Topic | Launching a sustainable coffee brand for eco-conscious millennials |
| Story Type | Marketing |
| Tone | Inspiring |
| Target Audience | Young Adults |
| Visual Style | Modern |
| Include | Text + Images |

Expected: Structured marketing copy with on-brand visuals.

Scenario C: Educational Explainer

| Field | Value |
| --- | --- |
| Story Topic | How the human immune system fights viruses |
| Story Type | Educational |
| Tone | Professional |
| Target Audience | General Public |
| Visual Style | Minimalist |
| Include | Text + Images |

Expected: Clear, well-structured educational narrative with clean illustrations.

What to Verify

  • Content streams progressively: text and images appear as each scene completes
  • Images match the scene and the selected visual style
  • Audio player reads the full story text aloud with tone-matched rate/pitch
  • Reading Mode: distraction-free full-width reading view that exits cleanly
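
To make "tone-matched rate/pitch" concrete, here is a minimal sketch of how a narration config could be derived from the selected tone. The preset values and the `narration_config` helper are illustrative assumptions, since the project's real presets are not documented here. The only fixed facts are the Web Speech API limits: `SpeechSynthesisUtterance.rate` accepts 0.1 to 10 and `pitch` accepts 0 to 2.

```python
# Illustrative values only; the project's actual tone presets may differ.
TONE_PRESETS = {
    "playful": {"rate": 1.1, "pitch": 1.3},
    "inspiring": {"rate": 1.0, "pitch": 1.1},
    "professional": {"rate": 0.95, "pitch": 1.0},
}
DEFAULT_PRESET = {"rate": 1.0, "pitch": 1.0}


def clamp(value: float, low: float, high: float) -> float:
    """Keep value inside the closed interval [low, high]."""
    return max(low, min(high, value))


def narration_config(tone: str) -> dict:
    """Return rate/pitch for a tone, clamped to the Web Speech API ranges."""
    preset = TONE_PRESETS.get(tone.strip().lower(), DEFAULT_PRESET)
    return {
        "rate": clamp(preset["rate"], 0.1, 10.0),
        "pitch": clamp(preset["pitch"], 0.0, 2.0),
    }
```

When verifying Scenario A, a Playful story should sound noticeably different from the Professional narration in Scenario C if audio is enabled for both.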

Environment Variables

Backend (backend/.env)

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GEMINI_API_KEY | Yes | (none) | Gemini API key from AI Studio |
| GEMINI_MODEL | No | gemini-2.5-flash | Model to use |
| GOOGLE_CLOUD_PROJECT | No | (none) | GCP project (only needed for Cloud deployment) |
| PORT | No | 8000 | Server port |
| ALLOWED_ORIGINS | No | * | CORS origins |
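
A minimal sketch of how these variables and defaults might be read on the backend, using only `os.environ`. The `Settings` class and `load_settings` function are hypothetical, and treating `ALLOWED_ORIGINS` as a comma-separated list is an assumption; the real app may load its configuration differently:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model: str
    google_cloud_project: Optional[str]
    port: int
    allowed_origins: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the table's defaults."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # The only required variable: fail fast with a clear message.
        raise RuntimeError("GEMINI_API_KEY is required")
    origins = env.get("ALLOWED_ORIGINS", "*")
    return Settings(
        gemini_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT"),
        port=int(env.get("PORT", "8000")),
        # Assumed format: comma-separated origins, e.g. "https://a.com,https://b.com".
        allowed_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

Passing a plain dict as `env` makes the defaults easy to check without touching the real environment.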

Frontend (frontend/.env.local)

| Variable | Default | Description |
| --- | --- | --- |
| NEXT_PUBLIC_API_URL | http://localhost:8000 | Backend HTTP URL |
| NEXT_PUBLIC_WS_URL | ws://localhost:8000 | Backend WebSocket URL |

Google Cloud Deployment

# Backend
cd backend
gcloud run deploy futureartist-backend \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=your-key,GEMINI_MODEL=gemini-2.5-flash

# Frontend (run from backend/; use "cd frontend" if you start at the repo root)
cd ../frontend
gcloud run deploy futureartist-frontend \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

Project Structure

futureArtist/
├── backend/
│   └── app/
│       ├── adk/            # ADK entry point & config
│       ├── agents/         # Story Planner, Style Director, Text/Image/Audio Generators
│       ├── api/            # FastAPI routes & WebSocket handler
│       └── models/         # Pydantic data models
└── frontend/
    └── src/
        └── components/
            ├── StoryCreator/   # Main creation form
            ├── MediaViewer/    # Streaming content renderer + AudioPlayer
            └── ControlPanel/   # Style, tone, audience controls

Team

Steven (Liang) Chen · Kuan Yu

Built for the Gemini Live Agent Challenge 2026, Creative Storyteller Track

This content was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge


License

MIT (see LICENSE)
