在一起 (Zài Yīqǐ) — Together

A live AI agent that translates, narrates, and creates memories — so people separated by distance can truly be there with each other.

Google NYC Build With AI Hackathon 2026

What It Does

在一起 is not a video call. It's not a translation app. It's presence.

Two people — speaking different languages, thousands of miles apart — have a natural conversation. No delays, no subtitles they have to read, no language barrier. They just talk, and they understand.

When the call ends, the system generates keepsakes: an Interactive Storybook (AI-generated illustrations woven with moments from the call) and a Memory Video (stylized images set to music).

Features

Feature	Description
Live Translation	Bidirectional speech translation via Gemini Live API
Live Captions	Translated text overlay in real time
Presence Narration	"Where are you?" triggers camera → AI describes the environment in the other person's language
Visual Captions	Scene analysis labels from Google Cloud Vision API
Interactive Storybook	Post-call: screenshots + voice snippets → Gemini Interleaved Output → illustrated storybook
Memory Video	Post-call: stylized images → Veo video keepsake
Fallback Pipeline	ElevenLabs STT/TTS if Gemini is rate-limited
Billing Monitor	Live cost tracking with configurable alerts at $3, $5, $9, $12, $15, $19, $21, $23, $24.9
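
The alert thresholds in the Billing Monitor row could be tracked with something like the sketch below. It is illustrative only: the `SpendAlerts` class and its `add_cost` method are hypothetical names, not the contents of the project's billing monitor, and the only values taken from this README are the dollar thresholds.

```python
from dataclasses import dataclass, field

# Thresholds copied from the feature table, in US dollars.
DEFAULT_THRESHOLDS = (3, 5, 9, 12, 15, 19, 21, 23, 24.9)


@dataclass
class SpendAlerts:
    """Hypothetical tracker: reports each threshold once, when first reached."""

    thresholds: tuple = DEFAULT_THRESHOLDS
    total: float = 0.0
    fired: set = field(default_factory=set)

    def add_cost(self, amount: float) -> list:
        """Add a cost and return the thresholds newly reached, ascending."""
        self.total += amount
        newly_reached = [
            t for t in sorted(self.thresholds)
            if self.total >= t and t not in self.fired
        ]
        self.fired.update(newly_reached)
        return newly_reached
```

Each threshold fires at most once, so a caller can forward the returned list straight to whatever notification channel it uses without deduplicating.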

Tech Stack

Component	Service
Real-time translation	Gemini Live API (`gemini-2.5-flash-native-audio-preview`)
Scene analysis	Google Cloud Vision API
Storybook generation	Gemini Interleaved Output (`gemini-2.5-flash-image`)
Image stylization	Gemini image generation
Video generation	Veo (`veo-2.0-generate-001`)
Backend	Python / FastAPI / WebSockets
Hosting	Google Cloud Run
Fallback STT/TTS	ElevenLabs Scribe + Flash
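
The fallback row implies a "try the primary provider, switch on rate limiting" pattern. A minimal sketch of that pattern follows, assuming each provider is wrapped in a plain callable. `RateLimited`, `translate_with_fallback`, and both callables are hypothetical names for illustration; they are not the Gemini or ElevenLabs client APIs.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error a primary-provider wrapper raises when throttled."""


def translate_with_fallback(
    audio: bytes,
    primary: Callable[[bytes], bytes],
    fallback: Callable[[bytes], bytes],
) -> tuple:
    """Return (provider_name, translated_audio), preferring the primary."""
    try:
        return "primary", primary(audio)
    except RateLimited:
        # Only rate limiting triggers the fallback; other errors propagate
        # so that genuine bugs are not silently masked.
        return "fallback", fallback(audio)
```

Returning the provider name alongside the audio lets the caller log which path served each utterance.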

Quick Start

Prerequisites

Python 3.11+
Google AI Studio API key
(Optional) Google Cloud project with Vision API enabled
(Optional) ElevenLabs API key for fallback

Setup

# Enter the cloned repository (clone it first if you have not already)
cd google-nyc-hackathon

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your keys

# Run
python server.py

Open http://localhost:8000 in two browser tabs. Join the same room with different languages. Start talking.
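
Joining the same room with different languages suggests the server keeps a per-room map of who speaks what, so it knows which language to translate each speaker into. The sketch below shows one way to model that. The `Room` class, its methods, and the two-person limit are assumptions for illustration, not a description of `server.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """Hypothetical two-person room: participant id -> spoken language."""

    languages: dict = field(default_factory=dict)

    def join(self, participant_id: str, language: str) -> None:
        # Assumption: a call has exactly two participants.
        if participant_id not in self.languages and len(self.languages) >= 2:
            raise ValueError("room is full")
        self.languages[participant_id] = language

    def target_language(self, speaker_id: str):
        """Language to translate the speaker into, or None if they are alone."""
        for pid, lang in self.languages.items():
            if pid != speaker_id:
                return lang
        return None
```

With one Hindi speaker and one English speaker in the room, `target_language` returns `"en"` for the Hindi speaker and `"hi"` for the English speaker.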

Demo URLs:

http://localhost:8000/demo/live (live translation demo)
http://localhost:8000/demo/memory (storybook + memory video demo)

Deploy to Cloud Run

# Build and deploy
gcloud run deploy together \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars "GOOGLE_API_KEY=your_key" \
  --session-affinity \
  --port 8080

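The --session-affinity flag matters for WebSockets: with it, Cloud Run routes a client's reconnects to the same instance on a best-effort basis, which helps if call state is held in that instance's memory. WebSocket connections remain subject to the service's request timeout.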

After deploying, open:

https://<your-service-url>/demo/live
https://<your-service-url>/demo/memory

How It Works

Person A (Hindi)                    Person B (English)
     │                                    │
     ▼                                    ▼
[Mic] → Audio ─────────┬─────────── Audio ← [Mic]
                       │
                       ▼
              ┌─────────────────┐
              │  Cloud Run      │
              │  (FastAPI)      │
              │                 │
              │  Gemini Live    │◄── Bidirectional translation
              │  Vision API     │◄── Scene analysis (triggered)
              │  Screenshot     │◄── Capture for memorabilia
              └─────────────────┘
                       │
          ┌────────────┴────────────┐
          ▼                         ▼
   Translated audio          Translated audio
   + Hindi captions          + English captions

Post-call: Screenshots, voice snippets, and scene descriptions feed into Gemini Interleaved Output (storybook) and Veo (memory video).
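
To make the post-call step concrete, the sketch below shows one way the captured moments could be reduced to a page-by-page outline before being handed to a generation model. Everything here is hypothetical: `CallMoment`, `build_storybook_outline`, and the `max_pages` default are illustrative names and values, and no model call is made.

```python
from dataclasses import dataclass


@dataclass
class CallMoment:
    """One captured moment from the call (hypothetical structure)."""

    timestamp_s: float
    screenshot_path: str
    transcript: str
    scene_description: str = ""


def build_storybook_outline(moments, max_pages: int = 6) -> list:
    """Pick up to max_pages moments, spread across the call in time order."""
    ordered = sorted(moments, key=lambda m: m.timestamp_s)
    if len(ordered) > max_pages:
        # Sample evenly so the outline covers the whole call.
        step = len(ordered) / max_pages
        ordered = [ordered[int(i * step)] for i in range(max_pages)]
    outline = []
    for page_number, moment in enumerate(ordered, start=1):
        text = moment.transcript
        if moment.scene_description:
            text = f"{text} ({moment.scene_description})"
        outline.append(
            {"page": page_number, "image": moment.screenshot_path, "text": text}
        )
    return outline
```

The resulting list of page briefs could then be serialized into the prompt for the storybook step, or reused to choose which images to stylize for the memory video.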

Project Structure

├── server.py                  # FastAPI WebSocket server
├── storybook_generator.py     # Interactive Storybook pipeline
├── memory_video.py            # Memory Video pipeline
├── static/
│   ├── index.html             # Landing page
│   ├── participant.html       # Call UI
│   └── memories.html          # Post-call memorabilia page
├── requirements.txt
├── Dockerfile
└── .env.example

Hackathon Categories

Live Agents — Bidirectional voice translation with barge-in support, real-time captions, and vision-triggered environment narration.

Creative Storyteller — Gemini Interleaved Output generates a flowing storybook with AI-illustrated pages from a single prompt, weaving screenshots, dialogue, and scene descriptions.
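
Barge-in, mentioned under Live Agents, means that new speech interrupts translated audio that is still playing. One common way to get this behavior with `asyncio` is to run playback as a task and cancel it when a new utterance arrives. The sketch below illustrates that idea only; `PlaybackController` is a hypothetical name, the sleep stands in for sending an audio chunk, and the project may instead rely on interruption handling in the Live API itself.

```python
import asyncio


class PlaybackController:
    """Hypothetical player: a new utterance cancels playback in progress."""

    def __init__(self):
        self._task = None
        self.played = []

    async def _play(self, chunks):
        for chunk in chunks:
            self.played.append(chunk)
            await asyncio.sleep(0.01)  # stands in for sending one audio chunk

    async def start(self, chunks):
        """Interrupt anything already playing, then start the new audio."""
        await self.barge_in()
        self._task = asyncio.create_task(self._play(chunks))

    async def barge_in(self):
        """Stop current playback, if any, and wait for it to unwind."""
        if self._task and not self._task.done():
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        self._task = None
```

Awaiting the cancelled task before starting the next one ensures that chunks from the interrupted utterance never interleave with the new one.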

The Close

在一起 means "together." That's what this is. Not a translation app. Not a video call. A way to actually be there — even when you can't.
