kousen/SoraJava

Sora2Java - OpenAI Sora 2 Video Generation Java Client

This Spring Boot application demonstrates multiple Java approaches to integrate with OpenAI's Sora 2 video generation API. The implementation showcases Java's diverse async patterns, from traditional schedulers to modern virtual threads.

Key Insight: While documentation examples often show simple sleep() loops for clarity, production applications benefit from more sophisticated patterns. Java offers multiple approaches including the unique virtual threads feature that combines the simplicity of synchronous code with async performance.

The Fast-Food Restaurant Analogy

Imagine ordering lunch at a fast-food restaurant:

Synchronous (Blocking) Model:

  • You place your order
  • You and the cashier stand there, staring at each other, waiting for your food
  • You can't leave the counter
  • The cashier can't take anyone else's order
  • If you have as many customers as cashiers, the entire restaurant is "blocked on I/O"

Asynchronous (Polling) Model:

  • You place your order and get a receipt with a number
  • You can grab a table, get napkins, or get a drink
  • The cashier can immediately take other customers' orders
  • Two or three good cashiers can handle the entire lunch rush
  • You periodically check the screen for your number
  • The system is more complicated, but it scales much better

Webhook Model:

  • Like the polling model, but the cashier calls your number when ready
  • No need to check the screen - you're notified
  • Even more efficient for the customer

This is exactly why OpenAI and Google use async models for video generation: generating a video takes 2-4 minutes regardless of the approach, but with async patterns, the server doesn't tie up threads waiting. It's all about scaling for the server.

Why Video Generation is Inherently Asynchronous

Unlike text or image generation (which are synchronous request-response), video generation APIs are inherently asynchronous:

  1. Initial POST → Returns a video_id immediately
  2. Poll or Wait → Check status periodically or register webhook
  3. Download → Retrieve completed video when ready

Time to generate: 2-4 minutes (sync and async take the same time)

Difference: With async, threads aren't blocked while a video renders, so the same server can handle many more concurrent requests (potentially on the order of 1000x)
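The three steps above can be sketched in plain Java. This is a minimal illustration, not the project's actual client: the VideoApi interface, the State names, and the in-memory fakeApi() are all hypothetical stand-ins for real HTTP calls.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical three-step flow: create, poll, download. Not the project's actual classes. */
final class VideoLifecycleSketch {

    // Assumed status values; the real API may use different names.
    enum State { QUEUED, IN_PROGRESS, COMPLETED, FAILED }

    /** Assumed minimal API surface; a real client would issue HTTP calls here. */
    interface VideoApi {
        String create(String prompt);     // step 1: returns a video id immediately
        State status(String videoId);     // step 2: cheap status check
        byte[] download(String videoId);  // step 3: fetch the finished video
    }

    static byte[] generate(VideoApi api, String prompt, Duration interval, int maxChecks) {
        String id = api.create(prompt);
        for (int i = 0; i < maxChecks; i++) {
            State state = api.status(id);
            if (state == State.COMPLETED) return api.download(id);
            if (state == State.FAILED) throw new IllegalStateException("Generation failed: " + id);
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting for " + id, e);
            }
        }
        throw new IllegalStateException("Timed out waiting for " + id);
    }

    /** In-memory stand-in that reports COMPLETED on the third status check. */
    static VideoApi fakeApi() {
        AtomicInteger checks = new AtomicInteger();
        return new VideoApi() {
            @Override public String create(String prompt) { return "video_123"; }
            @Override public State status(String id) {
                return checks.incrementAndGet() >= 3 ? State.COMPLETED : State.IN_PROGRESS;
            }
            @Override public byte[] download(String id) { return new byte[] {1, 2, 3}; }
        };
    }
}
```

Calling VideoLifecycleSketch.generate(VideoLifecycleSketch.fakeApi(), "a cat", Duration.ofMillis(10), 10) returns the fake payload after three status checks. The polling strategies in this project differ mainly in how step 2 is scheduled.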

Features

  • Multiple Client Implementations:

    • Pure Java HttpClient (Java 11+)
    • Spring RestClient (Spring 6.1+)
  • Two Video Generation Modes:

    • Text-to-Video — Generate videos from text prompts
    • Image-to-Video — Animate static images (e.g., American Gothic square dancing!)
  • Four Polling Strategies:

    • VirtualThread — Java 21+ virtual threads ⭐ RECOMMENDED
    • FixedRate — ScheduledExecutorService with fixed intervals
    • SelfScheduling — Dynamic self-rescheduling polling
    • Reactive — Spring WebFlux reactive streams
  • Real-time Progress Tracking — Visual progress bars with percentage updates

  • Webhook Support — Alternative to polling (OpenAI sends notifications)

  • REST API Endpoints — Test all strategies via HTTP

  • Interactive CLI Demo — Side-by-side comparison tool
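Of the polling strategies listed above, self-scheduling is probably the least self-explanatory. The sketch below shows one way it could work: each check schedules the next one only after it finishes, which makes it easy to vary the delay. The class name and the doubling backoff are assumptions for illustration; the project's SelfSchedulingPollingStrategy may adjust intervals differently.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class SelfSchedulingSketch {

    /**
     * Polls isDone until it returns true. Each check is scheduled only after the
     * previous one finishes; the delay doubles after every unsuccessful check,
     * capped at maxDelay. The future completes with the number of checks made.
     */
    static CompletableFuture<Integer> poll(ScheduledExecutorService scheduler,
                                           BooleanSupplier isDone,
                                           Duration initialDelay,
                                           Duration maxDelay) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        schedule(scheduler, isDone, initialDelay, maxDelay, 1, result);
        return result;
    }

    private static void schedule(ScheduledExecutorService scheduler, BooleanSupplier isDone,
                                 Duration delay, Duration maxDelay, int attempt,
                                 CompletableFuture<Integer> result) {
        scheduler.schedule(() -> {
            try {
                if (isDone.getAsBoolean()) {
                    result.complete(attempt);
                } else {
                    Duration doubled = delay.multipliedBy(2);
                    Duration next = doubled.compareTo(maxDelay) > 0 ? maxDelay : doubled;
                    schedule(scheduler, isDone, next, maxDelay, attempt + 1, result);
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e);  // surface failures to the caller
            }
        }, delay.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

No thread is held between checks; the scheduler thread is only busy while isDone runs.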

Prerequisites

  1. Java 21+ (Java 21 or 25 LTS recommended for virtual threads)
  2. Gradle (wrapper included)
  3. OpenAI API Key with Sora 2 access — Set OPENAI_API_KEY environment variable

Getting Started

1. Set up your API key

export OPENAI_API_KEY="sk-proj-..."

2. Build the project

./gradlew build

3. Run the interactive demo

./gradlew run

4. Or run as Spring Boot application

./gradlew bootRun

The application starts on http://localhost:8080

API Endpoints

Text-to-Video Generation (different strategies)

  • POST /api/video/generate/virtualthread
  • POST /api/video/generate/fixedrate
  • POST /api/video/generate/selfscheduling
  • POST /api/video/generate/reactive
  • POST /api/video/generate/restclient
  • POST /api/video/generate/httpclient

Request body:

{
  "prompt": "A serene mountain landscape at sunset with golden light"
}

Image-to-Video Generation

  • POST /api/video/generate/image-to-video

Request body:

{
  "prompt": "The couple suddenly smile and begin square dancing together",
  "image_url": "https://upload.wikimedia.org/wikipedia/commons/c/cc/Grant_Wood_-_American_Gothic_-_Google_Art_Project.jpg"
}

Note: Images are automatically resized to match requested video dimensions. The API requires exact dimension matching, not just aspect ratio.
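As a rough illustration of the exact-dimension requirement, the following sketch scales an image to a target size with the standard java.awt classes. It is not the project's resizing code; the class and method names are hypothetical, and this version simply stretches the image, whereas a real implementation might crop or pad to avoid distortion.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class ImageResizeSketch {

    /** Parses a size string such as "1280x720" into {width, height}. */
    static int[] parseSize(String size) {
        String[] parts = size.toLowerCase().split("x");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected WIDTHxHEIGHT, got: " + size);
        }
        return new int[] { Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()) };
    }

    /** Scales the source to exactly width x height, ignoring the original aspect ratio. */
    static BufferedImage resizeTo(BufferedImage source, int width, int height) {
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();  // release native graphics resources
        }
        return target;
    }
}
```

With the configured sora.video.size of 1280x720, parseSize yields the two values to pass to resizeTo before the image is uploaded.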

Utility Endpoints

  • GET /api/video/strategies — List available strategies with descriptions
  • GET /api/video/health — Health check

Webhook Endpoint

  • POST /api/webhook/sora — Receive OpenAI webhook events
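A webhook handler mostly needs to decide what to do with each incoming event. The sketch below shows that decision in isolation. The WebhookEvent record, the Action enum, and the event type strings "video.completed" and "video.failed" are assumptions for illustration; check them against OpenAI's webhook documentation and the project's WebhookController before relying on them.

```java
final class WebhookDispatchSketch {

    /** Already-parsed webhook payload; the field names here are illustrative. */
    record WebhookEvent(String type, String videoId) {}

    enum Action { DOWNLOAD_VIDEO, REPORT_FAILURE, IGNORE }

    /** Maps an event type to a follow-up action; unknown types are ignored. */
    static Action decide(WebhookEvent event) {
        return switch (event.type()) {
            case "video.completed" -> Action.DOWNLOAD_VIDEO;  // assumed event name
            case "video.failed" -> Action.REPORT_FAILURE;     // assumed event name
            default -> Action.IGNORE;
        };
    }
}
```

Ignoring unknown event types, rather than failing, keeps the endpoint tolerant of events added later.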

Example Usage

Using cURL

Text-to-Video:

curl -X POST http://localhost:8080/api/video/generate/virtualthread \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat playing with a ball of yarn in a sunny garden"}'

Image-to-Video:

curl -X POST http://localhost:8080/api/video/generate/image-to-video \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The couple suddenly smile and begin square dancing together",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/c/cc/Grant_Wood_-_American_Gothic_-_Google_Art_Project.jpg"
  }'
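The text-to-video cURL call can also be issued from Java with the JDK's built-in java.net.http client. This is a sketch: the class and method names are hypothetical, the JSON escaping is deliberately minimal (quotes and backslashes only), and the response is returned as a raw string because the response format is not documented here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class GenerateRequestSketch {

    /** Builds the POST request for a strategy such as "virtualthread". */
    static HttpRequest textToVideo(String baseUrl, String strategy, String prompt) {
        String json = "{\"prompt\": \"" + escape(prompt) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/video/generate/" + strategy))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Minimal escaping for illustration; use a JSON library for arbitrary input. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Sends the request and returns the raw response body. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, send(textToVideo("http://localhost:8080", "virtualthread", "A cat playing with a ball of yarn in a sunny garden")) mirrors the first cURL command. Because generation takes minutes, expect the call to block for a while unless the endpoint returns early.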

Interactive Demo

./gradlew run

The demo presents an interactive menu where you can:

  • Choose which polling strategy to test (text-to-video)
  • Test image-to-video with American Gothic painting
  • Enter custom prompts
  • See real-time progress bars during generation
  • Compare performance across strategies

Configuration

Edit src/main/resources/application.properties:

# OpenAI API Configuration
openai.api.key=${OPENAI_API_KEY}
sora.api.model=sora-2
sora.api.base-url=https://api.openai.com/v1

# Video Parameters
sora.video.size=1280x720
sora.video.seconds=8

# Polling Configuration
sora.polling.interval-seconds=5
sora.polling.max-timeout-minutes=10

# Output
sora.output.directory=./videos
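To make the polling settings concrete, here is a small sketch that reads the two polling properties with java.util.Properties and derives how many status checks fit inside the timeout. The PollingConfig record and the parse method are hypothetical; in the Spring Boot application itself these values would more likely be bound through Spring's own property injection.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;

final class PollingConfigSketch {

    record PollingConfig(Duration interval, Duration timeout) {
        /** Upper bound on status checks before giving up. */
        long maxChecks() {
            return timeout.toMillis() / interval.toMillis();
        }
    }

    /** Parses properties text, falling back to the defaults shown in application.properties. */
    static PollingConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long intervalSeconds = Long.parseLong(
                props.getProperty("sora.polling.interval-seconds", "5"));
        long timeoutMinutes = Long.parseLong(
                props.getProperty("sora.polling.max-timeout-minutes", "10"));
        return new PollingConfig(Duration.ofSeconds(intervalSeconds),
                Duration.ofMinutes(timeoutMinutes));
    }
}
```

With the values above, a 10-minute timeout and a 5-second interval allow at most 600 / 5 = 120 status checks per video.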

Polling Strategy Comparison

Strategy        Requires      Complexity  Scalability  Best For
VirtualThread   Java 21+      Low         Excellent    Modern Java apps (recommended)
Reactive        Java 8+       High        Excellent    High-concurrency reactive apps
FixedRate       Java 8+       Medium      Good         Traditional enterprise apps
SelfScheduling  Java 8+       Medium      Good         Dynamic interval adjustments
RestClient      Spring 6.1+   Low         Limited      Simple use cases only
HttpClient      Java 11+      Low         Limited      Simple use cases only
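For comparison with the virtual-thread loop, the FixedRate row could look roughly like the sketch below, which uses ScheduledExecutorService.scheduleAtFixedRate and a CompletableFuture to signal completion. The class name and the BooleanSupplier parameter are assumptions for illustration, not the project's FixedRatePollingStrategy.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class FixedRateSketch {

    /** Runs isDone at a fixed interval and completes the future once it returns true. */
    static CompletableFuture<Void> pollUntilDone(ScheduledExecutorService scheduler,
                                                 BooleanSupplier isDone,
                                                 long intervalMillis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        ScheduledFuture<?> task = scheduler.scheduleAtFixedRate(() -> {
            try {
                if (isDone.getAsBoolean()) {
                    done.complete(null);
                }
            } catch (RuntimeException e) {
                done.completeExceptionally(e);  // report the failure instead of silently stopping
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        // Cancel the periodic task once the future completes, however that happens.
        done.whenComplete((ok, err) -> task.cancel(false));
        return done;
    }
}
```

The extra moving parts (a future, a cancellable task, explicit error propagation) are what the "Medium" complexity rating reflects. A maximum wait can be added with CompletableFuture.orTimeout on the returned future.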

Why Virtual Threads Win

Virtual threads (Java 21+) give you the best of both worlds:

// Looks like simple blocking code with progress tracking
do {
    Thread.sleep(5000);
    status = client.checkVideoStatus(videoId);

    // Display progress bar if available
    if (status.progress() != null && status.progress() > 0) {
        int barLength = 30;
        int filledLength = (int) ((status.progress() / 100.0) * barLength);
        String bar = "=".repeat(filledLength) + "-".repeat(barLength - filledLength);
        System.out.printf("\r%s: [%s] %d%%",
                status.status(), bar, status.progress());
    }
} while (!status.isDone());

But it's running on virtual threads:

  • Extremely lightweight (millions possible)
  • No thread pool exhaustion
  • Simple, readable code
  • Excellent performance
  • Easy to add features like progress tracking

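This scales far better than running the same sleep() loop on ordinary platform threads:

  1. A blocking sleep() loop on a platform thread (as in the simple Python documentation examples) holds an OS thread for the entire wait, and OS threads are expensive and limited
  2. Java virtual threads are cheap and plentiful, and a sleeping virtual thread releases its carrier thread
  3. Same simple code, vastly better scalability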
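The scalability point can be tried out with a small simulation. The sketch below starts many blocking polling loops at once, one virtual thread each, using Executors.newVirtualThreadPerTaskExecutor() (Java 21+). The polling is simulated with Thread.sleep only, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class VirtualThreadPollingSketch {

    /** Simulated job: each poll is a blocking sleep; the job finishes after `polls` checks. */
    static int pollJob(int polls, Duration interval) throws InterruptedException {
        int checks = 0;
        do {
            Thread.sleep(interval.toMillis());  // parks the virtual thread, frees its carrier
            checks++;
        } while (checks < polls);
        return checks;
    }

    /** Runs `jobs` polling loops concurrently and returns the total number of checks made. */
    static int runConcurrently(int jobs, int polls, Duration interval) {
        List<Future<Integer>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < jobs; i++) {
                futures.add(executor.submit(() -> pollJob(polls, interval)));
            }
        }  // close() waits for all submitted tasks to finish
        int total = 0;
        try {
            for (Future<Integer> f : futures) {
                total += f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total;
    }
}
```

Running runConcurrently(10_000, 3, Duration.ofMillis(100)) should finish in well under the time 10,000 sequential loops would take, since sleeping virtual threads are parked rather than holding an OS thread each.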

Comparison with Other Languages

Documentation Examples (Python, JS, etc.):

import time

operation = client.models.generate_videos(...)

# Simple polling loop - clear and educational
while not operation.done:
    print("Waiting...")
    time.sleep(10)
    operation = client.operations.get(operation)

Production Patterns:

All modern languages offer sophisticated async patterns:

  • Python: asyncio with async/await for non-blocking I/O
  • JavaScript: Promises and async/await with event loop
  • Java: Multiple options including virtual threads

Java Virtual Threads (unique advantage):

// Looks like simple blocking code
do {
    Thread.sleep(5000);
    status = client.checkVideoStatus(videoId);
} while (!status.isDone());

// But runs on lightweight virtual threads
// Scales to millions of concurrent operations

Why Virtual Threads Stand Out:

  • Write familiar synchronous-looking code
  • Get async performance automatically
  • No need for async/await keywords throughout codebase
  • Existing blocking libraries work without modification

For production web applications, this project demonstrates how Java's diverse async toolkit (schedulers, reactive streams, virtual threads) provides flexibility for different architectural needs.

Cost Considerations

Pricing

  • Sora 2: $0.10/second
  • Sora 2 Pro: $0.30/second
  • Example: 8-second video @ 720p = $0.80
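The arithmetic behind the example is simply rate times duration. A small sketch, using BigDecimal to avoid floating-point rounding in money values; the class name is hypothetical and the rates are copied from the list above, so they may be out of date.

```java
import java.math.BigDecimal;

final class SoraCostSketch {

    // Per-second rates as listed above; verify against current OpenAI pricing.
    static final BigDecimal SORA_2_PER_SECOND = new BigDecimal("0.10");
    static final BigDecimal SORA_2_PRO_PER_SECOND = new BigDecimal("0.30");

    /** Cost in dollars of one video of the given length. */
    static BigDecimal cost(BigDecimal ratePerSecond, int seconds) {
        return ratePerSecond.multiply(BigDecimal.valueOf(seconds));
    }
}
```

For an 8-second clip, cost(SORA_2_PER_SECOND, 8) gives 0.80 and cost(SORA_2_PRO_PER_SECOND, 8) gives 2.40.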

⚠️ Important: Each video generation costs money. The interactive demo warns you before each generation.

Project Structure

src/main/java/com/kousenit/sora2java/
├── client/              # HTTP client implementations
│   ├── SoraVideoClient.java              # Text-to-video interface
│   ├── SoraImageVideoClient.java         # Image-to-video interface
│   ├── HttpClientSoraVideoClient.java
│   ├── RestClientSoraVideoClient.java
│   └── RestClientSoraImageVideoClient.java  # Image-to-video with multipart upload
├── controller/          # REST endpoints
│   ├── VideoGenerationController.java    # Both text & image-to-video endpoints
│   └── WebhookController.java
├── model/               # Data models (records)
│   └── SoraRecords.java                  # VideoRequest, ImageVideoRequest, VideoStatus, etc.
├── service/             # Business logic and polling strategies
│   ├── PollingStrategy.java (sealed interface)
│   ├── VirtualThreadPollingStrategy.java ⭐  # Text-to-video with progress bars
│   ├── FixedRatePollingStrategy.java
│   ├── SelfSchedulingPollingStrategy.java
│   ├── ReactivePollingStrategy.java
│   ├── ImageVideoGenerationService.java   # Image-to-video with progress bars
│   └── VideoGenerationService.java
├── SoraVideoDemo.java   # Interactive CLI demo (text & image-to-video)
└── Sora2JavaApplication.java

Modern Java Features

This project showcases cutting-edge Java features:

  • Virtual Threads (Java 21+) — Lightweight concurrency
  • Records — Immutable data models
  • Sealed Interfaces — Compile-time exhaustiveness checking
  • Pattern Matching — Enhanced switch expressions
  • Text Blocks — Multi-line strings
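Several of these features work well together. The sketch below combines a sealed interface, records, and pattern matching in a switch (Java 21+). The type names are hypothetical and are not taken from SoraRecords.java.

```java
final class ModernJavaSketch {

    /** Closed set of outcomes; the compiler knows every permitted subtype. */
    sealed interface VideoResult permits Completed, Failed, InProgress {}

    record Completed(String videoId, String path) implements VideoResult {}
    record Failed(String videoId, String reason) implements VideoResult {}
    record InProgress(String videoId, int progress) implements VideoResult {}

    /** No default branch is needed: the switch is exhaustive over the sealed hierarchy. */
    static String describe(VideoResult result) {
        return switch (result) {
            case Completed c -> "Video %s saved to %s".formatted(c.videoId(), c.path());
            case Failed f -> "Video %s failed: %s".formatted(f.videoId(), f.reason());
            case InProgress p -> "Video %s at %d%%".formatted(p.videoId(), p.progress());
        };
    }
}
```

If a new record is later added to the permits list, describe stops compiling until it handles the new case, which is the compile-time exhaustiveness check mentioned above.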

Educational Value

This project demonstrates:

  1. Production-Ready Integration — Real OpenAI Sora 2 API with proper error handling
  2. Async Pattern Comparison — 4 different concurrency approaches
  3. Multiple Input Modalities — Both text-to-video and image-to-video generation
  4. Modern Java Showcase — Records, sealed interfaces, virtual threads, progress tracking
  5. Fast-Food Analogy — Clear explanation of async benefits
  6. Multipart File Uploads — Automatic image download, resize, and upload

Perfect for:

  • Java developers learning AI API integration
  • Understanding when and why to use async patterns
  • Comparing different Java concurrency approaches
  • Creating educational YouTube content about async programming
  • Learning how to handle multipart form-data with Spring RestClient

License

MIT License - Feel free to use this for learning, teaching, or production projects.

Contributing

This is an educational project demonstrating async patterns in Java. Feel free to:

  • Use it in your training courses
  • Create YouTube videos about it
  • Adapt the patterns for your own projects
  • Submit PRs with improvements

Built with Java 21, 🍃 Spring Boot 3.5, and the belief that Java's async patterns deserve more recognition in the AI/ML integration space.
