🚀 Prizm - The Ultimate AI Media Processing Platform

Prizm is a multi-provider AI media processing platform that unifies 15+ AI providers, supports 500+ models, and features Dynamic Provider & Service Loading - enabling a decentralized ecosystem with Go-like module loading.

🌐 NEW: Dynamic Provider & Service Loading (June 2025)

The breakthrough feature for ecosystem building: Load providers and services from URLs at runtime!

// 🔄 Load providers from GitHub repositories
const githubProvider = await getProvider('https://github.com/company/custom-ai-provider');

// 📦 Load providers from NPM packages
const npmProvider = await getProvider('@company/enterprise-provider@2.1.0');

// 🤝 Providers dynamically load their service dependencies
await githubProvider.configure({
  serviceUrl: 'github:company/gpu-accelerated-service@v2.0.0',
  serviceConfig: { enableGPU: true, memory: '24GB' }
});

// 🚀 Zero-setup deployment (`input` is the text or asset to transform)
const model = await githubProvider.getModel('custom-model');
const result = await model.transform(input);

Benefits:

🔄 Dynamic Dependencies: Providers specify exact service needs via URL
📦 Decentralized Ecosystem: Community-driven provider/service development
🚀 Zero Setup: Just specify URL, everything else automatic
🔒 Service Isolation: Each provider can use different service versions
🌐 Version Control: Use semantic versioning for reproducible deployments

➡️ Quick Start: Dynamic Loading Guide
➡️ Architecture: Dynamic Loading System

🔥 Universal Role Compatibility - GAME CHANGER

The breakthrough feature that changes everything: Any asset can be input to any model through automatic provider-based conversions.

// ✨ Text → Image → Video pipeline (automatically!)
const textAsset = TextAsset.fromString("A sunset over mountains");
const video = await imageToVideoModel.transform(textAsset);
// Behind the scenes: Text →(DALL-E)→ Image →(Runway)→ Video

// ✨ Video → Audio extraction (automatically!)  
const videoAsset = VideoAsset.fromFile('movie.mp4');
const audio = await audioModel.transform(videoAsset);
// Behind the scenes: Video →(FFmpeg)→ Audio

// ✨ Audio → Text transcription (automatically!)
const audioAsset = AudioAsset.fromFile('speech.wav');
const transcript = await textModel.transform(audioAsset);
// Behind the scenes: Audio →(Whisper)→ Text

// 🎯 The magic: inputAsset.asRole(RequiredType)

Benefits:

🌍 Universal Input: ANY asset → ANY model
🔄 Automatic Pipelines: Complex workflows become simple
🛡️ Type Safe: Full TypeScript support
🚀 Future Proof: New providers enhance ALL assets
🎨 Composable: Chain any transformations seamlessly
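
How a chain such as Text → Image → Video might be resolved can be sketched as a shortest-path search over the available conversions. This is only an illustration: the `Role`, `Conversion`, and `findConversionPath` names are hypothetical and not part of Prizm's API, and the conversion table simply restates the "behind the scenes" comments above.

```typescript
// Hypothetical sketch: none of these names are Prizm's actual API.
type Role = "text" | "image" | "video" | "audio";

interface Conversion {
  from: Role;
  to: Role;
  via: string; // provider assumed to perform this step
}

// Conversion steps taken from the comments in the example above.
const conversions: Conversion[] = [
  { from: "text", to: "image", via: "DALL-E" },
  { from: "image", to: "video", via: "Runway" },
  { from: "video", to: "audio", via: "FFmpeg" },
  { from: "audio", to: "text", via: "Whisper" },
];

// Breadth-first search for the shortest chain of conversions between two roles.
function findConversionPath(from: Role, to: Role, table: Conversion[]): Conversion[] | null {
  if (from === to) return [];
  const visited = new Set<Role>([from]);
  const queue: Array<{ role: Role; path: Conversion[] }> = [{ role: from, path: [] }];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const step of table) {
      if (step.from !== current.role || visited.has(step.to)) continue;
      const path = [...current.path, step];
      if (step.to === to) return path;
      visited.add(step.to);
      queue.push({ role: step.to, path });
    }
  }
  return null; // no chain of conversions connects the two roles
}

const textToVideo = findConversionPath("text", "video", conversions);
console.log(textToVideo?.map((s) => `${s.from} -(${s.via})-> ${s.to}`).join(", "));
```

Breadth-first search is used here because it returns the chain with the fewest conversion steps; a real implementation could just as well weigh cost or quality per step.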

⚡ One-Line Magic

// NEW: Single await - ultra clean!
const image = await $$("replicate")("flux-schnell")("A majestic dragon");
const speech = await $$("elevenlabs")("voice-id")("Hello world!");

// LEGACY: Double await - still works
const imageLegacy = await (await $("replicate")("flux-schnell"))("A majestic dragon");
const speechLegacy = await (await $("elevenlabs")("voice-id"))("Hello world!");

// Core SDK - maximum control
const registry = ProviderRegistry.getInstance();
const provider = await registry.getProvider('elevenlabs');
const model = await provider.createTextToAudioModel('voice-id');
const result = await model.transform(Text.fromString("Hello world!"), options);  // options: model-specific settings

// REST API - language agnostic
POST /api/v1/transform/elevenlabs/voice-id
{ "capability": "text-to-audio", "input": "Hello world!" }
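
One way a single-await call chain like `$$` could work is to keep provider and model selection synchronous and defer all async work to the final call. The sketch below is an assumption about the general technique, not Prizm's implementation; `SketchModel`, `ModelLoader`, `createFluent`, and the stub loader are hypothetical names.

```typescript
// Hypothetical stand-ins; the real Prizm registry and model types are not shown in this README.
interface SketchModel {
  transform(input: string, options?: Record<string, unknown>): Promise<string>;
}
type ModelLoader = (provider: string, model: string) => Promise<SketchModel>;

// Builds a `$$`-style function on top of an async model loader.
// Provider and model selection only capture their arguments; the model is
// loaded inside the final call, so the caller needs a single `await`.
function createFluent(loadModel: ModelLoader) {
  return (provider: string) =>
    (model: string) =>
      async (input: string, options?: Record<string, unknown>): Promise<string> => {
        const resolved = await loadModel(provider, model);
        return resolved.transform(input, options);
      };
}

// Stub loader so the sketch runs without network access or API keys.
const stubLoader: ModelLoader = async (provider, model) => ({
  transform: async (input) => `${provider}/${model}: ${input}`,
});

const fluent = createFluent(stubLoader);
fluent("elevenlabs")("voice-id")("Hello world!").then((output) => console.log(output));
```

The double-await form arises when the second call itself returns a promise of a callable model; deferring the load as above avoids that at the cost of surfacing "unknown model" errors only when the input is submitted.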

🏗️ Layered Architecture

Prizm is organized into the following layers and supporting components:

Dynamic Loading - Go-like module loading: getProvider('github:owner/repo@v1.0.0')
Provider → Service - Dynamic service dependency management
Core SDK - provider→model→transform foundation
Fluent API - zero-config one-liners: $("provider")("model")(input)
REST API - language-agnostic HTTP interface
Job System - async workflows with generation chains
Smart Assets - format-agnostic loading with auto-detection
Asset Utilities - rich helper methods for manipulation
Type Guards - runtime safety and role checking
Provider Utils - discovery and health management
Job Management - complete workflow orchestration
Format Registry - extensible format detection system

🎯 Target Users

AI Agent Frameworks (LangChain, AutoGen, custom agents)
Multi-modal Applications requiring consistent media transformation
Workflow Orchestrators needing reliable media processing
Developer Tools that want to add media capabilities
Production Applications requiring enterprise-grade media infrastructure

🌟 World-Class Features

🌐 Dynamic Loading: Load providers from GitHub/NPM at runtime with Go-like module system
🤝 Provider → Service: Providers automatically load and manage their service dependencies
🔌 15+ AI Providers: FAL.ai, Replicate, Together.ai, OpenRouter, HuggingFace, OpenAI + Local Docker Services
🧠 500+ AI Models: Access any model through unified interfaces with dynamic discovery
🎨 Smart Asset System: Load any format, get the right capabilities automatically
🐳 Docker Services: Local FFMPEG, Chatterbox TTS, Whisper STT for privacy and control
🔄 Job System: Async processing with complete generation chain tracking
💰 Cost Optimization: Automatic free model detection and intelligent provider selection
🛡️ Enterprise Ready: Auto-scaling, failover, load balancing, comprehensive testing
📱 Language Agnostic: REST API works with any programming language
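
As an illustration of the Cost Optimization item, the sketch below prefers free models and otherwise picks the cheapest one. It is a minimal sketch under stated assumptions: `ModelInfo`, `isFreeModel`, and `pickCheapest` are hypothetical names, the prices are invented for the example, and the `:free` suffix check only mirrors the OpenRouter model ID used elsewhere in this README.

```typescript
// Hypothetical model metadata; Prizm's real discovery types are not shown here.
interface ModelInfo {
  id: string;
  provider: string;
  costPerCall: number; // assumed unit: USD per request
}

// Treat a model as free if it costs nothing or carries a ":free" suffix.
function isFreeModel(model: ModelInfo): boolean {
  return model.costPerCall === 0 || model.id.endsWith(":free");
}

// Free models sort first; remaining ties are broken by ascending cost.
function pickCheapest(models: ModelInfo[]): ModelInfo | undefined {
  return [...models].sort((a, b) => {
    const freeDiff = Number(isFreeModel(b)) - Number(isFreeModel(a));
    return freeDiff !== 0 ? freeDiff : a.costPerCall - b.costPerCall;
  })[0];
}

// Illustrative catalog; the paid entries and all prices are made up.
const catalog: ModelInfo[] = [
  { id: "example/paid-chat-large", provider: "openrouter", costPerCall: 0.01 },
  { id: "deepseek/deepseek-chat:free", provider: "openrouter", costPerCall: 0 },
  { id: "example/paid-chat-small", provider: "together", costPerCall: 0.002 },
];

console.log(pickCheapest(catalog)?.id);
```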

🚀 Quick Start

1. Dynamic Provider Loading (NEW!)

import { getProvider } from 'prizm';

// Load provider from GitHub
const provider = await getProvider('https://github.com/company/custom-provider');

// Configure with dynamic service
await provider.configure({
  serviceUrl: 'github:company/specialized-service@v1.0.0',
  serviceConfig: { enableGPU: true }
});

// Use like any other provider (`input` is the text or asset to transform)
const model = await provider.getModel('custom-model');
const result = await model.transform(input);

2. Core SDK Usage

import { ProviderRegistry, Text } from 'prizm';

const registry = ProviderRegistry.getInstance();
const provider = await registry.getProvider('elevenlabs');
const model = await provider.createTextToAudioModel('voice-id');
const result = await model.transform(Text.fromString("Hello world!"));

3. Fluent API (Zero Config)

import { $$ } from 'prizm';

// NEW: Single await pattern (cleanest)
const speech = await $$("elevenlabs")("voice-id")("Hello world!");
const image = await $$("replicate")("flux-schnell")("A beautiful sunset");
const video = await $$("runway")("gen-3")("Dancing robot", { duration: 5 });

// LEGACY: Double await pattern (still works)
const speechLegacy = await (await $("elevenlabs")("voice-id"))("Hello world!");

4. Smart Asset Loading

import { AssetLoader } from 'prizm';

const asset = AssetLoader.load('video.mp4');  // Auto-detects format + roles
const video = await asset.asVideo();           // Type-safe video access
const audio = await asset.extractAudio();     // FFmpeg integration
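
To make "auto-detects format + roles" concrete, here is a minimal sketch of extension-based role detection. It is not the `AssetLoader` implementation: `AssetRole`, `rolesByExtension`, `detectRoles`, and `supportsRoles` are hypothetical, and the mapping is assumed except for the mp4 entry, which follows the "Video + Audio + Speech" comment used in this README.

```typescript
type AssetRole = "video" | "audio" | "speech" | "image" | "text";

// Assumed mapping from file extension to the roles an asset can play.
const rolesByExtension: Record<string, AssetRole[]> = {
  mp4: ["video", "audio", "speech"],
  wav: ["audio", "speech"],
  mp3: ["audio", "speech"],
  png: ["image"],
  txt: ["text"],
};

// Look up roles by the (case-insensitive) extension after the last dot.
function detectRoles(filename: string): AssetRole[] {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return [];
  const ext = filename.slice(dot + 1).toLowerCase();
  return rolesByExtension[ext] ?? [];
}

// True only if the file can play every required role.
function supportsRoles(filename: string, required: AssetRole[]): boolean {
  const roles = detectRoles(filename);
  return required.every((role) => roles.includes(role));
}

console.log(detectRoles("video.mp4"));
console.log(supportsRoles("video.mp4", ["speech"]));
```

Extension lookup is the simplest possible strategy; a production loader would more likely inspect file contents as well, since extensions can be missing or wrong.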

5. REST API (Any Language)

# Start the server
npm install && npm run dev

# Make requests from any language
curl -X POST http://localhost:3000/api/v1/transform/replicate/flux-schnell \
  -H "Content-Type: application/json" \
  -d '{"capability": "text-to-image", "input": "A majestic dragon"}'

🏗️ Prizm SDK Architecture

Layer 1: Dynamic Loading (Ecosystem Building)

// Load providers from any source
const githubProvider = await getProvider('github:company/ai-provider@v2.1.0');
const npmProvider = await getProvider('@company/enterprise-provider@latest');
const localProvider = await getProvider('file:///path/to/local/provider');

// Providers auto-load their service dependencies
await githubProvider.configure({
  serviceUrl: 'github:company/gpu-service@v1.0.0',
  serviceConfig: { enableGPU: true, memory: '24GB' }
});
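
The specifier forms above (GitHub shorthand or URL, NPM package, local file) could be told apart with a small parser. This is a sketch of one possible approach, not how `getProvider` actually resolves sources; `ProviderSource` and `parseProviderSpecifier` are hypothetical names.

```typescript
// Hypothetical result type for a parsed provider specifier.
type ProviderSource =
  | { kind: "github"; owner: string; repo: string; ref: string | undefined }
  | { kind: "npm"; pkg: string; version: string | undefined }
  | { kind: "file"; path: string };

function parseProviderSpecifier(spec: string): ProviderSource {
  // Local providers: file:///path/to/local/provider
  if (spec.startsWith("file://")) {
    return { kind: "file", path: spec.slice("file://".length) };
  }
  // GitHub providers: github:owner/repo@ref or https://github.com/owner/repo
  const github = /^(?:github:|https:\/\/github\.com\/)([^/@]+)\/([^/@]+)(?:@(.+))?$/.exec(spec);
  if (github) {
    return { kind: "github", owner: github[1], repo: github[2], ref: github[3] };
  }
  // NPM providers: name@version or @scope/name@version
  const npm = /^(@[^/@]+\/[^/@]+|[^/@]+)(?:@(.+))?$/.exec(spec);
  if (npm) {
    return { kind: "npm", pkg: npm[1], version: npm[2] };
  }
  throw new Error(`Unrecognized provider specifier: ${spec}`);
}

console.log(parseProviderSpecifier("github:company/ai-provider@v2.1.0"));
console.log(parseProviderSpecifier("@company/enterprise-provider@latest"));
console.log(parseProviderSpecifier("file:///path/to/local/provider"));
```

Pinning an explicit ref or version, rather than `latest`, is what makes a deployment reproducible.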

Layer 2: Core SDK (Maximum Control)

// Full control over every aspect
const registry = ProviderRegistry.getInstance();
const provider = await registry.getProvider('replicate');
const models = provider.getModelsForCapability('text-to-image');
const model = await provider.getModel('flux-schnell');
const result = await model.transform(input, { steps: 4, aspect_ratio: "16:9" });

Layer 3: Smart Asset System (Format-Agnostic)

// Zero-config asset loading with auto-detection
const asset = AssetLoader.load('video.mp4');  // Auto-detects: Video + Audio + Speech
const formatInfo = AssetLoader.getFormatInfo('video.mp4');
const canDoSpeech = AssetLoader.supportsRoles('video.mp4', ['speech']);

Layer 4: Job System (Production Workflows)

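// Async processing with generation chains
const response = await fetch('/api/v1/transform/replicate/flux-schnell', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ capability: 'text-to-image', input: 'Dragon' })
});
const { jobId } = await response.json();
// pollJobUntilComplete is an application-level helper, not shown here
const result = await pollJobUntilComplete(jobId);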
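
A polling helper along these lines might look like the sketch below. It is an assumption, not Prizm code: the `JobStatus` shape is hypothetical (the jobs endpoint's response format is not documented in this README), and unlike the one-argument call above, this version takes the status fetcher as a parameter so it can run against a stub.

```typescript
// Hypothetical job status shape; the real API response may differ.
type JobStatus =
  | { state: "pending" }
  | { state: "completed"; output: string }
  | { state: "failed"; error: string };

type StatusFetcher = (jobId: string) => Promise<JobStatus>;

// Polls at a fixed interval until the job completes, fails, or attempts run out.
async function pollJobUntilComplete(
  jobId: string,
  fetchStatus: StatusFetcher,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus(jobId);
    if (job.state === "completed") return job.output;
    if (job.state === "failed") throw new Error(`Job ${jobId} failed: ${job.error}`);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}

// Stub fetcher: reports "pending" twice, then "completed".
let demoCalls = 0;
const demoFetcher: StatusFetcher = async () =>
  ++demoCalls < 3 ? { state: "pending" } : { state: "completed", output: "image-url" };

pollJobUntilComplete("job-123", demoFetcher, 10).then((output) => console.log(output));
```

In an application, the fetcher would wrap a request to the jobs endpoint; capping the number of attempts keeps a stuck job from polling forever.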

📁 Prizm SDK Structure

src/
├── media/                     # Core Prizm SDK
│   ├── registry/             # Provider registry and bootstrapping
│   ├── providers/            # Provider implementations
│   │   ├── elevenlabs/      # ElevenLabs TTS provider package
│   │   ├── falai/           # FAL.ai provider package
│   │   ├── replicate/       # Replicate provider package
│   │   ├── together/        # Together.ai provider package
│   │   ├── openrouter/      # OpenRouter provider package
│   │   ├── creatify/        # Creatify AI avatar provider package
│   │   └── docker/          # Docker-based local providers
│   │       ├── zonos/       # Zonos TTS voice cloning
│   │       ├── huggingface/ # HuggingFace models
│   │       ├── chatterbox/  # Chatterbox TTS
│   │       └── ffmpeg/      # FFMPEG processing
│   ├── assets/              # Smart asset loading system
│   │   ├── roles/           # Role-based asset classes (Audio, Video, Text, Image)
│   │   ├── mixins/          # Role mixin implementations
│   │   └── SmartAssetFactory.ts  # Format-agnostic asset loading
│   ├── fluent/              # Fluent API ($("provider")("model") syntax)
│   ├── capabilities/        # Provider capability system
│   ├── models/              # Model abstractions and implementations
│   └── types/               # TypeScript type definitions
├── app/api/v1/              # REST API endpoints
│   ├── transform/           # Transformation endpoints
│   ├── jobs/                # Job management system
│   ├── providers/           # Provider discovery endpoints
│   └── capabilities/        # Capability listing endpoints
└── services/                # Base Docker service management

services/                    # Docker service configurations
├── ffmpeg/                 # FFMPEG video processing service
├── chatterbox/             # Text-to-speech service
└── whisper/                # Speech-to-text service

📦 Installation & Setup

NPM Package (Coming Soon)

npm install prizm

Development Setup

git clone https://github.com/your-org/prizm
cd prizm
npm install
npm run dev

🔧 Environment Configuration

# API Provider Keys (add the ones you want to use)
FALAI_API_KEY=your_fal_ai_key
REPLICATE_API_TOKEN=your_replicate_token
OPENROUTER_API_KEY=your_openrouter_key

# Docker Service URLs (optional - for local services)
FFMPEG_SERVICE_URL=http://localhost:8006
CHATTERBOX_DOCKER_URL=http://localhost:8004
WHISPER_SERVICE_URL=http://localhost:9000
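
Reading these variables in application code could be sketched as follows. The `PrizmEnvConfig` type and both functions are hypothetical, and the fallback URLs are an assumption that simply reuses the example values above.

```typescript
// Hypothetical config shape built from the environment variables listed above.
interface PrizmEnvConfig {
  apiKeys: { falai?: string; replicate?: string; openrouter?: string };
  serviceUrls: { ffmpeg: string; chatterbox: string; whisper: string };
}

// Takes the environment as a plain object so it can be tested without process.env.
function readEnvConfig(env: Record<string, string | undefined>): PrizmEnvConfig {
  return {
    apiKeys: {
      falai: env["FALAI_API_KEY"],
      replicate: env["REPLICATE_API_TOKEN"],
      openrouter: env["OPENROUTER_API_KEY"],
    },
    serviceUrls: {
      // Assumed defaults: the example URLs from this README.
      ffmpeg: env["FFMPEG_SERVICE_URL"] ?? "http://localhost:8006",
      chatterbox: env["CHATTERBOX_DOCKER_URL"] ?? "http://localhost:8004",
      whisper: env["WHISPER_SERVICE_URL"] ?? "http://localhost:9000",
    },
  };
}

// Lists the API providers that have a non-empty key configured.
function configuredProviders(config: PrizmEnvConfig): string[] {
  return Object.entries(config.apiKeys)
    .filter(([, key]) => typeof key === "string" && key.length > 0)
    .map(([provider]) => provider);
}

const sampleConfig = readEnvConfig({ REPLICATE_API_TOKEN: "your_replicate_token" });
console.log(configuredProviders(sampleConfig), sampleConfig.serviceUrls.whisper);
```

In Node.js, the same function can be called with `process.env`.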

📚 Documentation

Getting Started

Quick Start Guide - Set up and first transformation
ElevenLabs Integration - Premium text-to-speech setup
TypeScript Migration Guide - Fix common issues

API References

Fluent API Complete Reference - All syntax patterns
Provider Documentation - Provider-specific guides
Asset Roles - Working with media assets

Implementation Guides

🏗️ Architecture & Development

🔌 Provider Registry Deep Dive - Centralized provider management
🧠 Model Discovery Deep Dive - How models are dynamically discovered
🎨 Asset & Role System Architecture - Smart asset loading and roles
🎬 Video Composition - Advanced video composition
🧪 Testing Guide - Comprehensive testing strategy
🔧 Environment Configuration Guide - Setting up environment variables
📡 API Reference - Complete REST API documentation

🎪 Epic Examples

AI Agent Integration

// LangChain agent using Prizm for media capabilities
import { ProviderRegistry } from 'prizm';

// `Agent` stands in for your agent framework's base class
class MediaCapableAgent extends Agent {
  private registry: ProviderRegistry;

  constructor() {
    super();
    this.registry = ProviderRegistry.getInstance();
  }
  
  async createMarketingCampaign(description: string) {
    // Generate copy
    const copy = await this.generateCopy(description);
    
    // Create visuals with Prizm
    const provider = await this.registry.getProvider('replicate');
    const model = await provider.getModel('flux-pro');
    const heroImage = await model.transform(copy, { aspect_ratio: "16:9" });
    
    return { copy, heroImage };
  }
}

🌈 Ultimate Marketing Pipeline

// Script → Images → Animation → Voiceover → Composition
const script = await $$("openrouter")("deepseek/deepseek-chat:free")("Write epic marketing script");
const visuals = await $$("replicate")("flux-pro")(script);
const animation = await $$("runway")("gen-3")(visuals, { duration: 5 });
const voiceover = await $$("chatterbox")("voice-clone")(script);
const final = await $$("ffmpeg")("compose")([animation, voiceover]);

🎨 Smart Asset Processing

// Load any format, get all capabilities automatically
const asset = AssetLoader.load('mystery-file.???');  // Works with ANY format!

if (hasVideoRole(asset)) {
  const video = await asset.asVideo();
  const thumbnail = await asset.extractFrame(1.0);
}

if (hasAudioRole(asset)) {
  const audio = await asset.asAudio();
  const transcript = await asset.transcribe();
}
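
Guards such as `hasVideoRole` can be written as TypeScript type predicates, so that a successful check also narrows the asset's static type. The following is a sketch of that technique with hypothetical interfaces (`BaseAsset`, `VideoCapable`, `AudioCapable`); Prizm's actual role classes are not shown in this README and may differ.

```typescript
// Hypothetical asset shapes for illustration only.
interface BaseAsset {
  readonly roles: ReadonlySet<string>;
}
interface VideoCapable {
  extractFrame(seconds: number): string;
}
interface AudioCapable {
  transcribe(): string;
}

// Type predicates: a true result narrows the asset to the richer type.
function hasVideoRole(asset: BaseAsset): asset is BaseAsset & VideoCapable {
  return asset.roles.has("video") && "extractFrame" in asset;
}
function hasAudioRole(asset: BaseAsset): asset is BaseAsset & AudioCapable {
  return asset.roles.has("audio") && "transcribe" in asset;
}

const clip: BaseAsset & VideoCapable = {
  roles: new Set(["video"]),
  extractFrame: (seconds) => `frame@${seconds}s`,
};

const unknownAsset: BaseAsset = clip;
if (hasVideoRole(unknownAsset)) {
  // Inside this branch the compiler knows extractFrame exists.
  console.log(unknownAsset.extractFrame(1.0));
}
```

Checking both the declared role and the presence of the method guards against assets that claim a role without implementing it.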

🧪 Testing

# Run all tests
npm test

# Test specific components
npm run test:providers   # Test provider integrations
npm run test:assets     # Test asset loading system
npm run test:api        # Test REST API endpoints

🤝 Contributing

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Commit your changes: git commit -m 'Add amazing feature'
Push to the branch: git push origin feature/amazing-feature
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🌟 Star History

⭐ Star this repo if Prizm helps your project!

Built with ❤️ by the Prizm Team

Making AI media transformation as simple as one line of code.

# Test video composition
npm run test:composition

# Test provider functionality  
npm run test:providers

# Test asset loading
npm run test:assets

🐳 Docker Services

Start required Docker services:

# Run each command from the repository root; the subshells keep your working directory unchanged

# Start FFMPEG service
(cd services/ffmpeg && docker-compose up -d)

# Start Chatterbox TTS service
(cd services/chatterbox && docker-compose up -d)

# Start Whisper STT service
(cd services/whisper && docker-compose up -d)

# Start HuggingFace Text-to-Image service
(cd services/huggingface && docker-compose up -d)
