easy-ai-api is a typed, multimodal Python library that wraps multiple AI providers behind a single public API for:
- Text generation — chat, instructions, reasoning, tool use, batch, optional image inputs (vision)
- Speech transcription — audio to text with word timings and speaker diarization
- Speech synthesis — text to spoken audio
- Music generation — text prompt to instrumental audio
- Image generation — text to image
- Image transformation — prompt + input image to new image
- Image composition — prompt + base image + reference image to a new image
- Image editing — inpainting with optional mask
- Video generation — text/image/audio to video
- Lip sync — animate a face image with an audio track
All operations are available in synchronous and asynchronous variants. Provider names are resolved through a flexible alias system, so "gemini", "google", and "google-ai" all map to the same adapter.
- Python 3.12 or higher
- Dependencies installed automatically:
httpx,pydantic,Pillow
pip install easy-ai-apiFor local development (includes test and lint tools):
pip install -e ".[dev]"import easy_ai_api
print(easy_ai_api.__version__)Credentials are resolved lazily, in this order:
- Explicit
credentials={...}values passed to a call or toEasyAiApi(...). - Environment variables from the current process.
easy-ai-api does not auto-load .env files, keeping imports side-effect free.
cp .env.example .env
# Edit .env and fill only the providers you plan to use
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."If your application uses python-dotenv, load it in the application entrypoint — not inside the library:
from dotenv import load_dotenv
load_dotenv()See docs/configuration.md for full details and per-call credential override patterns.
from easy_ai_api.text import generate
result = generate(
provider="openai",
instructions="Write a short slogan about rain in one sentence.",
credentials={"OPENAI_API_KEY": "sk-..."},
)
print(result.text)
print(result.cost_usd)Pass one or more images via input_images to describe them, transcribe text in the image, or answer questions grounded in visual content. Accepts local paths, raw bytes, or base64 strings.
from easy_ai_api.text import generate
result = generate(
provider="openai",
instructions="Describe the image in one concise paragraph.",
input_images=["/path/to/photo.jpg"],
credentials={"OPENAI_API_KEY": "sk-..."},
)
print(result.text)Vision-capable providers include openai, google, anthropic, and any provider whose selected model supports image inputs.
from easy_ai_api.audio import transcribe
result = transcribe(
provider="deepgram",
audio="/path/to/recording.mp3",
credentials={"DEEPGRAM_API_KEY": "..."},
)
print(result.text)
for word in result.words:
print(word.text, word.start_seconds, word.end_seconds)
for segment in result.speaker_segments:
print(segment.speaker, segment.text)from easy_ai_api.audio import synthesize
result = synthesize(
provider="elevenlabs",
text="Hello, this is a test of text-to-speech synthesis.",
credentials={"ELEVENLABS_API_KEY": "..."},
)
import base64
audio_bytes = base64.b64decode(result.audio_base64)
with open("output.mp3", "wb") as f:
f.write(audio_bytes)from easy_ai_api.audio import compose
result = compose(
provider="stability",
prompt="Upbeat lo-fi hip-hop, 80 BPM, mellow piano and drums",
duration_seconds=30,
credentials={"STABILITY_API_KEY": "..."},
)
import base64
audio_bytes = base64.b64decode(result.audio_base64)
with open("music.mp3", "wb") as f:
f.write(audio_bytes)from easy_ai_api.image import generate
result = generate(
provider="openai",
prompt="A cinematic lighthouse in a storm, dramatic lighting",
credentials={"OPENAI_API_KEY": "sk-..."},
)
import base64
image_bytes = base64.b64decode(result.image_base64)
with open("output.png", "wb") as f:
f.write(image_bytes)Takes an existing image and transforms it according to a prompt.
from easy_ai_api.image import transform
result = transform(
provider="stability",
prompt="Turn this photo into a watercolor painting",
image="/path/to/photo.jpg",
strength=0.75,
credentials={"STABILITY_API_KEY": "..."},
)
import base64
image_bytes = base64.b64decode(result.image_base64)
with open("transformed.png", "wb") as f:
f.write(image_bytes)Combines a base image with a reference image using a prompt. Use it to apply the style, pose, or scene of the reference image to the subject of the base image. Typical prompts: "render the person of image 1 in the pose of image 2", "image 1 in the style of image 2", "place the subject of image 1 in the scene of image 2".
from easy_ai_api.image import compose
result = compose(
provider="google",
prompt="Render the person of image 1 in the painting style of image 2",
image="/path/to/subject.jpg",
reference_image="/path/to/style_reference.jpg",
credentials={"GOOGLE_API_KEY": "..."},
)
import base64
image_bytes = base64.b64decode(result.image_base64)
with open("composed.png", "wb") as f:
f.write(image_bytes)Edits specific regions of an image using a prompt. Pass a mask to limit edits to the masked area.
from easy_ai_api.image import edit
result = edit(
provider="openai",
prompt="Replace the sky with a dramatic sunset",
image="/path/to/photo.png",
mask="/path/to/mask.png", # black pixels = edited (inpainted), white pixels = preserved
credentials={"OPENAI_API_KEY": "sk-..."},
)
import base64
image_bytes = base64.b64decode(result.image_base64)
with open("edited.png", "wb") as f:
f.write(image_bytes)Video files are written directly to a local path. Long-running jobs are polled until completion.
from easy_ai_api.video import generate
result = generate(
provider="runway",
prompt="A slow pan across a misty mountain valley at dawn",
output_path="output.mp4",
credentials={"RUNWAYML_API_SECRET": "..."},
)
print(result.output_path) # pathlib.Path to the written fileTo generate a video driven by both a reference image and an audio track:
from easy_ai_api.video import generate
result = generate(
provider="heygen",
image="/path/to/avatar.jpg",
audio="/path/to/voiceover.mp3",
output_path="avatar_video.mp4",
credentials={"HEYGEN_API_KEY": "..."},
)Animate a still image or avatar portrait using a supplied audio track.
from easy_ai_api.video import lipsync
result = lipsync(
provider="heygen",
image="/path/to/face.jpg",
audio="/path/to/speech.mp3",
output_path="lipsync.mp4",
credentials={"HEYGEN_API_KEY": "..."},
)
print(result.output_path)Use EasyAiApi to share credentials and default settings across many calls:
from easy_ai_api import EasyAiApi
client = EasyAiApi(
credentials={"OPENAI_API_KEY": "sk-..."},
timeout_seconds=90,
job_timeout_seconds=900,
max_retries=4,
)
text_result = client.text.generate(
provider="openai",
instructions="Summarize this in 3 bullets.",
context={"topic": "multimodal AI libraries"},
)
image_result = client.image.generate(
provider="openai",
prompt="Abstract geometric art in blue and gold",
)
composed = client.image.compose(
provider="google",
prompt="Put the person of image 1 in the scene of image 2",
image="/path/to/subject.jpg",
reference_image="/path/to/scene.jpg",
)Every helper has an _async variant. Combine them with asyncio.gather for concurrent requests:
import asyncio
from easy_ai_api.image import generate_async
from easy_ai_api.text import generate_async as text_generate_async
async def main() -> None:
text_task = text_generate_async(
provider="openai",
instructions="Write a one-sentence product description for a smart umbrella.",
credentials={"OPENAI_API_KEY": "sk-..."},
)
image_task = generate_async(
provider="openai",
prompt="A smart umbrella with solar panels and an LED display",
credentials={"OPENAI_API_KEY": "sk-..."},
)
text_result, image_result = await asyncio.gather(text_task, image_task)
print(text_result.text)
print(image_result.image_base64[:32])
asyncio.run(main())from easy_ai_api.text import batch_generate, batch_generate_async, generate, generate_async
from easy_ai_api.models import TextGenerationRequest, TextGenerationResult| Function | Description |
|---|---|
generate(provider, instructions, ...) |
Generate text from a single prompt. Accepts input_images for vision. |
generate_async(...) |
Async variant. |
batch_generate(requests, ...) |
Run multiple requests to the same provider concurrently. |
batch_generate_async(...) |
Async variant. |
from easy_ai_api.audio import (
compose, compose_async,
synthesize, synthesize_async,
transcribe, transcribe_async,
)
from easy_ai_api.models import (
MusicGenerationRequest, MusicGenerationResult,
SpeechSynthesisRequest, SpeechSynthesisResult,
SpeechTranscriptionRequest, SpeechTranscriptionResult,
)| Function | Description |
|---|---|
transcribe(provider, audio, ...) |
Transcribe audio to text with word timings and speaker diarization. |
synthesize(provider, text, ...) |
Synthesize speech audio from text. |
compose(provider, prompt, ...) |
Generate instrumental music from a text prompt. |
from easy_ai_api.image import (
compose, compose_async,
edit, edit_async,
generate, generate_async,
transform, transform_async,
)
from easy_ai_api.models import (
ImageCompositionRequest,
ImageEditRequest,
ImageGenerationRequest,
ImageResult,
ImageTransformationRequest,
)| Function | Description |
|---|---|
generate(provider, prompt, ...) |
Generate an image from a text prompt. |
transform(provider, prompt, image, ...) |
Transform an input image guided by a prompt. |
compose(provider, prompt, image, reference_image, ...) |
Combine a base image with a reference image using a prompt. |
edit(provider, prompt, image, ...) |
Edit an image region with an optional mask. |
from easy_ai_api.video import generate, generate_async, lipsync, lipsync_async
from easy_ai_api.models import LipSyncRequest, VideoGenerationRequest, VideoResult| Function | Description |
|---|---|
generate(provider, output_path, ...) |
Generate a video. Accepts optional prompt, image, and audio. |
lipsync(provider, image, audio, output_path, ...) |
Animate a face image with a supplied audio track. |
from easy_ai_api.exceptions import (
ConfigurationError,
EasyAiApiError,
IncompatibleParameterError,
InvalidParameterError,
InvalidProviderResponseError,
JobFailedError,
MissingCredentialError,
PricingUnavailableError,
ProviderTimeoutError,
TemporaryDownloadError,
UnsupportedModelError,
UnsupportedProviderError,
)See docs/errors.md for descriptions and handling patterns.
| Provider | Env var |
|---|---|
openai |
OPENAI_API_KEY |
groq |
GROQ_API_KEY |
together |
TOGETHER_API_KEY |
fireworks |
FIREWORKS_API_KEY |
deepseek |
DEEPSEEK_API_KEY |
openrouter |
OPENROUTER_API_KEY |
xai |
XAI_API_KEY |
mistral |
MISTRAL_API_KEY |
anthropic |
ANTHROPIC_API_KEY |
google |
GOOGLE_API_KEY |
cohere |
COHERE_API_KEY |
perplexity |
PERPLEXITY_API_KEY |
deepinfra |
DEEPINFRA_API_KEY |
huggingface |
HUGGINGFACE_API_KEY |
| Provider | Env var |
|---|---|
deepgram |
DEEPGRAM_API_KEY |
assemblyai |
ASSEMBLYAI_API_KEY |
speechmatics |
SPEECHMATICS_API_KEY |
revai |
REVAI_API_KEY |
| Provider | Env var |
|---|---|
cartesia |
CARTESIA_API_KEY |
azure |
AZURE_SPEECH_API_KEY, AZURE_SPEECH_REGION |
hume |
HUME_API_KEY |
elevenlabs |
ELEVENLABS_API_KEY |
murf |
MURF_API_KEY |
| Provider | Env var |
|---|---|
google |
GOOGLE_API_KEY |
elevenlabs |
ELEVENLABS_API_KEY |
stability |
STABILITY_API_KEY |
beatoven |
BEATOVEN_API_KEY |
loudly |
LOUDLY_API_KEY |
| Provider | Env var |
|---|---|
openai |
OPENAI_API_KEY |
google |
GOOGLE_API_KEY |
bfl |
BFL_API_KEY |
ideogram |
IDEOGRAM_API_KEY |
stability |
STABILITY_API_KEY |
hedra |
HEDRA_API_KEY |
| Provider | Env var |
|---|---|
openai |
OPENAI_API_KEY |
google |
GOOGLE_API_KEY |
bfl |
BFL_API_KEY |
ideogram |
IDEOGRAM_API_KEY |
stability |
STABILITY_API_KEY |
| Provider | Env var |
|---|---|
google |
GOOGLE_API_KEY |
bfl |
BFL_API_KEY |
| Provider | Env var |
|---|---|
openai |
OPENAI_API_KEY |
google |
GOOGLE_API_KEY |
bfl |
BFL_API_KEY |
ideogram |
IDEOGRAM_API_KEY |
stability |
STABILITY_API_KEY |
| Provider | Env var |
|---|---|
runway |
RUNWAYML_API_SECRET |
luma |
LUMA_API_KEY |
fal |
FAL_KEY |
hedra |
HEDRA_API_KEY |
| Provider | Env var |
|---|---|
google |
GOOGLE_API_KEY |
heygen |
HEYGEN_API_KEY |
did |
DID_API_KEY |
hedra |
HEDRA_API_KEY |
| Provider | Env var |
|---|---|
heygen |
HEYGEN_API_KEY |
did |
DID_API_KEY |
hedra |
HEDRA_API_KEY |
easy-ai-api uses typed exceptions so your code can handle failures intentionally.
from easy_ai_api.exceptions import MissingCredentialError, UnsupportedProviderError
from easy_ai_api.text import generate
try:
result = generate(provider="openai", instructions="Write one sentence.")
except MissingCredentialError as exc:
# exc.provider — the canonical provider name
# exc.env_vars — tuple of env var names that must be set
print(f"Missing credentials for {exc.provider}: {exc.env_vars}")
except UnsupportedProviderError as exc:
print(f"Unknown provider: {exc}")A MissingCredentialError for one provider does not affect other providers. Package imports and unrelated operations remain fully usable.
See docs/errors.md for the full exception hierarchy and handling recommendations.
Many providers accept common aliases:
| Alias | Resolves to |
|---|---|
gemini, google-ai, imagen, veo, lyria |
google |
flux, black-forest-labs |
bfl |
mistralai |
mistral |
runwayml, runway-ml |
runway |
lumalabs, luma-dream-machine |
luma |
hey-gen |
heygen |
d-id |
did |
eleven-labs |
elevenlabs |
assembly-ai |
assemblyai |
pplx |
perplexity |
deep-infra |
deepinfra |
hugging-face, hf |
huggingface |
docs/configuration.md— credential resolution, python-dotenv integration, per-call overridesdocs/providers.md— complete credential tables for every provider and operationdocs/errors.md— exception hierarchy, handling patterns, retry guidance
See CONTRIBUTING.md for development setup, running tests, and the release process.