-
Notifications
You must be signed in to change notification settings - Fork 294
Adding a New Provider
PaperBanana's provider system is modular. You can add support for new VLM or image generation backends without modifying the core pipeline.
| Provider | VLM | Image Gen | Env Variable |
|---|---|---|---|
| Google Gemini | gemini |
google_imagen |
GOOGLE_API_KEY |
| OpenRouter | openrouter |
openrouter_imagen |
OPENROUTER_API_KEY |
See OpenRouter Provider for using Claude, GPT-4, Llama, and other models via OpenRouter.
There are two provider interfaces:
- VLM Provider - handles text generation tasks (planning, styling, critique)
- Image Generation Provider - handles rendering descriptions into images
Create a new file in paperbanana/providers/vlm/:
# paperbanana/providers/vlm/your_provider.py
from typing import Optional
from PIL import Image
from paperbanana.providers.base import VLMProvider
class YourProvider(VLMProvider):
def __init__(self, api_key: Optional[str] = None, model: str = "default-model"):
self._api_key = api_key
self._model = model
@property
def name(self) -> str:
return "your_provider"
@property
def model_name(self) -> str:
return self._model
def is_available(self) -> bool:
return self._api_key is not None
async def generate(
self,
prompt: str,
images: Optional[list[Image.Image]] = None,
system_prompt: Optional[str] = None,
temperature: float = 1.0,
max_tokens: int = 4096,
response_format: Optional[str] = None,
) -> str:
# Call your API here
# Return the generated text
...Register it in paperbanana/providers/registry.py:
elif provider_name == "your_provider":
from paperbanana.providers.vlm.your_provider import YourProvider
return YourProvider(api_key=settings.your_api_key, model=model)Create a new file in paperbanana/providers/image_gen/:
# paperbanana/providers/image_gen/your_provider.py
from typing import Optional
from PIL import Image
from paperbanana.providers.base import ImageGenProvider
class YourImageProvider(ImageGenProvider):
def __init__(self, api_key: Optional[str] = None, model: str = "default-model"):
self._api_key = api_key
self._model = model
@property
def name(self) -> str:
return "your_image_provider"
@property
def model_name(self) -> str:
return self._model
def is_available(self) -> bool:
return self._api_key is not None
async def generate(
self,
prompt: str,
negative_prompt: Optional[str] = None,
width: int = 1024,
height: int = 1024,
seed: Optional[int] = None,
) -> Image.Image:
# Call your API here
# Return a PIL Image
...Register it in paperbanana/providers/registry.py.
The OpenRouter implementation is a good reference. Key files:
-
paperbanana/providers/vlm/openrouter.py- VLM provider -
paperbanana/providers/image_gen/openrouter_imagen.py- Image generation provider -
paperbanana/providers/registry.py- Registration logic -
paperbanana/core/config.py- API key configuration
The OpenRouter provider uses the OpenAI-compatible API format, making it easy to adapt for other OpenAI-compatible services.
Once registered, use it via CLI flags or config:
paperbanana generate \
--input method.txt \
--caption "Overview" \
--vlm-provider your_provider \
--vlm-model your-model-name \
--image-provider your_image_provider \
--image-model your-image-modelOr in configs/config.yaml:
vlm:
provider: your_provider
model: your-model-name
image:
provider: your_image_provider
model: your-image-modelAdd your API key setting to paperbanana/core/config.py:
class Settings(BaseSettings):
# ... existing settings ...
your_api_key: Optional[str] = Field(default=None, alias="YOUR_API_KEY")Users can then set YOUR_API_KEY in their .env file or environment.
These are open for contribution:
- OpenAI (GPT-4o for VLM, DALL-E 3 for image gen) - direct integration, not via OpenRouter
- Anthropic (Claude for VLM) - direct integration
- Local models via Ollama (LLaVA for VLM, Stable Diffusion for image gen)
- Replicate (various open-source models)
- Together AI (open-source models at scale)
If you're adding one of these, open an issue first to coordinate with other potential contributors.
- Write unit tests in
tests/providers/test_your_provider.py - Test with the full pipeline:
paperbanana generate --vlm-provider your_provider ... - Compare output quality against the default Gemini provider
Include:
- Provider implementation files
- Registration in
registry.py - Configuration in
config.py - Tests
- Documentation updates (this page, FAQ, etc.)