Skip to content

Adding a New Provider

dippatel1994 edited this page Feb 13, 2026 · 2 revisions

Adding a New Provider

PaperBanana's provider system is modular. You can add support for new VLM or image generation backends without modifying the core pipeline.

Currently Supported Providers

Provider VLM Image Gen Env Variable
Google Gemini gemini google_imagen GOOGLE_API_KEY
OpenRouter openrouter openrouter_imagen OPENROUTER_API_KEY

See OpenRouter Provider for using Claude, GPT-4, Llama, and other models via OpenRouter.

Provider Types

There are two provider interfaces:

  1. VLM Provider - handles text generation tasks (planning, styling, critique)
  2. Image Generation Provider - handles rendering descriptions into images

Adding a VLM Provider

Create a new file in paperbanana/providers/vlm/:

# paperbanana/providers/vlm/your_provider.py

from typing import Optional
from PIL import Image
from paperbanana.providers.base import VLMProvider

class YourProvider(VLMProvider):
    def __init__(self, api_key: Optional[str] = None, model: str = "default-model"):
        self._api_key = api_key
        self._model = model

    @property
    def name(self) -> str:
        return "your_provider"

    @property
    def model_name(self) -> str:
        return self._model

    def is_available(self) -> bool:
        return self._api_key is not None

    async def generate(
        self,
        prompt: str,
        images: Optional[list[Image.Image]] = None,
        system_prompt: Optional[str] = None,
        temperature: float = 1.0,
        max_tokens: int = 4096,
        response_format: Optional[str] = None,
    ) -> str:
        # Call your API here
        # Return the generated text
        ...

Register it in paperbanana/providers/registry.py:

elif provider_name == "your_provider":
    from paperbanana.providers.vlm.your_provider import YourProvider
    return YourProvider(api_key=settings.your_api_key, model=model)

Adding an Image Generation Provider

Create a new file in paperbanana/providers/image_gen/:

# paperbanana/providers/image_gen/your_provider.py

from typing import Optional
from PIL import Image
from paperbanana.providers.base import ImageGenProvider

class YourImageProvider(ImageGenProvider):
    def __init__(self, api_key: Optional[str] = None, model: str = "default-model"):
        self._api_key = api_key
        self._model = model

    @property
    def name(self) -> str:
        return "your_image_provider"

    @property
    def model_name(self) -> str:
        return self._model

    def is_available(self) -> bool:
        return self._api_key is not None

    async def generate(
        self,
        prompt: str,
        negative_prompt: Optional[str] = None,
        width: int = 1024,
        height: int = 1024,
        seed: Optional[int] = None,
    ) -> Image.Image:
        # Call your API here
        # Return a PIL Image
        ...

Register it in paperbanana/providers/registry.py.

Real Example: OpenRouter Provider

The OpenRouter implementation is a good reference. Key files:

  • paperbanana/providers/vlm/openrouter.py - VLM provider
  • paperbanana/providers/image_gen/openrouter_imagen.py - Image generation provider
  • paperbanana/providers/registry.py - Registration logic
  • paperbanana/core/config.py - API key configuration

The OpenRouter provider uses the OpenAI-compatible API format, making it easy to adapt for other OpenAI-compatible services.

Using Your Provider

Once registered, use it via CLI flags or config:

paperbanana generate \
  --input method.txt \
  --caption "Overview" \
  --vlm-provider your_provider \
  --vlm-model your-model-name \
  --image-provider your_image_provider \
  --image-model your-image-model

Or in configs/config.yaml:

vlm:
  provider: your_provider
  model: your-model-name

image:
  provider: your_image_provider
  model: your-image-model

Configuration

Add your API key setting to paperbanana/core/config.py:

class Settings(BaseSettings):
    # ... existing settings ...
    your_api_key: Optional[str] = Field(default=None, alias="YOUR_API_KEY")

Users can then set YOUR_API_KEY in their .env file or environment.

Providers the Community Has Requested

These are open for contribution:

  • OpenAI (GPT-4o for VLM, DALL-E 3 for image gen) - direct integration, not via OpenRouter
  • Anthropic (Claude for VLM) - direct integration
  • Local models via Ollama (LLaVA for VLM, Stable Diffusion for image gen)
  • Replicate (various open-source models)
  • Together AI (open-source models at scale)

If you're adding one of these, open an issue first to coordinate with other potential contributors.

Testing Your Provider

  1. Write unit tests in tests/providers/test_your_provider.py
  2. Test with the full pipeline: paperbanana generate --vlm-provider your_provider ...
  3. Compare output quality against the default Gemini provider

Submitting a PR

Include:

  • Provider implementation files
  • Registration in registry.py
  • Configuration in config.py
  • Tests
  • Documentation updates (this page, FAQ, etc.)

Clone this wiki locally