Configuration

Dipkumar Patel edited this page Feb 4, 2026 · 1 revision

PaperBanana reads its settings from a YAML configuration file; CLI flags override individual values.

Default Config

The default configuration lives at configs/config.yaml:

vlm:
  provider: gemini
  model: gemini-2.0-flash

image:
  provider: google_imagen
  model: gemini-3-pro-image-preview

pipeline:
  num_retrieval_examples: 10
  refinement_iterations: 3
  output_resolution: "2k"

reference:
  path: data/reference_sets

output:
  dir: outputs
  save_iterations: true
  save_metadata: true

Using a Custom Config

paperbanana generate \
  --input method.txt \
  --caption "Overview" \
  --config my_config.yaml

You only need to include the settings you want to override. Unspecified settings fall back to defaults.
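
As a minimal sketch, a custom config that changes only the refinement count and the output directory could look like the following. The keys are taken from the default config; the values, including the outputs/drafts directory, are illustrative assumptions.

```yaml
# my_config.yaml (illustrative): only the overridden keys are listed
pipeline:
  refinement_iterations: 2   # example value; the default is 3

output:
  dir: outputs/drafts        # hypothetical directory; the default is outputs
```

Under the fallback rule described above, the vlm, image, and reference sections, along with the remaining pipeline and output keys, would keep their default values.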

Configuration Options

VLM Settings

| Key | Description | Default |
| --- | --- | --- |
| vlm.provider | VLM provider name | gemini |
| vlm.model | VLM model identifier | gemini-2.0-flash |

The VLM handles all text generation tasks: retrieval scoring, planning, styling, and critique.

Image Generation Settings

| Key | Description | Default |
| --- | --- | --- |
| image.provider | Image generation provider | google_imagen |
| image.model | Image generation model | gemini-3-pro-image-preview |

Pipeline Settings

| Key | Description | Default |
| --- | --- | --- |
| pipeline.num_retrieval_examples | Number of reference examples to retrieve | 10 |
| pipeline.refinement_iterations | Visualizer-Critic refinement rounds | 3 |
| pipeline.output_resolution | Target output resolution | "2k" |

num_retrieval_examples: Higher values give the Planner more context but increase prompt length and API cost. The maximum is 13 (the full reference set). Values between 5 and 10 work well for most inputs.

refinement_iterations: Each iteration runs one Visualizer + Critic cycle. The default of 3 comes from the paper. Setting it to 1 runs faster but produces lower-quality output, and values above 3 show diminishing returns.
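
To make the trade-off concrete, here is a sketch of a quick-draft override that favors speed and cost over quality. The specific values are illustrative choices within the ranges described above, not project recommendations.

```yaml
# Illustrative quick-draft settings (example values only)
pipeline:
  num_retrieval_examples: 5   # low end of the 5-10 range; shorter prompt, lower API cost
  refinement_iterations: 1    # one Visualizer + Critic cycle: faster, lower quality
```

For a final figure, leaving both keys at their defaults (10 and 3) is the more conservative choice.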

Reference Settings

| Key | Description | Default |
| --- | --- | --- |
| reference.path | Path to reference dataset directory | data/reference_sets |

Point this at a custom reference set if you've curated your own examples.
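
A custom reference set could be selected with a fragment like the one below. The directory name is hypothetical. This page does not describe the layout the directory must follow, so mirroring the structure of the bundled data/reference_sets is an assumption worth verifying.

```yaml
reference:
  path: data/my_reference_set   # hypothetical directory containing your curated examples
```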

Output Settings

| Key | Description | Default |
| --- | --- | --- |
| output.dir | Base output directory | outputs |
| output.save_iterations | Save intermediate iteration images | true |
| output.save_metadata | Save run metadata JSON | true |

Setting save_iterations: false reduces disk usage if you only need the final output.
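
For example, an override that keeps only the final image while still writing run metadata might look like this. Only save_iterations differs from the defaults; the other two keys are repeated for context and could be omitted.

```yaml
output:
  dir: outputs             # default, shown for context
  save_iterations: false   # skip intermediate iteration images
  save_metadata: true      # default: still write the run metadata JSON
```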

CLI Flag Precedence

CLI flags override config file values. For example:

# Config file says 3 iterations, but this runs 1
paperbanana generate \
  --input method.txt \
  --caption "Overview" \
  --config my_config.yaml \
  --iterations 1

Environment Variables

| Variable | Description |
| --- | --- |
| GOOGLE_API_KEY | Gemini API key. Read from .env if present. |
