Dipkumar Patel edited this page Feb 5, 2026 · 3 revisions

FAQ

General

Is this the official implementation?

No. This is an unofficial, community-driven implementation based on the publicly available paper. It is not affiliated with or endorsed by the original authors or Google Research. The official code has not been released yet.

How close is this to the original paper?

We followed the paper's described methodology as closely as possible. The five-agent pipeline, two-phase architecture, and iterative refinement process match the paper. The main differences are in the reference dataset (we use 13 curated examples vs. the paper's 292) and potentially in prompt details, which the paper doesn't fully specify.
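
As a rough picture of how five agents can fit into two phases, here is a minimal Python sketch of the control flow. It is an illustration under stated assumptions, not PaperBanana's code. Every function is a hypothetical stub that returns strings. The agent names Visualizer and Critic, and the split into a linear planning phase (Retriever, Planner, Stylist) followed by a Visualizer/Critic refinement loop, are our assumptions. Only the Retriever, Planner, and Stylist are named elsewhere in this FAQ.

```python
from dataclasses import dataclass, field


@dataclass
class Critique:
    """Hypothetical feedback returned by the Critic stub."""
    acceptable: bool
    notes: str = ""


@dataclass
class RunLog:
    """Records which agent ran, in order, so the control flow is visible."""
    steps: list[str] = field(default_factory=list)


def retrieve(text: str, log: RunLog) -> list[str]:
    log.steps.append("retriever")
    return ["ref-example-1", "ref-example-2"]  # stub: pretend reference IDs


def plan(text: str, caption: str, refs: list[str], log: RunLog) -> str:
    log.steps.append("planner")
    return f"plan for: {caption}"


def stylize(plan_text: str, log: RunLog) -> str:
    log.steps.append("stylist")
    return plan_text + " [styled]"


def visualize(description: str, log: RunLog) -> str:
    log.steps.append("visualizer")
    return f"image({description})"  # stub: a string stands in for an image


def critique(image: str, round_no: int, log: RunLog) -> Critique:
    log.steps.append("critic")
    # Stub policy: accept on round 2 so the loop runs more than once.
    return Critique(acceptable=round_no >= 2, notes="tighten labels")


def generate(text: str, caption: str, max_iterations: int = 3) -> tuple[str, RunLog]:
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    log = RunLog()
    # Phase 1 (assumed): a single linear planning pass.
    refs = retrieve(text, log)
    description = stylize(plan(text, caption, refs, log), log)
    # Phase 2 (assumed): generate, critique, revise; at most max_iterations rounds.
    for round_no in range(1, max_iterations + 1):
        image = visualize(description, log)
        feedback = critique(image, round_no, log)
        if feedback.acceptable:
            break
        description = f"{description} | revise: {feedback.notes}"
    return image, log
```

In this sketch each refinement round costs one image generation plus one critique call, which is why lowering the iteration count shortens a run.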

Does it cost money to use?

No. PaperBanana runs on Google Gemini's free tier; all you need is a free API key from Google AI Studio. The free tier is rate-limited, but the limits are sufficient for normal usage. The package itself is MIT-licensed and free on PyPI.

Can I use it for my paper submission?

Yes. The output is yours. However, we'd recommend treating generated diagrams as a strong starting point rather than final camera-ready figures. Review the output for accuracy and make manual adjustments as needed.

Is PaperBanana on PyPI?

Yes. `pip install paperbanana` installs the CLI and Python API, and `pip install paperbanana[mcp]` adds MCP server support. In shells such as zsh, quote the extra: `pip install "paperbanana[mcp]"`. See Installation for all options.

Is the MCP server listed on registries?

Yes. PaperBanana is published on the Official MCP Registry and submitted to mcp.so. You can also find it on PyPI.

Technical

Why Gemini specifically?

The original paper uses Gemini both as the vision-language model (VLM) and for image generation. We followed that choice to stay as close to the described system as possible. The provider system is modular, so community contributions for other backends (OpenAI, Anthropic, local models) are welcome. See Adding a New Provider.
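
For the real interface, see Adding a New Provider. As a rough illustration of what "modular" can mean, the sketch below separates the VLM and image-generation roles behind abstract interfaces and lets configuration pick a backend by name. All names here (VLMProvider, ImageProvider, register_vlm, make_vlm, EchoVLM) are hypothetical and are not PaperBanana's actual API.

```python
from abc import ABC, abstractmethod


class VLMProvider(ABC):
    """Hypothetical interface for the text/vision side (planning, critique)."""

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...


class ImageProvider(ABC):
    """Hypothetical interface for the image-generation side."""

    @abstractmethod
    def generate_image(self, prompt: str) -> bytes: ...


# Name-based registry so a config value like "gemini" or "echo" selects a backend.
VLM_PROVIDERS: dict[str, type[VLMProvider]] = {}


def register_vlm(name: str):
    """Class decorator that records a provider under a short name."""
    def decorator(cls: type[VLMProvider]) -> type[VLMProvider]:
        VLM_PROVIDERS[name] = cls
        return cls
    return decorator


@register_vlm("echo")
class EchoVLM(VLMProvider):
    """Toy backend for tests: returns the prompt unchanged."""

    def generate_text(self, prompt: str) -> str:
        return prompt


def make_vlm(name: str) -> VLMProvider:
    """Instantiate a registered provider, failing clearly on unknown names."""
    try:
        return VLM_PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown VLM provider: {name!r}") from None
```

Keeping the two roles separate means a new backend can replace only one of them, for example a different text model while image generation stays on Gemini.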

Why only 13 reference examples instead of 292?

Curating 292 high-quality (methodology text, diagram, caption) tuples requires significant manual effort. The paper describes using 2,000 NeurIPS papers as the starting point. Our 13 examples were manually verified to be clean and representative across the four categories. We're actively looking for community contributions to expand this. See Adding Reference Examples.
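
To show the shape of one reference example, here is a minimal Python sketch of a (methodology text, diagram, caption) tuple with a few sanity checks a contributor might run before submitting. The field names, the category field, and the accepted image formats are assumptions for illustration, not PaperBanana's actual schema; see Adding Reference Examples for the real requirements.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumption: reference diagrams are stored as raster images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


@dataclass(frozen=True)
class ReferenceExample:
    """One hypothetical (methodology text, diagram, caption) tuple."""
    methodology_text: str
    diagram_path: Path
    caption: str
    category: str  # one of the four categories; names are not listed here

    def problems(self) -> list[str]:
        """Return detected issues; an empty list means no obvious problems."""
        issues = []
        if not self.methodology_text.strip():
            issues.append("empty methodology text")
        if not self.caption.strip():
            issues.append("empty caption")
        if self.diagram_path.suffix.lower() not in IMAGE_SUFFIXES:
            issues.append(f"unexpected diagram format: {self.diagram_path.suffix}")
        return issues
```

Automated checks like these catch only mechanical problems; judging whether an example is clean and representative still takes the manual review described above.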

How long does generation take?

Typically 30-90 seconds for a single diagram with 3 refinement iterations. Most of the time is spent on API calls to Gemini. Reducing iterations to 1-2 speeds things up at the cost of some output quality.

Can I use local models instead of Gemini?

Not yet out of the box, but the provider system supports this. Someone would need to implement an Ollama or similar provider. The challenge is that local image generation models (Stable Diffusion, FLUX) produce a different style than Gemini's native generation, so prompt templates may need adjustment. This is an open area for contribution.
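
As a starting point for such a contribution, here is a minimal Python sketch of the text side of an Ollama-backed provider, using only the standard library and Ollama's `/api/generate` endpoint with streaming disabled. The class name, the `generate_text` method, and the default model `llava` are assumptions chosen for illustration; they do not match any existing PaperBanana interface. The harder part, image generation with Stable Diffusion or FLUX, is not covered.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


class OllamaVLM:
    """Hypothetical text-side provider backed by a local Ollama server."""

    def __init__(self, model: str = "llava", url: str = OLLAMA_URL, timeout: float = 120.0):
        self.model = model
        self.url = url
        self.timeout = timeout

    def build_request(self, prompt: str) -> urllib.request.Request:
        """Build the POST request; "stream": False asks for a single JSON reply."""
        payload = {"model": self.model, "prompt": prompt, "stream": False}
        return urllib.request.Request(
            self.url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def generate_text(self, prompt: str) -> str:
        """Send the request and return the model's text from the "response" field."""
        with urllib.request.urlopen(self.build_request(prompt), timeout=self.timeout) as resp:
            body = json.load(resp)
        return body["response"]
```

Splitting `build_request` from `generate_text` keeps the request format testable without a running Ollama server.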

Does the MCP server work with Windsurf/Zed/other editors?

It should work with any editor that supports the MCP specification. We've tested with Cursor and Claude Desktop. If you get it working with another client, let us know and we'll add configuration examples to the MCP Server Setup page.

Output Quality

The diagram doesn't look right. What can I do?

Several things affect output quality:

  1. Input text specificity: More detailed methodology descriptions produce better diagrams. Vague descriptions give the Planner less to work with.
  2. Caption clarity: The caption should describe the communicative intent, not just label the figure.
  3. Re-running: Generation is non-deterministic. Running the same input again sometimes produces better results.
  4. Iterations: More refinement rounds generally help, up to about 3; beyond that, returns diminish.

Why does it sometimes produce results plots instead of architecture diagrams?

The Retriever may select poorly matched reference examples. This can happen when the methodology text is ambiguous about what kind of visualization is needed. Being explicit in the caption (e.g., "System architecture diagram showing..." rather than just "Overview of our method") helps.
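
To see why an explicit caption helps, consider a deliberately simplified stand-in for retrieval that ranks reference captions by plain word overlap. This toy is not PaperBanana's actual retrieval method, and the reference captions below are made up; it only illustrates how a vague caption can match a results plot better than an architecture diagram.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic words, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_references(caption: str, references: dict[str, str]) -> list[tuple[str, int]]:
    """Rank references by how many words their caption shares with the query."""
    query = tokens(caption)
    scored = [(name, len(query & tokens(ref))) for name, ref in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    refs = {
        "arch": "System architecture diagram showing the encoder and decoder modules",
        "plot": "Results plot comparing accuracy of our method against baselines",
    }
    # The vague caption shares "of", "our", "method" with the results plot.
    print(rank_references("Overview of our method", refs))
    # The explicit caption shares four words with the architecture diagram.
    print(rank_references("System architecture diagram showing how agents exchange plans", refs))
```

The vague caption ranks the results plot first, while the explicit one ranks the architecture diagram first.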

Can it generate diagrams in specific styles (e.g., matching my paper's existing figures)?

Not currently. The Stylist applies NeurIPS-style guidelines uniformly. Supporting custom style references is a possible future enhancement.
