Issues with truncation #1365

designermonkey · 2026-06-17T13:03:47Z

I use graphify with OpenRouter by using the Ollama settings and providing the OpenRouter API endpoint and key, which works for the most part.

The problems I have are truncation and invalid JSON being returned. For example:

[graphify extract] semantic extraction on 135 files via ollama...
[graphify] LLM returned invalid JSON, skipping chunk: Unterminated string starting at: line 1055 column 7 (char 28759)
[graphify] chunk of 4 truncated at depth 0, splitting into halves of 2 and 2

Is there any way I can debug any of this to see if there are any settings I need to change to optimise my calling of the API with graphify?

My command is aliased so when I call graphify . --mode deep --backend ollama --token-budget 6000 it runs:

#!/usr/bin/env bash

OLLAMA_BASE_URL=https://openrouter.ai/api/v1 \
OLLAMA_API_KEY="${OPENROUTER_API_KEY}" \
OLLAMA_MODEL="${GRAPHIFY_OR_MODEL:-deepseek/deepseek-v4-flash}" \
exec /Users/jorpo/.pyenv/shims/graphify "$@"

Any suggestions would be greatly appreciated :)

Answered by safishamsi

safishamsi · 2026-06-17T13:36:24Z

Maintainer

Thanks for the detailed report - the logs made this easy to pin down.

What's happening

The truncation is on the output side, and graphify is actually recovering from it: when a chunk's JSON comes back truncated/unparseable, it bisects the chunk (splitting into halves of 2 and 2) and re-extracts the smaller halves. So those warnings are noisy but not data loss - the affected files get re-extracted on smaller inputs.
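To make the recovery path concrete, here is a minimal bash sketch of bisect-and-retry. It is not graphify's implementation: `try_extract` and `MAX_FILES_OK` are hypothetical stand-ins that simulate a truncated response whenever a chunk holds too many files, and skipping a single file that still fails is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real extraction call: it "fails" (as a
# truncated JSON response would) when a chunk has more than MAX_FILES_OK files.
MAX_FILES_OK=${MAX_FILES_OK:-2}
try_extract() {
  [ "$#" -le "$MAX_FILES_OK" ]
}

# Try a chunk; on failure, split it into two halves and retry each half.
extract_with_bisect() {
  local depth=$1; shift
  local n=$#
  if try_extract "$@"; then
    echo "ok depth=$depth files=$*"
    return 0
  fi
  if [ "$n" -le 1 ]; then
    # Nothing left to split (assumed behaviour: give up on this file).
    echo "skip depth=$depth file=$1"
    return 0
  fi
  local half=$(( n / 2 ))
  echo "chunk of $n truncated at depth $depth, splitting into halves of $half and $(( n - half ))"
  extract_with_bisect $(( depth + 1 )) "${@:1:half}"
  extract_with_bisect $(( depth + 1 )) "${@:half+1}"
}

extract_with_bisect 0 a.py b.py c.py d.py
# chunk of 4 truncated at depth 0, splitting into halves of 2 and 2
# ok depth=1 files=a.py b.py
# ok depth=1 files=c.py d.py
```

Every file in the original chunk ends up in a smaller chunk that succeeds, which is why the warnings cost extra API calls rather than data.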

But there was a real bug underneath it. The OpenAI-compatible backends (ollama, openai, deepseek, kimi) define their output cap as max_tokens: 16384 in the backend config, but the request dispatch only read a max_completion_tokens key - which only the gemini config defines. So those four backends silently fell back to an 8192 output cap. In --mode deep, the JSON graph for a 4-file chunk easily exceeds 8192 tokens, which is exactly your Unterminated string ... char 28759 (~7k tokens) truncation.
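The "~7k tokens" figure can be sanity-checked with a back-of-the-envelope conversion. The sketch below assumes roughly 4 characters per token, which is a common rule of thumb and not the model's actual tokenizer; punctuation-heavy JSON often tokenizes more densely, so the real count can sit closer to the 8192 cap than this estimate suggests.

```shell
# Rough estimate only: the chars-per-token ratio is an assumption (default 4).
estimate_tokens() {
  chars=$1
  chars_per_token=${2:-4}
  echo $(( chars / chars_per_token ))
}

estimate_tokens 28759      # prints 7189
estimate_tokens 28759 3    # assumed denser tokenization: prints 9586
```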

Fixed

Just pushed 5b0c154 to v8: the dispatch now honours either cap key, so ollama/openai/deepseek/kimi get their intended 16384, and GRAPHIFY_MAX_OUTPUT_TOKENS still overrides. That alone should sharply cut the truncation/bisect churn you're seeing.
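As an illustration of the lookup order described above (not graphify's actual code), the two functions below model the old and new behaviour with shell default expansion. An unset config key is passed as an empty string, and the precedence between the two keys when both are set is an assumption.

```shell
# Arguments: $1 = max_completion_tokens, $2 = max_tokens, $3 = GRAPHIFY_MAX_OUTPUT_TOKENS
# (all hypothetical stand-ins for the backend config and the env override).

# Before the fix: max_tokens ($2) is never consulted.
resolve_cap_buggy() {
  echo "${3:-${1:-8192}}"
}

# After the fix: either cap key is honoured before falling back to 8192.
resolve_cap_fixed() {
  echo "${3:-${1:-${2:-8192}}}"
}

resolve_cap_buggy "" 16384 ""      # prints 8192  (ollama-style config, silently capped)
resolve_cap_fixed "" 16384 ""      # prints 16384 (intended cap)
resolve_cap_buggy "" 16384 16384   # prints 16384 (the env override wins even before the fix)
```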

What you can do right now (before the release)

1. Lift the cap via env var (works on the current release too - the override is checked before the buggy default):
```
GRAPHIFY_MAX_OUTPUT_TOKENS=16384
```
2. Lower --token-budget (counterintuitive but effective): it sizes the input chunk, and a smaller input chunk produces a smaller JSON output that fits under the cap. Try --token-budget 3000 (your 4-file chunks become 1-2 files).

3. For OpenRouter specifically, prefer the openai backend over the Ollama shim:

OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
OPENAI_API_KEY="${OPENROUTER_API_KEY}" \
GRAPHIFY_OPENAI_MODEL="${GRAPHIFY_OR_MODEL:-deepseek/deepseek-v4-flash}" \
GRAPHIFY_MAX_OUTPUT_TOKENS=16384 \
exec graphify "$@"   # then: graphify . --mode deep --backend openai --token-budget 4000

The Ollama shim sends an Ollama-only num_ctx option that OpenRouter ignores (so GRAPHIFY_OLLAMA_NUM_CTX does nothing for you), whereas the openai backend is a clean OpenAI-compatible path - a better fit for OpenRouter.
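For the --token-budget suggestion, the rough arithmetic looks like this. It assumes files are packed into a chunk until the budget is used up and that files average about 1500 tokens (inferred from 4 files fitting a 6000-token budget); graphify's real chunking may differ.

```shell
# Assumption: chunk size in files is roughly budget / average tokens per file.
files_per_chunk() {
  budget=$1
  avg_file_tokens=$2
  echo $(( budget / avg_file_tokens ))
}

files_per_chunk 6000 1500   # prints 4
files_per_chunk 3000 1500   # prints 2
```

Halving the input roughly halves the JSON the model has to emit per request, which is what keeps the response under the output cap.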

One caveat

deepseek/deepseek-v4-flash on OpenRouter has its own max-output-tokens ceiling. If that ceiling is below what a chunk needs, raising graphify's cap won't help past it - in that case lever #2 (smaller --token-budget) is the reliable fix. Worth a glance at the model's page on OpenRouter to see its max output.
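Put another way, the cap that actually applies is the smaller of graphify's setting and the model's ceiling. A tiny sketch, with made-up ceiling values purely for illustration (read the real figure off the model's OpenRouter page):

```shell
# Effective output cap = min(requested cap, model ceiling).
effective_cap() {
  requested=$1
  model_ceiling=$2
  if [ "$requested" -lt "$model_ceiling" ]; then
    echo "$requested"
  else
    echo "$model_ceiling"
  fi
}

effective_cap 16384 8192    # prints 8192  (hypothetical ceiling below the requested cap)
effective_cap 16384 65536   # prints 16384 (hypothetical ceiling above it)
```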

Hope that unblocks you - let us know how it goes!

1 reply

designermonkey Jun 17, 2026
Author

Thanks for the fast reply! I will give it a go now.
