@bluebread commented Nov 23, 2025

@sfallah
llama-mtmd-cli applies a chat template to the input text and images and, by default, appends the image tokens after the text tokens. However, DeepSeek-OCR does not seem to work when the image follows the text: with <image> and the text swapped into that order, the model outputs only two tokens ("}" and EOS). This doesn't look like a bug in the original model, but I don't know whether the behavior is intentional.

Therefore, I slightly changed how llama-mtmd-cli tokenizes the input. There may be a better way to do this, though.
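As a rough illustration of the ordering issue (not the actual diff in this PR), the sketch below builds a prompt with the image marker either before or after the text. `build_prompt` is a hypothetical helper, the `"<image>"` marker string is taken from the description above and may differ from the marker llama-mtmd-cli uses internally, and the newline separator is an assumption.

```cpp
#include <string>

// Hypothetical helper for illustration only; it is not part of llama.cpp.
// Returns the prompt with the image marker placed before or after the text.
static std::string build_prompt(const std::string & text,
                                const std::string & marker,
                                bool image_first) {
    // If the text already contains the marker, respect the caller's placement.
    if (text.find(marker) != std::string::npos) {
        return text;
    }
    if (image_first) {
        // Image before text: the order that worked for DeepSeek-OCR here.
        // The "\n" separator is an assumption made for this sketch.
        return marker + "\n" + text;
    }
    // Image after text: the default order described above.
    return text + marker;
}
```

With `image_first` set to `true`, a text prompt such as `"Free OCR."` comes out with the marker in front of it; with `false`, the marker is appended after the text.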

In correct order:
[screenshot of model output]
In reverse order:
[screenshot of model output]

@sfallah sfallah merged commit a594990 into sfallah:sf/deepseek-ocr Nov 23, 2025