
On Windows (but not on UNIX) redirecting the stdin of main to a pipe or a file results in wrong decoding of non-ASCII characters #6294

@enzomich

For a small RAG application I have written a Python wrapper that launches llama.cpp's main as a subprocess using subprocess.Popen() and communicates with it through two pipes (yes, I'm using the --simple-io option). Everything works fine, with one exception: if the line sent to main's stdin contains non-ASCII characters (e.g., Greek, Cyrillic, or even just Latin letters with accents or other diacritical marks), those characters, and only those, are received as garbled text (and interpreted rather imaginatively by the model). Initially I thought I was doing something wrong, but then I discovered that exactly the same thing happens without my Python wrapper, when launching main from the command line and redirecting its stdin with a "main < file.txt" or "echo input_line | main" command:

C:\Users\enzom\AI\LlamaFeeder>echo Translate "Σήμερον ἐστὶν εὔδια ἡμέρα" | \Users\enzom\AI\llama.cpp\llama-b2391-bin-win-cublas-cu12.2.0-x64\main -m \Users\enzom\AI\llama.cpp\Models\mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --simple-io --instruct --temp 0.1
Log start
main: build = 2391 (7ab7b733)
[...]
 - If you want to submit another line, end your input with '\'.


>  The English translation of "ΣήμεÏον á¼ÏÏὶν εá½Î´Î¹Î± ἡμέÏα" is "The children are playing in the park."

>
>
>  Trans
>
> late
>
>  "
>
> Î
>
> £
> Î

Please also note the garbage that main keeps printing after its answer, until it is killed with Ctrl-C, as if it had not noticed that the other end of the pipe had been closed.
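The "Î" and "£" fragments in the output above are what one would expect if the UTF-8 bytes of "Σ" (0xCE 0xA3) were interpreted one byte at a time in a single-byte Windows code page. This is only a hypothesis about the symptom, not a confirmed diagnosis of main. The helper name `misdecode` and the choice of cp1252 are assumptions made for illustration; a minimal Python sketch to reproduce the pattern:

```python
# Hypothetical diagnostic, not part of llama.cpp: encode the prompt as UTF-8,
# then decode the resulting bytes as if they were in a single-byte code page.
def misdecode(text: str, codepage: str = "cp1252") -> str:
    """Return text as it looks when its UTF-8 bytes are read as `codepage`."""
    # errors="replace" keeps going when a byte is unassigned in the code page
    return text.encode("utf-8").decode(codepage, errors="replace")


if __name__ == "__main__":
    # The two UTF-8 bytes of the capital sigma become two separate characters.
    print(misdecode("Σ"))  # Î£
    # The full prompt, for comparison with the garbled echo in the transcript.
    print(misdecode("Σήμερον ἐστὶν εὔδια ἡμέρα"))
```

If the result resembles the garbled text, the bytes are probably arriving intact and being decoded with the wrong code page somewhere on the receiving side.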

On the other hand, if the instruction is entered at the console prompt everything works as expected:

C:\Users\enzom\AI\LlamaFeeder>\Users\enzom\AI\llama.cpp\llama-b2391-bin-win-cublas-cu12.2.0-x64\main -m \Users\enzom\AI\llama.cpp\Models\mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --simple-io --instruct --temp 0.1 
Log start
[...]
 - If you want to submit another line, end your input with '\'.


> Translate "Σήμερον ἐστὶν εὔδια ἡμέρα"
 The translation of "Σήμερον ἐστὶν εὔδια ἡμέρα" is "Today is a fair day."

>

Any idea how to fix this?
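For reference, the wrapper setup described at the top can be sketched roughly as follows. This is not the actual wrapper: the helper names `start`, `send_line` and `read_line` are hypothetical, and a small Python child process that echoes one line stands in for main. The pipes are opened in binary mode and the text is encoded as UTF-8 explicitly, so the wrapper itself cannot introduce a code page mismatch. It does not change how main decodes a redirected stdin, which is the behavior reported here.

```python
import subprocess
import sys


def start(cmd):
    """Launch cmd with binary stdin/stdout pipes (no implicit text decoding)."""
    return subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)


def send_line(proc, text):
    """Write one line to the child's stdin as explicit UTF-8 bytes."""
    proc.stdin.write(text.encode("utf-8") + b"\n")
    proc.stdin.flush()


def read_line(proc):
    """Read one line from the child's stdout and decode it as UTF-8."""
    raw = proc.stdout.readline()
    return raw.decode("utf-8", errors="replace").rstrip("\r\n")


if __name__ == "__main__":
    # Stand-in child that echoes one line back byte for byte. To talk to the
    # real program, replace it with the main command line (model path etc.).
    echo_child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.readline()); "
        "sys.stdout.buffer.flush()",
    ]
    proc = start(echo_child)
    send_line(proc, 'Translate "Σήμερον ἐστὶν εὔδια ἡμέρα"')
    print(read_line(proc))
    proc.stdin.close()
    proc.wait()
```

With the echo child the line comes back unchanged on any platform, which makes it a useful baseline: if the same bytes come back garbled from main, the mismatch is on main's side of the pipe.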


Labels: bug
