.getc won't permit completely raw input #10

Closed
geekosaur opened this issue Oct 28, 2017 · 3 comments

Comments

@geekosaur

The example code is wrong: the getc method always waits for additional input so that it can determine whether that input is a combining character belonging to the character just read.

To actually do raw input, you need to put the IO::Handle (e.g. $*IN) into binary mode ($*IN.encoding(Nil);) and use $*IN.read(1) to read a single byte. Note that this gives you a Buf instead of a Str. (A quick hack for that is $*IN.read(1).decode("latin-1"), especially since Buf is currently not as easy to work with as Str.)
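A minimal sketch of the binary-mode approach described above, wrapped in a helper sub. The name `read-raw-char` is hypothetical and is not part of Rakudo or this project. It relies only on `IO::Handle.encoding(Nil)`, `IO::Handle.read` and `Blob.decode`, as mentioned in this comment:

```raku
# Hypothetical helper: read one byte from a handle without waiting for
# possible combining characters, and return it as a one-character Str.
# Returns the Str type object (undefined) at end of input.
sub read-raw-char(IO::Handle:D $fh --> Str) {
    $fh.encoding(Nil);               # switch the handle to binary mode (no decoder look-ahead)
    my $byte = $fh.read(1);          # read at most one byte; an empty Buf means EOF
    return Str unless $byte.elems;   # EOF: signal with an undefined Str
    $byte.decode('latin-1');         # latin-1 maps the single byte to a single character
}
```

Call it as `read-raw-char($*IN)`. Because latin-1 maps each byte to its own character, a multi-byte UTF-8 character arrives as several separate characters. This also only removes Rakudo's decoder look-ahead; an interactive terminal still delivers input line by line unless it is switched to non-canonical (raw) mode, e.g. with `stty raw` or a termios binding.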

The raw input mechanism may require a fairly recent MoarVM and a matching Rakudo to work properly.

@AlexDaniel

There's a rakudobug that I am rejecting with similar reasoning: RT#125828. I think something like INIT { $*IN.encoding: ‘ASCII’ } should work too.

@krunen
Owner

krunen commented Feb 8, 2018

I have written a module, Unicode::UTF8Parser, that parses UTF-8 characters from a stream without waiting for combining characters. I can change the example to use this module.

I have held off on doing this, as I think Rakudo should expose this as some kind of parser mode; it seems silly to reimplement a full UTF-8 parser in plain Perl 6. Also, I am not sure how Rakudo will handle unnormalized Unicode (that is, with the base character and its combining characters kept separate). I have used it myself without many problems, though; the strings seem to become normalized quite quickly.

@krunen
Owner

krunen commented Aug 18, 2023

I have added a working example that reads single characters from $*IN.

krunen closed this as completed Aug 18, 2023

3 participants