Consider extending \x to support non-ASCII characters #1352

Closed
ridiculousfish opened this issue Mar 22, 2014 · 3 comments · Fixed by #9247

Comments

@ridiculousfish
Member

Currently \x escapes an ASCII value, and \X escapes a byte value. \x is "safe", because it cannot be used to generate invalid sequences: echo \xE4 is a syntax error. \X is "dangerous" because it can, e.g. echo \XE4 is accepted and results in an invalid UTF-8 sequence.

This bug tracks whether we should extend \x to allow escaping non-ASCII characters. An open question is whether it should escape byte values or characters. That is, does \xE4 generate the literal byte 0xE4, or the Unicode character U+00E4?
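To make the two readings concrete, here is a small Python sketch. It is not fish code; it only models the distinction using Python's `bytes` and `str` types, and the helper name `is_valid_utf8` is made up for illustration.

```python
# The two candidate meanings of \xE4, modelled with Python bytes and str.
raw_byte = bytes([0xE4])   # reading 1: the literal byte 0xE4
code_point = "\u00e4"      # reading 2: the character U+00E4

# U+00E4 takes two bytes in UTF-8, so the two readings produce different output.
utf8_of_code_point = code_point.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_of_code_point.hex())           # c3a4
print(is_valid_utf8(raw_byte))            # False: 0xE4 alone is a truncated sequence
print(is_valid_utf8(utf8_of_code_point))  # True
```

In a Latin-1 locale the two readings would coincide, since the single byte 0xE4 is how Latin-1 encodes U+00E4; the difference only shows up in multi-byte encodings such as UTF-8.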

Spun off from #1225. See that bug for more discussion.

@ddevault

I just ran into this issue and came here to report it as a bug. I'd like to express support for proper \x handling, as defined in POSIX. That is, echo -ne "\xFF" should output a literal 0xFF byte to stdout.

@faho
Member

faho commented Sep 26, 2022

Yeah, I don't see a use for a "checking-hex" escape - you already decided to use hex values, so you probably know what you are doing or want the actual bytes!

I would just make \x and \X the same. Unless anyone disagrees, I'm going to implement that.

@krobelus
Contributor

Sounds good, thanks. Both \xe4 and \Xe4 should produce that byte. For Unicode characters we already have \u00e4.
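The agreed behaviour can be sketched roughly as follows. This is a Python illustration, not fish's parser: the function name `expand_escapes` is hypothetical, and it assumes exactly two hex digits after \x or \X and exactly four after \u, which is stricter than a real shell needs to be.

```python
import re

# Hypothetical pattern: \xHH or \XHH (two hex digits), or \uHHHH (four hex digits).
_ESCAPE = re.compile(r"\\([xX])([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})")


def expand_escapes(text: str) -> bytes:
    """Expand \\x, \\X and \\u escapes into output bytes (illustrative only)."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Copy the literal text before this escape, encoded as UTF-8.
        out += text[pos:match.start()].encode("utf-8")
        if match.group(1):
            # \x and \X behave the same: emit the raw byte, unchecked.
            out.append(int(match.group(2), 16))
        else:
            # \u names a code point, emitted in its UTF-8 encoding.
            # Surrogate code points (D800-DFFF) would raise here.
            out += chr(int(match.group(3), 16)).encode("utf-8")
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


print(expand_escapes(r"\xe4").hex())    # e4
print(expand_escapes(r"\Xe4").hex())    # e4
print(expand_escapes(r"\u00e4").hex())  # c3a4
```

The point of the sketch is only the split in responsibilities: the hex escapes are for bytes, and \u is for characters.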

faho added a commit to faho/fish-shell that referenced this issue Sep 29, 2022
Up to now, in normal locales \x was essentially the same as \X, except
that it errored if given a value > 0x7f.

That's kind of annoying and useless.

A subtle change is that `\xHH` now represents the character (if any)
encoded by the byte value "HH", so even for values <= 0x7f if that's
not the same as the ASCII value we would diverge.

I do not believe anyone has ever run fish on a system where that
distinction matters. It isn't a thing for UTF-8, it isn't a thing for
ASCII, it isn't a thing for UTF-16, it isn't a thing for any extended
ASCII scheme - ISO8859-X, it isn't a thing for SHIFT-JIS.

I am reasonably certain we are making that same assumption in other
places.

Fixes fish-shell#1352
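The assumption that byte values up to 0x7f mean the same thing as in ASCII can be spot-checked for a few common encodings. The sketch below uses Python's codecs, not anything from fish. The list of encodings is a sample chosen for illustration and does not cover every encoding named in the commit message, and `agrees_with_ascii` is a hypothetical helper.

```python
# A sample of encodings to check; chosen for illustration only.
ENCODINGS = ["ascii", "utf-8", "latin-1", "iso8859_15", "cp1252"]


def agrees_with_ascii(encoding: str) -> bool:
    """Return True if every byte 0x00-0x7f decodes to the same-numbered code point."""
    for value in range(0x80):
        if bytes([value]).decode(encoding) != chr(value):
            return False
    return True


for name in ENCODINGS:
    print(name, agrees_with_ascii(name))
```

For each encoding in this sample the check passes, which is the property the commit message relies on when it treats `\xHH` for values <= 0x7f as equivalent to the old ASCII behaviour.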
@faho faho mentioned this issue Sep 29, 2022
3 tasks
faho added a commit to faho/fish-shell that referenced this issue Oct 9, 2022
@faho faho closed this as completed in #9247 Oct 9, 2022
faho added a commit that referenced this issue Oct 9, 2022
@faho faho modified the milestones: fish-future, fish 3.6.0 Oct 11, 2022