Escape backspace etc graphemes in string.inspect #600
Comments
Thank you
Shouldn't form feed also be green in your list? It seems to escape fine:

```gleam
import gleam/io
import gleam/list
import gleam/string

// Convert an integer to a code point, crashing if it is not a valid one.
fn cp(n: Int) -> UtfCodepoint {
  let assert Ok(cp) = string.utf_codepoint(n)
  cp
}

pub fn main() {
  // ", \, carriage return, line feed, tab, form feed
  [34, 92, 13, 10, 9, 12]
  |> list.map(cp)
  |> string.from_utf_codepoints
  |> string.inspect
  |> io.println
}
// prints: "\"\\\r\n\t\f"
```

See the Erlang code here: Line 490 in fe51781
I suppose the fix may be as simple as adding more of the control characters that should be escaped to that function.
I opened a PR that should hopefully fix this issue. Null characters affecting string comparison are left untouched because that's arguably intentional behavior.
@mooreryan The \f patch is unreleased, and I originally did my testing on the public build. I tested it lightly in the unreleased build and the \f patch does seem to work.
I'm not sure what you mean by the Edit: oh, I think you mean this patch.
This comment (#602 (comment)) has me thinking, should The pull request #602 adds handling for
which seems more helpful.
Oh, that's clever. We could use that syntax for any invisible grapheme; is that what you're saying? How would we identify which ones are invisible?
I made a chart with the invisible ones earlier in this thread. Theoretically, anything that isn't already converted and has a value < 32 is invisible (as is 127). The conversion list just uses the first entry found (or maybe not, I don't know Erlang well), so maybe we can just add a conversion rule for < 32 at the end of the list, and one for 127 too. I prefer \b over \u{08} when possible, but I don't think many of these invisible characters have a well-defined C escape, so it may be inevitable.
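A rough sketch of that rule in Gleam, to make the idea concrete. This is not the code from #602 and not how `string.inspect` is implemented; `escape_codepoint` and `escape_controls` are hypothetical names, and the four-digit `\u{XXXX}` padding is just one possible output format.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical helper (not part of gleam_stdlib): escape a single code point.
fn escape_codepoint(cp: UtfCodepoint) -> String {
  let n = string.utf_codepoint_to_int(cp)
  case n {
    // The named escapes discussed in this thread come first.
    34 -> "\\\""
    92 -> "\\\\"
    13 -> "\\r"
    10 -> "\\n"
    9 -> "\\t"
    12 -> "\\f"
    // Catch-all rule at the end: remaining values < 32, plus 127 (DEL).
    _ if n < 32 || n == 127 -> {
      let hex = int.to_base16(n)
      // Values below 16 have one hex digit, the rest here have two.
      let padded = case n < 16 {
        True -> "000" <> hex
        False -> "00" <> hex
      }
      "\\u{" <> padded <> "}"
    }
    // Everything else is passed through unchanged.
    _ -> string.from_utf_codepoints([cp])
  }
}

// Hypothetical helper: escape every code point of a string.
pub fn escape_controls(s: String) -> String {
  s
  |> string.to_utf_codepoints
  |> list.map(escape_codepoint)
  |> string.concat
}
```

With this sketch, `escape_controls("a\u{0008}b")` would give the text `a\u{0008}b` with the backspace spelled out instead of emitted raw. Putting the catch-all last mirrors the "add a rule at the end of the list" idea, so the nicer named escapes still win where they exist.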
Yes, that would be the general idea. The "simplest" solution may be as @Hyperion-21 says: just convert the beginning of the ASCII table. However, identifying which characters are invisible is trickier than just taking ASCII values < 32. For one thing, there are many "non-printing" things outside of that range when you consider Unicode. Check it out:

```gleam
import gleeunit
import gleeunit/should

pub fn main() {
  gleeunit.main()
}

// A literal space vs. U+0020 SPACE.
pub fn a_test() {
  let x = "a b"
  let y = "a\u{0020}b"
  should.equal(x, y)
}

// A literal space vs. U+00A0 NO-BREAK SPACE.
pub fn b_test() {
  let x = "a b"
  let y = "a\u{00A0}b"
  should.equal(x, y)
}

// U+0020 SPACE vs. U+00A0 NO-BREAK SPACE.
pub fn c_test() {
  let x = "a\u{0020}b"
  let y = "a\u{00A0}b"
  should.equal(x, y)
}
```

which yields:
Those all look like spaces, but they're not the same. So, the "ideal"

Second, you could imagine going beyond "invisible" characters. Check out this classic example:

```gleam
// Precomposed U+00E9 vs. "e" (U+0065) followed by a combining acute accent (U+0301).
pub fn e_accent_test() {
  let e1 = "\u{00E9}"
  let e2 = "\u{0065}\u{0301}"
  should.equal(e1, e2)
}
```

and that yields this:
Which both look like the same
Both of the outputs shown in those three failures could be considered pretty unhelpful and worth treating, but it is complicated, so I'm not sure how complex the
My point the semantics of
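To illustrate why those failures are hard to read, here is a small sketch that lists the code points of a string, which is one way to tell lookalike strings apart. `codepoints` is a hypothetical helper name, not something in gleam_stdlib or gleeunit, and the `U+` format is only a convention chosen for the example.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical helper: render a string as space-separated U+<hex> code points.
pub fn codepoints(s: String) -> String {
  s
  |> string.to_utf_codepoints
  |> list.map(fn(cp) { "U+" <> int.to_base16(string.utf_codepoint_to_int(cp)) })
  |> string.join(" ")
}
```

For the accent example, `codepoints("\u{00E9}")` gives `U+E9` while `codepoints("\u{0065}\u{0301}")` gives `U+65 U+301`, so the difference that is invisible in the rendered text becomes obvious.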
It's true that
Is there anything stopping Gleam from supporting an extended set of escape codes? I found an old line in the compiler's changelog saying "Gleam now only supports I'll update #602 momentarily to match the
Alright, that's done.
Edited the parent post of this thread to better represent the current state of the issue/PR.
Fixed by #602
Edited 5/24, 8:00 PM PST
I was having issues related to accidentally interpreting binary data as text, resulting in the following error output from `gleeunit`:

![image](https://private-user-images.githubusercontent.com/146277660/332748388-7382c2d5-2cb2-4550-a75e-fcb296e5c4dd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMzNzg4MDIsIm5iZiI6MTcyMzM3ODUwMiwicGF0aCI6Ii8xNDYyNzc2NjAvMzMyNzQ4Mzg4LTczODJjMmQ1LTJjYjItNDU1MC1hNzVlLWZjYjI5NmU1YzRkZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgxMVQxMjE1MDJaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1kN2NmODNmMDkzN2IyM2VmY2JlZjYxMmZiZmQyMjMyMjRmMjBkYWQzZGEwMzg5NmZiOThkYTBiOTI1ZjNkYTFlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.SbW6DCBSLzKBjhb41c4_uiFXntQ6ZtZDQyJQ9wiq1mA)

Eventually, I figured out that there's a 0x08 character in there, which causes the first " to get deleted. A 0x40 (@) immediately following it then takes its place, resulting in strange behavior that looked just close enough to intentional to stump me for a bit.

Ideally, 0x08 should not be doing anything here, since it can easily destroy the output and make it unreadable. Other ASCII control characters like 0x1B (escape) have similar effects here. This could be done by having io.print-like functions replace these characters with hex (0x08) or escape codes (`\b`, `\u{0008}`).

This is most severe for gleeunit, where the developer depends on its output to properly debug their program, but I was also having this happen with `io.debug` and `io.print` (where this behavior might be intentional).

Theoretically, to recreate this you just need to write 0x08 to a file, load it (such as with the file_streams package), then get it to print to the screen (either with `io.debug`, or by using `gleeunit`'s `should.equal` to get it to print on test failure), and weird things will happen. I imagine having several 0x08s in the string will cause even more damage to the output.
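A minimal sketch of a reproduction that skips the file step, assuming a Gleam version that accepts `\u{...}` escapes in string literals (as used elsewhere in this thread). The payload text and the `render` helper are made up for illustration; only the 0x08 followed by 0x40 comes from the description above.

```gleam
import gleam/io

// Hypothetical helper: wrap a string in double quotes without escaping
// anything, so control characters reach the terminal untouched.
pub fn render(payload: String) -> String {
  "\"" <> payload <> "\""
}

pub fn main() {
  // Assumed payload: a backspace (0x08) followed by "@" (0x40).
  let payload = "\u{0008}@binary"
  // On most terminals the backspace moves the cursor back over the opening
  // quote, and the "@" is then drawn in its place.
  io.println(render(payload))
}
```

What actually appears depends on the terminal, so no expected output is given here.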