Escape backspace etc graphemes in string.inspect #600

Michael-Mark-Edu · 2024-05-22T10:01:21Z

Fixed by #615

Edited 5/24, 8:00 PM PST

I was having issues related to accidentally interpreting binary data as text, resulting in the following error output from gleeunit:

Eventually, I figured out that there's a 0x08 (backspace) character in there. It causes the first " to get deleted, and the 0x40 (@) immediately after it takes its place. The result looked just close enough to intentional to stump me for a while.

Ideally, 0x08 should not do anything here, since it can easily mangle the output and make it unreadable. Other ASCII control characters, such as 0x1B (escape), have similar effects. This could be fixed by having io.print-like functions replace these characters with hex (0x08) or escape codes (\b, \u{0008}).

This is most severe for gleeunit, where developers depend on its output to debug their programs, but it also happened to me with io.debug and io.print (where this behavior might be intentional).

Theoretically, to reproduce this you just need to write 0x08 to a file, load it (for example with the file_streams package), and print it to the screen (either with io.debug, or with gleeunit's should.equal so that it prints when a test fails). Weird things will happen. I imagine several 0x08s in the string would damage the output even more.
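A file isn't strictly needed to reproduce this. The sketch below builds the problematic string directly from code point 0x08; `backspace_sample` is a hypothetical name used only here. On a stdlib without the fix, the raw backspace reaches the terminal, moves the cursor back over the opening quote printed by string.inspect, and the following @ overwrites it.

```gleam
import gleam/io
import gleam/string

/// Hypothetical helper: a backspace (0x08) followed by "@abc", mimicking
/// binary data that was accidentally interpreted as text.
pub fn backspace_sample() -> String {
  let assert Ok(bs) = string.utf_codepoint(0x08)
  string.from_utf_codepoints([bs]) <> "@abc"
}

pub fn main() {
  // Without escaping, the terminal acts on the 0x08 instead of showing it.
  backspace_sample()
  |> string.inspect
  |> io.println
}
```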


lpil · 2024-05-22T20:58:52Z

Thank you

Michael-Mark-Edu · 2024-05-22T22:24:58Z

Here are all the escape codes that are relevant to this:

(code 127 is in the "invisible" category, codes 128+ are UTF-8 and shouldn't have special functionality)

Michael-Mark-Edu · 2024-05-22T22:31:10Z

This also seems to be an Erlang-specific issue, since JavaScript just turns everything into Unicode.

mooreryan · 2024-05-22T22:45:43Z

Shouldn't form feed also be green in your list? It seems to escape fine:

```gleam
import gleam/io
import gleam/list
import gleam/string

fn cp(n: Int) -> UtfCodepoint {
  let assert Ok(cp) = string.utf_codepoint(n)
  cp
}

pub fn main() {
  [34, 92, 13, 10, 9, 12]
  |> list.map(cp)
  |> string.from_utf_codepoints
  |> string.inspect
  |> io.println
}

// prints: "\"\\\r\n\t\f"
```

See the Erlang code here, in stdlib/src/gleam_stdlib.erl, line 490 at commit fe51781:

```erlang
inspect_maybe_utf8_string(Binary, Acc) ->
```

mooreryan · 2024-05-22T22:47:25Z

I suppose the fix may be as simple as adding more control characters to the set that function escapes.

Michael-Mark-Edu · 2024-05-22T23:09:31Z

I opened a PR that should hopefully fix this issue. The effect of null characters on string comparison is left untouched, because it's arguably intentional behavior.

Michael-Mark-Edu · 2024-05-22T23:42:58Z

@mooreryan The \f patch is unreleased, and I originally tested on the public build. I tested it lightly on the unreleased build, and the \f patch does seem to work.

mooreryan · 2024-05-23T00:51:46Z

I'm not sure what you mean by the \f patch. The pull request you link doesn't change behavior with the form feed character, as it is already properly escaped in the code.

Edit: oh, I think you mean this patch.

mooreryan · 2024-05-24T17:06:43Z

This comment (#602 (comment)) has me thinking: should string.inspect handle more of the first 32 non-printable ASCII characters?

Pull request #602 adds handling for \b, \v, and \e. However, it may be useful to also show the Gleam escape syntax for other non-printable characters. Going back to the original motivating example, if string.inspect handled more of the non-printable characters, the diff would look something like this:

```
expected: Ok("abc123")
     got: Ok("\u{0008}abc123")
```

which seems more helpful.

lpil · 2024-05-24T18:05:45Z

Oh that's clever. We could use that syntax for any invisible grapheme. Is that what you're saying? How would we identify which ones are invisible?

Hyperion-21 · 2024-05-24T20:27:51Z

I made a chart with the invisible ones earlier in this thread. Theoretically, anything that isn't already being converted and has a value < 32 is invisible, as is 127. The conversion list just applies the first matching entry (or maybe not, I don't know Erlang well), so maybe we can just add a catch-all rule for values < 32 at the end of the list, and one for 127 too.

I prefer \b over \u{08} when possible, but I don't think many of these invisible ones have a well-defined C escape, so it may be inevitable.
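A minimal sketch of that catch-all ordering, assuming the escaped output should use Gleam's \u{XXXX} syntax. `escape_codepoint` and `inspect_string` are hypothetical names, and this is not the stdlib's actual implementation: known escapes are matched first, then any remaining code point below 32, or 127, falls through to the generic rule.

```gleam
import gleam/int
import gleam/list
import gleam/string

/// Hypothetical helper: escape one code point. Specific escapes come first;
/// the guarded arm is the catch-all for other control characters and DEL.
fn escape_codepoint(cp: UtfCodepoint) -> String {
  case string.utf_codepoint_to_int(cp) {
    // double quote
    34 -> "\\\""
    // backslash
    92 -> "\\\\"
    9 -> "\\t"
    10 -> "\\n"
    13 -> "\\r"
    n if n < 32 || n == 127 -> {
      // Render as \u{XXXX}, zero-padded to four hex digits.
      let hex = int.to_base16(n)
      "\\u{" <> string.repeat("0", 4 - string.length(hex)) <> hex <> "}"
    }
    _ -> string.from_utf_codepoints([cp])
  }
}

/// Hypothetical stand-in for string.inspect applied to a String.
pub fn inspect_string(s: String) -> String {
  let body =
    s
    |> string.to_utf_codepoints
    |> list.map(escape_codepoint)
    |> string.concat
  "\"" <> body <> "\""
}
```

With this ordering, `inspect_string("\u{0008}abc123")` yields `"\u{0008}abc123"`, the same form as the diff suggested above. Form feed would come out as \u{000C} rather than \f, since it is not in the explicit list.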

mooreryan · 2024-05-24T20:56:01Z

> Oh that's clever. We could use that syntax for any invisible grapheme. Is that what you're saying? How would we identify which ones are invisible?

Yes, that would be the general idea. The "simplest" solution may be what @Hyperion-21 says: just convert the beginning of the ASCII table. However, identifying which characters are invisible is trickier than just taking ASCII values < 32. For one thing, once you consider Unicode, there are many "non-printing" characters outside that range. Check it out:

```gleam
import gleeunit
import gleeunit/should

pub fn main() {
  gleeunit.main()
}

pub fn a_test() {
  let x = "a b"
  let y = "a\u{0020}b"

  should.equal(x, y)
}

pub fn b_test() {
  let x = "a b"
  let y = "a\u{00A0}b"

  should.equal(x, y)
}

pub fn c_test() {
  let x = "a\u{0020}b"
  let y = "a\u{00A0}b"

  should.equal(x, y)
}
```

which yields:

```
Failures:

  1) invisible_chars_test.b_test
     Values were not equal
     expected: "a b"
          got: "a b"
     output: 

  2) invisible_chars_test.c_test
     Values were not equal
     expected: "a b"
          got: "a b"
     output:
```

Those all look like spaces, but they're not the same: U+0020 is an ordinary space and U+00A0 is a non-breaking space. So the "ideal" string.inspect function might somehow account for that. But it is getting trickier, so maybe it should be left to some third-party library? (Not sure about that.)
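One way to make look-alike characters visible when debugging is to print each code point's number rather than the rendered text. The sketch below is a hypothetical helper (`codepoints_hex` is not part of gleam_stdlib) built only on existing stdlib calls:

```gleam
import gleam/int
import gleam/list
import gleam/string

/// Hypothetical debugging helper: lists every code point as U+XXXX so that,
/// for example, U+0020 and U+00A0 are no longer indistinguishable.
pub fn codepoints_hex(s: String) -> String {
  s
  |> string.to_utf_codepoints
  |> list.map(fn(cp) {
    let hex = int.to_base16(string.utf_codepoint_to_int(cp))
    // Pad to at least four hex digits; longer values are left as-is.
    "U+" <> string.repeat("0", 4 - string.length(hex)) <> hex
  })
  |> string.join(" ")
}
```

For the failing tests above, `codepoints_hex("a b")` gives `U+0061 U+0020 U+0062`, while `codepoints_hex("a\u{00A0}b")` gives `U+0061 U+00A0 U+0062`.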

Second, you could imagine going beyond "invisible" characters. Check out this classic example:

```gleam
pub fn e_accent_test() {
  let e1 = "\u{00E9}"
  let e2 = "\u{0065}\u{0301}"

  should.equal(e1, e2)
}
```

and that yields this:

```
  3) invisible_chars_test.e_accent_test
     Values were not equal
     expected: "é"
          got: "é"
     output:
```

Both look like the same e with an acute accent.
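The two strings differ because one is a single precomposed code point (U+00E9) and the other is a base letter plus a combining accent (U+0065 U+0301). Both form one grapheme cluster, which is what string.length counts. A small hypothetical helper (`codepoint_count`, not a stdlib function) makes the difference explicit:

```gleam
import gleam/list
import gleam/string

/// Hypothetical helper: counts code points, whereas string.length counts
/// grapheme clusters (roughly, what a reader perceives as one character).
pub fn codepoint_count(s: String) -> Int {
  s
  |> string.to_utf_codepoints
  |> list.length
}
```

Here `string.length` returns 1 for both strings, while `codepoint_count` returns 1 for `"\u{00E9}"` and 2 for `"\u{0065}\u{0301}"`.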

The output in all three failures is arguably unhelpful and worth addressing, but it is complicated, so I'm not sure how complex string.inspect should be. It would probably be worth examining what other common languages do.

My point is that the semantics of string.inspect could get tricky, and I'm not sure how far the escaping should be taken, though it could be useful.

> I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well defined C escape so it may be inevitable.

It's true that \b is nice, but it is not valid Gleam syntax.

Michael-Mark-Edu · 2024-05-24T22:45:22Z

Is there anything stopping Gleam from supporting an extended set of escape codes? I found an old line in the compiler's changelog saying "Gleam now only supports \r, \n, \t, \", and \\ string escapes", which makes me think this was an intentional decision... but why? It seems arbitrary.

I'll update #602 momentarily to match the \u syntax. I'll also see if I can get it to show the invisible graphemes.

Michael-Mark-Edu · 2024-05-25T00:13:00Z

Alright, that's done.

Michael-Mark-Edu · 2024-05-25T03:00:49Z

Edited the parent post of this thread to better represent the current state of the issue/pr.

lpil changed the title from "Console output affected by ASCII escape codes when it shouldn't" to "Escape backspace etc graphemes in string.inspect" on May 22, 2024.

lpil added the labels bug, good first issue, and help wanted on May 22, 2024.

Michael-Mark-Edu mentioned this issue on May 22, 2024: Prevent control codes actioning in string.inspect #602 (closed)

Michael-Mark-Edu mentioned this issue on May 26, 2024: JSON.stringify produces invalid Gleam escape sequences and should be replaced #607 (closed)

lpil mentioned this issue on May 29, 2024: Prevent control codes actioning in string.inspect #615 (merged)

lpil closed this as completed in #615 on May 29, 2024.
