wcswidth() returning -1 not handled (and newline in PS1 not handled) #394

andychu · 2019-07-04T04:58:18Z

I just merged this, but when I tested it against some random unicode prompt, libc.wcswidth() is returning -1? That causes an AssertionError later.

I reduced it to this, which works in bash:

PS1=$'\xe2\x86\x92'

Can you reproduce this on your machine? Either way, we should handle the -1 return value.

But I'm not sure why this is happening. Maybe the '\x01 stripping isn't right? That would be an unprintable character.

https://unix.stackexchange.com/questions/25903/awesome-symbols-and-characters-in-a-bash-prompt

Originally posted by @andychu in #368 (comment)

The text was updated successfully, but these errors were encountered:

andychu · 2019-07-04T17:26:10Z

@jyn514 Here's a simpler repro. (not sure why the last one was flaky)

A newline in the prompt causes this problem. So I think we should check -1 and then return the fallback... or maybe somehow indicate a warning? newlines in prompts are probably not uncommon.

Maybe we have to additionally split lines? And only calculate width of last line? So there could be 2 changes here -- check -1, and account for newlines in another way.

andy@lisa:~/git/oilshell/oil$ bin/osh --rcfile /dev/null
osh$ PS1=$'\n'

ls 
 .git/                 .gitignore            .mypy_cache/          .swp                  .travis.yml           .vagrant/             INSTALL.txt           LICENSE.txt           Makefile              NOTES-cpython.txt     
 NOTES.txt             Python-2.7.13/        README.md             Rplots.pdf            TODO-completion.txt   TODO.txt              Vagrantfile           __init__.py           __pycache__/          _bin/                 
 _build/               _chroot/              _deps/                _devbuild/            _release/             _tmp/                 array.patch           asdl.txt              asdl/                 benchmarks/           
 bin/                  build/                configure             core/                 demo/                 devtools/             dir/                  doc/                  edit.txt              fastlex.pyi           
 fastlex.so            file                  frontend/             gold/                 install               libc.pyi              libc.so               line_input.so         local.sh              logs-0.6.pre13.tar    
 metrics/              misc/                 mycpp/                native/               oil-version.txt       oil_lang/             opy/                  osh/                  ovm2/                 pgen2/                
 portable-rules.mk     posix_.pyi            posix_.so             pylib/                r.txt                 release-notes.txt     release-roadmap.txt   spec/                 test/                 testdata/             
 tmp.txt               tools/                types/                uftrace.data.old/     uftrace.data/         vendor/               web/                  
Traceback (most recent call last):
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 89, in PrintCandidates
    self._PrintCandidates(*args)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 374, in _PrintCandidates
    self._ReturnToPrompt(num_lines+1)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 309, in _ReturnToPrompt
    assert last_prompt_len != -1
AssertionError

jyn514 · 2019-07-04T18:25:26Z

I'm not sure that the autocomplete UI would work even if we only counted the last line, that's something to look into. -1 should probably be handled the same way as other errors, by just counting the number of bytes. We should also print a warning if the prompt is invalid or the UTF8 locale isn't installed instead of silently failing.

jyn514 · 2019-07-04T18:27:43Z

From #368 (comment):

I can't replicate the error for PS1=$'\xe2\x86\x92' on my machine. Here's my locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I noticed you have LANGUAGE=en_US, could you try changing that and seeing what happens? Currently we only change LC_CTYPE because that's what the linux man pages document, maybe it's different on MacOS.

The newline threw an assertion error for me as well.

andychu · 2019-07-04T19:00:55Z

Yeah it's possible we may need to take into account the prompt height in the future. But for now I think fixing the AssertionError is OK.

I would like to print a message rather than silently failing, but I'm not sure where to do it.

I still get the same error:

[osh] lisa ~/git/oilshell/oil$ export LANGUAGE=
[osh] lisa ~/git/oilshell/oil$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[osh] lisa ~/git/oilshell/oil$ PS1=$'\n'

ls 
 .git/                 .gitignore            .mypy_cache/          .swp                  .travis.yml           .vagrant/             INSTALL.txt           LICENSE.txt           Makefile              NOTES-cpython.txt     
 NOTES.txt             Python-2.7.13/        README.md             Rplots.pdf            TODO-completion.txt   TODO.txt              Vagrantfile           __init__.py           __pycache__/          _bin/                 
 _build/               _chroot/              _deps/                _devbuild/            _release/             _tmp/                 array.patch           asdl.txt              asdl/                 benchmarks/           
 bin/                  build/                configure             core/                 demo/                 devtools/             dir/                  doc/                  edit.txt              fastlex.pyi           
 fastlex.so            file                  frontend/             gold/                 install               libc.pyi              libc.so               line_input.so         local.sh              logs-0.6.pre13.tar    
 metrics/              misc/                 mycpp/                native/               oil-version.txt       oil_lang/             opy/                  osh/                  ovm2/                 pgen2/                
 portable-rules.mk     posix_.pyi            posix_.so             pylib/                r.txt                 release-notes.txt     release-roadmap.txt   spec/                 test/                 testdata/             
 tmp.txt               tools/                types/                uftrace.data.old/     uftrace.data/         vendor/               web/                  
Traceback (most recent call last):
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 89, in PrintCandidates
    self._PrintCandidates(*args)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 374, in _PrintCandidates
    self._ReturnToPrompt(num_lines+1)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 309, in _ReturnToPrompt
    assert last_prompt_len != -1
AssertionError

jyn514 · 2019-07-04T19:03:12Z

I meant for the unicode arrow, not for the newline.

andychu · 2019-07-04T19:07:02Z

Does it make a difference? either way we have to handle -1.

I must have made some kind of error minimizing the case. The full prompt in the stackoverflow case failed. But it looks like the cause was the \n and not the arrow.

jyn514 · 2019-07-06T17:16:47Z

How would you expect this to behave for a newline inside an escape sequence? For example PS1=$'abc\x01\033[0;31m\n\x02> ', should it be len('abc> ') or len('> ')? Or is it malformed and not worth testing?

Only try to find width of last line of prompt. Handle -1 from wcwidth by returning number of bytes in prompt. Does not handle newline inside of escape characters. Addresses oilshell#394

andychu · 2019-07-06T17:32:56Z

I'm not sure... I try to come up with a strategy to test what bash does when I don't know.

But I think you can just do a mechanical change to fall back on len() if -1 is returned.

That is, follow the existing logic and remove everything inside \x01 \x02, and then calculate the width.

"monotonically increasing correctness" is OK... i.e. I would fix this bug first and then worry about multiline prompts, which may require larger changes to other parts of the code.

andychu · 2019-07-06T17:34:04Z

The important part is really testing. So when we refactor later, we don't break this.

In this case I would add a unit tests for the PromptLen function. It can return -1 now, but it never should, because that breaks an invariant in the rest of the program (caught by the assertion).

Prior to the unicode change, the function could never return -1.

jyn514 · 2019-07-06T17:35:56Z

Do you use property-based testing? It would be nice to have something like https://hypothesis.readthedocs.io/en/latest/quickstart.html that asserts that the function never returns -1 instead of coming up with test cases manually.

jyn514 · 2019-07-06T17:53:01Z

Something like this:

diff --git a/core/comp_ui_test.py b/core/comp_ui_test.py
index 7650f7ce..aa95a981 100755
--- a/core/comp_ui_test.py
+++ b/core/comp_ui_test.py
@@ -7,6 +7,8 @@ from __future__ import print_function
 import cStringIO
 import sys
 import unittest
+import hypothesis
+from hypothesis.strategies import text
 
 from core import comp_ui  # module under test
 from core import util
@@ -170,9 +172,9 @@ class PromptTest(unittest.TestCase):
     self.assertEqual(comp_ui._PromptLen("abc\ndef"), 3)
     self.assertEqual(comp_ui._PromptLen(""), 0)
 
-  def testControlCharacters(self):
-    self.assertEqual(comp_ui._PromptLen("\xef"), 1)
-    self.assertEqual(comp_ui._PromptLen("\x03\x05"), 2)
+  @hypothesis.given(text())
+  def testNeverPanics(self, s):
+    assert comp_ui._PromptLen(s) >= 0, repr(s)
 
 if __name__ == '__main__':
   unittest.main()
diff --git a/vendor/typing.py b/vendor/typing.py
index b0fdb1c2..a9a700bf 100644
--- a/vendor/typing.py
+++ b/vendor/typing.py
@@ -10,6 +10,9 @@
 # if TYPE_CHECKING:
 #   NullFunc = Callable[[int, int], int]
 
+TypingMeta = None
+TypeVar = None
+_ForwardRef = None
 List = None
 Tuple = None
 Optional = None

andychu · 2019-07-06T18:41:04Z

It's something I've been interested in, but haven't gotten around to trying.

Does it actually fail with the old code in this case? Like it manages to find the \n ? I guess it can hard-code those special strings and try it.

If so it might be nice to start trying it. We would have to sort ouf the PIP / Travis dependencies, which shouldn't be hard.

(Also if it takes a long time to run, I would put them in a different file so we can selectively run them. I like that the unit tests run pretty fast, and have no deps)

jyn514 · 2019-07-06T18:47:22Z

It runs pretty fast, but yes it introduces a dependency. It caught the -1, I think it came up with \x1f or something like that.

andychu · 2019-07-06T18:51:48Z

OK great, I would accept a PR to add it. Although I would still be inclined to put things in a different file to start with. When we gain confidence with it, we can move them into the normal test files.

Maybe one per dir, like osh/property_tests.py and core/property_tests.py ? The _tests.py suffix shouldn't get picked up by test/unit.sh, which is probably what we want at first.

Running it in Travis would be nice too but that could be a followup change.

The one caveat I have is that internal invariants are subject to change / refactoring -- hence my preference for spec tests over unit tests. But yes in this case there was a specific runtime invariant that the assert was protecting.

jyn514 · 2019-07-06T19:16:53Z

This just caught another error! PyArg_ParseTuple throws a TypeError if a string contains a null byte.

andychu · 2019-07-06T19:25:21Z

OK interesting. Yeah I have been wondering about this... I think this will happen in all the other C extension too.

bash silently truncates.

$ echo $'foo\x00bar'
foo

mksh seems not to though...

Python builtins raise an error:

>>> os.listdir('bin\0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: listdir() argument 1 must be encoded string without null bytes, not str

Not sure yet what we should do... great find though!!!

jyn514 · 2019-07-06T19:26:46Z

Hmm python seems to handle it fine as long as it doesn't go through a C extension:

(osh) $ echo $'foo\x00bar'
foobar
(osh) $ PS1=$'foo > \x00bar > '
foo >

See oilshell#402 (comment) and oilshell#394 (comment) for discussion.

- Only try to find width of last line of prompt. - Handle -1 from wcwidth by returning number of bytes in prompt. - Does not handle newline inside of escape characters. Addresses #394 - Add tests for control characters in prompt See #402 (comment) and #394 (comment) for discussion.

andychu · 2019-07-17T19:20:43Z

Released with 0.7.pre1: https://www.oilshell.org/release/0.7.pre1/

andychu changed the title ~~wcswidth() returning -1 not handled~~ wcswidth() returning -1 not handled (and newline not handled) Jul 4, 2019

andychu changed the title ~~wcswidth() returning -1 not handled (and newline not handled)~~ wcswidth() returning -1 not handled (and newline in PS1 not handled) Jul 4, 2019

jyn514 added a commit to jyn514/oil that referenced this issue Jul 6, 2019

Handle -1 from wcwidth

2138fed

Only try to find width of last line of prompt. Handle -1 from wcwidth by returning number of bytes in prompt. Does not handle newline inside of escape characters. Addresses oilshell#394

jyn514 mentioned this issue Jul 6, 2019

Allow prompts to contain newline #402

Merged

jyn514 mentioned this issue Jul 6, 2019

Add property testing using hypothesis #406

Merged

jyn514 added a commit to jyn514/oil that referenced this issue Jul 6, 2019

Remove escaped characters before splitting lines in prompt

db80b14

See oilshell#402 (comment) and oilshell#394 (comment) for discussion.

jyn514 added a commit to jyn514/oil that referenced this issue Jul 6, 2019

Remove escaped characters before splitting lines in prompt

ea1c7bc

See oilshell#402 (comment) and oilshell#394 (comment) for discussion.

andychu added the pending-release label Jul 7, 2019

andychu closed this as completed Jul 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wcswidth() returning -1 not handled (and newline in PS1 not handled) #394

wcswidth() returning -1 not handled (and newline in PS1 not handled) #394

andychu commented Jul 4, 2019

andychu commented Jul 4, 2019

jyn514 commented Jul 4, 2019

jyn514 commented Jul 4, 2019 •

edited

andychu commented Jul 4, 2019

jyn514 commented Jul 4, 2019

andychu commented Jul 4, 2019

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019

andychu commented Jul 6, 2019

jyn514 commented Jul 6, 2019

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019 •

edited

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019 •

edited

jyn514 commented Jul 6, 2019 •

edited

andychu commented Jul 17, 2019

wcswidth() returning -1 not handled (and newline in PS1 not handled) #394

wcswidth() returning -1 not handled (and newline in PS1 not handled) #394

Comments

andychu commented Jul 4, 2019

andychu commented Jul 4, 2019

jyn514 commented Jul 4, 2019

jyn514 commented Jul 4, 2019 • edited

andychu commented Jul 4, 2019

jyn514 commented Jul 4, 2019

andychu commented Jul 4, 2019

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019

andychu commented Jul 6, 2019

jyn514 commented Jul 6, 2019

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019 • edited

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019

jyn514 commented Jul 6, 2019

andychu commented Jul 6, 2019 • edited

jyn514 commented Jul 6, 2019 • edited

andychu commented Jul 17, 2019

jyn514 commented Jul 4, 2019 •

edited

andychu commented Jul 6, 2019 •

edited

andychu commented Jul 6, 2019 •

edited

jyn514 commented Jul 6, 2019 •

edited