Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wcswidth() returning -1 not handled (and newline in PS1 not handled) #394

Closed
andychu opened this issue Jul 4, 2019 · 18 comments
Closed

Comments

@andychu
Copy link
Contributor

andychu commented Jul 4, 2019

I just merged this, but when I tested it against some random unicode prompt, libc.wcswidth() is returning -1? That causes an AssertionError later.

I reduced it to this, which works in bash:

PS1=$'\xe2\x86\x92'

Can you reproduce this on your machine? Either way, we should handle the -1 return value.

But I'm not sure why this is happening. Maybe the '\x01 stripping isn't right? That would be an unprintable character.

https://unix.stackexchange.com/questions/25903/awesome-symbols-and-characters-in-a-bash-prompt

Originally posted by @andychu in #368 (comment)

@andychu
Copy link
Contributor Author

andychu commented Jul 4, 2019

@jyn514 Here's a simpler repro. (not sure why the last one was flaky)

A newline in the prompt causes this problem. So I think we should check -1 and then return the fallback... or maybe somehow indicate a warning? newlines in prompts are probably not uncommon.

Maybe we have to additionally split lines? And only calculate width of last line? So there could be 2 changes here -- check -1, and account for newlines in another way.

andy@lisa:~/git/oilshell/oil$ bin/osh --rcfile /dev/null
osh$ PS1=$'\n'

ls 
 .git/                 .gitignore            .mypy_cache/          .swp                  .travis.yml           .vagrant/             INSTALL.txt           LICENSE.txt           Makefile              NOTES-cpython.txt     
 NOTES.txt             Python-2.7.13/        README.md             Rplots.pdf            TODO-completion.txt   TODO.txt              Vagrantfile           __init__.py           __pycache__/          _bin/                 
 _build/               _chroot/              _deps/                _devbuild/            _release/             _tmp/                 array.patch           asdl.txt              asdl/                 benchmarks/           
 bin/                  build/                configure             core/                 demo/                 devtools/             dir/                  doc/                  edit.txt              fastlex.pyi           
 fastlex.so            file                  frontend/             gold/                 install               libc.pyi              libc.so               line_input.so         local.sh              logs-0.6.pre13.tar    
 metrics/              misc/                 mycpp/                native/               oil-version.txt       oil_lang/             opy/                  osh/                  ovm2/                 pgen2/                
 portable-rules.mk     posix_.pyi            posix_.so             pylib/                r.txt                 release-notes.txt     release-roadmap.txt   spec/                 test/                 testdata/             
 tmp.txt               tools/                types/                uftrace.data.old/     uftrace.data/         vendor/               web/                  
Traceback (most recent call last):
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 89, in PrintCandidates
    self._PrintCandidates(*args)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 374, in _PrintCandidates
    self._ReturnToPrompt(num_lines+1)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 309, in _ReturnToPrompt
    assert last_prompt_len != -1
AssertionError

@andychu andychu changed the title wcswidth() returning -1 not handled wcswidth() returning -1 not handled (and newline not handled) Jul 4, 2019
@andychu andychu changed the title wcswidth() returning -1 not handled (and newline not handled) wcswidth() returning -1 not handled (and newline in PS1 not handled) Jul 4, 2019
@jyn514
Copy link
Contributor

jyn514 commented Jul 4, 2019

I'm not sure that the autocomplete UI would work even if we only counted the last line, that's something to look into. -1 should probably be handled the same way as other errors, by just counting the number of bytes. We should also print a warning if the prompt is invalid or the UTF8 locale isn't installed instead of silently failing.

@jyn514
Copy link
Contributor

jyn514 commented Jul 4, 2019

From #368 (comment):

I can't replicate the error for PS1=$'\xe2\x86\x92' on my machine. Here's my locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I noticed you have LANGUAGE=en_US, could you try changing that and seeing what happens? Currently we only change LC_CTYPE because that's what the linux man pages document, maybe it's different on MacOS.

The newline threw an assertion error for me as well.

@andychu
Copy link
Contributor Author

andychu commented Jul 4, 2019

Yeah it's possible we may need to take into account the prompt height in the future. But for now I think fixing the AssertionError is OK.

I would like to print a message rather than silently failing, but I'm not sure where to do it.

I still get the same error:

[osh] lisa ~/git/oilshell/oil$ export LANGUAGE=
[osh] lisa ~/git/oilshell/oil$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[osh] lisa ~/git/oilshell/oil$ PS1=$'\n'

ls 
 .git/                 .gitignore            .mypy_cache/          .swp                  .travis.yml           .vagrant/             INSTALL.txt           LICENSE.txt           Makefile              NOTES-cpython.txt     
 NOTES.txt             Python-2.7.13/        README.md             Rplots.pdf            TODO-completion.txt   TODO.txt              Vagrantfile           __init__.py           __pycache__/          _bin/                 
 _build/               _chroot/              _deps/                _devbuild/            _release/             _tmp/                 array.patch           asdl.txt              asdl/                 benchmarks/           
 bin/                  build/                configure             core/                 demo/                 devtools/             dir/                  doc/                  edit.txt              fastlex.pyi           
 fastlex.so            file                  frontend/             gold/                 install               libc.pyi              libc.so               line_input.so         local.sh              logs-0.6.pre13.tar    
 metrics/              misc/                 mycpp/                native/               oil-version.txt       oil_lang/             opy/                  osh/                  ovm2/                 pgen2/                
 portable-rules.mk     posix_.pyi            posix_.so             pylib/                r.txt                 release-notes.txt     release-roadmap.txt   spec/                 test/                 testdata/             
 tmp.txt               tools/                types/                uftrace.data.old/     uftrace.data/         vendor/               web/                  
Traceback (most recent call last):
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 89, in PrintCandidates
    self._PrintCandidates(*args)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 374, in _PrintCandidates
    self._ReturnToPrompt(num_lines+1)
  File "/home/andy/git/oilshell/oil/core/comp_ui.py", line 309, in _ReturnToPrompt
    assert last_prompt_len != -1
AssertionError

@jyn514
Copy link
Contributor

jyn514 commented Jul 4, 2019

I meant for the unicode arrow, not for the newline.

@andychu
Copy link
Contributor Author

andychu commented Jul 4, 2019

Does it make a difference? either way we have to handle -1.

I must have made some kind of error minimizing the case. The full prompt in the stackoverflow case failed. But it looks like the cause was the \n and not the arrow.

@jyn514
Copy link
Contributor

jyn514 commented Jul 6, 2019

How would you expect this to behave for a newline inside an escape sequence? For example PS1=$'abc\x01\033[0;31m\n\x02> ', should it be len('abc> ') or len('> ')? Or is it malformed and not worth testing?

jyn514 added a commit to jyn514/oil that referenced this issue Jul 6, 2019
Only try to find width of last line of prompt.
Handle -1 from wcwidth by returning number of bytes in prompt.
Does not handle newline inside of escape characters.

Addresses oilshell#394
@andychu
Copy link
Contributor Author

andychu commented Jul 6, 2019

I'm not sure... I try to come up with a strategy to test what bash does when I don't know.

But I think you can just do a mechanical change to fall back on len() if -1 is returned.

That is, follow the existing logic and remove everything inside \x01 \x02, and then calculate the width.

"monotonically increasing correctness" is OK... i.e. I would fix this bug first and then worry about multiline prompts, which may require larger changes to other parts of the code.

@andychu
Copy link
Contributor Author

andychu commented Jul 6, 2019

The important part is really testing. So when we refactor later, we don't break this.

In this case I would add a unit tests for the PromptLen function. It can return -1 now, but it never should, because that breaks an invariant in the rest of the program (caught by the assertion).

Prior to the unicode change, the function could never return -1.

@jyn514
Copy link
Contributor

jyn514 commented Jul 6, 2019

Do you use property-based testing? It would be nice to have something like https://hypothesis.readthedocs.io/en/latest/quickstart.html that asserts that the function never returns -1 instead of coming up with test cases manually.

@jyn514
Copy link
Contributor

jyn514 commented Jul 6, 2019

Something like this:

diff --git a/core/comp_ui_test.py b/core/comp_ui_test.py
index 7650f7ce..aa95a981 100755
--- a/core/comp_ui_test.py
+++ b/core/comp_ui_test.py
@@ -7,6 +7,8 @@ from __future__ import print_function
 import cStringIO
 import sys
 import unittest
+import hypothesis
+from hypothesis.strategies import text
 
 from core import comp_ui  # module under test
 from core import util
@@ -170,9 +172,9 @@ class PromptTest(unittest.TestCase):
     self.assertEqual(comp_ui._PromptLen("abc\ndef"), 3)
     self.assertEqual(comp_ui._PromptLen(""), 0)
 
-  def testControlCharacters(self):
-    self.assertEqual(comp_ui._PromptLen("\xef"), 1)
-    self.assertEqual(comp_ui._PromptLen("\x03\x05"), 2)
+  @hypothesis.given(text())
+  def testNeverPanics(self, s):
+    assert comp_ui._PromptLen(s) >= 0, repr(s)
 
 if __name__ == '__main__':
   unittest.main()
diff --git a/vendor/typing.py b/vendor/typing.py
index b0fdb1c2..a9a700bf 100644
--- a/vendor/typing.py
+++ b/vendor/typing.py
@@ -10,6 +10,9 @@
 # if TYPE_CHECKING:
 #   NullFunc = Callable[[int, int], int]
 
+TypingMeta = None
+TypeVar = None
+_ForwardRef = None
 List = None
 Tuple = None
 Optional = None

@andychu
Copy link
Contributor Author

andychu commented Jul 6, 2019

It's something I've been interested in, but haven't gotten around to trying.

Does it actually fail with the old code in this case? Like it manages to find the \n ? I guess it can hard-code those special strings and try it.

If so it might be nice to start trying it. We would have to sort ouf the PIP / Travis dependencies, which shouldn't be hard.

(Also if it takes a long time to run, I would put them in a different file so we can selectively run them. I like that the unit tests run pretty fast, and have no deps)

@jyn514
Copy link
Contributor

jyn514 commented Jul 6, 2019

It runs pretty fast, but yes it introduces a dependency. It caught the -1, I think it came up with \x1f or something like that.

@andychu
Copy link
Contributor Author

andychu commented Jul 6, 2019

OK great, I would accept a PR to add it. Although I would still be inclined to put things in a different file to start with. When we gain confidence with it, we can move them into the normal test files.

Maybe one per dir, like osh/property_tests.py and core/property_tests.py ? The _tests.py suffix shouldn't get picked up by test/unit.sh, which is probably what we want at first.

Running it in Travis would be nice too but that could be a followup change.


The one caveat I have is that internal invariants are subject to change / refactoring -- hence my preference for spec tests over unit tests. But yes in this case there was a specific runtime invariant that the assert was protecting.

@jyn514
Copy link
Contributor

jyn514 commented Jul 6, 2019

This just caught another error! PyArg_ParseTuple throws a TypeError if a string contains a null byte.

@andychu
Copy link
Contributor Author

andychu commented Jul 6, 2019

OK interesting. Yeah I have been wondering about this... I think this will happen in all the other C extension too.

bash silently truncates.

$ echo $'foo\x00bar'
foo

mksh seems not to though...

Python builtins raise an error:

>>> os.listdir('bin\0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: listdir() argument 1 must be encoded string without null bytes, not str

Not sure yet what we should do... great find though!!!

@jyn514
Copy link
Contributor

jyn514 commented Jul 6, 2019

Hmm python seems to handle it fine as long as it doesn't go through a C extension:

(osh) $ echo $'foo\x00bar'
foobar
(osh) $ PS1=$'foo > \x00bar > '
foo > 

jyn514 added a commit to jyn514/oil that referenced this issue Jul 6, 2019
jyn514 added a commit to jyn514/oil that referenced this issue Jul 6, 2019
andychu pushed a commit that referenced this issue Jul 6, 2019
- Only try to find width of last line of prompt.
- Handle -1 from wcwidth by returning number of bytes in prompt.
- Does not handle newline inside of escape characters.

Addresses #394

- Add tests for control characters in prompt

See #402 (comment) and
#394 (comment) for
discussion.
@andychu
Copy link
Contributor Author

andychu commented Jul 17, 2019

Released with 0.7.pre1: https://www.oilshell.org/release/0.7.pre1/

@andychu andychu closed this as completed Jul 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants