shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

vajrasky · 2013-08-13T06:26:52Z

BPO	18723
Nosy	@pitrou, @ezio-melotti, @vajrasky
Files	fix_for_non_normalized_whitespaces_in_placeholder.patch fix_for_non_normalized_whitespaces_in_placeholder_v2.patch fix_for_non_normalized_whitespaces_in_placeholder_v3.patch fix_for_non_normalized_whitespaces_in_placeholder_v4.patch fix_for_non_normalized_whitespaces_in_placeholder_v5.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2013-10-16.07:38:56.759>
created_at = <Date 2013-08-13.06:26:52.028>
labels = ['type-bug', 'library']
title = 'shorten function of textwrap module is susceptible to\tnon-normalized whitespaces'
updated_at = <Date 2013-10-16.07:38:56.757>
user = 'https://github.com/vajrasky'

bugs.python.org fields:

activity = <Date 2013-10-16.07:38:56.757>
actor = 'vajrasky'
assignee = 'none'
closed = True
closed_date = <Date 2013-10-16.07:38:56.759>
closer = 'vajrasky'
components = ['Library (Lib)']
creation = <Date 2013-08-13.06:26:52.028>
creator = 'vajrasky'
dependencies = []
files = ['31271', '31274', '31276', '31277', '31278']
hgrepos = []
issue_num = 18723
keywords = ['patch']
message_count = 11.0
messages = ['195045', '195046', '195047', '195059', '195060', '195067', '195073', '195079', '195080', '195081', '200038']
nosy_count = 3.0
nosy_names = ['pitrou', 'ezio.melotti', 'vajrasky']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue18723'
versions = ['Python 3.4']

vajrasky · 2013-08-13T06:26:52Z

In shorten function of textwrap module, the placeholder becomes a hole where we can inject non-normalized whitespaces to the text.

>>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
>>> from textwrap import shorten
>>> shorten(text, 40, placeholder="              ")
'Hello there, how are you'

We normalize the only-whitespaces placeholder.

But....

>>> shorten(text, 17, placeholder="  \n\t(...)  \n\t[...]  \n\t")
'(...)  \n\t[...]'
>>> shorten(text, 40, placeholder="  \n\t(...)  \n\t[...]  \n\t")
'Hello there, how  \n\t(...)  \n\t[...]'

Attached the patch to normalize the non-normalized whitespaces in placeholder.

pitrou · 2013-08-13T06:44:06Z

I'm not convinced this is a bug. The whitespace right-stripping is more of an implementation detail. You can really put what you want inside the placeholder.

vajrasky · 2013-08-13T06:54:54Z

Okay, nevermind about non-normalized whitespaces in placeholder, but what about this case?

>>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
>>> from textwrap import shorten
>>> shorten(text, 10, placeholder="     ")
'Hello'
>>> shorten(text, 9, placeholder="     ")
''
>>> len('Hello')
5

Isn't that weird?

pitrou · 2013-08-13T09:23:09Z

Isn't that weird?

Agreed, this one is a bug. The stripping in shorten() should be smarter, i.e. it should not affect the placeholder's own spaces.
Do you want to write a patch?

pitrou · 2013-08-13T09:24:08Z

Correcting myself:

i.e. it should not affect the placeholder's own spaces.

... except for leading whitespace in case the placeholder ends up alone in the result.
i.e. shorten("somethingtoolong") should return "(...)", not " (...)".

vajrasky · 2013-08-13T11:05:30Z

Do you want to write a patch?
My pleasure.

Attached the second version of the patch to accomodate Pitrou's request.

vajrasky · 2013-08-13T13:19:47Z

Attached more refined patch. Removed unnecessary test. Added more robust test. Added shorten in __all__.

vajrasky · 2013-08-13T15:10:07Z

I missed this case:

>>> from textwrap import shorten
>>> shorten('hell', 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 386, in shorten
    return w.shorten(text, placeholder=placeholder)
  File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 322, in shorten
    raise ValueError("placeholder too large for max width")
ValueError: placeholder too large for max width

Also, in this patch, I removed the unnecessary stripping of the text part.

pitrou · 2013-08-13T15:34:16Z

> I missed this case:
> 
> >>> from textwrap import shorten
> >>> shorten('hell', 4)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
>   "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py",
>   line 386, in shorten
>     return w.shorten(text, placeholder=placeholder)
>   File
>   "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py",
>   line 322, in shorten
>     raise ValueError("placeholder too large for max width")
> ValueError: placeholder too large for max width

This is by design. Passing a placeholder larger than the width is a
programming error, regardless of whether the text is small enough.

vajrasky · 2013-08-13T15:48:41Z

Okay, attached the fifth version not to care about the case where text is smaller than the placeholder.

vajrasky · 2013-10-16T07:38:57Z

Serhiy's commit http://hg.python.org/cpython/rev/2e8c424dc638 fixed this issue already. So I closed this ticket.

vajrasky mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Aug 13, 2013

pitrou changed the title ~~shorten function of textwrap module is susceptible to non-normalized whitespaces~~ shorten function of textwrap module is susceptible to non-normalized whitespaces Aug 13, 2013

vajrasky mannequin closed this as completed Oct 16, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

pitrou commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

pitrou commented Aug 13, 2013

pitrou commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

pitrou commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Oct 16, 2013

shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

Comments

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

pitrou commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

pitrou commented Aug 13, 2013

pitrou commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

pitrou commented Aug 13, 2013

vajrasky mannequin commented Aug 13, 2013

vajrasky mannequin commented Oct 16, 2013