shorten function of textwrap module is susceptible to non-normalized whitespaces #62923
Comments
In the shorten function of the textwrap module, the placeholder is a hole through which non-normalized whitespace can be injected into the text:
>>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
>>> from textwrap import shorten
>>> shorten(text, 40, placeholder=" ")
'Hello there, how are you'
Here the whitespace-only placeholder is normalized. But...
>>> shorten(text, 17, placeholder=" \n\t(...) \n\t[...] \n\t")
'(...) \n\t[...]'
>>> shorten(text, 40, placeholder=" \n\t(...) \n\t[...] \n\t")
'Hello there, how \n\t(...) \n\t[...]'
I attached a patch that normalizes the whitespace in the placeholder.
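The sketch below shows the kind of normalization the patch proposes. It assumes the goal is to collapse each whitespace run in the placeholder to a single space, as shorten() already does for the text itself. It also assumes trailing whitespace is dropped, since shorten() stripped it from the result in the session above. `normalize_placeholder` and `shorten_normalized` are hypothetical helpers, not part of textwrap:

```python
import re
import textwrap

def normalize_placeholder(placeholder: str) -> str:
    """Hypothetical helper: collapse every whitespace run (spaces, tabs,
    newlines) to one space, then drop trailing whitespace. A single leading
    space survives, so the placeholder stays separated from the last word."""
    return re.sub(r"\s+", " ", placeholder).rstrip()

def shorten_normalized(text: str, width: int, placeholder: str = " [...]") -> str:
    """Hypothetical wrapper: call textwrap.shorten() with a normalized placeholder."""
    return textwrap.shorten(text, width, placeholder=normalize_placeholder(placeholder))

text = "Hello there, how are you this fine day? I'm glad to hear it!"
messy = " \n\t(...) \n\t[...] \n\t"
print(repr(normalize_placeholder(messy)))  # ' (...) [...]'
print(repr(shorten_normalized(text, 40, messy)))
```

Doing the normalization in a wrapper leaves textwrap.shorten() itself untouched. That fits the view that the placeholder may contain anything the caller wants.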
I'm not convinced this is a bug. The right-stripping of whitespace is more of an implementation detail. You can really put whatever you want inside the placeholder.
Okay, never mind the non-normalized whitespace in the placeholder, but what about this case?
>>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
>>> from textwrap import shorten
>>> shorten(text, 10, placeholder=" ")
'Hello'
>>> shorten(text, 9, placeholder=" ")
''
>>> len('Hello')
5
Isn't that weird? Width 9 returns an empty string even though 'Hello' (5 characters) would still fit.
Agreed, this one is a bug. The stripping in shorten() should be smarter, i.e., it should not affect the placeholder's own spaces.
Correcting myself:
... except for leading whitespace in case the placeholder ends up alone in the result.
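The stripping rule described above can be sketched as a word-based truncation with three parts:

- It trims only the text part.
- It appends the placeholder verbatim.
- It strips the placeholder's leading whitespace only when no words fit.

This is a hypothetical, simplified re-implementation for illustration. `shorten_sketch` is not a textwrap function, and it ignores TextWrapper options such as indents and long-word breaking:

```python
def shorten_sketch(text: str, width: int, placeholder: str = " [...]") -> str:
    """Hypothetical sketch of the proposed stripping rule (not textwrap's code)."""
    words = text.split()            # collapses all whitespace runs, like shorten()
    collapsed = " ".join(words)
    if len(collapsed) <= width:
        return collapsed            # nothing to truncate
    if len(placeholder.lstrip()) > width:
        raise ValueError("placeholder too large for max width")
    kept = []
    length = 0
    for word in words:
        # Length of the kept text if this word is added (plus a joining space).
        new_length = length + (1 if kept else 0) + len(word)
        if new_length + len(placeholder) > width:
            break
        kept.append(word)
        length = new_length
    if not kept:
        # The placeholder ends up alone: drop only its leading whitespace.
        return placeholder.lstrip()
    # The placeholder's own spaces are appended as-is, never stripped.
    return " ".join(kept) + placeholder

text = "Hello there, how are you this fine day? I'm glad to hear it!"
print(repr(shorten_sketch(text, 10, placeholder=" ")))  # 'Hello '
print(repr(shorten_sketch(text, 9, placeholder=" ")))   # 'Hello '
print(repr(shorten_sketch(text, 5)))                    # '[...]'
```

With the example from this thread, widths 9 and 10 both keep 'Hello' followed by the placeholder's own space, so width 9 no longer collapses to an empty string.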
I attached a second version of the patch to accommodate Pitrou's request.
I attached a more refined patch: it removes an unnecessary test, adds a more robust test, and adds shorten to __all__.
I missed this case:
>>> from textwrap import shorten
>>> shorten('hell', 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 386, in shorten
return w.shorten(text, placeholder=placeholder)
File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 322, in shorten
raise ValueError("placeholder too large for max width")
ValueError: placeholder too large for max width
Also, in this patch, I removed the unnecessary stripping of the text part.
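Some callers may want a short text returned unchanged even when the placeholder is wider than the width. One option is to check the collapsed length before calling shorten(). Below is a minimal sketch; `shorten_if_needed` is a hypothetical helper. It still lets shorten() raise ValueError when truncation is actually required and the placeholder does not fit:

```python
import textwrap

def shorten_if_needed(text: str, width: int, placeholder: str = " [...]") -> str:
    """Hypothetical helper: return the whitespace-collapsed text when it already
    fits in `width`; otherwise defer to textwrap.shorten()."""
    collapsed = " ".join(text.split())  # same whitespace collapsing as shorten()
    if len(collapsed) <= width:
        return collapsed  # no truncation, so the placeholder's size is irrelevant
    # May still raise ValueError if the placeholder cannot fit in `width`.
    return textwrap.shorten(text, width, placeholder=placeholder)

print(shorten_if_needed("hell", 4))  # hell
```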
> I missed this case:
>
> >>> from textwrap import shorten
> >>> shorten('hell', 4)
> Traceback (most recent call last):
> ...
> ValueError: placeholder too large for max width

This is by design. Passing a placeholder larger than the width is a
Okay, I attached a fifth version that no longer tries to handle the case where the text is smaller than the placeholder.
Serhiy's commit http://hg.python.org/cpython/rev/2e8c424dc638 already fixed this issue, so I closed this ticket.