Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

Closed
vajrasky mannequin opened this issue Aug 13, 2013 · 11 comments
Closed

shorten function of textwrap module is susceptible to non-normalized whitespaces #62923

vajrasky mannequin opened this issue Aug 13, 2013 · 11 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@vajrasky
Copy link
Mannequin

vajrasky mannequin commented Aug 13, 2013

BPO 18723
Nosy @pitrou, @ezio-melotti, @vajrasky
Files
  • fix_for_non_normalized_whitespaces_in_placeholder.patch
  • fix_for_non_normalized_whitespaces_in_placeholder_v2.patch
  • fix_for_non_normalized_whitespaces_in_placeholder_v3.patch
  • fix_for_non_normalized_whitespaces_in_placeholder_v4.patch
  • fix_for_non_normalized_whitespaces_in_placeholder_v5.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-10-16.07:38:56.759>
    created_at = <Date 2013-08-13.06:26:52.028>
    labels = ['type-bug', 'library']
    title = 'shorten function of textwrap module is susceptible to\tnon-normalized whitespaces'
    updated_at = <Date 2013-10-16.07:38:56.757>
    user = 'https://github.com/vajrasky'

    bugs.python.org fields:

    activity = <Date 2013-10-16.07:38:56.757>
    actor = 'vajrasky'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-10-16.07:38:56.759>
    closer = 'vajrasky'
    components = ['Library (Lib)']
    creation = <Date 2013-08-13.06:26:52.028>
    creator = 'vajrasky'
    dependencies = []
    files = ['31271', '31274', '31276', '31277', '31278']
    hgrepos = []
    issue_num = 18723
    keywords = ['patch']
    message_count = 11.0
    messages = ['195045', '195046', '195047', '195059', '195060', '195067', '195073', '195079', '195080', '195081', '200038']
    nosy_count = 3.0
    nosy_names = ['pitrou', 'ezio.melotti', 'vajrasky']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue18723'
    versions = ['Python 3.4']

    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Aug 13, 2013

    In shorten function of textwrap module, the placeholder becomes a hole where we can inject non-normalized whitespaces to the text.

    >>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
    >>> from textwrap import shorten
    >>> shorten(text, 40, placeholder="              ")
    'Hello there, how are you'

    We normalize the only-whitespaces placeholder.

    But....

    >>> shorten(text, 17, placeholder="  \n\t(...)  \n\t[...]  \n\t")
    '(...)  \n\t[...]'
    >>> shorten(text, 40, placeholder="  \n\t(...)  \n\t[...]  \n\t")
    'Hello there, how  \n\t(...)  \n\t[...]'

    Attached the patch to normalize the non-normalized whitespaces in placeholder.

    @vajrasky vajrasky mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Aug 13, 2013
    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2013

    I'm not convinced this is a bug. The whitespace right-stripping is more of an implementation detail. You can really put what you want inside the placeholder.

    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Aug 13, 2013

    Okay, nevermind about non-normalized whitespaces in placeholder, but what about this case?

    >>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
    >>> from textwrap import shorten
    >>> shorten(text, 10, placeholder="     ")
    'Hello'
    >>> shorten(text, 9, placeholder="     ")
    ''
    >>> len('Hello')
    5

    Isn't that weird?

    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2013

    Isn't that weird?

    Agreed, this one is a bug. The stripping in shorten() should be smarter, i.e. it should not affect the placeholder's own spaces.
    Do you want to write a patch?

    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2013

    Correcting myself:

    i.e. it should not affect the placeholder's own spaces.

    ... except for leading whitespace in case the placeholder ends up alone in the result.
    i.e. shorten("somethingtoolong") should return "(...)", not " (...)".

    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Aug 13, 2013

    Do you want to write a patch?
    My pleasure.

    Attached the second version of the patch to accomodate Pitrou's request.

    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Aug 13, 2013

    Attached more refined patch. Removed unnecessary test. Added more robust test. Added shorten in __all__.

    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Aug 13, 2013

    I missed this case:

    >>> from textwrap import shorten
    >>> shorten('hell', 4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 386, in shorten
        return w.shorten(text, placeholder=placeholder)
      File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 322, in shorten
        raise ValueError("placeholder too large for max width")
    ValueError: placeholder too large for max width

    Also, in this patch, I removed the unnecessary stripping of the text part.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 13, 2013

    > I missed this case:
    > 
    > >>> from textwrap import shorten
    > >>> shorten('hell', 4)
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    >   File
    >   "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py",
    >   line 386, in shorten
    >     return w.shorten(text, placeholder=placeholder)
    >   File
    >   "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py",
    >   line 322, in shorten
    >     raise ValueError("placeholder too large for max width")
    > ValueError: placeholder too large for max width

    This is by design. Passing a placeholder larger than the width is a
    programming error, regardless of whether the text is small enough.

    @pitrou pitrou changed the title shorten function of textwrap module is susceptible to non-normalized whitespaces shorten function of textwrap module is susceptible to non-normalized whitespaces Aug 13, 2013
    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Aug 13, 2013

    Okay, attached the fifth version not to care about the case where text is smaller than the placeholder.

    @vajrasky
    Copy link
    Mannequin Author

    vajrasky mannequin commented Oct 16, 2013

    Serhiy's commit http://hg.python.org/cpython/rev/2e8c424dc638 fixed this issue already. So I closed this ticket.

    @vajrasky vajrasky mannequin closed this as completed Oct 16, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant