textwrap.dedent() expands tabs #42613

Closed
bethard mannequin opened this issue Nov 19, 2005 · 6 comments
Labels
stdlib Python modules in the Lib dir

Comments

@bethard
Mannequin

bethard mannequin commented Nov 19, 2005

BPO 1361643
Nosy @birkenfeld, @rhettinger
Files
  • textwrap.diff: Diff for textwrap.py and test_textwrap.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2006-06-11.00:41:01.000>
    created_at = <Date 2005-11-19.19:02:09.000>
    labels = ['library']
    title = 'textwrap.dedent() expands tabs'
    updated_at = <Date 2006-06-11.00:41:01.000>
    user = 'https://bugs.python.org/bethard'

    bugs.python.org fields:

    activity = <Date 2006-06-11.00:41:01.000>
    actor = 'gward'
    assignee = 'gward'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2005-11-19.19:02:09.000>
    creator = 'bethard'
    dependencies = []
    files = ['1837']
    hgrepos = []
    issue_num = 1361643
    keywords = []
    message_count = 6.0
    messages = ['26902', '26903', '26904', '26905', '26906', '26907']
    nosy_count = 4.0
    nosy_names = ['gward', 'georg.brandl', 'rhettinger', 'bethard']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1361643'
    versions = ['Python 2.5']

    I'm not sure whether this is a documentation bug or a
    code bug, but textwrap.dedent() expands tabs (and
    AFAICT doesn't give the user any way of stopping this):

    py> def test():
    ...     x = ('abcd\tefgh\n'
    ...          'ijkl\tmnop\n')
    ...     y = textwrap.dedent('''\
    ...         abcd\tefgh
    ...         ijkl\tmnop
    ...         ''')
    ...     return x, y
    ...
    py> test()
    ('abcd\tefgh\nijkl\tmnop\n', 'abcd    efgh\nijkl    mnop\n')

    Looking at the code, I can see the culprit is the first
    line:

    lines = text.expandtabs().split('\n')

    If this is the intended behavior, I think the first
    sentence in the documentation[1] should be replaced with:

    """
    Replace all tabs in string with spaces as per
    str.expandtabs() and then remove any whitespace that
    can be uniformly removed from the left of every line in
    text.
    """

    and (I guess this part is an RFE) textwrap.dedent()
    should gain an optional expandtabs= keyword argument to
    disable this behavior.

    If it's not the intended behavior, I'd love to see that
    .expandtabs() call removed.

    [1]http://docs.python.org/lib/module-textwrap.html
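
To make the report concrete, here is a minimal sketch of the behavior described above. Only the `text.expandtabs().split('\n')` line comes from the report; the rest of `dedent_with_expandtabs` is a hypothetical reconstruction, not the actual library code, and the sample string is made up for illustration.

```python
import textwrap


def dedent_with_expandtabs(text):
    """Hypothetical reconstruction: expand tabs first, then strip the common margin."""
    lines = text.expandtabs().split('\n')  # the line quoted in the report
    # Width of the leading whitespace on each non-blank line (all spaces by now).
    margins = [len(line) - len(line.lstrip()) for line in lines if line.strip()]
    margin = min(margins) if margins else 0
    # Blank lines are emptied; every other line loses the common margin.
    return '\n'.join(line[margin:] if line.strip() else '' for line in lines)


# Assumed sample: four-space margin, one tab inside the content of each line.
sample = '    abcd\tefgh\n    ijkl\tmnop\n'

old_style = dedent_with_expandtabs(sample)  # content tabs become spaces
current = textwrap.dedent(sample)           # content tabs survive on Python 2.5 and later
```

With the sketch, the tab inside each line is replaced by spaces even though it was never part of the margin, which is the behavior the report objects to.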

    @bethard bethard mannequin closed this as completed Nov 19, 2005
    @bethard bethard mannequin assigned gward Nov 19, 2005
    @bethard bethard mannequin added the stdlib Python modules in the Lib dir label Nov 19, 2005
    @rhettinger
    Contributor

    FWIW, the tab expansion would be more useful if the default
    tabsize could be changed.

    @rhettinger
    Contributor

    After more thought, I think the expandtabs() is a bug since
    it expands content tabs as well as margin tabs:

    >>> textwrap.dedent('\tABC\t\tDEF')
    'ABC             DEF'

    This is especially problematic given that dedent() has to
    guess at the tab size.

    If this gets fixed, I recommend using regular expressions as
    a way to identify common margin prefixes on non-empty
    lines. This will also handle mixes of spaces and tabs without
    altering content with embedded tabs and without making
    assumptions about the tab size. Also, it ought to run
    somewhat faster.
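
The tab-size point can be checked with `str.expandtabs()` alone. This is a small illustration using the same string as the example above; the tab size of 4 is just an assumed alternative, not something proposed in this thread.

```python
text = '\tABC\t\tDEF'

# Default tab size of 8: the margin tab and both content tabs are expanded.
default = text.expandtabs()

# An assumed alternative tab size of 4 gives a different layout for the same input.
narrow = text.expandtabs(4)
```

The two results differ both in the margin and in the gap between `ABC` and `DEF`, so any fixed tab size is a guess about how the author's editor displayed the text.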

    @rhettinger
    Contributor

    Suggested code:

    import re as _re
    _emptylines_with_spaces = _re.compile('(?m)^[ \t]+$')
    _prefixes_on_nonempty_lines = _re.compile('(?m)(^[ \t]*)(?:[^ \t\n]+)')
    
    def dedent(text):
      text = _emptylines_with_spaces.sub('', text)
      prefixes = _prefixes_on_nonempty_lines.findall(text)
      margin = min(prefixes or [''])
      if margin:
        text = _re.sub('(?m)^' + margin, '', text)
      return text

    @birkenfeld
    Member

    Looks good!

    @gward
    Mannequin

    gward mannequin commented Jun 11, 2006

    I agree that the docs are (pretty) clear and the code is
    wrong. When determining common leading whitespace, tabs and
    spaces should *not* be treated as equivalent.

    Raymond's fix was close, but not quite there: considering
    only the length of leading whitespace still causes space/tab
    confusion. (This only became clear to me after I wrote
    several test cases.)

    My fix is based on Raymond's, i.e. it uses regexes for most
    of the heavy lifting rather than splitting the input string
    on newline and looping over the lines. The bit that's
    different is determining what exactly is the common leading
    whitespace string. Anyways, this ended up being a complete
    rewrite of dedent(). I also added a paragraph to the docs
    to clarify the distinction between tabs and spaces.

    Checked in under rev 46844 (trunk only).
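
For readers who want to see what "common leading whitespace string" can mean in practice, here is one possible sketch. It is not the code from rev 46844: the helper name `dedent_common_prefix`, the two regexes, and the use of `os.path.commonprefix` are assumptions chosen for brevity. The idea is to compare the indents character by character, so a tab never matches a space.

```python
import os.path
import re
import textwrap

_blank = re.compile(r'(?m)^[ \t]+$')                 # whitespace-only lines
_indent = re.compile(r'(?m)(^[ \t]*)(?:[^ \t\n])')   # indent of each non-blank line


def dedent_common_prefix(text):
    """Hypothetical sketch: strip the longest whitespace string shared by all non-blank lines."""
    text = _blank.sub('', text)             # whitespace-only lines do not affect the margin
    indents = _indent.findall(text)
    margin = os.path.commonprefix(indents)  # character-wise comparison, so '\t' != ' '
    if margin:
        text = re.sub(r'(?m)^' + re.escape(margin), '', text)
    return text


# One line indented with a tab, one with eight spaces: no common margin.
mixed = '\tfoo\n        bar\n'
mixed_result = dedent_common_prefix(mixed)

# Shared tab margin; the tabs inside the content are left alone.
tabbed = '\tABC\t\tDEF\n\tGHI\n'
tabbed_result = dedent_common_prefix(tabbed)

# Indents share only their first two spaces.
partial = '  \tfoo\n  bar\n'
partial_result = dedent_common_prefix(partial)
```

On these samples the sketch agrees with `textwrap.dedent()` in current Python releases: a tab and eight spaces are not treated as the same margin, and tabs inside the content are preserved.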

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022