textwrap.dedent() expands tabs #42613

bethard · 2005-11-19T19:02:09Z

BPO	1361643
Nosy	@birkenfeld, @rhettinger
Files	textwrap.diff: Diff for textwrap.py and test_textwrap.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2006-06-11.00:41:01.000>
created_at = <Date 2005-11-19.19:02:09.000>
labels = ['library']
title = 'textwrap.dedent() expands tabs'
updated_at = <Date 2006-06-11.00:41:01.000>
user = 'https://bugs.python.org/bethard'

bugs.python.org fields:

activity = <Date 2006-06-11.00:41:01.000>
actor = 'gward'
assignee = 'gward'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2005-11-19.19:02:09.000>
creator = 'bethard'
dependencies = []
files = ['1837']
hgrepos = []
issue_num = 1361643
keywords = []
message_count = 6.0
messages = ['26902', '26903', '26904', '26905', '26906', '26907']
nosy_count = 4.0
nosy_names = ['gward', 'georg.brandl', 'rhettinger', 'bethard']
pr_nums = []
priority = 'high'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1361643'
versions = ['Python 2.5']

bethard · 2005-11-19T19:02:09Z

I'm not sure whether this is a documentation bug or a
code bug, but textwrap.dedent() expands tabs (and
AFAICT doesn't give the user any way of stopping this):

py> def test():
...     x = ('abcd efgh\n'
...          'ijkl mnop\n')
...     y = textwrap.dedent('''\
...         abcd efgh
...         ijkl mnop
...     ''')
...     return x, y
...
py> test()
('abcd\tefgh\nijkl\tmnop\n', 'abcd efgh\nijkl mnop\n')

Looking at the code, I can see the culprit is the first
line:

lines = text.expandtabs().split('\n')
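
The effect of that line can be reproduced without relying on literal tab characters surviving copy and paste. This is only a minimal sketch: `sample` is a made-up input written with explicit `\t` escapes, and the second statement mirrors the quoted line rather than the full body of dedent().

```python
# Hypothetical input: one margin tab and one content tab on each line.
sample = "\tabcd\tefgh\n\tijkl\tmnop\n"

# Same operation as the quoted line: every tab is expanded (tab size 8 by
# default) before the text is split into lines, so content tabs are lost too.
lines = sample.expandtabs().split("\n")

for line in lines:
    print(repr(line))
```

After this step no tab is left anywhere in `lines`, so nothing later in the function could restore the tab between the two words.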

If this is the intended behavior, I think the first
sentence in the documentation[1] should be replaced with:

"""
Replace all tabs in string with spaces as per
str.expandtabs() and then remove any whitespace that
can be uniformly removed from the left of every line in
text.
"""

and (I guess this part is an RFE) textwrap.dedent()
should gain an optional expandtabs= keyword argument to
disable this behavior.

If it's not the intended behavior, I'd love to see that
.expandtabs() call removed.

[1]http://docs.python.org/lib/module-textwrap.html

rhettinger · 2005-11-19T20:18:46Z

Logged In: YES
user_id=80475

FWIW, the tab expansion would be more useful if the default
tabsize could be changed.

rhettinger · 2005-11-20T04:52:00Z

Logged In: YES
user_id=80475

After more thought, I think the expandtabs() is a bug since
it expands content tabs as well as margin tabs:

>>> textwrap.dedent('\tABC\t\tDEF')
'ABC             DEF'

This is especially problematic given that dedent() has to
guess at the tab size.
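
A small sketch of why the guessed tab size matters, assuming the same input as above. The variable names are illustrative; the last line calls the textwrap.dedent() of a Python version that includes the fix for this issue, which should leave content tabs alone.

```python
import textwrap

text = "\tABC\t\tDEF"

# Emulate "expand, then strip the margin" for two different tab-size guesses.
expanded8 = text.expandtabs(8).lstrip(" ")
expanded4 = text.expandtabs(4).lstrip(" ")

print(repr(expanded8))   # content gap depends on the tab size chosen
print(repr(expanded4))

# With the fix in place, only the margin tab is removed.
print(repr(textwrap.dedent(text)))
```

The two expanded results differ only in the width of the gap between ABC and DEF, which is exactly the part of the string that dedent() has no business touching.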

If this gets fixed, I recommend using regular expressions as
a way to identify common margin prefixes on non-empty
lines. This will also handle mixes of spaces and tabs without
altering content with embedded tabs and without making
assumptions about the tab size. Also, it ought to run
somewhat faster.

rhettinger · 2005-11-20T06:04:48Z

Logged In: YES
user_id=80475

Suggested code:

import re as _re
_emptylines_with_spaces = _re.compile('(?m)^[ \t]+$')
_prefixes_on_nonempty_lines = _re.compile('(?m)(^[ \t]*)(?:[^ \t\n]+)')

def dedent(text):
  text = _emptylines_with_spaces.sub('', text)
  prefixes = _prefixes_on_nonempty_lines.findall(text)
  margin = min(prefixes or [''])
  if margin:
    text = _re.sub('(?m)^' + margin, '', text)
  return text

birkenfeld · 2005-12-15T08:45:18Z

Logged In: YES
user_id=1188172

Looks good!

gward · 2006-06-11T00:41:01Z

Logged In: YES
user_id=14422

I agree that the docs are (pretty) clear and the code is
wrong. When determining common leading whitespace, tabs and
spaces should *not* be treated as equivalent.

Raymond's fix was close, but not quite there: considering
only the length of leading whitespace still causes space/tab
confusion. (This only became clear to me after I wrote
several test cases.)
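
One way to see the remaining space/tab confusion is to run the suggested code on input where one line is indented with a tab and another with spaces. This is an illustration only, not necessarily one of the test cases mentioned above; `suggested_dedent` is a name chosen here for the function posted earlier in the thread, and `mixed` is a made-up input.

```python
import re
import textwrap

# Regexes and function body as posted in the thread.
_emptylines_with_spaces = re.compile('(?m)^[ \t]+$')
_prefixes_on_nonempty_lines = re.compile('(?m)(^[ \t]*)(?:[^ \t\n]+)')

def suggested_dedent(text):
    text = _emptylines_with_spaces.sub('', text)
    prefixes = _prefixes_on_nonempty_lines.findall(text)
    # min() compares strings lexicographically; '\t' sorts before ' ',
    # so a lone tab "wins" over eight spaces here.
    margin = min(prefixes or [''])
    if margin:
        text = re.sub('(?m)^' + margin, '', text)
    return text

# One line indented with a tab, one with eight spaces: no common margin.
mixed = "\tfoo\n" + " " * 8 + "bar\n"

result = suggested_dedent(mixed)
current = textwrap.dedent(mixed)

print(repr(result))    # only the tab-indented line lost its indentation
print(repr(current))   # a fixed dedent() leaves the text unchanged
```

Because the chosen margin is not actually shared by every line, the two lines end up dedented by different amounts.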

My fix is based on Raymond's, i.e. it uses regexes for most
of the heavy lifting rather than splitting the input string
on newline and looping over the lines. The bit that's
different is determining what exactly is the common leading
whitespace string. Anyways, this ended up being a complete
rewrite of dedent(). I also added a paragraph to the docs
to clarify the distinction between tabs and spaces.
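
For readers who want the gist of "the common leading whitespace string", here is a rough sketch. It is not the code checked in for this issue: `dedent_sketch` and the regex names are hypothetical, and os.path.commonprefix() is used simply as a convenient character-wise longest-common-prefix helper.

```python
import os.path
import re
import textwrap

_whitespace_only = re.compile(r"^[ \t]+$", re.MULTILINE)
_leading_whitespace = re.compile(r"(^[ \t]*)(?:[^ \t\n])", re.MULTILINE)

def dedent_sketch(text):
    # Normalize whitespace-only lines so they do not affect the margin.
    text = _whitespace_only.sub("", text)
    indents = _leading_whitespace.findall(text)
    # The margin is the longest string of spaces/tabs that every non-blank
    # line starts with; a tab and a space never count as the same character.
    margin = os.path.commonprefix(indents)
    if margin:
        text = re.sub(r"(?m)^" + margin, "", text)
    return text

samples = [
    "\tfoo\n\t\tbar\n",               # common margin: one tab
    "\tfoo\n" + " " * 8 + "bar\n",    # tab vs. spaces: nothing in common
    "  \thello\n  world\n",           # common margin: two spaces
]
for s in samples:
    print(repr(dedent_sketch(s)), repr(textwrap.dedent(s)))
```

On these three inputs the sketch agrees with textwrap.dedent() from a Python version containing the fix; that is a spot check, not a claim of equivalence.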

Checked in under rev 46844 (trunk only).

bethard mannequin closed this as completed Nov 19, 2005

bethard mannequin assigned gward Nov 19, 2005

bethard mannequin added the stdlib Python modules in the Lib dir label Nov 19, 2005

ezio-melotti transferred this issue from another repository Apr 10, 2022
