textwrap: Non-breaking space not honored #64690
The textwrap module does not distinguish a non-breaking space (\xa0) from other whitespace when determining word boundaries. At the beginning of the module, the _whitespace variable is defined to address this issue, but it is not used in the regular expressions that determine the splitting rules.
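To make the report easy to reproduce, here is a minimal sketch that checks whether the running interpreter keeps a non-breaking space intact. The sample sentence and the width of 19 are my own choices for illustration, not taken from the patch: the width is picked so that "10" alone would still fit on the first line while "10\xa0km" as a whole would not.

```python
import textwrap

# "10\xa0km" contains U+00A0 NO-BREAK SPACE and should never be split.
text = "The distance is 10\xa0km exactly"

# "The distance is 10" is 18 characters, "The distance is 10\xa0km" is 21,
# so a wrapper that treats U+00A0 as a break point would end line 1 after "10".
lines = textwrap.wrap(text, width=19)

for line in lines:
    print(repr(line))

# True when the wrapper kept both halves of "10\xa0km" on the same line.
nbsp_honored = any("10\xa0km" in line for line in lines)
print("non-breaking space honored:", nbsp_honored)
```

On an interpreter that still has the behavior described above, I would expect the last line to report False.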
Thanks for the patch, Kaarle. Could you add some tests in Lib/test/test_textwrap? Also, for your contribution to be integrated, we'll need you to sign a contributor's agreement: http://www.python.org/psf/contrib/contrib-form/
It seems to me that the code could be a little clearer if it used C-style formatting.
Using a multiline regex (with re.VERBOSE) would also avoid the clutter of parens and quotes.
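For illustration, the two suggestions above (C-style formatting plus re.VERBOSE) could be combined roughly as follows. This is a simplified sketch, not the pattern from the patch: the name wordsep_simple_re is hypothetical, and the regex only separates runs of breaking whitespace from runs of everything else, ignoring hyphens and dashes.

```python
import re

# ASCII whitespace only. U+00A0 is deliberately absent, so it never splits.
_whitespace = "\t\n\x0b\x0c\r "

# Character classes built once with C-style formatting.
whitespace = r"[%s]" % re.escape(_whitespace)
nowhitespace = r"[^%s]" % re.escape(_whitespace)

# Hypothetical, simplified splitter written as a verbose regex.
wordsep_simple_re = re.compile(r"""
    (            # capture each chunk so re.split returns it
      %(ws)s+    # a run of breaking whitespace
    | %(nws)s+   # or a run of anything else, i.e. a word
    )""" % {"ws": whitespace, "nws": nowhitespace}, re.VERBOSE)

def split_chunks(text):
    """Split text into word and whitespace chunks, dropping empty strings."""
    return [chunk for chunk in wordsep_simple_re.split(text) if chunk]

chunks = split_chunks("10\xa0km away")
print(chunks)
```

Because the character classes are interpolated into one commented pattern, changing _whitespace in a single place changes what counts as a break point.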
What about other spaces: '\N{OGHAM SPACE MARK}', '\N{EN QUAD}', '\N{EM QUAD}', '\N{EN SPACE}', '\N{EM SPACE}', '\N{THREE-PER-EM SPACE}', '\N{FOUR-PER-EM SPACE}', '\N{SIX-PER-EM SPACE}', '\N{FIGURE SPACE}', '\N{PUNCTUATION SPACE}', '\N{THIN SPACE}', '\N{HAIR SPACE}', '\N{LINE SEPARATOR}', '\N{PARAGRAPH SEPARATOR}', '\N{NARROW NO-BREAK SPACE}', '\N{MEDIUM MATHEMATICAL SPACE}', '\N{IDEOGRAPHIC SPACE}'? In Python 2, textwrap supported only 8-bit spaces, but Python 3 should support full Unicode. From this point of view, the proposed patch is a regression.
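To see how a few of these characters are classified, a quick check with unicodedata may help. The helper is_no_break_space is hypothetical and uses a heuristic of my own (the `<noBreak>` tag in the character's decomposition); it is not what the patch does, and it is not the Unicode line breaking algorithm.

```python
import re
import unicodedata

def is_no_break_space(ch):
    """Hypothetical helper: whitespace whose decomposition is tagged <noBreak>."""
    return ch.isspace() and unicodedata.decomposition(ch).startswith("<noBreak>")

names = [
    "NO-BREAK SPACE",
    "NARROW NO-BREAK SPACE",
    "EN SPACE",
    "THIN SPACE",
    "IDEOGRAPHIC SPACE",
]

report = {}
for name in names:
    ch = unicodedata.lookup(name)
    report[name] = {
        # Does a str pattern's \s treat the character as whitespace?
        "matches_s": re.fullmatch(r"\s", ch) is not None,
        # Is it marked as non-breaking in the Unicode database?
        "no_break": is_no_break_space(ch),
    }
    print("U+%04X %-22s %s" % (ord(ch), name, report[name]))
```

All five match `\s` in a str pattern, which is why a `\s`-based splitter cannot tell the non-breaking ones apart from the rest.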
NO-BREAK SPACE and NARROW NO-BREAK SPACE are characters whose intent is clear and that are used by knowledgeable users and smart software, for example LibreOffice with an fr_FR locale. I don’t know about the other characters listed by Serhiy, and I wouldn’t worry about them unless users requested support for them or another core dev explained why they should be supported. A comment at the start of the module (where _whitespace, used in the patch here, is defined) even talks about NBSP; it is focused on bytes, though, and should be updated for the Python 3 Unicode world.
changed honor-non-breaking-spaces.patch: added test for \N{NARROW NO-BREAK SPACE}
Thank you, this looks really good. I left some comments on rietveld.
Patch on top of dbudinova's that attempts to replace the concatenation of strings with a verbose regex.
Hey there, wanted to follow up on the state of this... is there a reason why this has not made it into vanilla yet? If so, I'd like to try to help clear impediments if I can. This issue is *really*, really, really annoying me. I posted about a year ago on python-list (http://code.activestate.com/lists/python-list/685604/) and was referred to this bug and thought I'd wait it out. But now the last change was 2 years ago and there is no relief in sight. So if nothing else, please take it as a gentle reassurance that this bug is really affecting real-world scenarios and is annoying as hell. Especially since the semantics of a non-breaking space are pretty much exactly to *not* break on text wrapping. If there's anything I can contribute to get things going again, by all means please let me know. All hands on deck! Cheers,
It probably just got forgotten. If you want to help move it forward, please do a review of the patch (see https://docs.python.org/devguide/tracker.html#reviewing-patches), including whether or not all outstanding review comments have been addressed, and post your recommendations here.
The code of the textwrap module has changed since the last patch was published. The proposed patch resolves conflicts and addresses Eric's comments. Maybe add breaking Unicode spaces (OGHAM SPACE MARK, EN QUAD, etc.) to _whitespace? I think in the future we should implement the Unicode line breaking algorithm [1].
New changeset fcabef0ce773 by Serhiy Storchaka in branch '3.5':
New changeset bfa400108fc5 by Serhiy Storchaka in branch '3.6':
New changeset b86dacb9e668 by Serhiy Storchaka in branch 'default':