TextWrapper break_long_words=True, break_on_hyphens=True on long words #72846
Quoting https://docs.python.org/2/library/textwrap.html:

> width (default: 70) The maximum length of wrapped lines. As long as there are no individual words in the input text longer than width, TextWrapper guarantees that no output line will be longer than width characters.

It appears that with break_long_words=True and break_on_hyphens=True, a hyphenated term longer than the specified width is not preferentially broken at a hyphen.

Example input:

We used the enyzme 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase.

Given width=50, the 53-character "word" "2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate" must be split somewhere, and since break_on_hyphens=True, the split should happen at one of its hyphens.

Sample code:

```python
import textwrap

w = 50
text = "We used the enyzme 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase."

# Show the input between two rulers of the target width.
print("Input:")
print("=" * w)
print(text)
print("=" * w)

# Wrap with both long-word breaking and hyphen breaking enabled.
print("Using break_long_words=True, break_on_hyphens=True")
print("=" * w)
print(textwrap.fill(text, width=w, break_long_words=True, break_on_hyphens=True))
print("=" * w)
```
This is because the current algorithm for breaking on hyphens only allows a break between letters. That rule is what prevents dates and times from being broken at their hyphens. Perhaps it should be made more lenient when a word is too long to fit on a line.
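One way to see the split points described above is to turn off long-word breaking, so that textwrap only breaks between the chunks its word-splitting regex produces. The sketch below uses only the public `textwrap.wrap` API; the helper name `wrap_without_forced_breaks` is hypothetical, and the commented outputs assume the Python 3 word-splitting regex.

```python
import textwrap

def wrap_without_forced_breaks(text, width):
    """Wrap text without ever cutting inside a chunk, so the only line
    breaks are the ones textwrap's word-splitting regex permits."""
    return textwrap.wrap(text, width=width, break_long_words=False,
                         break_on_hyphens=True)

# A hyphen between letters is a permitted break point.
print(wrap_without_forced_breaks("well-known", 6))   # ['well-', 'known']

# Hyphens between digits (as in a date) are not, so the line overflows.
print(wrap_without_forced_breaks("2016-11-09", 6))   # ['2016-11-09']

# Every hyphen in the chemical name has a digit on at least one side,
# so the regex yields a single 53-character chunk with no break point.
name = "2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate"
print(wrap_without_forced_breaks(name, 50) == [name])  # True
```

This is why hyphen breaking never helps with the example: by the time the long-word path runs, there is no hyphen-based split left to use.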
textwrap does not actually apply the break-on-hyphen algorithm to long words at all; it just chops them into width-sized pieces. The PR I just submitted looks for hyphens and uses them as cut points if they exist, without any attempt to understand their context.
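A minimal sketch of the cutting rule described above, assuming the intended behavior is "cut just after the last hyphen that fits, otherwise cut at the width". `split_long_word` is a hypothetical standalone helper for illustration, not the code from the PR.

```python
from typing import List

def split_long_word(word: str, width: int) -> List[str]:
    """Cut a word into pieces of at most `width` characters.

    Each cut goes just after the last hyphen that fits in the piece; if
    there is none (or the only one is the first character), fall back to
    a plain cut at `width` characters.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    pieces = []
    while len(word) > width:
        hyphen = word.rfind("-", 0, width)          # last hyphen among the first `width` chars
        end = hyphen + 1 if hyphen > 0 else width   # keep the hyphen on this piece
        pieces.append(word[:end])
        word = word[end:]
    pieces.append(word)
    return pieces

name = "2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate"
print(split_long_word(name, 50))
# ['2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-', 'carboxylate']
print(split_long_word(name, 20))
# ['2-succinyl-6-', 'hydroxy-2,4-', 'cyclohexadiene-1-', 'carboxylate']
```

Because the hyphen's context is ignored, a word like a date would also be cut at its hyphens, which is acceptable here since the word has to be cut somewhere anyway.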
Actually, I see what Serhiy meant about the hyphen algorithm: the regex that breaks up words. Yes, it is applied to long words, and the reason he gave for this issue is correct. It is probably possible to make that regex aware of the width and of long words, but it would be more complicated and would need to be recalculated for each width. Since long words are not typical input, I think it is better to handle them separately and keep the rest simple.
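One way to follow that approach, sketched under assumptions: leave the word-splitting regex alone and override only the step that handles over-long words. This relies on `_handle_long_word`, a private, undocumented method of CPython's `textwrap.TextWrapper`, so it may break between versions. The subclass name `HyphenPreferringWrapper` is hypothetical, and this is not the code from the PR.

```python
import textwrap

class HyphenPreferringWrapper(textwrap.TextWrapper):
    """TextWrapper that, when it must cut an over-long word, prefers to cut
    just after a hyphen. The word-splitting regex is left untouched.

    NOTE: overrides the private CPython hook _handle_long_word, whose
    signature is an implementation detail and may change.
    """

    def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
        # Same space computation as the stdlib: take at least one character.
        space_left = 1 if width < 1 else width - cur_len
        if self.break_long_words:
            chunk = reversed_chunks[-1]
            end = space_left
            if self.break_on_hyphens:
                hyphen = chunk.rfind("-", 0, space_left)
                if hyphen > 0:
                    end = hyphen + 1  # keep the hyphen at the end of this line
            cur_line.append(chunk[:end])
            reversed_chunks[-1] = chunk[end:]
        elif not cur_line:
            # break_long_words=False: put the whole word on its own line.
            cur_line.append(reversed_chunks.pop())

text = "We used the enyzme 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase."
for line in HyphenPreferringWrapper(width=50).wrap(text):
    print(line)
# We used the enyzme 2-succinyl-6-hydroxy-2,4-
# cyclohexadiene-1-carboxylate synthase.
```

Text without over-long words wraps exactly as with the stock TextWrapper, since the override is only called for chunks longer than the width. Here the chemical name is cut after "2,4-" because that is the last hyphen that fits in the 31 columns left on the first line.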