str.title() misbehaves with apostrophes #51257
str.title() capitalizes the first letter after an apostrophe:
>>> "This isn't right".title()
"This Isn'T Right"
The library function string.capwords, which appears to have exactly the
>>> string.capwords("This isn't right")
"This Isn't Right"
Tested on 2.6.2 on Mac OS X |
This was already asked some years ago. http://mail.python.org/pipermail/python-list/2006-April/549340.html |
The string module, however, fails to properly capitalize anything in quotes:
>>> string.capwords("i pity the 'foo'.")
"I Pity The 'foo'."
The string module could be easily made to work like the object. The |
I agree with the OP that str.title should be made smarter. As it
Extending on Thomas's comment, I think string.capwords() needs to be
As it stands, we have two methods that both don't quite do what we would |
I believe capwords was supposed to be removed in 3.0, but this did not |
If you can find a link to the discussion for removing capwords, we can |
I haven't been able to find any discussion of deprecating capwords other
http://mail.python.org/pipermail/python-3000/2007-April/006642.html
Later in the thread Barry says he is neutral on removing capwords, and
I think Ezio found some other information somewhere. |
If "correct handling of apostrophes and quotation marks, keeping the
I can make a test and patch for this if this is what we decide. |
I'm still researching what other languages do. MS-Excel matches what
It would also be nice to handle hyphenates like "xray" --> "X-ray".
Am thinking that it would be nice if the user could pass in an optional
A broader solution would be to replace string.capwords() with a more
http://aitech.ac.jp/~ckelly/midi/help/caps.html
http://search.cpan.org/dist/Text-Capitalize/Capitalize.pm
"Headline Style" in the Chicago Manual of Style or
http://grammar.about.com/b/2008/04/11/rules-for-capitalizing-the-words-in-a-title.htm
Any such attempt at a broad solution needs to provide ways for users to |
Thomas, if you write up an initial patch, aim for the most conservative
i'm I'm
Given letters-apostrophe-letter, capitalize only the first letter and |
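The conservative rule described above (treat letters-apostrophe-letters as a single word and capitalize only its first letter) can be sketched with a regular expression. This is an illustrative sketch, not the patch discussed in this thread; the function name `title_keep_contractions` is hypothetical, and the pattern deliberately covers only ASCII letters and the ASCII apostrophe.

```python
import re


def title_keep_contractions(text):
    """Capitalize each word, treating letters-apostrophe-letters as one word.

    Hypothetical helper for illustration: ASCII letters and the ASCII
    apostrophe only.
    """
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",                  # word, optionally followed by 'letters
        lambda match: match.group(0).capitalize(),  # upper-case first letter, lower-case the rest
        text,
    )


print(title_keep_contractions("this isn't right"))
print(title_keep_contractions("i pity the 'foo'."))
```

Because the apostrophe is only absorbed when letters follow it, a quoted word such as 'foo' is still capitalized, which is the case where string.capwords() falls short.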
We shouldn't change the current default behaviour, people are probably
Besides, doing the right thing is both (natural) language-dependent and
However, adding an optional argument to str.title() so as to change the |
Guido, do you have an opinion on whether to have str.title() handle
IMO, the problem comes up often enough that people are looking for
I'm not worried about Antoine's comment that we can't change anything
Options:
My order of preferences is 2,4,3,1. |
While I was fixing bpo-7000 I found that the tests for capwords had been
In bpo-6412 other problems of .title() are discussed, and there are also a |
Well I think even in English it doesn't work right. My point is that capitalization is both language-sensitive and
I really think the only reasonable options are 3 and 1. |
By the way, we might want to mention in the documentation that the |
Raymond, please refrain from emotional terms like "bug factory".
I have nothing to say about whether string.capwords() should be removed,
The title() method exists primarily because the Unicode standard has a
Also note that .title() matches .istitle() in the sense that
I worry that providing an API that adds a way to specify a set of
What's a realistic use case for .title() anyway?
(Proposal: close as won't fix.) |
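The remark that .title() matches .istitle() can be checked directly. The short sketch below uses variable names of my own choosing; it shows that the output of str.title() satisfies str.istitle(), while the spelling most readers would expect does not, because the lowercase "t" follows an uncased character (the apostrophe).

```python
naive = "this isn't right".title()
expected_by_readers = "This Isn't Right"

print(naive)                          # This Isn'T Right
print(naive.istitle())                # True
print(expected_by_readers.istitle())  # False
```

So changing title() alone would leave the two methods disagreeing, which is one reason the pair has to be considered together.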
A doc fix sounds like a great idea. |
I will add a comment to the docs. |
I don't recall anything specifically wrt removing capwords. Most likely |
Guido van Rossum wrote:
The primary use is when converting a string to be used as
The implementation follows the rules laid out in UTR#21:
http://unicode.org/reports/tr21/tr21-3.html
The Python version only implements the basic set of rules, i.e.
It doesn't implement the special casing rules, since these would
It also doesn't implement mappings that would result in a change of
Patches to enhance the code to support those additional rules
Regarding the apostrophe: the Unicode standard doesn't appear to
It's likely that the special use of the apostrophe in English
Regarding the idea to add an option to define which characters to |
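To illustrate why Unicode defines a title case separate from upper case, here is a small sketch using the digraph U+01C6 (LATIN SMALL LETTER DZ WITH CARON), whose uppercase and titlecase mappings differ. The example was written against Python 3 and is not taken from the thread; the variable names are mine.

```python
import unicodedata

dz_small = "\u01c6"           # lowercase digraph
dz_upper = dz_small.upper()   # both letters capitalized
dz_title = dz_small.title()   # only the first letter capitalized

print(unicodedata.name(dz_upper))
print(unicodedata.name(dz_title))
print(unicodedata.category(dz_title))  # "Lt" is the titlecase letter category
```

The apostrophe question in this issue is separate from these per-character mappings: it concerns where a word starts, not how a character is mapped.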
Marc-Andre Lemburg wrote:
Looking at the many different uses in various languages, this
http://en.wikipedia.org/wiki/Apostrophe
To make things even more complicated, the usual typewriter apostrophe |
bpo-6412 has a patch. |
Yup, and the right one typographically isn't necessarily the ASCII |
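As a quick check of the point about apostrophe variants, the sketch below compares the ASCII apostrophe (U+0027) with the typographic one (U+2019, RIGHT SINGLE QUOTATION MARK). Both are uncased punctuation, so str.title() starts a new word after either of them. The variable names are mine and the example assumes Python 3.

```python
import unicodedata

ascii_form = "this isn't right"
typographic_form = "this isn\u2019t right"

ascii_titled = ascii_form.title()
typographic_titled = typographic_form.title()

print(ascii_titled)
print(typographic_titled)
# General categories of the two apostrophe characters
print(unicodedata.category("'"), unicodedata.category("\u2019"))
```

Any rule that special-cases the apostrophe therefore has to decide which of these characters it covers.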
Ezio Melotti wrote:
That patch looks promising. |
I admit I don't fully understand the semantics of capwords(). But from
This algorithm should be implemented anyway, to properly solve |
Sure, but it should be another function, which might have its place in
capwords() itself could be deprecated, since it's an obvious one-liner. |
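For reference, the "obvious one-liner" could look like the sketch below. It assumes the default behaviour of string.capwords() with no sep argument (split on whitespace, capitalize each word, join with single spaces); the name `capwords_one_liner` is hypothetical.

```python
import string


def capwords_one_liner(text):
    # Split on runs of whitespace, capitalize each piece, rejoin with single spaces.
    return " ".join(word.capitalize() for word in text.split())


sample = "i pity the 'foo'."
print(capwords_one_liner(sample))
print(capwords_one_liner(sample) == string.capwords(sample))
```

The sketch also shows why quoted words stay lowercase: str.capitalize() only acts on the first character of each piece, and for 'foo' that character is the quote.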
Christoph Burgmer wrote:
string.capwords() is an old function from the days before Unicode.
Simple word breaking would be nice to have in Python as new
Note however, that word boundaries are just as complicated as casing: |
Antoine Pitrou wrote:
Yes, sorry, I meant the semantics, whereas you are right for the
Marc-Andre Lemburg wrote:
ICU already has the full implementation, so Python could get away with
>>> from PyICU import UnicodeString, Locale, BreakIterator
>>> en_US_locale = Locale('en_US')
>>> breakIter = BreakIterator.createWordInstance(en_US_locale)
>>> s = UnicodeString("There's a hole in the bucket.")
>>> print s.toTitle(breakIter, en_US_locale)
There's A Hole In The Bucket.
>>> breakIter.setText("There's a hole in the bucket.")
>>> last = 0
>>> for i in breakIter:
...     print s[last:i]
...     last = i
...
There's A Hole In The Bucket |