str.capitalize should titlecase the first character not uppercase #80730

Closed
stevendaprano opened this issue Apr 7, 2019 · 10 comments
Labels
3.8 (EOL) end of life easy interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-feature A feature request or enhancement

Comments

@stevendaprano
Member

BPO 36549
Nosy @ezio-melotti, @stevendaprano, @serhiy-storchaka, @zooba, @ZackerySpytz, @kingdom5500
PRs
  • bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it #12804
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-04-12.15:36:11.742>
    created_at = <Date 2019-04-07.10:40:51.722>
    labels = ['interpreter-core', 'easy', 'type-feature', '3.8', 'expert-unicode']
    title = 'str.capitalize should titlecase the first character not uppercase'
    updated_at = <Date 2019-04-12.18:43:26.649>
    user = 'https://github.com/stevendaprano'

    bugs.python.org fields:

    activity = <Date 2019-04-12.18:43:26.649>
    actor = 'steve.dower'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-04-12.15:36:11.742>
    closer = 'steve.dower'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2019-04-07.10:40:51.722>
    creator = 'steven.daprano'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36549
    keywords = ['patch', 'easy (C)']
    message_count = 10.0
    messages = ['339568', '339570', '339804', '339878', '339890', '340066', '340067', '340076', '340095', '340096']
    nosy_count = 6.0
    nosy_names = ['ezio.melotti', 'steven.daprano', 'serhiy.storchaka', 'steve.dower', 'ZackerySpytz', 'kingsley']
    pr_nums = ['12804']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36549'
    versions = ['Python 3.8']

    @stevendaprano
    Member Author

    str.capitalize appears to uppercase the first character of the string, which is okay for ASCII but not for non-English letters.

    For example, the letter NJ in Croatian appears as Nj at the start of words when the first character is capitalized:

    Njemačka ('Germany'), not NJemačka.

    (In ASCII, that's Njemacka not NJemacka.)

    https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Digraphs

    But using any of:

    U+01CA LATIN CAPITAL LETTER NJ
    U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
    U+01CC LATIN SMALL LETTER NJ

    we get the wrong result with capitalize:

    py> 'Ǌemačka'.capitalize()
    'Ǌemačka'
    py> 'ǋemačka'.capitalize()
    'Ǌemačka'
    py> 'ǌemačka'.capitalize()
    'Ǌemačka'

    I believe that the correct behaviour is to titlecase the first code point and lowercase the rest, which is what the Apache library here does:

    https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#capitalize-java.lang.String-
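A minimal pure-Python sketch of the behaviour proposed above may make it concrete. It is not the C implementation, and `titlecase_capitalize` is a hypothetical helper name used only for illustration. It assumes that calling `str.title()` on a one-character string is an acceptable way to titlecase a single code point.

```python
def titlecase_capitalize(s: str) -> str:
    """Titlecase the first code point and lowercase the rest.

    Hypothetical helper, for illustration only. s[:1] is at most one
    character, so .title() applies that character's titlecase mapping.
    """
    return s[:1].title() + s[1:].lower()


if __name__ == "__main__":
    # U+01CA, U+01CB and U+01CC are the three forms of the NJ digraph.
    for first in ("\u01ca", "\u01cb", "\u01cc"):
        word = first + "emačka"
        # ascii() shows code points that look alike in most fonts.
        print(ascii(word), "->", ascii(titlecase_capitalize(word)))
```

Each of the three inputs should map to a word beginning with U+01CB, the titlecase form, whereas uppercasing the first character yields U+01CA.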

    @serhiy-storchaka
    Member

    I think this is a reasonable change.

    Also the docs for str.title() should be fixed.

    @serhiy-storchaka serhiy-storchaka added easy interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode 3.8 (EOL) end of life type-feature A feature request or enhancement labels Apr 7, 2019
    @kingdom5500
    Mannequin

    kingdom5500 mannequin commented Apr 9, 2019

    Hello there,

    I'm an absolute beginner here and this whole thing is a little overwhelming, so please bear with me. I think this would be a suitable first task for me to take on because it appears to be a simple one-line change (correct me if I'm mistaken, though).

    @serhiy-storchaka
    Member

    This issue is easy if you know C.

    • Find the implementation of str.capitalize in unicodeobject.c and make it use title case. See the implementation of str.title for an example.

    • Find the tests for str.capitalize and add new cases. Finding the proper place for the tests may be the hardest part.

    • Update the documentation for str.capitalize. Add the versionchanged directive.

    • Fix the documentation for str.title. Use str.capitalize in the example.

    • Add the news and What's New entries.
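As a rough illustration of the "add new cases" step in the list above, a test might look like the sketch below. This is not the test that was added to CPython. The class name and the chosen cases are hypothetical, and it assumes an interpreter where the change has landed (Python 3.8 or later).

```python
import sys
import unittest


@unittest.skipIf(sys.version_info < (3, 8),
                 "str.capitalize uppercases the first character before 3.8")
class CapitalizeTitlecaseTest(unittest.TestCase):
    """Hypothetical test cases, for illustration only."""

    def test_digraph_is_titlecased(self):
        # All three forms of the NJ digraph should yield U+01CB.
        for first in ("\u01ca", "\u01cb", "\u01cc"):
            with self.subTest(first=ascii(first)):
                self.assertEqual((first + "emačka").capitalize(),
                                 "\u01cbemačka")

    def test_ascii_behaviour_unchanged(self):
        # For ASCII letters, titlecase and uppercase coincide.
        self.assertEqual("hello WORLD".capitalize(), "Hello world")


if __name__ == "__main__":
    unittest.main()
```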

    @kingdom5500
    Mannequin

    kingdom5500 mannequin commented Apr 10, 2019

    Thanks for clarifying all of that! I now have the patch and tests working locally. However, I'm not too sure what documentation needs to be changed for str.title. Should it specify that only the first letter of a digraph is capitalised, rather than the full character?
    I sure hope I get the hang of this soon :-D

    @zooba
    Member

    zooba commented Apr 12, 2019

    New changeset b015fc8 by Steve Dower (Kingsley M) in branch 'master':
    bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804)
    b015fc8

    @zooba
    Member

    zooba commented Apr 12, 2019

    Thanks! I'm a big fan of this change :)

    @zooba zooba closed this as completed Apr 12, 2019
    @ZackerySpytz
    Mannequin

    ZackerySpytz mannequin commented Apr 12, 2019

    I think that the PR may have been merged too quickly. Serhiy had made a list, and I think that the PR was missing some necessary changes.

    @zooba
    Member

    zooba commented Apr 12, 2019

    What is missing? It looks like everything on Serhiy's list was done.

    @zooba
    Member

    zooba commented Apr 12, 2019

    Oh, apart from the What's New section. But this looks enough like a bugfix (previous behaviour "wasn't capitalizing my name correctly" - new behaviour "now capitalizes my name correctly") that it's hardly critical to advertise it on that page.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022