Skip to content

[stdlib] Rename char_length to count_codepoints for consistency#5764

Closed
martinvuyk wants to merge 1 commit into
modular:mainfrom
martinvuyk:rename-char-length
Closed

[stdlib] Rename char_length to count_codepoints for consistency#5764
martinvuyk wants to merge 1 commit into
modular:mainfrom
martinvuyk:rename-char-length

Conversation

@martinvuyk
Copy link
Copy Markdown
Contributor

@martinvuyk martinvuyk commented Jan 12, 2026

Rename char_length to count_codepoints for consistency and add the methods to String and StringLiteral

@martinvuyk martinvuyk requested a review from a team as a code owner January 12, 2026 19:32
@ConnorGray
Copy link
Copy Markdown
Member

Thanks for this Martin! :)

While we're in the area, what do we think about naming this codepoint_count() instead?

I think _count() would emphasize that this is actually performing work 🏋️ to iterate and count up all the codepoints, which is an O(N) operation—and is characteristically unlike a typical len(..) call which just cheaply reads a pre-computed field.

@martinvuyk
Copy link
Copy Markdown
Contributor Author

Thanks for this Martin! :)

While we're in the area, what do we think about naming this codepoint_count() instead?

I think _count() would emphasize that this is actually performing work 🏋️ to iterate and count up all the codepoints, which is an O(N) operation—and is characteristically unlike a typical len(..) call which just cheaply reads a pre-computed field.

@ConnorGray I think that is an interesting idea. I was thinking of length more to keep in line with the byte_length() method (and __len__). If we go the count route then it feels more intuitive to say count_codepoints() (and eventually count_graphemes()) to imply work being done vs sounding like an attribute; though bytecount is also somewhat used in other languages to mean just size in bytes. I'm not sure that people unfamiliar with unicode/UTF-8 will intuitively reach out to count_codepoints() (this is a somewhat weak supposition though)

@ConnorGray
Copy link
Copy Markdown
Member

ConnorGray commented Jan 12, 2026

Hi Martin, you have a point. I like count_codepoints() as well, though something about codepoints_count() feels more readable to me. Mulling it over, maybe it's that ending with "count" makes it "feel" obvious that it's returning the count— where's count_codepoints() almost feels like it's intended to have a side effect (instead of returning the count).

E.g. looking at:

for i in str.count_codepoints():
    ...

vs

for i in str.codepoints_count():
    ...

Of course, perhaps we might ultimately end up with something like str.codepoints().count(), which would be very explicit and somewhat Rusty 🦀.

I suppose I don't have a strong opinion between count_codepoints() and codepoints_count(), so I'll defer to you on that 🙂 I do think I have a preference for something with "count" instead of "len" in the name for this, but curious to hear other folks thoughts as well.

Perhaps @NathanSWard and/or @barcharcraz have thoughts?

@ConnorGray
Copy link
Copy Markdown
Member

I'm not sure that people unfamiliar with unicode/UTF-8 will intuitively reach out to count_codepoints() (this is a somewhat weak supposition though)

This is probably a fair point. Though, if they're unfamiliar with Unicode / UTF-8, I think they're better served if our API's nudge them towards learning about those? 🙂 If they reach for an "easy" sounding name like len(str) and find that it exists, but don't understand what it does, that might be worse overall.

@martinvuyk
Copy link
Copy Markdown
Contributor Author

This is probably a fair point. Though, if they're unfamiliar with Unicode / UTF-8, I think they're better served if our API's nudge them towards learning about those? 🙂 If they reach for an "easy" sounding name like len(str) and find that it exists, but don't understand what it does, that might be worse overall.

Yeah that's why I was thinking of maybe deprecating String.__len__ in #5765. Since I think people won't reach out for the builtin method's docstrings to read before just using them, I think it will be best for us to just deprecate it.

we might ultimately end up with something like str.codepoints().count()

We can already do len(str.codepoints()) which calls the inner slice's char_length() method. It seems like we will have many more cases of this "builtin vs chaining" function concatenation style differences.

Something that would solve many of the "people won't go and read builtin method's docstrings for each type would be solved by having the compiler allow something like

@always_inline
fn len[T: Sized](value: T) -> Int:
    __doc__ = T.__len__.__doc__
    return value.__len__()

@ConnorGray
Copy link
Copy Markdown
Member

Hey Martin, just chatted with some folks on the team. I think we're +1 to trying out deprecating String.__len__(). We already made a similar conservative decision with @barcharcraz's recent PR making String.__getitem__ keyword-only string[byte=...].

I like your idea about how to share docstrings between methods. That's definitely a problem we need to solve. I'm not sure of the best design. I could also see something like a @doc(from=...) annotation.

@ConnorGray
Copy link
Copy Markdown
Member

Regarding needing to pick some name, I suggest we go with count_codepoints() 🙂 Nathan and I talked about it and he convinced me that's the better name.

@martinvuyk martinvuyk force-pushed the rename-char-length branch 3 times, most recently from a8165d5 to 7c6ab27 Compare January 14, 2026 22:57
@martinvuyk martinvuyk changed the title [stdlib] Rename char_length to codepoint_length for consistency [stdlib] Rename char_length to count_codepoints for consistency Jan 15, 2026
…ngLiteral

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@ConnorGray
Copy link
Copy Markdown
Member

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Jan 15, 2026
Copy link
Copy Markdown
Contributor

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks martin, this does seem like a better name.

Good catch to add it to StringLiteral as well

@modularbot
Copy link
Copy Markdown
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the main branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Jan 16, 2026
@modularbot
Copy link
Copy Markdown
Collaborator

Landed in 8cf841c! Thank you for your contribution 🎉

@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Jan 17, 2026
@martinvuyk martinvuyk deleted the rename-char-length branch January 17, 2026 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants