WG14 N2653: char8_t: A type for UTF-8 characters and strings #5

Closed
tahonermann opened this issue Apr 23, 2018 · 15 comments

Comments

@tahonermann
Member

Proposals:

Reference implementation:

@tahonermann tahonermann self-assigned this Apr 23, 2018
@tahonermann tahonermann added the "enhancement" label Apr 23, 2018
@tahonermann tahonermann added the "paper submitted" label Aug 6, 2018
@WPMGPRoSToTeMa

What about adding std::to_chars/std::from_chars support for char8_t? (only for Basic Latin of course)

@tahonermann
Member Author

> What about adding std::to_chars/std::from_chars support for char8_t? (only for Basic Latin of course)

I'd like to consider that a separate issue from this one (which I'm going to close now since we adopted char8_t for C++20). Regardless, I don't see the connection to std::to_chars and std::from_chars; those interfaces exist for conversion of (floating-point) values to and from a string representation. We do need interfaces for transcoding, but don't have any proposals in front of us yet.
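To make the question above concrete: std::to_chars writes into a plain char buffer, and for numeric values it only emits characters from the Basic Latin range, so each char maps one-to-one onto a UTF-8 code unit. The following is a minimal sketch of a wrapper a user could write today, not a proposed interface. The names `utf8_unit` and `to_utf8_digits` are hypothetical, and an int is used instead of a floating-point value because integer std::to_chars overloads are more widely implemented.

```cpp
#include <array>
#include <cassert>
#include <charconv>
#include <system_error>
#include <vector>

// Use char8_t where the compiler provides it; otherwise fall back to
// unsigned char. The alias name utf8_unit is an assumption of this sketch.
#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = unsigned char;
#endif

// Hypothetical helper: format an int with std::to_chars, then convert each
// char to a UTF-8 code unit. This is lossless because to_chars emits only
// ASCII digits and '-' for integers.
std::vector<utf8_unit> to_utf8_digits(int value) {
    std::array<char, 16> buf{};
    const std::to_chars_result r =
        std::to_chars(buf.data(), buf.data() + buf.size(), value);
    assert(r.ec == std::errc());  // 16 chars is enough for a 32-bit int

    std::vector<utf8_unit> out;
    for (const char* p = buf.data(); p != r.ptr; ++p) {
        out.push_back(static_cast<utf8_unit>(*p));
    }
    return out;
}
```

Calling `to_utf8_digits(-42)` yields the three code units for '-', '4', and '2'. Transcoding of arbitrary text is a different problem and is not addressed by this wrapper.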

@tahonermann
Member Author

Actually, I'm not going to close this issue as it tracks updating both C and C++. We're halfway there...

@tahonermann
Member Author

tahonermann commented Nov 21, 2018

Minutes from the WG14 meeting in Brno, April 23-27th, 2018 in which the char8_t proposal for C (N2231) was discussed are available at:

@tahonermann tahonermann added the "paper revision needed" label and removed the "paper submitted" label Nov 17, 2019
@tahonermann tahonermann added the "WG14" label Feb 15, 2020
@tahonermann tahonermann changed the title from "char8_t (WG21 P0482, WG14 N2231)" to "WG14: char8_t: A type for UTF-8 characters and strings" Feb 15, 2020
@DBJDBJ

DBJDBJ commented Feb 24, 2020

Sorry Tom that was 2018, not 2008 ... WG14 cogitations are long but not that long :)

@tahonermann
Member Author

@DBJDBJ, thanks, comment fixed.

@MarcusJohnson91

MarcusJohnson91 commented Apr 23, 2020

So will char8_t be available in C, and do you know what macro would define its existence (since it isn't in uchar.h like char16_t and char32_t)?

obviously __cpp_char8_t won't be C's macro, right?

@tahonermann
Member Author

Hi @MarcusJohnson91. I'm working on completing an implementation of N2231 (char8_t for C) for gcc and glibc now. Once done, I'll submit a revision of N2231 to the C committee to consider adopting and will follow up with them.

> obviously __cpp_char8_t won't be C's macro, right?

Correct. I've been planning to use __STDC_CHAR8_T for the feature test macro name for C. This name matches the conventions used for other C feature test macros.

@tahonermann tahonermann changed the title from "WG14: char8_t: A type for UTF-8 characters and strings" to "WG14 N2231: char8_t: A type for UTF-8 characters and strings" Nov 23, 2020
@tahonermann tahonermann changed the title from "WG14 N2231: char8_t: A type for UTF-8 characters and strings" to "WG14 N2653: char8_t: A type for UTF-8 characters and strings" Jun 7, 2021
@tahonermann tahonermann added the "paper submitted" label and removed the "paper revision needed" label Jun 7, 2021
@MarcusJohnson91

Hey @tahonermann, has char8_t been adopted by WG14?

Reading draft N2731, and I'm not seeing it :(

@tahonermann
Member Author

@MarcusJohnson91, no, not yet. I expect it to be discussed at the next WG14 meeting.

@MarcusJohnson91

MarcusJohnson91 commented Nov 26, 2021

Hey @tahonermann, my proposal for new length modifiers needs to be finished like ASAP and it sounds like we'll present at the same meeting.

I'm creating the length modifiers U16 and U32 for char16_t and char32_t characters and strings.

If you're planning on adding something like that for char8_t, I would use U8 to fit in. Basically, c/s/lc/ls are for locale-specific conversions, and the U16/U32/U8? variants are for Unicode.

Just a heads up.

I can't wait for your paper to get in!

@tahonermann
Member Author

Hi, @MarcusJohnson91, I have yet to review your proposal. I’m curious how it deals with encoding issues. It may be worth discussing it in an SG16 telecon as we would presumably want to align behavior with a hypothetical future std::format() overload or formatter that supports the UTF char variants.

As for your proposal supporting char8_t, that is something we can reconcile if both proposals are accepted.

@MarcusJohnson91

MarcusJohnson91 commented Nov 29, 2021

I submitted a new revision of my proposal on November 26th, but it hasn't been published yet.

The original proposal is N2761; the main differences are that I changed the length modifiers from l16/l32 to U16/U32, and I clarified that precision and width specifiers operate on code points, not code units, for security reasons.

I believe Robert suggested I add support for char8_t, and in my draft I did specify U8 for char8_t types, but when I looked in the paper and saw that char8_t wasn't in yet, I dropped it from the draft.
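The code point versus code unit distinction mentioned above can be illustrated with a small C++ sketch. It is not taken from the proposal; `count_code_points` is a hypothetical helper that counts code points in a UTF-16 string by treating a high surrogate followed by a low surrogate as a single code point. A limit counted in code units could cut such a pair in half, whereas a limit counted in code points cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-16 string.
// A well-formed surrogate pair counts once; an unpaired surrogate is
// counted as one code point of its own (an assumption of this sketch).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool next_low = i + 1 < s.size() &&
                              s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && next_low) {
            ++i;  // skip the low surrogate of the pair
        }
        ++n;
    }
    assert(n <= s.size());  // never more code points than code units
    return n;
}
```

For the string `u"a\U0001F600b"`, `size()` reports four code units while `count_code_points` reports three code points, because U+1F600 occupies two UTF-16 code units.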

@tahonermann
Member Author

N2653 was accepted for C2x during the WG14 meeting held in late January and early February of 2022. An updated working paper has not yet been published but should be forthcoming. Closing.

@tahonermann tahonermann added the "paper accepted" label and removed the "paper submitted" label Jun 15, 2022

4 participants