clarify that utf-8 is just a possible encoding of strings #684

Merged
merged 5 commits into from
Apr 16, 2021

Conversation

andimarek
Contributor

This change tries to clarify that String scalars are not necessarily UTF-8 strings, but sequences of Unicode code points, which could be encoded as UTF-8 but don't have to be.

human-readable text. All response formats must support string representations,
and that representation must be used here.

**Result Coercion**

Fields returning the type {String} expect to encounter UTF-8 string internal values.
Member


I think this part should continue to specify UTF-8? As I read it, it's about the serialization of strings in responses, where specifying an encoding is actually appropriate?

Contributor Author


This section is about result coercion, not serialization. Of course Strings can be serialized as UTF-8 (and they often are, via UTF-8 JSON), but they don't have to be.

@IvanGoncharov
Member

@andimarek Maybe I'm missing something, but wasn't the discussion about extending the range of possible code points, not about removing UTF-8 from the spec?
Internally we can use whatever encoding we want; it's just that whatever we send to a GraphQL server or receive from one should be UTF-8.
Otherwise, how can clients figure out what encoding they should use for strings?

If you need to send strings in some other encoding, you can always create a custom scalar for that, and with @specifiedBy your clients can figure out what encoding to use.

@andimarek
Contributor Author

andimarek commented Feb 7, 2020

@IvanGoncharov this is just a cleanup/correction. As discussed today, the current section mentioning UTF-8 is simply wrong: UTF-8 is only one of the possible Unicode encodings. Strings are sequences of Unicode code points, not UTF-8 strings. In fact, the reference implementation itself uses UTF-16 to represent Strings (because JS uses UTF-16 internally to encode Unicode).
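
To make the distinction concrete, here is a minimal sketch in Python (not taken from the PR or from any GraphQL implementation; the sample text and variable names are assumptions chosen for illustration). It shows one sequence of Unicode code points written out as two different byte sequences, UTF-8 and UTF-16, both of which decode back to the same string.

```python
# One string value, i.e. one sequence of Unicode code points.
# The sample text is an arbitrary illustration: "naïve" plus an emoji.
text = "na\u00efve \U0001F600"

# The code points themselves do not depend on any encoding.
code_points = [ord(ch) for ch in text]

# The same code points can be written out as different byte sequences.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Decoding either byte sequence gives back the identical string.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text
assert utf8_bytes != utf16_bytes

# Number of UTF-16 code units (two bytes each); this is the unit that
# JavaScript's String.length counts.
utf16_units = len(utf16_bytes) // 2

print(len(code_points), len(utf8_bytes), utf16_units)  # prints: 7 11 8
```

The three counts differ because the emoji is a single code point that takes four bytes in UTF-8 and two code units (a surrogate pair) in UTF-16.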

Also: sending data over the wire (serialization) is different from scalar coercion. The most commonly used serialization format is JSON, which in turn is normally encoded in UTF-8. We have an extra section on how to serialize to JSON, but this is in no way required: UTF-8 encoded JSON serialization is just one option.
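
The separation between the two steps can be sketched as follows. This is a hedged illustration in Python, not the reference implementation: `coerce_string_result` is a hypothetical helper, and the response shape and field name are assumptions. Coercion yields a string value with no byte encoding attached; the byte encoding is only chosen later, when the response is serialized.

```python
import json


def coerce_string_result(value):
    """Hypothetical result coercion for a String field.

    Returns a string value (a sequence of code points); no bytes and
    no encoding are involved at this step.
    """
    if isinstance(value, str):
        return value
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    raise TypeError(f"String cannot represent value: {value!r}")


# Step 1: result coercion produces string values inside the response.
result = {"data": {"greeting": coerce_string_result("h\u00e9llo")}}

# Step 2: serialization picks a format (JSON here) and then an encoding.
body = json.dumps(result, ensure_ascii=False)
as_utf8 = body.encode("utf-8")    # the common choice on the wire
as_utf16 = body.encode("utf-16")  # a different choice, same content

# Both byte sequences carry the same coerced response.
assert json.loads(as_utf8.decode("utf-8")) == result
assert json.loads(as_utf16.decode("utf-16")) == result
```

Only step 2 involves an encoding, and it could be swapped for a different one, or for a non-JSON format, without touching the coercion step.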

@IvanGoncharov added the 🤷‍♀️ Ambiguity label (An issue/PR which identifies or fixes spec ambiguity) May 30, 2020
Base automatically changed from master to main February 3, 2021 04:50
@leebyron added this to the May2021 milestone Apr 6, 2021
@leebyron
Collaborator

@andimarek I made some edits; let me know if these look good to you.

@leebyron added the ✏️ Editorial label (PR is non-normative or does not influence implementation) Apr 16, 2021
@leebyron
Collaborator

I'm going to merge this now, since it is the other half of the change made in #854.

@leebyron merged commit 61c50f2 into graphql:main Apr 16, 2021
4 participants