Specify WGSL as UTF-16-encoded text. #1626
Resolves gpuweb#565. USVString is UTF-16.
FWIW, golang does quite a nice job of describing how it consumes Unicode:
WGSL meeting minutes 2021-04-13
As discussed in the WGSL meeting, I'm not convinced the WGSL spec should specify a particular encoding scheme, and as an implementor, I'm certainly not convinced we'll want everything converted to UTF-16. In my view, the encoding scheme is a detail of the WGSL compiler implementation, not of the language. If we're attempting to lay the foundation for non-ASCII text, I believe what the spec should state is that WGSL is written as a sequence of Unicode code points, and we'll want to rework the Textual structure section in terms of code points. As mentioned above, Go does a great job of describing its source code representation as code points. I'm aware that link opens with the line
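To illustrate the code-point view, here is a minimal Go sketch, assuming the source arrives as UTF-8 bytes. `decodeSource` is a hypothetical helper, not Tint's or any other WGSL implementation's API; rejecting malformed UTF-8 instead of substituting U+FFFD is one possible policy among several.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeSource is a hypothetical helper that converts UTF-8 encoded source
// bytes into a sequence of Unicode code points (runes). Malformed UTF-8 is
// reported as an error rather than silently replaced with U+FFFD.
func decodeSource(src []byte) ([]rune, error) {
	points := make([]rune, 0, len(src))
	for offset := 0; offset < len(src); {
		r, size := utf8.DecodeRune(src[offset:])
		// DecodeRune signals an invalid encoding with (RuneError, 1);
		// a genuine U+FFFD in the input decodes with size 3.
		if r == utf8.RuneError && size == 1 {
			return nil, fmt.Errorf("invalid UTF-8 at byte offset %d", offset)
		}
		points = append(points, r)
		offset += size
	}
	return points, nil
}

func main() {
	src := []byte("a\u00e9\U0001F600") // "a", "é", and an emoji outside the BMP
	points, err := decodeSource(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%d bytes, %d code points\n", len(src), len(points)) // prints "7 bytes, 3 code points"
}
```

Once the text is a slice of code points, the rest of the front end (tokenizing, column counting) never needs to know which byte encoding the source arrived in.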
We actually (surprisingly!) have opinions on this. First, on the face of it, we agree with @kvark: the entry point we are specifying that will accept WGSL (createShaderModule()) will accept a DOMString, not a byte sequence. So, from that perspective, there's no reason to specify an encoding. However, we are creating a programming language. People are going to put WGSL programs in files, regardless of the specific entry points we standardize. We could create a world where no byte encoding is blessed, and tools have to carry a bunch of BOM code (and more) to guess encodings. Or, we could cut those problems off at the knees and just bless a particular encoding in this group, thereby helping tools interoperate, even though createShaderModule() won't actually use that part of the spec. The latter approach is just a better world to live in. UTF-8 is a better pick than UTF-16, though. It's the default encoding for most text editors (at least the editors I'm familiar with!).
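The "bunch of BOM code" that tools end up carrying when no encoding is blessed looks roughly like the following minimal Go sketch. `sniffEncoding` is a hypothetical helper; it assumes a missing BOM means UTF-8, which is the default the comment above argues for.

```go
package main

import (
	"bytes"
	"fmt"
)

// Byte order marks for the Unicode encoding forms a tool might encounter.
var (
	bomUTF8    = []byte{0xEF, 0xBB, 0xBF}
	bomUTF16BE = []byte{0xFE, 0xFF}
	bomUTF16LE = []byte{0xFF, 0xFE}
)

// sniffEncoding is a hypothetical helper that guesses a file's encoding from
// its leading byte order mark and returns the payload with the BOM stripped.
// With no BOM it falls back to UTF-8 (the "blessed" default). UTF-32 is
// ignored here: a UTF-32LE BOM (FF FE 00 00) would be misread as UTF-16LE,
// which is the kind of "(and more)" guessing a blessed encoding avoids.
func sniffEncoding(data []byte) (string, []byte) {
	switch {
	case bytes.HasPrefix(data, bomUTF8):
		return "UTF-8", data[len(bomUTF8):]
	case bytes.HasPrefix(data, bomUTF16BE):
		return "UTF-16BE", data[len(bomUTF16BE):]
	case bytes.HasPrefix(data, bomUTF16LE):
		return "UTF-16LE", data[len(bomUTF16LE):]
	default:
		return "UTF-8", data
	}
}

func main() {
	// "fn" encoded as UTF-16LE with a leading BOM.
	enc, body := sniffEncoding([]byte{0xFF, 0xFE, 'f', 0x00, 'n', 0x00})
	fmt.Println(enc, len(body)) // prints "UTF-16LE 4"
}
```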
Did you mean @ben-clayton, or am I missing a discussion? UTF-8 is preferable to UTF-16, but I still feel that this is the wrong place to be specifying this. I expect we (tint) will be converting DOMString byte sequences to integer code points for processing (not UTF-8). We'll have to handle a bunch of different encoding schemes anyway, so handling a wider set of BOM file encodings seems trivial. With that said, I'm not opposed to the WGSL spec stating that programs that consume WGSL source files must support UTF-8 and may also support additional Unicode encoding schemes.
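A minimal Go sketch of the conversion described above, assuming the compiler receives the DOMString as UTF-16 code units. `toCodePoints` is a hypothetical wrapper around the standard `unicode/utf16.Decode`; it does not describe Tint's actual code.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toCodePoints is a hypothetical helper that converts UTF-16 code units (the
// representation of a JavaScript string / DOMString) into Unicode code
// points. utf16.Decode combines valid surrogate pairs into one code point
// and replaces each unpaired surrogate with U+FFFD.
func toCodePoints(units []uint16) []rune {
	return utf16.Decode(units)
}

func main() {
	// "a", a surrogate pair for U+1F600, then a lone high surrogate.
	units := []uint16{0x0061, 0xD83D, 0xDE00, 0xD800}
	for _, r := range toCodePoints(units) {
		fmt.Printf("U+%04X ", r)
	}
	fmt.Println()
}
```

Four code units become three code points here, which is why a spec phrased in code points stays independent of whether the host hands over UTF-16 or UTF-8.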
WGSL, the language, doesn't need to define file/storage formats for the code. It just seems unnecessary? We'd be arguing about UTF-16 vs. UTF-8, which isn't productive and isn't helping anybody.
Discussed in the 2021-04-20 meeting
WGSL meeting minutes 2021-04-20
Fixed by #1646