Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify WGSL as UTF-16-encoded text. #1626

Closed
wants to merge 2 commits into from
Closed

Conversation

kdashg
Copy link
Contributor

@kdashg kdashg commented Apr 12, 2021

No description provided.

@github-actions
Copy link
Contributor

Previews, as seen when this build job started (09be9c5):
WebGPU | IDL
WGSL
Explainer

@kdashg kdashg changed the title Utf8 shaders Specify WGSL as UTF-16-encoded text. Apr 12, 2021
@github-actions
Copy link
Contributor

Previews, as seen when this build job started (55ba8c4):
WebGPU | IDL
WGSL
Explainer

Resolves gpuweb#565.
USVString is UTF-16.
@github-actions
Copy link
Contributor

Previews, as seen when this build job started (8ef882a):
WebGPU | IDL
WGSL
Explainer

@ben-clayton
Copy link
Collaborator

FWIW, golang does quite a nice job describing how it consumes unicode:
https://golang.org/ref/spec#Source_code_representation

@kdashg
Copy link
Contributor Author

kdashg commented Apr 14, 2021

WGSL meeting minutes 2021-04-13
  • BC: I agree with KN: Do we need this specified here? (probably?)
  • KN: Ties into how we define how to talk about errors/warnings for locations. Code-unit vs code-point. It seems like a use case is for easily converting error/warning to offsets into JS strings
  • DM: Why do we need this specified at all? Does any part here care?
  • JG: difference in how we treat the input. If treated as ascii vs UTF-8 it has effects on validity of incoming code. Parsers need to handle that. Need to know end-of-line for example.
  • KN: no byte representation in the spec. It is a string that the implementation can choose to encode how they wish.
  • JG: Constrained by Web IDL
  • KN: native apis will care.
  • JG: this should be written down

@ben-clayton
Copy link
Collaborator

As discussed in the WGSL meeting, I'm not convinced the WGSL spec should be specifying the specific encoding scheme, and as an implementor, I'm certainty not convinced we'll want everything converted to UTF-16. In my view, the encoding scheme is a detail of the WGSL compiler implementation, not the language.

If we're attempting to lay the foundation for non-ascii text, I believe the spec should be stating is that WGSL is written as a sequence of unicode code-points, and we'll want to rework the Textual structure section in terms of code-points.

As mentioned above, Go does a great job describing its source code representation as code-points. I'm aware that link opens with the line Source code is Unicode text encoded in UTF-8., however I don't see any specific reason why using other unicode encoding schemes would affect the definition of its lexical elements.

@litherum
Copy link
Contributor

litherum commented Apr 19, 2021

We actually (surprisingly!) have opinions on this.

First, on the face of it, we agree with @kvark: The entry point we are specifying that will accept WGSL (createShaderModule()) will accept a DOMString, not a byte sequence. So, from that perspective, there’s no reason to specify it.

However, we are creating a programming language. People are going to put WGSL programs in files, regardless of the specific entry points we standardize.

We could create a world where no bye encoding is blessed, and tools have to have a bunch of BOM code (and more) to guess encodings. Or, we could cut those problems off at the knee, and just bless a particular encoding in this group, thereby helping tools interoperate, regardless of the fact that createShaderModule() won’t actually use that part of the spec. The latter approach is just a better world to live in.

UTF-8 is a better pick than UTF-16 though. It’s the default encoding for most text editors. (At least the editors I’m familiar with!)

@ben-clayton
Copy link
Collaborator

First, on the face of it, we agree with @kvark

Did you mean @ben-clayton, or am I missing a discussion?

UTF-8 is preferable to 16, but I still feel that this is the wrong place to be specifying this. I expect we (tint) will be converting DOMString byte sequences to integer code points for processing (not UTF-8). We'll have to handle a bunch of different encoding schemes, so handling a wider set of BOM file encodings seems trivial.

With that said, I'm not opposed to the WGSL spec stating that programs that consume WGSL source files must be able to support UTF-8, but additional unicode encoding schemes may also be supported.

@kvark
Copy link
Contributor

kvark commented Apr 19, 2021

WGSL the language doesn't need to define file/storage formats for the code. It just seems unnecessary? We'd be arguing about UTF-16 vs UTF-8, which isn't productive and isn't helping anybody.
What may happen in practice is that the tools will have "blessed" encoding (I mean Naga and Tint), and that organically will influence other tools to follow.

@dneto0
Copy link
Contributor

dneto0 commented Apr 20, 2021

Discussed in 2021-04-20

  • Proposal: "UTF-8 is always a valid encoding for a WGSL program."

@dneto0 dneto0 self-assigned this Apr 20, 2021
@kdashg
Copy link
Contributor Author

kdashg commented Apr 20, 2021

WGSL meeting minutes 2021-04-20
  • MM: Technically correct: “We don’t need to specify an encoding”. But, we may want to spec an encoding for ecosystem reasons. Tools and standalone shader files might otherwise pick randomly, to the detriment of the ecosystem. We could simplify this by blessing a particular encoding
  • DN: I agree, as long as the encoding we choose isn’t “bad” :) Also let’s be careful about the language we use here: Strong recommendation without saying that we need/use it for WGSL in the browser?
  • BC: But why should the language forbid other encodings? All we need for WGSL is “we support unicode”, and tools are welcome to do what they want. Forbidding in the language spec feels like overstepping the bounds of specification here
  • DN: Ok, I think we should “allow” other encodings
  • MM: Ok, but I’d like to
  • BC: “tooling MUST accept <encoding>”
  • DN: Can we agree that tools MUST support utf-8?
  • MM: Related question, error messages have offsets, which are in code units. It’s ok if the diagnostics are in utf-16, even though the page is utf-8.
  • BC: Diagnostics position/offsets are always in the units that correspond to the encoding provided for the shader source. So they scale the same as the source presented.
  • MM: Sounds good.
  • BC: Ok, what about identifiers?
  • MM: Probably in a year or so?
  • JB: Rust just recently had an RFC for this
  • MM: Post-MVP though, even when we talked about this long ago
  • BC: Yeah, prefer to keep that post-mvp
  • MM: Unicode in comments probably? Even if everything else must be ascii?
  • JG: Do we have unicode comments today?
  • MM: Yes, and furthermore it’d be extra code to reject ascii.
  • DN: Straw man “UTF-8 is always a valid encoding for a WGSL program”

dneto0 added a commit to dneto0/gpuweb that referenced this pull request Apr 20, 2021
dneto0 added a commit that referenced this pull request Apr 21, 2021
@dneto0
Copy link
Contributor

dneto0 commented Apr 21, 2021

Fixed by #1646

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants