[Graphics] Unicode characters in GL.ShaderSource #18

Closed
thefiddler opened this Issue · 4 comments

2 participants

@thefiddler
Owner

Originally reported here: http://www.opentk.com/node/1332

Starting with OpenGL 4.2, shaders can be encoded as UTF-8 strings. According to the GLSL spec, non-ASCII characters are only allowed in comments.

OpenTK currently marshals strings as ASCII, either through automatic marshaling (versions <= 1.0) or through manual marshaling (versions >= 1.1-beta3). The relevant code resides in https://github.com/opentk/opentk/blob/develop/Source/OpenTK/BindingsBase.cs#L153

Proposed solution:

  1. use the pointer overload of Encoding.UTF8.GetBytes() instead of Marshal.StringToHGlobalAnsi()
  2. test that this works on all platforms (including mobile), using both ASCII and non-ASCII shaders
  3. measure performance impact
  4. if impact is significant, search for alternative solutions (e.g. strip/replace multibyte chars, or provide byte[] overloads and have the application perform the necessary marshaling)

Note that strings passed to GL.ShaderSource et al must not contain Byte Order Marks (BOMs). If they do, shader compilation will fail at runtime.
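The BOM note above can be illustrated outside of OpenTK. The following is a minimal Python sketch, not OpenTK code: the helper names `strip_bom` and `read_shader_bytes` are hypothetical, and it only assumes that the shader text was loaded from a file that may start with a UTF-8 BOM.

```python
import codecs


def strip_bom(source: str) -> str:
    """Remove a leading U+FEFF (the decoded form of a BOM) if present."""
    return source[1:] if source.startswith("\ufeff") else source


def read_shader_bytes(raw: bytes) -> str:
    """Decode UTF-8 shader bytes, discarding a UTF-8 BOM if there is one."""
    # The "utf-8-sig" codec skips the 3-byte BOM (EF BB BF) when present
    # and behaves like plain UTF-8 otherwise.
    return raw.decode("utf-8-sig")


# Hypothetical shader text that picked up a BOM from a text editor.
shader = "\ufeff#version 420\nvoid main() {}\n"
clean = strip_bom(shader)
from_file = read_shader_bytes(codecs.BOM_UTF8 + b"void main() {}\n")
```

Stripping the BOM before the string reaches GL.ShaderSource keeps the check in one place, regardless of which marshaling strategy is chosen.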

@thefiddler thefiddler referenced this issue from a commit
@thefiddler thefiddler Marshal strings as UTF8 (affects #18)
Starting with OpenGL 4.2, strings passed to GL.ShaderSource are allowed
to contain multi-byte characters in comments (issue #18). This patch
modifies the marshaling code to use UTF8.GetBytes in order to marshal
strings, instead of Marshal.StringToHGlobalAnsi().
94b04c0
@thefiddler thefiddler referenced this issue from a commit
@thefiddler thefiddler Marshal strings as UTF8 (affects #18)
Starting with OpenGL 4.2, strings passed to GL.ShaderSource are allowed
to contain multi-byte characters in comments (issue #18). This patch
modifies the marshaling code to use UTF8.GetBytes in order to marshal
strings, instead of Marshal.StringToHGlobalAnsi().
fd0c086
@ganaware

I'm sorry for the late response to this issue.
In short, the problem is not fixed by the above commit.

I tested the following source on the utf8 branch merged with HEAD.
https://github.com/ganaware/opentk/blob/utf8_test/Source/Examples/OpenGL/4.x/ShaderSourceWithJapaneseCommentTest.cs
Then it complained that:
(0) : error C0000: syntax error, unexpected $end, expecting "::" at token ""

I think that:

@thefiddler thefiddler was assigned
@thefiddler
Owner

Thanks for the test case, I can reproduce the issue now. Observations:

  • If the "Current language for non-Unicode programs" (on Windows) is set to a western language, then the test case works correctly on HEAD. This explains why I was not able to observe this problem before. (Marshal.StringToHGlobalAnsi internally calls WideCharToMultiByte(CP_ACP).)
  • If the "Current language for non-Unicode programs" is set to ja-JP, then I can observe the failure you described.
  • The utf8 branch does not work correctly for the reason you described (incorrect string length).
  • It is not possible to change the length parameter for functions that take strings in the general case, since the binding generator does not know how to associate a string parameter with the correct length parameter. This means we cannot use UTF8 in an automated fashion, as I was hoping to.
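The observations above can be reproduced in miniature without OpenGL. The sketch below is plain Python, not OpenTK code. It rests on two assumptions that the thread does not spell out: that "incorrect string length" means the character count is passed where a byte count is needed, and that cp1252 and cp932 stand in for the western and Japanese ANSI code pages.

```python
# A shader whose only non-ASCII characters sit inside a comment.
source = "// 日本語\nvoid main() {}\n"

char_count = len(source)                 # what a string-length parameter sees
utf8_bytes = source.encode("utf-8")      # 3 bytes per character of the comment
cp932_bytes = source.encode("cp932")     # Japanese code page: 2 bytes each
cp1252_bytes = source.encode("cp1252", errors="replace")  # western: one "?" each

# Passing the character count as the length of the UTF-8 buffer cuts off
# the tail of the shader, which the compiler then sees as an early end of input.
truncated = utf8_bytes[:char_count]

print(char_count, len(utf8_bytes), len(cp932_bytes), len(cp1252_bytes))
print(truncated)
```

Under these assumptions only the western code page yields a byte count equal to the character count, which would be consistent with the test case passing on a western locale and failing elsewhere.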

Considering the above, my solution would be to replace Marshal.StringToHGlobalAnsi() with Encoding.ASCII.GetBytes(). (Using the former is definitely a bug, since Ansi is not compatible with ASCII on non-western locales.) This will allow shaders with multi-byte characters to work as expected.
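As a rough illustration of why ASCII marshaling sidesteps the length problem, here is a small Python sketch. It is an analogy rather than the OpenTK implementation: `to_ascii_bytes` is a hypothetical helper, and Python counts code points whereas .NET strings count UTF-16 code units, so the comparison is only meant for characters in the Basic Multilingual Plane.

```python
def to_ascii_bytes(source: str) -> bytes:
    """Encode to ASCII, substituting b"?" for every non-ASCII character."""
    return source.encode("ascii", errors="replace")


source = "// 日本語\nvoid main() {}\n"
marshaled = to_ascii_bytes(source)

# One byte per character, so the existing length parameter stays correct.
same_length = len(marshaled) == len(source)
```

The non-ASCII characters only appear in the comment, so replacing them does not change what the compiler has to parse.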

The downside is that the return value of GL.GetShaderSource() will no longer match what you pass to GL.ShaderSource(), i.e. multi-byte chars will be replaced by question marks '?'. I consider this to be a purely academic problem, for two reasons:
  1. GL.GetShaderSource() is not commonly used. Indeed, if you call GL.ShaderSource() then you already have access to the source string and there is no reason to retrieve it via GL.GetShaderSource().
  2. If for some reason you really, absolutely, must be able to round-trip a shader with multi-byte characters, you can always use the IntPtr overload of GL.ShaderSource() and GL.GetShaderSource(). The IntPtr overload gives you complete control over the string conversion, so you can encode/decode from UTF8 in a non-destructive fashion.
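The difference between the two round-trips can be sketched in Python. This only models the encode and decode steps that an application would perform itself when using a raw-pointer overload; it does not call OpenGL, and the variable names are illustrative.

```python
source = "// コメント\nvoid main() {}\n"

# Application-controlled path: encode to UTF-8, hand over the bytes together
# with their byte length, and decode whatever comes back as UTF-8.
utf8_bytes = source.encode("utf-8")
byte_length = len(utf8_bytes)
round_tripped = utf8_bytes.decode("utf-8")

# ASCII path: the comment text is replaced by "?" and cannot be recovered.
lossy = source.encode("ascii", errors="replace").decode("ascii")
```

The key point is that the application passes the byte length it computed itself, rather than relying on the string's character count.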

@thefiddler
Owner

This issue is now fixed in e1ef27d

I have also added your test case to OpenTK.Examples (renamed to "Shader UTF8 Support".) Many thanks!

@thefiddler thefiddler closed this
@ganaware

I think your final solution is the best.
Indeed, what I wanted was the success of GL.CompileShader() and not the precise support of UTF8.

Thanks!
