character kind parameter #1

Open
zbeekman opened this issue Dec 16, 2019 · 3 comments

Comments

@zbeekman

Shouldn't the constants have an explicit character kind parameter? There's no guarantee that the "DEFAULT" character kind is "ASCII".

I even wonder if it would make sense to set everything to ISO_10646 characters (UCS-4, i.e. Unicode, basically).

Also, it would be nice to have compile-time polymorphism so that non-ASCII character kinds (i.e. ISO_10646, or DEFAULT when the default isn't ASCII) can be queried. ISO_10646 is a superset that includes the ASCII characters, although the bitwise representation at runtime may be different (likely padded with zeros).

The only half-decent way I know to write extensive code with compile-time polymorphism is some templating/code-generation approach. I've been using Jin2For for this.
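A minimal sketch of the first suggestion in this comment, giving the constants an explicit character kind. The module name `ascii_constants_sketch` and the constant names are hypothetical and are not taken from the library. The sketch also assumes the compiler supports an ASCII kind; where `selected_char_kind('ascii')` returns -1, these declarations would not compile.

```fortran
module ascii_constants_sketch
  implicit none
  private
  public :: ascii, digits, lowercase, uppercase

  ! Kind number of the ASCII character kind (-1 if the compiler has none).
  integer, parameter :: ascii = selected_char_kind('ascii')

  ! The kind prefix (ascii_"...") states the kind of each literal explicitly
  ! instead of relying on the default kind happening to be ASCII.
  character(kind=ascii, len=*), parameter :: digits    = ascii_"0123456789"
  character(kind=ascii, len=*), parameter :: lowercase = ascii_"abcdefghijklmnopqrstuvwxyz"
  character(kind=ascii, len=*), parameter :: uppercase = ascii_"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
end module ascii_constants_sketch
```

Swapping `'ascii'` for `'iso_10646'` would give the UCS-4 variant on compilers that provide that kind.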

@ivan-pi
Owner

ivan-pi commented Dec 16, 2019

Thanks for the comment. This was just a quick port of the functions in the D std.ascii module (https://dlang.org/phobos/std_ascii.html). The same functionality is also in the ctype header file of the C standard library (http://www.cplusplus.com/reference/cctype/).

If I understand correctly, you are suggesting I create several copies of these functions to operate on the following character kinds:

```fortran
integer, parameter :: default = selected_char_kind('default')
integer, parameter :: ascii = selected_char_kind('ascii')
integer, parameter :: iso = selected_char_kind('iso_10646')
```

of which only the default kind is guaranteed to be supported by a given processor; compiler vendors are not required to support the ASCII and ISO_10646 kinds (my ifort 19.0.3 supports only one character kind).

Indeed with jin2for (similar to what you did with Zstdlib), I could reduce the amount of boilerplate code necessary. Perhaps this should be a separate discussion at https://github.com/fortran-lang/stdlib. I will create a new proposal there.
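To see what a particular compiler offers, the three kind queries above can be wrapped in a small helper. This is only a sketch: the module name `char_kinds_sketch`, the constant names, and the routines `is_supported` and `report_char_kinds` are hypothetical. It relies on `selected_char_kind` returning -1 for a character set the processor does not support.

```fortran
module char_kinds_sketch
  implicit none
  private
  public :: default_kind, ascii_kind, iso_kind, is_supported, report_char_kinds

  integer, parameter :: default_kind = selected_char_kind('default')
  integer, parameter :: ascii_kind   = selected_char_kind('ascii')
  integer, parameter :: iso_kind     = selected_char_kind('iso_10646')

contains

  ! selected_char_kind yields -1 when the named character set is unavailable.
  pure logical function is_supported(k)
    integer, intent(in) :: k
    is_supported = k >= 0
  end function is_supported

  ! Print one line per character set, with its kind number if there is one.
  subroutine report_char_kinds()
    call report_one('DEFAULT', default_kind)
    call report_one('ASCII', ascii_kind)
    call report_one('ISO_10646', iso_kind)
  end subroutine report_char_kinds

  subroutine report_one(name, k)
    character(len=*), intent(in) :: name
    integer, intent(in) :: k
    if (is_supported(k)) then
      print '(a, a, i0)', name, ': kind = ', k
    else
      print '(a, a)', name, ': not supported'
    end if
  end subroutine report_one

end module char_kinds_sketch
```

Calling `report_char_kinds()` from a short program prints the kind numbers for the compiler in use; the values themselves are processor dependent.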

@zbeekman
Author

> If I understand correctly, you are suggesting I create several copies of these functions to operate on the following character kinds:

Well, not exactly, because, as you noted, they're not guaranteed to exist, and when they do exist, "DEFAULT" is often/usually the same kind as "ASCII", so you can't create overloaded functions with arguments that are "ascii" and "default". (In that case you'd have a duplicate interface.)

That's one nice thing about jin2for: it doesn't assume anything and interrogates the numeric kinds from the compiler to then generate the code. So if only one character kind is supported, then your code will only have that one kind. I'll cross-post this on the new issue you made.

@wclodius2

The DEFAULT character kind is guaranteed to contain all the characters of the Fortran character set, which is all the printable characters of ASCII. That guarantee says nothing about the control codes or the order of the printable characters in the character set. The order dependence for the printable characters can be consistently worked around by using ACHAR and IACHAR.

In practice, the default character set is a mapping to the system's internal character set, which is UTF-8 on Linux, UTF-16 on Windows, and Mac Roman(?) on the Macintosh. All map to ASCII for code points 0:127. Chinese and Japanese computers tend to use national character sets that map to ASCII for 0:127. I don't know if the code set is well defined for Berkeley Unix, but the ones I know use the Latin character sets, which also map to ASCII for code points 0:127. I don't know what they use in India, but I would be very surprised if their character sets didn't also map to ASCII for 0:127.

The only computers I know of that don't map to ASCII in code points 0:127 are those using EBCDIC(?), mostly IBM mainframes. EBCDIC actually comprises a variety of character sets, with the specific active one context dependent. The XL Fortran compiler, https://www.ibm.com/support/knowledgecenter/SS2MB5_14.1.0/com.ibm.xlf141.bg.doc/language_ref/asciit.html, appears to use an EBCDIC character set with equivalents to all the ASCII control characters.
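The ACHAR/IACHAR workaround mentioned in this comment could be sketched as follows. The module and function names are hypothetical and only loosely follow the D `std.ascii` naming; the sketch handles default-kind characters only. `iachar` returns a character's position in the ASCII collating sequence and `achar` is its inverse, so the checks do not depend on the native ordering.

```fortran
module ascii_checks_sketch
  implicit none
  private
  public :: is_upper, is_digit, to_lower

contains

  ! True for 'A'..'Z', judged by ASCII code rather than by native order.
  elemental logical function is_upper(c)
    character(len=1), intent(in) :: c
    integer :: code
    code = iachar(c)                 ! position in the ASCII sequence
    is_upper = code >= 65 .and. code <= 90
  end function is_upper

  ! True for '0'..'9' (ASCII codes 48 to 57).
  elemental logical function is_digit(c)
    character(len=1), intent(in) :: c
    integer :: code
    code = iachar(c)
    is_digit = code >= 48 .and. code <= 57
  end function is_digit

  ! Map 'A'..'Z' to 'a'..'z'; every other character is returned unchanged.
  elemental function to_lower(c) result(lc)
    character(len=1), intent(in) :: c
    character(len=1) :: lc
    if (is_upper(c)) then
      lc = achar(iachar(c) + 32)     ! 'A' (65) -> 'a' (97)
    else
      lc = c
    end if
  end function to_lower

end module ascii_checks_sketch
```

Because the functions are elemental, they can also be applied to an array of single characters in one call.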
