GDL::make_name() oddities #16

Open
bobh0303 opened this issue Sep 23, 2017 · 2 comments


@bobh0303

I'm thinking about a revision to the make_name() routine in GDL.pm, but as I study the code there are a number of oddities I'm wondering if we should clean up. Thoughts welcomed.

(I should say up front that, for backwards compatibility, any revision that generates different GDL identifiers will be disabled by default and require an option parameter to enable it.)

One major concern is that the routine does not attempt to separate out (and process individually) the ligature components of a glyph name. For example, while a glyph name of:

  • uni1234abcd generates a GDL identifier g1234_abcd
    and
  • u12345 generates g12345

if we put those two into a ligature we get, bewilderingly:

  • uni1234abcd_u12345 generates g1234_abcd_1234_u5
    and
  • u12345_uni1234abcd generates g12345_ni1234abcd

(rather than something more expected like g1234_abcd_12345 and g12345_1234_abcd respectively)

So perhaps my first question is: shouldn't make_name() be processing such ligature components independently?
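
To make the question concrete, here is a minimal sketch of per-component processing. It is written in Python for illustration only and is not the Perl code in GDL.pm; the names `component_usvs` and `make_name_sketch` are hypothetical. It assumes components are separated by `_`, matches hex digits case-insensitively, lowercases the output to mirror the current behaviour, and leaves non-USV components unchanged (suffix handling and whatever the real routine does with such names are not modelled).

```python
import re

# Hypothetical patterns: "uni" + groups of 4 hex digits, or "u" + 4 to 6 hex digits.
_UNI = re.compile(r"uni((?:[0-9A-Fa-f]{4})+)")
_U = re.compile(r"u([0-9A-Fa-f]{4,6})")


def component_usvs(component):
    """Return the hex USV strings in one ligature component, or None."""
    m = _UNI.fullmatch(component)
    if m:
        digits = m.group(1)
        # Split "1234abcd" into ["1234", "abcd"].
        return [digits[i:i + 4] for i in range(0, len(digits), 4)]
    m = _U.fullmatch(component)
    if m:
        return [m.group(1)]
    return None


def make_name_sketch(glyph_name):
    """Build a GDL-style identifier by handling each '_' component on its own."""
    parts = []
    for component in glyph_name.split("_"):
        usvs = component_usvs(component)
        if usvs is None:
            parts.append(component)  # assumption: non-USV components pass through
        else:
            parts.extend(u.lower() for u in usvs)
    return "g" + "_".join(parts)


print(make_name_sketch("uni1234abcd_u12345"))  # g1234_abcd_12345
print(make_name_sketch("u12345_uni1234abcd"))  # g12345_1234_abcd
```

With this approach the single-component names still come out as before (`g1234_abcd` and `g12345`), so only the ligature cases would change.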

Next question is whether we really want to lowercase USVs in names? Currently, for example,

  • uni1234ABCD generates g1234_abcd
    and
  • uABCDE generates gabcde

Personally I find the uppercase USVs more readable.

Happy for this to be a brainstorming session...

@mhosken

mhosken commented Sep 23, 2017

From what you say, it's behaving as I intended it to behave. But I am open to a discussion on that.

According to the AGL a ligature name may be uxxxx_uxxxx_uxxxx_... or unixxxxyyyyzzzz... but not both. Hence uxxxx_uniyyyyzzzz is wrong and so I treat it as such. But if you want to change it such that in effect we treat uni as u, then that's fine by me. These cases shouldn't be occurring anyway. OK I admit u12345_uni1234abcd should have output g12345_uni1234abcd.

As to casing: I prefer lowercase; it's less noisy in a glyph name. Perhaps we need a switch for that too? I would like people to be able to get the names they want to work with.

@bobh0303

> According to the AGL a ligature name may be uxxxx_uxxxx_uxxxx_... or unixxxxyyyyzzzz... but not both. Hence uxxxx_uniyyyyzzzz is wrong and so I treat it as such.

Actually, what is wrong -- or at least not recommended -- about this case is using 'u' notation for BMP characters. This is mentioned in the AGL Specification (Section 6), where it says:

... it is recommended to specify names by using the "uni" prefix for characters in the Basic Multilingual Plane (BMP), and the shorter "u" prefix for characters in the 16 Supplemental Planes
...
Why is the prefix "u" not yet recommended for glyphs that are encoded in Unicode's BMP? The prefix "u" is not supported by Acrobat Versions 4 and 5. It became supported by Acrobat Version 6 and later, which is also when support for Unicode characters outside the BMP (Basic Multilingual Plane) was introduced. AGL names and glyph names that use the prefix "uni," along with the "." and "_" parsing rules, are already supported by Acrobat Versions 4 and 5.

But as for mixing 'u' and 'uni' notations in a ligature, this appears to be perfectly acceptable, and in fact the AGL Specification (Section 3) includes this example:

The name "Lcommaaccent_uni20AC0308_u1040C.alternate" has three components, which are "Lcommaaccent," "uni20AC0308," and "u1040C." It is mapped to the string U+013B U+20AC U+0308 U+1040C.

On a tangent: In reading the spec, I realize my original examples are flawed in that they have lower case hex digits in the glyph name, while the spec requires upper case only. In fact it gives this example:

The name "uni20ac" has a single component, which is mapped to an empty string (note the lowercase "a" and "c").

This also means our code should be tightened up to recognize only uppercase hex digits in the glyph name.
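
As a rough illustration of what "tightened up" could mean, here is a Python sketch (not the Perl in GDL.pm; `split_components` and `parse_usv_component` are hypothetical names). It assumes the rules as I read them in the spec: drop everything from the first `.`, split on `_`, accept only uppercase hex digits, and reject surrogates and values above U+10FFFF. Looking up AGL names such as `Lcommaaccent` is deliberately left out, so those components simply return `None` here.

```python
import re

# Uppercase hex digits only, per the spec examples quoted above.
_UNI_STRICT = re.compile(r"uni((?:[0-9A-F]{4})+)")
_U_STRICT = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(cp):
    """True for Unicode scalar values (no surrogates, at most U+10FFFF)."""
    return cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def split_components(glyph_name):
    """Drop the suffix after the first '.', then split on '_'."""
    base = glyph_name.split(".", 1)[0]
    return base.split("_")


def parse_usv_component(component):
    """Return code points for a 'uni'/'u' component, or None if it is not one."""
    m = _UNI_STRICT.fullmatch(component)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        return cps if all(_is_scalar(cp) for cp in cps) else None
    m = _U_STRICT.fullmatch(component)
    if m:
        cp = int(m.group(1), 16)
        return [cp] if _is_scalar(cp) else None
    return None  # AGL name lookup is not part of this sketch


for part in split_components("Lcommaaccent_uni20AC0308_u1040C.alternate"):
    print(part, parse_usv_component(part))
print("uni20ac", parse_usv_component("uni20ac"))  # None: lowercase hex is rejected
```

Under these assumptions `uni20AC0308` yields U+20AC U+0308 and `u1040C` yields U+1040C, matching the spec example, while `uni20ac` is not recognised as a USV component.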
