
Update EEP 40 from Richard O'Keefe dated Fri, 2 Nov 2012 11:41:46 +1300

RaimoNiskanen committed Nov 2, 2012
1 parent b84a353, commit 7879a87b86588ce5b10874fe6065c8e5d3d24c91
Showing with 37 additions and 9 deletions.
  1. +37 −9 eeps/eep-0040.md
@@ -66,6 +66,8 @@ Other\_Id\_Start, are drawn from [Unicode][] and [UAX#31][].
Lu = upper case letters
Lt = title case letters
+ Ll = lower case letters
+ Lo = non-case letters (Arabic, Chinese, and so on)
Pc = connector punctuators, including the low line (_) and
a number of other characters like undertie (‿).
Other_Id_Start = script capital p, estimated symbol,
@@ -84,7 +86,7 @@ Variables
var_start ::= (XID_Start ∩ (Lu ∪ Lt ∪ Other_Id_Start)) ∪ Pc
- var_continue ::= XID_Continue ∪ "@"
+ var_continue ::= XID_Continue ∪ "@" \ "ªº"
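The var_start and var_continue rules can be sketched in Python, the language whose identifier rules the EEP follows. This is a minimal sketch under stated assumptions, not a reference implementation: XID_Start and XID_Continue are approximated with CPython's `str.isidentifier()` (which accepts XID_Start or "_" as the first character and XID_Continue after it), the helper names are hypothetical, and only the two Other_Id_Start characters named in the EEP are listed.

```python
import unicodedata

# Assumption: only the two Other_Id_Start characters named in the EEP
# (SCRIPT CAPITAL P, ESTIMATED SYMBOL); the EEP's list is longer.
OTHER_ID_START = {"\u2118", "\u212e"}
ORDINALS = {"\u00aa", "\u00ba"}  # ª and º


def xid_start(c: str) -> bool:
    # Approximation: isidentifier() accepts XID_Start or "_" first.
    return c != "_" and c.isidentifier()


def xid_continue(c: str) -> bool:
    # Approximation: anything that may follow "a" in a Python identifier.
    return ("a" + c).isidentifier()


def var_start(c: str) -> bool:
    # (XID_Start ∩ (Lu ∪ Lt ∪ Other_Id_Start)) ∪ Pc
    cat = unicodedata.category(c)
    cased = cat in ("Lu", "Lt") or c in OTHER_ID_START
    return (xid_start(c) and cased) or cat == "Pc"


def var_continue(c: str) -> bool:
    # (XID_Continue ∪ "@") \ "ªº"
    return (xid_continue(c) or c == "@") and c not in ORDINALS


def is_variable(name: str) -> bool:
    return (len(name) > 0 and var_start(name[0])
            and all(var_continue(c) for c in name[1:]))


for s in ["Foo", "_x", "Γαμμα", "X@y", "\u203ftie", "foo", "γαμμα", "X\u00aa", "@X"]:
    print(s, is_variable(s))
```

Note that every Pc character (such as the undertie ‿) may start a variable even though only "_" is accepted by `isidentifier()` as a first character; the sketch follows the grammar's explicit `∪ Pc`.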
The choice of XID here follows Python. It ensures that the normalisation
of a variable is still a variable. In fact Unicode variables should be
@@ -98,9 +100,12 @@ mistaken for unquoted atoms. There are 10 Pc characters in the Basic
Multilingual Plane. The Erlang parser treats a variable beginning
with an underscore specially: there will be no complaint if it is a
singleton. There are 9 other Pc characters for which this special
-treatment is not applied. Of course, someone might be using fonts
+treatment is not applied. Using that approach, ‿ would not be a wild-card,
+\_隠者 should be a singleton, and ‿隠者 should not.
+
+Of course, someone might be using fonts
that do include say Arabic letters but not say the undertie. We can
-deal with that by revising the underscore rule.
+deal with that by revising the underscore rule, which I recommend:
Variable does not begin with a Pc character =>
should not be a singleton.
@@ -112,7 +117,7 @@ deal with that by revising the underscore rule.
Latin-1 character =>
may be a singleton.
- Variable begins with a Pc character following by
+ Variable begins with a Pc character followed by
a character outside the Latin-1 range =>
should not be a singleton.
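The existing underscore rule and the revised Pc rule can be compared in a short sketch. The function names are hypothetical. The middle rule ("a Pc character followed by a Latin-1 character may be a singleton") is assumed from the fragment visible in this hunk, and treating a lone Pc character as a wild-card only when it is "_" is also an assumption.

```python
import unicodedata


def old_may_be_singleton(name: str) -> bool:
    # Existing rule: only a variable beginning with "_" escapes the
    # singleton warning; the other Pc characters get no special treatment.
    return name.startswith("_")


def new_may_be_singleton(name: str) -> bool:
    # Revised rule recommended in the EEP (middle case assumed).
    if not name or unicodedata.category(name[0]) != "Pc":
        return False        # does not begin with Pc => should not be a singleton
    if len(name) == 1:
        return name == "_"  # assumption: only "_" is a wild-card
    # Pc + Latin-1 character => may be a singleton;
    # Pc + character outside Latin-1 => should not be a singleton.
    return ord(name[1]) <= 0xFF


for v in ["Acc", "_Acc", "_隠者", "\u203f隠者", "\u203fx"]:
    print(v, old_may_be_singleton(v), new_may_be_singleton(v))
```

Under the old rule `_隠者` escapes the warning and `‿隠者` does not; under the revised rule neither does, while `‿x` now may be a singleton.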
@@ -128,7 +133,7 @@ Unquoted atoms
atom_start ::= XID_Start \ (Lu ∪ Lt ∪ "ªº")
| "." (Ll ∪ Lo)
- atom_continue ::= XID_Continue | "@"
+ atom_continue ::= XID_Continue ∪ "@" \ "ªº"
| "." (Ll ∪ Lo)
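The unquoted-atom rules can be sketched the same way. This is a minimal sketch with hypothetical helper names: it approximates XID_Start and XID_Continue with `str.isidentifier()`, reads atom_continue as (XID_Continue ∪ "@") \ "ªº" by analogy with var_continue, and rejects any name containing "ª" or "º" up front, following the "unless it contains" wording. The dot alternative is applied to atom_start exactly as written, so a leading dot followed by a lower-case letter is accepted.

```python
import unicodedata

ORDINALS = {"\u00aa", "\u00ba"}  # ª and º, banned anywhere in an unquoted atom


def xid_start(c: str) -> bool:
    # Approximation: isidentifier() accepts XID_Start or "_" first.
    return c != "_" and c.isidentifier()


def xid_continue(c: str) -> bool:
    # Approximation: anything that may follow "a" in a Python identifier.
    return ("a" + c).isidentifier()


def atom_start_char(c: str) -> bool:
    # XID_Start \ (Lu ∪ Lt ∪ "ªº")
    return (xid_start(c)
            and unicodedata.category(c) not in ("Lu", "Lt")
            and c not in ORDINALS)


def dot_ok(name: str, i: int) -> bool:
    # "." (Ll ∪ Lo): the dot at index i must be followed by such a letter.
    return i + 1 < len(name) and unicodedata.category(name[i + 1]) in ("Ll", "Lo")


def is_unquoted_atom(name: str) -> bool:
    if not name or any(c in ORDINALS for c in name):
        return False
    # First unit: atom_start.
    if name[0] == ".":
        if not dot_ok(name, 0):
            return False
        i = 2
    elif atom_start_char(name[0]):
        i = 1
    else:
        return False
    # Remaining units: atom_continue.
    while i < len(name):
        c = name[i]
        if c == ".":
            if not dot_ok(name, i):
                return False
            i += 2
        elif xid_continue(c) or c == "@":
            i += 1
        else:
            return False
    return True


for s in ["foo", "foo@bar", "foo.bar", "γαμμα", "日本", "Foo", "_foo", "foo.Bar", "foo.1"]:
    print(s, is_unquoted_atom(s))
```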
Again the choice of XID follows Python, and ensures that the
@@ -148,11 +153,16 @@ introduced.
- Any Python identifier or keyword is
an Erlang variable or unquoted atom or keyword
- unless it begins with "ª" or "º".
+ unless it contains "ª" or "º".
- @ signs may occur freely in variables and unquoted atoms except as the
first character, as now.
+- Although they are in the Ll set, and so are technically lower case
+ letters, "ª" and "º" are not allowed in variable names or
+ unquoted atoms in this proposal because they are not allowed in
+ Erlang now.
+
- dots may not be followed by capital letters, digits, or underscores,
as now.
@@ -200,16 +210,30 @@ will recall only too clearly how much of an impairment to readability
a hailstorm of single quotation marks was. And if you can use
γαμμα as an atom, does it make any sense to refuse Γαμμα?
+One of the goals for this EEP is that if an Erlang text contains only
+Latin-1 characters, then it should be legal under the new rules if and
+only if it is legal under the old rules, and should have the same
+meaning in either context. During the transition period, there will
+be people writing Erlang code for systems following the new rules, and
+giving it to people using Latin-1 or at any rate old-rules systems.
+They should not _accidentally_ introduce incompatibilities. This is
+why we have to ban "ª" and "º" for now. Later we may lift that ban.
+
There are three ways we have to customize the UAX 31 definition.
- We have to continue to support "@" in variables and
"@" and "." in unquoted atoms for backwards compatibility.
- - We have to continue to forbid unquoted atoms beginning
- with the Latin-1 masculine and feminine ordinal indicators.
+ - We have to continue to forbid unquoted atoms containing
+ the Latin-1 masculine and feminine ordinal indicators.
- We have to distinguish between variables and unquoted atoms.
+Dmitry Belyaev has raised the issue of localising keywords. That is
+outside the scope of this EEP, which is concerned with which character
+sequences are variables and which are keywords-or-unquoted-atoms.
+This has to be got right first before we can consider localised keywords.
+
Trouble spot
------------
@@ -225,7 +249,11 @@ like LATIN LETTER SMALL CAPITAL M, but what would a capital capital M
be?
One possibility is to raise the issue with the Unicode consortium and
-leave this unresolved until they reply.
+leave this unresolved until they reply. The issue _has_ been raised,
+and the tentative reply given was "you may not be able to rely on any given
+standard property for special purposes. Especially if that property is not
+formally stable." The next step may well be to seek a revision
+to UAX#31, because Erlang is not alone in wanting a case distinction.
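The trouble spot can be made concrete: the general category says whether a letter is Lu or Ll, but not whether it has a counterpart in the other case. In this sketch (the helper name is hypothetical, and results depend on the Unicode version bundled with the Python build), MATHEMATICAL BOLD CAPITAL A is Lu yet has no lower-case mapping, and LATIN LETTER SMALL CAPITAL M, the example mentioned in the EEP, is classified Ll.

```python
import unicodedata


def has_case_partner(c: str) -> bool:
    # A capital (Lu/Lt) has a partner if lower-casing changes it;
    # a small letter (Ll) has one if upper-casing changes it.
    cat = unicodedata.category(c)
    if cat in ("Lu", "Lt"):
        return c.lower() != c
    if cat == "Ll":
        return c.upper() != c
    return False


samples = [
    "A",
    "a",
    "\U0001D400",  # MATHEMATICAL BOLD CAPITAL A
    "\U0001D41A",  # MATHEMATICAL BOLD SMALL A
    unicodedata.lookup("LATIN LETTER SMALL CAPITAL M"),
]
for c in samples:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}: "
          f"{unicodedata.category(c)}, case partner: {has_case_partner(c)}")
```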
Another possibility would be to say that an Lu character may only
begin a variable if it has a lower-case counterpart, and an Ll
