Update EEP 40 from Richard O'Keefe dated Mon, 5 Nov 2012 11:56:48 +1300

RaimoNiskanen committed Nov 5, 2012
1 parent 9ad00de commit 7a03b0675b0cde8ab93fb333b4fc1d3f5a2ce89f
Showing with 26 additions and 6 deletions.
  1. +26 −6 eeps/
@@ -99,8 +99,9 @@ begin with _some_ special character to ensure that they are not
mistaken for unquoted atoms. There are 10 Pc characters in the Basic
Multilingual Plane. The Erlang parser treats a variable beginning
with an underscore specially: there will be no complaint if it is a
-singleton. There are 9 other Pc characters for which this special
-treatment is not applied. Using that approach, ‿ would not be a wild-card,
+singleton. One approach would be to say that this special treatment
+does not apply to the other 9 Pc characters.
+Using that approach, ‿ would not be a wild-card,
\_隠者 should be a singleton, and ‿隠者 should not.
Of course, someone might be using fonts
@@ -113,12 +114,12 @@ deal with that by revising the underscore rule, which I recommend:
Variable is just a Pc character and nothing else =>
is a wild card.
- Variable begins with a Pc character followed by a
- Latin-1 character =>
+ Variable begins with a Pc character followed by an
+ Lu or Lt or Pc character =>
may be a singleton.
Variable begins with a Pc character followed by
- a character outside the Latin-1 range =>
+ a legal character other than an Lu or Lt or Pc character =>
should not be a singleton.
Thus ‿ is a wild-card, 隠者 is an atom, \_隠者 should not be
@@ -140,7 +141,8 @@ normalisation of an unquoted atom is still an unquoted atom.
Unquoted atoms should be normalised.
The details of Erlang unquoted atoms are somewhat subtle; I have
-checked my understanding experimentally.
+checked my understanding experimentally. An initial dot is allowed,
+but is always discarded. That's odd, but it's the way it is now.
@@ -228,11 +230,29 @@ There are three ways we have to customize the UAX 31 definition.
- We have to distinguish between variables and unquoted atoms.
+There is a fourth way we _might_ customize it. Ken Whistler of
+Unicode advises that he "doesn't see much point" in allowing Pc
+characters other than LOW LINE and FULLWIDTH LOW LINE, unless there
+are legacy reasons why something else has to be supported. It seems
+like a good idea that if s is a legal ASCII identifier, the full width
+version of s should also be a legal identifier, so FULLWIDTH LOW LINE
+definitely ought to be allowed. I find using UNDERTIE cool, but it's
+an editor's mark really. If we reject the other Pc characters now, we
+can always allow them later if we find a need; if we allow them now,
+it will be hard to reject them later. Making this change _clearly_ in
+the definitions will take a little thought, so that's for the next revision.
Dmitry Belyaev has raised the issue of localising keywords. That is
outside the scope of this EEP, which is concerned with which character
sequences are variables and which are keywords-or-unquoted-atoms.
This has to be got right first before we can consider localised keywords.
+The leading underscore rule was revised on the 5th of November on
+the advice of Ulrich Neumerkel to avoid the problem that \_Œuvre
+would not have been accepted as a singleton. Now it will. This was
+ironic, as Māori variables like \_Āporo would have been misclassified.
Trouble spot
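
Reviewer's note: the revised leading-underscore rule in the second hunk reads like a three-way classifier, so a small sketch may help when checking the examples in the text. This is a hedged illustration in Python, not code from the EEP or from the Erlang parser. The function name `classify_pc_variable` and the returned strings are hypothetical, and the sketch assumes the token is already known to be a legal variable, so it does not validate the remaining characters or apply normalisation. It relies only on `unicodedata.category` from the Python standard library, whose answers depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# General categories that, when they follow the leading Pc character,
# mean the variable "may be a singleton" under the revised rule.
SINGLETON_OK = {"Lu", "Lt", "Pc"}


def classify_pc_variable(name):
    """Classify a variable that starts with a Pc (connector punctuation) character.

    Hypothetical helper: assumes `name` is already a legal variable token.
    """
    if not name or unicodedata.category(name[0]) != "Pc":
        raise ValueError("expected a name starting with a Pc character")
    if len(name) == 1:
        # Just a Pc character and nothing else.
        return "wildcard"
    if unicodedata.category(name[1]) in SINGLETON_OK:
        # Pc followed by an Lu, Lt or Pc character.
        return "may be a singleton"
    # Pc followed by any other legal character.
    return "should not be a singleton"


if __name__ == "__main__":
    examples = ["_", "\u203f", "_\u0152uvre", "_\u0100poro", "_\u96a0\u8005"]
    for example in examples:
        print(example, "->", classify_pc_variable(example))
```

The examples are the ones named in the commit text: LOW LINE and UNDERTIE on their own, \_Œuvre, \_Āporo, and \_隠者. Keying the test on the general category of the second character, instead of on the Latin-1 range, is what lets Œ and Ā be treated the same way as ASCII capitals.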
