Update EEP 40 from Richard O'Keefe dated Mon, 5 Nov 2012 11:56:48 +1300

1 parent 9ad00de commit 7a03b0675b0cde8ab93fb333b4fc1d3f5a2ce89f @RaimoNiskanen RaimoNiskanen committed Nov 5, 2012
Showing with 26 additions and 6 deletions.
  1. +26 −6 eeps/eep-0040.md
32 eeps/eep-0040.md
@@ -99,8 +99,9 @@ begin with _some_ special character to ensure that they are not
mistaken for unquoted atoms. There are 10 Pc characters in the Basic
Multilingual Plane. The Erlang parser treats a variable beginning
with an underscore specially: there will be no complaint if it is a
-singleton. There are 9 other Pc characters for which this special
-treatment is not applied. Using that approach, ‿ would not be a wild-card,
+singleton. One approach would be to say that this special treatment
+does not apply to the other 9 Pc characters.
+Using that approach, ‿ would not be a wild-card,
\_隠者 should be a singleton, and ‿隠者 should not.
Of course, someone might be using fonts
@@ -113,12 +114,12 @@ deal with that by revising the underscore rule, which I recommend:
Variable is just a Pc character and nothing else =>
is a wild card.
- Variable begins with a Pc character followed by a
- Latin-1 character =>
+ Variable begins with a Pc character followed by an
+ Lu or Lt or Pc character =>
may be a singleton.
Variable begins with a Pc character followed by
- a character outside the Latin-1 range =>
+ a legal character other than an Lu or Lt or Pc character =>
should not be a singleton.
Thus ‿ is a wild-card, 隠者 is an atom, \_隠者 should not be
@@ -140,7 +141,8 @@ normalisation of an unquoted atom is still an unquoted atom.
Unquoted atoms should be normalised.
The details of Erlang unquoted atoms are somewhat subtle; I have
-checked my understanding experimentally.
+checked my understanding experimentally. An initial dot is allowed,
+but is always discarded. That's odd, but it's the way it is now.
Keywords
--------
@@ -228,11 +230,29 @@ There are three ways we have to customize the UAX 31 definition.
- We have to distinguish between variables and unquoted atoms.
+There is a fourth way we _might_ customize it. Ken Whistler of
+Unicode advises that he "doesn't see much point" in allowing Pc
+characters other than LOW LINE and FULLWIDTH LOW LINE, unless there
+are legacy reasons why something else has to be supported. It seems
+like a good idea that if s is a legal ASCII identifier, the full width
+version of s should also be a legal identifier, so FULLWIDTH LOW LINE
+definitely ought to be allowed. I find using UNDERTIE cool, but it's
+an editor's mark really. If we reject the other Pc characters now, we
+can always allow them later if we find a need; if we allow them now,
+it will be hard to reject them later. Making this change _clearly_ in
+the definitions will take a little thought, so that's for the next revision.
+
Dmitry Belyaev has raised the issue of localising keywords. That is
outside the scope of this EEP, which is concerned with which character
sequences are variables and which are keywords-or-unquoted-atoms.
This has to be got right first before we can consider localised keywords.
+The leading underscore rule was revised on the 5th of November on
+the advice of Ulrich Neumerkel to avoid the problem that \_Œuvre
+would not have been accepted as a singleton. Now it will. This was
+ironic, as Māori variables like \_Āporo would have been misclassified.
+
+
Trouble spot
------------
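
Note on the revised leading-underscore rule in the second hunk: the three cases can be sketched in a few lines of Python. This is only an illustration and is not part of the EEP or of any Erlang tooling. The function name `classify_pc_variable` and the returned labels are hypothetical, and the sketch assumes that Python's `unicodedata` general categories are a fair stand-in for the Pc, Lu and Lt classes the EEP refers to. It looks only at the first two characters and does not check that the rest of the name is a legal variable.

```python
import unicodedata


def classify_pc_variable(name: str) -> str:
    """Classify a candidate variable name under the revised rule (hypothetical helper)."""
    # Names that do not start with a Pc (connector punctuation) character
    # are outside the scope of the leading-underscore rule.
    if not name or unicodedata.category(name[0]) != "Pc":
        return "not-pc-initial"
    # A lone Pc character is a wild card.
    if len(name) == 1:
        return "wildcard"
    # Pc followed by an Lu, Lt or Pc character: may be a singleton.
    if unicodedata.category(name[1]) in ("Lu", "Lt", "Pc"):
        return "may-be-singleton"
    # Pc followed by any other character: should not be a singleton.
    # (Assumption: the caller has already checked that the name is legal.)
    return "should-not-be-singleton"


if __name__ == "__main__":
    for candidate in ["_", "\u203f", "_\u0152uvre", "_\u0100poro", "_\u96a0\u8005"]:
        print(repr(candidate), classify_pc_variable(candidate))
```

Under this sketch `_Œuvre` and `_Āporo` fall into the "may be a singleton" case because Œ and Ā are Lu characters, which is the behaviour the 5th of November revision is meant to give.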
