
Parse non-breaking-space #2492

Closed


ericmj
Member

@ericmj ericmj commented Jul 4, 2014

No description provided.

@ericmj
Member Author

ericmj commented Jul 4, 2014

@josevalim String.split/1 does not split on nbsp because it is in the bidirectional category "Common Number Separator". String.split/1 splits on the categories: "Paragraph Separator", "Segment Separator" and "Whitespace".
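The classification described above can be checked against the Unicode database. The following is a minimal Python sketch, not Elixir's implementation: `split_on_bidi_whitespace` is a hypothetical helper that assumes splitting happens exactly on the three bidirectional classes named in the comment (B, S and WS).

```python
import unicodedata

# Bidirectional classes named in the comment above:
# "B" = Paragraph Separator, "S" = Segment Separator, "WS" = Whitespace.
SPLIT_BIDI_CLASSES = {"B", "S", "WS"}

NBSP = "\u00a0"


def split_on_bidi_whitespace(text):
    """Hypothetical helper: split wherever a character's bidi class is B, S or WS."""
    words, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) in SPLIT_BIDI_CLASSES:
            # Separator found: close the current word, if any.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


print(unicodedata.bidirectional(" "))   # WS
print(unicodedata.bidirectional(NBSP))  # CS (Common Number Separator)
print(split_on_bidi_whitespace("1 + 2"))             # ['1', '+', '2']
print(split_on_bidi_whitespace("1" + NBSP + "+ 2"))  # ['1\xa0+', '2']
```

Because nbsp has class CS rather than WS, it stays attached to the neighbouring characters under this rule.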

@josevalim
Member

You said erlang handles it, but erlang accepts all codepoints between 128 and 160, so we need to look at other places. I tried it in ruby and it does not accept nbsp the way it accepts other whitespace.

$ ruby -e "1 + 2"
-e:1: syntax error, unexpected tIDENTIFIER, expecting end-of-input

Other places we can look at?

@ericmj
Member Author

ericmj commented Jul 4, 2014

ruby accepts it in the shell (the space expands to "\U+FFC2\U+FFA0" on input):

irb(main):001:0> 1\U+FFC2\U+FFA0+2
=> 3
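The `C2` and `A0` in the escaped form shown by irb appear to correspond to the two UTF-8 bytes of U+00A0. That reading of the irb display is an assumption; the byte values themselves can be confirmed with a short Python sketch:

```python
NBSP = "\u00a0"

# Encode the non-breaking space as UTF-8 and show each byte in hex.
utf8_bytes = [f"{b:02X}" for b in NBSP.encode("utf-8")]
print(utf8_bytes)  # ['C2', 'A0']
```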

python does not:

>>> 1 +2
  File "<stdin>", line 1
    1 +2
     ^
SyntaxError: invalid syntax
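The Python result can be reproduced without typing the character by hand. This is a small sketch in which `parses` is a hypothetical helper that only reports whether `compile` raises `SyntaxError`:

```python
NBSP = "\u00a0"


def parses(source):
    """Hypothetical helper: True if Python can compile `source` as an expression."""
    try:
        compile(source, "<nbsp-check>", "eval")
        return True
    except SyntaxError:
        return False


print(parses("1 +2"))             # True: ordinary space
print(parses("1" + NBSP + "+2"))  # False: nbsp is rejected
```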

ghci does:

Prelude> 1 +2
3

@josevalim josevalim added this to the v1.0 milestone Jul 5, 2014
@josevalim
Member

@ericmj I am closing this one due to the lack of consistency across other languages. I think the Unicode Consortium has a document on how programming languages should treat whitespace, characters and so on. Someone could read it in more detail if they are interested. For now, we can wait until R18 is out, which is when we will add full Unicode support to the language (in atoms, identifiers and so on) and when we will definitely need to look into this.

@josevalim josevalim closed this Jul 8, 2014
@ericmj ericmj deleted the parse-nbsp branch June 5, 2019 11:28