Add | quotes and \ escapes to Hy symbols and keywords. #1117

Closed
gilch opened this issue Sep 25, 2016 · 5 comments · Fixed by #2064

@gilch
Member

gilch commented Sep 25, 2016

Common Lisp lets you use arbitrary strings as symbols. If you include certain characters you have to escape them, either by quoting the whole string in ||, or with a \ per character.

* (symbolp '|Am I a symbol?|)

T
* '|Am I a symbol?|

|Am I a symbol?|
* (symbolp 'Am\ I?)

T
* 'Am\ I?

|AM I?|

Common Lisp's reader uppercases symbols by default, but valid symbols can have lower case characters as demonstrated above. Hy's reader does similar shenanigans with *, -, ?, and !, yet does allow them in symbols.

=> '*A-symbol!*
'A_SYMBOL_bang'
=> (HySymbol "*A-symbol!*")
'*A-symbol!*'

I propose we add Common Lisp-style symbol quotes and escapes to Hy, so if you really want the symbol to contain a - or end in ? (etc.), you can, but most of the time it will still get mangled, as it is now, to something Python can use more easily. This could make Hy easier to work with when these details matter.

=> '|*A-symbol!*|
'*A-symbol!*'
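
A rough Python sketch of the proposed behavior follows. It is illustrative only, not Hy's actual reader code: `mangle` and `symbol_name` are hypothetical helpers, and the mangling rules are only the ones that can be inferred from the `'*A-symbol!*` example above (earmuffs become upper case, a trailing `!` becomes `_bang`, hyphens become underscores). The `?` rule is left out because the thread does not show its output.

```python
def mangle(name):
    """Assumed mangling rules, inferred from the '*A-symbol!* example only."""
    if name == "-":
        return name  # leave the subtraction operator alone
    # *earmuffs* -> UPPER CASE without the asterisks
    if len(name) > 2 and name[0] == "*" and name[-1] == "*":
        name = name[1:-1].upper()
    # trailing ! -> _bang
    if name.endswith("!"):
        name = name[:-1] + "_bang"
    # hyphens -> underscores
    return name.replace("-", "_")


def symbol_name(token):
    """Return the symbol name for a token; |...| tokens skip mangling."""
    if len(token) >= 2 and token[0] == "|" and token[-1] == "|":
        return token[1:-1]
    return mangle(token)


print(symbol_name("*A-symbol!*"))    # mangled form
print(symbol_name("|*A-symbol!*|"))  # name kept verbatim
```

With this design an unquoted token keeps today's mangling, and the bars act as an explicit opt-out for the cases where the exact name matters.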
@refi64
Contributor

refi64 commented Sep 25, 2016

Wouldn't it be better to have something like '"text here", i.e., a single quote followed by the text in double quotes? The syntax is less weird and easier to read.

@gilch
Member Author

gilch commented Sep 25, 2016

I don't think so. That particular syntax isn't going to work, since it already means (quote "text here") to the reader. We already have the HySymbol function to convert strings into quoted symbols. That's not the point. The point is, you should have a read syntax for not-quoted symbols that contain special characters. ('"my func" '"my var") wouldn't do anything; it's just a list of symbols. If we don't quote them, ("my func" "my var") is now just a list of strings. But (|my func| |my var|) is executable code in Common Lisp--it applies |my func| to |my var|.

We could perhaps do a reader macro on a string like Clojure does for regex, but I didn't pull this syntax out of my nose. It is how Common Lisp does it and Scheme too. I think we need a better reason than "kirby doesn't like it" to break established convention with some other syntax, or we're just making things harder on Hy's users who may already be familiar with it.

> The syntax is less weird and easier to read.

It's perfectly readable.

Technically the \ escape alone is enough. We don't need the | also, but I think quotation marks like

|I'm a symbol!|

look a lot nicer than escapes like

I\'m\ a\ symbol\!

but maybe that's just me...
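
To make the equivalence of the two spellings concrete, here is a minimal Python sketch of a symbol tokenizer that accepts both | quotes and \ escapes. It is an assumption-laden illustration, not Hy's lexer: `read_symbol` and its set of terminating characters are hypothetical and chosen only for this example.

```python
# Characters that end an unquoted, unescaped symbol token (assumed set).
TERMINATORS = set(" \t\n()[]{}\"';")


def read_symbol(text, start=0):
    """Read one symbol from text; return (name, index just past the token)."""
    chars = []
    in_bars = False
    i = start
    while i < len(text):
        c = text[i]
        if c == "\\":
            # A backslash takes the next character literally.
            if i + 1 >= len(text):
                raise ValueError("dangling backslash")
            chars.append(text[i + 1])
            i += 2
            continue
        if c == "|":
            # A bar toggles quoting and is not part of the name.
            in_bars = not in_bars
            i += 1
            continue
        if not in_bars and c in TERMINATORS:
            break
        chars.append(c)
        i += 1
    if in_bars:
        raise ValueError("unterminated | quote")
    return "".join(chars), i


quoted, _ = read_symbol("|I'm a symbol!|")
escaped, _ = read_symbol("I\\'m\\ a\\ symbol\\!")
print(quoted == escaped)
```

Both spellings yield the same name, so the | form is purely a readability convenience on top of the \ escape.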

@Kodiologist
Member

Although the practical use for this is slim, it would definitely be nice to have for the sake of completeness.

To be clear, you want this to not only prevent mangling (e.g., hyphens to underscores), but also to allow things like spaces and parentheses in symbols, right? Then it's probably worth thinking about exactly what characters are allowed. For example, can ASCII nulls be quoted? And in the context of Unicode, does a backslash apply to a single character, a single glyph, a single byte, or what?

@gilch
Member Author

gilch commented Sep 25, 2016

What's allowed depends on Python. I especially want to be able to prevent mangling (without giving up mangling), but any "attr" that Python's getattr/setattr/delattr can use on an object should be a valid symbol. I'm pretty sure that's any Python string whatsoever. What "Python string" means changed in Python 3, however. I also think the globals dict can take arbitrary strings without trouble, even Unicode in Python 2, since u'foo'=='foo'. (Just because you can put Unicode in a string doesn't mean your terminal will print it.) Even Hy's locals will presently tolerate more characters than Python will:

=> ((lambda [fo?o] fo?o) 42)
(lambda fo?o: fo?o)(42)           ; not valid Python!! AST doesn't care.
42

Local names get optimized to integers in bytecode anyway.

Python already lets you name classes anything:

>>> type('*bang-A rang*!',(),{})
<class '__main__.*bang-A rang*!'>
>>> type('♥',(),{})
<class '__main__.♥'>

If Python can have such names, Hy should be able to evaluate them as symbols.
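
As a quick check of the claim about getattr/setattr/delattr and the globals dict, here is a small Python 3 sketch. The class name `Box` and the keys are made up for illustration; only built-in functions are used.

```python
class Box:
    """An empty class whose instances accept arbitrary attributes."""


box = Box()
odd_name = "*A-symbol!*"  # not a valid Python identifier

# Attribute access by string does not require an identifier-shaped name.
setattr(box, odd_name, 42)
value = getattr(box, odd_name)
in_dict = odd_name in vars(box)

delattr(box, odd_name)
gone = not hasattr(box, odd_name)

# The module namespace is an ordinary dict, so odd keys work there too.
globals()["my func"] = lambda x: x + 1
globals()["my var"] = 41
result = globals()["my func"](globals()["my var"])
print(value, in_dict, gone, result)
```

Such names are unreachable from ordinary Python syntax, which is the gap a quoted-symbol read syntax in Hy would close.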

Backslashes should escape anything that would otherwise cause Hy's reader to end the token. We could also let Python handle any remaining backslashes with the same rules as a Unicode string.

@gilch
Member Author

gilch commented Aug 10, 2017

@Kodiologist pointed out in #1327 that we could implement this as another prefix to string literals. E.g. s"foo bar" could be a Symbol instead of a string. We're already using letter prefixes like Python does, like r"foo\bar", so this is not a dramatic change. This would still potentially let us use #"foo" syntax as regex literals like in Clojure. I am worried about what happens if Python adds another string prefix. It doesn't seem to happen often though.

We can almost do this with tag macros. Something like (deftag s [name] (HySymbol (str name))) would mostly work. Then #s"foo bar" would expand to the literal, non-quoted symbol with the name foo bar. But when quoted, like '#s"foo bar", it becomes

HyExpression([ HySymbol('dispatch_tag_macro'),
  HyString('s'),
  HyString('foo bar')])

instead of HySymbol("foo bar") as desired. This could make it fail in macros that look for the HySymbol model. We'd need true tagged literals or reader macros for this to work.

And if we want to use arbitrary symbols in the tag macros themselves, it's not good enough. Something like #|foo bar| baz or #foo\ bar baz could work as the "foo bar" tag applied to baz, that is,

[HyExpression([ HySymbol('dispatch_tag_macro'),
  HyString('foo bar'),
  HySymbol('baz')])]

But I don't think it could work at all with the s prefix.
The #s"foo bar" baz syntax would instead tokenize as

[HyExpression([ HySymbol('dispatch_tag_macro'),
  HyString('s'),
  HyString('foo bar')]),
 HySymbol('baz')]

The | and \ syntax does seem superior, and has a long tradition in other Lisps. But the cost is that we can't use | to mean a plain symbol anymore. We're using it for the bitwise-or operator now. We could use \| instead, but I'd rather rename it. We'd also want to rename ~, ^, and & to be consistent. But this would also be a good thing, since it would clean up an ambiguity with "unquote", give us Clojure's metadata syntax for annotations #640 #656, and let us use the shorter Clojure-style & instead of &rest in argument lists.
