
Implement #qC string literals #1327

Closed

Conversation

Member

@Kodiologist Kodiologist commented Jul 19, 2017

Closes #1287.

There's no documentation or NEWS update yet.

Note that matching delimiters are supported, but not inner pairs of matching delimiters (e.g., #q(()) is parsed the same as "(" )). This is because the lexer is regex-based, and Python doesn't implement recursive regexes.
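A minimal sketch of the limitation, using only the standard-library re; the #q pattern below is illustrative, not the actual lexer rule:

```python
import re

# A non-recursive, non-greedy pattern stops at the first closing delimiter,
# so an inner balanced pair can't be matched:
m = re.match(r'#q\((.*?)\)', '#q(())')
print(m.group(1))  # '('  -- the trailing ')' is left over, as described above
```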

@gilch, I didn't reserve a character for indicating multi-character delimiters, since you hadn't seemed to settle on [, =, or <.

@gilch
Member

gilch commented Jul 19, 2017

I hadn't settled because I wasn't that picky and wanted to hear other opinions, but I don't see how = could possibly work. I don't have strong feelings between [ and <, but let's go with [ like Lua.

@Kodiologist
Member Author

So the opening syntax would be something like q[foo[. You said that the closing syntax could just be foo, but shouldn't it be ]foo] so that text editors don't think there are unmatched opening brackets?

@gilch
Member

gilch commented Jul 19, 2017

Good point. So do opening like q[foo[ and closing like ]foo]. I'd also like to strip the first newline, if present, like Lua does.
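For concreteness, the newline-stripping rule could look like this; strip_one_newline is a hypothetical helper, not code from this PR:

```python
# Strip exactly one leading newline from the string body, if present:
def strip_one_newline(body):
    return body[1:] if body.startswith("\n") else body

assert strip_one_newline("\nspam\neggs") == "spam\neggs"
assert strip_one_newline("spam\neggs") == "spam\neggs"
```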

@gilch
Member

gilch commented Jul 19, 2017

If we did the opening like q<foo> then the closing could be just foo without the unmatched brackets, but I don't have a strong preference here either at the moment. The q[foo[ version is closer to how Lua does it.

@gilch
Member

gilch commented Jul 19, 2017

This is because the lexer is regex-based, and Python doesn't implement recursive regexes.

Python does have a regex library on PyPI that can do recursive regexes like Perl. It has the same license as Python. I wonder if we can just drop that in without rewriting rply.
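A sketch of what the PyPI regex package allows via its PCRE-style (?R) recursion; the pattern is illustrative, not taken from any branch:

```python
import regex  # the third-party "regex" package from PyPI, not the stdlib "re"

# (?R) re-enters the whole pattern, so nested balanced pairs match:
balanced = regex.compile(r'\((?:[^()]|(?R))*\)')
print(balanced.match('(())').group(0))         # '(())'
print(balanced.match('(a (b) c) d').group(0))  # '(a (b) c)'
```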

@Kodiologist
Member Author

A bold idea. I'll try it out.

@Kodiologist
Member Author

It works! (https://github.com/Kodiologist/hy/tree/hashstrings-recursive-regex) The catch is that regex contains C code and has ambivalent support for PyPy. So I don't think we should do this. Even if we weren't supporting PyPy, recursive matching of delimiters probably isn't worth adding a C dependency to a project that otherwise uses only pure Python.

@Kodiologist
Member Author

Maybe I could achieve this instead with a Rule subclass that uses procedural code instead of a regex in its matches method. Do you think it's worth it?
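A sketch of what the procedural version could look like; scan_balanced is a hypothetical helper and assumes nothing about rply's internals:

```python
# Depth-counting scan that a custom lexer rule could delegate to instead of a regex.
# Returns the index just past the matching closer, or None if the input is unbalanced.
def scan_balanced(s, pos, opener="(", closer=")"):
    if pos >= len(s) or s[pos] != opener:
        return None
    depth = 0
    for i in range(pos, len(s)):
        if s[i] == opener:
            depth += 1
        elif s[i] == closer:
            depth -= 1
            if depth == 0:
                return i + 1
    return None

assert scan_balanced("(())", 0) == 4
assert scan_balanced("(()", 0) is None
```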

@gilch
Member

gilch commented Jul 19, 2017

Do you think it's worth it?

Probably yes. This would make it so easy to quote code from any language with a balanced delimiter pair, which is most of them. (Even J balances parentheses! [Edit: no it doesn't.]). I was even considering gating the feature (e.g. like we do for nonlocal) so only PyPy wouldn't support it. But if we can do it everywhere, that's better.

Is a rule subclass like that part of rply's public interface? If not, this is a harder call, since it would make updating rply more difficult if its implementation ever changes.

@gilch
Member

gilch commented Jul 19, 2017

This is making me reconsider the choice of q[foo[ with ]foo]. If we can do balanced pair quoting, maybe it's better not to reserve [, since it would probably be a common choice for short strings. I think we should do q<foo> with foo instead. (And strip one initial newline, if present.)

@Kodiologist
Member Author

Is a rule subclass like that part of rply's public interface?

It isn't, but I'm not too worried about that, because I'll only be making assumptions about a small amount of the code, and the author of rply seems pretty responsive.

I think we should do q<foo> with foo instead. (And strip one initial newline, if present.)

Sounds good to me.

hy/lex/lexer.py Outdated
# Unicode General Category "Pi" with the matching closing mark
("«", "»"), ("‘", "’"), ("‛", "’"), ("“", "”"), ("‹", "›"), ("⸂", "⸃"), ("⸄", "⸅"), ("⸉", "⸊"), ("⸌", "⸍"), ("⸜", "⸝"), ("⸠", "⸡"), # noqa
# BidiBrackets.txt
("(", ")"), ("[", "]"), ("{", "}"), ("༺", "༻"), ("༼", "༽"), ("᚛", "᚜"), ("⁅", "⁆"), ("⁽", "⁾"), ("₍", "₎"), ("⌈", "⌉"), ("⌊", "⌋"), ("〈", "〉"), ("❨", "❩"), ("❪", "❫"), ("❬", "❭"), ("❮", "❯"), ("❰", "❱"), ("❲", "❳"), ("❴", "❵"), ("⟅", "⟆"), ("⟦", "⟧"), ("⟨", "⟩"), ("⟪", "⟫"), ("⟬", "⟭"), ("⟮", "⟯"), ("⦃", "⦄"), ("⦅", "⦆"), ("⦇", "⦈"), ("⦉", "⦊"), ("⦋", "⦌"), ("⦍", "⦐"), ("⦏", "⦎"), ("⦑", "⦒"), ("⦓", "⦔"), ("⦕", "⦖"), ("⦗", "⦘"), ("⧘", "⧙"), ("⧚", "⧛"), ("⧼", "⧽"), ("⸢", "⸣"), ("⸤", "⸥"), ("⸦", "⸧"), ("⸨", "⸩"), ("〈", "〉"), ("《", "》"), ("「", "」"), ("『", "』"), ("【", "】"), ("〔", "〕"), ("〖", "〗"), ("〘", "〙"), ("〚", "〛"), ("﹙", "﹚"), ("﹛", "﹜"), ("﹝", "﹞"), ("(", ")"), ("[", "]"), ("{", "}"), ("⦅", "⦆"), ("「", "」")) # noqa
Member


There's no need to list all these out. https://docs.python.org/3.6/library/unicodedata.html

Member Author


unicodedata can tell which characters are Pi, but not identify the matching end-quote characters, nor does it seem to have BidiBrackets.txt.
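To illustrate, with only the standard library:

```python
import unicodedata

# unicodedata can classify a character as initial-quote punctuation...
print(unicodedata.category("«"))  # 'Pi'
# ...but it has no mapping from an opening mark to its closing counterpart
# (« -> »), and nothing equivalent to BidiBrackets.txt, so the pairs above
# still have to be listed by hand.
```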

@Kodiologist
Member Author

Kodiologist commented Jul 19, 2017

Okay, #q<foo> and balanced delimiters are in there now.

@gilch
Member

gilch commented Jul 20, 2017

I just realized, this will prevent us from using any other tag macro symbols that start with q.

The obvious fix is to make them use the same rules as other tag macros, with whitespace when it would be ambiguous, so it would be #q X with X, and #q <foo> with foo.

But #q[ still works fine.

@Kodiologist changed the title from "Implement q#C string literals" to "Implement #qC string literals" on Jul 20, 2017
@Kodiologist
Member Author

(Kek, I can't keep my own syntax straight.)

@Kodiologist
Member Author

I just realized, this will prevent us from using any other tag macro symbols that start with q.

Quite so.

The obvious fix is to make them use the same rules as other tag macros, so #q X with X, and #q <foo>. But #q[ still works.

I would rather not allow or require this whitespace. It looks weird and leads to a construct like #q X having non-significant whitespace (where all runs of whitespace are treated the same) on the left of the delimiter (X) and significant whitespace (where each whitespace character is preserved literally) on the right. If we must have tag macros whose names begin with q, how about removing the plain style of q-string (#qXfoobarX) and leaving the balanced style (#q(foobar)) and pointy style (#q<X>foobarX)?

@gilch
Member

gilch commented Jul 20, 2017

Removing the plain style is fine with me. It seems kind of redundant given the balanced style.

The balanced style works fine with #q[, #q(, and #q{, since this does not violate the rules for tag macros. Spaces here would still be allowed, though we could discourage that style for q-strings.

The remaining styles are still a problem though. Naked symbols are currently allowed to contain < and higher Unicode, including your other brackets. #1117 would help with a lot of our symbol concerns, since naked symbols could contain arbitrary strings with escapes.

Do we need to support so many bracket types for the balanced style? Users could be confused about which characters are allowed in tags. Are the three ASCII bracket types enough? If not, would adding « and » suffice? That would add just one more character, «, to what ends an identifier, which wouldn't be too hard to remember.

Is requiring the separation so bad for the pointy style? It's clear what's in the delimiter from the <>, so it's also clear that the separating whitespace isn't part of it. It would usually be used for multi-line strings anyway, so the slight extra length is not a big deal.

@Kodiologist
Member Author

Even if the only change from what I have now is to remove the plain style, the only way you could get your tag macro shadowed is if its first character is q and its second character is < or one of the bracket or quote delimiter characters, which would make a strange name for a macro.

« (and the others) ought to stay identifier characters so long as they have no special meaning outside of #qC. My proposals A and B in #1287 were to use a fixed character for quoting rather than a pick-your-own-quotes operator, but they didn't seem very popular.

@Kodiologist
Member Author

@kirbyfan64 @tuturto Can you guys weigh in on what you would accept?

@tuturto
Contributor

tuturto commented Aug 2, 2017

Out of all options, I like balanced style best.

@Kodiologist
Member Author

@tuturto Yeah, but see the back-and-forth between me and Matthew above. Would you accept dropping the plain style but still not requiring or allowing whitespace after the q?

@tuturto
Contributor

tuturto commented Aug 2, 2017

Plain style can go. I don't have a strong opinion about whitespace, but I'm actually slightly leaning towards requiring it. But for me, either way (whitespace or no whitespace) is ok.

@Kodiologist
Member Author

Okay, thanks. @kirbyfan64?

@Kodiologist
Member Author

Looks like we're Ryanless for this PR. @gilch, would you accept a spaceless balanced style with all the delimiters allowed here? You're proposing a different PR that will prevent calling tag macros whose names begin with _ (#1354), so perhaps you'll also allow restriction of weirder prefixes like q⁅ and q⸌. If not, I'll take your compromise in which #q(, #q[, #q{, and #q« are the only forms of the balanced style.

@gilch
Member

gilch commented Aug 7, 2017

You're proposing a different PR that will prevent calling tag macros whose names begin with _ (#1354)

I did notice that, and I originally thought about implementing it differently, but Clojure also does it that way (I checked). Still, I'd rather not add any more exceptions to what's allowed in tags.

(I think it would be nice if Hy's parser could read EDN. This would make it easier to use a nice exchange format with the Clojure ecosystem. With the addition of the #_ discard syntax, I think it can.)

I've also become very worried about how any new string syntax will interact with our tooling. Hy is currently similar enough to Clojure that we can use Clojure editors on Hy with fairly good results. It's too bad Clojure doesn't have something like this already. If we add something new, it's very important that vim-hy and hy-mode (with Parinfer!) can support it. So I'd rather keep the new syntax simple. The Balanced Style would work in most cases, but for arbitrary strings, we need some kind of custom delimiters. (Can we escape a terminating bracket?) I think we need to discuss it more, and try out some of these proposals with the editors before we put it on master.

@Kodiologist
Member Author

Don't worry, I've gotten myself utterly confused, too.

@refi64
Contributor

refi64 commented Aug 8, 2017

Yeah, this is a mess. I still have no clue what the hell "pointy style" is. How about we start from the top...

My understanding is that the current state of the code is that it looks like #q{delim}STUFF delim, right?

Here's my belief:

Why is the Lua syntax awesome?

For starters, it's dead-simple if you're not using custom delimiters (which is the case most of the time), e.g. #q[[abc]]. On the other hand, the current syntax always requires a delimiter, e.g. #q{EOF} stuff EOF. My stance (partly from using heredocs for a long time) is that:

  • 99% of the time the delimiter will either be EOF or some weird character, where everyone uses a different one
  • There are too many choices. Not only can you pick any delimiter, but you can also pick what delimits the delimiter (e.g. #qAdelimiterA is valid). Really, I don't see any use for using a custom character after the q, since the delimiter can be whatever you want anyway.

In addition, it plays nicely with Python's one way to do it philosophy: it allows just enough flexibility, but most of the time it'll end up being roughly the same thing.

@refi64
Contributor

refi64 commented Aug 8, 2017

Also, I'm not sure what this has to do with list syntax. Isn't everything handled in the lexer anyway?

@Kodiologist
Member Author

Kodiologist commented Aug 8, 2017

I still have no clue what the hell "pointy style" is.

It looks like #q<FOOEY>my stringFOOEY.

My understanding is that the current state of the code is that it looks like #q{delim}STUFF delim, right?

No. The current state of the code allows three styles, none of which allow #q{delim}STUFF delim:

  1. The plain style: #qXmy stringX.
  2. The balanced style: #q{my string {still part of my string}}.
  3. The aforementioned pointy style, #q<FOOEY>my stringFOOEY. It's called the "pointy" style because you have to use ASCII angle brackets in the opening delimiter.

Also, I'm not sure what this has to do with list syntax.

Matthew wants a syntax that does not restrict what tag macros you can call, compared to what's already the case. #[ (for example) currently isn't a valid tag macro call, because [ lexes as the start of a list.

@Kodiologist
Member Author

Kodiologist commented Aug 8, 2017

#[ (for example) currently isn't a valid tag macro call, because [ lexes as the start of a list.

Actually, that's not quite true. The real reason #[ can't lex as a tag macro call is that a tag macro name can only contain identifier characters, and [ isn't an identifier character (the regex that defines an identifier is in fact [^()\[\]{}'"\s;]+).
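Checking that regex against both cases (a sketch, not the lexer itself):

```python
import re

# The identifier pattern quoted above:
identifier = re.compile(r"""[^()\[\]{}'"\s;]+""")

print(identifier.match("q<foo>"))   # matches 'q<foo>' -- '<' is an identifier character
print(identifier.match("[stuff]"))  # None -- '[' is excluded, so #[ can't be a tag macro call
```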

@gilch
Member

gilch commented Aug 8, 2017

@kirbyfan64

What @Kodiologist said.

no clue what the hell "pointy style" is

Like #q <END> string END. But @Kodiologist didn't like the space, so he proposed #q<END>, which interferes with tag macros.

what this has to do with list syntax

That's only if we don't have the #q as the dispatch and use [= instead, which looks like a list and a symbol. So I proposed using #[ instead.


How about we start from the top

So my current best proposal is both

  1. #[ dispatch

#[foo[ <remove 1 newline, if present> string ]foo] where foo is any string that does not contain []. This means both #[[...]] and #[=[...]=] are valid. Like Lua, escapes don't work, since these are raw strings, and it will also strip one newline from the start, if present.

#[[
spam
eggs]]

is just like "spam\neggs".

But unlike Lua, you have to start with #, or it's a list. And also unlike Lua, you're allowed to use a delimiter besides some number of =. This could be parsed into the Hy model as metadata for use with macros. It could also be a hint to editors to use some other highlighting mode when quoting some other language, e.g. #[Python[...]Python] would be highlighted like Python. But most of the time you'd use #[[ or some number of = when nesting like #[===[, if you don't have a better idea for the delimiter. And we could say so in the Style Guide.

And

  2. The « and » balanced style

This is like Proposal A from #1287. It works just like the normal double-quote style, but since it's paired, you can have « and » inside of it as long as it's balanced (or escaped). (And ", of course). It would have the normal string prefixes, like for a raw string.

It's really too bad that, as a historical artifact of typewriters, double quotes aren't a paired set of characters, or we'd already be doing this in Python. But given the #[foo[ style and normal double quotes, Unicode is still not required for developing Hy, so I don't mind adding this as an option.

@Kodiologist
Member Author

I endorse this proposal and will implement it if it gets sufficient traction.

@Kodiologist
Member Author

Kodiologist commented Aug 8, 2017

Although, I think that allowing unescaped nested guillemets is probably a bad idea because (1) it complicates the lexer, (2) it will complicate syntax coloring (which will no longer be possible with non-recursive regexes alone), and (3) you're much less likely to need nested guillemets than you would nested parentheses or square brackets or whatever, and if you do, there's still the Lua style. But if we really need it, at least I've already gotten most of the code down.

@refi64
Contributor

refi64 commented Aug 8, 2017

Same. (TBH I still don't really grasp the use case for proposal 2, but I don't feel like figuring out anything else...)

@Kodiologist
Member Author

@gilch It looks like at least the three of us (you, me, and Ryan) agree on your proposal. Can you just comment on the question of nested guillemets? Then I'll write it up and make a new PR for it.

@gilch
Member

gilch commented Aug 9, 2017

We seem to agree on the Lua strings, at least.

(1) it complicates the lexer,

Not too much though. And it could get better in the future. I wonder what the chances are of that regex library replacing Python's standard-library re someday. We might also consider a pull request for rply to make the lexer handle this kind of thing better.

(2) it will complicate syntax coloring (which will no longer be possible with non-recursive regexes alone),

A very important concern. We do not want to make tooling for Hy difficult to implement. Yes, we'd need the equivalent power of a pushdown automaton. But PDAs are old technology. This is a solved problem. Both Vimscript and Elisp are Turing Complete. They can handle it even if their regexes can't.

Python has been called "executable pseudocode". We could publish the Python algorithm to match these strings in our docs under a public domain license for future tool writers to copy in whatever language they prefer. We could publish the equivalent PCRE recursive "regex" too.

Both Perl and Ruby have balanced-style strings with the same issue. How much of a problem is it for them, really? How does their tooling handle it? Any general-purpose editor or syntax highlighting library that can handle those properly could also handle Hy with an appropriate script. If you're using an editor that can't, you could escape them anyway. (And consider getting a better editor.)

(3) you're much less likely to need nested guillemets

True, but I think paired string delimiters imply this feature. Not having it would be weird. There might be use for it in i18n. It does seem like a lot of problems for a feature we'd rarely use. I'd also be okay with only the Lua strings though.

Guillemets are an interesting choice. If we're using Unicode anyway, we could have used the English-style 66/99 quotes, U+201C and U+201D. But they might be harder to distinguish than guillemets in some fonts. And guillemets are probably easier to type on most systems. Unfortunately, some languages (like German) like to use them backwards, »like this«. The other way is more common, but we could support both by allowing guillemet strings to start with » too. In that case, the nested guillemets would also need to be backwards, or escaped.

I'd like to think about this some more.

@gilch
Member

gilch commented Aug 9, 2017

I think I've got a good alternative. Implementing #1117 would allow us to create HySymbols from arbitrary strings using | <string> |. This is a single delimiter like " is, so we could avoid the need for balancing logic. We could use a tag macro to convert it back to a string at compile time. So, something like #q |foo bar|, or even #q|foo bar| could work, depending on how we set up the lexing for |-quoted symbols. This is still one character shorter than #[[foo bar]], and comparable to the Plain Style. Also, a HySymbol is-a Python string, and will work as one in most contexts, so you often wouldn't even need the tag macro. Just use '|foo bar|, which is even shorter. We could also use a different tag macro to convert it differently, like #b for a bytestring or #f for a format string, etc.
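The is-a-string point can be checked directly; a sketch assuming the hy.models module of that era, with the proposed |...| reader syntax itself still hypothetical:

```python
from hy.models import HySymbol

# A HySymbol built from an arbitrary string already behaves like that string,
# so '|foo bar| would often work where a plain string is expected:
s = HySymbol("foo bar")
print(isinstance(s, str), s.upper())  # True FOO BAR
```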

@Kodiologist
Member Author

Kodiologist commented Aug 9, 2017

Both Perl and Ruby have balanced-style strings with the same issue. How much of a problem is it for them, really? How does their tooling handle it?

I can't speak for Ruby, but the complexity of Perl's quoting forms tends to be behind the corner cases that Perl syntax highlighters have trouble with. That's probably the hardest Perl feature to highlight, except perhaps the magic variables with weird names like $' (and modern Perl should generally use English; so you can use the longer names instead, anyway).

If we're using Unicode anyway we could have used the English-style 66/99 quotes with U+201C and U+201D. But they might be harder to distinguish than the guillemets in some fonts.

Agreed, and hard to distinguish from ASCII double quotes, too.

Unfortunately, some languages (like German) like to use them backwards, »like this«. The other way is more common, but we could support both, by allowing guillemet strings to start with » too.

Oh dear, that sounds like a recipe for insanity. If annoying or confusing people who use guillemets in the opposite direction in their native language is a concern, it's probably better to choose different characters, like and or and .

My first objection to quoting with vertical bars would be: how does the parser tell whether (foo | x |) is a call of foo with one argument, the symbol named " x ", or with three arguments, the first and third of which are the | function (the shadow version of Python's bitwise-OR operator)?

@gilch
Copy link
Member

gilch commented Aug 9, 2017

(foo | x |) would clearly be tokenized as HyExpression([ HySymbol('foo'), HySymbol(' x ')]) under #1117.

This does mean we can no longer write | as our bitwise or. We could certainly spell it out as bit-or (or bor or or-) instead though. We did that with = to setv. (We'd also want to spell out &, ^, and ~ for consistency, which would also free up those characters for other uses.) \| could also work, but I don't think it's as pretty.

@refi64
Contributor

refi64 commented Aug 10, 2017

FWIW is there a reason we can't just use PLY? It works much better with things like this.

@Kodiologist
Member Author

I'd rather not rename a bunch of operators when we could just use other syntax. So, if there are no objections, I'm going to implement Lua style and a guillemet style supporting nesting. We've rather drawn this out, so it will be nice to conclude it.

@gilch
Member

gilch commented Aug 10, 2017

I'd rather not rename a bunch of operators when we could just use other syntax.

We'd have to do that for #1117 anyway, which is important for a lot of other issues. I'd rather not have three kinds of string literals when two will do. A change in grammar is a much bigger deal than renaming some core operators. And it's confusing that bitwise-not is the same symbol as unquote, so I want to change that one anyway. For example,

`(~(foo))

Does the above immediately call foo and unquote its result, or put (foo) in the expansion and bitwise-negate it? Obviously the REPL can tell you. But `(-(foo)) would negate it. That's inconsistent.

Freeing up & from bitwise-and would allow us to use it in function calls. So we could use the shorter Clojure style (foo [& args]...) instead of (foo [&rest args]).

And if we free up | from bitwise-or, we can have both arbitrary-string symbols and a concise non-double-quote string syntax with only one grammar change instead of two. This would make the guillemet style redundant.

[Edit: and ^. That's only four. This would give us the Clojure metadata syntax, which we could use to implement Python's annotations. #640 #656, among other uses. And maybe even Cython #934.]

We can leave the other bitwise operators alone.

@gilch
Member

gilch commented Aug 10, 2017

So for this PR, it's fine if it's just the #[foo[ Lua style. While a little more verbose than we might like in some cases, that's good enough until we implement #1117 with a #q tag macro to convert the symbol name to a string in another PR. But if you want to do #1117 here too, that's fine.

@Kodiologist
Member Author

A change in grammar is a much bigger deal than renaming some core operators.

The grammar has to change in any case in order to implement a new form of symbol quoting or string literal or whatever.

Wouldn't it be better to quote with some syntax that doesn't require changing an operator from the Python name, like \ … \ or #( … ) or #` … ` or « … »?

And seeing as people are going to use funny characters in strings more often than in symbols, shouldn't we have the default for the syntax be a string rather than a symbol?

@Kodiologist
Member Author

Related to that last point, easier entry of symbols with weird names doesn't actually need a new quoting syntax; it would suffice to add another prefix to string literals that makes the result a symbol instead of a plain string.

@Kodiologist
Member Author

FWIW is there a reason we can't just use PLY?

I don't know, not having used it, but rewriting the lexer and parser is out of the scope of this PR.

@gilch
Member

gilch commented Aug 10, 2017

FWIW is there a reason we can't just use PLY?
rewriting the lexer and parser is out of the scope of this PR.

@kirbyfan64 PLY has a compatible BSD license, so I don't know of a reason. In what way is it better though? Can it do this kind of nested parsing we've discussed for the balanced styles any better without adding a new regex engine dependency? If so, and if we settle on a balanced style, it might be worth it. But @Kodiologist pointed out that any balanced style would complicate our tooling.

@gilch
Member

gilch commented Aug 10, 2017

The grammar has to change in any case in order to implement a new form of symbol quoting or string literal or whatever.

But it would be a less complex grammar with only the two string literals and the arbitrary symbol syntax than with three string literal types and the arbitrary symbol syntax.

Wouldn't it be better to quote with some syntax that doesn't require changing an operator from the Python name, like \ … \ or #( … ) or #` … ` or « … »?

I actually think the | and \ syntax is better for symbols. Freeing up those four names from the operators would help with other issues. But if the others also feel very strongly that we must not rename our bitwise-or, then || symbols are not an option. And we'd have to come up with some other syntax for #1117. It's not like Hy has infix notation. Bitwise operators aren't even used that much in Python. But if you are prone to bit-bashing, you'd have to use parentheses and indentation to make the expression readable. So I don't think slightly longer names are a big deal.

I'd rather keep \ as an escape character for symbols even if we don't use the | quoting, like in Emacs Lisp. So \...\ is out.

#(...) means xi in Clojure. Using it for balanced strings might confuse the Clojure users, but I think it's actually not an unreasonable choice. It's only one character longer than "", without introducing Unicode syntax. It also has a nice parallel with #[[]]. If we wanted a prefix version of xi for the Clojure people, something like #%(f %1 %2 ... %&) would work about as well instead. We also considered it for genexprs #867, which isn't xi either. They could use a slightly longer tag, but it wasn't clear what, back when we only had one-character sharp macros. Maybe #for would work. Balanced strings do have the aforementioned problem with tooling, but if we are doing paired delimiters, I want them balanced. I'm not sure if I like this better than the guillemets or not (or both?), but I'm leaning towards #(...). It's fine to allow the users to use Unicode, but I'd rather not add any to Hy itself.

We've already discussed A and B, but one of them is paired and the other isn't.

And seeing as people are going to use funny characters in strings more often than in symbols, shouldn't we have the default for the syntax be a string rather than a symbol?

No, because we want to be able to use funny characters in tag macros. We've already got the super-short "" notation for strings with an overhead of just two characters. We'd almost always just use that. And we'll be adding #[[]] too for crazy raw strings, which are probably not short.

It feels like we're quibbling over saving one character. Compare the overhead of Lua #[[...]] (5) to the overhead of Plain #qX...X (4), which you thought was good enough before (if barely?). Note that tagged symbol #q|...| is also 4.

We could perhaps have r|foo bar| be the same as r"foo bar". But that would be ambiguous with #tag-r|foo bar|. Is that the tag tag-r applied to the symbol |foo bar|, or the tag tag- applied to the string r"foo bar"? If we always assumed a separation before the | then it's unambiguous. And we could do the #q|foo bar| tagged symbol strings.

I'm still not getting the use case where neither "" nor #[[]] is good enough, and neither was @kirbyfan64. Unicode is hard to type on most systems, so it's not like we're saving keystrokes. And an Emacs binding could insert #[[ just as easily as «. Guillemets sure are prettier, but you can also have Emacs abbreviate things with Unicode for your reading pleasure, like λ for fn or xᵢ for xi. Why not « for #[[ and » for ]]? Can we do that last one to the Lua-Style strings without also doing it to nested lists? If not, something like #[;[ ];] should be unambiguous. This way you can read Unicode, but still be compatible with ASCII tools and files.

So, compared to Lua Style, it's not easier to read (on Emacs). It's not easier to write (unless you have a foreign keyboard, and even they require AltGr). It's not easier to implement tooling for. Many terminals can't print Unicode, so it's not very good for the command line either, and Unicode is still hard to type. I could maybe see #(...) being better here, but barely. It's only two characters shorter.

@Kodiologist
Member Author

For what it's worth, I use the bitwise operators much more often for vectorized logic with NumPy or Pandas objects than for actual bitwise logic.

It feels like we're quibbling over saving one character.

Yeah, we've been bikeshedding mercilessly about this since the beginning. Somebody's gotta give in order for this to end. So, I give. Would you accept a PR that adds the Lua style (without nesting of the customized delimiter) and doesn't do anything else?

@gilch
Member

gilch commented Aug 10, 2017

Yeah, we've been bikeshedding mercilessly about this since the beginning.

I didn't see the discussion as trivial. The grammar is something we should try to keep simple and understandable. Changes here should be carefully thought out in the context of the whole language instead of blindly accepting the first thing that seems like a good idea. I felt like the options have been improving because of the discussion. I saw good points that I hadn't thought of on my own.

Would you accept a PR that adds the Lua style (without nesting of the customized delimiter) and doesn't do anything else?

A single change could probably get approved faster than two.

Which version exactly are you proposing? I'd accept the version like #[foo[ (remove one newline, if present) string body ]foo], where foo is any string not containing [ nor ], including the empty string, and the foo is available as metadata in the HyString model for use with macros, etc.

This is not a balanced style, so a pure FSM regex engine could highlight it, e.g. #[foo[ #[foo[ bar ]foo] baz ]foo] would end at the first matching ]foo] and be equivalent to the string " #[foo[ bar ". The remaining baz ]foo] wouldn't get highlighted and might be a syntax error depending on context. No need for a PDA. If you want to nest these, you have to pick a different foo, usually just by adding another =.
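A sketch of that matching with the stdlib re; the backreference to the custom delimiter goes slightly beyond a strict FSM, but no recursion or pushdown machinery is needed:

```python
import re

lua_string = re.compile(r'#\[([^\[\]]*)\[(.*?)\]\1\]', re.DOTALL)
m = lua_string.match('#[foo[ #[foo[ bar ]foo] baz ]foo]')
print(m.group(1))        # 'foo' -- the custom delimiter, available as metadata
print(repr(m.group(2)))  # ' #[foo[ bar ' -- the match ends at the first ]foo]
```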

@Kodiologist
Member Author

Which version exactly are you proposing? I'd accept the version like #[foo[ (remove one newline, if present) string body ]foo], where foo is any string not containing [ nor ], including the empty string, and the foo is available as metadata in the HyString model.

Yeah, that one.
