
Implement #qC string literals #1327

Closed

Conversation

Member

@Kodiologist Kodiologist commented Jul 19, 2017

Closes #1287.

There's no documentation or NEWS update yet.

Note that matching delimiters are supported, but not inner pairs of matching delimiters (e.g., #q(()) is parsed the same as "(" )). This is because the lexer is regex-based, and Python doesn't implement recursive regexes.
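A minimal sketch of the limitation, using only the standard-library re; the #q pattern below is illustrative, not the actual lexer rule:

```python
import re

# A non-recursive, non-greedy pattern stops at the first closing delimiter,
# so an inner balanced pair can't be matched:
m = re.match(r'#q\((.*?)\)', '#q(())')
print(m.group(1))  # '('  -- the trailing ')' is left over, as described above
```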

@gilch, I didn't reserve a character for indicating multi-character delimiters, since you hadn't seemed to settle on [, =, or <.

@gilch
Member

gilch commented Jul 19, 2017

I hadn't settled because I wasn't that picky and wanted to hear other opinions, but I don't see how = could possibly work. I don't have strong feelings between [ and <, but let's go with [ like Lua.

@Kodiologist
Member Author

So the opening syntax would be something like q[foo[. You said that the closing syntax could just be foo, but shouldn't it be ]foo] so that text editors don't think there are unmatched opening brackets?

@gilch
Member

gilch commented Jul 19, 2017

Good point. So do opening like q[foo[ and closing like ]foo]. I'd also like to strip the first newline, if present, like Lua does.
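For concreteness, the newline-stripping rule could look like this; strip_one_newline is a hypothetical helper, not code from this PR:

```python
# Strip exactly one leading newline from the string body, if present:
def strip_one_newline(body):
    return body[1:] if body.startswith("\n") else body

assert strip_one_newline("\nspam\neggs") == "spam\neggs"
assert strip_one_newline("spam\neggs") == "spam\neggs"
```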

@gilch
Member

gilch commented Jul 19, 2017

If we did the opening like q<foo> then the closing could be just foo without the unmatched brackets, but I don't have a strong preference here either at the moment. The q[foo[ version is closer to how Lua does it.

@gilch
Member

gilch commented Jul 19, 2017

This is because the lexer is regex-based, and Python doesn't implement recursive regexes.

Python does have a regex library on PyPI that can do recursive regexes like Perl. It has the same license as Python. I wonder if we can just drop that in without rewriting rply.
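A sketch of what the PyPI regex package allows via its PCRE-style (?R) recursion; the pattern is illustrative, not taken from any branch:

```python
import regex  # the third-party "regex" package from PyPI, not the stdlib "re"

# (?R) re-enters the whole pattern, so nested balanced pairs match:
balanced = regex.compile(r'\((?:[^()]|(?R))*\)')
print(balanced.match('(())').group(0))         # '(())'
print(balanced.match('(a (b) c) d').group(0))  # '(a (b) c)'
```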

@Kodiologist
Member Author

A bold idea. I'll try it out.

@Kodiologist
Member Author

It works! (https://github.com/Kodiologist/hy/tree/hashstrings-recursive-regex) The catch is that regex contains C code and has ambivalent support for PyPy. So I don't think we should do this. Even if we weren't supporting PyPy, recursive matching of delimiters probably isn't worth adding a C dependency to a project that otherwise uses only pure Python.

@Kodiologist
Member Author

Maybe I could achieve this instead with a Rule subclass that uses procedural code instead of a regex in its matches method. Do you think it's worth it?
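A sketch of what the procedural version could look like; scan_balanced is a hypothetical helper and assumes nothing about rply's internals:

```python
# Depth-counting scan that a custom lexer rule could delegate to instead of a regex.
# Returns the index just past the matching closer, or None if the input is unbalanced.
def scan_balanced(s, pos, opener="(", closer=")"):
    if pos >= len(s) or s[pos] != opener:
        return None
    depth = 0
    for i in range(pos, len(s)):
        if s[i] == opener:
            depth += 1
        elif s[i] == closer:
            depth -= 1
            if depth == 0:
                return i + 1
    return None

assert scan_balanced("(())", 0) == 4
assert scan_balanced("(()", 0) is None
```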

@gilch
Member

gilch commented Jul 19, 2017

Do you think it's worth it?

Probably yes. This would make it so easy to quote code from any language with a balanced delimiter pair, which is most of them. (Even J balances parentheses! [Edit: no it doesn't.]). I was even considering gating the feature (e.g. like we do for nonlocal) so only PyPy wouldn't support it. But if we can do it everywhere, that's better.

Is a rule subclass like that part of rply's public interface? If not, this is a harder call, since it would make updating rply more difficult if its implementation ever changes.

@gilch
Member

gilch commented Jul 19, 2017

This is making me reconsider the choice of q[foo[ with ]foo]. If we can do balanced pair quoting, maybe it's better not to reserve [, since it would probably be a common choice for short strings. I think we should do q<foo> with foo instead. (And strip one initial newline, if present.)

@Kodiologist
Member Author

Is a rule subclass like that part of rply's public interface?

It isn't, but I'm not too worried about that, because I'll only be making assumptions about a small amount of the code, and the author of rply seems pretty responsive.

I think we should do q<foo> with foo instead. (And strip one initial newline, if present.)

Sounds good to me.

hy/lex/lexer.py Outdated
# Unicode General Category "Pi" with the matching closing mark
("«", "»"), ("‘", "’"), ("‛", "’"), ("“", "”"), ("‹", "›"), ("⸂", "⸃"), ("⸄", "⸅"), ("⸉", "⸊"), ("⸌", "⸍"), ("⸜", "⸝"), ("⸠", "⸡"), # noqa
# BidiBrackets.txt
("(", ")"), ("[", "]"), ("{", "}"), ("༺", "༻"), ("༼", "༽"), ("᚛", "᚜"), ("⁅", "⁆"), ("⁽", "⁾"), ("₍", "₎"), ("⌈", "⌉"), ("⌊", "⌋"), ("〈", "〉"), ("❨", "❩"), ("❪", "❫"), ("❬", "❭"), ("❮", "❯"), ("❰", "❱"), ("❲", "❳"), ("❴", "❵"), ("⟅", "⟆"), ("⟦", "⟧"), ("⟨", "⟩"), ("⟪", "⟫"), ("⟬", "⟭"), ("⟮", "⟯"), ("⦃", "⦄"), ("⦅", "⦆"), ("⦇", "⦈"), ("⦉", "⦊"), ("⦋", "⦌"), ("⦍", "⦐"), ("⦏", "⦎"), ("⦑", "⦒"), ("⦓", "⦔"), ("⦕", "⦖"), ("⦗", "⦘"), ("⧘", "⧙"), ("⧚", "⧛"), ("⧼", "⧽"), ("⸢", "⸣"), ("⸤", "⸥"), ("⸦", "⸧"), ("⸨", "⸩"), ("〈", "〉"), ("《", "》"), ("「", "」"), ("『", "』"), ("【", "】"), ("〔", "〕"), ("〖", "〗"), ("〘", "〙"), ("〚", "〛"), ("﹙", "﹚"), ("﹛", "﹜"), ("﹝", "﹞"), ("(", ")"), ("[", "]"), ("{", "}"), ("⦅", "⦆"), ("「", "」")) # noqa
Member


There's no need to list all these out. https://docs.python.org/3.6/library/unicodedata.html

Member Author


unicodedata can tell which characters are Pi, but not identify the matching end-quote characters, nor does it seem to have BidiBrackets.txt.
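To illustrate, with only the standard library:

```python
import unicodedata

# unicodedata can classify a character as initial-quote punctuation...
print(unicodedata.category("«"))  # 'Pi'
# ...but it has no mapping from an opening mark to its closing counterpart
# (« -> »), and nothing equivalent to BidiBrackets.txt, so the pairs above
# still have to be listed by hand.
```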

@Kodiologist
Member Author

Kodiologist commented Jul 19, 2017

Okay, #q<foo> and balanced delimiters are in there now.

@gilch
Member

gilch commented Jul 20, 2017

I just realized, this will prevent us from using any other tag macro symbols that start with q.

The obvious fix is to make them use the same rules as other tag macros, with whitespace when it would be ambiguous, so it would be #q X with X, and #q <foo> with foo.

But #q[ still works fine.

@Kodiologist changed the title from "Implement q#C string literals" to "Implement #qC string literals" on Jul 20, 2017
@Kodiologist
Member Author

(Kek, I can't keep my own syntax straight.)

@Kodiologist
Member Author

I just realized, this will prevent us from using any other tag macro symbols that start with q.

Quite so.

The obvious fix is to make them use the same rules as other tag macros, so #q X with X, and #q <foo>. But #q[ still works.

I would rather not allow or require this whitespace. It looks weird and leads to a construct like #q X having non-significant whitespace (where all runs of whitespace are treated the same) on the left of the delimiter (X) and significant whitespace (where each whitespace character is preserved literally) on the right. If we must have tag macros whose names begin with q, how about removing the plain style of q-string (#qXfoobarX) and leaving the balanced style (#q(foobar)) and pointy style (#q<X>foobarX)?

@gilch
Member

gilch commented Jul 20, 2017

Removing the plain style is fine with me. It seems kind of redundant given the balanced style.

The balanced style works fine with #q[, #q(, and #q{, since this does not violate the rules for tag macros. Spaces here would still be allowed, though we could discourage that style for q-strings.

The remaining styles are still a problem though. Naked symbols are currently allowed to contain < and higher Unicode, including your other brackets. #1117 would help with a lot of our symbol concerns, since naked symbols could contain arbitrary strings with escapes.

Do we need to support so many bracket types for the balanced style? Users could be confused about which characters are allowed in tags. Are the three ASCII bracket types enough? If not, would adding « and » suffice? That would add just one more character, «, to what ends an identifier, which wouldn't be too hard to remember.

Is requiring the separation so bad for the pointy style? It's clear what's in the delimiter from the <>, so it's also clear that the separating whitespace isn't part of it. It would usually be used for multi-line strings anyway, so the slight extra length is not a big deal.

@Kodiologist
Member Author

Even if the only change from what I have now is to remove the plain style, the only way you could get your tag macro shadowed is if its first character is q and its second character is < or one of the bracket or quote delimiter characters, which would make a strange name for a macro.

« (and the others) ought to stay identifier characters so long as they have no special meaning outside of #qC. My proposals A and B in #1287 were to use a fixed character for quoting rather than a pick-your-own-quotes operator, but they didn't seem very popular.

@Kodiologist
Member Author

@kirbyfan64 @tuturto Can you guys weigh in on what you would accept?

@tuturto
Contributor

tuturto commented Aug 2, 2017

Out of all options, I like balanced style best.

@Kodiologist
Member Author

@tuturto Yeah, but see the back-and-forth between me and Matthew above. Would you accept dropping the plain style but still not requiring or allowing whitespace after the q?

@tuturto
Contributor

tuturto commented Aug 2, 2017

Plain style can go. I don't have a strong opinion about whitespace, but I'm actually slightly leaning towards requiring it. But for me, either way (whitespace or no whitespace) is ok.

@Kodiologist
Member Author

Okay, thanks. @kirbyfan64?

@Kodiologist
Member Author

Looks like we're Ryanless for this PR. @gilch, would you accept a spaceless balanced style with all the delimiters allowed here? You're proposing a different PR that will prevent calling tag macros whose names begin with _ (#1354), so perhaps you'll also allow restriction of weirder prefixes like q⁅ and q⸌. If not, I'll take your compromise in which #q(, #q[, #q{, and #q« are the only forms of the balanced style.

@gilch
Member

gilch commented Aug 7, 2017

You're proposing a different PR that will prevent calling tag macros whose names begin with _ (#1354)

I did notice that, and I originally thought about implementing it differently, but Clojure also does it that way (I checked). Still, I'd rather not add any more exceptions to what's allowed in tags.

(I think it would be nice if Hy's parser could read EDN. This would make it easier to use a nice exchange format with the Clojure ecosystem. With the addition of the #_ discard syntax, I think it can.)

I've also become very worried about how any new string syntax will interact with our tooling. Hy is currently similar enough to Clojure that we can use Clojure editors on Hy with fairly good results. It's too bad Clojure doesn't have something like this already. If we add something new, it's very important that vim-hy and hy-mode (with Parinfer!) can support it. So I'd rather keep the new syntax simple. The Balanced Style would work in most cases, but for arbitrary strings, we need some kind of custom delimiters. (Can we escape a terminating bracket?) I think we need to discuss it more, and try out some of these proposals with the editors before we put it on master.

@Kodiologist
Member Author

Don't worry, I've gotten myself utterly confused, too.

@refi64
Contributor

refi64 commented Aug 8, 2017

Yeah, this is a mess. I still have no clue what the hell "pointy style" is. How about we start from the top...

My understanding is that the current state of the code is that it looks like #q{delim}STUFF delim, right?

Here's my belief:

Why is the Lua syntax awesome?

For starters, it's dead-simple if you're not using custom delimiters (which is the case most of the time), e.g. #q[[abc]]. On the other hand, the current syntax always requires a delimiter, e.g. #q{EOF} stuff EOF. My stance (partly from using heredocs for a long time) is that:

  • 99% of the time the delimiter will either be EOF or some weird character, where everyone uses a different one
  • There are too many choices. Not only can you pick any delimiter, but you can also pick what delimits the delimiter (e.g. #qAdelimiterA is valid). Really, I don't see any use for using a custom character after the q, since the delimiter can be whatever you want anyway.

In addition, it plays nicely with Python's one way to do it philosophy: it allows just enough flexibility, but most of the time it'll end up being roughly the same thing.

@refi64
Contributor

refi64 commented Aug 8, 2017

Also, I'm not sure what this has to do with list syntax. Isn't everything handled in the lexer anyway?

@Kodiologist
Member Author

Kodiologist commented Aug 8, 2017

I still have no clue what the hell "pointy style" is.

It looks like #q<FOOEY>my stringFOOEY.

My understanding is that the current state of the code is that it looks like #q{delim}STUFF delim, right?

No. The current state of the code allows three styles, none of which allow #q{delim}STUFF delim:

  1. The plain style: #qXmy stringX.
  2. The balanced style: #q{my string {still part of my string}}.
  3. The aforementioned pointy style, #q<FOOEY>my stringFOOEY. It's called the "pointy" style because you have to use ASCII angle brackets in the opening delimiter.

Also, I'm not sure what this has to do with list syntax.

Matthew wants a syntax that does not restrict what tag macros you can call, compared to what's already the case. #[ (for example) currently isn't a valid tag macro call, because [ lexes as the start of a list.

@Kodiologist
Member Author

Kodiologist commented Aug 8, 2017

#[ (for example) currently isn't a valid tag macro call, because [ lexes as the start of a list.

Actually, that's not quite true. The real reason #[ can't lex as a tag macro call is that a tag macro name can only contain identifier characters, and [ isn't an identifier character (the regex that defines an identifier is in fact [^()\[\]{}'"\s;]+).
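Checking that regex against both cases (a sketch, not the lexer itself):

```python
import re

# The identifier pattern quoted above:
identifier = re.compile(r"""[^()\[\]{}'"\s;]+""")

print(identifier.match("q<foo>"))   # matches 'q<foo>' -- '<' is an identifier character
print(identifier.match("[stuff]"))  # None -- '[' is excluded, so #[ can't be a tag macro call
```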

@gilch
Member

gilch commented Aug 8, 2017

@kirbyfan64

What @Kodiologist said.

no clue what the hell "pointy style" is

Like #q <END> string END. But @Kodiologist didn't like the space, so he proposed #q<END>, which interferes with tag macros.

what this has to do with list syntax

That's only if we don't have the #q as the dispatch and use [= instead, which looks like a list and a symbol. So I proposed using #[ instead.


How about we start from the top

So my current best proposal is both

  1. #[ dispatch

#[foo[ <remove 1 newline, if present> string ]foo] where foo is any string that does not contain []. This means both #[[...]] and #[=[...]=] are valid. Like Lua, escapes don't work, since these are raw strings, and it will also strip one newline from the start, if present.

#[[
spam
eggs]]

is just like "spam\neggs".

But unlike Lua, you have to start with #, or it's a list. And also unlike Lua, you're allowed to use a delimiter besides some number of =. This could be parsed into the Hy model as metadata for use with macros. It could also be a hint to editors to use some other highlighting mode when quoting some other language, e.g. #[Python[...]Python] would be highlighted like Python. But most of the time you'd use #[[ or some number of = when nesting like #[===[, if you don't have a better idea for the delimiter. And we could say so in the Style Guide.

And

  2. The « and » balanced style

This is like Proposal A from #1287. It works just like the normal double-quote style, but since it's paired, you can have « and » inside of it as long as it's balanced (or escaped). (And ", of course). It would have the normal string prefixes, like for a raw string.

It's really too bad that, as a historical artifact of typewriters, double quotes aren't a paired set of characters, or we'd already be doing this in Python. But given the #[foo[ style and normal double quotes, Unicode is still not required for developing Hy, so I don't mind adding this as an option.

@Kodiologist
Member Author

I endorse this proposal and will implement it if it gets sufficient traction.

@Kodiologist
Member Author

Kodiologist commented Aug 8, 2017

Although, I think that allowing unescaped nested guillemets is probably a bad idea because (1) it complicates the lexer, (2) it will complicate syntax coloring (which will no longer be possible with non-recursive regexes alone), and (3) you're much less likely to need nested guillemets than you would nested parentheses or square brackets or whatever, and if you do, there's still the Lua style. But if we really need it, at least I've already gotten most of the code down.

@refi64
Contributor

refi64 commented Aug 8, 2017

Same. (TBH I still don't really grasp the use case for proposal 2, but I don't feel like figuring out anything else...)

@Kodiologist
Member Author

@gilch It looks like at least the three of us (you, me, and Ryan) agree on your proposal. Can you just comment on the question of nested guillemets? Then I'll write it up and make a new PR for it.

@gilch
Member

gilch commented Aug 9, 2017

We seem to agree on the Lua strings, at least.

(1) it complicates the lexer,

Not too much though. And it could get better in the future. I wonder what the chances are of that regex library replacing Python's standard-library re someday. We might also consider a pull request for rply to make the lexer handle this kind of thing better.

(2) it will complicate syntax coloring (which will no longer be possible with non-recursive regexes alone),

A very important concern. We do not want to make tooling for Hy difficult to implement. Yes, we'd need the equivalent power of a pushdown automaton. But PDAs are old technology. This is a solved problem. Both Vimscript and Elisp are Turing Complete. They can handle it even if their regexes can't.

Python has been called "executable pseudocode". We could publish the Python algorithm to match these strings in our docs under a public domain license for future tool writers to copy in whatever language they prefer. We could publish the equivalent PCRE recursive "regex" too.

Both Perl and Ruby have balanced-style strings with the same issue. How much of a problem is it for them, really? How does their tooling handle it? Any general-purpose editor or syntax highlighting library that can handle those properly could also handle Hy with an appropriate script. If you're using an editor that can't, you could escape them anyway. (And consider getting a better editor.)

(3) you're much less likely to need nested guillemets

True, but I think paired string delimiters imply this feature. Not having it would be weird. There might be use for it in i18n. It does seem like a lot of problems for a feature we'd rarely use. I'd also be okay with only the Lua strings though.

Guillemets are an interesting choice. If we're using Unicode anyway, we could have used the English-style 66/99 quotes, U+201C and U+201D. But they might be harder to distinguish than guillemets in some fonts. And guillemets are probably easier to type on most systems. Unfortunately, some languages (like German) like to use them backwards, »like this«. The other way is more common, but we could support both by allowing guillemet strings to start with » too. In that case, the nested guillemets would also need to be backwards, or escaped.

I'd like to think about this some more.

@gilch
Member

gilch commented Aug 9, 2017

I think I've got a good alternative. Implementing #1117 would allow us to create HySymbols from arbitrary strings using | <string> |. This is a single delimiter like " is, so we could avoid the need for balancing logic. We could use a tag macro to convert it back to a string at compile time. So, something like #q |foo bar|, or even #q|foo bar| could work, depending on how we set up the lexing for |-quoted symbols. This is still one character shorter than #[[foo bar]], and comparable to the Plain Style. Also, a HySymbol is-a Python string, and will work as one in most contexts, so you often wouldn't even need the tag macro. Just use '|foo bar|, which is even shorter. We could also use a different tag macro to convert it differently, like #b for a bytestring or #f for a format string, etc.
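The is-a-string point can be checked directly; a sketch assuming the hy.models module of that era, with the proposed |...| reader syntax itself still hypothetical:

```python
from hy.models import HySymbol

# A HySymbol built from an arbitrary string already behaves like that string,
# so '|foo bar| would often work where a plain string is expected:
s = HySymbol("foo bar")
print(isinstance(s, str), s.upper())  # True FOO BAR
```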

@Kodiologist
Member Author

Kodiologist commented Aug 9, 2017

Both Perl and Ruby have balanced-style strings with the same issue. How much of a problem is it for them, really? How does their tooling handle it?

I can't speak for Ruby, but the complexity of Perl's quoting forms tends to be behind the corner cases that Perl syntax highlighters have trouble with. That's probably the hardest Perl feature to highlight, except perhaps the magic variables with weird names like $' (and modern Perl should generally use English; so you can use the longer names instead, anyway).

If we're using Unicode anyway we could have used the English-style 66/99 quotes with U+201C and U+201D. But they might be harder to distinguish than the guillemets in some fonts.

Agreed, and hard to distinguish from ASCII double quotes, too.

Unfortunately, some languages (like German) like to use them backwards, »like this«. The other way is more common, but we could support both, by allowing guillemet strings to start with » too.

Oh dear, that sounds like a recipe for insanity. If annoying or confusing people who use guillemets in the opposite direction in their native language is a concern, it's probably better to choose different characters, like and or and .

My first objection to quoting with vertical bars would be: how does the parser tell whether (foo | x |) is a call of foo with one argument, the symbol named " x ", or with three arguments, the first and third of which are the | function (the shadow version of Python's bitwise-OR operator)?

@gilch
Copy link
Member

gilch commented Aug 9, 2017

(foo | x |) would clearly be tokenized as HyExpression([ HySymbol('foo'), HySymbol(' x ')]) under #1117.

This does mean we can no longer write | as our bitwise or. We could certainly spell it out as bit-or (or bor or or-) instead though. We did that with = to setv. (We'd also want to spell out &, ^, and ~ for consistency, which would also free up those characters for other uses.) \| could also work, but I don't think it's as pretty.

@refi64
Contributor

refi64 commented Aug 10, 2017

FWIW is there a reason we can't just use PLY? It works much better with things like this.

@Kodiologist
Member Author

I'd rather not rename a bunch of operators when we could just use other syntax. So, if there are no objections, I'm going to implement Lua style and a guillemet style supporting nesting. We've rather drawn this out, so it will be nice to conclude it.

@gilch
Member

gilch commented Aug 10, 2017

I'd rather not rename a bunch of operators when we could just use other syntax.

We'd have to do that for #1117 anyway, which is important for a lot of other issues. I'd rather not have three kinds of string literals when two will do. A change in grammar is a much bigger deal than renaming some core operators. And it's confusing that bitwise-not is the same symbol as unquote, so I want to change that one anyway. For example,

`(~(foo))

Does the above immediately call foo and unquote its result, or put (foo) in the expansion and bitwise-negate it? Obviously the REPL can tell you. But `(-(foo)) would negate it. That's inconsistent.

Freeing up & from bitwise-and would allow us to use it in function calls. So we could use the shorter Clojure style (foo [& args]...) instead of (foo [&rest args]).

And if we free up | from bitwise-or, we can have both arbitrary-string symbols and a concise non-double-quote string syntax with only one grammar change instead of two. This would make the guillemet style redundant.

[Edit: and ^. That's only four. This would give us the Clojure metadata syntax, which we could use to implement Python's annotations. #640 #656, among other uses. And maybe even Cython #934.]

We can leave the other bitwise operators alone.

@gilch
Member

gilch commented Aug 10, 2017

So for this PR, it's fine if it's just the #[foo[ Lua style. While a little more verbose than we might like in some cases, that's good enough until we implement #1117 with a #q tag macro to convert the symbol name to a string in another PR. But if you want to do #1117 here too, that's fine.

@Kodiologist
Member Author

A change in grammar is a much bigger deal than renaming some core operators.

The grammar has to change in any case in order to implement a new form of symbol quoting or string literal or whatever.

Wouldn't it be better to quote with some syntax that doesn't require changing an operator from the Python name, like \ … \ or #( … ) or #` … ` or « … »?

And seeing as people are going to use funny characters in strings more often than in symbols, shouldn't we have the default for the syntax be a string rather than a symbol?

@Kodiologist
Member Author

Related to that last point, easier entry of symbols with weird names doesn't actually need a new quoting syntax; it would suffice to add another prefix to string literals that makes the result a symbol instead of a plain string.

@Kodiologist
Member Author

FWIW is there a reason we can't just use PLY?

I don't know, not having used it, but rewriting the lexer and parser is out of the scope of this PR.

@gilch
Member

gilch commented Aug 10, 2017

FWIW is there a reason we can't just use PLY?
rewriting the lexer and parser is out of the scope of this PR.

@kirbyfan64 PLY has a compatible BSD license, so I don't know of a reason. In what way is it better though? Can it do this kind of nested parsing we've discussed for the balanced styles any better without adding a new regex engine dependency? If so, and if we settle on a balanced style, it might be worth it. But @Kodiologist pointed out that any balanced style would complicate our tooling.

@gilch
Member

gilch commented Aug 10, 2017

The grammar has to change in any case in order to implement a new form of symbol quoting or string literal or whatever.

But it would be a less complex grammar with only the two string literals and the arbitrary symbol syntax than with three string literal types and the arbitrary symbol syntax.

Wouldn't it be better to quote with some syntax that doesn't require changing an operator from the Python name, like \ … \ or #( … ) or #` … ` or « … »?

I actually think the | and \ syntax is better for symbols. Freeing up those four names from the operators would help with other issues. But if the others also feel very strongly that we must not rename our bitwise-or, then || symbols are not an option. And we'd have to come up with some other syntax for #1117. It's not like Hy has infix notation. Bitwise operators aren't even used that much in Python. But if you are prone to bit-bashing, you'd have to use parentheses and indentation to make the expression readable. So I don't think slightly longer names are a big deal.

I'd rather keep \ as an escape character for symbols even if we don't use the | quoting, like in Emacs Lisp. So \...\ is out.

#(...) means xi in Clojure. Using it for balanced strings might confuse the Clojure users, but I think it's actually not an unreasonable choice. It's only one character longer than "", without introducing Unicode syntax. It also has a nice parallel with #[[]]. If we wanted a prefix version of xi for the Clojure people, something like #%(f %1 %2 ... %&) would work about as well instead. We also considered it for genexprs #867, which isn't xi either. They could use a slightly longer tag, but it wasn't clear what, back when we only had one-character sharp macros. Maybe #for would work. Balanced strings do have the aforementioned problem with tooling, but if we are doing paired delimiters, I want them balanced. I'm not sure if I like this better than the guillemets or not (or both?), but I'm leaning towards #(...). It's fine to allow the users to use Unicode, but I'd rather not add any to Hy itself.

We've already discussed A and B, but one of them is paired and the other isn't.

And seeing as people are going to use funny characters in strings more often than in symbols, shouldn't we have the default for the syntax be a string rather than a symbol?

No, because we want to be able to use funny characters in tag macros. We've already got the super-short "" notation for strings with an overhead of just two characters. We'd almost always just use that. And we'll be adding #[[]] too for crazy raw strings, which are probably not short.

It feels like we're quibbling over saving one character. Compare the overhead of Lua #[[...]] (5) to the overhead of Plain #qX...X (4), which you thought was good enough before (if barely?). Note that tagged symbol #q|...| is also 4.

We could perhaps have r|foo bar| be the same as r"foo bar". But that would be ambiguous with #tag-r|foo bar|. Is that the tag tag-r applied to the symbol |foo bar|, or the tag tag- applied to the string r"foo bar"? If we always assumed a separation before the | then it's unambiguous. And we could do the #q|foo bar| tagged symbol strings.

I'm still not getting the use case where neither "" nor #[[]] is good enough, and neither was @kirbyfan64. Unicode is hard to type on most systems, so it's not like we're saving keystrokes. And an Emacs binding could insert #[[ just as easily as «. Guillemets sure are prettier, but you can also have Emacs abbreviate things with Unicode for your reading pleasure, like λ for fn or xᵢ for xi. Why not « for #[[ and » for ]]? Can we do that last one to the Lua-Style strings without also doing it to nested lists? If not, something like #[;[ ];] should be unambiguous. This way you can read Unicode, but still be compatible with ASCII tools and files.

So, compared to Lua Style, it's not easier to read (on Emacs). It's not easier to write (unless you have a foreign keyboard, and even they require AltGr). It's not easier to implement tooling for. Many terminals can't print Unicode, so it's not very good for the command line either, and Unicode is still hard to type. I could maybe see #(...) being better here, but barely. It's only two characters shorter.

@Kodiologist
Member Author

For what it's worth, I use the bitwise operators much more often for vectorized logic with NumPy or Pandas objects than for actual bitwise logic.

It feels like we're quibbling over saving one character.

Yeah, we've been bikeshedding mercilessly about this since the beginning. Somebody's gotta give in order for this to end. So, I give. Would you accept a PR that adds the Lua style (without nesting of the customized delimiter) and doesn't do anything else?

@gilch
Member

gilch commented Aug 10, 2017

Yeah, we've been bikeshedding mercilessly about this since the beginning.

I didn't see the discussion as trivial. The grammar is something we should try to keep simple and understandable. Changes here should be carefully thought out in the context of the whole language instead of blindly accepting the first thing that seems like a good idea. I felt like the options have been improving because of the discussion. I saw good points that I hadn't thought of on my own.

Would you accept a PR that adds the Lua style (without nesting of the customized delimiter) and doesn't do anything else?

A single change could probably get approved faster than two.

Which version exactly are you proposing? I'd accept the version like #[foo[ (remove one newline, if present) string body ]foo], where foo is any string not containing [ nor ], including the empty string, and the foo is available as metadata in the HyString model for use with macros, etc.

This is not a balanced style, so a pure FSM regex engine could highlight it, e.g. #[foo[ #[foo[ bar ]foo] baz ]foo] would end at the first matching ]foo] and be equivalent to the string " #[foo[ bar ". The remaining baz ]foo] wouldn't get highlighted and might be a syntax error depending on context. No need for a PDA. If you want to nest these, you have to pick a different foo, usually just by adding another =.
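A sketch of that matching with the stdlib re; the backreference to the custom delimiter goes slightly beyond a strict FSM, but no recursion or pushdown machinery is needed:

```python
import re

lua_string = re.compile(r'#\[([^\[\]]*)\[(.*?)\]\1\]', re.DOTALL)
m = lua_string.match('#[foo[ #[foo[ bar ]foo] baz ]foo]')
print(m.group(1))        # 'foo' -- the custom delimiter, available as metadata
print(repr(m.group(2)))  # ' #[foo[ bar ' -- the match ends at the first ]foo]
```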

@Kodiologist
Member Author

Which version exactly are you proposing? I'd accept the version like #[foo[ (remove one newline, if present) string body ]foo], where foo is any string not containing [ nor ], including the empty string, and the foo is available as metadata in the HyString model.

Yeah, that one.
