Parsing quirks #149

Closed
gilch opened this issue Feb 12, 2022 · 11 comments · Fixed by #228 or #233
Labels
enhancement New feature or request
Milestone

Comments

@gilch
Owner

gilch commented Feb 12, 2022

\\. should be an alias of QzBSOL_., i.e. a module literal. Instead it's read as QzFULLxSTOP_. Not sure how that happened, but full stops are handled separately, and this seems wrong.

It is currently possible for a reader macro to start with : e.g. (defmacro :QzHASH_ (..., which doesn't seem like the ideal way to say it. This ends up adding a non-identifier string to the _macro_ namespace, which prevents attribute-access symbols like _macro_.:QzHASH_ from working. I'm not sure what \:\# should be. Making it :QzHASH_ would work for the defmacro, but it doesn't make a lot of sense. Maybe the \: should suppress the control word interpretation, making it QzCOLON_QzHASH_? But then the reader macro usage would have to be spelled \:#foo, which isn't as nice. Should control words be disallowed as reader macro names? Should they always be treated like the first character is escaped? Maybe just colons? I will have to think about this some more.
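
To make the identifier problem concrete, here is a rough Python sketch of the munging involved. `toy_munge` and its `SHORT_NAMES` table are hypothetical stand-ins written for this thread, not Hissp's actual munger; they only reproduce the munged names quoted above.

```python
import unicodedata

# Assumption: a hand-written table for the short names quoted in this thread.
# Any other character falls back to its Unicode name with spaces turned into "x".
SHORT_NAMES = {"#": "HASH", "\\": "BSOL"}


def toy_munge_char(ch: str) -> str:
    """Encode one non-identifier character as a Qz..._ escape."""
    name = SHORT_NAMES.get(ch) or unicodedata.name(ch).replace(" ", "x")
    return f"Qz{name}_"


def toy_munge(text: str) -> str:
    """Replace every character that cannot appear in a Python identifier."""
    return "".join(
        ch if ("_" + ch).isidentifier() else toy_munge_char(ch) for ch in text
    )


# What the defmacro currently adds: the colon survives unmunged.
colon_prefixed = ":" + toy_munge("#")
# What you would get if \: suppressed the control word reading.
escaped_colon = toy_munge(":#")

print(colon_prefixed, colon_prefixed.isidentifier())
print(escaped_colon, escaped_colon.isidentifier())
```

The first name keeps its raw colon, so it can never be reached through attribute access; the second is an ordinary identifier.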

Reader macros with extras could be more compact if symbols weren't allowed to contain !. Then you could say foo#!1!2!3 bar instead of foo# !1 !2 !3 bar or foo#!!! 1 2 3 bar. You could still escape them, like any other character. But there are conventions where mutating/dangerous functions end in a !, and now they'd have to end in a \!, which isn't as nice. Maybe there's some way to avoid that. Extra could maybe use some other character instead of !. Lisp's symbols can contain more characters than Python, but that also means you need spaces to separate things in situations Python wouldn't. I'm not really satisfied with the whole Extra system, but I don't have a better idea yet.
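
A small Python sketch of what the compact spelling would imply for a reader. The function `split_compact_extras` and its splitting rule are invented here for illustration only; nothing like it exists in the Lissp reader as described in this thread.

```python
def split_compact_extras(token: str) -> tuple[str, list[str]]:
    """Split a hypothetical compact tag like 'foo#!1!2!3' into tag and extras.

    Assumes '!' can never be part of a symbol, which is the rule under discussion.
    """
    tag, hash_sign, rest = token.partition("#")
    if not hash_sign:
        raise ValueError(f"not a reader macro token: {token!r}")
    # Everything after the '#' is a run of '!'-prefixed extras.
    extras = rest.split("!")[1:] if rest.startswith("!") else []
    return tag, extras


print(split_compact_extras("foo#!1!2!3"))
# A name that conventionally ends in '!' no longer survives as one extra.
print(split_compact_extras("foo#!set!"))
```

The second call shows the cost: a trailing `!` in a name is taken as the start of another (empty) extra, so such names would need the `\!` escape.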

@gilch
Owner Author

gilch commented Feb 12, 2022

If we get rid of collection atoms, #130, that frees up the []{} characters for other things. We could write extras in square brackets, like foo#[1 2 3]bar, foo[1 2 3]#bar, or even foo[1 2 3]bar, in which case, non-extra macros might be written foo[]bar? I'm not sure how easy this is to parse yet, or how it would interact with tooling like Emacs or Parinfer.

A comment string now turns out like this:

(exec
  <<#[
  ;for i in 'abc':
  ;    for j in 'xyz':
  ;        print(i+j, end=" ")
  ;print('.')
  ;
  ]#"\n")

The single leading ; is bad though. Emacs would indent them to the margin. (Which would still work though.) You could avoid that with ;;. The macro could easily strip the first character of each line:

(exec
  <<#[
  ;;for i in 'abc':
  ;;    for j in 'xyz':
  ;;        print(i+j, end=" ")
  ;;print('.')
  ;;
  ]#"\n")

But now Parinfer wouldn't like the brackets here and would have to indent them at least this much:

(exec
  <<#[
      ;;for i in 'abc':
      ;;    for j in 'xyz':
      ;;        print(i+j, end=" ")
      ;;print('.')
      ;;
      #_/]#"\n")

It wouldn't care about the comments' indentation, but if we want them aligned with the other elements, that's where they go. Parinfer also wouldn't allow a closing bracket to start a line like this, so it needs the final discarded item. Square brackets are maybe nice for inline extras, but this seems worse if they span multiple lines.
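
The stripping step mentioned above (dropping the leading comment marker from each line, then joining on the `"\n"` extra) could look roughly like this in Python. `join_comment_lines` is a hypothetical helper, not the actual `<<#` implementation, and it assumes the `;;` spelling.

```python
def join_comment_lines(block: str, sep: str = "\n", marker: str = ";;") -> str:
    """Strip indentation and the comment marker from each line, then join."""
    stripped = []
    for line in block.splitlines():
        line = line.lstrip()  # indentation before the marker is not significant
        if not line.startswith(marker):
            continue  # skip anything that is not a comment line
        stripped.append(line[len(marker):])
    return sep.join(stripped)


BLOCK = """\
      ;;for i in 'abc':
      ;;    for j in 'xyz':
      ;;        print(i+j, end=" ")
      ;;print('.')
      ;;
"""

source = join_comment_lines(BLOCK)
exec(source)
```

Because only the text after the marker is kept, the result is the same whether the comments sit at the margin or are indented to satisfy Parinfer.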

@gilch
Owner Author

gilch commented Feb 16, 2022

# is now the macro for sets. But _macro_.# is still read as a reader macro. _macro_.\# works, but probably reader macro names should not be allowed to end in a dot.

@gilch
Owner Author

gilch commented Mar 13, 2022

## should probably be a single-character reader macro. It's currently a symbol. \## works though.

@gilch
Owner Author

gilch commented Apr 25, 2022

. is a module handle. For the empty-named module. Which is weird. That would totally be a syntax error in a Python import statement. Empty is not a valid identifier, and can't be munged to one. This should probably just be a symbol: QzFULLxSTOP_.

@gilch
Owner Author

gilch commented Apr 25, 2022

.. is a SyntaxError. ... is Ellipsis. Four or more is likewise a syntax error. This is related to . being the empty-named module. Maybe these errors should be symbols too?
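
For reference, the Python side of this can be checked directly. The sketch below only asks whether Python itself accepts each run of dots as an expression; that Lissp's errors arise for the same reason is an assumption, and `compiles_as_expression` is a helper written for this example.

```python
def compiles_as_expression(source: str) -> bool:
    """Return True if Python accepts `source` as a single expression."""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


for dots in (".", "..", "...", "...."):
    print(repr(dots), compiles_as_expression(dots))

# The empty module name has no identifier spelling to munge to.
print("".isidentifier())
```

Only the three-dot case compiles, as `Ellipsis`.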

@gilch
Owner Author

gilch commented Sep 7, 2022

[# is now a macro for the subscript operation, particularly good for slices. So that's kind of claimed again.

@gilch gilch mentioned this issue Sep 22, 2022
@gilch
Owner Author

gilch commented Apr 12, 2023

The EDN Hissps have claimed . to represent :, since that's not allowed in EDN, and Hissp needs it.

The X# series is often combined with inject to turn a Python expression into a function, e.g. XY#.#"Y-X". It would be slightly nicer to do XY.# instead. This saves all of one character in Lissp, which is why I didn't bother, but in the EDN Hissps, where inject currently has a longer spelling (#XY #hissp/."Y-X"), this is a much bigger win: #XY."Y-X".

The dot is the natural choice here (though not the only one), but it would have to munge for the attr in _macro_ to be a valid identifier. It can currently be defined using X\.\#, but the tag still has to be spelled X\.#, negating the 1-character saving, and probably requiring the full spelling of the munged name in EDN Hissps. The munger needs to leave dots alone for attr access and module handles, so tags with a trailing (or leading?) dot would have to be special-cased in the respective readers.

@gilch gilch added the enhancement New feature or request label May 25, 2023
@gilch gilch added this to the 0.4 milestone May 26, 2023
@gilch
Owner Author

gilch commented Jun 16, 2023

I'm noticing that the desired interpretation of XY.# as an XYQzFULLxSTOP_ tag and of _macro_.# as a _macro_.QzHASH_ attr are incompatible. Of course, XY\.# and _macro_.\# would disambiguate, but one or the other shouldn't need the escape.

Tags have no particular need for a leading dot either, so .XY# should maybe be allowed (as a QzFULLxSTOP_XY tag), freeing up the trailing dot for the attr interpretation. Are dots even allowed in EDN tags though? If so, where? I was paying attention to those details when writing garden-of-edn, but I need to check that again.
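
The conflict between the two readings can be sketched in a few lines of Python. The `MUNGED` table and the `as_tag` and `as_attr` functions are hypothetical, written only to mirror the munged names quoted in this thread; the real readers do not work this way.

```python
# Assumption: just the two munged names discussed in this comment.
MUNGED = {".": "QzFULLxSTOP_", "#": "QzHASH_"}


def munge(text: str) -> str:
    return "".join(MUNGED.get(ch, ch) for ch in text)


def as_tag(token: str) -> str:
    """Read 'NAME#' as a tag: the whole name, trailing dot included, is munged."""
    return munge(token[:-1])


def as_attr(token: str) -> str:
    """Read 'OWNER.ATTR' as attribute access: only the part after the last dot is munged."""
    owner, _, attr = token.rpartition(".")
    return f"{owner}.{munge(attr)}"


for token in ("XY.#", "_macro_.#"):
    print(token, "->", as_tag(token), "or", as_attr(token))
```

Both tokens have the same shape (a name, a dot, then `#`), so a reader that picks one rule for the trailing dot gets the other token wrong.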

@gilch
Owner Author

gilch commented Jun 17, 2023

Looks like EDN tags can't start with a dot (must be alphabetic after the #), but the part after the prefix/ (the name) could, as long as the next character is not a digit.
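
As a rough check of that reading, here is a Python sketch. `plausible_edn_tag` is a hypothetical helper that encodes only the first-character rules summarized above; it is not a full EDN validator and does not check which characters may appear later in the symbol.

```python
def plausible_edn_tag(tag: str) -> bool:
    """Check the first-character rules for the text that follows '#'."""
    if not tag or not tag[0].isalpha():
        return False  # a tag must start with an alphabetic character
    prefix, slash, name = tag.partition("/")
    if not slash:
        return True  # no prefix/name split, nothing more to check here
    if not name or name[0].isdigit():
        return False
    # A leading '.', '+' or '-' in the name must not be followed by a digit.
    if name[0] in ".+-" and name[1:2].isdigit():
        return False
    return True


for candidate in ("XY", ".XY", "hissp/.", "hissp/.XY", "hissp/.1"):
    print(candidate, plausible_edn_tag(candidate))
```

Under these rules a leading dot is only available after a `prefix/`, which fits the existing `#hissp/.` spelling.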

@gilch gilch mentioned this issue Jun 17, 2023
@gilch
Owner Author

gilch commented Jun 25, 2023

Found another one:

#> (.hex : self 42.0)
>>> self=(42.0).hex()

This happens to compile to a valid assignment, which is interesting. Only at the top level though. It's a little too situational to be very useful and it's nonsense when nested. Not worth keeping. But is it worth fixing? The fix would probably be to make it an error, but it's already that (except at the top level, where it's not, but not useful either).

@gilch gilch reopened this Jun 25, 2023
@gilch
Owner Author

gilch commented Jun 25, 2023

Actually, it does work nested. Sometimes.

>>> print(
...   end=(42.0).hex())
0x1.5000000000000p+5#>

Still seems too situational to be useful. I still kind of think this should be an error, and seeing how it's not just a top-level problem makes me more inclined to fix it.
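
Why it only works "sometimes" can be seen from how Python parses the emitted text in different positions. This sketch looks only at the Python output shown above; the `classify` helper is hypothetical, and how the Lissp compiler decides where to place that text is not covered here.

```python
import ast


def classify(source: str) -> str:
    """Report how Python parses `source`, or 'SyntaxError' if it does not."""
    try:
        node = ast.parse(source).body[0]
    except SyntaxError:
        return "SyntaxError"
    if isinstance(node, ast.Assign):
        return "assignment"
    if (
        isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Call)
        and node.value.keywords
    ):
        return "call with keyword argument"
    return type(node).__name__


top_level = classify("self=(42.0).hex()")
in_call = classify("print(\n  end=(42.0).hex())")
in_list = classify("[self=(42.0).hex()]")

print(top_level, in_call, in_list, sep=" | ")
```

The same `name=value` text is a statement at the top level, a keyword argument directly inside a call, and a syntax error elsewhere, such as inside a list display.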
