Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASR: change how a symbol is printed #1510

Open
certik opened this issue Feb 7, 2023 · 9 comments
Open

ASR: change how a symbol is printed #1510

certik opened this issue Feb 7, 2023 · 9 comments
Labels
asr ASR related changes dcr design change request

Comments

@certik
Copy link
Contributor

certik commented Feb 7, 2023

Currently a symbol (e.g. in Var(symbol v)) is printed as 2 x, meaning it's the symbol table ID 2, variable "x" from that table. For example, (Var 2 x).

This has issues with parsing where you have to parse two items for symbol instead of just one, and also things look inconsistent, is it a symbol (owning) or a pointer to a symbol (non-owning)?

The best proposed idea how to fix this is to introduce:

symref = SymbolRef(int symtab_id, identifier symnym)

expr =
    ...
    Var(symref v)

And then the current (Var 2 x) becomes (Var (SymbolRef 2 x)).

See #1492 (comment) for more details.

This cleans up the abstract representation of ASR, now symbol is always owning, and SymbolRef is always non-owning.

Important: we need to preserve the current speed, so in C++ the SymbolRef would actually always be replaced with just a pointer to the symbol, so the generated C++ code would be equivalent to what it is today, so no slow down. But conceptually, the pointer to symbol x is the same as (SymbolRef 2 x). In languages that do not have pointers, such while printing or serializing (or Clojure), one can use (SymbolRef 2 x) directly.

Other ideas, just for printing:

  • (Var (2 x))
  • (Var {2 x})

But it seems our experience shows that it's better to use longer, descriptive names in ASR printouts and not to use shortcuts, so that it's more obvious what the printout represents to newcomers. The ASR Clojure like printout is NOT used for performance parts of the code. For that we use binary representation where we eventually implement all tricks possible to keep the binary representation as compact as possible, or alternatively to deserialize it as quickly as possible.

Related issues:

@certik certik added asr ASR related changes bug Something isn't working labels Feb 7, 2023
@rebcabin
Copy link
Contributor

rebcabin commented Feb 7, 2023

I am not sure we need both Var and SymbolRef. Perhaps one of the two will do in all cases, e.g., FunctionCall, which has naked pairs like 2 pow 3 foo, and ExplicitDeallocate, which has an explicit vector of symrefs, e.g., [2 x 3 y] >~~should-be~~> [(SymbolRef 2 x) (SymbolRef 3 y)].

If we need both, I prefer (Var 2 x) and (SymbolRef 3 y) to (Var (SymbolRef 2 x)). The extra level of nesting seems superfluous and wasteful.

If we can get away with one of the two, why not do that?

@rebcabin
Copy link
Contributor

rebcabin commented Feb 7, 2023

Clojure will interpret (Var {2 x}) as a hash map with 2 as key and x as identifier. It's OK, but I don't see why for such a small data item.

@rebcabin rebcabin added dcr design change request and removed bug Something isn't working labels Feb 7, 2023
@certik
Copy link
Contributor Author

certik commented Feb 7, 2023

symbol is currently used in quite a few nodes. If we want to use Var, we need to replace symbol with expr, and enforce in verify() that the expr is an actual Var and no other expr. This is a limitation of ASDL.

We can rename Var to SymbolRef, but that I think does not change anything in the structure and by itself does not solve anything.

@rebcabin
Copy link
Contributor

rebcabin commented Feb 7, 2023 via email

@rebcabin
Copy link
Contributor

rebcabin commented Feb 7, 2023

Let me try for a little more clarity with some point-like questions:

  1. naked streams of pairs like 2 x 3 y () -- with () representing a non-existent pair -- and it's not a pair to boot! -- are problematic for humans to read, programs to process, to parse, ASDL or EDN to specify, and so on. They're not even tupled! Shall we agree not to use them?
  2. streams of pairs in vector brackets, like [2 x 3 y nil nil] are better, but still not great because we don't, prima-facie, know what they represent, and we sill have to count values inside the square brackets to check for even-ness and other possible constraints like "at-least-one" and "no more than three."
  3. (Var 2 x) and [(Var 2 x) (Var 3 y) NullVar] seem like good solutions to me. Now, the pairs are tagged and typed (with head Var) and tupled (in round brackets, so we don't have to count and partition a stream of naked values) and we have an explicit bottom value for the Var type.
  4. Ditto (SymbolRef 2 x) and [(SymbolRef 2 x) (SymbolRef 3 y) NullSymbolRef].
  5. I don't see any good coming out of (Var [2 x]), (Var (SymbolRef 2 x)), (Var (2 x)), (Var {2 x}), and the like. What's the point?
  6. I don't understand the difference between (Var 2 x) and (SymbolRef 2 x). Except for the type-tag, Var and SymbolRef respectively, they have the same structure. What are they used for? Can Var be used any particular places where SymbolRef is used, and vice versa? How about all places, i.e., are Var and SymbolRef equivalent? If so, are they redundant? If not, what is the difference?

@rebcabin
Copy link
Contributor

rebcabin commented Feb 7, 2023

Here is an example illustrating ambiguity with naked streams. These are two FunctionCalls from tests/expr7.py:

;; (FunctionCall
;;  1            \____ should be (SymbolRef 1 test_pow_1)
;;  test_pow_1   /
;;  ()           >---- should be NullSymbolRef (new symconst)
;;  [((IntegerConstant 1 (Integer 4 [])))   ; call_args
;;   ((IntegerConstant 2 (Integer 4 [])))]
;;  (Integer 4 [])
;;  ()
;;  ())


;; (FunctionCall _____________________
;;  2                                 \____ should be SymbolRef(...)
;;  pow/__lpython_overloaded_0__pow __/
;;  2     \____ should be (SymbolRef 2 pow)
;;  pow __/
;;  [((IntegerConstant 2 (Integer 4 [])))
;;   ((IntegerConstant 2 (Integer 4 [])))]
;;  (Real 8 [])
;;  (RealConstant 4.0 (Real 8 []))
;;  ())

The (tricky, non-robust) code for processing (working-around) these examples is around line 476 in rebcabin/ClojureProjects002@2b92b43#diff-1a0a9fc9fb7f9291d27c713a3b60f54e75050e723a463da5dd9df82a20a68080R456

@aadityasinha-dotcom
Copy link

Hey, is this issue available? Can I work on this issue?

@aadityasinha-dotcom
Copy link

??

@certik
Copy link
Contributor Author

certik commented Feb 24, 2023

@aadityasinha-dotcom you can work on any issue you want, simply submit a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asr ASR related changes dcr design change request
Projects
None yet
Development

No branches or pull requests

3 participants