Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chocking on interpolation with interpolation #249

Open
akimd opened this issue May 7, 2021 · 22 comments
Open

Chocking on interpolation with interpolation #249

akimd opened this issue May 7, 2021 · 22 comments
Labels

Comments

@akimd
Copy link

akimd commented May 7, 2021

Hi Markus!

Weird things happen when there are interpolations within interpolations.

$ unparser -ve '"#{"#{true ? "1
" : "2"}
"}"'
(string)
Original-Source:
"#{"#{true ? "1
" : "2"}
"}"
Generated-Source:
"#{<<-HEREDOC}"
Original-Node:
(dstr
  (begin
    (dstr
      (begin
        (if
          (true)
          (str "1\n")
          (str "2")))
      (str "\n"))))
Generated-Node:
#<Parser::SyntaxError: unexpected token tLSHFT>
/opt/local/lib/ruby3.0/gems/3.0.0/gems/parser-3.0.1.0/lib/parser/diagnostic/engine.rb:72:in `process'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/parser-3.0.1.0/lib/parser/base.rb:286:in `on_error'
(eval):3:in `_racc_do_parse_c'
(eval):3:in `do_parse'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/parser-3.0.1.0/lib/parser/base.rb:190:in `parse'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser.rb:116:in `block in parse_either'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/either.rb:34:in `wrap_error'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser.rb:115:in `parse_either'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/either.rb:115:in `bind'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/validation.rb:63:in `from_string'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:40:in `validation'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:143:in `public_send'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:143:in `process_target'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:133:in `block in exit_status'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:132:in `each'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:132:in `exit_status'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/lib/unparser/cli.rb:64:in `run'
/opt/local/lib/ruby3.0/gems/3.0.0/gems/unparser-0.6.0/bin/unparser:10:in `<top (required)>'
/opt/local/bin/unparser:23:in `load'
/opt/local/bin/unparser:23:in `<main>'
Error: (string)

I agree no human being would write such code. But our generator does :(

Cheers!

@mbj
Copy link
Owner

mbj commented May 7, 2021

2 issues play in here:

First:

There are dynamic strings that produce AST that can only be the result of parsing heredocs. And the signature for these ASTs of "must have been a heredoc and can only be produced from a heredoc" are almost random.

Unparser has some heuristics to trigger heredoc emits, and sometimes (which happened here): We end up with triggering heredocs where we should not have. But somtimes emitting more heredocs makes the emitter code easier, while still producing the same AST on parsing.

Second:

Heredocs need 'parallel' dispatch, as they have a very dislocal effect on the token stream. Its likely that one of the parrallel all sides is missing hence not getting you the body of the heredoc, only the header.

Thanks for reporting: Right now, I'm super busy commercially, assume this one takes a very long time to fix.

@mbj mbj added the bug label May 7, 2021
@akimd
Copy link
Author

akimd commented May 7, 2021

Markus,
In trying to find something to reproduce our failure, I may well have written something actually more complex than our original use case. And so far, we can't reproduce the original issue, so we might just as well have used some older version of unparser somewhere.
In other words: we are not blocked by this issue, and AFAIC you are welcome to let it rust.
Cheers!

@mbj
Copy link
Owner

mbj commented May 7, 2021

@akimd its my end goal that unparser can round trip ALL ast that parser emits. Which may mean I'll have to ask the parser team to simplfy the ast.

I'm close to ask them to give me a dedicated node for heredocs in the original source, this would make my part so much easier.

@mbj
Copy link
Owner

mbj commented Apr 22, 2022

@akimd I recently spend a bit more time to think about the problem. Overall I've to conclude that the current parser AST, on any form of dynamic strings is currently lossy.

Not lossy for execution semantics, but lossy in terms of round trip semantics. Meaning you cloud certainly implement a ruby on top of that AST that executes correctly, but you cannot implement an unparser that produces source that produces the same AST, as critical information is lost/noisy.

I've got a local AST builder that subclasses the offiical parser builder producing an AST that does not suffer this compression losses.

Now the question is: Whats your use case? Do you round trip hand written ruby, (That may include heredocs / dynamic strings)? Or do you generate an AST from other sources to than unparse with unparser?

@akimd
Copy link
Author

akimd commented Apr 23, 2022

Hey Markus,
Oh, you have a new face :)

Unparser is the final part of a parse|transform|unparse pipe. We don't care about exact syntactic round tripping, we just care about the (execution) semantics.

Cheers!

@mbj
Copy link
Owner

mbj commented Apr 23, 2022

Oh, you have a new face :)

ageless version ;)

Unparser is the final part of a parse|transform|unparse pipe. We don't care about exact syntactic round tripping, we just care about the (execution) semantic

You are using Unparser.parse, if not would you consider using that one instead of Parser::CurrencyRuby.parse ?

@akimd
Copy link
Author

akimd commented Apr 24, 2022

Markus,
I have been told that since we have something that works (and works around that particular issue), we won't change it again.
So I guess we'll stick to Parser's parser.

@mbj
Copy link
Owner

mbj commented Apr 24, 2022

@akimd Unparser.parse "is" the parsers parser, but it uses the modernized AST format. Thats all I want to know.

@akimd
Copy link
Author

akimd commented Apr 29, 2022

Markus,
Yes, I know, that was very clear. Yet this stuff is in production, and the guy in charge of that does not want to change it now.
Maybe in the future this will be revisited, but not now. Sorry!

@mbj
Copy link
Owner

mbj commented Apr 29, 2022

So are you using the modernized AST Format or not?

@akimd
Copy link
Author

akimd commented May 5, 2022

Hi Markus,
No, we are still using the "native" AST from parser itself.

@mbj
Copy link
Owner

mbj commented May 5, 2022

kk, so as a recomemndation: The parsers default AST format is lossy on semantics, and unparser only supports the modern one for that reason.

Also all other major users of parser opt into the modern format for that reason, and it can very well be that you have semantic issues as you process the default format.

@mbj
Copy link
Owner

mbj commented May 5, 2022

Also a fix for this issue will probably only ever land in the modern AST format.

@akimd
Copy link
Author

akimd commented May 9, 2022

I perfectly understand all these points, but the decision is not mine.  I'm just a proxy here.

@mbj
Copy link
Owner

mbj commented Jan 6, 2023

@akimd I'd like to emphasize here that the "default parser AST" is often wrong when it comes to execution semantics! If the semantics of the executed ruby are important: Its very much recommended to opt into modern AST format.

@akimd
Copy link
Author

akimd commented Jan 7, 2023

Hi Markus,
Happy new year!
Thanks for the heads up. I will again forward your recommendation, and will tell you if we moved to the modernize'd AST format.
FWIW, thanks to your message, I realized that in the code I maintain, I was lagging on several emit_*, which I have now enabled.
Cheers!

@vidarh
Copy link
Contributor

vidarh commented Oct 13, 2023

@akimd Is this still affecting you? I can't promise anything, but given what I ran into was a very similar problem (see the mention above), and I found a solution that worked for that case, I might see if I can get a chance to look into this one too. I reduced the error case at the start of this ticket to this (the ternary doesn't appear to make a difference at all):

"#{"#{}\n"}"

Which still produces this (after applying my fix from the other ticket):

Generated-Source:
"#{<<-HEREDOC}"
Original-Node:
(dstr
  (begin
    (dstr
      (begin)
      (str "\n"))))
Generated-Node:
#<Parser::SyntaxError: unexpected token tLSHFT>

That puts the location of the missing heredoc output somewhere in a relatively small number of places.

@akimd
Copy link
Author

akimd commented Oct 15, 2023

Hi @vidarh,
Thanks for the link!

I'm not in charge of the piece of code where this first happened (and I don't have access to it). So by necessity workarounds must have been installed, and I don't know which.

IOW, we no longer depend on this.

Cheers!

mbj added a commit that referenced this issue May 28, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
@mbj
Copy link
Owner

mbj commented May 28, 2024

@akimd I found a fix for this issue (in fact the entire dstr saga): #366

That branch does not pass CI at the time of me posting this comment (still working through mutation coverage)but check this out (run from that branch):

bundle exec unparser --verbose -e '"#{"#{true ? "1
" : "2"}
"}"'
(string)
Original-Source:
"#{"#{true ? "1
" : "2"}
"}"
Generated-Source:
"#{"#{if true
  "1\n"
else
  "2"
end}\n"}"
Original-Node:
(dstr
  (begin
    (dstr
      (begin
        (if
          (true)
          (str "1\n")
          (str "2")))
      (str "\n"))))
Generated-Node:
(dstr
  (begin
    (dstr
      (begin
        (if
          (true)
          (str "1\n")
          (str "2")))
      (str "\n"))))
Success: (string)

@akimd
Copy link
Author

akimd commented Jun 21, 2024

Hi Markus,

I think I face the same issue again, under a different disguise.

require 'parser/current'
require 'unparser'

ast = Unparser.parse(<<~'RUBY')
  def get_val(var: nil, **opts)
    foo format: <<~JS
        a;
        #{transform};
        b;
    JS
  end
RUBY
p ast
puts Unparser.unparse(ast)
s(:def, :get_val,
  s(:args,
    s(:kwoptarg, :var,
      s(:nil)),
    s(:kwrestarg, :opts)),
  s(:send, nil, :foo,
    s(:kwargs,
      s(:pair,
        s(:sym, :format),
        s(:dstr,
          s(:str, "a;\n"),
          s(:begin,
            s(:send, nil, :transform)),
          s(:str, ";\n"),
          s(:str, "b;\n"))))))
def get_val(var: nil, **opts)
  foo(format: <<-HEREDOC)
end

Cheers!

@mbj
Copy link
Owner

mbj commented Jun 21, 2024

@akimd very likely this is fixed by #366 but that PR is not yet ready for prime time. It uses an exhaustive search algorithm to select round tripping dstr segments. And this has factorial runtime, its guaranteed to be correct but for large dstr constructs its to slow.

So I've to improve the search algorithm, still this entire issue class may be possible to be eliminated soon.

mbj added a commit that referenced this issue Jun 22, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
@mbj
Copy link
Owner

mbj commented Jun 22, 2024

@akimd This is fixed in #366 but #366 is not release ready yet, its a big change on how unparser does dynamic string literals and needs more testing.

$ cat x.rb
    foo format: <<~JS
        a;
        #{transform};
        b;
    JS
mutant-dev@mbj ~/devel/unparser (fix/dstr?) $ bundle exec bin/unparser --verbose x.rb
#<Unparser::CLI::Target::Path path=#<Pathname:x.rb>>
x.rb
Original-Source:
    foo format: <<~JS
        a;
        #{transform};
        b;
    JS

Generated-Source:
foo(format: "a;\n#{transform};
b;\n")
Original-Node:
(send nil :foo
  (kwargs
    (pair
      (sym :format)
      (dstr
        (str "a;\n")
        (begin
          (send nil :transform))
        (str ";\n")
        (str "b;\n")))))
Generated-Node:
(send nil :foo
  (kwargs
    (pair
      (sym :format)
      (dstr
        (str "a;\n")
        (begin
          (send nil :transform))
        (str ";\n")
        (str "b;\n")))))
Success: x.rb

As always your bugreports are appreciated!

mbj added a commit that referenced this issue Sep 16, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
mbj added a commit that referenced this issue Sep 16, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
mbj added a commit that referenced this issue Sep 17, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
mbj added a commit that referenced this issue Sep 17, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
mbj added a commit that referenced this issue Sep 17, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
mbj added a commit that referenced this issue Sep 20, 2024
* This is an entirely new approach.
* Instead to find the "correct" dstr segments we simply try all and unparse the first one
  that round trips.
* This so far guarantees we always get good concrete syntax, but it can be time intensive as
  the combinatoric space of possible dynamic string sequence is quadratic with the dstr children size.
* For this reason we try above (currently) dstr children to unparse as heredoc first.
* Passes the entire corpus and fixes bugs.

[fix #249]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants