New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse.y: add heredoc <<~ syntax (Feature #9098) #878

Closed
wants to merge 10 commits into
base: trunk
from

Conversation

9 participants
@bjmllr
Copy link

bjmllr commented Apr 20, 2015

Allows for the use of heredocs which appear nicely indented in ruby source code, but the indentation is removed during parsing.

Original proposal: https://bugs.ruby-lang.org/issues/9098

Uses the syntax suggested by Avdi Grimm (<<~), and should have the same semantics as String#strip_heredoc from ActiveSupport, that is, the indentation of the least-indented line is removed from each line of the string.

No attempt was made to deal with inconsistent indentation (tabs are considered equal to spaces).

Please let me know if I can improve this patch. Thanks!

@bjmllr bjmllr force-pushed the bjmllr:tildoc branch from 61a35ad to a59c3af Apr 20, 2015

@bjmllr bjmllr closed this Apr 20, 2015

@bjmllr bjmllr reopened this Apr 20, 2015

@avdi

This comment has been minimized.

Copy link

avdi commented Apr 22, 2015

Yay!

@arbox

This comment has been minimized.

Copy link

arbox commented Apr 22, 2015

👍

@bjmllr bjmllr force-pushed the bjmllr:tildoc branch from 2d2c2df to aab34a4 Apr 22, 2015

@Fryguy

This comment has been minimized.

Copy link

Fryguy commented Apr 22, 2015

What happens if there are blank lines in the here doc (not necessarily with leading whitespace) ?

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Apr 22, 2015

@Fryguy currently those would be considered lines with no indentation, so they would cause the entire heredoc to be flush left. That means that the documentation I just pushed is incorrect, but before I fix it, do you think it's better to ignore blank lines, or treat them as lines with no indentation?

@Fryguy

This comment has been minimized.

Copy link

Fryguy commented Apr 22, 2015

I was originally thinking ignore them for the purposes of figuring out the strip size. As a user of the method, my least surprise would be with this:

class FancyHello
  def self.hello
    puts <<~README.inspect
      Hello

        World!
    README
  end
end

FancyHello.hello # => "Hello\n\n  World!\n"

Not 100% sure though...what do others think? @avdi?

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Apr 23, 2015

With this last commit, lines which are blank (empty or consisting only of tabs and spaces) will not be used to find the base indentation level. On a blank line, any amount of indentation less than the heredoc's base indentation level will be ignored, while any additional indentation will be preserved.

@nobu

This comment has been minimized.

Copy link
Member

nobu commented Apr 23, 2015

I expect that literally written spaces/tabs would be stripped, but not escaped ones, such as \, \s, \t, \040, \x09, and so on.
If we use your approach, line_indent should be counted at parsing each lines, but not after the whole here doc, I think.

@bjmllr bjmllr force-pushed the bjmllr:tildoc branch from 0b0545d to bf88013 Apr 23, 2015

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Apr 23, 2015

@nobu \ (backslash space) is now preserved when it appears at the start of a line (other escape sequences should be the same). To achieve this, I moved the counting of line_indent inside parser_tokadd_string. Is this more or less what you had in mind?

@avdi

This comment has been minimized.

Copy link

avdi commented Apr 23, 2015

I've thought about this a lot, and I've ended up with two options that I would find acceptable:

#1. Indent is based on shortest-indented non-whitespace line. So:

class FancyHello
  def self.hello
    puts <<~README.inspect
      Hello

        World!
    README
  end
end

outputs:

Hello

  World!

#2. Final indent is based on the indent level of the closing marker (README, in this example). Output would be:

  Hello

    World!

Of the two, I suspect #1 is less likely to surprise people. In both cases, blank lines are ignored for the purpose of indent.

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Apr 23, 2015

@avdi Seems like we're all leaning toward #1, that's the behavior implemented in this PR.

@nobu

This comment has been minimized.

Copy link
Member

nobu commented Apr 26, 2015

Escaped spaces seem fine.
Still not working well with string interpolation, #{}.
You'll need to reset heredoc_indent at the beginning of a heredoc but not for each fragments, as well as lex_strterm, and dedent them at the rule string1.
Also, heredoc_indent needs to be saved/restored around compstmt in string_content.

@nobu

This comment has been minimized.

Copy link
Member

nobu commented Apr 26, 2015

BTW, it's better to adopt the existing coding style (indent, braces, etc.) to send patches, even if it is far from your favorites.
This is not MUST and won't be the only reason to reject for ruby usually, but recommended in general.

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Apr 26, 2015

@nobu I definitely didn't intend to introduce style inconsistencies! I guess you specifically meant where I was using just spaces for indentation, instead of tabs and then spaces ... if so, I think I have fixed it with this last commit. I'll work on the other issues later this week. Thanks for all the feedback!

@bjmllr bjmllr force-pushed the bjmllr:tildoc branch from 4aeaf2b to 7f9c35f May 3, 2015

@bjmllr bjmllr closed this May 3, 2015

@bjmllr bjmllr reopened this May 3, 2015

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented May 4, 2015

@nobu With the changes you mentioned above, interpolation seems to be working now. I also updated ripper to provide the dedented string and added support for backticks.

@Ajedi32

This comment has been minimized.

Copy link

Ajedi32 commented Jul 15, 2015

👍 This would solve a long-time annoyance I (and presumably many others) have had with the heredoc syntax. Rails has had a solution to this for a while, but for non-rails code the need to remove indentation from heredoc strings has been rather irritating.

@deivid-rodriguez

This comment has been minimized.

Copy link

deivid-rodriguez commented Nov 4, 2015

Hi! I'm guessing this would need a rebase if it was to be merged, but... is it still being considered? I would personally find it very handy to have it in core.

@sikachu

This comment has been minimized.

Copy link

sikachu commented Nov 23, 2015

@bjmllr thank you so much for implementing this patch. I gave up a while back since I don't know C and lexer so well and couldn't finish the patch.

Per @avdi's comment, the original intention is to have the output as example #1 as well. I'm glad that this is being adopted.

bjmllr added some commits Apr 20, 2015

parse.y: find <<~ heredoc strip size in tokadd_string
As suggested by nobu, this eliminates one of the passes through the
string and also allows us to give different treatment to escape
sequences.

doing this required a few other changes:
* parser_params now includes a parser_heredoc_indent element
* the amount of indentation to remove from a heredoc is now found in
  parser_tokadd_string rather than parser_heredoc_indent
* parser_heredoc_dedent is now called from parser_here_document
  rather than parser_str_new

some cleanup happened in this commit as well:
* removed unneeded parser_heredoc_dedent signature
* starting size for parser->parser_heredoc_indent is now INT_MAX
* parser_heredoc_dedent now calls dispose_string on the input string,
  unless parser->parser_heredoc_indent is 0, in which case it returns
  the input string

bjmllr added some commits May 1, 2015

minor progress toward handling interpolation
* added an interpolation test case, which still fails
* eliminated STR_FUNC_DEDENT, heredoc_dedent is sufficient
* changed heredoc_dedent() to accept and return NODE's
* call heredoc_dedent() and reset heredoc_dedent from parser rules
* save and restore heredoc_dedent around compstmt in string_content
* set heredoc_indent as soon as a squiggly heredoc starts parsing
parse.y: interpolated expressions in ~ heredoc
* make heredoc_line_indent a member of parser_params (needed because a
  line can be broken into multiple nodes by an interpolated expression)
* drop reading_indentation from parser_tokadd_string in favor of
  heredoc_line_indent
* rewrite heredoc_dedent() to walk the AST and rewrite indentation across
  string fragments
* add failing case of interpolated string, fix it by tracking yet-to-be-removed
  indent for the count process and for the copy process separately
* remove carriage return handling
add heredoc dedenting to ripper
* extract actual dedenting activity from parser_heredoc_dedent() to
  parser_heredoc_dedent_string()
* add parser_heredoc_dedent_ripper() for use in ripper
* add squiggly heredoc tests

@bjmllr bjmllr force-pushed the bjmllr:tildoc branch 2 times, most recently from 35dbcd5 to eb7f824 Nov 27, 2015

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Nov 28, 2015

I rebased this branch and made a first attempt at @matz 's request regarding the handling of hard tabs. It should now do something sensible for any indentation other than spaces followed by tabs on a single line.

The build error seems to be unrelated, something in test_fork.rb?

@hsbt

This comment has been minimized.

Copy link
Member

hsbt commented Nov 28, 2015

@bjmllr I re-runned Travis CI.

@nobu

This comment has been minimized.

Copy link

nobu commented on eb7f824 Nov 28, 2015

tab is expanded to the next multiple of 8.
it's not same as 8 spaces always.

@Ajedi32

This comment has been minimized.

Copy link

Ajedi32 commented Nov 28, 2015

I'm confused. Why are tabs being treated as equivalent to spaces at all? E.g. If I write:

def hello
  puts <<~README.inspect
<tab>Hello

<space><space><space><space><space><space><space><space>World!
    README
  end
end

Are you saying that should be accepted by the compiler? Why? Why should that be any less invalid than:

def hello
  puts <<~README.inspect
<tab>Hello

<space><space><space><space>World!
    README
  end
end

or

def hello
  puts <<~README.inspect
<space><space><space><space>Hello

<space><space>World!
    README
  end
end

Shouldn't we just throw an error in all of those cases? Is there ever a legitimate reason why you'd want to allow inconsistent indentation in one of these blocks? What happens when someone has their editor set to display tabs as 4 spaces, and writes:

def hello
  puts <<~README.inspect
<space><space><space><space>Hello

<tab>World!
    README
  end
end

Why should that result in:

hello #=> "    Hello\n\n\tWorld!"

I certainly wouldn't expect that result intuitively. In such a case, wouldn't a well-written error message explaining that I'm mixing tabs and spaces be much more helpful for me as a developer?

@nobu

This comment has been minimized.

Copy link

nobu commented on 388f8ca Nov 28, 2015

This is too ad hoc to accept.
Now I'm thinking that dispatching heredoc_dedent event after tSTRING_END if it is needed, to leave it to the ripper subclasses.
Another problem about ripper is Ripper#column.

@nobu

This comment has been minimized.

Copy link

nobu commented on parse.y in 9a9b4bd Nov 28, 2015

Reset heredoc_line_indent to -1.
Otherwise, nested heredoc doesn't work.

p <<~E
  bar#{<<~F}1
 xxx
F
  zot
E

shows " barxxx\n1\n zot\n", but should be "barxxx\n1\nzot\n"

@bjmllr

This comment has been minimized.

Copy link

bjmllr commented Dec 8, 2015

Closing this since the feature was added in 9a28a29

@bjmllr bjmllr closed this Dec 8, 2015

@edward edward referenced this pull request Jun 14, 2016

Merged

Nokogiri HTML refactor #6

1 of 1 task complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment