Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Ripper for lexical analyser #512

Merged
merged 151 commits into from Sep 12, 2017
Merged

Use Ripper for lexical analyser #512

merged 151 commits into from Sep 12, 2017

Conversation

aycabta
Copy link
Member

@aycabta aycabta commented Sep 6, 2017

I replaced legacy lexical analyser from IRB with Ripper what is standard library of Ruby. It's important to support new syntax.

I explain what this code can parse Ruby code with checking diff for all Ruby documents and all Rails documents.

Preparing

Checkout each branches

mkdir test-for-rdoc
cd test-for-rdoc
git clone -b master --single-branch git@github.com:ruby/rdoc.git original-rdoc
cd original-rdoc
git checkout 0c5eb20 # This is HEAD of master as of now
bundle
cd ..
git clone -b use-ripper --single-branch git@github.com:aycabta/rdoc.git ripper-rdoc
cd ripper-rdoc
git checkout 23f12c1 # This is HEAD of use-ripper branch as of now
bundle
cd ..
git clone -b trunk --single-branch git@github.com:ruby/ruby.git
cd ruby
git checkout d68a6b3
cd ..
git clone -b master --single-branch git@github.com:rails/rails.git
cd rails
git checkout 665ac7c
cd ..

Generate CRuby documents

ruby -Ioriginal-rdoc/lib original-rdoc/exe/rdoc -o ruby-doc-original ruby
ruby -Iripper-rdoc/lib ripper-rdoc/exe/rdoc -https://github.com/zzak/sdoc/pull/112o ruby-doc-ripper ruby

Generate Rails documents

cd rials
# Tha patch below for removing monkey patch (https://github.com/zzak/sdoc/pull/112) and,
# using suitable variable (https://github.com/zzak/sdoc/pull/113) with
# suitable behavior of RDoc (https://github.com/ruby/rdoc/pull/502)
curl -fsSL https://gist.githubusercontent.com/aycabta/1d4eac210d3ede1b91a520804499a457/raw/ca90a8ce8fe3089a1d45629a01550012b2be46e1/sdoc.diff | patch -p1 
bundle update
RUBYOPT=-I../original-rdoc/lib bundle exec rake rdoc
mv doc/rdoc ../rails-doc-original
RUBYOPT=-I../ripper-rdoc/lib bundle exec rake rdoc
mv doc/rdoc ../rails-doc-ripper
cd ..

Check diff

Ruby

diff -ru --exclude "*.js" --exclude "*.gz" --exclude "*.css" --exclude "*.png" --exclude "*.gif" --exclude "*.ttf" rails-doc-original rails-doc-ripper | less

Rails

diff -ru --exclude "*.js" --exclude "*.gz" --exclude "*.css" --exclude "*.png" --exclude "*.gif" --exclude "*.ttf" ruby-doc-original ruby-doc-ripper | less

Explanation of remaining differences

These are unnecessary to fix.

Ruby

Exception2MessageMapper.html

  def bind(cl)
    self.module_eval "#{<<-"begin;"}\n#{<<-"end;"}", __FILE__, __LINE__+1
    begin;
      def Raise(err = nil, *rest)
        Exception2MessageMapper.Raise(self.class, err, *rest)
      end
      alias Fail Raise

      class << self
        undef included
      end
      def self.included(mod)
        mod.extend Exception2MessageMapper
      end
    end;
  end

I heard about this from @nobu, this evaluation code meta programming with complex heredoc is a trick for syntax highlighting of editor. This is strange and complex case, so I think that conforming output of tihs perfectly is unnecessary.

A method is listed at table_of_contents.html too.

alias

  • MonitorMixin.html
  • ruby/doc/syntax/miscellaneous_rdoc.html

For example,

alias :old_method :method

This is difficult to lexical analyse by IRB's one because it's not separated by comma. The alias syntax is special case of Ruby. In Ripper version, it becomes perfect.

ruby/doc/contributing_rdoc.html

git remote add my_fork git@github.com:my_username/ruby.git

Syntax highlighting for shell command as Ruby. RDoc has only Ruby syntax highlighting feature and it becomes normal quotation without syntex highlighting as Ruby when lexical analysing and parsing as Ruby fails with exception.

Regexp.html

def o.to_regexp() /foo/ end

A method definition one liner with regexp literal. It's so difficult to support this case, and if supports it, RDoc code becomes so complex. In Ripper version, it becomes perfect.

(GitHub syntax highlighting fails too...the correct syntax highlighted regexp literal is like below)

a =~ /foo/

Some commented out methods what are treated as document

I sent issues to ruby-core:

But in Ripper version, these skipped.

Rails

There is one difference.

--- rails-doc-original/classes/AbstractController/Callbacks/ClassMethods.html   2017-09-06 02:14:55.777896261 +0900
+++ rails-doc-ripper/classes/AbstractController/Callbacks/ClassMethods.html     2017-09-06 02:11:53.167905708 +0900
@@ -268,12 +268,14 @@
 <p>Note that <code>:only</code> has priority over <code>:if</code> in case
 they are used together.</p>
 
-<pre><code>only: :index, if: -&gt; { true } # the :if option will be ignored.</code></pre>
+<pre><code>only: :index, if: -&gt; { true } # the :if option will be ignored.
+</code></pre>

In original RDoc, the end of source code quotation is closed without newline. In Ripper RDoc, it's with newline. The IRB lexical analyser separates token to comment and the end of newline, but Ripper has the end of newline in a comment token. This is difficult to conform perfectly.

There are many cases of this in Rails diff, but these are the same as HTML rendering.

Excursus

I sent a patch for Ripper to ruby-core about lexical analysis state because this branch needed it, and I took some advices about lexical analysis state layer by @mrkn and I wrote RDoc::RipperStateLex for it. If the patch what is for Ripper is taken, I'll replace RDoc::RipperStateLex with the new Ripper after all old Ripper has gone to EOL.

@aycabta
Copy link
Member Author

aycabta commented Sep 6, 2017

I warry about Rails using forked SDoc and it has wrong behavior (those are fixed original repository of SDoc, 1 and 2).

@zzak: I want to join SDoc collaborators and RubyGems owners for fix some trouble in the future, and it's just an aside, let's talk about some documentations in the future.

@amatsuda and/or other Rails committers: This SDoc problem is so severe for replacement lexical analyser with Ripper because RDoc crashes with this Pull Request certainly until now, especially about this monkey patch. I think that Rails should use original SDoc as new version.

@drbrain
Copy link
Member

drbrain commented Sep 6, 2017

My years-long dream come true

} && $~
line
end.join("\n")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad expand_tab doesn't support @options.tab_width

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amatsuda
Copy link
Member

amatsuda commented Sep 6, 2017

No worries about the sdoc thing. We can switch to any version so far as it produces correct output.

@hsbt hsbt self-requested a review September 7, 2017 01:44
@aycabta
Copy link
Member Author

aycabta commented Sep 7, 2017

Everyone, please test this branch for documentation to your software and check diff between original and this one.

@aycabta
Copy link
Member Author

aycabta commented Sep 7, 2017

I wrote doctest.sh for checking diff between original RDoc and Ripper version RDoc of specified GitHub repository.

$ git clone git@gist.github.com:24a122ba798337cc3b9a7bfb32fc28f4.git doctest
$ cd doctest
$ ./doctest.sh
Usage: ./doctest.sh username/repository
$ ./doctest.sh ruby/ruby
# run diff command with less command

@robin850
Copy link
Contributor

robin850 commented Sep 8, 2017

Awesome work @aycabta ! 👏 ❤️

Rails is currently pointing to a fork of SDoc because the API site relies on unreleased changed and we are waiting for 1.0.0 final to be cut before updating the Gemfile but no worries, this is temporary.

By the way, the generation time is reduced by ~10s in the Rails repository with this patch ; great ! As you mentioned earlier, the outputs aren't exactly the same but apart from different code highlighting, everything seems fine regarding Rails.

@aycabta
Copy link
Member Author

aycabta commented Sep 8, 2017

Rails is currently pointing to a fork of SDoc because the API site relies on unreleased changed and we are waiting for 1.0.0 final to be cut before updating the Gemfile but no worries, this is temporary.

OK, I understand what is going on.

everything seems fine regarding Rails.

Thank you, I think that this Pull Request is practically settled.

@hsbt
Copy link
Member

hsbt commented Sep 11, 2017

Some commented out methods what are treated as document

I merged these patches into ruby core.

@hsbt
Copy link
Member

hsbt commented Sep 11, 2017

I confirmed that differences of rdoc documents generated old-rdoc and ripper version. I didn't find another difference without Explanation of remaining differences.

I approved to merge this pull request and release 6.0.0.beta2 before RubyKaigi 2017.

@aycabta How do you think about this release?

@aycabta
Copy link
Member Author

aycabta commented Sep 11, 2017

@hsbt You sure can, that's why I wrote all code.

@hsbt hsbt merged commit e96d747 into ruby:master Sep 12, 2017
@aycabta aycabta deleted the use-ripper branch September 12, 2017 02:03
@aycabta
Copy link
Member Author

aycabta commented Sep 12, 2017

💕 🌸 💞

matzbot pushed a commit to ruby/ruby that referenced this pull request Sep 12, 2017
  * This version changed lexer used Ripper from lexer based IRB.
    see details: ruby/rdoc#512

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59845 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
@ghost
Copy link

ghost commented Oct 11, 2017

🌸

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants