Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lazy map adds extra array layer to single-element array output #5044

Closed
Taywee opened this issue Feb 15, 2018 · 17 comments · Fixed by #5812
Closed

lazy map adds extra array layer to single-element array output #5044

Taywee opened this issue Feb 15, 2018 · 17 comments · Fixed by #5812
Milestone

Comments

@Taywee
Copy link

Taywee commented Feb 15, 2018

Environment

jruby 9.1.15.0 (2.3.3) 2017-12-07 929fde8 OpenJDK 64-Bit Server VM 25.151-b12 on 1.8.0_151-b12 +jit [linux-x86_64]

Linux nyarlathotep 4.14.11-gentoo-r2 #1 SMP Thu Jan 4 17:46:16 MST 2018 x86_64 Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz GenuineIntel GNU/Linux

Expected Behavior

The expected behavior should be as shown in standard Ruby:

[15] pry(main)> [1, 2, 3].map{|a| [a + 1]}.to_a
=> [[2], [3], [4]]
[16] pry(main)> [1, 2, 3].lazy.map{|a| [a + 1]}.to_a
=> [[2], [3], [4]]
[17] pry(main)> [[1], [2], [3]].lazy.map(&:itself).to_a
=> [[1], [2], [3]]

Actual Behavior

Run from vanilla jruby's IRB, it looks like this:

irb(main):005:0> [1, 2, 3].map{|a| [a + 1]}.to_a
=> [[2], [3], [4]]
irb(main):006:0> [1, 2, 3].lazy.map{|a| [a + 1]}.to_a
=> [[[2]], [[3]], [[4]]]
irb(main):007:0> [[1], [2], [3]].lazy.map(&:itself).to_a
=> [[[1]], [[2]], [[3]]]
@Taywee
Copy link
Author

Taywee commented Feb 15, 2018

I found another bug that might be related. After a take, map erroneously strips arrays, and has some weird behaviour, including an incorrect ArgumentError:

[66] pry(main)> [[1],[2],[3]].lazy.take(2).map { |elem| p elem }.force
1
2
=> [1, 2]
[67] pry(main)> [[],[2],[3]].lazy.take(2).map { |elem| p elem }.force
nil
2
=> [nil, 2]
[68] pry(main)> [[1, 2],[2],[3]].lazy.take(2).map { |elem| p elem }.force
1
2
=> [1, 2]
[69] pry(main)> [[1, 3],[2],[3]].lazy.take(2).map { |elem| p elem }.force
1
2
=> [1, 2]
[70] pry(main)> [[1, 3],[2],[3]].lazy.take(2).map { |elem| elem.itself }.force
=> [1, 2]
[71] pry(main)> [[1, 3],[2],[3]].lazy.take(2).map(&:itself).force
ArgumentError: wrong number of arguments calling `itself` (1 for 0)
from uri:classloader:/jruby/kernel/enumerator.rb:72:in `block in map'
[72] pry(main)> 

@headius
Copy link
Member

headius commented Feb 15, 2018

Hmm...well if this issue is specific to lazy maybe there's an easy fix possible? Our lazy Enumerator is implemented in Ruby: https://github.com/jruby/jruby/blob/master/core/src/main/ruby/jruby/kernel/enumerator.rb#L11

@Taywee
Copy link
Author

Taywee commented Feb 15, 2018

It does look like it's just specific to lazy; I couldn't replicate any of the bugs in non-lazy enumeration. Looking at the ruby, it seems to me that it might be some issue with the argument splats (there is some imbalance with the splatting into the yielder, and splatting of arguments for the block between select, take, and map, for instance), but I haven't studied the code enough to be very sure.

@headius
Copy link
Member

headius commented Feb 15, 2018

I know that other Ruby-in-Ruby impls have had to introduce special non-Ruby semantics for some of these argument passing scenarios. I had hoped we had such cases covered inside our "native" block and Enumerable logic, but perhaps we're missing something more.

@glucero
Copy link
Contributor

glucero commented Feb 15, 2018

Could this be a bug in Yielder#<<.

JRuby's RubyYielder.op_lshift is very different than RubyYielder.yield

https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/RubyYielder.java#L105-L113

But MRI's yielder_yield_push just calls yielder_yield as is:

https://github.com/ruby/ruby/blob/trunk/enumerator.c#L1239-L1243

This change seems to fix this specific issue:

diff --git a/core/src/main/ruby/jruby/kernel/enumerator.rb b/core/src/main/ruby/jruby/kernel/enumerator.rb
index 9a0076d2e5..481ea7c828 100644
--- a/core/src/main/ruby/jruby/kernel/enumerator.rb
+++ b/core/src/main/ruby/jruby/kernel/enumerator.rb
@@ -73,7 +73,7 @@ class Enumerator
     def map
       _block_error(:map) unless block_given?
       Lazy.new(self) do |yielder, *values|
-        yielder << yield(*values)
+        yielder.yield yield(*values)
       end.__set_inspect :map
     end
     alias_method :collect, :map

This new functionality in op_lshift started here:

cb25efe

And it looks like there are already comments there about how that commit may have caused issues. - however reverting that commit causes a regression.

@glucero
Copy link
Contributor

glucero commented Feb 15, 2018

Oh it looks like this issue already has a full and rich history: #3814

@Taywee
Copy link
Author

Taywee commented Feb 15, 2018

Lot of layers to this, with some other bugs as well:

[21] pry(main)> p Enumerator.new{|y| y << []}.first
nil
=> nil
[22] pry(main)> p Enumerator.new{|y| y.yield []}.first
nil
=> nil
[11] pry(main)> p Enumerator.new{|y| y << [1]}.to_a
[[[1]]]
=> [[[1]]]
[12] pry(main)> p Enumerator.new{|y| y.yield [1]}.to_a
[[1]]
=> [[1]]
[13] pry(main)> p Enumerator.new{|y| y << [1]}.first
[1]
=> [1]
[14] pry(main)> p Enumerator.new{|y| y.yield [1]}.first
1
=> 1

@glucero yeah, looks like it.

@headius
Copy link
Member

headius commented Feb 15, 2018

Hmm, Yielder is native and could be the culprit indeed.

@headius
Copy link
Member

headius commented Feb 15, 2018

It might be worth it (and not too complicated) to make a pure-Ruby Yielder and see how it behaves. One area we have a lot of trouble is mapping all the various block-dispatch semantics to Java calls.

@Taywee
Copy link
Author

Taywee commented Feb 15, 2018

Pure-ruby yielder should be relatively easy. I'm not fully versed in MRI's C API, but just looking at the sources, it seems roughly similar to:

class Yielder
  def initialize(&block)
    @block = block
  end
  def yield(*args)
    @block.call(*args)
  end
  alias_method :<<, :yield
end

Testing with a simple implementation of a subset of Enumerator, it appears to work about as expected:

class MyEnumerator
  include Enumerable

  def initialize(&block)
    @block = block
  end

  def each(&block)
    if block_given?
      @block.call(Yielder.new(&block))
    else
      self
    end
  end
end

This appears to work as expected in MRI, but JRuby still gives some erroneous results in some circumstances (I'm guessing due to faulty Enumerable methods):

[11] pry(main)> MyEnumerator.new{|y| y << [1]}.to_a
=> [[1]]
[12] pry(main)> MyEnumerator.new{|y| y.yield [1]}.to_a
=> [[1]]
[13] pry(main)> MyEnumerator.new{|y| y << [1]}.first
=> 1
[14] pry(main)> MyEnumerator.new{|y| y.yield [1]}.first
=> 1
[15] pry(main)> MyEnumerator.new{|y| y << []}.first
=> nil
[16] pry(main)> MyEnumerator.new{|y| y.yield []}.first
=> nil

@Taywee
Copy link
Author

Taywee commented Feb 15, 2018

Might it make sense to implement Enumerator, Enumerable, and Yielder all in pure Ruby? There'd have to be a lot of interaction between them and some Fiber magic, but it seems like it'd be more manageable to avoid a large amount of the Ruby/Java shuffling here.

@headius
Copy link
Member

headius commented Feb 16, 2018

@Taywee It would indeed, and we've discussed this frequently because of exactly these problems. The weird vagaries of parameter passing for blocks works "perfectly" in pure Ruby because it's passing through the actual Ruby compiler/JIT, while Enumerable and Enumerator and Yielder all are trying to emulate that on a case-by-case basis. Also the Enumerable code sucks.

I think it would be a good time to do this in 9.2. What do you think, @enebo?

In the short term I think it would be totally fine to move Yielder into Ruby, since it's only a small amount of code.

@headius headius added this to the JRuby 9.2.0.0 milestone Feb 16, 2018
@Taywee
Copy link
Author

Taywee commented Feb 19, 2018

@headius I've written up a quick-and-dirty Enumerator and Enumerable implementation here, which should be complete (though not necessarily bug-free yet), short Lazy. I've done some minor testing of it and most of what I expect to be correct is correct. There are some changes due to make it behave in-line with the way JRuby does coercion and error detection, and I think there are places where it uses each but should be using each_entry, and I'm certain there are bugs floating in there given that it's completely untested, but it might be a good start. I'm planning on trying to integrate the existing Lazy enumerator class and seeing if I can drop it into JRuby and get it passing tests.

@headius
Copy link
Member

headius commented Feb 19, 2018

I'll have a look at what you've come up with! There are a few other pure-Ruby implementations out there, like the one from Rubinius and the adapted version in TruffleRuby. Both of those use some internal functions we'd need to expose, such as for coercing various values.

I did try just a pure-ruby Yielder and it only regressed one spec.

For 9.1.x this might be safest to just one-off fix, as we have with other Enumerable and Enumerator methods. But I think it's time we moved to pure Ruby for the whole Enumerable stack in 9.2.

@Taywee
Copy link
Author

Taywee commented Feb 19, 2018

Yeah, adapting the Rubinius implementation might be a better option, being established and well-tested and all. Any route to getting to a more stable and trustable Enumera* set.

@headius headius mentioned this issue Mar 27, 2018
15 tasks
@enebo enebo modified the milestones: JRuby 9.2.0.0, JRuby 9.2.1.0 May 24, 2018
@headius
Copy link
Member

headius commented Oct 26, 2018

We are now finally starting to incorporate some inlining, which means the possibility of moving more Enumerator and Enumerable methods to Ruby has real legs. I'm going to bump this one more version and we'll see if we can start getting some of these methods moved over.

Note this would also help issues like #4108 #4212 and others.

@headius headius modified the milestones: JRuby 9.2.1.0, JRuby 9.2.2.0 Oct 26, 2018
@enebo enebo removed this from the JRuby 9.2.5.0 milestone Dec 6, 2018
@enebo enebo added this to the JRuby 9.2.6.0 milestone Dec 6, 2018
@enebo enebo modified the milestones: JRuby 9.2.6.0, JRuby 9.2.7.0 Dec 19, 2018
@enebo enebo modified the milestones: JRuby 9.2.7.0, JRuby 9.2.8.0 Jan 9, 2019
@headius
Copy link
Member

headius commented Apr 12, 2019

Started poking at this a bit, to see if only partially moving to Ruby would help. It turns out that it's a pretty mixed bag:

  • Just moving Generator to Ruby doesn't seem to help at all.
  • Moving Generator and Yielder to Ruby fixes most of your these examples (notably not the ones @Taywee mentioned after his pure-Ruby attempt) but regresses at least Enumerable#first (it unboxes something it should not).

The next step would be making Enumerable#first be pure-Ruby, and this is where things start to break down. The only known compliant implementations of #first in Ruby originate in Rubinius here:

https://github.com/rubinius/rubinius/blob/master/core/enumerable.rb#L583-L589

The logic for __take__ lives in the same file here:

https://github.com/rubinius/rubinius/blob/master/core/enumerable.rb#L826-L843

And this shows that the argument processing for almost all the Enumerable methods has a bit more complexity than pure Ruby can actually handle, with this special Rubinius.single_block_arg marker for certain types of arguments.

We ought to be able to mimic that logic, though perhaps it will require work to the block/proc yield/call logic. However, this makes the task rather more complicated than we would do in a minor release.

We also are not yet in a position to optimize through the various pure-Ruby Enumerable methods, which would be preferable before making the move.

@enebo Do we think that's a prerequisite, or not? The current implementation obviously doesn't yield and specialize any better, but it does avoid a lot of DynamicScope allocation by using native block bodies. We won't be able to eliminate that for pure-Ruby blocks without inlining.

kares added a commit that referenced this issue Aug 2, 2019
also from comments GH-3814 which are now passing

`Enumerator.new { |y| y.yield([1]) }.lazy.map { |e| e }.to_a`
seems to be left not working as in MRI
Adithya-copart pushed a commit to Adithya-copart/jruby that referenced this issue Aug 25, 2019
also from comments jrubyGH-3814 which are now passing

`Enumerator.new { |y| y.yield([1]) }.lazy.map { |e| e }.to_a`
seems to be left not working as in MRI
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants