Miniboxing should interoperate with specialization #108

Closed
VladUreche opened this issue Jun 27, 2014 · 4 comments
@VladUreche
Member

... at least for FunctionX and TupleX, although arguably TupleX specialization should be deprecated due to the duplicate field issue in specialization (SI-3585).

Suggested by @alexandru in the miniboxing mailing list: https://groups.google.com/d/msg/scala-miniboxing/dJDmxtPnwCg/cqh7Tm7xnhEJ

@VladUreche
Member Author

The problem is that the function application requires boxing, although the caller is miniboxed and the FunctionX is specialized.

The first problem is that FunctionX is specialized but not miniboxed, and we have no reliable means of overriding the Scala standard library to replace specialized functions with miniboxed ones. Since FunctionX is not miniboxed, the plugin considers it generic, and thus does not optimize calls to f.apply.

Currently, the miniboxing transformation occurs before specialization, so it would be specialization's responsibility to redirect the call from f.apply(...) to f.apply$mcXX$sp(...). But miniboxing transforms generics shallowly, so even though T and R are miniboxed, Function1[T, R] does not become Function1[Long, Long], but stays Function1[T, R].
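To illustrate the shallow transformation, here is a rough sketch. The class and member names below are hypothetical, not the plugin's actual output; they only show that a Function1 field keeps its generic type even in the Long-based variant, so calls to it still go through the boxed apply:

```scala
// Hypothetical sketch -- NOT the actual code the plugin generates.
class Wrapper[@miniboxed T](f: Function1[T, T]) {
  def call(t: T): T = f(t)
}

// Conceptual Long-based variant after miniboxing:
class Wrapper_J[T](T_Tag: Byte, f: Function1[T, T]) {
  // f keeps its generic type Function1[T, T]; it does NOT become
  // Function1[Long, Long], so f.apply still boxes its argument and result:
  def call_J(t: Long): Long = box2minibox(T_Tag, f(minibox2box(T_Tag, t)))
}
```

Here box2minibox/minibox2box stand in for the plugin's tag-based conversions; the point is only that the receiver type of apply stays generic.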

Now, according to the definition of optimized stack initiators in the miniboxing tutorial, which also applies to specialization, a receiver of type Function1[T, R] is not suitable as an optimized stack initiator. Also, since specialization does not have access to the miniboxing metadata, it is not an optimized stack propagator either. It therefore ends up as an optimized stack inhibitor, thus requiring boxing.

Now, how to fix this?

A proposed solution is to create a specialized bridging method that miniboxing can call:

def function_spec_bridge_JJ[T, R](T_Tag: Byte, R_Tag: Byte, f: Function1[T, R], arg: Long): Long =
  // of course we would optimize this, but this is the idea:
  (T_Tag, R_Tag) match {
    case (INT, INT) => int2minibox(f.asInstanceOf[Function1[Int, Int]].apply(minibox2int(arg)))
    case ... =>
  }

Since Function1[Int, Int] is a valid optimized stack initiator from specialization's point of view, the call to apply will be rewritten to apply$mcII$sp and the boxing will be gone.
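For reference, this is what the call-site rewrite looks like when the receiver's static type is Function1[Int, Int] (apply$mcII$sp is the specialized method name the Scala library uses for the Int => Int variant):

```scala
val f: Function1[Int, Int] = (x: Int) => x + 1

// Before the rewrite, this call would box both the argument and the result:
//   f.apply(41)
// After specialization's call-site rewriting, no boxing occurs:
//   f.apply$mcII$sp(41)
```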

But what about that tuple of tags? Isn't that expensive?

Yes, it is. An optimized implementation is:

def function_spec_bridge_JJ[T, R](T_Tag: Byte, R_Tag: Byte, f: Function1[T, R], arg: Long): Long =
  T_Tag match {
    case INT => function_spec_bridge_IJ[T, R](R_Tag, f, minibox2int(arg))
    ...
  }
def function_spec_bridge_IJ[T, R](R_Tag: Byte, f: Function1[T, R], arg: Int): Long =
  R_Tag match {
    case INT => int2minibox(f.asInstanceOf[Function1[Int, Int]].apply(arg))
    ...
  }

The above two-layered structure is amenable to inlining at the call site, potentially optimizing the code for call sites where the tags are statically known. On the other hand, it performs two dispatch steps, which can significantly slow down execution when the tags are not known.
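For a call site where both tags are statically known, e.g. code instantiated for Int, the two-layered bridge could fold away entirely once both layers are inlined (a hypothetical call site, using the names from the sketch above):

```scala
// T_Tag and R_Tag are the constant INT here, so after inlining both
// bridge layers the optimizer can reduce the two matches to the single
// branch:
//   int2minibox(f.asInstanceOf[Function1[Int, Int]].apply(minibox2int(arg)))
val res: Long = function_spec_bridge_JJ(INT, INT, f, arg)
```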

An alternative implementation would be:

def function_spec_bridge_JJ[T, R](T_Tag: Byte, R_Tag: Byte, f: Function1[T, R], arg: Long): Long =
  (T_Tag + 10 * R_Tag) match {
    // an expression like `INT + 10 * INT` is not a valid Scala pattern, so
    // the combined keys would be precomputed constants or guards:
    case tag if tag == INT + 10 * INT => int2minibox(f.asInstanceOf[Function1[Int, Int]].apply(minibox2int(arg)))
    ...
  }

The second solution encourages the HotSpot VM to compile function_spec_bridge_JJ without inlining it in the caller. Since the function call is not inlined, and f is always megamorphic, this requires two virtual calls but only a single switch, and it does not put pressure on the register allocator.
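The T_Tag + 10 * R_Tag encoding works because there are fewer than 10 tag values, so every (T_Tag, R_Tag) pair maps to a distinct combined key and a single switch can discriminate all combinations. A quick sanity check (the tag range is illustrative; the actual tag constants are the plugin's):

```scala
// With tag values in 0..9, all 100 combined keys are distinct:
val tags = 0 to 9
val keys = for (t <- tags; r <- tags) yield t + 10 * r
assert(keys.distinct.size == tags.size * tags.size)
```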

To sum up, it is not clear which solution would be better, or whether either would help at all. I think benchmarks should decide between the two implementations. Over the next few days I will implement both solutions and add a compiler flag to trigger one or the other, so we can benchmark.

@VladUreche
Member Author

Of course, the best solution would be to allow miniboxing stubs in specialization, so that miniboxing could call optimally into specialized code. This would provide wonderful performance, but it would require binary-incompatible changes to the Scala library...

@alexandru

+1 eagerly waiting to see the results of those benchmarks

@VladUreche
Member Author

@alexandru, sorry for the radio silence these last few days -- I had a few distractions and have been thinking about the best solution for this problem. I think I have an approach that will be significantly better: it only introduces overheads when creating a function, not when calling it. I'm looking into this now.

VladUreche added a commit that referenced this issue Jul 4, 2014
VladUreche added a commit that referenced this issue Jul 9, 2014