Skip to content

Pattern matcher: extractors become name-based. #2848

Merged
merged 45 commits into from Aug 20, 2013

4 participants

@paulp
paulp commented Aug 17, 2013

An extractor is no longer required to return Option[T], and
can instead return anything which directly contains methods
with these signatures:

  def isEmpty: Boolean
  def get: T

If the type of get contains methods with the names of
product selectors (_1, _2, etc.) then the type and arity
of the extraction is inferred from the type of get. If
it does not contain _1, then it is a single value
extractor analogous like Option[T].

This has significant benefits and opens new territory:

  • an AnyVal based Option-like class can be used which leverages null as None, and no allocations are necessary
  • for primitive types the benefit is squared (see below)
  • the performance difference between case classes and extractors should now be largely eliminated
  • this in turn allows us to recapture great swaths of memory which are currently squandered (e.g. every TypeRef has fields for pre and args, even though these are more than half the time NoPrefix and Nil)

Here is a primitive example:

  final class OptInt(val x: Int) extends AnyVal {
    def get: Int = x
    def isEmpty = x == Int.MinValue // or whatever is appropriate
  }
  // This boxes TWICE: Int => Integer => Some(Integer)
  def unapply(x: Int): Option[Int]
  // This boxes NONCE
  def unapply(x: Int): OptInt

As a multi-value example, after I contribute some methods to TypeRef:

  def isEmpty = false
  def get     = this
  def _1      = pre
  def _2      = sym
  def _3      = args

Then its extractor becomes

  def unapply(x: TypeRef) = x

Which, it need hardly be said, involves no allocations.

paulp added some commits Aug 15, 2013
@paulp paulp Crasher in symbol tracer. 5212aaf
@paulp paulp Repair NPE in -Ytyper-debug output. 5eb5ad5
@paulp paulp Add some standard names used in pattern matcher. 37e43d0
@paulp paulp Make memberType less crashy.
It's a source of constant peril that sym.tpe on NoSymbol
is fine (it's NoType) but tpe memberType sym on NoSymbol throws
a NSDNHO. The last thing we should be doing is discouraging
people from using memberType in favor of sym.tpe, the latter
being almost always the wrong thing.
130b5d7
@paulp paulp Some general purpose methods.
Motivated by pattern matcher work, also useful elsewhere.
4334f4c
@paulp paulp Expand the understanding of bytecode tests.
The new method is the same as sameMethodAndFieldSignatures, but ignores
generic signatures. This allows for testing methods which receive the same
descriptor but differing generic signatures. In particular, this happens
with value classes, which get a generic signature where a method written
in terms of the underlying values does not.
de1d8c3
@paulp paulp Pushed some noisy logging down to debuglog. 2a31f0a
@paulp paulp Add a helper method drop to ScalaRunTime.
We should do a lot more of this - it's ridiculously difficult
and error prone to generate code of this kind involving implicits,
type inference, etc. where the same goal is trivially accomplished
by generating a method call and letting the typer work out the
details.
44b4dcf
@paulp paulp Removed some dead code in Uncurry. e7c6108
@paulp paulp Positioned variations of inform/warning/globalError.
Because who doesn't want a little positioning in their life.
87d80ff
@paulp paulp Cosmetic cleanup in the matcher. b1d72f1
@paulp paulp Minor improvement in pattern typer inference.
This exploits the infrastructure developed for checking the
checkability of type patterns to improve pattern type inference,
which suffered from a similar deficit. There was a hack for
SI-2486, the best I could manage at the time I wrote it; that
is replaced with the principled approach.
35122d6
@paulp paulp Add checkability condition.
All parents of an intersection type must be checkable for the
type to be checkable.
9672a80
@paulp paulp Cleanups in Unapplies. b38c928
@paulp paulp An Unapplied extractor.
This makes it a lot less error prone and redundant to find the
part you need when unwrapping an UnApply tree.
e76507f
@paulp paulp Remedied glaring omission in error output.
Catching an assert and providing beautifully formatted contextual
information is a questionable service if you forget to provide the
error message from the assert.
0be0b99
@paulp paulp Segreated CPS hacks in pattern matcher. d351a1f
@paulp paulp Simplified the MaybeBoundTyped extractor a bit. 0cf47bd
@paulp paulp Deduplicate mkZero in pattern matcher. b084cab
@paulp paulp Broke up typed1's giant pattern match.
Another level of dispatch based on what trees on can expect
to see during what mode. This should be beneficial for both
performance (smaller methods, fewer type tests) and correctness
(prevent trees from reaching inappropriate typing methods by
construction rather than via ad hoc checks.)

This work also revealed that UnApply trees never reach here,
so I removed typedUnApply as dead code.
dc872cd
@paulp paulp Pulled pattern typing methods from Typers.
To the extent possible this commit is purely the extraction
of those methods into the newly created PatternTypers trait.
The slicing and dicing of those methods will follow shortly.
3349d5a
@paulp paulp Compressed central TreeMaker pattern match.
It's easier to follow the logic when it can all be seen at once.
I moved the specification excerpts down below for continued reference.
13ad734
@paulp paulp Turned TreeMaker into case class.
Type aliases are better than naked tuples, but only barely.
c5f7aac
@paulp paulp Move most of Typers#Typer#typedTyped into PatternTypers.
Again moving pattern-typing logic out of Typers.

You can tell I like writing Typers#Typer#typedTyped.
e611eea
@paulp paulp SI-4425 do some validity checking on unapplies.
Filter out unapplies which can't be called (such as those with
a second non-implicit parameter list) and report the error in a
meaningful fashion.
35775a8
@paulp paulp Expanded logic in formalTypes.
Made the super unusual move (for me) of making a method longer
instead of shorter, because I had quite a time modifying the logic
as it was. Ideally we wouldn't use this method because it is much
less efficient than necessary. If one is typing a call like this:

  def f(xs: Int*)
  f(p1, p2, p3, ..., p500)

Then it is not necessary to create a list with 500 IntTpes. Once
you run out of non-varargs types, every argument which remains can
be typed against the vararg type directly.
a2189f4
@paulp paulp Add some logging to instantiateTypeVar.
This method is right at the center of ALL KINDS of bad.
"this is quite nasty" begins the comment, and I credit its
deadpan understatement. A little logging is the least we
can offer the next guy.
8c94e9c
@paulp paulp Cleanups in Typers.
Should be behaviorally neutral. A little more logging and a
little better style.
a31f1f0
@paulp paulp Simplify management of pattern vars.
Always set their info the same way, and when doing
so always translate repeated types into sequence types.
e60e837
@paulp paulp Stylistic cleanups in patmat.
Should be behaviorally neutral.
db2e756
@paulp paulp Introduced classes to encapsulate extractor info.
To be put to use in a commit shortly to follow.
b895541
@paulp paulp Pattern matcher: extractors become name-based.
An extractor is no longer required to return Option[T], and
can instead return anything which directly contains methods
with these signatures:

  def isEmpty: Boolean
  def get: T

If the type of get contains methods with the names of
product selectors (_1, _2, etc.) then the type and arity
of the extraction is inferred from the type of get. If
it does not contain _1, then it is a single value
extractor analogous like Option[T].

This has significant benefits and opens new territory:

  - an AnyVal based Option-like class can be used which
    leverages null as None, and no allocations are necessary
  - for primitive types the benefit is squared (see below)
  - the performance difference between case classes and
    extractors should now be largely eliminated
  - this in turn allows us to recapture great swaths of
    memory which are currently squandered (e.g. every
    TypeRef has fields for pre and args, even though these
    are more than half the time NoPrefix and Nil)

Here is a primitive example:

  final class OptInt(val x: Int) extends AnyVal {
    def get: Int = x
    def isEmpty = x == Int.MinValue // or whatever is appropriate
  }
  // This boxes TWICE: Int => Integer => Some(Integer)
  def unapply(x: Int): Option[Int]
  // This boxes NONCE
  def unapply(x: Int): OptInt

As a multi-value example, after I contribute some methods to TypeRef:

  def isEmpty = false
  def get     = this
  def _1      = pre
  def _2      = sym
  def _3      = args

Then it's extractor becomes

  def unapply(x: TypeRef) = x

Which, it need hardly be said, involves no allocations.
8f05647
@paulp paulp New tests for name-based pattern matcher. 1cd7a9e
@paulp
paulp commented Aug 17, 2013

I didn't want to mess with #2835 because it has those green check marks. This commit ends up at almost exactly the same place, but with much more granularity in the key commit.

was: 14 files changed, 1011 insertions(+), 806 deletions(-)
now: 9 files changed, 333 insertions(+), 419 deletions(-)

The only change in the end result is a bugfix which I found trying to compile scalariform involving nested bindings like b1 @ (b2: Foo). I couldn't reproduce the overloading issue seen in jenkins, but maybe somehow it will turn out to be related to that fix.

paulp and others added some commits Aug 17, 2013
@paulp paulp Introduced case class BoundTree.
The first step in improving the handling of (Symbol, Tree).
4f6b16a
@paulp paulp Move more pattern logic into BoundTree. 54bb76b
@xeno-by xeno-by SI-5903 extractor macros
Establishes a pattern that can be used to implement extractor macros
that give the programmer control over signatures of unapplications
at compile-time.

=== The pattern ===

In a nutshell, given an unapply method (for simplicity, in this
example the scrutinee is of a concrete type, but it's also possible
to have the extractor be polymorphic, as demonstrated in the tests):

```
def unapply(x: SomeType) = ???
```

One can write a macro that generates extraction signatures for unapply
on per-call basis, using the target of the calls (c.prefix) and the type
of the scrutinee (that comes with x), and then communicate these signatures
to the typechecker.

For example, here's how one can define a macro that simply passes
the scrutinee back to the pattern match (for information on how to
express signatures that involve multiple extractees, visit
scala/scala#2848).

```
def unapply(x: SomeType) = macro impl
def impl(c: Context)(x: c.Tree) = {
  q"""
    new {
      class Match(x: SomeType) {
        def isEmpty = false
        def get = x
      }
      def unapply(x: SomeType) = new Match(x)
    }.unapply($x)
  """
}
```

In addition to the matcher, which implements domain-specific
matching logic, there's quite a bit of boilerplate here, but
every part of it looks necessary to arrange a non-frustrating dialogue
with the typer. Maybe something better can be done in this department,
but I can't see how, without introducing modifications to the typechecker.

Even though the pattern uses structural types, somehow no reflective calls
are being generated (as verified by -Xlog-reflective-calls and then
by manual examination of the produced code). That's a mystery to me, but
that's also good news, since that means that extractor macros aren't
going to induce performance penalties.

Almost. Unfortunately, I couldn't turn matchers into value classes
because one can't declare value classes local. Nevertheless,
I'm leaving a canary in place (neg/t5903e) that will let us know
once this restriction is lifted.

=== Use cases ===

In particular, the pattern can be used to implement shapeshifting
pattern matchers for string interpolators without resorting to dirty
tricks. For example, quasiquote unapplications can be unhardcoded now:

```
def doTypedApply(tree: Tree, fun0: Tree, args: List[Tree], ...) = {
  ...
  fun.tpe match {
    case ExtractorType(unapply) if mode.inPatternMode =>
      // this hardcode in Typers.scala is no longer necessary
      if (unapply == QuasiquoteClass_api_unapply) macroExpandUnapply(...)
      else doTypedUnapply(tree, fun0, fun, args, mode, pt)
  }
}
```

Rough implementation strategy here would involve writing an extractor
macro that destructures c.prefix, analyzes parts of StringContext and
then generates an appropriate matcher as outlined above.

=== Implementation details ===

No modifications to core logic of typer or patmat are necessary,
as we're just piggybacking on scala/scala#2848.

The only minor change I introduced is a guard against misbehaving
extractor macros that don't conform to the pattern (e.g. expand into
blocks or whatever else). Without the guard we'd crash with an NPE,
with the guard we get a sane compilation error.
84a3359
@paulp paulp Pull translatePattern entirely into BoundTree. ef30ea3
@paulp paulp Reworked MaybeTypedBound. 017460e
@paulp paulp Finish segregating patmat cps hacks. 22b82a4
@paulp paulp Merge remote-tracking branch 'xeno-by/topic/patmat2' into patmat2 a9efda4
@paulp paulp Revert "Minor improvement in pattern typer inference."
This reverts commit 35122d6cda84bb2df69ca51c6b1b80e61693bf6f.
It also includes a test case embodying the reversion reason:
the test case no longer compiled. Parties interested in the
surrounding details may want to look at SI-7472.
a905d0e
@paulp
paulp commented Aug 18, 2013

@dragos wasn't what I thought. Hopefully a905d0e will involve real-passing instead of faux-passing.

@paulp paulp Refined name-based patmat methods.
This fleshes out some of the slightly unfinished corners
of the adventure, especially for unapplySeq. There's still
an unhealthy amount of duplication and a paucity of
specification, but I think it's in eminently good shape
for a milestone.
6d77da3
@paulp
paulp commented Aug 18, 2013

For ease of reference:

[INFO] org.scala-ide.sdt.build ........................... SUCCESS [0.523s]
[INFO] org.scala-ide.sdt.aspects ......................... SUCCESS [5.057s]
[INFO] org.scala-ide.sdt.core ............................ SUCCESS [52.015s]
[INFO] org.scala-ide.sdt.core.tests ...................... SUCCESS [2:06.068s]
[INFO] org.scala-ide.sdt.spy ............................. SUCCESS [4.982s]
[INFO] org.scala-ide.sdt.debug ........................... SUCCESS [14.482s]
[INFO] org.scala-ide.sdt.debug.tests ..................... SUCCESS [58.897s]
[INFO] org.scala-ide.sdt.weaving.feature ................. SUCCESS [0.127s]
[INFO] org.scala-ide.sdt.feature ......................... SUCCESS [0.135s]
[INFO] org.scala-ide.sdt.dev.feature ..................... SUCCESS [0.133s]
[INFO] org.scala-ide.sdt.source.feature .................. SUCCESS [0.079s]
[INFO] org.scala-ide.sdt.update-site ..................... SUCCESS [3.463s]
@paulp
paulp commented Aug 18, 2013

All those happy successes are after a905d0e. After 6d77da3 it went back to failing, but I can't for the life of me make any sense of the six megabytes of console output. Whether the failure is spurious or not, I'd think it would be better if it would not abandon ship at the first sniff of something wrong. I'm just going to assume it's spurious unless someone tells me otherwise.

 [INFO] org.scala-ide.sdt.build ........................... SUCCESS [0.479s]
 [INFO] org.scala-ide.sdt.aspects ......................... SUCCESS [4.516s]
 [INFO] org.scala-ide.sdt.core ............................ SUCCESS [43.115s]
 [INFO] org.scala-ide.sdt.core.tests ...................... FAILURE [1:57.485s]
 [INFO] org.scala-ide.sdt.spy ............................. SKIPPED
 [INFO] org.scala-ide.sdt.debug ........................... SKIPPED
 [INFO] org.scala-ide.sdt.debug.tests ..................... SKIPPED
 [INFO] org.scala-ide.sdt.weaving.feature ................. SKIPPED
 [INFO] org.scala-ide.sdt.feature ......................... SKIPPED
 [INFO] org.scala-ide.sdt.dev.feature ..................... SKIPPED
 [INFO] org.scala-ide.sdt.source.feature .................. SKIPPED
 [INFO] org.scala-ide.sdt.update-site ..................... SKIPPED
paulp added some commits Aug 18, 2013
@paulp paulp An unapplySeq-via-String test.
There are a lot of details yet to be ironed out when it comes
to sequences, but at least here's a little evidence that the basic
mechanisms work.
b3d9dfa
@paulp paulp Refinement of name-based unapplySeq.
Can't finnesse the drop method. Call it blindly for now, even
though in the long run you won't have to write drop.
6d4e71c
@dragos
dragos commented Aug 19, 2013

@paulp I'm going to have a look.

@dragos
dragos commented Aug 19, 2013

@paulp That failure is spurious.

@adriaanm
The Scala Programming Language member

@paulp, thanks for splitting up the PR further -- LEG(TM)

@adriaanm
The Scala Programming Language member

(The 'E' reminded me of another adjective. Epic.)

@adriaanm
The Scala Programming Language member

LGTM -- let's merge this tomorrow-Tuesday

@paulp paulp Merge branch 'master' into patmat
Conflicts:
	src/compiler/scala/tools/nsc/Global.scala
	src/compiler/scala/tools/nsc/transform/patmat/MatchTranslation.scala
01f771e
@paulp
paulp commented Aug 20, 2013

I'm resolving the merge conflict. I'm not rebasing since I'll lose those 44 green checkmarks. Let me know if I should rebase anyway.

@adriaanm
The Scala Programming Language member

No, thanks for resolving the conflict. I was about to merge this PR, so I'll wait until 01f771e goes green to merge anything else.

@paulp
paulp commented Aug 20, 2013

@adriaanm it has gone to state "happy green", now initiating breath holding sequence.

@adriaanm
The Scala Programming Language member

Hope you haven't turned blue yet. Stay happy green!

@adriaanm
The Scala Programming Language member

I was doing some quick local checking. Will hit merge button in 5 4

@adriaanm adriaanm merged commit 38d5b20 into scala:master Aug 20, 2013

1 check passed

Details default pr-scala Took 78 min.
@adriaanm
The Scala Programming Language member

:fireworks:

@paulp paulp deleted the unknown repository branch Aug 20, 2013
@paulp
paulp commented Aug 20, 2013

Hope you haven't turned blue yet. Stay happy green!

https://wiki.jenkins-ci.org/display/JENKINS/Green+Balls

@adriaanm
The Scala Programming Language member

The green is a bit... radio-active, but otherwise this could very well turn out to be my favorite jenkins plugin. (Not the hardest contest.)

@retronym retronym added a commit to retronym/scala that referenced this pull request Dec 12, 2013
@retronym retronym Test case for recently improved unchecked warning
Prior to scala#2848, the enclosed
test compiled without warning and printed:

    true
    false

Features interacting:
  - implicit class tags to enable type patterns on abstract types
  - type tests on compound types.

I think the unchecked warning is acceptable for now.
c4e1b03
@retronym retronym added a commit to retronym/scala that referenced this pull request Mar 12, 2014
@retronym retronym SI-8395 Regression in pattern matching with nested binds
Regressed in scala#2848. In particular,
see 0cf47bd and 017460e.

This commit does away with the `unbind` in the `TypeBound` extractor,
as this was burrowing through

Because of the regression, this pattern:

    (s @ (_s @ (_: String)))

was translated into `typeTestStep`, rather than a `bindingStep`. This
came down the the use of `unbind` in the `TypeBound` extractor.

My first step was to remove the `unbind`. That led to another
problem: the tree now matches `SymbolAndTypeBound`, which extracted
`symbol = s, tpe = String`, ignoring the `_s`. I changed that
extractor to no longer recursively apply to the sub-pattern tree,
which is what `MaybeTypedBound` used to do.

I also checked for other uses of `unbind` in the match translation.
The only place I found it is in `BoundTree#pt`. I believe that this
usage is correct, or at least, not obviously incorrect.
b34af76
@retronym retronym added a commit to retronym/scala that referenced this pull request Mar 12, 2014
@retronym retronym SI-8395 Regression in pattern matching with nested binds
Regressed in scala#2848. In particular,
see 0cf47bd and 017460e.

Because of the regression, this pattern:

    (s @ (_s @ (_: String)))

was translated into `typeTestStep`, rather than a `bindingStep`. This
came down the the use of `unbind` in the `TypeBound` extractor.

My first step was to remove the `unbind`. That led to another
problem: the tree now matches `SymbolAndTypeBound`, which extracted
`symbol = s, tpe = String`, ignoring the `_s`. I changed that
extractor to no longer recursively apply to the sub-pattern tree,
which is what `MaybeTypedBound` used to do.

I also checked for other uses of `unbind` in the match translation.
The only place I found it is in `BoundTree#pt`. I believe that this
usage is correct, or at least, not obviously incorrect.
5ef842e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.