Skip to content

Use applyOrElse for PartialFunctions #6799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 27, 2018
Merged

Conversation

szeiger
Copy link
Contributor

@szeiger szeiger commented Jun 14, 2018

This avoids evaluating the guards twice. It can be significantly
(25% in my benchmarks) slower in the worst case (a PF with a single
clause with very efficient guard that usually does not match) but can
have much larger performance gains in other scenarios.

All flatMap-based implementations suitable for lazy collection types
are replaced by a new View.Collect primitive that uses an Iterator
instead, which is considerably faster.

Fixes scala/bug#10901

@scala-jenkins scala-jenkins added this to the 2.13.0-M5 milestone Jun 14, 2018
@@ -211,6 +211,12 @@ object View extends IterableFactory[View] {
def iterator = underlying.iterator.flatMap(f)
}

/** A view that collects elements of the underlying collection. */
@SerialVersionUID(3L)
class Collect[+A, B](underlying: SomeIterableOps[A], pf: PartialFunction[A, B]) extends AbstractView[B] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of those classes are. More specific View types like IndexedSeqView have to be able to extend them.

Copy link
Contributor

@allanrenucci allanrenucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update PartialFunction to use a local marker instead of a self returning function?

/** To implement patterns like {{{ if(pf isDefinedAt x) f1(pf(x)) else f2(x) }}} efficiently
* the following trick is used:
*
* To avoid double evaluation of pattern matchers & guards `applyOrElse` method is used here
* instead of `isDefinedAt`/`apply` pair.
*
* After call to `applyOrElse` we need both the function result it returned and
* the fact if the function's argument was contained in its domain. The only degree of freedom we have here
* to achieve this goal is tweaking with the continuation argument (`default`) of `applyOrElse` method.
* The obvious way is to throw an exception from `default` function and to catch it after
* calling `applyOrElse` but I consider this somewhat inefficient.
*
* I know only one way how you can do this task efficiently: `default` function should return unique marker object
* which never may be returned by any other (regular/partial) function. This way after calling `applyOrElse` you need
* just one reference comparison to distinguish if `pf isDefined x` or not.
*
* This correctly interacts with specialization as return type of `applyOrElse`
* (which is parameterized upper bound) can never be specialized.
*
* Here `fallback_pf` is used as both unique marker object and special fallback function that returns it.
*/
private[this] val fallback_pf: PartialFunction[Any, Any] = { case _ => fallback_pf }
private def checkFallback[B] = fallback_pf.asInstanceOf[PartialFunction[Any, B]]
private def fallbackOccurred[B](x: B) = (fallback_pf eq x.asInstanceOf[AnyRef])

@@ -416,7 +417,8 @@ trait Iterator[+A] extends IterableOnce[A] with IterableOnceOps[A, Iterator, Ite

def collect[B](pf: PartialFunction[A, B]): Iterator[B] = new AbstractIterator[B] {
// Manually buffer to avoid extra layer of wrapping with buffered
private[this] var hd: A = _
private[this] var hd: B = _
private[this] final val marker = ScalaRunTime.pfMarker // this should be static if there was a way to make it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a field? Why not a local variable in hasNext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A local variable would have to be initialized on every call to hasNext. This requires reading an object's field which is slower than a regular instance field. Ideally it should be a static field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest defining pfMarker in scala.runtime.Statics and removing this field.

try { if (pf isDefinedAt exception) Success(pf(exception)) else this } catch { case NonFatal(e) => Failure(e) }
override def recoverWith[U >: T](pf: PartialFunction[Throwable, Try[U]]): Try[U] =
try { if (pf isDefinedAt exception) pf(exception) else this } catch { case NonFatal(e) => Failure(e) }
override def recover[U >: T](pf: PartialFunction[Throwable, U]): Try[U] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, Stefan!

@@ -476,7 +476,7 @@ sealed abstract class LongMap[+T] extends AbstractMap[Long, T]
override def ++ [V1 >: T](that: scala.collection.Iterable[(Long, V1)]): LongMap[V1] = concat(that)

def collect[V2](pf: PartialFunction[(Long, T), (Long, V2)]): LongMap[V2] =
flatMap(kv => if (pf.isDefinedAt(kv)) new View.Single(pf(kv)) else View.Empty)
LongMap.from(new View.Collect(coll, pf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not builder-based implementations instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, builder-based would probably be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll use #6776 (which I just merged) for this purpose.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 28, 2018

I've added a commit to build speclib locally instead of using a pre-built binary. The old version was no longer compatible.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 28, 2018

See scala/bug#7822 and 94166682ee for context for the speclib stuff.

szeiger added 2 commits July 20, 2018 14:49
- Build automatically from sources instead of fetching a pre-built
  binary which can get out of date and needs to be updated manually

- Remove outdated build script (from the ant days)
This avoids evaluating the guards twice. It can be significantly
(25% in my benchmarks) slower in the worst case (a PF with a single
clause with very efficient guard that usually does not match) but can
have much larger performance gains in other scenarios.

All flatMap-based implementations suitable for lazy collection types
are replaced by a new `View.Collect` primitive that uses an Iterator
instead, which is considerably faster.
@szeiger
Copy link
Contributor Author

szeiger commented Jul 20, 2018

Rebased and updated to use the reusable strict implementations. I've also moved pfMarker into Statics.

Should we update PartialFunction to use a local marker instead of a self returning function?

The common wisdom so far was that a self-returning function is faster. My benchmarks show the opposite but they only test the collection-specific use-cases. I didn't want to make such a change without more data.

@szeiger szeiger merged commit 21afb09 into scala:2.13.x Jul 27, 2018
@szeiger szeiger deleted the issue/10901 branch July 27, 2018 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants