New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove parallel collection views and, with them, Gen*View #3191
Conversation
In 2.12, this gives us the option to move the code from Gen*View down into *View. If we don't do something more drastic with views, which inertia and history suggests is a real possibility, we can at least shed a little of the implementation. These abstractions are *only* used to share implementation; there is no `view` method available on, for instance, `GenSeq` that lets one abstract over parallel/sequential collections while spawning views. scala> (List(1): collection.GenSeq[Int]).view <console>:8: error: value view is not a member of scala.collection.GenSeq[Int] (List(1): collection.GenSeq[Int]).view ^ Let's keep it that way. I suspect that views over parallel collections exist not because they were the most sought after feature, but rather because the initial incarnatin of parallel collections used to live undernead TraversableOnce, and hence were obligated to implement `def view`. This change will give us deprecation warnings in the non deprecated places that extend `Gen*View` (three, by my count) in the interim. There are ways to avoid this, but they aren't particularly appealing.
Review by @Ichoran @jsuereth @adriaanm @axel22
Secondary is review of the code change itself. If we decided to go ahead with this, I'll need to go through that again with a fine toothcomb to confirm linearization order hasn't accidentally changed. We can do that with tooling. If we don't feel comfortable with this for 2.11, we could think about shooting off a 2.12 branch (I don't think that should be |
In "To Kill a View" @odersky points out that the
His suggestion was to:
This PR is a slightly different takes a different path to the same destination: |
Views in parallel collections are something I never really liked a lot, and I think they need a completely different hierarchy, since the operations that take can implement reasonably efficient comprise only a subset of all collection operations. Expecting to replace any occurence of a collection/parallel collection with a view and reap the benefits is in vain. So if you want a quick ok after a quick glance on the changeset, without studying this PR carefully - LGTM. However, isn't the standard way of doing things now going through a deprecation cycle? |
While I think this is a move into the right direction, I think it should go into 2.12 (never thought I would be one of those who doesn't want to ship everything immediately :-D), because this should be - in the best case - only the tip of the iceberg. In my opinion, we need to have a look at the whole collection framework and rethink our strategy we have concerning performance and the things Java 8 ships.
The only advantage of views compared to Java 8's Streams as far as I see is that one can't "exhaust" them. I mean, IF we can make views comparably fast, then we have a huge issue less (then only the question whether we should do it by default remains ... I think that would be the right thing, but even if we don't we still don't mandate more boilerplate than Java), if we can't make it work, we need to consider other options: As far as I see, individual collection types shine when we use individual operation for which they were built, but bulk-operations can almost never use these advantages, because what matters most here is pure iteration speed (e. g. having some special data-structure with a blazingly fast operation X just can't show its advantages when we spent a considerable amount of time building and rebuilding the collection after pretty much every step ... which are probably exactly those types of operations many special purpose data-structures are not fast at). Maybe a "common" (e. g. one type) way to facilitate these bulk-operations similar to Streams would prove to be more beneficial, because it would allow us to really tune all the operations there instead of having to deal with * plus the issue of all the overhead of the mentioned reification of the data structure after every operation. A completely different approach would be to keep things the way they are and add some notion of purity to the compiler, so that the compiler can avoid the overhead if it isn't observable from the outside. I'm not sure this is practical though, considering that we want to have less complexity, not more and purity is a huge, complicated topic which we probably won't be able to spec, implement and ship in the next few releases. In the end, if we really want to fix the remaining issues of the collection library, we have to think about what level of breakage is acceptable. I think it is possible to preserve source compatibility to a large extent, but I think all those people with custom collections will have to fix things manually. Wow, this has gotten quite long ... maybe I should post to the mailing list instead. |
Idea: Maybe we can make parallel collections lazy by default (what I mean by that is to avoid building all the immediate data structures mentioned above), because there aren't any guarantees regarding the order in which things happen anyway. |
Well, there are guarantees currently - there is an ordering between different operations in e.g. soc notifications@github.com wrote:
Sent from my Android phone with K-9 Mail. Please excuse my brevity. |
@soc We have to tear things down, before we can build them up anew. Parallel views are the low hanging fruit that I think we should seriously consider for this release. There is no way we can migrate to an approach were we auto-fuse There is some research happening at EPFL into (parallel) collections 2.0, part of which is a @axel22: My counter argument for the deprecation cycle here is that |
In that case, LGTM! Jason Zaugg notifications@github.com wrote:
Sent from my Android phone with K-9 Mail. Please excuse my brevity. |
+1! On Sunday, November 24, 2013, Aleksandar Prokopec wrote:
|
For the sequential views that we retain, we still are faced with the "view gaps." These arise when new methods are added to the collections API and are not overriden in the view. For example: scala> List(1).view.distinct.force
java.lang.UnsupportedOperationException: SeqView(...).newBuilder
scala> List(1).view.inits.toList
java.lang.UnsupportedOperationException: SeqView(...).newBuilder It turns out these are pretty easy to fix, once you spot them. (This fact was never clear to me before.) Once just needs to add: override def distinct: This =
newForced(thisSeq.distinct).asInstanceOf[This] That will force all the prior operations over then entire collection, but, hey, that's better than a So, all that remains is a robust way to identify the gaps, and to prevent more from opening up. The following reflects on the views and finds all methods which produce a new view (read: ones that return a type containing "Repr") that are inherited from somewhere other than "*ViewLike":
I will plug these and backstop with a test that runs this report. There are of course deeper issues with views that these two changes won't address: their strength (transparency) is also a weakness: it's hard to predict whether they will be forced when you hand them off to unknown code that operates over, e.g, |
I've plugged the gaps in sequential views in #3192 |
PLS REBUILD ALL |
(kitty-note-to-self: ignore 29199430) |
@retronym regarding “We have to tear things down, before we can build them up anew.”, I agree. Even if we would reintroduce them with improved/fixed semantics and better performance, a deprecation seems to be the right thing to do. |
For the record, I disagree, and that's why I never did it. Lots of manual maintenance is a possible way to promote the current view architecture from unusable to terrible, and allow you to keep it for a lot longer. If adding a method to the collections requires changes in half a dozen places, and if you miss one it all compiles and runs and breaks in the distant future when someone actually uses views, then it's being done wrong and you will never run out of UnsupportedOperationErrors to chase down (and heaven help them if you ever decide to do something better with your time.) Independently of all that, it's still not clear that it's better. If you are naively using views it is very likely the execution model matters. Would you rather see an exception the first time you run it, if the alternative is that your application is 100x slower than it should be? Not everyone is better off with the slow option, by any means. Good doctors know that sometimes life support is not the merciful option. Views should have a DNR order. |
@paulp With the new test cases and the deprecation, isn't that already moving into exactly this direction? Btw, I' curious of the impact on jar file size ... :-) |
As long as the white-knuckled clinging to the view code continues, the only direction I can see it moving in is circles. |
Also, don't expect my approval of anything, this is lose/lose no matter how it goes. Remember that this is a good chunk of the reason I quit, so if you really do fix it now, imagine how that looks from here. |
Views can only ever be a best-effort performance wise given the breadth of |
That's what the test guards against. Looks like we have managed to deprecate Forwarder/Proxy, but otherwise the same approach would work for them. I know you you that, as you wrote the same reflective checks many moons ago. |
Sorry, I thought we were over in #3192, where that test lives. |
PLS REBUILD ALL |
(kitty-note-to-self: ignore 29207338) |
I'm not really sure what to make of that. Here's what's happened:
I don't buy into arguments that we shouldn't fix code incrementally as that weakens that case for removing it altogether. All the more when the effort to fix it is relatively small. |
@retronym Don't take anything I say about this as criticism of you. There's nothing wrong with your choices. You are an enabler to some extent, but so was I, more so and for longer. |
I hadn't seen the test you're talking about yet. It's true, and at the time I wrote that it wouldn't have done any good because it wouldn't have been run before code was checked in. Now things are different, you could integrate it into the PR tests. |
(Sounds like you've found it already, but for the benefit of onlookers) It's this one: https://github.com/scala/scala/pull/3192/files#diff-873f35ad470869e7a7df74e7169f9b56R1 |
While I fully agree that this is a step in the right direction - just to give my two cents: I second Paul's philosophy that something called a I hope that the future redesign Not being a native speaker and a professional scrabble player, I often don't understand Paul's remarks :) I'm not saying that this PR is in some way wrong, it's an iterative step mending the current state of views. In a future redesign, I would consider having them have a different hierarchy of methods more reasonable for their lazyness. For example, we've made the distinction between |
Agreed this is a step in the right direction, and I hope we can address Paul & Alex's very valid concerns by making the deprecation message much stronger. Something along the lines of "Do not call |
Without having tried to do so, my intuition is that we can alert ourselves to the necessity of attending to view performance in addition to keeping them up to date via reflection. Usually problems with view performance are easy to detect (factors of two), and we can use the same mechanism to demand performance tests as we use to demand an implementation so you don't throw runtime errors. |
@adriaanm : @paulp and @axel22 are discussing a broader problem with all views: as they implement, e.g. That said, I don't mind strengthening the wording on the deprecation. I suspect we'll have approximately zero clients of that API, though. We didn't have any in our test suite. To recap: my pair of view PRs don't attempt to solve this problem, but instead:
|
@axel22 Don't worry, native english speakers find me equally inscrutable. Thanks, your statement isn't precisely what I meant but it's close enough and I certainly agree with it. |
PLS REBUILD ALL |
(kitty-note-to-self: ignore 29295187) |
- code that used to be inherited in *View is now inlined - the `view` methods on `ParIteratoa` and `ParSeq` now convert to sequential collections, and are deprecated asking the user to do this explicitly in the future. Should be largely source compatible with 2.10.x, on the assumption that the removed classes, while being public, were internal implementation details. A few tests used now-removed classes to demonstrate compiler crashes. I managed to confirm that after my decoupling, t4365 still exercises the bug: % qbin/scalac test/files/pos/t4365/*.scala warning: there were 2 deprecation warning(s); re-run with -deprecation for details one warning found % scalac-hash 7b4e450 test/files/pos/t4365/*.scala warning: there were 2 deprecation warning(s); re-run with -deprecation for details one warning found % scalac-hash 7b4e450~1 test/files/pos/t4365/*.scala 2<&1 | grep -i wrong error: something is wrong: cannot make sense of type application something is wrong: cannot make sense of type application something is wrong: cannot make sense of type application I didn't manage to do the same for specializes-sym-crash.scala, and instead just made it compile.
thanks for clearing that up, I glossed over a little too fast there |
This one is good to go from my side. I think we have a consensus that we are source compatible enough, the only clients affected are those that referred to the I'd better give it a: PLS REBUILD ALL To make sure it doesn't clash with the recently merged #3192. |
Remove parallel collection views and, with them, Gen*View
This removes one layer from a many-layered implementation.
Maybe that will be enough to help us fix a bug or two.