From b65bf77803033a08f0c2002e4c6397eba6533eca Mon Sep 17 00:00:00 2001 From: Dmitri Gribenko Date: Tue, 1 Mar 2016 17:27:48 -0800 Subject: [PATCH 01/20] Add NNNN-collections-move-indices.md --- proposals/NNNN-collections-move-indices.md | 837 +++++++++++++++++++++ 1 file changed, 837 insertions(+) create mode 100644 proposals/NNNN-collections-move-indices.md diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md new file mode 100644 index 0000000000..c9c269e9b4 --- /dev/null +++ b/proposals/NNNN-collections-move-indices.md @@ -0,0 +1,837 @@ +# New collections model + +* Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-collections-move-indices.md) +* Author(s): [Dmitri Gribenko](https://github.com/gribozavr), [Dave Abrahams](https://github.com/dabrahams), [Maxim Moiseev](https://github.com/moiseev) +* Status: **Awaiting review** +* Review manager: TBD + +## Introduction + +We are proposing a new model for collections, where indices can only be +advanced forward or backward by the corresponding collection instance. +Indices become opaque tokens representing collection positions, that can +be produced and consumed by collection APIs. This allows us to reduce +the amount of data stored in indices to the bare minimum. + +Compared to the current state, the new scheme simplifies implementation +of non-trivial indices, and fixes concurrency issues in `Set` and +`Dictionary` indices. It also allows us to eliminate reference-counted +stored properties from most indices, including non-trivial ones, like +`Set.Index` and `Dictionary.Index`, creating more optimizable code. + +Out of scope for this proposal: + +* Expanding the set of concrete collections provided by the standard + library. + +* Expanding the set of collection protocols to provide functionality + beyond what is already provided (for example, protocols for sorted + collections, queues etc.) Discussing how other concrete collections + fit into the current protocol hierarchy is in scope, though. + +Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution) + +## Motivation + +Swift standard library defines one basic `Collection` protocol, and +three kinds of collection indices: forward, bidirectional and random +access. A concrete collection type uses one of these indices based on +the capabilities of the backing data structure. For example, a +singly-linked list can only have forward indices, a tree with parent +pointers has bidirectional indices, while `Array` and `Deque` have +random access indices. + +In our experience implementing non-trivial collections it turned out +that instances of forward and bidirectional indices need to hold a +reference to the storage of the collection that they traverse, in order +to implement `successor()` and `predecessor()` APIs. For example, +imagine an index into a hashtable with open addressing: in order to find +the next element after a given one, we need to inspect the storage +contents to skip empty buckets in the hashtable. + +Indices that store references to underlying collection's storage have +disadvantages: + +* Code that handles indices has to perform reference counting, which + blocks some optimizations, and definitely means more work at runtime. + +* Indices that keep references to collections' storage conflict with + copy-on-write. A live index makes underlying storage non-uniquely + referenced, forcing unnecessary copies when the collection is mutated. + +In the standard library, `Dictionary` and `Set` have to use a +double-indirection trick to avoid forcing a copy of the storage when +there is a live index. Unfortunately, this trick is not thread-safe: it +allows a data race when an index is incremented concurrently with the +collection being mutated. + +Our experience working with the current collection protocols suggests +that we should consider other schemes that don't require such +double-indirection tricks or other undue performance burdens. + +### Why Dictionary and Set indices are not thread-safe + +`Dictionary` and `Set` use a double-indirection trick to avoid +disturbing the reference count of the storage with indices. + +``` + +--+ class struct + |RC|---------+ +-----------------+ + +--+ Storage |<---------| DictionaryIndex | + | | | value | + +----------+ +-----------------+ + ^ + +--+ | class struct + |RC|-------------+ +------------+ + +--+ Indirection |<-------| Dictionary | + | ("owner") | | value | + +--------------+ +------------+ +``` + +Instances of `Dictionary` point to an indirection, while instances of +`DictionaryIndex` point to the storage itself. This allows us to have +two separate reference counts. One of the refcounts tracks just the +live `Dictionary` instances, which allows us to perform precise +uniqueness checks. + +The issue that we were previously unaware of is that this scheme is not +thread-safe. When uniquely-referenced storage is being mutated in +place, indices can be concurrently being incremented (on a different +thread). This would be a read/write data race. + +Fixing this data race (to provide memory safety) would require locking +dictionary storage on every access, which would be an unacceptable +performance penalty. + +## Proposed solution + +We propose to allow implementing collections whose indices don't have +reference-countable stored properties. + +From the API standpoint, this implies that indices can't be moved +forward or backward by themselves (for example, calling `i.successor()` +is not possible anymore for an arbitrary index). Only the corresponding +collection instance can advance indices (e.g., `c.next(i)`). By +removing API requirements from indices, we reduce the required amount of +information that indices need to store or reference. + +In this model indices can store the minimal amount of information only +about the element position in the collection. Usually index can be +represented a few of word-sized integers (typically, just one) that +efficiently encode the "path" in the data structure from the root to the +element. Since one is free to choose the encoding of the "path", we +think that it should be possible to choose it in such a way that indices +are cheaply comparable. + +### Protocol hierarchy + +In the proposed model indices don't have any method or property +requirements (these APIs were moved to Collection), so index protocols +are eliminated. Instead, we are introducing `BidirectionalCollection` +and `RandomAccessCollection`. These protocols naturally compose with +existing `MutableCollection` and `RangeReplaceableCollection` to +describe the collection's capabilities: + +```swift +protocol Sequence { ... } +protocol Collection : Sequence { ... } + +protocol MutableCollection : Collection { ... } +protocol RangeReplaceableCollection : Collection { ... } + +protocol BidirectionalCollection : Collection { ... } // new +protocol RandomAccessCollection : BidirectionalCollection { ... } // new +``` + +``` + Sequence + ^ + | + + + Collection + ^ ^ + | +--------+ + BidirectionalCollection | | + ^ | MutableCollection + | | + | | + RandomAccessCollection RangeReplaceableCollection +``` + +### Analysis + +Advantages of the proposed model: + +* Indices don't need to keep a reference to the collection. + - Indices are simpler to implement. + - Indices are not reference-countable, and thus cheaper to + handle. + - Handling indices does not cause refcounting, and does not block + optimizations. + +* The hierarchy of index protocols is removed, and is replaced with + protocols for forward, bidirectional and random-access + collections. + - This is closer to how people generally talk about collections. + - Writing a generic constraint for bidirectional and + random-access collections becomes simpler. + +* Indices can conform to `Comparable` without incurring extra + memory overhead. Indices need to store all the necessary data + anyway. + + This allows, for example, to relax the precondition on the + `distance(from:to:)` method: even for forward collections, it is now + possible to measure the distance from a later index to an earlier + index. + +* `Dictionary` and `Set` indices (and other non-trivial indices) can't + create opportunities for data races. + +* While this model allows to design indices that are not + reference-countable, it does not prohibit defining indices that + *are* reference countable. + - All existing collection designs are still possible to + implement, but some are less efficient than the new model allows. + - If there is a specialized collection that needs + reference-countable indices for algorithmic or memory + efficiency, where such tradeoff is reasonable, such a + collection is still possible to implement in the new model. + See the discussion of trees below, the tree design (2)(c). + +Neutral as compared to the current collections: + +* A value-typed linked list still can't conform to `Collection`. + A reference-typed one can. + +Disadvantages of the proposed collections model: + +* Advancing an index forward or backward becomes harder -- the statement + now includes two entities (collection and index): + + ``` + j = c.next(i) vs. j = i.successor() + ``` + + In practice though, we found that when the code is performing index + manipulations, the collection is typically still around stored in a + variable, so the code does not need to reach out for it in a + non-trivial way. + +* Collection's API now includes methods for advancing indices. + +### Implementation difficulties + +We have a couple of implementation difficulties that will be solved with +further improvements to the compiler or the generics system. These +issues will cause the new API to be suboptimal in the short term, but as +the necessary compiler features will be implemented, the API will +improve. We don't consider any of these issues to be blockers to adopt +the proposed collections model. + +* `Range` has conflicting requirements: + + - range bounds need to be comparable and incrementable, in order + for `Range` to conform to `Collection`, + + - we frequently want to use `Range` as a "transport" data type, just + to carry a pair of indices around (for example, as an argument for + `removeSubrange(_:)`). Indices are neither comparable nor + incrementable. + + Solution: add conditional a conformance for `Range` to `Collection` + when the bounds conform to `Strideable`. We don't have this compiler + feature now. As a workaround, we will introduce a parallel type, + `StrideableRange`. + +2. We can't specify constraints on associated types. This forces many + algorithms to specify constraints that should be implied by + `Sequence` or `Collection` conformances. + + Solution: constraints on associated types are a desirable + language feature, part of the Swift generics model. This issue + will be fixed by compiler improvements. + +## Case study: trees + +Trees are very interesting data structures with many unique +requirements. We are interested in allowing efficient and memory-safe +implementations of collections based on a search trees (e.g., RB trees +or B-trees). The specific requirements are as follows. + +- The collection and indices should be memory-safe. They should provide + good QoI in the form of precondition traps. Ensuring memory-safety + shouldn't cause unreasonable performance or memory overhead. + +- Collection instances should be able to share nodes on mutation + (persistent data structures). + +- Subscript on an index should cost at worst amortized O(1). + +- Advancing an index to the next or previous position should cost + at worst amortized O(1). + +- Indices should not contain reference countable stored properties. + +- Mutating or deleting an element in the collection should not + invalidate indices pointing at other elements. + + This design constraint needs some extra motivation, because it might + not be obvious. Preserving index validity across mutation is + important for algorithms that iterate over the tree and mutate it in + place, for example, removing a subrange of elements between two + indices, or removing elements that don't satisfy a predicate. When + implementing such an algorithm, you would typically have an index that + points to the current element. You can copy the index, advance it, + and then remove the previous element using its index. If the mutation + of the tree invalidates all indices, it is not possible to continue + the iteration. Thus, it is desired to invalidate just one index for + the element that was deleted. + +It is not possible to satisfy all of these requirements at the same +time. Designs that cover some of the requirements are possible. + +1. Persistent trees with O(log n) subscripting and advancing, and strict + index invalidation. + + If we choose to make a persistent data structure with node reuse, + then the tree nodes can't have parent pointers (a node can have + multiple different parents in different trees). This means that it + is not possible to advance an index in O(1). If we need to go up the + tree while advancing the index, without parent pointers we would need + to traverse the tree starting from the root in O(log n). + + Thus, index has to essentially store a path through the tree from the + root to the node (it is usually possible to encode this path in a + 64-bit number). Since the index stores the path, subscripting on + such an index would also cost O(log n). + + We should note that persistent trees typically use B-trees, so the + base of the logarithm would be typically large (e.g., 32). We also + know that the size of the RAM is limited. Thus, we could treat the + O(log n) complexity as effectively constant for all practical + purposes. But the constant factor will be much larger than in other + designs. + + Swift's collection index model does not change anything as compared + to other languages. The important point is that the proposed index + model allows such implementations of persistent collections. + +2. Trees with O(1) subscripting and advancing. + + If we want subscripting to be O(1), then the index has to store a + pointer to a tree node. Since we want avoid reference countable + properties in indices, the node pointer should either be + `unsafe(unowned)` or an `UnsafePointer`. These pointers can't be + dereferenced safely without knowing in advance that it is safe to do + so. We need some way to quickly check if it is safe to dereference + the pointer stored in the node. + + A pointer to a tree node can become invalid when the node was + deallocated. A tree node can be deallocated if the corresponding + element is removed from the tree. + + (a) Trees with O(1) subscripting and advancing, and strict index + invalidation. + + One simple way to perform the safety check when + dereferencing the unsafe pointer stored within the index + would be to detect any tree mutation between the index + creation and index use. It is simple to do with version + numbers: we add an ID number to every tree. This ID would + be unique among all trees created within the process, and it + would be re-generated on every mutation. The tree ID is + copied into every index. When the index is used with the + collection, we check that the ID of the tree matches the + tree ID stored in the index. This fast check ensures memory + safety. + + (b) Trees with O(1) subscripting and advancing, permissive index + invalidation, and extra storage to ensure memory safety. + + Another way to perform the safety check would be to directly + check if the unsafe pointer stored in the index is actually + linked into the tree. To do that with acceptable time + complexity, we would need to have an extra data structure + for every tree, for example, a hash table-based set of all + node pointers. With this design, all index operations get + an O(1) hit for the hash table lookup for the safety check, + but we get an O(log n) memory overhead for the extra data + structure. On the other hand, a tree already has O(log n) + memory overhead for allocating nodes themselves, thus the + extra data structure would only increase the constant factor + on memory overhead. + +| | (1) | (2)(a) | (2)(b) | +| -------------------------------------|----------|---------|------- | +| Memory-safe | Yes | Yes | Yes | +| Indices are not reference-counted | Yes | Yes | Yes | +| Shares nodes | Yes | No | No | +| Subscripting on an index | O(log n) | O(1) | O(1) | +| Advancing an index | O(log n) | O(1) | O(1) | +| Deleting does not invalidate indices | No | No | Yes | +| Extra O(n) storage just for safety checks | No | No | Yes | + +Each of the designs discussed above has its uses, but the intuition +is that (2)(a) is the one most commonly needed in practice. (2)(a) +does not have the desired index invalidation properties. There is +a small number of commonly used algorithms that require that +property, and they can be provided as methods on the collection, +for example `removeAll(in: Range)` and +`removeAll(_: (Element)->Bool)`. + +If we were to allow reference-counted indices (basically, the +current collections model), then an additional design is possible +-- let's call it (2)(c) for the purpose of discussion. This design +would be like (2)(b), but won't require extra storage that is used +only for safety checks. Instead, every index would pay a RC +penalty and carry a strong reference to the tree node. + +Note that (2)(c) is still technically possible to implement in the +new collection index model, it just goes against a goal of having +indices free of reference-counted stored properties. + +## Collection APIs + +This is a rough sketch of the API. + +There are a few API details that we can't express in Swift today because of the +missing compiler features, or extra APIs that only exist as workarounds. It is +important to see what they are to understand the grand plan, so we marked them +with `FIXME(compiler limitation)` below. + +There are a few minor open questions that are marked with `FIXME(design)`. + +```swift +// No changes from the current scheme. +public protocol IteratorProtocol { + associatedtype Element + mutating func next() -> Element? +} + +// No changes from the current scheme. +public protocol Sequence { + associatedtype Iterator : IteratorProtocol + + /// A type that represents a subsequence of some of the elements. + associatedtype SubSequence + // FIXME(compiler limitation): + // associatedtype SubSequence : Sequence + // where Iterator.Element == SubSequence.Iterator.Element + + func makeIterator() -> Iterator + + // Defaulted requirements, algorithms. + + @warn_unused_result + func map( + @noescape transform: (Generator.Element) throws -> T + ) rethrows -> [T] + + @warn_unused_result + func dropFirst(n: Int) -> SubSequence + + // Other algorithms have been omitted for brevity since their signatures + // didn't change. +} + +/* +FIXME(compiler limitation): we can't write this extension now because +concrete typealiases in extensions don't work well. + +extension Sequence { + /// The type of elements that the sequence contains. + /// + /// It is just a shorthand to simplify generic constraints. + typealias Element = Iterator.Element +} +*/ + +// `Indexable` protocol is an unfortunate workaround for a compiler limitation. +// Without this workaround it is not possible to implement `IndexingIterator`. +// +// FIXME(compiler limitation): remove `Indexable`, and move all of its +// requirements to `Collection`. +public protocol Indexable { + /// A type that represents a valid position in the collection. + /// + /// Valid indices consist of the position of every element and a + /// "past the end" position that's not valid for use as a subscript. + associatedtype Index : Comparable + + associatedtype _Element + var startIndex: Index { get } + var endIndex: Index { get } + + /// Returns the element at the given `position`. + /// + /// - Complexity: O(1). + subscript(position: Index) -> _Element { get } + + /// Returns the next consecutive index after `i`. + /// + /// - Precondition: `self` has a well-defined successor. + /// + /// - Complexity: O(1). + @warn_unused_result + func next(i: Index) -> Index + + /// Equivalent to `i = self.next(i)`, but can be faster because it + /// does not need to create a new index. + /// + /// - Precondition: `self` has a well-defined successor. + /// + /// - Complexity: O(1). + func _nextInPlace(i: inout Index) + // This method has a default implementation. + + /// Traps if `index` is not in `bounds`, or performs + /// a no-op if such a check is not possible to implement in O(1). + /// + /// Use this method to implement cheap fail-fast checks in algorithms + /// and wrapper data structures to bring the failure closer to the + /// source of the bug. + /// + /// Do not use this method to implement checks that guarantee memory + /// safety. This method is allowed to be implemented as a no-op. + /// + /// - Complexity: O(1). + func _failEarlyRangeCheck(index: Index, inBounds: Range) + // This method has a default implementation. + // FIXME(design): this API can be generally useful, maybe we should + // de-underscore it, and make it a public API? + + /// Traps if `subRange` is not in `bounds`, or performs + /// a no-op if such a check is not possible to implement in O(1). + /// + /// Use this method to implement cheap fail-fast checks in algorithms + /// and wrapper data structures to bring the failure closer to the + /// source of the bug. + /// + /// Do not use this method to implement checks that guarantee memory + /// safety. This method is allowed to be implemented as a no-op. + /// + /// - Complexity: O(1). + func _failEarlyRangeCheck(subRange: Range, inBounds: Range + + /// A type that represents a slice of some of the elements. + associatedtype SubSequence : Sequence, Indexable = Slice + // FIXME(compiler limitation): + // associatedtype SubSequence : Collection + // where + // Iterator.Element == SubSequence.Iterator.Element, + // SubSequence.SubSequence == SubSequence + // + // These constraints allow processing collections in generic code by + // repeatedly slicing them in a loop. + + /// A signed integer type that can represent the number of steps between any + /// two indices. + associatedtype IndexDistance : SignedIntegerType = Int + + /// Returns the result of advancing `i` by `n` positions. + /// + /// - Returns: + /// - If `n > 0`, the result of applying `next` to `i` `n` times. + /// - If `n < 0`, the result of applying `previous` to `i` `-n` times. + /// - Otherwise, `i`. + /// + /// - Precondition: `n >= 0` if `self` only conforms to `Collection`. + /// - Complexity: + /// - O(1) if `self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise. + @warn_unused_result + func advance(i: Index, by n: IndexDistance) -> Index + // This method has a default implementation. + + /// Returns the result of advancing `i` by `n` positions or until it reaches + /// `limit`. + /// + /// - Returns: + /// - If `n > 0`, the result of applying `next` to `i` `n` times, + /// but not past `limit`. + /// - If `n < 0`, the result of applying `previous` to `i` `-n` times, + /// but not past `limit`. + /// - Otherwise, `i`. + /// + /// - Precondition: `n >= 0` if `self` only conforms to `Collection`. + /// - Complexity: + /// - O(1) if `self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise. + @warn_unused_result + func advance(i: Index, by n: IndexDistance, limit: Index) -> Index + // This method has a default implementation. + + /// Measure the distance between `start` and `end`. + /// + /// - Precondition: `start` and `end` are valid indices into this + /// collection. + /// + /// - Complexity: + /// - O(1) if `self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise, where `n` is the function's result. + @warn_unused_result + func distance(from start: Index, to end: Index) -> IndexDistance + // This method has a default implementation. + + /// A type for the collection of indices for this collection. + /// + /// An instance of `Indices` can hold a strong reference to the collection + /// itself, causing the collection to be non-uniquely referenced. If you + /// need to mutate the collection while iterating over its indices, use the + /// `next()` method to produce indices instead. + associatedtype Indices : Sequence, Indexable = DefaultCollectionIndices + // FIXME(compiler limitation): + // associatedtype Indices : Collection + // where + // Indices.Iterator.Element == Index, + // Indices.Index == Index, + // Indices.SubSequence == Indices + + /// A collection of indices for this collection. + var indices: IndexRange { get } + // This property has a default implementation. + + /// Returns the number of elements. + /// + /// - Complexity: O(1) if `self` conforms to `RandomAccessCollection`; + /// O(N) otherwise. + var count: IndexDistance { get } + // This property has a default implementation. + + /// Returns the element at the given `position`. + /// + /// - Complexity: O(1). + subscript(position: Index) -> Generator.Element { get } + + /// - Complexity: O(1). + subscript(bounds: Range) -> SubSequence { get } + // This property has a default implementation. + + // Other algorithms (including `first`, `isEmpty`, `index(of:)`, `sorted()` + // etc.) have been omitted for brevity since their signatures didn't change. +} + +/// A collection whose indices can be advanced backwards. +public protocol BidirectionalCollection : Collection { + // FIXME(compiler limitation): + // associatedtype SubSequence : BidirectionalCollection + + // FIXME(compiler limitation): + // associatedtype Indices : BidirectionalCollection + + /// Returns the previous consecutive index before `i`. + /// + /// - Precondition: `self` has a well-defined predecessor. + /// + /// - Complexity: O(1). + @warn_unused_result + func previous(i: Index) -> Index + + /// Equivalent to `i = self.previous(i)`, but can be faster because it + /// does not need to create a new index. + /// + /// - Precondition: `self` has a well-defined predecessor. + /// + /// - Complexity: O(1). + func _previousInPlace(i: inout Index) + // This method has a default implementation. + + // Other algorithms (including `last`, `popLast(_:)`, `dropLast(_:)`, + // `suffix(_:)` etc.) have been omitted for brevity since their + // signatures didn't change. +} + +/// A collection whose indices can be advanced by an arbitrary number of +/// positions in O(1). +public protocol RandomAccessCollection : BidirectionalCollection { + // FIXME(compiler limitation): + // associatedtype SubSequence : RandomAccessCollection + + // FIXME(compiler limitation): + // associatedtype Indices : RandomAccessCollection + + associatedtype Index : Strideable + // FIXME(compiler limitation): where Index.Distance == IndexDistance + // FIXME(design): does this requirement to conform to `Strideable` + // limit possible collection designs? +} + +public protocol MyMutableCollectionType : MyForwardCollectionType { + associatedtype SubSequence : Collection = MutableSlice + // FIXME(compiler limitation): + // associatedtype SubSequence : MutableCollection + + /// Access the element at `position`. + /// + /// - Precondition: `position` indicates a valid position in `self` and + /// `position != endIndex`. + /// + /// - Complexity: O(1) + subscript(i: Index) -> Generator.Element { get set } + + /// Returns a collection representing a contiguous sub-range of + /// `self`'s elements. + /// + /// - Complexity: O(1) for the getter, O(`bounds.count`) for the setter. + subscript(bounds: Range) -> SubSequence { get set } + // This subscript has a default implementation. +} + +// No changes from the current scheme. +public protocol RangeReplaceableCollection : Collection { ... } +``` + +## Impact on existing code + +Code that **does not need to change**: + +* Code that works with `Array`, `ArraySlice`, `ContiguousArray`, their + indices (`Int`s), performs index arithmetic. + +* Code that operates on arbitrary collections and indices (on concrete + instances or in generic context), but does not advance indices + +* Iteration over collection's indices with `c.indices` does not change + when: + - the underlying collection is not mutated, + - or it is known that `c.indices` is trivial (for example, in + `Array`), + - or when performance is not a concern: + + ```swift + // No change, because 'c' is not mutated, only read. + for i in c.indices { + print(i, c[i]) + } + + // No change, because Array's index collection is trivial and does not + // hold a reference to Array's storage. There is no performance + // impact. + for i in myArray.indices { + c[i] *= 2 + } + + // No change, because 'c' is known to be small, and doing a copy on + // the first loop iteration is acceptable. + for i in c.indices { + c[i] *= 2 + } + ``` + +* APIs of high-level collection algorithms don't change, even for + algorithms that accept indices as parameters or return indices (e.g., + `index(of:)`, `min()`, `sort()`, `prefix()`, `prefix(upTo:)` etc.) + +Code that **needs to change**: + +* Code that advances indices (`i.successor()`, `i.predecessor()`, + `i.advanced(by:)` etc.) or calculates distances between indices + (`i.distance(to:)`) now needs to call a method on the collection + instead. + + ```swift + // Before: + var i = c.indexOf { $0 % 2 == 0 } + i = i.successor() + print(c[i]) + + // After: + var i = c.indexOf { $0 % 2 == 0 } // No change in algorithm API. + i = c.next(i) // Advancing an index requires a collection instance. + print(c[i]) // No change in subscripting. + ``` + + The transformation from `i.successor()` to `c.next(i)` is non-trivial. + Performing it correctly requires knowing extra information -- how to + get the corresponding collection. In the general case, it is not + possible to perform this migration automatically. In some cases, a + sophisticated migrator could handle the easy cases. + +* Custom collection implementations need to change. A simple fix would + be to just move the the methods from indices to collections to satisfy + new protocol requirements. This is a more or less mechanical fix that + does not require design work. This fix would allow the code to + compile and run. + + In order to take advantage of the performance improvements in + the new model, and remove reference-counted stored properties from + indices, the representation of the index might need to be redesigned. + + Implementing custom collections, as compared to using collections, is + a niche case. We believe that for custom collection types it is + sufficient to provide clear steps for manual migration that don't + require a redesign. Implementing this in an automated migrator might + be possible, but would be a heroic migration for a rare case. + +## Implementation status + +Since this is a large change to the collection model, to evaluate it, we have +prepared [a prototype +implementation](https://github.com/apple/swift/blob/master/test/Prototypes/CollectionsMoveIndices.swift). +Shawn Erickson and Austin Zheng have been working on re-implementing the +prototype in the core library on the [swift-3-indexing-model +branch](https://github.com/apple/swift/tree/swift-3-indexing-model). + +The prototype and this proposal have one major difference around the use +of unowned references. In the prototype, we are suggesting to use +unowned references to implement `c.indices`. Unfortunately, we realized +that this design can create an unpredictable user model. For example: + +```swift +let c = getCollection() +for i in c.indices { + if i.isRed { + print(c[i]) + } + print("no") +} +``` + +If ARC can prove that the collection does contain indices where +`i.isRed` is true, then it can deinit the collection right after +constructing the index collection, before starting the loop. Then, the +first time we need to advance the index, dereferencing the unowned +reference to the collection storage (which is gone now) will cause a +trap. + +## Alternatives considered + +We considered index-less models, for example, [D's +std.range](https://dlang.org/library/std/range.html) (see also [On +Iteration by Andrei +Alexandrescu](http://www.informit.com/articles/printerfriendly/1407357)). +Ranges work well for reference-typed collections, but it is not clear +how to adjust the concept of D's range (similar to `Slice` in Swift) for +mutable value-typed collections. In D, you process a collection by +repeatedly slicing off elements. Once you have found an element that +you would like to mutate, it is not clear how to actually change the +original collection, if the collection and its slice are value types? + From c3bdea36e73f6abc12a52a87cbdc263df17036bf Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Thu, 3 Mar 2016 08:15:29 -0800 Subject: [PATCH 02/20] Add missing word --- proposals/NNNN-collections-move-indices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index c9c269e9b4..59f9872f12 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -815,7 +815,7 @@ for i in c.indices { } ``` -If ARC can prove that the collection does contain indices where +If ARC can prove that the collection does not contain indices where `i.isRed` is true, then it can deinit the collection right after constructing the index collection, before starting the loop. Then, the first time we need to advance the index, dereferencing the unowned From d9d57467cd17fa2fc2390ef52dd7e86af10d2da0 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Thu, 7 Apr 2016 15:48:33 -0700 Subject: [PATCH 03/20] New indexing model revisions work --- proposals/NNNN-collections-move-indices.md | 170 +++++++++++---------- 1 file changed, 88 insertions(+), 82 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 59f9872f12..47d2074c45 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -1,107 +1,81 @@ # New collections model * Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-collections-move-indices.md) -* Author(s): [Dmitri Gribenko](https://github.com/gribozavr), [Dave Abrahams](https://github.com/dabrahams), [Maxim Moiseev](https://github.com/moiseev) +* Author(s): [Dmitri Gribenko](https://github.com/gribozavr), + [Dave Abrahams](https://github.com/dabrahams), + [Maxim Moiseev](https://github.com/moiseev) +* [Swift-evolution thread](http://news.gmane.org/find-root.php?message_id=CA%2bY5xYfqKR6yC2Q%2dG7D9N7FeY%3dxs1x3frq%3d%3dsyGoqYpOcL9yrw%40mail.gmail.com) * Status: **Awaiting review** * Review manager: TBD -## Introduction +## Summary -We are proposing a new model for collections, where indices can only be -advanced forward or backward by the corresponding collection instance. -Indices become opaque tokens representing collection positions, that can -be produced and consumed by collection APIs. This allows us to reduce -the amount of data stored in indices to the bare minimum. +We propose a new model for `Collection`s wherein responsibility for +index traversal is moved from the index to the collection itself. For +example, instead of writing `i.successor()`, one would write +`c.successor(i)`. We also propose the following changes as a +consequence of the new model: -Compared to the current state, the new scheme simplifies implementation -of non-trivial indices, and fixes concurrency issues in `Set` and -`Dictionary` indices. It also allows us to eliminate reference-counted -stored properties from most indices, including non-trivial ones, like -`Set.Index` and `Dictionary.Index`, creating more optimizable code. - -Out of scope for this proposal: - -* Expanding the set of concrete collections provided by the standard - library. - -* Expanding the set of collection protocols to provide functionality - beyond what is already provided (for example, protocols for sorted - collections, queues etc.) Discussing how other concrete collections - fit into the current protocol hierarchy is in scope, though. - -Swift-evolution thread: [link to the discussion thread for that proposal](https://lists.swift.org/pipermail/swift-evolution) +* A collection's `Index` can be any `Comparable` type. +* The distinction between intervals and ranges disappears, leaving + only ranges. +* A closed range that includes the maximal value of its `Bound` type + is now representable and does not trap. +* Make existing “private” in-place index traversal methods available + publicly. ## Motivation -Swift standard library defines one basic `Collection` protocol, and -three kinds of collection indices: forward, bidirectional and random -access. A concrete collection type uses one of these indices based on -the capabilities of the backing data structure. For example, a -singly-linked list can only have forward indices, a tree with parent -pointers has bidirectional indices, while `Array` and `Deque` have -random access indices. - -In our experience implementing non-trivial collections it turned out -that instances of forward and bidirectional indices need to hold a -reference to the storage of the collection that they traverse, in order -to implement `successor()` and `predecessor()` APIs. For example, -imagine an index into a hashtable with open addressing: in order to find -the next element after a given one, we need to inspect the storage -contents to skip empty buckets in the hashtable. - -Indices that store references to underlying collection's storage have -disadvantages: +In collections that don't support random access, (string views, sets, +dictionaries, trees, etc.) it's very common that deriving one index +value from another requires somehow inspecting the collection's data. +For example, you could represent an index into a hash table as an +offset into the underlying storage, except that one needs to actually +look at *structure* of the hash table to reach the next bucket. In +the current model, supporting `i.successor()` means that the index +must additionally store not just an offset, but a reference to the +collection's structure. + +The consequences for performance aren't pretty: * Code that handles indices has to perform reference counting, which blocks some optimizations, and definitely means more work at runtime. -* Indices that keep references to collections' storage conflict with - copy-on-write. A live index makes underlying storage non-uniquely - referenced, forcing unnecessary copies when the collection is mutated. +* Indices that keep references to collections' storage block the + copy-on-write optimization. A live index makes underlying storage + non-uniquely referenced, forcing unnecessary copies when the + collection is mutated. In the standard library, `Dictionary` and + `Set` use a double-indirection trick to work around this issue. + Unfortunately, even this trick is not a solution, because (as we + have just realized) it isn't threadsafe. [^1] -In the standard library, `Dictionary` and `Set` have to use a -double-indirection trick to avoid forcing a copy of the storage when -there is a live index. Unfortunately, this trick is not thread-safe: it -allows a data race when an index is incremented concurrently with the -collection being mutated. +By giving responsibility for traversal to the collection, we ensure +that operations that need the collection's structure always have it, +without the costs of holding references in indices. -Our experience working with the current collection protocols suggests -that we should consider other schemes that don't require such -double-indirection tricks or other undue performance burdens. +## Other Benefits -### Why Dictionary and Set indices are not thread-safe +Although this change is primarily motivated by performance, it has +other significant benefits: -`Dictionary` and `Set` use a double-indirection trick to avoid -disturbing the reference count of the storage with indices. +* Simplifies implementation of non-trivial indices. +* Allows us to eliminate the `Range`/`Interval` distinction. +* Makes it feasible to fix existing concurrency issues in `Set` and + `Dictionary` indices. +* Allows `String` views to share a single index type, letting us + eliminate the need for cumbersome index conversion functions (not + part of this proposal, but planned). -``` - +--+ class struct - |RC|---------+ +-----------------+ - +--+ Storage |<---------| DictionaryIndex | - | | | value | - +----------+ +-----------------+ - ^ - +--+ | class struct - |RC|-------------+ +------------+ - +--+ Indirection |<-------| Dictionary | - | ("owner") | | value | - +--------------+ +------------+ -``` +## Out of Scope -Instances of `Dictionary` point to an indirection, while instances of -`DictionaryIndex` point to the storage itself. This allows us to have -two separate reference counts. One of the refcounts tracks just the -live `Dictionary` instances, which allows us to perform precise -uniqueness checks. +This proposal intentionally does not: -The issue that we were previously unaware of is that this scheme is not -thread-safe. When uniquely-referenced storage is being mutated in -place, indices can be concurrently being incremented (on a different -thread). This would be a read/write data race. - -Fixing this data race (to provide memory safety) would require locking -dictionary storage on every access, which would be an unacceptable -performance penalty. +* Expand the set of concrete collections provided by the standard + library. +* Expand the set of collection protocols to provide functionality + beyond what is already provided (for example, protocols for sorted + collections, queues etc.) Discussing how other concrete collections + fit into the current protocol hierarchy is in scope, though. ## Proposed solution @@ -835,3 +809,35 @@ repeatedly slicing off elements. Once you have found an element that you would like to mutate, it is not clear how to actually change the original collection, if the collection and its slice are value types? +[^1]: `Dictionary` and `Set` use a double-indirection trick to avoid + disturbing the reference count of the storage with indices. + + ``` + +--+ class struct + |RC|---------+ +-----------------+ + +--+ Storage |<---------| DictionaryIndex | + | | | value | + +----------+ +-----------------+ + ^ + +--+ | class struct + |RC|-------------+ +------------+ + +--+ Indirection |<-------| Dictionary | + | ("owner") | | value | + +--------------+ +------------+ + ``` + + Instances of `Dictionary` point to an indirection, while + instances of `DictionaryIndex` point to the storage itself. + This allows us to have two separate reference counts. One of + the refcounts tracks just the live `Dictionary` instances, which + allows us to perform precise uniqueness checks. + + The issue that we were previously unaware of is that this scheme + is not thread-safe. When uniquely-referenced storage is being + mutated in place, indices can be concurrently being incremented + (on a different thread). This would be a read/write data race. + + Fixing this data race (to provide memory safety) would require + locking dictionary storage on every access, which would be an + unacceptable performance penalty. + From bae3357d3de645e60a7226dbb3ae110aac07bef3 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Thu, 7 Apr 2016 15:50:02 -0700 Subject: [PATCH 04/20] Markdown formatting fix --- proposals/NNNN-collections-move-indices.md | 62 +++++++++++----------- 1 file changed, 32 insertions(+), 30 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 47d2074c45..11b0d09abb 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -809,35 +809,37 @@ repeatedly slicing off elements. Once you have found an element that you would like to mutate, it is not clear how to actually change the original collection, if the collection and its slice are value types? +---- + [^1]: `Dictionary` and `Set` use a double-indirection trick to avoid - disturbing the reference count of the storage with indices. - - ``` - +--+ class struct - |RC|---------+ +-----------------+ - +--+ Storage |<---------| DictionaryIndex | - | | | value | - +----------+ +-----------------+ - ^ - +--+ | class struct - |RC|-------------+ +------------+ - +--+ Indirection |<-------| Dictionary | - | ("owner") | | value | - +--------------+ +------------+ - ``` - - Instances of `Dictionary` point to an indirection, while - instances of `DictionaryIndex` point to the storage itself. - This allows us to have two separate reference counts. One of - the refcounts tracks just the live `Dictionary` instances, which - allows us to perform precise uniqueness checks. - - The issue that we were previously unaware of is that this scheme - is not thread-safe. When uniquely-referenced storage is being - mutated in place, indices can be concurrently being incremented - (on a different thread). This would be a read/write data race. - - Fixing this data race (to provide memory safety) would require - locking dictionary storage on every access, which would be an - unacceptable performance penalty. +disturbing the reference count of the storage with indices. + +``` + +--+ class struct + |RC|---------+ +-----------------+ + +--+ Storage |<---------| DictionaryIndex | + | | | value | + +----------+ +-----------------+ + ^ + +--+ | class struct + |RC|-------------+ +------------+ + +--+ Indirection |<-------| Dictionary | + | ("owner") | | value | + +--------------+ +------------+ +``` + +Instances of `Dictionary` point to an indirection, while +instances of `DictionaryIndex` point to the storage itself. +This allows us to have two separate reference counts. One of +the refcounts tracks just the live `Dictionary` instances, which +allows us to perform precise uniqueness checks. + +The issue that we were previously unaware of is that this scheme +is not thread-safe. When uniquely-referenced storage is being +mutated in place, indices can be concurrently being incremented +(on a different thread). This would be a read/write data race. + +Fixing this data race (to provide memory safety) would require +locking dictionary storage on every access, which would be an +unacceptable performance penalty. From 8e4bebd21d8f5cb08890ec1cde80dbbe194fa576 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Fri, 8 Apr 2016 13:51:56 -0700 Subject: [PATCH 05/20] More swift 3 indexing model changes --- proposals/NNNN-collections-move-indices.md | 152 ++++++++++++++------- 1 file changed, 104 insertions(+), 48 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 11b0d09abb..4b37e67cb0 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -77,61 +77,98 @@ This proposal intentionally does not: collections, queues etc.) Discussing how other concrete collections fit into the current protocol hierarchy is in scope, though. -## Proposed solution - -We propose to allow implementing collections whose indices don't have -reference-countable stored properties. - -From the API standpoint, this implies that indices can't be moved -forward or backward by themselves (for example, calling `i.successor()` -is not possible anymore for an arbitrary index). Only the corresponding -collection instance can advance indices (e.g., `c.next(i)`). By -removing API requirements from indices, we reduce the required amount of -information that indices need to store or reference. - -In this model indices can store the minimal amount of information only -about the element position in the collection. Usually index can be -represented a few of word-sized integers (typically, just one) that -efficiently encode the "path" in the data structure from the root to the -element. Since one is free to choose the encoding of the "path", we -think that it should be possible to choose it in such a way that indices -are cheaply comparable. - -### Protocol hierarchy - -In the proposed model indices don't have any method or property -requirements (these APIs were moved to Collection), so index protocols -are eliminated. Instead, we are introducing `BidirectionalCollection` -and `RandomAccessCollection`. These protocols naturally compose with -existing `MutableCollection` and `RangeReplaceableCollection` to -describe the collection's capabilities: +## Changes in Detail -```swift -protocol Sequence { ... } -protocol Collection : Sequence { ... } +### Collection Protocol Hierarchy -protocol MutableCollection : Collection { ... } -protocol RangeReplaceableCollection : Collection { ... } +In the proposed model, indices don't have any requirements beyond +`Comparable`, so index protocols are eliminated. Instead, we +introduce `BidirectionalCollection` and `RandomAccessCollection` to +provide the same index traversal distinctions, as shown here: -protocol BidirectionalCollection : Collection { ... } // new -protocol RandomAccessCollection : BidirectionalCollection { ... } // new ``` + +--------+ + |Sequence| + +---+----+ + | . + +----+-----+ + |Collection| + +----+-----+ + | + +--------------+-------------+ + | | | + | +--------+--------+ | + | |MutableCollection| | + | +-----------------+ | + | | ++---------+-------------+ +---------+----------------+ +|BidirectionalCollection| |RangeReplaceableCollection| ++---------+-------------+ +--------------------------+ + | + +--------+-------------+ + |RandomAccessCollection| + +----------------------+ +``` + +These protocols compose naturally with the existing protocols +`MutableCollection` and `RangeReplaceableCollection` to describe a +collection's capabilities, e.g. +```swift +struct Array + : RandomAccessCollection, + MutableCollection, + RangeReplaceableCollection { ... } + +struct UnicodeScalarView : BidirectionalCollection { ... } ``` - Sequence - ^ - | - + - Collection - ^ ^ - | +--------+ - BidirectionalCollection | | - ^ | MutableCollection - | | - | | - RandomAccessCollection RangeReplaceableCollection + +### Range Protocols and Types + +The proposal adds several new types and protocols to support ranges: + +``` +-------------+ + |RangeProtocol| + +-----+-------+ + | + +------------+---------------------+ + | | ++---------+-----------+ : +----------+--------+ +|HalfOpenRangeProtocol| : |ClosedRangeProtocol| ++----+------+---------+ | +-------+------+----+ + | | | | | ++----+---+ | +-----------+-----------+ | +---+----------+ +|Range| | | RandomAccessCollection| | |ClosedRange| ++========+ | +----+---------------+--+ | +==============+ + | | | | + +----+-------+----+ +--+-----+--------------+ + |CountableRange| |CountableClosedRange| + +=================+ +=======================+ ``` +* The old `Range`, `ClosedInterval`, and + `OpenInterval` are replaced with four new generic range types: + + * Two for general ranges whose bounds are `Comparable`: + `Range` and `ClosedRange`. + + * Two for ranges that additionally conform to + `RandomAccessCollection`, requiring bounds that are `Strideable` + with `Stride` conforming to `Integer` : `CountableRange` and + `CountableClosedRange`. [These types can be folded into + `Range` and `ClosedRange` when Swift acquires conditional + conformance capability.] + +We also introduce three new protocols: + +* `RangeProtocol` +* `HalfOpenRangeProtocol` +* `ClosedRangeProtocol` + +These protocols mostly exist facilitate implementation-sharing among +the range types, and would seldom need to be implemented outside the +standard library. + ### Analysis Advantages of the proposed model: @@ -194,6 +231,26 @@ Disadvantages of the proposed collections model: * Collection's API now includes methods for advancing indices. +## The `Comparable` Requirement on Indices + +In this model indices store the minimal amount of information required +to describe an element's position. Usually an index can be +represented with one or two `Int`s that efficiently encode the path to +the element from the root of a data structure. Since one is free to +choose the encoding of the “path”, we think it is possible to choose +it in such a way that indices are cheaply comparable. That has been +the case for all of the indices required to implement the standard +library, and a few others we investigated while researching this +change. + +It's worth noting that this requirement isn't strictly +necessary. Without it, though, indices would have no requirements +beyond `Equatable`, and creation of a `Range` would have to be +allowed for any `T` conforming to `Equatable`. As a consequence, most +interesting range operations, such as containment checks, would be +unavailable unless `T` were also `Comparable`, and we'd be unable to +provide comple bounds-checking in the general case. + ### Implementation difficulties We have a couple of implementation difficulties that will be solved with @@ -842,4 +899,3 @@ mutated in place, indices can be concurrently being incremented Fixing this data race (to provide memory safety) would require locking dictionary storage on every access, which would be an unacceptable performance penalty. - From 1e0e196e3d3f3ce2b076cbc5db6453f71e0dbed8 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Fri, 8 Apr 2016 13:54:41 -0700 Subject: [PATCH 06/20] Markdown fixup --- proposals/NNNN-collections-move-indices.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 4b37e67cb0..3c8981fa1a 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -127,7 +127,8 @@ struct UnicodeScalarView : BidirectionalCollection { ... } The proposal adds several new types and protocols to support ranges: -``` +-------------+ +``` + +-------------+ |RangeProtocol| +-----+-------+ | From 3186f424891c62b4d87b1299483557917fee78b3 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Fri, 8 Apr 2016 13:56:26 -0700 Subject: [PATCH 07/20] Put dots around RandomAccessCollection In this diagram it isn't one of the new components we're referencing --- proposals/NNNN-collections-move-indices.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 3c8981fa1a..a39f5bdd21 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -134,13 +134,13 @@ The proposal adds several new types and protocols to support ranges: | +------------+---------------------+ | | -+---------+-----------+ : +----------+--------+ -|HalfOpenRangeProtocol| : |ClosedRangeProtocol| -+----+------+---------+ | +-------+------+----+ - | | | | | -+----+---+ | +-----------+-----------+ | +---+----------+ -|Range| | | RandomAccessCollection| | |ClosedRange| -+========+ | +----+---------------+--+ | +==============+ ++---------+-----------+ +----------+--------+ +|HalfOpenRangeProtocol| |ClosedRangeProtocol| ++----+------+---------+ : +-------+------+----+ + | | : | | ++----+---+ | +...........+...........+ | +---+----------+ +|Range| | : RandomAccessCollection: | |ClosedRange| ++========+ | +....+...............+..+ | +==============+ | | | | +----+-------+----+ +--+-----+--------------+ |CountableRange| |CountableClosedRange| From 6f7f850d701bb15a45aaea52a714a3bf0f5dca03 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Fri, 8 Apr 2016 13:59:50 -0700 Subject: [PATCH 08/20] Tout the advantages of ClosedRange --- proposals/NNNN-collections-move-indices.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index a39f5bdd21..d0b28dfbf1 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -150,8 +150,10 @@ The proposal adds several new types and protocols to support ranges: * The old `Range`, `ClosedInterval`, and `OpenInterval` are replaced with four new generic range types: - * Two for general ranges whose bounds are `Comparable`: - `Range` and `ClosedRange`. + * Two for general ranges whose bounds are `Comparable`: `Range` + and `ClosedRange`. Having a separate `ClosedRange` type allows + us to address the vexing inability of the old `Range` to represent + a range containing the maximal value of its bound. * Two for ranges that additionally conform to `RandomAccessCollection`, requiring bounds that are `Strideable` From c9e2d4562cc5697af43ec488a9592daae42fbfac Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Fri, 8 Apr 2016 14:08:15 -0700 Subject: [PATCH 09/20] Edits for readability --- proposals/NNNN-collections-move-indices.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index d0b28dfbf1..01797f4332 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -38,16 +38,18 @@ collection's structure. The consequences for performance aren't pretty: -* Code that handles indices has to perform reference counting, which - blocks some optimizations, and definitely means more work at runtime. +* Code that handles indices has to perform atomic reference counting, + which has significant overhead and can prevent the optimizer from + making other improvements. -* Indices that keep references to collections' storage block the - copy-on-write optimization. A live index makes underlying storage +* Additional references to a collections storage block the + library-level copy-on-write optimization: in-place mutation of + uniquely-referenced data. A live index makes underlying storage non-uniquely referenced, forcing unnecessary copies when the collection is mutated. In the standard library, `Dictionary` and `Set` use a double-indirection trick to work around this issue. Unfortunately, even this trick is not a solution, because (as we - have just realized) it isn't threadsafe. [^1] + have recently realized) it isn't threadsafe. [^1] By giving responsibility for traversal to the collection, we ensure that operations that need the collection's structure always have it, From e5917efad17c15a8c23a77b3e1243caaf0c7b2d0 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Fri, 8 Apr 2016 17:49:08 -0700 Subject: [PATCH 10/20] Further edits --- proposals/NNNN-collections-move-indices.md | 309 +++++---------------- 1 file changed, 75 insertions(+), 234 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 01797f4332..2df2a0bdbb 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -62,6 +62,10 @@ other significant benefits: * Simplifies implementation of non-trivial indices. * Allows us to eliminate the `Range`/`Interval` distinction. +* Making traversal a direct property of the `Collection` protocol, + rather than its associated `Index` type, is closer to most peoples' + mental model for collections, and simplifies the writing of many + generic constraints. * Makes it feasible to fix existing concurrency issues in `Set` and `Dictionary` indices. * Allows `String` views to share a single index type, letting us @@ -79,7 +83,12 @@ This proposal intentionally does not: collections, queues etc.) Discussing how other concrete collections fit into the current protocol hierarchy is in scope, though. -## Changes in Detail +## Overview of Changes + +To facilitate evaluation, we've submitted a +[pull request](https://github.com/apple/swift/pull/2108) for the code +and documentation changes implementing this proposal. See below for a +discussion of the major points. ### Collection Protocol Hierarchy @@ -174,69 +183,7 @@ These protocols mostly exist facilitate implementation-sharing among the range types, and would seldom need to be implemented outside the standard library. -### Analysis - -Advantages of the proposed model: - -* Indices don't need to keep a reference to the collection. - - Indices are simpler to implement. - - Indices are not reference-countable, and thus cheaper to - handle. - - Handling indices does not cause refcounting, and does not block - optimizations. - -* The hierarchy of index protocols is removed, and is replaced with - protocols for forward, bidirectional and random-access - collections. - - This is closer to how people generally talk about collections. - - Writing a generic constraint for bidirectional and - random-access collections becomes simpler. - -* Indices can conform to `Comparable` without incurring extra - memory overhead. Indices need to store all the necessary data - anyway. - - This allows, for example, to relax the precondition on the - `distance(from:to:)` method: even for forward collections, it is now - possible to measure the distance from a later index to an earlier - index. - -* `Dictionary` and `Set` indices (and other non-trivial indices) can't - create opportunities for data races. - -* While this model allows to design indices that are not - reference-countable, it does not prohibit defining indices that - *are* reference countable. - - All existing collection designs are still possible to - implement, but some are less efficient than the new model allows. - - If there is a specialized collection that needs - reference-countable indices for algorithmic or memory - efficiency, where such tradeoff is reasonable, such a - collection is still possible to implement in the new model. - See the discussion of trees below, the tree design (2)(c). - -Neutral as compared to the current collections: - -* A value-typed linked list still can't conform to `Collection`. - A reference-typed one can. - -Disadvantages of the proposed collections model: - -* Advancing an index forward or backward becomes harder -- the statement - now includes two entities (collection and index): - - ``` - j = c.next(i) vs. j = i.successor() - ``` - - In practice though, we found that when the code is performing index - manipulations, the collection is typically still around stored in a - variable, so the code does not need to reach out for it in a - non-trivial way. - -* Collection's API now includes methods for advancing indices. - -## The `Comparable` Requirement on Indices +### The `Comparable` Requirement on Indices In this model indices store the minimal amount of information required to describe an element's position. Usually an index can be @@ -256,176 +203,70 @@ interesting range operations, such as containment checks, would be unavailable unless `T` were also `Comparable`, and we'd be unable to provide comple bounds-checking in the general case. -### Implementation difficulties - -We have a couple of implementation difficulties that will be solved with -further improvements to the compiler or the generics system. These -issues will cause the new API to be suboptimal in the short term, but as -the necessary compiler features will be implemented, the API will -improve. We don't consider any of these issues to be blockers to adopt -the proposed collections model. - -* `Range` has conflicting requirements: - - - range bounds need to be comparable and incrementable, in order - for `Range` to conform to `Collection`, - - - we frequently want to use `Range` as a "transport" data type, just - to carry a pair of indices around (for example, as an argument for - `removeSubrange(_:)`). Indices are neither comparable nor - incrementable. - - Solution: add conditional a conformance for `Range` to `Collection` - when the bounds conform to `Strideable`. We don't have this compiler - feature now. As a workaround, we will introduce a parallel type, - `StrideableRange`. - -2. We can't specify constraints on associated types. This forces many - algorithms to specify constraints that should be implied by - `Sequence` or `Collection` conformances. - - Solution: constraints on associated types are a desirable - language feature, part of the Swift generics model. This issue - will be fixed by compiler improvements. - -## Case study: trees - -Trees are very interesting data structures with many unique -requirements. We are interested in allowing efficient and memory-safe -implementations of collections based on a search trees (e.g., RB trees -or B-trees). The specific requirements are as follows. - -- The collection and indices should be memory-safe. They should provide - good QoI in the form of precondition traps. Ensuring memory-safety - shouldn't cause unreasonable performance or memory overhead. - -- Collection instances should be able to share nodes on mutation - (persistent data structures). - -- Subscript on an index should cost at worst amortized O(1). - -- Advancing an index to the next or previous position should cost - at worst amortized O(1). - -- Indices should not contain reference countable stored properties. - -- Mutating or deleting an element in the collection should not - invalidate indices pointing at other elements. - - This design constraint needs some extra motivation, because it might - not be obvious. Preserving index validity across mutation is - important for algorithms that iterate over the tree and mutate it in - place, for example, removing a subrange of elements between two - indices, or removing elements that don't satisfy a predicate. When - implementing such an algorithm, you would typically have an index that - points to the current element. You can copy the index, advance it, - and then remove the previous element using its index. If the mutation - of the tree invalidates all indices, it is not possible to continue - the iteration. Thus, it is desired to invalidate just one index for - the element that was deleted. - -It is not possible to satisfy all of these requirements at the same -time. Designs that cover some of the requirements are possible. - -1. Persistent trees with O(log n) subscripting and advancing, and strict - index invalidation. - - If we choose to make a persistent data structure with node reuse, - then the tree nodes can't have parent pointers (a node can have - multiple different parents in different trees). This means that it - is not possible to advance an index in O(1). If we need to go up the - tree while advancing the index, without parent pointers we would need - to traverse the tree starting from the root in O(log n). - - Thus, index has to essentially store a path through the tree from the - root to the node (it is usually possible to encode this path in a - 64-bit number). Since the index stores the path, subscripting on - such an index would also cost O(log n). - - We should note that persistent trees typically use B-trees, so the - base of the logarithm would be typically large (e.g., 32). We also - know that the size of the RAM is limited. Thus, we could treat the - O(log n) complexity as effectively constant for all practical - purposes. But the constant factor will be much larger than in other - designs. - - Swift's collection index model does not change anything as compared - to other languages. The important point is that the proposed index - model allows such implementations of persistent collections. - -2. Trees with O(1) subscripting and advancing. - - If we want subscripting to be O(1), then the index has to store a - pointer to a tree node. Since we want avoid reference countable - properties in indices, the node pointer should either be - `unsafe(unowned)` or an `UnsafePointer`. These pointers can't be - dereferenced safely without knowing in advance that it is safe to do - so. We need some way to quickly check if it is safe to dereference - the pointer stored in the node. - - A pointer to a tree node can become invalid when the node was - deallocated. A tree node can be deallocated if the corresponding - element is removed from the tree. - - (a) Trees with O(1) subscripting and advancing, and strict index - invalidation. - - One simple way to perform the safety check when - dereferencing the unsafe pointer stored within the index - would be to detect any tree mutation between the index - creation and index use. It is simple to do with version - numbers: we add an ID number to every tree. This ID would - be unique among all trees created within the process, and it - would be re-generated on every mutation. The tree ID is - copied into every index. When the index is used with the - collection, we check that the ID of the tree matches the - tree ID stored in the index. This fast check ensures memory - safety. - - (b) Trees with O(1) subscripting and advancing, permissive index - invalidation, and extra storage to ensure memory safety. - - Another way to perform the safety check would be to directly - check if the unsafe pointer stored in the index is actually - linked into the tree. To do that with acceptable time - complexity, we would need to have an extra data structure - for every tree, for example, a hash table-based set of all - node pointers. With this design, all index operations get - an O(1) hit for the hash table lookup for the safety check, - but we get an O(log n) memory overhead for the extra data - structure. On the other hand, a tree already has O(log n) - memory overhead for allocating nodes themselves, thus the - extra data structure would only increase the constant factor - on memory overhead. - -| | (1) | (2)(a) | (2)(b) | -| -------------------------------------|----------|---------|------- | -| Memory-safe | Yes | Yes | Yes | -| Indices are not reference-counted | Yes | Yes | Yes | -| Shares nodes | Yes | No | No | -| Subscripting on an index | O(log n) | O(1) | O(1) | -| Advancing an index | O(log n) | O(1) | O(1) | -| Deleting does not invalidate indices | No | No | Yes | -| Extra O(n) storage just for safety checks | No | No | Yes | - -Each of the designs discussed above has its uses, but the intuition -is that (2)(a) is the one most commonly needed in practice. (2)(a) -does not have the desired index invalidation properties. There is -a small number of commonly used algorithms that require that -property, and they can be provided as methods on the collection, -for example `removeAll(in: Range)` and -`removeAll(_: (Element)->Bool)`. - -If we were to allow reference-counted indices (basically, the -current collections model), then an additional design is possible --- let's call it (2)(c) for the purpose of discussion. This design -would be like (2)(b), but won't require extra storage that is used -only for safety checks. Instead, every index would pay a RC -penalty and carry a strong reference to the tree node. - -Note that (2)(c) is still technically possible to implement in the -new collection index model, it just goes against a goal of having -indices free of reference-counted stored properties. +That said, the requirement has real benefits. For example, it allows +us to support distance measurement between arbitrary indices, even in +collections without random access traversal. In the old model, +`x.distance(to: y)` for these collections had the undetectable +precondition that `x` precede `y`, with unpredictable consequences for +violation in the general case. + +## Downsides + +The proposed approach has several disadvantages, which we explore here +in the interest of full disclosure: + +* Index movement is more complex in principle, since it now involves + not only the index, but the collection as well. The impact of this + complexity is limited somewhat because it's very common that code + moving indices occurs in a method of the collection type, where + “implicit `self`” kicks in. The net result is that index + manipulations end up looking like free function calls: + + ```swift + let j = successor(i) // self.successor(i) + let k = index(5, stepsFrom: j) // self.index(5, stepsFrom: j) + ``` + +* The + [new index manipulation methods](https://github.com/apple/swift/blob/swift-3-indexing-model/stdlib/public/core/Collection.swift#L135) + increase the API surface area of `Collection`, which is already + quite large since algorithms are implemented as extensions. + +* Because Swift is unable to express conditional protocol + conformances, implementing this change has required us to create a + great deal of complexity in the standard library API. Aside from + the two excess “`Countable`” range types, there are new overloads + for slicing and twelve distinct slice types that capture all the + combinations of traversal, mutability, and range-replaceability. + While these costs are probably temporary, they are very real in the + meantime. + +* The API complexity mentioned above stresses the type checker, + requiring + [several](https://github.com/apple/swift/commit/1a875cb922fa0c98d51689002df8e202993db2d3) + [changes](https://github.com/apple/swift/commit/6c56af5c1bc319825872a25041ec33ab0092db05) + just to get our test code to type-check in reasonable time. Again, + an ostensibly temporary—but still real—cost. + +## Limitations of the Model + +Ideally, our collection model would allow us to implement every +interesting data structure with memory safety, optimal performance, +value semantics, and a variety of other useful properties such as +minimal invalidation of indexes upon mutation. In practice, these +goals and the Swift language model interact in complicated ways, +preventing some designs altogether, and suggesting a variety of +implementation strategies for others that can be selected based on +one's priorities. We've done some in-depth investigation of these +implications, but presenting and explaining them is well beyond the +scope of this proposal. + +We can, however, be fairly sure that this change does not regress our +ability to build any Collections that could have been built in Swift +2.2. After all, it is still *possible* to implement indices that store +references and have the old traversal methods (the collection's +traversal methods would simply forward to those of the index), so we +haven't lost the ability to express anything. ## Collection APIs From 3ff69468a3ca5c367d8c924296a3736cb182e4be Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sat, 9 Apr 2016 19:17:31 -0700 Subject: [PATCH 11/20] Further edits; eliminate API dump --- proposals/NNNN-collections-move-indices.md | 397 ++++----------------- 1 file changed, 61 insertions(+), 336 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 2df2a0bdbb..ac83375edf 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -83,19 +83,22 @@ This proposal intentionally does not: collections, queues etc.) Discussing how other concrete collections fit into the current protocol hierarchy is in scope, though. -## Overview of Changes +## Overview of Type And Protocol Changes -To facilitate evaluation, we've submitted a -[pull request](https://github.com/apple/swift/pull/2108) for the code -and documentation changes implementing this proposal. See below for a -discussion of the major points. +This section covers the proposed structural changes to the library at +a high level. Details such as protocols introduced purely to work +around compiler limitations (e.g. `Indexable` or `IndexableBase`) have +been omitted. For a complete view of the the code +and documentation changes implementing this proposal, please see this +[pull request](https://github.com/apple/swift/pull/2108). ### Collection Protocol Hierarchy In the proposed model, indices don't have any requirements beyond -`Comparable`, so index protocols are eliminated. Instead, we -introduce `BidirectionalCollection` and `RandomAccessCollection` to -provide the same index traversal distinctions, as shown here: +`Comparable`, so the `ForwardIndex`, `BidirectionalIndex`, and +`RandomAccessIndex` protocols are eliminated. Instead, we introduce +`BidirectionalCollection` and `RandomAccessCollection` to provide the +same traversal distinctions, as shown here: ``` +--------+ @@ -126,9 +129,9 @@ These protocols compose naturally with the existing protocols collection's capabilities, e.g. ```swift -struct Array - : RandomAccessCollection, - MutableCollection, +struct Array + : RandomAccessCollection, + MutableCollection, RangeReplaceableCollection { ... } struct UnicodeScalarView : BidirectionalCollection { ... } @@ -139,9 +142,9 @@ struct UnicodeScalarView : BidirectionalCollection { ... } The proposal adds several new types and protocols to support ranges: ``` - +-------------+ - |RangeProtocol| - +-----+-------+ + +-------------+ + |RangeProtocol| + +------+------+ | +------------+---------------------+ | | @@ -168,9 +171,9 @@ The proposal adds several new types and protocols to support ranges: * Two for ranges that additionally conform to `RandomAccessCollection`, requiring bounds that are `Strideable` - with `Stride` conforming to `Integer` : `CountableRange` and - `CountableClosedRange`. [These types can be folded into - `Range` and `ClosedRange` when Swift acquires conditional + with `Stride` conforming to `Integer`: `CountableRange` and + `CountableClosedRange`. [These types can be folded into + `Range` and `ClosedRange` when Swift acquires conditional conformance capability.] We also introduce three new protocols: @@ -183,8 +186,39 @@ These protocols mostly exist facilitate implementation-sharing among the range types, and would seldom need to be implemented outside the standard library. -### The `Comparable` Requirement on Indices +### The Associated `Indices` Type +The following code iterates over the indices of all elements in +`collection`: + +```swift +for index in collection.indices { ... } +``` + +In Swift 2, `collection.indices` returned a `Range`, but +because a range is a simple pair of indices and indices can no longer +be advanced on their own, `Range` is no longer iterable. + +In order to keep code like the above working, `Collection` has +acquired an associated `Indices` type that is always iterable, and +three generic types were introduced to provide a default `Indices` for +each `Collection` traversal category: `DefaultIndices`, +`DefaultBidirectionalIndices`, and `DefaultRandomAccessIndices`. +These types store the underlying collection as a means of traversal. +Collections like `Array` whose `Indices` don't need the collection +simply use `typealias Indices = CountableRange`. + +### Expanded Default Slice Types + +Because Swift doesn't support conditional protocol conformances and +the three traversal distinctions have been moved into the `Collection` +hierarchy, the four generic types `Slice`, `MutableSlice`, +`RangeReplaceableSlice`, and `MutableRangeReplaceableSlice` have +become twelve, with the addition of variations such as +`RangeReplaceableBidirectionalSlice`. + +### The `Comparable` Requirement on Indices + In this model indices store the minimal amount of information required to describe an element's position. Usually an index can be represented with one or two `Int`s that efficiently encode the path to @@ -209,6 +243,15 @@ collections without random access traversal. In the old model, `x.distance(to: y)` for these collections had the undetectable precondition that `x` precede `y`, with unpredictable consequences for violation in the general case. + +## Detailed API Changes + +In this section we describe the new APIs at a high level + +To facilitate evaluation, we've submitted a +[pull request](https://github.com/apple/swift/pull/2108) for the code +and documentation changes implementing this proposal. See below for a +discussion of the major points. ## Downsides @@ -268,324 +311,6 @@ references and have the old traversal methods (the collection's traversal methods would simply forward to those of the index), so we haven't lost the ability to express anything. -## Collection APIs - -This is a rough sketch of the API. - -There are a few API details that we can't express in Swift today because of the -missing compiler features, or extra APIs that only exist as workarounds. It is -important to see what they are to understand the grand plan, so we marked them -with `FIXME(compiler limitation)` below. - -There are a few minor open questions that are marked with `FIXME(design)`. - -```swift -// No changes from the current scheme. -public protocol IteratorProtocol { - associatedtype Element - mutating func next() -> Element? -} - -// No changes from the current scheme. -public protocol Sequence { - associatedtype Iterator : IteratorProtocol - - /// A type that represents a subsequence of some of the elements. - associatedtype SubSequence - // FIXME(compiler limitation): - // associatedtype SubSequence : Sequence - // where Iterator.Element == SubSequence.Iterator.Element - - func makeIterator() -> Iterator - - // Defaulted requirements, algorithms. - - @warn_unused_result - func map( - @noescape transform: (Generator.Element) throws -> T - ) rethrows -> [T] - - @warn_unused_result - func dropFirst(n: Int) -> SubSequence - - // Other algorithms have been omitted for brevity since their signatures - // didn't change. -} - -/* -FIXME(compiler limitation): we can't write this extension now because -concrete typealiases in extensions don't work well. - -extension Sequence { - /// The type of elements that the sequence contains. - /// - /// It is just a shorthand to simplify generic constraints. - typealias Element = Iterator.Element -} -*/ - -// `Indexable` protocol is an unfortunate workaround for a compiler limitation. -// Without this workaround it is not possible to implement `IndexingIterator`. -// -// FIXME(compiler limitation): remove `Indexable`, and move all of its -// requirements to `Collection`. -public protocol Indexable { - /// A type that represents a valid position in the collection. - /// - /// Valid indices consist of the position of every element and a - /// "past the end" position that's not valid for use as a subscript. - associatedtype Index : Comparable - - associatedtype _Element - var startIndex: Index { get } - var endIndex: Index { get } - - /// Returns the element at the given `position`. - /// - /// - Complexity: O(1). - subscript(position: Index) -> _Element { get } - - /// Returns the next consecutive index after `i`. - /// - /// - Precondition: `self` has a well-defined successor. - /// - /// - Complexity: O(1). - @warn_unused_result - func next(i: Index) -> Index - - /// Equivalent to `i = self.next(i)`, but can be faster because it - /// does not need to create a new index. - /// - /// - Precondition: `self` has a well-defined successor. - /// - /// - Complexity: O(1). - func _nextInPlace(i: inout Index) - // This method has a default implementation. - - /// Traps if `index` is not in `bounds`, or performs - /// a no-op if such a check is not possible to implement in O(1). - /// - /// Use this method to implement cheap fail-fast checks in algorithms - /// and wrapper data structures to bring the failure closer to the - /// source of the bug. - /// - /// Do not use this method to implement checks that guarantee memory - /// safety. This method is allowed to be implemented as a no-op. - /// - /// - Complexity: O(1). - func _failEarlyRangeCheck(index: Index, inBounds: Range) - // This method has a default implementation. - // FIXME(design): this API can be generally useful, maybe we should - // de-underscore it, and make it a public API? - - /// Traps if `subRange` is not in `bounds`, or performs - /// a no-op if such a check is not possible to implement in O(1). - /// - /// Use this method to implement cheap fail-fast checks in algorithms - /// and wrapper data structures to bring the failure closer to the - /// source of the bug. - /// - /// Do not use this method to implement checks that guarantee memory - /// safety. This method is allowed to be implemented as a no-op. - /// - /// - Complexity: O(1). - func _failEarlyRangeCheck(subRange: Range, inBounds: Range - - /// A type that represents a slice of some of the elements. - associatedtype SubSequence : Sequence, Indexable = Slice - // FIXME(compiler limitation): - // associatedtype SubSequence : Collection - // where - // Iterator.Element == SubSequence.Iterator.Element, - // SubSequence.SubSequence == SubSequence - // - // These constraints allow processing collections in generic code by - // repeatedly slicing them in a loop. - - /// A signed integer type that can represent the number of steps between any - /// two indices. - associatedtype IndexDistance : SignedIntegerType = Int - - /// Returns the result of advancing `i` by `n` positions. - /// - /// - Returns: - /// - If `n > 0`, the result of applying `next` to `i` `n` times. - /// - If `n < 0`, the result of applying `previous` to `i` `-n` times. - /// - Otherwise, `i`. - /// - /// - Precondition: `n >= 0` if `self` only conforms to `Collection`. - /// - Complexity: - /// - O(1) if `self` conforms to `RandomAccessCollection`. - /// - O(`abs(n)`) otherwise. - @warn_unused_result - func advance(i: Index, by n: IndexDistance) -> Index - // This method has a default implementation. - - /// Returns the result of advancing `i` by `n` positions or until it reaches - /// `limit`. - /// - /// - Returns: - /// - If `n > 0`, the result of applying `next` to `i` `n` times, - /// but not past `limit`. - /// - If `n < 0`, the result of applying `previous` to `i` `-n` times, - /// but not past `limit`. - /// - Otherwise, `i`. - /// - /// - Precondition: `n >= 0` if `self` only conforms to `Collection`. - /// - Complexity: - /// - O(1) if `self` conforms to `RandomAccessCollection`. - /// - O(`abs(n)`) otherwise. - @warn_unused_result - func advance(i: Index, by n: IndexDistance, limit: Index) -> Index - // This method has a default implementation. - - /// Measure the distance between `start` and `end`. - /// - /// - Precondition: `start` and `end` are valid indices into this - /// collection. - /// - /// - Complexity: - /// - O(1) if `self` conforms to `RandomAccessCollection`. - /// - O(`abs(n)`) otherwise, where `n` is the function's result. - @warn_unused_result - func distance(from start: Index, to end: Index) -> IndexDistance - // This method has a default implementation. - - /// A type for the collection of indices for this collection. - /// - /// An instance of `Indices` can hold a strong reference to the collection - /// itself, causing the collection to be non-uniquely referenced. If you - /// need to mutate the collection while iterating over its indices, use the - /// `next()` method to produce indices instead. - associatedtype Indices : Sequence, Indexable = DefaultCollectionIndices - // FIXME(compiler limitation): - // associatedtype Indices : Collection - // where - // Indices.Iterator.Element == Index, - // Indices.Index == Index, - // Indices.SubSequence == Indices - - /// A collection of indices for this collection. - var indices: IndexRange { get } - // This property has a default implementation. - - /// Returns the number of elements. - /// - /// - Complexity: O(1) if `self` conforms to `RandomAccessCollection`; - /// O(N) otherwise. - var count: IndexDistance { get } - // This property has a default implementation. - - /// Returns the element at the given `position`. - /// - /// - Complexity: O(1). - subscript(position: Index) -> Generator.Element { get } - - /// - Complexity: O(1). - subscript(bounds: Range) -> SubSequence { get } - // This property has a default implementation. - - // Other algorithms (including `first`, `isEmpty`, `index(of:)`, `sorted()` - // etc.) have been omitted for brevity since their signatures didn't change. -} - -/// A collection whose indices can be advanced backwards. -public protocol BidirectionalCollection : Collection { - // FIXME(compiler limitation): - // associatedtype SubSequence : BidirectionalCollection - - // FIXME(compiler limitation): - // associatedtype Indices : BidirectionalCollection - - /// Returns the previous consecutive index before `i`. - /// - /// - Precondition: `self` has a well-defined predecessor. - /// - /// - Complexity: O(1). - @warn_unused_result - func previous(i: Index) -> Index - - /// Equivalent to `i = self.previous(i)`, but can be faster because it - /// does not need to create a new index. - /// - /// - Precondition: `self` has a well-defined predecessor. - /// - /// - Complexity: O(1). - func _previousInPlace(i: inout Index) - // This method has a default implementation. - - // Other algorithms (including `last`, `popLast(_:)`, `dropLast(_:)`, - // `suffix(_:)` etc.) have been omitted for brevity since their - // signatures didn't change. -} - -/// A collection whose indices can be advanced by an arbitrary number of -/// positions in O(1). -public protocol RandomAccessCollection : BidirectionalCollection { - // FIXME(compiler limitation): - // associatedtype SubSequence : RandomAccessCollection - - // FIXME(compiler limitation): - // associatedtype Indices : RandomAccessCollection - - associatedtype Index : Strideable - // FIXME(compiler limitation): where Index.Distance == IndexDistance - // FIXME(design): does this requirement to conform to `Strideable` - // limit possible collection designs? -} - -public protocol MyMutableCollectionType : MyForwardCollectionType { - associatedtype SubSequence : Collection = MutableSlice - // FIXME(compiler limitation): - // associatedtype SubSequence : MutableCollection - - /// Access the element at `position`. - /// - /// - Precondition: `position` indicates a valid position in `self` and - /// `position != endIndex`. - /// - /// - Complexity: O(1) - subscript(i: Index) -> Generator.Element { get set } - - /// Returns a collection representing a contiguous sub-range of - /// `self`'s elements. - /// - /// - Complexity: O(1) for the getter, O(`bounds.count`) for the setter. - subscript(bounds: Range) -> SubSequence { get set } - // This subscript has a default implementation. -} - -// No changes from the current scheme. -public protocol RangeReplaceableCollection : Collection { ... } -``` - ## Impact on existing code Code that **does not need to change**: From a28ef1b2cc711bbb8732d40d4ce1782b765f0dd7 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sat, 9 Apr 2016 19:17:43 -0700 Subject: [PATCH 12/20] Whitespace cleanups --- proposals/NNNN-collections-move-indices.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index ac83375edf..0af686e063 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -104,7 +104,7 @@ same traversal distinctions, as shown here: +--------+ |Sequence| +---+----+ - | . + | +----+-----+ |Collection| +----+-----+ @@ -133,7 +133,7 @@ struct Array : RandomAccessCollection, MutableCollection, RangeReplaceableCollection { ... } - + struct UnicodeScalarView : BidirectionalCollection { ... } ``` @@ -168,7 +168,7 @@ The proposal adds several new types and protocols to support ranges: and `ClosedRange`. Having a separate `ClosedRange` type allows us to address the vexing inability of the old `Range` to represent a range containing the maximal value of its bound. - + * Two for ranges that additionally conform to `RandomAccessCollection`, requiring bounds that are `Strideable` with `Stride` conforming to `Integer`: `CountableRange` and @@ -209,16 +209,16 @@ Collections like `Array` whose `Indices` don't need the collection simply use `typealias Indices = CountableRange`. ### Expanded Default Slice Types - + Because Swift doesn't support conditional protocol conformances and the three traversal distinctions have been moved into the `Collection` hierarchy, the four generic types `Slice`, `MutableSlice`, `RangeReplaceableSlice`, and `MutableRangeReplaceableSlice` have become twelve, with the addition of variations such as `RangeReplaceableBidirectionalSlice`. - + ### The `Comparable` Requirement on Indices - + In this model indices store the minimal amount of information required to describe an element's position. Usually an index can be represented with one or two `Int`s that efficiently encode the path to @@ -243,7 +243,7 @@ collections without random access traversal. In the old model, `x.distance(to: y)` for these collections had the undetectable precondition that `x` precede `y`, with unpredictable consequences for violation in the general case. - + ## Detailed API Changes In this section we describe the new APIs at a high level @@ -264,12 +264,12 @@ in the interest of full disclosure: moving indices occurs in a method of the collection type, where “implicit `self`” kicks in. The net result is that index manipulations end up looking like free function calls: - + ```swift let j = successor(i) // self.successor(i) let k = index(5, stepsFrom: j) // self.index(5, stepsFrom: j) ``` - + * The [new index manipulation methods](https://github.com/apple/swift/blob/swift-3-indexing-model/stdlib/public/core/Collection.swift#L135) increase the API surface area of `Collection`, which is already @@ -283,7 +283,7 @@ in the interest of full disclosure: combinations of traversal, mutability, and range-replaceability. While these costs are probably temporary, they are very real in the meantime. - + * The API complexity mentioned above stresses the type checker, requiring [several](https://github.com/apple/swift/commit/1a875cb922fa0c98d51689002df8e202993db2d3) From 65aec994b80fa7bd7c848e3a82bad5fc66305ad7 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 11:14:55 -0700 Subject: [PATCH 13/20] Outline Collection API changes --- proposals/NNNN-collections-move-indices.md | 112 ++++++++++++++++++++- 1 file changed, 107 insertions(+), 5 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 0af686e063..bbeb52c6d1 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -246,12 +246,114 @@ violation in the general case. ## Detailed API Changes -In this section we describe the new APIs at a high level +This section describes changes to methods, properties, and associated +types at a high level. Details related to working around compiler +limitations have been omitted. For a complete view of the the code +and documentation changes implementing this proposal, please see this +[pull request](https://github.com/apple/swift/pull/2108). + +### `Collection`s + +The following APIs were added: + +```swift +protocol Collection { + ... + /// A type that can represent the number of steps between pairs of + /// `Index` values where one value is reachable from the other. + /// + /// Reachability is defined by the ability to produce one value from + /// the other via zero or more applications of `successor(of:)`. + associatedtype IndexDistance : SignedInteger = Int + + /// A collection type whose elements are the indices of `self` that + /// are valid for subscripting, in ascending order. + associatedtype Indices : Collection = DefaultIndices + + /// The indices that are valid for subscripting `self`, in ascending order. + /// + /// - Note: `indices` can hold a strong reference to the collection itself, + /// causing the collection to be non-uniquely referenced. If you need to + /// mutate the collection while iterating over its indices, use the + /// `successor(of:)` method starting with `startIndex` to produce indices + /// instead. + /// + /// ``` + /// var c = [10, 20, 30, 40, 50] + /// var i = c.startIndex + /// while i != c.endIndex { + /// c[i] /= 5 + /// i = c.successor(of: i) + /// } + /// // c == [2, 4, 6, 8, 10] + /// ``` + var indices: Indices { get } + + /// Returns the result of advancing `i` by `n` positions. + /// + /// - Returns: + /// - If `n > 0`, the `n`th successor of `i`. + /// - If `n < 0`, the `n`th predecessor of `i`. + /// - Otherwise, `i` unmodified. + /// + /// - Precondition: `n >= 0` unless `Self` conforms to + /// `BidirectionalCollection`. + /// - Precondition: + /// - If `n > 0`, `n <= self.distance(from: i, to: self.endIndex)` + /// - If `n < 0`, `n >= self.distance(from: i, to: self.startIndex)` + /// + /// - Complexity: + /// - O(1) if `Self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise. + func index(n: IndexDistance, stepsFrom i: Index) -> Index + + /// Returns the result of advancing `i` by `n` positions, or until it + /// equals `limit`. + /// + /// - Returns: + /// - If `n > 0`, the `n`th successor of `i` or `limit`, whichever + /// is reached first. + /// - If `n < 0`, the `n`th predecessor of `i` or `limit`, whichever + /// is reached first. + /// - Otherwise, `i` unmodified. + /// + /// - Precondition: `n >= 0` unless `Self` conforms to + /// `BidirectionalCollection`. + /// + /// - Complexity: + /// - O(1) if `Self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise. + func index( + n: IndexDistance, stepsFrom i: Index, limitedBy limit: Index + ) -> Index + + /// Returns the distance between `start` and `end`. + /// + /// - Precondition: `start <= end` unless `Self` conforms to + /// `BidirectionalCollection`. + /// - Complexity: + /// - O(1) if `Self` conforms to `RandomAccessCollection`. + /// - O(`n`) otherwise, where `n` is the method's result. + func distance(from start: Index, to end: Index) -> IndexDistance +} + +protocol BidirectionalCollection { + /// Returns the position immediately preceding `i`. + /// + /// - Precondition: `i > startIndex && i <= endIndex` + func predecessor(of i: Index) -> Index + + /// Replaces `i` with its predecessor. + /// + /// - Precondition: `i > startIndex && i <= endIndex` + func formPredecessor(i: inout Index) +} +``` -To facilitate evaluation, we've submitted a -[pull request](https://github.com/apple/swift/pull/2108) for the code -and documentation changes implementing this proposal. See below for a -discussion of the major points. +Note that `RandomAccessCollection` does not add any *syntactic* +requirements beyond those of `BidirectionalCollection`. Instead, it +places tighter performance bounds on operations such as `c.index(n, +stepsFrom: i)` (O(1) instead of O(`n`)). ## Downsides From 4aefed5f9a32c5a3e72a6f00c6a3342015ab2259 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 11:15:06 -0700 Subject: [PATCH 14/20] Document another downside --- proposals/NNNN-collections-move-indices.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index bbeb52c6d1..1c4ae14819 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -360,6 +360,16 @@ stepsFrom: i)` (O(1) instead of O(`n`)). The proposed approach has several disadvantages, which we explore here in the interest of full disclosure: +* In Swift 2, `RandomAccessIndex` has operations like `+` that provide + easy access to arbitrary position offsets in some collections. That + could also be seen as discouragement from trying to do random access + operations with less-refined index protocols, because in those cases + one has to resort to constructs like `i.advancedBy(n)`. In this + proposal, there is only `c.index(n, stepsFrom: i)`, which makes + random access equally (in)convenient for all collections, and there + is no particular syntactic penalty for doing things that might turn + out to be inefficient. + * Index movement is more complex in principle, since it now involves not only the index, but the collection as well. The impact of this complexity is limited somewhat because it's very common that code From 3215fbdf560f7fa486185698877e055222471620 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 12:01:03 -0700 Subject: [PATCH 15/20] Update information on range APIs --- proposals/NNNN-collections-move-indices.md | 108 ++++++++++++++++++++- 1 file changed, 103 insertions(+), 5 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 1c4ae14819..f3f0de0dec 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -164,17 +164,17 @@ The proposal adds several new types and protocols to support ranges: * The old `Range`, `ClosedInterval`, and `OpenInterval` are replaced with four new generic range types: - * Two for general ranges whose bounds are `Comparable`: `Range` + * Two for general ranges (whose bounds are `Comparable`): `Range` and `ClosedRange`. Having a separate `ClosedRange` type allows us to address the vexing inability of the old `Range` to represent a range containing the maximal value of its bound. * Two for ranges that additionally conform to - `RandomAccessCollection`, requiring bounds that are `Strideable` - with `Stride` conforming to `Integer`: `CountableRange` and - `CountableClosedRange`. [These types can be folded into + `RandomAccessCollection` (requiring bounds that are `Strideable` + with `Stride` conforming to `Integer`): `CountableRange` and + `CountableClosedRange`. These types can be folded into `Range` and `ClosedRange` when Swift acquires conditional - conformance capability.] + conformance capability. We also introduce three new protocols: @@ -355,6 +355,104 @@ requirements beyond those of `BidirectionalCollection`. Instead, it places tighter performance bounds on operations such as `c.index(n, stepsFrom: i)` (O(1) instead of O(`n`)). +## `Range`s + +The `RangeProtocol` shown below, along with the description of +[Range Protocols and Types](#range-protocols-and-types) above, provide +a complete picture of the changes to ranges. + +```swift +/// A type that represents a contiguous range of any comparable value. +/// +/// A range contains at least every value `x` where `lowerBound < x` and +/// `x < upperBound`. Individual types that conform to `RangeProtocol` must +/// specify their containment rules for the bounds of the range. +/// +/// The standard library defines two kinds of ranges: closed ranges, +/// represented by `ClosedRangeProtocol`, and half-open ranges, represented by +/// `HalfOpenRangeProtocol`. A closed range contains both its lower and upper +/// bounds, and therefore cannot be empty. A half-open range, on the other +/// hand, contains its lower bound when nonempty but never its upper bound. +/// +/// let closed: ClosedRange = 5...10 +/// closed.contains(5) // true +/// closed.contains(10) // true +/// +/// let halfOpen: Range = 5..<10 +/// halfOpen.contains(5) // true +/// halfOpen.contains(10) // false +/// +/// Not all empty ranges are equal; the bounds of two empty ranges must also be +/// equal for the ranges to be equal. +public protocol RangeProtocol : Equatable { + + /// The representation of the range's endpoints and the values + /// contained in it. + associatedtype Bound : Comparable + + /// Creates an instance with the given bounds. + /// + /// - Note: As this initializer does not check its precondition, it + /// should be used as an optimization only, when one is absolutely + /// certain that `lower <= upper`. In general, the `..<` and `...` + /// operators are to be preferred for forming ranges. + /// + /// - Precondition: `lower <= upper` + init(uncheckedBounds: (lower: Bound, upper: Bound)) + + /// Returns `true` if the range contains the `value`. + /// + /// Any type of range contains every value `x` where + /// `lowerBound < x < upperBound`. `RangeProtocol` makes no requirement as + /// to whether individual range types must contain either their lower or + /// upper bound. + func contains(value: Bound) -> Bool + + /// Returns `true` iff `self` and `other` contain a value in common. + /// + /// Any type of range contains every value `x` where + /// `lowerBound < x < upperBound`. `RangeProtocol` makes no requirement as + /// to whether individual range types must contain either their lower or + /// upper bound. + func overlaps(other: Self) -> Bool + + /// Returns `true` iff `self.contains(x)` is `false` for all values of `x`. + var isEmpty: Bool { get } + + // Note: When the range is also a collection, it is crucial to + // enforce the invariant that lowerBound <= upperBound, or it may be + // empty with startIndex != endIndex. + + /// The range's lower bound. + /// + /// Depending on the concrete type of the range, `lowerBound` may or may not + /// be contained in the range. + var lowerBound: Bound { get } + + /// The range's upper bound. + /// + /// Depending on the concrete type of the range, `upperBound` may or may not + /// be contained in the range. + var upperBound: Bound { get } + + /// Returns `self` clamped to `limits`. + /// + /// The bounds of the result, even if it is empty, are always + /// limited to the bounds of `limits`. + func clamped(to limits: Self) -> Self +} +``` + +Note in particular: + +* In `Range`, `T` is `Comparable` rather than an index + type that can be advanced, so a generalized range is no longer a + `Collection`, and `startIndex`/`endIndex` have become + `lowerBound`/`upperBound`. +* The semantic order of `Interval`'s `clamp` method, which was + unclear at its use-site, has been inverted and updated with a + preposition for clarity. + ## Downsides The proposed approach has several disadvantages, which we explore here From e8788cf5ae1256c11ecb2d6297585634dfb5599b Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 12:02:45 -0700 Subject: [PATCH 16/20] Move a section --- proposals/NNNN-collections-move-indices.md | 40 +++++++++++----------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index f3f0de0dec..09b4fefd73 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -83,6 +83,26 @@ This proposal intentionally does not: collections, queues etc.) Discussing how other concrete collections fit into the current protocol hierarchy is in scope, though. +## Limitations of the Model + +Ideally, our collection model would allow us to implement every +interesting data structure with memory safety, optimal performance, +value semantics, and a variety of other useful properties such as +minimal invalidation of indexes upon mutation. In practice, these +goals and the Swift language model interact in complicated ways, +preventing some designs altogether, and suggesting a variety of +implementation strategies for others that can be selected based on +one's priorities. We've done some in-depth investigation of these +implications, but presenting and explaining them is well beyond the +scope of this proposal. + +We can, however, be fairly sure that this change does not regress our +ability to build any Collections that could have been built in Swift +2.2. After all, it is still *possible* to implement indices that store +references and have the old traversal methods (the collection's +traversal methods would simply forward to those of the index), so we +haven't lost the ability to express anything. + ## Overview of Type And Protocol Changes This section covers the proposed structural changes to the library at @@ -501,26 +521,6 @@ in the interest of full disclosure: just to get our test code to type-check in reasonable time. Again, an ostensibly temporary—but still real—cost. -## Limitations of the Model - -Ideally, our collection model would allow us to implement every -interesting data structure with memory safety, optimal performance, -value semantics, and a variety of other useful properties such as -minimal invalidation of indexes upon mutation. In practice, these -goals and the Swift language model interact in complicated ways, -preventing some designs altogether, and suggesting a variety of -implementation strategies for others that can be selected based on -one's priorities. We've done some in-depth investigation of these -implications, but presenting and explaining them is well beyond the -scope of this proposal. - -We can, however, be fairly sure that this change does not regress our -ability to build any Collections that could have been built in Swift -2.2. After all, it is still *possible* to implement indices that store -references and have the old traversal methods (the collection's -traversal methods would simply forward to those of the index), so we -haven't lost the ability to express anything. - ## Impact on existing code Code that **does not need to change**: From 6523c364afc7def5e2ed6c256f38eff23efdb4b7 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 12:22:24 -0700 Subject: [PATCH 17/20] Detail some APIs previously missed and update notes accordingly --- proposals/NNNN-collections-move-indices.md | 49 ++++++++++++++++++++-- 1 file changed, 45 insertions(+), 4 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 09b4fefd73..35759b47de 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -309,6 +309,15 @@ protocol Collection { /// ``` var indices: Indices { get } + /// Returns the position immediately after `i`. + /// + /// - Precondition: `(startIndex.. Index + + /// Replaces `i` with its successor. + func formSuccessor(i: inout Index) + /// Returns the result of advancing `i` by `n` positions. /// /// - Returns: @@ -347,6 +356,31 @@ protocol Collection { n: IndexDistance, stepsFrom i: Index, limitedBy limit: Index ) -> Index + /// Advances `i` by `n` positions. + /// + /// - Precondition: `n >= 0` unless `Self` conforms to + /// `BidirectionalCollection`. + /// - Precondition: + /// - If `n > 0`, `n <= self.distance(from: i, to: self.endIndex)` + /// - If `n < 0`, `n >= self.distance(from: i, to: self.startIndex)` + /// + /// - Complexity: + /// - O(1) if `Self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise. + func formIndex(n: IndexDistance, stepsFrom i: inout Index) + + /// Advances `i` by `n` positions, or until it equals `limit`. + /// + /// - Precondition: `n >= 0` unless `Self` conforms to + /// `BidirectionalCollection`. + /// + /// - Complexity: + /// - O(1) if `Self` conforms to `RandomAccessCollection`. + /// - O(`abs(n)`) otherwise. + func formIndex( + n: IndexDistance, stepsFrom i: inout Index, limitedBy limit: Index + ) + /// Returns the distance between `start` and `end`. /// /// - Precondition: `start <= end` unless `Self` conforms to @@ -370,10 +404,17 @@ protocol BidirectionalCollection { } ``` -Note that `RandomAccessCollection` does not add any *syntactic* -requirements beyond those of `BidirectionalCollection`. Instead, it -places tighter performance bounds on operations such as `c.index(n, -stepsFrom: i)` (O(1) instead of O(`n`)). +Note: + +* The mutating `formSuccessor`, `formPredecessor`, and the `formIndex` + overloads essentially enshrine the previously-hidden + `_successorInPlace` et al., which can be important for performance + when handling the rare heavyweight index type such as `AnyIndex`. + +* `RandomAccessCollection` does not add any *syntactic* requirements + beyond those of `BidirectionalCollection`. Instead, it places + tighter performance bounds on operations such as `c.index(n, + stepsFrom: i)` (O(1) instead of O(`n`)). ## `Range`s From 85fef51b32057405048b33a0d176a33cbed88116 Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 12:27:05 -0700 Subject: [PATCH 18/20] Update user impact section. --- proposals/NNNN-collections-move-indices.md | 53 ++++++---------------- 1 file changed, 14 insertions(+), 39 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 35759b47de..274077454d 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -566,38 +566,13 @@ in the interest of full disclosure: Code that **does not need to change**: -* Code that works with `Array`, `ArraySlice`, `ContiguousArray`, their - indices (`Int`s), performs index arithmetic. +* Code that works with `Array`, `ArraySlice`, `ContiguousArray`, and + their indices. * Code that operates on arbitrary collections and indices (on concrete - instances or in generic context), but does not advance indices + instances or in generic context), but does no index traversal. -* Iteration over collection's indices with `c.indices` does not change - when: - - the underlying collection is not mutated, - - or it is known that `c.indices` is trivial (for example, in - `Array`), - - or when performance is not a concern: - - ```swift - // No change, because 'c' is not mutated, only read. - for i in c.indices { - print(i, c[i]) - } - - // No change, because Array's index collection is trivial and does not - // hold a reference to Array's storage. There is no performance - // impact. - for i in myArray.indices { - c[i] *= 2 - } - - // No change, because 'c' is known to be small, and doing a copy on - // the first loop iteration is acceptable. - for i in c.indices { - c[i] *= 2 - } - ``` +* Iteration over collection's indices with `c.indices` does not change. * APIs of high-level collection algorithms don't change, even for algorithms that accept indices as parameters or return indices (e.g., @@ -613,20 +588,20 @@ Code that **needs to change**: ```swift // Before: var i = c.indexOf { $0 % 2 == 0 } - i = i.successor() - print(c[i]) + let j = i.successor() + print(c[j]) // After: var i = c.indexOf { $0 % 2 == 0 } // No change in algorithm API. - i = c.next(i) // Advancing an index requires a collection instance. - print(c[i]) // No change in subscripting. + let j = c.successor(i) // Advancing an index requires a collection instance. + print(c[j]) // No change in subscripting. ``` - The transformation from `i.successor()` to `c.next(i)` is non-trivial. - Performing it correctly requires knowing extra information -- how to - get the corresponding collection. In the general case, it is not - possible to perform this migration automatically. In some cases, a - sophisticated migrator could handle the easy cases. + The transformation from `i.successor()` to `c.successor(i)` is + non-trivial. Performing it correctly requires knowing how to get + the corresponding collection. In general, it is not possible to + perform this migration automatically. A very sophisticated migrator + could handle some easy cases. * Custom collection implementations need to change. A simple fix would be to just move the the methods from indices to collections to satisfy @@ -634,7 +609,7 @@ Code that **needs to change**: does not require design work. This fix would allow the code to compile and run. - In order to take advantage of the performance improvements in + In order to take advantage of performance improvements in the new model, and remove reference-counted stored properties from indices, the representation of the index might need to be redesigned. From 8ca47335e967cccccc7a50dacec6cb4d349b467a Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 12:29:38 -0700 Subject: [PATCH 19/20] Wrap up new indexing proposal --- proposals/NNNN-collections-move-indices.md | 34 +++------------------- 1 file changed, 4 insertions(+), 30 deletions(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index 274077454d..ce88ee74a5 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -619,36 +619,10 @@ Code that **needs to change**: require a redesign. Implementing this in an automated migrator might be possible, but would be a heroic migration for a rare case. -## Implementation status +## Implementation Status -Since this is a large change to the collection model, to evaluate it, we have -prepared [a prototype -implementation](https://github.com/apple/swift/blob/master/test/Prototypes/CollectionsMoveIndices.swift). -Shawn Erickson and Austin Zheng have been working on re-implementing the -prototype in the core library on the [swift-3-indexing-model -branch](https://github.com/apple/swift/tree/swift-3-indexing-model). - -The prototype and this proposal have one major difference around the use -of unowned references. In the prototype, we are suggesting to use -unowned references to implement `c.indices`. Unfortunately, we realized -that this design can create an unpredictable user model. For example: - -```swift -let c = getCollection() -for i in c.indices { - if i.isRed { - print(c[i]) - } - print("no") -} -``` - -If ARC can prove that the collection does not contain indices where -`i.isRed` is true, then it can deinit the collection right after -constructing the index collection, before starting the loop. Then, the -first time we need to advance the index, dereferencing the unowned -reference to the collection storage (which is gone now) will cause a -trap. +[This pull request](https://github.com/apple/swift/pull/2108) contains +a complete implementation. ## Alternatives considered @@ -661,7 +635,7 @@ how to adjust the concept of D's range (similar to `Slice` in Swift) for mutable value-typed collections. In D, you process a collection by repeatedly slicing off elements. Once you have found an element that you would like to mutate, it is not clear how to actually change the -original collection, if the collection and its slice are value types? +original collection, if the collection and its slice are value types. ---- From bf6988716de91e59a88ba734ec537f37435233ad Mon Sep 17 00:00:00 2001 From: Dave Abrahams Date: Sun, 10 Apr 2016 12:34:06 -0700 Subject: [PATCH 20/20] Retitle the proposal --- proposals/NNNN-collections-move-indices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/NNNN-collections-move-indices.md b/proposals/NNNN-collections-move-indices.md index ce88ee74a5..56db714cbe 100644 --- a/proposals/NNNN-collections-move-indices.md +++ b/proposals/NNNN-collections-move-indices.md @@ -1,4 +1,4 @@ -# New collections model +# A New Model for Collections and Indices * Proposal: [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-collections-move-indices.md) * Author(s): [Dmitri Gribenko](https://github.com/gribozavr),