Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indices are (wrongly?) assumed to be integers and zero-based #37

Closed
ole opened this issue Apr 18, 2017 · 3 comments
Closed

Indices are (wrongly?) assumed to be integers and zero-based #37

ole opened this issue Apr 18, 2017 · 3 comments

Comments

@ole
Copy link
Contributor

ole commented Apr 18, 2017

I have a comment regarding terminology.

The indices the library computes for the Edit steps are currently always integers and always zero-based (i.e. the first index is 0). This works great for the intended main use case (table and collection view updates) because the index paths of a table or collection view are also zero-based and composed of integers.

However, it doesn't reflect the reality of Swift's collection types. For example, ArraySlice uses integer indices, but they aren't zero-based. If you compute the changeset for two ArraySlice instances, the result is arguably wrong or at least confusing:

let source = [2,3,4,5].dropFirst() // ArraySlice<Int>
let target = [3,4,5,6].dropFirst() // ArraySlice<Int>
let edits = Changeset.edits(from: source, to: target)
print(edits)
// Prints "[delete 3 at index 0, insert 6 at index 2]"

If you tried to apply this changeset literally to the source collection (i.e. delete the element at index 0 from source), youʼd get a crash because source doesnʼt have an "index" 0.

In a way, a similar issue is when you compute changesets for strings (as most of the unit tests do) because string indices are not integers, so saying something like "insert "a" at index 5" doesn't strictly make sense in Swift.

I find this a little confusing and misleading. Changeset appears to be generic for any collection, and yet it doesn't follow Swift's conventions where the term "index" has a very specific meaning.

When I first noticed this "problem" I wanted to suggest replacing the integer indices in Edit and EditOperation with the collection's index type (i.e. something like T.Index). On second thought, this doesn't make sense because a changeset would have to compute "virtual" indices for a collection that doesn't really exist, and since collections are free to invalidate any existing index upon mutation, I don't think you could compute the indices in a generic way. Moreover, using real collection indices would make supporting table/collection view updates harder, not easier.

After writing all this, I realize that the way this works is actually the best solution for the given problem. Having written this, I'm going to submit it too, even though I don't have a specific suggestion. Feel free to close this issue, and maybe it will help others understand this better when they find it in search.

My only suggestion would be to make it clear that Changeset and Edit aren't concerned with indices, but with offsets. I think if the debugDescription read "delete x at offset 0" etc. I would have been less confused. This is exemplified by the use of enumerated() to compute the column value in edits(from:to:), since enumerated() computes offsets from 0 and not indices (see http://khanlou.com/2017/03/you-probably-don't-want-enumerated/ for more info on enumerated() and how it's often misinterpreted).

@osteslag
Copy link
Owner

Thanks for your comments, @ole. You make a very valid point and, as always, in a well-articulated way.

[The] Edit steps are currently always integers and always zero-based (i.e. the first index is 0). This works great for the intended main use case (table and collection view updates) because the index paths of a table or collection view are also zero-based and composed of integers.

I have toyed with the idea of supporting a generic index type (the collection’s). This would make it easy to apply table/collection view data source changes spanning multiple sections, for example, using IndexPath.

(Now, more than a year later, I still feel like a newbie when it comes to Swift and haven’t yet taken the time to fully understand the problem and possible solutions.)

Your suggestion to use the term offset where applicable is just excellent. I will do that and clarify the issue in the README.

Thanks.

@ole
Copy link
Contributor Author

ole commented Apr 21, 2017

I think even if you wanted to use the collection's generic index type, it's not going to work. Take strings as an example — or String.CharacterView, to be exact, because String doesn't conform to Collection in Swift 3 (but that will likely change again in Swift 4). When I talk about string indices in the following, I mean String.CharacterView.Index, which is typealiased to String.Index.

Say we want to compute:

Changeset.edits(from: "ABC".characters, to: "😜BD".characters)

The result should clearly be:

[replace with 😜 at offset 0, replace with D at offset 2]

(I'm using "offset" here for what the library currently calls "index".)

If we wanted to refactor the library to use the collection's index type rather than integer offsets, we run into a problem. Here's the transformation matrix:

0: "😜" 2: "B" 3: "D"
0: "A"
1: "B"
2: "C"

The number before each character indicates the character's string index value. Notice that the index value of "B" in the target row is 2 and not 1, as one would perhaps expect. String indices are opaque values, but internally they currently use a UTF-16 offset to indicate the character's position in the string's storage. Since the emoji character takes up two UTF-16 units, all following characters have their index value incremented. The index with the value 1 doesn't exist in the target string. (Note that the offset of "B" from the start of the target string is still 1; that's why it's important to distinguish between index and offset.)

Now see what happens if we apply the first substitution (replace the "A" at index 0 with "😜"): Making this change invalidates the indices of all subsequent characters in the string. That is, if we have earlier obtained an index for the "C" character in the source string, it would no longer be valid after performing the first substitution (the same index now points to the "B" character; depending on the change, it may also have become entirely invalid and cause a crash if you tried to subscript the mutated with it).

It's probably possible (if difficult) to apply the current algorithm to strings using the string index type because the algorithm goes through the collections from left to right (top to bottom) and never has to consider values to the right (bottom) of the current element, but the Collection protocol makes no such guarantees. Collections are free to invalidate any existing index upon mutation. Therefore, I'm not sure it's even possible to write a generic version of the algorithm that uses the collection's index type. And even if it's possible, it may not be a good idea.

@osteslag
Copy link
Owner

Closed by #39.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants