-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Description
Previous ID | SR-15173 |
Radar | rdar://problem/82934328 |
Original Reporter | @karwa |
Type | Bug |
Attachment: Download
Environment
Apple Swift version 5.5-dev (LLVM bcf6482631963a8, Swift 37aa203)
This is just a random nightly I happen to have installed. As of reporting, Godbolt is running:
Swift version 5.6-dev (LLVM ca88d53176e346a, Swift dc60996)
Additional Detail from JIRA
Votes | 0 |
Component/s | Standard Library |
Labels | Bug |
Assignee | None |
Priority | Medium |
md5: 09ab6e8a9dcadc52c76efb301ff669b5
Issue Description:
FlattenSequence.count has really, really bad performance.
Check out the following code:
func test(_ input: Array<Array<Int>>) -> Int {
input.joined().count
}
Trying this in Godbolt (nightly, -O), it generates an enormous amount of code. 349 lines, full of generic specializations of this and that, retains, releases, __swift_instantiateConcreteTypeFromMangledName, and, worst of all, this thing:
generic specialization <[Swift.Int]> of protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance [A] : Swift.Collection in Swift:
push r15
push r14
push r12
push rbx
push rax
mov r14, rdx
mov r15, rsi
mov r12, rdi
mov edi, 40
call malloc@PLT
mov rbx, rax
mov qword ptr [r12], rax
mov rdi, rax
mov rsi, r15
mov rdx, r14
call (generic specialization <[Swift.Int]> of Swift.Array.subscript.read : (Swift.Int) -> A)
mov qword ptr [rbx + 32], rax
lea rax, [rip + (protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance [A] : Swift.Collection in Swiftgeneric specialization <[Swift.Int]> of with unmangled suffix ".resume.0")]
add rsp, 8
pop rbx
pop r12
pop r14
pop r15
ret
Yes, you read that correctly. This is an Array 'read' accessor - the thing that will access every element, and it has a big, fat malloc in the middle.
It's really bad. Here you can see it (and the resulting frees) absolutely dominate my actual collection code.
My benchmarks got 500% slower when using it. I couldn't believe it.
Anyway, what's really interesting is what happens when you try the following instead:
func test2(_ input: Array<Array<Int>>) -> Int {
var count = 0
for _ in input.joined() { count += 1}
return count
}
Boom. 59 lines, all in a single function. None of these generic specializations floating around, no __swift_instantiateConcreteTypeFromMangledName, and most importantly: no mallocs (okay, there's a couple of retain/releases. I wonder if they're really necessary, but I can live with them). This code is hundreds of times faster, not to mention much, much smaller. Again, compiling at -O.
Anyway, I looked at the implementation of FlattenSequence, and nothing really jumps out at me. It seems to do a full index traversal in its `distance(from:to🙂`, which seems a bit unnecessary, and it doesn't appear to make use of the wrapped collection's version of `distance(from:to🙂` AFAICT - so, room for improvement, but nothing that would cause the behaviour I'm seeing.
I get a feeling this is going to be a tricky one. Bring a shovel, because there's some serious digging to be done.
Here's the Godbolt link with both versions of the function. Un-comment the one you'd like to see.