Skip to content

[SR-15173] FlattenSequence.count has awful, awful performance #57496

@karwa

Description

@karwa
Previous ID SR-15173
Radar rdar://problem/82934328
Original Reporter @karwa
Type Bug

Attachment: Download

Environment

Apple Swift version 5.5-dev (LLVM bcf6482631963a8, Swift 37aa203)

This is just a random nightly I happen to have installed. As of reporting, Godbolt is running:

Swift version 5.6-dev (LLVM ca88d53176e346a, Swift dc60996)

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug
Assignee None
Priority Medium

md5: 09ab6e8a9dcadc52c76efb301ff669b5

Issue Description:

FlattenSequence.count has really, really bad performance.

Check out the following code:

func test(_ input: Array<Array<Int>>) -> Int {
    input.joined().count
}

Trying this in Godbolt (nightly, -O), it generates an enormous amount of code. 349 lines, full of generic specializations of this and that, retains, releases, __swift_instantiateConcreteTypeFromMangledName, and, worst of all, this thing:

generic specialization <[Swift.Int]> of protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance [A] : Swift.Collection in Swift:
        push    r15
        push    r14
        push    r12
        push    rbx
        push    rax
        mov     r14, rdx
        mov     r15, rsi
        mov     r12, rdi
        mov     edi, 40
        call    malloc@PLT
        mov     rbx, rax
        mov     qword ptr [r12], rax
        mov     rdi, rax
        mov     rsi, r15
        mov     rdx, r14
        call    (generic specialization <[Swift.Int]> of Swift.Array.subscript.read : (Swift.Int) -> A)
        mov     qword ptr [rbx + 32], rax
        lea     rax, [rip + (protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance [A] : Swift.Collection in Swiftgeneric specialization <[Swift.Int]> of  with unmangled suffix ".resume.0")]
        add     rsp, 8
        pop     rbx
        pop     r12
        pop     r14
        pop     r15
        ret

Yes, you read that correctly. This is an Array 'read' accessor - the thing that will access every element, and it has a big, fat malloc in the middle.

It's really bad. Here you can see it (and the resulting frees) absolutely dominate my actual collection code.

My benchmarks got 500% slower when using it. I couldn't believe it.

Anyway, what's really interesting is what happens when you try the following instead:

func test2(_ input: Array<Array<Int>>) -> Int {
    var count = 0
    for _ in input.joined() { count += 1}
    return count
}

Boom. 59 lines, all in a single function. None of these generic specializations floating around, no __swift_instantiateConcreteTypeFromMangledName, and most importantly: no mallocs (okay, there's a couple of retain/releases. I wonder if they're really necessary, but I can live with them). This code is hundreds of times faster, not to mention much, much smaller. Again, compiling at -O.

Anyway, I looked at the implementation of FlattenSequence, and nothing really jumps out at me. It seems to do a full index traversal in its `distance(from:to🙂`, which seems a bit unnecessary, and it doesn't appear to make use of the wrapped collection's version of `distance(from:to🙂` AFAICT - so, room for improvement, but nothing that would cause the behaviour I'm seeing.

I get a feeling this is going to be a tricky one. Bring a shovel, because there's some serious digging to be done.

Here's the Godbolt link with both versions of the function. Un-comment the one you'd like to see.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugA deviation from expected or documented behavior. Also: expected but undesirable behavior.standard libraryArea: Standard library umbrella

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions