
performance issue iterating over a query #23

Closed
violabg opened this issue Nov 24, 2014 · 47 comments

@violabg commented Nov 24, 2014

I'm having a big performance issue iterating over a query; this is my code:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()
        let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid])
            .join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId])
            .filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query count = \(query.count)")

        start = NSDate()
        for row in query {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")
        return categoriesForLanguage
    }

and these are the printed results:

query -0.0109010338783264
query count = 88
for row in query -2.39910697937012

I even removed the join to see if that was the cause:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()
        let query = PDFs.select(PDFs[*]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query count = \(query.count)")

        start = NSDate()
        for row in query {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")
        return categoriesForLanguage
    }

it improved a little bit, but it's still too slow:

query -0.00782698392868042
query count = 88
for row in query -1.40383696556091

P.S. I'm testing on an iPad

@stephencelis (Owner)

There are likely some low-hanging performance improvements to be made right now, especially around row iteration. I've just been optimizing the interface, first. Can we isolate a couple factors, though?

  1. Is this a Release build, or Debug?
  2. Performance vs. raw query. Let's find out how much more expensive the type-safe iteration is. You can run pdfDb!.trace(println) before all of the above queries to set up a SQL query logger; from there, take the queries it spits out and test them with raw iteration (for _ in db.prepare(query) {}). Run both queries at least twice to account for warm-up.

If you have runnable code you can share, please email it and it should make any performance testing a bit easier.
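The two steps above might look like the following in practice. This is a sketch only: the database path and SQL string are placeholders, not taken from the project, and the timing style mirrors the snippets earlier in the thread.

```swift
import Foundation

// Hypothetical harness for the isolation steps above, in the era's
// Swift 1.x / SQLite.swift style. Path and SQL are placeholders.
let db = Database("path/to/db.sqlite3")

// Step 2a: log every statement SQLite.swift executes.
db.trace(println)

// Step 2b: paste a traced query back in and time raw iteration,
// running it twice to account for warm-up.
let sql = "SELECT * FROM pdf WHERE languageId = '4'"
for run in 1...2 {
    let start = NSDate()
    for _ in db.prepare(sql) {}  // raw, untyped iteration
    println("run \(run): \(-start.timeIntervalSinceNow)s")
}
```

Comparing the second (warm) run of the raw loop against the typed loop gives a fairer picture of the typed interface's overhead.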

@violabg (Author) commented Nov 24, 2014

I have changed the code to trace raw iteration, and it is much faster:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        pdfDb!.db!.trace(println)

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()

        let query = "SELECT p.* , c.id as categoryid, c.name as categoryname, c.thumb as categorythumb, c.orderid as categorysortid FROM pdf p inner join categories c on c.id=p.categoryid WHERE p.languageId='\(language.id)'"
        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        start = NSDate()

        let stmt = pdfDb!.db!.prepare(query)
        for row in stmt {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")

        return categoriesForLanguage
    }

query -0.00111699104309082
SELECT p.* , c.id as categoryid, c.name as categoryname, c.thumb as categorythumb, c.orderid as categorysortid FROM pdf p inner join categories c on c.id=p.categoryid WHERE p.languageId='4'
for row in query -0.623710989952087

compared with the original code output:

query -0.00927197933197021
SELECT count() FROM pdf INNER JOIN categories ON (categories.id = pdf.categoryId) WHERE (pdf.languageId = '4')
query count = 88
SELECT pdf.*, categories.id, categories.name, categories.thumb, categories.orderid FROM pdf INNER JOIN categories ON (categories.id = pdf.categoryId) WHERE (pdf.languageId = '4')
for row in query -2.36225700378418

Regarding sharing my code: I need to create a new project without all the sensitive data, and I will email it to you.

@stephencelis (Owner)

Great, thanks! I have a few ideas already, but it would be helpful to have a baseline to work with.

@TomasLinhart

I can also confirm it is really slow. Getting 1,000 rows from my database takes over 5 seconds. When I use db.prepare, the performance is much better.

@stephencelis (Owner)

@TomasLinhart Can you give actual benchmarks? I'm under the impression that the difference is linear and about 4x slower at the moment, but any specifics to the contrary (with examples) would be helpful.

@TomasLinhart

@stephencelis Sure, I can provide you with something. But it is not hard to test: just create a database with over 1,000 rows and at least 15 columns and try to load everything.

@stephencelis (Owner)

@TomasLinhart Sure, but the less time it takes me to reproduce the problem, the better. My personal use cases aren't dealing with this much data all at once, so this has been more of a back-burner fix for me. (In fact, telling me your use case would be helpful knowledge as well.) The easier you make it for me to take it off the back burner, the sooner fixes will come when I have personal time to spare for this part of the project. I'm also happy to take pull requests if you'd like to dig in yourself and spend time contributing to the project.

@TomasLinhart

@stephencelis Sure, I will prepare an example project. For now I have just switched to db.prepare, but in the future I might need more, so I could dive in and do some pull requests; right now I just want to finish my hobby project.

My use case is that I have a data set I want to keep in memory so I can manipulate it easily.

@TomasLinhart

I have created a project with a benchmark: https://github.com/TomasLinhart/SQLitePerformance. Just download it and run it and you will see the difference in performance.

@stephencelis (Owner)

Thanks! I'll take a look at this one—and @violabg's, above—when I have the chance.

@TomasLinhart

I have updated the benchmark with a C API example, which is 17x faster than db.prepare and 225x faster than the typed API.

@stephencelis (Owner)

@TomasLinhart db.prepare is fairly lightweight, but there is implicit type conversion happening there that isn't happening in your native example. Assuming your benchmarks are from a release build, we may hit the point where Swift becomes the bottleneck (and will hopefully be optimized automatically in the future). I'll check for optimizations that can be made there, as well, though.

@TomasLinhart

@stephencelis You are right. I checked your code for db.prepare and it looks good. I tried removing the generator logic, and then it is almost the same speed as my C API example. So I guess the biggest bottleneck in db.prepare is row.append, but it is not possible to do much about it...

@stephencelis (Owner)

@TomasLinhart Were you running things in debug mode? My results:

using typed API took 6.32480198144913 seconds
using db.prepare took 0.377619981765747 seconds
using C native API took 0.0522429943084717 seconds

If I change your iteration logic to use sqlite3_column_type and sqlite3_column_{type} methods to determine the types and yield arrays of rows, there is negligible overhead:

using db.prepare took 0.388479948043823 seconds
using C native API took 0.382124006748199 seconds
        var statement: COpaquePointer = nil  // assumed declaration; not shown in the original snippet
        var results = [[Any?]]()
        if (sqlite3_prepare_v2(db, query.cStringUsingEncoding(NSUTF8StringEncoding)!, -1, &statement, nil)
            == SQLITE_OK) {
                let columnCount = sqlite3_column_count(statement)
                while (sqlite3_step(statement) == SQLITE_ROW) {
                    var row = [Any?]()
                    for idx in 0..<columnCount {
                        switch sqlite3_column_type(statement, Int32(idx)) {
                        case SQLITE_BLOB:
                            let bytes = sqlite3_column_blob(statement, Int32(idx))
                            let length = sqlite3_column_bytes(statement, Int32(idx))
                            row.append(Blob(bytes: bytes, length: Int(length)))
                        case SQLITE_FLOAT:
                            row.append(Double(sqlite3_column_double(statement, Int32(idx))))
                        case SQLITE_INTEGER:
                            let int = Int(sqlite3_column_int64(statement, Int32(idx)))
                            var bool = false
                            if let type = String.fromCString(sqlite3_column_decltype(statement, Int32(idx))) {
                                bool = type.hasPrefix("BOOL")
                            }
                            row.append(bool ? int != 0 : int)
                        case SQLITE_NULL:
                            row.append(nil)
                        case SQLITE_TEXT:
                            row.append(String.fromCString(UnsafePointer<CChar>(sqlite3_column_text(statement, Int32(idx))))!)
                        case let type:
                            assertionFailure("unsupported column type: \(type)")
                        }
                    }
                    results.append(row)
                }
                sqlite3_finalize(statement)
        }

Assuming you add logic to your C API example that handles the underlying types at all, I imagine there is little to no performance hit using SQLite.swift.

Given that, the typed interface is still weighing in at about 21x less performant in this case, which is better than 225x, but slow enough to warrant investigation and cleanup when I can.

Meanwhile, you should be able to batch/stream the results into your interface to avoid such big delays and still be able to use the typed interface.
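A batching loop along those lines might be sketched as follows. This assumes the `query` value from the earlier snippets and a `limit(_:offset:)` call on the typed Query, which is an assumption about the library's API of the time; the page size is arbitrary.

```swift
// Stream the typed query in pages instead of iterating it all at once.
// `query` is assumed to be the joined/filtered Query built earlier.
let pageSize = 100
var offset = 0
while true {
    var rowsInPage = 0
    for row in query.limit(pageSize, offset: offset) {
        // hand each typed row to the UI / accumulate results here
        rowsInPage++
    }
    if rowsInPage < pageSize { break }  // short page: no more rows
    offset += pageSize
}
```

Each page is a separate SQL statement, so the UI can render incrementally while the full result set is still loading.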

@TomasLinhart

Yeah, I was; sorry about that. So db.prepare is fast enough. Check my last comment: the biggest overhead is the append, but there is not much we can do about that.

So the only problem is the typed interface. 😄

@stephencelis (Owner)

@TomasLinhart Ah, good to hear you're seeing the same results as me now :)

In the end I think the typed interface will become more performant as Swift improves its own performance, but I do believe in making optimizations in SQLite.swift wherever possible, so I'll try to dig in sometime soon!

@stephencelis (Owner)

@TomasLinhart Actually, the point about append is a good one. I've seen cases where things like reduce and map are slower, but map actually improves the performance here quite a bit:

using db.prepare took 0.273506999015808 seconds
using C native API took 0.397485971450806 seconds

I'll commit this optimization (and more) when I have more time to dig in. Thanks!
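The shape of that append-to-map change, independent of SQLite.swift's internals, is roughly the following. The column count and values are illustrative only.

```swift
// Illustration of the append-vs-map difference mentioned above:
// mapping over the column indices builds the row in a single pass,
// which avoids growing the array element by element.
let columnCount = 15

// Growing the array one element at a time:
var appendedRow = [Any?]()
for idx in 0..<columnCount {
    appendedRow.append(idx)  // repeated grow-and-copy
}

// Building it with the free map function (Swift 1.x style):
let mappedRow: [Any?] = map(0..<columnCount) { idx in idx }
```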

@stephencelis (Owner)

I also quickly cached columnNames in the QueryGenerator, and we're down to a 3.4x slowdown relative to the original db.prepare, and 5x relative to db.prepare with the new improvements.

using typed API took 1.3848859667778 seconds
using db.prepare (old) took 0.388479948043823 seconds
using db.prepare took 0.279254972934723 seconds

This may be the easiest win for now. I'm not sure if you want to dig in for any more optimizations.

@TomasLinhart

Hehe, I was also digging in and discovered the same problem 😄

@stephencelis (Owner)

One last improvement is an internal restructuring of how Row stores its data:

using typed API took 0.305689036846161 seconds
using db.prepare (new) took 0.307142019271851 seconds

Iteration should basically be identically performant, now.

OK! I'll push these fixes but then I really need to get back to work 💨

@TomasLinhart

Good job, I am glad the problem was discovered. Looking forward to your commit! 😃

@stephencelis (Owner)

Closed by:

@stephencelis (Owner)

Thanks for your help, @TomasLinhart!

@violabg (Author) commented Dec 18, 2014

Hi Stephen,
I don't know if you have had a chance to look at my sample code, but the last commit did not improve things much for me.
Also, when I try this:

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table

let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid])
.join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId])
.filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

it breaks saying :
fatal error: no such table: "pdf": file /workspace/apple/ios8/SQLitePerformance-master/SQLite.swift/SQLite Common/Database.swift, line 335

which didn't happen before.

@stephencelis (Owner)

@violabg That sounds like an unrelated issue. What does pdfDb!.pdfsTb.table look like? If you run pdfDb!.run("select count(*) from pdfs") (or whatever your PDFs table name is) you should get the same error. It sounds like the table doesn't exist yet.

@violabg (Author) commented Dec 18, 2014

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table
println(PDFs.count) //prints 467
println(categories.count) //prints 13
let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        var categoriesDictionary = [Int: PdfCategory]()
        var currentCategory:PdfCategory

        for row in query {

here I get the error, when it tries to enter the for-in loop:
fatal error: no such table: "pdf": file /workspace/apple/ios8/SQLitePerformance-master/SQLite.swift/SQLite Common/Database.swift, line 335

which is the table:
let PDFs = pdfDb!.pdfsTb.table

but it works here:
println(PDFs.count) //prints 467
println(categories.count) //prints 13

@stephencelis (Owner)

Very strange. The error is coming from SQLite, so the query must be getting malformed somehow. Can you email me the latest version of your code so that I can better troubleshoot?

@kfmfe04 commented Dec 19, 2014

Aside: just wanted to throw in another sample from my personal project.

+65% longer for a flat-out table scan when using typed versus raw (8.54s vs 5.16s).

FWIW, I have been using flat-out binary block dumps for large arrays (10k) of structs/fixed data for speed (like a column-based DBMS), but I am migrating to SQLite for ease of maintenance and flexibility.

        /// typed: db_read -8.54326200485229
        let tbl = GGGTable.CardAnnotation
        for r in tbl {
            if var s = notes[ r[e_cid] ]
            {
                s.unixtime = r[e_show_me_on]
            } else {
                notes[ r[e_cid] ] = CardAnnotation( unixtime: r[e_show_me_on] )
            }
        }

        /// raw: db_read -5.16345697641373
        let db = GGGDatabase.db
        let stmt = db.prepare( "SELECT * FROM cardannotation" )
        for r in stmt {
            let idx = r[0] as Int
            let ut  = r[1] as Int
            if var s = notes[ idx ]
            {
                s.unixtime = ut
            } else {
                notes[ idx ] = CardAnnotation( unixtime: ut )
            }
        }

@stephencelis (Owner)

@kfmfe04 And this is in a release build, not debug?

Regardless, the difference isn't too bad, considering. Certain optimizations are going to come with time as Swift improves. In the meantime there will be a tradeoff in speed vs. safety and code clarity.

SQLite.swift's type-safe interface should be pretty speedy when you use SQLite itself for querying and scoping the data you're working with, but if you're loading a large dataset into memory, you're probably going to want to optimize as you did, dipping down into the raw API with fewer generics.

I'm hoping that Swift's performance improves over time and the gap you're seeing narrows quite a bit.

I'm also open to pull requests that continue to improve performance with the type-safe interface, so feel free to dig in if you have ideas! I'll also keep performance in mind as I work with things.

@stephencelis (Owner)

@violabg Looks like the issue you've stumbled upon is a table quoting bug. I'll try to fix it soon!

@violabg (Author) commented Dec 19, 2014

thanks Stephen,
once you fix it, could you test the performance of my code and see if it matches your results?

@stephencelis (Owner)

@violabg Fix pushed here: 1f070f7

I'm not sure how to check the performance difference, though. Please let me know how it's improved!

@violabg (Author) commented Dec 20, 2014

Thanks Stephen, the problem is fixed now, and these are my test results:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()
        categoriesForLanguage.laguangeId = language.id

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table

        self.start = NSDate()
        let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query.count \(query.count)")

        var categoriesDictionary = [Int: PdfCategory]()
        var currentCategory:PdfCategory

        self.start = NSDate()
        for row in query {
//            
        }
        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for \(timeInterval2)")

        return categoriesForLanguage
    }

on mac:
query -0.00147002935409546
query.count 87
for -0.021390974521637
timeInterval -0.0214470028877258

on ipad 4:
query -0.0244329571723938
query.count 87
for -0.350589990615845
timeInterval -0.350932002067566

Iteration over 87 records feels a little slow; I ported this project from Objective-C, and that version is much faster.
Probably, as you said, Swift is the bottleneck.

@kfmfe04 commented Dec 28, 2014

That was a debug build, OK. Now I have a more pressing issue.

I do a table scan, something like:

    let sentences = db["sentences"]
    let id      = Expression<Int>("id")
    let entry   = Expression<String>("entry")
    let address = Expression<String>("address")
    let tokens  = Expression<String>("tokens")
    for r in sentences
    {
        // ....
    }

For about 8,000 records, this takes 3 minutes on an iPod touch. Parsing the equivalent pipe-delimited ASCII file takes 25 seconds. Is this within the range of expectations, or should a table scan be faster?

Maybe I should try passing in a "SELECT" statement instead and see how fast it is?

@stephencelis (Owner)

@kfmfe04 What are you doing with the data as you iterate? If you're doing empty iteration, it's not a fair comparison because SQLite.swift has type-handling code built in. If you're working with such a large dataset all at once, you may need to optimize by ditching the type-safe interface for something lower-level. Or better yet, limit your queries to only the data you need, and use aggregate functions for calculations across the dataset.
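As a sketch of that advice, using invented table and column names in the period's SQLite.swift style (only the `db["sentences"]` subscript and `Expression` types come from the snippet above; the filter, limit, and count calls are assumptions about the library's API):

```swift
// Scope work to SQLite instead of scanning every row in Swift.
let sentences = db["sentences"]
let id     = Expression<Int>("id")
let tokens = Expression<String>("tokens")

// Aggregate in SQL: one statement, no 8,000-row iteration.
let total = sentences.count          // e.g. SELECT count(*) FROM "sentences"

// Select only the columns and rows actually needed.
for row in sentences.select(id, tokens).filter(id < 100).limit(50) {
    // work with at most 50 narrow rows
}
```

The point is to let SQLite's query planner do the filtering and counting, so the typed row machinery only ever touches the rows the app truly needs.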

@androidcn

So slow... iterating about 2,000 rows.

@stephencelis (Owner)

@androidcn Please try to constructively contribute to the conversation and read this thread thoroughly (check my questions to others throughout, including whether you've compiled for "Release" or not, and what device you're running on).

In short:

  • The basic iteration API should be as fast as interfacing with libsqlite3 directly, while the typed API is as fast as adding type-checking logic for each row. You're likely hitting a Swift or SQLite bottleneck.
  • In the end, iterating over 2,000 rows seems a bit heavy-handed when SQLite provides aggregation APIs and when you can scope/batch your queries.

@androidcn

@stephencelis But I want to use the typed API.

@stephencelis (Owner)

@androidcn The typed API is as fast as manually-typed, so please do. It will improve with Swift. Scope your queries for performance.

@stephencelis (Owner)

An update to those listening: Swift 1.2 performs 2x+ faster for me. If you end up re-running your benchmarks (off the swift-1-2 branch), please share your experiences.

@androidcn

I will check it out later.

@violabg (Author) commented Feb 16, 2015

I have tested my code with the swift-1-2 branch.
I get 1.3 sec.;
with Swift 1.1 I get 6.9 sec.

Using raw SQL I get 0.28 sec. with Swift 1.2,
and 0.98 sec. with Swift 1.1.

@stephencelis (Owner)

Thanks @violabg! Hopefully things will continue to improve steadily.

@TomasLinhart

A little off-topic: when will you merge the Swift 1.2 branch into master, @stephencelis?

@stephencelis (Owner)

@TomasLinhart per #62:

This PR is open to track changes against the Swift 1.2 beta. It won't be merged till Swift 1.2 goes GM, but will be continuously rebased onto master.

@sehgalsaheel

Is the limit set to 1,000? Or is there a way to do more than 1,000 rows (8,000, to be specific)?

@stephencelis (Owner)

@sehgalsaheel Have you tried it? Have you had problems? You should have the same ability, limit-wise, as you'd have with raw sqlite3. If the device can't handle 8,000, you should be batching.
