
performance issue iterating over a query #23

Closed
violabg opened this issue Nov 24, 2014 · 47 comments

@violabg commented Nov 24, 2014

I'm having a big performance issue iterating over a query; this is my code:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()
        let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid])
            .join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId])
            .filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query count = \(query.count)")

        start = NSDate()
        for row in query {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")
        return categoriesForLanguage
    }

and these are the printed results:

query -0.0109010338783264
query count = 88
for row in query -2.39910697937012

I even removed the join to see if that was the cause:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()
        let query = PDFs.select(PDFs[*]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query count = \(query.count)")

        start = NSDate()
        for row in query {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")
        return categoriesForLanguage
    }

it improved a little bit, but it's still too slow:

query -0.00782698392868042
query count = 88
for row in query -1.40383696556091

P.S. I'm testing on an iPad

@stephencelis (Owner)

There are likely some low-hanging performance improvements to be made right now, especially around row iteration. I've just been optimizing the interface, first. Can we isolate a couple factors, though?

  1. Is this a Release build, or Debug?
  2. Performance vs. raw query. Let's find out how much more expensive the type-safe iteration is. You can run pdfDb!.trace(println) before all of the above queries to set up a SQL query logger; from there, take the queries it spits out and test them with raw iteration (for _ in db.prepare(query) {}). Run both queries at least twice to account for warm-up.

If you have runnable code you can share, please email it and it should make any performance testing a bit easier.
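The two steps above might look like the following in practice. This is a sketch only: the database path and SQL string are placeholders, not taken from the project, and the timing style mirrors the snippets earlier in the thread.

```swift
import Foundation

// Hypothetical harness for the isolation steps above, in the era's
// Swift 1.x / SQLite.swift style. Path and SQL are placeholders.
let db = Database("path/to/db.sqlite3")

// Step 2a: log every statement SQLite.swift executes.
db.trace(println)

// Step 2b: paste a traced query back in and time raw iteration,
// running it twice to account for warm-up.
let sql = "SELECT * FROM pdf WHERE languageId = '4'"
for run in 1...2 {
    let start = NSDate()
    for _ in db.prepare(sql) {}  // raw, untyped iteration
    println("run \(run): \(-start.timeIntervalSinceNow)s")
}
```

Comparing the second (warm) run of the raw loop against the typed loop gives a fairer picture of the typed interface's overhead.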

@violabg (Author) commented Nov 24, 2014

I have changed the code to trace raw iteration, and it is much faster:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        pdfDb!.db!.trace(println)

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()

        let query = "SELECT p.* , c.id as categoryid, c.name as categoryname, c.thumb as categorythumb, c.orderid as categorysortid FROM pdf p inner join categories c on c.id=p.categoryid WHERE p.languageId='\(language.id)'"
        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        start = NSDate()

        let stmt = pdfDb!.db!.prepare(query)
        for row in stmt {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")

        return categoriesForLanguage
    }

query -0.00111699104309082
SELECT p.* , c.id as categoryid, c.name as categoryname, c.thumb as categorythumb, c.orderid as categorysortid FROM pdf p inner join categories c on c.id=p.categoryid WHERE p.languageId='4'
for row in query -0.623710989952087

compared with the original code output:

query -0.00927197933197021
SELECT count() FROM pdf INNER JOIN categories ON (categories.id = pdf.categoryId) WHERE (pdf.languageId = '4')
query count = 88
SELECT pdf.*, categories.id, categories.name, categories.thumb, categories.orderid FROM pdf INNER JOIN categories ON (categories.id = pdf.categoryId) WHERE (pdf.languageId = '4')
for row in query -2.36225700378418

Regarding sharing my code: I need to create a new project without all the sensitive data, and I will email it to you.

@stephencelis (Owner)

Great, thanks! I have a few ideas already, but it would be helpful to have a baseline to work with.

@TomasLinhart

I can also confirm it is really slow. Getting 1,000 rows from my database takes over 5 seconds. When I use db.prepare, the performance is much better.

@stephencelis (Owner)

@TomasLinhart Can you give actual benchmarks? I'm under the impression that the difference is linear and about 4x slower at the moment, but any specifics to the contrary (with examples) would be helpful.

@TomasLinhart

@stephencelis Sure, I can provide you with something. But it is not hard to test: just create a database with over 1,000 rows and at least 15 columns and try to load everything.

@stephencelis (Owner)

@TomasLinhart Sure, but the less time it takes me to reproduce the problem, the better. My personal use cases aren't dealing with this much data all at once, so this has been more of a back-burner fix for me. (In fact, telling me your use case would be helpful knowledge as well.) The easier you make it for me to take it off the back burner, the sooner fixes will come when I have personal time to spare for this part of the project. I'm also happy to take pull requests if you'd like to dig in yourself and spend time contributing to the project.

@TomasLinhart

@stephencelis Sure, I will prepare an example project. For now I have just switched to db.prepare, but in the future I might need more, so I could dive in and do some pull requests; right now I just want to finish my hobby project.

My use case is that I have a data set I want to keep in memory so I can manipulate it easily.

@TomasLinhart

I have created a project with a benchmark: https://github.com/TomasLinhart/SQLitePerformance. Just download it and run it and you will see the difference in performance.

@stephencelis (Owner)

Thanks! I'll take a look at this one—and @violabg's, above—when I have the chance.

@TomasLinhart

I have updated the benchmark with a C API example, which is 17x faster than db.prepare and 225x faster than the typed API.

@stephencelis (Owner)

@TomasLinhart db.prepare is fairly lightweight, but there is implicit type conversion happening there that isn't happening in your native example. Assuming your benchmarks are from a release build, we may hit the point where Swift becomes the bottleneck (and will hopefully be optimized automatically in the future). I'll check for optimizations that can be made there, as well, though.

@TomasLinhart

@stephencelis You are right. I checked your code for db.prepare and it looks good. I tried removing the generator logic, and then it is almost the same speed as my C API example. So I guess the biggest bottleneck in db.prepare is row.append, but it is not possible to do much about it...

@stephencelis (Owner)

@TomasLinhart Were you running things in debug mode? My results:

using typed API took 6.32480198144913 seconds
using db.prepare took 0.377619981765747 seconds
using C native API took 0.0522429943084717 seconds

If I change your iteration logic to use sqlite3_column_type and sqlite3_column_{type} methods to determine the types and yield arrays of rows, there is negligible overhead:

using db.prepare took 0.388479948043823 seconds
using C native API took 0.382124006748199 seconds
        var statement: COpaquePointer = nil  // assumed declaration; not shown in the original snippet
        var results = [[Any?]]()
        if (sqlite3_prepare_v2(db, query.cStringUsingEncoding(NSUTF8StringEncoding)!, -1, &statement, nil)
            == SQLITE_OK) {
                let columnCount = sqlite3_column_count(statement)
                while (sqlite3_step(statement) == SQLITE_ROW) {
                    var row = [Any?]()
                    for idx in 0..<columnCount {
                        switch sqlite3_column_type(statement, Int32(idx)) {
                        case SQLITE_BLOB:
                            let bytes = sqlite3_column_blob(statement, Int32(idx))
                            let length = sqlite3_column_bytes(statement, Int32(idx))
                            row.append(Blob(bytes: bytes, length: Int(length)))
                        case SQLITE_FLOAT:
                            row.append(Double(sqlite3_column_double(statement, Int32(idx))))
                        case SQLITE_INTEGER:
                            let int = Int(sqlite3_column_int64(statement, Int32(idx)))
                            var bool = false
                            if let type = String.fromCString(sqlite3_column_decltype(statement, Int32(idx))) {
                                bool = type.hasPrefix("BOOL")
                            }
                            row.append(bool ? int != 0 : int)
                        case SQLITE_NULL:
                            row.append(nil)
                        case SQLITE_TEXT:
                            row.append(String.fromCString(UnsafePointer<CChar>(sqlite3_column_text(statement, Int32(idx))))!)
                        case let type:
                            assertionFailure("unsupported column type: \(type)")
                        }
                    }
                    results.append(row)
                }
                sqlite3_finalize(statement)
        }

Assuming you add logic to your C API example that handles the underlying types at all, I imagine there is little to no performance hit using SQLite.swift.

Given that, the typed interface is still weighing in at about 21x less performant in this case, which is better than 225x, but slow enough to warrant investigation and cleanup when I can.

Meanwhile, you should be able to batch/stream the results into your interface to avoid such big delays and still be able to use the typed interface.
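A batching loop along those lines might be sketched as follows. This assumes the `query` value from the earlier snippets and a `limit(_:offset:)` call on the typed Query, which is an assumption about the library's API of the time; the page size is arbitrary.

```swift
// Stream the typed query in pages instead of iterating it all at once.
// `query` is assumed to be the joined/filtered Query built earlier.
let pageSize = 100
var offset = 0
while true {
    var rowsInPage = 0
    for row in query.limit(pageSize, offset: offset) {
        // hand each typed row to the UI / accumulate results here
        rowsInPage++
    }
    if rowsInPage < pageSize { break }  // short page: no more rows
    offset += pageSize
}
```

Each page is a separate SQL statement, so the UI can render incrementally while the full result set is still loading.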

@TomasLinhart

Yeah, I was; sorry about that. So db.prepare is fast enough. Check my last comment: the biggest overhead is the append, but there is not much we can do about that.

So the only problem is the typed interface. 😄

@stephencelis (Owner)

@TomasLinhart Ah, good to hear you're seeing the same results as me now :)

In the end I think the typed interface will become more performant as Swift improves its own performance, but I do believe in making optimizations in SQLite.swift wherever possible, so I'll try to dig in sometime soon!

@stephencelis (Owner)

@TomasLinhart Actually, the point about append is a good one. I've seen cases where things like reduce and map are slower, but map actually improves the performance here quite a bit:

using db.prepare took 0.273506999015808 seconds
using C native API took 0.397485971450806 seconds

I'll commit this optimization (and more) when I have more time to dig in. Thanks!
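The shape of that append-to-map change, independent of SQLite.swift's internals, is roughly the following. The column count and values are illustrative only.

```swift
// Illustration of the append-vs-map difference mentioned above:
// mapping over the column indices builds the row in a single pass,
// which avoids growing the array element by element.
let columnCount = 15

// Growing the array one element at a time:
var appendedRow = [Any?]()
for idx in 0..<columnCount {
    appendedRow.append(idx)  // repeated grow-and-copy
}

// Building it with the free map function (Swift 1.x style):
let mappedRow: [Any?] = map(0..<columnCount) { idx in idx }
```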

@stephencelis (Owner)

I also quickly cached columnNames in the QueryGenerator, and we're down to a 3.4x slowdown relative to the original db.prepare, and 5x relative to db.prepare with the new improvements.

using typed API took 1.3848859667778 seconds
using db.prepare (old) took 0.388479948043823 seconds
using db.prepare took 0.279254972934723 seconds

This may be the easiest win for now. I'm not sure if you want to dig in for any more optimizations.

@TomasLinhart

Hehe, I was also digging in and discovered the same problem 😄

@stephencelis (Owner)

One last improvement is an internal restructuring of how Row stores its data:

using typed API took 0.305689036846161 seconds
using db.prepare (new) took 0.307142019271851 seconds

Iteration should basically be identically performant, now.

OK! I'll push these fixes but then I really need to get back to work 💨

@TomasLinhart

Good job, I am glad the problem was discovered. Looking forward to your commit! 😃

@stephencelis (Owner)

Closed by:

@stephencelis (Owner)

Thanks for your help, @TomasLinhart!

@violabg (Author) commented Dec 18, 2014

Hi Stephen,
I don't know if you have had a chance to look at my sample code, but the last commit did not improve things much for me.
Also, when I try this:

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table

let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid])
.join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId])
.filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

it breaks saying :
fatal error: no such table: "pdf": file /workspace/apple/ios8/SQLitePerformance-master/SQLite.swift/SQLite Common/Database.swift, line 335

which didn't happen before.

@stephencelis (Owner)

@violabg That sounds like an unrelated issue. What does pdfDb!.pdfsTb.table look like? If you run pdfDb!.run("select count(*) from pdfs") (or whatever your PDFs table name is) you should get the same error. It sounds like the table doesn't exist yet.

@violabg (Author) commented Dec 18, 2014

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table
println(PDFs.count) //prints 467
println(categories.count) //prints 13
let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        var categoriesDictionary = [Int: PdfCategory]()
        var currentCategory:PdfCategory

        for row in query {

here I get the error, when it tries to enter the for-in loop:
fatal error: no such table: "pdf": file /workspace/apple/ios8/SQLitePerformance-master/SQLite.swift/SQLite Common/Database.swift, line 335

which is the table:
let PDFs = pdfDb!.pdfsTb.table

but it works here:
println(PDFs.count) //prints 467
println(categories.count) //prints 13

@stephencelis (Owner)

Very strange. The error is coming from SQLite, so the query must be getting malformed somehow. Can you email me the latest version of your code so that I can better troubleshoot?

@kfmfe04 commented Dec 19, 2014

Aside: just wanted to throw in another sample from my personal project.

+65% longer for a flat-out table scan when using typed versus raw (8.54s vs 5.16s).

FWIW, I have been using flat-out binary block dumps for large arrays (10k) of structs/fixed data for speed (like a column-based DBMS), but I am migrating to SQLite for ease of maintenance and flexibility.

        /// typed: db_read -8.54326200485229
        let tbl = GGGTable.CardAnnotation
        for r in tbl {
            if var s = notes[ r[e_cid] ]
            {
                s.unixtime = r[e_show_me_on]
            } else {
                notes[ r[e_cid] ] = CardAnnotation( unixtime: r[e_show_me_on] )
            }
        }

        /// raw: db_read -5.16345697641373
        let db = GGGDatabase.db
        let stmt = db.prepare( "SELECT * FROM cardannotation" )
        for r in stmt {
            let idx = r[0] as Int
            let ut  = r[1] as Int
            if var s = notes[ idx ]
            {
                s.unixtime = ut
            } else {
                notes[ idx ] = CardAnnotation( unixtime: ut )
            }
        }

@stephencelis (Owner)

@kfmfe04 And this is in a release build, not debug?

Regardless, the difference isn't too bad, considering. Certain optimizations are going to come with time as Swift improves. In the meantime there will be a tradeoff in speed vs. safety and code clarity.

SQLite.swift's type-safe interface should be pretty speedy when you use SQLite itself for querying and scoping the data you're working with, but if you're loading a large dataset into memory, you're probably going to want to optimize as you did, dipping down into the raw API with fewer generics.

I'm hoping that Swift's performance improves over time and the gap you're seeing narrows quite a bit.

I'm also open to pull requests that continue to improve performance with the type-safe interface, so feel free to dig in if you have ideas! I'll also keep performance in mind as I work with things.

@stephencelis (Owner)

@violabg Looks like the issue you've stumbled upon is a table quoting bug. I'll try to fix it soon!

@violabg (Author) commented Dec 19, 2014

thanks Stephen,
once you fix it, could you test the performance of my code and see if it matches your results?

@stephencelis (Owner)

@violabg Fix pushed here: 1f070f7

I'm not sure how to check the performance difference, though. Please let me know how it's improved!

@violabg (Author) commented Dec 20, 2014

Thanks Stephen, the problem is fixed now, and these are my test results:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()
        categoriesForLanguage.laguangeId = language.id

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table

        self.start = NSDate()
        let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query.count \(query.count)")

        var categoriesDictionary = [Int: PdfCategory]()
        var currentCategory:PdfCategory

        self.start = NSDate()
        for row in query {
//            
        }
        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for \(timeInterval2)")

        return categoriesForLanguage
    }

on mac:
query -0.00147002935409546
query.count 87
for -0.021390974521637
timeInterval -0.0214470028877258

on ipad 4:
query -0.0244329571723938
query.count 87
for -0.350589990615845
timeInterval -0.350932002067566

Iteration over 87 records feels a little slow; I ported this project from Objective-C, and that version is much faster.
Probably, as you said, Swift is the bottleneck.

@kfmfe04 commented Dec 28, 2014

That was a debug build, OK. Now I have a more pressing issue.

I do a table scan, something like:

    let sentences = db["sentences"]
    let id      = Expression<Int>("id")
    let entry   = Expression<String>("entry")
    let address = Expression<String>("address")
    let tokens  = Expression<String>("tokens")
    for r in sentences
    {
        // ....
    }

For about 8,000 records, this takes 3 minutes on an iPod touch. Parsing the equivalent pipe-delimited ASCII file takes 25 seconds. Is this within the range of expectations, or should a table scan be faster?

Maybe I should try passing in a "SELECT" statement instead and see how fast it is?

@stephencelis (Owner)

@kfmfe04 What are you doing with the data as you iterate? If you're doing empty iteration, it's not a fair comparison because SQLite.swift has type-handling code built in. If you're working with such a large dataset all at once, you may need to optimize by ditching the type-safe interface for something lower-level. Or better yet, limit your queries to only the data you need, and use aggregate functions for calculations across the dataset.
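As a sketch of that advice, using invented table and column names in the period's SQLite.swift style (only the `db["sentences"]` subscript and `Expression` types come from the snippet above; the filter, limit, and count calls are assumptions about the library's API):

```swift
// Scope work to SQLite instead of scanning every row in Swift.
let sentences = db["sentences"]
let id     = Expression<Int>("id")
let tokens = Expression<String>("tokens")

// Aggregate in SQL: one statement, no 8,000-row iteration.
let total = sentences.count          // e.g. SELECT count(*) FROM "sentences"

// Select only the columns and rows actually needed.
for row in sentences.select(id, tokens).filter(id < 100).limit(50) {
    // work with at most 50 narrow rows
}
```

The point is to let SQLite's query planner do the filtering and counting, so the typed row machinery only ever touches the rows the app truly needs.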

@androidcn

So slow... iterating about 2,000 rows.

@stephencelis (Owner)

@androidcn Please try to constructively contribute to the conversation and read this thread thoroughly (check my questions to others throughout, including whether you've compiled for "Release" or not, and what device you're running on).

In short:

  • The basic iteration API should be as fast as interfacing with libsqlite3 directly, while the typed API is as fast as adding type-checking logic for each row. You're likely hitting a Swift or SQLite bottleneck.
  • In the end, iterating over 2,000 rows seems a bit heavy-handed when SQLite provides aggregation APIs and when you can scope/batch your queries.

@androidcn

@stephencelis But I want to use the typed API.

@stephencelis (Owner)

@androidcn The typed API is as fast as manually-typed, so please do. It will improve with Swift. Scope your queries for performance.

@stephencelis (Owner)

An update to those listening: Swift 1.2 performs 2x+ faster for me. If you end up re-running your benchmarks (off the swift-1-2 branch), please share your experiences.

@androidcn

I will check it out later.

@violabg (Author) commented Feb 16, 2015

I have tested my code with the swift-1-2 branch.
I get 1.3 sec.;
with Swift 1.1 I get 6.9 sec.

Using raw SQL I get 0.28 sec. with Swift 1.2,
and 0.98 sec. with Swift 1.1.

@stephencelis (Owner)

Thanks @violabg! Hopefully things will continue to improve steadily.

@TomasLinhart

A little off-topic: when will you merge the Swift 1.2 branch into master, @stephencelis?

@stephencelis (Owner)

@TomasLinhart per #62:

This PR is open to track changes against the Swift 1.2 beta. It won't be merged till Swift 1.2 goes GM, but will be continuously rebased onto master.

@sehgalsaheel

Is the limit set to 1,000? Or is there a way to do more than 1,000 rows (8,000, to be specific)?

@stephencelis (Owner)

@sehgalsaheel Have you tried it? Have you had problems? You should have the same ability, limit-wise, as you'd have with raw sqlite3. If the device can't handle 8,000, you should be batching.
