performance issue iterating over a query #23
There are likely some low-hanging performance improvements to be made right now, especially around row iteration. I've just been optimizing the interface first. Can we isolate a couple of factors, though?
If you have runnable code you can share, please email it; that should make performance testing a bit easier.
I have changed the code to trace raw iteration, and it is much faster:

func getPdfWithCategory(language: Language) -> CategoriesForLanguage {
    let categoriesForLanguage = CategoriesForLanguage()
    pdfDb!.db!.trace(println)
    let PDFs = pdfDb!.pdfsTb.table
    let categories = pdfDb!.categoriesTb.table
    start = NSDate()
    let query = "SELECT p.*, c.id AS categoryid, c.name AS categoryname, c.thumb AS categorythumb, c.orderid AS categorysortid FROM pdf p INNER JOIN categories c ON c.id = p.categoryid WHERE p.languageId = '\(language.id)'"
    let timeInterval: NSTimeInterval = start!.timeIntervalSinceNow
    println("query \(timeInterval)")
    start = NSDate()
    let stmt = pdfDb!.db!.prepare(query)
    for row in stmt {
    }
    let timeInterval2: NSTimeInterval = start!.timeIntervalSinceNow
    println("for row in query \(timeInterval2)")
    return categoriesForLanguage
}

query -0.00111699104309082

compared with the original code output:

query -0.00927197933197021

Regarding sharing my code, I need to create a new project without all the sensitive data and will email it to you.
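The ad-hoc `start = NSDate()` / `timeIntervalSinceNow` pairs repeated in these benchmarks can be factored into a small helper. This is only a sketch of that idea; the `measure` name and signature are mine, not part of SQLite.swift, and it is written in current Swift (`Date`, `print`) rather than the `NSDate`/`println` of the era:

```swift
import Foundation

// Hypothetical timing helper: runs a block, prints the elapsed time,
// and returns the elapsed seconds so callers can compare runs.
func measure(_ label: String, _ block: () -> Void) -> TimeInterval {
    let start = Date()
    block()
    let elapsed = Date().timeIntervalSince(start)
    print("\(label): \(elapsed)s")
    return elapsed
}

// Usage sketch, mirroring the benchmark above:
// let t = measure("for row in query") {
//     for _ in stmt { }
// }
```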
Great, thanks! I have a few ideas already, but it would be helpful to have a baseline to work with.
I can also confirm it is really slow: getting 1,000 rows from my database takes over 5 seconds. When I use db.prepare, the performance is much better.
@TomasLinhart Can you give actual benchmarks? I'm under the impression that the difference is linear and about 4x slower at the moment, but any specifics to the contrary (with examples) would be helpful.
@stephencelis Sure, I can provide you with something. But it is not so hard to test: just create a database with over 1,000 rows and at least 15 columns and try to load everything.
@TomasLinhart Sure, but the less time it takes me to get a reproducible problem, the better. My personal use cases don't deal with this much data all at once, so it's been more of a back-burner fix for me. (In fact, if you tell me your use case, that would be helpful knowledge as well.) The easier you make it for me to take it off the back burner, the more likely fixes will come sooner when I have personal time to spare for this part of the project. I'm also happy to take pull requests if you'd like to dig in yourself and spend time contributing to the project.
@stephencelis Sure, I will prepare an example project. For now I've just switched to db.prepare, but in the future I might need more, so I could dive in and do some pull requests; at the moment I just want to finish my hobby project. My use case is that I have a data set I want to keep in memory so I can manipulate it easily.
I have created a project with a benchmark: https://github.com/TomasLinhart/SQLitePerformance. Just download and run it and you will see the difference in performance.
Thanks! I'll take a look at this one, and @violabg's above, when I have the chance.
I have updated the benchmark with a C API example, which is 17x faster than db.prepare and 225x faster than the typed API.
@TomasLinhart
@stephencelis You are right; I checked your code for db.prepare and it looks good. I tried removing the generator logic, and then it is almost the same speed as my C API example. So I guess the biggest bottleneck in db.prepare is row.append, but it is not possible to do much about it...
@TomasLinhart Were you running things in debug mode? My results:
If I change your iteration logic to use
var results = [[Any?]]()
if (sqlite3_prepare_v2(db, query.cStringUsingEncoding(NSUTF8StringEncoding)!, -1, &statement, nil)
== SQLITE_OK) {
let columnCount = sqlite3_column_count(statement)
while (sqlite3_step(statement) == SQLITE_ROW) {
var row = [Any?]()
for idx in 0..<columnCount {
switch sqlite3_column_type(statement, Int32(idx)) {
case SQLITE_BLOB:
let bytes = sqlite3_column_blob(statement, Int32(idx))
let length = sqlite3_column_bytes(statement, Int32(idx))
row.append(Blob(bytes: bytes, length: Int(length)))
case SQLITE_FLOAT:
row.append(Double(sqlite3_column_double(statement, Int32(idx))))
case SQLITE_INTEGER:
let int = Int(sqlite3_column_int64(statement, Int32(idx)))
var bool = false
if let type = String.fromCString(sqlite3_column_decltype(statement, Int32(idx))) {
bool = type.hasPrefix("BOOL")
}
row.append(bool ? int != 0 : int)
case SQLITE_NULL:
row.append(nil)
case SQLITE_TEXT:
row.append(String.fromCString(UnsafePointer<CChar>(sqlite3_column_text(statement, Int32(idx))))!)
case let type:
assertionFailure("unsupported column type: \(type)")
}
}
results.append(row)
}
sqlite3_finalize(statement)
}

Assuming you add logic to your C API example that handles the underlying types at all, I imagine there is little to no performance hit from using SQLite.swift. Given that, the typed interface is still weighing in at about 21x less performant in this case, which is better than 225x, but slow enough to warrant investigation and cleanup when I can. Meanwhile, you should be able to batch/stream the results into your interface to avoid such big delays and still be able to use the typed interface.
Yeah, I was, sorry about that. So db.prepare is fast enough (check my last comment); the biggest overhead is the append, but there is not much we can do about that. So the only problem is the typed interface. 😄
@TomasLinhart Ah, good to hear you're seeing the same results as me now :) In the end I think the typed interface will become more performant as Swift improves its own performance, but I do believe in making optimizations in SQLite.swift wherever possible, so I'll try to dig in sometime soon!
@TomasLinhart Actually, the point about
I'll commit this optimization (and more) when I have more time to dig in. Thanks!
Also quickly cached
This may be the easiest win for now. I'm not sure if you want to dig in for any more optimizations.
Hehe, I was also digging down and discovered the same problem 😄
One last improvement is an internal restructuring of how
Iteration should basically be identically performant now. OK! I'll push these fixes, but then I really need to get back to work 💨
Good job, I am glad the problem was discovered. Looking forward to your commit! 😃 |
Thanks for your help, @TomasLinhart! |
Hi Stephen,

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table
let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid])
.join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId])
    .filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

it breaks saying:

with

It didn't happen before.
@violabg That sounds like an unrelated issue. What does
let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table
println(PDFs.count) //prints 467
println(categories.count) //prints 13
let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)
var categoriesDictionary = [Int: PdfCategory]()
var currentCategory:PdfCategory
for row in query {

Here I get the error, when it tries to enter the for-in loop, which is

table:

but it works here.
Very strange. The error is coming from SQLite, so the query must be getting malformed somehow. Can you email me the latest version of your code so that I can better troubleshoot?
Aside: just wanted to throw in another sample from my personal project: +65% longer for a flat-out table scan when using typed versus raw (8.54s vs 5.16s). FWIW, I have been using flat-out binary block dumps for large arrays (10k) of structs/fixed data for speed (like a column-based DBMS), but I am migrating to SQLite for ease of maintenance and flexibility.

/// typed: db_read -8.54326200485229
let tbl = GGGTable.CardAnnotation
for r in tbl {
if var s = notes[ r[e_cid] ]
{
s.unixtime = r[e_show_me_on]
} else {
notes[ r[e_cid] ] = CardAnnotation( unixtime: r[e_show_me_on] )
}
}
/// raw: db_read -5.16345697641373
let db = GGGDatabase.db
let stmt = db.prepare( "SELECT * FROM cardannotation" )
for r in stmt {
let idx = r[0] as Int
let ut = r[1] as Int
if var s = notes[ idx ]
{
s.unixtime = ut
} else {
notes[ idx ] = CardAnnotation( unixtime: ut )
}
}
@kfmfe04 And this is in a release build, not debug? Regardless, the difference isn't too bad, considering. Certain optimizations are going to come with time as Swift improves. In the meantime there will be a tradeoff in speed vs. safety and code clarity. SQLite.swift's type-safe interface should be pretty speedy when you use SQLite itself for querying and scoping the data you're working with, but if you're loading a large dataset into memory, you're probably going to want to optimize as you did, dipping down into the raw API with fewer generics. I'm hoping that Swift's performance improves over time and the gap you're seeing narrows quite a bit. I'm also open to pull requests that continue to improve performance with the type-safe interface, so feel free to dig in if you have ideas! I'll also keep performance in mind as I work with things.
@violabg Looks like the issue you've stumbled upon is a table quoting bug. I'll try to fix it soon! |
Thanks Stephen,
Thanks Stephen, the problem is fixed now and these are my test results:

func getPdfWithCategory(language: Language) -> CategoriesForLanguage {
let categoriesForLanguage = CategoriesForLanguage()
categoriesForLanguage.laguangeId = language.id
let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table
self.start = NSDate()
let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)
let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
println("query \(timeInterval)")
println("query.count \(query.count)")
var categoriesDictionary = [Int: PdfCategory]()
var currentCategory:PdfCategory
self.start = NSDate()
for row in query {
//
}
let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
println("for \(timeInterval2)")
return categoriesForLanguage
}

on mac:

on ipad 4:

Iteration over 87 records feels a little slow; I have ported this project from Objective-C, and there it is much faster.
That was a debug build - ok. I have a more pressing issue: I do a table scan, something like
for about 8,000 records, and this takes 3 minutes on an iPod touch. Parsing the equivalent pipe-delimited ASCII file takes 25 seconds. Is this within the range of expectations, or should a table scan be faster? Maybe I should try passing in a "SELECT" statement instead and see how fast it is?
@kfmfe04 What are you doing with the data as you iterate? If you're doing empty iteration, it's not a fair comparison because SQLite.swift has type-handling code built in. If you're working with such a large dataset all at once, you may need to optimize by ditching the type-safe interface for something lower-level. Or better yet, limit your queries to only the data you need, and use aggregate functions for calculations across the dataset.
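To make the aggregate-functions suggestion concrete, here is a hedged sketch. The table and column names are borrowed from the snippets earlier in this thread, and the commented `db.prepare` calls assume a handle like the ones used above; the point is that SQLite does the summarizing and only one row crosses into Swift, instead of 8,000 typed rows:

```swift
// Instead of materializing every row just to compute a summary:
//   for r in db.prepare(scanAll) { ... }   // thousands of rows
// push the work into SQLite with an aggregate query:
//   for row in db.prepare(aggregate) { ... }  // a single result row
let scanAll = "SELECT * FROM cardannotation"
let aggregate = "SELECT COUNT(*), MIN(unixtime), MAX(unixtime) FROM cardannotation"
```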
so slow .... |
@androidcn Please try to constructively contribute to the conversation and read this thread thoroughly (check my questions to others throughout, including whether you've compiled for "Release" or not, and what device you're running on). In short:
@stephencelis But I want to use the typed API.
@androidcn The typed API is as fast as manually-typed, so please do. It will improve with Swift. Scope your queries for performance.
An update to those listening: Swift 1.2 performs 2x+ faster for me. If you end up re-running your benchmarks (off the |
I will check it out later.
I have tested my code with the Swift 1.2 branch. Using raw SQL I get 0.28 sec with Swift 1.2.
Thanks @violabg! Hopefully things will continue to improve steadily.
A little bit OT: when will you merge the Swift 1.2 branch into master, @stephencelis?
@TomasLinhart per #62:
Is the limit set to 1000? Or is there a way to do more than 1000 (8,000, to be specific)?
@sehgalsaheel Have you tried it? Have you had problems? You should have the same ability, limit-wise, as you'd have with raw sqlite3. If the device can't handle 8,000, you should be batching.
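A sketch of what such batching could look like, under stated assumptions: `processInBatches` is a name I've made up, and `fetchBatch` stands in for whatever actually runs a `SELECT ... LIMIT ... OFFSET ...` (e.g. a `db.prepare` call as used elsewhere in this thread). It is abstracted as a closure here so the paging logic itself is visible and self-contained:

```swift
// Hypothetical paging driver: repeatedly asks for `batchSize` rows at
// increasing offsets until the source returns a short (final) page,
// so the full table is never materialized at once.
func processInBatches(batchSize: Int,
                      fetchBatch: (_ limit: Int, _ offset: Int) -> [[Any]],
                      handleRow: ([Any]) -> Void) {
    var offset = 0
    while true {
        // e.g. db.prepare("SELECT * FROM t LIMIT \(limit) OFFSET \(offset)")
        let rows = fetchBatch(batchSize, offset)
        for row in rows { handleRow(row) }
        if rows.count < batchSize { break }  // short page: we're done
        offset += batchSize
    }
}
```

With a batch size of 500-1000, each pass stays small enough for a constrained device while still letting you use the typed interface per batch.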
I'm having a big performance issue iterating over a query; this is my code:
and these are the print results:
query -0.0109010338783264
query count = 88
for row in query -2.39910697937012
I have even removed the join to see if it was that:
it improved a little bit, but still too long:
query -0.00782698392868042
query count = 88
for row in query -1.40383696556091
P.S. I'm testing on an iPad