How to speed up virtual tables? #465
Using the C API I know I can insert ~700k rows incredibly fast (a few seconds). |
Hmm -- never mind, that actually runs pretty fast -- I can't figure out why my generator is so slow. |
I know it's from https://github.com/fzakaria/sqlelf/blob/main/sqlelf/elf/instruction.py
I just did a silly log statement and can see it builds the instruction list pretty fast; then I am just 🕐 waiting for the query to finish.

```
❯ sqlelf /usr/bin/python3 --sql "select COUNT(*) from elf_instructions where section = '.text'"
Starting
Ending: 7
Starting
Ending: 7
Starting
Ending: 1485
Starting
Ending: 693445
```
|
Here is the output of cProfile with snakeviz with the command:
I'll close this issue pretty soon -- unless you have any feedback; sorry for the noise here. |
The visualization does show the time is spent in "your" code. Virtual tables will be slower in the big picture though, because they do more work, and APSW/Python makes them slower still because the GIL has to be acquired and released and Python evaluation is slower than C. My advice: if you are going to have a large amount of data and you are going to return all of it via virtual tables, you'll find it quicker to load the data into a SQLite temporary table and just use that - SQLite's btrees and C code will be the most performant. Virtual tables are great when they can avoid returning information, hence saving work. |
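A minimal sketch of that suggestion, assuming the rows are already gathered as tuples (the table and column names here are illustrative, not taken from sqlelf):

```python
import apsw

connection = apsw.Connection(":memory:")
cursor = connection.cursor()

# rows would come from your generator; these values are placeholders
rows = [("mov", 0x1000, "eax, ebx", ".text")] * 700_000

cursor.execute(
    "CREATE TEMP TABLE instructions (mnemonic, address, op_str, section)"
)
# one prepared statement, re-bound for each row in C
cursor.executemany("INSERT INTO instructions VALUES (?, ?, ?, ?)", rows)

# subsequent analysis runs against SQLite's btrees, not Python
for (count,) in cursor.execute(
    "SELECT COUNT(*) FROM instructions WHERE section = ?", (".text",)
):
    print(count)
```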
Thank you for taking the time to answer. Anyways -- that's on me to investigate; thanks again :) |
I've attached a profile with py-spy if anyone is interested. A lot of time is just spent in "Column" for apsw. |
It doesn't look like there is anything I could do in APSW to make it faster. SQLite only asks for a single value at a time, so 10,000 rows of 10 columns means 100,000 calls to Column. The APSW Python code you are using could be made slightly faster by eliminating an if statement and replacing it with specialised methods, but I don't think it will make a meaningful difference. It does look like the code you execute in response to each Column call is where the time goes. I did realise that the code that does conversion between Python and SQLite C types isn't covered in the speedtest, so it should be added: #466 |
I even batch all the work ahead of time, but the generator itself spends a lot of time just returning the results. In any case it's "usable" as is right now with a ~5-10 second wait time. It's been a good exercise to explore the space, which is what Python is perfect for. |
I did my own quick virtual table benchmark with a million rows of 10 columns. Exercising it was:
Something else you are likely affected by is that you have zillions of duplicate values. It is something I've experienced before when doing analytics processing - e.g. every string is separately allocated even when the values repeat. Python does intern single letter strings and the small integers (-5 through 256). What I've done before is make a dict to store all values and get a singleton for them, something like:

```python
_uniques = {}

def unique_value(v):
    try:
        return _uniques[v]
    except KeyError:
        _uniques[v] = v
        return v
```

I did look at the Capstone source to see if they call the Python intern or the C level intern but didn't find either. Perhaps tweaks like the above will delay the rewrites by a few months :) |
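For instance, reusing unique_value on the Capstone fields discussed later in the thread (a hypothetical snippet - instruction is assumed to be a Capstone CsInsn):

```python
def dedup_row(instruction):
    """Build a row tuple with the repetitive string fields de-duplicated."""
    return (
        unique_value(instruction.mnemonic),
        instruction.address,  # addresses are mostly unique; interning buys nothing
        unique_value(instruction.op_str),
    )
```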
Looks like I do need to optimise the Column method at least. Its body is an if/elif chain:

```python
if self.access == VTColumnAccess.By_Index:
    v = self.current_row[which]
elif self.access == VTColumnAccess.By_Name:
    v = self.current_row[self.columns[which]]
elif self.access == VTColumnAccess.By_Attr:
    v = getattr(self.current_row, self.columns[which])
```

I reordered the if statements to be name, attr, index and then reran my benchmark:
That is a ~20% difference from index access being first versus third in the if chain.
Another ~2% difference. So, ballpark, I could make things ~25% faster by using specialised versions of the methods. Re-opening to do just that. |
@rogerbinns cool insights! Should I be passing unique sentinels back to apsw? |
I've attached a demo counter.zip that shows the approach in use. A list of 100,000 identical dictionaries is created three ways: the first just duplicates the same dictionary 100,000 times, the second creates each dictionary on the fly, and the third runs each key and value through unique_value. As expected the first results in 4 distinct objects, the second in 100,003 distinct objects, and the third in 4. This approach is useful if you end up with lots of items in memory simultaneously that have the same value but were separately allocated. I suspect that isn't something you actually want to do. You should probably also be using tuples for your data instead of dictionaries, because dictionaries are a more complicated, higher-overhead data structure, and you aren't actually using their extra functionality. |
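A minimal sketch of the kind of counting that demo does (counter.zip itself isn't reproduced here; this version counts distinct key/value objects by identity, so the exact numbers differ from the demo's):

```python
_uniques = {}

def unique_value(v):
    try:
        return _uniques[v]
    except KeyError:
        _uniques[v] = v
        return v

def distinct_value_objects(dicts):
    """Count distinct key/value objects (by identity) across all dicts."""
    seen = set()
    for d in dicts:
        for k, v in d.items():
            seen.add(id(k))
            seen.add(id(v))
    return len(seen)

template = {"section": ".text"}

# 1: the same dict referenced 100,000 times -> same key/value objects
shared = [template] * 100_000

# 2: strings built at runtime, so CPython cannot reuse compile-time constants
fresh = [
    {"".join(["sec", "tion"]): "".join([".te", "xt"])}
    for _ in range(100_000)
]

# 3: the same data, with keys and values funnelled through unique_value
deduped = [{unique_value(k): unique_value(v) for k, v in d.items()} for d in fresh]

print(distinct_value_objects(shared))   # 2
print(distinct_value_objects(fresh))    # ~200,000
print(distinct_value_objects(deduped))  # 2
```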
The difference between (1) and (2) is subtle and I'm surprised they give a different # of objects:
In my example I have:
mnemonic, address, and op_str were all allocated by Capstone and likely have a low cardinality. |
For (1) it is the same objects in the Python bytecode, so they just get an additional reference. For (2) it is constructing new objects (that is why I deliberately made them an expression, not a constant). For strings Python will look in the interned strings, and the numbers -5 through 256 are preallocated. Since the expression results are neither, you get new objects even though they have the same values. Under the hood in the Python C API everything is a PyObject pointer; everything is done by reference, not by value. Unless you are keeping all the objects around at the same time, the duplicates don't really cost you anything. |
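A small illustration of that distinction (the results are CPython implementation details, so treat them as indicative rather than guaranteed):

```python
a = "section"
b = "sec" + "tion"            # folded to one constant at compile time
c = "".join(["sec", "tion"])  # constructed at runtime

print(a is b)  # True: both refer to the same interned constant
print(a is c)  # False: equal value, freshly allocated object

x = 100
y = int("100")
print(x is y)  # True: small ints (-5..256) are preallocated

m = 1000
n = int("1000")
print(m is n)  # False: outside the cached range
```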
Well, I am passing all these objects back to SQLite, which does store them in memory to do the SQL analysis. I'm working on a new object file format for Linux as well; I would love to chat about that with you sometime. |
SQLite has a significantly more compact storage format than Python, although it doesn't do deduplication. (You can do that manually, similar to normalization, but I doubt it is worth it.) Best wishes on the potential future coder :) I'd be delighted to discuss object file formats - contact details can be found via my GitHub profile. |
This results in 5-10% less CPU consumption. The entrance method already ensures that a VTColumnAccess value is passed in. Refs #465
This avoids the if/elif chains. The result is about twice the throughput of the naive version. Refs #465
The Python speedups above result in a doubling of throughput - the benchmark went from ~950k Column calls per second to ~1.95 million. That was due to dropping if statements and overwriting the Column method at runtime with a specialised version. By_Index is still the fastest, but By_Name and By_Attr aren't that much worse because they are no longer lower down in a nested if chain. I've now done profiling on the C code (using callgrind) and there isn't anything more that can be done there. It does turn out that SQLite still inserts each row into a database page, so the flow is: Python values get converted to SQLite/C types, they get inserted into a page, then the row is read back out of the page, requiring each SQLite/C value to be converted back to Python and put into a tuple. It would still be quicker for this much data to use executemany and insert it into a temp table, especially if you are going to do operations on it at the SQL level (e.g. a count of instructions). |
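A minimal sketch of that runtime specialisation - not the actual APSW implementation, and the class and access names are illustrative:

```python
class Cursor:
    """Toy cursor demonstrating the 'specialise the hot method once' trick."""

    def __init__(self, access, columns):
        self.columns = columns
        self.current_row = None
        # Choose the accessor once at construction time, so the hot path
        # has no if/elif chain on every Column call.
        self.Column = {
            "by_index": self._column_by_index,
            "by_name": self._column_by_name,
            "by_attr": self._column_by_attr,
        }[access]

    def _column_by_index(self, which):
        return self.current_row[which]

    def _column_by_name(self, which):
        return self.current_row[self.columns[which]]

    def _column_by_attr(self, which):
        return getattr(self.current_row, self.columns[which])


cur = Cursor("by_index", columns=["mnemonic", "address", "op_str"])
cur.current_row = ("mov", 0x1000, "eax, ebx")
print(cur.Column(0))  # mov
```

The instance attribute shadows the class-level method, so each call costs one attribute lookup plus the direct accessor rather than a chain of comparisons.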
Wow, I should bump apsw in my package now -- will I see a 100x speedup for this part? |
The updated code isn't in a release yet, and it will likely be a few weeks or more before I do another release, unless this is urgent :) In the short term you can copy the updated ext.py into wherever you have apsw installed and get all the benefits. If you change none of your code, you'll get up to a doubling of throughput - Amdahl's law applies. |
The release incorporating this is now out. Good news is that another ~25% improvement is coming in #477 |
Sweet! I have a TODO to revisit this and bump my revision post release. |
I just upgraded and I am seeing solid speedups. It used to be in the 5-7s range.
I also tried the shell - I noticed the output table has a lot of fancy terminal coloring now. |
Is there an easy way to have virtual tables store their data in a temporary table?

```sql
SELECT *
FROM ELF_SYMBOLS caller, ELF_SYMBOLS callee
WHERE
    caller.name = callee.name AND
    caller.path != callee.path
```
|
The terminal colours have always been there, based on the type of the value. The line drawing was added in 3.42.0.0 (May 2023). You have to do the temporary table stuff yourself. #477 being completed should take out about another second from your runtime above; you can try it out now by checking out the vectorcall branch of APSW git. For your query above you should prepend it with a statement that materialises the virtual table into a temp table first. |
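A sketch of that temp-table prelude under the same table and column names as the query above (the temp table name is illustrative):

```sql
-- Materialise the virtual table once, then query the copy.
CREATE TEMP TABLE symbols AS SELECT * FROM ELF_SYMBOLS;

-- An index on the join column lets SQLite avoid scanning the inner
-- table once per outer row.
CREATE INDEX symbols_name ON symbols(name);

SELECT *
FROM symbols caller, symbols callee
WHERE
    caller.name = callee.name AND
    caller.path != callee.path;
```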
Another idea was creating a new database. Edit: or, for apsw, it could automatically create the TEMP table wrapper for each generator? |
Making a temp table is as simple as `CREATE TEMP TABLE name AS SELECT * FROM virtual_table`. |
Wow, I'm surprised how fast slurping it into a TEMP table is...
They execute so fast now :) For one-off executions you pay the cost of re-inserting everything into the temp table first. But if you start the shell and run multiple queries, you amortise that cost. |
Did you notice:
That shows you really should be looking at the query plan, and you will get more performance improvements through the query structure and creating indices. If you don't change SQLite defaults, then PRAGMA cache_size should be used - the default is 2MB. |
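For instance - a hypothetical session reusing the symbols temp table from the sketch above, with an example cache value:

```sql
-- Ask SQLite how it intends to execute the join; a SCAN of the inner
-- table suggests a missing index on the join column.
EXPLAIN QUERY PLAN
SELECT *
FROM symbols caller, symbols callee
WHERE caller.name = callee.name AND caller.path != callee.path;

-- Raise the page cache from the ~2MB default
-- (a negative value is in KiB, so this is ~64MB).
PRAGMA cache_size = -65536;
```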
Hi Roger!
I'm not sure if this is a function of SQLite or the C wrapper, but I'm finding the virtual table wrapper very slow.
I'm trying to run it over a table that may have ~700k rows and it's taking a very, very long time.
I've boiled down the sample to:
I'm not very proficient at profiling mixed C extensions and Python -- cProfile doesn't really have anything to help here.