DBManager PostgreSQL backend using core APIs instead of psycopg2 #33225
@strk please keep in mind that the roadmap is to eventually get rid of all provider specific connectors implemented in python and use the So, just to remind you that the connection (low level DB connection) status should be IMHO handled in the provider and not in this plugin and not in QGIS core either, what I think could be an acceptable approach is for the @nyalldawson any opinion on this? |
@elpaso agreed! |
You mean we should stop using PsycoPg as a whole ? |
Yes. This topic has been discussed a lot, and the ultimate goal is to port DB manager to C++ (or better, drop it altogether and replace it with the C++ implementation we are already using in the QGIS browser panel). Of course this won't happen overnight, not until we reach feature parity, but at least let's try to take steps in the right direction. The fact that we (you included) are spending a large share of our time (and the donor's money) fixing and patching a constantly broken DB manager is living proof that the decision is right: a great part of the functionality is now available, well tested and well maintained in C++. |
Ok, thanks for the heads up. @elpaso it looks like you did some work in that direction for the gpkg dbplugin, so is it a good idea to try following that path? One thing I see is that the "model" of the DBManager DBPlugin has a concept of a "cursor", which is what you call .execute() (and .close) on. This concept is not present in the abstract connection provider, which has just a plain single-run executeSQL (no transaction?). |
Yes, that was just an initial migration to the new API in order to test its concepts but you can certainly start from there.
Yeah, transactions are missing, we briefly thought about that and we decided it wasn't a priority for the class scope. Do you see a solid use case for that? |
I don't yet see a solid use case, other than migrating incrementally (users might already be using that cursor concept). It is actually not just transactions that are missing, but also partial fetches from a cursor (_fetchone). Maybe those could be implemented by subclasses creating an actual cursor in the database... |
fetchone is just a normal execSql where you just pick the first entry from the first row, no need to implement that. |
It still needs to be implemented in DBConnector subclasses, unless the DBConnector base class as a whole implements that (and drops the "cursor" concept). |
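To make the point about partial fetches concrete, here is a minimal sketch of a cursor-like wrapper over a result set that has already been fetched in full. It assumes the single-run query call hands back a plain list of rows; `RowCursor` and its methods are hypothetical names chosen for illustration, not code from this PR or from DBManager.

```python
class RowCursor:
    """Hypothetical cursor-like wrapper around an already-fetched list of rows."""

    def __init__(self, rows):
        # Assumption: `rows` is the complete result of a single-run query,
        # for example a list of lists.
        self._rows = list(rows)
        self._pos = 0

    def fetchone(self):
        # Return the next unread row, or None when the result is exhausted.
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def fetchmany(self, size=1):
        # Return up to `size` unread rows (possibly an empty list).
        chunk = self._rows[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def fetchall(self):
        # Return every row that has not been read yet.
        chunk = self._rows[self._pos:]
        self._pos = len(self._rows)
        return chunk

    def close(self):
        # Drop the buffered rows; later fetches behave as on an empty result.
        self._rows = []
        self._pos = 0


# Stand-in data in place of a real query result.
cursor = RowCursor([[1, "a"], [2, "b"], [3, "c"]])
first = cursor.fetchone()
rest = cursor.fetchall()
```

Note that this only emulates the cursor interface: all rows are still held in memory, so it does not give the chunked database access that a real server-side cursor would.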
I hate python. The ability to use any member/method leads to such a mess!
The PostgreSQL specific subclass probably used to have that kind of code, but it was removed at some point in time:
Following git history it looks like such "move" was already present in the very first inclusion of code in core, as it is found in 3c5b3bb (2012 commit from @brushtyler ) |
That's exactly why the plan is to move all this code outside of python. We had the same issue with processing too. Python is just a bad choice for large, complex projects (unless you soak them with near 100% test coverage, including all GUI functionality) |
@elpaso another thing missing in the new API is obtaining the names of returned columns, for a generic query executor (as needed by the SQL window of DBManager). In psycopg2 terms this is cursor's The |
True, any idea? We could add an argument to return column names in the first row? |
it's not just names, here's the list of currently expected information:
|
d57bc39 to 8d44699
I've faked those records for the moment, pretending all fields are strings with a length of 10 (both internal and display). It seems to work. SQL Window can query the database, even upon restart of backend. This is as of 8d4469901327dcb9bedce6c708ebd7046dcb3921. I'm sure there are lots of places that are broken with this change though, so it'll take some time to adapt more users (for sure topoviewer will need adaptation) |
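For illustration only, a sketch of what faking such records could look like. It assumes the consumer expects one seven-item sequence per column, as the Python DB-API (PEP 249) defines for a cursor's `description`: name, type_code, display_size, internal_size, precision, scale, null_ok. The `Column` tuple, the `fake_description` helper and the `"text"` type code are hypothetical and are not taken from the commit mentioned above.

```python
from collections import namedtuple

# Field order follows the PEP 249 description of a cursor's column records.
Column = namedtuple(
    "Column",
    "name type_code display_size internal_size precision scale null_ok",
)


def fake_description(column_names, size=10):
    """Pretend every column is a string of `size` characters.

    Both the display size and the internal size are set to `size`;
    precision, scale and null_ok are left unknown (None).
    """
    return [
        Column(name, "text", size, size, None, None, None)
        for name in column_names
    ]


description = fake_description(["id", "name"])
```

Real type codes, sizes and precision would have to come from the provider once the API exposes them; until then the placeholder values only keep callers that index into these records from failing.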
Getting the names in the first row could be good enough for a start, but I wonder if we really need cursors |
How about a getTableInformation? |
> How about a getTableInformation?

It's not only about a table, but about a query result, which could be a complex select. In most cases it'd take running the query, or at least _preparing_ it, or defining a cursor, in order to obtain that information, so a separate function might be problematic from a performance point of view, unless the separate function returns some handle to subsequently fetch rows from the thing (that's what a "cursor" is in psycopg2).
|
I would say that if this is a PG specific implementation it's probably acceptable to have a custom implementation in the PG plugin connector. But I think that getting column information from a query is a valid use case for the abstract connection API, always thinking about all supported DB backends, not limited to PG. |
QgsCredentials,
QgsDataSourceUri,
QgsProviderRegistry,
QgsAbstractDatabaseProviderConnection,
Do you need to import this? It's abstract, you get the concrete instance out of the factory.
I removed the import completely, with c01e34bf12
Is there a python function to map the QVariant resultset values into typed python variables ? |
You don't need it: SIP automatically takes care of the conversion, if it doesn't then you must write native conversion code like we've done in code for many |
It looks like I'm getting some QVariant typed values in the resultset. Debugging further to find out which ones, could be just some missing conversion we have to add. Probably for NULL ? XXX col of rec of resultset valued NULL is typed <class 'PyQt5.QtCore.QVariant'> |
I'm missing the context :) But there is some conversion code for DB returned values in several places, take a look at the provider implementation of execSql (possibly the private implementation) |
I confirm my problem was with NULL. Adding special code in my CursorProxy to deal with NULL (converting it to None) fixes my case. I guess this should be done at lower levels, will try to find the culprit. |
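A rough sketch of the kind of NULL handling described above, kept free of PyQt imports so it runs anywhere. It assumes a NULL cell arrives as an object whose `isNull()` method returns true, as a null `QVariant` does; `null_to_none`, `clean_rows` and the `FakeNullVariant` stand-in are hypothetical names, not the code from the PR.

```python
def null_to_none(value):
    """Return None for values that report themselves as null via isNull()."""
    is_null = getattr(value, "isNull", None)
    if callable(is_null) and is_null():
        return None
    return value


def clean_rows(rows):
    """Apply null_to_none to every cell of a list-of-rows result set."""
    return [[null_to_none(cell) for cell in row] for row in rows]


class FakeNullVariant:
    """Stand-in for a null QVariant, used only to exercise the sketch."""

    def isNull(self):
        return True


cleaned = clean_rows([[1, FakeNullVariant(), "x"]])
```

Because the check is duck-typed, any other value exposing a true `isNull()` would also become `None`, which may or may not be wanted; doing the conversion at the provider level, as suggested in the comment, would avoid that ambiguity.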
@elpaso see 73f22499e8ebfa0a22dab213831ff9b57de91ee6 for context. |
@strk maybe I forgot to check for QVariant NULL here? https://github.com/qgis/QGIS/blob/master/src/providers/postgres/qgspostgresproviderconnection.cpp#L246 |
This reverts commit 0ad368a9d28128a4f80896cc6f9989f12b758840.
c3bd091 to 8e5ff28
Ok, so as anticipated, creating a QgsVectorLayer makes operations far slower, as a new layer has to be created for each and every query being executed, including "service" ones. For instance, this is what I have in the logs with the current state of this PR:
Basically, some part of the DBManager code wants to check for a capability and runs a query; the query executor creates a VectorLayer to find out the names of the fields, which triggers lots of other queries. |
NOTE: lazily fetching field names when needed might not be possible as current code does things like: |
Do you mean a generator (using 'yield' ) ? |
Thanks @roya0045 for the suggestion but I found a simpler way ( |
@elpaso with 8ca08ec I've got precision and length information from the QgsField instance, but I'm not sure about "displaySize" vs. "internalSize" (I guess displaySize should be left hard-coded or somehow lower?) and I dunno how to extract a mapping between the data type returned by QgsField::type and the python type name... |
Well, I've used the The only missing piece here would be the chunk-based access, but now I'm not sure it is worth the trouble because I could confirm that creating a VectorLayer for each and every query is a real performance killer. @palmerj what do you think? Is the SQL Window of DBManager used for inspecting data in big tables? |
How do you mean? Showing many rows in the grid or running a query against a large table (hence the longer time for setup if using QgsVectorLayer)? We often do both (or would like to) |
I suspect running a query against a large table could still be fast (QgsVectorLayer supposedly fetches rows only on demand, and the current code does not fetch any rows through it). But the current code also runs the query via the new core abstract connection API, which returns all rows at once. |
LGTM
The DBManager's SQL window gets stuck when the connection to the (PostgreSQL) database is interrupted (for example on backend restart). This PR is aimed at providing a solution, either as automatic-reconnect or user-requested reconnect action.
See #31994