-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow incremental I/O to blobs in sqlite3 #69093
Comments
SQLite supports incremental I/O to blobs, i.e. the capability to stream reads and writes to blobs without having to load the entire blob into memory first. See https://www.sqlite.org/c3ref/blob_open.html for more details on the C API. It'd be nice if it were possible to do this in Python using sqlite3 (it is already possible with apsw). |
I opened a pull request for blob support in the pysqlite github repository: I will do the needed changes for python3 and will post a patch soon. |
I did the needed changes for the pull request at pysqlite for porting to python3. I will continue updating the patch if there will be changes in pysqlite and vice versa. |
Pinging as mentioned in the devguide. |
Thanks for the review Serhiy. Attached is the updated patch after the changes. |
Pinging again. I think this would be a great enhancement to the sqlite module. |
Sorry Aviv, I just forget about this issue. Added new comments on Rietveld. Many lines in sqlite3.rst still are too long. It would be worth to ask other developers about wanted interface. |
Thanks for the CR Serhiy. Attached is a new patch after the fixes from the CR. What other developers should I ask? The interface is file like and is the same as apsw. |
apsw has different order of arguments for blobopen(), all arguments are mandatory, and the parameter for read/write mode is called "writeable" rather than "readonly". This part of API needs to be discussed. Ask for discussion on mailing lists: Python-Ideas and maybe Python-List. |
Uploading patch after fixes from berker CR. The
I don't think that there is enough differences between the possible API's to justify sending a message to the mailing lists. |
I opened a PR in github. I tagged some other developers if that will not start a discussion on the API I will post in the python-ideas. |
Is there similar API in other databases? |
I am not a DB expert but from a quick search I couldn't find a similar API. I did find a few API's in other programming languages to sqlite blob:
It seems like most of them give the same API as apsw. |
Small discussion is started at pull request [1]. There are doubts about the usefulness of incremental I/O API. SQLite is rarely used for storing blobs of the size of hundreds of megabytes. For smaller blobs it may be enough to read or write all data at once. There are also questions about the support of len(), since other file-like objects don't support len(). This discussion remembered me about mmap objects. mmap objects implement two protocols: file protocol and sequence protocol (including the support of len()). The BLOB object looks similar to the mmap object. sqlite3_blob_write() may only modify the contents of the BLOB; it is not possible to increase the size of a BLOB using this API. Maybe implement the sequence protocol in the BLOB object? Or implement only the sequence protocol and drop away the file protocol? [1] #271 |
Just to make sure when you say "sequence protocol" you thin about the doing blob[4:6] = "ab"? I actually think this is a nice feature. The Blob and the mmap object has a lot in common so I think that making the same API will be best. I think that adding the sequence protocol in addition to the file protocol is best. |
Pinging. As I mentioned in the PR I need a little help with __contains__. |
As I wrote in the PR: I believe that this is a very good feature for the sqlite module. I know that there is no active core developer that is currently working on sqlite module but this patch is already waiting 2 years. Could I do anything to help this patch merged? |
I am looking forward for this to be included. My main use case is on restricted IoT devices, where I need to handle BLOBs of 2MB size. This is ugly because it is not part of the atomic commit of the database. I would very much appreciate if I could get a file-like API for BLOBs in sqlite. |
Hi everybody, I let a comment on github. I was asking about if are there some plans with this PR? Cheers |
It looks like this has a PR that just needs rebasing, then it will be ready for another review. |
FYI, I'm maintaining a rebased version on my fork: https://github.com/erlend-aasland/cpython/tree/sqlite-blob Here's the changes I've applied upon PR 271:
I don't think it is ready for inclusion yet; the test suite needs to be restructured, or maybe rewritten from scratch. Pr. now most of the tests maintain a single connection with a prepared database. IMO, it's a fragile design and it is hard to read, debug, and verify the test code. I'll see if I can get to it in a couple of weeks. |
I've submitted my changes as a separate PR. I believe we should consider duplicating the apsw API, for users convenience. Pr. now, they are very similar, except for the "open blob" API: apsw uses |
The diff is pretty heavy, here's the diffstat: It will be hard to find reviewers for such a large PR. I suggest to remove mapping support and context manager support for now, and then add those features in separate PR's. |
Slimmed PR diff (excluding clinic), without context manager and mapping protocol: $ git diff main Modules/_sqlite/*.[ch] Lib Doc Misc PC* setup.py
Doc/includes/sqlite3/blob.py | 12 +
Doc/includes/sqlite3/blob_with.py | 12 +
Doc/library/sqlite3.rst | 73 ++++++
Doc/whatsnew/3.11.rst | 4 +
Lib/test/test_sqlite3/test_dbapi.py | 165 +++++++++++++-
Misc/NEWS.d/next/Library/2018-04-18-16-15-55.bpo-24905.jYqjYx.rst | 3 +
Modules/_sqlite/blob.c | 342 +++++++++++++++++++++++++++++
Modules/_sqlite/blob.h | 24 ++
Modules/_sqlite/connection.c | 83 ++++++-
Modules/_sqlite/connection.h | 5 +-
Modules/_sqlite/module.c | 6 +-
Modules/_sqlite/module.h | 1 +
PCbuild/_sqlite3.vcxproj | 2 +
PCbuild/_sqlite3.vcxproj.filters | 6 +
setup.py | 1 +
15 files changed, 733 insertions(+), 6 deletions(-) Looks promising. |
@erlend-aasland can we close this issue with #30680 merged, or do you want to keep it for further API enhancements for blobs? |
I noticed this was missing while writing typeshed stubs. It's useful to expose it for use in annotations.
- document that you cannot open a blob handle in a WITHOUT ROWID table - document the blobopen() positional arguments in the same order as they appear
Let's keep it open until we've got context manager and mapping support in place. |
I noticed this was missing while writing typeshed stubs. It's useful to expose it for use in annotations and for exploration.
Unless sqlite3_blob_open() returns SQLITE_MISUSE, the error code and message are available on the connection object. This means we have to handle SQLITE_MISUSE error messages explicitly.
Authored-by: Aviv Palivoda <palaviv@gmail.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@innova.no>
- document that you cannot open a blob handle in a WITHOUT ROWID table - document the blobopen() positional arguments in the same order as they appear - relocate sqlite3.Blob section
Authored-by: Aviv Palivoda <palaviv@gmail.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@innova.no>
Incremental I/O for blobs is now supported in the sqlite3 extension module in the upcoming Python 3.11 release. Thanks a lot to @palaviv, @JelleZijlstra, and everyone else who contributed by commenting and reviewing. IMO, the implementation can benefit from further tweaks, but let's do that in separate issues if needed. |
cc. @eamanu, @nightlark |
(cherry picked from commit 8a5e3c2) Co-authored-by: Christian Heimes <christian@python.org>
sqlite3
#30356sqlite3
#30680Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: