Database class mechanism for cross-connection in-memory databases #1151

Closed
simonw opened this issue Dec 17, 2020 · 11 comments

Comments

@simonw
Owner

simonw commented Dec 17, 2020

Next challenge: figure out how to use the Database class from https://github.com/simonw/datasette/blob/0.53/datasette/database.py for an in-memory database which persists data for the duration of the lifetime of the server, and allows access to that in-memory database from multiple threads in a way that lets them see each other's changes.

Originally posted by @simonw in #1150 (comment)

@simonw simonw changed the title Figure out how to use the Databsae class for persistent in-memory databases Figure out how to use the Databaase class for persistent in-memory databases Dec 17, 2020
@simonw simonw changed the title Figure out how to use the Databaase class for persistent in-memory databases Figure out how to use the Database class for persistent in-memory databases Dec 17, 2020
@simonw
Owner Author

simonw commented Dec 17, 2020

https://sqlite.org/inmemorydb.html

The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.

[...]

The special ":memory:" filename also works when using URI filenames. For example:

 rc = sqlite3_open("file::memory:", &db);

[...]

However, the same in-memory database can be opened by two or more database connections as follows:

 rc = sqlite3_open("file::memory:?cache=shared", &db);

[...]
If two or more distinct but shareable in-memory databases are needed in a single process, then the mode=memory query parameter can be used with a URI filename to create a named in-memory database:

rc = sqlite3_open("file:memdb1?mode=memory&cache=shared", &db);
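The same behaviour can be checked from Python's sqlite3 module. This is a minimal sketch, not Datasette code: the helper names open_named_memory and table_names and the database names memdb1 and memdb2 are chosen for illustration only. It assumes the local SQLite build has shared-cache support enabled.

```python
import sqlite3


def open_named_memory(name):
    # Hypothetical helper: builds the named in-memory URI form quoted above.
    return sqlite3.connect(f"file:{name}?mode=memory&cache=shared", uri=True)


def table_names(conn):
    # List the tables visible to this connection.
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


# Two connections to the same name, one connection to a different name.
memdb1_a = open_named_memory("memdb1")
memdb1_b = open_named_memory("memdb1")
memdb2 = open_named_memory("memdb2")

memdb1_a.execute("CREATE TABLE t1 (x TEXT)")

shared_view = table_names(memdb1_b)  # same name, so the table is visible
other_view = table_names(memdb2)     # different name, so a separate database
print(shared_view, other_view)
```

Connections that use the same name share one database, while a different name gets an independent one.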

@simonw
Owner Author

simonw commented Dec 17, 2020

I'm going to try with file:datasette?mode=memory&cache=shared.

@simonw
Owner Author

simonw commented Dec 17, 2020

This works in ipython:

In [1]: import sqlite3

In [2]: c1 = sqlite3.connect("file:datasette?mode=memory&cache=shared", uri=True)

In [3]: c2 = sqlite3.connect("file:datasette?mode=memory&cache=shared", uri=True)

In [4]: c1.executescript("CREATE TABLE hello (world TEXT)")
Out[4]: <sqlite3.Cursor at 0x1104addc0>

In [5]: c1.execute("select * from sqlite_master").fetchall()
Out[5]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')]

In [6]: c2.execute("select * from sqlite_master").fetchall()
Out[6]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')]

In [7]: c3 = sqlite3.connect("file:datasette?mode=memory&cache=shared", uri=True)

In [9]: c3.execute("select * from sqlite_master").fetchall()
Out[9]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')]

In [10]: c4 = sqlite3.connect("file:datasette?mode=memory", uri=True)

In [11]: c4.execute("select * from sqlite_master").fetchall()
Out[11]: []
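The session above runs in a single thread. A rough sketch of the multi-thread case from the original question might look like the following, assuming each thread opens its own connection to the same URI. The names thread_demo, keeper, write_rows and read_rows are illustrative only. The threads run one after another here to avoid the table-lock errors that concurrent writers can hit in shared-cache mode.

```python
import sqlite3
import threading

URI = "file:thread_demo?mode=memory&cache=shared"

# Keep one connection open for the whole demo: a shared in-memory database
# is deleted when its last connection closes.
keeper = sqlite3.connect(URI, uri=True)
keeper.execute("CREATE TABLE hello (world TEXT)")


def write_rows():
    conn = sqlite3.connect(URI, uri=True)
    with conn:  # commits the inserts when the block exits cleanly
        conn.executemany(
            "INSERT INTO hello (world) VALUES (?)", [("a",), ("b",), ("c",)]
        )
    conn.close()


seen = []


def read_rows():
    conn = sqlite3.connect(URI, uri=True)
    seen.extend(
        row[0] for row in conn.execute("SELECT world FROM hello ORDER BY world")
    )
    conn.close()


# Run the writer thread to completion, then the reader thread.
for target in (write_rows, read_rows):
    thread = threading.Thread(target=target)
    thread.start()
    thread.join()

print(seen)
```

The reader thread sees the rows committed by the writer thread, even though each used its own connection and both have since closed them.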

@simonw
Owner Author

simonw commented Dec 17, 2020

This worked as a prototype:

diff --git a/datasette/database.py b/datasette/database.py
index 412e0c5..a90e617 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -24,11 +24,12 @@ connections = threading.local()
 
 
 class Database:
-    def __init__(self, ds, path=None, is_mutable=False, is_memory=False):
+    def __init__(self, ds, path=None, is_mutable=False, is_memory=False, uri=None):
         self.ds = ds
         self.path = path
         self.is_mutable = is_mutable
         self.is_memory = is_memory
+        self.uri = uri
         self.hash = None
         self.cached_size = None
         self.cached_table_counts = None
@@ -46,6 +47,8 @@ class Database:
                 }
 
     def connect(self, write=False):
+        if self.uri:
+            return sqlite3.connect(self.uri, uri=True, check_same_thread=False)
         if self.is_memory:
             return sqlite3.connect(":memory:")
         # mode=ro or immutable=1?

Then in ipython:

from datasette.app import Datasette
from datasette.database import Database
ds = Datasette([])
db = Database(ds, uri="file:datasette?mode=memory&cache=shared", is_memory=True)
await db.execute_write("create table foo (bar text)")
await db.table_names()
# Outputs ["foo"]
db2 = Database(ds, uri="file:datasette?mode=memory&cache=shared", is_memory=True)
await db2.table_names()
# Also outputs ["foo"]

@simonw
Owner Author

simonw commented Dec 17, 2020

I'm going to add an argument to the Database() constructor which means "connect to named in-memory database called X".

db = Database(ds, memory_name="datasette")
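One way such an argument could map onto a connection URI is sketched below. NamedMemoryDatabase is a hypothetical stand-in written for this illustration, not the Datasette Database class, and the name datasette_sketch is arbitrary.

```python
import sqlite3


class NamedMemoryDatabase:
    """Hypothetical illustration only; not Datasette's Database class."""

    def __init__(self, memory_name):
        self.memory_name = memory_name

    @property
    def uri(self):
        # Same URI shape as the one tried earlier in this thread.
        return f"file:{self.memory_name}?mode=memory&cache=shared"

    def connect(self):
        # check_same_thread=False mirrors the prototype diff above.
        return sqlite3.connect(self.uri, uri=True, check_same_thread=False)


first = NamedMemoryDatabase("datasette_sketch").connect()
second = NamedMemoryDatabase("datasette_sketch").connect()

first.execute("CREATE TABLE foo (bar TEXT)")
names = [
    row[0]
    for row in second.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(names)
```

Two separate objects built with the same memory_name end up talking to the same in-memory database.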

@simonw
Owner Author

simonw commented Dec 17, 2020

Do I use the current is_memory= boolean anywhere at the moment?

https://ripgrep.datasette.io/-/ripgrep?pattern=is_memory - doesn't look like it.

I may remove that feature, since it's not actually useful, and replace it with a mechanism for creating shared named memory databases instead.

@simonw simonw added this to the Datasette 1.0 milestone Dec 17, 2020
@simonw
Owner Author

simonw commented Dec 17, 2020

Wait, I do use it: running datasette --memory is useful for trying out SQL that doesn't need to run against a table.

@simonw
Owner Author

simonw commented Dec 18, 2020

Is it possible to connect to a memory database in read-only mode?

file:foo?mode=memory&cache=shared&mode=ro isn't valid because it features mode= more than once.

https://stackoverflow.com/a/40548682 suggests using PRAGMA query_only on the connection instead.
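A small sketch of the PRAGMA query_only approach, using plain sqlite3 rather than Datasette. The database name query_only_demo and the variable names are illustrative assumptions.

```python
import sqlite3

URI = "file:query_only_demo?mode=memory&cache=shared"

# A normal connection that is allowed to write.
writer = sqlite3.connect(URI, uri=True)
writer.execute("CREATE TABLE foo (bar TEXT)")

# A second connection to the same database, switched to query-only.
reader = sqlite3.connect(URI, uri=True)
reader.execute("PRAGMA query_only = 1")

try:
    reader.execute("CREATE TABLE blocked (t TEXT)")
    write_rejected = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a query_only connection.
    write_rejected = True

tables = [
    row[0]
    for row in reader.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(write_rejected, tables)
```

The pragma applies per connection, so the writer connection can still modify the shared database while the reader cannot.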

@simonw
Owner Author

simonw commented Dec 18, 2020

I tested this with a one-off plugin and it worked!

from datasette import hookimpl
from datasette.database import Database


@hookimpl
def startup(datasette):
    datasette.add_database("statistics", Database(
        datasette,
        memory_name="statistics"
    ))

This created a /statistics database when I ran datasette - and if I installed https://github.com/simonw/datasette-write I could then create tables in it which persisted until I restarted the server.

@simonw simonw closed this as completed in 5e9895c Dec 18, 2020
@simonw
Owner Author

simonw commented Dec 18, 2020

This feature is illustrated by the tests:

@pytest.mark.asyncio
async def test_database_memory_name(app_client):
    ds = app_client.ds
    foo1 = Database(ds, memory_name="foo")
    foo2 = Database(ds, memory_name="foo")
    bar1 = Database(ds, memory_name="bar")
    bar2 = Database(ds, memory_name="bar")
    for db in (foo1, foo2, bar1, bar2):
        table_names = await db.table_names()
        assert table_names == []
    # Now create a table in foo
    await foo1.execute_write("create table foo (t text)", block=True)
    assert await foo1.table_names() == ["foo"]
    assert await foo2.table_names() == ["foo"]
    assert await bar1.table_names() == []
    assert await bar2.table_names() == []


@pytest.mark.asyncio
async def test_in_memory_databases_forbid_writes(app_client):
    ds = app_client.ds
    db = Database(ds, memory_name="test")
    with pytest.raises(sqlite3.OperationalError):
        await db.execute("create table foo (t text)")
    assert await db.table_names() == []
    # Using db.execute_write() should work:
    await db.execute_write("create table foo (t text)", block=True)
    assert await db.table_names() == ["foo"]

I added new documentation for the Database() constructor here as well: https://docs.datasette.io/en/latest/internals.html#database-ds-path-none-is-mutable-false-is-memory-false-memory-name-none

@simonw simonw changed the title Figure out how to use the Database class for persistent in-memory databases Database class mechanism for persistent cross-connection in-memory databases Dec 18, 2020
@simonw simonw changed the title Database class mechanism for persistent cross-connection in-memory databases Database class mechanism for cross-connection in-memory databases Dec 18, 2020
simonw added a commit that referenced this issue Jan 19, 2021
@simonw simonw modified the milestones: Datasette 1.0, Datasette 0.54 Jan 24, 2021
This was referenced Jan 25, 2021
simonw added a commit that referenced this issue Jan 25, 2021
@simonw
Owner Author

simonw commented Jan 26, 2021
