Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document (and reconsider design of) Database.execute() and Database.execute_against_connection_in_thread() #685

Closed
simonw opened this issue Feb 25, 2020 · 15 comments

Comments

@simonw
Copy link
Owner

@simonw simonw commented Feb 25, 2020

In #683 I started a new section of internals documentation covering the Database class: https://datasette.readthedocs.io/en/latest/internals.html#database-class

I decided not to document .execute() and .execute_against_connection_in_thread() yet because I'm not 100% happy with their API design yet.

@simonw
Copy link
Owner Author

@simonw simonw commented Feb 25, 2020

Current implementations:

async def execute_against_connection_in_thread(self, fn):
def in_thread():
conn = getattr(connections, self.name, None)
if not conn:
conn = self.connect()
self.ds.prepare_connection(conn, self.name)
setattr(connections, self.name, conn)
return fn(conn)
return await asyncio.get_event_loop().run_in_executor(
self.ds.executor, in_thread
)
async def execute(
self,
sql,
params=None,
truncate=False,
custom_time_limit=None,
page_size=None,
log_sql_errors=True,
):
"""Executes sql against db_name in a thread"""
page_size = page_size or self.ds.page_size
def sql_operation_in_thread(conn):
time_limit_ms = self.ds.sql_time_limit_ms
if custom_time_limit and custom_time_limit < time_limit_ms:
time_limit_ms = custom_time_limit
with sqlite_timelimit(conn, time_limit_ms):
try:
cursor = conn.cursor()
cursor.execute(sql, params or {})
max_returned_rows = self.ds.max_returned_rows
if max_returned_rows == page_size:
max_returned_rows += 1
if max_returned_rows and truncate:
rows = cursor.fetchmany(max_returned_rows + 1)
truncated = len(rows) > max_returned_rows
rows = rows[:max_returned_rows]
else:
rows = cursor.fetchall()
truncated = False
except (sqlite3.OperationalError, sqlite3.DatabaseError) as e:
if e.args == ("interrupted",):
raise QueryInterrupted(e, sql, params)
if log_sql_errors:
print(
"ERROR: conn={}, sql = {}, params = {}: {}".format(
conn, repr(sql), params, e
)
)
raise
if truncate:
return Results(rows, truncated, cursor.description)
else:
return Results(rows, False, cursor.description)
with trace("sql", database=self.name, sql=sql.strip(), params=params):
results = await self.execute_against_connection_in_thread(
sql_operation_in_thread
)
return results

At the very least the method name execute_against_connection_in_thread() should be updated to something that's more similar to the new (and documented) .execute_write_fn() method.

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

Working with this in simonw/datasette-media#2 made me really want to redesign this API: simonw/datasette-media@be23ec3

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

I'm going to rename execute_against_connection_in_thread() to just execute_fn().

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

The API design for the .execute() function is actually fine, I think it's more the API of the returned Results object that I want to improve.

That object encapsulates the returned data, the named columns and whether or not the result was truncated.

return Results(rows, truncated, cursor.description)

The rows argument comes from either rows = cursor.fetchmany(max_returned_rows + 1) or rows = cursor.fetchall(). It's a Python list of sqlite3.Row objects.

Here's the current class implementation:

class Results:
def __init__(self, rows, truncated, description):
self.rows = rows
self.truncated = truncated
self.description = description
@property
def columns(self):
return [d[0] for d in self.description]
def __iter__(self):
return iter(self.rows)
def __len__(self):
return len(self.rows)

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

Maybe all I really want here is to add length and access-by-index methods to the Results class? So I don't have to do results.rows[0][0] to get at a single returned value.

Also how about a results.single_value() method which returns a value only if there is a single row with a single column, otherwise raises an exception?

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

I'm going to add .first() and .last() methods too. These will return the first or last row, or None if the results are empty (rather than raising an IndexError).

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

I'll write the API documentation first, in a branch.

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

I'm not going to do .last() because it could be confusing if truncation has come into play.

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

This code should live somewhere other than datasette/utils/__init__.py. Especially the exceptions, since calling code needs to be able to sensibly import them.

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

I'll move the code into datasette/database.py.

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

[ Fun aside: I implemented and shipped this entire branch in my browser using the beta of GitHub's new Codespaces ]

@simonw simonw closed this as completed in 4433306 May 8, 2020
@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

New documentation is live here: https://datasette.readthedocs.io/en/latest/internals.html#database-class

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

I'm going to ship a release with just this change purely so I can start depending on it from my in-development https://github.com/simonw/datasette-media plugin.

@simonw
Copy link
Owner Author

@simonw simonw commented May 8, 2020

Re-opening this ticket because I forgot to document execute_fn() (the old execute_against_connection_in_thread() method).

@simonw simonw reopened this May 8, 2020
simonw added a commit that referenced this issue May 8, 2020
simonw added a commit that referenced this issue May 8, 2020
@simonw simonw closed this as completed May 8, 2020
simonw added a commit to simonw/datasette-media that referenced this issue May 8, 2020
Using new .first() method from simonw/datasette#685
@simonw simonw added this to the Datasette 1.0 milestone May 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant