Streaming CSV spends a lot of time in table_column_details #1673

Open
simonw opened this issue Mar 20, 2022 · 1 comment

simonw commented Mar 20, 2022

At least I think it does. I tried running py-spy top -p $PID against a Datasette process that was trying to do:

datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on'

While investigating:

And spotted this:

datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on' (python v3.10.2)
Total Samples 5800
GIL: 71.00%, Active: 98.00%, Threads: 4

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                            
  8.00%   8.00%    4.32s     4.38s   sql_operation_in_thread (datasette/database.py:212)
  5.00%   5.00%    3.77s     3.93s   table_column_details (datasette/utils/__init__.py:614)
  6.00%   6.00%    3.72s     3.72s   _worker (concurrent/futures/thread.py:81)
  7.00%   7.00%    2.98s     2.98s   _read_from_self (asyncio/selector_events.py:120)
  5.00%   6.00%    2.35s     2.49s   detect_fts (datasette/utils/__init__.py:571)
  4.00%   4.00%    1.34s     1.34s   _write_to_self (asyncio/selector_events.py:140)

Relevant code:

def table_column_details(conn, table):
    if supports_table_xinfo():
        # table_xinfo was added in 3.26.0
        return [
            Column(*r)
            for r in conn.execute(
                f"PRAGMA table_xinfo({escape_sqlite(table)});"
            ).fetchall()
        ]
    else:
        # Treat hidden as 0 for all columns
        return [
            Column(*(list(r) + [0]))
            for r in conn.execute(
                f"PRAGMA table_info({escape_sqlite(table)});"
            ).fetchall()
        ]
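
One way to check whether the PRAGMA call itself is expensive is to time it in isolation. This is a rough sketch, not Datasette's code: the `Column` field names are assumed, bracket quoting stands in for `escape_sqlite`, and the `demo` table and `seconds_per_call` helper are hypothetical. It measures a tiny in-memory table, so the numbers will not match a real `covid.db`.

```python
import sqlite3
import timeit
from collections import namedtuple

# Assumed field names, standing in for Datasette's Column namedtuple
Column = namedtuple(
    "Column", ("cid", "name", "type", "notnull", "default_value", "is_pk", "hidden")
)

# table_xinfo needs SQLite 3.26.0+; check the library version Python is linked against
HAS_XINFO = sqlite3.sqlite_version_info >= (3, 26, 0)


def table_column_details(conn, table):
    # Simplified copy of the function above; bracket quoting replaces escape_sqlite
    if HAS_XINFO:
        rows = conn.execute(f"PRAGMA table_xinfo([{table}]);").fetchall()
        return [Column(*r) for r in rows]
    # Older SQLite: table_info has no hidden column, so treat hidden as 0
    rows = conn.execute(f"PRAGMA table_info([{table}]);").fetchall()
    return [Column(*(list(r) + [0])) for r in rows]


def seconds_per_call(conn, table, calls=1000):
    # Average wall-clock time of one table_column_details() call
    return timeit.timeit(lambda: table_column_details(conn, table), number=calls) / calls


# Hypothetical demo table, used only for this measurement
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (id integer primary key, county text, cases integer)")
per_call = seconds_per_call(conn, "demo")
print(f"{per_call * 1e6:.1f} microseconds per call")
```

If a single call turns out to be cheap, the time py-spy attributes to this function more likely comes from how often it is called during streaming than from the cost of any one call.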


simonw commented Mar 20, 2022

Maybe it's because supports_table_xinfo() creates a brand new in-memory SQLite connection every time you call it?

def _sqlite_version():
    return tuple(
        map(
            int,
            sqlite3.connect(":memory:")
            .execute("select sqlite_version()")
            .fetchone()[0]
            .split("."),
        )
    )


def supports_table_xinfo():
    return sqlite_version() >= (3, 26, 0)

Actually no, I'm caching that already:

_cached_sqlite_version = None


def sqlite_version():
    global _cached_sqlite_version
    if _cached_sqlite_version is None:
        _cached_sqlite_version = _sqlite_version()
    return _cached_sqlite_version
