Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable FTS for attached dbs #609

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

petergaultney
Copy link

@petergaultney petergaultney commented Jan 2, 2024

This is just a work-in-progress, although it does:

  • Allows tables in attached databases to be 'addressed' by sqlite-utils by loading them in explicitly: db.attach('./a1.db', alias='a1'); a1_foo = db.table('foo', schema='a1'). This table should generally operate successfully, although this is far from fully tested and I expect there are some edge cases where it does not work as expected due to SQLite limitations on attached tables.
  • Permits creation of and basic usage of FTS on table(s) in an attached db, where the FTS tables are created in the main database.
    • Continues to store all the true state inside the database itself - e.g., detects existing FTS tables and automagically allows searching if the database is detached and later reattached.

All tests pass locally, as well as black and mypy.

You could call this a proof-of-concept; I think there probably would need to be discussions about some of my refactors around triggers, counts, etc, to decide whether we want to take a similar approach to those (modify the 'main' database rather than the attached ones, under the reasoning that someone attaching a database could presumably make separate connections to that database if they wanted to set up triggers, count tables, etc), do something else, or possibly try to just not support anything having to do with attached databases/tables in those cases.

This would close #608


2024-01-04: Only now am I realizing, based on the content of your tests, that you have use cases where you want to support dots being part of an actual table name. And that this means my approach is not workable, since the API would not allow us to distinguish between cases where the user intended to match an existing, attached database name, and cases where the user wanted to put that matching string in a table name inside main.

Hopefully the idea of supporting this is compelling enough that we can figure out what a better API (and internal storage format) might be. Perhaps we could use tuples ("a1", "foo") and force users to be explicit when creating/accessing the tables from the Database object, and then store the database name in a separate field inside the Queryable object?

I really liked the simplicity of a simple string (it works well for the CLI use cases as well, although maybe a CLI would never attach a database in the first place, since that's a stateful operation) -- but I am sure you don't want to break backward compatibilty.


2024-01-09: I have reworked this so that the above section is no longer an issue. The schema_name is stored explicitly within the Table object, and the only real change to functionality is that table_names() and related methods on Database will no longer operate across schemas - they instead operate on a single schema (attached database) at a time.

@petergaultney petergaultney changed the title work-in-progress FTS for attached dbs Enable FTS for attached dbs Jan 2, 2024
sqlite_utils/db.py Outdated Show resolved Hide resolved
sqlite_utils/db.py Outdated Show resolved Hide resolved
Comment on lines 1429 to 1431
@property
def is_attached(self) -> bool:
return dbname(self.name) not in {'', 'main'}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used nowhere but seems like a nice thing to provide direct access to.

Comment on lines 1433 to 1438
@property
def _pragma_name(self) -> Tuple[str, str]:
if "." in self.name:
db, name = self.name.split(".")
return db + ".", name
return "", self.name
Copy link
Author

@petergaultney petergaultney Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is maybe premature optimization for not changing too much of the code in an visually noisy way. It really just lets me modify all the places we do PRAGMA things without making big changes to what's there.

Also, it probably deserves a better name, because it is also useful (as seen below) in non-PRAGMA situations, but I didn't want to bikeshed names for something that might not be an approach you'd like in the first place.

@@ -2386,6 +2444,7 @@ def enable_fts(
:param tokenize: Custom SQLite tokenizer to use, for example ``"porter"`` to enable Porter stemming.
:param replace: Should any existing FTS index for this table be replaced by the new one?
"""
table_name = tablename(self.name)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the end, the actual changes to enable FTS were, as expected, essentially nil outside the changes to support 'addressing' attached Databases and their Tables.

Hopefully this new ability can be an overall win for the library, and hopefully since you have a lot of experience with SQLite you can help advise on some of the pitfalls I am missing.

Copy link

codecov bot commented Feb 15, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (17eb818) 95.69% compared to head (61de8d9) 95.56%.

Files Patch % Lines
sqlite_utils/db.py 93.15% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #609      +/-   ##
==========================================
- Coverage   95.69%   95.56%   -0.13%     
==========================================
  Files           8        8              
  Lines        2854     2886      +32     
==========================================
+ Hits         2731     2758      +27     
- Misses        123      128       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Attached database tables representable by Table
1 participant