table.optimize() should delete junk rows from *_fts_docsize #153

Closed
simonw opened this issue Sep 7, 2020 · 3 comments
Labels
enhancement New feature or request python-library

Comments

@simonw
Owner

simonw commented Sep 7, 2020

The second challenge here is cleaning up all of those junk rows in existing *_fts_docsize tables. Running the cleanup against just the demo database from https://github-to-sqlite.dogsheep.net/github.db shrank it from 22MB to 16MB! Here's the SQL:

DELETE FROM [licenses_fts_docsize] WHERE id NOT IN (
  SELECT rowid FROM [licenses_fts]);

I can do that as part of the existing table.optimize() method, which optimizes FTS tables.
Originally posted by @simonw in #149 (comment)
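
As a rough illustration of the cleanup described above, here is a minimal sketch using Python's standard sqlite3 module. The function name delete_junk_docsize_rows and its id_column parameter are hypothetical and are not part of sqlite-utils; the sketch only wraps the DELETE statement quoted in the comment and assumes the shadow table is named <fts_table>_docsize.

```python
import sqlite3


def delete_junk_docsize_rows(conn, fts_table, id_column="id"):
    """Delete rows from <fts_table>_docsize that have no matching rowid
    in <fts_table>. Returns the number of rows deleted.

    Hypothetical helper for illustration; table and column names are
    interpolated directly, so only pass trusted identifiers.
    """
    sql = (
        f"DELETE FROM [{fts_table}_docsize] "
        f"WHERE [{id_column}] NOT IN (SELECT rowid FROM [{fts_table}])"
    )
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        cursor = conn.execute(sql)
    return cursor.rowcount
```

Calling delete_junk_docsize_rows(conn, "licenses_fts") would issue the same statement as the SQL above. Running VACUUM afterwards is likely needed before the file on disk actually shrinks.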

@simonw simonw added the enhancement New feature or request label Sep 7, 2020
@simonw
Owner Author

simonw commented Sep 7, 2020

Writing a test for this will be a tiny bit tricky. I think I'll use a test that replicates the bug in #149.

@simonw
Owner Author

simonw commented Sep 7, 2020

FTS4 uses a different column name here (docid rather than id): https://datasette-sqlite-fts4.datasette.io/24ways-fts4/articles_fts_docsize

CREATE TABLE 'articles_fts_docsize'(docid INTEGER PRIMARY KEY, size BLOB);
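
One possible way to handle both layouts is to inspect the shadow table's columns before building the DELETE. This is a sketch only: docsize_id_column is a hypothetical helper, not something sqlite-utils provides, and it assumes the key column is named either id or docid, as in the two schemas seen in this thread.

```python
import sqlite3


def docsize_id_column(conn, fts_table):
    """Return the key column of <fts_table>_docsize ("id" or "docid"),
    or None if the shadow table is missing or has neither column.

    Hypothetical helper for illustration; the table name is interpolated
    directly, so only pass trusted identifiers.
    """
    rows = conn.execute(
        f"PRAGMA table_info([{fts_table}_docsize])"
    ).fetchall()
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    names = [row[1] for row in rows]
    for candidate in ("id", "docid"):
        if candidate in names:
            return candidate
    return None
```

The returned name could then be substituted into the WHERE clause of the cleanup query in place of a hard-coded id.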

@simonw simonw closed this as completed in 3e87500 Sep 7, 2020
simonw added a commit that referenced this issue Sep 7, 2020
simonw added a commit that referenced this issue Sep 8, 2020
This isn't necessary any more since the new .rebuild_fts()
method can achieve the same thing.

Refs #155, #153
@simonw
Owner Author

simonw commented Sep 8, 2020

I've reverted this change again, because it turns out using the rebuild FTS mechanism is a better way of repairing this issue - see #155.
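
For context, SQLite's FTS4 and FTS5 modules both accept a special 'rebuild' command that regenerates the index from the table content. The sketch below shows that command issued through Python's sqlite3 module. The rebuild_fts function here is a hypothetical standalone helper written for illustration; it is an assumption about what a rebuild looks like at the SQL level, not a copy of the library's .rebuild_fts() method.

```python
import sqlite3


def rebuild_fts(conn, fts_table):
    """Ask SQLite to rebuild the full-text index for fts_table.

    Hypothetical helper for illustration. The special command is written
    as an INSERT whose target column has the same name as the table.
    """
    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + fts_table.replace('"', '""') + '"'
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        conn.execute(f"INSERT INTO {quoted}({quoted}) VALUES ('rebuild')")
```

This requires a SQLite build with the relevant FTS module compiled in.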
