Documentation touch-ups for #323.

lemon24 · Mar 5, 2024 · a5dc31a
1 parent 6334473

Showing 4 changed files with 42 additions and 21 deletions.
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -25,6 +25,7 @@ Unreleased
 * Add an internal :ref:`change tracking API <changes>`
   to formalize how search keeps in sync with storage.
   (:issue:`323`)
+* Refactor storage internals. (:issue:`323`)
 
 
 Version 3.11

diff --git a/docs/dev.rst b/docs/dev.rst
@@ -57,7 +57,6 @@ but I will prioritize supporting :doc:`contributors <contributing>`
 
   * :ref:`searchable tag values <searchable tags>`, e.g. for comments
   * :ref:`unification with entry.read/important <entry flag unification>`
-  * filter entries by entry tags, :issue:`328`
   * optimistic locking, :issue:`308`
   * filter tags by prefix, :issue:`309`
 
@@ -467,6 +466,9 @@ From the initial issue:
 
 Enabling search by default, and alternative search APIs: :issue:`252`.
 
+Change tracking API: :issue:`323#issuecomment-1930756417`, model validated in
+`this gist <https://gist.github.com/lemon24/558955ad82ba2e4f50c0184c630c668c>`_.
+
 External resources:
 
 * Comprehensive, although a bit old (2017): `What every software engineer should know about search <https://medium.com/startup-grind/what-every-software-engineer-should-know-about-search-27d1df99f80d>`_ (`full version <http://webcache.googleusercontent.com/search?q=cache:https://medium.com/startup-grind/what-every-software-engineer-should-know-about-search-27d1df99f80d&sca_esv=570067020&prmd=ivn&strip=1&vwsrc=0>`_)
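
The change tracking model referenced in this hunk gives each resource a sequence, and the index repeatedly fetches pending changes, processes them, and marks them done. A minimal in-memory sketch of that loop follows. It is an illustration only, not reader's implementation: `InMemoryChanges`, `Change`, `set_text`, `sync`, and the `DELETE` action are hypothetical names assumed for this example. Only `get()`, `done()`, and `Action.INSERT` appear in the commit itself.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class Action(Enum):
    INSERT = 1
    DELETE = 2  # assumed counterpart of INSERT, for illustration


@dataclass(frozen=True)
class Change:
    action: Action
    sequence: int
    resource_id: str


class InMemoryChanges:
    """Hypothetical stand-in for a storage's change tracker."""

    def __init__(self):
        self._counter = count(1)  # a global counter used as the sequence
        self.resources = {}       # resource_id -> (sequence, text)
        self._pending = []        # unprocessed changes, oldest first

    def set_text(self, resource_id, text):
        old = self.resources.get(resource_id)
        if old is not None:
            if old[1] == text:
                return  # text unchanged, so the sequence stays the same
            self._pending.append(Change(Action.DELETE, old[0], resource_id))
        sequence = next(self._counter)
        self.resources[resource_id] = (sequence, text)
        self._pending.append(Change(Action.INSERT, sequence, resource_id))

    def get(self, limit=2):
        return self._pending[:limit]

    def done(self, changes):
        finished = set(changes)
        self._pending = [c for c in self._pending if c not in finished]


def sync(changes, index):
    """Apply pending changes to a dict-based 'index' until none are left."""
    while batch := changes.get():
        for change in batch:
            key = (change.resource_id, change.sequence)
            if change.action is Action.DELETE:
                index.pop(key, None)
                continue
            current = changes.resources.get(change.resource_id)
            # Skip stale inserts: the resource has changed again since.
            if current is not None and current[0] == change.sequence:
                index[key] = current[1]
        # Changes are marked as done regardless of the action taken.
        changes.done(batch)


storage = InMemoryChanges()
storage.set_text("entry-1", "hello")
storage.set_text("entry-1", "hello world")

index = {}
sync(storage, index)
```

After `sync()` returns, the index holds only the latest text for `entry-1`, and no changes remain pending. The stale first insert is skipped rather than indexed, but is still marked as done.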

diff --git a/docs/guide.rst b/docs/guide.rst
@@ -127,6 +127,8 @@ since each connection would be to a *different* database::
     reader.exceptions.StorageError: usage error: cannot use a private database from threads other than the creating thread
 
 
+.. _backups:
+
 Back-ups
 ~~~~~~~~
 
@@ -439,16 +441,27 @@ the entries that changed since the last call,
 so it is OK to call it relatively often.
 
 
-Because search adds  minor overhead to other :class:`Reader` methods
-and can almost double the size of the database,
-it can be turned on/off through the
-:meth:`~Reader.enable_search()` / :meth:`~Reader.disable_search()` methods.
-This is persistent across instances using the same database,
-and only needs to be done once.
-You can also use the ``search_enabled`` :func:`make_reader` argument
-for the same purpose.
-By default, search is disabled,
-and enabled automatically on the first :meth:`~Reader.update_search()` call.
+Search can be turned on/off through the
+:meth:`~Reader.enable_search()` / :meth:`~Reader.disable_search()` methods
+(persistent across instances using the same database),
+or the ``search_enabled`` argument of :func:`make_reader`;
+by default, search is enabled automatically
+on the first :meth:`~Reader.update_search()` call.
+If search is enabled,
+you should call :meth:`~Reader.update_search()` regularly
+to prevent unprocessed changes from accumulating over time.
+
+
+Because the search index can be almost as large as the main database,
+the default implementation splits it into a separate, attached database,
+which allows :ref:`backing up <backups>` the main database separately;
+for a reader created with ``make_reader('db.sqlite')``,
+the search index will be in ``db.sqlite.search``.
+
+
+.. versionchanged:: 3.12
+    Split the full-text search index into a separate database.
+
 
 
 

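The guide text added above describes the search index living in a separate, attached database next to the main one. The sketch below shows the underlying SQLite mechanism (`ATTACH DATABASE`) using only the standard library. It is not reader's schema or code: the table names `entries` and `entries_index` are hypothetical, and only the `db.sqlite` / `db.sqlite.search` naming comes from the guide.

```python
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
main_path = os.path.join(tmpdir, "db.sqlite")
search_path = main_path + ".search"  # naming taken from the guide text

db = sqlite3.connect(main_path)
# Attach a second database file under the schema name "search".
db.execute("ATTACH DATABASE ? AS search", (search_path,))

# Hypothetical tables: one in the main file, one in the attached file.
db.execute("CREATE TABLE main.entries (id TEXT PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE search.entries_index (id TEXT, content TEXT)")

# A single transaction can write to both files.
with db:
    db.execute("INSERT INTO main.entries VALUES (?, ?)", ("1", "Hello"))
    db.execute("INSERT INTO search.entries_index VALUES (?, ?)", ("1", "hello"))
db.close()

files = os.listdir(tmpdir)

# Opened on its own, the main file contains no index tables,
# so it can be copied or backed up without the index.
main_only = sqlite3.connect(main_path)
tables = [
    row[0]
    for row in main_only.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]
main_only.close()
```

Keeping the index in its own file means a back-up of `db.sqlite` does not need to include it, assuming the index can be rebuilt from the main data.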
diff --git a/src/reader/_types.py b/src/reader/_types.py
@@ -1277,9 +1277,11 @@ def changes(self) -> ChangeTrackerType:
 class ChangeTrackerType(Protocol):  # pragma: no cover
     """Storage API used to keep the full-text search index in sync.
 
+    ----
+
     The sync model works as follows.
 
-    Each resource to be indexed has sequence that changes
+    Each resource to be indexed has a sequence that changes
     every time its text content changes.
     The sequence can be a global counter, a random number,
     or a high-precision timestamp;
@@ -1305,20 +1307,23 @@ class ChangeTrackerType(Protocol):  # pragma: no cover
     Processed changes are marked as done,
     regardless of the action taken. Pseudocode::
 
-        def update(self):
-            while True:
-                changes = self.storage.changes.get()
-                if not changes:
-                    break
-                self._process_changes(changes)
-                self.storage.changes.done(changes)
+        while changes := self.storage.changes.get():
+            self._process_changes(changes)
+            self.storage.changes.done(changes)
 
     Enabling change tracking sets the sequence of all resources
     and adds matching :attr:`~Action.INSERT` changes
     to allow backfilling the search index.
     The sequence may be :const:`None` when change tracking is disabled.
-    There is no guarantee the sequence of a resource is the same
-    if change tracking is disabled and then enabled again.
+    There is no guarantee the sequence of a resource remains the same
+    when change tracking is disabled and then enabled again.
+
+    .. seealso::
+
+        The model was validated using property-based testing
+        in `this gist <https://gist.github.com/lemon24/558955ad82ba2e4f50c0184c630c668c>`_.
+
+    ----
 
     The entry sequence is exposed as :attr:`.Entry._sequence`,
     and should change when