Documentation touch-ups for #323.
lemon24 committed Mar 5, 2024
1 parent 6334473 commit a5dc31a
Showing 4 changed files with 42 additions and 21 deletions.
1 change: 1 addition & 0 deletions CHANGES.rst
@@ -25,6 +25,7 @@ Unreleased
* Add an internal :ref:`change tracking API <changes>`
to formalize how search keeps in sync with storage.
(:issue:`323`)
* Refactor storage internals. (:issue:`323`)


Version 3.11
4 changes: 3 additions & 1 deletion docs/dev.rst
@@ -57,7 +57,6 @@ but I will prioritize supporting :doc:`contributors <contributing>`

* :ref:`searchable tag values <searchable tags>`, e.g. for comments
* :ref:`unification with entry.read/important <entry flag unification>`
* filter entries by entry tags, :issue:`328`
* optimistic locking, :issue:`308`
* filter tags by prefix, :issue:`309`

@@ -467,6 +466,9 @@ From the initial issue:

Enabling search by default, and alternative search APIs: :issue:`252`.

Change tracking API: :issue:`323#issuecomment-1930756417`, model validated in
`this gist <https://gist.github.com/lemon24/558955ad82ba2e4f50c0184c630c668c>`_.

External resources:

* Comprehensive, although a bit old (2017): `What every software engineer should know about search <https://medium.com/startup-grind/what-every-software-engineer-should-know-about-search-27d1df99f80d>`_ (`full version <http://webcache.googleusercontent.com/search?q=cache:https://medium.com/startup-grind/what-every-software-engineer-should-know-about-search-27d1df99f80d&sca_esv=570067020&prmd=ivn&strip=1&vwsrc=0>`_)
33 changes: 23 additions & 10 deletions docs/guide.rst
@@ -127,6 +127,8 @@ since each connection would be to a *different* database::
reader.exceptions.StorageError: usage error: cannot use a private database from threads other than the creating thread


.. _backups:

Back-ups
~~~~~~~~

@@ -439,16 +441,27 @@ the entries that changed since the last call,
so it is OK to call it relatively often.


Because search adds minor overhead to other :class:`Reader` methods
and can almost double the size of the database,
it can be turned on/off through the
:meth:`~Reader.enable_search()` / :meth:`~Reader.disable_search()` methods.
This is persistent across instances using the same database,
and only needs to be done once.
You can also use the ``search_enabled`` :func:`make_reader` argument
for the same purpose.
By default, search is disabled,
and enabled automatically on the first :meth:`~Reader.update_search()` call.
Search can be turned on/off through the
:meth:`~Reader.enable_search()` / :meth:`~Reader.disable_search()` methods
(persistent across instances using the same database),
or the ``search_enabled`` argument of :func:`make_reader`;
by default, search is enabled automatically
on the first :meth:`~Reader.update_search()` call.
If search is enabled,
you should call :meth:`~Reader.update_search()` regularly
to prevent unprocessed changes from accumulating over time.


Because the search index can be almost as large as the main database,
the default implementation splits it into a separate, attached database,
which allows :ref:`backing up <backups>` the main database separately;
for a reader created with ``make_reader('db.sqlite')``,
the search index will be in ``db.sqlite.search``.


.. versionchanged:: 3.12
Split the full-text search index into a separate database.
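
The "separate, attached database" layout described above can be illustrated with SQLite's ``ATTACH DATABASE`` alone. The following is a minimal sketch using only the standard library; the ``open_with_search`` helper and the ``entries`` / ``entries_index`` table names are hypothetical and do not reflect the library's actual schema. Only the ``db.sqlite`` / ``db.sqlite.search`` file naming is taken from the text.

```python
import os
import sqlite3
import tempfile


def open_with_search(path):
    """Open the main database and attach a sibling '<path>.search' file.

    Hypothetical helper for illustration; not part of any library API.
    """
    db = sqlite3.connect(path)
    # ATTACH makes the second file addressable under the "search" schema name;
    # SQLite creates the file on first write if it does not exist yet.
    db.execute("ATTACH DATABASE ? AS search", (path + ".search",))
    return db


with tempfile.TemporaryDirectory() as tmp:
    main_path = os.path.join(tmp, "db.sqlite")
    db = open_with_search(main_path)

    # Hypothetical tables: one in the main file, one in the attached file.
    db.execute("CREATE TABLE main.entries (id TEXT PRIMARY KEY, title TEXT)")
    db.execute("CREATE TABLE search.entries_index (id TEXT, text TEXT)")
    db.execute("INSERT INTO main.entries VALUES ('1', 'Hello')")
    db.execute("INSERT INTO search.entries_index VALUES ('1', 'hello')")
    db.commit()

    # Schema names visible on this connection.
    attached = {row[1] for row in db.execute("PRAGMA database_list")}
    db.close()

    # The two schemas live in two separate files on disk.
    files = sorted(os.listdir(tmp))
```

Because the index lives in its own file, a copy of ``db.sqlite`` alone carries the main data, and the index file can presumably be rebuilt from it afterwards.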




25 changes: 15 additions & 10 deletions src/reader/_types.py
@@ -1277,9 +1277,11 @@ def changes(self) -> ChangeTrackerType:
class ChangeTrackerType(Protocol): # pragma: no cover
"""Storage API used to keep the full-text search index in sync.
----
The sync model works as follows.
Each resource to be indexed has sequence that changes
Each resource to be indexed has a sequence that changes
every time its text content changes.
The sequence can be a global counter, a random number,
or a high-precision timestamp;
@@ -1305,20 +1307,23 @@ class ChangeTrackerType(Protocol): # pragma: no cover
Processed changes are marked as done,
regardless of the action taken. Pseudocode::
def update(self):
while True:
changes = self.storage.changes.get()
if not changes:
break
self._process_changes(changes)
self.storage.changes.done(changes)
while changes := self.storage.changes.get():
self._process_changes(changes)
self.storage.changes.done(changes)
Enabling change tracking sets the sequence of all resources
and adds matching :attr:`~Action.INSERT` changes
to allow backfilling the search index.
The sequence may be :const:`None` when change tracking is disabled.
There is no guarantee the sequence of a resource is the same
if change tracking is disabled and then enabled again.
There is no guarantee the sequence of a resource remains the same
when change tracking is disabled and then enabled again.
.. seealso::
The model was validated using property-based testing
in `this gist <https://gist.github.com/lemon24/558955ad82ba2e4f50c0184c630c668c>`_.
----
The entry sequence is exposed as :attr:`.Entry._sequence`,
and should change when
Expand Down
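
The get / process / done loop from the docstring pseudocode can be exercised against a toy tracker. This is a sketch under stated assumptions: ``ToyChangeTracker``, ``Change``, ``record()`` and the ``limit`` parameter are hypothetical stand-ins invented for illustration, and only the ``get()`` / ``done()`` method names and the loop shape come from the docstring.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """A pending change: which resource changed, and its new sequence."""
    resource_id: str
    sequence: int


@dataclass
class ToyChangeTracker:
    """In-memory stand-in for ``storage.changes`` (hypothetical)."""
    pending: list = field(default_factory=list)
    next_sequence: int = 1

    def record(self, resource_id):
        # Here the sequence is a global counter; the docstring also allows
        # a random number or a high-precision timestamp.
        self.pending.append(Change(resource_id, self.next_sequence))
        self.next_sequence += 1

    def get(self, limit=2):
        # Return a batch of unprocessed changes; an empty list means done.
        return self.pending[:limit]

    def done(self, changes):
        # Mark the given changes as processed by dropping them.
        finished = set(changes)
        self.pending = [c for c in self.pending if c not in finished]


def update(tracker, index):
    # Same shape as the docstring pseudocode: loop until get() is empty.
    while changes := tracker.get():
        for change in changes:
            index[change.resource_id] = change.sequence
        tracker.done(changes)


tracker = ToyChangeTracker()
for rid in ["a", "b", "c"]:
    tracker.record(rid)

index = {}
update(tracker, index)
```

With a batch size of two, the loop runs twice before ``get()`` returns an empty list, after which every recorded resource is in ``index`` and nothing is left pending.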
