You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This document explains the filter optimization system built into Gramps 6.0 and describes how developers can extend it in new or existing filter rules.
Currently this isn't very well documented, and we could do more to make the contracts around selected_handles be more explicit. Ideas welcome.
When this document is complete, we should add it to a place for developers to have easy access.
I'd like to thank @stevenyoungs for help in designing, developing, and testing the system so far.
History: prepare() and apply_to_one()
The prepare() / apply_to_one() / reset() contract on Rule is not new to Gramps 6 (although we renamed apply to apply_to_one). Many rules have always used prepare() to do expensive work once (graph traversals, bookmark lookups, etc.) and then answered apply_to_one() with a cheap set-membership check.
What Gramps 6 did was standardize the attribute name. Previously each rule used its own internal variable name for the precomputed set. Gramps 6 requires that every rule which does this stores its result in an attribute called selected_handles:
That naming convention is what makes the optimizer possible: any code can inspect a rule and ask "does this rule have a selected_handles?" without knowing anything else about the rule.
The Problem the Optimizer Solves
Even with selected_handles populated by prepare(), the old filter engine still looped over every object in the database:
# old behaviour — still iterates the whole databaseforhandleindb.get_person_handles():
person=db.get_person_from_handle(handle)
ifrule.apply_to_one(db, person): # → handle in selected_handlesfinal_list.append(handle)
That loop is O(N) in the total number of people, even when only 3 of them match. The per-object get_person_from_handle() call means loading every person from storage.
The Optimizer: Turning Filtering Inside Out
The Optimizer class (optimizer.py) runs after all prepare() calls but before the per-object loop. It inspects every rule in the filter:
If a rule has selected_handles, those handles are collected.
If a rule has find_filter() (it embeds another named filter), the optimizer
recurses into that filter's rules.
It then combines the per-rule sets according to the filter's logical operator:
Logical op
What the optimizer does
and
Intersects all selected_handles sets across rules
or
Cannot intersect; optimizer does not prune
The intersection — called handles_in — is the set of handles that could possibly match the filter. The per-object loop is then gated on this set:
forhandleinpossible_handles:
ifhandles_inisnotNoneandhandlenotinhandles_in:
continue# skip: cannot match any AND ruleobj=self.get_object(db, handle)
ifapply_logical_op(db, obj, self.flist) !=self.invert:
final_list.append(obj.handle)
For a filter with one rule and a small selected_handles, this can reduce 10k get_person_from_handle()` calls to 3.
When every rule in a filter has selected_handles — i.e. there are no unoptimized rules — the intersected handles_in set is already the exact answer. There is no need to call apply_to_one() at all. PR #2160 detects this case and returns handles_in directly without entering the loop.
This is the key insight: once all prepare() calls finish, the filter is done. The set operations are the filter. No objects need to be loaded from storage; no apply_to_one() calls are made.
Rules that Embed Another Filter: find_filter()
Some rules delegate to a user-defined custom filter by name (e.g. "People matching filter <name>"). These implement find_filter() to return the embedded GenericFilter:
The optimizer calls find_filter() when a rule lacks selected_handles, then recurses into the returned filter to collect its rules' selected_handles sets. A "Matches filter" rule transparently benefits from any selected_handles optimization in the embedded filter's rules — no extra work needed.
Using filter.apply() instead of apply_to_one() inside rules (PR #2153)
Rules that use another filter as a source (e.g. "descendants of people matching filter X") previously called apply_to_one() per object, bypassing the optimizer entirely. PR #2153 changes these to call filter.apply(db) instead. apply() runs the full optimizer pipeline and returns matching handles directly; only those handles are then used for the outer traversal.
Direct SQL + the Optimizer: The Full Optimization (PR #2177)
The optimizer removes the Python loop. But prepare() still does its work in Python — walking the family graph, loading objects one at a time from the database. PR #2177 adds register_rule_override() to let a database backend replace that Python traversal with a single native query.
Registering an override
A database subclass calls register_rule_override() to associate a rule with an override class:
The override populates selected_handles from one SQL query rather than a Python loop. The result is the same attribute; the optimizer sees and uses it identically.
The full pipeline
The combination of all three layers reduces filtering to:
prepare() via SQL override — one indexed query per rule, no Python object loading.
Optimizer intersects the sets — pure Python set operations on the returned handles.
For a SQL-backed database, a complex multi-rule AND filter can reduce entirely to a handful of SQL queries and a set intersection — no objects loaded, no Python loop, no apply_to_one() calls.
Override mechanics
The key is (namespace, rule.name) — matching the rule's class.
The original rule method is passed as original so overrides can delegate: original(self.rule, db, user).
Overrides are bypassed automatically when db.is_proxy() is True (Living and Privacy proxies), so privacy logic is never skipped.
RuleOverride provides no-op defaults for both methods; subclasses only override what they accelerate.
Apply time — time in the main loop (zero if the loop was skipped).
Starting possible_handles — handle count before optimizer pruning.
If Apply time dominates, a rule in the filter lacks selected_handles and is forcing the full loop. Adding selected_handles to that rule (or a SQL override for it) is the highest-leverage change.
NOTE: there is also a relationship with DataDict, but I'll describe that later.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
This document explains the filter optimization system built into Gramps 6.0 and describes how developers can extend it in new or existing filter rules.
Currently this isn't very well documented, and we could do more to make the contracts around
selected_handlesbe more explicit. Ideas welcome.When this document is complete, we should add it to a place for developers to have easy access.
I'd like to thank @stevenyoungs for help in designing, developing, and testing the system so far.
History:
prepare()andapply_to_one()The
prepare()/apply_to_one()/reset()contract onRuleis not new to Gramps 6 (although we renamedapplytoapply_to_one). Many rules have always usedprepare()to do expensive work once (graph traversals, bookmark lookups, etc.) and then answeredapply_to_one()with a cheap set-membership check.What Gramps 6 did was standardize the attribute name. Previously each rule used its own internal variable name for the precomputed set. Gramps 6 requires that every rule which does this stores its result in an attribute called
selected_handles:That naming convention is what makes the optimizer possible: any code can inspect a rule and ask "does this rule have a
selected_handles?" without knowing anything else about the rule.The Problem the Optimizer Solves
Even with
selected_handlespopulated byprepare(), the old filter engine still looped over every object in the database:That loop is O(N) in the total number of people, even when only 3 of them match. The per-object
get_person_from_handle()call means loading every person from storage.The Optimizer: Turning Filtering Inside Out
The
Optimizerclass (optimizer.py) runs after allprepare()calls but before the per-object loop. It inspects every rule in the filter:selected_handles, those handles are collected.find_filter()(it embeds another named filter), the optimizerrecurses into that filter's rules.
It then combines the per-rule sets according to the filter's logical operator:
andselected_handlessets across rulesorThe intersection — called
handles_in— is the set of handles that could possibly match the filter. The per-object loop is then gated on this set:For a filter with one rule and a small
selected_handles, this can reduce 10k get_person_from_handle()` calls to 3.Skipping the loop entirely (PR #2160)
When every rule in a filter has
selected_handles— i.e. there are no unoptimized rules — the intersectedhandles_inset is already the exact answer. There is no need to callapply_to_one()at all. PR #2160 detects this case and returnshandles_indirectly without entering the loop.This is the key insight: once all
prepare()calls finish, the filter is done. The set operations are the filter. No objects need to be loaded from storage; noapply_to_one()calls are made.Rules that Embed Another Filter:
find_filter()Some rules delegate to a user-defined custom filter by name (e.g. "People matching filter <name>"). These implement
find_filter()to return the embeddedGenericFilter:The optimizer calls
find_filter()when a rule lacksselected_handles, then recurses into the returned filter to collect its rules'selected_handlessets. A "Matches filter" rule transparently benefits from anyselected_handlesoptimization in the embedded filter's rules — no extra work needed.Using
filter.apply()instead ofapply_to_one()inside rules (PR #2153)Rules that use another filter as a source (e.g. "descendants of people matching filter X") previously called
apply_to_one()per object, bypassing the optimizer entirely. PR #2153 changes these to callfilter.apply(db)instead.apply()runs the full optimizer pipeline and returns matching handles directly; only those handles are then used for the outer traversal.Direct SQL + the Optimizer: The Full Optimization (PR #2177)
The optimizer removes the Python loop. But
prepare()still does its work in Python — walking the family graph, loading objects one at a time from the database. PR #2177 addsregister_rule_override()to let a database backend replace that Python traversal with a single native query.Registering an override
A database subclass calls
register_rule_override()to associate a rule with an override class:The override populates
selected_handlesfrom one SQL query rather than a Python loop. The result is the same attribute; the optimizer sees and uses it identically.The full pipeline
The combination of all three layers reduces filtering to:
prepare()via SQL override — one indexed query per rule, no Python object loading.For a SQL-backed database, a complex multi-rule AND filter can reduce entirely to a handful of SQL queries and a set intersection — no objects loaded, no Python loop, no
apply_to_one()calls.Override mechanics
(namespace, rule.name)— matching the rule's class.originalso overrides can delegate:original(self.rule, db, user).db.is_proxy()isTrue(Living and Privacy proxies), so privacy logic is never skipped.RuleOverrideprovides no-op defaults for both methods; subclasses only override what they accelerate.Adding
selected_handlesto a New RuleWhen to add it:
prepare()without inspecting every object (graph traversal from a known root, bookmark list, ID lookup).When NOT to add it:
Rules Currently Implementing
selected_handlesGeneric base
rules/_hasgrampsid.pyPerson rules
person/_hasidof.pyperson/_isdefaultperson.pyperson/_isbookmarked.pyperson/_isancestorof.pyperson/_isdescendantof.pyperson/_isdescendantfamilyof.pyperson/_isrelatedwith.pyperson/_isancestoroffiltermatch.pyperson/_isdescendantoffiltermatch.pyperson/_ischildoffiltermatch.pyperson/_isparentoffiltermatch.pyperson/_issiblingoffiltermatch.pyperson/_isdescendantfamilyoffiltermatch.pyperson/_isduplicatedancestorof.pyperson/_relationshippathbetween.pyperson/_relationshippathbetweenbookmarks.pyperson/_deeprelationshippathbetween.pyperson/_islessthannthgenerationancestorof.pyperson/_ismorethannthgenerationancestorof.pyperson/_islessthannthgenerationdescendantof.pyperson/_ismorethannthgenerationdescendantof.pyperson/_islessthannthgenerationancestorofbookmarked.pyperson/_islessthannthgenerationancestorofdefaultperson.pyFamily rules
family/_hasidof.pyfamily/_isbookmarked.pyfamily/_isancestorof.pyfamily/_isdescendantof.pyOther object types
citation/_hasidof.pyevent/_hasidof.pymedia/_hasidof.pynote/_hasidof.pyplace/_hasidof.pyrepository/_hasidof.pysource/_hasidof.pyDebugging
The filter system logs timing at the
DEBUGlevel:Each
apply()call logs:Prepare time— time in allrule.prepare()calls.Apply time— time in the main loop (zero if the loop was skipped).Starting possible_handles— handle count before optimizer pruning.If
Apply timedominates, a rule in the filter lacksselected_handlesand is forcing the full loop. Addingselected_handlesto that rule (or a SQL override for it) is the highest-leverage change.NOTE: there is also a relationship with
DataDict, but I'll describe that later.NOTE: future work is also described in #2280
Beta Was this translation helpful? Give feedback.
All reactions