Two data-quality checks for Verify the Data: undated duplicate siblings & endogamy-aware same-surname parents #2352
Replies: 1 comment
I'd be interested if such self-contained sanity checks could be made re-usable in custom testing jigs... similarly to the way 'Rules' are reusable in the custom filter layering frameworks. Currently, Verify the Data and the What's Next? gramplet are data testing jigs that allow selected sanity checks to be applied to different scopes of Gramps data. Verify the Data checks the ENTIRE tree and returns ALL alerts. What's Next? checks an expanding bubble of people (starting from the Home Person). The "degrees of separation" expansion continues until a sufficient number of alerts are found, and there are ways to mark persons to skip testing. It would be helpful if such tools could share each other's sanity checks, and if new test jigs could be created... such as a jig to select specific sanity check(s) to perform on different scope(s). As an example, my research plan for a day might be to confirm that a research attempt has been logged for siblings of direct ancestors where no relationship and no offspring have been recorded in the Tree. (Those who are not marked with a Number of Marriages 'Other' event or whatever "End of Line" marker you use.) So... how could Sanity Checks be made modular and reusable?
While building a small open-source auditor for Gramps Web trees (gramps-tree-audit, GPL-2.0+), two of its checks surfaced problems that Verify the Data doesn't currently catch. Sharing them here in case they're worth adding to the desktop tool (plugins/tool/verify.py); both are cheap, deterministic, and found real issues on my own tree.
1. Undated duplicate siblings
Duplicate-person detection that keys on a birth year can't see two bare child records of the same family: same given name, no dates on either. Example from a real tree: a family with both James W. and James "Limber Jim" as separate children, neither with a birth date. Obvious to a human reading the sibling list; invisible to any year-based match.
Proposed check: within a family, flag two or more children whose given name is equal after normalization (strip quoted nicknames and trailing initials, so James W. → james and James "Limber Jim" → james) when neither child has a birth date. Deliberately narrow (both-undated only) so it doesn't overlap a date-based duplicate check or fire on genuinely distinct same-named siblings.
On one tree this flagged 3 real duplicate pairs, two of which I'd never have thought to look for, including one in an unrelated branch.
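A rough sketch of the check described above, in plain Python rather than against the Gramps database API. The tuple layout `(child_id, given_name, birth_date)` and the function names `normalize_given` and `undated_duplicate_siblings` are assumptions made for illustration; they are not taken from verify.py or from the auditor.

```python
import re
from collections import defaultdict

# Quoted nicknames, e.g. "Limber Jim" (straight or curly double quotes).
_QUOTED = re.compile(r'"[^"]*"|“[^”]*”')
# One or more trailing single-letter initials, e.g. " W." or " W. H".
_TRAILING_INITIALS = re.compile(r"(?:\s+[A-Za-z]\.?)+$")


def normalize_given(name):
    """Strip quoted nicknames and trailing initials, then casefold."""
    name = _QUOTED.sub(" ", name or "")
    name = " ".join(name.split())          # collapse leftover whitespace
    name = _TRAILING_INITIALS.sub("", name)
    return name.casefold()


def undated_duplicate_siblings(children):
    """Group undated children of ONE family by normalized given name.

    `children` is a hypothetical list of (child_id, given_name, birth_date)
    tuples, where birth_date is None or empty when no birth is recorded.
    Returns {normalized_name: [child_id, ...]} for groups of two or more.
    """
    groups = defaultdict(list)
    for child_id, given, birth_date in children:
        if birth_date:
            continue                        # dated children are out of scope
        key = normalize_given(given)
        if key:                             # ignore children with no usable name
            groups[key].append(child_id)
    return {name: ids for name, ids in groups.items() if len(ids) >= 2}


family = [
    ("I0001", "James W.", None),
    ("I0002", 'James "Limber Jim"', None),
    ("I0003", "Mary", None),
    ("I0004", "James", "1850"),             # dated, so never flagged here
]
print(undated_duplicate_siblings(family))   # {'james': ['I0001', 'I0002']}
```

Skipping dated children before grouping is what keeps the check narrow: a dated James and an undated James are left for a date-based duplicate check to handle.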
2. Endogamy-aware "same surname as spouse"
Flagging a parent whose recorded surname equals their spouse's is useful — it's often a married name entered in place of a maiden name. But in endogamous communities (e.g. dense-surname Appalachian families), it's frequently a genuine cousin marriage, where the surname is correct.
So if Gramps surfaces this, it's worth framing as "needs judgment," not "restore the maiden name." On my tree, of the flagged cases, one was a real data-entry error (the maiden name was recoverable) and one was a real 2nd-cousin marriage (the wife was born with that surname). A blanket "fix" would have corrupted the correct record.
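One way to encode the "needs judgment" framing is to have the check return a review-level finding with both explanations in the message, and never a suggested fix. The sketch below assumes a hypothetical `Finding` record and a `same_surname_parents` function that takes surnames as plain strings; neither is an existing Gramps or Verify the Data interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    family_id: str
    surname: str
    severity: str   # "review" means a human decides; nothing is auto-fixed
    message: str


def same_surname_parents(family_id, father_surname, mother_surname) -> Optional[Finding]:
    """Flag a family whose parents share a recorded surname, for review only."""
    a = (father_surname or "").strip().casefold()
    b = (mother_surname or "").strip().casefold()
    if not a or a != b:
        return None                          # missing or differing surnames
    return Finding(
        family_id=family_id,
        surname=(father_surname or "").strip(),
        severity="review",
        message=(
            "Both parents share a recorded surname. This may be a married "
            "name entered in place of a maiden name, or a genuine "
            "same-surname (e.g. cousin) marriage. Check a birth record "
            "before changing anything."
        ),
    )


finding = same_surname_parents("F0001", "Hatfield", "Hatfield")
print(finding.severity)                      # review
print(same_surname_parents("F0002", "Smith", "Jones"))   # None
```

The finding deliberately carries no "proposed maiden name" field, so a caller cannot apply a blanket correction from the result alone.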
Both checks are small and self-contained. If there's interest I'm happy to put together a patch for Verify the Data. The reference implementations (and the rest of the auditor) are in the linked repo: read-only, never modifies the tree, and every finding is re-verifiable from a fresh snapshot.