Match Ancestry.com source by _APID, not SOUR #324

Open
Sternbach-Software opened this issue Oct 31, 2022 · 6 comments
@Sternbach-Software

Sternbach-Software commented Oct 31, 2022

Running gedcom diff -hide-equal on two Ancestry GEDCOM files reports the same profiles, with the same sources, as 99.58% similar but not equal. The difference is that their sources have different SOUR tags but the same _APID tags (_APID is the UID of an Ancestry source). Is there a way to check source equality by _APID? If not from the CLI, where in the code would I do this? All I want is that two sources that are the same don't appear in the HTML diff output (and that, if they were the only difference, the individual is not included in the diff at all).

@Sternbach-Software
Author

Sternbach-Software commented Oct 31, 2022

Maybe with this SO about cmp.equals() for structs?

@Sternbach-Software
Author

Looks like this is the function: func (node *IndividualNode) Similarity(other *IndividualNode, options SimilarityOptions) float64 {}. I'm not sure we want to change the similarity calculation, though, because it checks whether two individuals are the same person. What I actually want is to exclude individuals from the final HTML when they match in everything except sources (with source equality measured solely by _APID). Changing the similarity would still have the desired effect.

@Sternbach-Software
Author

Sternbach-Software commented Oct 31, 2022

Tag.Is(Tag) is here, but it doesn't look like that is used to exclude it from the diff.

Sternbach-Software added a commit to Sternbach-Software/gedcom that referenced this issue Nov 1, 2022
@Sternbach-Software
Author

Found where the code should go, in SimpleNode.Equals(). You may want to add a command line param to enable or disable this.
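A rough sketch of the idea, not the actual change: the Node type and the sourcesEqual function below are hypothetical stand-ins and are not part of the gedcom library. The sketch assumes _APID appears as a direct child of the SOUR node, and the APID values are made up for illustration. The matchByAPID parameter plays the role of the suggested command line switch.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a GEDCOM node.
// It is NOT the node type used by the gedcom library.
type Node struct {
	Tag      string
	Value    string
	Children []*Node
}

// firstChildValue returns the value of the first direct child with the
// given tag, and whether such a child exists.
func firstChildValue(n *Node, tag string) (string, bool) {
	if n == nil {
		return "", false
	}
	for _, c := range n.Children {
		if c != nil && c.Tag == tag {
			return c.Value, true
		}
	}
	return "", false
}

// sourcesEqual compares two nodes by tag and value. When matchByAPID is
// true and both nodes are SOUR nodes that carry an _APID child, the
// _APID values decide equality and the SOUR values are ignored.
// Assumption: _APID is a direct child of SOUR.
func sourcesEqual(a, b *Node, matchByAPID bool) bool {
	if a == nil || b == nil {
		return a == b
	}
	if a.Tag != b.Tag {
		return false
	}
	if matchByAPID && a.Tag == "SOUR" {
		idA, okA := firstChildValue(a, "_APID")
		idB, okB := firstChildValue(b, "_APID")
		if okA && okB {
			return idA == idB
		}
	}
	// Fall back to the plain value comparison.
	return a.Value == b.Value
}

func main() {
	// Illustrative values only; the pointers differ, the _APID is shared.
	a := &Node{Tag: "SOUR", Value: "@S100@", Children: []*Node{{Tag: "_APID", Value: "1,1234::5678"}}}
	b := &Node{Tag: "SOUR", Value: "@S200@", Children: []*Node{{Tag: "_APID", Value: "1,1234::5678"}}}

	fmt.Println(sourcesEqual(a, b, true))  // true: same _APID
	fmt.Println(sourcesEqual(a, b, false)) // false: different SOUR values
}
```

Sources without an _APID fall back to the plain value comparison, so non-Ancestry files would behave as before.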

@Sternbach-Software
Author

Sternbach-Software commented Nov 9, 2022

And how would you address the difference in how Ancestry and Geni output places? These two should be equivalent, not just similar (I even want to keep matches that are 99.99% similar, but not this one).

Address Flushing
City Flushing
State New York
Continued New York United States of America
Country United States of America
Place Flushing, Queens County, New York, United States of America

@elliotchance
Owner

For cases like this, we need to add some custom comparison for PlaceNode (https://github.com/elliotchance/gedcom/blob/master/place_node.go). I can't remember how I implemented the similarity off the top of my head, but there should probably be an interface that nodes can implement if they want custom similarity logic.

It might make the most sense to add a String method that reduces a place to a single-line string, then perform the comparison on the strings. So, in this case the similarity would be between:

Flushing, New York New York United States of America, United States of America
Flushing, Queens County, New York, United States of America

These are still not exactly equal, but they are close enough to give a high similarity number that should be over the "equals" threshold.
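To make the flatten-then-compare idea concrete, here is a minimal sketch. The flattenPlace and placeSimilarity functions are hypothetical and not part of the gedcom library, and the measure used (Jaccard similarity over lowercased word sets) is a stand-in chosen for brevity, not the similarity the library itself computes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// flattenPlace joins the non-empty parts of a place into one line.
// Hypothetical helper; the real PlaceNode may expose its parts differently.
func flattenPlace(parts ...string) string {
	kept := make([]string, 0, len(parts))
	for _, p := range parts {
		if p = strings.TrimSpace(p); p != "" {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, ", ")
}

// tokenSet lowercases s and splits it on anything that is not a letter
// or digit, returning the set of distinct words.
func tokenSet(s string) map[string]struct{} {
	words := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	set := make(map[string]struct{}, len(words))
	for _, w := range words {
		set[w] = struct{}{}
	}
	return set
}

// placeSimilarity returns the Jaccard similarity (shared words divided
// by all distinct words) of two flattened places, in the range 0 to 1.
func placeSimilarity(a, b string) float64 {
	setA, setB := tokenSet(a), tokenSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1 // two empty places are treated as identical
	}
	shared := 0
	for w := range setA {
		if _, ok := setB[w]; ok {
			shared++
		}
	}
	union := len(setA) + len(setB) - shared
	return float64(shared) / float64(union)
}

func main() {
	structured := flattenPlace(
		"Flushing",
		"New York",
		"New York United States of America",
		"United States of America",
	)
	single := "Flushing, Queens County, New York, United States of America"

	fmt.Printf("%.2f\n", placeSimilarity(structured, single)) // 0.78
}
```

With this measure the two places share 7 of 9 distinct words, about 0.78. Whether that clears the "equals" threshold depends on the threshold and on the similarity measure actually used, so a word-set comparison like this one may still need tuning (for example, ignoring "Queens County" style subdivisions).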

2 participants