<a href="https://colab.research.google.com/github/ronyates47/Gedcom-Utils/blob/main/GOLD_20250426.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install pandas
!pip install python-gedcom
!pip install openpyxl
!pip install xlsxwriter
!pip install mlxtend


Collecting python-gedcom
  Downloading python_gedcom-1.0.0-py2.py3-none-any.whl.metadata (15 kB)
Downloading python_gedcom-1.0.0-py2.py3-none-any.whl (35 kB)
Installing collected packages: python-gedcom
Successfully installed python-gedcom-1.0.0
Collecting xlsxwriter
  Downloading XlsxWriter-3.2.3-py3-none-any.whl.metadata (2.7 kB)
Downloading XlsxWriter-3.2.3-py3-none-any.whl (169 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 169.4/169.4 kB 3.3 MB/s eta 0:00:00
Installing collected packages: xlsxwriter
Successfully installed xlsxwriter-3.2.3
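
The script in the next cell reads each match's details from the GEDCOM `NPFX` field, which its docstrings describe as `cM&sortVal**ydnaVal`. The sketch below shows one way to split that format in a single pass. The helper name `parse_npfx` and the sample strings are illustrative assumptions, not part of the notebook, and edge cases may differ slightly from the three separate getters in the script.

```python
def parse_npfx(npfx):
    """Split an NPFX string shaped like 'cM&sortVal**ydnaVal' (hypothetical helper)."""
    # Everything after the first '**' is treated as the Y-DNA value.
    head, _, ydna = npfx.partition("**")
    # Everything before the first '&' is the cM value; the rest is the sort key.
    cm, _, sort_val = head.partition("&")
    cm = cm.strip()
    try:
        int(cm)          # keep cM only when it parses as an integer
    except ValueError:
        cm = ""
    return {"cm": cm, "sort": sort_val.strip(), "ydna": ydna.strip()}


full = parse_npfx("175&someSort**someYDNA")
no_ydna = parse_npfx("20&fridine")
unparsable = parse_npfx("abc")
print(full)
print(no_ydna)
print(unparsable)
```

A field without `**` yields an empty Y-DNA value, and a cM part that is not an integer comes back blank, which mirrors the behaviour described in the docstrings.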


In [7]:
# 04_18_2025_1500

#!/usr/bin/env python
"""
GEDCOM Composite Score Script using:
 - Chunk-based Parallel Processing for Speed (Stage 1: genealogical line creation)
 - A Trie-based approach, then final "Value" = 5 * (number of couples with node.count >=2) + (total couples)

For ancestral lines where none of the couples are repeated (a one-off line), the Value is still computed.
Now, instead of composite scoring, two new columns are added:
  - Value Range (the numeric bracket)
  - Value Label (a descriptive label)

Exports final CSV/HTML sorted by "Yates DNA Ancestral Line".
"""

import csv
import glob
import logging
import functools
import os
from datetime import datetime
from collections import defaultdict, Counter
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor
from tqdm import tqdm

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

###############################################################################
# Global Variables
###############################################################################
anchor_gen1 = None
visited_pairs = set()
generation_table = []

###############################################################################
# Trie Data Structure
###############################################################################
class TrieNode:
    """A simple Trie node for storing a couple and counting how many lines pass here."""
    def __init__(self):
        self.count = 0
        self.children = {}  # maps couple string -> TrieNode

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert_line(self, couples_list):
        """
        Insert a reversed line (list of couples) into the trie.
        Increment .count for each node visited.
        """
        current = self.root
        for couple in couples_list:
            if couple not in current.children:
                current.children[couple] = TrieNode()
            current = current.children[couple]
            current.count += 1

    def get_couple_count(self, couples_list):
        """
        For each couple in this line, retrieve the node.count if it exists.
        Returns a list of node.count values, in order.
        """
        counts = []
        current = self.root
        for couple in couples_list:
            if couple in current.children:
                current = current.children[couple]
                counts.append(current.count)
            else:
                counts.append(0)
                break
        return counts

###############################################################################
# Utility: chunk generator
###############################################################################
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

###############################################################################
# GedcomDataset
###############################################################################
class GedcomDataset:
    def __init__(self, gen_person):
        self.gen_person = gen_person
        self.extractable_detail = {}
        self.anchor_gen1 = None

    def add_extractable_detail(self, key, value):
        self.extractable_detail[key] = value

    def get_gen_person(self):
        name = self.extractable_detail.get('NAME', '')
        parts = name.split('/', 1)
        first_name = parts[0].split(' ')[0]
        last_name = parts[1].rstrip('/') if len(parts) > 1 else ""
        self.anchor_gen1 = last_name.replace(" ", "") + first_name.replace(" ", "")
        global anchor_gen1
        anchor_gen1 = self.anchor_gen1
        return self.gen_person.strip('@')

    def get_anchor_gen1(self):
        return self.anchor_gen1

    def get_extractable_NPFX(self):
        return self.extractable_detail.get('NPFX', '')

    def get_extractable_cm(self):
        """
        Extract cM from NPFX field. If NPFX has a format like "175&someSort**someYDNA",
        the cM is '175'. If it doesn't parse cleanly, returns blank.
        """
        npfx_value = self.extractable_detail.get('NPFX', '')
        if '&' in npfx_value:
            cm_value = npfx_value.split('&')[0].strip()
        elif '**' in npfx_value:
            cm_value = npfx_value.split('**')[0].strip()
        else:
            cm_value = npfx_value.strip()
        try:
            int(cm_value)
            return cm_value
        except ValueError:
            return ''

    def get_extractable_sort(self):
        """
        If NPFX has "xxx&sortVal**ydnaVal", returns sortVal. If not found, blank.
        """
        npfx_value = self.extractable_detail.get('NPFX', '')
        if '&' in npfx_value:
            sort_part = npfx_value.split('&')[1]
            if '**' in sort_part:
                sort_value = sort_part.split('**')[0].strip()
            else:
                sort_value = sort_part.strip()
            return sort_value
        return ''

    def get_extractable_YDNA(self):
        """
        If NPFX has something like "...**ydnaVal", return ydnaVal. If not found, blank.
        """
        npfx_value = self.extractable_detail.get('NPFX', '')
        if '**' in npfx_value:
            ydna_value = npfx_value.split('**')[1].strip()
            return ydna_value
        return ''

    def get_extractable_FAMC(self):
        return self.extractable_detail.get('FAMC', '').strip('@')

###############################################################################
# Gedcom Class
###############################################################################
class Gedcom:
    def __init__(self, file_name):
        self.file_name = file_name
        self.gedcom_datasets = []
        self.filter_pool = []

    def parse_gedcom(self):
        with open(self.file_name, 'r', encoding='utf-8-sig') as f:
            lines = f.readlines()

        current_dataset = None
        npfx_count = 0
        ydna_count = 0
        total_count = 0

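        for line in lines:
            parts = line.strip().split(' ', 2)
            # Skip blank or malformed lines rather than crashing on int()/indexing
            if len(parts) < 2 or not parts[0].isdigit():
                continue
            level = int(parts[0])
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ''

            if level == 0 and tag.startswith('@') and tag.endswith('@') and value == 'INDI':
                total_count += 1
                current_dataset = GedcomDataset(tag)
                self.gedcom_datasets.append(current_dataset)
            elif level == 0:
                # Any other top-level record (FAM, SUBM, ...) ends the current individual
                current_dataset = None
            elif current_dataset is not None: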
                if level == 1 and tag in ['NAME', 'FAMC']:
                    current_dataset.add_extractable_detail(tag, value)
                elif level == 2 and tag == 'NPFX':
                    npfx_count += 1
                    current_dataset.add_extractable_detail(tag, value)
                    if '**' in value:
                        ydna_count += 1

        autosomal_count = npfx_count - ydna_count
        print(f"GEDCOM contained {total_count} total records")
        print(f"Records tagged and filtered by NPFX: {npfx_count}")
        print(f"Records with YDNA information: {ydna_count}")
        print(f"Autosomal matches: {autosomal_count}")

        for ds in self.gedcom_datasets:
            if ds.get_extractable_NPFX():
                self.filter_pool.append(ds)

        # Optional second-level filter
        manual_filter_activated = True
        if manual_filter_activated:
            try:
                df = pd.read_excel('filtered_ids.xlsx')
            except FileNotFoundError:
                logger.warning("filtered_ids.xlsx not found. Skipping second-level manual filter.")
            else:
                manual_filtered_ids = set(df['ID'])
                self.filter_pool = [
                    d for d in self.filter_pool if d.get_gen_person() in manual_filtered_ids
                ]
                print(f"After manual filter, total records: {len(self.filter_pool)}")
                logger.info(f"After manual filter, total records: {len(self.filter_pool)}")

        return autosomal_count

###############################################################################
# quick_extract_name
###############################################################################
def quick_extract_name(full_text):
    """
    Minimal function to extract a short name from a GEDCOM chunk.
    """
    name_marker = "\n1 NAME "
    idx = full_text.find(name_marker)
    if idx == -1:
        if full_text.startswith("1 NAME "):
            # No leading newline in this case, so the value starts after "1 NAME "
            start = len("1 NAME ")
        else:
            return "UnknownName"
    else:
        start = idx + len(name_marker)
    end = full_text.find('\n', start)
    if end == -1:
        end = len(full_text)
    name_line = full_text[start:end].strip()
    if '/' not in name_line:
        return name_line[:10].replace(" ", "")
    first_name, last_name = name_line.split('/', 1)
    last_name = last_name.replace("/", "").strip()
    return last_name[:10].replace(" ", "") + first_name[:10].replace(" ", "")

###############################################################################
# Parents, Ancestors
###############################################################################
def find_parents(individual_id, generation, parents_map):
    global visited_pairs, generation_table
    if individual_id not in parents_map:
        return
    father_id, mother_id = parents_map[individual_id]
    if not father_id and not mother_id:
        return
    pair = (father_id, mother_id)
    if pair not in visited_pairs:
        visited_pairs.add(pair)
        generation_table.append((generation, pair))
    if father_id:
        find_parents(father_id, generation+1, parents_map)
    if mother_id:
        find_parents(mother_id, generation+1, parents_map)

def find_distant_ancestors(individual_id, parents_map, path=None):
    if path is None:
        path = []
    path.append(individual_id)
    if individual_id not in parents_map:
        return [path]
    father_id, mother_id = parents_map[individual_id]
    if not father_id and not mother_id:
        return [path]
    paths = []
    if father_id:
        paths.extend(find_distant_ancestors(father_id, parents_map, path[:]))
    if mother_id:
        paths.extend(find_distant_ancestors(mother_id, parents_map, path[:]))
    return paths if paths else [path]

###############################################################################
# filter_ancestral_line
###############################################################################
def filter_ancestral_line(winning_path_ids, generation_table_local, names_map):
    matching_table = []
    for generation, pair in generation_table_local:
        id1, id2 = pair
        if id1 in winning_path_ids or id2 in winning_path_ids:
            matching_table.append((generation, pair))
    matching_table.sort(key=lambda x: x[0])
    lines = []
    for gen, pair in matching_table:
        name_pair = [names_map.get(pid, "UnknownName") for pid in pair]
        lines.append(f"{name_pair[0]}&{name_pair[1]}")
    lines.reverse()
    return "~~~".join(lines)

###############################################################################
# process_record_wrapper (parallel) - STAGE 1
###############################################################################
def process_record_wrapper(individual_id, gedcom_instance, parents_map, names_map):
    """
    This is the function used in parallel for 'Processing individuals' stage.
    It gathers and builds the 'Yates DNA Ancestral Line' for each ID.
    """
    global generation_table, visited_pairs, anchor_gen1
    generation_table = []
    visited_pairs = set()

    find_parents(individual_id, 1, parents_map)
    distant_anc_paths = find_distant_ancestors(individual_id, parents_map)

    best_score = None
    best_path = None
    for path in distant_anc_paths:
        name_path = [names_map.get(pid, "UnknownName") for pid in path]
        score = 0
        for idx, nm in enumerate(name_path):
            if 'Yates' in nm:
                score += (idx + 1)
        if best_score is None or score > best_score:
            best_score = score
            best_path = path

    if not best_path:
        best_path = []

    # remove individual's own ID
    best_path_cleaned = [pid for pid in best_path if pid != individual_id]

    line_str = filter_ancestral_line(set(best_path_cleaned), generation_table, names_map)

    cm_value = ''
    sort_value = ''
    ydna_value = ''
    anchor_name = ''
    for ds in gedcom_instance.filter_pool:
        if ds.get_gen_person() == individual_id:
            cm_value = ds.get_extractable_cm()
            sort_value = ds.get_extractable_sort()
            ydna_value = ds.get_extractable_YDNA()
            anchor_name = ds.get_anchor_gen1()
            break

    short_name = names_map.get(individual_id, "UnknownName")
    # Return columns: ID#, Match to, Name, cM, Yates DNA Ancestral Line
    return [individual_id, sort_value, short_name, cm_value, line_str]

###############################################################################
# main()
###############################################################################
def main():
    def select_gedcom():
        files = glob.glob("*.ged")
        if not files:
            print("No GEDCOM files found.")
            return None
        print("Automatically selecting the first GEDCOM file.")
        return files[0]

    gedcom_file_path = select_gedcom()
    if not gedcom_file_path:
        print("No GEDCOM file selected; exiting.")
        return

    # 1) Parse GEDCOM and capture autosomal_count
    ged = Gedcom(gedcom_file_path)
    autosomal_count = ged.parse_gedcom()  # <-- autosomal_count returned now
    filter_count = len(ged.filter_pool)

    with open("autosomal_count.txt", "w") as f:
        f.write(str(autosomal_count))

    print("Records tagged and filtered by NPFX:", filter_count)

    # 2) Build parents_map, names_map from raw GEDCOM
    # Use utf-8-sig (as in parse_gedcom) so a leading BOM is stripped consistently
    with open(gedcom_file_path, 'r', encoding='utf-8-sig') as f:
        raw_data = f.read()

    blocks = raw_data.split('\n0 ')
    all_records = {}
    for blk in blocks:
        blk = blk.strip()
        if not blk:
            continue
        flend = blk.find('\n')
        if flend == -1:
            flend = len(blk)
        first_line = blk[:flend]
        if '@' in first_line:
            start = first_line.find('@') + 1
            end = first_line.find('@', start)
            rec_id = first_line[start:end].strip()
            all_records[rec_id] = blk

    parents_map = {}
    names_map = {}

    for rec_id, txt in all_records.items():
        nm = quick_extract_name("\n" + txt)
        names_map[rec_id] = nm

    # gather families
    families = {}
    for rec_id, txt in all_records.items():
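        # Treat a record as a family only when its header line ends with the FAM tag
        if txt.split('\n', 1)[0].strip().endswith(' FAM'):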
            father_idx = txt.find('1 HUSB @')
            if father_idx != -1:
                start = father_idx + len('1 HUSB @')
                end = txt.find('@', start)
                husb_id = txt[start:end]
            else:
                husb_id = None

            wife_idx = txt.find('1 WIFE @')
            if wife_idx != -1:
                start = wife_idx + len('1 WIFE @')
                end = txt.find('@', start)
                wife_id = txt[start:end]
            else:
                wife_id = None

            kids = []
            lines_ = txt.split('\n')
            for ln in lines_:
                if ln.strip().startswith('1 CHIL @'):
                    s2 = ln.strip().split('1 CHIL @')[1]
                    kid_id = s2.split('@')[0]
                    kids.append(kid_id)

            families[rec_id] = (husb_id, wife_id, kids)

    for fam_id, (f_id, m_id, k_list) in families.items():
        for kid in k_list:
            parents_map[kid] = (f_id, m_id)

    # 3) Gather ID list
    individual_ids = [d.get_gen_person() for d in ged.filter_pool]
    print(f"Processing {len(individual_ids)} individuals with chunk-based parallel...")

    # 4) Stage 1: Chunk-based parallel to build lines
    combined_rows = []
    chunk_size = 50
    max_workers = os.cpu_count() or 4
    logger.info("Starting chunk-based parallel processing with %d workers.", max_workers)

    total_records = len(individual_ids)
    from functools import partial

    with ProcessPoolExecutor(max_workers=max_workers) as executor, \
         tqdm(total=total_records, desc="Building Yates Lines (Stage 1)") as pbar:
        for chunk in chunks(individual_ids, chunk_size):
            func = partial(
                process_record_wrapper,
                gedcom_instance=ged,
                parents_map=parents_map,
                names_map=names_map
            )
            results = list(executor.map(func, chunk))
            combined_rows.extend(results)
            pbar.update(len(chunk))

    # combined_rows now has 5 columns: [ID#, "Match to", "Name", "cM", "Yates DNA Ancestral Line"]
    columns = ["ID#", "Match to", "Name", "cM", "Yates DNA Ancestral Line"]
    df = pd.DataFrame(combined_rows, columns=columns)

    df.index += 1

    def remove_specific_prefix(row):
        """
        Removes a specific hardcoded prefix from the 'Yates DNA Ancestral Line' if it matches exactly.
        """
        prefix = "YatesJohn&SearchingStill~~~YatesWilliam&SearchingStill~~~YatesWilliam&SearchingStill~~~YatesEdmund&CornellMargaret~~~"
        line = row.get("Yates DNA Ancestral Line", "")

        if line.startswith(prefix):
            row["Yates DNA Ancestral Line"] = line[len(prefix):]  # Trim the prefix
        return row

    # Apply this to your DataFrame
    df = df.apply(remove_specific_prefix, axis=1)


    # 5) Build a Trie from all reversed lines
    logger.info("Building Trie from reversed lines...")
    trie = Trie()
    num_lines_inserted = 0
    for _, row in df.iterrows():
        line_str = row["Yates DNA Ancestral Line"]
        if pd.isna(line_str) or not line_str.strip():
            continue
        couples_list = [x.strip() for x in line_str.split("~~~") if x.strip()]
        trie.insert_line(couples_list)
        num_lines_inserted += 1
    logger.info("Inserted %d lines into the trie.", num_lines_inserted)

    # 6) Compute final "Value" = 5*(#couples with node.count >=2) + (total couples)
    values = []
    prefix_counts = []  # store the count of couples with node.count >=2 for each line
    logger.info("Computing 'Value' = 5*(#couples with node.count >=2) + (total couples) ...")
    for idx, row in df.iterrows():
        line_str = row["Yates DNA Ancestral Line"]
        if pd.isna(line_str) or not line_str.strip():
            values.append(0)
            prefix_counts.append(0)
        else:
            couples_list = [x.strip() for x in line_str.split("~~~") if x.strip()]
            line_length = len(couples_list)
            node_counts = trie.get_couple_count(couples_list)
            # Count only couples that appear in at least two lines
            prefix_count = sum(1 for c in node_counts if c >= 2)
            val = 5 * prefix_count + line_length
            values.append(val)
            prefix_counts.append(prefix_count)
    df["Value"] = values
    df["PrefixCount"] = prefix_counts



    # 7) Assign Value Range and Value Label based on the calculated Value
    def assign_value_range_label(val):
        try:
            val = float(val)
        except (ValueError, TypeError):
            return "", ""

        if val >= 60:
            return ">=60", "1-likely correct"
        elif 47 <= val <= 59:
            return "59~47", "2-lines forming"
        elif 34 <= val <= 46:
            return "46~34", "3-patterns emerging"
        elif 21 <= val <= 33:
            return "33~21", "4-notable patterns"
        elif 8 <= val <= 20:
            return "20~8", "5-patterns stable"
        elif 1 <= val <= 7:
            return f"{val:.0f}", "6-need research"
        else:
            return f"{val:.0f}", "0-uncategorized"



    value_ranges = []
    value_labels = []
    for v in df["Value"]:
        rng, lbl = assign_value_range_label(v)
        value_ranges.append(rng)
        value_labels.append(lbl)
    df["Value Range"] = value_ranges
    df["Value Label"] = value_labels

    # 8) Sort final by "Yates DNA Ancestral Line"
    df.sort_values(by=["Yates DNA Ancestral Line"], ascending=True, inplace=True)

    # Remove the temporary PrefixCount column from the final output
    df.drop("PrefixCount", axis=1, inplace=True)

    # final_cols = [
    #     "ID#",
    #     "cM",
    #     "Match to",
    #     "Value",
    #     "Value Range",
    #     "Value Label",
    #     "Yates DNA Ancestral Line"
    # ]

    final_cols = [
        "ID#",
        "cM",
        "Match to",
        "Value Range",
        "Value Label",
        "Yates DNA Ancestral Line"
    ]

    logger.info("Final DataFrame columns: %s", df.columns.tolist())
    print(df.head(10))

#    print(df[["ID#", "Yates DNA Ancestral Line"]].head(32))


    # 9) Export CSV and HTML
    csv_name = "final_combined_df_with_value_labels.csv"
    df.to_csv(csv_name, index=False)
    logger.info("Exported final DataFrame to '%s'.", csv_name)

    html_name = "HTML_combined_df_with_value_labels.html"
    css_style = """
    <style>
    table {
      width: 100%;
      border-collapse: collapse;
      margin: 20px 0;
    }
    table, th, td {
      border: 1px solid #333;
    }
    th, td {
      padding: 8px 12px;
      text-align: center;
    }
    th {
      background-color: #f2f2f2;
    }
    /* Left-align the last column */
    td:nth-child(6) {
      text-align: left;
    }
    </style>
    """
    html_content = css_style + df.to_html(
        index=False,
        columns=final_cols,
        escape=False
    )
    with open(html_name, "w", encoding="utf-8") as f:
        f.write(html_content)
    logger.info("Exported HTML to '%s'.", html_name)

if __name__ == '__main__':
    main()


Automatically selecting the first GEDCOM file.




GEDCOM contained 59702 total records
Records tagged and filtered by NPFX: 1417
Records with YDNA information: 90
Autosomal matches: 1327
Records tagged and filtered by NPFX: 1417
Processing 1417 individuals with chunk-based parallel...


Building Yates Lines (Stage 1): 100%|██████████| 1417/1417 [14:02<00:00,  1.68it/s]


         ID#         Match to               Name  cM  \
897   I53693          fridine   RosenbalmJessica  20   
681   I51586     hendricksjas     CrossFrancesCa  29   
1357  I59027       yeatesd_ws    JordanTravisLil  28   
1036  I54946       yatesjohnh      SloverDeborah  25   
1040  I54968  girtain,kathryn     BurchNaomiEuge  25   
90    I38493            Y-DNA    BurtonMerrittCa  01   
1415  I59628       yeatesd_tm    StewartLisaJean  11   
274   I46128    yates,andreal     JonesMaryKathe  23   
949   I54181           marmar      ReedLindaGail  21   
744   I52241          klingal  PhillipsPatriciaK  24   

                               Yates DNA Ancestral Line  Value Value Range  \
897   ArvinWilliamHe&YatesMargaretE~~~ArvinJohnAmbro...      5           5   
681   BaileyWilliam&YatesRhoda~~~CrossJamesMadi&Bail...      5           5   
1357  BelkThomas&YatesElizabeth~~~HelmsAsa&BelkHanna...      6           6   
1036  BennettWilliamBu&YatesElllen~~~CarmeliaEmanuel...      6         
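
Stage 1 in the cell above enumerates every root-ward path from a match and keeps the one whose names mention the study surname furthest back, weighting each hit by its position in the path. The following is a minimal sketch of that selection on a toy pedigree. The function names, IDs, and names in the maps are made up for illustration, and the sketch returns tuples instead of mutating module-level state as the notebook does.

```python
def ancestor_paths(person, parents, path=()):
    """Return every path from `person` up to an ancestor with no recorded parents."""
    path = path + (person,)
    father, mother = parents.get(person, (None, None))
    if not father and not mother:
        return [path]
    found = []
    for parent in (father, mother):
        if parent:
            found.extend(ancestor_paths(parent, parents, path))
    return found


def surname_score(path, names, surname="Yates"):
    """Sum the 1-based positions of path members whose short name contains `surname`."""
    return sum(i + 1 for i, pid in enumerate(path)
               if surname in names.get(pid, "UnknownName"))


# Toy data (hypothetical): child -> (father, mother)
parents = {"I1": ("I2", "I3"), "I2": ("I4", "I5"), "I3": (None, "I6")}
names = {"I1": "SmithAnn", "I2": "SmithJohn", "I3": "YatesMary",
         "I4": "SmithTom", "I5": "JonesSue", "I6": "YatesJane"}

paths = ancestor_paths("I1", parents)
best = max(paths, key=lambda p: surname_score(p, names))
print(paths)
print(best, surname_score(best, names))
```

Because later positions earn more points, a path with the surname deep in the pedigree outranks one where it appears only near the match.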

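The `Value` column comes from the formula in the script's docstring: 5 points for every couple whose trie node is shared by at least two lines, plus 1 point per couple in the line. Since each line starts with its earliest couple, a node's count is the number of lines that share that prefix. Here is a small worked example under those assumptions; the class and function names and the three sample lines are invented for illustration.

```python
class Node:
    """Trie node: how many lines pass through it, plus child couples."""
    def __init__(self):
        self.count = 0
        self.children = {}


def insert(root, couples):
    node = root
    for couple in couples:
        node = node.children.setdefault(couple, Node())
        node.count += 1


def counts_along(root, couples):
    """Collect node counts for a line, stopping at the first unknown couple."""
    out, node = [], root
    for couple in couples:
        if couple not in node.children:
            out.append(0)
            break
        node = node.children[couple]
        out.append(node.count)
    return out


def line_value(root, line):
    couples = [c.strip() for c in line.split("~~~") if c.strip()]
    shared = sum(1 for n in counts_along(root, couples) if n >= 2)
    return 5 * shared + len(couples)


lines = ["A&B~~~C&D~~~E&F", "A&B~~~C&D~~~G&H", "X&Y~~~Z&W"]
root = Node()
for ln in lines:
    insert(root, ln.split("~~~"))

values = [line_value(root, ln) for ln in lines]
print(values)  # [13, 13, 2]
```

The first two lines share two leading couples, so each scores 5 * 2 + 3 = 13, while the one-off line scores only its length, 2.
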
In [11]:
# Cell 2: XHTML Template + Export (Fully Self-Contained Final Version)

import pandas as pd
from IPython.display import display, HTML
from datetime import datetime

# Load final CSV
df = pd.read_csv("final_combined_df_with_value_labels.csv")

# Load autosomal count
try:
    with open("autosomal_count.txt", "r") as f:
        autosomal_count = f.read().strip()
except FileNotFoundError:
    autosomal_count = "Unknown"

# Today's Date
today_date = datetime.today().strftime('%Y-%m-%d')

# XHTML Template
full_html_template = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <meta name="GENERATOR" content="Yatesville"/>
  <meta name="viewport" content="width=device-width, initial-scale=1"/>
  <title>DNA Report Card</title>
  <script src="../sorttable.js" type="text/javascript"></script>
  <style type="text/css">
    body {
      font-family: Arial, Helvetica, sans-serif;
      font-size: 18px; /* was 20px */
      background-color: #faf9d3;
    }
    .output-table table {
      width: 100%;
      border-collapse: collapse;
      margin: 15px 0; /* was 20px */
      background-color: #faf9d3;
    }
    .output-table table, .output-table th, .output-table td {
      border: 1px solid #333;
      text-align: center;
      background-color: #faf9d3;
      padding: 5px 8px; /* was 8px 12px */
    }
    .output-table th {
      background-color: #ffffcc;
      color: black;
      white-space: nowrap;
    }
    .output-table th:hover {
      background-color: #ffeb99;
    }
    .output-table td:nth-child(5) {
      min-width: 160px; /* tighten this slightly */
    }
    .output-table td:last-child, .output-table th:last-child {
      text-align: left;
      white-space: nowrap;
    }
</style>
</head>
<body>
<div align="center">

  <table class="fullpage-definedsection" cellpadding="0"><tr valign="top"><td>
    <table class="headersection" cellpadding="0"><tr valign="top"><td></td></tr></table>
    <table class="mainsection" cellpadding="7">
      <tr valign="top">
        <td>

          <!-- Intro Text -->
          <h2>A report card for your DNA family tree</h2>
          <font size="-2">
            Return to <a href="https://yates.one-name.net/gengen/dna_cousin_surname_study.htm">Study Home</a>
            &nbsp;&nbsp;|&nbsp;&nbsp;
            Autosomal matches: {autosomal_count}
            &nbsp;&nbsp;|&nbsp;&nbsp;
            Updated: {today_date}
          </font>
          <p>Imagine you have a report card for your family tree that tells you how "special" it is. Here’s how we break it down:</p>
          <p>Think of the value as the total number of points you get from finding all the important family connections in your tree<br/>
          and comparing them to all the other trees included in the Yates study.</p>
          <p>We then group the values as a way to signal which lines seem to have potential for study:
            <b>&gt;=60:</b> likely correct, <b>59–47:</b> lines forming, <b>46–34:</b> patterns emerging,<br/>
            <b>33–21:</b> notable patterns, <b>20–8:</b> patterns stable, <b>7–1:</b> need research.</p>
          <p><b><i><font size="-1">Click on the header to sort any column</font></i></b>
            (And, remember <a href="https://yates.one-name.net/gengen/dna_theory_of_the_case.htm" target="_blank">what this is telling</a> us....)</p>
        </td>
      </tr>
    </table>

    <!-- Table Output -->
    <div class="output-table" style="margin-top: 10px;">
      <!-- TABLE_PLACEHOLDER -->
    </div>

    </td></tr></table>
</div>

<!-- Floating Top Button -->
<button onclick="topFunction()" id="myBtn" title="Go to top"
  style="position: fixed; bottom: 40px; right: 40px; z-index: 99; background-color: red; color: white;
         padding: 12px 20px; border: none; border-radius: 10px; cursor: pointer; font-size: 16px;">
  Top
</button>

<script>
// Scroll-to-top button
let mybutton = document.getElementById("myBtn");
window.onscroll = function() {
  if (document.body.scrollTop > 20 || document.documentElement.scrollTop > 20) {
    mybutton.style.display = "block";
  } else {
    mybutton.style.display = "none";
  }
};
function topFunction() {
  document.body.scrollTop = 0;
  document.documentElement.scrollTop = 0;
}
</script>

</body>
</html>
"""

# --- Build Table and Merge ---
final_cols = [
    "ID#",
    "cM",
    "Match to",
    "Value Range",
    "Value Label",
    "Yates DNA Ancestral Line"
]
df.sort_values(by=["Yates DNA Ancestral Line"], inplace=True)

html_table = df.to_html(index=False, columns=final_cols, escape=False, classes="dataframe sortable")

final_html = full_html_template.replace("{autosomal_count}", autosomal_count).replace("{today_date}", today_date)
final_html = final_html.replace("<!-- TABLE_PLACEHOLDER -->", html_table)

# Save HTML
with open("dna_cousin_surname_app.htm", "w", encoding="utf-8") as f:
    f.write(final_html)

# Preview inline
print(f"✅ HTML saved as dna_cousin_surname_app.htm with {len(df)} rows.")
display(HTML(final_html))


✅ HTML saved as dna_cousin_surname_app.htm with 1417 rows.


A report card for your DNA family tree
Return to Study Home | Autosomal matches: 1327 | Updated: 2025-04-28
Imagine you have a report card for your family tree that tells you how "special" it is. Here’s how we break it down:
Think of value like the total number of points you get from finding all the important family connections in your tree and comparing them to all the other trees included in the Yates study.
We then group them as a way to signal which ones seem to have potential for study: >60: likely correct, 59–47: lines forming, 46–34: patterns emerging, 33–21: notable patterns, 20–8: patterns stable, 7–1: and 6-need research.
Click on the header to sort any column (And, remember what this is telling us....)

ID#,cM,Match to,Value Range,Value Label,Yates DNA Ancestral Line
I53693,20.0,fridine,5,6-need research,ArvinWilliamHe&YatesMargaretE~~~ArvinJohnAmbro&RoachAnnisEdna~~~ArvinJohnAmbro&MaidenAudreyEli~~~ButlerGeraldLev&ArvinSharonLee~~~RosenbalmJohn&Butler
I51586,29.0,hendricksjas,5,6-need research,BaileyWilliam&YatesRhoda~~~CrossJamesMadi&BaileyLucyNancy~~~CrossFrancisMa&WilkinsMaryAngel~~~CrossCollinGeo&SwitzerJuliaJohn~~~CrossFrancisMa&CarltonCarolynNe
I59027,28.0,yeatesd_ws,6,6-need research,BelkThomas&YatesElizabeth~~~HelmsAsa&BelkHannah~~~HelmsAsaMack&HelmsJemimaRac~~~HelmsHarleyHas&MoserMarthaJan~~~JordanThomasSmi&HelmsIdaJane~~~JordanFletcherW&PriceMirandaIo
I54946,25.0,yatesjohnh,6,6-need research,BennettWilliamBu&YatesElllen~~~CarmeliaEmanuel&BennettMaryAnn~~~CarmeliaGeorgeL&GaskillCarolineC~~~EvansJosephKai&CarmeliaMaryJane~~~SloverAndrewMil&EvansMaryEva~~~SloverFrederick&HuntJaneM
I54968,25.0,"girtain,kathryn",4,6-need research,BronsonJamesRobe&YatesAgnesMarg~~~BronsonHenryWesl&GreenNancyCath~~~SladeMatthew&BronsonBlondina~~~BurchAndrewJac&SladeGeorgiaNa
I38493,1.0,Y-DNA,4,6-need research,BurtonJohn&TorkingtonHarriet~~~BurtonSanfordSa&AngellAntoinette~~~BurtonKennethGo&VayroEdithIrvi~~~BurtonEdwardGou&SwimKaren
I59628,11.0,yeatesd_tm,20~8,5-patterns stable,ColliverJames&YatesNancy~~~ColliverJesseB&DoggettSarah~~~ColliverJamesP&DayDianaRuss~~~ColliverThomasJ&ViceLydia~~~StewartLawrenceW&ColliverMaryAudra~~~StewartRobertErw&MartinNancyPres
I46128,23.0,"yates,andreal",20~8,5-patterns stable,ColliverJames&YatesNancy~~~ColliverJesseB&DoggettSarah~~~ColliverRichardTh&PowersMaryEliza~~~ColliverJohnB&AndersonMaryLydia~~~JonesFrankThom&ColliverBeulahKat
I54181,21.0,marmar,20~8,5-patterns stable,CowdenWilliam&YatesCatherine~~~MilesWilliamDa&CowdenNancyAnn~~~MilesLeroyWalt&PalmerLucretiaE~~~OwenHerbertLe&MilesAliceHigg~~~ReedJohnWilli&OwenVirginiaI
I52241,24.0,klingal,20~8,5-patterns stable,CowdenWilliam&YatesCatherine~~~TylerAnderson&CowdenPhoebe~~~GrayJohnIra&TylerMarthaMat~~~HensonTurnerPhe&GraySallieEil~~~PhillipsClarenceE&HensonLelaKathe
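
Cell 2 fills its template with chained `str.replace` calls instead of `str.format`. One likely reason is that the embedded CSS contains literal braces, which `str.format` would try to interpret as fields. The sketch below illustrates the difference on a cut-down template. `TEMPLATE` and `fill_template` are hypothetical names, and the sample values are the count and date shown in the output above.

```python
TEMPLATE = """<style>td { text-align: left; }</style>
<p>Autosomal matches: {autosomal_count} | Updated: {today_date}</p>
<!-- TABLE_PLACEHOLDER -->"""


def fill_template(template, values, table_html):
    """Substitute {key} tokens literally, then drop in the table markup."""
    for key, val in values.items():
        template = template.replace("{" + key + "}", str(val))
    return template.replace("<!-- TABLE_PLACEHOLDER -->", table_html)


page = fill_template(
    TEMPLATE,
    {"autosomal_count": 1327, "today_date": "2025-04-28"},
    "<table></table>",
)
print(page)

# str.format trips over the CSS braces, so it is not a drop-in alternative here.
try:
    TEMPLATE.format(autosomal_count=1327, today_date="2025-04-28")
    format_ok = True
except (KeyError, ValueError, IndexError):
    format_ok = False
print("str.format worked:", format_ok)
```

Literal replacement leaves the CSS rule untouched, at the cost of silently ignoring any token that is misspelled in the template.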
