Add ground truth labels for CRM dedup dataset #132

Merged
nikosbosse merged 1 commit into main from nikos/add-crm-ground-truth on Feb 11, 2026

@nikosbosse
Contributor

Summary

  • Adds docs/data/case_01_crm_data_ground_truth.csv with cluster_id labels for the 500-row CRM dataset
  • Same columns as case_01_crm_data.csv (company_name, contact_name, email_address) plus a cluster_id column
  • 500 rows, 121 unique clusters; labels derived from the aletheia evals case_29_messy_crm solution

Test plan

  • Verify row count matches original (500 rows)
  • Verify columns are company_name, contact_name, email_address, cluster_id (no row_id)
  • Verify 121 unique clusters
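
The three checks above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the function name `check_ground_truth` and its parameters are hypothetical, the file is assumed to be a UTF-8 CSV with a header row, and columns are compared as a set because the PR does not state their order.

```python
import csv

# Column names taken from the PR description; their order in the file is not
# stated, so the check below compares them as a set.
EXPECTED_COLUMNS = {"company_name", "contact_name", "email_address", "cluster_id"}


def check_ground_truth(path, expected_rows=500, expected_clusters=121):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration only.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        rows = list(reader)

    problems = []

    # Exactly the four expected columns (an extra row_id column would fail here).
    if set(columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {sorted(columns)}")

    # Row count should match the original dataset.
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")

    # Number of distinct cluster labels.
    clusters = {row.get("cluster_id") for row in rows}
    if len(clusters) != expected_clusters:
        problems.append(
            f"expected {expected_clusters} clusters, found {len(clusters)}"
        )

    return problems
```

Calling `check_ground_truth("docs/data/case_01_crm_data_ground_truth.csv")` from the repository root would be expected to return an empty list if the file matches the description. The sketch does not compare row contents against case_01_crm_data.csv; it only covers the counts and column names listed in the test plan.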

🤖 Generated with Claude Code

Adds case_01_crm_data_ground_truth.csv with cluster_id labels for
the 500-row CRM dataset (121 unique companies). Same format as
case_01_crm_data.csv with an added cluster_id column.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nikosbosse nikosbosse merged commit c298095 into main Feb 11, 2026
2 checks passed
@nikosbosse nikosbosse deleted the nikos/add-crm-ground-truth branch February 11, 2026 10:00