SplinkDataFrame metadata in clustering + metrics #1981

Merged
ADBond merged 3 commits into master from splinkdataframe-metadata
Feb 29, 2024

Contributor

@ADBond ADBond commented Feb 19, 2024

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Closes #1971. This PR does not cover saving metadata to parquet; I will open a separate issue for that.

Give a brief description for the solution you have provided

SplinkDataFrame has a dict attribute, metadata, which we can use to store arbitrary information.

This PR stores threshold_match_probability in the metadata of the resulting frames when we:

  • cluster pairwise
  • compute cluster metrics

When computing cluster metrics, we fall back to the threshold stored in the clustering frame's metadata (if available) when no explicit threshold parameter is provided.
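
The fallback described above can be sketched roughly as follows. This is an illustration only, not Splink's actual code: FrameSketch, cluster_pairwise_sketch and compute_cluster_metrics_sketch are hypothetical stand-ins. Raising an error when no threshold can be found is also an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

THRESHOLD_KEY = "threshold_match_probability"


@dataclass
class FrameSketch:
    """Hypothetical stand-in for SplinkDataFrame: a name plus a metadata dict."""

    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def cluster_pairwise_sketch(threshold: float) -> FrameSketch:
    """Pretend to cluster, recording the threshold used on the output frame."""
    clusters = FrameSketch("clusters")
    clusters.metadata[THRESHOLD_KEY] = threshold
    return clusters


def compute_cluster_metrics_sketch(
    clusters: FrameSketch, threshold: Optional[float] = None
) -> FrameSketch:
    """Pretend to compute metrics, resolving the threshold as described above."""
    if threshold is None:
        # No explicit parameter: fall back to the clustering metadata, if any.
        threshold = clusters.metadata.get(THRESHOLD_KEY)
    if threshold is None:
        # Assumption for this sketch: fail loudly rather than guess a value.
        raise ValueError("No threshold given and none stored in metadata")
    metrics = FrameSketch("cluster_metrics")
    metrics.metadata[THRESHOLD_KEY] = threshold
    return metrics


clusters = cluster_pairwise_sketch(0.9)
inherited = compute_cluster_metrics_sketch(clusters)
overridden = compute_cluster_metrics_sketch(clusters, threshold=0.5)
```

Here inherited carries the threshold recorded at clustering time, while overridden carries the explicitly supplied value, so an explicit parameter always takes precedence over the metadata.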

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Updated CHANGELOG.md (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

Member

@RobinL RobinL left a comment

LGTM 👍

Contributor

@zslade zslade left a comment

🙌

Contributor

@ThomasHepworth ThomasHepworth left a comment

I've also scanned through this and it looks good to me.

@ADBond ADBond merged commit e4bedd7 into master Feb 29, 2024
10 checks passed
@ADBond ADBond deleted the splinkdataframe-metadata branch February 29, 2024 11:34
Successfully merging this pull request may close these issues.

[FEAT] SplinkDataFrame metadata