Split detailed comparisons in household inference #56

Merged
dehall merged 1 commit into master from household_perf_split2 on Apr 4, 2023

Conversation

Collaborator

@dehall dehall commented Apr 4, 2023

Following up on previous PRs, this PR splits the inspection of candidate links into multiple steps to reduce the peak memory requirement of that step. The size of each iteration is driven by the split_factor argument added previously.
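
For readers unfamiliar with the pattern, here is a minimal sketch of chunked inspection. It is not the code in this PR: `split_into_chunks`, `inspect_in_chunks`, and `is_match` are hypothetical names, and treating split_factor as the number of roughly equal chunks is an assumption about its meaning.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two candidate rows (illustrative type)


def split_into_chunks(items: Sequence[Pair], split_factor: int) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of items, at most split_factor slices in total.

    Assumption: split_factor is the number of chunks to split the work into.
    """
    if split_factor < 1:
        raise ValueError("split_factor must be >= 1")
    # Ceiling division so that every item lands in some chunk.
    chunk_size = max(1, -(-len(items) // split_factor))
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def inspect_in_chunks(
    candidate_pairs: Sequence[Pair],
    split_factor: int,
    is_match: Callable[[Pair], bool],
) -> List[Pair]:
    """Run the detailed comparison one chunk at a time and keep only matches."""
    matches: List[Pair] = []
    for chunk in split_into_chunks(candidate_pairs, split_factor):
        # Any intermediate data built for this chunk can be freed before
        # the next chunk starts, which is what bounds peak memory.
        matches.extend(pair for pair in chunk if is_match(pair))
    return matches
```

A larger split_factor means smaller chunks and lower peak memory, at the cost of more iterations.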

This PR also tweaks the preprocessing of the PII file to ensure that the parse_addr function is called only once per row. Previously it was called once per row at the start of the script, then again for each row every time a pair was evaluated. Based on timing, this seems to reduce runtime by 66%.
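
The parse-once idea can be sketched as follows. This is an illustration only: `parse_addr_stub` is a hypothetical stand-in for the real parse_addr, and the row layout (`address`, `parsed_addr` keys) is assumed rather than taken from the project.

```python
from typing import Dict, List, Tuple


def parse_addr_stub(address: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for parse_addr: lowercase, drop periods, tokenise."""
    return tuple(address.lower().replace(".", "").split())


def preprocess(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Parse each row's address exactly once and store the result on the row."""
    return [{**row, "parsed_addr": parse_addr_stub(row["address"])} for row in rows]


def same_address(a: Dict[str, object], b: Dict[str, object]) -> bool:
    """Pair evaluation reuses the stored parse instead of parsing again."""
    return a["parsed_addr"] == b["parsed_addr"]
```

Because a row typically appears in many candidate pairs, moving the parse out of the pair evaluation turns a per-pair cost into a per-row cost.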

@dehall dehall merged commit 4619b41 into master Apr 4, 2023
@dehall dehall deleted the household_perf_split2 branch April 4, 2023 19:55