Does that mean the memory consumption [from computing all buckets] will be reduced in Splink v4 by using a threshold weight?

Yes. At the very least, the output dataset should be far smaller, especially when using a very loose blocking rule like blocking on dob. It still has to do the same number of calculations; it just materialises only a small percentage of the results.
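To make the distinction between computing and materialising concrete, here is a toy sketch in plain Python. It does not use Splink's API: the records, `toy_match_weight`, and the threshold value are all made up for illustration. Every pair in the block is still scored, but only pairs at or above the threshold are stored.

```python
from itertools import combinations

# Hypothetical records: (id, dob, surname). Illustrative data only.
records = [
    (1, "1990-01-01", "smith"),
    (2, "1990-01-01", "smith"),
    (3, "1990-01-01", "jones"),
    (4, "1990-01-01", "smyth"),
]


def toy_match_weight(a, b):
    """Made-up scoring rule standing in for a real match weight."""
    return 5.0 if a[2] == b[2] else -5.0


def score_block(block, threshold):
    """Score every pair in the block; keep only pairs meeting the threshold."""
    n_scored = 0
    kept = []
    for a, b in combinations(block, 2):
        n_scored += 1                    # the comparison is always computed
        weight = toy_match_weight(a, b)
        if weight >= threshold:          # but only some results are stored
            kept.append((a[0], b[0], weight))
    return n_scored, kept


n_scored, kept = score_block(records, threshold=0.0)
print(n_scored, len(kept))
```

In this toy case all six pairs are scored but only one is retained, which is the sense in which a threshold shrinks the output without reducing the work.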

Are rules operated on sequentially? Could I explode the rule set to do something that has a similar memory reduction effect?

No - they're computed in parallel. If you want the behaviour you're implying, you'd want to have an outer loop over MonthOfBirth, something like (pseudocode):

for each m in MonthOfBirth:
   df_f…
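The pseudocode above is cut off, so the following is only a guess at the general shape of such an outer loop, written in plain Python rather than against Splink's API. The record layout, `month_of_birth`, `toy_match_weight`, and `link_by_month` are hypothetical names introduced for illustration. The idea is to partition the input by month of birth and score one partition at a time, so that only one month's candidate pairs are held at once.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (id, dob as YYYY-MM-DD, surname). Illustrative data only.
records = [
    (1, "1990-01-15", "smith"),
    (2, "1990-01-15", "smith"),
    (3, "1990-02-20", "jones"),
    (4, "1990-02-20", "jones"),
    (5, "1990-02-20", "brown"),
]


def month_of_birth(rec):
    """Extract the two-digit month from a YYYY-MM-DD dob string."""
    return rec[1][5:7]


def toy_match_weight(a, b):
    """Made-up scoring rule standing in for a real match weight."""
    return 5.0 if a[2] == b[2] else -5.0


def link_by_month(records, threshold):
    """Process one month-of-birth partition at a time and collect the matches."""
    by_month = defaultdict(list)
    for rec in records:
        by_month[month_of_birth(rec)].append(rec)

    results = []
    for month in sorted(by_month):
        subset = by_month[month]         # only this partition is being scored
        for a, b in combinations(subset, 2):
            if a[1] != b[1]:             # loose blocking rule: same dob
                continue
            weight = toy_match_weight(a, b)
            if weight >= threshold:
                results.append((a[0], b[0], weight))
    return results


matches = link_by_month(records, threshold=0.0)
print(matches)
```

Because two records with the same dob necessarily share a month of birth, partitioning by month drops no candidate pairs under a dob blocking rule. That would not hold for a blocking rule that can pair records across months.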

@RobinL
Answer selected by gringer