-
See under the match key header here, which explains how to find which blocking rule created the match.
-
Thank you for this.
It’s not exactly what I was asking.
Say there are two competing records:

| External ID | Spine ID | Block | Score |
|---|---|---|---|
| 5 | 50 | 1 | 14.6 |
| 27 | 50 | 3 | 14.6 |

There is still only a single score per record comparison but two records share the same score – how do we separate them?
By block: block 1 is more stringent than block 3.
I propose to select the match from block 1 (I’ll probably be able to write some Python to do this).
First, I need to assign a block number to each blocking condition and have it output with the matched pairs.
How do I do this?
Best wishes
Ken
From: Robin Linacre
Sent: Wednesday, August 2, 2023 10:41 AM
Subject: Re: [moj-analytical-services/splink] How do I assign a block number and output that in the linked pairs? (Discussion #1494)
Note that there is only a single score per record comparison, irrespective of the block that created it. So the order of the blocks is irrelevant. The model that Splink estimates is a global model that scores the whole record comparison.
Blocking rules are used only to select which record comparisons to score, not how to score them.
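To make that point concrete, here is a toy sketch in plain Python. It is not Splink's implementation: the records, the two rule functions, and `toy_score` are all made up for illustration. It only shows that changing the blocking rules changes which pairs get scored, not the score a given pair receives.

```python
from itertools import product

# Hypothetical toy records (not Splink data structures).
left = [
    {"id": 5, "first_name": "ann", "last_name": "lee", "age": 40},
    {"id": 27, "first_name": "anne", "last_name": "lee", "age": 40},
]
right = [{"id": 50, "first_name": "ann", "last_name": "lee", "age": 40}]


def rule_names(l, r):
    """Toy stand-in for 'l.first_name = r.first_name and l.last_name = r.last_name'."""
    return l["first_name"] == r["first_name"] and l["last_name"] == r["last_name"]


def rule_age(l, r):
    """Toy stand-in for 'l.age = r.age'."""
    return l["age"] == r["age"]


def toy_score(l, r):
    """Made-up global score: number of agreeing fields. It never sees the rules."""
    return sum(l[k] == r[k] for k in ("first_name", "last_name", "age"))


def score_candidates(rules):
    """Score every left/right pair that passes at least one blocking rule."""
    return {
        (l["id"], r["id"]): toy_score(l, r)
        for l, r in product(left, right)
        if any(rule(l, r) for rule in rules)
    }


both_rules = score_candidates([rule_names, rule_age])
names_only = score_candidates([rule_names])
age_only = score_candidates([rule_age])

print(both_rules)
print(names_only)
print(age_only)
```

The pair (5, 50) gets the same score under every rule set; the pair (27, 50) simply disappears when only the name rule is used, because it is never selected for scoring.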
**********************************************************************
This e-mail (and any files or other attachments transmitted with it) is intended solely for the attention of the addressee(s). Unauthorised use, disclosure, storage, copying or distribution of any part of this e-mail is not permitted. If you are not the intended recipient please destroy the email, remove any copies from your system and inform the sender immediately by return.
Communications with the Scottish Government may be monitored or recorded in order to secure the effective operation of the system and for other lawful purposes. The views or opinions contained within this e-mail may not necessarily reflect those of the Scottish Government.
**********************************************************************
-
Hi,
We have two different blocks giving the same score, each matching a different external record to the same spine record.

| External ID | Spine ID | Block | Score |
|---|---|---|---|
| 5 | 50 | 1 | 14.6 |
| 27 | 50 | 3 | 14.6 |

The blocking hierarchy (block number) is assigned by me; it is not Splink output.
How can I attach a block number to each blocking rule so that it is output with the links, giving me output as in the table?
This would let me select the first pair, as its block number is lower (the more stringent block).
Best wishes
Ken
From: Robin Linacre
Sent: Wednesday, August 2, 2023 11:36 AM
I'm not sure I understand. Is this the table?

| External ID | Spine ID | Block Score |
|---|---|---|
| 5 | 50 | 14.6 |
| 27 | 50 | 14.6 |

What does 'block score' mean in that table?
The match_key column (the rightmost column of the output of predict()) contains an integer that corresponds to the (zero-based) position of the blocking rule, i.e. if your settings are like:
settings = {
    "blocking_rules_to_generate_predictions": [
        "l.first_name = r.first_name and l.last_name = r.last_name",
        "l.age = r.age",
    ]
}
Then:
match_key 0 corresponds to "l.first_name = r.first_name and l.last_name = r.last_name",
match_key 1 corresponds to "l.age = r.age".
Note these are mutually exclusive, i.e. match_key = 1 means the record comparison did not pass blocking rule 0, but did pass blocking rule 1.
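Given that, the tie-break asked about in this thread could be sketched roughly as follows. This is a minimal illustration in plain Python rather than PySpark, and it assumes the predictions have already been collected into a list of dicts. The column names `external_id`, `spine_id` and `score` are hypothetical stand-ins for whatever your output calls them; only `match_key` and the two blocking rules come from the example above. The sample rows are invented.

```python
# Blocking rules in the same order as in the settings example above,
# so that list index == match_key.
blocking_rules = [
    "l.first_name = r.first_name and l.last_name = r.last_name",
    "l.age = r.age",
]

# Hypothetical prediction rows; values are made up for illustration.
predictions = [
    {"external_id": 27, "spine_id": 50, "score": 14.6, "match_key": 1},
    {"external_id": 5, "spine_id": 50, "score": 14.6, "match_key": 0},
    {"external_id": 8, "spine_id": 51, "score": 9.2, "match_key": 1},
]


def rank(row):
    """Sort key: highest score first, then lowest (most stringent) match_key.

    int() is a precaution in case match_key arrives as text (an assumption).
    """
    return (-row["score"], int(row["match_key"]))


def best_match_per_spine(rows):
    """Keep one row per spine_id, breaking score ties by the lower match_key."""
    best = {}
    for row in rows:
        current = best.get(row["spine_id"])
        if current is None or rank(row) < rank(current):
            best[row["spine_id"]] = row
    return best


best = best_match_per_spine(predictions)
for spine_id, row in sorted(best.items()):
    rule = blocking_rules[int(row["match_key"])]
    print(spine_id, row["external_id"], row["score"], rule)
```

Because match_key follows the order of the blocking rules in the settings, listing the rules from most to least stringent gives the hierarchy directly, with no separate block number needed. The same ordering could be expressed in PySpark with a window function partitioned by the spine ID, though that variant is not shown here.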
-
Hi, I am expecting to get competing scores (matches) from different blocks.
How can I assign a block number (based on a hierarchy of blocking strictness) and output the block number with the pairs?
I can then separate two competing matches with the same score by taking the one from the lowest-numbered (most stringent) block.
I am using PySpark in Azure Databricks.
Many thanks