Hi all, I am currently working with Splink and I want to make a custom case expression that sets the comparison levels for one variable based on the value of another variable. To illustrate this, I have created a small, simplified example of fake data that mimics my own:
Due to the way iOS is built, users can change accessibility settings on a per-app basis only in OS versions > 15. Therefore, I am interested in a level 1 match on the
So my question is: is it possible to create a custom case expression which sets a comparison level for one column based on the value of another column? Thank you all in advance for the help, and thanks to the Splink creators for an awesome tool!
Replies: 1 comment
Hiya. Yes - I think it is possible. I think the syntax in the settings for this column would be something like this. (You'll need other comparison columns, I've just included the specific one you're struggling with)
sql = """
case
    when accessibility_settings_string_l is null or accessibility_settings_string_r is null then -1
    when
        system_os_vesion_l >= 15 and system_os_vesion_r >= 15
        and accessibility_settings_string_l = accessibility_settings_string_r then 2
    when
        system_os_vesion_l < 15 and system_os_vesion_r < 15
        and accessibility_settings_string_l = accessibility_settings_string_r then 1
    else 0
end
"""
settings = {
    "link_type": "link_only",
    "comparison_columns": [
        {
            "custom_name": "os_version_and_accessibility",
            "num_levels": 3,
            "case_expression": sql,
            "custom_columns_used": [
                "system_os_vesion",
                "accessibility_settings_string",
            ],
        },
    ],
}
If you're not too far into your 'splink journey', it might be worth giving Splink version 3 a go. You should find it easier to work with, especially if your datasets are not too big. Splink 3 no longer requires Spark for small to medium sized linkages, so if your data is smaller than a few million records, you can probably do the linkage without Spark, which means things should run faster. See the note at the top of here for more info.
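For anyone trying the Splink 3 route, here is a rough sketch of how the same logic might look there. It assumes the Splink 3 dictionary format for comparisons, where each level is its own `sql_condition` instead of one big case expression, and where the dictionary would go in the `comparisons` list of the settings. The column names are copied from the expression above (including the `system_os_vesion` spelling, which has to match your data), the labels are made up, and `first_matching_label` and `make_pair` are hypothetical helpers that use sqlite3 only to sanity-check the conditions on fake pairs. They are not part of Splink.

```python
import sqlite3

# Shared SQL fragments; column names are copied verbatim from the reply above.
SETTINGS_NULL = (
    "accessibility_settings_string_l IS NULL "
    "OR accessibility_settings_string_r IS NULL"
)
SETTINGS_EQUAL = "accessibility_settings_string_l = accessibility_settings_string_r"

# Assumed Splink 3 comparison dictionary; levels are checked top to bottom.
comparison = {
    "output_column_name": "os_version_and_accessibility",
    "comparison_levels": [
        {
            "sql_condition": SETTINGS_NULL,
            "label_for_charts": "Null",
            "is_null_level": True,
        },
        {
            "sql_condition": (
                "system_os_vesion_l >= 15 AND system_os_vesion_r >= 15 "
                f"AND {SETTINGS_EQUAL}"
            ),
            "label_for_charts": "Settings match, both OS >= 15",
        },
        {
            "sql_condition": (
                "system_os_vesion_l < 15 AND system_os_vesion_r < 15 "
                f"AND {SETTINGS_EQUAL}"
            ),
            "label_for_charts": "Settings match, both OS < 15",
        },
        {"sql_condition": "ELSE", "label_for_charts": "All other comparisons"},
    ],
}


def make_pair(os_l, os_r, settings_l, settings_r):
    """Build one fake record pair using the _l / _r column suffixes."""
    return {
        "system_os_vesion_l": os_l,
        "system_os_vesion_r": os_r,
        "accessibility_settings_string_l": settings_l,
        "accessibility_settings_string_r": settings_r,
    }


def first_matching_label(pair):
    """Hypothetical helper (not Splink): label of the first level that holds."""
    con = sqlite3.connect(":memory:")
    try:
        columns = list(pair)
        con.execute(f"CREATE TABLE pairs ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        con.execute(f"INSERT INTO pairs VALUES ({placeholders})", list(pair.values()))
        for level in comparison["comparison_levels"]:
            condition = level["sql_condition"]
            if condition == "ELSE":
                return level["label_for_charts"]
            if con.execute(f"SELECT 1 FROM pairs WHERE {condition}").fetchone():
                return level["label_for_charts"]
    finally:
        con.close()


if __name__ == "__main__":
    print(first_matching_label(make_pair(16, 16, "large_text", "large_text")))
    print(first_matching_label(make_pair(14, 14, "large_text", "large_text")))
    print(first_matching_label(make_pair(16, 14, "large_text", "large_text")))
    print(first_matching_label(make_pair(16, 16, None, "large_text")))
```

The null level comes first so that missing accessibility settings never fall through to the ELSE level, which mirrors the `-1` branch in the case expression above.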