Levenshtein help #699
Hello, I am running the example model on my own data source and am getting this error: "RuntimeError: Invalid Input Error: Levenshtein Function: 2nd argument too short". My comparison settings are: levenshtein_at_thresholds("Claim_description", 2). It looks like the code is also failing on this line: linker.estimate_u_using_random_sampling(target_rows=1e6)
Replies: 2 comments
That happens in DuckDB if you call the levenshtein function where one of the inputs is a zero-length string, i.e. "". You need to turn zero-length strings into true nulls before inputting your data into Splink. Sample code for cleaning up the input dataframe:
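A minimal sketch of that cleanup, assuming the input is a pandas dataframe. The helper name `empty_strings_to_null` and the demo data are hypothetical and not part of Splink; only the "Claim_description" column name comes from the question.

```python
import pandas as pd


def empty_strings_to_null(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every zero-length string is a null."""
    out = df.copy()
    for col in out.columns:
        # True only for values that are strings of length zero;
        # numbers, existing nulls and non-empty strings are left alone.
        is_empty = out[col].map(lambda v: isinstance(v, str) and v == "")
        # mask() replaces the flagged values with NaN (a true null).
        out[col] = out[col].mask(is_empty)
    return out


# Hypothetical demo data for illustration only.
df = pd.DataFrame(
    {
        "unique_id": [1, 2, 3],
        "Claim_description": ["water damage", "", None],
    }
)
cleaned = empty_strings_to_null(df)
print(cleaned)
```

Pass the cleaned dataframe to Splink in place of the original. If your data also contains whitespace-only values that should count as missing, you could compare `v.strip()` instead of `v`; that is an extra assumption beyond the zero-length case described above.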
Ah I see, thank you so much for your help!
Best,
Michayla
…On Mon, Aug 8, 2022 at 2:43 PM Robin Linacre ***@***.***> wrote:
That happens in DuckDB if you call the levenshtein function where one of
the inputs is a zero-length string, i.e. "". You need to turn zero-length
strings into true nulls before inputting your data into Splink