-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build for scala 2.13 #8
Conversation
As per suggestion in my original feature request, I have pushed some minimal changes to build the jars with Scala 2.13. As I am new to Scala, the result might not be 100 percent right. Feel free to suggest improvements. What I am puzzled the most is that I had to change the expected values in the test |
@@ -20,11 +20,9 @@ class JaroWinklerSimilarityTest extends FlatSpec with Matchers { | |||
distance.call("hippo", "elephant") should equal (0.44) | |||
distance.call("hippo", "zzzzzzzz") should equal (0.0) | |||
distance.call("hello", "hallo") should equal (0.88) | |||
distance.call("ABC Corporation", "ABC Corp") should equal (0.93) | |||
distance.call("ABC Corporation", "ABC Corp") should equal (0.90) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - this is peculiar. It does appear the new test is closer to the correct answer than the old one though
In Python:
import duckdb
duckdb.sql("""
select jaro_winkler_similarity('ABC Corporation', 'ABC Corp')
""")
and
from jarowinkler import *
jarowinkler_similarity("ABC Corporation", "ABC Corp")
Both give 0.906666
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same score from this online tool
Weird,
install.packages("stringdist")
library(stringdist)
stringdist('ABC Corporation','ABC Corp',method='jw')
gives 0.1555556
But that does have parameters:
stringdist('ABC Corporation', 'ABC Corp', method='jw', p=0.1)
gives 0.9333
So perhaps there's something to do with how the functions are parameterised - the docs say p is Winklers ’prefix’ parameter for Jaro-Winkler distance, with 0 ≤ p ≤ 0.25.
kg005 Thank you so much for taking the time to contribute to this project and for your pull request. Our project currently uses Maven as its build system and sdkman for java/scala versions for consistency and compatibility reasons, as this is a setup that works in our day to day laptops and we intend to continue with this system for the foreseeable future. That said, your initiative has not gone unnoticed, and I believe your skills could be greatly beneficial to this project A scala_2.13 branch has been created that could use some improvement, and I would be thrilled if you could help us enhance it. If you're interested you can work on that instead. It already builds a jar however there is a dependency issue problem with scalatest and scalactic and scala 2.13 otherwise there is an
error As you can imagine this is not optimal. If you can reintroduce the scalatest plugin in the pom.xml in a way that it compiles with As a further note: we cannot accept a compiled jar itself due to security reasons (but as discussed we certainly would love to have help on the maven pom.xml) Meanwhile you can test if this jar works on your scala 2.13 configuration. Thanks again for your effort. |
Minimal changes to be able to build the splink_scalaudfs with Scala 2.13.