Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build for scala 2.13 #8

Closed

Conversation

kg005
Copy link

@kg005 kg005 commented Apr 18, 2024

Minimal changes to be able to build the splink_scalaudfs with Scala 2.13.

@kg005
Copy link
Author

kg005 commented Apr 18, 2024

As per suggestion in my original feature request, I have pushed some minimal changes to build the jars with Scala 2.13.

As I am new to Scala, the result might not be 100 percent right. Feel free to suggest improvements.

What I am puzzled the most is that I had to change the expected values in the test JaroWinklereSimilarityTest. Without that (or deleting the test completely) I was not able to build the package. The new expected values were validated at two different Scala online editors, so they hopefully should be right.

@@ -20,11 +20,9 @@ class JaroWinklerSimilarityTest extends FlatSpec with Matchers {
distance.call("hippo", "elephant") should equal (0.44)
distance.call("hippo", "zzzzzzzz") should equal (0.0)
distance.call("hello", "hallo") should equal (0.88)
distance.call("ABC Corporation", "ABC Corp") should equal (0.93)
distance.call("ABC Corporation", "ABC Corp") should equal (0.90)
Copy link
Member

@RobinL RobinL Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - this is peculiar. It does appear the new test is closer to the correct answer than the old one though

In Python:

import duckdb
duckdb.sql("""
select jaro_winkler_similarity('ABC Corporation', 'ABC Corp') 
""")

and

from jarowinkler import *

jarowinkler_similarity("ABC Corporation", "ABC Corp")

Both give 0.906666

Copy link
Member

@RobinL RobinL Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same score from this online tool

Weird,

install.packages("stringdist")
library(stringdist)
stringdist('ABC Corporation','ABC Corp',method='jw')

gives 0.1555556

But that does have parameters:

stringdist('ABC Corporation', 'ABC Corp', method='jw', p=0.1) gives 0.9333

So perhaps there's something to do with how the functions are parameterised - the docs say p is Winklers ’prefix’ parameter for Jaro-Winkler distance, with 0 ≤ p ≤ 0.25.

@mamonu
Copy link
Contributor

mamonu commented Apr 21, 2024

kg005

Thank you so much for taking the time to contribute to this project and for your pull request.
We really appreciate the effort you’ve put into this. However, after careful consideration, we've decided not to merge your PR at this time.

Our project currently uses Maven as its build system and sdkman for java/scala versions for consistency and compatibility reasons, as this is a setup that works in our day to day laptops and we intend to continue with this system for the foreseeable future.

That said, your initiative has not gone unnoticed, and I believe your skills could be greatly beneficial to this project

A scala_2.13 branch has been created that could use some improvement, and I would be thrilled if you could help us enhance it. If you're interested you can work on that instead.

It already builds a jar however there is a dependency issue problem with scalatest and scalactic and scala 2.13
as a result only if the test suite is not used and the scalatest plugin is not loaded from the pom.xml there is a successful build

otherwise there is an

splink_scalaudfs/src/test/scala/uk/gov/moj/dash/linkage/JaroWinklerDistanceTest.scala:6: error: class JaroWinklerSimilarityTest needs to be abstract.

error

As you can imagine this is not optimal. If you can reintroduce the scalatest plugin in the pom.xml in a way that it compiles with mvn package and creates a jar then that would be appreciated and used/merged. When this works you can reintroduce the tests from main too.

As a further note: we cannot accept a compiled jar itself due to security reasons (but as discussed we certainly would love to have help on the maven pom.xml)

Meanwhile you can test if this jar works on your scala 2.13 configuration.

Thanks again for your effort.

@mamonu mamonu closed this Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants