Unable to load Jaro-Winkler Similarity in AWS Glue #1377
Unanswered
richard-a-lott asked this question in Q&A
Replies: 3 comments · 9 replies
-
Actually, this doesn't (quite) work! You have to copy the jar file into S3 before running the job; the rest of the method works, though. In my case I'd already copied the jar, so it existed before the job ran. So a revised version of the method above is:
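A sketch of that revised flow, run once before the Glue job is started: upload the jar to S3, then paste the returned URI into the job's "Dependent JARs path". The bucket, key, and the `upload_similarity_jar` helper are hypothetical; splink 3's `similarity_jar_location` helper is assumed to be available.

```python
def dependent_jar_uri(bucket: str, key: str) -> str:
    """Full S3 URI (including the filename) to paste into 'Dependent JARs path'."""
    return f"s3://{bucket}/{key}"


def upload_similarity_jar(bucket: str, key: str) -> str:
    """Upload splink's bundled similarity jar to S3, BEFORE the Glue job runs.

    Assumes boto3 credentials and splink 3 are available where this executes.
    """
    import boto3
    from splink.spark.jar_location import similarity_jar_location  # splink 3

    boto3.client("s3").upload_file(similarity_jar_location(), bucket, key)
    return dependent_jar_uri(bucket, key)
```

Because the jar is staged ahead of time, it already exists when Glue resolves the job's dependent jars at startup.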
-
Hmm... I've been able to use and load the jar file just fine. Are you including splink in your job under the Job parameters section, like below?
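For reference, a sketch of the kind of settings meant here, under Job details → Job parameters (the splink version pin and S3 URI are placeholders; `--additional-python-modules` and `--extra-jars` are standard Glue job parameters):

```
--additional-python-modules   splink
--extra-jars                  s3://my-glue-assets/jars/scala-udf-similarity-0.1.1.jar
```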
-
Hi,
I've had a bit of an issue using Splink with AWS Glue (similar to this one: #636 (comment)), where Spark is unable to find and load the Splink UDF jar.
I confirmed the jar was present at the path identified by similarity_jar_location, and that it was included in several Spark config parameters (spark.jars, spark.driver.extraClassPath, spark.executor.extraClassPath); even so, my Glue job refused to recognise the jar.
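A sketch of the configuration described above (the jar path is a placeholder). In a plain Spark deployment these three settings are usually sufficient; the point of this thread is that a Glue job ignored them:

```python
# Hypothetical S3 URI for the Splink UDF jar.
JAR = "s3://my-glue-assets/jars/scala-udf-similarity-0.1.1.jar"

SPARK_JAR_SETTINGS = {
    "spark.jars": JAR,
    "spark.driver.extraClassPath": JAR,
    "spark.executor.extraClassPath": JAR,
}


def apply_jar_settings(conf):
    """Apply the jar settings to a pyspark SparkConf (pyspark assumed installed)."""
    for key, value in SPARK_JAR_SETTINGS.items():
        conf = conf.set(key, value)
    return conf
```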
I have found a workaround, which I'll share here, but it would be good to know if anyone has found a better way.
My workaround is to specify, in the Glue Job Details, a "Dependent JARs path" (this has to be the full S3 URI to the jar file, including the filename), and then to upload the jar file to that S3 location at the start of the Glue job, before creating the Spark session:
(apologies for the image, I'm unable to paste the code in directly)
Maybe a bit messy, but it seems to be working. Does anyone have a better method?
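A hedged sketch of this workaround, not the author's verbatim code (bucket and key are placeholders; the key must match the URI configured as "Dependent JARs path"): upload the jar from within the Glue job first, and only then create the Spark contexts.

```python
BUCKET = "my-glue-assets"                    # hypothetical bucket
KEY = "jars/scala-udf-similarity-0.1.1.jar"  # must match "Dependent JARs path"


def stage_jar_then_start_spark():
    """Upload the jar, then create the Glue/Spark contexts, in that order.

    Assumes boto3, splink 3, and the Glue runtime libraries are available.
    """
    import boto3
    from splink.spark.jar_location import similarity_jar_location  # splink 3
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    # 1. Put the jar where the job's "Dependent JARs path" expects it.
    boto3.client("s3").upload_file(similarity_jar_location(), BUCKET, KEY)

    # 2. Only now create the contexts, so Glue can load the dependent jar.
    glue_context = GlueContext(SparkContext.getOrCreate())
    return glue_context.spark_session
```

The ordering is the whole trick: Glue resolves dependent jars when the session starts, so the upload has to complete before `SparkContext.getOrCreate()` is called.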