Spark 3.1.2 support does not work with protobuf (sparksql-scalapb) #35
Comments
Does this example work with Cloudflow?
Shading should fix it: https://stackoverflow.com/a/41609653/16405999. Since we don't use
Tried to shade the module as per https://scalapb.github.io/docs/sparksql/ but ran into the following in the logs for
Code in branch
This looks somewhat similar to scalapb/ScalaPB#1122.
@debasishg, after further investigation, the result is:
Taking a look at what you did, the missing steps to hack something together are probably:
If the resulting Docker image works, we can clean up the process.
@andreaTP Also, Spark cannot work with
Hence, I don't think we can delete all versions other than
@debasishg All good, but:
We strictly have to remove all versions but
@andreaTP OK, so I shaded
You probably need an extra step: move the compilation of the protobuf files to a separate sub-project, depend on that artifact, and shade the proto dependency.
Here's the latest status on this. After exploring many options, I found that the only way to achieve the goal is to do the following steps:
This implies redesigning how Cloudflow packages the image for the Spark plugin today, which I think is a significant effort. Should we invest time and effort in this? @RayRoestenburg @andreaTP
This should be retried with 0.2.0, which upgrades to Spark 3.2.0. Closing this ticket for that reason.
Trying out a Spark project with protobuf, the following shows up when the executor runs:
This is probably caused by multiple protobuf libraries on the classpath.
Looking at sparksql-scalapb, Hadoop contains an outdated protobuf library that needs to be shaded:
https://scalapb.github.io/docs/sparksql/#setting-up-your-project
https://github.com/thesamet/sparksql-scalapb-test/blob/master/build.sbt
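For reference, the shading setup described on the ScalaPB sparksql page boils down to a `build.sbt` fragment like the one below. It is a sketch that assumes the sbt-assembly plugin is enabled in `project/plugins.sbt`; the `shadeproto` and `shadecompat` prefixes are just the relocation names the docs use, and any other unused prefix would do.

```scala
// build.sbt: requires sbt-assembly (not used by Cloudflow, see below)
assembly / assemblyShadeRules := Seq(
  // Relocate our protobuf runtime so it cannot clash with the older copy on Spark/Hadoop's classpath
  ShadeRule.rename("com.google.protobuf.**" -> "shadeproto.@1").inAll,
  // Spark 3 also ships an incompatible scala-collection-compat, so relocate that too
  ShadeRule.rename("scala.collection.compat.**" -> "shadecompat.@1").inAll
)
```

The rules only take effect in the fat jar that `sbt assembly` produces, which is why this approach does not carry over directly to Cloudflow's image packaging.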
But Cloudflow does not use sbt-assembly, so we need to find another way to ensure that the right protobuf library is used.
(This could also have another cause, but we should be able to tell from the dependencies.)
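One quick way to confirm the duplicate-library theory is to ask the class loader for every classpath entry that contains a protobuf class: more than one URL means more than one protobuf runtime is visible. The sketch below uses only the JDK `ClassLoader.getResources` API; the object name `FindProtobufJars` is hypothetical, and it has to be run with the same classpath as the Spark executor to be meaningful.

```scala
import java.net.URL
import scala.collection.JavaConverters._ // Scala 2.12-compatible (Spark 3.1.x is built for 2.12)

// Hypothetical diagnostic helper: lists every classpath entry that provides a given class.
object FindProtobufJars {

  /** All URLs visible to the context class loader that contain `className`'s class file. */
  def locations(className: String): List[URL] = {
    val resource = className.replace('.', '/') + ".class"
    Thread.currentThread().getContextClassLoader
      .getResources(resource) // java.util.Enumeration[URL], one entry per jar/directory
      .asScala
      .toList
  }

  def main(args: Array[String]): Unit = {
    val found = locations("com.google.protobuf.Message")
    if (found.isEmpty) println("com.google.protobuf.Message is not on the classpath")
    else found.foreach(println) // more than one line => several protobuf runtimes are visible
  }
}
```

The same call can be made from inside a streamlet or a Spark task to inspect the executor's class loader instead of the driver's.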
We should probably try https://github.com/coursier/sbt-shading to shade the dependencies.
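A possible starting point with sbt-shading is sketched below. It assumes the setting keys described in the sbt-shading README (`ShadingPlugin`, `shadedModules`, `shadingRules`, `ShadingRule.moveUnder`, `validNamespaces`); the plugin version `x.y.z` and the `com.example` namespace are placeholders, and none of this has been tried with Cloudflow's Spark plugin.

```scala
// project/plugins.sbt (x.y.z is a placeholder for the current sbt-shading release)
addSbtPlugin("io.get-coursier" % "sbt-shading" % "x.y.z")

// build.sbt, in the project whose jar should carry its own protobuf runtime
enablePlugins(ShadingPlugin)

// Bundle protobuf-java into this project's jar instead of exposing it as a dependency
shadedModules += "com.google.protobuf" % "protobuf-java"

// Relocate com.google.protobuf.* to shaded.com.google.protobuf.*
shadingRules += ShadingRule.moveUnder("com.google.protobuf", "shaded")

// Packages allowed in the final jar; "com.example" stands in for the project's own package
validNamespaces ++= Set("com.example", "shaded")
```

Unlike sbt-assembly, this shades at the level of a single published artifact rather than a fat jar, which is closer to how Cloudflow assembles its images.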