Executors don't resolve dependencies #141
Stupid question probably, but can we influence the executor pods? For example, add an init container with coursier to fetch all the dependencies stated in deps.
The operator lays out the pod templates for both the driver and executor, so it should be easily possible.
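A minimal sketch of the init-container idea discussed above, assuming coursier's `cs fetch` CLI; the cache directory stands for an assumed shared-volume mount, not an operator default:

```python
# Build the argv an init container could run to pre-fetch JVM dependencies
# with coursier before the driver/executor container starts.
# The cache directory is an illustrative assumption, not an operator default.

def coursier_fetch_cmd(coordinates, cache_dir="/stackable/deps"):
    """Return the coursier fetch command for the given Maven coordinates."""
    return ["cs", "fetch", "--cache", cache_dir, *coordinates]

cmd = coursier_fetch_cmd(
    ["org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1"]
)
print(" ".join(cmd))
```

The fetched jars would then have to be put on the executor classpath, e.g. via a volume mount shared with the main container.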
This may overlap a bit with #117.
Fixed in apache/spark#38828, due for inclusion in 3.4.0 (not released yet). |
Maybe I'm doing something wrong, but I tested this quickly and SparkWrite wasn't picked up with
According to this, new releases come out every couple of months: as 3.3.0 was released in July 2022, 3.4.0 could be on its way in the medium term. I was not able to find a roadmap date for it, though. This issue applies to JVM dependencies, not Python ones (which we install ourselves). The current workaround is to use an image (based on a Stackable one) with the resolved dependencies baked in (we do this in the stackablectl datalake demo).
Thanks for the update, Andrew! @lfrancke I'd suggest that we move this into "track" and wait for upstream, instead of spending time on developing a workaround.
Sounds good to me!
Thanks for the quick response, I'm moving the ticket then!
There is a 3.4.0-rc1 version now.
3.4.0 is officially released.
@lfrancke the new version should fix this, but we didn't want to move it to the next column until the LTS discussion.
I'm fine with 3.4 or do you see any reason not to support it? |
Dependency solving is still broken; see the PR above. Proposal: discuss introducing a mechanism (via an init container) that provisions dependencies on both drivers and executors before the application is submitted. Note that vector logging already does some pre-provisioning and already sets
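The proposed mechanism could be sketched as a mutation of the driver and executor pod templates; container image, volume name, and mount path below are illustrative assumptions, not anything the operator currently does:

```python
# Hypothetical sketch: inject an init container into a pod spec that
# provisions dependencies into a shared emptyDir before Spark starts.
# Image, volume name, and mount path are assumptions for illustration.

def add_dep_init_container(pod_spec, packages,
                           volume="job-deps", mount="/stackable/deps"):
    """Append a dependency-provisioning init container to `pod_spec`."""
    pod_spec.setdefault("volumes", []).append(
        {"name": volume, "emptyDir": {}}
    )
    pod_spec.setdefault("initContainers", []).append({
        "name": "provision-deps",
        "image": "coursier/coursier",  # assumed image
        "command": ["cs", "fetch", "--cache", mount, *packages],
        "volumeMounts": [{"name": volume, "mountPath": mount}],
    })
    return pod_spec

spec = add_dep_init_container(
    {}, ["org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1"]
)
```

Applying the same mutation to both pod templates would keep driver and executor classpaths in sync, which is the core of the problem reported here.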
Possible next steps:
Part of #141. Reminder: cherry-pick to `release-23.4` after the `main` merge.
Update: while setting up JupyterHub with spark-k8s accessing HDFS, we did find a setup that correctly resolved dependencies.
I'm sorry, I lost track. Can you briefly explain why this is closed again?
We closed it because we have fixed the reported bug. Now you can e.g. pull in Iceberg, but not JDBC drivers, if I understood correctly.
Affected version
0.5.0
Current and expected behavior
Following https://iceberg.apache.org/docs/latest/getting-started/
Current
Use
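The configuration elided above likely resembled the Iceberg getting-started settings for Spark; a minimal sketch, where the catalog name and warehouse path are assumptions rather than the reporter's exact setup:

```python
# Sketch of Iceberg-on-Spark settings in the style of the getting-started
# guide; catalog name and warehouse path are illustrative assumptions.
iceberg_conf = {
    # Executors must also load this package; per this issue, only the
    # driver resolved it.
    "spark.jars.packages":
        "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1",
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.local.type": "hadoop",
    "spark.sql.catalog.local.warehouse": "/tmp/warehouse",  # assumed path
}

for key, value in iceberg_conf.items():
    print(f"--conf {key}={value}")
```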
Driver logs:
Executors do not pull the dependencies and fail with
java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory
(should come with org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1)
Expected
Drivers and executors pull the dependencies.
Possible solution
No response
Additional context
No response
Environment
Would you like to work on fixing this bug?
maybe