From f3f3b6353874bd19aa5c720a4a55ccebf29cafb1 Mon Sep 17 00:00:00 2001
From: Luca Canali
Date: Thu, 25 Jul 2019 01:51:51 -0700
Subject: [PATCH] [SPARK-26995][K8S][2.4] Make ld-linux-x86-64.so.2 visible to snappy native library under /lib in docker image with Alpine Linux

## What changes were proposed in this pull request?

This is a backport of #23898.

Running Spark in a Docker image with Alpine Linux 3.9.0 throws errors when using snappy. The issue can be reproduced, for example, as follows:

`Seq(1,2).toDF("id").write.format("parquet").save("DELETEME1")`

The key part of the error stack is as follows:

`SparkException: Task failed while writing rows.
....
Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-2b4872f1-7c41-4b84-bda1-dbcb8dd0ce4c-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.7-2b4872f1-7c41-4b84-bda1-dbcb8dd0ce4c-libsnappyjava.so)`

The source of the error appears to be that libsnappyjava.so needs ld-linux-x86-64.so.2 and looks for it in /lib, while in Alpine Linux 3.9.0 with libc6-compat version 1.1.20-r3, ld-linux-x86-64.so.2 is located in /lib64.

Note: this issue is not present with Alpine Linux 3.8 and libc6-compat version 1.1.19-r10.

The workaround proposed with this PR is to modify the Dockerfile by adding a symbolic link between /lib and /lib64 so that ld-linux-x86-64.so.2 can be found in /lib. This is probably not the cleanest solution, but it matches what already happened/happens when using Alpine Linux 3.8.1 (a version of Alpine Linux which was not affected by the issue reported here).

## How was this patch tested?

Manually tested by running a simple workload with spark-shell, using Docker on a client machine and Spark on a Kubernetes cluster. The test workload is:

`Seq(1,2).toDF("id").write.format("parquet").save("DELETEME1")`

Also added a test to KubernetesSuite / BasicTestsSuite.

Closes #25255 from dongjoon-hyun/SPARK-26995.
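The effect of the symlink can be sketched outside Docker (hypothetical scratch directory under `mktemp`, not the real /lib): making `lib64` a symlink to `lib` means anything an installer writes under `lib64` is also visible under `lib`, which is the analogue of running `ln -s /lib /lib64` before `apk add libc6-compat`.

```shell
# Sketch only: illustrative paths, not the real root filesystem.
root=$(mktemp -d)
mkdir "$root/lib"
# Analogue of `ln -s /lib /lib64` in the Dockerfile.
ln -s lib "$root/lib64"
# A package installing into lib64 actually writes into lib...
touch "$root/lib64/ld-linux-x86-64.so.2"
# ...so a loader that searches lib finds the file.
test -e "$root/lib/ld-linux-x86-64.so.2" && echo "visible under lib"
```

Note that the Dockerfile creates the symlink before installing libc6-compat, so the package's files land in /lib from the start.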
Authored-by: Luca Canali
Signed-off-by: Dongjoon Hyun
---
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile  | 1 +
 .../spark/deploy/k8s/integrationtest/BasicTestsSuite.scala   | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
index 196b74f8241ea..f0c6428c6a367 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -30,6 +30,7 @@ ARG k8s_tests=kubernetes/tests
 
 RUN set -ex && \
     apk upgrade --no-cache && \
+    ln -s /lib /lib64 && \
     apk add --no-cache bash tini libc6-compat linux-pam nss && \
     mkdir -p /opt/spark && \
     mkdir -p /opt/spark/work-dir && \
diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/BasicTestsSuite.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/BasicTestsSuite.scala
index 4e749c40563dc..3c1d9ea54ab2c 100644
--- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/BasicTestsSuite.scala
+++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/BasicTestsSuite.scala
@@ -52,6 +52,9 @@ private[spark] trait BasicTestsSuite { k8sSuite: KubernetesSuite =>
   }
 
   test("Run SparkPi with an argument.", k8sTestTag) {
+    // This additional configuration with snappy is for SPARK-26995
+    sparkAppConf
+      .set("spark.io.compression.codec", "snappy")
     runSparkPiAndVerifyCompletion(appArgs = Array("5"))
   }