Merge pull request #1279 from mathbunnyru/asalikhov/archive_spark
Install spark from archive.apache.org to be able to use old versions
romainx committed May 4, 2021
2 parents 29ca05f + ef470a6 commit f46d59f
Showing 2 changed files with 2 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/using/specifics.md
@@ -12,7 +12,7 @@ This page provides details about features specific to one or more images.
 
 You can build a `pyspark-notebook` image (and also the downstream `all-spark-notebook` image) with a different version of Spark by overriding the default value of the following arguments at build time.
 
-* Spark distribution is defined by the combination of the Spark and the Hadoop version and verified by the package checksum, see [Download Apache Spark](https://spark.apache.org/downloads.html) for more information. At this time the build will only work with the set of versions available on the Apache Spark download page, so it will not work with the archived versions.
+* Spark distribution is defined by the combination of the Spark and the Hadoop version and verified by the package checksum, see [Download Apache Spark](https://spark.apache.org/downloads.html) and the [archive repo](https://archive.apache.org/dist/spark/) for more information.
 * `spark_version`: The Spark version to install (`3.0.0`).
 * `hadoop_version`: The Hadoop version (`3.2`).
 * `spark_checksum`: The package checksum (`BFE4540...`).
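The archive URL that the updated Dockerfile downloads is built purely from the two version arguments. A minimal sketch of that mapping, assuming the `spark-<version>/spark-<version>-bin-hadoop<hadoop>.tgz` layout used in the Dockerfile change; `spark_archive_url` is a hypothetical helper, not part of the repository:

```shell
#!/usr/bin/env bash
# Sketch: compute the archive.apache.org URL the Dockerfile fetches for a
# given Spark/Hadoop pair. spark_archive_url is a hypothetical helper.
set -eu

spark_archive_url() {
  local spark_version="$1"
  local hadoop_version="$2"
  local tgz="spark-${spark_version}-bin-hadoop${hadoop_version}.tgz"
  # Same path structure as the wget call in pyspark-notebook/Dockerfile.
  echo "https://archive.apache.org/dist/spark/spark-${spark_version}/${tgz}"
}

# Older releases remain reachable here after they leave the download page.
spark_archive_url "3.0.0" "3.2"
```

To build with such a version, the values would be passed at build time, for example `docker build --build-arg spark_version=3.0.0 --build-arg hadoop_version=3.2 --build-arg spark_checksum=<sha512> pyspark-notebook/`, where `<sha512>` is a placeholder for the SHA-512 hash that is normally published next to each tarball in the archive as a `.sha512` file.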
5 changes: 1 addition & 4 deletions pyspark-notebook/Dockerfile
@@ -29,10 +29,7 @@ RUN apt-get -y update && \
 
 # Spark installation
 WORKDIR /tmp
-# Using the preferred mirror to download Spark
-# hadolint ignore=SC2046
-RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json | \
-python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") && \
+RUN wget -q "https://archive.apache.org/dist/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" && \
 echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
 tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner && \
 rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"
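The `RUN` step verifies the download with `sha512sum -c` before extracting it, so a wrong `spark_checksum` fails the build. A self-contained sketch of that verify-then-extract pattern, assuming GNU coreutils `sha512sum` and using a locally generated stand-in archive instead of a real Spark release (the file name and contents are placeholders):

```shell
#!/usr/bin/env bash
# Sketch of the verify-then-extract pattern, using a dummy tarball so no
# network access is needed. File names are placeholders.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a tiny stand-in archive named like a Spark tarball.
mkdir -p payload
echo "hello" > payload/README
tgz="spark-0.0.0-bin-hadoop0.0.tgz"
tar czf "$tgz" payload

# In the Dockerfile the expected hash comes from the spark_checksum build arg;
# here it is computed locally so the check can succeed.
expected="$(sha512sum "$tgz" | cut -d' ' -f1)"

# Same check-line format as the Dockerfile: "<hash> *<file>" ('*' = binary mode).
if echo "${expected} *${tgz}" | sha512sum -c -; then
  verify_ok=1
else
  verify_ok=0
fi

# A wrong hash makes sha512sum exit non-zero, which would abort the RUN step.
bad_hash="$(printf '%0128d' 0)"
if echo "${bad_hash} *${tgz}" | sha512sum -c - 2>/dev/null; then
  bad_rejected=0
else
  bad_rejected=1
fi

# Extract only after verification, as the Dockerfile does.
mkdir -p out
tar xzf "$tgz" -C out
echo "verify_ok=${verify_ok} bad_rejected=${bad_rejected}"
```

Because the checks are chained with `&&` in a single `RUN`, the tarball is never extracted into `/usr/local` unless its hash matches.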