Spark Base Image

Dockerfile setup for Spark, bundled with a varying set of Python data science packages.

This setup is mainly useful for running Spark workers as Docker containers, since PySpark needs its Python requirements to be installed on the workers.

All build argument values are interpolated from the file templates/vars.yml, including the set of pip packages, which includes the following:

attrs
numpy
pandas
pendulum==1.4.4
pyjwt
pyproj
python-dateutil
shapely
requests

Note that the versions might not be the latest, and there may be discrepancies between Python 2.7 and Python 3.y due to some of the package requirements.

Example Docker build command

BASE_VERSION="v2"
SPARK_VERSION="2.4.4"
SCALA_VERSION="2.12"
HADOOP_VERSION="3.1.0"
PYTHON_VERSION="3.7"
PACKAGE_SET="attrs~=19.3 numpy~=1.17 pandas~=0.25.0 pendulum==1.4.4 pyjwt~=1.5 pyproj~=1.9 python-dateutil~=2.8 shapely~=1.6 requests~=2.22"

docker build debian/ -t spark-base-debian \
    --build-arg "BASE_VERSION=${BASE_VERSION}" \
    --build-arg "SPARK_VERSION=${SPARK_VERSION}" \
    --build-arg "SCALA_VERSION=${SCALA_VERSION}" \
    --build-arg "HADOOP_VERSION=${HADOOP_VERSION}" \
    --build-arg "PYTHON_VERSION=${PYTHON_VERSION}" \
    --build-arg "PACKAGE_SET=${PACKAGE_SET}"
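
As a small sanity check before running the build above, the following sketch verifies that every build argument variable is set and non-empty. The helper name check_build_vars is hypothetical and not part of this repository, and the sketch assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): report any variable
# that is unset or empty. Prints "missing: NAME" for each one and returns
# 1 if at least one is missing, 0 otherwise.
check_build_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is to call check_build_vars with the six variable names (BASE_VERSION, SPARK_VERSION, SCALA_VERSION, HADOOP_VERSION, PYTHON_VERSION, PACKAGE_SET) and run docker build only if it succeeds.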

How to Apply Travis Template

Linux users can download Tera CLI v0.2 from https://github.com/guangie88/tera-cli/releases and place the binary in a directory on PATH.

Otherwise, you will need cargo, which can be installed via rustup.

Once cargo is installed, simply run cargo install tera-cli --version=^0.2.0.

Always make changes in templates/ci.yml.tmpl rather than in .github/workflows/ci.yml directly, since the template is applied onto .github/workflows/ci.yml.

Run templates/apply-vars.sh to apply the template once tera-cli has been installed.
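
The order of these steps matters: the apply script needs the Tera CLI on PATH. A minimal sketch of a guard for that is shown below. The function name run_if_available is hypothetical, and the assumption that the installed binary is called tera should be checked against your installation.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repository): run a command only if
# a required tool is found on PATH.
# Usage: run_if_available <required-command> <command-to-run> [args...]
run_if_available() {
    cmd="$1"
    shift
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "error: '$cmd' not found in PATH" >&2
        return 127
    fi
    # The required tool exists, so run the remaining arguments as a command.
    "$@"
}
```

For example, run_if_available tera templates/apply-vars.sh would apply the template only when a tera binary is available, and fail with a clear message otherwise.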
