
Personal Dockerfile for Machine Learning and BigData processing

If you came here looking for Jupyter Docker images, I recommend checking the following repository instead:
https://github.com/jupyter/docker-stacks

2 Dockerfiles

  • spark-cluster
  • dev-env

spark-cluster

This Dockerfile is currently based on jupyter/pyspark-notebook.
The following packages are installed on top of it:

  • Scala 2.11
  • Apache Mahout 0.13.0
  • Keras
  • TensorFlow
  • OpenCV
  • Elephas
  • a Python API for retrieving data from Poloniex

This Docker image is mainly used to bring up a Spark cluster.

dev-env

This Dockerfile is based on spark-cluster.
The following additional packages are installed:

  • Emacs
  • Angular
  • SBT
  • Maven
  • openjdk8-jdk

This is my development environment.

Building

docker image build -t [docker image name] .
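
For example, to build both images (the image names below are arbitrary placeholders; the directory names follow the repository layout, and dev-env builds on spark-cluster, so build that one first):

cd spark-cluster
docker image build -t my-spark-cluster .
cd ../dev-env
docker image build -t my-dev-env .

If the FROM line in the dev-env Dockerfile references a fixed image name, tag the spark-cluster image to match it.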

How to use

After building the images, run the following command to start a new Docker container.

docker container run --name=[docker container name] -it -v [host folder]:/home/jupyter/workspace -p [host port]:8888 [docker image name]
  • --name   assign [docker container name] to the container.
  • -it      run the container in interactive mode with a TTY attached.
  • -v       mount [host folder] as /home/jupyter/workspace in the container. The workspace folder is the root directory of the Jupyter notebook server.
  • -p       bind container port 8888 to [host port].
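
For example (the container name, host folder, and port below are placeholders; substitute your own):

docker container run --name=spark-notebook -it -v ~/notebooks:/home/jupyter/workspace -p 8888:8888 my-spark-cluster

With -p 8888:8888, the notebook server should then be reachable at http://localhost:8888 on the host.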

Alternatively, you can use dev-env/spark-notebook-start.sh to run a container instance.
Please note that spark-notebook-start.sh starts a Jupyter notebook server without any authentication.

PYTHONPATH

As I use personal libraries in my daily work, I mount them into the container at a directory that is on PYTHONPATH:

-v [host PYTHONPATH folder]:/opt/pythonlibs
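
Combined with the workspace mount, a full invocation might look like this (the host paths and names are placeholders):

docker container run --name=spark-notebook -it -v ~/notebooks:/home/jupyter/workspace -v ~/pythonlibs:/opt/pythonlibs -p 8888:8888 my-spark-cluster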

Bring up a Spark Cluster

Copy spark-cluster/docker-compose.yml to anywhere you like, then bring up a Spark cluster with the following command:

docker-compose up
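
To run the cluster in the background and scale the number of worker containers, something like the following should work, assuming the compose file defines a worker service (the service name "worker" below is an assumption; check spark-cluster/docker-compose.yml for the actual name):

docker-compose up -d
docker-compose up -d --scale worker=3   # "worker" is a hypothetical service name
docker-compose down                     # stop and remove the cluster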
