merged with master, updated docker scripts to include stream-kafka-*
cfregly committed Mar 28, 2017
2 parents d1dc50a + 22f07dc commit 61c827e
Showing 644 changed files with 12,144 additions and 290,295 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -37,7 +37,7 @@

![Google Cloud Platform](http://pipeline.io/images/gce-logo-190x90.png)

* PipelineIO on [**Azure**](Setup-Pipeline-Azure)
* PipelineIO on [**Azure**](https://github.com/fluxcapacitor/pipeline/wiki/Setup-Pipeline-Azure)

![Azure](http://pipeline.io/images/azure-logo-200x103.png)

49 changes: 45 additions & 4 deletions gpu.ml/Dockerfile
@@ -8,20 +8,23 @@ RUN \
&& conda install --yes -c conda-forge notebook=4.4.1 \
&& conda install --yes -c conda-forge findspark=1.0.0 \
&& conda install --yes -c conda-forge jupyter_contrib_nbextensions=0.2.4 \
&& conda install --yes -c anaconda-nb-extensions anaconda-nb-extensions=1.0.0
&& conda install --yes -c conda-forge ipywidgets=6.0.0 \
&& conda install --yes -c anaconda-nb-extensions anaconda-nb-extensions=1.0.0

RUN \
pip install jupyterlab \
&& jupyter serverextension enable --py jupyterlab --sys-prefix

RUN \
jupyter nbextension enable --py widgetsnbextension --sys-prefix

# Install non-secure dummyauthenticator for jupyterhub (dev purposes only)
RUN \
pip install jupyterhub-dummyauthenticator

RUN \
pip install jupyterhub-simplespawner


COPY lib/ lib/

RUN \
@@ -64,6 +67,10 @@ ENV \
HADOOP_CONF=${HADOOP_HOME}/etc/hadoop/ \
PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}

# Required by Tensorflow
ENV \
HADOOP_HDFS_HOME=${HADOOP_HOME}

# Required by Tensorflow for HDFS
RUN \
echo 'export CLASSPATH=$(${HADOOP_HDFS_HOME}/bin/hadoop classpath --glob)' >> /root/.bashrc
@@ -125,9 +132,43 @@ RUN \
COPY src/ src/
COPY notebooks/ notebooks/
COPY profiles/ /root/.ipython/
COPY html/ html/
COPY run run

# Expose Spark Worker Port for Web Admin UI
EXPOSE 50070 39000 9000 6006 8754 7077 6066 6060 6061 4040 4041 4042 4043 4044
ENV \
TF_CPP_MIN_LOG_LEVEL=0 \
TF_XLA_FLAGS=--xla_generate_hlo_graph=.*

ENV \
PATH=$TENSORFLOW_HOME/bazel-bin/tensorflow/tools/graph_transforms:$TENSORFLOW_HOME/bazel-bin/tensorflow/compiler/aot:$TENSORFLOW_HOME/bazel-bin/tensorflow/compiler/tests:$TENSORFLOW_HOME/bazel-bin/tensorflow/examples/tutorials/word2vec:$TENSORFLOW_HOME/bazel-bin/tensorflow/examples/tutorials/mnist:$PATH

RUN \
mkdir -p /root/tensorboard

RUN \
mkdir -p /root/models

# Apache2
RUN \
apt-get install -y apache2

RUN \
a2enmod proxy \
&& a2enmod proxy_http \
&& a2dissite 000-default

RUN \
mv /var/www/html /var/www/html.orig

RUN \
mv /etc/apache2/apache2.conf /etc/apache2/apache2.conf.orig

# All paths (dirs, not files) up to and including /root must have +x permissions.
# It's just the way linux works. Don't fight it.
# http://askubuntu.com/questions/537032/how-to-configure-apache2-with-symbolic-links-in-var-www
RUN \
chmod a+x /root

EXPOSE 80 50070 39000 9000 6006 8754 7077 6066 6060 6061 4040 4041 4042 4043 4044
# 8000
CMD ["supervise", "."]
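
The `chmod a+x /root` step in the Dockerfile follows from how directory traversal works: a process can reach a file only if every directory on the path grants it the execute (search) bit. Below is a minimal Python sketch of that check, assuming the web server user falls into the "other" permission class and that the path exists. The helper name `dirs_missing_search_bit` is hypothetical and not part of this commit.

```python
import os
import stat
from pathlib import Path


def dirs_missing_search_bit(path):
    """Return the directories on the way to `path` that lack o+x.

    Hypothetical helper for illustration: it only inspects the "other"
    execute bit, so it assumes the server user is neither the owner nor
    in the group of these directories.
    """
    p = Path(path).resolve()
    # Include the path itself when it is a directory, plus all ancestors.
    candidates = [p, *p.parents] if p.is_dir() else list(p.parents)
    missing = []
    for directory in reversed(candidates):  # walk from / downwards
        mode = os.stat(directory).st_mode
        if not mode & stat.S_IXOTH:
            missing.append(str(directory))
    return missing


if __name__ == "__main__":
    # An empty list means every directory on the path is traversable.
    print(dirs_missing_search_bit("/var/www"))
```

Running it against a document root that is symlinked into `/root` would list `/root` until the `chmod a+x /root` step has been applied.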
4 changes: 3 additions & 1 deletion gpu.ml/README.md
@@ -1 +1,3 @@
[Tensorflow + GPU in Docker](https://github.com/fluxcapacitor/pipeline/wiki/TensorFlow-GPU-in-Docker)
[AWS GPU + Tensorflow + Spark + HDFS + Docker](https://github.com/fluxcapacitor/pipeline/wiki/AWS-GPU-Tensorflow-Docker)

[Google Cloud GPU + Tensorflow + Spark + HDFS + Docker](https://github.com/fluxcapacitor/pipeline/wiki/GCP-GPU-Tensorflow-Docker)
224 changes: 224 additions & 0 deletions gpu.ml/config/apache2/apache2.conf
@@ -0,0 +1,224 @@
# This is the main Apache server configuration file. It contains the
# configuration directives that give the server its instructions.
# See http://httpd.apache.org/docs/2.4/ for detailed information about
# the directives and /usr/share/doc/apache2/README.Debian about Debian specific
# hints.
#
#
# Summary of how the Apache 2 configuration works in Debian:
# The Apache 2 web server configuration in Debian is quite different to
# upstream's suggested way to configure the web server. This is because Debian's
# default Apache2 installation attempts to make adding and removing modules,
# virtual hosts, and extra configuration directives as flexible as possible, in
# order to make automating the changes and administering the server as easy as
# possible.

# It is split into several files forming the configuration hierarchy outlined
# below, all located in the /etc/apache2/ directory:
#
# /etc/apache2/
# |-- apache2.conf
# | `-- ports.conf
# |-- mods-enabled
# | |-- *.load
# | `-- *.conf
# |-- conf-enabled
# | `-- *.conf
# `-- sites-enabled
# `-- *.conf
#
#
# * apache2.conf is the main configuration file (this file). It puts the pieces
# together by including all remaining configuration files when starting up the
# web server.
#
# * ports.conf is always included from the main configuration file. It is
# supposed to determine listening ports for incoming connections which can be
# customized anytime.
#
# * Configuration files in the mods-enabled/, conf-enabled/ and sites-enabled/
# directories contain particular configuration snippets which manage modules,
# global configuration fragments, or virtual host configurations,
# respectively.
#
# They are activated by symlinking available configuration files from their
# respective *-available/ counterparts. These should be managed by using our
# helpers a2enmod/a2dismod, a2ensite/a2dissite and a2enconf/a2disconf. See
# their respective man pages for detailed information.
#
# * The binary is called apache2. Due to the use of environment variables, in
# the default configuration, apache2 needs to be started/stopped with
# /etc/init.d/apache2 or apache2ctl. Calling /usr/bin/apache2 directly will not
# work with the default configuration.


# Global configuration
#

#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# NOTE! If you intend to place this on an NFS (or otherwise network)
# mounted filesystem then please read the Mutex documentation (available
# at <URL:http://httpd.apache.org/docs/2.4/mod/core.html#mutex>);
# you will save yourself a lot of trouble.
#
# Do NOT add a slash at the end of the directory path.
#
#ServerRoot "/etc/apache2"
ServerName localhost
#
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
#
Mutex file:${APACHE_LOCK_DIR} default

#
# PidFile: The file in which the server should record its process
# identification number when it starts.
# This needs to be set in /etc/apache2/envvars
#
PidFile ${APACHE_PID_FILE}

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 5


# These need to be set in /etc/apache2/envvars
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}

#
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 204.62.129.132 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
# nameserver.
#
HostnameLookups Off

# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here. If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog ${APACHE_LOG_DIR}/error.log

#
# LogLevel: Control the severity of messages logged to the error_log.
# Available values: trace8, ..., trace1, debug, info, notice, warn,
# error, crit, alert, emerg.
# It is also possible to configure the log level for particular modules, e.g.
# "LogLevel info ssl:warn"
#
LogLevel warn

# Include module configuration:
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf

# Include list of ports to listen on
Include ports.conf


# Sets the default security model of the Apache2 HTTPD server. It does
# not allow access to the root filesystem outside of /usr/share and /var/www.
# The former is used by web applications packaged in Debian,
# the latter may be used for local directories served by the web server. If
# your system is serving content from a sub-directory in /srv you must allow
# access here, or in any related virtual host.
<Directory />
Options FollowSymLinks
AllowOverride None
Require all denied
</Directory>

<Directory /usr/share>
AllowOverride None
Require all granted
</Directory>

<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>

#<Directory /root/html/www/>
# Options FollowSymLinks
# AllowOverride None
# Require all granted
#</Directory>

#<Directory /srv/>
# Options Indexes FollowSymLinks
# AllowOverride None
# Require all granted
#</Directory>

# AccessFileName: The name of the file to look for in each directory
# for additional configuration directives. See also the AllowOverride
# directive.
#
AccessFileName .htaccess

#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
<FilesMatch "^\.ht">
Require all denied
</FilesMatch>


#
# The following directives define some format nicknames for use with
# a CustomLog directive.
#
# These deviate from the Common Log Format definitions in that they use %O
# (the actual bytes sent including headers) instead of %b (the size of the
# requested file), because the latter makes it impossible to detect partial
# requests.
#
# Note that the use of %{X-Forwarded-For}i instead of %h is not recommended.
# Use mod_remoteip instead.
#
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

# Include of directories ignores editors' and dpkg's backup files,
# see README.Debian for details.

# Include generic snippets of statements
IncludeOptional conf-enabled/*.conf

# Include the virtual host configurations:
IncludeOptional sites-enabled/*.conf

# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
53 changes: 53 additions & 0 deletions gpu.ml/config/apache2/www.conf
@@ -0,0 +1,53 @@
<VirtualHost *:80>
# The ServerName directive sets the request scheme, hostname and port that
# the server uses to identify itself. This is used when creating
# redirection URLs. In the context of virtual hosts, the ServerName
# specifies what hostname must appear in the request's Host: header to
# match this virtual host. For the default virtual host (this file) this
# value is not decisive as it is used as a last resort host regardless.
# However, you must set it for any further virtual host explicitly.
ServerName datasticks.com

ServerAdmin chris@fregly.com
DocumentRoot /var/www/html/
DirectoryIndex index.html

# Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
# error, crit, alert, emerg.
# It is also possible to configure the loglevel for particular
# modules, e.g.
#LogLevel info ssl:warn

#ErrorLog /root/logs/apache2/error.log
#CustomLog /root/logs/apache2/access.log combined

# For most configuration files from conf-available/, which are
# enabled or disabled at a global level, it is possible to
# include a line for only one particular virtual host. For example the
# following line enables the CGI configuration for this host only
# after it has been globally disabled with "a2disconf".
#Include conf-available/serve-cgi-bin.conf
ProxyRequests Off
ProxyPreserveHost On

<Proxy *>
Require all granted
</Proxy>

<Location /zeppelin>
ProxyPass http://zeppelin.demo.pipeline.io
Require all granted
</Location>
<Location /jupyter>
ProxyPass http://jupyter.demo.pipeline.io
Require all granted
</Location>
<Location /spark>
ProxyPass http://spark.demo.pipeline.io
Require all granted
</Location>
<Location /prediction>
ProxyPass http://prediction-pmml-aws.demo.pipeline.io
Require all granted
</Location>
</VirtualHost>
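
The `<Location>` blocks in `www.conf` map a URL path prefix to a backend host. The sketch below approximates that mapping in Python for three of the prefixes. It is a simplified model, not Apache's actual matching code: it assumes the prefix is stripped and the remainder of the path is appended to the `ProxyPass` target, and the names `ROUTES` and `resolve_backend` are hypothetical.

```python
from typing import Optional

# Prefix -> backend pairs copied from the <Location> blocks in www.conf.
ROUTES = {
    "/zeppelin": "http://zeppelin.demo.pipeline.io",
    "/spark": "http://spark.demo.pipeline.io",
    "/prediction": "http://prediction-pmml-aws.demo.pipeline.io",
}


def resolve_backend(path: str) -> Optional[str]:
    """Return the backend URL for a request path, or None if no prefix matches."""
    for prefix, backend in ROUTES.items():
        # Match the prefix itself or anything below it, but not "/sparkle".
        if path == prefix or path.startswith(prefix + "/"):
            return backend + path[len(prefix):]
    return None  # would be served from DocumentRoot instead


if __name__ == "__main__":
    print(resolve_backend("/spark/jobs"))
    print(resolve_backend("/index.html"))
```

Requests that match none of the prefixes fall through to the static `DocumentRoot`, which the sketch represents by returning `None`.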
2 changes: 1 addition & 1 deletion gpu.ml/config/jupyterhub/jupyterhub_config.py
@@ -292,7 +292,7 @@
c.Spawner.disable_user_config = True

# Whitelist of environment variables for the subprocess to inherit
c.Spawner.env_keep = ['CUDA_PKG_VERSION', 'CUDA_VERSION', 'CUDNN_VERSION', 'HADOOP_CONF', 'HADOOP_CONF_DIR', 'HADOOP_HOME', 'HADOOP_OPTS', 'HADOOP_VERSION', 'HOME', 'HOSTNAME', 'JAVA_HOME', 'LD_LIBRARY_PATH', 'LIBRARY_PATH', 'PATH', 'PYSPARK_VERSION', 'PYTHONPATH', 'CONDA_ROOT', 'CONDA_DEFAULT_ENV', 'VIRTUAL_ENV', 'LANG', 'LC_ALL', 'SPARK_HOME', 'SPARK_VERSION', 'TENSORFLOW_VERSION', 'PYSPARK_PYTHON', 'SPARK_MASTER', 'PYSPARK_SUBMIT_ARGS', 'SPARK_SUBMIT_ARGS']
c.Spawner.env_keep = ['CUDA_PKG_VERSION', 'CUDA_VERSION', 'CUDNN_VERSION', 'HADOOP_CONF', 'HADOOP_CONF_DIR', 'HADOOP_HOME', 'HADOOP_OPTS', 'HADOOP_VERSION', 'HOME', 'HOSTNAME', 'JAVA_HOME', 'LD_LIBRARY_PATH', 'LIBRARY_PATH', 'PATH', 'PYSPARK_VERSION', 'PYTHONPATH', 'CONDA_ROOT', 'CONDA_DEFAULT_ENV', 'VIRTUAL_ENV', 'LANG', 'LC_ALL', 'SPARK_HOME', 'SPARK_VERSION', 'TENSORFLOW_VERSION', 'PYSPARK_PYTHON', 'SPARK_MASTER', 'PYSPARK_SUBMIT_ARGS', 'SPARK_SUBMIT_ARGS', 'TF_CPP_MIN_LOG_LEVEL', 'TF_XLA_FLAGS']

# Environment variables to load for the Spawner.
#
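
The `env_keep` change matters because only variables named in that list are passed from the hub's environment to the spawned notebook process. A minimal sketch of that filtering idea follows. It is a simplified model rather than JupyterHub's implementation, and `inherited_env` and the `SECRET_TOKEN` entry are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable


def inherited_env(parent_env: Dict[str, str], env_keep: Iterable[str]) -> Dict[str, str]:
    """Keep only the whitelisted variables that are actually set in the parent."""
    return {name: parent_env[name] for name in env_keep if name in parent_env}


# Hypothetical parent environment; TF_XLA_FLAGS is deliberately unset here.
parent = {
    "PATH": "/usr/bin",
    "TF_CPP_MIN_LOG_LEVEL": "0",
    "SECRET_TOKEN": "hypothetical",
}
child = inherited_env(parent, ["PATH", "TF_CPP_MIN_LOG_LEVEL", "TF_XLA_FLAGS"])

if __name__ == "__main__":
    print(child)
```

Under this model, a variable such as `TF_CPP_MIN_LOG_LEVEL` set in the Dockerfile reaches the notebook kernel only once its name appears in the list, which is what the one-line change above does.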
2 changes: 2 additions & 0 deletions gpu.ml/datasets/hdfs/file1.csv
@@ -0,0 +1,2 @@
10,11,12,13,14
15,16,17,18,19
2 changes: 2 additions & 0 deletions gpu.ml/datasets/hdfs/file2.csv
@@ -0,0 +1,2 @@
90,91,92,93,94
95,96,97,98,99
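
The commit does not show how these two CSV files are consumed; the directory name suggests they are small fixtures for reading from HDFS. As a rough sketch under that assumption, the rows can be parsed into integer lists as shown here. The file contents are inlined as strings, and `parse_rows` is a hypothetical helper.

```python
import csv
import io

# Contents of file1.csv and file2.csv as added in this commit.
FILE1 = "10,11,12,13,14\n15,16,17,18,19\n"
FILE2 = "90,91,92,93,94\n95,96,97,98,99\n"


def parse_rows(text):
    """Parse comma-separated integer rows, skipping empty lines."""
    reader = csv.reader(io.StringIO(text))
    return [[int(value) for value in row] for row in reader if row]


rows = parse_rows(FILE1) + parse_rows(FILE2)

if __name__ == "__main__":
    for row in rows:
        print(row)
```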
Binary file added gpu.ml/datasets/mnist/t10k-images-idx3-ubyte.gz
Binary file not shown.
Binary file added gpu.ml/datasets/mnist/t10k-labels-idx1-ubyte.gz
Binary file not shown.
Binary file added gpu.ml/datasets/mnist/train-images-idx3-ubyte.gz
Binary file not shown.
Binary file added gpu.ml/datasets/mnist/train-labels-idx1-ubyte.gz
Binary file not shown.
