Zeppelin: Add Zeppelin image to Spark example #17025
@@ -1,4 +1,4 @@
-FROM java:latest
+FROM java:openjdk-8-jdk
 
 ENV hadoop_ver 2.6.1
 ENV spark_ver 1.5.1

@@ -0,0 +1,66 @@
+# Copyright 2015 The Kubernetes Authors All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Based heavily on
+# https://github.com/dylanmei/docker-zeppelin/blob/master/Dockerfile
+# (which is similar to many others out there), but rebased onto maven
+# image.
+#
+# This image is a composition of the official docker-maven
+# Docker image from https://github.com/carlossg/docker-maven/ and
+# spark-base.
+
+FROM gcr.io/google_containers/spark-base:latest
+
+ENV ZEPPELIN_TAG v0.5.5
+ENV MAVEN_VERSION 3.3.3
+ENV SPARK_MINOR 1.5
+ENV SPARK_PATCH 1
+ENV SPARK_VER ${SPARK_MINOR}.${SPARK_PATCH}
+ENV HADOOP_MINOR 2.6
+ENV HADOOP_PATCH 1
+ENV HADOOP_VER ${HADOOP_MINOR}.${HADOOP_PATCH}
+
+RUN curl -fsSL http://archive.apache.org/dist/maven/maven-3/${MAVEN_VERSION}/binaries/apache-maven-${MAVEN_VERSION}-bin.tar.gz | tar xzf - -C /usr/share \
Review comment: You use curl here, though you install it further down on line 46. (Though I changed the base image, so perhaps this isn't a problem for you on
+  && mv /usr/share/apache-maven-${MAVEN_VERSION} /usr/share/maven \
+  && ln -s /usr/share/maven/bin/mvn /usr/bin/mvn
+
+ENV MAVEN_HOME /usr/share/maven
+
+# libfontconfig is a workaround for
+# https://github.com/karma-runner/karma/issues/1270, which caused a
+# build break similar to
+# https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg01586.html
+
+RUN apt-get update \
Review comment: Ohh man, do we really want to install mvn and devel tooling? Or poke upstream harder?
Review comment: All the Dockerfiles I found for Zeppelin built it. This image takes a long time to build. My plan was to use this PR as a foil to poke the upstream (especially on
Review comment: Binaries for Spark 1.4.0 are available, which would be preferable. @rnowling what version of Zeppelin do you have running, and does it work w/ Spark 1.5?
Review comment: Where? http://zeppelin-project.org/download.html The v0.5.5 tag (which I just noticed, and may peg to now - it happened while I was working) has no binaries on GitHub.
Review comment: I guess zeppelin-project.org is out of sync w/ https://zeppelin.incubator.apache.org/download.html
Review comment: @rnowling is Spark 1.5 w/ Zeppelin 0.5 something you can test out in O(1hr)? If so, please do.
Review comment: @mattf Zeppelin 0.5 doesn't support Spark 1.5 -- there is no Spark 1.5 profile under spark/pom.xml and (at least in my experience) Zeppelin is very specific about the version. The Zeppelin Dockerfile is using master, which includes the changes in 0.5.5 to support Spark 1.5. (Maybe best to bind a specific version instead of the latest from master.) I'm building and testing Z 0.5.5 w/ Spark 1.5 locally now.
Review comment: I have Zeppelin 0.5.5 working with Spark 1.5 -- Spark 1.5.1 is bundled as part of the spark-1.5 profile.
Review comment: Yes, this Dockerfile is based on master, which I was testing with as of a few days ago and is practically 0.5.5. As I mentioned in https://github.com/kubernetes/kubernetes/pull/17025/files#r44416358, I can shift it over to 0.5.5 now that there's a tag that makes any sense. Zeppelin needs a better release cadence, though - their previous release was very old.
Review comment: JFYI, if I'm looking at their
+  && apt-get install -y net-tools build-essential git wget unzip python python-setuptools python-dev python-numpy libfontconfig \
+  && apt-get clean \
+  && rm -rf /var/lib/apt/lists/*
+
+RUN git clone https://github.com/apache/incubator-zeppelin.git --branch ${ZEPPELIN_TAG} /opt/zeppelin
+RUN cd /opt/zeppelin && \
+  mvn clean package \
+    -Pspark-${SPARK_MINOR} -Dspark.version=${SPARK_VER} \
+    -Phadoop-${HADOOP_MINOR} -Dhadoop.version=${HADOOP_VER} \
+    -Ppyspark \
+    -DskipTests && \
+  rm -rf /root/.m2 && \
+  rm -rf /root/.npm && \
+  echo "Successfully built Zeppelin"
+
+ADD zeppelin-log4j.properties /opt/zeppelin/conf/log4j.properties
Review comment: Same comments as other entry around config. I would almost like to hold off further PRs until that is in place.
Review comment: Yeah, I may take a trip sideways while we figure out how to get a more polished story around a lot of things in this space. I wanted to get a vertical slice up first.
+ADD zeppelin-env.sh /opt/zeppelin/conf/zeppelin-env.sh
+ADD docker-zeppelin.sh /opt/zeppelin/bin/docker-zeppelin.sh
+EXPOSE 8080
+ENTRYPOINT ["/opt/zeppelin/bin/docker-zeppelin.sh"]
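The review thread above notes that Zeppelin is strict about matching the Spark profile to the Spark version. As a rough illustration of how the Dockerfile's version variables line up with the Maven flags, here is a minimal sketch. The helper name zeppelin_mvn_args is hypothetical and not part of this PR, and it assumes versions of the form MAJOR.MINOR.PATCH.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): derive the Maven profile flags
# used in the Dockerfile from full Spark and Hadoop version strings.
zeppelin_mvn_args() {
  local spark_ver="$1" hadoop_ver="$2"
  # Drop the patch component: 1.5.1 -> 1.5, 2.6.1 -> 2.6.
  local spark_minor="${spark_ver%.*}"
  local hadoop_minor="${hadoop_ver%.*}"
  echo "-Pspark-${spark_minor} -Dspark.version=${spark_ver}" \
       "-Phadoop-${hadoop_minor} -Dhadoop.version=${hadoop_ver}" \
       "-Ppyspark -DskipTests"
}

# Print the flags for the versions pinned in this Dockerfile.
zeppelin_mvn_args 1.5.1 2.6.1
```

Deriving the profile name from the full version keeps the two from drifting apart, which is the failure mode the thread describes for Zeppelin 0.5 with Spark 1.5.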
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# Copyright 2015 The Kubernetes Authors All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+export ZEPPELIN_HOME=/opt/zeppelin
+export ZEPPELIN_CONF_DIR="${ZEPPELIN_HOME}/conf"
+
+echo "=== Launching Zeppelin under Docker ==="
+/opt/zeppelin/bin/zeppelin.sh "${ZEPPELIN_CONF_DIR}"
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+# Copyright 2015 The Kubernetes Authors All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+export MASTER="spark://spark-master:7077"
+export SPARK_HOME=/opt/spark
+export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/opt/spark/lib/gcs-connector-latest-hadoop2.jar"
+# TODO(zmerlynn): Setting global CLASSPATH *should* be unnecessary,
+# but ZEPPELIN_JAVA_OPTS isn't enough here. :(
+export CLASSPATH="/opt/spark/lib/gcs-connector-latest-hadoop2.jar"
+export ZEPPELIN_NOTEBOOK_DIR="${ZEPPELIN_HOME}/notebook"
+export ZEPPELIN_MEM=-Xmx1024m
+export ZEPPELIN_PORT=8080
+export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
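The PYTHONPATH line above hardcodes the py4j version that ships with Spark 1.5.1, so it would need updating whenever the Spark base image changes. One possible alternative, shown only as a sketch, is to locate the zip with a glob. The helper find_py4j_zip is hypothetical and not part of this PR; the sketch assumes the zip lives under ${SPARK_HOME}/python/lib and is named py4j-*-src.zip, as in the line above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): print the first py4j source zip
# found in the given directory, or return non-zero if there is none.
find_py4j_zip() {
  local lib_dir="$1" candidate
  for candidate in "${lib_dir}"/py4j-*-src.zip; do
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "${candidate}" ]; then
      echo "${candidate}"
      return 0
    fi
  done
  return 1
}

SPARK_HOME="${SPARK_HOME:-/opt/spark}"
if py4j_zip="$(find_py4j_zip "${SPARK_HOME}/python/lib")"; then
  export PYTHONPATH="${SPARK_HOME}/python:${py4j_zip}"
else
  echo "py4j source zip not found under ${SPARK_HOME}/python/lib" >&2
fi
```

The trade-off is that a glob silently picks whatever version is present, whereas the hardcoded path fails loudly on a mismatch; which behavior is preferable depends on how the base image is versioned.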
@@ -0,0 +1,6 @@
+# Set everything to be logged to the console.
+log4j.rootCategory=INFO, console
+log4j.appender.console=org.apache.log4j.ConsoleAppender
+log4j.appender.console.target=System.err
+log4j.appender.console.layout=org.apache.log4j.PatternLayout
+log4j.appender.console.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n
@@ -1,19 +1,21 @@
 kind: ReplicationController
 apiVersion: v1
 metadata:
-  name: spark-driver-controller
+  name: zeppelin-controller
 spec:
   replicas: 1
   selector:
-    component: spark-driver
+    component: zeppelin
   template:
     metadata:
       labels:
-        component: spark-driver
+        component: zeppelin
     spec:
       containers:
-        - name: spark-driver
-          image: gcr.io/google_containers/spark-driver:1.5.1_v2
+        - name: zeppelin
+          image: gcr.io/google_containers/zeppelin:v0.5.5_v1
+          ports:
+            - containerPort: 8080
           resources:
             requests:
               cpu: 100m
@@ -0,0 +1,10 @@
+kind: Service
+apiVersion: v1
+metadata:
+  name: zeppelin
+spec:
+  ports:
+    - port: 8080
+      targetPort: 8080
+  selector:
+    component: zeppelin
What version does upstream recommend? They typically establish a baseline.
Building Spark using Maven requires Maven 3.3.3 or newer and Java 7+
But I was actually pegging it because I didn't want the Zeppelin build process to go all over the place (in the mythical world where an openjdk-9-jdk gets released).
(And yes, I realize we're not building Spark in this image, I'm just pegging the JDK for the runtime.)
(And I didn't even push a new image, because :latest is actually == :openjdk-8-jdk right now.)
A simple scan of the Zeppelin docs shows Java 1.7 - https://zeppelin.incubator.apache.org/docs/install/install.html
Maybe that means 1.7 only or 1.7+; pinning on Java 8 to start makes sense.
Yeah, it seems to function on JDK 8, but I could kick it back if you want. Sorry, I did notice that prerequisite as well and presumed it meant 1.7+. Other parts of that page are slightly in need of editing, too.
i'd say stick w/ the 1.7+ assumption until something goes wrong
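If the 1.7+ assumption ever needs to be enforced rather than assumed, a small guard could compare the JDK's reported version against the minimum. The following is only a sketch: java_major_version and java_at_least are hypothetical helpers, not part of this PR, and they assume version strings of the legacy form 1.N.x (for example 1.8.0_66) or the newer form N or N.x.y.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the PR) for checking a minimum Java version.

# Print the major Java version for strings such as 1.7.0_91, 1.8.0_66, or 9.
java_major_version() {
  local v="$1"
  case "$v" in
    # Legacy scheme: 1.N.x means Java N.
    1.*) v="${v#1.}" ;;
  esac
  # Keep everything before the first '.', '_', or '-'.
  echo "${v%%[._-]*}"
}

# Succeed if the given version string is at least the required major version.
java_at_least() {
  local required="$1" actual
  actual="$(java_major_version "$2")"
  [ "${actual}" -ge "${required}" ]
}

java_at_least 7 "1.8.0_66" && echo "JDK is new enough"
```

In a real image the version string would have to come from the installed JVM, for example by parsing the output of java -version, which is written to stderr; that parsing step is left out here because its exact format is an assumption about the base image.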