Release 0.12.0 (#373)
* Make location of config.json file configurable using environment variables (#350)

* Make location of config.json file configurable using environment variables (see the sketch after this list)

* Update minor version to 0.11.3

* Fix column drop issue when first row has missing value (#353)

* Remove extra line

* initial fix of dropping columns

* add unit tests

* revert sql query test change

* revert sql query test change 2

* bump versions

* move outside if

* Adding a working Docker setup for developing sparkmagic (#361)

* Adding a working Docker setup for developing sparkmagic

It includes the Jupyter notebook as well as the Livy+Spark endpoint.
Documentation is in the README.

* Pre-configure the ~/.sparkmagic/config.json

Now you can just launch a PySpark wrapper kernel and have it work
out of the box.

* Add R to Livy container

Also added an R section to example_config.json to make it work
out of the box - and I think it's just a good thing to have it
anyway, otherwise how would users ever know it was meant to be
there?

* Add more detail to the README container section

* Add dev_mode build-arg.

Disabled by default. When enabled, builds the container using your local
copy of sparkmagic, so that you can test your development changes inside
the container.

* Adding missing kernels

Was missing Scala and Python2. Confirmed that Python2 and
Python3 are indeed separate environments on the spark
container.

* Kerberos authentication support (#355)

* Enabled Kerberos authentication on sparkmagic and updated test cases.

* Enabled hiding/showing of the username/password fields based on auth_type.

* Updated as per comments.

* Updated documentation for Kerberos support

* Added test cases to test backward compatibility of auth in handlers

* Update README.md

Change layout and add build status

* Bump version to 0.12.0 (#365)

* Remove extra line

* bump version

* Optional coerce (#367)

* Remove extra line

* added a configuration option to make dataframe coercion optional

* fix circular dependency between conf and utils

* add gcc installation for dev build

* fix parsing bug for coerce value

* fix parsing bug for coerce value 2

* Automatically configure wrapper-kernel endpoints in widget (#362)

* Add pre-configured endpoints to endpoint widget automatically

* Fix crash on partially-defined kernel configurations

* Use LANGS_SUPPORTED constant to get list of possible kernel config sections

* Rename is_default attr to implicitly_added

* Adding blank line between imports and class declaration

* Log failure to connect to implicitly-defined endpoints

* Adding comment explaining implicitly_added

* Pass auth parameter through

* Fix hash and auth to include auth parameter (#370)

* Fix hash and auth to include auth parameter

* fix endpoint validation

* remove unnecessary commit

* Ability to add custom headers to HTTP calls (#371)

* Ability to add custom headers to REST calls

* Fix import

* Add basic conf test

* Fix tests

* Add test

* Fix tests

* Fix indent

* Address review comments

* Add custom headers to example config
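For reference, the configurable config-file location from #350 can be exercised from the shell. A minimal sketch; the variable names `SPARKMAGIC_CONF_DIR` and `SPARKMAGIC_CONF_FILE` are assumed from that PR, so verify them against your sparkmagic version:

```
# Assumed variable names from PR #350; verify against your sparkmagic version.
export SPARKMAGIC_CONF_DIR="$HOME/.sparkmagic-alt"
export SPARKMAGIC_CONF_FILE="config.json"
jupyter notebook    # sparkmagic should now read the config from the overridden location
```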
aggFTW committed Jun 23, 2017
1 parent a10a59f commit 6422a2b
Showing 37 changed files with 678 additions and 141 deletions.
36 changes: 36 additions & 0 deletions Dockerfile.jupyter
@@ -0,0 +1,36 @@
FROM jupyter/base-notebook:d0b2d159cc6c

ARG dev_mode=false

USER root

# This is needed because requests-kerberos fails to install on Debian due to missing Linux headers
RUN conda install requests-kerberos -y

USER $NB_USER

# Install sparkmagic - if dev_mode is set, use the copy in the host directory.
# Otherwise, just install from pip.
COPY hdijupyterutils hdijupyterutils/
COPY autovizwidget autovizwidget/
COPY sparkmagic sparkmagic/
RUN if [ "$dev_mode" = "true" ]; then \
cd hdijupyterutils && pip install . && cd ../ && \
cd autovizwidget && pip install . && cd ../ && \
cd sparkmagic && pip install . && cd ../ ; \
else pip install sparkmagic ; fi

RUN mkdir /home/$NB_USER/.sparkmagic
COPY sparkmagic/example_config.json /home/$NB_USER/.sparkmagic/config.json
RUN sed -i 's/localhost/spark/g' /home/$NB_USER/.sparkmagic/config.json
RUN jupyter nbextension enable --py --sys-prefix widgetsnbextension
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkkernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pysparkkernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/pyspark3kernel
RUN jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | cut -d" " -f2)/sparkmagic/kernels/sparkrkernel
RUN jupyter serverextension enable --py sparkmagic

USER root
RUN chown $NB_USER /home/$NB_USER/.sparkmagic/config.json
RUN rm -rf hdijupyterutils/ autovizwidget/ sparkmagic/
USER $NB_USER
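The `dev_mode` build-arg above can also be exercised with a plain `docker build`. A minimal sketch, run from the repository root so the `COPY` lines can see the local packages; the image tag is arbitrary:

```
docker build -f Dockerfile.jupyter --build-arg dev_mode=true -t sparkmagic-dev .
```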
35 changes: 35 additions & 0 deletions Dockerfile.spark
@@ -0,0 +1,35 @@
FROM gettyimages/spark:2.1.0-hadoop-2.7

RUN apt-get update && apt-get install -yq --no-install-recommends --force-yes \
git \
openjdk-7-jdk \
maven \
python2.7 \
python3.4 \
r-base \
r-base-core && \
rm -rf /var/lib/apt/lists/*

ENV LIVY_BUILD_VERSION livy-server-0.3.0
ENV LIVY_APP_PATH /apps/$LIVY_BUILD_VERSION
ENV LIVY_BUILD_PATH /apps/build/livy
ENV PYSPARK_PYTHON python2.7
ENV PYSPARK3_PYTHON python3.4

RUN mkdir -p /apps/build && \
cd /apps/build && \
git clone https://github.com/cloudera/livy.git && \
cd $LIVY_BUILD_PATH && \
git checkout v0.3.0 && \
mvn -DskipTests -Dspark.version=$SPARK_VERSION clean package && \
ls -al $LIVY_BUILD_PATH && ls -al $LIVY_BUILD_PATH/assembly && ls -al $LIVY_BUILD_PATH/assembly/target && \
unzip $LIVY_BUILD_PATH/assembly/target/$LIVY_BUILD_VERSION.zip -d /apps && \
rm -rf $LIVY_BUILD_PATH && \
mkdir -p $LIVY_APP_PATH/upload && \
mkdir -p $LIVY_APP_PATH/logs


EXPOSE 8998

CMD ["/apps/livy-server-0.3.0/bin/livy-server"]
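Once this container is running, the Livy server can be smoke-tested from the host using Livy's REST API. A minimal sketch:

```
# An empty list such as {"from":0,"total":0,"sessions":[]} means Livy is up.
curl http://localhost:8998/sessions
```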

56 changes: 52 additions & 4 deletions README.md
@@ -1,3 +1,5 @@
[![Build Status](https://travis-ci.org/jupyter-incubator/sparkmagic.svg?branch=master)](https://travis-ci.org/jupyter-incubator/sparkmagic)

# sparkmagic

Sparkmagic is a set of tools for interactively working with remote Spark clusters through [Livy](https://github.com/cloudera/hue/tree/master/apps/spark/java), a Spark REST server, in [Jupyter](http://jupyter.org) notebooks.
@@ -17,6 +19,7 @@ The Sparkmagic project includes a set of magics for interactively running Spark
* Automatic visualization of SQL queries in the PySpark, PySpark3, Spark and SparkR kernels; use an easy visual interface to interactively construct visualizations, no code required
* Easy access to Spark application information and logs (`%%info` magic)
* Ability to capture the output of SQL queries as Pandas dataframes to interact with other Python libraries (e.g. matplotlib)
* Authenticate to Livy via Basic Access authentication or via Kerberos

## Examples

@@ -54,9 +57,53 @@ See [Pyspark](examples/Pyspark Kernel.ipynb) and [Spark](examples/Spark Kernel.i

    jupyter serverextension enable --py sparkmagic
## Authentication Methods

Sparkmagic supports:

* No auth
* Basic authentication
* Kerberos

Kerberos support is implemented via the [requests-kerberos](https://github.com/requests/requests-kerberos) package. Sparkmagic expects a Kerberos ticket to be available on the system; requests-kerberos picks the ticket up from a cache file. For the ticket to be available, the user needs to have run [kinit](https://web.mit.edu/kerberos/krb5-1.12/doc/user/user_commands/kinit.html) to create it.

Currently, sparkmagic does not support passing a Kerberos principal/token, but we welcome pull requests.
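For example, a ticket can be created and verified before starting Jupyter; the principal below is a placeholder:

```
kinit alice@EXAMPLE.COM    # create a Kerberos ticket for your principal
klist                      # confirm the ticket is in the credential cache
```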

## Docker

The included `docker-compose.yml` file will let you spin up a full
sparkmagic stack that includes a Jupyter notebook with the appropriate
extensions installed, and a Livy server backed by a local-mode Spark instance.
(This is just for testing and developing sparkmagic itself; in reality,
sparkmagic is not very useful if your Spark instance is on the same machine!)

In order to use it, make sure you have [Docker](https://docker.com) and
[Docker Compose](https://docs.docker.com/compose/) both installed, and
then simply run:

    docker-compose build
    docker-compose up

You will then be able to access the Jupyter notebook in your browser at
http://localhost:8888. Inside this notebook, you can configure a
sparkmagic endpoint at http://spark:8998. This endpoint is able to
launch both Scala and Python sessions. You can also choose to start a
wrapper kernel for Scala, Python, or R from the list of kernels.

To shut down the containers, you can interrupt `docker-compose` with
`Ctrl-C`, and optionally remove the containers with `docker-compose
down`.

If you are developing sparkmagic and want to test out your changes in
the Docker container without needing to push a version to PyPI, you can
set the `dev_mode` build arg in `docker-compose.yml` to `true`, and then
re-build the container. This will cause the container to install your
local version of autovizwidget, hdijupyterutils, and sparkmagic. Make
sure to re-run `docker-compose build` before each test run.
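A sketch of that change against the `docker-compose.yml` shipped in this commit:

```
  jupyter:
    build:
      context: .
      dockerfile: Dockerfile.jupyter
      args:
        dev_mode: "true"    # build against the local sparkmagic checkout
```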

## Server extension API

### `/reconnectsparkmagic`:
* `POST`:
Allows specifying Spark cluster connection information for a notebook by passing in the notebook path and cluster information.
The kernel will be started or restarted and connected to the specified cluster.
@@ -68,11 +115,12 @@ Request Body example:
'username': 'username',
'password': 'password',
'endpoint': 'url',
'auth': 'Kerberos',
'kernelname': 'pysparkkernel'
}
```

*Note that auth can be None, Basic_Access, or Kerberos, depending on the authentication enabled in Livy. The kernelname parameter is optional and defaults to the one specified in the config file, or to pysparkkernel if it is not in the config file.*
Returns `200` if successful; `400` if the body is not a JSON string or a key is missing; `500` if an error is encountered changing clusters.
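A sketch of calling this endpoint with curl; the Jupyter host/port and the `path` value are illustrative assumptions:

```
curl -X POST http://localhost:8888/reconnectsparkmagic \
  -H "Content-Type: application/json" \
  -d '{
        "path": "my_notebook.ipynb",
        "username": "username",
        "password": "password",
        "endpoint": "http://spark:8998",
        "auth": "Kerberos",
        "kernelname": "pysparkkernel"
      }'
```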

Reply Body example:
@@ -125,4 +173,4 @@ To run unit tests, run:

    nosetests hdijupyterutils autovizwidget sparkmagic

If you want to see an enhancement made but don't have time to work on it yourself, feel free to submit an [issue](https://github.com/jupyter-incubator/sparkmagic/issues) for us to deal with.
2 changes: 1 addition & 1 deletion autovizwidget/autovizwidget/__init__.py
@@ -1 +1 @@
__version__ = '0.11.4'
__version__ = '0.12.0'
21 changes: 21 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,21 @@
version: "3"
services:
  spark:
    image: jupyter/sparkmagic-livy
    build:
      context: .
      dockerfile: Dockerfile.spark
    hostname: spark
    ports:
      - "8998:8998"
  jupyter:
    image: jupyter/sparkmagic
    build:
      context: .
      dockerfile: Dockerfile.jupyter
      args:
        dev_mode: "false"
    links:
      - spark
    ports:
      - "8888:8888"
2 changes: 1 addition & 1 deletion hdijupyterutils/hdijupyterutils/__init__.py
@@ -1 +1 @@
__version__ = '0.11.4'
__version__ = '0.12.0'
13 changes: 11 additions & 2 deletions sparkmagic/example_config.json
@@ -2,10 +2,17 @@
"kernel_python_credentials" : {
"username": "",
"password": "",
"url": "http://localhost:8998"
"url": "http://localhost:8998",
"auth": "None"
},

"kernel_scala_credentials" : {
"username": "",
"password": "",
"url": "http://localhost:8998",
"auth": "None"
},
"kernel_r_credentials": {
"username": "",
"password": "",
"url": "http://localhost:8998"
@@ -50,12 +57,14 @@
  },

  "use_auto_viz": true,
  "coerce_dataframe": true,
  "max_results_sql": 2500,
  "pyspark_dataframe_encoding": "utf-8",

  "heartbeat_refresh_seconds": 30,
  "livy_server_heartbeat_timeout_seconds": 0,
  "heartbeat_retry_seconds": 10,

  "server_extension_default_kernel_name": "pysparkkernel"
  "server_extension_default_kernel_name": "pysparkkernel",
  "custom_headers": {}
}
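For illustration, the two options added above, `coerce_dataframe` (#367) and `custom_headers` (#371), could be overridden in `~/.sparkmagic/config.json` as below; the header name and value are only an example:

```
{
  "coerce_dataframe": false,
  "custom_headers": {
    "X-Requested-By": "sparkmagic"
  }
}
```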
3 changes: 2 additions & 1 deletion sparkmagic/requirements.txt
@@ -9,4 +9,5 @@ requests
ipykernel>=4.2.2,<5
ipywidgets>5.0.0,<7.0
notebook>=4.2,<5.0
tornado>=4
requests_kerberos>=0.8.0
3 changes: 2 additions & 1 deletion sparkmagic/setup.py
@@ -88,6 +88,7 @@ def version(path):
'ipykernel>=4.2.2,<5',
'ipywidgets>5.0.0,<7.0',
'notebook>=4.2,<5.0',
'tornado>=4'
'tornado>=4',
'requests_kerberos>=0.8.0'
])

3 changes: 2 additions & 1 deletion sparkmagic/sparkmagic/__init__.py
@@ -1,4 +1,5 @@
__version__ = '0.11.4'
__version__ = '0.12.0'

from sparkmagic.serverextension.handlers import load_jupyter_server_extension


23 changes: 21 additions & 2 deletions sparkmagic/sparkmagic/controllerwidget/addendpointwidget.py
@@ -2,6 +2,7 @@
# Distributed under the terms of the Modified BSD License.
from sparkmagic.controllerwidget.abstractmenuwidget import AbstractMenuWidget
from sparkmagic.livyclientlib.endpoint import Endpoint
import sparkmagic.utils.constants as constants


class AddEndpointWidget(AbstractMenuWidget):
@@ -32,24 +33,42 @@ def __init__(self, spark_controller, ipywidget_factory, ipython_display, endpoin
            value='password',
            width=widget_width
        )
        self.auth = self.ipywidget_factory.get_dropdown(
            options={constants.AUTH_KERBEROS: constants.AUTH_KERBEROS, constants.AUTH_BASIC: constants.AUTH_BASIC,
                     constants.NO_AUTH: constants.NO_AUTH},
            description=u"Auth type:"
        )

        # Submit widget
        self.submit_widget = self.ipywidget_factory.get_submit_button(
            description='Add endpoint'
        )

        self.auth.on_trait_change(self._show_correct_endpoint_fields)

        self.children = [self.ipywidget_factory.get_html(value="<br/>", width=widget_width),
                         self.address_widget, self.user_widget, self.password_widget,
                         self.address_widget, self.auth, self.user_widget, self.password_widget,
                         self.ipywidget_factory.get_html(value="<br/>", width=widget_width), self.submit_widget]

        for child in self.children:
            child.parent_widget = self

        self._show_correct_endpoint_fields()

    def run(self):
        endpoint = Endpoint(self.address_widget.value, self.user_widget.value, self.password_widget.value)
        endpoint = Endpoint(self.address_widget.value, self.auth.value, self.user_widget.value, self.password_widget.value)
        self.endpoints[self.address_widget.value] = endpoint
        self.ipython_display.writeln("Added endpoint {}".format(self.address_widget.value))

        # We need to call the refresh method because drop down in Tab 2 for endpoints wouldn't refresh with the new
        # value otherwise.
        self.refresh_method()

    def _show_correct_endpoint_fields(self):
        if self.auth.value == constants.NO_AUTH or self.auth.value == constants.AUTH_KERBEROS:
            self.user_widget.layout.display = 'none'
            self.password_widget.layout.display = 'none'
        else:
            self.user_widget.layout.display = 'flex'
            self.password_widget.layout.display = 'flex'
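As the updated `run()` above shows, `auth` is now the second positional argument to `Endpoint`. A minimal usage sketch; the URL is a placeholder and the empty credentials mirror the no-password Kerberos case:

```
from sparkmagic.livyclientlib.endpoint import Endpoint
import sparkmagic.utils.constants as constants

# auth now precedes username and password in the constructor's positional order.
endpoint = Endpoint("http://spark:8998", constants.AUTH_KERBEROS, "", "")
```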

21 changes: 20 additions & 1 deletion sparkmagic/sparkmagic/controllerwidget/magicscontrollerwidget.py
@@ -5,21 +5,40 @@
from sparkmagic.controllerwidget.manageendpointwidget import ManageEndpointWidget
from sparkmagic.controllerwidget.managesessionwidget import ManageSessionWidget
from sparkmagic.controllerwidget.createsessionwidget import CreateSessionWidget
from sparkmagic.livyclientlib.endpoint import Endpoint
from sparkmagic.utils.constants import LANGS_SUPPORTED
import sparkmagic.utils.configuration as conf


class MagicsControllerWidget(AbstractMenuWidget):
    def __init__(self, spark_controller, ipywidget_factory, ipython_display, endpoints=None):
        super(MagicsControllerWidget, self).__init__(spark_controller, ipywidget_factory, ipython_display)

        if endpoints is None:
            endpoints = {}
            endpoints = {endpoint.url: endpoint for endpoint in self._get_default_endpoints()}
        self.endpoints = endpoints

        self._refresh()

    def run(self):
        pass

    @staticmethod
    def _get_default_endpoints():
        default_endpoints = set()

        for kernel_type in LANGS_SUPPORTED:
            endpoint_config = getattr(conf, 'kernel_%s_credentials' % kernel_type)()
            if all([p in endpoint_config for p in ["url", "password", "username"]]) and endpoint_config["url"] != "":
                default_endpoints.add(Endpoint(
                    username=endpoint_config["username"],
                    password=endpoint_config["password"],
                    auth=endpoint_config.get("auth", None),
                    url=endpoint_config["url"],
                    implicitly_added=True))

        return default_endpoints

    def _refresh(self):
        self.endpoints_dropdown_widget = self.ipywidget_factory.get_dropdown(
            description="Endpoint:",
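Given `_get_default_endpoints` above, any credentials section defining `url`, `username`, and `password` (with a non-empty `url`) is implicitly added to the endpoints dropdown. A sketch of a qualifying config section; the values are placeholders:

```
"kernel_python_credentials" : {
  "username": "",
  "password": "",
  "url": "http://spark:8998",
  "auth": "Kerberos"
}
```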
12 changes: 11 additions & 1 deletion sparkmagic/sparkmagic/controllerwidget/manageendpointwidget.py
@@ -1,13 +1,16 @@
# Copyright (c) 2015 aggftw@gmail.com
# Distributed under the terms of the Modified BSD License.
from sparkmagic.controllerwidget.abstractmenuwidget import AbstractMenuWidget
from sparkmagic.livyclientlib.exceptions import HttpClientException
from sparkmagic.utils.sparklogger import SparkLog


class ManageEndpointWidget(AbstractMenuWidget):
    def __init__(self, spark_controller, ipywidget_factory, ipython_display, endpoints, refresh_method):
        # This is nested
        super(ManageEndpointWidget, self).__init__(spark_controller, ipywidget_factory, ipython_display, True)

        self.logger = SparkLog("ManageEndpointWidget")
        self.endpoints = endpoints
        self.refresh_method = refresh_method

@@ -30,7 +33,14 @@ def get_existing_endpoint_widgets(self):

            # Endpoints
            for url, endpoint in self.endpoints.items():
                endpoint_widgets.append(self.get_endpoint_widget(url, endpoint))
                try:
                    endpoint_widgets.append(self.get_endpoint_widget(url, endpoint))
                except HttpClientException:
                    # If we can't reach one of the default endpoints, just skip over it
                    if not endpoint.implicitly_added:
                        raise
                    else:
                        self.logger.info("Failed to connect to implicitly-defined endpoint at: %s" % url)

            endpoint_widgets.append(self.ipywidget_factory.get_html(value="<br/>", width="600px"))
        else:
