updated dockerfiles (#261)
* updated dockerfiles

* new example

* pin versions

* typo

* pin more versions

* notebook

* fix pip syntax

* more updates and fixes

* latest versions

* drop daskernetes

* pip kubernetes version

* point to correct worker image

* trying stuff

* trying more stuff
rabernat committed May 18, 2018
1 parent c670c3f commit 0293049
Showing 4 changed files with 232 additions and 23 deletions.
26 changes: 13 additions & 13 deletions gce/notebook/Dockerfile
@@ -8,15 +8,16 @@ RUN apt-get update \
USER $NB_USER

RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
bokeh=0.12.15dev1 \
bokeh=0.12.16 \
cython \
cytoolz \
datashader \
dask=0.17.5 \
dask-ml \
distributed=1.21.8 \
fastparquet \
ipywidgets \
jupyterlab \
jupyterlab=0.32.1 \
jupyterlab_launcher=0.10.5 \
holoviews \
lz4 \
matplotlib \
@@ -25,27 +26,26 @@ RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
nomkl \
numba \
numcodecs \
numpy \
numpy=1.14.3 \
pandas \
python-blosc \
scipy \
scikit-image \
scikit-learn \
tornado \
xarray \
zict \
intake-xarray \
&& conda clean -tipsy

RUN pip install fusepy click jedi kubernetes --upgrade --no-cache-dir
RUN pip install --upgrade pip

RUN pip install daskernetes==0.1.3 \
dask-kubernetes \
git+https://github.com/zarr-developers/zarr \
git+https://github.com/pydata/xarray \
git+https://github.com/dask/gcsfs \
git+https://github.com/jupyterhub/nbserverproxy \
git+https://github.com/xgcm/xgcm \
RUN pip install fusepy click jedi kubernetes==4.0.0 dask-kubernetes s3fs \
gcsfs==0.1.0 zarr==2.2.0 xarray==0.10.4 \
nbserverproxy==0.8.1 \
--upgrade --no-cache-dir

RUN pip install git+https://github.com/xgcm/xgcm \
git+https://github.com/bokeh/datashader.git \
--no-cache-dir \
--upgrade

212 changes: 212 additions & 0 deletions gce/notebook/examples/cm26.ipynb
@@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CM2.6 Ocean Model Analysis\n",
"\n",
"This notebook shows how to load and analyze ocean data from the GFDL [CM2.6](https://www.gfdl.noaa.gov/cm2-6/) high-resolution climate simulation.\n",
"\n",
"![CM2.6 SST](https://www.gfdl.noaa.gov/wp-content/uploads/ih/2012/06/cm2.6.png)\n",
"\n",
"Right now, the only output available is the 5-day-averaged 3D fields of horizontal velocity, temperature, and salinity. We hope to add more going forward.\n",
"\n",
"Thanks to [Stephen Griffies](https://www.gfdl.noaa.gov/stephen-griffies-homepage/) for providing the data.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import xarray as xr\n",
"import matplotlib.pyplot as plt\n",
"import holoviews as hv\n",
"import datashader\n",
"from holoviews.operation.datashader import regrid, shade, datashade\n",
"\n",
"hv.extension('bokeh', width=100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create and Connect to Dask Distributed Cluster\n",
"\n",
"This will launch a cluster of virtual machines in the cloud."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client, progress\n",
"from dask_kubernetes import KubeCluster\n",
"cluster = KubeCluster(n_workers=40)\n",
"cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"👆 Don't forget to click this link to get the cluster dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = Client(cluster)\n",
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load CM2.6 Data\n",
"\n",
"This data is stored in [xarray-zarr](http://xarray.pydata.org/en/latest/io.html#zarr) format in Google Cloud Storage.\n",
"This format is optimized for parallel distributed reads from within the cloud environment.\n",
"\n",
"It may take up to a minute to initialize the dataset when you run this cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#experiment = 'one_percent'\n",
"experiment = 'control'\n",
"\n",
"# Load from cloud object storage\n",
"import gcsfs\n",
"gcsmap = gcsfs.mapping.GCSMap('pangeo-data/cm2.6/%s/temp_salt_u_v-5day_avg/' % experiment)\n",
"ds = xr.open_zarr(gcsmap, decode_cf=True, decode_times=False)\n",
"\n",
"# Print dataset\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize Temperature Data with Holoviews and Datashader\n",
"\n",
"The cells below show how to interactively explore the dataset.\n",
"\n",
"_**Warning**: it takes ~10-20 seconds to render each image after moving the sliders. Please be patient. There is an open [GitHub issue](https://github.com/bokeh/datashader/issues/598) about improving the performance of datashader with this sort of dataset._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hv_ds = hv.Dataset(ds['temp'])\n",
"qm = hv_ds.to(hv.QuadMesh, kdims=[\"xt_ocean\", \"yt_ocean\"], dynamic=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%opts RGB [width=1000 height=600] \n",
"\n",
"# runs out of memory easily...change options at your own risk\n",
"datashade(qm, precompute=False, cmap=plt.cm.magma)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Make an Expensive Calculation\n",
"\n",
"Here we make a big reduction by taking the time and zonal mean of the temperature. This demonstrates how the cluster distributes the reads from storage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"temp_zonal_mean = ds.temp.mean(dim=('time', 'xt_ocean'))\n",
"temp_zonal_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depending on the size of your cluster, this next cell may take a while. On a cluster of 40 workers, it took ~12 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time temp_zonal_mean.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(16,8))\n",
"temp_zonal_mean.plot.contourf(yincrease=False, levels=np.arange(-2,30))\n",
"plt.title('Naive Zonal Mean Temperature')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
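The reduction in the notebook, `ds.temp.mean(dim=('time', 'xt_ocean'))`, averages over time and longitude at once. A minimal sketch of that idea in plain Python follows, assuming a toy `[time][y][x]` nested list instead of the real dataset and assuming NaN marks missing (land) points that are skipped, which is xarray's default behaviour for float data. The helper name `zonal_time_mean` and the toy values are hypothetical and are not part of this commit.

```python
import math


def zonal_time_mean(field):
    """Average a [time][y][x] nested list over time and x, skipping NaN values.

    Returns one value per y row; a row with no valid data yields NaN.
    """
    n_y = len(field[0])
    result = []
    for j in range(n_y):
        # Pool every valid value in row j across all time slices and all x.
        values = [
            v
            for time_slice in field
            for v in time_slice[j]
            if not math.isnan(v)
        ]
        result.append(sum(values) / len(values) if values else math.nan)
    return result


nan = math.nan
# Hypothetical toy field: 2 time steps, 2 latitude rows, 3 longitude points.
toy = [
    [[10.0, 12.0, nan], [20.0, 22.0, 24.0]],
    [[14.0, 16.0, nan], [26.0, 28.0, 30.0]],
]
print(zonal_time_mean(toy))  # [13.0, 25.0]
```

This is an unweighted mean: every valid grid point counts equally, regardless of the area it represents.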
2 changes: 1 addition & 1 deletion gce/notebook/worker-template.yaml
@@ -11,7 +11,7 @@ spec:
- 6GB
- --death-timeout
- '60'
image: pangeo/worker:2018-05-06
image: pangeo/worker:2018-05-17
name: dask-worker
securityContext:
capabilities:
15 changes: 6 additions & 9 deletions gce/worker/Dockerfile
@@ -5,6 +5,7 @@ RUN wget -O /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/
RUN chmod +x /usr/local/bin/dumb-init

RUN conda install --yes -c conda-forge \
cython \
cytoolz \
dask=0.17.4 \
distributed=1.21.8 \
@@ -15,12 +16,11 @@ RUN conda install --yes -c conda-forge \
nomkl \
numba \
numcodecs \
numpy \
numpy=1.14.3 \
pandas \
python-blosc \
scikit-image \
scipy \
xarray \
zict \
&& conda clean -tipsy

@@ -29,14 +29,11 @@ RUN apt-get update \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN pip install pyasn1 click urllib3 --upgrade
RUN pip install --upgrade pip

RUN pip install git+https://github.com/zarr-developers/zarr \
git+https://github.com/pydata/xarray \
git+https://github.com/dask/gcsfs@f99177b31c44fcc404619b2876a77cdcda955a75 \
fusepy \
--no-cache-dir \
--upgrade
RUN pip install pyasn1 click urllib3 fusepy s3fs \
gcsfs==0.1.0 zarr==2.2.0 xarray==0.10.4 \
--upgrade --no-cache-dir

ENV OMP_NUM_THREADS=1
ENV DASK_TICK_MAXIMUM_DELAY=5s
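
The notebook and worker Dockerfiles in this commit pin several of the same packages (for example `distributed`, `numpy`, `gcsfs`, `zarr`, `xarray`), and a Dask client and its workers are generally expected to run matching versions. A minimal sketch of how one might flag disagreements between two pin lists is shown below. It assumes the specs were copied out of the Dockerfiles by hand and abbreviated; `parse_pins` and `mismatched_pins` are hypothetical helper names, not part of any tool in this repository.

```python
import re


def parse_pins(specs):
    """Map package name -> pinned version for conda ('a=1.0') or pip ('a==1.0') specs.

    Unpinned specs such as 'pandas' are skipped.
    """
    pins = {}
    for spec in specs:
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+?)={1,2}([^=\s]+)", spec.strip())
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def mismatched_pins(specs_a, specs_b):
    """Return {name: (version_a, version_b)} for packages pinned differently."""
    pins_a, pins_b = parse_pins(specs_a), parse_pins(specs_b)
    return {
        name: (pins_a[name], pins_b[name])
        for name in sorted(pins_a.keys() & pins_b.keys())
        if pins_a[name] != pins_b[name]
    }


# Abbreviated pin lists as they appear in the diff above (assumption: these
# are the lines that end up in the final Dockerfiles).
notebook_specs = ["dask=0.17.5", "distributed=1.21.8", "numpy=1.14.3", "pandas",
                  "gcsfs==0.1.0", "zarr==2.2.0", "xarray==0.10.4"]
worker_specs = ["dask=0.17.4", "distributed=1.21.8", "numpy=1.14.3", "pandas",
                "gcsfs==0.1.0", "zarr==2.2.0", "xarray==0.10.4"]

print(mismatched_pins(notebook_specs, worker_specs))
# {'dask': ('0.17.5', '0.17.4')}
```

With the lists above, the only difference reported is the `dask` patch version (0.17.5 in the notebook image, 0.17.4 in the worker image).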
