updated dockerfiles #261

rabernat · 2018-05-17T17:54:10Z

Should fix #259.

I pinned the numpy version to make sure it is the same on both notebook and worker.

Anything else we want to throw in here?

cc @mrocklin
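
Pinning helps because the notebook (client) and worker images then resolve to identical versions. Below is a minimal sketch of the comparison that pinning is meant to make pass. It assumes you have a `{package: version}` listing from each image. `find_mismatches` and the sample versions are hypothetical. On a running cluster, dask.distributed's `Client.get_versions(check=True)` performs a comparable check.

```python
from __future__ import annotations


def find_mismatches(notebook: dict[str, str], worker: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return {package: (notebook_version, worker_version)} for every package
    whose version differs between the two images. A package missing from one
    image is reported with None on that side."""
    mismatches = {}
    for name in sorted(set(notebook) | set(worker)):
        nb_version = notebook.get(name)
        wk_version = worker.get(name)
        if nb_version != wk_version:
            mismatches[name] = (nb_version, wk_version)
    return mismatches


# Hypothetical listings, e.g. as collected from `conda list` in each image.
notebook_versions = {"numpy": "1.14.3", "xarray": "0.10.4"}
worker_versions = {"numpy": "1.14.2", "xarray": "0.10.4"}

print(find_mismatches(notebook_versions, worker_versions))
# {'numpy': ('1.14.3', '1.14.2')}
```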

jhamman · 2018-05-17T18:01:09Z

did you confirm this grabs cftime?

mrocklin · 2018-05-17T18:07:45Z

gce/notebook/Dockerfile

@@ -8,9 +8,8 @@ RUN apt-get update \
 USER $NB_USER

 RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
-    bokeh=0.12.15dev1 \
+    bokeh \


I recommend that we continue to pin to something just to be explicit, perhaps 0.12.16?

mrocklin · 2018-05-17T18:09:45Z

gce/worker/Dockerfile

@@ -33,7 +33,7 @@ RUN pip install pyasn1 click urllib3 --upgrade

 RUN pip install git+https://github.com/zarr-developers/zarr \
                git+https://github.com/pydata/xarray \
-                git+https://github.com/dask/gcsfs@f99177b31c44fcc404619b2876a77cdcda955a75 \
+                git+https://github.com/dask/gcsfs \


Same here. We should probably pin to something.

@martindurant has there been a release since this commit? It looks like it was committed on March 22nd.

martindurant: Yes, v0.1.0 on April 20.

rabernat · 2018-05-17T18:16:03Z

> did you confirm this grabs cftime?

No I have not. I was waiting to get some feedback before actually doing the docker build.

jhamman · 2018-05-17T18:23:26Z

gce/notebook/Dockerfile

@@ -37,15 +36,15 @@ RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
    intake-xarray \
    && conda clean -tipsy

-RUN pip install fusepy click jedi kubernetes --upgrade --no-cache-dir
+RUN pip install fusepy click jedi kubernetes gcsfs=0.1.0 --upgrade --no-cache-dir


I think we can safely pin xarray here to xarray=0.10.4

rabernat: Is it better to install xarray from conda or pip?

jhamman: In this case, probably pip. The dependencies are taken care of above, and the defaults channel of conda is still on 0.10.3.

rabernat: We actually currently have xarray in both the conda and pip install lists. Is there any good reason for that? Can we safely remove it from the conda install part?
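
The pinning requests in this review amount to making every requirement explicit. Below is a rough sketch of a lint for a pip argument list. Its rules are simplified assumptions, not pip's own requirement parser, and `check_pip_requirements` is hypothetical. It flags three cases:

- bare names with no version;
- `git+` URLs with no `@ref`;
- conda-style single `=`.

Conda match specs accept `xarray=0.10.4`, but pip rejects `gcsfs=0.1.0` as an invalid requirement and needs `gcsfs==0.1.0`.

```python
import re

# A bare name followed by a single "=" (conda syntax), e.g. "gcsfs=0.1.0".
CONDA_STYLE = re.compile(r"^[A-Za-z0-9_.\-]+=(?!=)")


def check_pip_requirements(args: list[str]) -> list[str]:
    """Flag pip requirements that are not exactly pinned (simplified rules)."""
    problems = []
    for req in args:
        if req.startswith("-"):
            continue  # options such as --upgrade or --no-cache-dir
        if req.startswith("git+"):
            # Pinned VCS installs name a commit or tag after "@". This simple
            # check ignores the user@host form used by git+ssh URLs.
            if "@" not in req.split("://", 1)[-1]:
                problems.append(f"{req}: git URL without @ref")
        elif CONDA_STYLE.match(req):
            problems.append(f"{req}: single '=' is conda syntax; pip needs '=='")
        elif "==" not in req:
            problems.append(f"{req}: not pinned")
    return problems


# The pip line from the notebook Dockerfile diff in this review.
line = "fusepy click jedi kubernetes gcsfs=0.1.0 --upgrade --no-cache-dir"
for problem in check_pip_requirements(line.split()):
    print(problem)
```

This flags `fusepy`, `click`, `jedi`, and `kubernetes` as unpinned, and `gcsfs=0.1.0` as conda syntax.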

rsignell-usgs · 2018-05-17T18:55:14Z

Would you be interested in including s3fs to facilitate across-cloud testing?

rabernat · 2018-05-17T18:58:58Z

Cython needed to be added to get cftime to work. Not sure why.

rabernat · 2018-05-17T18:59:29Z

> Would you be interested in including s3fs to facilitate across-cloud testing?

This is a good idea @rsignell-usgs. Are you using the exact same docker images on your AWS pangeo cluster?

rabernat · 2018-05-17T19:05:06Z

Also, should we consider adding rasterio?

jhamman · 2018-05-17T19:11:51Z

> Also, should we consider adding rasterio?

I think this would be a good idea, though it will probably add considerably to the size of the docker image (GDAL is a dependency). Also, just for reference, xarray's rasterio backend will not work with distributed until we finish pydata/xarray#2131, so that limits its applicability here for a while.
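
Pickling matters because dask.distributed serializes tasks and the objects they reference to send them to workers, and an open file handle cannot be pickled. The branch name `feature/pickle_rasterio` later in this thread points at the same issue. Below is a minimal sketch of the general workaround: serialize only the path and reopen lazily on the receiving side. `LazyFile` is hypothetical and is not xarray's actual implementation.

```python
import os
import pickle
import tempfile


class LazyFile:
    """Hypothetical wrapper: pickles only the path and reopens on demand."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    def read(self):
        if self._handle is None:
            self._handle = open(self.path, "rb")  # opened lazily, per process
        self._handle.seek(0)
        return self._handle.read()

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Leave the open handle behind; only the path is serialized.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._handle = None


# A stand-in data file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"raster bytes")
    path = tmp.name

pickle_error = None
with open(path, "rb") as raw:
    try:
        pickle.dumps(raw)  # an open file object cannot be pickled
    except TypeError as err:
        pickle_error = err
print("open handle:", pickle_error)

lazy = LazyFile(path)
lazy.read()  # the original now holds an open handle
clone = pickle.loads(pickle.dumps(lazy))  # succeeds: the handle is not serialized
restored = clone.read()
print("restored:", restored)

lazy.close()
clone.close()
os.remove(path)
```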

rsignell-usgs · 2018-05-17T19:19:19Z

@rabernat, I have these additions: s3fs, rasterio and utide:

https://github.com/rsignell-usgs/pangeo/blob/rsignell-aws/aws/notebook/Dockerfile
https://github.com/rsignell-usgs/pangeo/blob/rsignell-aws/aws/worker/Dockerfile

rabernat · 2018-05-17T19:21:00Z

Here is everything else that gets installed if I include conda-forge rasterio:

The following NEW packages will be INSTALLED:

    affine:           2.2.0-py_0                    conda-forge
    boost:            1.66.0-py36_1                 conda-forge
    boost-cpp:        1.66.0-1                      conda-forge
    boto3:            1.7.22-py_0                   conda-forge
    botocore:         1.10.22-py_0                  conda-forge
    cairo:            1.14.12-h77bcde2_0            defaults   
    click-plugins:    1.0.3-py36_0                  conda-forge
    cligj:            0.4.0-py36_0                  conda-forge
    docutils:         0.14-py36_0                   conda-forge
    freexl:           1.0.5-0                       conda-forge
    gdal:             2.2.2-py36hc209d97_1          defaults   
    geos:             3.6.2-1                       conda-forge
    giflib:           5.1.4-0                       conda-forge
    jmespath:         0.9.3-py36_0                  conda-forge
    json-c:           0.12.1-0                      conda-forge
    kealib:           1.4.7-4                       conda-forge
    krb5:             1.14.6-0                      conda-forge
    libdap4:          3.19.2-1                      conda-forge
    libgdal:          2.2.2-h804cdde_1              defaults   
    libgfortran:      3.0.0-1                       defaults   
    libiconv:         1.15-0                        conda-forge
    libkml:           1.3.0-6                       conda-forge
    libpq:            9.6.6-h4e02ad2_0              defaults   
    libspatialite:    4.3.0a-h72746d6_18            defaults   
    openjpeg:         2.3.0-2                       conda-forge
    pixman:           0.34.0-2                      conda-forge
    poppler:          0.60.1-hc909a00_0             defaults   
    poppler-data:     0.4.9-0                       conda-forge
    proj4:            4.9.3-5                       conda-forge
    rasterio:         0.36.0-py36_3                 conda-forge
    s3transfer:       0.1.13-py36_0                 conda-forge
    snuggs:           1.4.1-py36_0                  conda-forge
    util-linux:       2.21-0                        defaults   
    xerces-c:         3.2.1-0                       conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    hdf5:             1.10.2-hba1933b_1             defaults    --> 1.10.1-2                    conda-forge
    libnetcdf:        4.6.1-h13459d8_0              defaults    --> 4.4.1.1-10                  conda-forge
    netcdf4:          1.4.0-py36hd4b6044_1          defaults    --> 1.3.1-py36_1                conda-forge

The following packages will be DOWNGRADED:

    blas:             1.1-openblas                  conda-forge --> 1.0-mkl                     defaults   
    dbus:             1.13.2-h714fa37_1             defaults    --> 1.13.2-hc3f9b76_0           defaults   
    glib:             2.56.1-h000015b_0             defaults    --> 2.53.6-h5d9569c_2           defaults   
    gst-plugins-base: 1.14.0-hbbd80ab_1             defaults    --> 1.12.4-h33fb286_0           defaults   
    gstreamer:        1.14.0-hb453b48_1             defaults    --> 1.12.4-hb53b477_0           defaults   
    numpy:            1.14.3-py36_blas_openblas_200 conda-forge [blas_openblas] --> 1.14.2-py36_nomklh2b20989_1 defaults    [nomkl]
    qt:               5.9.5-h7e424d6_0              defaults    --> 5.9.4-h4e5bff0_0            defaults

It downgrades numpy, changes netcdf4/hdf libraries, etc etc. Kind of a mess. The notebook image size increases from 2.63GB to 3.85GB.

Not quite sure what to do...is rasterio worth it?

rabernat · 2018-05-17T19:21:57Z

I'm also seeing this message in the pip installs:

dask-kubernetes 0.3.0 has requirement kubernetes==4, but you'll have kubernetes 6.0.0 which is incompatible.
daskernetes 0.1.3 has requirement kubernetes==4, but you'll have kubernetes 6.0.0 which is incompatible.

@mrocklin, what do you advise here?
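
Both warnings come from the same kind of check. Each installed package declares an exact pin (`==`) that the kubernetes version pip is about to install does not meet. Below is a simplified sketch of that comparison. It assumes purely numeric release versions, whereas real pip implements full PEP 440, including pre-releases and local versions. Like PEP 440, it zero-pads the shorter release segment for `==`. `release_tuple` and `satisfies_exact_pin` are hypothetical helpers.

```python
from __future__ import annotations


def release_tuple(version: str, length: int) -> tuple[int, ...]:
    """'4' -> (4, 0, 0) when length is 3; assumes purely numeric versions."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (length - len(parts)))


def satisfies_exact_pin(installed: str, pinned: str) -> bool:
    """True if `installed` matches `==pinned`, zero-padding the shorter version."""
    length = max(installed.count("."), pinned.count(".")) + 1
    return release_tuple(installed, length) == release_tuple(pinned, length)


# The two requirements pip complained about, and the version being installed.
requirements = {
    "dask-kubernetes 0.3.0": ("kubernetes", "4"),
    "daskernetes 0.1.3": ("kubernetes", "4"),
}
installed = {"kubernetes": "6.0.0"}

for package, (dependency, pin) in requirements.items():
    if not satisfies_exact_pin(installed[dependency], pin):
        print(f"{package} has requirement {dependency}=={pin}, "
              f"but you'll have {dependency} {installed[dependency]}")
```

The practical options are to relax the pin upstream or to install the pinned kubernetes version.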

mrocklin · 2018-05-17T19:23:13Z

I'll bump the version of kubernetes on dask-kubernetes and see how it fares

jhamman · 2018-05-17T19:24:12Z

> It downgrades numpy, changes netcdf4/hdf libraries, etc etc. Kind of a mess. The notebook image size increases from 2.63GB to 3.85GB.
>
> Not quite sure what to do...is rasterio worth it?

My 2 cents: no, not right now. We can't easily use it with distributed, so single-machine users of rasterio on pangeo can conda install it in their notebook environment.

mrocklin · 2018-05-17T19:24:59Z

Testing dask-kubernetes here: dask/dask-kubernetes#77

rabernat · 2018-05-17T19:26:14Z

Thanks Matt. Do we still need both daskernetes and dask-kubernetes?

mrocklin · 2018-05-17T19:28:56Z

We should probably drop daskernetes. People with old notebooks will see those notebooks fail. We should probably consider clearing out old users' environments regardless: everyone who logged in a long time ago costs us a small stream of credits.

rsignell-usgs · 2018-05-17T19:47:48Z

@rabernat, to get the new advanced features of rasterio, you want the 1.0 version in the conda-forge/label/dev channel.

conda install -c conda-forge/label/dev rasterio

and then to have distributed work, currently you would need @jhamman's PR

pip install git+https://github.com/jhamman/xarray.git@feature/pickle_rasterio

rabernat · 2018-05-18T14:32:35Z

I am out of ideas.

mrocklin · 2018-05-18T14:34:43Z

Same

mrocklin · 2018-05-18T14:36:15Z

A very slow-and-painful way would be to create a docker image that was a much smaller change from what we had before, and slowly accrue changes until something breaks. We could binary search this process to find the issue within a few iterations.

rabernat · 2018-05-18T14:41:55Z

Is it really so simple? We also depend on the upstream helm charts and docker images. They might have changed too.

mrocklin · 2018-05-18T14:50:09Z

I also upgraded the helm chart but kept the old image tag, and things worked fine, so I think we can safely isolate the problem to the notebook image.

rabernat · 2018-05-18T15:04:32Z

Got this tip on the jupyterhub gitter channel

> we fixed our issue by running 'jupyter serverextension enable --py jupyterlab --sys-prefix' in our startup script. But we're running on internal hardware and not using docker, so am not sure if the underlying problem is the same, and if the solution would work

rabernat · 2018-05-18T15:15:54Z

I added jupyter serverextension enable --py jupyterlab --sys-prefix to the notebook image, rebuilt, pushed, and helm upgraded. No luck. Same problem. Rolled back.

rabernat · 2018-05-18T15:25:23Z

Just staring at the diff trying to figure out what is different. Maybe it's nbserverproxy, which we are installing from git master. In the previous image it was 0.8.1. Now it is 0.8.3. Perhaps better to pin to a specific version.

https://github.com/jupyterhub/nbserverproxy/releases

mrocklin · 2018-05-18T15:29:04Z

If I were doing this (which, unfortunately, won't be until late today, if at all this week), I would go with binary search. I would revert half of the changed lines and test. Based on the result, I would then revert or reapply half of the remaining lines, and repeat. This is a slow process, but it finishes in finite time. It also gives the person in charge some confidence that it will converge on an answer, which I've found reduces frustration.
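
The binary search described above can be sketched as a bisection over the ordered list of Dockerfile changes. This sketch uses the prefix form of the idea, as `git bisect` does, rather than reverting arbitrary halves. It assumes two things:

- the changes can be applied as cumulative prefixes;
- breakage is monotone: once a prefix breaks, every longer prefix stays broken.

`first_bad_change` and the `works` callback are hypothetical. The callback stands in for one build, push, deploy, and login test.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_change(changes: Sequence[T], works: Callable[[Sequence[T]], bool]) -> int | None:
    """Index of the first change whose inclusion breaks the image.

    Assumes works([]) is True (the old image) and that breakage is monotone.
    Returns None if the full set of changes still works.
    """
    if works(changes):
        return None
    lo, hi = 0, len(changes)  # invariant: changes[:lo] works, changes[:hi] breaks
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(changes[:mid]):
            lo = mid
        else:
            hi = mid
    return lo  # changes[:lo] works, changes[:lo + 1] breaks


# Hypothetical: eight changes, of which change-5 is the (unknown) culprit.
changes = [f"change-{i}" for i in range(8)]
builds = []


def works(prefix):
    builds.append(len(prefix))  # each call stands for one build-push-deploy cycle
    return "change-5" not in prefix


culprit = first_bad_change(changes, works)
print(changes[culprit], "found after", len(builds), "builds")
# change-5 found after 4 builds
```

Each probe is a full image build and deploy. The search therefore takes about 1 + log2(n) probes instead of up to n.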

rabernat · 2018-05-18T15:35:13Z

My nbserverproxy idea did not work either.

I don't have time to work on this any more today. I was hoping to be clever to avoid Matt's suggested debugging method. But that is probably the way to go.

This is a huge waste of time.

mrocklin · 2018-05-18T15:36:45Z

> This is a huge waste of time.

Managing distributed computing environments has made me appreciate and understand the reticence I've met when asking various IT staffs to change small things for me.

rabernat · 2018-05-18T15:50:36Z

@tjcrone has reported he was able to deploy the latest helm chart with a fresh helm install on a new cluster. So this may be a problem with upgrade.

No one is using the cluster now, so I'm going to try to delete and reinstall.

tjcrone · 2018-05-18T16:30:54Z

I was unable to replicate the problem using the latest pangeo helm chart and the configuration files from the current master (minus the auth section, the IP address, and the logo page) on a fresh GCP cluster. It worked with the default k8s version, 1.8.8-gke.0, and also worked after upgrading the cluster to 1.9.7-gke.0.

rabernat · 2018-05-18T16:36:34Z

Can you clarify what you mean by "unable to replicate"? My understanding is that it worked on a fresh cluster, correct?

yuvipanda · 2018-05-18T17:01:00Z

When I just tried to log in at http://pangeo.pydata.org/, it all worked for me?

My advice for dealing with situations like this is to always have a 'staging' environment that mirrors your production environment, but is smaller in resources. This lets you experiment freely without worrying about disrupting users...

yuvipanda · 2018-05-18T17:02:20Z

Also, from #261 (comment) it looks like jupyterlab is now coming from conda-forge rather than default. I don't know enough about conda, but maybe this could be an issue?

rabernat · 2018-05-18T17:04:28Z

We have rolled back. The broken image is not live.

rabernat · 2018-05-18T17:06:29Z

@yuvipanda thanks for chiming in! If you care to hop into https://gitter.im/pangeo-data/Lobby, we could save a lot of noise on this thread. I have many questions for you.

Your advice about a staging environment is good. @tjcrone effectively tried something like that, and this new image worked fine in the staging environment! So there is something very subtle going on here.

rabernat · 2018-05-18T17:39:17Z

As usual, @yuvipanda saved the day.

The problem is that I kept pushing the new notebook images to the same tag (pangeo/notebook:2018-05-17). I needed to use a fresh tag for each one.
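
The fix is to give every build a unique tag, such as the full commit hash. Why reusing a tag fails is not spelled out in the thread. One plausible mechanism, stated as an assumption: Kubernetes defaults `imagePullPolicy` to `IfNotPresent` for any tag other than `latest`, so nodes that already cached the tag keep running the old image. Below is a minimal sketch of deriving the tag. `image_tag` and the `pushed` set are hypothetical; in practice the hash would come from `git rev-parse HEAD`.

```python
from __future__ import annotations

import string


def image_tag(repository: str, commit_sha: str, pushed_tags: set[str]) -> str:
    """Return "<repository>:<full commit SHA>", refusing to reuse a pushed tag.

    pushed_tags is a hypothetical record of tags already pushed for this repository.
    """
    sha = commit_sha.strip().lower()
    if len(sha) != 40 or any(c not in string.hexdigits for c in sha):
        raise ValueError(f"not a full git SHA-1: {commit_sha!r}")
    if sha in pushed_tags:
        raise ValueError(f"tag {sha} already exists; build from a new commit")
    return f"{repository}:{sha}"


pushed = set()
tag = image_tag("pangeo/notebook", "e2cc289d8f97a81ce75e38556344299f18623fb9", pushed)
print(tag)
pushed.add(tag.split(":", 1)[1])

try:
    image_tag("pangeo/notebook", "e2cc289d8f97a81ce75e38556344299f18623fb9", pushed)
except ValueError as err:
    print("refused:", err)
```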

rabernat · 2018-05-18T18:19:30Z

I have upgraded the cluster to the latest helm chart (0.1.1-ebd6b1d) which points to the new notebook image (e2cc289d8f97a81ce75e38556344299f18623fb9).

updated dockerfiles · 180d47f

mrocklin reviewed May 17, 2018

rabernat added 3 commits May 17, 2018 14:16
new example · 73243ec
pin versions · 9a9dc56
typo · 22ba8f5

jhamman reviewed May 17, 2018

rabernat added 3 commits May 17, 2018 14:35
pin more versions · 2953d32
notebook · 7590f5b
fix pip syntax · d1ae018
more updates and fixes · d905cd0

rabernat added 2 commits May 17, 2018 15:48
latest versions · 413759e
drop daskernetes · e8e8b12

rabernat mentioned this pull request May 17, 2018: Consider continuous deployment for this Repo pangeo-data/helm-chart#5 (Closed)

trying stuff · ce0477b
trying more stuff · e2cc289

rabernat mentioned this pull request May 18, 2018: bump notebook tag using hash pangeo-data/helm-chart#26 (Merged)

rabernat merged commit 0293049 into pangeo-data:master May 18, 2018

rabernat mentioned this pull request Jul 18, 2018: jupyterlab extension doesn't load in latest docker image pangeo-data/helm-chart#48 (Closed)
