updated dockerfiles #261

Merged: 14 commits, merged on May 18, 2018

rabernat
Member

Should fix #259.

I pinned the numpy version to make sure it is the same on both notebook and worker.

Anything else we want to throw in here?

cc @mrocklin
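A minimal sketch of how the numpy pin could be kept in sync: a hypothetical pre-build check, in bash, that both Dockerfiles pin the same numpy version. The helper names `numpy_pin` and `same_numpy_pin` are illustrative, not part of this repository, and the check only reads the Dockerfile text, so it cannot catch a version that the conda solver later changes while installing other packages.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: do two Dockerfiles pin the same numpy version?

# Print the first numpy pin found in a Dockerfile, normalized to "numpy=X.Y.Z".
# Accepts the conda form (numpy=1.14.3) and the pip form (numpy==1.14.3).
numpy_pin() {
  grep -oE 'numpy==?[0-9][0-9.]*' "$1" | head -n 1 | tr -s '='
}

# Succeed only if both files contain a numpy pin and the two pins are identical.
same_numpy_pin() {
  local first second
  first=$(numpy_pin "$1")
  second=$(numpy_pin "$2")
  [ -n "$first" ] && [ "$first" = "$second" ]
}
```

For example, `same_numpy_pin notebook/Dockerfile worker/Dockerfile || echo "numpy pins differ" >&2`, where both paths are placeholders for wherever the two Dockerfiles live.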

@jhamman
Member

jhamman commented May 17, 2018

did you confirm this grabs cftime?

@@ -8,9 +8,8 @@ RUN apt-get update \
 USER $NB_USER

 RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
-bokeh=0.12.15dev1 \
+bokeh \
Member

I recommend that we continue to pin to something just to be explicit, perhaps 0.12.16 ?
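One way to make "pin to something" checkable is a hypothetical bash helper, `unpinned`, that prints every requirement string carrying neither a version specifier nor a git ref. It is a text heuristic under stated assumptions: any `=`, `<` or `>` counts as a pin, and any `@` in a `git+` URL counts as a ref, so a URL such as `git+ssh://git@github.com/...` would be misread as pinned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each requirement that has no explicit version or ref.
unpinned() {
  local req
  for req in "$@"; do
    case "$req" in
      git+*@*) ;;               # git URL pinned to a commit, tag or branch
      git+*) echo "$req" ;;     # git URL that follows the default branch
      *=*|*\<*|*\>*) ;;         # specifier such as bokeh=0.12.16 or dask>=0.17
      *) echo "$req" ;;         # bare package name
    esac
  done
}
```

With entries like those in the diffs here, `unpinned bokeh git+https://github.com/dask/gcsfs` prints both entries, while `unpinned bokeh=0.12.16 gcsfs==0.1.0` prints nothing.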

@@ -33,7 +33,7 @@ RUN pip install pyasn1 click urllib3 --upgrade

 RUN pip install git+https://github.com/zarr-developers/zarr \
 git+https://github.com/pydata/xarray \
-git+https://github.com/dask/gcsfs@f99177b31c44fcc404619b2876a77cdcda955a75 \
+git+https://github.com/dask/gcsfs \
Member

Same here. We should probably pin to something.

@martindurant, has there been a release since this commit? It looks like it was committed on March 22nd.

Contributor

Yes, v0.1.0 on April 20

@rabernat
Member Author

> did you confirm this grabs cftime?

No, I have not. I was waiting to get some feedback before actually doing the docker build.

@@ -37,15 +36,15 @@ RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
 intake-xarray \
 && conda clean -tipsy

-RUN pip install fusepy click jedi kubernetes --upgrade --no-cache-dir
+RUN pip install fusepy click jedi kubernetes gcsfs==0.1.0 --upgrade --no-cache-dir
Member

I think we can safely pin xarray here to xarray=0.10.4

Member Author

Is it better to install xarray from conda or pip?

Member

In this case, probably pip. The dependencies are taken care of above, and the defaults channel of conda is still on 0.10.3.

Member Author

We actually currently have xarray in both the conda and pip install lists. Is there any good reason for that? Can we safely remove it from the conda install part?
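Overlaps like this can be found mechanically. The sketch below is a hypothetical bash helper (it relies on bash here-strings and process substitution), assuming the two install lists are available as whitespace-separated strings rather than raw Dockerfile text: it reduces every entry to a lower-case package name, using the last path component for `git+` URLs, and prints the names that appear in both lists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce an install list to sorted, unique package names.
bare_names() {
  # One entry per line; drop blank lines and flags such as --upgrade;
  # reduce git+ URLs to their last path component; strip version specifiers;
  # then lower-case, sort and de-duplicate so the output is ready for comm.
  tr -s '[:space:]' '\n' <<<"$1" |
    grep -v -e '^$' -e '^-' |
    sed -E -e 's#^git\+.*/([^/@]+)(@.*)?$#\1#' -e 's/[=<>!~].*$//' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}

# Print package names present in both the conda list ($1) and the pip list ($2).
duplicates() {
  comm -12 <(bare_names "$1") <(bare_names "$2")
}
```

For example, feeding it the conda list `dask distributed xarray intake-xarray` and the pip list `git+https://github.com/pydata/xarray gcsfs==0.1.0` prints `xarray`; `intake-xarray` is a separate package and is correctly not flagged.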

@rsignell-usgs
Member

Would you be interested in including s3fs to facilitate across-cloud testing?

@rabernat
Member Author

Cython needed to be added to get cftime to work. Not sure why.

@rabernat
Member Author

> Would you be interested in including s3fs to facilitate across-cloud testing?

This is a good idea @rsignell-usgs. Are you using the exact same docker images on your AWS pangeo cluster?

@rabernat
Member Author

Also, should we consider adding rasterio?

@jhamman
Member

jhamman commented May 17, 2018

> Also, should we consider adding rasterio?

I think this would be a good idea, though it will probably incur a fairly large cost in terms of the size of the docker image (gdal is a dep). Also, just for reference, xarray's rasterio backend will not work with distributed until we finish pydata/xarray#2131, so that limits the applicability here for a bit.

@rsignell-usgs
Member

rsignell-usgs commented May 17, 2018

@rabernat
Member Author

Here is everything else that gets installed if I include conda-forge rasterio:

The following NEW packages will be INSTALLED:

    affine:           2.2.0-py_0                    conda-forge
    boost:            1.66.0-py36_1                 conda-forge
    boost-cpp:        1.66.0-1                      conda-forge
    boto3:            1.7.22-py_0                   conda-forge
    botocore:         1.10.22-py_0                  conda-forge
    cairo:            1.14.12-h77bcde2_0            defaults   
    click-plugins:    1.0.3-py36_0                  conda-forge
    cligj:            0.4.0-py36_0                  conda-forge
    docutils:         0.14-py36_0                   conda-forge
    freexl:           1.0.5-0                       conda-forge
    gdal:             2.2.2-py36hc209d97_1          defaults   
    geos:             3.6.2-1                       conda-forge
    giflib:           5.1.4-0                       conda-forge
    jmespath:         0.9.3-py36_0                  conda-forge
    json-c:           0.12.1-0                      conda-forge
    kealib:           1.4.7-4                       conda-forge
    krb5:             1.14.6-0                      conda-forge
    libdap4:          3.19.2-1                      conda-forge
    libgdal:          2.2.2-h804cdde_1              defaults   
    libgfortran:      3.0.0-1                       defaults   
    libiconv:         1.15-0                        conda-forge
    libkml:           1.3.0-6                       conda-forge
    libpq:            9.6.6-h4e02ad2_0              defaults   
    libspatialite:    4.3.0a-h72746d6_18            defaults   
    openjpeg:         2.3.0-2                       conda-forge
    pixman:           0.34.0-2                      conda-forge
    poppler:          0.60.1-hc909a00_0             defaults   
    poppler-data:     0.4.9-0                       conda-forge
    proj4:            4.9.3-5                       conda-forge
    rasterio:         0.36.0-py36_3                 conda-forge
    s3transfer:       0.1.13-py36_0                 conda-forge
    snuggs:           1.4.1-py36_0                  conda-forge
    util-linux:       2.21-0                        defaults   
    xerces-c:         3.2.1-0                       conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    hdf5:             1.10.2-hba1933b_1             defaults    --> 1.10.1-2                    conda-forge
    libnetcdf:        4.6.1-h13459d8_0              defaults    --> 4.4.1.1-10                  conda-forge
    netcdf4:          1.4.0-py36hd4b6044_1          defaults    --> 1.3.1-py36_1                conda-forge

The following packages will be DOWNGRADED:

    blas:             1.1-openblas                  conda-forge --> 1.0-mkl                     defaults   
    dbus:             1.13.2-h714fa37_1             defaults    --> 1.13.2-hc3f9b76_0           defaults   
    glib:             2.56.1-h000015b_0             defaults    --> 2.53.6-h5d9569c_2           defaults   
    gst-plugins-base: 1.14.0-hbbd80ab_1             defaults    --> 1.12.4-h33fb286_0           defaults   
    gstreamer:        1.14.0-hb453b48_1             defaults    --> 1.12.4-hb53b477_0           defaults   
    numpy:            1.14.3-py36_blas_openblas_200 conda-forge [blas_openblas] --> 1.14.2-py36_nomklh2b20989_1 defaults    [nomkl]
    qt:               5.9.5-h7e424d6_0              defaults    --> 5.9.4-h4e5bff0_0            defaults   

It downgrades numpy, changes the netcdf4/hdf5 libraries, etc., etc. Kind of a mess. The notebook image size increases from 2.63GB to 3.85GB.

Not quite sure what to do...is rasterio worth it?
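For scale, the two image sizes quoted above correspond to an increase of roughly 46%:

```latex
\Delta = 3.85\,\text{GB} - 2.63\,\text{GB} = 1.22\,\text{GB},
\qquad
\frac{\Delta}{2.63\,\text{GB}} = \frac{1.22}{2.63} \approx 0.464 \approx 46\%
```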

@rabernat
Member Author

I'm also seeing this message in the pip installs:

dask-kubernetes 0.3.0 has requirement kubernetes==4, but you'll have kubernetes 6.0.0 which is incompatible.
daskernetes 0.1.3 has requirement kubernetes==4, but you'll have kubernetes 6.0.0 which is incompatible.

@mrocklin, what do you advise here?

@mrocklin
Member

I'll bump the version of kubernetes on dask-kubernetes and see how it fares.

@jhamman
Member

jhamman commented May 17, 2018

> It downgrades numpy, changes the netcdf4/hdf5 libraries, etc., etc. Kind of a mess. The notebook image size increases from 2.63GB to 3.85GB.
>
> Not quite sure what to do...is rasterio worth it?

My 2 cents: no, not right now. We can't easily use it with distributed, so single-machine users of rasterio on pangeo can conda install it in their notebook image.

@mrocklin
Member

mrocklin commented May 17, 2018 via email

@rabernat
Copy link
Member Author

Thanks Matt. Do we still need both daskernetes and dask-kubernetes?

@mrocklin
Copy link
Member

mrocklin commented May 17, 2018 via email

@rsignell-usgs
Member

rsignell-usgs commented May 17, 2018

@rabernat, to get the new advanced features of rasterio, you want the 1.0 version in the conda-forge/label/dev channel.

conda install -c conda-forge/label/dev rasterio

and then, to have distributed work, you would currently need @jhamman's PR:

pip install git+https://github.com/jhamman/xarray.git@feature/pickle_rasterio

@rabernat
Member Author

I am out of ideas.

@mrocklin
Member

mrocklin commented May 18, 2018 via email

@mrocklin
Member

mrocklin commented May 18, 2018 via email

@rabernat
Member Author

Is it really so simple? We also depend on the upstream helm charts and docker images. They might have changed too.

@mrocklin
Member

I also upgraded the helm chart, but with the old image tag, and things worked fine, so I think we can safely isolate the problem to the notebook image.

@rabernat
Member Author

Got this tip on the jupyterhub gitter channel

> we fixed our issue by running 'jupyter serverextension enable --py jupyterlab --sys-prefix' in our startup script. But we're running on internal hardware and not using docker, so am not sure if the underlying problem is the same, and if the solution would work

@rabernat
Member Author

I added jupyter serverextension enable --py jupyterlab --sys-prefix to the notebook image, rebuilt, pushed, and helm upgraded. No luck. Same problem. Rolled back.

@rabernat
Member Author

Just staring at the diff trying to figure out what is different. Maybe it's nbserverproxy, which we are installing from git master. In the previous image it was 0.8.1. Now it is 0.8.3. Perhaps better to pin to a specific version.

https://github.com/jupyterhub/nbserverproxy/releases
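Comparing the installed versions of the old and new images directly can shorten this kind of search. A minimal sketch, assuming each image's package list has been saved with `pip freeze` (for example `docker run --rm <image> pip freeze > old.txt`, where `<image>` stands for the old or new notebook image tag): the hypothetical `changed_versions` helper prints every package whose `name==version` entry differs between the two files. Lines not in `name==version` form, such as packages installed from git URLs, are effectively ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report packages whose pinned versions differ between two
# "name==version" listings, e.g. pip freeze output saved from two images.
# Assumes the first file is not empty (the NR == FNR idiom relies on that).
changed_versions() {
  awk -F'==' '
    # First file: remember the old version of each package.
    NR == FNR { old[$1] = $2; next }
    # Second file: print packages present in both files with a different version.
    ($1 in old) && old[$1] != $2 { print $1 ": " old[$1] " -> " $2 }
  ' "$1" "$2"
}
```

For example, with `nbserverproxy==0.8.1` in the old listing and `nbserverproxy==0.8.3` in the new one, it prints `nbserverproxy: 0.8.1 -> 0.8.3`.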

@mrocklin
Member

mrocklin commented May 18, 2018 via email

@rabernat
Member Author

My nbserverproxy idea did not work either.

I don't have time to work on this any more today. I was hoping to be clever to avoid Matt's suggested debugging method. But that is probably the way to go.

This is a huge waste of time.

@mrocklin
Member

> This is a huge waste of time.

Managing distributed computing environments has made me appreciate and understand the reticence I've met when asking various IT staffs to change small things for me.

@rabernat
Member Author

@tjcrone has reported he was able to deploy the latest helm chart with a fresh helm install on a new cluster. So this may be a problem with upgrade.

No one is using the cluster now, so I'm going to try to delete and reinstall.

@tjcrone
Contributor

tjcrone commented May 18, 2018

I was unable to replicate using the latest pangeo helm chart and the configuration files from the current master (minus the auth section, the IP address, and the logo page) on a fresh GCP cluster. It worked with the default k8s version, 1.8.8-gke.0, and also worked after upgrading the cluster to 1.9.7-gke.0.

@rabernat
Member Author

Can you clarify what you mean by "unable to replicate"? My understanding is that it worked on a fresh cluster, correct?

@yuvipanda
Member

When I just tried to log in at http://pangeo.pydata.org/, it all worked for me?

My advice for dealing with situations like this is to always have a 'staging' environment that mirrors your production environment, but is smaller in resources. This lets you experiment freely without worrying about disrupting users...

@yuvipanda
Member

Also, from #261 (comment) it looks like jupyterlab is now coming from conda-forge rather than defaults. I don't know enough about conda, but maybe this could be an issue?

@rabernat
Member Author

We have rolled back. The broken image is not live.

@rabernat
Member Author

@yuvipanda thanks for chiming in! If you care to hop into https://gitter.im/pangeo-data/Lobby, we could save a lot of noise on this thread. I have many questions for you.

Your advice about a staging environment is good. @tjcrone effectively tried something like that, and this new image worked fine in the staging environment! So there is something very subtle going on here.

@rabernat
Member Author

As usual, @yuvipanda saved the day.

The problem is that I kept pushing the new notebook images to the same tag (pangeo/notebook:2018-05-17). I needed to use a fresh tag for each one.
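A likely reason a reused tag bites: by default, Kubernetes sets `imagePullPolicy: IfNotPresent` for any image whose tag is not `latest`, so a node that has already cached `pangeo/notebook:2018-05-17` keeps running the old contents instead of pulling the re-pushed image. A hypothetical bash helper like the one below picks a tag that has not been used yet by appending a counter to a date-based base tag; the list of already-pushed tags has to be supplied by the caller, which is an assumption of this sketch. Tagging each image with its git commit hash, as the image referenced in the final comment of this thread appears to do, achieves the same goal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a tag derived from $1 that is not in the remaining
# arguments (the tags already pushed), e.g. 2018-05-17 -> 2018-05-17.1 -> ...
next_tag() {
  local base=$1 candidate n=0
  shift
  candidate=$base
  # Keep bumping the suffix while the candidate matches an existing tag exactly.
  while printf '%s\n' "$@" | grep -qxF -- "$candidate"; do
    n=$((n + 1))
    candidate="$base.$n"
  done
  printf '%s\n' "$candidate"
}
```

For example, `next_tag 2018-05-17 2018-05-17` prints `2018-05-17.1`, which could then be passed to `docker build -t pangeo/notebook:<tag>` and `docker push`, with the helm chart's values pointing at the new tag.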

@rabernat
Member Author

I have upgraded the cluster to the latest helm chart (0.1.1-ebd6b1d) which points to the new notebook image (e2cc289d8f97a81ce75e38556344299f18623fb9).

Successfully merging this pull request may close these issues.

datashader import error
7 participants