updated dockerfiles #261
did you confirm this grabs cftime? |
gce/notebook/Dockerfile
Outdated
@@ -8,9 +8,8 @@ RUN apt-get update \
 USER $NB_USER

 RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
-    bokeh=0.12.15dev1 \
+    bokeh \
I recommend that we continue to pin to something just to be explicit, perhaps 0.12.16?
gce/worker/Dockerfile
Outdated
@@ -33,7 +33,7 @@ RUN pip install pyasn1 click urllib3 --upgrade

 RUN pip install git+https://github.com/zarr-developers/zarr \
     git+https://github.com/pydata/xarray \
-    git+https://github.com/dask/gcsfs@f99177b31c44fcc404619b2876a77cdcda955a75 \
+    git+https://github.com/dask/gcsfs \
Same here. We should probably pin to something.
@martindurant has there been a release since this commit? It looks like it was committed on March 22nd
Yes, v0.1.0 on April 20
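For reference, a quick way to audit a pip install list for unpinned entries might look like the sketch below. It is only an illustration and not part of this PR: `find_unpinned` and the sample list are hypothetical, and it assumes a git URL counts as pinned when its last path segment carries an `@ref` with no slashes in it. Note that pip spells an exact pin with `==`; the single `=` form is conda syntax.

```python
def find_unpinned(requirements):
    """Return the requirements that are not pinned to an exact version or commit."""
    unpinned = []
    for req in requirements:
        if req.startswith("git+"):
            # Assumption: a pinned git URL ends in 'repo@<commit-or-tag>'.
            pinned = "@" in req.rsplit("/", 1)[-1]
        else:
            # pip needs '==' for an exact pin; a single '=' is conda syntax.
            pinned = "==" in req
        if not pinned:
            unpinned.append(req)
    return unpinned


# Hypothetical list modelled on the install lines discussed in this review.
requirements = [
    "git+https://github.com/zarr-developers/zarr",
    "git+https://github.com/pydata/xarray",
    "git+https://github.com/dask/gcsfs@f99177b31c44fcc404619b2876a77cdcda955a75",
    "gcsfs==0.1.0",
    "click",
]
print(find_unpinned(requirements))
```

Anything the function returns will float with upstream, which is what the review comments above are trying to avoid.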
No, I have not. I was waiting to get some feedback before actually doing the docker build. |
gce/notebook/Dockerfile
Outdated
@@ -37,15 +36,15 @@ RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
     intake-xarray \
     && conda clean -tipsy

-RUN pip install fusepy click jedi kubernetes --upgrade --no-cache-dir
+RUN pip install fusepy click jedi kubernetes gcsfs=0.1.0 --upgrade --no-cache-dir
I think we can safely pin xarray here to xarray=0.10.4
Is it better to install xarray from conda or pip?
In this case, probably pip. The dependencies are taken care of above and the defaults channel of conda is still on 0.10.3.
We actually currently have xarray in both the conda and pip install lists. Is there any good reason for that? Can we safely remove it from the conda install part?
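To spot packages that appear in both install lists, something along these lines could help. This is a rough sketch under stated assumptions: `package_names` is a hypothetical helper, the two argument strings are shortened stand-ins rather than the real Dockerfile contents, and only `-c`/`--channel` are treated as flags that take a value.

```python
import re


def package_names(install_args, flags_with_value=("-c", "--channel")):
    """Extract lowercase package names from the arguments of an install command."""
    names = set()
    tokens = iter(install_args.split())
    for token in tokens:
        if token in flags_with_value:
            next(tokens, None)  # skip the flag's value, e.g. a channel name
            continue
        if token.startswith("-") or token == "\\":
            continue  # skip other flags and line continuations
        # Drop a version spec such as '=0.10.3', '==0.10.4' or '>=1.0'.
        name = re.split(r"[=<>!~]", token, maxsplit=1)[0]
        if name:
            names.add(name.lower())
    return names


# Hypothetical, shortened argument lists for illustration only.
conda_args = "--yes -c defaults numpy xarray=0.10.3 intake-xarray"
pip_args = "fusepy click jedi kubernetes xarray==0.10.4 --upgrade --no-cache-dir"

duplicates = sorted(package_names(conda_args) & package_names(pip_args))
print(duplicates)
```

Any name printed here is installed twice, once by each tool, so one of the two entries is a candidate for removal.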
Would you be interested in including |
Cython needed to be added to get cftime to work. Not sure why. |
This is a good idea @rsignell-usgs. Are you using the exact same docker images on your AWS pangeo cluster? |
Also, should we consider adding rasterio? |
I think this would be a good idea, though it will probably incur a fairly large cost in terms of the size of the docker image (gdal is a dep). Also, just for reference, xarray's rasterio backend will not work with distributed until we finish pydata/xarray#2131 so that limits the applicability here for a bit. |
@rabernat , I have these additions: https://github.com/rsignell-usgs/pangeo/blob/rsignell-aws/aws/notebook/Dockerfile |
Here is everything else that installed if I include conda-forge rasterio
It downgrades numpy, changes netcdf4 /hdf libraries, etc etc. Kind of a mess. The notebook image size increases from 2.63GB to 3.85GB. Not quite sure what to do...is rasterio worth it? |
I'm also seeing this message in the pip installs:
@mrocklin, what do you advise here? |
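As a side note on why pip complains: the warning reported in this thread says dask-kubernetes 0.3.0 and daskernetes 0.1.3 both require `kubernetes==4` while 6.0.0 gets installed. A simplified sketch of that exact-pin check is below. It assumes purely numeric dotted versions and is not pip's actual implementation; `release_tuple` and `satisfies_exact_pin` are hypothetical names.

```python
def release_tuple(version):
    """Turn '6.0.0' into (6, 0, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def satisfies_exact_pin(pinned, installed):
    """True if `installed` matches an '==pinned' requirement, ignoring trailing zeros."""
    a, b = release_tuple(pinned), release_tuple(installed)
    width = max(len(a), len(b))
    # Pad the shorter release with zeros so that '4' compares equal to '4.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b


print(satisfies_exact_pin("4", "4.0.0"))  # True
print(satisfies_exact_pin("4", "6.0.0"))  # False
```

So the options are either to hold kubernetes at 4 in the image or to relax the pin upstream.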
I'll bump the version of kubernetes on dask-kubernetes and see how it fares |
My 2 cents: no, not right now. We can't easily use it with distributed, so single-machine users of rasterio on pangeo can conda install it in their notebook image. |
Testing dask-kubernetes here:
dask/dask-kubernetes#77
On Thu, May 17, 2018 at 3:21 PM, Ryan Abernathey wrote:
I'm also seeing this message in the pip installs:
dask-kubernetes 0.3.0 has requirement kubernetes==4, but you'll have kubernetes 6.0.0 which is incompatible.
daskernetes 0.1.3 has requirement kubernetes==4, but you'll have kubernetes 6.0.0 which is incompatible.
@mrocklin, what do you advise here?
|
Thanks Matt. Do we still need both daskernetes and dask-kubernetes? |
We should probably drop daskernetes. People with old notebooks will have
those notebooks fail. We should probably consider clearing out old users'
environments regardless. Everyone who logged in a long time ago costs us a
small stream of credits.
|
@rabernat, to get the new advanced features of
and then to have distributed work, currently you would need @jhamman PR
|
I am out of ideas. |
Same
|
A very slow-and-painful way would be to create a docker image that was a
much smaller change from what we had before, and slowly accrue changes
until something breaks. We could binary search this process to find the
issue within a few iterations.
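The binary search described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: `first_breaking_change` and `works` are hypothetical names, `works(prefix)` stands in for building and deploying an image with only those changes applied, and the sketch assumes a single breaking change so that every larger set of changes also fails.

```python
def first_breaking_change(changes, works):
    """Return the index of the first change that breaks the image.

    `changes` is an ordered list of edits on top of a known-good image.
    `works(prefix)` must build and test an image with only `prefix` applied.
    Assumes no changes works, all changes fails, and failure is monotonic.
    """
    lo, hi = 0, len(changes)  # changes[:lo] is known good, changes[:hi] is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(changes[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1


# Stand-in for a real build-and-deploy test; the labels are placeholders.
changes = [f"change-{i}" for i in range(8)]
builds = []


def works(prefix):
    builds.append(len(prefix))
    return "change-5" not in prefix


culprit = first_breaking_change(changes, works)
print(changes[culprit], "found after", len(builds), "builds")
```

With eight changes this needs three builds instead of up to eight, which is the point of bisecting when each build and deploy is slow.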
|
Is it really so simple? We also depend on the upstream helm charts and docker images. They might have changed too. |
I also upgraded the helm chart but with the old image tag and things worked fine, so I think that we can safely isolate the problem to the notebook image |
Got this tip on the jupyterhub gitter channel
|
I added |
Just staring at the diff trying to figure out what is different. Maybe it's nbserverproxy, which we are installing from git master. In the previous image it was 0.8.1. Now it is 0.8.3. Perhaps better to pin to a specific version. |
If I were doing this (which, unfortunately, won't be until late today, if at
all this week) I would go with binary search. I would revert half of the
lines changed, test it, and see if it works or not, and then revert half or
change half based on the result. This is a slow process, but finishes in
finite time and gives the person in charge of this process some confidence
that it will converge on an answer, which I've found reduces frustration.
On Fri, May 18, 2018 at 11:25 AM, Ryan Abernathey wrote:
Just staring at the diff trying to figure out what is different. Maybe
it's nbserverproxy, which we are installing from git master. In the
previous image it was 0.8.1. Now it is 0.8.3. Perhaps better to pin to a
specific version.
https://github.com/jupyterhub/nbserverproxy/releases
|
My nbserverproxy idea did not work either. I don't have time to work on this any more today. I was hoping to be clever to avoid Matt's suggested debugging method. But that is probably the way to go. This is a huge waste of time. |
Managing distributed computing environments has made me appreciate and understand the reticence I've met when asking various IT staffs to change small things for me. |
@tjcrone has reported he was able to deploy the latest helm chart with a fresh helm install on a new cluster. So this may be a problem with upgrade. No one is using the cluster now, so I'm going to try to delete and reinstall. |
I was unable to replicate using the latest pangeo helm chart and the configuration files from the current master, minus the auth section, the ip address, and the logo page on a fresh GCP cluster. It worked with the default k8s version, 1.8.8-gke.0, and also worked after upgrading the cluster to 1.9.7-gke.0. |
Can you clarify what you mean by "unable to replicate"? My understanding is that it worked on a fresh cluster, correct? |
When I just tried to login with http://pangeo.pydata.org/ it all works for me? My advice for dealing with situations like this is to always have a 'staging' environment that mirrors your production environment, but is smaller in resources. This lets you experiment freely without worrying about disrupting users... |
Also, from #261 (comment) it looks like jupyterlab is now coming from conda-forge rather than default. I don't know enough about conda, but maybe this could be an issue? |
We have rolled back. The broken image is not live. |
@yuvipanda thanks for chiming in! If you care to hop into https://gitter.im/pangeo-data/Lobby, we could save a lot of noise on this thread. I have many questions for you. Your advice about a staging environment is good. @tjcrone effectively tried something like that, and this new image worked fine in the staging environment! So there is something very subtle going on here. |
As usual, @yuvipanda saved the day. The problem is that I kept pushing the new notebook images to the same tag (pangeo/notebook:2018-05-17). I needed to use a fresh tag for each one. |
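One way to avoid reusing a tag is to derive it from the build date plus a short hash of the Dockerfile, so every changed image gets a new name. The sketch below is an assumption-laden illustration, not what was actually done here: `image_tag` is a hypothetical helper. A plausible explanation for the failure (not confirmed in this thread) is that nodes which already hold an image under a given tag do not pull it again.

```python
import hashlib
from datetime import date


def image_tag(dockerfile_text, build_date):
    """Build a tag such as '2018-05-17-<7 hex chars>' from the date and Dockerfile content."""
    digest = hashlib.sha256(dockerfile_text.encode("utf-8")).hexdigest()[:7]
    return f"{build_date.isoformat()}-{digest}"


# Two hypothetical revisions of the same Dockerfile built on the same day.
first = image_tag("RUN pip install gcsfs==0.1.0\n", date(2018, 5, 17))
second = image_tag("RUN pip install gcsfs==0.1.0 cython\n", date(2018, 5, 17))
print(first)
print(second)
```

The two tags share the date prefix but differ in the hash suffix, so pushing the second image cannot silently shadow the first. A rebuild of an unchanged Dockerfile would still produce the same tag, so a build counter or timestamp may be needed when upstream packages are unpinned.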
I have upgraded the cluster to the latest helm chart ( |
Should fix #259.
I pinned the numpy version to make sure it is the same on both notebook and worker.
Anything else we want to throw in here?
cc @mrocklin
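For keeping pins in sync between the notebook and worker images, a small check like the one below might be useful. It is a hedged sketch: `pinned_versions` and `mismatched_pins` are hypothetical helpers, the two Dockerfile snippets and their version numbers are made up for illustration, and the regex will also pick up any other `NAME=1.2` text in the file.

```python
import re

# Matches 'name=1.2.3' (conda style) and 'name==1.2.3' (pip style).
PIN = re.compile(r"\b([A-Za-z0-9_.-]+?)==?(\d[\w.]*)")


def pinned_versions(dockerfile_text):
    """Map each pinned package name (lowercased) to its pinned version."""
    return {name.lower(): version for name, version in PIN.findall(dockerfile_text)}


def mismatched_pins(notebook_text, worker_text):
    """Return {package: (notebook_version, worker_version)} for pins that differ."""
    nb, wk = pinned_versions(notebook_text), pinned_versions(worker_text)
    return {name: (nb[name], wk[name]) for name in nb.keys() & wk.keys() if nb[name] != wk[name]}


# Hypothetical snippets; the version numbers are placeholders.
notebook = "RUN conda install --yes numpy=1.14.3 xarray=0.10.4\n"
worker = "RUN conda install --yes numpy=1.14.2 xarray=0.10.4\n"
print(mismatched_pins(notebook, worker))
```

An empty result means every package pinned in both files has the same version on the notebook and the worker.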