-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/docker #651
Feature/docker #651
Conversation
I wanted to add to the main HowCheck out the installation instructions (link) or try it out using the the docker images (link). But I avoided it for this PR, hoping to settle on "how to do the docker thing" first to make it as fluid as possible. I would like it if there was a one-liner to copy/paste from the main README instead of linking to the docker/README and following instructions there, but that requires some changes, namely:
|
docker/safe/Dockerfile
Outdated
${NB_USER} | ||
|
||
# allow user access to the WORKDIR | ||
RUN chown -R ${USER}:${USER} /opt/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be a bit more restrictive by only changing /opt/polynote
or /opt/polynote/notebooks/
... thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the downsides to allowing the user to use /opt/
? What would extra restrictions buy us here, as the permissions are set on the directory inside the container - or am I missing something?
docker/README.md
Outdated
- ``-v `pwd`:/opt/config`` This mounts the current directory (the one you just created) into the container at `/opt/config` | ||
- ``-v `pwd`/notebooks:/opt/notebooks`` This mounts the `./notebooks` directory on the host into the container at `/opt/notebooks`, so that any notebooks you create are saved back to the host. | ||
- ``-v `pwd`/config.yml:/opt/config/config.yml`` This mounts the config file in the current directory (the one you just created) into the container at `/opt/config/config.yml` | ||
- ``-v `pwd`/notebooks:/opt/polynote/notebooks/saved`` This mounts the `./notebooks` directory on the host into the container at `/opt/polynote/notebooks/saved`, so that any notebooks you create are saved back to the host, and the example notebooks inside of `/opt/polynote/notebooks/` are still visible to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mathematicalmichael Are you testing this with the latest version and config as specified in the readme? I believe the instructions in the current master version of the readme are accurate as long as you're using 0.2.13
/ latest
release. Let me know if that's not the case.
With the storage changes, everything in the root directory is persisted to the host volume aside from changes to the examples which are not mounted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version I was using in my tests was corresponding to 0.2.13/latest
(export BASE_IMAGE=polynote/polynote:latest
)
I'm sorry, I don't think I follow you on the storage changes remark. The host volume is the folder on my machine with config
and notebooks
in it, and the root directory is what, exactly? /opt/polynote/
? Is there a desire to mount files other than the notebooks made in saved
and config.yml
?
I also just double-checked my config and it only has the listen
lines. I'll go ahead and retry with the newer config file just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just updated my config.yaml to have the new changes.
(really do consider adding this in by default to the docker folder) EDIT: didn't see comment below
I am unable to see examples
docker run --rm -it -e PYSPARK_ALLOW_INSECURE_GATEWAY=1 -p 127.0.0.1:8192:8192 -p 127.0.0.1:4040-4050:4040-4050 -v `pwd`/config.yml:/opt/config/config.yml -v `pwd`/notebooks:/opt/notebooks polynote-local --config /opt/config/config.yml
(I built polynote-local
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I think I could definitely find a better way to explain this storage stuff. #634 is the PR with a few more details on how it works.
I'm definitely able to see the examples on my end while following the current master instructions. Is it possible that you might need to docker pull polynote/polynote:latest
before building polynote-local
(your latest
might be pointing to an older image)? Can you see the examples if you use, say, polynote/polynote:0.2.13-2.11
along with the new config? If not, it might be a bug :(
In case it helps, here's what my pwd
looks like:
├── config.yml
└── notebooks
└── my\ first\ notebook.ipynb
And heres my run command:
docker run --rm -it -p 127.0.0.1:8192:8192 -p 127.0.0.1:4040-4050:4040-4050 -v `pwd`/config.yml:/opt/config/config.yml -v `pwd`/notebooks:/opt/notebooks polynote/polynote:latest --config /opt/config/config.yml
I'll try to explain the way these storage mounts work a little better. Let's break down the config (leaving aside the docker mounts for now):
storage:
dir: /opt/notebooks
This part means that /opt/notebooks
becomes the "root" directory you see in the UI. Thus, new notebooks that you make at the root of the UI will be written to /opt/notebooks
- for example, if you created a notebook called foo.ipynb
, it'd show up as just foo.ipynb
in the UI and would be backed by /opt/notebooks/foo.ipynb
on the filesystem.
mounts:
examples:
dir: examples
Here, the key examples:
says to create a new "mount point" called examples
, which is mounted relative to the notebook working directory (that is, relative to "root" as discussed in the previous paragraph). So in the UI, this looks like a directory named examples
.
The dir: examples
part says that this mount point is backed by a directory which is also called examples
. Being a relative path, it expands to $POLYNOTE_WORKING_DIR/examples
which is /opt/polynote/examples
in this case. So if you created a new notebook called examples/foo.ipynb
it'd be persisted to /opt/polynote/examples/foo.ipynb
.
Another way to think of these mount points is as symlinks. So think of it like this:
storage: dir: /opt/notebooks
==> export $NOTEBOOK_DIR=/opt/notebooks
.
mounts: examples: dir: examples
==> ln -s $POLYNOTE_WORKING_DIR/examples $NOTEBOOK_DIR/examples
.
And then imagine Polynote starts with cd $NOTEBOOK_DIR
.
Ok so let's come back to Docker. Now, if you use Docker to mount /opt/notebooks
onto your host, all the notebooks you create in Polynote will be persisted to the host - except for the notebooks in examples
.
Does that make sense? Let me know if there's anything that's still unclear - helps me make sure I can explain this clearly! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that's so helpful. I'm going to paraphrase and turn it into documentation as part of this PR. that explanation definitely needs to be in there in some form. I have done all sorts of mounting with docker before but I never quite had a config file so terse.
I have a project that I need to architect today at work but I'll come around to this at the end of the day and clean it up. thank you in advance for your patience, and thank you in retrospect for your responsiveness!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad it was helpful, no problem! Thanks as well for sticking with this PR!
One thing I think is important to clarify here is that the Storage Mounts
I'm referring to is a Polynote feature and nothing to do with Docker.
It's a bit confusing because Polynote and Docker now both have a concept called "mounts", but they are not related.
The reason we care about this in relation to Docker is because Polynote's Storage Mounts
tell Polynote where to look for files, and Docker mounts tell Docker which files to make available from the host, so as a user if I have files on my local machine I want Polynote to look at they need to be Docker-mounted into the container and then Polynote-mounted (using the config) so Polynote can find them.
|
||
We tag these images with the name `polynote-local`. | ||
|
||
## Base image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to mention the dev dockerfile https://github.com/polynote/polynote/blob/master/docker/dev/Dockerfile as well. It has some instructions inside that I guess we could pull out into the readme if that's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had a hard time understanding those instructions. How is the dev
one built?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry 😞 Let me see whether I can do a better job explaining this too.
The dev
docker image is based on a locally-created Polynote distribution (tarball). So, in order to build the dev
docker image, you need to first create the Polynote distribution using sbt dist
.
After that command completed, the distribution tarball lives in target/scala-2.11
. Since docker build
will send the entire current directory as build context to the daemon, it's best not to run it in the root directory of the project. So, we want to cd
over to that directory.
Finally we want to run the command. So, we run docker build
and point to the dev
Dockerfile.
Putting it all together:
git clone https://github.com/polynote/polynote.git
cd polynote/
sbt dist
cd target/scala-2.11/
docker build -t polynote:dev -f ../../docker/dev/Dockerfile .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's somewhat along the lines of what I was guessing from the comments, but it helps so much to see it fleshed out. The confusion stems from total lack of familiarity with what sbt
is/does. And also maybe the fact that I haven't worked with Scala or Spark, so I've never had to build things outside of the Dockerfile before running docker build
, and I imagine I'm not alone in that respect (perhaps it helps to clarify that my background is in mathematics, I've mostly done ml research but now am in consulting, picked up dev ops literally as a hobby but it's proven useful at the office).
Why does it need to be built outside of docker for dev
but not base
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about changing dev
- since it's never published and meant for local usage it's probably better to make it simple.
sbt
is a build tool popular among Scala developers. It's used to build jars (amongst other things).
docker build
is used to create Docker containers, so it's unrelated to the applications you put inside the container. (Of course, you could run sbt
inside your Dockerfile
if you wanted to, since you can run anything inside your Dockerfile
).
Why does it need to be built outside of docker for dev but not base?
Every time we make a release we have GitHub CI build the Polynote tarballs. This process uses sbt
. When base
is built, it just downloads those tarballs, so there's no need for you to run sbt
yourself.
However, dev
is meant to be used while developing on Polynote, so it can't use those pre-built binaries. You need to build your own binaries (in order for the changes you've been making locally to be reflected). Thus, you need to use sbt dist
to build a local version of the binaries, which then get copied into the dev
docker image that you use locally for testing your changes.
@mathematicalmichael Thanks for this PR! Since I think the main point of this is to be publishing safer images, I think there's no need for the |
I think it would be great to add a link to the Docker README in the main README. We have some warnings and information in the Docker README that I'd prefer to keep around and hope that users will read before copy/pasting, so I'd rather not put those instructions in the main README. We can't force people to read and be careful but at least we can make it that much more likely to happen.
The same reasoning applies here - I don't want to distribute a dockerfile that listens on 0.0.0.0 if I can help it. I think I'd rather attempt to get people to be thoughtful about running Polynote (if they want to do anything useful they need to mount notebooks anyways so hopefully forcing them to set aside directories, configs, etc. isn't too much work). I understand that this adds some friction to the setup process and that probably turns some people off trying it. We can probably revisit these decisions once we have proper authentication and other security measures in place, but for now I'd rather err on the side of caution. Hopefully that makes sense 😄 |
I think that's preferable, but I didn't want to "force" it on you. If you're happy with that direction, I'll go ahead and throw the contents of I may have to come back to this tomorrow btw, so if I go off-grid suddenly , know I'll be back. |
thanks so much for explaining. yes, that makes perfect sense. I'll keep this in mind when writing documentation. regarding lowering friction for those attempting to "try it out"... I'm working on it (unsuccessfully so far): https://github.com/mathematicalmichael/polynote-binder In theory, it should be possible to get this to run on binder. |
@mathematicalmichael No worries 😄 If it's easier for you we can debug together on Gitter as well. |
Much appreciated! In this case I agree with you that non-root should be the default. Thanks! |
tell me if the docs need further work. ditto for Dockerfiles. I tested the |
I'm also testing that the same
|
still TODO:
|
trying to validate the |
docker/README.md
Outdated
> The Polynote installation comes with some example files, which we mount into `/opt/notebooks`, | ||
> allowing you to see the examples and try them out without having to make copies. | ||
> Note that changes to the examples will not be persisted onto the host if you follow the instructions given. | ||
> Only files in the `notebooks` folder visible in the sidebar will persist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you shouldn't see a notebooks
folder if it's configured properly. Everything in the sidebar should be persisted except for changes made in the examples
directory (as seen in the sidebar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! Left over from the documentation changes from earlier. still haven't gone through README again (as per the TODO).
On Mac you should be able to just install with |
I actually tend towards avoiding installing software on my mac, putting as much in docker as possible. This brings up a natural question... Why can't this also take place inside of Docker? EDIT: Turns out nothing's stopping it...
EDIT 2: failed =( ... gonna keep trying. EDIT 3: this seems to
EDIT 4: |
just keeping logs for myself here... the build instructions in the comment above lead to an unzipped file. So, as it stood, the build failed. I just kicked it off again. Since your dockerfile has a |
aw shucks. well, the dev image does start up successfully but I get all sorts of errors when I attempt to connect through my browser. So, I assume that's because of how the package was built. Shall I try the true local approach and to That is to say, I am under the impression that this PR as-is does actually work for the development version. I saw Polynote start up. The problem is probably with how I built it, because I tried it without the I pushed the changes to the I would love to get this built inside of docker, but would need your help trying to find an appropriate linux image in which to build spark (which I'll happily add to the |
going to retry with this docker image: https://medium.com/@yzhong.cs/getting-started-with-docker-scala-sbt-d91f8ac22f5f |
confirmed, works with own build of Spark. |
Looks great, thanks @mathematicalmichael ! |
docker/safe/Dockerfile
that can be used to turn any of the existing images into "safe" ones./polynote/
indocker run
command.$(pwd)/notebooks
by changing howconfig.yml
is mounted.README
Suggestions welcome. I originally thought about adding the contents of
safe
to thebase
image and then tacking onUSER ${NB_USER}
onto the end of each one.