Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/docker #651

Merged
merged 8 commits into from
Nov 18, 2019
Merged

Conversation

mathematicalmichael
Copy link
Contributor

  • adds new Dockerfile in docker/safe/Dockerfile that can be used to turn any of the existing images into "safe" ones.
  • addresses bug wherein example notebooks were not being mounted into the containers because of omitted /polynote/ in docker run command.
    • separates example notebooks from saved ones
  • change to avoid double-mounting $(pwd)/notebooks by changing how config.yml is mounted.
  • adds docker build instructions in README

Suggestions welcome. I originally thought about adding the contents of safe to the base image and then tacking on USER ${NB_USER} onto the end of each one.

Happy to make that change if y'all think "non-root" should be the default (I personally do, but that's just my opinion, didn't want to assume).

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 11, 2019

I wanted to add to the main README a section like the following:


How

Check out the installation instructions (link) or try it out using the the docker images (link).


But I avoided it for this PR, hoping to settle on "how to do the docker thing" first to make it as fluid as possible. I would like it if there was a one-liner to copy/paste from the main README instead of linking to the docker/README and following instructions there, but that requires some changes, namely:

  • Could we possibly mount the default config.yml and not make that file-creation step mandatory?

${NB_USER}

# allow user access to the WORKDIR
RUN chown -R ${USER}:${USER} /opt/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be a bit more restrictive by only changing /opt/polynote or /opt/polynote/notebooks/... thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the downsides to allowing the user to use /opt/? What would extra restrictions buy us here, as the permissions are set on the directory inside the container - or am I missing something?

docker/README.md Outdated
- ``-v `pwd`:/opt/config`` This mounts the current directory (the one you just created) into the container at `/opt/config`
- ``-v `pwd`/notebooks:/opt/notebooks`` This mounts the `./notebooks` directory on the host into the container at `/opt/notebooks`, so that any notebooks you create are saved back to the host.
- ``-v `pwd`/config.yml:/opt/config/config.yml`` This mounts the config file in the current directory (the one you just created) into the container at `/opt/config/config.yml`
- ``-v `pwd`/notebooks:/opt/polynote/notebooks/saved`` This mounts the `./notebooks` directory on the host into the container at `/opt/polynote/notebooks/saved`, so that any notebooks you create are saved back to the host, and the example notebooks inside of `/opt/polynote/notebooks/` are still visible to you.
Copy link
Collaborator

@jonathanindig jonathanindig Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mathematicalmichael Are you testing this with the latest version and config as specified in the readme? I believe the instructions in the current master version of the readme are accurate as long as you're using 0.2.13 / latest release. Let me know if that's not the case.

With the storage changes, everything in the root directory is persisted to the host volume aside from changes to the examples which are not mounted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version I was using in my tests was corresponding to 0.2.13/latest (export BASE_IMAGE=polynote/polynote:latest)

I'm sorry, I don't think I follow you on the storage changes remark. The host volume is the folder on my machine with config and notebooks in it, and the root directory is what, exactly? /opt/polynote/? Is there a desire to mount files other than the notebooks made in saved and config.yml?

I also just double-checked my config and it only has the listen lines. I'll go ahead and retry with the newer config file just in case.

Copy link
Contributor Author

@mathematicalmichael mathematicalmichael Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just updated my config.yaml to have the new changes.
(really do consider adding this in by default to the docker folder) EDIT: didn't see comment below

I am unable to see examples

docker run --rm -it -e PYSPARK_ALLOW_INSECURE_GATEWAY=1 -p 127.0.0.1:8192:8192 -p 127.0.0.1:4040-4050:4040-4050 -v `pwd`/config.yml:/opt/config/config.yml -v `pwd`/notebooks:/opt/notebooks polynote-local --config /opt/config/config.yml

(I built polynote-local)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I think I could definitely find a better way to explain this storage stuff. #634 is the PR with a few more details on how it works.

I'm definitely able to see the examples on my end while following the current master instructions. Is it possible that you might need to docker pull polynote/polynote:latest before building polynote-local (your latest might be pointing to an older image)? Can you see the examples if you use, say, polynote/polynote:0.2.13-2.11 along with the new config? If not, it might be a bug :(

In case it helps, here's what my pwd looks like:

├── config.yml
└── notebooks
    └── my\ first\ notebook.ipynb

And heres my run command:

docker run --rm -it -p 127.0.0.1:8192:8192 -p 127.0.0.1:4040-4050:4040-4050 -v `pwd`/config.yml:/opt/config/config.yml -v `pwd`/notebooks:/opt/notebooks polynote/polynote:latest --config /opt/config/config.yml

I'll try to explain the way these storage mounts work a little better. Let's break down the config (leaving aside the docker mounts for now):

storage:
  dir: /opt/notebooks

This part means that /opt/notebooks becomes the "root" directory you see in the UI. Thus, new notebooks that you make at the root of the UI will be written to /opt/notebooks - for example, if you created a notebook called foo.ipynb, it'd show up as just foo.ipynb in the UI and would be backed by /opt/notebooks/foo.ipynb on the filesystem.

  mounts:
    examples:
      dir: examples

Here, the key examples: says to create a new "mount point" called examples, which is mounted relative to the notebook working directory (that is, relative to "root" as discussed in the previous paragraph). So in the UI, this looks like a directory named examples.

The dir: examples part says that this mount point is backed by a directory which is also called examples. Being a relative path, it expands to $POLYNOTE_WORKING_DIR/examples which is /opt/polynote/examples in this case. So if you created a new notebook called examples/foo.ipynb it'd be persisted to /opt/polynote/examples/foo.ipynb.

Another way to think of these mount points is as symlinks. So think of it like this:

storage: dir: /opt/notebooks ==> export $NOTEBOOK_DIR=/opt/notebooks.
mounts: examples: dir: examples ==> ln -s $POLYNOTE_WORKING_DIR/examples $NOTEBOOK_DIR/examples.

And then imagine Polynote starts with cd $NOTEBOOK_DIR.

Ok so let's come back to Docker. Now, if you use Docker to mount /opt/notebooks onto your host, all the notebooks you create in Polynote will be persisted to the host - except for the notebooks in examples.

Does that make sense? Let me know if there's anything that's still unclear - helps me make sure I can explain this clearly! 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's so helpful. I'm going to paraphrase and turn it into documentation as part of this PR. that explanation definitely needs to be in there in some form. I have done all sorts of mounting with docker before but I never quite had a config file so terse.

I have a project that I need to architect today at work but I'll come around to this at the end of the day and clean it up. thank you in advance for your patience, and thank you in retrospect for your responsiveness!

Copy link
Collaborator

@jonathanindig jonathanindig Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad it was helpful, no problem! Thanks as well for sticking with this PR!

One thing I think is important to clarify here is that the Storage Mounts I'm referring to is a Polynote feature and nothing to do with Docker.

It's a bit confusing because Polynote and Docker now both have a concept called "mounts", but they are not related.

The reason we care about this in relation to Docker is because Polynote's Storage Mounts tell Polynote where to look for files, and Docker mounts tell Docker which files to make available from the host, so as a user if I have files on my local machine I want Polynote to look at they need to be Docker-mounted into the container and then Polynote-mounted (using the config) so Polynote can find them.


We tag these images with the name `polynote-local`.

## Base image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to mention the dev dockerfile https://github.com/polynote/polynote/blob/master/docker/dev/Dockerfile as well. It has some instructions inside that I guess we could pull out into the readme if that's preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had a hard time understanding those instructions. How is the dev one built?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry 😞 Let me see whether I can do a better job explaining this too.

The dev docker image is based on a locally-created Polynote distribution (tarball). So, in order to build the dev docker image, you need to first create the Polynote distribution using sbt dist.

After that command completed, the distribution tarball lives in target/scala-2.11. Since docker build will send the entire current directory as build context to the daemon, it's best not to run it in the root directory of the project. So, we want to cd over to that directory.

Finally we want to run the command. So, we run docker build and point to the dev Dockerfile.

Putting it all together:

git clone https://github.com/polynote/polynote.git
cd polynote/
sbt dist
cd target/scala-2.11/
docker build -t polynote:dev -f ../../docker/dev/Dockerfile .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's somewhat along the lines of what I was guessing from the comments, but it helps so much to see it fleshed out. The confusion stems from total lack of familiarity with what sbt is/does. And also maybe the fact that I haven't worked with Scala or Spark, so I've never had to build things outside of the Dockerfile before running docker build, and I imagine I'm not alone in that respect (perhaps it helps to clarify that my background is in mathematics, I've mostly done ml research but now am in consulting, picked up dev ops literally as a hobby but it's proven useful at the office).

Why does it need to be built outside of docker for dev but not base?

Copy link
Collaborator

@jonathanindig jonathanindig Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about changing dev - since it's never published and meant for local usage it's probably better to make it simple.

sbt is a build tool popular among Scala developers. It's used to build jars (amongst other things).

docker build is used to create Docker containers, so it's unrelated to the applications you put inside the container. (Of course, you could run sbt inside your Dockerfile if you wanted to, since you can run anything inside your Dockerfile).

Why does it need to be built outside of docker for dev but not base?

Every time we make a release we have GitHub CI build the Polynote tarballs. This process uses sbt. When base is built, it just downloads those tarballs, so there's no need for you to run sbt yourself.

However, dev is meant to be used while developing on Polynote, so it can't use those pre-built binaries. You need to build your own binaries (in order for the changes you've been making locally to be reflected). Thus, you need to use sbt dist to build a local version of the binaries, which then get copied into the dev docker image that you use locally for testing your changes.

@jonathanindig
Copy link
Collaborator

@mathematicalmichael Thanks for this PR! Since I think the main point of this is to be publishing safer images, I think there's no need for the safe image. Would it work to just add the user stuff into base directly?

@jonathanindig
Copy link
Collaborator

I would like it if there was a one-liner to copy/paste from the main README instead of linking to the docker/README and following instructions there

I think it would be great to add a link to the Docker README in the main README. We have some warnings and information in the Docker README that I'd prefer to keep around and hope that users will read before copy/pasting, so I'd rather not put those instructions in the main README. We can't force people to read and be careful but at least we can make it that much more likely to happen.

Could we possibly mount the default config.yml and not make that file-creation step mandatory?

The same reasoning applies here - I don't want to distribute a dockerfile that listens on 0.0.0.0 if I can help it. I think I'd rather attempt to get people to be thoughtful about running Polynote (if they want to do anything useful they need to mount notebooks anyways so hopefully forcing them to set aside directories, configs, etc. isn't too much work).

I understand that this adds some friction to the setup process and that probably turns some people off trying it. We can probably revisit these decisions once we have proper authentication and other security measures in place, but for now I'd rather err on the side of caution. Hopefully that makes sense 😄

@mathematicalmichael
Copy link
Contributor Author

@mathematicalmichael Thanks for this PR! Since I think the main point of this is to be publishing safer images, I think there's no need for the safe image. Would it work to just add the user stuff into base directly?

I think that's preferable, but I didn't want to "force" it on you. If you're happy with that direction, I'll go ahead and throw the contents of docker/safe/Dockerfile to the bottom of docker/base/Dockerfile and then do my best to make the requisite changes to the other versions.

I may have to come back to this tomorrow btw, so if I go off-grid suddenly , know I'll be back.

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 11, 2019

I would like it if there was a one-liner to copy/paste from the main README instead of linking to the docker/README and following instructions there

I think it would be great to add a link to the Docker README in the main README. We have some warnings and information in the Docker README that I'd prefer to keep around and hope that users will read before copy/pasting, so I'd rather not put those instructions in the main README. We can't force people to read and be careful but at least we can make it that much more likely to happen.

Could we possibly mount the default config.yml and not make that file-creation step mandatory?

The same reasoning applies here - I don't want to distribute a dockerfile that listens on 0.0.0.0 if I can help it. I think I'd rather attempt to get people to be thoughtful about running Polynote (if they want to do anything useful they need to mount notebooks anyways so hopefully forcing them to set aside directories, configs, etc. isn't too much work).

I understand that this adds some friction to the setup process and that probably turns some people off trying it. We can probably revisit these decisions once we have proper authentication and other security measures in place, but for now I'd rather err on the side of caution. Hopefully that makes sense 😄

thanks so much for explaining. yes, that makes perfect sense. I'll keep this in mind when writing documentation.

regarding lowering friction for those attempting to "try it out"... I'm working on it (unsuccessfully so far): https://github.com/mathematicalmichael/polynote-binder

In theory, it should be possible to get this to run on binder.
Does that undermine spark? Definitely somewhat. But from what I can tell, the changes to notebooks in Python alone are worth it (where can I find info on installing new language kernels? it's not clear they're supported but implied with the wording "polyglot"... though I assume it wasn't called "language agnostic" so maybe that was my own presumption).

@jonathanindig
Copy link
Collaborator

I may have to come back to this tomorrow btw, so if I go off-grid suddenly , know I'll be back.

@mathematicalmichael No worries 😄 If it's easier for you we can debug together on Gitter as well.

@jonathanindig
Copy link
Collaborator

I think that's preferable, but I didn't want to "force" it on you.

Much appreciated! In this case I agree with you that non-root should be the default. Thanks!

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 12, 2019

tell me if the docs need further work. ditto for Dockerfiles.

I tested the base build and it works perfectly (tested create/remove files, persistence, install python dependency in notebook and access it), about to try out the other two...

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 12, 2019

I'm also testing that the same run command can be used for all three by wrapping it in a bash script:

export NB_IMAGE=$(docker images | fzf | awk '{ print $1 ":" $2 }')
if [[ -n ${NB_IMAGE} ]]; then
    docker run --rm -it -e PYSPARK_ALLOW_INSECURE_GATEWAY=1 \
    -p 127.0.0.1:8192:8192 -p 127.0.0.1:4040-4050:4040-4050 \
    -v `pwd`/config.yml:/opt/config/config.yml \
    -v `pwd`/notebooks:/opt/notebooks $NB_IMAGE \
    --config /opt/config/config.yml;
else
    echo "nothing selected.";
fi

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 12, 2019

still TODO:

  • validate stylistic choices (in Dockerfile/README)

  • give README another read for clarity

  • Add mention of Docker to main project README

  • validate new spark-2.4 image

  • validate new dev image

  • validate new base image

@mathematicalmichael
Copy link
Contributor Author

trying to validate the dev image... what do I need to be able to run sbt?

docker/README.md Outdated
> The Polynote installation comes with some example files, which we mount into `/opt/notebooks`,
> allowing you to see the examples and try them out without having to make copies.
> Note that changes to the examples will not be persisted onto the host if you follow the instructions given.
> Only files in the `notebooks` folder visible in the sidebar will persist.
Copy link
Collaborator

@jonathanindig jonathanindig Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you shouldn't see a notebooks folder if it's configured properly. Everything in the sidebar should be persisted except for changes made in the examples directory (as seen in the sidebar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! Left over from the documentation changes from earlier. still haven't gone through README again (as per the TODO).

@jonathanindig
Copy link
Collaborator

trying to validate the dev image... what do I need to be able to run sbt?

On Mac you should be able to just install with brew install sbt. Otherwise I'd consult their website

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 13, 2019

trying to validate the dev image... what do I need to be able to run sbt?

On Mac you should be able to just install with brew install sbt. Otherwise I'd consult their website

I actually tend towards avoiding installing software on my mac, putting as much in docker as possible.
sbt required adoptopenjdk and I'm likely to uninstall them as soon as I finish the test.

This brings up a natural question... Why can't this also take place inside of Docker?


EDIT: Turns out nothing's stopping it...

~~docker run --rm -ti -v `pwd`:/workdir/ --workdir /workdir spikerlabs/scala-sbt sbt dist~~

EDIT 2: failed =( ... gonna keep trying.

EDIT 3: this seems to be progressing... work!

docker run --rm -ti -v `pwd`:/workdir --workdir /workdir intenthq/sbt-alpine it:dist

EDIT 4:
Now following the dev instructions as written in the README, fingers crossed.

@mathematicalmichael
Copy link
Contributor Author

mathematicalmichael commented Nov 13, 2019

just keeping logs for myself here...

the build instructions in the comment above lead to an unzipped file. So, as it stood, the build failed. I just kicked it off again. Since your dockerfile has a .gz, I ran gzip polynote-dist.tar and the build has picked up from there.

@mathematicalmichael
Copy link
Contributor Author

aw shucks. well, the dev image does start up successfully but I get all sorts of errors when I attempt to connect through my browser. So, I assume that's because of how the package was built.

Shall I try the true local approach and to sbt dist with brew's install, or are you willing to try out the instructions on my behalf? I'd really like to get it built inside a docker container (and it looks like I was able to), but my lack of Java experience really prevents me from debugging this...

That is to say, I am under the impression that this PR as-is does actually work for the development version. I saw Polynote start up. The problem is probably with how I built it, because I tried it without the safeuser as well and got the exact same errors.

I pushed the changes to the README ... how should I proceed with testing the dev? Go with brew? You try it? Or just trust it since it seems my errors had nothing to do with the new user (aka check off the TODO item for testing dev with no additional work from either of us)?

I would love to get this built inside of docker, but would need your help trying to find an appropriate linux image in which to build spark (which I'll happily add to the docker/README).

@mathematicalmichael
Copy link
Contributor Author

going to retry with this docker image: https://medium.com/@yzhong.cs/getting-started-with-docker-scala-sbt-d91f8ac22f5f

@mathematicalmichael
Copy link
Contributor Author

confirmed, works with own build of Spark.

@jonathanindig
Copy link
Collaborator

Looks great, thanks @mathematicalmichael !

@jonathanindig jonathanindig merged commit 5e4a611 into polynote:master Nov 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants