Support for git backed notebook provider.
local_path
: where the git repo is located on the local drive; no default - requiredremote
: optional, remote address, if specified, the remote will be cloned when the provider is initialized; if not given, a repo will be created at thelocal_path
branch
: optional; defaultmaster
authentication.key_file
: path to the key file; only applies to SSHauthentication.key_file_passphrase
: passphrase for the key file; not required; no defaultauthentication.username
: repository username; HTTP(S) onlyauthentication.password
: repository password
The provider supports HTTP, HTTPS and SSH protocols. The remote URI needs to be in the following format, for SSH:
ssh://user@example.com/repo.git
P.S. For gitub use this: ssh://git@github.com/some-github-organization/some-repo.git
For HTTP / HTTPS:
http(s)://example.com/repo.git
The provider first checks for key_file
, if none specified, it will look for the username
and password
. If none given, the provider assumes no authentication.
If key_file
is specified but the file does not exist, the provider will fail to initialize.
If the key_file
is protected with the passphrase, specify it with key_file_passphrase
property.
If using HTTP(S), specify both, username
and password
in the authentication
section.
If using password
based SSH authentication, specify authentication.password
only.
If a remote
is specified, all changes in the local repository will be pushed with -f
(force) to the remote.
In the configuration of the spark-notebook, set the following properties:
manager {
notebooks.io.provider = "notebook.io.GitNotebookProvider"
...
}
notebook.io.GitNotebookProvider {
// optional: remote = ""
// optional: branch = ""
local_path = ${manager.notebooks.dir}
authentication {
key_file = "${MESOS_SANDBOX}/git.key"
}
}
See conf/application-git-storage.conf for a complete sample config.
Please keep in mind that the Git storage is still experimental, even if we didn't notice any major issues so far. A few things to note:
- you need Java 8 to use Git support
- each auto-save currently creates a separate commit, so you might wish to disable notebook autosaving.
- a single Git repository is used for all the notebooks. It was not yet tested on very large spark-notebook deployments (e.g. more than 10 concurrent users).
- configuring it for the first time might be slightly inconvenient, e.g. if you filled in the wrong
remote_url
or wrong authentication details, you'd need to restart the spark-notebook for the changes to take effect.
The HTTPS and SSH intergration tests are by ignored by default (as GitHub credentials are needed).
To run those tests, open the GitNotebookProviderWithCloneTests.scala
and remove the @Ignore
tags.
Next, see ./test-git-support.bash
script, and add make the corresponding environment variables available (you may store them to ~/.bashrc
too).
For example:
- first generate a GitHub token (go to https://github.com/settings/tokens , click on
Generate new token
button, selectrepo
anduser
scopes) - run the
./test-git-support.bash
- if all tests pass, you should see two new commits matching the current unit test