Extends pygeoapi with a manager for kubernetes jobs and a process that executes notebooks via papermill on a cluster.
For each pygeoapi job, a kubernetes job is spawned which runs papermill in a docker image. You can use the default eurodatacube base image or configure your own image.
Jobs can be started with different parameters. Note that the path to the notebook file itself is a parameter, so by default, you can execute any notebook available to the job container.
A helm chart is available at https://github.com/eurodatacube/charts/tree/master/pygeoapi-eoxhub.
You can use the Dockerfile to get started. It's based on geopython/pygeoapi:latest and installs pygeoapi-kubernetes-papermill directly via pip:

```shell
python3 -m pip install git+git://github.com/eurodatacube/pygeoapi-kubernetes-papermill.git
```

Proper packages may be provided in the future.
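As a rough sketch of such a custom image (only the base image and install command above come from this document; the config file name and path are assumptions to adapt to your deployment):

```dockerfile
# Sketch of a custom image based on the upstream pygeoapi image.
FROM geopython/pygeoapi:latest

# Install the plugin straight from git (no release packages yet).
RUN python3 -m pip install git+git://github.com/eurodatacube/pygeoapi-kubernetes-papermill.git

# Assumption: provide your own pygeoapi config (path/name are hypothetical).
COPY local.config.yml /pygeoapi/local.config.yml
```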
Submitting and monitoring jobs

For details on submitting and monitoring jobs, please consult the eurodatacube user documentation.
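The user documentation covers the full workflow; as a non-authoritative sketch, submitting a job amounts to an OGC API - Processes execution request against the configured process. The base URL below is hypothetical, and the process id (`execute-notebook`) and input names (`notebook`, `parameters`) are assumptions taken from the config example later in this document — check your deployment's process description for the actual schema:

```python
import json

# Hypothetical pygeoapi endpoint; adjust to your deployment.
PYGEOAPI_URL = "https://example.com/oapi"

def build_execution_request(notebook_path: str, parameters: str) -> tuple[str, dict]:
    """Build an OGC API - Processes execution request for the notebook process.

    The input names ("notebook", "parameters") are assumptions based on the
    process configuration; consult the process description of your deployment
    for the real input schema.
    """
    url = f"{PYGEOAPI_URL}/processes/execute-notebook/execution"
    payload = {
        "inputs": {
            "notebook": notebook_path,  # path to the notebook inside the job container
            "parameters": parameters,   # papermill parameters for the run
        }
    }
    return url, payload

url, payload = build_execution_request("jobs/analysis.ipynb", "bbox=13,52,14,53")
print(url)
print(json.dumps(payload))
```

Sending the request (e.g. `requests.post(url, json=payload)`) creates a job, which can then be monitored via the pygeoapi jobs endpoint.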
Kubernetes cluster setup
To really make this useful, you will need to think about how to integrate the job workflow into your existing environment.
A common case is to let users edit and debug their notebooks in a kubernetes-hosted JupyterLab and to give the jobs full read and write access to the user home in JupyterLab.
The helm chart used by
eurodatacube provides an example of the required kubernetes configuration. It contains:
- A deployment of pygeoapi, including service and ingress
- Permissions for the deployment to start and list jobs
- A pygeoapi config file
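For the job permissions, a minimal Role might look like the following. This is a sketch based on general Kubernetes RBAC conventions, not copied from the chart — the name is hypothetical, and your deployment may need different verbs:

```yaml
# Sketch: allow the pygeoapi service account to manage jobs in its namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pygeoapi-job-manager
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
```

A matching RoleBinding would then bind this Role to the service account used by the pygeoapi deployment.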
In order to activate
pygeoapi-kubernetes-papermill, you will need to set up the
kubernetes manager and at least one notebook processing job.
A helm-templated complete example can be found here.
The manager has no configuration options:
```yaml
manager:
  name: pygeoapi_kubernetes_papermill.KubernetesManager
```
The process can (and probably should) be configured (see below):
```yaml
execute-notebook:
  type: process
  processor:
    name: pygeoapi_kubernetes_papermill.PapermillNotebookKubernetesProcessor
    default_image: "docker/example-image:1-2-3"
    image_pull_secret: "my-pull-secret"
    s3:
      bucket_name: "my-s3-bucket"
      secret_name: "secret-for-my-s3-bucket"
      s3_url: "https://s3-eu-central-1.amazonaws.com"
    output_directory: "/home/jovyan/my-jobs"
    home_volume_claim_name: "user-foo"
    extra_pvcs:
      - claim_name: more-data
        mount_path: /home/jovyan/more-data
    jupyter_base_url: "https://example.com/jupyter"
    secrets:
      - name: "s3-access-credentials"  # defaults to access via mount
      - name: "db"
        access: "mount"
      - name: "redis"
        access: "env"
    auto_mount_secrets: false
    checkout_git_repo:
      url: https://gitlab.example.com/repo.git
      secret_name: pygeoapi-git-secret
    log_output: false
    node_purpose: ""
    tolerations: []
    job_service_account: ""
    allow_fargate: false
```
- `default_image`: Image to be used to execute the job. It needs to contain papermill. You can use a papermill engine to notify kubernetes about the job progress.
- `image_pull_secret` (optional): Pull secret for the docker image.
- `s3`: Activates an s3fs sidecar container that makes an s3 bucket available in the filesystem of the job, so the job can directly read from and write to it.
- `output_directory`: Output directory for jobs in the docker container.
- `home_volume_claim_name`: Persistent volume claim name of the user home. This volume claim will be made available under /home/jovyan to the running job.
- `extra_pvcs`: List of other volume claims that are made available to the job.
- `jupyter_base_url`: If this is specified, a link to result notebooks in JupyterLab is generated. Note that the user home must be the same in JupyterLab and the job container for this to work.

Note that you can arbitrarily combine `s3`, `home_volume_claim_name` and `extra_pvcs`. This means that it's possible to e.g. only use s3 to fetch notebooks and store results, or to use only other volume claims, or any combination of these.

- `secrets`: List of secrets which will be mounted as a volume under /secret/<secret-name> or made available as environment variables.
- `checkout_git_repo`: Clone a git repo to /home/jovyan/git/algorithm before the job starts. Useful for executing the latest version of notebooks or code from that repository. The secret referenced by `secret_name` must contain the password for the git https checkout.
- `log_output`: Boolean; whether to enable --log-output in papermill.
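For orientation, this is roughly what the papermill invocation inside the job container looks like with log output enabled. The invocation is an illustrative sketch (the notebook paths and parameter are made up); `-p` and `--log-output` are standard papermill CLI flags:

```shell
# Sketch of the papermill run inside the job container (paths are hypothetical).
# -p passes a notebook parameter; --log-output streams cell output to the job log.
papermill /home/jovyan/jobs/analysis.ipynb /home/jovyan/my-jobs/analysis_result.ipynb \
    -p bbox "13,52,14,53" \
    --log-output
```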