Merge pull request #44 from getindata/release-0.2.0
Release 0.2.0
em-pe committed Jan 18, 2021
2 parents 04b44a6 + c6f783d commit 0e3e596
Showing 42 changed files with 1,334 additions and 209 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
# Define global code owners
* @empe @szczeles
* @em-pe @szczeles
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -27,7 +27,7 @@ jobs:
- name: Check pre-commit status
run: |
pip install -r requirements-dev.txt
pip install .[tests]
pre-commit run --all-files
- name: Test with tox
3 changes: 3 additions & 0 deletions .gitignore
@@ -104,6 +104,7 @@ target/
# celery beat schedule file
celerybeat-schedule


# SageMath parsed files
*.sage.py

@@ -121,3 +122,5 @@ venv.bak/

# mypy
.mypy_cache/

docs/_build
19 changes: 18 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,21 @@

## [Unreleased]

## [0.2.0] - 2021-01-18

### Added

- Ability to change the effective user id for steps if the ownership of the volume needs it
- Hook that enables TemplatedConfigLoader to support dynamic config files. Any env variable
  named `KEDRO_CONFIG_<NAME>` can be referenced in configuration files as `${name}`
- Added IAP authentication support for MLflow
- Increased test coverage for the CLI
- Creating github actions template with `kedro kubeflow init --with-github-actions`

### Fixed

- Fixed broken `kubeflow init` command (#29)

## [0.1.10] - 2021-01-11

### Added
@@ -29,7 +44,9 @@
- Method to schedule runs for most recent version of given pipeline `kedro kubeflow schedule`
- Shortcut to open UI for pipelines using `kedro kubeflow ui`

[Unreleased]: https://github.com/getindata/kedro-kubeflow/compare/0.1.10...HEAD
[Unreleased]: https://github.com/getindata/kedro-kubeflow/compare/0.2.0...HEAD

[0.2.0]: https://github.com/getindata/kedro-kubeflow/compare/0.1.10...0.2.0

[0.1.10]: https://github.com/getindata/kedro-kubeflow/compare/0.1.9...0.1.10

25 changes: 9 additions & 16 deletions README.md
@@ -8,6 +8,9 @@

[![Maintainability](https://api.codeclimate.com/v1/badges/fff07cbd2e5012a045a3/maintainability)](https://codeclimate.com/github/getindata/kedro-kubeflow/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/fff07cbd2e5012a045a3/test_coverage)](https://codeclimate.com/github/getindata/kedro-kubeflow/test_coverage)
[![Documentation Status](https://readthedocs.org/projects/kedro-kubeflow/badge/?version=latest)](https://kedro-kubeflow.readthedocs.io/en/latest/?badge=latest)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fgetindata%2Fkedro-kubeflow.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fgetindata%2Fkedro-kubeflow?ref=badge_shield)

## About

The main purpose of this plugin is to enable running Kedro pipelines on Kubeflow Pipelines. It supports translation from
@@ -16,8 +19,13 @@ a running kubeflow cluster with some convenient commands.

The plugin can be used together with `kedro-docker` to simplify preparation of docker image for pipeline execution.

## Documentation

For detailed documentation refer to https://kedro-kubeflow.readthedocs.io/

## Usage guide


```
Usage: kedro kubeflow [OPTIONS] COMMAND [ARGS]...
@@ -39,19 +47,4 @@ Usage: kedro kubeflow [OPTIONS] COMMAND [ARGS]...
## Configuration file

`kedro init` generates a configuration file for the plugin, but users may want
to adjust it to the requirements of the environment:

```
host: http://10.43.77.224
run_config:
image: new-kedro-project
experiment_name: New Kedro Project
run_name: New Kedro Project
wait_for_completion: False
volume:
storageclass: # default
size: 1Gi
access_modes: [ReadWriteOnce]
skip_init: False
```
to adjust it to match the run environment requirements: https://kedro-kubeflow.readthedocs.io/en/latest/source/02_installation/02_configuration.html
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
81 changes: 81 additions & 0 deletions docs/conf.py
@@ -0,0 +1,81 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import re

from kedro_kubeflow import version as release

# -- Project information -----------------------------------------------------

project = "Kedro Kubeflow Plugin"
copyright = "2020, GetInData"
author = "GetInData"

# The full version, including alpha/beta/rc tags
version = re.match(r"^([0-9]+\.[0-9]+).*", release).group(1)


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    # "sphinx.ext.autodoc",
    # "sphinx.ext.napoleon",
    # "sphinx_autodoc_typehints",
    # "sphinx.ext.doctest",
    # "sphinx.ext.todo",
    # "sphinx.ext.coverage",
    # "sphinx.ext.mathjax",
    # "sphinx.ext.ifconfig",
    # "sphinx.ext.viewcode",
    # "sphinx.ext.mathjax",
    "recommonmark",
    "sphinx_rtd_theme",
]

# Add any paths that contain templates here, relative to this directory.

autosummary_generate = True
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"

html_theme_options = {
    "collapse_navigation": False,
    "style_external_links": True,
}


# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

language = None

pygments_style = "sphinx"
22 changes: 22 additions & 0 deletions docs/index.rst
@@ -0,0 +1,22 @@
.. Kedro Kubeflow Plugin documentation master file, created by
   sphinx-quickstart on Fri Jan 8 18:01:47 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Kedro Kubeflow Plugin's documentation!
=================================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   Introduction <source/01_introduction/01_intro.md>
   Installation <source/02_installation/index.rst>
   Getting Started <source/03_getting_started/index.rst>

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
3 changes: 3 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,3 @@
sphinx
sphinx_rtd_theme
recommonmark
21 changes: 21 additions & 0 deletions docs/source/01_introduction/01_intro.md
@@ -0,0 +1,21 @@
# Introduction

## What is Kubeflow Pipelines?

[Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/) is a platform for
building and deploying portable, scalable machine learning (ML) workflows based
on Docker containers. It works by defining pipelines with nodes (Kubernetes objects,
like pods or volumes) and edges (dependencies between the nodes, such as passing output
data as input). The pipelines are stored in a versioned database, allowing users
to run a pipeline once or to schedule recurring runs.

## Why to integrate Kedro project with Pipelines?

Kubeflow Pipelines' main strength is portability. Once you define a pipeline,
it can be started on any Kubernetes cluster. The code to execute is stored inside
Docker images that cover not only the source itself, but also all the libraries and
the entire execution environment. Portability is also one of the key Kedro aspects, as
the pipelines must be versionable and packageable. Kedro, together with the
[Kedro-docker](https://github.com/quantumblacklabs/kedro-docker) plugin, does a fantastic
job here, and Kubeflow looks like a natural addition for running the pipelines
on powerful remote Kubernetes clusters.
81 changes: 81 additions & 0 deletions docs/source/02_installation/01_installation.md
@@ -0,0 +1,81 @@
# Installation guide

## Kedro setup

First, you need to install the base Kedro package in a version below 0.17 (``<0.17``)

> Kedro 0.17.0 is supported by kedro-kubeflow, but [not by kedro-mlflow](https://github.com/Galileo-Galilei/kedro-mlflow/issues/144) yet, so the latest version from the 0.16 family is recommended.
```console
$ pip install 'kedro<0.17'
```

## Plugin installation

### Install from PyPI

You can install the ``kedro-kubeflow`` plugin from ``PyPI`` using `pip`:

```console
pip install --upgrade kedro-kubeflow
```

### Install from sources

You may want to install the develop branch, which contains unreleased features:

```console
pip install git+https://github.com/getindata/kedro-kubeflow.git@develop
```

## Available commands

You can check the available commands by going into the project directory and running:

```console
$ kedro kubeflow
Usage: kedro kubeflow [OPTIONS] COMMAND [ARGS]...

Interact with Kubeflow Pipelines

Options:
-e, --env TEXT Environment to use.
-h, --help Show this message and exit.

Commands:
compile Translates Kedro pipeline into YAML file with Kubeflow...
init Initializes configuration for the plugin
list-pipelines List deployed pipeline definitions
run-once Deploy pipeline as a single run within given experiment.
schedule Schedules recurring execution of latest version of the...
ui Open Kubeflow Pipelines UI in new browser tab
upload-pipeline Uploads pipeline to Kubeflow server
```

### `init`

The `init` command takes one argument (the Kubeflow Pipelines root URL) and generates a sample configuration file in `conf/base/kubeflow.yaml`. The YAML file content is described in the [Configuration section](../02_installation/02_configuration.md).
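For example (the URL below is an illustrative placeholder; use your own cluster's Kubeflow Pipelines address):

```console
$ kedro kubeflow init https://kubeflow.example.com/pipelines
```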

### `ui`

The `ui` command opens a web browser pointing to the currently configured Kubeflow Pipelines UI. It's super useful for debugging, especially when working with multiple Kubeflow installations.

### `list-pipelines`

`list-pipelines` queries Kubeflow Pipelines to retrieve all registered pipeline definitions.

### `compile`

`compile` transforms the Kedro pipeline into an Argo workflow (Argo is the engine that powers Kubeflow Pipelines). The resulting `yaml` file can be uploaded to Kubeflow Pipelines via the web UI.

### `upload-pipeline`

`upload-pipeline` compiles the pipeline and uploads it as a new pipeline version. The pipeline name is equal to the project name for simplicity.

### `schedule`

`schedule` creates a recurring run of the previously uploaded pipeline. The cron expression (a required parameter) defines the schedule on which the pipeline should run.

### `run-once`

`run-once` is an all-in-one command that compiles the pipeline and runs it in the Kubeflow environment.
60 changes: 60 additions & 0 deletions docs/source/02_installation/02_configuration.md
@@ -0,0 +1,60 @@
# Configuration

The plugin maintains its configuration in the `conf/base/kubeflow.yaml` file. A sample configuration can be generated using `kedro kubeflow init`:

```yaml
# Base url of the Kubeflow Pipelines, should include the schema (http/https)
host: https://kubeflow.example.com/pipelines

# Configuration used to run the pipeline
run_config:

  # Name of the image to run as the pipeline steps
  image: kubeflow-plugin-demo

  # Pull policy to be used for the steps. Use Always if you push the images
  # on the same tag, or Never if you use only local images
  image_pull_policy: IfNotPresent

  # Name of the kubeflow experiment to be created
  experiment_name: Kubeflow Plugin Demo

  # Name of the run for run-once
  run_name: Kubeflow Plugin Demo Run

  # Flag indicating if the run-once should wait for the pipeline to finish
  wait_for_completion: False

  # Optional volume specification
  volume:

    # Storage class - use null (or no value) to use the default storage
    # class deployed on the Kubernetes cluster
    storageclass: # default

    # The size of the volume that is created. Applicable for some storage
    # classes
    size: 1Gi

    # Access mode of the volume used to exchange data. ReadWriteMany is
    # preferred, but it is not supported on some environments (like GKE)
    # Default value: ReadWriteOnce
    #access_modes: [ReadWriteMany]

    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False

    # Allows specifying the user that executes the pipeline steps within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0
```

## Dynamic configuration support

`kedro-kubeflow` contains a hook that enables [TemplatedConfigLoader](https://kedro.readthedocs.io/en/stable/kedro.config.TemplatedConfigLoader.html).
It allows passing environment variables to configuration files. It reads all environment variables that follow the `KEDRO_CONFIG_<NAME>` pattern, which you
can later inject into a configuration file using the `${name}` syntax.

There are two special variables, `KEDRO_CONFIG_COMMIT_ID` and `KEDRO_CONFIG_BRANCH_NAME`, that support specifying a default when the variable is not set,
e.g. `${commit_id|dirty}`
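As a rough illustration of this templating behaviour, a minimal re-implementation might look as follows. This is a hypothetical sketch only: the actual substitution is performed by Kedro's `TemplatedConfigLoader`, and `resolve_template` is not part of the plugin's API.

```python
import os
import re


def resolve_template(value, env=None):
    """Substitute ${name} and ${name|default} placeholders using
    KEDRO_CONFIG_<NAME> environment variables (illustrative sketch)."""
    env = os.environ if env is None else env

    def replace(match):
        # Split "commit_id|dirty" into the variable name and its default
        name, _, default = match.group(1).partition("|")
        # ${commit_id} looks up KEDRO_CONFIG_COMMIT_ID, falling back
        # to the default after "|" (empty string if none was given)
        return env.get(f"KEDRO_CONFIG_{name.upper()}", default)

    return re.sub(r"\$\{([^}]+)\}", replace, value)


# With the variable unset, the default after "|" is used
print(resolve_template("image: demo:${commit_id|dirty}", env={}))
# With KEDRO_CONFIG_COMMIT_ID set, its value is injected instead
print(resolve_template("image: demo:${commit_id|dirty}",
                       env={"KEDRO_CONFIG_COMMIT_ID": "abc123"}))
```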
