Merge pull request #44 from getindata/release-0.2.0
Release 0.2.0
em-pe committed Jan 18, 2021
2 parents 04b44a6 + c6f783d commit 0e3e596
Showing 42 changed files with 1,334 additions and 209 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
# Define global code owners
* @empe @szczeles
* @em-pe @szczeles
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -27,7 +27,7 @@ jobs:
- name: Check pre-commit status
run: |
pip install -r requirements-dev.txt
pip install .[tests]
pre-commit run --all-files
- name: Test with tox
3 changes: 3 additions & 0 deletions .gitignore
@@ -104,6 +104,7 @@ target/
# celery beat schedule file
celerybeat-schedule


# SageMath parsed files
*.sage.py

@@ -121,3 +122,5 @@ venv.bak/

# mypy
.mypy_cache/

docs/_build
19 changes: 18 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,21 @@

## [Unreleased]

## [0.2.0] - 2021-01-18

### Added

- Ability to change the effective user id for steps if the ownership of the volume needs it
- Hook that enables TemplatedConfigLoader to support dynamic config files. Any env variable
  named `KEDRO_CONFIG_<NAME>` can be referenced in configuration files as `${name}`
- Added IAP authentication support for MLflow
- Increased test coverage for the CLI
- Creating github actions template with `kedro kubeflow init --with-github-actions`

### Fixed

- Fixed broken `kubeflow init` command (#29)

## [0.1.10] - 2021-01-11

### Added
@@ -29,7 +44,9 @@
- Method to schedule runs for most recent version of given pipeline `kedro kubeflow schedule`
- Shortcut to open UI for pipelines using `kedro kubeflow ui`

[Unreleased]: https://github.com/getindata/kedro-kubeflow/compare/0.1.10...HEAD
[Unreleased]: https://github.com/getindata/kedro-kubeflow/compare/0.2.0...HEAD

[0.2.0]: https://github.com/getindata/kedro-kubeflow/compare/0.1.10...0.2.0

[0.1.10]: https://github.com/getindata/kedro-kubeflow/compare/0.1.9...0.1.10

25 changes: 9 additions & 16 deletions README.md
@@ -8,6 +8,9 @@

[![Maintainability](https://api.codeclimate.com/v1/badges/fff07cbd2e5012a045a3/maintainability)](https://codeclimate.com/github/getindata/kedro-kubeflow/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/fff07cbd2e5012a045a3/test_coverage)](https://codeclimate.com/github/getindata/kedro-kubeflow/test_coverage)
[![Documentation Status](https://readthedocs.org/projects/kedro-kubeflow/badge/?version=latest)](https://kedro-kubeflow.readthedocs.io/en/latest/?badge=latest)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fgetindata%2Fkedro-kubeflow.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fgetindata%2Fkedro-kubeflow?ref=badge_shield)

## About

The main purpose of this plugin is to enable running Kedro pipelines on Kubeflow Pipelines. It supports translation from
@@ -16,8 +19,13 @@ a running kubeflow cluster with some convenient commands.

The plugin can be used together with `kedro-docker` to simplify preparation of docker image for pipeline execution.

## Documentation

For detailed documentation refer to https://kedro-kubeflow.readthedocs.io/

## Usage guide


```
Usage: kedro kubeflow [OPTIONS] COMMAND [ARGS]...
@@ -39,19 +47,4 @@ Usage: kedro kubeflow [OPTIONS] COMMAND [ARGS]...
## Configuration file

`kedro init` generates a configuration file for the plugin, but users may want
to adjust it to the requirements of the environment:

```
host: http://10.43.77.224
run_config:
image: new-kedro-project
experiment_name: New Kedro Project
run_name: New Kedro Project
wait_for_completion: False
volume:
storageclass: # default
size: 1Gi
access_modes: [ReadWriteOnce]
skip_init: False
```
to adjust it to match the run environment requirements: https://kedro-kubeflow.readthedocs.io/en/latest/source/02_installation/02_configuration.html
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
81 changes: 81 additions & 0 deletions docs/conf.py
@@ -0,0 +1,81 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import re

from kedro_kubeflow import version as release

# -- Project information -----------------------------------------------------

project = "Kedro Kubeflow Plugin"
copyright = "2020, GetInData"
author = "GetInData"

# The full version, including alpha/beta/rc tags
version = re.match(r"^([0-9]+\.[0-9]+).*", release).group(1)


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    # "sphinx.ext.autodoc",
    # "sphinx.ext.napoleon",
    # "sphinx_autodoc_typehints",
    # "sphinx.ext.doctest",
    # "sphinx.ext.todo",
    # "sphinx.ext.coverage",
    # "sphinx.ext.mathjax",
    # "sphinx.ext.ifconfig",
    # "sphinx.ext.viewcode",
    # "sphinx.ext.mathjax",
    "recommonmark",
    "sphinx_rtd_theme",
]

# Add any paths that contain templates here, relative to this directory.

autosummary_generate = True
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"

html_theme_options = {
    "collapse_navigation": False,
    "style_external_links": True,
}


# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

language = None

pygments_style = "sphinx"
22 changes: 22 additions & 0 deletions docs/index.rst
@@ -0,0 +1,22 @@
.. Kedro Kubeflow Plugin documentation master file, created by
   sphinx-quickstart on Fri Jan 8 18:01:47 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Kedro Kubeflow Plugin's documentation!
=================================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   Introduction <source/01_introduction/01_intro.md>
   Installation <source/02_installation/index.rst>
   Getting Started <source/03_getting_started/index.rst>

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
3 changes: 3 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,3 @@
sphinx
sphinx_rtd_theme
recommonmark
21 changes: 21 additions & 0 deletions docs/source/01_introduction/01_intro.md
@@ -0,0 +1,21 @@
# Introduction

## What is Kubeflow Pipelines?

[Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/) is a platform for
building and deploying portable, scalable machine learning (ML) workflows based
on Docker containers. It works by defining pipelines with nodes (Kubernetes objects,
like pods or volumes) and edges (dependencies between the nodes, such as passing output
data as input). The pipelines are stored in a versioned database, allowing users
to run a pipeline once or to schedule recurring runs.

## Why to integrate Kedro project with Pipelines?

Kubeflow Pipelines' main strength is portability. Once you define a pipeline,
it can be started on any Kubernetes cluster. The code to execute is stored inside
Docker images that cover not only the source itself, but also all the libraries and
the entire execution environment. Portability is also one of the key Kedro aspects, as
the pipelines must be versionable and packageable. Kedro, together with the
[Kedro-docker](https://github.com/quantumblacklabs/kedro-docker) plugin, does a fantastic
job here, and Kubeflow looks like a natural addition for running the pipelines
on powerful remote Kubernetes clusters.
81 changes: 81 additions & 0 deletions docs/source/02_installation/01_installation.md
@@ -0,0 +1,81 @@
# Installation guide

## Kedro setup

First, you need to install the base Kedro package in a version below 0.17 (``<0.17``)

> Kedro 0.17.0 is supported by kedro-kubeflow, but [not by kedro-mlflow](https://github.com/Galileo-Galilei/kedro-mlflow/issues/144) yet, so the latest version from the 0.16 family is recommended.
```console
$ pip install 'kedro<0.17'
```

## Plugin installation

### Install from PyPI

You can install the ``kedro-kubeflow`` plugin from ``PyPI`` using `pip`:

```console
pip install --upgrade kedro-kubeflow
```

### Install from sources

You may want to install the develop branch, which contains unreleased features:

```console
pip install git+https://github.com/getindata/kedro-kubeflow.git@develop
```

## Available commands

You can check the available commands by going into the project directory and running:

```console
$ kedro kubeflow
Usage: kedro kubeflow [OPTIONS] COMMAND [ARGS]...

Interact with Kubeflow Pipelines

Options:
-e, --env TEXT Environment to use.
-h, --help Show this message and exit.

Commands:
compile Translates Kedro pipeline into YAML file with Kubeflow...
init Initializes configuration for the plugin
list-pipelines List deployed pipeline definitions
run-once Deploy pipeline as a single run within given experiment.
schedule Schedules recurring execution of latest version of the...
ui Open Kubeflow Pipelines UI in new browser tab
upload-pipeline Uploads pipeline to Kubeflow server
```

### `init`

The `init` command takes one argument (the Kubeflow Pipelines root URL) and generates a sample configuration file in `conf/base/kubeflow.yaml`. The YAML file content is described in the [Configuration section](../02_installation/02_configuration.md).
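For example (the URL below is an illustrative placeholder; use your own cluster's Kubeflow Pipelines address):

```console
$ kedro kubeflow init https://kubeflow.example.com/pipelines
```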

### `ui`

The `ui` command opens a web browser pointing to the currently configured Kubeflow Pipelines UI. It's super useful for debugging, especially when working with multiple Kubeflow installations.

### `list-pipelines`

`list-pipelines` queries Kubeflow Pipelines to retrieve all registered pipeline definitions.

### `compile`

`compile` transforms the Kedro pipeline into an Argo workflow (Argo is the engine that powers Kubeflow Pipelines). The resulting `yaml` file can be uploaded to Kubeflow Pipelines via the web UI.

### `upload-pipeline`

`upload-pipeline` compiles the pipeline and uploads it as a new pipeline version. The pipeline name is equal to the project name for simplicity.

### `schedule`

`schedule` creates a recurring run of the previously uploaded pipeline. The cron expression (a required parameter) defines the schedule on which the pipeline should run.

### `run-once`

`run-once` is an all-in-one command that compiles the pipeline and runs it in the Kubeflow environment.
60 changes: 60 additions & 0 deletions docs/source/02_installation/02_configuration.md
@@ -0,0 +1,60 @@
# Configuration

The plugin maintains its configuration in the `conf/base/kubeflow.yaml` file. A sample configuration can be generated using `kedro kubeflow init`:

```yaml
# Base url of the Kubeflow Pipelines, should include the schema (http/https)
host: https://kubeflow.example.com/pipelines

# Configuration used to run the pipeline
run_config:

  # Name of the image to run as the pipeline steps
  image: kubeflow-plugin-demo

  # Pull policy to be used for the steps. Use Always if you push the images
  # on the same tag, or Never if you use only local images
  image_pull_policy: IfNotPresent

  # Name of the kubeflow experiment to be created
  experiment_name: Kubeflow Plugin Demo

  # Name of the run for run-once
  run_name: Kubeflow Plugin Demo Run

  # Flag indicating if the run-once should wait for the pipeline to finish
  wait_for_completion: False

  # Optional volume specification
  volume:

    # Storage class - use null (or no value) to use the default storage
    # class deployed on the Kubernetes cluster
    storageclass: # default

    # The size of the volume that is created. Applicable for some storage
    # classes
    size: 1Gi

    # Access mode of the volume used to exchange data. ReadWriteMany is
    # preferred, but it is not supported on some environments (like GKE)
    # Default value: ReadWriteOnce
    #access_modes: [ReadWriteMany]

    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False

    # Allows specifying the user that executes the pipeline steps within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0
```

## Dynamic configuration support

`kedro-kubeflow` contains a hook that enables [TemplatedConfigLoader](https://kedro.readthedocs.io/en/stable/kedro.config.TemplatedConfigLoader.html).
It allows passing environment variables to configuration files. It reads all environment variables that follow the `KEDRO_CONFIG_<NAME>` pattern, which you
can later inject into a configuration file using the `${name}` syntax.

There are two special variables, `KEDRO_CONFIG_COMMIT_ID` and `KEDRO_CONFIG_BRANCH_NAME`, that support specifying a default when the variable is not set,
e.g. `${commit_id|dirty}`
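As a rough illustration of this templating behaviour, a minimal re-implementation might look as follows. This is a hypothetical sketch only: the actual substitution is performed by Kedro's `TemplatedConfigLoader`, and `resolve_template` is not part of the plugin's API.

```python
import os
import re


def resolve_template(value, env=None):
    """Substitute ${name} and ${name|default} placeholders using
    KEDRO_CONFIG_<NAME> environment variables (illustrative sketch)."""
    env = os.environ if env is None else env

    def replace(match):
        # Split "commit_id|dirty" into the variable name and its default
        name, _, default = match.group(1).partition("|")
        # ${commit_id} looks up KEDRO_CONFIG_COMMIT_ID, falling back
        # to the default after "|" (empty string if none was given)
        return env.get(f"KEDRO_CONFIG_{name.upper()}", default)

    return re.sub(r"\$\{([^}]+)\}", replace, value)


# With the variable unset, the default after "|" is used
print(resolve_template("image: demo:${commit_id|dirty}", env={}))
# With KEDRO_CONFIG_COMMIT_ID set, its value is injected instead
print(resolve_template("image: demo:${commit_id|dirty}",
                       env={"KEDRO_CONFIG_COMMIT_ID": "abc123"}))
```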
