GitHub - simonjayhawkins/kedro: A Python library that implements software engineering best-practice for data and ML pipelines.


What is Kedro?

"The centre of your data pipeline."

Kedro is a development workflow framework that implements software engineering best-practice for data pipelines with an eye towards productionising machine learning models. We provide a standard approach so that you can:

Worry less about how to write production-ready code,
Spend more time building data pipelines that are robust, scalable, deployable, reproducible and versioned,
And, standardise the way that your team collaborates across your project.

How do I install Kedro?

kedro is a Python package. To install it, simply run:

pip install kedro

See more detailed installation instructions, including how to set up Python virtual environments, in our installation guide, and get started with our "Hello World" example.

Why does Kedro exist?

Kedro is built upon our collective best-practice (and mistakes) trying to deliver real-world ML applications that have vast amounts of dirty data. We developed Kedro to achieve the following:

Collaboration on an analytics codebase when different team members have varied exposure to software engineering best-practice
Focussing on maintainable data and ML pipelines as the standard, instead of a singular activity of deploying models in production
A way to inspire the creation of reusable analytics code so that we never start from scratch when working on a new project
Efficient use of time because we're able to quickly move from experimentation into production

Kedro was originally designed by Aris Valtazanos and Nikolaos Tsaousis to solve challenges they faced in their project work. This work was later turned into a product thanks to the following contributors: Ivan Danov, Dmitrii Deriabin, Gordon Wrigley, Yetunde Dada, Nasef Khan, Kiyohito Kunii, Nikolaos Kaltsas, Meisam Emamjome, Peteris Erins, Lorena Balan, Richard Westenra and Anton Kirilenko.

What are the main features of Kedro?

A pipeline visualisation generated using Kedro-Viz

Feature	What is this?
Project Template	A standard, modifiable and easy-to-use project template based on Cookiecutter Data Science.
Data Catalog	A series of lightweight data connectors used for saving and loading data across many different file formats and file systems including local and network file systems, cloud object stores, and HDFS. The Data Catalog also includes data and model versioning for file-based systems. Used with a Python or YAML API.
Pipeline Abstraction	Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using Kedro-Viz.
The Journal	An ability to reproduce pipeline runs with saved pipeline run results.
Coding Standards	Test-driven development using `pytest`, well-documented code using Sphinx, linted code with support for `flake8`, `isort` and `black`, and use of the standard Python logging library.
Flexible Deployment	Deployment strategies that include the use of Docker with Kedro-Docker, conversion of Kedro pipelines into Airflow DAGs with Kedro-Airflow, leveraging a REST API endpoint with Kedro-Server (coming soon) and serving Kedro pipelines as a Python package. Kedro can be deployed locally, on on-premise and cloud (AWS, Azure and Google Cloud Platform) servers, or on clusters (EMR, EC2, Azure HDInsight and Databricks).
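
To give a feel for what "automatic resolution of dependencies between pure Python functions" means, here is a minimal sketch in plain Python. It does not use or reproduce Kedro's API: the node tuples, the run helper and the dataset names (raw, cleaned, total, report) are all hypothetical and chosen only for illustration. Each function declares the named datasets it reads and the one it writes, and the runner works out the execution order from those names.

```python
# Illustrative only: NOT Kedro's implementation or API.
# All names below (clean, total, report, NODES, run) are hypothetical.

def clean(raw):
    """Drop missing values from the raw input."""
    return [x for x in raw if x is not None]

def total(cleaned):
    """Sum the cleaned values."""
    return sum(cleaned)

def report(cleaned, total_value):
    """Summarise the two intermediate datasets."""
    return f"{len(cleaned)} values, total {total_value}"

# Each node is (function, input dataset names, output dataset name).
# The nodes are deliberately listed out of execution order.
NODES = [
    (report, ["cleaned", "total"], "report"),
    (total, ["cleaned"], "total"),
    (clean, ["raw"], "cleaned"),
]

def run(nodes, catalog):
    """Run every node once all of its named inputs are available."""
    data = dict(catalog)      # dataset name -> value
    pending = list(nodes)
    while pending:
        # A node is ready when each of its inputs has been loaded or produced.
        ready = [n for n in pending if all(name in data for name in n[1])]
        if not ready:
            raise ValueError("unresolvable or circular dependencies")
        for func, inputs, output in ready:
            data[output] = func(*(data[name] for name in inputs))
        pending = [n for n in pending if n not in ready]
    return data

result = run(NODES, {"raw": [1, None, 2, 3]})
print(result["report"])
```

Because the order is derived from dataset names rather than from the order in which functions are listed, nodes can be added or rearranged without rewriting any orchestration code; the functions themselves stay pure and easy to test.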

How do I use Kedro?

Our documentation explains:

Best-practice on how to get started using Kedro
A "Hello World" data and ML pipeline example based on the Iris dataset
A two-hour Spaceflights tutorial that teaches you beginner to intermediate functionality
How to use the CLI offered by kedro_cli.py (kedro new, kedro run, ...)
An overview of Kedro architecture
Frequently asked questions (FAQs)

Documentation for the latest stable release can be found here. You can also run kedro docs from your CLI and open the documentation for your current version of Kedro in a browser.

Note: The CLI is a convenient way to run kedro commands, but you can also invoke the Kedro CLI as a Python module with python -m kedro.

Note: Read our FAQs to learn how we differ from workflow managers like Airflow and Luigi.

Can I contribute?

Yes! Want to help build Kedro? Check out our guide to contributing.

Where can I learn more?

There is a growing community around Kedro. Have a look at our FAQs to find projects using Kedro and links to articles, podcasts and talks.

What licence do you use?

Kedro is licensed under the Apache 2.0 License.

We're hiring!

Do you want to be part of the team that builds Kedro and other great products at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Software Engineers who love using data to drive their decisions. Take a look at our open positions and see if you're a fit.
