EuroPython 2018 Sprint on Bonobo

Romain Dorgueil edited this page Oct 27, 2018 · 14 revisions
Status: Sprint In Progress!
Date: 28/29 July 2018
Location: EICC Edinburgh / #sprint on slack

Whatever your background and experience, you are very welcome. If you want to discover the library, share your opinion on how it could be useful in your own line of work, help improve the documentation with a fresh pair of eyes, dive into the code and propose patches, help refine how we present the toolkit, ... or anything else that helps make the tool better known and easier to use, feel free to join us!

I (Romain) will be available for introduction, help, routing, assistance, ... Ask me anything!

Before you start

  • Say hi, introduce yourself.
  • Join #sprint on Slack (https://bonobo-slack.herokuapp.com/).
  • Add your name to the bottom of the page.
  • Install the develop version (TODO add direct link)

Resources for the sprint

Bleeding edge install

To install the bleeding edge "develop" version, you can run:

$ pip install -U -e git+https://github.com/python-bonobo/bonobo@develop#egg=bonobo

To work on the source code, it's better to clone the repository and use an editable package:

$ git clone git@github.com:python-bonobo/bonobo.git --branch develop
$ pip install --editable bonobo

If you don't have GitHub credentials set up, you can use HTTPS:

$ git clone https://github.com/python-bonobo/bonobo.git --branch develop
$ pip install --editable bonobo

What can you work on

First, anybody at any level is welcome. If you think you do not have enough experience, that's just plain wrong. Beginners and very experienced Python enthusiasts are equally welcome, and of course so is anybody in between or outside of this scope.

First steps

Learning bonobo should take much less than 1 hour (and I'll help!). It's a very good opportunity to contribute to the first steps documentation. There must be typos, minor errors, maybe completely erroneous things (I hope not!). Let's go through the tutorials together and fix whatever is unclear or wrong. Also, if you think there are missing parts, feel free to add them!

Tutorial: http://docs.bonobo-project.org/en/latest/tutorial/index.html

Using the library for public or private use cases

The best way to learn, after the first steps, is to actually try things out and implement something. There is a lot to learn from how different people use the library, so please ask for assistance on anything: it will help build the best ETL for modern Python!

Features and bugs

There are a lot of open features and bugs. Some of them are explicitly flagged as "easy" (GitHub label) to make good candidates easier to spot. Feature code and bug fixes should include documentation and tests.

Contribution guide

http://docs.bonobo-project.org/en/latest/contribute/index.html

Bugs

Arrrglll... We hate them!

https://github.com/python-bonobo/bonobo/issues?q=is%3Aopen+is%3Aissue+label%3Abug

Epics

These are ideas you can pick from to work on something. It is not a definitive or exhaustive list, so feel free to propose something else. If you start working on something, announce it to the other participants by adding your handle on this page next to the topic, and in Slack. This lets people interested in the same topic work together, and makes it easier to get help.

Documentation

  • Better onboarding.
  • Better API reference.
  • Document the internal mechanisms (bags, queues, strategies ...)
  • Better docs about sqlalchemy.

Current issues / bugs

Next: Better errors

  • A lot of errors are a bit cryptic for now, especially the ones related to input format and to calls that do not match a function's prototype.
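
As a starting point for discussion, one way to make prototype mismatches less cryptic is to check the arguments against the callable's signature before calling it. This is only a sketch using the standard library: `call_node` and `NodeCallError` are hypothetical names, not part of bonobo.

```python
import inspect


class NodeCallError(TypeError):
    """Hypothetical error type carrying a readable explanation."""


def call_node(func, *args, **kwargs):
    """Call ``func``, raising a descriptive error if the arguments do not fit."""
    signature = inspect.signature(func)
    try:
        # bind() raises TypeError when the arguments do not match the prototype.
        signature.bind(*args, **kwargs)
    except TypeError as exc:
        raise NodeCallError(
            "{name}{sig} cannot accept input {args!r}: {reason}".format(
                name=func.__name__, sig=signature, args=args, reason=exc
            )
        ) from exc
    return func(*args, **kwargs)


def add(a, b):
    return a + b


result = call_node(add, 1, 2)

try:
    call_node(add, 1, 2, 3)
except NodeCallError as exc:
    # The message names the function, its prototype and the offending input.
    message = str(exc)
```

The point is that the message mentions both the expected prototype and the actual input, which is what users are missing today.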

Next: Better separation of concerns

  • We need to have the Queue implementation chosen by the Execution Strategy, to allow more execution strategies (like asyncio, subprocess, dask ...)
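
To make the idea concrete, here is a minimal sketch of a strategy that owns the choice of queue. All class and function names (`Strategy`, `create_queue`, `build_input_queues`, ...) are assumptions made up for illustration, and they do not reflect how bonobo is currently structured.

```python
import asyncio
import queue


class Strategy:
    """Hypothetical base class: each strategy decides which queue it needs."""

    def create_queue(self, maxsize=0):
        raise NotImplementedError


class ThreadStrategy(Strategy):
    def create_queue(self, maxsize=0):
        # Thread-safe queue from the standard library.
        return queue.Queue(maxsize)


class AsyncioStrategy(Strategy):
    def create_queue(self, maxsize=0):
        # Coroutine-friendly queue; not thread-safe.
        return asyncio.Queue(maxsize)


def build_input_queues(strategy, node_names):
    """Create one input queue per node, delegating the choice to the strategy."""
    return {name: strategy.create_queue() for name in node_names}


thread_queues = build_input_queues(ThreadStrategy(), ["extract", "load"])
async_queues = build_input_queues(AsyncioStrategy(), ["extract", "load"])
```

With this kind of split, the code that wires nodes together never names a concrete queue class, so adding a subprocess or dask strategy would only mean adding another `create_queue` implementation.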

Next: coercion and casts

  • We need to rework how bonobo decides whether an output is compatible with the next input. Type equality is too strict and causes a lot of trouble for users.
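
One possible direction, sketched below with plain Python types, is to accept subclasses and fall back to an explicit cast. `is_compatible` and `coerce` are hypothetical helpers written for this page, not existing bonobo functions, and the rule shown is only one of several that could be discussed.

```python
def is_compatible(output_type, input_type):
    """Return True if a value of ``output_type`` may feed ``input_type``."""
    # Strict equality would be: output_type is input_type
    # A looser rule: accept any subclass of the expected type.
    return issubclass(output_type, input_type)


def coerce(value, input_type):
    """Pass compatible values through, otherwise try an explicit cast."""
    if is_compatible(type(value), input_type):
        return value
    try:
        return input_type(value)
    except (TypeError, ValueError) as exc:
        raise TypeError(
            "cannot coerce {!r} to {}".format(value, input_type.__name__)
        ) from exc


class Row(tuple):
    """Example subclass that strict type equality would reject."""


row = Row((1, 2))
same_row = coerce(row, tuple)       # subclass: passed through untouched
as_tuple = coerce([1, 2], tuple)    # different type: cast to tuple
```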

Next: less decorators and better config inheritance

  • Which decorators can we remove? Especially use_context and use_raw_input.
  • If we can't remove them, how can we make them better? We should not need to add them again in subclasses (or should we?). Decorators should work the same for CBT (class based transformations) and FBT (function based transformations).
  • Can we optionally use type annotations for this purpose? Maybe "def xyz(context: NodeExecutionContext)" should be enough to ask for the context as a dependency?
  • Can we optionally use type annotations for service injection? Maybe "def xyz(..., foo: ServiceReference('some_foo'))" can be more explicit?
  • Rework the Configurable inheritance mechanism (defaults? delete one?)
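
To help the discussion about annotations, here is a rough sketch of how dependencies could be resolved from a function's signature. The classes below are stand-ins defined locally so the example runs on its own; `resolve_dependencies` and this `ServiceReference` are hypothetical and do not describe an existing bonobo API.

```python
import inspect


class NodeExecutionContext:
    """Stand-in for the real class, only here to keep the sketch self-contained."""


class ServiceReference:
    """Hypothetical annotation marker naming a service to inject."""

    def __init__(self, name):
        self.name = name


def resolve_dependencies(func, context, services):
    """Build keyword arguments for ``func`` from its parameter annotations."""
    injected = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is NodeExecutionContext:
            # The parameter asks for the execution context.
            injected[name] = context
        elif isinstance(annotation, ServiceReference):
            # The parameter asks for a named service.
            injected[name] = services[annotation.name]
    return injected


def xyz(row, context: NodeExecutionContext, foo: ServiceReference("some_foo")):
    return row, context, foo


context = NodeExecutionContext()
kwargs = resolve_dependencies(xyz, context, {"some_foo": "a service"})
result = xyz("data", **kwargs)
```

Because the information lives in the signature, a subclass or a wrapped function would not need the decorator to be applied again, which is one of the questions raised above.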

Next: basic tools (aggregates, etc.)

Next: new formats (geo, xml, ...)

Next: subprocess strategy

Next: dask strategy

DevXP: status tree in console

DevXP: jupyterlab

DevXP: standard options

name, filter, fields

DevXP: profiling

Next: Wrapping node or graphs

Participants

  • @oagbaneje
  • @maklori
  • (your handle here)