[doc] Basic documentation #1

Closed
hartym opened this Issue Dec 25, 2016 · 4 comments

@hartym
Member

hartym commented Dec 25, 2016

  • Import things from rdc.etl
  • Create a better "Quick start"
  • Rewrite all outdated things.

@hartym hartym added this to the 0.5 milestone Dec 25, 2016

@hartym hartym added the 1 - accepted label Dec 25, 2016

@rsyring

rsyring commented Apr 22, 2017

Please consider adding a section that compares bonobo to PETL.

@hartym

Member

hartym commented Apr 22, 2017

Yes, comparisons to other tools are planned.

Here is the list so far (feel free to complete it):

  • airflow
  • bubbles
  • dataiku
  • dataprep
  • dask
  • hadoop and ecosystem
  • luigi
  • pandas
  • pentaho
  • petl
  • pygrametl
  • pypes
  • pytoolz
  • talend
  • ...

If an expert on any of these tools is available to help me make the comparison as honest as possible, that would be amazing.

@hartym hartym changed the title from Basic documentation to [doc] Basic documentation Apr 23, 2017

@funkyfuture

funkyfuture commented Apr 23, 2017

ciao, bonobo might be something that i need as a pythonic replacement of xslt, thus i consulted the docs to get a grip of it. i didn't find out whether it fits, but i found some questions that would help me to figure it out. maybe that helps you when you update the docs (which i would strongly suggest as the library looks promising, but it's hard to judge if it'd be suited for a task.)

  • what exact facilities are available to control the evaluation logic of a graph?
  • can a graph contain another graph?
  • how would one access contextual data from a transformation?
    • are there parameter injections like pytest's fixtures?
  • are there yet any concepts how to process trees, like xml?
  • how is a plugin distinguished from a python import in a module that contains transformation callables?

on a sidenote, what the heck is marketing-automation? how would that make the world a better place?

@hartym

Member

hartym commented Apr 23, 2017

Hi @funkyfuture

It's not easy to understand what you're looking for. You say "pythonic replacement of xslt", and bonobo can transform XML into something else (or into another XML document). That sounds like what you describe, but I'm not certain about your use case, or whether bonobo would be worth considering for it.

I'll try to answer your questions here, even though this would probably suit a discussion on Slack better than comments on another ticket. I'll consider your questions for a future F.A.Q. section in the docs (along with others, of course).

What exact facilities are available to control the evaluation logic of a graph?
I don't understand this question. Graphs are not "evaluated"; they are a tool to define the flow of data. Nodes in a graph are linked directionally, and when the graph is executed there is a FIFO queue between the output of a node and the input of the next one (those queues are created only by the executor, so executions are isolated from each other). Feel free to rephrase what you meant if I did not answer.
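To make the queue model concrete, here is a stdlib-only sketch of a linear chain executed with one thread per node and a FIFO queue between consecutive nodes. It is not bonobo's executor: `run_chain`, the `END` sentinel and the "one item in, iterable out" node convention are assumptions made for illustration only.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream (illustrative, not bonobo's)


def run_chain(*nodes):
    """Execute a linear chain of nodes, one thread per node.

    Each node is a callable taking one item and returning an iterable of
    outputs. The FIFO queues are created here, per execution, so two runs
    never share state.
    """
    queues = [queue.Queue() for _ in range(len(nodes) + 1)]

    def worker(node, inbox, outbox):
        while True:
            item = inbox.get()
            if item is END:
                outbox.put(END)  # propagate end-of-stream downstream
                return
            for output in node(item):
                outbox.put(output)

    threads = [
        threading.Thread(target=worker, args=(node, queues[i], queues[i + 1]))
        for i, node in enumerate(nodes)
    ]
    for thread in threads:
        thread.start()

    queues[0].put(None)  # a single "begin" token triggers the first node
    queues[0].put(END)
    for thread in threads:
        thread.join()

    # Drain the last queue (unbounded, so joining first cannot deadlock).
    results = []
    while (item := queues[-1].get()) is not END:
        results.append(item)
    return results


def extract(_):
    yield from ["a", "b", "c"]


def upper(row):
    yield row.upper()


print(run_chain(extract, upper))  # ['A', 'B', 'C']
```

Because each node has a single thread and each queue is FIFO, row order is preserved; calling `run_chain` twice builds two independent sets of queues, which mirrors the "executions are isolated" point above.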

Can a graph contain another graph?
There are no tools in bonobo today to insert a graph as a subgraph. It would be great to allow it, but there are a few design questions behind this, like which nodes to use as the input and output of the subgraph. Probably something that will come well after 1.0.

How would one access contextual data from a transformation? / are there parameter injections like pytest's fixtures?
The answer is in your question. There are parameter injections similar to pytest fixtures, and they are the way to access contextual data in a transformation. The API may evolve a bit, though, because it feels a bit hackish as it is: the concept is right, but the exact syntax doesn't give the best experience we could have. To understand how it works today, look at https://github.com/python-bonobo/bonobo/blob/0.2/bonobo/io/csv.py#L63 and its class hierarchy.
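The general idea of name-based injection can be sketched with the standard library's `inspect.signature`. This is not bonobo's mechanism (which, as noted, goes through a class hierarchy); `call_with_injection` and the `services` mapping are hypothetical names used only to show the pytest-fixture-like pattern of resolving parameters by name.

```python
import inspect


def call_with_injection(func, services, *args):
    """Call `func`, injecting any parameter whose name matches a key in
    `services` and filling the remaining parameters, in order, from `args`.

    Hypothetical helper for illustration; not part of bonobo.
    """
    remaining = iter(args)
    kwargs = {}
    for name in inspect.signature(func).parameters:
        if name in services:
            kwargs[name] = services[name]  # injected by name, like a pytest fixture
        else:
            try:
                kwargs[name] = next(remaining)  # regular data argument
            except StopIteration:
                raise TypeError(f"missing data argument for {name!r}") from None
    return func(**kwargs)


def add_prefix(row, config):
    # `config` is contextual data, not part of the row itself
    return config["prefix"] + row


services = {"config": {"prefix": ">> "}}
print(call_with_injection(add_prefix, services, "hello"))  # >> hello
```

Since everything is passed by keyword, the injected parameter can appear before or after the data arguments in the transformation's signature.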

Are there yet any concepts how to process trees, like xml?
There was the "xml mapper" in bonobo's ancestor (rdc.etl), which had some logic describing how to go from an XML "blob" to lines of data (cf https://github.com/hartym/rdc.etl/blob/dev/rdc/etl/transform/map/xml.py). It's not exactly "tree processing", but as an ETL is a line-by-line processor, you need to be able to transform your tree into something flatter, and there may be many different ways to do so: depth first or breadth first, skipping items or not, preprocessing depending on type, etc. It may be better to just write your flattening logic in a function, then process the result with regular tools, since it's not a tree anymore.
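As an example of "write your flattening logic in a function", here is a depth-first flattener built on the standard library's `xml.etree.ElementTree`. The `(path, text)` row shape is an arbitrary choice for this sketch (attributes are ignored); it is not the rdc.etl xml mapper's output format.

```python
import xml.etree.ElementTree as ET


def flatten(element, path=""):
    """Depth-first walk yielding one (path, text) row per element that has text."""
    current = f"{path}/{element.tag}"
    text = (element.text or "").strip()
    if text:
        yield current, text
    for child in element:
        yield from flatten(child, current)


doc = ET.fromstring(
    "<order><id>42</id><items><item>apple</item><item>pear</item></items></order>"
)
rows = list(flatten(doc))
print(rows)
# [('/order/id', '42'), ('/order/items/item', 'apple'), ('/order/items/item', 'pear')]
```

Once the tree is reduced to rows like these, the rest of the pipeline can treat them as ordinary line-by-line data; switching to breadth first or skipping certain tags only changes this one function.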

How is a plugin distinguished from a python import in a module that contains transformation callables?
Transformation callables are just regular callables, and nothing differentiates them from other Python callables. You can even use the same callables both in an imperative context and in a transformation graph, no problem. Plugins in bonobo are a different concept that lets one "enhance" executions in a generic way. For example, the console plugin enhances execution with a nice ANSI output that displays statistics while the execution is running (https://github.com/python-bonobo/bonobo/blob/0.2/bonobo/ext/console/plugin.py). I'd say there's no need to think about this for standard ETL cases; it's more a way to extend the framework itself than something for userland.
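The distinction can be illustrated with a toy runner: the transformation is a plain function that works the same when called directly, while the "plugin" only observes the execution. `run`, `CountingPlugin` and its `on_row` hook are hypothetical names for this sketch; bonobo's real plugin interface is the one in the linked console plugin.

```python
def normalize(name):
    """A plain transformation callable: nothing bonobo-specific about it."""
    yield name.strip().title()


class CountingPlugin:
    """Hypothetical plugin: gathers statistics without touching the data."""

    def __init__(self):
        self.rows = 0

    def on_row(self, row):
        self.rows += 1


def run(transform, rows, plugins=()):
    """Toy runner: applies `transform` to each row and notifies every plugin."""
    out = []
    for row in rows:
        for result in transform(row):
            for plugin in plugins:
                plugin.on_row(result)
            out.append(result)
    return out


# Imperative use: just call it.
print(list(normalize("  ada lovelace ")))  # ['Ada Lovelace']

# Same callable inside an execution, enhanced by a plugin.
stats = CountingPlugin()
print(run(normalize, ["alan turing", " grace hopper"], plugins=[stats]))
# ['Alan Turing', 'Grace Hopper']
print(stats.rows)  # 2
```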

On a sidenote, what the heck is marketing-automation? how would that make the world a better place?
It is tagged as such because I have use cases where I use bonobo for marketing automation. It's probably a derivative usage and not the main point, but such use cases do exist (think IFTTT or Zapier, but programmatic).
Bonobo never promised to "make the world a better place", but I'd say it's a good thing for you if you're wasting time on repetitive marketing tasks and bonobo helps you automate them. My own sidenote: I don't understand why people tend to think marketing is a bad thing.

I hope this answers your questions; if not, let's have a chat on Slack so I can better understand your points.

@hartym hartym added doc and removed 1 - ready labels May 2, 2017

@hartym hartym modified the milestone: 0.4 May 2, 2017

hartym added a commit that referenced this issue May 31, 2017

@hartym hartym closed this Jan 16, 2018

hartym pushed a commit that referenced this issue Aug 11, 2018
