DOC: Document core parts of ibis #1351

Closed
wants to merge 1 commit into master from cpcloud:more-docs

4 participants
@cpcloud
Member

cpcloud commented Feb 10, 2018

No description provided.

Expressions are the main user-facing component of ibis. The base class of all
expressions in ibis is the :class:`~ibis.expr.types.Expr` class.
Expressions provide the user-facing API, defined in ``ibis/expr/api.py``.

@jreback

jreback Feb 10, 2018

Contributor

you could actually put in the links to code

@cpcloud

cpcloud Feb 11, 2018

Member

yep, will do

@cpcloud cpcloud added this to the 0.13 milestone Feb 10, 2018

@cpcloud cpcloud added this to To do in Documentation via automation Feb 10, 2018

@cpcloud

Member

cpcloud commented Feb 11, 2018

@kszucs @wesm @jreback please review when you get a chance

@kszucs

kszucs approved these changes Feb 11, 2018

Cool! If I recall correctly, I've tried to click on Ibis design internals three times :)

- [SQLite](http://sqlite.org/)
- [Pandas DataFrames](https://pandas.pydata.org/) (Experimental)
- [SQLite](https://www.sqlite.org/)
- [Pandas](https://pandas.pydata.org/) [DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) (Experimental)

@jreback

jreback Feb 11, 2018

Contributor

are you marking this experimental?

@cpcloud

cpcloud Feb 11, 2018

Member

Yeah, it hasn't been around that long.

More to come here.
Primary Goals
*************

@jreback

jreback Feb 11, 2018

Contributor

I think this should be '----'; otherwise you have too many levels of nesting (compared to other sections)

@cpcloud

cpcloud Feb 11, 2018

Member

Yep, done

#. Composability
#. Familiarity
Flow of execution

@jreback

jreback Feb 11, 2018

Contributor

maybe we should capitalize all words in section titles (or not); just do this consistently

(EXAMPLE)
#. Some optimizations happen at compile time? (EXAMPLE)
#. Expressions are compiled
#. The SQL string generated by the compiler is sent to the database and

@jreback

jreback Feb 11, 2018

Contributor

this is true for SQL but not for other backends, where execution can happen locally or remotely (e.g. pandas / spark / file-based)

@cpcloud

cpcloud Feb 11, 2018

Member

Added a note

#. The database returns some data that is then turned into a pandas DataFrame
by ibis
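The flow of execution above can be sketched end to end with a toy compiler and the standard-library `sqlite3` module. Everything here is hypothetical: `Table`, `Filter`, `compile_expr` and `execute` are illustration-only names, not ibis classes, and a dict of columns stands in for the pandas DataFrame that ibis would build.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical toy expression nodes; these are not ibis classes.
@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Filter:
    table: Table
    column: str
    threshold: int


def compile_expr(expr):
    """Compile the expression tree into a SQL string."""
    if isinstance(expr, Table):
        return f"SELECT * FROM {expr.name}"
    if isinstance(expr, Filter):
        return f"{compile_expr(expr.table)} WHERE {expr.column} > {expr.threshold}"
    raise TypeError(f"cannot compile {type(expr).__name__}")


def execute(expr, con):
    """Send the compiled SQL to the database and collect the rows it returns."""
    cursor = con.execute(compile_expr(expr))
    names = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    # ibis would turn the result into a pandas DataFrame; a dict of columns stands in here.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (5,), (10,)])
result = execute(Filter(Table("t"), "x", 3), con)
```

For a non-SQL backend such as pandas, the compile step is skipped and `execute` would evaluate the tree locally instead.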

@jreback

jreback Feb 11, 2018

Contributor

you might want to add reference tags to each subsection

expressions.
The compiler works by translating the different pieces of a SQL expression into a
string or SQLAlchemy expression.
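Translating "the different pieces" of an expression into a string amounts to a recursive walk in which each node type has its own rule. A minimal sketch, assuming hypothetical `Column`, `Literal` and `BinaryOp` node types; it only produces plain strings and does not attempt the SQLAlchemy path.

```python
from dataclasses import dataclass


# Hypothetical node types for the pieces of a scalar SQL expression.
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class BinaryOp:
    op: str  # SQL operator text, e.g. "+" or ">"
    left: object
    right: object


def translate(node):
    """Recursively translate each piece of the expression into SQL text."""
    if isinstance(node, Column):
        return node.name
    if isinstance(node, Literal):
        if isinstance(node.value, str):
            # Escape embedded single quotes by doubling them, as standard SQL does.
            escaped = node.value.replace("'", "''")
            return f"'{escaped}'"
        return str(node.value)
    if isinstance(node, BinaryOp):
        return f"({translate(node.left)} {node.op} {translate(node.right)})"
    raise NotImplementedError(f"no translation rule for {type(node).__name__}")


predicate = BinaryOp(">", BinaryOp("+", Column("a"), Literal(1)), Literal(10))
sql = translate(predicate)
print(sql)  # ((a + 1) > 10)
```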

@jreback

jreback Feb 11, 2018

Contributor

ref to SQLAlchemy docs

:class:`~ibis.impala.compiler.ImpalaExprTranslator` is one of the subclasses
that will perform this translation.
Execution

@jreback

jreback Feb 11, 2018

Contributor

maybe have another section on 'local execution' (pandas / spark / file based)

Adding a new operation (``Node`` subclass)
------------------------------------------
Let's go through adding a `sha1`_ method to ibis, implemented in the BigQuery

@jreback

jreback Feb 11, 2018

Contributor

ref to BQ (maybe doc-section in ibis)

@cpcloud

cpcloud Feb 11, 2018

Member

I'll link to the SHA-1 Wikipedia article here: https://en.wikipedia.org/wiki/SHA-1

function in BigQuery takes a string or bytes and returns a bytestring of length
20.
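BigQuery's `SHA1` only runs inside BigQuery, but Python's standard `hashlib` module shows the same contract: a SHA-1 digest is always 20 bytes. The helper below is a hypothetical sketch of what a local implementation could look like; hashing `str` input via its UTF-8 encoding is an assumption.

```python
import hashlib


def sha1_bytes(value):
    """Return the 20-byte SHA-1 digest of a str or bytes value (hypothetical helper)."""
    if isinstance(value, str):
        # Assumption: strings are hashed via their UTF-8 encoding.
        value = value.encode("utf-8")
    return hashlib.sha1(value).digest()


digest = sha1_bytes("abc")
print(len(digest))   # 20
print(digest.hex())  # a9993e364706816aba3e25717850c26c9cd0d89d
```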

.. code-block:: python

@jreback

jreback Feb 11, 2018

Contributor

you could do these as IPython blocks (I think), especially if you want to use them later

@cpcloud

cpcloud Feb 11, 2018

Member

I'm actually going to move this into a notebook

execute the arguments of the current node
execute the current node with its executed arguments
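The two-step rule above is a post-order traversal: execute the children first, then the node itself. A minimal sketch, assuming hypothetical `Literal`, `Add` and `Multiply` nodes and using `functools.singledispatch` to select an execution rule per node type; that dispatch mechanism is a stand-in chosen for illustration, not necessarily what the pandas backend uses.

```python
import dataclasses
import functools
from dataclasses import dataclass


# Hypothetical operation nodes; these are not ibis classes.
@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Multiply:
    left: object
    right: object


@functools.singledispatch
def execute_node(node, *args):
    """Compute one node from its already-executed arguments."""
    raise NotImplementedError(f"no execution rule for {type(node).__name__}")


@execute_node.register(Add)
def _(node, left, right):
    return left + right


@execute_node.register(Multiply)
def _(node, left, right):
    return left * right


def execute(node):
    if isinstance(node, Literal):
        return node.value
    # Execute the arguments of the current node first...
    args = [execute(getattr(node, field.name)) for field in dataclasses.fields(node)]
    # ...then execute the current node with its executed arguments.
    return execute_node(node, *args)


expr = Multiply(Add(Literal(2), Literal(3)), Literal(4))
print(execute(expr))  # 20
```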

@jreback

jreback Feb 11, 2018

Contributor

I think this rather belongs in the docs proper

@cpcloud

cpcloud Feb 11, 2018

Member

So, I put this here to start a trend of adding module-level docs for each backend, rather than having one rst file per backend; module-level docs tend to get out of date less often than a separate rst file.

@cpcloud cpcloud force-pushed the cpcloud:more-docs branch from c5307e7 to ce89efc Feb 11, 2018

@wesm

Member

wesm commented Feb 12, 2018

Will endeavor to review tomorrow

@cpcloud cpcloud force-pushed the cpcloud:more-docs branch 2 times, most recently from e406325 to ae5d981 Feb 12, 2018

@wesm

Minor comments, but looks like a great start!

#. Type safety
#. Expressiveness
#. Composability
#. Familiarty

@wesm

wesm Feb 12, 2018

Member

typo

@cpcloud

cpcloud Feb 13, 2018

Member

thank you

#. Expressions are type checked as you create them
#. Some optimizations are applied to expressions as the user builds them
(EXAMPLE)
#. Some optimizations happen at compile time? (EXAMPLE)
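"Type checked as you create them" means an invalid expression fails at construction, before anything is compiled or sent to a database. A minimal sketch with hypothetical `Column` and `Add` nodes and hypothetical type names; the real ibis type rules are richer than this.

```python
from dataclasses import dataclass

NUMERIC_TYPES = {"int64", "float64"}  # hypothetical type names


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str


@dataclass(frozen=True)
class Add:
    left: Column
    right: Column

    def __post_init__(self):
        # Type check when the node is built, long before compilation or execution.
        types = {self.left.dtype, self.right.dtype}
        if types <= NUMERIC_TYPES or types == {"string"}:
            return
        raise TypeError(f"cannot add {self.left.dtype} and {self.right.dtype}")


ok = Add(Column("a", "int64"), Column("b", "float64"))  # builds fine
try:
    Add(Column("a", "int64"), Column("s", "string"))
except TypeError as exc:
    print(exc)  # cannot add int64 and string
```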

@wesm

wesm Feb 12, 2018

Member

Backend-specific rewrites

#. The SQL string generated by the compiler is sent to the database and
executed (this step is skipped for the pandas backend)
#. The database returns some data that is then turned into a pandas DataFrame
by ibis

@wesm

wesm Feb 12, 2018

Member

We should decide on Ibis vs ibis in prose and be consistent (pandas is of course styled as lowercase)

@cpcloud

cpcloud Feb 13, 2018

Member

I'll arbitrarily decide to follow the pandas convention.

Here's an example of each type of expression:

.. code-block:: python

@wesm

wesm Feb 12, 2018

Member

Could do IPython directive (with graphviz disabled)

@cpcloud

cpcloud Feb 13, 2018

Member

This won't show them anyway, since I'm not showing the repr of the expression, just its type.

Separating the :class:`~ibis.expr.types.Node` and
:class:`~ibis.expr.types.Expr` classes also ties the API to the physical type
of an expression rather than to the particular operation that produced it, so
the API can be defined in terms of types instead of individual operations.
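A minimal sketch of that separation, using hypothetical classes loosely modeled on the description above rather than the real ibis ones: operation nodes only record their inputs, while the user-facing methods live on type classes. `StringLiteral` and `Upper` are different operations, yet both are wrapped in a `StringValue`, so both get the same string API.

```python
from dataclasses import dataclass


class StringValue:
    """User-facing expression type: the string API lives here, not on the operations."""

    def __init__(self, op):
        self.op = op  # the underlying operation node

    def upper(self):
        return StringValue(Upper(self))

    def length(self):
        return IntegerValue(Length(self))


class IntegerValue:
    def __init__(self, op):
        self.op = op


# Hypothetical operation nodes: each one only records its inputs.
@dataclass(frozen=True)
class StringLiteral:
    value: str


@dataclass(frozen=True)
class Upper:
    arg: StringValue


@dataclass(frozen=True)
class Length:
    arg: StringValue


name = StringValue(StringLiteral("ibis"))
shout = name.upper()  # a different operation, but the same StringValue API
n = shout.length()    # an integer-typed result has no string methods
```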

@wesm

wesm Feb 12, 2018

Member

Another key point here is that the operator output type will often depend on the input type(s). So the "user API" for the result of an operation is a strongly-typed expression having only the behavior of the actual output type of the operator.

  • Add strings -> get strings, do things with strings
  • Add numbers -> get numbers, do things with numbers
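The point about output types can be sketched by letting each operation compute its output type from its inputs and wrap itself in the matching expression class. All names here (`Expr`, `NumericValue`, `StringValue`, `Node`, `output_type`, `to_expr`) are hypothetical and chosen for illustration; this is not the ibis API.

```python
from dataclasses import dataclass


class Expr:
    def __init__(self, op):
        self.op = op

    def __add__(self, other):
        return Add(self, other).to_expr()


class NumericValue(Expr):
    def abs(self):
        return Abs(self).to_expr()


class StringValue(Expr):
    def upper(self):
        return Upper(self).to_expr()


class Node:
    def output_type(self):
        raise NotImplementedError

    def to_expr(self):
        # Wrap the node in the expression class its output type calls for.
        return self.output_type()(self)


@dataclass(frozen=True)
class Literal(Node):
    value: object

    def output_type(self):
        return StringValue if isinstance(self.value, str) else NumericValue


@dataclass(frozen=True)
class Add(Node):
    left: Expr
    right: Expr

    def output_type(self):
        # The output type depends on the input types.
        if isinstance(self.left, StringValue) and isinstance(self.right, StringValue):
            return StringValue
        if isinstance(self.left, NumericValue) and isinstance(self.right, NumericValue):
            return NumericValue
        raise TypeError("Add needs two strings or two numbers")


@dataclass(frozen=True)
class Abs(Node):
    arg: Expr

    def output_type(self):
        return NumericValue


@dataclass(frozen=True)
class Upper(Node):
    arg: Expr

    def output_type(self):
        return StringValue


s = Literal("a").to_expr() + Literal("b").to_expr()  # StringValue: has .upper()
x = Literal(1).to_expr() + Literal(2).to_expr()      # NumericValue: has .abs()
```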

@cpcloud

cpcloud Feb 13, 2018

Member

Cool, added a blurb about this.

the instance of the :class:`~ibis.sql.compiler.ExprTranslator` subclass
specific to the backend being compiled. For example, the
:class:`~ibis.impala.compiler.ImpalaExprTranslator` is one of the subclasses
that will perform this translation.
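The per-backend split can be sketched as a base translator holding shared rules, with each backend subclass adding or overriding rules for its own dialect. The class names, the `rules` dictionary and the `translate` signature below are hypothetical; they are not the actual `ExprTranslator` API.

```python
class ExprTranslator:
    """Hypothetical base translator: one formatting rule per operation name."""

    rules = {
        "add": lambda left, right: f"{left} + {right}",
        "upper": lambda arg: f"UPPER({arg})",
    }

    def translate(self, op_name, *args):
        try:
            rule = self.rules[op_name]
        except KeyError:
            raise NotImplementedError(
                f"{type(self).__name__} cannot translate {op_name!r}"
            ) from None
        return rule(*args)


class BigQueryLikeTranslator(ExprTranslator):
    """Hypothetical backend subclass: inherits the shared rules and adds its own."""

    rules = {**ExprTranslator.rules, "sha1": lambda arg: f"SHA1({arg})"}


base = ExprTranslator()
bq = BigQueryLikeTranslator()
print(bq.translate("sha1", "name"))   # SHA1(name)
print(bq.translate("add", "a", "1"))  # a + 1
```

Keeping the rules in a class-level mapping means a backend only has to spell out the operations whose SQL differs from the shared defaults.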

@wesm

wesm Feb 12, 2018

Member

May be worth noting explicitly that SQL is only one target, however the library was designed with first-class SQL support in mind, being the lingua franca of analytics.

@cpcloud cpcloud force-pushed the cpcloud:more-docs branch from ae5d981 to c64b5ea Feb 13, 2018

@cpcloud

Member

cpcloud commented Feb 13, 2018

Merging on green

@cpcloud cpcloud force-pushed the cpcloud:more-docs branch from c64b5ea to 10d07b1 Feb 13, 2018

@cpcloud cpcloud closed this in 5952b1a Feb 13, 2018

Documentation automation moved this from To do to Done Feb 13, 2018

@cpcloud cpcloud deleted the cpcloud:more-docs branch Feb 13, 2018