DOC: Document core parts of ibis #1351

Closed

cpcloud (Member) commented Feb 10, 2018

No description provided.

The main user-facing component of ibis is expressions. The base class of all
expressions in ibis is the :class:`~ibis.expr.types.Expr` class.

Expressions provide the user-facing API, defined in ``ibis/expr/api.py``

Contributor

you could actually put in the links to code

Member Author

yep, will do

@cpcloud cpcloud added this to the 0.13 milestone Feb 10, 2018
@cpcloud cpcloud added docs Documentation related issues or PRs feature Features or general enhancements internals Issues or PRs related to ibis's internal APIs labels Feb 10, 2018
cpcloud (Member Author) commented Feb 11, 2018

@kszucs @wesm @jreback please review when you get a chance

kszucs (Member) left a comment

Cool! If I recall correctly I've tried to click on Ibis design internals three times :)

- [SQLite](http://sqlite.org/)
- [Pandas DataFrames](https://pandas.pydata.org/) (Experimental)
- [SQLite](https://www.sqlite.org/)
- [Pandas](https://pandas.pydata.org/) [DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) (Experimental)

Contributor

are you marking this experimental?

Member Author

Yeah, it hasn't been around that long.


More to come here.
Primary Goals
*************

Contributor

I think this should be '----' otherwise you have too many levels of nesting (compared to other sections)

Member Author

Yep, done

#. Composability
#. Familiarty

Flow of execution

Contributor

maybe should capitalize all words in sections (or not), just do this consistently

Member Author

Done

(EXAMPLE)
#. Some optimizations happen at compile time? (EXAMPLE)
#. Expressions are compiled
#. The SQL string generated by the compiler is sent to the database and

Contributor

this is true for SQL but not for other backends where the execution can happen locally / remote (e.g. pandas / spark / file based)

Member Author

Added a note

#. The database returns some data that is then turned into a pandas DataFrame
by ibis


Contributor

you might want to have reference tags to each sub-section

expressions.

The compiler works by translating the different pieces of SQL expression into a
string or SQLAlchemy expression.

Contributor

ref to SQLAlchemy docs

:class:`~ibis.impala.compiler.ImpalaExprTranslator` is one of the subclasses
that will perform this translation.

Execution

Contributor

maybe have another section on 'local execution' (pandas / spark / file based)

Adding a new operation (``Node`` subclass)
------------------------------------------

Let's go through adding a `sha1`_ method to ibis, implemented in the BigQuery

Contributor

ref to BQ (maybe doc-section in ibis)

Member Author

I'll link to the sha1 wikipedia article here https://en.wikipedia.org/wiki/SHA-1

function in BigQuery takes a string or bytes and returns a bytestring of length
20.

.. code-block:: python

Contributor

you could do these as ipython blocks (I think) esp if you want to use them later

Member Author

I'm actually going to move this into a notebook
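
For reference, the 20-byte claim in the diff above can be checked with Python's standard `hashlib`. This only illustrates the SHA-1 digest size; it is not how the BigQuery backend implements the operation.

```python
import hashlib

# SHA-1 always yields a 160-bit (20-byte) digest, whatever the input length.
digest = hashlib.sha1(b"ibis").digest()
print(type(digest).__name__, len(digest))  # bytes 20
```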

execute the arguments of the current node

execute the current node with its executed arguments

Contributor

I think this rather belongs in the docs proper

Member Author

So, I put this here so that I could start a trend of adding module-level docs for each backend, rather than having one rst file for each backend, which will tend to get out of date more often than the module docs.
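
The two steps quoted in the diff above (execute the arguments of the current node, then execute the node with its executed arguments) describe a recursive walk over the expression tree. A minimal sketch, assuming a toy tree: `Literal`, `BinaryOp`, and `execute` are hypothetical names for illustration and are not taken from the pandas backend.

```python
import operator
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Literal:
    """Hypothetical leaf node holding a plain Python value."""
    value: Any


@dataclass(frozen=True)
class BinaryOp:
    """Hypothetical node applying a two-argument function to child nodes."""
    func: Any
    left: Any
    right: Any


def execute(node):
    # Leaves have no arguments to execute, so return their value directly.
    if isinstance(node, Literal):
        return node.value
    # Step 1: execute the arguments of the current node.
    left = execute(node.left)
    right = execute(node.right)
    # Step 2: execute the current node with its executed arguments.
    return node.func(left, right)


# 1 + (2 * 3)
expr = BinaryOp(operator.add, Literal(1), BinaryOp(operator.mul, Literal(2), Literal(3)))
result = execute(expr)
print(result)  # 7
```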

wesm (Member) commented Feb 12, 2018

Will endeavor to review tomorrow

@cpcloud cpcloud force-pushed the more-docs branch 2 times, most recently from e406325 to ae5d981 Compare February 12, 2018 14:26
wesm (Member) left a comment

Minor comments, but looks like a great start!

#. Type safety
#. Expressiveness
#. Composability
#. Familiarty

Member

typo

Member Author

thank you

#. Expressions are type checked as you create them
#. Some expressions have some optimizations that happen as the user builds them
(EXAMPLE)
#. Some optimizations happen at compile time? (EXAMPLE)

Member

Backend-specific rewrites

#. The SQL string generated by the compiler is sent to the database and
executed (this step is skipped for the Pandas backend)
#. The database returns some data that is then turned into a pandas DataFrame
by ibis

Member

We should decide on Ibis vs ibis in prose and be consistent (pandas is of course styled as lowercase)

Member Author

I'll arbitrarily decide to follow the pandas convention.
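
The flow of execution listed in the diff above (type check while building, compile, send the SQL to the database, collect the result) can be sketched end to end with the standard-library `sqlite3` module. This is a toy under stated assumptions: `Column`, `Greater`, and `compile_filter` are hypothetical stand-ins and not ibis APIs, and the rows are left as plain tuples where ibis would build a pandas DataFrame.

```python
import sqlite3


class Column:
    """Hypothetical typed column reference."""

    def __init__(self, name, dtype):
        self.name, self.dtype = name, dtype

    def __gt__(self, other):
        # Type checking happens while the expression is being built.
        if self.dtype == "int64" and not isinstance(other, int):
            raise TypeError(f"cannot compare {self.dtype} column with {other!r}")
        return Greater(self, other)


class Greater:
    """Hypothetical node representing `column > literal`."""

    def __init__(self, column, value):
        self.column, self.value = column, value


def compile_filter(table, predicate):
    # Compilation: turn the expression into a SQL string plus bound parameters.
    sql = f'SELECT * FROM "{table}" WHERE "{predicate.column.name}" > ?'
    return sql, (predicate.value,)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

predicate = Column("x", "int64") > 1          # build and type check
sql, params = compile_filter("t", predicate)  # compile
rows = con.execute(sql, params).fetchall()    # send to the database and execute
# ibis would turn the returned data into a pandas DataFrame at this point.
```

As noted in the review, the "send a SQL string" step applies only to SQL backends; a local backend would evaluate the expression directly instead.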


Here's an example of each type of expression:

.. code-block:: python

Member

Could do IPython directive (with graphviz disabled)

Member Author

This won't show them anyways, since I'm not repring the expression, just the type.

Separation of the :class:`~ibis.expr.types.Node` and
:class:`~ibis.expr.types.Expr` classes also allows the API to be tied to the
physical type of the expression rather than the particular operation, making it
easy to define the API in terms of types rather than specific operations.

Member

Another key point here is that the operator output type will often depend on the input type(s). So the "user API" for the result of an operation is a strongly-typed expression having only the behavior of the actual output type of the operator.

  • Add strings -> get strings, do things with strings
  • Add numbers -> get numbers, do things with numbers

Member Author

Cool, added a blurb about this.
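
The point raised in this thread, that an operation's output type depends on its input types and that the resulting expression exposes only the API of that type, can be illustrated with a small sketch. All class names here are hypothetical simplifications, not the actual classes in `ibis.expr.types`.

```python
class Expr:
    """Hypothetical base expression: wraps the node that produced it."""

    def __init__(self, op):
        self.op = op

    def __add__(self, other):
        return Add(self, other).to_expr()


class StringValue(Expr):
    def upper(self):  # string-only API
        return Upper(self).to_expr()


class NumericValue(Expr):
    def negate(self):  # numeric-only API
        return Negate(self).to_expr()


class Node:
    """Hypothetical operation: stores arguments and decides the output type."""

    def __init__(self, *args):
        self.args = args

    def output_type(self):
        raise NotImplementedError

    def to_expr(self):
        return self.output_type()(self)


class Literal(Node):
    def output_type(self):
        return StringValue if isinstance(self.args[0], str) else NumericValue


class Add(Node):
    def output_type(self):
        left, right = self.args
        # The output type follows the input types; mismatches fail at build time.
        if type(left) is not type(right):
            raise TypeError("cannot add expressions of different types")
        return type(left)


class Upper(Node):
    def output_type(self):
        return StringValue


class Negate(Node):
    def output_type(self):
        return NumericValue


strings = Literal("a").to_expr() + Literal("b").to_expr()
numbers = Literal(1).to_expr() + Literal(2).to_expr()
print(type(strings).__name__, type(numbers).__name__)  # StringValue NumericValue
```

The same `Add` node yields a `StringValue` or a `NumericValue`, so `strings.upper()` works while `numbers.upper()` raises `AttributeError`.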

the instance of the :class:`~ibis.sql.compiler.ExprTranslator` subclass
specific to the backend being compiled. For example, the
:class:`~ibis.impala.compiler.ImpalaExprTranslator` is one of the subclasses
that will perform this translation.

Member

May be worth noting explicitly that SQL is only one target, however the library was designed with first-class SQL support in mind, being the lingua franca of analytics.
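
The idea of a backend-specific translator subclass, as described in the diff above, can be sketched as a registry of per-operation rules that each subclass extends. This is an assumption-laden toy: expressions are plain tuples, and `DialectATranslator` and `DialectBTranslator` are hypothetical names that do not correspond to any real ibis backend or SQL dialect.

```python
class ExprTranslator:
    """Hypothetical base translator with rules shared by every dialect."""

    _registry = {
        "column": lambda t, name: f'"{name}"',
        "literal": lambda t, value: repr(value),
        "add": lambda t, left, right: f"({t.translate(left)} + {t.translate(right)})",
    }

    def translate(self, node):
        # A node is a tuple: (operation name, *arguments).
        op, *args = node
        return self._registry[op](self, *args)


class DialectATranslator(ExprTranslator):
    # Backend-specific rule: this made-up dialect spells natural log as ln().
    _registry = {
        **ExprTranslator._registry,
        "log": lambda t, arg: f"ln({t.translate(arg)})",
    }


class DialectBTranslator(ExprTranslator):
    # Another made-up dialect that spells the same operation as log().
    _registry = {
        **ExprTranslator._registry,
        "log": lambda t, arg: f"log({t.translate(arg)})",
    }


expr = ("log", ("add", ("column", "x"), ("literal", 1)))
sql_a = DialectATranslator().translate(expr)
sql_b = DialectBTranslator().translate(expr)
print(sql_a)  # ln(("x" + 1))
print(sql_b)  # log(("x" + 1))
```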

cpcloud (Member Author) commented Feb 13, 2018

Merging on green

Labels
docs Documentation related issues or PRs feature Features or general enhancements internals Issues or PRs related to ibis's internal APIs