DOC: Design docs
cpcloud committed Feb 13, 2018
1 parent 0c63d4b commit 10d07b1
Showing 11 changed files with 698 additions and 37 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -27,10 +27,10 @@ Ibis currently provides tools for interacting with the following systems:
- [Apache Impala (incubating)](http://impala.io/)
- [Apache Kudu](http://getkudu.io)
- [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/)
- [PostgreSQL](https://www.postgresql.org/) (Experimental)
- [PostgreSQL](https://www.postgresql.org/)
- [MySQL](https://www.mysql.com/) (Experimental)
- [SQLite](http://sqlite.org/)
- [Pandas DataFrames](https://pandas.pydata.org/) (Experimental)
- [SQLite](https://www.sqlite.org/)
- [Pandas](https://pandas.pydata.org/) [DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) (Experimental)
- [Clickhouse](https://clickhouse.yandex)
- [BigQuery](https://cloud.google.com/bigquery)

35 changes: 24 additions & 11 deletions ci/load-data.sh
@@ -1,28 +1,41 @@
#!/usr/bin/env bash

CWD=$(dirname $0)
CWD="$(dirname "${0}")"

declare -A argcommands=([sqlite]=sqlite
                        [parquet]="parquet -i"
                        [postgres]=postgres
                        [clickhouse]=clickhouse
                        [mysql]=mysql
                        [impala]=impala)

if [[ "$#" == 0 ]]; then
    ARGS=(${!argcommands[@]}) # keys of argcommands
else
    ARGS=($*)
fi

python $CWD/datamgr.py download
python $CWD/datamgr.py mysql &
python $CWD/datamgr.py sqlite &
python $CWD/datamgr.py parquet -i &
python $CWD/datamgr.py postgres &
python $CWD/datamgr.py clickhouse &
python $CWD/impalamgr.py load --data &

for arg in ${ARGS[@]}; do
    if [[ "${arg}" == "impala" ]]; then
        python "${CWD}"/impalamgr.py load --data &
    else
        python "${CWD}"/datamgr.py ${argcommands[${arg}]} &
    fi
done

FAIL=0

for job in `jobs -p`
do
    wait $job || let FAIL+=1
    wait "${job}" || let FAIL+=1
done

if [ $FAIL -eq 0 ]; then
    echo "Done loading to SQLite, Postgres, Clickhouse and Impala"
if [[ "${FAIL}" == 0 ]]; then
    echo "Done loading ${ARGS[@]}"
    exit 0
else
    echo "Failed loading the datasets" >&2
    echo "Failed loading ${ARGS[@]}" >&2
    exit 1
fi
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -142,7 +142,7 @@
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
html_favicon = 'favicon.ico'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
215 changes: 210 additions & 5 deletions docs/source/design.rst
@@ -1,7 +1,212 @@
.. _internals:
.. _design:

*********************
Ibis design internals
*********************
Design
======

More to come here.

.. _primary_goals:

Primary Goals
-------------

#. Type safety
#. Expressiveness
#. Composability
#. Familiarity

.. _flow_of_execution:

Flow of Execution
-----------------

#. User writes expression
#. Each method or function call builds a new expression
#. Expressions are type checked as you create them
#. Expressions have some optimizations that happen as the user builds them
#. Backend specific rewrites
#. Expressions are compiled
#. The SQL string that is generated by the compiler is sent to the database and
   executed (this step is skipped for the pandas backend)
#. The database returns some data that is then turned into a pandas DataFrame
   by ibis
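
The flow above can be illustrated with a toy model. This is a minimal sketch and not ibis's implementation: ``Expr``, ``Column``, ``Literal``, ``Add``, ``as_expr``, and ``compile_sql`` are hypothetical names invented for this example. It shows each call building a new expression, type checking at construction time, and a final compilation step that produces a SQL string.

```python
NUMERIC = {"int64", "double"}


class Expr:
    """Base class for the toy expressions (hypothetical, not ibis's Expr)."""

    def __add__(self, other):
        # Each operator call builds a new expression; nothing is computed yet.
        return Add(self, as_expr(other))


class Column(Expr):
    def __init__(self, name, dtype):
        self.name, self.dtype = name, dtype


class Literal(Expr):
    def __init__(self, value):
        self.value = value
        if isinstance(value, int):
            self.dtype = "int64"
        elif isinstance(value, float):
            self.dtype = "double"
        else:
            self.dtype = "string"


class Add(Expr):
    def __init__(self, left, right):
        # Type checking happens when the expression is created.
        for arg in (left, right):
            if arg.dtype not in NUMERIC:
                raise TypeError(f"cannot add a {arg.dtype} value")
        self.left, self.right = left, right
        both = (left.dtype, right.dtype)
        self.dtype = "double" if "double" in both else "int64"


def as_expr(value):
    """Wrap plain Python values so they can take part in expressions."""
    return value if isinstance(value, Expr) else Literal(value)


def compile_sql(expr):
    """Translate a toy expression into a SQL fragment."""
    if isinstance(expr, Column):
        return expr.name
    if isinstance(expr, Literal):
        return repr(expr.value)
    if isinstance(expr, Add):
        return f"({compile_sql(expr.left)} + {compile_sql(expr.right)})"
    raise NotImplementedError(type(expr).__name__)


a = Column("a", "int64")
expr = a + 1  # builds Add(Column, Literal) without executing anything
sql = f"SELECT {compile_sql(expr)} AS b FROM t"
```

In this sketch an invalid expression such as adding a number to a string column fails when it is built, before any SQL is generated or sent anywhere.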

.. _expressions:

Expressions
-----------

The main user-facing component of ibis is expressions. The base class of all
expressions in ibis is the :class:`~ibis.expr.types.Expr` class.

Expressions provide the user-facing API, defined in ``ibis/expr/api.py``.

.. _type_system:

Type System
~~~~~~~~~~~

Ibis's type system consists of a set of rules for specifying the types of
inputs to :class:`~ibis.expr.types.Node` subclasses. Upon construction of a
:class:`~ibis.expr.types.Node` subclass, ibis performs validation of every
input to the node based on the rule that was used to declare the input.

Rules are defined in ``ibis/expr/rules.py``.
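
The idea of validating every input against a declared rule at construction time can be sketched as follows. This is an illustration under stated assumptions, not the contents of ``ibis/expr/rules.py``: the ``Rule`` class, the ``double`` and ``integer`` helpers, and the ``Round`` node are hypothetical, and plain Python numbers stand in for ibis expressions.

```python
class Rule:
    """One declared input: accepted Python types, a name, and optionality."""

    def __init__(self, accepted, name=None, optional=False):
        self.accepted = accepted
        self.name = name
        self.optional = optional

    def validate(self, value):
        # An omitted optional argument is allowed and stays None.
        if value is None and self.optional:
            return None
        if not isinstance(value, self.accepted):
            label = self.name or "argument"
            raise TypeError(
                f"{label}: expected {self.accepted}, got {type(value).__name__}"
            )
        return value


def double(**kwargs):
    return Rule((int, float), **kwargs)


def integer(**kwargs):
    return Rule(int, **kwargs)


class Node:
    input_type = []

    def __init__(self, *args):
        if len(args) > len(self.input_type):
            raise TypeError("too many arguments")
        # Missing trailing arguments are treated as None before validation.
        padded = args + (None,) * (len(self.input_type) - len(args))
        self.args = [
            rule.validate(arg) for rule, arg in zip(self.input_type, padded)
        ]


class Round(Node):
    # Hypothetical node: one required double, one optional integer.
    input_type = [double(), integer(name="digits", optional=True)]


ok = Round(2.5)          # digits omitted, validated as None
also_ok = Round(2.5, 1)  # both inputs pass their rules
```

Because validation runs in ``Node.__init__``, a call like ``Round("x")`` raises ``TypeError`` as soon as the node is constructed.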

.. _expr_class:

The :class:`~ibis.expr.types.Expr` class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expressions are a thin but important abstraction over operations, containing
only type information and shape information, i.e., whether they are tables,
columns, or scalars.

Examples of expressions include :class:`~ibis.expr.types.Int64Column`,
:class:`~ibis.expr.types.StringScalar`, and
:class:`~ibis.expr.types.TableExpr`.

Here's an example of each type of expression:

.. code-block:: ipython

   import ibis
   t = ibis.table([('a', 'int64')])
   int64_column = t.a
   type(int64_column)
   string_scalar = ibis.literal('some_string_value')
   type(string_scalar)
   table_expr = t.mutate(b=t.a + 1)
   type(table_expr)

.. _node_class:

The :class:`~ibis.expr.types.Node` Class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`~ibis.expr.types.Node` subclasses make up the core set of operations of
ibis. Each node corresponds to a particular operation.

Most nodes are defined in the :mod:`~ibis.expr.operations` module.

Examples of nodes include :class:`~ibis.expr.operations.Add` and
:class:`~ibis.expr.operations.Sum`.

Nodes have two important members (and often these are the only members defined):

#. ``input_type``: a list of rules
#. ``output_type``: a rule or method

The ``input_type`` member is a list of rules that defines the types of
the inputs to the operation. This is sometimes called the signature.

The ``output_type`` member is a rule or a method that defines the output type
of the operation. This is sometimes called the return type.

An example of ``input_type``/``output_type`` usage is the
:class:`~ibis.expr.operations.Log` class:

.. code-block:: ipython

   class Log(Node):
       input_type = [
           rules.double(),
           rules.double(name='base', optional=True)
       ]
       output_type = rules.shape_like_arg(0, 'double')

This class describes an operation called ``Log`` that takes one required
argument, a double scalar or column, and one optional argument, a double scalar
or column named ``base``. The ``base`` argument defaults to ``None`` when it is
not provided, so that the expression behaves as the underlying database does.

These objects are instantiated when you use ibis APIs:

.. code-block:: ipython

   import ibis
   t = ibis.table([('a', 'double')])
   log_1p = (1 + t.a).log()  # an Add and a Log are instantiated here

.. _expr_vs_ops:

Expressions vs Operations: Why are they different?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Separating expressions from their underlying operations makes it easy to
generically describe and validate the inputs to particular nodes. In the log
example, it doesn't matter what *operation* (node) the double-valued arguments
come from; they only need to satisfy the requirement denoted by the rule.

Separation of the :class:`~ibis.expr.types.Node` and
:class:`~ibis.expr.types.Expr` classes also allows the API to be tied to the
physical type of the expression rather than the particular operation, making it
easy to define the API in terms of types rather than specific operations.

Furthermore, operations often have an output type that depends on the input
type. An example of this is the ``greatest`` function, which takes the maximum
of all of its arguments. Another example is ``CASE`` statements, whose ``THEN``
expressions determine the output type of the expression.

This allows ibis to provide **only** the APIs that make sense for a particular
type, even when an operation yields a different output type depending on its
input. Concretely, this means that you cannot perform operations that don't
make sense, like computing the average of a string column.
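
The separation described in this section can be sketched with a toy model. None of this is ibis's real code: the class names mirror the ones mentioned above where possible, but their bodies, the ``ColumnRef``, ``Mean``, ``Upper``, and ``Greatest`` nodes, and the ``to_expr`` helper are assumptions made for illustration. The sketch shows the API living on the expression type, while each node decides which expression type it produces.

```python
class Node:
    """An operation: holds its arguments and knows its output type."""

    def __init__(self, *args):
        self.args = args

    def to_expr(self):
        # Wrap this operation in the expression class it produces.
        return self.output_type()(self)


class Expr:
    """A thin wrapper around an operation; the API lives on subclasses."""

    def __init__(self, op):
        self._op = op

    def op(self):
        return self._op


class NumericColumn(Expr):
    def mean(self):
        return Mean(self).to_expr()


class Int64Column(NumericColumn):
    pass


class DoubleColumn(NumericColumn):
    pass


class DoubleScalar(Expr):
    pass


class StringColumn(Expr):
    # No mean() here: averaging a string column is not expressible.
    def upper(self):
        return Upper(self).to_expr()


class ColumnRef(Node):
    def __init__(self, name, expr_type):
        super().__init__(name)
        self._expr_type = expr_type

    def output_type(self):
        return self._expr_type


class Mean(Node):
    def output_type(self):
        return DoubleScalar


class Upper(Node):
    def output_type(self):
        return StringColumn


class Greatest(Node):
    def output_type(self):
        # The output type depends on the inputs: any double input
        # makes the result a double column.
        if any(isinstance(arg, DoubleColumn) for arg in self.args):
            return DoubleColumn
        return Int64Column


def greatest(*exprs):
    return Greatest(*exprs).to_expr()


ints = ColumnRef("a", Int64Column).to_expr()
doubles = ColumnRef("b", DoubleColumn).to_expr()
names = ColumnRef("name", StringColumn).to_expr()
```

Here ``greatest(ints, doubles)`` yields a ``DoubleColumn`` and therefore has ``mean()``, while ``names`` has no ``mean`` attribute at all, regardless of which operation produced it.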

.. _compilation:

Compilation
-----------

The next major component of ibis is the compilers.

The first few versions of ibis directly generated strings, but the compiler
infrastructure was generalized to support compilation of `SQLAlchemy
<https://docs.sqlalchemy.org/en/latest/core/tutorial.html>`_ based expressions.

The compiler works by translating the different pieces of a SQL expression into
a string or SQLAlchemy expression.

The main pieces of a ``SELECT`` statement are:

#. The set of column expressions (``select_set``)
#. ``WHERE`` clauses (``where``)
#. ``GROUP BY`` clauses (``group_by``)
#. ``HAVING`` clauses (``having``)
#. ``LIMIT`` clauses (``limit``)
#. ``ORDER BY`` clauses (``order_by``)
#. ``DISTINCT`` clauses (``distinct``)

Each of these pieces is translated into a SQL string and finally assembled by
the instance of the :class:`~ibis.sql.compiler.ExprTranslator` subclass
specific to the backend being compiled. For example, the
:class:`~ibis.impala.compiler.ImpalaExprTranslator` is one of the subclasses
that will perform this translation.
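
As a rough illustration of the assembly step, the pieces listed above can be joined in the order SQL requires. This is a sketch only: ``assemble_select`` is a hypothetical helper, its keyword names are borrowed from the list above, and it assumes each piece has already been translated into a SQL fragment.

```python
def assemble_select(table, select_set, where=(), group_by=(), having=(),
                    order_by=(), limit=None, distinct=False):
    """Join already-translated SQL fragments into one SELECT statement."""
    head = "SELECT DISTINCT " if distinct else "SELECT "
    lines = [head + ", ".join(select_set), f"FROM {table}"]
    if where:
        lines.append("WHERE " + " AND ".join(where))
    if group_by:
        lines.append("GROUP BY " + ", ".join(group_by))
    if having:
        lines.append("HAVING " + " AND ".join(having))
    if order_by:
        lines.append("ORDER BY " + ", ".join(order_by))
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = assemble_select(
    "t",
    select_set=["a", "sum(b) AS total"],
    where=["b > 0"],
    group_by=["a"],
    having=["sum(b) > 10"],
    order_by=["total DESC"],
    limit=5,
)
```

Keeping the pieces separate until the last moment is what lets a backend-specific translator change how one clause is rendered without touching the others.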

.. note::

   While ibis was designed with an explicit goal of first-class SQL support,
   ibis can target other systems such as pandas.

.. _execution:

Execution
---------

We presumably want to *do* something with our compiled expressions. This is
where execution comes in.

This is the least complex part of ibis, mostly only requiring ibis to correctly
handle whatever the database hands back.

By and large, the execution of compiled SQL is handled by the database to which
ibis sends the SQL.

However, once the data arrives from the database we need to convert that
data to a pandas DataFrame.

The Query class, with its :meth:`~ibis.sql.client.Query._fetch` method,
provides a way for ibis :class:`~ibis.sql.client.SQLClient` objects to do any
additional processing necessary after the database returns results to the
client.
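
The execute-then-post-process pattern can be sketched with the standard library's ``sqlite3`` module. This is not the ibis ``Query`` class: the ``execute`` function below is a hypothetical stand-in, and a plain dictionary of columns takes the place of the pandas DataFrame that ibis builds.

```python
import sqlite3


def execute(connection, sql):
    """Run compiled SQL and post-process the raw cursor output."""
    cursor = connection.execute(sql)
    names = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    # Post-processing step: turn row tuples into a column-oriented mapping,
    # a stand-in for the DataFrame conversion described above.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (a INTEGER)")
connection.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
result = execute(connection, "SELECT a, a + 1 AS b FROM t ORDER BY a")
```

The database does all of the computation; the client side only reshapes what comes back.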
40 changes: 40 additions & 0 deletions docs/source/extending.rst
@@ -0,0 +1,40 @@
.. _extending:


Extending Ibis
==============

Users typically want to extend ibis in one of two ways:

#. Add a new expression
#. Add a new backend


Below we provide notebooks showing how to extend ibis in each of these ways.


Adding a New Expression
-----------------------

.. note::

   Make sure you've run the following commands before executing the notebook:

   .. code-block:: sh

      docker-compose up -d --no-build postgres dns
      docker-compose run waiter
      docker-compose run ibis ci/load-data.sh postgres

Here we show how to add a ``sha1`` method to the PostgreSQL backend:

.. toctree::
   :maxdepth: 1

   notebooks/tutorial/9-Adding-a-new-expression.ipynb


Adding a New Backend
--------------------

TBD
Binary file added docs/source/favicon.ico
Binary file not shown.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -86,6 +86,7 @@ places, but this will improve as things progress.
sql
developer
design
extending
release
legal
