249 changes: 0 additions & 249 deletions docs/source/getting-started.rst

This file was deleted.

61 changes: 46 additions & 15 deletions docs/source/index.rst
@@ -31,6 +31,21 @@ http://ibis-project.org.

Source code is on GitHub: https://github.com/ibis-project/ibis

.. _install:

Installation
------------

System Dependencies
~~~~~~~~~~~~~~~~~~~

Ibis requires a working Python 3.7+ installation. We recommend using
`Anaconda <http://continuum.io/downloads>`_ to manage Python versions and
environments.

Installing the Python Package
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Install Ibis from PyPI with:

::
@@ -43,6 +58,20 @@ Or from `conda-forge <http://conda-forge.github.io>`_ with

conda install ibis-framework -c conda-forge

This installs the ``ibis`` library to your configured Python environment.
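
To verify the installation, you can import ibis and print its version (a quick
sanity check; the exact version printed will depend on the release you installed):

::

    python -c "import ibis; print(ibis.__version__)"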

Ibis can also be installed with Kerberos support for its HDFS functionality:

::

pip install ibis-framework[kerberos]

Some platforms require the Kerberos development headers to be installed for the build to succeed:

* Redhat / CentOS: ``yum install krb5-devel``
* Ubuntu / Debian: ``apt-get install libkrb5-dev``
* Arch Linux: ``pacman -S krb5``

At this time, Ibis offers some level of support for the following systems:

- `Apache Impala <https://impala.apache.org/>`_
@@ -55,7 +84,7 @@ At this time, Ibis offers some level of support for the following systems:
- `Yandex Clickhouse <https://clickhouse.yandex/>`_
- Direct execution of ibis expressions against `Pandas
<http://pandas.pydata.org/>`_ objects
- `OmniSciDB <https://www.omnisci.com/>`_ (Experimental)
- `OmniSciDB <https://www.omnisci.com/>`_
- `PySpark/Spark SQL <https://spark.apache.org/sql/>`_ (Experimental)

Coming from SQL? Check out :ref:`Ibis for SQL Programmers <sql>`.
@@ -79,26 +108,28 @@ SQL engine support needing code contributors:
.. toctree::
:maxdepth: 1

getting-started
configuration
impala
tutorial
tutorial/index
user_guide/index
api
sql
udf
contributing
design
extending
backends
roadmap
release
release-pre-1.0
legal
backends/index
release/index

Learning Resources
------------------

We collect Jupyter notebooks for learning how to use ibis here:
https://github.com/ibis-project/ibis/tree/master/docs/source/notebooks/tutorial.
Some of these notebooks will be reproduced as part of the documentation
:ref:`in the tutorial section <tutorial>`.

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

.. Use the meta tags to verify the site for use in Google OAuth2 consent flow.
.. meta::
:google-site-verification: IVqzkYiD5E35oD4kkVOcTYCTfqWKU1f6zOHCnLIPkUU
97 changes: 0 additions & 97 deletions docs/source/notebooks/tutorial/1-Intro-and-Setup.ipynb

This file was deleted.

616 changes: 0 additions & 616 deletions docs/source/notebooks/tutorial/2-Basics-Aggregate-Filter-Limit.ipynb

This file was deleted.

497 changes: 0 additions & 497 deletions docs/source/notebooks/tutorial/3-Projection-Join-Sort.ipynb

This file was deleted.

77 changes: 76 additions & 1 deletion docs/source/release.rst → docs/source/release/index.rst
@@ -2,11 +2,86 @@
Release Notes
=============

.. toctree::
:hidden:

release-pre-1.0

.. note::

These release notes are for versions of ibis **1.0 and later**. Release
notes for pre-1.0 versions of ibis can be found at :doc:`/release-pre-1.0`
notes for pre-1.0 versions of ibis can be found at :doc:`release-pre-1.0`

* :feature:`2514` Add Struct.from_dict
* :feature:`2310` Add hash and hashbytes support for BigQuery backend
* :feature:`2511` Support reduction UDF without groupby to return multiple columns for Pandas backend
* :feature:`2487` Support analytic and reduction UDF to return multiple columns for Pandas backend
* :support:`2497` Move `ibis.HDFS`, `ibis.WebHDFS` and `ibis.hdfs_connect` to `ibis.impala.*`
* :feature:`2473` Support elementwise UDF to return multiple columns for Pandas and PySpark backend
* :bug:`2462` Table expressions do not recognize inet datatype (Postgres backend)
* :bug:`2461` Table expressions do not recognize macaddr datatype (Postgres backend)
* :bug:`2410` Fix ``aggcontext.Summarize`` not always producing scalar (Pandas backend)
* :bug:`2414` Fix same window op with different window sizes on a table leading to incorrect results for the PySpark backend
* :feature:`2409` Support Ibis interval for window in the PySpark backend
* :bug:`2229` Fix same column with multiple aliases not showing properly in repr
* :feature:`2402` Use Scope class for scope in pyspark backend
* :bug:`2395` Fix reduction UDFs over ungrouped, bounded windows on Pandas backend
* :bug:`2386` Support rolling window UDFs with non-numeric inputs for the pandas backend
* :bug:`2386` Fix scope get to use hashmap lookup instead of list lookup
* :bug:`2387` Fix equality behavior for Literal ops
* :bug:`2376` Fix analytic ops over ungrouped and unordered windows on Pandas backend
* :support:`2288` Drop support for Python 3.6
* :bug:`2367` Fix the covariance operator in the BigQuery backend.
* :feature:`2366` Add PySpark support for ReductionVectorizedUDF
* :bug:`2342` Update impala kerberos dependencies
* :feature:`2306` Add time context in `scope` in execution for pandas backend
* :support:`2351` Simplify the tests directory structure
* :feature:`2081` Add ``start_point`` and ``end_point`` to PostGIS backend.
* :feature:`2347` Add set difference to general ibis api
* :feature:`2251` Add ``rowid`` expression, supported by SQLite and OmniSciDB
* :feature:`2230` Add intersection to general ibis api
* :support:`2304` Update ``google-cloud-bigquery`` dependency minimum version to 1.12.0
* :feature:`2303` Add ``application_name`` argument to ``ibis.bigquery.connect`` to allow attributing Google API requests to projects that use Ibis.
* :bug:`1320` Added verbose logging to SQL backends
* :feature:`2285` Add support for casting category dtype in pandas backend
* :feature:`2270` Add support for Union in the PySpark backend
* :bug:`2256` Fix issue with sql_validate call to OmniSciDB.
* :feature:`2260` Add support for implementing custom window objects for the pandas backend
* :bug:`2237` Add missing float types to pandas backend
* :bug:`2252` Allow group_by and order_by as window operation input in pandas backend
* :feature:`2246` Implement two level dispatcher for execute_node
* :feature:`2233` Add ibis.pandas.trace module to log time and call stack information.
* :feature:`2198` Validate that the output type of a UDF is a single element
* :bug:`2223` Fix PySpark compiler error when elementwise UDF output_type is Decimal or Timestamp
* :feature:`2186` ZeroIfNull and NullIfZero implementation for OmniSciDB
* :bug:`2157` Fix interactive mode returning an expression instead of the value when used in Jupyter
* :feature:`2093` IsNan implementation for OmniSciDB
* :feature:`2094` [OmniSciDB] Support add_columns and drop_columns for OmniSciDB tables
* :support:`2234` Remove "experimental" mentions for OmniSciDB and Pandas backends
* :bug:`2127` Fix PySpark error when doing alias after selection
* :support:`2244` Use a stable OmniSciDB image on CI
* :feature:`2175` Create ExtractQuarter operation and add its support to Clickhouse, CSV, Impala, MySQL, OmniSciDB, Pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark
* :feature:`2126` Add translation rules for isnull() and notnull() for pyspark backend
* :feature:`2232` Add window operations support to SQLite
* :feature:`2062` Implement read_csv for omniscidb backend
* :feature:`2171` [OmniSciDB] Add support to week extraction
* :feature:`2097` Date, DateDiff and TimestampDiff implementations for OmniSciDB
* :bug:`2170` Fix millisecond issue for OmniSciDB :issue:`2167`, MySQL :issue:`2169`, PostgreSQL :issue:`2166`, Pandas :issue:`2168`, BigQuery :issue:`2273` backends
* :feature:`2177` Create ExtractWeekOfYear operation and add its support to Clickhouse, CSV, MySQL, Pandas, Parquet, PostgreSQL, PySpark and Spark
* :feature:`2060` Add initial support for ibis.random function
* :support:`2107` Added fragment_size to table creation for OmniSciDB
* :feature:`2178` Added epoch_seconds extraction operation to Clickhouse, CSV, Impala, MySQL, OmniSciDB, Pandas, Parquet, PostgreSQL, PySpark, SQLite, Spark and BigQuery :issue:`2273`
* :feature:`2165` [OmniSciDB] Add "method" parameter to load_data
* :feature:`2117` Add non-nullable info to schema output
* :feature:`2083` fillna and nullif implementations for OmniSciDB
* :feature:`1981` Add load_data to sqlalchemy's backends and fix database parameter for load/create/drop when the database parameter is the same as the current database
* :support:`2096` Added round() support for OmniSciDB
* :feature:`2125` [OmniSciDB] Add support for within, d_fully_within and point
* :feature:`2086` OmniSciDB - Refactor DDL and Client; Add temporary parameter to create_table and "force" parameter to drop_view
* :support:`2113` Enabled cumulative ops support for OmniSciDB
* :bug:`2134` [OmniSciDB] Fix TopK when used as filter
* :feature:`2173` Create ExtractDayOfYear operation and add its support to Clickhouse, CSV, MySQL, OmniSciDB, Pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark
* :feature:`2095` Implementations of Log Log2 Log10 for OmniSciDB backend
* :release:`1.3.0 <2020-02-27>`
* :support:`2066` Add support for Python 3.8
* :bug:`2089 major` Pin "clickhouse-driver" to ">=0.1.3"
@@ -5,7 +5,7 @@ Release Notes (pre 1.0)
.. note::

These release notes are for versions of ibis **prior to 1.0**. For 1.0 and
later release notes see :doc:`/release`.
later release notes see :doc:`/release/index`.

v0.14.0 (August 23rd, 2018)
---------------------------
@@ -392,7 +392,7 @@ Thank you to all who contributed patches to this release.

This release brings expanded pandas and Impala integration, including support
for managing partitioned tables in Impala. See the new :ref:`Ibis for Impala
Users <impala>` guide for more on using Ibis with Impala.
Users <backends.impala>` guide for more on using Ibis with Impala.

The :ref:`Ibis for SQL Programmers <sql>` guide also was written since the 0.5
release.
@@ -404,7 +404,7 @@ New Features
~~~~~~~~~~~~

* New integrated Impala functionality. See :ref:`Ibis for Impala Users
<impala>` for more details on these things.
<backends.impala>` for more details on these things.

* Improved Impala-pandas integration. Create tables or insert into existing
tables from pandas ``DataFrame`` objects.
21 changes: 0 additions & 21 deletions docs/source/tutorial.rst

This file was deleted.

566 changes: 566 additions & 0 deletions docs/source/tutorial/01-Introduction-to-Ibis.ipynb

Large diffs are not rendered by default.

693 changes: 693 additions & 0 deletions docs/source/tutorial/02-Aggregates-Joins.ipynb

Large diffs are not rendered by default.

670 changes: 670 additions & 0 deletions docs/source/tutorial/03-Expressions-Lazy-Mode-Logging.ipynb

Large diffs are not rendered by default.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
88 changes: 88 additions & 0 deletions docs/source/tutorial/data/Create-geography-database.ipynb
@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creation of the geography database\n",
"\n",
"This notbook creates the SQLite `geography.db` database, used in the Ibis tutorials.\n",
"\n",
"The source of the `countries` table has been obtained from [GeoNames](https://www.geonames.org/countries/).\n",
"\n",
"The data for the `gdp` data has been obtained from the [World Bank website](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import sqlite3\n",
"\n",
"\n",
"with open('geography.json') as f:\n",
" data = json.load(f)\n",
"\n",
"conn = sqlite3.connect('geography.db')\n",
"cursor = conn.cursor()\n",
"\n",
"cursor.execute('''\n",
"CREATE TABLE countries (\n",
" iso_alpha2 TEXT,\n",
" iso_alpha3 TEXT,\n",
" iso_numeric INT,\n",
" fips TEXT,\n",
" name TEXT,\n",
" capital TEXT,\n",
" area_km2 REAL,\n",
" population INT,\n",
" continent TEXT);\n",
"''')\n",
"cursor.executemany('INSERT INTO countries VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)',\n",
" data['countries'])\n",
"\n",
"cursor.execute('''\n",
"CREATE TABLE gdp (\n",
" country_code TEXT,\n",
" year INT,\n",
" value REAL);\n",
"''')\n",
"cursor.executemany('INSERT INTO gdp VALUES (?, ?, ?)',\n",
" data['gdp'])\n",
"\n",
"conn.commit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Binary file added docs/source/tutorial/data/geography.db
Binary file not shown.
1 change: 1 addition & 0 deletions docs/source/tutorial/data/geography.json

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions docs/source/tutorial/index.rst
@@ -0,0 +1,18 @@
.. _tutorial:

Tutorial
========

Here we show Jupyter notebooks that take you through various tasks using ibis.

.. toctree::
:maxdepth: 1

01-Introduction-to-Ibis.ipynb
02-Aggregates-Joins.ipynb
03-Expressions-Lazy-Mode-Logging.ipynb
04-More-Value-Expressions.ipynb
05-IO-Create-Insert-External-Data.ipynb
06-Advanced-Topics-TopK-SelfJoins.ipynb
07-Advanced-Topics-ComplexFiltering.ipynb
08-More-Analytics-Helpers.ipynb
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -32,8 +32,8 @@ how to add a new ``bitwise_and`` reduction operation:
.. toctree::
:maxdepth: 1

notebooks/tutorial/9-Adding-a-new-elementwise-expression.ipynb
notebooks/tutorial/10-Adding-a-new-reduction-expression.ipynb
extending_elementwise_expr.ipynb
extending_reduce_expr.ipynb


Adding a New Backend
@@ -59,8 +59,3 @@ Run test suite for separate Backend
`./ci/backends-markers.sh`. By default, a marker will be generated that
matches the name of the backend (you can manually correct the generated
name for the marker inside the file)

Other
-----

TBD
File renamed without changes.
22 changes: 22 additions & 0 deletions docs/source/user_guide/index.rst
@@ -0,0 +1,22 @@
.. _userguide:

**********
User guide
**********

The user guide covers Ibis by topic.

If you are new to Ibis, you can learn the basics in the :ref:`tutorial`.

For users looking for information about a particular class or method, the
information is available in the :ref:`api`.

.. toctree::
:maxdepth: 1

configuration
sql
udf
geospatial_analysis
design
extending/index
File renamed without changes.
21 changes: 21 additions & 0 deletions docs/source/user_guide/udf.rst
@@ -0,0 +1,21 @@
.. _udf:

User Defined Functions
======================

Ibis provides a mechanism for writing custom scalar and aggregate functions,
with varying levels of support for different backends. UDFs/UDAFs are a complex
topic.

This section of the documentation will discuss some of the backend specific
details of user defined functions.

.. warning::

The UDF API is provisional and subject to change.

The following backends provide UDF support (a brief sketch for the pandas backend follows this list):

- :ref:`udf.impala`
- :ref:`udf.pandas`
- :ref:`udf.bigquery`
23 changes: 23 additions & 0 deletions docs/web/about/index.md
@@ -0,0 +1,23 @@
# Ibis: Python Data Analysis Productivity Framework

Ibis is a toolbox to bridge the gap between local Python environments (like
pandas and scikit-learn) and remote storage and execution systems, such as
Hadoop components (HDFS, Impala, Hive, Spark) and SQL databases (Postgres,
etc.). Its goal is to simplify analytical workflows and make you more
productive.

We have a handful of specific priority focus areas:

- Enable data analysts to translate local, single-node data idioms to scalable
computation representations (e.g. SQL or Spark)
- Integration with pandas and other Python data ecosystem components
- Provide high level analytics APIs and workflow tools to enhance productivity
and streamline common or tedious tasks.
- Integration with community standard data formats (e.g. Parquet and Avro)
- Abstract away database-specific SQL differences

As the [Apache Arrow](http://arrow.apache.org/) project develops, we will
look to use Arrow to enable computational code written in Python to be executed
natively within other systems like Apache Spark and Apache Impala (incubating).

Source code is on GitHub: <https://github.com/ibis-project/ibis>.
10 changes: 3 additions & 7 deletions docs/source/legal.rst → docs/web/about/license.md
@@ -1,13 +1,8 @@
=====
Legal
=====
# Legal

Ibis is distributed under the Apache License, Version 2.0.

Ibis development is generously sponsored by Cloudera, Inc.

License::

```text
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -209,3 +204,4 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
73 changes: 18 additions & 55 deletions docs/source/roadmap.rst → docs/web/about/roadmap.md
@@ -1,27 +1,18 @@
.. _roadmap:
# Roadmap

Roadmap
=======
This document is an outline of the next set of major efforts within ibis.

.. _long_term_goals:
## Long Term Goals

Long Term Goals
---------------
This section outlines broader, longer-term goals for the project, alongside a
few short-term goals. It provides information and direction for a few key
areas of focus over the next 1-2 years, possibly longer, depending on the
amount of time the developers of Ibis can devote to the project.

.. _compiler_structure:
### Compiler Structure

Compiler Structure
~~~~~~~~~~~~~~~~~~
#### Separation of Concerns

.. _separation_of_concerns:

Separation of Concerns
^^^^^^^^^^^^^^^^^^^^^^
The current architecture of the ibis compiler has a few key problems that need
to be addressed to ensure longevity and maintainability of the project going
forward.
@@ -40,36 +31,29 @@ optimize whole expression trees.
This approach lets us optimize queries piece by piece, as opposed to having to
provide all optimization implementations in a single pull request.

.. _unifying_table_and_column_compilation:
#### Unifying Table and Column Compilation

Unifying Table and Column Compilation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Right now, it is very difficult to customize the way the operations underlying
table expressions are compiled. The logic to compile them is hard-coded in each
backend (or the compiler’s parent class). This needs to be addressed, if only
to ease the burden of implementing the UNNEST operation and make the codebase
easier to understand and maintain.

.. _depth:
### Depth

Depth
~~~~~
"Depth" goals relate to enhancing Ibis to provide better support for
backend-specific functionality.

.. _backend_specific_operations:
#### Backend-Specific Operations

Backend-Specific Operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^
As the number of ibis users and use cases grows, there will be an increasing
need for individual backends to support more exotic operations. Many SQL
databases have features that are unique only to themselves and often this is
why people will choose that technology over another. Ibis should support an API
that reflects the backend that underlies an expression and expose the
functionality of that specific backend.

A concrete example of this is the `FARM_FINGERPRINT
<https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#farm_fingerprint>`_
A concrete example of this is the [FARM_FINGERPRINT](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#farm_fingerprint)
function in BigQuery.

It is unlikely that the main ValueExpr API will ever grow such a method, but a
@@ -83,21 +67,17 @@ define operations with a backend-specific spelling (presumably in the name of
expediency) that may actually be easily generalizable to or useful for other
backends. This behavior should be discouraged to the extent possible.

.. _standardize_udfs:
#### Standardize UDFs (User Defined Functions)

Standardize UDFs (User Defined Functions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A few backends have support for UDFs. Impala, Pandas and BigQuery all have at
least some level of support for user-defined functions. This mechanism should
be extended to other backends where possible. We outline different approaches
to adding UDFs to the backends that are well-supported but currently do not
have a UDF implementation. Development of a standard interface for UDFs is
ideal, so that it’s easy for new backends to implement the interface.

.. _breadth:
### Breadth

Breadth
~~~~~~~
The major breadth-related question facing ibis is how to grow the number of
backends in a scalable, minimum-maintenance way.

@@ -108,48 +88,31 @@ At minimum we need a way to display which backends implement which operations.
With the ability to provide custom operations we also need a way to display the
custom operations that each backend provides.

.. _backend_specific_goals:
## Backend-Specific Goals

Backend-Specific Goals
----------------------
These goals relate to specific backends

.. _pandas:

Pandas
~~~~~~
### pandas

.. _speed_up_grouped_rolling_and_simple_aggregations_using_numba:
#### Speed up grouped, rolling, and simple aggregations using numba

Speed up grouped, rolling, and simple aggregations using numba
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pandas aggregations are quite slow relative to an equivalent numba
pandas aggregations are quite slow relative to an equivalent numba
implementation, for various reasons. Since ibis hides the implementation
details of a particular expression we can experiment with using different
aggregation implementations.
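
As an illustration of the kind of compiled kernel that could back an ibis
aggregation (a sketch only; `group_sum` is a hypothetical name, not part of
ibis or pandas):

```python
import numba
import numpy as np

@numba.njit
def group_sum(codes, values, ngroups):
    # One pass over the data, compiled to machine code by numba;
    # avoids the per-group Python overhead of a pandas groupby.
    out = np.zeros(ngroups)
    for i in range(values.shape[0]):
        out[codes[i]] += values[i]
    return out

codes = np.array([0, 1, 0, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
group_sum(codes, values, 3)  # -> array([4., 6., 5.])
```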

.. _dask:

Dask
~~~~
### Dask

.. _implement_a_dask_backed:
#### Implement a Dask backend

Implement a Dask backend
^^^^^^^^^^^^^^^^^^^^^^^^
There is currently no way in ibis to easily parallelize a computation on a
single machine, let alone distribute a computation across machines.

Dask provides APIs for doing such things.

.. _spark:

Spark
~~~~~

.. _implement_a_spark_backend:
### Spark

#### Implement a SparkSQL backend

SparkSQL provides a way to execute distributed SQL queries similar to other
backends supported by ibis such as Impala and BigQuery.
47 changes: 47 additions & 0 deletions docs/web/about/team.md
@@ -0,0 +1,47 @@
# Team

## Contributors

{{ ibis.project_name }} is developed and maintained by a
[community of volunteer contributors](https://github.com/{{ ibis.github_repo_url }}/graphs/contributors).


{% for group in team %}

## {{ group.name }}

<div class="row maintainers">
{% for row in group.members | batch(6, "") %}
<div class="card-group maintainers">
{% for person in row %}
{% if person %}
<div class="card">
<img class="card-img-top" alt="" src="{{ person.avatar_url }}"/>
<div class="card-body">
<h6 class="card-title">
{% if person.blog %}
<a href="{{ person.blog }}">
{{ person.name or person.login }}
</a>
{% else %}
{{ person.name or person.login }}
{% endif %}
</h6>
<p class="card-text small"><a href="{{ person.html_url }}">{{ person.login }}</a></p>
</div>
</div>
{% else %}
<div class="card border-0"></div>
{% endif %}
{% endfor %}
</div>
{% endfor %}
</div>

{% endfor %}

{{ ibis.project_name }} aims to be a welcoming, friendly, diverse and inclusive community.
Everybody is welcome, regardless of gender, sexual orientation, gender identity,
and expression, disability, physical appearance, body size, race, or religion.
We do not tolerate harassment of community members in any form.
In particular, people from underrepresented groups are encouraged to join the community.
20 changes: 20 additions & 0 deletions docs/web/community/coc.md
@@ -0,0 +1,20 @@
# Code of Conduct

Ibis is governed by the
[NumFOCUS code of conduct](https://numfocus.org/code-of-conduct),
which in a short version is:

Be kind to others. Do not insult or put down others. Behave professionally.
Remember that harassment and sexist, racist, or exclusionary jokes are not
appropriate for {{ ibis.project_name }}.

All communication should be appropriate for a professional audience
including people of many different backgrounds. Sexual language and
imagery is not appropriate.

{{ ibis.project_name }} is dedicated to providing a harassment-free community for everyone,
regardless of gender, sexual orientation, gender identity, and expression,
disability, physical appearance, body size, race, or religion. We do not
tolerate harassment of community members in any form.

Thank you for helping make this a welcoming, friendly community for all.
60 changes: 60 additions & 0 deletions docs/web/community/ecosystem.md
@@ -0,0 +1,60 @@
# Ecosystem

## [pandas](https://github.com/pandas-dev/pandas)

[pandas](https://pandas.pydata.org) is a Python package that provides fast,
flexible, and expressive data structures designed to make working with "relational" or
"labeled" data both easy and intuitive. It aims to be the fundamental high-level
building block for doing practical, real world data analysis in Python. Additionally,
it has the broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language. It is already well on its way
towards this goal.

## [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy)

[SQLAlchemy](https://www.sqlalchemy.org/) is the Python SQL toolkit and
Object Relational Mapper that gives application developers the full power and
flexibility of SQL. SQLAlchemy provides a full suite of well known enterprise-level
persistence patterns, designed for efficient and high-performing database access,
adapted into a simple and Pythonic domain language.

## [sql_to_ibis](https://github.com/zbrookle/sql_to_ibis)

[sql_to_ibis](https://github.com/zbrookle/sql_to_ibis) is a Python package that
translates SQL syntax into ibis expressions. This allows users to use one unified SQL
dialect to target many different backends, even those that don't traditionally
support SQL.

A good use case would be ease of migration between databases or backends. Suppose you
were moving from SQLite to MySQL, or from PostgreSQL to BigQuery. These
frameworks all have subtle differences in their SQL dialects, but with sql_to_ibis
those differences are handled automatically by the translation to Ibis.

Another good use case is pandas, which has no SQL support at all for querying a
dataframe. With sql_to_ibis this is made possible.

For example,

```python
import ibis.backends.pandas
import pandas
import sql_to_ibis

df = pandas.DataFrame({"column1": [1, 2, 3], "column2": ["4", "5", "6"]})
ibis_table = ibis.backends.pandas.from_dataframe(
df, name="my_table", client=ibis.backends.pandas.PandasClient({})
)
sql_to_ibis.register_temp_table(ibis_table, "my_table")
sql_to_ibis.query(
"select column1, cast(column2 as integer) + 1 as my_col2 from my_table"
).execute()
```
This would output a dataframe that looks like:

```
| column1 | my_col2 |
|---------|---------|
| 1 | 5 |
| 2 | 6 |
| 3 | 7 |
```
68 changes: 68 additions & 0 deletions docs/web/config.yml
@@ -0,0 +1,68 @@
pysuerga:
extensions:
- pysuerga.contrib.team
markdown_extensions:
- toc
- tables
- fenced_code
- codehilite

ibis:
project_name: Ibis
github_repo_url: ibis-project/ibis

layout:
title: "Ibis: Python data analysis productivity framework"
favicon: /static/img/favicon.ico
stylesheets:
- /static/css/ibis.css
- /static/css/codehilite.css
logo: /static/img/logo_ibis.svg
header_text: Ibis
navbar:
- name: "About us"
sections:
- name: "About Ibis"
target: /about/index.html
- name: "Team"
target: /about/team.html
- name: "Roadmap"
target: /about/roadmap.html
- name: "License"
target: /about/license.html
- name: "Getting started"
target: /getting_started.html
- name: "Documentation"
target: /docs/
- name: "Community"
sections:
- name: "Ask a question (StackOverflow)"
target: https://stackoverflow.com/questions/tagged/ibis
- name: "Chat (Gitter)"
target: https://gitter.im/ibis-dev/Lobby
- name: "Code of conduct"
target: /community/coc.html
- name: "Ecosystem"
target: /community/ecosystem.html
- name: "Contribute"
target: /contribute.html
social_media:
- font_awesome: twitter
url: https://twitter.com/IbisData
- font_awesome: github
url: https://github.com/ibis-project/ibis/
footer_note: "© Copyright 2020, Ibis developers"
google_analytics: ""

team:
- name: "Active maintainers"
kind: github
members:
- jreback
- datapythonista
- name: "Former maintainers"
kind: github
members:
- wesm
- cpcloud
- kszucs
132 changes: 132 additions & 0 deletions docs/web/contribute.md
@@ -0,0 +1,132 @@
# Contributing to Ibis

## Set up a development environment

1. Create a fork of the [Ibis repository](https://github.com/ibis-project/ibis), and clone it.

:::sh
git clone https://github.com/<your-github-username>/ibis


2. [Download](https://docs.conda.io/en/latest/miniconda.html) and install Miniconda
3. Create a Conda environment suitable for ibis development:

:::sh
cd ibis
conda env create


4. Activate the environment

:::sh
conda activate ibis-dev

5. Install your local copy of Ibis into the Conda environment. In the root of the project run:

:::sh
pip install -e .


## Find an issue to work on

If you are working with Ibis and find a bug, or if you are reading the documentation and see something
that is wrong or could be clearer, you can work on that.

Sometimes you may want to contribute to Ibis but don't have anything in mind. In that case,
you can check the GitHub issue tracker for Ibis, and look for issues with the label
[good first issue](https://github.com/ibis-project/ibis/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
Feel free to also help with other issues that don't have the label, but they may be more challenging,
and require knowledge of Ibis internals.

Once you find an issue you want to work on, write a comment with the text `/take`, and GitHub will
assign the issue to you. This way, nobody else will work on it at the same time. If you find an
issue that someone else is assigned to, please contact the assignee to check whether they are still
working on it.


## Working with backends

Ibis comes with several backends. If you want to work with a specific backend, you will have to install
the dependencies for the backend with `conda install -n ibis-dev -c conda-forge --file="ci/deps/<backend>.yml"`.

If you don't have a database for the backend you want to work on, you can check the configuration of the
continuous integration, where Docker images are used for the different backends. This is defined in
`.github/workflows/main.yml`.

## Run the test suite

To run the Ibis test suite, use the following command:

```sh
PYTEST_BACKENDS="sqlite pandas" python -m pytest ibis/tests
```

You can replace `sqlite pandas` with the backend or backends (space-separated) that
you want to test.


## Style and formatting

We use [flake8](http://flake8.pycqa.org/en/latest/),
[black](https://github.com/psf/black) and
[isort](https://github.com/pre-commit/mirrors-isort) to ensure our code
is formatted and linted properly. If you have properly set up your development
environment by running ``make develop``, the pre-commit hooks should check
that your proposed changes continue to conform to our style guide.

We use [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) as
our standard format for docstrings.


## Commit philosophy

We aim to make our individual commits small and tightly focused on the feature
they are implementing. If you find yourself making functional changes to
different areas of the codebase, we prefer you break up your changes into
separate Pull Requests. In general, a philosophy of one Github Issue per
Pull Request is a good rule of thumb, though that isn't always possible.

We avoid merge commits (and in fact they are disabled in the Github repository)
so you may be asked to rebase your changes on top of the latest commits to
master if there have been changes since you last updated a Pull Request.
Rebasing your changes is usually as simple as running
``git pull upstream master --rebase`` and then force-pushing to your branch:
``git push origin <branch-name> -f``.


## Commit/PR messages

Well-structured commit messages allow us to generate comprehensive release notes
and make it very easy to understand what a commit/PR contributes to our
codebase. Commit messages and PR titles should be prefixed with a standard
code that states what kind of change it is. They fall broadly into 3 categories:
``FEAT (feature)``, ``BUG (bug)``, and ``SUPP (support)``. The ``SUPP``
category has some more fine-grained aliases that you can use, such as ``BLD``
(build), ``CI`` (continuous integration), ``DOC`` (documentation), ``TST``
(testing), and ``RLS`` (releases).
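
For example, a pull request that fixes a bug in the pandas backend might be
titled ``BUG: Fix handling of all-null columns in the pandas backend`` (a
hypothetical title, shown only to illustrate the prefix convention).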


## Maintainer's guide

Maintainers generally perform two roles, merging PRs and making official
releases.


### Merging PRs

We have a CLI script that will merge Pull Requests automatically once they have
been reviewed and approved. See the help message in ``dev/merge-pr.py`` for
full details. If you have two-factor authentication turned on in Github, you
will have to generate an application-specific password by following this
[guide](https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line).
You will then use that generated password on the command line for the ``-P``
argument.

Access the [Ibis "Merging PRs" wiki](https://github.com/ibis-project/ibis/wiki/Merging-PRs) page
for more information.


### Releasing

Access the [Ibis "Releasing" wiki](https://github.com/ibis-project/ibis/wiki/Releasing-Ibis) page
for more information.
39 changes: 39 additions & 0 deletions docs/web/getting_started.md
@@ -0,0 +1,39 @@
# Getting started

## Installation instructions

The following steps provide the easiest and recommended way to set up your
environment to use {{ ibis.project_name }}. Other installation options can be found in
the [advanced installation page]({{ base_url}}/docs/getting_started/install.html).

1. Download [Anaconda](https://www.anaconda.com/distribution/) for your operating system and
the latest Python version, run the installer, and follow the steps. Detailed instructions
on how to install Anaconda can be found in the
[Anaconda documentation](https://docs.anaconda.com/anaconda/install/).

2. In the Anaconda prompt (or terminal in Linux or MacOS), install {{ ibis.project_name }}:

:::sh
conda install -c conda-forge ibis-framework

3. In the Anaconda prompt (or terminal in Linux or MacOS), start JupyterLab:
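
    For example, with the standard JupyterLab launch command:

    :::sh
    jupyter lab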

<img class="img-fluid" alt="" src="{{ base_url }}/static/img/install/anaconda_prompt.png"/>

4. In JupyterLab, create a new (Python 3) notebook:

<img class="img-fluid" alt="" src="{{ base_url }}/static/img/install/jupyterlab_home.png"/>

5. In the first cell of the notebook, you can import {{ ibis.project_name }} and check the version with:

:::python
import ibis
ibis.__version__

6. Now you are ready to use {{ ibis.project_name }}, and you can write your code in the next cells.

## Tutorials

You can learn more about {{ ibis.project_name }} in the
[tutorials](https://ibis-project.org/docs/tutorial/index.html),
and more about JupyterLab in the [JupyterLab documentation](https://jupyterlab.readthedocs.io/en/stable/user/interface.html).
80 changes: 80 additions & 0 deletions docs/web/index.md
@@ -0,0 +1,80 @@
<div class="row">
<div class="col">
<section class="jumbotron text-center home-jumbotron">
<p>
Write your analytics code once, run it everywhere.
</p>
</section>
</div>
</div>

## Main features

Ibis provides a standard way to write analytics code that can then be run in
multiple engines.

- **Full coverage of SQL features**: You can code in Ibis anything you can implement in a SQL SELECT
- **Transparent to SQL implementation differences**: Write standard code that translates to any SQL syntax
- **High performance execution**: Execute at the speed of your backend, not your local computer
- **Integration with community data formats and tools** (e.g. pandas, Parquet, Avro...)

## Supported engines

- Standard DBMS: [PostgreSQL](/docs/backends/postgres.html), [MySQL](/docs/backends/mysql.html), [SQLite](/docs/backends/sqlite.html)
- Analytical DBMS: [OmniSciDB](/docs/backends/omnisci.html), [ClickHouse](/docs/backends/clickhouse.html)
- Distributed platforms: [Impala](/docs/backends/impala.html), [Spark](/docs/backends/spark.html), [BigQuery](/docs/backends/bigquery.html)
- In memory execution: [pandas](/docs/backends/pandas.html)

## Example

The following example is all the code you need to connect to a database of
countries and compute the number of citizens per square kilometer in Asia:

```python
>>> import ibis
>>> db = ibis.sqlite.connect('geography.db')
>>> countries = db.table('countries')
>>> asian_countries = countries.filter(countries['continent'] == 'AS')
>>> density_in_asia = asian_countries['population'].sum() / asian_countries['area_km2'].sum()
>>> density_in_asia.execute()
130.7019141926602
```

Learn more about Ibis in [our tutorial](/docs/tutorial/).

## Comparison to other tools

### Why not use [pandas](https://pandas.pydata.org/)?

pandas is great for many use cases. But pandas loads the data into the
memory of the local host, and performs the computations on it.

Ibis, instead, leaves the data in its storage, and performs the computations
there. This means that even if your data is distributed, or requires
GPU-accelerated speed, Ibis code will be able to benefit from your storage
capabilities.

### Why not use SQL?

SQL is widely used and very convenient when writing simple queries. But as
the complexity of operations grows, SQL can become very difficult to deal with.

With Ibis, you can take full advantage of software engineering techniques to
keep your code readable and maintainable, while writing very complex analytics
code.

### Why not use [SQLAlchemy](https://www.sqlalchemy.org/)?

SQLAlchemy is very convenient as an ORM (Object Relational Mapper), providing
a Python interface to SQL databases. Ibis uses SQLAlchemy internally, but aims
to provide a friendlier syntax for analytics code. Ibis is also not limited to
SQL databases: it can connect to distributed platforms and in-memory
representations as well.

### Why not use [Dask](https://dask.org/)?

Dask provides advanced parallelism, and can distribute pandas jobs. Ibis can
process data in a similar way, but across a number of different backends. For
example, given a Spark cluster, Ibis allows you to perform analytics on it
using familiar Python syntax. Ibis plans to add support for a Dask backend
in the future.
69 changes: 69 additions & 0 deletions docs/web/static/css/codehilite.css
@@ -0,0 +1,69 @@
.codehilite .hll { background-color: #ffffcc }
.codehilite { background: #f8f8f8; }
.codehilite .c { color: #408080; font-style: italic } /* Comment */
.codehilite .err { border: 1px solid #FF0000 } /* Error */
.codehilite .k { color: #008000; font-weight: bold } /* Keyword */
.codehilite .o { color: #666666 } /* Operator */
.codehilite .ch { color: #408080; font-style: italic } /* Comment.Hashbang */
.codehilite .cm { color: #408080; font-style: italic } /* Comment.Multiline */
.codehilite .cp { color: #BC7A00 } /* Comment.Preproc */
.codehilite .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */
.codehilite .c1 { color: #408080; font-style: italic } /* Comment.Single */
.codehilite .cs { color: #408080; font-style: italic } /* Comment.Special */
.codehilite .gd { color: #A00000 } /* Generic.Deleted */
.codehilite .ge { font-style: italic } /* Generic.Emph */
.codehilite .gr { color: #FF0000 } /* Generic.Error */
.codehilite .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.codehilite .gi { color: #00A000 } /* Generic.Inserted */
.codehilite .go { color: #888888 } /* Generic.Output */
.codehilite .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
.codehilite .gs { font-weight: bold } /* Generic.Strong */
.codehilite .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.codehilite .gt { color: #0044DD } /* Generic.Traceback */
.codehilite .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
.codehilite .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
.codehilite .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
.codehilite .kp { color: #008000 } /* Keyword.Pseudo */
.codehilite .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
.codehilite .kt { color: #B00040 } /* Keyword.Type */
.codehilite .m { color: #666666 } /* Literal.Number */
.codehilite .s { color: #BA2121 } /* Literal.String */
.codehilite .na { color: #7D9029 } /* Name.Attribute */
.codehilite .nb { color: #008000 } /* Name.Builtin */
.codehilite .nc { color: #0000FF; font-weight: bold } /* Name.Class */
.codehilite .no { color: #880000 } /* Name.Constant */
.codehilite .nd { color: #AA22FF } /* Name.Decorator */
.codehilite .ni { color: #999999; font-weight: bold } /* Name.Entity */
.codehilite .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
.codehilite .nf { color: #0000FF } /* Name.Function */
.codehilite .nl { color: #A0A000 } /* Name.Label */
.codehilite .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
.codehilite .nt { color: #008000; font-weight: bold } /* Name.Tag */
.codehilite .nv { color: #19177C } /* Name.Variable */
.codehilite .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
.codehilite .w { color: #bbbbbb } /* Text.Whitespace */
.codehilite .mb { color: #666666 } /* Literal.Number.Bin */
.codehilite .mf { color: #666666 } /* Literal.Number.Float */
.codehilite .mh { color: #666666 } /* Literal.Number.Hex */
.codehilite .mi { color: #666666 } /* Literal.Number.Integer */
.codehilite .mo { color: #666666 } /* Literal.Number.Oct */
.codehilite .sa { color: #BA2121 } /* Literal.String.Affix */
.codehilite .sb { color: #BA2121 } /* Literal.String.Backtick */
.codehilite .sc { color: #BA2121 } /* Literal.String.Char */
.codehilite .dl { color: #BA2121 } /* Literal.String.Delimiter */
.codehilite .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
.codehilite .s2 { color: #BA2121 } /* Literal.String.Double */
.codehilite .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
.codehilite .sh { color: #BA2121 } /* Literal.String.Heredoc */
.codehilite .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
.codehilite .sx { color: #008000 } /* Literal.String.Other */
.codehilite .sr { color: #BB6688 } /* Literal.String.Regex */
.codehilite .s1 { color: #BA2121 } /* Literal.String.Single */
.codehilite .ss { color: #19177C } /* Literal.String.Symbol */
.codehilite .bp { color: #008000 } /* Name.Builtin.Pseudo */
.codehilite .fm { color: #0000FF } /* Name.Function.Magic */
.codehilite .vc { color: #19177C } /* Name.Variable.Class */
.codehilite .vg { color: #19177C } /* Name.Variable.Global */
.codehilite .vi { color: #19177C } /* Name.Variable.Instance */
.codehilite .vm { color: #19177C } /* Name.Variable.Magic */
.codehilite .il { color: #666666 } /* Literal.Number.Integer.Long */
104 changes: 104 additions & 0 deletions docs/web/static/css/ibis.css
@@ -0,0 +1,104 @@
body {
padding-top: 5em;
color: #444;
}
h1 {
font-size: 2.4rem;
font-weight: 700;
color: #2980B9;
}
h2 {
font-size: 1.45rem;
font-weight: 700;
color: black;
}
h3 {
font-size: 1.1rem;
font-weight: 600;
color: #444;
}
h3 a {
color: #446;
}
h4 {
font-size: 1rem;
font-weight: 500;
color: #444;
}
a {
color: #130654;
}
pre {
white-space: pre;
padding: 10px;
background-color: #fafafa;
color: #222;
line-height: 1.2em;
border: 1px solid #c9c9c9;
margin: 1.5em 0;
box-shadow: 1px 1px 1px #d8d8d8
}
blockquote, blockquote p {
color: #888;
}
.header-title {
font-size: 1.75rem;
font-weight: bold;
vertical-align: middle;
}
.blue {
color: #150458;
}
.pink {
color: #e70488;
}
.fab {
font-size: 1.2rem;
color: #666;
}
.fab:hover {
color: #130654;
}
a.navbar-brand img {
max-height: 50px;
height: 3rem;
margin-right: 0.75rem;
}
div.card {
margin: 0 0 .2em .2em !important;
}
div.card .card-title {
font-weight: 500;
color: #130654;
}
.book {
padding: 0 20%;
}
.bg-dark {
background-color: #2980B9 !important;
}
.navbar-dark .navbar-nav .nav-link {
color: rgba(255, 255, 255, .9);
}
.navbar-dark .navbar-nav .nav-link:hover {
color: white;
}
table.logo td {
text-align: center;
}
table.logo img {
height: 4rem;
}
.home-jumbotron {
background-image: url(/static/img/ibis_sky.png);
background-size: contain;
background-repeat: no-repeat;
background-color: #d8e8ff;
background-position: center;
min-height: 9rem;
}
.home-jumbotron p {
position: absolute;
bottom: 2rem;
text-shadow: 1px 1px #ffffff;
}
Binary file added docs/web/static/img/favicon.ico
Binary file not shown.
Binary file added docs/web/static/img/ibis_sky.png
Binary file not shown.
Binary file added docs/web/static/img/install/anaconda_prompt.png
Binary file not shown.
Binary file added docs/web/static/img/install/jupyterlab_home.png
Binary file not shown.
128 changes: 128 additions & 0 deletions docs/web/static/img/logo_ibis.svg
59 changes: 59 additions & 0 deletions environment.yml
@@ -0,0 +1,59 @@
# This file should have all the dependencies for development, excluding those specific to the backends.
name: ibis-dev
channels:
- conda-forge
dependencies:
# Ibis hard dependencies
- multipledispatch>=0.6.0
- numpy>=1.19
- pandas>=0.25 # XXX pymapd does not support pandas 1.0
- pytz>=2020.1
- regex>=2020.7
- toolz>=0.10

# Ibis soft dependencies
# TODO This section is probably not very accurate right now (some dependencies should probably be in the backends files)
- sqlalchemy>=1.3
- graphviz>=2.38
- openjdk=8
- pytables>=3.6
- python-graphviz>=0.14
- python-hdfs>=2.0.16 # XXX this version can probably be increased

# Dev tools
- asv>=0.4.2
- black=19.10b0
- click>=7.1 # few scripts in ci/
- conda-build # feedstock
- cmake>=3.17
- flake8>=3.8
- isort>=5.3
- jinja2>=2.11 # feedstock
- mypy>=0.782
- plumbum>=1.6 # few scripts in ci/ and dev/
- pre-commit>=2.6
- pydocstyle>=4.0
- pygit2>=1.2 # dev/genrelease.py
- pytest>=5.4
- pytest-cov>=2.10
- pytest-mock>=3.1
- ruamel.yaml>=0.16 # feedstock
- libiconv>=1.15 # bug in repo2docker, see https://github.com/jupyter/repo2docker/issues/758
- xorg-libxpm>=3.5
- xorg-libxrender>=0.9

# Docs
- ipython>=7.17
- jupyter>=1.0
- matplotlib>=2 # XXX test if this can be bumped
- nbconvert
- nbsphinx>=0.7
- nomkl
- pyarrow>=0.12 # must pin again otherwise strange things happen
- semantic_version=2.6 # https://github.com/ibis-project/ibis/issues/2027
- sphinx>=2.0.1
- sphinx-releases
- sphinx_rtd_theme>=0.5
- pip
- pip:
- pysuerga
126 changes: 33 additions & 93 deletions ibis/__init__.py
@@ -1,144 +1,84 @@
"""Initialize Ibis module."""
import warnings
from contextlib import suppress

import ibis.config_init # noqa: F401
import ibis.expr.api as api # noqa: F401
import ibis.expr.types as ir # noqa: F401
import ibis.util as util # noqa: F401

# pandas backend is mandatory
import ibis.pandas.api as pandas # noqa: F401
import ibis.util as util # noqa: F401
from ibis.common.exceptions import IbisError
from ibis.backends import pandas # noqa: F401
from ibis.common.exceptions import IbisError # noqa: F401
from ibis.config import options # noqa: F401
from ibis.expr.api import * # noqa: F401,F403
from ibis.filesystems import HDFS, WebHDFS # noqa: F401

from ._version import get_versions # noqa: E402

with suppress(ImportError):
# pip install ibis-framework[csv]
import ibis.file.csv as csv # noqa: F401
from ibis.backends import csv # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[parquet]
import ibis.file.parquet as parquet # noqa: F401
from ibis.backends import parquet # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[hdf5]
import ibis.file.hdf5 as hdf5 # noqa: F401
from ibis.backends import hdf5 # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[impala]
import ibis.impala.api as impala # noqa: F401
from ibis.backends import impala # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[sqlite]
import ibis.sql.sqlite.api as sqlite # noqa: F401
from ibis.backends import sqlite # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[postgres]
import ibis.sql.postgres.api as postgres # noqa: F401
from ibis.backends import postgres # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[mysql]
import ibis.sql.mysql.api as mysql # noqa: F401
from ibis.backends import mysql # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[clickhouse]
import ibis.clickhouse.api as clickhouse # noqa: F401
from ibis.backends import clickhouse # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[bigquery]
import ibis.bigquery.api as bigquery # noqa: F401
from ibis.backends import bigquery # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[omniscidb]
import ibis.omniscidb.api as omniscidb # noqa: F401
from ibis.backends import omniscidb # noqa: F401

with suppress(ImportError):
# pip install ibis-framework[spark]
import ibis.spark.api as spark # noqa: F401
from ibis.backends import spark # noqa: F401

with suppress(ImportError):
import ibis.pyspark.api as pyspark # noqa: F401


def hdfs_connect(
host='localhost',
port=50070,
protocol='webhdfs',
use_https='default',
auth_mechanism='NOSASL',
verify=True,
session=None,
**kwds,
):
"""Connect to HDFS.
Parameters
----------
host : str
Host name of the HDFS NameNode
port : int
NameNode's WebHDFS port
protocol : str,
The protocol used to communicate with HDFS. The only valid value is
``'webhdfs'``.
use_https : bool
Connect to WebHDFS with HTTPS, otherwise plain HTTP. For secure
authentication, the default for this is True, otherwise False.
auth_mechanism : str
Set to NOSASL or PLAIN for non-secure clusters.
Set to GSSAPI or LDAP for Kerberos-secured clusters.
verify : bool
Set to :data:`False` to turn off verifying SSL certificates.
session : Optional[requests.Session]
A custom :class:`requests.Session` object.
Notes
-----
Other keywords are forwarded to HDFS library classes.
Returns
-------
WebHDFS
"""
import requests

if session is None:
session = requests.Session()
session.verify = verify
if auth_mechanism in ('GSSAPI', 'LDAP'):
if use_https == 'default':
prefix = 'https'
else:
prefix = 'https' if use_https else 'http'
try:
import requests_kerberos # noqa: F401
except ImportError:
raise IbisError(
"Unable to import requests-kerberos, which is required for "
"Kerberos HDFS support. Install it by executing `pip install "
"requests-kerberos` or `pip install hdfs[kerberos]`."
)
from hdfs.ext.kerberos import KerberosClient

# note SSL
url = '{0}://{1}:{2}'.format(prefix, host, port)
kwds.setdefault('mutual_auth', 'OPTIONAL')
hdfs_client = KerberosClient(url, session=session, **kwds)
else:
if use_https == 'default':
prefix = 'http'
else:
prefix = 'https' if use_https else 'http'
from hdfs.client import InsecureClient

url = '{}://{}:{}'.format(prefix, host, port)
hdfs_client = InsecureClient(url, session=session, **kwds)
return WebHDFS(hdfs_client)
from ibis.backends import pyspark # noqa: F401


__version__ = get_versions()['version']
del get_versions


def __getattr__(name):
if name in ('HDFS', 'WebHDFS', 'hdfs_connect'):
warnings.warn(
f'`ibis.{name}` has been deprecated and will be removed in a '
f'future version, use `ibis.impala.{name}` instead',
FutureWarning,
stacklevel=2,
)
if 'impala' in globals():
return getattr(impala, name)
else:
raise AttributeError(
f'`ibis.{name}` requires impala backend to be installed'
)
raise AttributeError
File renamed without changes.
@@ -2,7 +2,7 @@

import ibis
import ibis.expr.types as ir
from ibis.pandas.core import execute_and_reset
from ibis.backends.pandas.core import execute_and_reset


class FileClient(ibis.client.Client):
File renamed without changes.
File renamed without changes.
1,614 changes: 727 additions & 887 deletions ibis/impala/compiler.py → ibis/backends/base_sql/__init__.py

Large diffs are not rendered by default.

194 changes: 194 additions & 0 deletions ibis/backends/base_sql/compiler.py
@@ -0,0 +1,194 @@
from operator import add, mul, sub

import ibis.backends.base_sqlalchemy.compiler as comp
import ibis.common.exceptions as com
import ibis.expr.datatypes as dt
import ibis.expr.operations as ops
import ibis.expr.types as ir
from ibis.backends.base_sql import (
binary_infix_ops,
operation_registry,
quote_identifier,
)


def build_ast(expr, context):
assert context is not None, 'context is None'
builder = BaseQueryBuilder(expr, context=context)
return builder.get_result()


def _get_query(expr, context):
assert context is not None, 'context is None'
ast = build_ast(expr, context)
query = ast.queries[0]

return query


def to_sql(expr, context=None):
if context is None:
context = BaseDialect.make_context()
assert context is not None, 'context is None'
query = _get_query(expr, context)
return query.compile()


# ----------------------------------------------------------------------
# Select compilation


class BaseSelectBuilder(comp.SelectBuilder):
@property
def _select_class(self):
return BaseSelect


class BaseQueryBuilder(comp.QueryBuilder):

select_builder = BaseSelectBuilder


class BaseContext(comp.QueryContext):
def _to_sql(self, expr, ctx):
return to_sql(expr, ctx)


class BaseSelect(comp.Select):

"""
A SELECT statement which, after execution, might yield back to the user a
table, array/list, or scalar value, depending on the expression that
generated it
"""

@property
def translator(self):
return BaseExprTranslator

@property
def table_set_formatter(self):
return BaseTableSetFormatter


class BaseTableSetFormatter(comp.TableSetFormatter):

_join_names = {
ops.InnerJoin: 'INNER JOIN',
ops.LeftJoin: 'LEFT OUTER JOIN',
ops.RightJoin: 'RIGHT OUTER JOIN',
ops.OuterJoin: 'FULL OUTER JOIN',
ops.LeftAntiJoin: 'LEFT ANTI JOIN',
ops.LeftSemiJoin: 'LEFT SEMI JOIN',
ops.CrossJoin: 'CROSS JOIN',
}

def _get_join_type(self, op):
jname = self._join_names[type(op)]

return jname

def _quote_identifier(self, name):
return quote_identifier(name)


_map_interval_to_microseconds = dict(
W=604800000000,
D=86400000000,
h=3600000000,
m=60000000,
s=1000000,
ms=1000,
us=1,
ns=0.001,
)


_map_interval_op_to_op = {
# Literal Intervals have two args, i.e.
# Literal(1, Interval(value_type=int8, unit='D', nullable=True))
# Parse both args and multipy 1 * _map_interval_to_microseconds['D']
ops.Literal: mul,
ops.IntervalMultiply: mul,
ops.IntervalAdd: add,
ops.IntervalSubtract: sub,
}


def _replace_interval_with_scalar(expr):
"""
Good old Depth-First Search to identify the Interval and IntervalValue
components of the expression and return a comparable scalar expression.

Parameters
----------
expr : float or expression of intervals
For example, ``ibis.interval(days=1) + ibis.interval(hours=5)``

Returns
-------
preceding : float or ir.FloatingScalar, depending upon the expr
"""
try:
expr_op = expr.op()
except AttributeError:
expr_op = None

if not isinstance(expr, (dt.Interval, ir.IntervalValue)):
# Literal expressions have op method but native types do not.
if isinstance(expr_op, ops.Literal):
return expr_op.value
else:
return expr
elif isinstance(expr, dt.Interval):
try:
microseconds = _map_interval_to_microseconds[expr.unit]
return microseconds
except KeyError:
raise ValueError(
"Expected preceding values of week(), "
+ "day(), hour(), minute(), second(), millisecond(), "
+ "microseconds(), nanoseconds(); got {}".format(expr)
)
elif expr_op.args and isinstance(expr, ir.IntervalValue):
if len(expr_op.args) > 2:
raise com.NotImplementedError(
"'preceding' argument cannot be parsed."
)
left_arg = _replace_interval_with_scalar(expr_op.args[0])
right_arg = _replace_interval_with_scalar(expr_op.args[1])
method = _map_interval_op_to_op[type(expr_op)]
return method(left_arg, right_arg)
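# A rough trace of the DFS above on the docstring's own example; the
# microsecond constants come from _map_interval_to_microseconds (a sketch,
# not a test):

import ibis

expr = ibis.interval(days=1) + ibis.interval(hours=5)
# Literal(1, Interval(unit='D'))  ->  mul(1, 86_400_000_000)
# Literal(5, Interval(unit='h'))  ->  mul(5, 3_600_000_000)
# IntervalAdd                     ->  add(86_400_000_000, 18_000_000_000)
assert _replace_interval_with_scalar(expr) == 104_400_000_000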


_operation_registry = {**operation_registry, **binary_infix_ops}


# TODO move the name method to comp.ExprTranslator and use that instead
class BaseExprTranslator(comp.ExprTranslator):
"""Base expression translator."""

_registry = _operation_registry
context_class = BaseContext

@staticmethod
def _name_expr(formatted_expr, quoted_name):
return '{} AS {}'.format(formatted_expr, quoted_name)

def name(self, translated, name, force=True):
"""Return expression with its identifier."""
return self._name_expr(translated, quote_identifier(name, force=force))


class BaseDialect(comp.Dialect):
translator = BaseExprTranslator


compiles = BaseExprTranslator.compiles
rewrites = BaseExprTranslator.rewrites


@rewrites(ops.FloorDivide)
def _floor_divide(expr):
left, right = expr.op().args
return left.div(right).floor()
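# The rewrite means floor division never reaches the operation registry: it
# is turned into divide-then-floor first. A hypothetical round trip (table
# and column names are illustrative):

import ibis

t = ibis.table([('a', 'int64'), ('b', 'int64')], name='t')
print(to_sql((t.a // t.b).name('q')))
# roughly: SELECT floor(`a` / `b`) AS `q` FROM t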
629 changes: 150 additions & 479 deletions ibis/impala/ddl.py → ibis/backends/base_sql/ddl.py

Large diffs are not rendered by default.

@@ -12,9 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# Impala identifiers
# Base identifiers

impala_identifiers = [
base_identifiers = [
'add',
'aggregate',
'all',
File renamed without changes.
88 changes: 83 additions & 5 deletions ibis/sql/alchemy.py → ibis/backends/base_sqlalchemy/alchemy.py
@@ -20,11 +20,12 @@
import ibis.expr.schema as sch
import ibis.expr.types as ir
import ibis.expr.window as W
import ibis.sql.compiler as comp
import ibis.sql.transforms as transforms
import ibis.util as util
from ibis.client import Database, Query, SQLClient
from ibis.sql.compiler import Dialect, Select, TableSetFormatter, Union

from . import compiler as comp
from . import transforms
from .compiler import Dialect, Select, TableSetFormatter, Union

geospatial_supported = False
try:
@@ -158,6 +159,16 @@ def sa_uuid(_, satype, nullable=True):
return dt.UUID(nullable=nullable)


@dt.dtype.register(PostgreSQLDialect, sa.dialects.postgresql.MACADDR)
def sa_macaddr(_, satype, nullable=True):
return dt.MACADDR(nullable=nullable)


@dt.dtype.register(PostgreSQLDialect, sa.dialects.postgresql.INET)
def sa_inet(_, satype, nullable=True):
return dt.INET(nullable=nullable)


@dt.dtype.register(PostgreSQLDialect, sa.dialects.postgresql.JSON)
def sa_json(_, satype, nullable=True):
return dt.JSON(nullable=nullable)
@@ -797,6 +808,7 @@ def _ntile(t, expr):
ops.GeoDisjoint: fixed_arity(sa.func.ST_Disjoint, 2),
ops.GeoDistance: fixed_arity(sa.func.ST_Distance, 2),
ops.GeoDWithin: fixed_arity(sa.func.ST_DWithin, 3),
ops.GeoEndPoint: unary(sa.func.ST_EndPoint),
ops.GeoEnvelope: unary(sa.func.ST_Envelope),
ops.GeoEquals: fixed_arity(sa.func.ST_Equals, 2),
ops.GeoGeometryN: fixed_arity(sa.func.ST_GeometryN, 2),
@@ -815,6 +827,7 @@ def _ntile(t, expr):
ops.GeoSimplify: fixed_arity(sa.func.ST_Simplify, 3),
ops.GeoSRID: unary(sa.func.ST_SRID),
ops.GeoSetSRID: fixed_arity(sa.func.ST_SetSRID, 2),
ops.GeoStartPoint: unary(sa.func.ST_StartPoint),
ops.GeoTouches: fixed_arity(sa.func.ST_Touches, 2),
ops.GeoTransform: fixed_arity(sa.func.ST_Transform, 2),
ops.GeoUnaryUnion: unary(sa.func.ST_Union),
@@ -1077,6 +1090,7 @@ class AlchemyClient(SQLClient):

dialect = AlchemyDialect
query_class = AlchemyQuery
has_attachment = False

def __init__(self, con: sa.engine.Engine) -> None:
super().__init__()
@@ -1099,7 +1113,11 @@ def begin(self):

@invalidates_reflection_cache
def create_table(self, name, expr=None, schema=None, database=None):
if database is not None and database != self.engine.url.database:
if database == self.database_name:
# avoid fully qualified name
database = None

if database is not None:
raise NotImplementedError(
'Creating tables from a different database is not yet '
'implemented'
@@ -1150,7 +1168,11 @@ def drop_table(
database: Optional[str] = None,
force: bool = False,
) -> None:
if database is not None and database != self.con.url.database:
if database == self.database_name:
# avoid fully qualified name
database = None

if database is not None:
raise NotImplementedError(
'Dropping tables from a different database is not yet '
'implemented'
@@ -1172,6 +1194,54 @@ def drop_table(
except KeyError: # schemas won't be cached if created with raw_sql
pass

def load_data(
self,
table_name: str,
data: pd.DataFrame,
database: str = None,
if_exists: str = 'fail',
):
"""
Load data from a dataframe to the backend.

Parameters
----------
table_name : string
data : pandas.DataFrame
database : string, optional
if_exists : string, optional, default 'fail'
The values available are: {'fail', 'replace', 'append'}

Raises
------
NotImplementedError
Loading data to a table from a different database is not
yet implemented
"""
if database == self.database_name:
# avoid fully qualified name
database = None

if database is not None:
raise NotImplementedError(
'Loading data to a table from a different database is not '
'yet implemented'
)

params = {}
if self.has_attachment:
# for databases that use an attachment
# see: https://github.com/ibis-project/ibis/issues/1930
params['schema'] = self.database_name

data.to_sql(
table_name,
con=self.con,
index=False,
if_exists=if_exists,
**params,
)
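# A hypothetical call, assuming `con` is a connected client that inherits
# from AlchemyClient:
#
#     import pandas as pd
#
#     df = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
#     con.load_data('my_table', df, if_exists='replace')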

def truncate_table(
self, table_name: str, database: Optional[str] = None
) -> None:
@@ -1218,6 +1288,14 @@ def raw_sql(self, query: str, results: bool = False):
def _build_ast(self, expr, context):
return build_ast(expr, context)

def _log(self, sql):
try:
query_str = str(sql)
except sa.exc.UnsupportedCompilationError:
pass
else:
util.log(query_str)

def _get_sqla_table(self, name, schema=None, autoload=True):
return sa.Table(name, self.meta, schema=schema, autoload=autoload)

108 changes: 97 additions & 11 deletions ibis/sql/compiler.py → ibis/backends/base_sqlalchemy/compiler.py
@@ -17,9 +17,10 @@
import ibis.expr.format as fmt
import ibis.expr.operations as ops
import ibis.expr.types as ir
import ibis.sql.transforms as transforms
import ibis.util as util

from . import transforms


class DML(abc.ABC):
@abc.abstractmethod
@@ -697,6 +698,18 @@ def visit_Union(self, expr):
self.visit(op.right)
self.observe(expr)

def visit_Intersection(self, expr):
op = expr.op()
self.visit(op.left)
self.visit(op.right)
self.observe(expr)

def visit_Difference(self, expr):
op = expr.op()
self.visit(op.left)
self.visit(op.right)
self.observe(expr)

def visit_MaterializedJoin(self, expr):
self.visit(expr.op().join)
self.observe(expr)
@@ -933,11 +946,10 @@ def column_handler(results):
)


class Union(DML):
def __init__(self, tables, expr, context, distincts):
class SetOp(DML):
def __init__(self, tables, expr, context):
self.context = context
self.tables = tables
self.distincts = distincts
self.table_set = expr
self.filters = []

Expand All @@ -964,9 +976,8 @@ def format_relation(self, expr):
return 'SELECT *\nFROM {}'.format(ref)
return self.context.get_compiled_expr(expr)

@staticmethod
def keyword(distinct):
return 'UNION' if distinct else 'UNION ALL'
def _get_keyword_list(self):
raise NotImplementedError("Need objects to interleave")

def compile(self):
self._extract_subqueries()
@@ -978,20 +989,41 @@ def compile(self):
if extracted:
buf.append('WITH {}'.format(extracted))

# interleave the correct set-operation keyword for the backend in between
# the formatted relations
buf.extend(
toolz.interleave(
(
map(self.format_relation, self.tables),
map(self.keyword, self.distincts),
self._get_keyword_list(),
)
)
)
return '\n'.join(buf)


def flatten_union(table):
class Union(SetOp):
def __init__(self, tables, expr, context, distincts):
super().__init__(tables, expr, context)
self.distincts = distincts

@staticmethod
def keyword(distinct):
return 'UNION' if distinct else 'UNION ALL'

def _get_keyword_list(self):
return map(self.keyword, self.distincts)


class Intersection(SetOp):
def _get_keyword_list(self):
return ["INTERSECT" for _ in range(len(self.tables) - 1)]


class Difference(SetOp):
def _get_keyword_list(self):
return ["EXCEPT"] * (len(self.tables) - 1)


def flatten_union(table: ir.TableExpr):
"""Extract all union queries from `table`.
Parameters
@@ -1010,10 +1042,46 @@ def flatten_union(table):
return [table]


def flatten_intersection(table: ir.TableExpr):
"""Extract all intersection queries from `table`.
Parameters
----------
table : TableExpr
Returns
-------
Iterable[Union[TableExpr]]
"""
op = table.op()
if isinstance(op, ops.Intersection):
return toolz.concatv(flatten_union(op.left), flatten_union(op.right))
return [table]


def flatten_difference(table: ir.TableExpr):
"""Extract all intersection queries from `table`.
Parameters
----------
table : TableExpr
Returns
-------
Iterable[Union[TableExpr]]
"""
op = table.op()
if isinstance(op, ops.Difference):
return toolz.concatv(flatten_union(op.left), flatten_union(op.right))
return [table]
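# A sketch of how the flatteners linearize a nested set operation;
# hypothetical tables, and it assumes the intersect expression API is
# available (flatten_union and flatten_difference behave analogously):

import ibis

t1 = ibis.table([('a', 'int64')], name='t1')
t2 = ibis.table([('a', 'int64')], name='t2')
t3 = ibis.table([('a', 'int64')], name='t3')

parts = list(flatten_intersection(t1.intersect(t2).intersect(t3)))
assert len(parts) == 3  # [t1, t2, t3], later joined by two INTERSECT keywords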


class QueryBuilder:

select_builder = SelectBuilder
union_class = Union
intersect_class = Intersection
difference_class = Difference

def __init__(self, expr, context):
self.expr = expr
@@ -1036,6 +1104,10 @@ def get_result(self):
# to building the result set-generating statements.
if isinstance(op, ops.Union):
query = self._make_union()
elif isinstance(op, ops.Intersection):
query = self._make_intersect()
elif isinstance(op, ops.Difference):
query = self._make_difference()
else:
query = self._make_select()

@@ -1066,6 +1138,20 @@ def _make_union(self):
table_exprs, self.expr, distincts=distincts, context=self.context
)

def _make_intersect(self):
# flatten intersections so that we can codegen them all at once
table_exprs = list(flatten_intersection(self.expr))
return self.intersect_class(
table_exprs, self.expr, context=self.context
)

def _make_difference(self):
# flatten differences so that we can codegen them all at once
table_exprs = list(flatten_difference(self.expr))
return self.difference_class(
table_exprs, self.expr, context=self.context
)

def _make_select(self):
builder = self.select_builder(self.expr, self.context)
return builder.get_result()
Expand Down
File renamed without changes.
17 changes: 12 additions & 5 deletions ibis/bigquery/api.py → ibis/backends/bigquery/__init__.py
@@ -7,12 +7,13 @@
import pydata_google_auth

import ibis.common.exceptions as com
from ibis.bigquery.client import BigQueryClient
from ibis.bigquery.compiler import dialect
from ibis.config import options # noqa: F401

from .client import BigQueryClient
from .compiler import dialect

try:
from ibis.bigquery.udf import udf # noqa: F401
from .udf import udf
except ImportError:
pass

@@ -32,7 +33,7 @@ def compile(expr, params=None):
ibis.expr.types.Expr.compile
"""
from ibis.bigquery.compiler import to_sql
from .compiler import to_sql

return to_sql(expr, dialect.make_context(params=params))

@@ -57,6 +58,7 @@ def connect(
project_id: Optional[str] = None,
dataset_id: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
application_name: Optional[str] = None,
) -> BigQueryClient:
"""Create a BigQueryClient for use with Ibis.
@@ -68,6 +70,8 @@ def connect(
A dataset id that lives inside of the project indicated by
`project_id`.
credentials : google.auth.credentials.Credentials
application_name : str
A string identifying your application to Google API endpoints.
Returns
-------
@@ -86,5 +90,8 @@
)

return BigQueryClient(
project_id, dataset_id=dataset_id, credentials=credentials
project_id,
dataset_id=dataset_id,
credentials=credentials,
application_name=application_name,
)
40 changes: 31 additions & 9 deletions ibis/bigquery/client.py → ibis/backends/bigquery/client.py
@@ -7,6 +7,7 @@
import google.cloud.bigquery as bq
import pandas as pd
import regex as re
from google.api_core.client_info import ClientInfo
from google.api_core.exceptions import NotFound
from multipledispatch import Dispatcher
from pkg_resources import parse_version
@@ -18,10 +19,11 @@
import ibis.expr.operations as ops
import ibis.expr.schema as sch
import ibis.expr.types as ir
from ibis.bigquery import compiler as comp
from ibis.bigquery.datatypes import ibis_type_to_bigquery_type
from ibis.client import Database, Query, SQLClient

from . import compiler as comp
from .datatypes import ibis_type_to_bigquery_type

NATIVE_PARTITION_COL = '_PARTITIONTIME'


@@ -47,6 +49,19 @@
}


_USER_AGENT_DEFAULT_TEMPLATE = 'ibis/{}'


def _create_client_info(application_name):
user_agent = []

if application_name:
user_agent.append(application_name)

user_agent.append(_USER_AGENT_DEFAULT_TEMPLATE.format(ibis.__version__))
return ClientInfo(user_agent=" ".join(user_agent))
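# A sketch of the resulting user agent; the ibis version is whatever is
# installed, and google-api-core appends its own tokens afterwards:

info = _create_client_info('my-great-app/0.7.0')
assert info.to_user_agent().startswith('my-great-app/0.7.0 ibis/')

assert _create_client_info(None).to_user_agent().startswith('ibis/')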


@dt.dtype.register(bq.schema.SchemaField)
def bigquery_field_to_ibis_dtype(field):
"""Convert BigQuery `field` to an ibis type."""
@@ -364,7 +379,13 @@ class BigQueryClient(SQLClient):
table_class = BigQueryTable
dialect = comp.BigQueryDialect

def __init__(self, project_id, dataset_id=None, credentials=None):
def __init__(
self,
project_id,
dataset_id=None,
credentials=None,
application_name=None,
):
"""Construct a BigQueryClient.
Parameters
@@ -374,6 +395,8 @@ def __init__(self, project_id, dataset_id=None, credentials=None):
dataset_id : Optional[str]
A ``<project_id>.<dataset_id>`` string or just a dataset name
credentials : google.auth.credentials.Credentials
application_name : str
A string identifying your application to Google API endpoints.
"""
(
@@ -382,7 +405,9 @@ def __init__(self, project_id, dataset_id=None, credentials=None):
self.dataset,
) = parse_project_and_dataset(project_id, dataset_id)
self.client = bq.Client(
project=self.data_project, credentials=credentials
project=self.data_project,
credentials=credentials,
client_info=_create_client_info(application_name),
)

def _parse_project_and_dataset(self, dataset):
@@ -414,11 +439,8 @@ def _build_ast(self, expr, context):
result = comp.build_ast(expr, context)
return result

def _execute_query(self, dml):
query = self.query_class(
self, dml, query_parameters=dml.context.params
)
return query.execute()
def _get_query(self, dml, **kwargs):
return self.query_class(self, dml, query_parameters=dml.context.params)

def _fully_qualified_name(self, name, database):
project, dataset = self._parse_project_and_dataset(database)
67 changes: 47 additions & 20 deletions ibis/bigquery/compiler.py → ibis/backends/bigquery/compiler.py
@@ -1,3 +1,4 @@
import base64
import datetime
from functools import partial

Expand All @@ -7,22 +8,22 @@
from multipledispatch import Dispatcher

import ibis
import ibis.backends.base_sqlalchemy.compiler as comp
import ibis.common.exceptions as com
import ibis.expr.datatypes as dt
import ibis.expr.lineage as lin
import ibis.expr.operations as ops
import ibis.expr.types as ir
import ibis.sql.compiler as comp
from ibis.bigquery.datatypes import ibis_type_to_bigquery_type
from ibis.impala import compiler as impala_compiler
from ibis.impala.compiler import (
ImpalaSelect,
ImpalaTableSetFormatter,
_reduction,
fixed_arity,
unary,
from ibis.backends import base_sql
from ibis.backends.base_sql import fixed_arity, literal, reduction, unary
from ibis.backends.base_sql.compiler import (
BaseExprTranslator,
BaseSelect,
BaseTableSetFormatter,
)

from .datatypes import ibis_type_to_bigquery_type


class BigQueryUDFNode(ops.ValueOp):
pass
@@ -95,7 +96,10 @@ def _extract_field(sql_attr):
def extract_field_formatter(translator, expr):
op = expr.op()
arg = translator.translate(op.args[0])
return 'EXTRACT({} from {})'.format(sql_attr, arg)
if sql_attr == 'epochseconds':
return f'UNIX_SECONDS({arg})'
else:
return f'EXTRACT({sql_attr} from {arg})'

return extract_field_formatter
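# Both paths through the formatter, compiled against a hypothetical table
# (output abridged):

import ibis

t = ibis.table([('ts', 'timestamp')], name='t')
ibis.bigquery.compile(t.ts.year())           # ... EXTRACT(year from `ts`) ...
ibis.bigquery.compile(t.ts.epoch_seconds())  # ... UNIX_SECONDS(`ts`) ...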

@@ -140,6 +144,18 @@ def _array_index(translator, expr):
)


def _hash(translator, expr):
op = expr.op()
arg, how = op.args

arg_formatted = translator.translate(arg)

if how == 'farm_fingerprint':
return f'farm_fingerprint({arg_formatted})'
else:
raise NotImplementedError(how)
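# A sketch, assuming the expression API exposes `how` on hash():

import ibis

t = ibis.table([('s', 'string')], name='t')
t.s.hash(how='farm_fingerprint')  # compiles to farm_fingerprint(`s`)
# any other `how` raises NotImplementedError at compile time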


def _string_find(translator, expr):
haystack, needle, start, end = expr.op().args

@@ -249,9 +265,13 @@ def _literal(translator, expr):
elif isinstance(expr, ir.TimeScalar):
# TODO: define extractors on TimeValue expressions
return "TIME '{}'".format(value)
elif isinstance(expr, ir.BinaryScalar):
return "FROM_BASE64('{}')".format(
base64.b64encode(value).decode(encoding="utf-8")
)

try:
return impala_compiler._literal(translator, expr)
return literal(translator, expr)
except NotImplementedError:
if isinstance(expr, ir.ArrayValue):
return _array_literal_format(expr)
@@ -333,16 +353,21 @@ def _formatter(translator, expr):
}


_operation_registry = impala_compiler._operation_registry.copy()
_operation_registry = {
**base_sql.operation_registry,
}
_operation_registry.update(
{
ops.ExtractYear: _extract_field('year'),
ops.ExtractQuarter: _extract_field('quarter'),
ops.ExtractMonth: _extract_field('month'),
ops.ExtractDay: _extract_field('day'),
ops.ExtractHour: _extract_field('hour'),
ops.ExtractMinute: _extract_field('minute'),
ops.ExtractSecond: _extract_field('second'),
ops.ExtractMillisecond: _extract_field('millisecond'),
ops.ExtractEpochSeconds: _extract_field('epochseconds'),
ops.Hash: _hash,
ops.StringReplace: fixed_arity('REPLACE', 3),
ops.StringSplit: fixed_arity('SPLIT', 2),
ops.StringConcat: _string_concat,
@@ -354,15 +379,15 @@ def _formatter(translator, expr):
ops.RegexSearch: _regex_search,
ops.RegexExtract: _regex_extract,
ops.RegexReplace: _regex_replace,
ops.GroupConcat: _reduction('STRING_AGG'),
ops.GroupConcat: reduction('STRING_AGG'),
ops.IfNull: fixed_arity('IFNULL', 2),
ops.Cast: _cast,
ops.StructField: _struct_field,
ops.ArrayCollect: unary('ARRAY_AGG'),
ops.ArrayConcat: _array_concat,
ops.ArrayIndex: _array_index,
ops.ArrayLength: unary('ARRAY_LENGTH'),
ops.HLLCardinality: _reduction('APPROX_COUNT_DISTINCT'),
ops.HLLCardinality: reduction('APPROX_COUNT_DISTINCT'),
ops.Log: _log,
ops.Sign: unary('SIGN'),
ops.Modulus: fixed_arity('MOD', 2),
@@ -403,9 +428,9 @@ def _formatter(translator, expr):
}


class BigQueryExprTranslator(impala_compiler.ImpalaExprTranslator):
class BigQueryExprTranslator(BaseExprTranslator):
_registry = _operation_registry
_rewrites = impala_compiler.ImpalaExprTranslator._rewrites.copy()
_rewrites = BaseExprTranslator._rewrites.copy()

context_class = BigQueryContext

@@ -470,14 +495,14 @@ def compiles_string_to_timestamp(translator, expr):
return 'PARSE_TIMESTAMP({}, {})'.format(fmt_string, arg_formatted)


class BigQueryTableSetFormatter(ImpalaTableSetFormatter):
class BigQueryTableSetFormatter(BaseTableSetFormatter):
def _quote_identifier(self, name):
if re.match(r'^[A-Za-z][A-Za-z_0-9]*$', name):
return name
return '`{}`'.format(name)


class BigQuerySelect(ImpalaSelect):
class BigQuerySelect(BaseSelect):

translator = BigQueryExprTranslator

@@ -568,7 +593,9 @@ def compiles_covar(translator, expr):
left = where.ifelse(left, ibis.NA)
right = where.ifelse(right, ibis.NA)

return "COVAR_{}({}, {})".format(how, left, right)
return "COVAR_{}({}, {})".format(
how, translator.translate(left), translator.translate(right)
)


@rewrites(ops.Any)
@@ -603,7 +630,7 @@ def bigquery_compile_notall(translator, expr):
)


class BigQueryDialect(impala_compiler.ImpalaDialect):
class BigQueryDialect(comp.Dialect):
translator = BigQueryExprTranslator


File renamed without changes.
File renamed without changes.
@@ -8,7 +8,7 @@
DATASET_ID = 'testing'


def connect(project_id, dataset_id):
def connect(project_id, dataset_id, application_name=None):
ga = pytest.importorskip('google.auth')
service_account = pytest.importorskip('google.oauth2.service_account')
google_application_credentials = os.environ.get(
@@ -40,7 +40,10 @@ def connect(project_id, dataset_id):
)
try:
return ibis.bigquery.connect(
project_id, dataset_id, credentials=credentials
project_id,
dataset_id,
credentials=credentials,
application_name=application_name,
)
except ga.exceptions.DefaultCredentialsError:
pytest.skip(skip_message)
@@ -1,6 +1,7 @@
import collections
import datetime
import decimal
from unittest import mock

import numpy as np
import pandas as pd
@@ -17,8 +18,8 @@
ga = pytest.importorskip('google.auth')
exceptions = pytest.importorskip('google.api_core.exceptions')

from ibis.bigquery.client import bigquery_param # noqa: E402, isort:skip
from ibis.bigquery.tests.conftest import connect # noqa: E402, isort:skip
from .client import bigquery_param # noqa: E402, isort:skip
from .tests.conftest import connect # noqa: E402, isort:skip


def test_table(alltypes):
@@ -358,6 +359,7 @@ def test_scalar_param_date(alltypes, df, date_value):
tm.assert_frame_equal(result, expected)


@pytest.mark.xfail(reason='Issue #2374', strict=True)
def test_scalar_param_array(alltypes, df):
param = ibis.param('array<double>')
expr = alltypes.sort_by('id').limit(1).double_col.collect() + param
@@ -374,6 +376,7 @@ def test_scalar_param_struct(client):
assert value == result


@pytest.mark.xfail(reason='Issue #2373', strict=True)
def test_scalar_param_nested(client):
param = ibis.param('struct<x: array<struct<y: array<double>>>>')
value = collections.OrderedDict(
@@ -715,10 +718,26 @@ def test_approx_median(alltypes):

expr = m.approx_median()
result = expr.execute()
assert result == expected
# Since 6 and 7 are right on the edge for median in the range of months
# (1-12), accept either for the approximate function.
assert result in (6, 7)


def test_client_without_dataset(project_id):
con = connect(project_id, dataset_id=None)
with pytest.raises(ValueError, match="Unable to determine BigQuery"):
con.list_tables()


def test_client_sets_user_agent(project_id, monkeypatch):
mock_client = mock.create_autospec(bq.Client)
monkeypatch.setattr(bq, 'Client', mock_client)
connect(
project_id,
dataset_id='bigquery-public-data.stackoverflow',
application_name='my-great-app/0.7.0',
)
info = mock_client.call_args[1]['client_info']
user_agent = info.to_user_agent()
assert ' ibis/{}'.format(ibis.__version__) in user_agent
assert 'my-great-app/0.7.0 ' in user_agent