8 changes: 4 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

|Service|Status|
| -------------: | :---- |
| Documentation | [![Documentation Status](https://img.shields.io/badge/docs-docs.ibis--project.org-blue.svg)](http://docs.ibis-project.org) |
| Documentation | [![Documentation Status](https://img.shields.io/badge/docs-docs.ibis--project.org-blue.svg)](http://ibis-project.org) |
| Conda packages | [![Anaconda-Server Badge](https://anaconda.org/conda-forge/ibis-framework/badges/version.svg)](https://anaconda.org/conda-forge/ibis-framework) |
| PyPI | [![PyPI](https://img.shields.io/pypi/v/ibis-framework.svg)](https://pypi.org/project/ibis-framework) |
| Azure | [![Azure Status](https://dev.azure.com/ibis-project/ibis/_apis/build/status/ibis-project.ibis)](https://dev.azure.com/ibis-project/ibis/_build) |
@@ -34,10 +34,10 @@ Ibis currently provides tools for interacting with the following systems:
- [PostgreSQL](https://www.postgresql.org/)
- [MySQL](https://www.mysql.com/) (Experimental)
- [SQLite](https://www.sqlite.org/)
- [Pandas](https://pandas.pydata.org/) [DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) (Experimental)
- [Pandas](https://pandas.pydata.org/) [DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe)
- [Clickhouse](https://clickhouse.yandex)
- [BigQuery](https://cloud.google.com/bigquery)
- [OmniSciDB](https://www.omnisci.com) (Experimental)
- [OmniSciDB](https://www.omnisci.com)
- [Spark](https://spark.apache.org) (Experimental)

Learn more about using the library at http://docs.ibis-project.org.
Learn more about using the library at http://ibis-project.org.
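All of the backends listed above share one expression API. As a quick, hedged illustration (the database path, table name, and column name below are placeholders, not files shipped with this repository):

```python
import ibis

# Connect to a local SQLite database (placeholder path).
con = ibis.sqlite.connect('path/to/my/sqlite.db')

# Build a deferred expression and execute it against the backend.
t = con.table('my_table')            # placeholder table name
expr = t.my_numeric_column.sum()     # placeholder column name
print(expr.execute())                # runs the query and returns the result
```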
5 changes: 0 additions & 5 deletions azure-pipelines.yml
@@ -4,8 +4,3 @@ jobs:
parameters:
name: Linux
vmImage: ubuntu-16.04

- template: ci/azure/windows.yml
parameters:
name: Windows
vmImage: windows-2019
2 changes: 1 addition & 1 deletion benchmarks/benchmarks.py
@@ -3,7 +3,7 @@

import ibis
import ibis.expr.datatypes as dt
from ibis.pandas.udf import udf
from ibis.backends.pandas.udf import udf


def make_t(name='t'):
2 changes: 0 additions & 2 deletions ci/.env
@@ -23,5 +23,3 @@ IBIS_TEST_OMNISCIDB_PORT=6274
IBIS_TEST_OMNISCIDB_DATABASE=ibis_testing
IBIS_TEST_OMNISCIDB_USER=admin
IBIS_TEST_OMNISCIDB_PASSWORD=HyperInteractive
GOOGLE_BIGQUERY_PROJECT_ID=ibis-gbq
GOOGLE_APPLICATION_CREDENTIALS=/tmp/gcloud-service-key.json
12 changes: 8 additions & 4 deletions ci/Dockerfile.dev
@@ -6,13 +6,17 @@ RUN apt-get -qq update --yes \
&& rm -rf /var/lib/apt/lists/*

ARG PYTHON_VERSION
ADD ci/requirements-$PYTHON_VERSION-dev.yml /
ADD environment.yml /
ADD ci/deps/*.yml /deps/

RUN /opt/conda/bin/conda config --add channels conda-forge \
&& /opt/conda/bin/conda update --all --yes --quiet \
&& /opt/conda/bin/conda env create --name ibis-env --file /requirements-$PYTHON_VERSION-dev.yml \
&& /opt/conda/bin/conda install --yes conda-build \
&& /opt/conda/bin/conda clean --all --yes
&& /opt/conda/bin/conda install --yes conda-build

RUN /opt/conda/bin/conda env create --name ibis-env --file /environment.yml \
&& /opt/conda/bin/conda install --name ibis-env python=$PYTHON_VERSION

RUN for FNAME in $(ls /deps/*.yml | grep -v "\-min.yml"); do /opt/conda/bin/conda install --name ibis-env --file $FNAME; done

RUN echo 'source /opt/conda/bin/activate ibis-env && exec "$@"' > activate.sh

23 changes: 8 additions & 15 deletions ci/Dockerfile.docs
@@ -1,23 +1,16 @@
ARG PYTHON_VERSION
FROM ibis:$PYTHON_VERSION

COPY . /ibis
WORKDIR /ibis

# fonts are for docs
RUN apt-get -qq update --yes \
&& apt-get -qq install --yes ttf-dejavu iputils-ping \
&& rm -rf /var/lib/apt/lists/*

ADD ci/requirements-docs.yml /

RUN /opt/conda/bin/conda config --add channels conda-forge \
&& rm -rf /var/lib/apt/lists/* \
&& /opt/conda/bin/conda config --add channels conda-forge \
&& /opt/conda/bin/conda update --all --yes \
&& /opt/conda/bin/conda install --name ibis-env --yes --file /requirements-docs.yml \
&& /opt/conda/bin/conda clean --all --yes

RUN echo 'source /opt/conda/bin/activate ibis-env && exec "$@"' > activate.sh

COPY . /ibis
WORKDIR /ibis

RUN bash /activate.sh pip install -e . --no-deps --ignore-installed --no-cache-dir
&& /opt/conda/bin/conda clean --all --yes \
&& pip install -e . --no-deps --ignore-installed --no-cache-dir

ENTRYPOINT ["bash", "/activate.sh"]
SHELL ["conda", "run", "-n", "ibis-env", "/bin/bash", "-c"]
18 changes: 0 additions & 18 deletions ci/asvconfig.py

This file was deleted.

443 changes: 180 additions & 263 deletions ci/azure/linux.yml

Large diffs are not rendered by default.

137 changes: 0 additions & 137 deletions ci/azure/windows.yml

This file was deleted.

9 changes: 0 additions & 9 deletions ci/benchmark.sh

This file was deleted.

129 changes: 92 additions & 37 deletions ci/datamgr.py
@@ -1,8 +1,7 @@
#!/usr/bin/env python
import json
import io
import logging
import os
import tempfile
import warnings
import zipfile
from pathlib import Path
@@ -144,9 +143,10 @@ def cli(quiet):
)
@click.option('-d', '--directory', default=DATA_DIR)
def download(repo_url, directory):
from plumbum.cmd import curl
from shutil import rmtree

from plumbum.cmd import curl

directory = Path(directory)
# download the master branch
url = repo_url + '/archive/master.zip'
@@ -216,6 +216,7 @@ def parquet(tables, data_directory, ignore_missing_dependency, **params):
'--schema',
type=click.File('rt'),
default=str(SCRIPT_DIR / 'schema' / 'postgresql.sql'),
help='Path to SQL file that initializes the database via DDL.',
)
@click.option('-t', '--tables', multiple=True, default=TEST_TABLES + ['geo'])
@click.option('-d', '--data-directory', default=DATA_DIR)
@@ -304,6 +305,7 @@ def postgres(schema, tables, data_directory, psql_path, plpython, **params):
'--schema',
type=click.File('rt'),
default=str(SCRIPT_DIR / 'schema' / 'sqlite.sql'),
help='Path to SQL file that initializes the database via DDL.',
)
@click.option('-t', '--tables', multiple=True, default=TEST_TABLES)
@click.option('-d', '--data-directory', default=DATA_DIR)
@@ -334,6 +336,7 @@ def sqlite(database, schema, tables, data_directory, **params):
'--schema',
type=click.File('rt'),
default=str(SCRIPT_DIR / 'schema' / 'omniscidb.sql'),
help='Path to SQL file that initializes the database via DDL.',
)
@click.option('-t', '--tables', multiple=True, default=TEST_TABLES + ['geo'])
@click.option('-d', '--data-directory', default=DATA_DIR)
@@ -428,6 +431,7 @@ def omniscidb(schema, tables, data_directory, **params):
'--schema',
type=click.File('rt'),
default=str(SCRIPT_DIR / 'schema' / 'mysql.sql'),
help='Path to SQL file that initializes the database via DDL.',
)
@click.option('-t', '--tables', multiple=True, default=TEST_TABLES)
@click.option('-d', '--data-directory', default=DATA_DIR)
@@ -453,6 +457,7 @@ def mysql(schema, tables, data_directory, **params):
'--schema',
type=click.File('rt'),
default=str(SCRIPT_DIR / 'schema' / 'clickhouse.sql'),
help='Path to SQL file that initializes the database via DDL.',
)
@click.option('-t', '--tables', multiple=True, default=TEST_TABLES)
@click.option('-d', '--data-directory', default=DATA_DIR)
@@ -477,9 +482,16 @@ def clickhouse(schema, tables, data_directory, **params):


@cli.command()
@click.option(
'-S',
'--schema',
type=click.File('rt'),
default=str(SCRIPT_DIR / 'schema' / 'bigquery.sql'),
help='Path to SQL file that initializes the database via DDL.',
)
@click.option('-d', '--data-directory', default=DATA_DIR)
@click.option('-i', '--ignore-missing-dependency', is_flag=True, default=False)
def bigquery(data_directory, ignore_missing_dependency, **params):
def bigquery(schema, data_directory, ignore_missing_dependency, **params):
try:
import google.api_core.exceptions
from google.cloud import bigquery
@@ -495,29 +507,23 @@ def bigquery(data_directory, ignore_missing_dependency, **params):
bqclient = bigquery.Client(project=project_id)

# Create testing dataset.
testing_dataset = bqclient.dataset('testing')
testing_dataset = bigquery.DatasetReference(bqclient.project, 'testing')
try:
bqclient.create_dataset(bigquery.Dataset(testing_dataset))
except google.api_core.exceptions.Conflict:
pass # Skip if already created.

# Set up main data table.
# Set up main data tables.
job = bqclient.query(schema.read())
job.result()
if job.error_result:
raise click.ClickException(str(job.error_result))

# Load main data table.
data_directory = Path(data_directory)
functional_alltypes_path = data_directory / 'functional_alltypes.csv'
functional_alltypes_schema = []
schema_path = data_directory / 'functional_alltypes_bigquery_schema.json'
with open(str(schema_path)) as schemafile:
schema_json = json.load(schemafile)
for field in schema_json:
functional_alltypes_schema.append(
bigquery.SchemaField.from_api_repr(field)
)
load_config = bigquery.LoadJobConfig()
load_config.skip_leading_rows = 1 # skip the header row.
load_config.schema = functional_alltypes_schema

# Load main data table.
functional_alltypes_schema = []
with open(str(functional_alltypes_path), 'rb') as csvfile:
job = bqclient.load_table_from_file(
csvfile,
@@ -529,9 +535,7 @@ def bigquery(data_directory, ignore_missing_dependency, **params):
raise click.ClickException(str(job.error_result))

# Load an ingestion time partitioned table.
functional_alltypes_path = data_directory / 'functional_alltypes.csv'
with open(str(functional_alltypes_path), 'rb') as csvfile:
load_config.time_partitioning = bigquery.TimePartitioning()
job = bqclient.load_table_from_file(
csvfile,
testing_dataset.table('functional_alltypes_parted'),
@@ -545,6 +549,7 @@ def bigquery(data_directory, ignore_missing_dependency, **params):
struct_table_path = data_directory / 'struct_table.avro'
with open(str(struct_table_path), 'rb') as avrofile:
load_config = bigquery.LoadJobConfig()
load_config.write_disposition = 'WRITE_TRUNCATE'
load_config.source_format = 'AVRO'
job = bqclient.load_table_from_file(
avrofile,
@@ -565,7 +570,7 @@ def bigquery(data_directory, ignore_missing_dependency, **params):
date_table.time_partitioning = bigquery.TimePartitioning(
field='my_date_parted_col'
)
bqclient.create_table(date_table)
bqclient.create_table(date_table, exists_ok=True)

# Create empty timestamp-partitioned tables.
timestamp_table = bigquery.Table(
@@ -579,37 +584,87 @@ def bigquery(data_directory, ignore_missing_dependency, **params):
timestamp_table.time_partitioning = bigquery.TimePartitioning(
field='my_timestamp_parted_col'
)
bqclient.create_table(timestamp_table)
bqclient.create_table(timestamp_table, exists_ok=True)

# Create a table with a numeric column
numeric_table = bigquery.Table(testing_dataset.table('numeric_table'))
numeric_table.schema = [
bigquery.SchemaField('string_col', 'STRING'),
bigquery.SchemaField('numeric_col', 'NUMERIC'),
]
bqclient.create_table(numeric_table)
bqclient.create_table(numeric_table, exists_ok=True)

df = pd.read_csv(
str(data_directory / 'functional_alltypes.csv'),
str(functional_alltypes_path),
usecols=['string_col', 'double_col'],
header=0,
)
with tempfile.NamedTemporaryFile(mode='a+b') as csvfile:
df.to_csv(csvfile, header=False, index=False)
csvfile.seek(0)
numeric_csv = io.StringIO()
df.to_csv(numeric_csv, header=False, index=False)
csvfile = io.BytesIO(numeric_csv.getvalue().encode('utf-8'))
load_config = bigquery.LoadJobConfig()
load_config.write_disposition = 'WRITE_TRUNCATE'
load_config.skip_leading_rows = 1 # skip the header row.
load_config.schema = numeric_table.schema

load_config = bigquery.LoadJobConfig()
load_config.skip_leading_rows = 1 # skip the header row.
load_config.schema = numeric_table.schema
job = bqclient.load_table_from_file(
csvfile,
testing_dataset.table('numeric_table'),
job_config=load_config,
).result()

job = bqclient.load_table_from_file(
csvfile,
testing_dataset.table('numeric_table'),
job_config=load_config,
).result()
if job.error_result:
raise click.ClickException(str(job.error_result))

if job.error_result:
raise click.ClickException(str(job.error_result))

@cli.command()
def pandas(**params):
"""
The pandas backend does not need test data, but this no-op command
is kept for consistency, so the CI can run `./datamgr.py pandas`
just like it does for every other backend.
"""
pass


@cli.command()
def csv(**params):
"""
The csv backend does not need test data, but this no-op command
is kept for consistency, so the CI can run `./datamgr.py csv`
just like it does for every other backend.
"""
pass


@cli.command()
def hdf5(**params):
"""
The hdf5 backend does not need test data, but this no-op command
is kept for consistency, so the CI can run `./datamgr.py hdf5`
just like it does for every other backend.
"""
pass


@cli.command()
def spark(**params):
"""
The spark backend does not need test data, but this no-op command
is kept for consistency, so the CI can run `./datamgr.py spark`
just like it does for every other backend.
"""
pass


@cli.command()
def pyspark(**params):
"""
The pyspark backend does not need test data, but this no-op command
is kept for consistency, so the CI can run `./datamgr.py pyspark`
just like it does for every other backend.
"""
pass


if __name__ == '__main__':
2 changes: 2 additions & 0 deletions ci/deps/bigquery.yml
@@ -0,0 +1,2 @@
google-cloud-bigquery-core >=1.12.0,<1.24.0dev
pydata-google-auth
5 changes: 5 additions & 0 deletions ci/deps/clickhouse.yml
@@ -0,0 +1,5 @@
sqlalchemy>=1.3
clickhouse-cityhash
clickhouse-driver>=0.1.3
clickhouse-sqlalchemy
lz4
6 changes: 6 additions & 0 deletions ci/deps/impala.yml
@@ -0,0 +1,6 @@
sqlalchemy>=1.3
impyla>=0.15.0
requests>=2.24
thrift>=0.9.3
thriftpy2>=0.4
thrift_sasl>=0.2.1
2 changes: 2 additions & 0 deletions ci/deps/mysql.yml
@@ -0,0 +1,2 @@
sqlalchemy>=1.3
pymysql
2 changes: 2 additions & 0 deletions ci/deps/omniscidb.yml
@@ -0,0 +1,2 @@
pymapd==0.24
pyarrow
1 change: 1 addition & 0 deletions ci/deps/parquet.yml
@@ -0,0 +1 @@
pyarrow>=0.13
3 changes: 3 additions & 0 deletions ci/deps/postgres.yml
@@ -0,0 +1,3 @@
sqlalchemy>=1.3
psycopg2>=2.8
geoalchemy2>=0.6
4 changes: 4 additions & 0 deletions ci/deps/pyspark-min.yml
@@ -0,0 +1,4 @@
# double-conversion must be listed explicitly, otherwise `import pyarrow` fails with ImportError: libdouble-conversion.so.3
double-conversion
pyarrow=0.12.1
pyspark=2.4.3
1 change: 1 addition & 0 deletions ci/deps/pyspark.yml
@@ -0,0 +1 @@
pyspark>=2.4.3
4 changes: 4 additions & 0 deletions ci/deps/spark-min.yml
@@ -0,0 +1,4 @@
# double-conversion must be listed explicitly, otherwise `import pyarrow` fails with ImportError: libdouble-conversion.so.3
double-conversion
pyarrow=0.12.1
pyspark=2.4.3
1 change: 1 addition & 0 deletions ci/deps/spark.yml
@@ -0,0 +1 @@
pyspark>=2.4.3
18 changes: 9 additions & 9 deletions ci/docker-compose.yml
@@ -10,7 +10,7 @@ services:
POSTGRES_PASSWORD: ''

mysql:
image: mariadb:10.2
image: mariadb:10.4.12
hostname: mysql
ports:
- 3307:3306
@@ -19,8 +19,6 @@
MYSQL_DATABASE: ibis_testing
MYSQL_USER: ibis
MYSQL_PASSWORD: ibis
# see: https://github.com/docker-library/mariadb/issues/262
MYSQL_INITDB_SKIP_TZINFO: 1

impala:
image: ibisproject/impala:latest
@@ -56,7 +54,7 @@
- "sleep 80 && supervisord -c /etc/supervisord.conf -n"

clickhouse:
image: yandex/clickhouse-server:18.12
image: yandex/clickhouse-server:18.14
hostname: clickhouse
ports:
- 8123:8123
@@ -89,7 +87,9 @@
KUDU_MASTER: "false"

omniscidb:
image: omnisci/core-os-cpu:v5.1.0
# NOTE: Keep the documentation about the OmniSciDB supported version
# updated (docs/source/backends/omnisci.rst).
image: omnisci/core-os-cpu:v5.3.0
hostname: omniscidb

ports:
@@ -102,7 +102,7 @@
image: jwilder/dockerize

ibis:
image: ibis:${PYTHON_VERSION:-3.6}
image: ibis:${PYTHON_VERSION:-3.7}
env_file:
- ./.env
volumes:
@@ -112,10 +112,10 @@
context: ..
dockerfile: ci/Dockerfile.dev
args:
PYTHON_VERSION: ${PYTHON_VERSION:-3.6}
PYTHON_VERSION: ${PYTHON_VERSION:-3.7}

ibis-docs:
image: ibis-docs:${PYTHON_VERSION:-3.6}
image: ibis-docs:${PYTHON_VERSION:-3.7}
env_file:
- ./.env
volumes:
Expand All @@ -125,4 +125,4 @@ services:
context: ..
dockerfile: ci/Dockerfile.docs
args:
PYTHON_VERSION: ${PYTHON_VERSION:-3.6}
PYTHON_VERSION: ${PYTHON_VERSION:-3.7}
2 changes: 1 addition & 1 deletion ci/docs.sh
@@ -1,6 +1,6 @@
#!/bin/bash -e

export PYTHON_VERSION="3.6"
export PYTHON_VERSION="3.7"

docker-compose build ibis
docker-compose build ibis-docs
150 changes: 0 additions & 150 deletions ci/feedstock.py

This file was deleted.

2 changes: 1 addition & 1 deletion ci/impalamgr.py
@@ -12,8 +12,8 @@
from plumbum.cmd import cmake, make

import ibis
from ibis.backends.impala.tests.conftest import IbisTestEnv
from ibis.common.exceptions import IbisError
from ibis.impala.tests.conftest import IbisTestEnv

SCRIPT_DIR = Path(__file__).parent.absolute()
DATA_DIR = Path(
94 changes: 94 additions & 0 deletions ci/recipe/meta.yaml
@@ -0,0 +1,94 @@
# This is a copy of https://github.com/conda-forge/ibis-framework-feedstock/blob/master/recipe/meta.yaml
# During development of Ibis, recipe changes are made and tested against this file;
# on release, it must replace the original recipe in the feedstock.
#
# Changes to the original file that need to be restored when copying to release:
# - Set the version in the first line: {% set version = "1.3.0" %}
# - Add `sha256` key to the `source` section, with the tar.gz hash
# - Set the `number` in the `build` section to the appropriate build number
# - Remove this comment from the beginning of the file

package:
name: ibis-framework
version: {{ version }}

source:
url: https://github.com/ibis-project/ibis/archive/{{ version }}.tar.gz

build:
number: 1
script: {{ PYTHON }} -m pip install . --no-deps --ignore-installed --no-cache-dir -vvv
# uncomment noarch when pymapd and pyspark issues are fixed for py38
# noarch: python

requirements:
host:
- pip
- python
- setuptools

run:
- clickhouse-driver >=0.1.3
- clickhouse-cityhash # [not win]
- clickhouse-sqlalchemy
- geoalchemy2
- geopandas
- google-cloud-bigquery-core >=1.12.0,<1.24.0dev
- graphviz
- impyla >=0.15.0
- lz4
- multipledispatch >=0.6
- numpy >=1.15
- pandas >=0.25.3
- psycopg2
- pyarrow >=0.15
- pydata-google-auth
- pymapd 0.24 # [py<38]
- pymysql
- pyspark >=2.4.3 # [py<38]
- pytables >=3.0.0
- python
- python-graphviz
- python-hdfs >=2.0.16
- pytz
- regex
- requests
- shapely
- setuptools
- sqlalchemy >=1.1
- thrift >=0.11
- thriftpy2
- toolz

test:
imports:
- ibis
- ibis.backends.bigquery
- ibis.backends.clickhouse
- ibis.backends.csv
- ibis.backends.parquet
- ibis.backends.hdf5
- ibis.backends.impala
- ibis.backends.mysql
- ibis.backends.omniscidb # [py<38]
- ibis.backends.pandas
- ibis.backends.postgres
- ibis.backends.pyspark # [py<38]
- ibis.backends.spark
- ibis.backends.sqlite

about:
license: Apache-2.0
license_family: Apache
license_file: LICENSE.txt
home: http://www.ibis-project.org
summary: Productivity-centric Python Big Data Framework

extra:
recipe-maintainers:
- cpcloud
- mariusvniekerk
- wesm
- kszucs
- xmnlab
- jreback
59 changes: 0 additions & 59 deletions ci/requirements-3.6-dev.yml

This file was deleted.

59 changes: 0 additions & 59 deletions ci/requirements-3.7-dev.yml

This file was deleted.

61 changes: 0 additions & 61 deletions ci/requirements-3.8-dev.yml

This file was deleted.

14 changes: 0 additions & 14 deletions ci/requirements-docs.yml

This file was deleted.

22 changes: 22 additions & 0 deletions ci/run_tests.sh
@@ -0,0 +1,22 @@
#!/bin/bash -e
# Run the Ibis tests. Two environment variables are considered:
# - PYTEST_BACKENDS: Space-separated list of backends to run
# - PYTEST_EXPRESSION: Marker expression, for example "not udf"

TESTS_DIRS="ibis/tests"
for BACKEND in $PYTEST_BACKENDS; do
if [[ -d ibis/$BACKEND/tests ]]; then
TESTS_DIRS="$TESTS_DIRS ibis/$BACKEND/tests"
fi
done

echo "TESTS_DIRS: $TESTS_DIRS"
echo "PYTEST_EXPRESSION: $PYTEST_EXPRESSION"


pytest $TESTS_DIRS \
-m "${PYTEST_EXPRESSION}" \
-ra \
--junitxml=junit.xml \
--cov=ibis \
--cov-report=xml:coverage.xml
41 changes: 41 additions & 0 deletions ci/schema/bigquery.sql
@@ -0,0 +1,41 @@
CREATE OR REPLACE TABLE `testing.functional_alltypes_parted`
(
index INT64,
Unnamed_0 INT64,
id INT64,
bool_col BOOL,
tinyint_col INT64,
smallint_col INT64,
int_col INT64,
bigint_col INT64,
float_col FLOAT64,
double_col FLOAT64,
date_string_col STRING,
string_col STRING,
timestamp_col TIMESTAMP,
year INT64,
month INT64
)
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS (
require_partition_filter=false
);

CREATE OR REPLACE TABLE `testing.functional_alltypes`
(
index INT64,
Unnamed_0 INT64,
id INT64,
bool_col BOOL,
tinyint_col INT64,
smallint_col INT64,
int_col INT64,
bigint_col INT64,
float_col FLOAT64,
double_col FLOAT64,
date_string_col STRING,
string_col STRING,
timestamp_col TIMESTAMP,
year INT64,
month INT64
);
34 changes: 0 additions & 34 deletions ci/setup_docker_volume.sh

This file was deleted.

64 changes: 64 additions & 0 deletions ci/setup_env.sh
@@ -0,0 +1,64 @@
#!/bin/bash -e
# Set up conda environment for Ibis in GitHub Actions
# The base environment of the provided conda is used
# This script only installs the base dependencies.
# Dependencies for the backends need to be installed separately.

PYTHON_VERSION="${1:-3.7}"
BACKENDS="$2"

echo "PYTHON_VERSION: $PYTHON_VERSION"
echo "BACKENDS: $BACKENDS"

if [[ -n "$CONDA" ]]; then
# Add conda to Path
OS_NAME=$(uname)
case $OS_NAME in
Linux)
CONDA_PATH="$CONDA/bin"
;;
MINGW*)
# Windows
CONDA_POSIX=$(cygpath -u "$CONDA")
CONDA_PATH="$CONDA_POSIX:$CONDA_POSIX/Scripts:$CONDA_POSIX/Library:$CONDA_POSIX/Library/bin:$CONDA_POSIX/Library/mingw-w64/bin"
;;
*)
echo "$OS_NAME not supported."
exit 1
esac
PATH=${CONDA_PATH}:${PATH}
# Prepend conda path to system path for the subsequent GitHub Actions
echo "${CONDA_PATH}" >> $GITHUB_PATH
else
echo "Running without adding conda to PATH."
fi

conda update -n base -c anaconda --all --yes conda
conda install -n base -c anaconda --yes python=${PYTHON_VERSION}
conda env update -n base --file=environment.yml
python -m pip install -e .

if [[ -n "$BACKENDS" ]]; then
python ci/datamgr.py download
for BACKEND in $BACKENDS; do
# For the oldest python version supported (currently 3.7) we first try to
# install the minimum supported dependencies `ci/deps/$BACKEND-min.yml`.
# If the file does not exist then we install the normal dependencies
# (if there are dependencies). For other python versions we simply install
# the normal dependencies if they exist.
if [[ $PYTHON_VERSION == "3.7" && -f "ci/deps/$BACKEND-min.yml" ]]; then
conda install -n base -c conda-forge --file="ci/deps/$BACKEND-min.yml"
else
if [[ -f "ci/deps/$BACKEND.yml" ]]; then
conda install -n base -c conda-forge --file="ci/deps/$BACKEND.yml"
fi
fi

# TODO load impala data in the same way as the rest of the backends
if [[ "$BACKEND" == "impala" ]]; then
python ci/impalamgr.py load --data
else
python ci/datamgr.py $BACKEND
fi
done
fi
296 changes: 4 additions & 292 deletions docs/source/api.rst
@@ -5,298 +5,6 @@
API Reference
*************

.. currentmodule:: ibis

.. _api.client:

Creating connections
--------------------

These methods are in the ``ibis`` module namespace, and your main point of
entry to using Ibis.

.. autosummary::
:toctree: generated/

hdfs_connect

Impala client
-------------
.. currentmodule:: ibis.impala.api

These methods are available on the Impala client object after connecting to
your HDFS cluster (``ibis.hdfs_connect``) and connecting to Impala with
``ibis.impala.connect``.

.. autosummary::
:toctree: generated/

connect
ImpalaClient.close
ImpalaClient.database

Database methods
~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: generated/

ImpalaClient.set_database
ImpalaClient.create_database
ImpalaClient.drop_database
ImpalaClient.list_databases
ImpalaClient.exists_database

.. autosummary::
:toctree: generated/

ImpalaDatabase.create_table
ImpalaDatabase.drop
ImpalaDatabase.namespace
ImpalaDatabase.table

Table methods
~~~~~~~~~~~~~

The ``ImpalaClient`` object itself has many helper utility methods. You'll find
the most methods on ``ImpalaTable``.

.. autosummary::
:toctree: generated/

ImpalaClient.database
ImpalaClient.table
ImpalaClient.sql
ImpalaClient.raw_sql
ImpalaClient.list_tables
ImpalaClient.exists_table
ImpalaClient.drop_table
ImpalaClient.create_table
ImpalaClient.insert
ImpalaClient.truncate_table
ImpalaClient.get_schema
ImpalaClient.cache_table
ImpalaClient.load_data
ImpalaClient.get_options
ImpalaClient.set_options
ImpalaClient.set_compression_codec


The best way to interact with a single table is through the ``ImpalaTable``
object you get back from ``ImpalaClient.table``.

.. autosummary::
:toctree: generated/

ImpalaTable.add_partition
ImpalaTable.alter
ImpalaTable.alter_partition
ImpalaTable.column_stats
ImpalaTable.compute_stats
ImpalaTable.describe_formatted
ImpalaTable.drop
ImpalaTable.drop_partition
ImpalaTable.files
ImpalaTable.insert
ImpalaTable.invalidate_metadata
ImpalaTable.load_data
ImpalaTable.metadata
ImpalaTable.partition_schema
ImpalaTable.partitions
ImpalaTable.refresh
ImpalaTable.rename
ImpalaTable.stats

Creating views is also possible:

.. autosummary::
:toctree: generated/

ImpalaClient.create_view
ImpalaClient.drop_view
ImpalaClient.drop_table_or_view

Accessing data formats in HDFS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: generated/

ImpalaClient.avro_file
ImpalaClient.delimited_file
ImpalaClient.parquet_file

Executing expressions
~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: generated/

ImpalaClient.execute
ImpalaClient.disable_codegen

.. _api.postgres:

PostgreSQL client
-----------------
.. currentmodule:: ibis.sql.postgres.api

The PostgreSQL client is accessible through the ``ibis.postgres`` namespace.

Use ``ibis.postgres.connect`` with a SQLAlchemy-compatible connection string to
create a client.

.. autosummary::
:toctree: generated/

connect
PostgreSQLClient.database
PostgreSQLClient.list_tables
PostgreSQLClient.list_databases
PostgreSQLClient.table

.. _api.sqlite:

SQLite client
-------------
.. currentmodule:: ibis.sql.sqlite.api

The SQLite client is accessible through the ``ibis.sqlite`` namespace.

Use ``ibis.sqlite.connect`` to create a SQLite client.

.. autosummary::
:toctree: generated/

connect
SQLiteClient.attach
SQLiteClient.database
SQLiteClient.list_tables
SQLiteClient.table

.. _api.mysql:

MySQL client (Experimental)
---------------------------
.. currentmodule:: ibis.sql.mysql.api

The MySQL client is accessible through the ``ibis.mysql`` namespace.

Use ``ibis.mysql.connect`` with a SQLAlchemy-compatible connection string to
create a client.

.. autosummary::
:toctree: generated/

connect
MySQLClient.database
MySQLClient.list_databases
MySQLClient.list_tables
MySQLClient.table

.. _api.omniscidb:

OmniSciDB client (Experimental)
-------------------------------
.. currentmodule:: ibis.omniscidb.api

The OmniSciDB client is accessible through the ``ibis.omniscidb`` namespace.

Use ``ibis.omniscidb.connect`` to create a client.

.. autosummary::
:toctree: generated/

compile
connect
verify
OmniSciDBClient.alter_user
OmniSciDBClient.close
OmniSciDBClient.create_database
OmniSciDBClient.create_table
OmniSciDBClient.create_user
OmniSciDBClient.create_view
OmniSciDBClient.database
OmniSciDBClient.describe_formatted
OmniSciDBClient.drop_database
OmniSciDBClient.drop_table
OmniSciDBClient.drop_table_or_view
OmniSciDBClient.drop_user
OmniSciDBClient.drop_view
OmniSciDBClient.exists_table
OmniSciDBClient.get_schema
OmniSciDBClient.list_tables
OmniSciDBClient.load_data
OmniSciDBClient.log
OmniSciDBClient.set_database
OmniSciDBClient.sql
OmniSciDBClient.table
OmniSciDBClient.truncate_table
OmniSciDBClient.version

.. _api.hdfs:

HDFS
----

Client objects have an ``hdfs`` attribute you can use to interact directly with
HDFS.

.. currentmodule:: ibis

.. autosummary::
:toctree: generated/

HDFS.ls
HDFS.chmod
HDFS.chown
HDFS.get
HDFS.head
HDFS.put
HDFS.put_tarfile
HDFS.rm
HDFS.rmdir
HDFS.size
HDFS.status

.. _api.spark:

SparkSQL client (Experimental)
------------------------------
.. currentmodule:: ibis.spark.api

The Spark SQL client is accessible through the ``ibis.spark`` namespace.

Use ``ibis.spark.connect`` to create a client.

.. autosummary::
:toctree: generated/

connect
SparkClient.database
SparkClient.list_databases
SparkClient.list_tables
SparkClient.table

.. _api.pyspark:

PySpark client (Experimental)
-----------------------------
.. currentmodule:: ibis.pyspark.api

The PySpark client is accessible through the ``ibis.pyspark`` namespace.

Use ``ibis.pyspark.connect`` to create a client.

.. autosummary::
:toctree: generated/

connect
PySparkClient.database
PySparkClient.list_databases
PySparkClient.list_tables
PySparkClient.table

Top-level expression APIs
-------------------------

@@ -329,6 +37,7 @@ These methods are available directly in the ``ibis`` module namespace.
trailing_window
cumulative_window
trailing_range_window
random

.. _api.expr:

@@ -370,6 +79,7 @@ Table methods
TableExpr.mutate
TableExpr.projection
TableExpr.relabel
TableExpr.rowid
TableExpr.schema
TableExpr.set_column
TableExpr.sort_by
@@ -576,6 +286,7 @@ All timestamp operations are valid either on scalar or array values
TimestampValue.month
TimestampValue.day
TimestampValue.day_of_week
TimestampValue.epoch_seconds
TimestampValue.hour
TimestampValue.minute
TimestampValue.second
@@ -601,6 +312,7 @@ Date methods
DateValue.month
DateValue.day
DateValue.day_of_week
DateValue.epoch_seconds
DateValue.truncate
DateValue.add
DateValue.radd
176 changes: 176 additions & 0 deletions docs/source/backends/bigquery.rst
@@ -0,0 +1,176 @@
.. currentmodule:: ibis.bigquery.api

.. _backends.bigquery:

BigQuery
========

To use the BigQuery client, you will need a Google Cloud Platform account.
Use the `BigQuery sandbox <https://cloud.google.com/bigquery/docs/sandbox>`__
to try the service for free.

.. _install.bigquery:

`BigQuery <https://cloud.google.com/bigquery/>`_ Quickstart
-----------------------------------------------------------

Install dependencies for Ibis's BigQuery dialect:

::

pip install ibis-framework[bigquery]

Create a client by passing in the project id and dataset id you wish to operate
with:


.. code-block:: python
>>> con = ibis.bigquery.connect(project_id='ibis-gbq', dataset_id='testing')
By default ibis assumes that the BigQuery project that's billed for queries is
also the project where the data lives.

However, it's very easy to query data that does **not** live in the billing
project.

.. note::

When you run queries against data from other projects **the billing project
will still be billed for any and all queries**.

If you want to query data that lives in a different project than the billing
project you can use the :meth:`ibis.bigquery.client.BigQueryClient.database`
method of :class:`ibis.bigquery.client.BigQueryClient` objects:

.. code-block:: python
>>> db = con.database('other-data-project.other-dataset')
>>> t = db.my_awesome_table
>>> t.sweet_column.sum().execute() # runs against the billing project
.. _api.bigquery:

API
---
.. currentmodule:: ibis.backends.bigquery

The BigQuery client is accessible through the ``ibis.bigquery`` namespace.
See :ref:`backends.bigquery` for a tutorial on using this backend.

Use the ``ibis.bigquery.connect`` function to create a BigQuery
client. If no ``credentials`` are provided, the
:func:`pydata_google_auth.default` function fetches default credentials.

.. autosummary::
:toctree: ../generated/

connect
BigQueryClient.database
BigQueryClient.list_databases
BigQueryClient.list_tables
BigQueryClient.table

The BigQuery client object
--------------------------

To use Ibis with BigQuery, you first must connect to BigQuery using the
:func:`ibis.bigquery.connect` function, optionally supplying Google API
credentials:

.. code-block:: python
import ibis
client = ibis.bigquery.connect(
project_id=YOUR_PROJECT_ID,
dataset_id='bigquery-public-data.stackoverflow'
)
.. _udf.bigquery:

User Defined functions (UDF)
----------------------------

.. note::

BigQuery only supports element-wise UDFs at this time.

BigQuery supports UDFs through JavaScript. Ibis provides support for this by
turning Python code into JavaScript.

The interface is very similar to the pandas UDF API:

.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.bigquery import udf
@udf([dt.double], dt.double)
def my_bigquery_add_one(x):
return x + 1.0
Ibis will parse the source of the function and turn the resulting Python AST
into JavaScript source code (technically, ECMAScript 2015). Most of the Python
language is supported including classes, functions and generators.

When you want to use this function you call it like any other Python
function--only it must be called on an ibis expression:

.. code-block:: python
t = ibis.table([('a', 'double')])
expr = my_bigquery_add_one(t.a)
print(ibis.bigquery.compile(expr))
.. _bigquery-privacy:

Privacy
-------

This package is subject to the `NumFocus privacy policy
<https://numfocus.org/privacy-policy>`_. Your use of Google APIs with this
module is subject to each API's respective `terms of service
<https://developers.google.com/terms/>`_.

Google account and user data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Accessing user data
~~~~~~~~~~~~~~~~~~~

The :func:`~ibis.bigquery.api.connect` function provides access to data
stored in Google BigQuery and other sources such as Google Sheets or Cloud
Storage, via the federated query feature. Your machine communicates directly
with the Google APIs.

Storing user data
~~~~~~~~~~~~~~~~~

By default, your credentials are stored to a local file, such as
``~/.config/pydata/ibis.json``. All user data is stored on
your local machine. **Use caution when using this library on a shared
machine**.

Sharing user data
~~~~~~~~~~~~~~~~~

The BigQuery client only communicates with Google APIs. No user data is
shared with PyData, NumFocus, or any other servers.

Policies for application authors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do not use the default client ID when using Ibis from an application,
library, or tool. Per the `Google User Data Policy
<https://developers.google.com/terms/api-services-user-data-policy>`_, your
application must accurately represent itself when authenticating to Google
API services.

Extending the BigQuery backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Create a Google Cloud project.
* Set the ``GOOGLE_BIGQUERY_PROJECT_ID`` environment variable.
* Populate test data: ``python ci/datamgr.py bigquery``
* Run the test suite: ``pytest ibis/bigquery/tests``
40 changes: 40 additions & 0 deletions docs/source/backends/clickhouse.rst
@@ -0,0 +1,40 @@
.. _install.clickhouse:

`Clickhouse <https://clickhouse.yandex/>`_
------------------------------------------

Install dependencies for Ibis's Clickhouse dialect (the minimal supported version of ``clickhouse-driver`` is `0.1.3`):

::

pip install ibis-framework[clickhouse]

Create a client by passing in database connection parameters such as ``host``,
``port``, ``database``, and ``user`` to :func:`ibis.clickhouse.connect`:


.. code-block:: python
con = ibis.clickhouse.connect(host='clickhouse', port=9000)
.. _api.clickhouse:

API
===
.. currentmodule:: ibis.backends.clickhouse

The ClickHouse client is accessible through the ``ibis.clickhouse`` namespace.

Use ``ibis.clickhouse.connect`` to create a client.

.. autosummary::
:toctree: ../generated/

connect
ClickhouseClient.close
ClickhouseClient.exists_table
ClickhouseClient.exists_database
ClickhouseClient.get_schema
ClickhouseClient.set_database
ClickhouseClient.list_databases
ClickhouseClient.list_tables
215 changes: 195 additions & 20 deletions docs/source/impala.rst → docs/source/backends/impala.rst
@@ -1,10 +1,10 @@
.. currentmodule:: ibis.impala.api
.. currentmodule:: ibis.backends.impala

.. _impala:
.. _backends.impala:

**********************
Using Ibis with Impala
**********************
******
Impala
******

One goal of Ibis is to provide an integrated Python API for an Impala cluster
without requiring you to switch back and forth between Python code and the
@@ -17,6 +17,179 @@ While interoperability between the Hadoop / Spark ecosystems and pandas / the
PyData stack is overall poor (but improving), we also show some ways that you
can use pandas with Ibis and Impala.

.. _install.impala:

`Impala <https://impala.apache.org/>`_ Quickstart
-------------------------------------------------

Install dependencies for Ibis's Impala dialect:

::

pip install ibis-framework[impala]

To create an Ibis client, you must first connect your services and assemble the
client using :func:`ibis.impala.connect`:

.. code-block:: python
import ibis
hdfs = ibis.hdfs_connect(host='impala', port=50070)
con = ibis.impala.connect(
host='impala', database='ibis_testing', hdfs_client=hdfs
)
Both method calls can take ``auth_mechanism='GSSAPI'`` or
``auth_mechanism='LDAP'`` to connect to Kerberos clusters. Depending on your
cluster setup, this may also include SSL. See the :ref:`API reference
<api>` for more, along with the Impala shell reference, as the
connection semantics are identical.
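
As a sketch only (the host, port, and database are the docker test values from the
quickstart above, and the Kerberos setup itself is assumed rather than provided here),
a secured connection might look like:

.. code-block:: python

   import ibis

   # Both calls accept auth_mechanism='GSSAPI' or 'LDAP' on Kerberized clusters.
   hdfs = ibis.hdfs_connect(host='impala', port=50070, auth_mechanism='GSSAPI')
   con = ibis.impala.connect(
       host='impala',
       database='ibis_testing',
       hdfs_client=hdfs,
       auth_mechanism='GSSAPI',
   )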

API
---
.. currentmodule:: ibis.backends.impala

These methods are available on the Impala client object after connecting to
your HDFS cluster (``ibis.hdfs_connect``) and connecting to Impala with
``ibis.impala.connect``. See :ref:`backends.impala` for a tutorial on using this
backend.

.. autosummary::
:toctree: ../generated/

connect
ImpalaClient.close
ImpalaClient.database

Database methods
~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: ../generated/

ImpalaClient.set_database
ImpalaClient.create_database
ImpalaClient.drop_database
ImpalaClient.list_databases
ImpalaClient.exists_database

.. autosummary::
:toctree: ../generated/

ImpalaDatabase.create_table
ImpalaDatabase.drop
ImpalaDatabase.namespace
ImpalaDatabase.table

Table methods
~~~~~~~~~~~~~

The ``ImpalaClient`` object itself has many helper utility methods. You'll find
the most methods on ``ImpalaTable``.

.. autosummary::
:toctree: ../generated/

ImpalaClient.database
ImpalaClient.table
ImpalaClient.sql
ImpalaClient.raw_sql
ImpalaClient.list_tables
ImpalaClient.exists_table
ImpalaClient.drop_table
ImpalaClient.create_table
ImpalaClient.insert
ImpalaClient.invalidate_metadata
ImpalaClient.truncate_table
ImpalaClient.get_schema
ImpalaClient.cache_table
ImpalaClient.load_data
ImpalaClient.get_options
ImpalaClient.set_options
ImpalaClient.set_compression_codec


The best way to interact with a single table is through the ``ImpalaTable``
object you get back from ``ImpalaClient.table``.

.. autosummary::
:toctree: ../generated/

ImpalaTable.add_partition
ImpalaTable.alter
ImpalaTable.alter_partition
ImpalaTable.column_stats
ImpalaTable.compute_stats
ImpalaTable.describe_formatted
ImpalaTable.drop
ImpalaTable.drop_partition
ImpalaTable.files
ImpalaTable.insert
ImpalaTable.invalidate_metadata
ImpalaTable.is_partitioned
ImpalaTable.load_data
ImpalaTable.metadata
ImpalaTable.partition_schema
ImpalaTable.partitions
ImpalaTable.refresh
ImpalaTable.rename
ImpalaTable.schema
ImpalaTable.stats

Creating views is also possible:

.. autosummary::
:toctree: ../generated/

ImpalaClient.create_view
ImpalaClient.drop_view
ImpalaClient.drop_table_or_view

Accessing data formats in HDFS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: ../generated/

ImpalaClient.avro_file
ImpalaClient.delimited_file
ImpalaClient.parquet_file

Executing expressions
~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: ../generated/

ImpalaClient.execute
ImpalaClient.disable_codegen

.. _api.hdfs:

HDFS
~~~~

Client objects have an ``hdfs`` attribute you can use to interact directly with
HDFS.

.. autosummary::
:toctree: generated/

hdfs_connect
HDFS.ls
HDFS.chmod
HDFS.chown
HDFS.get
HDFS.head
HDFS.put
HDFS.put_tarfile
HDFS.rm
HDFS.rmdir
HDFS.size
HDFS.status


The Impala client object
------------------------

@@ -40,7 +213,7 @@
import ibis
host = 'impala'
hdfs = ibis.hdfs_connect(host=host)
hdfs = ibis.impala.hdfs_connect(host=host)
client = ibis.impala.connect(host=host, hdfs_client=hdfs)
You can accomplish many tasks directly through the client object, but we
@@ -51,7 +224,7 @@
--------------------------

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaClient.database
ImpalaClient.table
@@ -83,7 +256,7 @@ Like all table expressions in Ibis, ``ImpalaTable`` has a ``schema`` method you
can use to examine its schema:

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.schema

@@ -124,7 +297,7 @@ In all cases, you should use the ``create_table`` method either on the
top-level client connection or a database object.

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaClient.create_table
ImpalaDatabase.create_table
@@ -217,7 +390,7 @@ There are a handful of table methods for adding and removing partitions and
getting information about the partition schema and any existing partition data:

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.add_partition
ImpalaTable.drop_partition
@@ -287,7 +460,7 @@ To get a handy wrangled version of ``DESCRIBE FORMATTED`` use the ``metadata``
method.

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.metadata

@@ -306,7 +479,7 @@ The ``files`` function is also available to see all of the physical HDFS data
files backing a table:

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.files

@@ -338,7 +511,7 @@ location, file format, and other properties. For partitioned tables, to change
partition-specific metadata use ``alter_partition``.

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.alter
ImpalaTable.alter_partition
@@ -377,7 +550,7 @@ Computing table and partition statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.compute_stats

@@ -400,7 +573,7 @@ INCREMENTAL STATS`` DDL command:
Seeing table and column statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaTable.column_stats
ImpalaTable.stats
@@ -476,7 +649,7 @@ depend, of course, on the last ``COMPUTE STATS`` call.
These DDL commands are available as table-level and client-level methods:

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaClient.invalidate_metadata
ImpalaTable.invalidate_metadata
@@ -509,7 +682,7 @@ manually moving files with low level HDFS commands. It also deals with file
name conflicts so data is not lost in such cases.

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaClient.load_data
ImpalaTable.load_data
@@ -538,7 +711,7 @@ Ibis gives you access to Impala session-level variables that affect query
execution:

.. autosummary::
:toctree: generated/
:toctree: ../generated/

ImpalaClient.disable_codegen
ImpalaClient.get_options
@@ -627,8 +800,10 @@ For example:
to_insert.execute()
to_insert.drop()
Using Impala UDFs in Ibis
-------------------------
.. _udf.impala:

User Defined functions (UDF)
----------------------------

Impala currently supports user-defined scalar functions (known henceforth as
*UDFs*) and aggregate functions (respectively *UDAs*) via a C++ extension API.
32 changes: 21 additions & 11 deletions docs/source/backends.rst → docs/source/backends/index.rst
@@ -6,10 +6,23 @@ Backends
This document describes the classes of backends, how they work, and any details
about each backend that are relevant to end users.

.. _classes_of_backends:
For more information on a specific backend, see the individual backend pages:

.. toctree::
:maxdepth: 1

sqlite
postgres
mysql
impala
omnisci
bigquery
clickhouse
spark
pandas

Classes of Backends
-------------------

.. _classes_of_backends:

There are currently three classes of backends that live in ibis.

@@ -19,8 +32,7 @@ There are currently three classes of backends that live in ibis.

.. _string_generating_backends:

String Generating Backends
~~~~~~~~~~~~~~~~~~~~~~~~~~
**String Generating Backends**

The first category of backend translates ibis expressions into strings.
Generally speaking these backends also need to handle their own execution.
@@ -31,13 +43,12 @@ string to the database through a driver API.
- `Yandex Clickhouse <https://clickhouse.yandex/>`_
- `Google BigQuery <https://cloud.google.com/bigquery/>`_
- `Hadoop Distributed File System (HDFS) <https://hadoop.apache.org/>`_
- `OmniSciDB <https://www.omnisci.com/>`_ (Experimental)
- `OmniSciDB <https://www.omnisci.com/>`_
- `PySpark/Spark SQL <https://spark.apache.org/sql/>`_ (Experimental)

.. _expression_generating_backends:

Expression Generating Backends
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Expression Generating Backends**

The second category of backends translates ibis expressions into other
expressions. Currently, all expression generating backends generate `SQLAlchemy
Expand All @@ -54,11 +65,10 @@ dependencies).

.. _direct_execution_backends:

Direct Execution Backends
~~~~~~~~~~~~~~~~~~~~~~~~~
**Direct Execution Backends**

The only existing backend that directly executes ibis expressions is the pandas
backend. A full description of the implementation can be found in the module
docstring of the pandas backend located in ``ibis/pandas/execution/core.py``.
docstring of the pandas backend located in ``ibis/backends/pandas/core.py``.

- `Pandas <http://pandas.pydata.org/>`_
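
By way of contrast, a minimal sketch of the direct-execution path (mirroring the
pandas examples elsewhere in these docs; the DataFrame is made up):

.. code-block:: python

   import pandas as pd
   import ibis

   # No SQL is generated; the expression is evaluated directly on the DataFrame.
   con = ibis.pandas.connect({'df': pd.DataFrame({'a': [1, 2, 3]})})
   t = con.table('df')
   print(t.a.sum().execute())  # 6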
43 changes: 43 additions & 0 deletions docs/source/backends/mysql.rst
@@ -0,0 +1,43 @@
.. _install.mysql:

`MySQL <https://www.mysql.com/>`_
=================================

Install dependencies for Ibis's MySQL dialect:

::

pip install ibis-framework[mysql]

Create a client by passing a connection string or individual parameters to
:func:`ibis.mysql.connect`:

.. code-block:: python
con = ibis.mysql.connect(url='mysql+pymysql://ibis:ibis@mysql/ibis_testing')
con = ibis.mysql.connect(
user='ibis',
password='ibis',
host='mysql',
database='ibis_testing',
)
.. _api.mysql:

API
---
.. currentmodule:: ibis.backends.mysql

The MySQL client is accessible through the ``ibis.mysql`` namespace.

Use ``ibis.mysql.connect`` with a SQLAlchemy-compatible connection string to
create a client.

.. autosummary::
:toctree: ../generated/

connect
MySQLClient.database
MySQLClient.list_databases
MySQLClient.list_tables
MySQLClient.table
245 changes: 155 additions & 90 deletions ibis/omniscidb/README.rst → docs/source/backends/omnisci.rst

Large diffs are not rendered by default.

77 changes: 22 additions & 55 deletions docs/source/udf.rst → docs/source/backends/pandas.rst
@@ -1,23 +1,26 @@
.. _udf:
`pandas <https://pandas.pydata.org/>`_
======================================

User Defined Functions
======================
Ibis's pandas backend is available in core Ibis:

Ibis provides a mechanism for writing custom scalar and aggregate functions,
with varying levels of support for different backends. UDFs/UDAFs are a complex
topic.
Create a client by supplying a dictionary of DataFrames using
:func:`ibis.pandas.connect`. The keys become the table names:

This section of the documentation will discuss some of the backend specific
details of user defined functions.

.. warning::
.. code-block:: python
The UDF API is provisional and subject to change.
import pandas as pd
con = ibis.pandas.connect(
{
'A': pd._testing.makeDataFrame(),
'B': pd._testing.makeDataFrame(),
}
)
.. _udf.pandas:

Pandas
------
User Defined functions (UDF)
----------------------------

Ibis supports defining three kinds of user-defined functions for operations on
expressions targeting the pandas backend: **element-wise**, **reduction**, and
**analytic**.
@@ -35,7 +38,7 @@ Here's how to define an element-wise function:
.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.pandas import udf
from ibis.backends.pandas import udf
@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
@@ -55,7 +58,7 @@ Here's how to define a reduction function:
.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.pandas import udf
from ibis.backends.pandas import udf
@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
@@ -75,7 +78,7 @@ Here's how to define an analytic function:
.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.pandas import udf
from ibis.backends.pandas import udf
@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
@@ -109,7 +112,7 @@ Using ``add_one`` from above as an example, the following call will receive a
import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
con = ibis.backends.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr
@@ -127,7 +130,7 @@ in your function:
.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.pandas import udf
from ibis.backends.pandas import udf
@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs):
@@ -142,46 +145,10 @@ For example:
.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.pandas import udf
from ibis.backends.pandas import udf
@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
if y is None:
y = 2.0
return x + y
BigQuery
--------

.. _udf.bigquery:

.. note::

BigQuery only supports element-wise UDFs at this time.

BigQuery supports UDFs through JavaScript. Ibis provides support for this by
turning Python code into JavaScript.

The interface is very similar to the pandas UDF API:

.. code-block:: python
import ibis.expr.datatypes as dt
from ibis.bigquery import udf
@udf([dt.double], dt.double)
def my_bigquery_add_one(x):
return x + 1.0
Ibis will parse the source of the function and turn the resulting Python AST
into JavaScript source code (technically, ECMAScript 2015). Most of the Python
language is supported including classes, functions and generators.

When you want to use this function you call it like any other Python
function--only it must be called on an ibis expression:

.. code-block:: python
t = ibis.table([('a', 'double')])
expr = my_bigquery_add_one(t.a)
print(ibis.bigquery.compile(expr))
46 changes: 46 additions & 0 deletions docs/source/backends/postgres.rst
@@ -0,0 +1,46 @@
.. _install.postgres:

`PostgreSQL <https://www.postgresql.org/>`_
===========================================

Install dependencies for Ibis's PostgreSQL dialect:

::

pip install ibis-framework[postgres]

Create a client by passing a connection string to the ``url`` parameter or
individual parameters to :func:`ibis.postgres.connect`:

.. code-block:: python
con = ibis.postgres.connect(
url='postgresql://postgres:postgres@postgres:5432/ibis_testing'
)
con = ibis.postgres.connect(
user='postgres',
password='postgres',
host='postgres',
port=5432,
database='ibis_testing',
)
.. _api.postgres:

API
---
.. currentmodule:: ibis.backends.postgres

The PostgreSQL client is accessible through the ``ibis.postgres`` namespace.

Use ``ibis.postgres.connect`` with a SQLAlchemy-compatible connection string to
create a client.

.. autosummary::
:toctree: ../generated/

connect
PostgreSQLClient.database
PostgreSQLClient.list_tables
PostgreSQLClient.list_databases
PostgreSQLClient.table
58 changes: 58 additions & 0 deletions docs/source/backends/spark.rst
@@ -0,0 +1,58 @@
.. _install.spark:

`PySpark/Spark SQL <https://spark.apache.org/sql/>`_
====================================================

Install dependencies for Ibis's Spark dialect:

::

pip install ibis-framework[spark]

Create a client by passing in the spark session as a parameter to
:func:`ibis.spark.connect`:

.. code-block:: python
con = ibis.spark.connect(spark_session)
.. _api.spark:

API
---

SparkSQL client (Experimental)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: ibis.backends.spark

The Spark SQL client is accessible through the ``ibis.spark`` namespace.

Use ``ibis.spark.connect`` to create a client.

.. autosummary::
:toctree: ../generated/

connect
SparkClient.database
SparkClient.list_databases
SparkClient.list_tables
SparkClient.table

.. _api.pyspark:

PySpark client (Experimental)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: ibis.backends.pyspark

The PySpark client is accessible through the ``ibis.pyspark`` namespace.

Use ``ibis.pyspark.connect`` to create a client.

.. autosummary::
:toctree: ../generated/

connect
PySparkClient.database
PySparkClient.list_databases
PySparkClient.list_tables
PySparkClient.table
40 changes: 40 additions & 0 deletions docs/source/backends/sqlite.rst
@@ -0,0 +1,40 @@
.. _install.sqlite:

`SQLite <https://www.sqlite.org/>`_
===================================

Install dependencies for Ibis's SQLite dialect:

::

pip install ibis-framework[sqlite]

Create a client by passing a path to a SQLite database to
:func:`ibis.sqlite.connect`:

.. code-block:: python
>>> import ibis
>>> ibis.sqlite.connect('path/to/my/sqlite.db')
See http://blog.ibis-project.org/sqlite-crunchbase-quickstart/ for a quickstart
using SQLite.

.. _api.sqlite:

API
---
.. currentmodule:: ibis.backends.sqlite

The SQLite client is accessible through the ``ibis.sqlite`` namespace.

Use ``ibis.sqlite.connect`` to create a SQLite client.

.. autosummary::
:toctree: ../generated/

connect
SQLiteClient.attach
SQLiteClient.database
SQLiteClient.list_tables
SQLiteClient.table
18 changes: 15 additions & 3 deletions docs/source/conf.py
@@ -14,6 +14,7 @@

import datetime
import glob
import os

import sphinx_rtd_theme # noqa: E402

@@ -35,6 +36,7 @@
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.extlinks',
'sphinx.ext.intersphinx',
'sphinx.ext.mathjax',
'sphinx.ext.napoleon',
'nbsphinx',
@@ -46,9 +48,11 @@
napoleon_numpy_docstring = True
releases_github_path = "ibis-project/ibis"
releases_unstable_prehistory = True
releases_document_name = ["release"]
releases_document_name = [os.path.join("release", "index")]
ipython_warning_is_error = True
autosummary_generate = glob.glob("*.rst")
autosummary_generate = glob.glob("*.rst") + glob.glob(
os.path.join("backends", "*.rst")
)

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -89,7 +93,7 @@

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build', '**.ipynb_checkpoints']
exclude_patterns = ['_build', '**.ipynb_checkpoints', 'tutorial/data']

# The reST default role (used for this markup: `text`) to use for all
# documents.
@@ -115,6 +119,14 @@
# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False

# -- Options for intersphinx ----------------------------------------------
intersphinx_mapping = {
'python': ('https://docs.python.org/3', None),
'pydata-google-auth': (
'https://pydata-google-auth.readthedocs.io/en/latest/',
None,
),
}

# -- Options for HTML output ----------------------------------------------

181 changes: 0 additions & 181 deletions docs/source/contributing.rst

This file was deleted.
