267 changes: 213 additions & 54 deletions .github/workflows/ibis-backends.yml

Large diffs are not rendered by default.

34 changes: 26 additions & 8 deletions .github/workflows/ibis-docs-lint.yml
@@ -31,7 +31,7 @@ jobs:
fetch-depth: 0

- name: install nix
uses: cachix/install-nix-action@v20
uses: cachix/install-nix-action@v22
with:
nix_path: nixpkgs=channel:nixos-unstable-small
extra_nix_config: |
@@ -47,7 +47,7 @@ jobs:
uses: actions/checkout@v3

- name: install nix
uses: cachix/install-nix-action@v20
uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
@@ -111,18 +111,28 @@ jobs:
tool: pytest
github-token: ${{ steps.generate-token.outputs.token }}
output-file-path: .benchmarks/output.json
benchmark-data-dir-path: bench
auto-push: true
benchmark-data-dir-path: ./bench
auto-push: false
comment-on-alert: true
alert-threshold: "300%"

- name: checkout gh-pages
run: git checkout gh-pages

- name: upload benchmark data
uses: actions/upload-artifact@v3
with:
name: bench
path: ./bench
if-no-files-found: error

docs_pr:
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
concurrency: docs-${{ github.repository }}-${{ github.head_ref || github.sha }}
steps:
- name: install nix
uses: cachix/install-nix-action@v20
uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
@@ -155,7 +165,7 @@ jobs:
- benchmarks
steps:
- name: install nix
uses: cachix/install-nix-action@v20
uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
@@ -186,11 +196,19 @@ jobs:
git config user.name 'ibis-docs-bot[bot]'
git config user.email 'ibis-docs-bot[bot]@users.noreply.github.com'
git config http.postBuffer 157286400
git config http.version 'HTTP/1.1'
- name: download benchmark data
uses: actions/download-artifact@v3
with:
name: bench
path: docs/bench

- name: build and push dev docs
run: |
nix develop --ignore-environment -c \
mkdocs gh-deploy --message 'docs: ibis@${{ github.sha }}'
mkdocs gh-deploy --message 'docs: ibis@${{ github.sha }}' --ignore-version
simulate_release:
runs-on: ubuntu-latest
@@ -199,7 +217,7 @@
with:
fetch-depth: 0

- uses: cachix/install-nix-action@v20
- uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
15 changes: 0 additions & 15 deletions .github/workflows/ibis-main.yml
@@ -44,7 +44,6 @@ jobs:
- ubuntu-latest
- windows-latest
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
@@ -103,13 +102,6 @@
with:
flags: core,${{ runner.os }},python-${{ steps.install_python.outputs.python-version }}

- name: publish test report
uses: actions/upload-artifact@v3
if: success() || failure()
with:
name: no-backends-${{ matrix.os }}-${{ matrix.python-version }}
path: junit.xml

test_shapely_duckdb_import:
name: Test shapely and duckdb import
runs-on: ${{ matrix.os }}
@@ -210,10 +202,3 @@ jobs:
uses: codecov/codecov-action@v3
with:
flags: core,doctests,${{ runner.os }},python-${{ steps.install_python.outputs.python-version }}

- name: publish test report
uses: actions/upload-artifact@v3
if: success() || failure()
with:
name: doctest-${{ matrix.os }}-${{ matrix.python-version }}
path: junit.xml
1 change: 0 additions & 1 deletion .github/workflows/nix-skip-helper.yml
@@ -31,7 +31,6 @@ jobs:
os:
- ubuntu-latest
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
3 changes: 1 addition & 2 deletions .github/workflows/nix.yml
@@ -32,7 +32,6 @@ jobs:
os:
- ubuntu-latest
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
@@ -44,7 +43,7 @@
uses: actions/checkout@v3

- name: install nix
uses: cachix/install-nix-action@v20
uses: cachix/install-nix-action@v22
with:
nix_path: nixpkgs=channel:nixos-unstable-small
extra_nix_config: |
63 changes: 63 additions & 0 deletions .github/workflows/pre-release.yml
@@ -0,0 +1,63 @@
name: PyPI Pre-Release

on:
schedule:
# weekly on Sunday
- cron: "0 0 * * 0"

# as needed by clicking through the github actions UI
workflow_dispatch:

# we do not want more than one pre-release workflow executing at the same time, ever
concurrency:
group: pre-release
# cancelling in the middle of a release would create incomplete releases
# so cancel-in-progress is false
cancel-in-progress: false

jobs:
pre-release:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0

- name: install python
uses: actions/setup-python@v4
with:
python-version: "3.10"

- name: upgrade pip
run: python -m pip install --upgrade pip

- name: install poetry
run: python -m pip install 'poetry<1.4' poetry-dynamic-versioning

- name: compute ibis version
id: get_version
run: echo "value=$(poetry version)" >> "$GITHUB_OUTPUT"
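# Note: `poetry version` prints "<package-name> <version>". With
# poetry-dynamic-versioning, a non-release commit yields a version containing
# ".dev" (e.g. "6.1.0.dev4" -- illustrative value), which is what the
# `contains(..., '.dev')` guards on the following steps check for.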

- name: run some poetry sanity checks
run: poetry check
if: contains(steps.get_version.outputs.value, '.dev')

- name: build wheel and source dist
run: poetry build
if: contains(steps.get_version.outputs.value, '.dev')

- name: add test pypi index
if: contains(steps.get_version.outputs.value, '.dev')
run: poetry config repositories.test-pypi https://test.pypi.org/legacy/

- name: publish pre-release wheel to test pypi index
if: contains(steps.get_version.outputs.value, '.dev')
run: poetry publish -r test-pypi
env:
POETRY_PYPI_TOKEN_TEST_PYPI: ${{ secrets.TEST_PYPI_TOKEN }}

- name: publish pre-release wheel to pypi
if: contains(steps.get_version.outputs.value, '.dev')
run: poetry publish
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -25,7 +25,7 @@ jobs:
fetch-depth: 0
token: ${{ steps.generate_token.outputs.token }}

- uses: cachix/install-nix-action@v20
- uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
30 changes: 0 additions & 30 deletions .github/workflows/test-report.yml

This file was deleted.

6 changes: 3 additions & 3 deletions .github/workflows/update-deps.yml
@@ -12,7 +12,7 @@ jobs:
matrix: ${{ steps.get-flakes.outputs.matrix }}
steps:
- uses: actions/checkout@v3
- uses: cachix/install-nix-action@v20
- uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
@@ -34,7 +34,7 @@
steps:
- uses: actions/checkout@v3

- uses: cachix/install-nix-action@v20
- uses: cachix/install-nix-action@v22
with:
extra_nix_config: |
access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
@@ -76,7 +76,7 @@ jobs:
app_id: ${{ secrets.PR_APPROVAL_BOT_APP_ID }}
private_key: ${{ secrets.PR_APPROVAL_BOT_APP_PRIVATE_KEY }}

- uses: cpcloud/compare-commits-action@v5.0.28
- uses: cpcloud/compare-commits-action@v5.0.33
if: fromJSON(steps.needs_pr.outputs.did_change)
id: compare_commits
with:
9 changes: 9 additions & 0 deletions .gitignore
@@ -39,6 +39,15 @@ dist
.coverage
coverage.xml

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# OS generated files
.directory
.gdb_history
19 changes: 15 additions & 4 deletions .pre-commit-config.yaml
@@ -3,7 +3,7 @@ ci:
autofix_prs: false
autoupdate_commit_msg: "chore(deps): pre-commit.ci autoupdate"
skip:
- actionlint
- actionlint-system
- deadnix
- just
- nixpkgs-fmt
@@ -17,13 +17,24 @@ default_stages:
- commit
repos:
- repo: https://github.com/rhysd/actionlint
rev: v1.6.24
rev: v1.6.25
hooks:
- id: actionlint
- id: actionlint-system
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/kynan/nbstripout
rev: 0.6.1
hooks:
- id: nbstripout
exclude: .+/rendered/.+
- repo: https://github.com/codespell-project/codespell
rev: v2.2.5
hooks:
- id: codespell
additional_dependencies:
- tomli
- repo: local
hooks:
- id: ruff
@@ -38,7 +49,7 @@ repos:
require_serial: true
minimum_pre_commit_version: "2.9.2"
- repo: https://github.com/adrienverge/yamllint
rev: v1.30.0
rev: v1.32.0
hooks:
- id: yamllint
- repo: https://github.com/pre-commit/pre-commit-hooks
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
@@ -46,5 +46,5 @@ members of the project's leadership.

## Attribution

Parts of this CoC are adapated from the [Dask code of
Parts of this CoC are adapted from the [Dask code of
conduct](https://github.com/dask/governance/blob/main/code-of-conduct.md).
33 changes: 17 additions & 16 deletions README.md
@@ -52,23 +52,24 @@ Ibis aims to be a future-proof solution to interacting with data using Python an

Ibis acts as a universal frontend to the following systems:

- [Apache Arrow DataFusion](https://ibis-project.org/backends/Datafusion/) (experimental)
- [Apache Druid](https://ibis-project.org/backends/Druid/) (experimental)
- [Apache Impala](https://ibis-project.org/backends/Impala/)
- [Apache PySpark](https://ibis-project.org/backends/PySpark/)
- [BigQuery](https://ibis-project.org/backends/BigQuery/)
- [ClickHouse](https://ibis-project.org/backends/ClickHouse/)
- [Dask](https://ibis-project.org/backends/Dask/)
- [DuckDB](https://ibis-project.org/backends/DuckDB/)
- [Apache Arrow DataFusion](https://ibis-project.org/backends/datafusion/) (experimental)
- [Apache Druid](https://ibis-project.org/backends/druid/) (experimental)
- [Apache Impala](https://ibis-project.org/backends/impala/)
- [Apache PySpark](https://ibis-project.org/backends/pyspark/)
- [BigQuery](https://ibis-project.org/backends/bigquery/)
- [ClickHouse](https://ibis-project.org/backends/clickhouse/)
- [Dask](https://ibis-project.org/backends/dask/)
- [DuckDB](https://ibis-project.org/backends/duckdb/)
- [HeavyAI](https://github.com/heavyai/ibis-heavyai)
- [MySQL](https://ibis-project.org/backends/MySQL/)
- [Pandas](https://ibis-project.org/backends/Pandas/)
- [Polars](https://ibis-project.org/backends/Polars/) (experimental)
- [PostgreSQL](https://ibis-project.org/backends/PostgreSQL/)
- [SQL Server](https://ibis-project.org/backends/MSSQL/)
- [SQLite](https://ibis-project.org/backends/SQLite/)
- [Snowflake](https://ibis-project.org/backends/Snowflake) (experimental)
- [Trino](https://ibis-project.org/backends/Trino/) (experimental)
- [MySQL](https://ibis-project.org/backends/mysql/)
- [Oracle](https://ibis-project.org/backends/oracle/) (experimental)
- [Pandas](https://ibis-project.org/backends/pandas/)
- [Polars](https://ibis-project.org/backends/polars/) (experimental)
- [PostgreSQL](https://ibis-project.org/backends/postgresql/)
- [SQL Server](https://ibis-project.org/backends/mssql/)
- [SQLite](https://ibis-project.org/backends/sqlite/)
- [Snowflake](https://ibis-project.org/backends/snowflake) (experimental)
- [Trino](https://ibis-project.org/backends/trino/) (experimental)

The list of supported backends is continuously growing. Anyone can get involved
in adding new ones! Learn more about contributing to ibis in our contributing
54 changes: 54 additions & 0 deletions ci/check_disallowed_imports.py
@@ -0,0 +1,54 @@
#!/usr/bin/env python3

import collections
import fnmatch
import json
import pathlib
import subprocess
import sys

CURRENT_DIR = pathlib.Path(__file__).parent.absolute()
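
# Example invocation (illustrative arguments; everything after the script name
# is forwarded verbatim to pydeps):
#
#   python ci/check_disallowed_imports.py ibis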


def generate_dependency_graph(*args):
command = ("pydeps", "--show-deps", *args)
print(f"Running: {' '.join(command)}") # noqa: T201
result = subprocess.check_output(command, text=True)
return json.loads(result)


def check_dependency_rules(dependency_graph, disallowed_imports):
prohibited_deps = collections.defaultdict(set)

for module, module_data in dependency_graph.items():
imports = module_data.get("imports", [])

for pattern, disallow_rules in disallowed_imports.items():
if fnmatch.fnmatch(module, pattern):
for disallow_rule in disallow_rules:
for imported in imports:
if fnmatch.fnmatch(imported, disallow_rule):
prohibited_deps[module].add(imported)

return prohibited_deps


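# Modules whose name matches a key glob must not import any module matching the
# associated value globs; both sides are matched with fnmatch.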
disallowed_imports = {
"ibis.expr.*": ["numpy", "pandas"],
}


if __name__ == '__main__':
dependency_graph = generate_dependency_graph(*sys.argv[1:])
prohibited_deps = check_dependency_rules(dependency_graph, disallowed_imports)

print("\n") # noqa: T201
print("Prohibited dependencies:") # noqa: T201
print("------------------------") # noqa: T201
for module, deps in prohibited_deps.items():
print(f"\n{module}:") # noqa: T201
for dep in deps:
print(f" <= {dep}") # noqa: T201

if prohibited_deps:
sys.exit(1)
50 changes: 22 additions & 28 deletions ci/conda-lock/generate.sh
@@ -13,11 +13,16 @@ python_version_file="$(mktemp --suffix=.yml)"

extras=(
-e bigquery
-e clickhouse
-e dask
-e druid
-e duckdb
# this doesn't work on any platform yet (issues with resolving some google deps)
# -e geospatial
-e impala
-e mssql
-e mysql
-e oracle
-e pandas
-e polars
-e postgres
@@ -30,32 +35,21 @@ extras=(
)
template="conda-lock/{platform}-${python_version}.lock"

linux_osx_extras=()
if [ "${python_version}" != "3.11" ]; then
# clickhouse cityhash doesn't exist for python 3.11
linux_osx_extras+=(-e clickhouse)
fi
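
# conda_lock PLATFORM1 PLATFORM2 [EXTRA_ARGS...]
# Locks dependencies for the two given platforms in a single `conda lock` run;
# any remaining arguments (such as extra -e flags) are passed through unchanged.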
function conda_lock() {
local platforms
platforms=(--platform "$1" --platform "$2")
shift 2
conda lock \
--file pyproject.toml \
--file "${python_version_file}" \
--kind explicit \
"${platforms[@]}" \
--filename-template "${template}" \
--filter-extras \
--conda="$(which conda)" \
--category dev --category test --category docs \
"${@}"
}

conda lock \
--file pyproject.toml \
--file "${python_version_file}" \
--kind explicit \
--platform linux-64 \
--platform osx-64 \
--filename-template "${template}" \
--filter-extras \
--conda="$(which conda)" \
--category dev --category test --category docs \
"${extras[@]}" "${linux_osx_extras[@]}" -e datafusion

conda lock \
--file pyproject.toml \
--file "${python_version_file}" \
--kind explicit \
--platform osx-arm64 \
--platform win-64 \
--filename-template "${template}" \
--filter-extras \
--conda="$(which conda)" \
--category dev --category test --category docs \
"${extras[@]}"
conda_lock linux-64 osx-64 "${extras[@]}" -e datafusion
conda_lock osx-arm64 win-64 "${extras[@]}"
35 changes: 17 additions & 18 deletions ci/schema/clickhouse.sql
@@ -1,25 +1,24 @@
-- NB: The paths in this file are all relative to /var/lib/clickhouse/user_files

CREATE OR REPLACE TABLE diamonds ENGINE = Memory AS
CREATE OR REPLACE TABLE ibis_testing.diamonds ENGINE = Memory AS
SELECT * FROM file('ibis/diamonds.parquet', 'Parquet');

CREATE OR REPLACE TABLE batting ENGINE = Memory AS
CREATE OR REPLACE TABLE ibis_testing.batting ENGINE = Memory AS
SELECT * FROM file('ibis/batting.parquet', 'Parquet');

CREATE OR REPLACE TABLE awards_players ENGINE = Memory AS
CREATE OR REPLACE TABLE ibis_testing.awards_players ENGINE = Memory AS
SELECT * FROM file('ibis/awards_players.parquet', 'Parquet');

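-- ClickHouse's `SELECT * REPLACE(<expr> AS col)` modifier keeps every column
-- from the source file but substitutes the casted expression for timestamp_col.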
CREATE OR REPLACE TABLE functional_alltypes ENGINE = Memory AS
CREATE OR REPLACE TABLE ibis_testing.functional_alltypes ENGINE = Memory AS
SELECT * REPLACE(CAST(timestamp_col AS Nullable(DateTime)) AS timestamp_col)
FROM file('ibis/functional_alltypes.parquet', 'Parquet');

CREATE OR REPLACE TABLE tzone (
CREATE OR REPLACE TABLE ibis_testing.tzone (
ts Nullable(DateTime),
key Nullable(String),
value Nullable(Float64)
) ENGINE = Memory;

CREATE OR REPLACE TABLE array_types (
CREATE OR REPLACE TABLE ibis_testing.array_types (
x Array(Nullable(Int64)),
y Array(Nullable(String)),
z Array(Nullable(Float64)),
@@ -28,39 +27,39 @@ CREATE OR REPLACE TABLE array_types (
multi_dim Array(Array(Nullable(Int64)))
) ENGINE = Memory;

INSERT INTO array_types VALUES
INSERT INTO ibis_testing.array_types VALUES
([1, 2, 3], ['a', 'b', 'c'], [1.0, 2.0, 3.0], 'a', 1.0, [[], [1, 2, 3], []]),
([4, 5], ['d', 'e'], [4.0, 5.0], 'a', 2.0, []),
([6, NULL], ['f', NULL], [6.0, NULL], 'a', 3.0, [[], [], []]),
([NULL, 1, NULL], [NULL, 'a', NULL], [], 'b', 4.0, [[1], [2], [], [3, 4, 5]]),
([2, NULL, 3], ['b', NULL, 'c'], NULL, 'b', 5.0, []),
([4, NULL, NULL, 5], ['d', NULL, NULL, 'e'], [4.0, NULL, NULL, 5.0], 'c', 6.0, [[1, 2, 3]]);

CREATE OR REPLACE TABLE time_df1 (
CREATE OR REPLACE TABLE ibis_testing.time_df1 (
time Int64,
value Nullable(Float64),
key Nullable(String)
) ENGINE = Memory;
INSERT INTO time_df1 VALUES
INSERT INTO ibis_testing.time_df1 VALUES
(1, 1.0, 'x'),
(20, 20.0, 'x'),
(30, 30.0, 'x'),
(40, 40.0, 'x'),
(50, 50.0, 'x');

CREATE OR REPLACE TABLE time_df2 (
CREATE OR REPLACE TABLE ibis_testing.time_df2 (
time Int64,
value Nullable(Float64),
key Nullable(String)
) ENGINE = Memory;
INSERT INTO time_df2 VALUES
INSERT INTO ibis_testing.time_df2 VALUES
(19, 19.0, 'x'),
(21, 21.0, 'x'),
(39, 39.0, 'x'),
(49, 49.0, 'x'),
(1000, 1000.0, 'x');

CREATE OR REPLACE TABLE struct (
CREATE OR REPLACE TABLE ibis_testing.struct (
abc Tuple(
a Nullable(Float64),
b Nullable(String),
Expand All @@ -70,7 +69,7 @@ CREATE OR REPLACE TABLE struct (

-- NULL is the same as tuple(NULL, NULL, NULL) because clickhouse doesn't
-- support Nullable(Tuple(...))
INSERT INTO struct VALUES
INSERT INTO ibis_testing.struct VALUES
(tuple(1.0, 'banana', 2)),
(tuple(2.0, 'apple', 3)),
(tuple(3.0, 'orange', 4)),
@@ -79,14 +78,14 @@ INSERT INTO struct VALUES
(tuple(NULL, NULL, NULL)),
(tuple(3.0, 'orange', NULL));

CREATE OR REPLACE TABLE map (kv Map(String, Nullable(Int64))) ENGINE = Memory;
CREATE OR REPLACE TABLE ibis_testing.map (kv Map(String, Nullable(Int64))) ENGINE = Memory;

INSERT INTO map VALUES
INSERT INTO ibis_testing.map VALUES
(map('a', 1, 'b', 2, 'c', 3)),
(map('d', 4, 'e', 5, 'c', 6));

CREATE OR REPLACE TABLE win (g String, x Int64, y Int64) ENGINE = Memory;
INSERT INTO win VALUES
CREATE OR REPLACE TABLE ibis_testing.win (g String, x Int64, y Int64) ENGINE = Memory;
INSERT INTO ibis_testing.win VALUES
('a', 0, 3),
('a', 1, 2),
('a', 2, 0),
10 changes: 5 additions & 5 deletions ci/schema/druid.sql
@@ -3,7 +3,7 @@ OVERWRITE ALL
SELECT *
FROM TABLE(
EXTERN(
'{"type":"local","files":["/opt/shared/diamonds.parquet"]}',
'{"type":"local","files":["/data/diamonds.parquet"]}',
'{"type":"parquet"}',
'[{"name":"carat","type":"double"},{"name":"cut","type":"string"},{"name":"color","type":"string"},{"name":"clarity","type":"string"},{"name":"depth","type":"double"},{"name":"table","type":"double"},{"name":"price","type":"long"},{"name":"x","type":"double"},{"name":"y","type":"double"},{"name":"z","type":"double"}]'
)
@@ -15,7 +15,7 @@ OVERWRITE ALL
SELECT *
FROM TABLE(
EXTERN(
'{"type":"local","files":["/opt/shared/batting.parquet"]}',
'{"type":"local","files":["/data/batting.parquet"]}',
'{"type":"parquet"}',
'[{"name":"playerID","type":"string"},{"name":"yearID","type":"long"},{"name":"stint","type":"long"},{"name":"teamID","type":"string"},{"name":"lgID","type":"string"},{"name":"G","type":"long"},{"name":"AB","type":"long"},{"name":"R","type":"long"},{"name":"H","type":"long"},{"name":"X2B","type":"long"},{"name":"X3B","type":"long"},{"name":"HR","type":"long"},{"name":"RBI","type":"long"},{"name":"SB","type":"long"},{"name":"CS","type":"long"},{"name":"BB","type":"long"},{"name":"SO","type":"long"},{"name":"IBB","type":"long"},{"name":"HBP","type":"long"},{"name":"SH","type":"long"},{"name":"SF","type":"long"},{"name":"GIDP","type":"long"}]'
)
@@ -27,7 +27,7 @@ OVERWRITE ALL
SELECT *
FROM TABLE(
EXTERN(
'{"type":"local","files":["/opt/shared/awards_players.parquet"]}',
'{"type":"local","files":["/data/awards_players.parquet"]}',
'{"type":"parquet"}',
'[{"name":"playerID","type":"string"},{"name":"awardID","type":"string"},{"name":"yearID","type":"long"},{"name":"lgID","type":"string"},{"name":"tie","type":"string"},{"name":"notes","type":"string"}]'
)
@@ -39,9 +39,9 @@ OVERWRITE ALL
SELECT *
FROM TABLE(
EXTERN(
'{"type":"local","files":["/opt/shared/functional_alltypes.parquet"]}',
'{"type":"local","files":["/data/functional_alltypes.parquet"]}',
'{"type":"parquet"}',
'[{"name":"index","type":"long"},{"name":"Unnamed: 0","type":"long"},{"name":"id","type":"long"},{"name":"bool_col","type":"long"},{"name":"tinyint_col","type":"long"},{"name":"smallint_col","type":"long"},{"name":"int_col","type":"long"},{"name":"bigint_col","type":"long"},{"name":"float_col","type":"double"},{"name":"double_col","type":"double"},{"name":"date_string_col","type":"string"},{"name":"string_col","type":"string"},{"name":"timestamp_col","type":"string"},{"name":"year","type":"long"},{"name":"month","type":"long"}]'
'[{"name":"id","type":"long"},{"name":"bool_col","type":"long"},{"name":"tinyint_col","type":"long"},{"name":"smallint_col","type":"long"},{"name":"int_col","type":"long"},{"name":"bigint_col","type":"long"},{"name":"float_col","type":"double"},{"name":"double_col","type":"double"},{"name":"date_string_col","type":"string"},{"name":"string_col","type":"string"},{"name":"timestamp_col","type":"string"},{"name":"year","type":"long"},{"name":"month","type":"long"}]'
)
)
PARTITIONED BY ALL TIME;
2 changes: 0 additions & 2 deletions ci/schema/duckdb.sql
@@ -46,8 +46,6 @@ CREATE OR REPLACE TABLE awards_players (
);

CREATE OR REPLACE TABLE functional_alltypes (
"index" BIGINT,
"Unnamed: 0" BIGINT,
id INTEGER,
bool_col BOOLEAN,
tinyint_col SMALLINT,
4 changes: 0 additions & 4 deletions ci/schema/mssql.sql
@@ -70,8 +70,6 @@ WITH (FORMAT = 'CSV', FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
DROP TABLE IF EXISTS functional_alltypes;

CREATE TABLE functional_alltypes (
"index" BIGINT,
"Unnamed: 0" BIGINT,
id INTEGER,
bool_col BIT,
tinyint_col SMALLINT,
@@ -91,8 +89,6 @@ BULK INSERT functional_alltypes
FROM '/data/functional_alltypes.csv'
WITH (FORMAT = 'CSV', FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)

CREATE INDEX "ix_functional_alltypes_index" ON functional_alltypes ("index");

DROP TABLE IF EXISTS win;

CREATE TABLE win (g VARCHAR(MAX), x BIGINT, y BIGINT);
4 changes: 0 additions & 4 deletions ci/schema/mysql.sql
@@ -54,8 +54,6 @@ CREATE TABLE awards_players (
DROP TABLE IF EXISTS functional_alltypes;

CREATE TABLE functional_alltypes (
`index` BIGINT,
`Unnamed: 0` BIGINT,
id INTEGER,
bool_col BOOLEAN,
tinyint_col TINYINT,
@@ -71,8 +69,6 @@ CREATE TABLE functional_alltypes (
month INTEGER
) DEFAULT CHARACTER SET = utf8;

CREATE INDEX `ix_functional_alltypes_index` ON functional_alltypes (`index`);

DROP TABLE IF EXISTS json_t CASCADE;

CREATE TABLE IF NOT EXISTS json_t (js JSON);
90 changes: 90 additions & 0 deletions ci/schema/oracle.sql
@@ -0,0 +1,90 @@
-- https://docs.oracle.com/database/121/DRDAS/data_type.htm#DRDAS264
-- says that NUMBER(4) -> NUMBER(4)
-- says that NUMBER(9) -> NUMBER(9)
-- says that BIGINT -> NUMBER(18);

DROP TABLE IF EXISTS "diamonds";

CREATE TABLE "diamonds" (
"carat" BINARY_FLOAT,
"cut" VARCHAR2(255),
"color" VARCHAR2(255),
"clarity" VARCHAR2(255),
"depth" BINARY_FLOAT,
"table" BINARY_FLOAT,
"price" NUMBER(18),
"x" BINARY_FLOAT,
"y" BINARY_FLOAT,
"z" BINARY_FLOAT
);

DROP TABLE IF EXISTS "batting";

CREATE TABLE "batting" (
"playerID" VARCHAR2(255),
"yearID" NUMBER(18),
"stint" NUMBER(18),
"teamID" VARCHAR2(7),
"lgID" VARCHAR2(7),
"G" NUMBER(18),
"AB" NUMBER(18),
"R" NUMBER(18),
"H" NUMBER(18),
"X2B" NUMBER(18),
"X3B" NUMBER(18),
"HR" NUMBER(18),
"RBI" NUMBER(18),
"SB" NUMBER(18),
"CS" NUMBER(18),
"BB" NUMBER(18),
"SO" NUMBER(18),
"IBB" NUMBER(18),
"HBP" NUMBER(18),
"SH" NUMBER(18),
"SF" NUMBER(18),
"GIDP" NUMBER(18)
);

DROP TABLE IF EXISTS "awards_players";

CREATE TABLE "awards_players" (
"playerID" VARCHAR2(255),
"awardID" VARCHAR2(255),
"yearID" NUMBER(18),
"lgID" VARCHAR2(7),
"tie" VARCHAR2(7),
"notes" VARCHAR2(255)
) ;

DROP TABLE IF EXISTS "functional_alltypes";

CREATE TABLE "functional_alltypes" (
"id" NUMBER(9),
-- There is no boolean type in oracle
-- and no recommendation on how to implement it
-- I'm going with 0/1 in a NUMBER(1)
"bool_col" NUMBER(1),
"tinyint_col" NUMBER(2),
"smallint_col" NUMBER(4),
"int_col" NUMBER(9),
"bigint_col" NUMBER(18),
"float_col" BINARY_FLOAT,
"double_col" BINARY_DOUBLE,
"date_string_col" VARCHAR2(255),
"string_col" VARCHAR2(255),
"timestamp_col" TIMESTAMP(3),
"year" NUMBER(9),
"month" NUMBER(9)
);

DROP TABLE IF EXISTS "win";

CREATE TABLE "win" ("g" VARCHAR2(8), "x" NUMBER(18), "y" NUMBER(18));
INSERT INTO "win" VALUES
('a', 0, 3),
('a', 1, 2),
('a', 2, 0),
('a', 3, 1),
('a', 4, 1);

COMMIT;
7 changes: 7 additions & 0 deletions ci/schema/oracle/awards_players.ctl
@@ -0,0 +1,7 @@
options (SKIP=1)
load data
infile '/opt/oracle/data/awards_players.csv'
into table "awards_players"
fields terminated by "," optionally enclosed by '"'
TRAILING NULLCOLS
( "playerID", "awardID", "yearID", "lgID", "tie", "notes" )
7 changes: 7 additions & 0 deletions ci/schema/oracle/batting.ctl
@@ -0,0 +1,7 @@
options (SKIP=1)
load data
infile '/opt/oracle/data/batting.csv'
into table "batting"
fields terminated by "," optionally enclosed by '"'
TRAILING NULLCOLS
( "playerID", "yearID", "stint", "teamID", "lgID", "G", "AB", "R", "H", "X2B", "X3B", "HR", "RBI", "SB", "CS", "BB", "SO", "IBB", "HBP", "SH", "SF", "GIDP" )
6 changes: 6 additions & 0 deletions ci/schema/oracle/diamonds.ctl
@@ -0,0 +1,6 @@
options (SKIP=1)
load data
infile '/opt/oracle/data/diamonds.csv'
into table "diamonds"
fields terminated by "," optionally enclosed by '"'
( "carat", "cut", "color", "clarity", "depth", "table", "price", "x", "y", "z" )
19 changes: 19 additions & 0 deletions ci/schema/oracle/functional_alltypes.ctl
@@ -0,0 +1,19 @@
options (SKIP=1)
load data
infile '/opt/oracle/data/functional_alltypes.csv'
into table "functional_alltypes"
fields terminated by "," optionally enclosed by '"'
TRAILING NULLCOLS
( "id",
"bool_col",
"tinyint_col",
"smallint_col",
"int_col",
"bigint_col",
"float_col",
"double_col",
"date_string_col",
"string_col",
"timestamp_col" "to_timestamp(:\"timestamp_col\", 'YYYY-MM-DD HH24:MI:SS.FF')",
"year",
"month" )
29 changes: 20 additions & 9 deletions ci/schema/postgresql.sql
@@ -2,6 +2,7 @@ CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS plpython3u;
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS first_last_agg;

DROP TABLE IF EXISTS diamonds CASCADE;

@@ -18,6 +19,8 @@ CREATE TABLE diamonds (
z FLOAT
);

COPY diamonds FROM '/data/diamonds.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',');

DROP TABLE IF EXISTS batting CASCADE;

CREATE TABLE batting (
@@ -45,6 +48,8 @@ CREATE TABLE batting (
"GIDP" BIGINT
);

COPY batting FROM '/data/batting.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',');

DROP TABLE IF EXISTS awards_players CASCADE;

CREATE TABLE awards_players (
@@ -53,18 +58,22 @@ CREATE TABLE awards_players (
"yearID" BIGINT,
"lgID" TEXT,
tie TEXT,
notes TEXT,
search TSVECTOR GENERATED ALWAYS AS (
setweight(to_tsvector('simple', notes), 'A')::TSVECTOR
) STORED,
simvec VECTOR GENERATED always AS ('[1,2,3]'::VECTOR) STORED
notes TEXT
);

COPY awards_players FROM '/data/awards_players.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',');

DROP VIEW IF EXISTS awards_players_special_types CASCADE;
CREATE VIEW awards_players_special_types AS
SELECT
*,
setweight(to_tsvector('simple', notes), 'A')::TSVECTOR AS search,
'[1,2,3]'::VECTOR AS simvec
FROM awards_players;

DROP TABLE IF EXISTS functional_alltypes CASCADE;

CREATE TABLE functional_alltypes (
"index" BIGINT,
"Unnamed: 0" BIGINT,
id INTEGER,
bool_col BOOLEAN,
tinyint_col SMALLINT,
Expand All @@ -80,7 +89,7 @@ CREATE TABLE functional_alltypes (
month INTEGER
);

CREATE INDEX "ix_functional_alltypes_index" ON functional_alltypes ("index");
COPY functional_alltypes FROM '/data/functional_alltypes.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',');

DROP TABLE IF EXISTS tzone CASCADE;

@@ -174,14 +183,16 @@ CREATE TABLE IF NOT EXISTS not_supported_intervals (

DROP TABLE IF EXISTS geo CASCADE;

CREATE TABLE IF NOT EXISTS geo (
CREATE TABLE geo (
id BIGSERIAL PRIMARY KEY,
geo_point GEOMETRY(POINT),
geo_linestring GEOMETRY(LINESTRING),
geo_polygon GEOMETRY(POLYGON),
geo_multipolygon GEOMETRY(MULTIPOLYGON)
);

COPY geo FROM '/data/geo.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',');

CREATE INDEX IF NOT EXISTS idx_geo_geo_linestring ON geo USING GIST (geo_linestring);
CREATE INDEX IF NOT EXISTS idx_geo_geo_multipolygon ON geo USING GIST (geo_multipolygon);
CREATE INDEX IF NOT EXISTS idx_geo_geo_point ON geo USING GIST (geo_point);
4 changes: 1 addition & 3 deletions ci/schema/snowflake.sql
@@ -54,8 +54,6 @@ CREATE OR REPLACE TABLE awards_players (
);

CREATE OR REPLACE TABLE functional_alltypes (
"index" BIGINT,
"Unnamed: 0" BIGINT,
"id" INTEGER,
"bool_col" BOOLEAN,
"tinyint_col" SMALLINT,
@@ -92,7 +90,7 @@ CREATE OR REPLACE TABLE map ("kv" OBJECT);

INSERT INTO map ("kv")
SELECT object_construct('a', 1, 'b', 2, 'c', 3) UNION
SELECT object_construct('d', 4, 'e', 5, 'c', 6);
SELECT object_construct('d', 4, 'e', 5, 'f', 6);


CREATE OR REPLACE TABLE struct ("abc" OBJECT);
4 changes: 0 additions & 4 deletions ci/schema/sqlite.sql
@@ -1,8 +1,6 @@
DROP TABLE IF EXISTS functional_alltypes;

CREATE TABLE functional_alltypes (
"index" BIGINT,
"Unnamed: 0" BIGINT,
id BIGINT,
bool_col BOOLEAN,
tinyint_col BIGINT,
@@ -19,8 +17,6 @@ CREATE TABLE functional_alltypes (
CHECK (bool_col IN (0, 1))
);

CREATE INDEX ix_functional_alltypes_index ON "functional_alltypes" ("index");

DROP TABLE IF EXISTS awards_players;

CREATE TABLE awards_players (
2 changes: 1 addition & 1 deletion ci/udf/CMakeLists.txt
@@ -42,6 +42,6 @@ endfunction(COMPILE_TO_IR)
add_library(udfsample SHARED udf-sample.cc)
add_library(udasample SHARED uda-sample.cc)

# Custom targest to cross compile UDA/UDF to ir
# Custom targets to cross compile UDA/UDF to ir
COMPILE_TO_IR(udf-sample.cc)
COMPILE_TO_IR(uda-sample.cc)
4 changes: 2 additions & 2 deletions ci/udf/lib/udf.h
@@ -168,7 +168,7 @@ class FunctionContext {
void Free(int64_t byte_size);

/// Methods for maintaining state across UDF/UDA function calls. SetFunctionState() can
/// be used to store a pointer that can then be retreived via GetFunctionState(). If
/// be used to store a pointer that can then be retrieved via GetFunctionState(). If
/// GetFunctionState() is called when no pointer is set, it will return
/// NULL. SetFunctionState() does not take ownership of 'ptr'; it is up to the UDF/UDA
/// to clean up any function state if necessary.
@@ -599,7 +599,7 @@ struct StringVal : public AnyVal {

struct DecimalVal : public impala_udf::AnyVal {
/// Decimal data is stored as an unscaled integer value. For example, the decimal 1.00
/// (precison 3, scale 2) is stored as 100. The byte size necessary to store the decimal
/// (precision 3, scale 2) is stored as 100. The byte size necessary to store the decimal
/// depends on the precision, which determines which field of the union should be used to
/// store and manipulate the unscaled value.
//
4 changes: 2 additions & 2 deletions ci/udf/udf-sample.cc
@@ -120,7 +120,7 @@ void ReturnConstantArgPrepare(
}
}

// Retreives and returns the shared state set in the prepare function
// Retrieves and returns the shared state set in the prepare function
IntVal ReturnConstantArg(FunctionContext* context, const IntVal& const_val) {
IntVal* state = reinterpret_cast<IntVal*>(
context->GetFunctionState(FunctionContext::THREAD_LOCAL));
@@ -131,7 +131,7 @@ IntVal ReturnConstantArg(FunctionContext* context, const IntVal& const_val) {
void ReturnConstantArgClose(
FunctionContext* context, FunctionContext::FunctionStateScope scope) {
if (scope == FunctionContext::THREAD_LOCAL) {
// Retreive and deallocate the shared state
// Retrieve and deallocate the shared state
void* state = context->GetFunctionState(scope);
context->Free(reinterpret_cast<uint8_t*>(state));
context->SetFunctionState(scope, NULL);
2 changes: 1 addition & 1 deletion codecov.yml
@@ -14,7 +14,7 @@ coverage:
patch:
default:
target: auto
threshold: 1%
threshold: 92%
only_pulls: true
project:
default:
381 changes: 191 additions & 190 deletions conda-lock/linux-64-3.10.lock

Large diffs are not rendered by default.

379 changes: 191 additions & 188 deletions conda-lock/linux-64-3.11.lock

Large diffs are not rendered by default.

444 changes: 0 additions & 444 deletions conda-lock/linux-64-3.8.lock

This file was deleted.

379 changes: 190 additions & 189 deletions conda-lock/linux-64-3.9.lock

Large diffs are not rendered by default.

365 changes: 183 additions & 182 deletions conda-lock/osx-64-3.10.lock

Large diffs are not rendered by default.

363 changes: 183 additions & 180 deletions conda-lock/osx-64-3.11.lock

Large diffs are not rendered by default.

424 changes: 0 additions & 424 deletions conda-lock/osx-64-3.8.lock

This file was deleted.

363 changes: 182 additions & 181 deletions conda-lock/osx-64-3.9.lock

Large diffs are not rendered by default.

364 changes: 183 additions & 181 deletions conda-lock/osx-arm64-3.10.lock

Large diffs are not rendered by default.

364 changes: 183 additions & 181 deletions conda-lock/osx-arm64-3.11.lock

Large diffs are not rendered by default.

422 changes: 0 additions & 422 deletions conda-lock/osx-arm64-3.8.lock

This file was deleted.

362 changes: 182 additions & 180 deletions conda-lock/osx-arm64-3.9.lock

Large diffs are not rendered by default.

382 changes: 192 additions & 190 deletions conda-lock/win-64-3.10.lock

Large diffs are not rendered by default.

382 changes: 192 additions & 190 deletions conda-lock/win-64-3.11.lock

Large diffs are not rendered by default.

423 changes: 0 additions & 423 deletions conda-lock/win-64-3.8.lock

This file was deleted.

380 changes: 191 additions & 189 deletions conda-lock/win-64-3.9.lock

Large diffs are not rendered by default.

70 changes: 52 additions & 18 deletions docker-compose.yml
@@ -1,15 +1,16 @@
version: "3.4"
services:
clickhouse:
image: clickhouse/clickhouse-server:23.3.1.2823-alpine
image: clickhouse/clickhouse-server:23.6.1.1524-alpine
ports:
- 8123:8123
- 9000:9000
healthcheck:
interval: 1s
retries: 10
test:
- CMD-SHELL
- nc -z 127.0.0.1 9000
- nc -z 127.0.0.1 8123 && nc -z 127.0.0.1 9000
timeout: 10s
volumes:
- clickhouse:/var/lib/clickhouse/user_files/ibis
Expand Down Expand Up @@ -56,7 +57,7 @@ services:
- CMD
- pg_isready
timeout: 5s
image: postgres:13.9-alpine
image: postgres:13.11-alpine
networks:
- impala
kudu:
@@ -108,14 +109,16 @@ services:
retries: 30
test:
- CMD
- mysqladmin
- mariadb-admin
- ping
timeout: 5s
image: mariadb:10.11.2
image: mariadb:10.11.4
ports:
- 3306:3306
networks:
- mysql
volumes:
- mysql:/data
postgres:
user: postgres
environment:
@@ -135,6 +138,8 @@ services:
- 5432:5432
networks:
- postgres
volumes:
- postgres:/data
mssql:
image: mcr.microsoft.com/mssql/server:2022-latest
environment:
@@ -172,6 +177,8 @@ services:
- 5433:5432
networks:
- trino
volumes:
- trino-postgres:/data
trino:
depends_on:
- trino-postgres
@@ -182,7 +189,7 @@ services:
- CMD-SHELL
- trino --execute 'SELECT 1 AS one'
timeout: 30s
image: trinodb/trino:412
image: trinodb/trino:420
ports:
- 8080:8080
networks:
@@ -193,10 +200,8 @@ services:
- $PWD/docker/trino/jvm.config:/etc/trino/jvm.config:ro

druid-postgres:
image: postgres:15.2-alpine
image: postgres:15.3-alpine
container_name: druid-postgres
volumes:
- metadata_data:/var/lib/postgresql/data
environment:
- POSTGRES_PASSWORD=FoolishPassword
- POSTGRES_USER=druid
@@ -229,7 +234,7 @@ services:
- druid

druid-coordinator:
image: apache/druid:25.0.0
image: apache/druid:26.0.0
hostname: coordinator
container_name: coordinator
volumes:
@@ -253,7 +258,7 @@ services:
- druid

druid-broker:
image: apache/druid:25.0.0
image: apache/druid:26.0.0
hostname: broker
container_name: broker
volumes:
@@ -279,7 +284,7 @@ services:
- druid

druid-historical:
image: apache/druid:25.0.0
image: apache/druid:26.0.0
hostname: historical
container_name: historical
volumes:
@@ -304,12 +309,13 @@ services:
- druid

druid-middlemanager:
image: apache/druid:25.0.0
image: apache/druid:26.0.0
hostname: middlemanager
container_name: middlemanager
volumes:
- druid:/opt/shared
- middle_var:/opt/druid/var
- druid-data:/data
depends_on:
- druid-zookeeper
- druid-postgres
@@ -329,7 +335,7 @@ services:
- druid

druid:
image: apache/druid:25.0.0
image: apache/druid:26.0.0
hostname: router
container_name: router
volumes:
@@ -357,6 +363,28 @@ services:
networks:
- druid

oracle:
image: gvenzl/oracle-free:23
environment:
ORACLE_PASSWORD: ibis
ORACLE_DATABASE: IBIS_TESTING
APP_USER: ibis
APP_USER_PASSWORD: ibis
ports:
- 1521:1521
healthcheck:
interval: 5s
retries: 10
test:
- CMD-SHELL
- ./healthcheck.sh
timeout: 30s
restart: on-failure
networks:
- oracle
volumes:
- oracle:/opt/oracle/data

networks:
impala:
mysql:
@@ -365,14 +393,20 @@ networks:
postgres:
trino:
druid:
oracle:

volumes:
metadata_data:
middle_var:
historical_var:
broker_var:
coordinator_var:
druid:
historical_var:
middle_var:
router_var:
# test data volumes
clickhouse:
druid:
druid-data:
mssql:
mysql:
oracle:
postgres:
trino-postgres:
1 change: 1 addition & 0 deletions docker/postgres/Dockerfile
@@ -2,6 +2,7 @@ FROM postgis/postgis:15-3.3-alpine
RUN apk add --no-cache build-base clang15 llvm15 postgresql15-plpython3 python3 py3-pip && \
python3 -m pip install pgxnclient && \
pgxn install vector && \
pgxn install first_last_agg && \
python3 -m pip uninstall -y pgxnclient && \
rm -rf ~/.cache/pip && \
apk del build-base clang15 llvm15 python3 py3-pip
74 changes: 44 additions & 30 deletions docs/SUMMARY.md
@@ -1,38 +1,52 @@
* [Home](index.md)
* [Install](install.md)
* [Docs](docs/index.md)
* [Getting Started](getting_started.md)
* [How To Guide](how_to/)
* [Execution Backends](backends/)
* [User Guide](user_guide/)
* API Reference
* [Expressions](api/expressions/index.md)
* [Top Level](api/expressions/top_level.md)
* [Tables](api/expressions/tables.md)
* [Generic Values](api/expressions/generic.md)
* [Numeric + Boolean](api/expressions/numeric.md)
* [Strings](api/expressions/strings.md)
* [Timestamps + Dates + Times](api/expressions/timestamps.md)
* [Collections](api/expressions/collections.md)
* [Geospatial](api/expressions/geospatial.md)
* [Column Selectors](api/selectors.md)
* [Data Types](api/datatypes.md)
* [Schemas](api/schemas.md)
* [Backend Interfaces](api/backends/)
* [Configuration](api/config.md)
* [Ibis for SQL Programmers](ibis-for-sql-programmers.ipynb)
* [Ibis for pandas Users](ibis-for-pandas-users.ipynb)
* [Backend Operations Matrix](backends/support_matrix.md)
* [Why Ibis?](why_ibis.md)
* [Releases](release_notes.md)
* Concepts
* [Why Ibis?](concept/why_ibis.md)
* [Design](concept/design.md)
* [Backends](concept/backends.md)
* [Backends](backends/)
* [Tutorials](tutorial/index.md)
* [Getting started with Ibis](tutorial/getting_started.md)
* [Ibis for SQL users](tutorial/ibis-for-sql-users.ipynb)
* [Ibis for pandas users](tutorial/ibis-for-pandas-users.ipynb)
* [Ibis for dplyr users](tutorial/ibis-for-dplyr-users.ipynb)
* How-to guides
* [Configure Ibis](how_to/configuration.md)
* [Chain expressions with the underscore API](how_to/chain_expressions.md)
* [Fill data using window functions](how_to/ffill_bfill_w_window.md)
* [Perform self joins](how_to/self_joins.md)
* [Sessionize a log of events](how_to/sessionize.md)
* [Compute the top K records](how_to/topk.md)
* [Join an in-memory DataFrame to a TableExpression](how_to/memtable_join.md)
* [Load external data files with the DuckDB backend](how_to/duckdb_register.md)
* [Write a Streamlit app with Ibis](how_to/streamlit.md)
* [Extend with custom operations](how_to/extending/)
* Reference
* [Expressions](reference/expressions/index.md)
* [Top level](reference/expressions/top_level.md)
* [Tables](reference/expressions/tables.md)
* [Generic Values](reference/expressions/generic.md)
* [Numeric and boolean](reference/expressions/numeric.md)
* [Strings](reference/expressions/strings.md)
* [Timestamps, dates, and times](reference/expressions/timestamps.md)
* [Collections](reference/expressions/collections.md)
* [Geospatial](reference/expressions/geospatial.md)
* [Column selectors](reference/selectors.md)
* [Data types](reference/datatypes.md)
* [Schemas](reference/schemas.md)
* [Backend interfaces](reference/backends/)
* [Configuration](reference/config.md)
* [Supported Python versions](supported_python_versions.md)
* [Release notes](release_notes.md)
* Blog
* [Campaign Finance Analysis with Ibis](blog/rendered/campaign-finance.ipynb)
* [Ibis Sneak Peek: Writing to Files](blog/ibis-to-file.md)
* [Ibis Sneak Peek: Examples](blog/ibis-examples.md)
* [Maximizing Productivity with Selectors](blog/selectors.md)
* [Ibis on :fire:: Supercharge Your Workflow with DuckDB and PyTorch](blog/rendered/torch.ipynb)
* [Campaign finance analysis with Ibis](blog/rendered/campaign-finance.ipynb)
* [Ibis sneak peek: writing to files](blog/ibis-to-file.md)
* [Ibis sneak peek: examples](blog/ibis-examples.md)
* [Maximizing productivity with selectors](blog/selectors.md)
* [Ibis + DuckDB + Substrait](blog/ibis_substrait_to_duckdb.md)
* [Ibis v4.0.0](blog/ibis-version-4.0.0-release.md)
* [Analyzing Ibis's CI Data with Ibis](blog/rendered/ci-analysis.ipynb)
* [Analyzing Ibis's CI data with Ibis](blog/rendered/ci-analysis.ipynb)
* [ffill and bfill using ibis](blog/ffill-and-bfill-using-ibis.md)
* [Ibis v3.1.0](blog/Ibis-version-3.1.0-release.md)
* [Ibis v3.0.0](blog/Ibis-version-3.0.0-release.md)
9 changes: 0 additions & 9 deletions docs/api/expressions/timestamps.md

This file was deleted.

8 changes: 0 additions & 8 deletions docs/backends/BigQuery.md

This file was deleted.

8 changes: 0 additions & 8 deletions docs/backends/ClickHouse.md

This file was deleted.

9 changes: 0 additions & 9 deletions docs/backends/Datafusion.md

This file was deleted.

11 changes: 0 additions & 11 deletions docs/backends/Druid.md

This file was deleted.

18 changes: 0 additions & 18 deletions docs/backends/DuckDB.md

This file was deleted.

9 changes: 0 additions & 9 deletions docs/backends/MSSQL.md

This file was deleted.

8 changes: 0 additions & 8 deletions docs/backends/MySQL.md

This file was deleted.

10 changes: 0 additions & 10 deletions docs/backends/Polars.md

This file was deleted.

8 changes: 0 additions & 8 deletions docs/backends/PostgreSQL.md

This file was deleted.

8 changes: 0 additions & 8 deletions docs/backends/PySpark.md

This file was deleted.

41 changes: 0 additions & 41 deletions docs/backends/SQLite.md

This file was deleted.

11 changes: 0 additions & 11 deletions docs/backends/Snowflake.md

This file was deleted.

10 changes: 0 additions & 10 deletions docs/backends/Trino.md

This file was deleted.

@@ -3,7 +3,7 @@ hide:
- toc
---

# Operation Support Matrix
# Operation support matrix

Backends are shown in descending order of the number of supported operations.

@@ -15,7 +15,7 @@ Backends are shown in descending order of the number of supported operations.
operation coverage.

<div class="streamlit-app">
<iframe id="streamlit-app" src="https://ibis-project.streamlit.app/?embedded=true"></iframe>
<iframe class="streamlit-app-inner" src="https://ibis-project.streamlit.app/?embedded=true"></iframe>
</div>

!!! note "This app is built using [`streamlit`](https://streamlit.io/)"
11 changes: 6 additions & 5 deletions docs/backends/app/backend_info_app.py
@@ -35,7 +35,7 @@ def support_matrix_df():
short_operation=_.full_operation.split(".")[-1],
operation_category=_.full_operation.split(".")[-2],
)
.execute()
.to_pandas()
)


@@ -52,6 +52,7 @@ def backends_info_df():
"impala": ["string", "sql"],
"mssql": ["sqlalchemy", "sql"],
"mysql": ["sqlalchemy", "sql"],
"oracle": ["sqlalchemy", "sql"],
"pandas": ["dataframe"],
"polars": ["dataframe"],
"postgres": ["sqlalchemy", "sql"],
@@ -74,7 +75,7 @@ def get_all_backend_categories():
backend_info_table.select(category=_.categories.unnest())
.distinct()
.order_by('category')['category']
.execute()
.to_pandas()
.tolist()
)

@@ -84,7 +85,7 @@ def get_all_operation_categories():
return (
support_matrix_table.select(_.operation_category)
.distinct()['operation_category']
.execute()
.to_pandas()
.tolist()
)

@@ -95,7 +96,7 @@ def get_backend_names(categories: Optional[List[str]] = None):
if categories:
backend_expr = backend_expr.filter(_.category.isin(categories))
return (
backend_expr.select(_.backend_name).distinct().backend_name.execute().tolist()
backend_expr.select(_.backend_name).distinct().backend_name.to_pandas().tolist()
)


@@ -169,7 +170,7 @@ def get_selected_operation_categories():
table_expr = table_expr[current_backend_names + ["index"]]

# Execute query
df = table_expr.execute()
df = table_expr.to_pandas()
df = df.set_index('index')

# Display result
3 changes: 3 additions & 0 deletions docs/backends/badges.md
@@ -0,0 +1,3 @@
{% if imports %} ![filebadge](https://img.shields.io/badge/Reads-{{ "%20|%20".join(sorted(imports)) }}-blue?style=flat-square) {% endif %}

{% if exports %} ![exportbadge](https://img.shields.io/badge/Exports-{{ "%20|%20".join(sorted(exports)) }}-orange?style=flat-square) {% endif %}
85 changes: 85 additions & 0 deletions docs/backends/bigquery.md
@@ -0,0 +1,85 @@
---
backend_name: Google BigQuery
backend_url: https://cloud.google.com/bigquery
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# BigQuery

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the BigQuery backend:

=== "pip"

```sh
pip install 'ibis-framework[bigquery]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-bigquery
```

{% endfor %}

## Connect

### `ibis.bigquery.connect`

```python
con = ibis.bigquery.connect(
project_id="ibis-bq-project",
dataset_id="testing",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.bigquery.connect` is a thin wrapper around [`ibis.backends.bigquery.Backend.do_connect`][ibis.backends.bigquery.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.bigquery.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.bigquery.connect`, you can also connect to BigQuery by
passing a properly formatted BigQuery connection URL to `ibis.connect`

```python
con = ibis.connect(f"bigquery://{project_id}/{dataset_id}")
```

<!-- prettier-ignore-start -->
!!! info "This assumes you have already authenticated via the `gcloud` CLI"
<!-- prettier-ignore-end -->

### Finding your `project_id` and `dataset_id`

Log in to the [Google Cloud Console](https://console.cloud.google.com/bigquery)
to see which `project_id`s and `dataset_id`s are available to use.

![bigquery_ids](./images/bigquery_connect.png)

### BigQuery Authentication

The simplest way to authenticate with the BigQuery backend is to use [Google's `gcloud` CLI tool](https://cloud.google.com/sdk/docs/install-sdk).

Once you have `gcloud` installed, you can authenticate to BigQuery (and other Google Cloud services) by running

```sh
gcloud auth login
```

For any authentication problems, or information on other ways of authenticating,
see the [`gcloud` CLI authorization
guide](https://cloud.google.com/sdk/docs/authorizing).
82 changes: 82 additions & 0 deletions docs/backends/clickhouse.md
@@ -0,0 +1,82 @@
---
backend_name: ClickHouse
backend_url: https://clickhouse.yandex/
backend_module: clickhouse
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# ClickHouse

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the ClickHouse backend:

=== "pip"

```sh
pip install 'ibis-framework[clickhouse]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-clickhouse
```

{% endfor %}

## Connect

### `ibis.clickhouse.connect`

```python
con = ibis.clickhouse.connect(
user="username",
password="password",
host="hostname",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.clickhouse.connect` is a thin wrapper around [`ibis.backends.clickhouse.Backend.do_connect`][ibis.backends.clickhouse.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.clickhouse.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.clickhouse.connect`, you can also connect to ClickHouse by
passing a properly formatted ClickHouse connection URL to `ibis.connect`

```python
con = ibis.connect(f"clickhouse://{user}:{password}@{host}:{port}?secure={secure}")
```

## ClickHouse playground

ClickHouse provides a free playground with several datasets that you can connect to using `ibis`:

```python
con = ibis.clickhouse.connect(
host="play.clickhouse.com",
secure=True,
user="play",
password="clickhouse",
)
```

or

```python
con = ibis.connect("clickhouse://play:clickhouse@play.clickhouse.com:443?secure=True")
```
File renamed without changes.
75 changes: 75 additions & 0 deletions docs/backends/datafusion.md
@@ -0,0 +1,75 @@
---
backend_name: Datafusion
backend_url: https://arrow.apache.org/datafusion/
backend_module: datafusion
version_added: "2.1"
exports: ["PyArrow", "Parquet", "Delta Lake", "CSV", "Pandas"]
imports: ["CSV", "Parquet", "Delta Lake"]
---

# DataFusion

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the Apache Datafusion backend:

=== "pip"

```sh
pip install 'ibis-framework[datafusion]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-datafusion
```

{% endfor %}

## Connect

### `ibis.datafusion.connect`

```python
con = ibis.datafusion.connect()
```

```python
con = ibis.datafusion.connect(
config={"table1": "path/to/file.parquet", "table2": "path/to/file.csv"}
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.datafusion.connect` is a thin wrapper around [`ibis.backends.datafusion.Backend.do_connect`][ibis.backends.datafusion.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.datafusion.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

## File Support

<!-- prettier-ignore-start -->
::: ibis.backends.datafusion.Backend.read_csv
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.datafusion.Backend.read_parquet
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.datafusion.Backend.read_delta
options:
heading_level: 4
show_docstring_returns: false
<!-- prettier-ignore-end -->
67 changes: 67 additions & 0 deletions docs/backends/druid.md
@@ -0,0 +1,67 @@
---
backend_name: Druid
backend_url: https://druid.apache.org/
backend_module: druid
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# Druid

{% include 'backends/badges.md' %}

!!! experimental "Introduced in v5.0"

The Druid backend is experimental and is subject to backwards incompatible changes.

## Install

Install `ibis` and dependencies for the Druid backend:

=== "pip"

```sh
pip install 'ibis-framework[druid]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-druid
```

{% endfor %}

## Connect

### `ibis.druid.connect`

```python
con = ibis.druid.connect(
host="hostname",
port=8082,
database="druid/v2/sql",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.druid.connect` is a thin wrapper around [`ibis.backends.druid.Backend.do_connect`][ibis.backends.druid.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.druid.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.druid.connect`, you can also connect to Druid by
passing a properly formatted Druid connection URL to `ibis.connect`

```python
con = ibis.connect("druid://localhost:8082/druid/v2/sql")
```
155 changes: 155 additions & 0 deletions docs/backends/duckdb.md
@@ -0,0 +1,155 @@
---
backend_name: DuckDB
backend_url: https://duckdb.org/
backend_module: duckdb
exports: ["PyArrow", "Parquet", "Delta Lake", "CSV", "Pandas"]
imports:
[
"CSV",
"Parquet",
"Delta Lake",
"JSON",
"PyArrow",
"Pandas",
"SQLite",
"Postgres",
]
---

# DuckDB

{% include 'backends/badges.md' %}

??? danger "`duckdb` >= 0.5.0 requires `duckdb-engine` >= 0.6.2"

If you encounter problems when using `duckdb` >= **0.5.0** you may need to
upgrade `duckdb-engine` to at least version **0.6.2**.

See [this issue](https://github.com/ibis-project/ibis/issues/4503) for
more details.

## Install

Install `ibis` and dependencies for the DuckDB backend:

=== "pip"

```sh
pip install 'ibis-framework[duckdb]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-duckdb
```

{% endfor %}

## Connect

### `ibis.duckdb.connect`

```python
con = ibis.duckdb.connect() # (1)
```

1. Use an ephemeral, in-memory database

```python
con = ibis.duckdb.connect("mydb.duckdb") # (1)
```

1. Connect to, or create, a local DuckDB file

<!-- prettier-ignore-start -->
!!! info "`ibis.duckdb.connect` is a thin wrapper around [`ibis.backends.duckdb.Backend.do_connect`][ibis.backends.duckdb.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.duckdb.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.duckdb.connect`, you can also connect to DuckDB by
passing a properly formatted DuckDB connection URL to `ibis.connect`

```python
con = ibis.connect("duckdb:///path/to/local/file")
```

```python
con = ibis.connect("duckdb://") # (1)
```

1. ephemeral, in-memory database

## MotherDuck

The DuckDB backend supports [MotherDuck](https://motherduck.com). If you have an
account, you can connect to MotherDuck by passing in the string `md:` or
`motherduck:`. `ibis` will trigger the authentication prompt in-browser.

```python
>>> import ibis

>>> con = ibis.duckdb.connect("md:")
```

<!-- prettier-ignore-start -->
!!! info "Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case `list_tables`)"
<!-- prettier-ignore-end -->

```python
>>> con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS


Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
$ export motherduck_token='****************'

['penguins']
```

## File Support

<!-- prettier-ignore-start -->
::: ibis.backends.duckdb.Backend.read_csv
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.duckdb.Backend.read_parquet
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.duckdb.Backend.read_delta
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.duckdb.Backend.read_json
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.duckdb.Backend.read_in_memory
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.duckdb.Backend.read_sqlite
options:
heading_level: 4
show_docstring_examples: false
show_docstring_returns: false
::: ibis.backends.duckdb.Backend.read_postgres
options:
heading_level: 4
show_docstring_returns: false
<!-- prettier-ignore-end -->
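As a brief illustration of the file-reading methods above, a sketch assuming local files at these placeholder paths:

```python
import ibis

con = ibis.duckdb.connect()  # ephemeral, in-memory database

# Each read_* call registers the file and returns a table expression
penguins = con.read_parquet("data/penguins.parquet")
raw = con.read_csv("data/penguins.csv")

print(penguins.limit(5).to_pandas())
```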
404 changes: 404 additions & 0 deletions docs/backends/images/bigquery_connect.excalidraw

Large diffs are not rendered by default.

Binary file added docs/backends/images/bigquery_connect.png
313 changes: 313 additions & 0 deletions docs/backends/images/snowflake_database.excalidraw

Large diffs are not rendered by default.

Binary file added docs/backends/images/snowflake_database.png
317 changes: 317 additions & 0 deletions docs/backends/images/snowflake_org_user.excalidraw

Large diffs are not rendered by default.

Binary file added docs/backends/images/snowflake_org_user.png
30 changes: 15 additions & 15 deletions docs/backends/Impala.md → docs/backends/impala.md
Original file line number Diff line number Diff line change
Expand Up @@ -192,15 +192,15 @@ table.drop()

## Expression execution

Ibis expressions have an `execute` method with compiles and runs the
Ibis expressions have execution methods like `to_pandas` that compile and run the
expressions on Impala or whichever backend is being referenced.

For example:

```python
>>> fa = db.functional_alltypes
>>> expr = fa.double_col.sum()
>>> expr.execute()
>>> expr.to_pandas()
331785.00000000006
```

Expand Down Expand Up @@ -235,7 +235,7 @@ If you pass an Ibis expression to `create_table`, Ibis issues a
>>> db.create_table('string_freqs', expr, format='parquet')

>>> freqs = db.table('string_freqs')
>>> freqs.execute()
>>> freqs.to_pandas()
string_col count
0 9 730
1 3 730
Expand All @@ -262,7 +262,7 @@ below).
### Creating an empty table

To create an empty table, you must declare an Ibis schema that will be
translated to the appopriate Impala schema and data types.
translated to the appropriate Impala schema and data types.

As Ibis types are simplified compared with Impala types, this may expand
in the future to include a more fine-grained schema declaration.
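For instance, a minimal sketch of declaring a schema and creating an empty table with it (the table and column names are placeholders):

```python
import ibis

# Ibis types are translated to the corresponding Impala types
schema = ibis.schema({"foo": "string", "year": "int32", "month": "int16"})
db.create_table("new_table", schema=schema, format="parquet")
```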
Expand Down Expand Up @@ -387,7 +387,7 @@ an Ibis table expression:
>>> target.insert(t[:3])
>>> target.insert(t[:3])

>>> target.execute()
>>> target.to_pandas()
id bool_col tinyint_col ... timestamp_col year month
0 5770 True 0 ... 2010-08-01 00:00:00.000 2010 8
1 5771 False 1 ... 2010-08-01 00:01:00.000 2010 8
Expand Down Expand Up @@ -824,7 +824,7 @@ a major part of the Ibis roadmap).
Ibis's Impala tools currently interoperate with pandas in these ways:

- Ibis expressions return pandas objects (i.e. DataFrame or Series)
for non-scalar expressions when calling their `execute` method
for non-scalar expressions when calling their `to_pandas` method
- The `create_table` and `insert` methods can accept pandas objects.
This includes inserting into partitioned tables. It currently uses
CSV as the ingest route.
Expand All @@ -838,7 +838,7 @@ For example:

>>> db.create_table('pandas_table', data)
>>> t = db.pandas_table
>>> t.execute()
>>> t.to_pandas()
bar foo
0 a 1
1 b 2
Expand All @@ -851,7 +851,7 @@ For example:

>>> to_insert = db.empty_for_insert
>>> to_insert.insert(data)
>>> to_insert.execute()
>>> to_insert.to_pandas()
bar foo
0 a 1
1 b 2
Expand All @@ -868,7 +868,7 @@ For example:

>>> db.create_table('pandas_table', data)
>>> t = db.pandas_table
>>> t.execute()
>>> t.to_pandas()
foo bar
0 1 a
1 2 b
Expand All @@ -879,7 +879,7 @@ For example:
>>> db.create_table('empty_for_insert', schema=t.schema())
>>> to_insert = db.empty_for_insert
>>> to_insert.insert(data)
>>> to_insert.execute()
>>> to_insert.to_pandas()
foo bar
0 1 a
1 2 b
Expand Down Expand Up @@ -1215,7 +1215,7 @@ may significantly speed up queries on smaller datasets:
```

```bash
$ time python -c "(t.double_col + rand()).sum().execute()"
$ time python -c "(t.double_col + rand()).sum().to_pandas()"
27.7 ms ± 996 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Expand All @@ -1225,7 +1225,7 @@ con.disable_codegen(False)
```

```bash
$ time python -c "(t.double_col + rand()).sum().execute()"
$ time python -c "(t.double_col + rand()).sum().to_pandas()"
27 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Expand Down Expand Up @@ -1303,7 +1303,7 @@ The object `fuzzy_equals` is callable and works with Ibis expressions:

>>> expr = fuzzy_equals(t.float_col, t.double_col / 10)

>>> expr.execute()[:10]
>>> expr.to_pandas()[:10]
0 True
1 False
2 False
Expand Down Expand Up @@ -1338,7 +1338,7 @@ connection semantics are similar to the other access methods for working with
secure clusters.

Specifically, after authenticating yourself against Kerberos (e.g., by issuing
the appropriate `kinit` commmand), simply pass `auth_mechanism='GSSAPI'` or
the appropriate `kinit` command), simply pass `auth_mechanism='GSSAPI'` or
`auth_mechanism='LDAP'` (and set `kerberos_service_name` if necessary along
with `user` and `password` if necessary) to the
`ibis.impala.connect(...)` method when instantiating an `ImpalaConnection`.
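A hedged sketch of such a connection; the host and service name are placeholders:

```python
import ibis

con = ibis.impala.connect(
    host="impalad.example.com",       # placeholder hostname
    port=21050,
    auth_mechanism="GSSAPI",          # or "LDAP"
    kerberos_service_name="impala",   # placeholder service name
)
```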
Expand All @@ -1355,7 +1355,7 @@ when connecting to a Kerberized cluster. Because some Ibis commands create HDFS
directories as well as new Impala databases and/or tables, your user will
require the necessary privileges.

## Default Configuation Values for CDH Components
## Default Configuration Values for CDH Components

Cloudera CDH ships with HDFS, Impala, Hive and many other components.
Sometimes it's not obvious what default configuration values these tools are
Expand Down
46 changes: 5 additions & 41 deletions docs/backends/index.md
Original file line number Diff line number Diff line change
@@ -1,45 +1,9 @@
# Backends

See the [configuration guide](../user_guide/configuration.md#default-backend)
to inspect or reconfigure the backend used by default.
A backend is where execution of Ibis table expressions occurs after compilation into some intermediate representation. A backend is often a database, and the intermediate representation is often SQL, but several types of backends exist.

## String Generating Backends
See the [configuration guide](../how_to/configuration.md#default-backend)
to inspect or reconfigure the backend used by default. View the [operation support matrix](_support_matrix.md) to see which operations
are supported by each backend.

The first category of backend translate Ibis expressions into string queries.

The compiler turns each expression into a string query and passes that query to the
database through a driver API for execution.

- [Apache Impala](Impala.md)
- [ClickHouse](ClickHouse.md)
- [Google BigQuery](BigQuery.md)
- [HeavyAI](https://github.com/heavyai/ibis-heavyai)

## Expression Generating Backends

The next category of backends translates ibis expressions into another
system's expressions, for example, SQLAlchemy.

Instead of generating strings for each expression these backends produce
another kind of expression and typically have high-level APIs for execution.

- [Apache Arrow Datafusion](Datafusion.md)
- [Apache Druid](Druid.md)
- [Apache PySpark](PySpark.md)
- [Dask](Dask.md)
- [DuckDB](DuckDB.md)
- [MS SQL Server](MSSQL.md)
- [MySQL](MySQL.md)
- [Polars](Polars.md)
- [PostgreSQL](PostgreSQL.md)
- [SQLite](SQLite.md)
- [Snowflake](Snowflake.md)
- [Trino](Trino.md)

## Direct Execution Backends

The pandas backend is the only direct execution backend. A full description
of the implementation can be found in the module docstring of the pandas
backend located in `ibis/backends/pandas/core.py`.

- [Pandas](Pandas.md)
Each backend has its own configuration options documented here.
65 changes: 65 additions & 0 deletions docs/backends/mssql.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
---
backend_name: MS SQL Server
backend_url: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2022
backend_module: mssql
backend_param_style: connection parameters
version_added: "4.0"
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# MSSQL

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the MSSQL backend:

=== "pip"

```sh
pip install 'ibis-framework[mssql]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-mssql
```

{% endfor %}

## Connect

### `ibis.mssql.connect`

```python
con = ibis.mssql.connect(
user="username",
password="password",
host="hostname",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.mssql.connect` is a thin wrapper around [`ibis.backends.mssql.Backend.do_connect`][ibis.backends.mssql.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.mssql.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.mssql.connect`, you can also connect to MSSQL by
passing a properly formatted MSSQL connection URL to `ibis.connect`

```python
con = ibis.connect(f"mssql://{user}:{password}@{host}:{port}")
```
66 changes: 66 additions & 0 deletions docs/backends/mysql.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
backend_name: MySQL
backend_url: https://www.mysql.com/
backend_module: mysql
backend_param_style: a SQLAlchemy-style URI
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# MySQL

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the MySQL backend:

=== "pip"

```sh
pip install 'ibis-framework[mysql]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-mysql
```

{% endfor %}

## Connect

### `ibis.mysql.connect`

```python
con = ibis.mysql.connect(
user="username",
password="password",
host="hostname",
port=3306,
database="database",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.mysql.connect` is a thin wrapper around [`ibis.backends.mysql.Backend.do_connect`][ibis.backends.mysql.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.mysql.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.mysql.connect`, you can also connect to MySQL by
passing a properly formatted MySQL connection URL to `ibis.connect`

```python
con = ibis.connect(f"mysql://{user}:{password}@{host}:{port}/{database}")
```
73 changes: 73 additions & 0 deletions docs/backends/oracle.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
backend_name: Oracle
backend_url: https://docs.oracle.com/en/database/oracle/oracle-database/index.html
backend_module: oracle
backend_param_style: a SQLAlchemy connection string
backend_connection_example: ibis.connect("oracle://user:pass@host:port/service_name")
is_experimental: true
version_added: "6.0"
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# Oracle

{% include 'backends/badges.md' %}

!!! experimental "Introduced in v6.0"

The Oracle backend is experimental and is subject to backwards incompatible changes.

## Install

Install `ibis` and dependencies for the Oracle backend:

=== "pip"

```sh
pip install 'ibis-framework[oracle]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-oracle
```

{% endfor %}

## Connect

### `ibis.oracle.connect`

```python
con = ibis.oracle.connect(
user="username",
password="password",
host="hostname",
port=1521,
database="database",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.oracle.connect` is a thin wrapper around [`ibis.backends.oracle.Backend.do_connect`][ibis.backends.oracle.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.oracle.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.oracle.connect`, you can also connect to Oracle by
passing a properly formatted Oracle connection URL to `ibis.connect`

```python
con = ibis.connect(f"oracle://{user}:{password}@{host}:{port}/{database}")
```
4 changes: 2 additions & 2 deletions docs/backends/Pandas.md → docs/backends/pandas.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
backend_name: Pandas
backend_name: pandas
backend_url: https://pandas.pydata.org/
backend_module: pandas
intro: Ibis's pandas backend is available in core Ibis.
Expand Down Expand Up @@ -69,7 +69,7 @@ def zscore(series):
return (series - series.mean()) / series.std()
```

### Details of Pandas UDFs
### Details of pandas UDFs

- Element-wise provide support
for applying your UDF to any combination of scalar values and columns.
Expand Down
73 changes: 73 additions & 0 deletions docs/backends/polars.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
backend_name: Polars
backend_url: https://pola-rs.github.io/polars-book/user-guide/index.html
backend_module: polars
is_experimental: true
version_added: "4.0"
exports: ["PyArrow", "Parquet", "Delta Lake", "CSV", "Pandas"]
imports: ["CSV", "Parquet", "Delta Lake", "Pandas"]
---

# Polars

{% include 'backends/badges.md' %}

!!! experimental "Introduced in v4.0"

The Polars backend is experimental and is subject to backwards incompatible changes.

## Install

Install `ibis` and dependencies for the Polars backend:

=== "pip"

```sh
pip install 'ibis-framework[polars]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-polars
```

{% endfor %}

## Connect

### `ibis.polars.connect`

```python
con = ibis.polars.connect()
```

<!-- prettier-ignore-start -->
!!! info "`ibis.polars.connect` is a thin wrapper around [`ibis.backends.polars.Backend.do_connect`][ibis.backends.polars.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.polars.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->

## File Support

<!-- prettier-ignore-start -->
::: ibis.backends.polars.Backend.read_csv
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.polars.Backend.read_parquet
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.polars.Backend.read_delta
options:
heading_level: 4
show_docstring_returns: false
<!-- prettier-ignore-end -->
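A minimal sketch of the file-reading methods above; the path is a placeholder:

```python
import ibis

con = ibis.polars.connect()

# Registers the Parquet file lazily and returns a table expression
trips = con.read_parquet("data/trips.parquet")

print(trips.count().to_pandas())
```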
66 changes: 66 additions & 0 deletions docs/backends/postgresql.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
backend_name: PostgreSQL
backend_url: https://www.postgresql.org/
backend_module: postgres
backend_param_style: a SQLAlchemy-style URI
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# PostgreSQL

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the Postgres backend:

=== "pip"

```sh
pip install 'ibis-framework[postgres]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-postgres
```

{% endfor %}

## Connect

### `ibis.postgres.connect`

```python
con = ibis.postgres.connect(
user="username",
password="password",
host="hostname",
port=5432,
database="database",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.postgres.connect` is a thin wrapper around [`ibis.backends.postgres.Backend.do_connect`][ibis.backends.postgres.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.postgres.Backend.do_connect
options:
heading_level: 4
show_docstring_examples: false
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.postgres.connect`, you can also connect to Postgres by
passing a properly formatted Postgres connection URL to `ibis.connect`

```python
con = ibis.connect(f"postgres://{user}:{password}@{host}:{port}/{database}")
```
68 changes: 68 additions & 0 deletions docs/backends/pyspark.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
---
backend_name: PySpark
backend_url: https://spark.apache.org/docs/latest/api/python/
backend_module: pyspark
backend_param_style: a SparkSession instance
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
imports: ["CSV", "Parquet"]
---

# PySpark

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the PySpark backend:

=== "pip"

```sh
pip install 'ibis-framework[pyspark]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-pyspark
```

{% endfor %}

## Connect

### `ibis.pyspark.connect`

```python
con = ibis.pyspark.connect(session=session)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.pyspark.connect` is a thin wrapper around [`ibis.backends.pyspark.Backend.do_connect`][ibis.backends.pyspark.Backend.do_connect]."
<!-- prettier-ignore-end -->

<!-- prettier-ignore-start -->
!!! info "The `pyspark` backend does not create `SparkSession` objects, you must create a `SparkSession` and pass that to `ibis.pyspark.connect`."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.pyspark.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->

## File Support

<!-- prettier-ignore-start -->
::: ibis.backends.pyspark.Backend.read_csv
options:
heading_level: 4
show_docstring_returns: false
::: ibis.backends.pyspark.Backend.read_parquet
options:
heading_level: 4
show_docstring_returns: false
<!-- prettier-ignore-end -->
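A compact sketch tying these together; creating the session is your responsibility, and the file path is a placeholder:

```python
import ibis
from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()  # the backend will not do this for you
con = ibis.pyspark.connect(session=session)

trips = con.read_parquet("data/trips.parquet")  # hypothetical path
print(trips.count().to_pandas())
```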
121 changes: 121 additions & 0 deletions docs/backends/snowflake.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
---
backend_name: Snowflake
backend_url: https://snowflake.com/
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# Snowflake

{% include 'backends/badges.md' %}

!!! experimental "Introduced in v4.0"

The Snowflake backend is experimental and is subject to backwards incompatible changes.

## Install

Install `ibis` and dependencies for the Snowflake backend:

=== "pip"

```sh
pip install 'ibis-framework[snowflake]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-snowflake
```

{% endfor %}

## Connect

### `ibis.snowflake.connect`

```python
con = ibis.snowflake.connect(
user="user",
password="password",
account="safpqpq-sq55555",
database="IBIS_TESTING/IBIS_TESTING",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.snowflake.connect` is a thin wrapper around [`ibis.backends.snowflake.Backend.do_connect`][ibis.backends.snowflake.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.snowflake.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.snowflake.connect`, you can also connect to Snowflake by
passing a properly formatted Snowflake connection URL to `ibis.connect`

```python
con = ibis.connect(f"snowflake://{user}:{password}@{account}/{database}")
```

### Authenticating with SSO

Ibis supports connecting to SSO-enabled Snowflake warehouses using the `authenticator` parameter.

You can use it with either the explicit connection parameters API or the
URL-style connection API. All values of `authenticator` are supported.

#### Explicit

```python
con = ibis.snowflake.connect(
user="user",
account="safpqpq-sq55555",
database="my_database/my_schema",
warehouse="my_warehouse",
authenticator="externalbrowser",
)
```

#### URL

```python
con = ibis.connect(
f"snowflake://{user}@{account}/{database}?warehouse={warehouse}",
authenticator="externalbrowser",
)
```

### Looking up your Snowflake organization ID and user ID

A [Snowflake account
identifier](https://docs.snowflake.com/en/user-guide/admin-account-identifier#format-1-preferred-account-name-in-your-organization)
consists of an organization ID and a user ID, separated by a hyphen.

!!! info "This user ID is not the same as the username you log in with."

To find your organization ID and user ID, log in to the Snowflake web app, then
click on the text just to the right of the Snowflake logo (in the
lower-left-hand corner of the screen).

The bold text at the top of the little pop-up window is your organization ID.
The bold blue text with a checkmark next to it is your user ID.

![Snowflake Organization and User ID](./images/snowflake_org_user.png)

### Choosing a value for `database`

Snowflake refers to a collection of tables as a schema, and a collection of schemas as a database.

You must choose a database and a schema to connect to. You can see the
available databases and schemas under the "Data" sidebar item in the
Snowflake web app.

![Snowflake Database](./images/snowflake_database.png)
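Putting the pieces together, a sketch of a full connection call; every identifier below is a placeholder:

```python
import ibis

con = ibis.snowflake.connect(
    user="my_user",
    password="my_password",
    account="myorg-account123",        # organization ID and user ID, hyphen-separated
    database="MY_DATABASE/MY_SCHEMA",  # database and schema, slash-separated
)
```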
72 changes: 72 additions & 0 deletions docs/backends/sqlite.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
---
backend_name: SQLite
backend_url: https://www.sqlite.org/
backend_module: sqlite
imports: ["CSV", "Parquet", "JSON", "PyArrow", "Pandas", "SQLite", "Postgres"]
---

# SQLite

{% include 'backends/badges.md' %}

## Install

Install `ibis` and dependencies for the SQLite backend:

=== "pip"

```sh
pip install 'ibis-framework[sqlite]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-sqlite
```

{% endfor %}

## Connect

### `ibis.sqlite.connect`

```python
con = ibis.sqlite.connect() # (1)
```

1. Use an ephemeral, in-memory database

```python
con = ibis.sqlite.connect("mydb.sqlite") # (1)
```

1. Connect to, or create, a local SQLite file

<!-- prettier-ignore-start -->
!!! info "`ibis.sqlite.connect` is a thin wrapper around [`ibis.backends.sqlite.Backend.do_connect`][ibis.backends.sqlite.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.sqlite.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->

### `ibis.connect` URL format

In addition to `ibis.sqlite.connect`, you can also connect to SQLite by
passing a properly formatted SQLite connection URL to `ibis.connect`

```python
con = ibis.connect("sqlite:///path/to/local/file")
```

```python
con = ibis.connect("sqlite://") # (1)
```

1. ephemeral, in-memory database
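A short sketch of what follows a connection; the file and table names below are hypothetical:

```python
import ibis

con = ibis.sqlite.connect("mydb.sqlite")

print(con.list_tables())

t = con.table("my_table")  # hypothetical table name
print(t.limit(5).to_pandas())
```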
2 changes: 1 addition & 1 deletion docs/backends/template.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@

## Install

Install ibis and dependencies for the {{ backend_name }} backend:
Install `ibis` and dependencies for the {{ backend_name }} backend:

=== "pip"

Expand Down
59 changes: 59 additions & 0 deletions docs/backends/trino.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
---
backend_name: Trino
backend_url: https://trino.io
backend_module: trino
exports: ["PyArrow", "Parquet", "CSV", "Pandas"]
---

# Trino

{% include 'backends/badges.md' %}

!!! experimental "Introduced in v4.0"

The Trino backend is experimental and is subject to backwards incompatible changes.

## Install

Install `ibis` and dependencies for the Trino backend:

=== "pip"

```sh
pip install 'ibis-framework[trino]'
```

{% for mgr in ["conda", "mamba"] %}
=== "{{ mgr }}"

```sh
{{ mgr }} install -c conda-forge ibis-trino
```

{% endfor %}

## Connect

### `ibis.trino.connect`

```python
con = ibis.trino.connect(
user="user",
password="password",
port=8080,
database="database",
schema="default",
)
```

<!-- prettier-ignore-start -->
!!! info "`ibis.trino.connect` is a thin wrapper around [`ibis.backends.trino.Backend.do_connect`][ibis.backends.trino.Backend.do_connect]."
<!-- prettier-ignore-end -->

### Connection Parameters

<!-- prettier-ignore-start -->
::: ibis.backends.trino.Backend.do_connect
options:
heading_level: 4
<!-- prettier-ignore-end -->
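Once connected, usage follows the standard Ibis pattern; a rough sketch with a placeholder table name:

```python
print(con.list_tables())

orders = con.table("orders")  # placeholder table name
print(orders.count().to_pandas())
```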
2 changes: 1 addition & 1 deletion docs/blog/ffill-and-bfill-using-ibis.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
Suppose you have a table of data mapping events and dates to values, and that this data contains gaps in values.

Suppose you want to forward fill these gaps such that, one-by-one,
if a value is null, it is replaced by the non-null value preceeding.
if a value is null, it is replaced by the non-null value preceding.

For example, you might be measuring the total value of an account over time.
Saving the same value until that value changes is an inefficient use of space,
Expand Down
17 changes: 9 additions & 8 deletions docs/blog/ibis_substrait_to_duckdb.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ of different analytical execution engines, most of which (but not all) speak
some dialect of SQL.

Today, Ibis accomplishes this with a lot of help from `sqlalchemy` and `sqlglot`
to handle differences in dialect, or we interact directly with avalable Python
to handle differences in dialect, or we interact directly with available Python
bindings (for instance with the `pandas`, `datafusion`, and `polars` backends).

Ibis goes to <span class="underline">great</span> lengths to generate sane and consistent SQL for those
Expand All @@ -18,9 +18,9 @@ communicating consistently with those backends.
other things) query plans. It's still in its early days, but there is already
nascent support for Substrait in [Apache Arrow](https://arrow.apache.org/docs/dev/cpp/streaming_execution.html#substrait), [DuckDB](https://duckdb.org/docs/extensions/substrait), and [Velox](https://engineering.fb.com/2022/08/31/open-source/velox/).

Ibis supports producing Substrait plans from Ibis expressions, with the help of
the [ibis-substrait](https://github.com/ibis-project/ibis-substrait) library.
Let's take a quick peek at how we might use it for query execution.
Ibis supports producing Substrait plans from Ibis table expressions, with the
help of the [ibis-substrait](https://github.com/ibis-project/ibis-substrait)
library. Let's take a quick peek at how we might use it for query execution.

## Getting started

Expand Down Expand Up @@ -50,7 +50,7 @@ con.execute(
## Query Creation

For our example, we'll build up a query using Ibis but without connecting to our
execution engine (DuckDB). Once we have an Ibis expression, we'll create a
execution engine (DuckDB). Once we have an Ibis table expression, we'll create a
Substrait plan, then execute that plan directly on DuckDB to get results.

To do this, all we need is some knowledge of the schema of the tables we want to
Expand Down Expand Up @@ -164,12 +164,13 @@ topfilms = (
)
```

Now that we have an Ibis expression, it's time for Substrait to enter the scene.
Now that we have an Ibis table expression, it's time for Substrait to enter
the scene.

## Substrait Serialization

We're going to import `ibis_substrait` and compile the `topfilms` expression
into a Substrait plan.
We're going to import `ibis_substrait` and compile the `topfilms` table
expression into a Substrait plan.

```python
from ibis_substrait.compiler.core import SubstraitCompiler
Expand Down
4 changes: 2 additions & 2 deletions docs/blog/rendered/campaign-finance.ipynb

Some generated files are not rendered by default.