Large diffs are not rendered by default.

9 changes: 5 additions & 4 deletions docs/_freeze/posts/selectors/index/execute-results/html.json

Large diffs are not rendered by default.

Large diffs are not rendered by default.

9 changes: 7 additions & 2 deletions docs/_quarto.yml
Expand Up @@ -181,6 +181,10 @@ website:
- reference/Repr.qmd
- reference/SQL.qmd

- section: Cursed Knowledge
contents:
- reference/cursed_knowledge.qmd

format:
html:
theme:
Expand Down Expand Up @@ -568,14 +572,15 @@ quartodoc:
- matches
- any_of
- all_of
- c
- cols
- across
- if_any
- if_all
- r
- index
- first
- last
- all
- none

- title: Type System
desc: "Data types and schemas"
Expand Down
2 changes: 1 addition & 1 deletion docs/how-to/extending/builtin.qmd
Expand Up @@ -79,7 +79,7 @@ rest of the library:
pkgs = ibis.read_parquet(
"https://storage.googleapis.com/ibis-tutorial-data/pypi/2024-04-24/packages.parquet"
)
pandas_ish = pkgs[jw_sim(pkgs.name, "pandas") >= 0.9]
pandas_ish = pkgs.filter(jw_sim(pkgs.name, "pandas") >= 0.9)
pandas_ish
```

Expand Down
2 changes: 1 addition & 1 deletion docs/how-to/input-output/duckdb-parquet.qmd
Expand Up @@ -17,7 +17,7 @@ hosted on S3 at `s3://gbif-open-data-us-east-1/occurrence/`
We can read a single partition by specifying its path.

We do this by calling
[`read_parquet`](https://ibis-project.org/api/expressions/top_level/#ibis.read_parquet)
[`read_parquet`](https://ibis-project.org/backends/duckdb#ibis.backends.duckdb.Backend.read_parquet)
on the partition we care about.

So to read the first partition in this dataset, we'll call `read_parquet` on
Expand Down
2 changes: 1 addition & 1 deletion docs/how-to/visualization/matplotlib.qmd
Expand Up @@ -24,7 +24,7 @@ grouped = t.group_by("species").aggregate(count=ibis._.count())
grouped = grouped.mutate(row_number=ibis.row_number().over()).select(
"row_number",
(
~s.c("row_number") & s.all()
~s.cols("row_number") & s.all()
), # see https://github.com/ibis-project/ibis/issues/6803
)
grouped
Expand Down
2 changes: 1 addition & 1 deletion docs/posts/ibis-to-file/index.qmd
Expand Up @@ -38,7 +38,7 @@ import ibis.selectors as s
expr = (
t.group_by("species")
.mutate(s.across(s.numeric() & ~s.c("year"), (_ - _.mean()) / _.std()))
.mutate(s.across(s.numeric() & ~s.cols("year"), (_ - _.mean()) / _.std()))
)
expr
```
Expand Down
17 changes: 9 additions & 8 deletions docs/posts/selectors/index.qmd
Expand Up @@ -49,10 +49,11 @@ sense.
We can exclude `year` from the normalization using another selector:

```{python}
t.mutate(s.across(s.numeric() & ~s.c("year"), (_ - _.mean()) / _.std()))
t.mutate(s.across(s.numeric() & ~s.cols("year"), (_ - _.mean()) / _.std()))
```

`c` is short for "column" and the `~` means "negate". Combining those we get "not the year column"!
`cols` selects one or more columns, and the `~` means "negate". Combining those
we get "every column except for 'year'"!

Pretty neat right?

Expand All @@ -65,7 +66,7 @@ With selectors, all you need to do is slap a `.group_by("species")` onto `t`:

```{python}
t.group_by("species").mutate(
s.across(s.numeric() & ~s.c("year"), (_ - _.mean()) / _.std())
s.across(s.numeric() & ~s.cols("year"), (_ - _.mean()) / _.std())
)
```

Expand All @@ -81,7 +82,7 @@ Grouped min/max normalization? Easy:

```{python}
t.group_by("species").mutate(
s.across(s.numeric() & ~s.c("year"), (_ - _.min()) / (_.max() - _.min()))
s.across(s.numeric() & ~s.cols("year"), (_ - _.min()) / (_.max() - _.min()))
)
```

Expand All @@ -107,7 +108,7 @@ What if I want to compute multiple things? Heck yeah!
```{python}
t.group_by("sex").mutate(
s.across(
s.numeric() & ~s.c("year"),
s.numeric() & ~s.cols("year"),
dict(centered=_ - _.mean(), zscore=(_ - _.mean()) / _.std()),
)
).select("sex", s.endswith(("_centered", "_zscore")))
Expand Down Expand Up @@ -144,14 +145,14 @@ t.select(s.startswith("bill")).mutate(
We've seen lots of mutate use, but selectors also work with `.agg`:

```{python}
t.group_by("year").agg(s.across(s.numeric() & ~s.c("year"), _.mean())).order_by("year")
t.group_by("year").agg(s.across(s.numeric() & ~s.cols("year"), _.mean())).order_by("year")
```

Naturally, selectors work in grouping keys too, for even more convenience:

```{python}
t.group_by(~s.numeric() | s.c("year")).mutate(
s.across(s.numeric() & ~s.c("year"), dict(centered=_ - _.mean(), std=_.std()))
t.group_by(~s.numeric() | s.cols("year")).mutate(
s.across(s.numeric() & ~s.cols("year"), dict(centered=_ - _.mean(), std=_.std()))
).select("species", s.endswith(("_centered", "_std")))
```

Expand Down
16 changes: 16 additions & 0 deletions docs/reference/cursed_knowledge.qmd
@@ -0,0 +1,16 @@
# Cursed Knowledge

Cursed knowledge the Ibis developers have acquired via battling with many, many
execution engines.

## Oracle

* Oracle's `LTRIM` and `RTRIM` functions accept a _set_ of whitespace (or other)
  characters to remove from the left- and right-hand sides of the input
string, but the `TRIM` function only accepts a single character to remove.

## Impala

* Impala's `LTRIM` and `RTRIM` functions accept a _set_ of whitespace (or other)
  characters to remove from the left- and right-hand sides of the input
string, but the `TRIM` function only removes _spaces_.
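
To see how these quirks surface through Ibis, here is a small sketch (illustrative
only, not part of the page above) that compiles the same `strip` expression for both
engines; because of the `TRIM` limitations described above, each backend has to
rewrite it in terms of `LTRIM`/`RTRIM` (or an equivalent) rather than a bare `TRIM`:

```{python}
import ibis

t = ibis.table({"s": "string"}, name="t")  # hypothetical one-column table
expr = t.s.strip()  # trim whitespace from both ends of the string

# Inspect the SQL each engine receives for the same expression.
print(ibis.to_sql(expr, dialect="oracle"))
print(ibis.to_sql(expr, dialect="impala"))
```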
62 changes: 62 additions & 0 deletions docs/release_notes_generated.qmd
@@ -1,6 +1,68 @@
---
---

## [9.5.0](https://github.com/ibis-project/ibis/compare/9.4.0...9.5.0) (2024-09-11)

### Features

* **api:** add `name` argument to `topk` ([1652076](https://github.com/ibis-project/ibis/commit/16520764a9debdf49106851ed1e3ee179b2cebc5))
* **api:** add `name` argument to `value_counts` ([24be184](https://github.com/ibis-project/ibis/commit/24be184827c6368d6c7509584b27c3e2a332bb24))
* **api:** add `to_sqlglot` method to `Schema` objects ([#10063](https://github.com/ibis-project/ibis/issues/10063)) ([9488115](https://github.com/ibis-project/ibis/commit/9488115b588ebf6ba0814ebbac9937c8bfc8b517))
* **mssql:** add lpad and rpad ops ([#10060](https://github.com/ibis-project/ibis/issues/10060)) ([77af14b](https://github.com/ibis-project/ibis/commit/77af14bccdd6cd13c8df28499c36b58b9868f6e8))
* **mssql:** add startswith and endswith ops ([17a628c](https://github.com/ibis-project/ibis/commit/17a628ca52a4c4249f5b09b2b03f9f429f8ba248))

### Bug Fixes

* **backends:** pass kwargs to _from_url() in every case ([#10003](https://github.com/ibis-project/ibis/issues/10003)) ([9ca92f0](https://github.com/ibis-project/ibis/commit/9ca92f07707fd8c8bbc0ca4123b1e1bf5452d6c4))
* **bigquery:** handle column name mismatches and `_TABLE_SUFFIX` everywhere ([5ade49e](https://github.com/ibis-project/ibis/commit/5ade49e6a409b691da40a6109f69c3bfd49b83ed))
* **clickhouse:** fix lstrip, rstrip, and strip ([d2539c4](https://github.com/ibis-project/ibis/commit/d2539c4201af6a4a7928b594d5e46d6a06ba3127))
* **datafusion:** raise when attempting to create temp table ([#10072](https://github.com/ibis-project/ibis/issues/10072)) ([1cf5439](https://github.com/ibis-project/ibis/commit/1cf54399c94849cf27782b2446efe3c2e31e2467))
* **deps:** update dependency fsspec to <2024.9.1 ([#10036](https://github.com/ibis-project/ibis/issues/10036)) ([ea71719](https://github.com/ibis-project/ibis/commit/ea717198f60e2143888f0901be82d137ef1a8aff))
* **deps:** update dependency sqlglot to >=23.4,<25.20 ([#10010](https://github.com/ibis-project/ibis/issues/10010)) ([ba07da7](https://github.com/ibis-project/ibis/commit/ba07da7841b276f333c4e3238507ddcb3981b6e4))
* **deps:** update dependency sqlglot to >=23.4,<25.21 ([#10050](https://github.com/ibis-project/ibis/issues/10050)) ([422d361](https://github.com/ibis-project/ibis/commit/422d3618286845612fdc5d259537385b0dfa9d2e))
* **docs:** update invalid read_parquet link ([2ae9ef4](https://github.com/ibis-project/ibis/commit/2ae9ef440a2e897377ee19e132d8f4638d798baf))
* **duckdb:** allow setting `auto_detect` to `False` by fixing translation of columns argument ([#10065](https://github.com/ibis-project/ibis/issues/10065)) ([883d2d3](https://github.com/ibis-project/ibis/commit/883d2d3f064a75ae59660ee5027c2adfa2483913))
* **duckdb:** free memtables based on operation lifetime ([#10042](https://github.com/ibis-project/ibis/issues/10042)) ([a121ab3](https://github.com/ibis-project/ibis/commit/a121ab35ece43d8cf2724dca86f1bbbbd8e047a5))
* **duckdb:** support version 1.1.0 ([#10037](https://github.com/ibis-project/ibis/issues/10037)) ([3a37626](https://github.com/ibis-project/ibis/commit/3a376265534add3d9d8de76f40a8b2dad41832a1))
* **flink:** fix strip ([01117a5](https://github.com/ibis-project/ibis/commit/01117a5308027601a315d853bec88fc6e42cdd8a))
* **impala:** allow specifying `temp=False` in `create_table` ([e29712c](https://github.com/ibis-project/ibis/commit/e29712c31264eca39d2606c70848097f092db6fb))
* **impala:** fix lstrip, rstrip, strip ([413df3b](https://github.com/ibis-project/ibis/commit/413df3bcee21faadade61549ca4e778e4b60fb7d))
* **mssql:** ensure that dot-sql can be executed when column names are not provided ([#10028](https://github.com/ibis-project/ibis/issues/10028)) ([1936437](https://github.com/ibis-project/ibis/commit/193643717d1042d3244171c9af3888f6009c9c5e)), closes [#10025](https://github.com/ibis-project/ibis/issues/10025)
* **mssql:** fix strip, lstrip, rstrip ([f53feab](https://github.com/ibis-project/ibis/commit/f53feaba1e03f6b8a05f5f705ae2cc844a865599))
* **oracle:** fix lstrip, rstrip, and strip ([3f5a304](https://github.com/ibis-project/ibis/commit/3f5a3042061bcaee7f9e611cc5ce60bd8bf973e2))
* **pandas:** don't silently ignore result column name mismatches ([48be246](https://github.com/ibis-project/ibis/commit/48be246f6a5b6381dbd83ca0d0fa9ee5fe45f542))
* **polars:** support polars `Enum` type ([#10017](https://github.com/ibis-project/ibis/issues/10017)) ([869829f](https://github.com/ibis-project/ibis/commit/869829f03d957d572113929533414a015b312047))
* **sqlite:** list temporary tables by default ([#10058](https://github.com/ibis-project/ibis/issues/10058)) ([dfa55b6](https://github.com/ibis-project/ibis/commit/dfa55b6465ebb54d65a2041c752f2058fd422d3a))
* **sql:** properly parenthesize binary ops containing named expressions ([5c2eadc](https://github.com/ibis-project/ibis/commit/5c2eadcdd5b2fbfcdae454e7149d9438c52e190f))

### Documentation

* **accursed:** add cursed knowledge page ([#10031](https://github.com/ibis-project/ibis/issues/10031)) ([85e1dcc](https://github.com/ibis-project/ibis/commit/85e1dccd59c46f5abf8670ca1d3c1f559f219ecd))
* **duckdb:** fix broken link to parquet writing ([#10026](https://github.com/ibis-project/ibis/issues/10026)) ([d22f8eb](https://github.com/ibis-project/ibis/commit/d22f8eb88cc0cfb70b2a9e292564d8c87206c352))
* **jupyterlite:** disable insecure extensions ([#10052](https://github.com/ibis-project/ibis/issues/10052)) ([3d8280b](https://github.com/ibis-project/ibis/commit/3d8280b494dd9df6f2e40fe2f4966786a6fa5766))

### Refactors

* **backends:** clean up resources produced by `memtable` ([#10055](https://github.com/ibis-project/ibis/issues/10055)) ([019cae5](https://github.com/ibis-project/ibis/commit/019cae5d8567477b7be38942069f66b6ce87805a))
* **backends:** split memtable existence check out ([#10053](https://github.com/ibis-project/ibis/issues/10053)) ([77448bf](https://github.com/ibis-project/ibis/commit/77448bfb85a48b8674d3fe432639f6ac5752c1ba))
* **datafusion:** avoid reinitializing memtables on every execute call ([#10057](https://github.com/ibis-project/ibis/issues/10057)) ([43e5f12](https://github.com/ibis-project/ibis/commit/43e5f1282bf1c4fcab8e4f1c40927bedd8bc95a8))
* **dependencies:** make `fsspec` a test-only dependency ([37e4439](https://github.com/ibis-project/ibis/commit/37e4439328315dece1ee54ade0fff1f17a5ef8b2))
* **formats:** plumb through `data_mapper` and `schema` in both pandas and pyarrow formats ([cbeb967](https://github.com/ibis-project/ibis/commit/cbeb967a48ae3ca37669721e54874aff8bbc435d))
* **mssql:** simplify lpad and rpad ops ([#10085](https://github.com/ibis-project/ibis/issues/10085)) ([ef5d58d](https://github.com/ibis-project/ibis/commit/ef5d58deab950d3bc205cb0e4c6bc1ba3e6299f7)), closes [/github.com/ibis-project/ibis/pull/10060#discussion_r1752665235](https://github.com/ibis-project//github.com/ibis-project/ibis/pull/10060/issues/discussion_r1752665235)
* **polars:** handle memtables like every other backend ([#10056](https://github.com/ibis-project/ibis/issues/10056)) ([2b0dbb9](https://github.com/ibis-project/ibis/commit/2b0dbb980f40ab52b5cdfbe906233c311c8cf8ee))

### Performance

* **backends:** speed up most memtable existence checks ([#10067](https://github.com/ibis-project/ibis/issues/10067)) ([a205ab7](https://github.com/ibis-project/ibis/commit/a205ab7810356973678ab7ff94c171c9c43edab4))
* **ir:** don't recreate nodes in `replace` if their children haven't changed ([ac79604](https://github.com/ibis-project/ibis/commit/ac79604f5acebb15281ebb2b15d0ac81c0a0c579))
* **sql:** avoid parenthesizing chains of commutative operators ([f86515c](https://github.com/ibis-project/ibis/commit/f86515c0c26a50c9cff39969e01543ea728d2391))

### Deprecations

* **api:** deprecate `bool_val.negate()`/`-bool_val` in favor of `~bool_val` ([499fc03](https://github.com/ibis-project/ibis/commit/499fc03bb613c473584669ab14dcb36584eb909f))
* **api:** deprecate filtering/expression projection in `Table.__getitem__` ([62c63d2](https://github.com/ibis-project/ibis/commit/62c63d243f13aaf566c9c66bd48510ddbd76bacf))
* **selectors:** deprecate `c` and `r` selectors in favor of `cols` and `index` ([29b865e](https://github.com/ibis-project/ibis/commit/29b865e96288dbbb3baf62d698dbea980b95e84f))

## [9.4.0](https://github.com/ibis-project/ibis/compare/9.3.0...9.4.0) (2024-09-03)

### Features
Expand Down
36 changes: 13 additions & 23 deletions docs/tutorials/ibis-for-pandas-users.qmd
Expand Up @@ -126,13 +126,6 @@ Selecting columns is very similar to in pandas. In fact, you can use the same sy
t[["one", "two"]]
```

However, since row-level indexing is not supported in Ibis, the inner list is not necessary.


```{python}
t["one", "two"]
```

## Selecting columns

Selecting columns is done using the same syntax as in pandas `DataFrames`. You can use either
Expand Down Expand Up @@ -192,11 +185,11 @@ new_col = unnamed.name("new_col")
new_col
```

You can then add this column to the table using a projection.
You can then add this column to the table using `mutate`


```{python}
proj = t["one", "two", new_col]
proj = t.mutate(new_col)
proj
```

Expand Down Expand Up @@ -301,10 +294,9 @@ penguins.limit(5)
### Filtering rows

In addition to limiting the number of rows that are returned, it is possible to
filter the rows using expressions. Expressions are constructed very similarly to
the way they are in pandas. Ibis expressions are constructed from operations on
columns in a table which return a boolean result. This result is then used to
filter the table.
filter the rows using expressions. This is done using the `filter` method in
Ibis. Ibis expressions are constructed from operations on columns in a table
which return a boolean result. This result is then used to filter the table.


```{python}
Expand All @@ -324,32 +316,30 @@ get 6 rows back.


```{python}
filtered = penguins[expr]
filtered = penguins.filter(expr)
filtered
```

Of course, the filtering expression can be applied inline as well.


```{python}
filtered = penguins[penguins.bill_length_mm > 37.0]
filtered = penguins.filter(penguins.bill_length_mm > 37.0)
filtered
```

Multiple filtering expressions can be combined into a single expression or chained onto existing
table expressions.
Multiple filtering expressions may be passed to a single call (filtering
only rows where they're all true), or combined using common boolean
operators like `&` and `|`. The expressions below are equivalent:


```{python}
filtered = penguins[(penguins.bill_length_mm > 37.0) & (penguins.bill_depth_mm > 18.0)]
filtered = penguins.filter(penguins.bill_length_mm > 37.0, penguins.bill_depth_mm > 18.0)
filtered
```

The code above will return the same rows as the code below.


```{python}
filtered = penguins[penguins.bill_length_mm > 37.0][penguins.bill_depth_mm > 18.0]
filtered = penguins.filter((penguins.bill_length_mm > 37.0) & (penguins.bill_depth_mm > 18.0))
filtered
```

Expand All @@ -359,7 +349,7 @@ is greater than the mean.


```{python}
filtered = penguins[penguins.bill_length_mm > penguins.bill_length_mm.mean()]
filtered = penguins.filter(penguins.bill_length_mm > penguins.bill_length_mm.mean())
filtered
```

Expand Down
64 changes: 29 additions & 35 deletions docs/tutorials/ibis-for-sql-users.qmd
Expand Up @@ -46,12 +46,6 @@ FROM my_data

In Ibis, this is

```{python}
proj = t["two", "one"]
```

or

```{python}
proj = t.select("two", "one")
```
Expand All @@ -78,7 +72,7 @@ new_col = (t.three * 2).name("new_col")
Now, we have:

```{python}
proj = t["two", "one", new_col]
proj = t.select("two", "one", new_col)
ibis.to_sql(proj)
```

Expand Down Expand Up @@ -113,15 +107,15 @@ select all columns in a table using the `SELECT *` construct. To do this, use
the table expression itself in a projection:

```{python}
proj = t[t]
proj = t.select(t)
ibis.to_sql(proj)
```

This is how `mutate` is implemented. The example above
`t.mutate(new_col=t.three * 2)` can be written as a normal projection:

```{python}
proj = t[t, new_col]
proj = t.select(t, new_col)
ibis.to_sql(proj)
```

Expand All @@ -144,7 +138,7 @@ To write this with Ibis, it is:

```{python}
diff = (t.two - t2.value).name("diff")
joined = t.join(t2, t.one == t2.key)[t, diff]
joined = t.join(t2, t.one == t2.key).select(t, diff)
```

And verify the generated SQL:
Expand Down Expand Up @@ -188,19 +182,18 @@ ibis.to_sql(expr)

## Filtering / `WHERE`

You can add filter clauses to a table expression either by indexing with
`[]` (similar to pandas) or use the `filter` method:
You can add filter clauses to a table expression by using the `filter` method:

```{python}
filtered = t[t.two > 0]
filtered = t.filter(t.two > 0)
ibis.to_sql(filtered)
```

`filter` can take a list of expressions, which must all be satisfied for
`filter` can take multiple expressions, which must all be satisfied for
a row to appear in the result:

```{python}
filtered = t.filter([t.two > 0, t.one.isin(["A", "B"])])
filtered = t.filter(t.two > 0, t.one.isin(["A", "B"]))
ibis.to_sql(filtered)
```

Expand All @@ -209,7 +202,7 @@ To compose boolean expressions with `AND` or `OR`, use the respective

```{python}
cond = (t.two < 0) | ((t.two > 0) | t.one.isin(["A", "B"]))
filtered = t[cond]
filtered = t.filter(cond)
ibis.to_sql(filtered)
```

Expand Down Expand Up @@ -617,7 +610,7 @@ ibis.to_sql(expr)

```{python}
agged = (
expr[expr.one.notnull()]
expr.filter(expr.one.notnull())
.group_by("is_valid")
.aggregate(three_count=lambda t: t.three.notnull().sum())
)
Expand All @@ -632,7 +625,7 @@ keyword. The result of `between` is boolean and can be used with any
other boolean expression:

```{python}
expr = t[t.two.between(10, 50) & t.one.notnull()]
expr = t.filter(t.two.between(10, 50) & t.one.notnull())
ibis.to_sql(expr)
```

Expand Down Expand Up @@ -684,15 +677,15 @@ After one or more joins, you can reference any of the joined tables in
a projection immediately after:

```{python}
expr = joined[t1, t2.value2]
expr = joined.select(t1, t2.value2)
ibis.to_sql(expr)
```

If you need to compute an expression that involves both tables, you can
do that also:

```{python}
expr = joined[t1.key1, (t1.value1 - t2.value2).name("diff")]
expr = joined.select(t1.key1, (t1.value1 - t2.value2).name("diff"))
ibis.to_sql(expr)
```

Expand Down Expand Up @@ -800,15 +793,15 @@ In these case, we can specify a list of common join keys:

```{python}
joined = t4.join(t5, ["key1", "key2", "key3"])
expr = joined[t4, t5.value2]
expr = joined.select(t4, t5.value2)
ibis.to_sql(expr)
```

You can mix the overlapping key names with other expressions:

```{python}
joined = t4.join(t5, ["key1", "key2", t4.key3.left(4) == t5.key3.left(4)])
expr = joined[t4, t5.value2]
expr = joined.select(t4, t5.value2)
ibis.to_sql(expr)
```

Expand Down Expand Up @@ -885,15 +878,15 @@ cond = (events.user_id == purchases.user_id).any()
This can now be used to filter `events`:

```{python}
expr = events[cond]
expr = events.filter(cond)
ibis.to_sql(expr)
```

If you negate the condition, it will instead give you only event data
from users *that have not made a purchase*:

```{python}
expr = events[-cond]
expr = events.filter(-cond)
ibis.to_sql(expr)
```

Expand All @@ -916,7 +909,7 @@ you can write with Ibis:

```{python}
cond = events.user_id.isin(purchases.user_id)
expr = events[cond]
expr = events.filter(cond)
ibis.to_sql(expr)
```

Expand All @@ -941,7 +934,7 @@ WHERE value1 > (
With Ibis, the code is simpler and more pandas-like:

```{python}
expr = t1[t1.value1 > t2.value2.max()]
expr = t1.filter(t1.value1 > t2.value2.max())
ibis.to_sql(expr)
```

Expand All @@ -968,8 +961,8 @@ With Ibis, the code is similar, but you add the correlated filter to the
average statistic:

```{python}
stat = t2[t1.key1 == t2.key3].value2.mean()
expr = t1[t1.value1 > stat]
stat = t2.filter(t1.key1 == t2.key3).value2.mean()
expr = t1.filter(t1.value1 > stat)
ibis.to_sql(expr)
```

Expand Down Expand Up @@ -1118,7 +1111,7 @@ Ibis provides a `row_number()` function that allows you to do this:
expr = purchases.mutate(
row_number=ibis.row_number().over(group_by=[_.user_id], order_by=_.price)
)
expr = expr[_.row_number < 3]
expr = expr.filter(_.row_number < 3)
```

The output of this is a table with the three most expensive items that each user has purchased
Expand Down Expand Up @@ -1149,7 +1142,7 @@ Ibis has a set of interval APIs that allow you to do date/time
arithmetic. For example:

```{python}
expr = events[events.ts > (ibis.now() - ibis.interval(years=1))]
expr = events.filter(events.ts > (ibis.now() - ibis.interval(years=1)))
ibis.to_sql(expr)
```

Expand Down Expand Up @@ -1214,12 +1207,13 @@ purchases = ibis.table(
metric = purchases.amount.sum().name("total")
agged = purchases.group_by(["region", "kind"]).aggregate(metric)
left = agged[agged.kind == "foo"]
right = agged[agged.kind == "bar"]
left = agged.filter(agged.kind == "foo")
right = agged.filter(agged.kind == "bar")
result = left.join(right, left.region == right.region)[
left.region, (left.total - right.total).name("diff")
]
result = (
left.join(right, left.region == right.region)
.select(left.region, (left.total - right.total).name("diff"))
)
```

Ibis automatically creates a CTE for `agged`:
Expand Down
Expand Up @@ -184,7 +184,7 @@ transaction count over the past five hours may be useful features. Let’s write
out each of these using Ibis API:

```{python}
user_trans_amt_last_360m_agg = source_table[
user_trans_amt_last_360m_agg = source_table.select(
source_table.user_id,
# Calculate the average transaction amount over the past six hours
source_table.amt.mean()
Expand All @@ -207,7 +207,7 @@ user_trans_amt_last_360m_agg = source_table[
)
.name("user_trans_count_last_360min"),
source_table.trans_date_trans_time,
]
)
```

`over()` creates an [over
Expand Down
12 changes: 6 additions & 6 deletions flake.lock
2 changes: 1 addition & 1 deletion ibis/__init__.py
Expand Up @@ -2,7 +2,7 @@

from __future__ import annotations

__version__ = "9.4.0"
__version__ = "9.5.0"

import warnings
from typing import Any
Expand Down
30 changes: 25 additions & 5 deletions ibis/backends/__init__.py
Expand Up @@ -2,6 +2,7 @@

import abc
import collections.abc
import contextlib
import functools
import importlib.metadata
import keyword
Expand Down Expand Up @@ -1109,16 +1110,34 @@ def _register_udfs(self, expr: ir.Expr) -> None:
if self.supports_python_udfs:
raise NotImplementedError(self.name)

def _in_memory_table_exists(self, name: str) -> bool:
return name in self.list_tables()

def _register_in_memory_tables(self, expr: ir.Expr) -> None:
for memtable in expr.op().find(ops.InMemoryTable):
self._register_in_memory_table(memtable)
if not self._in_memory_table_exists(memtable.name):
self._register_in_memory_table(memtable)
weakref.finalize(
memtable, self._finalize_in_memory_table, memtable.name
)

def _register_in_memory_table(self, op: ops.InMemoryTable):
def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
if self.supports_in_memory_tables:
raise NotImplementedError(
f"{self.name} must implement `_register_in_memory_table` to support in-memory tables"
)

def _finalize_in_memory_table(self, name: str) -> None:
"""Wrap `_finalize_memtable` to suppress exceptions."""
with contextlib.suppress(Exception):
self._finalize_memtable(name)

def _finalize_memtable(self, name: str) -> None:
if self.supports_in_memory_tables:
raise NotImplementedError(
f"{self.name} must implement `_finalize_memtable` to support in-memory tables"
)

def _run_pre_execute_hooks(self, expr: ir.Expr) -> None:
"""Backend-specific hooks to run before an expression is executed."""
self._register_udfs(expr)
Expand Down Expand Up @@ -1396,11 +1415,12 @@ def connect(resource: Path | str, **kwargs: Any) -> BaseBackend:
if len(value) == 1:
kwargs[name] = value[0]

# Merge explicit kwargs with query string, explicit kwargs
# taking precedence
kwargs.update(orig_kwargs)

if scheme == "file":
path = parsed.netloc + parsed.path
# Merge explicit kwargs with query string, explicit kwargs
# taking precedence
kwargs.update(orig_kwargs)
if path.endswith(".duckdb"):
return ibis.duckdb.connect(path, **kwargs)
elif path.endswith((".sqlite", ".db")):
Expand Down
228 changes: 121 additions & 107 deletions ibis/backends/bigquery/__init__.py
Expand Up @@ -34,7 +34,6 @@
)
from ibis.backends.bigquery.datatypes import BigQuerySchema
from ibis.backends.sql import SQLBackend
from ibis.backends.sql.datatypes import BigQueryType

if TYPE_CHECKING:
from collections.abc import Iterable, Mapping
Expand Down Expand Up @@ -147,10 +146,18 @@ def _force_quote_table(table: sge.Table) -> sge.Table:
return table


def _postprocess_arrow(
table_or_batch: pa.Table | pa.RecordBatch, names: list[str]
) -> pa.Table | pa.RecordBatch:
"""Drop `_TABLE_SUFFIX` if present in the results, then rename columns."""
if "_TABLE_SUFFIX" in table_or_batch.column_names:
table_or_batch = table_or_batch.drop_columns(["_TABLE_SUFFIX"])
return table_or_batch.rename_columns(names)


class Backend(SQLBackend, CanCreateDatabase, CanCreateSchema):
name = "bigquery"
compiler = sc.bigquery.compiler
supports_in_memory_tables = True
supports_python_udfs = False

def __init__(self, *args, **kwargs) -> None:
Expand All @@ -163,31 +170,35 @@ def _session_dataset(self):
self.__session_dataset = self._make_session()
return self.__session_dataset

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
raw_name = op.name
def _in_memory_table_exists(self, name: str) -> bool:
table_ref = bq.TableReference(self._session_dataset, name)

session_dataset = self._session_dataset
project = session_dataset.project
dataset = session_dataset.dataset_id

table_ref = bq.TableReference(session_dataset, raw_name)
try:
self.client.get_table(table_ref)
except google.api_core.exceptions.NotFound:
table_id = sg.table(
raw_name, db=dataset, catalog=project, quoted=False
).sql(dialect=self.name)
bq_schema = BigQuerySchema.from_ibis(op.schema)
load_job = self.client.load_table_from_dataframe(
op.data.to_frame(),
table_id,
job_config=bq.LoadJobConfig(
# fail if the table already exists and contains data
write_disposition=bq.WriteDisposition.WRITE_EMPTY,
schema=bq_schema,
),
)
load_job.result()
return False
else:
return True

def _finalize_memtable(self, name: str) -> None:
table_ref = bq.TableReference(self._session_dataset, name)
self.client.delete_table(table_ref, not_found_ok=True)

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
table_ref = bq.TableReference(self._session_dataset, op.name)

bq_schema = BigQuerySchema.from_ibis(op.schema)

load_job = self.client.load_table_from_dataframe(
op.data.to_frame(),
table_ref,
job_config=bq.LoadJobConfig(
# fail if the table already exists and contains data
write_disposition=bq.WriteDisposition.WRITE_EMPTY,
schema=bq_schema,
),
)
load_job.result()

def _read_file(
self,
Expand Down Expand Up @@ -702,50 +713,6 @@ def compile(
self._log(sql)
return sql

def execute(self, expr, params=None, limit="default", **kwargs):
"""Compile and execute the given Ibis expression.
Compile and execute Ibis expression using this backend client
interface, returning results in-memory in the appropriate object type
Parameters
----------
expr
Ibis expression to execute
limit
Retrieve at most this number of values/rows. Overrides any limit
already set on the expression.
params
Query parameters
kwargs
Extra arguments specific to the backend
Returns
-------
pd.DataFrame | pd.Series | scalar
Output from execution
"""
from ibis.backends.bigquery.converter import BigQueryPandasData

self._run_pre_execute_hooks(expr)

schema = expr.as_table().schema() - ibis.schema({"_TABLE_SUFFIX": "string"})

sql = self.compile(expr, limit=limit, params=params, **kwargs)
self._log(sql)
query = self.raw_sql(sql, params=params, **kwargs)

arrow_t = query.to_arrow(
progress_bar_type=None, bqstorage_client=self.storage_client
)

result = BigQueryPandasData.convert_table(
arrow_t.to_pandas(timestamp_as_object=True), schema
)

return expr.__pandas_result__(result, schema=schema)

def insert(
self,
table_name: str,
Expand Down Expand Up @@ -784,6 +751,21 @@ def insert(
overwrite=overwrite,
)

def _to_query(
self,
table_expr: ir.Table,
*,
params: Mapping[ir.Scalar, Any] | None = None,
limit: int | str | None = None,
page_size: int | None = None,
**kwargs: Any,
):
self._run_pre_execute_hooks(table_expr)
sql = self.compile(table_expr, limit=limit, params=params, **kwargs)
self._log(sql)

return self.raw_sql(sql, params=params, page_size=page_size)

def to_pyarrow(
self,
expr: ir.Expr,
Expand All @@ -793,15 +775,16 @@ def to_pyarrow(
**kwargs: Any,
) -> pa.Table:
self._import_pyarrow()
self._register_in_memory_tables(expr)
sql = self.compile(expr, limit=limit, params=params, **kwargs)
self._log(sql)
query = self.raw_sql(sql, params=params, **kwargs)

table_expr = expr.as_table()
schema = table_expr.schema() - ibis.schema({"_TABLE_SUFFIX": "string"})

query = self._to_query(table_expr, params=params, limit=limit, **kwargs)
table = query.to_arrow(
progress_bar_type=None, bqstorage_client=self.storage_client
)
table = table.rename_columns(list(expr.as_table().schema().names))
return expr.__pyarrow_result__(table)
table = _postprocess_arrow(table, list(schema.names))
return expr.__pyarrow_result__(table, schema=schema)

def to_pyarrow_batches(
self,
Expand All @@ -814,14 +797,55 @@ def to_pyarrow_batches(
):
pa = self._import_pyarrow()

schema = expr.as_table().schema()
table_expr = expr.as_table()
schema = table_expr.schema() - ibis.schema({"_TABLE_SUFFIX": "string"})
colnames = list(schema.names)

self._register_in_memory_tables(expr)
sql = self.compile(expr, limit=limit, params=params, **kwargs)
self._log(sql)
query = self.raw_sql(sql, params=params, page_size=chunk_size, **kwargs)
query = self._to_query(
table_expr, params=params, limit=limit, page_size=chunk_size, **kwargs
)
batch_iter = query.to_arrow_iterable(bqstorage_client=self.storage_client)
return pa.ipc.RecordBatchReader.from_batches(schema.to_pyarrow(), batch_iter)
return pa.ipc.RecordBatchReader.from_batches(
schema.to_pyarrow(),
(_postprocess_arrow(b, colnames) for b in batch_iter),
)

def execute(self, expr, params=None, limit="default", **kwargs):
"""Compile and execute the given Ibis expression.
Compile and execute Ibis expression using this backend client
interface, returning results in-memory in the appropriate object type
Parameters
----------
expr
Ibis expression to execute
limit
Retrieve at most this number of values/rows. Overrides any limit
already set on the expression.
params
Query parameters
kwargs
Extra arguments specific to the backend
Returns
-------
pd.DataFrame | pd.Series | scalar
Output from execution
"""
from ibis.backends.bigquery.converter import BigQueryPandasData

table_expr = expr.as_table()
schema = table_expr.schema() - ibis.schema({"_TABLE_SUFFIX": "string"})
query = self._to_query(table_expr, params=params, limit=limit, **kwargs)
df = query.to_arrow(
progress_bar_type=None, bqstorage_client=self.storage_client
).to_pandas(timestamp_as_object=True)
# Drop _TABLE_SUFFIX if present in the results, then rename columns
df = df.drop(columns="_TABLE_SUFFIX", errors="ignore")
df.columns = schema.names
return expr.__pandas_result__(df, schema=schema, data_mapper=BigQueryPandasData)

def _gen_udf_name(self, name: str, schema: Optional[str]) -> str:
func = ".".join(filter(None, (schema, name)))
Expand Down Expand Up @@ -868,6 +892,17 @@ def list_tables(
) -> list[str]:
"""List the tables in the database.
::: {.callout-note}
## Ibis does not use the word `schema` to refer to database hierarchy.
A collection of tables is referred to as a `database`.
A collection of `database` is referred to as a `catalog`.
These terms are mapped onto the corresponding features in each
backend (where available), regardless of whether the backend itself
uses the same terminology.
:::
Parameters
----------
like
Expand All @@ -880,18 +915,7 @@ def list_tables(
To specify a table in a separate BigQuery dataset, you can pass in the
dataset and project as a string `"dataset.project"`, or as a tuple of
strings `("dataset", "project")`.
::: {.callout-note}
## Ibis does not use the word `schema` to refer to database hierarchy.
A collection of tables is referred to as a `database`.
A collection of `database` is referred to as a `catalog`.
These terms are mapped onto the corresponding features in each
backend (where available), regardless of whether the backend itself
uses the same terminology.
:::
strings `(dataset, project)`.
schema
[deprecated] The schema (dataset) inside `database` to perform the list against.
"""
Expand Down Expand Up @@ -1010,7 +1034,7 @@ def create_table(
obj = ibis.memtable(obj, schema=schema)

if obj is not None:
self._register_in_memory_tables(obj)
self._run_pre_execute_hooks(obj)

if temp:
dataset = self._session_dataset.dataset_id
Expand Down Expand Up @@ -1038,22 +1062,12 @@ def create_table(

table = _force_quote_table(table)

column_defs = [
sge.ColumnDef(
this=sg.to_identifier(name, quoted=self.compiler.quoted),
kind=BigQueryType.from_ibis(typ),
constraints=(
None
if typ.nullable or typ.is_array()
else [sge.ColumnConstraint(kind=sge.NotNullColumnConstraint())]
),
)
for name, typ in (schema or {}).items()
]

stmt = sge.Create(
kind="TABLE",
this=sge.Schema(this=table, expressions=column_defs or None),
this=sge.Schema(
this=table,
expressions=schema.to_sqlglot(self.dialect) if schema else None,
),
replace=overwrite,
properties=sge.Properties(expressions=properties),
expression=None if obj is None else self.compile(obj),
Expand Down Expand Up @@ -1107,7 +1121,7 @@ def create_view(
expression=self.compile(obj),
replace=overwrite,
)
self._register_in_memory_tables(obj)
self._run_pre_execute_hooks(obj)
self.raw_sql(stmt.sql(self.name))
return self.table(name, database=(catalog, database))

Expand Down
22 changes: 16 additions & 6 deletions ibis/backends/bigquery/tests/system/test_client.py
Expand Up @@ -186,7 +186,7 @@ def test_scalar_param_partition_time(parted_alltypes):
assert "PARTITIONTIME" in parted_alltypes.columns
assert "PARTITIONTIME" in parted_alltypes.schema()
param = ibis.param("timestamp('UTC')")
expr = parted_alltypes[param > parted_alltypes.PARTITIONTIME]
expr = parted_alltypes.filter(param > parted_alltypes.PARTITIONTIME)
df = expr.execute(params={param: "2017-01-01"})
assert df.empty

Expand All @@ -201,7 +201,7 @@ def test_parted_column(con, kind):

def test_cross_project_query(public):
table = public.table("posts_questions")
expr = table[table.tags.contains("ibis")][["title", "tags"]]
expr = table.filter(table.tags.contains("ibis"))[["title", "tags"]]
n = 5
df = expr.limit(n).execute()
assert len(df) == n
Expand Down Expand Up @@ -231,7 +231,7 @@ def test_multiple_project_queries_execute(con):
trips = con.table("trips", database="nyc-tlc.yellow").limit(5)
predicate = posts_questions.tags == trips.rate_code
cols = [posts_questions.title]
join = posts_questions.left_join(trips, predicate)[cols]
join = posts_questions.left_join(trips, predicate).select(cols)
result = join.execute()
assert list(result.columns) == ["title"]
assert len(result) == 5
Expand Down Expand Up @@ -421,12 +421,22 @@ def test_create_table_from_scratch_with_spaces(project_id, dataset_id):
con.drop_table(name)


def test_table_suffix():
@pytest.mark.parametrize("ret_type", ["pandas", "pyarrow", "pyarrow_batches"])
def test_table_suffix(ret_type):
con = ibis.connect("bigquery://ibis-gbq")
t = con.table("gsod*", database="bigquery-public-data.noaa_gsod")
expr = t.filter(t._TABLE_SUFFIX == "1929", t.max != 9999.9).head(1)
result = expr.execute()
assert not result.empty
if ret_type == "pandas":
result = expr.to_pandas()
cols = list(result.columns)
elif ret_type == "pyarrow":
result = expr.to_pyarrow()
cols = result.column_names
elif ret_type == "pyarrow_batches":
result = pa.Table.from_batches(expr.to_pyarrow_batches())
cols = result.column_names
assert len(result)
assert "_TABLE_PREFIX" not in cols


def test_parameters_in_url_connect(mocker):
Expand Down
Expand Up @@ -18,7 +18,7 @@
@pytest.fixture(scope="module")
def alltypes(con):
t = con.table("functional_alltypes")
expr = t[t.bigint_col.isin([10, 20])].limit(10)
expr = t.filter(t.bigint_col.isin([10, 20])).limit(10)
return expr


Expand Down
12 changes: 6 additions & 6 deletions ibis/backends/bigquery/tests/unit/test_compiler.py
Expand Up @@ -151,11 +151,11 @@ def test_projection_fusion_only_peeks_at_immediate_parent(snapshot):
("val", "int64"),
]
table = ibis.table(schema, name="unbound_table")
table = table[table.PARTITIONTIME < ibis.date("2017-01-01")]
table = table.filter(table.PARTITIONTIME < ibis.date("2017-01-01"))
table = table.mutate(file_date=table.file_date.cast("date"))
table = table[table.file_date < ibis.date("2017-01-01")]
table = table.filter(table.file_date < ibis.date("2017-01-01"))
table = table.mutate(XYZ=table.val * 2)
expr = table.join(table.view())[table]
expr = table.join(table.view()).select(table)
snapshot.assert_match(to_sql(expr), "out.sql")


Expand Down Expand Up @@ -276,7 +276,7 @@ class MockBackend(ibis.backends.bigquery.Backend):
for _ in range(num_joins): # noqa: F402
table = table.mutate(dummy=ibis.literal(""))
table_ = table.view()
table = table.left_join(table_, ["dummy"])[[table_]]
table = table.left_join(table_, ["dummy"]).select(table_)

start = time.time()
table.compile()
Expand Down Expand Up @@ -417,9 +417,9 @@ def test_divide_by_zero(alltypes, op, snapshot):


def test_identical_to(alltypes, snapshot):
expr = alltypes[
expr = alltypes.filter(
_.string_col.identical_to("a") & _.date_string_col.identical_to("b")
]
)
snapshot.assert_match(to_sql(expr), "out.sql")


Expand Down
31 changes: 22 additions & 9 deletions ibis/backends/clickhouse/__init__.py
Expand Up @@ -143,8 +143,7 @@ def do_connect(
>>> import ibis
>>> client = ibis.clickhouse.connect()
>>> client
<ibis.clickhouse.client.ClickhouseClient object at 0x...>
<ibis.backends.clickhouse.Backend object at 0x...>
"""
if settings is None:
settings = {}
Expand Down Expand Up @@ -674,13 +673,7 @@ def create_table(

this = sge.Schema(
this=sg.table(name, db=database, quoted=self.compiler.quoted),
expressions=[
sge.ColumnDef(
this=sg.to_identifier(name, quoted=self.compiler.quoted),
kind=self.compiler.type_mapper.from_ibis(typ),
)
for name, typ in (schema or obj.schema()).items()
],
expressions=(schema or obj.schema()).to_sqlglot(self.dialect),
)
properties = [
# the engine cannot be quoted, since clickhouse won't allow e.g.,
Expand Down Expand Up @@ -779,3 +772,23 @@ def create_view(
with self._safe_raw_sql(src, external_tables=external_tables):
pass
return self.table(name, database=database)

def _in_memory_table_exists(self, name: str) -> bool:
name = sg.table(name, quoted=self.compiler.quoted).sql(self.dialect)
try:
# DESCRIBE TABLE $TABLE FORMAT NULL is the fastest way to check
# table existence in clickhouse; FORMAT NULL produces no data which
# is ideal since we don't care about the output for existence
# checking
#
# Other methods compared were
# 1. SELECT 1 FROM $TABLE LIMIT 0
# 2. SHOW TABLES LIKE $TABLE LIMIT 1
#
# if the table exists nothing is returned and there's no error
# otherwise there's an error
self.con.raw_query(f"DESCRIBE {name} FORMAT NULL")
except cc.driver.exceptions.DatabaseError:
return False
else:
return True
@@ -1,5 +1,3 @@
SELECT
(
"t0"."int_col" + "t0"."tinyint_col"
) + "t0"."double_col" AS "Add(Add(int_col, tinyint_col), double_col)"
"t0"."int_col" + "t0"."tinyint_col" + "t0"."double_col" AS "Add(Add(int_col, tinyint_col), double_col)"
FROM "functional_alltypes" AS "t0"
@@ -1,10 +1,6 @@
SELECT
"t1"."key" AS "key",
SUM((
(
"t1"."value" + 1
) + 2
) + 3) AS "abc"
SUM("t1"."value" + 1 + 2 + 3) AS "abc"
FROM (
SELECT
*
Expand Down
@@ -1,10 +1,6 @@
SELECT
"t1"."key" AS "key",
SUM((
(
"t1"."value" + 1
) + 2
) + 3) AS "foo"
SUM("t1"."value" + 1 + 2 + 3) AS "foo"
FROM (
SELECT
*
Expand Down
2 changes: 1 addition & 1 deletion ibis/backends/clickhouse/tests/test_aggregations.py
Expand Up @@ -163,7 +163,7 @@ def test_boolean_reduction(alltypes, op, df):

def test_anonymous_aggregate(alltypes, df):
t = alltypes
expr = t[t.double_col > t.double_col.mean()]
expr = t.filter(t.double_col > t.double_col.mean())
result = expr.execute().set_index("id")
expected = df[df.double_col > df.double_col.mean()].set_index("id")
tm.assert_frame_equal(result, expected, check_like=True)
15 changes: 13 additions & 2 deletions ibis/backends/clickhouse/tests/test_client.py
Expand Up @@ -129,7 +129,7 @@ def test_sql_query_limits(alltypes):
def test_embedded_identifier_quoting(alltypes):
t = alltypes

expr = t[[(t.double_col * 2).name("double(fun)")]]["double(fun)"].sum()
expr = t.select((t.double_col * 2).name("double(fun)"))["double(fun)"].sum()
expr.execute()


Expand Down Expand Up @@ -375,7 +375,18 @@ def test_from_url(con):
)


def test_invalid_port(con):
def test_from_url_with_kwargs(con):
# since explicit kwargs take precedence, this passes, because we're passing
# `database` explicitly, even though our connection string says to use a
# random database
database = ibis.util.gen_name("clickhouse_database")
assert ibis.connect(
f"clickhouse://{CLICKHOUSE_USER}:{CLICKHOUSE_PASS}@{CLICKHOUSE_HOST}:{CLICKHOUSE_PORT}/{database}",
database=IBIS_TEST_CLICKHOUSE_DB,
)


def test_invalid_port():
port = 9999
url = f"clickhouse://{CLICKHOUSE_USER}:{CLICKHOUSE_PASS}@{CLICKHOUSE_HOST}:{port}/{IBIS_TEST_CLICKHOUSE_DB}"
with pytest.raises(cc.driver.exceptions.DatabaseError):
Expand Down
2 changes: 1 addition & 1 deletion ibis/backends/clickhouse/tests/test_functions.py
Expand Up @@ -476,7 +476,7 @@ def my_add(a: int, b: int) -> int: ...

n = 5
expr = (
alltypes[alltypes.int_col == 1]
alltypes.filter(alltypes.int_col == 1)
.limit(n)
.int_col.collect()
.map(lambda x: my_add(x, 1))
Expand Down
2 changes: 1 addition & 1 deletion ibis/backends/clickhouse/tests/test_operators.py
Expand Up @@ -142,7 +142,7 @@ def test_field_in_literals(con, alltypes, df, container):
tm.assert_series_equal(result_col, expected_col)


@pytest.mark.parametrize("column", ["int_col", "float_col", "bool_col"])
@pytest.mark.parametrize("column", ["int_col", "float_col"])
def test_negate(con, alltypes, column, assert_sql):
expr = -alltypes[column]
assert_sql(expr)
Expand Down
26 changes: 13 additions & 13 deletions ibis/backends/clickhouse/tests/test_select.py
Expand Up @@ -38,23 +38,23 @@ def time_right(con):

def test_timestamp_extract_field(alltypes, assert_sql):
t = alltypes.timestamp_col
expr = alltypes[
expr = alltypes.select(
t.year().name("year"),
t.month().name("month"),
t.day().name("day"),
t.hour().name("hour"),
t.minute().name("minute"),
t.second().name("second"),
]
)
assert_sql(expr)


def test_isin_notin_in_select(alltypes, assert_sql):
values = ["foo", "bar"]
filtered = alltypes[alltypes.string_col.isin(values)]
filtered = alltypes.filter(alltypes.string_col.isin(values))
assert_sql(filtered, "out1.sql")

filtered = alltypes[alltypes.string_col.notin(values)]
filtered = alltypes.filter(alltypes.string_col.notin(values))
assert_sql(filtered, "out2.sql")


Expand Down Expand Up @@ -100,7 +100,7 @@ def test_simple_scalar_aggregates(alltypes, assert_sql):
# Things like table.column.{sum, mean, ...}()
table = alltypes

expr = table[table.int_col > 0].float_col.sum()
expr = table.filter(table.int_col > 0).float_col.sum()
assert_sql(expr)


Expand Down Expand Up @@ -152,7 +152,7 @@ def test_simple_scalar_aggregates(alltypes, assert_sql):

def test_table_column_unbox(alltypes, assert_sql):
m = alltypes.float_col.sum().name("total")
agged = alltypes[alltypes.int_col > 0].group_by("string_col").aggregate([m])
agged = alltypes.filter(alltypes.int_col > 0).group_by("string_col").aggregate([m])
expr = agged.string_col
assert_sql(expr)

Expand Down Expand Up @@ -213,7 +213,7 @@ def test_simple_joins(
):
t1, t2 = batting, awards_players
pred = [t1[left_key] == t2[right_key]]
expr = getattr(t1, join_type)(t2, pred)[[t1]]
expr = getattr(t1, join_type)(t2, pred).select(t1)
assert_sql(expr)


Expand All @@ -226,7 +226,7 @@ def test_self_reference_simple(con, alltypes, assert_sql):
def test_join_self_reference(con, alltypes, assert_sql):
t1 = alltypes
t2 = t1.view()
expr = t1.inner_join(t2, ["id"])[[t1]]
expr = t1.inner_join(t2, ["id"]).select(t1)
assert_sql(expr)
assert len(con.execute(expr))

Expand Down Expand Up @@ -261,7 +261,7 @@ def test_filter_predicates(diamonds):

expr = diamonds
for pred in predicates:
expr = expr[pred(expr)].select(expr)
expr = expr.filter(pred(expr)).select(expr)

expr.execute()

Expand Down Expand Up @@ -305,9 +305,9 @@ def test_join_with_external_table_errors(alltypes):
)

alltypes = alltypes.mutate(b=alltypes.tinyint_col)
expr = alltypes.inner_join(external_table, ["b"])[
expr = alltypes.inner_join(external_table, ["b"]).select(
external_table.a, external_table.c, alltypes.id
]
)

with pytest.raises(cc.driver.exceptions.DatabaseError):
expr.execute()
Expand All @@ -328,9 +328,9 @@ def test_join_with_external_table(alltypes, df):
)

alltypes = alltypes.mutate(b=alltypes.tinyint_col)
expr = alltypes.inner_join(external_table, ["b"])[
expr = alltypes.inner_join(external_table, ["b"]).select(
external_table.a, external_table.c, alltypes.id
]
)

result = expr.execute(external_tables={"external": external_df})
expected = df.assign(b=df.tinyint_col).merge(external_df, on="b")[["a", "c", "id"]]
Expand Down
11 changes: 5 additions & 6 deletions ibis/backends/dask/__init__.py
Expand Up @@ -40,13 +40,12 @@ def do_connect(
Examples
--------
>>> import ibis
>>> import pandas as pd
>>> import dask.dataframe as dd
>>> data = {
... "t": dd.read_parquet("path/to/file.parquet"),
... "s": dd.read_csv("path/to/file.csv"),
... }
>>> ibis.dask.connect(data)
>>> ibis.dask.connect(
... {"t": dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=1)}
... ) # doctest: +ELLIPSIS
<ibis.backends.dask.Backend object at 0x...>
"""
super().do_connect(dictionary)

Expand Down
4 changes: 2 additions & 2 deletions ibis/backends/dask/tests/test_arrays.py
Expand Up @@ -59,7 +59,7 @@ def test_array_collect(t, df):
def test_array_collect_rolling_partitioned(t, df):
window = ibis.trailing_window(1, order_by=t.plain_int64)
colexpr = t.plain_float64.collect().over(window)
expr = t["dup_strings", "plain_int64", colexpr.name("collected")]
expr = t.select("dup_strings", "plain_int64", colexpr.name("collected"))
result = expr.compile()
expected = dd.from_pandas(
pd.DataFrame(
Expand Down Expand Up @@ -134,7 +134,7 @@ def test_array_slice_scalar(client, start, stop):
[1, 3, 4, 11, -11],
)
def test_array_index(t, df, index):
expr = t[t.array_of_float64[index].name("indexed")]
expr = t.select(t.array_of_float64[index].name("indexed"))
result = expr.execute()
expected = pd.DataFrame(
{
Expand Down
36 changes: 19 additions & 17 deletions ibis/backends/dask/tests/test_join.py
Expand Up @@ -30,9 +30,9 @@

@join_type
def test_join(how, left, right, df1, df2):
expr = left.join(right, left.key == right.key, how=how)[
expr = left.join(right, left.key == right.key, how=how).select(
left, right.other_value, right.key3
]
)
result = expr.compile()
expected = dd.merge(df1, df2, how=how, on="key")
tm.assert_frame_equal(
Expand All @@ -43,7 +43,7 @@ def test_join(how, left, right, df1, df2):

@join_type
def test_join_project_left_table(how, left, right, df1, df2):
expr = left.join(right, left.key == right.key, how=how)[left, right.key3]
expr = left.join(right, left.key == right.key, how=how).select(left, right.key3)
result = expr.compile()
expected = dd.merge(df1, df2, how=how, on="key")[list(left.columns) + ["key3"]]
tm.assert_frame_equal(
Expand Down Expand Up @@ -81,7 +81,7 @@ def test_join_with_duplicate_non_key_columns(how, left, right, df1, df2):
@join_type
def test_join_with_post_expression_selection(how, left, right, df1, df2):
join = left.join(right, left.key == right.key, how=how)
expr = join[left.key, left.value, right.other_value]
expr = join.select(left.key, left.value, right.other_value)
result = expr.compile()
expected = dd.merge(df1, df2, on="key", how=how)[["key", "value", "other_value"]]
tm.assert_frame_equal(
Expand All @@ -96,8 +96,8 @@ def test_join_with_post_expression_filter(how, left):
rhs = left[["key2", "value"]]

joined = lhs.join(rhs, "key2", how=how)
projected = joined[lhs, rhs.value]
expr = projected[projected.value == 4]
projected = joined.select(lhs, rhs.value)
expr = projected.filter(projected.value == 4)
result = expr.compile()

df1 = lhs.compile()
Expand All @@ -118,12 +118,12 @@ def test_multi_join_with_post_expression_filter(how, left, df1):
rhs2 = left[["key2", "value"]].rename(value2="value")

joined = lhs.join(rhs, "key2", how=how)
projected = joined[lhs, rhs.value]
filtered = projected[projected.value == 4]
projected = joined.select(lhs, rhs.value)
filtered = projected.filter(projected.value == 4)

joined2 = filtered.join(rhs2, "key2")
projected2 = joined2[filtered.key, rhs2.value2]
expr = projected2[projected2.value2 == 3]
projected2 = joined2.select(filtered.key, rhs2.value2)
expr = projected2.filter(projected2.value2 == 3)

result = expr.compile()

Expand All @@ -145,7 +145,7 @@ def test_multi_join_with_post_expression_filter(how, left, df1):
def test_join_with_non_trivial_key(how, left, right, df1, df2):
# also test that the order of operands in the predicate doesn't matter
join = left.join(right, right.key.length() == left.key.length(), how=how)
expr = join[left.key, left.value, right.other_value]
expr = join.select(left.key, left.value, right.other_value)
result = expr.compile()

expected = (
Expand All @@ -168,8 +168,8 @@ def test_join_with_non_trivial_key(how, left, right, df1, df2):
def test_join_with_non_trivial_key_project_table(how, left, right, df1, df2):
# also test that the order of operands in the predicate doesn't matter
join = left.join(right, right.key.length() == left.key.length(), how=how)
expr = join[left, right.other_value]
expr = expr[expr.key.length() == 1]
expr = join.select(left, right.other_value)
expr = expr.filter(expr.key.length() == 1)
result = expr.compile()

expected = (
Expand All @@ -194,7 +194,7 @@ def test_join_with_project_right_duplicate_column(client, how, left, df1, df3):
# also test that the order of operands in the predicate doesn't matter
right = client.table("df3")
join = left.join(right, ["key"], how=how)
expr = join[left.key, right.key2, right.other_value]
expr = join.select(left.key, right.key2, right.other_value)
result = expr.compile()

expected = (
Expand All @@ -216,7 +216,9 @@ def test_join_with_project_right_duplicate_column(client, how, left, df1, df3):

@merge_asof_minversion
def test_asof_join(time_left, time_right, time_df1, time_df2):
expr = time_left.asof_join(time_right, "time")[time_left, time_right.other_value]
expr = time_left.asof_join(time_right, "time").select(
time_left, time_right.other_value
)
result = expr.compile()
expected = dd.merge_asof(time_df1, time_df2, on="time")
tm.assert_frame_equal(
Expand All @@ -229,9 +231,9 @@ def test_asof_join(time_left, time_right, time_df1, time_df2):
def test_keyed_asof_join(
time_keyed_left, time_keyed_right, time_keyed_df1, time_keyed_df2
):
expr = time_keyed_left.asof_join(time_keyed_right, "time", predicates="key")[
expr = time_keyed_left.asof_join(time_keyed_right, "time", predicates="key").select(
time_keyed_left, time_keyed_right.other_value
]
)
result = expr.compile()
expected = dd.merge_asof(time_keyed_df1, time_keyed_df2, on="time", by="key")
tm.assert_frame_equal(
Expand Down
16 changes: 8 additions & 8 deletions ibis/backends/dask/tests/test_operations.py
Expand Up @@ -32,7 +32,9 @@ def test_literal(client):


def test_selection(t, df):
expr = t[((t.plain_strings == "a") | (t.plain_int64 == 3)) & (t.dup_strings == "d")]
expr = t.filter(
((t.plain_strings == "a") | (t.plain_int64 == 3)) & (t.dup_strings == "d")
)
result = expr.compile()
expected = df[
((df.plain_strings == "a") | (df.plain_int64 == 3)) & (df.dup_strings == "d")
Expand All @@ -56,12 +58,10 @@ def test_mutate(t, df):
@pytest.mark.xfail(reason="TODO - windowing - #2553")
def test_project_scope_does_not_override(t, df):
col = t.plain_int64
expr = t[
[
col.name("new_col"),
col.sum().over(ibis.window(group_by="dup_strings")).name("grouped"),
]
]
expr = t.select(
col.name("new_col"),
col.sum().over(ibis.window(group_by="dup_strings")).name("grouped"),
)
result = expr.compile()
expected = dd.concat(
[
Expand Down Expand Up @@ -402,7 +402,7 @@ def test_nullif_inf(con):

def test_group_concat(t, df):
expr = (
t[t.dup_ints == 1]
t.filter(t.dup_ints == 1)
.group_by(t.dup_strings)
.aggregate(foo=t.dup_ints.group_concat(","))
)
Expand Down
10 changes: 5 additions & 5 deletions ibis/backends/dask/tests/test_window.py
Expand Up @@ -161,7 +161,7 @@ def test_players(players, players_df):


def test_batting_filter_mean(batting, batting_df):
expr = batting[batting.G > batting.G.mean()]
expr = batting.filter(batting.G > batting.G.mean())
result = expr.execute()
expected = (
batting_df[batting_df.G > batting_df.G.mean()].reset_index(drop=True).compute()
Expand Down Expand Up @@ -348,7 +348,7 @@ def test_mutate_with_window_after_join(con, sort_kind):
right = ibis.memtable(right_df)

joined = left.outer_join(right, left.ints == right.group)
proj = joined[left, right.value]
proj = joined.select(left, right.value)
expr = proj.group_by("ints").mutate(sum=proj.value.sum())
result = con.execute(expr)
expected = pd.DataFrame(
Expand Down Expand Up @@ -380,7 +380,7 @@ def test_mutate_scalar_with_window_after_join(npartitions):
left, right = map(con.table, ("left", "right"))

joined = left.outer_join(right, left.ints == right.group)
proj = joined[left, right.value]
proj = joined.select(left, right.value)
expr = proj.mutate(sum=proj.value.sum(), const=ibis.literal(1))
result = expr.execute()
result = result.sort_values(["ints", "value"]).reset_index(drop=True)
@@ -415,8 +415,8 @@ def test_project_scalar_after_join(npartitions):
left, right = map(con.table, ("left", "right"))

joined = left.outer_join(right, left.ints == right.group)
proj = joined[left, right.value]
expr = proj[proj.value.sum().name("sum"), ibis.literal(1).name("const")]
proj = joined.select(left, right.value)
expr = proj.select(proj.value.sum().name("sum"), ibis.literal(1).name("const"))
result = expr.execute().reset_index(drop=True)
expected = pd.DataFrame(
{
115 changes: 49 additions & 66 deletions ibis/backends/datafusion/__init__.py
@@ -71,7 +71,6 @@ def as_nullable(dtype: dt.DataType) -> dt.DataType:

class Backend(SQLBackend, CanCreateCatalog, CanCreateDatabase, CanCreateSchema, NoUrl):
name = "datafusion"
supports_in_memory_tables = True
supports_arrays = True
compiler = sc.datafusion.compiler

@@ -95,9 +94,25 @@ def do_connect(
Examples
--------
>>> import ibis
>>> config = {"t": "path/to/file.parquet", "s": "path/to/file.csv"}
>>> ibis.datafusion.connect(config)
>>> config = {
... "astronauts": "ci/ibis-testing-data/parquet/astronauts.parquet",
... "diamonds": "ci/ibis-testing-data/csv/diamonds.csv",
... }
>>> con = ibis.datafusion.connect(config)
>>> con.list_tables()
['astronauts', 'diamonds']
>>> con.table("diamonds")
DatabaseTable: diamonds
carat float64
cut string
color string
clarity string
depth float64
table float64
price int64
x float64
y float64
z float64
"""
if isinstance(config, SessionContext):
(self.con, config) = (config, None)
@@ -123,7 +138,7 @@ def do_connect(
config = {}

for name, path in config.items():
self.register(path, table_name=name)
self._register(path, table_name=name)

@util.experimental
@classmethod
@@ -302,8 +317,11 @@ def list_tables(
sg.select("table_name")
.from_("information_schema.tables")
.where(sg.column("table_schema").eq(sge.convert(database)))
.order_by("table_name")
)
return self._filter_with_like(
self.raw_sql(query).to_pydict()["table_name"], like
)
return self.raw_sql(query).to_pydict()["table_name"]

def get_schema(
self,
@@ -335,43 +353,14 @@ def register(
table_name: str | None = None,
**kwargs: Any,
) -> ir.Table:
"""Register a data set with `table_name` located at `source`.
Parameters
----------
source
The data source(s). May be a path to a file or directory of
parquet/csv files, a pandas dataframe, or a pyarrow table, dataset
or record batch.
table_name
The name of the table
kwargs
DataFusion-specific keyword arguments
Examples
--------
Register a csv:
>>> import ibis
>>> conn = ibis.datafusion.connect(config)
>>> conn.register("path/to/data.csv", "my_table")
>>> conn.table("my_table")
Register a PyArrow table:
>>> import pyarrow as pa
>>> tab = pa.table({"x": [1, 2, 3]})
>>> conn.register(tab, "my_table")
>>> conn.table("my_table")
Register a PyArrow dataset:
return self._register(source, table_name, **kwargs)

>>> import pyarrow.dataset as ds
>>> dataset = ds.dataset("path/to/table")
>>> conn.register(dataset, "my_table")
>>> conn.table("my_table")
"""
def _register(
self,
source: str | Path | pa.Table | pa.RecordBatch | pa.Dataset | pd.DataFrame,
table_name: str | None = None,
**kwargs: Any,
) -> ir.Table:
import pandas as pd

if isinstance(source, (str, Path)):
@@ -384,7 +373,7 @@ def register(
self.con.deregister_table(table_name)
self.con.register_record_batches(table_name, [[source]])
return self.table(table_name)
elif isinstance(source, pa.dataset.Dataset):
elif isinstance(source, ds.Dataset):
self.con.deregister_table(table_name)
self.con.register_dataset(table_name, source)
return self.table(table_name)
@@ -416,16 +405,20 @@ def _register_failure(self):
f"please call one of {msg} directly"
)

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
name = op.name
schema = op.schema

self.con.deregister_table(name)
if batches := op.data.to_pyarrow(schema).to_batches():
self.con.register_record_batches(name, [batches])
def _in_memory_table_exists(self, name: str) -> bool:
db = self.con.catalog().database()
try:
db.table(name)
except Exception: # noqa: BLE001 because DataFusion has nothing better
return False
else:
empty_dataset = ds.dataset([], schema=schema.to_pyarrow())
self.con.register_dataset(name=name, dataset=empty_dataset)
return True

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
# self.con.register_table is broken, so we do this roundabout thing
# of constructing a datafusion DataFrame, which has a side effect
# of registering the table
self.con.from_arrow_table(op.data.to_pyarrow(op.schema), op.name)
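
A rough sketch of the registration side effect the comment above relies on, assuming the `datafusion` Python package's `SessionContext.from_arrow_table` behaves as it is called here:

```python
import datafusion
import pyarrow as pa

ctx = datafusion.SessionContext()
tab = pa.table({"x": [1, 2, 3]})

# constructing a DataFrame from an Arrow table also registers it under the given name
ctx.from_arrow_table(tab, "my_table")

# the table is now resolvable by name in the session
print(ctx.table("my_table").to_pandas())
```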

def read_csv(
self, path: str | Path, table_name: str | None = None, **kwargs: Any
@@ -657,20 +650,10 @@ def create_table(
table_ident = sg.table(name, db=database, quoted=quoted)

if query is None:
column_defs = [
sge.ColumnDef(
this=sg.to_identifier(colname, quoted=quoted),
kind=self.compiler.type_mapper.from_ibis(typ),
constraints=(
None
if typ.nullable
else [sge.ColumnConstraint(kind=sge.NotNullColumnConstraint())]
),
)
for colname, typ in (schema or table.schema()).items()
]

target = sge.Schema(this=table_ident, expressions=column_defs)
target = sge.Schema(
this=table_ident,
expressions=(schema or table.schema()).to_sqlglot(self.dialect),
)
else:
target = table_ident

8 changes: 2 additions & 6 deletions ibis/backends/datafusion/tests/test_connect.py
@@ -25,17 +25,13 @@ def test_none_config():

def test_str_config(name_to_path):
config = {name: str(path) for name, path in name_to_path.items()}
# if path.endswith((".parquet", ".csv", ".csv.gz")) connect triggers register
with pytest.warns(FutureWarning, match="v9.1"):
conn = ibis.datafusion.connect(config)
conn = ibis.datafusion.connect(config)
assert sorted(conn.list_tables()) == sorted(name_to_path)


def test_path_config(name_to_path):
config = name_to_path
# if path.endswith((".parquet", ".csv", ".csv.gz")) connect triggers register
with pytest.warns(FutureWarning, match="v9.1"):
conn = ibis.datafusion.connect(config)
conn = ibis.datafusion.connect(config)
assert sorted(conn.list_tables()) == sorted(name_to_path)


3 changes: 2 additions & 1 deletion ibis/backends/datafusion/tests/test_register.py
@@ -2,7 +2,6 @@

import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
import pytest

import ibis
@@ -45,6 +44,8 @@ def test_register_batches(conn):


def test_register_dataset(conn):
import pyarrow.dataset as ds

tab = pa.table({"x": [1, 2, 3]})
dataset = ds.InMemoryDataset(tab)
with pytest.warns(FutureWarning, match="v9.1"):
28 changes: 26 additions & 2 deletions ibis/backends/druid/__init__.py
@@ -33,7 +33,6 @@ class Backend(SQLBackend):
name = "druid"
compiler = sc.druid.compiler
supports_create_or_replace = False
supports_in_memory_tables = True

@property
def version(self) -> str:
@@ -78,7 +77,32 @@ def current_database(self) -> str:
return "druid"

def do_connect(self, **kwargs: Any) -> None:
"""Create an Ibis client using the passed connection parameters."""
"""Create an Ibis client using the passed connection parameters.
Examples
--------
>>> import ibis
>>> con = ibis.connect("druid://localhost:8082/druid/v2/sql?header=true")
>>> con.list_tables() # doctest: +ELLIPSIS
[...]
>>> t = con.table("functional_alltypes")
>>> t
DatabaseTable: functional_alltypes
__time timestamp
id int64
bool_col int64
tinyint_col int64
smallint_col int64
int_col int64
bigint_col int64
float_col float64
double_col float64
date_string_col string
string_col string
timestamp_col int64
year int64
month int64
"""
header = kwargs.pop("header", True)
self.con = pydruid.db.connect(**kwargs, header=header)

114 changes: 72 additions & 42 deletions ibis/backends/duckdb/__init__.py
@@ -16,6 +16,7 @@
import pyarrow_hotfix # noqa: F401
import sqlglot as sg
import sqlglot.expressions as sge
from packaging.version import parse as vparse

import ibis
import ibis.backends.sql.compilers as sc
@@ -158,12 +159,9 @@ def create_table(
properties.append(sge.TemporaryProperty())
catalog = "temp"

temp_memtable_view = None

if obj is not None:
if not isinstance(obj, ir.Expr):
table = ibis.memtable(obj)
temp_memtable_view = table.op().name
else:
table = obj

@@ -173,19 +171,6 @@ def create_table(
else:
query = None

column_defs = [
sge.ColumnDef(
this=sg.to_identifier(colname, quoted=self.compiler.quoted),
kind=self.compiler.type_mapper.from_ibis(typ),
constraints=(
None
if typ.nullable
else [sge.ColumnConstraint(kind=sge.NotNullColumnConstraint())]
),
)
for colname, typ in (schema or table.schema()).items()
]

if overwrite:
temp_name = util.gen_name("duckdb_table")
else:
@@ -194,7 +179,10 @@ def create_table(
initial_table = sg.table(
temp_name, catalog=catalog, db=database, quoted=self.compiler.quoted
)
target = sge.Schema(this=initial_table, expressions=column_defs)
target = sge.Schema(
this=initial_table,
expressions=(schema or table.schema()).to_sqlglot(self.dialect),
)

create_stmt = sge.Create(
kind="TABLE",
@@ -242,9 +230,6 @@ def create_table(
).sql(self.name)
)

if temp_memtable_view is not None:
self.con.unregister(temp_memtable_view)

return self.table(name, database=(catalog, database))

def table(
Expand Down Expand Up @@ -408,9 +393,8 @@ def do_connect(
Examples
--------
>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>
>>> ibis.duckdb.connect(threads=4, memory_limit="1GB") # doctest: +ELLIPSIS
<ibis.backends.duckdb.Backend object at 0x...>
"""
if not isinstance(database, Path) and not database.startswith(
("md:", "motherduck:", ":memory:")
@@ -461,6 +445,11 @@ def _post_connect(self, extensions: Sequence[str] | None = None) -> None:
# Default timezone, can't be set with `config`
self.settings["timezone"] = "UTC"

# setting this to false disables magic variables-as-tables discovery,
# hopefully eliminating large classes of bugs
if vparse(self.version) > vparse("1"):
self.settings["python_enable_replacements"] = False

self._record_batch_readers_consumed = {}

def _load_extensions(
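
For context on the `python_enable_replacements` setting above: DuckDB's Python client can, by default, resolve bare table names in SQL against DataFrame variables in the calling scope ("replacement scans"). A rough sketch of the behavior being switched off, assuming a DuckDB version that exposes the option as a SQL-settable setting (as the `settings` mapping above implies):

```python
import duckdb
import pandas as pd

my_df = pd.DataFrame({"a": [1, 2, 3]})

con = duckdb.connect()
# replacement scans on (the default): the local variable is visible as a table
print(con.sql("SELECT count(*) FROM my_df").fetchone())

# what the ibis backend now does on connect; after this, only real tables
# and registered views resolve, so stale local variables can't shadow them
con.execute("SET python_enable_replacements = false")
```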
@@ -593,6 +582,7 @@ def read_json(
self,
source_list: str | list[str] | tuple[str],
table_name: str | None = None,
columns: Mapping[str, str] | None = None,
**kwargs,
) -> ir.Table:
"""Read newline-delimited JSON into an ibis table.
@@ -607,8 +597,13 @@ def read_json(
File or list of files
table_name
Optional table name
columns
Optional mapping from string column name to duckdb type string.
**kwargs
Additional keyword arguments passed to DuckDB's `read_json_auto` function
Additional keyword arguments passed to DuckDB's `read_json_auto` function.
See https://duckdb.org/docs/data/json/overview.html#json-loading
for parameters and more information about reading JSON.
Returns
-------
@@ -623,6 +618,21 @@ def read_json(
sg.to_identifier(key).eq(sge.convert(val)) for key, val in kwargs.items()
]

if columns:
options.append(
sg.to_identifier("columns").eq(
sge.Struct.from_arg_list(
[
sge.PropertyEQ(
this=sg.to_identifier(key),
expression=sge.convert(value),
)
for key, value in columns.items()
]
)
)
)

self._create_temp_view(
table_name,
sg.select(STAR).from_(
@@ -1026,9 +1036,8 @@ def list_tables(
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> with con.begin() as c:
... c.exec_driver_sql("CREATE TABLE my_database.baz (a INTEGER)") # doctest: +ELLIPSIS
<...>
>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)") # doctest: +ELLIPSIS
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
['baz']
@@ -1301,17 +1310,24 @@ def register_filesystem(self, filesystem: AbstractFileSystem):
--------
>>> import ibis
>>> import fsspec
>>> ibis.options.interactive = True
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
... "gcs://ibis-examples/data/band_members.csv.gz",
... table_name="band_members",
... )
DatabaseTable: band_members
name string
band string
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ string │
├────────┼─────────┤
│ Mick │ Stones │
│ John │ Beatles │
│ Paul │ Beatles │
└────────┴─────────┘
"""
self.con.register_filesystem(filesystem)

@@ -1376,6 +1392,10 @@ def to_pyarrow_batches(
For analytics use cases this is usually nothing to fret about. In some cases you
may need to explicitly release the cursor.
::: {.callout-warning}
## DuckDB returns 1024-row batches regardless of the value passed for `chunk_size`.
:::
Parameters
----------
expr
@@ -1385,10 +1405,7 @@ def to_pyarrow_batches(
limit
Limit the result to this number of rows
chunk_size
::: {.callout-warning}
## DuckDB returns 1024 size batches regardless of what argument is passed.
:::
The number of rows to fetch per batch
"""
self._run_pre_execute_hooks(expr)
table = expr.as_table()
@@ -1499,8 +1516,8 @@ def to_parquet(
Mapping of scalar parameter expressions to value.
**kwargs
DuckDB Parquet writer arguments. See
https://duckdb.org/docs/data/parquet#writing-to-parquet-files for
details
https://duckdb.org/docs/data/parquet/overview.html#writing-to-parquet-files
for details.
Examples
--------
@@ -1589,14 +1606,27 @@ def _get_schema_using_query(self, query: str) -> sch.Schema:
}
)

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
name = op.name
def _in_memory_table_exists(self, name: str) -> bool:
try:
# this handles tables _and_ views
# this handles both tables and views
self.con.table(name)
except (duckdb.CatalogException, duckdb.InvalidInputException):
# only register if we haven't already done so
self.con.register(name, op.data.to_pyarrow(op.schema))
return False
else:
return True

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
self.con.register(op.name, op.data.to_pyarrow(op.schema))

def _finalize_memtable(self, name: str) -> None:
# if we don't aggressively unregister tables duckdb will keep a
# reference to every memtable ever registered, even if there's no
# way for a user to access the operation anymore, resulting in a
# memory leak
#
# we can't use drop_table, because self.con.register creates a view, so
# use the corresponding unregister method
self.con.unregister(name)

def _register_udfs(self, expr: ir.Expr) -> None:
con = self.con
9 changes: 9 additions & 0 deletions ibis/backends/duckdb/tests/test_client.py
@@ -417,3 +417,12 @@ def test_read_csv_with_types(tmp_path, input, all_varchar):
path.write_bytes(data)
t = con.read_csv(path, all_varchar=all_varchar, **input)
assert t.schema()["geom"].is_geospatial()


def test_memtable_doesnt_leak(con, monkeypatch):
monkeypatch.setattr(ibis.options, "default_backend", con)
name = "memtable_doesnt_leak"
assert name not in con.list_tables()
df = ibis.memtable({"a": [1, 2, 3]}, name=name).execute()
assert name not in con.list_tables()
assert len(df) == 3
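
The test above exercises the new `_finalize_memtable` path. The underlying DuckDB mechanism is plain view registration and unregistration of Arrow data, roughly:

```python
import duckdb
import pyarrow as pa

con = duckdb.connect()
tab = pa.table({"a": [1, 2, 3]})

con.register("t", tab)  # creates a view backed by the Arrow table
print(con.sql("SELECT sum(a) FROM t").fetchone())
con.unregister("t")     # drop the view so the Arrow data can be released
```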
13 changes: 13 additions & 0 deletions ibis/backends/duckdb/tests/test_register.py
@@ -505,3 +505,16 @@ def test_memtable_null_column_parquet_dtype_roundtrip(con, tmp_path):
after = con.read_parquet(tmp_path / "tmp.parquet")

assert before.a.type() == after.a.type()


def test_read_json_no_auto_detection(con, tmp_path):
ndjson_data = """
{"year": 2007}
{"year": 2008}
{"year": 2009}
"""
path = tmp_path.joinpath("test.ndjson")
path.write_text(ndjson_data)

t = con.read_json(path, auto_detect=False, columns={"year": "varchar"})
assert t.year.type() == dt.string
3 changes: 2 additions & 1 deletion ibis/backends/duckdb/tests/test_udf.py
@@ -73,7 +73,8 @@ def favg(x: float, where: bool = True) -> float: ...
def test_builtin_agg(con, func):
import ibis

raw_data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
start, stop = 1, 11
raw_data = list(map(float, range(start, stop)))
data = ibis.memtable({"a": raw_data})
expr = func(data.a)

144 changes: 70 additions & 74 deletions ibis/backends/exasol/__init__.py
@@ -1,6 +1,5 @@
from __future__ import annotations

import atexit
import contextlib
import datetime
import re
@@ -42,7 +41,6 @@ class Backend(SQLBackend, CanCreateDatabase, CanCreateSchema):
compiler = sc.exasol.compiler
supports_temporary_tables = False
supports_create_or_replace = False
supports_in_memory_tables = False
supports_python_udfs = False

@property
@@ -83,6 +81,33 @@ def do_connect(
kwargs
Additional keyword arguments passed to `pyexasol.connect`.
Examples
--------
>>> import os
>>> import ibis
>>> host = os.environ.get("IBIS_TEST_EXASOL_HOST", "localhost")
>>> user = os.environ.get("IBIS_TEST_EXASOL_USER", "sys")
>>> password = os.environ.get("IBIS_TEST_EXASOL_PASSWORD", "exasol")
>>> schema = os.environ.get("IBIS_TEST_EXASOL_DATABASE", "EXASOL")
>>> con = ibis.exasol.connect(schema=schema, host=host, user=user, password=password)
>>> con.list_tables() # doctest: +ELLIPSIS
[...]
>>> t = con.table("functional_alltypes")
>>> t
DatabaseTable: functional_alltypes
id int32
bool_col boolean
tinyint_col int16
smallint_col int16
int_col int32
bigint_col int64
float_col float64
double_col float64
date_string_col string
string_col string
timestamp_col timestamp(3)
year int32
month int32
"""
if kwargs.pop("quote_ident", None) is not None:
raise com.UnsupportedArgumentError(
@@ -245,6 +270,9 @@ def _get_schema_using_query(self, query: str) -> sch.Schema:
finally:
self.con.execute(drop_view)

def _in_memory_table_exists(self, name: str) -> bool:
return self.con.meta.table_exists(name)

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
schema = op.schema
if null_columns := [col for col, dtype in schema.items() if dtype.is_null()]:
@@ -253,57 +281,40 @@ def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
f"got null typed columns: {null_columns}"
)

# only register if we haven't already done so
if (name := op.name) not in self.list_tables():
quoted = self.compiler.quoted
column_defs = [
sg.exp.ColumnDef(
this=sg.to_identifier(colname, quoted=quoted),
kind=self.compiler.type_mapper.from_ibis(typ),
constraints=(
None
if typ.nullable
else [
sg.exp.ColumnConstraint(
kind=sg.exp.NotNullColumnConstraint()
)
]
),
)
for colname, typ in schema.items()
]

ident = sg.to_identifier(name, quoted=quoted)
create_stmt = sg.exp.Create(
kind="TABLE",
this=sg.exp.Schema(this=ident, expressions=column_defs),
)
create_stmt_sql = create_stmt.sql(self.name)

df = op.data.to_frame()
data = df.itertuples(index=False, name=None)

def process_item(item: Any):
"""Handle inserting timestamps with timezones."""
if isinstance(item, datetime.datetime):
if item.tzinfo is not None:
item = item.tz_convert("UTC").tz_localize(None)
return item.isoformat(sep=" ", timespec="milliseconds")
return item

rows = (tuple(map(process_item, row)) for row in data)
with self._safe_raw_sql(create_stmt_sql):
if not df.empty:
self.con.ext.insert_multi(name, rows)

atexit.register(self._clean_up_tmp_table, ident)
quoted = self.compiler.quoted
name = op.name

def _clean_up_tmp_table(self, ident: sge.Identifier) -> None:
with self._safe_raw_sql(
sge.Drop(kind="TABLE", this=ident, exists=True, cascade=True)
):
ident = sg.to_identifier(name, quoted=quoted)
create_stmt = sg.exp.Create(
kind="TABLE",
this=sg.exp.Schema(this=ident, expressions=schema.to_sqlglot(self.dialect)),
)
create_stmt_sql = create_stmt.sql(self.name)

df = op.data.to_frame()
data = df.itertuples(index=False, name=None)

def process_item(item: Any):
"""Handle inserting timestamps with timezones."""
if isinstance(item, datetime.datetime):
if item.tzinfo is not None:
item = item.tz_convert("UTC").tz_localize(None)
return item.isoformat(sep=" ", timespec="milliseconds")
return item

rows = (tuple(map(process_item, row)) for row in data)
with self._safe_raw_sql(create_stmt_sql):
if not df.empty:
self.con.ext.insert_multi(name, rows)

def _clean_up_tmp_table(self, name: str) -> None:
ident = sg.to_identifier(name, quoted=self.compiler.quoted)
sql = sge.Drop(kind="TABLE", this=ident, exists=True, cascade=True)
with self._safe_raw_sql(sql):
pass

_finalize_memtable = _clean_up_tmp_table
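
The `process_item` helper above normalizes timezone-aware values before insertion. A small illustration using a pandas `Timestamp` (the value type produced by `to_frame()`, and the only `datetime.datetime` subclass here with a `tz_convert` method):

```python
import pandas as pd

ts = pd.Timestamp("2024-01-01 12:00:00", tz="US/Eastern")

# convert to UTC, drop the offset, serialize with millisecond precision
print(ts.tz_convert("UTC").tz_localize(None).isoformat(sep=" ", timespec="milliseconds"))
# 2024-01-01 17:00:00.000
```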

def create_table(
self,
name: str,
@@ -352,11 +363,9 @@ def create_table(

quoted = self.compiler.quoted

temp_memtable_view = None
if obj is not None:
if not isinstance(obj, ir.Expr):
table = ibis.memtable(obj)
temp_memtable_view = table.op().name
else:
table = obj

@@ -366,50 +375,37 @@ def create_table(
else:
query = None

type_mapper = self.compiler.type_mapper
column_defs = [
sge.ColumnDef(
this=sg.to_identifier(colname, quoted=quoted),
kind=type_mapper.from_ibis(typ),
constraints=(
None
if typ.nullable
else [sge.ColumnConstraint(kind=sge.NotNullColumnConstraint())]
),
)
for colname, typ in (schema or table.schema()).items()
]

if overwrite:
temp_name = util.gen_name(f"{self.name}_table")
else:
temp_name = name

table = sg.table(temp_name, catalog=database, quoted=quoted)
target = sge.Schema(this=table, expressions=column_defs)
if not schema:
schema = table.schema()

table_expr = sg.table(temp_name, catalog=database, quoted=quoted)
target = sge.Schema(
this=table_expr, expressions=schema.to_sqlglot(self.dialect)
)

create_stmt = sge.Create(kind="TABLE", this=target)

this = sg.table(name, catalog=database, quoted=quoted)
with self._safe_raw_sql(create_stmt):
if query is not None:
self.con.execute(
sge.Insert(this=table, expression=query).sql(self.name)
sge.Insert(this=table_expr, expression=query).sql(self.name)
)

if overwrite:
self.con.execute(
sge.Drop(kind="TABLE", this=this, exists=True).sql(self.name)
)
self.con.execute(
f"RENAME TABLE {table.sql(self.name)} TO {this.sql(self.name)}"
f"RENAME TABLE {table_expr.sql(self.name)} TO {this.sql(self.name)}"
)

if schema is None:
# Clean up temporary memtable if we've created one
# for in-memory reads
if temp_memtable_view is not None:
self.drop_table(temp_memtable_view)
return self.table(name, database=database)

# preserve the input schema if it was provided
Expand Down
5 changes: 2 additions & 3 deletions ibis/backends/flink/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,9 +66,8 @@ def do_connect(self, table_env: TableEnvironment) -> None:
>>> import ibis
>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>>> ibis.flink.connect(table_env)
<ibis.backends.flink.Backend at 0x...>
>>> ibis.flink.connect(table_env) # doctest: +ELLIPSIS
<ibis.backends.flink.Backend object at 0x...>
"""
self._table_env = table_env

4 changes: 2 additions & 2 deletions ibis/backends/flink/tests/test_compiler.py
@@ -37,9 +37,9 @@ def test_complex_projections(simple_table, assert_sql):


def test_filter(simple_table, assert_sql):
expr = simple_table[
expr = simple_table.filter(
((simple_table.c > 0) | (simple_table.c < 0)) & simple_table.g.isin(["A", "B"])
]
)
assert_sql(expr)


56 changes: 23 additions & 33 deletions ibis/backends/impala/__init__.py
@@ -66,8 +66,6 @@ class Backend(SQLBackend):
name = "impala"
compiler = sc.impala.compiler

supports_in_memory_tables = True

def _from_url(self, url: ParseResult, **kwargs: Any) -> Backend:
"""Connect to a backend using a URL `url`.
@@ -266,7 +264,7 @@ def raw_sql(self, query: str):
def _fetch_from_cursor(self, cursor, schema):
from ibis.formats.pandas import PandasData

results = fetchall(cursor)
results = fetchall(cursor, schema.names)
return PandasData.convert_table(results, schema)

@contextlib.contextmanager
@@ -513,7 +511,7 @@ def create_table(
if schema is not None:
schema = ibis.schema(schema)

if temp is not None:
if temp:
raise NotImplementedError(
"Impala backend does not yet support temporary tables"
)
@@ -1221,6 +1219,10 @@ def explain(

return "\n".join(["Query:", util.indent(query, 2), "", *results.iloc[:, 0]])

def _in_memory_table_exists(self, name: str) -> bool:
with contextlib.closing(self.con.cursor()) as cur:
return cur.table_exists(name)

def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
schema = op.schema
if null_columns := [col for col, dtype in schema.items() if dtype.is_null()]:
@@ -1229,40 +1231,28 @@ def _register_in_memory_table(self, op: ops.InMemoryTable) -> None:
f"got null typed columns: {null_columns}"
)

# only register if we haven't already done so
if (name := op.name) not in self.list_tables():
type_mapper = self.compiler.type_mapper
quoted = self.compiler.quoted
column_defs = [
sg.exp.ColumnDef(
this=sg.to_identifier(colname, quoted=quoted),
kind=type_mapper.from_ibis(typ),
# we don't support `NOT NULL` constraints in trino because
# because each trino connector differs in whether it
# supports nullability constraints, and whether the
# connector supports it isn't visible to ibis via a
# metadata query
)
for colname, typ in schema.items()
]
name = op.name
quoted = self.compiler.quoted

create_stmt = sg.exp.Create(
kind="TABLE",
this=sg.exp.Schema(
this=sg.to_identifier(name, quoted=quoted), expressions=column_defs
),
).sql(self.name, pretty=True)
create_stmt = sg.exp.Create(
kind="TABLE",
this=sg.exp.Schema(
this=sg.to_identifier(name, quoted=quoted),
expressions=schema.to_sqlglot(self.dialect),
),
).sql(self.name, pretty=True)

data = op.data.to_frame().itertuples(index=False)
insert_stmt = self._build_insert_template(name, schema=schema)
with self._safe_raw_sql(create_stmt) as cur:
for row in data:
cur.execute(insert_stmt, row)
data = op.data.to_frame().itertuples(index=False)
insert_stmt = self._build_insert_template(name, schema=schema)
with self._safe_raw_sql(create_stmt) as cur:
for row in data:
cur.execute(insert_stmt, row)


def fetchall(cur):
def fetchall(cur, names=None):
batches = cur.fetchcolumnar()
names = list(map(operator.itemgetter(0), cur.description))
if names is None:
names = list(map(operator.itemgetter(0), cur.description))
df = _column_batches_to_dataframe(names, batches)
return df

@@ -1 +1 @@
SELECT `t1`.`key`, SUM(((`t1`.`value` + 1) + 2) + 3) AS `abc` FROM (SELECT * FROM `t0` AS `t0` WHERE `t0`.`value` = 42) AS `t1` GROUP BY 1
SELECT `t1`.`key`, SUM(`t1`.`value` + 1 + 2 + 3) AS `abc` FROM (SELECT * FROM `t0` AS `t0` WHERE `t0`.`value` = 42) AS `t1` GROUP BY 1
@@ -1 +1 @@
SELECT `t1`.`key`, SUM(((`t1`.`value` + 1) + 2) + 3) AS `foo` FROM (SELECT * FROM `t0` AS `t0` WHERE `t0`.`value` = 42) AS `t1` GROUP BY 1
SELECT `t1`.`key`, SUM(`t1`.`value` + 1 + 2 + 3) AS `foo` FROM (SELECT * FROM `t0` AS `t0` WHERE `t0`.`value` = 42) AS `t1` GROUP BY 1
@@ -1,3 +1,3 @@
SELECT
LTRIM(`t0`.`string_col`) AS `LStrip(string_col)`
LTRIM(`t0`.`string_col`, ' \t\n\r\v\f') AS `LStrip(string_col)`
FROM `functional_alltypes` AS `t0`
@@ -1,3 +1,3 @@
SELECT
RTRIM(`t0`.`string_col`) AS `RStrip(string_col)`
RTRIM(`t0`.`string_col`, ' \t\n\r\v\f') AS `RStrip(string_col)`
FROM `functional_alltypes` AS `t0`
@@ -1,3 +1,3 @@
SELECT
TRIM(`t0`.`string_col`) AS `Strip(string_col)`
RTRIM(LTRIM(`t0`.`string_col`, ' \t\n\r\v\f'), ' \t\n\r\v\f') AS `Strip(string_col)`
FROM `functional_alltypes` AS `t0`
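
These snapshot updates give Impala's `LTRIM`/`RTRIM` an explicit character set. The set `' \t\n\r\v\f'` looks chosen to approximate Python's no-argument `str.strip()` rather than SQL's space-only trimming; that reading is an inference from the snapshots, not something stated in the diff:

```python
WHITESPACE = " \t\n\r\v\f"

s = " \t foo \n"
assert s.strip(WHITESPACE) == "foo"
assert s.strip() == "foo"  # Python's default strip removes the same characters here
```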
@@ -1,5 +1,3 @@
SELECT
(
`t0`.`a` + `t0`.`b`
) + `t0`.`c` AS `Add(Add(a, b), c)`
`t0`.`a` + `t0`.`b` + `t0`.`c` AS `Add(Add(a, b), c)`
FROM `alltypes` AS `t0`
2 changes: 1 addition & 1 deletion ibis/backends/impala/tests/test_bucket_histogram.py
@@ -84,6 +84,6 @@ def test_bucket_assign_labels(table, snapshot):
labelled = size.tier.label(
["Under 0", "0 to 10", "10 to 25", "25 to 50"], nulls="error"
).name("tier2")
expr = size[labelled, size[1]]
expr = size.select(labelled, size[1])

snapshot.assert_match(translate(expr), "out.sql")
8 changes: 5 additions & 3 deletions ibis/backends/impala/tests/test_client.py
@@ -88,16 +88,18 @@ def test_adapt_scalar_array_results(con, alltypes):
def test_interactive_repr_call_failure(con):
t = con.table("lineitem").limit(100000)

t = t[t, t.l_receiptdate.cast("timestamp").name("date")]
t = t.select(t, t.l_receiptdate.cast("timestamp").name("date"))

keys = [t.date.year().name("year"), "l_linestatus"]
filt = t.l_linestatus.isin(["F"])
expr = t[filt].group_by(keys).aggregate(t.l_extendedprice.mean().name("avg_px"))
expr = (
t.filter(filt).group_by(keys).aggregate(t.l_extendedprice.mean().name("avg_px"))
)

w2 = ibis.trailing_window(9, group_by=expr.l_linestatus, order_by=expr.year)

metric = expr["avg_px"].mean().over(w2)
enriched = expr[expr, metric]
enriched = expr.select(expr, metric)
with config.option_context("interactive", True):
repr(enriched)

14 changes: 8 additions & 6 deletions ibis/backends/impala/tests/test_ddl.py
@@ -159,19 +159,21 @@ def test_insert_validate_types(con, alltypes, test_data_db, temp_table):

t = con.table(temp_table, database=db)

to_insert = expr[
to_insert = expr.select(
expr.tinyint_col, expr.smallint_col.name("int_col"), expr.string_col
]
)
t.insert(to_insert.limit(10))

to_insert = expr[
to_insert = expr.select(
expr.tinyint_col,
expr.smallint_col.cast("int32").name("int_col"),
expr.string_col,
]
)
t.insert(to_insert.limit(10))

to_insert = expr[expr.tinyint_col, expr.bigint_col.name("int_col"), expr.string_col]
to_insert = expr.select(
expr.tinyint_col, expr.bigint_col.name("int_col"), expr.string_col
)

limit_expr = to_insert.limit(10)
with pytest.raises(com.IbisError):
@@ -296,7 +298,7 @@ def test_query_delimited_file_directory(con, test_data_dir, temp_table):
table = con.delimited_file(hdfs_path, schema, name=temp_table, delimiter=",")

expr = (
table[table.bar > 0]
table.filter(table.bar > 0)
.group_by("foo")
.aggregate(
[
2 changes: 1 addition & 1 deletion ibis/backends/impala/tests/test_ddl_compilation.py
@@ -168,7 +168,7 @@ def _get_ddl_string(props):

@pytest.fixture
def expr(t):
return t[t.bigint_col > 0]
return t.filter(t.bigint_col > 0)


def test_create_external_table_as(mockcon, snapshot):
38 changes: 20 additions & 18 deletions ibis/backends/impala/tests/test_exprs.py
@@ -17,7 +17,7 @@
def test_embedded_identifier_quoting(alltypes):
t = alltypes

expr = t[[(t.double_col * 2).name("double(fun)")]]["double(fun)"].sum()
expr = t.select((t.double_col * 2).name("double(fun)"))["double(fun)"].sum()
expr.execute()


@@ -134,7 +134,7 @@ def test_builtins(con, alltypes):

proj_exprs = [expr.name("e%d" % i) for i, expr in enumerate(exprs)]

projection = table[proj_exprs]
projection = table.select(proj_exprs)
projection.limit(10).execute()

_check_impala_output_types_match(con, projection)
@@ -352,7 +352,7 @@ def test_filter_predicates(con):

expr = t
for pred in predicates:
expr = expr[pred(expr)].select(expr)
expr = expr.filter(pred(expr)).select(expr)

expr.execute()

@@ -420,7 +420,7 @@ def test_decimal_timestamp_builtins(con):

proj_exprs = [expr.name("e%d" % i) for i, expr in enumerate(exprs)]

projection = table[proj_exprs].limit(10)
projection = table.select(proj_exprs).limit(10)
projection.execute()


@@ -461,10 +461,10 @@ def test_aggregations(alltypes):
d.var(how="pop"),
table.bool_col.any(),
table.bool_col.notany(),
-table.bool_col.any(),
~table.bool_col.any(),
table.bool_col.all(),
table.bool_col.notall(),
-table.bool_col.all(),
~table.bool_col.all(),
table.bool_col.count(where=cond),
d.sum(where=cond),
d.mean(where=cond),
@@ -520,7 +520,7 @@ def test_analytic_functions(alltypes):
def test_anti_join_self_reference_works(con, alltypes):
t = alltypes.limit(100)
t2 = t.view()
case = t[-((t.string_col == t2.string_col).any())]
case = t.filter(~((t.string_col == t2.string_col).any()))
con.explain(case)


@@ -540,7 +540,8 @@ def test_tpch_self_join_failure(con):
joined_all = (
region.join(nation, region.r_regionkey == nation.n_regionkey)
.join(customer, customer.c_nationkey == nation.n_nationkey)
.join(orders, orders.o_custkey == customer.c_custkey)[fields_of_interest]
.join(orders, orders.o_custkey == customer.c_custkey)
.select(fields_of_interest)
)

year = joined_all.odate.year().name("year")
@@ -554,7 +555,7 @@ def test_tpch_self_join_failure(con):
yoy = current.join(
prior,
((current.region == prior.region) & (current.year == (prior.year - 1))),
)[current.region, current.year, yoy_change]
).select(current.region, current.year, yoy_change)

# no analysis failure
con.explain(yoy)
@@ -577,14 +578,15 @@ def test_tpch_correlated_subquery_failure(con):
tpch = (
region.join(nation, region.r_regionkey == nation.n_regionkey)
.join(customer, customer.c_nationkey == nation.n_nationkey)
.join(orders, orders.o_custkey == customer.c_custkey)[fields_of_interest]
.join(orders, orders.o_custkey == customer.c_custkey)
.select(fields_of_interest)
)

t2 = tpch.view()
conditional_avg = t2[(t2.region == tpch.region)].amount.mean()
conditional_avg = t2.filter(t2.region == tpch.region).amount.mean()
amount_filter = tpch.amount > conditional_avg

expr = tpch[amount_filter].limit(0)
expr = tpch.filter(amount_filter).limit(0)

# impala can't plan this because its correlated subquery implementation is
# broken: it cannot detect the outer reference inside the inner query
@@ -622,7 +624,7 @@ def test_unions_with_ctes(con, alltypes):
)
expr2 = expr1.view()

join1 = expr1.join(expr2, expr1.string_col == expr2.string_col)[[expr1]]
join1 = expr1.join(expr2, expr1.string_col == expr2.string_col).select(expr1)
join2 = join1.view()

expr = join1.union(join2)
@@ -665,12 +667,12 @@ def test_where_with_timestamp(snapshot):

def test_filter_with_analytic(snapshot):
x = ibis.table(ibis.schema([("col", "int32")]), "x")
with_filter_col = x[x.columns + [ibis.null().name("filter")]]
filtered = with_filter_col[with_filter_col["filter"].isnull()]
subquery = filtered[filtered.columns]
with_filter_col = x.select(x.columns + [ibis.null().name("filter")])
filtered = with_filter_col.filter(with_filter_col["filter"].isnull())
subquery = filtered.select(filtered.columns)

with_analytic = subquery[["col", subquery.count().name("analytic")]]
expr = with_analytic[with_analytic.columns]
with_analytic = subquery.select("col", subquery.count().name("analytic"))
expr = with_analytic.select(with_analytic.columns)

snapshot.assert_match(ibis.impala.compile(expr), "out.sql")

2 changes: 1 addition & 1 deletion ibis/backends/impala/tests/test_in_not_in.py
@@ -33,6 +33,6 @@ def test_literal_in_fields(table, method_name, snapshot):
def test_isin_notin_in_select(table, method_name, snapshot):
values = ["foo", "bar"]
method = getattr(table.g, method_name)
filtered = table[method(values)]
filtered = table.filter(method(values))
result = translate(filtered)
snapshot.assert_match(result, "out.sql")
6 changes: 4 additions & 2 deletions ibis/backends/impala/tests/test_partition.py
@@ -111,7 +111,9 @@ def test_insert_select_partitioned_table(con, df, temp_table, unpart_t):
unique_keys = df[part_keys].drop_duplicates()

for i, (year, month) in enumerate(unique_keys.itertuples(index=False)):
select_stmt = unpart_t[(unpart_t.year == year) & (unpart_t.month == month)]
select_stmt = unpart_t.filter(
(unpart_t.year == year) & (unpart_t.month == month)
)

# test both styles of insert
if i:
@@ -132,7 +134,7 @@ def tmp_parted(con):

def test_create_partitioned_table_from_expr(con, alltypes, tmp_parted):
t = alltypes
expr = t[t.id <= 10][["id", "double_col", "month", "year"]]
expr = t.filter(t.id <= 10)[["id", "double_col", "month", "year"]]
name = tmp_parted
con.create_table(name, expr, partition=[t.year])
new = con.table(name)