# Ibis v6.1.0

02 August 2023

## Overview

Ibis 6.1.0 is a minor release that includes new features, backend improvements, bug fixes, documentation improvements, and refactors. We are excited to see further adoption of the dataframe interchange protocol, which makes it easier to use visualization and other libraries with Ibis.

You can view the full changelog in [the release notes](/release_notes/).

If you're new to Ibis, see [how to install](/install/) and [the getting started tutorial](/tutorial/getting_started/).

To follow along with this blog, ensure you're on `'ibis-framework>=6.1,<7'`. First, we'll set up Ibis and fetch some sample data to use.

In [1]:
import ibis
import ibis.selectors as s

ibis.__version__

'6.1.0'

In [2]:
# interactive mode for demo purposes
ibis.options.interactive = True

In [3]:
t = ibis.examples.penguins.fetch()
t = t.mutate(year=t["year"].cast("str"))
t.limit(3)

## Ecosystem integrations

With the introduction of `__dataframe__` support in v6.0.0 and efficiency improvements in this release, Ibis now works with [Altair](https://altair-viz.github.io/index.html), [Plotly](https://plotly.com/python/), [plotnine](https://plotnine.readthedocs.io/en/stable/), and any other visualization library that implements the protocol. You can pass Ibis tables directly to these libraries without a `.to_pandas()` or `.to_pyarrow()` call, on any of the 15+ supported backends, with data transferred efficiently through Apache Arrow.

**Important**: at the time of writing, a development version of Altair is required.

In [4]:
import altair as alt

chart = (
    alt.Chart(t, width=500)
    .mark_point(size=30)
    .encode(
        x=alt.X("bill_length_mm").scale(zero=False),
        y=alt.Y("bill_depth_mm").scale(zero=False),
        color="species",
        shape="island",
    )
    .interactive()
)

chart

A more modular, composable, and scalable way of working with data is taking shape with `__dataframe__` and `__array__` support in Ibis and, increasingly, across the Python data ecosystem. Let's combine the chart above with PCA, after some preprocessing in Ibis, to visualize all numeric columns in 2D.

In [5]:
def transform(t):
    # compute the z score
    t = t.mutate(
        s.across(s.numeric(), {"zscore": lambda x: (x - x.mean()) / x.std()})
    ).dropna()  # drop rows with missing values
    return t


f = transform(t)
f
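To make the standardization step concrete, here is a plain-Python sketch of the z-score that the lambda above computes for each numeric column. It assumes the sample standard deviation (our understanding of the default for `std()` in Ibis); the `zscores` helper and the input values are hypothetical and are not taken from the penguins data.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values: subtract the mean, divide by the sample std."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mu) / sigma for v in values]


# hypothetical measurements, not taken from the penguins dataset
sample = [1.0, 2.0, 3.0]
standardized = zscores(sample)
print(standardized)  # [-1.0, 0.0, 1.0]
```

After this transformation every column has mean 0 and unit standard deviation, so no single measurement dominates the PCA because of its units.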

In [6]:
from sklearn.decomposition import PCA

# select "features" as X
X = f.select(s.contains("zscore"))

# get the first 2 principal components to visualize
n_components = 2
pca = PCA(n_components=n_components).fit(X)

# transform the table to get the principal components
t_pca = ibis.memtable(pca.transform(X)).relabel({"col0": "pc1", "col1": "pc2"})

# join the original table with the PCA table, assuming the order is the same
f = f.mutate(row_number=ibis.row_number().over()).join(
    t_pca.mutate(row_number=ibis.row_number().over()), "row_number"
)

# plot the first 2 principal components
chart = (
    alt.Chart(f, width=500)
    .mark_point(size=30)
    .encode(
        x=alt.X("pc1").scale(zero=False),
        y=alt.Y("pc2").scale(zero=False),
        color="species",
        shape="island",
    )
    .interactive()
)
chart

## Backends

Numerous backends received improvements. See the [release notes](/release_notes/) for more details.

### DataFusion

The DataFusion backend (and a few others) received several improvements from community member [@mesejo](https://github.com/mesejo), with memtables and many new operations now supported. Some highlights include:

In [7]:
url = ibis.literal("https://ibis-project.org/concepts/why_ibis")
con = ibis.datafusion.connect()

con.execute(url.host())

'ibis-project.org'

In [8]:
con.execute(url.path())

'/concepts/why_ibis'

In [9]:
con.execute(ibis.literal("aaabbbaaa").re_search("bbb"))

True

In [10]:
con.execute(ibis.literal(5.56).ln())

1.715598108262491

In [11]:
con.execute(ibis.literal(5.56).log10())

0.7450747915820575

In [12]:
con.execute(ibis.literal(5.56).radians())

0.09704030641088471

### BigQuery

Some remaining gaps in `CREATE TABLE` DDL options for BigQuery have been filled in, including the ability to pass in `overwrite=True` for table creation.


### PySpark

The PySpark backend now supports reading/writing Delta Lake tables. Your PySpark session must be configured to use the Delta Lake package and you must have the `delta` package installed in your environment.


```python
# assumes the PySpark backend is the default backend (e.g. via ibis.set_backend)
t = ibis.read_delta("/path/to/delta")

...

t.to_delta("/path/to/delta", mode="overwrite")
```

### Trino

The `.sql` API is now supported in Trino, enabling you to chain Ibis and SQL together.

### SQLite

Scalar Python UDFs are now supported in SQLite.
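For intuition about what a scalar Python UDF is, SQLite lets a client register a Python callable as a SQL function. The sketch below uses only the standard library's `sqlite3` module rather than the Ibis UDF API, and it is not a claim about how Ibis implements the feature; `add_one` and `raw_con` are hypothetical names.

```python
import sqlite3


def add_one(x):
    # hypothetical scalar function: one value in, one value out
    return x + 1


raw_con = sqlite3.connect(":memory:")
# register the Python callable under a SQL name; 1 is the number of arguments
raw_con.create_function("add_one", 1, add_one)
udf_result = raw_con.execute("SELECT add_one(41)").fetchone()[0]
raw_con.close()
print(udf_result)  # 42
```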

Additionally, URL parsing has been added:

In [13]:
con = ibis.sqlite.connect()

con.execute(url.host())

'ibis-project.org'

In [14]:
con.execute(url.path())

'/concepts/why_ibis'

### pandas

URL parsing support was added.

In [15]:
con = ibis.pandas.connect()

con.execute(url.host())

'ibis-project.org'

In [16]:
con.execute(url.path())

'/concepts/why_ibis'
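As a sanity check that is independent of any backend, the same two components can be extracted with the standard library's `urllib.parse`. This is a plain-Python sketch for comparison, not how any Ibis backend implements `host()` or `path()`.

```python
from urllib.parse import urlparse

parts = urlparse("https://ibis-project.org/concepts/why_ibis")
host = parts.hostname  # the network location, without scheme or path
path = parts.path      # everything after the host, up to any query string
print(host)  # ibis-project.org
print(path)  # /concepts/why_ibis
```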

## Functionality

### `.nunique()` supported on tables

You can now call `.nunique()` on tables to get the number of unique rows.
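Conceptually, counting unique rows means counting distinct tuples of column values. A plain-Python sketch of that idea, with hypothetical rows and without any null handling:

```python
# hypothetical rows as (species, island, year) tuples
rows = [
    ("Adelie", "Torgersen", "2007"),
    ("Adelie", "Torgersen", "2007"),  # exact duplicate of the first row
    ("Adelie", "Biscoe", "2007"),
    ("Gentoo", "Biscoe", "2008"),
]

total_rows = len(rows)        # analogous to a row count
unique_rows = len(set(rows))  # analogous to counting unique rows
print(total_rows, unique_rows)  # 4 3
```

When a table contains no duplicated rows, the two numbers coincide.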

In [17]:
# how many unique rows are there? equivalent to `.count()` in this case
t.nunique()

344

In [18]:
# how many unique species/island/year combinations are there?
t.select("species", "island", "year").nunique()

15

### `to_sql` returns a `str` type

The `ibis.expr.sql.SQLString` type resulting from `to_sql` is now a proper `str` subclass, enabling use without casting to `str` first.

In [19]:
type(ibis.to_sql(t))

ibis.expr.sql.SQLString

In [20]:
issubclass(type(ibis.to_sql(t)), str)

True
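To illustrate what being a `str` subclass buys you, here is a minimal sketch that uses a hypothetical stand-in class, `SQLText`, instead of the real `SQLString`. It shows that string methods work directly and that an API expecting a `str` (here the standard library's `sqlite3`) accepts the object without an explicit `str(...)` call.

```python
import sqlite3


class SQLText(str):
    """Hypothetical stand-in for a str subclass such as SQLString."""


query = SQLText("SELECT 1 AS x")

# str methods and checks work directly on the subclass
is_plain_str = isinstance(query, str)
first_word = query.split()[0]

# APIs that expect a str accept it without an explicit str(...) call
db = sqlite3.connect(":memory:")
value = db.execute(query).fetchone()[0]
db.close()
print(is_plain_str, first_word, value)  # True SELECT 1
```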

### Allow mixing literals and columns in `ibis.array`

Note that arrays must still be of a single type.

In [21]:
ibis.array([t["species"], "hello"])

In [22]:
ibis.array([t["flipper_length_mm"], 42])

### Array `concat` and `repeat` methods

You can still use `+` or `*` in typical Python fashion, with new and more explicit `concat` and `repeat` methods added in this release.

In [23]:
a = ibis.array([1, 2, 3])
b = ibis.array([4, 5])

c = a.concat(b)
c

[1, 2, 3, 4, 5]

In [24]:
c = a + b
c

[1, 2, 3, 4, 5]

In [25]:
b.repeat(2)

[4, 5, 4, 5]

In [26]:
b * 2

[4, 5, 4, 5]

### Support boolean literals in the join API

Join predicates can now be boolean literals: `True` matches every pair of rows (a cross join), while `False` matches none.

In [27]:
t.join(t, True)

In [28]:
t.join(t, False)

In [29]:
t.join(t, False, how="outer")
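In relational terms, an inner join keeps the row pairs for which the predicate holds, so a constant `True` keeps every pair and a constant `False` keeps none. The sketch below models that with a plain-Python nested-loop join; `inner_join` and the sample rows are hypothetical and unrelated to how any backend executes the query.

```python
from itertools import product


def inner_join(left, right, predicate):
    """Keep every (left, right) pair for which the predicate is true."""
    return [(lrow, rrow) for lrow, rrow in product(left, right) if predicate(lrow, rrow)]


left_rows = ["a", "b", "c"]
right_rows = [1, 2]

always = inner_join(left_rows, right_rows, lambda lrow, rrow: True)
never = inner_join(left_rows, right_rows, lambda lrow, rrow: False)
print(len(always), len(never))  # 6 0
```

With `how="outer"`, a `False` predicate still matches nothing, so each row from either side is expected to appear once, padded with nulls for the other side's columns.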

## Refactors

Several internal refactors that shouldn't affect normal usage were made. See [the release notes](/release_notes/) for more details.

## Wrapping up

Ibis v6.1.0 brings exciting enhancements to the library that enable broader ecosystem adoption of Python standards.

As always, try Ibis by [installing](https://ibis-project.org/install/) and [getting started](/tutorial/getting_started/).

If you run into any issues or find support is lacking for your backend, [open an issue](https://github.com/ibis-project/ibis/issues/new/choose) or [discussion](https://github.com/ibis-project/ibis/discussions/new/choose) and let us know!