In [2]:
#| code-fold: true
from IPython.core.interactiveshell import InteractiveShell

# `ast_node_interactivity` controls which expressions in a cell have their values displayed
# with `last_expr_or_assign`, the last expression is displayed, and so is the value of a trailing assignment (e.g. `x = 1` shows `1`)
InteractiveShell.ast_node_interactivity = "last_expr_or_assign"

There's an excellent blog post on why Pandas feels clunky for those coming from R:

<https://www.sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/>

In Python, though, I've found `ibis` to be a much more natural fit than `pandas` for those coming from `R`.

[`ibis`](https://ibis-project.org/) uses DuckDB as its default backend.

In [3]:
import ibis

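`_` in ibis is a deferred-expression placeholder: it stands for the table that the enclosing method is called on, so `_.amount` means "the `amount` column of whatever table this step receives". This is useful for chaining operations, because each step can refer to columns produced earlier in the chain without assigning the intermediate table to a variable.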


In [4]:
from ibis import _

By default, `ibis` expressions are lazy: nothing runs until you call a method such as `execute()`. Setting `ibis.options.interactive = True` executes an expression whenever it is displayed, which is convenient for interactive exploration.

In [5]:

ibis.options.interactive = True

Here's the equivalent code in `ibis` for the example provided in the blog post:



In [6]:
df = ibis.read_csv("purchases.csv")

> “How much do we sell..? Let’s take the total sum!”

In [7]:
df.amount.sum().execute()

17210

> “Ah, they wanted it by country…”

In [8]:
(
    df
    .group_by("country")
    .aggregate(total=_.amount.sum())
)

> “And I guess I should deduct the discount.”

In [9]:
(
    df
    .group_by("country")
    .aggregate(total=(_.amount - _.discount).sum())
)

Next, remove outliers: drop every purchase whose amount is more than 10 times the overall median.

In [10]:
(
    df
    .mutate(median=_.amount.median())
    .filter(_.amount <= _.median * 10)
    .group_by("country")
    .aggregate(total=(_.amount - _.discount).sum())
)

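Finally, use the median within each country instead of the overall median, by joining the per-country medians back onto the table.

In [11]: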
(
    df
    .join(
        df.group_by("country").aggregate(median=_.amount.median()),
        predicates=["country"]
    )
    .filter(_.amount <= _.median * 10)
    .group_by("country")
    .aggregate(total=(_.amount - _.discount).sum())
    .order_by("country")
)