This notebook tries to make the Volcanito architecture easier to understand by walking through one specific example.

In [1]:
import Pkg
Pkg.activate("..")

 Activating environment at `~/Dropbox (Personal)/Coding Projects/Volcanito/Project.toml`

In [2]:
import DataFrames: DataFrame
import Volcanito: @select, materialize
import MacroTools: prettify

In [3]:
df = DataFrame(
    a = [1, 2, 3, 4],
    b = [0.1, 0.2, missing, 0.4],
)

4×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64? │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.1      │
│ 2   │ 2     │ 0.2      │
│ 3   │ 3     │ missing  │
│ 4   │ 4     │ 0.4      │


Users write things like:

In [4]:
@select(df, c = a + b, d = a - b)

4×2 DataFrame
│ Row │ c        │ d        │
│     │ Float64? │ Float64? │
├─────┼──────────┼──────────┤
│ 1   │ 1.1      │ 0.9      │
│ 2   │ 2.2      │ 1.8      │
│ 3   │ missing  │ missing  │
│ 4   │ 4.4      │ 3.6      │

This macro call expands into a constructor call for a logical node (here a `Projection`), with one `FunctionSpec` object per expression in the input:

In [5]:
prettify(@macroexpand @select(df, c = a + b, d = a - b))

:(Volcanito.Projection(df, (Volcanito.FunctionSpec(:c, $(QuoteNode(:(a + b))), $(QuoteNode(:(c = a + b))), (:a, :b), Volcanito.Dict{Volcanito.Symbol, Volcanito.Int}(:a => 1, :b => 2), (t->t[1] + t[2]), ((a, b)->a + b), true, false, false), Volcanito.FunctionSpec(:d, $(QuoteNode(:(a - b))), $(QuoteNode(:(d = a - b))), (:a, :b), Volcanito.Dict{Volcanito.Symbol, Volcanito.Int}(:a => 1, :b => 2), (t->t[1] - t[2]), ((a, b)->a - b), true, false, false))))

Reformatted for readability, and with the `Volcanito.` prefixes dropped, that expression looks like:

```
Projection(
    df,
    (
        FunctionSpec(
            :c,
            :(a + b),
            :(c = a + b),
            (:a, :b),
            Dict{Symbol, Int}(:a => 1, :b => 2),
            t -> t[1] + t[2],
            (a, b) -> a + b,
            true,
            false,
            false,
        ),
        FunctionSpec(
            :d,
            :(a - b),
            :(d = a - b),
            (:a, :b),
            Dict{Symbol, Int}(:a => 1, :b => 2),
            t -> t[1] - t[2],
            (a, b) -> a - b,
            true,
            false,
            false,
        )
    )
)
```

Clearly, this is much less pleasant to write by hand than the equivalent `@select` call.
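Each `FunctionSpec` above carries the same expression compiled in two calling conventions: `t -> t[1] + t[2]` reads its inputs from a single tuple, while `(a, b) -> a + b` takes them as separate arguments, and the `Dict` appears to record which tuple position each column occupies. The sketch below illustrates, with plain Julia values rather than Volcanito's types, why the two forms are interchangeable. The variable names are hypothetical, and this notebook does not show how Volcanito decides which form to call.

```julia
# Hypothetical stand-ins for pieces of the first FunctionSpec above;
# these are plain values, not Volcanito's actual fields.
columns = (:a, :b)
positions = Dict{Symbol, Int}(:a => 1, :b => 2)
tuple_form = t -> t[1] + t[2]     # reads its inputs from one tuple
args_form = (a, b) -> a + b       # takes its inputs as separate arguments

# One row of `df`, represented as a NamedTuple.
row = (a = 1, b = 0.1)

# Build the input tuple by placing each column at the position the Dict assigns it.
inputs = Vector{Any}(undef, length(columns))
for name in columns
    inputs[positions[name]] = row[name]
end
t = Tuple(inputs)

# Both calling conventions compute the same value for this row.
@assert tuple_form(t) == args_form(t...)
println(tuple_form(t))

# `missing` propagates through `+`, which is why row 3 of the output is missing.
println(args_form(3, missing))
```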

To evaluate a logical node and get a DataFrame back, we use the `materialize` function:

In [6]:
materialize(@select(df, c = a + b, d = a - b))

4×2 DataFrame
│ Row │ c        │ d        │
│     │ Float64? │ Float64? │
├─────┼──────────┼──────────┤
│ 1   │ 1.1      │ 0.9      │
│ 2   │ 2.2      │ 1.8      │
│ 3   │ missing  │ missing  │
│ 4   │ 4.4      │ 3.6      │


This last cell passes through all three phases in a single line:

1. The user-facing `@select` macro is called.
2. The macro is expanded to a `Projection` logical node constructor.
3. `materialize` is called on the `Projection` object, which performs the computation using a physical operation specialized for evaluating a projection on a DataFrame.
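The same three phases can be sketched in miniature. This is a hypothetical toy, not Volcanito's implementation: `@toy_select`, `ToySpec`, `ToyProjection`, and `toy_materialize` are invented names, the toy spec drops the quoted expressions and boolean flags that the real `FunctionSpec` carries, and to stay dependency-free it uses a NamedTuple of column vectors (or a vector of row NamedTuples) in place of a DataFrame. The macro only builds a logical node; `toy_materialize` then picks a physical implementation by dispatching on the source type, which is one plausible way to get a projection specialized for a particular kind of table.

```julia
# Hypothetical toy, not Volcanito's API: a logical projection node plus
# physical implementations chosen by dispatching on the source type.
struct ToySpec
    name::Symbol                    # output column name
    columns::Tuple{Vararg{Symbol}}  # input columns, in argument order
    f::Function                     # (inputs...) -> output value
end

struct ToyProjection{S}
    source::S
    specs::Tuple{Vararg{ToySpec}}
end

# Collect the symbols referenced in an expression, skipping called functions' names.
function referenced_columns!(acc::Vector{Symbol}, ex)
    if ex isa Symbol
        ex in acc || push!(acc, ex)
    elseif ex isa Expr
        args = ex.head == :call ? ex.args[2:end] : ex.args
        foreach(arg -> referenced_columns!(acc, arg), args)
    end
    return acc
end

# Phases 1 and 2: the macro rewrites `name = expr` pairs into a logical node.
# Nothing is computed here; it only builds constructor calls.
macro toy_select(src, assignments...)
    specs = map(assignments) do asg
        asg isa Expr && asg.head in (:(=), :kw) || error("expected `name = expression`")
        name, body = asg.args[1], asg.args[2]
        cols = referenced_columns!(Symbol[], body)
        lambda = Expr(:->, Expr(:tuple, cols...), Expr(:block, body))
        :(ToySpec($(QuoteNode(name)), $(Expr(:tuple, map(QuoteNode, cols)...)), $lambda))
    end
    return esc(:(ToyProjection($src, ($(specs...),))))
end

# Phase 3, column layout: a NamedTuple of equal-length vectors.
function toy_materialize(p::ToyProjection{<:NamedTuple})
    n = length(first(p.source))
    outputs = map(p.specs) do s
        inputs = map(c -> p.source[c], s.columns)
        [s.f(map(v -> v[i], inputs)...) for i in 1:n]
    end
    return NamedTuple{map(s -> s.name, p.specs)}(outputs)
end

# Phase 3, row layout: a vector of NamedTuples, evaluated one row at a time.
function toy_materialize(p::ToyProjection{<:AbstractVector{<:NamedTuple}})
    names = map(s -> s.name, p.specs)
    return [NamedTuple{names}(map(s -> s.f(map(c -> row[c], s.columns)...), p.specs))
            for row in p.source]
end

tbl = (a = [1, 2, 3, 4], b = [0.1, 0.2, missing, 0.4])
node = @toy_select(tbl, c = a + b, d = a - b)   # logical node only
cols = toy_materialize(node)                    # column-wise physical operation
println(cols.c)

rows = [(a = 1, b = 0.1), (a = 3, b = missing)]
println(toy_materialize(@toy_select(rows, c = a + b)))  # row-wise physical operation
```

Calling `@macroexpand @toy_select(tbl, c = a + b)` shows the constructor call the toy macro produces, mirroring the `@macroexpand` cell above. Keeping all computation out of the macro is what lets one logical node be handed to different physical implementations.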