# Custom functions

## map

- Used mainly for passing the `Series` behind an expression to a third-party library
- `map` is applied to the whole column before aggregation

In [1]:
import polars as pl
from polars import col

In [3]:
df = pl.DataFrame({
    "keys": ['a', 'a', 'b'],
    "values": [10, 7, 1]
})
df

| keys | values |
| --- | --- |
| str | i64 |
| "a" | 10 |
| "a" | 7 |
| "b" | 1 |


In [4]:
df.groupby('keys', maintain_order=True).agg([
    col('values').map(lambda s: s.shift()).alias('shift_map'),
    col('values').shift().alias("shift_expression")
])

| keys | shift_map | shift_expression |
| --- | --- | --- |
| str | list[i64] | list[i64] |
| "a" | [null, 10] | [null, 10] |
| "b" | [7] | [null] |


With `map`, the whole `values` column was shifted first and only then aggregated, so the `7` that belongs to group `a` ended up in group `b`. In general, mixing `map` with `groupby` is not recommended.
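
The ordering difference can be modelled in plain Python. The sketch below is not Polars code: `shift` and `group` are hypothetical helpers that only mimic shifting a column by one position and collecting values per key, using the same data as the example above.

```python
from collections import defaultdict

keys = ["a", "a", "b"]
values = [10, 7, 1]


def shift(column):
    """Shift a list down by one position, filling the first slot with None."""
    return [None] + column[:-1] if column else []


def group(keys, column):
    """Collect column values per key, preserving first-seen key order."""
    groups = defaultdict(list)
    for key, value in zip(keys, column):
        groups[key].append(value)
    return dict(groups)


# map-like order: shift the whole column first, then group the shifted values.
shift_then_group = group(keys, shift(values))

# expression-like order: group first, then shift inside each group.
group_then_shift = {k: shift(v) for k, v in group(keys, values).items()}

print(shift_then_group)  # {'a': [None, 10], 'b': [7]}
print(group_then_shift)  # {'a': [None, 10], 'b': [None]}
```

Shifting before grouping lets a value cross a group boundary, which is why the two columns disagree for key `b`.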

## apply

- `apply` works on the smallest logical elements for the operation: individual values in a `select` context, and one group at a time in a `groupby` context

In [5]:
df.groupby('keys', maintain_order=True).agg([
    col('values').apply(lambda s: s.shift()).alias('shift_apply'),
    col('values').shift().alias("shift_expression")
])

| keys | shift_apply | shift_expression |
| --- | --- | --- |
| str | list[i64] | list[i64] |
| "a" | [null, 10] | [null, 10] |
| "b" | [null] | [null] |


This time both columns agree. However, `apply` calls a Python function for each logical element (here, each group), which is slow compared with a native expression.
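
To see where the cost comes from, the following plain-Python sketch counts how often the user function is invoked. It is a rough model, not Polars internals: `counted`, `apply_per_group`, and `apply_per_element` are hypothetical helpers, and the doubling function is only an illustration.

```python
def counted(func):
    """Wrap func so that every invocation is tallied on wrapper.calls."""
    def wrapper(arg):
        wrapper.calls += 1
        return func(arg)
    wrapper.calls = 0
    return wrapper


def apply_per_group(groups, func):
    """Model of apply in a groupby context: one Python call per group."""
    return {key: func(vals) for key, vals in groups.items()}


def apply_per_element(column, func):
    """Model of apply in a select context: one Python call per value."""
    return [func(value) for value in column]


# Shift within each group, as in the groupby example.
shift_fn = counted(lambda vals: [None] + vals[:-1])
shifted = apply_per_group({"a": [10, 7], "b": [1]}, shift_fn)

# Double each value, one call per element.
double_fn = counted(lambda value: value * 2)
doubled = apply_per_element([10, 7, 1], double_fn)

print(shifted, shift_fn.calls)   # {'a': [None, 10], 'b': [None]} 2
print(doubled, double_fn.calls)  # [20, 14, 2] 3
```

The number of Python calls grows with the number of groups or values, whereas a native expression avoids this per-call overhead.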

To access the values of multiple columns in a single `apply`, you can create a `pl.struct`:

In [6]:
df.select([
    pl.struct(['keys', 'values']).apply(lambda x: len(x['keys']) + x['values']).alias('solution_apply'),
    (col('keys').str.lengths() + col('values')).alias('solution_expression')
])

| solution_apply | solution_expression |
| --- | --- |
| i64 | i64 |
| 11 | 11 |
| 8 | 8 |
| 2 | 2 |
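
The struct approach can be pictured as turning each row into one record with a field per column. The plain-Python sketch below assumes each struct value reaches the function as a dict-like row, which is consistent with the `x['keys']` lookups in the example; `rows` and `func` are illustrative names, not Polars API.

```python
keys = ["a", "a", "b"]
values = [10, 7, 1]

# Each row becomes one dict with a field per column, similar in spirit to
# the struct value handed to the lambda in the example above.
rows = [{"keys": k, "values": v} for k, v in zip(keys, values)]

# Same function body as the lambda passed to apply.
func = lambda x: len(x["keys"]) + x["values"]

solution = [func(row) for row in rows]
print(solution)  # [11, 8, 2]
```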


Overall, avoid `map` and `apply` whenever a native expression can do the job.