# Contexts

You can't use an expression just anywhere: an expression needs a context. The available contexts are:

- Selection: `df.select([..])`
- Groupby aggregation: `df.groupby(..).agg([..])`
- Adding columns (hstack): `df.with_columns([..])`

```python
import numpy as np
import polars as pl
from polars import col, lit

np.random.seed(12)

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
df
```

| nrs | names | random | groups |
| --- | --- | --- | --- |
| i64 | str | f64 | str |
| 1 | "foo" | 0.154163 | "A" |
| 2 | "ham" | 0.74005 | "A" |
| 3 | "spam" | 0.263315 | "B" |
| null | "egg" | 0.533739 | "C" |
| 5 | null | 0.014575 | "B" |


## Syntactic sugar


Even when you use one of the above contexts in eager mode, you are still using the Polars lazy API under the hood. This lets Polars push the expression into the query engine, apply optimizations, and cache intermediate results.

For example:

```python
df.groupby("foo").agg([col("bar").sum()])
```

desugars to

```python
(df.lazy().groupby("foo").agg([col("bar").sum()])).collect()
```

## Select context

In the select context, expressions are applied over **columns**. The expressions in this context must produce `Series` that are all the same length or have a length of `1` (so that they can be broadcast). `select` may produce new columns that are aggregations, combinations of expressions, or literals.

```python
df.select([
    pl.sum("nrs"),
    col("names").sort(),
    col("names").first().alias("first name"),
    (pl.mean("nrs") * 10).alias("10xnrs"),
])
```

| nrs | names | first name | 10xnrs |
| --- | --- | --- | --- |
| i64 | str | str | f64 |
| 11 | null | "foo" | 27.5 |
| 11 | "egg" | "foo" | 27.5 |
| 11 | "foo" | "foo" | 27.5 |
| 11 | "ham" | "foo" | 27.5 |
| 11 | "spam" | "foo" | 27.5 |


## Add columns

Adding columns to a `DataFrame` using `with_columns` also uses the select context.

```python
df.with_columns([
    pl.sum("nrs").alias("nrs_sum"),
    col("random").count().alias("count"),
])
```

| nrs | names | random | groups | nrs_sum | count |
| --- | --- | --- | --- | --- | --- |
| i64 | str | f64 | str | i64 | u32 |
| 1 | "foo" | 0.154163 | "A" | 11 | 5 |
| 2 | "ham" | 0.74005 | "A" | 11 | 5 |
| 3 | "spam" | 0.263315 | "B" | 11 | 5 |
| null | "egg" | 0.533739 | "C" | 11 | 5 |
| 5 | null | 0.014575 | "B" | 11 | 5 |


## Groupby context

The `groupby` context works on groups, and thus may yield results of any length.

```python
df.groupby("groups").agg([
    pl.sum("nrs").alias("sum"),
    col("random").count().alias("count"),
    # sum random where name != null
    col("random").filter(col("names").is_not_null()).sum().suffix("_sum"),
    col("names").reverse().alias("reversed_names"),
])
```

| groups | sum | count | random_sum | reversed_names |
| --- | --- | --- | --- | --- |
| str | i64 | u32 | f64 | list[str] |
| "C" | null | 1 | 0.533739 | ["egg"] |
| "A" | 3 | 2 | 0.894213 | ["ham", "foo"] |
| "B" | 8 | 2 | 0.263315 | [null, "spam"] |


There are a few other `groupby` contexts, like `groupby_dynamic` and `groupby_rolling` for time-based grouping.