## Selecting columns 5: Transforming and adding multiple columns
By the end of this lesson you will be able to:
- transform multiple columns in-place
- add multiple columns
- transform and add multiple columns in less verbose ways

In [1]:
import polars as pl
import polars.selectors as cs

In [5]:
csv_file = "../../Files/Sample_Superstore.csv"

In [6]:
df = pl.read_csv(csv_file)


In [7]:
df.head(3)

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 1 | "CA-2016-152156" | "11/8/2016" | "11/11/2016" | "Second Class" | "CG-12520" | "Claire Gute" | "Consumer" | "United States" | "Henderson" | "Kentucky" | 42420 | "South" | "FUR-BO-10001798" | "Furniture" | "Bookcases" | "Bush Somerset Collection Bookc… | 261.96 | 2 | 0.0 | 41.9136 |
| 2 | "CA-2016-152156" | "11/8/2016" | "11/11/2016" | "Second Class" | "CG-12520" | "Claire Gute" | "Consumer" | "United States" | "Henderson" | "Kentucky" | 42420 | "South" | "FUR-CH-10000454" | "Furniture" | "Chairs" | "Hon Deluxe Fabric Upholstered … | 731.94 | 3 | 0.0 | 219.582 |
| 3 | "CA-2016-138688" | "6/12/2016" | "6/16/2016" | "Second Class" | "DV-13045" | "Darrin Van Huff" | "Corporate" | "United States" | "Los Angeles" | "California" | 90036 | "West" | "OFF-LA-10000240" | "Office Supplies" | "Labels" | "Self-Adhesive Address Labels f… | 14.62 | 2 | 0.0 | 6.8714 |


## Transforming existing columns

We can transform multiple existing columns by passing `with_columns` either a `list` of expressions or comma-separated expressions.

Here we pass comma-separated expressions to round the float columns to 0 decimal places.

In [8]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        pl.col('Profit').round(0),
        pl.col('Discount').round(0),
    )
    .head(3)
)

| Profit | Discount |
|---|---|
| f64 | f64 |
| 42.0 | 0.0 |
| 220.0 | 0.0 |
| 7.0 | 0.0 |


We can make this less verbose, however.

As we are applying the same transformation to the `Profit` and `Discount` columns, we can pass both column names to a single `pl.col` as comma-separated arguments.

In [9]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        pl.col('Profit','Discount').round(0),
    )
    .head(5)
)

| Profit | Discount |
|---|---|
| f64 | f64 |
| 42.0 | 0.0 |
| 220.0 | 0.0 |
| 7.0 | 0.0 |
| -383.0 | 0.0 |
| 3.0 | 0.0 |


In this example `Sales`, `Profit` and `Discount` are the only float columns, so we can instead pass their dtype, `pl.Float64`, to `pl.col` to apply the `round` expression to all of them.

In [11]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount","Sales")
    .with_columns(
        pl.col(pl.Float64).round(0),
    )
    .head(3)
)

| Profit | Discount | Sales |
|---|---|---|
| f64 | f64 | f64 |
| 42.0 | 0.0 | 262.0 |
| 220.0 | 0.0 | 732.0 |
| 7.0 | 0.0 | 15.0 |


Or we can use a selector to pick the columns we want to round. Here `cs.float()` selects every float column.

In [12]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount","Sales")
    .with_columns(
        cs.float().round(0),
    )
    .head(3)
)

| Profit | Discount | Sales |
|---|---|---|
| f64 | f64 | f64 |
| 42.0 | 0.0 | 262.0 |
| 220.0 | 0.0 | 732.0 |
| 7.0 | 0.0 | 15.0 |


## Adding new columns from existing columns
In the examples above, `with_columns` overwrites the existing `Profit` and `Discount` columns.

We can instead create new columns from existing columns with `alias`.

In this example we add the rounded `Profit` and `Discount` values as new columns.

In [14]:
(
    pl.read_csv(csv_file)
    .with_columns(
        pl.col('Profit').round(0).alias('Profit_round'),
        pl.col('Discount').round(0).alias('Discount_round')
    )
    .select(
        'Profit', 'Profit_round', 'Discount', 'Discount_round',
    )
    .head(3)
)

| Profit | Profit_round | Discount | Discount_round |
|---|---|---|---|
| f64 | f64 | f64 | f64 |
| 41.9136 | 42.0 | 0.0 | 0.0 |
| 219.582 | 220.0 | 0.0 | 0.0 |
| 6.8714 | 7.0 | 0.0 | 0.0 |


As an alternative to `alias`, we can pass keyword arguments to `with_columns`, where each keyword becomes the name of a new column.

In [20]:
(
    pl.read_csv(csv_file)
    .with_columns(
        Profit_round = pl.col('Profit').round(0),
        Discount_round = pl.col('Discount').round(0),
    )
    .select(
        'Profit', 'Profit_round', 'Discount', 'Discount_round',
    )
    .head(3)
)

| Profit | Profit_round | Discount | Discount_round |
|---|---|---|---|
| f64 | f64 | f64 | f64 |
| 41.9136 | 42.0 | 0.0 | 0.0 |
| 219.582 | 220.0 | 0.0 | 0.0 |
| 6.8714 | 7.0 | 0.0 | 0.0 |


Note that if you mix the `alias` and keyword approaches in the same `with_columns` call, the keyword assignments must come after the `alias` expressions. This is a Python rule rather than a polars one: positional arguments cannot follow keyword arguments.

When should you use `alias` and when should you use the keyword approach?
- There is no performance difference between the two approaches
- You might find the keyword approach more readable in some cases
- The name passed to `alias` can be any string, including one stored in a Python variable or one containing spaces, whereas a keyword must be written literally and be a valid Python identifier

## Creating new columns when working with multiple expressions
We can still use the less verbose multi-expression approaches we saw above when we want to create new columns.

In this example we round the float columns and add the results as new columns, using `name.suffix` to append `_round` to each column name.

In [21]:
(
    pl.read_csv(csv_file)
    .with_columns(
        pl.col(pl.Float64).round(0).name.suffix("_round"),
    )
    .select(
        'Profit','Profit_round','Discount','Discount_round',
    )
    .head(3)
)

| Profit | Profit_round | Discount | Discount_round |
|---|---|---|---|
| f64 | f64 | f64 | f64 |
| 41.9136 | 42.0 | 0.0 | 0.0 |
| 219.582 | 220.0 | 0.0 | 0.0 |
| 6.8714 | 7.0 | 0.0 | 0.0 |
