## Selecting columns 4: Transforming and adding a column
By the end of this lecture you will be able to:
- transform an existing column in place using `with_columns`
- add a new column with an expression
- add a new column with column arithmetic
- add a column with constant values using `pl.lit`

In [1]:
import polars as pl

In [4]:
csv_file = "../../Files/Sample_Superstore.csv"

In [5]:
df = pl.read_csv(csv_file)
df.head(3)

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""6/12/2016""","""6/16/2016""","""Second Class""","""DV-13045""","""Darrin Van Huff""","""Corporate""","""United States""","""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714


## Transforming an existing column

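We can transform an existing column in place by passing an expression on that column to `with_columns`. Because the expression's output keeps the name `Profit`, it overwrites the original column.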

In this example we round `Profit` to 0 decimal places.

In [10]:
(
    pl.read_csv(csv_file)
    .select("Profit")
    .with_columns(
        pl.col("Profit").round(0)
    )
    .head(3)
)

Profit
f64
42.0
220.0
7.0


## Adding a new column from an existing column
We can create a new column from an existing column by giving the expression's output a new name with `alias`. The original column is left unchanged.

In [13]:
(
    pl.read_csv(csv_file)
    .select("Profit")
    .with_columns(
        pl.col('Profit').round(0).alias('roundFProfit')
    )
    .head(3)
)

Profit,roundFProfit
f64,f64
41.9136,42.0
219.582,220.0
6.8714,7.0


Instead of using `alias`, we can name the new column with a keyword argument: the argument name becomes the column name and the argument value is the expression (in Polars this approach is referred to as kwargs assignment).

In [14]:
(
    pl.read_csv(csv_file)
    .select("Profit")
    .with_columns(
        RoundProfit = pl.col('Profit').round(0)
    )
    .head(3)
)

Profit,RoundProfit
f64,f64
41.9136,42.0
219.582,220.0
6.8714,7.0


## Difference between `with_columns` and `select`
- The `select` method returns only the columns you specify, whereas `with_columns` returns all of the existing columns plus any columns you add or overwrite
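- `with_columns` is normally given expressions. Recent Polars versions parse a bare string as a column name, so a string on its own does not create a new column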

## Adding or transforming a column with column arithmetic

We can transform columns with arithmetic in an expression.

In this example we double the values of the `Profit` column and store the result in a new column called `doubleProfit`.

In [15]:
(
    pl.read_csv(csv_file)
    .select("Profit")
    .with_columns(
        (pl.col("Profit") * 2).alias("doubleProfit")
    )
    .head(3)
)

Profit,doubleProfit
f64,f64
41.9136,83.8272
219.582,439.164
6.8714,13.7428


We can also do arithmetic with multiple columns in an expression.

In this example we add the values in the `Profit` and `Discount` columns.

In [16]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        (pl.col("Profit") + pl.col("Discount")).alias("ProfitPlusDiscount")
    )
    .head(2)
)

Profit,Discount,ProfitPlusDiscount
f64,f64,f64
41.9136,0.0,41.9136
219.582,0.0,219.582


Some people find named methods more readable than operator symbols.

We repeat the example above using the `.add` method rather than the `+` operator.

In [17]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        pl.col('Profit').add(pl.col('Discount')).alias('ProfitPlusDiscount')
    )
    .head(2)
)


Profit,Discount,ProfitPlusDiscount
f64,f64,f64
41.9136,0.0,41.9136
219.582,0.0,219.582


The mapping from Python operators to expression methods is:
- `+` to `add`
- `==` to `eq`
- `//` to `floordiv`
- `>` to `gt`
- `>=` to `ge`
- `<` to `lt`
- `<=` to `le`
- `%` to `mod`
- `!=` to `ne`
- `-` to `sub`
- `/` to `truediv`
- `^` to `xor`
- `*` to `mul`

## Adding a new column with a constant value

Use the literal function `pl.lit` to specify a constant value in Polars.

Here we add a new column called `Aboard` with the value `yes` for every row.

In [16]:
(
    pl.read_csv(csv_file)
    .with_columns(
        pl.lit('yes').alias('Aboard')
    )
    .select(['Customer_Name','Aboard'])
    .head(2)
)

Customer_Name,Aboard
str,str
"""Claire Gute""","""yes"""
"""Claire Gute""","""yes"""
