# 4. Data Manipulation II - Advanced Selecting

The goal of this module is to become familiar with some of the more advanced ways of selecting, adding and removing columns. Some of the topics we'll cover are:
1. How to select and perform operations on multiple columns at the same time with `pl.col()`.
2. How to add new columns rather than selecting columns with `pl.DataFrame.with_columns()`.
3. How to drop columns with `pl.DataFrame.drop()`.
4. How to rename columns with `pl.DataFrame.rename()`.

But first we import `polars`...

In [None]:
%pip install -U polars

In [1]:
import polars as pl

pl.__version__

'1.18.0'

In [3]:
%run setup.py

File /data/datasets/data/yellow_tripdata_2024-03.parquet already exists, skipping download.


... and load the data.

In [4]:
df = pl.read_parquet(local_parquet)

## 4.1. Operating On Multiple Columns At The Same Time

In the previous module, we saw how to select and perform computations on a particular column by way of the `pl.col()` function. This allowed us to, in one instance, convert miles to kilometers for the column `"trip_distance"`:

In [5]:
kilometers_per_mile = 1.61
(
    df.select(
        [
            pl.col("trip_distance").name.suffix("_miles"),
            (pl.col("trip_distance") * kilometers_per_mile).name.suffix(
                "_kilometers"
            ),
        ]
    ).head()
)

trip_distance_miles,trip_distance_kilometers
f64,f64
1.3,2.093
1.1,1.771
0.86,1.3846
0.82,1.3202
4.9,7.889


Let's turn our attention now to another type of conversion--currency conversion. Let's say that we want to convert currencies from USD to Euros. We can do this by multiplying all the columns that we're interested in by a constant, as with the kilometers conversion:

In [6]:
eur_per_usd = 0.92  # As of 2024-05-27.
(
    df.select(
        [
            # Payment amounts in USD, explicitly named as such.
            pl.col("fare_amount").name.suffix("_usd"),
            pl.col("extra").name.suffix("_usd"),
            pl.col("mta_tax").name.suffix("_usd"),
            pl.col("tip_amount").name.suffix("_usd"),
            pl.col("tolls_amount").name.suffix("_usd"),
            pl.col("improvement_surcharge").name.suffix("_usd"),
            pl.col("total_amount").name.suffix("_usd"),
            pl.col("congestion_surcharge").name.suffix("_usd"),
            pl.col("Airport_fee").name.suffix("_usd"),
            # Payment amounts, in Euros.
            (pl.col("fare_amount") * eur_per_usd).name.suffix("_eur"),
            (pl.col("extra") * eur_per_usd).name.suffix("_eur"),
            (pl.col("mta_tax") * eur_per_usd).name.suffix("_eur"),
            (pl.col("tip_amount") * eur_per_usd).name.suffix("_eur"),
            (pl.col("tolls_amount") * eur_per_usd).name.suffix("_eur"),
            (pl.col("improvement_surcharge") * eur_per_usd).name.suffix(
                "_eur"
            ),
            (pl.col("total_amount") * eur_per_usd).name.suffix("_eur"),
            (pl.col("congestion_surcharge") * eur_per_usd).name.suffix("_eur"),
            (pl.col("Airport_fee") * eur_per_usd).name.suffix("_eur"),
        ]
    ).head()
)

fare_amount_usd,extra_usd,mta_tax_usd,tip_amount_usd,tolls_amount_usd,improvement_surcharge_usd,total_amount_usd,congestion_surcharge_usd,Airport_fee_usd,fare_amount_eur,extra_eur,mta_tax_eur,tip_amount_eur,tolls_amount_eur,improvement_surcharge_eur,total_amount_eur,congestion_surcharge_eur,Airport_fee_eur
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0,7.912,3.22,0.46,2.484,0.0,0.92,14.996,2.3,0.0
7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0,6.624,3.22,0.46,2.76,0.0,0.92,13.984,2.3,0.0
7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0,7.268,0.92,0.46,0.0,0.0,0.92,9.568,0.0,0.0
7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0,7.268,0.92,0.46,1.1868,0.0,0.92,13.0548,2.3,0.0
25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0,23.368,3.22,0.46,0.0,0.0,0.92,27.968,2.3,0.0


Well, we got what we wanted... but we had to write a lot of repeated code, and that's annoying. Thankfully, `polars` gives us a better way--you can actually pass multiple column names to the same `pl.col()` at once!

In [7]:
eur_per_usd = 0.92
currency_columns = [
    "fare_amount",
    "extra",
    "mta_tax",
    "tip_amount",
    "tolls_amount",
    "improvement_surcharge",
    "total_amount",
    "congestion_surcharge",
    "Airport_fee",
]
(
    df.select(
        [
            pl.col(currency_columns).name.suffix("_usd"),
            (pl.col(currency_columns) * eur_per_usd).name.suffix("_eur"),
        ]
    ).head()
)

fare_amount_usd,extra_usd,mta_tax_usd,tip_amount_usd,tolls_amount_usd,improvement_surcharge_usd,total_amount_usd,congestion_surcharge_usd,Airport_fee_usd,fare_amount_eur,extra_eur,mta_tax_eur,tip_amount_eur,tolls_amount_eur,improvement_surcharge_eur,total_amount_eur,congestion_surcharge_eur,Airport_fee_eur
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0,7.912,3.22,0.46,2.484,0.0,0.92,14.996,2.3,0.0
7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0,6.624,3.22,0.46,2.76,0.0,0.92,13.984,2.3,0.0
7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0,7.268,0.92,0.46,0.0,0.0,0.92,9.568,0.0,0.0
7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0,7.268,0.92,0.46,1.1868,0.0,0.92,13.0548,2.3,0.0
25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0,23.368,3.22,0.46,0.0,0.0,0.92,27.968,2.3,0.0


That really cleans up the code! You can also see the usefulness of the `pl.Expr.name.suffix()` functionality--without this, we may very well have had to list every column manually, just so that we can name everything. With `.suffix()`, though, we don't have to.

That's not the only way to select multiple columns, though. In the previous module, we filtered out rides that had a passenger count that was less than or equal to zero. Well, if we have a quick look at the data's schema...

In [8]:
display(df.schema)

Schema([('VendorID', Int32),
        ('tpep_pickup_datetime', Datetime(time_unit='ns', time_zone=None)),
        ('tpep_dropoff_datetime', Datetime(time_unit='ns', time_zone=None)),
        ('passenger_count', Int64),
        ('trip_distance', Float64),
        ('RatecodeID', Int64),
        ('store_and_fwd_flag', String),
        ('PULocationID', Int32),
        ('DOLocationID', Int32),
        ('payment_type', Int64),
        ('fare_amount', Float64),
        ('extra', Float64),
        ('mta_tax', Float64),
        ('tip_amount', Float64),
        ('tolls_amount', Float64),
        ('improvement_surcharge', Float64),
        ('total_amount', Float64),
        ('congestion_surcharge', Float64),
        ('Airport_fee', Float64)])

... we can see that all `pl.Float64` columns are either a distance or a currency amount, and they should never be negative! After all, a negative payment amount of any kind means that the taxi driver pays the passenger. Have you ever been paid by the taxi driver to take a taxi ride? No? Me neither.

Well, `polars` offers another way to select multiple columns--selecting by data type! Let's use that to have a look at all the `pl.Float64` columns:

In [9]:
(df.select(pl.col(pl.Float64)).head())

trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1.3,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0
1.1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0
0.86,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0
0.82,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0
4.9,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0


And, just as with `pl.col(List)`, we can operate on all of them together to check for impossibly sub-zero values:

In [10]:
(
    df.select(
        pl.col(pl.Float64)
        .lt(0)
        .name.suffix("_lt_zero")  # `.lt()` stands for "Less Than".
    ).head()
)

trip_distance_lt_zero,fare_amount_lt_zero,extra_lt_zero,mta_tax_lt_zero,tip_amount_lt_zero,tolls_amount_lt_zero,improvement_surcharge_lt_zero,total_amount_lt_zero,congestion_surcharge_lt_zero,Airport_fee_lt_zero
bool,bool,bool,bool,bool,bool,bool,bool,bool,bool
False,False,False,False,False,False,False,False,False,False
False,False,False,False,False,False,False,False,False,False
False,False,False,False,False,False,False,False,False,False
False,False,False,False,False,False,False,False,False,False
False,False,False,False,False,False,False,False,False,False


And we can use a new function, `pl.any_horizontal()` (which combines boolean columns into one), along with the aggregate function `.mean()` to see what fraction of rows have bad data:

In [11]:
(
    df.select(
        pl.any_horizontal(pl.col(pl.Float64).lt(0))
        .alias("fraction_float_cols_lt_0")
        .mean()
    ).head()
)

fraction_float_cols_lt_0
f64
0.018489


About `0.018`, or just under `2%`, of rows have at least one negative value. Not so bad!

Beyond selecting and operating on one column at a time, on a list of columns, or on columns of a particular datatype, `polars` offers still more options. However, they are beyond the scope of this practical guide; if you'd like to read more on those other column selection options, they are consolidated in the `polars.selectors` submodule ([link](https://docs.pola.rs/py-polars/html/reference/selectors.html)).

## 4.2. How to Add New Columns with `.with_columns()`

So far, our methods for retrieving data have been `.head()` and `.select()`. And those can take us pretty far! But what if we want to view all the data, but also add some new columns? Well, for that, `polars` has a helper function, `pl.all()`, similar to SQL's `*`.

In [12]:
(df.select(pl.all()).head())

VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee
i32,datetime[ns],datetime[ns],i64,f64,i64,str,i32,i32,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1,2024-03-01 00:18:51,2024-03-01 00:23:45,0,1.3,1,"""N""",142,239,1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0
1,2024-03-01 00:26:00,2024-03-01 00:29:06,0,1.1,1,"""N""",238,24,1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0
2,2024-03-01 00:09:22,2024-03-01 00:15:24,1,0.86,1,"""N""",263,75,2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0
2,2024-03-01 00:33:45,2024-03-01 00:39:34,1,0.82,1,"""N""",164,162,1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0
1,2024-03-01 00:05:43,2024-03-01 00:26:22,0,4.9,1,"""N""",263,7,2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0


And so if we want to add new columns to our dataframe, we can use `select` with `pl.all()` and whatever new column we want:

In [13]:
kilometers_per_mile = 1.61
(
    df.select(
        [
            pl.all(),
            (pl.col("trip_distance") * kilometers_per_mile).name.suffix("_km"),
        ]
    ).head()
)

VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,trip_distance_km
i32,datetime[ns],datetime[ns],i64,f64,i64,str,i32,i32,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1,2024-03-01 00:18:51,2024-03-01 00:23:45,0,1.3,1,"""N""",142,239,1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0,2.093
1,2024-03-01 00:26:00,2024-03-01 00:29:06,0,1.1,1,"""N""",238,24,1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0,1.771
2,2024-03-01 00:09:22,2024-03-01 00:15:24,1,0.86,1,"""N""",263,75,2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0,1.3846
2,2024-03-01 00:33:45,2024-03-01 00:39:34,1,0.82,1,"""N""",164,162,1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0,1.3202
1,2024-03-01 00:05:43,2024-03-01 00:26:22,0,4.9,1,"""N""",263,7,2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0,7.889


That's nice, but it can be annoying to type `pl.all()` every time you want to keep all the original columns around. That's why `polars` offers us `.with_columns()`, a function specifically designed for keeping all original columns while adding new columns:

In [14]:
kilometers_per_mile = 1.61
(
    df.with_columns(
        [(pl.col("trip_distance") * kilometers_per_mile).name.suffix("_km")]
    ).head()
)

VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,trip_distance_km
i32,datetime[ns],datetime[ns],i64,f64,i64,str,i32,i32,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1,2024-03-01 00:18:51,2024-03-01 00:23:45,0,1.3,1,"""N""",142,239,1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0,2.093
1,2024-03-01 00:26:00,2024-03-01 00:29:06,0,1.1,1,"""N""",238,24,1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0,1.771
2,2024-03-01 00:09:22,2024-03-01 00:15:24,1,0.86,1,"""N""",263,75,2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0,1.3846
2,2024-03-01 00:33:45,2024-03-01 00:39:34,1,0.82,1,"""N""",164,162,1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0,1.3202
1,2024-03-01 00:05:43,2024-03-01 00:26:22,0,4.9,1,"""N""",263,7,2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0,7.889


The nice thing about `.with_columns()` is that it offers a few ways of passing arguments. You can just pass a single column, without the list:

In [15]:
kilometers_per_mile = 1.61
(
    df.select("trip_distance")
    .with_columns(
        (pl.col("trip_distance") * kilometers_per_mile).name.suffix("_km")
    )
    .head()
)

trip_distance,trip_distance_km
f64,f64
1.3,2.093
1.1,1.771
0.86,1.3846
0.82,1.3202
4.9,7.889


Or you can even pass new columns with their intended column names as keyword arguments to the function:

In [16]:
kilometers_per_mile = 1.61
(
    df.select(pl.col("trip_distance"))
    .with_columns(
        trip_distance_km=pl.col("trip_distance") * kilometers_per_mile
    )
    .head()
)

trip_distance,trip_distance_km
f64,f64
1.3,2.093
1.1,1.771
0.86,1.3846
0.82,1.3202
4.9,7.889


And if you're adding a column with the same name, it'll just overwrite the original column.

In [17]:
kilometers_per_mile = 1.61
(
    df.select(pl.col("trip_distance"))
    .with_columns(pl.col("trip_distance") * kilometers_per_mile)
    .head()
)

trip_distance
f64
2.093
1.771
1.3846
1.3202
7.889


Because `.select()` is somewhat inconvenient when we want to keep all original columns and **add** one or two, `polars` offers us `.with_columns()`; similarly, what if we want to keep all original columns, but **remove** one or two? Well, for that, there's `.drop()`.

## 4.3. How to Remove Columns with `.drop()`

If we want to remove columns from a dataframe, we use the `.drop()` function. For example, if we want to drop the pickup and dropoff columns:

In [18]:
(df.drop(["tpep_pickup_datetime", "tpep_dropoff_datetime"]).head())

VendorID,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee
i32,i64,f64,i64,str,i32,i32,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1,0,1.3,1,"""N""",142,239,1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0
1,0,1.1,1,"""N""",238,24,1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0
2,1,0.86,1,"""N""",263,75,2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0
2,1,0.82,1,"""N""",164,162,1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0
1,0,4.9,1,"""N""",263,7,2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0


`.drop()` is similar to its cousins `.with_columns()` and `.select()` in that all of them can receive arguments either as a list or simply as positional arguments without the list.

In [19]:
(
    df.drop("tpep_pickup_datetime", "tpep_dropoff_datetime").head()  # no list.
)

VendorID,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee
i32,i64,f64,i64,str,i32,i32,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1,0,1.3,1,"""N""",142,239,1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0
1,0,1.1,1,"""N""",238,24,1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0
2,1,0.86,1,"""N""",263,75,2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0
2,1,0.82,1,"""N""",164,162,1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0
1,0,4.9,1,"""N""",263,7,2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0


And, like its cousins `.with_columns()` and `.select()`, `.drop()` can also receive a polars expression object as input. For example, if we want to remove all columns with the `pl.Int32` data type:

In [20]:
(df.drop(pl.col(pl.Int32)).head())

tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee
datetime[ns],datetime[ns],i64,f64,i64,str,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64
2024-03-01 00:18:51,2024-03-01 00:23:45,0,1.3,1,"""N""",1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0
2024-03-01 00:26:00,2024-03-01 00:29:06,0,1.1,1,"""N""",1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0
2024-03-01 00:09:22,2024-03-01 00:15:24,1,0.86,1,"""N""",2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0
2024-03-01 00:33:45,2024-03-01 00:39:34,1,0.82,1,"""N""",1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0
2024-03-01 00:05:43,2024-03-01 00:26:22,0,4.9,1,"""N""",2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0


That one was easy! One last advanced selecting technique before we're ready to move on to the next module: renaming.

## 4.4. Renaming Columns With `.rename()`

As we've seen, `polars` offers some pretty awesome tools for renaming `pl.Expr` objects, such as `pl.Expr.alias()`, `pl.Expr.name.suffix()` (and of course, though we haven't used it yet, there's also a `pl.Expr.name.prefix()`). But what if you want to rename columns outside the context of a `pl.Expr` object? For that, there are a few tools to help, the primary one being `.rename()`.

`.rename()` has pretty simple usage: you just pass a dictionary mapping the old names to the names you want to change them to. For example, some names have capital letters, and don't adhere to the same formatting rules as the other columns; we can first have a look by checking the `.schema` and `.columns` attributes of the dataframe:

In [21]:
df.schema

Schema([('VendorID', Int32),
        ('tpep_pickup_datetime', Datetime(time_unit='ns', time_zone=None)),
        ('tpep_dropoff_datetime', Datetime(time_unit='ns', time_zone=None)),
        ('passenger_count', Int64),
        ('trip_distance', Float64),
        ('RatecodeID', Int64),
        ('store_and_fwd_flag', String),
        ('PULocationID', Int32),
        ('DOLocationID', Int32),
        ('payment_type', Int64),
        ('fare_amount', Float64),
        ('extra', Float64),
        ('mta_tax', Float64),
        ('tip_amount', Float64),
        ('tolls_amount', Float64),
        ('improvement_surcharge', Float64),
        ('total_amount', Float64),
        ('congestion_surcharge', Float64),
        ('Airport_fee', Float64)])

In [22]:
df.columns

['VendorID',
 'tpep_pickup_datetime',
 'tpep_dropoff_datetime',
 'passenger_count',
 'trip_distance',
 'RatecodeID',
 'store_and_fwd_flag',
 'PULocationID',
 'DOLocationID',
 'payment_type',
 'fare_amount',
 'extra',
 'mta_tax',
 'tip_amount',
 'tolls_amount',
 'improvement_surcharge',
 'total_amount',
 'congestion_surcharge',
 'Airport_fee']

TitleCase and snake_case in the same set of columns?? We can't allow that!

In [23]:
column_rename_mapping = {
    "VendorID": "vendor_id",
    "RatecodeID": "ratecode_id",
    "PULocationID": "pu_location_id",
    "DOLocationID": "do_location_id",
    "Airport_fee": "airport_fee",
}
(df.rename(column_rename_mapping).head())

vendor_id,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,ratecode_id,store_and_fwd_flag,pu_location_id,do_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
i32,datetime[ns],datetime[ns],i64,f64,i64,str,i32,i32,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64
1,2024-03-01 00:18:51,2024-03-01 00:23:45,0,1.3,1,"""N""",142,239,1,8.6,3.5,0.5,2.7,0.0,1.0,16.3,2.5,0.0
1,2024-03-01 00:26:00,2024-03-01 00:29:06,0,1.1,1,"""N""",238,24,1,7.2,3.5,0.5,3.0,0.0,1.0,15.2,2.5,0.0
2,2024-03-01 00:09:22,2024-03-01 00:15:24,1,0.86,1,"""N""",263,75,2,7.9,1.0,0.5,0.0,0.0,1.0,10.4,0.0,0.0
2,2024-03-01 00:33:45,2024-03-01 00:39:34,1,0.82,1,"""N""",164,162,1,7.9,1.0,0.5,1.29,0.0,1.0,14.19,2.5,0.0
1,2024-03-01 00:05:43,2024-03-01 00:26:22,0,4.9,1,"""N""",263,7,2,25.4,3.5,0.5,0.0,0.0,1.0,30.4,2.5,0.0


Easy, looking good!

## Conclusion

In this module, we learned some advanced selecting techniques, including operations on multi-column `pl.Expr` objects, `.with_columns()`, `.drop()`, and `.rename()`, thus preparing us to create some more advanced queries in the next module.