# Filtering rows: String Columns
By the end of this lecture you will be able to:

- select columns with `select`
- use `==` and `!=` conditions in `filter`
- use an `IN` condition in `filter`
- use a `Contain` (`LIKE`-style) condition in `filter`
- combine conditions with `AND` in `filter`

In [1]:
import polars as pl

In [4]:
df = pl.read_csv("../../Files/Sample_Superstore.csv")

In [5]:
df.head()

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""6/12/2016""","""6/16/2016""","""Second Class""","""DV-13045""","""Darrin Van Huff""","""Corporate""","""United States""","""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714
4,"""US-2015-108966""","""10/11/2015""","""10/18/2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""FUR-TA-10000577""","""Furniture""","""Tables""","""Bretford CR4500 Series Slim Re…",957.5775,5,0.45,-383.031
5,"""US-2015-108966""","""10/11/2015""","""10/18/2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164


## Selecting columns

We can select columns by passing an expression to the `select` method.

In [7]:
df.select(pl.col('Customer_Name','Segment')).head(3)

Customer_Name,Segment
str,str
"""Claire Gute""","""Consumer"""
"""Claire Gute""","""Consumer"""
"""Darrin Van Huff""","""Corporate"""


## Apply `EQUAL` condition

In this example we choose all rows where `City` is `Henderson`.


In [8]:
(
    df
    .filter(
        pl.col("City") == "Henderson"
    ).head()
)

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
539,"""CA-2015-134894""","""12/7/2015""","""12/11/2015""","""Standard Class""","""DK-12985""","""Darren Koutras""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""OFF-AP-10001271""","""Office Supplies""","""Appliances""","""Eureka The Boss Cordless Recha…",152.94,3,0.0,41.2938
540,"""CA-2015-134894""","""12/7/2015""","""12/11/2015""","""Standard Class""","""DK-12985""","""Darren Koutras""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10002647""","""Furniture""","""Chairs""","""Situations Contoured Folding C…",283.92,4,0.0,70.98
997,"""CA-2015-162537""","""10/28/2015""","""11/3/2015""","""Standard Class""","""RD-19585""","""Rob Dowd""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""OFF-EN-10003862""","""Office Supplies""","""Envelopes""","""Laser & Ink Jet Business Envel…",10.67,1,0.0,4.9082


As well as the mathematical operators such as `==`, `!=`, `>` and `<`, there are corresponding text methods such as `eq`, `ne`, `gt` and `lt` that some people find more readable.

## Apply `IN` condition

You can use the `is_in` method to filter rows based on whether the values in a column match any value from a list or set.

In [10]:
(
    df
    .filter(
        pl.col('City').is_in(["Los Angeles", "Henderson"])
    ).head()
)

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""6/12/2016""","""6/16/2016""","""Second Class""","""DV-13045""","""Darrin Van Huff""","""Corporate""","""United States""","""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714
6,"""CA-2014-115812""","""6/9/2014""","""6/14/2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""FUR-FU-10001487""","""Furniture""","""Furnishings""","""Eldon Expressions Wood and Pla…",48.86,7,0.0,14.1694
7,"""CA-2014-115812""","""6/9/2014""","""6/14/2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""OFF-AR-10002833""","""Office Supplies""","""Art""","""Newell 322""",7.28,4,0.0,1.9656


## Apply `Contain` condition

The `str.contains` expression checks whether each value in a string column contains a given pattern. Here `filter` returns the rows where `Customer_Name` includes "Gene Hale".

In [12]:
(
    df.filter(
        pl.col("Customer_Name").str.contains("Gene Hale")
    ).head()
)

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
36,"""CA-2016-117590""","""12/8/2016""","""12/10/2016""","""First Class""","""GH-14485""","""Gene Hale""","""Corporate""","""United States""","""Richardson""","""Texas""",75080,"""Central""","""TEC-PH-10004977""","""Technology""","""Phones""","""GE 30524EE4""",1097.544,7,0.2,123.4737
37,"""CA-2016-117590""","""12/8/2016""","""12/10/2016""","""First Class""","""GH-14485""","""Gene Hale""","""Corporate""","""United States""","""Richardson""","""Texas""",75080,"""Central""","""FUR-FU-10003664""","""Furniture""","""Furnishings""","""Electrix Architect's Clamp-On …",190.92,5,0.6,-147.963
8134,"""CA-2015-131352""","""10/8/2015""","""10/13/2015""","""Standard Class""","""GH-14485""","""Gene Hale""","""Corporate""","""United States""","""Dallas""","""Texas""",75081,"""Central""","""FUR-FU-10003708""","""Furniture""","""Furnishings""","""Tenex Traditional Chairmats fo…",72.78,3,0.6,-70.9605


## Apply `AND` conditions

We can combine conditions with `AND`, where every condition must be met, in a number of ways.

In this example we look for rows where `Customer_Name` is "Gene Hale" `AND` `Segment` is "Corporate".

In [17]:
(
    df
    .filter(
        (pl.col('Customer_Name') == "Gene Hale") & (pl.col('Segment') == "Corporate")
    )
    .head(2)
)

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
36,"""CA-2016-117590""","""12/8/2016""","""12/10/2016""","""First Class""","""GH-14485""","""Gene Hale""","""Corporate""","""United States""","""Richardson""","""Texas""",75080,"""Central""","""TEC-PH-10004977""","""Technology""","""Phones""","""GE 30524EE4""",1097.544,7,0.2,123.4737
37,"""CA-2016-117590""","""12/8/2016""","""12/10/2016""","""First Class""","""GH-14485""","""Gene Hale""","""Corporate""","""United States""","""Richardson""","""Texas""",75080,"""Central""","""FUR-FU-10003664""","""Furniture""","""Furnishings""","""Electrix Architect's Clamp-On …",190.92,5,0.6,-147.963
