# Filtering rows: Integer Columns
By the end of this lecture you will be able to:

- select a single column
- select multiple columns
- use the comparison operators `<`, `>`, `==` and `!=` as conditions in `filter`
- use `AND` conditions in `filter`
- use `OR` conditions in `filter`
- combine multiple `AND` and `OR` conditions in `filter`
- use a between condition (`is_between`) in `filter`

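```python
import polars as pl

# Relative path to the Superstore sample data used in this lecture
csv_file = "../../Files/Sample_Superstore.csv"

# Read the CSV into a DataFrame
df = pl.read_csv(csv_file)

# Preview the first rows (head() shows 5 by default)
df.head()
```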

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 1 | CA-2016-152156 | 11/8/2016 | 11/11/2016 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookc… | 261.96 | 2 | 0.0 | 41.9136 |
| 2 | CA-2016-152156 | 11/8/2016 | 11/11/2016 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered … | 731.94 | 3 | 0.0 | 219.582 |
| 3 | CA-2016-138688 | 6/12/2016 | 6/16/2016 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels f… | 14.62 | 2 | 0.0 | 6.8714 |
| 4 | US-2015-108966 | 10/11/2015 | 10/18/2015 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Re… | 957.5775 | 5 | 0.45 | -383.031 |
| 5 | US-2015-108966 | 10/11/2015 | 10/18/2015 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.368 | 2 | 0.2 | 2.5164 |


## Selecting a single column

We can select a single column by passing a `pl.col` expression to the `select` method.

```python
df.select(pl.col('Profit')).head(3)
```

| Profit |
| --- |
| f64 |
| 41.9136 |
| 219.582 |
| 6.8714 |


## Selecting multiple columns

We can select several columns at once by passing more than one column name to a single `pl.col` expression in the `select` method.

```python
df.select(pl.col('Profit', 'Discount')).head(3)
```

| Profit | Discount |
| --- | --- |
| f64 | f64 |
| 41.9136 | 0.0 |
| 219.582 | 0.0 |
| 6.8714 | 0.0 |


## Syntax of `filter`

In this example we select all rows where `Profit` is greater than 1000 and show the first two.

```python
(
    df
    .filter(
        pl.col('Profit') > 1000
    )
    .head(2)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 319 | CA-2014-164973 | 11/4/2014 | 11/9/2014 | Standard Class | NM-18445 | Nathan Mautz | Home Office | United States | New York City | New York | 10024 | East | TEC-MA-10002927 | Technology | Machines | Canon imageCLASS MF7460 Monoch… | 3991.98 | 2 | 0.0 | 1995.99 |
| 354 | CA-2016-129714 | 9/1/2016 | 9/3/2016 | First Class | AB-10060 | Adam Bellavance | Home Office | United States | New York City | New York | 10009 | East | OFF-BI-10004995 | Office Supplies | Binders | GBC DocuBind P400 Electric Bin… | 4355.168 | 4 | 0.2 | 1415.4296 |


To save a bit of typing, `select` also accepts column names as plain strings instead of `pl.col` expressions. Here we filter on `Profit` and then keep only three columns.

```python
(
    df
    .filter(
        pl.col('Profit') > 1000,
    )
    .select("Customer_Name", "Profit", "Discount")
    .head(5)
)
```

| Customer_Name | Profit | Discount |
| --- | --- | --- |
| str | f64 | f64 |
| Nathan Mautz | 1995.99 | 0.0 |
| Adam Bellavance | 1415.4296 | 0.2 |
| Christopher Martinez | 3177.475 | 0.0 |
| Alan Dominguez | 1379.977 | 0.0 |
| Mitch Willingham | 1276.4871 | 0.0 |


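The comparison operators available for filter conditions include:
- `==` (equal to)
- `<` (less than)
- `>` (greater than)
- `<=` (less than or equal to)
- `>=` (greater than or equal to)
- `!=` (not equal to)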


### Apply `AND` conditions

We can apply `AND` conditions, where every condition must be met, in a number of ways.

In this example we look for rows where `Quantity` is 5 `AND` `Profit` is at least 500. The conditions are combined with the `&` operator, and each comparison is wrapped in parentheses.

```python
(
    df
    .filter(
        (pl.col('Quantity') == 5) & (pl.col('Profit') >= 500)
    )
    .head(2)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 510 | CA-2015-145352 | 3/16/2015 | 3/22/2015 | Standard Class | CM-12385 | Christopher Martinez | Consumer | United States | Atlanta | Georgia | 30318 | South | OFF-BI-10003527 | Office Supplies | Binders | Fellowes PB500 Electric Punch … | 6354.95 | 5 | 0.0 | 3177.475 |
| 516 | CA-2017-127432 | 1/22/2017 | 1/27/2017 | Standard Class | AD-10180 | Alan Dominguez | Home Office | United States | Great Falls | Montana | 59405 | West | TEC-CO-10003236 | Technology | Copiers | Canon Image Class D660 Copier | 2999.95 | 5 | 0.0 | 1379.977 |


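### Apply `AND` conditions as separate predicates

There is a less verbose way to do this: pass the predicates to `filter` as separate, comma-separated expressions. `filter` combines them with `AND`, so only rows that meet every condition are kept.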

```python
(
    df
    .filter(
        pl.col("Quantity") == 5,
        pl.col("Profit") > 1000
    )
    .head(2)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 510 | CA-2015-145352 | 3/16/2015 | 3/22/2015 | Standard Class | CM-12385 | Christopher Martinez | Consumer | United States | Atlanta | Georgia | 30318 | South | OFF-BI-10003527 | Office Supplies | Binders | Fellowes PB500 Electric Punch … | 6354.95 | 5 | 0.0 | 3177.475 |
| 516 | CA-2017-127432 | 1/22/2017 | 1/27/2017 | Standard Class | AD-10180 | Alan Dominguez | Home Office | United States | Great Falls | Montana | 59405 | West | TEC-CO-10003236 | Technology | Copiers | Canon Image Class D660 Copier | 2999.95 | 5 | 0.0 | 1379.977 |


### Apply `OR` conditions

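We can apply an `OR` filter using the pipe `|` operator, so a row is kept if at least one condition is met. As with `&`, each comparison is wrapped in parentheses.
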
```python
(
    df
    .filter(
        (pl.col('Quantity') == 5) | (pl.col('Profit') >= 500)
    )
    .head(2)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 4 | US-2015-108966 | 10/11/2015 | 10/18/2015 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Re… | 957.5775 | 5 | 0.45 | -383.031 |
| 10 | CA-2014-115812 | 6/9/2014 | 6/14/2014 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | OFF-AP-10002892 | Office Supplies | Appliances | Belkin F5C206VTEL 6 Outlet Sur… | 114.9 | 5 | 0.0 | 34.47 |


One kind of `OR` condition is checking whether a column's value equals any value in a `list`. We can do this with `is_in`.

```python
(
    df
    .filter(
        pl.col('Quantity').is_in([2, 3])
    )
    .head(3)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 1 | CA-2016-152156 | 11/8/2016 | 11/11/2016 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookc… | 261.96 | 2 | 0.0 | 41.9136 |
| 2 | CA-2016-152156 | 11/8/2016 | 11/11/2016 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered … | 731.94 | 3 | 0.0 | 219.582 |
| 3 | CA-2016-138688 | 6/12/2016 | 6/16/2016 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels f… | 14.62 | 2 | 0.0 | 6.8714 |


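## Apply `AND` and `OR` conditions

We can combine `AND` and `OR` in a single filter. Here a row is kept if its `Quantity` is 5 and its `Profit` is at least 1000, or if its `City` is Los Angeles. The parentheses make the grouping explicit.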

```python
(
    df
    .filter(
        ((pl.col('Quantity') == 5) & (pl.col('Profit') >= 1000)) | (pl.col('City') == "Los Angeles")
    )
    .head(2)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 3 | CA-2016-138688 | 6/12/2016 | 6/16/2016 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels f… | 14.62 | 2 | 0.0 | 6.8714 |
| 6 | CA-2014-115812 | 6/9/2014 | 6/14/2014 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-FU-10001487 | Furniture | Furnishings | Eldon Expressions Wood and Pla… | 48.86 | 7 | 0.0 | 14.1694 |


## Apply `Between` condition

We use `is_between` to apply a condition on a range. By default both bounds are inclusive, so here we are looking for values **greater than or equal to** 1000 and **less than or equal to** 2000.

```python
(
    df
    .filter(
        pl.col('Profit').is_between(1000, 2000)
    )
    .head(5)
)
```

| Row_ID | Order_ID | Order_Date | Ship Date | Ship_Mode | Customer_ID | Customer_Name | Segment | Country | City | State | Postal_Code | Region | Product_ID | Category | Sub_Category | Product_Name | Sales | Quantity | Discount | Profit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | str | str | str | str | str | str | str | str | str | str | i64 | str | str | str | str | str | f64 | i64 | f64 | f64 |
| 319 | CA-2014-164973 | 11/4/2014 | 11/9/2014 | Standard Class | NM-18445 | Nathan Mautz | Home Office | United States | New York City | New York | 10024 | East | TEC-MA-10002927 | Technology | Machines | Canon imageCLASS MF7460 Monoch… | 3991.98 | 2 | 0.0 | 1995.99 |
| 354 | CA-2016-129714 | 9/1/2016 | 9/3/2016 | First Class | AB-10060 | Adam Bellavance | Home Office | United States | New York City | New York | 10009 | East | OFF-BI-10004995 | Office Supplies | Binders | GBC DocuBind P400 Electric Bin… | 4355.168 | 4 | 0.2 | 1415.4296 |
| 516 | CA-2017-127432 | 1/22/2017 | 1/27/2017 | Standard Class | AD-10180 | Alan Dominguez | Home Office | United States | Great Falls | Montana | 59405 | West | TEC-CO-10003236 | Technology | Copiers | Canon Image Class D660 Copier | 2999.95 | 5 | 0.0 | 1379.977 |
| 995 | CA-2014-117639 | 5/21/2014 | 5/25/2014 | Standard Class | MW-18235 | Mitch Willingham | Corporate | United States | Virginia Beach | Virginia | 23464 | South | OFF-BI-10003925 | Office Supplies | Binders | Fellowes PB300 Plastic Comb Bi… | 2715.93 | 7 | 0.0 | 1276.4871 |
| 1455 | CA-2016-133711 | 11/26/2016 | 11/29/2016 | First Class | MC-17425 | Mark Cousins | Corporate | United States | Mobile | Alabama | 36608 | South | TEC-MA-10000010 | Technology | Machines | Hewlett-Packard Deskjet 3050a … | 3040.0 | 8 | 0.0 | 1459.2 |
