## Missing values
By the end of this lecture you will be able to:
- identify missing values in a `DataFrame`
- count the number of missing values in a column
- find and drop `null` or non-`null` values

In [2]:
import polars as pl
import polars.selectors as cs

In [17]:
csv_file = '../../Files/Sample_Superstore.csv'

df = pl.read_csv(csv_file)


> In Pandas a missing value can be represented with a `null`, `NaN` or `None` value depending on the dtype of the column. In Polars a missing value is always represented as `null`, whatever the dtype. Polars also allows `NaN` values in floating-point columns to represent non-numeric results (e.g. dividing zero by zero). This use of `NaN` is distinct from missing values.

### Metadata on `null` values
Polars stores metadata about `null` values for each column in a `DataFrame`.

#### Null count
Polars stores a count of how many `null` values there are. We can access this with the `null_count` method, either on a single column or on all the columns at once.

In [6]:
df.null_count()

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit,Is_Return
u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32
0,0,2,6,1,0,1,0,0,0,1,0,0,0,8,3,0,0,0,0,0,4


Polars keeps track of the `null_count` at all times so this is a cheap operation regardless of the size of the column.

### Finding `null` values

We use the `is_null` expression to find out whether each value is `null`, and `is_not_null` for the converse.

In [10]:
(
    df
    .select(
        [
            pl.col("Customer_Name"),
            pl.col("Category").is_null().alias("Category_is_null"),
            pl.col("Region").is_null().alias("Region_is_null")
        ]
    ).head()
)

Customer_Name,Category_is_null,Region_is_null
str,bool,bool
"""Claire Gute""",False,False
"""Claire Gute""",False,False
"""Darrin Van Huff""",False,False
"""Sean O'Donnell""",False,False
"""Sean O'Donnell""",False,False


### Filtering by `null` values

#### Filtering on a single column
We can use these methods to filter by `null` values on a single column.

In this example we want all rows where the values in `Category` are `null`.

In [25]:
(
    df
    .filter(
        pl.col("Category").is_null(),
    ).select("Customer_Name", "Category", "Profit")
)

Customer_Name,Category,Profit
str,str,f64
"""Brosina Hoffman""",,5.4432
"""Zuschuss Donatelli""",,2.4824
"""Emily Burns""",,240.2649
"""Gene Hale""",,123.4737
"""Katrina Willman""",,9.936
"""Dean Katz""",,0.777
"""Mark Packer""",,3.36
"""Bradley Drucker""",,206.316


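#### Filtering by `null` values in multiple columns
To keep the rows that have a `null` in any column, we apply `is_null` to every column with `pl.all()` and combine the results row by row with `pl.any_horizontal`.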

In [14]:
(
    df
    .filter(
        pl.any_horizontal(pl.all().is_null())
    ).head()
)

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit,Is_Return
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64,bool
5,"""US-2015-108966""",,"""10/18/2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164,False
10,"""CA-2014-115812""","""6/9/2014""","""6/14/2014""","""Standard Class""","""BH-11710""",,"""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""OFF-AP-10002892""","""Office Supplies""","""Appliances""","""Belkin F5C206VTEL 6 Outlet Sur…",114.9,5,0.0,34.47,True
13,"""CA-2017-114412""","""4/15/2017""","""4/20/2017""","""Standard Class""","""AA-10480""","""Brosina Hoffman""","""Consumer""","""United States""","""Concord""","""North Carolina""",28027,"""South""","""OFF-PA-10002365""",,"""Paper""","""Xerox 1967""",15.552,3,0.2,5.4432,False
17,"""CA-2014-105893""","""11/11/2014""",,"""Standard Class""","""PK-19075""","""Pete Kriz""","""Consumer""","""United States""","""Madison""","""Wisconsin""",53711,"""Central""","""OFF-ST-10004186""","""Office Supplies""","""Storage""","""Stur-D-Stor Shelving, Vertical…",665.88,6,0.0,13.3176,True
19,"""CA-2014-143336""","""8/27/2014""","""9/1/2014""","""Second Class""","""ZD-21925""","""Zuschuss Donatelli""","""Consumer""","""United States""","""San Francisco""","""California""",94109,"""West""","""OFF-AR-10003056""",,"""Art""","""Newell 341""",8.56,2,0.0,2.4824,True


### Using the `drop_nulls` method

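Polars has a convenience `drop_nulls` method for dropping rows that contain `null` values. With `subset`, only the listed columns are checked; without it, a row is dropped if any of its columns is `null`.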

In [22]:
(
    df
    .drop_nulls(subset=["Ship_Date"])
)

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit,Is_Return
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64,bool
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136,true
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582,true
3,"""CA-2016-138688""","""6/12/2016""","""6/16/2016""","""Second Class""","""DV-13045""","""Darrin Van Huff""","""Corporate""","""United States""","""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714,true
4,"""US-2015-108966""","""10/11/2015""","""10/18/2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""FUR-TA-10000577""","""Furniture""","""Tables""","""Bretford CR4500 Series Slim Re…",957.5775,5,0.45,-383.031,true
5,"""US-2015-108966""",,"""10/18/2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164,false
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
9990,"""CA-2014-110422""","""1/21/2014""","""1/23/2014""","""Second Class""","""TB-21400""","""Tom Boeckenhauer""","""Consumer""","""United States""","""Miami""","""Florida""",33180,"""South""","""FUR-FU-10001889""","""Furniture""","""Furnishings""","""Ultra Door Pull Handle""",25.248,3,0.2,4.1028,true
9991,"""CA-2017-121258""","""2/26/2017""","""3/3/2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""FUR-FU-10000747""","""Furniture""","""Furnishings""","""Tenex B1-RE Series Chair Mats …",91.96,2,0.0,15.6332,true
9992,"""CA-2017-121258""","""2/26/2017""","""3/3/2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""TEC-PH-10003645""","""Technology""","""Phones""","""Aastra 57i VoIP phone""",258.576,2,0.2,19.3932,true
9993,"""CA-2017-121258""","""2/26/2017""","""3/3/2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""OFF-PA-10004041""","""Office Supplies""","""Paper""","""It's Hot Message Books with St…",29.6,4,0.0,13.32,true


In [20]:
(
    df
    .drop_nulls()
)

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit,Is_Return
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64,bool
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136,true
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582,true
3,"""CA-2016-138688""","""6/12/2016""","""6/16/2016""","""Second Class""","""DV-13045""","""Darrin Van Huff""","""Corporate""","""United States""","""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714,true
4,"""US-2015-108966""","""10/11/2015""","""10/18/2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""FUR-TA-10000577""","""Furniture""","""Tables""","""Bretford CR4500 Series Slim Re…",957.5775,5,0.45,-383.031,true
6,"""CA-2014-115812""","""6/9/2014""","""6/14/2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""FUR-FU-10001487""","""Furniture""","""Furnishings""","""Eldon Expressions Wood and Pla…",48.86,7,0.0,14.1694,true
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
9990,"""CA-2014-110422""","""1/21/2014""","""1/23/2014""","""Second Class""","""TB-21400""","""Tom Boeckenhauer""","""Consumer""","""United States""","""Miami""","""Florida""",33180,"""South""","""FUR-FU-10001889""","""Furniture""","""Furnishings""","""Ultra Door Pull Handle""",25.248,3,0.2,4.1028,true
9991,"""CA-2017-121258""","""2/26/2017""","""3/3/2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""FUR-FU-10000747""","""Furniture""","""Furnishings""","""Tenex B1-RE Series Chair Mats …",91.96,2,0.0,15.6332,true
9992,"""CA-2017-121258""","""2/26/2017""","""3/3/2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""TEC-PH-10003645""","""Technology""","""Phones""","""Aastra 57i VoIP phone""",258.576,2,0.2,19.3932,true
9993,"""CA-2017-121258""","""2/26/2017""","""3/3/2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""OFF-PA-10004041""","""Office Supplies""","""Paper""","""It's Hot Message Books with St…",29.6,4,0.0,13.32,true


We can also specify a subset of columns to apply the condition on, as in the `subset=["Ship_Date"]` example above.