## Pandas Missing Values
2020-09-25

Real-world data has garbage mixed in with it, so we need good ways to handle that. Sometimes it's best to have an error thrown and trap it. But with big text files of data, it's often better just to get NaN or NaT values for the bad data. The pandas to_numeric and to_datetime functions make that easy. Let's look at a delimited text file with an integer column and a date column, each with one bad value.

In [1]:
import pandas as pd
import io
import numpy as np

In [2]:
S = """counter|date
1|1999-01-31
2|1999-02-31
N|1999-03-31
"""
tR = pd.read_csv(io.StringIO(S), sep="|")
tR

|   | counter | date       |
|---|---------|------------|
| 0 | 1       | 1999-01-31 |
| 1 | 2       | 1999-02-31 |
| 2 | N       | 1999-03-31 |


In [3]:
tR.dtypes

counter    object
date       object
dtype: object
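
For contrast with coercion, here is a minimal sketch of the other option mentioned at the start: letting the conversion raise and trapping the error. It rebuilds the same sample data so it runs on its own; the names raw, frame and try_convert are illustrative and not part of the original notebook. Both converters default to errors="raise".

```python
import io

import pandas as pd

# Same sample data as above, rebuilt so this sketch is self-contained.
raw = "counter|date\n1|1999-01-31\n2|1999-02-31\nN|1999-03-31\n"
frame = pd.read_csv(io.StringIO(raw), sep="|")


def try_convert(converter, column):
    """Hypothetical helper: return (result, None) on success or (None, message) on failure."""
    try:
        # With the default errors="raise", a single bad value aborts the conversion.
        return converter(column), None
    except (ValueError, TypeError) as exc:
        return None, str(exc)


numbers, number_error = try_convert(pd.to_numeric, frame["counter"])
dates, date_error = try_convert(pd.to_datetime, frame["date"])

print("to_numeric failed:", number_error)
print("to_datetime failed:", date_error)
```

Trapping the error tells us that something is wrong, but the whole column is rejected because of one value, which is why coercion is often the more practical choice for large files.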

### Converting
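Both columns were read in as strings (object dtype): counter because it contains the non-numeric value N, and date because read_csv does not parse dates unless asked to. We can use the converter functions to get numeric and datetime types, with the bad values marked as missing.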

In [4]:
tR["counterN"] = pd.to_numeric(tR.counter, errors="coerce")
tR["dateTime"] = pd.to_datetime(tR.date, errors="coerce")
tR.dtypes

counter             object
date                object
counterN           float64
dateTime    datetime64[ns]
dtype: object
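
The counterN column comes out as float64 rather than an integer type because NaN is a floating-point value. If integer semantics matter, one option is the pandas nullable integer dtype "Int64". The sketch below assumes the coerced values are all whole numbers; the names counter, as_float and as_int are illustrative.

```python
import pandas as pd

# The raw counter values, as strings, including the bad "N".
counter = pd.Series(["1", "2", "N"], name="counter")

# Coercion gives float64, because the missing value is stored as NaN.
as_float = pd.to_numeric(counter, errors="coerce")

# The nullable integer dtype keeps integers and marks the gap as <NA>.
as_int = as_float.astype("Int64")

print(as_float.dtype)
print(as_int.dtype)
print(as_int)
```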

In [5]:
tR

|   | counter | date       | counterN | dateTime   |
|---|---------|------------|----------|------------|
| 0 | 1       | 1999-01-31 | 1.0      | 1999-01-31 |
| 1 | 2       | 1999-02-31 | 2.0      | NaT        |
| 2 | N       | 1999-03-31 | NaN      | 1999-03-31 |
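
Once the bad values are marked, a natural next step is to find which raw values failed to convert. A minimal sketch, rebuilding the same sample data so it runs standalone: a value counts as bad if it is missing after conversion but was present in the raw text. The names raw, frame, bad_counter, bad_date and clean are illustrative.

```python
import io

import pandas as pd

# Same sample data as above, rebuilt so this sketch is self-contained.
raw = "counter|date\n1|1999-01-31\n2|1999-02-31\nN|1999-03-31\n"
frame = pd.read_csv(io.StringIO(raw), sep="|")

numbers = pd.to_numeric(frame["counter"], errors="coerce")
dates = pd.to_datetime(frame["date"], errors="coerce")

# A conversion failure is a value that is missing now but was not missing in the raw column.
bad_counter = numbers.isna() & frame["counter"].notna()
bad_date = dates.isna() & frame["date"].notna()

print(frame.loc[bad_counter, "counter"].tolist())  # ['N']
print(frame.loc[bad_date, "date"].tolist())        # ['1999-02-31']

# Keep only the rows where both columns converted cleanly.
clean = frame.loc[~(bad_counter | bad_date)]
print(clean)
```

Comparing against the raw column matters when the file also has genuinely empty fields, since those are missing before conversion and should not be reported as bad data.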
