Often, when you use `pandas` to read data that has missing points, `pandas` automatically fills each missing point with a null value (`NaN`).
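
As a small illustration of that behaviour, the sketch below reads a tiny CSV from an in-memory string. The CSV text and the name `csv_df` are made up for this example; the point is only that the empty field comes back as `NaN`.

```python
import io

import pandas as pd

# Hypothetical CSV text: the second data row has no value for column B.
csv_text = "A,B,C\n1,5,7\n2,,8\n"

csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)

# The empty field was read as NaN, which forces column B to a float dtype.
print(csv_df.dtypes)
```

Columns A and C have no gaps, so they stay integer columns, while B is stored as floats because it has to hold `NaN`.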

We will now explore methods for handling missing values: finding them, dropping them, or filling them in.

In [2]:
import numpy as np
import pandas as pd

In [3]:
d = {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [7, 8, 9]}
# np.nan is a special floating point value that is used to represent missing data

In [4]:
df = pd.DataFrame(data = d)
df

     A    B  C
0  1.0  5.0  7
1  2.0  NaN  8
2  NaN  NaN  9
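
The comment above says that `np.nan` is a floating point value. A short sketch of what that means in practice, rebuilding the same `df` so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [7, 8, 9]})

# np.nan is an ordinary Python float.
print(type(np.nan))

# NaN never compares equal to anything, including itself,
# so use isnull()/isna() rather than == to detect it.
print(np.nan == np.nan)

# Columns A and B hold NaN and are therefore stored as floats;
# column C has no missing values and stays an integer column.
print(df.dtypes)
```

This is why the values in columns A and B display as `1.0`, `2.0`, `5.0` while column C displays as plain integers.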


In [5]:
df.isnull() # check each value: True where it is null (NaN), False otherwise

       A      B      C
0  False  False  False
1  False   True  False
2   True   True  False
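
Because `isnull()` returns booleans, it can be combined with `sum()` and `any()` to summarise where the gaps are. A minimal sketch, rebuilding the same `df`; the variable names are chosen only for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [7, 8, 9]})

# True counts as 1, so summing the boolean mask counts NaN values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Summing again gives the total number of missing values in the DataFrame.
total_missing = int(missing_per_column.sum())
print(total_missing)

# any(axis=1) flags each row that contains at least one NaN.
rows_with_missing = df.isnull().any(axis=1)
print(rows_with_missing)
```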


In [6]:
help(df.dropna)

Help on method dropna in module pandas.core.frame:

dropna(*, axis: 'Axis' = 0, how: 'AnyAll | lib.NoDefault' = <no_default>, thresh: 'int | lib.NoDefault' = <no_default>, subset: 'IndexLabel | None' = None, inplace: 'bool' = False, ignore_index: 'bool' = False) -> 'DataFrame | None' method of pandas.core.frame.DataFrame instance
    Remove missing values.
    
    See the :ref:`User Guide <missing_data>` for more on which values are
    considered missing, and how to work with missing data.
    
    Parameters
    ----------
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Determine if rows or columns which contain missing values are
        removed.
    
        * 0, or 'index' : Drop rows which contain missing values.
        * 1, or 'columns' : Drop columns which contain missing value.
    
        Only a single axis is allowed.
    
    how : {'any', 'all'}, default 'any'
        Determine if row or column is removed from DataFrame, when we have
        at least one NA

In [7]:
df.dropna() # drop rows with NaN values

     A    B  C
0  1.0  5.0  7


In [8]:
df.dropna(axis = 1) # drop columns with NaN values

   C
0  7
1  8
2  9


In [9]:
df.dropna(thresh = 2) # keep only rows with at least 2 non-NaN values

     A    B  C
0  1.0  5.0  7
1  2.0  NaN  8
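
Note that `thresh` counts the non-missing values a row needs in order to be kept, not the number of `NaN` values that triggers a drop. The sketch below compares it with the `how` and `subset` parameters listed in the `dropna` signature, again rebuilding `df` so it runs on its own; the result names are chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [7, 8, 9]})

# thresh=2: keep rows with at least 2 non-NaN values (row 2 has only one).
kept_thresh_2 = df.dropna(thresh=2)

# thresh=3: every one of the 3 columns must be present, so only row 0 survives.
kept_thresh_3 = df.dropna(thresh=3)

# how='all': drop a row only if every value in it is NaN (no such row here).
kept_how_all = df.dropna(how='all')

# subset=['A']: look for NaN only in column A when deciding which rows to drop.
kept_subset_a = df.dropna(subset=['A'])

print(kept_thresh_2, kept_thresh_3, kept_how_all, kept_subset_a, sep='\n\n')
```

None of these calls modifies `df`; each returns a new DataFrame unless `inplace=True` is passed.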


In [10]:
help(df.fillna)

Help on method fillna in module pandas.core.generic:

fillna(value: 'Hashable | Mapping | Series | DataFrame | None' = None, *, method: 'FillnaOptions | None' = None, axis: 'Axis | None' = None, inplace: 'bool_t' = False, limit: 'int | None' = None, downcast: 'dict | None | lib.NoDefault' = <no_default>) -> 'Self | None' method of pandas.core.frame.DataFrame instance
    Fill NA/NaN values using the specified method.
    
    Parameters
    ----------
    value : scalar, dict, Series, or DataFrame
        Value to use to fill holes (e.g. 0), alternately a
        dict/Series/DataFrame of values specifying which value to use for
        each index (for a Series) or column (for a DataFrame).  Values not
        in the dict/Series/DataFrame will not be filled. This value cannot
        be a list.
    method : {'backfill', 'bfill', 'ffill', None}, default None
        Method to use for filling holes in reindexed Series:
    
        * ffill: propagate last valid observation forward to next

In [11]:
df.fillna(value = 'FILL VALUE')

            A           B  C
0         1.0         5.0  7
1         2.0  FILL VALUE  8
2  FILL VALUE  FILL VALUE  9


You can see more examples in the description printed by `help(df.fillna)`.
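
Two of the options mentioned in that help text can be sketched as follows: a dict that gives a different fill value per column, and forward filling. The fill values `0` and `-1` are arbitrary choices for illustration, and the sketch uses `df.ffill()` for the forward fill rather than the `method` argument, which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [7, 8, 9]})

# A dict maps column name -> fill value; columns not listed are left alone.
filled_by_column = df.fillna(value={'A': 0, 'B': -1})
print(filled_by_column)

# Forward fill: propagate the last valid value in each column downward.
forward_filled = df.ffill()
print(forward_filled)
```

Forward filling only has something to propagate when a valid value appears earlier in the column, so a `NaN` in the first row would remain.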

In [12]:
df['A'].fillna(value = df['A'].mean()) # fill NaN values with the mean of the column

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
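
The cell above fills a single column with its own mean. One way to extend the same idea to every column at once is to pass the Series returned by `df.mean()` to `fillna`, which matches each mean to its column by name. This is a sketch on the same `df`; the names `column_means` and `filled_with_means` are chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [7, 8, 9]})

# mean() skips NaN by default, so each mean uses only the values present.
column_means = df.mean()
print(column_means)

# Each NaN is replaced by the mean of its own column.
filled_with_means = df.fillna(column_means)
print(filled_with_means)
```

As with the other calls, `df` itself is unchanged; assign the result back if you want to keep the filled values.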