## Cleaning Data

When using pandas, column names that are valid Python identifiers make attribute access possible. The `clean_names` function from pyjanitor returns a DataFrame whose column names are lowercased, with spaces replaced by underscores:

In [4]:
import pandas as pd
import janitor  # importing pyjanitor registers clean_names as a DataFrame method

In [11]:
sample_df = pd.DataFrame(
    {
        "A": [1, None, 3],
        " sales numbers ": [20.0, 30.0, None],
    }
)

# Clean the column names

(
    sample_df
    # clean_names comes from the pyjanitor package
    .clean_names()
)

|   | `a` | `_sales_numbers_` |
|---|-----|-------------------|
| 0 | 1.0 | 20.0 |
| 1 | NaN | 30.0 |
| 2 | 3.0 | NaN |
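
To make the attribute-access point concrete, here is a minimal pandas-only sketch. It assumes the cleaned names shown in the output above and assigns them by hand, so it runs without pyjanitor; the variable names `raw` and `cleaned` are illustrative only.

```python
import pandas as pd

raw = pd.DataFrame({"A": [1, None, 3], " sales numbers ": [20.0, 30.0, None]})

# Assumption: assign the cleaned names by hand to mirror the output above.
cleaned = raw.copy()
cleaned.columns = ["a", "_sales_numbers_"]

# Only names that are valid Python identifiers can follow a dot.
raw_ok = [name.isidentifier() for name in raw.columns]
cleaned_ok = [name.isidentifier() for name in cleaned.columns]

# The name with spaces needs bracket access; the cleaned name also works as an attribute.
bracket_total = raw[" sales numbers "].sum()
attribute_total = cleaned._sales_numbers_.sum()
```

Bracket access keeps working for any column name, so attribute access is a convenience rather than a requirement.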


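> By default, pyjanitor's `clean_names` does not strip whitespace before and after column names; it turns that whitespace into underscores, as in `_sales_numbers_` above. Passing `strip_underscores=True` to `clean_names` should remove them.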

In [10]:
# We can also strip the whitespace with a custom function

def clean_col(col):
    # Drop leading/trailing whitespace, then replace inner spaces with underscores
    return col.strip().replace(" ", "_")

(
    sample_df
    # rename all the column names
    .rename(columns=clean_col)
)

|   | `A` | `sales_numbers` |
|---|-----|-----------------|
| 0 | 1.0 | 20.0 |
| 1 | NaN | 30.0 |
| 2 | 3.0 | NaN |
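
The `rename` approach above strips whitespace but leaves the case untouched (`A` stays uppercase). As a rough sketch of combining both behaviors with pandas alone, the hypothetical helper `tidy_columns` below (not part of pandas or pyjanitor) strips, lowercases, and replaces spaces using the `Index.str` string methods:

```python
import pandas as pd

sample_df = pd.DataFrame({"A": [1, None, 3], " sales numbers ": [20.0, 30.0, None]})


def tidy_columns(df):
    """Hypothetical helper: strip, lowercase, and underscore the column names."""
    new_names = (
        df.columns
        .str.strip()                         # drop leading/trailing whitespace
        .str.lower()                         # normalize case
        .str.replace(" ", "_", regex=False)  # literal space to underscore
    )
    # rename returns a new DataFrame and leaves the original untouched
    return df.rename(columns=dict(zip(df.columns, new_names)))


tidy = tidy_columns(sample_df)
tidy_names = list(tidy.columns)
```

Because the helper takes and returns a DataFrame, it can also be used in a method chain via `sample_df.pipe(tidy_columns)`.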


In [14]:
# Check whether any missing values exist

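(
    sample_df
    .isna()  # Boolean mask: True where a value is missing
    .any()   # Per column: does the column contain a missing value?
    .any()   # Across columns: is anything missing at all?
)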

True

> As a sanity check before creating models, you can use pandas to ensure that
you have dealt with all missing values. The code above returns a single
boolean that is `True` if any cell in the DataFrame is missing.
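
A single boolean says that something is missing, but not where. As a small sketch extending the same check (the variable names are illustrative), the per-column counts and the affected rows can be pulled out like this:

```python
import pandas as pd

sample_df = pd.DataFrame({"A": [1, None, 3], " sales numbers ": [20.0, 30.0, None]})

# Same check as above, converted to a plain Python bool.
has_missing = bool(sample_df.isna().any().any())

# Number of missing cells in each column.
missing_per_column = sample_df.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = sample_df[sample_df.isna().any(axis="columns")]
```

With this sample data, each column has one missing cell, and the rows at index 1 and 2 are the ones affected.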