# clean_column_names(df) function

December 26, 2022

Function: Clean Pandas DataFrame column names

@author: Oscar A. Trevizo

* Convert column headings to lower case.
* Replace every space in a column name with an underscore '_'.
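
A minimal sketch of the per-name transformation, using a hypothetical helper `clean_name` that is not part of this notebook. It shows that each space is replaced individually, so leading, trailing, or repeated spaces also become underscores:

```python
def clean_name(name: str) -> str:
    # Hypothetical helper: lower-case the heading, then replace each space with '_'.
    return name.lower().replace(' ', '_')

for heading in ["First Name", "Score 1", " Last  Name "]:
    print(repr(heading), '->', repr(clean_name(heading)))
# 'First Name' -> 'first_name'
# 'Score 1' -> 'score_1'
# ' Last  Name ' -> '_last__name_'
```

If leading or trailing spaces should be dropped instead, calling `.strip()` before `.replace()` is one option; the function in this notebook does not do this.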

### References 
1. Harvard Data Science Foundations CSCI E101 by Prof. B. Huang
1. "Pandas General Functions" (accessed Feb. 20, 2022) https://pandas.pydata.org/pandas-docs/stable/reference/general_functions.html 


# Libraries

In [1]:
import pandas as pd

# Function

In [2]:
def clean_column_names(df):
    """    
    Cleans the column names of a Pandas DataFrame.
    Converts all column headings to lower case, and changes all spaces in columns to underscore '_'.
    
    Parameters
    ----------
    df: Pandas DataFrame.

    Returns
    -------
    df: Cleaned Pandas DataFrame.

    """
    # https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html

    # Convert all column headings to lower case.
    df = df.rename(columns=str.lower)

    # Replace spaces in column names with underscores '_'.
    df = df.rename(columns={col: col.replace(' ', '_') for col in df.columns})
    
    return df
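
An equivalent approach, sketched here as an alternative rather than the notebook's method, works on the column index directly through the `.str` accessor of `pd.Index`. The name `clean_column_names_vectorized` is hypothetical, and the sketch assumes every column label is a string:

```python
import pandas as pd

def clean_column_names_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical alternative to clean_column_names(); assumes string column labels.
    cleaned = df.copy()  # work on a copy so the caller's DataFrame is not modified
    cleaned.columns = cleaned.columns.str.lower().str.replace(' ', '_', regex=False)
    return cleaned

example = pd.DataFrame({'First Name': ['Joan'], 'Score 1': [91]})
print(list(clean_column_names_vectorized(example).columns))
# ['first_name', 'score_1']
```

Passing `regex=False` makes pandas treat the space as a literal string rather than a regular-expression pattern.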

# Test it

## Create DataFrame: Uncleaned column names

In [3]:
# Get the data
first_name = [" Joan", "Mary ", " Vijay ", "Rob ", "Martha", "Josh", " Vicky", " Mario", "Jenny", "Joe"]
last_name = [" T"," K ", " N ", "R ", "L", "F ", " R", " L", "H", "P"]
score_1 = [91, 83, 95, 72, 91, 85, 89, 82, 72, 79]
score_2 = [91, 85, 90, 81, 95, 92, 88, 94, 75, 75]

# Build a DataFrame whose column names contain spaces and upper-case letters.
df = pd.DataFrame({'First Name': first_name, 'Last Name': last_name, 'Score 1': score_1, 'Score 2': score_2})
df.head(10)

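|   | First Name | Last Name | Score 1 | Score 2 |
|---|---|---|---|---|
| 0 | Joan | T | 91 | 91 |
| 1 | Mary | K | 83 | 85 |
| 2 | Vijay | N | 95 | 90 |
| 3 | Rob | R | 72 | 81 |
| 4 | Martha | L | 91 | 95 |
| 5 | Josh | F | 85 | 92 |
| 6 | Vicky | R | 89 | 88 |
| 7 | Mario | L | 82 | 94 |
| 8 | Jenny | H | 72 | 75 |
| 9 | Joe | P | 79 | 75 |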


## Run the function to clean the column names

In [4]:
cleaned_df = clean_column_names(df)
cleaned_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   first_name  10 non-null     object
 1   last_name   10 non-null     object
 2   score_1     10 non-null     int64 
 3   score_2     10 non-null     int64 
dtypes: int64(2), object(2)
memory usage: 448.0+ bytes
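
The `info()` output lists the cleaned headings. A minimal sketch of the same check written as assertions also confirms that the input keeps its original headings, because `rename()` returns a new DataFrame. The function is repeated so the snippet runs on its own, and the one-row `raw` frame is illustrative only:

```python
import pandas as pd

def clean_column_names(df):
    # Same logic as the notebook's function, repeated so this check is self-contained.
    df = df.rename(columns=str.lower)
    df = df.rename(columns={col: col.replace(' ', '_') for col in df.columns})
    return df

raw = pd.DataFrame({'First Name': ['Joan'], 'Last Name': ['T'], 'Score 1': [91], 'Score 2': [91]})
cleaned = clean_column_names(raw)

# The cleaned headings match the info() output above.
assert list(cleaned.columns) == ['first_name', 'last_name', 'score_1', 'score_2']
# rename() returns a new DataFrame, so the input keeps its original headings.
assert list(raw.columns) == ['First Name', 'Last Name', 'Score 1', 'Score 2']
# Running the function a second time changes nothing (the cleaning is idempotent).
assert list(clean_column_names(cleaned).columns) == list(cleaned.columns)
print("all checks passed")
```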
