# A very useful tool to clean the column names of a pandas DataFrame

*(Header image; source: https://www.kdnuggets.com/2022/11/4-ways-rename-pandas-columns.html)*

Data are always stored in columns, and each column name gives that data its identity. Column names should therefore be human-readable, so that business stakeholders and other parties can easily understand a visual report built from the data. Meaningful column names improve communication between data analysts and senior management or customers.

```python
# Import the necessary libraries.
import pandas as pd
import numpy as np
```

```python
df = pd.read_csv("Fitbit.csv")
df.head(5)
```

|   | Date | Calorie burned | Steps | Distance | Floors | Minutes Sedentary | Minutes Lightly Active | Minutes Fairly Active | Minutes Very Active | Activity Calories | MinutesOfSleep | MinutesOfBeingAwake | NumberOfAwakings | LengthOfRestInMinutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 08-05-2015 | 1934 | 905 | 0.65 | 0 | 1.355 | 46 | 0 | 0 | 1680 | 384 | 26 | 23 | 417 |
| 1 | 09-05-2015 | 3631 | 18925 | 14.11 | 4 | 611.0 | 316 | 61 | 60 | 2248 | 454 | 35 | 21 | 491 |
| 2 | 10-05-2015 | 3204 | 14228 | 10.57 | 1 | 602.0 | 226 | 14 | 77 | 1719 | 387 | 46 | 25 | 436 |
| 3 | 11-05-2015 | 2673 | 6756 | 5.02 | 8 | 749.0 | 190 | 23 | 4 | 9620 | 311 | 31 | 21 | 350 |
| 4 | 12-05-2015 | 2495 | 502 | 3.73 | 1 | 876.0 | 171 | 0 | 0 | 7360 | 407 | 65 | 44 | 491 |


```python
# Mapping of substrings to replace: each key is replaced by its value,
# in the order the entries are listed.
dic = {
    " ": "_",
    "%": "",
    "#": "",
    "(": "",
    ")": "",
    "$": "",
    "/": "_",
    "\n": "",
    "-": "_",
    "__": "_",
}
```
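Because the replacements are applied one after another, the order of the dictionary entries matters, and a single `"__": "_"` pass does not remove every repeated underscore. The sketch below illustrates this with plain Python strings; the column names `"Distance (km)"` and `"Steps / Day"` are hypothetical, and `apply_dic` / `apply_dic_collapsed` are illustrative helpers, not part of the original notebook.

```python
import re

# The cleaning dictionary from above (dicts keep insertion order in Python 3.7+)
dic = {
    " ": "_", "%": "", "#": "", "(": "", ")": "", "$": "",
    "/": "_", "\n": "", "-": "_", "__": "_",
}

def apply_dic(name: str, mapping: dict) -> str:
    """Apply each literal replacement in order (hypothetical helper)."""
    for old, new in mapping.items():
        name = name.replace(old, new)
    return name

def apply_dic_collapsed(name: str, mapping: dict) -> str:
    """Same as apply_dic, then collapse any run of underscores into one."""
    return re.sub(r"_+", "_", apply_dic(name, mapping))

print(apply_dic("Distance (km)", dic))          # Distance_km
print(apply_dic("Steps / Day", dic))            # Steps__Day  ("___" -> "__")
print(apply_dic_collapsed("Steps / Day", dic))  # Steps_Day
```

The `"__"` entry turns `"___"` into `"__"` because `str.replace` scans non-overlapping matches from left to right; a regex such as `_+` handles runs of any length.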

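```python
def format_columns(df_clr, dic):
    """
    Clean the column names of a DataFrame.

    Column names are lowercased, each key of ``dic`` is replaced by its value
    (in dictionary order), repeated underscores are collapsed, and leading or
    trailing underscores are stripped.

    Args:
        df_clr: pandas DataFrame whose column names should be cleaned
        dic: dictionary mapping substrings to their replacements
    Returns:
        pandas Index with the cleaned column names
    """
    # Lowercase all column names
    df_clr = df_clr.rename(columns=str.lower)

    # Replace special characters; regex=False treats each key as a literal string
    for x, y in dic.items():
        df_clr.columns = df_clr.columns.str.replace(x, y, regex=False)

    # Collapse runs of underscores (e.g. "a / b" would otherwise end up as "a__b")
    df_clr.columns = df_clr.columns.str.replace(r"_+", "_", regex=True)

    # Remove leading and trailing underscores
    df_clr.columns = df_clr.columns.str.strip("_")

    return df_clr.columns
```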
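To try the cleaning without `Fitbit.csv`, here is a minimal self-contained sketch on a toy DataFrame. The column names are made up for illustration, and the function is a copy of `format_columns` with `regex=False` written out explicitly and an extra regex step that collapses repeated underscores.

```python
import pandas as pd

dic = {
    " ": "_", "%": "", "#": "", "(": "", ")": "", "$": "",
    "/": "_", "\n": "", "-": "_", "__": "_",
}

def format_columns(df_clr, dic):
    # Lowercase, then apply each literal replacement in dictionary order
    df_clr = df_clr.rename(columns=str.lower)
    for x, y in dic.items():
        df_clr.columns = df_clr.columns.str.replace(x, y, regex=False)
    # Collapse runs of underscores, then trim them from both ends
    df_clr.columns = df_clr.columns.str.replace(r"_+", "_", regex=True)
    df_clr.columns = df_clr.columns.str.strip("_")
    return df_clr.columns

# Hypothetical messy column names, for illustration only
toy = pd.DataFrame(columns=[" Calorie burned ", "Distance (km)", "Steps / Day", "Heart-Rate %"])
toy.columns = format_columns(toy, dic)
print(list(toy.columns))
# ['calorie_burned', 'distance_km', 'steps_day', 'heart_rate']
```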

```python
# Replace the columns of df with the cleaned names
df.columns = format_columns(df, dic)
df.columns
```

```
Index(['date', 'calorie_burned', 'steps', 'distance', 'floors',
       'minutes_sedentary', 'minutes_lightly_active', 'minutes_fairly_active',
       'minutes_very_active', 'activity_calories', 'minutesofsleep',
       'minutesofbeingawake', 'numberofawakings', 'lengthofrestinminutes'],
      dtype='object')
```
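Names written in CamelCase, such as `MinutesOfSleep`, lose their word boundaries when lowercased (`minutesofsleep`). One possible extension, not part of the original dictionary approach, is a regex-based helper that first inserts an underscore wherever a capital letter follows a lowercase letter or digit, then replaces any run of non-alphanumeric characters with a single underscore. `to_snake_case` is a hypothetical name used only for this sketch.

```python
import re
import pandas as pd

def to_snake_case(name: str) -> str:
    # "MinutesOfSleep" -> "Minutes_Of_Sleep": zero-width match between [a-z0-9] and [A-Z]
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace every run of non-alphanumeric characters (spaces, "%", "/", "_", ...) with "_"
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    # Trim underscores from both ends and lowercase
    return name.strip("_").lower()

cols = pd.Index(["Calorie burned", "MinutesOfSleep", "NumberOfAwakings", "LengthOfRestInMinutes"])
print(list(cols.map(to_snake_case)))
# ['calorie_burned', 'minutes_of_sleep', 'number_of_awakings', 'length_of_rest_in_minutes']
```

The dictionary approach gives explicit control over how each character is handled, while the regex version is shorter but treats every non-alphanumeric character the same way.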