I dislike column names that slow down my work. This package converts all column names to snake_case using the following rules:
- All characters are converted to lowercase.
- All spaces are replaced with underscores.
- Every character that isn't a letter, digit, or underscore is removed.
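The three rules can be approximated in a few lines of standard-library Python. This is a minimal sketch, not the package's implementation: the helper names `to_snake_case` and `clean_columns` are hypothetical, and it assumes "letter" means an ASCII letter `a`-`z` (after lowercasing). The rules are applied in the listed order, which matters: spaces become underscores *before* the stripping step, so they survive it.

```python
import re
from typing import Iterable, List


def to_snake_case(name: str) -> str:
    """Hypothetical helper applying the three rules to one column name."""
    lowered = name.lower()                   # rule 1: lowercase everything
    underscored = lowered.replace(" ", "_")  # rule 2: each space -> underscore
    # rule 3: drop anything that isn't an ASCII letter, digit, or underscore
    return re.sub(r"[^a-z0-9_]", "", underscored)


def clean_columns(columns: Iterable[str]) -> List[str]:
    """Hypothetical helper that cleans a whole sequence of column names."""
    return [to_snake_case(str(col)) for col in columns]


print(clean_columns(['abc -@#ab%@', '12 3', 'a**bcEB']))
# ['abc_ab', '12_3', 'abceb']
```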
```shell
git clone https://github.com/jonnagel/fix_df_cols.git
```
TODO
Import CleanDF, create a pd.DataFrame as usual, then call its clean() method. The package adds clean() to every pd.DataFrame, and calling it fixes the column names in place.
```python
import pandas as pd
from fix_df_cols.src.fixdfcols import CleanDF

bad_df = pd.DataFrame(columns=['abc -@#ab%@', '12 3', 'a**bcCCC'])
# bad_df.columns
# Index(['abc -@#ab%@', '12 3', 'a**bcCCC'], dtype='object')
bad_df.clean()
# bad_df.columns
# Index(['abc_ab', '12_3', 'abcccc'], dtype='object')
```
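One common way to make a method such as clean() available on every DataFrame is to attach a function to the `pd.DataFrame` class (monkeypatching). The sketch below assumes that approach; it is not necessarily how `CleanDF` works, and the helper `_snake_case` is hypothetical. Assigning a new list to `self.columns` is what makes the change happen in place.

```python
import re

import pandas as pd


def _snake_case(name) -> str:
    """Hypothetical helper: lowercase, spaces -> '_', drop other symbols."""
    s = str(name).lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9_]", "", s)


def clean(self: pd.DataFrame) -> None:
    """Rename this DataFrame's columns in place; returns None."""
    self.columns = [_snake_case(c) for c in self.columns]


# Attaching the function to the class exposes df.clean() on every DataFrame.
pd.DataFrame.clean = clean

df = pd.DataFrame(columns=['abc -@#ab%@', '12 3', 'a**bcCCC'])
df.clean()
print(list(df.columns))  # ['abc_ab', '12_3', 'abcccc']
```

Monkeypatching keeps the call short (`df.clean()`), at the cost of modifying a class the package does not own. pandas also offers `pandas.api.extensions.register_dataframe_accessor`, which avoids that by namespacing the method (e.g. `df.<accessor>.clean()`).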
Standalone example
```python
# FixCols cleans a plain Python list or a pd.Index directly, without a DataFrame
from fix_df_cols.src.fixdfcols import FixCols

clean_cols = FixCols(['abc -@#ab%@', '12 3', 'a**bcEB']).columns_clean
# clean_cols
# ['abc_ab', '12_3', 'abceb']
```
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.