# Project Notebook: Optimizing DataFrames and Processing in Chunks

## 1. Introduction 

In this project, we'll practice working with chunked dataframes and optimizing a dataframe's memory usage. We'll be working with financial lending data from Lending Club, a marketplace for personal loans that matches borrowers with investors. You can read more about the marketplace on its website.

The Lending Club's website lists approved loans. Qualified investors can view the borrower's credit score, the purpose of the loan, and other details in the loan applications. Once a lender is ready to back a loan, it selects the amount of money it wants to fund. When the loan amount the borrower requested is fully funded, the borrower receives the money, minus the origination fee that Lending Club charges.

We'll be working with a dataset of loans approved from 2007-2011 (https://bit.ly/3H2XVgC). We've already removed the desc column for you to make our system run more quickly.

If we read in the entire data set, it will consume about 67 megabytes of memory. Let's imagine that we only have 10 megabytes of memory available throughout this project, so you can practice the concepts you learned in the last two lessons.

**Tasks**

1. Read in the first five lines from `loans_2007.csv` (https://bit.ly/3H2XVgC) and look for any data quality issues.

2. Read in the first 1000 rows from the data set, and calculate the total memory usage for these rows. Increase or decrease the number of rows to converge on a memory usage under five megabytes (to stay on the conservative side).
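
One way to converge on a row count is to measure the deep memory usage of a sample and shrink the sample until it fits the budget. The sketch below is only an illustration under stated assumptions: the helper names `memory_mb` and `rows_within_budget` are hypothetical, and a small synthetic CSV stands in for `loans_2007.csv` so that the example runs offline.

```python
import io
import pandas as pd

def memory_mb(df):
    """Return the deep memory usage of a dataframe in megabytes."""
    return df.memory_usage(deep=True).sum() / (1024 ** 2)

def rows_within_budget(make_source, budget_mb, start_rows=1000):
    """Halve nrows until the sample read from the CSV fits within budget_mb.

    make_source is a zero-argument callable returning a fresh path or
    file-like object, because a file-like object can only be read once.
    """
    nrows = start_rows
    while nrows > 1:
        sample = pd.read_csv(make_source(), nrows=nrows)
        if memory_mb(sample) <= budget_mb:
            return nrows
        nrows //= 2
    return nrows

# Synthetic stand-in for loans_2007.csv (assumption: made-up columns and values).
csv_text = "id,grade,int_rate\n" + "\n".join(
    f"{i},B,10.65%" for i in range(5000)
)

chosen = rows_within_budget(lambda: io.StringIO(csv_text), budget_mb=0.05)
sample = pd.read_csv(io.StringIO(csv_text), nrows=chosen)
print(chosen, "rows use", round(memory_mb(sample), 4), "MB")
```

With the real file, `make_source` could simply return the URL used in this notebook, and `budget_mb` would be 5.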

In [None]:
# Importing pandas
import pandas as pd
pd.options.display.max_columns = 99

# Your code goes here
moma = pd.read_csv("https://bit.ly/3H2XVgC" , nrows = 10000)
moma.head()



Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,last_credit_pull_d,collections_12_mths_ex_med,policy_code,application_type,acc_now_delinq,chargeoff_within_12_mths,delinq_amnt,pub_rec_bankruptcies,tax_liens
0,1077501,1296599.0,5000.0,5000.0,4975.0,36 months,10.65%,162.87,B,B2,,10+ years,RENT,24000.0,Verified,Dec-2011,Fully Paid,n,credit_card,Computer,860xx,AZ,27.65,0.0,Jan-1985,1.0,3.0,0.0,13648.0,83.7%,9.0,f,0.0,0.0,5863.155187,5833.84,5000.0,863.16,0.0,0.0,0.0,Jan-2015,171.62,Jun-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
1,1077430,1314167.0,2500.0,2500.0,2500.0,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-2011,Charged Off,n,car,bike,309xx,GA,1.0,0.0,Apr-1999,5.0,3.0,0.0,1687.0,9.4%,4.0,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,Apr-2013,119.66,Sep-2013,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
2,1077175,1313524.0,2400.0,2400.0,2400.0,36 months,15.96%,84.33,C,C5,,10+ years,RENT,12252.0,Not Verified,Dec-2011,Fully Paid,n,small_business,real estate business,606xx,IL,8.72,0.0,Nov-2001,2.0,2.0,0.0,2956.0,98.5%,10.0,f,0.0,0.0,3005.666844,3005.67,2400.0,605.67,0.0,0.0,0.0,Jun-2014,649.91,Jun-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
3,1076863,1277178.0,10000.0,10000.0,10000.0,36 months,13.49%,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200.0,Source Verified,Dec-2011,Fully Paid,n,other,personel,917xx,CA,20.0,0.0,Feb-1996,1.0,10.0,0.0,5598.0,21%,37.0,f,0.0,0.0,12231.89,12231.89,10000.0,2214.92,16.97,0.0,0.0,Jan-2015,357.48,Apr-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
4,1075358,1311748.0,3000.0,3000.0,3000.0,60 months,12.69%,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-2011,Current,n,other,Personal,972xx,OR,17.94,0.0,Jan-1996,0.0,15.0,0.0,27783.0,53.9%,38.0,f,461.73,461.73,3581.12,3581.12,2538.27,1042.85,0.0,0.0,0.0,Jun-2016,67.79,Jun-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0


## Observation
The data contains a lot of repetitive text, e.g. the `%` sign, the word `years`, and the value `INDIVIDUAL`.
Some values are also missing, for example in `emp_title`.

In [None]:



# info() prints its report directly and returns None, so print() is not needed
moma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 52 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   id                          10000 non-null  int64  
 1   member_id                   10000 non-null  float64
 2   loan_amnt                   10000 non-null  float64
 3   funded_amnt                 10000 non-null  float64
 4   funded_amnt_inv             10000 non-null  float64
 5   terms_in_months             10000 non-null  float64
 6   int_rate(%)                 10000 non-null  float64
 7   installment                 10000 non-null  float64
 8   grade                       10000 non-null  object 
 9   sub_grade                   10000 non-null  object 
 10  emp_title                   9348 non-null   object 
 11  emp_length_years            9645 non-null   object 
 12  home_ownership              10000 non-null  object 
 13  annual_inc                  1000

## Observation
A chunk size of 10,000 rows keeps each chunk at about 4 MB of memory, which is below the 5 MB target.

## 2. Exploring the Data in Chunks

Let's familiarize ourselves with the columns to see which ones we can optimize. In the first lesson, we explored column types by reading in the full dataframe. In this project, let's try to understand the column types better while using dataframe chunks.

**Tasks**

For each chunk:
* How many columns have a numeric type? 
* How many have a string type?
* How many unique values are there in each string column? How many of the string columns contain values that are less than 50% unique?
* Which float columns have no missing values and could be candidates for conversion to the integer type?
* Calculate the total memory usage across all of the chunks.
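
The first two questions can be answered per chunk with `select_dtypes`. The following is a minimal sketch, assuming that every non-numeric column counts as a string column; the helper name `column_type_counts` is hypothetical, and the tiny synthetic CSV is a stand-in for the loans file.

```python
import io
import pandas as pd

def column_type_counts(source, chunksize):
    """Count numeric and non-numeric columns in each chunk."""
    counts = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        numeric = chunk.select_dtypes(include="number").shape[1]
        # Assumption: anything that is not numeric is treated as a string column.
        counts.append({"numeric": numeric, "string": chunk.shape[1] - numeric})
    return counts

# Synthetic stand-in for the loans file (assumption: made-up data).
csv_text = "id,grade,loan_amnt\n" + "\n".join(
    f"{i},B,{1000.0 + i}" for i in range(25)
)

counts = column_type_counts(io.StringIO(csv_text), chunksize=10)
for number, count in enumerate(counts):
    print("chunk", number, count)
```

Because pandas infers types separately for each chunk, the counts can differ between chunks when a column holds mixed values.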

In [None]:
# Your code goes here
import pandas as pd

# Use chunks to check each chunk's column types, missing values, and memory usage
chunk_iter = pd.read_csv("https://bit.ly/3H2XVgC", chunksize=10000)
for chunk in chunk_iter:
    # info() prints its report directly, so print() is not needed
    chunk.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 52 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   id                          10000 non-null  int64  
 1   member_id                   10000 non-null  float64
 2   loan_amnt                   10000 non-null  float64
 3   funded_amnt                 10000 non-null  float64
 4   funded_amnt_inv             10000 non-null  float64
 5   term                        10000 non-null  object 
 6   int_rate                    10000 non-null  object 
 7   installment                 10000 non-null  float64
 8   grade                       10000 non-null  object 
 9   sub_grade                   10000 non-null  object 
 10  emp_title                   9348 non-null   object 
 11  emp_length                  9645 non-null   object 
 12  home_ownership              10000 non-null  object 
 13  annual_inc                  1000

## Observation
Each chunk uses approximately 4 MB; the last, smaller chunk uses about 1 MB.
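
To answer the last task directly, the per-chunk figures can be summed rather than read off the `info()` reports. This is a minimal sketch with a hypothetical helper name (`total_memory_mb`) and a synthetic CSV in place of the loans file.

```python
import io
import pandas as pd

def total_memory_mb(source, chunksize):
    """Return the deep memory usage of every chunk and their sum, in megabytes."""
    per_chunk = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        per_chunk.append(chunk.memory_usage(deep=True).sum() / (1024 ** 2))
    return per_chunk, sum(per_chunk)

# Synthetic stand-in for the loans file (assumption: made-up data).
csv_text = "id,grade,loan_amnt\n" + "\n".join(
    f"{i},{'ABC'[i % 3]},{1000.0 + i}" for i in range(2500)
)

per_chunk, total = total_memory_mb(io.StringIO(csv_text), chunksize=1000)
print(len(per_chunk), "chunks,", round(total, 3), "MB in total")
```

Only one chunk is held in memory at a time, so the total is a measure of the full data set's footprint, not of peak usage.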

In [None]:
#How many unique values are there in each string column? How many of the string columns contain values that are less than 50% unique?

for col in moma.select_dtypes(include=['object']):
    num_unique_values = len(moma[col].unique())
    num_total_values = len(moma[col])
    print(col, num_unique_values / num_total_values)
     

term 0.0002
int_rate 0.007
grade 0.0007
sub_grade 0.0035
emp_title 0.8172
emp_length 0.0012
home_ownership 0.0003
verification_status 0.0003
issue_d 0.0005
loan_status 0.0006
pymnt_plan 0.0001
purpose 0.0013
title 0.4084
zip_code 0.072
addr_state 0.0045
earliest_cr_line 0.0465
revol_util 0.1027
initial_list_status 0.0001
last_pymnt_d 0.0059
last_credit_pull_d 0.0059
application_type 0.0001


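## Observation
Only `emp_title` is more than 50% unique (about 82%). The other 20 string columns fall below the 50% threshold; the next highest is `title` at about 41%.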
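
The ratios above come from the first 10,000 rows only. A sketch of how the same ratio could be accumulated over every chunk follows; it assumes that missing values are excluded from both counts (the cell above counts them), and the helper name `unique_ratios` and the synthetic data are hypothetical.

```python
import io
import pandas as pd

def unique_ratios(source, chunksize):
    """Return {column: unique values / non-missing values} over the whole file."""
    seen = {}    # column name -> set of distinct values seen so far
    totals = {}  # column name -> number of non-missing values seen so far
    for chunk in pd.read_csv(source, chunksize=chunksize):
        for col in chunk.select_dtypes(exclude="number"):
            values = chunk[col].dropna()
            seen.setdefault(col, set()).update(values.unique())
            totals[col] = totals.get(col, 0) + len(values)
    return {col: len(seen[col]) / totals[col] for col in seen if totals[col]}

# Synthetic stand-in for the loans file (assumption: made-up data).
csv_text = "id,grade,title\n" + "\n".join(
    f"{i},{'ABC'[i % 3]},loan {i}" for i in range(30)
)

ratios = unique_ratios(io.StringIO(csv_text), chunksize=10)
for col, ratio in ratios.items():
    print(col, round(ratio, 4))
```

Keeping a set per column costs memory of its own, so for nearly unique columns such as `emp_title` this approach may not fit a tight budget.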

In [None]:
# Which float columns have no missing values and could be candidates for conversion to the integer type?

# Count the missing values in each float column.
# Note: `chunk` is the last chunk left over from the loop above, so this covers only that chunk.
float_chunk = chunk.select_dtypes(include=['float'])
print(float_chunk.isnull().sum())

member_id                       2
loan_amnt                       2
funded_amnt                     2
funded_amnt_inv                 2
installment                     2
annual_inc                      6
dti                             2
delinq_2yrs                    31
inq_last_6mths                 31
open_acc                       31
pub_rec                        31
revol_bal                       2
total_acc                      31
out_prncp                       2
out_prncp_inv                   2
total_pymnt                     2
total_pymnt_inv                 2
total_rec_prncp                 2
total_rec_int                   2
total_rec_late_fee              2
recoveries                      2
collection_recovery_fee         2
last_pymnt_amnt                 2
collections_12_mths_ex_med     91
policy_code                     2
acc_now_delinq                 31
chargeoff_within_12_mths       91
delinq_amnt                    31
pub_rec_bankruptcies          670
tax_liens     
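
Since the counts above describe a single chunk, a column can only be called free of missing values after every chunk has been checked. The sketch below accumulates the counts across chunks; the helper name `float_columns_without_missing` and the synthetic CSV are assumptions made for illustration.

```python
import io
import pandas as pd

def float_columns_without_missing(source, chunksize):
    """Return the float columns that have no missing values in any chunk."""
    missing = {}  # column name -> running count of missing values
    for chunk in pd.read_csv(source, chunksize=chunksize):
        counts = chunk.select_dtypes(include="float").isnull().sum()
        for col, n in counts.items():
            missing[col] = missing.get(col, 0) + int(n)
    return [col for col, n in missing.items() if n == 0]

# Synthetic stand-in for the loans file: annual_inc has one gap in the second chunk.
csv_text = "id,loan_amnt,annual_inc\n" + "\n".join(
    f"{i},{1000.0 + i},{'' if i == 15 else 50000.5 + i}" for i in range(25)
)

candidates = float_columns_without_missing(io.StringIO(csv_text), chunksize=10)
print(candidates)
```

A column that passes this check is still only a candidate: it also needs to hold whole numbers before an integer type is appropriate.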

## 3. Optimizing String Columns

We can achieve the greatest memory improvements by converting the string columns to a numeric type. Let's convert all of the columns where the values are less than 50% unique to the category type, and the columns that contain numeric values to the `float` type.

While working with dataframe chunks:
* Determine which string columns you can convert to a numeric type if you clean them. For example, the `int_rate` column is only a string because of the % sign at the end.
* Determine which columns have a few unique values and convert them to the category type. For example, you may want to convert the grade and `sub_grade` columns.
* Based on your conclusions, perform the necessary type changes across all chunks.
* Calculate the total memory footprint, and compare it with the previous one.

In [None]:
# Count the missing values in each string (object) column of the last chunk
object_chunk = chunk.select_dtypes(include=['object'])
print(object_chunk.isnull().sum())

id                       0
term                     2
int_rate                 2
grade                    2
sub_grade                2
emp_title              146
emp_length              27
home_ownership           2
verification_status      2
issue_d                  2
loan_status              2
pymnt_plan               2
purpose                  2
title                    4
zip_code                 2
addr_state               2
earliest_cr_line        31
revol_util              42
initial_list_status      2
last_pymnt_d            14
last_credit_pull_d       4
application_type         2
dtype: int64


In [None]:
# Rename 'term' to 'terms_in_months' and drop the word 'months' from its values so it can become numeric.
# Rename 'emp_length' to 'emp_length_years' and drop the repeated word 'years' from its values.
# Remove the repeated 'xx' from 'zip_code'.
# Remove the repeated '%' from 'revol_util' and 'int_rate'.

# Rename the columns so that their names carry the unit, which lets us store the values as numbers
moma.rename(columns={"term": "terms_in_months", "emp_length": "emp_length_years", "int_rate": "int_rate(%)", "revol_util": "revol_util(%)"}, inplace=True)

# regex=False treats each pattern as literal text
moma['emp_length_years'] = moma['emp_length_years'].str.replace('years', '', regex=False)
moma['zip_code'] = moma['zip_code'].str.replace('xx', '', regex=False)
moma['int_rate(%)'] = moma['int_rate(%)'].str.replace('%', '', regex=False)
moma['revol_util(%)'] = moma['revol_util(%)'].str.replace('%', '', regex=False)
moma['terms_in_months'] = moma['terms_in_months'].str.replace('months', '', regex=False)

# Convert the cleaned columns to float.
# emp_length_years stays a string column: values such as '10+' and '< 1 year' are not numeric.
moma['zip_code'] = moma['zip_code'].astype(float)
moma['int_rate(%)'] = moma['int_rate(%)'].astype(float)
moma['terms_in_months'] = moma['terms_in_months'].astype(float)
moma['revol_util(%)'] = moma['revol_util(%)'].astype(float)
moma.head()

  



Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,terms_in_months,int_rate(%),installment,grade,sub_grade,emp_title,emp_length_years,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util(%),total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,last_credit_pull_d,collections_12_mths_ex_med,policy_code,application_type,acc_now_delinq,chargeoff_within_12_mths,delinq_amnt,pub_rec_bankruptcies,tax_liens
0,1077501,1296599.0,5000.0,5000.0,4975.0,36.0,10.65,162.87,B,B2,,10+,RENT,24000.0,Verified,Dec-2011,Fully Paid,n,credit_card,Computer,860.0,AZ,27.65,0.0,Jan-1985,1.0,3.0,0.0,13648.0,83.7,9.0,f,0.0,0.0,5863.155187,5833.84,5000.0,863.16,0.0,0.0,0.0,Jan-2015,171.62,Jun-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
1,1077430,1314167.0,2500.0,2500.0,2500.0,60.0,15.27,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-2011,Charged Off,n,car,bike,309.0,GA,1.0,0.0,Apr-1999,5.0,3.0,0.0,1687.0,9.4,4.0,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,Apr-2013,119.66,Sep-2013,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
2,1077175,1313524.0,2400.0,2400.0,2400.0,36.0,15.96,84.33,C,C5,,10+,RENT,12252.0,Not Verified,Dec-2011,Fully Paid,n,small_business,real estate business,606.0,IL,8.72,0.0,Nov-2001,2.0,2.0,0.0,2956.0,98.5,10.0,f,0.0,0.0,3005.666844,3005.67,2400.0,605.67,0.0,0.0,0.0,Jun-2014,649.91,Jun-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
3,1076863,1277178.0,10000.0,10000.0,10000.0,36.0,13.49,339.31,C,C1,AIR RESOURCES BOARD,10+,RENT,49200.0,Source Verified,Dec-2011,Fully Paid,n,other,personel,917.0,CA,20.0,0.0,Feb-1996,1.0,10.0,0.0,5598.0,21.0,37.0,f,0.0,0.0,12231.89,12231.89,10000.0,2214.92,16.97,0.0,0.0,Jan-2015,357.48,Apr-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0
4,1075358,1311748.0,3000.0,3000.0,3000.0,60.0,12.69,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-2011,Current,n,other,Personal,972.0,OR,17.94,0.0,Jan-1996,0.0,15.0,0.0,27783.0,53.9,38.0,f,461.73,461.73,3581.12,3581.12,2538.27,1042.85,0.0,0.0,0.0,Jun-2016,67.79,Jun-2016,0.0,1.0,INDIVIDUAL,0.0,0.0,0.0,0.0,0.0


In [None]:
# Check the memory usage after converting the columns above to float
moma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 52 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   id                          10000 non-null  int64  
 1   member_id                   10000 non-null  float64
 2   loan_amnt                   10000 non-null  float64
 3   funded_amnt                 10000 non-null  float64
 4   funded_amnt_inv             10000 non-null  float64
 5   terms_in_months             10000 non-null  float64
 6   int_rate(%)                 10000 non-null  float64
 7   installment                 10000 non-null  float64
 8   grade                       10000 non-null  object 
 9   sub_grade                   10000 non-null  object 
 10  emp_title                   9348 non-null   object 
 11  emp_length_years            9645 non-null   object 
 12  home_ownership              10000 non-null  object 
 13  annual_inc                  1000

### Converting object columns that are less than 50% unique to the category type

In [None]:
for col in moma.select_dtypes(include=['object']):
    num_unique_values = len(moma[col].unique())
    num_total_values = len(moma[col])
    if num_unique_values / num_total_values < 0.5:
        moma[col] = moma[col].astype('category')
        
moma.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 52 columns):
 #   Column                      Non-Null Count  Dtype   
---  ------                      --------------  -----   
 0   id                          10000 non-null  int64   
 1   member_id                   10000 non-null  float64 
 2   loan_amnt                   10000 non-null  float64 
 3   funded_amnt                 10000 non-null  float64 
 4   funded_amnt_inv             10000 non-null  float64 
 5   terms_in_months             10000 non-null  float64 
 6   int_rate(%)                 10000 non-null  float64 
 7   installment                 10000 non-null  float64 
 8   grade                       10000 non-null  category
 9   sub_grade                   10000 non-null  category
 10  emp_title                   9348 non-null   object  
 11  emp_length_years            9645 non-null   category
 12  home_ownership              10000 non-null  category
 13  annual_inc       

In [None]:
# Check the memory usage after the category conversion
moma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 52 columns):
 #   Column                      Non-Null Count  Dtype   
---  ------                      --------------  -----   
 0   id                          10000 non-null  int64   
 1   member_id                   10000 non-null  float64 
 2   loan_amnt                   10000 non-null  float64 
 3   funded_amnt                 10000 non-null  float64 
 4   funded_amnt_inv             10000 non-null  float64 
 5   terms_in_months             10000 non-null  float64 
 6   int_rate(%)                 10000 non-null  float64 
 7   installment                 10000 non-null  float64 
 8   grade                       10000 non-null  category
 9   sub_grade                   10000 non-null  category
 10  emp_title                   9348 non-null   object  
 11  emp_length_years            9645 non-null   category
 12  home_ownership              10000 non-null  category
 13  annual_inc       

### Observation

The memory usage for the 10,000-row chunk drops to 3.1 MB.
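
The conversions above were applied to the first 10,000 rows only. The following sketch shows one way the same idea could be applied chunk by chunk while totalling the memory before and after. The names `optimize_strings`, `pct_cols`, `threshold`, and `mb` are hypothetical, and the synthetic CSV is an assumption standing in for the loans file.

```python
import io
import pandas as pd

def mb(df):
    """Return the deep memory usage of a dataframe in megabytes."""
    return df.memory_usage(deep=True).sum() / (1024 ** 2)

def optimize_strings(chunk, pct_cols, threshold=0.5):
    """Strip '%' from pct_cols, then convert low-cardinality strings to category."""
    chunk = chunk.copy()
    for col in pct_cols:
        chunk[col] = chunk[col].str.rstrip("%").astype("float")
    # Every remaining non-numeric column is treated as a string column.
    for col in chunk.select_dtypes(exclude="number"):
        if chunk[col].nunique() / len(chunk) < threshold:
            chunk[col] = chunk[col].astype("category")
    return chunk

# Synthetic stand-in for the loans file (assumption: made-up data).
csv_text = "id,grade,int_rate\n" + "\n".join(
    f"{i},{'ABCDEFG'[i % 7]},{10 + (i % 50) / 4}%" for i in range(3000)
)

before = after = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1000):
    before += mb(chunk)
    after += mb(optimize_strings(chunk, pct_cols=["int_rate"]))
print(round(before, 3), "MB ->", round(after, 3), "MB")
```

Each chunk builds its own set of categories, so if the optimized chunks are later concatenated, columns whose categories differ between chunks fall back to the object type.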

## 4. Optimizing Numeric Columns

It looks like we were able to realize some powerful memory savings by converting to the category type and converting string columns to numeric ones.

Now let's optimize the numeric columns using the `pandas.to_numeric()` function.

**Tasks**

While working with dataframe chunks:
* Identify float columns that contain missing values, and that we can convert to a more space efficient subtype.
* Identify float columns that don't contain any missing values, and that we can convert to the integer type because they represent whole numbers.
* Based on your conclusions, perform the necessary type changes across all chunks.
* Calculate the total memory footprint and compare it with the previous one.
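
The two cases in the task list can be handled by a single helper that first checks whether a column is safe to store as integers. This is a sketch under stated assumptions, not a definitive implementation: the function name `optimize_float_column` and the example values are made up for illustration.

```python
import numpy as np
import pandas as pd

def optimize_float_column(series):
    """Downcast a float column without changing the values it holds."""
    has_missing = series.isnull().any()
    is_whole = not has_missing and (series % 1 == 0).all()
    if is_whole:
        # Whole numbers with no gaps: pick the smallest integer subtype that fits.
        for dtype_name in ["int8", "int16", "int32", "int64"]:
            info = np.iinfo(dtype_name)
            if info.min <= series.min() and series.max() <= info.max:
                return series.astype(dtype_name)
    # Fractions or missing values: stay float, but try a smaller float subtype.
    return pd.to_numeric(series, downcast="float")

# Made-up example values (assumption), one column per case.
df = pd.DataFrame({
    "open_acc": [3.0, 10.0, 15.0],       # whole numbers, no missing values
    "int_rate": [10.5, 15.25, 12.75],    # fractional values
    "revol_util": [83.5, np.nan, 21.0],  # contains a missing value
})

optimized = pd.DataFrame({col: optimize_float_column(df[col]) for col in df})
print(optimized.dtypes)
```

The whole-number check matters because `astype` to an integer type silently drops any fractional part.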




In [None]:
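# Your code goes here
import numpy as np

def change_to_int(df, col_name):
    """Cast df[col_name] to the smallest integer subtype that can hold its values.

    astype() truncates fractional parts and raises on missing values, so only
    apply this to columns of whole numbers with no missing values.
    """
    # Get the minimum and maximum values
    col_max = df[col_name].max()
    col_min = df[col_name].min()
    # Try the integer subtypes from smallest to largest
    for dtype_name in ['int8', 'int16', 'int32', 'int64']:
        # Use inclusive bounds so values equal to the subtype's limits still fit
        if col_max <= np.iinfo(dtype_name).max and col_min >= np.iinfo(dtype_name).min:
            df[col_name] = df[col_name].astype(dtype_name)
            break

# Check for missing values in the float columns
float_moma = moma.select_dtypes(include=['float64'])
print(float_moma.isnull().sum())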

member_id                     0
loan_amnt                     0
funded_amnt                   0
funded_amnt_inv               0
terms_in_months               0
int_rate(%)                   0
installment                   0
annual_inc                    0
zip_code                      0
dti                           0
delinq_2yrs                   0
inq_last_6mths                0
open_acc                      0
pub_rec                       0
revol_bal                     0
revol_util(%)                 3
total_acc                     0
out_prncp                     0
out_prncp_inv                 0
total_pymnt                   0
total_pymnt_inv               0
total_rec_prncp               0
total_rec_int                 0
total_rec_late_fee            0
recoveries                    0
collection_recovery_fee       0
last_pymnt_amnt               0
collections_12_mths_ex_med    0
policy_code                   0
acc_now_delinq                0
chargeoff_within_12_mths      0
delinq_a

### revol_util(%) has missing values

In [55]:
#Optimizing Float Columns With Subtypes
float_cols = moma.select_dtypes(include=['float']).columns

# Write your code below
for col in float_cols:
    moma[col] = pd.to_numeric(moma[col], downcast='float')

In [None]:
# Convert the float columns that have no missing values to the smallest integer subtype.
# revol_util(%) is skipped because it has missing values.
# Caution: astype() truncates fractions, so columns that hold decimals in the sample
# above (for example int_rate(%) 10.65, installment 162.87, dti 27.65) lose precision
# here. Only whole-number columns are safe candidates.
int_candidates = [
    'int_rate(%)', 'zip_code', 'terms_in_months', 'member_id',
    'funded_amnt', 'funded_amnt_inv', 'installment', 'annual_inc', 'dti',
    'delinq_2yrs', 'inq_last_6mths', 'open_acc', 'pub_rec', 'revol_bal',
    'total_acc', 'out_prncp', 'out_prncp_inv', 'total_pymnt',
    'total_pymnt_inv', 'total_rec_prncp', 'total_rec_int',
    'collections_12_mths_ex_med', 'policy_code', 'acc_now_delinq',
    'chargeoff_within_12_mths', 'delinq_amnt', 'pub_rec_bankruptcies',
    'tax_liens',
]
for col in int_candidates:
    change_to_int(moma, col)


In [None]:
# Check the current memory usage
moma.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 52 columns):
 #   Column                      Non-Null Count  Dtype   
---  ------                      --------------  -----   
 0   id                          10000 non-null  int64   
 1   member_id                   10000 non-null  int32   
 2   loan_amnt                   10000 non-null  float32 
 3   funded_amnt                 10000 non-null  int32   
 4   funded_amnt_inv             10000 non-null  int32   
 5   terms_in_months             10000 non-null  int8    
 6   int_rate(%)                 10000 non-null  int8    
 7   installment                 10000 non-null  int16   
 8   grade                       10000 non-null  category
 9   sub_grade                   10000 non-null  category
 10  emp_title                   9348 non-null   object  
 11  emp_length_years            9645 non-null   category
 12  home_ownership              10000 non-null  category
 13  annual_inc       

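## Observation

The memory usage of the 10,000-row chunk dropped from about 4 MB to 1.3 MB. Part of this saving comes from casting decimal columns such as `int_rate(%)` to integers, which discards their fractional part.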

## Next Steps

We've practiced optimizing a dataframe's memory footprint and working with dataframe chunks. Here's an idea for some next steps:

Create a function that automates as much of the work you just did as possible, so that you could use it on other Lending Club data sets. This function should:

* Determine the optimal chunk size based on the memory constraints you provide.

* Determine which string columns can be converted to numeric ones by removing the `%` character.

* Determine which numeric columns can be converted to more space efficient representations.
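
For the first item, one possible approach is to estimate the bytes per row from a small sample and divide the memory budget by that figure. The sketch below assumes that the sampled rows are representative of the rest of the file; the helper name `estimate_chunksize` and the synthetic CSV are hypothetical.

```python
import io
import pandas as pd

def estimate_chunksize(make_source, budget_mb, sample_rows=1000):
    """Estimate how many rows fit in budget_mb from a sample's bytes per row.

    make_source is a zero-argument callable returning a fresh path or
    file-like object for pd.read_csv.
    """
    sample = pd.read_csv(make_source(), nrows=sample_rows)
    bytes_per_row = sample.memory_usage(deep=True).sum() / max(len(sample), 1)
    return max(int(budget_mb * 1024 ** 2 // bytes_per_row), 1)

# Synthetic stand-in for a Lending Club file (assumption: made-up data).
csv_text = "id,grade,int_rate\n" + "\n".join(
    f"{i},B,10.65%" for i in range(5000)
)

chunksize = estimate_chunksize(lambda: io.StringIO(csv_text), budget_mb=0.1)
first_chunk = next(pd.read_csv(io.StringIO(csv_text), chunksize=chunksize))
used_mb = first_chunk.memory_usage(deep=True).sum() / (1024 ** 2)
print(chunksize, "rows per chunk,", round(used_mb, 4), "MB used")
```

Rows with longer strings later in the file would use more memory than the estimate, so leaving some headroom in `budget_mb` is a reasonable precaution.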


In [62]:
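# Determine which numeric columns can be converted to more space efficient representations.

def numeric_int(df):
    # Cast float columns with no missing values and only whole numbers to int
    for col in df.select_dtypes(include=['float']):
        if df[col].isnull().sum() == 0 and (df[col] % 1 == 0).all():
            # Assign the result back: astype() returns a new Series
            df[col] = df[col].astype(int)
    return df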



In [63]:
# Determine which string columns can be converted to numeric ones by removing the % character.
def string_numeric(df):
    for col in df.select_dtypes(include=['object']):
        # Ignore missing values when checking whether every value contains '%'
        values = df[col].dropna()
        if len(values) > 0 and values.str.contains('%', regex=False, na=False).all():
            df[col] = df[col].str.replace('%', '', regex=False)
    return df
  