# Data Preparation for Classification of Neonate Weight

In [1]:
import pandas as pd
import numpy as np

The main dataframe that contains all necessary data must first be loaded.

In [2]:
mainDf = pd.read_csv('C:/Users/Nefeli/Desktop/biomed_project_data/mainDf_int.csv')

<br>The weight classes below follow a Medscape reference, a resource partnered with several world-renowned medical schools (Yale, UCSF, UCSD and more):
https://emedicine.medscape.com/article/938854-overview?form=fpf</br>

<br>They also draw on the Mayo Clinic: https://www.mayoclinic.org/diseases-conditions/fetal-macrosomia/symptoms-causes/syc-20372579 and Penn Medicine Lancaster General Health: https://www.lancastergeneralhealth.org/health-hub-home/motherhood/the-first-year/your-newborns-weight-gain</br>

Neonate weight is classified as follows:
<br>ELBW: Extremely low birth weight -> weight < 1000 g</br>
<br>VLBW: Very low birth weight -> weight >= 1000 g and < 1500 g</br>
<br>LBW: Low birth weight -> weight >= 1500 g and < 2500 g</br>
<br>NORM: Normal birth weight -> weight >= 2500 g and <= 4000 g</br>
<br>MACRO: Macrosomia -> weight > 4000 g</br>

In [3]:
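def neonateWeightClassAssigner_5(x):
    # Branches are checked in order, so each class only covers the range
    # not already claimed by the classes above it.
    if x < 1000:
        return 'ELBW'
    elif x < 1500:
        return 'VLBW'
    elif x < 2500:
        return 'LBW'
    elif x <= 4000:
        return 'NORM'
    elif x > 4000:
        return 'MACRO'
    else:
        # Reached only for missing weights: NaN fails every comparison.
        return np.nan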
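
As a possible alternative to the row-wise function, the same five bins could be expressed with `pd.cut`. This is only a sketch: `BIN_EDGES`, `BIN_LABELS`, `classify_weights_5` and the sample weights are hypothetical names and values, not part of the notebook. Because the normal class includes exactly 4000 g, the upper edge is nudged to the next representable float so that left-closed bins reproduce the thresholds listed above.

```python
import numpy as np
import pandas as pd

# np.nextafter gives the smallest float above 4000, so the left-closed
# bin [2500, 4000 + eps) still contains a weight of exactly 4000 g.
BIN_EDGES = [-np.inf, 1000, 1500, 2500, np.nextafter(4000.0, np.inf), np.inf]
BIN_LABELS = ['ELBW', 'VLBW', 'LBW', 'NORM', 'MACRO']

def classify_weights_5(weights: pd.Series) -> pd.Series:
    # right=False makes every bin left-closed and right-open.
    # Missing weights stay missing in the result.
    return pd.cut(weights, bins=BIN_EDGES, labels=BIN_LABELS, right=False)

# Hypothetical boundary weights, used only to illustrate the edges.
sample = pd.Series([999, 1000, 2499, 2500, 4000, 4001, np.nan])
print(classify_weights_5(sample).tolist())
```

The result is a categorical column, which keeps the class order explicit; whether that is preferable to plain strings depends on how the labels are used later.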

In [4]:
mainDf['Neonate_Weight_5'] = [None]*len(mainDf) 

In [5]:
mainDf['Neonate_Weight_5'] = mainDf['Weight(g)'].apply(neonateWeightClassAssigner_5)

In [6]:
y_labels, counts = np.unique(mainDf['Neonate_Weight_5'], return_counts=True)
print(y_labels)
print(counts)

['LBW' 'MACRO' 'NORM']
[  3  17 202]
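
The count above works because every row received a label. If some weights were missing, the label column would mix strings with NaN, and sorting such a column (which `np.unique` does) may fail. A minimal sketch of a count that also reports unlabelled rows, using a small hypothetical label series rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical labels with one missing entry, for illustration only.
labels = pd.Series(['NORM', 'MACRO', np.nan, 'NORM', 'LBW'], dtype=object)

# dropna=False keeps NaN as its own row in the counts, so rows that
# could not be classified are visible instead of silently dropped.
counts = labels.value_counts(dropna=False)
print(counts)
```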


The ELBW and VLBW classes do not occur in the dataset. The three remaining classes are also heavily imbalanced: 202 of the 222 entries belong to 'NORM', against 17 in 'MACRO' and 3 in 'LBW'. The problem is therefore recast as binary classification with two classes, 'NORM' and 'ABNORM', for normal and abnormal cases respectively. 'LBW' and 'MACRO' are merged into 'ABNORM' and the rest remain in 'NORM'. Nothing more can be done while maintaining interpretability.

Neonate Weight is now classified as follows:
<br>ABNORM: Low birth weight -> weight < 2500 g</br>
<br>NORM: Normal birth weight -> weight >=2500 g and <=4000 g</br>
<br>ABNORM: Macrosomia -> weight > 4000 g</br>
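
The merge described above can also be sketched as a mapping applied to the five-class labels, instead of re-deriving the class from the raw weight. `FIVE_TO_BINARY` and `collapse_to_binary` are hypothetical names; sending 'ELBW' and 'VLBW' to 'ABNORM' is an assumption that follows from the weight < 2500 g rule, since neither class occurs in this dataset.

```python
import pandas as pd

# Hypothetical mapping from the five-class labels to the binary labels.
FIVE_TO_BINARY = {
    'ELBW': 'ABNORM',   # assumption: below 2500 g, so abnormal
    'VLBW': 'ABNORM',   # assumption: below 2500 g, so abnormal
    'LBW': 'ABNORM',
    'NORM': 'NORM',
    'MACRO': 'ABNORM',
}

def collapse_to_binary(labels_5: pd.Series) -> pd.Series:
    # Series.map returns NaN for anything not in the mapping,
    # so missing labels stay missing.
    return labels_5.map(FIVE_TO_BINARY)

# Illustrative labels only, not rows from the dataset.
example = pd.Series(['LBW', 'MACRO', 'NORM', None], dtype=object)
print(collapse_to_binary(example).tolist())
```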

In [7]:
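def neonateWeightClassAssigner(x):
    # Low birth weight and macrosomia both count as abnormal.
    if x < 2500:
        return 'ABNORM'
    elif x <= 4000:
        return 'NORM'
    elif x > 4000:
        return 'ABNORM'
    else:
        # Reached only for missing weights: NaN fails every comparison.
        return np.nan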

In [8]:
mainDf['Neonate_Weight'] = [None]*len(mainDf) 

In [9]:
mainDf['Neonate_Weight'] = mainDf['Weight(g)'].apply(neonateWeightClassAssigner)

In [10]:
y_labels, counts = np.unique(mainDf['Neonate_Weight'], return_counts=True)
print(y_labels)
print(counts)

['ABNORM' 'NORM']
[ 20 202]


In [11]:
# Drop the raw weight, which the label is derived from, and the unused five-class label.
to_drop = ['Weight(g)', 'Neonate_Weight_5']
weightDf = mainDf.drop(columns=to_drop).copy()
#weightDf.info()
weightDf.head(3)

Unnamed: 0,pH,BDecf,pCO2,BE,Apgar1,Apgar5,Gest.weeks,Sex,Age,Gravidity,...,FHR_II_ffill_total_power,FHR_II_ffill_vlf,FHR_II_ffill_haar_stdev,FHR_II_ffill_haar_mean,FHR_II_ffill_samp_entr,FHR_II_ffill_bub_entr,diff_nni20,diff_lf_hf,diff_haar_std,Neonate_Weight
0,7.14,8.14,7.7,-10.5,6.0,8.0,37.0,2.0,32.0,1.0,...,368.077564,210.991854,1.549748,-0.000147,0.031682,0.179575,16,3.059011,0.437721,NORM
1,7.0,7.92,12.0,-12.0,8.0,8.0,41.0,2.0,23.0,1.0,...,573.335415,388.247905,3.125196,-0.01974,0.053499,0.15447,2,0.020674,1.175809,NORM
2,7.2,3.03,8.3,-5.6,7.0,9.0,40.0,1.0,31.0,1.0,...,285.519025,149.841842,2.459169,0.030445,0.052023,0.205526,8,1.917875,1.699915,NORM


In [12]:
weightDf.to_csv('C:/Users/Nefeli/Desktop/biomed_project_data/weightDf_bin.csv',index=False)
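
The `Unnamed: 0` column in the preview above suggests that the input CSV was written with its index at some earlier step, although the notebook does not show that step. The sketch below uses a small hypothetical frame and in-memory buffers to show how such a column arises and why `index=False`, as used in the export above, avoids adding another one.

```python
import io
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({'pH': [7.14, 7.0], 'Neonate_Weight': ['NORM', 'NORM']})

# Default export writes the index as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False writes only the data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

If the stray column carries no information, dropping it from `mainDf` before export would keep it out of the model features; whether that applies here depends on how `mainDf_int.csv` was produced.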