# Maternal Health Risk

Dataset Information

Additional Information

The dataset contains seven columns: Age, systolic blood pressure (SystolicBP), diastolic blood pressure (DiastolicBP), blood sugar (BS), body temperature (BodyTemp), HeartRate, and RiskLevel. The first six are significant risk factors for maternal mortality, which is one of the main concerns of the UN Sustainable Development Goals (SDGs); RiskLevel is the target.

https://archive.ics.uci.edu/dataset/863/maternal+health+risk

In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
from ucimlrepo import fetch_ucirepo 

In [4]:
# fetch dataset 
maternal_health_risk = fetch_ucirepo(id=863) 

In [5]:
# data (as pandas dataframes) 
X = maternal_health_risk.data.features 
y = maternal_health_risk.data.targets 
  
# metadata 
print(maternal_health_risk.metadata) 
  
# variable information 
print(maternal_health_risk.variables) 

{'uci_id': 863, 'name': 'Maternal Health Risk', 'repository_url': 'https://archive.ics.uci.edu/dataset/863/maternal+health+risk', 'data_url': 'https://archive.ics.uci.edu/static/public/863/data.csv', 'abstract': 'Data has been collected from different hospitals, community clinics, maternal health cares from the rural areas of Bangladesh through the IoT based risk monitoring system.', 'area': 'Health and Medicine', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 1013, 'num_features': 6, 'feature_types': ['Real', 'Integer'], 'demographics': ['Age'], 'target_col': ['RiskLevel'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 2020, 'last_updated': 'Fri Nov 03 2023', 'dataset_doi': '10.24432/C5DP5D', 'creators': ['Marzia Ahmed'], 'intro_paper': {'title': 'Review and Analysis of Risk Factor of Maternal Health in Remote Area Using the Internet of Things (IoT)', 'authors': 'Marzia Ahmed, M. A. Kashem,

In [6]:
import pandas as pd

1. Extract the header names.
2. Separate the header names into continuous and categorical.
3. Include the dependent variable if it is not already included.
4. Clean the header names (remove spaces and other unnecessary characters) and assign them as the dataframe's column names.
5. Check the data.
6. Save the dataframe as a CSV file (train.csv).
7. Use the same header names for the test data, then apply steps 4 and 5 to the test data.
8. Save the test dataframe as a CSV file (test.csv), without the row index.
9. Combine the train and test data and save the result as a CSV file (all.csv; named mhr_data.csv in this notebook).
10. Randomly sample 100 rows (resetting the index) and save them as the demo set (demo.csv).


- Use train and test csv data to obtain first ML efficacy results 
- Remove unnecessary files
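Step 4 (cleaning header names) can be sketched as below. The helper `clean_header` and the raw header strings are hypothetical, made up for illustration; the columns of this dataset are already clean, so the step is a no-op here.

```python
import re


def clean_header(name: str) -> str:
    """Trim a header and replace runs of non-alphanumeric characters with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop any underscore left at the start or end by the substitution.
    return cleaned.strip("_")


# Hypothetical messy headers, not taken from the dataset.
raw_headers = [" Systolic BP ", "Blood-Sugar (mmol/L)", "HeartRate"]
clean_headers = [clean_header(h) for h in raw_headers]
print(clean_headers)
```

On a dataframe, the same helper could be applied with `df.columns = [clean_header(c) for c in df.columns]`.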

In [7]:
header_names = X.columns
print(header_names)

Index(['Age', 'SystolicBP', 'DiastolicBP', 'BS', 'BodyTemp', 'HeartRate'], dtype='object')


In [8]:
y.dtypes

RiskLevel    object
dtype: object

In [9]:
num_cols = ['Age', 'SystolicBP', 'DiastolicBP', 'BS', 'BodyTemp', 'HeartRate']

In [10]:
cat_cols = ["RiskLevel"]
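The two column lists above are written out by hand. As an alternative, they could be derived from the dtypes with `select_dtypes`; the sketch below uses a small toy frame (an assumption, not the real data) so that it runs on its own.

```python
import pandas as pd

# Toy frame with the same kinds of columns as the dataset (values are illustrative).
toy = pd.DataFrame(
    {
        "Age": [25, 35],
        "BS": [15.0, 13.0],
        "RiskLevel": ["high risk", "low risk"],
    }
)

# Numeric columns are treated as continuous, everything else as categorical.
num_cols_auto = toy.select_dtypes(include="number").columns.tolist()
cat_cols_auto = toy.select_dtypes(exclude="number").columns.tolist()
print(num_cols_auto, cat_cols_auto)
```

This rule is only a heuristic: an integer-coded category would land in the numeric list, so the result should still be checked by eye.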

In [11]:
mhr_data = pd.concat([X,y],axis=1)
print(mhr_data.shape)

(1014, 7)
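The metadata printed earlier reports 1013 instances, while the concatenated frame has 1014 rows, so a quick sanity check (step 5) seems worthwhile. The helper `basic_checks` below is a hypothetical sketch, shown on a toy frame rather than on `mhr_data`.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame) -> dict:
    """Return the row count, the number of missing cells and of duplicated rows."""
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "n_duplicate_rows": int(df.duplicated().sum()),
    }


# Toy frame: the second row repeats the first, and one BS value is missing.
toy = pd.DataFrame({"Age": [25, 25, 30], "BS": [15.0, 15.0, None]})
report = basic_checks(toy)
print(report)
```

Calling `basic_checks(mhr_data)` would give the corresponding numbers for the real data.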


In [12]:
mhr_data.head()

Unnamed: 0,Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
0,25,130,80,15.0,98.0,86,high risk
1,35,140,90,13.0,98.0,70,high risk
2,29,90,70,8.0,100.0,80,high risk
3,30,140,85,7.0,98.0,70,high risk
4,35,120,60,6.1,98.0,76,low risk


In [13]:
# Strip leading and trailing whitespace from the categorical labels
for col in cat_cols:
    mhr_data[col] = mhr_data[col].str.strip()

In [14]:
mhr_data["RiskLevel"].value_counts()

low risk     406
mid risk     336
high risk    272
Name: RiskLevel, dtype: int64

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
df_train, df_test = train_test_split(
    mhr_data,
    test_size=0.2,
    random_state=1234,
    stratify=mhr_data["RiskLevel"],  # keep class proportions equal in both splits
)
df_train.shape, df_test.shape

((811, 7), (203, 7))
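Because the split is stratified on RiskLevel, the class proportions in the two parts should be nearly identical. One way to check this is sketched below; `class_proportions` is a hypothetical helper and the toy frames stand in for `df_train` and `df_test`.

```python
import pandas as pd


def class_proportions(train: pd.DataFrame, test: pd.DataFrame, target: str) -> pd.DataFrame:
    """Put the relative class frequencies of both splits side by side."""
    return pd.DataFrame(
        {
            "train": train[target].value_counts(normalize=True),
            "test": test[target].value_counts(normalize=True),
        }
    )


# Toy splits with perfectly balanced classes (illustrative only).
toy_train = pd.DataFrame({"RiskLevel": ["low risk"] * 4 + ["high risk"] * 4})
toy_test = pd.DataFrame({"RiskLevel": ["low risk", "high risk"]})
props = class_proportions(toy_train, toy_test, "RiskLevel")
print(props)
```

With the real data the call would be `class_proportions(df_train, df_test, "RiskLevel")`.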

In [24]:
df_train.head()

Unnamed: 0,Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
639,19,120,60,7.0,98.4,70,low risk
833,50,130,100,16.0,98.0,75,mid risk
477,19,120,75,7.9,98.0,70,low risk
915,19,90,65,7.5,101.0,70,low risk
683,40,140,100,18.0,98.0,90,high risk


In [25]:
df_test.head()

Unnamed: 0,Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
987,17,90,63,8.0,101.0,70,high risk
880,15,76,49,6.8,98.0,77,low risk
3,30,140,85,7.0,98.0,70,high risk
54,60,90,65,7.0,98.0,77,low risk
848,15,70,50,6.0,98.0,70,mid risk


In [17]:
df_train.to_csv("train.csv", index=False)
df_test.to_csv("test.csv", index=False)

In [18]:
df_all = pd.concat([df_train, df_test])
df_all.to_csv("mhr_data.csv", index=False)

In [19]:
df_demo = df_train.sample(100, random_state=1000)
df_demo.reset_index(inplace=True, drop=True)
df_demo.head()

Unnamed: 0,Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
0,17,90,60,6.9,101.0,76,mid risk
1,28,115,60,7.8,101.0,86,mid risk
2,32,140,90,18.0,98.0,88,high risk
3,21,120,80,7.5,98.0,77,mid risk
4,49,120,90,7.5,98.0,77,low risk


In [20]:
df_demo.to_csv("demo.csv", index=False)

In [21]:
ls

data_load_mhr.ipynb  demo.csv  mhr_data.csv  test.csv  train.csv


In [22]:
import sys
sys.path.append("../../../")
sys.path.append("../../")
from margctgan.metrics import utility

In [26]:
f1_score = utility.efficacy_test(realdata=df_test, fakedata=df_train, target_name="RiskLevel")
print(f1_score)

0.6403940886699507


In [28]:
f1_score = utility.efficacy_test(realdata=df_train, fakedata=df_test, target_name="RiskLevel")
print(f1_score)

0.623921085080148
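The internals of `utility.efficacy_test` are not shown in this notebook. The two printed scores equal 130/203 and 506/811, which is what a micro-averaged F1 (identical to accuracy for single-label classification) would produce, but that is only an observation. The sketch below shows the general train-on-one-set, test-on-the-other idea under explicit assumptions: the decision tree, the micro averaging, and the function name `train_test_micro_f1` are all my own choices, not margctgan's implementation.

```python
import pandas as pd
from sklearn.metrics import f1_score as sk_f1_score
from sklearn.tree import DecisionTreeClassifier


def train_test_micro_f1(train_df: pd.DataFrame, test_df: pd.DataFrame, target_name: str) -> float:
    """Fit a classifier on train_df and return the micro-averaged F1 on test_df."""
    clf = DecisionTreeClassifier(random_state=0)  # assumed model, chosen for simplicity
    clf.fit(train_df.drop(columns=[target_name]), train_df[target_name])
    predictions = clf.predict(test_df.drop(columns=[target_name]))
    return float(sk_f1_score(test_df[target_name], predictions, average="micro"))


# Toy data with one clearly separating feature (illustrative only).
toy_train = pd.DataFrame(
    {
        "BS": [6.0, 6.5, 15.0, 16.0],
        "RiskLevel": ["low risk", "low risk", "high risk", "high risk"],
    }
)
toy_test = pd.DataFrame({"BS": [6.2, 15.5], "RiskLevel": ["low risk", "high risk"]})
score = train_test_micro_f1(toy_train, toy_test, "RiskLevel")
print(score)
```

The result is stored in `score` rather than `f1_score` so that the name does not shadow scikit-learn's function of the same name.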
