<a href="https://colab.research.google.com/github/molitorl/Projekt-LJL/blob/main/Maternal_Health_Risk_Classification_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project: Maternal Health Risk Classification Model**

**Team:**

Linda Fahrenbruch

Lucia Molitor

Anja Prpic





**Variables:**

**Age (Integer):** Age of the woman in years during pregnancy.

**SystolicBP (Integer):** Upper (systolic) blood pressure value in mmHg.

**DiastolicBP (Integer):** Lower (diastolic) blood pressure value in mmHg.

**BS (Float):** Blood sugar level in mmol/L.

**BodyTemp (Float):** Body temperature in degrees Fahrenheit.

**HeartRate (Integer):** Resting heart rate in beats per minute.

**RiskLevel (Categorical):** Predicted risk intensity level during pregnancy (low, mid, or high risk).

## Importing the Libraries

In [None]:
import pandas as pd

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Path to the Excel file (adjust it to your own Drive layout)
data_path = "/content/drive/My Drive/AI and ML/Data/dataset_maternal_health.xlsx"

# Load the dataset into a DataFrame
raw_data = pd.read_excel(data_path)


In [None]:
raw_data

      Age  SystolicBP  DiastolicBP    BS  BodyTemp  HeartRate  RiskLevel
0      25         130           80  15.0      98.0         86  high risk
1      35         140           90  13.0      98.0         70  high risk
2      29          90           70   8.0     100.0         80  high risk
3      30         140           85   7.0      98.0         70  high risk
4      35         120           60   6.1      98.0         76   low risk
...   ...         ...          ...   ...       ...        ...        ...
1009   22         120           60  15.0      98.0         80  high risk
1010   55         120           90  18.0      98.0         60  high risk
1011   35          85           60  19.0      98.0         86  high risk
1012   43         120           90  18.0      98.0         70  high risk
1013   32         120           65   6.0     101.0         76   mid risk

[1014 rows x 7 columns]


## Data Cleansing

In [None]:
# Overview of the data: structure, summary statistics, and class distribution
print(raw_data.info())
print(raw_data.describe())
print(raw_data['RiskLevel'].value_counts())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1014 entries, 0 to 1013
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Age          1014 non-null   int64  
 1   SystolicBP   1014 non-null   int64  
 2   DiastolicBP  1014 non-null   int64  
 3   BS           1014 non-null   float64
 4   BodyTemp     1014 non-null   float64
 5   HeartRate    1014 non-null   int64  
 6   RiskLevel    1014 non-null   object 
dtypes: float64(2), int64(4), object(1)
memory usage: 55.6+ KB
None
               Age   SystolicBP  DiastolicBP           BS     BodyTemp  \
count  1014.000000  1014.000000  1014.000000  1014.000000  1014.000000   
mean     29.871795   113.198225    76.460552     8.725986    98.665089   
std      13.474386    18.403913    13.885796     3.293532     1.371384   
min      10.000000    70.000000    49.000000     6.000000    98.000000   
25%      19.000000   100.000000    65.000000     6.900000    98.000000   
50% 

In [None]:
# Share of missing values per column, formatted as a percentage
(raw_data.isna().sum() / len(raw_data)).apply('{0:.4%}'.format)

Age            0.0000%
SystolicBP     0.0000%
DiastolicBP    0.0000%
BS             0.0000%
BodyTemp       0.0000%
HeartRate      0.0000%
RiskLevel      0.0000%
dtype: object

In [None]:
raw_data.nunique()

Age            50
SystolicBP     19
DiastolicBP    16
BS             29
BodyTemp        8
HeartRate      16
RiskLevel       3
dtype: int64

In [None]:
# Check the data types
print(raw_data.dtypes)

Age              int64
SystolicBP       int64
DiastolicBP      int64
BS             float64
BodyTemp       float64
HeartRate        int64
RiskLevel       object
dtype: object


In [None]:
# Encode the categorical target as ordered integers (low < mid < high)
raw_data['RiskLevel'] = raw_data['RiskLevel'].map({'low risk': 0, 'mid risk': 1, 'high risk': 2})
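
One thing to watch with `Series.map` and a dictionary: any label that is missing from the dictionary silently becomes `NaN` instead of raising an error. The following minimal sketch uses a small made-up series, not the project data, to show a check that could follow the encoding step. The names `risk_map`, `labels`, and `unmapped` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical label-to-integer mapping, mirroring the one used above
risk_map = {'low risk': 0, 'mid risk': 1, 'high risk': 2}

# Made-up sample; the last entry contains a typo to show the failure mode
labels = pd.Series(['low risk', 'high risk', 'mid risk', 'hgh risk'])

encoded = labels.map(risk_map)

# Labels absent from the mapping become NaN, so list them explicitly
unmapped = labels[encoded.isna()].unique().tolist()
print(unmapped)
```

An empty `unmapped` list would indicate that every label was covered by the mapping.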

In [None]:
print(raw_data)

      Age  SystolicBP  DiastolicBP    BS  BodyTemp  HeartRate  RiskLevel
0      25         130           80  15.0      98.0         86          2
1      35         140           90  13.0      98.0         70          2
2      29          90           70   8.0     100.0         80          2
3      30         140           85   7.0      98.0         70          2
4      35         120           60   6.1      98.0         76          0
...   ...         ...          ...   ...       ...        ...        ...
1009   22         120           60  15.0      98.0         80          2
1010   55         120           90  18.0      98.0         60          2
1011   35          85           60  19.0      98.0         86          2
1012   43         120           90  18.0      98.0         70          2
1013   32         120           65   6.0     101.0         76          1

[1014 rows x 7 columns]


In [None]:
# Summary statistics, now including the numerically encoded RiskLevel
raw_data.describe()

In [None]:
# Sample variance of each column (pandas uses ddof=1 by default)
variance_values = raw_data.var()

In [None]:
print(variance_values)

Age            181.559065
SystolicBP     338.704005
DiastolicBP    192.815323
BS              10.847351
BodyTemp         1.880695
HeartRate       65.427104
RiskLevel        0.651818
dtype: float64


In [None]:
# Convert BodyTemp from Fahrenheit to Celsius so the values are easier to interpret
raw_data['BodyTemp'] = (raw_data['BodyTemp'] - 32) * 5.0/9.0
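
To make the conversion concrete, here is a small sketch that applies the same formula to the three body temperatures visible in the data preview. The function name `fahrenheit_to_celsius` is a hypothetical helper and not part of the notebook.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5.0 / 9.0

# Values that appear in the BodyTemp column preview
for temp_f in (98.0, 100.0, 101.0):
    print(f"{temp_f} °F = {fahrenheit_to_celsius(temp_f):.2f} °C")
```

Because the conversion is linear with a positive slope, it does not change the standardized values computed later; it only makes the raw values easier to read.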


## Scaling

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Standardize the features (zero mean, unit variance); the target stays unchanged
feature_cols = ['Age', 'SystolicBP', 'DiastolicBP', 'BS', 'BodyTemp', 'HeartRate']
scaler = StandardScaler()
data_scaled = raw_data.copy()
data_scaled[feature_cols] = scaler.fit_transform(raw_data[feature_cols])

In [None]:
# Show the data after preprocessing
print(data_scaled.head())

        Age  SystolicBP  DiastolicBP        BS  BodyTemp  HeartRate  RiskLevel
0 -0.361738    0.913396     0.255023  1.905890 -0.485215   1.446956          2
1  0.380777    1.457027     0.975539  1.298340 -0.485215  -0.532088          2
2 -0.064732   -1.261127    -0.465493 -0.220537  0.973884   0.704815          2
3  0.009519    1.457027     0.615281 -0.524312 -0.485215  -0.532088          2
4  0.380777    0.369765    -1.186009 -0.797710 -0.485215   0.210054          0
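
As a sanity check, the first scaled Age value can be reproduced by hand. `StandardScaler` standardizes each column as z = (x - mean) / std using the population standard deviation (ddof = 0), whereas `describe()` reports the sample standard deviation (ddof = 1), so the latter has to be rescaled first. The sketch below uses only the Age statistics printed in the `describe()` output above; the helper name `z_score_from_describe` is hypothetical.

```python
import math

def z_score_from_describe(x: float, mean: float, sample_std: float, n: int) -> float:
    """Standardize x the way StandardScaler does, starting from describe() output."""
    # Convert the sample standard deviation (ddof=1) to the population one (ddof=0)
    population_std = sample_std * math.sqrt((n - 1) / n)
    return (x - mean) / population_std

# Age statistics from the describe() output; the first row has Age = 25
z = z_score_from_describe(x=25, mean=29.871795, sample_std=13.474386, n=1014)
print(round(z, 4))
```

The result agrees with the first Age entry of `data_scaled` up to rounding.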


# Analysis
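
The three model families listed in this section could be compared along the following lines. This is only a sketch under stated assumptions: it runs on a synthetic stand-in dataset from `make_classification` (6 features, 3 classes), not on the maternal health data, and the split ratio, random seeds, and default hyperparameters are illustrative choices rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 6 features and 3 classes, like the maternal health data
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=42,
)

# Stratify so that all three classes keep their proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record its accuracy on the held-out test split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

On the project data, `X` would be the six feature columns and `y` the encoded `RiskLevel`. Tree-based models do not depend on feature scaling, so either `raw_data` or `data_scaled` could serve as input.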

## Decision Trees

## Gradient Boosting

## Random Forest