# **Introduction**
**In this notebook I'll classify fetal health to help prevent child and maternal mortality.**

**The target is divided into 3 classes:**
1. Normal
2. Suspect
3. Pathological

![](http://stream.org/wp-content/uploads/Scientist-Fetus-Embryo-healthy-Life-Baby-Science-Studies-900.jpg)

First, here is what the most important features mean:

* baseline value - Baseline fetal heart rate (FHR), in beats per minute
* accelerations - Number of accelerations per second
* fetal_movement - Number of fetal movements per second
* uterine_contractions - Number of uterine contractions per second
* light_decelerations - Number of light decelerations (LDs) per second
* severe_decelerations - Number of severe decelerations (SDs) per second
* prolongued_decelerations - Number of prolonged decelerations (PDs) per second
* abnormal_short_term_variability - Percentage of time with abnormal short-term variability
* mean_value_of_short_term_variability - Mean value of short-term variability
* percentage_of_time_with_abnormal_long_term_variability - Percentage of time with abnormal long-term variability

**Let's start by importing the libraries.**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import mean_squared_error

**Read data**

In [None]:
df = pd.read_csv('../input/fetal-health-classification/fetal_health.csv')
df.head()

In [None]:
print(f'Data shape: {df.shape}')
print(f'Data columns:\n{df.columns}')

**Describe the data**

In [None]:
df.describe().T

**Check for missing data, since any missing values would have to be fixed before modeling**

In [None]:
df.info()

**It seems there is no missing data.**
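`df.info()` reports missing values only indirectly, as a non-null count lower than the number of rows. A more direct check counts missing cells per column. The sketch below runs it on a small made-up frame (`toy` is illustrative, not the real data); on the notebook's data the same call would be `df.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for df, with one deliberately missing value
toy = pd.DataFrame({
    "baseline value": [120.0, 132.0, np.nan],
    "accelerations": [0.0, 0.006, 0.003],
    "fetal_health": [1.0, 2.0, 1.0],
})

# isnull() marks each missing cell as True; sum() counts them per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
print("Any missing values:", bool(missing_per_column.any()))
```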

**The target is `fetal_health`, which holds the class label.
We'll analyze it and look for relationships between it and the other features.**

In [None]:
fetal = df['fetal_health']
plt.figure(figsize = (9,6))
sns.boxplot(x = fetal, y= 'baseline value', data = df, palette="Blues")
plt.title('Baseline Fetal Heart Rate (FHR) for each status')

Remember that 1.0 means Normal, 2.0 means Suspect and 3.0 means Pathological.
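Plots and tables are easier to read with class names instead of codes. A minimal sketch, assuming the labels are stored as the floats 1.0, 2.0 and 3.0 as described above; the `STATUS` dict and the sample `labels` series are illustrative. On the real data the equivalent call would be `df['fetal_health'].map(STATUS)`.

```python
import pandas as pd

# Class codes used by the fetal_health column
STATUS = {1.0: "Normal", 2.0: "Suspect", 3.0: "Pathological"}

# Hypothetical sample of labels, stored as floats like in the CSV
labels = pd.Series([1.0, 3.0, 2.0, 1.0], name="fetal_health")

# map() replaces each code with its name; a code missing from STATUS would become NaN
named = labels.map(STATUS)
print(named.tolist())
```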

In [None]:
plt.figure(figsize = (9,6))
sns.barplot(x = fetal, y= 'accelerations', data = df )
plt.title("Mean accelerations by health status")

Let's count how many samples fall into each status so we can visualize the class balance.

In [None]:
# sort_index() orders the counts by class code (1.0, 2.0, 3.0) instead of by frequency
counts = df['fetal_health'].value_counts().sort_index()
print(counts)
Status = {1: 'Normal', 2: 'Suspect', 3: 'Pathological'}
plt.figure(figsize = (13,5))
plt.subplot(121)
plt.pie(counts, labels=[Status[i] for i in counts.index], autopct="%1.0f%%")
fetal = [Status[i] for i in df["fetal_health"]]
plt.subplot(122)
sns.countplot(x=fetal, order=['Normal', 'Suspect', 'Pathological'])
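`value_counts()` orders classes by frequency, not by class code, so any code that assigns names by position (such as pie-chart labels) should sort by index first. Passing `normalize=True` returns proportions, which makes the class imbalance explicit. A sketch on a made-up label series in which Pathological happens to be more frequent than Suspect:

```python
import pandas as pd

# Made-up labels: three Normal, two Pathological, one Suspect
labels = pd.Series([1.0, 1.0, 1.0, 3.0, 3.0, 2.0], name="fetal_health")

# Default order is by frequency, so here it is 1.0, 3.0, 2.0
print(labels.value_counts().index.tolist())

# sort_index() restores the class-code order 1.0, 2.0, 3.0
proportions = labels.value_counts(normalize=True).sort_index()
print(proportions)
```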

To understand the data better, let's compute the correlation between `fetal_health` and the other features.

In [None]:
corr = df.corr()
# Correlation of every column with the target, strongest positive first
sort_corr = corr["fetal_health"].sort_values(ascending=False)
plt.figure(figsize = (9,9))
# Reuse the matrix computed above instead of calling df.corr() a second time
sns.heatmap(corr, annot=True, cmap = "Oranges", fmt = '.1f', cbar = True, square = True)
print(sort_corr)
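Sorting by the signed correlation pushes strongly *negative* correlations to the bottom, where they are easy to miss (and a top-N selection cuts them off entirely). Ranking by absolute value keeps both directions. A sketch on synthetic data; the column names `feature_pos`, `feature_neg` and `noise` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy_target = rng.integers(1, 4, size=200).astype(float)  # made-up labels 1, 2, 3

toy = pd.DataFrame({
    "feature_pos": toy_target + rng.normal(0, 0.5, 200),   # rises with the label
    "feature_neg": -toy_target + rng.normal(0, 0.5, 200),  # falls with the label
    "noise": rng.normal(0, 1, 200),                        # unrelated to the label
    "fetal_health": toy_target,
})

# Correlation of each feature with the target, without the target's self-correlation
corr_with_target = toy.corr()["fetal_health"].drop("fetal_health")

# Order by strength (absolute value) but keep the sign in the printed values
order = corr_with_target.abs().sort_values(ascending=False).index
ranked = corr_with_target.loc[order]
print(ranked)
```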

Let's look at the distribution of each feature with histograms.

In [None]:
df.hist(figsize = (20,20))

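The features are on very different scales (for example, `baseline value` is measured in beats per minute while `accelerations` is a count per second), which hurts scale-sensitive models such as logistic regression and SVMs. We will standardize the features to zero mean and unit variance to avoid this.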
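`StandardScaler` standardizes each column as $z = (x - \mu) / \sigma$, where $\mu$ is the column mean and $\sigma$ its population standard deviation. A small check on made-up numbers that the manual formula matches the scaler (`X_toy` is illustrative, not the fetal data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up columns on very different scales (a heart rate and a per-second rate)
X_toy = np.array([[120.0, 0.000],
                  [132.0, 0.006],
                  [140.0, 0.003],
                  [128.0, 0.009]])

scaled = StandardScaler().fit_transform(X_toy)

# Manual standardization; StandardScaler uses the population std (ddof=0), as does np.std
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates a distance- or gradient-based model just because of its units.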

In [None]:
# All columns except the target are features
features = df.drop(["fetal_health"], axis = 1)
standardscaler = StandardScaler()
# Standardize each feature to zero mean and unit variance
X = standardscaler.fit_transform(features)
# Take the column names from the frame itself instead of a hand-typed list
data_nor = pd.DataFrame(X, columns = features.columns)
data_nor.head()
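Here the scaler is fit on the whole dataset before the train/test split, so the rows that later form the test set contribute to the means and standard deviations used in training. The effect is often small, but a `Pipeline` avoids it by fitting the scaler on the training set only. A sketch on synthetic stand-in data from `make_classification` (21 features and 3 imbalanced classes chosen to resemble this dataset; it is not the fetal data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class stand-in with 21 features
X_syn, y_syn = make_classification(
    n_samples=600, n_features=21, n_informative=8, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=42,
)

# Split the raw, unscaled features first
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

# The scaler is fit on X_tr only; X_te is transformed with the training statistics
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```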

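Let's split the data into a training set (70%) and a test set (30%), train the model on the training set, and then evaluate it on the test set. Passing `stratify = target` keeps the proportions of Normal, Suspect and Pathological the same in both sets.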
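To see what `stratify` does, the sketch below splits made-up labels with a 70/20/10 class mix (`X_toy` and `y_toy` are illustrative) using the same `test_size` and `random_state` as the notebook, then compares the class proportions in each part.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 70% class 1, 20% class 2, 10% class 3
y_toy = np.array([1] * 70 + [2] * 20 + [3] * 10)
X_toy = np.arange(100).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

# Class proportions in each split (bincount index 0 is unused, so it is dropped)
train_props = np.bincount(y_tr)[1:] / len(y_tr)
test_props = np.bincount(y_te)[1:] / len(y_te)
print("train:", train_props.round(2))
print("test: ", test_props.round(2))
```

Without `stratify`, a rare class such as Pathological could end up noticeably under- or over-represented in a 30% test set by chance.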

In [None]:
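target = df["fetal_health"]
# 70/30 split; stratify keeps the class proportions equal in both sets
X_train, X_test, y_train, y_test = train_test_split(data_nor, target, test_size = 0.3, random_state = 42, stratify = target)
# max_iter raised from the default 100 to avoid a possible convergence warning
logistic_regression = LogisticRegression(max_iter = 1000)
logistic_regression_mod = logistic_regression.fit(X_train, y_train)
# score() returns the mean accuracy on the test set
logistic_regression_mod.score(X_test, y_test)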

This score is the mean accuracy of logistic regression on the test set.
Let's try another model: a support vector machine classifier (`SVC`) with `C=2`.

In [None]:
from sklearn import svm
clf = svm.SVC(C=2)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
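`C=2` is a single hand-picked value. A common alternative is to choose `C` (and the RBF kernel width `gamma`) with a cross-validated grid search on the training set only, keeping the test set for the final check. A sketch on synthetic stand-in data; the grid values are arbitrary assumptions, not tuned for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class stand-in for the fetal data
X_syn, y_syn = make_classification(
    n_samples=600, n_features=21, n_informative=8, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0, stratify=y_syn
)

# Pipeline step names ("svc") become prefixes in the parameter grid
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.5, 1, 2, 4, 8], "svc__gamma": ["scale", 0.01, 0.1]}

# 5-fold CV (stratified for classifiers) on the training set picks the best combination
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_te, y_te))
```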

In [None]:
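pridict = clf.predict(X_test)
# mean_squared_error was imported in the first cell.
# It treats the class codes 1, 2, 3 as numbers, so it reflects how far off wrong predictions are, not how many there are
MSEValue = mean_squared_error(y_test, pridict)
print('Mean Squared Error Value is : ', MSEValue)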
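Mean squared error is a regression metric. Applied to the codes 1, 2, 3 it scores a Normal case predicted as Pathological four times worse than one predicted as Suspect, while accuracy counts both as a single mistake. A worked example on four made-up labels (`y_true`, `pred_a` and `pred_b` are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

y_true = np.array([1, 1, 2, 3])
# Two hypothetical predictions that each get exactly one sample wrong
pred_a = np.array([1, 2, 2, 3])  # Normal predicted as Suspect (off by 1)
pred_b = np.array([1, 3, 2, 3])  # Normal predicted as Pathological (off by 2)

for name, pred in [("a", pred_a), ("b", pred_b)]:
    print(name, "accuracy:", accuracy_score(y_true, pred),
          "MSE:", mean_squared_error(y_true, pred))
```

Both predictions have accuracy 0.75, but the MSE is 1/4 = 0.25 for `pred_a` and 2²/4 = 1.0 for `pred_b`.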

In [None]:
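print("Classification Report")
print(classification_report(y_test, pridict))
print("Confusion Matrix (rows: true class, columns: predicted class)")
print(confusion_matrix(y_test, pridict))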
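With imbalanced classes, overall accuracy can look good while the rare classes are missed, so the per-class recall in the report is worth reading closely. Recall for a class is the diagonal entry of the confusion matrix divided by that row's sum (rows are true classes). A sketch on made-up labels (`y_true_toy` and `y_pred_toy` are illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true_toy = np.array([1, 1, 1, 1, 2, 2, 3, 3])
y_pred_toy = np.array([1, 1, 1, 2, 2, 1, 3, 1])

# Rows are true classes, columns are predicted classes, both in the order 1, 2, 3
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=[1, 2, 3])

# Recall per class = correctly predicted samples / all true samples of that class
recall_manual = cm.diagonal() / cm.sum(axis=1)
print(cm)
print("recall per class:", recall_manual)
print(np.allclose(recall_manual, recall_score(y_true_toy, y_pred_toy, average=None)))
```

Here the overall accuracy is 5/8, but only half of the Suspect and Pathological cases are caught.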