# Cardiotocographic Classification using TensorFlow and Keras
##### Cardiotocographic classification for fetal heart-rate and uterine contractions, implemented with a PySpark Pipeline

Dataset from the UCI data repository: https://archive.ics.uci.edu/ml/datasets/cardiotocography

In [17]:
#Pipeline dependencies
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.sql import SparkSession

#Data manipulation, analysis and plotting tools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#Machine Learning libraries
#import sklearn as sk
#import tensorflow as tf

In [18]:
#Define the session and cluster
spark = SparkSession.builder \
                    .master('local[4]') \
                    .appName('cardiotocography_pipeline') \
                    .getOrCreate()

In [19]:
#Read the CSV dataset file and print the schema
df = spark.read.options(header='True', #Specify that headers exist in dataset
                        inferSchema='True', 
                        delimiter=',' #Comma delimited
                       ).csv("CTG_data.csv") #Source file

#Drop missing values
df = df.dropna()
df.printSchema()

root
 |-- FileName: string (nullable = true)
 |-- Date: string (nullable = true)
 |-- SegFile: string (nullable = true)
 |-- b3: integer (nullable = true)
 |-- e4: integer (nullable = true)
 |-- LBE: integer (nullable = true)
 |-- LB: integer (nullable = true)
 |-- AC: integer (nullable = true)
 |-- FM: integer (nullable = true)
 |-- UC: integer (nullable = true)
 |-- ASTV: integer (nullable = true)
 |-- MSTV: double (nullable = true)
 |-- ALTV: integer (nullable = true)
 |-- MLTV: double (nullable = true)
 |-- DL: integer (nullable = true)
 |-- DS: integer (nullable = true)
 |-- DP: integer (nullable = true)
 |-- DR: integer (nullable = true)
 |-- Width: integer (nullable = true)
 |-- Min: integer (nullable = true)
 |-- Max: integer (nullable = true)
 |-- Nmax: integer (nullable = true)
 |-- Nzeros: integer (nullable = true)
 |-- Mode: integer (nullable = true)
 |-- Mean: integer (nullable = true)
 |-- Median: integer (nullable = true)
 |-- Variance: integer (nullable = true)
 |-- Ten
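
The schema mixes string identifier columns (`FileName`, `Date`, `SegFile`) with numeric measurements. As a minimal sketch of how the numeric columns could be separated out before modelling, the helper below filters a list of `(name, type)` pairs, which is the shape PySpark's `df.dtypes` returns. The helper name `numeric_feature_columns`, the `NUMERIC_TYPES` set and the sample list are illustrative assumptions, not part of the original notebook.

```python
# Column names and Spark type strings in the (name, type) shape that
# `df.dtypes` returns; only a few columns from the schema are listed here.
example_dtypes = [
    ("FileName", "string"),
    ("Date", "string"),
    ("SegFile", "string"),
    ("LB", "int"),
    ("AC", "int"),
    ("MSTV", "double"),
]

# Spark type strings treated as numeric in this sketch (an assumption).
NUMERIC_TYPES = {"int", "bigint", "float", "double"}


def numeric_feature_columns(dtypes, exclude=()):
    """Return names of numeric columns, skipping any listed in `exclude`."""
    excluded = set(exclude)
    return [name for name, dtype in dtypes
            if dtype in NUMERIC_TYPES and name not in excluded]


feature_cols = numeric_feature_columns(example_dtypes)
print(feature_cols)  # ['LB', 'AC', 'MSTV']
```

With the notebook's DataFrame the call would be `numeric_feature_columns(df.dtypes)`, and the resulting list could then be passed as `inputCols` to a `VectorAssembler`. The `exclude` argument is there so that a label column can be left out of the features.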

### Feature descriptions:
- b: Start instant
- e: End instant
- LBE: Baseline value (medical expert)
- LB: Baseline value (SisPorto)
- AC: Accelerations (SisPorto)
- FM: Foetal movement (SisPorto)
- UC: Uterine contractions (SisPorto)
- ASTV: Percentage of time with abnormal short term variability (SisPorto)
- mSTV: Mean value of short term variability (SisPorto)
- ALTV: Percentage of time with abnormal long term variability (SisPorto)
- mLTV: Mean value of long term variability (SisPorto)
- DL: Light decelerations
- DS: Severe decelerations
- DP: Prolonged decelerations
- DR: Repetitive decelerations
- Width: Histogram width
- Min: Low frequency of the histogram
- Max: High frequency of the histogram
- Nmax: Number of histogram peaks
- Nzeros: Number of histogram zeros
- Mode: Histogram mode
- Mean: Histogram mean
- Median: Histogram median
- Variance: Histogram variance
- Tendency: Histogram tendency (-1 = left asymmetric; 0 = symmetric; 1 = right asymmetric)
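
The Tendency codes listed above can be made readable with a small lookup, for example when labelling plots or inspecting rows. This is only a sketch: the names `TENDENCY_LABELS` and `describe_tendency` are hypothetical, and the `"unknown"` fallback is an assumption about how unexpected codes might be handled.

```python
# Mapping taken from the Tendency feature description.
TENDENCY_LABELS = {
    -1: "left asymmetric",
    0: "symmetric",
    1: "right asymmetric",
}


def describe_tendency(code):
    """Return the label for a Tendency code, or 'unknown' for any other value."""
    return TENDENCY_LABELS.get(code, "unknown")


print(describe_tendency(-1))  # left asymmetric
print(describe_tendency(0))   # symmetric
print(describe_tendency(5))   # unknown
```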