
# Week 2. Case 1 preprocessing data (drafty notes)
Cognitive Systems for Health Technology Applications<br>
26.1.2019, Sakari Lukkarinen<br>
Helsinki Metropolia University of Applied Sciences

## Preprocessing and saving data
The following code snippet imports the data and labels from the UCI archive, preprocesses them, and saves them to data files.

This simplifies your workflow: you preprocess the data only once and then reuse the preprocessed data in your network modeling trials.

In [1]:
# Import libraries
import numpy as np
import pandas as pd
from keras.utils import to_categorical

# Import data
url = r'http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data'
dataframe = pd.read_csv(url, 
                        sep = ',', 
                        header = None, 
                        index_col = None,
                        na_values = '?')

# Add column names
name_list = ['age', 'sex', 'cp','trestbps', 'chol', 'fbs','restecg',
             'thalac','exang','oldpeak','slope','ca','thal','num']
dataframe.columns = name_list

# Fill missing data with columnwise median values
dataframe = dataframe.fillna(dataframe.median())

# Select the data (input) columns
data_list = ['age', 'sex', 'cp','trestbps', 'chol', 'fbs','restecg',
             'thalac','exang','oldpeak','slope','ca','thal']
data = dataframe[data_list]

# Scale the data
data_min = data.min()
data_max = data.max()
data_norm = (data - data_min)/(data_max - data_min)

# Save the data
np.save('case_1_data.npy', data_norm)

# Select the labels (output)
labels = dataframe['num']

# Code labels to categorical output
one_hot_labels = to_categorical(labels)

# Save categorical (one hot coded) labels
np.save('case_1_one_hot_labels.npy', one_hot_labels)

# Make binary labels
bin_labels = 1.0*(labels > 0.0)

# Save binary labels
np.save('case_1_bin_labels.npy', bin_labels)

Using TensorFlow backend.
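
To make the median fill and the min-max scaling steps easier to follow, here is a minimal sketch on a small made-up array. The values and the names `toy`, `toy_filled` and `toy_norm` are hypothetical and not part of the course data; the sketch uses plain NumPy instead of pandas, so it only approximates what the cell above does.

```python
import numpy as np

# Toy feature matrix (hypothetical values) with one missing entry marked as NaN
toy = np.array([[50.0, 1.0],
                [60.0, np.nan],
                [70.0, 0.0],
                [40.0, 1.0]])

# Fill missing values with the columnwise median, ignoring NaNs
col_median = np.nanmedian(toy, axis=0)
toy_filled = np.where(np.isnan(toy), col_median, toy)

# Min-max scale each column to the range [0, 1]
col_min = toy_filled.min(axis=0)
col_max = toy_filled.max(axis=0)
toy_norm = (toy_filled - col_min) / (col_max - col_min)

print(toy_norm)
```

Note that a column with a single constant value would give a zero denominator; this sketch does not handle that case.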


## Test data loading
You can use the following code snippets in your modeling trials to load the preprocessed data into the Notebook.

Note! Check that the data files are in the same folder as your Notebooks.
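
One possible way to do this check from code is sketched below. The helper `missing_files` and the list `DATA_FILES` are hypothetical names introduced here for illustration; the file names are the ones written by the preprocessing cell.

```python
from pathlib import Path

# File names written by the preprocessing cell
DATA_FILES = ['case_1_data.npy',
              'case_1_one_hot_labels.npy',
              'case_1_bin_labels.npy']

def missing_files(folder='.'):
    """Return the expected data files that are not found in `folder`."""
    return [name for name in DATA_FILES
            if not (Path(folder) / name).is_file()]

# Check the current working directory (assumed to be the Notebook folder)
missing = missing_files()
if missing:
    print('Missing data files:', missing)
else:
    print('All data files found.')
```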

In [2]:
# Load data and display first five rows
data1 = np.load('case_1_data.npy')
print('Data:\n', data1[:5])

# Load one-hot-labels and print first 5 rows
hot1 = np.load('case_1_one_hot_labels.npy')
print('One-hot-labels:\n', hot1[:5])

# Load binary labels and print first 5 rows
bin1 = np.load('case_1_bin_labels.npy')
print('Binary labels:\n', bin1[:5])

Data:
 [[0.70833333 1.         0.         0.48113208 0.24429224 1.
  1.         0.60305344 0.         0.37096774 1.         0.
  0.75      ]
 [0.79166667 1.         1.         0.62264151 0.3652968  0.
  1.         0.28244275 1.         0.24193548 0.5        1.
  0.        ]
 [0.79166667 1.         1.         0.24528302 0.23515982 0.
  1.         0.44274809 1.         0.41935484 0.5        0.66666667
  1.        ]
 [0.16666667 1.         0.66666667 0.33962264 0.28310502 0.
  0.         0.88549618 0.         0.56451613 1.         0.
  0.        ]
 [0.25       0.         0.33333333 0.33962264 0.17808219 0.
  1.         0.77099237 0.         0.22580645 0.         0.
  0.        ]]
One-hot-labels:
 [[1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]]
Binary labels:
 [0. 1. 1. 0. 0.]
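
The relation between the one-hot rows and the binary labels printed above can be illustrated with a small sketch. It assumes five classes (`num` values 0 to 4, as the five one-hot columns suggest) and uses `np.eye` as a stand-in for `to_categorical`, so it does not need Keras. The array `num` holds the first five label values implied by the output above.

```python
import numpy as np

# First five 'num' values, as implied by the one-hot rows printed above
num = np.array([0, 2, 1, 0, 0])

# One-hot code by picking rows of an identity matrix (5 classes assumed)
n_classes = 5
one_hot = np.eye(n_classes)[num]

# Binary label: 1.0 when num > 0, otherwise 0.0
binary = 1.0 * (num > 0)

# Recover the class index from each one-hot row
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(binary)
print(recovered)
```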


In [0]:
# Good luck!