# Flower data, basic ML model

In this notebook we will train our first machine learning model. The goal is to predict the labels of unknown flowers. 

### Loading libraries

In [1]:
import matplotlib.image as img # To load the images
import matplotlib.pyplot as plt # To plot the images

import copy  # to copy variables
import numpy as np # To do some calculations
import pandas as pd # To work with dataframes (easier matrices)
from sklearn.ensemble import RandomForestClassifier # The machine learning model 
from os import listdir # To get a list of files in a folder

## Calculating features

We want to summarize each image in a few variables (features), which we have to calculate ourselves. In class we saw the sum example, so let's calculate the sum for each color and see how to gather that data in a matrix-like structure (an ndarray in Python).
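As a minimal sketch of the sum feature, the snippet below computes one sum per color channel and stacks the results into an ndarray with one row per image. The helper name `channel_sums` and the two tiny synthetic images are assumptions made for illustration; real flower photos would be loaded from disk instead.

```python
import numpy as np

def channel_sums(image):
    """Return the sums of the red, green and blue channels of an H x W x 3 image."""
    image = np.asarray(image, dtype=float)
    # Sum over height (axis 0) and width (axis 1); the channel axis is kept.
    return image[..., :3].sum(axis=(0, 1))

# Two tiny synthetic "images" (2 x 2 pixels) stand in for real photos.
red_image = np.zeros((2, 2, 3))
red_image[..., 0] = 1.0
blue_image = np.zeros((2, 2, 3))
blue_image[..., 2] = 0.5

# One feature row per image: rows are images, columns are R, G, B sums.
rgb_sum_matrix = np.vstack([channel_sums(red_image), channel_sums(blue_image)])
print(rgb_sum_matrix)
```

Each row of `rgb_sum_matrix` describes one image with three numbers, which is the matrix-like layout used in the rest of the notebook.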

## Making the data matrix

The ML model we want to train expects a data matrix with one row per sample (here, one row per patient) and one column per feature.
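To make the expected input concrete: scikit-learn estimators such as `RandomForestClassifier` are fitted on a 2-D array of shape `(n_samples, n_features)` together with a 1-D vector of labels. The sketch below uses made-up numbers and made-up labels purely to show the shapes; it is not the data or the outcome used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data for illustration: 6 samples, 3 features each.
X = np.array([
    [180.0, 10.0, 0.0],
    [190.0, 5.0, 0.0],
    [200.0, 8.0, 0.0],
    [125.0, 300.0, 20.0],
    [130.0, 25.0, 22.0],
    [120.0, 1200.0, 15.0],
])
# Made-up labels: one entry per row of X.
y = np.array([0, 0, 0, 1, 1, 1])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# predict() also takes a 2-D array and returns one label per row.
predictions = model.predict(X)
print(X.shape, y.shape, predictions.shape)
```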

In [4]:
patient_data = pd.read_csv("patient_rawsignals.txt",  sep = ";")
patient_data

| Unnamed: 0 | time | value | NICU_ID | field | measurement | hospital_ID | patient_ID | signal_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.630454e+09 | 171 | Bed 1 | signal_value | rawsignal | UZA | 257777778 | HF |
| 1 | 1.630454e+09 | 170 | Bed 1 | signal_value | rawsignal | UZA | 257777778 | HF |
| 2 | 1.630454e+09 | 170 | Bed 1 | signal_value | rawsignal | UZA | 257777778 | HF |
| 3 | 1.630454e+09 | 170 | Bed 1 | signal_value | rawsignal | UZA | 257777778 | HF |
| 4 | 1.630455e+09 | 169 | Bed 1 | signal_value | rawsignal | UZA | 257777778 | HF |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1988552 | 1.638763e+09 | 99 | Bed 6 | signal_value | rawsignal | UZA | 257777807 | Taxillair |
| 1988553 | 1.638778e+09 | 101 | Bed 6 | signal_value | rawsignal | UZA | 257777807 | Taxillair |
| 1988554 | 1.638792e+09 | 100 | Bed 6 | signal_value | rawsignal | UZA | 257777807 | Taxillair |
| 1988555 | 1.638806e+09 | 102 | Bed 6 | signal_value | rawsignal | UZA | 257777807 | Taxillair |


In [5]:
my_patients = np.unique(patient_data['patient_ID'])
my_patients 

array([257777778, 257777779, 257777781, 257777783, 257777785, 257777786,
       257777787, 257777788, 257777789, 257777790, 257777792, 257777793,
       257777795, 257777796, 257777798, 257777799, 257777800, 257777801,
       257777802, 257777803, 257777804, 257777805, 257777807])

In [37]:
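patient_subset = patient_data[patient_data['patient_ID'] == 257777793]

# Time window: the 48 hours (expressed in seconds) before this patient's last timestamp
patient_end_time = patient_subset['time'].max()
last_48h_start = patient_end_time - 2*24*60*60

# Keep only the HF signal; readings above 250 are treated as missing
HF = patient_subset.loc[patient_subset['signal_type'] == 'HF', ['time', 'value']].copy()
HF.loc[HF['value'] > 250, 'value'] = np.nan

# Split into low (< 100) and normal (> 100) readings;
# note that a value of exactly 100 falls in neither group
low_HF = HF[HF['value'] < 100]
normal_HF = HF[HF['value'] > 100]

# Restrict both groups to the last 48 hours (the final timestamp itself is excluded)
normal_48h = normal_HF[np.logical_and(normal_HF['time'] >= last_48h_start, normal_HF['time'] < patient_end_time)]
low_HF_48h = low_HF[np.logical_and(low_HF['time'] >= last_48h_start, low_HF['time'] < patient_end_time)]

# 3 features: mean of the normal readings, number of low readings, number of readings below 75
avg_HF_48h_last = np.mean(normal_48h['value'])
low_count_48h = low_HF_48h['value'].shape[0]
low_low_48h = low_HF_48h[low_HF_48h['value'] < 75].shape[0]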

In [38]:
low_low_48h

14
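The feature calculation for one patient can also be wrapped in a function, which makes it easier to check on a small example. The sketch below follows the same thresholds (250, 100, 75) and the same 48-hour window as the cell above; the function name `hf_features`, the `window_seconds` parameter, and the six-row toy table are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

def hf_features(patient_subset, window_seconds=2 * 24 * 60 * 60):
    """Return [mean of readings > 100, count of readings < 100, count of readings < 75]
    for the HF signal in the window before the patient's last timestamp."""
    end_time = patient_subset["time"].max()
    start_time = end_time - window_seconds

    hf = patient_subset.loc[patient_subset["signal_type"] == "HF", ["time", "value"]].copy()
    hf["value"] = hf["value"].astype(float)
    hf.loc[hf["value"] > 250, "value"] = np.nan  # readings above 250 become missing

    # Same window as in the notebook: start included, final timestamp excluded.
    recent = hf[(hf["time"] >= start_time) & (hf["time"] < end_time)]

    normal = recent[recent["value"] > 100]
    low = recent[recent["value"] < 100]
    return [normal["value"].mean(), len(low), int((low["value"] < 75).sum())]

# Toy data (made up): the 300 reading is masked, the last row lies outside the window.
toy = pd.DataFrame({
    "time": [0, 100_000, 150_000, 160_000, 170_000, 172_800],
    "value": [90, 150, 300, 95, 70, 160],
    "signal_type": ["HF"] * 6,
})
features = hf_features(toy)
print(features)
```

On the toy table, one reading is above 100 (150), three are below 100 (90, 95, 70), and one of those is below 75.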

In [40]:
Feature_matrix = np.empty((0, 3), float)  # empty ndarray with 3 columns, one per feature
for patient in my_patients:
    print(patient)

    patient_subset = patient_data[patient_data['patient_ID'] == patient]

    # Time window: the 48 hours (expressed in seconds) before this patient's last timestamp
    patient_end_time = patient_subset['time'].max()
    last_48h_start = patient_end_time - 2*24*60*60

    # Keep only the HF signal; readings above 250 are treated as missing
    HF = patient_subset.loc[patient_subset['signal_type'] == 'HF', ['time', 'value']].copy()
    HF.loc[HF['value'] > 250, 'value'] = np.nan

    # Low (< 100) and normal (> 100) readings; exactly 100 falls in neither group
    low_HF = HF[HF['value'] < 100]
    normal_HF = HF[HF['value'] > 100]

    # Restrict both groups to the last 48 hours (the final timestamp itself is excluded)
    normal_48h = normal_HF[np.logical_and(normal_HF['time'] >= last_48h_start, normal_HF['time'] < patient_end_time)]
    low_HF_48h = low_HF[np.logical_and(low_HF['time'] >= last_48h_start, low_HF['time'] < patient_end_time)]

    # 3 features: mean of the normal readings, number of low readings, number of readings below 75
    avg_HF_48h_last = np.mean(normal_48h['value'])
    low_count_48h = low_HF_48h['value'].shape[0]
    low_low_48h = low_HF_48h[low_HF_48h['value'] < 75].shape[0]

    # Append this patient's features as a new row of the matrix
    Features = [[avg_HF_48h_last, low_count_48h, low_low_48h]]
    Feature_matrix = np.append(Feature_matrix, Features, axis=0)

257777778
257777779
257777781
257777783
257777785
257777786
257777787
257777788
257777789
257777790
257777792
257777793
257777795
257777796
257777798
257777799
257777800
257777801
257777802
257777803
257777804
257777805
257777807
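A note on the loop pattern: `np.append` returns a new array on every call, so growing a matrix inside a loop copies it at each iteration. For 23 patients this is negligible, but a common alternative is to collect the rows in a plain list and convert once at the end. The sketch below shows that pattern with a hypothetical `toy_features` function and made-up IDs standing in for the real per-patient calculation.

```python
import numpy as np

# Hypothetical stand-in for the per-patient feature calculation.
def toy_features(patient_id):
    return [100.0 + patient_id, patient_id * 2, patient_id]

patients = [1, 2, 3]  # made-up IDs, for illustration only

# Collect one feature row per patient in a plain Python list.
rows = [toy_features(patient) for patient in patients]

# Convert to a 2-D ndarray once, after the loop.
feature_matrix = np.array(rows, dtype=float)
print(feature_matrix.shape)
```

The result has the same layout as `Feature_matrix`: one row per patient and one column per feature.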


In [41]:
Feature_matrix

array([[1.81208986e+02, 1.80000000e+01, 0.00000000e+00],
       [2.12480035e+02, 0.00000000e+00, 0.00000000e+00],
       [2.02322031e+02, 7.00000000e+00, 0.00000000e+00],
       [1.41696817e+02, 1.10000000e+01, 1.00000000e+01],
       [1.20120542e+02, 1.29600000e+03, 2.30000000e+01],
       [1.88123869e+02, 1.20000000e+01, 0.00000000e+00],
       [2.01581783e+02, 6.00000000e+00, 0.00000000e+00],
       [1.34390941e+02, 2.00000000e+01, 2.00000000e+01],
       [1.31931323e+02, 2.30000000e+01, 2.30000000e+01],
       [1.81366818e+02, 1.60000000e+01, 0.00000000e+00],
       [1.54189481e+02, 1.80000000e+01, 8.00000000e+00],
       [1.21811978e+02, 2.16700000e+03, 1.40000000e+01],
       [1.49232834e+02, 2.20000000e+01, 1.20000000e+01],
       [1.31742385e+02, 1.50000000e+01, 1.50000000e+01],
       [1.26327905e+02, 3.53000000e+02, 2.10000000e+01],
       [1.69169887e+02, 1.50000000e+01, 0.00000000e+00],
       [1.52186488e+02, 1.70000000e+01, 1.30000000e+01],
       [1.27829735e+02, 1.60000

## Building the outcome vector (flower type for each file)
