HDF_dataset_adds_on adds precomputed feature values to the per-record HDF5 files so that they do not have to be recomputed later. Each feature is built as a dataframe with the structure [rows: readable ECG leads, columns: feature values over the ECG ROIs]. h5py stores it as a plain 2-D array, so the row and column labels are not kept.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import h5py
import neurokit2 as nk

In [2]:
normal_ecg_age = pd.read_pickle('normal_ecg_age.pickle')
normal_ecg_age

Unnamed: 0,ECG_ID,Age,Age_class_0,Age_class_1,Age_class_2,Age_class_3
0,A00002,32,2,1,0,0
1,A00003,63,5,2,1,1
2,A00006,46,3,1,1,0
3,A00008,32,2,1,0,0
4,A00009,48,3,1,1,0
...,...,...,...,...,...,...
13900,A25755,44,3,1,1,0
13901,A25756,76,6,3,2,1
13902,A25757,55,4,2,1,1
13903,A25764,20,1,0,0,0


Create the h5py dataset of R peaks, f['ECG_R_Peaks'][...]. The procedure also creates pickle files, each containing ['ECG_ID', 'Derivation'] pairs:
 - ecg_signal_read_error: leads whose signal could not be read (OSError).
 - ecg_multiple_r_peaks_detection: leads with more R peaks detected than the reference (first) lead; one entry per removed peak. (A power outage interrupted this computation.)
 - ecg_r_peaks_missing: leads with fewer R peaks detected than the reference lead; one entry per missing peak. Missing positions are filled with -1 because NaN is a float and cannot be stored in the integer R-peak array.

The resulting dataframe has rows: leads (derivations), columns: ECG_R_Peaks fiducial points (sample indices).

In [21]:
ecg_signal_read_error = pd.DataFrame(columns=['ECG_ID', 'Derivation'])
ecg_multiple_r_peaks_detection = pd.DataFrame(columns=['ECG_ID', 'Derivation'])
ecg_r_peaks_missing = pd.DataFrame(columns=['ECG_ID', 'Derivation'])
for ecg_id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{ecg_id}.h5', 'r+') as f:
        n_peaks = None  # R-peak count of the first readable lead (reference)
        rows = []       # one list of R-peak sample indices per readable lead
        for der in range(12):
            try:
                ecg_sig = f['ecg'][der]
            except OSError:
                print(f'ECG signal: {ecg_id}, derivation: {der}. Could not be read')
                ecg_signal_read_error.loc[len(ecg_signal_read_error)] = [ecg_id, der]
                continue
            # Fix the lead polarity if NeuroKit2 detects an inverted signal
            ecg_fixed, is_inverted = nk.ecg_invert(ecg_sig, sampling_rate=500)
            if is_inverted:
                ecg_sig = ecg_fixed
            signals, _ = nk.ecg_process(ecg_sig, sampling_rate=500)
            roi_ref = list(signals[signals['ECG_R_Peaks'] == 1].index)
            if n_peaks is None:
                n_peaks = len(roi_ref)
            else:
                # Extra peaks: drop the peak closest to its predecessor (shortest RR interval)
                while len(roi_ref) > n_peaks:
                    roi_ref.pop(int(np.argmin(np.diff(roi_ref))) + 1)
                    ecg_multiple_r_peaks_detection.loc[len(ecg_multiple_r_peaks_detection)] = [ecg_id, der]
                # Missing peaks: pad with -1 (NaN is a float and cannot be stored as an integer)
                while len(roi_ref) < n_peaks:
                    roi_ref.append(-1)
                    ecg_r_peaks_missing.loc[len(ecg_r_peaks_missing)] = [ecg_id, der]
            rows.append(roi_ref)
        # h5py stores a plain integer array: lead names and column labels are not kept
        if 'ECG_R_Peaks' in f:
            del f['ECG_R_Peaks']
        f['ECG_R_Peaks'] = np.asarray(rows, dtype=np.int64)
ecg_signal_read_error.to_pickle('ecg_signal_read_error.pickle')
ecg_multiple_r_peaks_detection.to_pickle('ecg_multiple_r_peaks_detection.pickle')
ecg_r_peaks_missing.to_pickle('ecg_r_peaks_missing.pickle')
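
The peak-count alignment applied to every lead after the reference lead can be isolated as a small pure-Python function, which makes the rule easy to check on toy data. This is a sketch that mirrors the notebook's logic; the helper name `align_peaks` is hypothetical and the peak positions are made-up sample indices.

```python
def align_peaks(peaks, n_ref):
    """Force one lead's R-peak list to have n_ref entries (hypothetical helper).

    Extra peaks: repeatedly drop the peak closest to its predecessor
    (shortest RR interval). Missing peaks: pad the end with -1.
    Returns (aligned_peaks, n_removed, n_padded). Assumes n_ref >= 1.
    """
    if n_ref < 1:
        raise ValueError('n_ref must be at least 1')
    aligned = list(peaks)
    n_removed = 0
    while len(aligned) > n_ref:
        intervals = [b - a for a, b in zip(aligned, aligned[1:])]
        aligned.pop(intervals.index(min(intervals)) + 1)
        n_removed += 1
    n_padded = max(n_ref - len(aligned), 0)
    aligned.extend([-1] * n_padded)
    return aligned, n_removed, n_padded


# 520 is only 20 samples after 500, so it is treated as a double detection
print(align_peaks([100, 500, 520, 900], 3))  # ([100, 500, 900], 1, 0)
print(align_peaks([100, 500], 4))            # ([100, 500, -1, -1], 0, 2)
```

The rule only equalizes the number of columns: padding appends -1 at the end, so padded positions do not line up in time with the reference lead's peaks.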

Because a power outage interrupted the run, the pickle files were not created correctly. The following procedure rebuilds the pickle files for ECG read errors and missing R peaks from the data already stored in the HDF5 files. The multiple-R-peak statistics can only be collected while the R peaks are being detected, and rerunning that process is tedious (more than 24 hours).

In [None]:
ecg_signal_read_error = pd.DataFrame(columns=['ECG_ID', 'Derivation'])
ecg_r_peaks_missing = pd.DataFrame(columns=['ECG_ID', 'Derivation'])
for ecg_id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{ecg_id}.h5', 'r') as f:
        # Leads whose signal can be read; ECG_R_Peaks has one row per readable lead, in order
        readable_leads = []
        for der in range(12):
            try:
                f['ecg'][der]
            except OSError:
                print(f'ECG signal: {ecg_id}, derivation: {der}. Could not be read')
                ecg_signal_read_error.loc[len(ecg_signal_read_error)] = [ecg_id, der]
                continue
            readable_leads.append(der)
        r_peaks = f['ECG_R_Peaks'][...]
        # One entry per -1 (padded, i.e. missing) R peak, labelled with the real lead number
        for der, peaks_list in zip(readable_leads, r_peaks):
            for _ in range(int(np.sum(peaks_list == -1))):
                ecg_r_peaks_missing.loc[len(ecg_r_peaks_missing)] = [ecg_id, der]
ecg_signal_read_error.to_pickle('ecg_signal_read_error.pickle')
ecg_r_peaks_missing.to_pickle('ecg_r_peaks_missing.pickle')
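
Rows of ECG_R_Peaks exist only for leads that could be read, so a row index is not necessarily a lead number. A minimal sketch of mapping the -1 padding back to lead numbers, producing one (ECG_ID, Derivation) pair per padded value; `missing_peak_records` is a hypothetical helper, it takes plain lists instead of HDF5 data, and the record ID and peak values are toy data.

```python
def missing_peak_records(ecg_id, readable_leads, r_peak_rows):
    """Return one (ecg_id, lead) pair per -1 entry in a stored R-peak table.

    readable_leads: lead numbers (0-11) that could be read, in order.
    r_peak_rows: stored rows as lists, one per readable lead, padded with -1.
    """
    records = []
    for lead, row in zip(readable_leads, r_peak_rows):
        records.extend([(ecg_id, lead)] * row.count(-1))
    return records


# Toy record where lead 2 was unreadable, so the third row belongs to lead 3
rows = [[120, 620, 1110], [118, 619, -1], [121, -1, -1]]
print(missing_peak_records('A00000', [0, 1, 3], rows))
# [('A00000', 1), ('A00000', 3), ('A00000', 3)]
```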

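The following procedure stores, for each record, the fractal dimension computed in a window around every detected R peak (ROI = [ECG_R_Peaks - 150 : ECG_R_Peaks + 150], i.e. 150 samples or 0.3 s on each side at 500 Hz).
The NeuroKit2 fractal functions to apply are listed in fractal_function_list; each result is saved as Fractal_Dimension/<function name>.
Dataframe layout: [rows: leads, columns: fractal dimension value for each ECG ROI], followed by two summary columns: the per-lead mean (second-to-last column) and standard deviation (last column).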
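
A pure-Python sketch of the two bookkeeping rules in this step: the ROI bounds around a peak and the two summary columns appended to each lead's row. The helper names are hypothetical and the 5000-sample record length in the usage lines is a toy value. The summary assumes that np.mean/np.std on a pandas DataFrame, as used in the cell, follow pandas' defaults (NaN skipped) with ddof=0.

```python
import math
import statistics


def roi_bounds(r_peak, n_samples, half_width=150):
    """Slice bounds [start, stop) of the ROI around one R peak.

    half_width=150 samples is 0.3 s on each side at 500 Hz. The start is
    clipped to 0 because a negative start in a Python/NumPy slice counts
    from the end of the array instead of from the beginning.
    """
    return max(r_peak - half_width, 0), min(r_peak + half_width, n_samples)


def add_mean_std(row):
    """Append the NaN-ignoring mean and population std (ddof=0) of one lead's
    ROI values, so the mean ends up in column -2 and the std in column -1."""
    finite = [v for v in row if not math.isnan(v)]
    if not finite:
        return row + [math.nan, math.nan]
    return row + [statistics.fmean(finite), statistics.pstdev(finite)]


print(roi_bounds(100, 5000))               # (0, 250)
print(roi_bounds(2500, 5000))              # (2350, 2650)
print(add_mean_std([1.0, 3.0, math.nan]))  # [1.0, 3.0, nan, 2.0, 1.0]
```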

In [51]:
count_wrong_read_signal = 0
count_missing_peaks = 0
fractal_function_list = [nk.fractal_higuchi, nk.fractal_hurst, nk.fractal_dfa]
for fractal_function in fractal_function_list:
    name = fractal_function.__name__  # e.g. 'fractal_higuchi'
    for ecg_id in normal_ecg_age['ECG_ID']:
        with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{ecg_id}.h5', 'r+') as f:
            r_peaks = f['ECG_R_Peaks'][...]
            col = r_peaks.shape[1]
            # ECG_R_Peaks has rows only for readable leads, so pair its rows with those leads
            readable = {}
            for der in range(12):
                try:
                    readable[der] = f['ecg'][der]
                except OSError:
                    count_wrong_read_signal += 1
            rows = []
            for signal, r in zip(readable.values(), r_peaks):
                values = [np.nan] * col
                for c in range(col):
                    if r[c] != -1:
                        # ROI of 150 samples on each side of the R peak, clipped at the record start
                        values[c], _ = fractal_function(signal[max(r[c] - 150, 0):r[c] + 150])
                    else:
                        # Padded (missing) peak: repeat the value of the previous ROI
                        count_missing_peaks += 1
                        values[c] = values[c - 1]
                rows.append(values)
            table = np.asarray(rows, dtype=float)
            # Append per-lead mean (column -2) and population std (column -1), ignoring NaN
            table = np.column_stack([table, np.nanmean(table, axis=1), np.nanstd(table, axis=1)])
            if f'Fractal_Dimension/{name}' in f:
                del f[f'Fractal_Dimension/{name}']
            f[f'Fractal_Dimension/{name}'] = table
    print(name, '- unreadable leads so far:', count_wrong_read_signal)
    print(name, '- missing R peaks so far:', count_missing_peaks)

Find, for each record, the lead (derivation) with the smallest standard deviation of the Katz fractal dimension (last column of the stored table).
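
The Katz_fractal dataset read here is not produced by the fractal cell above (whose list contains Higuchi, Hurst and DFA). For reference, a minimal sketch of the textbook Katz (1988) formula, D = log10(n) / (log10(n) + log10(d / L)), where L is the total curve length, d the largest distance from the first sample and n the number of steps. It uses absolute amplitude differences as step lengths, which is one common variant; NeuroKit2's nk.fractal_katz may define distances differently, so values need not match exactly.

```python
import math


def katz_fd(signal):
    """Katz fractal dimension of a 1-D sequence (textbook variant).

    Step lengths are |x[i+1] - x[i]|; d is the maximum |x[i] - x[0]|.
    Assumes at least 2 samples and a non-constant signal.
    """
    steps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    total_length = sum(steps)                     # L
    n = len(steps)                                # number of steps
    d = max(abs(x - signal[0]) for x in signal)   # planar extent
    return math.log10(n) / (math.log10(n) + math.log10(d / total_length))


print(katz_fd([0, 1, 2, 3, 4]))  # 1.0 for a straight line
print(katz_fd([0, 2, 1, 3, 2]))  # about 2.0
```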

In [26]:
derivations_list = [] 
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        k = f['Katz_fractal'][:,-1]
        derivations_list.append(np.argmin(k))
np.save('derivations_list_Katz_std_minimun.npy', derivations_list)

In [27]:
unique, counts = np.unique(derivations_list, return_counts=True)
print(np.asarray((unique, counts)).T)

[[   0  569]
 [   1  876]
 [   2  303]
 [   3  711]
 [   4  118]
 [   5  518]
 [   6  802]
 [   7 2107]
 [   8 1572]
 [   9 2473]
 [  10 2420]
 [  11 1436]]


Leads that most often have the smallest standard deviation (indices into ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']): 9 (V4, 2473 records), 10 (V5, 2420 records) and 7 (V2, 2107 records).
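
The argmin-per-record step followed by np.unique counts amounts to a frequency ranking of the "most stable" lead. A pure-Python sketch of the same ranking with the lead names attached; `most_stable_leads` is a hypothetical helper and the input is toy data. Like the notebook, it assumes row i is lead i, which only holds for records where all 12 leads were readable; ties go to the first lead, as with np.argmin.

```python
from collections import Counter

LEADS = ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']


def most_stable_leads(std_per_record, top=3):
    """Rank leads by how often they have the smallest std within a record.

    std_per_record: one list of 12 per-lead standard deviations per record
    (the last column of a stored fractal table).
    Returns [(lead_name, n_records), ...] for the `top` most frequent leads.
    """
    winners = Counter(min(range(len(stds)), key=stds.__getitem__)
                      for stds in std_per_record)
    return [(LEADS[i], n) for i, n in winners.most_common(top)]


toy = [
    [0.3] * 9 + [0.1, 0.2, 0.3],   # smallest std in V4
    [0.3] * 9 + [0.1, 0.2, 0.3],   # smallest std in V4
    [0.3] * 10 + [0.05, 0.3],      # smallest std in V5
]
print(most_stable_leads(toy))  # [('V4', 2), ('V5', 1)]
```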

In [4]:
Katz_mean_457 = pd.DataFrame(columns=['ECG_ID', 'Katz_mean_V4', 'Katz_mean_V5', 'Katz_mean_V2'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        try:
            Katz_mean_457.loc[len(Katz_mean_457)] = [id, f['Katz_fractal'][9,-2], f['Katz_fractal'][10,-2], f['Katz_fractal'][7,-2]]
        except IndexError:
            continue
Katz_mean_457.to_pickle('Katz_mean_457.pickle')

In [5]:
Katz_mean_457

Unnamed: 0,ECG_ID,Katz_mean_V4,Katz_mean_V5,Katz_mean_V2
0,A00002,1.256836,1.205078,1.245117
1,A00003,1.423828,1.388672,1.340820
2,A00006,1.310547,1.313477,1.375000
3,A00008,1.284180,1.262695,1.316406
4,A00009,1.292969,1.264648,1.320312
...,...,...,...,...
13862,A25755,1.243164,1.220703,1.407227
13863,A25756,1.233398,1.227539,1.317383
13864,A25757,1.362305,1.320312,1.367188
13865,A25764,1.291992,1.266602,1.439453


Build the Katz_mean pickle file, which holds the mean Katz fractal dimension of each of the 12 leads (second-to-last column of the stored table).

In [33]:
Katz_mean = pd.DataFrame(columns=['ECG_ID','I_mean', 'II_mean', 'III_mean', 'aVR_mean', 'aVL_mean', 'aVF_mean', 'V1_mean', 'V2_mean', 'V3_mean', 'V4_mean', 'V5_mean', 'V6_mean'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        lead_means = f['Katz_fractal'][:, -2]  # column -2 holds each lead's mean over the ROIs
        if len(lead_means) < 12:  # skip records with unreadable leads (fewer than 12 rows)
            continue
        Katz_mean.loc[len(Katz_mean)] = [id, *lead_means]
Katz_mean.to_pickle('Katz_mean.pickle')

In [34]:
Katz_mean

Unnamed: 0,ECG_ID,I_mean,II_mean,III_mean,aVR_mean,aVL_mean,aVF_mean,V1_mean,V2_mean,V3_mean,V4_mean,V5_mean,V6_mean
0,A00002,1.248047,1.194336,1.248047,1.210938,1.367188,1.192383,1.194336,1.245117,1.377930,1.256836,1.205078,1.176758
1,A00003,1.921875,1.493164,1.545898,1.659180,1.859375,1.464844,1.563477,1.340820,1.361328,1.423828,1.388672,1.398438
2,A00006,1.668945,1.635742,1.791992,1.575195,1.916992,1.703125,1.440430,1.375000,1.408203,1.310547,1.313477,1.321289
3,A00008,1.495117,1.263672,1.261719,1.290039,1.298828,1.256836,1.287109,1.316406,1.347656,1.284180,1.262695,1.236328
4,A00009,1.277344,1.322266,1.553711,1.292969,1.284180,1.416992,1.222656,1.320312,1.398438,1.292969,1.264648,1.253906
...,...,...,...,...,...,...,...,...,...,...,...,...,...
13862,A25755,1.231445,1.244141,1.421875,1.225586,1.266602,1.364258,1.343750,1.407227,1.283203,1.243164,1.220703,1.213867
13863,A25756,1.351562,1.333008,1.667969,1.324219,1.491211,1.391602,1.355469,1.317383,1.256836,1.233398,1.227539,1.234375
13864,A25757,1.501953,1.314453,1.443359,1.357422,1.698242,1.337891,1.388672,1.367188,1.367188,1.362305,1.320312,1.290039
13865,A25764,1.392578,1.285156,1.279297,1.320312,1.367188,1.271484,1.397461,1.439453,1.411133,1.291992,1.266602,1.251953


Find, for each record, the lead with the smallest standard deviation of the line length fractal dimension (last column of the stored table).

In [43]:
derivations_list_line_length = [] 
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        ll = f['Line_length_fractal'][:,-1]
        derivations_list_line_length.append(np.argmin(ll))
np.save('derivations_list_line_length_std_minimun.npy', derivations_list_line_length)

In [44]:
unique, counts = np.unique(derivations_list_line_length, return_counts=True)
print(np.asarray((unique, counts)).T)

[[   0 2741]
 [   1 2813]
 [   2 1168]
 [   3 3153]
 [   4 1324]
 [   5  640]
 [   6  726]
 [   7  387]
 [   8  308]
 [   9  100]
 [  10  169]
 [  11  376]]


Leads that most often have the smallest standard deviation (indices into ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']): 3 (aVR, 3153 records), 1 (II, 2813 records) and 0 (I, 2741 records).

In [47]:
Line_length_mean_310 = pd.DataFrame(columns=['ECG_ID', 'Ll_mean_aVR', 'Ll_mean_II', 'Ll_mean_I'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        try:
            Line_length_mean_310.loc[len(Line_length_mean_310)] = [id, f['Line_length_fractal'][3,-2], f['Line_length_fractal'][1,-2], f['Line_length_fractal'][0,-2]]
        except IndexError:
            continue
Line_length_mean_310.to_pickle('Line_length_mean_310.pickle')

In [48]:
Line_length_mean_310

Unnamed: 0,ECG_ID,Ll_mean_aVR,Ll_mean_II,Ll_mean_I
0,A00002,0.010239,0.010925,0.010254
1,A00003,0.014694,0.016937,0.016769
2,A00006,0.013832,0.023315,0.009232
3,A00008,0.008522,0.013283,0.005756
4,A00009,0.009232,0.009369,0.009468
...,...,...,...,...
13900,A25755,0.007534,0.006481,0.009285
13901,A25756,0.009590,0.011017,0.009323
13902,A25757,0.009445,0.011536,0.009064
13903,A25764,0.013992,0.018799,0.013412


Build the Line_length_mean pickle file, which holds the mean line length fractal dimension of each of the 12 leads (second-to-last column of the stored table).

In [45]:
Line_length_mean = pd.DataFrame(columns=['ECG_ID','I_mean', 'II_mean', 'III_mean', 'aVR_mean', 'aVL_mean', 'aVF_mean', 'V1_mean', 'V2_mean', 'V3_mean', 'V4_mean', 'V5_mean', 'V6_mean'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        lead_means = f['Line_length_fractal'][:, -2]  # column -2 holds each lead's mean over the ROIs
        if len(lead_means) < 12:  # skip records with unreadable leads (fewer than 12 rows)
            continue
        Line_length_mean.loc[len(Line_length_mean)] = [id, *lead_means]
Line_length_mean.to_pickle('Line_length_mean.pickle')

In [46]:
Line_length_mean

Unnamed: 0,ECG_ID,I_mean,II_mean,III_mean,aVR_mean,aVL_mean,aVF_mean,V1_mean,V2_mean,V3_mean,V4_mean,V5_mean,V6_mean
0,A00002,0.010254,0.010925,0.004246,0.010239,0.005791,0.006756,0.014870,0.029922,0.019150,0.018478,0.017761,0.011131
1,A00003,0.016769,0.016937,0.017288,0.014694,0.014565,0.014847,0.011406,0.024414,0.025253,0.024124,0.020096,0.016937
2,A00006,0.009232,0.023315,0.021667,0.013832,0.012016,0.022064,0.010056,0.014465,0.016968,0.019180,0.017471,0.015480
3,A00008,0.005756,0.013283,0.011299,0.008522,0.005634,0.012047,0.012917,0.027374,0.024994,0.023941,0.017242,0.012405
4,A00009,0.009468,0.009369,0.003212,0.009232,0.005520,0.005161,0.008789,0.011444,0.014862,0.016113,0.014488,0.011269
...,...,...,...,...,...,...,...,...,...,...,...,...,...
13862,A25755,0.009285,0.006481,0.005634,0.007534,0.006992,0.003202,0.006989,0.015434,0.018311,0.016098,0.012878,0.010254
13863,A25756,0.009323,0.011017,0.006641,0.009590,0.005753,0.008018,0.006855,0.007820,0.011147,0.013496,0.012627,0.010399
13864,A25757,0.009064,0.011536,0.009926,0.009445,0.006676,0.009850,0.013321,0.021484,0.018372,0.017136,0.018127,0.013878
13865,A25764,0.013412,0.018799,0.015404,0.013992,0.010849,0.016113,0.008598,0.017548,0.019196,0.022369,0.020538,0.017288


In [3]:
Katz_mean = pd.read_pickle('Katz_mean.pickle')
Line_length_mean = pd.read_pickle('Line_length_mean.pickle')

In [11]:
Katz_Line_length_mean = pd.merge(Katz_mean, Line_length_mean, on='ECG_ID')

In [12]:
Katz_Line_length_mean = pd.merge(Katz_Line_length_mean, normal_ecg_age, on='ECG_ID')

In [13]:
Katz_Line_length_mean = Katz_Line_length_mean.drop('ECG_ID', axis=1)
Katz_Line_length_mean

Unnamed: 0,I_mean_x,II_mean_x,III_mean_x,aVR_mean_x,aVL_mean_x,aVF_mean_x,V1_mean_x,V2_mean_x,V3_mean_x,V4_mean_x,...,V2_mean_y,V3_mean_y,V4_mean_y,V5_mean_y,V6_mean_y,Age,Age_class_0,Age_class_1,Age_class_2,Age_class_3
0,1.248047,1.194336,1.248047,1.210938,1.367188,1.192383,1.194336,1.245117,1.377930,1.256836,...,0.029922,0.019150,0.018478,0.017761,0.011131,32,2,1,0,0
1,1.921875,1.493164,1.545898,1.659180,1.859375,1.464844,1.563477,1.340820,1.361328,1.423828,...,0.024414,0.025253,0.024124,0.020096,0.016937,63,5,2,1,1
2,1.668945,1.635742,1.791992,1.575195,1.916992,1.703125,1.440430,1.375000,1.408203,1.310547,...,0.014465,0.016968,0.019180,0.017471,0.015480,46,3,1,1,0
3,1.495117,1.263672,1.261719,1.290039,1.298828,1.256836,1.287109,1.316406,1.347656,1.284180,...,0.027374,0.024994,0.023941,0.017242,0.012405,32,2,1,0,0
4,1.277344,1.322266,1.553711,1.292969,1.284180,1.416992,1.222656,1.320312,1.398438,1.292969,...,0.011444,0.014862,0.016113,0.014488,0.011269,48,3,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13862,1.231445,1.244141,1.421875,1.225586,1.266602,1.364258,1.343750,1.407227,1.283203,1.243164,...,0.015434,0.018311,0.016098,0.012878,0.010254,44,3,1,1,0
13863,1.351562,1.333008,1.667969,1.324219,1.491211,1.391602,1.355469,1.317383,1.256836,1.233398,...,0.007820,0.011147,0.013496,0.012627,0.010399,76,6,3,2,1
13864,1.501953,1.314453,1.443359,1.357422,1.698242,1.337891,1.388672,1.367188,1.367188,1.362305,...,0.021484,0.018372,0.017136,0.018127,0.013878,55,4,2,1,1
13865,1.392578,1.285156,1.279297,1.320312,1.367188,1.271484,1.397461,1.439453,1.411133,1.291992,...,0.017548,0.019196,0.022369,0.020538,0.017288,20,1,0,0,0


In [28]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(Katz_Line_length_mean.iloc[:,:24], Katz_Line_length_mean['Age_class_1'], random_state=0)

In [39]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=25)
knn.fit(X_train, y_train)
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))

Test set score: 0.52


In [40]:
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
print("Accuracy on training set: {:.2f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set: {:.2f}".format(tree.score(X_test, y_test)))

Accuracy on training set: 1.00
Accuracy on test set: 0.44
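
A test accuracy is easier to judge next to the accuracy of always predicting the most frequent training class. A minimal pure-Python sketch of that baseline; `majority_baseline` is a hypothetical helper, and with the notebook's variables it would be called as majority_baseline(list(y_train), list(y_test)). sklearn.dummy.DummyClassifier(strategy='most_frequent') computes the same baseline inside scikit-learn.

```python
from collections import Counter


def majority_baseline(y_train, y_test):
    """Accuracy on y_test of always predicting the most common y_train label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(label == majority for label in y_test) / len(y_test)


print(majority_baseline([1, 1, 2, 0, 1], [1, 2, 1, 3]))  # 0.5
```

The decision tree's 1.00 training accuracy against 0.44 test accuracy is the usual sign of an unpruned tree memorizing the training set; limiting max_depth is the standard first remedy.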


Find, for each record, the lead with the smallest standard deviation of the Petrosian fractal dimension (last column of the stored table).
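
For reference, a minimal sketch of the textbook Petrosian formula, D = log10(N) / (log10(N) + log10(N / (N + 0.4 * N_delta))), where N is the number of samples and N_delta the number of sign changes in the first difference. NeuroKit2's nk.fractal_petrosian supports several ways of binarizing the signal, and the sign-change variant used here may not be its default, so values need not match the stored ones.

```python
import math


def petrosian_fd(signal):
    """Petrosian fractal dimension (sign changes of the first difference)."""
    n = len(signal)
    diff = [b - a for a, b in zip(signal, signal[1:])]
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * n_delta)))


print(petrosian_fd([0, 1, 2, 3, 4, 5]))  # 1.0 (monotonic: no sign changes)
print(petrosian_fd([0, 1, 0, 1, 0, 1]))  # about 1.152
```

Values close to 1, as in the Petrosian_mean table, indicate few sign changes relative to the ROI length.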

In [46]:
derivations_list_Petrosian = [] 
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        p = f['Petrosian_fractal'][:,-1]
        derivations_list_Petrosian.append(np.argmin(p))
np.save('derivations_list_Petrosian_std_minimun.npy', derivations_list_Petrosian)

In [47]:
unique, counts = np.unique(derivations_list_Petrosian, return_counts=True)
print(np.asarray((unique, counts)).T)

[[   0  569]
 [   1  955]
 [   2  924]
 [   3  664]
 [   4  652]
 [   5  871]
 [   6  637]
 [   7 2004]
 [   8 2174]
 [   9 2111]
 [  10 1460]
 [  11  884]]


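Leads that most often have the smallest standard deviation (indices into ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']): 8 (V3, 2174 records), 9 (V4, 2111 records) and 7 (V2, 2004 records), followed by 10 (V5, 1460 records). The next cell stores V3, V4 and V5.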

In [48]:
Petrosian_mean_8910 = pd.DataFrame(columns=['ECG_ID', 'Petrosian_mean_V3', 'Petrosian_mean_V4', 'Petrosian_mean_V5'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        try:
            Petrosian_mean_8910.loc[len(Petrosian_mean_8910)] = [id, f['Petrosian_fractal'][8,-2], f['Petrosian_fractal'][9,-2], f['Petrosian_fractal'][10,-2]]
        except IndexError:
            continue
Petrosian_mean_8910.to_pickle('Petrosian_mean_8910.pickle')

Build the Petrosian_mean pickle file, which holds the mean Petrosian fractal dimension of each of the 12 leads (second-to-last column of the stored table).

In [49]:
Petrosian_mean = pd.DataFrame(columns=['ECG_ID','I_mean', 'II_mean', 'III_mean', 'aVR_mean', 'aVL_mean', 'aVF_mean', 'V1_mean', 'V2_mean', 'V3_mean', 'V4_mean', 'V5_mean', 'V6_mean'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        lead_means = f['Petrosian_fractal'][:, -2]  # column -2 holds each lead's mean over the ROIs
        if len(lead_means) < 12:  # skip records with unreadable leads (fewer than 12 rows)
            continue
        Petrosian_mean.loc[len(Petrosian_mean)] = [id, *lead_means]
Petrosian_mean.to_pickle('Petrosian_mean.pickle')

In [50]:
Petrosian_mean

Unnamed: 0,ECG_ID,I_mean,II_mean,III_mean,aVR_mean,aVL_mean,aVF_mean,V1_mean,V2_mean,V3_mean,V4_mean,V5_mean,V6_mean
0,A00002,1.007326,1.009275,1.008594,1.008416,1.006542,1.011021,1.006702,1.004257,1.003679,1.004223,1.004620,1.007488
1,A00003,1.030798,1.029596,1.029637,1.031062,1.030363,1.028488,1.028260,1.021499,1.020602,1.023961,1.025274,1.025216
2,A00006,1.034390,1.034431,1.034681,1.034035,1.035199,1.034495,1.033017,1.029001,1.028135,1.028961,1.029256,1.030291
3,A00008,1.009951,1.010471,1.011870,1.010562,1.012878,1.011194,1.008181,1.006584,1.006790,1.007726,1.007680,1.008432
4,A00009,1.009022,1.009048,1.011518,1.008859,1.010589,1.009477,1.008282,1.006249,1.006124,1.006495,1.006889,1.007268
...,...,...,...,...,...,...,...,...,...,...,...,...,...
13862,A25755,1.011745,1.011046,1.014578,1.010887,1.013144,1.012580,1.010169,1.007018,1.009613,1.009902,1.010215,1.009665
13863,A25756,1.015982,1.016773,1.015599,1.016335,1.015923,1.016054,1.012233,1.013155,1.013750,1.013018,1.012713,1.013334
13864,A25757,1.011780,1.009568,1.014153,1.011081,1.013707,1.012540,1.009997,1.009137,1.010586,1.010923,1.009953,1.010338
13865,A25764,1.010270,1.012229,1.013551,1.011307,1.012790,1.012610,1.012587,1.008182,1.008797,1.008888,1.009568,1.009818


Find, for each record, the lead with the smallest standard deviation of the Sevcik fractal dimension (last column of the stored table).
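
For reference, a minimal sketch of the textbook Sevcik formula: the curve is mapped to the unit square (time to [0, 1], amplitude min-max normalized), its length L is measured, and D = 1 + ln(L) / ln(2(N - 1)). NeuroKit2's nk.fractal_sevcik follows this idea, but its exact normalization is an assumption here, so values need not match the stored ones.

```python
import math


def sevcik_fd(signal):
    """Sevcik fractal dimension of a 1-D sequence (textbook formula).

    Assumes at least 2 samples and a non-constant signal.
    """
    n = len(signal)
    lo, hi = min(signal), max(signal)
    y = [(v - lo) / (hi - lo) for v in signal]  # amplitude mapped to [0, 1]
    dx = 1 / (n - 1)                            # time mapped to [0, 1]
    length = sum(math.hypot(b - a, dx) for a, b in zip(y, y[1:]))
    return 1 + math.log(length) / math.log(2 * (n - 1))


print(sevcik_fd([0, 1]))  # about 1.5
```

For a straight line the value approaches 1 only as the number of samples grows, which is a known property of this estimator.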

In [4]:
derivations_list_Sevcik = [] 
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        s = f['Sevcik_fractal'][:,-1]
        derivations_list_Sevcik.append(np.argmin(s))
np.save('derivations_list_Sevcik_std_minimun.npy', derivations_list_Sevcik)

In [5]:
unique, counts = np.unique(derivations_list_Sevcik, return_counts=True)
print(np.asarray((unique, counts)).T)

[[   0  399]
 [   1  420]
 [   2  152]
 [   3  365]
 [   4   81]
 [   5  217]
 [   6  627]
 [   7 2155]
 [   8 2677]
 [   9 3355]
 [  10 2477]
 [  11  980]]


Leads that most often have the smallest standard deviation (indices into ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']): 9 (V4, 3355 records), 8 (V3, 2677 records) and 10 (V5, 2477 records).

In [3]:
Sevcik_mean_9810 = pd.DataFrame(columns=['ECG_ID', 'Sevcik_mean_V4', 'Sevcik_mean_V3', 'Sevcik_mean_V5'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        try:
            Sevcik_mean_9810.loc[len(Sevcik_mean_9810)] = [id, f['Sevcik_fractal'][9,-2], f['Sevcik_fractal'][8,-2], f['Sevcik_fractal'][10,-2]]
        except IndexError:
            continue
Sevcik_mean_9810.to_pickle('Sevcik_mean_9810.pickle')

Build the Sevcik_mean pickle file, which holds the mean Sevcik fractal dimension of each of the 12 leads (second-to-last column of the stored table).

In [4]:
Sevcik_mean = pd.DataFrame(columns=['ECG_ID','I_mean', 'II_mean', 'III_mean', 'aVR_mean', 'aVL_mean', 'aVF_mean', 'V1_mean', 'V2_mean', 'V3_mean', 'V4_mean', 'V5_mean', 'V6_mean'])
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        lead_means = f['Sevcik_fractal'][:, -2]  # column -2 holds each lead's mean over the ROIs
        if len(lead_means) < 12:  # skip records with unreadable leads (fewer than 12 rows)
            continue
        Sevcik_mean.loc[len(Sevcik_mean)] = [id, *lead_means]
Sevcik_mean.to_pickle('Sevcik_mean.pickle')

In [25]:
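# Move every feature dataset (all members except 'ecg' and 'ECG_R_Peaks') into the Fractal_Dimension group
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        f.require_group('Fractal_Dimension')
        # Snapshot the keys first: moving members modifies the group during iteration
        for member in list(f.keys()):
            if member not in ('Fractal_Dimension', 'ecg', 'ECG_R_Peaks'):
                f.move(member, f'Fractal_Dimension/{member}')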

In [31]:
# Move the members of A00002 back from Fractal_Dimension to the file root
with h5py.File('E:/1-DENIS/Biomarkers/SPH dataset/records/A00002.h5', 'r+') as f:
    for member in list(f['Fractal_Dimension'].keys()):
        f.move(f'Fractal_Dimension/{member}', member)


In [50]:
with h5py.File('E:/1-DENIS/Biomarkers/SPH dataset/records/A00002.h5', 'r+') as f:
    del f['Fractal_Dimension/fractal_higuchi']

In [47]:
for id in normal_ecg_age['ECG_ID']:
    with h5py.File(f'E:/1-DENIS/Biomarkers/SPH dataset/records/{id}.h5', 'r+') as f:
        # Rename NLD_fractal to fractal_nld, following the fractal_<name> convention of the other datasets
        f.move('Fractal_Dimension/NLD_fractal', 'Fractal_Dimension/fractal_nld')

fractal_psdslope - error
fractal_higuchi
fractal_density - work in progress
fractal_hurst
fractal_correlation
fractal_dfa
fractal_tmf