@author : Madiha Qureshi

# **An Example of Using the DGCCA Package - DeepGeneralizedCCA**
An example of applying the DeepGeneralizedCCA package to the 'Fetal Health Classification' dataset (Source: https://www.kaggle.com/andrewmvd/fetal-health-classification).

---


**Dataset Description** - Cardiotocograms (CTGs) are a simple, low-cost way to assess fetal health, allowing doctors to take action to prevent child and maternal mortality. This dataset includes 2126 records of features extracted from cardiotocogram exams, each classified into one of 3 classes: Normal, Suspect, Pathological.

---

**Contents** 


1.   Data Understanding
2.   Data Preprocessing
3.   Applying DGCCA

---

**DGCCA Source Code** : https://github.com/shekhar-sharma/DataScience/blob/main/Groups/Group_ID_3/DeepGeneralizedCCA/dgcca.py


---


## 1. Data Understanding

In [77]:
from google.colab import files
src = list(files.upload().values())[0]

Saving dgcca.py to dgcca.py


In [78]:
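#Importing required libraries
import io
import numpy as np
import pandas as pd
import torch  #needed for torch.from_numpy and torch.cat below
from sklearn.model_selection import train_test_split
#DGCCA and DGCCA_architecture are the names used later in this notebook
from dgcca import DGCCA, DGCCA_architecture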

In [5]:
#Uploading fetal_health.csv to Google Colab
from google.colab import files
uploaded = files.upload()

Saving fetal_health.csv to fetal_health.csv


In [79]:
#Reading Data
data = pd.read_csv(io.BytesIO(uploaded['fetal_health.csv']))
data.head()

Unnamed: 0,baseline value,accelerations,fetal_movement,uterine_contractions,light_decelerations,severe_decelerations,prolongued_decelerations,abnormal_short_term_variability,mean_value_of_short_term_variability,percentage_of_time_with_abnormal_long_term_variability,mean_value_of_long_term_variability,histogram_width,histogram_min,histogram_max,histogram_number_of_peaks,histogram_number_of_zeroes,histogram_mode,histogram_mean,histogram_median,histogram_variance,histogram_tendency,fetal_health
0,120.0,0.0,0.0,0.0,0.0,0.0,0.0,73.0,0.5,43.0,2.4,64.0,62.0,126.0,2.0,0.0,120.0,137.0,121.0,73.0,1.0,2.0
1,132.0,0.006,0.0,0.006,0.003,0.0,0.0,17.0,2.1,0.0,10.4,130.0,68.0,198.0,6.0,1.0,141.0,136.0,140.0,12.0,0.0,1.0
2,133.0,0.003,0.0,0.008,0.003,0.0,0.0,16.0,2.1,0.0,13.4,130.0,68.0,198.0,5.0,1.0,141.0,135.0,138.0,13.0,0.0,1.0
3,134.0,0.003,0.0,0.008,0.003,0.0,0.0,16.0,2.4,0.0,23.0,117.0,53.0,170.0,11.0,0.0,137.0,134.0,137.0,13.0,1.0,1.0
4,132.0,0.007,0.0,0.008,0.0,0.0,0.0,16.0,2.4,0.0,19.9,117.0,53.0,170.0,9.0,0.0,137.0,136.0,138.0,11.0,1.0,1.0


In [80]:
data.describe()

Unnamed: 0,baseline value,accelerations,fetal_movement,uterine_contractions,light_decelerations,severe_decelerations,prolongued_decelerations,abnormal_short_term_variability,mean_value_of_short_term_variability,percentage_of_time_with_abnormal_long_term_variability,mean_value_of_long_term_variability,histogram_width,histogram_min,histogram_max,histogram_number_of_peaks,histogram_number_of_zeroes,histogram_mode,histogram_mean,histogram_median,histogram_variance,histogram_tendency,fetal_health
count,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0
mean,133.303857,0.003178,0.009481,0.004366,0.001889,3e-06,0.000159,46.990122,1.332785,9.84666,8.187629,70.445908,93.579492,164.0254,4.068203,0.323612,137.452023,134.610536,138.09031,18.80809,0.32032,1.304327
std,9.840844,0.003866,0.046666,0.002946,0.00296,5.7e-05,0.00059,17.192814,0.883241,18.39688,5.628247,38.955693,29.560212,17.944183,2.949386,0.706059,16.381289,15.593596,14.466589,28.977636,0.610829,0.614377
min,106.0,0.0,0.0,0.0,0.0,0.0,0.0,12.0,0.2,0.0,0.0,3.0,50.0,122.0,0.0,0.0,60.0,73.0,77.0,0.0,-1.0,1.0
25%,126.0,0.0,0.0,0.002,0.0,0.0,0.0,32.0,0.7,0.0,4.6,37.0,67.0,152.0,2.0,0.0,129.0,125.0,129.0,2.0,0.0,1.0
50%,133.0,0.002,0.0,0.004,0.0,0.0,0.0,49.0,1.2,0.0,7.4,67.5,93.0,162.0,3.0,0.0,139.0,136.0,139.0,7.0,0.0,1.0
75%,140.0,0.006,0.003,0.007,0.003,0.0,0.0,61.0,1.7,11.0,10.8,100.0,120.0,174.0,6.0,0.0,148.0,145.0,148.0,24.0,1.0,1.0
max,160.0,0.019,0.481,0.015,0.015,0.001,0.005,87.0,7.0,91.0,50.7,180.0,159.0,238.0,18.0,10.0,187.0,182.0,186.0,269.0,1.0,3.0


In [81]:
data.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2126 entries, 0 to 2125
Data columns (total 22 columns):
 #   Column                                                  Non-Null Count  Dtype  
---  ------                                                  --------------  -----  
 0   baseline value                                          2126 non-null   float64
 1   accelerations                                           2126 non-null   float64
 2   fetal_movement                                          2126 non-null   float64
 3   uterine_contractions                                    2126 non-null   float64
 4   light_decelerations                                     2126 non-null   float64
 5   severe_decelerations                                    2126 non-null   float64
 6   prolongued_decelerations                                2126 non-null   float64
 7   abnormal_short_term_variability                         2126 non-null   float64
 8   mean_value_of_short_term_variability  

In [82]:
data.isnull().sum()
#The dataset has no missing values

baseline value                                            0
accelerations                                             0
fetal_movement                                            0
uterine_contractions                                      0
light_decelerations                                       0
severe_decelerations                                      0
prolongued_decelerations                                  0
abnormal_short_term_variability                           0
mean_value_of_short_term_variability                      0
percentage_of_time_with_abnormal_long_term_variability    0
mean_value_of_long_term_variability                       0
histogram_width                                           0
histogram_min                                             0
histogram_max                                             0
histogram_number_of_peaks                                 0
histogram_number_of_zeroes                                0
histogram_mode                          

In [83]:
# Counting the number of unique values in each column
for x in data.columns:
  print(x, data[x].nunique())

baseline value 48
accelerations 20
fetal_movement 102
uterine_contractions 16
light_decelerations 16
severe_decelerations 2
prolongued_decelerations 6
abnormal_short_term_variability 75
mean_value_of_short_term_variability 57
percentage_of_time_with_abnormal_long_term_variability 87
mean_value_of_long_term_variability 249
histogram_width 154
histogram_min 109
histogram_max 86
histogram_number_of_peaks 18
histogram_number_of_zeroes 9
histogram_mode 88
histogram_mean 103
histogram_median 95
histogram_variance 133
histogram_tendency 3
fetal_health 3


In [84]:
#Number of entries in each class
data['fetal_health'].value_counts()

1.0    1655
2.0     295
3.0     176
Name: fetal_health, dtype: int64
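
The counts above show a pronounced class imbalance. As a minimal sketch, the class shares can be computed from those counts. The `counts` Series below simply re-enters the three printed values and is not part of the original notebook; the mapping of 1, 2, 3 to Normal, Suspect, Pathological is assumed from the order given in the dataset description.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
# (assumed mapping: 1.0 = Normal, 2.0 = Suspect, 3.0 = Pathological)
counts = pd.Series({1.0: 1655, 2.0: 295, 3.0: 176}, name="fetal_health")

# Share of each class among the 2126 records, as a percentage
proportions = (counts / counts.sum() * 100).round(1)
print(proportions)
```

On the notebook's own DataFrame, `data['fetal_health'].value_counts(normalize=True)` should give the same shares as fractions.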



---



## 2. Data Preprocessing

In [85]:
#Checking for and Removing duplicate records
data.drop_duplicates(keep='first',inplace=True)

In [86]:
data.shape
#13 duplicate records were removed (2126 - 2113)

(2113, 22)
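
The number of duplicate rows can also be counted directly instead of being inferred from the change in shape. The sketch below uses a hypothetical toy frame (`toy` is not the notebook's data) to show that `duplicated()` marks exactly the rows that `drop_duplicates(keep='first')` removes.

```python
import pandas as pd

# Hypothetical toy frame: the second and fourth rows repeat the first
toy = pd.DataFrame({"a": [1, 1, 2, 1], "b": [0.5, 0.5, 0.7, 0.5]})

# duplicated() marks every repeat of an earlier row (keep='first' by default)
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() removes exactly those marked rows
deduplicated = toy.drop_duplicates(keep="first")
print(n_duplicates, len(toy) - len(deduplicated))
```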

In [87]:
#Features for the model
X = data.iloc[:,0:21]

In [88]:
#Feature to predict
Y = data.iloc[:,21]

In [89]:
# Converting to a tensor for input to DGCCA
X = torch.from_numpy(X.to_numpy()).float()

In [90]:
#Splitting views in testing and training set
X_train, X_test = train_test_split(X, test_size=0.20, random_state=0)
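
Only `X` is split above, so `Y` is not divided into matching training and test parts. One possible way to keep the labels aligned, sketched here on hypothetical stand-in arrays (`X_demo`, `Y_demo`) rather than the notebook's data, is to pass both arrays to `train_test_split`, which shuffles every array it receives in the same way.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the feature matrix and the label vector
X_demo = np.arange(40, dtype=np.float32).reshape(10, 4)
Y_demo = np.arange(10)

# Passing both arrays applies the same shuffle to each of them
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.20, random_state=0
)

# Row i of X_demo starts with 4 * i, so alignment with Y_demo can be checked
print(X_te[:, 0] / 4, Y_te)
```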

In [91]:
#Creating different views
X1_train = X_train[:,:7]
X2_train = X_train[:, 7:14]
X3_train = X_train[:, 14:]

X1_test = X_test[:,:7]
X2_test = X_test[:, 7:14]
X3_test = X_test[:, 14:]
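
To make the views easier to interpret, the sketch below lists which of the 21 feature columns (names taken from the `data.head()` output above) fall into each view under the same `[:7]`, `[7:14]`, `[14:]` slicing. The `views` dictionary and its keys are illustrative names, not part of the notebook.

```python
# Feature column names in their original order (the label column is excluded)
feature_names = [
    "baseline value", "accelerations", "fetal_movement",
    "uterine_contractions", "light_decelerations", "severe_decelerations",
    "prolongued_decelerations", "abnormal_short_term_variability",
    "mean_value_of_short_term_variability",
    "percentage_of_time_with_abnormal_long_term_variability",
    "mean_value_of_long_term_variability", "histogram_width",
    "histogram_min", "histogram_max", "histogram_number_of_peaks",
    "histogram_number_of_zeroes", "histogram_mode", "histogram_mean",
    "histogram_median", "histogram_variance", "histogram_tendency",
]

# Same slicing as used for X1, X2 and X3 above
views = {
    "X1": feature_names[:7],
    "X2": feature_names[7:14],
    "X3": feature_names[14:],
}
for name, cols in views.items():
    print(name, len(cols), cols)
```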

In [92]:
# X1_train = torch.transpose(X1_train, 0, 1)
# X2_train = torch.transpose(X2_train, 0, 1)
# X3_train = torch.transpose(X3_train, 0, 1)
# X1_test = torch.transpose(X1_test, 0, 1)
# X2_test = torch.transpose(X2_test, 0, 1)
# X3_test = torch.transpose(X3_test, 0, 1)


---
## 3. Applying DGCCA

In [93]:
in_size_X1 = 7
in_size_X2 = 7
in_size_X3 = 7
out_size = 3

layer_sizes1 = [in_size_X1, 840, 840, out_size]
layer_sizes2 = [in_size_X2, 840, 840, out_size]
layer_sizes3 = [in_size_X3, 840, 840, out_size]

model = DGCCA_architecture(layer_sizes1, layer_sizes2, layer_sizes3, "sigmoid")

learning_rate = 1e-3
epoch_num = 15
batch_size = 80
reg_par = 1e-5

#DGCCA(self, architecture, learning_rate, epoch_num, batch_size, reg_par, out_size:int)
algo = DGCCA(model, learning_rate, epoch_num, batch_size, reg_par, out_size)
algo.fit_transform(X1_train, X2_train, X3_train, X1_test, X2_test, X3_test)

Epcoh num:  0  Train loss =  -2.2523692
more than 2 views therefore switched to generalized
Epcoh num:  1  Train loss =  -2.52994
more than 2 views therefore switched to generalized
Epcoh num:  2  Train loss =  -2.709432
more than 2 views therefore switched to generalized
Epcoh num:  3  Train loss =  -2.8163888
more than 2 views therefore switched to generalized
Epcoh num:  4  Train loss =  -2.8984923
more than 2 views therefore switched to generalized
Epcoh num:  5  Train loss =  -2.9476137
more than 2 views therefore switched to generalized
Epcoh num:  6  Train loss =  -2.9914389
more than 2 views therefore switched to generalized
Epcoh num:  7  Train loss =  -3.0334902
more than 2 views therefore switched to generalized
Epcoh num:  8  Train loss =  -3.0667522
more than 2 views therefore switched to generalized
Epcoh num:  9  Train loss =  -3.0903597
more than 2 views therefore switched to generalized
Epcoh num:  10  Train loss =  -3.1128564
more than 2 views therefore switched to ge
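
For context, the generalized objective that the log message refers to is usually written as below. This is the formulation from the DGCCA literature, given here as background; whether `dgcca.py` implements it in exactly this form is an assumption. Here $f_j$ is the network for view $j$, $X_j$ is that view's data, $U_j$ is a linear projection, $G$ is the shared representation, $J$ is the number of views (3 in this example), and $r$ is the output size (`out_size`).

```latex
\min_{G,\,\{U_j\},\,\{f_j\}} \; \sum_{j=1}^{J} \left\| G - U_j^{\top} f_j(X_j) \right\|_F^2
\quad \text{subject to} \quad G G^{\top} = I_r
```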

In [94]:
loss, outputs = algo.test(torch.cat([X1_train, X1_test], dim=0),
                          torch.cat([X2_train, X2_test], dim=0),
                          torch.cat([X3_train, X3_test], dim=0))

In [95]:
#New Features 
model_input_array = np.concatenate((outputs[0],outputs[1],outputs[2]), axis = 1)
model_input_array.shape

(2113, 9)
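
The shape `(2113, 9)` follows from placing three outputs of width `out_size = 3` side by side. A minimal sketch with hypothetical random stand-ins for the three per-view outputs (not the model's actual outputs):

```python
import numpy as np

n_samples, out_size = 5, 3  # hypothetical sample count; out_size as in the notebook

# Stand-ins for the three per-view outputs, each of shape (n_samples, out_size)
rng = np.random.default_rng(0)
view_outputs = [rng.normal(size=(n_samples, out_size)) for _ in range(3)]

# axis=1 places the views side by side: columns 0-2, 3-5 and 6-8
combined = np.concatenate(view_outputs, axis=1)
print(combined.shape)
```

Columns 0-2 of the result therefore come from the first view, 3-5 from the second, and 6-8 from the third.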

In [96]:
#Creating dataframe
model_input = pd.DataFrame(data=model_input_array)
model_input.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,32.158333,36.359158,-5.415336,0.942111,-0.041667,1.012916,0.162881,-0.965194,-0.967378
1,30.082998,36.507904,-6.639428,-0.756228,-0.771154,0.863941,-0.891778,-0.865454,-0.441535
2,33.515259,35.685043,-4.563952,1.003128,-0.300051,0.851757,-0.097978,-1.753128,-0.520079
3,32.065796,36.315231,-5.406788,1.217186,0.365465,1.175623,-0.948657,-2.020954,-0.20931
4,34.154137,35.262539,-4.265619,-0.971421,0.035228,-0.706083,1.035805,0.293935,-0.938944


In [97]:
model_input.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2113 entries, 0 to 2112
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       2113 non-null   float32
 1   1       2113 non-null   float32
 2   2       2113 non-null   float32
 3   3       2113 non-null   float32
 4   4       2113 non-null   float32
 5   5       2113 non-null   float32
 6   6       2113 non-null   float32
 7   7       2113 non-null   float32
 8   8       2113 non-null   float32
dtypes: float32(9)
memory usage: 74.4 KB


In [98]:
model_input.corr()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,1.0,-0.593508,0.947634,-0.061863,0.484762,-0.068512,0.63181,-0.046772,-0.037351
1,-0.593508,1.0,-0.701675,0.231406,-0.429352,0.225118,-0.618685,-0.163297,-0.02609
2,0.947634,-0.701675,1.0,-0.129224,0.487462,-0.086341,0.696195,0.008097,-0.057561
3,-0.061863,0.231406,-0.129224,1.0,0.231779,0.395263,-0.533674,-0.776462,0.044606
4,0.484762,-0.429352,0.487462,0.231779,1.0,-0.331247,0.397304,-0.06896,0.444923
5,-0.068512,0.225118,-0.086341,0.395263,-0.331247,1.0,-0.407218,-0.647234,-0.716985
6,0.63181,-0.618685,0.696195,-0.533674,0.397304,-0.407218,1.0,0.526706,0.043845
7,-0.046772,-0.163297,0.008097,-0.776462,-0.06896,-0.647234,0.526706,1.0,0.22337
8,-0.037351,-0.02609,-0.057561,0.044606,0.444923,-0.716985,0.043845,0.22337,1.0
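
Because the columns are grouped by view, the correlations between two different views sit in the off-diagonal 3x3 blocks of this matrix. The sketch below shows one way to pull out such a block; `demo` is a hypothetical random stand-in for `model_input`, and `view_cols` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for model_input: 100 samples, 9 columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 9)))

corr = demo.corr()

# Columns 0-2, 3-5 and 6-8 hold the outputs of views 1, 2 and 3
view_cols = {"view1": [0, 1, 2], "view2": [3, 4, 5], "view3": [6, 7, 8]}

# 3x3 block of correlations between the view 1 and view 3 outputs
block_13 = corr.loc[view_cols["view1"], view_cols["view3"]]
print(block_13.round(2))
```

Applied to the notebook's `model_input`, the same indexing would isolate, for example, the correlations between columns 0-2 and columns 6-8 shown in the table above.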




---


## End

---




