## Paper's Description of the HIV Dataset

### Study Participants
- **Age Range**: 18 to 86 years.
- **Imaging Technique**: All participants were scanned using a T1-weighted MRI.
- **Consent & Approval**:
  - Written informed consent was obtained from all study participants.
  - The study was approved by the Institutional Review Board (IRB) at Stanford University (Protocol ID: IRB-9861) and SRI International (Protocol ID: Pro00039132).

### Participant Details
- **HIV Subjects**:
  - All were seropositive for HIV infection.
  - CD4 count was greater than 100 cells/μL (average: 303.0).

### Data Matching and Preprocessing
- **Construction of Confounder-independent Subset**:
  - Utilized a matching algorithm to extract the maximum number of subjects from each group, ensuring equal size and identical distribution with respect to confounder values.
  - For each HIV subject, a control subject was selected based on minimal age difference, continuing until all subjects were matched or the two-tailed p-value of the two-sample t-test between age distributions reached 0.5.
- **MRI Preprocessing Steps**:
  - Denoising, bias-field correction, skull stripping, and affine registration to the SRI24 template.
  - The registered images were then downsampled to a 64 × 64 × 64 volume to reduce potential overfitting and enable large batch sizes during training.
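
The age-matching step can be illustrated with a minimal sketch. It assumes a simple greedy rule (each HIV subject takes the closest-age control that has not been used yet) and then reports the two-tailed t-test p-value for the matched ages. The helper name `greedy_age_match` and the synthetic ages are hypothetical, and the paper's exact stopping rule is not reproduced here.

```python
import numpy as np
from scipy import stats


def greedy_age_match(age_hiv, age_ctrl):
    """Pair each HIV subject with the closest-age control not yet used.

    Hypothetical helper for illustration; not the paper's implementation.
    Returns a list of (hiv_index, ctrl_index) pairs.
    """
    age_hiv = np.asarray(age_hiv, dtype=float)
    age_ctrl = np.asarray(age_ctrl, dtype=float)
    available = np.ones(age_ctrl.shape[0], dtype=bool)
    pairs = []
    for i, a in enumerate(age_hiv):
        if not available.any():
            break  # no controls left to match
        diff = np.abs(age_ctrl - a)
        diff[~available] = np.inf  # never reuse a control
        j = int(np.argmin(diff))
        available[j] = False
        pairs.append((i, j))
    return pairs


# Synthetic ages, made up for the example
rng = np.random.default_rng(0)
age_hiv = rng.uniform(30, 70, size=20)
age_ctrl = rng.uniform(18, 86, size=60)

pairs = greedy_age_match(age_hiv, age_ctrl)
hiv_idx = [i for i, _ in pairs]
ctrl_idx = [j for _, j in pairs]

# Two-sample, two-tailed t-test on the matched age distributions
t_stat, p_value = stats.ttest_ind(age_hiv[hiv_idx], age_ctrl[ctrl_idx])
print(len(pairs), "pairs, two-tailed p =", round(float(p_value), 3))
```

A high p-value indicates that the matched groups have similar age distributions, which is the property the matching is meant to enforce.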

### Model Training and Evaluation
- **Data Augmentation**:
  - New synthetic 3D images were generated by randomly shifting each MRI within one voxel and rotating within 1 degree along the three axes.
  - The augmented dataset included a balanced set of 1024 MRIs for each group (control and HIV).
- **Assumption**: HIV affects the brain bilaterally; thus, the left hemisphere was flipped to create a second right hemisphere.
- **Testing Approach**:
  - During testing, both the right and flipped left hemispheres of the raw test images were analyzed by the trained model.
  - The prediction score averaged across both hemispheres was used to predict the individual’s diagnosis group.
- **Saliency Mapping**:
  - Computed for the right hemisphere of each test image to quantify the importance of each voxel to the final prediction.
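
The test-time averaging over hemispheres can be sketched as follows. This is an illustration only: `toy_score` stands in for the trained model, the tiny volume is made up, and which half is anatomically "right" depends on the image orientation, which is assumed rather than known here.

```python
import numpy as np


def hemisphere_inputs(volume):
    """Return two half-volumes of equal shape taken along the first axis.

    The second half-volume is obtained by mirroring the whole volume along
    that axis first, so both halves share the same orientation.
    """
    half = volume.shape[0] // 2
    first = volume[:half]
    mirrored = np.flip(volume, axis=0)[:half]
    return first, mirrored


def averaged_score(volume, score_fn):
    """Average the model score over one hemisphere and the mirrored other one."""
    first, mirrored = hemisphere_inputs(volume)
    return 0.5 * (score_fn(first) + score_fn(mirrored))


def toy_score(x):
    # Stand-in for a trained model's prediction score (assumption)
    return float(x.mean())


volume = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
first, mirrored = hemisphere_inputs(volume)
score = averaged_score(volume, toy_score)
print(first.shape, mirrored.shape, score)
```

The same pattern appears in the training code further down, where `np.flip(test_data, 1)` mirrors the volume and `[:, :32, :, :]` keeps one half.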

### Cross-validation Strategy
- Prediction accuracy of the deep learning models was determined via fivefold cross-validation.


In [10]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Activation, Dense, Dropout, Flatten, UpSampling3D, Input, ZeroPadding3D, Lambda, Reshape
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv3D, MaxPooling3D
from tensorflow.keras.losses import mse, binary_crossentropy
from tensorflow.keras.utils import plot_model
from tensorflow.keras.constraints import unit_norm, max_norm
from tensorflow.keras import regularizers
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam

import tensorflow as tf

from sklearn.model_selection import StratifiedKFold
import numpy as np
import nibabel as nib
import scipy as sp
import scipy.ndimage
from sklearn.metrics import mean_squared_error, r2_score

import sys
import argparse
import os
import glob 

import dcor

## Data Augmentation:
- Motivation: Labeled medical images are scarce (privacy constraints, acquisition cost), so augmentation is used to artificially expand the training set. This reduces overfitting and helps the model generalize to unseen data.
- Model Robustness: `augment_by_transformation` applies a small random rotation (within ±0.5 degrees) in each of the three axis planes and a small random shift (within ±0.5 voxels), so the network becomes less sensitive to minor differences in head orientation and position.
- Efficiency: Generating transformed copies is far cheaper and faster than acquiring new scans.
- Label Consistency: Each synthetic image inherits the age and sex of the subject it was derived from, so the confounder values stay aligned with the augmented data.

In [11]:
def augment_by_transformation(data,age,sex,n):
    augment_scale = 1

    if n <= data.shape[0]:
        # Nothing to add; still return all three arrays so callers can unpack them
        return data, age, sex
    else:
        raw_n = data.shape[0]
        m = n - raw_n
        new_data = np.zeros((m,data.shape[1],data.shape[2],data.shape[3],1))
        for i in range(0,m):
            idx = np.random.randint(0,raw_n)
            new_age = age[idx]
            new_sex = sex[idx]
            new_data[i] = data[idx].copy()
            # Rotate by a small random angle in each of the three axis planes
            # (sp.ndimage.rotate replaces the deprecated sp.ndimage.interpolation.rotate)
            new_data[i,:,:,:,0] = sp.ndimage.rotate(new_data[i,:,:,:,0],np.random.uniform(-0.5,0.5),axes=(1,0),reshape=False)
            new_data[i,:,:,:,0] = sp.ndimage.rotate(new_data[i,:,:,:,0],np.random.uniform(-0.5,0.5),axes=(0,2),reshape=False)
            new_data[i,:,:,:,0] = sp.ndimage.rotate(new_data[i,:,:,:,0],np.random.uniform(-0.5,0.5),axes=(1,2),reshape=False)
            new_data[i,:,:,:,0] = sp.ndimage.shift(new_data[i,:,:,:,0],np.random.uniform(-0.5,0.5))

            age = np.append(age, new_age)
            sex = np.append(sex, new_sex)

        # output an example (index 0 always exists here because m >= 1)
        array_img = nib.Nifti1Image(np.squeeze(new_data[0,:,:,:,0]),np.diag([1, 1, 1, 1]))
        filename = 'augmented_example.nii.gz'
        nib.save(array_img,filename)

        data = np.concatenate((data, new_data), axis=0)
        return data,age,sex


1. inv_mse (Inverse Mean Squared Error)

- Despite its name, this function computes the sum (not the mean) of the squared differences between y_true and y_pred, and returns the negative of that sum.

- Minimizing the negated error is equivalent to maximizing the prediction error. That only makes sense in an adversarial setting, where one network is trained to make another network's predictions worse; ordinarily the squared error is minimized.

2. inv_correlation_coefficient_loss

- This function computes the Pearson correlation coefficient r between the true and predicted values, clips it to [-1, +1], and returns 1 - r². Pearson's r measures the linear relationship between two variables: +1 is a perfect positive linear relationship, 0 is no linear relationship, and -1 is a perfect negative one.

- Because the loss is 1 - r², minimizing it drives r² toward 1, so it rewards predictions that are strongly (positively or negatively) correlated with the targets.

3. correlation_coefficient_loss

- Same computation as inv_correlation_coefficient_loss, but it returns r² directly.

- Minimizing r² drives the correlation between predictions and targets toward zero. This is the loss that removes confounder information: when the predictions are a confounder estimate (here, age), minimizing r² pushes the features toward carrying no linear age signal.

Overall Usage:
In the GAN class below, the regressor is compiled with the built-in 'mse' loss, the distiller with correlation_coefficient_loss, and the full workflow with 'binary_crossentropy'. inv_mse and inv_correlation_coefficient_loss are defined but not used in this notebook.

In [12]:
def inv_mse(y_true, y_pred):
    mse_value = K.sum(K.square(y_true-y_pred))

    return -mse_value

def inv_correlation_coefficient_loss(y_true, y_pred):
    x = y_true
    y = y_pred
    mx = K.mean(x)
    my = K.mean(y)
    xm, ym = x-mx, y-my
    r_num = K.sum(tf.multiply(xm,ym))
    r_den = K.sqrt(tf.multiply(K.sum(K.square(xm)), K.sum(K.square(ym)))) + 1e-5
    r = r_num / r_den

    r = K.maximum(K.minimum(r, 1.0), -1.0)
    return 1 - K.square(r)

def correlation_coefficient_loss(y_true, y_pred):
    x = y_true
    y = y_pred
    mx = K.mean(x)
    my = K.mean(y)
    xm, ym = x-mx, y-my
    r_num = K.sum(tf.multiply(xm,ym))
    r_den = K.sqrt(tf.multiply(K.sum(K.square(xm)), K.sum(K.square(ym)))) + 1e-5
    r = r_num / r_den

    r = K.maximum(K.minimum(r, 1.0), -1.0)
    return K.square(r)
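
The direction of the two correlation losses can be checked numerically. The following is a plain NumPy re-implementation of the same arithmetic, written only for this check; the helper name `squared_pearson` and the example arrays are made up.

```python
import numpy as np


def squared_pearson(y_true, y_pred, eps=1e-5):
    """NumPy version of the r**2 computed by the Keras losses above."""
    xm = y_true - y_true.mean()
    ym = y_pred - y_pred.mean()
    r = np.sum(xm * ym) / (np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)) + eps)
    r = np.clip(r, -1.0, 1.0)
    return r ** 2


age = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
linear_pred = 2.0 * age + 1.0                          # perfectly linear in age
unrelated_pred = np.array([1.0, -1.0, 0.0, -1.0, 1.0])  # zero linear correlation with age

r2_linear = squared_pearson(age, linear_pred)
r2_unrelated = squared_pearson(age, unrelated_pred)

# correlation_coefficient_loss returns r**2; the "inv" variant returns 1 - r**2
print("correlation_coefficient_loss:    ", r2_linear, r2_unrelated)
print("inv_correlation_coefficient_loss:", 1 - r2_linear, 1 - r2_unrelated)
```

The prediction that is linear in age gets a `correlation_coefficient_loss` close to 1 and the unrelated one gets 0, so minimizing that loss favors predictions that carry no linear age signal.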


### For learning GANs:
 - https://youtu.be/8L11aMN5KY8?si=41BEjE-QQ0fbbtqn

### 1. Overview of the Code Structure
- **Optimizer Setup**: Three Adam optimizers with the same learning rate (0.0002) are created in `__init__`, nominally one each for the main workflow, the distiller, and the regressor.
- **Regressor and Encoder**: The encoder is the feature extractor mentioned in the paper; it reduces each medical image to a feature vector. The regressor takes that feature vector as input and predicts the confounder (age).
- **Distiller Component**: `self.distiller` chains the encoder and the regressor with the regressor's weights frozen, so training the distiller updates only the encoder. The encoder is adjusted so that the frozen regressor's age prediction becomes uncorrelated with the true age.
- **Classifier**: This part uses the features processed by the encoder to make final predictions (e.g., disease presence or absence).

### 2. Specific Functions Mapped to Paper Descriptions
- **Encoder**: The encoder corresponds to the **Feature Extractor (FE)** in the paper. It maps each input half-volume (32 × 64 × 64) to a 1024-dimensional feature vector that should be useful for prediction while remaining invariant to confounders such as age or sex.

- **Regressor, Frozen Inside the Distiller**: The regressor corresponds to the **Confounder Predictor (CP)** in the paper. It is trained on its own with an MSE loss to predict age from the encoder's features of control subjects. Inside `self.distiller` its weights are frozen, so a distiller update changes only the encoder, pushing it toward features for which the regressor's output no longer correlates with age. This is the adversarial setting the paper describes.

- **Classifier and Workflow Compilation**: This aligns with the **Classifier/Predictor (P)** in the paper, which uses the features provided by the FE to predict the outcome, such as a medical diagnosis, while ideally being free from the influence of confounders.

### 3. Handling of Confounders
In the context of the paper, the network should be learning to extract features that are informative for the prediction task but invariant to confounding factors (like age or sex differences that are not relevant to the disease being studied). This setup is suggested by the use of specific loss functions and the architecture setup where different parts of the network are optimized to either predict the primary outcome or ensure that the features are not confounded.
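
One way to quantify how much confounder information remains in the features is distance correlation, which the training loop reports through `dcor.u_distance_correlation_sqr`. The sketch below implements the simpler biased (V-statistic) estimator in NumPy to show the idea. It is not the unbiased estimator that `dcor` computes, and the function names and synthetic data are assumptions for illustration.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation_sqr(x, y):
    """Biased sample estimate of the squared distance correlation."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov_xy = (a * b).mean()
    dcov_xx = (a * a).mean()
    dcov_yy = (b * b).mean()
    denom = np.sqrt(dcov_xx * dcov_yy)
    return 0.0 if denom == 0 else float(dcov_xy / denom)


rng = np.random.default_rng(0)
age = rng.uniform(18, 86, size=200)

# Features that are a pure function of age vs. features drawn independently of age
confounded_features = np.outer(age, np.ones(8))
independent_features = rng.normal(size=(200, 8))

dc_confounded = distance_correlation_sqr(confounded_features, age)
dc_independent = distance_correlation_sqr(independent_features, age)
print("confounded:", dc_confounded, "independent:", dc_independent)
```

Unlike Pearson correlation, distance correlation also picks up nonlinear dependence, which makes it a stricter check than the r² loss used for training.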


### 4. Train function
This function integrates multiple training and evaluation processes within a single training loop, reflecting a complex workflow typical in medical imaging studies where both diagnostic prediction and confounding factor analysis (like age prediction) are relevant. The function emphasizes robustness and generalization by incorporating data augmentation directly in the training loop and by evaluating models on both original and manipulated datasets. This approach is critical in medical applications where model performance and interpretability directly impact clinical outcomes.
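
The class-balanced batch selection used by the training loop can be isolated in a few lines. The sketch assumes, as the notebook's data layout does, that the first half of the augmented training array holds controls and the second half holds patients. The helper name `balanced_batch_indices` is hypothetical.

```python
import numpy as np


def balanced_batch_indices(n_total, batch_size, rng):
    """Pick batch_size // 2 control rows plus the rows at the same offsets
    in the patient half of the array."""
    half = n_total // 2
    ctrl = rng.permutation(half)[: batch_size // 2]
    return np.concatenate((ctrl, ctrl + half))


rng = np.random.default_rng(0)
labels = np.concatenate((np.zeros(512), np.ones(512)))  # controls first, then patients

idx = balanced_batch_indices(labels.shape[0], 64, rng)
print(idx.shape[0], "samples,", int(labels[idx].sum()), "patients")
```

Each batch therefore contains equal numbers of controls and patients, which keeps the binary cross-entropy updates from drifting toward one class.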

### Conclusion
The code implements a network that extracts confounder-free features from medical images, in line with the objectives discussed in the paper. Confounding is addressed adversarially: each iteration first updates the regressor to predict age from the current features, then updates the encoder through the distiller to reduce the squared correlation between that prediction and the true age, and finally updates the encoder and classifier on the diagnosis labels.

In [13]:
class GAN():
        def __init__(self):
                self.lr = 0.0002
                optimizer = Adam(self.lr)
                optimizer_distiller = Adam(self.lr)
                optimizer_regressor = Adam(self.lr)

                L2_reg = 0.1
                ft_bank_baseline = 16
                latent_dim = 16

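                # Build and compile the confounder (cf) predictor
                self.regressor = self.build_regressor()
                # Use the dedicated optimizer; sharing one optimizer instance across
                # several compiled models can raise errors in recent Keras versions
                self.regressor.compile(loss='mse', optimizer=optimizer_regressor)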

                #The cnn
                # Build the feature encoder
                input_image = Input(shape=(32,64,64,1), name='input_image')
                feature = Conv3D(ft_bank_baseline, activation='relu', kernel_size=(3, 3, 3),padding='same')(input_image)
                feature = BatchNormalization()(feature)
                feature = MaxPooling3D(pool_size=(2, 2, 2))(feature)

                feature = Conv3D(ft_bank_baseline*2, activation='relu', kernel_size=(3, 3, 3),padding='same')(feature)
                feature = BatchNormalization()(feature)
                feature = MaxPooling3D(pool_size=(2, 2, 2))(feature)

                feature = Conv3D(ft_bank_baseline*4, activation='relu', kernel_size=(3, 3, 3),padding='same')(feature)
                feature = BatchNormalization()(feature)
                feature = MaxPooling3D(pool_size=(2, 2, 2))(feature)

                feature = Conv3D(ft_bank_baseline*2, activation='relu', kernel_size=(3, 3, 3),padding='same')(feature)
                #feature = Conv3D(ft_bank_baseline*8, activation='relu', kernel_size=(3, 3, 3),padding='same')(feature)
                feature = BatchNormalization()(feature)
                feature = MaxPooling3D(pool_size=(2, 2, 2))(feature)

                feature_dense = Flatten()(feature)

                self.encoder = Model(input_image, feature_dense)

                # the CF part with regression, we are making it confounder free

                # For the distillation model we will only train the encoder

                self.regressor.trainable = False
                cf = self.regressor(feature_dense)
                self.distiller = Model(input_image, cf)
                self.distiller.compile(loss=correlation_coefficient_loss, optimizer=optimizer_distiller)

                # classifier:

                # Build the classifier
                #self.encoder.load_weights('encoder.h5');
                #self.encoder.trainable = False
                input_feature_clf = Input(shape=(1024,), name='input_feature_dense')
                #input_feature_clf = Input(shape=(4096,), name='input_feature_dense')
                feature_clf = Dense(latent_dim*4, activation='tanh',kernel_regularizer=regularizers.l2(L2_reg))(input_feature_clf)
                feature_clf = Dense(latent_dim*2, activation='tanh',kernel_regularizer=regularizers.l2(L2_reg))(feature_clf)
                prediction_score = Dense(1, name='prediction_score',kernel_regularizer=regularizers.l2(L2_reg))(feature_clf)
                self.classifier = Model(input_feature_clf, prediction_score)

                # workflow and output:

                # Build the entire workflow
                prediction_score_workflow = self.classifier(feature_dense)
                label_workflow = Activation('sigmoid', name='r_mean')(prediction_score_workflow)
                self.workflow = Model(input_image, label_workflow)
                self.workflow.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['accuracy'])
        # The confounder (cf) predictor: a regressor whose input is the encoder's output
        def build_regressor(self):
                latent_dim = 16
                inputs_x = Input(shape=(1024,))
                #inputs_x = Input(shape=(4096,))
                feature = Dense(latent_dim*4, activation='tanh')(inputs_x)
                feature = Dense(latent_dim*2, activation='tanh')(feature)
                cf = Dense(1)(feature)

                return Model(inputs_x, cf)


        def train(self, epochs, training, testing, testing_raw, batch_size=64, fold=0):
                [train_data_aug, train_dx_aug, train_age_aug, train_sex_aug] = training
                [test_data_aug,  test_dx_aug,  test_age_aug,  test_sex_aug]  = testing
                [test_data    ,  test_dx    ,  test_age,      test_sex   ]   = testing_raw

                test_data_aug_flip = np.flip(test_data_aug,1)
                test_data_flip = np.flip(test_data,1)

                idx_perm = np.random.permutation(int(train_data_aug.shape[0]/2))

                dc_age = np.zeros((int(epochs/10)+1,))
                min_dc = 0
                for epoch in range(epochs):

                        ## Turn on to LR decay manually
                        # if epoch % 200 == 0:
                        #    self.lr = self.lr * 0.75
                        #    optimizer = Adam(self.lr)
                        #    self.workflow.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['accuracy'])
                        #    self.distiller.compile(loss=correlation_coefficient_loss, optimizer=optimizer)
                        #    self.regressor.compile(loss='mse', optimizer=optimizer)

                        # Select a random batch of images
                        
                        idx_perm = np.random.permutation(int(train_data_aug.shape[0]/2))
                        ctrl_idx = idx_perm[:int(batch_size)]
                        idx_perm = np.random.permutation(int(train_data_aug.shape[0]/2))
                        idx = idx_perm[:int(batch_size/2)]
                        idx = np.concatenate((idx,idx+int(train_data_aug.shape[0]/2)))

                        training_feature_batch = train_data_aug[idx]
                        dx_batch = train_dx_aug[idx]
                        age_batch = train_age_aug[idx]

                        training_feature_ctrl_batch = train_data_aug[ctrl_idx]
                        age_ctrl_batch = train_age_aug[ctrl_idx]
                        
                        # ---------------------
                        #  Train regressor (cf predictor)
                        # ---------------------

                        encoded_feature_ctrl_batch = self.encoder.predict(training_feature_ctrl_batch[:,:32,:,:])
                        r_loss = self.regressor.train_on_batch(encoded_feature_ctrl_batch, age_ctrl_batch)

                        # ---------------------
                        #  Train Distiller
                        # ---------------------
                        
                        g_loss = self.distiller.train_on_batch(training_feature_ctrl_batch[:,:32,:,:], age_ctrl_batch)
                        
                        # ---------------------
                        #  Train Encoder & Classifier
                        # ---------------------
                        
                        c_loss = self.workflow.train_on_batch(training_feature_batch[:,:32,:,:], dx_batch)

                        # ---------------------
                        #  flip & re-do everything
                        # ---------------------

                        training_feature_batch = np.flip(training_feature_batch,1)
                        training_feature_ctrl_batch = np.flip(training_feature_ctrl_batch,1)

                        encoded_feature_ctrl_batch = self.encoder.predict(training_feature_ctrl_batch[:,:32,:,:])
                        r_loss = self.regressor.train_on_batch(encoded_feature_ctrl_batch, age_ctrl_batch)
                        g_loss = self.distiller.train_on_batch(training_feature_ctrl_batch[:,:32,:,:], age_ctrl_batch)
                        c_loss = self.workflow.train_on_batch(training_feature_batch[:,:32,:,:], dx_batch)

                        # Plot the progress
                        if epoch % 50 == 0:
                                c_loss_test_1 = self.workflow.evaluate(test_data_aug[:,:32,:,:],      test_dx_aug, verbose = 0, batch_size = batch_size)    
                                c_loss_test_2 = self.workflow.evaluate(test_data_aug_flip[:,:32,:,:], test_dx_aug, verbose = 0, batch_size = batch_size)    

                                # feature dist corr
                                features_dense = self.encoder.predict(train_data_aug[train_dx_aug == 0,:32,:,:],  batch_size = batch_size)
                                dc_age[int(epoch/10)] = dcor.u_distance_correlation_sqr(features_dense, train_age_aug[train_dx_aug == 0])
                                print ("%d [Acc: %f,  Test Acc: %f %f,  dc: %f]" % (epoch, c_loss[1], c_loss_test_1[1], c_loss_test_2[1], dc_age[int(epoch/10)]))
                                sys.stdout.flush()

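                                # Make sure the output directory exists before writing to it
                                os.makedirs("res_cf_5cv", exist_ok=True)
                                self.classifier.save_weights("res_cf_5cv/classifier.h5")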
                                self.encoder.save_weights("res_cf_5cv/encoder.h5")
                                self.workflow.save_weights("res_cf_5cv/workflow.h5")

                                ## Turn on to save all intermediate features for posthoc MI computation
                                #features_dense = self.encoder.predict(test_data[:,:32,:,:],  batch_size = 64)
                                #filename = 'res_cf/features_'+str(fold)+'.txt'
                                #np.savetxt(filename,features_dense)
                                #score = self.classifier.predict(features_dense,  batch_size = 64)
                                #filename = 'res_cf/scores_'+str(fold)+'_'+str(epoch)+'.txt'
                                #np.savetxt(filename,score)

                                #features_dense = self.encoder.predict(test_data_flip[:,:32,:,:],  batch_size = 64)
                                #filename = 'res_cf/features_flip_'+str(fold)+'.txt'
                                #np.savetxt(filename,features_dense)
                                #score = self.classifier.predict(features_dense,  batch_size = 64)
                                #filename = 'res_cf/scores_flip_'+str(fold)+'_'+str(epoch)+'.txt'
                                #np.savetxt(filename,score)

                                # save intermediate predictions
                                prediction = self.workflow.predict(test_data[:,:32,:,:],  batch_size = 64)
                                filename = 'res_cf_5cv/prediction_'+str(fold)+'_'+str(epoch)+'.txt'
                                np.savetxt(filename,prediction)
                                prediction = self.workflow.predict(test_data_flip[:,:32,:,:],  batch_size = 64)
                                filename = 'res_cf_5cv/prediction_flip_'+str(fold)+'_'+str(epoch)+'.txt'
                                np.savetxt(filename,prediction)

                                # save ground-truth
                                filename = 'res_cf_5cv/dx_'+str(fold)+'.txt'
                                np.savetxt(filename,test_dx)    
                                filename = 'res_cf_5cv/cf_'+str(fold)+'.txt'
                                np.savetxt(filename,test_age)  

     

In [15]:
if __name__ == '__main__':
    file_idx = np.genfromtxt('./access.txt', dtype='str')
    age = np.loadtxt('./age.txt') 
    sex = np.loadtxt('./sex.txt') 
    dx = np.loadtxt('./dx.txt') 

    np.random.seed(seed=0)

    subject_num = file_idx.shape[0]
    patch_x = 64
    patch_y = 64
    patch_z = 64
    min_x = 0 
    min_y = 0 
    min_z = 0

    augment_size = 512
    data = np.zeros((subject_num, patch_x, patch_y, patch_z,1))
    i = 0
    for subject_idx in file_idx:
        #subject_string = format(int(subject_idx),'04d')
        filename_full = '/fs/neurosci01/qingyuz/lab_data/img_64_longitudinal/'+subject_idx

        img = nib.load(filename_full)
        img_data = img.get_fdata()

        data[i,:,:,:,0] = img_data[min_x:min_x+patch_x, min_y:min_y+patch_y, min_z:min_z+patch_z] 
        data[i,:,:,:,0] = (data[i,:,:,:,0] - np.mean(data[i,:,:,:,0])) / np.std(data[i,:,:,:,0])

        # output an example
        array_img = nib.Nifti1Image(np.squeeze(data[i,:,:,:,0]),np.diag([1, 1, 1, 1]))  
        filename = 'processed_example.nii.gz'
        nib.save(array_img,filename)

        i += 1
    
    ## Train on whole dataset
    #train_data_pos = data[dx==1];
    #train_data_neg = data[dx==0];
    #train_age_pos = age[dx==1];
    #train_age_neg = age[dx==0];
    #train_sex_pos = sex[dx==1];
    #train_sex_neg = sex[dx==0];

    #train_data_pos_aug,train_age_pos_aug,train_sex_pos_aug = augment_by_transformation(train_data_pos,train_age_pos,train_sex_pos,augment_size)
    #del train_data_pos
    #train_data_neg_aug,train_age_neg_aug,train_sex_neg_aug = augment_by_transformation(train_data_neg,train_age_neg,train_sex_neg,augment_size)
    #del train_data_neg

    #train_data_aug = np.concatenate((train_data_neg_aug, train_data_pos_aug), axis=0)
    #del train_data_neg_aug
    #del train_data_pos_aug
    #train_age_aug = np.concatenate((train_age_neg_aug, train_age_pos_aug), axis=0)
    #train_sex_aug = np.concatenate((train_sex_neg_aug, train_sex_pos_aug), axis=0)
    #train_dx_aug = np.zeros((augment_size * 2,))
    #train_dx_aug[augment_size:] = 1   
     
    #gan = GAN()
    #gan.train(epochs=1501, training=[train_data_aug, train_dx_aug, train_age_aug, train_sex_aug], testing=[train_data_aug, train_dx_aug, train_age_aug, train_sex_aug], testing_raw=[data, dx, age, sex], batch_size=64, fold=0)
    
    #exit()

    ## cross-validation
    skf = StratifiedKFold(n_splits=5,shuffle=True)
    pred = np.zeros((dx.shape))

    fold = 1
    for train_idx, test_idx in skf.split(data, dx):  
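        # Folds 1 and 2 are skipped (presumably left over from resuming an earlier run);
        # remove this check to train all five folds
        if fold < 3:
            fold = fold + 1
            continue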

        train_data = data[train_idx]
        train_dx = dx[train_idx]
        train_age = age[train_idx]
        train_sex = sex[train_idx]

        test_data = data[test_idx]
        test_dx = dx[test_idx]
        test_age = age[test_idx]
        test_sex = sex[test_idx]

        # augment data
        train_data_pos = train_data[train_dx==1]
        train_data_neg = train_data[train_dx==0]
        train_age_pos = train_age[train_dx==1]
        train_age_neg = train_age[train_dx==0]
        train_sex_pos = train_sex[train_dx==1]
        train_sex_neg = train_sex[train_dx==0]

        train_data_pos_aug,train_age_pos_aug,train_sex_pos_aug = augment_by_transformation(train_data_pos,train_age_pos,train_sex_pos,augment_size)
        train_data_neg_aug,train_age_neg_aug,train_sex_neg_aug = augment_by_transformation(train_data_neg,train_age_neg,train_sex_neg,augment_size)

        train_data_aug = np.concatenate((train_data_neg_aug, train_data_pos_aug), axis=0)
        train_age_aug = np.concatenate((train_age_neg_aug, train_age_pos_aug), axis=0)
        train_sex_aug = np.concatenate((train_sex_neg_aug, train_sex_pos_aug), axis=0)
        train_dx_aug = np.zeros((augment_size * 2,))
        train_dx_aug[augment_size:] = 1

        test_data_pos = test_data[test_dx==1]
        test_data_neg = test_data[test_dx==0]
        test_age_pos = test_age[test_dx==1]
        test_age_neg = test_age[test_dx==0]
        test_sex_pos = test_sex[test_dx==1]
        test_sex_neg = test_sex[test_dx==0]

        test_data_pos_aug,test_age_pos_aug,test_sex_pos_aug = augment_by_transformation(test_data_pos,test_age_pos,test_sex_pos,500)
        test_data_neg_aug,test_age_neg_aug,test_sex_neg_aug = augment_by_transformation(test_data_neg,test_age_neg,test_sex_neg,500)

        test_data_aug = np.concatenate((test_data_neg_aug, test_data_pos_aug), axis=0)
        test_age_aug = np.concatenate((test_age_neg_aug, test_age_pos_aug), axis=0)
        test_sex_aug = np.concatenate((test_sex_neg_aug, test_sex_pos_aug), axis=0)
        test_dx_aug = np.zeros((500 * 2,))
        test_dx_aug[500:] = 1

        print("Begin Training fold")
        sys.stdout.flush()
        gan = GAN()
        gan.train(epochs=1501, training=[train_data_aug, train_dx_aug, train_age_aug, train_sex_aug], testing=[test_data_aug, test_dx_aug, test_age_aug, test_sex_aug], testing_raw=[test_data, test_dx, test_age, test_sex], batch_size=64, fold=fold)
        fold = fold + 1
