**This Jupyter notebook, together with two others (the three notebooks are dataset-builder, utils, and tester), contains the code for reproducing the experiments conducted in the following paper:**


> Nazari, Ehsan and Branco, Paula. "On Oversampling via Generative Adversarial Networks under Different Data Difficulty Factors." International Workshop on Learning with Imbalanced Domains: Theory and Applications. PMLR, 2021.

This file contains the architecture of the classifier and the functions that run the upsampling framework.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

from sklearn.metrics import classification_report

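def nn(x,y, x_test, y_test):
    # Train a small one-hidden-layer binary classifier on (x, y) and return
    # precision, recall and F1 for class 0 and class 1 on (x_test, y_test).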

    model = Sequential()
    model.add(Dense(10, input_shape=(x.shape[-1],), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(x, y, epochs=20, batch_size=32, verbose=0, validation_split=0.2)

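    # Threshold the sigmoid outputs at 0.5 to obtain hard 0/1 predictions.
    y_pred = model.predict(x_test)
    y_pred[y_pred>= 0.5] = 1.
    y_pred[y_pred< 0.5] = 0.
    result = classification_report(y_test, y_pred, output_dict=True)

    # The report is keyed by the string form of each label, so the keys
    # '0.0' and '1.0' assume float labels (0.0 and 1.0).
    precision_class_0 = result['0.0']['precision']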
    recall_class_0 = result['0.0']['recall']
    f1_class_0 = result['0.0']['f1-score']

    precision_class_1 = result['1.0']['precision']
    recall_class_1 = result['1.0']['recall']
    f1_class_1 = result['1.0']['f1-score']

    return precision_class_0, recall_class_0, f1_class_0, precision_class_1, recall_class_1, f1_class_1
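
To make the six values returned by `nn` concrete, the following NumPy-only sketch reproduces the thresholding step and the per-class precision, recall, and F1 that the classification report contains. The helper name `per_class_scores` is hypothetical (it is not part of the notebooks or of scikit-learn), and the sketch assumes binary labels coded as 0 and 1.

```python
import numpy as np

def per_class_scores(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: threshold probabilities, then score each class."""
    y_true = np.asarray(y_true).ravel().astype(int)
    # Same rule as in nn(): probabilities >= threshold become class 1.
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)

    scores = {}
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives
        fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
        fn = np.sum((y_pred != cls) & (y_true == cls))  # false negatives
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        scores[cls] = (float(precision), float(recall), float(f1))
    return scores

# Toy example: the second sample is a class-0 point predicted as class 1.
demo = per_class_scores([0, 0, 1, 1], [0.2, 0.7, 0.6, 0.9])
```

In this toy case class 0 has perfect precision but only half its samples are recalled, which is the kind of per-class asymmetry the experiments track separately for both classes.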


The code can be executed in Google Colab.  
To do so, first upload all three ipynb files to a single Google Drive directory (for example, /content/gdrive/My Drive/Colab Notebooks/GAN upsampling).  
The following code connects Google Colab to Google Drive:

In [None]:
import os
from google.colab import drive
drive.mount('/content/gdrive')

os.chdir('/content/gdrive/My Drive/Colab Notebooks/GAN upsampling/')

Mounted at /content/gdrive
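
Outside Colab the `google.colab` import fails, so a guarded variant can be handy when the notebooks are run locally. This is only a sketch under that assumption: `resolve_workdir` and the local fallback directory are hypothetical and not part of the original notebooks.

```python
import os

COLAB_DIR = '/content/gdrive/My Drive/Colab Notebooks/GAN upsampling/'

def resolve_workdir(local_dir='.'):
    """Return the Drive folder on Colab, or local_dir everywhere else."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Not on Colab: fall back to a local folder holding the notebooks.
        return os.path.abspath(local_dir)
    drive.mount('/content/gdrive')
    return COLAB_DIR

workdir = resolve_workdir()
# os.chdir(workdir)  # uncomment to switch into the chosen folder
```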


The following cell imports the necessary functions from utils.ipynb:

In [None]:
!pip install import-ipynb
import import_ipynb
from utils import five_fold_train_gan_then_balance_then_train_classifier, five_fold_train_classifier,CGANRAW,save_imgs

Collecting import-ipynb
  Downloading https://files.pythonhosted.org/packages/63/35/495e0021bfdcc924c7cdec4e9fbb87c88dd03b9b9b22419444dc370c8a45/import-ipynb-0.1.3.tar.gz
Building wheels for collected packages: import-ipynb
  Building wheel for import-ipynb (setup.py) ... done
  Created wheel for import-ipynb: filename=import_ipynb-0.1.3-cp37-none-any.whl size=2976 sha256=dc5ed9c1b11c7c9d358857ef7bb7220bfdca01f54b66f3ef773f955d990894ea
  Stored in directory: /root/.cache/pip/wheels/b4/7b/e9/a3a6e496115dffdb4e3085d0ae39ffe8a814eacc44bbf494b5
Successfully built import-ipynb
Installing collected packages: import-ipynb
Successfully installed import-ipynb-0.1.3
importing Jupyter notebook from utils.ipynb


The following function runs all the tests and appends the results to a CSV file in the current directory.

**Important:** Place an empty CSV file named **results.csv** in the current directory so that the results can be written to it.
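
Because the rows are appended without a header, it may help to create the file with a header row before the first run. The sketch below is one possible way to do that. The column names are made up for illustration (the notebook itself writes no header); only their order follows the row written by `all_actions`, and `ensure_results_file` is a hypothetical helper.

```python
import csv
import os

# Hypothetical column names; the order mirrors the row written by all_actions.
HEADER = [
    "n_features", "rotation", "class0_count", "imbalance_rate", "epochs",
    "p0", "r0", "f0", "p1", "r1", "f1",
    "p0_gan", "r0_gan", "f0_gan", "p1_gan", "r1_gan", "f1_gan",
]

def ensure_results_file(path="results.csv"):
    """Create the CSV with a header row if it is missing or empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(HEADER)
    return path
```

Calling `ensure_results_file()` once before the test cells leaves an existing, non-empty results file untouched, so earlier results are not overwritten.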

In [None]:
from csv import writer

import numpy as np  # needed by all_actions (np.copy)

def all_actions(name,
                data_,
                label,
                dominant_class_count,
                imbalance_rate ,
                classifier,
                epochs,
                rotation):
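  # Copy so the caller's array is untouched, then flatten each square image
  # of shape (side, side) into a row of side**2 features.
  data=np.copy(data_)
  data.shape = (-1,data.shape[-1]**2)
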
  print('train classifier on imbalanced dataset:')
  p0, r0, f0, p1, r1,f1 = five_fold_train_classifier(data = data
                               ,label = label
                               ,dominant_class_count = dominant_class_count
                               ,imbalance_rate = imbalance_rate
                               ,classifier = classifier)
  print('train classifier on balanced dataset (via GAN):')
  p0a, r0a, f0a,p1a, r1a,f1a = five_fold_train_gan_then_balance_then_train_classifier(data = data,
                                                           label = label,
                                                           dominant_class_count = dominant_class_count,
                                                           imbalance_rate = imbalance_rate,
                                                           classifier = classifier,
                                                           epochs = epochs)
  
  # CSV columns, in order:
      # number of features
      # rotation angle
      # class0 count
      # imbalance rate
      # epochs
      # class0 precision
      # class0 recall 
      # class0 f1
      # class1 precision
      # class1 recall 
      # class1 f1
      # class0 precision   after data augmentation
      # class0 recall      after data augmentation
      # class0 f1          after data augmentation
      # class1 precision   after data augmentation
      # class1 recall      after data augmentation
      # class1 f1          after data augmentation
 
  # Append one row per run; the with-block closes the file automatically.
  with open('results.csv', 'a', newline='') as f_object:
      writer_object = writer(f_object)
      writer_object.writerow([data.shape[-1],
                rotation,
                dominant_class_count,
                imbalance_rate,
                epochs,
                p0, r0, f0, p1, r1,f1,
                p0a, r0a, f0a,p1a, r1a,f1a])

  del(data)
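
The reshape at the top of `all_actions` assumes that each sample is a square image, so an array of shape `(n_samples, side, side)` becomes `(n_samples, side**2)`. A small sketch with made-up data (the array below is synthetic, not one of the notebook's datasets) illustrates the effect:

```python
import numpy as np

# Synthetic stand-in for a dataset of 5 square images, 28 x 28 pixels each.
images = np.zeros((5, 28, 28))

flat = np.copy(images)
# Same in-place reshape as in all_actions: one row of side**2 features per image.
flat.shape = (-1, flat.shape[-1] ** 2)

# An equivalent form that does not rely on the images being square.
flat_alt = images.reshape(len(images), -1)

print(images.shape, flat.shape, flat_alt.shape)
```

Since the copy is reshaped rather than the input, the arrays loaded from disk keep their image shape and can be passed to `all_actions` repeatedly.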




Read the datasets from the datasets folder:

In [None]:
import numpy as np
onesfours784 = np.load('datasets/onesfours_784.npy')
onesfours784_y = np.load('datasets/onesfours_784_y.npy')

onesfours196 = np.load('datasets/onesfours_196.npy')
onesfours196_y = np.load('datasets/onesfours_196_y.npy')

onesfours64 = np.load('datasets/onesfours_64.npy')
onesfours64_y = np.load('datasets/onesfours_64_y.npy')

onesfours16 = np.load('datasets/onesfours_16.npy')
onesfours16_y = np.load('datasets/onesfours_16_y.npy')





foursfours90_784 = np.load('datasets/foursfours90_784.npy')
foursfours90_784_y = np.load('datasets/foursfours90_784_y.npy')

foursfours90_196 = np.load('datasets/foursfours90_196.npy')
foursfours90_196_y = np.load('datasets/foursfours90_196_y.npy')

foursfours90_64 = np.load('datasets/foursfours90_64.npy')
foursfours90_64_y = np.load('datasets/foursfours90_64_y.npy')

foursfours90_16 = np.load('datasets/foursfours90_16.npy')
foursfours90_16_y = np.load('datasets/foursfours90_16_y.npy')




foursfours45_784 = np.load('datasets/foursfours45_784.npy')
foursfours45_784_y = np.load('datasets/foursfours45_784_y.npy')

foursfours45_196 = np.load('datasets/foursfours45_196.npy')
foursfours45_196_y = np.load('datasets/foursfours45_196_y.npy')

foursfours45_64 = np.load('datasets/foursfours45_64.npy')
foursfours45_64_y = np.load('datasets/foursfours45_64_y.npy')

foursfours45_16 = np.load('datasets/foursfours45_16.npy')
foursfours45_16_y = np.load('datasets/foursfours45_16_y.npy')





foursfours30_784 = np.load('datasets/foursfours30_784.npy')
foursfours30_784_y = np.load('datasets/foursfours30_784_y.npy')

foursfours30_196 = np.load('datasets/foursfours30_196.npy')
foursfours30_196_y = np.load('datasets/foursfours30_196_y.npy')

foursfours30_64 = np.load('datasets/foursfours30_64.npy')
foursfours30_64_y = np.load('datasets/foursfours30_64_y.npy')

foursfours30_16 = np.load('datasets/foursfours30_16.npy')
foursfours30_16_y = np.load('datasets/foursfours30_16_y.npy')



Tests for the dataset with 784 features and the second class rotated by 30 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours30_784',
                data_=foursfours30_784,
                label=foursfours30_784_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=30)

Tests for the dataset with 784 features and the second class rotated by 45 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours45_784',
                data_=foursfours45_784,
                label=foursfours45_784_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=45)

Tests for the dataset with 784 features and the second class rotated by 90 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours90_784',
                data_=foursfours90_784,
                label=foursfours90_784_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=90)

Tests for the dataset with 196 features and the second class rotated by 30 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours30_196',
                data_=foursfours30_196,
                label=foursfours30_196_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=30)

Tests for the dataset with 196 features and the second class rotated by 45 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours45_196',
                data_=foursfours45_196,
                label=foursfours45_196_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=45)

Tests for the dataset with 196 features and the second class rotated by 90 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours90_196',
                data_=foursfours90_196,
                label=foursfours90_196_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=90)

Tests for the dataset with 64 features and the second class rotated by 30 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours30_64',
                data_=foursfours30_64,
                label=foursfours30_64_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=30)

Tests for the dataset with 64 features and the second class rotated by 45 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours45_64',
                data_=foursfours45_64,
                label=foursfours45_64_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=45)

Tests for the dataset with 64 features and the second class rotated by 90 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours90_64',
                data_=foursfours90_64,
                label=foursfours90_64_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=90)

Tests for the dataset with 16 features and the second class rotated by 30 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours30_16',
                data_=foursfours30_16,
                label=foursfours30_16_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=30)

Tests for the dataset with 16 features and the second class rotated by 45 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours45_16',
                data_=foursfours45_16,
                label=foursfours45_16_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=45)

Tests for the dataset with 16 features and the second class rotated by 90 degrees:

In [None]:
for class0_count in [1000, 400, 200, 100]:
  for imbalance_rate in[0.4,0.2,0.1]:
    all_actions(name = 'class0of_'+str(class0_count)+'_imbalanceRateOf_'+str(imbalance_rate)+'_foursfours90_16',
                data_=foursfours90_16,
                label=foursfours90_16_y,
                dominant_class_count=class0_count,
                imbalance_rate=imbalance_rate ,
                classifier=nn,
                epochs=1500,
                rotation=90)
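
The twelve test cells above differ only in the dataset and the rotation angle, so the whole grid could also be driven by a single loop. The sketch below only enumerates the configurations and does not train anything. `iter_experiments` is a hypothetical helper, and the wiring to `all_actions` shown in the trailing comment assumes a dictionary `arrays` (not defined in the notebook) that maps dataset names to the loaded arrays.

```python
from itertools import product

FEATURES = [784, 196, 64, 16]
ROTATIONS = [30, 45, 90]
CLASS0_COUNTS = [1000, 400, 200, 100]
IMBALANCE_RATES = [0.4, 0.2, 0.1]

def iter_experiments():
    """Yield one dict per experiment, in the same order as the cells above."""
    for n_features, rotation in product(FEATURES, ROTATIONS):
        dataset = f"foursfours{rotation}_{n_features}"
        for class0_count, rate in product(CLASS0_COUNTS, IMBALANCE_RATES):
            yield {
                "name": f"class0of_{class0_count}_imbalanceRateOf_{rate}_{dataset}",
                "dataset": dataset,
                "rotation": rotation,
                "dominant_class_count": class0_count,
                "imbalance_rate": rate,
            }

experiments = list(iter_experiments())
print(len(experiments), "configurations")

# Assumed wiring (not run here), with `arrays` a hypothetical name -> array dict:
#   for e in experiments:
#       all_actions(name=e["name"], data_=arrays[e["dataset"]],
#                   label=arrays[e["dataset"] + "_y"],
#                   dominant_class_count=e["dominant_class_count"],
#                   imbalance_rate=e["imbalance_rate"],
#                   classifier=nn, epochs=1500, rotation=e["rotation"])
```

Keeping the grid in one place makes it easier to rerun a subset, for example a single rotation angle, without editing twelve cells.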