<a href="https://colab.research.google.com/github/noo-rashbass/synthetic-data-service/blob/master/doppelganger/gan/doppelganger_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Choose GPU runtime
Select Runtime > Change runtime type, set Hardware accelerator to GPU, and untick "Omit cell output...".
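After switching the runtime, a quick check from a code cell can confirm that a GPU is attached. The sketch below is not part of the original notebook. It assumes the `nvidia-smi` tool is on the PATH, which is typical for GPU runtimes but not guaranteed, and `gpu_visible` is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` runs successfully and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not installed: most likely a CPU-only runtime.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, recheck the runtime type before starting a long training run.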

#### To prevent Colab from disconnecting

Press Ctrl + Shift + I to open the browser developer tools, go to the Console tab, and paste the following:
~~~
function KeepClicking(){
console.log("Clicking");
document.querySelector("colab-connect-button").click()
}
setInterval(KeepClicking,60000)
~~~
or 
~~~
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button").click()
}
setInterval(ClickConnect,60000)
~~~

#### Upload the data files and DoppelGANger code from GitHub
Your directory should look like this:

-data  
  >--data_attribute_output.pkl  
  --data_feature_output.pkl  
  --data_train.npz  

-doppelganger.py  
-load_data.py  
-network.py  
-networkGenerator.py  
-output.py  
-util.py

Note: You do not need to upload main.py. An example of what the directory should look like with checkpoint folders is at the end of the notebook.  

To use first_dday as an attribute (separate column), upload the files from the data_attr folder on GitHub and put them under the data folder on Colab.

If first_dday is not used as an attribute, upload the files from the data folder on GitHub and put them under the data folder on Colab.
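Before running anything, the layout listed above can be verified from a code cell. This is a minimal sketch, not part of the original notebook: the file names are copied from the listing, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = [
    "data/data_attribute_output.pkl",
    "data/data_feature_output.pkl",
    "data/data_train.npz",
    "doppelganger.py",
    "load_data.py",
    "network.py",
    "networkGenerator.py",
    "output.py",
    "util.py",
]


def missing_files(root="."):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_files(".")
print("Missing files:", missing if missing else "none")
```

An empty result means every file from the listing was found relative to the current working directory.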

### Uploading checkpoints

If you downloaded checkpoint zip files in a previous session, upload the 3 zip files (tf_ckpts_ad.zip, tf_ckpts_d.zip, tf_ckpts_g.zip) and use the cells below to unzip them into their respective folders.

In [None]:
# unzip data file
!unzip data.zip

Archive:  data.zip
  inflating: data_train.npz          
  inflating: data_attribute_output.pkl  
  inflating: data_feature_output.pkl  


In [None]:
!rm -rf data.zip

In [None]:
!unzip tf_ckpts_ad.zip
!unzip tf_ckpts_d.zip
!unzip tf_ckpts_g.zip

Archive:  tf_ckpts_ad.zip
   creating: tf_ckpts_ad/
  inflating: tf_ckpts_ad/ckpt-800.index  
  inflating: tf_ckpts_ad/ckpt-799.data-00000-of-00001  
  inflating: tf_ckpts_ad/ckpt-798.index  
  inflating: tf_ckpts_ad/ckpt-799.index  
  inflating: tf_ckpts_ad/checkpoint  
  inflating: tf_ckpts_ad/ckpt-800.data-00000-of-00001  
  inflating: tf_ckpts_ad/ckpt-798.data-00000-of-00001  
Archive:  tf_ckpts_d.zip
   creating: tf_ckpts_d/
  inflating: tf_ckpts_d/ckpt-800.index  
  inflating: tf_ckpts_d/ckpt-799.data-00000-of-00001  
  inflating: tf_ckpts_d/ckpt-798.index  
  inflating: tf_ckpts_d/ckpt-799.index  
  inflating: tf_ckpts_d/checkpoint   
  inflating: tf_ckpts_d/ckpt-800.data-00000-of-00001  
  inflating: tf_ckpts_d/ckpt-798.data-00000-of-00001  
Archive:  tf_ckpts_g.zip
   creating: tf_ckpts_g/
  inflating: tf_ckpts_g/ckpt-800.index  
  inflating: tf_ckpts_g/ckpt-799.data-00000-of-00001  
  inflating: tf_ckpts_g/ckpt-798.index  
  inflating: tf_ckpts_g/ckpt-799.index  
  inflating:

In [None]:
# remove zip files you just uploaded
!rm -rf tf_ckpts_ad.zip
!rm -rf tf_ckpts_d.zip
!rm -rf tf_ckpts_g.zip
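
The unzip and clean-up commands above can also be written in plain Python, which skips archives that were not uploaded instead of printing an error. This is a sketch using the standard `zipfile` module. `extract_and_remove` is a hypothetical helper name, and the archive names are the ones used in the cells above.

```python
import zipfile
from pathlib import Path

# Archive names used in the cells above.
CHECKPOINT_ZIPS = ["tf_ckpts_ad.zip", "tf_ckpts_d.zip", "tf_ckpts_g.zip"]


def extract_and_remove(zip_names, root="."):
    """Extract each archive found under `root`, delete it, and return the names handled."""
    root = Path(root)
    handled = []
    for name in zip_names:
        archive = root / name
        if not archive.is_file():
            continue  # skip archives that were not uploaded
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        archive.unlink()  # same effect as `rm` on the uploaded zip
        handled.append(name)
    return handled


print("Extracted:", extract_and_remove(CHECKPOINT_ZIPS))
```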

#### Start training and generating data with DoppelGANger by running the cell below

Note: Change epochs, batch_size and cumsum_bool if needed

In [None]:
from load_data import *
from util import *
import tensorflow as tf
import numpy as np
import os

from network import make_discriminator, make_attrdiscriminator
from networkGenerator import DoppelGANgerGenerator
from doppelganger import DoppelGANger

## change these values if needed
seq_len = 130
batch_size = 64
epochs = 200
total_generate_num_sample = 1347
cumsum_bool = True                # set to True to run the code that tries 2 peaks
path_to_data = "data"

(data_feature, data_attribute, data_gen_flag, data_feature_outputs, data_attribute_outputs) = load_data(path_to_data)

print("-----DATA LOADING-----")
print(data_feature.shape)          # original features_dim        
print(data_attribute.shape)        # original attributes_dim
print(data_gen_flag.shape)
num_real_attribute = len(data_attribute_outputs)

(data_feature, data_attribute, data_attribute_outputs, real_attribute_mask) = \
    normalize_per_sample(data_feature, data_attribute, data_feature_outputs,data_attribute_outputs)

print("-----DATA NORMALIZATION-----")
print(real_attribute_mask)
print(data_feature.shape)
attributes_dim = data_attribute.shape[1]    # attributes_dim to be fed into model
print(data_attribute.shape)       
print(len(data_attribute_outputs))

print("-----ADD GEN FLAG -----")
data_feature, data_feature_outputs = add_gen_flag(
        data_feature, data_gen_flag, data_feature_outputs, seq_len)
features_dim = data_feature.shape[2]    # features dim to be fed into model
print(data_feature.shape)        
print(len(data_feature_outputs))

discriminator_model = make_discriminator(seq_len, features_dim, attributes_dim)
attrdiscriminator_model = make_attrdiscriminator(attributes_dim)

generator = DoppelGANgerGenerator(
        feed_back=False,
        noise=True,
        feature_outputs=data_feature_outputs,
        attribute_outputs=data_attribute_outputs,
        real_attribute_mask=real_attribute_mask,
        sample_len=seq_len)


gan = DoppelGANger(
    epoch=epochs, 
    batch_size=batch_size, 
    data_feature=data_feature, 
    data_attribute=data_attribute, 
    real_attribute_mask=real_attribute_mask, 
    data_gen_flag=data_gen_flag,
    seq_len=seq_len, 
    data_feature_outputs=data_feature_outputs, 
    data_attribute_outputs=data_attribute_outputs,
    generator = generator, 
    discriminator = discriminator_model, 
    d_rounds=1, 
    g_rounds=1, 
    d_gp_coe=10.,
    num_packing=1,
    attr_discriminator=attrdiscriminator_model,
    attr_d_gp_coe=10., 
    g_attr_d_coe=1.0,
    cumsum=cumsum_bool)

#combine data attributes and features into one to be fed into the model
# data_attribute_in = tf.expand_dims(data_attribute, axis=1)
# data_attribute_in = tf.repeat(data_attribute_in, repeats=seq_len, axis=1)
# data_all_in = tf.cast(tf.concat([data_feature, data_attribute_in], axis=2), dtype=tf.float32)

print("----START TRAINING-----")
#gan.compile()

# if any callbacks are needed
# callback1 = tf.keras.callbacks.EarlyStopping(monitor='d_loss', patience=3)
# callback2 = tf.keras.callbacks.EarlyStopping(monitor='ad_loss', patience=3)
# callback3 = tf.keras.callbacks.EarlyStopping(monitor='g_loss', patience=3)

#gan.fit(data_all_in, batch_size=batch_size, epochs=epochs) #, callbacks=[callback1, callback2]
gan.train_step()

print("----FINISHED TRAINING-----")

print("----START GENERATING------")

if data_feature.shape[1] % seq_len != 0:
    raise Exception("length must be a multiple of sample_len")
length = data_feature.shape[1] // seq_len
real_attribute_input_noise = gan.gen_attribute_input_noise(total_generate_num_sample) #(?,5)
addi_attribute_input_noise = gan.gen_attribute_input_noise(total_generate_num_sample) #(?,5)
feature_input_noise = gan.gen_feature_input_noise(total_generate_num_sample, length) #(?,1,5)
input_data = gan.gen_feature_input_data_free(total_generate_num_sample) #(?,28)

features, attributes, gen_flags, lengths = \
    gan.sample_from(real_attribute_input_noise, addi_attribute_input_noise,feature_input_noise, input_data)
# specify given_attribute parameter, if you want to generate
# data according to an attribute
print("----SAMPLE FROM-----")
print(features.shape)
print(attributes.shape)
print(gen_flags.shape)
print(lengths.shape)

features, attributes = renormalize_per_sample(features, attributes, data_feature_outputs,
    data_attribute_outputs, gen_flags, num_real_attribute=num_real_attribute)
print("----RENORMALIZATION-----")
print(features.shape)
print(attributes.shape)

np.savez(
        "generated_data_train.npz",
        data_feature=features,
        data_attribute=attributes,
        data_gen_flag=gen_flags)

print("Done")

-----DATA LOADING-----
(1347, 130, 47)
(1347, 1)
(1347, 130)
-----DATA NORMALIZATION-----
[True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]
(1347, 130, 47)
(1347, 35)
35
-----ADD GEN FLAG -----
(1347, 130, 49)
27
----START TRAINING-----
G restored from tf_ckpts_g/ckpt-800
D restored from tf_ckpts_d/ckpt-800
AD restored from tf_ckpts_ad/ckpt-800
epoch:  0


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.




Remember to download the generated file "generated_data_train.npz" by selecting the 3 vertical dots next to the file name and choosing Download.
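
Before downloading, it can be worth confirming that the archive holds the three arrays written by `np.savez` in the training cell. The sketch below is not part of the original notebook; `summarize_npz` is a hypothetical helper name, and the check only reports array names and shapes.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Only inspect the file if the training cell has already produced it.
if os.path.exists("generated_data_train.npz"):
    for name, shape in summarize_npz("generated_data_train.npz").items():
        print(name, shape)
```

The keys should be `data_feature`, `data_attribute` and `data_gen_flag`, matching the keyword arguments passed to `np.savez`.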

Remember to also download the checkpoint folders so that they can be loaded during the next training session.

In [None]:
# zip the checkpoint folders
!zip -r tf_ckpts_ad.zip tf_ckpts_ad
!zip -r tf_ckpts_d.zip tf_ckpts_d
!zip -r tf_ckpts_g.zip tf_ckpts_g

  adding: tf_ckpts_ad/ (stored 0%)
  adding: tf_ckpts_ad/ckpt-999.index (deflated 68%)
  adding: tf_ckpts_ad/ckpt-998.index (deflated 68%)
  adding: tf_ckpts_ad/checkpoint (deflated 63%)
  adding: tf_ckpts_ad/ckpt-1000.index (deflated 68%)
  adding: tf_ckpts_ad/ckpt-1000.data-00000-of-00001 (deflated 25%)
  adding: tf_ckpts_ad/ckpt-998.data-00000-of-00001 (deflated 25%)
  adding: tf_ckpts_ad/ckpt-999.data-00000-of-00001 (deflated 25%)
  adding: tf_ckpts_d/ (stored 0%)
  adding: tf_ckpts_d/ckpt-999.index (deflated 68%)
  adding: tf_ckpts_d/ckpt-998.index (deflated 68%)
  adding: tf_ckpts_d/checkpoint (deflated 63%)
  adding: tf_ckpts_d/ckpt-1000.index (deflated 68%)
  adding: tf_ckpts_d/ckpt-1000.data-00000-of-00001 (deflated 30%)
  adding: tf_ckpts_d/ckpt-998.data-00000-of-00001 (deflated 30%)
  adding: tf_ckpts_d/ckpt-999.data-00000-of-00001 (deflated 30%)
  adding: tf_ckpts_g/ (stored 0%)
  adding: tf_ckpts_g/ckpt-999.index (deflated 82%)
  adding: tf_ckpts_g/ckpt-998.index (deflated
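
The zip step above can also be sketched in Python with the standard `shutil.make_archive`, for cases where the shell command is unavailable. `zip_folders` is a hypothetical helper name; the folder names are the checkpoint folders used throughout this notebook.

```python
import shutil
from pathlib import Path

CHECKPOINT_DIRS = ["tf_ckpts_ad", "tf_ckpts_d", "tf_ckpts_g"]


def zip_folders(folder_names, root="."):
    """Create <folder>.zip next to each existing folder and return the archive paths."""
    root = Path(root)
    archives = []
    for name in folder_names:
        if not (root / name).is_dir():
            continue  # nothing to archive for this name
        # base_dir keeps the folder name inside the archive, like `zip -r name.zip name`.
        archive = shutil.make_archive(
            str(root / name), "zip", root_dir=str(root), base_dir=name
        )
        archives.append(archive)
    return archives


print("Created:", zip_folders(CHECKPOINT_DIRS))
```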

In [None]:
# download zip files (might only work with Google Chrome; if it doesn't work, download each zip manually via the 3 vertical dots next to the file name)
from google.colab import files
files.download("tf_ckpts_ad.zip")
files.download("tf_ckpts_d.zip")
files.download("tf_ckpts_g.zip")


In [None]:
files.download("generated_data_train.npz")


Take note of what the file directory looks like with the checkpoint folders, and make sure it looks the same the next time you run the training:

-data  
  >--data_attribute_output.pkl  
  --data_feature_output.pkl  
  --data_train.npz  

-doppelganger.py  
-load_data.py  
-network.py  
-networkGenerator.py  
-output.py  
-util.py  
-tf_ckpts_ad  (folder)  
-tf_ckpts_d  (folder)  
-tf_ckpts_g  (folder)  
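
When checkpoints are re-uploaded for a later session, it helps to confirm that all three folders are at the same step. The sketch below is not part of the original notebook. It relies only on the `ckpt-N.index` file naming visible in the unzip output earlier, and `latest_checkpoint_step` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches index files such as "ckpt-800.index" and captures the number.
_INDEX_PATTERN = re.compile(r"^ckpt-(\d+)\.index$")


def latest_checkpoint_step(folder):
    """Return the highest N among files named ckpt-N.index in `folder`, or None."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    steps = [
        int(m.group(1))
        for p in folder.iterdir()
        if (m := _INDEX_PATTERN.match(p.name))
    ]
    return max(steps, default=None)


for name in ["tf_ckpts_ad", "tf_ckpts_d", "tf_ckpts_g"]:
    print(name, latest_checkpoint_step(name))
```

In the training log above, all three models were restored from `ckpt-800`, so matching numbers across the three folders are the expected state.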

#### How to terminate runtime

1. In the top right corner, next to the RAM and Disk indicators, click the downward arrow

2. Select Manage Sessions

3. Select Terminate

**Note that all uploaded and generated files will be deleted when you terminate runtime**


In [None]:
# if you need to delete any folders

#!rm -rf <replace_with_name_of_folder>