# **Team 4: 권성수, 문대정, 이재철, 조은경**

<h1>Text2Image implementation with StackGAN</h1>

# Motivation for the project and an explanation of the problem statement

<h2>The power of visualization</h2>

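* Humans are said to grasp visualized information easily.
* Of the five senses, vision is the channel through which 70-80% of information is acquired.
* Once information can be visualized, it can be taken in faster by "seeing" rather than "reading".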

![Visualization example](https://i.ibb.co/CMwcnZb/num-dummy.png)

<h2>Text2Image? => StackGAN</h2>

![StackGAN text-to-image example](https://miro.medium.com/max/3356/1*g-0onhpbu6dU0aZbpfEUeA.jpeg)

# A description of the data

![CUB-200-2011 sample images](https://vision.cornell.edu/se3/wp-content/uploads/2017/04/Screenshot-from-2017-04-29-16-36-17-705x456.png)

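* Caltech-UCSD Birds-200-2011 (CUB-200-2011)
* 11,788 images of 200 different bird species in total
* Each image comes with a segmentation of the bird, a bounding box, and attribute descriptions organized by category.
* e.g., the color of each body part, whether the size is small/medium/large, and the pattern of each part, all labeled with English words.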
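The loader in the code section reads two of the dataset's plain-text annotation files: `images.txt`, with one `<image_id> <image_name>` per line, and `bounding_boxes.txt`, with one `<image_id> <x> <y> <width> <height>` per line. A minimal sketch of parsing one line of each; the two example lines are illustrative, and the helper names `parse_image_line` / `parse_bbox_line` are hypothetical:

```python
# Parse one line of CUB-200-2011's images.txt and bounding_boxes.txt.
# The two lines below are illustrative examples of the file formats.
images_line = "1 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg"
bbox_line = "1 60.0 27.0 325.0 304.0"


def parse_image_line(line):
    """Return (image_id, relative_path) from an images.txt line."""
    image_id, path = line.split(maxsplit=1)
    return int(image_id), path


def parse_bbox_line(line):
    """Return (image_id, [x, y, width, height]) as ints."""
    fields = line.split()
    image_id = int(fields[0])
    # The notebook's loader applies .astype(int), which truncates the float coordinates
    box = [int(float(v)) for v in fields[1:]]
    return image_id, box


image_id, path = parse_image_line(images_line)
key = path[:-4]  # drop ".jpg": the same key format load_bounding_boxes() uses
_, box = parse_bbox_line(bbox_line)
print(image_id, key, box)
```

Rows with the same index in the two files describe the same image, which is why `load_bounding_boxes()` can pair them by row position.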

# Hyperparameter and architecture choices that were explored

<h2>Idea</h2>

* Earlier text-to-image methods: generate an image from the given sentence with a single GAN.
* Text-to-image is a hard problem, so **split it into two sub-problems and generate higher-resolution images!**

<h2>StackGAN Model Architecture</h2>



![StackGAN model architecture](https://miro.medium.com/max/4756/1*NwDrP1Zi6xj1bGN62wrf3g.jpeg)

<h2>StackGAN Model - Conditioning Augmentation</h2>

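* A component introduced to counter the discontinuity in the high-dimensional text-embedding space that limited training data causes.

* A conditioning variable sampled from $\mathcal{N}(\mu(\rho_t), \sigma(\rho_t))$, where $\rho_t$ is the text embedding, is used as the text feature and added to the generator's input.

* Because the text feature is sampled from a Gaussian distribution, randomness is added, so even the same sentence can produce diverse images.

* This is achieved by adding the regularization term $D_{KL}\big(\mathcal{N}(\mu(\rho_t), \sigma(\rho_t)) \,\|\, \mathcal{N}(0, I)\big)$ to the generator loss.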


![Conditioning Augmentation](https://i.ibb.co/vH2D052/1.png)
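A minimal NumPy sketch of conditioning augmentation, assuming (as the notebook's `generate_c` and `KL_loss` do) that the 256-d CA output is split into a 128-d mean and a 128-d log standard deviation. Sampling uses the reparameterization $\hat{c} = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and the KL term has the closed form $-\log\sigma_j + \tfrac{1}{2}(\sigma_j^2 + \mu_j^2 - 1)$ per dimension. The helper names are hypothetical:

```python
import numpy as np


def sample_condition(mean_logsigma, rng):
    """Split (batch, 256) into mean / log-sigma and sample c = mu + sigma * eps."""
    mu, log_sigma = mean_logsigma[:, :128], mean_logsigma[:, 128:]
    eps = rng.standard_normal(mu.shape)  # independent noise per sample and per dimension
    return mu + np.exp(log_sigma) * eps


def kl_to_standard_normal(mean_logsigma):
    """KL(N(mu, sigma^2) || N(0, 1)) per dimension, averaged over batch and dimensions."""
    mu, log_sigma = mean_logsigma[:, :128], mean_logsigma[:, 128:]
    kl = -log_sigma + 0.5 * (np.exp(2.0 * log_sigma) + mu ** 2 - 1.0)
    return float(kl.mean())


rng = np.random.default_rng(0)
params = np.zeros((4, 256))  # mu = 0, log sigma = 0: exactly the prior N(0, I)
c = sample_condition(params, rng)
print(c.shape)
print(kl_to_standard_normal(params))
```

Calling `sample_condition` twice on the same parameters returns different vectors, which is where the per-sentence diversity described above comes from; the KL term is zero only when the predicted distribution equals the prior.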

<h2>StackGAN Model - Stage1</h2>

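* The stage that uses a GAN to first generate a low-resolution image sketching the basic shape and colors described by the text.

* Generator: generates an initial low-resolution image from the conditioning variable and noise.

* Discriminator: judges real / fake from the image and the text feature.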


![StackGAN Stage-I](https://i.ibb.co/R3t6hLK/2.png)
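Following `build_stage1_generator()` in the code section: the 128-d conditioning vector is concatenated with 100-d noise, projected by a Dense layer and reshaped to 4x4x1024, then upsampled four times (UpSampling2D(2x2) followed by a stride-1 'same' 3x3 convolution) before a final 3-channel convolution with tanh. A small sketch that traces those shapes; the function name is hypothetical:

```python
def stage1_generator_shapes(start=4, start_channels=128 * 8,
                            channels=(512, 256, 128, 64)):
    """Trace (height, width, channels) through the Stage-I generator's upsampling path."""
    shapes = [(start, start, start_channels)]  # after Dense + Reshape
    size = start
    for ch in channels:
        size *= 2                              # UpSampling2D(size=(2, 2))
        shapes.append((size, size, ch))        # 3x3 'same' conv with stride 1 keeps the size
    shapes.append((size, size, 3))             # final Conv2D(3) + tanh
    return shapes


for shape in stage1_generator_shapes():
    print(shape)
```

The four doublings (4 to 64) are what make the Stage-I output exactly 64x64x3, the input size the Stage-I discriminator expects.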

<h2>StackGAN Model - Stage2</h2>

* The stage that corrects the image generated in Stage-I and adds finer details.

* Generator: generates a more detailed, high-resolution image from the conditioning variable and the Stage-I output.

* Discriminator: judges real / fake from the image and the text feature.


![StackGAN Stage-II](https://i.ibb.co/C5pqPcP/4.png)
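Following `build_stage2_generator()` in the code section, the 64x64 Stage-I image is encoded down to 16x16, joined with the conditioning vector, refined by four residual blocks, and upsampled four times to 256x256. The sketch below traces the shapes with the usual output-size formula for a 'valid' convolution after zero padding, $\lfloor (n + 2p - k)/s \rfloor + 1$; the helper names are hypothetical:

```python
def conv_out(size, kernel, stride, pad):
    """Spatial size after ZeroPadding2D(pad) followed by a 'valid' Conv2D."""
    return (size + 2 * pad - kernel) // stride + 1


def stage2_generator_shapes(lr_size=64, c_dim=128):
    """Trace (step, (h, w, c)) through build_stage2_generator()."""
    trace = []
    size = conv_out(lr_size, kernel=3, stride=1, pad=1)
    trace.append(("encoder 3x3", (size, size, 128)))
    size = conv_out(size, kernel=4, stride=2, pad=1)
    trace.append(("encoder 4x4 / 2", (size, size, 256)))
    size = conv_out(size, kernel=4, stride=2, pad=1)
    trace.append(("encoder 4x4 / 2", (size, size, 512)))
    trace.append(("joint (tile c + concat)", (size, size, 512 + c_dim)))
    size = conv_out(size, kernel=3, stride=1, pad=1)
    trace.append(("joint 3x3", (size, size, 512)))
    trace.append(("4 residual blocks", (size, size, 512)))  # residual blocks keep the shape
    for ch in (512, 256, 128, 64):
        size *= 2
        trace.append(("upsample + 3x3", (size, size, ch)))
    trace.append(("output conv + tanh", (size, size, 3)))
    return trace


for step, shape in stage2_generator_shapes():
    print(f"{step:26s} {shape}")
```

The 16x16 bottleneck is why `joint_block` tiles the conditioning vector to a 16x16 grid.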

# Implementation approaches
1. No batch normalization + Adam optimizer
2. No batch normalization + RMSprop optimizer
3. No batch normalization + RMSprop optimizer + switch to the Wasserstein loss
4. Batch normalization + Adam optimizer
5. Batch normalization + Adam optimizer + switch to the Wasserstein loss + added Dropout
6. Training with 10x the original learning rate



# Code implementation

> ## Mount / Extract

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import os
import tarfile

# Path to the CUB-200-2011 archive on Google Drive
fname = '/content/gdrive/My Drive/dl_teamproject_folder/CUB_200_2011/CUB_200_2011.tgz'

# Open the archive and extract it; the context manager closes it afterwards.
# extractall() takes the destination directory; if omitted, it extracts into the current working directory.
with tarfile.open(fname) as ap:
    ap.extractall('/content/gdrive/My Drive/dl_teamproject_folder/CUB_200_2011')

> ## Importing Libraries

In [None]:
import os
import pickle
import random
import time

import PIL
import numpy as np
import pandas as pd
import tensorflow as tf
from PIL import Image
from keras import Input, Model
from keras import backend as K
from keras.callbacks import TensorBoard
from keras.layers import Dense, LeakyReLU, BatchNormalization, ReLU, Reshape, UpSampling2D, Conv2D, Activation, \
    concatenate, Flatten, Lambda, Concatenate, ZeroPadding2D, Dropout
from keras.optimizers import Adam
from matplotlib import pyplot as plt
from keras.layers import add

> ## Loading of Dataset

In [None]:
# Load a pickle file (a serialized Python object such as a list, not plain text); this one holds the class labels
def load_class_ids(class_info_file_path):
    with open(class_info_file_path, 'rb') as f:
        class_ids = pickle.load(f, encoding='latin1')
        return class_ids

# Load the pre-computed text embeddings
def load_embeddings(embeddings_file_path):
    with open(embeddings_file_path, 'rb') as f:
        embeddings = pickle.load(f, encoding='latin1')
        embeddings = np.array(embeddings)
        print('embeddings: ', embeddings.shape)
    return embeddings

# Load the list of image file names from a pickle file
def load_filenames(filenames_file_path):
    with open(filenames_file_path, 'rb') as f:
        filenames = pickle.load(f, encoding='latin1')
    return filenames

# Build a dictionary mapping each image file name (without ".jpg") to its bounding box [x, y, width, height]
def load_bounding_boxes(dataset_dir):
    # Paths
    bounding_boxes_path = os.path.join(dataset_dir, 'bounding_boxes.txt')
    file_paths_path = os.path.join(dataset_dir, 'images.txt')

    # Read bounding_boxes.txt and images.txt (whitespace-separated values)
    df_bounding_boxes = pd.read_csv(bounding_boxes_path, sep=r'\s+', header=None).astype(int)
    df_file_names = pd.read_csv(file_paths_path, sep=r'\s+', header=None)

    # Create a list of file names
    file_names = df_file_names[1].tolist()

    # Assign each image its bounding box; both files list images in the same order
    filename_boundingbox_dict = {}
    for i in range(0, len(file_names)):
        # Get the bounding box (column 0 is the image id)
        bounding_box = df_bounding_boxes.iloc[i][1:].tolist()
        key = file_names[i][:-4]
        filename_boundingbox_dict[key] = bounding_box

    return filename_boundingbox_dict

# Crop the image to a square region around its bounding box, then resize it to image_size
def get_img(img_path, bbox, image_size):
    img = Image.open(img_path).convert('RGB')
    width, height = img.size
    if bbox is not None:
        R = int(np.maximum(bbox[2], bbox[3]) * 0.75)
        center_x = int((2 * bbox[0] + bbox[2]) / 2)
        center_y = int((2 * bbox[1] + bbox[3]) / 2)
        y1 = np.maximum(0, center_y - R)
        y2 = np.minimum(height, center_y + R)
        x1 = np.maximum(0, center_x - R)
        x2 = np.minimum(width, center_x + R)
        img = img.crop([x1, y1, x2, y2])
    img = img.resize(image_size, PIL.Image.BILINEAR)
    return img

# Load a dataset split for training: returns the images, class labels, and matching embeddings
def load_dataset(filenames_file_path, class_info_file_path, cub_dataset_dir, embeddings_file_path, image_size):
    filenames = load_filenames(filenames_file_path)
    class_ids = load_class_ids(class_info_file_path)
    bounding_boxes = load_bounding_boxes(cub_dataset_dir)
    all_embeddings = load_embeddings(embeddings_file_path)

    X, y, embeddings = [], [], []
    print("Embeddings shape:", all_embeddings.shape)

    for index, filename in enumerate(filenames):
        bounding_box = bounding_boxes[filename]
        try:
            img_name = '{}/images/{}.jpg'.format(cub_dataset_dir, filename)
            img = get_img(img_name, bounding_box, image_size)

            all_embeddings1 = all_embeddings[index, :, :]

            embedding_ix = random.randint(0, all_embeddings1.shape[0] - 1)  # pick one of this image's caption embeddings at random
            embedding = all_embeddings1[embedding_ix, :]

            X.append(np.array(img))
            y.append(class_ids[index])
            embeddings.append(embedding)
        except Exception as e:
            print(e)

    X = np.array(X)
    y = np.array(y)
    embeddings = np.array(embeddings)
    return X, y, embeddings
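`get_img()` crops a square whose side is about 1.5x the longer bounding-box edge, centered on the box and clipped to the image borders. The same arithmetic as a standalone function, so the crop can be checked without loading images; the name `crop_box` and the example image sizes are hypothetical:

```python
def crop_box(bbox, width, height):
    """Reproduce get_img()'s crop: returns (x1, y1, x2, y2) for a CUB bbox [x, y, w, h]."""
    x, y, w, h = bbox
    R = int(max(w, h) * 0.75)         # half of the square's side
    center_x = int((2 * x + w) / 2)   # box center
    center_y = int((2 * y + h) / 2)
    x1 = max(0, center_x - R)         # clip to the image borders
    y1 = max(0, center_y - R)
    x2 = min(width, center_x + R)
    y2 = min(height, center_y + R)
    return x1, y1, x2, y2


# A large box gets clipped by the image border; a small one keeps its square crop
print(crop_box([60, 27, 325, 304], width=500, height=375))
print(crop_box([100, 100, 50, 40], width=500, height=500))
```

Because of the clipping, the crop is not always square, so the later `resize(image_size)` can slightly change the bird's aspect ratio.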

> ## Model Creation

In [None]:
def generate_c(x):
    mean = x[:, :128]       # first 128 dims: mean, shape (batch, 128)
    log_sigma = x[:, 128:]  # last 128 dims: log standard deviation
    stddev = K.exp(log_sigma)
    # Standard-normal noise with the same shape as mean, so each sample gets its own epsilon
    epsilon = K.random_normal(shape=K.shape(mean))
    c = stddev * epsilon + mean  # reparameterized text conditioning variable (c-hat_0 in the architecture figure)
    return c

# Conditioning augmentation: maps the 1024-d text embedding to the conditioning latent parameters (256 = 128 means + 128 log sigmas)
def build_ca_model():
    input_layer = Input(shape=(1024,))
    x = Dense(256)(input_layer)
    x = LeakyReLU(alpha=0.2)(x)
    model = Model(inputs=[input_layer], outputs=[x])
    return model  # Takes an embedding of shape (1024,) and returns a tensor of shape (256,)
  
def build_embedding_compressor_model():
    input_layer = Input(shape=(1024,))
    x = Dense(128)(input_layer)
    x = ReLU()(x)
    model = Model(inputs=[input_layer], outputs=[x])
    return model


def build_stage1_generator():
    input_layer = Input(shape=(1024,))  # text embedding (1024-d); the noise is input_layer2 below
    x = Dense(256)(input_layer)
    mean_logsigma = LeakyReLU(alpha=0.2)(x)

    c = Lambda(generate_c)(mean_logsigma)

    input_layer2 = Input(shape=(100,))

    gen_input = Concatenate(axis=1)([c, input_layer2]) #text-conditioning variable/noise variable

    x = Dense(128 * 8 * 4 * 4, use_bias=False)(gen_input)
    x = ReLU()(x)

    x = Reshape((4, 4, 128 * 8), input_shape=(128 * 8 * 4 * 4,))(x)  # (batch, 16384) -> (batch, 4, 4, 1024)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(512, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)  # BatchNorm follows, so the conv above uses use_bias=False
    x = ReLU()(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(256, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(128, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(64, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = Conv2D(3, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = Activation(activation='tanh')(x)  # 64x64x3 low-resolution image with values in [-1, 1]

    stage1_gen = Model(inputs=[input_layer, input_layer2], outputs=[x, mean_logsigma])
    stage1_gen.summary()

    # NOTE: this checkpoint callback is never passed to fit(); the training loop saves weights with save_weights()
    checkpoint_path = '/content/gdrive/My Drive/dl_teamproject_folder/filepath/gen/model.{epoch:02d}.hdf5'
    checkpoint_dir = os.path.dirname(checkpoint_path)

    cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path, verbose=1, save_weights_only=True, period=5)

    return stage1_gen

In [None]:
stage1_generator = build_stage1_generator()
stage1_generator.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 1024)         0                                            
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 256)          262400      input_1[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 256)          0           dense_1[0][0]                    
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 128)          0           leaky_re_lu_1[0][0]              
____________________________________________________________________________________________

In [None]:
def build_stage1_discriminator():
    """
    discriminator는 모델 아키텍쳐 그림에서처럼 2개의 input을 받는다 
    1) generator거쳐서 upsampling된 네트워크를 다시 downsampling해서 만든 3차원의 4x4x512의 네트워크
    2) 3번에서 concatenate하기 위해 embedding layer를 같은 shape으로 만들어준다. 4x4x128 
    3. Concatenate 시키고, 마지막 로짓값(0~1)을 얻기 위해 마지막 모듈(merged_input ~ x2)로 넣어준다.
    """
    input_layer = Input(shape=(64, 64, 3))

    x = Conv2D(64, (4, 4),
               padding='same', strides=2,
               input_shape=(64, 64, 3), use_bias=False)(input_layer)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)

    x = Conv2D(128, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)

    x = Conv2D(256, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    
    

    x = Conv2D(512, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    

    input_layer2 = Input(shape=(4, 4, 128))

    merged_input = concatenate([x, input_layer2])

    x2 = Conv2D(64 * 8, kernel_size=1,
                padding="same", strides=1)(merged_input)
    x2 = BatchNormalization()(x2)
    x2 = LeakyReLU(alpha=0.2)(x2)
    x2 = Flatten()(x2)
    x2 = Dense(1)(x2)
    x2 = Activation('sigmoid')(x2)

    stage1_dis = Model(inputs=[input_layer, input_layer2], outputs=[x2])
    # NOTE: this checkpoint callback is never passed to fit(); the training loop saves weights with save_weights()
    checkpoint_path2 = '/content/gdrive/My Drive/dl_teamproject_folder/filepath/dis/model.{epoch:02d}.hdf5'
    checkpoint_dir2 = os.path.dirname(checkpoint_path2)

    cp_callback2 = tf.keras.callbacks.ModelCheckpoint(checkpoint_path2, verbose=1, save_weights_only=True, period=5)
    return stage1_dis
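With `padding='same'`, a stride-2 convolution in Keras produces $\lceil n/2 \rceil$ outputs per spatial axis, which is why the four downsampling convolutions above reduce a 64x64 input to the 4x4 grid that the 4x4x128 text input is concatenated with (giving 4x4x640 channels). The same arithmetic covers the six stride-2 convolutions of the Stage-II discriminator (256 down to 4). A small sketch; the function name is hypothetical:

```python
import math


def same_conv_sizes(size, n_layers, stride=2):
    """Spatial sizes after successive Conv2D(padding='same', strides=stride) layers."""
    sizes = []
    for _ in range(n_layers):
        size = math.ceil(size / stride)  # 'same' padding: output = ceil(input / stride)
        sizes.append(size)
    return sizes


print(same_conv_sizes(64, 4))   # Stage-I discriminator
print(same_conv_sizes(256, 6))  # Stage-II discriminator
```

Both discriminators therefore end on the same 4x4 grid, so the same (4, 4, 128) compressed-embedding input works for each.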

In [None]:
stage1_discriminator = build_stage1_discriminator()
stage1_discriminator.summary()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            (None, 64, 64, 3)    0                                            
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 32, 32, 64)   3072        input_4[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 32, 32, 64)   0           conv2d_7[0][0]                   
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 32, 32, 64)   0           leaky_re_lu_3[0][0]              
____________________________________________________________________________________________

In [None]:
def build_adversarial_model(gen_model, dis_model):
    input_layer = Input(shape=(1024,))  # 1024-d text embedding fed to the Stage-I generator
    input_layer2 = Input(shape=(100,))  # 100-d noise vector
    input_layer3 = Input(shape=(4, 4, 128))  # compressed embedding, tiled to a 4x4 grid

    x, mean_logsigma = gen_model([input_layer, input_layer2])  # generated image and the CA (mean, log sigma) output

    dis_model.trainable = False  # freeze D so only G is updated through this combined model
    valid = dis_model([x, input_layer3])  # D's probability that the generated image is real and matches the text

    model = Model(inputs=[input_layer, input_layer2, input_layer3], outputs=[valid, mean_logsigma])
    return model

In [None]:
def residual_block(input):
    """
    그래디언트가 잘 흐를 수 있도록 일종의 지름길(shortcut, skip connection)을 만들어 주자는 생각
    """
    x = Conv2D(128 * 4, kernel_size=(3, 3), padding='same', strides=1)(input)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = Conv2D(128 * 4, kernel_size=(3, 3), strides=1, padding='same')(x)
    x = BatchNormalization()(x)

    x = add([x, input])
    x = ReLU()(x)

    return x

def joint_block(inputs):  # concatenate the CA conditioning vector c with the encoded image features x
    c = inputs[0]
    x = inputs[1]

    c = K.expand_dims(c, axis=1)
    c = K.expand_dims(c, axis=1)
    c = K.tile(c, [1, 16, 16, 1])
    return K.concatenate([c, x], axis=3)
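`joint_block` can be hard to read through the backend calls. A NumPy equivalent, assuming the shapes of `build_stage2_generator()` (a 128-d conditioning vector and a 16x16x512 encoder output). Unlike the Keras version, which hard-codes the 16x16 tiling, this sketch reads the grid size from `x`; the function name is hypothetical:

```python
import numpy as np


def joint_block_np(c, x):
    """Tile c over x's spatial grid and concatenate on the channel axis (c first)."""
    _, height, width, _ = x.shape
    c = c[:, np.newaxis, np.newaxis, :]    # (batch, 1, 1, c_dim), like two K.expand_dims calls
    c = np.tile(c, (1, height, width, 1))  # (batch, H, W, c_dim), like K.tile
    return np.concatenate([c, x], axis=3)  # like K.concatenate([c, x], axis=3)


c = np.arange(2 * 128, dtype=float).reshape(2, 128)  # stand-in conditioning vectors
x = np.zeros((2, 16, 16, 512))                        # stand-in encoder output
out = joint_block_np(c, x)
print(out.shape)
```

Every spatial position receives an identical copy of the text condition, so the following convolutions can combine it with the local image features.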
  

In [None]:
def build_stage2_generator():
    """
    CA 네트워크를 포함한 stage 2 generator 생성
    """

    # 1. CA Augmentation Network
    input_layer = Input(shape=(1024,))
    input_lr_images = Input(shape=(64, 64, 3))

    ca = Dense(256)(input_layer)
    mean_logsigma = LeakyReLU(alpha=0.2)(ca)
    c = Lambda(generate_c)(mean_logsigma)

    # 2. Image Encoder
    x = ZeroPadding2D(padding=(1, 1))(input_lr_images)
    x = Conv2D(128, kernel_size=(3, 3), strides=1, use_bias=False)(x)
    x = ReLU()(x)

    x = ZeroPadding2D(padding=(1, 1))(x)
    x = Conv2D(256, kernel_size=(4, 4), strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = ZeroPadding2D(padding=(1, 1))(x)
    x = Conv2D(512, kernel_size=(4, 4), strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    # 3. Joint
    c_code = Lambda(joint_block)([c, x])

    x = ZeroPadding2D(padding=(1, 1))(c_code)
    x = Conv2D(512, kernel_size=(3, 3), strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    # 4. Residual blocks
    x = residual_block(x)
    x = residual_block(x)
    x = residual_block(x)
    x = residual_block(x)

    # 5. Upsampling blocks
    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(512, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(256, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(128, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(64, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    x = Conv2D(3, kernel_size=3, padding="same", strides=1, use_bias=False)(x)
    x = Activation('tanh')(x)

    model = Model(inputs=[input_layer, input_lr_images], outputs=[x, mean_logsigma])
    return model

In [None]:
def build_stage2_discriminator():
    """
    stage 2 discriminator 모델 만들기
    """
    input_layer = Input(shape=(256, 256, 3))

    x = Conv2D(64, (4, 4), padding='same', strides=2, input_shape=(256, 256, 3), use_bias=False)(input_layer)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    


    x = Conv2D(128, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    

    x = Conv2D(256, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    

    x = Conv2D(512, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    

    x = Conv2D(1024, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x) 
    x = Dropout(0.3)(x)    


    x = Conv2D(2048, (4, 4), padding='same', strides=2, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    

    x = Conv2D(1024, (1, 1), padding='same', strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.3)(x)    

    x = Conv2D(512, (1, 1), padding='same', strides=1, use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Dropout(0.3)(x)    

    x2 = Conv2D(128, (1, 1), padding='same', strides=1, use_bias=False)(x)
    x2 = BatchNormalization()(x2)
    x2 = LeakyReLU(alpha=0.2)(x2)
    x2 = Dropout(0.3)(x2)  # dropout on the branch output; Dropout(0.3)(x) would discard this branch

    x2 = Conv2D(128, (3, 3), padding='same', strides=1, use_bias=False)(x2)
    x2 = BatchNormalization()(x2)
    x2 = LeakyReLU(alpha=0.2)(x2)
    x2 = Dropout(0.3)(x2)

    x2 = Conv2D(512, (3, 3), padding='same', strides=1, use_bias=False)(x2)
    x2 = BatchNormalization()(x2)
    x2 = Dropout(0.3)(x2)

    added_x = add([x, x2])
    added_x = LeakyReLU(alpha=0.2)(added_x)

    input_layer2 = Input(shape=(4, 4, 128))

    merged_input = concatenate([added_x, input_layer2])

    x3 = Conv2D(64 * 8, kernel_size=1, padding="same", strides=1)(merged_input)
    x3 = BatchNormalization()(x3)
    x3 = LeakyReLU(alpha=0.2)(x3)
    x3 = Flatten()(x3)
    x3 = Dense(1)(x3)
    x3 = Activation('sigmoid')(x3)

    stage2_dis = Model(inputs=[input_layer, input_layer2], outputs=[x3])
    return stage2_dis

In [None]:
def build_adversarial_model(gen_model2, dis_model, gen_model1):
    """
    adversarial 모델 만들기
    """
    embeddings_input_layer = Input(shape=(1024, ))
    noise_input_layer = Input(shape=(100, ))
    compressed_embedding_input_layer = Input(shape=(4, 4, 128))

    gen_model1.trainable = False
    dis_model.trainable = False

    lr_images, mean_logsigma1 = gen_model1([embeddings_input_layer, noise_input_layer])
    hr_images, mean_logsigma2 = gen_model2([embeddings_input_layer, lr_images])
    valid = dis_model([hr_images, compressed_embedding_input_layer])

    model = Model(inputs=[embeddings_input_layer, noise_input_layer, compressed_embedding_input_layer], outputs=[valid, mean_logsigma2])
    return model

> ## Defining Loss

In [None]:
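def KL_loss(y_true, y_pred):
    # y_pred is the CA output: first 128 dims = mean, last 128 dims = log sigma (y_true is unused)
    mean = y_pred[:, :128]
    logsigma = y_pred[:, 128:]
    # KL(N(mean, sigma^2) || N(0, 1)) per dimension, averaged over batch and dimensions
    loss = -logsigma + .5 * (-1 + K.exp(2. * logsigma) + K.square(mean))
    loss = K.mean(loss)
    return loss

def wasserstein_loss(y_true, y_pred):
    # mean(y_true * y_pred): acts as a critic loss with +/-1 targets and an unbounded (non-sigmoid) output
    return K.mean(y_true * y_pred)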

* Mathematical justification of the Wasserstein loss (in Korean): https://ahjeong.tistory.com/7
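`wasserstein_loss` is only the mean of `y_true * y_pred`, so it behaves as a Wasserstein critic loss when the targets are -1/+1 and the critic output is unbounded; the notebook's discriminators end in a sigmoid and are compiled with binary cross-entropy. A NumPy sketch assuming the common Keras convention of labeling real samples -1 and fake samples +1 for the critic update; the critic scores are made-up numbers, and the Lipschitz constraint WGAN also requires (weight clipping or a gradient penalty) is not shown:

```python
import numpy as np


def wasserstein_loss_np(y_true, y_pred):
    """Same formula as the Keras wasserstein_loss: mean(y_true * y_pred)."""
    return np.mean(y_true * y_pred)


# Hypothetical unbounded critic scores for 3 real and 3 fake samples
critic_real = np.array([2.0, 1.5, 2.5])
critic_fake = np.array([-1.0, -0.5, 0.0])

# Critic update (assumed convention: real -> -1, fake -> +1):
# minimizing this equals maximizing mean(critic_real) - mean(critic_fake)
critic_loss = (wasserstein_loss_np(-np.ones(3), critic_real)
               + wasserstein_loss_np(np.ones(3), critic_fake))

# Generator update: label its fakes -1, which pushes the critic's fake scores up
gen_loss = wasserstein_loss_np(-np.ones(3), critic_fake)

print(critic_loss, gen_loss)
```

Here `critic_loss` is the negative of the estimated score gap between real and fake samples (2.0 - (-0.5) = 2.5).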

> ## Inception score measure

![Inception score](https://i.ibb.co/Dw20ssm/gan.png)

* Measure the score on 10 well-generated (hand-picked) images.

In [None]:
from math import floor
from numpy import ones
from numpy import expand_dims
from numpy import log
from numpy import mean
from numpy import std
from numpy import exp
from numpy.random import shuffle
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input
from keras.datasets import cifar10
from skimage.transform import resize
from numpy import asarray
 
# scale an array of images to a new size
def scale_images(images, new_shape):
	images_list = list()
	for image in images:
		# resize with nearest neighbor interpolation
		new_image = resize(image, new_shape, 0)
		# store
		images_list.append(new_image)
	return asarray(images_list)
 
# assumes images have any shape and pixels in [0,255]
def calculate_inception_score(images, n_split=10, eps=1E-16):
	# load inception v3 model
	model = InceptionV3()
	# enumerate splits of images/predictions
	scores = list()
	n_part = floor(images.shape[0] / n_split)
	for i in range(n_split):
		# retrieve images
		ix_start, ix_end = i * n_part, (i+1) * n_part
		subset = images[ix_start:ix_end]
		# convert from uint8 to float32
		subset = subset.astype('float32')
		# scale images to the required size
		subset = scale_images(subset, (299,299,3))
		# pre-process images, scale to [-1,1]
		subset = preprocess_input(subset)
		# predict p(y|x)
		p_yx = model.predict(subset)
		# calculate p(y)
		p_y = expand_dims(p_yx.mean(axis=0), 0)
		# calculate KL divergence using log probabilities
		kl_d = p_yx * (log(p_yx + eps) - log(p_y + eps))
		# sum over classes
		sum_kl_d = kl_d.sum(axis=1)
		# average over images
		avg_kl_d = mean(sum_kl_d)
		# undo the log
		is_score = exp(avg_kl_d)
		# store
		scores.append(is_score)
	# average across images
	is_avg, is_std = mean(scores), std(scores)
	return is_avg, is_std
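The score computed above is $\exp\big(\mathbb{E}_x\, D_{KL}(p(y \mid x) \,\|\, p(y))\big)$. Feeding hand-made probability matrices instead of Inception-v3 predictions shows its range: identical predictions give 1, and confident predictions spread evenly over $K$ classes give $K$. It also shows that a single image always scores exactly 1, because then $p(y) = p(y \mid x)$. A sketch with the same formula as `calculate_inception_score` for one split; the function name is hypothetical:

```python
import numpy as np


def inception_score_from_probs(p_yx, eps=1e-16):
    """Inception score for one split, given an (n_images, n_classes) matrix of p(y|x)."""
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


same = np.full((10, 10), 0.1)  # every image gets the same uniform prediction
diverse = np.eye(10)           # ten images, each confidently in a different class
single = np.eye(10)[:1]        # one image only

for name, probs in [("same", same), ("diverse", diverse), ("single", single)]:
    print(name, round(inception_score_from_probs(probs), 4))
```

This is why scores computed on very few images, or on one image at a time, say little about sample quality or diversity.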

In [None]:
# Ten hand-picked Stage-I samples (file names follow gen_{epoch}_{index}.png)
paths = ['stage1_results/gen_590_0.png', 'stage1_results/gen_590_3.png',
         'stage1_results/gen_590_5.png', 'stage1_results/gen_590_6.png',
         'stage1_results/gen_590_8.png', 'stage1_results/gen_590_9.png',
         'stage1_results/gen_592_1.png', 'stage1_results/gen_592_7.png',
         'stage1_results/gen_556_6.png', 'stage1_results/gen_556_9.png']

# PNGs saved by matplotlib are RGBA; convert to RGB so every array is (H, W, 3)
dataset1 = np.asarray([np.asarray(Image.open(p).convert('RGB')) for p in paths])
print(dataset1.shape)

In [None]:
calculate_inception_score(dataset1, n_split=1, eps=1E-16)

* Measure the score again on images with the white background removed.

In [None]:
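# Convert to RGB so each array has exactly 3 channels, as Inception-v3 expects
image1 = Image.open('image/gen_592_9.png').convert('RGB')
image2 = Image.open('image/gen_590_2.png').convert('RGB')
image3 = Image.open('image/gen_458_8.png').convert('RGB')

data1 = np.asarray(image1)
data2 = np.asarray(image2)
data3 = np.asarray(image3)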

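In [None]:
# calculate_inception_score expects a batch of shape (N, H, W, 3), so add the batch axis.
# With a single image p(y) equals p(y|x), so the score is always exactly 1.0.
calculate_inception_score(np.expand_dims(data1, 0), n_split=1, eps=1E-16)

In [None]:
calculate_inception_score(np.expand_dims(data2, 0), n_split=1, eps=1E-16)

In [None]:
calculate_inception_score(np.expand_dims(data3, 0), n_split=1, eps=1E-16)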

> ## Main File(Train)

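* (Recap) How a GAN is trained
  * Training the discriminator
   1. Generate m random noise vectors, pass them to the generator, and obtain m generated samples.
   2. Select m real samples from the training dataset.
   3. Train the discriminator on the 2m samples (m real + m fake) so as to maximize its accuracy.
  * Training the generator
   1. Generate m new random noise vectors.
   2. Using them, train the generator so as to minimize the discriminator's accuracy.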
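The training loop below extends the discriminator step with a third kind of pair: besides (real image, matching text) and (fake image, matching text), it shows (real image, mismatched text), built by pairing image i with the embedding of image i+1 in the batch. A NumPy sketch of that pairing and of the loss weighting `0.5 * (real + 0.5 * (wrong + fake))`; the integer stand-ins and the loss values are placeholders, and `combine_d_loss` is a hypothetical name:

```python
import numpy as np

batch_size = 4
images = np.arange(batch_size)           # stand-in: image i
embeddings = np.arange(batch_size) * 10  # stand-in: embedding of image i is 10 * i

# Mismatched pairs, as in image_batch[:(batch_size - 1)] with compressed_embedding[1:]
wrong_images = images[:batch_size - 1]
wrong_embeddings = embeddings[1:]
pairs = list(zip(wrong_images.tolist(), wrong_embeddings.tolist()))
print(pairs)  # each image is paired with its neighbour's text


def combine_d_loss(loss_real, loss_fake, loss_wrong):
    """Real pairs get weight 0.5; fake and mismatched pairs get 0.25 each."""
    return 0.5 * (loss_real + 0.5 * (loss_wrong + loss_fake))


print(combine_d_loss(0.4, 0.8, 1.2))  # placeholder loss values
```

The mismatched pairs teach the discriminator to reject realistic images that do not fit the text, not just unrealistic ones.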

In [None]:
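# Helpers used in the training loop but not defined in the cells above.
# Sketches assuming standalone Keras on TensorFlow 1.x, where TensorBoard.set_model()
# creates `callback.writer` (a tf.summary.FileWriter).
def write_log(callback, name, value, step):
    summary = tf.Summary()
    summary_value = summary.value.add()
    summary_value.simple_value = value
    summary_value.tag = name
    callback.writer.add_summary(summary, step)
    callback.writer.flush()


def save_rgb_img(img, path):
    # The generator ends in tanh, so map [-1, 1] to [0, 1] before plotting
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.imshow((img + 1.0) / 2.0)
    ax.axis("off")
    plt.savefig(path)
    plt.close(fig)


if __name__ == '__main__':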
    '''
    filepath = '/content/gdrive/My Drive/dl_teamproject_folder/filepath/gen/model.{epoch:02d}.hdf5'
    filepath2 = '/content/gdrive/My Drive/dl_teamproject_folder/filepath/dis/model.{epoch:02d}.hdf5'
    modelckpt = ModelCheckpoint(filepath=filepath)
    modelckpt2 = ModelCheckpoint(filepath=filepath2)
    '''
    # Directory paths
    data_dir = "/content/gdrive/My Drive/dl_teamproject_folder/birds/birds"
    train_dir = data_dir + "/train"
    test_dir = data_dir + "/test"

    # Hyperparameters
    image_size = 64
    batch_size = 64
    z_dim = 100
    stage1_generator_lr = 0.0002
    stage1_discriminator_lr = 0.0002
    stage1_lr_decay_step = 600
    epochs = 600
    condition_dim = 128

    # Paths to the pickled text embeddings, file names, and class labels
    embeddings_file_path_train = train_dir + "/char-CNN-RNN-embeddings.pickle"
    embeddings_file_path_test = test_dir + "/char-CNN-RNN-embeddings.pickle"

    filenames_file_path_train = train_dir + "/filenames.pickle"
    filenames_file_path_test = test_dir + "/filenames.pickle"

    class_info_file_path_train = train_dir + "/class_info.pickle"
    class_info_file_path_test = test_dir + "/class_info.pickle"

    cub_dataset_dir = "/content/gdrive/My Drive/dl_teamproject_folder/CUB_200_2011/CUB_200_2011"
    
    # Optimizers
    dis_optimizer = Adam(lr=stage1_discriminator_lr, beta_1=0.5, beta_2=0.999)
    gen_optimizer = Adam(lr=stage1_generator_lr, beta_1=0.5, beta_2=0.999)

    """
    Load the data
    """
    X_train, y_train, embeddings_train = load_dataset(filenames_file_path=filenames_file_path_train,
                                                      class_info_file_path=class_info_file_path_train,
                                                      cub_dataset_dir=cub_dataset_dir,
                                                      embeddings_file_path=embeddings_file_path_train,
                                                      image_size=(64, 64))

    X_test, y_test, embeddings_test = load_dataset(filenames_file_path=filenames_file_path_test,
                                                   class_info_file_path=class_info_file_path_test,
                                                   cub_dataset_dir=cub_dataset_dir,
                                                   embeddings_file_path=embeddings_file_path_test,
                                                   image_size=(64, 64))

    """
    네트워크 만들고 컴파일
    """
    ca_model = build_ca_model()
    ca_model.compile(loss="binary_crossentropy", optimizer="adam")

    stage1_dis = build_stage1_discriminator()
    stage1_dis.compile(loss='binary_crossentropy', optimizer=dis_optimizer)

    stage1_gen = build_stage1_generator()
    stage1_gen.compile(loss="mse", optimizer=gen_optimizer)

    embedding_compressor_model = build_embedding_compressor_model()
    embedding_compressor_model.compile(loss="binary_crossentropy", optimizer="adam")

    adversarial_model = build_adversarial_model(gen_model=stage1_gen, dis_model=stage1_dis)
    # Output 1: discriminator score (BCE). Output 2: CA (mean, log sigma), regularized with the KL term.
    adversarial_model.compile(loss=['binary_crossentropy', KL_loss], loss_weights=[1, 2.0],
                              optimizer=gen_optimizer, metrics=None)

    tensorboard = TensorBoard(log_dir="/content/gdrive/My Drive/dl_teamproject_folder/logs/{}".format(time.time()))
    tensorboard.set_model(stage1_gen)
    tensorboard.set_model(stage1_dis)
    tensorboard.set_model(ca_model)
    tensorboard.set_model(embedding_compressor_model)

    # Target arrays for real and fake samples.
    # One-sided label smoothing: the target for real data is set slightly below 1 (here 0.9)
    # so the discriminator predicts softer probabilities; fake targets stay at 0.
    # Details: https://kangbk0120.github.io/articles/2017-08/tips-from-goodfellow
    real_labels = np.ones((batch_size, 1), dtype=float) * 0.9
    fake_labels = np.zeros((batch_size, 1), dtype=float)

    for epoch in range(epochs):
        print("========================================")
        print("Epoch is:", epoch+1)
        print("Number of batches", int(X_train.shape[0] / batch_size))

        gen_losses = []
        dis_losses = []

        # Train on the data one mini-batch at a time
        number_of_batches = int(X_train.shape[0] / batch_size)
        for index in range(number_of_batches):
            print("Batch:{}".format(index+1))
            
            """
            Discriminator network 학습
            """
            # 배치 사이즈 만큼의 데이터를 샘플링한다 Sample a batch of data
            z_noise = np.random.normal(0, 1, size=(batch_size, z_dim))
            image_batch = X_train[index * batch_size:(index + 1) * batch_size]
            embedding_batch = embeddings_train[index * batch_size:(index + 1) * batch_size]
            image_batch = (image_batch - 127.5) / 127.5

            # 가짜 이미지 생성
            fake_images, _ = stage1_gen.predict([embedding_batch, z_noise], verbose=3)

            # compressed된 embedding 생성
            compressed_embedding = embedding_compressor_model.predict_on_batch(embedding_batch)
            compressed_embedding = np.reshape(compressed_embedding, (-1, 1, 1, condition_dim))
            # 배열을 반복하면서 새로운 축(axis)을 추가하는 np.tile
            compressed_embedding = np.tile(compressed_embedding, (1, 4, 4, 1))

            # 진짜 이미지에 진짜라는 라벨(1)을 주고 discriminator 학습시킨 loss 
            dis_loss_real = stage1_dis.train_on_batch([image_batch, compressed_embedding],
                                                      np.reshape(real_labels, (batch_size, 1)))
            # generator가 생성한 가짜 이미지에 가짜라는 라벨(0)을 주고 discriminator 학습시킨 loss 
            dis_loss_fake = stage1_dis.train_on_batch([fake_images, compressed_embedding],
                                                      np.reshape(fake_labels, (batch_size, 1)))
            # 진짜 이미지에 가짜라는 라벨(0)주고 discriminator 학습시킨 loss 
            dis_loss_wrong = stage1_dis.train_on_batch([image_batch[:(batch_size - 1)], compressed_embedding[1:]],
                                                       np.reshape(fake_labels[1:], (batch_size-1, 1)))
            # 총 discriminator의 loss = 0.5 * (loss_real + 0.5 * (loss_wrong + loss_fake)) 
            d_loss = 0.5 * np.add(dis_loss_real, 0.5 * np.add(dis_loss_wrong, dis_loss_fake))

            print("d_loss_real:{}".format(dis_loss_real))
            print("d_loss_fake:{}".format(dis_loss_fake))
            print("d_loss_wrong:{}".format(dis_loss_wrong))
            print("d_loss:{}".format(d_loss))

            """
            Generator network 학습
            """
            g_loss = adversarial_model.train_on_batch([embedding_batch, z_noise, compressed_embedding],[K.ones((batch_size, 1)) * 0.9, K.ones((batch_size, 256)) * 0.9])
            print("g_loss:{}".format(g_loss))

            dis_losses.append(d_loss)
            gen_losses.append(g_loss)

         #   stage1_gen.save_weights("stage1_gen.h5")
         #   stage1_dis.save_weights("stage1_dis.h5")

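        """
        After each epoch, write the loss values to TensorBoard
        """
        write_log(tensorboard, 'discriminator_loss', np.mean(dis_losses), epoch)
        # Each g_loss is a list whose first entry is the total loss; average that over the epoch
        write_log(tensorboard, 'generator_loss', np.mean([loss[0] for loss in gen_losses]), epoch)

        # Every 2 epochs (epoch 0, 2, 4, ...), generate sample images and save them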
        if epoch % 2 == 0:
            # z_noise2 = np.random.uniform(-1, 1, size=(batch_size, z_dim))
            z_noise2 = np.random.normal(0, 1, size=(batch_size, z_dim))
            embedding_batch = embeddings_test[0:batch_size]
            fake_images, _ = stage1_gen.predict_on_batch([embedding_batch, z_noise2])
            stage1_gen.save_weights("/content/gdrive/My Drive/dl_teamproject_folder/filepath/stage1_gen2.h5")
            stage1_dis.save_weights("/content/gdrive/My Drive/dl_teamproject_folder/filepath/stage1_dis2.h5")

            # Save images
            for i, img in enumerate(fake_images[:10]):
              save_rgb_img(img, "/content/gdrive/My Drive/dl_teamproject_folder/results/gen_{}_{}.png".format(epoch, i))

    # Save models
    stage1_gen.save_weights("/content/gdrive/My Drive/dl_teamproject_folder/stage1_gen.h5")
    stage1_dis.save_weights("/content/gdrive/My Drive/dl_teamproject_folder/stage1_dis.h5")
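The conditioning step above reshapes each compressed text embedding to a 1×1 spatial map and tiles it to 4×4, presumably so it can be joined with a 4×4 feature map inside the discriminator (the discriminator definition is not shown in this section). A minimal NumPy sketch of just the shape manipulation, assuming `batch_size = 32` and `condition_dim = 128` as in this notebook; the random array is a stand-in for the output of `embedding_compressor_model`:

```python
import numpy as np

batch_size = 32      # same batch size as the notebook
condition_dim = 128  # same condition_dim as the notebook

# Stand-in for embedding_compressor_model.predict_on_batch(embedding_batch)
rng = np.random.default_rng(0)
compressed = rng.normal(size=(batch_size, condition_dim)).astype(np.float32)

# (B, 128) -> (B, 1, 1, 128): add two singleton spatial axes
compressed = np.reshape(compressed, (-1, 1, 1, condition_dim))

# Repeat 4 times along height and width: (B, 1, 1, 128) -> (B, 4, 4, 128)
tiled = np.tile(compressed, (1, 4, 4, 1))
print(tiled.shape)

# Every spatial position holds the same 128-d condition vector
assert np.array_equal(tiled[:, 3, 2, :], compressed[:, 0, 0, :])
```

Because the same vector is copied to every position, the discriminator sees the caption condition uniformly across the spatial grid.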

# Stage 2
if __name__ == '__main__':
    # Paths
    data_dir ="/content/birds/birds"
    train_dir = data_dir + "/train"
    test_dir = data_dir + "/test"

    # Hyperparameters
    hr_image_size = (256, 256)
    lr_image_size = (64, 64)
    batch_size = 32
    z_dim = 100
    stage1_generator_lr = 0.0002
    stage1_discriminator_lr = 0.0002
    stage1_lr_decay_step = 600
    epochs = 600
    condition_dim = 128

    # Data file paths
    embeddings_file_path_train = train_dir + "/char-CNN-RNN-embeddings.pickle"
    embeddings_file_path_test = test_dir + "/char-CNN-RNN-embeddings.pickle"

    filenames_file_path_train = train_dir + "/filenames.pickle"
    filenames_file_path_test = test_dir + "/filenames.pickle"

    class_info_file_path_train = train_dir + "/class_info.pickle"
    class_info_file_path_test = test_dir + "/class_info.pickle"

    cub_dataset_dir = "/content/CUB_200_2011/CUB_200_2011"

    # Optimizers
    dis_optimizer = Adam(lr=stage1_discriminator_lr, beta_1=0.5, beta_2=0.999)
    gen_optimizer = Adam(lr=stage1_generator_lr, beta_1=0.5, beta_2=0.999)

    """
    Load the data
    """
    X_hr_train, y_hr_train, embeddings_train = load_dataset(filenames_file_path=filenames_file_path_train,
                                                            class_info_file_path=class_info_file_path_train,
                                                            cub_dataset_dir=cub_dataset_dir,
                                                            embeddings_file_path=embeddings_file_path_train,
                                                            image_size=(256, 256))

    X_hr_test, y_hr_test, embeddings_test = load_dataset(filenames_file_path=filenames_file_path_test,
                                                         class_info_file_path=class_info_file_path_test,
                                                         cub_dataset_dir=cub_dataset_dir,
                                                         embeddings_file_path=embeddings_file_path_test,
                                                         image_size=(256, 256))

    X_lr_train, y_lr_train, _ = load_dataset(filenames_file_path=filenames_file_path_train,
                                             class_info_file_path=class_info_file_path_train,
                                             cub_dataset_dir=cub_dataset_dir,
                                             embeddings_file_path=embeddings_file_path_train,
                                             image_size=(64, 64))

    X_lr_test, y_lr_test, _ = load_dataset(filenames_file_path=filenames_file_path_test,
                                           class_info_file_path=class_info_file_path_test,
                                           cub_dataset_dir=cub_dataset_dir,
                                           embeddings_file_path=embeddings_file_path_test,
                                           image_size=(64, 64))

    """
    Build and compile the models
    """
    stage2_dis = build_stage2_discriminator()
    stage2_dis.compile(loss='binary_crossentropy', optimizer=dis_optimizer)

    stage1_gen = build_stage1_generator()
    stage1_gen.compile(loss="binary_crossentropy", optimizer=gen_optimizer)

    stage1_gen.load_weights("/content/gdrive/My Drive/딥러닝/results2/stage1_gen2.h5")

    stage2_gen = build_stage2_generator()
    stage2_gen.compile(loss="binary_crossentropy", optimizer=gen_optimizer)

    embedding_compressor_model = build_embedding_compressor_model()
    embedding_compressor_model.compile(loss='binary_crossentropy', optimizer='adam')

    adversarial_model = build_adversarial_model(stage2_gen, stage2_dis, stage1_gen)
    adversarial_model.compile(loss=['binary_crossentropy',wasserstein_loss], loss_weights=[1.0, 2.0],
                              optimizer=gen_optimizer, metrics=None)

    tensorboard = TensorBoard(log_dir="/content/gdrive/My Drive/딥러닝/logs/{}".format(time.time()))
    tensorboard.set_model(stage2_gen)
    tensorboard.set_model(stage2_dis)

    # Arrays holding the real and fake target labels
    # Label smoothing: the target for real data is set slightly below 1 (here 0.9) so the discriminator predicts softer probabilities
    # Details: https://kangbk0120.github.io/articles/2017-08/tips-from-goodfellow
    real_labels = np.ones((batch_size, 1), dtype=float) * 0.9
    fake_labels = np.zeros((batch_size, 1), dtype=float)  # one-sided smoothing: fake targets stay at 0

    for epoch in range(epochs):
        print("========================================")
        print("Epoch is:", epoch)

        gen_losses = []
        dis_losses = []

        # Load the batches and train
        number_of_batches = int(X_hr_train.shape[0] / batch_size)
        print("Number of batches:{}".format(number_of_batches))
        for index in range(number_of_batches):
            print("Batch:{}".format(index+1))

            # Create a noise vector
            z_noise = np.random.normal(0, 1, size=(batch_size, z_dim))
            X_hr_train_batch = X_hr_train[index * batch_size:(index + 1) * batch_size]
            embedding_batch = embeddings_train[index * batch_size:(index + 1) * batch_size]
            X_hr_train_batch = (X_hr_train_batch - 127.5) / 127.5

            # Generate fake images: Stage 1 (low resolution), then Stage 2 (high resolution)
            lr_fake_images, _ = stage1_gen.predict([embedding_batch, z_noise], verbose=3)
            hr_fake_images, _ = stage2_gen.predict([embedding_batch, lr_fake_images], verbose=3)

            """
            4. Generate compressed embeddings
            """
            compressed_embedding = embedding_compressor_model.predict_on_batch(embedding_batch)
            compressed_embedding = np.reshape(compressed_embedding, (-1, 1, 1, condition_dim))
            compressed_embedding = np.tile(compressed_embedding, (1, 4, 4, 1))

            """
            5. Train the discriminator model
            """
            dis_loss_real = stage2_dis.train_on_batch([X_hr_train_batch, compressed_embedding],
                                                      np.reshape(real_labels, (batch_size, 1)))
            dis_loss_fake = stage2_dis.train_on_batch([hr_fake_images, compressed_embedding],
                                                      np.reshape(fake_labels, (batch_size, 1)))
            dis_loss_wrong = stage2_dis.train_on_batch([X_hr_train_batch[:(batch_size - 1)], compressed_embedding[1:]],
                                                       np.reshape(fake_labels[1:], (batch_size-1, 1)))
            d_loss = 0.5 * np.add(dis_loss_real, 0.5 * np.add(dis_loss_wrong,  dis_loss_fake))
            print("d_loss:{}".format(d_loss))

            """
            Train the adversarial model
            """
            g_loss = adversarial_model.train_on_batch([embedding_batch, z_noise, compressed_embedding],
                                                      [np.ones((batch_size, 1)) * 0.9, np.ones((batch_size, 256)) * 0.9])

            print("g_loss:{}".format(g_loss))

            dis_losses.append(d_loss)
            gen_losses.append(g_loss)
            # Checkpoint the full models after every batch
            adversarial_model.save('/content/gdrive/My Drive/딥러닝/results2/my_stackganmodel2.h5')
            stage2_gen.save('/content/gdrive/My Drive/딥러닝/results2/my_genmodel2.h5')
            stage2_dis.save('/content/gdrive/My Drive/딥러닝/results2/my_dismodel2.h5')

        """
        After each epoch, write the loss values to TensorBoard
        """
        write_log(tensorboard, 'discriminator_loss', np.mean(dis_losses), epoch)
        # Each g_loss is a list whose first entry is the total loss; average that over the epoch
        write_log(tensorboard, 'generator_loss', np.mean([loss[0] for loss in gen_losses]), epoch)


        # Generate sample images every 2 epochs
        if epoch % 2 == 0:
            z_noise2 = np.random.normal(0, 1, size=(batch_size, z_dim))
            embedding_batch = embeddings_test[0:batch_size]

            lr_fake_images, _ = stage1_gen.predict([embedding_batch, z_noise2], verbose=3)
            hr_fake_images, _ = stage2_gen.predict([embedding_batch, lr_fake_images], verbose=3)
            stage2_gen.save_weights("/content/gdrive/My Drive/딥러닝/results2/stage2_gen.h5")
            stage2_dis.save_weights("/content/gdrive/My Drive/딥러닝/results2/stage2_dis.h5")
            # Save images
            # Separate prefixes so the high-resolution images do not overwrite the low-resolution ones
            for i, img in enumerate(lr_fake_images[:10]):
                save_rgb_img(img, "/content/gdrive/My Drive/딥러닝/results2/gen_lr_{}_{}.png".format(epoch, i))
            for i, img in enumerate(hr_fake_images[:10]):
                save_rgb_img(img, "/content/gdrive/My Drive/딥러닝/results2/gen_hr_{}_{}.png".format(epoch, i))

    adversarial_model.save('/content/gdrive/My Drive/딥러닝/results2/my_stackganmodel.h5')    
    stage2_gen.save('/content/gdrive/My Drive/딥러닝/results2/my_genmodel.h5')    
    stage2_dis.save('/content/gdrive/My Drive/딥러닝/results2/my_dismodel.h5')  
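Both training loops combine three discriminator terms: a real image with its own caption (target 1, smoothed to 0.9), a generated image with the caption (target 0), and a real image paired with the caption shifted by one position (target 0, the "wrong" pair). The NumPy sketch below reproduces that weighting with a hand-written binary cross-entropy that clips predictions in a similar way to Keras. The probabilities are made-up placeholders, not outputs of the project's discriminator.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

batch_size = 4
# One-sided label smoothing: real targets are 0.9, fake targets stay at 0
real_labels = np.full((batch_size, 1), 0.9)
fake_labels = np.zeros((batch_size, 1))

# Placeholder discriminator outputs (probability that the image/caption pair is real)
p_real = np.array([[0.8], [0.9], [0.7], [0.85]])   # real image + matching caption
p_fake = np.array([[0.2], [0.1], [0.3], [0.25]])   # generated image + caption
p_wrong = np.array([[0.4], [0.3], [0.5]])          # image[i] paired with caption[i + 1]

loss_real = bce(real_labels, p_real)
loss_fake = bce(fake_labels, p_fake)
loss_wrong = bce(fake_labels[1:], p_wrong)  # only batch_size - 1 mismatched pairs

# Same weighting as the notebook: 0.5 * (real + 0.5 * (wrong + fake))
d_loss = 0.5 * (loss_real + 0.5 * (loss_wrong + loss_fake))
print(round(d_loss, 4))
```

The real term carries half the weight, while the two "fake" terms share the other half, so the discriminator is pushed equally to reject unrealistic images and realistic images that do not match their caption.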

![Embedding shape](https://i.ibb.co/9pqLpWD/embedding-shape.png)

![Results after 600 epochs](https://i.ibb.co/nzYYZyb/600-epoch.png)

* LeakyReLU layers

  Allowing small negative activation values keeps sparse gradients from getting in the way of training.

* Residual blocks

  <img src="https://drive.google.com/uc?id=1ZqD1rYEg3wd6XA8LjRZo-JYsbIgQ1ro9" width="500">

  The idea is to create a kind of shortcut (skip connection) so that gradients can flow well through the network.
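The two ideas above can be illustrated with a toy NumPy sketch. It assumes `alpha=0.2` for LeakyReLU (a common choice in DCGAN-style discriminators) and, for brevity, uses two dense layers inside the residual block instead of the convolution and batch-normalization layers a real StackGAN block would use. The weights are random placeholders.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Keep positive values; scale negative values by alpha instead of zeroing them."""
    return np.where(x > 0, x, alpha * x)

def residual_block(x, w1, w2):
    """Toy residual block: relu(x + F(x)), where F is two dense layers.

    The "+ x" term is the shortcut (skip connection): even if F contributes
    little, x (and its gradient) passes straight through the addition.
    """
    h = np.maximum(0.0, x @ w1)   # first layer + ReLU
    f = h @ w2                    # second layer, no activation before the add
    return np.maximum(0.0, x + f)

x = np.array([[-1.0, 0.5, 2.0]])
print(leaky_relu(x))

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(3, 3))
w2 = np.zeros((3, 3))  # with F(x) = 0 the block reduces to relu(x)
print(residual_block(x, w1, w2))
```

Setting `w2` to zero shows why residual blocks are easy to optimize: the block starts out close to an identity-like mapping and only has to learn the correction `F(x)`.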

**<Changes / additions per model>**

*   Learning rate change
  *   Raised to 10 times the original learning rate
  *   Done to shorten training time

*   Batch normalization: for faster training
      <img src="https://drive.google.com/uc?id=1BZw_m1lVnfTaNK7blmmvzSgw1svwMXNN" width="400">
  
      <img src="https://drive.google.com/uc?id=1BVRTwGj3mNzdMOnKOG6_dd-gYn6Qoem2" width="400">



*   Optimizer change
  *   The best optimizer differs depending on whether batch normalization is used

        <img src="https://1.bp.blogspot.com/-_fMUvxuThaw/WOD-7GuqZ0I/AAAAAAAABgo/fEN29-ukRJ0XKKVnJ5I_R-JVAFSwqO88QCK4B/s1600/lsgan_12.PNG" width="400">


*   Wasserstein loss
  * To keep the discriminator and the generator in balance
  * To avoid mode collapse (G returns the same-looking samples for different input signals)  
  <img src="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fk.kakaocdn.net%2Fdn%2FcRqbfT%2Fbtqu2Ia2BK7%2FQEinXUjVkZoWx2jVVXaho0%2Fimg.png" width="400">
  * Source: http://dl-ai.blogspot.com/2017/08/gan-problems.html

*   Dropout
  *   Injects randomness to make the model more robust
    *  Because a GAN seeks a dynamic equilibrium, it is likely to get stuck in several different ways
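The project's own `wasserstein_loss` definition is not shown in this section. A formulation often used with Keras is `mean(y_true * y_pred)` with labels of -1 for real and +1 for generated samples and an unbounded critic output (no sigmoid); the NumPy sketch below assumes that convention. It leaves out the critic constraint that WGAN also needs (weight clipping in the original WGAN, or a gradient penalty in WGAN-GP). Scores are placeholders.

```python
import numpy as np

def wasserstein_loss(y_true, y_pred):
    """Keras-style WGAN loss: mean of label * critic score.

    Assumed label convention: -1 for real samples, +1 for generated ones,
    so minimizing this pushes real scores up and fake scores down.
    """
    return float(np.mean(y_true * y_pred))

critic_real = np.array([2.0, 1.5, 3.0])   # unbounded critic scores for real images
critic_fake = np.array([-1.0, 0.5, -0.5])  # unbounded critic scores for generated images

# Critic loss: -mean(D(real)) + mean(D(fake))
d_loss = wasserstein_loss(-np.ones(3), critic_real) + wasserstein_loss(np.ones(3), critic_fake)

# Generator loss: the generator wants fakes scored like real samples -> -mean(D(fake))
g_loss = wasserstein_loss(-np.ones(3), critic_fake)
print(d_loss, g_loss)
```

Unlike binary cross-entropy, this loss does not saturate when the critic becomes confident, which is the usual argument for why it helps keep the generator and critic balanced.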




# Presentation of results

* MODEL: no batch normalization + Adam optimizer

  <img src="https://i.ibb.co/Q7VqfcG/black.png" width="150"><img src="https://i.ibb.co/4PfNyKG/black2.png" width="150"><img src="https://i.ibb.co/4PfNyKG/black2.png" width="150"><img src="https://i.ibb.co/4PfNyKG/black2.png" width="150">




* MODEL: learning rate raised to 10 times the original

  <img src="https://i.ibb.co/hydFCT2/legend.png" width="450">


* MODEL: no batch normalization + RMSprop optimizer

  <img src="https://i.ibb.co/c3W0KLT/NBN.png" width="850">

* MODEL: no batch normalization + RMSprop optimizer + Wasserstein loss

  <img src="https://i.ibb.co/qYw6nc3/wgan-rmsprop.png">

* MODEL: with batch normalization + Adam optimizer -> mode collapse occurred

  <img src="https://i.ibb.co/8cGhWbN/original-epoch.png" width="800">

* MODEL: with batch normalization + Adam optimizer + Wasserstein loss + dropout

  * STAGE1

    ![Stage 1 results](https://i.ibb.co/ZTrjXtb/ungyeong.png)

  * STAGE2

    <img src="https://i.ibb.co/gMH1YSr/image.jpg" width="600">


# Analysis of results

* Training time per model
  * Learning rate raised to 10 times the original

    -> STAGE 1: 94 epochs, about 12 hours / after epoch 88 the loss printed only NaN values / the high learning rate appears to be the cause.
  * No batch normalization + RMSprop optimizer

    -> STAGE 1: 150 epochs, about 7 hours / the runtime disconnected after epoch 150.
  * No batch normalization + RMSprop optimizer + Wasserstein loss

    -> STAGE 1: 180 epochs, about 4 hours / the session repeatedly disconnected at epoch 180.

  * With batch normalization + Adam optimizer

    -> STAGE 1: case 1 = 138 epochs, about 11 hours / case 2 = 600 epochs, about 8 hours

  * With batch normalization + Adam optimizer + Wasserstein loss + dropout

    -> STAGE 1: 600 epochs, about 4-5 hours

    -> STAGE 2: 8 epochs, 24 hours

* Because training took so long, we assumed that even 0.002 would still be a small learning rate and raised it from 0.0002 to 0.002.
  * This did not help training; instead the loss turned into NaN, presumably because it kept increasing until it diverged to infinity.

   ![Loss values turning into NaN](https://i.ibb.co/P6hJ0jK/nan.png)
  * Source: https://stackoverflow.com/questions/52211665/why-do-i-get-nan-loss-value-in-training-discriminator-and-generator-of-gan

  

* Wasserstein loss was also effective in improving the performance of StackGAN, a GAN variant.
  We think other variants such as StyleGAN or CycleGAN might also train better with a Wasserstein loss.

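* Inception score measurement

  ![Inception score results](https://i.ibb.co/1Lb0dMj/inception-1.png)
  * Because we were only able to train Stage 1, we did not reach the Inception Score reported for the original StackGAN.
  * Inspecting the data, we found that the images generated by the GAN had white padding around them.

  ![Generated images with white padding](https://i.ibb.co/4Rxm13S/inception-3.png)
  * We expected that removing this padding would give a better Inception Score.

  ![Inception score after removing the white border](https://i.ibb.co/yBVR6Y9/inception-2.png)
  * After removing the white border, the scores were generally better.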
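The Inception Score is IS = exp(E_x[KL(p(y|x) || p(y))]): high when each image gets a confident class prediction and the predictions are spread across classes. The sketch below computes it from an already-computed (N, C) matrix of class probabilities; producing those probabilities needs the Inception network from the references, which is not included here. `crop_white_border` is a hypothetical helper illustrating the padding removal described above, assuming uint8 RGB images and treating pixels at or above 250 in every channel as white.

```python
import numpy as np

def inception_score(probs, n_splits=1, eps=1e-16):
    """IS = exp(mean over images of KL(p(y|x) || p(y))) for an (N, C) probability array."""
    scores = []
    for part in np.array_split(probs, n_splits):
        p_y = part.mean(axis=0, keepdims=True)                               # marginal p(y)
        kl = (part * (np.log(part + eps) - np.log(p_y + eps))).sum(axis=1)   # KL per image
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))

def crop_white_border(img, threshold=250):
    """Crop an (H, W, 3) uint8 image to the bounding box of its non-white pixels."""
    mask = (img < threshold).any(axis=-1)   # True where a pixel is not near-white
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0:                      # the whole image is white: leave it alone
        return img
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Uniform predictions carry no class information -> IS = 1
print(inception_score(np.full((10, 5), 0.2)))
# Confident predictions spread over 4 classes -> IS = 4
print(inception_score(np.eye(4)))
```

The two toy inputs mark the ends of the scale: a score of 1 means the classifier cannot tell the images apart, and the maximum equals the number of classes.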

# Insights and discussions relevant to the project

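* GANs are very hard to train
  * For training to go well, a generator and a discriminator of similar strength have to improve together, little by little. If one side becomes much stronger too quickly, this relationship breaks down and training stalls.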


*   Many variants such as DCGAN, WGAN, EBGAN, BEGAN, CycleGAN, and DiscoGAN have appeared to improve performance and stabilize training
  *   Source: https://dreamgonfly.github.io/2018/03/17/gan-explained.html


* GANs need a very long training time, and the runtime kept disconnecting during long training sessions
  * We worked around this by entering the following in the console of the browser developer tools (F12):

    ```javascript
    function ClickConnect() {
      var buttons = document.querySelectorAll("colab-dialog.yes-no-dialog paper-button#cancel");
      buttons.forEach(function(btn) { btn.click(); });
      console.log("Auto-reconnect every minute");
      document.querySelector("#top-toolbar > colab-connect-button").click();
    }
    setInterval(ClickConnect, 1000 * 60);
    ```
  * Source: https://bryan7.tistory.com/1077 [민서네집]

* If you want to train GAN models, we recommend a GTX 1080 Ti graphics card!

   ![Colab GPU vs. GTX 1080](https://i.ibb.co/BVGjN0V/gtx1080.png)
   * Source: https://stackoverflow.com/questions/58595157/colab-gpu-vs-gtx-1080

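* We tried to reuse code from GitHub, but it was written for different Python and TensorFlow versions, so we downgraded them in a virtual environment and tried again.
  *  It still did not work as intended, probably because many parts were incompatible across versions. If there is code you want to clone, check its library versions first!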
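Following that advice, a quick way to see what is installed before cloning a repository is to print the interpreter and package versions and compare them with the repository's stated requirements. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper and the package list is only an example:

```python
import sys
from importlib import metadata

def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print("python:", sys.version.split()[0])
for pkg, ver in installed_versions(["tensorflow", "keras", "numpy"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Pinning the versions a repository expects in a fresh virtual environment is usually more reliable than downgrading packages in an existing one.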


# References

* Websites
  * Overall StackGAN architecture and main implementation code
   - https://medium.com/@mrgarg.rajat/implementing-stackgan-using-keras-a0a1b381125e

  * Computing the StackGAN Inception score
   - https://github.com/hanzhanggit/StackGAN-inception-model
   - https://machinelearningmastery.com/how-to-implement-the-inception-score-from-scratch-for-evaluating-generated-images/

  * Understanding the StackGAN architecture
   - https://www.youtube.com/watch?v=G2_8Jc0IwYk


* Papers

  * "Generative Adversarial Nets" (2014, Ian J. Goodfellow et al.)

  * "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (2016, Alec Radford et al.)

  * "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" (2017, Han Zhang et al.)

  * "Stacked Generative Adversarial Networks" (2017, Xun Huang et al.)




# Members' contribution statement

* 권성수: contribution (100%) / work: Stage 2 design / ran models <3,6>
* 문대정: contribution (100%) / work: Stage 1 design / ran models <2,3> / debugging / Inception score measurement / presentation slides / presentation
* 이재철: contribution (100%) / work: Stage 2 design / ran models <4,5> / debugging / dataset structure research
* 조은경: contribution (100%) / work: Stage 1 design / ran models <1,4,5> / debugging / researched and proposed model modifications / presentation slides


Model numbers used above:

1. No batch normalization + Adam optimizer
2. No batch normalization + RMSprop optimizer
3. No batch normalization + RMSprop optimizer + Wasserstein loss
4. With batch normalization + Adam optimizer
5. With batch normalization + Adam optimizer + Wasserstein loss + dropout
6. Learning rate raised to 10 times the original
