

> Grace Esther - 2702305576



Mengimpor semua modul yang diperlukan untuk membangun model, memproses data gambar, menghitung metrik FID, dan membentuk arsitektur jaringan

In [1]:
import os
from zipfile import ZipFile
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from PIL import Image
from scipy.linalg import sqrtm
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import (
    Dense, Conv2D, Flatten, Reshape, LeakyReLU, BatchNormalization,
    UpSampling2D, Cropping2D, Input, Conv2DTranspose, Dropout
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

### Tujuan
Project ini bertujuan untuk meningkatkan performa model Generative Adversarial Network (GAN) dalam menghasilkan citra sintetis yang menyerupai data asli, dengan mengukur kualitas menggunakan metrik **Frechet Inception Distance (FID)**.

# 3.A Bangun baseline model GAN

## Generator

Generator terdiri dari 3 layer convolution dengan masing-masing layer memiliki kernel 3 x 3, stride 1, valid padding, masing-masing jumlah kernel sebanyak, 16, 32, 64, fungsi aktivasi Relu, kecuali layer terakhir memiliki fungsi aktivasi Tanh. Target output gambar 100 x 100 x 3, dengan noise nilai acak yang diambil dari distribusi normal (Gaussian).

In [2]:
def build_generator(latent_dim=100):
    model = Sequential()

    # Latent vector to initial feature map
    model.add(Dense(128 * 25 * 25, input_dim=latent_dim))
    model.add(Reshape((25, 25, 128)))

    # Conv Layer 1: 16 filters
    model.add(Conv2D(16, kernel_size=3, strides=1, padding='valid', activation='relu'))  # 23x23

    # Upsample (4x4) → 92x92
    model.add(UpSampling2D(size=(4, 4)))  # 92x92

    # Conv Layer 2: 32 filters
    model.add(Conv2D(32, kernel_size=3, strides=1, padding='valid', activation='relu'))  # 90x90

    # Conv Layer 3: 64 filters
    model.add(Conv2D(64, kernel_size=3, strides=1, padding='valid', activation='relu'))  # 88x88

    # Output layer: RGB
    model.add(Conv2D(3, kernel_size=3, strides=1, padding='valid', activation='tanh'))  # 86x86

    # Crop to 100x100 if you upsample more OR:
    model.add(UpSampling2D(size=(2, 2)))  # 172x172
    model.add(Cropping2D(cropping=((36, 36), (36, 36))))  # final: 100x100x3

    return model

In [3]:
gen = build_generator(100)
gen.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Discriminator

Diskriminator terdiri dari 3 layer convolution dengan masing-masing layer memiliki kernel 3 x 3, stride 1, valid padding, masing-masing jumlah kernel sebanyak, 16, 32, 64, fungsi aktivasi Relu. Fully connected terdiri dari single layer dengan fungsi aktivasi Sigmoid.

In [4]:
def build_discriminator(input_shape=(100, 100, 3)):
    model = Sequential()
    # Layer 1: 16 filter, kernel 3x3, ReLU
    model.add(Conv2D(16, kernel_size=3, strides=1, padding='valid', activation='relu', input_shape=input_shape))
    # Layer 2: 32 filter, kernel 3x3, ReLU
    model.add(Conv2D(32, kernel_size=3, strides=1, padding='valid', activation='relu'))
    # Layer 3: 64 filter, kernel 3x3, ReLU
    model.add(Conv2D(64, kernel_size=3, strides=1, padding='valid', activation='relu'))
    # Flatten + Dense dengan Sigmoid
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    return model

# 3.B Modifikasi

Versi generator yang lebih dalam, menggunakan Conv2DTranspose dan BatchNormalization untuk memperhalus hasil dan stabilisasi training

In [5]:
latent_dim = 120

In [6]:
def build_generator_modified(latent_dim=latent_dim): # Use the latent_dim from kernel variables
    model = Sequential([
        # Input layer
        Input(shape=(latent_dim,)),

        # Dense layer dan reshape
        Dense(128 * 25 * 25),
        Reshape((25, 25, 128)),
        BatchNormalization(),
        LeakyReLU(negative_slope=0.2),

        # Upsampling block 1 (25x25 -> 50x50)
        Conv2DTranspose(64, kernel_size=4, strides=2, padding='same'),
        BatchNormalization(),
        LeakyReLU(negative_slope=0.2),

        # Upsampling block 2 (50x50 -> 100x100)
        Conv2DTranspose(32, kernel_size=4, strides=2, padding='same'),
        BatchNormalization(),
        LeakyReLU(negative_slope=0.2),

        # Output layer
        Conv2DTranspose(3, kernel_size=4, strides=1, padding='same', activation='tanh')
    ])

    return model

Diskriminator yang lebih dalam dengan penambahan GaussianNoise untuk regularisasi, membantu menghindari overfitting pada dataset kecil

In [7]:
from tensorflow.keras.layers import GaussianNoise

In [8]:
def build_discriminator_modified_v2(input_shape=(100, 100, 3)):
    model = Sequential([
        Input(shape=input_shape),
        Conv2D(64, kernel_size=4, strides=2, padding='same'),
        LeakyReLU(negative_slope=0.2),
        GaussianNoise(0.1),

        Conv2D(128, kernel_size=4, strides=2, padding='same'),
        BatchNormalization(),
        LeakyReLU(negative_slope=0.2),
        GaussianNoise(0.1),

        Conv2D(256, kernel_size=4, strides=2, padding='same'),
        BatchNormalization(),
        LeakyReLU(negative_slope=0.2),
        GaussianNoise(0.1),

        Flatten(),
        Dense(1, activation='sigmoid')
    ])
    return model

GAN Wrapper

Menggabungkan generator dan discriminator menjadi satu model GAN. Discriminator dibekukan (trainable = False) selama train generator

In [9]:
def build_gan(generator, discriminator, latent_dim):
    discriminator.trainable = False
    z = Input(shape=(latent_dim,))
    img = generator(z)
    validity = discriminator(img)
    return Model(z, validity)

# 3.C Evaluasi

Menghitung nilai FID antara gambar nyata dan hasil generator. Menggunakan InceptionV3 sebagai ekstraktor fitur dan menghitung perbedaan distribusi Gaussian-nya

In [10]:
def get_inception_model():
    return InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))

def get_activations(images, model, batch_size=32):
    images = tf.image.resize(images, (299, 299)).numpy()
    images = preprocess_input(images)
    return model.predict(images, batch_size=batch_size, verbose=0)

def calculate_fid(act1, act2):
    mu1, sigma1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), np.cov(act2, rowvar=False)
    ssdiff = np.sum((mu1 - mu2)**2)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

Augmentasi sederhana (flip, rotasi, zoom) hanya diterapkan ke gambar nyata sebelum masuk ke discriminator untuk memperkaya variasi data

In [11]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.02),
    tf.keras.layers.RandomZoom(0.1),
])

Train function

- Melatih discriminator menggunakan real dan fake images
- Melatih generator melalui GAN
- Setiap fid_every epoch, menghitung FID untuk memantau kualitas generator

In [12]:
def train(dataset, generator, discriminator, gan, latent_dim, epochs=1000, batch_size=32, fid_every=200):
    inception_model = get_inception_model()

    for epoch in range(epochs):
        # --- Train Discriminator ---
        idx = np.random.randint(0, dataset.shape[0], batch_size // 2)
        real_imgs = dataset[idx]
        real_imgs = data_augmentation(real_imgs, training=True)

        z = np.random.normal(0, 1, (batch_size // 2, latent_dim))
        fake_imgs = generator.predict(z)

        # Label smoothing & noise
        real_labels = np.ones((batch_size // 2, 1)) * 0.9
        fake_labels = np.zeros((batch_size // 2, 1))
        real_labels += np.random.normal(0, 0.05, real_labels.shape)
        fake_labels += np.random.normal(0, 0.05, fake_labels.shape)
        real_labels = np.clip(real_labels, 0, 1)
        fake_labels = np.clip(fake_labels, 0, 1)

        d_loss_real = discriminator.train_on_batch(real_imgs, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_imgs, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        # --- Train Generator ---
        z = np.random.normal(0, 1, (batch_size, latent_dim))
        g_labels = np.ones((batch_size, 1)) * 0.9
        g_labels += np.random.normal(0, 0.05, g_labels.shape)
        g_labels = np.clip(g_labels, 0, 1)

        g_loss = gan.train_on_batch(z, g_labels)

        # --- FID Evaluation ---
        if epoch % fid_every == 0 and epoch > 0:
            fid_n = min(len(dataset), 300)
            real = dataset[:fid_n]
            z = np.random.normal(0, 1, (fid_n, latent_dim))
            gen = generator.predict(z)
            real_fid = (real + 1) * 127.5
            gen_fid = (gen + 1) * 127.5
            act1 = get_activations(real_fid, inception_model)
            act2 = get_activations(gen_fid, inception_model)
            fid = calculate_fid(act1, act2)
            print(f"FID @ epoch {epoch}: {fid:.2f}")

Siapkan dataset

Memuat dan mempersiapkan dataset dari folder, menyesuaikan ukuran dan normalisasi gambar ke rentang [-1, 1]

In [13]:
def load_images(image_dir, target_size=(100, 100)):
    images = []
    image_files = [f for f in os.listdir(image_dir) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    for fname in image_files:
        img_path = os.path.join(image_dir, fname)
        img = tf.keras.preprocessing.image.load_img(img_path, target_size=target_size)
        img = tf.keras.preprocessing.image.img_to_array(img)
        img = (img / 127.5) - 1.0  # Normalize to [-1, 1]
        images.append(img)
    return np.array(images)


In [14]:
zip_path = '/content/A_23-20250624T034710Z-1-001.zip'
with ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall('dataset')

image_dir = '/content/dataset/A_23'

# Load dataset dari folder hasil ekstrak
dataset = load_images(image_dir)
print("Dataset shape:", dataset.shape)

Dataset shape: (1074, 100, 100, 3)


Membangun dan menyusun model baseline, kemudian menyambungkannya ke dalam GAN dan mengompilasinya

Generator dan diskriminator menggunakan optimizer Adam, dan loss binary crossentropy

In [15]:
generator_a = build_generator(latent_dim)
discriminator_a = build_discriminator()
discriminator_a.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5), metrics=['accuracy'])
discriminator_a.trainable = False
gan_a = build_gan(generator_a, discriminator_a, latent_dim)
gan_a.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Sama seperti model A, tetapi menggunakan arsitektur generator dan discriminator yang telah dimodifikasi. Optimizer menggunakan learning rate decay

In [16]:
from tensorflow.keras.optimizers.schedules import ExponentialDecay
from tensorflow.keras.optimizers import Adam

lr_schedule = ExponentialDecay(
    initial_learning_rate=0.0002,
    decay_steps=20000,
    decay_rate=0.95
)
optimizer = Adam(learning_rate=lr_schedule, beta_1=0.5)

In [17]:
generator_b = build_generator_modified(latent_dim)
discriminator_b = build_discriminator()
discriminator_b.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
discriminator_b.trainable = False
gan_b = build_gan(generator_b, discriminator_b, latent_dim)
gan_b.compile(loss='binary_crossentropy', optimizer=optimizer)

Untuk memverifikasi bahwa hanya generator yang dilatih saat GAN dikompilasi, dan bahwa discriminator dibekukan dengan benar

In [18]:
print("Trainable weights in GAN:", len(gan_a.trainable_weights))
print("Trainable weights in GAN:", len(gan_b.trainable_weights))

Trainable weights in GAN: 10
Trainable weights in GAN: 14


In [19]:
for layer in gan_b.layers:
    print(f"{layer.name}: trainable in gan_b layer = {layer.trainable}")

input_layer_6: trainable in gan_b layer = True
sequential_4: trainable in gan_b layer = True
sequential_5: trainable in gan_b layer = False


In [20]:
for layer in gan_a.layers:
    print(f"{layer.name}: trainable in gan_a layer = {layer.trainable}")

input_layer_3: trainable in gan_a layer = True
sequential_2: trainable in gan_a layer = True
sequential_3: trainable in gan_a layer = False


Jalankan train

Melatih kedua model GAN (baseline dan modifikasi) pada dataset yang sama selama 1000 epoch

In [21]:
train(dataset, generator_a, discriminator_a, gan_a, latent_dim, epochs=1000, batch_size=32)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step




[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 57ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 55ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 67ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 48ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 54ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 70ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 48ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 50ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 59ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 57ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34

In [22]:
train(dataset, generator_b, discriminator_b, gan_b, latent_dim, epochs=1000, batch_size=32)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 46ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms

## Evaluasi FID Score

Menghitung nilai FID terakhir dari kedua generator menggunakan 1000 sampel untuk evaluasi akhir

In [23]:
def evaluate_fid(generator, dataset, latent_dim, sample_size=1000):
    inception = get_inception_model()
    real_imgs = dataset[:sample_size]
    z = np.random.normal(0, 1, (sample_size, latent_dim))
    gen_imgs = generator.predict(z)

    real_imgs = (real_imgs + 1) * 127.5
    gen_imgs = (gen_imgs + 1) * 127.5

    act1 = get_activations(real_imgs, inception)
    act2 = get_activations(gen_imgs, inception)
    return calculate_fid(act1, act2)

fid_a = evaluate_fid(generator_a, dataset, latent_dim)
fid_b = evaluate_fid(generator_b, dataset, latent_dim)

print(f"FID Model A (Baseline): {fid_a:.2f}")
print(f"FID Model B (Modifikasi): {fid_b:.2f}")

[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 27ms/step
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 35ms/step
FID Model A (Baseline): 513.41
FID Model B (Modifikasi): 474.33


Link video: https://youtu.be/HScx2Mccdwc?feature=shared

## Kesimpulan dan Analisa
---

### Rincian Perubahan
Modifikasi yang dilakukan terhadap model baseline antara lain:

1. **Penggunaan GaussianNoise** sebagai pengganti Dropout pada discriminator  
   → Memberikan regularisasi yang lebih stabil untuk arsitektur convolutional.
2. **Penerapan data augmentation ringan** (horizontal flip, rotation, zoom) pada real images sebelum masuk ke discriminator  
   → Membantu model menghindari overfitting, terutama pada dataset dengan jumlah sampel terbatas.

---

### Hasil Evaluasi

| Model     | FID @ Epoch 1000 |
|-----------|------------------|
| Baseline  | **513.41**       |
| Modifikasi| **474.33**       |

Terdapat penurunan FID sebesar **≈ 39.08 poin**, yang menunjukkan bahwa hasil dari generator semakin menyerupai distribusi data asli dan terdapat peningkatan kualitas visual.

---

### Batasan Eksperimen

- Eksperimen dibatasi pada **1000 epoch** karena keterbatasan sumber daya GPU (Google Colab). Namun, berdasarkan tren penurunan FID yang stabil pada setiap 200 epoch, dapat disimpulkan bahwa model masih berada dalam proses **konvergensi** dan belum mencapai performa optimal.
---

### Kesimpulan
Modifikasi yang diterapkan berhasil meningkatkan kualitas hasil generator berdasarkan evaluasi kuantitatif menggunakan FID. Hal ini menunjukkan bahwa pendekatan yang digunakan efektif, dan dapat dikembangkan lebih lanjut melalui eksperimen tambahan seperti:
- Penyesuaian nilai `latent_dim`
- Penggunaan teknik normalisasi lanjutan (mis. Spectral Normalization)
- Penggunaan strategi upsampling alternatif (mis. PixelShuffle atau Upsampling2D + Conv2D)

---
Thank you 🙂