# CSCA 5622: *I’m Something of a Painter Myself* – Kaggle Mini Project

**Name:** Olaniyi Nafiu

---

## Project Overview

The goal of this project is to practice building and training generative deep learning models using Generative Adversarial Networks (GANs). Specifically, the task is to generate images in the style of Claude Monet.

The GAN model consists of two neural networks:

- **Generator:** Attempts to create realistic Monet-style images to fool the discriminator.
- **Discriminator:** Attempts to distinguish between real Monet paintings and those generated by the generator.

---

## Task

Build a GAN that generates **7,000 to 10,000** Monet-style images.

---

## Data Description

**Evaluation Metric:** MiFID (Memorization-informed Fréchet Inception Distance) score; lower is better.
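
For intuition about the metric, MiFID builds on the Fréchet Inception Distance (FID), which compares the mean and covariance of feature vectors from real and generated images; as I understand it, the competition then adds a penalty when generated images sit too close to training images. The sketch below covers only the FID part. It assumes the features are already extracted (random arrays stand in for Inception features here), and the function name `frechet_distance` is my own.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """FID-style distance between two (n_samples, n_features) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((sigma_r sigma_g)^(1/2)) is the sum of the square roots of the
    # eigenvalues of the product of the two covariance matrices.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)

# Stand-in features (assumption: real features would come from an Inception network)
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 1.0  # same covariance, mean moved by 1 in each of 8 dimensions

fid_same = frechet_distance(real, real)        # close to 0
fid_shifted = frechet_distance(real, shifted)  # close to 8 (squared mean shift)
print(fid_same, fid_shifted)
```

Identical feature sets give a distance near zero, and shifting the mean increases it, which is why lower scores indicate generated images that are statistically closer to real Monet paintings.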

### Input Data

The dataset contains four directories:

- `monet_tfrec/`: 300 Monet paintings in TFRecord format (256×256)
- `monet_jpg/`: 300 Monet paintings in JPEG format (256×256)
- `photo_tfrec/`: 7,028 photos in TFRecord format (256×256)
- `photo_jpg/`: 7,028 photos in JPEG format (256×256)

- **Monet directories:** Contain Monet paintings used to train the model.
- **Photo directories:** Contain real-world photos to be transformed into Monet-style images.

### Output Requirements

- **Number of images:** 7,000 to 10,000
- **Image size:** 256 × 256 × 3 (RGB)
- **Submission format:** A single ZIP file named `images.zip` containing all generated images
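
The requirements above can be checked before submitting. The following is a minimal sketch using only the standard library; the helper names `count_images_in_zip` and `meets_submission_rules` are hypothetical, and a tiny in-memory archive stands in for the real `images.zip`.

```python
import io
import zipfile

def count_images_in_zip(zip_file, extensions=(".jpg", ".jpeg", ".png")):
    """Count archive members whose names end with an image extension."""
    with zipfile.ZipFile(zip_file) as zf:
        return sum(name.lower().endswith(extensions) for name in zf.namelist())

def meets_submission_rules(n_images, low=7000, high=10000):
    """True when the image count falls inside the allowed range."""
    return low <= n_images <= high

# Tiny in-memory archive standing in for images.zip
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for i in range(3):
        zf.writestr(f"{i + 1:05d}.jpg", b"placeholder bytes")
buffer.seek(0)

n = count_images_in_zip(buffer)
print(n, meets_submission_rules(n))
```

On the real archive, the same function could be called with the path to `images.zip` instead of the in-memory buffer.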

---

## Reference

[Project Data on Kaggle](https://www.kaggle.com/competitions/gan-getting-started/data)


In [None]:
# All Imports
import os
import glob
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tqdm import tqdm
import zipfile

## Exploratory Data Analysis (EDA)

In [None]:
# Read dataset
monet_dir = "/kaggle/input/gan-getting-started/monet_jpg"
photo_dir = "/kaggle/input/gan-getting-started/photo_jpg"

def get_image_paths(directory):
    return sorted(glob.glob(os.path.join(directory, "*.jpg")))

monet_images = get_image_paths(monet_dir)
photo_images = get_image_paths(photo_dir)

print(f"Number of Monet images: {len(monet_images)}")
print(f"Number of Photo images: {len(photo_images)}")

In [None]:
# Preview a few images
def show_sample_images(image_paths, title, num=5):
    fig, axs = plt.subplots(1, num, figsize=(15, 5))
    fig.suptitle(title, fontsize=16)
    for i, ax in enumerate(axs):
        img = Image.open(image_paths[i])
        ax.imshow(img)
        ax.axis("off")
    plt.show()

show_sample_images(monet_images, "Sample Monet Paintings")
show_sample_images(photo_images, "Sample Real-World Photos")


## Model Architecture

I use CycleGAN to build the models. It is designed for unpaired image-to-image translation, so it does not need a one-to-one pairing between photos and Monet paintings.

### Components
1. Generator G (Photo → Monet): Learns to paint a photo in Monet’s style.
2. Generator F (Monet → Photo): Acts as the inverse mapping, which makes the cycle-consistency constraint possible.
3. Discriminator D_Y (Monet domain): Tries to tell real Monet paintings from fake ones created by G.
4. Discriminator D_X (Photo domain): Tries to tell real photos from fake photos generated by F.

### Loss Functions

To train CycleGAN effectively, several loss functions are used together:

#### 1. Adversarial Loss

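Pushes generated images to be indistinguishable from real ones within each domain. This notebook uses the least-squares GAN (LSGAN) form of the loss. Each discriminator minimizes the expression below, while its generator minimizes the complementary term, e.g. `E_x[(D_Y(G(x)) - 1)²]` for G: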

- For G and D<sub>Y</sub>:  
  `L_GAN(G, D_Y, X, Y) = E_y[(D_Y(y) - 1)²] + E_x[D_Y(G(x))²]`

- For F and D<sub>X</sub>:  
  `L_GAN(F, D_X, Y, X) = E_x[(D_X(x) - 1)²] + E_y[D_X(F(y))²]`
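
To make the least-squares form concrete, here is a small NumPy sketch of the two sides of the adversarial game on made-up discriminator scores. The function names are illustrative; the notebook's own `discriminator_loss` additionally scales the sum by 0.5.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Discriminator side: real scores pushed toward 1, fake scores toward 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_generator_loss(d_fake):
    """Generator side: scores on generated images pushed toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

# Made-up discriminator outputs for three real and three generated images
d_real = np.array([0.9, 1.0, 0.8])
d_fake = np.array([0.1, 0.3, 0.2])

d_loss = lsgan_discriminator_loss(d_real, d_fake)
g_loss = lsgan_generator_loss(d_fake)
print(f"D loss: {d_loss:.4f}, G loss: {g_loss:.4f}")
```

With these scores the discriminator is doing well (small loss) and the generator is doing poorly (large loss), which is the tension that drives training.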

#### 2. Cycle Consistency Loss

Encourages the mappings to be consistent:

- If we translate a photo to Monet and back, we should get the original photo: `F(G(photo)) ≈ photo`
- Likewise, `G(F(monet)) ≈ monet`

The loss is:  
`L_cycle(G, F) = E_x[‖F(G(x)) − x‖₁] + E_y[‖G(F(y)) − y‖₁]`

This helps preserve the semantic content of the original images.
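
A toy illustration of the idea, assuming simple invertible functions in place of the real generators (`toy_G`, `toy_F`, and `toy_F_bad` are hypothetical stand-ins, not the networks trained in this notebook):

```python
import numpy as np

def l1_cycle_loss(original, reconstructed):
    """Mean absolute error between an input and its round-trip reconstruction."""
    return float(np.mean(np.abs(original - reconstructed)))

def toy_G(x):
    return 0.5 * x + 0.1      # stands in for photo -> Monet

def toy_F(y):
    return (y - 0.1) / 0.5    # exact inverse of toy_G

def toy_F_bad(y):
    return y                  # does not undo toy_G at all

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # pretend pixel values in [-1, 1]

loss_good = l1_cycle_loss(x, toy_F(toy_G(x)))      # near 0
loss_bad = l1_cycle_loss(x, toy_F_bad(toy_G(x)))   # clearly above 0
print(loss_good, loss_bad)
```

The loss is near zero only when the second mapping actually inverts the first, which is the property the cycle term rewards.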

#### 3. Identity Loss (Optional)

Encourages the generators to preserve color and structure when the input is already from the target domain:

- `G(monet) ≈ monet`  
- `F(photo) ≈ photo`

The loss is:  
`L_identity(G, F) = E_y[‖G(y) − y‖₁] + E_x[‖F(x) − x‖₁]`

This regularization helps stabilize training and preserve low-level details.

#### 4. Total Objective

The total loss combines all components:
`L_total = L_GAN(G, D_Y) + L_GAN(F, D_X) + λ_cycle * L_cycle + λ_id * L_identity`

Where:  
- `λ_cycle` (typically 10) controls the weight of cycle consistency  
- `λ_id` (optional) controls the weight of identity loss; it is commonly set to 0.5 × `λ_cycle` (which gives the value of 5 used in this notebook) or to 0 to disable the term
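
As a worked example of how the terms combine, the sketch below adds made-up loss values using the weights from this notebook's loss functions (`lambda_cycle=10`, `lambda_identity=5`). The function name and the numbers are illustrative only.

```python
def total_generator_objective(gan_g, gan_f, cycle, identity,
                              lambda_cycle=10.0, lambda_id=5.0):
    """Weighted sum of adversarial, cycle-consistency, and identity terms."""
    return gan_g + gan_f + lambda_cycle * cycle + lambda_id * identity

# Made-up per-term values, not measurements from training
total = total_generator_objective(gan_g=0.30, gan_f=0.20, cycle=0.08, identity=0.06)
cycle_share = 10.0 * 0.08 / total

print(f"total = {total:.2f}, cycle share = {cycle_share:.0%}")
```

Even a small raw cycle loss can dominate the total once it is multiplied by `λ_cycle`, which is how the weight steers the generators toward preserving content.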



## Results and Analysis

In [None]:
## Preprocess Images
IMG_HEIGHT = 256
IMG_WIDTH = 256
BATCH_SIZE = 1
AUTOTUNE = tf.data.AUTOTUNE

# Normalize images to [-1, 1]
def preprocess_image(file_path):
    image = tf.io.read_file(file_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])
    image = (tf.cast(image, tf.float32) / 127.5) - 1.0
    return image


# Build dataset
def build_dataset(image_paths, shuffle=True):
    ds = tf.data.Dataset.from_tensor_slices(image_paths)
    if shuffle:
        ds = ds.shuffle(buffer_size=1000)
    ds = ds.map(preprocess_image, num_parallel_calls=AUTOTUNE)
    ds = ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)
    return ds

monet_ds = build_dataset(monet_images)
photo_ds = build_dataset(photo_images)

In [None]:
# Visualize a sample image after preprocessing
def show_sample(ds, title):
    for img in ds.take(1):
        img = (img[0] + 1.0) / 2.0
        plt.imshow(img)
        plt.title(title)
        plt.axis("off")
        plt.show()

show_sample(monet_ds, "Sample Monet Image")
show_sample(photo_ds, "Sample Photo Image")
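
The scaling used in `preprocess_image` maps pixel values from [0, 255] to [-1, 1], which matches the `tanh` output range of the generator. A small NumPy sketch of the mapping and its inverse follows; the function names are my own, and rounding before the integer cast is an assumption of this sketch rather than something the notebook does.

```python
import numpy as np

def normalize(pixels_uint8):
    """Map [0, 255] pixel values to [-1, 1]."""
    return pixels_uint8.astype(np.float64) / 127.5 - 1.0

def denormalize(values):
    """Map [-1, 1] values back to [0, 255], rounding before the integer cast."""
    scaled = np.clip((values + 1.0) * 127.5, 0, 255)
    return np.rint(scaled).astype(np.uint8)

all_levels = np.arange(256, dtype=np.uint8)
normalized = normalize(all_levels)
round_trip = denormalize(normalized)

print(normalized.min(), normalized.max())          # -1.0 1.0
print(bool(np.array_equal(round_trip, all_levels)))  # True
```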

In [None]:
# Build models

# Instance normalization normalizes each image (and each channel) independently of the rest of the batch, which suits style transfer tasks such as this project.
class InstanceNormalization(layers.Layer):
    def __init__(self, epsilon=1e-5):
        super().__init__()
        self.epsilon = epsilon

    def build(self, input_shape):
        self.gamma = self.add_weight(name='gamma',
                                     shape=(input_shape[-1],),
                                     initializer="ones",
                                     trainable=True)
        self.beta = self.add_weight(name='beta',
                                    shape=(input_shape[-1],),
                                    initializer="zeros",
                                    trainable=True)

    def call(self, x):
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        return self.gamma * (x - mean) / tf.sqrt(var + self.epsilon) + self.beta

# A residual block is a pair of conv layers with a skip connection. It helps the generator retain content structure while making only moderate style changes.
def residual_block(x, filters):
    input_tensor = x
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = InstanceNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = InstanceNormalization()(x)
    return layers.Add()([input_tensor, x])


# Generator used for both G and F
def build_generator(input_shape=(256, 256, 3), name="generator"):
    inputs = layers.Input(shape=input_shape)

    # Initial conv layer
    x = layers.Conv2D(64, kernel_size=7, strides=1, padding="same")(inputs)
    x = InstanceNormalization()(x)
    x = layers.ReLU()(x)

    # Downsample: d128, d256
    x = layers.Conv2D(128, kernel_size=3, strides=2, padding="same")(x)
    x = InstanceNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2D(256, kernel_size=3, strides=2, padding="same")(x)
    x = InstanceNormalization()(x)
    x = layers.ReLU()(x)

    # Residual blocks ×6
    for _ in range(6):
        x = residual_block(x, 256)

    # Upsample: u128, u64
    x = layers.Conv2DTranspose(128, kernel_size=3, strides=2, padding="same")(x)
    x = InstanceNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2DTranspose(64, kernel_size=3, strides=2, padding="same")(x)
    x = InstanceNormalization()(x)
    x = layers.ReLU()(x)

    # Output layer
    x = layers.Conv2D(3, kernel_size=7, strides=1, padding="same", activation="tanh")(x)

    return tf.keras.Model(inputs, x, name=name)


# Discriminator used for both D_X and D_Y
def build_discriminator(input_shape=(256, 256, 3), name="discriminator"):
    initializer = tf.random_normal_initializer(0., 0.02)
    inp = layers.Input(shape=input_shape)

    x = layers.Conv2D(64, 4, strides=2, padding='same', kernel_initializer=initializer)(inp)
    x = layers.LeakyReLU(0.2)(x)

    x = layers.Conv2D(128, 4, strides=2, padding='same', kernel_initializer=initializer)(x)
    x = InstanceNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)

    x = layers.Conv2D(256, 4, strides=2, padding='same', kernel_initializer=initializer)(x)
    x = InstanceNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)

    x = layers.Conv2D(512, 4, strides=1, padding='same', kernel_initializer=initializer)(x)
    x = InstanceNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)

    x = layers.Conv2D(1, 4, strides=1, padding='same', kernel_initializer=initializer)(x)

    return tf.keras.Model(inputs=inp, outputs=x, name=name)
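
To show what the `InstanceNormalization` layer computes, here is a NumPy sketch of the same arithmetic with scalar `gamma` and `beta` (the layer above learns one value per channel). It is a simplified stand-in, not a replacement for the Keras layer.

```python
import numpy as np

def instance_norm(x, gamma=1.0, beta=0.0, epsilon=1e-5):
    """Normalize each (image, channel) feature map over its spatial axes.

    x has shape (batch, height, width, channels).
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + epsilon) + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 8, 3))
y = instance_norm(x)

per_map_mean = y.mean(axis=(1, 2))  # shape (2, 3), one value per image and channel
per_map_std = y.std(axis=(1, 2))
print(per_map_mean.round(6))
print(per_map_std.round(3))
```

Each of the six feature maps ends up with mean near 0 and standard deviation near 1, regardless of the statistics of the other image in the batch.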



In [None]:
# Model instantiation
gen_G = build_generator(name="G_photo2monet")
gen_F = build_generator(name="F_monet2photo")

disc_X = build_discriminator(name="D_photo")
disc_Y = build_discriminator(name="D_monet")
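
The spatial sizes inside both networks can be traced by hand. With `padding="same"`, a strided `Conv2D` produces `ceil(size / stride)` outputs per axis and a `Conv2DTranspose` produces `size * stride`. The sketch below applies those two rules to the stride sequences used in `build_generator` and `build_discriminator`; the helper names are hypothetical.

```python
import math

def conv_same(size, stride):
    """Output size of a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)

def deconv_same(size, stride):
    """Output size of a Conv2DTranspose layer with padding='same'."""
    return size * stride

# Discriminator: strides of its five conv layers
size = 256
for stride in [2, 2, 2, 1, 1]:
    size = conv_same(size, stride)
disc_out = size

# Generator: initial conv + two downsampling convs, then two upsampling layers
size = 256
for stride in [1, 2, 2]:
    size = conv_same(size, stride)
bottleneck = size                 # resolution seen by the residual blocks
for stride in [2, 2]:
    size = deconv_same(size, stride)
gen_out = conv_same(size, 1)      # final 7x7 conv keeps the resolution

print(disc_out, bottleneck, gen_out)
```

The discriminator therefore outputs a 32×32 grid of scores rather than a single number, so each score judges a patch of the input image, and the generator returns an image at the original 256×256 resolution.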


In [None]:
# Define Loss Functions
loss_obj = tf.keras.losses.MeanSquaredError()

## Adversarial loss: the generator wants the discriminator's output on fake images to be close to 1
def generator_loss(fake_output):
    return loss_obj(tf.ones_like(fake_output), fake_output)

## Discriminator loss: goal is for output to be 1 for real images and 0 for fake images
def discriminator_loss(real_output, fake_output):
    real_loss = loss_obj(tf.ones_like(real_output), real_output)
    fake_loss = loss_obj(tf.zeros_like(fake_output), fake_output)
    return (real_loss + fake_loss) * 0.5

## Cycle consistency loss: ensures original image can be reconstructed
def cycle_consistency_loss(real_image, cycled_image, lambda_cycle=10):
    loss = tf.reduce_mean(tf.abs(real_image - cycled_image))
    return lambda_cycle * loss

## Identity loss: ensures generator does not change input if it's already monet
def identity_loss(real_image, same_image, lambda_identity=5):
    loss = tf.reduce_mean(tf.abs(real_image - same_image))
    return lambda_identity * loss



In [None]:
# Model Optimization
generator_lr = 2e-4
discriminator_lr = 2e-4
beta_1 = 0.5

G_optimizer = tf.keras.optimizers.Adam(learning_rate=generator_lr, beta_1=beta_1)
F_optimizer = tf.keras.optimizers.Adam(learning_rate=generator_lr, beta_1=beta_1)
D_X_optimizer = tf.keras.optimizers.Adam(learning_rate=discriminator_lr, beta_1=beta_1)
D_Y_optimizer = tf.keras.optimizers.Adam(learning_rate=discriminator_lr, beta_1=beta_1)

In [None]:
!nvidia-smi

In [None]:
# Define Training Function
@tf.function
def train_step(real_photo, real_monet):
    with tf.GradientTape(persistent=True) as tape:
        # Generate fake images with generators
        fake_monet = gen_G(real_photo, training=True)
        fake_photo = gen_F(real_monet, training=True)

        # Create cycled images for cycle consistency loss evaluation
        cycled_photo = gen_F(fake_monet, training=True)
        cycled_monet = gen_G(fake_photo, training=True)

        # Create image mapping for identity loss evaluation
        same_photo = gen_F(real_photo, training=True)
        same_monet = gen_G(real_monet, training=True)

        # Discriminator outputs
        disc_real_monet = disc_Y(real_monet, training=True)
        disc_fake_monet = disc_Y(fake_monet, training=True)

        disc_real_photo = disc_X(real_photo, training=True)
        disc_fake_photo = disc_X(fake_photo, training=True)

        # Generator adversarial losses
        G_gan_loss = generator_loss(disc_fake_monet)
        F_gan_loss = generator_loss(disc_fake_photo)

        # Cycle consistency losses
        cycle_loss_G = cycle_consistency_loss(real_photo, cycled_photo)
        cycle_loss_F = cycle_consistency_loss(real_monet, cycled_monet)

        # Identity losses
        id_loss_G = identity_loss(real_monet, same_monet)
        id_loss_F = identity_loss(real_photo, same_photo)

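        # Total generator losses
        # Note: each generator is given only one of the two cycle terms here,
        # whereas the L_total objective described earlier sums both.
        G_total_loss = G_gan_loss + cycle_loss_G + id_loss_G
        F_total_loss = F_gan_loss + cycle_loss_F + id_loss_F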

        # Discriminator losses
        D_Y_loss = discriminator_loss(disc_real_monet, disc_fake_monet)
        D_X_loss = discriminator_loss(disc_real_photo, disc_fake_photo)

    # Compute gradients
    G_gradients = tape.gradient(G_total_loss, gen_G.trainable_variables)
    F_gradients = tape.gradient(F_total_loss, gen_F.trainable_variables)
    D_Y_gradients = tape.gradient(D_Y_loss, disc_Y.trainable_variables)
    D_X_gradients = tape.gradient(D_X_loss, disc_X.trainable_variables)

    # Apply gradients
    G_optimizer.apply_gradients(zip(G_gradients, gen_G.trainable_variables))
    F_optimizer.apply_gradients(zip(F_gradients, gen_F.trainable_variables))
    D_Y_optimizer.apply_gradients(zip(D_Y_gradients, disc_Y.trainable_variables))
    D_X_optimizer.apply_gradients(zip(D_X_gradients, disc_X.trainable_variables))

    return {
        "G_loss": G_total_loss,
        "F_loss": F_total_loss,
        "D_Y_loss": D_Y_loss,
        "D_X_loss": D_X_loss
    }


In [None]:
# Training execution
num_epochs = 20
checkpoint_dir = "/kaggle/working/checkpoints"
os.makedirs(checkpoint_dir, exist_ok=True)
g_losses = []

for epoch in range(num_epochs):
    total_g_loss = 0
    total_steps = 0

    for photo_batch, monet_batch in tf.data.Dataset.zip((photo_ds, monet_ds)):
        losses = train_step(photo_batch, monet_batch)
        total_g_loss += losses["G_loss"]
        total_steps += 1

    avg_g_loss = total_g_loss / total_steps
    g_losses.append(float(avg_g_loss))
    print(f"Epoch {epoch+1} - Avg G_loss: {avg_g_loss:.4f}")
    gen_G.save_weights(f"{checkpoint_dir}/gen_G_epoch_{epoch+1}.weights.h5")
    gen_F.save_weights(f"{checkpoint_dir}/gen_F_epoch_{epoch+1}.weights.h5")
    disc_X.save_weights(f"{checkpoint_dir}/disc_X_epoch_{epoch+1}.weights.h5")
    disc_Y.save_weights(f"{checkpoint_dir}/disc_Y_epoch_{epoch+1}.weights.h5")
    print(f"Saved checkpoint at epoch {epoch+1}")
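
One detail of the loop above is worth spelling out: `tf.data.Dataset.zip` stops as soon as the shorter dataset is exhausted, so the number of steps per epoch is set by the Monet dataset. The sketch below reproduces that arithmetic using the dataset sizes listed in the data description, with Python's built-in `zip` as an analogy for the pairing behaviour.

```python
import math

n_monet, n_photo = 300, 7028   # dataset sizes from the data description
batch_size, num_epochs = 1, 20

def num_batches(n_items, batch_size):
    """Number of batches, counting a final partial batch."""
    return math.ceil(n_items / batch_size)

steps_per_epoch = min(num_batches(n_photo, batch_size), num_batches(n_monet, batch_size))
total_steps = steps_per_epoch * num_epochs

# Built-in zip has the same "stop at the shortest input" behaviour
pairs = list(zip(range(n_photo), range(n_monet)))

print(steps_per_epoch, total_steps, len(pairs))
```

Each epoch therefore pairs every Monet painting with only a subset of the photos; which photos appear depends on the shuffle.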

In [None]:
# Display loss over epochs
plt.plot(g_losses)
plt.title("Generator Loss Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Avg G_loss")
plt.grid(True)
plt.show()

In [None]:
# Generate monet-styled images 

# Convert model output from [-1, 1] to [0, 255]
def deprocess_image(img_tensor):
    img = (img_tensor + 1.0) * 127.5
    img = tf.clip_by_value(img, 0, 255)
    return tf.cast(img, tf.uint8).numpy()

# set up output directory
output_dir = "/kaggle/working/generated_monet"
os.makedirs(output_dir, exist_ok=True)

# capture sample images
num_display_samples = 5
sample_pairs = []

for i, photo_batch in enumerate(tqdm(photo_ds)):
    generated = gen_G(photo_batch, training=False)[0]
    output_img = deprocess_image(generated)
    
    output_path = os.path.join(output_dir, f"{i+1:05d}.jpg")
    Image.fromarray(output_img).save(output_path)

    if i < num_display_samples:
        input_img = deprocess_image(photo_batch[0])
        sample_pairs.append((input_img, output_img))

# Display samples
print(f"\nDisplaying {num_display_samples} sample Monet-style results:")
fig, axs = plt.subplots(num_display_samples, 2, figsize=(8, 2 * num_display_samples))

for idx, (input_img, output_img) in enumerate(sample_pairs):
    axs[idx, 0].imshow(input_img.astype(np.uint8))
    axs[idx, 0].set_title("Original Photo")
    axs[idx, 0].axis("off")

    axs[idx, 1].imshow(output_img.astype(np.uint8))
    axs[idx, 1].set_title("Monet-Style")
    axs[idx, 1].axis("off")

plt.tight_layout()
plt.show()

In [None]:
# Zip images
zip_path = "/kaggle/working/images.zip"
with zipfile.ZipFile(zip_path, "w") as zipf:
    for filename in sorted(os.listdir(output_dir)):
        file_path = os.path.join(output_dir, filename)
        zipf.write(file_path, arcname=filename)

print(f"Saved {len(os.listdir(output_dir))} images to images.zip")

## Conclusion

This project implemented a **CycleGAN** model to perform unpaired image-to-image translation, converting real-world photos into **Monet-style paintings**.

Over 20 training epochs, the average generator loss decreased, which suggests the generator was progressively learning Monet's artistic style.

The final model generated over **7,000 Monet-style images** (one per input photo), which were packaged as `images.zip` and submitted to the competition. The submission achieved a **Kaggle MiFID score of 95.186** (lower is better), indicating that the outputs capture Monet's style reasonably well.

### Key Takeaways

- Used CycleGAN with identity and cycle-consistency losses to enable **unpaired translation**
- Generator learned to preserve photo structure while applying Monet-style brushstrokes
- Average generator loss decreased across epochs, suggesting **stable training**
- Final outputs are visually consistent and meet the competition's submission requirements



