rishika2024/Sketch2Real

Sketch2Real

Generating realistic images from a colored sketch using a diffusion model based on a conditional U-Net

Extra Criteria: GUI

Getting the Dataset

I used the COCO dataset for this project. To generate the sketch for each image, I do the following steps:

  1. Convert the image to grayscale
  2. Invert the grayscale image
  3. Apply Gaussian blur
  4. Invert the pixels
  5. Get the edges by binary thresholding
  6. Replace the edges with the original colors
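
The steps above can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the code in color_to_sketch.py. The function names, the blur sigma, the threshold value and the white background are all hypothetical. The list does not say how the blurred image is combined with the grayscale image, so the sketch assumes the common "color dodge" division used for pencil-sketch effects.

```python
import numpy as np

def gaussian_blur(img, sigma=3.0):
    """Separable Gaussian blur of a 2-D array with reflect padding."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Blur along rows, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def color_to_sketch(rgb, sigma=3.0, threshold=240):
    """Return (colored_sketch, edge_mask) for an H x W x 3 uint8 image."""
    rgb = rgb.astype(np.float32)
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)  # step 1
    inverted = 255.0 - gray                                         # step 2
    blurred = gaussian_blur(inverted, sigma)                        # step 3
    inverted_blur = 255.0 - blurred                                 # step 4
    # ASSUMPTION: color dodge (gray divided by the inverted blur) to get pencil lines.
    dodge = np.clip(gray * 255.0 / np.maximum(inverted_blur, 1.0), 0, 255)
    edges = dodge < threshold                                       # step 5
    sketch = np.full_like(rgb, 255.0)   # ASSUMPTION: white background
    sketch[edges] = rgb[edges]                                      # step 6
    return sketch.astype(np.uint8), edges
```

Pixels that are darker than their blurred neighbourhood are treated as lines and keep their original color; everything else is left blank.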

Model Architecture:

I tested 2 models

Model - 1 (u_net2.py)

In this model, I used 128 x 128 images and trained on 5k images from val2017

  • Noise schedule used: Cosine Noise Schedule
  • Optimiser: AdamW, weight decay = 1e-3
  • Loss Function: L1
  • Activation Function: SiLU
  • Batch Size: 64
  • Normalizing the image tensor to [0,1]
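
Read together, these settings suggest a training step along the lines below. This is a hedged NumPy sketch rather than the code in u_net2.py: the function names are hypothetical, the network itself is left out, and sampling the diffusion time uniformly in [0, 1] is an assumption.

```python
import numpy as np

def make_training_pair(images, rng):
    """Noise a batch of images scaled to [0, 1].

    Returns (noisy_images, noises, noise_variances). The network would be
    trained to predict `noises` from the noisy images, the sketch and the
    noise variance.
    """
    batch = images.shape[0]
    # ASSUMPTION: one diffusion time per image, drawn uniformly from [0, 1].
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1, 1))
    signal_rates = np.cos(t * np.pi / 2)   # cosine noise schedule
    noise_rates = np.sin(t * np.pi / 2)
    noises = rng.standard_normal(images.shape)
    noisy_images = signal_rates * images + noise_rates * noises
    return noisy_images, noises, noise_rates ** 2

def l1_loss(predicted_noise, true_noise):
    """Mean absolute error between predicted and true noise."""
    return float(np.mean(np.abs(predicted_noise - true_noise)))
```

In a real training loop the prediction passed to `l1_loss` would come from the Conditional U-Net, and AdamW would minimise that loss.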

The Conditional U-Net Architecture is as follows:

INPUT

(concatenating noisy image + sketch) -> Initial Conv (6 -> 64) + timestep embedding added (128 channels)

ENCODER (Downsampling)

  • DownBlock 1: 128 -> 32 channels
  • DownBlock 2: 32 -> 64 channels
  • DownBlock 3: 64 -> 128 channels

(Bottleneck)

  • ResidualBlock1: 128 -> 256
  • ResidualBlock2: 256 -> 256
  • ResidualBlock3: 256 -> 128

DECODER (Upsampling)

  • UpBlock 1: 128 -> 64 channels (+ skip)
  • UpBlock 2: 64 -> 32 channels (+ skip)
  • UpBlock 3: 32 -> 16 channels (+ skip)

Final Conv (16 -> 3)

OUTPUT

(predicted noise image)

Image Generation (reverse diffusion)

Starting from pure Gaussian noise, I iteratively denoise over a given number of steps. At each step, I compute signal_rate and noise_rate for the current time step, and signal_rate_next and noise_rate_next for the next time step, using the cosine schedule. The Conditional U-Net takes the noisy image x, the sketch, and the current noise variance as input, and predicts the noise component. From that, a clean image estimate is reconstructed:

predicted_image = (x - noise_rate × predicted_noise) / signal_rate

The noisy image for the next step is then re-composed using the next step's rates:

x_next = signal_rate_next × predicted_image + noise_rate_next × predicted_noise

This process repeats until x converges to a realistic image conditioned on the sketch.
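
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's sampling code: `reverse_diffusion`, `denoiser` and `oracle_denoiser` are hypothetical names, and the "denoiser" used in the toy check is an oracle that already knows the target image, standing in for the trained Conditional U-Net.

```python
import math
import numpy as np

def cosine_schedule(t):
    """Return (signal_rate, noise_rate) for diffusion time t in [0, 1]."""
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)

def reverse_diffusion(denoiser, sketch, steps=20, seed=0):
    """Iteratively denoise from pure Gaussian noise, conditioned on `sketch`.

    `denoiser(x, sketch, noise_variance)` must return the predicted noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(sketch.shape)      # start from pure noise (t = 1)
    for i in range(steps):
        t = 1.0 - i / steps
        t_next = max(t - 1.0 / steps, 0.0)
        signal_rate, noise_rate = cosine_schedule(t)
        signal_rate_next, noise_rate_next = cosine_schedule(t_next)
        predicted_noise = denoiser(x, sketch, noise_rate ** 2)
        # Clean image estimate from the current noisy image.
        predicted_image = (x - noise_rate * predicted_noise) / signal_rate
        # Re-compose the noisy image at the next (lower) noise level.
        x = signal_rate_next * predicted_image + noise_rate_next * predicted_noise
    return x

# Toy check with a hypothetical oracle that knows the clean image.
target = np.full((4, 4, 3), 0.5)

def oracle_denoiser(x, sketch, noise_variance):
    noise_rate = math.sqrt(noise_variance)
    signal_rate = math.sqrt(max(0.0, 1.0 - noise_variance))
    return (x - signal_rate * target) / noise_rate

result = reverse_diffusion(oracle_denoiser, sketch=np.zeros((4, 4, 3)))
```

With the plain cosine schedule the signal rate is almost zero at t = 1, so the first clean-image estimate divides by a very small number; the later steps correct it.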

Problems with this model:

While I did get outputs resembling realistic images, the model stopped improving at around the 160th epoch: the variance loss and training loss both plateaued. This was probably because I used only 5k images.

Hence I made a 2nd model with improvements. Below are images from the epochs 160, 170, 180, 190, 200, 210, 220, 230, 240

Model-2 (u_net.py)

In this model, I used 256 x 256 images and trained on 118k images from train2017

  • Noise schedule used: Offset Cosine Noise Schedule
  • Optimiser: AdamW, weight decay = 1e-4 + CosineAnnealingLR as learning rate scheduler (decays the learning rate along a cosine curve)
  • Loss Function: MSE
  • Activation Function: SiLU
  • Batch Size: 64
  • Normalize the image tensors to [-1,1]
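
For reference, an offset cosine schedule can be sketched as below. It keeps the signal rate away from exactly 0 and 1 by sweeping the angle between two offsets instead of over the full quarter circle. The function name and the bounds 0.02 and 0.95 are illustrative assumptions, not values taken from u_net.py.

```python
import math

def offset_cosine_schedule(t, min_signal_rate=0.02, max_signal_rate=0.95):
    """Return (signal_rate, noise_rate) for diffusion time t in [0, 1].

    ASSUMPTION: the min/max signal rates are illustrative defaults.
    """
    start_angle = math.acos(max_signal_rate)   # angle at t = 0
    end_angle = math.acos(min_signal_rate)     # angle at t = 1
    angle = start_angle + t * (end_angle - start_angle)
    # cos^2 + sin^2 = 1, so signal and noise variances always sum to one.
    return math.cos(angle), math.sin(angle)
```

Because the signal rate never reaches zero, the clean-image estimate at the first sampling step no longer divides by a value that is almost zero.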

The Conditional U-Net Architecture is as follows:

INPUT

(concatenating noisy image + sketch) -> Initial Conv (6 -> 64) + timestep embedding added (128 channels)

ENCODER (Downsampling)

  • DownBlock 1: 128 -> 64 channels
  • DownBlock 2: 64 -> 128 channels
  • DownBlock 3: 128 -> 256 channels

(Bottleneck)

  • ResidualBlock1: 256 -> 512
  • ResidualBlock2: 512 -> 512
  • ResidualBlock3: 512 -> 256

DECODER (Upsampling)

  • UpBlock 1: 256 -> 128 channels (+ skip)
  • UpBlock 2: 128 -> 64 channels (+ skip)
  • UpBlock 3: 64 -> 32 channels (+ skip)

Final Conv (32 -> 3)

OUTPUT

(predicted noise image)

Generated images below are from epochs 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70

Both models generate reasonable outputs with the actual (non-averaged) weights but produce random noise with the EMA weights. This is likely because the weights change a lot early in training and I did not run many epochs, so the EMA averages over those early weights and the result cannot be used.

This was the Generated image using the EMA model at the 70th epoch
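
To illustrate how an EMA copy can lag behind the trained weights, here is a toy sketch. It is an assumption-heavy illustration, not the repository's EMA code: the decay of 0.999, one update per epoch, the starting value of zero and the name `ema_update` are all hypothetical.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """One exponential-moving-average step over a dict of scalar weights."""
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * model_weights[name]
        for name in ema_weights
    }

# Toy scenario: the EMA copy starts at 0 while the trained weight sits at 1.
ema = {"w": 0.0}
trained = {"w": 1.0}
for _ in range(70):          # ASSUMPTION: one EMA update per epoch, 70 epochs
    ema = ema_update(ema, trained)

# Closed form for this scenario: the EMA value is 1 - decay ** 70.
expected = 1.0 - 0.999 ** 70
```

Under these assumptions the EMA weight has covered only a small fraction of the distance to the trained weight after 70 updates. If the EMA is instead updated once per training step, it sees far more updates and this lag would be much smaller.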

Using the GUI

gui.mp4

I used the larger model for the GUI since it outperforms the smaller model

  • Download the checkpoints folder from this drive:

https://drive.google.com/drive/folders/1T-G_GvM5_VO65vhPDhPmF1zeAT_-6b3-?usp=drive_link

  • run the file gui.py using the command: python3 gui.py

  • Click the URL generated

Downloading and Generating the dataset

wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip

  • Run the file color_to_sketch.py with the following command and type 128 or 256, as required: python3 color_to_sketch.py

Training the model

  • If you generated images of size 128 x 128: run the u_net2 model: python3 u_net2.py

  • If you generated images of size 256 x 256: run the u_net model: python3 u_net.py
