# Diffusion. Text-to-Image Generation.

- Liana Mardanova
- DS-01
- l.mardanova@innopolis.university

## Generating Beautiful Interiors with Diffusion Models

In this project, I trained a diffusion model (Stable Diffusion v1.5 fine-tuned with LoRA) to generate visually appealing interior designs. The notebook is organized into the following sections:

0. **Preparations**
1. **Dataset**
2. **Model**
3. **Results** (image generation)
4. **Results** (observations and conclusion)
5. **Resources**

### Challenges 

1) Understanding Diffusion Models
   
Grasping the complex mechanisms and underlying principles of diffusion models was initially challenging. To overcome this, I relied on comprehensive explanations from resources such as [this explanatory video](https://youtu.be/HoKDTa5jHvg?si=xoofJvnZOjTeeR5M) and a detailed tutorial on training conditional diffusion models from scratch provided by [Weights & Biases](https://wandb.ai/capecape/train_sd/reports/How-To-Train-a-Conditional-Diffusion-Model-From-Scratch--VmlldzoyNzIzNTQ1).

2) Sourcing an Appropriate Dataset
   
Filtering the COCO dataset with the keyword "interior" yielded poor-quality data, so I built a [custom dataset](https://www.kaggle.com/datasets/liaaana/ikea-interiors/data) from the IKEA room-scene images available on [GitHub](https://github.com/IvonaTau/ikea/tree/master/images/room_scenes). This gave a higher-quality and more relevant collection of interior images.

3) Long Training
   
Training a diffusion model from scratch was time-consuming, so I switched to fine-tuning a pretrained Stable Diffusion v1.5 model with LoRA (Low-Rank Adaptation). LoRA keeps the base weights frozen and trains only small low-rank adapter matrices, which substantially reduced training time while still producing good results. This [tutorial](https://www.kaggle.com/code/ostamand/stable-diffusion-1-5-lora-fine-tuning) was very helpful.

In summary, this project shows that diffusion models can be used to generate beautiful interior designs.

## 0. Preparations

In [None]:
# install necessary packages
!pip install diffusers peft transformers -q

In [None]:
# imports (duplicates removed, grouped as standard library / third party)
import math
import os
from functools import partial
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.nn.functional as F
from datasets import load_dataset
from diffusers import AutoencoderKL, DDPMScheduler, DiffusionPipeline, StableDiffusionPipeline, UNet2DConditionModel
from diffusers.utils import convert_state_dict_to_diffusers
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
from peft import LoraConfig
from peft.utils import get_peft_model_state_dict
from PIL import Image
from pydantic import BaseModel
from torch.utils.data import DataLoader, Dataset, random_split
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import AutoTokenizer, CLIPTextModel, CLIPTokenizer

In [None]:
class TrainingConfig(BaseModel):
    data_dir: str = '/kaggle/input/ikea-interiors'
    csv_file: str = 'information.csv'
    image_path_column: str = 'path'
    description_cloumn: str = 'description'
    image_size: int = 512
    lr: float = 0.0001
    batch_size: int = 4
    rank: int = 120
    max_grad_norm: float = 1.0
    pretrained_model_name: str = "runwayml/stable-diffusion-v1-5"
    seed: int = 42
    inference_steps: int = 50
    num_epochs: int = 30

In [None]:
config = TrainingConfig()
torch.manual_seed(config.seed)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## 1. Dataset

In [None]:
def display_images_with_descriptions(data_dir, csv_file, image_path_column, description_column, num_images=3):
    """
    Display a specified number of images from a folder with descriptions in two columns.
    """
    df = pd.read_csv(Path(data_dir) / csv_file)
    fig, axs = plt.subplots(num_images, 2, figsize=(10, 4 * num_images))
    
    for i in range(num_images):
        image_path = Path(data_dir) / df.iloc[i][image_path_column]
        description = df.iloc[i][description_column]
        
        image = Image.open(image_path).convert("RGB")
        axs[i, 0].imshow(image)
        axs[i, 0].axis("off")
        
        axs[i, 1].text(0.5, 0.5, description, ha="center", va="center", wrap=True, fontsize=12)
        axs[i, 1].axis("off")
    
    plt.tight_layout()
    plt.show()


In [None]:
display_images_with_descriptions(config.data_dir, config.csv_file, config.image_path_column, config.description_cloumn, num_images=3)

In [None]:
class IkeaDataset(Dataset): 
    def __init__(self, data_dir, csv_file, tokenizer, image_path_column, description_column, image_size):
        self.data_dir = Path(data_dir)
        self.df = pd.read_csv(self.data_dir / csv_file)
        self.tokenizer = tokenizer
        self.image_path_column = image_path_column
        self.description_column = description_column
        self.image_size = image_size

        self.transform = transforms.Compose([
            transforms.RandomCrop((image_size, image_size)),
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),
        ])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        image_path = self.data_dir / self.df.iloc[idx][self.image_path_column]
        description = self.df.iloc[idx][self.description_column]
        image = self.transform(Image.open(image_path).convert("RGB"))
        input_ids = self.tokenizer(
            description,
            max_length=self.tokenizer.model_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt"
        )["input_ids"][0]

        return {"pixel_values": image, "input_ids": input_ids, "description": description}
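
The `Normalize([0.5], [0.5])` step in the transform maps pixel values from the `[0, 1]` range produced by `ToTensor()` to `[-1, 1]`. A minimal pure-Python sketch of that mapping and its inverse follows. The function names `normalize` and `denormalize` are illustrative and do not appear in the notebook.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula used by Normalize."""
    return (value - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Invert normalize(), mapping [-1, 1] back to [0, 1]."""
    return value * std + mean


if __name__ == "__main__":
    # Show where a few representative pixel intensities end up.
    for pixel in (0.0, 0.25, 0.5, 1.0):
        print(pixel, "->", normalize(pixel))
```

The inverse mapping is what one would apply to a decoded tensor before displaying it as an image.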

## 2. Model

In [None]:
# helper functions
def load_model(model_name):
    tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(
        model_name, subfolder="text_encoder", torch_dtype=torch.float16
    )
    vae = AutoencoderKL.from_pretrained(
        model_name, subfolder="vae", torch_dtype=torch.float16
    )
    scheduler = DDPMScheduler.from_pretrained(model_name, subfolder="scheduler")
    unet = UNet2DConditionModel.from_pretrained(
        model_name, subfolder="unet", torch_dtype=torch.float16
    )
    return tokenizer, text_encoder, vae, scheduler, unet

def freeze_parameters(model):
    for param in model.parameters():
        param.requires_grad = False

def change_parameters_type(model):
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.to(dtype=torch.float32)

def get_lora_parameters(model):
    return [param for param in model.parameters() if param.requires_grad]


def get_models(model_name, dtype=torch.float16):
    tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder").to(dtype=dtype)
    vae = AutoencoderKL.from_pretrained(model_name, subfolder="vae").to(dtype=dtype)
    scheduler = DDPMScheduler.from_pretrained(model_name, subfolder="scheduler")
    unet = UNet2DConditionModel.from_pretrained(model_name, subfolder="unet").to(dtype=dtype)
    return tokenizer, text_encoder, vae, scheduler, unet

def setup_models_for_training(model_name, rank: int=128):
    tokenizer, text_encoder, vae, scheduler, unet = load_model(model_name)
    
    freeze_parameters(text_encoder)
    freeze_parameters(vae)
    freeze_parameters(unet)
    
    unet_lora_config = LoraConfig(
        r=rank,
        lora_alpha=rank,
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    )

    unet.add_adapter(unet_lora_config)

    change_parameters_type(unet)

    return tokenizer, text_encoder, vae, scheduler, unet
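
To put the `rank` setting in perspective, the sketch below counts trainable parameters for a single linear projection with and without LoRA. It assumes the standard LoRA factorization, in which a `d_in x d_out` weight stays frozen and two matrices of shapes `d_in x r` and `r x d_out` are trained instead. The layer widths used here (320 and 1280) are illustrative assumptions, not values read from the notebook, and both helper names are hypothetical.

```python
def full_params(d_in, d_out):
    """Parameters in a dense d_in x d_out weight matrix (bias ignored)."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors that LoRA trains."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    rank = 120  # value of TrainingConfig.rank in this notebook
    for width in (320, 1280):  # assumed square projection widths
        full = full_params(width, width)
        lora = lora_params(width, width, rank)
        print(f"width={width}: full={full}, lora={lora}, ratio={lora / full:.2f}")
```

Under these assumptions, a rank of 120 on a 320-wide layer trains 75% as many parameters as the layer itself, so a smaller rank may be worth trying if memory or training time is tight.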

In [None]:
# simplified function from tutorial
import torch
import torch.nn.functional as F
from tqdm import tqdm

def train(
    tokenizer, 
    text_encoder, 
    vae, 
    scheduler, 
    unet,
    train_dataset, 
    train_dataloader,
    config,
    device
):        
    lora_params = get_lora_parameters(unet)

    text_encoder.to(device).eval()
    vae.to(device).eval()
    unet.to(device).train()

    optimizer = torch.optim.AdamW(lora_params, lr=config.lr)
    scaler = torch.cuda.amp.GradScaler()
    losses = []

    for epoch in range(config.num_epochs):
        epoch_loss = 0.0
        progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch + 1}/{config.num_epochs}", unit="batch")
        
        for batch in progress_bar:
            bs = batch["input_ids"].size(0)

            with torch.autocast(device_type="cuda", dtype=torch.float16):
                with torch.no_grad():
                    encoder_hidden_states = text_encoder(batch["input_ids"].to(device), return_dict=False)[0]
    
                timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (bs,)).long().to(device)
    
                with torch.no_grad():
                    latents = vae.encode(batch["pixel_values"].to(device)).latent_dist.sample()
                    latents = latents * vae.config.scaling_factor

                noise = torch.randn_like(latents)
                noisy_latents = scheduler.add_noise(latents, noise, timesteps)
                noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states, return_dict=False)[0]
    
                loss = F.mse_loss(noise_pred, noise, reduction="mean")

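            scaler.scale(loss).backward()

            if config.max_grad_norm > 0:
                # unscale first so clipping sees the true gradient magnitudes
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(lora_params, config.max_grad_norm)

            scaler.step(optimizer)
            scaler.update()
            # reset gradients so they do not accumulate across batches
            optimizer.zero_grad(set_to_none=True)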

            loss_value = loss.item()
            epoch_loss += loss_value
            
            progress_bar.set_postfix({"Loss": f"{loss_value:.4f}"})

        avg_epoch_loss = epoch_loss / len(train_dataloader)
        # store the per-batch average so the plotted curve matches the printed value
        losses.append(avg_epoch_loss)
        print(f"Epoch {epoch + 1}/{config.num_epochs} - Average Loss: {avg_epoch_loss:.4f}")

    return losses
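
The call to `scheduler.add_noise` in the loop above performs the forward diffusion step, `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise`, and the UNet is trained to predict `noise` from `x_t`. The sketch below reproduces that formula on scalars in plain Python. It assumes a linear beta schedule from `1e-4` to `0.02` over 1000 steps (the defaults of the original DDPM setup). The schedule stored in the pretrained scheduler config may differ, and all helper names are hypothetical.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, see the note above)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar_t):
    """Forward diffusion: blend the clean value with Gaussian noise."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * noise


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_betas())
    rng = random.Random(42)
    x0, eps = 0.8, rng.gauss(0.0, 1.0)
    # Early timesteps stay close to x0; late timesteps are almost pure noise.
    for t in (0, 250, 999):
        print(t, round(add_noise(x0, eps, alpha_bars[t]), 4))
```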

In [None]:
# free GPU memory held by a previous run, if any
import gc

try:
    del models, pipe
except NameError:
    # nothing to delete on the first run
    pass
gc.collect()
torch.cuda.empty_cache()

In [None]:
models = setup_models_for_training(config.pretrained_model_name, rank=config.rank)
train_dataset = IkeaDataset(
    Path(config.data_dir), 
    config.csv_file, 
    models[0], 
    config.image_path_column, 
    config.description_cloumn, 
    config.image_size
)
train_dataloader = DataLoader(
    train_dataset, 
    batch_size=config.batch_size, 
    shuffle=True
)
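
With the default `drop_last=False`, the loader yields `ceil(N / batch_size)` batches per epoch, so the total number of optimizer steps follows directly from the configuration. The sketch below does that arithmetic. The dataset size of 300 images is a made-up placeholder, since the notebook does not state it, and the helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def total_steps(num_samples, batch_size, num_epochs):
    """Optimizer steps over the whole run (one step per batch)."""
    return steps_per_epoch(num_samples, batch_size) * num_epochs


if __name__ == "__main__":
    num_samples = 300  # hypothetical dataset size, not taken from the notebook
    print(steps_per_epoch(num_samples, batch_size=4))
    print(total_steps(num_samples, batch_size=4, num_epochs=30))
```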

In [None]:
losses = train(
    *models,
    train_dataset, 
    train_dataloader,
    config,
    device
)

In [None]:
plt.plot(losses)
plt.show()

In [None]:
unet_lora_state_dict = convert_state_dict_to_diffusers(get_peft_model_state_dict(models[-1]))
StableDiffusionPipeline.save_lora_weights(
    save_directory="/kaggle/working/",
    unet_lora_layers=unet_lora_state_dict,
    safe_serialization=True,
)

## 3. Results

In [None]:
# helper functions
def generate(pipeline, prompt, seed, num_inference_steps):
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipeline(prompt, num_inference_steps=num_inference_steps, generator=generator).images
    return result[0] 

def display_multiple_results(image_pairs, prompts):
    """
    Display pairs of generated images side by side with a title for each image
    and a prompt displayed above each pair.
    """
    titles = ["Pretrained Model", "New Model"]
    for images, prompt in zip(image_pairs, prompts):
        fig, axs = plt.subplots(1, 2, figsize=(12, 6))
        fig.suptitle(prompt, fontsize=16, y=1.05)

        for ax, img, title in zip(axs, images, titles):
            ax.imshow(img)
            ax.set_title(title)
            ax.axis('off')
        
        plt.tight_layout()
        plt.show()
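
`generate` seeds a fresh generator on every call, so the pretrained and fine-tuned pipelines start from the same initial noise, which makes the side-by-side comparison depend on the weights rather than on sampling luck. The sketch below illustrates the idea with Python's standard `random` module as a stand-in for `torch.Generator`. `sample_noise` is a hypothetical helper, not part of the notebook.

```python
import random


def sample_noise(seed, size=4):
    """Draw `size` Gaussian values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    # Identical seeds reproduce the same draws; a different seed does not.
    print(sample_noise(42) == sample_noise(42))
    print(sample_noise(42) == sample_noise(43))
```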

In [None]:
pipe = DiffusionPipeline.from_pretrained(
    config.pretrained_model_name,
    torch_dtype=torch.float16
).to(device)

pipe_new = DiffusionPipeline.from_pretrained(
    config.pretrained_model_name,
    torch_dtype=torch.float16
).to(device)
pipe_new.load_lora_weights("/kaggle/working/pytorch_lora_weights.safetensors")

In [None]:
prompts = [

    # Bedroom
    "Industrial IKEA bedroom with exposed brick walls, a light gray bed frame, and a dark metal headboard. Black metal pendant lighting adds a bold touch to the room’s industrial charm",
    "Boho IKEA bedroom with soft white walls, light wood furniture, and dark woven accents. A dark wood bed frame and metal lamps create an inviting, earthy ambiance",
    "IKEA bedroom with a light teal accent wall, soft white bedding, and dark wood bedside tables. Black metal wall sconces add modern contrast to the calming color scheme",
    "Contemporary IKEA bedroom with light walls and a dark gray upholstered bed, highlighted by black metal side tables and lamps. Soft lighting brings warmth to the modern design",
    "Minimalist IKEA bedroom with light walls, white bedding, and a dark wooden headboard. Black metal frames around mirrors and picture frames add structure to the minimalist space",
    
    # Kitchen
    "IKEA kitchen with pale blue cabinets, light countertops, and a dark wood kitchen island. Black metal handles and a metal range hood add modern industrial accents",
    "Rustic IKEA kitchen with white walls and light wood shelving, contrasted by dark cabinetry and black metal stools. Exposed metal accents add a farmhouse feel",
    "Modern IKEA kitchen with light gray walls, white countertops, and dark wood cabinets. Black metal shelving and pendant lights add a sleek industrial vibe",
    "IKEA kitchen with light wood cabinetry and a dark gray backsplash. Black metal hardware and open shelves bring an edgy contrast to the warm wood tones",
    "Minimalist IKEA kitchen with white walls, light wood countertops, and dark green cabinets. Black metal light fixtures complete the modern, airy look",
    
    # Living Room
    "Coastal IKEA living room with light blue walls, a beige sofa, and a dark wood coffee table. Black metal decor and lighting add contrast to the beachy vibe",
    "Modern IKEA living room with light gray walls and a white sectional, contrasted by a dark wood TV stand and black metal accents. The minimal decor adds a serene feel",
    "IKEA living room with light, neutral walls and a beige sofa, paired with dark wood end tables and black metal lighting. Green plants add a pop of color to the calm space",
    "Contemporary IKEA living room with white walls, a light gray rug, and a dark wood coffee table. Black metal decor enhances the modern, minimalist feel",
    "Rustic IKEA living room with light beige walls, a cream sofa, and dark wood shelves. Black metal frames and warm lighting add an industrial touch to the cozy design",
    
    # Dining Room
    "Scandinavian IKEA dining room with light wood furniture, white walls, and a dark wood dining table. Black metal pendant lights create a bold contrast in the airy room",
    "Minimalist IKEA dining room with pale gray walls, a light wood table, and dark wood chairs. Black metal light fixtures bring structure to the soft space",
    "Rustic IKEA dining room with white walls, a light wood dining set, and dark metal chairs. The natural textures add warmth to the bright space",
    "Modern IKEA dining room with dark gray walls, a light wood table, and black metal accents. Sleek, minimalist decor enhances the modern aesthetic",
    "Cozy IKEA dining room with light beige walls, a white table, and dark wood chairs. Black metal pendant lighting adds warmth to the inviting space",
    
    # Office
    "Scandinavian IKEA office with light wood furniture, white walls, and a dark wood desk. Black metal shelving adds a modern, functional element to the minimalist design",
    "Industrial IKEA office with exposed brick walls, a light wood desk, and black metal accents. Dark wood shelving enhances the urban, professional feel",
    "Modern IKEA office with white walls, a light wood desk, and a dark gray accent wall. Black metal decor adds contrast to the sleek, minimalist space",
    "Cozy IKEA office with light walls, a white desk, and a dark wood bookshelf. Black metal frames and light fixtures bring structure to the comfortable work area",
    "Minimalist IKEA office with light gray walls, a white desk, and dark wood accents. Black metal lighting fixtures complete the calm, focused workspace",

]

image_pairs = []
for prompt in prompts:
    images = []
    images.append(generate(pipe, prompt, config.seed, config.inference_steps))
    images.append(generate(pipe_new, prompt, config.seed, config.inference_steps))
    image_pairs.append(images)

display_multiple_results(image_pairs, prompts)


## 4. Results
### Key Observations

1. **Resemblance to IKEA Furniture**  
   - Generated objects and furniture exhibit visual similarities to IKEA's design style, reflecting the model's ability to learn key furniture characteristics from the dataset.

2. **Perspective and Composition**  
   - The model successfully replicates the perspective and framing typical of IKEA's interior photography, indicating its grasp of spatial and compositional features.

3. **Performance by Room Type**  
   - **Living Rooms and Kitchens**: These room types are the most refined, often generating aesthetically pleasing and cohesive designs resembling catalog-quality images.  
   - **Bedrooms, Offices, and Dining Rooms**: The model's output for these room types is less consistent, with occasional artifacts or unrealistic elements.

### Areas for Improvement

1. **Extended Training Duration**  
   - Increasing training time could improve the model's ability to capture finer details and produce more coherent results across all room types.

2. **Dataset Quality and Size**  
   - Expanding the dataset to include a larger variety of high-quality interior images would enhance the model's ability to generalize and improve design diversity.

3. **Enhanced Metadata**  
   - Improving the quality and specificity of dataset descriptions could help the model generate designs that are more accurate and contextually appropriate.

### Conclusion

The project highlights the potential of diffusion models for generating beautiful interior designs. While the results are promising, further training and dataset refinements could significantly improve the model, especially for the more challenging room types (bedrooms, offices, and dining rooms).


## 5. Resources
- https://www.kaggle.com/code/ostamand/stable-diffusion-1-5-lora-fine-tuning
- https://github.com/IvonaTau/ikea/tree/master/images/room_scenes
- https://www.kaggle.com/datasets/liaaana/ikea-interiors/data
- https://wandb.ai/capecape/train_sd/reports/How-To-Train-a-Conditional-Diffusion-Model-From-Scratch--VmlldzoyNzIzNTQ1
- Code from Lab 8 of the F24-PMLDL course