<a href="https://colab.research.google.com/github/sliscak/notebooks/blob/main/Stable_Diffusion%2BDPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Generate an image with [Stable Diffusion](https://github.com/CompVis/stable-diffusion) from the [Diffusers](https://github.com/huggingface/diffusers) library, then estimate its depth map with [DPT](https://huggingface.co/Intel/dpt-large).

---




### Install requirements

In [None]:
%%capture
!pip install --upgrade diffusers
!pip install --upgrade gradio
!pip install --upgrade transformers
!pip install --upgrade ftfy

In [None]:
import os

import gradio as gr
import numpy as np
import requests
import torch
from diffusers import StableDiffusionPipeline
from google.colab import output
from huggingface_hub import notebook_login
from PIL import Image
from torch import autocast
from transformers import DPTFeatureExtractor, DPTForDepthEstimation

In [None]:
output.enable_custom_widget_manager()

In [None]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-441292ba-df42-0508-914a-f5aad2262f02)


In [None]:
# Log in to Hugging Face so the license-gated model weights can be downloaded
notebook_login()

In [None]:
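# A GPU is assumed; uncomment the next line instead to fall back to CPU when none is available.
# device = 'cuda' if torch.cuda.is_available() else 'cpu'
device = 'cuda'
# Alternative checkpoint, loaded without an auth token:
# pipe = StableDiffusionPipeline.from_pretrained("hakurei/waifu-diffusion", use_auth_token=False)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to(device)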

feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").to(device)

Some weights of DPTForDepthEstimation were not initialized from the model checkpoint at Intel/dpt-large and are newly initialized: ['neck.fusion_stage.layers.0.residual_layer1.convolution1.bias', 'neck.fusion_stage.layers.0.residual_layer1.convolution2.bias', 'neck.fusion_stage.layers.0.residual_layer1.convolution1.weight', 'neck.fusion_stage.layers.0.residual_layer1.convolution2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
torch.cuda.empty_cache()

def get_depth_map(prompt):
    """Generate an image from the prompt and estimate its depth map with DPT."""
    with autocast(device):
        image = pipe(prompt).images[0]
        # Depth estimation below is adapted from https://huggingface.co/Intel/dpt-large ("How to use" section)
        inputs = feature_extractor(images=image, return_tensors="pt").to(device)
        with torch.no_grad():
            outputs = model(**inputs)
            predicted_depth = outputs.predicted_depth
            # Resize the prediction to the image size; PIL reports (width, height), torch expects (height, width)
            prediction = torch.nn.functional.interpolate(
                predicted_depth.unsqueeze(1),
                size=image.size[::-1],
                mode="bicubic",
                align_corners=False,
            )
            # Scale to 0..255 so the depth map can be shown as an 8-bit grayscale image
            depth_array = prediction.squeeze().cpu().numpy()
            formatted = (depth_array * 255 / np.max(depth_array)).astype("uint8")
            depth = Image.fromarray(formatted)
        # End of adapted code
    return image, depth

with gr.Blocks() as demo:
    with gr.Row():
        text = gr.Text()
        image = gr.Image(label='output image')
        depth = gr.Image(label='depth')
    with gr.Row():
        button = gr.Button('send')
        output1 = gr.Text(label='output1')
    button.click(fn=get_depth_map, inputs=text, outputs=[image, depth])


demo.launch(debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://23641.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces: https://huggingface.co/spaces


  0%|          | 0/50 [00:00<?, ?it/s]

(512, 512)
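
The line in `get_depth_map` that computes `formatted` scales the raw depth prediction by its maximum so it fits in an 8-bit grayscale image. Below is a minimal sketch of that step in isolation, so it can be tried without a GPU or the models. The helper name `depth_to_uint8`, the guard for a non-positive maximum, and the clipping of negative values are assumptions added for illustration; they are not part of the notebook or of the DPT model card code.

```python
import numpy as np

def depth_to_uint8(depth):
    """Scale a depth array to 0..255 (hypothetical helper, not in the notebook)."""
    depth = np.asarray(depth, dtype=np.float32)
    peak = float(depth.max())
    if peak <= 0.0:
        # Assumption: a map with no positive values is rendered as all black.
        return np.zeros(depth.shape, dtype=np.uint8)
    # Same scaling as in get_depth_map: divide by the maximum, multiply by 255.
    scaled = depth * 255.0 / peak
    # Assumption: negative values are clipped to 0 instead of wrapping around.
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Tiny stand-in for a model prediction.
sample = np.array([[0.0, 1.0], [2.0, 4.0]])
gray = depth_to_uint8(sample)
print(gray)
```

The result can be passed to `Image.fromarray` exactly as `formatted` is in the notebook.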
