The code for depth estimation using the DPT model (dpt-hybrid-midas)

Cell 1: Install required libraries (if not already installed)


In [None]:
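# Install the required libraries if they are not already available.
!pip install transformers   # Hugging Face Transformers (provides the pipeline API)
!pip install torch          # PyTorch, used for tensor operations
!pip install torchvision    # Image utilities for PyTorch
!pip install pillow numpy   # Imported directly below for image loading and array math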


Cell 2: Suppress warning messages

In [None]:
# Suppress warning messages to keep the output clean
from transformers.utils import logging
logging.set_verbosity_error()


Cell 3: Import libraries and initialize depth estimation model

In [None]:
# Import necessary libraries for depth estimation and image processing
from transformers import pipeline
from PIL import Image
import torch
import numpy as np

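# Initialize the depth estimation pipeline with the Intel DPT Hybrid MiDaS model.
# The weights are loaded from a local directory here; the Hub ID "Intel/dpt-hybrid-midas" can be used instead.
depth_estimator = pipeline(task="depth-estimation", model="./models/Intel/dpt-hybrid-midas")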


Cell 4: Load and preprocess the input image


In [None]:
# Load the input image with PIL and resize it to a fixed working size.
# PIL's resize takes (width, height). The pipeline's image processor handles
# any model-specific resizing, so this step only fixes the size used for display.
raw_image = Image.open('gradio_tamagochi_vienna.png')  # Load the image file
raw_image = raw_image.resize((806, 621))  # Resize to 806 px wide by 621 px high


Cell 5: Perform depth estimation


In [None]:
# Perform depth estimation using the pre-trained DPT model.
# The model will output a depth map prediction for the input image.
output = depth_estimator(raw_image)


Cell 6: Check the size of the predicted depth output

In [None]:
# Inspect the shape of the predicted depth output.
# The 'predicted_depth' tensor holds the raw depth map, expected here as (batch, height, width).
print(output["predicted_depth"].shape)  # Shape of the raw depth map

# unsqueeze(1) inserts a channel dimension, giving (batch, 1, height, width).
# Bicubic interpolation in the next cell requires this 4D layout.
print(output["predicted_depth"].unsqueeze(1).shape)  # Shape after adding the channel dimension


Cell 7: Resize the predicted depth map to match the input image

In [None]:
# Resize the predicted depth map to the input image size using bicubic interpolation.
# PIL reports size as (width, height), so raw_image.size[::-1] gives the (height, width) order that interpolate expects.
prediction = torch.nn.functional.interpolate(
    output["predicted_depth"].unsqueeze(1),  # Add a channel dimension for interpolation
    size=raw_image.size[::-1],  # Target size as (height, width)
    mode="bicubic",  # Bicubic interpolation gives smoother results than nearest or bilinear
    align_corners=False  # Do not force the corner pixels of input and output to align
)

# Print the shape of the resized depth map to ensure it matches the input image size.
print(prediction.shape)
print(raw_image.size[::-1])


Cell 8: Normalize the depth map to display as an image

In [None]:
# Scale the predicted depth map so that it can be visualized as an 8-bit grayscale image.
# Dividing by the maximum maps the largest value to 255 (this assumes non-negative depth values).
# A separate name is used so the pipeline result in 'output' is not overwritten.
depth_array = prediction.squeeze().numpy()  # Drop the batch and channel dimensions; convert to NumPy
formatted = (depth_array * 255 / np.max(depth_array)).astype("uint8")  # Scale to [0, 255]

# Convert the scaled depth map into a PIL image for visualization.
depth = Image.fromarray(formatted)
depth  # Display the depth image


Explanation:
Model Initialization: We initialize the depth estimation pipeline with the DPT model (dpt-hybrid-midas). This model predicts depth maps for input images.

Image Preprocessing: The input image is loaded with PIL and resized to a fixed working size of 806 x 621 pixels (width x height). Model-specific preprocessing is left to the pipeline.
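
One detail worth checking at this step is axis order: PIL works in (width, height), while arrays and tensors use (height, width). The following is a minimal sketch that uses a blank stand-in image instead of the notebook's PNG file (an assumption made so the snippet runs without that file):

```python
import numpy as np
from PIL import Image

# Stand-in for the notebook's input image; a blank image avoids needing the PNG file.
image = Image.new("RGB", (1024, 768))

# PIL's resize takes (width, height).
resized = image.resize((806, 621))
pil_size = resized.size                    # (width, height)
array_shape = np.asarray(resized).shape    # (height, width, channels)

print(pil_size)
print(array_shape)
print(pil_size[::-1] == array_shape[:2])
```

This is why the later resizing step reverses `raw_image.size` before passing it to PyTorch.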

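Depth Estimation: The model generates a predicted depth map for the input image. It is returned as a tensor under the 'predicted_depth' key, with one depth value per pixel at the model's output resolution. The values are relative rather than metric distances.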

Resizing the Output: The predicted depth map is resized to match the dimensions of the original input image using bicubic interpolation.
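
The resizing step can be tried in isolation on a dummy tensor. This is a sketch under assumptions: the 384 x 384 shape and the random values below are stand-ins for the model's `predicted_depth` and are not taken from the notebook's output.

```python
import torch

# Hypothetical stand-in for output["predicted_depth"]: (batch, height, width).
predicted_depth = torch.rand(1, 384, 384)

# PIL-style size (width, height), as reported by raw_image.size.
image_size = (806, 621)

# Bicubic interpolation needs a 4D tensor: (batch, channels, height, width).
with_channel = predicted_depth.unsqueeze(1)

resized = torch.nn.functional.interpolate(
    with_channel,
    size=image_size[::-1],   # reversed to (height, width)
    mode="bicubic",
    align_corners=False,
)

print(with_channel.shape)
print(resized.shape)
```

Bicubic interpolation can overshoot slightly, so the resized values may fall a little outside the range of the input.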

Normalization: The depth map is divided by its maximum, multiplied by 255, and cast to 8-bit integers, so that its values can be displayed as pixel intensities between 0 and 255.
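
A common variant of this step is min-max scaling, which also subtracts the minimum so that the full 0 to 255 range is used. The helper below (`to_uint8`, a hypothetical name) is a sketch of that variant rather than the notebook's method; it also guards against a constant depth map.

```python
import numpy as np

def to_uint8(depth_map: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map to 8-bit pixel intensities."""
    depth_map = depth_map.astype(np.float64)
    lo, hi = depth_map.min(), depth_map.max()
    if hi == lo:
        # A constant map has no contrast; return zeros instead of dividing by zero.
        return np.zeros(depth_map.shape, dtype=np.uint8)
    scaled = (depth_map - lo) / (hi - lo)   # values now lie in [0, 1]
    return (scaled * 255).round().astype(np.uint8)

# Small illustrative input (made-up values).
example = np.array([[2.0, 4.0], [6.0, 12.0]])
pixels = to_uint8(example)
print(pixels)
```

Subtracting the minimum increases contrast when the smallest depth value is well above zero, at the cost of no longer preserving ratios between depth values.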

Display Depth Map: The normalized depth map is converted into a PIL image and can be visualized.
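
Outside a notebook the image is not rendered automatically, so it may be more convenient to save it. The sketch below assumes a synthetic gradient in place of the real depth map and writes to an in-memory buffer; a file path such as "depth.png" (a hypothetical name) would work the same way.

```python
import io

import numpy as np
from PIL import Image

# Hypothetical stand-in for the normalized depth map: a horizontal gradient, (height, width).
formatted = np.tile(np.linspace(0, 255, 806).astype("uint8"), (621, 1))

# A 2D uint8 array becomes a single-channel grayscale ("L") image.
depth = Image.fromarray(formatted)

# Save as PNG into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
depth.save(buffer, format="PNG")

print(depth.mode, depth.size)
```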

