
Saved depth seems wrong #8

Closed
cnut1648 opened this issue Aug 1, 2023 · 2 comments
cnut1648 commented Aug 1, 2023

Hello, thanks for the great work!

I am running your model on my custom dataset. However, it seems that the depth saved from the NYUv2 model is wrong. I think this might be due to my misuse of your model's output. My script looks like this:

```python
import os
import shutil
import sys, json
from pathlib import Path

import cv2
import numpy as np
import torch
import torchvision.transforms.functional as TF
from PIL import Image
from tqdm import tqdm

# I cloned your repo into a place where I can directly import it
sys.path.insert(0, str(Path(__file__).parent.resolve() / "idisc"))
from idisc.models.idisc import IDisc
from idisc.utils import (DICT_METRICS_DEPTH, DICT_METRICS_NORMALS,
                         RunningMetric, validate)

model = IDisc.build(json.load(open("idisc/configs/nyu/nyu_swinl.json")))
model.load_pretrained("idisc/nyu_swinlarge.pt")
model = model.to("cuda")
model.eval()

# read in the image
image = np.asarray(Image.open(image_path))
image = TF.normalize(TF.to_tensor(image), **{"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]})
image = image.unsqueeze(0).to("cuda")

with torch.inference_mode():
    depth, *_ = model(image)

TF.to_pil_image(depth[0].cpu()).save(save_path)
```

I am using the Swin-Large model. The `image_path` points to the attached 224x224 image (I uploaded the exact image in case you need it to debug). On this image, DPT generates a reasonable-looking depth map, but the output of iDisc Swin-Large looks wrong.

[attached: input image, DPT depth map, iDisc depth map]

I believe I made a mistake somewhere. I wonder if you can help me debug this.

Thanks!

@lpiccinelli-eth (Collaborator) commented
Thank you for using our model.
I believe the problem comes from the saving step: the output of our model is metric depth, so it has floating-point values in [0.0, +inf), which can cause PIL to save the image in the wrong format.
I suggest you first convert the scalar float values to RGB with a colormap transformation and then save the result. You can look into idisc/utils/visualization.py, specifically the colorize function. It accepts a 2D input (e.g., an (H, W)-shaped numpy array), min and max values (for NYU these are 0.01 and 10.0 meters), and a colormap name. For instance, "magma" is a good choice since it is a perceptually uniform, sequential colormap and does not introduce spurious contrasts.
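Since the repo's colorize function isn't reproduced in this thread, here is a minimal sketch of the same idea using matplotlib's colormap API. The helper name colorize_depth and the use of matplotlib are my own assumptions for illustration, not the repo's actual implementation:

```python
import numpy as np
import matplotlib.pyplot as plt


def colorize_depth(depth, vmin=0.01, vmax=10.0, cmap_name="magma"):
    """Map an (H, W) float array of metric depth to an (H, W, 3) uint8 RGB image."""
    # Clip to the dataset's depth range (for NYUv2: 0.01 m to 10.0 m),
    # then normalize to [0, 1] before applying the colormap.
    depth = np.clip(depth, vmin, vmax)
    norm = (depth - vmin) / (vmax - vmin)
    rgba = plt.get_cmap(cmap_name)(norm)          # (H, W, 4) floats in [0, 1]
    return (rgba[..., :3] * 255).astype(np.uint8)  # drop alpha, convert to uint8
```

Usage would then be something like `Image.fromarray(colorize_depth(depth_np)).save(save_path)`, where `depth_np` is the model's depth output squeezed to a 2D numpy array (the exact indexing depends on the output tensor's shape).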

One small nitpick: the model was trained with ImageNet normalization statistics, so it would be better to normalize the RGB image with those rather than the defaults you used, i.e., {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}.
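For concreteness, the values below are the widely used ImageNet statistics (the same constants torchvision documents); a numpy sketch of the equivalent per-channel normalization, which with torchvision would be `TF.normalize(TF.to_tensor(image), mean=IMAGENET_MEAN, std=IMAGENET_STD)`:

```python
import numpy as np

# Standard ImageNet normalization statistics (check the repo's training
# config to confirm these are the exact values it used).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_imagenet(image_uint8):
    """Scale an (H, W, 3) uint8 RGB array to [0, 1], then apply
    per-channel ImageNet mean/std normalization."""
    x = image_uint8.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```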

@cnut1648 (Author) commented Aug 8, 2023

Hi @lpiccinelli-eth, this solves it!
Thank you so much for your response and detailed explanation!

@cnut1648 cnut1648 closed this as completed Aug 8, 2023