
Saved depth seems wrong #8

Closed
cnut1648 opened this issue Aug 1, 2023 · 2 comments
cnut1648 commented Aug 1, 2023

Hello, thanks for the great work!

I am running your model on my custom dataset. However, it seems that the depth saved from the NYUv2 model is wrong. I think this might be due to my misuse of your model's output. My script looks like this:

```python
import os
import shutil
import sys, json
from pathlib import Path

import cv2
import numpy as np
import torch
import torchvision.transforms.functional as TF
from PIL import Image
from tqdm import tqdm

# I cloned your repo into a place where I can directly import it
sys.path.insert(0, str(Path(__file__).parent.resolve() / "idisc"))
from idisc.models.idisc import IDisc
from idisc.utils import (DICT_METRICS_DEPTH, DICT_METRICS_NORMALS,
                         RunningMetric, validate)

model = IDisc.build(json.load(open("idisc/configs/nyu/nyu_swinl.json")))
model.load_pretrained("idisc/nyu_swinlarge.pt")
model = model.to("cuda")
model.eval()

# read in the image
image = np.asarray(Image.open(image_path))
image = TF.normalize(TF.to_tensor(image), **{"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]})
image = image.unsqueeze(0).to("cuda")

with torch.inference_mode():
    depth, *_ = model(image)

TF.to_pil_image(depth[0].cpu()).save(save_path)
```

I am using the Swin-Large model. The `image_path` points to the attached 224x224 image (I uploaded the exact image in case you need it to debug). On this image, DPT generates a reasonable-looking depth map, but the output of iDisc Swin-Large looks wrong.

[attached: input image, DPT depth map, iDisc depth map]

I believe I made a mistake somewhere. I wonder if you can help me debug this.

Thanks!

@lpiccinelli-eth (Collaborator) commented
Thank you for using our model.
I believe the problem comes from the saving step: the output of our model is metric depth, so it has floating-point values in [0.0, +inf), which can cause PIL to save the image in the wrong format.
I suggest you first convert the scalar float values to RGB with a colormap transformation and then save the result. You can look into idisc/utils/visualization.py, specifically the colorize function. It accepts a 2D input (e.g., an (H, W)-shaped numpy array), min and max values (for NYU these are 0.01 and 10.0 meters), and a colormap name. For instance, "magma" is a good choice since it is a perceptually uniform, sequential colormap and does not introduce spurious contrasts.
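Since the repo's colorize function isn't reproduced in this thread, here is a minimal sketch of the same idea using matplotlib's colormap API. The helper name colorize_depth and the use of matplotlib are my own assumptions for illustration, not the repo's actual implementation:

```python
import numpy as np
import matplotlib.pyplot as plt


def colorize_depth(depth, vmin=0.01, vmax=10.0, cmap_name="magma"):
    """Map an (H, W) float array of metric depth to an (H, W, 3) uint8 RGB image."""
    # Clip to the dataset's depth range (for NYUv2: 0.01 m to 10.0 m),
    # then normalize to [0, 1] before applying the colormap.
    depth = np.clip(depth, vmin, vmax)
    norm = (depth - vmin) / (vmax - vmin)
    rgba = plt.get_cmap(cmap_name)(norm)          # (H, W, 4) floats in [0, 1]
    return (rgba[..., :3] * 255).astype(np.uint8)  # drop alpha, convert to uint8
```

Usage would then be something like `Image.fromarray(colorize_depth(depth_np)).save(save_path)`, where `depth_np` is the model's depth output squeezed to a 2D numpy array (the exact indexing depends on the output tensor's shape).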

One small nitpick: the model was trained with ImageNet normalization statistics, so it would be better to normalize the RGB image with those rather than the defaults you used, i.e., {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}.
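For concreteness, the values below are the widely used ImageNet statistics (the same constants torchvision documents); a numpy sketch of the equivalent per-channel normalization, which with torchvision would be `TF.normalize(TF.to_tensor(image), mean=IMAGENET_MEAN, std=IMAGENET_STD)`:

```python
import numpy as np

# Standard ImageNet normalization statistics (check the repo's training
# config to confirm these are the exact values it used).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_imagenet(image_uint8):
    """Scale an (H, W, 3) uint8 RGB array to [0, 1], then apply
    per-channel ImageNet mean/std normalization."""
    x = image_uint8.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```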

@cnut1648 (Author) commented Aug 8, 2023

Hi @lpiccinelli-eth, this solves it!
Thank you so much for your response and detailed explanation!

@cnut1648 cnut1648 closed this as completed Aug 8, 2023