Results on Kitti dataset are not reproducible #31
That is not enough information for me. What does your evaluation script look like?
Of course. This is the script I used to get gt_depth from the velodyne data, and this is the script I used to evaluate.
@RuslanOm Judging from the code, it seems that you are trying to reproduce "DPT-Large" in Table 1. If this is not the case, please tell me which model exactly you are using and which number in the paper you are trying to reproduce. If that is the case, then your code is missing a proper alignment step that accounts for both scale and shift in the appropriate way. The alignment you do in your code is different from ours and also doesn't account for the shift. See isl-org/MiDaS#29 for a link to evaluation code for NYUv2.

@cheniynan: Can you post a screenshot of the results and/or the code showing how you are using the model?
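A minimal sketch of what a scale-and-shift alignment step could look like, assuming a least-squares fit in inverse-depth space (`align_scale_shift` is a hypothetical helper for illustration, not the authors' actual evaluation code):

```python
import numpy as np

def align_scale_shift(pred, gt_inv_depth, mask):
    """Least-squares fit of scale s and shift t so that
    s * pred + t approximates gt_inv_depth on the masked pixels."""
    p = pred[mask].astype(np.float64)
    g = gt_inv_depth[mask].astype(np.float64)
    A = np.stack([p, np.ones_like(p)], axis=1)  # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s, t
```

A scale-only median alignment, as in the script above, cannot correct the shift term, which is why the two procedures give different numbers.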
Thanks for your reply. Sorry, I can't show you the code, but I can show you the result. Our work uses depth maps to improve the performance of monocular 3D detection. The result using depth maps extracted by DORN is as follows:

```python
import os
import os.path as osp

import cv2
import numpy as np
from tqdm.auto import tqdm


def compute_errors(gt, pred):
    """Computation of error metrics between predicted and ground truth depths."""
    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = (gt - pred) ** 2
    rmse = np.sqrt(rmse.mean())

    rmse_log = (np.log(gt) - np.log(pred)) ** 2
    rmse_log = np.sqrt(rmse_log.mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)

    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3


def evaluate(opt):
    """Evaluates a pretrained model using a specified test set."""
    MIN_DEPTH = 1e-3
    MAX_DEPTH = 80

    base_path = '/home/chenyinan/Projects/github/DPT/data/kitti/training'
    gt_path = osp.join(base_path, 'depth_map')
    pred_path = osp.join(base_path, 'depth_2_dpt_crop')
    preds = os.listdir(pred_path)

    ratios = []
    errors = []
    for i in tqdm(range(len(preds))):
        # KITTI convention: 16-bit PNGs store depth in metres * 256
        gt_depth = cv2.imread(osp.join(gt_path, preds[i]), cv2.IMREAD_UNCHANGED).astype('float') / 256
        pred_depth = cv2.imread(osp.join(pred_path, preds[i]), cv2.IMREAD_UNCHANGED).astype('float') / 256

        mask = gt_depth > 0
        if opt == "gt":
            # scale-only alignment: median ratio against the ground truth
            ratio = np.median(gt_depth[mask] / pred_depth[mask])
        else:
            ratio = 1

        pred_depth = pred_depth[mask]
        gt_depth = gt_depth[mask]
        pred_depth *= ratio
        ratios.append(ratio)

        pred_depth[pred_depth < MIN_DEPTH] = MIN_DEPTH
        pred_depth[pred_depth > MAX_DEPTH] = MAX_DEPTH

        if len(gt_depth) != 0:
            errors.append(compute_errors(gt_depth, pred_depth))

    ratios = np.array(ratios)
    med = np.median(ratios)
    print(" Scaling ratios | med: {:0.3f} | std: {:0.3f}".format(med, np.std(ratios / med)))

    mean_errors = np.array(errors).mean(0)
    print("\n " + ("{:>8} | " * 7).format("abs_rel", "sq_rel", "rmse", "rmse_log", "a1", "a2", "a3"))
    print(("&{: 8.3f} " * 7).format(*mean_errors.tolist()) + "\\\\")
    print("\n-> Done!")


if __name__ == '__main__':
    evaluate('gt')
```
Thanks! One more question: if I want to get absolute depth from the model's prediction (for example, using the camera height, like the DGC module from this paper), should I also calculate both scale and shift, or just a scale from some other method (like the paper I mentioned)?
@RuslanOm I haven't read this paper, but the output p of the relative depth models is related to the true depth by d = 1 / (s*p + t), where s is the scale and t is the shift. You need to determine both if you want to recover absolute metric depth, and they vary per image.

@cheniynan The depth evaluation numbers are quite different from what we got (both with our method and with DORN). Are you using a specific subset of KITTI for evaluation? Could you perhaps post only the part of the code that shows the image transform, the pass through DPT, and any post-processing, so that I can try to reproduce your evaluation? Without that it is pretty much impossible to debug your issue.
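To make the relation d = 1 / (s*p + t) concrete: given two pixels with known metric depth (e.g. from camera geometry), s and t follow from a small linear system. The values below are made up for illustration; this is not code from the repository.

```python
import numpy as np

# Hypothetical relative predictions p and known metric depths d (metres)
p1, d1 = 0.8, 5.0
p2, d2 = 0.2, 20.0

# d = 1 / (s*p + t)  =>  1/d = s*p + t, linear in (s, t)
A = np.array([[p1, 1.0], [p2, 1.0]])
b = np.array([1.0 / d1, 1.0 / d2])
s, t = np.linalg.solve(A, b)

def to_metric(p):
    """Map a relative prediction to metric depth using the fitted s, t."""
    return 1.0 / (s * p + t)
```

With a scale-only correction (t forced to 0), the same two constraints generally cannot both be satisfied, which illustrates why both parameters are needed per image.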
The dataset that I use is the KITTI 3D object detection dataset, which contains 7481 images for training & validation and 7518 images for testing. The training set is divided into 3712 images for training and 3769 images for validation. The workflow of our model is as follows:
The KITTI model doesn't have a missing scale or shift, so this should be fine. Let me ask you a couple of specific questions:
```python
def write_depth(path, depth, bits=1):
    """Write depth map to pfm and png file.

    Args:
        path (str): filepath without extension
        depth (array): depth
    """
    # write_pfm(path + ".pfm", depth.astype(np.float32))

    depth_min = depth.min()
    depth_max = depth.max()
    max_val = (2 ** (8 * bits)) - 1

    if depth_max - depth_min > np.finfo("float").eps:
        out = max_val * (depth - depth_min) / (depth_max - depth_min)
    else:
        out = np.zeros(depth.shape, dtype=depth.dtype)

    # note: this overrides the normalization above and writes the raw depth values
    out = depth

    if bits == 1:
        cv2.imwrite(path + ".png", out.astype("uint8"))
    elif bits == 2:
        cv2.imwrite(path + ".png", out.astype("uint16"))

    return
```
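For reference, the evaluation script above assumes the KITTI 16-bit PNG convention (stored value = depth in metres * 256). A small sketch of that round trip, with made-up depth values:

```python
import numpy as np

# KITTI-style 16-bit depth PNG: stored value = depth_m * 256
depth = np.array([[1.5, 20.0], [0.01, 79.9]])  # metres (illustrative values)

encoded = (depth * 256.0).astype(np.uint16)   # what gets written to the PNG
decoded = encoded.astype(np.float64) / 256.0  # what the eval script reads back

# depths survive the round trip up to 1/256 m quantization error
assert np.all(np.abs(decoded - depth) <= 1.0 / 256.0)
```

If `write_depth` instead normalizes to [0, 1] (or to [0, max_val]), the absolute scale is lost and the decoded values no longer follow this convention.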
I evaluated the accuracy on the KITTI depth prediction dataset and got the same results as your paper. But the accuracy of DPT on the KITTI object detection dataset is worse than DORN's, so maybe I should find another model to get better depth maps for object detection. Anyway, thanks for your great work!
@AlexeyAB Thanks! Can you please explain why we need to use these crops for KITTI (or other crops)?
@cheniynan Seems like you are normalizing the output to [0, 1]. Not sure if this influences your results, since you seem to be able to reproduce our numbers on the Eigen split, but we've added a flag "--absolute_depth" that should write results in the format that you need.

@RuslanOm Crops are usually necessary to align the input image size to the requirements of the network. For example, DPT requires the input dimensions to be a multiple of 32. For KITTI and NYUv2 we use crops that have been standard for evaluating models on these datasets in prior work, to ensure that the results are comparable.
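A minimal sketch of the multiple-of-32 constraint, assuming a simple center crop (`crop_to_multiple` is a hypothetical helper; DPT's own transforms handle resizing differently):

```python
import numpy as np

def crop_to_multiple(img, multiple=32):
    """Center-crop height and width down to the nearest multiple."""
    h, w = img.shape[:2]
    nh, nw = h - h % multiple, w - w % multiple
    top, left = (h - nh) // 2, (w - nw) // 2
    return img[top:top + nh, left:left + nw]

img = np.zeros((375, 1242, 3))  # a typical KITTI image size
out = crop_to_multiple(img)     # -> height 352, width 1216
```

The standard evaluation crops (e.g. the Garg crop on KITTI) additionally restrict which pixels enter the metrics, which is a separate concern from the network's input-size requirement.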
We need to use … In the PR I show how to match the results from our paper (only for NYU is the result slightly better than in the paper: AbsRel=0.109, while the paper reports AbsRel=0.110): #32

We use the same evaluation approach as in the papers by Ravi Garg et al., David Eigen et al., and Jin Han Lee et al.: https://github.com/cogaplex-bts/bts/tree/master/pytorch#bts

- Ravi Garg et al., Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue: https://arxiv.org/pdf/1603.04992.pdf
- David Eigen et al., Depth Map Prediction from a Single Image using a Multi-Scale Deep Network: https://papers.nips.cc/paper/2014/file/7bccfde7714a1ebadf06c5f4cea752c1-Paper.pdf
@cheniynan Try to use the new … Or use this command: https://github.com/intel-isl/DPT/blob/b71c02b3cda1823e703cad2dae8b005b13e90590/util/io.py#L171-L198
@ranftlr @AlexeyAB I'm sorry, but I tried to apply the same manipulations for …
@AlexeyAB OK, thanks!
@RuslanOm Here is a gist that reproduces the evaluation: https://gist.github.com/ranftlr/45f4c7ddeb1bbb88d606bc600cab6c8d See isl-org/MiDaS#18 for the subset of 161 images that we use for evaluation. Please pull my latest changes to get exactly this result, as I found a small discrepancy in the interpolation module which led to a slightly higher error (8.47% instead of 8.46%).
Hi! Thanks again for your work!

Recently, I tried to check the accuracy of the pre-trained models on KITTI (Eigen split) and found that it differs from the paper's results.

On this screenshot you can see the basic metrics used for depth prediction on the Eigen split (I took the files for the split from this repo). For ground truth I used raw data from velodyne (with a loader like this).

I hope you can explain these results. Thanks!