
Question about the results on the Aachen dataset in the original paper #52

Closed
ez4lionky opened this issue Dec 29, 2020 · 9 comments

@ez4lionky

Hi, thanks for your great work! The paper is very solid and I think it will have a profound influence.
But I have some questions (uncertain details) about the visual localization results in the original paper.

  1. Are the results on the Aachen dataset obtained with a model trained on MegaDepth, or with a model trained specifically for Aachen?
  2. Is the localization pipeline used in the original paper similar to the one from the 2019 local feature challenge?
  3. I reproduced similar results using the Hierarchical Localization pipeline. However, the results in the original paper are marginally lower than the CVPR 2020 workshop results; is this caused by a difference in the pipeline or by something else?
@sarlinpe
Contributor

Thanks for your questions.

  1. The results on the Aachen Day-Night dataset are obtained with the model trained on MegaDepth.
  2. Yes, the results are reported using the pipeline provided by the benchmark. Feature extraction and matching can easily be performed with our toolbox hloc (see the sketch below). To ensure fairness, I recommend using the official pipeline for the steps of 3D reconstruction and absolute pose estimation.
  3. Our original SuperGlue paper reports results for the local feature challenge (i.e. image pairs are provided) on the Aachen v1.0 dataset. It corresponds to this entry. The results differ because the ground truth poses and evaluation thresholds were updated in July, but we did not update the paper. The CVPR 2020 workshop leaderboard, however, reports the previous results, from before the update, which are consistent with the paper. The ECCV 2020 workshop leaderboard reports results for Aachen v1.1 (which has more queries), where our entry is here.

The entry which you link (here) corresponds to the full localization task, which is different from the local feature challenge since it also involves retrieval. The numbers are higher because we use an improved pipeline (hloc) for reconstruction and absolute pose estimation.
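For context, here is a minimal sketch of running SuperPoint extraction and SuperGlue matching with hloc; the dataset paths and the pair list are placeholders, and the configuration names assume a recent hloc version.

```python
from pathlib import Path

from hloc import extract_features, match_features

# Placeholder paths: adjust to your Aachen layout and pair list.
images = Path('datasets/aachen/images/images_upright/')
outputs = Path('outputs/aachen/')
pairs = Path('pairs/aachen/pairs-db-covis20.txt')

# Built-in configurations (assumed names from recent hloc versions).
feature_conf = extract_features.confs['superpoint_aachen']
matcher_conf = match_features.confs['superglue']

# Extract SuperPoint features for all images, then match the listed
# pairs with SuperGlue; both steps write HDF5 files into `outputs`.
feature_path = extract_features.main(feature_conf, images, outputs)
match_path = match_features.main(matcher_conf, pairs, feature_conf['output'], outputs)
```

The 3D reconstruction and absolute pose estimation would then follow the official benchmark pipeline, as recommended above.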

@ez4lionky
Author


Thanks for your reply! It really helps me resolve my confusion. :)

@sarlinpe
Contributor

Great, please close this issue if your problem is solved. Thanks!

@ez4lionky
Author

ez4lionky commented Dec 31, 2020

Well, now I'm trying to reproduce the model on MegaDepth, and I have other questions about the image pair generation and the ground truth correspondences:

  1. Is the overlap score calculated as len(∩(S1, S2)) / len(∪(S1, S2)), where S1 and S2 are the sets of points observed by the two images of a pair in the reconstructed SfM model?
  2. Is the relative depth error threshold calculated from estimated_depth / annotated_depth, or from something else? Here the estimated depth would be computed from the 3D coordinates and the relative pose (R, t), and the annotated depth would be the one rendered by MVS.

Could you give me more details? Thanks!

@ez4lionky ez4lionky reopened this Dec 31, 2020
@sarlinpe
Contributor

sarlinpe commented Jan 2, 2021

  1. The overlap is defined asymmetrically by D2-Net here as len(∩(S1, S2)) / len(S1) and len(∩(S1, S2)) / len(S2) (see the sketch after this list). For image pairs that zoom in/out, only one direction might have a sufficiently large overlap to pass the threshold test, but this does not matter much as SuperGlue is equivariant to the permutation of the pair.

  2. I am not sure I understand this question. We use the dense MVS depth maps that come with MegaDepth to estimate ground-truth correspondences across image pairs. Correct correspondences are those that have a small distance in both views and a small relative depth error when rendered in both views (so as to discard occluded keypoints).
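To make the asymmetric overlap concrete, here is a small illustrative sketch (not the actual training code) that computes both directional ratios from the sets of 3D point IDs observed by each image in the SfM model; the selection thresholds and the one-direction rule are assumptions.

```python
def overlap_ratios(obs1, obs2):
    """Asymmetric D2-Net-style overlap between two images.

    obs1, obs2: sets of 3D point IDs observed by each image
    in the SfM reconstruction.
    """
    shared = len(obs1 & obs2)
    return shared / len(obs1), shared / len(obs2)


def select_pair(obs1, obs2, min_overlap=0.1, max_overlap=0.7):
    # Illustrative selection: keep the pair if at least one direction
    # falls in the overlap range. The exact thresholds and whether one
    # or both directions must pass are assumptions, not the paper's values.
    r12, r21 = overlap_ratios(obs1, obs2)
    return any(min_overlap <= r <= max_overlap for r in (r12, r21))
```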

@ez4lionky
Author

ez4lionky commented Jan 2, 2021

  1. Yep, I agree with you: only one direction might have a sufficiently large overlap to pass the threshold test. But in some cases the ratio threshold may lead to a big difference in the proportion of selected pairs, and I want to stay consistent with your implementation :).
  2. At first I thought the relative depth error was the depth difference between both views. But then I thought we need to compute the reprojection: calculate the 3D coordinates of keypoints from the depth in the source image, then warp and reproject them into the target image. So I thought maybe the relative depth error is calculated as estimated_depth (computed from the estimated 3D coordinates in the source view and the R, t of the target view) / annotated_depth (rendered by MVS). I admit this way is computationally inefficient; it's just the product of my overthinking 😬.

Anyway, thanks for your hints. So the relative depth error is also asymmetrical, right?
Or maybe the relative depth error is calculated as abs(d1 - d2) / d1, or as abs(d1 - d2) / min(d1, d2), which is symmetrical?

Besides, I saw a discussion saying that no data augmentation is used during training. Does that mean I can generate keypoints and descriptors offline for efficiency? Or is it better to generate keypoints and descriptors with a little randomness (like slightly random brightness and scale)?

@sarlinpe
Contributor

sarlinpe commented Jan 2, 2021

Ground truth matches

For each keypoint in each image: we interpolate the depth, lift the keypoint to 3D, and project it to the other image using the relative pose. We then interpolate the depth at the projected location and check if it is consistent with the 3D point using the relative error. If larger than 10%, the projection is marked as occluded. Two keypoints are deemed matchable if both are not occluded and if the distances to their projections are both lower than some threshold.
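As an illustration of this procedure, a sketch in numpy (not the authors' code), assuming pinhole intrinsics K1 and K2, a relative pose (R, t) taking camera 1 to camera 2, dense MVS depth maps, and nearest-neighbour depth lookup instead of proper interpolation; which depth appears in the denominator of the relative error is also an assumption.

```python
import numpy as np


def project_keypoints(kpts1, depth1, depth2, K1, K2, R, t, rel_thresh=0.1):
    """Project keypoints of image 1 into image 2 and flag occlusions.

    kpts1: (N, 2) array of (x, y) keypoint coordinates in image 1.
    depth1, depth2: dense (H, W) depth maps, e.g. MegaDepth MVS depth.
    K1, K2: (3, 3) intrinsics; R, t: relative pose from camera 1 to 2.
    Returns the projected points (N, 2) and a boolean validity mask.
    """
    h2, w2 = depth2.shape
    # Nearest-neighbour depth lookup (the real pipeline interpolates).
    xy1 = kpts1.round().astype(int)
    d1 = depth1[xy1[:, 1], xy1[:, 0]]

    # Lift the keypoints to 3D in camera 1, then move them to camera 2.
    kpts1_h = np.concatenate([kpts1, np.ones((len(kpts1), 1))], axis=1)
    pts3d_1 = (np.linalg.inv(K1) @ kpts1_h.T) * d1          # (3, N)
    pts3d_2 = R @ pts3d_1 + t[:, None]                      # (3, N)

    # Project into image 2 and read the depth stored there.
    proj_h = K2 @ pts3d_2
    proj = (proj_h[:2] / proj_h[2:]).T                      # (N, 2)
    inside = ((proj[:, 0] >= 0) & (proj[:, 0] <= w2 - 1)
              & (proj[:, 1] >= 0) & (proj[:, 1] <= h2 - 1))
    xy2 = np.clip(proj.round().astype(int), 0, [w2 - 1, h2 - 1])
    d2 = depth2[xy2[:, 1], xy2[:, 0]]

    # The projection is considered occluded if the depth of the
    # transformed 3D point disagrees with image 2's depth map by more
    # than ~10% (relative error).
    rel_error = np.abs(pts3d_2[2] - d2) / np.maximum(d2, 1e-6)
    valid = inside & (d1 > 0) & (d2 > 0) & (rel_error < rel_thresh)
    return proj, valid
```

Two keypoints would then be labelled as a match when both of their projections are valid and the distance between each projection and the other keypoint is below a pixel threshold, as described above.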

Data augmentation

The only random data augmentations used for MegaDepth are i) the random cropping (since not all images have the same size), ii) the random additional keypoints (for images that have fewer than 1024 detected keypoints). Both allow detecting and describing offline, and dynamically dropping or adding keypoints.
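For the second augmentation, a minimal sketch of what dynamically dropping or adding precomputed keypoints could look like; the padding strategy (uniform random locations, random unit descriptors, zero scores) is an assumption, not the paper's exact recipe.

```python
import numpy as np


def fix_num_keypoints(kpts, descs, scores, num_kpts, image_size, rng=None):
    """Bring precomputed keypoints to a fixed count.

    kpts: (N, 2), descs: (N, D), scores: (N,); image_size: (width, height).
    Keypoints are randomly dropped if N > num_kpts, or padded with
    random ones if N < num_kpts, so detection/description stays offline.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, dim = descs.shape
    if n >= num_kpts:
        keep = rng.choice(n, num_kpts, replace=False)
        return kpts[keep], descs[keep], scores[keep]
    # Pad with random keypoints, random unit descriptors, zero scores.
    extra = num_kpts - n
    rand_kpts = rng.uniform([0, 0], image_size, size=(extra, 2))
    rand_descs = rng.normal(size=(extra, dim))
    rand_descs /= np.linalg.norm(rand_descs, axis=1, keepdims=True)
    return (np.concatenate([kpts, rand_kpts]),
            np.concatenate([descs, rand_descs]),
            np.concatenate([scores, np.zeros(extra)]))
```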

@ez4lionky
Author

Thanks for your help, it's all clear to me now :).
Closing the issue.

@ssssjiang


Hi, can you specify the formula for calculating the relative error? With the 3D point depth := d_3D and the projected depth := d_2, is it relative_error := abs(d_3D - d_2) / d_3D?
If d_2 > d_3D, the projection does not seem to be occluded...
