
Question about the results on the Aachen dataset in the original paper #52

Closed
ez4lionky opened this issue Dec 29, 2020 · 9 comments

@ez4lionky

Hi, thanks for your great work! The paper is very solid and I think it will have a profound influence.
But I have some questions (uncertain details) about the visual localization results in the original paper.

  1. Are the results on the Aachen dataset obtained with a model trained on MegaDepth, or with a model trained specifically for Aachen?
  2. Is the localization pipeline used in the original paper similar to the one from the 2019 local feature challenge?
  3. I reproduced similar results using the Hierarchical Localization pipeline. However, the results in the original paper are marginally lower than the CVPR 2020 workshop results; is this caused by a difference in the pipeline or by something else?
@sarlinpe
Contributor

Thanks for your questions.

  1. The results on the Aachen Day-Night dataset are obtained with the model trained on MegaDepth.
  2. Yes, the results are reported using the pipeline provided by the benchmark. Feature extraction and matching can easily be performed with our toolbox hloc (see the sketch below). To ensure fairness, I recommend using the official pipeline for the steps of 3D reconstruction and absolute pose estimation.
  3. Our original SuperGlue paper reports results for the local feature challenge (i.e. image pairs are provided) on the Aachen v1.0 dataset. It corresponds to this entry. The results differ because the ground truth poses and evaluation thresholds were updated in July, but we did not update the paper. The CVPR 2020 workshop leaderboard, however, reports the previous results, from before the update, which are consistent with the paper. The ECCV 2020 workshop leaderboard reports results for Aachen v1.1 (which has more queries), where our entry is here.

The entry which you link (here) corresponds to the full localization task, which is different from the local feature challenge since it also involves retrieval. The numbers are higher because we use an improved pipeline (hloc) for reconstruction and absolute pose estimation.
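For context, here is a minimal sketch of running SuperPoint extraction and SuperGlue matching with hloc; the dataset paths and the pair list are placeholders, and the configuration names assume a recent hloc version.

```python
from pathlib import Path

from hloc import extract_features, match_features

# Placeholder paths: adjust to your Aachen layout and pair list.
images = Path('datasets/aachen/images/images_upright/')
outputs = Path('outputs/aachen/')
pairs = Path('pairs/aachen/pairs-db-covis20.txt')

# Built-in configurations (assumed names from recent hloc versions).
feature_conf = extract_features.confs['superpoint_aachen']
matcher_conf = match_features.confs['superglue']

# Extract SuperPoint features for all images, then match the listed
# pairs with SuperGlue; both steps write HDF5 files into `outputs`.
feature_path = extract_features.main(feature_conf, images, outputs)
match_path = match_features.main(matcher_conf, pairs, feature_conf['output'], outputs)
```

The 3D reconstruction and absolute pose estimation would then follow the official benchmark pipeline, as recommended above.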

@ez4lionky
Author


Thanks for your reply! It really helps me resolve my confusion. :)

@sarlinpe
Contributor

Great, please close this issue if your problem is solved. Thanks!

@ez4lionky
Author

ez4lionky commented Dec 31, 2020

Well, now I'm trying to reproduce the model on MegaDepth, and I have other questions about the image pair generation and the ground truth correspondences:

  1. Is the overlap score calculated as len(∩(S1, S2)) / len(∪(S1, S2)), where S1 and S2 are the sets of points observed by the two images of a pair in the reconstructed SfM model?
  2. Is the relative depth error threshold calculated from estimated_depth / annotated_depth, or from something else? Here the estimated depth would be computed from the 3D coordinates and the relative pose (R, t), and the annotated depth would be the one rendered by MVS.

Could you give me more details? Thanks!

@ez4lionky ez4lionky reopened this Dec 31, 2020
@sarlinpe
Contributor

sarlinpe commented Jan 2, 2021

  1. The overlap is defined asymmetrically by D2-Net here as len(∩(S1, S2)) / len(S1) and len(∩(S1, S2)) / len(S2) (see the sketch after this list). For image pairs that zoom in/out, only one direction might have a sufficiently large overlap to pass the threshold test, but this does not matter much as SuperGlue is equivariant to the permutation of the pair.

  2. I am not sure I understand this question. We use the dense MVS depth maps that come with MegaDepth to estimate ground-truth correspondences across image pairs. Correct correspondences are those that have a small distance in both views and a small relative depth error when rendered in both views (so as to discard occluded keypoints).
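To make the asymmetric overlap concrete, here is a small illustrative sketch (not the actual training code) that computes both directional ratios from the sets of 3D point IDs observed by each image in the SfM model; the selection thresholds and the one-direction rule are assumptions.

```python
def overlap_ratios(obs1, obs2):
    """Asymmetric D2-Net-style overlap between two images.

    obs1, obs2: sets of 3D point IDs observed by each image
    in the SfM reconstruction.
    """
    shared = len(obs1 & obs2)
    return shared / len(obs1), shared / len(obs2)


def select_pair(obs1, obs2, min_overlap=0.1, max_overlap=0.7):
    # Illustrative selection: keep the pair if at least one direction
    # falls in the overlap range. The exact thresholds and whether one
    # or both directions must pass are assumptions, not the paper's values.
    r12, r21 = overlap_ratios(obs1, obs2)
    return any(min_overlap <= r <= max_overlap for r in (r12, r21))
```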

@ez4lionky
Author

ez4lionky commented Jan 2, 2021

  1. Yep, I agree with you: only one direction might have a sufficiently large overlap to pass the threshold test. But in some cases the ratio threshold may lead to a big difference in the proportion of selected pairs, and I want to stay consistent with your implementation :).
  2. At first I thought the relative depth error was the depth difference between both views. But then I thought we need to compute the reprojection: calculate the 3D coordinates of keypoints from the depth in the source image, then warp and reproject them into the target image. So I thought maybe the relative depth error is calculated as estimated_depth (computed from the estimated 3D coordinates in the source view and the R, t of the target view) / annotated_depth (rendered by MVS). I admit this way is computationally inefficient; it's just the product of my overthinking 😬.

Anyway, thanks for your hints. So the relative depth error is also asymmetrical, right?
Or maybe the relative depth error is calculated as abs(d1 - d2) / d1, or as abs(d1 - d2) / min(d1, d2), which is symmetrical?

Besides, I saw a discussion saying that no data augmentation is used during training. Does that mean I can generate keypoints and descriptors offline for efficiency? Or is it better to generate keypoints and descriptors with a little randomness (like slightly random brightness and scale)?

@sarlinpe
Contributor

sarlinpe commented Jan 2, 2021

Ground truth matches

For each keypoint in each image: we interpolate the depth, lift the keypoint to 3D, and project it to the other image using the relative pose. We then interpolate the depth at the projected location and check if it is consistent with the 3D point using the relative error. If larger than 10%, the projection is marked as occluded. Two keypoints are deemed matchable if both are not occluded and if the distances to their projections are both lower than some threshold.
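As an illustration of this procedure, a sketch in numpy (not the authors' code), assuming pinhole intrinsics K1 and K2, a relative pose (R, t) taking camera 1 to camera 2, dense MVS depth maps, and nearest-neighbour depth lookup instead of proper interpolation; which depth appears in the denominator of the relative error is also an assumption.

```python
import numpy as np


def project_keypoints(kpts1, depth1, depth2, K1, K2, R, t, rel_thresh=0.1):
    """Project keypoints of image 1 into image 2 and flag occlusions.

    kpts1: (N, 2) array of (x, y) keypoint coordinates in image 1.
    depth1, depth2: dense (H, W) depth maps, e.g. MegaDepth MVS depth.
    K1, K2: (3, 3) intrinsics; R, t: relative pose from camera 1 to 2.
    Returns the projected points (N, 2) and a boolean validity mask.
    """
    h2, w2 = depth2.shape
    # Nearest-neighbour depth lookup (the real pipeline interpolates).
    xy1 = kpts1.round().astype(int)
    d1 = depth1[xy1[:, 1], xy1[:, 0]]

    # Lift the keypoints to 3D in camera 1, then move them to camera 2.
    kpts1_h = np.concatenate([kpts1, np.ones((len(kpts1), 1))], axis=1)
    pts3d_1 = (np.linalg.inv(K1) @ kpts1_h.T) * d1          # (3, N)
    pts3d_2 = R @ pts3d_1 + t[:, None]                      # (3, N)

    # Project into image 2 and read the depth stored there.
    proj_h = K2 @ pts3d_2
    proj = (proj_h[:2] / proj_h[2:]).T                      # (N, 2)
    inside = ((proj[:, 0] >= 0) & (proj[:, 0] <= w2 - 1)
              & (proj[:, 1] >= 0) & (proj[:, 1] <= h2 - 1))
    xy2 = np.clip(proj.round().astype(int), 0, [w2 - 1, h2 - 1])
    d2 = depth2[xy2[:, 1], xy2[:, 0]]

    # The projection is considered occluded if the depth of the
    # transformed 3D point disagrees with image 2's depth map by more
    # than ~10% (relative error).
    rel_error = np.abs(pts3d_2[2] - d2) / np.maximum(d2, 1e-6)
    valid = inside & (d1 > 0) & (d2 > 0) & (rel_error < rel_thresh)
    return proj, valid
```

Two keypoints would then be labelled as a match when both of their projections are valid and the distance between each projection and the other keypoint is below a pixel threshold, as described above.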

Data augmentation

The only random data augmentations used for MegaDepth are i) the random cropping (since not all images have the same size), ii) the random additional keypoints (for images that have fewer than 1024 detected keypoints). Both allow detecting and describing offline, and dynamically dropping or adding keypoints.
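For the second augmentation, a minimal sketch of what dynamically dropping or adding precomputed keypoints could look like; the padding strategy (uniform random locations, random unit descriptors, zero scores) is an assumption, not the paper's exact recipe.

```python
import numpy as np


def fix_num_keypoints(kpts, descs, scores, num_kpts, image_size, rng=None):
    """Bring precomputed keypoints to a fixed count.

    kpts: (N, 2), descs: (N, D), scores: (N,); image_size: (width, height).
    Keypoints are randomly dropped if N > num_kpts, or padded with
    random ones if N < num_kpts, so detection/description stays offline.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, dim = descs.shape
    if n >= num_kpts:
        keep = rng.choice(n, num_kpts, replace=False)
        return kpts[keep], descs[keep], scores[keep]
    # Pad with random keypoints, random unit descriptors, zero scores.
    extra = num_kpts - n
    rand_kpts = rng.uniform([0, 0], image_size, size=(extra, 2))
    rand_descs = rng.normal(size=(extra, dim))
    rand_descs /= np.linalg.norm(rand_descs, axis=1, keepdims=True)
    return (np.concatenate([kpts, rand_kpts]),
            np.concatenate([descs, rand_descs]),
            np.concatenate([scores, np.zeros(extra)]))
```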

@ez4lionky
Author

Thanks for your help, it's all clear to me now :).
Closing the issue.

@ssssjiang


Hi, can you specify the formula for calculating the relative error? With the 3D point depth := d_3D and the projected depth := d_2, is it relative_error := abs(d_3D - d_2) / d_3D?
If d_2 > d_3D, the projection does not seem to be occluded...
