Issues faced during running the training and test scripts #5

Open
kkaytekin opened this issue Jul 6, 2023 · 16 comments
@kkaytekin

kkaytekin commented Jul 6, 2023

Hello again,
I would like to share some issues that I faced while running the training script. Note that I have prepared the datasets as explained in the R2D2 repository.

  1. Mismatch of dimensions during the det_loss calculation:
    While executing this line, I get:
RuntimeError: The size of tensor a (64) must match the size of tensor b (65) at non-singleton dimension 1

I worked around this by replacing line 357 as follows:

        elif self.detloss in ['ce']:
            # det_loss = self.det_loss(pred_score=output["semi"], gt_score=output["gt_semi"], weight=output["weight"],
            #                          stability_map=None)
            det_loss = self.det_loss(pred_score=output["semi"], gt_score=output["gt_semi_norm"], weight=output["weight"],
                                     stability_map=None)

I think this error is caused by passing the wrong values. The inputs have these shapes:

output["gt_semi"].shape = (4,64,64,64) (==gt_score)
output["semi"].shape = (4,65,64,64) (==pred_score)

The output dict also contains:

output["gt_semi_norm"] with shape (4,65,64,64)

So I replaced gt_semi with gt_semi_norm, which has matching dimensions. I am not sure if this is a valid solution.
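To see at a glance which entries of the output dict are shape-compatible with the prediction, the shapes can be compared before the loss is called. This is a minimal sketch on plain tuples (for real tensors one would pass `tuple(t.shape)`); the helper name `find_compatible_keys` is hypothetical, and a matching shape does not by itself show that gt_semi_norm is the semantically correct target.

```python
def find_compatible_keys(shapes, pred_key):
    """Return the keys whose shape equals the shape stored under pred_key."""
    pred_shape = tuple(shapes[pred_key])
    return [
        key
        for key, shape in shapes.items()
        if key != pred_key and tuple(shape) == pred_shape
    ]


# Shapes reported above, as (batch, channels, height, width).
shapes = {
    "semi": (4, 65, 64, 64),
    "gt_semi": (4, 64, 64, 64),
    "gt_semi_norm": (4, 65, 64, 64),
}

compatible = find_compatible_keys(shapes, "semi")
print(compatible)
```

Only gt_semi_norm shares the 65-channel layout of the prediction, which is consistent with the RuntimeError raised for gt_semi.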

  2. Learning rate decay parameters are not specified. At line 166 of trainer.py, the interpreter complains that self.args.decay_rate and self.args.decay_iter cannot be found. Indeed, they are specified neither in the argparser nor in the config file. My workaround for now is to disable learning rate decay by replacing line 166 with:
#lr = min(self.args.lr * self.args.decay_rate ** (self.iteration - self.args.decay_iter), self.args.lr)
lr = self.args.lr

I think this change will prevent us from replicating the results in the paper.
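For reference, the commented-out expression can be written as a standalone function to see what schedule it produces. This is only a sketch of that one line: the function name `decayed_lr` and the base learning rate of 1e-4 are assumptions for illustration, while decay_rate=0.99996 and decay_iter=80000 are the values reported further down in this thread for an earlier version of the config.

```python
def decayed_lr(base_lr, decay_rate, decay_iter, iteration):
    """Exponential decay that only takes effect after decay_iter iterations.

    Before decay_iter the exponent is negative, so for 0 < decay_rate < 1 the
    scaled value exceeds base_lr and min() clips it back to base_lr.
    """
    return min(base_lr * decay_rate ** (iteration - decay_iter), base_lr)


# base_lr is a hypothetical value; the other two come from this thread.
BASE_LR = 1e-4
DECAY_RATE = 0.99996
DECAY_ITER = 80000

for it in (0, 80000, 100000, 150000):
    print(it, decayed_lr(BASE_LR, DECAY_RATE, DECAY_ITER, it))
```

With these values the learning rate stays constant for the first 80000 iterations and then shrinks by a factor of 0.99996 per iteration.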

Also, while running the test script test_aachenv_1_1 there are some matters I would like to mention:

  1. I am not sure whether to use the Aachen dataset that we prepared during the training, or Aachen v1.1 dataset that we can find online (for example, I downloaded it from here, as mentioned in the readme file). Since the datasets might be different, I would like to ask if there are any specific preprocessing steps I should follow to reproduce your results?
  2. In line 31 of the test script, the file pairs-db-covis20.txt is missing. I found it here, but since I got this file and the Aachen v1.1 database from different sources, I wanted to ask whether there is some other source I should download the Aachen v1.1 dataset from, perhaps one that already includes this file.
  3. We need to specify an outputs folder as shown here. Does that mean I should first run some other script to do inference and collect the results under the outputs folder I specified?
  4. The file aachen_db_imglist.txt is missing here. A Google search for this file was not successful.
  5. The file day_night_time_queries_with_intrinsics.txt is missing here. A Google search for this file was not successful. The Aachen v1.1 dataset I mentioned above only has night_time_queries_with_intrinsics.txt.

Thank you very much and best regards,

@meng152634

same problem...

@1561213

1561213 commented Sep 12, 2023

same problem.

@1561213

1561213 commented Sep 13, 2023

Also, in trainer.py, line 385,
eval_out = self.eval_on_data() is likely not defined, so I get

AttributeError: 'Trainer' object has no attribute 'eval_on_data'

Thanks.
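Until the method is provided, one stopgap is to skip evaluation when it is missing so that training can continue. The sketch below uses a stand-in `Trainer` class and a hypothetical `maybe_evaluate` wrapper; it is not the repository's implementation and it does not reproduce whatever evaluation eval_on_data was meant to perform.

```python
class Trainer:
    """Stand-in for the real Trainer; it deliberately lacks eval_on_data."""

    def maybe_evaluate(self):
        # Look the method up dynamically so a missing definition does not
        # raise AttributeError in the middle of training.
        eval_fn = getattr(self, "eval_on_data", None)
        if not callable(eval_fn):
            print("eval_on_data is not implemented; skipping evaluation")
            return None
        return eval_fn()


class TrainerWithEval(Trainer):
    """Variant that does define eval_on_data, returning a placeholder dict."""

    def eval_on_data(self):
        return {"evaluated": True}
```

In the training loop, the call on line 385 would then become `eval_out = self.maybe_evaluate()`, with any code that consumes eval_out guarded against None.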

@XZYuann

XZYuann commented Oct 14, 2023

same problem.

@pQWQq

pQWQq commented Oct 23, 2023

I found that config_train_r2d2.json in the March 9th version of the code sets decay_rate=0.99996 and decay_iter=80000.
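If those two keys are put back into the config file, they still have to reach self.args. How the training script merges the JSON config with the parsed arguments is not shown in this thread, so the following is a generic sketch: the helper `apply_config_defaults` and the lr value are hypothetical.

```python
import argparse
import json


def apply_config_defaults(args, config):
    """Copy config entries onto args wherever args has no value yet."""
    for key, value in config.items():
        if getattr(args, key, None) is None:
            setattr(args, key, value)
    return args


# The two decay values are the ones quoted above; lr is a placeholder.
config = json.loads('{"decay_rate": 0.99996, "decay_iter": 80000}')
args = apply_config_defaults(argparse.Namespace(lr=1e-4), config)
print(args.decay_rate, args.decay_iter)
```

Values already set on the command line are left untouched, so the config only fills in what is missing.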

@zhengshunkai

+1

@zhengshunkai

aachen_db_imglist.txt is not used;
day_night_time_queries_with_intrinsics.txt may be the day-time and night-time query files combined.
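If that guess is right, the combined file could be built by concatenating the two per-condition lists. This sketch assumes the dataset also ships a day-time counterpart named day_time_queries_with_intrinsics.txt next to the night-time file, and that the test script expects nothing more than the two lists joined; neither assumption has been confirmed by the maintainer, and `concat_query_lists` is a hypothetical helper.

```python
from pathlib import Path


def concat_query_lists(day_file, night_file, out_file):
    """Write the non-empty lines of both files, day first, to out_file.

    Returns the number of query lines written.
    """
    lines = []
    for path in (day_file, night_file):
        for line in Path(path).read_text().splitlines():
            if line.strip():
                lines.append(line.strip())
    Path(out_file).write_text("\n".join(lines) + "\n")
    return len(lines)


# Example call (file names are assumptions, see the note above):
# concat_query_lists(
#     "queries/day_time_queries_with_intrinsics.txt",
#     "queries/night_time_queries_with_intrinsics.txt",
#     "queries/day_night_time_queries_with_intrinsics.txt",
# )
```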

@Inverse-function

jiu
that's why?

@eronez

eronez commented Apr 23, 2024

Also, in trainer.py, line 385, eval_out = self.eval_on_data() is likely not defined, so I get

AttributeError: 'Trainer' object has no attribute 'eval_on_data'

Thanks.

@1561213
I have the same issue.
Have you solved this problem?

@Inverse-function

Inverse-function commented Apr 24, 2024 via email

@feixue94
Owner

Hi,

Thank you for your interest in our work. I will fix these bugs and update the code.

@liutao23

jiu that's why?

Your mmseg version is too high. You can choose a version by following: https://mmsegmentation.readthedocs.io/zh-cn/0.x/faq.html

@Inverse-function

jiu that's why?

Your mmseg version is too high. You can choose a version by following: https://mmsegmentation.readthedocs.io/zh-cn/0.x/faq.html

Thank you.

@Adolfhill

Hi guys, have you solved the issue of the missing implementation of eval_on_data?
