Training speed and results #6

Closed
litingsjj opened this issue Nov 17, 2021 · 29 comments

@litingsjj

Sorry to bother you, I have two questions about this project. For training, I use multiple GPUs; the speed is about 1.79 it/s for the first two epochs, which takes about one day. After that, training becomes much faster, and I don't know why this happens. As for the results, the detector repeatability is better than rpautrat/SuperPoint's, but for the descriptors hpatches-i is 0.90 and hpatches-v is 0.55, which is worse than rpautrat/SuperPoint's.

@shaofengzeng
Owner

Thanks for your attention. I'm not sure about the speed; my training process behaves similarly. I guess the detector loss causes this phenomenon.

The default hyper-parameters may not work well; you may have to adjust them several times to get better results. The latest version adds two important variables, positive_dist and negative_dist, to help you fine-tune the model. The parameters lambda_d, lambda_loss, positive_margin and negative_margin are the most decisive ones; you need to adjust them to make positive_dist and negative_dist as small as possible. (The weights I released in the latest version achieve 0.66 acc. on hpatches-v.)
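
For reference, this is a minimal sketch of how these quantities typically relate in a SuperPoint-style descriptor hinge loss (the exact code in loss.py may differ; the function signature below is illustrative):

```python
import torch

def descriptor_hinge_loss(dot_product_desc, s, lambda_d=250.0,
                          positive_margin=1.0, negative_margin=0.2):
    # s is 1 where two cells correspond under the homography, 0 otherwise.
    # positive_dist / negative_dist mirror the variables of the same name:
    # matching pairs are pushed above positive_margin, non-matching ones
    # below negative_margin.
    positive_dist = torch.clamp(positive_margin - dot_product_desc, min=0.0)
    negative_dist = torch.clamp(dot_product_desc - negative_margin, min=0.0)
    loss = lambda_d * s * positive_dist + (1.0 - s) * negative_dist
    return loss.mean()
```

lambda_loss then weights this descriptor loss against the detector loss in the total objective, which is why the two have to be tuned together.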

@litingsjj
Author

Thanks very much for your reply! I will try fine-tuning these parameters. Is 0.66 acc. on hpatches-v your best result? Also, was that result obtained with the pre-trained model (superpoint_bn.pth)?

@shaofengzeng
Owner

> Thanks very much for your reply! I will try fine-tuning these parameters. Is 0.66 acc. on hpatches-v your best result? Also, was that result obtained with the pre-trained model (superpoint_bn.pth)?

I trained the model without any pre-trained weights, and it took me several days to reach this performance. Training the model really needs experience and tricks, and I have failed to find hyper-parameters that directly yield a good model. 0.66 may not be the best possible, but it is the best model I can get right now.

@litingsjj
Author

Got it! Thanks again!

@litingsjj
Author

My result never reaches 0.60 on hpatches-v, even after trying several sets of hyper-parameters. Can you share the hyper-parameters used for superpoint_bn.pth? When I run inference with that model, the hpatches-v result is 0.66. Also, if you get a better result, please share the experience; I would really appreciate it!

@litingsjj litingsjj reopened this Nov 19, 2021
@shaofengzeng
Owner

shaofengzeng commented Nov 19, 2021

If you want to run with superpoint_bn.pth, remember to set eps=1e-3 for all BatchNorm2d layers in backbone.py and cnn_head.py. Another parameter, momentum, may also matter for superpoint_bn.pth (try momentum=0.01).
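
For illustration (the actual layer definitions in backbone.py and cnn_head.py may look different), a BN layer matching the converted TF weights would be constructed roughly like this:

```python
import torch.nn as nn

def make_bn(num_features):
    # TF's batch normalization defaults to epsilon=1e-3 and momentum=0.99,
    # while PyTorch's BatchNorm2d defaults are eps=1e-5 and momentum=0.1
    # (PyTorch's momentum is roughly 1 - TF's), so eps=1e-3 / momentum=0.01
    # better matches weights converted from the TF model.
    return nn.BatchNorm2d(num_features, eps=1e-3, momentum=0.01)
```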

@litingsjj
Author

litingsjj commented Nov 19, 2021

Actually, I get the 0.66 result when I set eps=1e-3 and momentum=0.1 for inference. But when training without any pre-trained model, how should I set those hyper-parameters to get this result? I tried setting eps=1e-3, lambda_d=250 and lambda_loss=10 (and other values), fixing the lr, etc., but I cannot reach this result.

@shaofengzeng
Owner

shaofengzeng commented Nov 19, 2021

momentum=0.01 and eps=1e-3 are for BatchNorm2d and only matter for superpoint_bn.pth, which is converted from rpautrat's SuperPoint.
If you want to train your own model with these PyTorch scripts, I suggest removing the eps and momentum arguments from BatchNorm2d.
sp_0.pth is the model I trained without any pre-trained weights.
As far as I know, this PyTorch version is very sensitive to the parameters lambda_d and lambda_loss. A larger lambda_d makes training more stable; however, the final performance is not as good.

@litingsjj
Author

litingsjj commented Nov 19, 2021

Thanks! I have one last question about README.md -> Steps. Does it mean the model needs two training stages? The first stage follows step 2 ("Comment the following lines in loss.py") with base_lr=0.01, and then the resulting model is trained again in stage two with base_lr=0.001 and the hyper-parameters lambda_d = 250 (balancing positive_dist and negative_dist) and lambda_loss = 10 (balancing the detector and descriptor losses)?
Does step 7 ("Start training again. lambda_d and lambda_loss may need to be adjusted several times") mean adjusting the hyper-parameters and retraining several times (maybe a stage three, stage four)?

@litingsjj
Author

If I use your latest version, how many times should I train?

@litingsjj
Author

Also, the model converted from rpautrat's SuperPoint does not give the same results. Could there be a bug like these: rpautrat/SuperPoint#117, eric-yyjau/pytorch-superpoint#24?

@shaofengzeng
Owner

shaofengzeng commented Nov 19, 2021

Setting eps=1e-3 for BN achieves 0.66 acc. on hpatches-v. I'm not sure whether you can achieve similar performance by setting eps=1e-3 and momentum=0.01. Because the default BN parameters differ between TF and PyTorch, eps and momentum are two key parameters; moreover, conv2d in TF and PyTorch also behaves slightly differently, so we can only get performance similar to rpautrat's SuperPoint, not identical.
I'm not sure how many times you need to train. Right now I don't have a reliable training recipe. However, I strongly suggest reading the descriptor_loss function in loss.py and debugging some of the key variables, like dot_product_desc, positive_dist and negative_dist. I think this will greatly help you adjust the hyper-parameters.
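
If it helps, a tiny helper along these lines (illustrative only, not part of the repo) can be called from descriptor_loss to watch those variables while tuning:

```python
def log_descriptor_stats(step, dot_product_desc, positive_dist, negative_dist, every=100):
    # Print summary statistics of the descriptor-loss tensors every `every` steps.
    if step % every == 0:
        print(f"step {step}: "
              f"dot_product_desc mean={dot_product_desc.mean().item():.4f}, "
              f"positive_dist mean={positive_dist.mean().item():.4f}, "
              f"negative_dist mean={negative_dist.mean().item():.4f}")
```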

@litingsjj
Author

Thanks for your answer!

@shaofengzeng
Owner

shaofengzeng commented Nov 28, 2021

Hi litingsjj, I found that the photometric augmentation parameters were different from rpautrat's SuperPoint. This may affect training performance.
I have updated superpoint_coco_train.yaml according to rpautrat's SuperPoint:

random_brightness: {max_abs_change: 0.2}
random_contrast: {strength_range: [0.5, 1.5]}
additive_gaussian_noise: {stddev_range: [0, 10]}
additive_speckle_noise: {prob_range: [0, 0.0035]}
additive_shade:
    transparency_range: [-0.5, 0.5]
    kernel_size_range: [100, 150]
    nb_ellipses: 20
motion_blur: {max_kernel_size: 3}

And I'm checking to see if there are any other problems.

@litingsjj
Author

Great! What results do you get after updating the *.yaml? Also, I can't reproduce your result for now.

@litingsjj
Author

litingsjj commented Nov 29, 2021

1. Maybe the MagicPoint model I trained is different from yours, so the exported COCO labels may affect performance. Can you provide your MagicPoint model?

2. Last week, when I used your project, the descriptor loss had some commented-out lines, like:

    ## better comment this at the beginning of training
    #dot_product_desc = F.relu(dot_product_desc)

    ## Normalize scores, better comment this at the beginning of training
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]), p=2, dim=3), [batch_size, Hc, Wc, Hc, Wc])
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]), p=2, dim=1), [batch_size, Hc, Wc, Hc, Wc])

I'm confused about your training strategy.

@shaofengzeng
Owner

a. The MagicPoint trained by this repo is different from rpautrat's. It seems that our MagicPoint usually generates more keypoints than rpautrat's. This may be caused by homography_adaptation in homo_export_labels.py; alternatively, you can set a larger det_thresh in the *.yaml when generating the COCO labels.
b. According to the issue you linked, it is better to comment out the following lines (though this may be unnecessary):

dot_product_desc = F.relu(dot_product_desc)
dot_product_desc = torch.reshape(F.normalize....)
dot_product_desc = torch.reshape(F.normalize....)

c. The repository has been updated to:

  1. change the photo_aug strategy in photometric_augmentation.py
  2. apply homo_aug first, then photo_aug, in coco.py and synthetic_shapes.py (see the sketch below)
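
A minimal sketch of the ordering in c.2, with sample_homography, warp_points and photometric_augmentation as placeholders for the corresponding helpers (not the exact functions in coco.py):

```python
import cv2

def augment_sample(image, keypoints, sample_homography, warp_points,
                   photometric_augmentation):
    # 1) homographic augmentation: warp the image and move the labels with it
    h, w = image.shape[:2]
    H = sample_homography((h, w))                  # random 3x3 homography
    warped = cv2.warpPerspective(image, H, (w, h))
    warped_kps = warp_points(keypoints, H)
    # 2) photometric augmentation is applied afterwards, on the warped image
    warped = photometric_augmentation(warped)
    return warped, warped_kps
```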

@shaofengzeng
Owner

Moreover, I still cannot achieve similar training performance with these improvements; I'll keep checking...

@shaofengzeng
Owner

shaofengzeng commented Dec 1, 2021

Hi, I uncommented the following 3 lines, set lr=0.001, and trained SuperPoint on the COCO dataset labelled by rpautrat's MagicPoint model. The performance on hpatches-v is 0.698, quite close to rpautrat's model. I have updated the repo.

    dot_product_desc = F.relu(dot_product_desc)

    ##l2_normalization
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc, Wc, Hc * Wc]),
                                                 p=2,
                                                 dim=3), [batch_size, Hc, Wc, Hc, Wc])
    dot_product_desc = torch.reshape(F.normalize(torch.reshape(dot_product_desc, [batch_size, Hc * Wc, Hc, Wc]),
                                                 p=2,
                                                 dim=1), [batch_size, Hc, Wc, Hc, Wc])

However, there may still be some problems with training MagicPoint; it usually produces more keypoints than rpautrat's model, which may affect the final results.

@litingsjj
Author

If you are worried about that, maybe we can use rpautrat's model to export the COCO labels for training SuperPoint. Also, the dataset here (COCO 2017) is different from the one rpautrat used (COCO 2014). And can I use your latest version to reproduce your result?

@litingsjj
Author

The performance on hpatches-v reaches 0.725!

@shaofengzeng shaofengzeng reopened this Dec 3, 2021
@shaofengzeng
Owner

Great! Would you like to share your training method? This may help lots of people who follow this repo.

@litingsjj
Author

I trained MagicPoint with rpautrat's project to get the COCO ground-truth points, then used your version to train SuperPoint.

@shaofengzeng
Owner

OK, thanks.

@FeiXie8

FeiXie8 commented Jan 19, 2022

> I trained MagicPoint with rpautrat's project to get the COCO ground-truth points, then used your version to train SuperPoint.

Hello, sorry to bother you. rpautrat's project generates the labels as .npz files, while this project generates them as .npy files. How did you solve this?

@shaofengzeng
Owner

> Hello, sorry to bother you. rpautrat's project generates the labels as .npz files, while this project generates them as .npy files. How did you solve this?

Hi, it is easy to convert the *.npz files to *.npy. And remember to resize the COCO images to 240x320 with the function ratio_preserving_resize.
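
As a rough sketch of that conversion (the key name stored inside the .npz files and the directory layout are assumptions; check the export script you actually used):

```python
import os
import glob
import numpy as np

src_dir = "export/coco_npz_labels"   # hypothetical input directory of .npz labels
dst_dir = "data/coco/labels"         # hypothetical output directory of .npy labels
os.makedirs(dst_dir, exist_ok=True)

for npz_path in glob.glob(os.path.join(src_dir, "*.npz")):
    data = np.load(npz_path)
    points = data["points"]          # (N, 2) keypoint array; key name assumed
    name = os.path.splitext(os.path.basename(npz_path))[0]
    np.save(os.path.join(dst_dir, name + ".npy"), points)
```

The images are then resized to 240x320 with ratio_preserving_resize when they are loaded, as mentioned above.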

@FeiXie8

FeiXie8 commented Jan 19, 2022

> Hi, it is easy to convert the *.npz files to *.npy. And remember to resize the COCO images to 240x320 with the function ratio_preserving_resize.

Does that mean rpautrat's project's image size is not 240x320, while this project resizes images to 240x320?

@FeiXie8

FeiXie8 commented Jan 19, 2022

> Hi, it is easy to convert the *.npz files to *.npy. And remember to resize the COCO images to 240x320 with the function ratio_preserving_resize.

Though rpautrat's labels are .npz, it is one .npz file per image, so it seems easy to convert.

@leon5678

> I trained MagicPoint with rpautrat's project to get the COCO ground-truth points, then used your version to train SuperPoint.

Hi, could you email me the COCO ground-truth points you got at leon_wu6@163.com?
I tried to generate them as you said, but the results are not good.
