Finetune the ConvNeXt-L on KITTI #26

Open
FangjunWang opened this issue Dec 5, 2023 · 28 comments

Comments

@FangjunWang

Hello, and nice work! My question is how to fine-tune the model on KITTI.
I tried the script ./finetune/train_ft_SQLdepth.py but could not get good enough results: only abs_rel 0.0494 and rmse 2.182.

@hisfog
Owner

hisfog commented Dec 5, 2023

Did you load a pre-trained model (self-supervised pre-trained), and what are your SSL scores?

@FangjunWang
Author

Thank you for your reply! Yes, I loaded a pre-trained model that was trained with the script train.py for 20 epochs. What is an SSL score? The training loss is around 0.3~0.4, and the validation silog is 7.467.

@hisfog
Owner

hisfog commented Dec 5, 2023

I mean, what are your SSL model's metrics (AbsRel, RMSE, etc.)?

@FangjunWang
Author

AbsRel

@hisfog
Owner

hisfog commented Dec 5, 2023

Em, AbsRel = ? I mean your SSL model's evaluation results on KITTI, not the SiLog loss.
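
For readers unsure which numbers are being asked for here, the sketch below computes the usual depth-evaluation metrics alongside SILog. It is a minimal pure-Python illustration that assumes the conventions commonly used for KITTI evaluation (threshold accuracies at 1.25^k, SILog scaled by 100); `depth_metrics` is an illustrative name, not a function from this repository, and the sample depths are made up.

```python
import math

def depth_metrics(gt, pred):
    """Compute common depth metrics for paired positive depths (in metres)."""
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )
    # Threshold accuracies: fraction of points with max(gt/pred, pred/gt) < 1.25^k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1, a2, a3 = (sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3))
    # SILog: standard deviation of the log-depth error, scaled by 100 (assumed convention).
    d = [math.log(p) - math.log(g) for g, p in pairs]
    var = sum(x * x for x in d) / n - (sum(d) / n) ** 2
    silog = 100.0 * math.sqrt(max(var, 0.0))
    return {
        "abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "rmse_log": rmse_log,
        "a1": a1, "a2": a2, "a3": a3, "silog": silog,
    }

# Made-up depths, for illustration only.
metrics = depth_metrics(gt=[10.0, 20.0, 40.0], pred=[11.0, 19.0, 40.0])
print({name: round(value, 4) for name, value in metrics.items()})
```

Because SILog is a scaled log-space quantity, a value such as 7.467 is not directly comparable to an AbsRel such as 0.060.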

@FangjunWang
Author

The SSL model's evaluation results on KITTI are: abs_rel: 0.060, rmse: 2.642.

@hisfog
Owner

hisfog commented Dec 5, 2023

That's interesting: you got better SSL scores but worse SSL+Sup scores.

@hisfog
Owner

hisfog commented Dec 5, 2023

Can you provide your fine-tuning args? I think you should choose a much smaller learning_rate.

@FangjunWang
Author

--name cvnXt_075_1130
--root weights/inc_kitti_exps
--load_weights_folder weights/convnext_large/cvnXt_075/models/weights_15
--epochs 5
--bs 8
--lr 1e-5
--wd 0.01
--div_factor 10
--final_div_factor 100
--validate_every 250
--dataset kitti
--workers 8
--w_chamfer 0
--data_path datasets/KITTI/raw
--gt_path datasets/KITTI/gts/train
--filenames_file ./finetune/train_test_inputs/kitti_eigen_train_files_with_gt.txt
--input_height 320
--input_width 1024
--min_depth 0.001
--max_depth 80
--do_random_rotate
--degree 1.0
--data_path_eval datasets/KITTI/raw
--gt_path_eval datasets/KITTI/gts/val
--filenames_file_eval ./finetune/train_test_inputs/kitti_eigen_test_files_with_gt.txt
--min_depth_eval 1e-3
--max_depth_eval 80
--do_kb_crop
--garg_crop
--same_lr
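
Flag lists like the one above are often kept in a text file and handed to argparse. The following is a minimal sketch of that pattern, assuming the script reads its options through argparse's `fromfile_prefix_chars` mechanism; the thread does not show how train_ft_SQLdepth.py actually parses them, and only four of the flags are declared here for brevity.

```python
import argparse
import os
import tempfile

def convert_arg_line_to_args(arg_line):
    # Split a line such as "--lr 1e-5" into separate tokens.
    return arg_line.split()

parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.convert_arg_line_to_args = convert_arg_line_to_args
parser.add_argument("--epochs", type=int)
parser.add_argument("--bs", type=int)
parser.add_argument("--lr", type=float)
parser.add_argument("--same_lr", action="store_true")

# Write a small args file with values taken from the list above.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("--epochs 5\n--bs 8\n--lr 1e-5\n--same_lr\n")
    args_path = handle.name

# The "@" prefix tells argparse to read arguments from the file.
args = parser.parse_args(["@" + args_path])
os.remove(args_path)
print(args.epochs, args.bs, args.lr, args.same_lr)
```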

@hisfog
Owner

hisfog commented Dec 5, 2023

Regarding --epochs 5 --bs 8 --lr 1e-5: I recommend --bs 16, and I think lr should be smaller, e.g. 1e-6 or 5e-6.
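
To see what these learning rates mean together with --div_factor 10 and --final_div_factor 100, the sketch below works out the schedule bounds. It assumes the fine-tuning script passes --lr as the peak rate of a one-cycle schedule such as PyTorch's OneCycleLR, where the initial rate is max_lr / div_factor and the final rate is initial_lr / final_div_factor. The thread does not confirm which scheduler is used, and `one_cycle_bounds` is an illustrative helper.

```python
def one_cycle_bounds(max_lr, div_factor, final_div_factor):
    """Return (initial_lr, min_lr) for a one-cycle schedule with peak max_lr."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    return initial_lr, min_lr

# The original setting (1e-5) and the two smaller values suggested above.
for max_lr in (1e-5, 5e-6, 1e-6):
    initial_lr, min_lr = one_cycle_bounds(max_lr, div_factor=10, final_div_factor=100)
    print(f"max_lr={max_lr:.0e}  initial_lr={initial_lr:.1e}  min_lr={min_lr:.1e}")
```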

@FangjunWang
Author

Thanks! I will try.

@Lavreniuk

Lavreniuk commented Jan 24, 2024

Hi @FangjunWang, I am very excited to reproduce the ConvNeXt results as well. However, I am currently stuck on the first stage (SSL training). I ran the training using the following command:
python train.py ./args_files/hisfog/kitti/cvnXt_L_320x1024.txt

In cvnXt_L_320x1024.txt I changed only data_path, log_dir and batch_size=8 (instead of the original 16; as I understand it, you made the same change).
In other experiments I also tried a lower lr and removing the diff_lr argument, but saw no improvement.
After that I calculated the scores using the command:
evaluate_depth_config.py args_files/hisfog/kitti/cvnXt_L_320x1024.txt
where I changed load_weights_folder to my weights path.
I tried the weights from all epochs, and the best result is:
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.096 & 0.765 & 4.455 & 0.176 & 0.908 & 0.966 & 0.983 \
which is much worse than the original and yours. Could you please help me understand what I did wrong in the first stage, so that I can fix it and then move on to stage 2 (fine-tuning)?

Should I download a pretrained PoseNet or other weights, or am I perhaps calculating the metrics the wrong way? (I checked the evaluation on the downloaded ResNet model, and it reproduces the score that @hisfog reports in the repo.)
Could you please share your parameters, as well as brief instructions on how to reproduce the score?
I would be very thankful for any help.

@FangjunWang
Author

Hello, my parameters are:
--data_path datasets/KITTI/raw/
--log_dir weights/convnext_large
--model_name cvnXt_075
--dataset kitti
--eval_split eigen
--backbone convnext_large
--height 320
--width 1024
--batch_size 8
--num_epochs 20
--scheduler_step_size 10
--model_dim 32
--patch_size 32
--dim_out 64
--query_nums 64
--dec_channels 1024 512 256 128
--min_depth 0.001
--max_depth 80.0
--diff_lr
--use_stereo
--load_weights_folder weights/ConvNeXt_Large_SQLdepth
--eval_mono
--post_process
--save_pred_disps

I did not change anything other than the parameters above. Hope this helps!
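
As a side note on --num_epochs 20 with --scheduler_step_size 10: if the training script uses a step-decay scheduler in the style of monodepth2 (StepLR with a decay factor of 0.1), the learning rate drops tenfold at epoch 10. That is an assumption, as the thread does not show the scheduler code, and the base learning rate below is hypothetical.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return base_lr * gamma ** (epoch // step_size)

BASE_LR = 1e-4  # hypothetical; the base learning rate is not stated in this thread

for epoch in (0, 9, 10, 19):
    print(f"epoch {epoch:2d}: lr = {step_lr(BASE_LR, epoch, step_size=10):.1e}")
```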

@Lavreniuk

@FangjunWang, thank you for the quick response. I have the same parameters.
Do you train using only this command:
python train.py ./args_files/hisfog/kitti/cvnXt_L_320x1024.txt
and test using this:
evaluate_depth_config.py args_files/hisfog/kitti/cvnXt_L_320x1024.txt

Also, did you do anything else, such as downloading pretrained weights or a pretrained PoseNet before training?

@FangjunWang
Author

Yes, I trained and evaluated the model using the same commands.
I only loaded the pretrained weights convnext_large_22k_1k_224.pth.

@Lavreniuk

@FangjunWang, for this you should change the parameter to:
--backbone convnext_large_in22ft1k
Did you do that, or did you manually change convnext_large to load convnext_large_22k_1k_224.pth?

@FangjunWang
Author

I changed networks/Unet.py like this:
    if backbone == "convnext_large":
        # Skip the automatic download and load a local checkpoint instead.
        pretrained = False
        backbone_kwargs = {"checkpoint_path": "weights/convnext_large_22k_1k_224_filtered.pth"}
    encoder = create_model(
        backbone, features_only=True, out_indices=backbone_indices,
        in_chans=in_channels, pretrained=pretrained, **backbone_kwargs
    )

@Lavreniuk

@FangjunWang, thanks.
Could you please email me at nick_93@ukr.net, so that I can contact you directly with further questions?

@Lavreniuk

@FangjunWang, I have tried convnext_large_22k_1k_224 as you suggested. It gives slightly better results, but the situation is similar.
For resnet50 I was able to mostly reproduce the original score:
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.084 & 0.646 & 3.972 & 0.163 & 0.923 & 0.969 & 0.983 \

For ConvNeXt, however, I see the following: the results improve for the first 6-9 epochs, and after that they stop improving and get worse and worse.
Did you get similar results, or did you see improvements in more or less every epoch?
ep1
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.091 & 0.704 & 4.197 & 0.173 & 0.916 & 0.966 & 0.982 \
ep2
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.091 & 0.675 & 4.182 & 0.168 & 0.918 & 0.968 & 0.983 \
ep3
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.088 & 0.701 & 4.279 & 0.167 & 0.923 & 0.969 & 0.983 \
ep4
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.085 & 0.625 & 4.017 & 0.166 & 0.926 & 0.968 & 0.983 \
ep5
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.084 & 0.664 & 4.079 & 0.165 & 0.928 & 0.969 & 0.983 \
ep6
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.082 & 0.647 & 4.119 & 0.167 & 0.926 & 0.967 & 0.982 \
ep7
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.087 & 0.745 & 4.389 & 0.170 & 0.921 & 0.967 & 0.982 \
ep8
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.086 & 0.707 & 4.256 & 0.169 & 0.923 & 0.967 & 0.982 \
ep9
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.088 & 0.748 & 4.397 & 0.173 & 0.920 & 0.966 & 0.981 \
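
The per-epoch rows above can be compared programmatically. The sketch below copies the nine rows into a dictionary and picks the best epoch per metric; `best_epoch` is an illustrative helper, not part of the repository. On these numbers abs_rel is lowest at epoch 6 (0.082), while rmse is lowest at epoch 4 (4.017).

```python
NAMES = ("abs_rel", "sq_rel", "rmse", "rmse_log", "a1", "a2", "a3")
LOWER_IS_BETTER = {"abs_rel", "sq_rel", "rmse", "rmse_log"}

# Rows copied from the evaluation output above, keyed by epoch.
results = {
    1: (0.091, 0.704, 4.197, 0.173, 0.916, 0.966, 0.982),
    2: (0.091, 0.675, 4.182, 0.168, 0.918, 0.968, 0.983),
    3: (0.088, 0.701, 4.279, 0.167, 0.923, 0.969, 0.983),
    4: (0.085, 0.625, 4.017, 0.166, 0.926, 0.968, 0.983),
    5: (0.084, 0.664, 4.079, 0.165, 0.928, 0.969, 0.983),
    6: (0.082, 0.647, 4.119, 0.167, 0.926, 0.967, 0.982),
    7: (0.087, 0.745, 4.389, 0.170, 0.921, 0.967, 0.982),
    8: (0.086, 0.707, 4.256, 0.169, 0.923, 0.967, 0.982),
    9: (0.088, 0.748, 4.397, 0.173, 0.920, 0.966, 0.981),
}

def best_epoch(results, metric):
    """Return the epoch with the best value of `metric` (first one on ties)."""
    idx = NAMES.index(metric)
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(results, key=lambda epoch: results[epoch][idx])

for name in NAMES:
    epoch = best_epoch(results, name)
    print(f"{name:9s} best at epoch {epoch}: {results[epoch][NAMES.index(name)]}")
```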

@jerry-ryu

@Lavreniuk
I have the same problem as you. I've tried the ImageNet-pretrained ConvNeXt model and the PoseNet provided by @hisfog (#14). Can you help me if you make any progress?

Here are my parameters:

--data_path /mnt/RG/dataset/kitti_data
--log_dir /mnt/RG/SfMNeXt-Impl/boost
--model_name cvnXt_high
--dataset kitti
--eval_split eigen
--backbone convnext_large_in22ft1k
--height 320
--width 1024
--batch_size 8
--num_epochs 20
--scheduler_step_size 10
--model_dim 32
--patch_size 32
--dim_out 64
--query_nums 64
--dec_channels 1024 512 256 128
--min_depth 0.001
--max_depth 80.0
--diff_lr
--use_stereo
--load_weights_folder /mnt/RG/SfMNeXt-Impl/boost/cvnXt_low/models/weights_0
--eval_mono
--post_process
--pretrained_pose
--pose_net_path /mnt/RG/SfMNeXt-Impl/checkpoints/pose

@Lavreniuk

Hi @jerry-ryu, I have not reproduced the results of the original repo, let alone the much better results that were mentioned. I think you should train without the pretrained PoseNet, but maybe I am wrong.
From what I found in other issues, the situation is similar for ResNet and other models: the results seem impossible to reproduce. So I have switched my interest to another model.

@hisfog
Owner

hisfog commented Mar 17, 2024

Apologies for the delayed response. To reproduce the results on KITTI, please DO NOT use the latest code release (I'm not sure what causes the issues above). Instead, please use the following version:

git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047

which is consistent with the implementation in the SQLdepth paper, without any additional modifications.
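
Before launching a long training run, it can be worth confirming that the working tree really is at this commit. A minimal sketch, assuming git is on the PATH and the script is run inside the cloned repository; `head_commit` and `matches_expected` are illustrative helpers, not part of the repository.

```python
import subprocess

EXPECTED_COMMIT = "6a1e997f97caef8de080bb2873f71cfbad9a8047"

def head_commit(repo_dir="."):
    """Return the full hash of HEAD in the git repository at repo_dir."""
    out = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir, text=True
    )
    return out.strip()

def matches_expected(commit, expected=EXPECTED_COMMIT):
    """True if `commit` is the expected hash or a prefix of it (7+ characters)."""
    return len(commit) >= 7 and expected.startswith(commit)

# Usage inside the cloned repository:
#   print(matches_expected(head_commit()))
```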

@jerry-ryu

@Lavreniuk @hisfog
Thank you for your kind response, I will try again and let you know.

@jerry-ryu

jerry-ryu commented Mar 19, 2024

@hisfog
Thank you so much, I was finally able to reproduce SQLdepth on resnet50 1024x320.

I will post my experimental results and argsfiles for those who want to train SQLdepth.

-Depth metrics:
paper: [image]

ResNet50 320x1024 trained: [image]

ConvNeXt 192x640 trained: [image]

ResNet50 320x1024

-args:

  1. Do not use the latest code release:
    git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047

  2. args file:
--data_path /mnt2/RG/data
--log_dir /mnt2/RG/SfMNeXt-Impl/sqldepth_log/
--model_name resnet_320x1024
--dataset kitti 
--eval_split eigen
--backbone resnet_lite
--height 320 
--width 1024
--batch_size 10
--num_epochs 25
--scheduler_step_size 15
--model_dim 32
--patch_size 20
--dim_out 128
--query_nums 128
--num_features 256
--num_layers 50
--min_depth 0.001
--max_depth 80.0
--use_stereo
--load_weights_folder /mnt2/RG/SfMNeXt-Impl/sqldepth_log/resnet_320x1024/models/weights_24
--eval_mono
--post_process

ConvNeXt 192x640

(Due to limited GPU capacity, 192x640 was used instead of 320x1024.)
-args:

  1. Do not use the latest code release:
    git checkout 6a1e997f97caef8de080bb2873f71cfbad9a8047

  2. args file:
--data_path /mnt2/RG/data
--log_dir /mnt2/RG/SfMNeXt-Impl/sqldepth_log/
--model_name cvnXt_192x640
--dataset kitti 
--eval_split eigen 
--backbone convnext_large
--height 192 
--width 640
--batch_size 8
--num_epochs 25
--scheduler_step_size 15
--model_dim 32
--patch_size 16
--dim_out 64
--query_nums 64
--dec_channels 1024 512 256 128
--min_depth 0.001
--max_depth 80.0
--use_stereo
--load_weights_folder /mnt2/RG/SfMNeXt-Impl/sqldepth_log/cvnXt_192x640/models/weights_24
--eval_mono
--post_process

Thank you again for your wonderful code, and congratulations on the paper acceptance!

P.S.
I don't see any significant change between the commit you pointed me to and the latest code, so if you have any idea what made the experimental results so different, I'd appreciate it if you could tell me.

@NoelShin

Thank you @hisfog and @jerry-ryu for the kind responses and for sharing the experiment settings.

Background: I was in the same situation, unable to get results similar to the numbers reported in the paper when using the latest code. Now that I know about this issue, I'm training with the suggested commit, but I'm curious what caused the difference in my results.

I checked the differences between the latest commit and 6a1e997, and the most notable difference I could find was the filename changes in splits/eigen_zhou/train_files.txt, which could affect the training. @hisfog, do you think this is the cause?
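
A comparison like this can be made concrete by diffing the two versions of the split file as sets of lines. The sketch below is illustrative: `split_diff` is a hypothetical helper and the entries are toy placeholders, not real KITTI frames. The old version of the file could be dumped with `git show 6a1e997:splits/eigen_zhou/train_files.txt` and both files read with `open(...)`.

```python
def split_diff(old_lines, new_lines):
    """Return (only_in_old, only_in_new) for two split files given as lines."""
    old = {line.strip() for line in old_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return sorted(old - new), sorted(new - old)

# Toy entries for illustration only; real files hold one training sample per line.
old_split = ["seq_a 1 l", "seq_a 2 l", "seq_b 7 r"]
new_split = ["seq_a 1 l", "seq_b 7 r", "seq_c 3 l"]

removed, added = split_diff(old_split, new_split)
print(len(removed), "entries only in the old split,", len(added), "only in the new one")
```

Comparing sets ignores ordering, so a pure reshuffle of the file shows up as no difference; any entries that do appear are samples that one version trains on and the other does not.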

@jerry-ryu

@NoelShin
I looked it up after seeing your reply, and it seems quite reasonable. Thank you for finding it!!

@XIAN-XIAN-X

Hello! I noticed that too. Do you know which paper the old split came from?

@chaoying0115

Hello, I tried to reproduce resnet50 640x192, but the results I got are very different:
[image]

This is my args_res50_kitti_192x640_train.txt:

--data_path /home/ccy/project/kitti_data/
--dataset kitti
--eval_split eigen
--height 192
--width 640
--batch_size 6
--num_epochs 25
--model_dim 64
--patch_size 16
--query_nums 120
--scheduler_step_size 15
--eval_mono
--load_weights_folder /home/Process3/tmp/mdp/res50_models/weights_19
--post_process
--min_depth 0.001
--max_depth 80.0
--ext jpg
--model_name mdp2
--log_dir /home/ccy/tmp/

This is args_res50_kitti_192x640_eval.txt:
--data_path /home/ccy/project/kitti_data/
--dataset kitti
--eval_split eigen
--height 192
--width 640
--batch_size 6
--model_dim 64
--patch_size 16
--query_nums 120
--eval_mono
--load_weights_folder /home/ccy/tmp/mdp2/models/weights_8/
--post_process
--min_depth 0.01
--max_depth 80.0
--save_pred_disps

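The dataset I used is the kitti_data prepared as for monodepth2:
[image]

I have been investigating for a long time and cannot find the cause. I look forward to your reply and guidance, thank you.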
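
When a training file and an evaluation file are maintained separately, a quick diff of their flags shows where they disagree. The sketch below uses a subset of the flags from the two files above; `parse_args_text` and `diff_args` are illustrative helpers. It reports, among other things, that --min_depth is 0.001 in the training file but 0.01 in the evaluation file. Whether any of these differences explains the gap in results is not established in this thread.

```python
def parse_args_text(text):
    """Parse lines of the form '--flag value' (or bare '--flag') into a dict."""
    args = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].startswith("--"):
            continue
        # Bare flags are stored as True; valued flags keep their value as a string.
        args[tokens[0]] = " ".join(tokens[1:]) or True
    return args

def diff_args(a, b):
    """Return {flag: (value_in_a, value_in_b)} for flags whose values differ."""
    return {
        flag: (a.get(flag), b.get(flag))
        for flag in sorted(set(a) | set(b))
        if a.get(flag) != b.get(flag)
    }

# Subsets of the two args files quoted above.
train_text = """
--height 192
--width 640
--num_epochs 25
--min_depth 0.001
--max_depth 80.0
--load_weights_folder /home/Process3/tmp/mdp/res50_models/weights_19
"""
eval_text = """
--height 192
--width 640
--min_depth 0.01
--max_depth 80.0
--load_weights_folder /home/ccy/tmp/mdp2/models/weights_8/
--save_pred_disps
"""

differences = diff_args(parse_args_text(train_text), parse_args_text(eval_text))
for flag, (train_value, eval_value) in differences.items():
    print(f"{flag}: train={train_value} eval={eval_value}")
```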
