How to training? #52

Open
xhsoldier opened this issue Jul 14, 2017 · 41 comments

@xhsoldier

Can I train using this:
PSPNet/evaluation/prototxt/pspnet101_VOC2012_473.prototxt
by changing the data layer to point at the training data?

@qizhuli

qizhuli commented Jul 14, 2017

Unfortunately, not quite, because:

  1. It doesn't have the auxiliary loss branch used in training.
  2. It doesn't come with a solver prototxt.
  3. The initialisation model is not provided. If you look closely, you'll realise the ResNet-101 backbone in the released prototxt is different from the original ResNet, so you can't initialise it with the weights released by Kaiming He et al. Either change the backbone to match the original ResNet structure, or train your own initialisation weights.
  4. Most importantly, the model was trained with a BN layer that syncs across GPUs, whereas the BN layer in this repo doesn't do that, I believe. Apparently, training the batch-norm parameters is key to achieving the reported IoU. You could try replacing all the BN layers with SyncBN in this repo (see the explanation of their relationship in this issue), train on multiple GPUs, and see what happens. It would be great if you let us know what you get; I am really curious about that.

Hope this answers your question.

@landiaokafeiyan

Hi, I used train.prototxt and added the data layer and loss layer below. When I set crop=347, the error is cudaSuccess out of memory; when I set crop=147, the error is "number of labels must match number of predictions" (see the note after the prototxt). Could you help me with that? BTW, I am also unsure how to make sure the source image size matches the label image size.

name: "pspnet101_VOC2012"

layer {
name: "data"
type: "ImageSegData"
top: "data"
top: "label"
#top: "data_dim"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 473
mean_value: 104.008
mean_value: 116.669
mean_value: 122.675
scale_factors: 0.5
scale_factors: 0.75
scale_factors: 1
scale_factors: 1.25
scale_factors: 1.5
scale_factors: 1.75
scale_factors: 2.0
}
image_data_param {
root_folder: "/pspnet/"
source: "
/pascal_voc_train_aug.txt"
batch_size: 2
shuffle: true
label_type: PIXEL
}
}

layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "conv6_interp"
bottom: "label"
top: "loss"
include {
phase: TRAIN
}
loss_param {
ignore_label: 255
}
}
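A likely explanation for the "number of labels must match number of predictions" error, assuming the released 473 prototxt (stride-8 backbone, predictions upsampled by Interp layers with zoom_factor: 8): the prediction only matches the label crop when crop_size = 8n + 1, since a crop of size s gives a feature map of ⌈s/8⌉ and a prediction of (⌈s/8⌉ − 1)·8 + 1.

473 = 8·59 + 1 → prediction (60 − 1)·8 + 1 = 473 ✓
347 = 8·43 + 3 → prediction (44 − 1)·8 + 1 = 345 ≠ 347 ✗
147 = 8·18 + 3 → prediction (19 − 1)·8 + 1 = 145 ≠ 147 ✗

On this reading, the nearest crop sizes that would avoid the mismatch are 345 and 145, and the out-of-memory at 347 is a separate issue: the network is simply very large.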

@xhsoldier
Author

@landiaokafeiyan Does your training prototxt work? How about the solver prototxt?

@landiaokafeiyan

The problem I described above is still unsolved, so I am not sure whether it works.

@landiaokafeiyan

@xhsoldier You can add me on QQ and we can discuss it: 307821808

@xhsoldier
Author

After reading the paper, I found that the loss function is two softmax losses added together, with the auxiliary one weighted by 0.4.
Taking pspnet101_VOC2012_473.prototxt as an example, I guess the losses should be attached to conv6 (not conv6_interp) and to conv4_23, L = L(conv6) + 0.4 · L(conv4_23), each a softmax loss against the label. But training will be difficult; it will need multiple GPUs because the batch size is large.
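A minimal prototxt sketch of that layout (the auxiliary branch is not in the released files, so the 1×1 convolution, the Interp back to label resolution, and all layer names here are assumptions):

layer {
  name: "conv_aux"
  type: "Convolution"
  bottom: "conv4_23"
  top: "conv_aux"
  convolution_param {
    num_output: 21   # number of classes
    kernel_size: 1
  }
}
layer {
  name: "conv_aux_interp"
  type: "Interp"
  bottom: "conv_aux"
  top: "conv_aux_interp"
  interp_param {
    zoom_factor: 8   # stride-8 features back to the 473x473 crop
  }
}
layer {
  name: "loss_aux"
  type: "SoftmaxWithLoss"
  bottom: "conv_aux_interp"
  bottom: "label"
  top: "loss_aux"
  loss_weight: 0.4   # auxiliary weight from the paper
  loss_param {
    ignore_label: 255
  }
}

The main loss stays on the conv6 path with weight 1, so the total is L = L(conv6) + 0.4 · L(conv4_23 branch).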

@yytzjgsu

@xhsoldier Maybe you could set the "iter_size: n" parameter in solver.prototxt so that your effective batch size becomes n × batch_size, avoiding the insufficient-GPU-memory issue when training on a single GPU; see the fragment below.
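For example, a solver.prototxt fragment along these lines (the numbers are illustrative):

# Gradients are accumulated over iter_size forward/backward passes, so the
# effective batch size is iter_size * batch_size (batch_size from the data layer).
iter_size: 8   # with batch_size: 2, effective batch size = 16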

@bhadresh74

@xhsoldier
Would you mind providing train_val.prototxt?
Or any script you used while training and testing?

@xhsoldier
Author

Here are the training protos; I can train using them: https://github.com/SoonminHwang/caffe-segmentation/tree/master/pspnet/models

@bhadresh74

@xhsoldier
So I took one of your training models (pspnet101_VOC2012_train.prototxt) and started training on my dataset, whose images are 473×473 with 3 classes in total.
I am getting these weird numbers while training.

I1010 16:55:58.493882 142934 solver.cpp:229] Iteration 0, loss = 92.0984
I1010 16:55:58.493952 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:55:58.493966 142934 solver.cpp:245]     Train net output #1: loss_aux = 11.9047 (* 0.4 = 4.7619 loss)
I1010 16:55:58.494001 142934 sgd_solver.cpp:106] Iteration 0, lr = 0.0001
I1010 16:56:19.944349 142934 solver.cpp:229] Iteration 20, loss = 122.271
I1010 16:56:19.944444 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:56:19.944473 142934 solver.cpp:245]     Train net output #1: loss_aux = 87.3365 (* 0.4 = 34.9346 loss)
I1010 16:56:19.944577 142934 sgd_solver.cpp:106] Iteration 20, lr = 0.0001
I1010 16:56:41.425942 142934 solver.cpp:229] Iteration 40, loss = 122.271
I1010 16:56:41.426399 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:56:41.426414 142934 solver.cpp:245]     Train net output #1: loss_aux = 87.3365 (* 0.4 = 34.9346 loss)
I1010 16:56:41.426425 142934 sgd_solver.cpp:106] Iteration 40, lr = 0.0001
I1010 16:57:02.937829 142934 solver.cpp:229] Iteration 60, loss = 122.271
I1010 16:57:02.937969 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:57:02.937994 142934 solver.cpp:245]     Train net output #1: loss_aux = 87.3365 (* 0.4 = 34.9346 loss)
I1010 16:57:02.938014 142934 sgd_solver.cpp:106] Iteration 60, lr = 0.0001
I1010 16:57:24.397997 142934 solver.cpp:229] Iteration 80, loss = 122.271
I1010 16:57:24.398241 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:57:24.398254 142934 solver.cpp:245]     Train net output #1: loss_aux = 87.3365 (* 0.4 = 34.9346 loss)
I1010 16:57:24.398267 142934 sgd_solver.cpp:106] Iteration 80, lr = 0.0001
I1010 16:57:45.878556 142934 solver.cpp:229] Iteration 100, loss = 122.271
I1010 16:57:45.878654 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:57:45.878666 142934 solver.cpp:245]     Train net output #1: loss_aux = 87.3365 (* 0.4 = 34.9346 loss)
I1010 16:57:45.878677 142934 sgd_solver.cpp:106] Iteration 100, lr = 0.0001
I1010 16:58:07.367333 142934 solver.cpp:229] Iteration 120, loss = 122.271
I1010 16:58:07.367502 142934 solver.cpp:245]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1010 16:58:07.367516 142934 solver.cpp:245]     Train net output #1: loss_aux = 87.3365 (* 0.4 = 34.9346 loss)
I1010 16:58:07.367529 142934 sgd_solver.cpp:106] Iteration 120, lr = 0.0001

The weird thing is that no matter what learning rate I set, I always end up with the same loss value, 87.3365. See #63.

With your given prototxt, I have set these hyperparameters:

net: "<path>"  		# Change this to the absolute path to your model file
base_lr: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 15000
display: 20
momentum: 0.9
max_iter: 45000
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "<path>"  	# Change this to the absolute path to where you wish to output solver snapshots
solver_mode: GPU

Let me know if you find something wrong here.
Any advice would be appreciated.

@qizhuli Hi buddy,
Just keeping you in the loop: I took your advice and changed the image size to a square 473 × 473, as suggested by the paper and by you.
I still see the same weird loss value.

Thanks in advance.

@xhsoldier
Author

You need an initial model; just fine-tune and you will get the right result. Do not train from scratch.
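For reference, fine-tuning in Caffe just means handing the pretrained weights to the solver; a sketch with illustrative paths:

caffe train --solver=solver.prototxt --weights=pspnet101_VOC2012.caffemodel --gpu=0

One more thing worth checking against the solver above: the paper trains with the "poly" learning-rate policy (power 0.9) rather than "step".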

@bhadresh74

@xhsoldier It worked. Thank you for showing me the right direction. I am pretty much following your caffe-segmentation code. It's very well organized and easy to understand. Great work.

One last question,

I1012 11:21:22.764320 103737 solver.cpp:229] Iteration 720, loss = 0.00591884
I1012 11:21:22.764439 103737 solver.cpp:245]     Train net output #0: accuracy = 0.998973
I1012 11:21:22.764449 103737 solver.cpp:245]     Train net output #1: loss = 0.00591873 (* 1 = 0.00591873 loss)
I1012 11:21:22.764456 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:21:22.764459 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:21:22.764463 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:21:23.167104 103737 sgd_solver.cpp:106] Iteration 720, lr = 0.0001
I1012 11:22:05.639000 103737 solver.cpp:229] Iteration 740, loss = 0.0227261
I1012 11:22:05.639124 103737 solver.cpp:245]     Train net output #0: accuracy = 0.996832
I1012 11:22:05.639134 103737 solver.cpp:245]     Train net output #1: loss = 0.022726 (* 1 = 0.022726 loss)
I1012 11:22:05.639139 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:22:05.639143 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:22:05.639147 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:22:06.039103 103737 sgd_solver.cpp:106] Iteration 740, lr = 0.0001
I1012 11:22:48.640102 103737 solver.cpp:229] Iteration 760, loss = 0.0143614
I1012 11:22:48.640224 103737 solver.cpp:245]     Train net output #0: accuracy = 0.997564
I1012 11:22:48.640234 103737 solver.cpp:245]     Train net output #1: loss = 0.0143613 (* 1 = 0.0143613 loss)
I1012 11:22:48.640239 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:22:48.640244 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:22:48.640247 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:22:49.014072 103737 sgd_solver.cpp:106] Iteration 760, lr = 0.0001
I1012 11:23:31.601939 103737 solver.cpp:229] Iteration 780, loss = 0.0409522
I1012 11:23:31.602075 103737 solver.cpp:245]     Train net output #0: accuracy = 0.991696
I1012 11:23:31.602087 103737 solver.cpp:245]     Train net output #1: loss = 0.0409521 (* 1 = 0.0409521 loss)
I1012 11:23:31.602092 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:23:31.602095 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:23:31.602099 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:23:31.926669 103737 sgd_solver.cpp:106] Iteration 780, lr = 0.0001
I1012 11:24:14.647243 103737 solver.cpp:229] Iteration 800, loss = 0.0438639
I1012 11:24:14.647406 103737 solver.cpp:245]     Train net output #0: accuracy = 0.989936
I1012 11:24:14.647439 103737 solver.cpp:245]     Train net output #1: loss = 0.0438638 (* 1 = 0.0438638 loss)
I1012 11:24:14.647457 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:24:14.647465 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:24:14.647476 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:24:15.036211 103737 sgd_solver.cpp:106] Iteration 800, lr = 0.0001
I1012 11:24:57.684042 103737 solver.cpp:229] Iteration 820, loss = 0.0286792
I1012 11:24:57.684188 103737 solver.cpp:245]     Train net output #0: accuracy = 0.995645
I1012 11:24:57.684209 103737 solver.cpp:245]     Train net output #1: loss = 0.0286791 (* 1 = 0.0286791 loss)
I1012 11:24:57.684213 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:24:57.684217 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:24:57.684221 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:24:58.083516 103737 sgd_solver.cpp:106] Iteration 820, lr = 0.0001
I1012 11:25:40.446902 103737 solver.cpp:229] Iteration 840, loss = 0.0117887
I1012 11:25:40.447089 103737 solver.cpp:245]     Train net output #0: accuracy = 0.998924
I1012 11:25:40.447103 103737 solver.cpp:245]     Train net output #1: loss = 0.0117886 (* 1 = 0.0117886 loss)
I1012 11:25:40.447121 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:25:40.447127 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:25:40.447139 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1012 11:25:40.865295 103737 sgd_solver.cpp:106] Iteration 840, lr = 0.0001
I1012 11:26:23.230521 103737 solver.cpp:229] Iteration 860, loss = 0.0343943
I1012 11:26:23.230664 103737 solver.cpp:245]     Train net output #0: accuracy = 0.993892
I1012 11:26:23.230684 103737 solver.cpp:245]     Train net output #1: loss = 0.0343942 (* 1 = 0.0343942 loss)
I1012 11:26:23.230690 103737 solver.cpp:245]     Train net output #2: per_class_accuracy = 1
I1012 11:26:23.230703 103737 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1012 11:26:23.230707 103737 solver.cpp:245]     Train net output #4: per_class_accuracy = 0

The loss fluctuates a little, but the per-class accuracy is pretty much the same for every iteration.
When I worked with SegNet/FCN, I used class weighting, which takes the class distribution into account when computing the loss.
Here it throws an error saying that "caffe.LossParameter" has no field named "class_weighting",
which makes complete sense, as it is not implemented in PSPNet's Caffe code.

So the question is: how can I get stable per-class accuracy in my training?
My class 1 is the dominating class (90% of pixels belong to it), and the other classes (2 and 3) are non-dominating (10% of pixels belong to either of them).
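Since class_weighting is not available in this Caffe fork, one thing you can at least do is quantify the imbalance and derive candidate weights. A minimal sketch, assuming single-channel PNG labels with values 0-2 in a hypothetical labels/ folder, using median-frequency balancing in the style of Eigen & Fergus:

import glob

import numpy as np
from PIL import Image

NUM_CLASSES = 3
counts = np.zeros(NUM_CLASSES, dtype=np.int64)
for path in glob.glob("labels/*.png"):        # hypothetical label folder
    arr = np.asarray(Image.open(path))
    for c in range(NUM_CLASSES):
        counts[c] += int(np.sum(arr == c))    # pixel count per class

freq = counts / counts.sum()
weights = np.median(freq) / freq              # median-frequency balancing
print("pixel frequency per class:", freq)
print("median-frequency weights:", weights)

These weights would only become usable if you port a class-weighted softmax loss into this fork, but they at least tell you exactly how skewed the data is.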

@ThienAnh

@landiaokafeiyan Can you share your training script with me?

@Dasona

Dasona commented Oct 17, 2017

@bhadresh74 What init weight did you use in training?

@landiaokafeiyan

@ThienAnh I am on holiday now; maybe I will send it to you next Monday. I just changed the input of the test prototxt.

@bhadresh74

@Dasona I have used VOC 2012 pre-trained model as my initial weights.

@ThienAnh

@landiaokafeiyan Thank you so much. Enjoy your holiday!

@ThienAnh

@landiaokafeiyan Are you back from your holiday?

@landiaokafeiyan

@ThienAnh Hi, can you give me your email so I can send you all the materials?

@ThienAnh

@landiaokafeiyan My Email: nguyenthienanh@gmail.com
Thank you so much.

@bhadresh74

@landiaokafeiyan Can you forward me as well?
It would be useful to me for some further research.
Big thanks.

@ThienAnh

Hi @landiaokafeiyan.
I managed to fix the constant Train net output #0: loss = 87.3365 (by changing the lr), but per_class_accuracy is almost always 0. Is this an error? (I set crop_size = 137.)

Train net output #0: accuracy = 0.768128
I1025 10:18:21.159145 3436 solver.cpp:245] Train net output #1: loss = 0.771035 (* 1 = 0.771035 loss)
I1025 10:18:21.159159 3436 solver.cpp:245] Train net output #2: loss_aux = 2.8734 (* 0.4 = 1.14936 loss)
I1025 10:18:21.159171 3436 solver.cpp:245] Train net output #3: per_class_accuracy = 0.916532
I1025 10:18:21.159183 3436 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1025 10:18:21.159193 3436 solver.cpp:245] Train net output #5: per_class_accuracy = 0
I1025 10:18:21.159202 3436 solver.cpp:245] Train net output #6: per_class_accuracy = 0
I1025 10:18:21.159212 3436 solver.cpp:245] Train net output #7: per_class_accuracy = 0
I1025 10:18:21.159222 3436 solver.cpp:245] Train net output #8: per_class_accuracy = 0
I1025 10:18:21.159232 3436 solver.cpp:245] Train net output #9: per_class_accuracy = 0
I1025 10:18:21.159242 3436 solver.cpp:245] Train net output #10: per_class_accuracy = 0
I1025 10:18:21.159252 3436 solver.cpp:245] Train net output #11: per_class_accuracy = 0
I1025 10:18:21.159262 3436 solver.cpp:245] Train net output #12: per_class_accuracy = 0
I1025 10:18:21.159272 3436 solver.cpp:245] Train net output #13: per_class_accuracy = 0
I1025 10:18:21.159282 3436 solver.cpp:245] Train net output #14: per_class_accuracy = 0
I1025 10:18:21.159292 3436 solver.cpp:245] Train net output #15: per_class_accuracy = 0
I1025 10:18:21.159302 3436 solver.cpp:245] Train net output #16: per_class_accuracy = 0
I1025 10:18:21.159312 3436 solver.cpp:245] Train net output #17: per_class_accuracy = 0
I1025 10:18:21.159322 3436 solver.cpp:245] Train net output #18: per_class_accuracy = 0
I1025 10:18:21.159332 3436 solver.cpp:245] Train net output #19: per_class_accuracy = 0.551905
I1025 10:18:21.159343 3436 solver.cpp:245] Train net output #20: per_class_accuracy = 0
I1025 10:18:21.159353 3436 solver.cpp:245] Train net output #21: per_class_accuracy = 0
I1025 10:18:21.159363 3436 solver.cpp:245] Train net output #22: per_class_accuracy = 0
I1025 10:18:21.159373 3436 solver.cpp:245] Train net output #23: per_class_accuracy = 0

@landiaokafeiyan

@ThienAnh Hi,

Maybe you need to reduce your learning rate.

Regards,

Liangyan Li

@ThienAnh

@landiaokafeiyan
Thanks for the quick reply.
I will try changing the learning rate.

@landiaokafeiyan

@ThienAnh
You are welcome!

liangyan

@ThienAnh

ThienAnh commented Nov 1, 2017

Hi @xhsoldier, @landiaokafeiyan. Why do we need an initial model? Can I use pspnet101_VOC2012.caffemodel as the init model and train with my data? (I have 21 labels: ring, clothes, box, cap, hat... It is different from VOC2012.)

Thanks so much

@xhsoldier
Author

@ThienAnh You can use pspnet101_VOC2012.caffemodel for fine-tuning or training.
Why use an initial model? You cannot train this model without initial weights; otherwise it will not converge.

@ThienAnh

ThienAnh commented Nov 1, 2017

@xhsoldier Thank you so much!

@fbi0817

fbi0817 commented Nov 8, 2017

@xhsoldier @ThienAnh @landiaokafeiyan Hi, I am trying to train the network using my own annotated dataset. After reading the prototxt above, I am wondering what goes in your pascal_voc_train_aug.txt file and where to put the images and segmentation annotations. I know that in PASCAL VOC the images are in the JPEGImages folder and the segmentation annotations are in the SegmentationClass folder. So, how should I organize my own dataset, and what is in the pascal_voc_train_aug.txt file? Thank you!!!

@ThienAnh

ThienAnh commented Nov 8, 2017

@fbi0817 You can search for and download SegmentationClassAug to see the format of the label images.
In the .prototxt file, edit the data paths here:

image_data_param {
  root_folder: "/home/adminpxz/VOCdevkit/VOC2012"
  source: "/home/adminpxz/PSPNet/PSPNet/splits/pascal_voc_train_aug.txt"
  batch_size: 1
  shuffle: true
  label_type: PIXEL
}

Here, source: "/home/adminpxz/PSPNet/PSPNet/splits/pascal_voc_train_aug.txt" is the list of training images (.jpg) and label images (.png); example lines are shown below. When training runs, the program reads each image from the full path root_folder + image path (taken from the pascal_voc_train_aug.txt file).

The label images are 8-bit PNGs. If you open an image from SegmentationClassAug in MS Paint, you can see that labelled object areas have values from 1-20 (20 classes).
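For reference, each line of that list file pairs an image path with a label path, both relative to root_folder (the file names here are illustrative):

/JPEGImages/2007_000032.jpg /SegmentationClassAug/2007_000032.png
/JPEGImages/2007_000039.jpg /SegmentationClassAug/2007_000039.png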

@fbi0817

fbi0817 commented Nov 8, 2017

@ThienAnh Thank you for your quick reply. I will have a try.

@mkarki2

mkarki2 commented Jan 25, 2018

@bhadresh74 @ThienAnh @xhsoldier

Using the training prototxt file from https://github.com/SoonminHwang/caffe-segmentation/tree/master/pspnet/models
resulted in the following error. When I did some research, I found that it can happen when there is an output mismatch (?). I am trying it on PASCAL VOC 2012 with the train model, without any other changes. Any ideas why? Would anyone mind sharing a train.prototxt file I could try?

Thanks,

F0125 16:54:05.818322 25 math_functions.cu:121] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
@ 0x7f03781125cd google::LogMessage::Fail()
@ 0x7f0378114433 google::LogMessage::SendToLog()
@ 0x7f037811215b google::LogMessage::Flush()
@ 0x7f0378114e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f03789343fa caffe::caffe_gpu_asum<>()
@ 0x7f037894d4b0 caffe::SoftmaxWithLossLayer<>::Forward_gpu()
@ 0x7f03787da562 caffe::Net<>::ForwardFromTo()
@ 0x7f03787da687 caffe::Net<>::ForwardPrefilled()
@ 0x7f0378743115 caffe::Solver<>::Step()
@ 0x7f0378743b89 caffe::Solver<>::Solve()
@ 0x40b217 train()
@ 0x407380 main
@ 0x7f03773a0830 __libc_start_main
@ 0x407a39 _start
@ (nil) (unknown)

@ThienAnh

@mkarki2 This is the training prototxt file I'm using:
train.zip

@mkarki2

mkarki2 commented Jan 26, 2018

@ThienAnh
I am still getting that issue with your prototxt file. Maybe there is something else going on for me. Thank you though.

EDIT: I tried removing label_type: PIXEL and at least that error is gone. Now I need to figure out whether the training is correct.
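For what it's worth, a CUBLAS_STATUS_MAPPING_ERROR inside SoftmaxWithLossLayer::Forward_gpu is often (not always) a symptom of label values outside [0, num_classes − 1] plus the ignore_label, e.g. palette PNGs decoded as RGB. A quick check sketch, assuming single-channel PNG labels, 21 classes, and ignore_label 255:

import glob

import numpy as np
from PIL import Image

for path in glob.glob("SegmentationClassAug/*.png"):  # hypothetical label folder
    values = np.unique(np.asarray(Image.open(path)))
    bad = values[(values >= 21) & (values != 255)]    # 21 classes, 255 = ignore
    if bad.size:
        print(path, "has out-of-range labels:", bad)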

@muralabmahmuds

@xhsoldier @ThienAnh @landiaokafeiyan
Thank you for the ideas and the shared files!
I am just a beginner in the topic of semantic segmentation.
Anyway, has anyone evaluated and trained PSPNet on Cityscapes? I need to see the instructions.
Please kindly share your experience; I am stuck on this.
Thank you very much.

@MashaFomina

@ThienAnh
Hello! Did you test on the validation data after training? Can I get some information about your results? Accuracy?

@holyseven

I've implemented synchronized batch normalization in pure TensorFlow, which makes it possible to train and reproduce the performance of PSPNet: https://github.com/holyseven/PSPNet-TF-Reproduce. Have a look at the code if you are interested.

@MashaFomina

@holyseven Thank you for your code!

@engrjavediqbal

@ThienAnh Thank you for sharing your train.prototxt file. I tried your settings but was unable to fine-tune the pre-trained PSPNet. Have you had success with these settings?

@oandrienko

oandrienko commented Sep 25, 2018

Since there are quite a few people on this thread, I thought I would mention that I have also been able to train PSPNet50 (the 50-layer PSPNet) in TensorFlow from ResNet-50 weights. Instead of implementing batch-norm synchronization across GPUs, I was able to fit the entire model on a single Titan Xp using gradient checkpointing. I trained with a batch size of 8 and got the same accuracy as described in the ICNet paper. The repository for the project can be found here: https://github.com/oandrienko/fast-semantic-segmentation/.

You can find a link to the trained PSPNet model here. There is also a usage guide for training PSPNet50 here.

@CamlinZ

CamlinZ commented Sep 29, 2018

@ThienAnh Hi, I have also been trying to train PSPNet lately, and everything seemed to go fine, but when I use eval_all.m to test my caffemodel, the whole predicted image is wrong. I just want to ask: is your predicted image fine? I want to figure out which step I got wrong. Thank you very much!!!
