overfit when using DCFNet? #3

Closed
he010103 opened this issue May 31, 2017 · 9 comments

he010103 commented May 31, 2017

I use this net to train:
elseif networkType == 31
%% target
conv1 = dagnn.Conv('size', [5 5 3 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1', conv1, {'target'}, {'conv1'}, {'conv1f', 'conv1b'}) ;
net.addLayer('relu1', dagnn.ReLU(), {'conv1'}, {'conv1r'});

conv2 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv2', conv2, {'conv1r'}, {'conv2'}, {'conv2f', 'conv2b'}) ;
net.addLayer('relu2', dagnn.ReLU(), {'conv2'}, {'conv2r'});

conv3 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv3', conv3, {'conv2r'}, {'conv3'}, {'conv3f', 'conv3b'}) ;

conv4 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv4', conv4, {'conv3'}, {'conv4'}, {'conv4f', 'conv4b'}) ;

conv5 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv5', conv5, {'conv4'}, {'conv5'}, {'conv5f', 'conv5b'}) ;

net.addLayer('conv5_dropout' ,dagnn.DropOut('rate', 0.2),{'conv5'},{'x'});

%% search (shares filter/bias parameters with the target branch)
conv1s = dagnn.Conv('size', [5 5 3 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1s', conv1s, {'search'}, {'conv1s'}, {'conv1f', 'conv1b'}) ;
net.addLayer('relu1s', dagnn.ReLU(), {'conv1s'}, {'conv1sr'});

conv2s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv2s', conv2s, {'conv1sr'}, {'conv2s'}, {'conv2f', 'conv2b'}) ;
net.addLayer('relu2s', dagnn.ReLU(), {'conv2s'}, {'conv2sr'});

conv3s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv3s', conv3s, {'conv2sr'}, {'conv3s'}, {'conv3f', 'conv3b'}) ;

conv4s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv4s', conv4s, {'conv3s'}, {'conv4s'}, {'conv4f', 'conv4b'}) ;

conv5s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv5s', conv5s, {'conv4s'}, {'conv5s'}, {'conv5f', 'conv5b'}) ;

net.addLayer('conv5s_dropout' ,dagnn.DropOut('rate', 0.2),{'conv5s'},{'z'});
window_sz = [125,125];    

end

Question: the CLE and the objective (both train and val) decrease during training, but when I test the saved models I am surprised to find that the tracker fails to track the target as the epoch increases. With an earlier model from a lower epoch, the tracker tracks successfully, so I suspect the model is overfitting. To check, I tested on one video from the training set; if the model were overfitting, the tracker should perform excellently on the training set. However, even there the tracker fails to track the target at an early stage. Is there a bug in the DCF layer?
Another question is about getlmdbRAM.m: imdb.images.set(randperm(num_all_frame,100)) = int8(2); The program randomly selects 100 frames from the whole set of frame sequences to form the validation set, but some validation frames may be adjacent to training frames. Wouldn't it be better to build the validation set from videos that do not appear in the training set?
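For illustration, here is a minimal sketch of such a video-level split. It assumes the imdb structure records a per-frame video index (imdb.images.video is a hypothetical field name; the actual fields in the imdb-building script may differ):

% Hedged sketch: hold out whole videos for validation instead of random frames.
video_ids  = unique(imdb.images.video);                  % assumed per-frame video index
val_videos = video_ids(randperm(numel(video_ids), 5));   % hold out 5 whole videos
is_val     = ismember(imdb.images.video, val_videos);
imdb.images.set(is_val)  = int8(2);                      % validation frames
imdb.images.set(~is_val) = int8(1);                      % training frames

This keeps every validation frame temporally disjoint from the training frames.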


foolwood commented May 31, 2017

Thank you for your attention.
You can see the config file in DCFNet/training/cnn_dcf_init.m. I tested 25 networks and found the same issue as you, so I designed a control group: net 11-15 vs. net 16-20. The underlying reason is that when you add padding to the network, there is a risk that it learns a bias toward the centre rather than overfitting to the training set: the network simply produces a Gaussian response no matter what the samples look like.

The solution is to remove the padding or to use real crop pairs (I use a centre crop for both the training crop and the detection crop; using real crops hurts training speed).
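For example, dropping the padding in the first conv layer of the network above would look like this (an illustrative sketch only; the remaining layers change in the same way, and the spatial size of the feature maps shrinks accordingly):

% Same layer as in the issue, but with 'pad', 0, so no zero border is added and
% the network cannot exploit the padding to predict a centred Gaussian response.
conv1 = dagnn.Conv('size', [5 5 3 32], 'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1', conv1, {'target'}, {'conv1'}, {'conv1f', 'conv1b'}) ;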

foolwood closed this as completed Jun 1, 2017

he010103 commented Jun 2, 2017

I changed the network structure:
%% target
conv1 = dagnn.Conv('size', [7 7 3 96],'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1', conv1, {'target'}, {'conv1'}, {'conv1f', 'conv1b'}) ;
net.addLayer('relu1', dagnn.ReLU(), {'conv1'}, {'conv1x'});

conv2 = dagnn.Conv('size', [5 5 96 256],'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv2', conv2, {'conv1x'}, {'conv2'}, {'conv2f', 'conv2b'}) ;
net.addLayer('norm1', dagnn.LRN('param',[5 1 0.0001/5 0.75]), {'conv2'}, {'conv2x'});
net.addLayer('conv2_dropout' ,dagnn.DropOut('rate', 0.0),{'conv2x'},{'conv2x_d'});

conv3 = dagnn.Conv('size', [3 3 256 512],'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv3', conv3, {'conv2x_d'}, {'conv3'}, {'conv3f', 'conv3b'}) ;
net.addLayer('norm2', dagnn.LRN('param',[5 1 0.0001/5 0.75]), {'conv3'}, {'conv3n'});
net.addLayer('conv3_dropout' ,dagnn.DropOut('rate', 0.2),{'conv3n'},{'x'});

%% search (shares filter/bias parameters with the target branch)
conv1s = dagnn.Conv('size', [7 7 3 96],'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1s', conv1s, {'search'}, {'conv1s'}, {'conv1f', 'conv1b'}) ;
net.addLayer('relu1s', dagnn.ReLU(), {'conv1s'}, {'conv1sx'});

conv2s = dagnn.Conv('size', [5 5 96 256],'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv2s', conv2s, {'conv1sx'}, {'conv2s'}, {'conv2f', 'conv2b'}) ;
net.addLayer('norm1s', dagnn.LRN('param',[5 1 0.0001/5 0.75]), {'conv2s'}, {'conv2sx'});
net.addLayer('conv2s_dropout' ,dagnn.DropOut('rate', 0.0),{'conv2sx'},{'conv2sx_d'});

conv3s = dagnn.Conv('size', [3 3 256 512],'pad', 0, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv3s', conv3s, {'conv2sx_d'}, {'conv3s'}, {'conv3f', 'conv3b'}) ;
net.addLayer('norm2s', dagnn.LRN('param',[5 1 0.0001/5 0.75]), {'conv3s'}, {'conv3sn'});
net.addLayer('conv3s_dropout' ,dagnn.DropOut('rate', 0.2),{'conv3sn'},{'z'});
window_sz = [113,113];

Because window_sz is 113, I choose target_sz = 113/(padding+1) = 45.2:
target_sz = [45.2,45.2];
sigma = sqrt(prod(target_sz))/10;
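(For reference, those values imply padding = 1.5; a quick check of the derived numbers:)

% Quick check of the derived parameters, assuming padding = 1.5 as implied by
% 113/(padding+1) = 45.2.
window_sz = [113, 113];
target_sz = window_sz / (1.5 + 1);        % = [45.2, 45.2]
sigma     = sqrt(prod(target_sz)) / 10;   % = 4.52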
The network continues training without overfitting.
I carefully read the training and testing code and have some questions:
1. In the getlmdbRAM stage, all samples are centre crops with a constant padding. If we used random crops, recorded the target position, and varied the padding, the samples would be more varied, so the CNN could learn more robust features?
2. Is it possible to add dropout to the network?
3. I tested the demo from the GitHub code and found that the speed is lower than 5 fps, but the paper states that the speed can be 60 fps?


foolwood commented Jun 2, 2017

1. I've tried random cropping like GOTURN, but it's really slow in a MATLAB implementation; you can look at the GitHub history [79d12e8] for reference.
I don't think it would boost performance, though: since KCF operates by circular correlation, the positive/negative sampling won't bias the net toward a particular position.

2. Maybe you can test different network configs.
I only have 2 GPUs, so I can't run large-scale experiments. The contribution of this work is the end-to-end learning (the DCF layer). You can plug in any network as the feature extractor, much as detectors swap backbones (VGG + Faster R-CNN, ResNet + Faster R-CNN, ..., XXNet + Faster R-CNN).

3. You should use my network config and run on a GPU device without imshow (moving the image from GPU to CPU and converting it to uint8 is very slow).
If you time the demo, you will find it includes a lot of unnecessary overhead such as vl_setupnn(), vl_imread (which can be sped up considerably on multicore machines), and a slow response-map implementation. I will update this within the week.
Also, your network has 512 feature maps, which is really slow for the FFT: about 10x slower than 32 channels.
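As a rough illustration of that channel cost (a hedged sketch; the 109x109 spatial size is made up, and in MATLAB fft2 applies the 2-D FFT to each channel of an N-D array):

% Compare the 2-D FFT cost of a 32-channel vs. a 512-channel feature map.
x32  = randn(109, 109, 32,  'single');
x512 = randn(109, 109, 512, 'single');
tic; for i = 1:100, f = fft2(x32);  end; t32  = toc;
tic; for i = 1:100, f = fft2(x512); end; t512 = toc;
fprintf('32 ch: %.3f s, 512 ch: %.3f s (%.1fx slower)\n', t32, t512, t512/t32);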


foolwood commented Jun 4, 2017

@he010103

Hi,
I found the reason for the slow speed: the default parameters are for the CPU setting.

In DCFNet/DCFNet/run_DCFNet.m line 26

state.gpu = false;

Usually I pass appropriate parameters to this function, so I did not notice the problem.
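If you are running the demo directly, switching that flag over should recover the speed; a minimal sketch (assuming your MatConvNet build has GPU support compiled in, and that run_DCFNet.m honours this flag as on line 26 above):

state.gpu = true;   % use the GPU code path instead of the CPU default
gpuDevice(1);       % select/initialise the first GPU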

In addition, in order to fit the VOT challenge, the program was made a little complex. The simplest version can run at far more than 60 fps. Unfortunately, our institute has a blackout tonight, so I will upload the simple version (currently on the server) to GitHub tomorrow morning.

Thank you for your attention.


foolwood commented Jun 5, 2017

@he010103
You can try the new version; it can run at 100 FPS, and this is not even the simplest version.
(It seems nobody is paying attention to this work, though...)


he010103 commented Jun 7, 2017

Great work, thanks for your contribution. I will try this version later. I have tried running GOTURN with Caffe; with a simplified CaffeNet, GOTURN can run faster than 100 fps on a CPU, so I believe DCFNet can run faster than that. I think MatConvNet is not so lightweight because it spends a lot of time loading the network (usually > 100 s).


foolwood commented Jun 7, 2017

@he010103
LOL. Is your GPU a GTX 10xx or a GTX Titan X (Pascal)?
I have encountered similar problems: the command gpuDevice(1) takes minutes in MATLAB when using a Pascal GPU.

Here is the solution.

[screenshot: solution2]


he010103 commented Jun 7, 2017

My GPU device is a GTX 1080. I also tried the code on a K40 and hit the same problem. So do you use 'nvidia-smi -i 0 -pm ENABLED' and 'export CUDA_CACHE_MAXSIZE=4200000000' to solve it? @foolwood

@foolwood

@he010103
^_^
Problem solved at SiamRPN++.
