
UCF101 Training from Scratch #11

Closed
ardasnck opened this issue Dec 9, 2016 · 12 comments

@ardasnck commented Dec 9, 2016

Thank you very much for your work on C3D.

Would it be possible to provide some information about training on UCF101 from scratch instead of fine-tuning? It would be very helpful to have a graph, or at least some numbers, showing the test accuracy/loss at each epoch so that we can compare our ongoing training.

Thanks.

@hx173149 (Owner)

Hi @ardasnck
I have been a bit busy these days; I think I can do the evaluation next week.

@ardasnck (Author)

@hx173149 Sure! I can't reproduce the paper's results with my own TensorFlow implementation, so if you can get similar results after your evaluation, it would be great to add your train-from-scratch implementation to this repository.

@hx173149 (Owner)

@ardasnck If you want to match the paper's accuracy, you have to fine-tune from Sports-1M; the paper says as much. You can also refer to issue #2; I tried it myself, and without fine-tuning I only got 33% accuracy.
Cheers

@ardasnck (Author) commented Dec 13, 2016

@hx173149 Yeah, I know issue #2 and have also read the official C3D documentation and the paper's discussion of fine-tuning. But my question is specifically about training from scratch (not fine-tuning). I actually got 40% accuracy when training from scratch, while you mentioned you only reached 33%. This https://docs.google.com/document/d/1-QqZ3JHd76JfimY4QKqOojcEaf5g3JS0lNh-FHTxLag states that they reached 45%, so I was wondering what could explain the difference. Another observation: the loss value in the TensorFlow implementation is clearly higher than in the Caffe implementation during training...
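One factor worth ruling out when comparing loss values (an assumption, not something confirmed in this thread): a TensorFlow total loss that includes an L2 weight-decay term will naturally read higher than Caffe's plain cross-entropy. A minimal TF1-style sketch of logging the two terms separately, where `logits` and `labels` are illustrative placeholders rather than this repo's actual variables:

```python
import tensorflow as tf

# Illustrative placeholders: network output over UCF101's 101 classes,
# and ground-truth class indices.
logits = tf.placeholder(tf.float32, [None, 101])
labels = tf.placeholder(tf.int64, [None])

cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
# Assumes the model registered its L2 losses in the standard collection.
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
total_loss = cross_entropy + (tf.add_n(reg_losses) if reg_losses else 0.0)

tf.summary.scalar("cross_entropy", cross_entropy)  # compare this curve to Caffe
tf.summary.scalar("total_loss", total_loss)
```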

@hx173149 (Owner)

Hi @ardasnck, I should have some free time in the next few days; I will reproduce my result once more... Have you ever tried the Caffe version of the code? Can it reach 45% accuracy when training from scratch? I am curious about this too.
PS: I can't open the URL you mentioned above.
Cheers

@ardasnck (Author) commented Dec 20, 2016

Hi @hx173149. I updated the link once again, but I'm not sure what's happening with it...
As for training from scratch: yes, I ran the Caffe version of the code on my machine and got 42.88% accuracy (note that I used batch size 16 because of my GPU capacity). I also edited my own TensorFlow implementation (some minor changes) and got 42.64%. I believe this shows it works as it should.
PS: In case the link doesn't work again, I was referring to the C3D User Guide document that the author provides on his project page.

@hx173149 (Owner)

Hi @ardasnck
There are 13318 videos in the UCF101 dataset. I used 11318 videos for training and 2000 for testing, and I get 50% top-1 accuracy after 8000 iterations with a batch size of 64.
Here are my train-from-scratch top-1 accuracy, cross-entropy, and total loss (cross entropy + regularization loss) curves:
[Images: top-1 accuracy curve, cross-entropy curve, total-loss curve]
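For reference, a minimal sketch of how such a random 11318/2000 split by whole video could be produced. `all_videos.txt` and the output filenames are assumptions for illustration, not necessarily how the lists in this repo were built:

```python
import random

# Splitting by whole video (not by clip) keeps train and test disjoint.
random.seed(0)  # fixed seed so the split is reproducible

with open("all_videos.txt") as f:
    videos = [line.strip() for line in f if line.strip()]  # expect 13318 paths

random.shuffle(videos)
test_videos, train_videos = videos[:2000], videos[2000:]   # 2000 / 11318

with open("test.list", "w") as f:
    f.write("\n".join(test_videos) + "\n")
with open("train.list", "w") as f:
    f.write("\n".join(train_videos) + "\n")
```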

@ardasnck (Author)

Dear @hx173149,
Thank you very much for the detailed feedback. It's great that you reached 50% top-1 accuracy. Did you use the same train/test split as the original Caffe implementation? The paper claims 45% accuracy, and when I ran their code on my own machine (batch size 16) I got 42.9%.

@gy2256 commented Feb 1, 2017

Hello,

I also want to train from scratch, but I am fairly new to deep learning, especially 3D ConvNets. Could you briefly explain the training mechanism? My understanding is that you feed in 16 frames as input together with a label to perform supervised learning. But do you use all of the frames for training? I would really appreciate it if you could briefly explain the whole data preparation and training process.

(I am trying to rewrite everything in Keras. So far I have defined the nets, but I do not know how to prepare the video data.)

@hx173149 (Owner) commented Feb 6, 2017

Hello @gyang1011
My training mechanism is like this:
First, I choose 64 samples randomly for each iteration.
Then I slice a 3.2-second clip (about 16 frames) randomly from each sample for training.
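A minimal sketch of that sampling scheme, assuming each video has already been decoded into a `(num_frames, H, W, 3)` array with at least 16 frames; `load_video`, `video_paths`, and `labels` are placeholders, not this repo's actual API:

```python
import random
import numpy as np

CLIP_LEN = 16    # frames per clip
BATCH_SIZE = 64  # videos sampled per iteration

def sample_clip(frames):
    """Slice one random contiguous 16-frame clip from a decoded video."""
    start = random.randint(0, frames.shape[0] - CLIP_LEN)
    return frames[start:start + CLIP_LEN]

def next_batch(video_paths, labels, load_video):
    """Pick 64 videos at random, then one random clip from each."""
    idx = random.sample(range(len(video_paths)), BATCH_SIZE)
    clips = np.stack([sample_clip(load_video(video_paths[i])) for i in idx])
    return clips, np.array([labels[i] for i in idx])  # clips: (64, 16, H, W, 3)
```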

@LongLong-Jing

@ardasnck @hx173149 @gyang1011
I trained this network and got 33% on split 1 of UCF101, and I think 33% is in fact what this 8-conv-layer network should get when trained from scratch. In the C3D paper, the authors use a 5-conv-layer network (not the 8-conv-layer one) for their train-from-scratch experiments, which is how they get 45% on UCF101. This means the network trained from scratch and the one pre-trained on Sports-1M have different architectures!

@hx173149 (Owner)

@LongLong-Jing I think you are right. Or maybe there are some duplicate samples between my train list and test list; I am not very sure.
