Preview server for new waifu2x #122
Finally!
Thanks for the new method. I tried your new model in Caffe and also found a 2.4x improvement at scale 2x.
I also tested WINOGRAD.
@nagadomi Thanks for your testing and coding. Last time I thought there was no improvement... but I had forgotten to comment out one setting, so I retried your new model on the latest Caffe with cuDNN 5.1 RC. The input is split into many 480x360 patches with batch=1.
Here, WINOGRAD is 1.6x faster than IMPLICIT_GEMM, and 1.07x faster than IMPLICIT_PRECOMP_GEMM (which should be the default setting). Since Caffe does not seem to support the deconv layer with cuDNN, my numbers should be even better if only the conv layers were benchmarked, as in your code. Also, the FFT algorithm does not seem well suited to small kernels. Could you also try IMPLICIT_PRECOMP_GEMM on your new GTX 1080?
Hi nagadomi, I seem to reproduce your WINOGRAD result.
WINOGRAD is 2.60x of IMPLICIT_GEMM, and 1.44x of IMPLICIT_PRECOMP_GEMM. I guess the GTX 1080 would behave similarly to this one.
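The two ratios quoted above also pin down how IMPLICIT_PRECOMP_GEMM compares to IMPLICIT_GEMM. A quick sanity check of the implied number (my own arithmetic, no cuDNN involved):

```python
# Quoted measurements: WINOGRAD is 2.60x of IMPLICIT_GEMM
# and 1.44x of IMPLICIT_PRECOMP_GEMM.
winograd_vs_gemm = 2.60
winograd_vs_precomp = 1.44

# Implied speed of IMPLICIT_PRECOMP_GEMM relative to IMPLICIT_GEMM.
precomp_vs_gemm = winograd_vs_gemm / winograd_vs_precomp
print(f"IMPLICIT_PRECOMP_GEMM is about {precomp_vs_gemm:.2f}x of IMPLICIT_GEMM")
# -> about 1.81x
```

So on this card, even the default IMPLICIT_PRECOMP_GEMM already beats IMPLICIT_GEMM by a wide margin; WINOGRAD adds a further 1.44x on top of that.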
Here are the results; they are similar to yours. (code)
Thanks, nagadomi. For the above 20-layer network this doesn't make much sense, given what I found. After testing under several conditions, I got results that are not always consistent with cuDNN's own decision. On my GTX 960, all layers were set to use the same algorithm for these tests.
Correctness:
Thanks for reporting. It is the same as pad=0.
Thanks for this suggestion. I tested the idea and here is the result. Note that cuDNN v5 does not turn this on by default.
All results are pure forward performance based on many 480x360 patches (Python is much slower than C++ on the remaining processing). Padding only costs about 1% at this resolution, and we can expect a 2% loss at the smaller 240x180 size. I have reported these two findings (padding and the algorithm choice).
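The 1% / 2% figures are consistent with the extra border pixels a pad=1 3x3 convolution computes per layer compared to running on the unpadded input. A small sketch of that arithmetic (my own illustration, not code from this thread):

```python
def pad_overhead(h, w, pad=1):
    """Fraction of extra pixels one conv layer computes when the
    input is zero-padded by `pad` on each side, versus no padding."""
    return ((h + 2 * pad) * (w + 2 * pad)) / (h * w) - 1.0

print(f"480x360 patch: {pad_overhead(360, 480):.1%} extra work per layer")  # ~1%
print(f"240x180 patch: {pad_overhead(180, 240):.1%} extra work per layer")  # ~2%
```

The border is a fixed two pixels per dimension, so its relative cost grows as the patch shrinks, which matches the 1% vs 2% observation above.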
Here are the auto-tuning results of the new waifu2x model using cudnn.torch.
Forward algorithm 6 = WINOGRAD, so WINOGRAD is selected.
I read your output code. It is related to cudnnFindConvolutionForwardAlgorithm within the benchmark=true branch? But I think auto tuning should be related to cudnnGetConvolutionForwardAlgorithm? In Caffe, I added an if condition on this: either get the algorithm automatically, or read it from a self-added line in the prototxt, in order to run these previous tests.
In cudnn.torch, when cudnn.benchmark=true, cudnnFindConvolutionForwardAlgorithm is used; when cudnn.benchmark=false (the default), cudnnGetConvolutionForwardAlgorithm is used. But if the user specified a forward algorithm with setMode() (self.fmode is not nil), that specific algorithm is used. In my understanding, cudnnFindConvolutionForwardAlgorithm(fastest=true) selects the fastest algorithm using a runtime benchmark (it caches the results, and the selected algo is reused), while cudnnGetConvolutionForwardAlgorithm(fastest=true) selects the fastest algorithm using rule-based logic built on NVIDIA's knowledge.
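The selection flow described above can be summarized in a short sketch. This is plain Python standing in for the Lua logic in cudnn.torch; the function and parameter names are illustrative, and `find`/`get` are stand-ins for the two cuDNN calls (the real API takes tensor, filter, and convolution descriptors rather than a layer key):

```python
# Illustrative model of cudnn.torch's forward-algorithm selection.
# find = cudnnFindConvolutionForwardAlgorithm (runtime benchmark, cached),
# get  = cudnnGetConvolutionForwardAlgorithm (NVIDIA's rule-based heuristic).
_find_cache = {}

def select_forward_algo(layer_key, fmode=None, benchmark=False,
                        find=None, get=None):
    if fmode is not None:
        # User forced an algorithm via setMode(); it always wins.
        return fmode
    if benchmark:
        # cudnn.benchmark = true: measure once per configuration, reuse after.
        if layer_key not in _find_cache:
            _find_cache[layer_key] = find(layer_key)
        return _find_cache[layer_key]
    # Default: heuristic choice, no runtime measurement.
    return get(layer_key)
```

The caching step is why the benchmark mode only pays the measurement cost on the first forward pass for a given layer shape.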
Thanks for your explanation. I also followed your method and switched to the benchmark-based auto tuning. The result below is the algorithm choice for the new waifu2x model on one forward pass of a 480x360 patch on my GTX 960 (one round of initialization and one round of forward).
Here my optimized algo for conv2 is 1 instead of 6, which is consistent with my previous result that a 0.5% improvement is observed on real image inference. I believe your benchmark result makes sense: algo 6 is best for conv2 on your GTX 1080. Since we found the 1080 benefits by 51% on that 20-layer network (versus 44% on the GTX 960), WINOGRAD seems more efficient on Pascal and is indeed the better conv2 setting for future GPUs. Anyway, if NVIDIA changes the policy in cuDNN, everything will be fine. Could you compare the overall inference speed of the entire network between all-IMPLICIT_PRECOMP_GEMM and your best selection (1-6-6-6-6-6-6) on your new-architecture GTX 1080, and see how much improvement you get (hopefully more than 8.7%)?
It seems that the best forward algo depends on the patch size.
Wow... that's a lot of improvement, far beyond my expectation. Definitely worth optimizing, especially for the Pascal architecture. 1. The corresponding benchmark on my GTX 960 is below, for your reference. Time is measured on the new model, all without padding.
2. I also tried to understand how the algorithm is chosen automatically.
Without padding, I always get algo 1. With padding, the threshold lies between 209x209 and 210x210. The total number of pixels (width x height) is what matters; width and height do not matter individually. Could you verify this finding on your GTX 1080? Thanks a lot. From your previous results, …
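The observation that only width x height matters can be written down as a tiny model: a threshold on the total pixel count, somewhere between 209x209 (43681) and 210x210 (44100) pixels. Both the exact threshold value and which algorithm sits on each side are my guesses for illustration, not cuDNN internals:

```python
# Illustrative model of the observed heuristic (GTX 960, padded conv layers).
# The real boundary lies between 209*209 = 43681 and 210*210 = 44100 pixels;
# 44100 is assumed here. Algorithm IDs follow cuDNN v5 numbering
# (1 = IMPLICIT_PRECOMP_GEMM, 6 = WINOGRAD); which side gets which
# algorithm is an assumption for this sketch.
THRESHOLD = 210 * 210

def guessed_forward_algo(width, height):
    # Only the product matters; width and height play no individual role.
    return 6 if width * height >= THRESHOLD else 1
```

Under this model, any two patch shapes with the same pixel count get the same algorithm, which is exactly the behavior reported above.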
waifu2x-caffe now supports the new models: https://github.com/lltcggie/waifu2x-caffe/releases/tag/1.1.5 Also, I have now added new models for photos.
What's the difference between RGB and the new UpRGB model? |
I don't know much about waifu2x-caffe. You should create an issue on the waifu2x-caffe repo if the issue only happens in waifu2x-caffe. I think UpRGB is the same model as http://waifu2x-dev.udp.jp/ .
I merged these changes into the master branch.
I published http://waifu2x-dev.udp.jp/. This server supports the new waifu2x models.
(Code and pretrained models are available on the upconv branch.)
EDIT: These changes were merged into http://waifu2x.udp.jp/
The photo model is not trained yet. I will add photo models, and http://waifu2x.udp.jp/ will switch to the new models in a few weeks.