
Advanced Machine Learning #231

Open
DonaldTsang opened this issue Apr 13, 2018 · 16 comments


@DonaldTsang

  1. Is it possible to replace Caffe (the slowest on the Python platform) with PyTorch (the fastest overall) or MXNet (which can beat PyTorch on parallel GPUs)?
  2. Is it possible to replace VGG7 with Inception or ResNet, which outperform VGG7?
@DonaldTsang (Author)

Some ideas: categorize images in the database into "pure", "single-JPG", "double-JPG", and "multi-JPG" (JPG as in JPEG compression).
Use that as a metric for how "noisy" an image is, and then apply the right amount of de-noising so as not to overshoot.
Only the "pure" images should be used as the base dataset for testing compression and reverse image compression.
Reference: https://www.politesi.polimi.it/bitstream/10589/132721/1/2017_04_Chen.pdf
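As a rough illustration of this idea (not waifu2x's actual pipeline), here is a Pillow sketch that derives the four categories by re-encoding an image through JPEG a varying number of times; the quality value and repeat counts are placeholders:

```python
from io import BytesIO
from PIL import Image

def jpeg_round_trips(img, times, quality=90):
    """Re-encode an image through JPEG `times` times (0 = 'pure')."""
    for _ in range(times):
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img

# Categories as described above: number of JPEG compression passes.
CATEGORIES = {"pure": 0, "single-JPG": 1, "double-JPG": 2, "multi-JPG": 4}

def build_variants(img):
    """Return one variant of `img` per noise category."""
    return {name: jpeg_round_trips(img, n) for name, n in CATEGORIES.items()}
```

Comparing each variant against the "pure" original would then give a reference measure of how much JPEG damage a given noise level introduces.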

@nagadomi (Owner)

nagadomi commented Apr 13, 2018

Is it possible to replace caffe (the slowest in the Python platform) with PyTorch (fastest overall) or MXNet (can beat PyTorch in parallel GPUs)

waifu2x is implemented in LuaJIT/Torch, not Caffe. Torch already seems to be outdated; it would be good to switch to PyTorch, but for now I don't have the resources to do it.
tsurumeso has released a Chainer version.
https://github.com/tsurumeso/waifu2x-chainer

Is it possible to replace VGG7 with Inception or ResNet, which out-performs VGG7?

A ResNet model is already available in the dev branch.
benchmark: https://github.com/nagadomi/waifu2x/blob/dev/appendix/benchmark.md
Unfortunately, it is much slower than the current model, so it cannot be used in the web service.

Some idea: categorize images in the database into "pure", "single-JPG", "double-JPG", "multi-JPG" (JPG as in JPG compression).

That has already been implemented. waifu2x can specify the JPEG quality and the number of compression passes for real-time data augmentation during training. The dataset is constructed from images that are not JPEG compressed.
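A minimal sketch of what such on-the-fly augmentation could look like (this is an illustration in Pillow, not waifu2x's actual training code; the quality range and pass count are placeholder assumptions):

```python
import random
from io import BytesIO
from PIL import Image

def random_jpeg_noise(img, max_passes=2, quality_range=(65, 95), rng=random):
    """Randomly re-encode `img` 0..max_passes times, each pass with a
    random JPEG quality, mimicking noise augmentation during training."""
    passes = rng.randint(0, max_passes)
    for _ in range(passes):
        q = rng.randint(*quality_range)
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img, passes
```

The clean image stays the training target while the randomly degraded copy becomes the network input, so the model sees a fresh corruption each epoch.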

@DonaldTsang (Author)

DonaldTsang commented Apr 14, 2018

@nagadomi

Unfortunately it is much slower than the current model

Maybe reduce the size of the ResNet by using fewer modules? And compare that with VGG5/7/9/16/19 to create a graph of per-epoch training speed versus total training time and accuracy?

waifu2x can specify JPEG quality

What about auto-detection of JPEG quality? Could that be implemented as well?

@nagadomi (Owner)

nagadomi commented Apr 17, 2018

Maybe reduce the size of the ResNet by using less modules? And compare that with VGG5/7/9/16/19 to create a graph of epoch training speed compared to total training time and accuracy?

With a shallow network, accuracy degrades. I think it is related to the receptive field size (which depends on the number of layers and the filter size in a fully convolutional network). It may be solvable with dilated convolutions or a progressive approach.
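The receptive-field point can be made concrete with a little arithmetic: for a stack of stride-1 convolutions, each layer with kernel size k and dilation d grows the receptive field by (k-1)·d. This sketch (my own illustration, not code from waifu2x) shows how dilation enlarges the receptive field at the same depth and cost:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 conv layers.
    Each layer is a (kernel_size, dilation) pair."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Seven plain 3x3 layers (VGG7-like): 15x15 receptive field.
plain = receptive_field([(3, 1)] * 7)

# Same depth, dilations 1,1,2,2,4,4,1: more than double the coverage
# with the same number of weights and multiply-adds.
dilated = receptive_field([(3, 1), (3, 1), (3, 2), (3, 2),
                           (3, 4), (3, 4), (3, 1)])
```

This is why a shallow plain network struggles: its receptive field is simply too small to see enough context around each output pixel.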

what about auto-detection of JPEG quality? Could that be implemented as well?

I have already implemented it, but not as an open-source activity. The JPEG noise level can be predicted as a classification task over sets of image patches.
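One plausible way to aggregate such patch-level predictions into an image-level label (my own sketch of the general approach, not the closed-source implementation) is a simple majority vote over patches, with any patch classifier plugged in:

```python
from collections import Counter

def predict_noise_level(patches, classify_patch):
    """Aggregate per-patch noise-level predictions into one image-level
    label by majority vote. `classify_patch` is any patch classifier
    (e.g. a small CNN) returning a discrete noise level per patch."""
    votes = Counter(classify_patch(p) for p in patches)
    return votes.most_common(1)[0][0]
```

Voting over many patches makes the image-level decision robust to individual patches (flat regions, text, etc.) that carry little evidence about the compression level.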

@DonaldTsang (Author)

@nagadomi What about using expert systems for JPEG noise-level detection?

@2ji3150

2ji3150 commented May 2, 2018

Looks like the ResNet version is 2.3 times slower than the upconv version, but it gets better quality than upconv with TTA (which is 8 times slower). That means it is faster than upconv with TTA and better quality, so it makes sense to replace the normal TTA option with it. BTW, is there any plan to train a ResNet art-version model?

@DonaldTsang (Author)

DonaldTsang commented Sep 20, 2018

@2ji3150 @nagadomi New idea: NASNet

It looks like NASNet can outperform most other neural network architectures with LESS computation.

@DonaldTsang (Author)

DonaldTsang commented Sep 20, 2018

As a reference: #216
(BTW thanks @Yolkis for suggesting that)
We should consider training speed and model generation speed.

@nagadomi (Owner)

Generally, in super-resolution tasks, pooling layers cannot be used.
In network architectures for classification, the input resolution decreases as the number of layers increases, but in super resolution it does not.
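The standard output-size formula makes the contrast explicit. This sketch (my own illustration of the point) compares a classification-style stack, where pooling keeps halving the feature map, with an SR-style stack of padded stride-1 convolutions that preserves full resolution at every layer:

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of one conv/pool layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Classification-style stack: three 2x2/stride-2 poolings shrink 64 -> 8.
s = 64
for _ in range(3):
    s = out_size(s, kernel=2, stride=2)

# SR-style stack: seven padded 3x3 stride-1 convs keep the map at 64.
r = 64
for _ in range(7):
    r = out_size(r, kernel=3, stride=1, pad=1)
```

Because every output pixel must be predicted, an SR network cannot afford the resolution loss that pooling introduces, which is why architectures designed for classification do not transfer directly.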

@DonaldTsang (Author)

@nagadomi Is it possible to look at this graph (the purple parts) and see if there are alternatives for Waifu2x?
[image: mapclean_1 3]

@nagadomi (Owner)

nagadomi commented Nov 21, 2018

@DonaldTsang
I added a new model last week.
benchmark: https://github.com/nagadomi/waifu2x/blob/master/appendix/benchmark.md#art (cunet/art)
It is two cascaded U-Nets extended with SEBlocks (Squeeze-and-Excitation Networks).

Edit:
In the figure above, RefineNet (Stack-U-Net) is a similar model.
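For readers unfamiliar with SEBlocks, the mechanism is small: global-average-pool each channel, pass the resulting vector through a two-layer bottleneck MLP with a sigmoid, and rescale the channels by the result. A minimal numpy sketch of the standard Squeeze-and-Excitation operation (an illustration of the published technique, not waifu2x's actual layer code):

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    w1: (C//r, C) and w2: (C, C//r) form the channel-attention MLP,
    where r is the reduction ratio."""
    z = x.mean(axis=(1, 2))                    # squeeze: per-channel mean (C,)
    s = np.maximum(w1 @ z + b1, 0.0)           # excitation FC1 + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))   # FC2 + sigmoid -> gates in (0,1)
    return x * s[:, None, None]                # rescale each channel
```

The block adds very few parameters, which fits the constraint mentioned earlier that the model must stay fast enough for the web service.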

@yu45020

yu45020 commented Nov 27, 2018

@nagadomi
I came here from this issue. Thanks for sharing the new model. Have you tried atrous convolutions for image upscaling?

There is a paper that uses atrous convolutions to segment small objects in satellite images. The model increases the atrous rates and then decreases them. I coded a similar model for my manga text segmentation project and found a clear improvement in accuracy. I am rewriting and testing a similar model for image upscaling. The preliminary results seem acceptable, and I plan to train it thoroughly on a server.

@nagadomi (Owner)

@yu45020
I have tried dilated/atrous convolutions. They are better than an ordinary FCN, but they do not improve results dramatically. Currently, I think a Residual U-Net (concat replaced with add) has better speed and accuracy than fully dilated convolution networks.
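The concat-vs-add distinction has a direct cost implication: an additive skip keeps the channel count unchanged, while a U-Net-style concat doubles the input channels of the next convolution. A small numpy sketch of the two skip styles (my own illustration of the trade-off, not waifu2x code):

```python
import numpy as np

def add_skip(decoder_feat, encoder_feat):
    """Residual-style skip: element-wise add keeps the channel count,
    so the following conv costs the same as without the skip."""
    return decoder_feat + encoder_feat

def concat_skip(decoder_feat, encoder_feat):
    """U-Net-style skip: channel concat doubles the next conv's input
    channels, roughly doubling its parameters and compute."""
    return np.concatenate([decoder_feat, encoder_feat], axis=0)
```

For features of shape (C, H, W), `add_skip` stays at C channels while `concat_skip` yields 2C, which is where the speed advantage of the residual variant comes from.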

I also develop an OCR engine for manga. It is a closed-source product, so I cannot describe the details, but there are results from p. 59 onward of this slide (Japanese).

@yu45020

yu45020 commented Nov 28, 2018

@nagadomi
Thanks for the advice! I will also check a U-Net-like model before training.

Your project seems to accomplish what I am after. It is very interesting and seems comparable to ABBYSS's engine. My project's in-sample prediction achieves similar results, but my goal is only to segment all text pixels. Back to your product: I notice the slides come from a seminar. Do you plan to publish a technical report?

@DonaldTsang (Author)

@nagadomi @yu45020 Any news? If yes, we could write something up in #251.
