Skip to content

kice/vs_mxnet

master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 

Description

Use MXNet to accelerated Image-Processing in VapourSynth.

Installation

You can donwload MSVC Win64 build from Here

Require MXNet 1.0+

Since MXNet is very large and use many libraries to improve perforamce. We recommend install MXNet via pip.

Install the latest beta build with GPU(CUDA 10.1) support

pip install mxnet-cu101 --pre

Check here for more infomation Installing MXNet

You can check your MXNet installation with:

> python -c "import mxnet; print(mxnet.__version__)"
1.6.0

You can also check the GPU support of mxnet.

> python

>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3), mx.gpu())
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.]], dtype=float32)

THERE IS NO NEED TO COPY ANY DLLs TO PLUGIN FOLDER OF VAPOURSYNTH EXCEPT THE PLUGIN ITSELF

Add the follow lines to the beginning of your .vpy file for auto loading dependency

import mxnet as mx
import vapoursynth as vs

core = vs.get_core()

if not hasattr(core, 'mx'):
    core.std.LoadPlugin(r'vs_mxnet.dll', altsearchpath=True)

# Your code goes here

Due to Vapoursynth DLL loading method, by import mxnet, Python will try to help load all require dlls (like, MXNet and CUDA). If you delete core.std.LoadPlugin, it will still work for vsedit but not work under vspipe.

Usage

mx.Predict(clip clip, string symbol, string param[, int scale=1, int patch_w=0, int patch_h=0, int output_w=128, int output_h=block_w, int frame_w=3, int frame_h=True, int step_w=0, int step_h=0, int outstep_w=0, int outstep_h=0, int padding=0, int border_type=1, int ctx=0, int dev_id=0])
  • clip: Clip to process. Only planar format is float32 or int8 supported. RGB and GRAY is supported, YUV is not correctly supported.

  • symbol: MXNet symbol json file. If the plugin cannot read the file, it will try to read it from plugins64\mxnet-symbol\. You can find more MXNet models here.

  • param: The same as symbol, but for model parameters data.

  • scale: Set output shape and final frame shape form the shape of patch and input clip. It will be ignore if you manully set corresponding parameters. default: 1

  • patch_w: The horizontal block size for dividing the image during processing. Smaller value results in lower VRAM usage, while larger value may not necessarily give faster speed. The optimal value may vary according to different graphics card and image size. If patch_h is larger than clip's width, it will clamp to clip's width. default: clip's width.

  • patch_h: The same as patch_w but for vertical. default: clip's height.

  • output_w: The horizontal block size for MXNet model output. default: patch_h * scale.

  • output_h: The same as output_w but for vertical.

  • frame_w: The final output frame size. It dose not have to related to other shapes, like output shape. default: clip's width * scale.

  • frame_h: The same as frame_w but for vertical.

  • step_w: The stride of the sliding window for slicing the patch. It will clamp to clip's width if the step larger than it. default: patch_w.

  • step_h: The same as step_w but for vertical.

  • outstep_w: The stride of the sliding window for copying the model output to Vapoursynth target frame buffer. It will clamp to output frame' width if the step larger than it. default: output_w.

  • outstep_h: The same as outstep_w but for vertical.

  • padding: Add padding to the input clip before feeding the model. It will add a border to all size of the input image. default: 0 USE MXNET Pad Layer on GPU is much faster

  • output_format: Specify output frame sample format. e.g. vs.RGBS. defalut: same as input.

  • input_name: Set input name. Most MXNet model use data as input name. defalut: data.

  • ctx: Specifies which type of device to use. If GPU was chosen, cuDNN will be used by defalut.

    • 1 = CPU (default)
    • 2 = GPU
  • dev_id: Which device to use. Starting with 0.

Example

# Place Symbol file and params data into `plugins64\mxnet-symbol\` or use the full path of the files.

symbol = 'Some2x-symbol.json'
param  = 'Some2x-0000.params'
patch_w, patch_h = 400, 300

# Set input size
clip = core.resize.Bicubic(src, 960, 540)

# run some 2x upsampling model with patch size 400x300. Output size will be 1920x1080
sr2x = core.mx.Predict(src, symbol='Some2x-symbol.json', param='Some2x-0000.params', patch_w=patch_w, patch_h=patch_h, scale=2, ctx=2, dev_id=1)

# run Waifu2x 2x upconv model with pre-padding, patch size=400x300 on second GPU, output size is 1920x1080
waifu2x = core.mx.Predict(clip, symbol=r'noise0_scale2.0x_model-symbol.json', 
                     param=r'noise0_scale2.0x_model-0000.params', 
                     patch_w=patch_w,       patch_h=patch_h, 
                     output_w=patch_w*2,    output_h=patch_h*2, 
                     frame_w=1920,          frame_h=1080, 
                     step_w=patch_w,        step_h=patch_h, 
                     ctx=2, dev_id=1, scale=2)

# For multi-GPU processing (scales almost linearly). Only support data parallel now.

even = core.mx.Predict(core.std.SelectEvery(clip, 2, 0), symbol=symbol, param=param, patch_w=patch_w, patch_h=patch_h, scale=2, ctx=2, dev_id=0)
odd = core.mx.Predict(core.std.SelectEvery(clip, 2, 1), symbol=symbol, param=param, patch_w=patch_w, patch_h=patch_h, scale=2, ctx=2, dev_id=1)

res = core.std.Interleave([even, odd])

Also see muvsfunc's example.

Perforamce

OUTDATED

Here is the conclusion, generally MXNet is faster than Caffe with cuDNN enabled if the bottleneck is not GPU.

If you found that your GPU is not under full load while using Caffe, you can get significant perforamce boost by switching to MXNet. Or your GPU memory is small, you can also switch to MXNet for higher efficiency.

In this test, a 1280x720 RGB image was used as input image and resized by resize.Bicubic if needed.

Model Input Size Patch Size Output Size Speed(fps) VRAM Usage(MB) Backend
waifu2x UpRGB 1280x720 256x256 2560x1440 7.03 543 MXNet 1.3.0
waifu2x UpRGB 1280x720 1280x720 2560x1440 7.85 1815 MXNet 1.3.0
waifu2x UpRGB 1280x720 640x360 2560x1440 7.03 788 MXNet 1.3.0
waifu2x UpRGB 720x480 720x480 1440x960 21.74 958 MXNet 1.3.0
waifu2x UpRGB 720x480 720x480 1440x960 24.54 1476 MXNet 1.3.0 (2 Queues)
waifu2x UpRGB 720x480 720x480 1440x960 41.66 958 *2 MXNet 1.3.0 (2 GPUs)
waifu2x UpRGB 720x480 720x480 1440x960 47.7 1476 *2 MXNet 1.3.0 (4 Queues 2 GPUs)
waifu2x UpRGB 960x540 960x540 1920x1080 14.8 1216 MXNet 1.3.0
waifu2x UpRGB 1920x1080 1920x1080 3840x2160 3.60 3527 MXNet 1.3.0
waifu2x UpRGB 1280x720 256x256 2560x1440 2.93 527 Caffe w/ cuDNN
waifu2x UpRGB 1280x720 1280x720 2560x1440 3.11 2726 Caffe w/ cuDNN
waifu2x UpRGB 1280x720 640x360 2560x1440 3.08 959 Caffe w/ cuDNN
waifu2x UpRGB 720x480 720x480 1440x960 8.48 1622 Caffe w/ cuDNN
waifu2x UpRGB 720x480 720x480 1440x960 19.6 5976 Caffe w/ cuDNN (6 Queues)
waifu2x UpRGB 720x480 720x480 1440x960 32.8 5949 *2 Caffe w/ cuDNN (12 Queues 2 GPUs)
waifu2x UpRGB 960x540 960x540 1920x1080 5.31 1699 Caffe w/ cuDNN
waifu2x UpRGB 1920x1080 960x540 3840x2160 1.35 2254 Caffe w/ cuDNN
waifu2x RGB 1280x720 1280x720 2560x1440 1.01 1752 OpenCL (CUDA)
waifu2x RGB 1280x720 1280x720 2560x1440 0.93 1749 OpenCL (OpenCL)
waifu2x RGB 1280x720 1280x720 2560x1440 0.93 N/A OpenCL (CPU)
waifu2x RGB 1280x720 1280x720 2560x1440 1.82 1999 Caffe w/ cuDNN
waifu2x RGB 1280x720 1280x720 2560x1440 3.36 1442 MXNet 1.3.0
waifu2x RGB 2560x1440* 2560x1440 2560x1440 3.22 5155 MXNet 1.3.0
EDSR 2x 1280x720 1280x720 2560x1440 2.59 2732 MXNet 1.3.0
EDSR 2x 960x540 960x540 1920x1080 4.59 1732 MXNet 1.3.0
RCAN 2x 1280x720 1280x720 2560x1440 0.185 3015 MXNet 1.3.0
RCAN 2x 960x540 960x540 1920x1080 0.324 1916 MXNet 1.3.0
VDSR 2x (Y only) 2560x1440* 2560x1440 2560x1440 1.64 7697 MXNet 1.3.0
VDSR 2x (Y only) 1920x1080* 1920x1080 1920x1080 2.96 5857 MXNet 1.3.0
LapSRN 2x (Y only) 1280x720 1280x720 2560x1440 5.67 3310 MXNet 1.3.0
LapSRN 2x (Y only) 960x540 960x540 1920x1080 10.47 1474 MXNet 1.3.0
LapSRN 4x (Y only) 960x540 960x540 3840x2160 2.15 4565 MXNet 1.3.0
DRRN_B1U9 2x (Y only) 2560x1440* 2560x1440 2560x1440 0.496 5898 MXNet 1.3.0
DRRN_B1U9 2x (Y only) 1920x1080* 1920x1080 1920x1080 0.89 3514 MXNet 1.3.0
DRRN_B1U25 2x (Y only) 1920x1080* 1920x1080 1920x1080 0.316 4300 MXNet 1.3.0
DBPN 2x 640x360 640x360 1280x720 1.21 4987 MXNet 1.3.0
DBPN 2x 960x540 480x540 1920x1080 0.523 8090 MXNet 1.3.0
  • All cuDNN version is 7.

  • MXNet is using CUDA 9.2. (Version: mxnet_cu92-1.3.0b20180908)

  • For some models have the same the shape of output as the input, like Waifu2x RGB, we first resize/upscale the input image to target size by Bicubic, then feed into the model.

  • During testing, Waifu2x-Caffe is only utilizing around 30% of the GPU. By increasing the queues depth, we can have a significant boost; but it will take more resources and still slower than MXNet.

  • Waifu2x-Caffe is using CUDA 9.0.

  • OpenCL of Waifu2x implementation is VapourSynth-Waifu2x-w2xc.

  • All MXNet model in this test can be accessed here.

Here is the test code:

import mxnet
import vapoursynth as vs
import mvsfunc as mvf
import havsfunc as haf

core = vs.get_core(threads=20)

if not hasattr(core, 'mx'):
    core.std.LoadPlugin(r'vs_mxnet.dll', altsearchpath=True)

# How many frame to run
frames = 600

symbol = r'waifu2x\upconv_7_anime_style_art_rgb\scale2.0x_model-symbol.json'
param = r'waifu2x\upconv_7_anime_style_art_rgb\scale2.0x_model-0000.params'

src = core.lsmas.LWLibavSource(r'test.png', threads=1)
src = core.std.AssumeFPS(src, fpsnum=24000, fpsden=1001)

# If the model is only support Y channel, enable the following lines
#src = mvf.ToYUV(src, css='444', depth=32)
#src = core.std.ShufflePlanes(src, 0, vs.GRAY)
#src = core.resize.Bicubic(src, 720, 480)

src = core.resize.Bicubic(src, 720, 480, format=vs.RGBS)
src = core.std.Loop(src, frames)

block_w = src.width
block_h = src.height

scale = 2

# Waifu2x symbol file should comes with padding

def process(clip, gpu):
    return core.mx.Predict(clip, symbol=symbol, param=param,
                          patch_w = block_w,           patch_h = block_h,
                         output_w = block_w*scale,    output_h = block_h*scale,
                          frame_w = clip.width*scale,  frame_h = clip.height*scale,
                           step_w = block_w,            step_h = block_h,
                        ctx = 2, dev_id = gpu)

queue_size = 3
gpus = 2

res = []
for i in range(queue_size):
    part = process(core.std.SelectEvery(src, queue_size, i), i % gpus)
    res.append(part)

flt = core.std.Interleave(res)
flt.set_output()

Limitation

  1. If patch size is not a divsor of input image, it will overwrite some pixel near the edge and cause some perforamce issues.

  2. Padding can be done by other filter. It dose not support patch padding now.

  3. It will take long time load MXNet, please wait; or you can open an issue to tell the developer.

  4. MXNet needs large commit size, do be careful of your system maxinum commit size. But runtime memory usage is average.

  5. MXNet will take some time for cudnn auto tuning for convolution layers every time. set MXNET_CUDNN_AUTOTUNE_DEFAULT=0 to disable it. More info here.

  6. MXNet will allocate VRAM when feeding the first frame, you might get Out of Memory error. Reducing the patch size may solve it.

  7. You might need to restart the program (e.g. vsedit) after you changing the input model file.

Compilation

There are some code to bypass Vapoursynth plugin loading system, which only works on Windows. You can remove that part and replace all MXNet function calls with normal calls will work on other system. All the header you need is here MXNet C predict API

On Windows, the plugins uses LoadLibrary to dynamically load MXNet, no need for MXNet header to compile.

About

Use MXNet to accelerated Image-Processing in VapourSynth.

Resources

License

Stars

Watchers

Forks

Packages

No packages published