Rescale the network #9

xuhao1 opened this issue Oct 17, 2020 · 27 comments


Copy link

xuhao1 commented Oct 17, 2020

Thanks for your contribution! I will include your license and your repo link correctly.
I also have a question: estimating facial landmarks from a 224x224 image is not always necessary, since my input image is close to 100x100. Is it possible to rescale the network?
I will also try this on my own.

Copy link

Thank you!

I have experimented with this before. When I tried just halving the resolution, it stopped producing useful output. If you need more speed, you can try the new 30 point model lm_modelT_opt.onnx, which runs on 56x56 inputs and, for me, reaches about five times the frame rate of the slowest model. The output is noisier, but I find that with higher smoothing it can still give acceptable results, and it is still very robust against bad lighting and head rotation. One thing to note is that it requires a higher cutoff threshold, as it may otherwise start to hallucinate faces where there are none. You can find the changes necessary for decoding landmarks by looking for the model_type < 0 parts here:


Lines 666 to 680 in 46e26f5

self.res = 224.
self.mean_res = self.mean_224
self.std_res = self.std_224
if model_type < 0:
    self.res = 56.
    self.mean_res = np.tile(self.mean, [56, 56, 1])
    self.std_res = np.tile(self.std, [56, 56, 1])
self.res_i = int(self.res)
self.out_res = 27.
if model_type < 0:
    self.out_res = 6.
self.out_res_i = int(self.out_res) + 1
self.logit_factor = 16.
if model_type < 0:
    self.logit_factor = 8.
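
The settings in the excerpt above can be summarized per model family. The following is only a minimal sketch, not code from the repository: `DecodeParams` and `decode_params` are hypothetical names, and only the numeric values are taken from the excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeParams:
    res: int             # network input resolution (pixels per side)
    out_res: int         # highest heatmap index (heatmap side length - 1)
    logit_factor: float  # divisor applied to the logit of the offset maps

    @property
    def heatmap_size(self) -> int:
        # Mirrors `out_res_i = int(out_res) + 1` in the excerpt.
        return self.out_res + 1


def decode_params(model_type: int) -> DecodeParams:
    # Hypothetical helper: negative model types select the 56x56 settings.
    if model_type < 0:
        return DecodeParams(res=56, out_res=6, logit_factor=8.0)
    return DecodeParams(res=224, out_res=27, logit_factor=16.0)


print(decode_params(-1).heatmap_size)  # 7
print(decode_params(0).heatmap_size)   # 28
```

Keeping the three values together makes it harder to mix, for example, the 56x56 input size with the logit factor of the 224x224 models.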

Copy link

xuhao1 commented Oct 17, 2020

@emilianavt Thanks for your reply, I will also try the 56x56 network.
I am also trying to load your weights in PyTorch, but it throws an error:

from model import *
PATH = "./weights/lm_model0.pth"
model = OpenSeeFaceLandmarks("small", 0.5, True)


TypeError                                 Traceback (most recent call last)
<ipython-input-5-d07b4f955829> in <module>
      1 PATH = "./weights/lm_model0.pth"
----> 2 model = OpenSeeFaceLandmarks("small", 0.5, True)
      3 model.load_state_dict(torch.load(PATH))
      4 model.eval()

~\Develop\OpenSeeFace\ in __init__(self, size, channel_multiplier, inference)
    135     def __init__(self, size="large", channel_multiplier=1.0, inference=False):
    136         kwargs = geffnet.mobilenetv3._gen_mobilenet_v3([size], channel_multiplier=channel_multiplier)
--> 137         super(OpenSeeFaceLandmarks, self).__init__(**kwargs)
    138         if size == "large":
    139             self.up1 = UNetUp(round_channels(960, channel_multiplier), round_channels(112, channel_multiplier), 256, (14,14))

TypeError: __init__() argument after ** must be a mapping, not MobileNetV3

My environment is Python3.7, pytorch 1.6

Copy link

xuhao1 commented Oct 17, 2020

Alright. The issue is caused by an update of geffnet. I fixed the problem by adding

from geffnet.efficientnet_builder import *
from geffnet.config import layer_config_kwargs
from geffnet.activations import get_act_fn, get_act_layer

def _gen_mobilenet_v3(variant, channel_multiplier=1.0, pretrained=False, **kwargs):
    """Creates a MobileNet-V3 large/small/minimal models.
    Ref impl:
      channel_multiplier: multiplier to number of channels per layer.
    """
    if 'small' in variant:
        num_features = 1024
        if 'minimal' in variant:
            act_layer = 'relu'
            arch_def = [
                # stage 0, 112x112 in
                # stage 1, 56x56 in
                ['ir_r1_k3_s2_e4.5_c24', 'ir_r1_k3_s1_e3.67_c24'],
                # stage 2, 28x28 in
                ['ir_r1_k3_s2_e4_c40', 'ir_r2_k3_s1_e6_c40'],
                # stage 3, 14x14 in
                # stage 4, 14x14in
                # stage 6, 7x7 in
            ]
        else:
            act_layer = 'hard_swish'
            arch_def = [
                # stage 0, 112x112 in
                ['ds_r1_k3_s2_e1_c16_se0.25_nre'],  # relu
                # stage 1, 56x56 in
                ['ir_r1_k3_s2_e4.5_c24_nre', 'ir_r1_k3_s1_e3.67_c24_nre'],  # relu
                # stage 2, 28x28 in
                ['ir_r1_k5_s2_e4_c40_se0.25', 'ir_r2_k5_s1_e6_c40_se0.25'],  # hard-swish
                # stage 3, 14x14 in
                ['ir_r2_k5_s1_e3_c48_se0.25'],  # hard-swish
                # stage 4, 14x14in
                ['ir_r3_k5_s2_e6_c96_se0.25'],  # hard-swish
                # stage 6, 7x7 in
                ['cn_r1_k1_s1_c576'],  # hard-swish
            ]
    else:
        num_features = 1280
        if 'minimal' in variant:
            act_layer = 'relu'
            arch_def = [
                # stage 0, 112x112 in
                # stage 1, 112x112 in
                ['ir_r1_k3_s2_e4_c24', 'ir_r1_k3_s1_e3_c24'],
                # stage 2, 56x56 in
                # stage 3, 28x28 in
                ['ir_r1_k3_s2_e6_c80', 'ir_r1_k3_s1_e2.5_c80', 'ir_r2_k3_s1_e2.3_c80'],
                # stage 4, 14x14in
                # stage 5, 14x14in
                # stage 6, 7x7 in
            ]
        else:
            act_layer = 'hard_swish'
            arch_def = [
                # stage 0, 112x112 in
                ['ds_r1_k3_s1_e1_c16_nre'],  # relu
                # stage 1, 112x112 in
                ['ir_r1_k3_s2_e4_c24_nre', 'ir_r1_k3_s1_e3_c24_nre'],  # relu
                # stage 2, 56x56 in
                ['ir_r3_k5_s2_e3_c40_se0.25_nre'],  # relu
                # stage 3, 28x28 in
                ['ir_r1_k3_s2_e6_c80', 'ir_r1_k3_s1_e2.5_c80', 'ir_r2_k3_s1_e2.3_c80'],  # hard-swish
                # stage 4, 14x14in
                ['ir_r2_k3_s1_e6_c112_se0.25'],  # hard-swish
                # stage 5, 14x14in
                ['ir_r3_k5_s2_e6_c160_se0.25'],  # hard-swish
                # stage 6, 7x7 in
                ['cn_r1_k1_s1_c960'],  # hard-swish
            ]
    with layer_config_kwargs(kwargs):
        model_kwargs = dict(
            act_layer=resolve_act_layer(kwargs, act_layer),
            se_kwargs=dict(
                act_layer=get_act_layer('relu'), gate_fn=get_act_fn('hard_sigmoid'), reduce_mid=True, divisor=8),
        )
    return model_kwargs

to the model module and replacing

kwargs = geffnet.mobilenetv3._gen_mobilenet_v3([size], channel_multiplier=channel_multiplier)

with

kwargs = _gen_mobilenet_v3([size], channel_multiplier=channel_multiplier)
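
The TypeError above is what Python raises when the value after `**` is not a mapping, which fits a helper that used to return a dict of constructor arguments and now returns a finished model object. A minimal stand-alone reproduction follows; all class and function names in it are hypothetical and unrelated to geffnet's real API.

```python
class Base:
    def __init__(self, width=1):
        self.width = width


def build_kwargs():
    # Assumed older behaviour: the helper returns constructor arguments.
    return {"width": 2}


def build_model():
    # Assumed newer behaviour: the helper returns a finished object instead.
    return Base(width=2)


class Child(Base):
    def __init__(self, factory):
        # `**` only accepts a mapping, so this fails for build_model.
        super().__init__(**factory())


print(Child(build_kwargs).width)  # 2

try:
    Child(build_model)
except TypeError as exc:
    print("TypeError:", exc)
```

Returning the keyword dictionary from a local copy of the helper, as done in the fix above, restores the contract that the subclass constructor expects.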

Copy link

I should probably bundle the necessary geffnet code.

Copy link

xuhao1 commented Oct 17, 2020

I am trying to export the PyTorch model to an ONNX model on my own and ran into this error. With some hard-coding I got a 112x112 model working. However, when I export it like this

dummy_input = torch.randn(1, 3, 112, 112, device='cpu')
torch.onnx.export(model, dummy_input, "lm_model0.onnx", verbose=True, input_names=["input"], output_names=["output"], opset_version=11)

It throws

RuntimeError                              Traceback (most recent call last)
<ipython-input-4-0bb05b5a39e2> in <module>
      1 dummy_input = torch.randn(1, 3, 107, 107, device='cpu')
----> 2 torch.onnx.export(model, dummy_input, "lm_model0.onnx", verbose=True, input_names=["input"], output_names=["output"])

~\Anaconda3\envs\torch\lib\site-packages\torch\onnx\ in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
    206                         do_constant_folding, example_outputs,
    207                         strip_doc_string, dynamic_axes, keep_initializers_as_inputs,
--> 208                         custom_opsets, enable_onnx_checker, use_external_data_format)

~\Anaconda3\envs\torch\lib\site-packages\torch\onnx\ in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
     90             dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs,
     91             custom_opsets=custom_opsets, enable_onnx_checker=enable_onnx_checker,
---> 92             use_external_data_format=use_external_data_format)

~\Anaconda3\envs\torch\lib\site-packages\torch\onnx\ in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, propagate, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format)
    543                     params_dict, opset_version, dynamic_axes, defer_weight_export,
    544                     operator_export_type, strip_doc_string, val_keep_init_as_ip, custom_opsets,
--> 545                     val_add_node_names, val_use_external_data_format, model_file_location)
    546             else:
    547                 proto, export_map = graph._export_onnx(

RuntimeError: ONNX export failed: Couldn't export Python operator HardSwishJitAutoFn

Defined at:
C:\Users\xuhao\Anaconda3\envs\torch\lib\site-packages\geffnet-1.0.0-py3.7.egg\geffnet\activations\ forward
C:\Users\xuhao\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\ _slow_forward
C:\Users\xuhao\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\ _call_impl
C:\Users\xuhao\Develop\OpenSeeFace\ _forward_impl
C:\Users\xuhao\Develop\OpenSeeFace\ forward


I reset geffnet to commit 8795d3298d to solve this issue.
The API of geffnet is really not stable....

Copy link

I have not encountered this issue before. I'm on c450c12ae6ffb1757f62dde3c2765da3c10f6def of geffnet.

Copy link

xuhao1 commented Oct 17, 2020

I modified the UNetUp class and also the input and output sizes to make the original model work on 112x112->14x14, and exported it to ONNX.

from model import *
PATH = "weights/lm_model0.pth"
model = OpenSeeFaceLandmarks("small", 0.5, True)
dummy_input = torch.randn(1, 3, 112, 112, device='cpu')
torch.onnx.export(model, dummy_input, "lm_model0_small.onnx", verbose=True, input_names=["input"], output_names=["output"], opset_version=11)

The network is 4x faster, which is suitable for my application. However, it does not seem to produce any result that matches the face. Do I need to retrain the model, or is there maybe just an error in my heatmap processing?

My heatmap processing code looks like this

float logit(float p)
{
    if (p >= 1.0)
        p = 0.99999;
    else if (p <= 0.0)
        p = 0.0000001;

    p = p / (1 - p);
    return log(p) / 16;
}

CvPts proc_heatmaps(float* heatmaps, int x0, int y0, float scale_x, float scale_y)
{
    CvPts facical_landmarks;
    int heatmap_size = EMI_NN_OUTPUT_SIZE*EMI_NN_OUTPUT_SIZE;
    for (int landmark = 0; landmark < 66; landmark++)
    {
        int offset = heatmap_size * landmark;
        int argmax = -100;
        float maxval = -100;
        for (int i = 0; i < heatmap_size; i++)
        {
            if (heatmaps[offset + i] > maxval)
            {
                argmax = i;
                maxval = heatmaps[offset + i];
            }
        }

        int x = argmax / EMI_NN_OUTPUT_SIZE;
        int y = argmax % EMI_NN_OUTPUT_SIZE;

        float conf = heatmaps[offset + argmax];
        float res = EMI_NN_SIZE - 1;

        int off_x = floor(res * (logit(heatmaps[66 * heatmap_size + offset + argmax])) + 0.1);
        int off_y = floor(res * (logit(heatmaps[2 * 66 * heatmap_size + offset + argmax])) + 0.1);

        float lm_y = (float)y0 + (float)(scale_x * (res * (float(x) / (EMI_NN_OUTPUT_SIZE-1)) + off_x));
        float lm_x = (float)x0 + (float)(scale_y * (res * (float(y) / (EMI_NN_OUTPUT_SIZE-1)) + off_y));

        facical_landmarks.push_back(cv::Point2f(lm_x, lm_y));
    }
    return facical_landmarks;
}
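
For comparison, the same decoding steps can be sketched in plain Python. This is a hedged illustration of the snippet above, not the repository's implementation: `decode_landmarks` is a hypothetical name, the flat layout (confidence maps, then the two offset maps) is assumed from the indexing in the C++ code, and the crop origin and scale factors are left out.

```python
import math


def logit(p, factor=16.0):
    # Clamp away from 0 and 1 so the logarithm stays finite.
    p = min(max(p, 1e-7), 0.99999)
    return math.log(p / (1.0 - p)) / factor


def decode_landmarks(heatmaps, n_points, out_size, in_size, factor=16.0):
    """Decode a flat list of 3 * n_points * out_size**2 values into (x, y, conf)."""
    cells = out_size * out_size
    res = in_size - 1
    landmarks = []
    for lm in range(n_points):
        base = lm * cells
        conf_map = heatmaps[base:base + cells]
        # Index of the strongest cell in this landmark's confidence map.
        argmax = max(range(cells), key=conf_map.__getitem__)
        row, col = divmod(argmax, out_size)
        # Sub-cell offsets come from the second and third groups of maps.
        off_row = math.floor(res * logit(heatmaps[n_points * cells + base + argmax], factor) + 0.1)
        off_col = math.floor(res * logit(heatmaps[2 * n_points * cells + base + argmax], factor) + 0.1)
        y = res * row / (out_size - 1) + off_row
        x = res * col / (out_size - 1) + off_col
        landmarks.append((x, y, conf_map[argmax]))
    return landmarks


# One landmark on a 2x2 heatmap for a 4x4 input; offset values of 0.5 decode to 0.
maps = [0.1, 0.9, 0.2, 0.3] + [0.5] * 4 + [0.5] * 4
print(decode_landmarks(maps, n_points=1, out_size=2, in_size=4))  # [(3.0, 0.0, 0.9)]
```

Passing `factor`, `out_size` and `in_size` as arguments makes it easy to check the decoder against the values stated for each model in this thread.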

Copy link

The landmark decoding looks correct to me, but this matches my experience with just reducing the resolution. I'm not sure why it doesn't work. I suspect the offset layers might give bad results if the resolution is different.

That's the reason I trained the special lower resolution model with less points. I also tried training at 112x112 before, but found that the gain in performance was smaller compared to the reduction in accuracy, so I settled on 56x56 to make the performance gain worthwhile.

Copy link

xuhao1 commented Oct 17, 2020

The landmark decoding looks correct to me, but this matches my experience with just reducing the resolution. I'm not sure why it doesn't work. I suspect the offset layers might give bad results if the resolution is different.

That's the reason I trained the special lower resolution model with less points. I also tried training at 112x112 before, but found that the gain in performance was smaller compared to the reduction in accuracy, so I settled on 56x56 to make the performance gain worthwhile.

Thanks for your patience again.
Can you still find your trained 112x112 model? I would like to give it a try.
I am also trying your 56x56 model. Do the 3D positions of the 56x56 model correspond to the first 30 of the 66 points or not?

Copy link

The 30 points are the green points here and correspond to the blue ones of the 66 point set. I fill the 66 point array with the 30 points data like this.

Copy link

xuhao1 commented Oct 17, 2020

Ok! Thanks again!
If you are interested in flight simulation, you may want to take a look at my own project, into which I am integrating your awesome network.

Copy link

xuhao1 commented Oct 17, 2020

I have tested the model. The 30 point model is too noisy for me =.=. It looks like a 114x114 model with 66 features may be a good balance, since I need more than 75 FPS on CPU.
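
The trade-off between noise and smoothing mentioned earlier in the thread can be illustrated with an exponential moving average over landmark positions. This is only a sketch under assumptions: it is not how OpenSeeFace smooths its output, and `smooth_landmarks`, `alpha` and `min_conf` are hypothetical names chosen for the example.

```python
def smooth_landmarks(previous, current, alpha=0.5, min_conf=0.0):
    """Blend the current (x, y, conf) landmarks into the previous (x, y) estimate."""
    if previous is None:
        # First frame: nothing to blend with yet.
        return [(x, y) for x, y, _ in current]
    smoothed = []
    for (px, py), (x, y, conf) in zip(previous, current):
        if conf < min_conf:
            # Keep the old position when the detection is too weak.
            smoothed.append((px, py))
        else:
            smoothed.append((alpha * x + (1 - alpha) * px,
                             alpha * y + (1 - alpha) * py))
    return smoothed


state = None
for frame in ([(10.0, 20.0, 0.9)], [(14.0, 20.0, 0.9)], [(90.0, 90.0, 0.1)]):
    state = smooth_landmarks(state, frame, alpha=0.5, min_conf=0.5)
    print(state)
```

A smaller `alpha` gives steadier but laggier landmarks, and a higher `min_conf` ignores more low-confidence detections.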

Copy link

emilianavt commented Oct 17, 2020

I have looked a bit more at the result of the downscaled models and it just looks completely broken. I don't think I have my old results, but might try training another 112x112 some time soon.

Copy link

xuhao1 commented Oct 17, 2020

@emilianavt Thanks a lot! Waiting for your update.

Copy link

Training will probably take a few more days as I mainly train overnight.

Copy link

xuhao1 commented Oct 23, 2020

Training will probably take a few more days as I mainly train overnight.

Thanks a lot!
My project is also a spare-time project that I work on at night, so I have enough time to wait hhhhh

Copy link

Validation loss doesn't seem to be improving anymore, so you can give this one a try:

The logit factor is 16, input resolution 112x112, output resolution 14x14.

Copy link

xuhao1 commented Oct 28, 2020

Validation loss doesn't seem to be improving anymore, so you can give this one a try:

The logit factor is 16, input resolution 112x112, output resolution 14x14.

Thanks a lot!!! I have tried it in my own code and this model works pretty well. Its running speed looks similar to model0, with better accuracy, closer to model 1 or 2. I will do more analysis later.

Copy link

xuhao1 commented Oct 29, 2020

Validation loss doesn't seem to be improving anymore, so you can give this one a try:

The logit factor is 16, input resolution 112x112, output resolution 14x14.

BTW, how is your quantization of these models progressing? I see you are trying to quantize them in onnxruntime's repo.

Copy link

I have encountered the same issue as you. Models get smaller but slower when successfully quantized. I haven't bothered evaluating accuracy due to this. Hopefully something can be fixed on the onnxruntime side. I believe in theory quantization should be able to give a good speedup, which would help a lot.
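
Since a quantized model can end up smaller yet slower, as described above, it helps to time both variants in exactly the same way. The helper below is a generic sketch: `measure_fps` is a hypothetical name, and the two stand-in functions only simulate work and would be replaced by real inference calls.

```python
import time


def measure_fps(run_once, warmup=3, runs=20):
    """Return the average number of calls per second of `run_once`."""
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return runs / elapsed if elapsed > 0 else float("inf")


# Stand-ins for two inference calls; replace them with real model invocations.
def fake_float_model():
    sum(i * i for i in range(2000))


def fake_quantized_model():
    sum(i * i for i in range(4000))


print(f"float:     {measure_fps(fake_float_model):.0f} calls/s")
print(f"quantized: {measure_fps(fake_quantized_model):.0f} calls/s")
```

Comparing the two numbers against the required frame rate shows whether quantization is worth pursuing before any accuracy evaluation.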

Copy link

I also trained a faster version. I'm still trying to figure out where the two 112x112 models fit among the other different models quality-wise.

Copy link

xuhao1 commented Oct 31, 2020

I also trained a faster version. I'm still trying to figure out where the two 112x112 models fit among the other different models quality-wise.

In my experience, it is more accurate than lm_model0 at the same inference time, but not as good as lm_model1.
I will test lm_modelU_opt later.

Copy link

xuhao1 commented Oct 31, 2020

Btw, how did you solve the clip issue when quantizing the model?
I wonder if my quantized model gives bad results because I fixed the clip min and max.

Copy link

I only tried dynamic quantization on that model, which worked without solving that issue, but it caused things to run slower.

Copy link

xuhao1 commented Nov 2, 2020

I only tried dynamic quantization on that model, which worked without solving that issue, but it caused things to run slower.

It looks like modelU has similar accuracy to model0 but is much faster in my application.

Copy link

Thank you for your feedback!

Copy link

xuhao1 commented Nov 3, 2020

Thank you for your feedback!

Thank you again for your excellent work!
