
Training codes #1

Open
ghost opened this issue Apr 21, 2020 · 31 comments

Comments

@ghost

ghost commented Apr 21, 2020

Great work. Is the code in model.py used for training the ONNX inference models? Any chance you could release the training code?

@emilianavt
Owner

Thank you. The code in model.py was used for training, but I still need to update it for the current version of the models. Once it is updated, I will update this issue.

I do not plan to release the full training code at this point.

@sasanasadiabadi

Are all the models (face detection, landmarking and gaze detection) based on mobilenet-v3?

@emilianavt
Owner

emilianavt commented Apr 27, 2020

Yes. They are all (except for the optional, pretrained retinaface model) basically heatmap regression with a mobilenet-v3 backbone. The gaze tracking model works just like the landmark one, only for a single landmark.

Face detection is a bit special. That model outputs a heatmap, radius map and a maxpooled version of the heatmap that is used for decoding the output.

Because the landmarking is quite robust with respect to face size and orientation, the face detection model can get away with outputting only very rough bounding boxes.
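As a rough sketch of how such outputs could be decoded, assuming a heatmap, a radius map and a maxpooled heatmap of the same spatial size (the function name, threshold and box format below are illustrative assumptions, not the actual OpenSeeFace decoding code):

```python
import numpy as np

def decode_detections(heatmap, radius_map, maxpool, threshold=0.6):
    """Turn heatmap / radius map outputs into rough bounding boxes.

    heatmap, radius_map, maxpool: 2D arrays of identical shape (H, W).
    The maxpooled heatmap acts as non-maximum suppression: a cell counts
    as a detection only where the heatmap equals its local maximum.
    """
    peaks = (heatmap == maxpool) & (heatmap > threshold)
    ys, xs = np.nonzero(peaks)
    boxes = []
    for y, x in zip(ys, xs):
        r = radius_map[y, x]  # rough half-size of the face in grid units
        boxes.append((x - r, y - r, 2 * r, 2 * r, heatmap[y, x]))
    return boxes  # (x, y, w, h, confidence) in heatmap coordinates
```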

@sasanasadiabadi

Great, thanks. I was wondering if it's possible to share the pytorch pre-trained weights as well. I'm trying to run the code with OpenCV's dnn module instead of onnxruntime; the current ONNX model does not seem to be compatible with the dnn module.

emilianavt added a commit that referenced this issue Apr 30, 2020
@emilianavt
Owner

I have now updated the model definitions in model.py to match the currently used models.

I have also uploaded the pytorch weights here. My previous attempts at getting the models to work using opencv's dnn module weren't successful, but if you manage to get them to run, I would be very interested in hearing about it!

@sasanasadiabadi

Thanks for sharing the files. I was able to convert the models and run them in OpenCV's dnn module. FPS seems to be quite similar with both onnxruntime and the dnn module. I will update you when the inference code is complete.

@emilianavt
Owner

Thank you for the update!

I'm curious to know which format you converted the models to for use with the dnn module.

@sasanasadiabadi

I converted the pytorch weights to ONNX, and using cv2.dnn.readNetFromONNX() with OpenCV 4.3 I could run the inference (no other change to your original code). However, the outputs of the dnn module and onnxruntime are a little different given the same preprocessed input.

[image: comparison of dnn module and onnxruntime outputs]

I have uploaded the converted weights here.
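For illustration, loading and running an exported ONNX model through the dnn module can look roughly like this (file names, input size and preprocessing below are placeholders, not the exact settings used for the uploaded weights):

```python
import cv2

net = cv2.dnn.readNetFromONNX("lm_model.onnx")

# Preprocess the same way as for onnxruntime, then wrap the image in a 4D blob.
img = cv2.imread("face.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1.0 / 255.0, size=(224, 224), swapRB=True)
net.setInput(blob)
out = net.forward()  # expected to match the layout of the onnxruntime output tensor
```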

@emilianavt
Owner

Thank you, I will give it a try using your converted models.

My first guess about the difference is that it might have something to do with the Upsample layers. The way I use them is apparently only fully supported with ONNX opset 11, which many inference engines do not seem to support yet.

@sasanasadiabadi

Yeah, the problem was caused by align_corners=True in the nn.Upsample layers. The dnn module does not seem to support it yet, so it needs to be set to False for inference with the dnn module. I will try to find a fix.

@sasanasadiabadi

Finally solved. Thanks for pointing out the problem. With align_corners=True, converting from pytorch with ONNX opset 11 and rebuilding OpenCV master (4.3.0-dev), the dnn module returns predictions similar to onnxruntime's. I will re-upload the weights.
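For anyone following along, the export step being discussed would look roughly like this (the model variable, input shape and file name are placeholders):

```python
import torch

def export_opset11(model, path, input_size=224):
    """Export a PyTorch model to ONNX with opset 11, which is needed so that
    nn.Upsample layers using align_corners=True keep their resize behaviour."""
    model.eval()
    dummy = torch.zeros(1, 3, input_size, input_size)  # placeholder input shape
    torch.onnx.export(model, dummy, path, opset_version=11,
                      input_names=["input"], output_names=["output"])

# e.g. export_opset11(landmark_model, "lm_model_opset11.onnx")
```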

@emilianavt
Owner

Nice, thank you for the updates! I tried using the models you previously posted with the dnn module, but got an error. I assume those already needed a more recent version than 4.2.0.

@emilianavt
Owner

emilianavt commented May 21, 2020

With the current OpenCV master branch, I also succeeded in loading the opset 11 models (prior to optimization using onnxruntime). For the full landmark model, I get a pure inference time of around 13ms using onnxruntime with full optimization enabled and around 20ms for OpenCV's DNN module. The results are practically identical.
[image: cv2 DNN output]
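A rough way to reproduce that kind of timing comparison (the model path, input name and input shape are assumptions, and absolute numbers depend on hardware and build options):

```python
import time
import cv2
import numpy as np
import onnxruntime

inp = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input

# onnxruntime with full graph optimization enabled.
opts = onnxruntime.SessionOptions()
opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
sess = onnxruntime.InferenceSession("lm_model_opset11.onnx", opts)
start = time.perf_counter()
for _ in range(100):
    sess.run(None, {"input": inp})
print("onnxruntime:", (time.perf_counter() - start) / 100)

# OpenCV's dnn module on the same ONNX file.
net = cv2.dnn.readNetFromONNX("lm_model_opset11.onnx")
start = time.perf_counter()
for _ in range(100):
    net.setInput(inp)
    net.forward()
print("cv2.dnn:", (time.perf_counter() - start) / 100)
```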

@sasanasadiabadi

Great!
While the landmarking looks very robust to various occlusion and illumination types, I think the pupil detection can be improved. As you are not planning to release the training code, could you share some reference on the data preparation for the pupil model? You mentioned that the landmarking and pupil networks are basically the same, but it seems the data pre-processing is quite different.

@emilianavt
Owner

For pupil detection, the biggest challenge was finding training data with accurate annotations and variance in pose. Most datasets I looked at had a significant number of annotations that were noticeably off. MPIIGaze was the best I could find, but it still had many issues. That's why I ended up training basically on just synthetic data generated with UnityEyes, but that has its own issues.

Another challenge was keeping the gaze model fast, so it could be run in addition to the face landmark model without significantly impacting the frame rate for avatar animation. This led me to select a very small model that is run at a low resolution.

To compensate for that, instead of training the model in a way that lets it adapt to different poses, I align the eyes in a consistent way. This led to another issue: the eye corner points from the landmark model may not match the corner points (if any are given) in the gaze dataset, and most gaze datasets do not include full face images, so it is not possible to run the face landmark model first to align the eyes in a consistent manner. In the end, I calculated eye corner points and pupil centers from the JSON generated by UnityEyes and aligned the eyes with those. The pupil center was then used as a single landmark to be detected by the model. When I was working on this part, I wasn't aware of skimage.transform.SimilarityTransform, so I did things manually, but I would most likely change this if I were to rework the gaze tracking.
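For reference, the kind of eye alignment described above could be sketched with skimage.transform.SimilarityTransform along these lines; the crop size and target corner positions below are made-up values, not the ones used for the released model:

```python
import numpy as np
from skimage import transform

def align_eye(image, corner_left, corner_right, size=32):
    """Warp an eye region so the two corner points land at fixed positions."""
    src = np.array([corner_left, corner_right], dtype=np.float64)
    dst = np.array([[size * 0.15, size * 0.5],
                    [size * 0.85, size * 0.5]], dtype=np.float64)
    tform = transform.SimilarityTransform()
    tform.estimate(src, dst)
    # warp() expects a map from output to input coordinates, hence tform.inverse.
    return transform.warp(image, tform.inverse, output_shape=(size, size))
```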

In the end, this UnityEyes-based alignment didn't quite match the alignment produced from the landmark model's eye corner points, so there is some number fudging in the tracker to get better results.

During training, I also augmented the training data with rather strong blur, noise and color shifts to make up for the synthetic nature of the data. In addition, I overlaid random bright rectangles to imitate reflections on glasses.
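A hedged sketch of that style of augmentation, assuming an HxWx3 uint8 image; all parameter ranges below are made up, not the values used for the released models:

```python
import cv2
import numpy as np

def augment(img, rng=None):
    """Blur, noise, colour shifts and random bright rectangles (fake glare)."""
    rng = rng or np.random.default_rng()
    out = img.astype(np.float32)
    h, w = out.shape[:2]
    # Random Gaussian blur.
    if rng.random() < 0.5:
        k = 2 * int(rng.integers(1, 4)) + 1
        out = cv2.GaussianBlur(out, (k, k), 0)
    # Additive noise and a per-channel colour shift.
    out += rng.normal(0.0, 10.0, out.shape).astype(np.float32)
    out += rng.uniform(-20.0, 20.0, size=3).astype(np.float32)
    # Random bright rectangles to imitate reflections on glasses.
    for _ in range(int(rng.integers(0, 3))):
        x, y = int(rng.integers(0, w - 8)), int(rng.integers(0, h - 8))
        rw, rh = int(rng.integers(4, 16)), int(rng.integers(2, 8))
        out[y:y + rh, x:x + rw] += float(rng.uniform(100.0, 200.0))
    return np.clip(out, 0, 255).astype(np.uint8)
```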

While working on it, I posted some intermediate results on Twitter. The white dot is the model's prediction, the black dot is the target. The big picture is the red channel with the black and white dots overlaid. On the side, in the first column are the landmark map, the two offset layers as predicted and the adaptive wing loss mask. In the second (rightmost) column are the ground truth landmark map, the adaptive wing loss mask (repeated) and the two ground truth offset layers.

Overall, considering the speed of the model, I think it's working decently well, but any improvement would be welcome of course! You can find the UnityEyes preprocessing script here.

@sasanasadiabadi

No doubt about its decent performance. I just found poor performance in some challenging cases, such as extreme glasses reflections and sunny outdoor environments, which is mainly due to the limited training set and may not be a concern for your project. A first improvement could be a post-processing stabilization scheme for the pupils to reduce their jitter in the case of glasses, with no change to the model. After I finish this part, I can update you on whether stabilization makes any improvement.

And thanks for the detailed explanations. I set up the training and could get comparable results.

@emilianavt
Owner

It's good to hear that you could get comparable results. Another thing I thought of, but haven't tried yet, is to train a bigger, slower model which would hopefully give more reliable results and use that to annotate a more diverse training set to train another smaller model.

About stabilizing the pupils, I do a whole bunch of filtering and stabilizing in the code I use to actually animate avatars.
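One simple form such stabilization can take (a generic sketch, not the filtering actually used in the avatar animation code) is exponential smoothing of the pupil positions:

```python
class PupilSmoother:
    """Exponential smoothing of 2D pupil positions; alpha near 1 tracks quickly,
    alpha near 0 suppresses jitter more strongly."""

    def __init__(self, alpha=0.4):
        self.alpha = alpha
        self.state = None

    def update(self, point):
        if self.state is None:
            self.state = point
        else:
            self.state = tuple(self.alpha * p + (1 - self.alpha) * s
                               for p, s in zip(point, self.state))
        return self.state
```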

@sasanasadiabadi

sasanasadiabadi commented May 29, 2020

Actually, I tried an HG (stacked hourglass) network with 2 stacks to train the pupil detector on the UnityEyes set (with lots of augmentation), but it didn't improve much on my test set. I'm now training with 4 stacks.

Oh, I wasn't aware of that stabilization part. Thanks.

@emilianavt
Owner

That's very interesting! I'm curious to hear about your further results.

@emilianavt
Owner

Since I posted the previous pytorch weights already, here are the weights for the new 56x56 30 point model.

@sasanasadiabadi

sasanasadiabadi commented Sep 18, 2020

Thank you for sharing the newly trained weights. Interestingly, in OpenCV DNN the inference time of the new model is higher than that of the previous lightest model (6.5 ms vs 5.5 ms). In onnxruntime, however, the inference time was reduced from 5.5 ms to 1.7 ms. I'm trying to figure out why the DNN module is behaving like that!

@emilianavt
Owner

That's an interesting difference. The new model is pretty much the full size model going by layer and channel count, but the resolution of the channels is lower. Maybe that has something to do with OpenCV DNN behaving differently.

emilianavt pinned this issue Sep 24, 2020
@emilianavt
Owner

One note regarding the inference=True code in model.py. Some users reported that the landmarks were noisier using this rather than the python landmark decoding function here:

OpenSeeFace/tracker.py

Lines 718 to 747 in e805aa2

def landmarks(self, tensor, crop_info):
    crop_x1, crop_y1, scale_x, scale_y, _ = crop_info
    avg_conf = 0
    res = self.res - 1
    # Channel layout: heatmaps first, then X offset maps, then Y offset maps.
    c0, c1, c2 = 66, 132, 198
    if self.model_type < 0:
        c0, c1, c2 = 30, 60, 90
    # Find the heatmap peak for each landmark and read the offsets at that location.
    t_main = tensor[0:c0].reshape((c0, self.out_res_i * self.out_res_i))
    t_m = t_main.argmax(1)
    indices = np.expand_dims(t_m, 1)
    t_conf = np.take_along_axis(t_main, indices, 1).reshape((c0,))
    t_off_x = np.take_along_axis(tensor[c0:c1].reshape((c0, self.out_res_i * self.out_res_i)), indices, 1).reshape((c0,))
    t_off_y = np.take_along_axis(tensor[c1:c2].reshape((c0, self.out_res_i * self.out_res_i)), indices, 1).reshape((c0,))
    t_off_x = res * logit_arr(t_off_x, self.logit_factor)
    t_off_y = res * logit_arr(t_off_y, self.logit_factor)
    # Map the peak position plus decoded offset back into crop/image coordinates.
    t_x = crop_y1 + scale_y * (res * np.floor(t_m / self.out_res_i) / self.out_res + t_off_x)
    t_y = crop_x1 + scale_x * (res * np.floor(np.mod(t_m, self.out_res_i)) / self.out_res + t_off_y)
    avg_conf = np.average(t_conf)
    lms = np.stack([t_x, t_y, t_conf], 1)
    lms[np.isnan(lms).any(axis=1)] = np.array([0., 0., 0.], dtype=np.float32)
    if self.model_type < 0:
        # Expand the 30 point model's output to the full 66 point layout.
        lms = lms[[0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,6,7,7,8,8,9,10,10,11,11,12,21,21,21,22,23,23,23,23,23,13,14,14,15,16,16,17,18,18,19,20,20,24,25,25,25,26,26,27,27,27,24,24,28,28,28,26,29,29,29]]
        #lms[[1,3,4,6,7,9,10,12,13,15,18,20,23,25,38,40,44,46]] += lms[[2,2,5,5,8,8,11,11,14,16,19,21,24,26,39,39,45,45]]
        #lms[[3,4,6,7,9,10,12,13]] += lms[[5,5,8,8,11,11,14,14]]
        #lms[[1,15,18,20,23,25,38,40,44,46]] /= 2.0
        #lms[[3,4,6,7,9,10,12,13]] /= 3.0
    # If the three least confident landmarks are poor, report their average confidence instead.
    part_avg = np.mean(np.partition(lms[:, 2], 3)[0:3])
    if part_avg < 0.65:
        avg_conf = part_avg
    return (avg_conf, np.array(lms))

@Guocode

Guocode commented Oct 22, 2020

Your landmark model is very robust in most cases, such as large poses and exaggerated expressions. I have trained my own model on 300WLP, but it often fails to detect. Can you share how you process the data, such as data augmentation or training tricks?

@emilianavt
Owner

I merged multiple datasets, partially reannotating them with FAN and older versions of the same model for some features, fixed some eye point annotations in various ways and filtered out samples where different annotations disagreed by more than some threshold. I also used very strong augmentation with noise, blur, downscaling, rectangle overlays, strong rotation and random margins at the sides of faces. You can look at the sample images in the results part of the readme to see what the training data looks like.
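As a rough illustration of the agreement filtering step (the threshold and normalization below are made-up values, not the ones actually used):

```python
import numpy as np

def annotations_agree(lms_a, lms_b, threshold=0.05):
    """Keep a sample only if two annotation sets for the same image roughly agree.

    lms_a, lms_b: (N, 2) landmark arrays. The mean point distance is normalized
    by a rough face size so the threshold is resolution independent.
    """
    face_size = np.ptp(lms_a, axis=0).max()  # rough bounding-box extent
    mean_dist = np.linalg.norm(lms_a - lms_b, axis=1).mean()
    return mean_dist / face_size < threshold
```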

@Guocode

Guocode commented Oct 23, 2020

What do you think about regression-based versus heatmap-based methods? I use a regression-based method and added strong data augmentation as you mentioned, but when the face box is not very good, the results get very bad. When I tried your heatmap-based model, even if the box is very strange, e.g. much larger than the actual face, the result is still very stable. Does the robustness come from the model structure or from something else?
In addition, I found that the mouth points in 300W and in WFLW are very different, so I gave up on merging WFLW into the training data. In the readme you did merge them; how did you deal with the gap?

@emilianavt
Owner

emilianavt commented Oct 23, 2020

I can imagine that heatmap based methods lend themselves more to robustness, but I can't give a theoretical reason why. In this case, I think it is a combination of model structure and augmentation.

I don't remember the mouth points causing me issues, as they at least have the same number of points. I deleted the center eye points in WFLW, but that changes the shape of the eye. You can do two-step training: first train on a bigger dataset and, once it has converged, train some more on an adjusted WFLW to take advantage of its higher quality annotations.
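A minimal sketch of such a two-step schedule in PyTorch (the model, loaders, loss and hyperparameters are placeholders, not the actual training setup):

```python
import torch

def train_stage(model, loader, criterion, epochs, lr):
    """One training stage: plain Adam loop over (image, target map) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, target_maps in loader:
            opt.zero_grad()
            loss = criterion(model(images), target_maps)
            loss.backward()
            opt.step()

# Step 1: converge on the large merged dataset, then
# Step 2: continue on the adjusted WFLW data with a lower learning rate, e.g.:
#   train_stage(model, merged_loader, criterion, epochs=50, lr=1e-3)
#   train_stage(model, wflw_loader, criterion, epochs=10, lr=1e-4)
```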

@GitZinger

  1. Are the face detection and landmark networks independent?
  2. With inference==False, what is the landmark result? The shape looks like (?, 198, 28, 28); how is it trained? If inference==True, the network results come with confidences, but I still have no clue how to train. During training, are the labels just 66 pairs of (x, y)?
  3. Is there any inference code in pytorch that uses the pytorch network to interpret the landmarks, especially the lips?
    Thanks a lot

@emilianavt
Owner

emilianavt commented Nov 17, 2022

  1. Yes.
  2. Heatmaps, X offset maps and Y offset maps for each landmark, with map types grouped together. During training, the labels were turned into these maps. Setting inference to true just bakes the landmark decoding into the model itself.
  3. Please refer to the landmarks function shown in my previous comment here on how to turn the maps into landmark locations. It's not pytorch though. If you mean interpreting the landmarks in the form of blendshapes, please refer to OpenSeeVRMDriver.cs in the examples folder.

@GitZinger

GitZinger commented Nov 17, 2022

Thank you for explaining to me.

  1. How do you convert the ground truth coordinate labels to the heat maps during training for the loss function? I have no idea how to train right now. The PyTorch dataset/dataloader gives coordinates; is there a customized dataset or a conversion from coordinates to maps?
    Or is your AdapWingLoss able to take the landmark network's heat map results and the ground truth 66 coordinate points to calculate the loss? And there are many magic numbers in the AdapWingLoss; is there any documentation to explain them?
    What if I want to reduce the landmarks to a lower number? Which part can I change?

  2. And how should the heat map be interpreted? Does it contain the whole face from forehead to jaw, or could it be anything, and does it matter whether it covers the whole face or not?

I really appreciate it. @emilianavt

  > 1. Heatmaps, X offset maps and Y offset maps for each landmark, with map types grouped together. During training, the labels were turned into these maps. Setting inference to true just bakes the landmark decoding into the model itself.
  > 2. Please refer to the landmarks function shown in my previous comment here on how to turn the maps into landmark locations. It's not pytorch though. If you mean interpreting the landmarks in the form of blendshapes, please refer to OpenSeeVRMDriver.cs in the examples folder.

@emilianavt
Owner

Please refer to the landmarks function shown in my previous comment in this thread on how to calculate landmarks from the heat and offset maps. This will also help you understand what your training data should look like. Visualizing the maps will help too.

> is there a customized dataset or a conversion from coordinates to maps?

You can create maps from coordinates, but the dataset I used to train is very customized.
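For illustration, a generic way to build heatmaps and offset maps from normalized (x, y) coordinates could look like the sketch below. It does not reproduce the exact encoding used for these models (for example, the decoder shown earlier applies a logit to the offsets); it only shows the general structure of a Gaussian heatmap paired with per-cell X/Y offset maps, grouped by map type:

```python
import numpy as np

def make_maps(landmarks, out_res=28, sigma=1.5):
    """Build one heatmap and two offset maps per landmark from (x, y) points
    given in [0, 1] image coordinates. Generic encoding, for illustration only."""
    n = len(landmarks)
    heat = np.zeros((n, out_res, out_res), dtype=np.float32)
    off_x = np.zeros_like(heat)
    off_y = np.zeros_like(heat)
    grid = np.arange(out_res)
    gy, gx = np.meshgrid(grid, grid, indexing="ij")
    for i, (x, y) in enumerate(landmarks):
        cx, cy = x * (out_res - 1), y * (out_res - 1)
        heat[i] = np.exp(-((gx - cx) ** 2 + (gy - cy) ** 2) / (2 * sigma ** 2))
        # Sub-cell offsets, squashed into [0, 1] (illustrative, not the real encoding).
        off_x[i] = np.clip(cx - gx, -1, 1) * 0.5 + 0.5
        off_y[i] = np.clip(cy - gy, -1, 1) * 0.5 + 0.5
    return np.concatenate([heat, off_x, off_y], axis=0)  # map types grouped together
```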

> or is your AdapWingLoss able to take the landmark network's heat map results and the ground truth 66 coordinate points to calculate the loss?

No.

> and there are many magic numbers in the AdapWingLoss, is there any documentation to explain them?

There isn't. The numbers are mainly for weighting different landmarks.

> what if I want to reduce the landmarks to a lower number? which part can I change?

Please carefully review the code to understand how everything works. It's not a completely trivial change.
