Surface MediaPipe Iris model for the web #2526
Comments
Hey @badlogic, thanks for reaching out! I see two separate problems mentioned in your request.
Regarding (1), I CC'ed @chuoling and @mhays-google for visibility of this request, and possibly to comment on a timeline.
Regarding (2), I can share that the same face geometry logic + (probably a better) face mesh tracking model is what drives the ARKit-like eye blendshapes for AR Puppets in the Google Duo apps (you can check it out in the app). Yes, Face Mesh "normalization" via Face Geometry is not perfect, but it should get you into the ballpark for solving your problem. Another question is whether the released MediaPipe Face Mesh tracking model is good enough for AR puppeteering. I wasn't the person who wrote the Google Duo puppet heuristic, so it's hard for me to share any specifics. I CC'ed @ivan-grishchenko; he should have more to say regarding this topic. |
Hi! We're planning on releasing Iris data points in both our face_mesh and holistic solutions around October. |
Could someone clarify what the difference is between the various distributions?
From what I can tell, the original @tensorflow-models/facemesh package (now deprecated) corresponded to the model from this paper: Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs.
The subsequent release of @tensorflow-models/face-landmarks-detection appears to repackage the old model with the option to opt in to a higher-fidelity (but less performant) model for iris tracking using the advances from this paper: Attention Mesh: High-fidelity Face Mesh Prediction in Real-time. It's unclear whether this model also includes the improvements to eye/lip tracking from that paper.
Today I learned there is also the @mediapipe/face_mesh package from this repo, which doesn't include iris tracking. This confuses me, since this package was published more recently than the TensorFlow one. Is this just the same as the original facemesh package? Which distribution are developers advised to use?
Lastly, like @badlogic I am working on a blend shape puppeteering project and would appreciate guidance on how to achieve this, or better yet, built-in support for some standard blend shapes. AR puppeteering was demonstrated as a use case in the Attention Mesh paper, but in practice, since the mesh is not invariant to head orientation as this issue points out, it's unclear how this was actually implemented. Despite my attempts at normalization, the model often exhibits undesired blend shape activation when the user turns their head. |
I went with @tensorflow-models/face-landmarks-detection after all. The MediaPipe landmark detector exhibits better temporal coherence, i.e. less jitter, and slightly better inference performance, but it's not fit for puppeteering tasks.
@tensorflow-models/face-landmarks-detection with the iris model enabled is computationally heavy, so even on a desktop device, e.g. a Mac mini with an integrated Intel GPU, it may be too slow. I'm getting good results for head pose tracking by applying a simple windowed mean filter. I had to employ a somewhat complex filter for the eye-close blend shape, which still falls apart for head rotations around the y-axis at around 35°. The reason is that the eye that becomes occluded is no longer tracked by the iris model in that configuration, so the landmarks revert to what the less accurate face landmark detector produces. For my purposes, I can probably figure something out, e.g. gate the eye state based on how confident the iris model was.
The next big problem is mouth blend shapes, which may mean the end of my little project. Neither the TF.js model nor the MediaPipe model is accurate enough for anything other than almost binary open/close detection. I may apply postprocessing similar to what I described above, but that's a much harder problem to solve with classical image processing approaches.
I assume the fully optimized MediaPipe model including iris detection is likely reserved for Duo, so I assume we'll not get access to it.
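The "simple windowed mean filter" mentioned above can be sketched as a moving average over recent head-pose samples. The window size and the Euler-angle pose representation here are assumptions, not details from the thread:

```javascript
// Minimal sketch of a windowed mean filter for head-pose smoothing.
// Assumptions: poses arrive as {pitch, yaw, roll} Euler angles, and a
// small fixed window (here 8 frames) is enough to suppress jitter.
class WindowedMeanFilter {
  constructor(windowSize = 8) {
    this.windowSize = windowSize;
    this.samples = [];
  }

  // Push the latest pose sample and return the mean over the window.
  push(pose) {
    this.samples.push(pose);
    if (this.samples.length > this.windowSize) this.samples.shift();
    const n = this.samples.length;
    const mean = { pitch: 0, yaw: 0, roll: 0 };
    for (const s of this.samples) {
      mean.pitch += s.pitch / n;
      mean.yaw += s.yaw / n;
      mean.roll += s.roll / n;
    }
    return mean;
  }
}
```

A larger window gives smoother output at the cost of added latency, which is the usual trade-off for this kind of filter.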
|
Traditionally for MediaPipe, we say "facemesh" to refer to retrieving face landmarks, while iris detection is a secondary refinement ML model which can optionally be applied afterwards. For reference, see the graphs for iris-tracking-on-top-of-face-landmarks (visualization and live web demo here: https://viz.mediapipe.dev/demo/iris_tracking, as mentioned by @badlogic); that demo is a bit older, but is probably still the best reference.
The @tensorflow packages are part of TF.js, so while they may use MediaPipe models, our team has usually been less involved with those ports, so I'm unable to comment in too much detail there, although hopefully that trend is changing currently and in the near future.
The @mediapipe/facemesh package will contain the latest open-sourced MediaPipe models for face landmarks, as well as the MediaPipe-recommended pre- and post-processing pipelines (usually just what you'd find in the corresponding graphs under our modules/ directory). It is a standalone JS API initially created specifically for face landmarks, requiring minimal setup or extra code (we term these single-purpose turnkey offerings "Solutions APIs"), but it was therefore not designed to handle more complicated alternative use cases.
Note that we have a sibling module "iris_landmark" which can be used for iris-tracking refinements to the face landmarks, but there is no corresponding MediaPipe JS Solution API for it yet, nor has it been integrated into facemesh (see @mhays-google's comments above for an ETA). Unfortunately, I don't believe any lip refinement code or models have been open-sourced yet either (nor do I know of any plans to do so). And as for blend shapes, @ivan-grishchenko will have to weigh in. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you. |
Closing as stale. Please reopen if you'd like to work on this further. |
Here's the response I got from Ivan when inquiring about computing blend shapes:
I've been considering something similar to the NN approach outlined here, by generating a dataset containing input images mapped to blend shape outputs using a pre-rigged ReadyPlayerMe avatar and Blender's Python API. However, this may be too time-consuming for the scope of my project, since I'm not super experienced with TensorFlow. Hopefully a future release of this solution can include automatic blend shape computation. |
Hi MediaPipe community, I am Muhammad Adnan from Pakistan, and I am doing research on iris landmark datasets. I need the dataset used by the MediaPipe iris landmarks module. Could you please help me by providing that dataset, so I can proceed with my research studies? I would be very thankful for this favor. |
System information (Please provide as much relevant information as possible)
Describe the feature and the current behavior/state:
I'm currently using the MediaPipe Face Mesh JavaScript API for a web-based virtual puppeteering application. I am able to derive a head pose from the face geometry data, and I can use mouth and eye landmarks to drive parameters of a puppet, although the accuracy of the mouth landmarks requires heavy post-processing to be useful.
Eyes are a very important part of conveying emotion. As such, the application must be able to track the iris position as well as the open/close state of the eyelids. Sadly, the current eye landmarks do not include iris data, and the open/close state cannot be derived reliably from the eye contour landmarks, if at all, depending on head orientation.
I'm using the landmarks of the metric face geometry mesh (e.g. `results.multiFaceGeometry[0].getMesh().getVertexBufferList()`). I assumed the vertices in this mesh are invariant with respect to head orientation. However, they are not, which changes the relative distances of landmarks in the neutral pose. Here is the metric face geometry mesh at various head orientations, illustrating that the mesh is not head-orientation invariant:
Screen_Recording_2021-09-09_at_14.49.37.1.mp4
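One way to quantify the non-invariance shown in the recording is to track the distance between two fixed mesh vertices across frames: if the metric mesh were head-orientation invariant, that distance would stay constant for a neutral expression. This sketch assumes the vertex buffer is a flat list with five floats per vertex (XYZ position plus UV texture coordinates); the vertex indices you would pass in are hypothetical:

```javascript
// Assumption: the face geometry vertex buffer packs 5 floats per vertex
// (x, y, z, u, v). Only the first three components are used here.
const FLOATS_PER_VERTEX = 5;

// Euclidean distance between vertices i and j in a flat vertex buffer.
function vertexDistance(vertexBuffer, i, j) {
  const a = i * FLOATS_PER_VERTEX;
  const b = j * FLOATS_PER_VERTEX;
  const dx = vertexBuffer[a] - vertexBuffer[b];
  const dy = vertexBuffer[a + 1] - vertexBuffer[b + 1];
  const dz = vertexBuffer[a + 2] - vertexBuffer[b + 2];
  return Math.hypot(dx, dy, dz);
}
```

Logging this value per frame for, say, the two eyelid vertices while holding a neutral expression and turning the head makes the drift directly measurable.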
Without a head-orientation-invariant face mesh, it is difficult or impossible to establish a neutral pose to compare the current pose against and calculate puppeteering parameters, like how closed an eye is, expressed in the range [0, 1]. While I can statistically treat the eyelid distance value to detect eye blinking/winking/closing for an arbitrary but fixed head orientation, the tracking falls apart as soon as the user turns their head.
The eyelid landmarks also never fully close in this model, and the left and right eye are linked to each other. In the video below, only a single eye is closed at a time, while the eye contour landmarks move for both eyes.
Screen.Recording.2021-09-09.at.15.01.24.mp4
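A hedged sketch of how such an eye-openness parameter in [0, 1] could be computed: divide the lid distance by the eye width to reduce scale dependence, then remap between calibrated open/closed ratios captured for a fixed head orientation. The landmark roles and calibration values below are hypothetical, not part of any MediaPipe API:

```javascript
// 3D distance between two landmark points; z is optional.
function dist(p, q) {
  return Math.hypot(p.x - q.x, p.y - q.y, (p.z ?? 0) - (q.z ?? 0));
}

// Eye openness in [0, 1]. `calib` holds ratios measured in a neutral pose:
// calib.openRatio for eyes fully open, calib.closedRatio for eyes closed.
function eyeOpenness(upperLid, lowerLid, innerCorner, outerCorner, calib) {
  // Lid gap relative to eye width, so the value is scale-independent.
  const ratio = dist(upperLid, lowerLid) / dist(innerCorner, outerCorner);
  // Remap [closedRatio, openRatio] -> [0, 1], clamped.
  const t = (ratio - calib.closedRatio) / (calib.openRatio - calib.closedRatio);
  return Math.min(1, Math.max(0, t));
}
```

As the thread notes, this kind of ratio only holds for a roughly fixed head orientation; once the head turns, the calibration no longer applies, which is exactly the problem described above.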
For the iris position, I currently resort to image postprocessing, detecting the iris/pupil inside the eye contour landmarks' bounding box via contrast enhancement and a simple sliding-window histogram approach. The results are convincing enough under a wide range of lighting conditions. However, the additional computations contribute significantly to overall processing time, which isn't ideal, especially in mobile web browsers.
Screen.Recording.2021-09-09.at.14.58.52.mp4
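The classical fallback described above might look roughly like the following, simplified to a darkest-window search over a grayscale eye patch (the pupil being the darkest region). The contrast-enhancement and histogram steps are omitted, and the window size is an assumption:

```javascript
// Find the darkest `win` x `win` window in a grayscale patch (flat array,
// row-major, values 0-255) and return its center as the pupil estimate.
// This is a brute-force sketch; a real implementation would use an
// integral image to avoid re-summing overlapping windows.
function findPupil(gray, width, height, win = 5) {
  let best = { x: 0, y: 0, sum: Infinity };
  for (let y = 0; y + win <= height; y++) {
    for (let x = 0; x + win <= width; x++) {
      let sum = 0;
      for (let dy = 0; dy < win; dy++)
        for (let dx = 0; dx < win; dx++)
          sum += gray[(y + dy) * width + (x + dx)];
      if (sum < best.sum) best = { x: x + win / 2, y: y + win / 2, sum };
    }
  }
  return { x: best.x, y: best.y }; // pupil center in patch coordinates
}
```

Since this runs only on the small eye-contour bounding box rather than the full frame, the cost stays bounded, but as noted above it still adds measurably to per-frame processing time.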
From the MediaPipe Iris web demo, it appears that all or most of these issues could be solved by its model. Sadly, the iris model is not available for use on the web through the TypeScript/JavaScript API.
Will this change the current API? How?
This change would be non-breaking and would consist of additional configuration parameters, as well as additional data in the `Results` object.
Who will benefit from this feature?
Anyone trying to use MediaPipe Face Mesh for facial expression detection and tracking.
Please specify the use cases for this feature:
Virtual puppeteering via blend shapes.
Any Other info:
There is a separate MediaPipe Face Landmarks Detection package from the TensorFlow team. It does contain the iris model; however, its performance in both accuracy and runtime speed is worse than that of the JavaScript package provided by MediaPipe themselves. It's also very confusing to have two similarly named packages, one provided by TensorFlow and one provided by MediaPipe. I understand the MediaPipe models and pipelines do differ from the ones in the TensorFlow package.
Apple's ARKit does have dedicated blend shape support, which would be ideal to have in MediaPipe Face Mesh for any facial expression detection and tracking.