Webworker #47
I haven't tried it, but it should work. I am assuming you want to use a webworker so that the UI doesn't get blocked while loading the models or during face detection/recognition? Do keep in mind that you won't be able to access DOM elements from a webworker. |
Wanted to use a tensor as input in the worker, to avoid DOM elements; I would just transfer the image data to the worker and then transfer the results back. |
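The transfer-the-data approach described above can be sketched roughly like this. The worker file name and message shape are made up for illustration, and a stub worker object stands in for a real `new Worker(...)` so the flow is visible outside a browser:

```javascript
// Sketch of posting a frame to a worker with a transfer list (zero-copy).
// In a real browser you'd do: const worker = new Worker('detect-worker.js');
// and: const bitmap = await createImageBitmap(videoElement);
// Here a stub worker just records what it was sent.
const sentMessages = [];
const worker = {
  postMessage(message, transferList) {
    sentMessages.push({ message, transferList });
  },
};

function sendFrame(worker, bitmap) {
  // Listing the bitmap in the transfer list moves ownership to the worker
  // instead of structured-cloning it, avoiding a copy of the pixel data.
  worker.postMessage({ type: 'frame', bitmap }, [bitmap]);
}

const fakeBitmap = { width: 640, height: 480 }; // stands in for an ImageBitmap
sendFrame(worker, fakeBitmap);
```

The worker would then run detection on the received bitmap and post the results (plain objects, cheap to clone) back the same way.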
I did it and it works, but at the moment tfjs is not compatible with OffscreenCanvas (tensorflow/tfjs#102), so you don't have access to the GPU from a webworker and the result is actually quite slow... |
@akofman Out of curiosity, how did you make it work? |
I had the same experience with |
@akofman I also would be interested in how you got it working in a worker, if you're willing to share! |
Bumping this for relevance because I'm now working on doing the same thing, with OffscreenCanvas. My project is here if anyone wants to check out why. I'm rendering a three.js scene, and face detection allows me to control the perspective of the 3D scene. Each time the face detection runs at 100ms, the scene janks out briefly: the 3D part runs in about 1ms CPU time of the 16ms budget for rAF calls, but the face detection part takes 40ms-100ms, so you lose 3 frames at a time, making it look like the rendering is broken. Now I have to find a way to get it under budget, for the reasons @thexiroy mentioned |
It's probably better to ask for help at tfjs regarding how to get this running in a webworker |
For those interested, there is a pull request open here which was a bit dated, but which I've been working on. The OffscreenCanvas support doesn't appear that involved, but there doesn't appear to be any special consideration for web workers or transferable objects, and those may take longer to integrate. I did see there is a branch open for web workers, but I haven't been through it. |
@justadudewhohacks tfjs updates are a non-starter. TypeScript won't have support for OffscreenCanvas until 3.5.0, and that's not officially released. Even if they do release it, it's currently buggy, and TensorFlow won't build against 3.5.0 without ts-ignore hacks. Even if you do the ts-ignore hacks, the resulting build of TensorFlow running in face-api.js barfs due to those ignored incompatibilities. Models don't work. Flattened maps don't work. The whole thing barfs. So, I created and updated tickets for those findings. The tfjs-core update thread above was updated. I also created a TypeScript issue that can be tracked here: microsoft/TypeScript#30998 It looks like things on both those projects move pretty quickly, so hopefully this won't fall to the bottom of the thousands of filed issues on the pile and will actually get some updates. For now, I will be attempting to create a fake interface in the worker thread which proxies back to the main JS with updates and commands. The approach is similar to the one @mizchi took here: tensorflow/tfjs#102 (comment) The difference between @mizchi's approach and mine is that I am attempting to fool the TensorFlow library into believing it is running under non-worker-thread conditions, using my newfound knowledge of how it works (gained by trying to fix their code). The plan is to build a faux document object and window object, complete with the interfaces and values the library checks for when creating a canvas. Instead, I'll return wrapped instances of OffscreenCanvas, maybe with Proxy traps, and catch any calls by the library to APIs I haven't stubbed out and build adaptors for those to canvas. Because TensorFlow does not ever return canvas elements to be drawn to the screen, the only overhead I'll have to worry about is sending the data from video into the worker to be processed. Because I'll be doing this with ImageBitmap, those updates will be zero-copy transferable objects (low latency).
I suppose this is somewhat of a "shim" pattern and could be added to face-api.js as an adaptor or a different API call if it works. For anyone else following this path expecting that a heroic effort down this rabbit hole will maybe allow you to get this to work, a few notes you should consider: TensorFlow (Google) is a massive project written in TypeScript (Microsoft) which is made up of monolithic modules here: |
This is a horrible function. If someone (me) wants to fake out a library into thinking it's in a browser, don't stifle that person by doing some oddball browser check (this is not how you detect whether you're in a browser) and then being really silent and confusing when the library errors. I've been working against an error I thought was in TensorFlow for hours, only to realize it came from face-api.js code: Error: getEnv - environment is not defined, check isNodejs() and isBrowser() TensorFlow already has browser and node checks and its own idea of environment. Why did you guys reinvent the wheel? :| |
I agree that the error message might not be the best, but by looking at the stack trace one could have figured out where the error message comes from. The browser check is that complex because we want to initialize the corresponding environment of the library only in case we are in a valid browser or nodejs environment, to avoid errors at runtime. In any other case, it is up to the user to initialize the environment manually. All environment specifics can be monkey patched using |
Ok, so I'm posting this update to let everyone in this thread know that it is possible today to fool both TensorFlow and face-api.js into running in a web worker, and that they will run GPU accelerated; however, you shouldn't get your hopes way up for perfectly jank-free UX. In my app, face detection takes approximately 60ms on a MacBook Pro (Retina, 15-inch, Mid 2015), which is only processing 640x480 stills. The stills are transferred to the worker using zero copy, so they avoid the serialize/deserialize and structured-copy performance hits. The app itself is only taking 1-2ms for any given RAF cycle, but visual jank is still occurring on Chrome when the worker thread takes longer than expected. I'm not even seeing any GC issues. The jank appears to happen while processing microtasks. I see bunches of timers being set. I'd have to look further into the face-api.js source to see if it's breaking workloads apart into chunks using 0ms setTimeout calls. If it is, those should be converted to Promises. Allowing the browser to handle batch processing stacks of timeouts will definitely result in slower performance if that's what's happening. Timeouts can take 2-4ms to resolve, whereas Promises resolve almost immediately. I believe the details of how promise scheduling is done are still decided on a per-browser basis, but if you're in this thread, you're only interested today in the browsers that support OffscreenCanvas, and that's Chrome. Chrome handles them async. Here's the admittedly over-engineered code for creating a worker environment that TensorFlow and face-api.js will run in: Parent
Worker:

```js
Canvas = HTMLCanvasElement = OffscreenCanvas;
function HTMLImageElement(){}
Image = HTMLImageElement;
// Canvas.prototype = Object.create(OffscreenCanvas.prototype);
function Storage () { /* … */ }
let window, document = new Document();
```
|
Check this out. This is what I'm talking about when I make this correlation. When timers are used in bulk, they appear to mess up process scheduling by spamming the event loop. Promises don't appear to have the same problem. Chrome bundles them up nicely and still has the ability to handle requestAnimationFrame requests. I'd like to see if there's a way in face-api.js to fix the workload splitting so it doesn't rely on setTimeout |
Hmm, actually there are no calls to setTimeout, tf.nextFrame or requestAnimationFrame in face-api.js. Could it be that the async behaviour you are encountering here is due to downloading data from the GPU via |
OK, I possibly disproved the timer-spam theory. I'm now pointing at the GPU work. While I was overwriting everything sacred (window, document, etc.) I rewrote the setTimeout function so it uses promises and requestAnimationFrame for 0ms setTimeouts, and falls back to setInterval for actual timers. It worked exactly how I anticipated it would, except that the jank is still present, and the only thing left to point a finger at is the GPU load that's 2 frames long. 👎 So, for everyone watching, probably keep your GPU load in mind. It can block things just like anything else. |
Timeout replacement code:

```js
// More really bad practices to fix closed libraries. Here we overload setTimeout
// to replace it with a flawed promise implementation which sometimes can't be canceled.
let callStackCount = 0;
setTimeout = function (timerHandler, timeout) {
  clearTimeout = (id) => {
    console.log(id);
    if (id && id.cancelable === false) {
      console.error('woops. cant cancel a 0ms timeout anymore! already ran it');
    } else {
      clearInterval(id);
    }
  };
  /* … */
};
// var x = setTimeout((x,y,z)=>{console.log(x,y,z);}, 0, 'hello', 'im', 'cassius');
```
|
Is there any other "cleaner" way to do this? @justadudewhohacks you mentioned the monkey patching approach. I mean, let's say I have:

```js
const worker = new Worker('worker.js');
worker.postMessage('foo');
```

and a worker where I want to be able to do this:

```js
import * as faceapi from 'face-api.js';

faceapi.loadFaceExpressionModel('assets/models/');

onmessage = function (event) {
  console.log(event);
};
```

Where and how should I use the monkey patch? The error raised atm is the following: |
@maximeparisse you would monkey patch environment specifics after importing the package. In the nodejs examples we monkey patch Canvas, Image and ImageData, for example, as shown here. Refer to the Environment type to see what can be overridden. |
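For illustration, a minimal sketch of the kind of patch object this describes, as it might look inside a worker. The exact keys depend on face-api.js's Environment type, and a stub class stands in for OffscreenCanvas so the shape is checkable outside a browser:

```javascript
// Hedged sketch: the kind of object you'd hand to faceapi.env.monkeyPatch(...)
// inside a worker. OffscreenCanvas is a browser global; fall back to a stub here.
const CanvasImpl = typeof OffscreenCanvas !== 'undefined'
  ? OffscreenCanvas
  : class OffscreenCanvasStub {
      constructor(width, height) { this.width = width; this.height = height; }
    };

const envPatch = {
  Canvas: CanvasImpl,
  createCanvasElement: () => new CanvasImpl(1, 1),
  // Image and ImageData would typically be patched as well in a real worker,
  // per the Environment type mentioned above.
};

// In the worker, before loading models or running detection:
// faceapi.env.monkeyPatch(envPatch);
```

The key point is that the patch has to be applied after importing face-api.js but before any model loading or detection call touches canvas creation.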
@justadudewhohacks: Thank you for your reply, I will give it a shot and share my feedback here in case it can help others. |
@justadudewhohacks: I've tried without success to do that in a web worker. I understand how you patched the env specifics in nodejs, but I can't see how I can reproduce it for a web worker. |
@maximeparisse face-api.js only looks for those native methods as a node-server-vs-browser check, per my example above. There are also tfjs detections. You have to set those values inside the worker before loading the libraries in order to fool them into believing they're in a browser. If they fall into the node detection block, they will fail that check too, then bail out to a null result rather than a default (browser). A good patch to apply to face-api.js would be to add worker cases and change the detection to if/elseif/elseif/else style blocks so there is always a default case and more reasonable fallbacks. This is doable, but the trick is in supporting workers for browsers other than Chrome, or Firefox with the flag set to enable OffscreenCanvas. |
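The if/elseif/else detection with a default case suggested above might look something like this (a hypothetical sketch, not face-api.js's actual code):

```javascript
// Hypothetical environment detection with worker support and a default case,
// instead of bailing out to a null result when no check matches.
function detectEnvironment() {
  if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    return 'browser';
  } else if (typeof importScripts === 'function') {
    return 'worker'; // dedicated web workers expose importScripts
  } else if (typeof process !== 'undefined' && process.versions && process.versions.node) {
    return 'node';
  } else {
    return 'browser'; // reasonable default rather than failing silently
  }
}
```

With a shape like this, a worker gets its own branch and anything unrecognized still lands on a usable default instead of an undefined environment.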
Anyone manage to get this working? |
Yes. It turned out the integrated GPU was the biggest insurmountable bottleneck when trying to avoid blocking the rendering pipeline. I'd be willing to revisit this once TensorFlow and this lib have been updated to allow for OffscreenCanvas support, which is necessary to avoid excessive monkey patching and environmental fake-outs in the two libs. |
@jeffreytgilbert It appears TensorFlow.js now supports Offscreen Canvas... At least according to this article: https://medium.com/@wl1508/webworker-in-tensorflowjs-49a306ed60aa - does that jive with what you're seeing? Should face-api/tfjs "just work" in WebWorkers now...? |
I can sadly confirm that face-api does not Just Work, even with monkeyPatch. When I do the following in my worker:
I get:
I've checked the
My own monkey patching is based on @jeffreytgilbert's example above, with a few edits to make it compile, plus a few additions. Bottom line: Anyone have any suggestions on how to get this to even work? GPU or no GPU, I just want to try to get it to work. (Chrome 79 on a brand new MacBook Pro 15", so yes, OffscreenCanvas is supported.) |
Update: Got it working. How? Use this gist: https://gist.github.com/josiahbryan/770ca1a9d72f1b35c13219ba84dc0495 Import it into your worker. If you have a bundler setup for your worker (assuming you put the file in your utils folder), just do:

```js
import './utils/faceEnvWorkerPatch';
```

You don't need to call faceapi's monkeyPatch if you use that. Fair warning: that gist is NOT pretty. It is a conglomeration of hacks and workarounds and whatever else. But it works. Face detection is working for me now in a web worker. Ideally, face-api would support a WebWorker WITHOUT having to do that horrendous hack of a monkey patch I just uploaded, but, yeah. At least this works now. |
Hi! These lines of the monkey patch cause errors on the latest version of Chrome when used anywhere other than localhost:
As face-api.js works well with OffscreenCanvas, shouldn't this type be added as a possible source? (for detectSingleFace(), for example) |
If you are interested in a PR let me know. |
Anyone seeing massive memory leaks in WebWorkers in the latest Chrome using |
I am pulling frames off the live video element, drawing them into a |
Hey @cindyloo - yeah, the canvas I use for capture-and-transfer is invisible/hidden. The users see the live video element. The only thing they see on top is the "overlay" canvas from the worker - and yeah, that is a bit laggy at 12fps (I actually wrote a tuner to automatically move FPS up/down as needed, can share if desired) - but yeah, the overlay can be slow, but IMHO that's okay as long as the video behind it is buttery smooth. Ping me with specific questions for more details, happy to share! |
@josiahbryan so your face detection overlay is a bit laggy too? I need it to be as close to real-time as possible, so I'm looking at setting up an interval instead of sending the image to the worker on every render.. any suggestions are welcome! |
@cindyloo and anyone else who is interested - I know you didn't ask, but here's the FPS Tuner I use in some of my projects as needed. Using it right now in a bespoke project with face-api and a webworker to tune the FPS up/down as performance allows: https://gist.github.com/josiahbryan/c4716f7c9f051d7c084b1536bc8240a0 |
I mean, I don't need the faces 100% real time - as long as it's within a few frames of actual real time, I'm happy. But yeah, the bottleneck is the detection, not the sending of frames. I can get 16-20fps when I force it on a MacBook Pro with tons of RAM and CPU. I haven't tried this yet with faceapi, but I do it with OpenCV in another worker. Specifically, to improve performance, I resize the video WHEN I DRAW IT into the canvas, down to a smaller size:
The small size is calculated from the aspect ratio of the video and downsized to something like 420px x whatever for best results. I imagine doing something like that before sending the frame to the worker for face-api would improve both the transfer and the detection, but I haven't tried it myself yet. @cindyloo just fair warning though - it might not matter for your application - but the smaller your source frame (e.g. the smaller you resize the video), the worse the detector will be at finding small faces. It will still work fine with faces that fill "a lot" of the frame, but will fail to find smaller faces as you resize smaller. Just FYI |
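The resize-before-send idea can be sketched like this. The dimension math is pure and runs anywhere; the drawing and transfer steps (commented) are what you'd do in a browser, and the 420px target is just the example figure mentioned above:

```javascript
// Compute a downscaled size that preserves the video's aspect ratio.
function fitToWidth(srcWidth, srcHeight, targetWidth) {
  const scale = targetWidth / srcWidth;
  return { width: targetWidth, height: Math.round(srcHeight * scale) };
}

const small = fitToWidth(1280, 720, 420);

// In a browser you would then draw the video into a small canvas and ship it:
// const canvas = new OffscreenCanvas(small.width, small.height);
// canvas.getContext('2d').drawImage(video, 0, 0, small.width, small.height);
// const bitmap = canvas.transferToImageBitmap();
// worker.postMessage({ bitmap }, [bitmap]);
```

Downscaling at draw time means both the transfer payload and the detector's input shrink together, at the cost of small-face recall noted above.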
Hey @remipassmoilesel, you're right - when I revisited this project this year, I did have to update the monkeypatch I wrote to work with the latest Chrome. I've updated the gist above, but here is the updated monkeypatch script: https://gist.github.com/josiahbryan/770ca1a9d72f1b35c13219ba84dc0495 Also, unrelated, but for others (@cindyloo or whoever), here's the writeup I did a while ago on how to use FpsTuner: |
Hey @josiahbryan! What do you think about just using an OffscreenCanvas as an input source? It works well for me |
@remipassmoilesel not sure what you mean...can you be more specific? :-D Like pseudo code example or bulletpoints? Afaik you can't render into the canvas from outside the worker once you xfer a canvas to the worker? Or maybe I'm not understanding what you're saying? |
Hi @josiahbryan, I mean presently this sample code needs a type assertion on input (
|
The bottlenecks to making this work without jank are not simply CPU-bound. On a machine without a discrete or very new mobile GPU, the GPU will lock up and cause jank on the main thread. See, the GPU is also used to render the main thread, and you can pin it with ease on Intel integrated graphics chips that only have around 40 cores. All the discrete GPUs these days start in the 1000+ core range, so I think the integrated chip just gets overwhelmed when it can't handle the same quantity of tasks. I was never able to get an exact answer for why that is the case, but it's the case. So in your dev tools, you can monitor the GPU too. I would watch that one if you see issues on your target browser/feature support level. |
I couldn't get close to the speed of this example, and as such implemented a webworker. Combined with an upgrade to tfjs and throttling of the frames sent to the worker, I have a fairly responsive detector with landmarks. Now that I'm realizing we can't use OffscreenCanvas on mobile browsers, I'm questioning my decision. Any thoughts/comments? |
It's not mobile browsers. It's embedded webviews in apps. Mobile browsers probably support what you need unless the Android version is too old. As some have suggested above, if OffscreenCanvas is not supported for web workers and GPU-accelerated detection, you can alternatively use a more CPU-bound brute-force technique with OpenCV. It's a different library, built for CPU rather than GPU acceleration, so it should work. The web supported portions of this back in the 2010s, when a team at Opera was leading the charge on perspective-driven face detection and 3D, mimicking the Nintendo Wii functionality using your face and orienting the 3D camera and depth based on where you are perceived to be standing relative to the display. It worked well back then on just CPU, but with much simpler models. TensorFlow is a great library, but if you can't use it because of OffscreenCanvas support, browser support, GPU support, etc., the thing to do would be to go down the CPU-accelerated route with simpler models, using OpenCV and WebAssembly and the like, which might be good enough for your use case. |
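A simple capability check along these lines can decide between the worker/GPU path and a CPU fallback (a sketch; the strategy names are placeholders, not library APIs):

```javascript
// Feature-detect OffscreenCanvas before committing to the worker + GPU path.
function supportsOffscreenCanvas() {
  return typeof OffscreenCanvas !== 'undefined'
    && typeof OffscreenCanvas.prototype.getContext === 'function';
}

// Placeholder branch: pick a detection strategy based on support.
// 'worker-gpu' and 'main-thread-or-cpu' are illustrative labels only.
function chooseStrategy() {
  return supportsOffscreenCanvas() ? 'worker-gpu' : 'main-thread-or-cpu';
}
```

Embedded webviews and older browsers fall into the second branch, where a CPU-bound library like opencv.js becomes the practical option.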
Thanks @jeffreytgilbert. The example cited above doesn't appear to work in Chrome on an iPhone X (error attached). I have used OpenCV but not opencv.js. I'll have a look. |
That's unfortunate. Dumb question, but did you grant the site access to the microphone and camera? That looks like either missing support for getUserMedia, which would be very strange since Chrome were the ones to introduce the WebRTC APIs I believe, or potentially an access restriction issue. You wouldn't be able to access this from within the webworker, but you should be able to get to it from the main window. |
Trying to use anything other than Safari on an iPhone or iPad is not going to work well, in my experience. Apple doesn't give full WebRTC support to those browsers the way it does to Safari. |
I believe they recently discontinued their policy around locking down other browser apps so they have to use the WebKit engine. Chrome may still use it. Haven't looked to see who swapped out WebKit for their own engine. |
So, I did this for a project and had it working fairly well in a worker with OffscreenCanvas, but honestly wasn't getting any better perf from it than having it on the main thread, so I ended up removing it. I think the data transfer between worker and main thread might be the bottleneck? |
Entirely possible that it is the bottleneck. The only reason I ever did things in a worker thread was so that video playing in the main UI thread didn't stutter or get affected. When I tried doing face detection in the main thread, the video display from the camera was seriously affected, but when I do face detection in a worker thread, the video display in the main thread is butter smooth.
|
@josiahbryan Ohhh okay that totally makes sense. I was rendering the video at the same speed as the face-api.js processing so that there wouldn't be lag when drawing stuff on top of the canvas element, but if you're wanting the video to play in realtime that would be beneficial. Cool! 👍 I'm still a bit of a ways off from publishing my project but will post a working example repo once I do, have learned a lot getting this working! |
Yeah, that makes total sense, I see what you're saying in that case. In my case I was capturing from a webcam and showing it on the screen, so people had to see themselves moving in real time, but then the face detection would be just at 12 FPS. So yeah, the face detection would lag a bit, but as long as people could see themselves move in real time, it didn't really matter in my case.
Good work on getting this far! It is so much fun learning things as we go, isn't it? Good job!
|
You definitely don't need to process on every video frame. If you do that, it will choke. Rather than making it arbitrary (like say running it at 12FPS capture) you can have it async so the capture only happens when the detection has finished and the background thread is ready to process a new frame. If this causes too many hiccups in performance on the main thread, it's likely due to the video card not being fast enough to run the ML and also handle the Main UI thread. In Chrome, last I checked, even from a worker thread, the GPU is not isolated from the Main UI thread, so your GPU ends up being a bottleneck. If this is the case you are seeing where GPU (which can be evaluated in dev tools under the performance recorder) is taking too long, you'll need to get a better GPU orrrrr you can simplify your detection model orrrrr pick a simpler one. OffscreenCanvas uses zero copy to move data between parent and worker thread, however I did also notice the capture of the image data from the frame is a non-trivial hit to performance. So, quick recap of steps to take:
If that doesn't work, let us know and I'll see if there's something I might have missed. |
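The capture-only-when-the-worker-is-ready pattern described above can be sketched as follows (the gate is pure logic, exercised here with a stub send function; in a browser the capture itself would be a createImageBitmap of the video):

```javascript
// Only capture a new frame when the worker has finished the previous one,
// instead of pushing frames at an arbitrary fixed rate like 12 FPS.
class FrameGate {
  constructor(send) {
    this.send = send;   // function that ships a frame to the worker
    this.busy = false;
  }
  tryCapture(frame) {
    if (this.busy) return false; // worker still processing; skip this frame
    this.busy = true;
    this.send(frame);
    return true;
  }
  onWorkerDone() {
    this.busy = false; // detection finished; the next frame may be captured
  }
}

const sent = [];
const gate = new FrameGate((frame) => sent.push(frame));
gate.tryCapture('frame-1'); // sent
gate.tryCapture('frame-2'); // skipped, worker still busy
gate.onWorkerDone();        // worker posts its result back
gate.tryCapture('frame-3'); // sent
```

Hooked into a requestAnimationFrame loop, this naturally adapts the capture rate to however long detection actually takes, with no tuning constant to pick.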
Also some food for thought, but if you can't buy a new GPU to run the ML without delays, you can fall back to CPU processing which would be isolated in a web worker on a background thread and won't cause visual jank. The downside on that approach is you will have fewer updates because each check will take longer and you're going to spin the fans on the CPU while it brute forces its way through the work. I don't have an exact answer on how you would force TensorFlow to CPU, but I seem to remember it doing that when it thought the runtime environment was node.js originally before I added the container hacks to fool it into thinking it was in the main UI thread context. I bet it could be done pretty trivially. |
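For anyone wanting to try the CPU route, tfjs does expose a backend switch (`tf.setBackend('cpu')` and `tf.ready()` are real tfjs APIs). The wrapper below is just a sketch, exercised here with a mock `tf` object so it runs without the library installed:

```javascript
// Sketch: force tfjs onto the CPU backend inside a worker, before loading models.
async function forceCpuBackend(tf) {
  await tf.setBackend('cpu'); // tell tfjs to use the CPU backend
  await tf.ready();           // wait until the backend is initialized
  return tf.getBackend();     // report which backend is now active
}

// Mock tf object standing in for the real @tensorflow/tfjs import:
const mockTf = {
  backend: 'webgl',
  async setBackend(name) { this.backend = name; return true; },
  async ready() {},
  getBackend() { return this.backend; },
};

forceCpuBackend(mockTf).then((b) => console.log('active backend:', b));
```

Switching the backend before any model loads keeps all tensor work on the CPU, which a worker thread can then grind through without touching the GPU that the main thread needs for rendering.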
Why BodyPix supports ImageData |
@josiahbryan does this solution work for 30 FPS video? Currently in my application I have everything - video/canvas rendering and face detection - in the main thread, and on video play I'm trying to process each frame. But what ends up happening is that I'm only able to process about 1/10th of all video frames, which is not what I want. So my question is: for a 1-minute 30 FPS video (i.e. 1800 frames), can I detect all the faces in each and every frame using the web worker solution? |
Hi there! Great work on this plugin!
Has anybody managed to run this in a webworker?