Webworker #47

Open

patrick-nurt opened this issue Jul 12, 2018 · 83 comments

Comments
@patrick-nurt

Hi there! Great work on this plugin!
Has anybody managed to run this in a webworker?

@thexiroy

thexiroy commented Jul 12, 2018

I haven't tried it, but it should work. I'm assuming you want to use a web worker so that the UI doesn't get blocked while loading the models or during face detection/recognition? Do keep in mind that you won't be able to access DOM elements from a web worker.

@patrick-nurt
Author

Wanted to use a tensor as input in the worker to avoid DOM elements; I would just transfer the image data to the worker and then transfer the results back.

@akofman
Contributor

akofman commented Jul 16, 2018

I did it and it works, but at the moment tfjs is not compatible with OffscreenCanvas (tensorflow/tfjs#102), so you don't have access to the GPU from a web worker and the result is really slow, actually...
If your purpose is to get better performance, I advise you to try the mtcnn model, which is much faster; see the 0.9.0 version of this project.
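For anyone trying that suggestion, here is a hedged sketch of what switching to mtcnn might look like (API names per the face-api.js README; the '/models' path and minFaceSize value are placeholder assumptions):

```js
// Load the mtcnn weights, then run detection with MtcnnOptions.
// '/models' and minFaceSize are placeholders - tune for your setup.
await faceapi.nets.mtcnn.loadFromUri('/models');
const detections = await faceapi.detectAllFaces(
  input,
  new faceapi.MtcnnOptions({ minFaceSize: 100 })
);
```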

@Floby

Floby commented Aug 13, 2018

@akofman Out of curiosity, how did you make it work?

@shershen08

I had the same experience with the tracking.js library. I managed to put part of it in a web worker (eduardolundgren/tracking.js#99) and completely get rid of the UI lag, but the code was just a POC.

@ScottDellinger

> I did it and it works, but at the moment tfjs is not compatible with OffscreenCanvas (tensorflow/tfjs#102), so you don't have access to the GPU from a web worker and the result is really slow, actually...
> If your purpose is to get better performance, I advise you to try the mtcnn model, which is much faster; see the 0.9.0 version of this project.

@akofman I also would be interested in how you got it working in a worker, if you're willing to share!

@jeffreytgilbert

Bumping this for relevance, because I'm now working on doing the same thing with OffscreenCanvas. My project is here if anyone wants to check out why: I'm rendering a three.js scene, and face detection lets me control the perspective of the 3D scene. Each time the face detection runs (every 100ms), the scene janks out briefly: the 3D part takes about 1ms of CPU time against the 16ms budget for rAF calls, but the face detection takes 40-100ms, so you lose 3 frames at a time, making it look like the rendering is broken. Now I have to find a way to get it under budget, for the reasons @thexiroy mentioned.

@justadudewhohacks
Owner

It's probably better to ask for help at tfjs regarding how to get this running in a web worker.

@jeffreytgilbert

For those interested, there is a pull request open here which was a bit dated, but I've been working on it. The OffscreenCanvas support doesn't appear that involved, but there doesn't appear to be any special consideration for web workers or transferable objects, and those may take longer to integrate. I did see there is a branch for web workers open, but I haven't been through it.

tensorflow/tfjs-core#1221

@jeffreytgilbert

@justadudewhohacks waiting on tfjs updates is a non-starter. TypeScript won't have support for OffscreenCanvas until 3.5.0, and that's not officially released. Even if they do release it, it's currently buggy, and TensorFlow won't build against 3.5.0 without ts-ignore hacks. Even if you do the ts-ignore hacks, the resulting build of TensorFlow running in face-api.js barfs due to those ignored incompatibilities. Models don't work. Flattened maps don't work. The whole thing barfs.

So, I created and updated tickets for those findings. The tfjs-core update thread above was updated. I also created a TypeScript issue that can be tracked here: microsoft/TypeScript#30998

It looks like things on both those projects move pretty quickly, so hopefully this won't fall to the bottom of the thousands of filed issues on the pile and will actually get some updates. For now, I will be attempting to create a fake interface in the worker thread which proxies back to the main JS with updates and commands. The approach is similar to the one @mizchi took here: tensorflow/tfjs#102 (comment)

The difference between @mizchi's approach and mine is that I am attempting to fool the TensorFlow library into believing it is running under non-worker-thread conditions, using my newfound knowledge of how it works (gained by trying to fix their code). The plan is to build a faux document object and window object, complete with the interfaces and values the library checks for when creating a canvas. Instead, I'll return wrapped instances of OffscreenCanvas, maybe with Proxy traps, and catch any calls by the library to APIs I haven't stubbed out and build adaptors for those to canvas. Because TensorFlow does not ever return canvas elements to be drawn to the screen, the only overhead I'll have to worry about is sending the data from video into the worker to be processed. Because I'll be doing this with ImageBitmap, those updates will be zero-copy transferable objects (low latency). I suppose this is somewhat of a "shim" pattern and could be added to face-api.js as an adaptor or a different API call if it works.

For anyone else following this path expecting that a heroic effort down this rabbit hole will maybe allow you to get this to work, a few notes you should consider:

TensorFlow (Google) is a massive project written in TypeScript (Microsoft) which is made up of these monolithic modules:
"dependencies": { "@tensorflow/tfjs-converter": "1.1.0", "@tensorflow/tfjs-core": "1.1.0", "@tensorflow/tfjs-data": "1.1.0", "@tensorflow/tfjs-layers": "1.1.0" }
Each of those has some dependencies of its own. You'll end up having to update core, then update all the other modules that depend on core, then rebuild the whole TensorFlow project with the same version of TypeScript, which for this feature set would be "next" or 3.5.0+, and none of them are compatible with that version out of the box (at this time).

TypeScript appears to be driven primarily by features in the IE/Edge browser suite, because Microsoft owns that project. TensorFlow being Google, but subject to limitations in TypeScript, means A) they have their own blessed version of TypeScript, B) it is older than whatever the most recent release is, and C) it is not as up to date with the features of the web as Google's Chrome browser team supports. Maybe, eventually, once MS moves Edge to the Chromium/Blink engine, TypeScript becomes one with the universe and offers support for these DOM features in sync with at least Chrome and Edge, but ideally all major browsers. That would be awesome!

But, back to the topic: the issues I ended up seeing were compilation errors stemming from code doing type conversion like `float32ToTypedArray`, including some map functions/loops. Those threw errors at compile time, but unit tests ran fine. I was never able to get BrowserStack tests to run correctly, so I'm not sure if it really did work in the browser or not. Best of luck! @ me if you want to chat!

@jeffreytgilbert

```js
function isBrowser() {
  return typeof window === 'object'
    && typeof document !== 'undefined'
    && typeof HTMLImageElement !== 'undefined'
    && typeof HTMLCanvasElement !== 'undefined'
    && typeof HTMLVideoElement !== 'undefined'
    && typeof ImageData !== 'undefined';
}
```

This is a horrible function. If someone (me) wants to fake out a library into thinking it's in a browser, don't stifle that person by doing some oddball browser check (this is not how you detect if you're in a browser) and then be really silent and confusing when the library errors. I've been working against an error I thought was in TensorFlow for hours, only to realize it came from face-api.js code:

```
Error: getEnv - environment is not defined, check isNodejs() and isBrowser()
```

TensorFlow already has browser and node checks and its own idea of environment. Why did you guys reinvent the wheel? :|

@justadudewhohacks
Owner

> I've been working against an error I thought was in TensorFlow for hours, only to realize it came from face-api.js code

I agree that the error message might not be the best, but by looking at the stack trace one could have figured out where the error message comes from.

The browser check is that complex because we want to initialize the corresponding environment of the library only when we are in a valid browser or nodejs environment, to avoid errors at runtime. In any other case, it is up to the user to initialize the environment manually. All environment specifics can be monkey patched using faceapi.env.monkeyPatch.

@jeffreytgilbert

Ok, so I'm posting this update to let everyone in this thread know that it is possible today to fool both TensorFlow and face-api.js into running in a web worker, and that they will run GPU accelerated. However, you shouldn't get your hopes way up for perfectly jank-free UX.

In my app, face detection takes approximately 60ms on a MacBook Pro (Retina, 15-inch, Mid 2015), which is only processing 640x480 stills. The stills are transferred to the worker as zero-copy transferables, so they avoid the serialize/deserialize and structured-clone performance hits.
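(For reference, a minimal sketch of that zero-copy hand-off, assuming a videoEl and worker from your own setup: an ImageBitmap is a transferable object, so listing it in the transfer array moves it instead of cloning it.)

```js
// Grab the current video frame and move it to the worker without a copy.
const bitmap = await createImageBitmap(videoEl);
worker.postMessage({ type: 'detect', bitmap }, [bitmap]);
```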

The app itself is only taking 1-2ms for any given rAF cycle, but visual jank still occurs in Chrome when the worker thread takes longer than expected. I'm not even seeing any GC issues. The jank appears to happen while processing microtasks. I see bunches of timers being set. I'd have to look further into the face-api.js source to see if it's breaking workloads into chunks using 0ms setTimeout calls. If it is, those should be converted to Promises. Letting the browser batch-process stacks of timeouts will definitely result in slower performance if that's what's happening: timeouts can take 2-4ms to resolve, while Promises are almost immediate. I believe the details of promise scheduling are still decided per browser, but if you're in this thread, today you're only interested in the one that supports OffscreenCanvas, and that's Chrome. Chrome handles them async.

Here's the admittedly over-engineered code for creating a worker environment that TensorFlow and face-api.js will run in:

Parent
```js

	var screenCopy = {};
	for(let key in screen){
		screenCopy[key] = +screen[key];
	}
	screenCopy.orientation = {};
	for(let key in screen.orientation){
		if (typeof screen.orientation[key] !== 'function') {
			screenCopy.orientation[key] = screen.orientation[key];
		}
	}

	var visualViewportCopy = {};
	if (typeof window['visualViewport'] !== 'undefined') {
		for(let key in visualViewport){
			if(typeof visualViewport[key] !== 'function') {
				visualViewportCopy[key] = +visualViewport[key];
			}
		}
	}

	var styleMediaCopy = {};
	if (typeof window['styleMedia'] !== 'undefined') {
		for(let key in styleMedia){
			if(typeof styleMedia[key] !== 'function') {
				styleMediaCopy[key] = styleMedia[key];
			}
		}
	}

	let fakeWindow = {};
	Object.getOwnPropertyNames(window).forEach(name => {
		try {
			if (typeof window[name] !== 'function'){
				if (typeof window[name] !== 'object' && 
					name !== 'undefined' && 
					name !== 'NaN' && 
					name !== 'Infinity' && 
					name !== 'event' && 
					name !== 'name' 
				) {
					fakeWindow[name] = window[name];
				} else if (name === 'visualViewport') {
					console.log('want this?', name, JSON.parse(JSON.stringify(window[name])));
				} else if (name === 'styleMedia') {
					console.log('want this?', name, JSON.parse(JSON.stringify(window[name])));
				}
			}
		} catch (ex){
			console.log('Access denied for a window property');
		}
	});

	fakeWindow.screen = screenCopy;
	fakeWindow.visualViewport = visualViewportCopy;
	fakeWindow.styleMedia = styleMediaCopy;
	console.log(fakeWindow);

	let fakeDocument = {};
	for(let name in document){
		try {
			if(name === 'all') {
				// o_O
			} else if (typeof document[name] !== 'function' && typeof document[name] !== 'object') {
					fakeDocument[name] = document[name];
			} else if (typeof document[name] === 'object') {
				fakeDocument[name] = null;
			} else if(typeof document[name] === 'function') {
				fakeDocument[name] = { type:'*function*', name: document[name].name };
			}
		} catch (ex){
			console.log('Access denied for a document property');
		}
	}

```
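The snapshots then need to be handed to the worker; a minimal sketch of that hand-off, with the message shape matching the `event.data.fakeWindow` / `event.data.fakeDocument` reads in the worker code below:

```js
// Post the serializable window/document snapshots into the worker.
const worker = new Worker('worker.js');
worker.postMessage({ fakeWindow, fakeDocument });
```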

Worker
```js

Canvas = HTMLCanvasElement = OffscreenCanvas;
HTMLCanvasElement.name = 'HTMLCanvasElement';
Canvas.name = 'Canvas';

function HTMLImageElement(){}
function HTMLVideoElement(){}

Image = HTMLImageElement;
Video = HTMLVideoElement;

// Canvas.prototype = Object.create(OffscreenCanvas.prototype);

function Storage () {
let _data = {};
this.clear = function(){ return _data = {}; };
this.getItem = function(id){ return _data.hasOwnProperty(id) ? _data[id] : undefined; };
this.removeItem = function(id){ return delete _data[id]; };
this.setItem = function(id, val){ return _data[id] = String(val); };
}
class Document extends EventTarget {}

let window, document = new Document();

// assuming the parent posts { fakeWindow, fakeDocument } as sketched above
self.onmessage = (event) => {
		// do terrible things to the worker's global namespace to fool tensorflow
		for (let key in event.data.fakeWindow) {
			if (!self[key]) {
				self[key] = event.data.fakeWindow[key];
			} 
		}
		window = Window = self;
		localStorage = new Storage();
		console.log('*faked* Window object for the worker', window);

		for (let key in event.data.fakeDocument) {
			if (document[key]) { continue; }

			let d = event.data.fakeDocument[key];
			// request to create a fake function (instead of doing a proxy trap, fake better)
			if (d && d.type && d.type === '*function*') {
				document[key] = function(){ console.log('FAKE instance', key, 'type', document[key].name, '(',document[key].arguments,')'); };
				document[key].name = d.name;
			} else {
				document[key] = d;
			}
		}
		console.log('*faked* Document object for the worker', document);

		function createElement(element) {
			// console.log('FAKE ELELEMT instance', createElement, 'type', createElement, '(', createElement.arguments, ')');
			switch(element) {
				case 'canvas':
					// console.log('creating canvas');
					let canvas = new Canvas(1,1);
					canvas.localName = 'canvas';
					canvas.nodeName = 'CANVAS';
					canvas.tagName = 'CANVAS';
					canvas.nodeType = 1;
					canvas.innerHTML = '';
					canvas.remove = () => { console.log('nope'); };
					// console.log('returning canvas', canvas);
					return canvas;
				default:
					console.log('arg', element);
					break;
			}
		}

		document.createElement = createElement;
		document.location = self.location;
		console.log('*faked* Document object for the worker', document);
};

```

@jeffreytgilbert

[screenshot: Screen Shot 2019-04-23 at 9 05 50 PM]

Here's what I'm seeing btw. I'm going to try your suggestion of looking at using a different model that might process faster.

@jeffreytgilbert

[screenshot: Screen Shot 2019-04-24 at 12 42 36 AM]

Check this out. This is what I'm talking about when I make this correlation. When timers are used in bulk, they appear to mess up the scheduling by spamming the event loop. Promises don't appear to have the same problem: Chrome bundles them up nicely and still has the ability to handle requestAnimationFrame requests. I'd like to see if there's a way in face-api.js to fix the workload splitting so it doesn't rely on setTimeout.

@justadudewhohacks
Owner

> I'd like to see if there's a way in face-api.js to fix the workload splitting so it doesn't rely on setTimeout

Hmm, actually there are no calls to setTimeout, tf.nextFrame or requestAnimationFrame in face-api.js. Could it be that the async behaviour you are encountering here is due to downloading data from the GPU via tf.data()?

@jeffreytgilbert

[screenshot: Screen Shot 2019-04-24 at 2 34 49 AM]

OK, possibly disproved the timer-spam theory. I'm now pointing to the GPU work. While I was overwriting everything sacred (window, document, etc.), I rewrote the setTimeout function so it uses Promises and requestAnimationFrame for 0ms setTimeouts, and falls back to setInterval for actual timers. It worked exactly how I anticipated it would, except that the jank is still present, and the only thing left to point a finger at is the GPU load that's 2 frames long. 👎

So, for everyone watching, probably keep your GPU load in mind. It can block things just like anything else.

@jeffreytgilbert

jeffreytgilbert commented Apr 24, 2019

Timeout replacement code

```js
// More really bad practices to fix closed libraries. Here we overload setTimeout
// to replace it with a flawed promise implementation which sometimes can't be canceled.

let callStackCount = 0;
const maxiumCallStackSize = 750; // chrome specific 10402, of 774 in my tests

setTimeout = function (timerHandler, timeout) {
	let args = Array.prototype.slice.call(arguments);
	args = args.length < 3 ? [] : args.slice(2, args.length);
	if (timeout === 0) {
		if (callStackCount < maxiumCallStackSize) {
			var cancelator = { cancelable: false };
			callStackCount++;
			new Promise(resolve => {
				resolve(timerHandler.apply(self, args));
			});
			return cancelator;
		} else {
			requestAnimationFrame(() => {
				timerHandler.apply(self, args);
			});
			callStackCount = 0;
			return;
		}
	}
	const i = setInterval(() => {
		clearInterval(i);
		timerHandler.apply(self, args);
	}, timeout);
	return i;
};

clearTimeout = (id) => {
	console.log(id);
	if (id && id.cancelable === false) {
		console.error('woops. cant cancel a 0ms timeout anymore! already ran it');
	} else {
		clearInterval(id);
	}
};

// var x = setTimeout((x,y,z)=>{console.log(x,y,z);}, 0, 'hello', 'im', 'cassius');
// var y = setTimeout((x,y,z)=>{console.log(x,y,z);}, 1000, 'hello', 'im', 'cassius');
// clearTimeout(x);
// clearTimeout(y);
```

@hyakki

hyakki commented Apr 24, 2019

Is there any other "cleaner" way to do this?

@justadudewhohacks you mentioned faceapi.env.monkeyPatch, but how does it work exactly?

I mean, let's say I have a main.js that only does this:

```js
const worker = new Worker('worker.js');

worker.postMessage('foo');
```

and a worker where I want to be able to do this:

```js
import * as faceapi from 'face-api.js';

faceapi.loadFaceExpressionModel('assets/models/');

onmessage = function(event) {
  console.log(event);
}
```

Where and how should I use faceapi.env.monkeyPatch?

The error raised atm is the following: Uncaught (in promise) Error: getEnv - environment is not defined, check isNodejs() and isBrowser().

@justadudewhohacks
Owner

@maximeparisse you would monkey patch environment specifics after importing the package. In the nodejs examples we monkey patch Canvas, Image and ImageData for example, as shown here.

Refer to the Environment type to see what can be overridden.
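For reference, a minimal sketch of the nodejs monkey patch referred to above, assuming the `canvas` npm package is installed:

```js
const { Canvas, Image, ImageData } = require('canvas');
const faceapi = require('face-api.js');

// Hand node-canvas implementations to face-api.js in place of the DOM ones.
faceapi.env.monkeyPatch({ Canvas, Image, ImageData });
```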

@hyakki

hyakki commented Apr 25, 2019

@justadudewhohacks: Thank you for your reply, I will give it a shot and give my feedback here in case that can help others.

@hyakki

hyakki commented Apr 25, 2019

@justadudewhohacks: I've tried without success to do that in a web worker. I understand how you patched the env specifics in nodejs, but I can't see how I can reproduce it for a web worker.

@jeffreytgilbert

jeffreytgilbert commented May 6, 2019

@maximeparisse face-api.js only looks for those native methods as a node-server-vs-browser check, per my example above. There are also tfjs detections. You have to set those values inside the worker before loading the libraries in order to fool them into believing they're in a browser. If they fall into the node detection block, they will fail that check too, then bail out to a null result rather than a default (browser). A good patch for face-api.js would be to add worker cases and change the detection to if/elseif/elseif/else style blocks, so there is always a default case and more reasonable fallbacks. This is doable, but the trick is in supporting workers for browsers other than Chrome, or Firefox with the flag set to enable OffscreenCanvas.

@ivanbacher

Anyone manage to get this working?

@jeffreytgilbert

jeffreytgilbert commented Sep 20, 2019

Yes. It turned out the integrated GPU was the biggest insurmountable bottleneck to avoiding blocking the rendering pipeline. I'd be willing to revisit this once TensorFlow and this lib have been updated to allow for OffscreenCanvas support, which is necessary to avoid excessive monkey patching and environmental fake-outs for the two libs.


@josiahbryan

@jeffreytgilbert It appears TensorFlow.js now supports OffscreenCanvas... At least according to this article: https://medium.com/@wl1508/webworker-in-tensorflowjs-49a306ed60aa - does that jibe with what you're seeing? Should face-api/tfjs "just work" in web workers now...?

@josiahbryan

I can sadly confirm that face-api does not Just Work, even with monkeyPatch.

When I do the following in my worker:

```js
import * as faceapi from 'face-api.js';

faceapi.env.monkeyPatch({ Canvas: OffscreenCanvas })
```

I get:

```
Uncaught Error: monkeyPatch - environment is not defined, check isNodejs() and isBrowser()
    at Object.monkeyPatch (index.ts:38)
```

I've checked the isBrowser module, and I've done a TON of monkey patching of my own BEFORE calling monkeyPatch(), and got the following checks to pass in my own code:

```js
// isBrowserCheck is true in my tests
const isBrowserCheck = typeof window === 'object'
  && typeof document !== 'undefined'
  && typeof HTMLImageElement !== 'undefined'
  && typeof HTMLCanvasElement !== 'undefined'
  && typeof HTMLVideoElement !== 'undefined'
  && typeof ImageData !== 'undefined'
  && typeof CanvasRenderingContext2D !== 'undefined';
```

My own monkey patching is based on @jeffreytgilbert's example above, with a few edits to make it compile, and I added CanvasRenderingContext2D = OffscreenCanvasRenderingContext2D;.

Bottom line: faceapi.env.monkeyPatch does not even try to monkey patch because of the error above.

Anyone have any suggestions on how to get this to even work? GPU or no GPU, I just want to try to get it to work. (Chrome 79 on a brand new MacBook Pro 15", so yes, OffscreenCanvas is supported.)

@josiahbryan

Update: Got it working.

How? Use this gist: https://gist.github.com/josiahbryan/770ca1a9d72f1b35c13219ba84dc0495

Import it into your worker. If you have a bundler setup for your worker, just do (assuming you put it in your utils/ folder):

```js
import './utils/faceEnvWorkerPatch';
```

You don't need to call faceapi's monkeyPatch if you use that.

Fair warning: That gist is NOT pretty. It is a conglomeration of hacks and workarounds and whatever else. But it works. Face detection is working for me now in a web worker.

Ideally, face-api would support a WebWorker WITHOUT having to do that horrendous hack of a monkey patch I just uploaded, but, yeah. At least this works now.

@remipassmoilesel

Hi!

These lines of the monkey patch cause errors on the latest version of Chrome when used anywhere other than localhost:

```js
self.HTMLCanvasElement.name = 'HTMLCanvasElement';
self.Canvas.name = 'Canvas';
```

which throws:

```
Cannot assign to read only property name of Canvas
```

As face-api.js works well with OffscreenCanvas, shouldn't this type be added as a possible source? (for detectSingleFace(), for example)

@remipassmoilesel

If you are interested in a PR let me know.

@josiahbryan

Anyone seeing massive memory leaks in web workers in the latest Chrome using detectAllFaces? I've found detectAllFaces to be leaking memory like a sieve - #732

@cindyloo

cindyloo commented Dec 1, 2020

> Hey about your questions on displaying it on a canvas, you actually can just render to a canvas from INSIDE the web worker and it will AUTOMATICALLY update the canvas outside the web worker.
>
> How?
>
> Well, in my code, I passed in the canvas from the main thread like this:
>
> ```js
> const canvasFromMainThread = detectionResultsCanvas.transferControlToOffscreen();
>
> myWebWorker.postMessage({
>     type: "setCanvasFromMainThread",
>     canvasFromMainThread,
> }, [ canvasFromMainThread ]);
> ```

> One tip: the detectionResultsCanvas that I pass IN to the web worker will NOT have the video/image on it that the web worker is using - it is only for OUTPUT of the results.
>
> So, in my main thread, what I really have is something like this:
>
> - CanvasContainer
>   - Canvas 1 - LiveVideoCanvas
>   - Canvas 2 - OverlayCanvas
>
> I use simple CSS to make the LiveVideoCanvas be position: absolute; top: 0; left: 0; z-index: 0, and then the OverlayCanvas is position: absolute; top: 0; left: 0; z-index: 1.
>
> Since the OverlayCanvas is positioned on top of the LiveVideoCanvas, and it is cleared every time before detection results are drawn, and it is transparent, what we end up having is the live video showing underneath the overlay, with the live results from the web worker rendered on top of the live video.
>
> Make sense? Feel free to ping with questions!
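The worker side of the hand-off quoted above might look like this (a sketch: the `type` string matches the quoted postMessage, everything else is assumed):

```js
// Keep the transferred OffscreenCanvas context around; drawing through it
// updates the on-screen canvas automatically, as described in the quote.
let overlayCtx;
self.onmessage = (event) => {
  if (event.data.type === 'setCanvasFromMainThread') {
    overlayCtx = event.data.canvasFromMainThread.getContext('2d');
  }
};
```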

I am pulling frames off the live video element, drawing them into a canvasVideoBuffer, then passing the imageData via postMessage to my worker. The worker draws the face detection box into self.offScreenCanvasCtx, which was set up when the worker and the models were loaded. It's super slow, however - the canvasVideoBuffer doesn't seem to keep up with the video feed (both canvas elements are visible/overlaid over the source video). Should I be setting the canvasVideoBuffer display to none, or changing the frame intervals, or something else to speed it up? Is it because I'm not clearing the canvas target out? Thanks!

@josiahbryan

Hey @cindyloo - yeah, the canvas I use for capture-and-transfer is invisible/hidden. The users see the live <video> element. I do draw that video element to a canvas and then do .getImageData to transfer it to the worker, but the users don't see that canvas, so they don't know how fast/slow I'm drawing. They just see the raw <video> feed, so it feels like real time to them.
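(A sketch of that capture path, with placeholder names; the pixel buffer is transferable, so it moves to the worker without a structured copy.)

```js
// Draw the current video frame into the hidden capture canvas,
// then ship the raw pixels to the worker.
captureCtx.drawImage(videoEl, 0, 0, width, height);
const { data } = captureCtx.getImageData(0, 0, width, height);
worker.postMessage({ width, height, pixels: data.buffer }, [data.buffer]);
// Worker side: new ImageData(new Uint8ClampedArray(event.data.pixels), width, height)
```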

The only thing they see is the "overlay" canvas from the worker - and yeah, that is a bit laggy at 12fps (I actually wrote a tuner to automatically move FPS up/down as needed; can share if desired). The overlay can be slow, but IMHO that's okay as long as the video behind it is buttery smooth.

Ping with specific questions for more details, happy to share!

@cindyloo

cindyloo commented Dec 1, 2020

@josiahbryan so your face detection overlay is a bit laggy too? I need it to be as close to real time as possible. I'm looking at setting up an interval instead of sending the image to the worker on every render... any suggestions are welcome!

@josiahbryan

@cindyloo and anyone else who is interested - I know you didn't ask, but here's the FPS Tuner I use in some of my projects as needed. Using it right now in a bespoke project with face-api and a webworker to tune the FPS up/down as performance allows: https://gist.github.com/josiahbryan/c4716f7c9f051d7c084b1536bc8240a0

@josiahbryan

I mean, I don't need the faces 100% real time - as long as it's within a few frames of the actual real time, I'm happy.

But yeah, the bottleneck is the detection, not the sending of frames. I can get 16-20fps when I force it on a MacBook Pro with tons of RAM and CPU.

I haven't tried this yet with faceapi, but I do it with OpenCV in another worker: Specifically, to improve performance, I resize the video WHEN I DRAW IT into the canvas down to a smaller size:

```js
faceCaptureCtx.drawImage(videoEl, 0, 0, smallWidth, smallHeight);
```

The small size is calculated from the aspect ratio of the video and downsized to something like 420px wide by whatever for best results.
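Hypothetical sizing math for that downscale (all names here are placeholders):

```js
// Clamp the capture width to ~420px and keep the video's aspect ratio.
const smallWidth = 420;
const smallHeight = Math.round(videoEl.videoHeight * (smallWidth / videoEl.videoWidth));
faceCaptureCtx.drawImage(videoEl, 0, 0, smallWidth, smallHeight);
```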

I imagine doing something like that before sending the frame to the worker for face-api would improve both the transfer and the detection, but haven't tried it yet myself @cindyloo

Just fair warning though - it might not matter for your application - but the smaller your source frame (e.g. the smaller you resize the video), the worse the detector will be at finding small faces. It will still work fine with faces that fill "a lot" of the frame, but will fail to find smaller faces as you resize smaller. Just FYI.

@josiahbryan

> These lines of the monkey patch cause errors on the latest version of Chrome when used anywhere other than localhost:

Hey @remipassmoilesel, you're right - when I revisited this project this year, I did have to update the monkeypatch I wrote to work with the latest Chrome. I've updated the gist above, but here is the updated monkeypatch script:

https://gist.github.com/josiahbryan/770ca1a9d72f1b35c13219ba84dc0495

Also, unrelated, but for others (@cindyloo or whoever), here's the writeup on FpsTuner I did a while ago on how to use it:

https://github.com/josiahbryan/fps-auto-tuner

@remipassmoilesel

Hey @josiahbryan! What do you think about just using OffscreenCanvas as an input source? It works well for me.

@josiahbryan

@remipassmoilesel not sure what you mean... can you be more specific? :-D Like a pseudo-code example or bullet points? AFAIK you can't render into the canvas from outside the worker once you transfer a canvas to the worker? Or maybe I'm not understanding what you're saying?

@remipassmoilesel

Hi @josiahbryan,

I mean presently this sample code needs a type assertion on input (as TNetInput):

```ts
public async faceDetection(input: HTMLVideoElement | OffscreenCanvas): Promise<FaceDetection | undefined> {
  return faceApi
    .detectSingleFace(input as TNetInput, new faceApi.TinyFaceDetectorOptions())
    .withFaceLandmarks()
    .withFaceDescriptor()
    .withAgeAndGender();
}
```

OffscreenCanvas cannot be used as input according to the type definitions in face-api.js/build/commonjs/dom/types.d.ts. But I tried it and it works. I don't know if it is a desirable practice, but if it is, it could simplify the use of face-api.js in a worker.

@jeffreytgilbert

jeffreytgilbert commented Dec 7, 2020

The bottlenecks to making this work without jank are not simply CPU-bound. On a machine without a discrete or very new mobile GPU, the GPU will lock up and cause jank on the main thread. See, the GPU is also used to render the main thread, and you can pin it with ease on Intel integrated graphics chips that only have around 40 cores. All the discrete GPUs these days start in the 1000+ core range, so I think the integrated chip just chokes when it can't handle the same quantity of tasks. I was never able to get an exact answer for why that is the case, but it is. So, in your dev tools, you can monitor the GPU too. I would watch that one if you see issues on your target browser/feature support level.

#47 (comment)

@cindyloo

I couldn't get close to the speed of this example, and as such implemented a web worker. Combined with an upgrade to tfjs and throttling of the frames sent to the worker, I have a fairly responsive detector with landmarks. Now that I'm realizing we can't use OffscreenCanvas on mobile browsers, I'm questioning my decision. Any thoughts/comments?

@jeffreytgilbert

It's not mobile browsers, it's embedded webviews in apps. Mobile browsers probably support what you need, unless the Android version is too old.

As some have suggested above, if OffscreenCanvas is not supported for web workers and GPU-accelerated detection, you can alternatively use a more CPU-bound brute-force technique with OpenCV. It's a different library, built for CPU rather than GPU acceleration, so it should work. The web supported portions of this back in the 2010s, when a team at Opera was leading the charge on perspective-driven face detection and 3D, mimicking the Nintendo Wii functionality using your face and orienting the 3D camera and depth based on where you are perceived to be standing relative to the display. It worked well back then on just CPU, but with much simpler models. TensorFlow is a great library, but if you can't use it because of OffscreenCanvas support, browser support, GPU support, etc., the thing to do would be to go down the CPU-accelerated route with simpler models - OpenCV, WebAssembly, and the like - which might be good enough for your use case.

@cindyloo

Thanks @jeffreytgilbert. The example cited above doesn't appear to work on iPhone X Chrome (error attached). I have used OpenCV but not opencv.js. I'll have a look.
[screenshot: IMG_0978]

@jeffreytgilbert

jeffreytgilbert commented Dec 15, 2020

> Thanks @jeffreytgilbert. The example cited above doesn't appear to work on iPhone X Chrome (error attached). I have used OpenCV but not opencv.js. I'll have a look.
>
> [screenshot: IMG_0978]

That's unfortunate. Dumb question, but did you grant the site access to the microphone and camera? That looks like either missing getUserMedia support, which would be very strange since Chrome was the one to introduce the WebRTC APIs, I believe, or potentially an access restriction issue. You wouldn't be able to access this from within the web worker, but you should be able to get to it from the main window.

@ScottDellinger

Trying to use anything other than Safari on an iPhone or iPad is not going to work well, in my experience. Apple doesn't give full WebRTC support to those browsers the way it does to Safari.

@jeffreytgilbert

> Trying to use anything other than Safari on an iPhone or iPad is not going to work well, in my experience. Apple doesn't give full WebRTC support to those browsers the way it does to Safari.

I believe they recently discontinued their policy of locking down other browser apps so that they have to use the WebKit engine. Chrome may still use it. Haven't looked to see who has swapped out WebKit for their own engine.

@aendra-rininsland

So, I did this for a project and had it working fairly well in a worker with OffscreenCanvas, but honestly I wasn't getting any better perf from it than having it on the main thread, so I ended up removing it. I think the data transfer between worker and main thread might be the bottleneck?

@josiahbryan

josiahbryan commented Mar 18, 2021 via email

@aendra-rininsland

@josiahbryan Ohhh okay, that totally makes sense. I was rendering the video at the same speed as the face-api.js processing so that there wouldn't be lag when drawing stuff on top of the canvas element, but if you want the video to play in realtime, that would be beneficial. Cool! 👍

I'm still a bit of a ways off from publishing my project but will post a working example repo once I do, have learned a lot getting this working!

@josiahbryan

josiahbryan commented Mar 18, 2021 via email

@jeffreytgilbert

You definitely don't need to process every video frame. If you do that, it will choke. Rather than making it arbitrary (say, running capture at 12FPS), you can make it async, so a capture only happens when the previous detection has finished and the background thread is ready to process a new frame. If this still causes too many hiccups on the main thread, it's likely the video card isn't fast enough to run the ML and also handle the main UI thread. In Chrome, last I checked, even from a worker thread the GPU is not isolated from the main UI thread, so your GPU ends up being a bottleneck. If that's what you're seeing - GPU time (which can be evaluated in dev tools under the performance recorder) taking too long - you'll need to get a better GPU, or you can simplify your detection model, or pick a simpler one. OffscreenCanvas uses zero-copy to move data between the parent and worker thread; however, I did notice the capture of the image data from the frame is a non-trivial performance hit.

So, quick recap of steps to take:

  1. check your GPU performance when it runs that detection to see if it takes longer than 16ms as referenced here: Webworker #47 (comment)
  2. only do detection work when you're not currently already doing detection work (promises help; see the sketch after this list)
  3. don't capture too much data when reading from video if you can avoid it. Instead, wait until your promise resolves from the previous detection and then attempt to read more

If that doesn't work, let us know and I'll see if there's something I might have missed.
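A minimal sketch of step 2 above; captureFrame and detectFaces are hypothetical stand-ins for your own drawImage/getImageData capture and worker round-trip:

```js
let busy = false;
function onFrame() {
  requestAnimationFrame(onFrame); // keep the render loop running
  if (busy) return;               // previous detection still in flight
  busy = true;
  const frame = captureFrame();   // e.g. drawImage + getImageData
  detectFaces(frame)              // postMessage + await the worker's reply
    .finally(() => { busy = false; });
}
requestAnimationFrame(onFrame);
```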

@jeffreytgilbert

Also, some food for thought: if you can't buy a new GPU to run the ML without delays, you can fall back to CPU processing, which would be isolated in a web worker on a background thread and won't cause visual jank. The downside of that approach is you will have fewer updates, because each check will take longer, and you're going to spin up the CPU fans while it brute-forces its way through the work. I don't have an exact answer on how you would force TensorFlow onto the CPU, but I seem to remember it doing that when it thought the runtime environment was node.js, before I added the container hacks to fool it into thinking it was in the main UI thread context. I bet it could be done pretty trivially.
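(One way to pin tfjs to the CPU is to set the backend explicitly; tf.setBackend is a real tfjs API, though whether it avoids the jank described above is untested here.)

```js
import * as tf from '@tensorflow/tfjs';

// Force the CPU backend before loading/running any models.
await tf.setBackend('cpu');
await tf.ready();
console.log(tf.getBackend()); // 'cpu'
```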

@zy308718320

Why does BodyPix support ImageData?

@Waishnav

Waishnav commented Feb 16, 2024

> Since the OverlayCanvas is positioned on top of the LiveVideoCanvas, and it is cleared every time before detection results are drawn, and it is transparent, what we end up having is the live video showing underneath the overlay, with the live results from the web worker rendered on top of the live video.

@josiahbryan does this solution work for 30 FPS video? Currently in my application I have everything - video/canvas rendering and face detection - in the main thread, and on video play I try to process each frame. But what ends up happening is that I'm only able to process about 1/10th of all video frames, which is not what I want.

So my question is: for a 1-minute 30 FPS video (i.e. 1800 frames), can I detect all the faces in each and every frame using the web worker solution?
