Virtual background or background blur #166

Open

zerolabnet opened this issue Aug 1, 2023 · 10 comments

Comments

@zerolabnet

zerolabnet commented Aug 1, 2023

Any plans to add a virtual background? Or at least a background blur?

@jech
Owner

jech commented Aug 1, 2023

It's not planned for the immediate future, but it would be a nice feature to have.

It should be implemented in the client, so that the background is blurred before the video is even sent to the server. I can see two ways to implement it:

Please note that for privacy reasons Galene bundles all of the libraries it uses, so any library needs to be freely licensed and sufficiently small to be bundled with Galene.
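For context, the usual client-side pattern (a sketch only, not Galene's filter code; pc is a placeholder name) is to draw the processed frames onto a canvas and send the canvas' captured track instead of the raw camera track, so the server only ever receives the blurred video:

    // Sketch: camera frames are drawn (with the background blurred) onto a canvas,
    // and the canvas' capture track is what actually gets sent to the server.
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    const processed = canvas.captureStream(30);   // 30 fps canvas capture

    // pc is a placeholder RTCPeerConnection; swap the outgoing video track.
    async function installBlur(pc) {
        // ... per frame: draw the camera video, with its background blurred, onto ctx ...
        const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
        await sender.replaceTrack(processed.getVideoTracks()[0]);
    }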

@zerolabnet
Author

Jitsi uses TFLite for this purpose:
https://www.tensorflow.org/lite

with the MediaPipe Meet Segmentation model:
https://mediapipe.page.link/meet-mc
https://ai.googleblog.com/2020/10/background-features-in-google-meet.html

and paired with WebAssembly SIMD instructions:
https://v8.dev/features/simd
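
As an aside, if the SIMD build of the WASM binary is used, support can be feature-detected before loading it. A minimal sketch using the wasm-feature-detect package (my assumption; Galene does not bundle it):

    import { simd } from 'wasm-feature-detect';

    // Pick the SIMD or the plain WASM build depending on what the browser supports.
    // The non-SIMD filename is an assumption based on the package's naming scheme.
    const simdSupported = await simd();
    const wasmFile = simdSupported
        ? 'selfie_segmentation_solution_simd_wasm_bin.js'
        : 'selfie_segmentation_solution_wasm_bin.js';
    console.log('Loading', wasmFile);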

@zerolabnet
Author

Perhaps something useful can be gleaned from this repository: https://github.com/minhkhue3214/new_virtualbackground

@zerolabnet
Author

zerolabnet commented Aug 3, 2023

I tried using the @mediapipe/selfie_segmentation library to solve this issue:

<script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js" crossorigin="anonymous"></script>

I added a blur filter:

    'blur': {
        description: "background blur",
        f: function (src, width, height, ctx) {
            if (!(ctx instanceof CanvasRenderingContext2D))
                throw new Error('bad context type');
            if (ctx.canvas.width !== width || ctx.canvas.height !== height) {
                ctx.canvas.width = width;
                ctx.canvas.height = height;
            }

            const selfieSegmentation = new SelfieSegmentation({ locateFile: (file) => {
                return `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`;
            } });
            selfieSegmentation.setOptions({
                modelSelection: 1,
            });

            function onResults(results) {
                ctx.save();

                ctx.drawImage(results.image, 0, 0, width, height);

                ctx.globalCompositeOperation = 'destination-atop';
                ctx.drawImage(results.segmentationMask, 0, 0, width, height);

                ctx.filter = 'blur(16px)';
                ctx.globalCompositeOperation = 'destination-over';
                ctx.drawImage(results.image, 0, 0, width, height);

                ctx.restore();
            }

            selfieSegmentation.onResults(onResults);

            selfieSegmentation.send({ image: ?where? });

            ctx.resetTransform();
            return true;
        },
    },

I've figured out how to apply filters to the canvas, but I don't know what to pass as the image in selfieSegmentation.send({ image: ?where? });

JavaScript Solution API:
https://github.com/google/mediapipe/blob/master/docs/solutions/selfie_segmentation.md

Can you help me?
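
For reference, the Solution API docs linked above show roughly this usage pattern (a sketch; videoElement stands for whatever frame source is available, it is not a Galene name):

    const selfieSegmentation = new SelfieSegmentation({
        locateFile: (file) =>
            `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
    });
    selfieSegmentation.setOptions({ modelSelection: 1 });
    selfieSegmentation.onResults(onResults);

    // Per frame: send() is asynchronous and should be awaited; the results
    // arrive through the onResults callback registered above.
    await selfieSegmentation.send({ image: videoElement });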

@zerolabnet
Author

If I understand correctly, it should be:
selfieSegmentation.send({ image: src });

But the image freezes after 8 seconds, so I guess I'm doing something wrong.

@jech
Owner

jech commented Aug 3, 2023

I'm unable to find the docs of the library you're using, so I'm not sure, but I suspect that a number of the functions you are calling are async, so you'd need to synchronise things (and deal with dropping frames when you're getting overtaken by the video).

Also, you're creating a new instance of the library each time — I suggest you create it just once, in an init method.
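
A minimal sketch of that suggestion, with placeholder names (initSegmentation, processFrame and onResults are not Galene APIs): create the segmenter once, await send(), and drop frames while a previous send() is still in flight.

    // Sketch only: create the segmenter once, then serialise calls to send().
    let selfieSegmentation = null;
    let busy = false;

    function initSegmentation(onResults) {
        selfieSegmentation = new SelfieSegmentation({
            locateFile: (file) =>
                `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
        });
        selfieSegmentation.setOptions({ modelSelection: 1 });
        selfieSegmentation.onResults(onResults);
    }

    async function processFrame(video) {
        if(busy)
            return;        // drop this frame: the previous one is still being segmented
        busy = true;
        try {
            // send() is asynchronous; await it so frames are not queued up
            // faster than the model can process them.
            await selfieSegmentation.send({ image: video });
        } finally {
            busy = false;
        }
    }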

@zerolabnet
Author

I rewrote the code, but I still haven't figured out where I should use asynchronous functions.

I now get a static picture as soon as the video conference starts: the background blur effect is applied successfully, but the image never updates. All the information about the library is in the Google GitHub repository linked above, and the model description is in the links above as well. I would really appreciate your help.

/**
 * @typedef {Object} filterDefinition
 * @property {string} [description]
 * @property {string} [contextType]
 * @property {Object} [contextAttributes]
 * @property {(this: Filter, ctx: RenderingContext) => void} [init]
 * @property {(this: Filter) => void} [cleanup]
 * @property {(this: Filter, src: CanvasImageSource, width: number, height: number, ctx: RenderingContext) => boolean} f
 */

/**
 * @param {MediaStream} stream
 * @param {filterDefinition} definition
 * @constructor
 */
function Filter(stream, definition) {
    /** @ts-ignore */
    if(!HTMLCanvasElement.prototype.captureStream) {
        throw new Error('Filters are not supported on this platform');
    }

    /** @type {MediaStream} */
    this.inputStream = stream;
    /** @type {filterDefinition} */
    this.definition = definition;
    /** @type {number} */
    this.frameRate = 30;
    /** @type {HTMLVideoElement} */
    this.video = document.createElement('video');
    /** @type {HTMLCanvasElement} */
    this.canvas = document.createElement('canvas');
    /** @type {any} */
    this.context = this.canvas.getContext(
        definition.contextType || '2d',
        definition.contextAttributes || null);
    /** @type {MediaStream} */
    this.captureStream = null;
    /** @type {MediaStream} */
    this.outputStream = null;
    /** @type {number} */
    this.timer = null;
    /** @type {number} */
    this.count = 0;
    /** @type {boolean} */
    this.fixedFramerate = false;
    /** @type {Object} */
    this.userdata = {}
    /** @type {MediaStream} */
    this.captureStream = this.canvas.captureStream(0);

    /** @ts-ignore */
    if(!this.captureStream.getTracks()[0].requestFrame) {
        console.warn('captureFrame not supported, using fixed framerate');
        /** @ts-ignore */
        this.captureStream = this.canvas.captureStream(this.frameRate);
        this.fixedFramerate = true;
    }

    this.outputStream = new MediaStream();
    this.outputStream.addTrack(this.captureStream.getTracks()[0]);
    this.inputStream.getTracks().forEach(t => {
        t.onended = e => this.stop();
        if(t.kind != 'video')
            this.outputStream.addTrack(t);
    });
    this.video.srcObject = stream;
    this.video.muted = true;
    this.video.play();
    if(this.definition.init)
        this.definition.init.call(this, this.context);
    this.selfieSegmentation = null; // Store the instance of SelfieSegmentation
    this.timer = setInterval(() => this.draw(), 1000 / this.frameRate);
}

Filter.prototype.draw = function() {
    // check framerate every 30 frames
    if((this.count % 30) === 0) {
        let frameRate = 0;
        this.inputStream.getTracks().forEach(t => {
            if(t.kind === 'video') {
                let r = t.getSettings().frameRate;
                if(r)
                    frameRate = r;
            }
        });
        if(frameRate && frameRate != this.frameRate) {
            this.frameRate = frameRate;
            clearInterval(this.timer);
            this.timer = setInterval(() => this.draw(), 1000 / this.frameRate);
        }
    }

    let ok = false;
    try {
        if (this.video.readyState >= 2) { // Check if video data is ready (HAVE_CURRENT_DATA)
            ok = this.definition.f.call(this, this.video,
                                        this.video.videoWidth,
                                        this.video.videoHeight,
                                        this.context);
        }
    } catch (e) {
        console.error(e);
    }
    if (ok && !this.fixedFramerate) {
        /** @ts-ignore */
        this.captureStream.getTracks()[0].requestFrame();
    }

    this.count++;
};

Filter.prototype.initSelfieSegmentation = function() {
    // Create an instance of SelfieSegmentation and set the options
    this.selfieSegmentation = new SelfieSegmentation({
        locateFile: (file) => {
            return `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`;
        }
    });
    this.selfieSegmentation.setOptions({
        modelSelection: 1,
    });

    // Set the onResults function for SelfieSegmentation
    this.selfieSegmentation.onResults((results) => {
        // Save the current canvas state
        this.context.save();

        // Draw the original frame
        this.context.drawImage(results.image, 0, 0, this.video.videoWidth, this.video.videoHeight);

        // Make all pixels outside the segmentation mask transparent
        this.context.globalCompositeOperation = 'destination-atop';
        this.context.drawImage(results.segmentationMask, 0, 0, this.video.videoWidth, this.video.videoHeight);

        // Blur the context for all subsequent drawings, and then set the original image as the background
        this.context.filter = 'blur(16px)';
        this.context.globalCompositeOperation = 'destination-over';
        this.context.drawImage(results.image, 0, 0, this.video.videoWidth, this.video.videoHeight);

        // Restore the canvas to its original state
        this.context.restore();
    });
};

Filter.prototype.cleanupSelfieSegmentation = function() {
    // Clean up the SelfieSegmentation instance
    if (this.selfieSegmentation) {
        this.selfieSegmentation.close();
        this.selfieSegmentation = null;
    }
};

Filter.prototype.stop = function() {
    if(!this.timer)
        return;
    this.captureStream.getTracks()[0].stop();
    clearInterval(this.timer);
    this.timer = null;
    if(this.definition.cleanup)
        this.definition.cleanup.call(this);
};

/**
 * Removes any filter set on c.
 *
 * @param {Stream} c
 */
function removeFilter(c) {
    let old = c.userdata.filter;
    if(!old)
        return;

    if(!(old instanceof Filter))
        throw new Error('userdata.filter is not a filter');

    c.setStream(old.inputStream);
    old.stop();
    c.userdata.filter = null;
}

/**
 * Sets the filter described by c.userdata.filterDefinition on c.
 *
 * @param {Stream} c
 */
function setFilter(c) {
    removeFilter(c);

    if(!c.userdata.filterDefinition)
        return;

    let filter = new Filter(c.stream, c.userdata.filterDefinition);
    c.setStream(filter.outputStream);
    c.userdata.filter = filter;
}

/**
 * @type {Object.<string,filterDefinition>}
 */
let filters = {
    'mirror-h': {
        description: "Horizontal mirror",
        f: function(src, width, height, ctx) {
            if(!(ctx instanceof CanvasRenderingContext2D))
                throw new Error('bad context type');
            if(ctx.canvas.width !== width || ctx.canvas.height !== height) {
                ctx.canvas.width = width;
                ctx.canvas.height = height;
            }
            ctx.scale(-1, 1);
            ctx.drawImage(src, -width, 0);
            ctx.resetTransform();
            return true;
        },
    },
    'mirror-v': {
        description: "Vertical mirror",
        f: function(src, width, height, ctx) {
            if(!(ctx instanceof CanvasRenderingContext2D))
                throw new Error('bad context type');
            if(ctx.canvas.width !== width || ctx.canvas.height !== height) {
                ctx.canvas.width = width;
                ctx.canvas.height = height;
            }
            ctx.scale(1, -1);
            ctx.drawImage(src, 0, -height);
            ctx.resetTransform();
            return true;
        },
    },
    'blur': {
        description: "Background blur",
        f: function (src, width, height, ctx) {
            if (!(ctx instanceof CanvasRenderingContext2D))
                throw new Error('bad context type');
            if (ctx.canvas.width !== width || ctx.canvas.height !== height) {
                ctx.canvas.width = width;
                ctx.canvas.height = height;
            }

            // Initialize SelfieSegmentation if not done already
            if (!this.selfieSegmentation) {
                this.initSelfieSegmentation();
            }

            // Send the current image to SelfieSegmentation
            this.selfieSegmentation.send({ image: src });

            ctx.resetTransform();
            return true;
        },
        init: function(ctx) {
            // Initialize SelfieSegmentation when the filter is set
            this.initSelfieSegmentation();
        },
        cleanup: function() {
            // Clean up SelfieSegmentation when the filter is removed
            this.cleanupSelfieSegmentation();
        },
    },
};

function addFilters() {
    for(let name in filters) {
        let f = filters[name];
        let d = f.description || name;
        addSelectOption(getSelectElement('filterselect'), d, name);
    }
}

@zerolabnet
Author

zerolabnet commented Aug 4, 2023

If I understand correctly, selfieSegmentation.send() will be called asynchronously as part of the setInterval operation.

After clearing the browser cache, the code I posted yesterday works, but problems appear whenever the video stream changes. For example, when the blackboardMode setting is enabled, the video frame freezes and errors are thrown to the console:

Uncaught (in promise) RuntimeError: memory access out of bounds
    at selfie_segmentation_solution_simd_wasm_bin.wasm:0x46a9ef
    at selfie_segmentation_solution_simd_wasm_bin.wasm:0x46a9d8
    at selfie_segmentation_solution_simd_wasm_bin.wasm:0x14562
    at selfie_segmentation_solution_simd_wasm_bin.wasm:0x12f02
    at ClassHandle.send (VM33 selfie_segmentation_solution_simd_wasm_bin.js:9:131937)
    at ra.i (selfie_segmentation.js:82:496)
    at ua (selfie_segmentation.js:14:299)
    at va.next (selfie_segmentation.js:15:91)
    at b (selfie_segmentation.js:15:330)

Can you give any recommendations?

Maybe you could take a look at the library? It's from Google, not from unknown developers, and the model works well; the same model is used in Jitsi. I made a demo and everything works fine there, but I can't integrate it with your code; I'm stuck and need help. For now I'm loading the library externally by disabling Content-Security-Policy in the webserver.go module, but in the final version I see no problem with bundling the library alongside the rest of the code.

Body Segmentation with MediaPipe and TensorFlow.js:
https://blog.tensorflow.org/2022/01/body-segmentation.html
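
As a sketch of what that blog post's API looks like (my reading of the @tensorflow-models/body-segmentation package; parameter values are placeholders and I haven't tested this against Galene's filter code):

    import * as bodySegmentation from '@tensorflow-models/body-segmentation';

    // Create the segmenter once (MediaPipe SelfieSegmentation model, MediaPipe runtime).
    const segmenter = await bodySegmentation.createSegmenter(
        bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation,
        {
            runtime: 'mediapipe',
            solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation',
            modelType: 'general',
        });

    // Per frame: segment the current video frame and draw it with a blurred background.
    async function blurFrame(video, canvas) {
        const people = await segmenter.segmentPeople(video);
        await bodySegmentation.drawBokehEffect(
            canvas, video, people,
            0.5,   // foregroundThreshold
            15,    // backgroundBlurAmount
            3,     // edgeBlurAmount
            false  // flipHorizontal
        );
    }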

@aguaviva

Galene is awesome because the author has done an excellent job of avoiding feature creep.

To be honest, how much is this feature worth to you? Would you be willing to fund this feature with your cash or time?

@jech
Owner

jech commented Jan 29, 2024

The policy in Galene is to minimise the number of server-side features: the server needs to be simple, economical and rock solid. On the other hand, I have no objection to putting functionality in the client.

I've had some feedback that indicated that background blur is important for some people, and I actually know people who have promised to switch from Zoom to Galene when it implements background blur. The reason I'm still waiting is that the Chrome people have been working on native background blur (https://developer.chrome.com/blog/background-blur), and it would be much preferable to have the feature implemented natively rather than in user code. I suggest waiting some more to see if something comes out of the Chrome effort; if nothing does, I'll definitely consider implementing it in user code.
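
For reference, my reading of the API described in that Chrome post (the backgroundBlur capability from the mediacapture-extensions proposal; the exact names and shapes may still change, so treat this as a sketch):

    async function tryNativeBackgroundBlur() {
        const stream = await navigator.mediaDevices.getUserMedia({ video: true });
        const [track] = stream.getVideoTracks();

        // If the platform can blur the background itself, it advertises a
        // backgroundBlur capability on the video track.
        const capabilities = track.getCapabilities();
        if(!('backgroundBlur' in capabilities)) {
            // No native support: fall back to client-side segmentation.
            return stream;
        }

        // Ask the browser/OS to enable the blur; no per-frame work in JavaScript.
        await track.applyConstraints({ advanced: [{ backgroundBlur: true }] });
        return stream;
    }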
