Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major restructuring of quagga.js main module required to fix decodeSingle, numOfWorkers, etc #105

Closed
ericblade opened this issue Dec 31, 2019 · 5 comments · Fixed by #171

Comments

@ericblade
Copy link
Owner

Related to:
#103 (workers are completely broken after typescript due to removal of the custom UMD module generator that does not work with webpack v4) -- restructure so webpack worker-loader works correctly

#5 (parallel decoding with decodeSingle() is totally broken)

#78 (external reader plugins fail due to using workers) - workers need to be able to receive readers from external sources, or workers must be disabled for using external sources.

The answer to these problems is to restructure the main Quagga library so that it is able to create multiple instances of the reader/decoder system, and handle those instances inside a static module that basically looks like the current interface.

Closing related issues in favor of this one.

@ericblade
Copy link
Owner Author

A few thoughts: A restructuring of how Worker support is handled probably wouldn't have all identical code running in multiple threads -- right now, i would be willing to bet there's no benefit at all, just a lot of overhead, to using multiple threads, since it's just copying the entire Quagga lib into the Worker, and then running the exact same configuration on it.

Splitting it so that different decoders are run on different threads, as well as the live image detection on a different thread, could be quite valuable, however.

@ericblade
Copy link
Owner Author

I'm envisioning having a static object that handles managing a Worker pool and something rather close to the current public interface:

  • init(config, cb, imageWrapper) -- creates a new reader/decoder setup with the given config, returns a promise or callbacks to a function that receives, a new object that provides start/stop/pause/{on|off}detected/{on|off}processed. if not called with an imageWrapper, then setup live inputstream decoding, as it currently does

  • registerReader() -- configures all instances with a new external reader configuration

  • registerResultCollector() -- configures all instances with a new result collector

  • decodeSingle(config, resultCallback) -- basically does init(), attempts to decode the given image. returns a promise or callbacks to a function that receives the decoder output.

  • references to utility classes, such as ImageWrapper, ResultCollector, CameraAccess

Basically, how this would differ to current public usage, is that for live decoding, you'd be handling the object that comes back from init() instead of handling the module, as is currently done.

Internally, the static interface would manage a pool of Workers and/or class instances that would handle the image recognition and decoding tasks.

Right now, a quick look at the main quagga.js points out several "gotcha"s that apply to attempting to do anything in parallel -- there's a large quantity of module-level variables in the quagga.js file that completely prevent parallel execution of multiple different decodes from running at the same time. Probably the largest one being that there's a single _config variable that when changed, would instantly affect all running decodes.

A large part of this change is probably going to involve updating a lot of functions to be more "pure" functions, as well as converting several things into multiple-instance constructable classes.

There's a lot of ways to quickly break this library, taking advantage of it's natural asynchronus qualities .. the most obvious simply being Quagga.init(); Quagga.stop(); which pretty much breaks things if you don't wait for init() to complete before calling stop(). It gets worse from there, with the parallel decoding being something that obviously should work but doesn't.

ANYWAY, I'm entirely too long winded right now, so i'm going to wrap this up, and take anyone else's input into consideration as i embark on this task, while also improving typescript support and updating other stuff as i touch it.

@twsouthwick
Copy link
Contributor

twsouthwick commented Apr 13, 2020

This is blocking a project I was hoping to use Quagga for. I'm happy to help with any of the restructuring. I've never done this kind of thing in JavaScript, but have lots of experience doing it in .NET applications.

I took a look at the library and agree with your assessment. The first step would be to remove the globals that the library relies on and instead use pure functions. If the typescript work were completed, it would be much easier to reason about the variables and where they are going/coming from. However, progress can still be made.

It looks like the main set of variables are the following in Quagga.js:

var _inputStream,
    _framegrabber,
    _stopped,
    _canvasContainer = {
        ctx: {
            image: null,
            overlay: null,
        },
        dom: {
            image: null,
            overlay: null,
        },
    },
    _inputImageWrapper,
    _boxSize,
    _decoder,
    _workerPool = [],
    _onUIThread = true,
    _resultCollector,
    _config = {};

I haven't dug too deeply yet, but it seems like the first thing to do would be clean things up to access these variables via some sort of context object such as the following:

class QuaggaContext{
    constructor(
        public config: QuaggaJSConfigObject
    ){
    }

    public inputStream;
    public framegrabber;
    public stopped : boolean;
    public canvasContainer ;
    public inputImageWrapper;
    public boxSize;
    public decoder;
    public workerPool: any[];
    public onUIThread = true;
    public resultCollector;
}

This could be added slowly; ie a single property per commit. Again, if things were completely in typescript, we could have a bit more guarantee of variable access. Maybe there's something available in the JS ecosystem, but I'm relatively new to it.

Once this context is available, we can pass that around to functions and slowly convert them to pure functions.

@ericblade
Copy link
Owner Author

Yep, I totally agree -- I've been working my way through a lot of the code trying to fix it up for use with TypeScript, and I think that's going to be very valuable.

I'm hoping that the part that is blocking you is the parallel decoding part rather than the Worker part :)

@ericblade
Copy link
Owner Author

Regarding the topic of multi-threading with Workers:

So, with the pull req here #171 .. that we should be able to somehow get a Worker to load the Quagga class, which we then should be able to manage which workers do what with a little smart configuration work.

Unfortunately, there's some things that I'm having some issues with figuring out with regard to Workers, TypeScript, and Webpack. If anyone has any experience with that, I'd love to hear some tips.

In Serratus's version of Workers, there was a custom webpack plugin -- which does not port directly to webpack v4 -- here: #171

It looks like that plugin would load the entire contents of the packaged library as a string inside the packaged library, which could then be referenced by the Worker creation code, to create a BLOB to pass off as the worker code.

I'm not opposed to doing that exact same method -- i just have no idea how to do it in Webpack v4, and going back to v2 is not an option.

I've also attempted to load the 'quagga.js' file using a URI parameter to the Worker, but that doesn't seem to work in either the test environment or inside the quagga-reader-qr app, because the web server won't let you access arbitrary .js files. I've also attempted to ship a worker.js file that includes quagga.js as a module, but then i end up with a worker that is basically an exact duplicate of the main file .. but still has the same issue with not being able to access it via URI.

So, Serratus's Blob solution is the only thing that I see that could work. I'd appreciate it if anyone has any other ideas, or has any idea how to do that Blob idea inside webpack v4.

Google isn't finding me a whole lot to work with on how to actually do anything useful with Worker threads.

... I guess another idea, might be to strip Quagga of the Worker support completely, and then write another library that wraps it in some kind of manager that handles the Workers.

ericblade added a commit that referenced this issue May 1, 2020
Fix #5 parallel use of decodeSingle, part of #105, restructuring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants