
terminating with uncaught exception when running in parallel #156

Open
LinusU opened this issue Sep 5, 2014 · 15 comments

LinusU commented Sep 5, 2014

Whenever I try to process a large number of images (it usually happens around image 150; I'm processing at most 16 in parallel), node-opencv crashes and brings down the entire Node process.

OpenCV Error: Bad flag (parameter or structure field) (Unrecognized or unsupported array type) in cvGetMat, file /tmp/opencv-rp0FJY/opencv-2.4.9/modules/core/src/array.cpp, line 2482
libc++abi.dylib: terminating with uncaught exception of type cv::Exception: /tmp/opencv-rp0FJY/opencv-2.4.9/modules/core/src/array.cpp:2482: error: (-206) Unrecognized or unsupported array type in function cvGetMat

If I decrease my parallel limit to 1 everything works as expected.

@salmanulhaq
Contributor

What functions are you calling? I've tried processing a couple of images in parallel through a REST API I built on top of node-opencv, and it worked just fine.

@LinusU
Author

LinusU commented Sep 6, 2014

I'm calling readImage and detectObject with a buffer that I provide, e.g.:

function hasFaces (buffer, cb) {

  opencv.readImage(buffer, function (err, image) {
    if (err) { return cb(err); }

    image.detectObject(opencv.FACE_CASCADE, {}, function (err, faces) {
      if (err) { return cb(err); }

      cb(null, (faces.length > 0));
    });

  });

}

This function is iterated over a set of ~2000 buffers using async with a limit of 16 jobs in parallel.

async.mapLimit(buffers, 16, hasFaces, function (err, results) {
  if (err) { throw err; }

  // results is an array of booleans
});

@salmanulhaq
Contributor

I first ran your example, but instead of feeding it the buffers I read images from my filesystem, and it worked just fine. Then I looked at the error message more closely and realized that it's not a cascade-detection issue at all.

Unrecognized or unsupported array type

OpenCV basically fails to decode your buffer into a valid image. Can you specify how you're populating your buffers?

I remember I had to create a new (binary) Buffer before sending it off to node-opencv for proper decoding; it failed otherwise for me. Here's how I did it:

// read your image in whatever way, e.g. by fetching the raw file from http://abc.com/img.png

var opencvBuffer = new Buffer(normalBuffer, 'binary');

opencv.readImage(opencvBuffer, function (err, image) { /* ... */ });

Hope this helps.

@LinusU
Author

LinusU commented Sep 7, 2014

I just tried again and I'm very sure that the buffers are alright.

I've rerun the script numerous times, only changing the number of jobs that run in parallel. It also seems that the more concurrency there is, the earlier it crashes. Raising it to 128 makes it crash even before it has started all of those jobs.

I have a total of 1798 images loaded into memory as buffers, which makes it easy for me to test this. How many files do you have? Could you try something like the code below, adjusting CONCURRENCY between runs?

var fs = require('fs');
var async = require('async');
var opencv = require('opencv');

var CONCURRENCY = 16;

function hasFaces (buffer, cb) {

  opencv.readImage(buffer, function (err, image) {
    if (err) { return cb(err); }

    image.detectObject(opencv.FACE_CASCADE, {}, function (err, faces) {
      if (err) { return cb(err); }

      cb(null, (faces.length > 0));
    });

  });

}

var list = fs.readdirSync('./photos');
var buffers = list.map(function (name) { return fs.readFileSync('./photos/' + name); });

async.mapLimit(buffers, CONCURRENCY, hasFaces, function (err, results) {
  if (err) { throw err; }

  console.log('IT WORKS!');
});

Thank you for taking time with this :)

@LinusU
Author

LinusU commented Sep 7, 2014

I also tried removing the image.detectObject call and only loading the buffers into opencv. It worked and didn't crash a single time.

@LinusU
Author

LinusU commented Sep 7, 2014

Don't know if this is relevant, but I might have found a bug in the C++ code.

Detecting faces is done with the help of uv_queue_work, which basically means that three functions are involved:

1. The one that starts everything. It is called directly from JavaScript and runs on the main loop; in this case it comes from the call to detectObject() on our image object. It collects all the data needed and puts it onto a "baton" object that is accessible from all three functions.

2. The one that does the actual work; it runs on a background thread. Among other things, it creates a new vector, std::vector<cv::Rect> objects;, calls OpenCV's detectMultiScale (which populates the vector), and sets a pointer on the baton to point to the vector: baton->res = objects;. The problem is that objects was only allocated for the duration of this function and is freed when it returns; its memory might at any point be reallocated and overwritten with something else.

3. The one that passes everything back to JavaScript land. It constructs an array out of baton->res and passes it to the callback function inside Node.js. The problem here is that ->res might be garbage by then, and we have no idea what we are returning.

The only catch is that the error message doesn't really seem to indicate this exact problem. It looks more like a similar problem, but with the image data being freed and then overwritten.

@salmanulhaq
Contributor

Even more strangely, the code snippet you shared executes fine on my machine with OpenCV 2.4.9. What is strange is that for the exact same image, copied ~150 times into the folder, with just one face in it, detectObject could only find the face in 92 of them, which means the buffer does indeed get overwritten randomly.

@peterbraden do you think it would impact performance if we combined the cascade load, the image load, and running the classifier into a single function? I'll modify it, try it out, and create a pull request if that works.

@LinusU your assessment makes sense, and the error gives us the reason, but not exactly where and how it happens :) I'll ping this thread once I fix it.

@salmanulhaq
Contributor

@LinusU I need your advice. Instead of loading the cascade file for every buffer you send off: if you want to determine whether a batch of images contains faces, it would make a lot more sense to load the cascade just once and then run the cascade classifier on all of them in one go (first single-threaded on the CPU, and then maybe multi-threaded). Thoughts?

For this, I propose a new function, say opencv.detectObjectBatch(buffers, cascadeFile, function (err, [{faces_in_img0}, ...])). I am assuming the current code wanted to do something similar.

@LinusU
Author

LinusU commented Sep 7, 2014

I'm not too familiar with opencv, but from my understanding the code currently loads the cascade file just once, inside the CascadeClassifierWrap constructor. If you look at the function matrix.detectObject in lib/opencv.js, you'll see that it caches the instantiation for each classifier value, so I think this is already done.
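That caching pattern, roughly as I understand it (a sketch, not the actual lib/opencv.js code; loadCascade stands in for the real, expensive CascadeClassifier construction):

```javascript
// Memoize one classifier instance per cascade file, so the cascade
// is parsed once no matter how many images are processed with it.
var cache = {};
var loadCount = 0; // only here to make the sketch observable

function loadCascade(file) {
  loadCount += 1;
  // in node-opencv this would be the expensive classifier setup
  return { file: file };
}

function getClassifier(file) {
  if (!cache[file]) { cache[file] = loadCascade(file); }
  return cache[file];
}
```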

The only gain I can see with a detectObjectBatch function would be that it would require only one call into C++ and one call back. However, I don't think that's a necessary optimization, since detecting the faces is going to take far longer than the call overhead. We would also need a good way to deal with errors: do we throw away all the results if the last one fails?

I also think the use case would be limited, since having all the photos in memory at the same time is quite bad for performance; that's why I was limiting the whole thing to 16 parallel workers to begin with. Photo collections can often contain thousands of photos, so it's just not a good idea to load them all into memory, send them all into C++ land, scan them one by one, and finally send back a long array of booleans.
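For reference, the memory pressure can be kept down by reading each file inside the worker rather than preloading every buffer. A self-contained sketch of the limiting machinery (a hand-rolled stand-in for async.eachLimit; the names are made up):

```javascript
// Run an async worker over items, at most `limit` at a time, so only
// `limit` file buffers ever exist in memory at once.
function eachLimit(items, limit, worker, done) {
  var index = 0, active = 0, finished = false;

  function next() {
    if (finished) { return; }
    if (index === items.length && active === 0) {
      finished = true;
      return done(null);
    }
    while (active < limit && index < items.length) {
      active += 1;
      worker(items[index++], function (err) {
        active -= 1;
        if (err && !finished) { finished = true; return done(err); }
        next();
      });
    }
  }

  next();
}
```

The worker passed in would then do fs.readFile followed by hasFaces, so at most 16 buffers are alive at any moment.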

The problem I described could be solved with a simple malloc and free, though I sense that the same problem exists in more than one place. It would be really good to add a test case for this.

@LinusU
Author

LinusU commented Sep 7, 2014

Okay, so I added a test case (I didn't really get along with vows; can't I use assert in an async function?) and I actually got a segmentation fault, which was very nice! It shows that I'm on the right track. Most of the time I get the usual OpenCV Error: ....

I'm just going to clean up the code a bit and submit a patch with an added test.

@tonila

tonila commented Jan 3, 2016

+1

@timbowhite

timbowhite commented Sep 28, 2016

+1, still experiencing this with OpenCV 2.4.13 while calling detectObject asynchronously on 13 images.

@timbowhite

timbowhite commented Sep 28, 2016

OK, so the best way to avoid this error, and to avoid having your Node.js process crash (sometimes silently, with no error!), is to completely avoid concurrent calls to detectObject.

Even though detectObject is an async function, you have to call it serially (call it, wait for it to finish, and only then call the next one), which probably means you'll have to process images sequentially.

EDIT: as @LinusU pointed out, this seems to be a bug with opencv.
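That "one-at-a-time" workaround can be wrapped up so callers don't have to think about it. A sketch (serialize is a made-up helper; any callback-style async function can stand in for detectObject):

```javascript
// Wrap a callback-style async function so calls are queued and only
// one invocation is ever in flight at a time.
function serialize(fn) {
  var queue = [];
  var busy = false;

  function runNext() {
    if (busy || queue.length === 0) { return; }
    busy = true;
    var job = queue.shift();
    fn.apply(null, job.args.concat(function () {
      busy = false;
      job.cb.apply(null, arguments);
      runNext();
    }));
  }

  return function () {
    var args = Array.prototype.slice.call(arguments);
    var cb = args.pop();
    queue.push({ args: args, cb: cb });
    runNext();
  };
}
```

For example, wrapping the hasFaces helper from earlier in the thread as var hasFacesSerial = serialize(hasFaces) lets you keep the async.eachLimit plumbing while the actual detection runs one image at a time.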

@timbrandin

I have also stumbled upon this; node-opencv can't handle parallel work on multiple images at the same time.

@btsimonh
Collaborator

@timbrandin:
See the comment at the bottom of #223 for advice on how to update detectObject so that it does not run images in parallel; pull requests invited.
Basically, some functions in OpenCV depend on persistent data in their instance. Where this is the case, you can't run two of those functions at the same time in different threads, so adding a mutex to node-opencv's representation of the class, and then protecting the function concerned in the async thread, solves the issue.
Be aware that this has already been done for some async functions in the master branch (npm install peterbraden/node-opencv); those fixes are not in the published npm version 6.0.0.
