Adding video processing capability to Tesseract component. #243
Conversation
Please update this PR to merge into master. That way we can get it to the users quicker.
Reviewed 7 of 9 files at r1, 5 of 5 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @hhuangMITRE)
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.h, line 63 at r1 (raw file):
class TessApiWrapper;

class TesseractOCRTextDetection : public MPFImageDetectionComponentAdapter {
This component now supports more than images, so it no longer makes sense to use this adapter. I updated the code to extend MPFDetectionComponent directly.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 31 at r1 (raw file):
#include <algorithm>
#include <cmath>
#include <fstream>
CLion says this include is not used so I removed it.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):
bool TesseractOCRTextDetection::Supports(MPFDetectionDataType data_type) {
    // Supports images and documents, no audio or video data types.
    return data_type == MPFDetectionDataType::IMAGE || data_type == MPFDetectionDataType::UNKNOWN;
I had to add VIDEO to this list.
This indicates that you haven't tested your changes in the WFM since it would fail on videos right away. Please test at least one video in the WFM and look at the JSON output to make sure it's what you expect.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1385 at r1 (raw file):
MPFVideoCapture cap(job);
int total_frames = cap.GetFrameCount();
I see that you based a lot of this code on VideoCaptureComponent.cpp. A lot of it is not necessary anymore. We should really update that example file when we have more time.
It's better to refer to one of the more recently-developed real components to figure out the best way to do things.
I made the updates to this file.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):
wasRead = cap.Read(frame);
if (wasRead && !frame.empty()) {
    check_default_languages(ocr_fset, job.job_name, run_dir, job_status);
Do we need to run check_default_languages on every frame, or just once per job?

Note that I removed the job_status parameter from check_default_languages since it was not being used.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):
wasRead = cap.Read(frame);
if (wasRead && !frame.empty()) {
    check_default_languages(ocr_fset, job.job_name, run_dir, job_status);
Should we check job_status after process_image_job?

I only found one instance where job_status is set to something meaningful:

job_status = MPF_COULD_NOT_OPEN_DATAFILE

(Note that immediately after that line an exception is thrown, so it doesn't seem like setting the status here is important.)

Otherwise, the status is always set to MPF_DETECTION_SUCCESS or just used to shuttle a value around.
Also, the value of job_status in this struct is never accessed (according to CLion):
struct OCR_results {
std::string text_result;
std::string lang;
MPFDetectionError job_status;
double confidence;
};
Throughout your code I see // Will re-throw exception from thread. It seems we're no longer using job_status. If not, please remove all traces of it.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):
if (inputs.parallel_processing) {
    // When parallel processing is enabled, new APIs must be started up to avoid deadlocking issues rather than
Is there any way to reuse APIs across video frames, or do we have the same problem?
cpp/TesseractOCRTextDetection/plugin-files/config/mpfOCR.ini, line 255 at r1 (raw file):
# If the frame_interval is set to a value less than 1, a frame_interval of 1 will be used,
# so that detections are performed on every frame. For a frame interval N > 1, every N-1 frames will be skipped.
FRAME_INTERVAL: 1
Note that it should no longer be necessary to list common properties shared by all components in the config file for any individual component, unless you want to change the default value to something else. Here are the common properties.
I removed this entry from this file.
cpp/TesseractOCRTextDetection/plugin-files/descriptor/descriptor.json, line 35 at r1 (raw file):
}, { "name": "FRAME_INTERVAL",
Again, it should no longer be necessary to list common properties shared by all components in the config file. Refer to the other comment.
I removed this entry from this file.
Reviewable status: 7 of 9 files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @jrobble)
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
I had to add VIDEO to this list. This indicates that you haven't tested your changes in the WFM since it would fail on videos right away. Please test at least one video in the WFM and look at the JSON output to make sure it's what you expect.
Thanks for catching that!
Sorry, I usually run the WFM checks after the docker build test passes, but I got sidetracked after running the docker build last time. I'm downloading the latest MPF docker images and rerunning the WFM checks again.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Should we check job_status after process_image_job? I only found one instance where job_status is set to something meaningful:

job_status = MPF_COULD_NOT_OPEN_DATAFILE

(Note that immediately after that line an exception is thrown, so it doesn't seem like setting the status here is important.)

Otherwise, the status is always set to MPF_DETECTION_SUCCESS or just used to shuttle a value around. Also, the value of job_status in this struct is never accessed (according to CLion):

struct OCR_results {
    std::string text_result;
    std::string lang;
    MPFDetectionError job_status;
    double confidence;
};
Throughout your code I see // Will re-throw exception from thread. It seems we're no longer using job_status. If not, please remove all traces of it.
I also checked, and I agree that we're no longer using job_status, so I removed it from the component.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Do we need to run check_default_languages on every frame, or just once per job? Note that I removed the job_status parameter from check_default_languages since it was not being used.
Default Tesseract languages don't need to be checked on every frame. I've updated the component to set the non-frame-related job settings once, and to only update the frame-related (text-type) settings on each frame.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Is there any way to reuse APIs across video frames, or do we have the same problem?
Currently, the component processes results frame by frame (with parallel threading per frame), so it is subject to the same restrictions as parallel image processing.
I think the main reason reusing an API is difficult across parallel threads is the risk that two threads end up calling the same API instance. This mainly applies to PDF jobs, where two pages will likely share the same detected scripts.
I think the solution is to add an atomic tracking variable (https://www.cplusplus.com/reference/atomic/atomic/exchange/) when reusing APIs in parallel threads. If one thread is already using an existing API, the atomic variable would be set to the in-use state, and the current thread could either wait or launch a new, temporary copy of the API it needs.
I believe the APIs can be safely reused for video and image jobs, as each thread should load a different language API. However, if we eventually want to process video frames in parallel, we'd likely want to add an atomic check to make sure no two threads end up reusing the same API.
Reviewed 2 of 2 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @brosenberg42 and @hhuangMITRE)
a discussion (no related file):
Please update this PR to target master, and merge master into your branch if you need to.
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):
Previously, hhuangMITRE (Howard W Huang) wrote…
Thanks for catching that!
Sorry, I usually run the WFM checks after the docker build test passes, but I got sidetracked after running the docker build last time. I'm downloading the latest MPF docker images and rerunning the WFM checks again.
👍
cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):
Previously, hhuangMITRE (Howard W Huang) wrote…
Currently, the component processes results frame by frame (with parallel threading per frame), so it is subject to the same restrictions as parallel image processing.
I think the main reason reusing an API is difficult across parallel threads is the risk that two threads end up calling the same API instance. This mainly applies to PDF jobs, where two pages will likely share the same detected scripts.
I think the solution is to add an atomic tracking variable (https://www.cplusplus.com/reference/atomic/atomic/exchange/) when reusing APIs in parallel threads. If one thread is already using an existing API, the atomic variable would be set to the in-use state, and the current thread could either wait or launch a new, temporary copy of the API it needs.
I believe the APIs can be safely reused for video and image jobs, as each thread should load a different language API. However, if we eventually want to process video frames in parallel, we'd likely want to add an atomic check to make sure no two threads end up reusing the same API.
Thanks for the intel. I left this comment on the other task. Let's hold off on that task for now, as discussed.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)