
Adding video processing capability to Tesseract component. #243

Merged
merged 4 commits from feature/tesseract-video-processing into master on May 10, 2021

Conversation

hhuangMITRE
Contributor

@hhuangMITRE hhuangMITRE commented Apr 12, 2021


Member

@jrobble jrobble left a comment


Please update this PR to merge into master. That way we can get it to the users quicker.

Reviewed 7 of 9 files at r1, 5 of 5 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @hhuangMITRE)


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.h, line 63 at r1 (raw file):

        class TessApiWrapper;

        class TesseractOCRTextDetection : public MPFImageDetectionComponentAdapter {

This component now supports more than just images, so it no longer makes sense to use this adapter. I updated the code to extend MPFDetectionComponent directly.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 31 at r1 (raw file):

#include <algorithm>
#include <cmath>
#include <fstream>

CLion says this include is not used so I removed it.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):

bool TesseractOCRTextDetection::Supports(MPFDetectionDataType data_type) {
    // Supports images and documents, no audio or video data types.
    return data_type == MPFDetectionDataType::IMAGE || data_type == MPFDetectionDataType::UNKNOWN;

I had to add VIDEO to this list.

This indicates that you haven't tested your changes in the WFM since it would fail on videos right away. Please test at least one video in the WFM and look at the JSON output to make sure it's what you expect.
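The fix described above is a one-line addition to the data-type check. Here is a minimal, self-contained sketch of the corrected Supports() method; the real MPFDetectionDataType enum comes from the MPF SDK headers, so a stand-in enum is defined here for illustration only.

```cpp
#include <cassert>

// Stand-in for the MPF SDK's MPFDetectionDataType; the real enum lives in the SDK.
enum class MPFDetectionDataType { UNKNOWN, IMAGE, VIDEO, AUDIO };

// Sketch of the corrected check: VIDEO is now accepted alongside IMAGE and
// UNKNOWN (UNKNOWN covers generic/document jobs); AUDIO is still unsupported.
bool Supports(MPFDetectionDataType data_type) {
    return data_type == MPFDetectionDataType::IMAGE
        || data_type == MPFDetectionDataType::VIDEO
        || data_type == MPFDetectionDataType::UNKNOWN;
}
```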


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1385 at r1 (raw file):

    MPFVideoCapture cap(job);

    int total_frames = cap.GetFrameCount();

I see that you based a lot of this code on VideoCaptureComponent.cpp. A lot of it is not necessary anymore. We should really update that example file when we have more time.

It's better to refer to one of the more recently-developed real components to figure out the best way to do things.

I made the updates to this file.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

        wasRead = cap.Read(frame);
        if (wasRead && !frame.empty()) {
            check_default_languages(ocr_fset, job.job_name, run_dir, job_status);

Do we need to run check_default_languages on every frame, or just once per job?

Note that I removed the job_status parameter from check_default_languages since it was not being used.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

        wasRead = cap.Read(frame);
        if (wasRead && !frame.empty()) {
            check_default_languages(ocr_fset, job.job_name, run_dir, job_status);

Should we check job_status after process_image_job?

I only found one instance where job_status is set to something meaningful:

job_status = MPF_COULD_NOT_OPEN_DATAFILE

(Note that immediately after that line an exception is thrown, so it doesn't seem like setting the status here is important.)

Otherwise, the status is always set to MPF_DETECTION_SUCCESS or just used to shuttle a value around.

Also, the value of job_status in this struct is never accessed (according to CLion):

            struct OCR_results {
                std::string text_result;
                std::string lang;
                MPFDetectionError job_status;
                double confidence;
            };

Throughout your code I see // Will re-throw exception from thread. It seems we're no longer using job_status. If not, please remove all traces of it.
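The pattern the comments refer to can be sketched as follows: rather than shuttling a status code like job_status through every call, the worker thread captures any exception and the joining thread re-throws it. This is a hypothetical illustration, not the component's actual code; the function name and the hard-coded failure are stand-ins.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Sketch of exception propagation out of a worker thread, replacing an
// error-code parameter: the worker stores the exception, the caller re-throws.
std::string run_ocr_worker_or_throw() {
    std::exception_ptr worker_error;
    std::string result;

    std::thread worker([&] {
        try {
            // Stand-in for per-thread OCR work; a real failure here might be
            // an unreadable data file rather than a hard-coded throw.
            throw std::runtime_error("could not open data file");
        } catch (...) {
            worker_error = std::current_exception();  // capture for the caller
        }
    });
    worker.join();

    if (worker_error) {
        std::rethrow_exception(worker_error);  // "Will re-throw exception from thread."
    }
    return result;
}
```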


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):

        if (inputs.parallel_processing) {
            // When parallel processing is enabled, new APIs must be started up to avoid deadlocking issues rather than

Is there any way to reuse APIs across video frames, or do we have the same problem?


cpp/TesseractOCRTextDetection/plugin-files/config/mpfOCR.ini, line 255 at r1 (raw file):

# If the frame_interval is set to a value less than 1, a frame_interval of 1 will be used,
# so that detections are performed on every frame. For a frame interval N > 1, every N-1 frames will be skipped.
FRAME_INTERVAL: 1

Note that it should no longer be necessary to list common properties shared by all components in the config file for any individual component, unless you want to change the default value to something else. Here are the common properties.

I removed this entry from this file.
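As a concrete check of the "every N-1 frames skipped" rule quoted above, this small sketch (names hypothetical) computes which frame indices would be processed for a given interval:

```cpp
#include <cassert>
#include <vector>

// Worked example of the FRAME_INTERVAL semantics: with interval N, frame 0 is
// processed and the next N-1 frames are skipped; values below 1 fall back to 1.
std::vector<int> frames_to_process(int total_frames, int frame_interval) {
    if (frame_interval < 1) {
        frame_interval = 1;  // process every frame
    }
    std::vector<int> processed;
    for (int f = 0; f < total_frames; f += frame_interval) {
        processed.push_back(f);
    }
    return processed;
}
```

For a 10-frame video with FRAME_INTERVAL 3, frames 0, 3, 6, and 9 are processed and the two frames after each of them are skipped.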


cpp/TesseractOCRTextDetection/plugin-files/descriptor/descriptor.json, line 35 at r1 (raw file):

        },
        {
          "name": "FRAME_INTERVAL",

Again, it should no longer be necessary to list common properties shared by all components in the config file. Refer to the other comment.

I removed this entry from this file.

Contributor Author

@hhuangMITRE hhuangMITRE left a comment


Reviewable status: 7 of 9 files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @jrobble)


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

I had to add VIDEO to this list.

This indicates that you haven't tested your changes in the WFM since it would fail on videos right away. Please test at least one video in the WFM and look at the JSON output to make sure it's what you expect.

Thanks for catching that!

Sorry, I usually run the WFM checks after the docker build test passes, but I got sidetracked after running the docker build last time. I'm downloading the latest MPF docker images and rerunning the WFM checks again.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Should we check job_status after process_image_job?

I only found one instance where job_status is set to something meaningful:

job_status = MPF_COULD_NOT_OPEN_DATAFILE

(Note that immediately after that line an exception is thrown, so it doesn't seem like setting the status here is important.)

Otherwise, the status is always set to MPF_DETECTION_SUCCESS or just used to shuttle a value around.

Also, the value of job_status in this struct is never accessed (according to CLion):

            struct OCR_results {
                std::string text_result;
                std::string lang;
                MPFDetectionError job_status;
                double confidence;
            };

Throughout your code I see // Will re-throw exception from thread. It seems we're no longer using job_status. If not, please remove all traces of it.

I also checked and think we're no longer using job_status, so I removed it from the component.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Do we need to run check_default_languages on every frame, or just once per job?

Note that I removed the job_status parameter from check_default_languages since it was not being used.

Default tesseract languages won't need to be checked on every frame. I've updated the component to set the non-frame related job settings once and only update the frame-related (text-type) settings on each frame.
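The restructuring described here can be sketched as hoisting the per-job checks out of the frame loop. The function and struct names below are hypothetical stand-ins for the component's real settings handling:

```cpp
#include <cassert>

// Hypothetical sketch of the "once per job, not per frame" fix.
struct OcrSettings {
    bool languages_checked = false;
    int frames_configured = 0;
};

void check_default_languages(OcrSettings& s) { s.languages_checked = true; }
void update_frame_settings(OcrSettings& s)   { ++s.frames_configured; }

OcrSettings process_video(int total_frames) {
    OcrSettings settings;
    check_default_languages(settings);      // per-job check: runs exactly once
    for (int f = 0; f < total_frames; ++f) {
        update_frame_settings(settings);    // frame-related (text-type) settings
        // ... run OCR on frame f ...
    }
    return settings;
}
```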


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Is there any way to reuse APIs across video frames, or do we have the same problem?

Currently, the component is processing the results frame-by-frame (w/ parallel threading per frame) and is subject to the same restrictions as parallel image processing.

I think the main reason reusing the API is difficult across parallel threads is the risk that two threads may end up calling the same API. This mainly applies for the PDF jobs where two pages likely share the same detected scripts.

I think the solution is to add an atomic tracking variable (https://www.cplusplus.com/reference/atomic/atomic/exchange/) when reusing APIs in parallel threads. If one thread is already using an existing API, then the atomic variable would be set to the in-use state and the current thread can either wait or launch a new, temporary copy of the API needed.

I believe the APIs can be safely reused for video and image jobs, as each thread should load a different language API. However, if we eventually want to process video frames in parallel, then it's likely we'd want to add an atomic check to make sure no two threads end up reusing the same API.
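The atomic-exchange idea above can be sketched as a small "in-use" guard. All names here are hypothetical; a real implementation would wrap the Tesseract API handle rather than a bare flag.

```cpp
#include <atomic>
#include <cassert>

// Sketch of the proposed guard: a thread claims the shared API with exchange();
// if it was already claimed, the caller falls back to a temporary API instance.
struct ApiSlot {
    std::atomic<bool> in_use{false};

    // Returns true if this thread now owns the shared API.
    bool try_claim() { return !in_use.exchange(true); }
    void release()   { in_use.store(false); }
};

// Returns true if the shared API was reused, false if a temporary copy was needed.
bool use_shared_api(ApiSlot& slot) {
    if (slot.try_claim()) {
        // ... run OCR with the shared, already-loaded API ...
        slot.release();
        return true;
    }
    // ... create and use a temporary API instance instead of blocking ...
    return false;
}
```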

Member

@jrobble jrobble left a comment


Reviewed 2 of 2 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @brosenberg42 and @hhuangMITRE)

a discussion (no related file):
Please update this PR to target master, and merge master into your branch if you need to.



cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Thanks for catching that!

Sorry, I usually run the WFM checks after the docker build test passes, but I got sidetracked after running the docker build last time. I'm downloading the latest MPF docker images and rerunning the WFM checks again.

👍


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Currently, the component is processing the results frame-by-frame (w/ parallel threading per frame) and is subject to the same restrictions as parallel image processing.

I think the main reason reusing the API is difficult across parallel threads is the risk that two threads may end up calling the same API. This mainly applies for the PDF jobs where two pages likely share the same detected scripts.

I think the solution is to add an atomic tracking variable (https://www.cplusplus.com/reference/atomic/atomic/exchange/) when reusing APIs in parallel threads. If one thread is already using an existing API, then the atomic variable would be set to the in-use state and the current thread can either wait or launch a new, temporary copy of the API needed.

I believe the APIs can be safely reused for video and image jobs, as each thread should load a different language API. However, if we eventually want to process video frames in parallel, then it's likely we'd want to add an atomic check to make sure no two threads end up reusing the same API.

Thanks for the intel. I left this comment on the other task. Let's hold off on that task for now, as discussed.

@hhuangMITRE hhuangMITRE changed the base branch from develop to master May 7, 2021 22:04
Member

@jrobble jrobble left a comment


Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)

@hhuangMITRE hhuangMITRE merged commit 7b73571 into master May 10, 2021
@hhuangMITRE hhuangMITRE deleted the feature/tesseract-video-processing branch May 10, 2021 17:04