
Adding video processing capability to Tesseract component. #243

Merged
merged 4 commits from feature/tesseract-video-processing into master on May 10, 2021

Conversation

hhuangMITRE
Contributor

@hhuangMITRE hhuangMITRE commented Apr 12, 2021


Member

@jrobble jrobble left a comment


Please update this PR to merge into master. That way we can get it to the users quicker.

Reviewed 7 of 9 files at r1, 5 of 5 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @hhuangMITRE)


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.h, line 63 at r1 (raw file):

        class TessApiWrapper;

        class TesseractOCRTextDetection : public MPFImageDetectionComponentAdapter {

This component now supports more than just images, so it no longer makes sense to use this adapter. I updated the code to extend MPFDetectionComponent directly.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 31 at r1 (raw file):

#include <algorithm>
#include <cmath>
#include <fstream>

CLion says this include is not used so I removed it.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):

bool TesseractOCRTextDetection::Supports(MPFDetectionDataType data_type) {
    // Supports images and documents, no audio or video data types.
    return data_type == MPFDetectionDataType::IMAGE || data_type == MPFDetectionDataType::UNKNOWN;

I had to add VIDEO to this list.

This indicates that you haven't tested your changes in the WFM since it would fail on videos right away. Please test at least one video in the WFM and look at the JSON output to make sure it's what you expect.
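The fix described above is a one-line addition to the data-type check. Here is a minimal, self-contained sketch of the corrected Supports() method; the real MPFDetectionDataType enum comes from the MPF SDK headers, so a stand-in enum is defined here for illustration only.

```cpp
#include <cassert>

// Stand-in for the MPF SDK's MPFDetectionDataType; the real enum lives in the SDK.
enum class MPFDetectionDataType { UNKNOWN, IMAGE, VIDEO, AUDIO };

// Sketch of the corrected check: VIDEO is now accepted alongside IMAGE and
// UNKNOWN (UNKNOWN covers generic/document jobs); AUDIO is still unsupported.
bool Supports(MPFDetectionDataType data_type) {
    return data_type == MPFDetectionDataType::IMAGE
        || data_type == MPFDetectionDataType::VIDEO
        || data_type == MPFDetectionDataType::UNKNOWN;
}
```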


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1385 at r1 (raw file):

    MPFVideoCapture cap(job);

    int total_frames = cap.GetFrameCount();

I see that you based a lot of this code on VideoCaptureComponent.cpp. A lot of it is not necessary anymore. We should really update that example file when we have more time.

It's better to refer to one of the more recently-developed real components to figure out the best way to do things.

I made the updates to this file.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

        wasRead = cap.Read(frame);
        if (wasRead && !frame.empty()) {
            check_default_languages(ocr_fset, job.job_name, run_dir, job_status);

Do we need to run check_default_languages on every frame, or just once per job?

Note that I removed the job_status parameter from check_default_languages since it was not being used.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

        wasRead = cap.Read(frame);
        if (wasRead && !frame.empty()) {
            check_default_languages(ocr_fset, job.job_name, run_dir, job_status);

Should we check job_status after process_image_job?

I only found one instance where job_status is set to something meaningful:

job_status = MPF_COULD_NOT_OPEN_DATAFILE

(Note that immediately after that line an exception is thrown, so it doesn't seem like setting the status here is important.)

Otherwise, the status is always set to MPF_DETECTION_SUCCESS or just used to shuttle a value around.

Also, the value of job_status in this struct is never accessed (according to CLion):

            struct OCR_results {
                std::string text_result;
                std::string lang;
                MPFDetectionError job_status;
                double confidence;
            };

Throughout your code I see // Will re-throw exception from thread. It seems we're no longer using job_status. If not, please remove all traces of it.
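The pattern the comments refer to can be sketched as follows: rather than shuttling a status code like job_status through every call, the worker thread captures any exception and the joining thread re-throws it. This is a hypothetical illustration, not the component's actual code; the function name and the hard-coded failure are stand-ins.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Sketch of exception propagation out of a worker thread, replacing an
// error-code parameter: the worker stores the exception, the caller re-throws.
std::string run_ocr_worker_or_throw() {
    std::exception_ptr worker_error;
    std::string result;

    std::thread worker([&] {
        try {
            // Stand-in for per-thread OCR work; a real failure here might be
            // an unreadable data file rather than a hard-coded throw.
            throw std::runtime_error("could not open data file");
        } catch (...) {
            worker_error = std::current_exception();  // capture for the caller
        }
    });
    worker.join();

    if (worker_error) {
        std::rethrow_exception(worker_error);  // "Will re-throw exception from thread."
    }
    return result;
}
```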


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):

        if (inputs.parallel_processing) {
            // When parallel processing is enabled, new APIs must be started up to avoid deadlocking issues rather than

Is there any way to reuse APIs across video frames, or do we have the same problem?


cpp/TesseractOCRTextDetection/plugin-files/config/mpfOCR.ini, line 255 at r1 (raw file):

# If the frame_interval is set to a value less than 1, a frame_interval of 1 will be used,
# so that detections are performed on every frame. For a frame interval N > 1, every N-1 frames will be skipped.
FRAME_INTERVAL: 1

Note that it should no longer be necessary to list common properties shared by all components in the config file for any individual component, unless you want to change the default value to something else. Here are the common properties.

I removed this entry from this file.
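As a concrete check of the "every N-1 frames skipped" rule quoted above, this small sketch (names hypothetical) computes which frame indices would be processed for a given interval:

```cpp
#include <cassert>
#include <vector>

// Worked example of the FRAME_INTERVAL semantics: with interval N, frame 0 is
// processed and the next N-1 frames are skipped; values below 1 fall back to 1.
std::vector<int> frames_to_process(int total_frames, int frame_interval) {
    if (frame_interval < 1) {
        frame_interval = 1;  // process every frame
    }
    std::vector<int> processed;
    for (int f = 0; f < total_frames; f += frame_interval) {
        processed.push_back(f);
    }
    return processed;
}
```

For a 10-frame video with FRAME_INTERVAL 3, frames 0, 3, 6, and 9 are processed and the two frames after each of them are skipped.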


cpp/TesseractOCRTextDetection/plugin-files/descriptor/descriptor.json, line 35 at r1 (raw file):

        },
        {
          "name": "FRAME_INTERVAL",

Again, it should no longer be necessary to list common properties shared by all components in the config file. Refer to the other comment.

I removed this entry from this file.

Contributor Author

@hhuangMITRE hhuangMITRE left a comment


Reviewable status: 7 of 9 files reviewed, 4 unresolved discussions (waiting on @brosenberg42 and @jrobble)


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

I had to add VIDEO to this list.

This indicates that you haven't tested your changes in the WFM since it would fail on videos right away. Please test at least one video in the WFM and look at the JSON output to make sure it's what you expect.

Thanks for catching that!

Sorry, I usually run the WFM checks after the docker build test passes, but I got sidetracked after running the docker build last time. I'm downloading the latest MPF docker images and rerunning the WFM checks again.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Should we check job_status after process_image_job?

I only found one instance where job_status is set to something meaningful:

job_status = MPF_COULD_NOT_OPEN_DATAFILE

(Note that immediately after that line an exception is thrown, so it doesn't seem like setting the status here is important.)

Otherwise, the status is always set to MPF_DETECTION_SUCCESS or just used to shuttle a value around.

Also, the value of job_status in this struct is never accessed (according to CLion):

            struct OCR_results {
                std::string text_result;
                std::string lang;
                MPFDetectionError job_status;
                double confidence;
            };

Throughout your code I see // Will re-throw exception from thread. It seems we're no longer using job_status. If not, please remove all traces of it.

I also checked and think we're no longer using job_status, so I removed it from the component.


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 1441 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Do we need to run check_default_languages on every frame, or just once per job?

Note that I removed the job_status parameter from check_default_languages since it was not being used.

Default tesseract languages won't need to be checked on every frame. I've updated the component to set the non-frame related job settings once and only update the frame-related (text-type) settings on each frame.
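The restructuring described here can be sketched as hoisting the per-job checks out of the frame loop. The function and struct names below are hypothetical stand-ins for the component's real settings handling:

```cpp
#include <cassert>

// Hypothetical sketch of the "once per job, not per frame" fix.
struct OcrSettings {
    bool languages_checked = false;
    int frames_configured = 0;
};

void check_default_languages(OcrSettings& s) { s.languages_checked = true; }
void update_frame_settings(OcrSettings& s)   { ++s.frames_configured; }

OcrSettings process_video(int total_frames) {
    OcrSettings settings;
    check_default_languages(settings);      // per-job check: runs exactly once
    for (int f = 0; f < total_frames; ++f) {
        update_frame_settings(settings);    // frame-related (text-type) settings
        // ... run OCR on frame f ...
    }
    return settings;
}
```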


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Is there any way to reuse APIs across video frames, or do we have the same problem?

Currently, the component is processing the results frame-by-frame (w/ parallel threading per frame) and is subject to the same restrictions as parallel image processing.

I think the main reason reusing the API is difficult across parallel threads is the risk that two threads may end up calling the same API. This mainly applies for the PDF jobs where two pages likely share the same detected scripts.

I think the solution is to add an atomic tracking variable (https://www.cplusplus.com/reference/atomic/atomic/exchange/) when reusing APIs in parallel threads. If one thread is already using an existing API, then the atomic variable would be set to the in-use state and the current thread can either wait or launch a new, temporary copy of the API needed.

I believe the APIs can be safely reused for video and image jobs, as each thread should load a different language API. However, if we eventually want to process video frames in parallel, then it's likely we'd want to add an atomic check to make sure no two threads end up reusing the same API.
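The atomic-exchange idea above can be sketched as a small "in-use" guard. All names here are hypothetical; a real implementation would wrap the Tesseract API handle rather than a bare flag.

```cpp
#include <atomic>
#include <cassert>

// Sketch of the proposed guard: a thread claims the shared API with exchange();
// if it was already claimed, the caller falls back to a temporary API instance.
struct ApiSlot {
    std::atomic<bool> in_use{false};

    // Returns true if this thread now owns the shared API.
    bool try_claim() { return !in_use.exchange(true); }
    void release()   { in_use.store(false); }
};

// Returns true if the shared API was reused, false if a temporary copy was needed.
bool use_shared_api(ApiSlot& slot) {
    if (slot.try_claim()) {
        // ... run OCR with the shared, already-loaded API ...
        slot.release();
        return true;
    }
    // ... create and use a temporary API instance instead of blocking ...
    return false;
}
```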

Member

@jrobble jrobble left a comment


Reviewed 2 of 2 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @brosenberg42 and @hhuangMITRE)

a discussion (no related file):
Please update this PR to target master, and merge master into your branch if you need to.



cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 145 at r1 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Thanks for catching that!

Sorry, I usually run the WFM checks after the docker build test passes, but I got sidetracked after running the docker build last time. I'm downloading the latest MPF docker images and rerunning the WFM checks again.

👍


cpp/TesseractOCRTextDetection/TesseractOCRTextDetection.cpp, line 627 at r2 (raw file):

Previously, hhuangMITRE (Howard W Huang) wrote…

Currently, the component is processing the results frame-by-frame (w/ parallel threading per frame) and is subject to the same restrictions as parallel image processing.

I think the main reason reusing the API is difficult across parallel threads is the risk that two threads may end up calling the same API. This mainly applies for the PDF jobs where two pages likely share the same detected scripts.

I think the solution is to add an atomic tracking variable (https://www.cplusplus.com/reference/atomic/atomic/exchange/) when reusing APIs in parallel threads. If one thread is already using an existing API, then the atomic variable would be set to the in-use state and the current thread can either wait or launch a new, temporary copy of the API needed.

I believe the APIs can be safely reused for video and image jobs, as each thread should load a different language API. However, if we eventually want to process video frames in parallel, then it's likely we'd want to add an atomic check to make sure no two threads end up reusing the same API.

Thanks for the intel. I left this comment on the other task. Let's hold off on that task for now, as discussed.

@hhuangMITRE hhuangMITRE changed the base branch from develop to master May 7, 2021 22:04
Member

@jrobble jrobble left a comment


Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)

@hhuangMITRE hhuangMITRE merged commit 7b73571 into master May 10, 2021
@hhuangMITRE hhuangMITRE deleted the feature/tesseract-video-processing branch May 10, 2021 17:04