Feature request: "Comparascope" view to show the detected text superimposed on original image #449

Closed
raindropsfromsky opened this issue Apr 13, 2020 · 8 comments

Comments

@raindropsfromsky

In several industries, an instrument called "comparascope" is used that visually compares two items on a screen. Both items are supposed to have intricate details, and they may have a few differences.

The comparascope superimposes the images and then shows both items alternately, so the viewer can immediately spot any difference.

Can we use the same idea to compare the original image with the detected text?

  1. Just superimpose the detected text on the original image.
  2. Switch between the two periodically (let the user select this time interval).
  3. Once the user identifies the error, let him stop the auto-switching, and edit the word in the detected text. Once the correction is made, let him restart the auto-switching.

In step-2, manual switching can also be given. For example, if the user presses the < and > keys, gImageReader shows him the image and detected text layer. If he presses both < and > keys, then gImageReader shows him both layers simultaneously (text layer with 50% transparency, superimposed on the image layer).
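The switching logic described above could be sketched roughly as follows. This is only an illustration, not gImageReader code: the names `View`, `view_for_keys`, `auto_view` and `blend_pixel` are hypothetical, and which of the two keys maps to which layer is an assumption.

```python
from enum import Enum


class View(Enum):
    IMAGE = "image"   # original scan only
    TEXT = "text"     # detected text only
    BLEND = "blend"   # text at partial opacity over the image


def view_for_keys(pressed, current):
    """Manual switching: choose the view from the set of keys held down.

    Assumption: '<' selects the image and '>' selects the text layer.
    """
    if {"<", ">"} <= pressed:
        return View.BLEND
    if "<" in pressed:
        return View.IMAGE
    if ">" in pressed:
        return View.TEXT
    return current  # no relevant key pressed: keep the current view


def auto_view(elapsed_s, interval_s):
    """Auto-switching: alternate image/text every interval_s seconds."""
    return View.IMAGE if int(elapsed_s // interval_s) % 2 == 0 else View.TEXT


def blend_pixel(image_px, text_px, text_alpha=0.5):
    """Blend one 0-255 grey value of the text layer over the image layer."""
    return round(text_alpha * text_px + (1.0 - text_alpha) * image_px)
```

Pausing for an edit would then simply mean not calling `auto_view` until the user restarts the switching.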

@manisandro
Owner

There is already the preview feature (last button in the hOCR output pane toolbar), which kinda does this? In that case, what could be done is adding a keyboard shortcut to toggle it.

@raindropsfromsky
Author

raindropsfromsky commented Apr 13, 2020

I missed that button because it is attached to the wrong panel (the tree pane). It should have been the last button in the main toolbar; above the main panel. (because the output preview appears in the main panel only).

Anyhow, I tried it, and it can be used for very fast error checking.

The only thing is, the detected output does not exactly superimpose on the original.

Compare the original:
[image]

with the superimposed preview:
[image]

The font is off by at least one point. Also notice that the text telescopes into each other at bottom; probably because each word extends into the subsequent word (again, because its font is too large).
Also, can the font kerning be adjusted?

Once the text is made to match the image, the combo can easily be used like a comparascope.
A simple manual arrangement would be fine (just three shortcuts: <, > and <+>)

@manisandro
Owner

manisandro commented Apr 13, 2020

gImageReader takes whatever font information is returned by tesseract in the hOCR document, but it is pretty much expected that you'll have to manually tweak the font to get a good result. A possible enhancement would be some logic to find a matching font: the user picks a font family and chooses whether to match the font, say, line-wise or paragraph-wise, and then gImageReader computes the best font size based on the font metrics.

@raindropsfromsky
Author

raindropsfromsky commented Apr 13, 2020

That's a nice workflow!

Also please add a kerning slider, so that despite larger font size, all words can get accommodated within their given length, without running into (=overlapping with-) the next word.

[Edit]
Wait: I have another idea! I observed just now that gImageReader creates the bounding boxes for each individual word perfectly. In fact, it boxes the ascenders and descenders of the letters perfectly.

Can this property be used to fit the chosen font in the box? The only variables are the font size and the kerning.
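To make the idea concrete, here is a minimal sketch of fitting a word into its box with just those two variables. It assumes that the rendered width and height of the word scale linearly with the font size, and that the caller can measure the word once at font size 1 in the chosen font. The function `fit_word` and its parameters are hypothetical, not part of gImageReader or Tesseract.

```python
def fit_word(box_w, box_h, text, width_at_size_1, height_at_size_1):
    """Return (font_size, letter_spacing) that make `text` fill the box.

    width_at_size_1 and height_at_size_1 are the rendered width and ink
    height of `text` at font size 1 (assumed to scale linearly with size).
    """
    if width_at_size_1 <= 0 or height_at_size_1 <= 0:
        raise ValueError("metrics must be positive")
    # The box height fixes the font size.
    font_size = box_h / height_at_size_1
    # Whatever width is left over (or missing) is spread over the gaps
    # between characters; a negative value tightens the word.
    natural_w = width_at_size_1 * font_size
    gaps = len(text) - 1
    spacing = (box_w - natural_w) / gaps if gaps > 0 else 0.0
    return font_size, spacing
```

Note that the ink height depends on the letters in the word (a word like "ace" has neither ascenders nor descenders), so the height has to be measured for the actual text rather than taken from the font as a whole.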

@manisandro
Owner

Well yeah, that would be how the above is done. What I meant by line-wise and paragraph-wise is to, say, average the result over entire lines or paragraphs, to avoid each word having a slightly different font size.
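As a rough illustration of the line-wise variant, the per-word sizes could be smoothed like this. The sketch uses the median rather than a plain average, which is a substitution on my part so that one badly fitted word does not pull the whole line; `smooth_sizes` is a hypothetical helper.

```python
from statistics import median


def smooth_sizes(word_sizes_by_line):
    """Give every word on a line the median fitted size of that line.

    word_sizes_by_line is a list of lines, each a list of per-word sizes.
    """
    return [
        [median(line)] * len(line) if line else []
        for line in word_sizes_by_line
    ]
```

For paragraph-wise matching, the same function could be called with all words of the paragraph passed as a single "line".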

@raindropsfromsky
Author

Caution: I came across many texts that have headings (or a line with larger/bold text). Can gImageReader detect such a line and treat it separately?

Then it makes sense to calculate a common font for the paragraph (rather than for each line separately).
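One simple way this could work, sketched under the assumption that a heading shows up as a line whose fitted size is clearly above the paragraph's typical size. The helper `split_headings` and the 1.25 threshold are made up for illustration, and bold text at the same size would not be caught this way.

```python
from statistics import median


def split_headings(line_sizes, ratio=1.25):
    """Return (body_size, heading_indices) for the lines of a paragraph.

    A line counts as a heading if its size exceeds the median line size
    by more than `ratio`; the body size is the median of the other lines.
    """
    typical = median(line_sizes)
    headings = [i for i, s in enumerate(line_sizes) if s > typical * ratio]
    body = [s for i, s in enumerate(line_sizes) if i not in headings]
    return median(body), headings
```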

BTW the properties box shows x_fsize and x_wconf as parameters, and also bbox (=bounding box coordinates??).
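For reference, in hOCR these properties are stored in an element's `title` attribute as semicolon-separated entries, and `bbox` is indeed the bounding box, given as `x0 y0 x1 y1` in image pixels. A small sketch of reading them; the sample string in the usage line is made up, and `parse_hocr_title` and `word_box` are hypothetical helpers.

```python
def parse_hocr_title(title):
    """Split an hOCR title attribute into {property name: [values]}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


def word_box(title):
    """Return (x, y, width, height) from the bbox property."""
    x0, y0, x1, y1 = (int(v) for v in parse_hocr_title(title)["bbox"])
    return x0, y0, x1 - x0, y1 - y0


# Illustrative title string, not taken from a real document.
print(word_box("bbox 36 92 96 116; x_wconf 95; x_fsize 12"))
```

This simple split does not handle property values that themselves contain semicolons (for example quoted file names).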

But the properties do not include the kerning amount. Is it really available to gImageReader?
Even if it does not come from Tesseract, can gImageReader manipulate it?

Secondly, the properties box has a dropdown list of fonts, but none of them is selected by default. Why is that so?

Finally, if you would like to debug this text-fitting issue, I am ready to support you by downloading the product several times a day and providing feedback.

@manisandro
Owner

Properties are those as returned by tesseract. Tesseract 4.x LSTM does not report font families.

@raindropsfromsky
Author

TogglePreview.zip
