Update thresholding.py with Singh function #5490

Open
wants to merge 46 commits into
base: main
from

@lucArub lucArub commented Jul 24, 2021

This PR adds the Singh threshold function, which is useful for text recognition.

T Romen Singh, Sudipta Roy, O Imocha Singh, Tejmani Sinam, Kh Manglem Singh.
"A New Local Adaptive Thresholding Technique in Binarization." IJCSI
International Journal of Computer Science Issues. 2011; 8(6-2): 271-276.

This technique is the one proposed in the article (Singh, 2011). It is a
locally adaptive thresholding technique that removes the background by using
the local mean and the mean deviation. The principal difference from other
local methods is that the standard deviation is not required. Instead, the
threshold is calculated from the local mean $m(x, y)$ and the mean deviation as:

$$T(x, y) = m(x, y)\left[1 + k\left(\frac{\partial(x, y)}{1 - \partial(x, y)} - 1\right)\right],$$

where $\partial(x, y) = I(x, y) - m(x, y)$ is the local mean
deviation and $k$ is a bias. Its range is $[0, 1]$.
Calculating $\partial(x, y)$ is straightforward: subtract the local mean from the pixel concerned. Because of that, Singh's technique can binarize faster than other local techniques, and the paper also reports better quality.
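
As a rough illustration of the description above, here is a minimal sketch of the threshold computation. It assumes the threshold in Singh et al. (2011) has the form T = m * (1 + k * (d / (1 - d) - 1)) with d = I - m, and that the image is scaled to [0, 1]. The function name `threshold_singh_sketch`, the default `window_size` and the default `k` are illustrative choices, not the code in this PR, and `scipy.ndimage.uniform_filter` stands in for the integral-image helper used in scikit-image.

```python
import numpy as np
from scipy import ndimage as ndi


def threshold_singh_sketch(image, window_size=15, k=0.2):
    """Per-pixel Singh-style threshold for an image scaled to [0, 1].

    window_size=15 and k=0.2 are illustrative defaults, not values
    taken from the PR or the paper.
    """
    image = np.asarray(image, dtype=float)
    # Local mean over a window_size x window_size neighbourhood.
    m = ndi.uniform_filter(image, size=window_size, mode="reflect")
    # Local mean deviation: pixel value minus local mean.
    d = image - m
    # Keep the denominator away from zero (d == 1 would need a pixel
    # value of 1 on a local mean of 0).
    denom = np.maximum(1.0 - d, np.finfo(float).eps)
    return m * (1.0 + k * (d / denom - 1.0))
```

A binary image would then be obtained with `binary = image > threshold_singh_sketch(image)`, in the same way as the existing local thresholds are used.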

Description

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@pep8speaks

pep8speaks commented Jul 24, 2021

Hello @lucArub! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-21 14:14:26 UTC

@lucArub lucArub closed this Jul 24, 2021
@lucArub lucArub reopened this Jul 24, 2021
@lucArub lucArub changed the title from "Update thresholding.py" to "Update thresholding.py with Singh function" Jul 24, 2021
Contributor

@grlee77 grlee77 left a comment

Hi @lucArub, thank you for the submission.

The paper corresponding to this has around 200 citations on Google Scholar. That is fewer than the "Niblack" or "Sauvola" methods, which each have more than 2k, but it is not bad. If you make the suggested modification to the existing _mean_std function, very few lines of new code will be needed, so there is no real concern from a maintenance standpoint.

The main thing missing at this point is a demo showing its use. I would suggest updating the existing gallery example /doc/examples/segmentation/plot_niblack_sauvola.py to also show the result of this threshold.

Also, was there a specific use case where you found the output of this method worked better than the existing thresholds? If so, that could potentially be a new, independent gallery example if there is appropriately licensed data it could be demonstrated on.

@@ -970,6 +970,106 @@ def _mean_std(image, w):
    # m*m when floating point error is considered
    s = np.sqrt(np.clip(g2 - m * m, 0, None))
    return m, s

def _only_mean(image, w):
Contributor

Rather than introduce a new function here, it would be better to modify the existing _mean_std function with a mean_only or omit_std argument that could be used to return the mean only. Then just return m (mean) early, before the computation of s (std).
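
A minimal sketch of what that could look like. This is not the scikit-image implementation: the function name `local_mean_std` is hypothetical, and `scipy.ndimage.uniform_filter` is used here in place of the integral-image computation in `_mean_std`. Only the `mean_only` flag and the early return follow the suggestion above.

```python
import numpy as np
from scipy import ndimage as ndi


def local_mean_std(image, w, mean_only=False):
    """Local mean (and optionally standard deviation) over a w x w window.

    Hypothetical stand-in for the private _mean_std helper.
    """
    image = np.asarray(image, dtype=float)
    m = ndi.uniform_filter(image, size=w, mode="reflect")
    if mean_only:
        # Return early, before the second filtering pass needed for s.
        return m
    g2 = ndi.uniform_filter(image * image, size=w, mode="reflect")
    # Clip because g2 can fall slightly below m*m from rounding error.
    s = np.sqrt(np.clip(g2 - m * m, 0, None))
    return m, s
```

With this shape, the new threshold would call `local_mean_std(image, window_size, mean_only=True)`, while the Niblack and Sauvola code paths keep the default two-value return.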

Author

Ok, thanks for the suggestion.

"""

    m = _only_mean(image, window_size)
    d = image - m
Contributor

For benefit of other reviewers, unlike the current threshold_sauvola and threshold_niblack, this method uses the local "mean deviation" as defined for d here rather than the usual local "standard deviation" used by those techniques. Thus, computation time should be less for this method.

Author

@lucArub lucArub Jul 25, 2021

The computation time is less for this method. In terms of thresholding quality, it is quite similar to the Sauvola technique.
If you want, you can have a look at these simple results that I obtained at my university using ground-truth images from the DIBCO 2009 dataset.

https://github.com/lucArub/localthresholding-

Comment on lines 1059 to 1060
International Journal of Computer Science Issues. 2011; 8(6-2): 271-276.

Contributor

Suggested change
- International Journal of Computer Science Issues. 2011; 8(6-2): 271-276.
+ International Journal of Computer Science Issues. 2011; 8(6-2): 271-276.
+ http://ijcsi.org/papers/IJCSI-8-6-3-275-280.pdf

We may as well also provide the URL to the (freely available) publication. As far as I could tell, there does not appear to be an associated DOI.

@grlee77
Contributor

grlee77 commented Jul 25, 2021

The main benefit of this technique over Niblack and Sauvola appears to be computation time. Thresholding results appear qualitatively similar for the image in the publication.

I think the computation time comparison in the publication is likely vs. non-optimized Niblack and Sauvola (i.e. without use of integral images to speed up the computation of the local mean and standard deviation). Still, the Singh method as implemented here will be faster than those methods, although I suspect closer to a factor of two or so since it would have only one call to correlate_sparse instead of two.
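
One way to check that estimate is a small timing harness like the sketch below. It only compares a single local-mean pass against the mean-plus-standard-deviation pair, using `scipy.ndimage.uniform_filter` as a stand-in for `correlate_sparse` on integral images, so the ratio it reports is an approximation and not a measurement of the code in this PR. All function names here are hypothetical.

```python
import timeit

import numpy as np
from scipy import ndimage as ndi


def mean_only(image, w):
    # One filtering pass: what a mean-deviation method needs.
    return ndi.uniform_filter(image, size=w)


def mean_and_std(image, w):
    # Two filtering passes: what Niblack and Sauvola need.
    m = ndi.uniform_filter(image, size=w)
    g2 = ndi.uniform_filter(image * image, size=w)
    return m, np.sqrt(np.clip(g2 - m * m, 0, None))


def compare(shape=(512, 512), w=15, repeat=5):
    """Return the best-of-repeat time in seconds for each variant."""
    rng = np.random.default_rng(0)
    image = rng.random(shape)
    t_mean = min(timeit.repeat(lambda: mean_only(image, w),
                               number=1, repeat=repeat))
    t_both = min(timeit.repeat(lambda: mean_and_std(image, w),
                               number=1, repeat=repeat))
    return t_mean, t_both
```

Calling `t_mean, t_both = compare()` and looking at `t_both / t_mean` gives a rough idea of the speed-up on a given machine.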

3 participants