Demo: Jupyter Notebook
Inspired by the paper Best Combination of Binarization Methods for License Plate Character Segmentation
This was originally developed while studying low-level character segmentation methods for LPR (License Plate Recognition). The more popular global thresholding methods for binarization (Otsu's, etc.) are not well suited to LPR systems, as explored by the authors of the cited paper.
In the article the authors posit that using a single binarization method with static parameters, while efficient under certain conditions, is not the best approach. Different methods or parameters will perform best for different visual features across images (or even across regions of a single image), so it follows that applying multiple methods and merging the results should yield better overall accuracy.
Motivated by the lack of support for local thresholding binarization in popular computer vision libraries, I wrote this code to provide a simple interface for producing and using multiple binary images in character segmentation (aimed at LPR systems, but it should be useful for OCR applications in general). It currently supports Niblack's, Sauvola's, and Wolf's binarization methods.
Requirements:
- Python3
- numpy
- OpenCV
- scipy
- bottleneck
The main binarization function is located in multibin.py and can be used as follows:
import cv2
import multibin as mb

img = cv2.imread(img_path)
bin_imgs = mb.binarize(img, bin_methods)
Optional arguments are:
- resize: Resize the input image to the desired output dimensions;
- morph_kernel: Define a morphological kernel used to open the image and reduce noise (see: this line). This improves the results slightly;
- return_original: Return a copy of the original image, resized to the output dimensions, as the first element of the resulting list. Useful for prototyping.
You can find an example using all of them in the demo notebook included in this repo.
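To illustrate what the morph_kernel option does conceptually, here is a standalone sketch of a morphological opening used to remove small noise from a binary image (using scipy rather than OpenCV so it runs on its own; the kernel size is illustrative, not the repo's default):

```python
import numpy as np
from scipy.ndimage import binary_opening

# A binary image with one solid blob plus single-pixel speckle noise.
binary = np.zeros((10, 10), dtype=bool)
binary[2:8, 2:8] = True      # a real blob
binary[0, 9] = True          # isolated noise pixel

# Opening (erosion followed by dilation) with a 3x3 structuring element
# removes features smaller than the kernel, such as the speckle,
# while leaving the larger blob intact.
opened = binary_opening(binary, structure=np.ones((3, 3), dtype=bool))
```

After opening, the isolated pixel is gone and the 6x6 blob survives unchanged, which is the kind of cleanup the morph_kernel argument is there for.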
This function returns a list containing one binary image for each method described. The methods are defined as a list of dictionary objects with the following format:
[{
'type' : Binarization method (string),
'window_size': Moving square window dimension (int),
'k_factor': Constant (int)
},
(...)]
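For instance, a configuration combining all three supported methods might look like the following. Note that the 'type' strings and the parameter values here are illustrative assumptions, not tuned recommendations; check multibin.py for the exact names it accepts:

```python
# Hypothetical bin_methods list; method-name strings and the
# window_size / k_factor values are assumptions for illustration.
bin_methods = [
    {'type': 'niblack', 'window_size': 25, 'k_factor': -0.2},
    {'type': 'sauvola', 'window_size': 25, 'k_factor': 0.5},
    {'type': 'wolf',    'window_size': 25, 'k_factor': 0.5},
]
```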
You can read more about window size and k constant selection in the paper that inspired this code. Internally, the threshold is computed using bottleneck to speed up obtaining the moving average and standard deviation.
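The details live inside multibin.py, but as a rough standalone sketch of what a local threshold like Niblack's computes (here using scipy's uniform_filter for the moving mean and standard deviation instead of bottleneck; the window size and k below are illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_threshold(gray, window_size=25, k=-0.2):
    """Sketch of Niblack binarization: threshold each pixel at
    local_mean + k * local_std over a square moving window."""
    gray = gray.astype(np.float64)
    mean = uniform_filter(gray, window_size)
    sq_mean = uniform_filter(gray ** 2, window_size)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    thresh = mean + k * std
    return (gray > thresh).astype(np.uint8) * 255
```

Sauvola's and Wolf's methods follow the same pattern with different threshold formulas, which is why a fast moving mean/std (bottleneck in this repo) dominates the running time.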
Some auxiliary functions defined in utils/cs_utils.py demonstrate how to select potential ROIs from a binarized image. This is very rudimentary, since I gave up on using this method in my original project and moved on to Deep Learning instead (who hasn't?). Still, should anyone ever need or want to explore binarization-based OCR, it should be a helpful start. The demo notebook should be useful for visualizing what the system is doing and possible next steps.
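The actual helpers are in utils/cs_utils.py; as a hedged illustration of the underlying idea (connected-component labeling followed by discarding small blobs), here is a self-contained sketch. The function name and the area threshold are mine, not the repo's:

```python
import numpy as np
from scipy import ndimage

def candidate_rois(binary, min_area=10):
    """Label connected components in a binary image and return
    bounding boxes (row_start, row_stop, col_start, col_stop) of
    blobs whose bounding-box area is at least min_area."""
    labeled, num = ndimage.label(binary)
    rois = []
    for sl in ndimage.find_objects(labeled):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        if h * w >= min_area:  # drop speckle and other tiny blobs
            rois.append((sl[0].start, sl[0].stop, sl[1].start, sl[1].stop))
    return rois
```

In a real LPR pipeline the filter would also use aspect ratio and relative height, as in the guidelines the repo follows for discarding uninteresting blobs.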
Features:
- Wolf's, Sauvola's and Niblack's local thresholding methods
- CCA (connected component analysis) for blob extraction
- Discarding uninteresting blobs following guidelines from this work

To do:
- Other local thresholding algorithms as described in the paper
- Non-maximum suppression on redundant regions
- Character recognition for final ROIs
I probably won't be coming back to work on this, but should anyone feel the urge to continue it, I'll happily be of assistance.