rlsa_python

This is a python implementation of the Run Length Smoothing Algorithm (RLSA). It was designed with text document layout analysis in mind in order to merge charatcters on the same line and make it easier to identify senteces and paragraphs.

External requirements

rlsa_python only requires you to have numpy installed.

Usage

How to install it

$ pip install rlsa_python

How to use it

Let's say we are trying to apply rlsa to the following image:

You would first read in the image as a numpy array and convert it to inverse binary. It is also recommended, but not required, to apply filters like a gaussian or morphological filter to remove unwanted noise.

# Use any image processing library you like, but the image passed to RLSA has to be a numpy array.
# In this case I am using opencv.
import cv2

# Read the image.
img = cv2.imread(image_file_path)

# Convert it to inverse binary.
gray_scale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
__, binary_image = cv2.threshold(gray_scale, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

The result should be something like this:

Then you can use the rlsa_python library like so:

# Import RLSA class
from rlsa_python.rlsa import RLSA

smeared_text_image = RLSA.apply_rlsa(img, 100, 100, 50)

The result with the above image is the following:

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.idea		.idea
rlsa_python		rlsa_python
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rlsa_python

External requirements

Usage

How to install it

How to use it

About

Releases 1

Packages

Languages

License

luizvalle/rlsa_python

Folders and files

Latest commit

History

Repository files navigation

rlsa_python

External requirements

Usage

How to install it

How to use it

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages