Synthetic Dataset Generation: Recovering Homography from Camera Captured Documents

kapitsa2811/DocHomographyGenerator

DocHomographyGenerator (Python2.7)

Keywords: OCR, Page Dewarping, Synthetic Dataset Generation, Deep Learning

Synthetic Dataset Generation

This work is based on Recovering Homography from Camera Captured Documents using Convolutional Neural Networks (2017). It provides a synthetic dataset producer/generator that applies various data augmentation methods. The generated data can be used to estimate the corner displacement vectors of a distorted document image.

distorted document images:

dewarped document images (first row):

1. Introduction

Capturing document images with a camera is a common way of digitizing physical documents, owing to the ubiquity of smartphones. In contrast to scans from a flatbed scanner, camera-captured documents require a more sophisticated processing pipeline because the images are perspective-distorted. To restore (dewarp) the original document image, one computes the homography H (a 3x3 matrix) that maps the corner points of the document image to their canonical positions. However, estimating the parameters of the homography matrix directly from a single input image is difficult, see.

An alternative way of computing H is the four-point method (see findHomography in the chapter Camera Calibration and 3D Reconstruction). Given four corresponding coplanar points, the distorted image can be dewarped as follows:

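    import cv2

    def dewarp(img, pts_src, pts_dst, width, height):
        # Calculate the homography from the four point correspondences
        h, status = cv2.findHomography(pts_src, pts_dst)

        # Warp the source image to the destination (width x height pixels)
        return cv2.warpPerspective(src=img, M=h, dsize=(width, height))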
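
To make the four-point method more concrete, the following is a minimal sketch of how H can be recovered from exactly four correspondences without OpenCV. It is not the code used by this module or by cv2.findHomography; the function names solve_homography and apply_homography and the sample coordinates are hypothetical. The sketch assumes that H is normalized so that its bottom-right entry is 1, which leaves eight unknowns and eight linear equations.

```python
def solve_homography(src, dst):
    """Return the 3x3 homography (h33 fixed to 1) that maps four src points to dst."""
    # Each correspondence (x, y) -> (u, v) contributes two linear equations
    # in the eight unknowns h11, h12, h13, h21, h22, h23, h31, h32:
    #   u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1)
    #   v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = float(x), float(y), float(u), float(v)
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, v])

    # Gauss-Jordan elimination with partial pivoting on the 8x9 augmented matrix.
    n = 8
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]

    h = [row[8] for row in rows] + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, point):
    """Map one (x, y) point through H using homogeneous coordinates."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Illustrative values only: distorted document corners and their canonical positions.
src = [(120, 80), (520, 110), (560, 700), (90, 660)]
dst = [(0, 0), (400, 0), (400, 600), (0, 600)]
H = solve_homography(src, dst)
```

Mapping each point of src through apply_homography(H, ...) should reproduce the matching point of dst up to floating-point error. In practice cv2.findHomography is preferable, since it also handles more than four points and outliers.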

2. Methodology

The following figures demonstrate the generation process of some sample images.

Note:

  • In contrast to the original work, this module can also generate images in which document corners lie outside the image boundaries.
  • The param mode_p (src/config) determines the ratio between images with included corners and images with excluded corners. By default, at least 70% of all generated images have all corners inside the image.
  • Dataset Format (stored as .mat file):
    X = (N, height, width, 3)  
    Y = (N, 8)                  # 4 * x,y (top left, top right, ...,bottom left)
  • Besides generating an arbitrarily large dataset up front (see DataProducer), DocHomographyGenerator can also be used as a Python generator (see DataGenerator) that produces data batch by batch. This is particularly useful when the dataset is too big to fit into memory. For example, to train a model from a Python generator, one can use the fit_generator() method provided by Keras.
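
The batch-by-batch idea and the label layout from the notes above can be sketched in plain Python. This is not the DataGenerator class of this module: make_dummy_sample and batch_generator are hypothetical names, the sample contents are random placeholders, and the corner order (top left, top right, bottom right, bottom left) is an assumption.

```python
import random


def make_dummy_sample(width=4, height=3):
    """Hypothetical stand-in for one synthesized sample: (image, corners)."""
    # Image as nested lists with shape (height, width, 3); all pixels are black.
    image = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]
    # Four (x, y) corners with random placeholder values.
    # Assumed order: top left, top right, bottom right, bottom left.
    corners = [(random.uniform(0, width), random.uniform(0, height))
               for _ in range(4)]
    return image, corners


def batch_generator(make_sample, batch_size):
    """Yield (X, Y) batches forever; each batch is built only when requested."""
    while True:
        X, Y = [], []
        for _ in range(batch_size):
            image, corners = make_sample()
            X.append(image)
            # Flatten the four (x, y) pairs into one label row of 8 numbers.
            Y.append([coordinate for corner in corners for coordinate in corner])
        yield X, Y


batches = batch_generator(make_dummy_sample, batch_size=2)
X, Y = next(batches)  # X: (2, height, width, 3) as nested lists, Y: (2, 8)
```

Because the generator builds a batch only when next() is called, memory use stays bounded by the batch size. For training with Keras, the lists would typically be converted to NumPy arrays before being yielded; note that recent Keras versions accept generators in fit() directly and deprecate fit_generator().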

3. Setup

  1. Download textures or background images (e.g. from the DTD or MIT Indoor Scenes dataset) into res/backgrounds as a flat collection of images (remove intermediate folders).

  2. Insert the PDF pages (as PNG images) into /res/input as a collection of images. Note: to convert PDFs to PNGs, one can use the scripts provided in /src/data_utils.py.

  3. Install the dependencies (using pip):

    pip install -r requirements.txt

4. File Structure

res                               
    ├── backgrounds                 # background images (gif not supported)              
    ├── input                       # pdf images 
    ├── output                      # location where to store generated dataset + corners as .mat file
src
    ├── unit_tests                  # unit tests demonstrating functionality
        ├── ...
    ├── dataGenerator.py            # Data Synthesis optimized for Keras fit_generator()
    ├── dataProducer.py             # Data Synthesis using multiprocessing (fixed set)
    ├── dataConfig.py               # all config params of data synthesis
    ├── dataUtils.py                # helper methods       
requirements.txt                    # dependencies (Python2.7) 

5. Usage

# init Augmenter with input, output and backgrounds
augmenter = AugmenterV2(input, output, backgrounds)

# e.g. generate 100 document images
augmenter.augmentDataset_master(max=100, mode_p=0.7)
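
After generating a dataset, one may want to check how the mode_p setting shows up in the labels. The following is a minimal sketch, not part of this module: corners_inside and included_ratio are hypothetical helpers, the label rows are hand-written, and it is assumed that each row stores x and y interleaved and that "inside" means 0 <= x < width and 0 <= y < height.

```python
def corners_inside(label_row, width, height):
    """True if all four (x, y) corners of an 8-value label row lie inside the image."""
    xs = label_row[0::2]  # x coordinates (assumed at even positions)
    ys = label_row[1::2]  # y coordinates (assumed at odd positions)
    return (all(0 <= x < width for x in xs) and
            all(0 <= y < height for y in ys))


def included_ratio(Y, width, height):
    """Fraction of label rows whose corners all lie inside the image."""
    if not Y:
        return 0.0
    inside = sum(1 for row in Y if corners_inside(row, width, height))
    return inside / float(len(Y))


# Two hand-written label rows for a 100x100 image (illustrative values only).
Y = [
    [10, 10, 90, 12, 88, 95, 8, 92],   # all corners inside
    [-5, 10, 90, 12, 88, 95, 8, 104],  # two corners outside
]
print(included_ratio(Y, width=100, height=100))  # 0.5
```

With mode_p=0.7, the ratio computed over a sufficiently large generated dataset would be expected to be roughly 0.7 or higher.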

For more information: see src/unit_tests

License

DocHomographyGenerator_license
